My perspective on Andrej Karpathy's "Software Is Changing (Again)"


The Three Paradigms of Software

Karpathy's Framework:

Andrej argues that software is undergoing its first fundamental change in roughly 70 years, and that it has now changed twice in quick succession:

  • Software 1.0: Traditional code written by humans for computers
  • Software 2.0: Neural network weights (not written directly but tuned through datasets and optimizers)
  • Software 3.0: Prompts in English that program Large Language Models (LLMs)
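To make the paradigm shift concrete, here is a toy sketch of the same task written both ways (Software 2.0 is omitted, since its "code" is learned weights rather than anything written by hand); `call_llm` is a hypothetical stand-in for whichever model endpoint you use:

    # Software 1.0: explicit rules, written by a human, for the computer.
    def classify_sentiment_v1(text: str) -> str:
        negative_words = {"bad", "terrible", "awful", "hate"}
        hits = sum(word in negative_words for word in text.lower().split())
        return "negative" if hits > 0 else "positive"

    # Software 3.0: the "program" is an English prompt fed to an LLM.
    PROMPT = ("Classify the sentiment of the following text. "
              "Reply with exactly one word, positive or negative:\n\n{text}")

    def classify_sentiment_v3(text: str, call_llm) -> str:
        # call_llm: hypothetical callable, prompt in -> completion out
        return call_llm(PROMPT.format(text=text)).strip().lower()

The 1.0 version is fully auditable but brittle; the 3.0 version generalizes far better but inherits every reliability issue discussed below.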

My Perspective:

As an engineering practitioner, I see fundamental challenges in "prompt engineering" that threaten its scalability and reliability:

Standardization Crisis: Unlike traditional programming languages with formal syntax, natural language prompts suffer from inherent ambiguity. While researchers have identified 26 design principles for prompt engineering, we lack universal standards comparable to programming language specifications.

Cross-model and Version Compatibility: Currently, "there is no such thing as prompt portability"—changing models or even versions requires re-evaluating and re-tuning all prompts. Each LLM requires model-specific prompt strategies due to different training methodologies and architectures.
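One pragmatic mitigation, sketched below with made-up model names, is to treat prompts as versioned, model-specific artifacts rather than universal strings, so that a model or version change forces an explicit re-evaluation instead of silent reuse:

    # Minimal prompt registry keyed by (model, task). Each entry is assumed
    # to have been evaluated against that specific model before shipping.
    PROMPTS = {
        ("model-a-v2", "summarize"): "Summarize the text below in three bullet points:\n{text}",
        ("model-b-v1", "summarize"): "Provide a concise three-point summary of:\n{text}",
    }

    def get_prompt(model: str, task: str) -> str:
        try:
            return PROMPTS[(model, task)]
        except KeyError:
            # Fail loudly: never fall back to a prompt tuned for another model.
            raise LookupError(f"No evaluated prompt for model={model!r}, task={task!r}")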

Intellectual Property Ambiguity: Companies struggle to protect prompt-based innovations. While prompts may be patentable under certain conditions, trade secret protection often proves more practical given how hard infringement is to detect. This raises long-term questions about how we share instruction sets and memory banks, and how we monetize prompt-driven code and applications.

LLMs as New Computing Paradigms

Economic/Infrastructure Paradigm

LLMs require massive capital investment, creating natural monopolies that serve users through metered, centralized access. 

  • Utilities Model: LLMs require capital expenditure to build (like electricity grids), operational expenditure to serve, and provide metered access (a back-of-the-envelope cost sketch follows this list).
  • Semiconductor Fab Model: LLMs resemble semiconductor fabrication plants, where "the cost of building and training these models is huge."
  • 1960s Mainframe Era: The expensive compute forces centralization in cloud infrastructure with time-sharing and batch processing models.
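To see what metered access means in practice, here is a back-of-the-envelope sketch; the per-token prices are illustrative placeholders, not any provider's actual rates:

    # Toy cost model for metered LLM access (prices are made up).
    PRICE_PER_1K_INPUT = 0.01   # $ per 1,000 input tokens (hypothetical)
    PRICE_PER_1K_OUTPUT = 0.03  # $ per 1,000 output tokens (hypothetical)

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        """Cost of one metered LLM call, billed like a utility's kWh."""
        return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
               (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

    # 100,000 requests/day at 1,000 input and 500 output tokens each:
    print(f"~${100_000 * request_cost(1_000, 500):,.0f}/day")  # ~$2,500/day

Like electricity, the marginal unit is cheap, but at scale the meter dominates the budget.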

Functional Paradigm (LLM as OS)

This is the truly revolutionary concept: LLMs are "becoming complex software ecosystems, the 'core' of modern applications," similar to how operating systems control and run applications.

The OS paradigm suggests LLMs won't just be tools we use, but the foundational layer that all software runs on, much as every application today runs on top of Windows, macOS, or Linux.

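A minimal sketch of the analogy in code, under my own assumptions (Karpathy likens the LLM to a CPU and its context window to RAM); `llm` is a hypothetical callable, prompt in and text out:

    class LLMKernel:
        """The LLM as 'CPU', the context window as scarce 'RAM'."""
        def __init__(self, llm, window: int = 20):
            self.llm = llm        # hypothetical model endpoint
            self.context = []     # shared working memory across "apps"
            self.window = window  # context budget, like limited RAM

        def syscall(self, app: str, request: str) -> str:
            # Every app call is routed through the shared core, like a syscall.
            self.context.append(f"[{app}] {request}")
            prompt = "\n".join(self.context[-self.window:]) + f"\n\nRespond to: {request}"
            return self.llm(prompt)

    # "Apps" shrink to thin orchestration layers over the shared core:
    def email_assistant(kernel: LLMKernel, draft: str) -> str:
        return kernel.syscall("email", f"Rewrite this draft more politely:\n{draft}")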

My Perspective:

While I find this OS paradigm fascinating, I believe that in the short to medium term LLMs won't replace deterministic computing fundamentals; rather, they will fill the gaps where rule-based systems fail and where humans currently make contextual judgment calls in large-scale operations.

Current LLMs are not suitable for core infrastructure where reliability and predictability are paramount, but they offer significant ROI in areas that today depend on human judgment, such as operations, business process management, customer service, and resource optimization.
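A minimal sketch of that division of labor, with a hypothetical `llm_judge` callable: deterministic rules keep the cheap, auditable fast path, and the LLM is consulted only on the ambiguous remainder, with its output validated before use:

    # Hybrid triage: rule-based fast path, LLM fallback for ambiguity.
    VALID_QUEUES = {"billing", "account", "technical", "other"}

    def route_support_ticket(ticket: str, llm_judge) -> str:
        text = ticket.lower()
        # Deterministic rules first: predictable and auditable.
        if "refund" in text:
            return "billing"
        if "password" in text or "login" in text:
            return "account"
        # Ambiguous remainder: defer to contextual judgment.
        answer = llm_judge(
            "Route this support ticket to one of: billing, account, "
            f"technical, other. Reply with one word.\n\n{ticket}"
        ).strip().lower()
        # Never let unvalidated LLM output drive core infrastructure.
        return answer if answer in VALID_QUEUES else "other"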

Psychology and Limitations of LLMs

Karpathy's "People Spirits" Framework: 

Karpathy's framework is particularly valuable because he frames these as emergent psychological patterns rather than technical limitations. His "people spirits" concept suggests LLMs exhibit coherent psychological profiles that can be understood and worked with, rather than random technical quirks to be engineered around. He identifies:

  • Jagged Intelligence - performing "extremely impressive tasks" while simultaneously struggling with "some very dumb problems"
  • Anterograde Amnesia - "they don't consolidate or build long-running knowledge or expertise once training is over and all they have is short-term memory (context window)"; see the memory sketch after this list
  • Hallucinations - fabricating information with no grounding in the input or training data
  • Security vulnerabilities/gullibility - susceptibility to prompt injection
  • Lack of Cognitive Self-Knowledge - Karpathy flags "the present lack of 'cognitive self-knowledge'" as a major limitation
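On the anterograde amnesia point, here is a minimal sketch of the common workaround: persist a rolling set of notes outside the model so each session can be "reminded" of the previous ones. `summarize` stands in for any summarization call, and the file path is illustrative:

    import json
    from pathlib import Path

    MEMORY_FILE = Path("memory.json")  # illustrative location

    def load_memory() -> list:
        """Long-term notes the model itself cannot retain across sessions."""
        return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

    def end_session(transcript: str, summarize) -> None:
        """Compress the session and persist it; the weights never will."""
        notes = load_memory()
        notes.append(summarize(transcript))
        MEMORY_FILE.write_text(json.dumps(notes, indent=2))

    def start_session(task: str) -> str:
        """Prepend persisted notes so short-term context stands in for memory."""
        notes = "\n".join(f"- {n}" for n in load_memory())
        return f"Relevant notes from past sessions:\n{notes}\n\nTask: {task}"

This doesn't give the model real long-term memory; it just rents out the context window as a substitute, which is exactly the limitation Karpathy describes.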

My Perspective:

There are additional limitations from AI safety and alignment research that we should be aware of:

  • Sycophancy and People-Pleasing: Excessively agree with users and tell them what they want to hear rather than maintaining consistent, accurate positions
  • Confabulation: Create plausible-sounding but false information that fits the conversational context, rather than just hallucinations
  • Context Collapse: Inappropriately blend different social contexts (formal/informal, expert/novice) within single conversations instead of maintaining boundaries
  • Absence of Genuine Uncertainty: Express linguistic confidence markers but lack true epistemic humility—they can't genuinely recognize their knowledge limits
  • Anthropomorphic Projection: Users unconsciously attribute human-like consciousness and intentionality to LLMs, creating unrealistic expectations about their capabilities and understanding

As a Product Manager, you need to understand these LLM limitations and make sure enough checks and balances are built in to counter them; a minimal guardrail sketch follows.
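As one example of such a check, here is a sketch of a lightweight output guardrail, assuming a hypothetical `llm` callable: it demands a structured answer with an explicit confidence field and routes anything malformed or low-confidence to a human instead of the end user:

    import json

    def guarded_answer(question: str, llm, threshold: float = 0.7) -> dict:
        """Ask for a structured answer and escalate anything dubious."""
        raw = llm(
            "Answer the question and honestly rate your confidence.\n"
            'Reply as JSON: {"answer": "...", "confidence": 0.0 to 1.0}\n\n'
            f"Question: {question}"
        )
        try:
            parsed = json.loads(raw)
            answer, conf = parsed["answer"], float(parsed["confidence"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            return {"status": "escalate", "reason": "malformed output"}
        if conf < threshold:
            return {"status": "escalate", "reason": "low confidence"}
        return {"status": "ok", "answer": answer}

Self-reported confidence is weak evidence (the model lacks genuine epistemic humility, as noted above), so it should be a first filter, not the only one.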

Designing LLM Apps with Partial Autonomy

Karpathy's Design Principles: 

  • Successful LLM applications like Cursor excel by providing intuitive GUIs that enable fast human verification of AI work
  • "Autonomy Sliders" that let users control how much independence to grant the AI
  • Keep AI "on the leash" with human oversight rather than pursuing full autonomy
  • Warning against overoptimism about autonomous agents
  • Build "Iron Man suits" (human augmentation) over "Iron Man robots" (full replacement)

My Perspective:

As an engineering manager, I've watched teams prioritize code generation over validation, producing codebases where testing is an afterthought. The AI era should invert this paradigm.

AI agents can generate acceptance tests directly from business requirements, then write code to pass those specifications. True CI/CD emerges when AI orchestrates the entire pipeline: generating code, running tests, monitoring logs, triggering automatic rollbacks, fixing issues, verifying, and redeploying.
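A sketch of that test-first loop, assuming hypothetical `gen_tests` and `gen_code` callables (requirements in, source text out): the acceptance tests are generated once and frozen, becoming the specification the generated code must satisfy:

    import subprocess

    def build_feature(requirements: str, gen_tests, gen_code, max_attempts: int = 5) -> bool:
        """Spec-first loop: tests are frozen, code iterates until green."""
        # 1. Acceptance tests come straight from business requirements.
        with open("test_feature.py", "w") as f:
            f.write(gen_tests(requirements))
        feedback = ""
        for _ in range(max_attempts):
            # 2. Code is (re)generated against the frozen specification.
            with open("feature.py", "w") as f:
                f.write(gen_code(requirements, feedback))
            result = subprocess.run(["pytest", "test_feature.py"],
                                    capture_output=True, text=True)
            if result.returncode == 0:
                return True  # spec satisfied; hand off to the pipeline
            feedback = result.stdout  # 3. Failures steer the next attempt.
        return False  # escalate to a human: the loop couldn't satisfy the spec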

With AI, verification becomes cheaper than not verifying, fundamentally changing quality economics. Truth shifts from residing in code to living in verification systems, making software engineering verification-centric rather than generation-focused.

Vibe Coding & Building Digital Infrastructure

Karpathy's Vision: 

  • "Vibe coding" refers to programming in natural language, making everyone a potential programmer
  • While code generation became easy with LLM assistance, making applications "real" (authentication, payments, deployment) was still challenging and time-consuming
  • Need to build infrastructure specifically for AI agents as a new category of digital information consumers

My Perspective:

This is where the opportunities lie for the next billion-dollar businesses: how do we build infrastructure for deploying AI-generated codebases at scale?

Key Infrastructure Needs:

  • Deployment platforms optimized for AI-generated code
  • Monitoring and observability tools that understand prompt-driven applications
  • Security frameworks that account for natural language attack vectors
  • Version control systems that can track prompt evolution alongside code changes (see the sketch after this list)
  • Testing frameworks that can validate both deterministic and probabilistic system behaviors
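On the version-control point, here is a minimal sketch of treating a prompt as a first-class build artifact; the schema and field names are my own illustration, not an existing tool:

    from dataclasses import dataclass
    import hashlib

    @dataclass(frozen=True)
    class PromptVersion:
        """A prompt versioned and evaluated like any other build artifact."""
        text: str
        model: str         # prompts are model-specific (no portability)
        eval_score: float  # score on a fixed evaluation suite
        code_commit: str   # code SHA this prompt was tuned against

        @property
        def prompt_hash(self) -> str:
            return hashlib.sha256(self.text.encode()).hexdigest()[:12]

    # A change to either the prompt or the code invalidates prior evals,
    # so both identifiers travel together through the pipeline.
    v1 = PromptVersion(
        text="Summarize the ticket in two sentences:\n{ticket}",
        model="example-model-v1", eval_score=0.91, code_commit="abc1234",
    )
    print(v1.prompt_hash, v1.code_commit)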

The companies that solve the "last mile" problem—making AI-generated applications production-ready—will capture enormous value in this transition.
