Andrej Karpathy on Vibe Coding, Agentic Engineering, and Software 3.0
Published 2026-04-29 - Runtime: about 30 min - Watch on YouTube
TLDR
- Karpathy says December marked a sharp shift: agentic tools started producing chunks of code he no longer needed to correct.
- He argues software is moving from code and data toward prompts, context, and neural nets doing the heavy lifting.
Key Takeaways
- Karpathy calls vibe coding a floor-raising shift; agentic engineering, by contrast, is about preserving professional software quality.
- He says the highest-value human skills now are taste, judgment, oversight, and spec design.
- Verifiable tasks like math and code attract the strongest model gains because reinforcement learning can target them directly.
- He believes almost everything may become automatable, but some domains will be easier to verify than others.
- He describes models as jagged, statistical ghosts rather than animals with intrinsic motivation or curiosity.
Notes
- Karpathy says he felt both exhilarated and unsettled when the latest models began producing usable code chunks without repeated human correction.
- He ties the shift to December, when agentic workflows started feeling fundamentally better than earlier ChatGPT-adjacent usage.
- Software 1.0 is explicit code, software 2.0 is trained weights, and software 3.0 is prompting plus context as the programming lever.
- In this framing, the LLM is the interpreter and the context window is the control surface.
- He uses installer text as an example of the new paradigm: instead of a shell script, you hand an agent instructions to execute.
- MenuGen showed him that many apps are old-paradigm scaffolding around tasks a model can now do directly.
- He points to Gemini plus Nano Banana as a more native approach for overlaying menu items directly on an image.
- He extends software 3.0 beyond code to general information processing, including building LLM-generated knowledge bases and wikis.
- He imagines a future where neural nets are the host process and CPUs become more like co-processors.
- Verifiability matters because frontier labs train models with reinforcement learning rewards, and models peak in domains where outputs can be checked.
- He cites jagged behavior: models can refactor huge codebases or find vulnerabilities, yet still misjudge simple physical reasoning questions.
- He says founders should look for valuable RL environments they can build themselves, then fine-tune against them.
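The 1.0/2.0/3.0 framing in the notes above can be sketched as three ways of "writing" the same toy behavior. This is a minimal illustration, not anything from the talk: the classifier names, the toy word weights, and the `call_llm` stand-in are all hypothetical.

```python
# Software 1.0: the behavior is explicit, hand-written code.
def classify_1_0(text: str) -> str:
    return "positive" if "great" in text.lower() else "negative"

# Software 2.0: the behavior lives in trained weights; the code is
# just inference. (Toy stand-in: one "weight" per word, learned elsewhere.)
WEIGHTS = {"great": 1.0, "terrible": -1.0}

def classify_2_0(text: str) -> str:
    score = sum(WEIGHTS.get(word, 0.0) for word in text.lower().split())
    return "positive" if score >= 0 else "negative"

# Software 3.0: the behavior is a prompt; the LLM is the interpreter
# and the context window is the control surface. `call_llm` is a
# hypothetical stand-in for any chat-completion API.
def classify_3_0(text: str, call_llm) -> str:
    prompt = f"Reply with exactly 'positive' or 'negative': {text!r}"
    return call_llm(prompt).strip().lower()
```

The point of the sketch is where the programmer's effort goes: 1.0 edits logic, 2.0 curates data and training, 3.0 edits the prompt and the context handed to the model.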
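The verifiability point in the last few bullets can be sketched as a tiny "RL environment" for code: the reward is mechanically checkable by running a candidate solution against tests, which is why code and math attract the strongest RL-driven gains. This is a hypothetical illustration, assuming nothing about how frontier labs actually implement rewards; `reward` and the candidates below are invented names.

```python
def reward(candidate_code: str, test_code: str) -> float:
    """Verifiable reward: 1.0 if the candidate passes the tests, else 0.0."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # define the candidate solution
        exec(test_code, namespace)       # run the checks against it
        return 1.0
    except Exception:
        return 0.0

# A training loop would sample candidates and reinforce the ones that score;
# here we just rank two hand-written candidates against one test.
candidates = [
    "def add(a, b):\n    return a - b",  # wrong
    "def add(a, b):\n    return a + b",  # right
]
tests = "assert add(2, 3) == 5"
best = max(candidates, key=lambda c: reward(c, tests))
```

Domains without such a checkable `test_code` (taste, judgment, open-ended writing) are exactly the ones the notes flag as harder to verify, and so harder to target with RL.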