Andrej Karpathy on Vibe Coding, Agentic Engineering, and Software 3.0
Published 2026-04-29 - Runtime: about 30 min - Watch on YouTube
TLDR
- Karpathy says December marked a sharp shift: agentic tools started producing chunks of code he no longer needed to correct.
- He argues software is moving from code and data toward prompts, context, and neural nets doing the heavy lifting.
Key Takeaways
- Karpathy calls vibe coding a floor-raising shift; agentic engineering, by contrast, is about preserving professional software quality.
- He says the highest-value human skills now are taste, judgment, oversight, and spec design.
- Verifiable tasks like math and code attract the strongest model gains because reinforcement learning can target them directly.
- He believes almost everything may become automatable, but some domains will be easier to verify than others.
- He describes models as jagged, statistical ghosts rather than animals with intrinsic motivation or curiosity.
Notes
- Karpathy says he felt both exhilarated and unsettled when the latest models began producing usable code chunks without repeated human correction.
- He ties the shift to December, when agentic workflows started feeling fundamentally better than earlier ChatGPT-adjacent usage.
- Software 1.0 is explicit code, software 2.0 is trained weights, and software 3.0 is prompting plus context as the programming lever.
- In this framing, the LLM is the interpreter and the context window is the control surface.
- He uses installer text as an example of the new paradigm: instead of a shell script, you hand an agent instructions to execute.
- MenuGen showed him that many apps are old-paradigm scaffolding around tasks a model can now do directly.
- He points to Gemini plus Nano Banana as a more native approach for overlaying menu items directly on an image.
- He extends software 3.0 beyond code to general information processing, including building LLM-generated knowledge bases and wikis.
- He imagines a future where neural nets are the host process and CPUs become more like co-processors.
- Verifiability matters because frontier labs train models with reinforcement learning rewards, and models peak in domains where outputs can be checked.
- He cites jagged behavior: models can refactor huge codebases or find vulnerabilities, yet still misjudge simple physical reasoning questions.
- He says founders should look for valuable RL environments they can build themselves, then fine-tune against them.
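The 1.0/2.0/3.0 framing in the notes above can be sketched as three ways of "writing" the same toy behavior. This is a minimal illustration, not anything from the talk: the classifier names, the toy word weights, and the `call_llm` stand-in are all hypothetical.

```python
# Software 1.0: the behavior is explicit, hand-written code.
def classify_1_0(text: str) -> str:
    return "positive" if "great" in text.lower() else "negative"

# Software 2.0: the behavior lives in trained weights; the code is
# just inference. (Toy stand-in: one "weight" per word, learned elsewhere.)
WEIGHTS = {"great": 1.0, "terrible": -1.0}

def classify_2_0(text: str) -> str:
    score = sum(WEIGHTS.get(word, 0.0) for word in text.lower().split())
    return "positive" if score >= 0 else "negative"

# Software 3.0: the behavior is a prompt; the LLM is the interpreter
# and the context window is the control surface. `call_llm` is a
# hypothetical stand-in for any chat-completion API.
def classify_3_0(text: str, call_llm) -> str:
    prompt = f"Reply with exactly 'positive' or 'negative': {text!r}"
    return call_llm(prompt).strip().lower()
```

The point of the sketch is where the programmer's effort goes: 1.0 edits logic, 2.0 curates data and training, 3.0 edits the prompt and the context handed to the model.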
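The verifiability point in the last few bullets can be sketched as a tiny "RL environment" for code: the reward is mechanically checkable by running a candidate solution against tests, which is why code and math attract the strongest RL-driven gains. This is a hypothetical illustration, assuming nothing about how frontier labs actually implement rewards; `reward` and the candidates below are invented names.

```python
def reward(candidate_code: str, test_code: str) -> float:
    """Verifiable reward: 1.0 if the candidate passes the tests, else 0.0."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # define the candidate solution
        exec(test_code, namespace)       # run the checks against it
        return 1.0
    except Exception:
        return 0.0

# A training loop would sample candidates and reinforce the ones that score;
# here we just rank two hand-written candidates against one test.
candidates = [
    "def add(a, b):\n    return a - b",  # wrong
    "def add(a, b):\n    return a + b",  # right
]
tests = "assert add(2, 3) == 5"
best = max(candidates, key=lambda c: reward(c, tests))
```

Domains without such a checkable `test_code` (taste, judgment, open-ended writing) are exactly the ones the notes flag as harder to verify, and so harder to target with RL.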