Andrej Karpathy on Vibe Coding, Agentic Engineering, and Software 3.0

Published 2026-04-29 - Runtime about 30 min - Watch on YouTube

TLDR

  • Karpathy says December marked a sharp shift: agentic tools started producing chunks of code he no longer needed to correct.
  • He argues software is moving from code and data toward prompts, context, and neural nets doing the heavy lifting.

Key Takeaways

  • Karpathy calls vibe coding a floor-raising shift, while agentic engineering is about preserving professional software quality.
  • He says the highest-value human skills now are taste, judgment, oversight, and spec design.
  • Verifiable tasks like math and code attract the strongest model gains because reinforcement learning can target them directly.
  • He believes almost everything may become automatable, but some domains will be easier to verify than others.
  • He describes models as jagged, statistical ghosts rather than animals with intrinsic motivation or curiosity.

Notes

  • Karpathy says he felt both exhilarated and unsettled when the latest models began producing usable chunks of code without repeated human correction.
  • He ties the shift to December, when agentic workflows started feeling fundamentally better than earlier ChatGPT-adjacent usage.
  • Software 1.0 is explicit code, software 2.0 is trained weights, and software 3.0 is prompting plus context as the programming lever.
  • In this framing, the LLM is the interpreter and the context window is the control surface.
  • He uses installer text as an example of the new paradigm: instead of a shell script, you hand an agent instructions to execute.
  • MenuGen showed him that many apps are old-paradigm scaffolding around tasks a model can now do directly.
  • He points to Gemini plus Nano Banana as a more native approach for overlaying menu items directly on an image.
  • He extends software 3.0 beyond code to general information processing, including building LLM-generated knowledge bases and wikis.
  • He imagines a future where neural nets are the host process and CPUs become more like co-processors.
  • Verifiability matters because frontier labs train models with reinforcement learning rewards, and models peak in domains where outputs can be checked.
  • He cites jagged behavior: models can refactor huge codebases or find vulnerabilities, yet still misjudge simple physical reasoning questions.
  • He says founders should look for valuable RL environments they can build themselves, then fine-tune against them.
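The 1.0/2.0/3.0 framing above can be sketched as three ways of producing the same behavior. A minimal illustration, assuming a toy sentiment task (the function names, word lists, and prompt are hypothetical, not from the talk):

```python
# Three paradigms for the same task: deciding whether a text is positive.

# Software 1.0: explicit rules written by a programmer.
def sentiment_1_0(text: str) -> bool:
    positive_words = {"great", "love", "excellent"}
    return any(w in text.lower().split() for w in positive_words)

# Software 2.0: behavior lives in trained weights, not hand-written rules.
# (A toy stand-in: one learned weight per word instead of a neural net.)
weights = {"great": 1.0, "love": 0.9, "terrible": -1.0}

def sentiment_2_0(text: str) -> bool:
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return score > 0

# Software 3.0: the "program" is a prompt; the LLM is the interpreter
# and the context window is the control surface.
def sentiment_3_0(text: str, llm) -> bool:
    prompt = f"Answer yes or no: is this review positive?\n\n{text}"
    return llm(prompt).strip().lower().startswith("yes")
```

In the 3.0 case the programmer's leverage is entirely in the prompt and the context handed to the model; swapping the prompt changes the "program" without touching any code path.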
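The point about verifiability — that frontier labs can aim RL at math and code because outputs there can be checked automatically — can be sketched as a minimal reward function for a coding task. This is an illustrative sketch, not anything described in the talk; the `solve` entry-point convention and the test cases are assumptions:

```python
# A verifiable reward for a code-generation task: 1.0 only if the model's
# candidate source passes every test case, otherwise 0.0. Because the check
# runs automatically, reinforcement learning can optimize against it directly.
def code_reward(candidate_src: str, test_cases: list[tuple]) -> float:
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # define the candidate function
        fn = namespace["solve"]          # hypothetical convention: entry point
        for args, expected in test_cases:
            if fn(*args) != expected:
                return 0.0               # any wrong answer zeroes the reward
        return 1.0
    except Exception:
        return 0.0                       # crashing code earns nothing

# Hypothetical task: solve(a, b) should return a + b.
tests = [((2, 3), 5), ((0, 0), 0)]
good = "def solve(a, b):\n    return a + b"
bad = "def solve(a, b):\n    return a - b"
```

Domains without such a checkable `expected` value (taste, writing quality, physical common sense) are exactly where this reward signal is hard to construct, which matches the jaggedness Karpathy describes.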