Simon Willison on the AI coding inflection point and dark factories

· ai · Source ↗

Published 2026-04-02 - Runtime about 100 min - Watch on YouTube

TLDR

  • Simon Willison says November 2025 crossed a threshold: coding agents moved from mostly working to reliably following instructions.
  • Agentic engineering now means using agents with red/green TDD, templates, and parallel work, while human review shifts to higher-level judgment.

Key Takeaways

  • Willison expects the next big leap to be dark factory software, where code is generated, tested, and QA’d without direct human review.
  • He predicts 50% of engineers will be writing 95% AI-generated code by the end of 2026.
  • Prompt injection remains unsolved; Willison says only architecture like Google DeepMind’s CAMEL-style quarantining can reduce risk.
  • OpenClaw shows demand for personal assistants, but also how quickly security and data-access risks get normalized.

Notes

  • Willison says Anthropic and OpenAI spent 2025 optimizing coding, and reasoning models like OpenAI’s o1 helped make code generation much stronger.
  • He pins the inflection point on November 2025, naming GPT-5.1 and Claude Opus 4.5 as the models that crossed the threshold.
  • Before that point, coding agents often produced code that mostly worked; after it, they usually followed instructions and produced usable software.
  • He now writes about 95% of his code without typing it himself and often works from his phone while walking the dog.
  • He distinguishes vibe coding from agentic engineering: vibe coding means not looking at code, while agentic engineering uses agents to build production software.
  • He argues vibe coding is fine for personal prototypes, but unsafe for software that can harm other people or external systems.
  • The hardest frontier is building software that is better than before, not just faster to produce.
  • He says AI makes UI prototypes almost free, so product teams can explore three directions quickly before choosing one to test with humans.
  • Human usability testing still matters more than AI simulating users, because real people reveal where prototypes fail.
  • He says coding agents intensify mental load: he can run four agents in parallel, but is often wiped out by 11:00 a.m.
  • His practical patterns include red/green TDD and starting every project from a thin template with style, boilerplate, and a single test.
  • He says tests can now be very verbose because updating thousands of test lines is the agent’s job, not his.