Simon Willison on the AI coding inflection point and dark factories
Published 2026-04-02 - Runtime about 100 min
TLDR
- Simon Willison says November 2025 crossed a threshold: coding agents moved from mostly working to reliably following instructions.
- Agentic engineering now means using agents with red/green TDD, templates, and parallel work, while human review shifts to higher-level judgment.
Key Takeaways
- Willison expects the next big leap to be dark factory software, where code is generated, tested, and QA’d without direct human review.
- He predicts that by the end of 2026, 50% of engineers will have 95% of their code generated by AI.
- Prompt injection remains unsolved; Willison says only architectural defenses, such as the quarantining approach in Google DeepMind’s CaMeL, can reduce the risk.
- OpenClaw shows demand for personal assistants, but also how quickly security and data-access risks get normalized.
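The quarantining idea in the prompt-injection takeaway can be illustrated with a toy sketch. This is not the CaMeL implementation or its API. It assumes a simplified design in which any text that came from an untrusted source is wrapped in a hypothetical `Tainted` type, and a hypothetical side-effecting tool, `send_email`, refuses to let tainted data choose the recipient. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Tainted:
    """Hypothetical wrapper for text read from an untrusted source."""
    value: str
    source: str


class PolicyError(Exception):
    """Raised when untrusted data would steer a privileged action."""


def send_email(to: Union[str, Tainted], body: Union[str, Tainted]) -> str:
    """Hypothetical tool: tainted text may be content, never the recipient."""
    if isinstance(to, Tainted):
        raise PolicyError(f"recipient came from untrusted source: {to.source}")
    text = body.value if isinstance(body, Tainted) else body
    return f"sent to {to}: {text}"


# The trusted user request fixes the recipient; the web page only supplies content.
page = Tainted("Meeting moved to 3pm", source="web")
receipt = send_email("boss@example.com", page)

# An injected instruction that tries to pick the recipient is blocked.
injected = Tainted("attacker@example.com", source="web")
try:
    send_email(injected, "secret notes")
    blocked = False
except PolicyError:
    blocked = True
```

The point of the design is that the check happens in ordinary code outside the model, so it holds even when the model itself is fooled by the injected text.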
Notes
- Willison says Anthropic and OpenAI spent 2025 optimizing their models for coding, and reasoning models like OpenAI’s o1 helped make code generation much stronger.
- He pins the inflection point on November 2025, naming GPT-5.1 and Claude Opus 4.5 as the models that crossed the threshold.
- Before that point, coding agents often produced code that mostly worked; after it, they usually followed instructions and produced usable software.
- He now writes about 95% of his code without typing it himself and often works from his phone while walking the dog.
- He distinguishes vibe coding from agentic engineering: vibe coding means not looking at code, while agentic engineering uses agents to build production software.
- He argues vibe coding is fine for personal prototypes, but unsafe for software that can harm other people or external systems.
- The hardest frontier is building software that is better than before, not just faster to produce.
- He says AI makes UI prototypes almost free, so product teams can explore three directions quickly before choosing one to test with humans.
- Human usability testing still matters more than AI simulating users, because real people reveal where prototypes fail.
- He says coding agents intensify mental load: he can run four agents in parallel, but is often wiped out by 11:00 a.m.
- His practical patterns include red/green TDD and starting every project from a thin template with style, boilerplate, and a single test.
- He says tests can now be very verbose because updating thousands of test lines is the agent’s job, not his.
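The red/green TDD pattern mentioned in the notes can be sketched as follows. This is a minimal illustration, not Willison's own workflow or template: the `slugify` function and its test are hypothetical examples. The test is written first and run against a stub to confirm it fails (red), then run against the implementation to confirm it passes (green).

```python
import io
import re
import unittest


def slugify_stub(title: str) -> str:
    """Red phase: the function exists but is not implemented yet."""
    raise NotImplementedError


def slugify(title: str) -> str:
    """Green phase: lowercase the title and join its words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def make_test_case(func):
    """Build the same test case against whichever implementation is passed in."""
    class SlugifyTest(unittest.TestCase):
        def test_punctuation_and_case(self):
            self.assertEqual(func("Hello, World!"), "hello-world")
    return SlugifyTest


def passes(func) -> bool:
    """Run the suite quietly and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_test_case(func))
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful()


red = passes(slugify_stub)   # the test fails before the code is written
green = passes(slugify)      # the test passes once the code exists
```

Seeing the test fail first is what shows that it actually exercises the new code, which matters more when an agent writes both the test and the implementation.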