Simon Willison finds his own boundary between vibe coding and agentic engineering dissolving as Claude Code reliably handles routine tasks without his review.
Key Takeaways
Willison’s prior distinction: vibe coding ignores code quality; agentic engineering applies 25 years of engineering judgment with AI as an amplifier.
The blur point: for well-defined tasks like JSON API endpoints with tests, he no longer reviews output, creating a “normalization of deviance” risk.
His mental model shift: treat Claude Code like a trusted internal team – black-box until problems surface, then dig into the repo.
The bottleneck has moved upstream: SDLC, design reviews, and QA were all calibrated for ~200 LOC/day, and none of those processes has been recalibrated yet.
The signal for evaluating AI-generated repos has collapsed – 100-commit repos with full test coverage and READMEs can now be produced in 30 minutes, leaving usage history as the only trustworthy quality signal.
Hacker News Comment Review
Commenters split on whether the LOC framing is useful at all; the stronger reading is that LOC matters here as a measure of review burden, not output quality.
A recurring concern: AI errors have shifted from obvious compile failures to subtle edge-case bugs, security holes, and architectural drift – harder to catch precisely when review frequency drops.
Skepticism about Claude Code quality trends surfaced alongside the trust discussion, with some users reporting recently degraded agentic output rather than improvement.
Notable Comments
@underdeserver: frames code review like grading math homework – perfect code is fast to review; it’s the semi-coherent output that consumes time, so skipping review only saves time when the agent is already reliable.
@zarzavat: argues models have not become more trustworthy; errors are just harder to spot – compiling and passing tests no longer rules out wrong behavior or security vulnerabilities.