Nobody Reviews Compiler Output

· coding ai-agents ·

TLDR

  • Coding agents lack the upstream/downstream verification apparatus compilers rely on; the missing piece is that infrastructure, not human code review.

Key Takeaways

  • Compilers earn trust through type systems, reproducible builds, fuzzing, and sanitizers. Agents have no equivalent yet.
  • Michael Novati’s 417 PRs in a single day make line-by-line review impossible at that volume, exposing a process gap, not just a scale gap.
  • Three layers are missing: a formal specification layer that agents execute against, AI-checks-AI CI pipelines, and production rollback culture applied to agent-generated changes.
  • Hardware chip verification is the closest model: black-box components validated by acceptance tests and dedicated test harness teams, not human inspection.
  • The post frames lights-out codebases as a design target, not just an inevitability, requiring formal specs, robust test infra, and fast rollback.
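The three missing layers above can be sketched as a single gate on agent output. This is a minimal illustration, not the post's design: all names (`SPEC`, `passes_spec`, `deploy`) are hypothetical, and real verification would involve far richer specs and CI machinery.

```python
from typing import Callable

# Layer 1: a formal specification — properties any implementation must satisfy.
# Here the "program" under spec is a list-sorting function.
SPEC: list[Callable[[Callable[[list[int]], list[int]]], bool]] = [
    lambda f: f([]) == [],                # handles empty input
    lambda f: f([3, 1, 2]) == [1, 2, 3],  # orders elements
    lambda f: f([1, 1]) == [1, 1],        # preserves duplicates
]

def current_production_impl(xs: list[int]) -> list[int]:
    return sorted(xs)

def agent_generated_impl(xs: list[int]) -> list[int]:
    # Plausible-but-wrong agent output: silently drops duplicates.
    return sorted(set(xs))

# Layer 2: the verification gate (stand-in for an AI-checks-AI CI step).
def passes_spec(impl: Callable[[list[int]], list[int]]) -> bool:
    return all(check(impl) for check in SPEC)

# Layer 3: rollback culture — ship the candidate only if the spec holds,
# otherwise keep the known-good production version.
def deploy(candidate, fallback):
    return candidate if passes_spec(candidate) else fallback

active = deploy(agent_generated_impl, current_production_impl)
print(active is current_production_impl)  # True — the bad change is rolled back
```

The point of the sketch is that no human inspects the agent's diff; trust comes from the spec and the gate, mirroring the acceptance-test model the post borrows from chip verification.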

Hacker News Comment Review

  • Commenters pushed back on the compiler analogy directly: compilers are not actually deterministic and do have codegen bugs, weakening the core trust argument.
  • The non-determinism gap is the sharpest technical objection: agents can produce plausible-but-wrong code at high volume in ways compiler bugs do not, making the analogy structurally incomplete.
  • A practical reductio surfaced: if formal specification layers are the answer, writing those specs may be harder than writing the program itself.
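The reductio can be shown in miniature: a spec loose enough to write easily under-constrains the agent, while a spec tight enough to pin down behavior starts converging on the implementation itself. A hypothetical sorting example, assuming nothing from the original post:

```python
from collections import Counter
from typing import Callable

def is_sorted(xs: list[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))

# Loose spec: easy to state, but an agent can satisfy it vacuously.
def loose_spec(f: Callable[[list[int]], list[int]], xs: list[int]) -> bool:
    return is_sorted(f(xs))  # f = lambda xs: [] passes this check

# Tight spec: output must be ordered AND a permutation of the input —
# but stating "permutation" already does much of the work of sorting.
def tight_spec(f: Callable[[list[int]], list[int]], xs: list[int]) -> bool:
    out = f(xs)
    return is_sorted(out) and Counter(out) == Counter(xs)

print(loose_spec(lambda xs: [], [3, 1, 2]))  # True — the spec is gamed
print(tight_spec(lambda xs: [], [3, 1, 2]))  # False — vacuous output rejected
```

This is the commenters' point: the gap between `loose_spec` and `tight_spec` is exactly the effort that was supposed to be saved.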

Notable Comments

  • @zby: formal spec layers detailed enough for agents to execute against may be easier to simply implement as code.
  • @secos: one team is already using the same AI to generate the formal proofs validating its AI-generated code.

Original | Discuss on HN