Nobody Reviews Compiler Output

· coding ai-agents ·

TLDR

  • Coding agents lack the upstream/downstream verification apparatus compilers rely on; the missing piece is that infrastructure, not human code review.

Key Takeaways

  • Compilers earn trust through type systems, reproducible builds, fuzzing, and sanitizers. Agents have no equivalent yet.
  • Michael Novati’s 417 PRs in a single day make line-by-line review impossible at that volume, exposing a process gap, not just a scale gap.
  • Three layers are missing: a formal specification layer that agents execute against, AI-checks-AI CI pipelines, and production rollback culture applied to agent-generated changes.
  • Hardware chip verification is the closest model: black-box components validated by acceptance tests and dedicated test harness teams, not human inspection.
  • The post frames lights-out codebases as a design target, not just an inevitability, requiring formal specs, robust test infra, and fast rollback.
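The three missing layers above can be sketched as a single gate on agent output. This is a minimal illustration, not the post's design: all names (`SPEC`, `passes_spec`, `deploy`) are hypothetical, and real verification would involve far richer specs and CI machinery.

```python
from typing import Callable

# Layer 1: a formal specification — properties any implementation must satisfy.
# Here the "program" under spec is a list-sorting function.
SPEC: list[Callable[[Callable[[list[int]], list[int]]], bool]] = [
    lambda f: f([]) == [],                # handles empty input
    lambda f: f([3, 1, 2]) == [1, 2, 3],  # orders elements
    lambda f: f([1, 1]) == [1, 1],        # preserves duplicates
]

def current_production_impl(xs: list[int]) -> list[int]:
    return sorted(xs)

def agent_generated_impl(xs: list[int]) -> list[int]:
    # Plausible-but-wrong agent output: silently drops duplicates.
    return sorted(set(xs))

# Layer 2: the verification gate (stand-in for an AI-checks-AI CI step).
def passes_spec(impl: Callable[[list[int]], list[int]]) -> bool:
    return all(check(impl) for check in SPEC)

# Layer 3: rollback culture — ship the candidate only if the spec holds,
# otherwise keep the known-good production version.
def deploy(candidate, fallback):
    return candidate if passes_spec(candidate) else fallback

active = deploy(agent_generated_impl, current_production_impl)
print(active is current_production_impl)  # True — the bad change is rolled back
```

The point of the sketch is that no human inspects the agent's diff; trust comes from the spec and the gate, mirroring the acceptance-test model the post borrows from chip verification.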

Hacker News Comment Review

  • Commenters pushed back on the compiler analogy directly: compilers are not actually deterministic and do have codegen bugs, weakening the core trust argument.
  • The non-determinism gap is the sharpest technical objection: agents can produce plausible-but-wrong code at high volume in ways compiler bugs do not, making the analogy structurally incomplete.
  • A practical reductio surfaced: if formal specification layers are the answer, writing those specs may be harder than writing the program itself.
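The reductio can be shown in miniature: a spec loose enough to write easily under-constrains the agent, while a spec tight enough to pin down behavior starts converging on the implementation itself. A hypothetical sorting example, assuming nothing from the original post:

```python
from collections import Counter
from typing import Callable

def is_sorted(xs: list[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))

# Loose spec: easy to state, but an agent can satisfy it vacuously.
def loose_spec(f: Callable[[list[int]], list[int]], xs: list[int]) -> bool:
    return is_sorted(f(xs))  # f = lambda xs: [] passes this check

# Tight spec: output must be ordered AND a permutation of the input —
# but stating "permutation" already does much of the work of sorting.
def tight_spec(f: Callable[[list[int]], list[int]], xs: list[int]) -> bool:
    out = f(xs)
    return is_sorted(out) and Counter(out) == Counter(xs)

print(loose_spec(lambda xs: [], [3, 1, 2]))  # True — the spec is gamed
print(tight_spec(lambda xs: [], [3, 1, 2]))  # False — vacuous output rejected
```

This is the commenters' point: the gap between `loose_spec` and `tight_spec` is exactly the effort that was supposed to be saved.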

Notable Comments

  • @zby: formal spec layers detailed enough for agents to execute against may be easier to simply implement as code.
  • @secos: one team is already using the same AI to generate the formal proofs validating its AI-generated code.

Original | Discuss on HN