Learnings from 100K Lines of Rust with AI (2025)


TLDR

  • One engineer rebuilt Azure’s multi-Paxos RSL consensus engine in Rust using AI agents, writing 130K lines in ~6 weeks and boosting throughput from 23K to 300K ops/sec in 3 weeks.

Key Takeaways

  • AI-generated code contracts (preconditions, postconditions, invariants) caught a subtle Paxos safety violation before production; GPT-5 High outperformed Opus 4.1 for contract quality.
  • Workflow: Claude Code and Codex CLI run async overnight tasks; single user stories are the optimal unit for agent execution without context loss.
  • Perf loop: AI instruments latency, writes Python quantile scripts, proposes fixes, re-measures; key wins from zero-copy, removing async overhead, and eliminating lock contention.
  • Lightweight spec-driven development (SDD) via /specify and /clarify (spec kit) replaced rigid requirement/design/task markdown chains, which became inconsistent under iteration.
  • 1,300+ tests cover unit, minimal integration (proposer+acceptor), and full multi-replica with injected failures; tests represent 65%+ of the codebase.
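To make the code-contract takeaway concrete, here is a minimal sketch of preconditions, postconditions, and an invariant written as runtime assertions on a simplified single-slot Paxos acceptor. It is not the RSL implementation: the `Acceptor` type, its fields, and the specific invariant are hypothetical and chosen only for illustration, and ballots are assumed to start at 1.

```rust
/// Hypothetical, simplified single-slot Paxos acceptor (illustration only).
#[derive(Debug, Default)]
struct Acceptor {
    /// Highest ballot this acceptor has promised (0 means none yet).
    promised: u64,
    /// Last accepted proposal as (ballot, value), if any.
    accepted: Option<(u64, String)>,
}

impl Acceptor {
    /// Invariant: an accepted ballot never exceeds the promised ballot.
    fn check_invariant(&self) {
        if let Some((ballot, _)) = &self.accepted {
            assert!(*ballot <= self.promised, "accepted ballot above promise");
        }
    }

    /// Phase 1. Precondition for success: `ballot` is strictly higher than
    /// any earlier promise. On rejection, returns the current promise.
    fn prepare(&mut self, ballot: u64) -> Result<Option<(u64, String)>, u64> {
        self.check_invariant();
        let old_promised = self.promised;
        if ballot <= self.promised {
            return Err(self.promised);
        }
        self.promised = ballot;
        // Postcondition: the promise only ever moves forward.
        assert!(self.promised > old_promised, "promise must increase");
        self.check_invariant();
        Ok(self.accepted.clone())
    }

    /// Phase 2. Precondition for success: `ballot` is not below the promise.
    fn accept(&mut self, ballot: u64, value: &str) -> Result<(), u64> {
        self.check_invariant();
        if ballot < self.promised {
            return Err(self.promised);
        }
        self.promised = ballot;
        self.accepted = Some((ballot, value.to_string()));
        // Postcondition: the stored proposal carries exactly this ballot.
        assert_eq!(self.accepted.as_ref().map(|(b, _)| *b), Some(ballot));
        self.check_invariant();
        Ok(())
    }
}
```

Because the checks run on every call, any test that drives the acceptor through an illegal state fails at the point of violation rather than later as a divergent replica.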

Hacker News Comment Review

  • Commenters are skeptical about test depth: for a distributed consensus system, they consider 1,300 tests across 130K lines severely inadequate and expect test code to outweigh production code by an order of magnitude.
  • Rust lifetime handling is a real friction point for AI models: they default to .clone() spam rather than proper lifetime resolution, which commenters say undercuts the raw throughput claims.
  • A recurring critique is lack of production validation: the implementation’s value is unverifiable without evidence it is deployed and running in Azure or equivalent.
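The .clone() critique above can be illustrated with a small sketch that contrasts a clone-heavy helper with borrowing equivalents. The `Entry` type and all function names are hypothetical and not taken from the project; the point is only that borrowing avoids copying every payload.

```rust
/// Hypothetical replicated-log entry (illustration only).
#[derive(Clone, Debug)]
struct Entry {
    slot: u64,
    payload: Vec<u8>,
}

/// Clone-heavy version: duplicates the whole log, including every payload
/// buffer, just to read the lengths.
fn total_bytes_cloning(log: &Vec<Entry>) -> usize {
    let copy: Vec<Entry> = log.clone(); // allocates and copies each payload
    copy.iter().map(|e| e.payload.len()).sum()
}

/// Borrowing version: reads the lengths through a shared slice, no copies.
fn total_bytes_borrowing(log: &[Entry]) -> usize {
    log.iter().map(|e| e.payload.len()).sum()
}

/// Returns a borrowed view of the payload stored at `slot`. The explicit
/// lifetime `'a` states that the result lives only as long as `log`.
fn payload_for<'a>(log: &'a [Entry], slot: u64) -> Option<&'a [u8]> {
    log.iter()
        .find(|e| e.slot == slot)
        .map(|e| e.payload.as_slice())
}
```

Both totals agree, but only the cloning version allocates; on a hot path the difference shows up as extra allocator and memcpy time.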
