Separating what AI does well from what code does well

· ai coding cloud

TLDR

  • Kepler Finance built a verifiable AI research platform on Claude by routing intent/reasoning to the LLM and all retrieval/computation to deterministic infrastructure, covering 26M+ SEC filings.
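The split described above can be sketched in a few lines: the LLM only produces a typed plan, while a deterministic function does the arithmetic and returns provenance for every number. This is a minimal illustration, not Kepler's code; every name, field, and value below is invented.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One typed step of a research plan (hypothetical schema)."""
    metric: str   # e.g. "current_ratio"
    company: str
    period: str

def llm_decompose(question: str) -> list[Step]:
    """Stand-in for the LLM call: turn a natural-language question into
    typed steps. In the real system this is where Claude would run;
    here it is hard-coded for illustration."""
    return [Step(metric="current_ratio", company="ACME", period="FY2023")]

def compute_current_ratio(filing: dict) -> tuple[float, dict]:
    """Deterministic computation with provenance: the ratio is derived
    only from filed numbers, and the source fields are returned so the
    result can be traced back to the filing."""
    ratio = filing["current_assets"] / filing["current_liabilities"]
    provenance = {
        "numerator": ("current_assets", filing["current_assets"]),
        "denominator": ("current_liabilities", filing["current_liabilities"]),
        "source": filing["accession_no"],
    }
    return ratio, provenance

# Toy filing data (invented).
filing = {
    "accession_no": "0000000000-23-000001",
    "current_assets": 150.0,
    "current_liabilities": 100.0,
}
plan = llm_decompose("What was ACME's current ratio in FY2023?")
ratio, prov = compute_current_ratio(filing)
print(round(ratio, 2), prov["source"])  # 1.5 0000000000-23-000001
```

The point of the shape: the model never emits a number, so a wrong answer can only come from a deterministic, testable code path.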

Key Takeaways

  • Architecture splits cleanly: Claude handles decomposition and ambiguity flagging; deterministic Rust/Python engines handle ratio computation, fiscal period resolution, and provenance.
  • Claude was chosen after benchmarking all frontier models: the others dropped constraints by step four or five of multi-step plans, or silently picked one reading of an ambiguous term and continued rather than escalating.
  • Kepler built a proprietary financial ontology, idempotent workflow skills, and automated eval pipelines testing every prompt change against thousands of known-correct answers before production.
  • Multi-model routing uses Opus 4.7 for complex reasoning stages and Sonnet 4.6 for constrained high-throughput stages; specialized recall models score 94% on taxonomy mapping vs. 38-46% for general models.
  • Full audit logging, siloed environments, end-to-end provenance, and SOC 2 Type II were built from day one to satisfy financial compliance requirements.
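The eval-gate idea from the takeaways above, testing every prompt change against known-correct answers before it ships, can be sketched as follows. The pipeline stub, threshold, and golden set are all invented for illustration; a real gate would run the full LLM + deterministic pipeline over thousands of cases.

```python
def run_pipeline(prompt: str, question: str) -> str:
    """Stand-in for the full pipeline (LLM planning + deterministic
    computation). Toy behaviour: the 'v2' prompt answers correctly,
    any other prompt does not."""
    golden_answers = {"Q1": "1.5", "Q2": "2.0"}
    return golden_answers[question] if "v2" in prompt else "unknown"

def eval_gate(prompt: str, golden_set: dict[str, str],
              threshold: float = 0.99) -> bool:
    """Block deployment unless accuracy on the known-correct set
    clears the bar."""
    correct = sum(run_pipeline(prompt, q) == a for q, a in golden_set.items())
    return correct / len(golden_set) >= threshold

golden = {"Q1": "1.5", "Q2": "2.0"}
assert eval_gate("prompt-v2", golden)      # new prompt passes the gate
assert not eval_gate("prompt-v1", golden)  # a regression is caught
```

Because the deterministic half of the system makes answers reproducible, the gate can use exact-match scoring rather than fuzzy LLM-graded comparisons.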

Hacker News Comment Review

  • Only one comment present, from a Kepler founder explicitly inviting technical pushback on the core architectural claim: LLM for intent, deterministic code for retrieval and computation, every number traced to source.
  • No substantive community debate, counterarguments, or implementation caveats have emerged yet.

Notable Comments

  • @eddiehammond: Kepler founder surfaces the architectural argument directly and asks HN to stress-test it.

Original | Discuss on HN