Separating what AI does well from what code does well

· ai coding cloud

TLDR

  • Kepler Finance built a verifiable AI research platform on Claude by routing intent/reasoning to the LLM and all retrieval/computation to deterministic infrastructure, covering 26M+ SEC filings.
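The split described above can be sketched in a few lines: the LLM only produces a typed plan, while a deterministic function does the arithmetic and returns provenance for every number. This is a minimal illustration, not Kepler's code; every name, field, and value below is invented.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One typed step of a research plan (hypothetical schema)."""
    metric: str   # e.g. "current_ratio"
    company: str
    period: str

def llm_decompose(question: str) -> list[Step]:
    """Stand-in for the LLM call: turn a natural-language question into
    typed steps. In the real system this is where Claude would run;
    here it is hard-coded for illustration."""
    return [Step(metric="current_ratio", company="ACME", period="FY2023")]

def compute_current_ratio(filing: dict) -> tuple[float, dict]:
    """Deterministic computation with provenance: the ratio is derived
    only from filed numbers, and the source fields are returned so the
    result can be traced back to the filing."""
    ratio = filing["current_assets"] / filing["current_liabilities"]
    provenance = {
        "numerator": ("current_assets", filing["current_assets"]),
        "denominator": ("current_liabilities", filing["current_liabilities"]),
        "source": filing["accession_no"],
    }
    return ratio, provenance

# Toy filing data (invented).
filing = {
    "accession_no": "0000000000-23-000001",
    "current_assets": 150.0,
    "current_liabilities": 100.0,
}
plan = llm_decompose("What was ACME's current ratio in FY2023?")
ratio, prov = compute_current_ratio(filing)
print(round(ratio, 2), prov["source"])  # 1.5 0000000000-23-000001
```

The point of the shape: the model never emits a number, so a wrong answer can only come from a deterministic, testable code path.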

Key Takeaways

  • Architecture splits cleanly: Claude handles decomposition and ambiguity flagging; deterministic Rust/Python engines handle ratio computation, fiscal period resolution, and provenance.
  • Claude was chosen after benchmarking all frontier models: the others dropped constraints by step four or five of multi-step plans, or silently picked one reading of an ambiguous term and continued rather than escalating.
  • Kepler built a proprietary financial ontology, idempotent workflow skills, and automated eval pipelines testing every prompt change against thousands of known-correct answers before production.
  • Multi-model routing uses Opus 4.7 for complex reasoning stages and Sonnet 4.6 for constrained high-throughput stages; specialized recall models score 94% on taxonomy mapping vs. 38-46% for general models.
  • Full audit logging, siloed environments, end-to-end provenance, and SOC 2 Type II were built from day one to satisfy financial compliance requirements.
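The eval-gate idea from the takeaways above, testing every prompt change against known-correct answers before it ships, can be sketched as follows. The pipeline stub, threshold, and golden set are all invented for illustration; a real gate would run the full LLM + deterministic pipeline over thousands of cases.

```python
def run_pipeline(prompt: str, question: str) -> str:
    """Stand-in for the full pipeline (LLM planning + deterministic
    computation). Toy behaviour: the 'v2' prompt answers correctly,
    any other prompt does not."""
    golden_answers = {"Q1": "1.5", "Q2": "2.0"}
    return golden_answers[question] if "v2" in prompt else "unknown"

def eval_gate(prompt: str, golden_set: dict[str, str],
              threshold: float = 0.99) -> bool:
    """Block deployment unless accuracy on the known-correct set
    clears the bar."""
    correct = sum(run_pipeline(prompt, q) == a for q, a in golden_set.items())
    return correct / len(golden_set) >= threshold

golden = {"Q1": "1.5", "Q2": "2.0"}
assert eval_gate("prompt-v2", golden)      # new prompt passes the gate
assert not eval_gate("prompt-v1", golden)  # a regression is caught
```

Because the deterministic half of the system makes answers reproducible, the gate can use exact-match scoring rather than fuzzy LLM-graded comparisons.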

Hacker News Comment Review

  • Only one comment present, from a Kepler founder explicitly inviting technical pushback on the core architectural claim: LLM for intent, deterministic code for retrieval and computation, every number traced to source.
  • No substantive community debate, counterarguments, or implementation caveats have emerged yet.

Notable Comments

  • @eddiehammond: Kepler founder surfaces the architectural argument directly and asks HN to stress-test it.

Original | Discuss on HN