The cost math behind routing Claude Code through Ollama (~90% cut)

devtools · ai · open-source · Source ↗

TLDR

  • Routes Claude Code terminal sessions through a local Ollama model (Gemma, Qwen, DeepSeek) while keeping Claude Desktop on Anthropic Pro for strategy and architecture work.

Key Takeaways

  • Two-engine split: Claude Desktop on Anthropic handles planning and code review; Claude Code in the terminal routes to Ollama for lints, refactors, and file batch ops.
  • Context-heavy terminal tasks drain the Claude Pro quota fast – the setup offloads that load to a free local model or a cloud-hosted open-source one.
  • Setup is a single copy-paste prompt dropped into a fresh Claude Desktop session; it auto-detects macOS, Windows + WSL2, and Linux and handles ~98% of the configuration.
  • The walkthrough is a 21-slide self-contained HTML file – no build step; it runs locally or can be served from any static host.
  • Claimed result: ~90% reduction in Claude Code billing by keeping only high-judgment work on the paid Anthropic endpoint.
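The post's own setup is driven by a copy-paste prompt, so the exact mechanics aren't spelled out here; a common way to wire up this kind of split is a hedged sketch like the following, assuming a local translation proxy (e.g. LiteLLM) that speaks the Anthropic API on port 4000 and forwards to Ollama on its default port 11434 – the port numbers and token value are illustrative assumptions, not the source's configuration:

```shell
# Hypothetical sketch: point Claude Code at a local proxy instead of
# api.anthropic.com, while Claude Desktop keeps its normal Anthropic login.
# Assumes an Anthropic-compatible proxy on localhost:4000 forwarding to
# Ollama on localhost:11434 (both values are assumptions for illustration).
export ANTHROPIC_BASE_URL="http://localhost:4000"   # Claude Code gateway override
export ANTHROPIC_AUTH_TOKEN="local-proxy"           # placeholder; a local proxy can ignore it
claude "run the linter and fix trivial warnings"    # now served by the local model
```

Because the override lives in environment variables, the terminal session routes locally while Claude Desktop – a separate app with its own credentials – stays on the paid Anthropic endpoint, which is the two-engine split described above.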

Hacker News Comment Review

  • The only live comment questions whether the flagship use cases – lints, grep-and-replace, file batch ops – justify LLM tokens at all, given that sed and shell tools handle them natively for free.
  • The framing of “cutting your Claude Code bill” assumes those terminal tasks are currently being run through Claude Code rather than standard Unix tooling – a baseline the commenter treats as dubious.

Notable Comments

  • @irishcoffee: “Grep-and-replace? You mean, sed? People burn tokens instead of using sed?” – challenges whether the core problem being solved is real.
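The commenter's point is that a batch grep-and-replace costs zero tokens with standard tooling. A minimal illustration (the file name and identifiers are made up for the demo):

```shell
# Create a sample file, then do an in-place rename with sed.
printf 'oldName = 1\nuse(oldName)\n' > demo.txt

# -i.bak edits in place and keeps a .bak backup; this spelling works
# on both GNU sed (Linux) and BSD sed (macOS).
sed -i.bak 's/oldName/newName/g' demo.txt

cat demo.txt   # every oldName is now newName
```

Scaled up, the same substitution runs across a whole tree with `grep -rl oldName . | xargs sed -i.bak 's/oldName/newName/g'` – the class of task the post routes through an LLM.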

Original | Discuss on HN