Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

TLDR

  • Dirac, a Cline fork, topped TerminalBench-2 at 65.2% with gemini-3-flash-preview and costs 64.8% less than competing agents.

Key Takeaways

  • Beats Google’s official baseline (47.6%) and closed-source Junie CLI (64.3%) on TerminalBench-2 without benchmark-specific tuning or AGENTS.md files.
  • Achieves 8/8 task accuracy in internal evals at $0.18 average cost, versus $0.38-$0.73 for Cline, Kilo, Roo, and Opencode.
  • Cost and quality gains come from keeping context tightly curated: shorter context preserves model reasoning quality and cuts token spend simultaneously.
  • Core techniques: hash-anchored parallel edits, AST manipulation, and aggressive context pruning. Explicitly ships without MCP (Model Context Protocol) support.
  • Available as a VS Code extension and npm CLI; supports Anthropic, OpenAI, Gemini, Groq, Mistral, xAI, and HuggingFace keys via env vars.
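
The summary names hash-anchored parallel edits but does not say how Dirac implements them. One plausible reading, sketched below under that assumption, is that each edit targets a line by a short hash of its content rather than by line number, so several edits can be resolved against the same original file without shifting each other's targets. The names line_hash, Edit, and apply_edits are hypothetical and are not taken from the Dirac codebase.

```python
import hashlib
from dataclasses import dataclass


def line_hash(text: str) -> str:
    """Short content hash used as the anchor for one line (hypothetical scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


@dataclass(frozen=True)
class Edit:
    anchor: str       # hash of the target line as it appears in the original file
    replacement: str  # new text for that line


def apply_edits(lines: list[str], edits: list[Edit]) -> list[str]:
    """Resolve every edit against the original content, then apply them together.

    Anchors are content hashes, not line numbers, so edits cannot drift.
    A missing, ambiguous, or doubly targeted anchor raises instead of guessing.
    """
    index: dict[str, list[int]] = {}
    for i, line in enumerate(lines):
        index.setdefault(line_hash(line), []).append(i)

    result = list(lines)
    touched: set[int] = set()
    for edit in edits:
        positions = index.get(edit.anchor, [])
        if len(positions) != 1:
            raise ValueError(f"anchor {edit.anchor} matched {len(positions)} lines")
        pos = positions[0]
        if pos in touched:
            raise ValueError(f"anchor {edit.anchor} is targeted by two edits")
        touched.add(pos)
        result[pos] = edit.replacement
    return result


# Usage: two independent fixes to the same file, submitted in one batch.
source = ["def add(a, b):", "    return a - b", "", "def neg(x):", "    return x"]
edits = [
    Edit(line_hash("    return a - b"), "    return a + b"),
    Edit(line_hash("    return x"), "    return -x"),
]
patched = apply_edits(source, edits)
```

Rejecting stale or ambiguous anchors is a design choice in this sketch: if the file changed after the model last read it, the edit fails loudly rather than landing on the wrong line.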

Hacker News Comment Review

  • No substantive HN discussion yet.
