Xiaomi releases MiMo-v2.5 Family weights with strong coding and agent benchmarks


TLDR

  • Xiaomi’s MiMo-V2.5-Pro is now MIT-licensed on HuggingFace, benchmarking within half a point of Claude Opus 4.6 on SWE-Bench Pro at 57.2.

Key Takeaways

  • Three demos make the capability case: Peking University SysY Rust compiler (4.3 hrs, 233/233 hidden tests), a working video editor (11.5 hrs, 1,868 tool calls, 8,192 lines), and an ngspice analog LDO circuit iterated to spec in ~1 hour.
  • Self-correction under load is a notable property: during the compiler run a refactoring pass at turn 512 broke two tests; the model diagnosed and recovered without human intervention.
  • Long context is addressed architecturally: a hybrid of Local Sliding Window Attention (128-token window) and Global Attention layers cuts KV-cache storage ~7x; GraphWalks scores are non-zero at 1M tokens, where V2-Pro scored zero.
  • Token efficiency claim: ~70K tokens per ClawEval trajectory vs. an estimated 120K+ for Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 at comparable pass rates; self-reported, but meaningful if it holds in production.
  • Deployment path: 1.02T parameters in FP8; SGLang recommended, vLLM supported; temperature 1.0 / top_p 0.95; works out of the box with Claude Code, OpenCode, and Kilo agentic scaffolds.
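The ~7x KV-cache figure in the hybrid-attention bullet is easy to sanity-check with back-of-envelope arithmetic: sliding-window layers cache at most 128 KV entries regardless of context length, so at long contexts only the global layers dominate cache size. A minimal sketch, where the total layer count and the 1-in-7 global-layer ratio are illustrative assumptions rather than published architecture details:

```python
# Back-of-envelope KV-cache comparison for a hybrid attention stack:
# most layers use Local Sliding Window Attention (128-token window) and
# cache at most 128 KV entries, while a few Global Attention layers cache
# the full sequence. Layer counts and the 1-in-7 global ratio are assumed
# for illustration, not taken from the model card.
def kv_cache_entries(seq_len: int, n_layers: int, n_global: int,
                     window: int = 128) -> tuple[int, int]:
    """Return (full-attention, hybrid) cache sizes in KV entries per head."""
    full = n_layers * seq_len
    hybrid = n_global * seq_len + (n_layers - n_global) * min(window, seq_len)
    return full, hybrid

full, hybrid = kv_cache_entries(seq_len=1_000_000, n_layers=49, n_global=7)
print(f"reduction: {full / hybrid:.1f}x")  # prints "reduction: 7.0x"
```

With a 1M-token context, the window layers contribute almost nothing, so the reduction converges to roughly (total layers / global layers), here ~7x.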
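Both SGLang and vLLM expose an OpenAI-compatible HTTP endpoint, so the recommended sampling settings above map directly onto a standard chat-completions request. A minimal sketch of the request body; the model id is an assumed HuggingFace-style repo name, not confirmed by the source:

```python
import json

def build_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload with the
    release-recommended sampling settings (temperature 1.0, top_p 0.95)."""
    return {
        "model": "XiaomiMiMo/MiMo-V2.5-Pro",  # assumed repo id, for illustration
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,  # recommended sampling temperature
        "top_p": 0.95,       # recommended nucleus-sampling cutoff
    }

payload = build_request("Write a binary search in Rust.")
print(json.dumps(payload, indent=2))
```

The same payload can be POSTed to a locally served `/v1/chat/completions` route on either engine; only the serving command differs.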

Hacker News Comment Review

  • The single comment confirms the open-source timing: weights had been available for roughly a week before the MIT-licensed HuggingFace drop, suggesting the benchmarks and demos preceded the public release window.
  • No substantive technical debate has formed yet around inference costs, benchmark methodology, or the self-reported token efficiency figures.
