Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in a coding challenge


TLDR

  • Moonshot AI’s open-weights Kimi K2.6 won a 10-model real-time programming contest (Word Gem Puzzle), outscoring Claude Opus 4.7, GPT-5.5, and Gemini Pro 3.1.

Key Takeaways

  • Kimi K2.6 finished 7-1-0 with 22 match points; Xiaomi’s MiMo V2-Pro was second; models from all Western frontier labs placed third or lower.
  • Kimi won via a greedy tile-sliding strategy: score each candidate move by the new positive-value words it would unlock, execute the best one, and repeat (a sketch of this loop follows the list). That loop produced the tournament’s highest cumulative score (77).
  • MiMo never slid a single tile; it fired its claims from the initial grid in one TCP packet, so it scored only on boards where seed words survived the scramble.
  • Claude and Grok also never slid, which collapsed their scores on 30×30 grids where reconstructing words was the only path to points.
  • Kimi K2.6 scores 54 on the Artificial Analysis Intelligence Index vs. GPT-5.5 at 60 and Claude at 57, putting open weights within a few index points of closed frontier models.
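
For the mechanics, here is a minimal Python sketch of that greedy loop. It is an illustration only, not Kimi’s actual code: the contest protocol, grid encoding, legal move set, word list, and point values were not published in this summary, so `WORDS`, `extract_words`, `legal_slides`, and the adjacent-swap move below are all hypothetical stand-ins for whatever Word Gem Puzzle actually uses.

```python
# Hedged sketch of the greedy strategy described above: score every legal
# slide by the new positive-value words it unlocks, play the best one, repeat.
# All names and rules below are assumptions, not the real contest spec.
from itertools import product

WORDS = {"GEM": 5, "GET": 3, "TEN": 2, "NET": 2}  # hypothetical word values


def extract_words(grid):
    """Return the set of valued words readable left-to-right or top-down."""
    n = len(grid)
    rows = ["".join(r) for r in grid]
    cols = ["".join(grid[r][c] for r in range(n)) for c in range(n)]
    found = set()
    for line in rows + cols:
        for i in range(n):
            for j in range(i + 2, n + 1):  # every substring of length >= 2
                if line[i:j] in WORDS:
                    found.add(line[i:j])
    return found


def legal_slides(grid):
    """Yield (move, new_grid) for every adjacent swap; a stand-in move set."""
    n = len(grid)
    for r, c in product(range(n), repeat=2):
        for dr, dc in ((0, 1), (1, 0)):
            r2, c2 = r + dr, c + dc
            if r2 < n and c2 < n:
                g = [row[:] for row in grid]
                g[r][c], g[r2][c2] = g[r2][c2], g[r][c]
                yield (r, c, r2, c2), g


def greedy_play(grid, max_moves=20):
    """Repeatedly take the slide that unlocks the most new valued words."""
    claimed = extract_words(grid)            # seed words score immediately
    total = sum(WORDS[w] for w in claimed)
    for _ in range(max_moves):
        best = None
        for move, g in legal_slides(grid):
            gain = sum(WORDS[w] for w in extract_words(g) - claimed)
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, move, g)
        if best is None:                     # no slide unlocks anything new
            break
        gain, move, grid = best
        claimed |= extract_words(grid)
        total += gain
        print(f"slide {move}: +{gain} (total {total})")
    return total


board = [list("GEXM"), list("TENQ"), list("XXXX"), list("QQQQ")]
greedy_play(board)
```

The property the article credits with the win is the one-step lookahead: evaluate every legal slide, bank the best one, and stop only when no slide unlocks new positive-value words, rather than claiming once from the seed grid as MiMo, Claude, and Grok reportedly did.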

Hacker News Comment Review

  • Commenters broadly flagged that a single novel-protocol contest is a narrow signal; performance on 3D spatial reasoning, long-context, and tool-use tasks tells a different story for Kimi.
  • Debate split between “open-weights parity is real and accelerating” and “task-specific wins don’t generalize”; both sides cite their own internal evals and reach contradictory conclusions.
  • Practical operators treated the scoring penalty for short words as a proxy for instruction-following under structured constraints; Muse’s -15,309 score was cited as a concrete failure mode worth watching in production deployments.

Notable Comments

  • @sieve: Reports Kimi consistently beat Sonnet on a real C+Python compiler/VM project, using OpenCode on the Go plan, without ever hitting context limits.
  • @ponyous: Kimi fails on 3D model code generation evals: it “lacks spatial understanding and makes many more code errors before it succeeds.”
