Moonshot AI’s open-weights Kimi K2.6 won a 10-model real-time programming contest (Word Gem Puzzle), outscoring Claude Opus 4.7, GPT-5.5, and Gemini Pro 3.1.
Key Takeaways
Kimi K2.6 finished 7-1-0 with 22 match points; Xiaomi's MiMo V2-Pro took second; every Western frontier-lab model placed third or lower.
Kimi won with a greedy tile-sliding loop: score each candidate move by the new positive-value words it unlocks, play the best one, repeat. That loop produced the highest cumulative score (77) in the tournament; see the sketch after this list.
MiMo never slid a single tile; it blasted claims for every word it found in the initial grid in one TCP packet (also sketched below), so it scored only on boards where seed words survived the scramble.
Claude and Grok also never slid, which collapsed their scores on 30×30 grids where reconstruction was the only path to points.
Kimi K2.6 scores 54 on the Artificial Analysis Intelligence Index vs. GPT-5.5 at 60 and Claude at 57, putting open weights within a few index points of closed frontier models.
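Here is a minimal sketch of the greedy loop described in the takeaway above. The contest's actual grid mechanics, dictionary, and scoring rules aren't published here, so the swap-adjacent-tiles move model, the placeholder DICTIONARY, and the word_value rule are all assumptions for illustration.

```python
from typing import Iterator

Grid = list[list[str]]
Move = tuple[int, int, int, int]  # (r1, c1, r2, c2): swap two adjacent tiles

DICTIONARY = {"gem", "word", "slide"}  # placeholder word list (assumption)

def legal_slides(grid: Grid) -> Iterator[Move]:
    """Yield every swap of horizontally or vertically adjacent tiles."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                yield (r, c, r, c + 1)
            if r + 1 < rows:
                yield (r, c, r + 1, c)

def apply_slide(grid: Grid, move: Move) -> Grid:
    """Return a copy of the grid with the two tiles swapped."""
    r1, c1, r2, c2 = move
    new = [row[:] for row in grid]
    new[r1][c1], new[r2][c2] = new[r2][c2], new[r1][c1]
    return new

def words_on_board(grid: Grid) -> set[str]:
    """Collect dictionary words readable left-to-right or top-down."""
    lines = ["".join(row) for row in grid]
    lines += ["".join(col) for col in zip(*grid)]
    found = set()
    for line in lines:
        for i in range(len(line)):
            for j in range(i + 2, len(line) + 1):  # substrings of length >= 2
                if line[i:j] in DICTIONARY:
                    found.add(line[i:j])
    return found

def word_value(word: str) -> int:
    """Hypothetical scoring: longer words positive, short words penalized."""
    return len(word) - 3

def greedy_turn(grid: Grid) -> tuple[Grid, int]:
    """Score every legal slide by the new positive-value words it unlocks,
    play the best one, and report the points gained."""
    known = words_on_board(grid)
    best_move, best_gain = None, 0
    for move in legal_slides(grid):
        candidate = apply_slide(grid, move)
        gain = sum(word_value(w)
                   for w in words_on_board(candidate) - known
                   if word_value(w) > 0)
        if gain > best_gain:
            best_move, best_gain = move, gain
    if best_move is None:
        return grid, 0  # no slide unlocks new positive-value words
    return apply_slide(grid, best_move), best_gain
```

Calling greedy_turn in a loop until the clock expires reproduces the score-each-move, play-best, repeat strategy; its weakness is the classic greedy one, taking locally best swaps with no lookahead.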
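And here is what MiMo's one-shot opening might look like. The game's wire format is not documented in this piece, so the host/port and the CLAIM line syntax are invented for illustration; only the shape of the behavior (one buffered write, no slides afterwards) comes from the source.

```python
import socket

def blast_claims(host: str, port: int, words: list[str]) -> None:
    """Submit every word found in the seed grid in a single send(),
    then stop playing (no tile slides follow)."""
    payload = "".join(f"CLAIM {w}\n" for w in words).encode()
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)  # one buffered write: one TCP packet for small payloads
        print(sock.recv(4096).decode())  # read whatever the server replies

# blast_claims("localhost", 9000, ["gem", "word", "slide"])
```

A strategy like this only pays off when the scrambled board happens to contain intact seed words, which matches why MiMo scored on some boards and zeroed on others.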
Hacker News Comment Review
Commenters broadly flagged that a single novel-protocol contest is a narrow signal; performance on 3D spatial reasoning, long-context, and tool-use tasks tells a different story for Kimi.
The debate split between "open-weights parity is real and accelerating" and "task-specific wins don't generalize," with both sides citing their own internal evals and reaching contradictory conclusions.
Practical operators read the short-word scoring penalty as a proxy for instruction-following under structured constraints, citing Muse's -15,309 score as a concrete failure mode to watch for in production deployments.
Notable Comments
@sieve: Reports that Kimi consistently beat Sonnet on a real C+Python compiler/VM project on the OpenCode Go plan, never hitting context limits.
@ponyous: Kimi fails on 3D model code generation evals – “lacks spatial understanding and makes many more code errors before it succeeds.”