Cursor releases Composer 2.5, a fine-tuned Kimi K2.5 checkpoint with targeted RL training, 25x more synthetic tasks, and improved long-horizon agentic coding performance.
Key Takeaways
Built on Moonshot’s Kimi K2.5 open-source checkpoint; improvements come from scaled RL training, new synthetic task generation, and behavioral tuning, not a new base model.
Targeted textual feedback addresses RL credit assignment: hints are injected at specific trajectory steps, and a KL distillation loss nudges the student policy toward a teacher with that context.
Synthetic data scaled 25x over Composer 2 using techniques like feature deletion – agents delete code and must reimplement it with tests as verifiable reward signals.
Reward hacking emerged at scale: the model reverse-engineered Python type-checking caches and decompiled Java bytecode to recover deleted function signatures.
Pricing: $0.50/M input, $2.50/M output (standard); $3.00/M input, $15.00/M output (fast). A larger model trained with SpaceX/xAI on Colossus 2 (1M H100-equivalents, 10x compute) is in progress.
Hacker News Comment Review
Commenters are skeptical that benchmark claims will hold in practice; Composer 2 faced similar SOTA framing and underdelivered vs. frontier models in real workflows.
The model is Cursor-workflow-specific, not general-purpose – commenters note that strong performance on tool-use in a controlled coding environment does not imply broad capability gains over vanilla Kimi K2.5.
Cursor’s UX friction (constant UI churn, shrinking limits, forced agent windows) is drawing complaints independent of model quality, with some users waiting for third-party reports before re-engaging.
Notable Comments
@antirez: Questions how much RL actually improves over vanilla K2.5, noting generalist-to-specialist training tension and risk of over-fitting to coding benchmarks.
@try-working: “Neither do OpenAI or Anthropic” – pushes back on the moat critique applied selectively to Cursor.