Qwen3.7-Max: The Agent Frontier

TLDR

  • Alibaba’s Qwen3.7-Max is a proprietary agent-focused model claiming top-tier scores on SWE-Bench, MCP, and reasoning benchmarks, available soon via Alibaba Cloud Model Studio.

Key Takeaways

  • Reportedly beats or matches Claude Opus 4.6 and DeepSeek-V4-Pro on SWE-Bench Verified (80.4), GPQA Diamond (92.4), Terminal Bench 2.0 (69.7), and MCP-Atlas (76.4).
  • In a 35-hour autonomous run with 1,158 tool calls, it achieved a 10x geometric-mean speedup on SGLang’s Extend Attention kernel on previously unseen T-Head ZW-M890 PPU hardware.
  • Cross-harness generalization is a core design goal: training decouples Task, Harness, and Verifier so the model performs consistently across Claude Code, OpenClaw, Qwen Code, and custom scaffolds.
  • Environment scaling (expanding diversity of agentic training environments) drives capability gains that generalize to out-of-domain benchmarks, not just tuned eval sets.
  • No open weights yet; API access on Alibaba Cloud Model Studio listed as “coming soon.”
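
For readers unfamiliar with the "geometric mean speedup" figure above, here is a minimal sketch of how such a number is typically computed. The function name and the timings are hypothetical illustrations, not data from the Qwen3.7-Max run; the source does not say which kernel configurations were measured.

```python
import math

def geometric_mean_speedup(baseline_times, optimized_times):
    """Geometric mean of per-case speedups (baseline time / optimized time)."""
    if not baseline_times or len(baseline_times) != len(optimized_times):
        raise ValueError("need two non-empty lists of equal length")
    ratios = [b / o for b, o in zip(baseline_times, optimized_times)]
    # Average the ratios in log space, then exponentiate.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-configuration kernel timings in milliseconds.
baseline = [8.0, 20.0, 50.0]
optimized = [2.0, 2.0, 2.0]

# Per-case speedups are 4x, 10x, and 25x.
speedup = geometric_mean_speedup(baseline, optimized)
print(f"geometric-mean speedup: {speedup:.2f}x")
```

With these made-up numbers the geometric mean is about 10x, while the arithmetic mean of the same ratios would be 13x. The geometric mean is the usual choice for averaging ratios because a single outlier configuration pulls it up less.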

Hacker News Comment Review

  • Commenters broadly flag cherry-picked comparisons: the benchmarks use Claude Opus 4.6 and older baselines while skipping GPT-4.7, Claude 4.7, and GPT-o3, which commenters say are available and likely stronger.
  • Community interest is high for open-weight releases in the 60B-150B range, particularly a MoE variant around 120B-A14B (roughly 120B total parameters, 14B active) for prosumer hardware.
  • No real-world coding agent usage reports surfaced in discussion, leaving benchmark claims unvalidated by practitioners.

Notable Comments

  • @tarruda: Specifically asks for open-weight 122B and 397B releases from Qwen.
