Qwen3.7-Max: The Agent Frontier

TLDR

  • Alibaba releases Qwen3.7-Max, a proprietary, agent-focused model available through Alibaba Cloud Model Studio, claiming top-tier scores on SWE-Bench, MCP, and reasoning benchmarks.

Key Takeaways

  • SWE-Verified 80.4 is within half a point of Claude Opus 4.6 (80.8) and DS-V4-Pro Max (80.6); SWE-Pro 60.6 leads all listed competitors.
  • A 35-hour, fully autonomous kernel-optimization run on T-Head ZW-M890 PPUs (unseen hardware) achieved a 10x geometric-mean speedup over the SGLang Triton reference across 1,158 tool calls.
  • Cross-harness generalization: rollout infrastructure that decouples Task, Harness, and Verifier yields consistent scores across Claude Code, OpenClaw, Qwen Code, and custom frameworks.
  • MCP-Atlas 76.4 and MCP-Mark 60.8 lead all listed models; SpreadSheetBench-v1 87.0 tops the office automation category.
  • Extends the environment-scaling approach from Qwen3.5: according to out-of-domain evaluation results, diverse agentic training environments drive generalizable capability gains rather than benchmark overfitting.
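A "10x geometric-mean speedup" summarizes many per-kernel speedups in log space, so one outsized win cannot dominate the headline number the way it would with an arithmetic mean. The sketch below shows only that arithmetic. The function name and the timings are hypothetical illustrations, not data from the Qwen3.7-Max run.

```python
import math


def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-kernel speedups (baseline time / optimized time).

    Hypothetical helper for illustration; not part of any Qwen or SGLang tooling.
    """
    if len(baseline_ms) != len(optimized_ms) or not baseline_ms:
        raise ValueError("need equal-length, non-empty timing lists")
    ratios = [b / o for b, o in zip(baseline_ms, optimized_ms)]
    # Average the logs, then exponentiate: equivalent to the n-th root of the product.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical timings (ms) for three kernels: speedups of 2x, 8x, and 4x.
baseline = [4.0, 16.0, 2.0]
optimized = [2.0, 2.0, 0.5]
print(round(geometric_mean_speedup(baseline, optimized), 3))  # 4.0
```

Here the arithmetic mean of the ratios would be about 4.67x, pulled up by the single 8x kernel, while the geometric mean reports 4.0x.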
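The cross-harness claim rests on keeping three concerns separate: what to solve (Task), how the agent interacts (Harness), and how the outcome is scored (Verifier). The toy sketch below assumes that split means a verifier sees only the final environment state, so the same tasks and verifier can be reused unchanged across harnesses. Every class and function name here (`Task`, `OneShotHarness`, `StepwiseHarness`, `result_matches`, `rollout`) is hypothetical; this is not Alibaba's rollout infrastructure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


@dataclass
class Task:
    """Hypothetical task spec: an instruction plus initial environment state."""
    task_id: str
    instruction: str
    state: Dict[str, object] = field(default_factory=dict)


class Harness(Protocol):
    """Any agent loop that turns a task into a final environment state."""
    name: str

    def run(self, task: Task) -> Dict[str, object]: ...


# A verifier scores only the final state, so it never depends on the harness.
Verifier = Callable[[Task, Dict[str, object]], float]


class OneShotHarness:
    """Toy harness: solves the task with a single simulated tool call."""
    name = "one-shot"

    def run(self, task: Task) -> Dict[str, object]:
        final = dict(task.state)
        final["result"] = sum(task.state["numbers"])
        final["tool_calls"] = 1
        return final


class StepwiseHarness:
    """Toy harness: reaches the same goal with one simulated tool call per item."""
    name = "stepwise"

    def run(self, task: Task) -> Dict[str, object]:
        final = dict(task.state)
        total, calls = 0, 0
        for n in task.state["numbers"]:
            total += n
            calls += 1
        final["result"] = total
        final["tool_calls"] = calls
        return final


def result_matches(task: Task, final: Dict[str, object]) -> float:
    """Binary reward: 1.0 if the final state holds the expected result."""
    return 1.0 if final.get("result") == task.state["expected"] else 0.0


def rollout(tasks: List[Task], harness: Harness, verifier: Verifier) -> float:
    """Run every task through one harness and return the mean verifier score."""
    scores = [verifier(t, harness.run(t)) for t in tasks]
    return sum(scores) / len(scores)


tasks = [
    Task("t1", "sum the numbers", {"numbers": [1, 2, 3], "expected": 6}),
    Task("t2", "sum the numbers", {"numbers": [10, -4], "expected": 6}),
]
for h in (OneShotHarness(), StepwiseHarness()):
    print(h.name, rollout(tasks, h, result_matches))
```

Because `rollout` takes the harness as a parameter, swapping in a different agent loop requires no change to the tasks or the verifier, which is the property a cross-harness evaluation needs.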

Hacker News Comment Review

  • Access friction is the top practical concern: the model is available only on Alibaba Cloud Model Studio, there is no US hyperscaler partnership, and proxy services such as OpenRouter are already throttled for comparable models like DeepSeek V4.
  • The community is uncertain whether Qwen3.7-Max will get an open-weights release; the prior pattern suggests Max-tier models stay proprietary, unlike Plus and smaller variants.
  • Builders already running Qwen3.6 locally (via llama.cpp and OpenCode) as a Claude Code fallback report that it handles smaller tasks well, suggesting strong interest in open-weight successors.

Notable Comments

  • @eddyaipt: “Agents usually fail from silent state drift faster than from lack of reasoning depth” – practical framing for long-horizon agent reliability.