Alibaba releases Qwen3.7-Max, a proprietary agent-focused model available through Alibaba Cloud Model Studio, and claims top-tier scores on SWE-Bench, MCP, and reasoning benchmarks.
Key Takeaways
SWE-Verified 80.4 is within half a point of Claude Opus 4.6 (80.8) and DS-V4-Pro Max (80.6); SWE-Pro 60.6 leads all listed competitors.
A 35-hour fully autonomous kernel optimization run on T-Head ZW-M890 PPUs (unseen hardware) achieved a 10x geometric mean speedup over the SGLang Triton reference, using 1,158 tool calls.
Cross-harness generalization via decoupled Task/Harness/Verifier rollout infrastructure: scores stay consistent across Claude Code, OpenClaw, Qwen Code, and custom frameworks.
MCP-Atlas 76.4 and MCP-Mark 60.8 lead all listed models; SpreadSheetBench-v1 87.0 tops the office automation category.
Extends the environment scaling approach from Qwen3.5: Alibaba attributes the gains to diverse agentic training environments that build generalizable capability rather than to benchmark overfitting, citing out-of-domain eval results.
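For readers unfamiliar with the "geometric mean speedup" figure quoted for the kernel optimization run, here is a minimal sketch of how such a number is typically computed. The function name and the timings are hypothetical and chosen only for illustration; they are not Alibaba's data or evaluation code.

```python
import math

def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-kernel speedups (baseline time / optimized time).

    Hypothetical helper for illustration; not taken from the Qwen release.
    """
    if not baseline_ms or len(baseline_ms) != len(optimized_ms):
        raise ValueError("need two non-empty lists of equal length")
    # Average the logs of the ratios, then exponentiate. This is equivalent
    # to taking the n-th root of the product of the speedups.
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_ms, optimized_ms))
    return math.exp(log_sum / len(baseline_ms))

# Made-up timings for three kernels: per-kernel speedups of 4x, 10x, and 20x.
baseline = [8.0, 20.0, 50.0]
optimized = [2.0, 2.0, 2.5]

gm = geometric_mean_speedup(baseline, optimized)
am = sum(b / o for b, o in zip(baseline, optimized)) / len(baseline)
print(f"geometric mean speedup: {gm:.2f}x, arithmetic mean: {am:.2f}x")
```

The geometric mean is the usual choice for averaging ratios because a single very large speedup pulls it up less than it would an arithmetic mean; in this made-up example the geometric mean (about 9.3x) sits below the arithmetic mean (about 11.3x).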
Hacker News Comment Review
Access friction is the top practical concern: the model is available only through Alibaba Cloud Model Studio, there is no US hyperscaler partnership, and commenters report that proxy services such as OpenRouter are throttled for comparable models such as DeepSeek V4.
The community is uncertain whether Qwen3.7-Max will get an open-weights release; the prior pattern suggests Max-tier models stay proprietary, unlike Plus and smaller variants.
Builders already running Qwen3.6 locally via llama.cpp and OpenCode as a Claude Code fallback report it handles smaller tasks well, suggesting strong interest in open-weight successors.
Notable Comments
@eddyaipt: “Agents usually fail from silent state drift faster than from lack of reasoning depth” – practical framing for long-horizon agent reliability.