Alibaba previews Qwen3.7-Max-Preview and Qwen3.7-Plus-Preview on Arena, placing it as the #6 lab globally in Text and #5 in Vision.
Key Takeaways
Qwen3.7-Max-Preview ranks #13 overall in Text Arena and places in the top 10 of the Math, Expert, Software & IT, and Coding categories.
Qwen3.7-Plus-Preview ranks #16 in Vision Arena; vision is a second track on which Alibaba is actively competing.
No model weights or API access have been announced yet; this is an Arena teaser ahead of a full series release.
Alibaba signals more models in the Qwen3.7 series are coming soon.
Hacker News Comment Review
Commenters running Qwen3.6 27B locally on an RTX 3090 find it the first Qwen model stable enough for real use, though looping and chat-template quirks remain known issues.
There is clear disagreement on the capability ceiling: some builders report strong agentic task completion with tools, while others using the dense 27B variant say the gap vs. Opus or GPT-5.5 on real codebases is still large.
Open-weights continuity is a recurring concern; commenters value Alibaba's release cadence but worry about policy changes once Chinese labs reach parity with proprietary models.
Benchmark reliability came up as its own thread: any public leaderboard degrades quickly once labs benchmaxx it with DPO/RL, leaving filtered, private evals as the only durable signal.
Notable Comments
@Aurornis: runs Qwen3.6 27B dense; says the gap vs. Opus or GPT-5.5 on real codebases "is huge" despite strong performance when run at home.
@vessenes: open benchmarks have “a very short life” due to benchmaxxing; private evals also leak over time, making fair ranking structurally hard.