Qwen 3.7 Preview


TLDR

  • Alibaba previews Qwen3.7-Max-Preview and Qwen3.7-Plus-Preview on Arena, where it ranks as the #6 lab in Text and the #5 lab in Vision globally.

Key Takeaways

  • Qwen3.7-Max-Preview ranks #13 overall in Text Arena, with top-10 slots in Math, Expert, Software & IT, and Coding categories.
  • Qwen3.7-Plus-Preview ranks #16 in Vision Arena; vision capability is a second track Alibaba is actively competing on.
  • No model weights or API access announced yet; this is an Arena teaser ahead of a full series release.
  • Alibaba signals more models in the Qwen3.7 series are coming soon.

Hacker News Comment Review

  • Commenters running Qwen3.6 27B locally on a 3090 find it the first Qwen model stable enough for real use, though looping and chat-template quirks remain known issues.
  • There is clear disagreement on capability ceiling: some builders report strong agentic task completion with tools; others using the dense 27B variant say the gap vs. Opus or GPT-5.5 on real codebases is still large.
  • Open-weights continuity is a recurring concern; commenters value Alibaba’s release cadence but worry about policy changes once Chinese labs reach proprietary parity.
  • Benchmark reliability surfaced as a thread: commenters argue that any public leaderboard degrades fast once models are benchmaxxed against it with DPO/RL, leaving filtered, private evals as the only durable signal.

Notable Comments

  • @Aurornis: runs Qwen3.6 27B dense; says gap vs. Opus or GPT-5.5 on real codebases “is huge” despite strong performance when run locally at home.
  • @vessenes: open benchmarks have “a very short life” due to benchmaxxing; private evals also leak over time, making fair ranking structurally hard.
