Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

· hn top · Source ↗

TLDR

  • Qwen3.6-27B runs locally on 32GB RAM at ~25 tok/s and handles 95% of real coding tasks for many developers.

Key Takeaways

  • Q4_K_M quantization fits in ~16.8GB; a 32GB machine is sufficient, and even 24GB of VRAM still yields a ~91K-token context window.
  • Simon Willison benchmarked it on M5 Pro: 54 tok/s prefill, 25 tok/s generation – faster than expected for the size.
  • Apple Silicon users report qwen3.6:35b-a3b-nvfp4 via ollama runs well on 32GB M-series machines.
  • Narrows the gap to Claude Opus 4.7 meaningfully for routine coding; Opus still leads on complex multi-step reasoning.
  • Open-source model pricing is a fraction of Anthropic/OpenAI API costs, intensifying the competitive squeeze on frontier labs.

Hacker News Comment Review

  • Consensus: Easter 2026 (Gemma 4 + Qwen3.6) marks a step-change in local model viability – the gap to hosted frontier models is narrowing fast but not closed.
  • Practical ceiling: commenters who run these daily still keep Opus as a fallback for tasks requiring sustained reliability; local models still “wander off.”
  • Hardware transparency is a recurring complaint – the community wants model announcements to lead with consumer GPU/RAM requirements and tok/s numbers, not just benchmarks.
  • KV-cache math is being done in the comments: ~70MB of KV cache per 1K tokens of context at Q4_K_M on a 24GB card; Q5 trades context headroom for output quality.
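
The KV-cache budgeting the commenters are doing can be sanity-checked with a short calculation. This is a sketch using the figures quoted in the thread (~16.8GB for Q4_K_M weights, ~70MB of KV cache per 1K tokens); the function name and the assumption that all remaining VRAM goes to KV cache are illustrative, not from the source:

```python
def max_context_tokens(vram_gb: float,
                       weights_gb: float = 16.8,
                       kv_mb_per_1k: float = 70.0) -> int:
    """Largest context (in tokens) that fits after loading the weights."""
    headroom_gb = vram_gb - weights_gb
    if headroom_gb <= 0:
        return 0
    # Convert headroom to MB, divide by the per-1K-token KV cost,
    # then scale back up to tokens.
    return int(headroom_gb * 1024 / kv_mb_per_1k * 1000)

# On a 24GB card this naive math gives ~105K tokens; runtime overhead
# (activation buffers, scratch space) accounts for the gap down to the
# ~91K figure reported in the comments.
print(max_context_tokens(24))
```

Conversely, a 91K context costs 91 × 70MB ≈ 6.2GB of KV cache, which together with the 16.8GB of weights totals ~23GB, leaving roughly 1GB of headroom on a 24GB card.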

Notable Comments

  • @simonw: Ran it on M5 Pro 128GB; benchmarked 25.57 tok/s generation on a 4,444-token output – “I like it better than the pelican I got from Opus 4.7.”
  • @syntaxing: Running Qwen3.6 35B and Gemma 4 26B fully local on M4 MBP – “it does 95% of what I need which is already crazy.”
  • @zkmon: Q4_K_M on llama.cpp yields ~91K context on 24GB VRAM; ~70MB per 1K context for KV-cache budgeting.
  • @jameson: Raises the moat question directly – open-source models at a fraction of Opus 4.6 pricing erode OpenAI/Anthropic’s margin buffer.
