Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model
TLDR
- Qwen3.6-27B (55.6GB) outperforms Qwen3.5-397B-A17B (807GB) on coding benchmarks and runs locally at ~25 tokens/s in a 16.8GB Q4_K_M quantized form.
Key Takeaways
- The model beats Qwen3.5-397B-A17B across all major coding benchmarks despite being a dense 27B vs. a 397B-total MoE architecture.
- Quantized to Q4_K_M via Unsloth, the model fits in 16.8GB and runs with llama-server (installable via brew install llama.cpp).
- Simon Willison tested it locally: a complex SVG generation task produced 4,444 tokens in 2min 53s at 25.57 t/s.
- A second SVG test (6,575 tokens, 4min 25s, 24.74 t/s) confirmed consistent throughput on a consumer machine.
- The ~14.5x size reduction (807GB down to 55.6GB for the full-precision model) is the headline efficiency gain versus the prior open-source flagship.
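The size arithmetic behind those takeaways is easy to verify from the figures the post reports:

```python
# Model sizes as reported in the post, in GB
full_prev = 807.0   # Qwen3.5-397B-A17B, full precision
full_new = 55.6     # Qwen3.6-27B, full precision
quant_new = 16.8    # Qwen3.6-27B, Q4_K_M quantization

# Full-model reduction versus the prior flagship
print(f"{full_prev / full_new:.1f}x")   # -> 14.5x

# Reduction from the prior flagship down to the quantized local build
print(f"{full_prev / quant_new:.1f}x")  # -> 48.0x
```

So the full model is roughly 14.5x smaller than the prior flagship, and the quantized build that actually runs locally is about 48x smaller.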
Why It Matters
- A 16.8GB model that matches or exceeds an 807GB model lowers the bar for local agentic coding to a single consumer GPU or Mac.
- Qwen3.6-27B is runnable today with public tools (llama.cpp, Unsloth GGUF, Hugging Face) using a documented command-line recipe.
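A minimal sketch of that command-line recipe, using llama.cpp's built-in Hugging Face download support. The GGUF repo name and quant tag below follow Unsloth's usual naming pattern but are assumptions, not confirmed by the post:

```shell
# Install llama.cpp (provides the llama-server binary) via Homebrew
brew install llama.cpp

# Fetch the Q4_K_M GGUF from Hugging Face and serve an OpenAI-compatible
# API on port 8080. The repo/tag is an assumed Unsloth naming pattern.
llama-server \
  -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
  --port 8080
```

Once the server is up, any OpenAI-compatible client pointed at http://localhost:8080 can drive the model.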
Simon Willison, Simon Willison’s Weblog · 2026-04-22