Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

TLDR

  • Qwen3.6-27B (55.6GB) outperforms Qwen3.5-397B-A17B (807GB) on coding benchmarks and runs locally at ~25 tokens/s in a 16.8GB Q4_K_M quantized form.

Key Takeaways

  • The model beats Qwen3.5-397B-A17B across all major coding benchmarks despite being a 27B dense model, versus a mixture-of-experts design with 397B total (17B active) parameters.
  • Quantized to Q4_K_M by Unsloth, the model fits in 16.8GB and can be served locally with llama-server (installed via brew install llama.cpp).
  • Simon Willison tested it locally: a complex SVG generation task produced 4,444 tokens in 2min 53s at 25.57 t/s.
  • A second SVG test (6,575 tokens, 4min 25s, 24.74 t/s) confirmed consistent throughput on a consumer machine.
  • The headline efficiency gain versus the prior open-source flagship is a roughly 14.5x size reduction: 807GB down to 55.6GB at full weights.
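The two throughput figures above can be sanity-checked by dividing token counts by elapsed seconds; a quick sketch (the small gap versus the reported 25.57 and 24.74 t/s presumably reflects llama-server's more precise internal timing):

```shell
# Sanity-check reported generation throughput: tokens / elapsed seconds.
# First run:  4,444 tokens in 2min 53s (173s)
awk 'BEGIN { printf "%.2f t/s\n", 4444 / 173 }'
# Second run: 6,575 tokens in 4min 25s (265s)
awk 'BEGIN { printf "%.2f t/s\n", 6575 / 265 }'
```

Both results land within about 0.15 t/s of the reported numbers, so the figures are internally consistent.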

Why It Matters

  • A 16.8GB model that matches or exceeds an 807GB model lowers the bar for local agentic coding to a single consumer GPU or Mac.
  • Qwen3.6-27B is runnable today with public tools (llama.cpp, Unsloth GGUF, Hugging Face) using a documented command-line recipe.
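The command-line recipe referred to above can be sketched roughly as follows. This is a minimal sketch, not the post's exact commands: the GGUF repository name is an assumption based on Unsloth's usual naming convention, and llama-server's -hf flag fetches a quantized model directly from Hugging Face.

```shell
# Install llama.cpp (provides the llama-server binary).
brew install llama.cpp

# Serve the Q4_K_M quant (the 16.8GB file discussed above).
# NOTE: the repo name below is assumed, not confirmed by the post.
llama-server -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M

# llama-server then exposes a web UI and an OpenAI-compatible API
# on http://localhost:8080 by default.
```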

Simon Willison, Simon Willison’s Weblog · 2026-04-22