DeepSeek V4 — almost on the frontier, a fraction of the price
TLDR
- DeepSeek released two preview MoE models, V4-Pro (1.6T params, 49B active) and V4-Flash (284B, 13B active), both MIT-licensed and priced far below comparable frontier models.
Key Takeaways
- V4-Flash costs $0.14/M input and $0.28/M output, undercutting GPT-5.4 Nano ($0.20/$1.25) and every other small frontier model.
- V4-Pro costs $1.74/M input and $3.48/M output, cheaper than Gemini 3.1 Pro ($2/$12), GPT-5.4 ($2.50/$15), Claude Sonnet 4.6 ($3/$15), and Claude Opus 4.7 ($5/$25).
- Efficiency gains explain the pricing: at 1M-token context, V4-Pro uses only 27% of the single-token FLOPs and 10% of the KV cache of V3.2; V4-Flash drops to 10% of the FLOPs and 7% of the KV cache.
- DeepSeek’s own numbers put V4-Pro-Max roughly where GPT-5.4 and Gemini-3.1-Pro stood 3-6 months ago on standard reasoning benchmarks.
- V4-Pro’s 1.6T parameters (865GB on disk) make it the largest open-weights model released to date, ahead of Kimi K2.6 (1.1T) and GLM-5.1 (754B); V4-Flash, at 160GB, may run quantized on a 128GB M5 MacBook Pro.
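To make the price gap concrete, here is a quick sketch comparing per-request cost at the listed rates. The workload size (50k input tokens, 2k output tokens) is illustrative, not from the original post:

```python
# Per-million-token prices quoted above: (input $/M, output $/M).
PRICES = {
    "DeepSeek V4-Flash": (0.14, 0.28),
    "DeepSeek V4-Pro": (1.74, 3.48),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of one request at the listed per-million-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Hypothetical request: 50k tokens in, 2k tokens out.
for model in PRICES:
    print(f"{model:20s} ${request_cost(model, 50_000, 2_000):.4f}")
```

At this workload, V4-Flash comes in around $0.0076 per request versus roughly $0.20 for Claude Opus 4.7, a factor of about 25x.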
Why It Matters
- The V4 pricing resets the cost baseline for API-driven products: near-frontier output is now available at sub-$2/M input rates from an open-weights model.
- Both models use a 1M-token context window with MoE architecture, making long-context workloads significantly cheaper to run than with any comparable Western model.
- MIT licensing means the weights can be fine-tuned, self-hosted, or quantized; community quantizations via Unsloth are expected shortly.
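A back-of-the-envelope check on the 128GB claim. This is a weight-only estimate (params × bits / 8); real quantized builds add overhead for the KV cache and activations, so take it as a rough lower bound:

```python
def quantized_weight_gb(params_billion, bits_per_weight):
    """Weight-only memory footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# V4-Flash: 284B total parameters (13B active per token).
for bits in (8, 4, 3):
    print(f"{bits}-bit: ~{quantized_weight_gb(284, bits):.0f} GB")
```

A straight 4-bit quant lands around 142GB of weights alone, so fitting V4-Flash in 128GB of unified memory likely requires roughly 3-bit or mixed-precision quantization of the kind Unsloth typically ships.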
Simon Willison, Simon Willison’s Weblog · 2026-04-24