ZAYA1-8B Matches DeepSeek-R1 on Math with Less Than 1B Active Parameters

May 7, 2026 · ai ai-agents coding

TLDR

Zyphra’s ZAYA1-8B runs at 760M active parameters (8.4B total MoE) and matches DeepSeek-R1 on AIME 2025 and HMMT math benchmarks.

The MoE architecture activates only 760M of its 8.4B parameters per token (about 9%), keeping per-token inference compute close to that of a sub-1B dense model.
Markovian RSA generates parallel reasoning traces in chunks and carries forward only the tail of each chunk, so the context window stays bounded; reasoning can then scale with compute budget rather than hitting a fixed context ceiling.
Co-design matters: applying Markovian RSA to Qwen3-4B without co-training produced significantly smaller gains, so the method is not plug-and-play.
Agentic benchmarks are weak: BFCL-V4 at 39.22 and TAU2 at 43.12 trail Qwen3-4B-Thinking by ~10 points, so the model is not yet suited to tool-calling or multi-step agent workflows.
Trained end-to-end on a 1,024-node AMD Instinct MI300X cluster using IBM and AMD Pensando Pollara interconnect – the most capable model publicly demonstrated on AMD hardware.
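
To make the bounded-window point concrete, here is a minimal toy sketch in Python. It is not Zyphra's implementation: it assumes the tail of each chunk is what gets carried into the next round (consistent with the "fixed tail-token cutting" a commenter mentions below), and every name in it (chunked_tail_carry, fake_generate, num_traces, chunk_tokens, tail_tokens) is hypothetical. The generator is a stand-in that emits placeholder tokens, so the sketch only shows the context bookkeeping, not any reasoning.

```python
from typing import Callable, List, Tuple

Tokens = List[str]
# Hypothetical model call: (prompt, max_new_tokens, trace_id) -> new tokens.
GenerateFn = Callable[[Tokens, int, int], Tokens]


def chunked_tail_carry(
    question: Tokens,
    generate_chunk: GenerateFn,
    num_traces: int = 4,
    chunk_tokens: int = 8,
    tail_tokens: int = 3,
    num_rounds: int = 5,
) -> Tuple[List[Tokens], int]:
    """Run parallel traces in chunks, carrying only each chunk's tail forward.

    Returns the final tails and the largest context (prompt + chunk) seen.
    """
    if tail_tokens < 1:
        raise ValueError("tail_tokens must be at least 1")

    tails: List[Tokens] = [[] for _ in range(num_traces)]
    peak_context = 0
    for _round in range(num_rounds):
        # Aggregate: every trace is prompted with the question plus all tails.
        carried = [tok for tail in tails for tok in tail]
        prompt = question + carried
        new_tails: List[Tokens] = []
        for trace_id in range(num_traces):
            chunk = generate_chunk(prompt, chunk_tokens, trace_id)[:chunk_tokens]
            peak_context = max(peak_context, len(prompt) + len(chunk))
            # Markov step: everything except the tail of the chunk is dropped.
            new_tails.append(chunk[-tail_tokens:])
        tails = new_tails
    return tails, peak_context


def fake_generate(prompt: Tokens, max_new: int, trace_id: int) -> Tokens:
    """Stand-in for a model: emits placeholder tokens, no real reasoning."""
    return [f"trace{trace_id}-tok{i}" for i in range(max_new)]


question = ["what", "is", "2", "+", "2"]
_, peak_short = chunked_tail_carry(question, fake_generate, num_rounds=5)
_, peak_long = chunked_tail_carry(question, fake_generate, num_rounds=50)
print(peak_short, peak_long)  # both 25: 5 question + 4*3 tail + 8 chunk tokens
```

In this toy setup the peak context is capped at len(question) + num_traces * tail_tokens + chunk_tokens no matter how many rounds run, which is the sense in which more compute adds rounds rather than a longer window. Making tail_tokens an argument also shows where a tunable cutoff would plug in.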

Commenters agree the agentic gap is the real blocker for coding harness adoption; most production coding agents rely on tool calls for context gathering, where ZAYA1-8B currently underperforms.
Commenters also report practical friction for local deployment: stock vLLM fails to run the model and Zyphra’s own fork is required, which is worth verifying before integrating it into an existing inference stack.
Broader sentiment leans toward small-model optimism, with Qwen3 27B already cited as a working single-GPU agentic coding option, framing ZAYA1-8B as part of a real trend rather than an outlier.

@sva_: Notes that Markovian RSA could be improved by replacing the fixed tail-token cutoff with a tunable parameter, suggesting the inference method is not yet fully optimized.