ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math


TLDR

  • Zyphra’s ZAYA1-8B activates only 760M parameters per token, matches DeepSeek-R1 on math benchmarks, and was trained entirely on AMD MI300X hardware.

Key Takeaways

  • The 8.4B-total / 760M-active MoE architecture beats Mistral Small 4 (119B total) on AIME 2026 (89.1 vs 86.4) and LiveCodeBench (65.8 vs 57.9).
  • Markovian RSA is a co-designed inference method: it reasons in bounded chunks, seeds each new round from the tail of the previous trace, and scales with the compute budget without blowing up the context window (see the sketch after this list).
  • Co-training is load-bearing: applying Markovian RSA to Qwen3-4B without co-training produced significantly smaller gains.
  • Agentic benchmarks lag badly: BFCL-V4 at 39.22 vs Qwen3-4B-Thinking’s 49.7; TAU2 at 43.12 vs 52.9. Not a drop-in for tool-calling workflows.
  • Local deployment requires Zyphra’s vLLM fork, not stock vLLM; a minimal loading sketch follows below. Weights are Apache 2.0 on Hugging Face, with a serverless option on Zyphra Cloud.
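
The chunked-reasoning loop described above can be sketched in a few lines. This is not Zyphra's implementation; `generate` stands in for any completion call, and the chunk size, tail length, and "FINAL:" stop marker are illustrative assumptions.

    def markovian_reasoning(problem, generate, chunk_tokens=1024,
                            tail_chars=2000, max_rounds=8):
        """Reason in bounded chunks; each round sees only the previous trace's tail."""
        seed = ""
        for _ in range(max_rounds):
            prompt = (
                f"Problem: {problem}\n"
                f"Partial reasoning so far (may be empty):\n{seed}\n"
                "Continue reasoning. If you reach the answer, end with "
                "'FINAL:' followed by the answer."
            )
            chunk = generate(prompt, max_tokens=chunk_tokens)  # one bounded round
            if "FINAL:" in chunk:
                return chunk.split("FINAL:", 1)[1].strip()
            # Carry forward only the tail of the trace, so the prompt stays bounded
            # no matter how many rounds the compute budget allows.
            seed = chunk[-tail_chars:]
        return seed  # budget exhausted without an explicit final answer

More rounds buy more reasoning, but the prompt never grows past one chunk plus one tail, which is the "no context window blowup" property.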

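For local inference, the call pattern looks like standard vLLM usage once Zyphra's fork is installed in place of stock vLLM. The repo ID below is a placeholder, not a confirmed Hugging Face path; check the release page for the exact name.

    from vllm import LLM, SamplingParams

    # Requires Zyphra's vLLM fork; stock vLLM will not load the architecture.
    # "Zyphra/ZAYA1-8B" is a placeholder repo ID -- confirm the exact name on Hugging Face.
    llm = LLM(model="Zyphra/ZAYA1-8B", trust_remote_code=True)
    params = SamplingParams(temperature=0.6, max_tokens=4096)

    outputs = llm.generate(["Solve: what is the remainder of 2^100 divided by 7?"], params)
    print(outputs[0].outputs[0].text)
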
Hacker News Comment Review

  • Commenters agree the agentic gap is the real blocker for replacing Claude or OpenAI in coding harnesses, since most coding agents rely on tool calls for context gathering before writing solutions.
  • Commenters are optimistic the trajectory matters: Qwen3 27B is already enabling agentic coding on a single GPU, suggesting local-first replacement of frontier models is a near-term possibility, not a distant one.
