Ternary Bonsai: Top Intelligence at 1.58 Bits

https://prismml.com/news/ternary-bonsai

Article

  • Ternary (1.58-bit) quantized model achieving competitive benchmark scores
  • 8B variant runs at 82 tok/s on M4 Pro — ~5x faster than 16-bit 8B
  • No multiplications at inference time; runs on simpler hardware
  • Accuracy per byte of weights beats that of larger models decisively
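The two headline claims above can be sketched concretely. With ternary weights in {-1, 0, +1}, each weight carries log2(3) ≈ 1.58 bits of information (hence "1.58-bit"), and a matrix-vector product needs no multiplications: each term is an add, a subtract, or a skip. A minimal illustration (not the Bonsai implementation; the function name and shapes are hypothetical):

```python
import numpy as np

def ternary_matvec(W, x):
    """Compute y = W @ x using only additions and subtractions.

    W is assumed to contain only {-1, 0, +1}: entries of +1 add the
    corresponding x value, -1 subtracts it, 0 is skipped entirely.
    """
    y = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        row = W[i]
        y[i] = x[row == 1].sum() - x[row == -1].sum()
    return y

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8)).astype(np.int64)   # ternary weights
x = rng.integers(-10, 10, size=8).astype(np.int64)

# Matches an ordinary matmul, with no multiplies performed.
assert np.array_equal(ternary_matvec(W, x), W @ x)

# Bit budget: 8e9 ternary weights at ~1.585 bits each is roughly
# 8e9 * 1.585 / 8 bytes ≈ 1.6 GB, consistent with the "~2GB" figure
# once activations and metadata overhead are included.
bits_per_weight = np.log2(3)
print(round(float(bits_per_weight), 3))
```

In practice implementations pack several ternary weights per byte and vectorize the add/subtract passes, but the arithmetic identity above is what makes multiplication-free, simple-hardware inference possible.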

Discussion

  • An independent benchmark showed the 8B Bonsai roughly matching Qwen3.5-4B in accuracy
  • Critics note benchmarks compare against 16-bit models, not 2/4-bit quants — margin would shrink
  • One commenter reports literal/repetitive outputs (e.g. ‘names like Llewelyn’ looping)
  • Excitement about running capable models in ~2GB on cheap hardware


Type Link
Added Apr 21, 2026
Modified Apr 21, 2026