https://prismml.com/news/ternary-bonsai
Article
- Ternary (1.58-bit) quantized model achieving competitive benchmark scores
- 8B variant runs at 82 tok/s on M4 Pro — ~5x faster than 16-bit 8B
- No multiplications at inference time; runs on simpler hardware
- Accuracy-per-byte beats larger models decisively
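The "no multiplications" claim follows from ternary weights: if every weight is -1, 0, or +1, each term of a matrix-vector product is just +x, -x, or nothing, so inference reduces to additions and subtractions (and "1.58 bits" is log2(3), the information content of a three-valued weight). A minimal sketch, not the model's actual kernel:

```python
import numpy as np

def ternary_matvec(W, x):
    """Matrix-vector product where W has entries in {-1, 0, +1}.

    Each term w * x[j] is +x[j], -x[j], or zero, so the whole
    product uses only additions and subtractions.
    """
    out = np.zeros(W.shape[0], dtype=np.float64)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            w = W[i, j]
            if w == 1:
                out[i] += x[j]
            elif w == -1:
                out[i] -= x[j]
            # w == 0 contributes nothing
    return out

# Sanity check against an ordinary (multiplying) matmul.
rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))   # ternary weights in {-1, 0, +1}
x = rng.standard_normal(8)
assert np.allclose(ternary_matvec(W, x), W @ x)
```

Real implementations pack the trits and vectorize the add/subtract passes, which is why such models can run well on hardware without fast multipliers.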
Discussion
- Independent benchmark showed 8B Bonsai on par with Qwen3.5-4B accuracy-wise
- Critics note benchmarks compare against 16-bit models, not 2/4-bit quants — margin would shrink
- One commenter reports literal/repetitive outputs (e.g. ‘names like Llewelyn’ looping)
- Excitement about running capable models in ~2GB on cheap hardware
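The ~2GB figure is plausible back-of-envelope arithmetic, assuming the weights are packed at roughly 1.6 bits each (five ternary values fit in one byte, since 3^5 = 243 ≤ 256):

```python
# Rough memory estimate for 8B ternary weights.
# Assumption: 5 trits packed per byte -> 8/5 = 1.6 bits per weight.
params = 8e9
bits_per_weight = 8 / 5
weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 1e9:.2f} GB")  # 1.60 GB for weights alone
```

That leaves headroom within ~2GB for embeddings, activations, and KV cache, consistent with the claim.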