https://prismml.com/news/ternary-bonsai
Article
- Ternary (1.58-bit) quantized model achieving competitive benchmark scores
- 8B variant runs at 82 tok/s on M4 Pro — ~5x faster than 16-bit 8B
- No multiplications at inference time; runs on simpler hardware
- Accuracy-per-byte beats larger models decisively
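The "no multiplications" claim follows from ternary weights: if every weight is -1, 0, or +1, each term of a matrix-vector product is just +x, -x, or nothing, so inference reduces to additions and subtractions (and "1.58 bits" is log2(3), the information content of a three-valued weight). A minimal sketch, not the model's actual kernel:

```python
import numpy as np

def ternary_matvec(W, x):
    """Matrix-vector product where W has entries in {-1, 0, +1}.

    Each term w * x[j] is +x[j], -x[j], or zero, so the whole
    product uses only additions and subtractions.
    """
    out = np.zeros(W.shape[0], dtype=np.float64)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            w = W[i, j]
            if w == 1:
                out[i] += x[j]
            elif w == -1:
                out[i] -= x[j]
            # w == 0 contributes nothing
    return out

# Sanity check against an ordinary (multiplying) matmul.
rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))   # ternary weights in {-1, 0, +1}
x = rng.standard_normal(8)
assert np.allclose(ternary_matvec(W, x), W @ x)
```

Real implementations pack the trits and vectorize the add/subtract passes, which is why such models can run well on hardware without fast multipliers.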
Discussion
- Independent benchmark showed 8B Bonsai on par with Qwen3.5-4B accuracy-wise
- Critics note benchmarks compare against 16-bit models, not 2/4-bit quants — margin would shrink
- One commenter reports literal/repetitive outputs (e.g. ‘names like Llewelyn’ looping)
- Excitement about running capable models in ~2GB on cheap hardware
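The ~2GB figure is plausible back-of-envelope arithmetic, assuming the weights are packed at roughly 1.6 bits each (five ternary values fit in one byte, since 3^5 = 243 ≤ 256):

```python
# Rough memory estimate for 8B ternary weights.
# Assumption: 5 trits packed per byte -> 8/5 = 1.6 bits per weight.
params = 8e9
bits_per_weight = 8 / 5
weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 1e9:.2f} GB")  # 1.60 GB for weights alone
```

That leaves headroom within ~2GB for embeddings, activations, and KV cache, consistent with the claim.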