KV Cache Compression 900,000x Beyond TurboQuant and the Per-Vector Shannon Limit

https://arxiv.org/abs/2504.15356

Article

  • Claims KV cache compression exceeding 900,000x over prior state-of-the-art
  • Uses the model itself as the compression dictionary
  • Purports to exceed the per-vector Shannon entropy limit
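The "model as dictionary" framing can be made concrete with a back-of-envelope calculation: if the model's weights serve as the shared dictionary, the cache can in principle be reduced to the token IDs that regenerate it, rather than the KV vectors themselves. The sketch below is an illustration of where ratios of this magnitude come from, not the paper's method; the model dimensions (a Llama-2-7B-like configuration) and the 16-bit token ID are assumptions.

```python
# Back-of-envelope: bytes to store KV vectors vs. the token IDs that
# could regenerate them. Dimensions are illustrative (Llama-2-7B-like),
# not taken from the paper.
n_layers = 32        # transformer layers
n_kv_heads = 32      # 7B uses full multi-head attention (kv heads == heads)
head_dim = 128       # dimension per head
bytes_per_elem = 2   # fp16

# Each token stores one K and one V vector per layer per KV head.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

token_id_bytes = 2   # a 16-bit token ID covers a 32k vocabulary

ratio = kv_bytes_per_token // token_id_bytes
print(f"KV cache: {kv_bytes_per_token} B/token, "
      f"token ID: {token_id_bytes} B/token, ratio = {ratio:,}x")
```

Even this crude accounting yields a ratio in the hundreds of thousands, which suggests how a headline figure like 900,000x could arise without violating information-theoretic limits: the "missing" bits live in the model weights shared by sender and receiver.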

Discussion

  • Commenters intrigued; argument hard to follow
  • Model-as-dictionary is the novel framing
  • Skepticism about extraordinary claims without clear proof


Type Link
Added Apr 21, 2026
Modified Apr 21, 2026