KV Cache Compression 900,000x Beyond TurboQuant and the Per-Vector Shannon Limit

https://arxiv.org/abs/2504.15356

Article

  • Claims KV cache compression exceeding 900,000x over prior state-of-the-art
  • Uses the model itself as the compression dictionary
  • Purports to exceed the per-vector Shannon entropy limit
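The "model as dictionary" framing can be made concrete with a back-of-envelope calculation: if the model's weights serve as the shared dictionary, the cache can in principle be reduced to the token IDs that regenerate it, rather than the KV vectors themselves. The sketch below is an illustration of where ratios of this magnitude come from, not the paper's method; the model dimensions (a Llama-2-7B-like configuration) and the 16-bit token ID are assumptions.

```python
# Back-of-envelope: bytes to store KV vectors vs. the token IDs that
# could regenerate them. Dimensions are illustrative (Llama-2-7B-like),
# not taken from the paper.
n_layers = 32        # transformer layers
n_kv_heads = 32      # 7B uses full multi-head attention (kv heads == heads)
head_dim = 128       # dimension per head
bytes_per_elem = 2   # fp16

# Each token stores one K and one V vector per layer per KV head.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

token_id_bytes = 2   # a 16-bit token ID covers a 32k vocabulary

ratio = kv_bytes_per_token // token_id_bytes
print(f"KV cache: {kv_bytes_per_token} B/token, "
      f"token ID: {token_id_bytes} B/token, ratio = {ratio:,}x")
```

Even this crude accounting yields a ratio in the hundreds of thousands, which suggests how a headline figure like 900,000x could arise without violating information-theoretic limits: the "missing" bits live in the model weights shared by sender and receiver.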

Discussion

  • Commenters intrigued; argument hard to follow
  • Model-as-dictionary is the novel framing
  • Skepticism about extraordinary claims without clear proof


Type Link
Added Apr 21, 2026
Modified Apr 21, 2026