https://jchandra.com/posts/hae-ols/
Article
- Proposes HAE-OLS, which combines entropy-guided attention, ordinary least squares (OLS), and singular value decomposition (SVD) for KV cache compression
- Targets high-fidelity reconstruction of dropped KV cache entries in LLMs
- Uses low-rank SVD reconstruction to recover information that standard Top-K eviction discards
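The summary does not say how the three pieces fit together. The following sketch is a hypothetical illustration of the general idea and is not HAE-OLS itself. It computes per-query attention entropy as a diagnostic (diffuse heads lose more attention mass under eviction) and evicts tokens by Top-K accumulated attention. It then rebuilds the dropped keys and values from a rank-r SVD basis of the retained rows, fitting each dropped row's r coefficients by ordinary least squares. The function names (`softmax`, `attention_entropy`, `topk_keep`, `lowrank_reconstruct`, `attend`, `rel_err`), the synthetic near-low-rank cache, and the sizes `n`, `d`, `r`, `k` are all assumptions. Both strategies are scored by error in the attention output rather than in the raw cache.

```python
import numpy as np

# Hypothetical illustration only: NOT the HAE-OLS algorithm from the article.
# Assumes one attention head and a synthetic cache that lies close to rank r.


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention_entropy(weights: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) per query row: low = peaky, high = diffuse."""
    return -(weights * np.log(weights + 1e-12)).sum(axis=-1)


def topk_keep(weights: np.ndarray, k: int) -> np.ndarray:
    """Sorted indices of the k cached tokens with the largest accumulated attention."""
    scores = weights.sum(axis=0)  # attention each token received, summed over queries
    return np.sort(np.argsort(scores)[-k:])


def lowrank_reconstruct(kept: np.ndarray, dropped: np.ndarray, rank: int) -> np.ndarray:
    """Rebuild dropped rows from a rank-`rank` SVD basis of the kept rows (assumed scheme)."""
    _, _, vt = np.linalg.svd(kept, full_matrices=False)
    basis = vt[:rank].T  # (d, rank) with orthonormal columns
    # OLS fit: coeffs minimizes ||basis @ coeffs - dropped.T||_F.
    # Only coeffs, shape (rank, n_dropped), would need to be stored for dropped rows.
    coeffs, *_ = np.linalg.lstsq(basis, dropped.T, rcond=None)
    return (basis @ coeffs).T


def attend(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention output."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def rel_err(approx: np.ndarray, exact: np.ndarray) -> float:
    """Relative Frobenius-norm error."""
    return float(np.linalg.norm(approx - exact) / np.linalg.norm(exact))


rng = np.random.default_rng(0)
n, d, r, k = 64, 32, 4, 16
# Synthetic keys/values near a rank-r subspace plus small noise (assumption, not measured data).
K = rng.standard_normal((n, r)) @ rng.standard_normal((r, d)) + 0.01 * rng.standard_normal((n, d))
V = rng.standard_normal((n, r)) @ rng.standard_normal((r, d)) + 0.01 * rng.standard_normal((n, d))
Q = rng.standard_normal((8, d))

W = softmax(Q @ K.T / np.sqrt(d))
H = attention_entropy(W)  # diagnostic only in this sketch
keep = topk_keep(W, k)
drop = np.setdiff1d(np.arange(n), keep)

exact = attend(Q, K, V)
topk_only = attend(Q, K[keep], V[keep])  # plain eviction: dropped tokens vanish

K_hat, V_hat = K.copy(), V.copy()
K_hat[drop] = lowrank_reconstruct(K[keep], K[drop], r)
V_hat[drop] = lowrank_reconstruct(V[keep], V[drop], r)
reconstructed = attend(Q, K_hat, V_hat)

err_topk, err_recon = rel_err(topk_only, exact), rel_err(reconstructed, exact)
print(f"mean entropy {H.mean():.2f} nats (max possible {np.log(n):.2f})")
print(f"Top-K only error: {err_topk:.3f}, Top-K + low-rank error: {err_recon:.3f}")
print(f"stored per matrix for dropped rows: {drop.size * d} floats -> {drop.size * r} coefficients")
```

Taking the basis from the retained rows means it costs no extra storage, because those rows are kept anyway. Only r numbers per dropped row are added. The sketch works well here because the synthetic cache is near low rank by construction. Whether real KV caches are low-rank enough is exactly what the article's results would need to show.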
Discussion
- Commenters ask whether lower reconstruction error translates into real downstream task gains
- Latency concern: whether the OLS + SVD overhead is worth it compared with simple Top-K eviction
- Some surprise that the SVD opportunity wasn’t spotted sooner; the entropy framing is seen as the key unlock
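The latency concern can be framed with a rough operation count. This is a back-of-envelope sketch under an assumed cost model, not a measurement from the article or the thread. It assumes Top-K selection by a linear-time partition, a thin SVD of the k x d retained block, and a rank-r projection and rebuild of the n - k dropped rows. The function names and the sizes (context length, cache budget, head dimension, rank) are hypothetical, and constant factors and memory traffic are ignored.

```python
# Back-of-envelope operation counts under an assumed cost model (constants ignored).


def topk_ops(n: int) -> int:
    """Order of work for Top-K selection over n scores (partition-based select)."""
    return n


def attention_ops(n: int, d: int) -> int:
    """Order of work for one decoded token: q @ K.T plus weights @ V."""
    return 2 * n * d


def svd_ols_ops(n: int, k: int, d: int, r: int) -> int:
    """Thin SVD of a k x d block plus rank-r projection and rebuild of n - k rows."""
    svd = k * d * min(k, d)
    project_and_rebuild = 2 * (n - k) * d * r
    return svd + project_and_rebuild


n, k, d, r = 4096, 1024, 128, 16  # hypothetical sizes for one head
extra = svd_ols_ops(n, k, d, r)
print(f"vs Top-K selection: {extra // topk_ops(n)}x")  # vs Top-K selection: 7168x
print(f"vs one decode step of attention: {extra // attention_ops(n, d)}x")  # vs one decode step of attention: 28x
```

Under these assumptions the extra work is large relative to selection alone. It is closer to a few dozen decode steps of attention for the same head, so the practical overhead depends heavily on how often the factorization is recomputed and how well it maps onto GPU kernels. Only wall-clock benchmarks can settle the question.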