Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter
https://arxiv.org/abs/2604.15039
TL;DR
Cross-datacenter KV cache sharing could slash LLM inference costs by reusing expensive prefill computation.
Key Takeaways
- Prefill dominates long-context inference cost; cross-DC caching could eliminate redundant computation (see the sketch after this list)
- Analogous to per-user live video CDN caching — huge files, time-sensitive, scoped per user
- Off-peak pricing economics may outweigh pure caching gains; paper may oversell novelty
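To ground the "huge files" analogy with back-of-envelope numbers (assumed here for scale, not taken from the paper): for a 70B-class model with grouped-query attention (80 layers, 8 KV heads, head dim 128, fp16), the KV cache is roughly 2 × 80 × 8 × 128 × 2 bytes ≈ 320 KB per token, so a 100K-token prefix is on the order of 30 GB. Below is a minimal Python sketch of the lookup path such a scheme implies, assuming a two-tier get/put cache keyed by a hash of the exact token prefix; all names (`KVCacheTier`, `prefix_key`, `run_prefill`) are hypothetical, not from the paper.

```python
import hashlib
from typing import Optional

# Hypothetical sketch of cross-DC KV cache reuse; none of these names
# come from the paper. Key the cache on a hash of the tokenized prompt
# prefix, check a local tier first, then a remote (cross-datacenter)
# tier, and only fall back to recomputing prefill on a full miss.

class KVCacheTier:
    """Minimal get/put interface; a real tier would be an object store
    or a dedicated KV-cache service, not an in-memory dict."""
    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value


def prefix_key(model_id: str, token_ids: list[int]) -> str:
    """Cache key: model identity plus a hash of the exact token prefix.
    Any change to the tokens (or the model) invalidates the entry."""
    h = hashlib.sha256()
    h.update(model_id.encode())
    h.update(b"".join(t.to_bytes(4, "little") for t in token_ids))
    return h.hexdigest()


def get_or_compute_kv(
    model_id: str,
    token_ids: list[int],
    local: KVCacheTier,
    remote: KVCacheTier,
    run_prefill,  # callable: list[int] -> bytes (serialized KV tensors)
) -> bytes:
    key = prefix_key(model_id, token_ids)

    # 1. Local hit: cheapest path, no recompute and no transfer.
    if (kv := local.get(key)) is not None:
        return kv

    # 2. Remote hit: pay cross-DC transfer instead of GPU prefill.
    if (kv := remote.get(key)) is not None:
        local.put(key, kv)  # promote so future requests hit locally
        return kv

    # 3. Full miss: recompute prefill, then publish to both tiers.
    kv = run_prefill(token_ids)
    local.put(key, kv)
    remote.put(key, kv)
    return kv
```

The economics live in step 2: fetching tens of GB of serialized KV tensors across a WAN only beats recomputing prefill when the context is long and transfer is cheap, which is where the off-peak pricing point above cuts in.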
Discussion
Top comment:
- [martinald]: Standard CDN caching applied to LLM prefill — novel in scale/timing, not in concept
| Field | Value |
| --- | --- |
| Added | Apr 22, 2026 |
| Modified | Apr 22, 2026 |
| Comments | 1 |
| HN ID | 47822117 |
| Score | 28 |
| Target URL | https://arxiv.org/abs/2604.15039 |