Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter
https://arxiv.org/abs/2604.15039
TL;DR
Researchers propose sharing LLM KV caches across datacenters to eliminate redundant prefill compute.
Key Takeaways
- Reusing KV cache cross-DC avoids recomputing shared system prompts for every request
- Analogous to CDN caching of live video: the cached artifacts are per-user, highly time-sensitive, and very large
- Real win may come from time-of-use pricing arbitrage, not geographic cache distribution
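The core idea above is standard prefix caching, extended across datacenters: hash the shared prompt prefix (e.g. a system prompt), and on a hit reuse the stored KV entries instead of rerunning prefill. A minimal sketch of that lookup logic, with an in-memory dict standing in for the cross-DC store and strings standing in for real attention K/V tensors (all names here are illustrative, not from the paper):

```python
import hashlib


class PrefixKVCache:
    """Toy prefix KV cache keyed by a hash of the shared prompt prefix.

    Illustrative only: a real system stores attention key/value tensors
    per layer and would replicate them across datacenters.
    """

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prefix_tokens):
        # Content-addressed key so identical prefixes collide on purpose.
        return hashlib.sha256(" ".join(prefix_tokens).encode()).hexdigest()

    def prefill(self, prefix_tokens):
        """Return (kv, recomputed): reuse cached KV for this prefix if present."""
        key = self._key(prefix_tokens)
        if key in self._store:
            self.hits += 1
            return self._store[key], False
        # Stand-in for the expensive prefill forward pass over the prefix.
        kv = [f"kv({tok})" for tok in prefix_tokens]
        self._store[key] = kv
        self.misses += 1
        return kv, True


cache = PrefixKVCache()
system_prompt = ["you", "are", "a", "helpful", "assistant"]

_, recomputed_first = cache.prefill(system_prompt)   # first request pays prefill
_, recomputed_second = cache.prefill(system_prompt)  # later request reuses the KV
print(recomputed_first, recomputed_second)           # True False
```

Every request sharing the same system prompt after the first skips its prefill pass, which is the redundant compute the paper targets.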
Discussion
Top comments:
- [martinald]: Standard caching applied to LLMs; biggest win likely from time-of-use pricing, not geography
| Field | Value |
| --- | --- |
| Added | Apr 22, 2026 |
| Modified | Apr 22, 2026 |
| Comments | 1 |
| HN ID | 47822117 |
| Score | 41 |
| Target URL | https://arxiv.org/abs/2604.15039 |