Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter
https://arxiv.org/abs/2604.15039
TL;DR
Proposes distributing KV cache prefill across datacenters like CDN edge caching to cut inference latency.
Key Takeaways
- Treats LLM prefill as a CDN problem: time-sensitive, huge files, scoped per user
- Cross-datacenter KV cache sharing could significantly reduce redundant compute on repeated prompts
- Cache invalidation and per-user scoping make this far harder than standard CDN caching
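The CDN analogy above can be sketched as a prefix-keyed cache index: identical prompt prefixes hash to the same entry, and per-user scoping keeps one user's cached prefill from leaking to another. This is a minimal illustrative sketch, not the paper's design; all names (`KVCacheRegistry`, `publish`, `lookup`) are hypothetical.

```python
import hashlib


class KVCacheRegistry:
    """Hypothetical cross-datacenter index of prefilled KV caches,
    keyed by (user scope, prompt-prefix hash), CDN-style."""

    def __init__(self):
        # Maps (user_scope, prefix_hash) -> datacenter holding the KV cache.
        self._index = {}

    @staticmethod
    def prefix_hash(prompt_tokens):
        # Hash the token prefix so identical prefixes share one entry.
        data = ",".join(map(str, prompt_tokens)).encode()
        return hashlib.sha256(data).hexdigest()

    def publish(self, user_scope, prompt_tokens, datacenter):
        # Announce that `datacenter` holds the KV cache for this prefix.
        self._index[(user_scope, self.prefix_hash(prompt_tokens))] = datacenter

    def lookup(self, user_scope, prompt_tokens):
        # Per-user scoping: a cached prefill is only reused within its scope.
        return self._index.get((user_scope, self.prefix_hash(prompt_tokens)))


registry = KVCacheRegistry()
registry.publish("user-a", [101, 202, 303], "dc-east")
print(registry.lookup("user-a", [101, 202, 303]))  # dc-east (cache hit)
print(registry.lookup("user-b", [101, 202, 303]))  # None (scoped per user)
```

A real system would also need invalidation (e.g. on model or system-prompt changes), which is exactly the part the takeaway flags as harder than standard CDN caching.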
Discussion
Top comments:
- [martinald]: Essentially standard CDN caching applied to LLM prefill — time-sensitive and per-user scoped
| Field | Value |
| --- | --- |
| Added | Apr 22, 2026 |
| Modified | Apr 22, 2026 |
| Comments | 1 |
| HN ID | 47822117 |
| Score | 43 |
| Target URL | https://arxiv.org/abs/2604.15039 |