Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter

https://arxiv.org/abs/2604.15039

Article

TL;DR

Researchers propose sharing LLM KV caches across datacenters to eliminate redundant prefill compute.

Key Takeaways

  • Reusing KV caches across datacenters avoids recomputing shared system prompts for every request
  • Analogous to CDN caching of live video: the cached objects are per-user, highly time-sensitive, and very large
  • Real win may come from time-of-use pricing arbitrage, not geographic cache distribution
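The core mechanism behind the first takeaway can be sketched in a few lines: key the cached KV tensors by a hash of the shared prompt prefix, so only the first request for a given prefix pays the prefill cost. This is a toy, hedged illustration — the class name, the string stand-in for KV tensors, and the counter are all hypothetical; a real system stores per-layer key/value tensors and replicates them across datacenters.

```python
import hashlib

class PrefixKVCache:
    """Toy stand-in for a shared KV-cache store (hypothetical sketch).

    Real systems would hold per-layer key/value tensors; here the cached
    value is just a placeholder string keyed by a hash of the prefix.
    """

    def __init__(self):
        self._store = {}
        self.prefill_count = 0  # times we actually paid for prefill compute

    def _key(self, prefix: str) -> str:
        return hashlib.sha256(prefix.encode()).hexdigest()

    def get_or_prefill(self, prefix: str) -> str:
        key = self._key(prefix)
        if key not in self._store:
            # Cache miss: run (simulated) prefill once, then publish it.
            self.prefill_count += 1
            self._store[key] = f"kv-tensors-for-{key[:8]}"
        return self._store[key]

cache = PrefixKVCache()
system_prompt = "You are a helpful assistant."  # shared across all requests

# Three requests share the same system prompt, so prefill runs only once.
for user_msg in ["hi", "weather?", "translate this"]:
    kv = cache.get_or_prefill(system_prompt)

print(cache.prefill_count)  # → 1
```

The cross-datacenter twist is simply that `_store` would live in (or be replicated to) a remote DC, trading network transfer of large tensors against redundant prefill compute.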

Discussion

Top comments:

  • [martinald]: Standard caching applied to LLMs; biggest win likely from time-of-use pricing, not geography

Discuss on HN


Type Link
Added Apr 22, 2026
Modified Apr 22, 2026
Comments 1
HN ID 47822117
Score 41
Target URL https://arxiv.org/abs/2604.15039