Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter
https://arxiv.org/abs/2604.15039
TL;DR
Proposes distributing KV cache prefill across datacenters like CDN edge caching to cut inference latency.
Key Takeaways
- Treats LLM prefill as a CDN problem: time-sensitive, huge files, scoped per user
- Cross-datacenter KV cache sharing could significantly reduce redundant compute on repeated prompts
- Cache invalidation and per-user scoping make this far harder than standard CDN caching
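The CDN analogy above can be sketched as a prefix-keyed cache index: identical prompt prefixes hash to the same entry, and per-user scoping keeps one user's cached prefill from leaking to another. This is a minimal illustrative sketch, not the paper's design; all names (`KVCacheRegistry`, `publish`, `lookup`) are hypothetical.

```python
import hashlib


class KVCacheRegistry:
    """Hypothetical cross-datacenter index of prefilled KV caches,
    keyed by (user scope, prompt-prefix hash), CDN-style."""

    def __init__(self):
        # Maps (user_scope, prefix_hash) -> datacenter holding the KV cache.
        self._index = {}

    @staticmethod
    def prefix_hash(prompt_tokens):
        # Hash the token prefix so identical prefixes share one entry.
        data = ",".join(map(str, prompt_tokens)).encode()
        return hashlib.sha256(data).hexdigest()

    def publish(self, user_scope, prompt_tokens, datacenter):
        # Announce that `datacenter` holds the KV cache for this prefix.
        self._index[(user_scope, self.prefix_hash(prompt_tokens))] = datacenter

    def lookup(self, user_scope, prompt_tokens):
        # Per-user scoping: a cached prefill is only reused within its scope.
        return self._index.get((user_scope, self.prefix_hash(prompt_tokens)))


registry = KVCacheRegistry()
registry.publish("user-a", [101, 202, 303], "dc-east")
print(registry.lookup("user-a", [101, 202, 303]))  # dc-east (cache hit)
print(registry.lookup("user-b", [101, 202, 303]))  # None (scoped per user)
```

A real system would also need invalidation (e.g. on model or system-prompt changes), which is exactly the part the takeaway flags as harder than standard CDN caching.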
Discussion
Top comments:
- [martinald]: Essentially standard CDN caching applied to LLM prefill — time-sensitive and per-user scoped
| Field | Value |
| --- | --- |
| Added | Apr 22, 2026 |
| Modified | Apr 22, 2026 |
| Comments | 1 |
| HN ID | 47822117 |
| Score | 43 |
| Target URL | https://arxiv.org/abs/2604.15039 |