Prefill-as-a-Service: KV Cache of Next-Generation Models Could Go Cross-Datacenter



TL;DR

The paper proposes CDN-style, cross-datacenter sharing of KV caches so that LLM prefill compute is not redone for prompts another datacenter has already processed.

Key Takeaways

  • KV caches are massive, time-sensitive, and per-user, much like per-session delivery of live video over a CDN
  • Sharing prefill results across datacenters could eliminate redundant compute for widely reused system prompts at scale (see the sketch after this list)
  • The top commenter calls it standard caching applied to LLMs, not a fundamentally new architecture
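The second takeaway is easier to see in code. Below is a minimal sketch of the lookup path, assuming a content-addressed store keyed by a hash of the prompt prefix, with a local tier and a remote (cross-datacenter) tier. The names here (Peer, PrefixKVCache, get_or_prefill) are hypothetical illustrations, not the paper's API.

```python
import hashlib
from typing import Callable, Optional

class Peer:
    """Stand-in for a remote datacenter's cache endpoint (hypothetical)."""
    def __init__(self):
        self.store: dict[str, bytes] = {}

    def fetch(self, key: str) -> Optional[bytes]:
        return self.store.get(key)

def prefix_key(token_ids: list[int]) -> str:
    # Identical prompt prefixes (e.g. a shared system prompt) hash to the
    # same key in every datacenter, so one prefill can serve all of them.
    return hashlib.sha256(repr(token_ids).encode()).hexdigest()

class PrefixKVCache:
    def __init__(self, peers: list[Peer]):
        self.local: dict[str, bytes] = {}   # prefix_hash -> serialized KV
        self.peers = peers

    def get_or_prefill(self, token_ids: list[int],
                       prefill: Callable[[list[int]], bytes]) -> bytes:
        key = prefix_key(token_ids)
        if key in self.local:               # local hit: no compute, no transfer
            return self.local[key]
        for peer in self.peers:             # remote tier, CDN-style
            kv = peer.fetch(key)
            if kv is not None:
                self.local[key] = kv        # cache the transferred KV locally
                return kv
        kv = prefill(token_ids)             # global miss: pay prefill cost once
        self.local[key] = kv
        return kv
```

Whether fetching a massive KV cache across datacenters actually beats recomputing the prefill is the economic question the paper raises; the sketch only shows the caching structure, not that trade-off.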

Discussion

Top comments:

  • [martinald]: Standard CDN caching concepts applied to LLMs: huge, time-sensitive, per-user files

    Sort of reminds me of video streaming on CDNs for live video (but per user)?

Discuss on HN