Prefill-as-a-Service: KV Cache of Next-Generation Models Could Go Cross-Datacenter

· ai llm · Source ↗

Article

TL;DR

The paper proposes sharing LLM KV-cache prefill results across datacenters, so repeated prompts can reuse a cached prefill instead of recomputing it.

Key Takeaways

  • Cross-datacenter KV cache sharing could substantially cut inference cost for frequently repeated prompts
  • Constraints are extreme: caches are time-sensitive, enormous per user, and scoped to a single user's context
  • Off-peak pricing arbitrage may yield bigger wins than geographic prefill distribution
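The reuse idea in the first takeaway can be sketched as a content-addressed lookup: key each prefill result by a hash of the prompt prefix, so an identical prefix served later (possibly from another datacenter) skips the prefill step. This is a minimal illustration, not the paper's design; the class name, the dict standing in for a shared store, and the `prefill_fn` callback are all hypothetical.

```python
import hashlib

class PrefixKVCache:
    """Toy content-addressed cache for prefill results (illustrative only)."""

    def __init__(self):
        self._store = {}   # key -> opaque KV-cache blob; a dict stands in
                           # for a real cross-datacenter object store
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt_prefix: str) -> str:
        # Content-addressed key: identical prefixes map to the same entry.
        return hashlib.sha256(prompt_prefix.encode("utf-8")).hexdigest()

    def get_or_prefill(self, prompt_prefix: str, prefill_fn):
        key = self._key(prompt_prefix)
        if key in self._store:
            self.hits += 1           # reuse: skip the expensive prefill
            return self._store[key]
        self.misses += 1
        blob = prefill_fn(prompt_prefix)   # stand-in for attention prefill
        self._store[key] = blob
        return blob

cache = PrefixKVCache()
fake_prefill = lambda p: f"kv({p})"        # placeholder for real compute
cache.get_or_prefill("system prompt v1", fake_prefill)   # miss: computes
cache.get_or_prefill("system prompt v1", fake_prefill)   # hit: reuses
print(cache.hits, cache.misses)  # → 1 1
```

The second takeaway is what makes this hard in practice: unlike CDN objects, these blobs are huge, short-lived, and usually valid only for one user's context, so the hit rate of such a cache is the whole question.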

Discussion

Top comments:

  • [martinald]: This is standard CDN caching logic applied to per-user, time-sensitive, massive LLM files

Discuss on HN