Prefill-as-a-Service: KV Cache of Next-Generation Models Could Go Cross-Datacenter
TL;DR
Sharing KV cache prefill results across datacenters could slash redundant inference compute costs.
Key Takeaways
- Prefill is expensive; caching and distributing it like CDN video chunks reduces repeat costs
- Per-user prompts, time sensitivity, and massive file sizes make this harder than standard CDN caching
- Time-of-use inference pricing may matter more than cross-DC prefill sharing in practice
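The CDN analogy above can be sketched as a content-addressed store for prefill results: identical prompt prefixes hash to the same key, so any datacenter holding the cached KV blob can serve it instead of recomputing prefill. This is a minimal illustrative sketch, not a real system; the `PrefillCache` class and all names in it are hypothetical.

```python
import hashlib

class PrefillCache:
    """Hypothetical content-addressed store for prefill (KV cache) blobs,
    keyed by a hash of the prompt prefix -- analogous to a CDN keying
    video chunks by content."""

    def __init__(self):
        self._store = {}  # prefix_hash -> serialized KV cache blob

    @staticmethod
    def key_for(prompt_prefix: str) -> str:
        # Content addressing: identical prefixes map to the same key,
        # so the blob is reusable wherever it is replicated.
        return hashlib.sha256(prompt_prefix.encode("utf-8")).hexdigest()

    def get(self, prompt_prefix: str):
        return self._store.get(self.key_for(prompt_prefix))

    def put(self, prompt_prefix: str, kv_blob: bytes) -> None:
        self._store[self.key_for(prompt_prefix)] = kv_blob


cache = PrefillCache()
prefix = "You are a helpful assistant."
if cache.get(prefix) is None:
    # Cache miss: pay for prefill once, then store the result.
    cache.put(prefix, b"fake-kv-cache-bytes")
# Later requests with the same prefix skip prefill entirely.
assert cache.get(prefix) == b"fake-kv-cache-bytes"
```

The hard parts the takeaways point at live outside this sketch: real KV blobs are gigabytes per long prefix, are specific to one model version, and go stale, which is what breaks the straightforward CDN comparison.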
Discussion
Top comments:
- [martinald]: Standard CDN caching logic but with time-sensitive, per-user, huge files