The eighth-generation TPU: An architecture deep dive
Article
TL;DR
Google splits TPU 8 into dedicated training (8T) and inference (8I) chips.
Key Takeaways
- TPU 8 ships as two SKUs: 8T optimized for training, 8I for inference latency.
- The architectural split signals that memory bandwidth, not raw FLOPs, is now the dominant bottleneck for AI workloads.
- Minimal HN engagement: only one substantive comment on a major architecture post.
Discussion
Top comments:
- [zshn25] (on memory bandwidth as the real bottleneck): Splitting TPUs into dedicated training vs inference chips feels like an admission that the bottleneck has shifted from FLOPs to memory bandwidth + latency.
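The bandwidth-vs-FLOPs point can be made concrete with a roofline-style calculation. The sketch below uses purely illustrative hardware numbers (`PEAK_FLOPS`, `PEAK_BW` are assumptions, not published TPU 8 specs) to show why a low-arithmetic-intensity workload like single-token decode is memory-bound no matter how much compute the chip has:

```python
# Roofline-model sketch: performance is capped either by peak compute or by
# memory bandwidth times arithmetic intensity (FLOPs per byte moved).
# All hardware numbers are illustrative assumptions, not real TPU specs.

PEAK_FLOPS = 900e12  # assumed peak compute, FLOP/s
PEAK_BW = 1.2e12     # assumed HBM bandwidth, bytes/s

def attainable_flops(arithmetic_intensity):
    """Attainable throughput for a kernel with the given FLOP/byte ratio."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity)

# Ridge point: the intensity above which a kernel stops being memory-bound.
ridge = PEAK_FLOPS / PEAK_BW  # FLOP/byte

# An inference decode step is roughly a GEMV: ~2 FLOPs per 2-byte weight
# read, i.e. arithmetic intensity around 1 FLOP/byte -> deeply memory-bound.
decode_intensity = 1.0

print(f"ridge point: {ridge:.0f} FLOP/byte")
print(f"decode attainable: {attainable_flops(decode_intensity) / 1e12:.1f} "
      f"TFLOP/s of {PEAK_FLOPS / 1e12:.0f} TFLOP/s peak")
```

Under these assumed numbers, decode achieves only ~1.2 TFLOP/s of a 900 TFLOP/s peak, which is the kind of gap that motivates an inference-specialized SKU prioritizing bandwidth and latency over raw compute.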