The eighth-generation TPU: An architecture deep dive
Article
TL;DR
Google splits TPU 8 into dedicated training (8T) and inference (8I) chips.
Key Takeaways
- TPU 8 ships as two SKUs: 8T optimized for training, 8I for inference latency.
- The architectural split signals that memory bandwidth, not raw FLOPs, is now the dominant bottleneck for AI workloads.
- Minimal HN engagement: only one substantive comment on a major architecture post.
Discussion
Top comments:
- [zshn25] (on memory bandwidth as the real bottleneck): Splitting TPUs into dedicated training vs inference chips feels like an admission that the bottleneck has shifted from FLOPs to memory bandwidth + latency.
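The bandwidth-vs-FLOPs point can be made concrete with a roofline-style calculation. The sketch below uses purely illustrative hardware numbers (`PEAK_FLOPS`, `PEAK_BW` are assumptions, not published TPU 8 specs) to show why a low-arithmetic-intensity workload like single-token decode is memory-bound no matter how much compute the chip has:

```python
# Roofline-model sketch: performance is capped either by peak compute or by
# memory bandwidth times arithmetic intensity (FLOPs per byte moved).
# All hardware numbers are illustrative assumptions, not real TPU specs.

PEAK_FLOPS = 900e12  # assumed peak compute, FLOP/s
PEAK_BW = 1.2e12     # assumed HBM bandwidth, bytes/s

def attainable_flops(arithmetic_intensity):
    """Attainable throughput for a kernel with the given FLOP/byte ratio."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity)

# Ridge point: the intensity above which a kernel stops being memory-bound.
ridge = PEAK_FLOPS / PEAK_BW  # FLOP/byte

# An inference decode step is roughly a GEMV: ~2 FLOPs per 2-byte weight
# read, i.e. arithmetic intensity around 1 FLOP/byte -> deeply memory-bound.
decode_intensity = 1.0

print(f"ridge point: {ridge:.0f} FLOP/byte")
print(f"decode attainable: {attainable_flops(decode_intensity) / 1e12:.1f} "
      f"TFLOP/s of {PEAK_FLOPS / 1e12:.0f} TFLOP/s peak")
```

Under these assumed numbers, decode achieves only ~1.2 TFLOP/s of a 900 TFLOP/s peak, which is the kind of gap that motivates an inference-specialized SKU prioritizing bandwidth and latency over raw compute.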