Nvidia CTO Michael Kagan: Scaling Beyond Moore's Law to Million-GPU Clusters

· hardware · Source ↗

Summary based on the YouTube transcript and episode description.

Nvidia CTO Michael Kagan explains how Mellanox networking—not just GPU compute—became the critical bottleneck and differentiator for AI clusters at scale.

  • AI model size has grown ~2x every 3 months since ~2011, demanding roughly 10x–16x annual performance gains versus Moore’s Law’s 2x every two years (the arithmetic is sketched after this list).
  • Network jitter, not raw bandwidth, determines how many GPUs a job can practically parallelize across; wide latency variance can force a job onto 10 GPUs instead of 1,000 (see the straggler sketch after this list).
  • At 100,000-GPU scale, the probability that some component fails approaches certainty; hardware and software must be designed to keep running through constant partial failures (see the failure-probability sketch below).
  • Inference compute demand now rivals or exceeds training: reasoning models run thousands of sequential inference steps per query, and a model is trained once but served billions of times.
  • Nvidia is building separate GPU SKUs optimized for the prefill (compute-intensive) and decode (memory-intensive) phases of inference (see the arithmetic-intensity sketch below).
  • xAI’s current large cluster runs at ~100–150 MW; the industry is now planning gigawatt and 10-gigawatt data centers, driving a full shift to liquid cooling.
  • Nvidia accelerated its product release cadence from every two years to every year, targeting roughly 10x performance improvement per generation.
  • Kagan argues AI could surface entirely unknown laws of physics by generalizing across observed phenomena the way theoretical physicists do, but at far greater scale.
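The growth-rate comparison in the first bullet reduces to simple exponent arithmetic. A minimal sketch, assuming doubling periods of 3 to 3.5 months for model size (bracketing the quoted 10x–16x per year) versus 24 months for Moore's Law:

```python
# Convert a doubling period (in months) into an annual growth factor.
# Doubling periods are the episode's approximate figures, not exact data.

def annual_growth(doubling_months: float) -> float:
    """Growth factor over 12 months given a doubling period in months."""
    return 2 ** (12 / doubling_months)

model_fast = annual_growth(3.0)    # 2x every 3 months    -> 16x per year
model_slow = annual_growth(3.5)    # 2x every ~3.5 months -> ~10.8x per year
moores_law = annual_growth(24.0)   # 2x every two years   -> ~1.41x per year

print(f"Model size per year:  {model_slow:.1f}x-{model_fast:.1f}x")
print(f"Moore's Law per year: {moores_law:.2f}x")
```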
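The jitter bullet is essentially a straggler argument: a synchronous collective step finishes only when the slowest GPU does, so step time is the maximum of N latency samples and grows with the width of the latency distribution. A toy Monte Carlo sketch, with an invented latency model chosen purely for illustration:

```python
# Toy straggler model: each synchronous step waits for the slowest of N GPUs,
# so step time = max of N latency samples. Wider jitter inflates that maximum
# as N grows. The exponential jitter model and all numbers are illustrative.
import random

def mean_step_time(n_gpus: int, jitter: float, trials: int = 1000) -> float:
    base = 1.0  # nominal per-step latency (arbitrary units)
    total = 0.0
    for _ in range(trials):
        total += max(base + random.expovariate(1.0 / jitter) for _ in range(n_gpus))
    return total / trials

for n in (10, 1_000):
    for jitter in (0.01, 0.5):  # tight vs. wide latency variance
        print(f"{n:>5} GPUs, jitter={jitter:<4} -> mean sync step ~{mean_step_time(n, jitter):.2f}")
```

With tight jitter the step time stays near the base latency even at 1,000 GPUs; with wide jitter the 1,000-GPU step is several times slower, which is the practical pressure to split the job across far fewer GPUs.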
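The failure bullet follows from basic probability: if each GPU independently fails with small probability p over some window, the chance that at least one of N fails is 1 - (1 - p)^N. A quick sketch with an assumed, purely illustrative p:

```python
# P(at least one failure among N components) = 1 - (1 - p)^N.
# p below is an assumed per-GPU failure probability for an arbitrary window.

def p_any_failure(n_components: int, p_single: float) -> float:
    return 1.0 - (1.0 - p_single) ** n_components

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} GPUs -> P(at least one failure) = {p_any_failure(n, 1e-4):.4f}")
```

At 100,000 GPUs even a 0.01% per-GPU failure probability makes at least one failure all but certain, which is why the system has to treat partial failure as the normal operating condition.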
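The prefill/decode split comes down to arithmetic intensity. A back-of-the-envelope sketch for one dense layer of a generic transformer; the dimensions are assumptions for illustration and do not describe any actual Nvidia SKU:

```python
# FLOPs per byte of weights moved for a single dense layer, prefill vs. decode.
# All sizes are assumed, illustrative values.

hidden = 8192            # model hidden dimension (assumed)
bytes_per_param = 2      # fp16/bf16 weights
prompt_tokens = 4096     # tokens processed together during prefill (assumed)

weight_bytes = hidden * hidden * bytes_per_param   # one layer's weight traffic
flops_per_token = 2 * hidden * hidden              # one matmul: multiply + add

# Prefill: the whole prompt shares one pass over the weights -> compute-bound.
prefill_intensity = prompt_tokens * flops_per_token / weight_bytes

# Decode: every generated token re-reads the full weights -> bandwidth-bound.
decode_intensity = flops_per_token / weight_bytes

print(f"Prefill: ~{prefill_intensity:,.0f} FLOPs per weight byte")
print(f"Decode:  ~{decode_intensity:,.0f} FLOPs per weight byte")
```

The orders-of-magnitude gap in FLOPs per byte is the rationale for pairing a compute-heavy part with a memory-bandwidth-heavy part.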

2025-10-28 · Watch on YouTube