DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters | Lex Fridman Podcast #459


Summary based on the YouTube transcript and episode description. Prompt input used 79,979 of 250,000 transcript characters.

Dylan Patel and Nathan Lambert break down DeepSeek’s real compute costs, China’s GPU stockpile, and the multi-gigawatt AI cluster arms race.

  • DeepSeek likely has ~50,000 GPUs total; the claimed 2,000-GPU training run covers only pre-training for V3, excluding research, ablations, and R1 costs.
  • DeepSeek V3 uses mixture-of-experts with 600B+ total parameters but activates only ~37B per token, dramatically cutting training and inference compute (see the sparse-routing sketch after this list).
  • xAI's Memphis cluster is currently the world's largest single training cluster at 200,000 GPUs; Meta is training Llama 4 on ~128,000 GPUs.
  • OpenAI’s Stargate site in Abilene, Texas, is designed to draw 2.2 gigawatts of power when fully built out, more than most US cities consume.
  • Meta accidentally open-sourced a PyTorch operator that has GPUs compute dummy numbers during weight exchange, keeping power draw level so that spikes don't damage grid infrastructure.
  • The more achievable goal of export controls is limiting China’s inference compute (serving AI at scale), not just blocking frontier training runs.
  • DeepSeek’s H800 GPUs have their interconnect bandwidth cut relative to the H100 but matching FLOPS; DeepSeek engineered around the interconnect limit with custom low-level scheduling.
  • o3-mini benchmarks near DeepSeek R1 but costs more, hides its chain of thought, and is closed-weight; R1 is MIT-licensed and shows full reasoning traces.
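
The sparse activation mentioned above can be illustrated with a toy mixture-of-experts layer. The sketch below is a minimal, hypothetical PyTorch example, not DeepSeek's actual code: the `TinyMoELayer` name, layer sizes, expert count, and top-k routing rule are made up and far smaller than V3. It only shows the core idea that a router sends each token to a few experts, so only a fraction of the layer's parameters run per token.

```python
# Minimal, illustrative mixture-of-experts layer (hypothetical; not DeepSeek's code).
# A router picks the top-k experts for each token, so only those experts' weights
# are used for that token even though the layer holds n_experts full MLPs.
import torch
import torch.nn as nn


class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = torch.softmax(self.router(x), dim=-1)          # (num_tokens, n_experts)
        weights, picked = scores.topk(self.top_k, dim=-1)       # chosen experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize over top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e                      # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(16, 64)
layer = TinyMoELayer()
print(layer(tokens).shape)  # torch.Size([16, 64]); only top_k of 8 experts ran per token
```

With 8 experts and top-2 routing, roughly a quarter of the expert parameters are touched per token; DeepSeek V3 pushes the same idea much further, activating only ~37B of 600B+ parameters.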

Guests: Dylan Patel, founder of SemiAnalysis; Nathan Lambert, research scientist at Allen Institute for AI (Ai2) · 2025-02-03 · Watch on YouTube