DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters | Lex Fridman Podcast #459
Dylan Patel and Nathan Lambert break down DeepSeek’s real compute costs, China’s GPU stockpile, and the multi-gigawatt AI cluster arms race.
- DeepSeek likely has ~50,000 GPUs total; the claimed 2,000-GPU training run covers only pre-training for V3, excluding research, ablations, and R1 costs.
- DeepSeek V3 uses a mixture-of-experts architecture with ~671B total parameters but activates only ~37B per token, dramatically cutting training and inference compute (see the MoE sketch after this list).
- xAI Memphis cluster is currently the world’s largest single training cluster at 200,000 GPUs; Meta trains Llama 4 on ~128,000 GPUs.
- OpenAI’s Stargate site in Abilene, Texas is designed for 2.2 gigawatts of power — more than most US cities — when fully built out.
- Meta accidentally open-sourced a PyTorch operator that has GPUs compute throwaway numbers during weight exchange; without it, cluster power draw collapses while GPUs wait on communication and then spikes when compute resumes, swings large enough to damage grid infrastructure (idea sketched below).
- Export controls’ more achievable goal is limiting China’s inference compute (serving AI at scale), not just blocking frontier training runs.
- DeepSeek’s H800 GPUs match the H100 in flops but have their interconnect bandwidth cut; DeepSeek engineered around the limit with custom communication scheduling on the GPUs themselves.
- OpenAI’s o3-mini benchmarks near DeepSeek R1 but costs more, hides its chain of thought, and is closed-weight; R1 is MIT-licensed and shows full reasoning traces.
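
To make the mixture-of-experts point concrete, here is a minimal top-k routing sketch in PyTorch. It is illustrative only, not DeepSeek's implementation: `TinyMoELayer`, its dimensions, and the per-expert loop are invented for clarity. The layer holds eight expert MLPs, but each token runs through only the two its router selects, which is why total parameter count and per-token compute can diverge so sharply.

```python
# Minimal mixture-of-experts routing sketch (illustrative only, not DeepSeek's code).
# A router scores experts per token and only the top-k experts' weights are used,
# even though the layer stores num_experts times more parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

x = torch.randn(10, 64)
y = TinyMoELayer()(x)
print(y.shape)  # torch.Size([10, 64]); only 2 of 8 expert MLPs ran per token
```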
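The power-smoothing trick works because communication phases leave GPUs nearly idle. Below is a hypothetical sketch of the idea, not the actual operator discussed in the episode (whose name isn't given here): start an asynchronous all-reduce and burn cycles on throwaway matrix multiplies until it completes, so power draw stays roughly flat across compute and communication phases. `allreduce_with_dummy_load` and `filler_size` are invented names for illustration.

```python
# Hypothetical sketch of dummy compute during gradient exchange (not Meta's actual operator).
# Assumes torch.distributed has already been initialized, e.g. via init_process_group().
import torch
import torch.distributed as dist

def allreduce_with_dummy_load(grad: torch.Tensor, filler_size: int = 4096) -> torch.Tensor:
    """All-reduce `grad` asynchronously, burning cycles on fake math until it completes."""
    work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)  # start communication
    filler = torch.randn(filler_size, filler_size, device=grad.device)
    while not work.is_completed():
        filler = filler @ filler         # throwaway compute; result is never used
        filler = filler / filler.norm()  # keep values bounded so the loop can run indefinitely
    work.wait()
    return grad
```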
Guests: Dylan Patel, founder of SemiAnalysis; Nathan Lambert, research scientist at Allen Institute for AI (Ai2) · 2025-02-03 · Watch on YouTube