Jonathan Ross, Founder & CEO @ Groq: NVIDIA vs Groq - The Future of Training vs Inference | E1260
Summary based on the YouTube transcript and episode description.
Jonathan Ross argues Groq’s LPU architecture makes NVIDIA obsolete for inference, while training remains NVIDIA’s permanent moat.
- Groq scaled from 640 chips in production at the start of 2024 to 40,000 by year end, and is targeting 2M+ in 2025.
- The widely reported $1.5B figure is revenue, not a fundraise; it amounts to roughly 30% of OpenAI's revenue.
- At Google, inference has historically consumed 10-20x more compute than training; most investors still misunderstand this ratio.
- NVIDIA is effectively a monopsony buyer of HBM (SK Hynix, Samsung, and Micron are the only three suppliers), giving it a structural supply-chain moat in training.
- The LPU architecture keeps model weights in on-chip memory across a pipeline of 600-3,000 chips, eliminating external memory reads and cutting energy per token roughly 3x vs. GPUs; a back-of-envelope sketch follows the list.
- Groq deployed inference in Saudi Arabia in 51 days from contract to first tokens in production; GPU customers typically wait over a year.
- DeepSeek’s gains came from an algorithmic improvement, a verifiable reward signal based on boxed answers (a minimal reward-function sketch also follows the list), not a refutation of compute scaling laws.
- China’s real AI handicap is political: censorship requirements prevent permissive, truthful models, which Ross sees as a structural disadvantage.
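To make the LPU bullet concrete, here is a back-of-envelope sketch in Python. All constants (model size, per-byte access energies, SRAM per chip) are illustrative assumptions, not Groq specifications; the point is that once weights fit in on-chip SRAM across a pipeline of chips, each decoded token no longer pays the off-chip HBM read cost.

```python
# Back-of-envelope sketch (illustrative numbers, not Groq's specs):
# per-token weight-access energy for HBM streaming vs. SRAM-resident weights.

MODEL_PARAMS = 70e9       # assumed 70B-parameter model
BYTES_PER_PARAM = 2       # fp16/bf16 weights
PJ_PER_BYTE_HBM = 7.0     # assumed ~7 pJ/byte for off-chip HBM access
PJ_PER_BYTE_SRAM = 0.1    # assumed ~0.1 pJ/byte for on-chip SRAM access
SRAM_PER_CHIP = 230e6     # assumed ~230 MB of SRAM per LPU-class chip

weight_bytes = MODEL_PARAMS * BYTES_PER_PARAM

# GPU-style decode: every generated token re-reads all weights from HBM
# (ignores batching, which amortizes weight reads across concurrent requests).
gpu_joules_per_token = weight_bytes * PJ_PER_BYTE_HBM * 1e-12

# LPU-style pipeline: weights are sharded across enough chips that they all
# fit in SRAM, so a token only pays on-chip access energy.
chips_needed = weight_bytes / SRAM_PER_CHIP
lpu_joules_per_token = weight_bytes * PJ_PER_BYTE_SRAM * 1e-12

print(f"chips to hold weights on-chip: {chips_needed:.0f}")
print(f"GPU weight-access energy/token: {gpu_joules_per_token:.3f} J")
print(f"LPU weight-access energy/token: {lpu_joules_per_token:.3f} J")
```

With these assumed numbers, the pipeline lands near the 600-chip end of the range Ross cites. The raw weight-access gap is much larger than 3x; the ~3x figure Ross gives is a system-level number, since real GPU deployments narrow the gap by batching many requests over each weight read.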
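The DeepSeek bullet refers to grading model outputs against verifiable answers. A minimal sketch of that idea: extract the final \boxed{...} expression from a completion and award a binary reward on exact match. The function name and normalization here are my own illustration; DeepSeek's actual parsing and reward shaping are more involved.

```python
import re

def boxed_answer_reward(completion: str, reference: str) -> float:
    # Binary reward: 1.0 if the model's final \boxed{...} answer matches
    # the reference string, else 0.0. Only handles non-nested braces;
    # a real grader would normalize equivalent forms (e.g. "0.5" vs "1/2").
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if not matches:
        return 0.0                   # no parsable final answer
    predicted = matches[-1].strip()  # grade the last boxed answer
    return 1.0 if predicted == reference.strip() else 0.0

# Example: a math completion graded against ground truth.
sample = r"... so the total is \boxed{42}."
print(boxed_answer_reward(sample, "42"))  # 1.0
```

Because the reward is checkable mechanically, it can supervise reinforcement learning at scale without human labels, which is the algorithmic lever Ross credits rather than any break in compute scaling laws.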
2025-02-17 · Watch on YouTube