Jeff Dean & Noam Shazeer — 25 years at Google: from PageRank to AGI


Summary based on the YouTube transcript and episode description. Prompt input used 79,979 of 130,361 transcript characters.

Jeff Dean and Noam Shazeer discuss 25 years at Google, from the 2007 two-trillion-token n-gram model to their vision of organic, modular ‘blob’ AI architectures replacing monolithic training runs.

  • 25% of characters checked into Google’s codebase are now generated by internal AI coding models, per Sundar Pichai.
  • Noam Shazeer co-invented the Transformer in 2017 and rejoined Google in 2024 after leaving in 2021; he rejoins roughly every 12 years.
  • In 2007, Dean and the Google translation team built a 2-trillion-token 5-gram language model served across 200 machines, cutting translation latency from 12 hours to ~100 ms (see the serving sketch after this list).
  • Google had an internal chatbot (Meena) before ChatGPT launched; slow external release was partly due to hallucination and safety concerns, not lack of capability.
  • Dean and Shazeer advocate for organic, modular model architectures in which specialized sub-models can be developed, swapped, or distilled independently, an approach they are building toward on the Pathways infrastructure.
  • Mixture-of-experts misconception: sparse routing does not let unused experts be paged out of memory; efficient inference runs large batches whose tokens collectively route to essentially all experts, so every expert must stay resident in HBM (see the routing sketch below).
  • Dean argues that current training extracts too little value per token; techniques like masking, dropout, and multi-epoch training over existing text could yield much more capable models without new data (a toy masking example follows this list).
  • Long-context scaling (attending to trillions of tokens) is the key unsolved problem: naive attention is quadratic in context length and infeasible at that scale, so algorithmic approximations are needed to give models effective access to all personal and web data (rough cost arithmetic below).
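
The sketches that follow are illustrative only; the function names, constants, and implementation choices in them are assumptions, not code from the episode. First, one plausible way to serve a sharded 5-gram table across 200 machines: hash each 5-gram to a shard and group a batch of probability lookups per shard.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 200  # one shard of the 5-gram table per serving machine

def shard_for(ngram):
    """Pick the machine that owns this n-gram by hashing it (MD5 is an arbitrary choice)."""
    key = " ".join(ngram).encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_SHARDS

def group_lookups_by_shard(ngrams):
    """Group a batch of 5-gram probability lookups by shard, so each machine
    can be queried once per batch instead of once per n-gram."""
    per_shard = defaultdict(list)
    for ng in ngrams:
        per_shard[shard_for(ng)].append(ng)
    return dict(per_shard)

# Lookups needed while scoring one candidate translation.
batch = [("the", "cat", "sat", "on", "the"),
         ("cat", "sat", "on", "the", "mat")]
print(group_lookups_by_shard(batch))
```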
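
Next, a toy top-1 mixture-of-experts router illustrating the HBM point: with a reasonably large batch, tokens fan out to essentially every expert, so no expert's weights can be paged out of fast memory. Sizes and the NumPy formulation are assumed for illustration, not any production model.

```python
import numpy as np

# Toy top-1 mixture-of-experts layer (sizes are illustrative assumptions).
rng = np.random.default_rng(0)
num_experts, d_model, batch_tokens = 64, 512, 4096

expert_weights = rng.normal(size=(num_experts, d_model, d_model))  # all experts resident
router = rng.normal(size=(d_model, num_experts))
tokens = rng.normal(size=(batch_tokens, d_model))

logits = tokens @ router              # (batch_tokens, num_experts) routing scores
chosen = logits.argmax(axis=-1)       # top-1 expert per token

out = np.empty_like(tokens)
for e in range(num_experts):          # each expert processes the tokens routed to it
    mask = chosen == e
    if mask.any():
        out[mask] = tokens[mask] @ expert_weights[e]

print("experts touched by one batch:", np.unique(chosen).size, "of", num_experts)
```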
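
A toy version of the masking idea, in the spirit of masked-token prediction: hide a random subset of tokens so a model must reconstruct them from context, turning the same text into additional training signal. The 15% rate and helper name are arbitrary, not a specific proposal from the episode.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Hide a random subset of tokens; the model must predict the hidden ones
    from context, extracting extra training signal from the same text."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

sentence = "models could extract far more value from every training token".split()
print(mask_tokens(sentence))
```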
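
Finally, back-of-the-envelope arithmetic behind the long-context point: naive attention scores every query token against every key token, so cost grows with the square of context length. The head dimension and machine throughput below are assumptions.

```python
# Why naive attention over trillion-token contexts is infeasible (illustrative numbers).
context_len = 1e12                    # 1 trillion tokens of context
d_head = 128                          # assumed per-head dimension
scores = context_len ** 2             # query-key pairs per head per layer: 1e24
flops = 2 * d_head * scores           # multiply-adds for those dot products
machine_flops = 1e15                  # assume ~1 PFLOP/s of sustained compute
years = flops / machine_flops / (3600 * 24 * 365)
print(f"{scores:.1e} attention scores per head per layer; "
      f"~{years:,.0f} years at 1 PFLOP/s")
```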

2025-02-12 · Watch on YouTube