What are we scaling?

Summary based on the YouTube transcript and episode description.

Dwarkesh Patel argues RL scaling is incoherent with short AGI timelines, and continual learning — not RL — is the real missing capability.

  • That labs must bake skills into models via RL implies models won't generalize on the job, which contradicts imminent-AGI timelines.
  • Beren Millidge: benchmark gains reflect billions spent on expert-labeled data, not just compute or algorithmic progress.
  • Toby Ord estimates a ~1,000,000x scale-up in RL compute yields a capability gain equivalent to only one GPT-generation jump.
  • Slow enterprise AI adoption is not diffusion lag — if models were truly AGI-level, onboarding would be faster than hiring humans.
  • Knowledge workers earn tens of trillions of dollars per year globally; that labs earn orders of magnitude less reveals a real capability gap.
  • Goalpost shifting on AGI definitions is partially justified: Gemini 3 in 2020 would have seemed sufficient for half of knowledge work.
  • Continual learning — agents gaining domain experience and distilling it back to a shared model — is the actual missing driver, not RL from verifiable reward.
  • Human-level on-the-job learning may take another 5–10 years; no single lab breakthrough will trigger a runaway intelligence explosion given fierce multi-lab competition.

2025-12-23 · Watch on YouTube