What are we scaling?
Dwarkesh Patel argues RL scaling is incoherent with short AGI timelines, and continual learning — not RL — is the real missing capability.
- That labs must bake skills into models via RL implies models won’t generalize on the job — contradicting imminent-AGI timelines.
- Beren Millidge: benchmark gains reflect billions spent on expert-labeled data, not just compute or algorithmic progress.
- Toby Ord estimates that a ~1,000,000x scale-up in RL compute would yield only a GPT-generation-sized capability boost.
- Slow enterprise AI adoption is not diffusion lag — if models were truly AGI-level, onboarding would be faster than hiring humans.
- Knowledge workers earn tens of trillions of dollars per year globally; that labs earn orders of magnitude less reveals a real capability gap.
- Goalpost shifting on AGI definitions is partially justified: Gemini 3 in 2020 would have seemed sufficient for half of knowledge work.
- Continual learning — agents gaining domain experience and distilling it back to a shared model — is the actual missing driver, not RL from verifiable reward.
- Human-level on-the-job learning may take another 5–10 years; no single lab breakthrough will trigger a runaway intelligence explosion given fierce multi-lab competition.
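The continual-learning loop sketched above — agents gaining domain experience in parallel, then distilling it back into a shared model — can be illustrated with a toy sketch. Everything here (`SharedModel`, `work`, `distill`, the skill-score representation) is a hypothetical illustration of the idea, not any lab's actual system.

```python
# Toy sketch of a continual-learning loop: agents fork from a shared
# model, accumulate on-the-job experience, and distill it back.
from collections import defaultdict

class SharedModel:
    """Shared base model, crudely represented as per-domain skill scores."""
    def __init__(self):
        self.skill = defaultdict(float)

    def fork(self):
        # Each deployed agent starts from a copy of the shared state.
        agent = SharedModel()
        agent.skill = defaultdict(float, self.skill)
        return agent

def work(agent, domain, episodes):
    """Agent gains experience in its domain (stand-in for learning from feedback)."""
    for _ in range(episodes):
        agent.skill[domain] += 1.0
    return agent

def distill(shared, agents):
    """Merge what each agent learned back into the shared model."""
    for agent in agents:
        for domain, score in agent.skill.items():
            shared.skill[domain] = max(shared.skill[domain], score)
    return shared

shared = SharedModel()
agents = [work(shared.fork(), d, 5) for d in ("law", "accounting")]
shared = distill(shared, agents)
print(sorted(shared.skill.items()))  # both domains now live in the shared model
```

The point of the sketch is the topology, not the learning rule: experience is gathered per-agent but consolidated centrally, which is the mechanism the episode argues is missing from RL-on-verifiable-rewards.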
2025-12-23 · Watch on YouTube