Some thoughts on the Sutton interview
Dwarkesh Patel reflects on his Sutton interview, arguing LLM imitation learning and RL are complementary steps toward AGI, not dead ends.
- Sutton’s bitter lesson critique: LLMs waste compute during deployment by not learning, and training leans on a finite, inelastic supply of human data.
- Dwarkesh’s counter: imitation learning is just short-horizon RL — one token per episode — not a categorically different paradigm.
- AlphaGo (human-bootstrapped) vs AlphaZero (scratch): both superhuman; human data isn’t detrimental, just not necessary at scale.
- Ilya Sutskever framed pre-training data as fossil fuels — a necessary, non-renewable intermediary to reach the next energy regime.
- LLMs RL’d on pre-trained priors now win gold at IMO and build full apps; you couldn’t bootstrap that RL from scratch yet.
- Continual learning gap is real: LLMs extract ~1 bit per episode from outcome-based RL, while animals extract high-bandwidth world-model updates.
- Dwarkesh speculates that exposing supervised fine-tuning as a tool call — with outer-loop RL teaching the model to update itself mid-task — could replicate continual learning.
- If LLMs reach AGI first, Dwarkesh expects successor systems built by those AGIs will be based on Sutton’s architecture vision.
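The "one token per episode" framing above can be written out as a gradient identity (my notation, not from the post): the behavior-cloning objective on human text has the same form as a policy gradient where each episode is a single token, the state $s$ is the prefix, the action $a^*$ is the human-chosen token, and the reward is a constant $1$.

```latex
% Imitation learning as degenerate one-step RL (sketch):
% D is the human text distribution; pi_theta the model's next-token policy.
\nabla_\theta J
  = \mathbb{E}_{(s,\,a^*) \sim D}
    \big[\, r \cdot \nabla_\theta \log \pi_\theta(a^* \mid s) \,\big],
\qquad r \equiv 1,\; T = 1,
```

which is exactly the cross-entropy gradient of next-token prediction. The substantive difference from full RL is that states and actions are sampled from $D$ rather than from the model's own rollouts, and the horizon is one step — a difference of degree, per Dwarkesh, not of paradigm.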
2025-10-04 · Watch on YouTube