Richard Sutton – Father of RL thinks LLMs are a dead end
Richard Sutton, 2024 Turing Award winner and RL pioneer, argues LLMs fundamentally cannot learn from experience and will be superseded by continual-learning agents.
- Sutton’s core claim: LLMs have no goal and no ground truth, so there is no definition of a right action and no basis for continual improvement.
- LLMs learn to imitate what humans say, not what happens in the world; they cannot be surprised by outcomes and make no update when reality differs from expectation.
- Gradient descent alone does not produce good generalization; in Sutton's view, every case of an LLM generalizing well was engineered by researchers rather than produced by the algorithm.
- Sutton expects experience-based RL agents to outscale LLMs, making this another instance of The Bitter Lesson, the essay he wrote in 2019.
- He contends that understanding a squirrel’s learning process would get us almost all the way to human intelligence; language is a thin veneer on top of basic animal learning.
- Temporal difference learning already solves the long-horizon reward problem (startup exits, chess endgames) via value functions that propagate future reward back to immediate actions; see the sketch after this list.
- Sutton’s four-part succession argument: there is no global governance body, researchers will eventually crack intelligence, superintelligence will follow, and the most intelligent entities inevitably accumulate resources and power, making AI succession inevitable.
- A key future risk Sutton raises: digital agents that spawn copies to learn remotely and then reincorporate that knowledge could suffer corruption, acquire hidden goals, or face adversarial takeover of the host mind.
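To make the value-function point concrete, here is a minimal tabular TD(0) sketch in Python on a hypothetical five-state random-walk chain (the environment, constants, and `step` helper are illustrative assumptions, not anything from the talk). The TD error `r + gamma * V(s') - V(s)` is exactly the "surprise" signal Sutton says LLMs lack, and repeated updates propagate the delayed terminal reward back to earlier states:

```python
import random

GAMMA = 0.9      # discount factor
ALPHA = 0.1      # learning rate
N_STATES = 5     # states 0..4; reaching state 4 pays reward 1 and ends the episode

V = [0.0] * N_STATES  # value estimates; terminal state stays 0

def step(s):
    """Random walk: move left or right; reward only on reaching the goal."""
    s_next = max(0, min(N_STATES - 1, s + random.choice([-1, 1])))
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

for episode in range(5000):
    s = 0
    while s != N_STATES - 1:
        s_next, r = step(s)
        # TD error: the gap between the expected and the observed return.
        delta = r + GAMMA * V[s_next] - V[s]
        V[s] += ALPHA * delta  # update toward the bootstrapped target
        s = s_next

print([round(v, 2) for v in V])  # values rise toward the rewarding terminal state
```

With a discount factor below 1, the learned values decay with distance from the goal, which is how a payoff many steps away (a chess win, a startup exit) still shapes the value of immediate actions.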
2025-09-26 · Watch on YouTube