Richard Sutton – Father of RL thinks LLMs are a dead end

· ai · Source ↗

Summary based on the YouTube transcript and episode description.

Richard Sutton, 2024 Turing Award winner and reinforcement learning pioneer, argues that LLMs fundamentally cannot learn from experience and will be superseded by continual-learning agents.

  • Sutton’s core claim: LLMs have no goal and no ground truth, so there is no definition of a right action and no basis for continual improvement.
  • LLMs learn what humans say to do (imitation), not what happens in the world; they cannot be surprised by outcomes and make no update when reality differs from expectation.
  • Gradient descent alone does not produce good generalization; every case of LLMs generalizing well was engineered by researchers, not by the algorithm.
  • Sutton expects experience-based RL agents to outscale LLMs, making this another instance of The Bitter Lesson he authored in 2019.
  • He contends that understanding a squirrel’s learning process would get us almost all the way to human intelligence; language is a thin veneer on top of basic animal learning.
  • Temporal-difference (TD) learning already solves the long-horizon reward problem (e.g., startup exits, chess endgames) via value functions that propagate future reward back to immediate actions.
  • Sutton’s four-part succession argument: there is no global governance body to stop AI research, researchers will eventually crack intelligence, super-intelligence will follow, and the most intelligent entities accumulate power — making AI succession inevitable in his view.
  • A key future risk Sutton raises: digital agents spawning copies to learn remotely then reincorporating knowledge could introduce corruption, hidden goals, or adversarial takeover of the host mind.
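The TD point above can be made concrete with a minimal sketch (illustrative only, not from the episode): a five-state chain where reward arrives only at the terminal step. TD(0) updates gradually propagate that delayed reward back through the value estimates, so even the first state learns to predict the eventual payoff — the mechanism Sutton says handles long horizons like a startup exit or a chess endgame.

```python
# Minimal TD(0) sketch: a deterministic 5-state chain, reward only at the end.
# All names and constants here are illustrative choices, not from the episode.
N_STATES = 5          # states 0..4; stepping past state 4 terminates the episode
ALPHA = 0.1           # learning rate
GAMMA = 1.0           # no discounting over this tiny horizon

V = [0.0] * N_STATES  # value estimates, initialized to zero

for _ in range(1000):             # repeated episodes of experience
    s = 0
    while s < N_STATES:
        s_next = s + 1
        done = s_next == N_STATES
        r = 1.0 if done else 0.0                  # reward only at the very end
        target = r if done else r + GAMMA * V[s_next]
        V[s] += ALPHA * (target - V[s])           # TD(0) update toward the target
        s = s_next

# After training, every state's value approaches 1.0: the terminal reward
# has been propagated all the way back to the immediate first action.
```

The point of the sketch is that no state except the last ever sees a reward directly; the value function bridges the gap one step at a time, which is how TD methods assign credit over long horizons.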

2025-09-26 · Watch on YouTube