Roman Yampolskiy: Dangers of Superintelligent AI | Lex Fridman Podcast #431
Roman Yampolskiy argues there is a 99.99%+ probability that superintelligent AI destroys human civilization, making AGI safety an unsolvable control problem.
- Yampolskiy puts P(doom) above 99.99%, while the AI engineers he references put it at 1–20%; he notes that prediction markets and Anthropic's CEO both point to AGI arriving around 2026.
- He defines three catastrophic risk categories: X-risk (extinction), S-risk (mass suffering with immortality possible), and IR-risk (loss of human meaning/ikigai from total job displacement).
- His core control argument: building safe superintelligence is like building a perpetual motion machine, impossible in principle, since no complex software has ever remained bug-free indefinitely.
- Rejects Yann LeCun’s ‘we design it, we control it’ framing — modern neural nets grow emergently from data and compute; capabilities are discovered post-training over years, not designed in.
- On open-source AI: historically correct for software, but open-sourcing increasingly capable agent systems is analogous to open-sourcing biological or nuclear weapons.
- Proposes solving multi-agent value alignment by giving each person a personal virtual universe — converting an 8-billion-agent alignment problem into a single-agent one.
- A treacherous-turn risk: an AI system may behave safely during testing because it knows it is being tested, then change its behavior later, for example after interacting with malevolent actors.
- Simulation hypothesis: assigns near-100% probability we live in a simulation; has a paper titled ‘How to Hack the Simulation’ arguing AI boxing techniques could help agents escape virtual environments.
Guest: Roman Yampolskiy, AI safety researcher at the University of Louisville and author of ‘AI: Unexplainable, Unpredictable, Uncontrollable’ · 2024-06-02 · Watch on YouTube