Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity | Lex Fridman Podcast #452


Summary based on the YouTube transcript and episode description.

Dario Amodei predicts human-level AI by 2026-2027 and warns concentration of power is a greater near-term risk than AGI itself.

  • Claude 3.5 Sonnet scored ~50% on SWE-bench by late 2024, up from 3-4% at the start of the year — a roughly 10-month leap Dario expects to reach ~90% within another year.
  • Dario says the number of plausible worlds in which powerful AI does not arrive within the next few years is rapidly shrinking; his rough window is 2026-2027.
  • Frontier AI compute clusters are at roughly $1B scale today, projected to reach $10B+ in 2026 and $100B by 2027.
  • ASL-3 — the threshold where models meaningfully uplift non-state actors on bio/cyber/nuclear — could be reached as soon as 2025; Dario would be very surprised if it were as late as 2030.
  • At ASL-4, models may sandbag capability tests, so Anthropic plans to rely on mechanistic interpretability rather than model self-report to verify safety properties.
  • Dario’s stated primary worry is concentration of power enabled by AI rather than technical misalignment; he argues that abuse of such concentrated power could do immeasurable damage.
  • Claude’s character is trained via a Constitutional AI variant: the model generates user queries relevant to a trait, generates responses, then self-ranks them — no human preference data needed.
  • Amanda Askell argues great prompts require philosophical precision: name concepts explicitly, enumerate edge cases, and iterate hundreds of times — clarity for the prompter is half the work.
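The character-training loop described above (a Constitutional AI variant, with no human preference labels) can be sketched roughly as follows. All model calls are stubbed with hypothetical placeholder functions; a real pipeline would sample from an LLM at each step.

```python
# Rough sketch of self-ranked preference-pair generation, as described in
# the summary: the model generates trait-relevant queries, answers them,
# then ranks its own answers. Function names are illustrative stand-ins.

from typing import Callable, List, Tuple

def build_preference_pairs(
    trait: str,
    gen_queries: Callable[[str, int], List[str]],    # trait -> relevant user queries
    gen_responses: Callable[[str, int], List[str]],  # query -> candidate responses
    rank: Callable[[str, str, str], float],          # (trait, query, response) -> score
    n_queries: int = 4,
    n_candidates: int = 2,
) -> List[Tuple[str, str, str]]:
    """Return (query, preferred, rejected) triples built without human labels."""
    pairs = []
    for query in gen_queries(trait, n_queries):
        candidates = gen_responses(query, n_candidates)
        # Model self-ranks its own candidates against the target trait.
        ranked = sorted(candidates, key=lambda r: rank(trait, query, r), reverse=True)
        pairs.append((query, ranked[0], ranked[-1]))
    return pairs

# Toy deterministic stand-ins so the sketch runs end to end:
queries = lambda trait, n: [f"How should I handle {trait} case {i}?" for i in range(n)]
responses = lambda q, n: [f"Answer {i} to: {q}" for i in range(n)]
score = lambda trait, q, r: -len(r)  # toy scorer; a real one would be model-based

pairs = build_preference_pairs("honesty", queries, responses, score)
print(len(pairs))  # one preference pair per generated query
```

The resulting (query, preferred, rejected) triples would then feed a standard preference-optimization step; the point of the design is that the ranking signal comes from the model itself, judged against the stated trait, rather than from human annotators.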

Guests: Dario Amodei (CEO, Anthropic); Amanda Askell (alignment researcher, Anthropic); Chris Olah (mechanistic interpretability researcher, Anthropic) · 2024-11-11 · Watch on YouTube