OpenAI’s IMO Team on Why Models Are Finally Solving Elite-Level Math


Summary based on the YouTube transcript and episode description.

Alex Wei, Sheryl Hsu, and Noam Brown describe how a 3-person OpenAI team achieved IMO gold in ~2 months using general-purpose RL on hard-to-verify tasks.

  • A 3-person team built the IMO gold system in roughly 2 months, using no bespoke math tooling—only general-purpose RL techniques.
  • The model solved 5 of 6 IMO problems; it could not solve Problem 6 (combinatorics) and correctly returned no answer rather than hallucinating a proof.
  • Proofs were graded by three external IMO medalists, who reached unanimous consensus; the output style was described as ‘atrocious’ and alien, but mathematically correct.
  • OpenAI deliberately avoided Lean (formal verification) to keep techniques general; the same RL and parallel-compute methods apply to other reasoning domains.
  • Reasoning time has scaled from ~0.1 minutes a year ago to ~100 minutes now; research-grade math takes ~1,500 hours, leaving a roughly 1,000x gap still to close.
  • The model performs even better on Putnam problems (less time per problem, more knowledge-heavy), suggesting that competition math is no longer the binding constraint.
  • Noam Brown framed evaluation latency as the next scaling bottleneck: testing a model that thinks for a month takes a month to evaluate.
  • OpenAI plans to integrate these techniques broadly into production models but says deployment will take more time; access for mathematicians is being worked out.
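The ~1,000x gap cited in the bullets follows from the episode's round numbers; a quick back-of-envelope check (the 1,500-hour figure is the speakers' estimate, not a measured benchmark):

```python
# Rough arithmetic behind the "~1,000x gap" between today's reasoning time
# and research-grade math, using the episode's round numbers.
current_minutes = 100                   # model thinks ~100 minutes per problem today
research_hours = 1_500                  # hours a research-grade result is said to need
research_minutes = research_hours * 60  # 90,000 minutes
gap = research_minutes / current_minutes
print(f"{gap:.0f}x")                    # prints "900x", i.e. on the order of 1,000x
```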

2025-07-30 · Watch on YouTube