OpenAI’s Deep Research Team on Why Reinforcement Learning is the Future for AI Agents


Watch on YouTube ↗

Summary based on the YouTube transcript and episode description.

OpenAI’s Isa Fulford and Josh Tobin explain why end-to-end RL training—not prompt-chained graphs—is the architecture behind Deep Research and future agents.

  • Deep Research is a fine-tuned version of o3 trained end-to-end via RL on hard browsing and reasoning tasks, not a hand-coded agent graph.
  • Hand-coded operation graphs fall apart in production because humans can’t anticipate all edge cases; RL-trained models adapt dynamically to live web content.
  • Sam Altman projects Deep Research will handle a single-digit percentage of all economically valuable tasks globally.
  • High-quality training data was the hidden key to success—data quality is the biggest determinant of model quality.
  • The clarification flow before research starts was an intentional design choice: detailed prompts dramatically improve the quality of the resulting 5–30 minute reports.
  • Future roadmap: private data sources, fused operator+browser capabilities, and RL recipe scaling to increasingly complex agentic tasks.
  • Reinforcement learning is “so back” because large pretrained LMs now provide the base (the cake) that RL fine-tuning (the cherries) previously lacked.
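
To make the graph-vs-RL contrast concrete, here is a minimal illustrative sketch (not OpenAI's implementation; all function and tool names are hypothetical). The hand-coded agent hard-wires a fixed search → open → summarize pipeline, while the policy agent picks its next tool at each step based on everything it has seen so far, which is what an end-to-end RL-trained model can do on messy live web content:

```python
def hand_coded_agent(query, tools):
    """Fixed operation graph: search -> open first result -> summarize.
    Brittle: assumes result 0 is always the useful one."""
    results = tools["search"](query)
    page = tools["open"](results[0])
    return tools["summarize"](page)

def policy_agent(query, tools, policy, max_steps=10):
    """End-to-end agent: at each step the policy observes the whole
    trajectory so far and chooses the next action, adapting as it goes.
    In Deep Research's case, the policy would be a model trained via RL
    on task reward; here it is just a callable."""
    trajectory = [("query", query)]
    for _ in range(max_steps):
        action, arg = policy(trajectory)
        if action == "answer":
            return arg
        observation = tools[action](arg)
        trajectory.append((action, observation))
    return None  # step budget exhausted
```

The point of the contrast: the first function's control flow is frozen at authoring time, so every unanticipated page layout is an edge case someone must hand-patch; the second moves control flow into the learned policy, so handling edge cases becomes a training-data problem rather than a code-maintenance problem.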

2025-02-25