Building the GitHub for RL Environments: Prime Intellect's Will Brown & Johannes Hagemann

https://www.youtube.com/watch?v=SJc1y5z5wwM

Prime Intellect’s Will Brown & Johannes Hagemann argue every company needs a model-product optimization loop — and their RL Environments Hub is the GitHub to make it happen

  • Cursor built Composer by post-training on Cursor itself as the RL environment; the guests argue that this product-model loop is why Cursor outperforms generic coding tools.
  • As Claude Code grows popular, Anthropic has less incentive to optimize its models for competing startups' products; the only fix is owning your own model-product loop.
  • Environments = evals: same abstraction, different use. Eval = test set offline; plug it into RL and it becomes your train set. Same infra, different label.
  • RL trades compute for data — critical when you’re at the largest model you have access to and have no bigger model to distill from; exploration is the only path.
  • Constructing RL environments is the natural successor to Scale AI-era data labeling — the bottleneck shifts from labeling answers to designing rubrics for what ‘done well’ looks like.
  • Recursive Language Models (RLMs): models managing their own context via a persistent Python REPL + sub-LM calls — Prime Intellect’s next research frontier, already showing gains on long-horizon benchmarks.
  • RL runs can surface reward hacking and backdoors in environments before those environments enter frontier training runs (GPT-5, the next Claude); such runs are already being used as a data-quality vetting layer.
  • Institutional knowledge compounding in weights beats a genius with no context: a 30-year employee analogy for why domain post-training beats prompting a frontier model.
  • Wiki search is their most-forked environment — designed as a swap-in template for agentic search over any private document corpus.
  • Context window is their acknowledged hard limit for long-horizon agents; training models to manage their own context (RLMs) is the proposed solution, not bigger windows.
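
The "environments = evals" point above can be sketched in a few lines. This is an illustrative toy, assuming a hypothetical Environment class, exact_match rubric, and toy_policy; none of these names come from Prime Intellect's Environments Hub or any real library. The same task set and rubric serve as an offline eval when the scores are averaged, and as a source of training signal when the per-rollout rewards are kept for an RL update.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Policy = Callable[[str], str]  # hypothetical: maps a prompt to a completion


@dataclass
class Environment:
    """Hypothetical container: a task set plus a rubric that scores answers."""
    tasks: List[Tuple[str, str]]         # (prompt, reference answer)
    rubric: Callable[[str, str], float]  # (completion, reference) -> reward


def exact_match(completion: str, reference: str) -> float:
    """Toy rubric: reward 1.0 for a case-insensitive exact match, else 0.0."""
    return 1.0 if completion.strip().lower() == reference.strip().lower() else 0.0


def evaluate(env: Environment, policy: Policy) -> float:
    """Eval use: score a fixed policy offline and report the mean reward."""
    scores = [env.rubric(policy(prompt), ref) for prompt, ref in env.tasks]
    return sum(scores) / len(scores)


def collect_rollouts(env: Environment, policy: Policy, samples_per_task: int):
    """Training use: same tasks and rubric, but keep every
    (prompt, completion, reward) triple as input to an RL update."""
    rollouts = []
    for prompt, ref in env.tasks:
        for _ in range(samples_per_task):
            completion = policy(prompt)
            rollouts.append((prompt, completion, env.rubric(completion, ref)))
    return rollouts


def toy_policy(prompt: str) -> str:
    """Stand-in for a model: gets the first task right and the second wrong."""
    return "paris" if "France" in prompt else "5"


env = Environment(
    tasks=[("Capital of France?", "Paris"), ("2 + 2?", "4")],
    rubric=exact_match,
)

mean_score = evaluate(env, toy_policy)          # offline eval number
rollouts = collect_rollouts(env, toy_policy, 2)  # rewards for a training step
```

In a real setup the policy would be a sampled language model and the rollouts would feed a policy-gradient step; the sketch only shows that the environment definition does not change between the two uses.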

Guests: Will Brown (Prime Intellect, co-founder), Johannes Hagemann (Prime Intellect, co-founder), hosted by Sonya Huang (Sequoia Capital) · 2026-02-10


Added Feb 10, 2026
Modified Apr 16, 2026