Building the GitHub for RL Environments: Prime Intellect's Will Brown & Johannes Hagemann
Prime Intellect’s Will Brown and Johannes Hagemann argue that every company will run its own AI research lab, built around RL environments as the new unit of model optimization.
- Cursor built a proprietary Composer model by post-training directly inside its own product environment — Prime Intellect cites this as the clearest proof of the product-model optimization loop.
- RL environments and evals are the same artifact: a dataset of tasks, an agent harness, and a rubric/reward function — the only difference is train vs. test use.
- Prime Intellect’s RL residency group (14–16 researchers) has shipped environments spanning formal math verification in Lean, medical physics, and cybersecurity CTF challenges.
- Context window limits are a current hard ceiling on long-horizon agents; Prime Intellect is actively researching Recursive Language Models (RLMs) where models manage their own context via a persistent Python REPL variable.
- RL trades compute for human data: when no larger model exists to distill from, exploration via RL is the only way to push beyond existing capability ceilings.
- Wiki-search is the most-forked environment on the hub — designed as a swap-your-documents template for agentic search over internal corpora.
- Institutional knowledge compounded into a model’s weights beats a short prompt for the same reason a domain expert with 30 years of experience beats the world’s smartest generalist newcomer.
- Prime Intellect supports prompt optimization (including DSPy/GEPA) around closed-weight models using the same environment infrastructure, not just open-weight fine-tuning.
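The environment-equals-eval point above can be made concrete with a minimal sketch. All names here (`Task`, `Environment`, `rubric`) are illustrative, not Prime Intellect's actual API: one object holds the dataset of tasks and the rubric/reward function, and the only difference between eval and training is whether the rewards are averaged into a score or fed to an RL update.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    answer: str

def rubric(task: Task, completion: str) -> float:
    """Reward function doubling as an eval metric: 1.0 if the answer appears."""
    return 1.0 if task.answer in completion else 0.0

@dataclass
class Environment:
    tasks: list[Task]
    reward: Callable[[Task, str], float]

    def evaluate(self, agent: Callable[[str], str]) -> float:
        # Test use: mean reward over the task set is the eval score.
        return sum(self.reward(t, agent(t.prompt)) for t in self.tasks) / len(self.tasks)

    def rollouts(self, agent: Callable[[str], str]) -> list[tuple[str, str, float]]:
        # Train use: identical tasks and rubric, but (prompt, completion, reward)
        # triples would feed an RL optimizer instead of a leaderboard.
        return [(t.prompt, agent(t.prompt), self.reward(t, agent(t.prompt)))
                for t in self.tasks]

env = Environment(
    tasks=[Task("2 + 2 = ?", "4"), Task("Capital of France?", "Paris")],
    reward=rubric,
)
toy_agent = lambda prompt: "4" if "2 + 2" in prompt else "Paris"
print(env.evaluate(toy_agent))  # 1.0
```

The agent harness (tool loop, retries, context management) would wrap `agent` in practice; the artifact itself stays the same either way.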
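The RLM idea mentioned above can be sketched in a few lines, under loud assumptions: this is not Prime Intellect's implementation, just an illustration of the mechanism. The long context lives in a persistent REPL namespace as a variable, and the model reads it through short code snippets instead of loading the whole text into its prompt.

```python
import contextlib
import io

# Persistent REPL state: the full context is a variable, not prompt tokens.
# The contents and variable name are illustrative.
namespace = {"context": "...huge transcript...\nERROR: disk full\n...more logs..."}

def repl(snippet: str) -> str:
    """Run a model-issued snippet against the persistent namespace; return its stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(snippet, namespace)  # assignments persist across turns
    return buf.getvalue()

# Turn 1: the model sizes up the context without spending tokens on its body.
print(repl("print(len(context))"))

# Turn 2: it locates the relevant span and reads only that slice into context.
print(repl("i = context.find('ERROR'); print(context[i:i+20])"))
```

Because `namespace` persists across calls, the model can build up indices, slices, and summaries over many turns, sidestepping the context-window ceiling for long-horizon work.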
2026-02-10 · Watch on YouTube