From Reasoning to Agentic: Credit Assignment in RL for LLMs

https://arxiv.org/abs/2604.09459

Summary

Addresses the credit assignment problem in reinforcement learning for LLMs: when only a final reward is available, which tokens or actions actually mattered? Covers two regimes: reasoning RL (distributing credit across a single chain-of-thought of 500 to 30K+ tokens) and agentic RL (distributing credit across multi-turn tool use with environment feedback). Essential reading for anyone training agents with outcome-based rewards.
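
To make the problem concrete, here is a minimal sketch of the coarsest form of credit assignment: each response's single final reward is broadcast to all of its tokens. The group-mean baseline, the function name `broadcast_outcome_advantages`, and the toy numbers are assumptions for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def broadcast_outcome_advantages(token_counts, final_rewards, eps=1e-8):
    """Assign every token in a response the same advantage.

    Assumption: the advantage is the final reward normalized against the
    other responses sampled for the same prompt (a group-relative baseline).
    """
    mu = mean(final_rewards)
    sigma = pstdev(final_rewards)
    per_response = [(r - mu) / (sigma + eps) for r in final_rewards]
    # Each token inherits its response's advantage; no token is singled out.
    return [[adv] * n for adv, n in zip(per_response, token_counts)]


# Hypothetical group: three sampled responses to one prompt, one correct.
token_counts = [4, 2, 3]
final_rewards = [1.0, 0.0, 0.0]

advantages = broadcast_outcome_advantages(token_counts, final_rewards)
for response_advantages in advantages:
    print([round(a, 3) for a in response_advantages])
```

Every token in the correct response gets the same positive signal, and every token in an incorrect response gets the same negative one, whether or not that token contributed to the outcome. Finer-grained credit assignment aims to replace this uniform broadcast with per-token or per-action signals.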

Categories: cs.AI, cs.CL, cs.LG

Type Link
Added Apr 13, 2026
Modified Apr 13, 2026