From Reasoning to Agentic: Credit Assignment in RL for LLMs
https://arxiv.org/abs/2604.09459
Summary
Addresses the credit assignment problem in reinforcement learning for LLMs: when only a final reward is available, which tokens or actions actually mattered? Covers two regimes: reasoning RL (distributing credit across a single chain-of-thought of 500 to 30K+ tokens) and agentic RL (multi-turn tool use with environment feedback). Essential reading for anyone training agents with outcome-based rewards.
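To make the problem concrete, here is a minimal sketch of what outcome-only credit looks like in practice. It is not taken from the paper: it assumes a simple group-normalized baseline (similar in spirit to group-relative policy optimization methods), and the function names, rewards, and token counts are all hypothetical. The point is that every token in a response ends up with the same advantage, regardless of whether it contributed to the outcome.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards):
    """Return one scalar advantage per sampled response.

    Each advantage is (reward - group mean) / group std. If all rewards
    are equal, there is no learning signal and every advantage is 0.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def broadcast_to_tokens(advantage, num_tokens):
    """Outcome-only credit: copy the response-level advantage to every token."""
    return [advantage] * num_tokens


# Hypothetical group of four sampled responses to one prompt:
# final rewards (1 = correct, 0 = incorrect) and response lengths in tokens.
rewards = [1.0, 0.0, 0.0, 1.0]
token_counts = [5, 3, 4, 2]

advantages = group_normalized_advantages(rewards)
per_token = [broadcast_to_tokens(a, n) for a, n in zip(advantages, token_counts)]

for adv, tokens in zip(advantages, per_token):
    print(f"response advantage={adv:+.2f}, per-token credit={tokens}")
```

Within a response, the per-token lists are constant, so a decisive step and a filler token receive identical credit. Finer-grained credit assignment aims to replace this uniform broadcast with token-level or turn-level signals.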
Categories: cs.AI, cs.CL, cs.LG
| Type | Link |
| Added | Apr 13, 2026 |
| Modified | Apr 13, 2026 |