From Reasoning to Agentic: Credit Assignment in RL for LLMs

Summary

Addresses the credit assignment problem in reinforcement learning for LLMs: when only a final reward is available, which tokens or actions actually mattered? Covers two regimes: reasoning RL (distributing credit across a single chain-of-thought of 500 to 30K+ tokens) and agentic RL (multi-turn tool use with environment feedback). Essential reading for anyone training agents with outcome-based rewards.

Categories: cs.AI, cs.CL, cs.LG

Type	Link
Added	Apr 13, 2026
Modified	Apr 13, 2026
