Papers with Code paper 3d ago

Self-Distilled Agentic Reinforcement Learning

Reinforcement learning (RL) has emerged as a central paradigm for post-training LLM agents, yet its trajectory-level reward signal provides only coarse supervision for long-horizon...