📄 Papers
Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning LLMs (arxiv.org)
From Reasoning to Agentic: Credit Assignment in RL for LLMs (arxiv.org)
CrashSight: Vision-Language Benchmark for Traffic Crash Scene Understanding (arxiv.org)
Robust Reasoning Benchmark: How Formatting Changes Break LLM Math Reasoning (arxiv.org)
Medical Reasoning with Large Language Models: A Survey and MR-Bench (arxiv.org)
VISOR: Agentic Visual RAG via Iterative Search and Over-horizon Reasoning (arxiv.org)
Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction (arxiv.org)
GRASP: Grounded CoT Reasoning with Dual-Stage Optimization for Multimodal Sarcasm Target Identification (arxiv.org)