VISOR: Agentic Visual RAG via Iterative Search and Over-horizon Reasoning

https://arxiv.org/abs/2604.09508

Summary

VISOR tackles agentic visual retrieval-augmented generation (RAG) in cases where the evidence is scattered across multiple document pages. It interleaves reasoning with iterative retrieval to address two bottlenecks: visual evidence sparsity (key information spread across pages) and fine-grained cross-page reasoning. The work is a practical step toward agents that can reason over entire visual documents rather than single pages.
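To make the idea of interleaving reasoning with iterative retrieval concrete, here is a minimal toy sketch. It is not VISOR's method or code: pages are modeled as plain keyword sets rather than images, the "reasoning" step is just a check of which needed facts are still missing, and every name (`retrieve`, `iterative_search`, `demo_pages`) is hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def retrieve(query: Set[str], pages: Dict[int, Set[str]], seen: Set[int]) -> Optional[int]:
    """Return the unseen page sharing the most terms with the query, or None."""
    best, best_score = None, 0
    for page_id, terms in pages.items():
        if page_id in seen:
            continue
        score = len(query & terms)
        if score > best_score:
            best, best_score = page_id, score
    return best


def iterative_search(
    needed: Set[str], pages: Dict[int, Set[str]], max_steps: int = 5
) -> Tuple[List[int], Set[str]]:
    """Alternate between a toy reasoning step and a retrieval step.

    Assumption: the set of needed facts is known up front. A real agent
    would derive it from the question as it reasons.
    """
    seen: Set[int] = set()
    order: List[int] = []
    found: Set[str] = set()
    for _ in range(max_steps):
        missing = needed - found  # toy "reasoning": what is still unanswered?
        if not missing:
            break
        page_id = retrieve(missing, pages, seen)  # retrieval driven by the gap
        if page_id is None:
            break  # no remaining page helps
        seen.add(page_id)
        order.append(page_id)
        found |= pages[page_id] & needed
    return order, found


# Hypothetical document: the evidence is split across pages 1 and 4.
demo_pages = {
    1: {"revenue", "2023"},
    2: {"weather"},
    3: {"revenue", "2024"},
    4: {"growth", "2024"},
}
print(iterative_search({"revenue", "2023", "2024", "growth"}, demo_pages))
```

The point of the sketch is only the loop shape: each retrieval is conditioned on what the previous steps failed to find, so evidence spread over several pages is gathered across rounds instead of in one query.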

Categories: cs.AI, cs.CV, cs.IR

Type: Link
Added: Apr 13, 2026
Modified: Apr 13, 2026