CrashSight: Vision-Language Benchmark for Traffic Crash Scene Understanding

https://arxiv.org/abs/2604.08457

Summary

CrashSight is a large-scale benchmark that tests vision-language models (VLMs) on safety-critical traffic crash understanding from infrastructure cameras, not just ego-vehicle dashcams. It evaluates whether VLMs can reason about crash phases, causes, and contributing factors, and finds significant gaps in current models' ability to handle real-world safety scenarios.

Categories: cs.CV, cs.AI, cs.CL

Type: Link
Added: Apr 13, 2026
Modified: Apr 13, 2026