GRASP: Grounded CoT Reasoning with Dual-Stage Optimization for Multimodal Sarcasm Target Identification

Summary

Introduces GRASP, a framework that uses chain-of-thought reasoning to identify sarcasm targets across text and images. Rather than only detecting whether a post is sarcastic, it pinpoints which phrase or image region is being mocked, using a dual-stage training pipeline with grounded cross-modal alignment.

Categories: cs.AI, cs.CL, cs.CV

Type	Link
Added	Apr 13, 2026
Modified	Apr 13, 2026
