GRASP: Grounded CoT Reasoning with Dual-Stage Optimization for Multimodal Sarcasm Target Identification

https://arxiv.org/abs/2604.08879

Summary

Introduces GRASP, a framework that uses chain-of-thought reasoning to identify sarcasm targets across text and images. Instead of just detecting “is this sarcastic?”, it pinpoints exactly which phrase or image region is being mocked, using a dual-stage training pipeline with grounded cross-modal alignment.

Categories: cs.AI, cs.CL, cs.CV


Type Link
Added Apr 13, 2026
Modified Apr 13, 2026