GRASP: Grounded CoT Reasoning with Dual-Stage Optimization for Multimodal Sarcasm Target Identification
https://arxiv.org/abs/2604.08879
Summary
Introduces GRASP, a framework that uses chain-of-thought reasoning to identify sarcasm targets across text and images. Instead of just detecting "is this sarcastic?", it pinpoints exactly which phrase or image region is being mocked, using a dual-stage training pipeline with grounded cross-modal alignment.
Categories: cs.AI, cs.CL, cs.CV
| Type | Link |
| Added | Apr 13, 2026 |
| Modified | Apr 13, 2026 |