Research & Papers

Multimodal Claim Extraction for Fact-Checking

New benchmark reveals multimodal LLMs struggle with memes and screenshots, missing key rhetorical intent.

Deep Dive

A research team from the University of Cambridge and other institutions has published a paper introducing MICE (Multimodal Intent-aware Claim Extraction), the first dedicated framework for extracting factual claims from the combination of text and images found in social media posts. The researchers created a benchmark dataset of real-world social media content annotated with gold-standard claims by professional fact-checkers, addressing a critical gap in automated fact-checking pipelines, which have traditionally focused on text-only sources.

The study reveals that state-of-the-art multimodal LLMs like GPT-4V, Claude 3, and Llama 3 struggle significantly with this task, particularly when posts contain memes, screenshots, or photos with embedded text. Under a three-part evaluation framework measuring semantic alignment, faithfulness, and decontextualization, baseline models frequently misinterpret rhetorical devices like sarcasm and irony, and fail to properly separate factual claims from opinion or humor. The MICE framework specifically addresses these shortcomings by incorporating intent-aware processing that better models the communicative goals behind multimodal content.
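To make the first of those three axes concrete, here is a toy illustration of what a "semantic alignment" check between an extracted claim and a fact-checker's gold claim might look like. This is purely a sketch and not the paper's actual metric: real evaluations would use learned sentence embeddings or an LLM judge rather than the crude bag-of-words cosine similarity used here, and the example claims are invented.

```python
# Illustrative only: a crude proxy for "semantic alignment" between an
# extracted claim and a gold-standard claim, using bag-of-words cosine
# similarity. The MICE paper's real metric is not reproduced here.
from collections import Counter
import math

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity over lowercase word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Hypothetical gold claim written by a fact-checker for a meme:
gold = "the city banned gas stoves in new buildings in 2023"
# A model's extracted claim vs. an opinion it should have filtered out:
extracted = "the city banned gas stoves in all new buildings"
opinion = "gas stoves are honestly the best way to cook"

print(round(bow_cosine(extracted, gold), 2))  # high overlap with gold
print(round(bow_cosine(opinion, gold), 2))    # low overlap with gold
```

A scorer like this rewards extracted claims that preserve the gold claim's content, and the opinion line shows why the paper's faithfulness and opinion-vs-claim distinctions matter: surface word overlap alone cannot tell a checkable claim from commentary.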

Initial results show MICE achieves 15-20% improvements over standard prompting methods on intent-critical cases, though the task remains hard overall: even the best models reach only 68% accuracy on the new benchmark. The researchers have made their dataset and framework publicly available, providing a crucial resource for developers building next-generation fact-checking tools that must contend with the multimodal nature of modern misinformation.

Key Points
  • First benchmark for multimodal claim extraction from social media posts containing text and images like memes and screenshots
  • Baseline MLLMs (GPT-4V, Claude 3) struggle with rhetorical intent, achieving only 68% accuracy on the new dataset
  • MICE framework improves performance by 15-20% on intent-critical cases through specialized intent-aware processing

Why It Matters

Enables more accurate automated fact-checking of viral memes and social media content where most modern misinformation spreads.