NUS researchers unveil CommunityFact: a dynamic multilingual misinformation benchmark

15,992 claims across 5 languages expose gaps in LLM fact-checking abilities

Deep Dive

CommunityFact is a new dynamic benchmark for testing AI models on real-world, multilingual misinformation detection. It includes 15,992 standalone claims across five languages and two domains. The benchmark is refreshable, meaning it can be updated over time to stay current. The researchers evaluated 10 leading LLMs under varying inference configurations, including thinking modes and web search access. Results show that closed-input verification (no internet access) remains a significant challenge, while web access provides the largest accuracy gains. However, the sources that web-enabled LLMs select are systematically misaligned with the sources that human Community Notes raters converge on. According to the authors, this gap can be closed through model-specific retrieval expansion or pruning mechanisms.
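To make the source-selection misalignment concrete, one could compare the web domains a model cites for a claim against the domains cited in the corresponding Community Note. The sketch below uses Jaccard overlap, which is an illustrative choice and not necessarily the measure the authors use; the function names and example URLs are hypothetical.

```python
from urllib.parse import urlparse


def domain(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def source_overlap(model_urls, note_urls) -> float:
    """Jaccard overlap between domains cited by a model and by a note.

    Assumption: both inputs are iterables of absolute URLs for one claim.
    """
    model_domains = {domain(u) for u in model_urls}
    note_domains = {domain(u) for u in note_urls}
    union = model_domains | note_domains
    if not union:
        return 0.0  # nothing cited on either side
    return len(model_domains & note_domains) / len(union)


# Hypothetical citations for a single claim.
model_urls = ["https://www.example.org/a", "https://news.example.com/b"]
note_urls = ["https://example.org/c", "https://data.example.net/d"]
overlap = source_overlap(model_urls, note_urls)
```

A low overlap averaged over many claims would suggest the model draws on a different evidence pool than human raters do. Retrieval expansion or pruning, as described above, would then aim to raise that overlap.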

The study also highlights substantial variation across language-domain slices and across the evidence ecosystems used by web-enabled systems. Beyond evaluation, the authors propose using Community Notes (crowdsourced fact-checking labels from X/Twitter) as a training signal for claim-conditioned source suggesters, which could improve factual verification on novel claims. This positions CommunityFact as both an evaluation tool and a potential foundation for training better fact-checking AI. The work underscores the importance of dynamic, multilingual benchmarks that reflect the fast-moving nature of online misinformation.
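The variation across language-domain slices is easiest to see when accuracy is grouped by slice and not reported as a single aggregate. The following is a minimal sketch under assumed field names (`language`, `domain`, `label`, `predicted`) and placeholder slice values; it does not reflect the benchmark's actual schema or label set.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Map each (language, domain) pair to the fraction of correct predictions.

    Assumption: each record is a dict with the hypothetical keys
    'language', 'domain', 'label', and 'predicted'.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        key = (r["language"], r["domain"])
        totals[key] += 1
        correct[key] += int(r["predicted"] == r["label"])
    return {key: correct[key] / totals[key] for key in totals}


# Placeholder records; slice names and labels are illustrative only.
records = [
    {"language": "lang_a", "domain": "domain_x", "label": "misleading", "predicted": "misleading"},
    {"language": "lang_a", "domain": "domain_x", "label": "misleading", "predicted": "not_misleading"},
    {"language": "lang_b", "domain": "domain_y", "label": "not_misleading", "predicted": "not_misleading"},
]
per_slice = accuracy_by_slice(records)
```

Reporting results this way keeps a strong slice from masking a weak one, which matters when claims are unevenly distributed across languages and domains.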

Key Points
  • CommunityFact contains 15,992 claims across 5 languages and 2 domains for dynamic misinformation testing
  • Web access boosts LLM performance the most, but source selection is misaligned with human Community Notes raters
  • Community Notes are proposed as a training signal to improve source suggestion for novel claims

Why It Matters

CommunityFact provides a fresh, multilingual benchmark to reveal where LLMs still fail in real-world fact-checking scenarios.