Uncertainty-Aware Estimation of Mis/Disinformation Prevalence on Social Media
Study analyzes 6 social platforms across 4 EU countries, revealing measurement challenges in estimating misinformation prevalence.
A team of researchers led by Ishari Amarasinghe has published a methodological paper titled 'Uncertainty-Aware Estimation of Mis/Disinformation Prevalence on Social Media' on arXiv. The study addresses a gap in misinformation research by proposing a framework for quantifying measurement uncertainties that are typically overlooked. The researchers analyzed data collected between March and April 2025 from six major social platforms (Facebook, Instagram, LinkedIn, TikTok, X/Twitter, and YouTube) across four EU Member States: France, Poland, Slovakia, and Spain. All data were annotated by professional fact-checkers.
The methodology focuses on three distinct sources of uncertainty: sample uncertainty (from limited data), annotation uncertainty (from human disagreement and misclassification), and data retrieval uncertainty (from keyword-based collection methods). The team used confidence intervals, simulation-based methods, and bootstrapping techniques to quantify these uncertainties both separately and jointly. Their key finding is that keyword-based data retrieval, a common approach in misinformation studies, can introduce variability that exceeds baseline measurement uncertainty, leading to wider confidence intervals around prevalence estimates.
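To make the idea concrete, here is a minimal Python sketch of how sample uncertainty and keyword-driven retrieval uncertainty could be compared. It is not the authors' code. The data are synthetic, the keyword names and counts are hypothetical, and the choice of a Wilson interval and a percentile bootstrap is an assumption: the paper is described only as using confidence intervals, simulation, and bootstrapping.

```python
import math
import random

def wilson_interval(positives, n, z=1.96):
    """Wilson score interval for a proportion (sample uncertainty only)."""
    p = positives / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def percentile(sorted_vals, q):
    """Simple percentile lookup on a pre-sorted list, q in [0, 1]."""
    idx = int(round(q * (len(sorted_vals) - 1)))
    return sorted_vals[idx]

def bootstrap_items(labels, n_boot=2000, seed=0):
    """Percentile bootstrap that resamples individual posts."""
    rng = random.Random(seed)
    n = len(labels)
    stats = sorted(sum(rng.choices(labels, k=n)) / n for _ in range(n_boot))
    return percentile(stats, 0.025), percentile(stats, 0.975)

def bootstrap_keywords(by_keyword, n_boot=2000, seed=0):
    """Percentile bootstrap that resamples whole keywords, then pools posts."""
    rng = random.Random(seed)
    keys = list(by_keyword)
    stats = []
    for _ in range(n_boot):
        drawn = rng.choices(keys, k=len(keys))
        pooled = [label for k in drawn for label in by_keyword[k]]
        stats.append(sum(pooled) / len(pooled))
    stats.sort()
    return percentile(stats, 0.025), percentile(stats, 0.975)

# Synthetic example (hypothetical): 8 keywords, 50 annotated posts each,
# with 1 = flagged as mis/disinformation and 0 = not flagged.
positives_per_keyword = [1, 2, 3, 5, 10, 15, 20, 25]
by_keyword = {
    f"keyword_{i}": [1] * pos + [0] * (50 - pos)
    for i, pos in enumerate(positives_per_keyword)
}
all_labels = [label for labels in by_keyword.values() for label in labels]

point = sum(all_labels) / len(all_labels)
wilson = wilson_interval(sum(all_labels), len(all_labels))
item_ci = bootstrap_items(all_labels)
kw_ci = bootstrap_keywords(by_keyword)

print(f"point estimate:         {point:.3f}")
print(f"Wilson interval:        {wilson[0]:.3f} to {wilson[1]:.3f}")
print(f"bootstrap over posts:   {item_ci[0]:.3f} to {item_ci[1]:.3f}")
print(f"bootstrap over keywords:{kw_ci[0]:.3f} to {kw_ci[1]:.3f}")
```

In this toy setup prevalence differs sharply between keywords, so resampling keywords yields a wider interval than resampling posts. How large that gap is in practice depends on the real data and on the retrieval procedure, neither of which this sketch reproduces.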
The framework gives researchers tools to quantify uncertainty when measuring misinformation on social media, which supports more robust analysis and makes the limitations of a given measurement explicit. The study's empirical results indicate that, without accounting for these uncertainties, misinformation prevalence estimates may be less reliable than previously assumed, which could affect the design and evaluation of mitigation strategies.
- Analyzed data from 6 social platforms (Facebook, Instagram, LinkedIn, TikTok, X/Twitter, YouTube) across 4 EU countries
- Quantified three uncertainty sources: sampling, human annotation, and keyword-based data retrieval
- Found keyword-based retrieval can widen confidence intervals beyond baseline variability
Why It Matters
The framework offers more reliable tools for measuring misinformation prevalence, which can help platforms and policymakers design and evaluate mitigation strategies.