Mirończuk's meta-analysis: multimodal fusion boosts document classification accuracy by 5.28 points
First quantitative synthesis of 139 studies reveals a 5.28-percentage-point accuracy gain from multimodal fusion.
A new systematic review by Marcin Mirończuk, published on arXiv, provides the first quantitative synthesis of information fusion techniques for document classification. Drawing on 139 primary studies, the paper introduces a formal framework to unify the fragmented field and performs a random-effects meta-analysis. The results show that multimodal fusion, which combines multiple data sources such as text, images, and metadata, improves classification accuracy by an average of 5.28 percentage points, a statistically significant gain (p=0.0016). Multiview fusion, which uses different representations of the same data, yields more modest but consistent gains: +4.67% for accuracy, +3.08% for F1-score, and statistically significant improvements in recall (all p<0.05).
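For readers unfamiliar with the method, the sketch below shows roughly how a random-effects meta-analysis pools per-study gains into one average. It assumes the common DerSimonian-Laird estimator, which the summary above does not confirm the review uses. The function name `random_effects_pool` and the input numbers are hypothetical and are not data from the paper.

```python
import math

def random_effects_pool(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird estimator.

    effects   -- per-study accuracy gains (fusion minus baseline), in points
    variances -- sampling variance of each study's gain
    Returns (pooled_effect, standard_error, tau_squared, two_sided_p).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures how much the studies disagree with each other.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to every study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sws
    se = math.sqrt(1.0 / sws)
    # Two-sided p-value from a normal approximation.
    p = math.erfc(abs(pooled / se) / math.sqrt(2))
    return pooled, se, tau2, p

if __name__ == "__main__":
    # Hypothetical gains and variances, for illustration only.
    gains = [6.1, 3.2, 7.4, 4.8, 5.0]
    variances = [2.0, 1.5, 3.0, 2.5, 1.0]
    pooled, se, tau2, p = random_effects_pool(gains, variances)
    print(f"pooled gain = {pooled:.2f} points, SE = {se:.2f}, "
          f"tau^2 = {tau2:.2f}, p = {p:.4f}")
```

Unlike a fixed-effect model, this approach lets the true gain vary from study to study, which suits a literature where datasets, modalities, and baselines all differ.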
However, the review uncovers serious methodological gaps: only 11.8% of multimodal studies and 23.3% of multiview studies used statistical tests to validate their findings, undermining the reliability of many reported results. Mirończuk emphasizes that successful information fusion depends less on algorithmic complexity and more on strategically aligning the fusion method with the task context, alongside a commitment to rigorous validation. This work provides practitioners with a unifying framework, quantitative evidence, and data-driven guidelines for designing effective document classification systems.
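As an illustration of the kind of validation the review finds missing, the sketch below runs an exact paired sign-flip permutation test on per-fold accuracies of a fused model and a single-source baseline. This is one possible test, not a procedure taken from the review. The function name `paired_permutation_test` and the fold scores are hypothetical.

```python
from itertools import product

def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test for paired scores.

    Under the null hypothesis that both models perform equally, the sign
    of each paired difference is arbitrary, so every sign pattern is
    enumerated and compared with the observed total difference.
    Enumeration costs 2**n patterns, so this suits a small number of folds.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / total

if __name__ == "__main__":
    # Hypothetical 10-fold accuracies (percent), for illustration only.
    fused = [88.1, 87.4, 89.0, 86.9, 88.5, 87.8, 88.9, 87.2, 88.0, 88.6]
    text_only = [84.0, 85.1, 83.7, 84.9, 85.3, 83.2, 84.4, 85.0, 83.9, 84.6]
    print(f"p = {paired_permutation_test(fused, text_only):.4f}")
```

A test like this needs only the per-fold scores that most experiments already produce, so reporting it alongside an accuracy gain costs little.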
- Meta-analysis of 139 studies shows multimodal fusion improves accuracy by +5.28 percentage points (p=0.0016).
- Multiview fusion yields consistent but smaller gains: +4.67% accuracy, +3.08% F1-score (all p<0.05).
- Only 11.8% of multimodal and 23.3% of multiview studies used statistical tests, which undermines the reliability of many reported results.
Why It Matters
Provides the first quantitative evidence base for choosing fusion strategies, steering practitioners away from algorithmic complexity and toward task-aligned fusion backed by rigorous validation.