Reproducibility and Artifact Consistency of the SIGIR 2022 Recommender Systems Papers Based on Message Passing
New analysis reveals widespread methodological errors and data leakage in influential AI research papers.
A new study by researchers Maurizio Ferrari Dacrema and Michael Benigni has exposed serious reproducibility issues in influential AI research. The team analyzed 10 graph-based recommender system papers, primarily from SIGIR 2022, that rely on message passing over graph neural networks and learned embeddings for recommendation. Their investigation reveals three critical problems: widespread data leakage between training and testing sets, inconsistencies between the published code and the descriptions in the papers, and the use of artificially weak baselines that create the illusion of progress.
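One common form of the data leakage mentioned above occurs when test interactions are not fully held out from training, letting a model effectively memorize the answers it is later evaluated on. A minimal sketch of a split-overlap check, using toy data and hypothetical function names (not artifacts from the study):

```python
def find_leaked_pairs(train, test):
    """Return (user, item) interactions that appear in both splits.

    Any pair listed here is leaked: the model saw it during training
    and is also scored on it at test time.
    """
    train_pairs = set(train)
    return sorted(p for p in set(test) if p in train_pairs)

# Toy interaction data: (user_id, item_id) pairs.
train = [(1, 10), (1, 11), (2, 10), (3, 12)]
test = [(1, 11), (2, 13), (3, 12)]  # (1, 11) and (3, 12) leak

print(find_leaked_pairs(train, test))  # -> [(1, 11), (3, 12)]
```

A non-empty result means the evaluation overstates the model's ability to generalize, which is one of the errors the study reports finding in published artifacts.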
On the Amazon-Book dataset in particular, the researchers found that reported state-of-the-art results have actually worsened over time, even though each paper claims an improvement. When they attempted to reproduce the experiments, they were unable to confirm most of the claims made in the original publications. This represents a significant challenge for the field, as these papers have already influenced subsequent work presented at SIGIR 2023 and beyond.
The study highlights a growing reproducibility crisis in AI research, in which complex models are often compared against artificially weak baselines rather than well-tuned, simpler established methods. This practice creates misleading narratives of progress while potentially obscuring genuine innovation. The researchers call for greater methodological rigor, better artifact documentation, and more honest baseline comparisons to ensure scientific integrity in recommender systems research.
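The "simpler established methods" at issue include non-personalized baselines such as recommending the globally most popular items, which well-tuned evaluations use as a sanity check for complex models. A minimal sketch of such a baseline, on toy data with hypothetical names (not code from the study):

```python
from collections import Counter

def top_popular(train_interactions, k):
    """Recommend the k globally most frequent items (non-personalized).

    Any learned model worth publishing should clearly beat this on a
    fair split; if it does not, the reported gains are suspect.
    """
    counts = Counter(item for _, item in train_interactions)
    return [item for item, _ in counts.most_common(k)]

# Toy (user_id, item_id) interactions: item 10 occurs 3x, 11 2x, 12 1x.
train = [(1, 10), (2, 10), (3, 10), (1, 11), (2, 11), (3, 12)]
print(top_popular(train, 2))  # -> [10, 11]
```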
- Analysis of 10 SIGIR 2022 papers revealed data leakage and methodological errors
- Performance on Amazon-Book dataset has worsened despite claims of improvement
- Researchers could not reproduce most papers' claims due to inconsistent artifacts
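The performance claims in the bullets above are typically stated in top-k ranking metrics such as Recall@k. A minimal sketch of that metric, with toy values rather than figures from the study:

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of a user's relevant items found in the top-k list."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

# Top-2 list contains 1 of the user's 2 relevant items -> 0.5.
print(recall_at_k([10, 11, 13], {11, 12}, k=2))  # -> 0.5
```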
Why It Matters
Reproducibility issues undermine AI research credibility and waste resources on flawed approaches.