Code Sharing in Prediction Model Research: A Scoping Review
An LLM-powered review of 3,967 papers reveals a reproducibility crisis in AI prediction model research.
A team of 11 researchers from institutions including MIT and Stanford conducted a landmark scoping review of code-sharing practices in AI prediction model research. Using a novel LLM-assisted pipeline to screen 3,967 PubMed-indexed articles that cited TRIPOD or TRIPOD+AI reporting guidelines, they found that only 12.2% of studies included code-sharing statements. While code sharing increased to 15.8% in 2025 and was higher among TRIPOD+AI-citing studies, the overall rate remains alarmingly low for a field dependent on reproducibility.
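The paper does not publish its screening prompts here, but the core idea of an LLM-assisted screening pipeline can be sketched as a simple filter over candidate records. This is a minimal illustrative sketch, not the authors' pipeline: `classify_with_llm` is a hypothetical stand-in for a real LLM call, stubbed with a keyword heuristic so the example runs without an API key.

```python
# Minimal sketch of LLM-assisted screening (illustrative only).
from dataclasses import dataclass

@dataclass
class Record:
    pmid: str
    abstract: str

def classify_with_llm(abstract: str) -> bool:
    """Stub for a real LLM call that would judge inclusion criteria.

    A production pipeline would send the abstract plus the review's
    eligibility criteria to an LLM and parse a yes/no answer.
    """
    keywords = ("prediction model", "tripod")
    return any(k in abstract.lower() for k in keywords)

def screen(records):
    """Keep only records the classifier marks as in-scope."""
    return [r for r in records if classify_with_llm(r.abstract)]

papers = [
    Record("1", "A clinical prediction model for sepsis, reported per TRIPOD+AI."),
    Record("2", "A qualitative study of nursing workflows."),
]
print([r.pmid for r in screen(papers)])  # ['1']
```

In a real pipeline, the stubbed classifier would be replaced by a prompted LLM call, typically with a human reviewer validating a sample of its decisions.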
When code was shared, researchers found substantial gaps in reproducibility features. Their LLM assessment of repositories against 14 predefined criteria revealed that while 80.5% contained README files, only 37.6% specified dependencies and a mere 21.6% used version-constrained dependencies. Fewer than half (42.4%) had a modular structure. The study's own analysis code is publicly available, modeling the transparency it advocates.
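The distinction between merely specifying dependencies and version-constraining them is what separates a repository that installed correctly in 2025 from one that still installs correctly years later. A hedged illustration, assuming a Python project (package names and version numbers are purely illustrative):

```text
# requirements.txt — dependencies specified but unconstrained
# (counts toward the 37.6%; resolution drifts over time):
numpy
scikit-learn

# requirements.txt — version-constrained
# (counts toward the 21.6%; pins a reproducible environment):
numpy==1.26.4
scikit-learn==1.4.2
```

Pinning exact versions (or providing a lock file) lets others recreate the environment the results were produced in, rather than whatever the package index serves on the day they try.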
The findings provide the empirical foundation for TRIPOD-Code, an upcoming extension to the TRIPOD reporting guidelines focused specifically on code-sharing standards. The research underscores that simply making code available is insufficient: documentation, dependency specification, licensing, and an executable structure are equally critical for true reproducibility. This work highlights a systemic issue in AI/ML research that hampers scientific progress and the clinical translation of prediction models.
- Only 12.2% of 3,967 AI prediction model papers shared code, with rates varying widely by journal and country
- LLM assessment found just 37.6% of shared repositories specified dependencies, and only 21.6% used version constraints
- Findings will directly inform TRIPOD-Code, a new reporting guideline extension to address reproducibility gaps
Why It Matters
Poor code sharing practices hinder scientific progress and slow the translation of AI prediction models into real-world clinical applications.