A Study of Scientific Computational Notebook Quality
A new study of 518 scientific code repositories shows a staggering reproducibility crisis in published research.
A team of researchers from UC San Diego and Carnegie Mellon University has published a sobering analysis of scientific code quality in the prestigious journal Nature. Their study, "A Study of Scientific Computational Notebook Quality," examined 518 code repositories linked from the 1,239 Nature publications of 2024. The findings reveal a severe reproducibility crisis: of the 19 Jupyter notebooks the team attempted to execute, only 2 ran successfully. The primary culprits were missing data files and unresolved dependencies, which prevent other scientists from verifying or building upon published results.
The problems extend far beyond reproducibility. The analysis found rampant code duplication, with 326 distinct "clone classes" of at least 10 identical lines appearing across 637 of the 1,510 notebooks studied. These clones often involved fundamental tasks like data visualization and statistical analysis. Furthermore, mutation analysis showed that notebooks frequently suffer from "tangled state changes," where the order of code execution drastically alters outcomes, making the code difficult to understand and reason about. This combination of unreproducible, duplicated, and tangled code poses a direct threat to the pace and integrity of scientific discovery.
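The "tangled state" problem can be seen in miniature with plain Python. The sketch below (a hypothetical illustration, not the study's own analysis code) models two notebook cells as functions that share a namespace: re-executing the second cell, as notebook users routinely do, silently changes the result.

```python
# Two "cells" sharing mutable state, as Jupyter cells share a kernel.

def cell_load(ns):
    # Cell 1: load the raw data.
    ns["data"] = [1, 2, 3]

def cell_scale(ns):
    # Cell 2: scale in place — it both reads and overwrites `data`,
    # so its effect depends on how many times it has already run.
    ns["data"] = [x * 2 for x in ns["data"]]

# A fresh top-to-bottom run:
ns = {}
cell_load(ns)
cell_scale(ns)
once = ns["data"]   # [2, 4, 6]

# An interactive session where the scaling cell is re-executed:
cell_scale(ns)
twice = ns["data"]  # [4, 8, 12] — same notebook, different result
```

Because the final state depends on execution history rather than on the code as written, a reader (or a reproducer) running the notebook top to bottom may get results that differ from the published ones.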
The researchers employed a multi-faceted methodology, manually attempting to reproduce notebooks, reviewing documentation, and analyzing code clones and mutation patterns. Their curated corpus came directly from Code Availability statements in Nature papers, making it a representative sample of top-tier published research software. The study concludes that the scientific community urgently needs improved tools, better abstractions, and stronger incentives to create software that is truly reproducible, readable, and reusable.
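The clone analysis the study describes can be approximated with a simple sliding-window comparison. The sketch below is a toy stand-in for the authors' clone detector, under the assumption (stated in the article) that a clone class is a group of notebooks sharing at least 10 identical lines; all names here are illustrative.

```python
from collections import defaultdict

def clone_classes(files, window=10):
    """Group files that share any `window`-line sequence.

    `files` maps a filename to its source text. Any window of
    `window` consecutive non-blank lines that appears in more than
    one file defines a (toy) clone class.
    """
    seen = defaultdict(set)  # tuple of lines -> filenames containing it
    for name, text in files.items():
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        for i in range(len(lines) - window + 1):
            seen[tuple(lines[i:i + window])].add(name)
    return {w: names for w, names in seen.items() if len(names) > 1}

# Toy corpus: two files share a 10-line block, a third is unrelated.
shared = "\n".join(f"step_{i} = {i}" for i in range(10))
files = {
    "nb_a.py": shared + "\nprint('a')",
    "nb_b.py": shared + "\nprint('b')",
    "nb_c.py": "print('unrelated')",
}
clones = clone_classes(files)  # one clone class: {'nb_a.py', 'nb_b.py'}
```

Production clone detectors normalize identifiers and tokenize before comparing, but even this line-level version captures the core idea: widely duplicated windows of code mark tasks that could have been shared as a library instead.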
- Only 2 of 19 Jupyter notebooks from Nature publications were reproducible, mostly due to missing data and dependencies.
- Researchers found 326 clone classes of duplicated code across 637 notebooks, suggesting that common tasks are copied and pasted rather than abstracted into reusable code.
- Mutation analysis revealed widespread "tangled state" issues in notebooks, complicating comprehension and verification of results.
Why It Matters
This reproducibility crisis slows scientific progress and undermines trust in published findings, demanding better tools and practices.