An Empirical Investigation of Pre-Trained Deep Learning Model Reuse in the Scientific Process
First large-scale analysis shows AI model reuse is transforming scientific research, led by biochemistry.
A team of researchers from Purdue University and other institutions has published the first large-scale empirical study of how pre-trained deep learning models (PTMs) such as BERT, GPT variants, and vision transformers are being reused in the natural sciences. Using an automated large language model pipeline, they analyzed 17,511 peer-reviewed, open-access papers published between January 2000 and December 2025 to quantify adoption patterns and impact. The study provides concrete data on a practice that has become widespread but remains poorly documented.
The analysis revealed three key findings. First, the field of 'Biochemistry, Genetics and Molecular Biology' has outpaced all other natural science disciplines in adopting and reusing PTMs, indicating a strong shift toward computational methods in the life sciences. Second, 'adaptation' reuse, in which scientists fine-tune existing models on their own datasets, is the most prevalent pattern across all fields, far more common than simple conceptual reuse or full deployment. Third, PTM integration has had its greatest impact on the 'Test' stage of the scientific process, enabling high-throughput, data-driven experimentation and analysis that was previously cost-prohibitive.
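To make the three findings concrete, here is a minimal sketch of the kind of tally that could sit at the end of such a pipeline, once each paper has been labeled with a field, a reuse type, and a stage of the scientific process. It is not the authors' code. The records are toy data invented for illustration, the `tally` helper is hypothetical, and every field and stage name other than 'Biochemistry, Genetics and Molecular Biology' and 'Test' is an assumption.

```python
from collections import Counter

# Hypothetical per-paper labels, as an LLM classification step might emit them.
# Toy data for illustration only; these are NOT results from the study.
papers = [
    {"field": "Biochemistry, Genetics and Molecular Biology",
     "reuse": "adaptation", "stage": "Test"},
    {"field": "Biochemistry, Genetics and Molecular Biology",
     "reuse": "deployment", "stage": "Test"},
    {"field": "Physics and Astronomy",   # assumed field name
     "reuse": "adaptation", "stage": "Test"},
    {"field": "Chemistry",               # assumed field name
     "reuse": "conceptual", "stage": "Hypothesis"},  # assumed stage name
]


def tally(records, key):
    """Count how many papers carry each value of the given label."""
    return Counter(record[key] for record in records)


field_counts = tally(papers, "field")
reuse_counts = tally(papers, "reuse")
stage_counts = tally(papers, "stage")

# Report the most common value along each dimension.
for name, counts in [("field", field_counts),
                     ("reuse type", reuse_counts),
                     ("stage", stage_counts)]:
    value, count = counts.most_common(1)[0]
    print(f"Top {name}: {value} ({count} of {len(papers)} papers)")
```

Counting along each dimension separately mirrors how the findings are stated: one leading field, one dominant reuse pattern, and one most-affected stage.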
This research establishes a crucial foundation for understanding the real-world implementation of AI in science. By moving beyond anecdotal evidence to analyze thousands of papers, the study shows how the prohibitive cost of training models from scratch is being circumvented, accelerating research. The findings suggest that the scientific method itself is evolving, with AI tools becoming embedded in the core workflow of testing hypotheses and analyzing results.
- Analyzed 17,511 scientific papers (2000-2025) using an LLM-driven pipeline to track AI model reuse.
- Found 'Biochemistry, Genetics and Molecular Biology' is the leading field for PTM adoption, with 'adaptation' (fine-tuning) as the dominant reuse pattern.
- PTM integration most impacts the 'Test' stage of science, enabling high-throughput, data-driven research previously limited by cost and complexity.
Why It Matters
The study provides data-driven evidence that reusing pre-trained AI models is accelerating scientific discovery, particularly in the life sciences, by lowering the cost and technical barriers to applying deep learning.