Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results
Can LLMs write code to replicate studies without seeing original code?
A team of researchers led by Benjamin Kohler has developed an agentic system that can reproduce empirical social science results by reading a paper's methods description and using only the original data—without access to the original code, results, or even the full paper. The system extracts structured methods descriptions, runs reimplementations under strict information isolation, and enables deterministic cell-level comparison of reproduced outputs to original results. An error attribution step traces discrepancies through the system chain to identify root causes.
In an evaluation of four agent scaffolds and four LLMs on 48 papers with human-verified reproducibility, the agents largely recovered published results, though performance varied substantially across models, scaffolds, and papers. Root cause analysis revealed that failures stem from both agent errors and underspecification in the papers themselves. This work opens new possibilities for automating reproducibility checks and improving scientific rigor.
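The deterministic cell-level comparison step can be pictured as checking each numeric cell of a reproduced results table against the original within a tolerance. A minimal sketch, assuming equally shaped tables and illustrative tolerance values (the function name and tolerances are my assumptions, not the authors' implementation):

```python
import math

def cells_match(original, reproduced, rel_tol=1e-2, abs_tol=1e-4):
    """Return per-cell match flags for two equally shaped result tables.

    Floats are compared within relative/absolute tolerance; other
    values (counts, labels) must match exactly.
    """
    flags = []
    for orig_row, repro_row in zip(original, reproduced):
        row_flags = []
        for o, r in zip(orig_row, repro_row):
            if isinstance(o, float) or isinstance(r, float):
                row_flags.append(
                    math.isclose(float(o), float(r),
                                 rel_tol=rel_tol, abs_tol=abs_tol)
                )
            else:
                row_flags.append(o == r)  # exact match for non-floats
        flags.append(row_flags)
    return flags

# Example: one coefficient reproduced within tolerance, one clearly off.
orig = [[0.132, 0.045], [1200, 0.71]]
repl = [[0.131, 0.045], [1200, 0.64]]
print(cells_match(orig, repl))  # → [[True, True], [True, False]]
```

A per-cell (rather than per-table) verdict is what makes fine-grained error attribution possible: a single mismatched coefficient can be traced back through the pipeline instead of failing the whole paper.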
- System extracts structured methods from papers and reimplements code without seeing original code or results
- Tested 4 agent scaffolds and 4 LLMs across 48 social science papers with human-verified reproducibility
- Failures caused by both agent errors and underspecified methods in papers
Why It Matters
Automating reproducibility checks could dramatically speed up validation of social science research.