AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI
An open-source AI agent pipeline achieves human-level accuracy while delivering a 58x speed-up for systematic reviews.
A collaborative research team from institutions including the University of Oxford and MIT has published a paper on arXiv describing AgentSLR, an open-source agentic AI pipeline designed to fully automate systematic literature reviews (SLRs) of epidemiological studies. The system covers the entire workflow, from article retrieval and screening to data extraction and final report synthesis. In a validation against expert-curated ground truth data covering nine WHO-designated priority pathogens, AgentSLR achieved performance comparable to that of human researchers. The headline result is a 58x reduction in time: a process that typically takes around 7 weeks was completed in approximately 20 hours.
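The four stages named above form a sequential pipeline in which each stage consumes the previous stage's output. The sketch below illustrates only that structure. The `Record` schema, the stage functions, and the keyword rules standing in for LLM agents are all hypothetical and are not AgentSLR's actual code or API. The corpus is placeholder text, not real study data.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One candidate article moving through the pipeline (hypothetical schema)."""
    pmid: str
    text: str
    included: bool = False
    extracted: dict = field(default_factory=dict)


def retrieve(corpus: dict[str, str], query: str) -> list[Record]:
    # Stage 1, retrieval: a case-insensitive keyword match stands in for a search agent.
    return [Record(pmid, text) for pmid, text in corpus.items()
            if query.lower() in text.lower()]


def screen(records: list[Record], must_mention: str) -> list[Record]:
    # Stage 2, screening: a single inclusion rule stands in for an LLM screening agent.
    for r in records:
        r.included = must_mention.lower() in r.text.lower()
    return [r for r in records if r.included]


def extract(records: list[Record], field_name: str) -> list[Record]:
    # Stage 3, data extraction: reads "key=value" tokens as a stand-in for an
    # LLM agent filling in a structured extraction form.
    for r in records:
        for token in r.text.split():
            key, sep, value = token.partition("=")
            if sep and key == field_name:
                r.extracted[field_name] = float(value)
    return records


def synthesize(records: list[Record], field_name: str) -> str:
    # Stage 4, synthesis: aggregates the extracted values into one report line.
    values = [r.extracted[field_name] for r in records if field_name in r.extracted]
    if not values:
        return f"No {field_name} values extracted."
    mean = sum(values) / len(values)
    return f"{len(values)} studies report {field_name}; mean = {mean:.2f}"


def run_pipeline(corpus: dict[str, str], query: str,
                 must_mention: str, field_name: str) -> str:
    # Each stage's output feeds the next, mirroring retrieval -> synthesis.
    found = retrieve(corpus, query)
    included = screen(found, must_mention)
    extracted = extract(included, field_name)
    return synthesize(extracted, field_name)


# Placeholder corpus: made-up text, not real epidemiological findings.
corpus = {
    "1": "Pathogen-A outbreak study R0=1.4 in humans",
    "2": "Pathogen-A animal reservoir survey",
    "3": "Pathogen-A transmission model R0=1.8 in humans",
    "4": "Pathogen-B serosurvey R0=0.5 in humans",
}
print(run_pipeline(corpus, "Pathogen-A", "humans", "R0"))
# prints: 2 studies report R0; mean = 1.60
```

Keeping each stage as a separate function makes it possible to swap the rule-based stand-ins for model-backed agents one stage at a time, and to inspect intermediate outputs, which is where human-in-the-loop checks would naturally attach.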
The study also compared five frontier large language models (LLMs) as the engines behind the agents and found that performance on complex SLR tasks depends less on raw model size or inference cost than on each model's particular reasoning and instruction-following capabilities. Through human-in-the-loop validation, the researchers identified key failure modes, providing a roadmap for future improvements. The findings indicate that orchestrating multiple AI agents (AI systems that can take sequential actions) can relieve bottlenecks in evidence-based policy and research, offering a scalable approach to a critical, time-intensive scientific process. Because the pipeline is open-source, it can be adapted to specialized fields beyond epidemiology.
- Achieves human-comparable accuracy in automating the full SLR workflow, from retrieval to synthesis.
- Delivers a 58x speed-up, reducing a typical 7-week review process to just 20 hours.
- Performance depends on specific LLM capabilities, not just model size, as shown in a 5-model comparison.
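The 58x figure can be checked against the 7-week and 20-hour numbers. The sketch below assumes both durations are elapsed calendar time (one week = 7 × 24 hours); the function name `speedup` is illustrative and does not come from the paper.

```python
# Assumption: "7 weeks" means 7 calendar weeks of elapsed time.
HOURS_PER_WEEK = 7 * 24


def speedup(human_weeks: float, agent_hours: float) -> float:
    """Ratio of elapsed human review time to elapsed agent run time."""
    return (human_weeks * HOURS_PER_WEEK) / agent_hours


print(f"{speedup(7, 20):.1f}x")  # prints: 58.8x
```

Under that assumption the ratio is about 58.8, consistent with the reported 58x. Counting the 7 weeks as 40-hour working weeks instead would give 280 / 20 = 14x, so the headline figure appears to compare elapsed time rather than hours of effort.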
Why It Matters
This could dramatically accelerate evidence-based policy and medical research, turning weeks of review work into about a day.