
EviSearch: A Human in the Loop System for Extracting and Auditing Clinical Evidence for Systematic Reviews

New multi-agent system extracts data from trial PDFs with page-level provenance for audit trails.

Deep Dive

Naman Ahuja and seven co-authors have developed EviSearch, a multi-agent AI system for extracting clinical evidence for systematic reviews. Instead of working from text parsed out of a document, the system processes native trial PDFs directly, preserving complex layouts, tables, and figures, and produces structured, ontology-aligned evidence tables. Its core design pairs a PDF-query agent with a retrieval-guided search agent and passes both agents' outputs to a reconciliation module. Whenever the two agents disagree, this module requires a human reviewer to verify the value against the cited page. The goal is high-precision extraction across text, tables, and figures, with reviewer-actionable provenance attached to every data point.
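The reconciliation step can be pictured as a per-cell comparison. The sketch below is a minimal illustration under stated assumptions, not EviSearch's actual implementation: it assumes each agent returns a value plus the page it cites, treats two answers as agreeing when they match after ignoring case and whitespace, and routes any mismatch to a reviewer along with both cited pages. The names `Candidate`, `CellResult`, `reconcile`, and the field labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One agent's answer for a single evidence-table cell (hypothetical schema)."""
    value: str
    page: int    # 1-based PDF page the agent cites as provenance
    agent: str   # hypothetical label, e.g. "pdf_query" or "search"


@dataclass
class CellResult:
    field: str
    value: Optional[str]  # None until a human resolves a disagreement
    pages: List[int]      # pages a reviewer should check
    needs_review: bool


def normalize(value: str) -> str:
    """Assumed normalization: ignore case and whitespace, so '42 %' equals '42%'."""
    return "".join(value.lower().split())


def reconcile(field: str, a: Candidate, b: Candidate) -> CellResult:
    """Accept agreeing answers; flag disagreements for page-level human review."""
    pages = sorted({a.page, b.page})
    if normalize(a.value) == normalize(b.value):
        return CellResult(field, a.value, pages, needs_review=False)
    # Disagreement: commit nothing and point the reviewer at both cited pages.
    return CellResult(field, None, pages, needs_review=True)


agree = reconcile("median_os_months",
                  Candidate("18.2", 6, "pdf_query"), Candidate("18.2", 7, "search"))
conflict = reconcile("n_randomized",
                     Candidate("512", 4, "pdf_query"), Candidate("498", 9, "search"))
print(agree.value, agree.needs_review, agree.pages)           # 18.2 False [6, 7]
print(conflict.value, conflict.needs_review, conflict.pages)  # None True [4, 9]
```

Keeping the cited pages on every result, including agreed ones, is what makes each cell auditable after the fact; a reviewer never has to search the PDF to check a value.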

EviSearch is built for safety and auditability in high-stakes medical settings. Every reconciler decision and every subsequent reviewer edit is logged, and these logs form structured preference and supervision signals that can drive iterative, closed-loop model improvement. On a clinician-curated benchmark of oncology trial papers, EviSearch achieved substantially higher extraction accuracy than strong parsed-text baselines while providing comprehensive attribution coverage. The system targets "living" systematic reviews, which are updated continuously as new trials are published. Its aims are to reduce the heavy manual curation burden on clinicians and researchers and to offer a verifiable, trustworthy way to bring large language model (LLM) capabilities into critical evidence-synthesis pipelines.
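One way the logged reviewer edits could become training data is sketched below. The record format is an assumption, not the paper's: the reviewer's committed value becomes a supervision target, and each agent answer the reviewer overrode becomes the "rejected" side of a preference pair. `ReviewEvent` and `to_training_records` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReviewEvent:
    """One logged human decision on a flagged cell (hypothetical log schema)."""
    paper_id: str
    field: str
    candidates: Tuple[str, str]  # values proposed by the two agents
    final_value: str             # value the reviewer committed
    page: int                    # page the reviewer verified against


def to_training_records(events: List[ReviewEvent]) -> List[Dict]:
    """Convert review logs into supervision targets and preference pairs."""
    records = []
    for e in events:
        # Supervision signal: the reviewer-verified value is the target for this cell.
        records.append({"type": "supervision", "paper_id": e.paper_id,
                        "field": e.field, "target": e.final_value, "page": e.page})
        # Preference signal: the verified value is preferred over each agent
        # answer the reviewer overrode (deduplicated, sorted for stable output).
        for wrong in sorted(set(e.candidates) - {e.final_value}):
            records.append({"type": "preference", "paper_id": e.paper_id,
                            "field": e.field, "chosen": e.final_value,
                            "rejected": wrong})
    return records


events = [
    ReviewEvent("trial-001", "n_randomized", ("512", "498"), "512", 4),
    ReviewEvent("trial-002", "median_os_months", ("18.2", "18.4"), "18.3", 11),
]
records = to_training_records(events)
for r in records:
    print(json.dumps(r))  # one JSON object per line (JSONL)
```

Writing one JSON object per line keeps the log append-only and easy to stream into a later fine-tuning or preference-optimization job; the second event shows that a reviewer correction matching neither agent still yields two preference pairs.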

Key Points
  • Uses a multi-agent design: a PDF-query agent and a retrieval-guided search agent, plus a reconciliation module that sends disagreements to a human for verification.
  • Generates per-cell provenance for audit trails, so clinicians can inspect and correct every extracted data point.
  • Substantially improves accuracy over parsed-text baselines on an oncology trial benchmark and produces structured data for bootstrapping future model training.

Why It Matters

Could substantially accelerate medical research, and potentially drug approvals, by automating data extraction, the most tedious part of evidence synthesis, while keeping human verification built in.