New multi-agent AI system MeDxAgent improves interactive diagnosis accuracy by 10.3%
The system mimics physician reasoning through targeted questioning and is evaluated on 4,421 cases across 20 specialties.
Large language models are increasingly used for health decision support, yet most evaluations treat diagnosis as a single-shot, multiple-choice task. That is far from real clinical practice, where doctors refine hypotheses through interactive questioning. To bridge this gap, researchers from academia and Microsoft Research introduce MeDxAgent, a multi-agent consultation system designed for interactive, open-ended diagnosis. They also present MeDxBench, a large-scale benchmark of 4,421 clinical cases spanning 20 specialties.
The system uses a multi-agent architecture in which separate agents handle patient interaction, information gathering, and diagnosis. Three design choices stand out: collecting demographics first, passing a summarized dialogue (rather than the raw transcript) to the diagnosis agent, and feeding candidate diagnoses back into the questioning process. Together, these strategies mimic how physicians reason.
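The loop described above can be illustrated with a minimal, rule-based sketch. It is not the authors' implementation: the class names (`PatientAgent`, `InquiryAgent`, `DiagnosisAgent`), the toy `KNOWLEDGE` table, and the scoring rule are all hypothetical stand-ins for what would be LLM-backed agents in the real system. It only shows the control flow of the three design choices: demographics first, a summary handed to the diagnosis agent, and candidate diagnoses steering the next question.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: diagnosis -> supporting findings.
# Illustrative only; not medical guidance and not taken from the paper.
KNOWLEDGE = {
    "influenza": {"fever", "cough", "muscle aches"},
    "strep throat": {"fever", "sore throat", "swollen glands"},
    "migraine": {"headache", "nausea", "light sensitivity"},
}


@dataclass
class PatientAgent:
    """Answers questions from a hidden case record."""
    demographics: dict
    findings: set

    def answer(self, question: str):
        if question == "demographics":
            return self.demographics
        return question in self.findings


class DiagnosisAgent:
    """Ranks candidate diagnoses from a summary, never the raw dialogue."""

    def rank(self, summary: dict) -> list:
        present, absent = summary["present"], summary["absent"]
        scores = {
            dx: len(f & present) - len(f & absent)
            for dx, f in KNOWLEDGE.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        return sorted(scores, key=lambda dx: (-scores[dx], dx))


class InquiryAgent:
    """Chooses the next question using the current candidate ranking."""

    def next_question(self, candidates: list, asked: set):
        for dx in candidates:
            for finding in sorted(KNOWLEDGE[dx]):
                if finding not in asked:
                    return finding
        return None


def summarize(demographics: dict, present: set, absent: set) -> dict:
    """Condense the dialogue so far into a compact summary."""
    return {
        "demographics": demographics,
        "present": set(present),
        "absent": set(absent),
    }


def consult(patient: PatientAgent, max_turns: int = 4):
    inquiry, diagnoser = InquiryAgent(), DiagnosisAgent()
    demographics = patient.answer("demographics")  # demographics first
    present, absent = set(), set()
    candidates = diagnoser.rank(summarize(demographics, present, absent))
    for _ in range(max_turns):
        # Candidate diagnoses are fed back to target the next question.
        question = inquiry.next_question(candidates, present | absent)
        if question is None:
            break
        (present if patient.answer(question) else absent).add(question)
        candidates = diagnoser.rank(summarize(demographics, present, absent))
    return candidates[0], summarize(demographics, present, absent)


if __name__ == "__main__":
    case = PatientAgent(
        demographics={"age": 19, "sex": "F"},
        findings={"fever", "sore throat", "swollen glands"},
    )
    top, summary = consult(case)
    print(top, sorted(summary["present"]))
```

In this toy version the early questions are poorly targeted because every candidate starts with the same score; the ranking sharpens only as answers accumulate, which is the feedback effect the article describes.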
Results show MeDxAgent outperforms the baseline by 10.3% in accuracy, closing 52.3% of the gap to a full-information oracle, an upper-bound setting in which the full case information is available up front. Notably, the individual design choices show their full effect only in combination, suggesting that holistic pipeline design is critical. The system's ability to sequentially refine hypotheses through targeted questioning is a step toward more realistic AI-assisted diagnosis.
The paper, titled "MeDxAgent: Multi-Agent Consultation for Interactive Medical Diagnosis," is available on arXiv, with code and dataset to be released upon publication. The work has implications for building AI systems that collaborate with clinicians in real-world diagnostic workflows, potentially improving triage accuracy and reducing diagnostic errors.
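To make the "gap closed" figure concrete, here is a small arithmetic sketch. It assumes the 10.3% gain is measured in absolute percentage points, which the summary does not state explicitly. The function names are illustrative, and the baseline accuracy of 40.0 is a hypothetical placeholder, not a reported number.

```python
def gap_closed(baseline: float, system: float, oracle: float) -> float:
    """Fraction of the baseline-to-oracle accuracy gap recovered by the system."""
    if oracle <= baseline:
        raise ValueError("oracle accuracy must exceed baseline accuracy")
    return (system - baseline) / (oracle - baseline)


def implied_oracle_gap(gain: float, fraction_closed: float) -> float:
    """Baseline-to-oracle gap implied by a gain and the fraction of gap it closes."""
    return gain / fraction_closed


if __name__ == "__main__":
    # Reported figures: 10.3 gain, 52.3% of the gap closed.
    gap = implied_oracle_gap(10.3, 0.523)
    print(f"implied baseline-to-oracle gap: {gap:.1f} points")

    # Hypothetical baseline of 40.0, used only to show the formula round-trips.
    baseline = 40.0
    print(f"gap closed: {gap_closed(baseline, baseline + 10.3, baseline + gap):.3f}")
```

Under that assumption, the two reported numbers imply that the oracle sits roughly 19.7 points above the baseline, so nearly half of the achievable headroom remains.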
- MeDxAgent achieves a 10.3% accuracy gain over baseline on MeDxBench, closing 52.3% of the gap to a full-information oracle.
- The benchmark MeDxBench includes 4,421 clinical cases across 20 specialties, enabling evaluation of interactive multi-turn diagnosis.
- Design choices such as collecting demographics first and feeding candidate diagnoses back into targeted questioning mirror physician reasoning, but show their full effect only in combination.
Why It Matters
Interactive AI diagnosis that mimics clinical reasoning could improve triage and reduce misdiagnosis in real-world healthcare.