AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer's Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study
The new LLM agent achieved 84.9% accuracy across 10,303 cases and improved clinician accuracy by 6%-11% in a reader study.
A research team of Wenlong Hou and 18 co-authors has introduced AD-CARE, an AI framework for diagnosing Alzheimer's Disease (AD) in real-world clinical settings. The system is a modality-agnostic, guideline-grounded Large Language Model (LLM) agent that orchestrates specialized diagnostic tools to generate comprehensive, report-style outputs from incomplete and heterogeneous patient data (such as medical images, cognitive test results, and patient history) without imputing the missing information. In a multi-cohort assessment involving 10,303 cases, AD-CARE achieved an overall diagnostic accuracy of 84.9%, a 4.2% to 13.7% relative improvement over existing baseline methods. Accuracy on the individual datasets ranged from 80.4% to 98.8%, and the agent outperformed every tested baseline.
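The idea of running tools only on the modalities a patient actually has, rather than imputing the missing ones, can be illustrated with a minimal sketch. This is not AD-CARE's implementation: the tool functions, the `TOOLS` registry, the modality names, and the sample case are all hypothetical, and a real agent would let the LLM decide which tools to call and how to weigh their outputs against clinical guidelines.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for specialized diagnostic tools (assumption, not from the paper).
def read_mri(value: str) -> str:
    return f"MRI finding: {value}"

def score_cognition(value: str) -> str:
    return f"Cognitive test finding: {value}"

def summarize_history(value: str) -> str:
    return f"History finding: {value}"

# Hypothetical registry mapping each modality to the tool that handles it.
TOOLS: Dict[str, Callable[[str], str]] = {
    "mri": read_mri,
    "cognitive_test": score_cognition,
    "history": summarize_history,
}

def collect_findings(case: Dict[str, Optional[str]]) -> List[str]:
    """Run only the tools whose modality is present in the case."""
    findings = []
    for modality, tool in TOOLS.items():
        value = case.get(modality)
        if value is None:
            continue  # a missing modality is skipped, never filled in
        findings.append(tool(value))
    return findings

def build_report(case: Dict[str, Optional[str]]) -> str:
    """Assemble a report-style summary that also states what was unavailable."""
    lines = ["Findings:"] + [f"- {f}" for f in collect_findings(case)]
    missing = [m for m in TOOLS if case.get(m) is None]
    if missing:
        lines.append("Not available: " + ", ".join(missing))
    return "\n".join(lines)

# Illustrative case with no MRI on file (made-up values).
example_case = {
    "mri": None,
    "cognitive_test": "MMSE 22/30",
    "history": "memory complaints for 2 years",
}
print(build_report(example_case))
```

The report lists the unavailable modality explicitly, so a reader can see which evidence the conclusion does and does not rest on.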
Beyond raw accuracy, AD-CARE showed promise in addressing fairness and clinical workflow integration. The agent substantially reduced performance disparities across patient demographics, decreasing the average dispersion of four key metrics by 21%-68% for racial subgroups and 28%-51% for age subgroups. In a controlled reader study with neurologists and radiologists, access to AD-CARE's outputs improved clinician diagnostic accuracy by 6%-11% and more than halved decision-making time. The framework also proved backbone-agnostic, delivering performance gains of 2.29%-10.66% across eight different underlying LLMs (such as GPT-4 and Claude) and narrowing the performance gap between them. These results position AD-CARE as more than a research prototype: a scalable tool that could be integrated into routine clinical workflows to provide multimodal decision support.
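To make the fairness figures concrete, here is a small sketch of how a reduction in average dispersion could be computed. The summary does not say how dispersion is defined, so this assumes the population standard deviation of each metric across subgroups, averaged over metrics. The metric names and all scores below are invented for illustration and are not results from the study.

```python
from statistics import mean, pstdev
from typing import Dict, List

def average_dispersion(per_metric: Dict[str, List[float]]) -> float:
    """Mean, over metrics, of the standard deviation across subgroups.

    Assumption: dispersion = population standard deviation of subgroup scores.
    """
    return mean(pstdev(scores) for scores in per_metric.values())

def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline

# Made-up per-subgroup scores (three subgroups per metric), for illustration only.
baseline_scores = {"accuracy": [0.70, 0.80, 0.90], "f1": [0.60, 0.70, 0.80]}
agent_scores = {"accuracy": [0.80, 0.85, 0.90], "f1": [0.70, 0.75, 0.80]}

reduction = relative_reduction(
    average_dispersion(baseline_scores),
    average_dispersion(agent_scores),
)
print(f"Average dispersion reduced by {reduction:.0%}")
```

A lower dispersion means the subgroups' scores sit closer together, which is the sense in which the disparity figures above describe a fairness gain.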
- Achieved 84.9% diagnostic accuracy across 10,303 real-world cases from six cohorts, a 4.2%-13.7% relative improvement over baselines.
- Reduced clinician decision time by over 50% and boosted their diagnostic accuracy by 6%-11% in a controlled reader study.
- Decreased performance disparity by 21%-68% across racial subgroups and 28%-51% across age subgroups, while handling incomplete patient data without imputing missing modalities.
Why It Matters
AD-CARE is a step toward deployable AI clinical assistants that improve diagnostic accuracy, save clinicians' time, and promote healthcare equity.