Research & Papers

GAZE: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain MRI

New agentic AI uses iterative tool calls to more than triple rare disease detection

Deep Dive

GAZE (Grounded Agentic Zero-shot Evaluation), developed by Duaa Alim, Mogtaba Alim, and Liam Chalcroft, reimagines how medical AI interprets brain MRIs. Instead of a single forward pass, GAZE mimics a radiologist by iteratively calling tools: viewer-level actions (zoom, windowing, contrast, edge detection) and retrieval from the U.S. National Library of Medicine databases (PubMed for literature, Open-i for images). Each step is validated against a structured schema, and full tool-call traces are logged for auditability.
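The iterative loop described above can be sketched as follows. This is a minimal, hypothetical illustration, not GAZE's actual API: the tool names, schema format, and function signatures are assumptions for the sake of example. The key ideas from the paper are all here, though: each proposed call is validated against a structured schema before execution, and every step is appended to a trace for auditability.

```python
# Hypothetical sketch of a GAZE-style agentic loop. Tool names and
# schemas are illustrative placeholders, not the paper's real interface.

TOOL_SCHEMAS = {
    "zoom": {"x", "y", "factor"},          # viewer-level actions
    "windowing": {"center", "width"},
    "edge_detect": set(),
    "pubmed_search": {"query"},            # literature retrieval
}

def validate(call):
    """Reject calls whose tool or argument names fall outside the schema."""
    allowed = TOOL_SCHEMAS.get(call.get("tool"))
    return allowed is not None and set(call.get("args", {})) <= allowed

def run_episode(propose_fn, execute_fn, max_steps=8):
    """Iterate: propose a tool call, validate it, execute it, log the trace."""
    trace = []
    for _ in range(max_steps):
        call = propose_fn(trace)           # model proposes next action
        if call is None:                   # model signals it has seen enough
            break
        if not validate(call):
            trace.append({"call": call, "status": "rejected"})
            continue
        trace.append({"call": call, "status": "ok",
                      "result": execute_fn(call)})
    return trace                           # full trace is retained for audit
```

Schema validation before execution is what keeps malformed or hallucinated tool calls from silently corrupting the analysis; rejected calls still land in the trace, so failures are auditable too.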

On the NOVA benchmark (906 brain MRI cases covering 281 rare neurological conditions), GAZE reached 58.2 mean average precision at an IoU threshold of 0.3 (mAP@0.3) for lesion localization and 34.9% Top-1 diagnostic accuracy under a joint protocol—all without task-specific fine-tuning. Tool use disproportionately benefited rare pathologies: the fraction of cases with IoU > 0.3 rose from 17% to 58% for diseases with three or fewer training examples, compared to 25% to 68% for common conditions (≥10 cases). Retrieval ablations revealed a model-dependent trade-off where gains in diagnosis sometimes reduced localization accuracy, emphasizing the need for joint evaluation of diagnosis, localization, and captioning in medical VLMs.
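The localization metric behind these numbers is straightforward: a predicted lesion box counts as a hit when its intersection-over-union with the ground-truth box exceeds 0.3. A minimal sketch (the box format and function names are my own, not from the paper):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def hit_rate(pairs, thresh=0.3):
    """Fraction of (predicted, ground-truth) box pairs with IoU > thresh --
    the quantity reported per rarity bucket (e.g. 17% -> 58% for rare cases)."""
    return sum(iou(p, g) > thresh for p, g in pairs) / len(pairs)
```

Computing this hit rate separately for rare (≤3 training examples) and common (≥10) conditions is what exposes the disproportionate benefit of tool use for rare pathologies.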

Key Points
  • GAZE uses viewer-level tools (zoom, contrast, edge detection) and retrieval from PubMed and Open-i to iteratively analyze brain MRIs.
  • Tool use boosted the fraction of rare-disease cases localized with IoU > 0.3 from 17% to 58% for conditions with ≤3 training examples.
  • Achieved 34.9% Top-1 diagnostic accuracy and 58.2 mAP@0.3 lesion localization without fine-tuning.

Why It Matters

Makes AI-assisted radiology more reliable for rare diseases, potentially improving diagnosis for underserved patient populations.