Seeing Through Experts' Eyes: A Foundational Vision-Language Model Trained on Radiologists' Gaze and Reasoning
A new vision-language model trained on over 30,000 gaze key frames from radiologists follows expert diagnostic workflows.
A research team led by Kinhei Lee has developed GazeX, a foundational vision-language model that changes how AI interprets medical images by learning from radiologists' visual behavior. Unlike standard models trained only on images and their semantic annotations, GazeX was trained on a curated dataset of over 30,000 gaze key frames from five radiologists, capturing their eye-tracking data, fixation patterns, and attention trajectories. This allows the model to emulate the structured, systematic protocols—such as the ABCDEF approach—that experts use to examine chest X-rays, ensuring all clinically relevant regions are assessed in a logical sequence. The goal is to bridge the critical gap between generic AI outputs and the nuanced, reliable reasoning required in clinical diagnostics.
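The fixation patterns described above are typically derived from a raw eye-tracking stream before any training takes place. The paper's own preprocessing is not shown here; as a rough sketch under assumed conventions, a simple dispersion-threshold (I-DT) pass like the following turns a raw gaze trace into an ordered sequence of fixation centroids — the kind of trajectory a model could be supervised on. All names and thresholds are illustrative, not GazeX's actual pipeline.

```python
# Illustrative I-DT fixation detection: consecutive gaze samples whose
# bounding-box dispersion stays under a threshold form one fixation.

def detect_fixations(samples, max_dispersion=0.05, min_samples=3):
    """samples: list of (x, y) gaze points in normalized image coordinates.
    Returns an ordered list of (cx, cy, n) fixation centroids."""
    fixations, window = [], []
    for pt in samples:
        window.append(pt)
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        dispersion = (max(xs) - min(xs)) + (max(ys) - min(ys))
        if dispersion > max_dispersion:
            # The new point broke the dispersion limit: close the prior run.
            prev = window[:-1]
            if len(prev) >= min_samples:
                cx = sum(p[0] for p in prev) / len(prev)
                cy = sum(p[1] for p in prev) / len(prev)
                fixations.append((cx, cy, len(prev)))
            window = [pt]
    if len(window) >= min_samples:
        cx = sum(p[0] for p in window) / len(window)
        cy = sum(p[1] for p in window) / len(window)
        fixations.append((cx, cy, len(window)))
    return fixations

# Synthetic trace: a dwell near the upper-left lung field, a saccade,
# then a dwell near the lower-right field.
trace = [(0.30, 0.20), (0.31, 0.21), (0.30, 0.22), (0.31, 0.20),
         (0.71, 0.81), (0.70, 0.82), (0.71, 0.80), (0.70, 0.81)]
print(detect_fixations(trace))  # two fixations, in examination order
```

The ordered centroids preserve the sequence in which regions were inspected, which is what distinguishes trajectory supervision from a plain attention heatmap.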
GazeX was trained on a massive dataset of 231,835 radiographic studies, 780,014 question-answer pairs, and 1,162 image-sentence pairs with bounding boxes. The model demonstrates superior performance in radiology report generation, disease grounding, and visual question answering by producing outputs that are more accurate, interpretable, and consistent with expert judgment. Crucially, it generates verifiable evidence artifacts, including inspection trajectories and finding-linked localized regions, which enable efficient human verification. This approach of 'learning through expert eyes' provides a practical pathway toward more trustworthy, explainable, and diagnostically robust AI systems, not just for radiology but for other high-stakes visual domains.
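The "verifiable evidence artifacts" mentioned above pair each reported finding with the region that grounds it and the step of the inspection trajectory at which it was observed. The paper does not specify a schema; the sketch below is a hypothetical record shape (all field names assumed) showing how such artifacts make human verification mechanical: a reviewer, or a script, can check that every finding is backed by a visited region.

```python
# Hypothetical evidence-artifact schema; field names are illustrative,
# not taken from the GazeX paper.
from dataclasses import dataclass, field

@dataclass
class Finding:
    text: str             # sentence from the generated report
    bbox: tuple           # (x0, y0, x1, y1) in normalized coordinates
    trajectory_step: int  # index into the model's inspection trajectory

@dataclass
class EvidenceArtifact:
    trajectory: list                      # ordered regions the model inspected
    findings: list = field(default_factory=list)

    def unbacked_findings(self):
        """Return findings whose claimed step is outside the trajectory."""
        valid_steps = range(len(self.trajectory))
        return [f for f in self.findings if f.trajectory_step not in valid_steps]

artifact = EvidenceArtifact(
    trajectory=[(0.1, 0.1, 0.5, 0.5), (0.5, 0.5, 0.9, 0.9)],
    findings=[Finding("Right basilar opacity.", (0.55, 0.6, 0.85, 0.9), 1)],
)
print(artifact.unbacked_findings())  # → [] : every finding is backed by a visited step
```

A check like this is the "efficient human verification" payoff: instead of auditing the whole image, a reviewer only inspects the specific regions each sentence points to.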
- Trained on over 30,000 gaze key frames from five radiologists to model expert visual examination protocols.
- Leverages a dataset of 231,835 radiographic studies and 780,014 QA pairs for robust training.
- Produces verifiable evidence artifacts like inspection trajectories for safer human-AI collaboration.
Why It Matters
Makes AI diagnostics more reliable and interpretable by mimicking expert workflows, reducing missed findings in critical medical imaging.