Gaze2Report: Radiology Report Generation via Visual-Gaze Prompt Tuning of LLMs
New AI framework uses predicted scanpaths to make LLMs write more clinically relevant radiology reports.
A research team from Johns Hopkins University and the University of Florida has introduced Gaze2Report, a novel framework for AI-generated radiology reports that addresses a critical gap in clinical alignment. Existing deep learning methods often produce generic or misaligned reports because they lack the medical reasoning priors that guide human radiologists. Gaze2Report tackles this by incorporating visual attention, specifically eye-gaze patterns that reveal where a radiologist looks, and for how long, when analyzing a scan. This information is crucial for understanding how disease manifests and for structuring a clinically relevant report.
However, collecting real-time eye gaze data in clinical settings is expensive and impractical. Gaze2Report's key innovation is its scanpath prediction module, which uses a Graph Neural Network (GNN) to generate synthetic 'visual-gaze tokens' that mimic a radiologist's focus areas. These tokens are combined with the medical image and report instructions to form a multimodal prompt. This prompt is then used to fine-tune only the Low-Rank Adaptation (LoRA) layers of a large language model (LLM), making the training efficient. The result is an autoregressive report generator that benefits from gaze-guided learning but operates entirely without physical gaze input during real-world use, solving the deployment bottleneck.
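The prompt-assembly step described above can be sketched in a few lines. The function name, special tokens, and separator scheme below are illustrative assumptions for clarity, not the paper's actual tokenization:

```python
# Sketch of the multimodal prompt described above: predicted visual-gaze
# tokens are spliced between the image tokens and the report instruction.
# All token names and delimiters here are hypothetical.

def build_multimodal_prompt(image_tokens, gaze_tokens, instruction_tokens):
    """Concatenate image, GNN-predicted gaze, and instruction tokens
    into one sequence for the LoRA-tuned LLM."""
    return (
        ["<img>"] + list(image_tokens) + ["</img>"]
        + ["<gaze>"] + list(gaze_tokens) + ["</gaze>"]
        + list(instruction_tokens)
    )

prompt = build_multimodal_prompt(
    image_tokens=["v1", "v2", "v3"],   # placeholder visual patch tokens
    gaze_tokens=["g1", "g2"],          # placeholder scanpath tokens
    instruction_tokens=["Write", "the", "findings", "section."],
)
print(prompt[:4])  # ['<img>', 'v1', 'v2', 'v3']
```

Because the gaze tokens are synthesized by the scanpath module rather than captured from an eye tracker, this same assembly runs unchanged at inference time.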
The work, accepted for an oral presentation at the IEEE International Symposium on Biomedical Imaging (ISBI) 2026, represents a significant step toward more interpretable and physician-aligned AI in medical imaging. By computationally modeling expert visual behavior, the system aims to generate reports that don't just describe findings but do so with a structure and emphasis that mirrors human diagnostic logic. This approach could lead to AI assistants that produce draft reports requiring less correction from overburdened radiologists, directly impacting clinical workflow efficiency.
- Uses a Graph Neural Network to predict radiologist eye scanpaths and generate visual-gaze tokens for LLM prompting.
- Fine-tunes only LoRA layers of an LLM for efficient, gaze-guided radiology report generation.
- Eliminates need for expensive, real-time gaze data during clinical inference, solving a major deployment hurdle.
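The LoRA mechanism behind the second bullet keeps the pretrained weights frozen and trains only a low-rank correction. A minimal NumPy sketch of the idea, with illustrative layer sizes, rank, and scaling (not the paper's actual configuration):

```python
import numpy as np

# Minimal LoRA sketch: the frozen base weight W is augmented with a
# trainable low-rank update B @ A. During fine-tuning only A and B
# receive gradients; shapes and rank here are illustrative.
rng = np.random.default_rng(0)

d_out, d_in, rank = 8, 8, 2                    # assumed layer size / rank
W = rng.standard_normal((d_out, d_in))         # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, rank))                    # trainable up-projection
alpha = 1.0                                    # LoRA scaling factor

def lora_forward(x):
    """Forward pass: base output plus scaled low-rank correction."""
    return W @ x + alpha * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted layer exactly matches the
# frozen base model at the start of training.
assert np.allclose(lora_forward(x), W @ x)
```

Training rank-2 factors for an 8-by-8 layer updates 32 parameters instead of 64; at LLM scale the same ratio is what makes gaze-guided fine-tuning cheap.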
Why It Matters
Gaze2Report could reduce radiologist workload by generating more clinically accurate draft reports that need less editing, improving diagnostic efficiency.