Automated Description Generation of Cytologic Findings for Lung Cytological Images Using a Pretrained Vision Model and Dual Text Decoders: Preliminary Study
A new vision-language model achieved 100% sensitivity in classifying malignant cells and a BLEU-4 score of 0.828 for its generated cytologic findings.
A research team led by Atsushi Teramoto has published a study on a novel AI system designed to automate the generation of diagnostic reports for lung cytology, a critical but burdensome task in cancer diagnosis. The system was trained on 801 patch images from 206 patients. Its architecture is a hybrid model combining a pretrained convolutional neural network (CNN) for image analysis with two specialized text decoders built on the Transformer architecture. The CNN first classifies an input cell image as either benign or malignant, achieving 100% sensitivity and 96.4% specificity. Based on this classification, the system routes the image features to one of two independent text decoders (one optimized for describing benign findings, the other for malignant ones), which generates a textual description of the cytologic observations.
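The routing step described above can be illustrated with a short sketch. This is not the authors' implementation: the class name `DualDecoderPipeline`, the 0.5 threshold, and the stand-in classifier and decoders are all assumptions made for illustration. In the real system, the classifier would be the pretrained CNN and each decoder a trained Transformer.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Features = Sequence[float]


@dataclass
class DualDecoderPipeline:
    """Hypothetical wrapper: classify first, then pick one of two decoders."""

    classify: Callable[[Features], float]        # returns P(malignant)
    benign_decoder: Callable[[Features], str]    # text decoder for benign cases
    malignant_decoder: Callable[[Features], str] # text decoder for malignant cases
    threshold: float = 0.5                       # assumed cut-off, not from the study

    def describe(self, features: Features) -> Tuple[str, str]:
        # Step 1: the classifier decides benign vs. malignant.
        p_malignant = self.classify(features)
        label = "malignant" if p_malignant >= self.threshold else "benign"
        # Step 2: the same image features go to the matching decoder only.
        decoder = self.malignant_decoder if label == "malignant" else self.benign_decoder
        return label, decoder(features)


# Stand-in components so the sketch runs without any trained model.
pipeline = DualDecoderPipeline(
    classify=lambda f: sum(f) / len(f),
    benign_decoder=lambda f: "benign-decoder output",
    malignant_decoder=lambda f: "malignant-decoder output",
)

label_high, text_high = pipeline.describe([0.9, 0.8])
label_low, text_low = pipeline.describe([0.1, 0.2])
```

One consequence of this design is that a misclassified image is always described by the wrong decoder, so the quality of the text depends directly on the accuracy of the classification step.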
This 'dual decoder' approach proved to be a key innovation. By separating the language generation pathways for benign and malignant cases, the model outperformed both existing large language model (LLM)-based image captioning methods and a simpler ablation model with a single text decoder. The quality of the generated text was evaluated with the BLEU-4 metric, which measures how many word sequences of up to four words the generated text shares with a reference text. The model scored 0.828, indicating a high degree of overlap with expert-written 'gold standard' reports. The model's decision-making process was also made interpretable through saliency maps, which visually highlighted the specific cellular areas the AI focused on to make its classification, adding a layer of transparency crucial for medical applications.
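For readers unfamiliar with BLEU-4, the following is a minimal sentence-level sketch of how such a score is computed against a single reference. It assumes whitespace tokenization and no smoothing; the study's exact tokenization, smoothing, and whether scores were computed per sentence or over the whole corpus are not stated here, so its numbers will not be reproduced by this code. The example sentences are invented for illustration and are not taken from the study.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: str, reference: str) -> float:
    """Unsmoothed sentence-level BLEU-4 against one reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)


# Invented example sentences (not from the study).
reference = "nuclei are enlarged with irregular chromatin distribution"
exact = bleu4(reference, reference)
one_word_off = bleu4("nuclei are enlarged with coarse chromatin distribution", reference)
```

Because the score is a geometric mean over 1- to 4-gram precisions, changing a single word in a short sentence lowers it sharply: every 4-gram that contains the changed word stops matching.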
The study, published in the journal *Cytopathology* (2025), represents a significant step toward AI-assisted pathology. By automating the descriptive reporting of cell morphology, the system has the potential to drastically reduce the time-consuming manual labor required of cytotechnologists and pathologists. This could lead to faster turnaround times for diagnoses and allow medical professionals to dedicate more time to complex case analysis and patient care, ultimately streamlining the diagnostic pipeline for lung cancer.
- Achieved 100% sensitivity and 96.4% specificity when classifying lung cell images as benign or malignant.
- Used a novel dual-text-decoder Transformer architecture, switching based on CNN classification, to generate findings.
- Outperformed LLM-based image captioning methods and a single-decoder ablation model, reaching a BLEU-4 score of 0.828 for the generated text.
Why It Matters
Automates a tedious, expert-level reporting task in cancer diagnosis, potentially speeding up results and reducing pathologist burnout.