A Workflow to Efficiently Generate Dense Tissue Ground Truth Masks for Digital Breast Tomosynthesis
A new method that requires annotation on just one slice achieves a median Dice score of 0.83.
A multi-institutional research team has introduced a workflow that speeds up the creation of training data for AI models in breast cancer screening. The system targets Digital Breast Tomosynthesis (DBT), now the standard screening method in the US, and addresses a critical bottleneck: the time and labor required for radiologists to manually outline fibroglandular (dense) tissue across dozens of slices in a 3D scan. The approach requires a human annotator to draw only a single region of interest on the central slice and select a threshold. The algorithm then projects this annotation to adjacent slices and iteratively adjusts slice-specific thresholds to maintain volumetric consistency, preserving the complex 3D structure of breast tissue.
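The propagation step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the summary does not say how "volumetric consistency" is measured, so the sketch assumes each slice's threshold is nudged until the dense-tissue fraction inside the region of interest matches that of the central slice. The function names (`dense_fraction`, `adjust_threshold`, `propagate`) and the parameters `step`, `tol`, and `max_iter` are hypothetical, and plain Python lists stand in for image arrays to keep the example self-contained.

```python
def dense_fraction(slice_px, roi, threshold):
    """Fraction of ROI pixels whose intensity is at or above the threshold."""
    inside = [v for row, mrow in zip(slice_px, roi) for v, m in zip(row, mrow) if m]
    if not inside:
        return 0.0
    return sum(v >= threshold for v in inside) / len(inside)


def adjust_threshold(slice_px, roi, start, target, step=1.0, tol=0.02, max_iter=50):
    """Nudge the threshold until the dense fraction is within tol of target.

    Assumed consistency criterion: matching the central slice's dense fraction.
    """
    t = start
    for _ in range(max_iter):
        frac = dense_fraction(slice_px, roi, t)
        if abs(frac - target) <= tol:
            break
        # Too much dense tissue: raise the threshold; too little: lower it.
        t += step if frac > target else -step
    return t


def propagate(volume, roi, threshold):
    """Return (masks, thresholds) for all slices from one central-slice annotation."""
    center = len(volume) // 2
    thresholds = [None] * len(volume)
    thresholds[center] = threshold
    target = dense_fraction(volume[center], roi, threshold)
    # Walk outward in both directions, seeding each slice with its neighbour's threshold.
    for direction in (1, -1):
        prev = threshold
        i = center + direction
        while 0 <= i < len(volume):
            prev = adjust_threshold(volume[i], roi, prev, target)
            thresholds[i] = prev
            i += direction
    # The same ROI is projected onto every slice; only the threshold varies.
    masks = [
        [[m and v >= t for v, m in zip(row, mrow)] for row, mrow in zip(sl, roi)]
        for sl, t in zip(volume, thresholds)
    ]
    return masks, thresholds
```

Seeding each slice with its neighbour's threshold keeps adjacent thresholds close together, which is one plausible way to avoid abrupt changes in the mask from slice to slice.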
Evaluated on 44 DBT volumes from the public DBTex dataset, the method proved both accurate and efficient. It achieved a median Dice similarity coefficient of 0.83 against full manual segmentations on 176 benchmark slices. This is close to the inter-reader agreement between two radiologists (median Dice of 0.84), which suggests the generated masks agree with a manual segmentation about as well as a second reader would. By shifting from a per-slice to a per-volume annotation paradigm, the framework sharply reduces the manual labor involved in dataset creation, which is the primary constraint hindering the development of robust AI models for personalized breast cancer risk assessment. This work, published on arXiv, provides a practical pipeline that could enable larger, high-quality datasets for training computer vision tools in oncology.
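For readers unfamiliar with the metric, the Dice similarity coefficient of two binary masks A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The sketch below shows the calculation and a median over slices; the function name `dice`, the convention of returning 1.0 for two empty masks, and the toy masks are illustrative assumptions, not details taken from the paper.

```python
from statistics import median


def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary 2D masks (lists of rows)."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    overlap = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy example: one generated mask compared against one manual mask.
generated = [[1, 1, 0],
             [0, 1, 0]]
manual = [[1, 0, 0],
          [0, 1, 1]]
score = dice(generated, manual)  # 2 * 2 / (3 + 3)

# A per-slice evaluation would collect one score per benchmark slice
# and report the median, as the study does.
slice_scores = [score, dice(manual, manual), dice(generated, generated)]
summary = median(slice_scores)
```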
- Reduces manual annotation to just the central slice of a 3D DBT volume, cutting labor by ~80%.
- Achieves a median Dice score of 0.83, close to inter-radiologist agreement (0.84).
- Validated on 44 clinical DBT volumes from the public DBTex dataset.
Why It Matters
This tool accelerates the creation of vital training data, unlocking faster development of AI for personalized breast cancer risk assessment.