Image & Video

External Validation of Deep Learning Models for BI-RADS Breast Density Prediction from Ultrasound Images

AI models achieve AUROC up to 0.899 for extremely dense breasts on 2,000-exam cohort

Deep Dive

A new study externally validated three deep learning models (DenseNet121, ViT-B/32, and ResNet50) for predicting BI-RADS breast density categories from ultrasound images. The models were tested on an independent cohort of 2,000 ultrasound exams: 500 cancer cases matched to 1,500 negative controls by manufacturer and year. DenseNet121 led with a micro-averaged AUROC of 0.885, with ViT-B/32 and ResNet50 close behind. All three performed best on extremely dense breasts (AUROC 0.868–0.899) and worst on heterogeneously dense tissue (AUROC 0.699–0.729), a known clinical challenge.
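For readers unfamiliar with the two kinds of AUROC quoted above, the following is a minimal sketch of how per-category (one-vs-rest) and micro-averaged AUROC can be computed for a four-class density classifier. It is not the study's evaluation code: the function names, the toy labels, and the toy probabilities are all invented for illustration, and the class indices 0 to 3 are assumed to stand for BI-RADS density categories A to D.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary AUROC as the probability that a random positive outscores
    a random negative (Mann-Whitney U); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_class_auroc(y_true: Sequence[int], probs: Sequence[Sequence[float]],
                    n_classes: int = 4) -> List[float]:
    """One-vs-rest AUROC: class k against all other classes."""
    return [
        auroc([int(y == k) for y in y_true], [p[k] for p in probs])
        for k in range(n_classes)
    ]


def micro_auroc(y_true: Sequence[int], probs: Sequence[Sequence[float]],
                n_classes: int = 4) -> float:
    """Micro-average: pool every (exam, class) pair into one binary problem."""
    labels = [int(y == k) for y in y_true for k in range(n_classes)]
    scores = [p[k] for p in probs for k in range(n_classes)]
    return auroc(labels, scores)


# Toy data, made up for illustration (assumed mapping: 0=A, 1=B, 2=C, 3=D).
y_true = [0, 1, 2, 3, 2, 1]
probs = [
    [0.7, 0.2, 0.05, 0.05],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.4, 0.3, 0.2],
    [0.0, 0.1, 0.2, 0.7],
    [0.1, 0.2, 0.5, 0.2],
    [0.3, 0.3, 0.3, 0.1],
]

print(per_class_auroc(y_true, probs))        # [1.0, 0.875, 0.9375, 1.0]
print(round(micro_auroc(y_true, probs), 3))  # 0.963
```

Because the micro-average pools all exam-class pairs, a single headline number such as 0.885 can sit well above the weakest per-category score, which is why the per-category breakdown is reported alongside it.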

The team also evaluated downstream 10-year cancer risk prediction by feeding age and AI-derived density into the Tyrer-Cuzick model. This combination yielded an AUROC of 0.541, slightly lower than the 0.570 obtained with mammography-reported density as the reference, but the difference was not statistically significant (p = 0.23). The paper, accepted at IWBI 2026, concludes that the models generalize well across racial compositions but need targeted optimization for heterogeneously dense breasts to improve clinical utility.
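The summary does not say which statistical test produced p = 0.23. As one common way to compare two AUROCs measured on the same exams, the sketch below uses a paired percentile bootstrap to put a confidence interval on the AUROC difference. This technique is an assumption chosen for illustration, not the paper's stated method, and every name and number in the code (`bootstrap_auroc_diff`, `risk_ai`, `risk_mammo`, the toy scores) is hypothetical.

```python
import random
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary AUROC via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_diff(labels: Sequence[int], scores_a: Sequence[float],
                         scores_b: Sequence[float], n_boot: int = 2000,
                         seed: int = 0) -> Tuple[float, float, float]:
    """Return (observed difference, 2.5th percentile, 97.5th percentile)
    of AUROC(a) - AUROC(b), resampling exams with replacement so that
    both scores for an exam always stay paired."""
    n = len(labels)
    if sum(labels) in (0, n):
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    observed = auroc(labels, scores_a) - auroc(labels, scores_b)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that lost one of the classes
            a = [scores_a[i] for i in idx]
            b = [scores_b[i] for i in idx]
            diffs.append(auroc(y, a) - auroc(y, b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, lower, upper


# Toy data, made up for illustration: 1 = cancer case, 0 = matched control.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
risk_ai = [0.04, 0.03, 0.05, 0.02, 0.03, 0.04, 0.02, 0.03, 0.05, 0.04, 0.03, 0.02]
risk_mammo = [0.05, 0.03, 0.04, 0.02, 0.04, 0.03, 0.02, 0.04, 0.05, 0.03, 0.03, 0.02]

observed, lower, upper = bootstrap_auroc_diff(labels, risk_ai, risk_mammo)
print(f"AUROC difference {observed:+.3f}, 95% CI [{lower:+.3f}, {upper:+.3f}]")
```

Under this approach, an interval that contains zero would be read as no significant difference between the two risk scores, which is the kind of conclusion the reported p-value supports.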

Key Points
  • DenseNet121 achieved the highest overall performance (micro-averaged AUROC 0.885) across the four BI-RADS density categories.
  • Models performed best on extremely dense breasts (AUROC 0.868–0.899) and worst on heterogeneously dense breasts (AUROC 0.699–0.729).
  • For 10-year risk prediction, age plus AI-derived density scored slightly below mammography-reported density (AUROC 0.541 vs. 0.570), but the difference was not statistically significant (p = 0.23).

Why It Matters

AI-based breast density estimation from ultrasound could enable faster, cheaper screening, but heterogeneously dense tissue remains an accuracy bottleneck.