LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol

A new dataset of 1,824 mammograms from six vendors, paired with a method that harmonizes images acquired at different X-ray energies, improves diagnostic AI accuracy.

Deep Dive

A consortium of researchers has published LUMINA, a new benchmark dataset for mammography AI that targets a major roadblock to clinical deployment: inconsistent image appearance across X-ray machine vendors. Existing public datasets are limited in size and vendor diversity, so models trained on them can fail when faced with the subtle visual differences caused by varying acquisition energies and hardware. LUMINA contains 1,824 full-field digital mammography (FFDM) images from 468 patients, with pathology-confirmed labels for 960 benign and 864 malignant cases. Critically, it spans six different acquisition systems and tags each image with its vendor and energy metadata, allowing for the first systematic study of these domain shifts.
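With 1,824 images from 468 patients, each patient contributes just under four images on average, so any train/test split should be made at the patient level to avoid leaking one patient's images into both subsets. The sketch below shows one way to do that and to tally images per vendor. The record layout and field names (`patient_id`, `vendor`, `energy`, `label`) are hypothetical; the summary does not describe LUMINA's actual file format or its official splits.

```python
import random
from collections import Counter, defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split image records so that no patient appears in both subsets."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    # Sort before shuffling so the result depends only on the seed.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    n_test = max(1, round(len(patient_ids) * test_fraction))
    test = [r for pid in patient_ids[:n_test] for r in by_patient[pid]]
    train = [r for pid in patient_ids[n_test:] for r in by_patient[pid]]
    return train, test


def count_by(records, key):
    """Count records per value of a metadata field, e.g. per vendor."""
    return Counter(rec[key] for rec in records)


# Toy records with hypothetical field names: 10 patients, 4 images each.
records = [
    {
        "patient_id": f"P{i // 4:03d}",
        "vendor": f"vendor_{i % 6}",
        "energy": "low" if i % 2 == 0 else "high",
        "label": i % 2,  # 0 = benign, 1 = malignant
    }
    for i in range(40)
]

train, test = split_by_patient(records, test_fraction=0.2, seed=0)
print(len(train), len(test))
print(count_by(test, "vendor"))
```

Reporting metrics separately for each `vendor` or `energy` value in the held-out subset is one simple way to make the domain shifts described above visible.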

To combat this variability, the team developed an 'energy harmonization' protocol. This model-agnostic, pixel-space alignment method maps images from various high- and low-energy styles into a consistent, low-energy reference space while preserving lesion morphology. The team benchmarked three clinical tasks (diagnosis, BI-RADS classification, and density estimation) and reported that models trained with harmonized data improved on them. The two-view model using EfficientNet-B0 achieved the top diagnostic AUC of 93.54%, and the Swin Transformer model led density prediction with an 89.43% macro-AUC. Harmonization also produced more accurate and more localized Grad-CAM explanations, suggesting the models were focusing on clinically relevant areas.
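The summary does not spell out how the harmonization protocol works internally, so the following is not LUMINA's method. As a rough illustration of what "pixel-space alignment to a reference style" can mean, here is a minimal sketch of classical histogram matching with NumPy, a much simpler stand-in technique. The function name `match_histogram` and the synthetic arrays are assumptions made for the example.

```python
import numpy as np


def match_histogram(source, reference):
    """Remap source intensities so their empirical CDF follows the reference CDF."""
    src_values, src_inverse, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution of each image's intensities.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_inverse].reshape(source.shape)


# Synthetic stand-ins for a "high-energy style" image and a low-energy reference.
rng = np.random.default_rng(0)
source = rng.normal(0.6, 0.10, size=(32, 32))
reference = rng.normal(0.3, 0.05, size=(32, 32))

harmonized = match_histogram(source, reference)
print(source.mean(), reference.mean(), harmonized.mean())
```

Because the mapping is a monotonic function applied per pixel, the ordering of intensities within the image is kept. That is a far weaker guarantee than the morphology preservation the protocol is described as providing, which is one reason this should be read only as an intuition aid.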

Accepted to CVPR 2026, LUMINA provides two resources: a vendor-diverse, clinically annotated benchmark for rigorous testing, and a practical harmonization framework. The work shifts attention from accuracy on clean, single-vendor data toward AI that is robust enough for real-world hospital environments where equipment varies. It gives researchers a way to evaluate whether a mammography AI model will work reliably across the patchwork of imaging technology found in global healthcare systems.

Key Points
  • Contains 1,824 pathology-confirmed mammograms from 6 different X-ray system vendors, each tagged with vendor and acquisition-energy metadata.
  • Proposes an 'energy harmonization' method that maps images into a common low-energy reference space, reaching a diagnostic AUC of 93.54% with EfficientNet-B0 and producing more localized Grad-CAM explanations.
  • Provides a curated benchmark and framework to build AI robust to real-world vendor variation, a key requirement for clinical deployment.
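
For readers unfamiliar with the metrics quoted above, the sketch below computes a binary AUC as the fraction of positive/negative pairs ranked correctly, and a macro-AUC as the unweighted mean of one-vs-rest AUCs across classes. The one-vs-rest averaging is an assumption; the summary does not say exactly how the authors compute macro-AUC. The toy labels and scores are made up for illustration.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(labels, class_scores):
    """Unweighted mean of one-vs-rest AUCs; class_scores[i][c] scores class c on sample i."""
    n_classes = len(class_scores[0])
    aucs = []
    for c in range(n_classes):
        one_vs_rest = [1 if y == c else 0 for y in labels]
        aucs.append(binary_auc(one_vs_rest, [row[c] for row in class_scores]))
    return sum(aucs) / n_classes


# Binary example, e.g. benign (0) vs malignant (1).
print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Three-class example, e.g. density categories encoded as 0, 1, 2.
density_labels = [0, 1, 2, 2]
density_scores = [
    [0.70, 0.20, 0.10],
    [0.20, 0.60, 0.20],
    [0.10, 0.30, 0.60],
    [0.50, 0.35, 0.15],
]
print(macro_auc(density_labels, density_scores))
```

The quadratic pair count is fine for small examples; for a dataset of LUMINA's size a rank-based implementation from an established library would be the more usual choice.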

Why It Matters

Addresses a core reliability issue in medical AI and is a step toward models that work consistently across different hospital imaging equipment worldwide.