A Framework for Exploring and Disentangling Intersectional Bias: A Case Study in Fetal Ultrasound
Pixel spacing drove performance gaps of up to 24% across 94K ultrasound images, confounding demographic bias analyses.
A new paper from researchers including Aya Elgebaly and Aasa Feragen tackles a nuanced problem in medical AI fairness: performance disparities that persist even when representation is balanced. Their proposed framework combines unsupervised slice discovery, factor-wise analysis, and targeted intersectional evaluation to disentangle bias sources. In a case study using over 94,000 fetal ultrasound images for fetal weight estimation, they analyzed both a deep learning model and the clinical standard Hadlock regression formula.
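For reference, here is a minimal sketch of one widely used Hadlock variant, the 1985 three-parameter formula based on head circumference (HC), abdominal circumference (AC), and femur length (FL). The paper does not specify which Hadlock variant it evaluates, so treat this as illustrative:

```python
def hadlock_efw_grams(hc_cm: float, ac_cm: float, fl_cm: float) -> float:
    """Estimated fetal weight (grams) via the Hadlock (1985) HC/AC/FL formula.

    Inputs are in centimeters. This is one common variant; the paper's exact
    formula choice is not stated, so this serves only as a worked example.
    """
    log10_efw = (
        1.326
        - 0.00326 * ac_cm * fl_cm
        + 0.0107 * hc_cm
        + 0.0438 * ac_cm
        + 0.158 * fl_cm
    )
    return 10 ** log10_efw

# Illustrative measurements roughly typical of ~30 weeks gestation:
print(f"{hadlock_efw_grams(hc_cm=28.0, ac_cm=26.0, fl_cm=5.7):.0f} g")  # ~1520 g
```

Because the formula is a fixed regression on biometric measurements, any systematic measurement shift tied to acquisition settings (such as pixel spacing) propagates directly into the weight estimate, which is why even the non-learned Hadlock baseline shows PS-linked performance differences.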
Pixel spacing (PS), a parameter whose settings are considered suboptimal in current acquisition protocols, emerged as a consistent driver of performance differences: higher PS improved results by up to 24% for selected subgroups in both models. Because PS is often adjusted for high maternal BMI or low gestational age, the effect carries substantial confounding risk. Intersectional analysis revealed that gestational age explains part of the PS signal, while the PS improvement persists across BMI strata. The work underscores that medical AI fairness must account for acquisition conditions and interaction effects, not just demographic representation.
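To make the intersectional step concrete, here is a minimal sketch of how a PS-linked error gap could be checked within gestational-age and BMI strata. The column names, file name, and bin edges are assumptions for illustration, not the paper's actual pipeline:

```python
import pandas as pd

# Hypothetical per-image results table: 'abs_pct_error' is |pred - true| / true,
# plus acquisition and clinical covariates. Not the paper's actual schema.
df = pd.read_csv("fetal_us_predictions.csv")

df["ps_bin"] = pd.qcut(df["pixel_spacing_mm"], q=2, labels=["low_PS", "high_PS"])
df["ga_bin"] = pd.cut(df["ga_weeks"], bins=[0, 28, 34, 45],
                      labels=["<28w", "28-34w", ">34w"])
df["bmi_bin"] = pd.cut(df["bmi"], bins=[0, 25, 30, 100],
                       labels=["normal", "overweight", "obese"])

# Factor-wise (marginal) view: error gap by pixel spacing alone.
print(df.groupby("ps_bin", observed=True)["abs_pct_error"].mean())

# Intersectional view: if the PS gap shrinks within GA strata but survives
# within BMI strata, GA partly explains the PS signal, matching the paper's
# reported pattern.
print(df.groupby(["ga_bin", "ps_bin"], observed=True)["abs_pct_error"]
        .mean().unstack())
print(df.groupby(["bmi_bin", "ps_bin"], observed=True)["abs_pct_error"]
        .mean().unstack())
```

The design point is that the marginal and stratified tables answer different questions: the first detects a gap, the second tests whether it is an artifact of a correlated factor.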
- Pixel spacing (PS) drove performance improvements of up to 24% in selected subgroups for both the deep learning model and the clinical Hadlock formula.
- PS is often adjusted for high maternal BMI or low gestational age, creating confounding that traditional fairness audits miss.
- The framework combines unsupervised slice discovery and intersectional evaluation to separate technical, clinical, and demographic bias sources (a minimal sketch of slice discovery follows this list).
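As referenced above, unsupervised slice discovery looks for coherent subgroups ("slices") where a model underperforms without predefined group labels. A common recipe is to cluster model embeddings and flag clusters whose error deviates from the average. The paper's exact algorithm is not specified here, so the clustering choice (KMeans), synthetic inputs, and feature dimensions below are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical inputs: per-image features from the trained model's
# penultimate layer, and per-image absolute errors. Synthetic stand-ins here.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5000, 64))
errors = rng.gamma(shape=2.0, scale=0.05, size=5000)

k = 20
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)

# Rank clusters by how far their mean error deviates from the overall mean;
# the top deviators are candidate bias slices to inspect for shared factors
# (e.g., pixel spacing, gestational age, BMI).
overall = errors.mean()
gaps = np.array([errors[labels == c].mean() - overall for c in range(k)])
for c in np.argsort(-np.abs(gaps))[:3]:
    n = int((labels == c).sum())
    print(f"candidate slice {c}: n={n}, mean-error gap {gaps[c]:+.4f}")
```

In the framework's workflow, the discovered slices feed the factor-wise and intersectional analyses: once a high-error slice is found, one asks which technical, clinical, or demographic factors it concentrates.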
Why It Matters
Shows that technical acquisition parameters, not just data imbalance, can mask or amplify bias in medical AI.