Feature-level Site Leakage Reduction for Cross-Hospital Chest X-ray Transfer via Self-Supervised Learning
A new self-supervised learning technique reduces 'site leakage' in medical AI, improving pneumonia detection across hospitals.
A new research paper from Ayoub Louaye Bouaziz and Lokmane Chebouba tackles a critical problem in medical AI: 'site leakage,' where models trained on data from one hospital fail to generalize to others because they learn institution-specific artifacts (like scanner brands or imaging protocols) instead of genuine medical patterns. The researchers propose a novel framework that directly measures this leakage using a post-hoc linear probe that predicts which hospital an image came from based on the AI's internal features. They then apply multi-site self-supervised learning (SSL)—training a ResNet-18 model on unlabeled chest X-rays from NIH and CheXpert—to create more robust features before fine-tuning for pneumonia detection.
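The post-hoc linear probe described above can be sketched as a simple logistic-regression classifier trained on frozen encoder features to predict the source hospital. The snippet below is a minimal illustration, not the paper's implementation: the synthetic features, the constant per-site offset standing in for scanner artifacts, and the training hyperparameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for features from a frozen encoder: 200 images
# from two "hospitals". Site B's features carry a small constant offset,
# mimicking scanner/protocol artifacts baked into the representation.
n, d = 200, 32
site = rng.integers(0, 2, n)                    # 0 = hospital A, 1 = hospital B
feats = rng.normal(size=(n, d))
feats[site == 1] += 0.8                         # site-specific signature

# Post-hoc linear probe: logistic regression trained only to predict the
# source site from the frozen features (plain gradient descent).
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid
    w -= 0.5 * (feats.T @ (p - site) / n)
    b -= 0.5 * float(np.mean(p - site))

p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
probe_acc = float(np.mean((p > 0.5) == site))   # site-leakage score
print(f"probe site accuracy: {probe_acc:.2f}")  # high accuracy => strong leakage
```

The probe's accuracy is the leakage score: if a linear classifier can recover the hospital from the features, site information is linearly decodable in the representation.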
Their results are revealing. Multi-site SSL alone boosted the model's AUC (Area Under the Curve) for pneumonia detection on the unseen RSNA dataset by nearly 16%, from 0.6736 to 0.7804. When they added an adversarial 'site confusion' technique to explicitly force the model to ignore hospital signatures, the measured site leakage fell: the probe's accuracy at identifying the source hospital dropped from near-perfect (98.9%) to 85%. However, this forced invariance did not reliably improve diagnostic performance and sometimes increased result variance, challenging the common assumption that eliminating site signals is always beneficial. The study concludes that rigorously measuring leakage is essential for properly evaluating AI transfer methods in healthcare.
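Adversarial site confusion is commonly implemented with a gradient-reversal trick: the site classifier trains normally to identify the hospital, while the gradient it sends back into the feature encoder is negated, pushing the features toward site-ambiguity. The sketch below shows only this sign-flip mechanic with numpy; the head, the reversal strength `lam`, and the toy data are illustrative assumptions, and the paper's exact training setup may differ.

```python
import numpy as np

lam = 1.0  # reversal strength (assumed hyperparameter)

def site_head_grads(feats, w, site):
    """Logistic site classifier: loss gradients w.r.t. weights and features."""
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))       # predicted site probability
    err = (p - site)[:, None]                    # per-sample error
    return (feats * err).mean(axis=0), err * w   # dL/dw, dL/dfeats

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))                  # toy encoder output
site = rng.integers(0, 2, 8).astype(float)       # toy hospital labels
w = rng.normal(size=4)                           # site-head weights

g_w, g_feats = site_head_grads(feats, w, site)

# The site head descends its own loss (gets better at spotting the site)...
w_update = -0.1 * g_w
# ...while the gradient reversal layer flips the sign flowing into the
# encoder, so the encoder learns features the site head cannot exploit.
encoder_grad = -lam * g_feats
```

The two networks thus play a minimax game: the head minimizes site-classification loss while the encoder, receiving the reversed gradient, effectively maximizes it.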
- Multi-site self-supervised learning (SSL) improved cross-hospital pneumonia detection AUC by nearly 16%, from 0.674 to 0.780 on the RSNA dataset.
- Adversarial 'site confusion' training reduced measured site leakage—the probe's accuracy at identifying the source hospital—from 98.9% down to 85%.
- The research shows that forcing 'site invariance' via adversarial training doesn't always boost diagnostic performance, challenging how medical AI transfer methods should be evaluated.
Why It Matters
This work provides a crucial framework for building medical AI that works reliably across different hospitals and imaging equipment, a major barrier to real-world deployment.