Beyond Benchmarks: A Framework for Post Deployment Validation of CT Lung Nodule Detection AI
A physics-guided framework shows that AI lung nodule detection can degrade sharply when CT acquisition parameters, especially slice thickness, differ from the training data.
A new research paper by Daniel Soliman introduces a framework for validating AI lung nodule detection systems after they are deployed in hospitals. The study shows that performance reported under controlled benchmark conditions often does not carry over to clinical settings where CT acquisition parameters differ. Using a MONAI RetinaNet model pretrained on the LUNA16 dataset, the study tested 21 cases from the LIDC-IDRI dataset under five imaging conditions: baseline, 25% dose reduction, 50% dose reduction, 3 mm slice thickness, and 5 mm slice thickness.
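To make the test conditions concrete, here is a minimal sketch of how dose and slice thickness changes might be simulated in the image domain. This summary does not describe the paper's actual simulation method, so everything here is an assumption for illustration: the function names, the slice-averaging approach, the Gaussian noise model, and the `baseline_sigma_hu` value are all hypothetical. Note also that `dose_fraction` means the fraction of baseline dose that is retained; the summary does not make clear whether "25%" refers to the retained dose or to the reduction.

```python
import numpy as np


def simulate_thicker_slices(volume, native_mm, target_mm):
    """Mimic a thicker reconstruction by averaging adjacent axial slices.

    volume is a (z, y, x) array of HU values. Assumption: block averaging
    along z is an acceptable stand-in for a thicker slice profile.
    """
    factor = max(1, int(round(target_mm / native_mm)))
    usable = (volume.shape[0] // factor) * factor  # drop slices that do not fill a group
    grouped = volume[:usable].reshape(-1, factor, *volume.shape[1:])
    return grouped.mean(axis=1)


def simulate_reduced_dose(volume, dose_fraction, baseline_sigma_hu=10.0, seed=0):
    """Add zero-mean Gaussian noise to mimic a lower-dose acquisition.

    Assumption: noise variance scales as 1 / dose, so reaching
    sigma_0 / sqrt(dose_fraction) from a baseline of sigma_0 requires
    added noise with sigma_0 * sqrt(1 / dose_fraction - 1).
    baseline_sigma_hu is a hypothetical placeholder, not a measured value.
    """
    if not 0.0 < dose_fraction <= 1.0:
        raise ValueError("dose_fraction must be in (0, 1]")
    rng = np.random.default_rng(seed)
    extra_sigma = baseline_sigma_hu * np.sqrt(1.0 / dose_fraction - 1.0)
    return volume + rng.normal(0.0, extra_sigma, size=volume.shape)
```

A degraded volume produced this way could then be passed to the unchanged detector, so that any change in sensitivity is attributable to the simulated acquisition change rather than to the model.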
Baseline sensitivity was only 45.2% (57 of 126 consensus nodules detected). Dose reduction caused minimal degradation (41.3% at 25% dose), but increasing slice thickness to 5 mm cut sensitivity to 26.2%, a drop of 19 percentage points, or 42% relative to baseline. The finding held across confidence thresholds from 0.1 to 0.9, suggesting that slice thickness is a more fundamental constraint on AI detection accuracy than image noise.
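The difference between the absolute and the relative drop is easy to blur, so the arithmetic behind the reported figures is worth spelling out. The short sketch below uses only numbers quoted in this summary; the variable and function names are illustrative.

```python
def sensitivity(detected, total):
    """Fraction of reference nodules that the detector found."""
    return detected / total


baseline = sensitivity(57, 126)   # 57 of 126 consensus nodules
thick_5mm = 0.262                 # reported sensitivity at 5 mm slice thickness

absolute_drop_pp = (baseline - thick_5mm) * 100   # percentage points
relative_drop = (baseline - thick_5mm) / baseline  # fraction of baseline lost

print(f"baseline sensitivity: {baseline:.1%}")           # 45.2%
print(f"absolute drop: {absolute_drop_pp:.1f} points")   # 19.0 points
print(f"relative drop: {relative_drop:.0%}")             # 42%
```

The 19-point figure describes the change in the share of nodules found, while the 42% figure describes how much of the baseline detection capability was lost.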
The framework's significance lies in its practicality: it's reproducible, requires no proprietary scanner data, and is designed specifically for resource-constrained clinical environments. As AI-assisted lung nodule detection systems are increasingly deployed without site-specific validation, this physics-guided approach provides hospitals with a standardized method for ongoing quality assurance. The research highlights that without such post-deployment validation, AI systems may fail silently when faced with real-world variations in medical imaging protocols.
- A 5 mm CT slice thickness caused a 42% relative drop in AI nodule detection sensitivity compared to baseline
- Framework tested MONAI RetinaNet model on LIDC-IDRI data with simulated dose reduction and thickness changes
- Method requires no proprietary data and offers reproducible QA for hospitals with varying scan protocols
Why It Matters
Gives hospitals a way to check that AI cancer detection tools still work reliably when local CT scan parameters differ from the training data.