Research & Papers

Medical AI Overfitting: InceptionV3 on X-ray Angiograms Hits Validation Wall

With only ~900 training frames, validation accuracy peaks at 74-79% and then collapses to 30-40%.

Deep Dive

A Reddit user on r/MachineLearning is grappling with severe overfitting on a two-class medical imaging task: distinguishing the left coronary artery (LCA) from the right coronary artery (RCA) in 2D X-ray angiograms. Their setup uses InceptionV3 in PyTorch, with only ~900 training frames drawn from ~300 unique DICOM files, each grayscale frame converted to a 3-channel 299x299 array. They apply standard ImageNet transfer learning, class weights to handle a 2:1 LCA:RCA imbalance, and several regularizers. Despite these efforts, training accuracy climbs to 95-99% within a few epochs, while validation accuracy peaks early at 74-79% and then falls to 30-40%, a pattern consistent with the model memorizing texture artifacts rather than learning clinically meaningful features.
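To put the reported numbers in context, the sketch below computes inverse-frequency class weights and the accuracy of always predicting a single class. It assumes exactly 600 LCA and 300 RCA frames (a hypothetical split matching the reported ~900 frames and 2:1 ratio) and a validation set with the same ratio. The post does not say how its class weights were derived, and the function names are illustrative.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def constant_prediction_accuracy(labels):
    """Accuracy obtained by always predicting one class, for each class."""
    counts = Counter(labels)
    return {cls: c / len(labels) for cls, c in counts.items()}


# Hypothetical labels: ~900 frames at the reported 2:1 LCA:RCA ratio.
labels = ["LCA"] * 600 + ["RCA"] * 300

weights = inverse_frequency_weights(labels)
baselines = constant_prediction_accuracy(labels)

print(weights)  # {'LCA': 0.75, 'RCA': 1.5}
for cls, acc in baselines.items():
    print(f"always predict {cls}: {acc:.1%}")
```

Under these assumptions, always predicting LCA scores about 67% and always predicting RCA about 33%, so a validation accuracy of 30-40% is roughly what labeling every frame as RCA would achieve.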

To combat this, the user has tried partial and full unfreezing of the InceptionV3 layers, dropout (rates from 0.3 to 0.6), weight decay (1e-4), data augmentation (flips, 25° rotations, translations), and a ReduceLROnPlateau scheduler with factor 0.5 and patience 8. None of these interventions prevented the validation collapse. The post highlights a persistent pain point in medical AI: small datasets paired with high-capacity models tend to overfit, especially when the images come from a limited patient population. The researcher explicitly asks for literature or novel strategies for small-sample medical classification, which points to a gap in practical transfer learning techniques for this domain. The case suggests a need for better domain-specific pretraining, more aggressive augmentation, or alternative architectures, such as Vision Transformer variants with inductive biases suited to medical imagery.
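A minimal PyTorch sketch of the regularization settings described in the post follows. The linear head on 2048-dimensional features stands in for InceptionV3's final classifier. The AdamW optimizer, the learning rate, the dropout rate of 0.5, the class-weight values, and the choice to monitor validation loss are all assumptions, since the post reports only the weight decay, the dropout range, and the scheduler's factor and patience.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in classifier head (assumption): InceptionV3's pooled features are 2048-d.
head = nn.Sequential(
    nn.Dropout(p=0.5),   # the post tried rates from 0.3 to 0.6
    nn.Linear(2048, 2),  # two classes: LCA vs RCA
)

# Optimizer type and learning rate are assumptions; weight decay 1e-4 is from the post.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4, weight_decay=1e-4)

# Scheduler settings from the post; monitoring validation loss is an assumption.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=8
)

# Class-weighted loss for the 2:1 imbalance (hypothetical weight values).
criterion = nn.CrossEntropyLoss(weight=torch.tensor([0.75, 1.5]))

# One training step on random stand-in features, just to show the pieces fit.
features = torch.randn(8, 2048)
targets = torch.randint(0, 2, (8,))
loss = criterion(head(features), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Simulate a validation loss that never improves and record the learning rate.
lrs = []
for epoch in range(12):
    scheduler.step(1.0)  # flat validation loss
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs)
```

With a flat validation loss, the learning rate is halved only on the tenth epoch, because patience=8 tolerates eight non-improving epochs after the best one. When training accuracy saturates within a few epochs, as reported, the scheduler may react only after the model has already overfit.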

Key Points
  • ~900 training frames from ~300 DICOM files leave the model prone to memorizing patient-specific textures
  • Validation accuracy peaks at 74-79% then collapses to 30-40% despite dropout up to 0.6 and weight decay 1e-4
  • Researcher seeks papers and strategies for small-sample medical classification, highlighting limitations of standard transfer learning

Why It Matters

This case exposes a critical bottleneck in medical AI: small datasets still defeat standard regularization, limiting reliable deployment of deep learning in clinical imaging.