Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning
How to fine-tune medical vision models without wasting labels on easy samples?
Medical vision foundation models (Med-VFMs) often underperform on downstream volumetric segmentation tasks because standard fine-tuning relies on randomly selected labeled samples, which may miss the most informative data. To address this, a team led by Jin Yang from Washington University in St. Louis proposes the Active Selective Semi-supervised Fine-tuning (ASSFT) framework, which combines an active learning strategy with selective semi-supervision to adapt Med-VFMs efficiently, without access to the original pre-training data. The core innovation is an Active Test-Time Sample Query strategy built on two complementary metrics. The first, Diversified Knowledge Divergence (DKD), quantifies both the knowledge gap between the pre-training and target domains and the semantic diversity within the target dataset, ensuring the model selects samples that carry previously unlearned knowledge while preserving intra-domain diversity. The second, Anatomical Segmentation Difficulty (ASD), estimates segmentation difficulty by measuring predictive uncertainty within foreground regions of interest, so the model prioritizes samples with complex anatomical patterns, such as organs or lesions, over those dominated by background uncertainty.
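To make the query strategy concrete, here is a minimal sketch of how a greedy sample-selection loop combining the two metrics might look. The specific scoring forms are assumptions for illustration, not the paper's exact definitions: ASD is approximated as mean predictive entropy over foreground voxels, and DKD as distance to a pre-training embedding centroid plus distance to the nearest already-selected sample. All function names (`asd_score`, `dkd_score`, `query_samples`) are hypothetical.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy of per-voxel class probabilities."""
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def asd_score(probs, fg_mask):
    """ASD proxy (assumed form): mean predictive entropy
    restricted to foreground voxels, ignoring background."""
    fg = fg_mask.astype(bool)
    ent = entropy(probs)
    return float(ent[fg].mean()) if fg.any() else 0.0

def dkd_score(emb, source_centroid, selected_embs):
    """DKD proxy (assumed form): knowledge gap = distance to the
    pre-training centroid; diversity = distance to the nearest
    sample already selected for labeling."""
    gap = np.linalg.norm(emb - source_centroid)
    if not selected_embs:
        return float(gap)
    diversity = min(np.linalg.norm(emb - s) for s in selected_embs)
    return float(gap + diversity)

def query_samples(embs, probs_list, fg_masks, source_centroid, budget):
    """Greedily pick `budget` samples maximizing DKD + ASD."""
    selected, selected_embs = [], []
    candidates = list(range(len(embs)))
    for _ in range(budget):
        scores = [
            dkd_score(embs[i], source_centroid, selected_embs)
            + asd_score(probs_list[i], fg_masks[i])
            for i in candidates
        ]
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        selected_embs.append(embs[best])
        candidates.remove(best)
    return selected
```

The greedy loop re-scores the remaining pool after each pick, so the diversity term naturally steers later selections away from samples that resemble ones already queried.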
Beyond active sampling, ASSFT employs a Selective Semi-supervised Fine-tuning strategy that further boosts performance by leveraging unlabeled target samples. Rather than using all pseudo-labeled data, the method incorporates only reliable unlabeled samples, selected by predictive confidence and semantic distance to the labeled set. This prevents noisy pseudo-labels from degrading the model and keeps semi-supervised training stable. The paper, published on arXiv (2509.10784v3), spans 19 pages with 6 figures and 8 tables. Although no code has been explicitly released yet, the framework is designed to be practical for clinical settings where volumetric medical images (MRI, CT) are abundant but expert annotations are scarce. By intelligently querying the most valuable samples and selectively using unlabeled data, ASSFT promises to reduce annotation costs significantly while improving segmentation accuracy.
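A hedged sketch of the pseudo-label filtering step, assuming a simple instantiation of the two criteria: confidence as the mean max-softmax probability of a sample's predicted segmentation, and semantic distance as the embedding distance to the nearest labeled sample. The function name, thresholds, and distance measure are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def select_reliable_unlabeled(probs_list, embs, labeled_embs,
                              conf_thresh=0.9, dist_thresh=2.0):
    """Keep only unlabeled samples whose pseudo-labels look
    trustworthy: mean max-softmax confidence above `conf_thresh`
    AND embedding distance to the nearest labeled sample below
    `dist_thresh` (thresholds are illustrative, not the paper's)."""
    keep = []
    for i, (probs, emb) in enumerate(zip(probs_list, embs)):
        conf = probs.max(axis=-1).mean()  # pseudo-label confidence
        dist = min(np.linalg.norm(emb - l) for l in labeled_embs)
        if conf >= conf_thresh and dist <= dist_thresh:
            keep.append(i)
    return keep
```

Only samples passing both checks would enter the semi-supervised loss, which is what keeps low-quality pseudo-labels from dominating training.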
- ASSFT uses two novel query metrics: DKD (knowledge gap + diversity) and ASD (anatomical complexity) to pick the most informative samples for fine-tuning.
- Selective semi-supervised learning filters pseudo-labels by confidence and semantic distance, avoiding noisy data.
- The method adapts medical vision foundation models without needing original source data or large annotated sets.
Why It Matters
Efficiently adapts medical AI to new imaging tasks with fewer annotations, accelerating clinical deployment and reducing expert workload.