FetalCLIP and USFM top benchmark for fetal ultrasound classification

arXiv eess.IV May 28, 2026

⚡FetalCLIP hits F1 0.9731 on out-of-domain fetal plane data

Deep Dive

A new study on arXiv provides the first comprehensive benchmark of ultrasound-specific foundation models for fetal plane classification. The work, led by Leya Barrientos and colleagues, tested four ultrasound foundation models (USFM, MOFO, UltraSAM, FetalCLIP) against two CNN baselines (ResNet50, EfficientNet-V2) and a vision transformer (DINOv3) pretrained on natural images. Every model was evaluated in two settings: full fine-tuning, and linear probing, in which the encoder is frozen and only a classifier on top of it is trained. Both settings used 5-fold patient-level cross-validation on a Spanish fetal ultrasound dataset. The models were also tested on an external African cohort to assess cross-population generalization.
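The patient-level splitting mentioned above can be illustrated with a minimal sketch. This is not the authors' code: the function name `patient_level_folds`, the toy patient IDs, and the round-robin fold assignment are all assumptions made for illustration. The only point it demonstrates is that folds are built from patients rather than from individual images, so no patient contributes images to both the training and the test side of a fold.

```python
import random
from collections import defaultdict


def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no patient appears on both sides.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Group image indices by the patient they belong to.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients (not images) so the split happens at the patient level.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    for k in range(n_folds):
        # Every n_folds-th patient, starting at offset k, goes to the test fold.
        test_patients = set(patients[k::n_folds])
        test_idx = [i for p in patients if p in test_patients for i in by_patient[p]]
        train_idx = [i for p in patients if p not in test_patients for i in by_patient[p]]
        yield train_idx, test_idx


# Toy data (assumed): 12 patients with 3 images each.
patient_ids = [f"patient_{i // 3}" for i in range(36)]
folds = list(patient_level_folds(patient_ids))
```

Splitting by image instead would let near-identical frames from one patient land on both sides of a fold, which tends to inflate the reported scores.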

FetalCLIP achieved the highest F1 scores in the low-data linear probing regime: 0.9261 in-domain and 0.9731 out-of-domain. USFM performed best under full fine-tuning, scoring 0.9476 in-domain and 0.9515 out-of-domain. In contrast, MOFO and UltraSAM degraded significantly in both settings, sometimes underperforming DINOv3, which was pretrained only on natural images. The findings underscore that pretraining objectives and data composition heavily influence transferability, with FetalCLIP's contrastive objective proving especially robust for cross-population generalization. The benchmark offers guidance for deploying AI in fetal ultrasound, particularly in low-resource and diverse clinical settings where annotated data is scarce.
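For readers unfamiliar with the metric, the sketch below shows how an F1 score for a multi-class task can be computed. The summary does not say how the per-class scores were averaged, so macro-averaging here is an assumption, and the plane labels in the example are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # missed instances of c
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for four images.
y_true = ["brain", "brain", "abdomen", "femur"]
y_pred = ["brain", "abdomen", "abdomen", "femur"]
score = macro_f1(y_true, y_pred)  # per-class F1: abdomen 2/3, brain 2/3, femur 1
```

Because each class counts equally in the macro average, a model that does poorly on a rare plane is penalized as much as one that does poorly on a common plane.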

Key Points

FetalCLIP achieved top linear probing F1 of 0.9261 in-domain and 0.9731 out-of-domain.
USFM led full fine-tuning with F1 of 0.9476 in-domain and 0.9515 out-of-domain.
MOFO and UltraSAM underperformed, sometimes worse than natural-image pretrained DINOv3.

Why It Matters

The results suggest that the best-performing models, such as FetalCLIP, could support accurate fetal plane classification across diverse populations with little labeled data.
