Learning to Segment using Summary Statistics and Weak Supervision
Researchers show that summary statistics plus a handful of pixels can train accurate segmentation models.
Medical experts often segment images by hand to obtain diagnostic statistics such as area or volume, then discard the segmentation masks and keep only the aggregated numbers. A new paper from researchers Omkar Kulkarni, Edward Raff, and Tim Oates proposes a method to train segmentation models from exactly this kind of leftover information. The key insight: summary statistics alone (e.g., the area of a tumor) are insufficient for learning, but combining them with just a few annotated pixels (the 'weak supervision' signal) dramatically improves results. The team designed a loss function with three components: image reconstruction quality, agreement with the summary statistics, and overlap between the predicted foreground and the weak pixel labels. Together, these terms let the model infer a full segmentation mask from minimal human input.
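To make the three-part loss concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the equal default weights, the squared-error form of the area term, and the use of a negative log-probability as a stand-in for the overlap term are all assumptions chosen for illustration. The toy example at the end shows why the statistic alone is ambiguous: two different masks with the same area score identically on the area term, and only the labeled pixels tell them apart.

```python
import math

def reconstruction_loss(image, recon):
    """Mean squared error between the input image and its reconstruction."""
    pairs = [(a, b) for row_a, row_b in zip(image, recon) for a, b in zip(row_a, row_b)]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)

def area_loss(prob_mask, target_area):
    """Squared gap between predicted and recorded area, as a fraction of the image.

    The predicted area is the sum of per-pixel foreground probabilities.
    """
    n_pixels = sum(len(row) for row in prob_mask)
    predicted_area = sum(sum(row) for row in prob_mask)
    return ((predicted_area - target_area) / n_pixels) ** 2

def weak_pixel_loss(prob_mask, fg_pixels, eps=1e-7):
    """Mean negative log-probability of foreground at the annotated pixels.

    Assumption: this stands in for the article's 'overlap' term.
    """
    return sum(-math.log(max(prob_mask[r][c], eps)) for r, c in fg_pixels) / len(fg_pixels)

def combined_loss(image, recon, prob_mask, target_area, fg_pixels, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are hypothetical."""
    w_rec, w_stat, w_weak = weights
    return (w_rec * reconstruction_loss(image, recon)
            + w_stat * area_loss(prob_mask, target_area)
            + w_weak * weak_pixel_loss(prob_mask, fg_pixels))

# Toy 4x4 image with a 2x2 bright object in the center.
image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]

# Two candidate masks with the same area (4 pixels) in different places.
mask_right = [row[:] for row in image]
mask_wrong = [[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]]

target_area = 4.0              # the recorded summary statistic
fg_pixels = [(1, 1), (2, 2)]   # two hand-labeled foreground pixels (row, col)

# The area term is 0.0 for both masks, so it cannot distinguish them.
print(area_loss(mask_right, target_area), area_loss(mask_wrong, target_area))

# The weak-pixel term penalizes only the misplaced mask.
print(weak_pixel_loss(mask_right, fg_pixels), weak_pixel_loss(mask_wrong, fg_pixels))
```

In a real training loop the mask and reconstruction would come from a network and the loss would be differentiated automatically; the pure-Python version here only illustrates how the terms interact.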
Experiments on standard image datasets, breast cancer ultrasound scans, and CT scans of kidney tumors support the method's effectiveness. The approach requires only the final statistics that radiologists already compute, plus a handful of manually labeled pixels (as few as 5–10). This could substantially reduce the annotation burden in medical imaging, where full segmentation masks are costly and time-consuming to produce. The paper (arXiv:2605.03059) suggests a practical path to scalable AI-assisted diagnosis, especially in resource-constrained settings where expert annotators are scarce.
- Trains segmentation models from summary statistics (e.g., area) instead of full pixel-level masks; the statistics alone are not enough to learn from.
- Adding just 5–10 weak supervisory pixels significantly improves performance over statistics alone.
- Novel loss function combines image reconstruction, statistic matching, and overlap with the weak pixel labels; tested on standard image datasets, breast cancer ultrasound scans, and kidney tumor CT scans.
Why It Matters
The method could sharply cut the cost of medical image labeling by training segmentation models from statistics doctors already calculate, plus a few labeled pixels.