NucEval: A Robust Evaluation Framework for Nuclear Instance Segmentation
New framework addresses vague regions, score normalization, overlaps, and border uncertainty.
Nuclear instance segmentation is a critical task in computational pathology, enabling downstream applications like cancer grading and prognosis. However, existing evaluation pipelines suffer from four fundamental flaws: ambiguous handling of vague regions (where ground truth is uncertain), inconsistent score normalization across models, failure to properly account for overlapping instances, and ignoring border uncertainty. A team led by Amirreza Mahbod proposes NucEval, a unified framework that systematically addresses each issue. By applying modifications such as region-based soft labeling, adaptive thresholding, and boundary-aware metrics, NucEval delivers more accurate and reproducible evaluations.
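To make the vague-region issue concrete, the sketch below shows one common way such regions can be excluded from scoring: pixels flagged as uncertain in the ground truth are simply masked out before computing an overlap metric. This is an illustrative assumption, not NucEval's actual implementation; the function name `masked_dice` and the exact metric definition are hypothetical.

```python
import numpy as np

def masked_dice(pred, gt, vague_mask):
    """Dice score computed only over pixels with confident ground truth.

    pred, gt: binary arrays of the same shape.
    vague_mask: boolean array, True where the annotation is uncertain.
    Illustrative sketch; NucEval's region handling may differ.
    """
    valid = ~vague_mask                      # keep only confidently-labeled pixels
    p = pred[valid].astype(bool)
    g = gt[valid].astype(bool)
    inter = np.logical_and(p, g).sum()
    denom = p.sum() + g.sum()
    return 2.0 * inter / denom if denom else 1.0

# A prediction that disagrees with the ground truth only inside a
# vague region scores perfectly once that region is masked out:
pred  = np.array([[1, 1, 0], [0, 1, 0]])
gt    = np.array([[1, 0, 0], [0, 1, 0]])
vague = np.array([[0, 1, 0], [0, 0, 0]], dtype=bool)
print(masked_dice(pred, gt, vague))                      # 1.0
print(masked_dice(pred, gt, np.zeros_like(vague)))       # 0.8
```

The toy example above shows why naive pipelines can penalize models for disagreements in regions where the annotation itself is unreliable.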
Validated on the NuInsSeg dataset (which includes challenging heterogeneous regions) and two additional external datasets, NucEval demonstrates significant metric shifts compared to traditional pipelines. The evaluation covers both CNN and vision transformer (ViT) architectures, showing that the framework is model-agnostic. The code, complete guidelines, and illustrative examples are publicly released to encourage adoption. For computational pathology practitioners, NucEval means more trustworthy benchmarks, reducing the risk of overestimating model performance in clinical-grade segmentation tasks.
- NucEval addresses four specific evaluation flaws: vague regions, score normalization, overlapping instances, and border uncertainty.
- Validated on three datasets (NuInsSeg plus two external) using CNN and ViT models, showing measurable metric improvements.
- Code and full guidelines are open-source on the arXiv-linked repository (arXiv:2605.03144).
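The border-uncertainty flaw in the list above can be illustrated with tolerance-based boundary matching: a predicted boundary pixel counts as correct if a ground-truth boundary pixel lies within a small distance, so one-pixel annotation jitter no longer tanks the score. The function `boundary_f1` and its Chebyshev-distance matching rule are assumptions for illustration, not the metric NucEval actually uses.

```python
import numpy as np

def boundary_f1(pred_b, gt_b, tol=1):
    """Boundary F1 with a pixel tolerance.

    pred_b, gt_b: boolean arrays marking boundary pixels.
    A pixel matches if a pixel from the other set lies within
    Chebyshev distance `tol`. Brute-force sketch for small masks.
    """
    P = np.argwhere(pred_b)
    G = np.argwhere(gt_b)
    if len(P) == 0 or len(G) == 0:
        return 0.0
    def hit(a, B):
        # True if any pixel in B is within `tol` of pixel a
        return np.any(np.max(np.abs(B - a), axis=1) <= tol)
    prec = np.mean([hit(p, G) for p in P])   # pred pixels near some gt pixel
    rec  = np.mean([hit(g, P) for g in G])   # gt pixels near some pred pixel
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# A boundary shifted by one pixel is a total miss at tol=0,
# but a perfect match once one pixel of tolerance is allowed:
gt_b   = np.zeros((4, 4), dtype=bool); gt_b[:, 2] = True
pred_b = np.zeros((4, 4), dtype=bool); pred_b[:, 3] = True
print(boundary_f1(pred_b, gt_b, tol=0))  # 0.0
print(boundary_f1(pred_b, gt_b, tol=1))  # 1.0
```

The gap between the two scores is exactly the kind of metric shift the paper reports when moving from strict to boundary-aware evaluation.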
Why It Matters
A reliable evaluation pipeline for pathology AI directly impacts the accuracy of clinical decision-making.