Biological Spatial Priors Regularize Foundation Model Representations for Cross-Site MSI Generalization in Colorectal Cancer
Spatial biology cues yield 1.000 MSS specificity on external TCGA-READ slides.
A new study from Dasari Naga Raju tackles a persistent challenge in computational pathology: MSI prediction models generalize poorly across medical institutions. Foundation models such as UNI2-h and Virchow2 provide powerful feature representations, but they still encode site-specific texture artifacts that hinder cross-site performance. The key innovation is a pair of tile-level spatial priors derived from known MSI histology: a peripheral distance encoding, which reflects the Crohn's-like lymphocytic reaction at the tumor invasive margin, and a local immune neighborhood encoding, which captures lymphocyte-to-tumor ratios.
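To make the two priors concrete, here is a minimal sketch of how such tile-level quantities could be computed on a grid of tiles. It is not the study's implementation: the boolean masks `tumor_mask` and `lymph_mask`, the function names, the square window, and the smoothing constant `eps` are all assumptions introduced for illustration.

```python
import numpy as np

def peripheral_distance(tumor_mask):
    """Distance (in tile units) from each tumor tile to the nearest
    non-tumor tile. Non-tumor tiles are assigned 0.
    Hypothetical helper; the study's exact encoding is not given here."""
    tumor_mask = np.asarray(tumor_mask, dtype=bool)
    dist = np.zeros(tumor_mask.shape, dtype=float)
    inside = np.argwhere(tumor_mask)     # (n_tumor, 2) tile coordinates
    outside = np.argwhere(~tumor_mask)   # (n_other, 2) tile coordinates
    if len(inside) == 0 or len(outside) == 0:
        return dist
    # Brute-force pairwise Euclidean distances, fine for small grids.
    diff = inside[:, None, :] - outside[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    dist[inside[:, 0], inside[:, 1]] = nearest
    return dist

def immune_neighborhood_ratio(lymph_mask, tumor_mask, radius=1, eps=1.0):
    """Ratio of lymphocyte-rich tiles to tumor tiles in a square window
    of side 2*radius + 1 around each tile. `eps` (an assumption) avoids
    division by zero where the window contains no tumor tiles."""
    lymph_mask = np.asarray(lymph_mask, dtype=bool)
    tumor_mask = np.asarray(tumor_mask, dtype=bool)
    h, w = tumor_mask.shape
    ratio = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            r0, r1 = max(0, i - radius), min(h, i + radius + 1)
            c0, c1 = max(0, j - radius), min(w, j + radius + 1)
            n_lymph = lymph_mask[r0:r1, c0:c1].sum()
            n_tumor = tumor_mask[r0:r1, c0:c1].sum()
            ratio[i, j] = n_lymph / (n_tumor + eps)
    return ratio

# Toy 5x5 grid: a 3x3 tumor block with lymphocyte-rich tiles along the top row.
tumor = np.zeros((5, 5), dtype=bool)
tumor[1:4, 1:4] = True
lymph = np.zeros((5, 5), dtype=bool)
lymph[0, :] = True

margin_distance = peripheral_distance(tumor)
immune_ratio = immune_neighborhood_ratio(lymph, tumor, radius=1)
```

In this toy grid, tumor tiles on the edge of the block sit one tile from the margin and the central tile sits two tiles away, while the immune ratio is highest for tiles adjacent to the lymphocyte-rich row.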
When injected into a TransMIL aggregator before self-attention, these biological priors act as regularizers, guiding the transformer toward site-invariant morphological features. With the model trained on 137 TCGA-COAD slides and tested on 50 TCGA-READ slides without fine-tuning, the peripheral distance prior achieved an MSI AUC of 0.959 ± 0.012 on COAD and an MSS specificity of 1.000 on READ, compared with 0.957 AUC and 0.939 specificity for the strongest reference configuration. The local immune neighborhood prior showed comparable internal accuracy but lower cross-site specificity, suggesting that margin proximity encodes a more universal biological signal than local immune density. The work indicates that biologically grounded spatial priors can reduce dependence on site-specific imaging patterns, a step toward more robust clinical deployment of AI-based MSI screening.
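The idea of injecting a prior before self-attention can be sketched as follows. This is an illustration under stated assumptions, not the study's architecture: the sinusoidal encoding of the scalar prior, the additive injection, the `scale` parameter, and the function names are all hypothetical, and plain single-head softmax attention with identity projections stands in for TransMIL's actual attention layers.

```python
import numpy as np

def sinusoidal_encoding(prior, dim, max_period=10000.0):
    """Map one scalar prior per tile to a `dim`-dimensional vector.
    `dim` must be even. The encoding choice is an assumption."""
    assert dim % 2 == 0, "dim must be even"
    prior = np.asarray(prior, dtype=float)
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)  # (half,)
    angles = prior[:, None] * freqs[None, :]                      # (N, half)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def inject_prior(features, prior, scale=1.0):
    """Add the encoded prior to the tile embeddings, so the attention
    layer that follows sees biology-aware tokens."""
    features = np.asarray(features, dtype=float)
    return features + scale * sinusoidal_encoding(prior, features.shape[1])

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity
    query/key/value projections (a simplified stand-in)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x

# Toy bag of 6 tiles with 8-dimensional embeddings; random values stand in
# for foundation-model features, and the distances are made up.
rng = np.random.default_rng(0)
tile_features = rng.normal(size=(6, 8))
margin_distance = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 5.0])

tokens = inject_prior(tile_features, margin_distance)
attended = self_attention(tokens)  # shape (6, 8)
```

Because the prior is added before attention, two tiles with similar texture features but different margin distances produce different tokens, which is one way a spatial signal could influence which tiles the aggregator attends to.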
- Peripheral distance encoding, which reflects the Crohn's-like lymphocytic reaction at the tumor invasive margin, achieves 1.000 MSS specificity on the external TCGA-READ dataset
- Method integrates spatial priors into a TransMIL aggregator with UNI2-h or Virchow2 foundation models, trained on only 137 TCGA-COAD slides
- Local immune neighborhood encoding shows comparable internal AUC but lower cross-site generalization, highlighting margin proximity as more site-invariant
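For readers less familiar with the two metrics quoted above, the sketch below shows how AUC and MSS specificity are defined at the slide level. The labels and scores are invented toy values, not data from the study, and the 0.5 decision threshold is an assumption.

```python
import numpy as np

def specificity(y_true, y_pred):
    """TN / (TN + FP), treating MSS as the negative class (label 0)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tn / (tn + fp)

def auc(y_true, scores):
    """Rank-based AUC: the probability that a randomly chosen MSI slide
    (label 1) scores above a randomly chosen MSS slide; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

# Toy example: 3 MSS slides (0) and 2 MSI slides (1).
labels = np.array([0, 0, 0, 1, 1])
msi_scores = np.array([0.1, 0.4, 0.2, 0.8, 0.3])
predictions = (msi_scores >= 0.5).astype(int)  # assumed threshold

print(specificity(labels, predictions))  # 1.0: no MSS slide is called MSI
print(auc(labels, msi_scores))           # 5 of 6 MSI/MSS pairs ranked correctly
```

A specificity of 1.000 therefore means that no MSS slide in the test set was flagged as MSI at the chosen threshold; it says nothing by itself about how many MSI slides were missed.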
Why It Matters
Could make MSI screening from routine pathology slides more reliable across hospitals, reducing reliance on costly molecular testing.