A Controlled Benchmark of Visual State-Space Backbones with Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation
A controlled study reveals that visual state-space models like VMamba struggle with boundary delineation under domain shift.
A research team from Sri Lanka has published a controlled benchmark study evaluating visual state-space models (SSMs) like VMamba, MambaVision, and Spatial-Mamba for the specific task of remote-sensing semantic segmentation. Published on arXiv and accepted for IEEE IGARSS 2026, the study's core innovation is its strict experimental control: it isolates the encoder's performance by using a unified 4-stage feature interface and a fixed lightweight decoder across all tests. This methodology, applied to the LoveDA and ISPRS Potsdam datasets, provides a clear, apples-to-apples comparison of these emerging Mamba-based architectures against traditional CNN and Vision Transformer baselines.
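The control described above can be made concrete with a small sketch. The snippet below is not the authors' code; it is a hypothetical NumPy illustration (`toy_backbone`, `fixed_decoder` are invented names) of the experimental design: every encoder is adapted to emit the same 4-stage feature pyramid at strides 4/8/16/32, and a single fixed lightweight decoder consumes that pyramid, so swapping the backbone is the only variable between runs.

```python
import numpy as np

def toy_backbone(img, dims=(32, 64, 128, 256)):
    """Stand-in encoder (hypothetical): emits the unified 4-stage pyramid at
    strides 4/8/16/32 via average pooling plus a random channel projection.
    Any real backbone (VMamba, a CNN, a ViT) would be wrapped to the same
    output contract so the decoder never changes."""
    rng = np.random.default_rng(0)
    feats, x = [], img
    for i, d in enumerate(dims):
        s = 4 if i == 0 else 2                      # stage stride
        c, h, w = x.shape
        x = x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))  # downsample
        proj = rng.standard_normal((d, c)) / np.sqrt(c)
        x = np.einsum("oc,chw->ohw", proj, x)       # channel mixing
        feats.append(x)
    return feats  # [C1@1/4, C2@1/8, C3@1/16, C4@1/32]

def fixed_decoder(feats, num_classes=7, width=64, seed=1):
    """The one lightweight decoder reused across all backbones: project each
    stage to a common width, upsample to the stride-4 grid, sum, classify."""
    rng = np.random.default_rng(seed)
    h, w = feats[0].shape[1:]
    fused = np.zeros((width, h, w))
    for f in feats:
        c = f.shape[0]
        p = rng.standard_normal((width, c)) / np.sqrt(c)
        g = np.einsum("oc,chw->ohw", p, f)
        r = h // g.shape[1]                         # nearest-neighbour upsample
        fused += g.repeat(r, axis=1).repeat(r, axis=2)
    head = rng.standard_normal((num_classes, width)) / np.sqrt(width)
    logits = np.einsum("oc,chw->ohw", head, fused)
    return logits.repeat(4, axis=1).repeat(4, axis=2)  # back to input size

img = np.random.default_rng(2).standard_normal((3, 64, 64))
feats = toy_backbone(img)
print([f.shape for f in feats])  # [(32, 16, 16), (64, 8, 8), (128, 4, 4), (256, 2, 2)]
logits = fixed_decoder(feats)    # num_classes=7 mirrors LoveDA's 7 classes
print(logits.shape)              # (7, 64, 64)
```

Because the decoder, its width, and the pyramid contract are frozen, any mIoU difference between two runs is attributable to the encoder alone, which is the point of the paper's design.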
The benchmark yielded three critical findings. First, scaling up SSM models within a family (intra-family scaling) provided only modest performance gains. Second, the models exhibited strongly asymmetric cross-domain generalization: transfer from one dataset to the other was markedly stronger in one direction than in the reverse. Most significantly, the dominant failure mode identified was poor boundary delineation under distribution shift, a specific robustness weakness. While visual SSMs demonstrated favorable accuracy-efficiency trade-offs, the researchers conclude that future architectural improvements should prioritize robustness-oriented design and boundary-aware decoding strategies over merely making the encoder backbone larger.
- The study provides the first strictly controlled benchmark of visual SSMs (VMamba, MambaVision, Spatial-Mamba) for remote-sensing segmentation, isolating encoder effects.
- Key finding: poor boundary delineation is the dominant failure mode for these models under domain shift, a weakness that overall accuracy metrics can mask.
- Results suggest future Mamba-based vision model development should focus on robustness and boundary-aware decoders, not just encoder scaling.
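The boundary-delineation finding presupposes a way to score boundaries separately from interior pixels. The paper's exact metric is not reproduced here; below is a minimal boundary-F1 sketch in the spirit of standard boundary-quality measures, assuming a Chebyshev tolerance band of 2 px (the function names and the tolerance are this sketch's choices, not the authors').

```python
import numpy as np

def mask_boundary(mask):
    """Boolean map of pixels whose 4-neighbourhood contains a different label."""
    b = np.zeros(mask.shape, dtype=bool)
    dv = mask[:-1, :] != mask[1:, :]   # vertical label changes
    dh = mask[:, :-1] != mask[:, 1:]   # horizontal label changes
    b[:-1, :] |= dv; b[1:, :] |= dv
    b[:, :-1] |= dh; b[:, 1:] |= dh
    return b

def dilate(b, radius):
    """Binary dilation by `radius` pixels (8-connected, i.e. Chebyshev ball)."""
    out = b.copy()
    for _ in range(radius):
        p = np.pad(out, 1)
        out = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2]
               | p[1:-1, 2:] | p[:-2, :-2] | p[:-2, 2:] | p[2:, :-2] | p[2:, 2:])
    return out

def boundary_f1(pred, gt, tol=2):
    """F1 over boundary pixels, with matches allowed within `tol` pixels."""
    pb, gb = mask_boundary(pred), mask_boundary(gt)
    if not pb.any() or not gb.any():
        return 1.0 if pb.sum() == gb.sum() else 0.0
    precision = (pb & dilate(gb, tol)).sum() / pb.sum()
    recall = (gb & dilate(pb, tol)).sum() / gb.sum()
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

gt = np.zeros((32, 32), dtype=int); gt[8:24, 8:24] = 1
shifted = np.roll(gt, 5, axis=1)       # 5-px drift, beyond the 2-px band
print(boundary_f1(gt, gt))             # 1.0
print(boundary_f1(shifted, gt) < 1.0)  # True
```

A metric of this shape makes the paper's failure mode visible: a prediction can keep high pixel accuracy on large regions while its boundary-F1 collapses under domain shift, which is why the authors argue for boundary-aware decoding rather than bigger encoders.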
Why It Matters
The study gives developers building efficient AI for satellite imagery, agriculture, and urban planning a concrete roadmap for where to focus engineering effort: on robustness and boundary quality rather than on ever-larger backbones.