USEMA: Hybrid UNet with Mamba-Like Attention Boosts Medical Image Segmentation

New model is more efficient than full self-attention transformers and segments more accurately than purely convolutional and Mamba-based models on medical imaging tasks.

Deep Dive

USEMA is a hybrid UNet architecture that combines the local feature extraction of CNNs with SEMA (Scalable Efficient Mamba-like Attention). SEMA localizes tokens through local window attention and uses arithmetic averaging, which avoids the quadratic complexity of full self-attention. In the reported experiments, USEMA is more computationally efficient than transformer models that use full self-attention, and it segments more accurately than purely convolutional and Mamba-based models.
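
To make the mechanism concrete, here is a minimal sketch of attention restricted to local windows, with an arithmetic-mean term added on top. It is an illustration under stated assumptions, not the authors' implementation: the function name `windowed_attention_with_mean`, the non-overlapping 1D windows, and the choice to add the mean of all value vectors to the local output are all hypothetical. The summary above does not specify how SEMA combines the two parts.

```python
import math


def _softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def windowed_attention_with_mean(q, k, v, window):
    """Hypothetical sketch: local window attention plus a global mean term.

    q, k, v: lists of n token vectors (each a list of d floats).
    window:  number of tokens per non-overlapping window (an assumption).
    """
    n, d = len(q), len(q[0])

    # Arithmetic average of all value vectors: one O(n * d) pass.
    mean_v = [sum(token[c] for token in v) / n for c in range(d)]

    out = []
    for start in range(0, n, window):
        idx = range(start, min(start + window, n))
        for i in idx:
            # Each token only scores the tokens inside its own window,
            # so the cost is O(n * window), not O(n^2).
            scores = [_dot(q[i], k[j]) / math.sqrt(d) for j in idx]
            weights = _softmax(scores)
            local = [sum(w * v[j][c] for w, j in zip(weights, idx))
                     for c in range(d)]
            # Assumption: local and global terms are simply added.
            out.append([l + g for l, g in zip(local, mean_v)])
    return out


if __name__ == "__main__":
    tokens = [[0.1 * i, 0.2 * i] for i in range(8)]
    result = windowed_attention_with_mean(tokens, tokens, tokens, window=4)
    print(len(result), "output tokens of dimension", len(result[0]))
```

The point of the sketch is the loop structure: the softmax never sees more than `window` scores at a time, and the only operation that touches every token is a plain average.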

Key Points
  • USEMA uses SEMA (Scalable Efficient Mamba-like Attention), which relies on local window attention and arithmetic averaging to avoid the quadratic complexity of standard self-attention.
  • The hybrid architecture merges CNN feature extraction with efficient attention, achieving better segmentation accuracy than purely convolutional or Mamba-based models.
  • Evaluated on multiple medical imaging modalities (CT, MRI, etc.), with better computational efficiency than vision transformers that use full self-attention.
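
The efficiency argument in the points above comes down to counting query-key pairs. The sketch below does that arithmetic for full self-attention versus non-overlapping local windows. The helper `attention_pair_counts`, the token count, and the window size are illustrative assumptions, not figures from the paper.

```python
def attention_pair_counts(n_tokens, window):
    """Count query-key pairs scored by full vs. windowed attention.

    Hypothetical helper for illustration; assumes non-overlapping windows,
    with a shorter final window when n_tokens is not a multiple of window.
    """
    full = n_tokens * n_tokens  # every token attends to every token
    local = 0
    for start in range(0, n_tokens, window):
        size = min(window, n_tokens - start)
        local += size * size  # tokens attend only within their window
    return full, local


if __name__ == "__main__":
    # Example numbers only: a 64 x 64 feature map flattened to 4096 tokens.
    full, local = attention_pair_counts(n_tokens=4096, window=64)
    print(full, local, full // local)  # 16777216 262144 64
```

With these example numbers, windowing cuts the scored pairs by a factor of 64. More generally, when the window size is fixed, the windowed count grows linearly with the number of tokens rather than quadratically.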

Why It Matters

Faster, more accurate medical image segmentation could enable real-time diagnostics and reduce hardware requirements for clinical AI.