PathRWKV: Enhancing Whole Slide Image Inference with Asymmetric Recurrent Modeling
A recurrent RWKV-based model cuts GPU inference memory to a constant while beating 11 SOTA methods.
PathRWKV introduces a novel approach to processing whole-slide pathology images (WSIs), which are far too large to load onto a GPU directly. Traditional two-stage multiple instance learning (MIL) frameworks decouple tile feature extraction from slide-level modeling but suffer from four critical flaws: a trade-off between training and inference memory, overfitting on small datasets, loss of spatial structure, and poor handling of multi-scale features. To address these, the authors propose an asymmetric structure with max-pooling aggregation that enables parallelized, high-throughput training and recurrent inference with constant (O(1)) memory complexity. They also incorporate 2D sinusoidal positional encoding to retain spatial context, random tile sampling for data diversity, and a multi-task learning module to regularize feature learning.
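The 2D sinusoidal positional encoding mentioned above can be sketched as follows. This is a generic construction (the standard extension of 1D sinusoidal encoding to a grid), not the paper's exact implementation: half the channels encode the tile's row coordinate and half its column coordinate, so spatial position survives even after tiles are flattened into a sequence. The function name and channel layout are assumptions for illustration.

```python
import numpy as np

def sinusoidal_2d_positional_encoding(rows, cols, dim):
    """2D sinusoidal positional encoding (a common construction, assumed
    here): the first dim/2 channels encode the row coordinate, the last
    dim/2 the column coordinate, each with sin/cos pairs at geometrically
    spaced frequencies."""
    assert dim % 4 == 0, "dim must be divisible by 4"
    half = dim // 2
    # Geometric frequency schedule, as in the original Transformer encoding.
    freqs = 1.0 / (10000 ** (np.arange(0, half, 2) / half))  # (half/2,)
    r = np.arange(rows)[:, None] * freqs[None, :]            # (rows, half/2)
    c = np.arange(cols)[:, None] * freqs[None, :]            # (cols, half/2)
    pe = np.zeros((rows, cols, dim))
    pe[:, :, 0:half:2]    = np.sin(r)[:, None, :]  # row channels, sin
    pe[:, :, 1:half:2]    = np.cos(r)[:, None, :]  # row channels, cos
    pe[:, :, half::2]     = np.sin(c)[None, :, :]  # column channels, sin
    pe[:, :, half + 1::2] = np.cos(c)[None, :, :]  # column channels, cos
    return pe
```

Each tile's encoding is then added to (or concatenated with) its extracted feature vector before slide-level modeling, restoring the grid structure that tile extraction discards.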
The model integrates TimeMix and ChannelMix modules for dynamic multi-scale feature modeling along the temporal (tile-sequence) and channel dimensions. In extensive experiments spanning 29,073 WSIs from 11 datasets, covering a range of cancer types and staining protocols, PathRWKV outperformed 11 state-of-the-art methods on 10 of those datasets. This makes it a scalable, memory-efficient candidate for clinical pathology workflows, where GPU resources are often limited and real-time inference is desirable.
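To make the recurrent-inference idea concrete, here is a heavily simplified RWKV-style TimeMix block run in recurrent mode. This is a sketch under stated assumptions, not PathRWKV's actual formulation: the weight matrices, the scalar decay, and the token-shift mixing coefficient are all hypothetical, and the exponential-decay weighted average stands in for the full RWKV recurrence. The key property it demonstrates is that the per-step state (a numerator, a denominator, and the previous token) has a fixed size, so memory stays constant regardless of how many tiles the slide contains.

```python
import numpy as np

def time_mix_recurrent(x_seq, Wr, Wk, Wv, Wo, decay, mix):
    """Simplified RWKV-style TimeMix in recurrent mode (illustrative only).
    State is O(1) in sequence length: two running vectors plus the
    previous token."""
    dim = Wk.shape[0]
    num = np.zeros(dim)    # decayed weighted sum of values
    den = np.zeros(dim)    # decayed normalizer
    prev = np.zeros(x_seq.shape[1])  # previous token, for token-shift
    outs = []
    for x in x_seq:
        xm = mix * x + (1 - mix) * prev            # token-shift mixing
        r = 1.0 / (1.0 + np.exp(-(Wr @ xm)))       # receptance gate
        k = np.exp(Wk @ xm)                        # positive key weights
        v = Wv @ xm
        num = np.exp(-decay) * num + k * v         # exponential-decay update
        den = np.exp(-decay) * den + k
        outs.append(Wo @ (r * num / (den + 1e-9))) # gated weighted average
        prev = x
    return np.stack(outs)
```

During training the same computation can be unrolled and parallelized over the sequence; the asymmetry in the paper's design is exactly this choice of a parallel training form and a recurrent inference form of one model.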
- Asymmetric recurrent structure achieves O(1) memory complexity during inference, enabling processing on GPUs with limited VRAM
- Outperforms 11 SOTA methods on 10 out of 11 datasets covering 29,073 whole-slide images
- Incorporates 2D sinusoidal positional encoding to restore the spatial structure lost in tile-based MIL approaches
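The O(1)-memory claim in the first bullet follows directly from max-pooling aggregation: an element-wise maximum can be updated one tile at a time, so only a single running vector ever lives in memory. A minimal sketch (the function name and streaming interface are assumptions; tile features would in practice be streamed from disk):

```python
import numpy as np

def aggregate_streaming(tile_feature_iter, dim):
    """Recurrent-inference sketch: consume tile feature vectors one at a
    time and keep only a running element-wise max. Memory is O(1) in the
    number of tiles, matching max-pooling over the full slide."""
    state = np.full(dim, -np.inf)
    for feat in tile_feature_iter:       # tiles streamed, never all resident
        state = np.maximum(state, feat)  # running max == slide-level max pool
    return state
```

The result is identical to stacking all tile features and max-pooling them at once, but a slide with tens of thousands of tiles needs no more GPU memory than a slide with ten.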
Why It Matters
Enables scalable, real-time AI-assisted cancer diagnosis on standard hardware, reducing barriers for clinical adoption.