P-SWA speeds neural video decoding 36% with parallel wavefronts
Sliding window attention gets a parallel boost, slashing latency and improving compression.
Most neural video codecs rely on temporal conditioning, which causes error propagation across long sequences. Transformer-based architectures like the Video Compression Transformer (VCT) avoid this drift but suffer from high computational cost and inferior rate-distortion (RD) performance. The recent Sliding Window Attention (SWA) method reduces complexity and improves RD performance, but it forces strictly sequential raster-scan decoding, creating a latency bottleneck. Researchers Alexander Kopte and André Kaup have now introduced P-SWA (Parallel Sliding Window Attention), which uses diagonal wavefronts to break the sequential dependency and enable parallel decoding. To make this possible, the design embeds a hyperprior and an accumulator that fuses side information with local spatial context.
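To make the diagonal-wavefront idea concrete, here is a minimal Python sketch of a generic wavefront schedule over a grid of positions. It is not the authors' implementation: the function name `wavefront_schedule`, the `lag` parameter, and the assumption that each position depends only on its left and top neighbors are illustrative choices, and the actual dependency pattern in P-SWA may differ.

```python
from collections import defaultdict


def wavefront_schedule(rows, cols, lag=1):
    """Group grid positions into diagonal wavefronts.

    Assumption (illustrative, not from the paper): position (r, c) may be
    decoded once its left and top neighbors are done. Assigning it to step
    c + lag * r satisfies that for any lag >= 1. All positions that share
    a step can be decoded in parallel.
    """
    if rows <= 0 or cols <= 0 or lag < 1:
        raise ValueError("rows and cols must be positive, lag must be >= 1")
    steps = defaultdict(list)
    for r in range(rows):
        for c in range(cols):
            steps[c + lag * r].append((r, c))
    # Return wavefronts in decoding order, skipping step indices with no positions.
    return [steps[t] for t in sorted(steps)]


schedule = wavefront_schedule(rows=3, cols=4)
for step, positions in enumerate(schedule):
    print(step, positions)
```

Under these assumptions, a 3 x 4 grid needs 6 parallel steps instead of the 12 steps a raster scan would take. In general the step count drops from `rows * cols` to `cols + lag * (rows - 1)`, which is where the latency reduction of a wavefront scheme comes from.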
In experiments, P-SWA achieves a 36% decoding speed increase over the parallel VCT baseline while delivering Bjøntegaard Delta-rate savings of 10.0% for I-frames and 7.1% for P-frames compared to the sequential SWA baseline. The paper has been accepted for ICIP 2026 and is available on arXiv. For professionals working on video streaming, real-time communications, or edge deployment, P-SWA represents a practical step toward fast, drift-free neural video decoding without sacrificing compression efficiency.
- P-SWA uses diagonal wavefronts to enable parallel decoding of transformer-based neural video codecs.
- Decoding speed increases by 36% over the parallel VCT baseline.
- BD-rate savings reach 10.0% for I-frames and 7.1% for P-frames compared to the sequential SWA baseline.
Why It Matters
Faster neural video decoding without a loss in compression efficiency means lower latency for real-time streaming and edge AI applications.