ChWDTA: Wavelet-transform attention cuts image compression bitrate by up to 22.6%
Hybrid CNN-transformer LIC gets double-digit BD-rate reductions via channel-wise wavelet decomposition.
A new paper from Haisheng Fu and colleagues introduces ChWDTA (Channel-wise Wavelet-Domain Transformer Attention), a technique that injects wavelet transforms into both the attention and entropy-coding stages of learned image compression (LIC). Unlike previous hybrid CNN-transformer backbones that rely solely on spatial self-attention, ChWDTA applies a channel-wise wavelet transform to the feature maps before computing the Q, K, V projections. This sparsifies the channel covariance while preserving the efficient windowed spatial tokenization already used in modern LIC backbones. After attention, the output is mapped back via the inverse wavelet transform; together these steps form a Channel-wise Wavelet-Domain Transformer Block (ChWDTB). The design improves rate-distortion performance without increasing the computational complexity of the windowed spatial attention.
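The block structure described above (channel-wise transform, windowed attention, inverse transform) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the one-level Haar filter, the single-head attention, the random projection matrices, and all function names and tensor shapes are hypothetical choices made for the example.

```python
import numpy as np

def haar_channels(x):
    """One-level orthonormal Haar transform along the last (channel) axis.
    x has shape (..., C) with even C; output is [low band | high band]."""
    even, odd = x[..., 0::2], x[..., 1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return np.concatenate([low, high], axis=-1)

def inverse_haar_channels(y):
    """Exact inverse of haar_channels."""
    c = y.shape[-1] // 2
    low, high = y[..., :c], y[..., c:]
    x = np.empty_like(y)
    x[..., 0::2] = (low + high) / np.sqrt(2.0)
    x[..., 1::2] = (low - high) / np.sqrt(2.0)
    return x

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def wavelet_domain_window_attention(tokens, wq, wk, wv):
    """tokens: (num_windows, tokens_per_window, C).
    Assumed simplification: single head, no bias, no residual or MLP."""
    z = haar_channels(tokens)                # channel-wise wavelet transform
    q, k, v = z @ wq, z @ wk, z @ wv         # Q, K, V in the wavelet domain
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores) @ v                # attention within each window
    return inverse_haar_channels(out)        # map back to the feature domain

rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 64, 32))    # 4 windows of 8x8 tokens, 32 channels
wq, wk, wv = (rng.standard_normal((32, 32)) / np.sqrt(32) for _ in range(3))
out = wavelet_domain_window_attention(tokens, wq, wk, wv)
roundtrip = inverse_haar_channels(haar_channels(tokens))
```

Because the transform acts only along the channel axis, the window partition and the number of tokens per window are unchanged, which is consistent with the claim that the spatial attention cost does not grow.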
On the entropy-coding side, the authors introduce a channel-wise wavelet packet (ChWP) decomposition that produces four equal-sized subbands, each better suited to slice-based autoregressive entropy modeling. Splitting each subband into two slices gives eight slices in total for entropy coding. On the Kodak, CLIC Professional Validation, and Tecnick test sets, the method reduces BD-rate by 17.82%, 19.15%, and 22.56%, respectively. Even when each subband is encoded as a single slice (four slices in total), most of the gains are retained at lower complexity. The results support integrating wavelet transforms into CNN-transformer LIC pipelines, offering a practical path to more efficient image compression for streaming and storage applications.
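To make the slice arithmetic concrete, here is a minimal sketch of a two-level wavelet packet split along the channel axis, followed by the division into slices. The article only states that there are four equal-sized subbands and two slices per subband; the Haar filter, the two-level depth, the channels-last layout, the 320-channel latent, and the helper names are assumptions for illustration, and the autoregressive entropy model that would consume the slices is not shown.

```python
import numpy as np

def haar_split(x):
    """Split the last (channel) axis into orthonormal Haar low/high bands."""
    even, odd = x[..., 0::2], x[..., 1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def channel_wavelet_packet(y):
    """Two-level packet decomposition: both the low and the high band are
    split again, giving four subbands with C/4 channels each."""
    low, high = haar_split(y)
    ll, lh = haar_split(low)
    hl, hh = haar_split(high)
    return [ll, lh, hl, hh]

def make_slices(subbands, slices_per_subband=2):
    """Cut every subband into equal channel slices for slice-based coding."""
    slices = []
    for band in subbands:
        slices.extend(np.split(band, slices_per_subband, axis=-1))
    return slices

rng = np.random.default_rng(0)
latent = rng.standard_normal((16, 16, 320))   # hypothetical H x W x C latent
subbands = channel_wavelet_packet(latent)     # 4 subbands of 80 channels
slices = make_slices(subbands)                # 8 slices of 40 channels
four_slices = make_slices(subbands, slices_per_subband=1)
```

Setting `slices_per_subband=1` corresponds to the lower-complexity variant in which each subband is coded as one slice.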
- ChWDTA applies a channel-wise wavelet transform to the feature maps before the Q/K/V projections in windowed self-attention, sparsifying the channel covariance.
- Achieves BD-rate reductions of 17.82%, 19.15%, and 22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets.
- Uses 8-slice entropy coding via wavelet packet subbands; coding each subband as a single slice retains most of the gains at lower cost.
Why It Matters
Better image compression means lower bandwidth and storage costs for streaming services and cloud storage.