Combined Dictionary Unfolding Network with Gradient-Adaptive Fidelity for Transferable Multi-Source Fusion
Outperforms second-best by 1.59 dB PSNR on RoadScene with 4x lower memory footprint.
Existing deep unfolding networks for multi-source image fusion suffer from high computational and memory overhead because they iteratively update the features of each modality separately. A new paper from researchers at multiple Chinese institutions introduces CDNet, a Combined Dictionary Unfolding Network that reworks this architecture from the ground up. Instead of alternating minimization, CDNet encodes a coupled dictionary learning prior into a joint unfolding module called CDBlock, whose block-sparse interaction topology performs a model-derived joint update of the common and modality-specific representations, streamlining feature learning and sharply reducing complexity. The network is trained without ground-truth fused images, using a compact High- and Low-frequency Image Fidelity (HLIF) loss, which makes CDNet both efficient and practical for real-world deployment.
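To make the joint update concrete, here is a minimal NumPy sketch of one ISTA-style iteration under a coupled dictionary model, in which each source is modeled as a common component plus a modality-specific one. The dictionaries `Dc`, `D1`, `D2`, the step size, and the threshold are illustrative assumptions, not the paper's learned parameters; CDNet unrolls a learned version of such a step rather than this hand-tuned one.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def joint_ista_step(x1, x2, Dc, D1, D2, zc, z1, z2, step=0.1, tau=0.01):
    """One joint update of the common code zc and the modality-specific
    codes z1, z2 under a coupled dictionary model:
        x1 ~ Dc @ zc + D1 @ z1,   x2 ~ Dc @ zc + D2 @ z2.
    All three codes advance together from the current residuals, instead
    of the alternating per-modality passes used by prior unfolding nets."""
    r1 = x1 - Dc @ zc - D1 @ z1
    r2 = x2 - Dc @ zc - D2 @ z2
    # Gradient step on the joint least-squares data term, then shrinkage.
    zc_new = soft_threshold(zc + step * Dc.T @ (r1 + r2), tau)
    z1_new = soft_threshold(z1 + step * D1.T @ r1, tau)
    z2_new = soft_threshold(z2 + step * D2.T @ r2, tau)
    return zc_new, z1_new, z2_new

# Toy demo: a few unrolled iterations shrink the reconstruction error.
rng = np.random.default_rng(0)
n, k = 16, 8
Dc, D1, D2 = (rng.standard_normal((n, k)) / np.sqrt(n) for _ in range(3))
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
zc = z1 = z2 = np.zeros(k)
for _ in range(50):
    zc, z1, z2 = joint_ista_step(x1, x2, Dc, D1, D2, zc, z1, z2)
err = np.linalg.norm(x1 - Dc @ zc - D1 @ z1) + np.linalg.norm(x2 - Dc @ zc - D2 @ z2)
```

Unrolling a fixed number of such joint steps, with the dictionaries and thresholds made learnable, is the general pattern behind a CDBlock-style architecture.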
CDNet was evaluated on four tasks: multi-exposure fusion, infrared-visible fusion, medical image fusion, and infrared-visible fusion for semantic segmentation. Across all tasks, it achieved competitive or superior performance while being significantly more efficient. Specifically, CDNet outperformed the second-best method in PSNR by 1.23 dB on the TNO infrared-visible dataset and by 1.59 dB on RoadScene, where it also led on five of six metrics. The lightweight design means CDNet can run on resource-constrained edge devices, opening up real-time fusion applications in surveillance, autonomous navigation, and medical imaging.
- CDNet uses a block-sparse interaction topology that jointly updates common and modality-specific features, unlike separate updates in prior deep unfolding methods.
- Achieves 1.23 dB and 1.59 dB PSNR gains over second-best on TNO and RoadScene infrared-visible datasets, respectively.
- Unsupervised training with the High- and Low-frequency Image Fidelity loss removes the need for ground-truth fused images; combined with the lightweight design, this enables deployment on edge devices.
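The summary does not give the exact form of the High- and Low-frequency Image Fidelity loss, but the idea of enforcing fidelity separately in base and detail layers can be sketched. The version below is one plausible instantiation under stated assumptions: a box-filter base/detail decomposition, with the fused base layer tracking the sources' average and the fused detail layer tracking whichever source has stronger detail. The filter, targets, and weighting are illustrative, not the paper's formulation.

```python
import numpy as np

def box_blur(img, k=3):
    """Simple k x k box filter used here as a stand-in low-pass filter."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += p[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out / (k * k)

def hlif_loss(fused, src1, src2, k=3):
    """Hypothetical high-/low-frequency fidelity loss (the paper's exact
    HLIF formulation may differ): split each image into a low-frequency
    base layer and a high-frequency detail residual, then penalize the
    fused image's deviation from per-band targets."""
    low_f, low1, low2 = (box_blur(x, k) for x in (fused, src1, src2))
    high_f, high1, high2 = fused - low_f, src1 - low1, src2 - low2
    # Low-frequency fidelity: follow the sources' average base layer.
    l_low = np.mean((low_f - 0.5 * (low1 + low2)) ** 2)
    # High-frequency fidelity: keep whichever source has stronger detail.
    target_high = np.where(np.abs(high1) >= np.abs(high2), high1, high2)
    l_high = np.mean((high_f - target_high) ** 2)
    return l_low + l_high

# Toy demo: an average-fusion result scores better than an all-zero image.
rng = np.random.default_rng(1)
src1, src2 = rng.random((8, 8)), rng.random((8, 8))
loss = hlif_loss(0.5 * (src1 + src2), src1, src2)
```

Because every target is computed from the source images themselves, a loss of this shape needs no ground-truth fused image, which is what makes fully unsupervised training possible.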
Why It Matters
Enables high-quality multi-source image fusion on resource-constrained edge devices, expanding real-time applications in surveillance and medical imaging.