Research & Papers

M2Retinexformer boosts low-light images with depth and semantic cues

Combines depth, luminance, and semantic features for clearer night photos

Deep Dive

Low-light image enhancement remains challenging due to noise, artifacts, and color distortion. Existing deep learning methods, particularly Retinex-based ones, rely solely on RGB information. A new paper by Youssef Aboelwafa and colleagues introduces M2Retinexformer (Multi-Modal Retinexformer), a framework that incorporates depth, luminance, and semantic features into a progressive refinement pipeline. Depth provides geometry invariant to lighting, luminance guides brightness distribution, and semantic features improve scene understanding.

M2Retinexformer extracts these modalities at multiple scales and fuses them via cross-attention with adaptive gating that balances illumination-guided self-attention and cross-attention based on cue reliability. Evaluated on four benchmarks (LOL, SID, SMID, SDSD), it surpasses Retinexformer and recent state-of-the-art methods. The paper has been accepted at IEEE ICIP 2026, and the code and pretrained weights are publicly available.
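The gated fusion idea can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (attention, gated_fusion), the single-head formulation, the sigmoid gate computed from the concatenated branch outputs, and the way the illumination map scales the values are all assumptions made for illustration, and the paper's actual layers may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention.
    # q: (N, d) queries; k, v: (M, d) keys and values.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def gated_fusion(rgb_tokens, cue_tokens, illum, w_gate, b_gate):
    """Blend a self-attention branch and a cross-attention branch per token.

    rgb_tokens: (N, d) image features
    cue_tokens: (M, d) features from an auxiliary cue (depth, luminance, semantics)
    illum:      (N, 1) illumination weights (hypothetical stand-in for an illumination map)
    w_gate:     (2 * d, 1) gate weights; b_gate: scalar gate bias (both hypothetical)
    """
    # Self-attention branch: values are scaled by illumination (assumed form).
    self_out = attention(rgb_tokens, rgb_tokens, rgb_tokens * illum)
    # Cross-attention branch: image tokens query the auxiliary cue.
    cross_out = attention(rgb_tokens, cue_tokens, cue_tokens)
    # Per-token gate in (0, 1); higher means "trust the cue more".
    logits = np.concatenate([self_out, cross_out], axis=-1) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-logits))
    fused = gate * cross_out + (1.0 - gate) * self_out
    return fused, gate

# Usage with random stand-in features.
rng = np.random.default_rng(0)
N, M, d = 6, 4, 8
rgb = rng.normal(size=(N, d))
cue = rng.normal(size=(M, d))
illum = rng.uniform(0.1, 1.0, size=(N, 1))
w_gate = rng.normal(size=(2 * d, 1)) * 0.1
fused, gate = gated_fusion(rgb, cue, illum, w_gate, 0.0)
```

In this sketch, a gate near 0 leaves a token close to its illumination-guided self-attention output, while a gate near 1 leans on the auxiliary cue. In a trained model the gate parameters would be learned, so the balance could shift toward whichever branch is more reliable for a given region.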

Key Points
  • Integrates depth, luminance, and semantic features beyond RGB-only Retinexformer
  • Uses multi-scale cross-attention with adaptive gating for dynamic modality fusion
  • Outperforms prior art on LOL, SID, SMID, and SDSD low-light benchmarks

Why It Matters

By drawing on cues beyond RGB, this approach could yield sharper, more color-accurate low-light images for photography, surveillance, and autonomous driving.