Image & Video

KD-NVC accelerates neural video decoding to 69 FPS at 1080p

New search-and-distill framework delivers real-time 1080p decoding on edge GPUs

Deep Dive

Neural video coding (NVC) has substantially improved compression efficiency, but its high computational complexity keeps it too slow for real-time decoding on edge devices. To close this gap, researchers from multiple institutions propose KD-NVC, a search-and-distill framework that systematically accelerates NVC models. The first stage is an acceleration-efficiency-based neural architecture search (AE-NAS) algorithm. Unlike traditional NAS, which treats the entire codec uniformly, AE-NAS explores a Pareto frontier for each module and adaptively allocates the acceleration budget across heterogeneous sub-modules (e.g., motion estimation, residual coding, entropy models). This avoids wasteful uniform reductions and finds Pareto-optimal lightweight architectures without training every candidate from scratch.
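The idea of spending an acceleration budget unevenly across modules can be illustrated with a small sketch. This is not the AE-NAS algorithm itself, whose details the summary does not give. It is a simple greedy heuristic that repeatedly takes the step with the best latency saving per unit of rate-distortion loss. The function name `allocate_budget`, the module names, and all numbers in `FRONTIERS` are hypothetical and chosen only for illustration.

```python
def allocate_budget(frontiers, target_saving_ms):
    """Greedily pick one Pareto point per module until the latency target is met.

    frontiers: dict mapping module name -> list of (saving_ms, rd_loss) points,
               sorted by increasing saving, with (0.0, 0.0) as the first entry.
    Returns (choice, total_saving_ms, total_rd_loss), where choice maps each
    module name to the index of its selected point.
    """
    choice = {name: 0 for name in frontiers}
    total_saving = 0.0
    total_loss = 0.0
    while total_saving < target_saving_ms:
        best = None
        for name, points in frontiers.items():
            i = choice[name]
            if i + 1 >= len(points):
                continue  # this module is already at its lightest variant
            d_save = points[i + 1][0] - points[i][0]
            d_loss = points[i + 1][1] - points[i][1]
            # Acceleration efficiency: latency saved per unit of RD loss.
            efficiency = d_save / max(d_loss, 1e-9)
            if best is None or efficiency > best[0]:
                best = (efficiency, name, d_save, d_loss)
        if best is None:
            break  # no module can be reduced further
        _, name, d_save, d_loss = best
        choice[name] += 1
        total_saving += d_save
        total_loss += d_loss
    return choice, total_saving, total_loss


# Hypothetical per-module frontiers: (latency saved in ms, RD loss in arbitrary units).
FRONTIERS = {
    "motion_estimation": [(0.0, 0.0), (4.0, 0.5), (6.0, 2.0)],
    "residual_coding": [(0.0, 0.0), (3.0, 1.0), (5.0, 3.0)],
    "entropy_model": [(0.0, 0.0), (2.0, 0.2), (3.0, 1.5)],
}

choice, saving, loss = allocate_budget(FRONTIERS, target_saving_ms=8.0)
print(choice, saving, loss)
```

With these made-up numbers, the heuristic takes one step in every module rather than pushing a single module to its lightest variant, because the second step in each module costs far more quality per millisecond saved.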

The second stage introduces an energy-aware feature distillation (EFD) loss that aligns spatially aggregated feature-energy signatures between the teacher and student codecs. Traditional distillation overlooks the sparsity that the rate constraint induces in feature maps, which is critical for compression quality. EFD explicitly transfers these sparsity patterns, preserving the teacher's rate-distortion performance. In the reported experiments, KD-NVC consistently outperforms existing codec-oriented distillation methods. On an RTX 5060 GPU, the distilled student decoder reaches 69 FPS at 1080p while maintaining compression quality comparable to VTM-LDB (the VVC reference software in its low-delay B configuration, a strong traditional codec baseline). This suggests that carefully designed acceleration can unlock real-time neural video decoding on consumer hardware without sacrificing compression efficiency.
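One plausible reading of "spatially aggregated feature-energy signatures" is a per-channel energy vector: the mean squared activation over each channel's spatial positions. The sketch below builds that vector for teacher and student and penalizes the difference. The exact EFD formulation is not given in this summary, so the normalization step, the squared-error distance, and the assumption that both features have the same number of channels are illustrative choices. The function names are hypothetical.

```python
def channel_energy(feature):
    """Mean squared activation per channel, aggregated over all spatial positions.

    feature: list of channels, each a list of rows (H x W floats).
    """
    energies = []
    for channel in feature:
        values = [v for row in channel for v in row]
        energies.append(sum(v * v for v in values) / len(values))
    return energies


def normalize(energies, eps=1e-12):
    """Scale energies to sum to (almost exactly) 1 so only their distribution matters."""
    total = sum(energies) + eps
    return [e / total for e in energies]


def efd_loss(teacher_feature, student_feature):
    """Mean squared difference between normalized channel-energy signatures.

    Assumes teacher and student features have the same number of channels.
    """
    t = normalize(channel_energy(teacher_feature))
    s = normalize(channel_energy(student_feature))
    return sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)


# Toy 2-channel, 2x2 features: the teacher is sparse (second channel is all zeros).
teacher = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
dense_student = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
sparse_student = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]

print(efd_loss(teacher, dense_student))   # large: student ignores the sparsity
print(efd_loss(teacher, sparse_student))  # near zero: same sparsity pattern
```

In this toy setup, a student that spreads energy evenly across channels is penalized, while a student that keeps the teacher's inactive channel inactive is not, even though its activations differ in scale.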

Key Points
  • AE-NAS finds Pareto-optimal architectures per heterogeneous sub-module, avoiding uniform reduction and costly full training.
  • Energy-aware feature distillation (EFD) transfers rate-induced feature sparsity, preserving compression quality in the lightweight student.
  • Achieves 69 FPS 1080p decoding on an RTX 5060 with rate-distortion performance comparable to VTM-LDB, a traditional codec baseline.

Why It Matters

KD-NVC brings real-time neural video decoding to consumer GPUs, narrowing the gap between compression efficiency and practical deployment.