Image & Video

A Mamba-based Perceptual Loss Function for Learning-based UGC Transcoding

A new AI model uses the Mamba architecture to make compressed TikTok and YouTube videos look better.

Deep Dive

A research team including Zihao Qi, Chen Feng, and David Bull has published a paper introducing a novel AI-driven approach to improving video compression for user-generated content (UGC). The core problem they address is that platforms like TikTok, YouTube, and Instagram must transcode videos that are already degraded from prior compression, editing, or poor capture conditions. Traditional compression methods that optimize for pixel-perfect fidelity to this flawed source end up preserving and even amplifying its artifacts. The team's key innovation is a new perceptual loss function that redefines the role of the reference video, treating it not as a ground-truth target but as a contextual guide for what the final video should look like.
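In spirit, this swaps the usual pixel-wise distortion term for a quality score conditioned on the reference. The following is a minimal PyTorch sketch of that idea under stated assumptions: `ToyQualityModel`, its `embed`/`context` interface, and `perceptual_rd_loss` are hypothetical stand-ins for illustration, not the paper's actual components.

```python
import torch
import torch.nn as nn

class ToyQualityModel(nn.Module):
    """Hypothetical stand-in for the learned quality model: it embeds
    the (possibly degraded) reference and scores the decoded frame
    against that context. Illustrative only -- the paper's model is
    Mamba-based, not a two-layer convnet."""
    def __init__(self, ch=16):
        super().__init__()
        self.embed = nn.Conv2d(3, ch, 3, padding=1)
        self.head = nn.Conv2d(2 * ch, 1, 3, padding=1)

    def forward(self, decoded, context):
        feats = self.embed(decoded)
        joint = torch.cat([feats, context], dim=1)
        return self.head(joint).mean(dim=(1, 2, 3))  # per-clip score

def perceptual_rd_loss(decoded, reference, bits, model, lam=0.01):
    """Rate-distortion objective where the reference is context, not a
    pixel target: artifacts present only in the degraded reference
    earn no reward."""
    with torch.no_grad():
        ctx = model.embed(reference)       # reference used as a guide
    score = model(decoded, context=ctx)    # higher = better perceptual quality
    return bits + lam * (-score.mean())    # classic R + lambda*D trade-off
```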

To build this function, the researchers trained a lightweight neural quality model based on a Selective Structured State-Space Model, commonly known as Mamba, an architecture gaining traction for its efficiency on long sequences. The model was optimized with a weakly supervised Siamese ranking strategy, teaching it to rank clips by relative perceptual quality. When integrated into the rate-distortion optimization (RDO) process of two leading neural video codecs, DCVC and HiNeRV, the system achieved significant coding gains: BD-rate savings of 8.46% over the autoencoder-based baseline (DCVC) and 12.89% over the implicit neural representation baseline (HiNeRV), meaning it delivers the same perceived video quality at a substantially lower bitrate.
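The ranking idea itself is easy to picture: one shared-weight model scores two versions of a clip, and a hinge loss penalises it whenever the weakly labelled better clip is not ranked higher. A minimal sketch, assuming simple pairwise labels and an arbitrary margin (both illustrative choices, not the paper's exact training setup):

```python
import torch

def siamese_ranking_loss(score_better, score_worse, margin=0.5):
    """Hinge-style margin ranking loss: penalise the quality model
    whenever the weakly labelled 'better' clip does not score at
    least `margin` above the 'worse' one."""
    return torch.clamp(margin - (score_better - score_worse), min=0.0).mean()

# Siamese usage: the same shared-weight model scores both clips.
#   s_a = quality_model(clip_a)
#   s_b = quality_model(clip_b)
#   loss = siamese_ranking_loss(s_a, s_b)  # clip_a weakly labelled better
```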

The practical impact is substantial for any service handling massive volumes of UGC. This technology enables more efficient storage and bandwidth use while actually improving the end-user's viewing experience, as the compression focuses on perceptual quality rather than artifact replication. It represents a shift from purely mathematical fidelity to a more human-centric approach in video encoding, leveraging modern AI architectures like Mamba to solve a pervasive problem in digital media infrastructure.

Key Points
  • Uses a Mamba-based neural model to judge perceptual quality, achieving BD-rate savings of up to 12.89% (the BD-rate metric itself is sketched after this list).
  • Redefines the reference video as a guide, not a target, preventing the replication of source artifacts.
  • Integrated into neural codecs DCVC and HiNeRV, enabling more efficient compression for platforms like TikTok and YouTube.
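For context, BD-rate is the standard Bjøntegaard metric: fit each codec's rate-quality curve in the log-rate domain, integrate the gap over the shared quality range, and express it as an average bitrate change. A common NumPy implementation (not code from the paper) looks like this:

```python
import numpy as np

def bd_rate(rate_anchor, q_anchor, rate_test, q_test):
    """Bjontegaard delta-rate: average % bitrate change of the test
    codec vs. the anchor at equal quality (needs >= 4 RD points per
    codec; a negative result means bitrate savings)."""
    # Fit cubic polynomials to log-rate as a function of quality.
    p_a = np.polyfit(q_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(q_test, np.log(rate_test), 3)
    # Integrate both fits over the overlapping quality interval.
    lo = max(min(q_anchor), min(q_test))
    hi = min(max(q_anchor), max(q_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    # Difference of average log-rates, converted back to a percentage.
    return (np.exp((int_t - int_a) / (hi - lo)) - 1) * 100
```

A result of -12.89%, for example, means the test codec delivers the same quality at roughly 12.89% less bitrate on average, which is how the savings quoted above should be read.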

Why It Matters

Lets social media and streaming platforms save significant bandwidth and storage costs while making user-uploaded videos look better.