
On Optimizing Image Codecs for VMAF NEG: Analysis, Issues, and a Robust Loss Proposal

A new paper reveals how AI-trained image codecs can still 'cheat' a key perceptual quality metric.

Deep Dive

A team of researchers, including Florian Fingscheidt, Alexander Karabutov, and Elena Alshina, has published a critical analysis of VMAF NEG, a variant of Netflix's VMAF (Video Multi-Method Assessment Fusion) metric. VMAF is widely used to assess perceptual quality and, increasingly, to evaluate and train learned image and video codecs. The paper, 'On Optimizing Image Codecs for VMAF NEG: Analysis, Issues, and a Robust Loss Proposal,' finds that although the NEG ('no enhancement gain') variant was designed to be harder to inflate than standard VMAF, it remains vulnerable to adversarial optimization. In practice, machine-learned codecs fine-tuned to maximize VMAF NEG can still produce images that score well but look worse to humans. The underlying problem is exemplified by enhancement tricks such as unsharp masking, which can raise the score while degrading actual visual quality.
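
To make the idea of 'gaming' a metric concrete, here is a minimal toy sketch in Python. It assumes a 1-D row of pixel values and a hypothetical score, toy_score, that rewards edge contrast relative to the reference. It is not VMAF or VMAF NEG, and all names are illustrative. Sharpening a perfect copy of the reference with unsharp masking raises the uncapped toy score above that of the perfect copy, even though the pixel error goes up. The optional gain_limit clips that gain at 1.0, loosely mimicking the 'no enhancement gain' idea behind VMAF NEG. The paper's point is that a learned codec can still find exploits that such limiting does not catch, which a toy like this cannot reproduce.

```python
# Toy illustration only: a 1-D "image" row, a box blur, unsharp masking,
# and a hypothetical detail-ratio score standing in for a
# sharpness-sensitive quality feature. None of this is VMAF.

def box_blur(x, radius=1):
    """Average each sample with its neighbours (window truncated at edges)."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def unsharp_mask(x, amount=1.0, radius=1):
    """Sharpen by adding back the difference between x and its blur."""
    blurred = box_blur(x, radius)
    return [xi + amount * (xi - bi) for xi, bi in zip(x, blurred)]

def detail_energy(x):
    """Sum of absolute differences between neighbouring samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def toy_score(ref, test, gain_limit=None):
    """Hypothetical score: detail energy of test relative to ref.

    With gain_limit=None the ratio is uncapped, so "detail" beyond the
    reference is rewarded. gain_limit=1.0 clips that gain, loosely
    mimicking the no-enhancement-gain idea behind VMAF NEG.
    """
    ratio = detail_energy(test) / detail_energy(ref)
    return ratio if gain_limit is None else min(ratio, gain_limit)

def mse(a, b):
    """Mean squared pixel error between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

reference = [10, 10, 10, 50, 50, 50, 20, 20, 20]
perfect = list(reference)            # lossless reconstruction
sharpened = unsharp_mask(reference)  # "enhanced" reconstruction

print(toy_score(reference, perfect), toy_score(reference, sharpened))
print(toy_score(reference, sharpened, gain_limit=1.0), mse(reference, sharpened))
```

Without the limit, the sharpened copy outscores the perfect one despite a nonzero MSE; with gain_limit=1.0 it no longer does.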

The researchers' key contribution is a proposed 'robust loss' function that incorporates VMAF NEG in a way that mitigates these vulnerabilities. With it, either the encoder or the decoder of a learned codec can be fine-tuned on a dataset and genuinely benefit, preserving the metric's high correlation with human perception while preventing the model from 'gaming' the score. The authors support the proposal with quantitative results and perceptual examples. The work addresses a foundational issue in training next-generation media codecs: if the models that compress our digital media are optimized against a metric that can be gamed, they end up serving the number rather than what viewers actually see.
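
The summary does not give the exact form of the authors' robust loss, so the sketch below shows only a generic, commonly used remedy for metric gaming, not the paper's method: anchoring a gameable score with a pixel-fidelity penalty. A hypothetical one-parameter 'decoder post-filter' sharpens a blurry reconstruction by amount, and a grid search picks the amount that maximizes toy_score minus fidelity_weight times MSE. The toy score, data, and weight are assumptions for illustration.

```python
# Hypothetical one-parameter "decoder post-filter": sharpen a blurry
# reconstruction by `amount`, then pick the amount that maximises an
# objective. Generic fidelity-anchored loss, NOT the paper's robust loss;
# the toy score and data are illustrative, not VMAF NEG.

def box_blur(x, radius=1):
    """Average each sample with its neighbours (window truncated at edges)."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def unsharp_mask(x, amount):
    """Sharpen by adding back `amount` times the high-pass residual."""
    blurred = box_blur(x)
    return [xi + amount * (xi - bi) for xi, bi in zip(x, blurred)]

def detail_energy(x):
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def toy_score(ref, test):
    # Uncapped detail ratio: rewards any extra "detail", so it is gameable.
    return detail_energy(test) / detail_energy(ref)

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def best_amount(ref, decoded, fidelity_weight):
    """Grid-search the sharpening amount maximising
    toy_score - fidelity_weight * MSE (weight 0 means metric only)."""
    grid = [i * 0.25 for i in range(17)]  # 0.0, 0.25, ..., 4.0

    def objective(a):
        out = unsharp_mask(decoded, a)
        return toy_score(ref, out) - fidelity_weight * mse(ref, out)

    return max(grid, key=objective)

reference = [10, 10, 10, 50, 50, 50, 20, 20, 20]
decoded = box_blur(reference)  # stand-in for a blurry codec output

print(best_amount(reference, decoded, fidelity_weight=0.0))
print(best_amount(reference, decoded, fidelity_weight=0.05))
```

With fidelity_weight=0.0 the search runs to the largest sharpening amount on the grid, because the uncapped score keeps rising; with fidelity_weight=0.05 it settles on a small amount (0.5 here). The weight trades metric gain against fidelity and would need tuning in any real setting.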

Key Points
  • Shows that VMAF NEG, when used as a training objective for learned codecs, can still be 'attacked' so that visually degraded images receive high scores.
  • Proposes a new robust loss function to properly leverage VMAF NEG for fine-tuning encoder or decoder neural networks.
  • Aims to close the gap between metric scores and true human perception, which is critical for reliable AI-based media compression.

Why It Matters

Helps ensure that AI-compressed images and videos are optimized for human eyes rather than benchmark scores, improving real-world quality.