Kelvin v1.0 neural pre-encoder slashes H.264 bitrate 27.6% while staying standards-compliant
A lightweight learned preprocessor that cuts libx264 bitrate by 27.6% at matched VMAF (BD-VMAF) without breaking decoders.
Marco Graziano's Kelvin v1.0 is a lightweight neural pre-encoder that sits in front of an unmodified libx264 encoder. The model applies content-adaptive pixel adjustments bounded at ±1/255 per channel, steering the encoder toward a more perceptually efficient bit allocation while still emitting a fully standard H.264 bitstream compatible with all existing decoders, players, and CDNs. On the 7-sequence 1080p UVG benchmark, Kelvin achieves a mean BD-VMAF of -27.62% (7 of 7 wins) and a mean BD-VMAF-NEG of -5.18% (6 of 7 wins) relative to baseline libx264 at preset medium. On the larger 30-sequence MCL-JCV set (28 sequences unseen during training), the same checkpoint wins on 28 of 30 clips by BD-VMAF; with the two diagnosed failures excluded, the mean BD-VMAF is a consistent -27.70%.
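To make the ±1/255 bound concrete, here is a minimal NumPy sketch. It assumes frames are normalized to [0, 1] and that the bound is enforced by hard clipping; the function name `apply_bounded_adjustment` and the clipping choice are illustrative assumptions, since the summary does not say how Kelvin enforces the bound.

```python
import numpy as np

# Bound from the summary: at most 1/255 per channel, in normalized units.
MAX_DELTA = 1.0 / 255.0


def apply_bounded_adjustment(frame, raw_delta):
    """Add a per-pixel adjustment whose magnitude never exceeds 1/255.

    frame:     float array in [0, 1], shape (H, W, C)
    raw_delta: float array of the same shape, e.g. an unbounded model output
               (hypothetical input; not Kelvin's actual interface)
    """
    frame = np.asarray(frame, dtype=np.float64)
    # Hard-clip the adjustment to the allowed range (assumed mechanism).
    delta = np.clip(np.asarray(raw_delta, dtype=np.float64), -MAX_DELTA, MAX_DELTA)
    # Keep the result a valid image; this can only shrink the adjustment.
    return np.clip(frame + delta, 0.0, 1.0)
```

Because the change is at most one 8-bit code value per channel, the adjusted frame is visually near-identical to the source; any gain has to come from how the encoder responds to it, not from visible filtering.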
A core engineering challenge is that the H.264 encoder is non-differentiable, so gradients cannot flow through it during training. Kelvin works around this with a hybrid codec proxy that combines a calibrated differentiable rate estimator (Spearman ρ = 0.986 vs. real libx264 bits-per-pixel) with a U-Net distortion proxy trained on real encoder outputs. The paper publishes full per-sequence rate-distortion data, a named failure-mode taxonomy on MCL-JCV (rate-floor violation, distribution shift, metric saturation), and a five-baseline sanity panel (hqdn3d, unsharp, -tune psnr, -tune ssim, x265 medium). Notably, x265 medium beats Kelvin on every metric on the same corpus, so Kelvin is positioned specifically for workloads where remaining on H.264 is a constraint rather than a choice.
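The Spearman ρ quoted for the rate estimator measures how well the proxy's predicted bits-per-pixel rank-orders clips compared with real libx264 output. A minimal, dependency-free sketch of that check follows; the helper names are illustrative and the sample numbers in the usage note are made up, not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(predicted, actual):
    """Spearman rho is the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(actual))
```

For example, `spearman_rho([0.02, 0.05, 0.11, 0.30], [0.03, 0.04, 0.15, 0.28])` compares hypothetical proxy estimates with hypothetical measured bits-per-pixel. A rank-based metric suits this use because the proxy only needs to order candidate outputs by cost correctly, not match the encoder's absolute bit counts.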
- Kelvin v1.0 achieves -27.62% mean BD-VMAF on the UVG benchmark (7/7 wins) and wins on 28/30 MCL-JCV clips, with -27.70% mean BD-VMAF once the two diagnosed failures are excluded, relative to libx264 medium.
- Uses a hybrid codec proxy, combining a calibrated differentiable rate estimator (Spearman ρ = 0.986 vs. real libx264 bits-per-pixel) with a U-Net distortion proxy, to work around H.264's non-differentiability.
- Designed for scenarios where H.264 is mandatory; x265 medium still outperforms Kelvin on all metrics on the same corpus.
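For readers unfamiliar with the headline metric, the sketch below shows one common way to compute a Bjøntegaard-delta rate from two rate-quality curves. It assumes that "BD-VMAF" here means BD-rate with VMAF on the quality axis, and it uses the classic cubic-polynomial fit in the log-rate domain; the paper's exact procedure (interpolation method, rate points) is not given in this summary, so treat the function `bd_rate` as illustrative.

```python
import numpy as np


def bd_rate(rate_ref, qual_ref, rate_test, qual_test):
    """Average bitrate difference in percent at equal quality.

    Negative values mean the test encoder needs fewer bits than the reference.
    Each curve needs at least four (rate, quality) points for a cubic fit.
    """
    qual_ref = np.asarray(qual_ref, dtype=float)
    qual_test = np.asarray(qual_test, dtype=float)
    log_ref = np.log(np.asarray(rate_ref, dtype=float))
    log_test = np.log(np.asarray(rate_test, dtype=float))

    # Fit log(rate) as a cubic polynomial in quality for each curve.
    poly_ref = np.polyfit(qual_ref, log_ref, 3)
    poly_test = np.polyfit(qual_test, log_test, 3)

    # Integrate both fits over the quality range the curves share.
    lo = max(qual_ref.min(), qual_test.min())
    hi = min(qual_ref.max(), qual_test.max())
    int_ref = np.polyint(poly_ref)
    int_test = np.polyint(poly_test)
    area_ref = np.polyval(int_ref, hi) - np.polyval(int_ref, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)

    # Mean log-rate gap, converted back to a percentage.
    mean_log_diff = (area_test - area_ref) / (hi - lo)
    return (np.exp(mean_log_diff) - 1.0) * 100.0
```

As a sanity check, feeding in a test curve whose bitrate is exactly 0.75 times the reference at every quality point should return about -25%. A per-sequence score like this, averaged over a corpus, is what a figure such as -27.62% would summarize under the stated assumption.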
Why It Matters
Kelvin delivers substantial bitrate savings at matched perceptual quality on existing H.264 infrastructure without a codec upgrade, which makes it a fit for CDN-bound workflows.