ANVIL: Accelerator-Native Video Interpolation via Codec Motion Vector Priors
New AI method achieves 12.8 ms 1080p inference on Snapdragon 8 Gen 3, enabling real-time 60fps playback from 30fps video.
Shibo Liu's ANVIL paper introduces a new approach to real-time video frame interpolation on mobile devices. Existing learned methods for converting 30fps video to 60fps struggle on mobile neural processing units (NPUs) because of three barriers: spatial sampling operators exceed the 33.3 ms frame budget, iterative flow refinement breaks down under 8-bit quantization, and memory-bound operators dominate the inference graph. ANVIL addresses all three by taking motion estimation off the accelerator.
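The 33.3 ms figure follows from the input rate: doubling 30fps to 60fps inserts one synthesized frame between every pair of decoded frames, so each interpolated frame must be ready within one input interval of 1000 / 30 ms. The sketch below only does that arithmetic; the function names are hypothetical, and it assumes a strictly sequential pipeline that compares a single latency figure against the budget.

```python
def frame_budget_ms(input_fps: float) -> float:
    """Time available to produce one interpolated frame when doubling input_fps.

    Doubling the frame rate inserts one synthesized frame per input interval,
    so the budget is simply the input frame period in milliseconds.
    """
    return 1000.0 / input_fps


def headroom_ms(latency_ms: float, input_fps: float = 30.0) -> float:
    """Positive: the stage fits in the budget. Negative: a strictly sequential
    pipeline would miss the deadline for that frame."""
    return frame_budget_ms(input_fps) - latency_ms


if __name__ == "__main__":
    print(f"budget at 30 fps input: {frame_budget_ms(30.0):.1f} ms")
    # Figures reported for ANVIL on Snapdragon 8 Gen 3.
    for label, ms in [("network inference", 12.8), ("median end-to-end", 28.4)]:
        print(f"{label}: {ms} ms, headroom {headroom_ms(ms):.1f} ms")
```

With the reported figures, this gives a 33.3 ms budget, about 20.5 ms of headroom for network inference alone, and about 4.9 ms for the median end-to-end latency. A median says nothing about tail latency, and pipelined stages can overlap, so this is only a first-order check.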
Instead of learning optical flow from scratch, ANVIL reuses the motion vectors that the H.264 video decoder already exposes to pre-align the input frames. This removes learned optical flow, spatial sampling, and iterative accumulation from the accelerator graph entirely. The remaining work is handled by a convolution-dominated network built almost entirely from compute-bound operators, which run far more efficiently on mobile NPUs.
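To illustrate what "pre-aligning with decoder motion vectors" can mean, here is a minimal sketch that warps two frames halfway toward each other using one vector per block. It is not ANVIL's implementation. It assumes whole-pixel vectors on a fixed block grid (real H.264 vectors are quarter-pel and attached to variable-size partitions), a convention where a vector (dy, dx) for a block in frame1 points to the matching content in frame0, and a motion field that is locally constant around each output pixel. The names prealign_to_midpoint and _sample are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame indexed as frame[y][x]
MV = Tuple[int, int]     # (dy, dx) in whole pixels (simplifying assumption)


def _sample(frame: Frame, y: int, x: int) -> int:
    """Read a pixel, clamping coordinates to the frame border."""
    h, w = len(frame), len(frame[0])
    return frame[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def prealign_to_midpoint(frame0: Frame, frame1: Frame,
                         mvs: List[List[MV]], block: int = 16) -> Tuple[Frame, Frame]:
    """Warp both frames toward t = 0.5 using one motion vector per block.

    Assumed convention: mvs[by][bx] = (dy, dx) means content at p in frame1
    sits at p + (dy, dx) in frame0. The vector is looked up at the output
    position, i.e. the motion field is treated as locally constant.
    """
    h, w = len(frame1), len(frame1[0])
    aligned0 = [[0] * w for _ in range(h)]
    aligned1 = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = mvs[y // block][x // block]
            hy, hx = dy // 2, dx // 2  # half of the displacement, on frame1's side
            # Midpoint pixel came from p in frame1 and p + mv in frame0.
            aligned1[y][x] = _sample(frame1, y - hy, x - hx)
            aligned0[y][x] = _sample(frame0, y + dy - hy, x + dx - hx)
    return aligned0, aligned1


if __name__ == "__main__":
    # 8x8 frames, 4x4 blocks: a bright pixel moves from column 5 (frame0) to column 1 (frame1).
    f0 = [[0] * 8 for _ in range(8)]
    f0[2][5] = 255
    f1 = [[0] * 8 for _ in range(8)]
    f1[2][1] = 255
    mvs = [[(0, 4), (0, 0)], [(0, 0), (0, 0)]]  # only the top-left block moves
    a0, a1 = prealign_to_midpoint(f0, f1, mvs, block=4)
    print(a0[2][3], a1[2][3])  # prints: 255 255
```

After alignment both frames place the pixel at column 3, the midpoint. The aligned frame0 also keeps a stale copy at column 5, because that block's vector is zero; block-level residuals of this kind are the sort of error a downstream convolutional network would be expected to correct.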
On a Snapdragon 8 Gen 3 device, ANVIL runs 1080p network inference in 12.8 ms at 8-bit integer (INT8) precision, and its open-source Android player sustains a median end-to-end latency of 28.4 ms per interpolated frame pair, inside the 33.3 ms budget. The system logged 54,623 consecutive samples during 30 minutes of continuous playback, indicating stable sustained operation. The paper also identifies quantized accumulation in recurrent flow states as the main mechanism behind the failure of iterative flow-refinement methods under integer quantization.
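A toy example shows one way re-quantizing a recurrent state at every step can go wrong. This is an illustration of the general effect, not the paper's analysis: the quantizer is a plain symmetric INT8 quantize-then-dequantize, and the step size and update values are made up. When each refinement update is smaller than half a quantization step, rounding discards it, so the state never moves.

```python
def fake_quant_int8(x: float, scale: float) -> float:
    """Symmetric INT8 quantize-then-dequantize of a scalar."""
    q = max(-128, min(127, round(x / scale)))
    return q * scale


def accumulate(deltas, scale=None):
    """Sum iterative updates. If scale is given, re-quantize the state after
    every step, as an INT8 recurrent flow state would be."""
    state = 0.0
    for d in deltas:
        state += d
        if scale is not None:
            state = fake_quant_int8(state, scale)
    return state


if __name__ == "__main__":
    deltas = [0.1] * 12  # twelve small refinement updates (toy values)
    step = 0.25          # hypothetical quantization step for the flow state
    exact = accumulate(deltas)
    once = fake_quant_int8(exact, step)
    per_step = accumulate(deltas, scale=step)
    print(f"float: {exact:.2f}  quantize once: {once:.2f}  quantize every step: {per_step:.2f}")
```

Quantizing the final float result once costs only 0.05 (1.20 becomes 1.25), but quantizing after every iteration leaves the state stuck at 0.00, because each 0.1 update rounds away against a 0.25 step. Removing iterative accumulation from the graph, as ANVIL does, sidesteps this failure mode entirely.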
The current design targets H.264 playback, where the decoder exposes motion vectors. Within that scope, it makes high-quality video frame interpolation practical for everyday mobile use. By working with mobile hardware constraints rather than against them, ANVIL enables smooth 60fps playback of standard 30fps content on modern high-refresh-rate displays.
- Reuses H.264 decoder motion vectors instead of learning optical flow, removing learned flow, spatial sampling, and iterative accumulation from the accelerator graph
- Achieves 12.8 ms 1080p inference on Snapdragon 8 Gen 3 with 8-bit integer precision
- Open-source Android player sustains 28.4 ms median end-to-end latency per interpolated frame pair across 30 minutes of continuous playback
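The compute-bound versus memory-bound distinction drawn earlier can be made concrete with a back-of-the-envelope arithmetic-intensity estimate (FLOPs per byte of memory traffic). The sketch below is a simplified roofline-style model, not a measurement: it assumes each tensor crosses DRAM exactly once, ignores on-chip tiling and caching, and uses a hypothetical INT8 feature-map shape rather than any layer from ANVIL.

```python
def conv3x3_intensity(h: int, w: int, c_in: int, c_out: int, bytes_per_elem: int = 1) -> float:
    """FLOPs per byte for a stride-1, same-padded 3x3 convolution, assuming
    input, weights and output are each moved exactly once."""
    flops = 2 * h * w * c_in * c_out * 9  # one multiply and one add per MAC
    bytes_moved = bytes_per_elem * (h * w * c_in + 9 * c_in * c_out + h * w * c_out)
    return flops / bytes_moved


def elementwise_add_intensity(h: int, w: int, c: int, bytes_per_elem: int = 1) -> float:
    """Two inputs read, one output written, one add per element."""
    flops = h * w * c
    bytes_moved = bytes_per_elem * 3 * h * w * c
    return flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical INT8 feature map at quarter resolution of 1080p.
    h, w, c = 270, 480, 32
    print(f"3x3 conv:        {conv3x3_intensity(h, w, c, c):.1f} FLOP/byte")
    print(f"elementwise add: {elementwise_add_intensity(h, w, c):.2f} FLOP/byte")
```

The convolution performs hundreds of operations per byte it moves, so its speed is set by the NPU's arithmetic units; the elementwise add performs a third of an operation per byte, so its speed is set by memory bandwidth. Warping and sampling operators sit near the low end as well, with the added cost of irregular memory access.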
Why It Matters
Enables real-time 60fps video playback on mobile devices from standard 30fps content, improving the viewing experience on high-refresh-rate displays.