Audio & Speech

Fast and Flexible Audio Bandwidth Extension via Vocos

New AI model extends audio from 8 kHz up to 48 kHz at a real-time factor of 0.0001 on an A100 GPU.

Deep Dive

Researcher Yatharth Sharma has introduced a novel Vocos-based AI model for audio bandwidth extension (BWE). The system enhances audio quality by intelligently generating the missing high-frequency content in lower-quality audio files, effectively upsampling them. It first resamples the input audio to 48 kHz and then processes it through a neural vocoder backbone. A key innovation is that this single network architecture supports arbitrary upsampling ratios, making it highly flexible across source qualities.
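The resample-first front end described above can be sketched with standard tools. This is a minimal illustration using SciPy's polyphase resampler, not the author's actual code; the function name and the 8 kHz source rate are assumptions for the example:

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

def upsample_to_48k(audio: np.ndarray, sr: int, target_sr: int = 48_000) -> np.ndarray:
    """Resample input audio to 48 kHz before the vocoder backbone.

    Hypothetical stand-in for the model's front end: in the real system,
    the resampled waveform is passed to a Vocos-style neural vocoder
    that fills in the missing high-frequency band.
    """
    g = gcd(target_sr, sr)
    # polyphase resampling by the rational factor target_sr / sr
    return resample_poly(audio, target_sr // g, sr // g)

# e.g. one second of 8 kHz audio becomes 48,000 samples at 48 kHz
x = np.random.randn(8_000)
y = upsample_to_48k(x, sr=8_000)
```

Because the same backbone runs regardless of the source rate, any `sr` that divides evenly into a rational ratio with 48 kHz can feed the same network, which is what makes arbitrary upsampling ratios possible.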

The model's efficiency is a major breakthrough. It incorporates a lightweight refiner inspired by Linkwitz-Riley crossover filters to smoothly merge the original low-frequency band with the AI-generated high frequencies. On validation, it achieves a competitive log-spectral distance (LSD), a key metric for audio quality, while operating at remarkable speed. It posts a real-time factor (RTF) of just 0.0001 on an NVIDIA A100 GPU, meaning it processes audio roughly 10,000 times faster than real time. Even on a standard 8-core CPU, it maintains a highly practical RTF of 0.0053. This combination of high-quality output and extreme throughput makes it suitable for real-world applications like streaming, communication, and media restoration, where latency and cost are critical.
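Log-spectral distance, the validation metric quoted above, is commonly computed as the per-frame RMS difference between log power spectra, averaged over frames. The following is a minimal sketch of one common LSD definition; the framing parameters and the exact variant the author reports are assumptions:

```python
import numpy as np

def log_spectral_distance(ref: np.ndarray, est: np.ndarray,
                          n_fft: int = 1024, hop: int = 256) -> float:
    """One common LSD definition: per-frame RMS of the difference of
    log power spectra (in dB), averaged over frames. Lower is better."""
    eps = 1e-10  # avoid log of zero
    win = np.hanning(n_fft)
    dists = []
    for i in range(0, min(len(ref), len(est)) - n_fft + 1, hop):
        r_pow = np.abs(np.fft.rfft(ref[i:i + n_fft] * win)) ** 2
        e_pow = np.abs(np.fft.rfft(est[i:i + n_fft] * win)) ** 2
        diff_db = 10 * np.log10((r_pow + eps) / (e_pow + eps))
        dists.append(np.sqrt(np.mean(diff_db ** 2)))
    return float(np.mean(dists))

# identical signals have zero distance
t = np.linspace(0, 1, 48_000, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)
lsd = log_spectral_distance(sig, sig)
```

The RTF figures relate to wall-clock speed the same way: RTF = processing time / audio duration, so an RTF of 0.0001 means one second of audio is processed in 0.1 ms.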

Key Points
  • Uses a Vocos neural vocoder backbone to generate the missing high frequencies, enhancing audio from 8 kHz up to 48 kHz.
  • Achieves extreme throughput with a real-time factor of 0.0001 on an A100 GPU and 0.0053 on an 8-core CPU.
  • Features a flexible single-network design that supports arbitrary upsampling ratios and a lightweight crossover refiner for smooth audio merging.
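The crossover-style merge in the last point can be illustrated with standard DSP building blocks: a Linkwitz-Riley magnitude response falls out of zero-phase filtering with half-order Butterworth sections, whose power-complementary low/high responses sum flat at the crossover. This is a hedged sketch of band merging, not the model's learned refiner; the 4 kHz crossover frequency and filter order are assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def crossover_merge(low_band: np.ndarray, high_band: np.ndarray,
                    sr: int = 48_000, fc: float = 4_000.0) -> np.ndarray:
    """Merge two signals with a Linkwitz-Riley-style crossover at fc.

    Zero-phase filtering (sosfiltfilt) with 2nd-order Butterworth
    sections yields 4th-order, in-phase low/high responses that sum
    to unity, so the bands splice without a dip or bump at fc.
    """
    sos_lp = butter(2, fc, btype="low", fs=sr, output="sos")
    sos_hp = butter(2, fc, btype="high", fs=sr, output="sos")
    return sosfiltfilt(sos_lp, low_band) + sosfiltfilt(sos_hp, high_band)

# Sanity check: feeding the same signal to both inputs should
# reconstruct it almost exactly, since the responses sum to 1.
t = np.linspace(0, 1, 48_000, endpoint=False)
sig = np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.sin(2 * np.pi * 10_000 * t)
merged = crossover_merge(sig, sig)
```

In the actual model, `low_band` would be the original narrowband audio and `high_band` the vocoder's output, so the listener keeps the untouched source below the crossover and only the generated content above it.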

Why It Matters

Enables real-time, high-quality audio enhancement for streaming, calls, and media restoration at negligible computational cost.