UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension
A new GAN-based model upgrades any speech codec to super-wideband quality with minimal extra data.
A team of researchers from Fraunhofer IIS, the University of Erlangen-Nuremberg, and others has published UBGAN (Universal Bandwidth Extension Generative Adversarial Network), an AI model designed to upgrade the audio quality of existing speech codecs. Unlike neural codecs that are locked to specific bitrates and sampling rates, UBGAN acts as a modular post-processor. It takes the output of standard wideband (WB) codecs—whose audio bandwidth is capped at 8 kHz (16 kHz sampling)—and uses a lightweight GAN architecture to synthesize the missing higher frequencies, producing super-wideband (SWB) audio with content up to 16 kHz (32 kHz sampling). This approach provides a flexible upgrade path for legacy systems in telecommunications, VoIP, and conferencing apps where replacing core codecs is impractical.
The model comes in two operational variants: 'blind-UBGAN,' which works without any extra transmitted data, and 'guided-UBGAN,' which adds a small 0.8 kbps side signal to steer the synthesis toward higher fidelity. In subjective listening tests (MUSHRA), UBGAN applied to the 3GPP EVS codec at 9.6 kbps outperformed the native EVS SWB mode, suggesting the approach generalizes across codecs and bitrates. The research, presented at IEEE WASPAA 2025, signals a shift toward adaptive, AI-powered enhancement layers that can be deployed over existing infrastructure, potentially delivering clearer voice calls on current networks without a full system overhaul.
- Modular GAN architecture upgrades any wideband codec (8 kHz audio bandwidth) to super-wideband quality (16 kHz bandwidth).
- Offers 'blind' mode (no extra data) and 'guided' mode (adds only 0.8 kbps of side information).
- Outperformed the native 3GPP EVS super-wideband codec in subjective listening tests (MUSHRA).
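To make the post-processor placement concrete, here is a minimal sketch of where a bandwidth-extension stage sits: after the WB decoder, upsampling 16 kHz audio to 32 kHz and filling the empty 8–16 kHz band. The high-band synthesis below uses simple spectral mirroring as a deliberately crude stand-in for UBGAN's learned GAN generator (the function name, mirror gain, and overall structure are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def bwe_postprocess(wb_signal: np.ndarray) -> np.ndarray:
    """Toy blind bandwidth extension applied after a WB codec's decoder.

    Upsamples a 16 kHz-sampled WB signal (content up to 8 kHz) to 32 kHz
    and fills the 8-16 kHz band by mirroring the low-band spectrum at a
    low gain. UBGAN would instead synthesize this band with a GAN
    conditioned on the decoded WB signal (and, in guided mode, on a
    0.8 kbps side signal).
    """
    n = len(wb_signal)
    low = np.fft.rfft(wb_signal)              # bins covering 0-8 kHz
    n_out = 2 * n                             # doubled rate: 32 kHz
    full = np.zeros(n_out // 2 + 1, dtype=complex)
    full[: len(low)] = low                    # keep the decoded low band
    # Crude high-band "generator": attenuated mirror image of the low band.
    hi_len = len(full) - len(low)
    full[len(low):] = 0.1 * low[-2 : -2 - hi_len : -1]
    # Scale by 2 to compensate for irfft's 1/n_out normalization.
    return np.fft.irfft(full, n_out) * 2.0

# Usage: decode with any WB codec, then post-process.
t = np.arange(1600) / 16000.0                 # 0.1 s at 16 kHz
decoded_wb = np.sin(2 * np.pi * 1000 * t)     # stand-in for codec output
swb = bwe_postprocess(decoded_wb)             # 3200 samples at 32 kHz
```

Because the enhancement runs purely on the decoder side (blind mode needs no bitstream changes at all), it can be layered onto deployed codecs without touching the transmission chain.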
Why It Matters
Enables telecoms and app developers to significantly improve call clarity on existing networks without replacing core infrastructure.