Assessing the Impact of Noise and Speech Enhancement on the Intelligibility of Speech Codecs
Classical codecs beat neural codecs in noise; speech enhancement can rescue intelligibility.
Researchers Lyonel Behringer, Anna Leschanowsky, Anjana Rajasekhar, Emily Kratsch, and Guillaume Fuchs (submitted to Interspeech 2026) systematically evaluate how noise and speech enhancement affect speech intelligibility across classical codecs (e.g., Opus) and modern very low-bitrate neural codecs. In clean conditions, both codec types maintain high intelligibility, but under realistic noise, classical codecs retain significantly better performance. Neural codecs, while efficient, degrade sharply in noisy environments, raising concerns for real-world deployment in voice assistants or telecommunications.
To mitigate this, the authors test adding a speech enhancement (SE) module before coding. For neural codecs, SE improves intelligibility and lowers rated listening effort, in some conditions matching classical codec performance. Notably, listening effort ratings reveal differences even when intelligibility is at ceiling, suggesting that SE reduces cognitive load beyond what intelligibility scores alone can show. Furthermore, objective measures based on automatic speech recognition (ASR) correlate strongly with subjective ratings at the condition level, supporting a scalable evaluation pipeline. This work offers practical guidance for designing robust audio processing pipelines in noise-prone settings.
- Classical codecs (e.g., Opus) are significantly more noise-robust than neural codecs in intelligibility tests.
- Speech enhancement before neural coding can restore intelligibility and measurably reduce listening effort.
- ASR-based objective scores correlate highly with subjective intelligibility, enabling faster benchmarking without human listeners.
Why It Matters
As neural codecs proliferate in low-bitrate applications, this study shows that they are sensitive to noise. Speech enhancement before coding can substantially mitigate that weakness, which is crucial for real-world voice systems.