Audio & Speech

Speech-preserving active noise control: a deep learning approach in reverberant environments

A new deep learning system tackles a classic ANC problem: accidentally silencing the voice you want to hear.

Deep Dive

Researcher Shuning Dai has introduced a deep learning approach to a classic audio engineering problem: Active Noise Control (ANC). Traditional ANC systems, typically built on the filtered-x least mean squares (FxLMS) algorithm, rely on linear adaptive filters. They struggle in non-linear, real-world acoustic environments and often cancel desired speech along with the noise. Dai's proposed system addresses both problems with an end-to-end control architecture centered on a Convolutional Recurrent Network (CRN). The design uses Long Short-Term Memory (LSTM) layers to model the temporal dynamics of sound, and complex spectrum mapping, which estimates both the real and imaginary parts of the signal's spectrum, to handle non-linear distortions that linear filters cannot capture.
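For readers unfamiliar with the FxLMS baseline, the idea can be sketched in a few lines of plain Python. This is a minimal single-channel illustration, not code from the research: the function names, path coefficients, tap count, and step size are all assumptions chosen for demonstration, and the controller is assumed to know the secondary path (loudspeaker to error microphone) exactly, which a real system would have to estimate.

```python
import math


def fir(coeffs, history):
    """Apply FIR coefficients to a newest-first sample history."""
    return sum(c * h for c, h in zip(coeffs, history))


def power(samples):
    """Mean squared value of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def fxlms(reference, primary_path, secondary_path, num_taps=16, mu=0.01):
    """Run single-channel FxLMS and return the residual at the error mic.

    Illustrative assumption: the secondary path is known exactly, so it is
    used directly to filter the reference signal.
    """
    weights = [0.0] * num_taps
    x_hist = [0.0] * max(num_taps, len(primary_path), len(secondary_path))
    y_hist = [0.0] * len(secondary_path)
    xf_hist = [0.0] * num_taps
    residual = []
    for x in reference:
        x_hist = [x] + x_hist[:-1]
        d = fir(primary_path, x_hist)        # noise reaching the error mic
        y = fir(weights, x_hist)             # control signal from the adaptive filter
        y_hist = [y] + y_hist[:-1]
        e = d - fir(secondary_path, y_hist)  # what the error mic still hears
        # "Filtered-x": pass the reference through the secondary path model.
        xf_hist = [fir(secondary_path, x_hist)] + xf_hist[:-1]
        # LMS update on the filtered reference.
        weights = [w + mu * e * xf for w, xf in zip(weights, xf_hist)]
        residual.append(e)
    return residual


# Hypothetical test signal: a steady tone passing through made-up acoustic paths.
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(4000)]
residual = fxlms(tone, primary_path=[0.0, 0.5, 0.3], secondary_path=[0.0, 0.7])
print("early residual power:", power(residual[:200]))
print("late residual power: ", power(residual[-200:]))
```

Note that the update drives everything at the error microphone toward zero. The filter has no notion of which sounds are worth keeping, which is why a linear scheme like this one removes wanted speech just as readily as noise.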

A key innovation is a specialized voice retention loss function, which guides the model to suppress environmental noise while identifying and preserving the spectral characteristics of a target speaker's voice. To test the system under realistic conditions, the research used the Image Source Method (ISM) to build a high-fidelity acoustic simulation that includes challenging reverberation effects. According to the reported experiments, the Deep ANC system achieves significantly better noise reduction than traditional methods, particularly for difficult, non-stationary noise such as crowd babble. Evaluations using two standard metrics, PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility), indicate that the system also maintains the clarity and naturalness of the preserved speech.
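The summary does not give the exact form of the voice retention loss, so the following is only a hypothetical sketch of the general idea, written in plain Python over small lists of complex spectrum bins. It contrasts a conventional ANC objective, which penalizes any energy at the error microphone, with a speech-preserving objective that instead penalizes deviation from the target speech. The function names, the two-term structure, and `retention_weight` are assumptions made for illustration, not the author's formulation.

```python
def mean_sq(values):
    """Mean squared magnitude of a list of (possibly complex) values."""
    return sum(abs(v) ** 2 for v in values) / len(values)


def plain_anc_loss(error_spec):
    """Conventional objective: drive everything at the error mic to zero."""
    return mean_sq(error_spec)


def voice_retention_loss(error_spec, speech_spec, retention_weight=1.0):
    """Hypothetical speech-preserving objective (not the paper's definition).

    First term: whatever remains after removing the target speech counts
    as residual noise. Second term: an extra penalty when the magnitude
    spectrum of the output departs from that of the target speech.
    """
    residual_noise = [e - s for e, s in zip(error_spec, speech_spec)]
    noise_term = mean_sq(residual_noise)
    magnitude_term = sum(
        (abs(e) - abs(s)) ** 2 for e, s in zip(error_spec, speech_spec)
    ) / len(speech_spec)
    return noise_term + retention_weight * magnitude_term


# Made-up complex spectrum bins for a single frame.
speech = [1 + 1j, 0.5j, 0.2 + 0j]
silence = [0j, 0j, 0j]

# Output equals the clean speech: ideal under the speech-preserving
# objective, but penalized by the conventional one.
print(plain_anc_loss(speech), voice_retention_loss(speech, speech))

# Output is total silence: ideal under the conventional objective,
# but penalized by the speech-preserving one.
print(plain_anc_loss(silence), voice_retention_loss(silence, speech))
```

The two cases show why the choice of training target matters: a model trained on the first objective is rewarded for silencing the talker, while one trained on a speech-aware objective is rewarded for leaving the talker intact.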

Key Points
  • Uses a Convolutional Recurrent Network (CRN) with LSTM layers for end-to-end control of complex acoustic signals.
  • Introduces a specialized voice retention loss function to selectively preserve target speech while suppressing noise.
  • Outperforms traditional FxLMS algorithms, especially on non-stationary noise, while maintaining speech quality (PESQ) and intelligibility (STOI).

Why It Matters

This research could lead to smarter noise-cancelling headphones and conferencing systems that protect conversations in noisy places like cafes or open offices.