ARTT: Augmented Reverberant-Target Training for Unsupervised Monaural Speech Dereverberation
New AI training technique removes echoes from single-mic recordings by adding more reverb first.
A team of researchers has introduced ARTT (Augmented Reverberant-Target Training), a novel AI method that tackles the difficult problem of cleaning up echo-filled speech from a single microphone, all without needing any pre-recorded 'clean' audio for training. The core innovation is a counter-intuitive two-stage process. First, in the Reverberant-Target Training (RTT) stage, the system takes an already reverberant audio signal, artificially makes it even more echoey, and then trains a deep neural network (DNN) to recover the original reverberant signal. Despite the target being reverberant, this discriminative training forces the network to learn to identify and suppress reverberation components.
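The data side of the RTT stage can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the extra reverberation is added by convolving the recording with a synthetic room impulse response, and the helper names `synthetic_rir` and `make_rtt_pair`, the exponentially decaying noise model, and all parameter values are hypothetical.

```python
import numpy as np

def synthetic_rir(sample_rate=16000, rt60=0.4, length_s=0.5, seed=0):
    """Hypothetical room impulse response: a direct path plus decaying noise."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * length_s)
    t = np.arange(n) / sample_rate
    # Amplitude envelope that falls by 60 dB after rt60 seconds.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = 0.1 * rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct path, so the original recording stays dominant
    return rir

def make_rtt_pair(reverberant, rir):
    """Return (network input, training target) for one RTT example.

    The input is the recording with extra reverberation added;
    the target is the original, already reverberant recording.
    """
    augmented = np.convolve(reverberant, rir)[: len(reverberant)]
    return augmented, reverberant

# Stand-in for one second of reverberant speech (random noise, for illustration only).
recording = np.random.default_rng(1).standard_normal(16000)
net_input, target = make_rtt_pair(recording, synthetic_rir())
```

No clean speech appears anywhere in the pair: the network only ever sees a reverberant recording and a more reverberant copy of it.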
In the second stage, an online self-distillation mechanism based on the 'mean-teacher' algorithm is applied to stabilize and improve the model's predictions, leading to more robust dereverberation. The fully unsupervised approach matters because it sidesteps a major hurdle in the field: the scarcity of clean speech recordings matched to their reverberant counterparts, which supervised models need for training. According to the reported evaluation, ARTT 'significantly outperforms previous baselines,' which could make it a practical tool for applications like voice assistants, conference calls, and hearing aids that must operate in real-world, acoustically challenging spaces.
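In the standard mean-teacher setup, the teacher is an exponential moving average (EMA) of the student's weights, and the student is trained to agree with the teacher's outputs. The sketch below shows that general idea only; the article does not give ARTT's exact update rule or loss, so the function names, the decay value, and the mean-squared consistency loss are assumptions.

```python
import numpy as np

def ema_update(teacher, student, decay=0.999):
    """Move each teacher parameter slightly toward the student's value.

    teacher <- decay * teacher + (1 - decay) * student
    Parameters are held in plain dicts of arrays for illustration.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }

def consistency_loss(student_out, teacher_out):
    """Assumed self-distillation loss: mean squared error between the
    student's prediction and the teacher's prediction."""
    return float(np.mean((student_out - teacher_out) ** 2))

# Toy usage: one EMA step with a deliberately small decay so the change is visible.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student, decay=0.9)
```

Because the teacher changes slowly, its predictions are smoother than the student's from step to step, which is why this kind of averaging is commonly used to stabilize training.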
- Uses a two-stage, fully unsupervised method called Augmented Reverberant-Target Training (ARTT).
- First stage (RTT) trains a DNN to recover the original reverberant signal from a version made even more reverberant, which teaches it to suppress reverberation.
- Second stage employs a 'mean-teacher' self-distillation algorithm to refine performance, outperforming previous baselines.
Why It Matters
Enables clearer audio for calls and voice AI in echoey rooms without needing scarce, hard-to-obtain clean training data.