Audio & Speech

ACAD: New AI Denoises Audio Based on Scene Context

Traffic noise is the signal for a surveillance system but noise on a phone call.

Deep Dive

Current audio denoising systems use a fixed definition of which sounds are target and which are noise, so they often remove useful sounds in one context while failing to suppress irrelevant ones in another. To address this, Diep Luong, Konstantinos Drossos, Mikko Heikkinen, and Tuomas Virtanen propose Automatic Contextual Audio Denoising (ACAD). The method defines context as an acoustic scene class (e.g., park, traffic, office): sounds typical of that scene are in-context (IC), while sounds outside the scene's usual distribution are labeled out-of-context (OC) noise. A deep learning model infers the context automatically from the audio and removes only the OC components.
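To make the IC/OC decision concrete, here is a toy sketch in Python. It is not the ACAD model: sources arrive already separated and labeled, the scene-to-sound table `IN_CONTEXT` is hypothetical, and `infer_scene` is a simple overlap count standing in for a learned scene classifier. The helper names `infer_scene` and `contextual_denoise` are illustrative only. The point is that the same sound (footsteps) is dropped in a traffic clip but kept in a hallway clip.

```python
# Toy sketch of contextual IC/OC filtering, NOT the ACAD network:
# sources are pre-separated and labeled, and the table below is hypothetical.

# Hypothetical table: scene class -> sound classes typical (in-context) there.
IN_CONTEXT = {
    "traffic": {"car", "horn", "engine"},
    "hallway": {"footsteps", "door", "speech"},
    "park": {"birds", "wind", "speech"},
}


def infer_scene(labels):
    """Stand-in for a learned scene classifier: pick the scene whose
    in-context set overlaps most with the observed labels (ties go to
    the scene listed first)."""
    return max(IN_CONTEXT, key=lambda scene: len(IN_CONTEXT[scene] & set(labels)))


def contextual_denoise(sources, scene=None):
    """Keep in-context (IC) sources and drop out-of-context (OC) ones.

    sources: list of (label, samples) pairs with equal-length samples.
    scene: pass a scene to mimic oracle context; None infers it.
    Returns (scene, kept_labels, output_samples).
    """
    labels = [label for label, _ in sources]
    if scene is None:
        scene = infer_scene(labels)
    output = [0.0] * len(sources[0][1])
    kept = []
    for label, samples in sources:
        if label in IN_CONTEXT[scene]:  # IC: mix it into the output
            kept.append(label)
            output = [o + x for o, x in zip(output, samples)]
    return scene, kept, output


traffic_clip = [("car", [0.5, 0.5]), ("engine", [0.25, 0.25]), ("footsteps", [0.125, -0.125])]
hallway_clip = [("footsteps", [0.125, -0.125]), ("door", [0.25, 0.0]), ("car", [0.5, 0.5])]

print(contextual_denoise(traffic_clip))  # ('traffic', ['car', 'engine'], [0.75, 0.75])
print(contextual_denoise(hallway_clip))  # ('hallway', ['footsteps', 'door'], [0.375, -0.125])
```

Passing `scene` explicitly, e.g. `contextual_denoise(traffic_clip, scene="traffic")`, loosely mirrors an oracle-context setup, while leaving it as `None` corresponds to inferring the context from the input. A real system has to do both steps on a raw mixture, which is where the learned model comes in.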

The underlying observation is that the same sound can be OC in one context and IC in another: footsteps are OC in a traffic scene but IC in a hallway. ACAD was benchmarked against variants without context inference, with oracle context, and with uninformative context, and it outperformed all of them on standard objective metrics across diverse paired clean/noisy datasets. The work shows that context-dependent processing can improve denoising by preserving relevant sounds, with applications in smart devices, hearing aids, and surveillance, where what counts as "noise" depends on the user's environment.
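The summary refers only to "standard objective metrics" and does not name them. As an assumption, here is a minimal sketch of one metric commonly used with paired clean/noisy data, scale-invariant signal-to-distortion ratio (SI-SDR): it projects the estimate onto the clean reference and compares the energy of that projection with the energy of everything left over. Whether ACAD's evaluation uses SI-SDR is not stated in the source.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between an estimate and a clean reference."""
    # Remove the mean of each signal so a DC offset does not affect the score.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    # Optimal scaling of the reference that best explains the estimate.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    # eps guards against log10(0) and division by zero.
    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))


clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.5, -0.5, 0.5, -1.5]        # clean plus [0.5, 0.5, -0.5, -0.5]
denoised = [1.05, -0.95, 0.95, -1.05]  # clean plus a tenth of that noise

print(round(si_sdr(noisy, clean), 2))     # about 6.02 dB
print(si_sdr(denoised, clean) > si_sdr(noisy, clean))
```

Because the reference is rescaled by `alpha`, simply turning the output up or down does not change the score, which is why scale-invariant metrics are popular for comparing denoisers.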

Key Points
  • ACAD infers the acoustic scene class (e.g., traffic, office) to decide which sounds count as target and which as noise in each context
  • Uses deep learning to remove out-of-context (OC) sounds while keeping in-context (IC) sounds
  • Outperforms variants without context inference, with oracle context, and with uninformative context on objective metrics across diverse datasets

Why It Matters

Smart assistants, hearing aids, and surveillance systems could adapt denoising to the user's environment, preserving sounds that matter in that setting instead of applying one fixed notion of noise.