Audio & Speech

Over-the-air White-box Attack on the Wav2Vec Speech Recognition Neural Network

Researchers demonstrate how to manipulate speech recognition systems with inaudible audio perturbations.

Deep Dive

A new research paper by Alexey Protopopov demonstrates a sophisticated security vulnerability in modern speech recognition systems. The work executes a 'white-box' adversarial attack against Meta's widely used Wav2Vec 2.0 neural network, meaning the attacker has full knowledge of the model's architecture and parameters. Crucially, the attack is designed to work 'over-the-air' (OTA): the adversarial perturbations are played through a speaker and picked up by the victim's microphone, moving beyond digital-only file manipulation.

The core innovation of this research is its focus on stealth. Previous OTA attacks often produced audible distortions or static, making them easy for humans to detect. This paper explores methods to minimize the perceptibility of these audio perturbations while maintaining their effectiveness in tricking the AI. The findings reveal a troubling trade-off between attack stealth and success rate, exposing a fundamental weakness in systems like voice assistants, automated customer service, and transcription software that depend on models such as Wav2Vec.
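At its core, this class of attack is a constrained, gradient-based optimization: because the attacker can differentiate through the model ('white-box'), each step nudges the audio toward the desired output, while a perceptibility budget keeps the perturbation quiet. Below is a minimal NumPy sketch of the idea using a hypothetical linear surrogate in place of Wav2Vec 2.0; the paper's actual loss function, model, and over-the-air channel modeling are far more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical surrogate: a linear scorer standing in for the network's
# output logits. A real white-box attack backpropagates through all of
# Wav2Vec 2.0 instead.
n = 16000                           # one second of 16 kHz audio samples
w = rng.normal(size=n) / np.sqrt(n)

def score(x):
    # Higher score = closer to the attacker's target transcription.
    return float(w @ x)

def grad(x):
    # White-box access: exact gradient of the score w.r.t. the waveform.
    # (Constant here only because the surrogate is linear.)
    return w

x = rng.normal(scale=0.1, size=n)   # clean waveform
eps = 0.01                          # L-infinity perceptibility budget
step = 0.002
delta = np.zeros(n)

# PGD-style sign ascent: push toward the target, then project back
# inside the budget so the perturbation stays hard to hear.
for _ in range(10):
    delta += step * np.sign(grad(x + delta))
    delta = np.clip(delta, -eps, eps)

assert np.max(np.abs(delta)) <= eps
assert score(x + delta) > score(x)  # perturbation raises the target score
```

Shrinking `eps` makes the perturbation less audible but weakens the attack, which is exactly the stealth-versus-success trade-off the paper quantifies; genuine OTA attacks additionally optimize over simulated speaker-to-microphone distortions so the perturbation survives playback.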

Key Points
  • Targets Meta's Wav2Vec 2.0 model with a 'white-box' adversarial attack, assuming full model knowledge.
  • Executes 'over-the-air' (OTA), meaning attacks are transmitted via speakers to real-world microphones.
  • Prioritizes making audio perturbations less detectable to human hearing, increasing potential for stealthy exploitation.

Why It Matters

Reveals critical security flaws in voice-activated tech, forcing developers to build more robust, adversarial-resistant AI models.