ACES: Accent Subspaces for Coupling, Explanations, and Stress-Testing in Automatic Speech Recognition
New audit reveals accent information concentrates in just 8 dimensions of a popular speech AI model.
Researcher Swapnil Parekh has introduced ACES (Accent Subspaces for Coupling, Explanations, and Stress-Testing), a novel diagnostic framework for auditing bias in Automatic Speech Recognition systems. The research, published on arXiv, tackles the persistent problem of performance disparities across different English accents in models like Meta's Wav2Vec2-base. ACES works by extracting the low-dimensional subspaces within a model's internal representations that are most sensitive to accent information. This yields a precise probe of where and how bias manifests, moving the analysis beyond simple output metrics to the model's internal mechanisms.
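The summary does not spell out the exact extraction procedure, so the sketch below is only one plausible reading: an iterative-nullspace-style loop (in the spirit of INLP) that repeatedly fits a linear accent probe on pooled hidden states and accumulates its weight directions until a k-dimensional basis is found. The inputs `X` (pooled layer-3 activations) and `y` (accent labels) are hypothetical placeholders, not names from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def accent_subspace(X, y, k=8, max_rounds=10):
    """Sketch: estimate a k-dimensional accent-sensitive subspace from
    pooled hidden states X (n_utterances, hidden_dim) and accent labels y.

    INLP-style loop (an assumption, not the paper's confirmed method):
    fit a linear accent probe, orthogonalize its weight rows against
    directions found so far, project them out of the data, and refit.
    """
    X = X - X.mean(axis=0)                  # center the activations
    directions, residual = [], X.copy()
    for _ in range(max_rounds):
        if len(directions) >= k:
            break
        probe = LogisticRegression(max_iter=1000).fit(residual, y)
        for w in probe.coef_:               # one weight row per accent class
            w = w.copy()
            for d in directions:
                w -= (w @ d) * d            # Gram-Schmidt against found basis
            norm = np.linalg.norm(w)
            if norm > 1e-8 and len(directions) < k:
                directions.append(w / norm)
        B = np.stack(directions)            # (m, hidden_dim)
        residual = X - (X @ B.T) @ B        # remove found directions, refit
    return np.stack(directions[:k])         # (k, hidden_dim) orthonormal rows
```

With only five accent classes, a single linear probe yields at most a handful of informative directions, which is why the loop projects out what it has already found and refits before collecting more.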
The technical analysis of Wav2Vec2-base across five English accents revealed that accent information is surprisingly concentrated in a very small part of the network: an 8-dimensional subspace in the model's third layer. The magnitude of a speech sample's projection onto this 'accent subspace' showed a modest positive correlation (r=0.26) with its word error rate (WER). Crucially, the study found that attempts to linearly attenuate or 'erase' this subspace did not reduce the performance disparity and even slightly worsened it, indicating that accent-relevant features are fundamentally entangled with the cues needed for accurate speech recognition. This positions ACES as a vital diagnostic tool for developers, highlighting that fairness interventions must be more sophisticated than simple feature removal.
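To make the reported correlation concrete: the projection magnitude is just the Euclidean norm of each utterance representation's component inside the subspace, and correlating that scalar with per-utterance WER yields the kind of statistic described above. A minimal sketch, assuming an orthonormal `basis` from the extraction step and hypothetical arrays `layer3_reps` and `wers`:

```python
import numpy as np

def projection_magnitude(reps, basis):
    """Norm of each representation's projection onto the accent
    subspace (basis: (k, hidden_dim) with orthonormal rows)."""
    return np.linalg.norm(reps @ basis.T, axis=1)

# Hypothetical usage with pooled layer-3 states and per-utterance WERs:
# mags = projection_magnitude(layer3_reps, basis)
# r = np.corrcoef(mags, wers)[0, 1]   # the study reports r=0.26
```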
- ACES audit finds accent bias in Wav2Vec2-base concentrates in an 8-dimensional subspace at layer 3.
- Projection magnitude onto this subspace correlates with per-utterance word error rate (WER) at r=0.26.
- Linear attenuation of the accent subspace failed to reduce disparity, showing accent features are entangled with recognition cues (a minimal sketch of this intervention follows the list).
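For completeness, here is what the linear attenuation referenced above looks like in its simplest form: scaling (or zeroing, at alpha=0) the component of each representation that lies in the accent subspace. A minimal sketch under the same assumed orthonormal `basis`; per the study's findings, applying such an intervention did not reduce the disparity:

```python
import numpy as np

def attenuate_subspace(reps, basis, alpha=0.0):
    """Scale the component of reps lying in the accent subspace by
    alpha (alpha=0 erases it entirely; alpha=1 leaves reps unchanged)."""
    inside = (reps @ basis.T) @ basis   # component inside the subspace
    return reps - (1.0 - alpha) * inside
```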
Why It Matters
Provides developers with a precise tool to diagnose and understand bias in speech AI, guiding fairness interventions that go beyond simplistic feature-removal fixes.