Whisper Accent adapts OpenAI's model for 20+ English accents, lowering WER by 4.1 percentage points

r/MachineLearning February 24, 2026

⚡ Open-source project adds accent conditioning to Whisper, cutting word error rate by 4.1 percentage points while keeping more than 90% of parameters frozen.

Deep Dive

Open-source developer Mavleo96 has released Whisper Accent, an adaptation of OpenAI's Whisper speech recognition model optimized for accented English. The project adds accent-aware conditioning while leaving the original weights frozen, which preserves the model's generalization capabilities.

The implementation extends Whisper with Adaptive Layer Norm (AdaLN) in every decoder layer, where accent-specific embeddings condition the decoder hidden states. The original encoder and decoder weights remain frozen, preserving Whisper's existing capabilities; fewer than 10% of parameters are trained, namely the AdaLN modulation weights, the accent embeddings, and a classifier head. The classifier predicts the accent from encoder states using learnable weighted sums, projection layers, and multi-head attention pooling.
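To make the conditioning idea concrete, here is a minimal NumPy sketch of an AdaLN layer. It is not the project's code: the class name `AdaLN`, the dimensions, the zero initialisation, and the `1 + scale` form are illustrative assumptions borrowed from common AdaLN practice. The hidden states stand in for activations from a frozen decoder, and only the projection and the accent table would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean and unit variance (no affine terms)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

class AdaLN:
    """Layer norm whose scale and shift are predicted from an accent embedding.

    Hypothetical sketch: names, shapes, and initialisation are assumptions.
    """

    def __init__(self, d_model, d_accent):
        # Assumed zero init: scale = shift = 0 at the start, so the layer
        # initially behaves like a plain layer norm.
        self.w = np.zeros((d_accent, 2 * d_model))
        self.b = np.zeros(2 * d_model)

    def __call__(self, hidden, accent_emb):
        # hidden: (batch, time, d_model); accent_emb: (batch, d_accent)
        scale, shift = np.split(accent_emb @ self.w + self.b, 2, axis=-1)
        normed = layer_norm(hidden)
        # Broadcast the per-utterance scale and shift over the time axis.
        return normed * (1.0 + scale[:, None, :]) + shift[:, None, :]

num_accents, d_accent, d_model = 20, 16, 32
accent_table = rng.normal(size=(num_accents, d_accent))  # trainable accent embeddings
adaln = AdaLN(d_model, d_accent)

hidden = rng.normal(size=(2, 5, d_model))  # stand-in for frozen decoder states
accent_ids = np.array([3, 7])
out = adaln(hidden, accent_table[accent_ids])
```

With the assumed zero initialisation, `out` equals the unconditioned layer norm of `hidden`, so training would start from the behaviour of the frozen model and learn accent-specific deviations from there.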

Evaluation on the westbrook/English_Accent_DataSet shows a clear gain: Whisper Accent-medium.en achieves a 13.4% word error rate (WER), a 4.1 percentage point absolute improvement over the 17.5% WER of the baseline Whisper-medium.en. The model also reaches 95.7% accuracy in accent classification across 20+ supported accents, including American, British, Indian, Spanish, German, French, and various Eastern European variants.
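For readers unfamiliar with the metric, the sketch below shows how WER is typically computed (word-level edit distance divided by reference length) and how the absolute and relative gains follow from the two reported figures. The function name `word_error_rate` and the example sentences are illustrative; the project's own evaluation script may apply text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the edit distances for the previous reference prefix.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Figures reported in the article.
baseline_wer, adapted_wer = 0.175, 0.134
absolute_gain = baseline_wer - adapted_wer    # in WER points
relative_gain = absolute_gain / baseline_wer  # fraction of baseline errors removed

print(f"absolute: {absolute_gain * 100:.1f} points, relative: {relative_gain:.1%}")
```

The 4.1-point absolute drop therefore corresponds to roughly a 23% relative reduction in word errors against the baseline.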

This research matters because mainstream ASR systems often underperform on non-standard accents, creating accessibility barriers. By open-sourcing the full training setup and checkpoints, Mavleo96 enables both practical applications and further research into accent-adaptive speech recognition without requiring massive retraining of foundation models.

Key Points

Achieves 13.4% WER with Whisper Accent-medium.en, a 4.1 percentage point absolute improvement over baseline Whisper-medium.en's 17.5%
Trains fewer than 10% of parameters via AdaLN conditioning while freezing the original encoder/decoder weights, preserving original generalization
Supports 20+ English accents with 95.7% classification accuracy and provides a full open-source training pipeline

Why It Matters

Makes speech recognition more accessible globally by significantly improving accuracy for non-standard English accents without full model retraining.
