Audio & Speech

MBR decoding beats beam search for Whisper speech recognition

arXiv eess.AS May 14, 2026

Deep Dive

A new paper by Yuu Jinnai revisits Minimum Bayes Risk (MBR) decoding for automatic speech recognition (ASR) and speech translation (ST). While beam search has long been the standard decoding method for speech-to-text tasks, recent work has shown that sample-based MBR decoding outperforms beam search in text-to-text generation (e.g., machine translation). Jinnai tests whether this advantage extends to audio—specifically, using OpenAI's Whisper models and their derivatives on English and Japanese datasets.

The results show that MBR decoding achieves higher accuracy than beam search in most of the experimental settings tested. MBR works by sampling multiple candidate transcriptions and selecting the one with the lowest expected risk, that is, the candidate that agrees best with the other samples under a chosen utility function. Greedy and beam search, the approaches traditionally used in speech, instead commit to a single high-probability sequence. Because generating and comparing many samples costs more compute than a single beam search, the method is especially promising for offline ASR and ST tasks, where accuracy matters more than latency. The code is open source, so practitioners can try it on their own models.
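
To make the selection step concrete, here is a minimal sketch of sample-based MBR selection over transcripts. It assumes the candidate transcripts have already been sampled from a Whisper model (sampling is not shown). It also uses 1 − WER as the utility, which is an illustrative choice: the article does not say which utility function the paper uses. The function names are hypothetical and not taken from the paper's released code.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1]


def utility(hyp: str, pseudo_ref: str) -> float:
    """Hypothetical utility: 1 - WER of hyp, scored against another sample as pseudo-reference."""
    ref_words, hyp_words = pseudo_ref.split(), hyp.split()
    if not ref_words:
        return 1.0 if not hyp_words else 0.0
    return 1.0 - word_edit_distance(ref_words, hyp_words) / len(ref_words)


def mbr_select(samples: List[str]) -> str:
    """Return the sample with the highest average utility against the other samples.

    Maximizing average utility is the same as minimizing expected risk
    when risk is defined as 1 - utility.
    """
    if not samples:
        raise ValueError("need at least one sample")
    best, best_score = samples[0], float("-inf")
    for i, h in enumerate(samples):
        # Every other sample acts as a pseudo-reference (self-comparison excluded).
        others = [y for j, y in enumerate(samples) if j != i]
        score = sum(utility(h, y) for y in others) / max(len(others), 1)
        if score > best_score:  # strict '>' keeps the earliest candidate on ties
            best, best_score = h, score
    return best


samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "the cat sat on the mat",
    "a cat sat on the mat",
]
print(mbr_select(samples))  # the cat sat on the mat
```

The transcript that appears most consistently across samples wins, even though no single sample is scored by the model's own probability. With N samples this sketch computes N(N − 1) utilities, which is where MBR's extra cost relative to beam search comes from.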

Key Points

MBR decoding outperforms beam search for Whisper-based ASR on English and Japanese
Tested on both ASR and speech translation tasks across multiple Whisper model variants
Sample-based approach selects hypothesis with minimum expected risk, boosting accuracy
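
The minimum-expected-risk selection in the last point can be written in the standard sample-based MBR form. Here we assume a candidate set H and a set of pseudo-references Y, both sampled from the model for the input x (often the same samples), a utility u, and a loss defined as 1 − u. The sample average approximates the expected loss under the model's output distribution p(y | x).

```latex
\hat{h}_{\mathrm{MBR}}
  = \operatorname*{arg\,min}_{h \in \mathcal{H}} \frac{1}{|\mathcal{Y}|} \sum_{y \in \mathcal{Y}} \ell(h, y)
  = \operatorname*{arg\,max}_{h \in \mathcal{H}} \frac{1}{|\mathcal{Y}|} \sum_{y \in \mathcal{Y}} u(h, y),
  \qquad \ell(h, y) = 1 - u(h, y)
```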

Why It Matters

This could improve accuracy in production speech systems without retraining the model. Only the decoding strategy changes, at the cost of extra inference compute.
