New survey maps five approaches to fixing ASR errors without redesigning the core model
Researchers categorize 50+ methods into fusion, re-scoring, correction, distillation, and training adjustment.
Automatic Speech Recognition (ASR) systems are essential for voice assistants, transcription services, and accessibility tools, but they still struggle with accents, dialects, background noise, and domain-specific terminology. Redesigning an ASR model is expensive and time-consuming, so researchers have turned to non-intrusive refinement: techniques that improve transcription accuracy without altering the core model. A new comprehensive survey on arXiv (v3, May 2026) by Mohammad Reza Peyghan and colleagues systematically reviews these methods, grouping them into five classes: fusion, re-scoring, correction, distillation, and training adjustment.
Fusion combines outputs from multiple ASR systems to reduce errors; re-scoring uses additional language models or context to rank hypotheses; correction directly edits erroneous transcriptions; distillation transfers knowledge from a large model to a smaller one; and training adjustment fine-tunes external components (like a separate error corrector) while keeping the ASR model frozen. The survey outlines each method's strengths, limitations, and ideal application scenarios. For example, correction works well for domain-specific jargon, while re-scoring excels in noisy environments.
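To make the re-scoring idea concrete, here is a minimal sketch, not the survey's own method. It assumes the ASR system exposes an n-best list of hypotheses with log-probability scores, and it uses a toy add-alpha unigram language model as a stand-in for a real external language model. The names `train_unigram_lm`, `rescore`, and `lm_weight`, the example sentences, and the scores are all hypothetical.

```python
import math
from collections import Counter


def train_unigram_lm(corpus, alpha=1.0):
    """Build a toy add-alpha unigram LM from in-domain sentences.

    This is an illustrative stand-in for a real external language model.
    """
    counts = Counter(w for sentence in corpus for w in sentence.lower().split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    def logprob(sentence):
        return sum(
            math.log((counts[w] + alpha) / (total + alpha * vocab_size))
            for w in sentence.lower().split()
        )

    return logprob


def rescore(nbest, lm_logprob, lm_weight=0.5):
    """Re-rank (text, asr_logprob) pairs by ASR score plus weighted LM score.

    The ASR model itself is never modified; only its output list is re-ordered.
    """
    scored = [(text, asr + lm_weight * lm_logprob(text)) for text, asr in nbest]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical in-domain text and n-best list (scores are made up).
corpus = [
    "the patient has hypertension",
    "the patient has diabetes",
    "hypertension is treated with medication",
]
nbest = [
    ("the patient has hyper tension", -4.0),  # ASR's top choice
    ("the patient has hypertension", -4.2),
]

lm = train_unigram_lm(corpus)
best_text = rescore(nbest, lm)[0][0]
print(best_text)
```

In this toy setup the in-domain language model promotes the second hypothesis over the ASR system's original top choice. A practical system would use a stronger language model and tune `lm_weight` on held-out data.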
Beyond classification, the paper surveys adaptation techniques for domain-specific ASR refinement (e.g., medical or legal terminology), reviews commonly used evaluation datasets and their construction, and proposes a standardized set of metrics to enable fair comparisons across studies. The authors also identify open research gaps, such as handling code-switching and scaling refinement to real-time streaming applications. By providing this structured overview, the survey equips researchers and practitioners with a clear foundation to build more robust, accurate ASR pipelines without expensive model overhauls.
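This summary does not list the specific metrics the survey proposes. As an illustration only, the sketch below computes word error rate (WER), the most widely used ASR accuracy metric, as word-level edit distance divided by reference length. The function name `wer` and the example sentences are assumptions, not taken from the paper.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "the patient has hypertension"
before = wer(reference, "the patient has hyper tension")  # raw ASR output
after = wer(reference, "the patient has hypertension")    # after refinement
print(before, after)  # 0.5 0.0
```

Reporting the same metric before and after refinement, on the same test set, is what makes results from different refinement pipelines comparable.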
- Classifies 50+ non-intrusive ASR refinement techniques into five categories: fusion, re-scoring, correction, distillation, and training adjustment.
- Reviews domain adaptation methods and common evaluation datasets, and proposes standardized metrics to enable fair comparisons across refinement pipelines.
- Identifies open research gaps including code-switching handling and real-time streaming scalability.
Why It Matters
For teams building voice assistants or transcription tools, this survey provides a practical roadmap to boost accuracy without costly model redesign.