RECOVER: Robust Entity Correction via agentic Orchestration of hypothesis Variants for Evidence-based Recovery
New agentic system uses LLMs to fix critical names and terms that standard ASR often misses.
Researchers Abhishek Kumar and Aashraya Sachdeva have introduced RECOVER (Robust Entity Correction via agentic Orchestration of hypothesis Variants for Evidence-based Recovery), a framework that targets a known weakness of Automatic Speech Recognition (ASR). Standard ASR systems often struggle with rare, domain-specific entities such as medical terms, financial jargon, or proper names, and sometimes omit them entirely. RECOVER addresses this as a post-processing, tool-using agent that orchestrates multiple correction strategies. Rather than relying on a single transcript, it treats several ASR output hypotheses as evidence, retrieves relevant entities from knowledge sources, and uses a Large Language Model (LLM) to perform constrained corrections.
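The flow described above (several hypotheses as evidence, entity retrieval, constrained correction) can be illustrated with a minimal sketch. This is not the authors' implementation: the LLM step is replaced here by a plain string-similarity threshold from Python's standard `difflib`, and the function names, the toy `knowledge_base`, the example hypotheses, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case and spaces."""
    a = a.lower().replace(" ", "")
    b = b.lower().replace(" ", "")
    return SequenceMatcher(None, a, b).ratio()


def best_span(words, entity, max_window=3):
    """Return (score, start, end) of the word window most similar to entity."""
    best = (0.0, 0, 0)
    for n in range(1, max_window + 1):
        for start in range(len(words) - n + 1):
            score = similarity(" ".join(words[start:start + n]), entity)
            if score > best[0]:
                best = (score, start, start + n)
    return best


def retrieve_entities(hypotheses, knowledge_base, threshold=0.8):
    """Keep entities that closely match a span in at least one hypothesis."""
    return [
        entity
        for entity in knowledge_base
        if any(best_span(h.split(), entity)[0] >= threshold for h in hypotheses)
    ]


def constrained_correct(transcript, entities, threshold=0.8):
    """Rewrite only spans that closely match a retrieved entity.

    Stand-in for the LLM step (an assumption): every other word is left as is.
    """
    words = transcript.split()
    for entity in entities:
        score, start, end = best_span(words, entity)
        if score >= threshold:
            words[start:end] = entity.split()
    return " ".join(words)


# Hypothetical ASR hypotheses for the same utterance, plus a toy entity list.
hypotheses = [
    "patient takes met for min daily",
    "patient takes metformen daily",
]
knowledge_base = ["metformin", "lisinopril"]

entities = retrieve_entities(hypotheses, knowledge_base)
corrected = constrained_correct(hypotheses[0], entities)
print(entities)
print(corrected)
```

The point of the sketch is the constraint: only spans that match a retrieved entity are rewritten, so the rest of the transcript stays unchanged.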
The authors evaluate four strategies for using these hypotheses: the standard 1-Best output, an Entity-Aware Select method, a ROVER Ensemble technique, and an LLM-Select approach. Across five datasets, RECOVER achieved relative reductions in entity-phrase word error rate (E-WER) ranging from 8% to 46% and increased entity recall by up to 22 percentage points. The LLM-Select strategy was the top performer, delivering the best entity correction results without degrading the overall word error rate (WER) of the transcript. That combination makes it a practical add-on for existing ASR pipelines in high-stakes fields.
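The results mix two kinds of numbers: a relative reduction (for E-WER) and a gain in percentage points (for recall). The short sketch below shows the arithmetic behind each. The input values are made up for illustration and are not taken from the paper, and the substring-based `entity_recall` is a simplifying assumption, since the exact metric definitions are not given here.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed, e.g. 0.46 means 46% relative."""
    return (baseline - improved) / baseline


def percentage_point_gain(baseline: float, improved: float) -> float:
    """Absolute difference between two rates, expressed in percentage points."""
    return (improved - baseline) * 100


def entity_recall(reference_entities, transcript: str) -> float:
    """Share of reference entities that appear verbatim in the transcript.

    Simplified definition (an assumption): case-insensitive substring match.
    """
    text = transcript.lower()
    hits = sum(1 for entity in reference_entities if entity.lower() in text)
    return hits / len(reference_entities)


# Made-up illustrative values, not results from the paper.
print(f"{relative_reduction(0.50, 0.27):.0%}")           # 46%
print(f"{percentage_point_gain(0.60, 0.82):.0f} points")  # 22 points
print(entity_recall(["metformin", "lisinopril"], "patient takes metformin daily"))  # 0.5
```

In this example, an E-WER that falls from 0.50 to 0.27 is a 46% relative reduction but only a 23-point absolute drop, which is why the two units should not be compared directly.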
- Agentic framework uses multiple ASR hypotheses and LLMs to correct entity errors after transcription.
- Achieved an 8-46% relative reduction in entity-phrase word error rate (E-WER) and up to a 22 percentage point gain in entity recall across five datasets.
- LLM-Select strategy provided the best correction performance while preserving the overall transcription accuracy (WER).
Why It Matters
RECOVER improves the recognition of critical terms in fields such as finance, healthcare, and aviation, where misheard names or jargon are costly.