Audio & Speech

Open-source MedASR model cuts medical dictation errors by 58% relative to Whisper Large-v3

This tiny 105M-parameter model beats OpenAI's Whisper on medical speech.

Deep Dive

MedASR is a new open-source model purpose-built for medical dictation, addressing the unique challenges of clinical speech recognition. At only 105 million parameters, it is deliberately designed to be 'small, fast, and accurate,' keeping deployment cheap without giving up transcription quality. The model uses pseudo-streaming sliding-window inference, which enables real-time transcription of long-form medical notes, a critical requirement in clinical settings. The team tackled two major obstacles, the scarcity of clinical corpora and class imbalance in medical terminology, with targeted data augmentation and training strategies.
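To make the sliding-window idea concrete, here is a minimal Python sketch of one common way to do pseudo-streaming: decode overlapping windows and commit only the tokens that fall in a region not yet emitted. It is not MedASR's actual implementation, which the summary above does not describe. The names `sliding_windows`, `pseudo_stream`, and `toy_recognize`, the window and hop sizes, and the rule of committing up to the middle of the overlap are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple

Token = Tuple[str, int]  # (text, offset of the token inside its window)


def sliding_windows(num_samples: int, window: int, hop: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) indices of overlapping windows that cover the audio."""
    if window <= 0 or hop <= 0 or hop > window:
        raise ValueError("require 0 < hop <= window")
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        yield start, end
        if end == num_samples:
            break
        start += hop


def pseudo_stream(
    samples: Sequence,
    recognize: Callable[[Sequence], Iterable[Token]],
    window: int,
    hop: int,
) -> Iterator[str]:
    """Decode window by window and emit each token exactly once.

    `recognize` is a hypothetical stand-in for an ASR model: it takes one
    window of audio and returns tokens with their offsets in that window.
    """
    overlap = window - hop
    committed = 0  # absolute position up to which text has been emitted
    for start, end in sliding_windows(len(samples), window, hop):
        is_last = end == len(samples)
        # Assumption: commit up to the middle of the overlap, so tokens near
        # a window edge are taken from the window that sees more context.
        limit = end if is_last else end - overlap // 2
        for text, offset in recognize(samples[start:end]):
            position = start + offset
            if committed <= position < limit:
                yield text
        committed = limit


def toy_recognize(chunk: Sequence[str]) -> Iterable[Token]:
    """Toy recognizer: each non-empty 'sample' already is a word."""
    return [(word, i) for i, word in enumerate(chunk) if word]


audio = ["chest", "", "x-ray", "shows", "", "no", "acute", "", "findings", "today"]
transcript = " ".join(pseudo_stream(audio, toy_recognize, window=4, hop=2))
print(transcript)
```

In a real system the samples would be audio frames and `recognize` would call the model. The toy recognizer only exists to show that overlapping windows can be merged without duplicating or dropping words.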

In head-to-head evaluations, MedASR achieves a 58% relative reduction in word error rate (WER) on the Eye Gaze speech benchmark compared to OpenAI's Whisper Large-v3, a much larger and more resource-intensive model. By open-sourcing MedASR, the researchers aim to break down the barriers to clinical documentation automation that proprietary systems often create. Open weights give healthcare organizations full transparency into the model's workings and allow customization, fine-tuning on local data, and compliance with privacy regulations. The result is a high-performance backbone that promises to reduce clinician burnout by making dictation more accurate and accessible.
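For readers unfamiliar with the metric, the following Python sketch shows how WER and a relative WER reduction are typically computed. The function names are hypothetical, and the WER values in the usage lines are made-up illustrative numbers, not the absolute scores reported for MedASR or Whisper, which the summary above does not give.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Standard dynamic-programming edit distance, kept to two rows.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors that the new system removes."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# One substituted word out of four reference words.
example_wer = word_error_rate("the patient has pneumonia", "the patient had pneumonia")

# Illustrative numbers only: a baseline at 10.0% WER and a new model at 4.2% WER.
example_reduction = relative_wer_reduction(0.10, 0.042)
print(f"WER: {example_wer:.2f}, relative reduction: {example_reduction:.0%}")
```

A relative reduction describes the share of the baseline's errors that disappear, so a 58% figure does not by itself say how many errors remain in absolute terms.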

Key Points
  • 58% relative WER reduction over Whisper Large-v3 on the Eye Gaze benchmark
  • 105M parameters: a lightweight model optimized for speed and on-device deployment
  • Open-source with pseudo-streaming sliding-window inference for real-time long-form transcription

Why It Matters

MedASR enables accurate, private medical dictation without reliance on proprietary APIs, which reduces the clinical documentation burden and improves workflow efficiency.