Audio & Speech

Researchers cut speech recognition errors by up to 50% for oral cancer patients using AI

New combo of data augmentation and LLM correction slashes Word Error Rate by up to 50%.

Deep Dive

A team led by Hidde Folkertsma tackled automatic speech recognition (ASR) for speakers treated for oral cancer (OC), whose speech impairments often cause standard systems to fail. They finetuned OpenAI’s Whisper and Meta’s Massively Multilingual Speech (MMS) models on a Dutch OC corpus. To overcome data scarcity, they used text-to-speech (TTS) augmentation to generate synthetic OC-like speech, which yielded an 8% relative reduction in word error rate (WER).
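For readers unfamiliar with the metric, the sketch below shows how WER and a relative WER reduction are commonly computed. It is a minimal illustration, not the study's evaluation code: the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, text normalization is reduced to whitespace splitting, and the example sentences and WER values in the note after the block are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was removed, e.g. 0.08 for 8% relative."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

With the made-up reference "the patient speaks slowly today" and hypothesis "the patient speak slowly", there is one substitution and one deletion over five reference words, so the WER is 0.4. A drop from a hypothetical WER of 0.50 to 0.46 is an 8% relative reduction, even though the absolute change is only 4 percentage points.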

Next, they used a large language model (LLM) to post-process the ASR output and correct residual errors. This reduced WER by a further 21.4–26.2% relative on finetuned models and by 10% on non-finetuned baselines. The combined approach achieved a 40% relative WER reduction for Whisper and 50% for MMS. The study, accepted at EMBC 2026, demonstrates a viable path to accessible voice technology for cancer survivors.

Key Points
  • TTS data augmentation yielded an 8% relative reduction in word error rate (WER) on Whisper and MMS models.
  • LLM error correction added a further 21.4–26.2% relative WER reduction on finetuned ASR models.
  • Overall, the combined approach achieved a 40% relative WER reduction for Whisper and 50% for MMS on oral cancer speech.

Why It Matters

The results bring voice assistants and dictation tools closer to practical use for cancer survivors with speech impairments.