Elderly-Contextual Data Augmentation via Speech Synthesis for Elderly ASR
A new data augmentation method cuts word error rates in elderly speech recognition by up to 58.2%.
A team of researchers from South Korea has introduced a novel data augmentation pipeline to tackle the persistent challenge of automatic speech recognition (ASR) for elderly speakers. Their approach, detailed in a preprint on arXiv, combines large language model (LLM)-based transcript paraphrasing with text-to-speech (TTS) synthesis to generate synthetic training data. The pipeline first uses an LLM to produce elderly-contextual paraphrases of original transcripts, then a TTS model synthesizes corresponding speech using elderly reference speakers. The resulting audio-text pairs are merged with original data to fine-tune OpenAI's Whisper model without any architectural changes.
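The flow described above (paraphrase, synthesize with an elderly reference voice, merge with the originals) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Sample`, `augment`, the `paraphrase` and `synthesize` callables, and the `ratio` parameter are all hypothetical names, and the LLM and TTS steps are passed in as plain functions so that any model could stand behind them.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    audio: str               # path or identifier of an audio clip
    text: str                # transcript paired with the clip
    synthetic: bool = False  # True for TTS-generated pairs


def augment(
    originals: List[Sample],
    paraphrase: Callable[[str], str],       # hypothetical LLM paraphrasing step
    synthesize: Callable[[str, str], str],  # hypothetical TTS step: (text, reference speaker) -> audio
    elderly_refs: List[str],                # reference clips from elderly speakers
    ratio: float = 1.0,                     # synthetic pairs per original pair (assumed definition)
    seed: int = 0,
) -> List[Sample]:
    """Return the original pairs followed by synthetic pairs."""
    if not originals or not elderly_refs:
        return list(originals)
    rng = random.Random(seed)
    n_synth = int(ratio * len(originals))
    synthetic = []
    for i in range(n_synth):
        source = originals[i % len(originals)]   # cycle through the original transcripts
        new_text = paraphrase(source.text)       # elderly-contextual paraphrase
        ref = rng.choice(elderly_refs)           # pick an elderly reference voice
        synthetic.append(Sample(synthesize(new_text, ref), new_text, True))
    return list(originals) + synthetic
```

The merged list would then be passed to an ordinary Whisper fine-tuning loop. Because the model itself is untouched, all of the augmentation logic stays on the data side.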
In experiments with English and Korean elderly speech datasets from speakers aged 70 and above, the method consistently outperformed conventional augmentation baselines, achieving up to a 58.2% reduction in word error rate (WER) compared to the Whisper baseline. The researchers also analyzed the effects of augmentation ratio and reference-speaker composition in low-resource scenarios, highlighting the importance of using elderly voices in the TTS synthesis. This work addresses critical data scarcity issues in elderly ASR, where limited training data and distinct acoustic-linguistic characteristics of elderly speech have long hindered progress. The paper is currently under review at IEEE Signal Processing Letters.
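For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below computes it and also shows how a relative reduction against a baseline would be calculated. The summary does not say whether the 58.2% figure is relative or absolute, so the relative form here is an assumption; the function names are hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was removed (assumed interpretation)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

As a purely illustrative case, a hypothesis with one substituted word and one missing word against a six-word reference gives a WER of 2/6. Under the relative reading, a drop from a WER of 0.20 to 0.10 would count as a 50% reduction.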
- Combines LLM-based transcript paraphrasing with TTS synthesis using elderly reference speakers for data augmentation.
- Fine-tunes Whisper without architectural modification, achieving up to 58.2% WER reduction on English and Korean datasets.
- Analysis of augmentation ratio and reference-speaker composition shows that using elderly voices in TTS is critical for low-resource elderly ASR.
Why It Matters
More accurate recognition of elderly speech could make voice assistants and accessibility tools more usable for the growing elderly population.