Lost in Transcription: Subtitle Errors in Automatic Speech Recognition Reduce Speaker and Content Evaluations
ASR errors in subtitles lower viewer evaluations of speakers by 20-30%, new research finds.
A new study titled 'Lost in Transcription: Subtitle Errors in Automatic Speech Recognition Reduce Speaker and Content Evaluations' shows that subtitle quality significantly biases how viewers perceive video content. Researchers from Cornell Tech conducted a preregistered online experiment with 207 U.S.-based participants, who watched talks delivered by speakers with various accents. The key manipulation was subtitle quality: some viewers saw accurate subtitles, while others saw error-prone versions of the kind produced by imperfect automatic speech recognition (ASR) systems. The results were clear and consistent: error-prone subtitles lowered both speaker and content evaluations across the board.
Although the study's controlled analysis found no disparate impact between accent groups once subtitle quality was held constant, the researchers highlight a critical real-world implication. Because ASR systems are known to perform worse for certain demographic groups and accents, speakers from those groups are systematically more likely to receive error-ridden subtitles. This creates a compounding disadvantage: first, the technology fails them more often; second, those errors directly cause viewers to judge them and their message more harshly. The paper, available on arXiv, adds to growing evidence that performance gaps in AI systems have tangible human consequences beyond technical metrics.
- Error-prone AI-generated subtitles reduced viewer evaluations of speakers and content by 20-30% in a controlled study of 207 participants.
- The research found no disparate impact between accent groups when subtitle quality was controlled, but notes that real-world ASR systems, which transcribe some accents less accurately, still produce biased outcomes.
- Speakers with accents that challenge current ASR models face a double penalty: worse transcription quality, which in turn lowers their perceived credibility.
Why It Matters
Flawed AI subtitles are not just an inconvenience; they systematically damage speakers' professional credibility and how their message is received in virtual meetings and online video.