MOS-Bias: From Hidden Gender Bias to Gender-Aware Speech Quality Assessment
Male listeners consistently rate speech quality higher than female listeners, a hidden bias that propagates into the AI models trained on their ratings.
A research team from academic institutions including Academia Sinica and National Taiwan University has published a paper titled 'MOS-Bias: From Hidden Gender Bias to Gender-Aware Speech Quality Assessment.' The study presents the first systematic analysis of gender bias in Mean Opinion Score (MOS) evaluations, the standard metric for assessing speech quality. The researchers found that male listeners consistently assign higher quality scores than female listeners, with the gap most pronounced for low-quality speech samples. As speech quality improves, the gap gradually narrows, revealing a quality-dependent bias structure that resists simple statistical calibration, as the sketch below illustrates.
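To see why a single global correction cannot remove a quality-dependent gap, consider a minimal synthetic sketch (illustrative numbers only, not the paper's data): if the male-female gap shrinks as quality rises, subtracting the average difference still leaves opposite-signed residual bias at the two quality extremes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration (not the paper's data): true quality on a 1-5 scale.
quality = rng.uniform(1.0, 5.0, size=10_000)

# Assume a quality-dependent gap: ~0.5 MOS points at quality 1, shrinking
# linearly toward 0 at quality 5, mirroring the trend the paper reports.
gap = 0.5 * (5.0 - quality) / 4.0

male_scores = quality + gap / 2 + rng.normal(0, 0.3, size=quality.size)
female_scores = quality - gap / 2 + rng.normal(0, 0.3, size=quality.size)

# "Simple statistical calibration": subtract the global mean difference.
offset = male_scores.mean() - female_scores.mean()
male_calibrated = male_scores - offset

# The global gap vanishes, but the gaps at the quality extremes do not:
# low-quality clips stay over-rated, high-quality clips become under-rated.
low, high = quality < 2.0, quality > 4.0
print(f"global gap after calibration:  {male_calibrated.mean() - female_scores.mean():+.3f}")
print(f"low-quality gap after calib.:  {male_calibrated[low].mean() - female_scores[low].mean():+.3f}")
print(f"high-quality gap after calib.: {male_calibrated[high].mean() - female_scores[high].mean():+.3f}")
```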
This human bias directly impacts AI systems. The team demonstrated that automated MOS prediction models, trained on aggregated human ratings without regard to listener gender, inherit and amplify the bias: their predictions skew toward male perceptual standards, potentially penalizing systems optimized for female listeners' preferences. To address this, the researchers developed a gender-aware model architecture that incorporates learned binary group embeddings, allowing it to model gender-specific scoring patterns (sketched below). Their approach improved both overall prediction accuracy and gender-specific scoring fairness, establishing that gender bias in MOS ratings is a systematic, learnable pattern that demands explicit attention in the development of equitable speech technology.
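The paper's exact architecture is not reproduced here, but the core idea of conditioning a MOS predictor on a learned binary group embedding can be sketched as follows; the encoder features, layer sizes, and the class name GenderAwareMOSPredictor are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GenderAwareMOSPredictor(nn.Module):
    """Hypothetical sketch of a gender-aware MOS predictor.

    The paper describes learning gender-specific scoring patterns via binary
    group embeddings; the pooling and layer sizes here are assumptions.
    """

    def __init__(self, feat_dim: int = 768, embed_dim: int = 32):
        super().__init__()
        # One learned embedding per listener group (e.g., 0 = female, 1 = male).
        self.group_embedding = nn.Embedding(num_embeddings=2, embedding_dim=embed_dim)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # predicted MOS under that group's standard
        )

    def forward(self, speech_feats: torch.Tensor, group_id: torch.Tensor) -> torch.Tensor:
        # speech_feats: (batch, time, feat_dim) frame-level features from any
        # upstream encoder; mean-pool over time for a clip-level representation.
        pooled = speech_feats.mean(dim=1)
        cond = torch.cat([pooled, self.group_embedding(group_id)], dim=-1)
        return self.head(cond).squeeze(-1)

# Usage: the same clip can be scored under either group's learned standard.
model = GenderAwareMOSPredictor()
feats = torch.randn(4, 100, 768)  # stand-in for encoder output
mos_female = model(feats, torch.zeros(4, dtype=torch.long))
mos_male = model(feats, torch.ones(4, dtype=torch.long))
```

Because the group embedding is the only gender-dependent input, a single model can score the same clip under either group's learned standard, which is what makes the gender-specific fairness comparison possible.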
- Male listeners consistently rated speech quality higher than female listeners, with the largest gap (up to 0.5 MOS points) found in low-quality audio.
- Automated MOS prediction models trained on standard datasets learn this bias, skewing their outputs toward male perceptual standards.
- The proposed gender-aware model uses binary group embeddings to learn gender-specific patterns, improving overall accuracy and fairness in predictions.
Why It Matters
The study exposes a critical fairness flaw in how speech AI is evaluated, affecting voice assistants, telephony, and any system tuned for perceived quality.