Who is Speaking or Who is Depressed? A Controlled Study of Speaker Leakage in Speech-Based Depression Detection
Research shows speech-based AI models for depression detection may be learning speaker traits, not clinical signals.
A team of researchers from multiple institutions has published a critical study revealing fundamental flaws in how speech-based AI models detect depression. The paper, titled 'Who is Speaking or Who is Depressed? A Controlled Study of Speaker Leakage in Speech-Based Depression Detection,' demonstrates that current models may be learning to recognize individual speakers rather than actual depression-related acoustic biomarkers. Using the DAIC-WOZ dataset, the researchers implemented a controlled data-splitting strategy that maintained constant training size while systematically varying speaker overlap between training and test sets.
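One way to picture a split that holds training size constant while varying speaker overlap is the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' actual protocol: the function name `make_split`, the `(speaker_id, segment_id)` data layout, the half-and-half division of each test speaker's segments, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def make_split(segments, test_speakers, overlap, train_size, seed=0):
    """Build a train/test split with a chosen level of speaker overlap.

    segments: list of (speaker_id, segment_id) pairs (hypothetical layout).
    overlap:  fraction of the training set drawn from test speakers.
    The test set is identical for every overlap level, and the training
    set always contains exactly `train_size` segments.
    """
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for spk, seg in segments:
        by_speaker[spk].append((spk, seg))

    test, seen_pool, unseen_pool = [], [], []
    for spk in sorted(by_speaker):
        segs = by_speaker[spk]
        if spk in test_speakers:
            # Assumption: first half of a test speaker's segments is used
            # for testing; the rest may leak into training.
            half = len(segs) // 2
            test.extend(segs[:half])
            seen_pool.extend(segs[half:])
        else:
            unseen_pool.extend(segs)

    n_seen = round(overlap * train_size)
    n_unseen = train_size - n_seen
    if n_seen > len(seen_pool) or n_unseen > len(unseen_pool):
        raise ValueError("not enough segments for the requested split")

    train = rng.sample(seen_pool, n_seen) + rng.sample(unseen_pool, n_unseen)
    return train, test


# Toy data: 10 speakers with 20 segments each (purely illustrative).
segments = [(f"spk{s}", i) for s in range(10) for i in range(20)]
test_speakers = {"spk0", "spk1"}

for overlap in (0.0, 0.5, 1.0):
    train, test = make_split(segments, test_speakers, overlap, train_size=20)
    shared = {s for s, _ in train} & test_speakers
    print(overlap, len(train), len(test), sorted(shared))
```

Because only the speaker mix of the training set changes, any shift in test performance across overlap levels can be attributed to speaker overlap rather than to the amount of training data.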
Results showed dramatic performance differences: when models were tested on speakers they'd encountered during training, accuracy appeared strong, but performance dropped sharply on completely unseen speakers. Even advanced techniques like Domain-Adversarial Neural Networks couldn't eliminate this gap, suggesting deep entanglement between speaker identity and the features models extract. The study evaluated three models of varying complexity, all showing similar patterns of speaker dependency.
This research fundamentally challenges the clinical validity of many current speech-based depression detection systems. Conventional evaluation protocols that don't strictly control for speaker overlap may be significantly overestimating models' generalization capabilities and real-world utility. The findings highlight the need for more rigorous, speaker-independent testing frameworks before these technologies can be reliably deployed in clinical or mental health monitoring applications.
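As a rough illustration of what a speaker-independence check could look like before reporting results, the sketch below measures how much of a test set comes from speakers also present in training. The helper name `speaker_overlap` and the example speaker IDs are hypothetical; the paper may audit its splits differently.

```python
def speaker_overlap(train_speaker_ids, test_speaker_ids):
    """Return the fraction of test samples whose speaker also appears in training.

    Both arguments hold one speaker ID per sample. A value of 0.0 means the
    evaluation is strictly speaker-independent.
    """
    train_set = set(train_speaker_ids)
    test_ids = list(test_speaker_ids)
    if not test_ids:
        raise ValueError("test set is empty")
    leaked = sum(1 for spk in test_ids if spk in train_set)
    return leaked / len(test_ids)


# Hypothetical example: one of four test samples comes from a training speaker.
ratio = speaker_overlap(["a", "a", "b"], ["b", "c", "c", "c"])
print(ratio)  # 0.25
```

A nonzero ratio does not by itself show that a model relies on speaker identity, but it flags an evaluation where such reliance could go unnoticed.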
- Models showed a 3x performance difference between seen and unseen speakers
- Even Domain-Adversarial Neural Networks left substantial performance gaps
- Study used DAIC-WOZ dataset with controlled speaker overlap protocols
Why It Matters
The study exposes fundamental validity problems in speech-based AI mental health tools and makes the case for stricter, speaker-independent testing before clinical deployment.