Depression Detection at the Point of Care: Automated Analysis of Linguistic Signals from Routine Primary Care Encounters
A zero-shot GPT-OSS model, evaluated on 1,108 recorded primary care visits, picked up a depression signal from as few as 128 tokens of patient speech.
A research team from the University of Washington and UC San Diego reports that AI can screen for depression by analyzing the natural language of routine primary care visits. Their study, posted on arXiv, analyzed 1,108 audio-recorded clinical encounters from the Establishing Focus study, comparing 253 patients classified as depressed by PHQ-9 score against 855 non-depressed controls. The researchers compared supervised models, including Sentence-BERT with logistic regression and ModernBERT, against a zero-shot implementation of OpenAI's GPT-OSS model.
GPT-OSS was the top performer, with an area under the receiver operating characteristic curve (AUROC) of 0.774 and an area under the precision-recall curve (AUPRC) of 0.510. The system was most effective when it analyzed the complete dyadic transcript (both patient and provider speech) rather than patient speech alone. The finding points to linguistic mirroring: clinicians appear to unconsciously adopt speech patterns similar to those of their depressed patients, which adds diagnostic signal on top of the patient's own words.
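To put the AUPRC figure in context, it helps to compare it with chance. A classifier that ranks encounters at random has an expected AUPRC equal to the share of positive cases, while chance-level AUROC is 0.5 regardless of class balance. The sketch below uses only the counts and scores quoted above; the variable names are illustrative and not taken from the study.

```python
# Class counts and AUPRC as reported in the article.
n_depressed = 253
n_controls = 855
n_total = n_depressed + n_controls  # 1,108 encounters

# For a random ranking, expected AUPRC equals the positive-class prevalence.
prevalence = n_depressed / n_total
reported_auprc = 0.510
lift_over_chance = reported_auprc / prevalence

print(f"prevalence (chance-level AUPRC): {prevalence:.3f}")
print(f"AUPRC lift over chance: {lift_over_chance:.2f}x")
```

With roughly 23% of encounters in the depressed group, an AUPRC of 0.510 is a little over twice the chance level.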
The practical implication is that detection does not require a full visit: the first 128 tokens of patient speech alone yielded an above-chance signal (AUROC=0.675), suggesting potential for real-time clinical decision support. This passive screening method, which could draw on increasingly common digital scribe recordings, could serve as a low-burden complement to existing depression screening workflows like the PHQ-9 questionnaire, potentially helping address the critical problem of underdiagnosis in primary care settings.
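The early-window idea can be sketched as a simple truncation step: keep only patient turns and stop once a token budget is reached. This is a minimal illustration under stated assumptions, not the study's pipeline. The `(speaker, utterance)` turn format, the `first_patient_tokens` helper, and whitespace tokenization are all hypothetical; the article does not say which tokenizer defines the 128-token window.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance); hypothetical transcript format


def first_patient_tokens(turns: List[Turn], budget: int = 128) -> List[str]:
    """Return up to `budget` whitespace tokens from patient turns, in order."""
    tokens: List[str] = []
    for speaker, utterance in turns:
        if speaker != "patient":
            continue  # provider speech is excluded from this window
        remaining = budget - len(tokens)
        if remaining <= 0:
            break
        # Whitespace splitting is an assumption standing in for a real tokenizer.
        tokens.extend(utterance.split()[:remaining])
    return tokens


demo = [
    ("provider", "What brings you in today?"),
    ("patient", "I have been tired all the time lately"),
    ("provider", "How long has that been going on?"),
    ("patient", "A few months I think"),
]
print(first_patient_tokens(demo, budget=10))
```

The truncated text would then be passed to whichever classifier is in use; a model-specific tokenizer would produce a different cutoff point than whitespace splitting.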
- GPT-OSS achieved an AUROC of 0.774 (a measure of ranking quality, not percent accuracy) detecting depression from clinical conversations, outperforming specialized supervised models
- Analysis of 1,108 encounters revealed providers linguistically mirror depressed patients, creating a stronger signal than patient speech alone
- The system detected an above-chance signal (AUROC=0.675) from just the first 128 tokens of patient speech, pointing to potential real-time clinical decision support
Why It Matters
Enables passive, scalable depression screening during routine visits, potentially catching cases that traditional questionnaires miss.