Audio & Speech

HuBERT model detects COVID-19 from voice with 86% accuracy

An AI model flags likely COVID-19 cases from a voice recording alone, without a physical test.

Deep Dive

A team led by Yuyang Yan at Maastricht University trained deep learning models to identify COVID-19 from crowd-sourced voice recordings. They used the Cambridge COVID-19 Sound database, which contains 893 speech samples crowd-sourced from 4,352 participants through a smartphone app, and extracted three kinds of features: Mel-spectrograms, Mel-frequency cepstral coefficients (MFCCs), and CNN encoder features. They then compared three deep learning classifiers, LSTM, CNN, and HuBERT (Hidden-Unit BERT, a self-supervised speech model trained with BERT-style masked prediction), against traditional machine learning baselines. HuBERT performed best, reaching 86% accuracy and an area under the curve (AUC) of 0.93.
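To make the feature-extraction step concrete, here is a minimal sketch of how a log-Mel spectrogram and MFCCs can be computed from a waveform using only NumPy. It is not the authors' pipeline: the frame size, hop length, number of Mel bands, and number of coefficients are illustrative assumptions, and a synthetic 1 kHz tone stands in for a real voice recording. In practice a library such as librosa or torchaudio would normally handle this step.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sample_rate, n_fft=512, hop=160, n_mels=40):
    """Return log Mel-band energies, shape (n_frames, n_mels). Parameter values are assumptions."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Build overlapping frames and apply a Hann window to each one.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)

def mfcc(log_mel, n_coeffs=13):
    """Apply a DCT-II across the Mel axis and keep the first n_coeffs coefficients."""
    n_mels = log_mel.shape[1]
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ basis.T

# Synthetic stand-in for one second of audio sampled at 16 kHz (not real voice data).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 1000.0 * t)

log_mel = log_mel_spectrogram(signal, sample_rate)
coeffs = mfcc(log_mel)
print(log_mel.shape, coeffs.shape)  # (97, 40) (97, 13)
```

Each row of `log_mel` or `coeffs` describes one short frame of audio. Matrices like these are the usual input to sequence models such as an LSTM or to a CNN that treats the spectrogram as an image.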

This research points to a non-invasive, low-cost, and scalable approach to COVID-19 screening. The model could be deployed as a mobile or web application, allowing remote screening without rapid tests or lab visits. The paper notes limitations: the dataset is small, and the approach has not been reviewed or validated for clinical use. The same method could extend beyond COVID-19 to other respiratory diseases. The preprint was submitted in February 2024 and revised in May 2026, which suggests the work is still being refined. If validated in larger studies, voice-based AI screening could become a practical triage tool in public health.

Key Points
  • The HuBERT model achieved 86% accuracy and an AUC of 0.93 in detecting COVID-19 from voice recordings.
  • Dataset: 893 crowd-sourced speech samples from 4,352 participants via the COVID-19 Sounds app.
  • Approach: deep learning models (LSTM, CNN, HuBERT) trained on Mel-spectrograms, MFCCs, and CNN encoder features, compared against traditional machine learning baselines.
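The two headline metrics measure different things, which is why they can differ. Accuracy depends on a decision threshold. AUC measures how well the model ranks positive cases above negative ones across all thresholds. The sketch below computes both in plain Python on a made-up set of six labels and scores. The data and the 0.5 threshold are illustrative assumptions, not values from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def roc_auc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data: 1 = COVID-positive, 0 = negative; scores are hypothetical model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

acc = accuracy(labels, scores)
auc = roc_auc(labels, scores)
print(round(acc, 3), round(auc, 3))  # 0.667 0.889
```

In this toy case the model ranks most positives above most negatives, giving a high AUC, yet the 0.5 threshold misclassifies two of the six examples. This is the same pattern as an AUC of 0.93 alongside 86% accuracy.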

Why It Matters

Voice-based AI screening could enable cheap, instant, scalable COVID-19 detection without physical tests.