HCFD: A Benchmark for Audio Deepfake Detection in Healthcare
New AI model PHOENIX-Mamba detects synthetic patient voices with over 97% accuracy across multiple conditions.
A research team from IIT Roorkee and IIIT Delhi has published a groundbreaking paper introducing HCFD (Healthcare Codec-Fake Detection), the first benchmark specifically designed to detect AI-generated voice deepfakes in healthcare settings. The work addresses a critical vulnerability: modern speech synthesis pipelines use neural codecs as core building blocks, and existing detectors trained on healthy speech fail when faced with pathological voices from patients with conditions like depression or Alzheimer's. The team released Healthcare CodecFake, the inaugural pathology-aware dataset containing paired real and synthetic speech across multiple clinical conditions and codec families.
Their evaluation revealed that state-of-the-art detectors perform poorly on this new benchmark, highlighting the need for specialized models. The researchers demonstrated that PaSST, a patch-based spectro-temporal model, outperforms existing speech-based approaches. They then proposed PHOENIX-Mamba, a novel geometry-aware framework that models different types of codec-fakes as self-discovered clusters in hyperbolic space—a mathematical representation that better handles the variability in pathological speech.
The results are impressive: PHOENIX-Mamba achieved 97.04% accuracy detecting synthetic depression speech, 96.73% for Alzheimer's, and 96.57% for dysarthria. It maintained strong performance on Chinese language data as well, with accuracies above 93% across all conditions. This represents a significant advancement in securing voice-based healthcare applications against increasingly sophisticated audio deepfakes, which could otherwise be used to impersonate patients or fabricate medical evidence.
- Introduces HCFD, the first benchmark for detecting codec-based audio deepfakes in pathological healthcare speech
- PHOENIX-Mamba framework achieves 97%+ accuracy across depression, Alzheimer's, and dysarthria conditions
- Reveals existing detectors fail on pathological speech, creating security vulnerabilities in medical voice systems
Why It Matters
Prevents AI voice fraud in telehealth and medical documentation, protecting patient identity and treatment integrity.