SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model
New vision-language model stages sleep from brain waves and explains its reasoning using official medical guidelines.
A research team led by Guifeng Deng has introduced SleepVLM, a novel vision-language model (VLM) designed to solve a critical problem in medical AI: the lack of explainable reasoning. While automated sleep staging can match expert accuracy, its 'black box' nature hinders clinical trust. SleepVLM processes multi-channel polysomnography (PSG) data (converted into waveform images) and not only classifies sleep stages but also generates detailed, text-based rationales. These explanations are explicitly grounded in the official rules of the American Academy of Sleep Medicine (AASM), allowing doctors to audit the AI's logic step by step.
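As a rough illustration of the input side of this pipeline, a 30-second PSG epoch can be rendered as a stacked waveform image that a VLM can read. The channel montage, sampling rate, and rendering choices below are assumptions for the sketch, not details confirmed by the paper:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed setup: 30-second epoch, 100 Hz sampling, a small illustrative
# channel montage. SleepVLM's actual preprocessing may differ.
EPOCH_SECONDS = 30
FS = 100  # Hz
CHANNELS = ["EEG C3-A2", "EOG left", "EMG chin"]  # hypothetical montage

rng = np.random.default_rng(0)
t = np.arange(EPOCH_SECONDS * FS) / FS
# Synthetic signals standing in for real PSG traces.
epoch = np.stack([
    np.sin(2 * np.pi * 1.5 * t) + 0.3 * rng.standard_normal(t.size),   # slow-wave-like EEG
    0.5 * np.sin(2 * np.pi * 0.3 * t) + 0.2 * rng.standard_normal(t.size),  # EOG drift
    0.1 * rng.standard_normal(t.size),                                  # low-amplitude EMG
])

# Render the epoch as a stacked waveform image, one row per channel,
# producing the kind of picture a vision-language model can consume.
fig, axes = plt.subplots(len(CHANNELS), 1, figsize=(10, 4), sharex=True)
for ax, name, sig in zip(axes, CHANNELS, epoch):
    ax.plot(t, sig, linewidth=0.6, color="black")
    ax.set_ylabel(name, rotation=0, ha="right", va="center", fontsize=8)
    ax.set_yticks([])
axes[-1].set_xlabel("Time (s)")
fig.savefig("epoch_waveform.png", dpi=150, bbox_inches="tight")
```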
The model's performance is competitive with the best existing systems. It achieved a Cohen's kappa score of 0.767 on the held-out MASS-SS1 test set and 0.743 on an external clinical cohort (ZUAMHCS), demonstrating robust generalization. More importantly, in expert evaluations, its generated explanations received mean scores exceeding 4.0 out of 5.0 for factual accuracy, evidence comprehensiveness, and logical coherence. This combination of high accuracy and transparent, rule-based reasoning addresses the auditability gap that has blocked wider adoption of AI in sleep clinics.
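For readers unfamiliar with the metric, Cohen's kappa measures agreement between the model's stages and the expert's labels after discounting the agreement expected by chance; values above roughly 0.6 are conventionally read as substantial agreement. A minimal sketch of the computation, using made-up labels rather than the paper's data:

```python
import numpy as np

def cohens_kappa(y_true, y_pred, n_classes):
    """kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from the marginals."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    cm /= cm.sum()
    p_o = np.trace(cm)                     # observed agreement
    p_e = cm.sum(axis=1) @ cm.sum(axis=0)  # chance agreement from marginals
    return (p_o - p_e) / (1 - p_e)

# Toy example with the five AASM stages (W, N1, N2, N3, REM) coded 0-4;
# these labels are illustrative, not results from the paper.
expert = [0, 1, 2, 2, 3, 4, 2, 1, 0, 4]
model  = [0, 1, 2, 2, 3, 4, 2, 2, 0, 4]
print(round(cohens_kappa(expert, model, n_classes=5), 3))
```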
To support further research, the team is also releasing MASS-EX, a new expert-annotated dataset. SleepVLM is trained in two phases: first, waveform-perceptual pre-training teaches the model to read PSG signal patterns from images; then, rule-grounded supervised fine-tuning aligns its outputs with AASM criteria. By making AI reasoning interpretable and verifiable against a trusted medical standard, SleepVLM represents a significant step toward trustworthy AI assistants that can be safely integrated into real-world diagnostic workflows.
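To make the two-phase recipe concrete, one way to summarize it is as a training schedule. The data pairings, objectives, and trainable-module splits below are a hedged reading of the description above, not the authors' implementation details:

```python
from dataclasses import dataclass

@dataclass
class TrainingPhase:
    name: str
    data: str        # what each example pairs the waveform image with
    objective: str   # what the loss encourages
    trainable: str   # which parts of the VLM are updated (assumed)

# Hypothetical reading of the two-phase recipe; specifics are assumptions.
SCHEDULE = [
    TrainingPhase(
        name="waveform-perceptual pre-training",
        data="PSG epoch image -> description of waveform features",
        objective="teach the visual side of the model to perceive PSG morphology",
        trainable="vision encoder and its projection into the language model",
    ),
    TrainingPhase(
        name="rule-grounded supervised fine-tuning",
        data="PSG epoch image + instruction -> stage label plus AASM-cited rationale",
        objective="next-token prediction on expert-written, rule-referenced explanations",
        trainable="language model (and possibly the projection layer)",
    ),
]

for i, phase in enumerate(SCHEDULE, start=1):
    print(f"Phase {i}: {phase.name}")
    print(f"  data: {phase.data}")
    print(f"  objective: {phase.objective}")
    print(f"  updates: {phase.trainable}")
```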
- Achieves expert-level accuracy with a Cohen's kappa of 0.767 on the MASS-SS1 test set, matching state-of-the-art models.
- Generates clinician-readable explanations explicitly based on American Academy of Sleep Medicine (AASM) rules, scoring over 4.0/5.0 for factual accuracy in expert review.
- Releases a new expert-annotated dataset, MASS-EX, to advance research in interpretable medical AI.
Why It Matters
SleepVLM bridges the 'trust gap' in medical AI by providing auditable, rule-based explanations, paving the way for safer clinical adoption.