EviOSAHS: New AI framework achieves 94.86% sensitivity for sleep apnea screening via facial analysis
Researchers' EviOSAHS framework uses facial images and LLMs to screen for sleep apnea with 94.86% sensitivity.
EviOSAHS addresses screening for obstructive sleep apnea-hypopnea syndrome (OSAHS) before polysomnography by combining visible craniofacial and neck cues with clinical risk factors. Rather than prompting a general-purpose multimodal foundation model for a direct yes/no answer, the framework separates image-only anatomical evidence acquisition from clinical adjudication. Each frontal facial image is examined through seven fixed anatomical queries covering the neck, chin, mouth, face/neck fat, lower jaw, midface, and nose. The visual responses are converted into structured evidence cards that record the target anatomy, visibility, risk direction, evidence strength, confidence, and a summary. In the final stage, a large language model combines these cards with a cleaned clinical profile and performs a balanced binary screening adjudication.
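The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the card fields and the seven targets come from the description, while the value vocabularies, the `ask_vision_model` callable, the `fake_vision_model` stand-in, and the clinical profile keys are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict

# The seven anatomical targets named in the article.
ANATOMICAL_TARGETS = [
    "neck", "chin", "mouth", "face/neck fat", "lower jaw", "midface", "nose",
]


@dataclass
class EvidenceCard:
    """One structured record per anatomical query (value formats are assumed)."""
    target_anatomy: str
    visibility: str         # assumed vocabulary, e.g. "high" / "medium" / "low"
    risk_direction: str     # assumed vocabulary, e.g. "increases" / "neutral"
    evidence_strength: str  # assumed vocabulary, e.g. "weak" / "strong"
    confidence: float       # assumed to lie in [0, 1]
    summary: str


def collect_evidence(image_path, ask_vision_model):
    """Stage 1: image-only evidence, one query per target, no clinical data."""
    cards = []
    for target in ANATOMICAL_TARGETS:
        raw = ask_vision_model(image_path, target)  # expected to return JSON text
        cards.append(EvidenceCard(target_anatomy=target, **json.loads(raw)))
    return cards


def build_adjudication_input(cards, clinical_profile):
    """Stage 2 input: evidence cards plus the cleaned clinical profile."""
    return json.dumps({
        "evidence_cards": [asdict(card) for card in cards],
        "clinical_profile": clinical_profile,
        "task": "binary OSAHS screening: positive or negative",
    })


def fake_vision_model(image_path, target):
    """Hypothetical stand-in for a real multimodal model call."""
    return json.dumps({
        "visibility": "high",
        "risk_direction": "neutral",
        "evidence_strength": "weak",
        "confidence": 0.5,
        "summary": f"No notable finding for {target}.",
    })


cards = collect_evidence("subject.jpg", fake_vision_model)
payload = build_adjudication_input(cards, {"age": 52, "bmi": 31.4})
```

Keeping stage 1 free of clinical data means each card can be audited on its own before the language model sees the combined payload.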
Evaluated on a 642-subject cohort (normal subjects mapped to screening-negative; mild, moderate, and severe OSAHS mapped to screening-positive), EviOSAHS achieved 88.47% accuracy, 94.86% sensitivity, and a 93.74% F1-score, with a 5.14% false-negative rate (the complement of sensitivity). It outperformed clinical-only prompting, direct multimodal prompting, and naive two-stage pipelines. Ablation studies confirmed that both the seven-question visual decomposition and the balanced final adjudication were critical to the high-sensitivity operating point. A question-level audit of 4,494 visual outputs (642 subjects × 7 queries) showed a 100% structured parse rate and a 93.88% high-visibility rate. The authors note that EviOSAHS is a triage assistant, not a diagnostic system, and that it requires prospective validation before clinical deployment.
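For readers who want to see how the reported metrics relate to one another, here is a small sketch using the standard definitions. The confusion-matrix counts are hypothetical and are not taken from the study, which reports only the percentages; the last two checks use figures quoted above.

```python
def screening_metrics(tp, fn, fp, tn):
    """Standard binary screening metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # share of true positives that were caught
    precision = tp / (tp + fp)     # share of positive calls that were correct
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "false_negative_rate": fn / (tp + fn),  # always 1 - sensitivity
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts for illustration only; NOT the study's confusion matrix.
metrics = screening_metrics(tp=90, fn=10, fp=20, tn=30)

# Consistency checks on the figures reported in the article.
sensitivity_and_fnr_sum = round(94.86 + 5.14, 2)  # percentages should total 100
audit_outputs = 642 * 7                           # subjects times queries each
```

Because sensitivity and the false-negative rate share a denominator, the 94.86% and 5.14% figures are two views of the same quantity rather than independent results.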
- EviOSAHS achieved 88.47% accuracy, 94.86% sensitivity, and 93.74% F1-score on a 642-subject cohort for OSAHS screening.
- Each facial image is decomposed into seven structured anatomical queries (neck, chin, mouth, face/neck fat, lower jaw, midface, and nose); 93.88% of the 4,494 audited visual outputs were rated high-visibility.
- Auditable two-stage reasoning – separate visual evidence cards plus clinical data – yields a low 5.14% false-negative rate.
Why It Matters
EviOSAHS points toward high-sensitivity, auditable AI triage for sleep apnea before costly polysomnography, pending prospective validation.