Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients
An LLM-powered tool reviewed 6,193 cases, flagging 23% for specialist consultation with high accuracy.
A research team from Stanford Health Care has developed and deployed an AI tool, the SCM Navigator, which uses a large language model (LLM) to automatically screen surgical patients for eligibility for Surgical Co-management (SCM). SCM is a proven model where hospitalists manage complex patients alongside surgeons, but manual identification is a bottleneck. The tool, integrated directly into the hospital's Electronic Health Record (EHR) system, analyzes pre-operative notes and structured data to categorize patients as appropriate, not appropriate, or possibly appropriate for SCM consultation.
In a real-world, prospective study involving 6,193 surgical cases, the SCM Navigator recommended hospitalist review for 1,582 patients (23%). When compared to physician determinations as the gold standard, the AI demonstrated high sensitivity (0.94) and moderate specificity (0.74). A crucial finding from post-hoc analysis was that most discrepancies were due to modifiable factors like clinical criteria gaps or workflow issues, not LLM errors. Only 2 of 19 false-negative cases were attributed to AI misclassification.
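The reported sensitivity and specificity follow directly from a standard confusion matrix. As a minimal sketch, the counts below are illustrative stand-ins (the study does not report its full cell values), chosen only so the ratios land near the published 0.94 and 0.74:

```python
# Sketch of how sensitivity and specificity are derived from a
# confusion matrix. The counts are hypothetical, not the study's data.

def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: share of SCM-appropriate patients the tool flagged."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: share of not-appropriate patients the tool cleared."""
    return tn / (tn + fp)

# Hypothetical counts chosen to roughly match the reported 0.94 / 0.74.
tp, fn = 940, 60    # appropriate patients flagged / missed
tn, fp = 740, 260   # not-appropriate patients cleared / flagged

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.94
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.74
```

In a screening context like this, the asymmetry is deliberate: high sensitivity keeps missed eligible patients rare, while moderate specificity is acceptable because a hospitalist reviews every flagged case before any action is taken.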
The study represents a significant step in operationalizing AI for clinical decision support. By embedding the LLM within the existing EHR and maintaining a 'human-in-the-loop' for final review, the team created a safe, augmentative system. The results provide strong evidence that AI can accurately and reliably automate the initial, labor-intensive screening step, freeing clinicians to focus on higher-value care. This model of EHR-integrated, assistive AI could be adapted to numerous other time-consuming clinical screening and triage tasks.
- The SCM Navigator tool reviewed 6,193 surgical cases, automatically flagging 1,582 (23%) for specialist hospitalist co-management.
- It achieved 94% sensitivity (correctly flagging patients who qualify for SCM) and 74% specificity in a prospective clinical study.
- Post-hoc analysis showed most discrepancies stemmed from modifiable clinical-criteria gaps and workflow issues; the LLM itself accounted for only 2 of 19 (11%) false negatives.
Why It Matters
This study demonstrates that AI can safely automate a tedious hospital screening task, potentially freeing up thousands of clinician hours for direct patient care.