Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients
An LLM-powered tool reviewed 6,193 cases, flagging 23% for specialist consultation with high accuracy.
A research team from Stanford Health Care has developed and deployed an AI tool, the SCM Navigator, which uses a large language model (LLM) to automatically screen surgical patients for eligibility for Surgical Co-management (SCM). SCM is a proven model where hospitalists manage complex patients alongside surgeons, but manual identification is a bottleneck. The tool, integrated directly into the hospital's Electronic Health Record (EHR) system, analyzes pre-operative notes and structured data to categorize patients as appropriate, not appropriate, or possibly appropriate for SCM consultation.
In a real-world, prospective study involving 6,193 surgical cases, the SCM Navigator recommended hospitalist review for 1,582 patients (23%). When compared to physician determinations as the gold standard, the AI demonstrated high sensitivity (0.94) and moderate specificity (0.74). A crucial finding from post-hoc analysis was that most discrepancies were due to modifiable factors like clinical criteria gaps or workflow issues, not LLM errors. Only 2 of 19 false-negative cases were attributed to AI misclassification.
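The reported sensitivity and specificity follow directly from a standard confusion matrix. As a minimal sketch, the counts below are illustrative stand-ins (the study does not report its full cell values), chosen only so the ratios land near the published 0.94 and 0.74:

```python
# Sketch of how sensitivity and specificity are derived from a
# confusion matrix. The counts are hypothetical, not the study's data.

def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: share of SCM-appropriate patients the tool flagged."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: share of not-appropriate patients the tool cleared."""
    return tn / (tn + fp)

# Hypothetical counts chosen to roughly match the reported 0.94 / 0.74.
tp, fn = 940, 60    # appropriate patients flagged / missed
tn, fp = 740, 260   # not-appropriate patients cleared / flagged

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.94
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.74
```

In a screening context like this, the asymmetry is deliberate: high sensitivity keeps missed eligible patients rare, while moderate specificity is acceptable because a hospitalist reviews every flagged case before any action is taken.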
The study represents a significant step in operationalizing AI for clinical decision support. By embedding the LLM within the existing EHR and maintaining a 'human-in-the-loop' for final review, the team created a safe, augmentative system. The results provide strong evidence that AI can accurately and reliably automate the initial, labor-intensive screening step, freeing clinicians to focus on higher-value care. This model of EHR-integrated, assistive AI could be adapted to numerous other time-consuming clinical screening and triage tasks.
- The SCM Navigator tool reviewed 6,193 surgical cases, automatically flagging 1,582 (23%) for specialist hospitalist co-management.
- It achieved 94% sensitivity (correctly flagging patients who qualify for SCM) and 74% specificity in a prospective clinical study.
- Post-hoc analysis showed most discrepancies stemmed from modifiable clinical-criteria gaps and workflow issues; the LLM itself accounted for only 2 of 19 (11%) false negatives.
Why It Matters
This study demonstrates that AI can safely automate a tedious hospital screening task, potentially freeing up thousands of clinician hours for direct patient care.