SLIP & ETHICS framework tackles safety-rapport paradox in AI companions
New staged intervention protocol reports zero false positives on healthy-flow test personas while navigating high-risk emotional scenarios.
AI emotional companions face a fundamental dilemma: strict safeguards damage the supportive alliance users need, while permissive systems risk harmful outcomes. Kim's SLIP (Staged Layers of Intervention Protocol) addresses this with four intervention levels (including none, soft, and hard), triggered by qualitative indicators such as affect intensity and narrative dynamism. The accompanying ETHICS taxonomy uses dynamic signals rather than static labels to contextualize user behavior. Early results come from a small production deployment (68 entries, 10 users, 10 weeks) and from synthetic persona tests (91 personas across 5 risk profiles): healthy-flow personas triggered zero false positives, while crisis-oriented profiles escalated correctly.
Yet a critical boundary emerged: 8 consecutive days of sustained high-energy elevation produced zero interventions (0/8), exposing a blind spot where the principle of "do not pathologize" conflicts with safety. A subsequent stress test with three models showed that increasing model capability raised detection from 0/8 to 6/8 while preserving perfect specificity (0/10 false positives) on the largest model. Accepted at PervasiveHealth 2026, these findings position graduated intervention as a promising design direction—not a resolution—for the safety-rapport tension in affective computing.
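To make the reported counts easier to compare, the short sketch below turns them into rates. It assumes that the 8 elevated days are all cases that should have triggered an intervention and that the 10 cases behind "0/10 false positives" are all healthy ones; the helper name `rate` is illustrative and not from the paper.

```python
from fractions import Fraction


def rate(hits: int, total: int) -> Fraction:
    """Return hits/total as an exact fraction (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= hits <= total:
        raise ValueError("hits must lie between 0 and total")
    return Fraction(hits, total)


# Counts as reported in the text above.
baseline_detection = rate(0, 8)       # initial result: 0 of 8 elevated days flagged
largest_model_detection = rate(6, 8)  # stress test, largest model: 6 of 8 flagged
false_positive_rate = rate(0, 10)     # largest model: 0 of 10 false positives

# Specificity is the complement of the false-positive rate,
# under the assumption that all 10 cases were healthy.
specificity = 1 - false_positive_rate
missed_days = 8 - 6  # elevated days still not flagged by the largest model

print(f"baseline detection:      {float(baseline_detection):.0%}")
print(f"largest-model detection: {float(largest_model_detection):.0%}")
print(f"specificity:             {float(specificity):.0%}")
print(f"elevated days missed:    {missed_days}")
```

Read this way, the largest model detects three quarters of the elevated days without adding false alarms, but two of the eight days still pass without intervention.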
- SLIP uses four intervention stages, including none, soft, and hard, triggered by affect intensity and narrative dynamism.
- Across a 10-week, 68-entry deployment with 10 users and tests on 91 synthetic personas, healthy-flow personas triggered zero false positives.
- An 8-day high-energy gap (0/8 interventions) narrowed to 6/8 detections on the largest of three models tested, showing that safety depends on model capability.
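The graduated escalation in the first bullet can be pictured with a minimal sketch. Everything specific here is an assumption made for illustration, not the published SLIP logic: the numeric 0-to-1 scores, the rule that combines them, the thresholds, and the names `Signals` and `choose_level`. The source describes the indicators as qualitative, and only the three stages it names are modeled.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Only the stages named in the summary; the fourth is not specified there."""
    NONE = 0
    SOFT = 1
    HARD = 2


@dataclass(frozen=True)
class Signals:
    affect_intensity: float    # hypothetical score in [0, 1]
    narrative_dynamism: float  # hypothetical score in [0, 1]


def choose_level(s: Signals, soft_at: float = 0.5, hard_at: float = 0.8) -> Level:
    """Map signals to a stage using an assumed rule and assumed thresholds."""
    for value in (s.affect_intensity, s.narrative_dynamism):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    # Assumed rule: intense affect combined with a stagnant narrative raises concern.
    concern = s.affect_intensity * (1.0 - s.narrative_dynamism)
    if concern >= hard_at:
        return Level.HARD
    if concern >= soft_at:
        return Level.SOFT
    return Level.NONE


calm = choose_level(Signals(affect_intensity=0.2, narrative_dynamism=0.5))
stuck = choose_level(Signals(affect_intensity=0.7, narrative_dynamism=0.2))
crisis = choose_level(Signals(affect_intensity=0.9, narrative_dynamism=0.1))
elevated = choose_level(Signals(affect_intensity=0.9, narrative_dynamism=0.9))
print(calm.name, stuck.name, crisis.name, elevated.name)
```

Under this assumed rule, the `elevated` case (intense but highly dynamic) receives no intervention, which mirrors the kind of blind spot the 8-day high-energy result exposed: a rule tuned to avoid pathologizing energetic users can also stay silent during sustained elevation.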
Why It Matters
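Graduated intervention gives designers a measurable way to balance emotional rapport with safety in AI companions. The capability-dependent detection gap shows the tension is not yet resolved, which matters for anyone deploying such systems in mental health and support tools.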