AI Safety

M-CARE: Standardized Clinical Case Reporting for AI Model Behavioral Disorders, with a 20-Case Atlas and Experimental Validation

A clinical framework for AI 'mental health' reveals shell instructions can override cooperative behavior

Deep Dive

Jihoon Jeong's M-CARE framework, published on arXiv, brings a clinical lens to AI model behavior by adapting human medical case reporting standards. It offers a 13-section report format, a 4-axis diagnostic assessment, and a nosological classification of AI behavioral conditions into five categories: RLHF Performance Artifacts; Shell-Core Override Pathology; Context & Memory Conditions; Core Identity & Plasticity; and Stress, Methodology, & Boundary Conditions. The 20-case atlas draws on field observations of deployed agents (8 cases), controlled experiments across three platforms (8 cases), and published sources (4 cases).
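The taxonomy and atlas composition described above can be sketched as a small data model. This is only an illustrative sketch: the category names and case counts come from the summary, but the class names, the `CaseStub` fields, and the `atlas_size` helper are hypothetical and not part of the released M-CARE resources. The 13 report sections and 4 diagnostic axes are not enumerated here, so they are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The five nosological categories named in the summary."""
    RLHF_PERFORMANCE_ARTIFACTS = "RLHF Performance Artifacts"
    SHELL_CORE_OVERRIDE_PATHOLOGY = "Shell-Core Override Pathology"
    CONTEXT_AND_MEMORY_CONDITIONS = "Context & Memory Conditions"
    CORE_IDENTITY_AND_PLASTICITY = "Core Identity & Plasticity"
    STRESS_METHODOLOGY_AND_BOUNDARY = "Stress, Methodology, & Boundary Conditions"


class Source(Enum):
    """Where a case in the atlas comes from."""
    FIELD_OBSERVATION = "field observation of a deployed agent"
    CONTROLLED_EXPERIMENT = "controlled experiment"
    PUBLISHED_SOURCE = "published source"


# Case counts per source type, as reported in the summary.
ATLAS_COMPOSITION = {
    Source.FIELD_OBSERVATION: 8,
    Source.CONTROLLED_EXPERIMENT: 8,
    Source.PUBLISHED_SOURCE: 4,
}


@dataclass(frozen=True)
class CaseStub:
    """Hypothetical minimal record for one atlas entry (not the real 13-section format)."""
    case_id: str  # assumed identifier scheme, for illustration only
    category: Category
    source: Source


def atlas_size(composition: dict) -> int:
    """Total number of cases across all source types."""
    return sum(composition.values())


if __name__ == "__main__":
    print(atlas_size(ATLAS_COMPOSITION))
```

A new category would be one more `Category` member, which loosely mirrors the extensibility the framework claims, though the actual mechanism in the paper may differ.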

A key case, Shell-Induced Behavioral Override (SIBO), demonstrates that shell instructions can categorically override a model's default cooperative behavior. Validated across five game domains (Trust Game, Poker, Avalon, Codenames, Chess), the effect proves domain-dependent: the SIBO Index ranges from 0.75 to 0.10, varying with action space complexity, core domain expertise, and temporal directness. The framework is designed to be extensible, so new cases and categories can be added without modifying the framework itself. Jeong has released the framework, all 20 case reports, and the experimental data as open resources. The paper is the second in the Model Medicine series.

Key Points
  • M-CARE provides a 13-section clinical report format and 4-axis diagnostic system for AI behavioral disorders
  • 20-case atlas spans five categories, including RLHF artifacts, shell-core override pathology, and identity plasticity
  • SIBO experiments show shell instructions overriding default cooperative behavior across five game domains, with a SIBO Index ranging from 0.75 to 0.10

Why It Matters

Standardized diagnosis of AI behavioral issues could improve safety and reliability in deployed autonomous agents.