Validation of a Small Language Model for DSM-5 Substance Category Classification in Child Welfare Records
A small, locally hosted AI model achieved near-perfect agreement with human experts when classifying five of seven DSM-5 substance categories.
A research team from the University of Michigan and other institutions has validated a small, locally deployable language model for a demanding social work task: classifying the specific substance types mentioned in child welfare investigation narratives. Prior studies relied on simpler binary detection of 'substance-related problems'. The 20-billion-parameter model was instead trained to identify seven categories aligned with the DSM-5, the standard diagnostic manual for mental health disorders.
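The difference between binary detection and multi-label classification can be shown with a small data-structure sketch. This is an illustration only, not the study's pipeline: the category identifiers and the helper names `to_label_vector` and `any_substance` are assumptions made for the example, and the hallucinogen and inhalant categories are included because the article names them as the two low-prevalence categories.

```python
# Seven substance categories as named in this article.
# The identifier spellings are illustrative, not taken from the study.
CATEGORIES = (
    "alcohol",
    "cannabis",
    "opioids",
    "stimulants",
    "sedatives_hypnotics_anxiolytics",
    "hallucinogens",
    "inhalants",
)


def to_label_vector(mentioned):
    """Map a set of mentioned categories to a 0/1 vector in CATEGORIES order."""
    unknown = set(mentioned) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1 if category in mentioned else 0 for category in CATEGORIES]


def any_substance(vector):
    """Collapse a multi-label vector to a single binary 'any substance' flag."""
    return int(any(vector))


# A hypothetical narrative that mentions both alcohol and opioids.
record = to_label_vector({"alcohol", "opioids"})
print(record, any_substance(record))
```

A binary flag would record only that some substance was mentioned; the vector keeps each category separately, so one record can carry several labels at once.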
The validation results are striking. For five of the seven substance categories (alcohol, cannabis, opioids, stimulants, and sedatives/hypnotics/anxiolytics), the model achieved 'almost perfect' agreement with expert human reviewers, the label conventionally given to Cohen's kappa values of 0.81 and above. Kappa scores for these categories ranged from 0.94 to a perfect 1.00, and classification precision ranged from 92% to 100%. The model also demonstrated high test-retest stability (92.1% to 99.1% agreement) when classifying approximately 15,000 records. Only two low-prevalence categories, hallucinogens and inhalants, performed poorly, likely due to insufficient training data.
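For readers unfamiliar with these metrics, the following minimal sketch shows how Cohen's kappa, precision, and test-retest agreement are computed for a single category treated as a binary label. The article does not describe the study's own analysis code, so the function names here are assumptions, and the toy label lists are invented for illustration and are not study data.

```python
import math


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of binary (0/1) labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(rater_a)
    # Observed agreement: share of records where both raters give the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal rate of positive labels.
    pos_a = sum(rater_a) / n
    pos_b = sum(rater_b) / n
    p_chance = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)
    if p_chance == 1.0:
        return math.nan  # kappa is undefined when chance agreement is total
    return (p_observed - p_chance) / (1 - p_chance)


def precision(expert, model):
    """Share of the model's positive labels that the expert also marked positive."""
    true_pos = sum(e == 1 and m == 1 for e, m in zip(expert, model))
    false_pos = sum(e == 0 and m == 1 for e, m in zip(expert, model))
    if true_pos + false_pos == 0:
        return math.nan  # the model made no positive predictions
    return true_pos / (true_pos + false_pos)


def percent_agreement(run_1, run_2):
    """Test-retest stability: share of records labeled identically in two runs."""
    if len(run_1) != len(run_2) or not run_1:
        raise ValueError("runs must be non-empty and equal in length")
    return sum(a == b for a, b in zip(run_1, run_2)) / len(run_1)


# Invented toy labels for one category (1 = substance mentioned).
expert_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

print(round(cohens_kappa(expert_labels, model_labels), 3))
print(precision(expert_labels, model_labels))
print(percent_agreement(model_labels, model_labels))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters for rare categories: when nearly every record is negative, two raters can agree on most records by default, so raw agreement alone can look high even for a poor classifier.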
This research suggests that smaller, specialized AI models can match or exceed the performance of larger general-purpose LLMs on specific domain tasks. Local deployment is crucial for child welfare agencies because it allows highly sensitive case narratives to be analyzed without sending data to external cloud servers, which addresses significant privacy and security concerns. The result paves the way for more nuanced data analysis to inform policy and resource allocation.
- The 20B-parameter SLM achieved near-perfect agreement (κ=0.94-1.00) with human experts for 5 out of 7 DSM-5 substance categories.
- Classification precision for those five categories (alcohol, cannabis, opioids, stimulants, sedatives/hypnotics/anxiolytics) ranged from 92% to 100% on child welfare narratives.
- The model enables local, private analysis for sensitive records, moving beyond binary detection to specific multi-label substance identification.
Why It Matters
The approach lets child welfare agencies analyze case data privately and at the level of specific substances, which could improve resource targeting for substance-related interventions.