CR4T: Rewrite-Based Guardrails for Safer Teen AI Interactions
New framework replaces refusals with age-appropriate guidance for adolescent LLM safety.
A new paper from researchers Heajun An, Qi Zhang, Vedanth Achanta, and Jin-Hee Cho introduces CR4T (Critique-and-Revise-for-Teenagers), a guardrail framework designed specifically for adolescent LLM safety. The authors argue that current safety mechanisms are built on adult-centric norms and rely on refusal-oriented suppression, which creates conversational dead-ends and fails to address the developmental vulnerabilities of teen-AI interactions. Instead, CR4T treats safety as a socio-technical transformation problem: it selectively reconstructs unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving the original benign intent.
CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdowns, and introduce developmentally appropriate guidance. The authors report that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions. The framework is model-agnostic: it can be applied to any LLM without architectural changes. By replacing blanket refusals with constructive rewriting, CR4T offers a more human-centered alternative for the growing number of AI systems embedded in adolescent digital environments.
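To make the detect-then-rewrite flow described above more concrete, here is a minimal Python sketch. It is not the authors' implementation: the keyword lists, the domain names, the `Verdict` type, and the `detect`, `build_rewrite_instruction`, and `guard` functions are all hypothetical stand-ins, and the keyword matcher is only a toy substitute for whatever risk detector the paper actually uses. The rewriter is passed in as a plain callable, which is one way to keep the wrapper independent of the underlying LLM.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical markers and domains, for illustration only.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to")
UNSAFE_MARKERS = ("how to hide it", "without your parents knowing")
DOMAIN_KEYWORDS = {
    "self_harm": ("hurt myself", "self-harm"),
    "substances": ("vape", "alcohol", "drugs"),
}
DOMAIN_GUIDANCE = {
    "self_harm": "Respond with empathy and point to a trusted adult or helpline.",
    "substances": "Explain health risks factually and suggest talking to a trusted adult.",
    "general": "Give supportive, age-appropriate guidance.",
}


@dataclass
class Verdict:
    label: str             # "acceptable", "unsafe", or "refusal"
    domain: Optional[str]  # matched risk domain, if any


def detect(prompt: str, response: str) -> Verdict:
    """Toy stand-in for a lightweight risk detector (keyword matching only)."""
    text = response.lower()
    combined = prompt.lower() + " " + text
    domain = next(
        (d for d, kws in DOMAIN_KEYWORDS.items() if any(k in combined for k in kws)),
        None,
    )
    if any(m in text for m in REFUSAL_MARKERS):
        return Verdict("refusal", domain)
    if any(m in text for m in UNSAFE_MARKERS):
        return Verdict("unsafe", domain)
    return Verdict("acceptable", None)


def build_rewrite_instruction(prompt: str, response: str, verdict: Verdict) -> str:
    """Condition the rewrite request on the detected domain."""
    guidance = DOMAIN_GUIDANCE.get(verdict.domain or "general", DOMAIN_GUIDANCE["general"])
    return (
        f"The draft below was flagged as '{verdict.label}' for a teenage user.\n"
        f"Rewrite it so it keeps the user's benign intent. {guidance}\n"
        f"User: {prompt}\nDraft: {response}"
    )


def guard(prompt: str, response: str, rewrite: Callable[[str], str]) -> str:
    """Pass acceptable responses through; send flagged ones to the rewriter."""
    verdict = detect(prompt, response)
    if verdict.label == "acceptable":
        return response  # no intervention on acceptable interactions
    return rewrite(build_rewrite_instruction(prompt, response, verdict))
```

In use, `rewrite` would wrap a call to whichever LLM performs the revision. Because `guard` only sees a prompt, a draft response, and a callable, the base model that produced the draft needs no modification.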
- CR4T uses lightweight risk detection + domain-conditioned rewriting to replace unsafe/refusal outputs with guidance-oriented responses.
- The framework is model-agnostic: it works with any LLM and requires no architectural changes.
- Experimental results show a substantial reduction in both unsafe content and conversational dead-ends while preserving benign intent.
Why It Matters
As teens increasingly use LLMs, CR4T offers a safer, more constructive alternative to blanket refusal filters.