Recognition Without Authorization: LLMs and the Moral Order of Online Advice
New research finds that LLMs withhold directive advice on abuse, unlike human advice communities.
A new study by Tom van Nuenen, published on arXiv, compares four assistant-style LLMs with community-endorsed advice from 11,565 posts on r/relationship_advice. The subreddit serves as a concentrated, vote-ratified moral formation where prescriptive clarity makes divergence measurable. All four models identify many of the same dynamics as human commenters, but they are markedly less likely to convert that recognition into directive authorization for action. The gap is sharpest where community consensus is strongest: on high-consensus posts involving abuse or safety threats, models recommend exit at roughly half the human rate while maintaining elevated levels of hedging, validation, and therapeutic framing.
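To make the headline comparison concrete, the exit-rate gap can be operationalized as the share of responses on abuse posts that explicitly authorize leaving. The sketch below is a hypothetical illustration, not the study's pipeline: the `Response` structure, the `EXIT_PHRASES` keyword list, and the `exit_rate` helper are assumed stand-ins for whatever annotation scheme the paper actually used.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the study's annotation scheme: a response
# "authorizes exit" if it contains an explicit leave/break-up directive.
EXIT_PHRASES = ("leave him", "leave her", "break up", "end the relationship", "get out")


@dataclass
class Response:
    source: str      # "human" or "model"
    post_topic: str  # e.g. "abuse", "infidelity", "finances"
    text: str


def authorizes_exit(text: str) -> bool:
    """Naive keyword check; a real pipeline would use trained annotators or a judge model."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in EXIT_PHRASES)


def exit_rate(responses: list[Response], source: str, topic: str = "abuse") -> float:
    """Share of responses from `source` on `topic` posts that recommend exit."""
    relevant = [r for r in responses if r.source == source and r.post_topic == topic]
    if not relevant:
        return 0.0
    return sum(authorizes_exit(r.text) for r in relevant) / len(relevant)


if __name__ == "__main__":
    sample = [
        Response("human", "abuse", "This is abuse. Leave him and stay with family tonight."),
        Response("human", "abuse", "Please break up and call a domestic-violence hotline."),
        Response("model", "abuse", "That sounds really difficult. Reflecting on your boundaries may help."),
        Response("model", "abuse", "You deserve to feel safe; consider speaking with a counselor."),
    ]
    human, model = exit_rate(sample, "human"), exit_rate(sample, "model")
    print(f"human exit rate: {human:.2f}, model exit rate: {model:.2f}")
```

Keyword matching would of course be far too coarse in practice; the sketch only shows what "roughly half the human rate" measures, namely per-source exit-recommendation rates restricted to high-consensus abuse posts.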
The study describes this pattern as 'recognition without authorization': the capacity to register harm while withholding socially ratified permission for consequential action. The divergence is not incidental but structural, reflecting a portable advisory style that remains validating, risk-averse, and weakly directive across contexts. Safety alignment is one plausible contributor, alongside training-data averaging and broader assistant design. The study argues that this divergence is better read not as a technical error but as a lens on what standardized assistant norms flatten when they encounter situated moral worlds. The findings bear directly on deploying LLMs in sensitive domains such as mental health support and crisis counseling.
- LLMs recommend exit on abuse posts at roughly half the rate of human commenters.
- The study compared responses from four assistant-style LLMs against community-endorsed advice on 11,565 r/relationship_advice posts.
- Models lean more heavily than human commenters on hedging, validation, and therapeutic framing, withholding directive authorization.
Why It Matters
LLMs may under-prescribe critical actions in sensitive contexts, risking user safety in advice roles.