Every(bot) Makes Mistakes: Coding Big Five Personalities, Context, and Tone into an LLM Chatbot Recovery Code Framework
Structured personality and tone coding lifts chatbot error recovery from 48.9% to 76.7%
A new research paper titled "Every(bot) Makes Mistakes: Coding Big Five Personalities, Context, and Tone into an LLM Chatbot Recovery Code Framework" presents a structured approach to handling chatbot errors. The authors—Rachel Hill, Tom Owen, and Julian Hough—designed a recovery code that maps four common LLM task contexts to specific Big Five personality traits (Conscientiousness, Agreeableness, Openness, and Extraversion), corresponding tones, and three-stage recovery instructions. They also built an evaluation rubric with three dimensions (Recovery quality, Tone alignment, Appropriateness) and nine sub-dimensions.
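The mapping the paper describes can be pictured as a small lookup structure. The sketch below is illustrative only: the context names, trait assignments, tones, stage wording, and sub-dimension labels are hypothetical stand-ins, not the authors' published tables; only the overall shape (four contexts to traits/tones/three-stage instructions, and a 3x3 rubric) follows the paper.

```python
# Hypothetical sketch of a context -> (trait, tone, stages) recovery code.
# All keys and values here are invented examples, not the paper's tables.
RECOVERY_CODE = {
    "task_scheduling": {
        "trait": "Conscientiousness",
        "tone": "precise and accountable",
        "stages": ["acknowledge the error",
                   "explain what went wrong",
                   "propose a corrected next step"],
    },
    "customer_support": {
        "trait": "Agreeableness",
        "tone": "warm and apologetic",
        "stages": ["acknowledge the error",
                   "empathize and explain",
                   "offer a remedy"],
    },
    # ...two more contexts (Openness, Extraversion) in the full code
}

# Three rubric dimensions with three sub-dimensions each (nine total);
# sub-dimension names are illustrative guesses.
RUBRIC = {
    "recovery_quality": ["acknowledgement", "explanation", "repair"],
    "tone_alignment": ["trait_fit", "tone_consistency", "register"],
    "appropriateness": ["personality", "context", "user_expectation"],
}

def recovery_prompt(context: str) -> str:
    """Assemble a system-prompt fragment for one task context."""
    spec = RECOVERY_CODE[context]
    steps = "; ".join(spec["stages"])
    return (f"On error, respond with a {spec['trait']}-aligned, "
            f"{spec['tone']} tone, in three stages: {steps}.")
```

A trained agent would receive something like `recovery_prompt("customer_support")` as part of its instructions before encountering the error scenarios.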
In an exploratory between-subjects experiment, four separate Claude Sonnet 4.6 agents received no recovery code training (baseline), while four others were trained on the code. Identical error-scenario prompts were used across conditions, and eight LLM evaluator agents scored responses. Results showed an average improvement of 27.8 percentage points: coded responses scored 76.7% overall versus 48.9% for baseline. The coded condition achieved 83.3% in appropriateness, with personality appropriateness jumping from 50% to 75% and explanation provision from 20% to 60%. These findings suggest that structured personality-, context-, and tone-informed recovery codes can be learned by LLMs, substantially improving error recovery quality.
- Coded recovery responses improved by 27.8 percentage points over baseline (76.7% vs. 48.9%) using Claude Sonnet 4.6
- Appropriateness dimension scored highest at 83.3%, with personality appropriateness rising from 50% to 75%
- Explanation provision saw the largest absolute gain: 20% baseline to 60% with recovery code training
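The reported gains are differences in percentage points, not relative percentages, which the arithmetic below makes explicit using only the figures quoted above:

```python
# Scores from the paper (percent of maximum rubric score).
baseline = {"overall": 48.9,
            "personality_appropriateness": 50.0,
            "explanation_provision": 20.0}
coded = {"overall": 76.7,
         "personality_appropriateness": 75.0,
         "explanation_provision": 60.0}

# Gain in percentage points (pp) per metric:
# 76.7 - 48.9 = 27.8 pp overall; explanation provision gains 40 pp,
# the largest absolute jump of the three.
gains_pp = {k: round(coded[k] - baseline[k], 1) for k in baseline}
```

In relative terms the overall improvement is 27.8 / 48.9, roughly 57%, so quoting "27.8%" only makes sense as percentage points.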
Why It Matters
A promising template for making chatbot errors less harmful, supporting user trust and engagement through personality-aware recovery.