AI Safety

Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects

AI models like GPT-5 and Claude 4 show human-like biases in simulated sentencing decisions, raising fairness concerns.

Deep Dive

A Stanford University study led by Sierra Liu systematically evaluated whether leading large language models exhibit human cognitive biases when applied to judicial sentencing scenarios. The research tested five representative models—ChatGPT 5 Instant, ChatGPT 5 Thinking, DeepSeek V3.1, Claude Sonnet 4, and Gemini 2.5 Flash—using carefully constructed vignettes designed to avoid training data contamination. Researchers isolated two key biases: the virtuous victim effect (where sympathy for victims influences sentencing) and prestige-based halo effects (where occupation, company, or credentials sway judgments).

The findings reveal a complex picture of AI fairness. All tested models displayed a significant virtuous victim effect, mirroring human tendencies to recommend harsher sentences when victims are portrayed as particularly virtuous. However, unlike humans, the models showed no statistically significant penalty reduction for cases involving 'adjacent consent' scenarios. For halo effects, the AI systems generally showed reduced bias compared to human benchmarks—except for credential-based prestige, where bias reduction was substantial. The study concludes that while LLMs show promise for reducing certain biases, their inconsistent performance across different models and conditions makes them currently unsuitable for direct judicial decision support.

Key Points
  • Five major LLMs (GPT-5, Claude 4, Gemini 2.5, DeepSeek V3.1) tested show human-like 'virtuous victim' bias in sentencing scenarios
  • Credential-based halo effects reduced by 40-60% in AI models compared to human benchmarks, but occupation/company biases persist
  • No significant penalty reduction for 'adjacent consent' cases where AI diverges from typical human judicial patterns

Why It Matters

As courts explore AI assistance, this research highlights both the promise and peril of using LLMs for high-stakes fairness-critical decisions.