New LLM bias study reveals Gemini 1.5 hits 72.7% moral sensitivity score
Seven-tier stress test uncovers a U-curve of bias across models and scales.
A new study from researchers including Yash Aggarwal and Aman Chadha introduces the Moral Sensitivity Index (MSI), a metric that quantifies bias in LLMs across a graduated seven-tier stress test ranging from abstract numerical problems to scenarios rooted in historical and socioeconomic injustice. Evaluating Claude 3.5, Qwen 3.5, Llama 3, and Gemini 1.5, the team found distinct behavioral signatures: Gemini 1.5 reached 72.7% MSI by Tier 5 under socioeconomic framing, while Claude 3.5 exhibited sharp bias suppression consistent with identity-based safety training.
For mechanistic validation, the researchers selected criminal-bias scenarios and applied logit lens, attention analysis, activation patching, and semantic probing to six models across three capability tiers. Circuit-level analysis revealed a U-curve of bias: small language models (SLMs) exhibit strong criminal bias; scaling to instruction-tuned models eliminates it; but reasoning distillation reintroduces bias to SLM-like levels despite identical parameter counts. This suggests distillation compresses reasoning traces in ways that reactivate shallow statistical associations, and it provides cross-stage validation that socially loaded cues activate the same bias circuits identified mechanistically.
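The logit lens mentioned above reads off a model's "beliefs" at intermediate depth by projecting each layer's residual-stream state through the final unembedding matrix. The sketch below is a toy illustration with random weights, not the study's code; all names, dimensions, and token ids are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, LAYERS = 50, 16, 4  # toy sizes, purely illustrative

W_embed = rng.normal(size=(VOCAB, DIM))                             # token embeddings
W_blocks = [rng.normal(scale=0.1, size=(DIM, DIM)) for _ in range(LAYERS)]
W_unembed = rng.normal(size=(DIM, VOCAB))                           # final LM head

def logit_lens(token_ids):
    """Return the top predicted token id after each layer, obtained by
    projecting the intermediate residual stream through the unembedding."""
    h = W_embed[token_ids]                  # (seq, DIM) residual stream
    tops = []
    for W in W_blocks:
        h = h + np.tanh(h @ W)              # residual block update
        logits = h[-1] @ W_unembed          # logit lens at this depth
        tops.append(int(logits.argmax()))
    return tops

preds = logit_lens(np.array([3, 7, 12]))
print(preds)  # one top-token id per layer
```

In a real transformer the same idea lets researchers see at which depth a biased completion first becomes the model's top candidate.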
- Moral Sensitivity Index (MSI) measures graduated bias across seven tiers from abstract math to historical injustice
- Gemini 1.5 reached 72.7% MSI by Tier 5 under socioeconomic framing; Claude suppressed bias via identity-based safety
- Mechanistic analysis shows a U-curve: SLMs have strong criminal bias, instruction tuning removes it, but reasoning distillation reintroduces it to SLM levels
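Activation patching, one of the causal tools behind the U-curve claim, splices activations from a "clean" run into a "corrupted" run and measures how much of the clean output is recovered; layers with high recovery are causally implicated in the behavior. A minimal toy sketch with random weights and invented names, not the study's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, LAYERS = 16, 4  # toy sizes, purely illustrative

Ws = [rng.normal(scale=0.3, size=(DIM, DIM)) for _ in range(LAYERS)]
w_out = rng.normal(size=DIM)        # scalar readout, stand-in for a bias logit
x_clean = rng.normal(size=DIM)      # stand-in for a neutral prompt state
x_corrupt = rng.normal(size=DIM)    # stand-in for a socially loaded prompt state

def forward(h0, patch_layer=None, patch_delta=None):
    """Run the toy residual network, optionally splicing in a cached
    block output (delta) from another run at one layer."""
    h = h0.copy()
    deltas = []
    for i, W in enumerate(Ws):
        d = np.tanh(h @ W)          # this block's contribution
        if i == patch_layer:
            d = patch_delta         # patch: reuse the clean run's contribution
        deltas.append(d)
        h = h + d                   # residual update
    return float(h @ w_out), deltas

clean_out, clean_deltas = forward(x_clean)
corrupt_out, _ = forward(x_corrupt)

# Patch each layer's clean contribution into the corrupt run; the fraction
# of the clean-vs-corrupt gap recovered localizes causally important layers.
for i in range(LAYERS):
    patched_out, _ = forward(x_corrupt, patch_layer=i, patch_delta=clean_deltas[i])
    recovery = (patched_out - corrupt_out) / (clean_out - corrupt_out)
    print(f"layer {i}: recovered {recovery:+.2f} of the gap")
```

In the study's setting, the "clean" and "corrupt" runs would differ only in a socially loaded cue, so high-recovery components are the bias-driving circuits.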
Why It Matters
The study shows how bias emerges and re-emerges during model distillation, a critical consideration for safe LLM deployment in high-stakes domains such as criminal justice.