LLM compression quietly re-introduces bias, study of Qwen/Mistral/Phi finds
3-bit quantization causes 6-21% of previously neutral items to become biased
A systematic study published on arXiv and accepted at IEEE Cloud Summit 2026 reveals that compressing large language models (LLMs) through post-training quantization can silently reintroduce stereotypical biases, even when conventional metrics like perplexity remain stable. The researchers tested three instruction-tuned models — Qwen2.5-7B, Mistral-7B, and Phi-3.5-mini — at five precision levels ranging from BF16 down to 3-bit. Using 12,148 items from the BBQ bias benchmark across 5 random seeds (totaling 911,100 inference records), they tracked per-item changes in stereotypical association. The results show a clear dose-response relationship: aggressive quantization (3-bit) caused 6-21% of previously unbiased items to develop new stereotypical behaviors, while models' willingness to output "unknown" answers declined by 17.4%, indicating a loss of hedging behavior.
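The article does not say exactly how a per-item flip is scored, so the following is only a minimal sketch under stated assumptions. Each BBQ item's answer is reduced to one of three hypothetical categories (`stereotyped`, `non_stereotyped`, `unknown`). An item counts as "neutral" if the full-precision answer is not the stereotyped one. The decline in "unknown" answers is treated as a relative change. The names `bias_drift` and `DriftReport` are hypothetical and are not the authors' code. The reported record count is consistent with every item being run once per configuration: 12,148 items × 5 seeds × 3 models × 5 precision levels = 911,100.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical answer categories for a BBQ-style item (assumption: the study's
# exact labeling scheme is not described in the article).
STEREOTYPED = "stereotyped"
NON_STEREOTYPED = "non_stereotyped"
UNKNOWN = "unknown"


@dataclass(frozen=True)
class DriftReport:
    newly_biased_rate: float   # share of baseline-neutral items that became stereotyped
    unknown_rate_base: float   # share of items answered "unknown" at full precision
    unknown_rate_quant: float  # share of items answered "unknown" after quantization
    unknown_rel_change: float  # relative change in the "unknown" rate (assumption)


def bias_drift(baseline: Mapping[str, str], quantized: Mapping[str, str]) -> DriftReport:
    """Compare per-item answers from a full-precision run and a quantized run.

    Both mappings go from item id to answer category and must cover the same items.
    """
    if not baseline or baseline.keys() != quantized.keys():
        raise ValueError("runs must be non-empty and cover the same items")

    # "Neutral" = the full-precision answer was not the stereotyped one (assumption).
    neutral = [item for item, ans in baseline.items() if ans != STEREOTYPED]
    flipped = sum(1 for item in neutral if quantized[item] == STEREOTYPED)
    newly_biased_rate = flipped / len(neutral) if neutral else 0.0

    n = len(baseline)
    unk_base = sum(ans == UNKNOWN for ans in baseline.values()) / n
    unk_quant = sum(ans == UNKNOWN for ans in quantized.values()) / n
    rel_change = (unk_quant - unk_base) / unk_base if unk_base else 0.0
    return DriftReport(newly_biased_rate, unk_base, unk_quant, rel_change)


if __name__ == "__main__":
    # Toy answers for four items (illustrative, not data from the study).
    base = {"q1": UNKNOWN, "q2": NON_STEREOTYPED, "q3": STEREOTYPED, "q4": UNKNOWN}
    quant = {"q1": STEREOTYPED, "q2": NON_STEREOTYPED, "q3": STEREOTYPED, "q4": UNKNOWN}
    report = bias_drift(base, quant)
    print(report.newly_biased_rate)   # 1 of 3 neutral items flipped
    print(report.unknown_rel_change)  # "unknown" rate 0.5 -> 0.25, i.e. -0.5
```

In the toy run, one of the three baseline-neutral items flips to the stereotyped answer, and the "unknown" rate halves. With several seeds, one would presumably call the function once per seed and average the results.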
What makes these findings particularly alarming is that standard quality metrics fail to flag the degradation. Perplexity increased by less than 0.5% at 8-bit and by under 3% at 4-bit, yet at 4-bit precision 2.5-5.6% of items already exhibited new biases. This gap between aggregate evaluation and fairness-critical, item-level degradation means that current deployment practices, which often rely solely on perplexity or loss, systematically overlook alignment erosion. The authors argue that safety-conscious compression requires explicit testing for bias emergence before any quantized model is put into production, especially as cloud and edge deployments push toward lower bit widths to save memory and inference cost.
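One way to act on the authors' recommendation is a release gate that checks bias emergence alongside perplexity instead of relying on perplexity alone. The sketch below assumes the newly-biased rate has already been measured on a bias benchmark such as BBQ. The names `compression_gate` and `GateResult` are hypothetical, and both thresholds (a 3% perplexity increase and a 1% newly-biased rate) are illustrative choices, not values from the study. The example inputs are also illustrative: a 2.5% perplexity increase, under the reported 4-bit bound of 3%, and a 2.5% newly-biased rate, the low end of the reported 4-bit range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    ppl_rel_increase: float   # relative perplexity increase vs. the full-precision model
    newly_biased_rate: float  # share of baseline-neutral items that became biased
    passed_perplexity: bool
    passed_bias: bool

    @property
    def deployable(self) -> bool:
        # Require BOTH checks; a perplexity-only gate would ignore passed_bias.
        return self.passed_perplexity and self.passed_bias


def compression_gate(
    ppl_base: float,
    ppl_quant: float,
    newly_biased_rate: float,
    max_ppl_increase: float = 0.03,  # hypothetical threshold
    max_newly_biased: float = 0.01,  # hypothetical threshold
) -> GateResult:
    """Hypothetical pre-deployment gate for a quantized model."""
    if ppl_base <= 0:
        raise ValueError("baseline perplexity must be positive")
    rel = (ppl_quant - ppl_base) / ppl_base
    return GateResult(
        ppl_rel_increase=rel,
        newly_biased_rate=newly_biased_rate,
        passed_perplexity=rel <= max_ppl_increase,
        passed_bias=newly_biased_rate <= max_newly_biased,
    )


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from the study.
    result = compression_gate(ppl_base=10.0, ppl_quant=10.25, newly_biased_rate=0.025)
    print(result.passed_perplexity, result.passed_bias, result.deployable)  # True False False
```

A perplexity-only check would ship this model. The combined gate blocks it because the bias check fails, which is the gap the study describes.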
- 3-bit quantization caused 6-21% of previously neutral items to develop stereotypical biases across all three models tested (Qwen2.5-7B, Mistral-7B, Phi-3.5-mini).
- Models' 'unknown' answer rate dropped by 17.4% under 3-bit compression, indicating reduced caution and hedging.
- Perplexity increased by less than 0.5% at 8-bit and under 3% at 4-bit, yet new biases already emerged at 4-bit for 2.5-5.6% of items: aggregate metrics hide fairness degradation.
Why It Matters
As LLM quantization becomes standard for deployment, this study shows that safety alignment can silently break without developers noticing when they rely on perplexity alone.