Uncertainty Drives Social Bias Changes in Quantized Large Language Models
Making AI models smaller can secretly flip their biases, creating hidden risks.
Deep Dive
A new study reveals that compressing large language models to make them run faster can fundamentally alter their social biases in unpredictable ways. While overall bias scores may appear unchanged, up to 21% of individual responses can flip between biased and unbiased states. This 'masked bias flipping' is driven by model uncertainty and worsens with more aggressive compression, affecting demographic groups unevenly while producing aggregate results that look misleadingly neutral.
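To make the 'masked' part concrete, here is a minimal sketch (in Python, using made-up per-prompt labels and a hypothetical scoring function, not the study's actual benchmark or code) of how a large per-response flip rate can hide behind an unchanged aggregate score:

```python
# Hypothetical illustration of masked bias flipping. Labels and scoring
# are invented for demonstration; the study's evaluation is not shown here.

def bias_score(labels):
    """Aggregate bias score: the fraction of responses judged biased."""
    return sum(labels) / len(labels)

# Per-prompt bias labels (1 = biased, 0 = unbiased) for the same prompts,
# before and after quantization.
full_precision = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
quantized      = [0, 0, 1, 1, 0, 0, 0, 1, 1, 0]

# The aggregate scores are identical...
print(bias_score(full_precision))  # 0.4
print(bias_score(quantized))       # 0.4

# ...yet many individual responses changed state under quantization.
flips = sum(a != b for a, b in zip(full_precision, quantized))
print(f"flip rate: {flips / len(full_precision):.0%}")  # 40%
```

In this toy example the aggregate bias score is unchanged at 0.4 even though 40% of individual responses flipped; the study reports per-response flip rates of up to 21% behind similarly stable aggregate scores.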
Why It Matters
This shows that making AI models more efficient can introduce hidden, unequal harms, and that compressed models need their own safety checks rather than inheriting their uncompressed counterparts' evaluations.