Differential Privacy Cuts Bias in LLMs, But Only in Some Tasks

New study finds DP-SGD reduces bias in sentence scoring, but not universally.

Deep Dive

A new systematic evaluation by Eduardo Tenorio, Karuna Bhaila, and Xintao Wu examines the relationship between differential privacy (DP) and social bias in large language models (LLMs). The authors fine-tuned a pretrained LLM with DP-SGD (differentially private stochastic gradient descent) and compared it against non-DP baselines across four evaluation paradigms: sentence scoring, text completion, tabular classification, and question answering (QA). Their goal was to determine whether adding privacy guarantees changes a model's social biases, for better or for worse.
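For readers unfamiliar with DP-SGD, the sketch below shows the general shape of one update: each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged. It is a minimal pure-Python illustration of the standard technique, not the authors' training code; the function names and the parameters `clip_norm` and `noise_multiplier` are assumptions chosen for this example.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale one example's gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update (illustrative sketch, hypothetical helper).

    1. Clip every per-example gradient to L2 norm <= clip_norm.
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std = noise_multiplier * clip_norm.
    4. Average over the batch and take a gradient step.
    """
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    params = [0.0, 0.0]
    # Two toy per-example gradients; the first exceeds the clipping norm.
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single training example can move the model, and the noise masks what remains; that bounded influence is why DP training is associated with reduced memorization.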

The results are mixed. DP reduced bias in sentence scoring tasks, where bias is measured by comparing the likelihoods a model assigns to controlled sentence pairs. However, this improvement did not carry over to the other paradigms (text completion, tabular classification, and QA), indicating that the effect of DP on bias is task-dependent. The study also uncovered a discrepancy between logit-level bias (measured from the scores the model assigns to tokens) and output-level bias (measured from the text the model actually generates). Crucially, lowering memorization (a key DP goal) did not consistently reduce unfairness, suggesting that privacy and fairness are not aligned by default. The authors advocate for multi-paradigm evaluations when assessing fairness in differentially private LLMs.
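To make the sentence scoring paradigm concrete, here is a minimal sketch of one common way to summarize likelihood comparisons over controlled pairs: the fraction of pairs in which the model assigns a higher log-likelihood to the stereotypical sentence, where 0.5 means no systematic preference. This metric is an assumption for illustration and may differ from the one used in the study; the function name is hypothetical, and the scores are invented toy values, not results from the paper.

```python
def stereotype_preference_rate(pair_scores):
    """Fraction of pairs where the stereotypical sentence scores higher.

    pair_scores: list of (stereotypical_loglik, anti_stereotypical_loglik).
    A value near 0.5 indicates no systematic preference on these pairs.
    """
    if not pair_scores:
        raise ValueError("need at least one sentence pair")
    preferred = sum(1 for stereo, anti in pair_scores if stereo > anti)
    return preferred / len(pair_scores)


if __name__ == "__main__":
    # Invented toy log-likelihoods, for illustration only.
    baseline_scores = [(-12.1, -13.4), (-9.8, -10.5), (-15.0, -14.2), (-11.3, -12.9)]
    dp_scores = [(-12.6, -12.9), (-10.4, -10.1), (-14.8, -14.3), (-12.0, -12.2)]
    print("baseline:", stereotype_preference_rate(baseline_scores))
    print("DP model:", stereotype_preference_rate(dp_scores))
```

Because a score like this depends only on assigned likelihoods, it says nothing directly about what the model writes when it generates text, which is one way a gap between logit-level and output-level bias can arise.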

Key Points
  • DP-SGD reduces bias in sentence scoring (controlled likelihood comparisons) but not in text completion, tabular classification, or QA.
  • A gap exists between logit-level bias and output-level bias, so likelihood-based measurements do not reliably predict bias in generated text.
  • Decreasing memorization via DP does not necessarily reduce unfairness; privacy and fairness are distinct objectives.

Why It Matters

For developers deploying private LLMs, the takeaway is that privacy techniques don't automatically fix bias; fairness still has to be checked separately for each task.