AI Safety

Auditing LLMs for Algorithmic Fairness in Casenote-Augmented Tabular Prediction

A fine-tuned, casenote-augmented LLM cut fairness disparities while boosting accuracy in housing placement prediction.

Deep Dive

A new technical report from researchers Xiao Qi Lee, Ezinne Nwankwo, and Angela Zhou (arXiv:2604.19204) audits LLMs for algorithmic fairness in high-stakes social service settings, specifically housing placement prediction. Using real tabular data and street outreach casenotes from a nonprofit partner, the team evaluated error-rate disparities across groups in multi-class classification. They found that a fine-tuned model augmented with casenote summaries improved accuracy while reducing fairness disparities relative to baseline tabular classifiers. Zero-shot classification with variable-importance tweaks, however, yielded mixed fairness results.
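The disparity audit itself is easy to reproduce on any held-out test set. Below is a minimal sketch, not the authors' code: column names such as `group`, `placement`, and `predicted` are hypothetical, and the toy data is purely illustrative.

```python
import pandas as pd

def error_rate_disparity(df: pd.DataFrame,
                         y_true: str = "placement",   # true multi-class label (hypothetical name)
                         y_pred: str = "predicted",   # model prediction (hypothetical name)
                         group: str = "group") -> float:
    """Largest gap in per-group misclassification rates."""
    # Misclassification rate within each protected group
    per_group_error = (
        df.assign(error=(df[y_true] != df[y_pred]).astype(float))
          .groupby(group)["error"]
          .mean()
    )
    # Disparity = worst-served group minus best-served group
    return per_group_error.max() - per_group_error.min()

# Toy example; a real audit would use the partner's held-out data.
toy = pd.DataFrame({
    "group":     ["A", "A", "B", "B", "B"],
    "placement": ["shelter", "rapid", "shelter", "permanent", "rapid"],
    "predicted": ["shelter", "shelter", "shelter", "permanent", "rapid"],
})
print(error_rate_disparity(toy))  # group A errs 1/2, group B 0/3 -> disparity 0.5
```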

Crucially, the outreach casenotes were short and heavily redacted, yet the LLM-based augmentation did not introduce textual biases beyond those already present in tabular classification. The authors conclude that combining fine-tuning with casenote summaries offers a low-implementation-burden way to safely leverage additional text, improving both accuracy and fairness. The work underscores the need to rigorously audit LLMs in domains marked by historical inequities, such as housing placement, so that models do not perpetuate that bias.
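The low implementation burden comes from how the augmentation can be wired in: each tabular record is serialized alongside its redacted casenote summary and used as the input side of an ordinary supervised fine-tuning example. The sketch below shows one way this could look; the field names, prompt template, and label are assumptions for illustration, not taken from the report.

```python
from typing import Dict

def build_finetune_example(record: Dict[str, str],
                           casenote_summary: str,
                           label: str) -> Dict[str, str]:
    """Turn one tabular row plus a redacted casenote summary into a
    prompt/completion pair for supervised fine-tuning (illustrative format)."""
    # Serialize tabular fields as simple "key: value" lines
    tabular_text = "\n".join(f"{k}: {v}" for k, v in record.items())
    prompt = (
        "Client record:\n"
        f"{tabular_text}\n\n"
        "Outreach casenote summary:\n"
        f"{casenote_summary}\n\n"
        "Predicted housing placement:"
    )
    return {"prompt": prompt, "completion": f" {label}"}

# Hypothetical fields and label; real ones come from the nonprofit partner's data.
example = build_finetune_example(
    record={"age_band": "45-54", "months_unhoused": "18", "household_size": "1"},
    casenote_summary="Client engaged during street outreach; interested in a shelter referral.",
    label="emergency shelter",
)
print(example["prompt"])
```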

Key Points
  • Fine-tuned LLM with casenote summaries improved accuracy and reduced fairness disparities in housing placement prediction.
  • Zero-shot classification showed mixed fairness results, with variable-importance tweaks not consistently reducing bias.
  • Short, redacted casenotes did not introduce extra textual biases beyond those in tabular data, enabling safe LLM augmentation.

Why It Matters

Demonstrates a practical path to fairer AI in social services by auditing and fine-tuning LLMs.