Research & Papers

DisaBench: New Framework Addresses Disability Harms in AI Models

DisaBench evaluates disability-related harms in language models using 175 prompts and expert annotation.

Deep Dive

Eugenia Kim and her team have developed DisaBench, a participatory evaluation framework for assessing disability-related harms in large language models. Arguing that existing safety benchmarks fall short on these harms, the team introduces a taxonomy of twelve disability harm categories co-created with people with disabilities and red teaming experts. The framework includes a dataset of 175 prompts spanning seven life domains, so that evaluations take the context of users’ experiences into account.

Key findings from the evaluation reveal that harm rates differ significantly by disability type, and that terminology-driven harm is culturally and temporally specific. Standard safety evaluations typically capture overt failures but often miss subtle harms that require domain expertise to identify. The dataset, taxonomy, and methodology are released via Hugging Face so that DisaBench can be integrated into existing safety pipelines, supporting a more inclusive approach to AI development and deployment.
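To make the "harm rates by disability type" finding concrete, here is a minimal Python sketch of how such a rate could be computed from annotated responses. It is not DisaBench's actual code or schema: the field names `disability_type` and `harmful`, the function `harm_rate_by_group`, and the toy records are all assumptions made up for illustration.

```python
from collections import defaultdict


def harm_rate_by_group(records, group_key="disability_type"):
    """Return {group: fraction of that group's responses labeled harmful}.

    `records` is a list of dicts. The keys used here are hypothetical,
    not the real DisaBench column names.
    """
    totals = defaultdict(int)
    harmful = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec["harmful"]:
            harmful[group] += 1
    return {group: harmful[group] / totals[group] for group in totals}


# Invented toy annotations, for illustration only (not real results).
toy_records = [
    {"disability_type": "group_a", "harmful": True},
    {"disability_type": "group_a", "harmful": False},
    {"disability_type": "group_b", "harmful": False},
]

rates = harm_rate_by_group(toy_records)
print(rates)
```

Passing a different `group_key` (for example, a hypothetical `life_domain` or `harm_category` field) would give the same breakdown along another axis of the taxonomy.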

Key Points
  • Introduces a taxonomy of 12 disability harm categories co-created with people with disabilities and red teaming experts.
  • Includes a dataset of 175 prompts with human-annotated labels on 525 responses.
  • Reveals that standard evaluations miss subtle harms requiring domain expertise.

Why It Matters

DisaBench gives AI safety teams a way to test for disability-related harms that standard benchmarks overlook, supporting more inclusive AI development.