Research & Papers

Quantifying and Attributing Polarization to Annotator Groups

A new metric reveals deep divides in how people judge hate speech and toxicity.

Deep Dive

Researchers have developed a new metric for measuring polarization between groups of annotators labeling online content, one that remains reliable even when the groups differ sharply in size. Applying it to hate speech and toxicity datasets, they found strong, persistent disagreement linked to annotator race; religious and less-educated annotators also showed distinct labeling patterns. The team provides an open-source library and estimates the minimum number of annotators needed for reliable results.
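The summary does not spell out the metric's formula, but the core idea, comparing how far apart two annotator groups' judgments sit while controlling for group-size imbalance, can be sketched. The snippet below is a minimal illustration under assumptions: the function name `polarization`, the bootstrap-subsampling approach, and the parameters (`n_boot`, `seed`) are all hypothetical stand-ins, not the paper's actual method or the released library's API.

```python
# Hypothetical sketch of one way to quantify inter-group polarization on
# binary toxicity labels. Not the paper's metric: names and the bootstrap
# design are illustrative assumptions.
import numpy as np

def polarization(labels_a, labels_b, n_boot=1000, seed=0):
    """Estimate polarization between two annotator groups.

    labels_a, labels_b: arrays of shape (n_annotators, n_items) with
    binary labels (1 = toxic, 0 = not toxic) from each group.
    To reduce sensitivity to unequal group sizes, the larger group is
    repeatedly subsampled down to the size of the smaller one.
    """
    rng = np.random.default_rng(seed)
    n_small = min(len(labels_a), len(labels_b))
    gaps = []
    for _ in range(n_boot):
        sub_a = labels_a[rng.choice(len(labels_a), n_small, replace=False)]
        sub_b = labels_b[rng.choice(len(labels_b), n_small, replace=False)]
        # Per-item mean label = each group's estimated toxicity rate;
        # the absolute gap between rates measures how far apart the
        # two groups' judgments sit on each item.
        gaps.append(np.abs(sub_a.mean(axis=0) - sub_b.mean(axis=0)).mean())
    return float(np.mean(gaps))

# Toy usage: 8 annotators in group A, 20 in group B, 5 items each.
rng = np.random.default_rng(1)
group_a = rng.integers(0, 2, size=(8, 5))
group_b = rng.integers(0, 2, size=(20, 5))
print(f"estimated polarization: {polarization(group_a, group_b):.3f}")
```

Subsampling the larger group keeps a big majority group from drowning out a small minority group's signal, which is one plausible way to get the size-robustness the summary describes.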

Why It Matters

Content moderation models learn from human-labeled data, so the subjective biases of annotator groups directly shape what gets flagged or removed online. This work makes those divides measurable.