From Consensus to Split Decisions: ABC-Stratified Sentiment in Holocaust Oral Histories

Research shows off-the-shelf AI sentiment classifiers disagree 60% of the time on sensitive historical testimony.

Deep Dive

A new computational linguistics study reveals significant challenges in applying AI sentiment analysis to sensitive historical narratives. Researcher Daban Q. Jaff tested three off-the-shelf transformer-based sentiment classifiers on a corpus of 107,305 utterances from Holocaust oral histories and found that the models frequently disagree, particularly over what counts as neutral sentiment in complex, emotionally layered testimony.

To address this uncertainty, the paper introduces an ABC (agreement-based stability) taxonomy that categorizes model outputs based on their level of consensus. This framework helps researchers identify where sentiment models are most reliable versus where they diverge systematically. The study also employs a T5-based emotion classifier as an auxiliary tool, comparing emotion distributions across different agreement strata to better understand the nature of model disagreements.
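To make the idea of agreement strata concrete, here is a minimal Python sketch. It assumes, purely for illustration, that stratum A means all three models agree, B means two of three agree, and C means all three differ; the paper's actual definitions may be different. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def agreement_stratum(labels):
    """Map the three sentiment labels for one utterance to a stratum.

    Assumed mapping (not confirmed by the paper):
    A = unanimous, B = two of three agree, C = all three differ.
    """
    distinct = len(set(labels))
    if distinct == 1:
        return "A"
    if distinct == 2:
        return "B"
    return "C"


def emotion_by_stratum(sentiment_labels, emotions):
    """Return the relative frequency of each emotion label per stratum."""
    counts = defaultdict(Counter)
    for labels, emotion in zip(sentiment_labels, emotions):
        counts[agreement_stratum(labels)][emotion] += 1
    return {
        stratum: {e: n / sum(c.values()) for e, n in c.items()}
        for stratum, c in counts.items()
    }


# Invented toy data: one tuple of three model outputs per utterance,
# plus one auxiliary emotion label per utterance.
sentiment_labels = [
    ("negative", "negative", "negative"),
    ("neutral", "negative", "negative"),
    ("neutral", "negative", "positive"),
    ("neutral", "neutral", "negative"),
]
emotions = ["sadness", "sadness", "fear", "neutral"]

strata = [agreement_stratum(labels) for labels in sentiment_labels]
distributions = emotion_by_stratum(sentiment_labels, emotions)
```

Comparing `distributions["A"]` with `distributions["C"]` is the kind of check the auxiliary emotion classifier enables: it shows whether utterances the models split on carry a different emotional profile from those they agree on.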

The findings highlight the limitations of current AI sentiment tools when applied to domain-shifted, long-form narratives with complex discourse structures. By combining multi-model label triangulation with the ABC taxonomy, the research provides a more cautious, operational framework for using AI in historical analysis, one that acknowledges uncertainty rather than projecting false confidence in model outputs.
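One way to picture label triangulation that acknowledges uncertainty is a consensus rule that abstains when the models split. The sketch below is an assumption-laden illustration, not the paper's procedure: `triangulate`, `disagreement_rate`, the `min_votes` threshold, and the toy rows are all hypothetical, and the disagreement rate is defined here simply as the share of utterances on which the models are not unanimous.

```python
from collections import Counter


def triangulate(labels, min_votes=3):
    """Return the consensus label, or None when too few models agree.

    min_votes=3 demands unanimity; min_votes=2 accepts a majority.
    Both thresholds are illustrative choices.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_votes else None


def disagreement_rate(rows):
    """Share of utterances on which the models are not unanimous."""
    split = sum(1 for labels in rows if len(set(labels)) > 1)
    return split / len(rows)


# Invented toy data: three model outputs per utterance.
rows = [
    ("negative", "negative", "negative"),
    ("neutral", "negative", "negative"),
    ("positive", "positive", "positive"),
    ("neutral", "negative", "positive"),
]

strict = [triangulate(labels) for labels in rows]
majority = [triangulate(labels, min_votes=2) for labels in rows]
rate = disagreement_rate(rows)
```

Returning `None` instead of forcing a label keeps contested utterances visible for human review rather than folding them into an aggregate sentiment score.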

Key Points
  • Tested 3 transformer-based sentiment classifiers on 107,305 Holocaust testimony utterances
  • Found low-to-moderate inter-model agreement (60% disagreement rate) driven by neutral sentiment boundaries
  • Introduced ABC taxonomy framework for categorizing model stability in sensitive historical contexts

Why It Matters

The study highlights AI's current limitations in analyzing complex human narratives and offers a framework for more responsible historical analysis.