Audio & Speech

HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models

New benchmark uses real human conversations to test AI's emotional tracking and reasoning.

Deep Dive

A research team led by Shuiyuan Wang has introduced HumDial-EIBench, a novel benchmark designed to rigorously test the emotional intelligence (EI) of Audio Language Models (ALMs). Unlike previous benchmarks that relied on synthetic speech and single-turn interactions, this tool uses 1,259 real, human-recorded multi-turn dialogues from the ICASSP 2026 HumDial Challenge. It reformulates complex EI tasks—like tracking emotions across a conversation and reasoning about their causes—into objective multiple-choice questions with carefully crafted adversarial distractors. This approach mitigates the subjective scoring bias that has plagued previous evaluations of AI's cognitive and emotional capabilities.
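The multiple-choice reformulation can be pictured with a toy item. This is a purely hypothetical sketch: the field names, dialogue content, and distractor wording are illustrative assumptions, not the benchmark's actual data format.

```python
# Illustrative sketch of a multi-turn emotional-tracking item recast as a
# multiple-choice question with adversarial distractors. All fields and
# content here are hypothetical, not drawn from HumDial-EIBench itself.

example_item = {
    "dialogue_id": "humdial_0042",  # made-up identifier
    "turns": [
        {"speaker": "A", "audio": "turn1.wav",
         "transcript": "I finally heard back about the job..."},
        {"speaker": "B", "audio": "turn2.wav",
         "transcript": "Oh? How did it go?"},
        {"speaker": "A", "audio": "turn3.wav",
         "transcript": "They went with someone else."},
    ],
    "question": "How does speaker A's emotion change across the dialogue?",
    "options": {
        "A": "Anxious anticipation shifting to disappointment",  # correct
        "B": "Anxious anticipation shifting to relief",   # adversarial: right arc, wrong ending
        "C": "Consistent neutrality throughout",          # adversarial: ignores vocal cues
        "D": "Anger shifting to excitement",              # clearly wrong
    },
    "answer": "A",
}

def score(item, model_choice):
    """Objective scoring: one string comparison, no human judge needed."""
    return int(model_choice == item["answer"])

print(score(example_item, "B"))  # a model fooled by the adversarial distractor scores 0
```

Because each item has a single keyed answer, evaluation reduces to exact-match accuracy, which is what removes the subjective-scoring problem the article describes.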

HumDial-EIBench also introduces a critical new test: an acoustic-semantic conflict task in which the tone of voice contradicts the spoken words, probing a model's robustness to contradictory multimodal signals. An evaluation of eight state-of-the-art ALMs exposed consistent weaknesses: most models failed at multi-turn emotional tracking and implicit causal reasoning. All exhibited "decoupled" empathy, with textual and acoustic responses to emotion misaligned, and all showed a severe "text-dominance bias," overwhelmingly prioritizing written content over vocal emotion when the two conflicted. The benchmark gives developers a much-needed, standardized way to identify and address these weaknesses in conversational AI.
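On the conflict task, "text-dominance bias" can be quantified by checking which modality each chosen option aligns with. The sketch below assumes a hypothetical result format in which every option of a conflict item is tagged with the modality it follows; none of this reflects the authors' actual analysis code.

```python
# Hypothetical sketch of measuring text-dominance bias on acoustic-semantic
# conflict items: the correct option follows the vocal emotion, while an
# adversarial distractor follows the contradictory written content.
# Field names and data are illustrative assumptions.

from collections import Counter

conflict_results = [
    {"pred": "A", "gold": "C", "modality": {"A": "text", "C": "acoustic"}},
    {"pred": "B", "gold": "D", "modality": {"B": "text", "D": "acoustic"}},
    {"pred": "D", "gold": "D", "modality": {"B": "text", "D": "acoustic"}},
]

def modality_preference(results):
    """Count which modality each model-chosen option aligned with."""
    return Counter(r["modality"].get(r["pred"], "other") for r in results)

prefs = modality_preference(conflict_results)
text_rate = prefs["text"] / len(conflict_results)
print(prefs, f"text-dominance rate: {text_rate:.2f}")
```

A model free of the bias would split its choices roughly evenly (or favor the acoustic-aligned, correct option); a high text rate reproduces the failure mode the evaluation reports.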

Key Points
  • Uses 1,259 real human dialogues instead of synthetic speech for authentic testing.
  • Replaces open-ended scoring with objective multiple-choice questions featuring adversarial distractors.
  • Reveals that all eight tested ALMs have a severe text-dominance bias in cross-modal conflicts.

Why It Matters

Provides a standardized tool to build more emotionally aware and robust voice assistants and conversational AI.