iMiGUE-Speech: A Spontaneous Speech Dataset for Affective Analysis
New dataset uses authentic reactions from tennis match interviews, not actors, to train emotion AI.
A team of researchers led by Sofoklis Kakouros has introduced iMiGUE-Speech, a novel dataset designed to advance the field of affective computing by capturing spontaneous, real-world emotional speech. The dataset, an extension of the existing iMiGUE corpus, focuses on audio recordings from post-match tennis interviews, providing a rare resource of genuine emotional reactions tied to actual competition outcomes (wins and losses) rather than scripted or lab-induced performances. This addresses a critical gap in AI training data: most emotion recognition systems are trained on acted or exaggerated expressions, which limits their real-world applicability. The release establishes initial benchmarks for two core tasks, speech emotion recognition (SER) and transcript-based sentiment analysis, evaluated with state-of-the-art pre-trained models.
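The paper's exact model choices and evaluation protocol are not reproduced here, but both benchmark tasks follow a familiar pattern that off-the-shelf tools can approximate. The sketch below uses Hugging Face `transformers` pipelines: a wav2vec 2.0 emotion classifier (`superb/wav2vec2-base-superb-er`, a publicly available stand-in, not necessarily the checkpoint the authors benchmarked) for SER on an audio clip, and a generic sentiment classifier for its transcript. The file path is hypothetical.

```python
from transformers import pipeline

# Speech emotion recognition (SER) on raw interview audio.
# The model id is a publicly available stand-in, not necessarily
# the checkpoint benchmarked in the paper.
ser = pipeline("audio-classification", model="superb/wav2vec2-base-superb-er")
print(ser("post_match_interview.wav"))  # hypothetical clip path

# Transcript-based sentiment analysis on the same utterance's text.
sentiment = pipeline("sentiment-analysis")
print(sentiment("It was a tough match, but I'm really happy with how I played."))
```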
The technical foundation of iMiGUE-Speech includes detailed metadata such as speaker-role separation (interviewer vs. interviewee), speech transcripts, and precise word-level forced alignments that timestamp each spoken word. This structure lets researchers analyze the acoustic properties of speech (such as tone and pitch) in sync with the linguistic content. A key differentiator is that the recordings can be synchronized with the original iMiGUE dataset's micro-gesture annotations, creating a unique multimodal resource for studying the interplay between speech, emotion, and subtle body language. By providing this high-quality, naturally elicited data, the dataset paves the way for more robust and nuanced AI models capable of understanding human affect in authentic, unstructured interactions such as customer service calls, mental health monitoring, and human-computer interfaces.
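As an illustration of what the word-level alignments make possible, the following sketch slices a recording by word timestamps and computes a mean pitch (F0) per word with `librosa`. The alignment tuples (word, start, end in seconds) and the audio path are assumptions for illustration; the dataset's actual schema may differ.

```python
import librosa
import numpy as np

# Hypothetical word-level forced-alignment records: (word, start_sec, end_sec).
# The dataset's actual alignment format may differ.
alignment = [
    ("it", 0.32, 0.45),
    ("was", 0.45, 0.61),
    ("a", 0.61, 0.68),
    ("tough", 0.68, 1.02),
    ("match", 1.02, 1.40),
]

audio, sr = librosa.load("post_match_interview.wav", sr=16000)  # hypothetical path

for word, start, end in alignment:
    segment = audio[int(start * sr):int(end * sr)]
    # Probabilistic YIN gives a frame-wise F0 track; NaN where unvoiced.
    f0, voiced_flag, _ = librosa.pyin(
        segment, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    mean_f0 = np.nanmean(f0) if np.any(voiced_flag) else float("nan")
    print(f"{word:>6s}: {start:.2f}-{end:.2f}s  mean F0 {mean_f0:.1f} Hz")
```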
- Dataset captures spontaneous emotions from real tennis match interviews, not acted performances.
- Includes speech transcripts, speaker-role tags, and word-level forced alignments for precise analysis.
- Can be paired with micro-gesture data for unique multimodal emotion and gesture research (a pairing sketch follows this list).
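Because the speech alignments and gesture annotations share the interview's timeline, pairing them is essentially interval intersection. The sketch below matches each aligned word to any micro-gesture whose time span overlaps it; the word and gesture records are hypothetical stand-ins for the datasets' actual formats.

```python
# Hypothetical records; both modalities are assumed to share one time axis
# anchored to the interview recording.
words = [("honestly", 2.10, 2.65), ("I", 2.65, 2.78), ("struggled", 2.78, 3.40)]
gestures = [("touching_face", 2.50, 3.10), ("folding_arms", 5.00, 7.20)]

def overlaps(a_start, a_end, b_start, b_end):
    """True if the two time intervals intersect."""
    return a_start < b_end and b_start < a_end

# For each spoken word, collect the micro-gestures it co-occurs with.
for word, w_start, w_end in words:
    hits = [g for g, g_start, g_end in gestures if overlaps(w_start, w_end, g_start, g_end)]
    print(f"{word!r} ({w_start:.2f}-{w_end:.2f}s) co-occurs with: {hits or 'none'}")
```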
Why It Matters
Enables training of AI emotion recognition on real human reactions, leading to more authentic and effective affective computing applications.