Dual-branch Graph Domain Adaptation for Cross-scenario Multi-modal Emotion Recognition
New dual-branch graph model adapts to unseen conversations, beating benchmarks on IEMOCAP and MELD datasets.
A research team led by Yuntao Shou has introduced a novel AI framework called Dual-branch Graph Domain Adaptation (DGDA) designed to solve a critical flaw in current emotion-sensing AI. Most Multimodal Emotion Recognition in Conversations (MERC) models fail when deployed from a controlled training environment (source domain) to real-world scenarios (target domains) with different speakers, topics, and noise. DGDA tackles this 'domain shift' problem head-on by constructing an emotion interaction graph to map the complex dependencies between utterances in a dialogue.
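The article does not detail how the emotion interaction graph is built, but a common recipe for conversation graphs is to connect each utterance to its temporal neighbors and to other turns by the same speaker. A minimal illustrative sketch (the function name, the context `window`, and the same-speaker rule are assumptions, not the paper's exact construction):

```python
def build_interaction_graph(speakers, window=2):
    """Connect utterances that are close in time or share a speaker.

    speakers: list of speaker IDs, one per utterance, in dialogue order.
    Returns a sorted list of undirected edges (i, j) with i < j.
    """
    n = len(speakers)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # temporal edge: utterances within `window` turns of each other
            # speaker edge: two turns by the same participant
            if j - i <= window or speakers[i] == speakers[j]:
                edges.add((i, j))
    return sorted(edges)
```

With `["A", "B", "A", "C"]` and `window=1`, this links adjacent turns plus the two "A" utterances, giving a graph a GNN encoder can message-pass over.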
The core innovation is a dual-branch encoder. One branch uses a Hypergraph Neural Network (HGNN) to explicitly model the many-to-many relationships between utterances, while the other uses a Path Neural Network (PathNN) to implicitly capture longer-range, global emotional dependencies across the conversation. To force the model to learn emotional cues that are invariant across scenarios, the team integrated a domain adversarial discriminator, a component trained to tell source from target data while the encoder learns features that fool it, making those features indistinguishable across domains. A regularization loss further reduces the model's sensitivity to potentially incorrect labels in the training data, mitigating label noise.
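Domain adversarial training is typically implemented with a gradient reversal layer: the discriminator's loss gradient is negated before it reaches the encoder, so the encoder is pushed to make domains indistinguishable. A dependency-free sketch of that sign flip (the helper names and the single-weight logistic discriminator are illustrative assumptions, not the paper's architecture):

```python
import math

def grl_forward(x):
    # Gradient reversal layer: identity in the forward pass.
    return x

def grl_backward(grad, lam=1.0):
    # Backward pass: negate (and scale by lam) the gradient
    # flowing from the domain discriminator into the encoder.
    return -lam * grad

def domain_logistic_loss(feature, w, label):
    """Toy 1-D domain discriminator: p(source) = sigmoid(w * feature).

    label: 1 for source-domain samples, 0 for target-domain samples.
    Returns the binary cross-entropy loss and its gradient w.r.t. feature.
    """
    p = 1.0 / (1.0 + math.exp(-w * feature))
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    dloss_dfeature = (p - label) * w
    return loss, dloss_dfeature
```

The discriminator minimizes this loss, but the encoder receives `grl_backward(dloss_dfeature)`, i.e. the opposite direction, which is what makes its features domain-invariant.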
Extensive testing on the standard IEMOCAP and MELD benchmarks demonstrated that DGDA consistently outperforms strong existing baselines. The authors also derive a tighter theoretical generalization bound, giving formal support for the framework's robustness. As the first MERC framework to jointly combat domain shift and label noise, DGDA is a significant step toward emotion AI that works reliably outside the lab, in the messy and varied landscape of real human interaction.
- Uses a dual-branch encoder with HGNN and PathNN to model complex emotional dependencies in conversations.
- Integrates a domain adversarial discriminator to learn scenario-invariant features, enabling transfer to unseen domains.
- Outperforms existing baselines on IEMOCAP and MELD datasets and includes regularization to suppress noisy label effects.
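The summary does not spell out the noise-suppressing regularizer, but a standard way to reduce sensitivity to wrong labels is label smoothing, which softens the one-hot target so a single mislabeled utterance cannot dominate training. A minimal sketch under that assumption (the function and `eps` parameter are hypothetical):

```python
import math

def smoothed_cross_entropy(probs, target, eps=0.1):
    """Cross-entropy against a label-smoothed target.

    probs:  predicted class probabilities (must sum to 1).
    target: index of the (possibly noisy) gold label.
    eps:    smoothing weight mixed toward the uniform distribution.
    """
    k = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        # Mix the one-hot target with a uniform distribution over k classes.
        t = (1 - eps) * (1.0 if i == target else 0.0) + eps / k
        loss -= t * math.log(p)
    return loss
```

When the model is confidently wrong on a noisy label, the smoothed loss (and hence the corrective gradient) is smaller than the hard-label loss, which is the sensitivity reduction the bullet refers to.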
Why It Matters
Enables more reliable AI for customer service, mental health apps, and social robots by making emotion recognition robust across real-world settings.