Variational Encoder--Multi-Decoder (VE-MD) for Privacy-by-functional-design (Group) Emotion Recognition
New AI framework achieves 90.06% accuracy on group emotion recognition while preserving privacy by design.
A research team from Université Grenoble Alpes (UGA) and the LIG laboratory has introduced VE-MD, a novel AI framework for Group Emotion Recognition (GER) that prioritizes privacy through its functional design. Unlike conventional methods that rely on cropped faces, person tracking, or per-person feature extraction—raising significant privacy concerns—VE-MD is constrained to predict only aggregate, group-level emotional states. It employs a Variational Encoder-Multi-Decoder architecture that learns a shared latent representation jointly optimized for emotion classification and internal prediction of body and facial structural representations. This approach intentionally avoids identity recognition or individual emotion outputs, making it suitable for deployment in sensitive environments like classrooms, crowds, and public events where only collective understanding is needed.
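The architecture described above can be sketched in miniature: a variational encoder maps scene features to a latent code, from which one head predicts the aggregate group emotion (the only user-facing output) and a second head reconstructs structural information purely as internal supervision. The sketch below is a minimal pure-Python illustration under assumed dimensions; the layer sizes, single-linear-layer heads, and feature pooling are hypothetical stand-ins, not the paper's implementation.

```python
import math
import random

random.seed(0)

# Hypothetical sizes for illustration; the paper's actual dimensions are not given here.
D_IN, D_Z, N_CLASSES, D_STRUCT = 8, 4, 3, 6

def rand_mat(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

# Variational encoder: pooled scene features -> mean / log-variance of z
W_mu, W_lv = rand_mat(D_Z, D_IN), rand_mat(D_Z, D_IN)
# Emotion head: the only user-facing output, an aggregate group-level emotion
W_emo = rand_mat(N_CLASSES, D_Z)
# Structural head: stand-in for the PersonQuery / Heatmap decoders,
# used only as internal supervision, never exposed as per-person output
W_st = rand_mat(D_STRUCT, D_Z)

def forward(x):
    mu, logvar = matvec(W_mu, x), matvec(W_lv, x)
    # Reparameterization trick: sample z = mu + sigma * eps
    z = [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
         for m, lv in zip(mu, logvar)]
    logits = matvec(W_emo, z)   # group-level emotion logits
    struct = matvec(W_st, z)    # internal structural prediction
    # KL divergence of the encoder posterior from a standard normal
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, logvar))
    return logits, struct, kl

x = [random.gauss(0.0, 1.0) for _ in range(D_IN)]  # pooled features, one clip
logits, struct, kl = forward(x)
m = max(logits)
exps = [math.exp(l - m) for l in logits]
probs = [e / sum(exps) for e in exps]  # softmax over group-emotion classes
```

Note how the structural prediction never leaves the model boundary: only `probs`, the group-level distribution, would be returned to a caller.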
The framework was tested on six in-the-wild datasets, including two GER and four Individual Emotion Recognition (IER) benchmarks. The results, published on arXiv, reveal a key distinction: for GER, optimizing the latent space alone tends to attenuate interaction-related cues, whereas preserving explicit structural outputs through decoders (like a transformer-based PersonQuery or a dense Heatmap decoder) significantly improves collective affect inference. VE-MD achieved state-of-the-art performance, reaching 90.06% accuracy on the GAF-3.0 dataset and 82.25% on VGAF with audio fusion. On individual emotion tasks it remained competitive, and even surpassed the state of the art on the SamSemo dataset with 77.9% accuracy when the text modality was added. The research demonstrates that preserving interaction-related structural information is crucial for accurate group-level modeling without the privacy risks of individual monitoring.
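The finding that structural decoders matter for GER amounts to a multi-task objective: the emotion classification loss is combined with a structural reconstruction term and the variational KL term. The function below is a minimal pure-Python sketch of such a joint loss; the MSE reconstruction term and the `beta`/`gamma` weights are assumptions for illustration, not the paper's exact formulation.

```python
import math

def ve_md_loss(emo_logits, emo_label, struct_pred, struct_target,
               mu, logvar, beta=1.0, gamma=1.0):
    # Cross-entropy on the aggregate group-emotion prediction
    m = max(emo_logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in emo_logits))
    ce = log_z - emo_logits[emo_label]
    # MSE on the internal structural reconstruction (the role played by
    # the PersonQuery / Heatmap decoders); a hypothetical stand-in term
    mse = sum((p - t) ** 2
              for p, t in zip(struct_pred, struct_target)) / len(struct_pred)
    # KL term of the variational encoder against a standard normal prior
    kl = 0.5 * sum(math.exp(lv) + mu_i * mu_i - 1.0 - lv
                   for mu_i, lv in zip(mu, logvar))
    # beta / gamma weightings are illustrative, not from the paper
    return ce + gamma * mse + beta * kl

# Example: dropping the structural term's error lowers the joint loss,
# mirroring how structural supervision shapes the shared latent space.
loss_a = ve_md_loss([2.0, 0.5, -1.0], 0, [0.1, 0.2], [0.0, 0.0],
                    [0.3, -0.2], [0.0, 0.1])
loss_b = ve_md_loss([2.0, 0.5, -1.0], 0, [0.0, 0.0], [0.0, 0.0],
                    [0.3, -0.2], [0.0, 0.1])
```

Because the structural targets are consumed only inside this loss, they guide the latent representation during training without ever becoming a per-person output at inference time.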
- Achieves 90.06% state-of-the-art accuracy on the GAF-3.0 group emotion dataset.
- Uses a privacy-by-design approach, avoiding individual tracking by predicting only aggregate group affect.
- Shows that structural supervision (via PersonQuery or Heatmap decoders) is key for modeling group interactions, a benefit not observed on individual-level tasks.
Why It Matters
Enables ethical deployment of emotion AI in sensitive public spaces like schools and events, balancing accuracy with privacy.