AffectAI-Capture: New Protocol Targets Small-Group Meeting Research
Synchronized eye tracking, physiology, and audio in 4-person meetings
AffectAI-Capture, developed by a team of nine researchers from institutions including IT University of Copenhagen and Technical University of Denmark, introduces a protocol for capturing synchronized multimodal data during four-person meetings. The system integrates eye tracking, wearable physiology sensors (e.g., heart rate, skin conductance), close-talk microphones and room audio, multi-view video, manual event logging, and structured self-report questionnaires. Sessions are structured around fixed task blocks adapted from established group-interaction paradigms (e.g., collaborative problem-solving, debate), a design intended to support ecological validity and comparability across studies. The core innovation is a single authoritative event timeline: all data streams, from eye gaze coordinates to audio waveforms, are synchronized to a centralized clock and written in standardized output formats (e.g., CSV, video containers). The aim is to reduce the synchronization problems that commonly affect multimodal research.
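To illustrate what aligning streams to a single event timeline can look like in practice, here is a minimal sketch. It is not the AffectAI-Capture pipeline: the function names, the stream layout, and the 50 ms tolerance are all hypothetical, and it assumes every stream has already been timestamped in seconds against the same shared clock.

```python
from bisect import bisect_left

def nearest_sample(timestamps, values, t):
    """Return the (timestamp, value) pair closest in time to t.

    Assumes `timestamps` is sorted ascending and non-empty.
    """
    i = bisect_left(timestamps, t)
    if i == 0:
        return timestamps[0], values[0]
    if i == len(timestamps):
        return timestamps[-1], values[-1]
    before, after = timestamps[i - 1], timestamps[i]
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    j = i if (after - t) < (t - before) else i - 1
    return timestamps[j], values[j]

def align_to_events(events, streams, max_gap_s=0.05):
    """Attach the nearest sample from each stream to each logged event.

    events:    list of (time_s, label) on the shared clock (hypothetical layout)
    streams:   dict name -> (sorted timestamps, values) on the same clock
    max_gap_s: assumed tolerance; samples further away are reported as None
    """
    rows = []
    for t_event, label in events:
        row = {"t": t_event, "event": label}
        for name, (ts, vals) in streams.items():
            t_sample, value = nearest_sample(ts, vals, t_event)
            row[name] = value if abs(t_sample - t_event) <= max_gap_s else None
        rows.append(row)
    return rows

if __name__ == "__main__":
    events = [(1.00, "task_start"), (2.02, "speaker_change")]
    streams = {
        "gaze_x": ([0.99, 1.50, 2.00], [0.1, 0.2, 0.3]),
        "eda": ([0.0, 1.0, 3.0], [5.0, 5.5, 6.0]),
    }
    for row in align_to_events(events, streams):
        print(row)
```

Reporting a missing value rather than silently taking a distant sample keeps gaps in slower streams, such as skin conductance, visible in the aligned table.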
Validation so far is at the pilot level and consists of controlled bench tests of audio quality (e.g., signal-to-noise ratio) and video synchronization accuracy (sub-frame precision). Full protocol sessions with human participants are ongoing. The protocol emphasizes reproducibility: all hardware specifications, software configurations, and processing pipelines are documented openly so that other labs can replicate the setup. This matters for AI models that aim to understand group dynamics, emotional affect, and nonverbal behavior in real-world contexts. While not yet a finished product, AffectAI-Capture is a step toward standardized data collection in human-computer interaction and affective computing, and could enable more robust training datasets for AI meeting assistants, social robots, and behavioral analytics tools.
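The two bench-test quantities named above can be made concrete with a short sketch. This is an illustration under assumptions, not the team's test procedure: the function names are hypothetical, the signal-to-noise ratio is computed from RMS amplitudes of a separately recorded signal segment and noise segment, and the 30 fps frame rate is an example value rather than a figure from the protocol.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels from amplitude samples.

    Uses 20 * log10 of the RMS amplitude ratio, which is equivalent to
    10 * log10 of the power ratio.
    """
    return 20.0 * math.log10(rms(signal) / rms(noise))

def is_sub_frame(offset_s, fps):
    """True if a measured sync offset is smaller than one video frame."""
    return abs(offset_s) < 1.0 / fps

if __name__ == "__main__":
    # Synthetic example data, not measurements from the study.
    signal = [1.0, -1.0] * 4
    noise = [0.1, -0.1] * 4
    print(f"SNR: {snr_db(signal, noise):.1f} dB")
    # At an assumed 30 fps, one frame lasts about 33.3 ms.
    print("10 ms offset sub-frame at 30 fps:", is_sub_frame(0.010, 30))
```

Under this definition, "sub-frame precision" means the residual offset between streams is shorter than one frame duration, so the threshold tightens as the frame rate rises.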
- Combines eye tracking, wearable physiology, close-talk/room audio, multi-view video, event logging, and self-report in one synchronized pipeline.
- Uses fixed task blocks based on established group-interaction paradigms for ecological validity and comparability.
- Pilot validation of audio quality and video synchronization completed; full human sessions are ongoing.
Why It Matters
Standardized multimodal data collection could accelerate the development of AI models for group dynamics and affective computing.