DuoGesture splits gesture generation into semantic and beat streams

arXiv cs.CV May 27, 2026

⚡DuoGesture's dual-stream AI creates biomechanically plausible, intelligible co-speech gestures.

Deep Dive

DuoGesture is a dual-stream system for generating co-speech gestures, the hand and body movements that accompany speech. Existing holistic models mix lexically grounded semantic gestures with rhythmic beat gestures, which limits both semantic grounding and kinematic smoothness. DuoGesture instead decomposes synthesis into two coupled streams: a semantic stream for meaning-driven gestures and a beat stream for prosody-aligned rhythmic motion. A Semantic Variational Information Bottleneck acts as a stochastic frame-level gate that learns when semantic gestures should override rhythmic beats. The semantic stream is further enhanced by Motion-Grounded Semantic Conditioning, which replaces pure word embeddings with motion-language representations to better handle long-tailed lexical triggers.
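
To make the idea of a stochastic frame-level gate concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the relaxed Bernoulli sampling, the Bernoulli KL penalty standing in for the bottleneck term, the prior of 0.2, and the names `sample_gate`, `bernoulli_kl`, and `blend_streams` are all assumptions chosen for illustration.

```python
import numpy as np

def sample_gate(logits, temperature=0.5, rng=None):
    """Draw a relaxed Bernoulli gate per frame (assumed gating scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    logistic_noise = np.log(u) - np.log1p(-u)  # Logistic(0, 1) sample
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temperature))

def bernoulli_kl(logits, prior=0.2):
    """Per-frame KL(Bernoulli(q) || Bernoulli(prior)), with q = sigmoid(logits).

    Used here as a stand-in for an information bottleneck penalty;
    the prior value is an assumption, not a number from the paper.
    """
    q = np.clip(1.0 / (1.0 + np.exp(-logits)), 1e-6, 1.0 - 1e-6)
    return q * np.log(q / prior) + (1.0 - q) * np.log((1.0 - q) / (1.0 - prior))

def blend_streams(semantic, beat, gate):
    """Mix two (frames, features) motions; gate near 1 favors the semantic stream."""
    g = gate[:, None]
    return g * semantic + (1.0 - g) * beat

# Toy usage: 4 frames, 6 pose features per frame.
rng = np.random.default_rng(0)
semantic_motion = rng.normal(size=(4, 6))
beat_motion = rng.normal(size=(4, 6))
gate_logits = np.array([-2.0, 0.0, 3.0, 1.0])  # hypothetical gate network outputs

gate = sample_gate(gate_logits, rng=rng)
motion = blend_streams(semantic_motion, beat_motion, gate)
penalty = bernoulli_kl(gate_logits).mean()
```

In this sketch the KL term discourages the gate from switching to the semantic stream unless the input gives it a reason to, which is one plausible reading of how a bottleneck could decide when semantic gestures override beats.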

The beat stream is regularized by an Inertial Beat Prior, an anthropometry-weighted arm-chain module that reduces jitter and improves rhythmic consistency without constraining semantic frames. Objective evaluations and subjective human experiments show DuoGesture outperforms strong holistic baselines. Component ablations confirm the complementary roles of semantic grounding, stochastic stream selection, and biomechanical regularization. The work is published on arXiv (2605.26236) and spans computer vision and speech processing.
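
One way to picture a regularizer of this kind is an acceleration penalty on the arm joints that applies only to beat frames. The sketch below is an assumption-laden illustration, not the paper's Inertial Beat Prior: the finite-difference acceleration, the masking scheme, the function name `inertial_penalty`, and the joint weights (placeholders, not measured anthropometric data) are all hypothetical.

```python
import numpy as np

# Placeholder weights for one arm chain (shoulder, elbow, wrist).
# Illustrative values only, not measured anthropometric data.
ARM_WEIGHTS = np.array([3.0, 2.0, 1.0])

def inertial_penalty(positions, beat_mask, weights=ARM_WEIGHTS):
    """Weighted acceleration energy, averaged over beat frames.

    positions: (frames, joints, 3) joint positions for one arm chain.
    beat_mask: (frames,) values in [0, 1]; 1 marks a beat frame,
               0 marks a semantic frame that is left unconstrained.
    """
    # Second finite difference approximates per-joint acceleration.
    accel = positions[2:] - 2.0 * positions[1:-1] + positions[:-2]
    energy = np.sum(accel ** 2, axis=-1)   # (frames - 2, joints)
    weighted = energy @ weights            # (frames - 2,)
    mask = beat_mask[1:-1]                 # align mask with interior frames
    return float(np.sum(mask * weighted) / max(np.sum(mask), 1e-8))

# Toy usage: a smooth arm trajectory versus the same trajectory with jitter.
frames = 20
t = np.arange(frames, dtype=float)
smooth = np.broadcast_to(t[:, None, None], (frames, 3, 3)).copy()
jittery = smooth + np.random.default_rng(0).normal(scale=0.1, size=smooth.shape)
all_beat = np.ones(frames)

smooth_cost = inertial_penalty(smooth, all_beat)
jitter_cost = inertial_penalty(jittery, all_beat)
```

Because the mask zeroes out semantic frames, a penalty shaped like this would smooth rhythmic motion without touching the frames where the semantic stream is in control, which matches the behavior the article describes.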

Key Points

Dual-stream architecture separates semantic (meaning) from beat (rhythm) gestures.
Semantic Variational Information Bottleneck learns frame-level override of beat by semantic streams.
Inertial Beat Prior reduces jitter and improves rhythmic smoothness without limiting semantic expressivity.

Why It Matters

Virtual agents can gesture more naturally, and animators can generate intelligible, smooth co-speech gestures automatically.
