The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning
New method lets AI think while listening, achieving competitive performance without added latency.
A research team including Yoshua Bengio has introduced FLAIR (Full-duplex LAtent and Internal Reasoning), a novel approach that enables AI dialogue systems to engage in continuous internal reasoning while simultaneously listening to human speech. Unlike conventional systems, which must wait for speech to complete before beginning their "thinking" process, FLAIR recursively updates latent embeddings during the user's speaking phase. This yields real-time cognitive processing that strictly preserves causality and introduces no additional latency, mimicking human conversational cognition: we think while listening, rather than only after hearing a complete utterance.
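The recursive latent update can be pictured as a cheap per-frame recurrence: each incoming audio frame both informs and is interleaved with an update of an internal "thought" state, so reasoning overlaps with listening instead of starting after it. The sketch below is illustrative only; the dimensions, the tanh recurrence, and all names are assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and weights (not from the FLAIR paper)
DIM, N_MELS = 16, 8
W_in = rng.standard_normal((DIM, N_MELS)) * 0.1   # stand-in frame encoder
W_rec = rng.standard_normal((DIM, DIM)) * 0.1     # recursive "thinking" weights

def think_while_listening(frames):
    """Update a latent thought state once per incoming audio frame,
    strictly left-to-right (causal), so reasoning overlaps with
    listening rather than waiting for the utterance to end."""
    h = np.zeros(DIM)
    trace = []
    for frame in frames:                       # one cheap update per frame:
        h = np.tanh(W_in @ frame + W_rec @ h)  # no extra response latency
        trace.append(h)
    return np.stack(trace)                     # (T, DIM) latent reasoning trace

frames = rng.standard_normal((5, N_MELS))      # 5 causal audio frames
trace = think_while_listening(frames)
print(trace.shape)  # → (5, 16)
```

Because each update touches only the current frame and the previous state, the cost per frame is constant, which is what lets the "thinking" run in step with perception.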
The technical innovation centers on an Evidence Lower Bound (ELBO)-based objective that enables efficient supervised fine-tuning via teacher forcing, eliminating the need for explicit reasoning annotations that would be difficult to obtain at scale. Experiments demonstrate that this think-while-listening design achieves competitive results across multiple speech benchmarks while robustly handling conversational dynamics. The system maintains full-duplex interaction capabilities, meaning it can process incoming speech while simultaneously preparing responses, creating more natural and responsive conversational experiences.
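To make the training idea concrete, here is a generic VAE-style negative ELBO: a reconstruction term over teacher-forced token log-likelihoods plus a KL penalty that keeps the posterior over the latent reasoning state close to a prior. This is a textbook sketch under a diagonal-Gaussian assumption, not the paper's exact formulation; all names and the toy numbers are illustrative.

```python
import numpy as np

def neg_elbo(token_log_probs, mu, log_var):
    """Negative ELBO = -E_q[log p(x|z)] + KL(q(z|x) || N(0, I)).
    token_log_probs: per-token log-likelihoods from a teacher-forced
    decoder pass; (mu, log_var): diagonal-Gaussian posterior over the
    latent reasoning state. Generic VAE form, not FLAIR's exact loss."""
    recon = -np.sum(token_log_probs)  # reconstruction (teacher forcing)
    # closed-form KL between N(mu, sigma^2) and the standard normal prior
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    return recon + kl

# Toy example: three teacher-forced tokens; posterior equal to the
# prior (mu = 0, log_var = 0), so the KL term vanishes.
log_probs = np.log(np.array([0.9, 0.8, 0.95]))
mu, log_var = np.zeros(4), np.zeros(4)
loss = neg_elbo(log_probs, mu, log_var)
print(round(loss, 4))  # → 0.3798 (pure reconstruction term)
```

The appeal of this shape of objective is exactly what the paragraph above notes: the reasoning states are treated as latent variables, so no human-written reasoning traces are needed; ordinary teacher-forced supervision on the speech tokens suffices.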
This approach represents a significant departure from traditional sequential processing in dialogue systems, where speech recognition, reasoning, and response generation occur in discrete, non-overlapping stages. By enabling latent reasoning during speech perception, FLAIR reduces the cognitive lag that plagues current conversational AI systems, potentially leading to more natural interactions where responses feel immediate and contextually appropriate rather than delayed and formulaic.
- Enables continuous latent reasoning during speech perception without added latency
- Uses recursive embedding updates and Evidence Lower Bound-based training objective
- Achieves competitive performance on speech benchmarks and full-duplex interaction metrics
Why It Matters
Enables more natural, responsive conversational AI by reducing cognitive lag in dialogue systems.