Full-Duplex-Bench v1.5: Evaluating Overlap Handling for Full-Duplex Speech Models
New benchmark reveals two AI strategies for managing overlapping speech in real-time.
Full-duplex spoken dialogue systems aim to move human-machine interaction beyond rigid turn-taking into fluid, natural conversation. However, a key challenge, managing overlapping speech, has been under-evaluated. Full-Duplex-Bench v1.5, introduced by a team from multiple institutions, is the first fully automated benchmark to systematically probe model behavior during speech overlap. It simulates four realistic scenarios: user interruption, user backchannel (e.g., "uh-huh"), the user talking to others, and background speech. The framework works with both open-source and commercial API-based models, offering metrics for categorical dialogue behaviors, stop and response latency, and prosodic adaptation.
Benchmarking five state-of-the-art agents revealed two divergent strategies: a responsive approach that prioritizes rapid response to user input, and a floor-holding approach that preserves conversational flow by filtering overlapping events. The open-source framework, accepted at ICASSP 2026, includes code and data for reproducible evaluation, enabling practitioners to accelerate development of robust full-duplex systems.
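To make the metrics concrete, here is a minimal sketch of how stop latency, response latency, and a coarse responsive-vs-floor-holding categorization could be computed from timestamped overlap events. All names (`OverlapEvent`, `stop_latency`, `categorize`, the 1.0 s threshold) are illustrative assumptions, not Full-Duplex-Bench's actual API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event record: timestamps (in seconds) observed during one
# simulated overlap. None means the model never performed that action.
@dataclass
class OverlapEvent:
    overlap_onset: float             # when the overlapping user audio begins
    model_stop: Optional[float]      # when the model stops speaking, if it yields
    model_response: Optional[float]  # when the model starts responding, if it does

def stop_latency(e: OverlapEvent) -> Optional[float]:
    """Seconds from overlap onset until the model yields the floor."""
    return None if e.model_stop is None else e.model_stop - e.overlap_onset

def response_latency(e: OverlapEvent) -> Optional[float]:
    """Seconds from overlap onset until the model responds to the overlap."""
    return None if e.model_response is None else e.model_response - e.overlap_onset

def categorize(e: OverlapEvent, yield_threshold: float = 1.0) -> str:
    """Coarse categorical behavior: 'stop' (responsive strategy) if the model
    yields within the threshold, else 'continue' (floor-holding strategy)."""
    sl = stop_latency(e)
    return "stop" if sl is not None and sl <= yield_threshold else "continue"
```

For example, an interruption beginning 2.0 s into the model's turn that is yielded to at 2.4 s gives a stop latency of 0.4 s and is categorized as responsive, while an event the model talks straight through falls into the floor-holding category.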
- First fully automated benchmark simulating four overlap scenarios: interruption, backchannel, talking to others, background speech
- Five state-of-the-art agents tested, revealing responsive vs. floor-holding strategies
- Accepted at ICASSP 2026; open-source code and data available for reproducible evaluation
Why It Matters
Enables more natural human-AI voice interactions by systematically evaluating overlap handling, a key barrier to fluid conversation.