Audio & Speech

Detect, Attend and Extract: Keyword Guided Target Speaker Extraction

New AI isolates a speaker from a noisy mix just by hearing a word they say.

Deep Dive

Researchers have developed an AI system that extracts one person's speech from a multi-speaker recording, such as a busy meeting. Instead of requiring a clean voice sample as a reference, it uses a keyword the target person says, such as "project" or "budget", to identify and isolate their voice. The system works in three stages: it detects the keyword in the mixture, attends to the speaker who said it, and then extracts that speaker's audio. According to the reported experiments, it outperforms conventional methods that rely on a pre-recorded voice sample.
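
The detect, attend, extract sequence described above can be illustrated with a toy sketch. This is not the researchers' model, which is not detailed here. It assumes, hypothetically, that per-frame keyword scores and per-frame speaker embeddings are already available (in a real system, neural networks would produce them), and it swaps in a simple cosine-similarity mask for the extraction step. All function and parameter names are made up for illustration.

```python
import math

def detect_keyword(frame_scores, threshold=0.5):
    """Detect: flag frames whose (assumed) keyword score passes the threshold."""
    return [score >= threshold for score in frame_scores]

def attend_to_speaker(frame_embeddings, frame_scores, keyword_frames):
    """Attend: score-weighted average of embeddings over the keyword frames."""
    dim = len(frame_embeddings[0])
    pooled = [0.0] * dim
    total = 0.0
    for emb, score, hit in zip(frame_embeddings, frame_scores, keyword_frames):
        if hit:
            total += score
            for d in range(dim):
                pooled[d] += score * emb[d]
    if total == 0.0:
        raise ValueError("keyword not detected in the mixture")
    return [value / total for value in pooled]

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def extract_target(mix_frames, frame_embeddings, speaker_embedding):
    """Extract: scale each mixture frame by its similarity to the target speaker.

    A per-frame mask is a simplification chosen for this sketch; it cannot
    separate speakers who overlap within the same frame.
    """
    extracted = []
    for frame, emb in zip(mix_frames, frame_embeddings):
        mask = min(1.0, max(0.0, cosine(emb, speaker_embedding)))
        extracted.append([mask * value for value in frame])
    return extracted

def keyword_guided_extraction(mix_frames, frame_embeddings, frame_scores, threshold=0.5):
    """Run the three stages in order on one recording."""
    keyword_frames = detect_keyword(frame_scores, threshold)
    speaker_embedding = attend_to_speaker(frame_embeddings, frame_scores, keyword_frames)
    return extract_target(mix_frames, frame_embeddings, speaker_embedding)
```

In this sketch, frames whose embedding matches the speaker who uttered the keyword pass through unchanged, while frames from a dissimilar speaker are attenuated toward zero.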

Why It Matters

This could make voice isolation practical for real-world calls and recordings where no clean reference sample of the target speaker's voice is available.