Enroll-on-Wakeup: A First Comparative Study of Target Speech Extraction for Seamless Interaction in Real Noisy Human-Machine Dialogue Scenarios
New framework uses your wake word as a voiceprint, removing the need for a separate enrollment step.
Researchers Yiming Yang et al. propose Enroll-on-Wakeup (EoW), a novel framework for Target Speech Extraction (TSE). Instead of relying on pre-recorded, high-quality enrollment speech, EoW automatically uses the short, noisy wake-word segment (such as "Hey Siri") as the enrollment reference. Their study found that current TSE models degrade when enrolled this way, but that augmentation with LLM-based Text-to-Speech (TTS) significantly improves the listening experience in real noisy dialogue scenarios.
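To make the idea concrete, here is a minimal sketch of the data flow only, not the authors' implementation. It assumes a keyword spotter has already reported where the wake word starts and ends. The names `WakeWordSpan` and `split_on_wakeup` are hypothetical, and the TSE model that would consume the two segments is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WakeWordSpan:
    """Hypothetical keyword-spotter output: wake-word boundaries in seconds."""
    start_s: float
    end_s: float


def split_on_wakeup(
    samples: Sequence[float], sample_rate: int, span: WakeWordSpan
) -> Tuple[List[float], List[float]]:
    """Split one recording into (enrollment, query).

    enrollment: the wake-word audio, reused as the speaker reference.
    query: everything after the wake word, i.e. the noisy mixture
           from which the target speaker would be extracted.
    """
    # Convert seconds to sample indices and clamp to the recording.
    start = max(0, int(round(span.start_s * sample_rate)))
    end = min(len(samples), int(round(span.end_s * sample_rate)))
    if end <= start:
        raise ValueError("wake-word span is empty or outside the recording")
    return list(samples[start:end]), list(samples[end:])


# Illustrative usage: 3 s of silence at 16 kHz, wake word assumed at 0.5-1.3 s.
audio = [0.0] * (16000 * 3)
enrollment, query = split_on_wakeup(audio, 16000, WakeWordSpan(0.5, 1.3))
```

In a real system the enrollment segment would be passed to the TSE model as the speaker reference, together with the query mixture. Because that segment is short and noisy, this is the point where the study found current models struggle.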
Why It Matters
Enables more natural, spontaneous interactions with voice assistants by removing the separate voice enrollment step.