Audio & Speech

Unmixing the Crowd: Learning Mixture-to-Set Speaker Embeddings for Enrollment-Free Target Speech Extraction

The model predicts speaker identities directly from noisy audio, eliminating the need for a clean enrollment recording.

Deep Dive

A team from the University of Michigan and Meta has published a research paper titled 'Unmixing the Crowd: Learning Mixture-to-Set Speaker Embeddings for Enrollment-Free Target Speech Extraction.' The core innovation is a model that bypasses a major hurdle in audio processing: the need for a clean 'enrollment' sample of a speaker's voice before that voice can be extracted from a noisy mix. Instead, their system analyzes the mixed audio itself to predict a small set of candidate speaker embeddings, which then act as control signals to isolate individual voices.
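Conceptually, the model maps a single mixture waveform to a fixed-size set of candidate speaker embeddings. A minimal sketch of that interface follows; the pooling, the random projection, and all dimensions here are stand-ins for illustration, not the paper's architecture:

```python
import numpy as np

def predict_speaker_set(mixture: np.ndarray, n_candidates: int = 3, dim: int = 192) -> np.ndarray:
    """Stand-in mixture-to-set predictor: maps a mono waveform to
    `n_candidates` unit-norm embeddings. A real model would use a learned
    encoder; a fixed random projection here just illustrates the shapes."""
    # Crude fixed-window "features": mean-pool the waveform into 10 ms frames.
    frames = mixture[: len(mixture) // 160 * 160].reshape(-1, 160).mean(axis=1)
    # Hypothetical projection weights standing in for learned parameters.
    w = np.random.default_rng(42).standard_normal((n_candidates, dim, len(frames)))
    emb = w @ frames  # (n_candidates, dim): one embedding per candidate speaker
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

mixture = np.random.default_rng(0).standard_normal(16000)  # 1 s of fake 16 kHz audio
candidates = predict_speaker_set(mixture)
print(candidates.shape)  # (3, 192)
```

Each row is then usable as a conditioning vector for a downstream extraction model, the same way a conventional enrollment embedding would be.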

This approach, trained with permutation-invariant teacher supervision to align with a strong single-speaker embedding space, creates a structured and clusterable identity space from chaos. On the noisy LibriMix benchmark, it outperformed the common baseline of clustering WavLM features with K-means. When the predicted embeddings are fed into standard speech extraction back-ends, they consistently improve both objective sound quality and intelligibility scores. Crucially, the model also generalizes to real-world recordings from the DNS-Challenge, demonstrating practical potential beyond controlled lab datasets.
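The permutation-invariant supervision can be sketched as follows: since the predicted set has no fixed ordering, each prediction is matched against teacher embeddings (from a pretrained single-speaker model) under every assignment, and training uses the best one. The cosine-distance cost and brute-force permutation search below are assumptions for illustration; the paper's exact loss may differ:

```python
import itertools
import numpy as np

def pit_embedding_loss(pred: np.ndarray, teacher: np.ndarray) -> tuple[float, tuple[int, ...]]:
    """Permutation-invariant loss between predicted embeddings (K, D) and
    teacher embeddings (S, D), S <= K: for every way of assigning a distinct
    prediction to each teacher speaker, sum the cosine distances and keep
    the minimum. Returns (best loss, assignment of prediction indices)."""
    pred_n = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    teach_n = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    cos_dist = 1.0 - teach_n @ pred_n.T  # (S, K) pairwise cosine distances
    best_loss, best_perm = np.inf, ()
    for perm in itertools.permutations(range(pred.shape[0]), teacher.shape[0]):
        loss = sum(cos_dist[s, p] for s, p in enumerate(perm))
        if loss < best_loss:
            best_loss, best_perm = float(loss), perm
    return best_loss, best_perm

rng = np.random.default_rng(1)
teacher = rng.standard_normal((2, 8))
# Predictions that match the teachers, but in swapped order, plus one spare.
pred = np.vstack([teacher[1], teacher[0], rng.standard_normal(8)])
loss, assignment = pit_embedding_loss(pred, teacher)
print(assignment)  # (1, 0): prediction 1 matches teacher 0, prediction 0 matches teacher 1
```

Brute-force search is fine for the handful of speakers in a mixture; a Hungarian-algorithm matcher (e.g. `scipy.optimize.linear_sum_assignment`) would do the same job for larger sets.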

Key Points
  • Eliminates the need for a clean enrollment recording, a major barrier for real-world use.
  • Outperforms the WavLM+K-means baseline on standard speaker clustering metrics.
  • Improves objective speech quality and intelligibility when integrated with extraction models and works on real noise.

Why It Matters

Enables clearer voice isolation in crowded real-world settings like video calls, smart assistants, and hearing aids.