Disentangling Pitch and Creak for Speaker Identity Preservation in Speech Synthesis
The approach lets synthetic voices become more or less creaky while still sounding like the same speaker.
Researchers have developed a speech synthesis system that can modify vocal creak (the rough, low-pitched voice quality often called vocal fry) while preserving the speaker's identity. The model uses a conditional continuous normalizing flow and is trained so that pitch and creak are disentangled, meaning creak can be changed without pitch changing along with it. In the reported experiments, the system improves speaker verification performance across a range of creak manipulation strengths, so the modified voice is still recognized as the original speaker and the edits sound more natural.
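To give a feel for what a conditional continuous normalizing flow does, here is a minimal toy sketch. It is not the researchers' model: the summary does not describe the architecture, so the vector field, the random weights `W` and `V`, the 4-dimensional "speaker" vector, and the two-element condition (a hypothetical creak strength and a normalized pitch value) are all assumptions chosen for illustration. The sketch shows only the general mechanism: a vector is transported by an ordinary differential equation whose dynamics depend on a condition, the transport can be reversed, and reversing it under a different condition yields a modified vector.

```python
import numpy as np

def vector_field(x, t, cond, W, V):
    # Hypothetical toy dynamics dx/dt = tanh(W x + V cond + t).
    # A real model would use a trained neural network here.
    return np.tanh(W @ x + V @ cond + t)

def integrate(x, cond, W, V, t0, t1, steps=200):
    # Fixed-step fourth-order Runge-Kutta from t0 to t1.
    # Passing t0 > t1 integrates backward in time (the inverse flow).
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = vector_field(x, t, cond, W, V)
        k2 = vector_field(x + 0.5 * h * k1, t + 0.5 * h, cond, W, V)
        k3 = vector_field(x + 0.5 * h * k2, t + 0.5 * h, cond, W, V)
        k4 = vector_field(x + h * k3, t + h, cond, W, V)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

rng = np.random.default_rng(0)
dim, cond_dim = 4, 2
W = 0.5 * rng.standard_normal((dim, dim))       # assumed, untrained weights
V = 0.5 * rng.standard_normal((dim, cond_dim))  # assumed, untrained weights

x0 = rng.standard_normal(dim)      # stand-in for a speaker representation
cond = np.array([0.3, 1.0])        # hypothetical [creak strength, pitch]
new_cond = np.array([0.9, 1.0])    # more creak, same pitch

z = integrate(x0, cond, W, V, 0.0, 1.0)          # forward: data -> latent
x_rec = integrate(z, cond, W, V, 1.0, 0.0)       # inverse, same condition
x_mod = integrate(z, new_cond, W, V, 1.0, 0.0)   # inverse, changed creak

print("reconstruction error:", np.linalg.norm(x_rec - x0))
print("shift from manipulation:", np.linalg.norm(x_mod - x0))
```

Running the inverse flow with the original condition recovers the input up to numerical error, while running it with a different creak value moves the vector. In this toy setting nothing forces pitch and creak to be independent; in the system described above, that separation is something the training procedure is reported to achieve.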
Why It Matters
Voice-quality edits that keep the speaker recognizable could make AI voice generation more realistic and dependable for content creation, accessibility tools, and entertainment.