Audio & Speech

Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification

Mitigates language-speaker entanglement, achieving a 0.91% equal error rate on the TidyVoice benchmark.

Deep Dive

A team of researchers from Shanghai Jiao Tong University, led by Qituan Shangguan and including Junhao Du, Kunyang Peng, Feng Xue, Hui Zhang, Xinsheng Wang, Kai Yu, and Shuai Wang, has introduced Dual-LoRA, a novel approach to cross-lingual speaker verification. Published on arXiv and submitted to Interspeech 2026, the method tackles the pervasive problem of language-speaker entanglement, which causes systematic errors when verifying speakers across different languages. Standard adversarial disentanglement often degrades speaker discriminability by penalizing traits that correlate with language but are essential for speaker identity.

Dual-LoRA addresses this by injecting trainable, task-factorized LoRA (Low-Rank Adaptation) adapters into a frozen pre-trained backbone, enabling parameter-efficient fine-tuning. Its core innovation is a Language-Anchored Adversary, which uses an explicit language branch to steer adversarial gradients toward genuine linguistic cues rather than incidental correlations, thereby preserving characteristics essential to speaker identity. Evaluated on the TidyVoice benchmark, the system achieved a 0.91% validation equal error rate (EER) and secured 3rd place in the official challenge, demonstrating significant improvements over prior methods.
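The low-rank adapter idea can be sketched with plain linear algebra: a frozen weight matrix is augmented with a scaled product of two small trainable matrices. This is a minimal illustration of the standard LoRA parameterization, not the paper's exact configuration; the class name, ranks, and shapes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank (LoRA) update.

    Effective weight: W + (alpha / r) * B @ A, where only A and B are trained.
    Shapes and hyperparameters here are illustrative, not the paper's.
    """

    def __init__(self, d_in, d_out, r=4, alpha=8):
        self.W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
        self.B = np.zeros((d_out, r))                    # trainable up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        # Frozen path plus the scaled low-rank adapter path.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

x = rng.standard_normal((2, 16))
layer = LoRALinear(16, 8)
# With B zero-initialized, the adapter is a no-op at the start of training,
# so the adapted model initially reproduces the frozen backbone exactly.
assert np.allclose(layer.forward(x), x @ layer.W.T)
```

Because only `A` and `B` receive gradients, the number of trained parameters is a small fraction of the backbone, which is what makes the fine-tuning parameter-efficient.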

Key Points
  • Dual-LoRA uses task-factorized LoRA adapters for parameter-efficient fine-tuning on frozen pre-trained models.
  • Language-Anchored Adversary guides adversarial disentanglement to target language cues, not speaker traits.
  • Achieves 0.91% validation EER on TidyVoice benchmark, ranking 3rd in the official challenge.
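Adversarial disentanglement of the kind described in the second point is conventionally implemented with a gradient reversal layer: the forward pass is the identity, while the backward pass negates the gradient flowing from the language classifier into the backbone. The sketch below shows only that standard backward rule under an assumed adversarial weight `lam`; the paper's Language-Anchored Adversary adds an explicit language branch on top of this mechanism.

```python
import numpy as np

def gradient_reversal_backward(grad, lam=1.0):
    """Backward rule of a gradient reversal layer (GRL).

    Forward is the identity. In the backward pass, the gradient from the
    language classifier is scaled by -lam before reaching the speaker
    backbone, so the backbone is updated to *remove* language cues rather
    than sharpen them. `lam` is an assumed adversarial weight.
    """
    return -lam * grad

# Gradient of the language-classification loss w.r.t. a speaker embedding:
grad_from_language_head = np.array([0.5, -0.2, 0.1])
# After reversal, the backbone receives the negated gradient.
assert np.allclose(gradient_reversal_backward(grad_from_language_head),
                   [-0.5, 0.2, -0.1])
```

The failure mode the paper targets is that this reversed gradient can also erase speaker-relevant traits that merely correlate with language; anchoring the adversary to an explicit language branch is meant to keep the reversal focused on truly linguistic directions.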

Why It Matters

Enables reliable speaker verification across languages, critical for global security and voice authentication systems.