Audio & Speech

Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR

A new merging algorithm outperforms full fine-tuning across 10 domains while preserving the base model's generalization in a single checkpoint.

Deep Dive

A team of researchers has published a study of model merging as a scalable alternative to multi-task training for large speech foundation models. The paper, titled 'Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR,' investigates how to combine multiple specialized, domain-tuned models into a single unified checkpoint. This matters for ASR systems, which typically require expensive, repeated fine-tuning whenever new data arrives in domains such as medical, legal, or conversational speech. The researchers argue that maintaining numerous custom checkpoints is computationally prohibitive, making efficient merging a vital technique.
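
To make the setup concrete, model merging is often framed as task arithmetic: each domain-tuned checkpoint is reduced to a "task vector" (its weight delta from the base model), and the deltas are scaled and summed back onto the base. The sketch below illustrates that general recipe only; the function name and the `alpha` coefficient are illustrative assumptions, not the paper's method.

```python
import torch

def merge_task_vectors(base_state, tuned_states, alpha=0.3):
    """Toy task-arithmetic merge (illustrative, not the paper's algorithm).

    theta_merged = theta_base + alpha * sum_i (theta_i - theta_base)
    """
    merged = {}
    for name, base_w in base_state.items():
        task_sum = torch.zeros_like(base_w)
        for tuned in tuned_states:
            task_sum += tuned[name] - base_w  # task vector for one domain-tuned checkpoint
        merged[name] = base_w + alpha * task_sum
    return merged
```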

The team rigorously benchmarked 11 different merging algorithms across 10 distinct European Portuguese domains, evaluating in-domain accuracy, robustness to distribution shifts, and performance in English and multilingual settings. Their key contribution is BoostedTSV-M, a novel algorithm based on TSV-M that uses singular-value boosting to improve numerical stability and mitigate 'rank collapse', a common failure mode in which merged models lose capability. The results show the merged model outperforms traditional full fine-tuning on the target European Portuguese data. Crucially, it does so while preserving the general language capabilities of the original foundation model, maintaining strong performance on out-of-distribution and multilingual tasks within a single, deployable model. The work has been submitted for review at INTERSPEECH 2026.
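
The paper's exact BoostedTSV-M update is not reproduced here, but the general idea of singular-value boosting can be sketched: decompose a layer's task-vector matrix with an SVD, then lift near-zero singular values toward a floor so the merged update does not collapse onto a few dominant directions. In the minimal sketch below, the function name, the `floor_ratio` parameter, and the clamping rule are all hypothetical assumptions, not the authors' algorithm.

```python
import torch

def boost_singular_values(delta, floor_ratio=0.1):
    """Illustrative singular-value boosting for one 2-D task-vector matrix.

    delta:       tuned_weight - base_weight for a single layer
    floor_ratio: hypothetical floor, as a fraction of the top singular value
    """
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    floor = floor_ratio * S[0]           # singular values come back sorted descending
    S_boosted = torch.maximum(S, floor)  # lift tiny directions so the update keeps rank
    return U @ torch.diag(S_boosted) @ Vh
```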

Key Points
  • Introduced BoostedTSV-M, a new merging algorithm that mitigates rank collapse via singular-value boosting, improving numerical stability.
  • Benchmarked 11 merging methods across 10 European Portuguese ASR domains, outperforming costly full fine-tuning.
  • Achieved superior in-domain accuracy while preserving a single model's out-of-distribution and multilingual generalization.

Why It Matters

Enables efficient, unified speech models for multiple specialized domains, drastically reducing compute costs and deployment complexity.