DNSMOS-C boosts speech quality assessment with contrastive learning

arXiv eess.AS June 26, 2026

⚡ New model achieves better accuracy and generalization without added compute.

Deep Dive

DNSMOS-C, developed by Xinyu Liang and colleagues (accepted at Interspeech 2026), upgrades the DNSMOS Pro speech quality assessment framework by integrating a Mean Opinion Score (MOS)-guided triplet contrastive loss. Unlike prior methods that depend on large pre-trained self-supervised learning (SSL) encoders and multi-stage training, DNSMOS-C jointly learns speech representations and MOS regression within a single, unified pipeline. This design keeps the model compact and efficient while improving the organization of its latent space according to perceptual quality.
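To make the idea of a MOS-guided triplet loss trained jointly with regression more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the triplet selection rule, the `min_gap` and `margin` values, the Euclidean distance, and the `weight` on the contrastive term are all assumptions chosen for illustration, since this summary does not give the paper's exact formulation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mos_guided_triplets(mos, min_gap=0.5):
    """Yield (anchor, positive, negative) index triplets.

    Hypothetical selection rule: the positive must be closer to the anchor
    in MOS than the negative is, by at least `min_gap`.
    """
    n = len(mos)
    for a in range(n):
        for p in range(n):
            for q in range(n):
                if len({a, p, q}) < 3:
                    continue  # the three clips must be distinct
                if abs(mos[a] - mos[q]) - abs(mos[a] - mos[p]) >= min_gap:
                    yield a, p, q


def triplet_loss(embeddings, mos, margin=0.2, min_gap=0.5):
    """Mean hinge loss over all MOS-guided triplets (0.0 if none exist)."""
    losses = [
        max(
            0.0,
            euclidean(embeddings[a], embeddings[p])
            - euclidean(embeddings[a], embeddings[q])
            + margin,
        )
        for a, p, q in mos_guided_triplets(mos, min_gap)
    ]
    return sum(losses) / len(losses) if losses else 0.0


def joint_loss(pred_mos, embeddings, mos, weight=0.1):
    """Regression MSE plus a weighted contrastive term in one objective.

    `weight` is an assumed hyperparameter, not a value from the paper.
    """
    mse = sum((p - t) ** 2 for p, t in zip(pred_mos, mos)) / len(mos)
    return mse + weight * triplet_loss(embeddings, mos)
```

In this sketch the contrastive term is zero when clips with similar MOS already sit closer together in the embedding space than clips with dissimilar MOS, and it grows when that ordering is violated. Because both terms feed one objective, a single training pass would update the representation and the regressor together.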

Experimental results across multiple datasets show that DNSMOS-C consistently outperforms DNSMOS Pro on correlation metrics and generalizes better to challenging out-of-domain test sets. The contrastive supervision is applied directly to intermediate embeddings, which encourages a low-dimensional ordering by quality to emerge in the latent space. This ordering improves interpretability and training stability without adding computational overhead. The approach is particularly relevant for real-time speech applications, where both accuracy and low latency are critical.
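The summary does not say which correlation metrics the paper reports. Speech quality predictors are commonly scored with the Pearson linear correlation and the Spearman rank correlation between predicted and human MOS, so the sketch below assumes those two. The function names are illustrative, and any scores passed in would be example data, not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Pearson rewards predictions that track human scores linearly, while Spearman only checks that clips are ranked in the same order, which makes it the more direct check on whether a quality ordering has been learned.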

Key Points

DNSMOS-C uses MOS-guided triplet contrastive loss to improve latent space organization for perceptual quality.
Achieves better correlation and generalization than DNSMOS Pro without extra computational cost.
Eliminates reliance on large SSL encoders by jointly learning representations and regression in one framework.

Why It Matters

Enables more accurate, lightweight speech quality assessment for real-time communications and audio AI products.
