Explainable AI in Speaker Recognition -- Making Latent Representations Understandable
New study finds that neural networks' latent representations form hierarchical clusters, not just flat ones...
In a new paper on arXiv (2604.23354), researchers Yanze Xu, Wenwu Wang, and Mark D. Plumbley from the University of Surrey tackle a key Explainable AI (XAI) question: how do neural networks organize their internal representations? Focusing on speaker recognition networks, they challenge the prevailing view that these representations form independent, flat clusters. Instead, they apply Single-Linkage Clustering (SLINK) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to show that representations often form hierarchical clusters: nested groupings that reveal deeper organizational patterns.
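For readers who want to experiment with the same idea, here is a minimal sketch, not the authors' code, of running single-linkage and HDBSCAN over a batch of speaker embeddings. The random `embeddings` array, the cluster counts, and `min_cluster_size` are placeholder assumptions standing in for real network latents and tuned settings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import HDBSCAN  # requires scikit-learn >= 1.3

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 64))  # placeholder for real speaker latents

# Single-linkage builds a full merge tree (scipy's implementation matches
# the SLINK algorithm's output); cutting the tree at different depths
# exposes nested levels of clusters.
tree = linkage(embeddings, method="single", metric="euclidean")
coarse = fcluster(tree, t=2, criterion="maxclust")  # 2 top-level groups
fine = fcluster(tree, t=8, criterion="maxclust")    # 8 nested subgroups

# HDBSCAN extracts a cluster hierarchy from density estimates and
# additionally marks low-density points as noise (label -1).
labels = HDBSCAN(min_cluster_size=5).fit_predict(embeddings)
```

If the representations really are hierarchical, the fine-grained clusters should sit inside the coarse ones rather than cutting across them.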
To make these hierarchies interpretable, the team designs a new algorithm, Hierarchical Cluster-Class Matching (HCCM), which performs one-to-one matching between predefined semantic classes (such as 'male' or 'UK') and the hierarchical clusters produced by SLINK or HDBSCAN. Results show that some clusters match individual classes, while others correspond to conjunctions of classes (e.g., 'male and UK'). The authors also introduce Liebig's score, a metric that quantifies matching performance and diagnoses the factors that limit it. This work offers a new lens on how speaker recognition networks encode attributes such as gender and accent, with implications for debugging, fairness, and interpretability in audio AI systems.
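The paper's exact HCCM procedure is not reproduced here, but the general shape of one-to-one cluster-class matching can be sketched with a standard Hungarian assignment over an overlap score. Everything below, the function name, the use of F1 as the overlap score, and the label arrays, is a hypothetical illustration rather than the authors' algorithm; with a hierarchy, the same matching could be run over clusters drawn from every level of the tree.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import f1_score

def match_clusters_to_classes(cluster_ids, class_ids):
    """Hypothetical one-to-one matching of clusters to semantic classes."""
    clusters = np.unique(cluster_ids)
    clusters = clusters[clusters != -1]  # drop HDBSCAN's noise label
    classes = np.unique(class_ids)
    score = np.zeros((len(clusters), len(classes)))
    for i, c in enumerate(clusters):
        for j, k in enumerate(classes):
            # How well does membership in cluster c predict class k?
            score[i, j] = f1_score((class_ids == k).astype(int),
                                   (cluster_ids == c).astype(int),
                                   zero_division=0)
    # Hungarian assignment finds the best one-to-one pairing overall.
    rows, cols = linear_sum_assignment(-score)
    return [(clusters[r], classes[c], score[r, c])
            for r, c in zip(rows, cols)]
```

A conjunction such as 'male and UK' would enter this setup as its own candidate class (the intersection of the two attribute labels), letting a single cluster pair with a compound concept.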
- Applied SLINK and HDBSCAN to speaker recognition networks, revealing hierarchical clustering in latent representations
- Introduced HCCM algorithm to map hierarchical clusters to semantic classes like 'male' or 'UK'
- Proposed Liebig's score, a metric to quantify cluster-class matching performance and identify limiting factors (a hedged sketch of the idea follows this list)
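The summary states only that Liebig's score quantifies matching performance and diagnoses limiting factors; the name points at Liebig's law of the minimum, under which a system is capped by its scarcest resource. A hedged guess at the general shape, with every definition here hypothetical:

```python
def liebig_score(n_overlap: int, n_cluster: int, n_class: int):
    """Hypothetical sketch: score a cluster-class pair and name its limit.

    n_overlap: samples in both the cluster and the class
    n_cluster: total samples in the cluster
    n_class:   total samples in the class
    """
    precision = n_overlap / n_cluster  # cluster purity w.r.t. the class
    recall = n_overlap / n_class       # class coverage by the cluster
    limiting = "precision" if precision < recall else "recall"
    return min(precision, recall), limiting

score, limit = liebig_score(n_overlap=80, n_cluster=100, n_class=120)
print(f"score={score:.2f}, limited by {limit}")  # score=0.67, limited by recall
```

Taking the minimum both scores the match and names which side, purity or coverage, is holding it back, which fits the 'diagnose limiting factors' role described above; the paper's actual definition may differ.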
Why It Matters
Makes speaker recognition AI more interpretable, enabling better debugging and fairness analysis in voice-based systems.