First comprehensive survey specifically focused on evaluating large audio-language models (LALMs)

Proposes a four-dimension taxonomy: auditory awareness, knowledge/reasoning, dialogue ability, and fairness/safety

Accepted at EMNLP 2025 (Main Conference) with a maintained paper collection for community use

Audio & Speech

New taxonomy classifies LALM evaluations across four key dimensions

arXiv eess.AS April 28, 2026

⚡ First comprehensive survey to systematically organize how large audio-language models are evaluated

Deep Dive

Researchers Chih-Kai Yang, Neo S. Ho, and Hung-yi Lee from National Taiwan University have published a comprehensive survey titled "Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey," accepted at EMNLP 2025 (Main Conference). The paper addresses the growing fragmentation in evaluating LALMs—models that combine large language models with auditory capabilities—by proposing a structured taxonomy. This taxonomy organizes LALM evaluations into four key dimensions: (1) General Auditory Awareness and Processing, covering tasks like speech recognition and sound event detection; (2) Knowledge and Reasoning, assessing comprehension and inference from audio; (3) Dialogue-oriented Ability, focusing on conversational and interactive performance; and (4) Fairness, Safety, and Trustworthiness, examining bias, robustness, and ethical concerns.
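As a rough illustration of how the four dimensions could be used to tag evaluation tasks, here is a minimal Python sketch. The `Dimension` enum, the `group_by_dimension` helper, and the `EXAMPLE_TASKS` list are hypothetical names chosen for this example; they are not taken from the paper or its paper collection, and the task labels reuse only the examples mentioned above.

```python
from collections import defaultdict
from enum import Enum


class Dimension(Enum):
    """The four evaluation dimensions named in the survey's taxonomy."""
    AUDITORY_AWARENESS = "General Auditory Awareness and Processing"
    KNOWLEDGE_REASONING = "Knowledge and Reasoning"
    DIALOGUE = "Dialogue-oriented Ability"
    TRUSTWORTHINESS = "Fairness, Safety, and Trustworthiness"


def group_by_dimension(tasks):
    """Group (task_name, Dimension) pairs into a dict keyed by dimension."""
    grouped = defaultdict(list)
    for name, dimension in tasks:
        grouped[dimension].append(name)
    return dict(grouped)


# Hypothetical task labels (assumption): illustrative only, not the
# survey's actual benchmark list.
EXAMPLE_TASKS = [
    ("speech recognition", Dimension.AUDITORY_AWARENESS),
    ("sound event detection", Dimension.AUDITORY_AWARENESS),
    ("inference from audio", Dimension.KNOWLEDGE_REASONING),
    ("conversational performance", Dimension.DIALOGUE),
    ("bias", Dimension.TRUSTWORTHINESS),
]

if __name__ == "__main__":
    for dimension, names in group_by_dimension(EXAMPLE_TASKS).items():
        print(f"{dimension.value}: {', '.join(names)}")
```

A flat enum keeps the sketch simple; the survey's own taxonomy may well have finer subcategories under each dimension, which this example does not attempt to reproduce.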

The survey provides detailed overviews of existing benchmarks within each category, identifies current challenges such as lack of standardization and limited coverage of real-world scenarios, and highlights promising future directions. As the first survey specifically focused on LALM evaluations, it offers clear guidelines for the research community. The authors will release and actively maintain a collection of surveyed papers to support ongoing advancements, making this a valuable resource for standardizing evaluation practices and accelerating progress in audio-language AI.

Key Points

First comprehensive survey specifically focused on evaluating large audio-language models (LALMs)
Proposes a four-dimension taxonomy: auditory awareness, knowledge/reasoning, dialogue ability, and fairness/safety
Accepted at EMNLP 2025 (Main Conference) with a maintained paper collection for community use

Why It Matters

Standardizing LALM evaluation frameworks is crucial for advancing reliable and safe audio AI systems.
