Audio & Speech

SpeechEditBench: New Benchmark for Instruction-Guided Speech Editing

Evaluates speech models across seven editing tasks with new metrics.

Deep Dive

SpeechEditBench is a bilingual, multi-attribute benchmark for instruction-guided speech editing, introduced by Hanlin Zhang, Daxin Tan, Dehua Tao, Xiao Chen, Haochen Tan, and Linqi Song. It comprises seven atomic editing tasks plus compositional tasks, and scores models with an anchor-based protocol on three metrics: target success, preservation success, and joint success. The evaluation finds that no single model excels across all dimensions, that closed-source Speech LLMs generally outperform open-source ones, and that compositional editing remains highly challenging, which points to the need for more robust Speech LLMs.
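To make the three metrics concrete, here is a minimal sketch of how they might be aggregated. This summary does not give the benchmark's exact definitions, so the sketch rests on assumptions: each edited sample receives two boolean judgments (was the requested change achieved, and were the other attributes preserved), and joint success requires both. The names EditResult and success_rates are hypothetical and are not taken from the benchmark's code.

```python
from dataclasses import dataclass


@dataclass
class EditResult:
    # Hypothetical per-sample judgments; the benchmark's real scoring may differ.
    target_ok: bool     # the requested attribute change was achieved
    preserved_ok: bool  # attributes not named in the instruction were left intact


def success_rates(results):
    """Return the fraction of samples passing each criterion."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to score")
    target = sum(r.target_ok for r in results) / n
    preservation = sum(r.preserved_ok for r in results) / n
    # Assumption: a sample counts as a joint success only if both checks pass.
    joint = sum(r.target_ok and r.preserved_ok for r in results) / n
    return {"target": target, "preservation": preservation, "joint": joint}


# Toy data, invented purely for illustration.
results = [
    EditResult(target_ok=True, preserved_ok=True),
    EditResult(target_ok=True, preserved_ok=False),
    EditResult(target_ok=False, preserved_ok=True),
    EditResult(target_ok=True, preserved_ok=True),
]
rates = success_rates(results)
print(rates)
```

Under this assumed definition, joint success can never exceed either of the other two rates, so a model can look strong on target success alone while scoring much lower jointly.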

Key Points
  • Introduces SpeechEditBench, a bilingual benchmark with seven atomic editing tasks plus compositional tasks.
  • Employs three evaluation metrics: target success, preservation success, and joint success.
  • Closed-source Speech LLMs generally outperform open-source ones, but no single model excels across all dimensions and compositional editing remains highly challenging.

Why It Matters

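By scoring target, preservation, and joint success separately, the benchmark gives a more precise picture of where instruction-guided speech editing models fall short, which can guide work on voice technology applications.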