Tsinghua's TIGER model slashes speech separation compute by 95% with new EchoSet dataset

arXiv eess.AS March 02, 2026

⚡ TIGER cuts parameters by 94.3% and computational cost (MACs) by 95.3% while beating the previous SOTA.

Deep Dive

A research team from Tsinghua University has introduced TIGER (Time-frequency Interleaved Gain Extraction and Reconstruction), a speech separation model designed for extreme efficiency. Accepted at ICLR 2025, TIGER targets low-latency speech processing, where efficiency matters as much as separation quality. The model uses prior knowledge to divide the spectrum into frequency bands and compress the information in each band. It then applies a multi-scale selective attention module and a full-frequency-frame attention module to capture contextual information. The team also released EchoSet, a new benchmark dataset featuring realistic acoustic challenges such as noise, reverberation, and object occlusions, to better evaluate model performance in complex, real-world environments.
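To make the band-division idea concrete, here is a minimal sketch in plain Python. It is not TIGER's implementation: the bin count, the band plan (`NUM_BINS`, `BAND_PLAN`), and both function names are hypothetical, and the per-band averaging is a simple stand-in for whatever learned compression the model actually uses. It only illustrates the general pattern of giving low frequencies narrow bands and high frequencies wide ones, so that many bins collapse into far fewer band features.

```python
# Illustrative only: band widths, bin count, and averaging are assumptions,
# not values or operations taken from the TIGER paper.

def split_into_bands(num_bins, band_plan):
    """Return contiguous (start, end) bin ranges covering [0, num_bins).

    band_plan is a list of (width, count) pairs applied in order.
    Any bins left over after the plan form one final band.
    """
    bands = []
    start = 0
    for width, count in band_plan:
        for _ in range(count):
            if start >= num_bins:
                return bands
            end = min(start + width, num_bins)
            bands.append((start, end))
            start = end
    if start < num_bins:
        bands.append((start, num_bins))
    return bands


def compress_bands(frame, bands):
    """Reduce one spectral frame to a single value per band (mean pooling)."""
    return [sum(frame[s:e]) / (e - s) for s, e in bands]


# Hypothetical plan: fine resolution at low frequencies, coarse at high ones.
NUM_BINS = 321
BAND_PLAN = [(1, 40), (4, 30), (16, 10)]

bands = split_into_bands(NUM_BINS, BAND_PLAN)
frame = [1.0] * NUM_BINS  # dummy magnitude spectrum for one frame
compressed = compress_bands(frame, bands)

print(f"{NUM_BINS} frequency bins -> {len(compressed)} band features")
```

Under these assumed settings, later layers would operate on a sequence of band features much shorter than the raw frequency axis, which is one plausible source of the compute savings the article describes.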

TIGER's architecture yields large efficiency gains: it reduces the number of parameters by 94.3% and the computational cost, measured in multiply-accumulate operations (MACs), by 95.3% compared to previous models. Despite this, it surpasses the previous state-of-the-art model, TF-GridNet, particularly when trained and tested on the new EchoSet data. EchoSet is a significant contribution in its own right: models trained on it generalized better to physical-world recordings than models trained on existing datasets. Together, a highly efficient model and a more realistic evaluation framework pave the way for deploying advanced speech separation in resource-constrained, real-time applications.
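The percentage reductions are easier to interpret as shrink factors. The short calculation below uses only the two figures quoted above; the helper names are made up for illustration, and no absolute parameter or MAC counts are assumed because the article does not give any.

```python
def remaining_fraction(reduction_percent):
    """Fraction of the baseline that is left after a percentage reduction."""
    return 1.0 - reduction_percent / 100.0


def shrink_factor(reduction_percent):
    """How many times smaller the reduced quantity is than the baseline."""
    return 1.0 / remaining_fraction(reduction_percent)


for name, pct in [("parameters", 94.3), ("MACs", 95.3)]:
    left = remaining_fraction(pct)
    factor = shrink_factor(pct)
    print(f"{name}: {left:.1%} of baseline remains, about {factor:.1f}x smaller")
```

In other words, a 94.3% cut leaves 5.7% of the baseline parameters (roughly 17.5 times fewer), and a 95.3% cut leaves 4.7% of the baseline MACs (roughly 21.3 times fewer).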

Key Points

Achieves 94.3% fewer parameters and 95.3% lower computational cost (MACs) than previous models.
Outperforms the previous SOTA model TF-GridNet, especially when trained on the new EchoSet dataset.
Introduces the EchoSet dataset with realistic noise and reverberation for better real-world model evaluation.

Why It Matters

Enables high-quality, real-time speech separation on low-power devices like hearing aids, smart speakers, and conferencing systems.
