New protocol preserves critical answers when compressing audio for LALMs
Audio compression can silently ruin answers for key query families. Here's how to prevent it.
New research from Amir Ivry tackles a hidden risk in deploying Large Audio Language Models (LALMs) such as Qwen-based models: audio compression, often used to reduce memory and latency, can silently destroy accuracy for specific, deployment-critical query families while aggregate performance still looks fine. The paper, "Task-Aware Answer Preservation under Audio Compression for Large Audio Language Models," proposes a theoretical acceptance-rejection criterion for compressors that is based on worst-family answer error rather than average error alone.
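To make the contrast between the two criteria concrete, here is one way such a rule could be written. The notation is assumed for illustration and is not taken from the paper: $M$ is the model, $C$ the compressor, $\mathcal{F}$ the set of query families, $w_f$ the share of traffic in family $f$, and $\varepsilon$ the tolerated error.

```latex
\[
\mathrm{err}_f(C) = \Pr_{(a,q)\sim\mathcal{D}_f}\big[\, M(C(a), q) \neq M(a, q) \,\big]
\]
\[
\text{average rule: } \sum_{f\in\mathcal{F}} w_f\,\mathrm{err}_f(C) \le \varepsilon
\qquad
\text{worst-family rule: } \max_{f\in\mathcal{F}} \mathrm{err}_f(C) \le \varepsilon
\]
```

With made-up numbers, two families with $w = (0.9, 0.1)$ and errors $(0.01, 0.30)$ give an average of $0.9 \cdot 0.01 + 0.1 \cdot 0.30 = 0.039$, which passes at $\varepsilon = 0.05$, while the worst-family error of $0.30$ does not.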
The author derives a practical sign-off protocol that returns compression budgets meeting worst-family error checks with statistical confidence. Evaluated on five multiple-choice audio QA benchmarks with two Qwen-based backbones, the protocol shows that the choice of query-family partition can change which budget is approved, and it identifies regimes where query-conditioned compression improves answer preservation. The methodology bridges the gap between average-case metrics and real-world reliability: it offers a principled way to set compression levels when per-query-type accuracy thresholds are non-negotiable, so models can be deployed more efficiently without surprise failures on key tasks.
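A minimal sketch of what a sign-off check of this kind might look like is shown here. It is not the paper's protocol: the Hoeffding bound, the Bonferroni split of the confidence level, the function names, and the synthetic counts are all assumptions made for illustration. A "budget" is taken to be any number where smaller means more aggressive compression.

```python
import math
from collections import defaultdict


def hoeffding_upper_bound(errors, n, delta):
    """One-sided Hoeffding upper confidence bound on an error rate."""
    return errors / n + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def sign_off(records, budgets, epsilon, delta):
    """Return (budget, worst_bound) for the smallest passing budget.

    records: iterable of (budget, family, answer_changed), where
    answer_changed is True when the answer on compressed audio differs
    from the answer on uncompressed audio. Hypothetical data layout.
    """
    counts = defaultdict(lambda: [0, 0])  # (budget, family) -> [errors, total]
    for budget, family, changed in records:
        counts[(budget, family)][0] += int(changed)
        counts[(budget, family)][1] += 1

    # Bonferroni correction (assumed): split delta over every check made.
    n_checks = len(counts)
    for budget in sorted(budgets):
        bounds = [
            hoeffding_upper_bound(e, n, delta / n_checks)
            for (b, _family), (e, n) in counts.items()
            if b == budget
        ]
        # Accept only if even the worst family stays under the threshold.
        if bounds and max(bounds) <= epsilon:
            return budget, max(bounds)
    return None, None


def make(budget, family, errors, total):
    """Build synthetic records with a fixed number of changed answers."""
    return [(budget, family, i < errors) for i in range(total)]


# Synthetic example: at budget 25 the "music" family degrades badly
# even though the "speech" family is untouched.
records = (
    make(25, "speech", 0, 2000) + make(25, "music", 400, 2000)
    + make(50, "speech", 20, 2000) + make(50, "music", 100, 2000)
    + make(100, "speech", 0, 2000) + make(100, "music", 0, 2000)
)
approved, worst_bound = sign_off(records, [25, 50, 100], epsilon=0.10, delta=0.05)
print(approved, worst_bound)
```

In this toy setup the most aggressive budget is rejected because of a single family, which is the failure mode an average-only check would be more likely to miss. Regrouping the same queries into different families changes the per-family counts, so it can change which budget passes.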
- Proposes a worst-family error criterion instead of average accuracy to judge audio compressors for LALMs
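- Derives a sign-off protocol that returns compression budgets passing worst-family error checks with statistical confidence
- Evaluated on 5 multiple-choice audio QA benchmarks with 2 Qwen-based backbones, showing that the query-family partition can change the approved budget and that query-conditioned compression improves preservation in some regimes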
Why It Matters
Helps keep deployed large audio language models from failing silently on critical query families after compression, improving reliability in real-world use.