Task-Aware Answer Preservation under Audio Compression for Large Audio Language Models
Audio compression can silently ruin answers for key query families. Here's how to prevent it.
New research from Amir Ivry tackles a hidden risk in deploying Large Audio Language Models (LALMs) such as Qwen-based models: audio compression, often used to reduce memory and latency, can silently destroy accuracy for specific, deployment-critical query families while aggregate performance looks fine. The paper, "Task-Aware Answer Preservation under Audio Compression for Large Audio Language Models," proposes a theoretical acceptance-rejection criterion for compressors based on worst-family answer error rather than the average alone.
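To make the gap between average and worst-family error concrete, here is a minimal Python sketch. The family names, the counts, and the `family_errors` helper are all hypothetical and not taken from the paper. It also assumes an "answer error" means the model's answer on compressed audio differs from its answer on the original audio; the paper's exact definition may differ.

```python
from collections import defaultdict

def family_errors(records):
    """Return {family: error_rate} from (family, answer_changed) pairs."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for family, changed in records:
        totals[family] += 1
        errors[family] += int(changed)
    return {family: errors[family] / totals[family] for family in totals}

# Hypothetical evaluation log: one large family that compresses well and
# one small family that compresses badly.
records = (
    [("sound_event", False)] * 891 + [("sound_event", True)] * 9
    + [("speaker_count", False)] * 70 + [("speaker_count", True)] * 30
)

per_family = family_errors(records)
average_error = sum(changed for _, changed in records) / len(records)
worst_family = max(per_family, key=per_family.get)

print(f"average error: {average_error:.3f}")
print(f"worst family:  {worst_family} ({per_family[worst_family]:.3f})")
```

With these made-up numbers the average error is 3.9%, while the small `speaker_count` family fails 30% of the time, which is the kind of damage an aggregate metric can hide.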
The author derives a practical sign-off protocol that returns compression budgets meeting worst-family error checks with statistical confidence. Evaluated on five multiple-choice audio QA benchmarks with two Qwen-based backbones, the protocol shows that the choice of query-family partition can change the approved budget, and it identifies regimes where query-conditioned compression improves preservation. The result is safer, more efficient audio model deployment without surprise failures on key tasks. The methodology bridges the gap between average-case metrics and real-world reliability, offering a principled way to set compression levels when per-query-type accuracy thresholds are non-negotiable.
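One way such a sign-off check could look is sketched below in Python. This is an illustration under stated assumptions, not the paper's protocol. It assumes a one-sided Hoeffding bound with a Bonferroni correction across all (budget, family) pairs, and it treats a "budget" as the number of audio tokens kept. The function names, family names, and counts are hypothetical.

```python
import math

def hoeffding_upper(errors, n, delta):
    """One-sided Hoeffding upper confidence bound on an error rate."""
    return errors / n + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def sign_off(counts, tolerance, delta):
    """Return the sorted budgets whose worst-family upper bound is within tolerance.

    counts: {budget: {family: (errors, n)}}
    """
    n_tests = sum(len(families) for families in counts.values())
    per_test_delta = delta / n_tests  # Bonferroni correction (assumption)
    approved = []
    for budget, families in counts.items():
        worst = max(
            hoeffding_upper(errors, n, per_test_delta)
            for errors, n in families.values()
        )
        if worst <= tolerance:
            approved.append(budget)
    return sorted(approved)

# Hypothetical counts: budget = audio tokens kept (assumption).
counts = {
    256: {"speech": (40, 2000), "music": (60, 2000)},
    128: {"speech": (60, 2000), "music": (100, 2000)},
    64: {"speech": (80, 2000), "music": (300, 2000)},
}

approved = sign_off(counts, tolerance=0.10, delta=0.05)
print("approved budgets:", approved)
```

In this made-up example the budget of 64 has a pooled error of 9.5%, under the 10% tolerance, yet it is rejected because the `music` family alone sits at 15%. Changing how queries are grouped into families changes the `counts` table and can therefore change which budgets are approved.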
- Proposes a worst-family error criterion instead of average accuracy to judge audio compressors for LALMs
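- Derives a statistical sign-off protocol that returns compression budgets passing worst-family error checks with statistical confidence
- Evaluated on 5 multiple-choice audio QA benchmarks with 2 Qwen-based backbones, showing that the query-family partition can change the approved budget and that query-conditioned compression improves preservation in some regimes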
Why It Matters
Ensures deployed large audio models don't fail silently on critical queries, improving reliability in real-world use.