TTT-PLC adapts packet loss concealment in real time without clean audio
Self-supervised tuning fixes dropped audio packets using only what arrives
Packet loss concealment (PLC) traditionally relies on static models to reconstruct missing audio packets. Yet the packets that do arrive in each call or recording carry signal-specific information that a frozen model never exploits. To use it, researchers from Bar-Ilan University propose TTT-PLC, a self-supervised test-time tuning framework that adapts a pretrained PLC model on the fly using only the received audio. The key idea: synthetically mask portions of the received signal, tune the model to conceal those masked portions with its native training objective, then apply the adapted model to the real packet losses. No clean reference, external data, or architecture changes are needed.
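The mask-tune-apply loop described above can be illustrated with a toy sketch. It is not the authors' implementation: the linear "model", the identity initialization (plain packet repetition standing in for a pretrained network), the mean-squared-error objective, and every name (`PACKET`, `predict`, `training_pairs`, `tune`, `conceal`) are assumptions chosen to keep the example small and dependency-free.

```python
import math

PACKET = 8  # samples per packet (toy value, an assumption)

def predict(weights, context):
    """Toy linear model: map the previous packet to the missing one."""
    return [sum(w * c for w, c in zip(row, context)) for row in weights]

def training_pairs(packets, received):
    """Self-supervised pairs: pretend a received packet is lost and use the
    received packet before it as context. Only received audio is used."""
    return [(packets[i - 1], packets[i])
            for i in range(1, len(packets))
            if received[i - 1] and received[i]]

def masked_loss(weights, pairs):
    """Mean squared error on the synthetically masked packets."""
    total = sum((y - t) ** 2
                for ctx, target in pairs
                for y, t in zip(predict(weights, ctx), target))
    return total / (len(pairs) * PACKET)

def tune(weights, pairs, steps=50, lr=0.2):
    """Full-batch gradient descent on the masked-packet loss."""
    weights = [row[:] for row in weights]  # keep the pretrained copy intact
    scale = 2.0 / (len(pairs) * PACKET)
    for _ in range(steps):
        grad = [[0.0] * PACKET for _ in range(PACKET)]
        for ctx, target in pairs:
            pred = predict(weights, ctx)
            for j in range(PACKET):
                err = pred[j] - target[j]
                for k in range(PACKET):
                    grad[j][k] += scale * err * ctx[k]
        for j in range(PACKET):
            for k in range(PACKET):
                weights[j][k] -= lr * grad[j][k]
    return weights

def conceal(weights, packets, received):
    """Fill each truly lost packet from the (possibly concealed) packet before it."""
    out = [p[:] for p in packets]
    for i in range(1, len(out)):
        if not received[i]:
            out[i] = predict(weights, out[i - 1])
    return out

# Toy signal: a sinusoid split into packets, with two packets lost.
signal = [math.sin(2 * math.pi * t / 20) for t in range(PACKET * 12)]
packets = [signal[i:i + PACKET] for i in range(0, len(signal), PACKET)]
received = [True] * len(packets)
for lost in (4, 9):
    received[lost] = False
    packets[lost] = [0.0] * PACKET  # lost packets carry no usable audio

# Stand-in for a pretrained model: identity weights, i.e. packet repetition.
pretrained = [[1.0 if j == k else 0.0 for k in range(PACKET)]
              for j in range(PACKET)]

pairs = training_pairs(packets, received)
adapted = tune(pretrained, pairs)
loss_before = masked_loss(pretrained, pairs)
loss_after = masked_loss(adapted, pairs)
restored = conceal(adapted, packets, received)
```

The tuning step never touches the lost packets or any clean reference: its only supervision is received audio that was hidden on purpose. A real backbone would replace the linear map and the squared-error loss with its own architecture and training objective.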
TTT-PLC was tested on two public PLC backbones: FRN (a recurrent full-band speech model) and PARCnet (a hybrid autoregressive-neural model for music). In the non-causal setting, where the entire received file is available before reconstruction, the model performs multiple adaptation passes to approach a per-file performance ceiling. In the causal streaming setting, adaptation runs only on past completed blocks, and updated parameters affect only future audio. The authors report significant improvement over the static baselines, suggesting that pretrained PLC models need not remain frozen at inference time. The paper is under submission to IEEE TASLP.
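The causal schedule can be sketched as a loop that conceals each block with the parameters available at that moment and adapts only once the block is complete. This is a hypothetical illustration of the ordering, not the paper's procedure: the one-parameter gain model, the block and packet sizes, the learning rate, and the names `adapt` and `stream` are all assumptions.

```python
def adapt(gain, packets, received, lr=0.5):
    """One gradient step per adjacent pair of received packets in a completed
    block: treat the later packet as if it were lost and fit gain * previous."""
    for i in range(1, len(packets)):
        if received[i - 1] and received[i]:
            prev, cur = packets[i - 1], packets[i]
            grad = sum(2 * (gain * p - c) * p for p, c in zip(prev, cur)) / len(cur)
            gain -= lr * grad
    return gain

def stream(blocks, gain):
    """Yield (output_block, gain_used) in arrival order; outputs are never revised."""
    prev_out = None
    for packets, received in blocks:
        gain_used = gain  # frozen while this block is concealed and emitted
        out = []
        for pkt, ok in zip(packets, received):
            if ok:
                filled = list(pkt)
            elif prev_out is None:
                filled = [0.0] * len(pkt)  # nothing to extrapolate from yet
            else:
                filled = [gain_used * s for s in prev_out]
            out.append(filled)
            prev_out = filled
        yield out, gain_used
        gain = adapt(gain, packets, received)  # affects later blocks only

# Toy data: a decaying signal, 4 samples per packet, 3 packets per block.
SAMPLES, PER_BLOCK = 4, 3
signal = [0.98 ** t for t in range(SAMPLES * PER_BLOCK * 4)]
packets = [signal[i:i + SAMPLES] for i in range(0, len(signal), SAMPLES)]
received = [True] * len(packets)
for lost in (4, 7):
    received[lost] = False
    packets[lost] = [0.0] * SAMPLES  # lost packets carry no usable audio

blocks = [(packets[i:i + PER_BLOCK], received[i:i + PER_BLOCK])
          for i in range(0, len(packets), PER_BLOCK)]

results = list(stream(blocks, gain=1.0))  # gain 1.0 = plain packet repetition
outputs = [out for out, _ in results]
gains_used = [g for _, g in results]
```

Because `gain_used` is captured before the block is processed and `adapt` runs only after the block has been emitted, a parameter update can never change audio that was already played out. That is the constraint the causal setting imposes.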
- Self-supervised adaptation uses only received packets, no clean reference needed
- Tested on FRN (speech) and PARCnet (music) backbones across two deployment modes
- Causal setting allows real-time streaming adaptation without revising past samples
Why It Matters
Audio quality can improve during the call itself, with no offline retraining and no clean reference audio, which is critical for VoIP and streaming.