Audio & Speech

TTT-PLC adapts packet loss concealment in real time without clean audio

arXiv eess.AS July 03, 2026

⚡Self-supervised tuning fixes dropped audio packets using only what arrives

Deep Dive

Packet loss concealment (PLC) traditionally relies on static models to reconstruct missing audio packets. Yet the packets that do arrive in each call or recording carry signal-specific information that a frozen model never exploits. To address this, researchers from Bar-Ilan University propose TTT-PLC, a self-supervised test-time tuning framework that adapts a pretrained PLC model on the fly using only the received audio. The key insight: synthetically mask portions of the available signal, train the model to conceal those masked parts with its native objective, then apply the adapted model to the real packet losses. No clean reference, external data, or architecture changes are needed.
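The mask, tune, apply loop can be illustrated with a toy sketch. This is not the paper's implementation: the "model" here is a hypothetical two-weight linear predictor that fills a packet from the two packets before it, standing in for a pretrained backbone, and the signal is a pure tone. The names `conceal`, `tune`, the packet size `P`, and the learning rate are all assumptions made for illustration.

```python
import math

P = 16        # samples per packet (toy value, an assumption)
PERIOD = 64   # period of the toy test tone, in samples

def conceal(buf, start, w):
    """Predict the packet at `start` from the two packets before it."""
    return [w[0] * buf[n - P] + w[1] * buf[n - 2 * P]
            for n in range(start, start + P)]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def tune(buf, received, w, passes=20, lr=0.5):
    """Adapt w using received packets only, hiding one packet at a time."""
    w = list(w)
    # A packet can serve as a synthetic loss only if it and its context arrived.
    usable = [i for i in range(2, len(received))
              if received[i] and received[i - 1] and received[i - 2]]
    for _ in range(passes):
        for i in usable:
            start = i * P
            target = buf[start:start + P]   # held out as the "masked" packet
            pred = conceal(buf, start, w)   # reads earlier packets only
            g0 = g1 = 0.0
            for k, n in enumerate(range(start, start + P)):
                err = pred[k] - target[k]
                g0 += 2 * err * buf[n - P] / P       # d(MSE)/d(w[0])
                g1 += 2 * err * buf[n - 2 * P] / P   # d(MSE)/d(w[1])
            w[0] -= lr * g0
            w[1] -= lr * g1
    return w

# Toy data: a pure tone with three packets genuinely lost (zeroed).
clean = [math.sin(2 * math.pi * n / PERIOD + 0.3) for n in range(20 * P)]
lost = {5, 11, 16}
received = [i not in lost for i in range(20)]
rx = [s if received[n // P] else 0.0 for n, s in enumerate(clean)]

def real_loss_error(w):
    # Evaluation only: `clean` would not be available in deployment.
    return sum(mse(conceal(rx, i * P, w), clean[i * P:(i + 1) * P])
               for i in sorted(lost)) / len(lost)

w_pretrained = [1.0, 0.0]   # stand-in for frozen weights (packet repetition)
w_adapted = tune(rx, received, w_pretrained)
baseline_mse = real_loss_error(w_pretrained)
adapted_mse = real_loss_error(w_adapted)
print(f"frozen: {baseline_mse:.4f}  adapted: {adapted_mse:.2e}")
```

The tuning step never touches `clean`; it only compares the predictor's output against packets that actually arrived, which is the self-supervised part. A real backbone would replace the hand-written gradient with its own training objective and optimizer.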

TTT-PLC was tested on two public PLC backbones: FRN (a recurrent full-band speech model) and PARCnet (a hybrid autoregressive-neural model for music). In the non-causal setting, where the entire received file is available before reconstruction, the model performs multiple adaptation passes to reach a per-file ceiling. In the causal streaming setting, adaptation runs only on past completed blocks, and the updated parameters affect only future audio. The authors report significant improvements over the static baselines, suggesting that pretrained PLC models need not stay frozen at inference time. The paper is under submission to IEEE TASLP.
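The causal mode can be pictured with a similar toy sketch, again under loud assumptions: a hypothetical two-weight linear predictor replaces the real backbone, the input is a pure tone, and one gradient step is taken per completed block. The point is only the ordering: a lost packet is concealed with the current weights and emitted, and any later weight update never revisits samples already output.

```python
import math

P = 16        # samples per packet (toy value, an assumption)
PERIOD = 64   # period of the toy test tone, in samples

def conceal(out, start, w):
    """Predict the packet at `start` from the two packets already emitted."""
    return [w[0] * out[n - P] + w[1] * out[n - 2 * P]
            for n in range(start, start + P)]

def grad_step(out, start, w, lr=0.5):
    """One update that treats a completed, received packet as a synthetic loss."""
    pred = conceal(out, start, w)
    g0 = g1 = 0.0
    for k, n in enumerate(range(start, start + P)):
        err = pred[k] - out[n]
        g0 += 2 * err * out[n - P] / P
        g1 += 2 * err * out[n - 2 * P] / P
    return [w[0] - lr * g0, w[1] - lr * g1]

def stream(packets, w, adapt=True):
    """packets: one list of P samples per packet, or None where a packet was lost."""
    out, ok = [], []
    w = list(w)
    for i, pkt in enumerate(packets):
        start = i * P
        if pkt is None:
            # Conceal with the weights available right now; the output is final.
            out.extend(conceal(out, start, w) if i >= 2 else [0.0] * P)
            ok.append(False)
        else:
            out.extend(pkt)
            ok.append(True)
            # Adapt on this completed block; only future packets are affected.
            if adapt and i >= 2 and ok[i - 1] and ok[i - 2]:
                w = grad_step(out, start, w)
    return out, w

# Toy stream: a pure tone with four isolated losses.
clean = [math.sin(2 * math.pi * n / PERIOD + 0.3) for n in range(28 * P)]
lost = [6, 12, 18, 24]
packets = [None if i in lost else clean[i * P:(i + 1) * P] for i in range(28)]

def per_loss_mse(out):
    # Evaluation only: `clean` would not be available in deployment.
    return [sum((out[n] - clean[n]) ** 2 for n in range(i * P, (i + 1) * P)) / P
            for i in lost]

frozen_out, _ = stream(packets, [1.0, 0.0], adapt=False)
adaptive_out, w_final = stream(packets, [1.0, 0.0], adapt=True)
frozen_err = per_loss_mse(frozen_out)
adaptive_err = per_loss_mse(adaptive_out)
print("frozen:  ", [f"{e:.3f}" for e in frozen_err])
print("adaptive:", [f"{e:.3f}" for e in adaptive_err])
```

In this toy run the concealment error on later losses is lower than on the first one, because more completed blocks have contributed updates by then. That mirrors the idea that streaming adaptation pays off progressively over a call.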

Key Points

Self-supervised adaptation uses only received packets, no clean reference needed
Tested on FRN (speech) and PARCnet (music) backbones across two deployment modes
Causal setting allows real-time streaming adaptation without revising past samples

Why It Matters

Audio quality improves in real time without retraining a model per call, which is critical for VoIP and streaming.
