Token-Based Audio Inpainting via Discrete Diffusion
New method uses tokenized audio and discrete diffusion to restore missing segments up to 750ms long.
Researchers from multiple institutions built a token-based audio inpainting model using discrete diffusion. It tokenizes music with a pre-trained audio tokenizer and introduces two training innovations: a derivative-based regularization loss and a span-based absorbing transition. The model outperforms baselines on the MusicNet and MAESTRO datasets, especially for gaps of 150 ms and longer, enabling stable, semantically coherent restoration of long missing segments in degraded audio recordings.
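The span-based absorbing transition can be pictured as corrupting a contiguous run of discrete audio tokens with a single absorbing mask symbol, mirroring a missing audio segment, rather than masking tokens independently as in standard absorbing-state discrete diffusion. A minimal sketch (the `MASK_ID` value and `span_absorb` helper are illustrative assumptions, not the authors' implementation):

```python
import random

MASK_ID = 1024  # hypothetical absorbing [MASK] id outside the tokenizer's codebook


def span_absorb(tokens, span_len):
    """Absorb one contiguous span of tokens into the MASK state.

    Unlike per-token absorbing diffusion, which masks positions
    independently, this corrupts a whole span at once, matching the
    structure of a missing segment in an audio recording.
    """
    start = random.randrange(0, len(tokens) - span_len + 1)
    corrupted = list(tokens)
    corrupted[start:start + span_len] = [MASK_ID] * span_len
    return corrupted, start


tokens = list(range(20))  # toy token sequence from an audio tokenizer
noisy, start = span_absorb(tokens, span_len=5)
assert noisy[start:start + 5] == [MASK_ID] * 5  # span fully absorbed
assert sum(t == MASK_ID for t in noisy) == 5    # nothing outside the span masked
```

During training, the model would learn to reverse this corruption, predicting the original tokens inside the absorbed span conditioned on the surrounding context.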
Why It Matters
Enables high-quality restoration of damaged music recordings and audio files, advancing professional audio editing and archival work.