Audio & Speech

Multi-Stage Music Source Restoration with BandSplit-RoFormer Separation and HiFi++ GAN

New system uses a two-stage AI pipeline to separate and restore 8 individual instrument tracks from final, processed songs.

Deep Dive

A research team from CP-JKU (the Institute of Computational Perception at Johannes Kepler University Linz) has presented an AI system for Music Source Restoration (MSR). The system, described in a technical report for the ICASSP 2025 Challenge, aims to recover the original, unprocessed recordings of individual instruments (stems) from a final, mastered music track. The task is hard because mastering applies effects such as compression and EQ, so the finished track is no longer a simple sum of its stems, which is the linear-mixture assumption that traditional source separation models rely on. The team's solution splits the problem into two stages: separation, then restoration.
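To make the linear-mixture point concrete, here is a minimal sketch using only the Python standard library. The `compress` function is a toy soft-clipper chosen for illustration, not the effect chain from the report, and the two sine-wave "stems" and all parameter values are assumptions.

```python
import math

def compress(x, drive=3.0):
    """Toy soft-clipping 'mastering' effect (illustrative stand-in only)."""
    return math.tanh(drive * x) / math.tanh(drive)

SAMPLE_RATE = 8000   # assumed sample rate in Hz
NUM_SAMPLES = 256

# Two synthetic stems: a low "bass" tone and a higher "vocals" tone.
bass = [0.6 * math.sin(2 * math.pi * 110 * t / SAMPLE_RATE) for t in range(NUM_SAMPLES)]
vocals = [0.6 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(NUM_SAMPLES)]

# Linear-mixture assumption: the mix is just the sum of the stems.
linear_mix = [b + v for b, v in zip(bass, vocals)]

# A mastered mix passes that sum through a nonlinear effect.
mastered_mix = [compress(m) for m in linear_mix]

# Gap 1: the mastered mix no longer equals the sum of the clean stems.
error_vs_clean_sum = max(abs(m - s) for m, s in zip(mastered_mix, linear_mix))

# Gap 2: the effect is not additive, so processing each stem on its own
# and summing does not reproduce the mastered mix either.
sum_of_processed = [compress(b) + compress(v) for b, v in zip(bass, vocals)]
additivity_gap = max(abs(m - s) for m, s in zip(mastered_mix, sum_of_processed))

print(f"max |mastered - sum of clean stems|     = {error_vs_clean_sum:.3f}")
print(f"max |mastered - sum of processed stems| = {additivity_gap:.3f}")
```

Both gaps come out clearly nonzero, which is why a model that only splits a mix into additive parts cannot by itself return the unprocessed stems.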

The first stage uses a BandSplit-RoFormer model to separate a full mix into eight distinct instrument stems (e.g., bass, drums, vocals) plus an auxiliary 'other' stem. This model was trained using a three-stage curriculum, starting with a 4-stem warm-up using LoRA fine-tuning before expanding to 8 stems. The second stage uses a HiFi++ Generative Adversarial Network (GAN) as a waveform restorer. This component was first trained as a generalist model and then specialized into eight expert models, one per instrument type, which restore audio quality and remove artifacts introduced during the initial separation. The combined approach is aimed at high-fidelity, practical music demixing and remastering tools for audio professionals.
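The control flow of such a separate-then-restore pipeline can be sketched as follows. This is a structural illustration only: `separate`, `make_expert`, `EXPERTS`, and `restore_mix` are hypothetical names, the placeholder bodies do no real audio modeling, and only the three stems named above are included rather than the full set of eight.

```python
from typing import Callable, Dict, List

Waveform = List[float]

# Subset of stems for illustration; the described system handles eight.
STEMS = ["bass", "drums", "vocals"]

def separate(mix: Waveform) -> Dict[str, Waveform]:
    """Stage 1 placeholder standing in for the separator.

    It splits the mix into equal shares so the example runs;
    a real separator would be a trained neural network.
    """
    share = 1.0 / len(STEMS)
    return {name: [share * x for x in mix] for name in STEMS}

def make_expert(gain: float) -> Callable[[Waveform], Waveform]:
    """Stage 2 placeholder standing in for one per-instrument restorer."""
    def restore(stem: Waveform) -> Waveform:
        # Hypothetical "restoration": apply a gain and clamp to [-1, 1].
        return [max(-1.0, min(1.0, gain * x)) for x in stem]
    return restore

# One expert per stem, mirroring the generalist-then-specialist idea.
EXPERTS: Dict[str, Callable[[Waveform], Waveform]] = {
    "bass": make_expert(1.0),
    "drums": make_expert(1.1),
    "vocals": make_expert(0.9),
}

def restore_mix(mix: Waveform) -> Dict[str, Waveform]:
    """Run separation, then route each estimated stem to its own expert."""
    estimates = separate(mix)
    return {name: EXPERTS[name](stem) for name, stem in estimates.items()}

restored = restore_mix([0.3, -0.6, 0.9])
for name, stem in restored.items():
    print(name, [round(x, 3) for x in stem])
```

Keying the experts by stem name keeps the routing explicit: each separated estimate is passed only to the restorer specialized for that instrument.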

Key Points
  • Uses a two-stage AI pipeline: BandSplit-RoFormer for separation and HiFi++ GAN for restoration.
  • Separates a full music mix into 8 specific instrument stems plus an auxiliary 'other' track.
  • Trained with a three-stage curriculum, progressing from 4-stem to 8-stem separation using LoRA fine-tuning.

Why It Matters

Enables audio engineers and producers to remix, sample, or analyze individual parts from finished songs, unlocking new creative and analytical workflows.