Audio & Speech

Pupu-Vocoder and Pupu-Codec deliver aliasing-free neural audio synthesis

New anti-aliasing technique beats existing models on singing voice and music

Deep Dive

Aliasing artifacts have long plagued neural audio synthesis, especially for high-fidelity music and singing voices. They arise when nonlinear activations and upsampling layers generate frequency content above the Nyquist limit, which then folds back into the signal as distortion. In a new paper accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), a team led by Yicheng Gu introduces Pupu-Vocoder and Pupu-Codec, which integrate differentiable anti-aliasing techniques directly into the activation functions and upsampling modules. The authors also built a dedicated test-signal benchmark to evaluate anti-aliased modules and validated their models across speech, singing voice, music, and general audio benchmarks.
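
To make the aliasing mechanism concrete, here is a minimal sketch of the generic "oversample, activate, low-pass, decimate" idea. It is not the Pupu-Vocoder or Pupu-Codec implementation, whose details are not described here. Assumptions: squaring stands in for a learned activation, a brickwall DFT filter stands in for a practical filter, and every name (`dft`, `lowpass`, `anti_aliased`, `TONE_BIN`, and so on) is hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for a toy signal)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def lowpass(x, cutoff_bin):
    """Brickwall low-pass: zero every bin at or above cutoff_bin (both halves)."""
    n = len(x)
    spec = dft(x)
    kept = [v if min(k, n - k) < cutoff_bin else 0 for k, v in enumerate(spec)]
    return idft(kept)


def activation(x):
    """Stand-in nonlinearity (assumption): squaring doubles every frequency."""
    return [v * v for v in x]


def naive(x):
    """Apply the nonlinearity directly at the original sample rate."""
    return activation(x)


def anti_aliased(x):
    """Oversample 2x, apply the nonlinearity, low-pass, then decimate by 2."""
    n = len(x)
    stuffed = [0.0] * (2 * n)
    stuffed[::2] = x                                   # zero-stuffing
    up = [2.0 * v for v in lowpass(stuffed, n // 2)]   # interpolation filter
    act = activation(up)                               # harmonics stay below the new Nyquist
    return lowpass(act, n // 2)[::2]                   # drop out-of-band energy, decimate


def bin_amplitude(x, k):
    """Amplitude of the sinusoid at DFT bin k (valid for 0 < k < n/2)."""
    return 2.0 * abs(dft(x)[k]) / len(x)


N, TONE_BIN = 64, 20                 # tone at 20/64 of the sample rate
tone = [math.sin(2 * math.pi * TONE_BIN * t / N) for t in range(N)]

# Squaring creates a component at bin 40, above Nyquist (bin 32); it folds to bin 24.
ALIAS_BIN = N - 2 * TONE_BIN
naive_alias = bin_amplitude(naive(tone), ALIAS_BIN)
clean_alias = bin_amplitude(anti_aliased(tone), ALIAS_BIN)

print(f"alias amplitude, naive:        {naive_alias:.6f}")
print(f"alias amplitude, anti-aliased: {clean_alias:.6f}")
```

In this toy case the naive path leaves a spurious tone of amplitude about 0.5 at bin 24, while the oversampled path removes it up to floating-point error. Squaring is a deliberately easy case because it only doubles the bandwidth, so 2x oversampling is exactly sufficient. Activations such as ReLU or tanh produce unbounded harmonic series, so oversampling attenuates aliasing rather than removing it, and a practical system would use a finite-length filter instead of a brickwall DFT filter.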

Experimental results show that Pupu-Vocoder and Pupu-Codec significantly outperform prior state-of-the-art models on singing voice, music, and general audio tasks, while achieving comparable performance on speech. The approach could improve synthetic audio quality in music production, voice synthesis, and audio codecs by reducing the metallic or blurry artifacts common in current neural audio systems. The researchers have released demos, code, and pretrained checkpoints to support further development.

Key Points
  • Integrates differentiable anti-aliasing into activation and upsampling layers to suppress aliasing artifacts
  • Outperforms existing models on singing voice, music, and general audio; matches speech quality
  • Accepted by TASLP; code, demos, and checkpoints are publicly available

Why It Matters

The work delivers cleaner synthetic music and voice with fewer aliasing artifacts, advancing neural vocoders and audio codecs.