Pupu-Vocoder and Pupu-Codec deliver aliasing-free neural audio synthesis
New anti-aliasing technique beats existing models on singing voice and music
Aliasing artifacts, spurious frequency components introduced by nonlinear activations and upsampling layers, have long plagued neural audio synthesis, especially for high-fidelity music and singing voice. A nonlinear activation creates harmonics above the Nyquist frequency that fold back into the audible band, and upsampling without adequate filtering leaves unwanted spectral images. In a new paper accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), a team led by Yicheng Gu introduces Pupu-Vocoder and Pupu-Codec, which build differentiable anti-aliasing directly into their activation functions and upsampling modules. The authors also built a dedicated benchmark of test signals for evaluating anti-aliased modules, and validated both models on speech, singing voice, music, and general audio benchmarks.
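To make the aliasing mechanism concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the cubic nonlinearity `cube` stands in for a real activation function, ideal DFT-based resampling stands in for whatever differentiable filters Pupu-Vocoder and Pupu-Codec actually use, and all function names are hypothetical. A pure tone at 13 cycles per 64 samples goes through x³, whose third harmonic (bin 39) lies above the Nyquist bin (32) and folds back to bin 25. Running the same nonlinearity at twice the sample rate, then low-pass filtering before downsampling, keeps that harmonic out of the output.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part, since all signals here are real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def upsample2(x):
    """Ideal band-limited 2x upsampling by zero-padding the spectrum.

    Simplification (assumption): the input's Nyquist bin is dropped.
    """
    n = len(x)
    half = n // 2
    spec = dft(x)
    up = [0j] * (2 * n)
    for k in range(half):          # DC and positive frequencies
        up[k] = 2 * spec[k]
    for k in range(1, half):       # matching negative frequencies
        up[2 * n - k] = 2 * spec[n - k]
    return idft(up)


def lowpass_downsample2(y):
    """Ideal low-pass at the original Nyquist frequency, then 2x decimation."""
    m = len(y)
    n = m // 2
    half = n // 2
    spec = dft(y)
    down = [0j] * n
    for k in range(half):          # keep only |frequency| < original Nyquist
        down[k] = spec[k] / 2
    for k in range(1, half):
        down[n - k] = spec[m - k] / 2
    return idft(down)


def cube(v):
    """Stand-in nonlinearity: cubing a sine adds a component at 3x its frequency."""
    return v ** 3


def bin_magnitude(x, k):
    """Magnitude of DFT bin k of x."""
    n = len(x)
    return abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))


N = 64                          # samples at the original rate; Nyquist is bin 32
TONE_BIN = 13                   # test tone: 13 cycles per 64 samples
ALIAS_BIN = N - 3 * TONE_BIN    # third harmonic (bin 39) folds back to bin 25

tone = [math.sin(2 * math.pi * TONE_BIN * t / N) for t in range(N)]

# Naive: apply the nonlinearity at the original rate, so bin 39 aliases to bin 25.
naive = [cube(v) for v in tone]

# Anti-aliased: upsample, apply the nonlinearity, low-pass, downsample.
anti_aliased = lowpass_downsample2([cube(v) for v in upsample2(tone)])

if __name__ == "__main__":
    print("naive alias magnitude:       ", round(bin_magnitude(naive, ALIAS_BIN), 6))
    print("anti-aliased alias magnitude:", round(bin_magnitude(anti_aliased, ALIAS_BIN), 6))
```

In this toy setup the naive path leaves a spurious component at bin 25 with one third of the tone's magnitude, while the oversampled path leaves essentially nothing there. Because every step is linear except the pointwise nonlinearity, the whole chain stays differentiable; practical anti-aliased layers typically replace the full-length DFTs with short FIR low-pass filters, which are far cheaper.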
Experimental results show that Pupu-Vocoder and Pupu-Codec significantly outperform prior state-of-the-art models on singing voice, music, and general audio tasks, while performing comparably on speech. By reducing the metallic or blurry artifacts common in current neural audio systems, the approach could improve synthetic audio for music production, voice synthesis, and audio codecs. The researchers have released demos, code, and pretrained checkpoints to support further development.
- Integrates differentiable anti-aliasing into activation and upsampling layers to suppress aliasing distortion
- Outperforms existing models on singing voice, music, and general audio; performs comparably on speech
- Accepted by TASLP; code, demos, and checkpoints are publicly available
Why It Matters
Delivers cleaner synthetic music and voice with fewer aliasing artifacts, advancing neural vocoders and audio codecs