FMelCodec compresses speech to 250 bps with flow-matching refinement

arXiv eess.AS May 26, 2026

⚡ New codec achieves 640x compression while preserving speaker identity and naturalness.

Deep Dive

A team of researchers has introduced FMelCodec, a neural speech codec designed for ultra-low-bitrate communication. It operates at 250 bps for 16 kHz audio and 750 bps for 48 kHz, a reported 640x compression ratio, while preserving speech naturalness and speaker identity. The codec is built around a three-stage coding-refinement-reconstruction (CRR) framework that targets the information loss and quantization instability typical at such extreme bit budgets.
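The summary does not say what baseline the 640x figure is measured against. One reading that matches the reported numbers is sketched below, under the assumption that the codec emits one index from its single 1024-entry codebook per frame with fixed-length coding (no entropy coding). The function name `token_budget` is hypothetical and used only for this illustration.

```python
import math

def token_budget(bitrate_bps, codebook_size, sample_rate_hz):
    """Return (bits per token, tokens per second, samples per token).

    Assumes one fixed-length codebook index per frame; this is an
    assumption for illustration, not a detail confirmed by the article.
    """
    bits_per_token = math.log2(codebook_size)
    tokens_per_second = bitrate_bps / bits_per_token
    samples_per_token = sample_rate_hz / tokens_per_second
    return bits_per_token, tokens_per_second, samples_per_token

for rate_hz, bps in [(16_000, 250), (48_000, 750)]:
    bits, tokens, hop = token_budget(bps, 1024, rate_hz)
    print(f"{rate_hz} Hz at {bps} bps: {bits:.0f} bits/token, "
          f"{tokens:.0f} tokens/s, {hop:.0f} samples/token")
# 16000 Hz at 250 bps: 10 bits/token, 25 tokens/s, 640 samples/token
# 48000 Hz at 750 bps: 10 bits/token, 75 tokens/s, 640 samples/token
```

Under that assumption, both operating points map 640 waveform samples to a single 10-bit token, which is consistent with the 640x figure. Measured against 16-bit PCM instead, the same bitrates would correspond to a larger ratio.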

The first stage is an aggressively compressing encoder-decoder with a single 1024-entry vector quantization (VQ) codebook, paired with an online clustering strategy to prevent codebook collapse. The second stage applies conditional flow matching (CFM) to refine the degraded mel-spectrogram, using a lightweight velocity-field estimator and a self-consistency training scheme that reduces the number of iterative inference steps. Finally, a HiFi-GAN vocoder reconstructs the waveform from the refined spectrogram. In experiments across multiple datasets and sampling rates, FMelCodec outperforms existing codecs in both objective and subjective evaluations, with higher reconstruction quality and lower computational overhead. The approach could enable high-fidelity voice communication over severely bandwidth-constrained channels, such as satellite links or IoT networks.
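The first two stages can be illustrated with a toy sketch. It is not the authors' implementation: the codebook is random, the frame has only four dimensions, and `toy_velocity` is a hypothetical closed-form stand-in for the trained velocity-field estimator. Starting the flow from Gaussian noise and conditioning on the decoded codeword is also an assumption; the summary does not specify either. The HiFi-GAN vocoder stage is omitted.

```python
import random

def quantize(frame, codebook):
    """Return (index, codeword) of the nearest entry by squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    index = min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))
    return index, codebook[index]

def euler_refine(x0, condition, velocity_fn, num_steps=4):
    """Integrate dx/dt = v(x, t, condition) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # t stays strictly below 1 inside the loop
        v = velocity_fn(x, t, condition)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def toy_velocity(x, t, condition):
    # Hypothetical stand-in for a trained network: the straight-line
    # velocity that carries x onto `condition` by t = 1.
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, condition)]

random.seed(0)
# Random toy codebook with 1024 entries (a real one would be learned).
codebook = [[random.gauss(0, 1) for _ in range(4)] for _ in range(1024)]
frame = [0.2, -0.5, 1.0, 0.3]              # stand-in for one encoder output frame
index, coarse = quantize(frame, codebook)  # only `index` (10 bits) would be transmitted
noise = [random.gauss(0, 1) for _ in frame]
refined = euler_refine(noise, coarse, toy_velocity, num_steps=4)
print(index, max(abs(r - c) for r, c in zip(refined, coarse)))
```

Because the toy velocity field simply transports the sample to the conditioning vector, the example demonstrates only the quantization lookup and the few-step sampling loop. A learned estimator would instead steer the sample toward a plausible clean mel-spectrogram frame, and the small `num_steps` stands in for the reduced number of inference steps the article attributes to self-consistency training.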

Key Points

FMelCodec operates at 250 bps for 16 kHz audio and 750 bps for 48 kHz, achieving 640x compression.
Uses a three-stage CRR framework: aggressive VQ codec, conditional flow matching refinement, and HiFi-GAN vocoder.
Outperforms existing ultra-low-bitrate codecs in speech quality and speaker similarity with lower complexity.

Why It Matters

Enables high-quality voice calls over extremely low-bandwidth networks, from satellite to IoT.
