CFMDCTCodec delivers high-quality speech at 0.65 kbps

arXiv eess.AS May 27, 2026

⚡This neural codec uses conditional flow matching to restore fine spectral details...

Deep Dive

CFMDCTCodec tackles the challenge of high-quality speech coding at extremely low bitrates, a critical need for bandwidth-constrained applications. The system operates entirely in the modified discrete cosine transform (MDCT) domain, using a lightweight encoder-quantizer-decoder architecture to produce a coarse spectral reconstruction. To restore fine-grained details lost during compression, it introduces a noise-prior-aware conditional flow matching (CFM) enhancer that integrates a conditional MDCT velocity-field filter with an ordinary differential equation (ODE) solver. This enhancer is guided by an MDCT-derived magnitude-adaptive noise prior that emphasizes perceptually important high-energy regions while stabilizing low-energy and silent areas.
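The enhancement step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the noise-scaling rule (standard deviation growing with MDCT magnitude above a small floor), the fixed-step Euler solver, and the toy velocity field are all hypothetical stand-ins. In the actual system the velocity field would be a trained network conditioned on the coarse reconstruction.

```python
import numpy as np

def magnitude_adaptive_prior(coarse_mdct, rng, floor=0.01):
    """Draw a starting point around the coarse MDCT reconstruction.

    ASSUMPTION: the noise std grows with coefficient magnitude and has a
    small floor, so silent regions receive little noise. The paper's exact
    prior is not given in this summary.
    """
    mag = np.abs(coarse_mdct)
    sigma = floor + mag / (mag.max() + 1e-8)  # roughly in [floor, floor + 1]
    x0 = coarse_mdct + sigma * rng.standard_normal(coarse_mdct.shape)
    return x0, sigma

def euler_solve(velocity_fn, x0, cond, num_steps=8):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t, cond)
    return x

def make_toy_velocity(target):
    """Hypothetical stand-in for the learned velocity field.

    For a straight-line path the ideal velocity at time t is
    (target - x) / (1 - t); a real model has to predict this from the
    conditioning alone, without access to the target.
    """
    def velocity(x, t, cond):
        return (target - x) / (1.0 - t)
    return velocity

rng = np.random.default_rng(0)
target = rng.standard_normal((4, 16))  # frames x MDCT bins (toy sizes)
coarse = 0.8 * target                  # pretend decoder output
x0, sigma = magnitude_adaptive_prior(coarse, rng)
enhanced = euler_solve(make_toy_velocity(target), x0, coarse)
```

With the idealized velocity field the solver lands on the target spectrum; with a learned field, the number of solver steps becomes a trade-off between quality and compute.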

Training uses a unified non-adversarial strategy that jointly optimizes the reconstruction, quantization, and CFM objectives, so no discriminator network is needed. In the reported evaluations, CFMDCTCodec outperforms competitive baselines at 0.65 kbps and reaches perceptual quality close to that of much larger codecs with a fraction of the parameters and computational cost. The paper has been accepted by IEEE Transactions on Audio, Speech and Language Processing, a peer-reviewed journal.
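One way to picture a joint objective of this kind is a weighted sum of the three terms. The sketch below is an assumption-laden illustration: the loss weights, the mean-squared-error forms, and the straight-line flow path are hypothetical choices, and the summary does not state which ones the authors use.

```python
import numpy as np

def cfm_loss(velocity_fn, x0, x1, cond, t):
    """Conditional flow matching loss for a straight-line path (assumed).

    The point on the path is x_t = (1 - t) * x0 + t * x1, and the
    regression target for the velocity is x1 - x0.
    """
    x_t = (1.0 - t) * x0 + t * x1
    target_v = x1 - x0
    pred_v = velocity_fn(x_t, t, cond)
    return float(np.mean((pred_v - target_v) ** 2))

def joint_loss(clean, coarse, latent, quantized, cfm,
               w_rec=1.0, w_vq=0.25, w_cfm=1.0):
    """Weighted sum of reconstruction, quantization, and CFM terms.

    ASSUMPTION: weights and MSE forms are illustrative. A real VQ term
    would also use stop-gradient, which plain NumPy cannot express.
    """
    rec = float(np.mean((coarse - clean) ** 2))      # coarse spectrum vs. clean
    vq = float(np.mean((latent - quantized) ** 2))   # commitment-style term
    return w_rec * rec + w_vq * vq + w_cfm * cfm

# Toy tensors with hand-checkable values.
clean = np.ones((2, 4))
coarse = np.zeros((2, 4))
latent = np.zeros((2, 4))
quantized = 0.5 * np.ones((2, 4))

x0 = coarse                      # start of the flow (noise omitted here)
x1 = clean                       # end of the flow
perfect_v = lambda x, t, cond: x1 - x0
cfm = cfm_loss(perfect_v, x0, x1, coarse, t=0.5)
total = joint_loss(clean, coarse, latent, quantized, cfm)
```

Because every term is a plain regression loss, a single optimizer step can update the encoder, quantizer, decoder, and enhancer together.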

Key Points

Operates entirely in the MDCT domain for efficient speech compression down to 0.65 kbps
Uses a noise-prior-aware conditional flow matching enhancer to restore fine spectral details
Outperforms baselines with significantly fewer parameters while approaching large-scale codec quality

Why It Matters

Could enable high-quality speech transmission over ultra-low-bandwidth networks such as satellite or IoT links, where every bit per second counts.
