CFMDCTCodec delivers high-quality speech at 0.65 kbps
This neural codec uses conditional flow matching to restore fine spectral details...
CFMDCTCodec tackles the challenge of high-quality speech coding at extremely low bitrates, a critical need for bandwidth-constrained applications. The system operates entirely in the modified discrete cosine transform (MDCT) domain, using a lightweight encoder-quantizer-decoder architecture to produce a coarse spectral reconstruction. To restore fine-grained details lost during compression, it introduces a noise-prior-aware conditional flow matching (CFM) enhancer that integrates a conditional MDCT velocity-field filter with an ordinary differential equation (ODE) solver. This enhancer is guided by an MDCT-derived magnitude-adaptive noise prior that emphasizes perceptually important high-energy regions while stabilizing low-energy and silent areas.
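The NumPy sketch below makes three ingredients named above more concrete: a sine-windowed MDCT with overlap-add reconstruction, a magnitude-adaptive noise scale, and an Euler ODE solver that integrates a velocity field from a noise sample toward an enhanced spectrum. It rests on several assumptions. The frame size and the helper names (`mdct`, `imdct`, `magnitude_adaptive_sigma`, `euler_sample`, `oracle_velocity`) are hypothetical. The sine window is a common textbook choice rather than a confirmed detail of CFMDCTCodec. The linear form of the noise prior and its `floor` parameter are also hypothetical, because the summary does not specify them. Finally, the learned conditional velocity-field filter is replaced by an oracle that already knows the target, so the sketch shows only how sampling proceeds, not what the network learns.

```python
import numpy as np


def sine_window(N):
    # Length-2N sine window; satisfies w[n]**2 + w[n + N]**2 == 1
    # (Princen-Bradley), which time-domain aliasing cancellation needs.
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))


def mdct_basis(N):
    # (N, 2N) MDCT cosine basis with sqrt(2/N) scaling.
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.sqrt(2.0 / N) * np.cos(
        np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))


def mdct(x, N):
    # Frames of length 2N with hop N; zero-pad both ends so every input
    # sample is covered by exactly two overlapping frames.
    pad = np.concatenate([np.zeros(N), x, np.zeros(N + (-len(x)) % N)])
    n_frames = len(pad) // N - 1
    frames = np.stack([pad[i * N:i * N + 2 * N] for i in range(n_frames)])
    return (frames * sine_window(N)) @ mdct_basis(N).T  # (n_frames, N)


def imdct(X, N, length):
    # Inverse basis, synthesis window, then overlap-add: the aliasing
    # terms of neighbouring frames cancel.
    frames = (X @ mdct_basis(N)) * sine_window(N)
    out = np.zeros((X.shape[0] + 1) * N)
    for i, frame in enumerate(frames):
        out[i * N:i * N + 2 * N] += frame
    return out[N:N + length]


def magnitude_adaptive_sigma(X_coarse, floor=0.1):
    # HYPOTHETICAL prior scale: near 1 in high-energy MDCT bins,
    # a small constant floor in low-energy or silent bins.
    mag = np.abs(X_coarse)
    return floor + (1.0 - floor) * mag / (mag.max() + 1e-8)


def euler_sample(velocity, x0, cond, steps=8):
    # Explicit Euler integration of dx/dt = v(x, t, cond) from t=0 to t=1.
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt, cond)
    return x


def oracle_velocity(x, t, target):
    # Stand-in for the learned filter: exact velocity of the straight
    # path from the current point to `target` (passed in as `cond`).
    return (target - x) / (1.0 - t)


rng = np.random.default_rng(0)
N = 64
speech = rng.standard_normal(1000)           # placeholder signal
X = mdct(speech, N)
print(np.allclose(imdct(X, N, len(speech)), speech))  # True

# Here the prior scale comes from X itself; in the codec it would come
# from the decoder's coarse MDCT reconstruction.
sigma = magnitude_adaptive_sigma(X)
x0 = sigma * rng.standard_normal(X.shape)    # sample from the prior
X_hat = euler_sample(oracle_velocity, x0, cond=X, steps=8)
print(np.allclose(X_hat, X))                 # True
```

In a real enhancer, a trained network conditioned on the coarse spectrum would replace `oracle_velocity`. The number of Euler steps then trades compute against how closely the sampled spectrum follows the learned flow.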
Training uses a unified, non-adversarial strategy that jointly optimizes reconstruction, quantization, and CFM objectives. In evaluations at 0.65 kbps, CFMDCTCodec outperforms competitive baselines and approaches the perceptual quality of much larger codecs while using a fraction of their parameters and computational cost. The paper has been accepted for publication in IEEE Transactions on Audio, Speech and Language Processing.
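A joint objective of this kind can be sketched as a weighted sum of terms. The sketch assumes a straight-line CFM path, an L1 reconstruction term on MDCT coefficients, and a VQ-style codebook/commitment term. None of these loss forms come from the summary. The weights `lam_rec`, `lam_q`, `lam_cfm`, and `beta` and the helper names are hypothetical placeholders too. The sketch only illustrates what "non-adversarial" means in practice: every term is a direct regression or distance, and no discriminator is involved.

```python
import numpy as np


def cfm_loss(velocity, x1, cond, sigma, rng):
    # Conditional flow matching on a straight path: draw x0 from a scaled
    # Gaussian prior and t ~ U(0, 1), form x_t = (1 - t) x0 + t x1,
    # and regress the predicted velocity onto u = x1 - x0.
    x0 = sigma * rng.standard_normal(x1.shape)
    t = rng.uniform(0.0, 1.0)
    xt = (1.0 - t) * x0 + t * x1
    u = x1 - x0
    return np.mean((velocity(xt, t, cond) - u) ** 2)


def total_loss(x_target, x_coarse, z, z_q, cfm,
               lam_rec=1.0, lam_q=1.0, lam_cfm=1.0, beta=0.25):
    # HYPOTHETICAL loss forms and weights.
    rec = np.mean(np.abs(x_coarse - x_target))      # MDCT reconstruction (L1)
    # Forward value of VQ codebook + commitment terms; under autodiff the
    # two differ only in where the stop-gradient is placed.
    quant = (1.0 + beta) * np.mean((z - z_q) ** 2)
    return lam_rec * rec + lam_q * quant + lam_cfm * cfm


def zero_velocity(x, t, cond):
    # Untrained predictor that always outputs zero velocity.
    return np.zeros_like(x)


def oracle_velocity(x, t, cond):
    # Perfect predictor for the straight path; `cond` is the clean target.
    return (cond - x) / (1.0 - t)


rng = np.random.default_rng(0)
x1 = rng.standard_normal((10, 64))     # placeholder clean MDCT frames
print(cfm_loss(oracle_velocity, x1, x1, 1.0, rng) < 1e-10)  # True
print(cfm_loss(zero_velocity, x1, x1, 1.0, rng) > 0.0)      # True
print(total_loss(np.zeros(4), np.ones(4), np.ones(3), np.ones(3), cfm=0.5))  # 1.5
```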
- Operates entirely in the MDCT domain for efficient speech compression down to 0.65 kbps
- Uses a noise-prior-aware conditional flow matching enhancer to restore fine spectral details
- Outperforms baselines with significantly fewer parameters while approaching large-scale codec quality
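For scale, 0.65 kbps is 650 bits per second. The helper below shows how a frame rate, a number of codebooks, and a codebook size multiply out to a bitrate. The summary does not state CFMDCTCodec's actual frame rate or quantizer layout. The function name `bitrate_bps` and both configurations are therefore hypothetical: they are examples that happen to reach 650 bit/s, not the paper's settings.

```python
import math


def bitrate_bps(frame_rate_hz, n_codebooks, codebook_size):
    # Bits per second when each frame sends n_codebooks indices and
    # each index costs log2(codebook_size) bits.
    return frame_rate_hz * n_codebooks * math.log2(codebook_size)


# HYPOTHETICAL configurations that each land on 650 bit/s (0.65 kbps).
print(bitrate_bps(50, 1, 8192))   # 650.0
print(bitrate_bps(25, 2, 8192))   # 650.0

# Payload for a 10-second utterance at 650 bit/s.
print(650 * 10 / 8, "bytes")      # 812.5 bytes
```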
Why It Matters
Could enable high-quality speech transmission over ultra-low-bandwidth links such as satellite or IoT networks.