Audio & Speech

Sebastian Braun's mean flow model restores speech in real time with 120x less compute

arXiv eess.AS May 18, 2026

⚡ New speech restoration model runs in real time with 120x less compute than SOTA

Deep Dive

Sebastian Braun introduces a real-time speech restoration model based on Data Prediction Mean Flows, a variant of flow matching generative models. Large offline models excel at tasks such as bandwidth extension, gap filling, and removing non-linear artifacts caused by codecs, clipping, and distortion, but their high latency and compute requirements rule out real-time use. Braun's model addresses this by combining a few-step flow matching approach with a low-latency architecture that adds no algorithmic latency beyond that of the STFT (short-time Fourier transform).
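To make the latency claim concrete: in a frame-based STFT pipeline, the algorithmic latency is conventionally taken to be the analysis window length, since a full frame must be buffered before it can be processed. The sketch below computes that figure. The sample rate, window length, and function name are illustrative assumptions, not values from the paper.

```python
def stft_algorithmic_latency_ms(win_length: int, sample_rate: int) -> float:
    """Latency in ms from buffering one full STFT window of `win_length` samples."""
    if win_length <= 0 or sample_rate <= 0:
        raise ValueError("win_length and sample_rate must be positive")
    return 1000.0 * win_length / sample_rate


# Assumed example values (NOT from the paper): 16 kHz audio, 320-sample window.
latency_ms = stft_algorithmic_latency_ms(win_length=320, sample_rate=16_000)
print(f"{latency_ms:.1f} ms")  # prints 20.0 ms
```

Under these assumed settings, a model that adds no latency beyond the STFT would need no lookahead past that single 20 ms window.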

The key innovation is the mean flow formulation, which cuts computational cost by a factor of 120 compared to state-of-the-art methods while maintaining comparable audio quality. That makes the model practical for real-time deployment in communication systems, hearing aids, and voice assistants. It restores speech degraded by a range of non-linear distortions without needing linear denoising or dereverberation, filling a gap in real-time audio processing. The paper is available on arXiv (2605.16251) and could affect live speech enhancement in telephony and streaming.
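For intuition on why a few-step sampler is cheap, here is a minimal sketch of the generic mean flow sampling rule, z_r = z_t - (t - r) * u(z_t, r, t), where u is the average velocity over the interval [r, t]. It is not Braun's model: the oracle `toy_average_velocity` stands in for a trained network, and the linear noise-to-data path, the function names, and the step counts are all assumptions made for illustration.

```python
import numpy as np


def toy_average_velocity(z_t, r, t, target):
    """Hypothetical oracle standing in for a trained network.

    For the straight path z_t = (1 - t) * target + t * noise, the velocity
    (noise - target) is constant, so its average over [r, t] equals
    (z_t - target) / t regardless of r.
    """
    return (z_t - target) / t


def few_step_sample(z_1, target, num_steps):
    """Move from noise (t = 1) to data (t = 0) in `num_steps` jumps."""
    times = np.linspace(1.0, 0.0, num_steps + 1)
    z = z_1
    for t, r in zip(times[:-1], times[1:]):
        # Mean flow update: one evaluation of the average velocity per step.
        z = z - (t - r) * toy_average_velocity(z, r, t, target)
    return z


rng = np.random.default_rng(0)
target = np.sin(np.linspace(0.0, 2.0 * np.pi, 8))  # stand-in for a clean signal
noise = rng.standard_normal(8)

one_step = few_step_sample(noise, target, num_steps=1)
four_step = few_step_sample(noise, target, num_steps=4)
print(np.allclose(one_step, target), np.allclose(four_step, target))
```

Each step costs one evaluation of the velocity function, so the step count directly scales the compute. With an exact average velocity, a single step already lands on the target, which is the property few-step samplers aim to approximate with a learned network.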

Key Points

Uses Data Prediction Mean Flows, a variant of flow matching, for speech restoration tasks such as bandwidth extension and artifact removal
Requires 120x less compute than state-of-the-art methods at comparable audio quality
Adds no algorithmic latency beyond the STFT, enabling real-time processing in low-latency systems

Why It Matters

Enables high-quality real-time speech restoration for live calls and streaming with drastically lower compute requirements.
