Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device
A tiny CNN predicts stuttering events before they happen, running fully on-device.
A new paper from Nazar Kozak introduces a lightweight 616K-parameter CNN that predicts upcoming stuttering events from three seconds of audio. Trained on Apple's SEP-28k dataset (20,131 clips), the model predicts whether the next contiguous clip will contain any disfluency. Stratifying the evaluation by event type reveals severity-selective precursor signals: the model reaches an AUC of 0.601 for blocks and 0.617 for sound repetitions, while fillers and word repetitions remain at chance level. The effect is modest (an AUC of 0.5 is chance), but it suggests that the more clinically severe event types carry measurable prosodic precursors in the preceding audio, while the milder ones show no such signal in this evaluation.
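The next-clip prediction target described above can be sketched as follows. This is only an illustration under stated assumptions: the `Clip` fields, the `next_clip_pairs` helper, and the rule that "contiguous" means the next clip starts where the current one ends are all hypothetical, not taken from the paper or from the SEP-28k release.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    # Hypothetical schema for illustration; not the SEP-28k column names.
    recording_id: str
    start_s: float
    end_s: float
    has_disfluency: bool


def next_clip_pairs(clips, tol_s=1e-6):
    """Pair each clip with the disfluency label of its contiguous successor.

    Assumption: two clips are contiguous when they belong to the same
    recording and the second starts where the first ends (within tol_s).
    Clips without a contiguous successor are dropped.
    """
    ordered = sorted(clips, key=lambda c: (c.recording_id, c.start_s))
    pairs = []
    for cur, nxt in zip(ordered, ordered[1:]):
        same_recording = cur.recording_id == nxt.recording_id
        contiguous = abs(nxt.start_s - cur.end_s) <= tol_s
        if same_recording and contiguous:
            # Input: audio of `cur`. Target: does `nxt` contain a disfluency?
            pairs.append((cur, nxt.has_disfluency))
    return pairs


# Made-up toy data, only to exercise the function.
clips = [
    Clip("ep1", 0.0, 3.0, False),
    Clip("ep1", 3.0, 6.0, True),
    Clip("ep1", 10.0, 13.0, False),  # gap before this clip: not contiguous
    Clip("ep2", 0.0, 3.0, True),     # different recording
]
pairs = next_clip_pairs(clips)
for clip, target in pairs:
    print(clip.recording_id, clip.start_s, "-> next clip disfluent:", target)
```

In this toy example only the first clip has a contiguous successor, so a single training pair is produced.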
The model is designed for practical deployment: it exports losslessly to CoreML (1.19 MB), ONNX (40 KB), and TFLite, with Neural Engine latency ranging from 0.25 ms on the iPhone 17 Pro Max to 0.55 ms on older devices. A 4 Hz streaming simulation uses only 0.54% of the real-time budget, leaving ample headroom for continuous on-device monitoring. The model also transfers across populations: without fine-tuning, it achieves an AUC of 0.674 for detection and 0.655 for prediction on pediatric stutterers from FluencyBank. Five ablations (including GRU and focal-loss variants) all failed to improve on the vanilla baseline, suggesting that added architectural or loss complexity brings little benefit for this task.
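To put the streaming figure in concrete terms, the arithmetic can be worked through in a few lines. This sketch assumes that "real-time budget" means the interval between consecutive inferences (250 ms at 4 Hz); the summary does not define the term or say what the 0.54% includes, so the resulting per-step time is an interpretation, not a reported measurement. The `streaming_budget` helper is hypothetical.

```python
def streaming_budget(rate_hz, budget_fraction):
    """Return (period_ms, compute_ms) for a fixed-rate streaming loop.

    Assumption: the real-time budget is the time between two consecutive
    inferences, i.e. 1 / rate_hz.
    """
    period_ms = 1000.0 / rate_hz
    compute_ms = period_ms * budget_fraction
    return period_ms, compute_ms


# 4 Hz streaming, 0.54% of the budget used (figures from the summary).
period_ms, compute_ms = streaming_budget(rate_hz=4.0, budget_fraction=0.0054)
print(f"period per inference: {period_ms:.1f} ms")
print(f"compute per inference: {compute_ms:.2f} ms")
print(f"idle share of each period: {1 - compute_ms / period_ms:.2%}")
```

Under that reading, each step takes roughly 1.35 ms out of a 250 ms window, so the processor sits idle for more than 99% of the time.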
- 616K-parameter CNN predicts upcoming stuttering from 3-second audio clips with AUC 0.601 for blocks and 0.617 for sound repetitions
- Model deploys fully on-device via CoreML (1.19 MB), ONNX (40 KB), and TFLite with 0.25–0.55 ms latency on iPhone
- Cross-population transfer achieves an AUC of 0.674 for detection and 0.655 for prediction on pediatric stutterers without fine-tuning
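For readers less familiar with the metric, the AUC values quoted above can be read as the probability that a randomly chosen positive clip receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance. The following is a minimal pure-Python sketch of that pairwise definition; the `auc` helper and the toy scores are made up for illustration and are not the paper's evaluation code or data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores: 1 = next clip contains the event, 0 = it does not.
scores = [0.9, 0.4, 0.45, 0.3, 0.5]
labels = [1, 0, 1, 0, 0]
toy_auc = auc(scores, labels)
print(f"toy AUC: {toy_auc:.3f}")

# A model that gives every clip the same score is exactly at chance.
chance_auc = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(f"constant-score AUC: {chance_auc:.3f}")
```

This quadratic loop is fine for small examples; on large evaluation sets a rank-based implementation would be the usual choice.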
Why It Matters
Real-time, on-device stuttering prediction could enable privacy-preserving, closed-loop intervention tools for speech therapy, since the audio never has to leave the device.