Research & Papers

[P] Utterance, an open source client-side semantic endpointing SDK for voice apps. We are looking for contributors.

Open-source 3-5MB model detects thinking pauses and interrupt intent locally, avoiding cloud latency and cost.

Deep Dive

Developer R3VNUE is building Utterance, an open-source, MIT-licensed SDK for voice applications. It runs a small 3-5MB ONNX model entirely client-side, in the browser or on-device. Unlike basic voice activity detection (VAD) tools, which only distinguish speech from silence, it detects four conversational states: speaking, thinking pause, turn complete, and interrupt intent. This allows more natural voice interactions without relying on server-side APIs such as OpenAI Realtime for endpointing, which avoids the network latency, usage cost, and privacy concerns of sending audio to the cloud.
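
To make the four states concrete, here is a minimal TypeScript sketch of how an app might act on them. It is not Utterance's actual API: the type names, the `classify` and `nextAction` functions, and the assumption that the model emits one score per state are all hypothetical and used only for illustration.

```typescript
// Hypothetical names throughout; this is NOT Utterance's real API.
// The four conversational states described in the post.
type ConversationState =
  | "speaking"
  | "thinking_pause"
  | "turn_complete"
  | "interrupt_intent";

// What a voice agent might do in response to each state (assumed actions).
type AgentAction = "keep_listening" | "wait" | "respond" | "stop_speaking";

// Assumption: the on-device model yields one score per state.
// Pick the state with the highest score.
function classify(scores: Record<ConversationState, number>): ConversationState {
  let best: ConversationState = "speaking";
  for (const state of Object.keys(scores) as ConversationState[]) {
    if (scores[state] > scores[best]) best = state;
  }
  return best;
}

// Map a detected state to an agent action.
function nextAction(state: ConversationState): AgentAction {
  switch (state) {
    case "speaking":
      return "keep_listening"; // user is mid-utterance
    case "thinking_pause":
      return "wait"; // silence, but the user is not done yet
    case "turn_complete":
      return "respond"; // safe for the agent to reply
    case "interrupt_intent":
      return "stop_speaking"; // user is trying to cut in
  }
}
```

The point of the sketch is the second case: a plain VAD would treat a thinking pause as end of speech and trigger a reply, whereas a semantic endpointing model can tell the agent to wait.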

Why It Matters

It lets developers build voice apps with more natural turn-taking, improving the user experience while keeping audio on the device and reducing cloud costs.