speech-to-text (Whisper v3), text-to-speech (with multiple voices), and voice activity detection

$0.006 per minute for transcription and generation, enabling cost-effective voice apps

Low latency and high accuracy allow real-time conversational agents and interactive voice response systems?

Low latency and high accuracy allow real-time conversational agents and interactive voice response systems

Models & Releases

OpenAI releases three new audio models in API for voice apps

r/OpenAI May 08, 2026

⚡Developers get speech-to-text, text-to-speech, and voice activity detection models...

Deep Dive

OpenAI has made a new Reddit submission, sparking discussion.

Key Points

Three audio models: speech-to-text (Whisper v3), text-to-speech (with multiple voices), and voice activity detection
Pricing: $0.006 per minute for transcription and generation, enabling cost-effective voice apps
Low latency and high accuracy allow real-time conversational agents and interactive voice response systems

Voice-first apps become trivial to build, unlocking faster, more natural user interactions across industries.