Models & Releases

We’re introducing three audio models in the API that unlock a new class of voice apps for developers.

Developers get speech-to-text, text-to-speech, and voice activity detection models...

Deep Dive

OpenAI announced the release in a Reddit post, which has sparked discussion.

Key Points
  • Three audio models: speech-to-text (Whisper v3), text-to-speech (with multiple voices), and voice activity detection
  • Pricing: $0.006 per minute for transcription and generation, enabling cost-effective voice apps
  • Low latency and high accuracy allow real-time conversational agents and interactive voice response systems
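
To put the quoted price in perspective, here is a rough cost sketch in Python. It assumes the $0.006 per minute rate applies equally to transcription and generation and that billing is linear in audio minutes. The function name and the sample workload are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# Rate quoted in the key points above. Assumption: the same flat
# per-minute rate applies to both transcription and generation.
PRICE_PER_MINUTE_USD = Decimal("0.006")


def estimate_audio_cost(minutes):
    """Return the estimated cost in USD for `minutes` of audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Convert via str() so float inputs do not carry binary rounding noise.
    return PRICE_PER_MINUTE_USD * Decimal(str(minutes))


# Hypothetical workload: 1,000 calls averaging 5 minutes each.
calls = 1000
avg_minutes_per_call = 5
total = estimate_audio_cost(calls * avg_minutes_per_call)
print(f"Estimated cost: ${total:.2f}")  # prints "Estimated cost: $30.00"
```

At this rate, 5,000 minutes of audio comes to about $30. Real bills may differ if pricing is tiered, rounded per request, or varies by model.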

Why It Matters

Voice-first apps become far easier to build, enabling faster, more natural user interactions across industries.