
OpenAI has 3 new AI voice models that the ChatGPT maker says will ‘unlock a new class of voice apps for developers’

GPT-Realtime-2, Translate, and Whisper bring GPT-5 reasoning, live translation, and streaming transcription.

Deep Dive

OpenAI has unveiled three new AI voice models aimed at developers building real-time voice applications. The lineup is led by GPT-Realtime-2, the company's first voice model with GPT-5-class reasoning. According to OpenAI, it can handle complex requests, maintain natural conversation flow, and adapt its tone based on user input. It can also parse specialized terms from fields like healthcare and production. Pricing is set at $32 per million input tokens and $64 per million output tokens.

The other two models focus on translation and transcription. GPT-Realtime-Translate provides live speech translation from more than 70 input languages into 13 output languages while keeping pace with the speaker; it costs $0.034 per minute. GPT-Realtime-Whisper streams speech-to-text transcription in real time, which suits captions, meeting notes, and summaries; it costs $0.017 per minute. All three models are accessible via OpenAI's Realtime API and the Playground, and a dedicated prompt lets Codex users integrate GPT-Realtime-2 directly into the agentic coding platform.
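To put the quoted prices in perspective, here is a minimal cost-estimate sketch in Python. It uses only the rates stated above; the function names and the sample workload (50,000 input tokens, 20,000 output tokens, one hour of audio) are hypothetical, and it assumes billing is a simple rate multiplied by usage with no minimums, rounding rules, or discounts, none of which the announcement specifies.

```python
from decimal import Decimal

# Rates as quoted in the article (USD).
REALTIME_2_INPUT_PER_MILLION = Decimal("32")
REALTIME_2_OUTPUT_PER_MILLION = Decimal("64")
TRANSLATE_PER_MINUTE = Decimal("0.034")
WHISPER_PER_MINUTE = Decimal("0.017")

MILLION = Decimal(1_000_000)


def realtime_2_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate GPT-Realtime-2 cost from token counts.

    Assumption: cost is linear in tokens at the quoted per-million rates.
    """
    input_cost = Decimal(input_tokens) * REALTIME_2_INPUT_PER_MILLION
    output_cost = Decimal(output_tokens) * REALTIME_2_OUTPUT_PER_MILLION
    return (input_cost + output_cost) / MILLION


def per_minute_cost(minutes: int, rate_per_minute: Decimal) -> Decimal:
    """Estimate cost for a per-minute model (Translate or Whisper).

    Assumption: billing is linear in whole minutes of audio.
    """
    return Decimal(minutes) * rate_per_minute


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    print("GPT-Realtime-2:", realtime_2_cost(50_000, 20_000))
    print("Translate, 60 min:", per_minute_cost(60, TRANSLATE_PER_MINUTE))
    print("Whisper, 60 min:", per_minute_cost(60, WHISPER_PER_MINUTE))
```

Under these assumptions, the sample GPT-Realtime-2 workload comes to $2.88, an hour of live translation to $2.04, and an hour of streaming transcription to $1.02. Decimal is used instead of float so that small per-minute rates do not pick up binary rounding error.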

Key Points
  • GPT-Realtime-2: first OpenAI voice model with GPT-5 reasoning, adapts tone and handles specialized terms; $32/$64 per million tokens.
  • GPT-Realtime-Translate: translates speech from 70+ input languages into 13 output languages in real time; $0.034 per minute.
  • GPT-Realtime-Whisper: streaming speech-to-text for live captions/notes; $0.017 per minute.

Why It Matters

Developers can now build voice apps that reason, translate, and transcribe in real time using models available natively through OpenAI's Realtime API.