OpenAI launches new voice intelligence features in its API
70 input languages, real-time translation, and GPT-5-class reasoning in a voice API.
OpenAI announced Thursday a suite of new voice intelligence features for its Realtime API, including GPT-Realtime-2, a voice model that leverages GPT-5-class reasoning to handle complex conversational requests. Unlike its predecessor, GPT-Realtime-1.5, the new model is designed for realistic vocal interactions that can reason and take action as a conversation unfolds. The company also introduced GPT-Realtime-Translate, which provides real-time translation across 70 input languages and 13 output languages while keeping pace with the speaker, and GPT-Realtime-Whisper, which offers live speech-to-text transcription during interactions. Translate and Whisper are billed per minute; GPT-Realtime-2 is billed by token consumption.
OpenAI is targeting enterprises in customer service, education, media, events, and creator platforms. The company claims the models move real-time audio beyond simple call-and-response toward voice interfaces that can listen, reason, translate, transcribe, and act. To prevent misuse, OpenAI has embedded guardrails that can halt conversations violating its harmful-content guidelines, addressing concerns about spam and fraud. The updates position OpenAI as a key player in voice AI, enabling developers to build more capable conversational applications.
- GPT-Realtime-2 uses GPT-5-class reasoning for complex voice conversations, improving on GPT-Realtime-1.5.
- GPT-Realtime-Translate supports 70 input and 13 output languages at real-time conversational pace.
- New features are billed per minute (Translate/Whisper) or per token (Realtime-2) via OpenAI's Realtime API.
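In practice, developers would wire these capabilities together through the Realtime API's event-based protocol, where a client sends a `session.update` event to configure the active model. A minimal sketch of what such a configuration payload might look like, assuming the new models keep the existing event shape; the model name `gpt-realtime-2` is taken from the article, and the `translation` block with its language fields is purely illustrative, not a documented schema:

```python
import json

def build_session_update(model: str, source_lang: str, target_lang: str) -> str:
    """Serialize a hypothetical session.update event that selects a
    voice model and a translation language pair."""
    event = {
        "type": "session.update",
        "session": {
            "model": model,                      # e.g. "gpt-realtime-2"
            "modalities": ["audio", "text"],     # voice in, voice/text out
            "translation": {                     # illustrative field, not official
                "source_language": source_lang,  # one of the 70 input languages
                "target_language": target_lang,  # one of the 13 output languages
            },
        },
    }
    return json.dumps(event)

# Configure Spanish-to-English live translation on the reasoning model.
payload = build_session_update("gpt-realtime-2", "es", "en")
print(payload)
```

A client would send this JSON over the API's WebSocket connection at session start; the per-minute versus per-token billing split means a translation-only session and a reasoning session would be metered differently even over the same connection.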
Why It Matters
Voice AI leaps from simple commands to intelligent, multi-lingual conversations that can take action.