Developer Tools

Stream Vision Agents + Amazon Nova 2 Sonic enable real-time voice agents in minutes

AWS Machine Learning Blog, May 15, 2026

⚡Speech-to-speech model meets open-source framework for sub-500ms join times.

Deep Dive

Building production-grade voice AI that feels natural usually means orchestrating speech-to-text (STT), a language model, and text-to-speech (TTS) within a few hundred milliseconds. Stream's new integration combines its open-source Vision Agents framework with Amazon Nova 2 Sonic, a speech-to-speech foundation model available through Amazon Bedrock. Nova 2 Sonic accepts audio input and produces audio output directly, so no separate STT or TTS service is needed. It provides real-time bidirectional audio streaming, native turn detection, and function calling.
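To make the function-calling piece concrete, here is a minimal sketch of how an application might route a model-issued tool call to local Python code. It is not the Nova 2 Sonic or Vision Agents API: the event shape (`call_id`, `name`, `arguments`), the `tool` decorator, and `get_order_status` are all hypothetical names chosen for illustration.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping a tool name to the Python callable behind it.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so a tool call can reach it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_order_status(order_id: str) -> dict:
    # Stand-in for a real lookup (database, internal API, ...).
    return {"order_id": order_id, "status": "shipped"}


def handle_tool_call(event: dict) -> dict:
    """Run the requested tool and build a result message for the model.

    Assumed event shape: {"call_id": str, "name": str, "arguments": JSON string}.
    """
    name = event["name"]
    args = json.loads(event.get("arguments", "{}"))
    fn = TOOLS.get(name)
    if fn is None:
        return {"call_id": event["call_id"], "error": f"unknown tool: {name}"}
    try:
        return {"call_id": event["call_id"], "result": fn(**args)}
    except Exception as exc:
        # Report the failure back instead of crashing the voice session.
        return {"call_id": event["call_id"], "error": str(exc)}
```

Returning errors as data rather than raising lets the model apologize or retry in speech while the audio stream stays open.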

Vision Agents is a plugin-based Python framework with 25+ integrations and client SDKs for React, iOS, Android, Flutter, and React Native. It abstracts away infrastructure concerns such as WebRTC connection management, automatic reconnection, and graceful degradation. Combined with Stream's globally distributed edge network (sub-500ms join times, under 30ms audio latency), this lets developers build and deploy voice agents within minutes. The architecture keeps sensitive data in the customer's AWS account while Stream handles media transport.
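For a sense of what "automatic reconnection" saves developers from writing, the sketch below shows one common approach: retrying with exponential backoff and jitter. It is a generic illustration, not Vision Agents' actual implementation; `connect_with_backoff` and its parameters are hypothetical.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_with_backoff(
    connect: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.25,
    max_delay: float = 8.0,
) -> T:
    """Call `connect` until it succeeds or `max_attempts` is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # Out of retries: surface the last failure.
            # Double the ceiling on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time up to the ceiling so that
            # many clients do not all retry at the same instant.
            await asyncio.sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")
```

A caller would pass its own coroutine function, for example `await connect_with_backoff(join_call)`, where `join_call` is whatever opens the media session.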

Key Points

Amazon Nova 2 Sonic handles the full speech-to-speech pipeline, with no separate STT or TTS service needed
Stream Vision Agents provides an open-source framework with 25+ integrations and client SDKs for major platforms
Stream's edge network delivers sub-500ms join times and under 30ms audio latency for natural conversation flow

Why It Matters

Voice AI apps can now go from concept to production in minutes without building custom real-time infrastructure.
