Viral Wire

xAI Launches Voice Cloning API with 80+ Voices and 28 Languages

Create a custom AI voice from as little as 120 seconds of audio, or choose from more than 80 pre-built options.

Deep Dive

xAI officially added voice cloning to its API suite on April 30, 2026, with the launch of Custom Voices and Voice Library. Developers can create a personalized AI voice by uploading a reference audio clip at least 120 seconds long. The model captures not only timbre but also delivery patterns and inflections, so the cloned voice both sounds and speaks like the original speaker. Alternatively, users can choose from more than 80 pre-built voices spanning 28 languages. Each custom voice receives a unique 8-character alphanumeric ID that works across both the Text-to-Speech and Voice Agent APIs, letting applications keep a consistent voice identity.
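For developers wiring this into a pipeline, the two concrete constraints above (a reference clip of at least 120 seconds and an 8-character alphanumeric voice ID) can be checked locally before any request is made. The following is a minimal client-side sketch, not part of xAI's API: the helper names are hypothetical, it assumes the reference clip is a WAV file, and because the article does not say whether IDs are case-sensitive, it accepts both upper- and lowercase letters.

```python
import re
import wave

# Minimum reference-clip length reported for Custom Voices, in seconds.
MIN_REFERENCE_SECONDS = 120

# Hypothetical format check: 8 ASCII letters or digits. Case sensitivity
# is not stated in the announcement, so both cases are accepted here.
VOICE_ID_PATTERN = re.compile(r"[A-Za-z0-9]{8}")


def is_valid_voice_id(voice_id: str) -> bool:
    """Return True if voice_id looks like an 8-character alphanumeric ID."""
    return VOICE_ID_PATTERN.fullmatch(voice_id) is not None


def wav_duration_seconds(path_or_file) -> float:
    """Return a WAV clip's duration in seconds (frame count / sample rate)."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def meets_reference_minimum(path_or_file) -> bool:
    """Check a local WAV clip against the 120-second minimum before upload."""
    return wav_duration_seconds(path_or_file) >= MIN_REFERENCE_SECONDS
```

Checking duration locally simply avoids a wasted upload; the server remains the authority on whether a clip is accepted.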

Security is built into the cloning flow. xAI uses a two-stage verification process in which the speaker must read a verification phrase aloud in real time. The system then compares speaker embeddings from that live verification clip with those from the full recording, a check designed to block cloning from pre-existing audio and unauthorized replication of another person's voice.

Cloning carries no extra charge: pricing stays at the standard rates of $4.20 per million characters for Text-to-Speech and $0.05 per minute ($3.00 per hour) for the Voice Agent API. Intended use cases include voice agents, audiobook narration, and video game character voices. For Tesla owners using Grok, the launch points to maturing infrastructure for personalized, voice-driven AI interactions inside the vehicle.
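xAI has not published how its speaker-embedding match is scored. A common way to compare two speaker embeddings is cosine similarity against a decision threshold, and the sketch below assumes that approach purely to illustrate the idea of matching the live verification clip to the full recording. The threshold value and the function names are hypothetical, and producing the embeddings themselves (normally done by a trained speaker-encoder model) is out of scope here.

```python
import math
from typing import Sequence

# Hypothetical decision threshold; xAI has not disclosed its actual value.
MATCH_THRESHOLD = 0.75


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = math.fsum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(math.fsum(x * x for x in a))
    norm_b = math.sqrt(math.fsum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def same_speaker(
    verification_emb: Sequence[float],
    recording_emb: Sequence[float],
    threshold: float = MATCH_THRESHOLD,
) -> bool:
    """Accept only if the live verification clip and the full reference
    recording appear to come from the same speaker."""
    return cosine_similarity(verification_emb, recording_emb) >= threshold
```

Because the verification phrase is read live, a pre-existing recording of someone else cannot supply it; the embedding comparison then ties that live clip to the longer reference recording.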

Key Points
  • Custom Voices require a reference clip of at least 120 seconds; xAI's model captures timbre, delivery patterns, and inflections for realistic cloning.
  • Two-stage verification with real-time phrase reading and speaker embedding matching prevents misuse from pre-existing recordings.
  • Pricing stays at standard API rates, with no extra fee for cloning: $4.20 per million characters for TTS and $0.05 per minute for voice agents.
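The two published rates make cost estimates straightforward arithmetic. Below is a minimal estimator, assuming simple linear proration and rounding to the nearest cent; xAI's actual billing granularity (for example, per-second versus per-minute metering) is not stated, and the function names are illustrative only. `Decimal` is used so currency amounts avoid floating-point drift.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates as reported in the announcement (USD); cloning adds no separate fee.
TTS_USD_PER_MILLION_CHARS = Decimal("4.20")
VOICE_AGENT_USD_PER_MINUTE = Decimal("0.05")
CENT = Decimal("0.01")


def tts_cost(characters: int) -> Decimal:
    """Estimated Text-to-Speech cost for a number of input characters."""
    raw = TTS_USD_PER_MILLION_CHARS * characters / Decimal(1_000_000)
    return raw.quantize(CENT, rounding=ROUND_HALF_UP)


def voice_agent_cost(minutes) -> Decimal:
    """Estimated Voice Agent API cost for a session length in minutes
    (assumes linear proration; actual metering may differ)."""
    raw = VOICE_AGENT_USD_PER_MINUTE * Decimal(str(minutes))
    return raw.quantize(CENT, rounding=ROUND_HALF_UP)


print(tts_cost(250_000))     # 1.05
print(voice_agent_cost(90))  # 4.50
```

One hour of voice-agent time works out to 60 × $0.05 = $3.00, matching the hourly figure quoted above.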

Why It Matters

Developers now have a secure, cost-effective way to build personalized voice agents, a capability that could reshape customer support and in-car assistants such as Grok.