Stability AI launches Stable Audio 3 with open-weight text-to-audio models
Generate music and sound effects up to 6 minutes 20 seconds long, in seconds, with open-weight models
Stability AI has announced Stable Audio 3, a new family of open-weight text-to-audio models designed for music and sound effects generation. The release includes three models: Stable Audio 3 Small Music, Small SFX, and Medium. The Medium model stands out by producing audio up to 6 minutes and 20 seconds long, with inference taking only seconds on NVIDIA GPUs. The two Small models each focus on one task, either music or sound effects; they generate up to 2 minutes of audio and are optimized to run efficiently on CPUs. All models are available on Hugging Face, alongside a dedicated GitHub repository that provides inference scripts and support for LoRA fine-tuning, so developers can customize outputs.
The models are released under the Stability AI Community License, meaning they are free for personal and creative use, with no royalties or ownership claims on outputs. Stability AI also published two academic papers detailing the model architecture and a new autoencoder called SAME. This open approach invites both artists and developers to experiment with AI-generated audio and integrate it into their projects quickly. A demo is available at stableaudio.com. By combining speed, length, and open accessibility, Stable Audio 3 positions itself as a versatile tool for creators looking to generate custom audio from text prompts without licensing hurdles.
- Medium model generates up to 6 minutes 20 seconds of audio in seconds on NVIDIA GPUs
- Small models (Music and SFX) generate up to 2 minutes, optimized for CPU inference
- Open-weight release under the Stability AI Community License – free for personal/creative use, no royalties
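
For developers deciding which model to download, the limits listed above can be written down as a small lookup. The following is a minimal Python sketch, not part of Stability AI's release: the `ModelVariant` class and `pick_variant` function are hypothetical names, and two points are assumptions rather than statements from the announcement, namely that the Medium model handles both music and sound effects and that it should only be chosen when an NVIDIA GPU is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelVariant:
    name: str
    content: frozenset      # kinds of audio the variant targets
    max_seconds: int        # longest clip quoted in the announcement
    target_hardware: str    # "cpu" or "gpu"


# Durations come from the announcement: 2 min = 120 s, 6 min 20 s = 380 s.
# ASSUMPTION: Medium covers both music and SFX and is treated as GPU-only.
VARIANTS = (
    ModelVariant("Stable Audio 3 Small Music", frozenset({"music"}), 120, "cpu"),
    ModelVariant("Stable Audio 3 Small SFX", frozenset({"sfx"}), 120, "cpu"),
    ModelVariant("Stable Audio 3 Medium", frozenset({"music", "sfx"}), 380, "gpu"),
)


def pick_variant(content: str, seconds: int, has_nvidia_gpu: bool) -> Optional[ModelVariant]:
    """Return the smallest variant that fits the request, or None if none does."""
    candidates = [
        v for v in VARIANTS
        if content in v.content
        and seconds <= v.max_seconds
        and (has_nvidia_gpu or v.target_hardware == "cpu")
    ]
    # Prefer the variant with the shortest maximum length (the Small models).
    return min(candidates, key=lambda v: v.max_seconds, default=None)


if __name__ == "__main__":
    choice = pick_variant("music", 90, has_nvidia_gpu=False)
    print(choice.name if choice else "no suitable variant")
```

Under these assumptions, a 90-second music request on a CPU-only machine maps to the Small Music model, while a five-minute request needs the Medium model and a GPU. Actual loading and generation should follow the inference scripts in the official GitHub repository.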
Why It Matters
Stable Audio 3 democratizes AI music and sound effects generation, giving creators and developers fast, open, and royalty-free tools.