Stability AI launches Stable Audio 3 with open-weight text-to-audio models

r/StableDiffusion May 21, 2026

⚡Generate music and SFX up to 6 minutes in seconds using open-weight AI

Deep Dive

Stability AI has announced Stable Audio 3, a new family of open-weight text-to-audio models for generating music and sound effects. The release includes three models: Stable Audio 3 Small Music, Small SFX, and Medium. The Medium model stands out by producing audio up to 6 minutes and 20 seconds long, with inference taking only seconds on NVIDIA GPUs. The two Small models each specialize in either music or sound effects, generate up to 2 minutes of audio, and are optimized to run efficiently on CPUs. All models are available on Hugging Face, alongside a dedicated GitHub repository that provides inference scripts and support for LoRA fine-tuning, so developers can customize outputs.
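The repository is described as supporting LoRA fine-tuning. As a generic illustration of that technique only, here is a minimal plain-Python sketch of a LoRA-adapted linear layer: a frozen weight matrix W plus a trainable low-rank update B·A, scaled by alpha/r. The names `LoRALinear`, `matvec`, and `lora_param_count` are hypothetical. This is not code from the Stable Audio 3 repository, whose actual fine-tuning scripts and settings may differ.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_param_count(d_out, d_in, rank):
    """Trainable values in a rank-r adapter: A is r x d_in, B is d_out x r."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Hypothetical sketch of LoRA: y = W x + (alpha / r) * B (A x).

    W stays frozen; only the low-rank factors A and B would be trained.
    Not taken from the Stable Audio 3 codebase.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight  # frozen pretrained weight, d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A gets small random values; B starts at zero, so the adapter
        # contributes nothing until training updates B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)                # frozen path
        update = matvec(self.B, matvec(self.A, x))   # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: with B still zero, the adapted layer reproduces W x exactly.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(W, rank=1, alpha=2.0)
print(layer.forward([1.0, 2.0, 3.0]))  # [7.0, -1.0]
```

Because B starts at zero, fine-tuning begins from the pretrained behavior, and only A and B are updated. For a 1024×1024 layer at rank 8, that is 16,384 trainable values instead of 1,048,576.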

The models are released under the Stability AI Community License, which makes them free for personal and creative use, with no royalties or ownership claims on outputs. Stability AI also published two academic papers detailing the model architecture and the new SAME autoencoder. This open approach lets artists and developers experiment with AI-generated audio and integrate it into their projects quickly. A demo is available at stableaudio.com. By combining speed, long outputs, and open access, Stable Audio 3 positions itself as a versatile tool for creators who want to generate custom audio from text prompts without licensing hurdles.

Key Points

Medium model generates up to 6 minutes 20 seconds of audio in seconds on NVIDIA GPUs
Small models (Music and SFX) generate up to 2 minutes, optimized for CPU inference
Open-weights release under Community License – free for personal/creative use, no royalties

Why It Matters

Democratizes AI music generation with fast, open, and royalty-free tools for creators and developers.
