ByteDance’s next-gen AI model can generate clips based on text, images, audio, and video
TikTok's parent company just dropped a multi-modal video bomb that's terrifying Hollywood.
ByteDance has launched Seedance 2.0, a next-gen AI video model that can generate 15-second clips with audio by combining up to nine images, three video clips, and three audio prompts. Uniquely, it accounts for camera movement, visual effects, and motion when composing complex, multi-subject scenes. The model, available on ByteDance's Dreamina platform, is already being used to create hyper-realistic videos featuring celebrities and copyrighted characters, sparking industry concern.
Why It Matters
This multi-modal leap directly challenges OpenAI's Sora and could democratize high-quality video production, disrupting film and content creation.