Open Source

Ant Group's 100B MoE Model Unifies Text, Image, Video, and Audio in One Architecture

A single AI model that can see, hear, and create text, images, and audio.

Deep Dive

Ant Group has open-sourced Ming-flash-omni-2.0, a 100B-parameter Mixture-of-Experts (MoE) model with 6B active parameters, meaning only about 6B of its weights are used to process each token. It is an omni-modal model: it accepts image, text, video, and audio inputs and generates image, text, and audio outputs within a single unified architecture.
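The gap between 100B total and 6B active parameters comes from MoE routing: a small gating network scores every expert for each token, and only the top-scoring few are actually run. The sketch below is a generic toy illustration of top-k routing, not Ming-flash-omni-2.0's implementation; the article does not describe its router, expert count, or k, so the 8 scalar "experts", the gate scores, and k=2 are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in ranked])
    return list(zip(ranked, weights))

def moe_forward(x: float, experts: Sequence[Callable[[float], float]],
                gate_logits: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Weighted sum of the selected experts only; unselected experts are never called."""
    routes = top_k_route(gate_logits, k)
    y = sum(w * experts[i](x) for i, w in routes)
    return y, [i for i, _ in routes]

# Hypothetical toy setup: 8 experts, each one just scales its input by a constant.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical gate scores for one token (a real router computes these from the token).
gate_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
y, used = moe_forward(1.0, experts, gate_logits, k=2)
print(used)  # [4, 1]: only 2 of 8 experts ran for this token

# Ratio from the reported figures: 6B active out of 100B total.
TOTAL_PARAMS_B, ACTIVE_PARAMS_B = 100, 6
print(f"{ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.0%} of parameters active per token")  # 6%
```

In a real MoE model, some components (such as attention layers and embeddings) are typically shared and always active, so the "active" count usually includes them alongside the routed experts.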

Why It Matters

This moves us closer to a single, general-purpose AI that can understand and create across all major media types, potentially simplifying multi-modal workflows that today often chain separate models for vision, speech, and generation.
