Open Source

Ming-flash-omni-2.0: 100B MoE (6B active) omni-modal model - unified speech/SFX/music generation

A single AI model that can see, hear, and create across all major media formats.

Deep Dive

Ant Group has open-sourced Ming-flash-omni-2.0, a 100B-parameter Mixture-of-Experts model with 6B active parameters. It's a true omni-modal model, accepting image, text, video, and audio inputs and generating image, text, and audio outputs within a single unified architecture, including unified speech, sound-effect, and music generation.

Why It Matters

This moves us closer to a single, general-purpose AI that can seamlessly understand and create across all major media types. A model that handles every modality in one architecture could replace pipelines that chain separate specialized models, simplifying complex multimodal workflows.