feat: Add Mimo v2.5 model support by AesSedai · Pull Request #22493 · ggml-org/llama.cpp
A 310B-parameter model with only 15B active per token and full multimodal input.
Deep Dive
Xiaomi's MiMo V2.5 is a sparse Mixture-of-Experts (MoE) model with 310B total parameters, of which only 15B are activated per token. It supports up to 1M tokens of context and accepts text, image, video, and audio input, with dedicated encoders for vision (a 729M-parameter ViT) and audio (a 261M-parameter transformer), plus a 329M-parameter multi-token prediction module.
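For readers new to sparse MoE: the per-token sparsity comes from a router that scores every expert for each token but dispatches the token to only the top-k of them, so compute scales with k rather than with the total expert count. The sketch below illustrates top-k routing in C++ with made-up sizes (8 experts, k = 2); it is not MiMo's actual router or llama.cpp's implementation.

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Pick the k highest-scoring experts for one token. Only these experts
// run; the rest of the model's parameters stay idle for this token.
std::vector<int> top_k_experts(const std::vector<float> & router_logits, int k) {
    std::vector<int> idx(router_logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    // Partially sort so the k best expert indices come first.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return router_logits[a] > router_logits[b]; });
    idx.resize(k);
    return idx;
}

int main() {
    // Hypothetical router output for one token over 8 experts.
    std::vector<float> logits = {0.1f, 2.3f, -0.5f, 1.7f, 0.0f, 3.1f, -1.2f, 0.9f};
    for (int e : top_k_experts(logits, /*k=*/2)) {
        std::printf("run expert %d (score %.1f)\n", e, logits[e]);
    }
    return 0;
}
```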
Key Points
- 310B total / 15B activated parameters: roughly 5% of the model runs per token, so per-token compute is close to that of a 15B dense model (see the back-of-envelope sketch after this list).
- Supports up to 1M tokens of context, enabling analysis of very long documents or videos.
- Includes dedicated vision (729M ViT) and audio (261M transformer) encoders for full multimodal input.
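As a rough illustration of the efficiency claim, the snippet below applies the common rule of thumb of ~2 FLOPs per parameter per forward pass; it is a back-of-envelope estimate from the stated parameter counts, not a published benchmark.

```cpp
#include <cstdio>

// Compare per-token compute for MiMo's sparse activation (15B active)
// against a hypothetical dense model with the same 310B total parameters.
int main() {
    const double total_params  = 310e9; // all experts combined
    const double active_params = 15e9;  // parameters actually run per token
    const double flops_sparse  = 2.0 * active_params; // ~2 FLOPs/param rule of thumb
    const double flops_dense   = 2.0 * total_params;
    std::printf("sparse: %.0f GFLOPs/token\n", flops_sparse / 1e9); // ~30
    std::printf("dense : %.0f GFLOPs/token\n", flops_dense  / 1e9); // ~620
    std::printf("ratio : %.1fx cheaper per token\n", flops_dense / flops_sparse); // ~20.7x
    return 0;
}
```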
Why It Matters
Brings an enterprise-grade multimodal MoE with ultra-long context to local inference, making this class of model runnable on self-hosted hardware.