Open Source

AIDC-AI's Ovis2.6-80B-A3B delivers MoE multimodal with 3B active parameters

r/LocalLLaMA May 13, 2026

⚡ Only ~3B active parameters out of 80B total, a 64K-token context window, and active visual reasoning

Deep Dive

AIDC-AI has introduced Ovis2.6-80B-A3B, the latest multimodal large language model in the Ovis series. Its key change is a Mixture-of-Experts (MoE) architecture that scales the total parameter count to 80B while activating only about 3B parameters per inference. Because only a small fraction of the weights takes part in each forward pass, the compute required is far lower than for a dense 80B model, which reduces serving cost and raises throughput. The model builds on Ovis2.5, upgrading the LLM backbone to MoE to gain knowledge capacity without the computational overhead of a dense 80B model.
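The routing idea behind the paragraph above can be illustrated with a toy example. This is a minimal sketch of generic top-k MoE routing, not Ovis2.6's actual implementation: the number of experts, the value of k, the gate logits, and the function names (top_k_route, moe_forward) are all assumptions made for illustration. Only the 80B total and ~3B active figures come from the article.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, gate_logits, k):
    """Run only the selected experts and mix their outputs by routing weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Hypothetical toy setup: 4 scalar "experts", 2 active per input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, 0.3, 1.5]  # made-up router scores for one input

routes = top_k_route(gate_logits, k=2)
y = moe_forward(1.0, experts, gate_logits, k=2)

# Share of parameters active per inference, from the article's rounded figures.
active_fraction = 3 / 80

print("selected experts:", [i for i, _ in routes])
print(f"active fraction: {active_fraction:.2%}")
```

With the article's rounded figures, roughly 3.75% of the parameters are active per inference. Note that in MoE designs generally, the inactive experts still have to be stored, so the saving is in compute rather than in weight storage.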

Ovis2.6 supports a 64K-token context window and images up to 2880×2880 pixels, enabling detailed analysis of high-resolution documents and long-form visual content. A standout feature is 'Think with Image', which turns vision from a passive input into an active workspace: during chain-of-thought reasoning, the model can invoke visual tools such as cropping and rotation to re-examine image regions, enabling multi-turn, self-reflective reasoning. The model also improves on OCR, document understanding, and chart/diagram analysis, positioning it for enterprise document intelligence and complex visual QA tasks.
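To make the 'Think with Image' idea more concrete, the sketch below shows what a sequence of crop and rotate tool calls does to an image. The article does not describe the model's real tool interface, so the tool names, argument names, and the apply_tool_calls helper are hypothetical, and a small grid of integers stands in for pixel data.

```python
def crop(image, left, top, right, bottom):
    """Return the sub-grid covering rows top..bottom-1 and columns left..right-1."""
    return [row[left:right] for row in image[top:bottom]]

def rotate90_cw(image):
    """Rotate a grid 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

# Hypothetical tool registry; the real tool names and arguments are not given in the article.
TOOLS = {"crop": crop, "rotate90_cw": rotate90_cw}

def apply_tool_calls(image, calls):
    """Apply a sequence of (tool_name, kwargs) calls, feeding each result into the next."""
    for name, kwargs in calls:
        image = TOOLS[name](image, **kwargs)
    return image

# A 3x4 toy "image" whose pixel values are just their row-major index.
image = [[r * 4 + c for c in range(4)] for r in range(3)]

# Calls a reasoning chain might emit: zoom into a region, then reorient it.
calls = [
    ("crop", {"left": 1, "top": 0, "right": 3, "bottom": 2}),
    ("rotate90_cw", {}),
]
result = apply_tool_calls(image, calls)
print(result)  # [[5, 1], [6, 2]]
```

In a real system the transformed region would presumably be fed back to the model as a new visual input for the next reasoning step; that loop is omitted here.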

Key Points

MoE architecture with 80B total parameters activates only ~3B per inference, balancing performance and cost.
Supports 64K token context and up to 2880×2880 image resolution for high-detail visual tasks.
Introduces 'Think with Image' – active visual tools like cropping/rotation within the reasoning chain for complex visual analysis.

Why It Matters

Ovis2.6-80B-A3B offers cost-effective multimodal reasoning with high resolution and long context for enterprise document and image analysis.
