AIDC-AI's Ovis2.6-80B-A3B delivers MoE multimodal reasoning with about 3B active parameters
Only about 3B active parameters in an 80B-parameter model, plus a 64K context window and active visual reasoning
AIDC-AI has introduced Ovis2.6-80B-A3B, the latest multimodal large language model in the Ovis series. Its key change is a Mixture-of-Experts (MoE) architecture that scales the total parameter count to 80B while activating only about 3B parameters for each token processed. Because per-token compute tracks the active parameters rather than the total, this design lowers serving cost and raises throughput relative to a dense model of the same size, making high-capacity multimodal reasoning more accessible. The model builds on Ovis2.5 and upgrades the LLM backbone to MoE, adding knowledge capacity without the compute overhead of a dense 80B model.
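To make the "total versus active" distinction concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a generic illustration of the MoE idea, not Ovis2.6's implementation: the expert count, the value of k, the toy scalar "experts", and the names `route_top_k` and `moe_layer` are all assumptions chosen for readability. Only the 3B and 80B figures come from the article.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_logits, k):
    """Run only the selected experts and return their weighted sum."""
    chosen = route_top_k(router_logits, k)
    out = sum(w * experts[i](x) for i, w in chosen)
    return out, [i for i, _ in chosen]

# Hypothetical sizes for illustration; not Ovis2.6's real configuration.
NUM_EXPERTS, TOP_K = 16, 2
# Each toy expert is a scalar function standing in for a feed-forward block.
experts = [lambda x, s=s: x * (s + 1) for s in range(NUM_EXPERTS)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
y, used = moe_layer(1.0, experts, logits, TOP_K)

# Ratio quoted in the article: about 3B active out of 80B total.
active_fraction = 3e9 / 80e9
print(f"experts evaluated for this token: {len(used)} of {NUM_EXPERTS}")
print(f"active fraction of parameters: {active_fraction:.2%}")  # 3.75%
```

The experts that were not selected are never evaluated for this token, which is where the compute saving comes from. All expert weights still have to be stored, so the saving applies to per-token compute rather than to model size.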
Ovis2.6 also adds a 64K-token context window and support for images up to 2880×2880 pixels, enabling detailed analysis of high-resolution documents and long-form visual content. A standout feature is 'Think with Image,' which turns vision from a passive input into an active cognitive workspace: during chain-of-thought reasoning, the model can invoke visual tools such as cropping and rotation to re-examine image regions, supporting multi-turn, self-reflective reasoning. The model also delivers improved performance in OCR, document understanding, and chart/diagram analysis, making it a strong contender for enterprise-grade document intelligence and complex visual QA tasks.
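The tool-use loop behind 'Think with Image' can be pictured with a small sketch. This is an illustration of the general pattern only: the tool names, the call format, and the `apply_tool_calls` helper are hypothetical and are not Ovis2.6's actual interface. The image is a tiny grid of numbers, and the tool calls are hard-coded where a real system would have the model emit them mid-reasoning and then look at the resulting view.

```python
def crop(image, top, left, height, width):
    """Return the height x width sub-grid whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def rotate90(image):
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

# Hypothetical tool registry; names are illustrative only.
TOOLS = {"crop": crop, "rotate90": rotate90}

def apply_tool_calls(image, calls):
    """Apply (tool_name, kwargs) calls in order and keep every intermediate view."""
    views = [image]
    for name, kwargs in calls:
        views.append(TOOLS[name](views[-1], **kwargs))
    return views

# A 4x4 "image" with a region of interest in the middle.
image = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]

# Stand-in for tool calls a model might emit during its reasoning chain.
calls = [
    ("crop", {"top": 1, "left": 1, "height": 2, "width": 2}),
    ("rotate90", {}),
]

views = apply_tool_calls(image, calls)
print(views[1])  # [[1, 2], [3, 4]]  cropped region
print(views[2])  # [[3, 1], [4, 2]]  cropped region rotated clockwise
```

Keeping every intermediate view mirrors the multi-turn aspect: each new view can be inspected before deciding on the next tool call.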
- MoE architecture with 80B total parameters activates only ~3B per token, balancing performance and cost.
- Supports a 64K-token context and image resolutions up to 2880×2880 for high-detail visual tasks.
- Introduces 'Think with Image': active visual tools such as cropping and rotation within the reasoning chain for complex visual analysis.
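For the resolution limit in the bullets above, a caller may want to check whether an input image already fits before sending it. The sketch below assumes the 2880 figure applies to each side independently and that downscaling with a preserved aspect ratio is acceptable. Whether the model's own preprocessing resizes oversized images is not stated here, and `fit_within` is a hypothetical helper name.

```python
MAX_SIDE = 2880  # per-side limit quoted in the article (assumed to apply to each side)

def fit_within(width, height, max_side=MAX_SIDE):
    """Scale (width, height) down to fit in max_side x max_side, keeping aspect ratio.

    Images that already fit are returned unchanged; nothing is ever upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, int(width * scale)), max(1, int(height * scale))

print(fit_within(1920, 1080))  # (1920, 1080)  already within the limit
print(fit_within(5760, 3840))  # (2880, 1920)  halved to fit the longer side
```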
Why It Matters
Cost-effective multimodal reasoning with high resolution and long context for enterprise document and image analysis.