JetBrains' Mellum2: 12B MoE Model with 2.5B Active Params, 2x Faster Inference

Hugging Face Blog June 01, 2026

⚡ Open-source MoE model activates only 2.5B of its 12B parameters per token for fast code and NLP tasks.

Deep Dive

JetBrains has released Mellum2, a 12B-parameter Mixture-of-Experts (MoE) language model trained from scratch on both natural language and code. The model activates only 2.5B parameters per token, so each token costs far less compute than it would in a dense model of similar size. In benchmarks spanning code generation, reasoning, science, and math, Mellum2 is competitive with similarly sized open models while achieving more than 2x faster inference. Its Apache 2.0 license makes it suitable for private deployments and self-hosted environments.
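To make "active parameters" concrete, here is a minimal sketch of top-k expert routing, the general idea behind MoE layers. It is a toy illustration, not Mellum2's actual architecture: the expert count (8), the number of active experts (2), the scalar "experts", and the gate scores are all assumptions chosen for readability. Only the 12B and 2.5B figures come from the article.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    routes = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]

# Figures reported in the article.
TOTAL_PARAMS = 12e9
ACTIVE_PARAMS = 2.5e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # roughly 0.21

# Hypothetical toy layer: 8 scalar "experts", 2 active per token.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
output, used = moe_forward(1.0, experts, gate_logits, k=2)

print(f"active fraction: {active_fraction:.1%}")
print(f"experts used for this token: {used}")
```

In this sketch the six unselected experts are never called, which is where the per-token compute saving comes from. All 12B weights still have to be held in memory, so the ratio of active to total parameters should not be read as a direct prediction of the reported speedup.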

Mellum2 is purpose-built for latency-sensitive AI workflows such as routing and orchestration in multi-model systems, RAG pipeline processing, and sub-agent tasks like planning and validation. By focusing solely on text and code rather than multimodal capabilities, JetBrains keeps the model compact and deployable. It fills a niche as a "focal" model: fast, efficient, and well-scoped for high-frequency tasks inside larger AI architectures, reducing reliance on expensive frontier models for intermediate steps.
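The routing role described above can be sketched as a small dispatcher: a fast model labels each request, and the label selects which downstream handler runs. Everything here is hypothetical. `fake_classify` is a keyword-based stand-in for a call to a small model, and the handler names and labels are invented for illustration; none of it reflects an actual JetBrains or Mellum2 API.

```python
from typing import Callable, Dict

def dispatch(request: str,
             classify: Callable[[str], str],
             handlers: Dict[str, Callable[[str], str]],
             default: str) -> str:
    """Ask the router for a label, then call the matching handler."""
    label = classify(request).strip().lower()
    # Unknown labels fall back to the default handler instead of failing.
    handler = handlers.get(label, handlers[default])
    return handler(request)

# Hypothetical stand-in for a call to a small, fast router model.
def fake_classify(request: str) -> str:
    text = request.lower()
    return "code" if "def " in text or "bug" in text else "general"

# Hypothetical downstream handlers; real ones would call other models or tools.
handlers = {
    "code": lambda r: f"code-handler: {r}",
    "general": lambda r: f"general-handler: {r}",
}

result = dispatch("Fix the bug in parse()", fake_classify, handlers,
                  default="general")
print(result)
```

Because the router is called on every request, its latency is paid on every request, which is why a model with a small active parameter count is a natural fit for this step.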

Key Points

12B total parameters, but only 2.5B active per token, yielding over 2x faster inference than similarly sized open models.
Released under the Apache 2.0 license, optimized for routing, RAG, sub-agents, and private deployment.
Competitive benchmark performance on code, reasoning, science, and math tasks despite a compact active parameter count.

Why It Matters

Professionals can deploy this open, fast MoE model for latency-critical coding and NLP tasks without needing a massive inference budget.
