Open Source

JetBrains' Mellum 2 12B matches Qwen 3.5 9B in coding

r/LocalLLaMA June 01, 2026

⚡ Small MoE model excels at code but trails Qwen 3.5 4B on general tasks.

Deep Dive

JetBrains released Mellum 2 12B A2.5B, a small, coding-focused Mixture-of-Experts (MoE) model. The company claims its coding performance is close to that of Qwen 3.5 9B (the reasoning model), but that it falls behind Qwen 3.5 4B on everything else. The models and a technical report are available.

Key Points

JetBrains' Mellum 2 12B A2.5B is a small MoE coding model with only 2.5B active parameters.
Coding benchmark scores are near those of Qwen 3.5 9B (reasoning), but general-task performance is worse than Qwen 3.5 4B.
Released as open source on Hugging Face; a technical report on arXiv explains training and evaluations.
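
To make the "12B A2.5B" naming concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the JetBrains report. It assumes 16-bit weights (2 bytes per parameter), ignores the KV cache and activations, and uses the common rule of thumb of about 2 FLOPs per active parameter per generated token. The function names are hypothetical, and the only inputs are the parameter counts quoted above.

```python
# Rough estimate only: assumes 2 bytes per weight (16-bit) and
# ~2 FLOPs per active parameter per token. Not figures from the report.

def weight_memory_gb(total_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    return total_params * bytes_per_param / 1e9


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2.0 * active_params


# Parameter counts as quoted in the summary.
MELLUM_TOTAL = 12e9    # all experts are stored
MELLUM_ACTIVE = 2.5e9  # parameters used per token
DENSE_9B = 9e9         # a dense model uses every parameter per token

if __name__ == "__main__":
    print("MoE weights (GB):  ", weight_memory_gb(MELLUM_TOTAL))
    print("Dense weights (GB):", weight_memory_gb(DENSE_9B))
    ratio = forward_flops_per_token(DENSE_9B) / forward_flops_per_token(MELLUM_ACTIVE)
    print("Dense / MoE FLOPs per token:", ratio)
```

Under these assumptions, weight memory follows the total parameter count while per-token compute follows the active count, so the MoE model's advantage shows up mainly in compute per token.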

Why It Matters

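Local coding assistants can now approach the coding quality of a larger dense model while activating only 2.5B parameters per token, which cuts per-token compute. All 12B parameters still have to be held in memory.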
