JetBrains' Mellum 2 12B roughly matches Qwen 3.5 9B in coding
Small MoE model is strong at code but trails Qwen 3.5 4B on general tasks.
Deep Dive
JetBrains has released Mellum 2 12B A2.5B, a small, coding-focused Mixture-of-Experts (MoE) model. The company claims its coding performance is close to that of Qwen 3.5 9B (the reasoning model), but that it trails Qwen 3.5 4B on everything else. The models and a technical report are available.
Key Points
- JetBrains' Mellum 2 12B A2.5B is a small MoE coding model with only 2.5B active parameters.
- Coding benchmark scores are close to Qwen 3.5 9B (reasoning), but performance on general tasks is below Qwen 3.5 4B.
- Open-source on Hugging Face; technical report on arXiv explains training and evaluations.
Why It Matters
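With only 2.5B of its 12B parameters active per token, a local coding assistant can approach the coding quality of a larger model at far less compute per token. The memory saving is less clear-cut, since all 12B weights of an MoE model still have to be stored.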
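
As a rough illustration of what "12B total, 2.5B active" means for a local setup, the sketch below estimates weight memory and per-token compute. It is back-of-the-envelope arithmetic, not a measurement: the function names are hypothetical, the estimate covers weights only (no KV cache or runtime overhead), the "2 FLOPs per active parameter per token" figure is a common rule of thumb, and the quantization levels are illustrative rather than formats JetBrains is known to ship.

```python
# Back-of-the-envelope sizing for an MoE model with 12B total / 2.5B active
# parameters. All helper names here are illustrative assumptions.

TOTAL_PARAMS_B = 12.0   # total parameters, in billions (from the model name)
ACTIVE_PARAMS_B = 2.5   # parameters active per token, in billions


def weight_memory_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30


def forward_flops_per_token(active_params_billion: float) -> float:
    """Rule of thumb: roughly 2 FLOPs per active parameter per token."""
    return 2 * active_params_billion * 1e9


# Memory scales with TOTAL parameters: every expert must be resident
# (or offloaded), even though only a few are used for any given token.
for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gib = weight_memory_gib(TOTAL_PARAMS_B, bytes_per_param)
    print(f"{label:>6} weights: ~{gib:.1f} GiB")

# Compute scales with ACTIVE parameters.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"active fraction per token: {active_fraction:.1%}")
print(f"forward FLOPs per token:   ~{forward_flops_per_token(ACTIVE_PARAMS_B):.1e}")
```

The point of the sketch is the asymmetry: per-token compute follows the 2.5B active parameters, while the memory footprint follows the full 12B.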