Hebatron: Hebrew MoE Model Activates 3B Params, Rivals 27B Models

arXiv cs.CL May 13, 2026

⚡ Open-weight model scores 73.8% on Hebrew reasoning with roughly 9x the inference throughput of comparable dense models.

Deep Dive

Noam Kayzer and colleagues have released Hebatron, the first open-weight Hebrew-specialized language model built on the NVIDIA Nemotron-3 sparse Mixture-of-Experts (MoE) architecture. The model has 30 billion total parameters but activates only 3 billion per token, enabling approximately 9 times higher inference throughput than dense models of equivalent quality. Its native context length is 65,536 tokens. Training used a three-phase easy-to-hard curriculum with continuous anti-forgetting anchoring, then supervised fine-tuning on 2 million bilingual Hebrew–English samples. The curriculum ordering alone accounted for a 3-point gain in the aggregate benchmark score over the reversed ordering.
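To make the "30B total, 3B active" idea concrete, here is a minimal sketch of generic top-k expert routing, the mechanism sparse MoE layers commonly use to run only a few experts per token. It is an illustration under stated assumptions, not Hebatron's or Nemotron-3's actual router: the expert count (8), the value of k (2), and the function names `top_k_route` and `active_fraction` are all hypothetical.

```python
import math
import random


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights come from a softmax over the selected experts only,
    so they sum to 1. This is a generic scheme, assumed for illustration.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(total_params, active_params):
    """Share of the model's parameters used for a single token."""
    return active_params / total_params


# Toy router scores for one token over 8 hypothetical experts.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(8)]
routes = top_k_route(logits, k=2)
print(routes)

# Figures from the article: 3B of 30B parameters active per token.
print(active_fraction(30e9, 3e9))
```

With the article's figures, one tenth of the parameters are active per token. That fraction alone does not determine throughput, which also depends on routing overhead, memory bandwidth, and batching, so the reported ~9x figure should be read as a measured result rather than something derived from this ratio.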

Hebatron achieves a Hebrew reasoning average of 73.8%, surpassing DictaLM-3.0-24B-Thinking (68.9%) and remaining competitive with Gemma-3-27B-IT on tasks such as GSM8K-HE and Israeli Trivia. It is the first language-specific adaptation of the Nemotron-3 architecture and the first open-weight Hebrew-specialized MoE model with native long-context support. The weights are openly released to support further research in Hebrew and Semitic-language NLP.

Key Points

30B total parameters with only 3B activated per token, yielding ~9x inference throughput vs. dense models.
73.8% Hebrew reasoning average beats DictaLM-3.0-24B-Thinking (68.9%) and is competitive with Gemma-3-27B-IT on benchmarks such as GSM8K-HE and Israeli Trivia.
First open-weight Hebrew MoE model with 65,536-token native context and public weights.

Why It Matters

Opens efficient, high-performance Hebrew NLP for research and production, bridging language gaps in AI.
