Research & Papers

Hebatron: Hebrew MoE Model Activates 3B Params, Rivals 27B Models

Open-weight model reaches a 73.8% Hebrew reasoning average with roughly 9x the inference throughput of comparable dense models.

Deep Dive

Noam Kayzer and colleagues have released Hebatron, the first open-weight language model specialized for Hebrew that is built on the NVIDIA Nemotron-3 sparse Mixture-of-Experts (MoE) architecture. The model has 30 billion total parameters but activates only 3 billion per token, giving approximately 9 times the inference throughput of dense models of equivalent quality. Its native context length is 65,536 tokens. Training followed a three-phase easy-to-hard curriculum with continuous anti-forgetting anchoring, then supervised fine-tuning on 2 million bilingual Hebrew–English samples. The curriculum ordering alone accounted for a 3-point gain in aggregate benchmark score over a reversed ordering.
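To make the "30B total, 3B active" idea concrete, here is a minimal sketch of top-k expert routing, the general mechanism sparse MoE layers use to run only a few experts per token. It is not Hebatron's or Nemotron-3's actual router: the expert count, the value of k, and the scalar toy "experts" are all assumptions chosen for illustration, and the helper names (top_k_route, moe_forward) are hypothetical.

```python
import math
import random


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k):
    """Combine the outputs of only the k selected experts for one token."""
    out = 0.0
    for idx, weight in top_k_route(router_logits, k):
        out += weight * experts[idx](x)  # the other experts are never called
    return out


# Hypothetical toy setup (not the real model's configuration):
# 8 scalar "experts", 2 of them active per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = top_k_route(logits, TOP_K)
y = moe_forward(1.0, experts, logits, TOP_K)

# Figures reported in the article: 3B of 30B parameters active per token.
active_fraction = 3e9 / 30e9
print(f"experts used: {[i for i, _ in routes]}, active fraction: {active_fraction:.0%}")
```

The active-parameter fraction here is simple arithmetic on the reported figures. The roughly 9x throughput number is the article's reported result and does not follow from that ratio alone, since real throughput also depends on routing overhead, memory bandwidth, and batching.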

Hebatron achieves a Hebrew reasoning average of 73.8%, surpassing DictaLM-3.0-24B-Thinking (68.9%) and remaining competitive with Gemma-3-27B-IT on tasks such as GSM8K-HE and Israeli Trivia. It is the first adaptation of the Nemotron-3 architecture to a specific target language, and the first open-weight Hebrew-specialized MoE model with native long-context support. The weights are released openly to support further research in Hebrew and Semitic-language NLP, and the model offers a practical balance of efficiency and performance.

Key Points
  • 30B total parameters with only 3B activated per token, yielding ~9x the inference throughput of dense models of equivalent quality.
  • 73.8% Hebrew reasoning average beats DictaLM-3.0-24B-Thinking (68.9%) and is competitive with Gemma-3-27B-IT on tasks such as GSM8K-HE and Israeli Trivia.
  • First open-weight Hebrew-specialized MoE model with a native 65,536-token context.

Why It Matters

Hebatron makes efficient, high-performing Hebrew NLP practical for both research and production, helping to narrow language gaps in AI.