New open-weight models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B
Russian tech giant Sber releases two powerful open-weight models under MIT license, challenging DeepSeek and Qwen.
Sber, Russia's largest tech and banking conglomerate, has made a significant move in the open-source AI landscape by releasing the weights for its GigaChat-3.1 model family under the permissive MIT license. The release includes two distinct models: the massive GigaChat-3.1-Ultra, a 702B-parameter Mixture of Experts (MoE) model with 36B active parameters, and the compact GigaChat-3.1-Lightning, a 10B-parameter MoE model with 1.8B active parameters. Both models were pretrained from scratch on Sber's own hardware and data rather than derived from existing models such as DeepSeek, and they are optimized for English and Russian while supporting 14 languages.
Benchmark results show GigaChat-3.1-Ultra outperforming competitors such as DeepSeek-V3-0324 and Qwen3-235B in several key areas, including a 0.7639 score on the BFCL tool-calling benchmark and strong results on math and coding tasks. The model targets high-resource environments, requiring three HGX instances to run. GigaChat-3.1-Lightning, by contrast, is engineered for local inference: it matches the speed of the tiny Qwen3-1.7B while outperforming larger models such as Qwen3-4B-Instruct and Gemma-3-4B on several benchmarks, thanks to native FP8 training and the efficient DeepSeekV3 architecture. It also supports a 256K context window.
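For readers who want to try Lightning locally, here is a minimal sketch using the Hugging Face transformers library. The repo ID below is an assumption (check Sber's actual Hugging Face organization for the published model card), and the checkpoint may require a recent transformers release with DeepSeek-V3 architecture support.

```python
# Minimal local-inference sketch for GigaChat-3.1-Lightning.
# NOTE: the repo ID is a hypothetical placeholder, not a confirmed path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-sage/GigaChat-3.1-Lightning"  # hypothetical repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # requires the accelerate package
)

# A bilingual prompt, since the model is optimized for English and Russian.
messages = [{"role": "user", "content": "Переведи на английский: привет, мир!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With only 1.8B active parameters per token, a MoE model of this size should fit comfortably on a single consumer GPU in FP8 or BF16, which is the point of the Lightning variant.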
This release directly challenges the dominance of other open-weight leaders and provides a powerful, commercially usable alternative that is particularly strong in CIS languages. The models' strong tool-calling capabilities (Lightning scores 0.76 on BFCLv3) make them practical for building AI agents, as illustrated in the sketch below. By open-sourcing these weights, Sber aims to bolster the ecosystem, provide a high-quality native model for Russian and related languages, and give developers and researchers access to state-of-the-art architecture without restrictive licensing.
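As a hedged illustration of that agent use case, the sketch below calls the model through an OpenAI-compatible endpoint, such as a local vLLM server. The base_url, served model name, and the get_weather tool are all hypothetical placeholders, not values documented by Sber.

```python
# Tool-calling sketch against an OpenAI-compatible endpoint (e.g. local vLLM).
# The base_url, model name, and tool are assumptions for illustration only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="GigaChat-3.1-Lightning",  # hypothetical served-model name
    messages=[{"role": "user", "content": "What's the weather in Kazan?"}],
    tools=tools,
)

# If the model decides to invoke the tool, the structured call appears here;
# BFCL measures exactly this kind of function-call accuracy.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```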
- GigaChat-3.1-Ultra is a 702B parameter MoE model that beats DeepSeek-V3-0324 and Qwen3-235B on aggregate benchmarks, scoring 0.7639 on BFCL for tool calling.
- GigaChat-3.1-Lightning is a 10B MoE model that runs as fast as Qwen3-1.7B but outperforms 4B-class models, featuring a 256K context window and native FP8 efficiency.
- Both models are released as open weights under the MIT license, pretrained from scratch by Sber, and optimized for English and Russian while supporting 14 languages.
Why It Matters
Provides powerful, commercially usable alternatives to closed models and strengthens the open-source ecosystem, especially for Russian and other CIS-language AI development.