Open Source

NVIDIA's Nemotron-3 Ultra delivers 550B parameters, 1M context, and open weights

NVIDIA's open model packs 550B total parameters with only 55B active for efficient frontier reasoning.

Deep Dive

NVIDIA released Nemotron-3-Ultra-550B-A55B-BF16, a frontier-scale open LLM with 550B total parameters, of which 55B are active. It uses a LatentMoE hybrid architecture (Mamba-2 + MoE + Attention) with Multi-Token Prediction for faster generation. It supports up to 1M tokens of context and is built for frontier reasoning, complex agentic workflows, long-context analysis, tool use, multilingual reasoning, and high-stakes RAG. The model is available under the OpenMDW License for commercial and non-commercial use. Minimum GPU requirements: 8x GB200/B200/GB300/B300, 16x H100, or 8x H200.

Key Points
  • 550B total parameters with only 55B active, using a LatentMoE hybrid architecture (Mamba-2 + MoE + Attention) with Multi-Token Prediction.
  • Supports up to 1 million tokens of context and a configurable reasoning mode for step-by-step traces.
  • Requires 8x GB200/B200/GB300/B300, 16x H100, or 8x H200 GPUs; available under the OpenMDW License for commercial and non-commercial use.
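
The parameter counts and GPU minimums above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes the BF16 checkpoint stores every parameter in 2 bytes, and the per-GPU memory figures (80 GB for H100, 141 GB for H200, 192 GB for B200) are nominal values assumed here, not numbers from the release. It counts weights only and ignores KV cache, Mamba state, activations, and runtime overhead.

```python
# Rough weight-memory estimate for a 550B-parameter BF16 checkpoint.
# Parameter counts come from the article; GPU memory sizes are assumptions.

BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each value in 16 bits

TOTAL_PARAMS = 550e9   # total parameters (from the article)
ACTIVE_PARAMS = 55e9   # active parameters (from the article)


def weight_footprint_gb(num_params: float,
                        bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return the size of the raw weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Share of the model that is active at once.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Memory needed just to hold the weights.
weights_gb = weight_footprint_gb(TOTAL_PARAMS)

# Hypothetical configurations: (GPU count, assumed nominal GB per GPU).
ASSUMED_CONFIGS = {
    "16x H100": (16, 80),
    "8x H200": (8, 141),
    "8x B200": (8, 192),
}

# Memory left over after loading weights, per configuration.
headroom_gb = {
    name: count * gb_per_gpu - weights_gb
    for name, (count, gb_per_gpu) in ASSUMED_CONFIGS.items()
}

if __name__ == "__main__":
    print(f"Active fraction: {active_fraction:.0%}")
    print(f"BF16 weights:    {weights_gb:.0f} GB")
    for name, spare in headroom_gb.items():
        print(f"{name}: {spare:.0f} GB left for cache and activations")
```

Under these assumptions the weights alone come to roughly 1.1 TB, which suggests why even the smallest listed configurations are multi-GPU nodes. Actual requirements depend on the serving stack, parallelism strategy, and context length in use.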

Why It Matters

Open weights let enterprises deploy frontier-scale reasoning for complex agents and long-context tasks without vendor lock-in.