Open Source

NVIDIA's Nemotron-3 Ultra delivers 550B parameters, 1M context, and open weights

r/LocalLLaMA June 04, 2026

⚡NVIDIA's open model packs 550B total parameters with only 55B active for efficient frontier reasoning.

Deep Dive

NVIDIA released Nemotron-3-Ultra-550B-A55B-BF16, a frontier-scale open LLM with 550B total parameters, of which 55B are active. It uses a LatentMoE hybrid architecture (Mamba-2 + MoE + Attention) with Multi-Token Prediction for faster generation. It supports up to 1M tokens of context and is built for frontier reasoning, complex agentic workflows, long-context analysis, tool use, multilingual reasoning, and high-stakes RAG. It is available under the OpenMDW License for commercial and non-commercial use. Minimum GPU requirements: 8x GB200/B200/GB300/B300, 16x H100, or 8x H200.

Key Points

550B total parameters with only 55B active, using a LatentMoE hybrid architecture (Mamba-2 + MoE + Attention) with Multi-Token Prediction.
Supports up to 1 million tokens of context and configurable reasoning mode for step-by-step traces.
Requires 8x GB200/B200/GB300/B300, 16x H100, or 8x H200 GPUs; available under OpenMDW License for commercial use.
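
To put the figures above in perspective, here is a rough back-of-the-envelope sketch of the active-parameter ratio and the raw BF16 weight footprint. It relies only on the numbers quoted in the key points plus the fact that BF16 uses 2 bytes per value. The even split of weights across GPUs is a simplifying assumption, not a description of how the model is actually sharded, and the sketch ignores KV cache, activations, and runtime overhead.

```python
# Back-of-the-envelope sizing from the figures quoted in the key points.
TOTAL_PARAMS = 550_000_000_000   # total parameters (from the article)
ACTIVE_PARAMS = 55_000_000_000   # active parameters (from the article)
BYTES_PER_PARAM_BF16 = 2         # BF16 stores each value in 16 bits

# GPU counts from the article's minimum requirements.
GPU_COUNTS = {"8x GB200/B200/GB300/B300": 8, "16x H100": 16, "8x H200": 8}


def active_fraction(total: int, active: int) -> float:
    """Share of the parameters that is active."""
    return active / total


def weight_footprint_gb(params: int, bytes_per_param: int) -> float:
    """Raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def weights_per_gpu_gb(params: int, bytes_per_param: int, gpus: int) -> float:
    """ASSUMPTION: weights are split evenly across GPUs (illustrative only)."""
    return weight_footprint_gb(params, bytes_per_param) / gpus


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    total_gb = weight_footprint_gb(TOTAL_PARAMS, BYTES_PER_PARAM_BF16)
    print(f"BF16 weights: {total_gb:.0f} GB")
    for label, count in GPU_COUNTS.items():
        share = weights_per_gpu_gb(TOTAL_PARAMS, BYTES_PER_PARAM_BF16, count)
        print(f"{label}: about {share:.2f} GB of weights per GPU")
```

Under these assumptions the weights alone come to roughly 1.1 TB in BF16, which gives a sense of why the minimum configurations are all multi-GPU nodes.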

Why It Matters

Enables enterprises to deploy open frontier reasoning for complex agents and long-context tasks without vendor lock-in.
