Run NVIDIA Nemotron 3 Nano as a fully managed serverless model on Amazon Bedrock
The 30B-parameter model with 3B active parameters leads coding benchmarks and offers a 256K-token context window.
NVIDIA, in collaboration with AWS, has launched its Nemotron 3 Nano model as a fully managed, serverless offering on Amazon Bedrock, the latest addition following the earlier Nemotron 2 Nano models. Nemotron 3 Nano is a 30-billion-parameter small language model (SLM) built on a novel hybrid Mixture-of-Experts (MoE) architecture that activates only 3 billion parameters at a time. It combines Transformer layers for precise reasoning on tasks like code and math with Mamba layers for efficient long-sequence modeling, while supporting a 256K-token context window. This design prioritizes high throughput and computational efficiency, making it particularly suitable for running many concurrent, lightweight agent workflows.
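The announcement does not detail Nemotron 3 Nano's routing internals, but the efficiency argument rests on a standard MoE property: a router scores all experts per token and only the top-k run, so compute scales with active parameters rather than total parameters. The following is a minimal, generic sketch of top-k MoE routing; the function names, dimensions, and toy matrix experts are illustrative assumptions, not the model's actual implementation.

```python
import numpy as np

def top_k_moe_layer(x, expert_weights, router_weights, k=2):
    """Generic top-k MoE routing sketch (not Nemotron's actual code).

    x: (d,) token hidden state
    expert_weights: list of (d, d) matrices, toy stand-ins for expert FFNs
    router_weights: (num_experts, d) router projection
    k: number of experts activated per token
    """
    logits = router_weights @ x                      # score every expert
    top = np.argsort(logits)[-k:]                    # keep only the top-k
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                             # softmax over selected experts
    # Only the selected experts execute, so per-token compute scales
    # with k experts, not with the full expert count.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, num_experts = 64, 16
x = rng.standard_normal(d)
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(num_experts)]
router = rng.standard_normal((num_experts, d)) / np.sqrt(d)
y = top_k_moe_layer(x, experts, router, k=2)
print(y.shape)  # (64,)
```

This is the mechanism behind the 30B-total / 3B-active split: the full parameter pool provides capacity, while each token pays only for the experts it is routed to.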
The model distinguishes itself by leading benchmarks for coding (SWE-bench), math reasoning (AIME 2025), and agentic tasks (IFBench, Arena Hard v2) among open MoE models with 30 billion or fewer parameters. It is fully open-source, providing weights, datasets, and training recipes to foster transparency. Available immediately on Bedrock, it allows developers to power generative AI applications, from accelerating financial loan processing to building specialized agent clusters, directly through AWS's inference API without any infrastructure management.
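To make the serverless workflow concrete, here is a minimal sketch of calling a Bedrock-hosted model through the Converse API with boto3. The model identifier below is a hypothetical placeholder (check the Amazon Bedrock console or docs for the actual Nemotron 3 Nano ID), and the region, prompt, and inference settings are assumptions.

```python
import boto3

# Placeholder ID; look up the real Nemotron 3 Nano identifier in Bedrock.
MODEL_ID = "nvidia.nemotron-3-nano"  # hypothetical

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId=MODEL_ID,
    messages=[{
        "role": "user",
        "content": [{"text": "Write a Python function that validates "
                             "required fields on a loan application."}],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

# The Converse API returns the assistant message under output.message.
print(response["output"]["message"]["content"][0]["text"])
```

Because the endpoint is fully managed, there is no cluster to provision or scale; the same call pattern works across Bedrock models by swapping the model ID.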
- 30B parameter hybrid MoE model with only 3B active parameters for high efficiency
- Leads benchmarks for coding (SWE-bench) and reasoning (AIME 2025) among comparable open models
- Fully managed serverless deployment on Amazon Bedrock removes infrastructure complexity for developers
Why It Matters
Professionals can now deploy a state-of-the-art, efficient coding and reasoning model at scale without managing servers, accelerating AI application development.