NVIDIA Nemotron 3 Nano Omni model now available on Amazon SageMaker JumpStart
One model now processes video, audio, images, and text in a single pass...
NVIDIA has released Nemotron 3 Nano Omni, a multimodal large language model that unifies video, audio, image, and text understanding in a single architecture, on Amazon SageMaker JumpStart. With 30 billion total parameters and 3 billion active parameters (30B A3B), it uses a Mamba2 Transformer Hybrid Mixture of Experts (MoE) design that combines the Nemotron 3 Nano LLM backbone, the CRADIO v4-H vision encoder, and the Parakeet speech encoder. The model supports a 131K token context length, chain-of-thought reasoning, tool calling, JSON output, and word-level timestamps for transcription. It accepts video (up to 2 minutes, 256 frames), audio (up to 1 hour), images (JPEG, PNG), and text as input and produces text output, using FP8 precision for efficiency.
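The input limits quoted above can be checked on the client side before a request is sent. The following is a minimal sketch in Python, not an NVIDIA or AWS API: the `MediaInput` class, the `check_input` and `max_uniform_fps` helpers, and the idea of validating by file extension are all hypothetical and exist only for illustration. Only the numeric limits (2 minutes, 256 frames, 1 hour, JPEG/PNG) come from the announcement.

```python
import os
from dataclasses import dataclass
from typing import List

# Limits as quoted in the announcement.
MAX_VIDEO_SECONDS = 2 * 60
MAX_VIDEO_FRAMES = 256
MAX_AUDIO_SECONDS = 60 * 60
ALLOWED_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


@dataclass
class MediaInput:
    """Hypothetical description of one input; not part of any SDK."""
    kind: str                      # "video", "audio", "image", or "text"
    duration_seconds: float = 0.0  # used for video and audio
    frame_count: int = 0           # used for video
    filename: str = ""             # used for images


def check_input(item: MediaInput) -> List[str]:
    """Return a list of problems; an empty list means the input fits the limits."""
    problems: List[str] = []
    if item.kind == "video":
        if item.duration_seconds > MAX_VIDEO_SECONDS:
            problems.append("video is longer than 2 minutes")
        if item.frame_count > MAX_VIDEO_FRAMES:
            problems.append("video has more than 256 frames")
    elif item.kind == "audio":
        if item.duration_seconds > MAX_AUDIO_SECONDS:
            problems.append("audio is longer than 1 hour")
    elif item.kind == "image":
        # Assumption: the file extension reflects the actual image format.
        extension = os.path.splitext(item.filename)[1].lower()
        if extension not in ALLOWED_IMAGE_EXTENSIONS:
            problems.append("image is not JPEG or PNG")
    elif item.kind != "text":
        problems.append(f"unsupported input kind: {item.kind}")
    return problems


def max_uniform_fps(duration_seconds: float) -> float:
    """Highest uniform sampling rate that keeps a clip within the 256-frame cap."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return MAX_VIDEO_FRAMES / duration_seconds


if __name__ == "__main__":
    clip = MediaInput(kind="video", duration_seconds=180, frame_count=300)
    print(check_input(clip))
    print(max_uniform_fps(MAX_VIDEO_SECONDS))
```

Read this way, a full 2-minute clip at the 256-frame cap corresponds to roughly two sampled frames per second. How the hosted endpoint actually samples frames is not stated in the announcement.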
This release addresses a key pain point in enterprise agent workflows, which traditionally stitch together separate models for vision, speech, and language, an approach that increases latency, orchestration complexity, and cost. Nemotron 3 Nano Omni functions as a multimodal perception sub-agent, giving agent systems eyes and ears in a single inference pass. Use cases include computer-use agents for GUI navigation (e.g., incident management, browser automation), document intelligence for contracts and financial documents, and audio/video understanding for meeting analysis, customer service review, and OCR-based package delivery verification. The model is licensed for commercial use under the NVIDIA Open Model Agreement.
- 30B total parameters with only 3B active, using a Mamba2 Transformer Hybrid MoE architecture for efficiency
- Supports a 131K token context length and processes video (up to 2 min), audio (up to 1 hour), images, and text in one pass
- Enables computer-use agents, document intelligence, and audio/video understanding without splitting across models
Why It Matters
Replaces fragmented multimodal pipelines with a single model, which can reduce latency, orchestration complexity, and cost for enterprise AI agents.