NVIDIA Beats Everyone To DeepSeek V4 With Day-0 Blackwell Support, Pushing 3,500 Tokens Per Second On 1.6T Models
Day-0 support for 1.6T models on Blackwell hits 3,500 tokens per second.
NVIDIA has announced Day-0 support for DeepSeek V4 on its next-generation Blackwell GPUs, achieving a staggering 3,500 tokens per second on 1.6 trillion-parameter models. The milestone highlights Blackwell's ability to handle massive AI workloads with minimal latency, with optimizations targeting DeepSeek V4's 1M-token long-context inference. The integration lets developers run trillion-parameter models in real time, something previously limited by memory-bandwidth and compute bottlenecks.
For enterprises and researchers, this unlocks new possibilities in real-time document analysis, complex reasoning tasks, and large-scale AI deployment without sacrificing speed. NVIDIA's Day-0 support means Blackwell is ready for production use from the moment DeepSeek V4 goes live, reducing time-to-value. This positions NVIDIA as the dominant hardware provider for the next wave of ultra-large language models, directly competing with cloud-based alternatives.
- Blackwell GPUs achieve 3,500 tokens per second on 1.6 trillion-parameter DeepSeek V4 models
- Day-0 support enables immediate deployment with 1M-token long-context inference
- Optimized for low-latency, high-scale AI workloads targeting enterprise and research use
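Some back-of-the-envelope arithmetic puts the quoted figures in perspective. This is illustrative only: the announcement does not say whether 3,500 tokens per second is per GPU or per system, nor whether it refers to prefill or decode throughput, so the sketch below simply assumes one sustained end-to-end rate.

```python
# Illustrative timing from the quoted figures (assumptions, not
# announced specs): treat 3,500 tok/s as a single sustained rate.
THROUGHPUT_TOK_S = 3_500      # quoted Blackwell throughput on DeepSeek V4
CONTEXT_TOKENS = 1_000_000    # quoted long-context window

seconds = CONTEXT_TOKENS / THROUGHPUT_TOK_S
print(f"A full {CONTEXT_TOKENS:,}-token context at {THROUGHPUT_TOK_S:,} tok/s "
      f"takes ~{seconds:.0f} s (~{seconds / 60:.1f} min)")
# → ~286 s (~4.8 min)
```

In practice prefill and decode rates differ substantially, so the real time to ingest a 1M-token document would not match this single-rate estimate; the point is only the order of magnitude.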
Why It Matters
Real-time inference on trillion-parameter models reshapes AI deployment, enabling faster, larger-scale applications in enterprise and research.