
NVIDIA admits to only 2x performance boost at max throughput with new generation of Rubin GPUs

NVIDIA's next-gen Rubin GPUs show just a 2x speedup in production workloads, raising efficiency concerns.

Deep Dive

NVIDIA has provided a more grounded performance expectation for its next-generation Rubin AI GPUs, revealing that the R200 chip will deliver only about a 2x throughput increase over the current flagship Blackwell B200 in production workloads at maximum throughput. This admission comes as the company moves to more direct "apples-to-apples" comparisons, contrasting chips with similar memory configurations rather than the previous practice of pitting an 80GB HBM3e Blackwell chip against a 288GB HBM4 Rubin chip. The 2x figure represents the real-world gain for the vast majority of companies running data centers at full utilization.

Despite the Rubin architecture's impressive theoretical specs—including nearly 3x the memory bandwidth and approximately 5x the FP4 performance—these improvements translate to a more modest doubling of output. This performance uplift comes with a significant power cost: the R200 has a Thermal Design Power (TDP) of 2300W, compared to the B200's 1000W. In other words, data centers would draw 2.3x the power per GPU to achieve 2x the performance, raising immediate questions about the generation's power efficiency and total cost of ownership for large-scale AI training and inference clusters.
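The efficiency concern follows directly from the two figures quoted above. A back-of-the-envelope sketch (using only the article's 2x speedup, 1000W, and 2300W numbers, and ignoring cluster-level overheads like cooling, networking, and host power) shows throughput per watt actually regressing at the GPU level:

```python
def perf_per_watt_ratio(speedup: float, old_tdp_w: float, new_tdp_w: float) -> float:
    """Ratio of new perf/watt to old perf/watt; values below 1.0 mean the
    new chip does less work per joule than the old one."""
    return speedup / (new_tdp_w / old_tdp_w)

# R200 vs B200, per the figures in the article.
ratio = perf_per_watt_ratio(speedup=2.0, old_tdp_w=1000, new_tdp_w=2300)
print(f"R200 vs B200 perf/watt ratio: {ratio:.2f}")  # ~0.87, i.e. ~13% worse per watt
```

By this crude measure, a fully utilized R200 delivers roughly 13% fewer tokens per joule than a B200, which is why the TDP jump, not the raw speedup, is the number that drives total cost of ownership for operators.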

Key Points
  • NVIDIA's Rubin R200 offers only a 2x throughput boost over Blackwell B200 in max production workloads.
  • The chip requires 2300W TDP (2.3x B200's power) for that gain, highlighting potential efficiency concerns.
  • The company is now comparing similar memory configs, moving away from misleading 80GB vs. 288GB benchmarks.

Why It Matters

For AI companies, this sets realistic expectations for data center upgrades and highlights power delivery and cooling, not just chip performance, as a critical bottleneck for the next hardware generation.