Benchmarking LLM Inference on RTX PRO 6000 SE / H100 / H200 / B200
New benchmarks reveal which GPUs give you the most AI bang for your buck.
Deep Dive
A new benchmark pits NVIDIA's new Blackwell B200 against the RTX Pro 6000 SE, H100, and H200 for LLM inference. The B200 dominated raw throughput, running up to 4.87x faster than the Pro 6000 on communication-heavy models. However, once real ownership costs were factored in, the RTX Pro 6000 emerged as a compelling low-capex option, beating the H100 on cost-per-token across all tested models and matching the H200 in one scenario.
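The cost-per-token comparison boils down to simple arithmetic: divide the GPU's hourly cost by the tokens it generates per hour. A minimal sketch of that calculation is below; the dollar and throughput figures are illustrative placeholders, not numbers from the benchmark.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """USD to generate one million tokens at a given throughput and GPU-hour price."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical example (not benchmark data): a GPU rented at $2.50/hr
# sustaining 1,000 tokens/sec costs about $0.694 per million tokens.
print(round(cost_per_million_tokens(2.50, 1000), 3))  # 0.694
```

This framing explains how a slower card can still win on economics: a GPU with half the throughput but a quarter of the hourly cost comes out ahead on cost-per-token.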
Why It Matters
This data is crucial for anyone building or renting AI infrastructure, revealing the real trade-offs between speed, cost, and architecture.