Enterprise & Industry

Nvidia wants to own your AI data center from end to end

Nvidia's new inference rack uses licensed Groq IP to slash latency and boost AI data center efficiency, the hardware centerpiece of the company's full-stack pitch.

Deep Dive

Nvidia used its GTC conference to make a bold, full-stack play for the AI data center. CEO Jensen Huang announced the LPX rack, a new inference system available later this year that combines the company's Rubin GPUs with the Nvidia Groq 3 LPU, a chip based on intellectual property licensed from AI startup Groq in a $20 billion deal. The core innovation is the LPU's 500MB of on-chip SRAM, which can store large language model weights and the intermediate KV cache. Keeping that data on-chip minimizes fetches from slower, off-chip DRAM, dramatically cutting latency. Nvidia's Ian Buck stated this could turn "day-long queries" into results in "less than an hour."
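
The latency case is, at bottom, a memory-bandwidth argument: decode-stage LLM inference is usually bound by how fast weights and cache can be streamed to the compute units, not by arithmetic. The sketch below illustrates that relationship; the model footprint and bandwidth figures are hypothetical assumptions for illustration, not published LPX or Groq specs.

```python
# Back-of-envelope sketch: why keeping weights in on-chip SRAM cuts decode
# latency. Every figure below is an illustrative assumption, not a published
# LPX or Groq spec.

GB = 1e9
TB = 1e12

def per_token_latency_ms(bytes_per_token: float, bandwidth_bytes_per_s: float) -> float:
    """Decode is typically memory-bound: generating each token streams the
    active weights (plus KV cache) through the compute units once, so latency
    is roughly bytes moved divided by memory bandwidth."""
    return bytes_per_token / bandwidth_bytes_per_s * 1e3

# Hypothetical footprint: active weights + KV cache touched per token step.
bytes_per_token = 100 * GB

# Hypothetical aggregate bandwidths: off-chip DRAM/HBM vs. on-chip SRAM
# sharded across many 500MB LPUs (on-chip paths are far wider and closer).
dram_bw = 8 * TB    # ~HBM-class off-chip bandwidth
sram_bw = 80 * TB   # assumed aggregate on-chip SRAM bandwidth

print(f"DRAM-bound: {per_token_latency_ms(bytes_per_token, dram_bw):.2f} ms/token")
print(f"SRAM-bound: {per_token_latency_ms(bytes_per_token, sram_bw):.2f} ms/token")
```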

The move is framed as an economic imperative. Nvidia presented data showing the LPU's memory access uses just one-third of a picojoule of energy per bit, compared with the 6 picojoules a GPU spends on DRAM access. In a concrete example, Buck claimed the LPX rack could process 500,000 tokens per second at $45 per million tokens, delivering 35 times as many tokens per second per megawatt of power as GPU-based inference. That efficiency, Nvidia argues, translates into a 10-fold increase in potential revenue per second per megawatt for AI service providers. The LPX is the hardware embodiment of Huang's broader pitch: that buying Nvidia's complete stack, from Vera CPUs and Rubin GPUs to inference racks and software, offers superior performance and economics, positioning the company to own the entire AI infrastructure lifecycle.
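
The headline ratios can be partly sanity-checked from the figures quoted above. A minimal sketch, using only the article's numbers plus one labeled billing assumption:

```python
# Sanity-check the economics using only the figures quoted above.

# Energy per bit of memory access, as presented by Nvidia:
sram_pj_per_bit = 1 / 3   # LPU on-chip SRAM
dram_pj_per_bit = 6.0     # GPU off-chip DRAM

ratio = dram_pj_per_bit / sram_pj_per_bit
print(f"Raw memory-energy advantage: {ratio:.0f}x")   # 18x

# Claimed rack throughput and pricing:
tokens_per_s = 500_000
usd_per_million_tokens = 45.0

# Gross revenue implied if every generated token is billed at the quoted rate
# (a simplifying assumption; real-world utilization will be lower).
revenue_per_s = tokens_per_s * usd_per_million_tokens / 1_000_000
print(f"Implied gross revenue: ${revenue_per_s:.2f} per rack-second")  # $22.50
```

Note that the 18x raw memory-energy advantage and the headline 35x tokens-per-megawatt claim measure different things; the latter presumably folds in compute energy, utilization, and cooling, which Nvidia did not break out.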

Key Points
  • Nvidia unveiled the LPX inference rack, integrating the new Nvidia Groq 3 LPU (based on $20B Groq IP) with Rubin GPUs.
  • The LPU's 500MB of on-chip SRAM holds model weights and the KV cache, cutting DRAM access and slashing query latency from days to under an hour.
  • Nvidia claims the system delivers 35x more tokens/sec/megawatt and a 10x revenue/sec/megawatt increase, pushing its full-stack data center vision.

Why It Matters

If the efficiency claims hold, this move could lock in the economics of AI infrastructure, making Nvidia's integrated stack the default choice on cost and performance for large-scale AI deployment.