Enterprise & Industry

Nvidia wants to own your AI data center from end to end

Nvidia's new inference rack uses licensed Groq IP to slash latency and boost AI data center efficiency, the hardware centerpiece of the company's full-stack pitch.

Deep Dive

Nvidia used its GTC conference to make a bold, full-stack play for the AI data center. CEO Jensen Huang announced the LPX rack, a new inference system available later this year that combines the company's Rubin GPUs with the Nvidia Groq 3 LPU, a chip based on intellectual property licensed from AI startup Groq in a $20 billion deal. The core innovation is the LPU's 500MB of on-chip SRAM, which can store large language model weights and the intermediate KV cache. Keeping that data on-chip minimizes fetches from slower, off-chip DRAM, dramatically cutting latency. Nvidia's Ian Buck stated this could turn "day-long queries" into results in "less than an hour."
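
The latency case is, at bottom, a memory-bandwidth argument: decode-stage LLM inference is usually bound by how fast weights and cache can be streamed to the compute units, not by arithmetic. The sketch below illustrates that relationship; the model footprint and bandwidth figures are hypothetical assumptions for illustration, not published LPX or Groq specs.

```python
# Back-of-envelope sketch: why keeping weights in on-chip SRAM cuts decode
# latency. Every figure below is an illustrative assumption, not a published
# LPX or Groq spec.

GB = 1e9
TB = 1e12

def per_token_latency_ms(bytes_per_token: float, bandwidth_bytes_per_s: float) -> float:
    """Decode is typically memory-bound: generating each token streams the
    active weights (plus KV cache) through the compute units once, so latency
    is roughly bytes moved divided by memory bandwidth."""
    return bytes_per_token / bandwidth_bytes_per_s * 1e3

# Hypothetical footprint: active weights + KV cache touched per token step.
bytes_per_token = 100 * GB

# Hypothetical aggregate bandwidths: off-chip DRAM/HBM vs. on-chip SRAM
# sharded across many 500MB LPUs (on-chip paths are far wider and closer).
dram_bw = 8 * TB    # ~HBM-class off-chip bandwidth
sram_bw = 80 * TB   # assumed aggregate on-chip SRAM bandwidth

print(f"DRAM-bound: {per_token_latency_ms(bytes_per_token, dram_bw):.2f} ms/token")
print(f"SRAM-bound: {per_token_latency_ms(bytes_per_token, sram_bw):.2f} ms/token")
```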

The move is framed as an economic imperative. Nvidia presented data showing the LPU's memory access uses just one-third of a picojoule of energy per bit, compared with the 6 picojoules a GPU spends on DRAM access. In a concrete example, Buck claimed the LPX rack could process 500,000 tokens per second at $45 per million tokens, delivering 35 times as many tokens per second per megawatt of power as GPU-based inference. That efficiency, Nvidia argues, translates into a 10-fold increase in potential revenue per second per megawatt for AI service providers. The LPX is the hardware embodiment of Huang's broader pitch: that buying Nvidia's complete stack, from Vera CPUs and Rubin GPUs to inference racks and software, offers superior performance and economics, positioning the company to own the entire AI infrastructure lifecycle.
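
The headline ratios can be partly sanity-checked from the figures quoted above. A minimal sketch, using only the article's numbers plus one labeled billing assumption:

```python
# Sanity-check the economics using only the figures quoted above.

# Energy per bit of memory access, as presented by Nvidia:
sram_pj_per_bit = 1 / 3   # LPU on-chip SRAM
dram_pj_per_bit = 6.0     # GPU off-chip DRAM

ratio = dram_pj_per_bit / sram_pj_per_bit
print(f"Raw memory-energy advantage: {ratio:.0f}x")   # 18x

# Claimed rack throughput and pricing:
tokens_per_s = 500_000
usd_per_million_tokens = 45.0

# Gross revenue implied if every generated token is billed at the quoted rate
# (a simplifying assumption; real-world utilization will be lower).
revenue_per_s = tokens_per_s * usd_per_million_tokens / 1_000_000
print(f"Implied gross revenue: ${revenue_per_s:.2f} per rack-second")  # $22.50
```

Note that the 18x raw memory-energy advantage and the headline 35x tokens-per-megawatt claim measure different things; the latter presumably folds in compute energy, utilization, and cooling, which Nvidia did not break out.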

Key Points
  • Nvidia unveiled the LPX inference rack, integrating the new Nvidia Groq 3 LPU (based on $20B Groq IP) with Rubin GPUs.
  • The LPU's 500MB of on-chip SRAM holds model weights and the KV cache, cutting DRAM access and slashing query latency from days to under an hour.
  • Nvidia claims the system delivers 35x more tokens/sec/megawatt and a 10x revenue/sec/megawatt increase, pushing its full-stack data center vision.

Why It Matters

If the efficiency claims hold, this move could lock in the economics of AI infrastructure, making Nvidia's integrated stack the default choice on cost and performance for large-scale AI deployment.