Google unveils two new TPUs designed for the "agentic era"
New TPU pods pack 9,600 chips and 2 petabytes of memory, slashing model training from months to weeks.
Google has launched its eighth-generation Tensor Processing Units (TPUs), introducing a dual-chip strategy for the first time: the TPU 8t for training and the TPU 8i for inference. The split is a direct response to what Google calls the "agentic era," in which AI systems need specialized hardware for different stages of the model lifecycle. The TPU 8t is built for raw speed, with updated server "pods" containing 9,600 chips and 2 petabytes of shared high-bandwidth memory. Google claims this setup delivers 121 FP4 EFlops of compute per pod, nearly three times the ceiling of the previous-generation Ironwood TPU, and can scale linearly to a million chips in a single logical cluster. The company also cites a 97% "goodput" rate, meaning less computational effort is wasted, thanks to improved fault handling and memory management.
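The announced pod-level figures imply some rough per-chip numbers. A quick back-of-the-envelope sketch (the pod totals come from the announcement; the per-chip and goodput-adjusted figures are simple derived estimates, not Google-published specs):

```python
# Derived estimates from the announced TPU 8t pod specs.
POD_CHIPS = 9_600          # chips per TPU 8t pod (announced)
POD_COMPUTE_EFLOPS = 121   # FP4 EFlops per pod (announced)
POD_HBM_PB = 2             # shared high-bandwidth memory per pod, in petabytes
GOODPUT = 0.97             # claimed fraction of compute not lost to faults

per_chip_pflops = POD_COMPUTE_EFLOPS * 1_000 / POD_CHIPS   # EFlops -> PFlops
per_chip_hbm_gb = POD_HBM_PB * 1_000_000 / POD_CHIPS       # PB -> GB (decimal)
effective_eflops = POD_COMPUTE_EFLOPS * GOODPUT            # goodput-adjusted

print(f"FP4 compute per chip: {per_chip_pflops:.1f} PFlops")      # ~12.6 PFlops
print(f"HBM per chip: {per_chip_hbm_gb:.0f} GB")                  # ~208 GB
print(f"Pod compute at 97% goodput: {effective_eflops:.1f} EFlops")
```

In other words, each TPU 8t chip lands in the neighborhood of 12.6 FP4 PFlops with roughly 208 GB of shared memory behind it, if the pod totals divide evenly.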
For the inference phase, where trained models generate outputs, the TPU 8i is optimized for efficiency. It triples on-chip SRAM to 384 MB to support larger key-value caches, which is crucial for models with long context windows. TPU 8i pods also grow to 1,152 chips, up from Ironwood's 256, and deliver 11.6 EFlops per pod. A key architectural shift is the full-stack move to Google's custom Arm-based Axion CPUs, with one CPU for every two TPUs, replacing the x86 hosts used previously. Google says this co-design, together with networking integrated at the chip level, yields twice the performance per watt of Ironwood and a sixfold increase in computing power per unit of electricity across its data centers.
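The same arithmetic applies on the inference side. A small sketch comparing the announced TPU 8i pod to Ironwood and spelling out the stated 1:2 CPU-to-TPU ratio (pod sizes and the ratio are from the article; the per-chip figure is a derived estimate, and the announcement does not specify the numeric precision behind the 11.6 EFlops figure):

```python
# Derived estimates from the announced TPU 8i pod specs vs. Ironwood.
TPU8I_POD_CHIPS = 1_152     # chips per TPU 8i pod (announced)
TPU8I_POD_EFLOPS = 11.6     # EFlops per pod (precision not specified)
IRONWOOD_POD_CHIPS = 256    # chips per Ironwood pod

per_chip_pflops = TPU8I_POD_EFLOPS * 1_000 / TPU8I_POD_CHIPS  # EFlops -> PFlops
pod_scale_factor = TPU8I_POD_CHIPS / IRONWOOD_POD_CHIPS       # pod-size growth
axion_cpus_per_pod = TPU8I_POD_CHIPS // 2                     # one CPU per two TPUs

print(f"Compute per TPU 8i chip: {per_chip_pflops:.1f} PFlops")  # ~10.1 PFlops
print(f"Pod size vs. Ironwood: {pod_scale_factor:.1f}x")         # 4.5x
print(f"Axion CPUs per pod: {axion_cpus_per_pod}")               # 576
```

So a fully populated TPU 8i pod would carry 576 Axion host CPUs alongside its 1,152 accelerators, a pod 4.5 times the size of Ironwood's.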
- TPU 8t training pods feature 9,600 chips and 2 PB of shared memory, delivering 121 FP4 EFlops per pod, roughly 3x the compute of the previous Ironwood TPU.
- TPU 8i inference chips carry 384 MB of on-chip SRAM (tripled) for faster processing of models with long context windows.
- The full-stack Arm-based design with Axion CPUs delivers 2x the performance per watt and 6x the compute per unit of electricity in Google's data centers.
Why It Matters
This specialized hardware could drastically reduce the cost and time to train frontier AI models, making advanced agent development more feasible.