
TPU

Pronunciation

/tiː piː juː/

Also known as: Tensor Processing Unit, Google TPU

What is a TPU?

A Tensor Processing Unit (TPU) is a custom-designed AI accelerator chip developed by Google specifically for machine learning workloads. Unlike general-purpose GPUs, TPUs are optimized for the matrix operations that dominate neural network computation—particularly the tensor calculations used in training and running deep learning models.
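
To make the matrix-operation point concrete, here is a minimal JAX sketch (JAX is one of the frameworks with native TPU support). It assumes a Cloud TPU VM with the TPU build of JAX installed; the same code falls back to CPU or GPU elsewhere.

    # Minimal sketch of the kind of tensor computation TPUs accelerate.
    # Assumes the TPU build of JAX on a Cloud TPU VM; other JAX builds
    # run the same code on CPU or GPU.
    import jax
    import jax.numpy as jnp

    print(jax.devices())  # lists TpuDevice entries on a TPU host

    key = jax.random.PRNGKey(0)
    a = jax.random.normal(key, (1024, 1024), dtype=jnp.bfloat16)
    b = jax.random.normal(key, (1024, 1024), dtype=jnp.bfloat16)

    # jax.jit compiles the matmul through XLA, which maps it onto the
    # TPU's matrix multiply units.
    matmul = jax.jit(lambda x, y: x @ y)
    print(matmul(a, b).shape)  # (1024, 1024)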

TPU Generations

Google has released seven generations of TPUs:

Generation        | Year | Key Features
TPU v1            | 2016 | Inference only, 92 TOPS (INT8)
TPU v2            | 2017 | Training capability added
TPU v3            | 2018 | Liquid cooling, 420 TFLOPS per device (4 chips)
TPU v4            | 2021 | 275 TFLOPS per chip
TPU v5e           | 2023 | Cost-optimized
TPU v6 "Trillium" | 2024 | Enhanced efficiency
TPU v7 "Ironwood" | 2025 | Inference-optimized, 4,614 TFLOPS per chip

Ironwood (TPU v7) - 2025

Google's latest TPU, Ironwood, represents a major leap:

Performance: 4,614 TFLOPS per chip, roughly 4x the previous generation for both training and inference.

Scale: Comes in 256-chip and 9,216-chip configurations. At full scale, a superpod delivers 42.5 exaflops of FP8 compute, which Google says exceeds the world's largest supercomputer.

Memory: 1.77 petabytes of shared High Bandwidth Memory across the superpod.

Networking: Chips connected via Inter-Chip Interconnect (ICI) at 9.6 Tb/s.

Design Focus: Google positions Ironwood as its first TPU designed primarily for inference at scale, optimized for "thinking models" including LLMs and Mixture of Experts architectures.
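
The headline figures are easy to cross-check; the short Python sketch below reproduces the superpod math from the numbers quoted in this section (the per-chip HBM figure is derived from them, not quoted).

    # Sanity check of the Ironwood superpod figures quoted above.
    chips = 9216
    fp8_tflops_per_chip = 4614

    print(chips * fp8_tflops_per_chip / 1e6)  # ~42.5 exaflops of FP8 compute

    shared_hbm_gb = 1.77e6                    # 1.77 petabytes in gigabytes
    print(shared_hbm_gb / chips)              # ~192 GB of HBM per chip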

Industry Adoption

Anthropic plans to use up to 1 million TPUs to run Claude.

Meta is reportedly in talks with Google to deploy TPUs in its data centers.

Neoclouds like Crusoe and CoreWeave are also exploring TPU deployments.

How TPUs Are Designed

Google uses AlphaChip, a reinforcement learning system, to generate chip layouts. This AI-designed approach has been used for the last three TPU generations, creating layouts that surpass human-designed alternatives.

Broadcom serves as Google's ASIC partner, turning the designs into production chips, with fabrication handled by TSMC.

TPU vs GPU

Aspect       | TPU                         | GPU (e.g., NVIDIA H100)
Design       | Custom for AI               | General parallel computing
Availability | Google Cloud only           | Widely available
Optimization | Matrix/tensor operations    | Broader workloads
Scale        | Built for massive clusters  | Individual or cluster
Software     | TensorFlow/JAX native       | CUDA ecosystem
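
One practical consequence of the software row above: JAX code is largely hardware-agnostic, so the same program targets a TPU or a CUDA GPU depending on which backend is installed. A minimal sketch, assuming a working JAX install:

    # The same JAX program runs on TPU, GPU, or CPU depending on the
    # installed backend; no CUDA-specific code is required.
    import jax
    import jax.numpy as jnp

    print(jax.default_backend())   # 'tpu', 'gpu', or 'cpu'

    @jax.jit
    def step(x):
        return jnp.tanh(x @ x.T)

    x = jnp.ones((512, 512), dtype=jnp.bfloat16)
    print(step(x).shape)           # (512, 512), computed on the active backend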

Why It Matters

TPUs demonstrate that custom silicon can outperform general-purpose hardware for specific AI workloads. As AI training costs reach billions of dollars, efficiency gains from specialized chips become economically crucial. Google's investment in TPUs gives the company infrastructure independence from NVIDIA's GPU dominance.

Mentioned In

Jeff Dean at 00:35:00

"TPUs were designed specifically for the matrix operations that dominate neural network computation."
