
TPU

Pronunciation

/tiː piː juː/

Also known as: Tensor Processing Unit, Google TPU

What is a TPU?

A Tensor Processing Unit (TPU) is a custom AI accelerator chip (an application-specific integrated circuit, or ASIC) developed by Google for machine learning workloads. Where GPUs are general-purpose parallel processors, TPUs are optimized for the matrix operations that dominate neural network computation, particularly the tensor calculations used to train and run deep learning models.
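
To make "matrix operations" concrete, the sketch below writes a toy dense neural-network layer as a matrix multiplication and counts its floating-point operations. It is plain Python for illustration only, not TPU code; the names `matmul` and `matmul_flops` and the layer sizes are assumptions made up for this example.

```python
def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both given as nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def matmul_flops(m, k, n):
    """Usual convention: each of the m*n outputs costs k multiplies and k adds."""
    return 2 * m * k * n


# Toy dense layer: a batch of 2 inputs with 3 features each, 2 output units.
activations = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, 1.0]]

outputs = matmul(activations, weights)
print(outputs)                # [[4.0, 5.0], [10.0, 11.0]]
print(matmul_flops(2, 3, 2))  # 24
```

Real models repeat this pattern with dimensions in the thousands, which is why hardware built around fast multiply-accumulate arrays pays off.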

TPU Generations

Google has released seven generations of TPUs:

Generation	Year	Key Features
TPU v1	2016	Inference only, 92 TOPS (8-bit integer)
TPU v2	2017	Training capability added
TPU v3	2018	Liquid cooling, 420 TFLOPS per four-chip board
TPU v4	2021	275 TFLOPS per chip
TPU v5e	2023	Cost-optimized
TPU v6 "Trillium"	2024	Enhanced efficiency
TPU v7 "Ironwood"	2025	Inference-optimized, 4,614 TFLOPS (FP8) per chip

Ironwood (TPU v7) - 2025

Google's latest TPU, Ironwood, represents a major leap:

Performance: 4,614 TFLOPS (FP8) per chip, which Google puts at roughly 4x the per-chip performance of the previous generation (Trillium) for both training and inference.

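Scale: Available in 256-chip and 9,216-chip configurations. The full 9,216-chip configuration delivers 42.5 exaflops of FP8 compute. Google compares this favorably with the world's largest supercomputer, though supercomputers are ranked on FP64 benchmarks, so the two figures are not directly comparable.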
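
The pod-level figure follows from the per-chip figure by simple multiplication. The sketch below reproduces that arithmetic; it assumes peak throughput scales perfectly linearly with chip count, which real workloads will not achieve, and the constant and function names are illustrative.

```python
TFLOPS_PER_CHIP = 4_614          # per-chip peak figure quoted in this entry
CHIPS_PER_SUPERPOD = 9_216       # largest configuration quoted in this entry
TFLOPS_PER_EXAFLOP = 1_000_000   # 1 exaFLOPS = 10^6 teraFLOPS


def pod_exaflops(chips, tflops_per_chip):
    """Aggregate peak throughput in exaFLOPS, assuming ideal linear scaling."""
    return chips * tflops_per_chip / TFLOPS_PER_EXAFLOP


full_pod = pod_exaflops(CHIPS_PER_SUPERPOD, TFLOPS_PER_CHIP)
small_pod = pod_exaflops(256, TFLOPS_PER_CHIP)

print(f"{full_pod:.1f} exaFLOPS")   # 42.5 exaFLOPS
print(f"{small_pod:.2f} exaFLOPS")  # 1.18 exaFLOPS
```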

Memory: 1.77 petabytes of High Bandwidth Memory (HBM) in total across the 9,216-chip superpod.
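
Dividing the pod-level memory figure by the chip count gives a rough per-chip capacity. This is a back-of-envelope sketch that assumes the memory is spread evenly across a 9,216-chip pod and that decimal units are used (1 PB = 10^6 GB); the names are illustrative.

```python
TOTAL_HBM_PB = 1.77      # pod-level HBM figure quoted in this entry
CHIPS = 9_216            # chips in the largest configuration
GB_PER_PB = 1_000_000    # decimal units: 1 PB = 10^6 GB


def hbm_per_chip_gb(total_pb, chips):
    """Average HBM per chip in GB, assuming an even split across chips."""
    return total_pb * GB_PER_PB / chips


per_chip_gb = hbm_per_chip_gb(TOTAL_HBM_PB, CHIPS)
print(f"about {per_chip_gb:.0f} GB per chip")  # about 192 GB per chip
```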

Networking: Chips connected via Inter-Chip Interconnect (ICI) at 9.6 Tb/s.
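
Interconnect speeds are quoted in bits per second, while memory sizes are quoted in bytes, so a unit conversion helps when comparing them. The sketch below converts the quoted rate and estimates an ideal transfer time. It assumes the quoted figure is the usable rate for a single transfer with no protocol overhead, and the 120 GB payload is a hypothetical value chosen for the example.

```python
ICI_TERABITS_PER_S = 9.6   # interconnect rate quoted in this entry
BITS_PER_BYTE = 8
GB_PER_TB = 1_000          # decimal units


def terabytes_per_second(terabits_per_s):
    """Convert a link rate from terabits per second to terabytes per second."""
    return terabits_per_s / BITS_PER_BYTE


def ideal_transfer_seconds(payload_gb, terabits_per_s):
    """Lower bound on the time to move payload_gb gigabytes at the peak rate."""
    gb_per_s = terabytes_per_second(terabits_per_s) * GB_PER_TB
    return payload_gb / gb_per_s


rate_tb_s = terabytes_per_second(ICI_TERABITS_PER_S)
seconds = ideal_transfer_seconds(120, ICI_TERABITS_PER_S)  # hypothetical payload

print(f"{rate_tb_s:.1f} TB/s")  # 1.2 TB/s
print(f"{seconds:.2f} s")       # 0.10 s
```

Actual transfer times depend on topology, contention, and software overhead, so this is only a lower bound.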

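Design Focus: Google describes Ironwood as its first TPU designed specifically for large-scale inference, optimized for "thinking models" such as LLMs and Mixture of Experts architectures. (TPU v1 was also inference-only, but it predates these model types.)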

Industry Adoption

Anthropic plans to use up to 1 million TPUs to run Claude.

Meta is in talks with Google to deploy TPUs in its data centers.

Neoclouds (GPU-focused cloud providers) such as Crusoe and CoreWeave are also exploring TPU deployments.

How TPUs Are Designed

Google uses AlphaChip, a reinforcement learning system, to generate chip layouts. According to Google, this AI-driven approach has been used for the last three TPU generations and produces layouts that surpass human-designed alternatives.

Broadcom co-develops the chips with Google, turning Google's specifications into manufacturable silicon, and TSMC fabricates them.

TPU vs GPU

Aspect	TPU	GPU (e.g., NVIDIA H100)
Design	Custom for AI	General parallel computing
Availability	Google Cloud only	Widely available
Optimization	Matrix/tensor operations	Broader workloads
Scale	Built for massive clusters	Individual or cluster
Software	TensorFlow/JAX native	CUDA ecosystem

Why It Matters

TPUs demonstrate that custom silicon can outperform general-purpose hardware on specific AI workloads. As AI training costs reach billions of dollars, efficiency gains from specialized chips become economically crucial. Google's investment in TPUs also reduces its dependence on NVIDIA, whose GPUs dominate the AI accelerator market.

See Also

Jeff Dean - Google Chief Scientist, key TPU architect
How TPUs Came to Be - Dean explains the back-of-envelope calculation that launched TPUs
AI Infrastructure - The broader compute ecosystem
Scaling Laws - What TPU scale enables

Mentioned In

Jeff Dean at 00:35:00

"TPUs were designed specifically for the matrix operations that dominate neural network computation."

Related Terms

GPU, AI Infrastructure, Scaling Laws
