TPU
/tiː piː juː/
What is a TPU?
A Tensor Processing Unit (TPU) is a custom-designed AI accelerator chip developed by Google specifically for machine learning workloads. Unlike general-purpose GPUs, TPUs are optimized for the matrix operations that dominate neural network computation—particularly the tensor calculations used in training and running deep learning models.
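To make the workload concrete, here is a minimal JAX sketch (JAX is one of the TPU-native frameworks mentioned below). The shapes and layer are illustrative; when run on a Cloud TPU VM, XLA compiles the `jit`-wrapped function to the TPU's matrix units, and the same code falls back to GPU or CPU elsewhere.

```python
import jax
import jax.numpy as jnp

@jax.jit  # XLA compiles this for whichever backend is available (TPU, GPU, or CPU)
def dense_layer(x, w, b):
    # A single dense layer: the matrix multiply is exactly the kind of
    # tensor operation TPUs are built to accelerate.
    return jax.nn.relu(x @ w + b)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1024, 512))  # batch of activations
w = jax.random.normal(key, (512, 256))   # weight matrix
b = jnp.zeros((256,))

print(dense_layer(x, w, b).shape)        # (1024, 256)
```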
TPU Generations
Google has released seven generations of TPUs:
| Generation | Year | Key Features |
|---|---|---|
| TPU v1 | 2016 | Inference only, 92 TOPS (INT8) |
| TPU v2 | 2017 | Training capability added |
| TPU v3 | 2018 | Liquid cooling, 420 TFLOPS per 4-chip board |
| TPU v4 | 2021 | 275 TFLOPS per chip |
| TPU v5e | 2023 | Cost-optimized |
| TPU v6 "Trillium" | 2024 | Enhanced efficiency |
| TPU v7 "Ironwood" | 2025 | Inference-optimized, 4,614 TFLOPS |
Ironwood (TPU v7) - 2025
Google's latest TPU, Ironwood, represents a major leap:
Performance: 4,614 TFLOPS per chip, roughly 4x the performance of the previous generation for both training and inference.
Scale: Comes in 256-chip and 9,216-chip configurations. At full scale, delivers 42.5 exaflops of FP8 compute—more powerful than the world's largest supercomputer.
Memory: 1.77 petabytes of shared High Bandwidth Memory across the superpod.
Networking: Chips connected via Inter-Chip Interconnect (ICI) at 9.6 Tb/s.
Design Focus: First TPU designed specifically for inference, optimized for "thinking models" including LLMs and Mixture of Experts architectures.
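The pod-scale figures above follow from per-chip arithmetic. A quick back-of-envelope check in plain Python, using only the numbers quoted in this section:

```python
chips_per_superpod = 9_216
tflops_per_chip = 4_614                            # FP8 TFLOPS for one Ironwood chip

print(chips_per_superpod * tflops_per_chip / 1e6)  # ~42.5 exaflops (1 exaflop = 1e6 TFLOPS)

pod_hbm_gb = 1.77e6                                # 1.77 petabytes of shared HBM, in GB
print(pod_hbm_gb / chips_per_superpod)             # ~192 GB of HBM per chip
```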
Industry Adoption
- Anthropic plans to use up to 1 million TPUs to run Claude.
- Meta is in talks with Google to deploy TPUs in its data centers.
- Neoclouds like Crusoe and CoreWeave are also exploring TPU deployments.
How TPUs Are Designed
Google uses AlphaChip, a reinforcement learning system, to generate chip layouts. This AI-designed approach has been used for the last three TPU generations, creating layouts that surpass human-designed alternatives.
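AlphaChip's actual reward and training setup are described in its published work; as a rough, hypothetical illustration of what "scoring a layout" means for such an agent, the sketch below rates a candidate placement by proxy wirelength (half-perimeter bounding boxes) plus a crude density penalty. The function, weights, and grid here are illustrative assumptions, not Google's implementation.

```python
import numpy as np

def placement_reward(positions, nets, wirelength_weight=1.0, density_weight=0.5):
    """Hypothetical placement score: shorter wires and less crowding are
    better, so the reward is their negated weighted sum.
    positions: (n_blocks, 2) array of block coordinates.
    nets: lists of block indices that each wire must connect."""
    # Proxy wirelength: half-perimeter of each net's bounding box.
    wirelength = sum(
        (positions[net, 0].max() - positions[net, 0].min())
        + (positions[net, 1].max() - positions[net, 1].min())
        for net in nets
    )
    # Crude density proxy: penalize blocks packed into the same grid cell.
    cells = np.floor(positions).astype(int)
    _, counts = np.unique(cells, axis=0, return_counts=True)
    crowding = float(((counts - 1) ** 2).sum())
    return -(wirelength_weight * wirelength + density_weight * crowding)

# An RL agent would propose positions and be trained to maximize this reward.
pos = np.array([[0.2, 0.3], [1.5, 0.4], [0.8, 2.1]])
print(placement_reward(pos, nets=[[0, 1], [1, 2]]))
```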
Broadcom co-develops and supplies the chips to Google's specifications, with fabrication handled by TSMC.
TPU vs GPU
| Aspect | TPU | GPU (e.g., NVIDIA H100) |
|---|---|---|
| Design | Custom for AI | General parallel computing |
| Availability | Google Cloud only | Widely available |
| Optimization | Matrix/tensor operations | Broader workloads |
| Scale | Built for massive clusters | Individual or cluster |
| Software | TensorFlow/JAX native | CUDA ecosystem |
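One practical consequence of the software row: JAX and TensorFlow programs are backend-agnostic, so the same code targets a TPU or an NVIDIA GPU without CUDA-specific changes. A minimal JAX sketch for checking which accelerator the runtime exposed:

```python
import jax

# JAX discovers whatever accelerator is present: TPU cores on a Cloud TPU VM,
# CUDA devices on a GPU machine, otherwise the host CPU.
print(jax.default_backend())      # 'tpu', 'gpu', or 'cpu'
for d in jax.devices():
    print(d.platform, d.id)
```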
Why It Matters
TPUs demonstrate that custom silicon can outperform general-purpose hardware for specific AI workloads. As AI training costs reach billions of dollars, efficiency gains from specialized chips become economically crucial. Google's investment in TPUs gives it infrastructure independence from NVIDIA's GPU dominance.
Related Reading
- Jeff Dean - Google Chief Scientist, key TPU architect
- How TPUs Came to Be - Dean explains the back-of-envelope calculation that launched TPUs
- AI Infrastructure - The broader compute ecosystem
- Scaling Laws - What TPU scale enables
