# TPU Chip

> Source: https://aiwiki.ai/wiki/tpu_chip
> Updated: 2026-06-24
> Categories: AI Hardware, Google, Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

*This article covers the TPU chip and its hardware. For the broader topic, see [Tensor Processing Unit (TPU)](/wiki/tpu).*

The **Tensor Processing Unit (TPU)** is a custom [application-specific integrated circuit](/wiki/asic) (ASIC) developed by [Google](/wiki/google) to accelerate [machine learning](/wiki/machine_learning) workloads, and it is one of the most widely deployed examples of a purpose-built [AI accelerator](/wiki/ai_chips).[1] First deployed in Google's data centers in 2015, TPUs are optimized for high-throughput, low-latency [tensor](/wiki/tensor) operations, particularly the [matrix multiplication](/wiki/matrix_multiplication) at the heart of [neural network](/wiki/neural_network) training and [inference](/wiki/inference).[3] In the foundational 2017 research paper, Google reported that the first TPU was "on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher."[1] Over seven public generations, Google has scaled the TPU from an inference-only accelerator with a peak of 92 TOPS (INT8) to the Ironwood (TPU v7) chip, which delivers 4,614 TFLOPS (FP8), with superpods of 9,216 chips reaching 42.5 exaflops of aggregate compute.[8]

TPUs have powered some of the most widely known AI systems in the world, including [AlphaGo](/wiki/alphago), [AlphaFold](/wiki/alphafold), [BERT](/wiki/bert), [PaLM](/wiki/palm), and [Gemini](/wiki/gemini).[13] Google makes TPUs available to external users through [Google Cloud](/wiki/google_cloud_terms), the TPU Research Cloud program, and [Google Colab](/wiki/google_colab).[4] In October 2025, [Anthropic](/wiki/anthropic) agreed to access up to one million TPUs in a deal worth tens of billions of dollars, the largest external TPU commitment disclosed to date.[15]

## History and development

### What problem was the TPU built to solve?

In 2013, Google recognized that if every user spoke to their Android phone for just three minutes per day, the company would need to double its data center compute capacity to handle the inference load.[1] This realization prompted an internal effort to build custom silicon optimized for neural network inference. Dr. Amir Salek was recruited to establish custom silicon capabilities, and engineer Jonathan Ross (who later founded [Groq](/wiki/groq_hardware)) was among the original TPU designers.[13]

The TPU v1 was designed, verified, fabricated, and deployed to production data centers in just 15 months, an unusually fast timeline for a custom ASIC.[1] Google began deploying TPU v1 chips in its data centers in early 2015, but the existence of the chip remained secret for more than a year.[13]

### When was the TPU announced?

On May 18, 2016, at the Google I/O conference, CEO Sundar Pichai revealed that Google had been running TPUs inside its data centers for over a year.[13] He stated that TPUs delivered "an order of magnitude better performance per watt for machine learning" compared to existing processors.[13] The announcement came shortly after [AlphaGo](/wiki/alphago) defeated world Go champion Lee Sedol in March 2016, a match in which TPUs played a role in powering the inference computations.[13]

### Academic publication

The TPU v1 architecture was formally described in the paper "In-Datacenter Performance Analysis of a Tensor Processing Unit" by Norman P. Jouppi et al., presented at the 44th International Symposium on Computer Architecture (ISCA) in June 2017.[1] The paper reported that the TPU was 15 to 30 times faster and 30 to 80 times more energy-efficient than contemporary server-class CPUs and GPUs (an Intel Haswell CPU and an [NVIDIA](/wiki/nvidia) K80 [GPU](/wiki/gpu)) on production neural network inference workloads.[1]

### Manufacturing partnership

[Broadcom](/wiki/broadcom) serves as the co-developer of TPUs, translating Google's architecture and specifications into manufacturable silicon. All TPU generations have been fabricated by [TSMC](/wiki/tsmc).

## Architecture

### How does the systolic array work?

The defining architectural feature of the TPU is its [systolic array](/wiki/systolic_array), a grid of multiply-accumulate (MAC) units through which data flows in a regular, wave-like pattern (the name "systolic" is an analogy to the rhythmic pumping of the heart). In TPU v1, the matrix multiply unit (MXU) consists of a 256 x 256 grid of 8-bit MAC units, totaling 65,536 ALUs.[1] As the 2017 paper puts it, "the heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS)."[1]
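The 92 TOPS figure can be reproduced from numbers given in this article: 65,536 MAC units, a 700 MHz clock, and the usual convention of counting each multiply-accumulate as two operations. The following is a minimal sketch of that arithmetic; the function name `peak_ops_per_second` is illustrative and not taken from any Google tool.

```python
# Peak-throughput arithmetic for the TPU v1 matrix multiply unit.
# Assumption: one MAC counts as two operations (a multiply and an add).
OPS_PER_MAC = 2


def peak_ops_per_second(rows, cols, clock_hz, ops_per_mac=OPS_PER_MAC):
    """Peak operations per second for a rows x cols array of MAC units."""
    return rows * cols * ops_per_mac * clock_hz


mac_units = 256 * 256                                   # 65,536 MAC units
tpu_v1_tops = peak_ops_per_second(256, 256, 700e6) / 1e12
print(f"{mac_units} MACs -> {tpu_v1_tops:.2f} TOPS")    # 65536 MACs -> 91.75 TOPS
```

The result, about 91.75 TOPS, rounds to the 92 TOPS quoted in the paper.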

During a matrix multiplication, weight values are preloaded into the array from above (the right-hand side, or RHS), while activation values enter from the left (the left-hand side, or LHS) and flow horizontally across the array. Each MAC unit multiplies its stored weight by the incoming activation, adds the result to a partial sum arriving from above, and passes both the activation (horizontally) and the updated partial sum (vertically) to neighboring units. Because all 65,536 ALUs pass intermediate results directly between spatially adjacent units without any memory access, power consumption is significantly reduced.[3] The short, local wires connecting adjacent ALUs are also more energy-efficient than long global interconnects.[1]
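The dataflow described above can be sketched as a small cycle-by-cycle simulation. This is a simplified weight-stationary model written for illustration, not a description of the actual TPU pipeline: the function name `systolic_matmul`, the skewed input schedule, and the register layout are assumptions chosen to make the timing easy to follow.

```python
def systolic_matmul(A, W):
    """Multiply A (M x K) by W (K x N) on a simulated K x N systolic array.

    Weights stay fixed in the array. Activations enter from the left and
    move one column per cycle; partial sums move one row down per cycle.
    """
    M, K, N = len(A), len(W), len(W[0])
    act = [[0] * N for _ in range(K)]    # activation register of each unit
    psum = [[0] * N for _ in range(K)]   # partial-sum register of each unit
    C = [[0] * N for _ in range(M)]

    for t in range(M + K + N - 2):       # cycles until the last result drains
        new_act = [[0] * N for _ in range(K)]
        new_psum = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    # Row k of the array is fed A[m][k], skewed by k cycles.
                    m = t - k
                    a_in = A[m][k] if 0 <= m < M else 0
                else:
                    a_in = act[k][n - 1]             # from the left neighbor
                p_in = psum[k - 1][n] if k > 0 else 0  # from the unit above
                new_act[k][n] = a_in
                new_psum[k][n] = p_in + W[k][n] * a_in
        act, psum = new_act, new_psum
        # Finished dot products leave the bottom row, one column at a time.
        for n in range(N):
            m = t - (K - 1) - n
            if 0 <= m < M:
                C[m][n] = psum[K - 1][n]
    return C


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Note that each unit reads only from its left and upper neighbors; no value is fetched from a shared memory inside the loop, which is the property the paragraph above credits for the power savings.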

From TPU v2 through TPU v5p, the MXU uses a 128 x 128 systolic array (16,384 multiply-accumulate units per MXU), with each chip containing two or more MXUs.[4] The TPU v6e (Trillium) and TPU v7 (Ironwood) expanded to a 256 x 256 MXU, quadrupling the number of multiply-accumulate units, and therefore operations per cycle, in each MXU compared to earlier generations.[7]

### Memory hierarchy

TPU v1 uses 8 GB of DDR3 DRAM as off-chip memory, providing 34 GB/s of bandwidth.[1] On-chip, the design includes 28 MiB of software-managed SRAM (the "Unified Buffer") and 4 MiB of accumulators.[1] This simplified memory hierarchy, with no hardware-managed caches, reduces memory access latency and die area compared to general-purpose processors.[3][12]
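One way to see why the memory figures matter is the roofline model, a standard back-of-the-envelope technique that is brought in here for illustration rather than taken from the text above. Dividing peak compute by memory bandwidth gives the number of operations a workload must perform per byte fetched from DRAM before the chip becomes compute-bound. The sketch below uses only the 92 TOPS and 34 GB/s figures from this article; the names are illustrative.

```python
PEAK_OPS_PER_S = 92e12      # TPU v1 peak throughput (92 TOPS)
DRAM_BYTES_PER_S = 34e9     # TPU v1 DDR3 bandwidth (34 GB/s)

# Operational intensity at which compute, not memory, becomes the limit.
ridge_ops_per_byte = PEAK_OPS_PER_S / DRAM_BYTES_PER_S
ridge_macs_per_byte = ridge_ops_per_byte / 2   # one MAC = two operations


def attainable_ops_per_s(ops_per_byte):
    """Roofline bound: limited by either bandwidth or peak compute."""
    return min(PEAK_OPS_PER_S, ops_per_byte * DRAM_BYTES_PER_S)


print(f"ridge: {ridge_ops_per_byte:.0f} ops/byte "
      f"({ridge_macs_per_byte:.0f} MACs/byte)")
print(f"at 100 ops/byte: {attainable_ops_per_s(100) / 1e12:.1f} TOPS")
```

Under these assumptions, a workload needs roughly 2,700 operations per byte of DRAM traffic to reach peak throughput, which helps explain the large on-chip buffers and the later move to higher-bandwidth memory.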

Starting with TPU v2, Google switched to [High Bandwidth Memory](/wiki/high_bandwidth_memory) (HBM), dramatically increasing both capacity and bandwidth.[4] By TPU v7, each chip has 192 GB of HBM with 7.37 TB/s of bandwidth.[8]

### What is bfloat16 and why did Google create it?

TPU v2 introduced the **bfloat16** (Brain Floating Point) number format, a custom 16-bit floating-point representation conceived at [Google Brain](/wiki/google_brain).[5] Bfloat16 uses 1 sign bit, 8 exponent bits, and 7 mantissa bits.[5] By retaining the same 8-bit exponent as IEEE 754 float32, bfloat16 preserves the same dynamic range (values up to approximately 3.4 x 10^38) while halving memory usage.[5] This is in contrast to the IEEE 754 float16 (half-precision) format, which uses 5 exponent bits and 10 mantissa bits, giving it a narrower dynamic range.
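Because bfloat16 shares its sign and exponent layout with float32, a float32 value can be converted by keeping only its upper 16 bits. The sketch below does this with Python's standard `struct` module, using round-to-nearest-even; the helper names are illustrative, and NaN inputs are not handled.

```python
import struct


def to_bfloat16_bits(x):
    """Round a Python float to bfloat16 and return its 16-bit pattern."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]   # float32 bit pattern
    rounding = 0x7FFF + ((bits >> 16) & 1)                # round to nearest even
    return ((bits + rounding) >> 16) & 0xFFFF


def from_bfloat16_bits(b):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]


def fields(b):
    """Split a bfloat16 pattern into (sign, exponent, mantissa)."""
    return b >> 15, (b >> 7) & 0xFF, b & 0x7F


b = to_bfloat16_bits(3.14159)
print(hex(b), fields(b), from_bfloat16_bits(b))   # 0x4049 (0, 128, 73) 3.140625
print(from_bfloat16_bits(0x7F7F))                 # largest finite value, about 3.39e38
```

The example shows both sides of the trade-off: the value 3.14159 keeps only about three significant decimal digits, while the largest finite bfloat16 value is still on the order of 10^38. For comparison, the largest finite IEEE 754 float16 value is 65,504.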

Inside the MXU, multiplications are performed in bfloat16 while accumulations use full float32 precision, a mixed-precision strategy that maintains model accuracy while doubling throughput relative to pure float32 computation.[5] Bfloat16 has since been adopted by other hardware vendors, including Intel, AMD, and NVIDIA, and is supported across all major [deep learning](/wiki/deep_learning) frameworks.[5]
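A small experiment illustrates why the accumulator is kept in float32. The sketch below emulates both formats in software with the standard `struct` module and sums 1,000 products of 1.0 x 1.0. It is an illustration of rounding behavior only, not a model of the MXU hardware, and the helper names are hypothetical.

```python
import struct


def round_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def round_bf16(x):
    """Round a Python float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def dot(a, b, round_acc):
    """Dot product with bfloat16 inputs and a configurable accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + round_bf16(x) * round_bf16(y))
    return acc


ones = [1.0] * 1000
print(dot(ones, ones, round_f32))    # 1000.0
print(dot(ones, ones, round_bf16))   # 256.0
```

With a bfloat16 accumulator the sum stalls at 256, because bfloat16 carries only 8 significant bits and 257 is not representable; with a float32 accumulator the result is exact.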

### Inter-chip interconnect (ICI)

TPU v2 introduced the Inter-Chip Interconnect (ICI), a custom high-bandwidth, low-latency network that links multiple TPU chips into a single logical accelerator called a "pod" or "slice."[4] TPU v2 and v3 use a 2D torus topology, in which each chip connects to its four nearest neighbors (north, south, east, west). TPU v4 and v5p upgraded to a 3D torus, where each chip connects to six neighbors, increasing bisection bandwidth.[2]
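The neighbor relationships in a torus follow from wraparound arithmetic on chip coordinates. The sketch below is a minimal illustration; the function name and the example grid shapes (16 x 16 and 16 x 16 x 16) are assumptions chosen for the demonstration, not documented pod layouts.

```python
def torus_neighbors(coord, shape):
    """Return the coordinates directly linked to `coord` in a torus.

    Each axis contributes two neighbors (one step in each direction),
    with indices wrapping around at the edges.
    """
    neighbors = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(n))
    return neighbors


print(torus_neighbors((0, 0), (16, 16)))              # 4 neighbors in 2D
print(len(torus_neighbors((0, 0, 0), (16, 16, 16))))  # 6 neighbors in 3D
```

The wraparound links are what distinguish a torus from a plain mesh: a chip on the edge of the grid, such as (0, 0), still has a full set of neighbors. Axes of size 2 would produce duplicate entries in this simple version.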

TPU v4 introduced optical circuit switches (OCSes) based on 3D Micro-Electro-Mechanical Systems (MEMS) mirrors that can dynamically reconfigure the interconnect topology.[2] This allows the system to form "twisted" 3D torus topologies that provide up to 70% higher bisection bandwidth than a standard torus.[2] The OCS hardware accounts for less than 5% of system cost and less than 3% of system power.[2] Each TPU v4 pod connects 4,096 chips through 48 OCSes using Google's custom Palomar 136x136 OCS.[2]

TPU v7 (Ironwood) scales the ICI to 9.6 Tb/s per chip, enabling superpods of up to 9,216 chips.[8]

### SparseCores

Starting with TPU v4, Google added **SparseCores** to each chip.[2] SparseCores are specialized dataflow processors designed to accelerate models that rely on [embedding](/wiki/embeddings) lookups, a common operation in recommendation systems and large language models. SparseCores occupy only about 5% of die area and power but accelerate embedding-heavy workloads by 5 to 7 times.[2] TPU v5p introduced second-generation SparseCores with further improvements, and TPU v7 contains four SparseCores per chip.[8]
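For readers unfamiliar with embedding lookups, the operation amounts to gathering a few rows from a large table and combining them. The sketch below shows a sum-pooled lookup in plain Python; it illustrates the access pattern only (sparse, data-dependent reads rather than dense matrix math) and says nothing about how SparseCores implement it. The function name `embedding_bag` and the tiny table are hypothetical.

```python
def embedding_bag(table, ids):
    """Gather the rows of `table` named by `ids` and sum them element-wise."""
    dim = len(table[0])
    out = [0.0] * dim
    for i in ids:
        row = table[i]              # data-dependent read into the table
        for d in range(dim):
            out[d] += row[d]
    return out


# A toy table with 3 entries of dimension 2; real tables can hold
# millions of rows, of which each example touches only a handful.
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(embedding_bag(table, [0, 2]))   # [6.0, 8.0]
```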

## TPU generations

The table below summarizes the key specifications of each TPU generation.

| Generation | Release year | Process node | Clock (MHz) | Memory | Memory bandwidth | Peak compute | TDP (W) | Chips per pod | Training support |
|---|---|---|---|---|---|---|---|---|---|
| TPU v1 | 2015 | 28 nm | 700 | 8 GB DDR3 | 34 GB/s | 92 TOPS (INT8) | 75 | N/A (inference only) | No |
| TPU v2 | 2017 | 16 nm | 700 | 16 GB HBM | 600 GB/s | 45 TFLOPS (BF16) | 280 | 256 (11.5 PFLOPS) | Yes |
| TPU v3 | 2018 | 16 nm | 940 | 32 GB HBM | 900 GB/s | 123 TFLOPS (BF16) | 220 | 1,024 (>100 PFLOPS) | Yes |
| TPU v4 | 2021 | 7 nm | 1,050 | 32 GB HBM | 1,200 GB/s | 275 TFLOPS (BF16) | 170 | 4,096 (>1 EFLOPS) | Yes |
| TPU v5e | 2023 | N/A | N/A | 16 GB HBM | 819 GB/s | 197 TFLOPS (BF16) | N/A | 256 | Yes |
| TPU v5p | 2023 | N/A | 1,750 | 95 GB HBM | 2,765 GB/s | 459 TFLOPS (BF16) | N/A | 8,960 (4.45 EFLOPS) | Yes |
| TPU v6e (Trillium) | 2024 | N/A | N/A | 32 GB HBM | 1,640 GB/s | 918 TFLOPS (BF16) | N/A | 256 | Yes |
| TPU v7 (Ironwood) | 2025 | N/A | N/A | 192 GB HBM | 7,370 GB/s | 4,614 TFLOPS (FP8) | N/A | 9,216 (42.5 EFLOPS) | Yes |
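Several of the pod-level figures in the table can be checked by multiplying per-chip values by the number of chips. The sketch below does this for a few rows using only numbers from the table; note that the v7 figure is quoted in FP8 while the others are BF16, so the totals are not directly comparable across generations.

```python
# (per-chip peak FLOPS, chips per pod), copied from the table above.
PODS = {
    "TPU v2": (45e12, 256),
    "TPU v3": (123e12, 1024),
    "TPU v4": (275e12, 4096),
    "TPU v7 (Ironwood)": (4614e12, 9216),
}


def pod_peak_flops(per_chip_flops, chips):
    """Aggregate peak compute, assuming perfect scaling across chips."""
    return per_chip_flops * chips


for name, (flops, chips) in PODS.items():
    print(f"{name}: {pod_peak_flops(flops, chips) / 1e15:,.2f} PFLOPS")

# Shared HBM in a full Ironwood superpod: 192 GB per chip x 9,216 chips.
ironwood_hbm_pb = 192 * 9216 / 1e6    # GB -> PB (decimal units)
print(f"Ironwood superpod HBM: {ironwood_hbm_pb:.2f} PB")
```

These are peak figures under an idealized perfect-scaling assumption; sustained throughput on real workloads is lower.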

### TPU v1 (2015)

The first-generation TPU was designed exclusively for inference.[1] It connects to its host server via a PCIe 3.0 bus and operates as a coprocessor, receiving instructions from the host CPU.[1] The chip was fabricated on a 28 nm process, runs at 700 MHz, and consumes 75 W. Its 256 x 256 systolic array of 8-bit integer MAC units delivers 92 TOPS.[1] Google deployed over 100,000 TPU v1 chips across its data centers to serve production workloads including [RankBrain](/wiki/rankbrain) (search ranking), [Google Street View](/wiki/google_street_view) text recognition, and Google Photos image processing.[3] A single TPU v1 could process over 100 million Google Photos per day.[3]

### TPU v2 (2017)

Announced in May 2017, TPU v2 was the first generation to support both training and inference.[13] It introduced HBM, bfloat16 arithmetic, and the ICI interconnect. Each chip contains two MXUs delivering a combined 45 TFLOPS in bfloat16.[4] Four chips form a board, and 64 boards (256 chips) form a full pod delivering 11.5 petaflops.[4] TPU v2 was the first TPU made available to external users through Google Cloud.[13]

### TPU v3 (2018)

Announced on May 8, 2018, TPU v3 more than doubled per-chip performance relative to TPU v2, reaching 123 TFLOPS in bfloat16 (up from 45 TFLOPS).[13] The clock speed increased to 940 MHz. Pods scaled to 1,024 chips with over 100 petaflops of aggregate compute.[4] TPU v3 required liquid cooling due to its higher power density.[13]

### TPU v4 (2021)

Announced on May 18, 2021, and described in the 2023 ISCA paper "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings" by Jouppi et al., TPU v4 moved to a 7 nm process.[2] Each chip delivers 275 TFLOPS in bfloat16 with 32 GB of HBM at 1,200 GB/s. The chip introduced SparseCores for embedding acceleration and optical circuit switches for a reconfigurable 3D torus interconnect topology.[2] A full pod of 4,096 chips exceeds 1 exaflop.[2] Google reported that a TPU v4 deployment uses approximately 3 times less electricity and emits approximately 20 times less CO2 than a comparable on-premises GPU cluster performing the same training.[11] On production ML benchmarks, TPU v4 was reported to be 5% to 87% faster than an NVIDIA A100 GPU.[2]

### TPU v5e (2023)

The TPU v5e is a cost-optimized variant designed for both training and inference on models up to approximately 200 billion parameters.[6] It prioritizes price-performance, achieving 2.3 times better price-performance than TPU v4.[6] Each chip has 16 GB of HBM and delivers 197 TFLOPS in bfloat16 (or 393 TOPS in INT8). Google reports that 8 TPU v5e chips can generate approximately 2,175 tokens per second on [Llama](/wiki/llama) 2-70B inference.[6]

### TPU v5p (2023)

Announced in December 2023, the TPU v5p is the high-performance variant of the fifth generation, intended for large-scale training.[6] Each chip delivers 459 TFLOPS in bfloat16 with 95 GB of HBM at 2,765 GB/s.[10] A full v5p pod comprises 8,960 chips in a 3D torus with 4,800 Gbps of ICI bandwidth per chip, reaching approximately 4.45 exaflops.[10] TPU v5p can train [large language models](/wiki/large_language_model) 2.8 times faster than TPU v4.[6] It includes second-generation SparseCores that run embedding-dense workloads 1.9 times faster than TPU v4. The physical layout of TPU v5p was designed with the assistance of [deep reinforcement learning](/wiki/reinforcement_learning).[6]

### TPU v6e, Trillium (2024)

Announced at Google I/O in May 2024 and made generally available in late 2024, Trillium is Google's sixth-generation TPU.[7] Each chip delivers 918 TFLOPS in bfloat16, a 4.7 times increase over TPU v5e.[7] The MXU was expanded from 128 x 128 to 256 x 256.[9] HBM capacity doubled to 32 GB with 1,640 GB/s bandwidth.[9] Trillium is over 67% more energy-efficient than TPU v5e.[7] Pods scale to 256 chips with up to 13 TB/s of ICI bandwidth per chip.[7]

### TPU v7, Ironwood (2025)

Unveiled at Google Cloud Next in April 2025, Ironwood is Google's seventh-generation TPU and the first designed with inference as the primary target.[8] Each chip delivers 4,614 TFLOPS in FP8 and contains 192 GB of HBM with 7.37 TB/s bandwidth.[8] The chip uses a chiplet architecture: two chiplets, each containing one TensorCore, two SparseCores, and 96 GB of HBM.[8] Superpods scale to 9,216 chips connected via a 3D torus ICI at 9.6 Tb/s per chip, delivering 42.5 exaflops of aggregate compute and 1.77 petabytes of shared HBM.[8] Ironwood offers more than 4 times better performance per chip for both training and inference compared to the previous generation.[8]

### TPU v8 (2026)

At Google Cloud Next 2026, Google announced its eighth-generation TPU as two distinct chips, splitting training and inference for the first time: **TPU 8t**, optimized for large, compute-intensive training, and **TPU 8i**, optimized for latency-sensitive inference serving in what Google calls "the agentic era."[16] Google reported that TPU 8t delivers nearly 3 times the compute performance per pod over the previous generation, while TPU 8i delivers about 80% better performance-per-dollar.[16]

## Edge TPU

In addition to data center TPUs, Google developed the **Edge TPU**, a small ASIC designed for on-device [inference](/wiki/inference) in low-power environments. The Edge TPU delivers 4 TOPS of INT8 inference performance while consuming only 2 watts (2 TOPS per watt).[14] It can run models such as [MobileNet](/wiki/mobilenet) V2 at nearly 400 frames per second.[14] The Edge TPU supports only forward-pass operations (inference, not training) and requires 8-bit quantized [TensorFlow Lite](/wiki/tensorflow) models.[14]
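The 8-bit quantization requirement means that floating-point weights and activations are mapped to integers before deployment. The sketch below shows the affine scheme commonly used for this, in which a real value is approximated as `scale * (q - zero_point)`. It is a simplified per-tensor illustration with hypothetical helper names and a made-up value range, not the conversion procedure used by the TensorFlow Lite or Edge TPU toolchains.

```python
def quant_params(xmin, xmax, qmin=-128, qmax=127):
    """Choose a scale and zero point so [xmin, xmax] maps onto int8."""
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)   # range must contain 0
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to the nearest int8 code, clamping out-of-range input."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from an int8 code."""
    return (q - zero_point) * scale


scale, zp = quant_params(-1.28, 1.27)     # illustrative value range
q = quantize(0.5, scale, zp)
print(q, dequantize(q, scale, zp))        # code 50, close to 0.5
print(quantize(5.0, scale, zp))           # out of range, clamps to 127
```

Values inside the calibrated range are recovered to within half a quantization step; values outside it are clamped, which is why choosing the range carefully matters for accuracy.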

Google sells Edge TPU hardware under the **Coral** brand in several form factors, including USB accelerators, PCIe modules, development boards, and system-on-module packages.

## Software ecosystem

### Supported frameworks

TPUs are supported by three major [deep learning](/wiki/deep_learning) frameworks:

| Framework | Integration method | Notes |
|---|---|---|
| [TensorFlow](/wiki/tensorflow) | Native support via XLA compiler | TensorFlow was the first framework with TPU support; tight integration with Google's ecosystem |
| [JAX](/wiki/jax) | Native support via XLA compiler | JAX's functional programming model and GSPMD (General-purpose SPMD) partitioner allow automatic parallelization across TPU pods with minimal code changes |
| [PyTorch](/wiki/pytorch) | PyTorch/XLA library | Open-source package that translates PyTorch operations to XLA for execution on TPUs |

### XLA compiler

[XLA](/wiki/xla) (Accelerated Linear Algebra) is an open-source compiler for machine learning that takes computation graphs from TensorFlow, JAX, and PyTorch and optimizes them for high-performance execution on TPUs, GPUs, and CPUs. XLA performs whole-program optimization, including operator fusion, memory layout assignment, and tile-size selection, producing efficient machine code for the target hardware.
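Operator fusion is the easiest of these optimizations to picture. The sketch below is a conceptual illustration in plain Python, not XLA code or XLA output: it contrasts three separate element-wise operations, each of which writes a full intermediate list, with a single fused pass that produces the same result without intermediates. The function names are hypothetical.

```python
def unfused(x, w, b):
    """Three separate passes, each materializing an intermediate list."""
    scaled = [xi * wi for xi, wi in zip(x, w)]          # pass 1: multiply
    shifted = [si + bi for si, bi in zip(scaled, b)]    # pass 2: add bias
    return [max(0.0, si) for si in shifted]             # pass 3: ReLU


def fused(x, w, b):
    """One pass: multiply, add, and ReLU per element, no intermediates."""
    return [max(0.0, xi * wi + bi) for xi, wi, bi in zip(x, w, b)]


x, w, b = [1.0, -2.0, 3.0], [0.5, 0.5, 0.5], [0.0, 0.0, -2.0]
print(unfused(x, w, b))   # [0.5, 0.0, 0.0]
print(fused(x, w, b))     # [0.5, 0.0, 0.0]
```

On an accelerator, the benefit of the fused form is that intermediate values stay in fast on-chip storage instead of making round trips to main memory.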

## How does a TPU differ from a CPU or GPU?

| Feature | [CPU](/wiki/cpu) | [GPU](/wiki/gpu) | TPU |
|---|---|---|---|
| Design purpose | General-purpose computing | Parallel computing; originally graphics rendering | Machine learning inference and training |
| Core architecture | Few complex cores with large caches | Thousands of smaller CUDA/stream cores | Systolic array of MAC units |
| Arithmetic precision | FP64, FP32, INT32, INT64 | FP64, FP32, FP16, BF16, INT8, FP8 | BF16, FP32, INT8, FP8 (varies by generation) |
| Memory hierarchy | Multi-level hardware caches (L1, L2, L3) | HBM with hardware caches | HBM (DDR3 in v1) with software-managed SRAM (no hardware caches in v1) |
| Interconnect for scaling | Ethernet, InfiniBand | NVLink, NVSwitch, InfiniBand | Custom ICI with optical circuit switches |
| Programming model | Any language/framework | CUDA, ROCm, OpenCL | XLA (via TensorFlow, JAX, or PyTorch/XLA) |
| Availability | Ubiquitous | Multiple vendors (NVIDIA, AMD, Intel) | Google Cloud only |

TPUs are optimized for workloads dominated by large matrix multiplications and convolutions, such as training and serving [transformer](/wiki/transformer) models, [convolutional neural networks](/wiki/convolutional_neural_network), and recommendation systems. GPUs offer broader flexibility for workloads with irregular computation patterns, custom CUDA kernels, or non-ML parallel computing tasks. CPUs remain the best choice for workloads with complex branching logic, low parallelism, or tasks that require broad instruction set support.

## Notable models and applications

TPUs have been used to train and serve many well-known AI systems:

| Model or system | Year | TPU generation used | Domain |
|---|---|---|---|
| [RankBrain](/wiki/rankbrain) | 2015 | TPU v1 | Search ranking |
| [AlphaGo](/wiki/alphago) | 2016 | TPU v1 | Game playing (Go) |
| Google Street View text processing | 2015 | TPU v1 | [OCR](/wiki/optical_character_recognition) |
| [AlphaZero](/wiki/alphazero) | 2017 | TPU v2 | Game playing (chess, Shogi, Go) |
| [BERT](/wiki/bert) | 2018 | TPU v3 | [Natural language processing](/wiki/natural_language_processing) |
| [AlphaFold](/wiki/alphafold) | 2020 | TPU v3 | [Protein structure prediction](/wiki/protein_folding) |
| [LaMDA](/wiki/lamda) | 2021 | TPU v4 | Conversational AI |
| [PaLM](/wiki/palm) | 2022 | TPU v4 | Large language model |
| [Gemini](/wiki/gemini) | 2023 | TPU v4, v5e, v5p | Multimodal AI |
| [Gemma](/wiki/gemma) | 2024 | TPU v5e | Open-weight LLM |

Google also offers the open-weight [Gemma](/wiki/gemma) model family, which shares technical infrastructure with Gemini and was trained on TPUs.[13]

## Cloud availability and pricing

TPUs are available to external users exclusively through [Google Cloud](/wiki/google_cloud_terms). Pricing is per chip-hour and varies by TPU generation and region.

| TPU version | On-demand price (per chip-hour, USD) | Committed use (1-year) discount |
|---|---|---|
| TPU v4 | $0.24 | ~25-30% |
| TPU v5e | $0.32 | ~25-30% |
| TPU v5p | $0.48 | ~25-30% |
| TPU v6e (Trillium) | Varies by region | Available |
| TPU v7 (Ironwood) | Varies by region | Available |

Google also provides free or subsidized TPU access through several programs:

- **TPU Research Cloud (TRC):** Researchers, students, and entrepreneurs can apply for free access to TPU clusters for research purposes.
- **[Google Colab](/wiki/google_colab):** Offers limited free access to a single TPU v5e chip for experimentation.
- **Google Cloud free credits:** New customers receive $300 in free credits applicable to TPU usage.

As of 2026, TPU v7 (Ironwood) is generally available. Google has also been in discussions with cloud providers such as CoreWeave and Crusoe about deploying TPUs outside of Google's own infrastructure.

### How big is the Anthropic TPU deal?

On October 23, 2025, [Anthropic](/wiki/anthropic) announced an expansion of its use of Google Cloud TPUs, agreeing to access up to one million TPU chips and well over a gigawatt of capacity expected to come online in 2026, in a deal that CNBC and others reported to be worth tens of billions of dollars.[15][17] It was the largest external TPU commitment disclosed at the time. Anthropic CFO Krishna Rao said that the company's customers, "from Fortune 500 companies to AI-native startups, depend on Claude for their most important work," framing the expanded compute as support for that demand.[15] The agreement signaled that TPUs now train and serve frontier models beyond Google's own [Gemini](/wiki/gemini) family, including the [Claude](/wiki/claude) models built by a competing AI lab.

## Explain like I'm 5 (ELI5)

Imagine your brain is really good at all kinds of things: reading, talking, doing math, playing games. That is like a regular computer chip (a CPU). Now imagine a special calculator that can only do one thing, but it does that one thing incredibly fast: multiplying lots of numbers at once. That is what a TPU is. Google built this special calculator because [artificial intelligence](/wiki/artificial_intelligence) programs need to multiply millions of numbers together over and over again. By making a chip that only does multiplication really well, Google can run AI programs much faster and using much less electricity than a regular chip.

## See also

- [Tensor Processing Unit (TPU)](/wiki/tpu)
- [TPU Board](/wiki/tpu_board)
- [AI accelerator](/wiki/ai_chips)
- [GPU](/wiki/gpu)
- [ASIC](/wiki/asic)
- [Neural network](/wiki/neural_network)
- [Google](/wiki/google)
- [XLA](/wiki/xla)
- [JAX](/wiki/jax)
- [TensorFlow](/wiki/tensorflow)
- [Deep learning](/wiki/deep_learning)

## References

1. Jouppi, N.P. et al. (2017). "In-Datacenter Performance Analysis of a Tensor Processing Unit." Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), pp. 1-12. https://arxiv.org/abs/1704.04760
2. Jouppi, N.P. et al. (2023). "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings." Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA), pp. 1147-1160. https://arxiv.org/abs/2304.01433
3. Google Cloud. "An in-depth look at Google's first Tensor Processing Unit (TPU)." Google Cloud Blog. https://cloud.google.com/blog/products/ai-machine-learning/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu
4. Google Cloud. "TPU system architecture." Google Cloud Documentation. https://docs.cloud.google.com/tpu/docs/system-architecture-tpu-vm
5. Google Cloud. "BFloat16: The secret to high performance on Cloud TPUs." Google Cloud Blog. https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
6. Google Cloud. "Introducing Cloud TPU v5p and AI Hypercomputer." Google Cloud Blog. https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-tpu-v5p-and-ai-hypercomputer
7. Google Cloud. "Introducing Trillium, sixth-generation TPUs." Google Cloud Blog. https://cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus
8. Google. "Ironwood: The first Google TPU for the age of inference." Google Blog. https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/ironwood-tpu-age-of-inference/
9. Google Cloud. "TPU v6e documentation." Google Cloud Documentation. https://docs.cloud.google.com/tpu/docs/v6e
10. Google Cloud. "TPU v5p documentation." Google Cloud Documentation. https://docs.cloud.google.com/tpu/docs/v5p
11. Google Cloud. "TPU v4 enables performance, energy and CO2e efficiency gains." Google Cloud Blog. https://cloud.google.com/blog/topics/systems/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains
12. Jouppi, N.P. et al. (2021). "Ten Lessons From Three Generations Shaped Google's TPUv4i." Proceedings of the 48th Annual International Symposium on Computer Architecture (ISCA). https://www.cs.cmu.edu/~18742/papers/Jouppi2021.pdf
13. Google Cloud. "TPU transformation: A look back at 10 years of our AI-specialized chips." Google Cloud Blog. https://cloud.google.com/transform/ai-specialized-chips-tpu-history-gen-ai
14. Coral. "Edge TPU performance benchmarks." https://www.coral.ai/docs/edgetpu/benchmarks/
15. Anthropic. "Expanding our use of Google Cloud TPUs and Services." October 23, 2025. https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services
16. Google. "Our eighth generation TPUs: two chips for the agentic era." Google Blog. https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/
17. CNBC. "Google and Anthropic announce cloud deal worth tens of billions of dollars." October 23, 2025. https://www.cnbc.com/2025/10/23/anthropic-google-cloud-deal-tpu.html