# Cloud TPU

> Source: https://aiwiki.ai/wiki/cloud_tpu
> Updated: 2026-06-21
> Categories: AI Hardware, AI Infrastructure, Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

Cloud TPU is Google Cloud's offering of [Tensor Processing Units](/wiki/tpu) (TPUs), the family of custom [application-specific integrated circuits](/wiki/application_specific_integrated_circuit) (ASICs) that [Google](/wiki/google) builds to accelerate [machine learning](/wiki/machine_learning) training and inference. First deployed internally in 2015 and publicly announced in May 2016, TPUs are designed from the ground up for [neural network](/wiki/neural_network) computation rather than general-purpose processing, and Google rents them to external users through its [Google Cloud](/wiki/google_cloud_terms) platform as Cloud TPUs. As of 2026 Google has shipped seven generations, culminating in the v7 (Ironwood) generation, whose 9,216-chip pod delivers 42.5 FP8 exaflops of compute, and the platform's largest external customer is [Anthropic](/wiki/anthropic), which in October 2025 contracted for access to approximately one million TPUs [9][15].

## ELI5 (Explain Like I'm 5)

Imagine your brain is really good at lots of different things: reading, drawing, playing games, and doing math. That is like a regular computer chip (a CPU or GPU). Now imagine a special calculator that can only do one kind of math problem, but it does that one problem incredibly fast. That is what a TPU is. Google built this special calculator because training an AI model requires doing the same type of math (multiplying big grids of numbers) over and over, billions of times. By making a chip that only does this one job, Google made AI training and inference much faster and cheaper than using a regular chip that tries to do everything.

## What is a Cloud TPU?

A Cloud TPU is a Google-designed matrix-math accelerator that customers provision on demand through Google Cloud rather than buying physical hardware. Where a [GPU](/wiki/gpu) is a general-purpose parallel processor, a TPU is an ASIC whose silicon is dominated by a systolic array of multiply-accumulate units optimized for the dense [matrix multiplication](/wiki/matrix_multiplication) at the heart of deep learning. Google does not sell TPU chips; the only way to use them is through Google Cloud services such as Cloud TPU VMs and [Google Kubernetes Engine](/wiki/google_cloud_terms) (GKE), which makes Cloud TPU both the brand name and the exclusive access channel for the hardware.

## History and motivation

Google began developing TPUs around 2013 after internal projections showed that if every user spoke to their Android phone for just three minutes a day using voice search, the company would need to double its data center compute capacity. At the time, running [deep learning](/wiki/deep_learning) inference on CPUs and [GPUs](/wiki/gpu) was costly in both money and power. Google engineers, led by Norman Jouppi, designed a purpose-built chip that could handle neural network inference at scale with far better performance per watt than existing hardware.

The first TPU (v1) was deployed in Google data centers in 2015 and publicly disclosed at the Google I/O conference in May 2016. Jouppi and colleagues published the landmark paper "In-Datacenter Performance Analysis of a Tensor Processing Unit" at the International Symposium on Computer Architecture (ISCA) in June 2017. The paper demonstrated that the TPU achieved 15 to 30 times higher performance and 30 to 80 times higher performance per watt compared to contemporary CPUs and GPUs for neural network inference workloads [1].

Google made TPUs available to external users through Google Cloud Platform starting with TPU v2 in 2017. Since then, each new generation has expanded capabilities from inference-only (v1) to both training and inference (v2 onward), while continuously scaling up in compute power, memory bandwidth, and interconnect speed.

## Architecture

### Systolic array design

At the core of every TPU is a systolic array, a grid of multiply-accumulate (MAC) units through which data flows in a rhythmic, pipelined fashion. In a systolic array, partial results move from one processing element to the next without returning to memory at each step. This design minimizes memory access overhead and maximizes throughput for [matrix multiplication](/wiki/matrix_multiplication), which is the dominant operation in neural network training and inference.

The original TPU v1 contained a single 256 x 256 systolic array of 8-bit multiply-accumulate units, providing 65,536 MACs that could perform up to 92 trillion operations per second (TOPS). Starting with TPU v2, the array was reorganized into 128 x 128 units operating on [bfloat16](/wiki/bfloat16) inputs with FP32 accumulation. TPU v6e and TPU v7 (Ironwood) expanded the MXU back to 256 x 256 multiply-accumulators, increasing per-cycle throughput.
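The dataflow described above can be sketched as a small cycle-by-cycle simulation. This is an illustrative model only, not Google's hardware design: it assumes a weight-stationary arrangement (each processing element holds one weight, activations move left to right, partial sums move top to bottom), and the function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(X, W):
    """Simulate X (M x K) times W (K x N) on a K x N grid of MAC units.

    Assumed model: PE (r, c) permanently holds W[r][c]. Each cycle it takes
    an activation from its left neighbor and a partial sum from the PE above,
    then passes both on. No intermediate value is written back to memory.
    """
    M, K, N = len(X), len(W), len(W[0])
    x_reg = [[0] * N for _ in range(K)]  # activation latched in each PE
    p_reg = [[0] * N for _ in range(K)]  # partial sum produced by each PE
    out = [[0] * N for _ in range(M)]
    cycles = M + K + N - 2               # fill + stream + drain

    for t in range(cycles):
        new_x = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for r in range(K):
            for c in range(N):
                if c > 0:
                    x_in = x_reg[r][c - 1]          # from the left neighbor
                else:
                    m = t - r                       # skewed input schedule
                    x_in = X[m][r] if 0 <= m < M else 0
                p_in = p_reg[r - 1][c] if r > 0 else 0  # from the PE above
                new_x[r][c] = x_in
                new_p[r][c] = p_in + x_in * W[r][c]     # multiply-accumulate
        x_reg, p_reg = new_x, new_p
        # Finished dot products leave the bottom row, one diagonal per cycle.
        for c in range(N):
            m = t - (K - 1) - c
            if 0 <= m < M:
                out[m][c] = p_reg[K - 1][c]
    return out, cycles


X = [[1, 2], [3, 4], [5, 6]]
W = [[7, 8, 9], [10, 11, 12]]
result, cycles = systolic_matmul(X, W)
print(result)  # [[27, 30, 33], [61, 68, 75], [95, 106, 117]]
print(cycles)  # 6
```

In this sketch an M x K input against K x N weights finishes in M + K + N - 2 cycles, so the fill and drain overhead shrinks relative to useful work as the batch dimension M grows.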

### TensorCore

Starting from TPU v2, each TPU chip contains one or more TensorCores. A TensorCore is a self-contained compute unit that includes:

- **Matrix Multiply Units (MXUs):** The primary compute engines, each containing a systolic array of multiply-accumulators. The number of MXUs per TensorCore varies by generation; TPU v4, for example, has four.
- **Vector unit:** Handles element-wise operations such as [activation functions](/wiki/activation_function), [softmax](/wiki/softmax), and batch normalization.
- **Scalar unit:** Manages control flow, memory address calculations, and other administrative operations.

Each TPU chip in v2, v3, and v4 contains two TensorCores; in v4, each TensorCore houses four 128 x 128 MXUs. Later generations vary: the cost-optimized v5e and v6e chips contain a single TensorCore, while v5p and v7 (Ironwood) contain two. TPU v6e and v7 use larger 256 x 256 MXUs.

### SparseCore

Starting with TPU v4, Google introduced SparseCores, specialized dataflow processors designed to accelerate models that rely heavily on sparse [embedding](/wiki/embeddings) lookups. Embedding-heavy models are common in recommendation systems and ranking workloads. TPU v4 includes four SparseCores per chip, each with dedicated scratchpad memory and optimized dataflow for sparse memory access patterns. Models with ultra-large embeddings have achieved 5 to 7 times speedups using SparseCores while consuming only about 5% of the total chip die area and power budget. TPU v5p features second-generation SparseCores, and TPU v6e includes third-generation SparseCores (two per chip).
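To make the term "sparse embedding lookup" concrete, the following minimal sketch shows the access pattern involved: a few rows are gathered from a large table by ID and then pooled. It illustrates the workload only and says nothing about how SparseCores implement it. The function name `embedding_bag` and the tiny table are hypothetical.

```python
def embedding_bag(table, ids_per_example):
    """Gather rows of `table` by ID and sum them per example.

    The gather touches scattered rows (irregular memory access), in contrast
    to the dense, regular access pattern of a matrix multiplication.
    """
    dim = len(table[0])
    pooled = []
    for ids in ids_per_example:
        acc = [0.0] * dim
        for i in ids:               # data-dependent row index
            row = table[i]
            for d in range(dim):
                acc[d] += row[d]
        pooled.append(acc)
    return pooled


# Hypothetical 3-row table; production tables can hold many millions of rows.
table = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(embedding_bag(table, [[0, 2], [1]]))  # [[4.0, 3.0], [0.0, 2.0]]
```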

### The bfloat16 number format

Google developed the bfloat16 (Brain Floating Point) number format specifically for TPU-based machine learning workloads. Bfloat16 is a 16-bit floating-point representation that uses one sign bit, eight exponent bits, and seven mantissa bits. Unlike IEEE FP16 (which trades exponent range for precision), bfloat16 preserves the same dynamic range as FP32 while halving memory usage. This design choice reflects the observation that neural networks are more sensitive to dynamic range than to precision during training.

On Cloud TPUs, matrix multiplications are performed with bfloat16 inputs and accumulated in FP32, providing a practical balance between computational speed and numerical accuracy. Because bfloat16 multipliers are roughly half the silicon area of FP16 multipliers and eight times smaller than FP32 multipliers, TPUs can pack more compute into the same die area [2].
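The bit layout described above can be explored with a short sketch using only the Python standard library. It relies on the fact that bfloat16 is the top 16 bits of an IEEE float32, so conversion amounts to rounding away the low 16 bits. The helper names are hypothetical, and NaN inputs are not handled specially.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Round x to bfloat16 and return the 16-bit pattern (NaN not special-cased)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    return (bits >> 16) & 0xFFFF         # keep sign, exponent, top 7 mantissa bits


def bfloat16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def split_fields(b: int):
    """Return (sign, exponent, mantissa) as 1-, 8-, and 7-bit integers."""
    return (b >> 15) & 1, (b >> 7) & 0xFF, b & 0x7F


print(split_fields(float32_to_bfloat16_bits(-2.5)))  # (1, 128, 32)

# 1e38 is far beyond the largest finite IEEE FP16 value (65504) but still
# fits in bfloat16, at the cost of keeping only about 2 to 3 decimal digits.
big = bfloat16_bits_to_float(float32_to_bfloat16_bits(1.0e38))
print(abs(big - 1.0e38) / 1.0e38 < 0.005)  # True
```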

### Memory and interconnects

TPU chips from v2 onward use [High Bandwidth Memory](/wiki/high_bandwidth_memory) (HBM) as their primary data store; TPU v1 used conventional DDR3. Memory capacity and bandwidth have increased substantially across generations, from 8 GB of DDR3 in TPU v1 to 192 GB of HBM3E per chip in TPU v7 (Ironwood).

TPU chips within a pod or slice communicate through high-speed Inter-Chip Interconnects (ICI). The network topology varies by generation:

- **2D torus:** TPU v2, v3, v5e, and v6e use a 2D torus topology, where each chip connects to its four nearest neighbors (north, south, east, west).
- **3D torus:** TPU v4 and v5p use a 3D torus topology, where each chip connects to six neighbors. This provides higher bisection bandwidth for large-scale distributed training. TPU v5p pods use a 16 x 20 x 28 topology connecting 8,960 chips.
- **TPU v7 (Ironwood):** Uses ICI operating at 9.6 Tb/s per chip, enabling 9,216-chip superpods.
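The neighbor counts quoted for the torus topologies follow from wraparound links on each axis. A minimal sketch, assuming chips are addressed by integer coordinates (the function name `torus_neighbors` is hypothetical, and the 16 x 16 shape for a 256-chip 2D pod is an assumption used for illustration):

```python
from math import prod


def torus_neighbors(coord, dims):
    """Return the chips directly linked to `coord` in a torus of shape `dims`.

    Each axis contributes a +1 and a -1 neighbor, with wraparound at the edges.
    Assumes every axis has at least 3 chips so the two neighbors are distinct.
    """
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wraparound link
            neighbors.add(tuple(n))
    return sorted(neighbors)


print(len(torus_neighbors((0, 0), (16, 16))))         # 4 neighbors in 2D
print(len(torus_neighbors((0, 0, 0), (16, 20, 28))))  # 6 neighbors in 3D
print(prod((16, 20, 28)))                             # 8960 chips in a v5p pod
```

Because of the wraparound, a chip on the "edge" such as (0, 0) still has a full set of neighbors, including (15, 0) and (0, 15) in the 2D case.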

## What are the TPU generations and their specs?

The following table summarizes the specifications of each TPU generation:

| Generation | Year | Process | Peak performance (per chip) | Memory capacity (per chip) | Memory bandwidth | Max pod size | Topology | Key feature |
|---|---|---|---|---|---|---|---|---|
| TPU v1 | 2015 | 28 nm | 92 TOPS (INT8) | 8 GB DDR3 | 34 GB/s | N/A (single board) | N/A | Inference only; 256x256 systolic array |
| TPU v2 | 2017 | 16 nm | 45 TFLOPS (bf16) | 16 GB HBM | 600 GB/s | 256 chips (11.5 PFLOPS) | 2D torus | First TPU for training; introduced bfloat16 |
| TPU v3 | 2018 | 16 nm | 123 TFLOPS (bf16) | 32 GB HBM | 900 GB/s | 1,024 chips | 2D torus | Liquid cooling; 2.7x perf over v2 |
| TPU v4 | 2021 | 7 nm | 275 TFLOPS (bf16) | 32 GB HBM | 1,200 GB/s | 4,096 chips | 3D torus | SparseCores; optical reconfigurable interconnect |
| TPU v5e | 2023 | N/A | 197 TFLOPS (bf16) | 16 GB HBM | 819 GB/s | 256 chips | 2D torus | Cost-efficient; training and inference |
| TPU v5p | 2023 | N/A | 459 TFLOPS (bf16) | 95 GB HBM | 2,765 GB/s | 8,960 chips | 3D torus | 2nd-gen SparseCores; competitive with H100 |
| TPU v6e (Trillium) | 2024 | N/A | 918 TFLOPS (bf16) | 32 GB HBM | 1,640 GB/s | 256 chips | 2D torus | 4.7x perf over v5e; 3rd-gen SparseCores |
| TPU v7 (Ironwood) | 2025 | N/A | 4,614 TFLOPS (FP8) | 192 GB HBM3E | 7,370 GB/s | 9,216 chips | 3D (ICI 9.6 Tb/s) | Inference-optimized; 2x perf/watt over v6e |

### TPU v1

The first-generation TPU was designed exclusively for neural network inference. It featured a single 256 x 256 systolic array of 8-bit integer ALUs, 28 MiB of on-chip SRAM, and 8 GB of DDR3 memory. Operating at 700 MHz on a 28 nm process, it consumed only 28 to 40 watts while delivering 92 TOPS. TPU v1 was deployed as a coprocessor on the PCIe bus and was never offered as a standalone cloud product. It powered latency-sensitive Google services including Search ranking, Google Translate, Google Photos, and the inference engine for [AlphaGo](/wiki/alphago) [1].
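The 92 TOPS figure can be reproduced from the array size and clock rate given above, under the usual convention that one multiply-accumulate counts as two operations (a multiply and an add). The function name is hypothetical.

```python
def peak_ops_per_second(rows, cols, clock_hz, ops_per_mac=2):
    """Peak throughput if every MAC unit is busy on every clock cycle."""
    return rows * cols * ops_per_mac * clock_hz


macs = 256 * 256
v1_peak = peak_ops_per_second(256, 256, 700e6)
print(macs)                          # 65536
print(f"{v1_peak / 1e12:.1f} TOPS")  # 91.8 TOPS, quoted as 92 TOPS
```

This is a theoretical ceiling; sustained throughput depends on how well a workload keeps the array fed.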

### TPU v2

Announced at Google I/O in May 2017, TPU v2 was the first generation to support both training and inference. Each chip contained two TensorCores with 128 x 128 MXUs, 16 GB of HBM, and 600 GB/s memory bandwidth. TPU v2 introduced the bfloat16 number format and delivered 45 TFLOPS per chip. Pods of up to 256 chips provided 11.5 petaFLOPS of aggregate compute. TPU v2 was the first generation made available to external users through Google Cloud and the TensorFlow Research Cloud (TFRC) program [3].

### TPU v3

Announced at Google I/O 2018, TPU v3 doubled the HBM capacity to 32 GB per chip and increased memory bandwidth to 900 GB/s. The clock speed rose from 700 MHz to 940 MHz, and peak performance reached 123 TFLOPS per chip. Pods scaled up to 1,024 chips, providing over 100 petaFLOPS of aggregate compute. TPU v3 was the first generation to require liquid cooling due to its higher power density [4].

### TPU v4

Announced at Google I/O 2021 and made generally available in 2022, TPU v4 represented a major architectural leap. Built on a 7 nm process with a die size under 400 mm², it delivered 275 TFLOPS per chip. Each chip contained two TensorCores (four 128 x 128 MXUs each), four SparseCores, and 32 GB of HBM with 1,200 GB/s bandwidth.

TPU v4 introduced a 3D torus interconnect topology built on optical circuit switches (OCS), which allow the network topology to be reconfigured dynamically to match workload requirements. A full v4 pod contained 4,096 chips with 10x the interconnect bandwidth per chip compared to previous generations. Google described the TPU v4 pod as an "optically reconfigurable supercomputer" in a 2023 paper [5].

### TPU v5e

Released in August 2023, TPU v5e was designed as a cost-efficient accelerator for both training and inference. It delivers 197 TFLOPS in bfloat16 and 393 TOPS in INT8, with 16 GB of HBM per chip. Pods support up to 256 chips in a 2D torus topology. Google positioned v5e as delivering the best price-performance ratio for mid-scale workloads, including [large language model](/wiki/large_language_model) fine-tuning and serving [6].

### TPU v5p

Announced in December 2023 alongside the [Gemini](/wiki/gemini) model, TPU v5p is Google's most powerful training-focused TPU prior to Trillium. Each chip delivers 459 TFLOPS in bfloat16 and 918 TOPS in INT8, with 95 GB of HBM and 2,765 GB/s bandwidth. A full v5p pod connects 8,960 chips in a 16 x 20 x 28 3D torus topology with 4,800 Gbps of ICI bandwidth per chip. TPU v5p features second-generation SparseCores that can train embedding-dense models 1.9x faster than TPU v4. Google stated that TPU v5p is competitive with the [NVIDIA](/wiki/nvidia) H100 for large model training [7].

### TPU v6e (Trillium)

Announced at Google I/O in May 2024 and made generally available in December 2024, Trillium is Google's sixth-generation TPU. It achieves roughly 918 TFLOPS in bfloat16 per chip (approximately 4.7x the performance of TPU v5e) through larger 256 x 256 MXUs and a higher clock speed. HBM capacity doubled to 32 GB with doubled bandwidth (1,640 GB/s), and ICI bandwidth also doubled compared to v5e. Trillium includes third-generation SparseCores and is over 67% more energy efficient than TPU v5e.

Trillium pods scale up to 256 chips, and with Multislice technology and Titanium IPUs (Intelligence Processing Units), multiple pods can be connected into building-scale supercomputers with tens of thousands of chips. Google reported a 2.1x improvement in performance per dollar over v5e and 2.5x over v5p for dense LLM training on models such as [Llama](/wiki/llama) 2-70B and Llama 3.1-405B [8].

### TPU v7 (Ironwood)

Ironwood is Google's seventh-generation TPU and the first that Google describes as designed specifically for inference at scale (TPU v1 was also inference-only, but as a single-board coprocessor rather than a pod-scale system). It was unveiled at Google Cloud Next on April 9, 2025 and reached general availability on November 6, 2025 [9][16]. Each chip delivers 4,614 TFLOPS of peak FP8 performance, roughly 10x the 459 bfloat16 TFLOPS of a TPU v5p chip. Memory capacity jumps to 192 GB of HBM3E per chip with 7.37 TB/s bandwidth, six times the memory of Trillium.

Ironwood chips communicate via ICI at 9.6 Tb/s per chip. A full Ironwood superpod consists of 9,216 chips with access to 1.77 petabytes of aggregate HBM, delivering 42.5 FP8 exaflops per pod. Performance per watt is 2x that of Trillium, and Google states Ironwood is nearly 30x more power efficient than its first Cloud TPU from 2018. Each chip contains two TensorCores and four SparseCores [9][16]. Google describes Ironwood as "purpose-built to power thinking, inferential AI models at scale" and positions it for what it calls the "age of inference" [9].
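The pod-level figures in this article are the per-chip figures multiplied by the chip count, which a few lines of arithmetic can confirm. The sketch below uses only the numbers from the specification table; the dictionary and function names are hypothetical. The precisions differ between generations (bfloat16 for v2 and v3, FP8 for Ironwood), so the totals are not directly comparable across rows.

```python
# (chips per pod, peak TFLOPS per chip), taken from the specification table.
PODS = {
    "TPU v2": (256, 45.0),
    "TPU v3": (1024, 123.0),
    "TPU v7 (Ironwood)": (9216, 4614.0),
}


def pod_peak_pflops(chips, tflops_per_chip):
    """Aggregate peak compute of a full pod, in petaFLOPS."""
    return chips * tflops_per_chip / 1000.0


for name, (chips, tflops) in PODS.items():
    print(f"{name}: {pod_peak_pflops(chips, tflops):,.1f} PFLOPS")
# TPU v2: 11.5 PFLOPS
# TPU v3: 126.0 PFLOPS
# TPU v7 (Ironwood): 42,522.6 PFLOPS (about 42.5 exaflops)

ironwood_hbm_pb = 9216 * 192 / 1e6  # GB to PB, decimal units
print(f"{ironwood_hbm_pb:.2f} PB")  # 1.77 PB
```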

## How are TPUs organized into pods, slices, and Multislice?

TPU hardware is organized into a hierarchy of groupings:

- **Chip:** A single TPU die containing TensorCores, SparseCores, HBM, and ICI interfaces.
- **Host:** A physical machine (TPU VM) connected to one or more TPU chips.
- **Slice:** A contiguous group of TPU chips within a pod connected via ICI. Users can provision slices of various sizes depending on their workload requirements.
- **Pod:** The largest contiguous grouping of TPU chips connected by ICI within a single physical installation.

Cloud TPU Multislice is a scaling technology that allows a single training job to span multiple TPU slices, even across different pods. Slices within a Multislice configuration communicate through data center networking (DCN), which has higher latency and lower bandwidth than ICI. Multislice supports data parallelism, Fully Sharded Data Parallelism (FSDP), model parallelism, and pipeline parallelism. Google demonstrated this capability by running the world's largest distributed LLM training job across 50,944 TPU v5e chips [10].

## Software ecosystem and framework support

Cloud TPUs support three major machine learning frameworks:

| Framework | Integration method | Notes |
|---|---|---|
| [JAX](/wiki/jax) | Native via XLA | Primary framework for TPU development; developed by Google; compiles Python and NumPy-like code to XLA |
| [TensorFlow](/wiki/tensorflow) | Native via XLA | Supported from TPU v2 onward; TPU v5e, v5p, and v6e support TensorFlow 2.15.0 and later via PJRT |
| [PyTorch](/wiki/pytorch) | Via PyTorch/XLA | Open-source library maintained by Google and the PyTorch community; uses XLA as the compiler backend |

### JAX

JAX is a numerical computing library developed by Google that combines NumPy-like syntax with automatic differentiation and XLA (Accelerated Linear Algebra) compilation. JAX is the primary framework for TPU development at Google and is used for training large-scale models including Gemini. JAX's functional programming model maps naturally to TPU hardware, and its `pjit` and `shard_map` APIs provide fine-grained control over how computations and data are distributed across TPU chips [11].

### TensorFlow

TensorFlow was the original framework supported on Cloud TPUs. The TPU execution model in TensorFlow uses XLA compilation to translate TensorFlow graphs into optimized TPU machine code. Starting with TensorFlow 2.15.0, the PJRT (Portable JAX Runtime) interface provides automatic device memory defragmentation and a simpler hardware integration path.

### PyTorch/XLA

PyTorch/XLA is an open-source library that enables PyTorch models to run on TPUs by converting PyTorch operations into XLA HLO (High Level Operations) graphs. The torchax library from Google further bridges PyTorch and JAX by wrapping JAX arrays as PyTorch tensor subclasses, enabling seamless interoperability. More recently, vLLM TPU (powered by tpu-inference) has unified JAX and PyTorch under a single lowering path for high-throughput LLM inference on TPUs.

## How much do Cloud TPUs cost, and how do you access them?

Google Cloud offers several ways to provision and use TPUs:

- **On-demand:** Pay per chip-hour with no commitment. Offers maximum flexibility but highest per-unit cost.
- **Committed use discounts (CUDs):** 1-year or 3-year commitments that reduce per-chip-hour pricing significantly.
- **Preemptible/Spot TPUs:** Reduced-price TPU instances that can be reclaimed by Google with short notice. Suitable for fault-tolerant workloads; can reduce costs by up to 70%.
- **TPU Research Cloud (TRC):** A program providing free Cloud TPU access to academic researchers.

Approximate pricing (as of 2025) varies by generation and commitment level:

| TPU type | On-demand (per chip-hour) | 1-year CUD | 3-year CUD |
|---|---|---|---|
| TPU v5e | ~$1.20 | Discounted | Discounted |
| TPU v5p | ~$4.20 | Discounted | Discounted |
| TPU v6e (Trillium) | ~$2.70 | ~$1.89 | ~$1.22 |
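As a worked example of what the table implies, the sketch below derives the effective discounts for TPU v6e and the cost of a hypothetical job. The prices are the approximate figures from the table above; the job size (256 chips for 24 hours) and the function name are assumptions made for illustration, and actual pricing varies by region and over time.

```python
# Approximate TPU v6e prices per chip-hour, from the table above.
V6E_PRICES = {"on_demand": 2.70, "cud_1yr": 1.89, "cud_3yr": 1.22}


def job_cost(chips, hours, price_per_chip_hour):
    """Total cost of holding `chips` chips for `hours` hours."""
    return chips * hours * price_per_chip_hour


for plan in ("cud_1yr", "cud_3yr"):
    discount = 1 - V6E_PRICES[plan] / V6E_PRICES["on_demand"]
    print(f"{plan}: {discount:.0%} below on-demand")
# cud_1yr: 30% below on-demand
# cud_3yr: 55% below on-demand

# Hypothetical job: a full 256-chip v6e pod for 24 hours.
print(f"${job_cost(256, 24, V6E_PRICES['on_demand']):,.2f}")  # $16,588.80
print(f"${job_cost(256, 24, V6E_PRICES['cud_3yr']):,.2f}")    # $7,495.68
```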

TPU resources can be provisioned through the Google Cloud console, the `gcloud` CLI, or programmatically through Google Kubernetes Engine (GKE). GKE is the recommended orchestration layer for production TPU workloads, providing features such as job queueing with Kueue and Multislice job abstraction through the JobSet API.

## How do Cloud TPUs compare with GPUs?

TPUs and GPUs differ in their design philosophy and target workloads. The following table highlights the main differences:

| Aspect | Cloud TPU | GPU (e.g., NVIDIA H100/A100) |
|---|---|---|
| Design approach | Purpose-built ASIC for ML | General-purpose parallel processor |
| Precision formats | bfloat16, INT8, FP8 (v7), FP32 accum. | FP16, bfloat16, FP8, TF32, FP32, INT8 |
| Primary compute unit | Systolic array (MXU) | CUDA cores, Tensor Cores |
| Memory type | HBM (integrated) | HBM (integrated) |
| Interconnect | ICI (custom, in-pod) | NVLink, NVSwitch, InfiniBand |
| Software ecosystem | JAX, TensorFlow, PyTorch/XLA | CUDA, cuDNN, all major frameworks |
| Vendor lock-in | Google Cloud only | Multi-cloud, on-premises |
| Strengths | Large-batch training, LLM inference, cost per FLOP | Flexibility, broad framework support, general-purpose compute |

TPUs tend to offer better performance per dollar for large-scale, batch-oriented ML workloads, particularly for models that map well to matrix-heavy computation. A third-party analysis published by Introl reports that TPU v6e provides up to 4x better performance per dollar compared to the NVIDIA H100 for LLM training and large-batch inference. However, GPUs offer broader software compatibility, support from multiple cloud providers, and the ability to handle diverse workloads beyond ML, including graphics rendering, simulation, and scientific computing [12].

The choice between TPUs and GPUs often depends on the specific workload, scale, framework preference, and whether vendor portability is a priority.

## Who uses Cloud TPUs?

TPUs have powered many of Google's most notable AI systems and attracted major external customers:

- **[AlphaGo](/wiki/alphago) and [AlphaFold](/wiki/alphafold):** DeepMind used TPUs to train AlphaGo (which defeated world champion Go player Lee Sedol in 2016) and AlphaFold 2 (which solved the protein structure prediction problem, contributing to a Nobel Prize in Chemistry in 2024) [13].
- **Gemini:** Google's multimodal AI model family. All phases of Gemini 3 training ran on TPU v5e and v6e pods [14].
- **Google Search and Translate:** TPU v1 was originally deployed to accelerate inference for Google Search ranking and the neural machine translation system behind Google Translate.
- **[Anthropic](/wiki/anthropic):** On October 23, 2025, Anthropic announced an expansion securing access to approximately one million TPUs, a deal worth tens of billions of dollars that brings well over a gigawatt of capacity online in 2026, making Anthropic the largest external TPU customer [15]. Anthropic said the expansion reflects "the strong price-performance and efficiency" it has seen with TPUs over several years [15]. In April 2026, Anthropic and Google further expanded the agreement toward multiple gigawatts of next-generation TPU capacity expected to come online starting in 2027 [17].
- **Apple:** In January 2026 Apple agreed to pay Google roughly $1 billion per year to license a custom 1.2 trillion parameter [Gemini](/wiki/gemini) model to power a revamped Siri, with Google Cloud TPU infrastructure used to serve parts of the workload before inference shifts to Apple's Private Cloud Compute systems [18].
- **Midjourney:** The AI image generation company reportedly reduced infrastructure costs by 65% by migrating workloads to TPUs.
- **Scientific research:** Climate modeling simulations, drug discovery pipelines, and genomics workflows have been run on Cloud TPU infrastructure through Google's research partnerships.

## Limitations

Despite their strong performance for ML workloads, Cloud TPUs have several limitations:

- **Vendor lock-in:** TPUs are available exclusively through Google Cloud. Workloads cannot be moved to AWS, Azure, or on-premises hardware without porting to GPU-compatible code.
- **Framework constraints:** While JAX, TensorFlow, and PyTorch are all supported, JAX remains the best-supported framework. PyTorch support via PyTorch/XLA has improved but still lags behind native CUDA support on GPUs for some operations.
- **Workload suitability:** TPUs excel at regular, dense matrix computation but may underperform on workloads with irregular memory access patterns, heavy branching, or operations not well-suited to systolic arrays.
- **Availability:** Demand for TPU capacity often exceeds supply, and quota allocation can be a bottleneck for new users.
- **Debugging complexity:** Debugging performance issues on TPUs requires familiarity with XLA compilation, HLO graph analysis, and TPU-specific profiling tools, which have a steeper learning curve than CUDA-based GPU profiling.

## References

1. Jouppi, N.P., et al. "In-Datacenter Performance Analysis of a Tensor Processing Unit." Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), 2017. https://arxiv.org/abs/1704.04760
2. Google Cloud. "BFloat16: The secret to high performance on Cloud TPUs." https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
3. Google Cloud. "Tensor Processing Units (TPUs)." https://cloud.google.com/tpu
4. Google Cloud. "TPU v3 documentation." https://cloud.google.com/tpu/docs
5. Jouppi, N.P., et al. "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings." arXiv:2304.01433, 2023. https://arxiv.org/abs/2304.01433
6. Google Cloud. "TPU v5e documentation." https://docs.cloud.google.com/tpu/docs/v5e
7. Google Cloud Blog. "Introducing Cloud TPU v5p and AI Hypercomputer." https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-tpu-v5p-and-ai-hypercomputer
8. Google Cloud Blog. "Introducing Trillium, sixth-generation TPUs." https://cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus
9. Google Blog. "Ironwood: The first Google TPU for the age of inference." https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/ironwood-tpu-age-of-inference/
10. Google Cloud Blog. "The world's largest distributed LLM training job on TPU v5e." https://cloud.google.com/blog/products/compute/the-worlds-largest-distributed-llm-training-job-on-tpu-v5e
11. JAX documentation. https://jax.readthedocs.io/
12. Introl. "TPU v6e vs GPU: 4x Better AI Performance Per Dollar." https://introl.com/blog/google-tpu-v6e-vs-gpu-4x-better-ai-performance-per-dollar-guide
13. DeepMind. "AlphaFold: a solution to a 50-year-old grand challenge in biology." https://deepmind.google/research/breakthroughs/alphafold/
14. Google. "Gemini." https://deepmind.google/technologies/gemini/
15. Anthropic. "Expanding our use of Google Cloud TPUs and Services." https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services
16. Google Cloud Blog. "Ironwood TPUs and new Axion-based VMs for your AI workloads." https://cloud.google.com/blog/products/compute/ironwood-tpus-and-new-axion-based-vms-for-your-ai-workloads
17. Anthropic. "Expanding our partnership with Google." https://www.anthropic.com/news/google-broadcom-partnership-compute
18. DIGITIMES. Press reporting on Apple's January 2026 agreement to license a custom Gemini model for Siri. https://www.digitimes.com/news/a20260116PD236/apple-google-gemini-siri-supply-chain.html