# TPU v5p

> Source: https://aiwiki.ai/wiki/google_tpu_v5p
> Updated: 2026-07-07
> Categories: AI Hardware, Google
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**TPU v5p** (Cloud TPU v5p) is Google's fifth-generation, performance-tier [Tensor Processing Unit](/wiki/tensor_processing_unit_tpu), a custom AI accelerator that delivers 459 bfloat16 TFLOPS per chip and scales to pods of 8,960 chips (about 4.1 exaFLOPS of peak BF16 compute by straight multiplication; some sources quote roughly 4.45 exaFLOPS). Announced on December 6, 2023 alongside the [Gemini](/wiki/gemini) family of foundation models and offered through [Google Cloud](/wiki/google_cloud_terms), the v5p was positioned as Google's "most powerful, scalable, and flexible AI accelerator to date," targeting the training of frontier large language models, [mixture-of-experts](/wiki/mixture_of_experts) systems, and dense embedding workloads.[^1][^2] The "p" suffix denotes the *performance* variant of the fifth generation, complementing the earlier *efficiency-tier* [TPU v5e](/wiki/cloud_tpu) introduced in August 2023.

Each TPU v5p chip delivers a peak throughput of 459 TFLOPS in [bfloat16](/wiki/bfloat16) and 918 TOPS at INT8 precision (sometimes reported as TFLOPS), paired with 95 GB of HBM2e [high-bandwidth memory](/wiki/high_bandwidth_memory) at roughly 2.76 TB/s.[^1][^3][^4][^20] A full v5p "pod" connects 8,960 chips through a reconfigurable 3D-torus inter-chip interconnect at 4,800 Gbps per chip. Google's launch materials describe this as more than double the FLOPS per chip, triple the HBM capacity, and roughly four times the total FLOPS per pod of its predecessor, the [TPU v4](/wiki/tpu), although the published BF16 peaks (459 vs. 275 TFLOPS) work out to about 1.67× per chip.[^1][^2] Google has stated that the chip can train large LLMs up to 2.8 times faster than v4 and embedding-dense models up to 1.9 times faster owing to its second-generation SparseCores.[^1][^2]

Cloud TPU v5p entered limited "early access" in December 2023 and reached general availability at Google Cloud Next on April 9, 2024.[^5][^6] The chip became the workhorse for training Google's [Gemini](/wiki/gemini) models and was prominently adopted by external customers including [Anthropic](/wiki/anthropic), [Apple](/wiki/apple_inc) (for the on-device tier of [Apple Foundation Models](/wiki/apple_foundation_models)), Salesforce, Lightricks, and [Hugging Face](/wiki/hugging_face).[^7][^8][^9] In May 2024, Google previewed its successor, the sixth-generation TPU code-named Trillium ([TPU v6e](/wiki/google_trillium)), and in April 2025 unveiled the seventh generation, [Ironwood](/wiki/tpu_ironwood) (generally available from November 2025), leaving v5p as a transitional but highly influential platform in the modern AI-accelerator landscape.[^10][^11]

## What are the TPU v5p's key specifications?

The following table summarizes the TPU v5p's verified specifications as documented by Google Cloud and its launch materials.[^1][^3][^4]

| Specification | TPU v5p |
| --- | --- |
| Announced | December 6, 2023 |
| General availability | April 9, 2024 (Google Cloud Next) |
| Peak compute per chip (BF16) | 459 TFLOPS |
| Peak compute per chip (INT8) | 918 TOPS |
| HBM capacity per chip | 95 GB HBM2e[^20] |
| HBM bandwidth per chip | 2,765 GB/s (2.76 TB/s) |
| ICI bandwidth per chip | 4,800 Gbps (1,200 GB/s bidirectional) |
| Chips per pod | 8,960 |
| Peak BF16 compute per pod | approximately 4.1 exaFLOPS (8,960 × 459 TFLOPS; some sources quote 4.45) |
| Max single-slice topology | 6,144 chips (16×16×24) |
| Max Multislice job | 18,432 chips |
| Host VM | 4 chips (ct5p-hightpu-4t), 208 vCPUs, 448 GB RAM |
| Matrix unit | 128×128 systolic-array MXU (8 per chip across 2 TensorCores) |
| SparseCores | 4 (second generation) |
| On-demand list price | $4.20 per chip-hour[^28] |

## Background

### TPU lineage

The TPU program was started inside [Google](/wiki/google) around 2013 to address the explosive compute demand of [deep-learning](/wiki/deep_learning) workloads such as [Google Brain](/wiki/google_brain)-derived voice and translation systems. The first-generation TPU, deployed in 2015 and publicly disclosed in 2016, was an inference-only ASIC; subsequent generations added training capability and high-bandwidth memory.[^12] The architecture is anchored by a [systolic-array](/wiki/systolic_array) matrix multiply unit (MXU), sized at 128×128 multiply-accumulators in the v2 through v5 generations, surrounded by vector and scalar units within "TensorCores."[^4][^12]

By the mid-2020s the lineage looked roughly as follows:

| Generation | Year | Role |
| --- | --- | --- |
| TPU v1 | 2015-16 | Inference (8-bit) |
| TPU v2 | 2017 | First-gen training |
| TPU v3 | 2018 | Liquid-cooled training |
| TPU v4 | 2021 (public 2022) | First optical-switched pods, 4,096 chips |
| TPU v5e | Aug 2023 | Cost-efficient inference/training |
| TPU v5p | Dec 2023 | Performance-tier training |
| Trillium (TPU v6e) | 2024 | 4.7× per-chip compute over v5e |
| Ironwood (TPU v7) | 2025 | Inference-era flagship[^11][^12] |

TPU v4 introduced Google's now-signature use of [optical circuit switches](https://www.servethehome.com/google-details-tpuv4-and-its-crazy-optically-reconfigurable-ai-network/) (OCS) to dynamically reconfigure 3D-torus inter-chip topologies, an innovation that v5p inherited and scaled.[^13][^14]

### Two flavors of v5

Google's fifth generation diverged into two SKUs serving different market segments:

* **TPU v5e**: announced in August 2023 and generally available later that year, optimized for inference and cost-sensitive training. Each chip pairs ~197 BF16 TFLOPS (393 INT8 TOPS) with 16 GB of HBM and pods of up to 256 chips.[^4][^15]
* **TPU v5p**: announced December 6, 2023, the performance variant for large-scale frontier training. Each chip more than doubles v5e's compute, carries roughly six times its HBM capacity, and scales to pods of 8,960 chips.[^1][^4]

This "two-tier" strategy, established with v5, has continued in subsequent generations and is reflected in the naming conventions (e.g., Trillium is the v6e efficiency variant; no separate performance-tier v6p sibling was marketed before Ironwood).[^11]

### Announcement context

The v5p was unveiled in a joint announcement on December 6, 2023, alongside the launch of [Gemini](/wiki/gemini) 1.0 (Ultra, Pro, and Nano) and the broader **AI Hypercomputer** architecture from Google Cloud, a phrase Google used to describe its integrated stack of accelerators, networking, software, and consumption models.[^1][^2][^16] In the post announcing the chip, Amin Vahdat, VP/GM of Machine Learning, Systems, and Cloud AI at Google, wrote that v5p "is our most powerful, scalable, and flexible AI accelerator to date" and that "Gemini, Google's most capable and general AI model, was trained on, and is served, using TPUs."[^1] [Sundar Pichai](/wiki/sundar_pichai) framed the v5p as the hardware foundation for Google DeepMind's Gemini ambitions and for Google Cloud customers' "next wave" of model training.[^16]

## How is the TPU v5p architected?

### Chip overview

A TPU v5p chip is a custom Google-designed ASIC built around two **TensorCores**, each of which contains four 128×128 **Matrix Multiply Units (MXUs)** implemented as [systolic arrays](/wiki/systolic_array), a vector unit, and a scalar unit.[^4][^17] Across the two TensorCores, the chip therefore carries eight MXUs in total, twice the per-chip MXU count of TPU v5e, which uses a single TensorCore. The v5p preserves the 128×128 MXU dimension established with v2 (Trillium would later expand the MXU to 256×256).[^17]

The reported peak per-chip throughput is **459 TFLOPS in bfloat16** and **918 TFLOPS/TOPS at INT8**.[^1][^3][^4] The clock speed has been documented at approximately 1,750 MHz.[^4] Google has not publicly disclosed the process node, transistor count, or die area for the v5p chip; Wikipedia's TPU table lists these fields as undisclosed.[^4]
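The 459 TFLOPS figure can be reproduced from the MXU geometry and clock quoted above. The following is a minimal sketch, assuming each multiply-accumulate cell retires one MAC (counted as 2 FLOPs) per cycle; that per-cycle rate and the function name `peak_tflops` are illustrative assumptions, not details Google has published.

```python
# Sketch: derive peak BF16 throughput from MXU count, MXU size, and clock.
# Assumption: every cell of every systolic array completes one
# multiply-accumulate (2 FLOPs) per clock cycle.

def peak_tflops(tensor_cores: int, mxus_per_core: int,
                mxu_dim: int, clock_mhz: float) -> float:
    """Return theoretical peak throughput in TFLOPS."""
    macs_per_cycle = tensor_cores * mxus_per_core * mxu_dim * mxu_dim
    flops_per_second = macs_per_cycle * 2 * clock_mhz * 1e6
    return flops_per_second / 1e12

# TPU v5p: 2 TensorCores x 4 MXUs, 128x128 each, ~1,750 MHz.
v5p_bf16 = peak_tflops(tensor_cores=2, mxus_per_core=4,
                       mxu_dim=128, clock_mhz=1750)
print(f"{v5p_bf16:.2f} TFLOPS")  # 458.75 TFLOPS
```

Under these assumptions the result rounds to the published 459 TFLOPS, which suggests the headline number is a straightforward geometry-times-clock peak rather than a measured rate.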

### SparseCores (second generation)

Each v5p chip integrates **four second-generation SparseCores**, small dataflow processors specialized for embedding and sparse-tensor operations, particularly important for recommendation systems and embedding-dense workloads.[^17] Google has described SparseCores as occupying roughly 5% of chip die area and power while delivering 5-7× speedups on the operations they target versus equivalent code on the MXU or vector unit.[^17] The second-generation SparseCores in v5p underpin Google's claim that v5p trains embedding-dense models 1.9× faster than v4.[^1][^2]

### Host VM and node

A v5p TPU node exposes four chips per host (machine type `ct5p-hightpu-4t`), pairing them with 208 vCPUs (104 usable per NUMA node), 448 GB of host RAM, two NUMA nodes, and a 200 Gbps host NIC for data-center networking.[^3] Each chip is connected to the host through PCIe and to its neighbors over the dedicated inter-chip interconnect, with 50 Gbps per chip reserved for data-center-network (DCN) traffic that crosses pod boundaries.[^3]
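As a rough illustration of how the per-host figures scale with slice size, the sketch below totals host resources for a given chip count. The helper name `host_resources` and the choice of a 512-chip example are assumptions made for illustration; the per-host constants come from the paragraph above.

```python
# Sketch: aggregate host-side resources for a v5p slice.
# Constants are the per-host figures for ct5p-hightpu-4t quoted in the text.
CHIPS_PER_HOST = 4
VCPUS_PER_HOST = 208
HOST_RAM_GB = 448
HOST_NIC_GBPS = 200

def host_resources(num_chips: int) -> dict:
    """Return host count and aggregate CPU/RAM for a slice of num_chips."""
    if num_chips <= 0 or num_chips % CHIPS_PER_HOST:
        raise ValueError("chip count must be a positive multiple of 4")
    hosts = num_chips // CHIPS_PER_HOST
    return {
        "hosts": hosts,
        "vcpus": hosts * VCPUS_PER_HOST,
        "host_ram_gb": hosts * HOST_RAM_GB,
        # The host NIC is shared by the four chips attached to that host.
        "dcn_gbps_per_chip": HOST_NIC_GBPS / CHIPS_PER_HOST,
    }

summary = host_resources(512)  # hypothetical 8x8x8 slice
print(summary)
```

For a 512-chip slice this gives 128 hosts, and the 200 Gbps NIC split four ways matches the 50 Gbps per chip of DCN bandwidth cited above.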

## How much memory and interconnect bandwidth does the TPU v5p have?

### HBM2e

Each v5p chip carries **95 GB of HBM2e** memory at approximately **2,765 GB/s** of bandwidth (2.76 TB/s).[^3][^4][^20] This is three times the HBM capacity of the 32 GB present on TPU v4 chips and represents a nearly 2.3× increase in HBM bandwidth.[^1][^4] In real-world workloads, Lightricks reported that the "ample memory capacity" of v5p allowed it to fit an entire generative text-to-video model on a single chip without splitting across processes, materially accelerating each training cycle.[^1]
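Two quick calculations help put the memory figures in context: the roofline "ridge point" (FLOPs per byte at which a kernel stops being memory-bound) and a first-order check of whether a model's weights fit in HBM. This is a simplified sketch; it treats 1 GB as 10^9 bytes and ignores activations, optimizer state, and fragmentation, all of which are assumptions rather than documented behavior.

```python
# Sketch: roofline ridge point and a naive weights-fit check for one v5p chip.
PEAK_BF16_FLOPS = 459e12      # FLOP/s per chip
HBM_BANDWIDTH = 2765e9        # bytes/s per chip
HBM_CAPACITY = 95e9           # bytes per chip (assuming decimal gigabytes)

# Arithmetic intensity above which a kernel is compute-bound at peak rates.
ridge_flops_per_byte = PEAK_BF16_FLOPS / HBM_BANDWIDTH

def fits_in_hbm(num_params: float, bytes_per_param: int = 2) -> bool:
    """True if the raw weights alone fit in one chip's HBM (bf16 by default)."""
    return num_params * bytes_per_param <= HBM_CAPACITY

print(f"ridge point ~ {ridge_flops_per_byte:.0f} FLOPs/byte")
print(fits_in_hbm(13e9))   # hypothetical 13B-parameter model
print(fits_in_hbm(70e9))   # hypothetical 70B-parameter model
```

Kernels with fewer than roughly 166 FLOPs per byte moved would be limited by HBM bandwidth rather than by the MXUs under this model.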

### ICI bandwidth and topology

Inter-chip interconnect (ICI) is Google's high-bandwidth, low-latency proprietary fabric that wires TPU chips into 3D-torus configurations. On v5p, the ICI delivers **4,800 Gbps per chip** (cited as 1,200 GBps bidirectional in Google's documentation), Google's highest per-chip ICI bandwidth in the v4/v5 era.[^1][^3] Each chip is connected to its six nearest neighbors, forming a 3D torus.[^3][^4]
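The six-neighbor rule and the bandwidth units can be made concrete with a short sketch. The function names are illustrative, and reading the 4,800 Gbps figure as a per-direction rate (so that doubling it gives the 1,200 GBps bidirectional number) is an assumption that merely reconciles the two quoted values.

```python
# Sketch: neighbors of a chip in a 3D torus, plus a Gbps -> GB/s conversion.
from typing import List, Tuple

Coord = Tuple[int, int, int]

def torus_neighbors(coord: Coord, shape: Coord) -> List[Coord]:
    """Return the -1/+1 neighbor along each axis, wrapping at the edges."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % shape[axis]
            neighbors.append(tuple(nxt))
    return neighbors

def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8

corner = torus_neighbors((0, 0, 0), (4, 4, 4))
per_direction = gbps_to_gbytes_per_s(4800)   # 600.0 GB/s
bidirectional = 2 * per_direction            # 1200.0 GB/s (assumed reading)
print(corner)
```

Even a corner chip of a 4×4×4 cube has six distinct neighbors because each axis wraps around; along an axis of length 2 the two directions would reach the same chip.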

### Optical circuit switching

Following the precedent set by TPU v4, the v5p pod uses **Optical Circuit Switches (OCS)** to dynamically wire chips into reconfigurable 3D-torus topologies. A v5p superpod is reported to require 48 OCS units controlling roughly 13,824 optical ports to interconnect 8,960 chips in a 16×20×28 3D-torus configuration, with sub-10-ns switching times and ICI-resiliency features that route around faulty optical links or switches.[^14][^18][^19]

## How large is a TPU v5p pod?

A full TPU v5p **pod** comprises **8,960 chips**, making it 2.19× the size of a 4,096-chip TPU v4 pod and providing roughly **4×** the per-pod aggregate FLOPS of v4 once the higher per-chip throughput is included (about 3.65× on published BF16 peaks).[^1][^2] At pod scale, straight multiplication of the per-chip peak gives about **4.1 exaFLOPS of BF16 compute**; some sources quote approximately 4.45 exaFLOPS.[^17][^20]
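The pod-level totals can be sanity-checked with simple multiplication. This sketch assumes aggregate peak is just chips × per-chip peak and that 1 TB is 1,000 GB; real jobs see less than peak, and quoted pod totals that differ from the product presumably rest on a different basis.

```python
# Sketch: aggregate peak compute and HBM for a full v5p pod.
POD_CHIPS = 8960
PEAK_BF16_TFLOPS = 459
HBM_GB_PER_CHIP = 95
V4_POD_CHIPS = 4096

pod_exaflops = POD_CHIPS * PEAK_BF16_TFLOPS / 1e6   # 1 exaFLOPS = 1e6 TFLOPS
pod_hbm_tb = POD_CHIPS * HBM_GB_PER_CHIP / 1000     # decimal terabytes
chip_count_ratio = POD_CHIPS / V4_POD_CHIPS

print(f"peak BF16: {pod_exaflops:.2f} exaFLOPS")    # 4.11 exaFLOPS
print(f"total HBM: {pod_hbm_tb:.1f} TB")            # 851.2 TB
print(f"pod size vs v4: {chip_count_ratio:.2f}x")   # 2.19x
```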

### Topologies and slices

The maximum **single-slice** schedulable job (i.e., a topology connected fully via ICI without DCN hops) is **6,144 chips** in a 16×16×24 configuration (96 "cubes," where a cube is a 4×4×4 sub-volume).[^3][^4][^20] Larger workloads can be assembled via Google's **Multislice** software, which links multiple ICI-coupled slices over DCN; Multislice scales v5p workloads to **18,432 chips**.[^21] OCS-based topology wrapping means even sub-pod slices receive full toroidal connectivity once they reach cube size.[^3]

Single-slice topology examples documented by Google include sizes from 2×2×1 (4 chips) up to 16×16×24 (6,144 chips), with intermediate shapes such as 8×8×8 (512 chips), 16×16×16 (4,096 chips), and 16×16×20 (5,120 chips) being commonly used for production training jobs.[^3]
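The relationship between a topology string, its chip count, and its cube count can be sketched as follows. The helper names are hypothetical, and treating a slice as made of whole cubes only when every dimension is a multiple of 4 is a simplifying assumption.

```python
# Sketch: chip and cube counts for v5p slice shapes, and Multislice sizing.
from math import prod

CUBE_EDGE = 4  # a cube is a 4x4x4 sub-volume (64 chips)

def slice_summary(shape: tuple) -> dict:
    """Return chip count and number of whole 4x4x4 cubes for a topology."""
    chips = prod(shape)
    whole_cubes = all(dim % CUBE_EDGE == 0 for dim in shape)
    cubes = chips // CUBE_EDGE ** 3 if whole_cubes else 0
    return {"chips": chips, "cubes": cubes}

def slices_needed(total_chips: int, chips_per_slice: int) -> int:
    """Ceiling division: slices required to reach total_chips."""
    return -(-total_chips // chips_per_slice)

for shape in [(2, 2, 1), (8, 8, 8), (16, 16, 16), (16, 16, 20), (16, 16, 24)]:
    print(shape, slice_summary(shape))

largest = slice_summary((16, 16, 24))
multislice_count = slices_needed(18432, largest["chips"])
```

The largest single slice comes out at 6,144 chips in 96 cubes, and the 18,432-chip Multislice ceiling corresponds to three such slices.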

## What software supports the TPU v5p?

The v5p is supported across Google's open-source and proprietary AI software stack, including:

* **[JAX](/wiki/jax)**: Google's preferred high-performance training framework for TPUs, used by both DeepMind and external customers. Salesforce cited the "seamless and easy transition from Cloud TPU v4 to v5p using JAX."[^1]
* **[PyTorch](/wiki/pytorch)/XLA**: PyTorch workloads execute on TPUs via the [XLA (Accelerated Linear Algebra)](/wiki/xla) compiler, which lowers high-level graph IR onto TPU primitives.[^1][^16]
* **[TensorFlow](/wiki/tensorflow)**: supported via XLA as well.[^1]
* **Pathways**: Google's distributed-systems runtime for coordinating model execution across very large numbers of accelerators, providing a "single Python client" abstraction for multi-pod workloads, megascale-XLA compilation services, and remote-Python execution.[^21]
* **MaxText**: Google's open-source reference implementation of high-performance LLM training pipelines on TPUs and GPUs; v5p MLPerf submissions and customer references regularly use MaxText.[^22]
* **AXLearn**: Apple's JAX-based deep-learning library, used in production to pre-train the Apple Foundation Models on Cloud TPU v4 and v5p clusters.[^7][^23]
* **Multislice training** and **multi-host inferencing** software: Google's first-party tools for scaling jobs beyond a single ICI domain via DCN-coupled slices.[^21][^16]
* **Google Kubernetes Engine (GKE)**: added native support for v5p (including multi-host serving) when v5p reached GA in April 2024.[^6]

The compiler underpinning the stack is the open-source **[XLA](/wiki/xla)** compiler, which lowers JAX, PyTorch, and TensorFlow programs to TPU kernels.[^16]

## How fast is the TPU v5p?

### Headline claims

Google's launch positioning compared v5p to v4 with the following claims:

* **>2× the FLOPS per chip** vs. v4, per Google's headline claim; the published BF16 peaks (459 vs. 275 TFLOPS) give a ratio of about 1.67×.[^1][^2][^4]
* **3× more HBM** per chip (95 GB vs. 32 GB).[^1][^4]
* **4× more FLOPS per pod** when accounting for the larger 8,960-chip pod versus v4's 4,096.[^1]
* **2.8× faster training** of large LLMs (e.g., a GPT-3 175B-class workload) vs. v4.[^1][^2]
* **1.9× faster training** of embedding-dense models, driven by the second-generation SparseCores.[^1][^2]
* **~2.3× better performance per dollar** vs. v4 according to third-party analyses.[^24]
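The hardware ratios behind these claims can be recomputed from the published per-chip and per-pod figures. This is only an arithmetic check on peak numbers; it says nothing about measured training speed, and the variable names are illustrative.

```python
# Sketch: v5p-over-v4 ratios from published peak specifications.
V4 = {"bf16_tflops": 275, "hbm_gb": 32, "pod_chips": 4096}
V5P = {"bf16_tflops": 459, "hbm_gb": 95, "pod_chips": 8960}

flops_per_chip_ratio = V5P["bf16_tflops"] / V4["bf16_tflops"]
hbm_ratio = V5P["hbm_gb"] / V4["hbm_gb"]
pod_flops_ratio = (V5P["pod_chips"] * V5P["bf16_tflops"]) / (
    V4["pod_chips"] * V4["bf16_tflops"]
)

print(f"BF16 FLOPS per chip: {flops_per_chip_ratio:.2f}x")  # 1.67x
print(f"HBM per chip:        {hbm_ratio:.2f}x")             # 2.97x
print(f"BF16 FLOPS per pod:  {pod_flops_ratio:.2f}x")       # 3.65x
```

On BF16 peaks alone the per-chip ratio is about 1.67× and the per-pod ratio about 3.65×, so the ">2×" and "4×" headlines appear to be rounded or to rest on a different precision or baseline.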

Jeff Dean, then chief scientist of [Google DeepMind](/wiki/google_deepmind) and Google Research, said at launch that "Google DeepMind and Google Research have observed 2× speedups for LLM training workloads using TPU v5p chips compared to the performance on our TPU v4 generation. With the 2nd generation of SparseCores we also see significant improvement in the performance of embeddings-heavy workloads."[^1]

### MLPerf 4.0 / 4.1

In MLCommons' MLPerf Training v4.0 (June 2024), Google submitted v5p results showing **near-linear scaling (≈99.9% efficiency)** on the GPT-3 175B pre-training task across slices ranging from 512 to 6,144 chips, the result of hardware/runtime/compiler/framework co-design.[^25]

In MLPerf Training v4.1 (November 2024), Google reported a **5.7% speedup on the GPT-3 175B benchmark at 6,144 accelerators** versus its earlier v5p submission and disclosed Stable Diffusion training results. The same round disclosed that Trillium delivered **up to 1.8× better performance per dollar (roughly 45% lower training cost)** versus v5p while achieving the same validation accuracy. Convergence-scaling efficiency at the largest cluster sizes was reported to be comparable between Trillium and v5p.[^22]
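Two conversions recur in benchmark write-ups like these: turning an "N× better performance per dollar" ratio into a percentage cost reduction, and computing scaling efficiency from time-to-train at two cluster sizes. The sketch below shows both; the timing inputs are hypothetical placeholders, not MLPerf results.

```python
# Sketch: benchmark arithmetic helpers.

def cost_reduction_pct(perf_per_dollar_ratio: float) -> float:
    """Percent cost saved for the same work, given a perf-per-dollar ratio."""
    return (1 - 1 / perf_per_dollar_ratio) * 100

def scaling_efficiency(t_small: float, n_small: int,
                       t_large: float, n_large: int) -> float:
    """Strong-scaling efficiency: 1.0 means perfectly linear scaling."""
    return (t_small * n_small) / (t_large * n_large)

print(f"{cost_reduction_pct(1.8):.1f}% lower cost")   # 44.4% lower cost

# Hypothetical timings (minutes), chosen only to illustrate the formula.
ideal = scaling_efficiency(t_small=120.0, n_small=512,
                           t_large=10.0, n_large=6144)
print(ideal)  # 1.0
```

A 1.8× ratio therefore corresponds to about 44% lower cost, which is commonly rounded to 45%.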

### Real-world Gemini training

Although Google has not published the exact mix of chips used for each Gemini variant, the v5p was developed in tight co-design with the Gemini program and was the most powerful TPU available at the time Gemini 1.0 was trained.[^1][^16][^26] Sundar Pichai's public framing of the Gemini 1.0 launch credited TPUs (including the just-announced v5p) for "training and serving" the new generation of models.[^16]

### Real-world customer benchmarks

* **Salesforce**: Erik Nijkamp, senior research scientist, said: "Cloud TPU v5p compute outperforms the previous generation TPU v4 by as much as 2× ... we love how seamless and easy the transition has been from Cloud TPU v4 to v5p using JAX."[^1]
* **Lightricks**: Yoav HaCohen, PhD, core generative-AI research team lead, reported "remarkable performance and ample memory capacity," training a generative text-to-video model without splitting it into separate processes and achieving roughly **2.5× faster training** for text-to-image and text-to-video models on v5p versus v4.[^1][^27]
* **Apple**: reported sustained model-FLOP utilization (MFU) of approximately **52%** during pre-training of the on-device Apple Foundation Model on a single 2,048-chip v5p slice (see § Apple Foundation Models).[^23]

## How much does the TPU v5p cost, and when was it released?

### Public list pricing

When v5p reached general availability in April 2024, Google published list pricing of **$4.20 per chip-hour** for on-demand v5p capacity in supported zones, compared with $3.22/chip-hour for v4 and $1.20/chip-hour for v5e.[^28] One- and three-year committed-use discounts brought effective list rates down to approximately **$2.94** and **$1.89** per chip-hour respectively.[^28][^24]
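The list prices translate into discount percentages and job costs as sketched below. The 512-chip, 24-hour job is a hypothetical example, and the calculation ignores storage, networking, and any negotiated pricing.

```python
# Sketch: committed-use discounts and a simple job-cost estimate for v5p.
ON_DEMAND = 4.20      # USD per chip-hour
ONE_YEAR_CUD = 2.94
THREE_YEAR_CUD = 1.89

def discount_pct(rate: float, baseline: float = ON_DEMAND) -> float:
    """Percent discount of a committed rate relative to on-demand."""
    return (1 - rate / baseline) * 100

def job_cost(chips: int, hours: float, rate: float = ON_DEMAND) -> float:
    """Accelerator cost in USD for a job using `chips` for `hours`."""
    return chips * hours * rate

print(f"1-year CUD: {discount_pct(ONE_YEAR_CUD):.0f}% off")    # 30% off
print(f"3-year CUD: {discount_pct(THREE_YEAR_CUD):.0f}% off")  # 55% off
print(f"512 chips x 24 h: ${job_cost(512, 24):,.2f}")          # $51,609.60
```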

### General availability

v5p was first made available on **December 6, 2023** through a private "request via your Google Cloud account manager" allocation process timed to the Gemini 1.0 launch.[^2] General availability was announced at **Google Cloud Next 2024** on April 9, 2024, when Google also disclosed GKE-native v5p support and multi-host inferencing.[^6][^29]

### Regional availability

At GA, v5p was offered through North-American zones (notably us-east5 in Columbus, Ohio); regional expansion proceeded throughout 2024.[^3]

## Who uses the TPU v5p?

### Google internal: Gemini family

The earliest and largest tenant of v5p capacity was Google itself, which used the chip to train its [Gemini](/wiki/gemini) family of frontier models. Google has stated that "Gemini, Google's most capable and general AI model, was trained on, and is served, using TPUs," with v5p being the most powerful TPU available during the Gemini 1.0 development cycle and into the Gemini 1.5 generation.[^1][^16]

### Apple Foundation Models

In July 2024, [Apple](/wiki/apple_inc) published its first technical report on the [Apple Foundation Models](/wiki/apple_foundation_models) (AFM) underlying [Apple Intelligence](/wiki/apple_intelligence). The paper disclosed that the **AFM-on-device** model, a roughly 3-billion-parameter model targeting on-device inference on Apple Silicon, was pre-trained on **a single slice of 2,048 TPU v5p chips**, while the larger **AFM-server** model was pre-trained on **8,192 TPU v4 chips** provisioned as eight 1,024-chip slices.[^23][^7] Training ran on the JAX-based AXLearn library and sustained roughly 52% model-FLOP utilization (MFU) on v5p; the core pre-training corpus was 6.3 trillion tokens, and later stages extended sequence lengths to 8,192 and then 32,768 tokens. Apple's deliberate decision to use Google TPUs for on-device foundation-model training (rather than NVIDIA GPUs) was widely covered as one of the most significant external endorsements of the TPU platform.[^7][^23]
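MFU is the ratio of useful model FLOPs actually executed to the hardware's theoretical peak, so the reported figure can be converted into an approximate sustained throughput for the slice. The sketch below does that conversion; it assumes the 459 TFLOPS BF16 peak is the denominator Apple used, which the report summary above does not confirm.

```python
# Sketch: sustained throughput implied by an MFU figure.
PEAK_BF16_TFLOPS = 459   # per v5p chip (assumed MFU denominator)

def sustained_pflops(chips: int, mfu: float,
                     peak_tflops: float = PEAK_BF16_TFLOPS) -> float:
    """Return model FLOP throughput in PFLOPS for a slice at a given MFU."""
    if not 0.0 <= mfu <= 1.0:
        raise ValueError("mfu must be between 0 and 1")
    return chips * peak_tflops * mfu / 1000

slice_peak = sustained_pflops(2048, 1.0)     # theoretical ceiling
slice_actual = sustained_pflops(2048, 0.52)  # at the reported ~52% MFU
print(f"peak: {slice_peak:.1f} PFLOPS, sustained: {slice_actual:.1f} PFLOPS")
```

Under these assumptions a 2,048-chip slice peaks at about 940 PFLOPS and sustains roughly 489 PFLOPS of model compute.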

### Anthropic

[Anthropic](/wiki/anthropic) has been a major Google Cloud TPU customer and trained [Claude](/wiki/claude) models using a diversified compute strategy spanning Google TPUs, AWS [Trainium](/wiki/aws_trainium), and NVIDIA GPUs.[^30] Anthropic's TPU usage included v5p chips, and the company announced in late 2025 a landmark expansion of its TPU footprint to "well over a gigawatt" of capacity coming online in 2026, alongside a partnership with Broadcom for next-generation accelerator co-design.[^30][^31]

### Salesforce

Salesforce announced at the v5p launch that it would use the chip and the AI Hypercomputer architecture to pre-train its proprietary foundation models for specialized production use cases. The company's quote on the launch blog highlighted both the 2× v4-to-v5p speedup and the seamless migration via [JAX](/wiki/jax).[^1]

### Lightricks

The Israeli generative-AI app developer Lightricks used v5p on [GKE](https://cloud.google.com/blog/products/media-entertainment/how-lightricks-trains-video-diffusion-models-at-scale-with-jax-on-tpu) to train its LTXV text-to-video model, achieving the ~2.5× v5p-over-v4 training speedup and citing the memory capacity as critical to fitting longer video sequences on a single chip.[^1][^27]

### Hugging Face

[Hugging Face](/wiki/hugging_face) partnered with Google Cloud to make TPUs (including the v5 family) accessible to its user base via its Inference Endpoints and Spaces products, and developed the open-source `optimum-tpu` library to ease deployment of Hugging Face Transformers on Cloud TPUs.[^32]

### Other customers

Additional reported users of TPU v5p capacity through Google Cloud included generative-AI startups training image and video models, large-scale RL workloads, and academic groups (e.g., the Marin open-model project under Google's Open Source program).[^33]

## How does the TPU v5p compare to NVIDIA GPUs?

Industry coverage at launch positioned v5p as Google's most credible response to NVIDIA's [H100](/wiki/nvidia_h100) GPU, particularly for large-scale training where the v5p's pod-level scale of 8,960 chips and 4,800 Gbps ICI bandwidth competed with NVIDIA's NVLink/InfiniBand-based clusters.[^4][^33] Analysts at SemiAnalysis, ServeTheHome, and others described the v5p as a major step in Google's vertically-integrated AI-infrastructure strategy, with the [AI Hypercomputer](https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-tpu-v5p-and-ai-hypercomputer) framing emphasizing co-design across hardware, networking (optical circuit switching, Jupiter data-center network), and software (XLA, JAX, Pathways, Multislice).[^1][^14] For a broader side-by-side of accelerators across memory, bandwidth, and FP4/FP8/BF16 compute, see the [AI accelerator comparison](/wiki/ai_accelerator_comparison).

A noteworthy point in third-party analysis was the constraint that v5p was a **Google-Cloud-exclusive device**: unlike NVIDIA GPUs, TPUs are not sold for on-premises or multi-cloud deployments, which limited their addressable market but reinforced Google Cloud's strategic moat around the chip.[^33] By late 2024, the v5p's market position was being eclipsed by Trillium (TPU v6e), which delivered roughly 2× the per-chip performance and was advertised as 2× more power-efficient than v5p, with a 4.7× peak compute per chip uplift over v5e.[^10][^34]

## What replaced the TPU v5p?

### Trillium (TPU v6e)

Google previewed **Trillium** at Google I/O on May 14, 2024 and brought it to general availability on December 11, 2024. Trillium is the v6e (efficiency-tier) successor in Google's TPU naming convention. It delivers approximately **4.7× the peak compute per chip vs. TPU v5e**, doubles HBM and ICI bandwidth over v5e, and per Google delivers **~2× the per-chip performance of v5p** while being **~2× more power-efficient**. MLPerf 4.1 results showed Trillium training the GPT-3 175B benchmark at **1.8× lower cost than v5p** for the same validation accuracy, and Google reported up to **2.5× better performance per dollar than v5p** on dense LLMs such as Llama2-70B and Llama3.1-405B.[^10][^22][^34]

Trillium expanded the MXU from 128×128 to 256×256, increasing matrix-multiply throughput per cycle, while the per-pod scale dropped to 256 chips per ICI domain, relying more heavily on Multislice for very large jobs.[^17][^34]

### Ironwood (TPU v7)

Google unveiled **Ironwood** in April 2025 as the first TPU "for the age of inference" and brought it to GA on November 6, 2025. Ironwood delivers 4,614 FP8 TFLOPS per chip, 192 GB of HBM3e at 7.37 TB/s, and "superpod" scale of **9,216 chips** producing **42.5 FP8 exaFLOPS** per superpod (about 1.77 PB of shared HBM and 9.6 Tb/s ICI). Google has cited a **10× peak performance improvement over TPU v5p** and a **3.7× improvement in compute-carbon-intensity** vs. v5p.[^11][^35][^36] Anthropic's late-2025 commitment to scale TPU usage to one million chips and "well over a gigawatt" was anchored on Ironwood capacity.[^30][^31]
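The Ironwood superpod totals follow from the per-chip figures in the same way as for v5p. The sketch below checks them; note that the per-chip comparison sets Ironwood's FP8 peak against v5p's BF16 peak, so it is not a like-for-like precision comparison, and 1 PB is taken as 10^6 GB.

```python
# Sketch: Ironwood superpod aggregates and the per-chip ratio to v5p.
IRONWOOD_CHIPS = 9216
IRONWOOD_FP8_TFLOPS = 4614
IRONWOOD_HBM_GB = 192
V5P_BF16_TFLOPS = 459

superpod_exaflops = IRONWOOD_CHIPS * IRONWOOD_FP8_TFLOPS / 1e6
superpod_hbm_pb = IRONWOOD_CHIPS * IRONWOOD_HBM_GB / 1e6
per_chip_ratio = IRONWOOD_FP8_TFLOPS / V5P_BF16_TFLOPS  # FP8 vs BF16 peaks

print(f"{superpod_exaflops:.1f} FP8 exaFLOPS")   # 42.5 FP8 exaFLOPS
print(f"{superpod_hbm_pb:.2f} PB of HBM")        # 1.77 PB of HBM
print(f"{per_chip_ratio:.1f}x v5p per chip")     # 10.1x v5p per chip
```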

### Strategic positioning

In Google's overall AI-accelerator roadmap as of the mid-2020s, v5p occupies the role of the *first production-grade frontier-training TPU* whose performance and pod-scale enabled the Gemini program and external customers' largest models. Trillium and Ironwood subsequently surpassed it in absolute performance, but the v5p remains in service through Google Cloud and continues to host significant training workloads, particularly where 95 GB HBM2e and 8,960-chip single-pod scale remain advantageous.[^11][^17]

## See also

* [Tensor Processing Unit (TPU)](/wiki/tensor_processing_unit_tpu)
* [Cloud TPU](/wiki/cloud_tpu)
* [TPU Pod](/wiki/tpu_pod)
* [Trillium (TPU v6e)](/wiki/google_trillium)
* [TPU Ironwood](/wiki/tpu_ironwood)
* [AI accelerator comparison](/wiki/ai_accelerator_comparison)
* [Gemini (language model)](/wiki/gemini)
* [Apple Foundation Models](/wiki/apple_foundation_models)
* [Apple Intelligence](/wiki/apple_intelligence)
* [JAX](/wiki/jax)
* [XLA (Accelerated Linear Algebra)](/wiki/xla)
* [PyTorch](/wiki/pytorch)
* [TensorFlow](/wiki/tensorflow)
* [bfloat16](/wiki/bfloat16)
* [Systolic array](/wiki/systolic_array)
* [Mixture of Experts (MoE)](/wiki/mixture_of_experts)
* [High Bandwidth Memory (HBM)](/wiki/high_bandwidth_memory)
* [NVIDIA H100](/wiki/nvidia_h100)
* [AWS Trainium](/wiki/aws_trainium)
* [Google DeepMind](/wiki/google_deepmind)
* [Sundar Pichai](/wiki/sundar_pichai)
* [Jeff Dean](/wiki/jeff_dean)
* [Anthropic](/wiki/anthropic)
* [Claude](/wiki/claude)
* [MLPerf](/wiki/mlperf)

## References

[^1]: "Introducing Cloud TPU v5p and AI Hypercomputer," Google Cloud Blog, Dec 6, 2023. https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-tpu-v5p-and-ai-hypercomputer

[^2]: "Google announces the Cloud TPU v5p, its most powerful AI accelerator yet," TechCrunch, Dec 6, 2023. https://techcrunch.com/2023/12/06/google-announces-the-cloud-tpu-v5p-its-most-powerful-ai-accelerator-yet/

[^3]: "TPU v5p," Google Cloud Documentation. https://docs.cloud.google.com/tpu/docs/v5p

[^4]: "Tensor Processing Unit," Wikipedia. https://en.wikipedia.org/wiki/Tensor_Processing_Unit

[^5]: "Cloud TPU release notes," Google Cloud Documentation. https://docs.cloud.google.com/tpu/docs/release-notes

[^6]: "Google Cloud Next 2024 wrap up," Google Cloud Blog. https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2024-wrap-up

[^7]: "Apple says its AI models were trained on Google's custom chips," CNBC, Jul 29, 2024. https://www.cnbc.com/2024/07/29/apple-says-its-ai-models-were-trained-on-googles-custom-chips-.html

[^8]: "Anthropic to Expand Use of Google Cloud TPUs and Services," Google Cloud Press Corner, Oct 23, 2025. https://www.googlecloudpresscorner.com/2025-10-23-Anthropic-to-Expand-Use-of-Google-Cloud-TPUs-and-Services

[^9]: "Google Cloud TPUs made available to Hugging Face users," Hugging Face Blog. https://huggingface.co/blog/tpu-inference-endpoints-spaces

[^10]: "Introducing Trillium, sixth-generation TPUs," Google Cloud Blog. https://cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus

[^11]: "Ironwood TPUs and new Axion-based VMs for your AI workloads," Google Cloud Blog. https://cloud.google.com/blog/products/compute/ironwood-tpus-and-new-axion-based-vms-for-your-ai-workloads

[^12]: "Google TPU Architecture: 7 Generations Explained," Introl Blog. https://introl.com/blog/google-tpu-architecture-complete-guide-7-generations

[^13]: "Google Details TPUv4 and its Crazy Optically Reconfigurable AI Network," ServeTheHome. https://www.servethehome.com/google-details-tpuv4-and-its-crazy-optically-reconfigurable-ai-network/

[^14]: "Highly Customized Optical Networking Critical for Google's TPUs," NextBigFuture, Nov 2025. https://www.nextbigfuture.com/2025/11/highly-customized-optical-networking-critical-for-googles-tensor-processing-units-tpus.html

[^15]: "Cloud TPU v5e is generally available," Google Cloud Blog. https://cloud.google.com/blog/products/compute/announcing-cloud-tpu-v5e-in-ga

[^16]: "Google announces AI chip 'TPU v5p'," GIGAZINE. https://gigazine.net/gsc_news/en/20231207-google-tpu-v5p/

[^17]: "Google TPUs Explained: Architecture & Performance for Gemini 3," IntuitionLabs. https://intuitionlabs.ai/articles/google-tpu-architecture-gemini-3

[^18]: "Unveiling Google's TPU Architecture: OCS Optical Circuit Switching," FiberMall. https://www.fibermall.com/blog/unveiling-google-tpu-architecture.htm

[^19]: "Google TPU v5p AI Chip Launches Alongside Gemini," ServeTheHome. https://www.servethehome.com/google-tpu-v5p-ai-chip-launches-alongside-gemini/

[^20]: "How to Think About TPUs," Jax ML Scaling Book. https://jax-ml.github.io/scaling-book/tpus/

[^21]: "Building production AI on Cloud TPUs with JAX," Google Cloud Documentation. https://docs.cloud.google.com/tpu/docs/jax-ai-stack

[^22]: "Trillium MLPerf 4.1 training benchmarks," Google Cloud Blog. https://cloud.google.com/blog/products/compute/trillium-mlperf-41-training-benchmarks/

[^23]: "Apple Intelligence Foundation Language Models," Apple/arXiv, 2024. https://arxiv.org/html/2407.21075v1

[^24]: "Google TPU v5p · Specs, Pricing, Buyers," BenchGecko. https://benchgecko.ai/hardware/tpu-v5p

[^25]: "MLPerf Training 4.0: Nvidia Still King," HPCwire, Jun 2024. https://www.hpcwire.com/2024/06/12/mlperf-training-4-0-nvidia-still-king-power-and-llm-fine-tuning-added/

[^26]: "Google reveals Gemini AI model, plus new chip and hypercomputer," TechMonitor. https://www.techmonitor.ai/digital-economy/ai-and-automation/google-gemini-ai-deepmind-tpu-hypercomputer

[^27]: "How Lightricks trains video diffusion models at scale with JAX on TPU," Google Cloud Blog. https://cloud.google.com/blog/products/media-entertainment/how-lightricks-trains-video-diffusion-models-at-scale-with-jax-on-tpu

[^28]: "TPU Pricing," Google Cloud. https://cloud.google.com/tpu/pricing

[^29]: "Google launches Cloud TPU v5p GA," LinkedIn / Gang Chen. https://www.linkedin.com/posts/gang-chen-1276872_tpu-googlecloudnext-activity-7183575344824696833-d79v

[^30]: "Anthropic Taps Over a Gigawatt of Google Cloud TPUs," Cloud Wars, 2025. https://cloudwars.com/ai/anthropic-taps-over-a-gigawatt-of-google-cloud-tpus-to-power-next-gen-claude-models/

[^31]: "Anthropic expands partnership with Google and Broadcom," Anthropic news. https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services

[^32]: "Hugging Face and Google Cloud partner to improve AI model access and TPU support," ETIH EdTech News. https://www.edtechinnovationhub.com/news/hugging-face-announces-new-google-cloud-partnership-focused-on-speed-security-and-tpu-access

[^33]: "Apple skips Nvidia's GPUs for its AI models, uses thousands of Google TPUs instead," Tom's Hardware. https://www.tomshardware.com/tech-industry/artificial-intelligence/apple-skips-nvidias-gpus-for-its-ai-models-uses-thousands-of-google-tpus-instead

[^34]: "Trillium TPU is GA," Google Cloud Blog. https://cloud.google.com/blog/products/compute/trillium-tpu-is-ga

[^35]: "Ironwood: The first Google TPU for the age of inference," Google Blog. https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/

[^36]: "Ironwood TPUs deliver 3.7x carbon efficiency gains," Google Cloud Blog. https://cloud.google.com/blog/topics/systems/ironwood-tpus-deliver-37x-carbon-efficiency-gains