TPU Ironwood (officially TPU v7 or TPU7x) is Google's seventh-generation Tensor Processing Unit, unveiled at Google Cloud Next 2025 in Las Vegas on April 9, 2025. It is the first TPU generation designed primarily for inference workloads, reflecting a broader industry shift from training-dominated compute to large-scale AI serving. Each chip delivers 4,614 teraFLOPS at FP8 precision, carries 192 GB of HBM3E memory with 7.37 TB/s of bandwidth, and can be assembled into superpods of up to 9,216 chips reaching 42.5 exaFLOPS of aggregate FP8 compute. Google has deployed Ironwood to power its Gemini 2.5 models and to run Google DeepMind's AlphaFold research workloads internally. External customers including Anthropic, Lightricks, and Essential AI were announced at launch as early adopters.
Google began designing its first custom AI accelerator in 2013, motivated by a projection that if users interacted with voice search using neural networks for only a few minutes per day, the company would need to double its entire data-center compute capacity to handle the inference load. The solution was an application-specific integrated circuit (ASIC) built around a systolic array optimized for matrix multiplication, rather than a general-purpose GPU.
TPU v1 entered production in Google data centers in 2015 and was publicly disclosed at Google I/O 2016. Fabricated on a 28 nm process, it contained a 256x256 array of 8-bit multiply-accumulate units delivering roughly 92 TOPS of 8-bit integer performance at 75 W. It was an inference-only accelerator connected to host servers over PCIe 3.0.
TPU v2, announced in 2017, was the first generation capable of training neural networks. It introduced bfloat16 arithmetic, a 16-bit floating-point format that Google popularized across the AI industry, and 16 GB of HBM memory providing 600 GB/s of bandwidth. Pods of 256 chips were assembled for training jobs.
TPU v3, announced in May 2018, doubled the per-chip compute of v2 and introduced liquid cooling, a thermal design that has persisted through every subsequent generation. Pods scaled to 1,024 chips.
TPU v4, disclosed in 2021, moved to a 7 nm process and introduced optical circuit switches (OCS) to interconnect chips in a pod without fixed wiring. The OCS fabric allowed any two chips in a pod to communicate through a dynamically reconfigurable optical path, dramatically simplifying large-scale distributed training. TPU v4 pods contained 4,096 chips.
TPU v5e and TPU v5p, launched in 2023, represented a split into cost-optimized (v5e) and performance-optimized (v5p) variants. TPU v5p carried 95 GB of HBM and delivered 459 teraFLOPS of BF16 compute (918 TOPS at INT8), positioning it against the NVIDIA H100.
TPU v6e (Trillium), announced at Google I/O in May 2024 and entering preview that October, was the largest architectural leap between consecutive generations since v2. Trillium enlarged the matrix multiply unit from a 128x128 to a 256x256 MAC array (quadrupling the MACs per MXU), raised clock speed, and delivered 4.7x the peak compute of v5e per chip. It carried 32 GB of HBM at 1,640 GB/s bandwidth, ran pods of up to 256 chips, and improved energy efficiency by 67% over v5e.
TPU v7 (Ironwood), announced April 2025, is the subject of this article. Its successor, the eighth-generation TPU, was announced in April 2026 as two separate chips: TPU 8t for training and TPU 8i for inference, each pushing into FP4 precision territory.
Ironwood is the first Tensor Processing Unit built with a multi-chiplet package. Each Ironwood chip contains two compute dies, called Ironwood compute chiplets, bonded together in a single package. Each chiplet is functionally self-contained: it includes one TensorCore, two SparseCores, and 96 GB of HBM3E memory across four stacks. The two chiplets are joined by a die-to-die (D2D) interface that runs at roughly six times the bandwidth of a single inter-chip interconnect (ICI) link, making intra-package communication far faster than any off-package link.
This gives the assembled chip two TensorCores, four SparseCores, 192 GB of HBM3E across eight stacks, and a combined peak FP8 throughput of 4,614 teraFLOPS.
The package also includes separate I/O dies responsible for the ICI links that connect chips to neighbors in a pod. Separating I/O logic from compute logic allows Google to optimize each die independently and improve yields.
The chiplet design was co-developed with Broadcom, which handles packaging and supply chain logistics. Fabrication uses TSMC's N3P 3 nm process for the compute dies.
Each TensorCore contains a Matrix Multiply Unit (MXU) and a Vector Processing Unit (VPU).
The MXU uses a 256x256 systolic array for matrix operations. For FP8 precision, the implementation maps two FP8 operations onto each FP16 data path, effectively creating a 512x512 logical MAC array and doubling FP8 throughput relative to BF16. FP8 support is new to Ironwood: it is the first TPU generation with hardware FP8 tensor calculations.
The VPU handles element-wise operations such as activations, layer normalizations, and scaling. It runs alongside the MXU to handle the non-matrix portions of typical transformer inference passes.
Operating frequency is approximately 1.1 GHz, lower than the 1.75 GHz of TPU v5p. The frequency reduction reflects a deliberate trade-off: the wider MXU and higher-bandwidth memory deliver more throughput per clock, while the lower frequency reduces power and heat per compute unit.
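A back-of-envelope check ties these figures together. The sketch below uses only numbers quoted in this article (the 4,614 TFLOPS FP8 peak and the approximate 1.1 GHz clock), so the implied array count is approximate; Google has not published a per-TensorCore MXU count for Ironwood.

```python
# Derive the implied MAC parallelism from the figures quoted above.
FP8_PEAK = 4.614e15     # FLOPS per chip (Google's announced figure)
CLOCK_HZ = 1.1e9        # approximate operating frequency

flops_per_cycle = FP8_PEAK / CLOCK_HZ           # ~4.19 million
macs_per_cycle = flops_per_cycle / 2            # multiply + add = 2 FLOPs
arrays = macs_per_cycle / (512 * 512)           # units of one 512x512 logical array

print(f"MACs per cycle per chip: {macs_per_cycle:,.0f}")         # ~2,097,000
print(f"Implied 512x512 logical arrays per chip: {arrays:.1f}")  # ~8.0
```

Roughly eight logical 512x512 arrays per chip, i.e., about four per TensorCore, is what makes the stated peak consistent with the stated clock.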
Each TensorCore is paired with two SparseCores (four total per chip). SparseCore is a specialized accelerator for large embedding table lookups, which dominate the compute profile of ranking and recommendation systems. For large language models and mixture-of-experts architectures, SparseCores handle the gating and routing operations that direct tokens to appropriate expert subnetworks.
SparseCore supports efficient data distribution across chips during embedding lookups, enabling recommendation workloads that would otherwise saturate conventional memory hierarchies. This makes Ironwood viable for advertising and content-recommendation serving workloads alongside LLM inference.
Google used its AlphaChip system, a reinforcement-learning-based chip floorplanning tool developed by Google DeepMind, to optimize the physical layout of Ironwood dies. AlphaChip was first applied to TPU v4 and has been used for every subsequent generation. The system produces floorplans that human engineers have difficulty matching within comparable time budgets.
| Specification | Value |
|---|---|
| Generation | 7th (TPU v7 / TPU7x) |
| Codename | Ironwood |
| Architecture | Dual-chiplet, multi-die package |
| TensorCores per chip | 2 |
| SparseCores per chip | 4 |
| MXU array size | 256x256 (logical 512x512 at FP8) |
| Peak compute (FP8) | 4,614 TFLOPS |
| Peak compute (BF16) | 2,307 TFLOPS |
| HBM type | HBM3E |
| HBM capacity | 192 GB (8 stacks, 24 GB per stack) |
| HBM bandwidth | 7,370 GB/s (approx. 7.37 TB/s) |
| ICI bandwidth | 1,200 GB/s bidirectional |
| Data center network | 100 Gbps per chip |
| Estimated TDP | ~700 W to 1 kW per chip |
| Process node | TSMC N3P (3 nm class) |
| Cooling | Third-generation liquid cooling |
| Host CPU | Google Axion (Arm-based) |
Google offers Ironwood in two standard configurations through Google Cloud.
The 256-chip pod targets moderate-scale workloads, including dedicated inference serving for individual model deployments and smaller training runs.
The 9,216-chip superpod is the largest Ironwood configuration. At 9,216 chips and 42.5 exaFLOPS of FP8 throughput, this superpod exceeds the theoretical peak of the world's fastest traditional supercomputers, though the comparison crosses precisions: Frontier at Oak Ridge National Laboratory reached 1.1 exaFLOPS and El Capitan at Lawrence Livermore approximately 1.7 exaFLOPS, both measured at FP64, a far more precise format than FP8. Google notes the full superpod holds 1.77 petabytes of directly addressable HBM across all chips.
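The headline pod figures follow directly from the per-chip specifications; a quick arithmetic check:

```python
# Aggregate superpod figures derived from per-chip specs.
chips = 9_216
fp8_per_chip = 4.614e15      # FLOPS
hbm_per_chip_gb = 192

print(f"FP8 peak: {chips * fp8_per_chip / 1e18:.1f} exaFLOPS")      # 42.5
print(f"Total HBM: {chips * hbm_per_chip_gb / 1e6:.2f} petabytes")  # 1.77
```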
The superpod draws approximately 10 megawatts of power under full load.
For deployments requiring more than 9,216 chips, multiple superpods can be interconnected through Google's data-center network (DCN) using the Jupiter fabric, scaling to hundreds of thousands of chips. Google's internal Gemini production clusters operate at this multi-superpod scale.
The VM configuration for Ironwood in Google Cloud uses the tpu7x-standard-4t machine type, which bundles 4 TPU chips with 224 vCPUs and 960 GB of RAM on a shared host. This grouping reflects the physical hardware layout: 4 Ironwood chips per liquid-cooled tray, 16 trays per rack (64 chips), paired with CPU host racks.
Within a pod, chips are arranged in a 3D torus topology. The smallest deployable slice is 2x2x1 (4 chips, one host). The largest single-pod slice is 8x16x16 (2,048 chips across 512 hosts). The full superpod is assembled by connecting multiple cubes through the OCS fabric.
The Inter-Chip Interconnect (ICI) is a proprietary high-speed serial link that connects Ironwood chips within a pod. Each chip has four ICI links providing 9.6 Tbps of aggregate bidirectional bandwidth (approximately 1.2 TB/s). This represents a 1.5x bandwidth increase over Trillium's ICI.
ICI uses custom Remote Direct Memory Access (RDMA) protocols that allow chips to read and write each other's HBM without involving the host CPU. This reduces latency for collective operations such as all-reduce and all-gather, which are ubiquitous in distributed training and tensor-parallel inference.
Within a cube of 64 chips, ICI links form a 3D torus mesh. In this topology, every chip connects to six neighbors (two in each of three spatial dimensions). The torus structure gives each chip multiple paths to every other chip, providing both bandwidth and fault tolerance. Three distinct parallelism axes exist, which map naturally to the tensor parallelism, pipeline parallelism, and data parallelism axes used in large model training and inference.
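As an illustration, the sketch below maps a hypothetical 4x4x4 cube onto three named JAX mesh axes and runs a tensor-parallel matrix multiply whose all-reduce travels along a single torus dimension. The 4x4x4 shape and the axis names are illustrative choices, not a documented Ironwood configuration.

```python
import jax
from jax.experimental import mesh_utils
from jax.experimental.shard_map import shard_map
from jax.sharding import Mesh, PartitionSpec as P

# Illustrative: arrange 64 chips as a 4x4x4 mesh whose axes mirror the
# three torus dimensions, one per parallelism strategy.
devices = mesh_utils.create_device_mesh((4, 4, 4))
mesh = Mesh(devices, axis_names=("data", "pipeline", "tensor"))

def local_matmul(a, b):
    partial = a @ b                         # each chip multiplies its shard
    return jax.lax.psum(partial, "tensor")  # all-reduce along one torus axis

# a is split column-wise and b row-wise across the "tensor" axis; the
# psum combines partial products into a result replicated on every chip.
tensor_parallel_matmul = shard_map(
    local_matmul, mesh=mesh,
    in_specs=(P(None, "tensor"), P("tensor", None)),
    out_specs=P(None, None),
)
```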
Since TPU v4, Google has used Optical Circuit Switches (OCS) to interconnect chip cubes within a pod. Ironwood continues this approach. The OCS fabric manager can establish, reconfigure, and route around failed links or chips dynamically. When a chip or cube becomes unhealthy, the OCS fabric manager establishes new optical paths using designated spares, maintaining full torus connectivity without requiring a restart of the workload. This contributes to Google's reported ~99.999% fleet-wide TPU uptime since 2020.
For the largest Ironwood clusters, multiple superpods connect through Google's Jupiter data-center network, which uses electrical packet switching for inter-superpod traffic.
Each Ironwood chip carries eight stacks of HBM3E memory, providing 192 GB of capacity and approximately 7.37 TB/s of peak bandwidth. This is a dramatic increase over previous generations: Trillium (TPU v6e) carried 32 GB at 1,640 GB/s per chip, and TPU v5p carried 95 GB at 2,765 GB/s.
The 192 GB per chip was deliberately chosen to allow large language models to run with fewer chips. A 70-billion-parameter model in BF16 requires roughly 140 GB of weight memory alone. With Trillium, that model would need a minimum of 5 chips for weights; with Ironwood, the weights fit on a single chip, reducing communication overhead and improving latency for serving.
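The sizing arithmetic is straightforward (weights only; a serving deployment also needs KV-cache and activation memory on top of this):

```python
import math

params = 70e9
weight_gb = params * 2 / 1e9          # BF16 = 2 bytes/parameter -> ~140 GB

for name, hbm_gb in [("Trillium (32 GB)", 32), ("Ironwood (192 GB)", 192)]:
    print(name, "->", math.ceil(weight_gb / hbm_gb), "chip(s) for weights")
# Trillium (32 GB) -> 5 chip(s); Ironwood (192 GB) -> 1 chip(s)
```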
The 9,216-chip superpod presents a single addressable pool of 1.77 petabytes of HBM to the Pathways distributed runtime. XLA and the Pathways layer manage sharding of model weights and KV-cache tensors across this pool transparently.
Beyond HBM, each TensorCore has a block of fast on-chip SRAM (vector memory, or VMEM) used for tile-level data staging. This SRAM is explicitly managed through the Pallas kernel programming interface, allowing engineers to implement custom fused operations that overlap HBM-to-SRAM transfers with MXU computation.
Bandwidth improvements vs. prior generations:
| Generation | HBM capacity | HBM bandwidth | Ratio to Trillium (bandwidth) |
|---|---|---|---|
| TPU v5p | 95 GB | 2,765 GB/s | 1.7x |
| TPU v6e (Trillium) | 32 GB | 1,640 GB/s | 1.0x |
| TPU v7 (Ironwood) | 192 GB | 7,370 GB/s | 4.5x |
Ironwood uses Google's third-generation liquid cooling infrastructure. Liquid cooling was first introduced with TPU v3 in 2018, following thermal limits that made air cooling impractical for high-density AI accelerator racks.
The system uses cold plates mounted directly on the compute dies. A closed water loop circulates coolant through the cold plate and carries heat to facility-level cooling distribution units. Google's design keeps the water entering the cold plate chemically treated and filtered to prevent mineral deposits from blocking the narrow channels.
The thermal benefit is substantial: Google states that advanced liquid cooling supports roughly twice the sustained compute throughput of standard air cooling under continuous heavy workloads. For Ironwood, this matters because the chip's estimated per-chip TDP is in the 700 W to 1 kW range, a power density that air cooling cannot remove reliably at high rack density.
The full 9,216-chip superpod draws approximately 10 MW under load, requiring industrial-scale cooling plant infrastructure. Google designs this cooling infrastructure in-house and co-locates it with TPU production clusters.
The following table summarizes Ironwood's key performance figures as reported by Google at announcement:
| Metric | Value | Notes |
|---|---|---|
| FP8 peak (per chip) | 4,614 TFLOPS | First TPU generation with native FP8 |
| BF16 peak (per chip) | 2,307 TFLOPS | Standard training precision |
| FP8 peak (9,216-chip superpod) | 42.5 ExaFLOPS | Google's headline figure |
| BF16 peak (9,216-chip superpod) | 21.26 ExaFLOPS | |
| HBM bandwidth (per chip) | 7.37 TB/s | 4.5x Trillium |
| HBM capacity (per chip) | 192 GB | 6x Trillium |
| ICI bandwidth (per chip) | 1.2 TB/s bidirectional | 1.5x Trillium |
| Perf/watt vs. Trillium | 2x improvement | FP8 FLOPS per watt of TDP |
| Perf/watt vs. TPU v2 (2018) | ~30x improvement | Long-term efficiency trend |
| Peak vs. TPU v5p | 10x higher peak TFLOPS | |
| Per-chip perf vs. Trillium | 4x+ better | Training and inference |
Google also reports the 9,216-chip superpod has 1.77 petabytes of total addressable HBM and 88,473.6 Tbps of aggregate ICI bandwidth across all inter-chip links.
| Specification | Trillium (TPU v6e) | Ironwood (TPU v7) | Improvement |
|---|---|---|---|
| Peak 8-bit compute | 1,836 TOPS (INT8) | 4,614 TFLOPS (FP8) | 2.5x per chip |
| Peak compute (BF16) | 918 TFLOPS | 2,307 TFLOPS | 2.5x per chip |
| HBM capacity | 32 GB | 192 GB | 6x |
| HBM bandwidth | 1,640 GB/s | 7,370 GB/s | 4.5x |
| ICI bandwidth | ~800 GB/s | 1,200 GB/s | 1.5x |
| Max pod size | 256 chips | 9,216 chips | 36x |
| Perf/watt | 1x (baseline) | 2x | 2x |
| Native FP8 support | No (INT8 only) | Yes | New |
| Chiplet design | No | Yes (2 dies) | New |
| HBM type | HBM | HBM3E | Newer generation |
Google positions Trillium as the preferred choice for customers who need both training and inference capacity simultaneously. Ironwood is the preferred choice for pure inference serving and for training jobs that need very large per-chip memory.
NVIDIA's Blackwell architecture, released in 2024 and 2025 across the B100, B200, and GB200 products, is the primary commercial competitor to Ironwood for large-scale AI inference infrastructure.
| Specification | NVIDIA B200 | NVIDIA GB200 NVL72 | Ironwood (TPU v7) | Ironwood 9,216-chip pod |
|---|---|---|---|---|
| FP8 peak (per accelerator) | 4,500 TFLOPS | 5,000 TFLOPS | 4,614 TFLOPS | -- |
| HBM capacity | 192 GB | 192 GB | 192 GB | 1.77 PB |
| HBM bandwidth | 8.0 TB/s | 8.0 TB/s | 7.37 TB/s | -- |
| Max GPUs/chips per system | 8 (HGX baseboard) | 72 (NVL72) | 9,216 (superpod) | -- |
| Interconnect bandwidth | 14.4 Tbps (NVLink) | 14.4 Tbps | 9.6 Tbps (ICI) | -- |
| FP4 support | Yes (B200) | Yes | No | -- |
| Cooling | Liquid (NVL72) | Liquid | Liquid | -- |
| Third-party ecosystem | Very broad | Very broad | Limited (JAX, PyTorch XLA) | -- |
Key differences in competitive positioning:
At the per-chip level, Ironwood and NVIDIA Blackwell B200 deliver comparable FP8 throughput (4,614 vs. 4,500 TFLOPS) and carry identical HBM capacity (192 GB). NVIDIA's NVLink provides higher per-accelerator interconnect bandwidth (14.4 Tbps vs. 9.6 Tbps ICI), and NVIDIA's Blackwell parts support FP4 precision for compressed inference, which Ironwood does not.
At the pod or cluster level, Google's advantage is scale. NVIDIA's largest self-contained systems top out at 72 GPUs in a single NVL72 rack. Ironwood is purpose-built to operate as 9,216 chips in a single optical-fabric domain, presenting that entire pool as one coherent parallel processor. For workloads that benefit from all-to-all or all-reduce collectives across the entire cluster, the lower per-chip ICI bandwidth of Ironwood is offset by the ability to run those collectives within a single optical domain rather than across multiple compute islands.
On total cost of ownership, third-party estimates suggest Ironwood superpod costs approximately 30% less per hour than an equivalent GB200 configuration and approximately 41% less than GB300, when accounting for compute capacity and power costs.
All Ironwood workloads compile through XLA (Accelerated Linear Algebra), Google's domain-specific compiler for tensor programs. XLA ingests programs written in JAX or PyTorch, performs whole-program optimization including operator fusion, operation scheduling, and layout selection, then emits optimized TPU machine code. XLA's whole-program view lets it make decisions that per-operator compilers cannot, such as fusing a sequence of matrix multiplications, activations, and normalizations into a single kernel that avoids repeated round trips to HBM.
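A minimal sketch of the kind of fusion described here: under jax.jit, XLA can compile the matmul, bias, activation, and normalization below into fused kernels so intermediates stay on chip rather than round-tripping through HBM. The function and shapes are illustrative.

```python
import jax
import jax.numpy as jnp

@jax.jit
def block(x, w, b, gamma, beta):
    h = x @ w + b                         # MXU matrix work
    h = jax.nn.gelu(h)                    # element-wise VPU work, fused by XLA
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return gamma * (h - mean) * jax.lax.rsqrt(var + 1e-6) + beta
```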
JAX is the primary framework for Ironwood. JAX programs are Python expressions that describe tensor computations using NumPy-like syntax. The jit decorator triggers XLA compilation. The grad function computes exact gradients via reverse-mode automatic differentiation, enabling training without manually writing backward passes. The shard_map primitive allows explicit specification of how tensors are partitioned across a mesh of chips.
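A minimal training step showing jit and grad together (shard_map is sketched in the interconnect section above); the model and loss are placeholders:

```python
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    pred = jnp.tanh(x @ w)                # toy model
    return jnp.mean((pred - y) ** 2)

@jax.jit                                  # compile the whole step with XLA
def train_step(w, x, y, lr=1e-2):
    g = jax.grad(loss_fn)(w, x, y)        # reverse-mode autodiff
    return w - lr * g
```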
Google's production LLM training and serving codebase is written in JAX. MaxText, an open-source LLM framework from Google, supports pre-training, supervised fine-tuning, and reinforcement learning from human feedback on Ironwood. MaxText supports popular open-weight architectures including Gemma, DeepSeek, Qwen, and Mixtral-style mixture-of-experts models.
The JAX ecosystem for Ironwood includes several production-grade libraries; a short Optax usage sketch follows the table:
| Library | Purpose |
|---|---|
| Optax | Gradient processing and optimization algorithms (AdamW, Lion, etc.) |
| Orbax | Asynchronous distributed checkpointing for large model arrays |
| Qwix | Quantization-aware training and post-training quantization |
| Metrax | Distributed evaluation metric computation |
| Tunix | Post-training pipeline orchestration |
| Goodput | ML training efficiency measurement and monitoring |
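A minimal sketch of the Optax pattern, with illustrative parameter shapes: the optimizer is a pure (init, update) pair whose state threads through jit-compiled steps.

```python
import jax
import jax.numpy as jnp
import optax

params = {"w": jnp.zeros((1024, 1024))}        # illustrative
optimizer = optax.adamw(learning_rate=3e-4)
opt_state = optimizer.init(params)

@jax.jit
def apply_grads(params, opt_state, grads):
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state
```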
PyTorch users can run on Ironwood through PyTorch XLA, which translates PyTorch eager operations into XLA computations. PyTorch XLA on Ironwood supports full eager mode execution, torch.distributed for multi-chip parallelism, and torch.compile for graph-traced optimization. Google has invested specifically in making PyTorch feel idiomatic on TPUs to reduce the porting burden for teams with existing PyTorch codebases.
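A minimal sketch of the PyTorch XLA path (lazy-tensor style; the shapes are illustrative):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                   # an attached TPU core
x = torch.randn(4096, 4096, device=device)
y = (x @ x.T).sum()                        # operations are traced lazily
xm.mark_step()                             # flush the traced graph to XLA
print(y.item())
```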
vLLM, the popular open-source LLM serving framework, has a TPU backend that works with Ironwood. The tpu-inference plugin provides a unified JAX/PyTorch lowering path for inference use cases, supporting paged attention and continuous batching.
Pallas is a domain-specific language embedded in JAX for writing custom TPU kernels. It provides explicit control over the HBM-to-SRAM data movement pipeline and allows engineers to overlap memory transfers with MXU computation. Mosaic, Pallas's compiler backend, handles tiling, operator fusion, and software pipelining to produce optimized TPU machine code. Pallas is used for custom attention kernels, mixture-of-experts routing, and other operations where XLA's automatic optimization falls short of theoretical peak.
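A hedged sketch of the Pallas pattern: a fused bias-add and GELU that processes one (256, 256) tile per grid step, with Pallas staging each tile into on-chip memory before the kernel body runs. Tile sizes and shapes are illustrative.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def bias_gelu_kernel(x_ref, b_ref, o_ref):
    # x_ref/b_ref tiles have already been copied HBM -> SRAM by Pallas.
    o_ref[...] = jax.nn.gelu(x_ref[...] + b_ref[...])

@jax.jit
def bias_gelu(x, b):                       # x: (M, N), b: (1, N)
    grid = (x.shape[0] // 256, x.shape[1] // 256)
    return pl.pallas_call(
        bias_gelu_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        grid=grid,
        in_specs=[
            pl.BlockSpec((256, 256), lambda i, j: (i, j)),
            pl.BlockSpec((1, 256), lambda i, j: (0, j)),
        ],
        out_specs=pl.BlockSpec((256, 256), lambda i, j: (i, j)),
    )(x, b)
```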
Pathways is Google's distributed ML runtime that manages job execution across large TPU clusters. It presents a pool of TPU chips as a single machine to the programmer. Pathways handles fault recovery: if a chip or link fails during a training run, it checkpoints, reroutes around the failed hardware using the OCS fabric, and resumes from the checkpoint. This elastic execution model is critical for multi-week training jobs on tens of thousands of chips where hardware failures are statistically certain to occur.
TensorFlow is not supported on TPU7x. Google's documentation specifies JAX as the only officially supported framework for direct TPU7x use; PyTorch runs through the XLA bridge.
Vertex AI and Google Kubernetes Engine (GKE) are the primary access paths for Ironwood on Google Cloud.
Ironwood was announced at Google Cloud Next on April 9, 2025, with initial access for select customers immediately. General availability was announced on November 7, 2025. Access requires contacting a Google Cloud account team; quota is not available through the standard self-service console at launch.
TPU7x runs on the tpu7x-standard-4t VM shape: 4 TPU chips, 224 vCPUs, 960 GB RAM per VM. Storage uses Hyperdisk Balanced by default, with Hyperdisk ML also supported. Boot and attached disks use Google's standard Hyperdisk infrastructure.
GKE's Cluster Director feature provides topology-aware scheduling for Ironwood jobs, ensuring that chips assigned to a given workload are physically adjacent in the ICI fabric rather than scattered across different cubes. GKE's Inference Gateway reduces time-to-first-token latency for serving workloads by up to 96% and lowers serving costs by approximately 30% compared to naive round-robin load balancing, by routing prefill and decode phases to different chip allocations.
Vertex AI Model Garden and Vertex AI online prediction support Ironwood for hosted inference, abstracting the chip allocation and batching details from end users.
Google DeepMind has deployed Ironwood to serve Gemini 2.5 in production, as disclosed at the April 2025 launch. Gemini 2.5 is a reasoning-oriented model that uses extended chain-of-thought generation, making it particularly memory-bandwidth intensive at inference time. Ironwood's large per-chip HBM and high bandwidth reduce the number of chips needed to cache KV states for long context windows during generation.
Gemini 2.5 Pro, used for coding, reasoning, and multimodal tasks, is served from Ironwood superpods that keep the full model's KV-cache in on-chip HBM, eliminating DRAM offload latency that would otherwise degrade interactive latency.
Google DeepMind also uses Ironwood for AlphaFold workloads, including structure prediction and structure-guided drug discovery pipelines. AlphaFold 3's all-atom structure diffusion model requires large per-chip memory for attention over long protein sequences, making Ironwood's 192 GB HBM an operational fit.
Ironwood powers inference serving for other Google internal products, including Google Search's AI Overviews feature and Google Maps' AI-assisted navigation features, though Google has not publicly disclosed the full list of production applications.
Anthropic announced in October 2025 an expansion of its Google Cloud TPU agreement that covers access to up to 1 million TPUs, with the first phase covering approximately 400,000 Ironwood (TPU v7) chips. The deal is estimated at roughly $10 billion for the first phase of finished racks. Anthropic cited Ironwood's inference performance and price-performance improvements as the primary motivation, planning to use Ironwood both for training future Claude models and for serving Claude to users.
Lightricks, the maker of AI-based creative tools, used Ironwood to train its LTX-2 multimodal generative video model. Lightricks reported breakthrough training efficiency on Ironwood and was preparing inference workloads on the same hardware at the time of the April 2025 announcement.
Essential AI, a frontier model startup, adopted Ironwood for training large models, citing easy onboarding and immediate operational efficiency.
In late 2025, TrendForce reported that Meta was evaluating a large-scale TPU deployment for 2027, with Google developing native PyTorch support specifically to lower the barrier for Meta's PyTorch-heavy codebase.