Google TPU 8t

Updated Jun 3, 2026

Google TPU 8t is the training-focused member of Google's eighth-generation Tensor Processing Unit family, previewed at Google Cloud Next on April 22, 2026. It was announced alongside a sibling inference chip, the TPU 8i, and the pairing marks the first time Google has split a single TPU generation into two purpose-built parts: one tuned for large-scale model training and one tuned for low-latency serving. Several outlets report that the 8t carries the internal codename Sunfish and that it was co-designed with Broadcom, while the inference part (codenamed Zebrafish) was developed with MediaTek.^[1]^[2]^[3] Google itself frames both chips as the product of a decade of TPU development built in partnership with Google DeepMind.^[4]

The headline number Google led with is scale. A single TPU 8t superpod links 9,600 chips into one memory domain, holds about 2 petabytes of shared high bandwidth memory, and delivers roughly 121 FP4 ExaFLOPS of compute, close to three times the per-pod compute of the seventh-generation TPU Ironwood.^[4]^[1] Google positions the chip as a way to compress the frontier model development cycle "from months to weeks," and the strategic message is that the agentic AI era rewards specialized silicon over a single general-purpose accelerator.^[4]^[5]
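The headline superpod totals can be reproduced from the per-chip figures in Google's technical deep dive: 12.6 PFLOPS of FP4 compute and 216 GB of HBM per chip. The sketch below is a back-of-the-envelope check that assumes decimal SI prefixes and simply multiplies peak per-chip numbers out. The function name is illustrative.

```python
# Back-of-the-envelope check of the TPU 8t superpod totals.
# Per-chip inputs are the figures quoted from Google's technical deep dive;
# decimal (SI) units are assumed throughout.

CHIPS_PER_SUPERPOD = 9_600
FP4_PFLOPS_PER_CHIP = 12.6   # peak FP4 petaFLOPS per chip
HBM_GB_PER_CHIP = 216        # HBM3e capacity per chip, in GB


def superpod_totals(chips: int, pflops_per_chip: float,
                    hbm_gb_per_chip: float) -> tuple[float, float]:
    """Return (peak compute in exaFLOPS, pooled HBM in petabytes)."""
    exaflops = chips * pflops_per_chip / 1_000     # 1 EFLOPS = 1,000 PFLOPS
    hbm_pb = chips * hbm_gb_per_chip / 1_000_000   # 1 PB = 1,000,000 GB
    return exaflops, hbm_pb


if __name__ == "__main__":
    ef, pb = superpod_totals(CHIPS_PER_SUPERPOD, FP4_PFLOPS_PER_CHIP, HBM_GB_PER_CHIP)
    print(f"{ef:.1f} FP4 EFLOPS, {pb:.2f} PB HBM")  # 121.0 FP4 EFLOPS, 2.07 PB HBM
```

The result lines up with the rounded "121 ExaFLOPS" and "about 2 petabytes" figures. That is consistent with both being peak per-chip numbers multiplied out, not measured throughput.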

Why Google split the TPU in two

For seven generations, each TPU was a single design that handled both training and inference. Ironwood, the 2025 part, was already pitched heavily toward inference, but it remained one chip doing both jobs. With the eighth generation, Google decided the two workloads had drifted too far apart to share a die. Training is dominated by dense matrix math across enormous synchronized clusters, while serving modern reasoning and mixture-of-experts models is dominated by memory bandwidth, fast collectives, and low latency at smaller scale. HyperFRAME Research summarized Google's reasoning bluntly: a single topology cannot efficiently serve both dense training and agent-swarm decoding at once.^[5]

So the family forked. The TPU 8t (Sunfish) is the training engine, sold in large superpods. The TPU 8i (Zebrafish) is the serving engine, sold in much smaller pods of roughly 1,024 to 1,152 chips, with three times the on-chip SRAM of the previous generation and a new low-diameter network. This is a different bet from NVIDIA, whose Vera Rubin generation keeps scaling a single GPU design up inside dense racks, and from Amazon, whose Trainium line stays a single SKU.^[2]^[5] Whether splitting the line is genuinely better or simply harder to manage is something the market will sort out over the next few years, but it is a real divergence in philosophy.

Architecture

Coverage of the announcement describes the 8t package as a multi-die design rather than a single monolithic chip: two compute dies plus a separate I/O die, with each compute die attached to several HBM3e stacks. NextPlatform's account also counts four SparseCores in total for embedding-heavy work, with matrix units inside each TensorCore.^[1] StorageReview described the memory subsystem as six 12-high HBM3e stacks totaling 216 GB at about 6.5 TB/s per chip.^[3]
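The reported stack configuration can be checked for internal consistency by dividing the per-chip capacity and bandwidth across six stacks. This is a sketch only. The 1,024-bit per-stack interface is an assumption (the usual HBM3e stack width) and does not come from Google or the cited coverage.

```python
# Per-stack arithmetic for the reported TPU 8t memory subsystem.
# Stack count, stack height, capacity, and bandwidth are the reported figures;
# the 1,024-bit interface width is an ASSUMPTION (typical HBM3e stack width).

TOTAL_HBM_GB = 216
TOTAL_BW_GBPS = 6_528            # GB/s per chip
STACKS = 6
DIES_PER_STACK = 12
ASSUMED_BUS_WIDTH_BITS = 1_024   # hypothetical: not stated in the sources


def per_stack_figures() -> dict[str, float]:
    gb_per_stack = TOTAL_HBM_GB / STACKS
    gb_per_die = gb_per_stack / DIES_PER_STACK
    bw_per_stack = TOTAL_BW_GBPS / STACKS              # GB/s per stack
    # Convert GB/s to Gb/s, then spread across the data pins of one stack.
    gbps_per_pin = bw_per_stack * 8 / ASSUMED_BUS_WIDTH_BITS
    return {
        "GB per stack": gb_per_stack,
        "GB per DRAM die": gb_per_die,
        "GB/s per stack": bw_per_stack,
        "Gb/s per pin (assumed width)": gbps_per_pin,
    }


if __name__ == "__main__":
    for name, value in per_stack_figures().items():
        print(f"{name}: {value:g}")
```

Under these assumptions each stack holds 36 GB (twelve 3 GB, or 24-gigabit, DRAM dies) and moves about 1,088 GB/s. That works out to roughly 8.5 Gb/s per pin, within the range commonly quoted for HBM3e, so the reported capacity and bandwidth hang together.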

The process node is the one spec where public accounts disagree. Some secondary write-ups describe a leading-edge TSMC node in the 2-nanometer class, while NextPlatform's analysis placed the chips on TSMC's N3 (3-nanometer) family.^[1] Google did not state a node in its own materials, so the exact geometry should be treated as unconfirmed. The chips are hosted by Google's own Axion Arm-based CPU rather than an x86 host, continuing the move toward a fully Google-designed stack.^[4]^[6]

The per-chip and per-pod numbers Google published in its technical deep dive are summarized below, with Ironwood shown for comparison where figures are available.

Specification	TPU 8t (Sunfish, training)	TPU 8i (Zebrafish, inference)	TPU Ironwood (v7)
Peak FP4 compute per chip	12.6 PFLOPS	10.1 PFLOPS	not directly comparable
HBM capacity per chip	216 GB (HBM3e)	288 GB (HBM3e)	192 GB
HBM bandwidth per chip	6,528 GB/s	8,601 GB/s	~7,400 GB/s
On-chip SRAM (Vmem)	128 MB	384 MB (3x prior gen)	smaller
Die configuration (reported)	2 compute dies + I/O die, six 12-Hi HBM3e stacks	single compute die, four HBM stacks	single die
Scale-up domain	9,600-chip superpod	up to ~1,024 to 1,152-chip pod	9,216-chip superpod
Superpod shared HBM	~2 PB	smaller	smaller
Superpod compute	~121 FP4 ExaFLOPS	~11.6 FP8 ExaFLOPS per pod	~1/3 the per-pod 8t figure
Network topology	3D torus	Boardfly (low-diameter)	3D torus
Host CPU	Google Axion (Arm)	Google Axion (Arm)	Axion

Sources: Google Cloud technical deep dive for per-chip compute, memory, SRAM, and superpod size; NextPlatform and StorageReview for die and stack configuration.^[6]^[1]^[3]

A few of these figures are worth a second look. The 8t actually carries a little more memory than Ironwood (216 GB versus 192 GB) but slightly less per-chip bandwidth, which NextPlatform pegged at roughly 11 to 12 percent below Ironwood's. Google's generational gains therefore come less from raw memory bandwidth and more from the FP4 number format, larger pods, and a much faster fabric. The inference part leans the other way: the 8i has both more memory and noticeably higher bandwidth per chip than the 8t, which makes sense given that serving is bandwidth-bound.^[1]^[6]
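A simple roofline model makes this bandwidth trade-off concrete. Dividing peak compute by memory bandwidth gives the "ridge point." This is the arithmetic intensity (FLOPs per byte of HBM traffic) a kernel must exceed before compute, not memory, becomes the limit. The sketch uses the table's figures, with Ironwood's bandwidth taken as the rounded ~7.4 TB/s. The roofline framing is an illustrative simplification, not a Google metric.

```python
# Simple roofline comparison using the per-chip figures from the table.
# Ironwood's bandwidth is the table's rounded ~7.4 TB/s. The roofline model
# (peak FLOPS divided by memory bandwidth) is an illustrative simplification.

CHIPS = {
    # name: (peak FP4 PFLOPS, HBM capacity in GB, HBM bandwidth in GB/s)
    "TPU 8t": (12.6, 216, 6_528),
    "TPU 8i": (10.1, 288, 8_601),
}
IRONWOOD_HBM_GB = 192
IRONWOOD_BW_GBPS = 7_400  # rounded from ~7.4 TB/s


def ridge_point(pflops: float, bw_gbps: float) -> float:
    """FLOPs per byte of HBM traffic above which a kernel becomes compute-bound."""
    return (pflops * 1e15) / (bw_gbps * 1e9)


def percent_change(new: float, old: float) -> float:
    return (new - old) / old * 100


if __name__ == "__main__":
    _, hbm_8t, bw_8t = CHIPS["TPU 8t"]
    print(f"8t vs Ironwood HBM capacity:  {percent_change(hbm_8t, IRONWOOD_HBM_GB):+.1f}%")
    print(f"8t vs Ironwood HBM bandwidth: {percent_change(bw_8t, IRONWOOD_BW_GBPS):+.1f}%")
    for name, (pflops, _, bw) in CHIPS.items():
        print(f"{name} ridge point: {ridge_point(pflops, bw):,.0f} FLOPs/byte")
```

The 8t comes out at about 1,930 FLOPs per byte and the 8i at about 1,174. In this model, the 8i therefore offers roughly 1.6x more memory bandwidth per peak FLOP, which suits low-intensity, decode-heavy serving. The bandwidth gap to Ironwood lands at about 11.8 percent, in line with NextPlatform's 11 to 12 percent.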

Pod and superpod scale

The 9,600-chip superpod is the unit Google sells for training, and it is a single shared-memory domain rather than a loose collection of racks. Inside it, roughly 2 petabytes of pooled HBM let very large models and their optimizer states stay in fast memory without constant spilling. The 8t chips are linked by a 3D torus inter-chip interconnect that, according to Google, doubles Ironwood's scale-up bandwidth; Google also cites up to 4x more data-center-network bandwidth than Ironwood.^[6]
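In a 3D torus, each chip links directly to six neighbors, one in each direction along the X, Y, and Z axes, and every axis wraps around into a ring. The sketch below computes neighbors, shortest-path hop counts, and network diameter for a hypothetical 16 × 20 × 30 arrangement. That shape multiplies out to 9,600 chips, but it is an assumption, since the sources cited here do not give the 8t superpod's torus dimensions.

```python
# Neighbor and hop-count arithmetic for a 3D torus interconnect.
# HYPOTHETICAL shape: 16 x 20 x 30 = 9,600 chips. The cited sources do not
# give the actual 8t torus dimensions.

Coord = tuple[int, int, int]
DIMS: Coord = (16, 20, 30)


def neighbors(coord: Coord, dims: Coord = DIMS) -> list[Coord]:
    """The six directly linked chips; each axis wraps around into a ring."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            c = list(coord)
            c[axis] = (c[axis] + step) % dims[axis]
            result.append((c[0], c[1], c[2]))
    return result


def hops(a: Coord, b: Coord, dims: Coord = DIMS) -> int:
    """Minimal hop count: on each axis, go the short way around the ring."""
    total = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y)
        total += min(d, n - d)
    return total


def diameter(dims: Coord = DIMS) -> int:
    """Worst-case hop count between any two chips in the torus."""
    return sum(n // 2 for n in dims)


if __name__ == "__main__":
    print(neighbors((0, 0, 0)))
    print(hops((0, 0, 0), (15, 19, 29)))  # 3: one wraparound hop per axis
    print(diameter())                     # 33 = 8 + 10 + 15
```

Torus diameter grows with the length of each axis. In this hypothetical shape, the worst case is 33 hops. Shortening such paths is what a low-diameter network like the 8i's Boardfly is designed for, while a torus maps naturally onto the ring-style collectives used in large synchronized training.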

Above the superpod, Google described a data-center fabric called Virgo that can stitch up to about 134,000 TPU 8t chips into one non-blocking network, quoted at 47 petabits per second of bisection bandwidth, and then extend past a million chips across multiple sites for the largest training runs.^[6]^[2] Tom's Hardware framed the million-TPU cluster ceiling as a deliberate counter to NVIDIA's rack-scale approach, the idea being that Google competes on fabric and pod-level throughput rather than on winning any single socket-to-socket comparison. By NextPlatform's reckoning, NVIDIA's Vera Rubin R200 still holds something like a 3:1 edge in raw compute per socket, so Google's argument rests on scale and price rather than peak per-chip horsepower.^[1]^[2]
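A little arithmetic puts the Virgo figures in superpod terms. The sketch assumes "non-blocking" means full bisection bandwidth, so a bisection bandwidth B across N chips implies a per-chip injection rate of roughly B / (N / 2). That reading, and the per-chip number it produces, are assumptions rather than published specifications.

```python
# Scale arithmetic for the Virgo data-center fabric.
# ASSUMPTION: "non-blocking" is read as full bisection bandwidth, so
# bisection bandwidth B over N chips implies a per-chip rate r with B = (N / 2) * r.

VIRGO_CHIPS = 134_000       # "about 134,000" TPU 8t chips
BISECTION_PBPS = 47.0       # quoted bisection bandwidth, petabits per second
CHIPS_PER_SUPERPOD = 9_600


def superpods_in_fabric(chips: int = VIRGO_CHIPS) -> float:
    """How many 9,600-chip superpods the fabric spans."""
    return chips / CHIPS_PER_SUPERPOD


def implied_per_chip_gbps(chips: int = VIRGO_CHIPS,
                          bisection_pbps: float = BISECTION_PBPS) -> float:
    """Per-chip rate (Gb/s) implied by full bisection: r = B / (N / 2)."""
    bisection_gbps = bisection_pbps * 1e6  # 1 Pb/s = 1,000,000 Gb/s
    return bisection_gbps / (chips / 2)


if __name__ == "__main__":
    print(f"{superpods_in_fabric():.1f} superpods")
    print(f"{implied_per_chip_gbps():.0f} Gb/s per chip")
```

On that reading, a 134,000-chip Virgo fabric spans about 14 superpods and would give each chip on the order of 700 Gb/s of cross-fabric bandwidth.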

Performance versus Ironwood

Google's comparisons are all drawn against Ironwood, the seventh-generation part it shipped in 2025. The cleanest claims:

About 3x the compute per pod, driven mostly by higher per-chip throughput with the FP4 format; the superpod itself grows only modestly, from Ironwood's 9,216 chips to 9,600.^[4]^[1]
Up to 2.7x better training price-performance, the figure Google lists in its technical deep dive table. The launch keynote and some coverage cite "up to 2.8x" instead, so the headline figure depends on which Google source you read.^[6]^[4]^[5]
About 2x better performance per watt across the generation; NextPlatform's estimates put the training part close to 2x and the inference part at roughly 1.8x.^[4]^[1]
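Because the superpod grew only slightly between generations, the per-pod claim can be split into a pod-size factor and a per-chip factor. This sketch uses the chip counts from the comparison table and treats Google's rounded "about 3x" as exact, which is an approximation.

```python
# Split Google's per-pod compute claim into pod-size and per-chip factors.
# Chip counts come from the comparison table; the "about 3x" is Google's
# rounded figure, treated here as exact for illustration.

TPU8T_SUPERPOD_CHIPS = 9_600
IRONWOOD_SUPERPOD_CHIPS = 9_216
CLAIMED_POD_SPEEDUP = 3.0


def pod_size_factor() -> float:
    """Gain from having more chips in the pod, holding per-chip speed fixed."""
    return TPU8T_SUPERPOD_CHIPS / IRONWOOD_SUPERPOD_CHIPS


def implied_per_chip_factor(pod_speedup: float = CLAIMED_POD_SPEEDUP) -> float:
    """Per-chip gain left over once the pod-size factor is divided out."""
    return pod_speedup / pod_size_factor()


if __name__ == "__main__":
    print(f"pod-size factor: {pod_size_factor():.3f}")          # 1.042
    print(f"per-chip factor: {implied_per_chip_factor():.2f}")  # 2.88
```

The larger superpod contributes only about 4 percent. That leaves an implied per-chip gain of roughly 2.9x, the side of the ledger where Google emphasizes the FP4 format.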

The companion 8i claims up to 80 percent better performance-per-dollar than Ironwood for inference, which Google translates into serving roughly twice as many users at the same cost.^[4]^[6] These are vendor numbers measured on Google's own workloads, so they are best read as directional rather than independently benchmarked. Pricing for either chip was not announced, and several analysts noted that a fuller technical accounting is expected in a Hot Chips 2026 paper.^[1]
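Performance-per-dollar multipliers are easier to compare when converted into the share of the old bill needed for the same work. The sketch assumes each vendor multiplier applies uniformly to the workload in question, which real jobs will not always match. The function name is illustrative.

```python
# Convert a performance-per-dollar multiplier into relative cost for fixed work.
# ASSUMPTION: the vendor multiplier applies uniformly to the workload.

def cost_fraction(perf_per_dollar_gain: float) -> float:
    """Share of the previous-generation cost needed to do the same work."""
    return 1.0 / perf_per_dollar_gain


CLAIMS = {
    "8t training (deep-dive table, 2.7x)": 2.7,
    "8t training (keynote, 2.8x)": 2.8,
    "8i inference (+80%, i.e. 1.8x)": 1.8,
}

if __name__ == "__main__":
    for label, gain in CLAIMS.items():
        print(f"{label}: {cost_fraction(gain):.0%} of Ironwood's cost")
```

This gives roughly 37, 36, and 56 percent. The 2.7x versus 2.8x discrepancy is therefore worth about one percentage point of cost. Reading the same 1.8x the other way, as 1.8 times the work per dollar, gives the "roughly twice as many users at the same cost" framing.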

Availability

Both eighth-generation TPUs were shown as a preview at Cloud Next, with Sundar Pichai holding the physical chips on stage. Google said the parts will be generally available to Google Cloud customers later in 2026 through its AI Hypercomputer platform, with interested customers able to request access ahead of time.^[4]^[2] That 2026 window is consistent across Google's own posts, Tom's Hardware, and NextPlatform; a stray report of a late-2027 launch appears to conflate the chips with a longer fleet ramp and is not supported by Google's stated timeline.^[4]^[1]^[2]

Like earlier TPUs, the 8t will not be sold as standalone hardware. It is meant to be rented as part of Google Cloud's managed AI Hypercomputer stack, alongside the Axion CPUs, the Virgo and Boardfly networks, and Google's storage and software layers. That packaging is itself part of the strategy: Google is selling an integrated training system, and the 8t is the engine at the center of it.

References

[1] Timothy Prickett Morgan, "With TPU 8, Google Makes GenAI Systems Much Better, Not Just Bigger," The Next Platform, April 24, 2026. https://www.nextplatform.com/compute/2026/04/24/with-tpu-8-google-makes-genai-systems-much-better-not-just-bigger/
[2] "Inside Google's TPU V8 strategy, delivering two chips for two crucial tasks at incredible scale," Tom's Hardware, April 2026. https://www.tomshardware.com/tech-industry/semiconductors/google-splits-its-tpu-into-two-chips-for-the-first-time-with-training-and-inference-variants
[3] "Google Announces TPU 8t Sunfish and TPU 8i Zebrafish," StorageReview, May 1, 2026. https://www.storagereview.com/news/google-announces-tpu-8t-sunfish-and-tpu-8i-zebrafish
[4] "Our eighth generation TPUs: two chips for the agentic era," Google (The Keyword blog), April 22, 2026. https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/
[5] "Google Cloud Next 2026: Google Cloud Bifurcates the AI Future," HyperFRAME Research, April 22, 2026. https://hyperframeresearch.com/2026/04/22/google-cloud-next-2026-google-cloud-bifurcates-the-ai-future-specialized-tpu-8t-and-8i-architectures-signal-the-end-of-general-purpose-silicon/
[6] "TPU 8t and TPU 8i technical deep dive," Google Cloud Blog, April 2026. https://cloud.google.com/blog/products/compute/tpu-8t-and-tpu-8i-technical-deep-dive
