Google TPU 8t
Last reviewed
Jun 3, 2026
Sources
6 citations
Review status
Source-backed
Revision
v1 · 1,568 words
Google TPU 8t is the training-focused member of Google's eighth-generation Tensor Processing Unit family, previewed at Google Cloud Next on April 22, 2026. It was announced alongside a sibling inference chip, the TPU 8i, and the pairing marks the first time Google has split a single TPU generation into two purpose-built parts: one tuned for large-scale model training and one tuned for low-latency serving. Several outlets report that the 8t carries the internal codename Sunfish and that it was co-designed with Broadcom, while the inference part (codenamed Zebrafish) was developed with MediaTek.[1][2][3] Google itself frames both chips as the product of a decade of TPU development built in partnership with Google DeepMind.[4]
The headline number Google led with is scale. A single TPU 8t superpod links 9,600 chips into one memory domain, holds about 2 petabytes of shared high bandwidth memory, and delivers roughly 121 FP4 ExaFLOPS of compute, close to three times the per-pod compute of the seventh-generation TPU Ironwood.[4][1] Google positions the chip as a way to compress the frontier model development cycle "from months to weeks," and the strategic message is that the agentic AI era rewards specialized silicon over a single general-purpose accelerator.[4][5]
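The pod-level headline numbers follow directly from Google's per-chip figures: 12.6 PFLOPS of FP4 compute and 216 GB of HBM per chip, as listed in the specification table. The sketch below multiplies them out. To give a feel for what 2 PB means for training, it also applies an outside rule of thumb that is not a Google figure: mixed-precision training with the Adam optimizer needs roughly 16 bytes of memory per parameter before activations are counted.

```python
# Per-chip TPU 8t figures from Google's technical deep dive (see the spec table).
CHIPS_PER_SUPERPOD = 9_600
FP4_PFLOPS_PER_CHIP = 12.6   # peak FP4 petaFLOPS per chip
HBM_GB_PER_CHIP = 216        # HBM3e capacity per chip, in GB

# Assumption (outside rule of thumb, not from Google): mixed-precision Adam needs
# ~16 bytes per parameter = 16-bit weights (2) + 16-bit grads (2) + fp32 master
# weights (4) + two fp32 Adam moments (8). Activations and overheads are ignored.
BYTES_PER_PARAM_ADAM = 16


def superpod_totals(chips: int = CHIPS_PER_SUPERPOD) -> dict:
    """Aggregate peak FP4 compute, pooled HBM, and a rough parameter ceiling."""
    fp4_exaflops = chips * FP4_PFLOPS_PER_CHIP / 1_000   # 1 EF = 1,000 PF
    hbm_pb = chips * HBM_GB_PER_CHIP / 1_000_000         # 1 PB = 1,000,000 GB (decimal)
    max_params = hbm_pb * 1e15 / BYTES_PER_PARAM_ADAM    # bytes / (bytes per param)
    return {"fp4_exaflops": fp4_exaflops, "hbm_pb": hbm_pb, "max_params": max_params}


totals = superpod_totals()
print(f"{totals['fp4_exaflops']:.1f} FP4 EF")          # 121.0 FP4 EF
print(f"{totals['hbm_pb']:.2f} PB HBM")                # 2.07 PB HBM
print(f"~{totals['max_params'] / 1e12:.0f}T params")   # ~130T params
```

The ~130-trillion-parameter figure is a paper ceiling only. Activations, sharding overhead, and any replication of weights or optimizer state would reduce it considerably in practice.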
For seven generations, each TPU was a single design that handled both training and inference. Ironwood, the 2025 part, was already pitched heavily toward inference, but it remained one chip doing both jobs. With the eighth generation Google decided the two workloads had drifted too far apart to share a die. Training is dominated by dense matrix math across enormous synchronized clusters, while serving modern reasoning and mixture-of-experts models is dominated by memory bandwidth, fast collectives, and low latency at smaller scale. HyperFRAME Research summarized Google's reasoning bluntly: a single topology cannot efficiently serve both dense training and agent-swarm decoding at once.[5]
So the family forked. The TPU 8t (Sunfish) is the training engine, offered in large superpods. The TPU 8i (Zebrafish) is the serving engine, offered in much smaller pods of roughly 1,024 to 1,152 chips, with three times the on-chip SRAM of the previous generation and a new low-diameter network. This is a different bet from NVIDIA, whose Vera Rubin generation keeps scaling a single GPU design inside dense racks, and from Amazon, whose Trainium line remains a single SKU.[2][5] Whether splitting the line is genuinely better or simply harder to manage will become clear over the next few years, but it is a real divergence in philosophy.
Coverage of the announcement describes the 8t package as a multi-die design rather than a single monolithic chip. NextPlatform reported two compute dies and one I/O die, with four SparseCores in total for embedding-heavy work and matrix units inside each TensorCore, and with each compute die feeding its memory through several stacks of HBM3e.[1] StorageReview described the memory subsystem as six 12-high HBM3e stacks totaling 216 GB at about 6.5 TB/s per chip.[3]
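The reported stack configuration can be cross-checked against Google's per-chip totals. The sketch below is a minimal version that assumes capacity and bandwidth are split evenly across the six stacks, and the stacks evenly between the two compute dies. Neither split has been published.

```python
# Reported TPU 8t memory configuration (StorageReview / NextPlatform) and Google's
# per-chip bandwidth. Per-stack and per-die values are derived here under an
# even-split assumption; they are not published figures.
HBM_STACKS = 6
DRAM_DIES_PER_STACK = 12   # "12-high" HBM3e stacks
COMPUTE_DIES = 2
TOTAL_HBM_GB = 216
TOTAL_HBM_GBPS = 6_528     # GB/s per chip

gb_per_stack = TOTAL_HBM_GB / HBM_STACKS                      # capacity per stack (GB)
gbit_per_dram_die = gb_per_stack / DRAM_DIES_PER_STACK * 8    # GB per die -> Gbit per die
gbps_per_stack = TOTAL_HBM_GBPS / HBM_STACKS                  # bandwidth per stack (GB/s)
stacks_per_compute_die = HBM_STACKS // COMPUTE_DIES           # assumes an even split

print(gb_per_stack, gbit_per_dram_die, gbps_per_stack, stacks_per_compute_die)
# 36.0 24.0 1088.0 3
```

The derived 36 GB per stack is consistent with 12-high stacks built from 24 Gbit DRAM dies. The roughly 1.09 TB/s per stack is simply the published per-chip bandwidth divided six ways.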
The process node is the one spec where public accounts disagree. Some secondary write-ups describe a leading-edge TSMC node in the 2-nanometer class, while NextPlatform's analysis placed the chips on TSMC's N3 (3-nanometer) family.[1] Google did not state a node in its own materials, so the exact process should be treated as unconfirmed. The chips are hosted by Google's own Arm-based Axion CPU rather than an x86 host, continuing the move toward a fully Google-designed stack.[4][6]
The per-chip and per-pod numbers Google published in its technical deep dive are summarized below, with Ironwood shown for comparison where figures are available.
| Specification | TPU 8t (Sunfish, training) | TPU 8i (Zebrafish, inference) | TPU Ironwood (v7) |
|---|---|---|---|
| Peak FP4 compute per chip | 12.6 PFLOPS | 10.1 PFLOPS | not directly comparable |
| HBM capacity per chip | 216 GB (HBM3e) | 288 GB (HBM3e) | 192 GB |
| HBM bandwidth per chip | 6,528 GB/s (~6.5 TB/s) | 8,601 GB/s (~8.6 TB/s) | ~7.4 TB/s |
| On-chip SRAM (Vmem) | 128 MB | 384 MB (3x prior gen) | smaller |
| Die configuration (reported) | 2 compute dies + I/O die, six 12-Hi HBM3e stacks | single compute die, four HBM stacks | single die |
| Scale-up domain | 9,600-chip superpod | up to ~1,024 to 1,152-chip pod | 9,216-chip superpod |
| Superpod shared HBM | ~2 PB | smaller | smaller |
| Superpod compute | ~121 FP4 ExaFLOPS | ~11.6 FP8 ExaFLOPS per pod | ~1/3 the per-pod 8t figure |
| Network topology | 3D torus | Boardfly (low-diameter) | 3D torus |
| Host CPU | Google Axion (Arm) | Google Axion (Arm) | Axion |
Sources: Google Cloud technical deep dive for per-chip compute, memory, SRAM, and superpod size; NextPlatform and StorageReview for die and stack configuration.[6][1][3]
A few of these figures are worth a second look. The 8t actually carries a little more memory than Ironwood (216 GB versus 192 GB) but slightly less per-chip bandwidth, roughly 11 to 12 percent lower by NextPlatform's estimate. The 8t's gains therefore come less from raw memory bandwidth and more from the FP4 number format, larger pods, and a much faster fabric. The inference part goes the other way: the 8i has more memory and higher per-chip bandwidth than both the 8t and Ironwood, which makes sense given that serving is bandwidth-bound.[1][6]
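The comparisons in this paragraph are simple ratios of the table's per-chip figures. The sketch below recomputes them. It also adds one derived metric that is not from Google or NextPlatform: HBM bytes deliverable per peak FP4 FLOP, a rough indicator of how bandwidth-rich a chip is relative to its compute. Ironwood's bandwidth uses the table's rounded ~7.4 TB/s, so those percentages are approximate.

```python
# Per-chip figures from the specification table in this article.
SPECS = {
    # name:      (HBM capacity GB, HBM bandwidth GB/s, peak FP4 PFLOPS or None)
    "TPU 8t":   (216, 6_528, 12.6),
    "TPU 8i":   (288, 8_601, 10.1),
    "Ironwood": (192, 7_400, None),  # rounded bandwidth; no comparable FP4 figure
}


def pct_change(new: float, old: float) -> float:
    """Percentage change going from `old` to `new`."""
    return (new / old - 1) * 100


def bytes_per_flop(name: str) -> float:
    """HBM bytes deliverable per peak FP4 FLOP (a derived, illustrative metric)."""
    _, bw_gbps, pflops = SPECS[name]
    return (bw_gbps * 1e9) / (pflops * 1e15)


for new, old in [("TPU 8t", "Ironwood"), ("TPU 8i", "TPU 8t")]:
    cap = pct_change(SPECS[new][0], SPECS[old][0])
    bw = pct_change(SPECS[new][1], SPECS[old][1])
    print(f"{new} vs {old}: capacity {cap:+.1f}%, bandwidth {bw:+.1f}%")
# TPU 8t vs Ironwood: capacity +12.5%, bandwidth -11.8%
# TPU 8i vs TPU 8t: capacity +33.3%, bandwidth +31.8%

print(f"{bytes_per_flop('TPU 8t'):.2e} vs {bytes_per_flop('TPU 8i'):.2e} bytes/FLOP")
# 5.18e-04 vs 8.52e-04 bytes/FLOP
```

On these figures the 8i offers about 1.6 times as many HBM bytes per peak FLOP as the 8t, which fits the bandwidth-bound framing of serving. Peak FLOPS rarely match sustained throughput, so the metric is only a coarse comparison.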
The 9,600-chip superpod is the unit Google offers for training, and it is a single shared-memory domain rather than a loose collection of racks. Inside it, roughly 2 petabytes of pooled HBM lets very large models and their optimizer states sit in fast memory without constantly spilling to slower tiers. Google connects 8t chips with a 3D torus inter-chip interconnect that reportedly doubles Ironwood's scale-up bandwidth, and it increases data-center-network bandwidth by up to 4x.[6]
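In a 3D torus, each chip links directly to six neighbors, and each axis wraps around into a ring. The sketch below computes neighbors, shortest hop counts, and the worst-case hop count (the network's diameter). It uses a hypothetical 20 × 20 × 24 shape chosen only because it multiplies to 9,600; Google has not published the 8t superpod's actual torus dimensions. The calculation shows why hop counts grow with pod size, which is the cost that low-diameter networks such as the 8i's Boardfly aim to reduce.

```python
from itertools import product

# Hypothetical torus shape: 20 * 20 * 24 = 9,600 chips. Not a published layout.
SUPERPOD_SHAPE = (20, 20, 24)


def torus_neighbors(coord, dims):
    """Six wraparound neighbors of `coord` in a 3D torus of shape `dims`."""
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def torus_hops(a, b, dims):
    """Minimum hop count between chips a and b: shorter way round each ring, summed."""
    return sum(min(abs(p - q), n - abs(p - q)) for p, q, n in zip(a, b, dims))


def torus_diameter(dims):
    """Worst-case hop count: half of each ring (rounded down), summed over axes."""
    return sum(n // 2 for n in dims)


# Sanity check on a small 4x4x4 torus: the farthest chip from the origin is exactly
# `torus_diameter` hops away (every node in a torus sees the same distances).
small = (4, 4, 4)
assert max(torus_hops((0, 0, 0), c, small)
           for c in product(range(4), repeat=3)) == torus_diameter(small)

print(len(torus_neighbors((0, 0, 0), SUPERPOD_SHAPE)))  # 6 direct links per chip
print(torus_diameter(SUPERPOD_SHAPE))                  # 32 hops worst case
```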
Above the superpod, Google described a data-center fabric called Virgo that can stitch up to about 134,000 TPU 8t chips into one non-blocking network, quoted at 47 petabits per second of bisection bandwidth, and then extend past a million chips across multiple sites for the largest training runs.[6][2] Tom's Hardware framed the million-TPU cluster ceiling as a deliberate counter to NVIDIA's rack-scale approach, the idea being that Google competes on fabric and pod-level throughput rather than on winning any single socket-to-socket comparison. By NextPlatform's reckoning, NVIDIA's Vera Rubin R200 still holds something like a 3:1 edge in raw compute per socket, so Google's argument rests on scale and price rather than peak per-chip horsepower.[1][2]
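The Virgo figures can be put in superpod terms with simple division. The sketch below does this as an averaging exercise only. It assumes the 47 Pb/s figure is a single aggregate number and divides it evenly across chips, which says nothing about real traffic patterns or per-link speeds.

```python
# Figures quoted for Google's Virgo data-center fabric and the TPU 8t superpod.
VIRGO_CHIPS = 134_000
VIRGO_BISECTION_BPS = 47e15   # 47 petabits per second of bisection bandwidth
SUPERPOD_CHIPS = 9_600

# How many 9,600-chip superpods' worth of chips one Virgo fabric spans.
superpods = VIRGO_CHIPS / SUPERPOD_CHIPS
full_superpods = VIRGO_CHIPS // SUPERPOD_CHIPS

# Rough even per-chip share of the quoted bisection bandwidth (illustrative only).
per_chip_gbps = VIRGO_BISECTION_BPS / VIRGO_CHIPS / 1e9

print(f"{superpods:.2f} superpods ({full_superpods} full)")  # 13.96 superpods (13 full)
print(f"~{per_chip_gbps:.0f} Gb/s per chip")                 # ~351 Gb/s per chip
```

In other words, one Virgo fabric is roughly 14 superpods stitched together.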
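Google's comparisons are all drawn against Ironwood, the seventh-generation part it shipped in 2025. For the 8t, the headline claims are roughly three times the per-pod compute, double the scale-up interconnect bandwidth, and up to 4x the data-center-network bandwidth.[4][6]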
The companion 8i claims up to 80 percent better performance-per-dollar than Ironwood for inference, which Google translates into serving roughly twice as many users at the same cost.[4][6] These are vendor numbers measured on Google's own workloads, so they are best read as directional rather than independently benchmarked. Pricing for either chip was not announced, and several analysts noted that a fuller technical accounting is expected in a Hot Chips 2026 paper.[1]
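Google turns 80 percent better performance-per-dollar into roughly twice as many users with straightforward arithmetic. That holds if user count scales linearly with throughput at a fixed budget, which is an assumption: latency targets can cap how many users a chip actually serves. A minimal sketch:

```python
def serving_economics(perf_per_dollar_gain: float) -> tuple[float, float]:
    """Translate a relative perf-per-dollar gain into (users at same cost, cost per unit work).

    Both results are relative to the baseline chip (baseline = 1.0) and assume
    throughput, and therefore user count, scales linearly with spend.
    """
    ratio = 1 + perf_per_dollar_gain   # e.g. 0.80 -> 1.8x work per dollar
    users_at_same_cost = ratio         # the same spend buys `ratio` times the work
    cost_per_unit_work = 1 / ratio     # each unit of work costs 1/ratio as much
    return users_at_same_cost, cost_per_unit_work


users, cost = serving_economics(0.80)
print(f"{users:.1f}x users, {(1 - cost) * 100:.0f}% lower cost per request")
# 1.8x users, 44% lower cost per request
```

The same 80 percent gain can be read either as 1.8 times the users for the same spend or as about 44 percent lower cost per request. "Roughly twice as many users" rounds up the first reading.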
Both eighth-generation TPUs were shown as a preview at Cloud Next, with Sundar Pichai holding the physical chips on stage. Google said the parts will be generally available to Google Cloud customers later in 2026 through its AI Hypercomputer platform, with interested customers able to request access ahead of time.[4][2] That 2026 window is consistent across Google's own posts, Tom's Hardware, and NextPlatform; a stray report of a late-2027 launch appears to conflate the chips with a longer fleet ramp and is not supported by Google's stated timeline.[4][1][2]
Like earlier TPUs, the 8t will not be sold as standalone hardware. It is meant to be rented as part of Google Cloud's managed AI Hypercomputer stack, alongside the Axion CPUs, the Virgo and Boardfly networks, and Google's storage and software layers. That packaging is itself part of the strategy: Google is selling an integrated training system, and the 8t is the engine at the center of it.