Google Virgo Network
Last reviewed: Jun 3, 2026
Sources: 10 citations
Review status: Source-backed
Revision: v1 · 1,435 words
Virgo Network is a megascale data-center fabric that Google introduced at Google Cloud Next 2026 in April 2026. It connects large fleets of AI accelerators inside a single building and across multiple sites, and it is the scale-out networking layer of Google's AI Hypercomputer platform. A single Virgo fabric can link about 134,000 of Google's eighth-generation TPU 8t training chips with up to 47 petabits per second of non-blocking bisection bandwidth, and the design extends past one million TPUs when multiple fabrics are stitched together across data centers into one training cluster.[1][2][3] Google also offers Virgo to customers running large NVIDIA GPU deployments, where it supports up to 80,000 Vera Rubin GPUs in a single data center and up to 960,000 across multiple sites.[4][5]
Google described Virgo as a reimagined scale-out network "custom-built for the stringent demands of modern AI workloads," built around what it calls a campus-as-a-computer philosophy: treating an entire data-center campus as one machine rather than a collection of separate clusters.[2][6] The fabric was detailed in a Google Cloud networking blog post by Benny Siman-Tov, a senior director of product management, and Arjun Singh, an engineering fellow at Google Cloud.[2]
Training and serving frontier AI models is bottlenecked less by raw compute than by how fast accelerators can talk to each other. Modern models are split across thousands of chips, and the chips must constantly exchange gradients, activations, and parameters. In these synchronized workloads a single late packet can stall the entire job, so the network, not the silicon, often sets the ceiling on useful throughput.
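The stall effect can be made concrete with a minimal Python sketch. It assumes a fully synchronous step in which no worker proceeds until every worker has finished its exchange, so the step time is the maximum over all workers. The worker count and timings are invented for illustration and do not come from Google.

```python
def step_time_ms(compute_ms, comm_ms):
    """Time for one synchronous step: the slowest worker gates everyone."""
    return max(c + n for c, n in zip(compute_ms, comm_ms))


# Hypothetical numbers: 1,000 workers, 10 ms compute and 2 ms communication each.
workers = 1000
compute = [10.0] * workers
comm = [2.0] * workers

baseline = step_time_ms(compute, comm)

# One worker's exchange is delayed by 5 ms (for example, by a queued packet).
comm_delayed = comm.copy()
comm_delayed[417] += 5.0
delayed = step_time_ms(compute, comm_delayed)

print(f"baseline step:   {baseline:.1f} ms")
print(f"one late worker: {delayed:.1f} ms")
print(f"slowdown for all {workers} workers: {delayed / baseline - 1:.1%}")
```

Under these assumptions a delay on one worker out of a thousand lengthens the step for all of them, which is why tail latency rather than average latency tends to dominate in this kind of workload.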
Engineers usually split this communication into two regimes. Scale-up refers to the very high bandwidth, very low latency links inside a tightly coupled group of accelerators, such as a rack or pod, where chips behave almost like one large processor. Scale-out refers to the wider fabric that connects many of those pods together across a building or campus. NVIDIA's NVLink is a scale-up interconnect, and its current generation tops out at 576 GPUs in a single NVLink domain; reaching the thousands or hundreds of thousands of accelerators a large training run needs means crossing into the scale-out network.[3] Virgo is Google's answer for that scale-out tier. It carries the east-west traffic between pods, which may be racks of TPUs or GPUs assembled in a scale-up configuration, and ties them into a single compute domain.[1][7]
The most consequential change in Virgo is structural. Conventional large data-center networks use a three-tier Clos design, a spine-and-leaf arrangement extended with an additional switching layer, where traffic between distant nodes hops through several switching layers. Each extra hop is another place where packets can queue, and queuing delay is what hurts tightly synchronized AI jobs. Virgo collapses that into a flat, two-layer non-blocking topology.[1][7][8]
It does this with high-radix switches, meaning switches with a very large number of ports. By packing more ports into each switch, Google can connect more endpoints with fewer tiers, which cuts the hop count between any two accelerators and limits the cumulative chance of queuing along the path.[1][8] Sameh Boujelbene of Dell'Oro Group summarized the logic this way: flattening "reduces hop count and creates more direct, predictable paths," which matters most for synchronized workloads where one delayed packet stalls the whole run.[8]
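The link between port count and tier count can be sketched with the textbook sizing rule for a non-blocking folded Clos, in which every switch below the top tier splits its ports evenly between downlinks and uplinks. The Python below is a sketch under that assumption. The radix values are illustrative only, since the sources cited here do not state Virgo's switch radix or exact wiring, and the function names are hypothetical.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Endpoints a non-blocking folded Clos of `tiers` layers can attach,
    built from identical switches with `radix` ports each."""
    if radix < 2 or radix % 2 or tiers < 1:
        raise ValueError("radix must be an even number >= 2 and tiers >= 1")
    return 2 * (radix // 2) ** tiers


def worst_case_switch_hops(tiers: int) -> int:
    """Switches crossed on the longest path: up to the top tier and back down."""
    return 2 * tiers - 1


for radix in (64, 128, 256, 512):  # illustrative radixes, not Virgo's
    for tiers in (2, 3):
        print(
            f"radix {radix:>3}, {tiers} tiers: "
            f"{max_endpoints(radix, tiers):>10,} endpoints, "
            f"up to {worst_case_switch_hops(tiers)} switch hops"
        )
```

In this model the endpoint count of a two-tier fabric grows with the square of the radix, so each doubling of port count quadruples the number of endpoints reachable within three switch hops instead of five.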
Virgo is multi-planar, splitting the fabric into several independent switching planes with separate control domains. If hardware fails in one plane, the failure is isolated rather than rippling across the whole network.[2][8] The fabric layers into the rest of Google's infrastructure rather than replacing it: a scale-up domain handles bandwidth inside a pod, the Virgo scale-out accelerator fabric (an RDMA network) handles pod-to-pod east-west traffic, and Google's existing Jupiter network continues to serve north-south traffic to storage and general compute.[2][7] In other words, Virgo does not retire Jupiter; it works alongside it, with Jupiter providing the front-end path and Virgo the dedicated accelerator backbone.[7]
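The division of labor among the three layers can be expressed as a small routing rule. The following Python sketch encodes only what the paragraph above states. The `Endpoint` type, its field names, and the returned labels are hypothetical and are not part of any Google API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    kind: str                  # "accelerator", "storage", or "general_compute"
    pod: Optional[str] = None  # scale-up domain ID; required for accelerators

    def __post_init__(self):
        if self.kind == "accelerator" and self.pod is None:
            raise ValueError("an accelerator must belong to a pod")


def network_for(src: Endpoint, dst: Endpoint) -> str:
    """Pick the layer that would carry traffic between two endpoints."""
    if src.kind == "accelerator" and dst.kind == "accelerator":
        if src.pod == dst.pod:
            return "scale-up"  # traffic stays inside one pod
        return "virgo"         # pod-to-pod, east-west
    return "jupiter"           # north-south to storage and general compute


tpu_a = Endpoint("accelerator", pod="pod-1")
tpu_b = Endpoint("accelerator", pod="pod-1")
tpu_c = Endpoint("accelerator", pod="pod-2")
disk = Endpoint("storage")

print(network_for(tpu_a, tpu_b))
print(network_for(tpu_a, tpu_c))
print(network_for(tpu_a, disk))
```

The sketch leaves out the multi-planar aspect: in the real fabric, pod-to-pod traffic would additionally be assigned to one of several independent planes.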
Google's headline numbers position Virgo as the largest networking step it has taken for AI in years.
| Metric | Reported figure |
|---|---|
| TPU 8t chips in a single fabric (one data center) | ~134,000 |
| Bisection bandwidth, single fabric | up to 47 Pb/s, non-blocking |
| Aggregate compute reachable in one fabric | ~1.6 million ExaFLOPS |
| TPUs across multiple sites in one training cluster | more than 1,000,000 |
| Bandwidth per accelerator vs prior generation | up to 4x |
| Unloaded fabric latency vs prior generation | ~40% lower |
| NVIDIA Vera Rubin GPUs, single data center | up to 80,000 |
| NVIDIA Vera Rubin GPUs, across sites | up to 960,000 |
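A few back-of-the-envelope ratios follow directly from the table. The Python below is only arithmetic on the reported figures. It treats "about", "up to", and "more than" values as exact, which they are not, and it assumes every site or fabric is filled to its stated maximum.

```python
import math

# Reported figures from the table (approximate or upper-bound values).
TPUS_PER_FABRIC = 134_000
BISECTION_BPS = 47e15        # 47 Pb/s
MULTI_SITE_TPUS = 1_000_000  # "more than", so a lower bound
GPUS_PER_SITE = 80_000
MULTI_SITE_GPUS = 960_000

# Bisection bandwidth divided evenly over every chip in one fabric.
bisection_per_tpu_gbps = BISECTION_BPS / TPUS_PER_FABRIC / 1e9

# Fewest full single-site fabrics needed to reach one million TPUs.
min_tpu_fabrics = math.ceil(MULTI_SITE_TPUS / TPUS_PER_FABRIC)

# Sites implied by the GPU maximums.
gpu_sites = MULTI_SITE_GPUS / GPUS_PER_SITE

print(f"bisection share per TPU: {bisection_per_tpu_gbps:.0f} Gb/s")
print(f"fabrics for 1M TPUs:     at least {min_tpu_fabrics}")
print(f"GPU sites implied:       {gpu_sites:.0f}")
```

The per-chip figure, roughly 351 Gb/s, is a share of bisection bandwidth, not a per-chip link speed. Depending on how the bisection is counted, the injection bandwidth per accelerator in a non-blocking fabric could be about twice that value.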
The 4x bandwidth-per-accelerator and 40% lower unloaded latency figures compare Virgo, paired with the TPU 8t chip, against the previous generation of Google's accelerator network; they are not a comparison with Jupiter, which Virgo does not replace.[1][2] Google frames these gains around goodput, the share of time accelerators spend doing useful work rather than waiting on the network or recovering from faults. The design targets more than 97% goodput, using sub-millisecond telemetry, automated straggler and hang detection, and rerouting around failed links so that a localized problem does not idle a whole cluster.[2][9] Optical circuit switching, which Google has been refining in Jupiter for years, lets the fabric reconfigure around failures without operator intervention.[9]
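Goodput as defined above is a simple ratio, and a short Python sketch shows what a 97% target implies. The time breakdown is invented for illustration, and the function is a hypothetical helper, not Google's measurement method.

```python
def goodput(useful_s: float, network_wait_s: float, recovery_s: float) -> float:
    """Share of wall-clock time spent on useful work."""
    total = useful_s + network_wait_s + recovery_s
    if total <= 0:
        raise ValueError("total time must be positive")
    return useful_s / total


DAY_S = 24 * 60 * 60
TARGET = 0.97

# Time per day that may be lost to waiting and recovery at the target.
loss_budget_min = DAY_S * (1 - TARGET) / 60

# Hypothetical day: 1,500 s waiting on the network, 1,000 s recovering from faults.
example = goodput(useful_s=83_900, network_wait_s=1_500, recovery_s=1_000)

print(f"loss budget at {TARGET:.0%}: {loss_budget_min:.1f} minutes per day")
print(f"example goodput: {example:.2%}")
print("meets target" if example > TARGET else "misses target")
```

At a 97% target, all network stalls and fault recovery together must fit into about 43 minutes per day, which is why fast detection and automatic rerouting matter more than peak bandwidth for this metric.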
Virgo was co-designed with Google's eighth-generation TPUs, announced at the same event. That generation split into two chips for the first time: TPU 8t, a high-throughput training part, and TPU 8i, an inference and reasoning chip aimed at low-latency agentic and Mixture-of-Experts workloads.[3][10] Virgo is tuned for TPU 8t in particular, working with the chip's SparseCore dataflow processors to offload data-dependent all-gather operations and head off communication bottlenecks during training.[6][7]
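For readers unfamiliar with the collective named above, the following Python sketch simulates a ring all-gather, one common way to implement the operation. It is a generic illustration: the cited sources do not describe the algorithm Virgo or SparseCore uses, and the data-dependent aspect is not modeled here.

```python
def ring_all_gather(shards):
    """Simulate a ring all-gather: every rank ends up with every shard,
    ordered by the rank that originally owned it."""
    n = len(shards)
    # gathered[r][i] holds shard i once rank r has received it.
    gathered = [[None] * n for _ in range(n)]
    for r in range(n):
        gathered[r][r] = shards[r]

    for step in range(n - 1):
        # Collect all sends first so every rank forwards "simultaneously".
        sends = []
        for r in range(n):
            idx = (r - step) % n  # the shard rank r received most recently
            sends.append((idx, gathered[r][idx]))
        # Each rank passes its chunk to the next rank around the ring.
        for r in range(n):
            idx, chunk = sends[r]
            gathered[(r + 1) % n][idx] = chunk
    return gathered


# Four ranks, each starting with one shard of a larger tensor.
result = ring_all_gather([[1, 2], [3, 4], [5, 6], [7, 8]])
for rank, view in enumerate(result):
    print(f"rank {rank}: {view}")
```

With n ranks the ring needs n - 1 communication steps, and every step involves every rank, so one slow link delays the whole collective. That dependency is the kind of bottleneck the offload described above is meant to head off.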
The fabric is not limited to Google silicon. It also underpins the new A5X bare-metal instances built on NVIDIA's Vera Rubin NVL72 rack-scale systems, where it serves as the scale-out network knitting those racks together. The A5X machines use NVIDIA ConnectX-9 SuperNICs to connect into Virgo, and the integration draws on the open Falcon networking protocol being developed by members of the Open Compute Project.[4][5] All of this sits under AI Hypercomputer, Google Cloud's integrated stack of accelerators, storage, networking, and software, where Virgo provides the connective tissue meant to turn a campus, and eventually several campuses, into something that behaves like one supercomputer.[2][6]
Virgo lands in a market where the major approaches to AI networking are diverging. NVIDIA pairs NVLink for dense scale-up domains with Spectrum-X Ethernet, which combines switches and data-processing units to manage congestion across GPU clusters. Broadcom supplies the high-radix switching silicon, its Tomahawk 6 and Jericho lines, that underpins many large Ethernet AI fabrics and provides the port density flat topologies depend on.[8]
Google's distinguishing move is co-design. As a hyperscaler that builds its own TPUs, switches, and software together, it can optimize the whole stack rather than assembling merchant parts, and it has chosen to treat tail-latency consistency, not peak throughput, as the metric that matters most for AI clusters.[8] Analyst Ron Westfall characterized the fabric as reimagining the data center as a campus-as-a-computer and treating tail latency as a hardware-reliability issue rather than a tuning afterthought.[8] The same blueprint extends to NVIDIA hardware through A5X, which means Virgo competes with NVIDIA's own scale-out networking even while hosting NVIDIA GPUs. Whether the flat two-layer approach holds up at the million-accelerator scale Google is targeting will depend heavily on the optics and traffic distribution underneath, since at extreme size flattening alone cannot fully prevent congestion from concentrating.[8]