Google Virgo Network

Updated Jun 3, 2026

Virgo Network is a megascale data-center fabric introduced by Google at Google Cloud Next 2026 in April 2026. It connects large fleets of AI accelerators inside a single building and across multiple sites, and it is the scale-out networking layer of Google's AI Hypercomputer platform. A single Virgo fabric can link about 134,000 of Google's eighth-generation TPU 8t training chips with roughly 47 petabits per second of non-blocking bisection bandwidth, and the design extends past one million TPUs when stitched together across data centers into one training cluster.^[1]^[2]^[3] Google also offers Virgo to customers running large NVIDIA GPU deployments, where it supports up to 80,000 Vera Rubin GPUs in a single data center and up to 960,000 across multiple sites.^[4]^[5]

Google described Virgo as a reimagined scale-out network "custom-built for the stringent demands of modern AI workloads," built around what it calls a campus-as-a-computer philosophy: treating an entire data-center campus as one machine rather than a collection of separate clusters.^[2]^[6] The fabric was detailed in a Google Cloud networking blog post by Benny Siman-Tov, a senior director of product management, and Arjun Singh, an engineering fellow at Google Cloud.^[2]

The problem Virgo addresses

Training and serving frontier AI models is bottlenecked less by raw compute than by how fast accelerators can talk to each other. Modern models are split across thousands of chips, and the chips must constantly exchange gradients, activations, and parameters. In these synchronized workloads a single late packet can stall the entire job, so the network, not the silicon, often sets the ceiling on useful throughput.
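The stall effect can be illustrated with a minimal sketch. It assumes a fully synchronous step in which every worker must finish its exchange before any worker proceeds. The worker count and timings are hypothetical, chosen only to show the arithmetic, and are not figures from Google.

```python
def step_time(worker_times_ms):
    """A synchronous step finishes only when the slowest worker finishes."""
    return max(worker_times_ms)


def efficiency(worker_times_ms):
    """Share of the step that workers spend busy rather than waiting."""
    busy = sum(worker_times_ms)
    available = len(worker_times_ms) * step_time(worker_times_ms)
    return busy / available


# Hypothetical cluster: 1,000 workers that each need 100 ms per step,
# except one whose exchange arrives 50 ms late.
times = [100.0] * 1000
times[0] = 150.0

print(step_time(times))             # 150.0
print(round(efficiency(times), 3))  # 0.667
```

One slow worker out of a thousand is enough to idle the rest for a third of the step, which is why the tail of the latency distribution matters more than the average.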

Engineers usually split this communication into two regimes. Scale-up refers to the very high bandwidth, very low latency links inside a tightly coupled group of accelerators, such as a rack or pod, where chips behave almost like one large processor. Scale-out refers to the wider fabric that connects many of those pods together across a building or campus. NVIDIA's NVLink is a scale-up interconnect, and its current generation tops out at 576 GPUs in a single NVLink domain; reaching the thousands or hundreds of thousands of accelerators a large training run needs means crossing into the scale-out network.^[3] Virgo is Google's answer for that scale-out tier. It carries the east-west traffic between pods, which may be racks of TPUs or GPUs assembled in a scale-up configuration, and ties them into a single compute domain.^[1]^[7]

Topology and switch design

The most consequential change in Virgo is structural. Conventional large data-center networks use a three-tier Clos design, an extension of the familiar spine-and-leaf layout with a further switching layer on top, where traffic between distant nodes hops through several switching layers. Each extra hop is another place where packets can queue, and queuing delay is what hurts tightly synchronized AI jobs. Virgo collapses that into a flat, two-layer non-blocking topology.^[1]^[7]^[8]

It does this with high-radix switches, meaning switches with a very large number of ports. By packing more ports into each switch, Google can connect more endpoints with fewer tiers, which cuts the hop count between any two accelerators and limits the cumulative chance of queuing along the path.^[1]^[8] Sameh Boujelbene of Dell'Oro Group summarized the logic this way: flattening "reduces hop count and creates more direct, predictable paths," which matters most for synchronized workloads where one delayed packet stalls the whole run.^[8]
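The link between port count and tier count can be sketched with the standard capacity formula for a non-blocking folded Clos (fat-tree) built from identical switches. This is textbook arithmetic, not Virgo's published design: the radix values below are hypothetical, and the sources cited here do not state the port count of Virgo's switches.

```python
def max_hosts(radix: int, tiers: int) -> int:
    """Endpoints a non-blocking folded Clos of identical switches can attach.

    Below the top tier, each switch splits its ports half down and half up,
    which gives 2 * (radix / 2) ** tiers endpoints in total.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number of at least 2")
    if tiers < 1:
        raise ValueError("tiers must be at least 1")
    return 2 * (radix // 2) ** tiers


def max_switch_hops(tiers: int) -> int:
    """Switches crossed on the longest path: up to the top tier and back down."""
    return 2 * tiers - 1


# Hypothetical radix values, for illustration only.
for radix in (64, 128, 512):
    for tiers in (2, 3):
        print(radix, tiers, max_hosts(radix, tiers), max_switch_hops(tiers))
```

With 64-port switches, two tiers reach 2,048 endpoints and three tiers reach 65,536. Raising the radix lets two tiers cover a population that would otherwise need a third tier, and the worst-case path drops from five switches to three.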

Virgo is multi-planar, splitting the fabric into several independent switching planes with separate control domains. If hardware fails in one plane, the failure is isolated rather than rippling across the whole network.^[2]^[8] The fabric layers into the rest of Google's infrastructure rather than replacing it: a scale-up domain handles bandwidth inside a pod, the Virgo scale-out accelerator fabric (an RDMA network) handles pod-to-pod east-west traffic, and Google's existing Jupiter network continues to serve north-south traffic to storage and general compute.^[2]^[7] In other words, Virgo does not retire Jupiter; it works alongside it, with Jupiter providing the front-end path and Virgo the dedicated accelerator backbone.^[7]
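Plane-level fault isolation can be modeled with a small sketch. It assumes traffic is spread evenly over whichever planes are healthy. The `Plane` class, the plane count, and the capacities are hypothetical illustrations rather than details of Google's implementation.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    capacity_gbps: float
    healthy: bool = True


def usable_capacity(planes):
    """Capacity that remains when only healthy planes carry traffic."""
    return sum(p.capacity_gbps for p in planes if p.healthy)


def spread(flow_ids, planes):
    """Assign flows round-robin across the healthy planes."""
    healthy = [p for p in planes if p.healthy]
    if not healthy:
        raise RuntimeError("no healthy plane available")
    return {f: healthy[i % len(healthy)].name for i, f in enumerate(flow_ids)}


# Hypothetical fabric: four independent planes of equal capacity.
planes = [Plane(f"plane-{i}", 100.0) for i in range(4)]
planes[1].healthy = False  # a failure confined to one plane

print(usable_capacity(planes))    # 300.0
print(spread(range(6), planes))   # no flow lands on plane-1
```

In this model a single plane failure removes one quarter of the capacity and leaves the other planes untouched, instead of disturbing every path in the fabric.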

Bandwidth, latency, and scale

Google's headline figures for Virgo, as reported in its announcement and in press coverage, are summarized below.

Metric	Reported figure
TPU 8t chips in a single fabric (one data center)	~134,000
Bisection bandwidth, single fabric	up to 47 Pb/s, non-blocking
Aggregate compute reachable in one fabric	~1.6 million ExaFLOPS
TPUs across multiple sites in one training cluster	more than 1,000,000
Bandwidth per accelerator vs prior generation	up to 4x
Unloaded fabric latency vs prior generation	~40% lower
NVIDIA Vera Rubin GPUs, single data center	up to 80,000
NVIDIA Vera Rubin GPUs, across sites	up to 960,000

The 4x bandwidth-per-accelerator and 40% lower unloaded latency figures are Google's comparisons against the previous generation of its accelerator network, paired with the TPU 8t chip, not a claim about replacing Jupiter.^[1]^[2] Google frames these gains around goodput, the share of time accelerators spend doing useful work rather than waiting on the network or recovering from faults. The design targets more than 97% goodput, using sub-millisecond telemetry, automated straggler and hang detection, and rerouting around failed links so that a localized problem does not idle a whole cluster.^[2]^[9] Optical circuit switching, which Google has been refining in Jupiter for years, lets the fabric reconfigure around failures without operator intervention.^[9]
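A minimal sketch of goodput accounting and straggler flagging follows. Only the 97% target comes from the sources above. The median-based threshold, the function names, and the sample timings are assumptions made for illustration and do not describe Google's telemetry pipeline.

```python
from statistics import median


def goodput(useful_seconds: float, total_seconds: float) -> float:
    """Share of wall-clock time spent on useful work."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return useful_seconds / total_seconds


def lost_time_budget(total_seconds: float, target: float) -> float:
    """Seconds that may be lost to stalls and recovery while meeting target."""
    return (1.0 - target) * total_seconds


def find_stragglers(step_times_ms: dict, factor: float = 1.5) -> list:
    """Flag workers whose step time exceeds factor times the median."""
    cutoff = factor * median(step_times_ms.values())
    return sorted(w for w, t in step_times_ms.items() if t > cutoff)


DAY = 24 * 60 * 60  # seconds in one day

# A 97% target over one day leaves about 43 minutes for all stalls combined.
print(round(lost_time_budget(DAY, 0.97) / 60, 1))  # 43.2

# Hypothetical per-worker step times in milliseconds.
samples = {"w0": 100.0, "w1": 102.0, "w2": 99.0, "w3": 180.0}
print(find_stragglers(samples))  # ['w3']
```

The budget calculation shows why detection has to be fast: at this target, a cluster that takes several minutes to notice and route around each fault can afford only a handful of faults per day.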

Relationship to TPUs, GPUs, and AI Hypercomputer

Virgo was co-designed with Google's eighth-generation TPUs, announced at the same event. That generation split into two chips for the first time: TPU 8t, a high-throughput training part, and TPU 8i, an inference and reasoning chip aimed at low-latency agentic and Mixture-of-Experts workloads.^[3]^[10] Virgo is tuned for TPU 8t in particular, working with the chip's SparseCore dataflow processors to offload data-dependent all-gather operations and head off communication bottlenecks during training.^[6]^[7]

The fabric is not limited to Google silicon. It also underpins the new A5X bare-metal instances built on NVIDIA's Vera Rubin NVL72 rack-scale systems, where it serves as the scale-out network knitting those racks together. The A5X machines use NVIDIA ConnectX-9 SuperNICs to connect into Virgo, and the integration draws on the open Falcon networking protocol being developed by members of the Open Compute Project.^[4]^[5] All of this sits under AI Hypercomputer, Google Cloud's integrated stack of accelerators, storage, networking, and software, where Virgo provides the connective tissue meant to turn a campus, and eventually several campuses, into something that behaves like one supercomputer.^[2]^[6]
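The size of the scale-out job on NVIDIA hardware can be estimated from figures already cited in this article. The sketch assumes 72 GPUs per NVL72 rack, as the product name indicates, and treats the "up to" limits as exact. The resulting counts are rough illustrations, not deployment figures from Google or NVIDIA.

```python
import math

GPUS_PER_NVL72_RACK = 72      # assumption taken from the NVL72 product name
NVLINK_DOMAIN_MAX = 576       # largest single NVLink domain cited above
SINGLE_SITE_GPUS = 80_000     # Virgo limit in one data center
MULTI_SITE_GPUS = 960_000     # Virgo limit across sites

# Racks the scale-out fabric would join in one data center.
racks = math.ceil(SINGLE_SITE_GPUS / GPUS_PER_NVL72_RACK)

# Scale-up domains needed even if every domain were at the NVLink maximum.
domains = math.ceil(SINGLE_SITE_GPUS / NVLINK_DOMAIN_MAX)

# Multi-site limit expressed as a multiple of the single-site limit.
site_multiple = MULTI_SITE_GPUS / SINGLE_SITE_GPUS

print(racks, domains, site_multiple)  # 1112 139 12.0
```

Even at the largest NVLink domain size, a single site still needs well over a hundred scale-up domains tied together, which is the gap the scale-out fabric fills.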

Comparison to NVIDIA and Broadcom fabrics

Virgo lands in a market where the major approaches to AI networking are diverging. NVIDIA pairs NVLink for dense scale-up domains with Spectrum-X Ethernet, which combines switches and data-processing units to manage congestion across GPU clusters. Broadcom supplies the high-radix switching silicon, its Tomahawk 6 and Jericho lines, that underpins many large Ethernet AI fabrics and provides the port density flat topologies depend on.^[8]

Google's distinguishing move is co-design. As a hyperscaler that builds its own TPUs, switches, and software together, it can optimize the whole stack rather than assembling merchant parts, and it has chosen to treat tail-latency consistency, not peak throughput, as the metric that matters most for AI clusters.^[8] Analyst Ron Westfall characterized the fabric as reimagining the data center as a campus-as-a-computer and treating tail latency as a hardware-reliability issue rather than a tuning afterthought.^[8] The same blueprint extends to NVIDIA hardware through A5X, which means Virgo competes with NVIDIA's own scale-out networking even while hosting NVIDIA GPUs. Whether the flat two-layer approach holds up at the million-accelerator scale Google is targeting will depend heavily on the optics and traffic distribution underneath, since at extreme size flattening alone cannot fully prevent congestion from concentrating.^[8]

References

1. Google Cloud Blog, "Introducing Virgo Network megascale data center fabric." https://cloud.google.com/blog/products/networking/introducing-virgo-megascale-data-center-fabric
2. Benny Siman-Tov and Arjun Singh, Google Cloud Blog, "Introducing Virgo Network megascale data center fabric" (April 2026). https://cloud.google.com/blog/products/networking/introducing-virgo-megascale-data-center-fabric
3. Anton Shilov, Tom's Hardware, "Inside Google's TPU V8 strategy... network scales up to 1 million TPUs per cluster" (April 22, 2026). https://www.tomshardware.com/tech-industry/semiconductors/google-splits-its-tpu-into-two-chips-for-the-first-time-with-training-and-inference-variants
4. Google Cloud Blog, "AI infrastructure at Next '26." https://cloud.google.com/blog/products/compute/ai-infrastructure-at-next26
5. SDxCentral, "Google Cloud cozies up to Nvidia's Vera Rubin for new bare metal A5X AI inference VMs." https://www.sdxcentral.com/news/google-cloud-cozies-up-to-nvidias-vera-rubin-for-new-bare-metal-a5x-ai-inference-vms/
6. SDxCentral, "Google unveils Virgo, the scale-out fabric ditching three-layer networks to unify AI accelerators." https://www.sdxcentral.com/news/google-unveils-virgo-the-scale-out-fabric-ditching-three-layer-networks-to-unify-ai-accelerators/
7. Google Cloud Blog, "Data center and global networks built for AI era." https://cloud.google.com/blog/products/networking/data-center-and-global-networks-built-for-ai-era
8. Data Center Knowledge, "How Google's Virgo Fabric Signals Shift in AI Network Design." https://www.datacenterknowledge.com/infrastructure/google-s-virgo-fabric-signals-shift-in-ai-network-design
9. HPCwire, "Google Bolsters AI Hypercomputer with New TPU Chips, Virgo Interconnect, Speedier Lustre" (April 22, 2026). https://www.hpcwire.com/2026/04/22/google-bolsters-ai-hypercomputer-with-new-tpu-chips-virgo-interconnect-speedier-lustre/
10. The Register, "Google dual tracks TPU 8 to conquer training and inference" (April 22, 2026). https://www.theregister.com/2026/04/22/google_tpu8_dual_track_training_inference/
