NVIDIA Spectrum-XGS
Last reviewed
Jun 7, 2026
Sources
15 citations
Review status
Source-backed
Revision
v1 · 2,135 words
NVIDIA Spectrum-XGS, marketed in full as NVIDIA Spectrum-XGS Ethernet, is a networking technology from NVIDIA designed to link several geographically separated data centers into a single, coordinated AI computing system that the company calls a "giga-scale AI super-factory." NVIDIA announced it on August 22, 2025, in connection with the Hot Chips 2025 conference held August 24 to 26 at Stanford University, describing it as a "scale-across" capability that extends the company's existing Spectrum-X Ethernet platform. Rather than introducing new switch silicon, Spectrum-XGS adds distance-adaptive software and algorithms so that a single training or inference job can run across data centers in different buildings, cities, or countries while the network keeps behaving like one fabric. NVIDIA frames scale-across as a third tier of AI networking, alongside scale-up and scale-out, and named the cloud provider CoreWeave as its first announced adopter. [1][2][3]
Large AI models are trained by spreading a single computation across many GPUs that constantly exchange gradients and activations. The speed of that exchange, not raw compute alone, often sets how quickly a deep learning job finishes, which is why interconnect technology has become a central battleground in AI infrastructure. As frontier models have grown, the GPU counts needed to train them have outrun what a single building can house, because individual sites are reaching the limits of available power, cooling, and floor space. The practical response is to run one job across two or more data centers, but ordinary Ethernet was never built for that. Over the tens or hundreds of kilometers that separate facilities, off-the-shelf Ethernet introduces high latency, jitter, and unpredictable throughput, which stalls the tightly synchronized collective operations that distributed training depends on. [1][4][5]
Spectrum-XGS is NVIDIA's answer to that gap. It is not a standalone product line so much as a capability layered onto the Spectrum-X Ethernet platform, combining Spectrum-X switches with NVIDIA ConnectX-8 SuperNICs and a set of "distance-aware" algorithms. According to Gilad Shainer, NVIDIA's senior vice president of networking, "It's not a new hardware element, but it's leveraging the Spectrum-X infrastructure, and the new algorithms effectively move more data across longer distances." NVIDIA director Dave Salvator described the goal as letting separate data centers "essentially act as one gigantic GPU." The first organization publicly committed to deploying it is CoreWeave, whose cofounder and chief technology officer Peter Salanki said the company would use it to "connect our data centers into a single, unified supercomputer." [2][3][6]
NVIDIA organizes AI networking into three nested tiers, and Spectrum-XGS introduces the outermost one.
Scale-up is the connection of GPUs inside a single server or rack so that they behave like one large accelerator. NVIDIA's NVLink and NVLink Switch fabric handle this tier, tying the GPUs in a system together with very high bandwidth and very low latency over distances measured in centimeters.
Scale-out is the connection of many servers into a cluster inside one data center. This is the job of the base Spectrum-X Ethernet platform, or alternatively NVIDIA's InfiniBand fabric, which links thousands of nodes across a building so they can train or serve a model together. Distances here span the data center hall, typically up to a few hundred meters.
Scale-across, the tier Spectrum-XGS adds, connects whole data centers to one another. NVIDIA describes scale-across as taking effect beyond roughly 500 meters and extending over "tens or hundreds of miles, across cities or even states and countries." The intent is that the boundary between facilities becomes largely invisible to the workload: a job scheduled across two campuses sees one continuous, predictable network rather than a fast local cluster bolted onto a slow long-haul link. NVIDIA chief executive Jensen Huang summarized the framing this way: "With NVIDIA Spectrum-XGS Ethernet, we add scale-across to scale-up and scale-out capabilities to link data centers across cities, nations and continents into vast, giga-scale AI super-factories." [2][4][7]
It is worth distinguishing Spectrum-XGS from Spectrum-X Photonics and from the base Spectrum-X platform. Base Spectrum-X is the scale-out fabric for a single site, and Spectrum-X Photonics is a co-packaged-optics physical layer for that fabric. Spectrum-XGS reuses the same Spectrum-X switches and SuperNICs but adds the algorithmic layer needed to make the fabric span sites. In NVIDIA's telling, the three names describe the same family of hardware applied to progressively larger distances. [2][8]
Because Spectrum-XGS is primarily a software and algorithmic addition, its mechanisms center on compensating for distance rather than on new chips. NVIDIA describes four main techniques, all "fully integrated into the Spectrum-X platform." [1][4]
The first is distance-adaptive, auto-adjusted congestion control. Standard Ethernet treats a rack-level hop and a continent-spanning hop the same way, which leads to either packet loss or queue buildup over long links. Spectrum-XGS instead measures the actual distance and traffic conditions between sites and tunes its congestion thresholds accordingly. NVIDIA notes that signals propagate through fiber at roughly 5 nanoseconds per meter, so a 1 kilometer span adds about 5 microseconds of one-way delay; the algorithms account for this propagation time so that the SuperNIC sets an appropriate injection rate depending on whether the destination is co-located or remote. [4][6]
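The propagation arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not NVIDIA's algorithm: the function names are hypothetical, the 5 nanoseconds per meter figure is the approximation quoted above, and the bandwidth-delay calculation is a standard networking rule of thumb added here to show how much data is in flight on a long link before the sender can hear back from the receiver.

```python
# Approximate propagation delay in optical fiber, as quoted in the text.
NS_PER_METER = 5.0


def one_way_delay_us(distance_m: float) -> float:
    """One-way propagation delay in microseconds over a fiber span."""
    return distance_m * NS_PER_METER / 1000.0


def bytes_in_flight(distance_m: float, link_gbps: float) -> float:
    """Bandwidth-delay product: bytes a sender can emit during one round trip.

    This is a generic rule of thumb, not a Spectrum-XGS parameter.
    Gb/s multiplied by nanoseconds gives bits, so divide by 8 for bytes.
    """
    round_trip_ns = 2 * distance_m * NS_PER_METER
    return link_gbps * round_trip_ns / 8
```

With these assumptions, a 10 kilometer span adds 50 microseconds of one-way delay, and an 800 Gb/s link over that span has about 10 MB in flight per round trip. That is the amount of data a sender commits before any feedback from the remote site can arrive, which is one way to see why injection rates may need to depend on distance.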
The second is precision latency management, which keeps the added delay of distance from translating into unpredictable jitter. Rather than relying on the deep packet buffers that traditional long-haul Ethernet uses to absorb bursts (and which introduce variable, hard-to-predict delays), Spectrum-XGS leans on end-to-end, telemetry-based congestion management to hold performance steady. The third technique is the end-to-end telemetry itself: a continuous feedback loop that lets the network observe and react to conditions along the full path between data centers. The fourth is routing that is both distance-aware and "NCCL-aware," meaning it is tuned to the communication patterns of the NVIDIA Collective Communications Library, the software that GPUs use to perform operations such as all-reduce during training. [4][6]
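To see why added latency matters for a collective such as all-reduce, the following sketch uses the textbook latency-bandwidth cost model for a ring all-reduce. It is not a description of how NCCL or Spectrum-XGS schedules traffic: the function name is hypothetical, and the model assumes that every one of the 2(p - 1) ring steps pays the same per-step latency and moves one chunk of size n/p over a link of fixed bandwidth.

```python
def ring_allreduce_seconds(
    message_bytes: float,
    ranks: int,
    link_gbps: float,
    step_latency_us: float,
) -> float:
    """Estimated ring all-reduce time under a simple latency-bandwidth model.

    A ring all-reduce over `ranks` participants takes 2 * (ranks - 1) steps
    (reduce-scatter followed by all-gather), each sending message_bytes / ranks.
    """
    if ranks < 2:
        raise ValueError("a ring all-reduce needs at least two ranks")
    steps = 2 * (ranks - 1)
    chunk_bytes = message_bytes / ranks
    bytes_per_second = link_gbps * 1e9 / 8
    per_step_seconds = step_latency_us * 1e-6 + chunk_bytes / bytes_per_second
    return steps * per_step_seconds
```

Under this model, an 8 GB all-reduce across 8 ranks at 800 Gb/s takes about 0.14 seconds with zero latency and about 0.1407 seconds with 50 microseconds per step, so large messages are dominated by bandwidth. For an 8 MB message the per-step transfer time is only about 10 microseconds, so the same 50 microseconds of latency dominates each step. Because every step waits on the slowest hop, variable delay on a long link can stall all participants, not just the remote ones.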
NVIDIA attributes specific performance gains to this approach. The company states that Spectrum-XGS "nearly doubles the performance of the NVIDIA Collective Communications Library." In its technical materials NVIDIA quantifies this as up to 1.9 times higher NCCL all-reduce bandwidth compared with off-the-shelf Ethernet, based on running NCCL primitives across sites separated by 10 kilometers, with the largest gains appearing on the big message sizes most common in AI training. These figures are NVIDIA's own and should be read as vendor claims. They should also not be confused with the separate, broader Spectrum-X figure of roughly 1.6 times the effective bandwidth of conventional Ethernet, which refers to the base scale-out platform rather than to scale-across. [1][4][9]
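For readers comparing all-reduce bandwidth figures, the sketch below shows how the open-source nccl-tests benchmarks conventionally derive two bandwidth numbers from a timed all-reduce. The source does not say which metric NVIDIA's 1.9 times figure uses, so this is offered only as background; the function name is hypothetical and the inputs are meant to come from a reader's own measurement, not from NVIDIA's results.

```python
def allreduce_bandwidths(
    message_bytes: float, seconds: float, ranks: int
) -> tuple[float, float]:
    """Return (algorithm bandwidth, bus bandwidth) in GB/s for one all-reduce.

    Algorithm bandwidth is message size divided by elapsed time. Bus bandwidth
    rescales it by 2 * (ranks - 1) / ranks, the all-reduce correction factor
    used by nccl-tests, so that results are comparable across rank counts.
    """
    if ranks < 2 or seconds <= 0:
        raise ValueError("need at least two ranks and a positive duration")
    algbw = message_bytes / seconds / 1e9
    busbw = algbw * 2 * (ranks - 1) / ranks
    return algbw, busbw
```

As a consistency check, an 8 GB all-reduce across 8 ranks that completes in 0.14 seconds gives a bus bandwidth of about 100 GB/s, which equals the raw rate of an 800 Gb/s link. A ratio such as "1.9 times" is then the quotient of two such bandwidth values measured under the two network configurations at the same message size.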
| Attribute | Detail |
|---|---|
| Full name | NVIDIA Spectrum-XGS Ethernet |
| Announced | August 22, 2025 (in connection with Hot Chips 2025, August 24 to 26, Stanford University) |
| Category | "Scale-across" Ethernet networking for AI |
| Part of | NVIDIA Spectrum-X Ethernet platform |
| Key hardware | Spectrum-X Ethernet switches plus ConnectX-8 SuperNICs (800 Gb/s class) |
| Core innovation | Distance-adaptive algorithms (a software addition, not new switch silicon) |
| Main techniques | Distance-adaptive, auto-adjusted congestion control; precision latency management; end-to-end telemetry; distance-aware and NCCL-aware routing |
| Performance claim (NVIDIA) | "Nearly doubles" NCCL performance; up to 1.9x NCCL all-reduce bandwidth vs off-the-shelf Ethernet (tested at 10 km) |
| Effective distance | From about 500 meters up to hundreds of kilometers, across cities, states, and countries |
| First announced adopter | CoreWeave |
| Stated purpose | Unite multiple distributed data centers into one "giga-scale AI super-factory" |
| Availability | Offered as part of the Spectrum-X platform |
CoreWeave is the headline early adopter. The GPU-focused cloud provider, which held its initial public offering in March 2025, has been assembling a large multi-site footprint and was a natural fit for a technology that stitches campuses together. In July 2025 CoreWeave agreed to acquire the data center operator Core Scientific in an all-stock deal that, by the company's account, would bring roughly 1.3 gigawatts of gross power across a national footprint under CoreWeave's control, with more than an additional gigawatt of potential expansion. CoreWeave reported about 2.2 gigawatts of total contracted power as of the second quarter of 2025. A provider operating at that scale, across many separate buildings, is exactly the kind of customer for whom a single building can no longer hold a frontier training run. [10][11]
Peter Salanki framed the appeal in terms of unification: "With NVIDIA Spectrum-XGS, we can connect our data centers into a single, unified supercomputer, giving our customers access to giga-scale AI that will accelerate breakthroughs across every industry." At launch, NVIDIA said Spectrum-XGS was available as part of the Spectrum-X Ethernet platform, with CoreWeave among the first to deploy it. [1][2]
Spectrum-XGS arrived during a wider push by NVIDIA to make Ethernet, not just InfiniBand, the default fabric for AI. In the same period NVIDIA highlighted that other large operators were standardizing on Spectrum-X Ethernet for scale-out networking, including Meta, which planned to build switches on NVIDIA Spectrum Ethernet for its Facebook Open Switching System, and Oracle, which said it would build giga-scale AI supercomputers using Spectrum-X switches. Those commitments concern the base Spectrum-X platform rather than the scale-across Spectrum-XGS layer specifically, but together they signal the momentum behind the broader family that Spectrum-XGS extends. [12]
The strategic logic behind Spectrum-XGS is that power, not silicon, has become the binding constraint on AI. Frontier training clusters now draw hundreds of megawatts, and the largest planned "AI factories" are described in gigawatts. Few individual sites can supply that much power, secure the grid connections, or dissipate the resulting heat, so operators are increasingly forced to distribute a single workload across multiple campuses. If the network between those campuses cannot keep the GPUs busy, the distributed cluster runs far below its theoretical capacity and the extra sites add cost without adding much usable throughput. Spectrum-XGS is NVIDIA's attempt to remove that penalty and make multi-site training behave more like single-site training. [1][5][7]
The move also fits NVIDIA's commercial strategy. Independent analysts have noted that NVIDIA became one of the fastest-growing vendors in data center switching, and that Ethernet revenue in the segment is projected to climb into the tens of billions of dollars over the next several years. By owning the scale-up tier (NVLink), the scale-out tier (Spectrum-X and InfiniBand), and now the scale-across tier (Spectrum-XGS), NVIDIA can offer a single, vertically integrated networking stack from inside the server out to the inter-city link, which deepens customer lock-in around its accelerators. [3][5]
There are caveats. The performance numbers are NVIDIA's own and were measured at a relatively short 10 kilometer separation; physics still imposes real latency over longer distances, and how Spectrum-XGS performs across hundreds of kilometers in production remains to be demonstrated publicly. Distributed training across sites also raises practical questions about resiliency, data movement, and cost that a faster network alone does not resolve. Even so, Spectrum-XGS is notable as one of the first commercial networking products explicitly built around the premise that a single AI job will routinely span multiple data centers, and it formalizes "scale-across" as a named tier of AI infrastructure that other vendors are likely to address in their own ways. [4][5][13]