UALink
Last reviewed
May 31, 2026
Sources
14 citations
Review status
Source-backed
Revision
v3 · 1,992 words
UALink (Ultra Accelerator Link) is an open industry standard for scale-up interconnect between AI accelerators. It defines a high-speed, low-latency link that lets a large group of accelerators read and write each other's memory directly inside a single tightly coupled domain that the standard calls a pod. UALink is governed by the UALink Consortium, and its first specification, UALink 200G 1.0, was published in April 2025. [1][2] The standard is positioned as an open alternative to NVIDIA's proprietary NVLink, and it is meant to work alongside scale-out fabrics such as Ultra Ethernet rather than replace them. [3][4]
Training and serving large models splits the networking problem into two layers, and UALink targets only one of them.
The first layer is scale-up. This is the connection among accelerators that act as one large compute resource, often a rack or a few racks wired together. Within a scale-up domain the accelerators behave almost like a single machine. They share memory at the hardware level, so one accelerator can issue a load or a store against the memory of another accelerator and get the data back in well under a microsecond. The traffic here is dense and latency-sensitive, and it is dominated by the all-to-all exchanges that tensor and expert parallelism generate. NVLink is the best-known example of a scale-up fabric, and UALink is built to do the same job through an open specification. [3][5]
The second layer is scale-out. This connects many pods together into a cluster that can hold tens of thousands of accelerators. Scale-out runs over a packet network, usually some form of Ethernet or InfiniBand, and it carries the more bursty, longer-distance traffic of data parallelism and gradient exchange. Ultra Ethernet is the open effort aimed at this tier. [4]
The distinction matters because the two layers have different physics. Scale-up wants memory semantics, very low latency, and a flat address space across a bounded number of devices. Scale-out wants reach, routing, congestion control, and the ability to grow to enormous size. UALink and Ultra Ethernet were designed by overlapping groups of companies precisely so that an open stack could cover both without forcing buyers into one vendor's proprietary fabric for either. [4][6]
The effort started as the UALink Promoter Group, announced on 30 May 2024 by eight companies: AMD, Broadcom, Cisco, Google, Hewlett Packard Enterprise, Intel, Meta, and Microsoft. [6] The group set out to define a single industry standard for high-speed scale-up communication, and it said the first specification would draw on existing technology including AMD's Infinity Fabric, the inter-chip link AMD already used to connect its own GPUs. [6][7]
In October 2024 the group incorporated as the UALink Consortium, a formal standards body. At incorporation the board of directors lined up as AMD, Astera Labs, Amazon Web Services, Cisco, Google, Hewlett Packard Enterprise, Intel, Meta, Microsoft, and Synopsys. [8] Broadcom, one of the original eight promoters, was not part of that board lineup, and reporting at the time linked its lower profile to its own push for Ethernet-based scale-up switching using its Tomahawk and Jericho silicon. [8][5]
Membership runs in tiers. Promoter or board members steer the consortium, contributor members take part in the technical working groups, and adopter members implement the finished specification in products. By the time the 1.0 specification shipped, the consortium reported that it had grown well past 100 members across those tiers, spanning chip designers, hyperscalers, system builders, and intellectual-property vendors. [1][9] One large name is absent by design. NVIDIA is not a member, which fits the fact that UALink competes head-on with its NVLink business. [9][3]
The consortium released UALink 200G 1.0 on 8 April 2025. [1] The name refers to the per-lane signaling target. Each lane runs at up to 200 gigatransfers per second, and the physical layer builds on the IEEE 802.3dj 200 Gb/s per-lane Ethernet PHY, the same electrical foundation that the Ultra Ethernet effort uses. Reusing that PHY lets switch and retimer vendors share signaling work across the scale-up and scale-out layers. [1][2]
A UALink connection can be one, two, or four lanes wide. The specification calls a four-lane group a Station, and at that full x4 width a single link delivers up to 800 Gb/s in each direction. [1][2] The headline scaling number is the pod size. A UALink pod can hold up to 1,024 accelerators joined through UALink switches, each accelerator carrying a 10-bit routing identifier, and inside that pod any accelerator can reach any other. [1][2]
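The arithmetic behind these figures can be sketched in a few lines of Python. The constant and function names are illustrative and do not come from the specification, and the 200 Gb/s per-lane value is the nominal rate implied by the 800 Gb/s x4 figure, ignoring any encoding or protocol overhead.

```python
# Illustrative arithmetic only; names are hypothetical, not from the UALink spec.
LANE_RATE_GBPS = 200          # nominal per-lane rate, per direction
VALID_WIDTHS = (1, 2, 4)      # x1, x2, x4; a four-lane group is a Station
ROUTING_ID_BITS = 10          # width of the per-accelerator routing identifier


def link_bandwidth_gbps(lanes: int) -> int:
    """Nominal bandwidth of one link in Gb/s, per direction."""
    if lanes not in VALID_WIDTHS:
        raise ValueError(f"unsupported link width: x{lanes}")
    return lanes * LANE_RATE_GBPS


def max_pod_size(id_bits: int = ROUTING_ID_BITS) -> int:
    """Number of distinct accelerators an id_bits-wide identifier can address."""
    return 2 ** id_bits


if __name__ == "__main__":
    for lanes in VALID_WIDTHS:
        print(f"x{lanes}: {link_bandwidth_gbps(lanes)} Gb/s per direction")
    print(f"pod ceiling: {max_pod_size()} accelerators")
```

A 10-bit identifier gives 2^10 = 1,024 distinct values, which is where the pod ceiling comes from.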
What makes UALink a scale-up fabric rather than just a fast network is its memory semantics. The protocol carries load, store, and atomic operations, so software can treat remote accelerator memory much like local memory instead of packing data into network messages and waiting for a software stack to move it. [1][2] The specification organizes this into four layers. A physical layer handles the electrical signaling, a data link layer adds framing and error handling, a transaction layer manages the read and write operations, and a protocol layer sits on top. The design is kept deliberately lean so that latency and silicon cost stay low, which is the whole point of a scale-up link. [1][2]
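To make the contrast with message passing concrete, the toy model below treats a pod as a set of per-accelerator memories addressed by accelerator ID and offset. It is a conceptual sketch in plain Python, not UALink's wire protocol or any real API: the `ToyPod` class, its method names, the word-addressed memory, and the choice of fetch-and-add as the example atomic are all assumptions made for illustration.

```python
import threading


class ToyPod:
    """Conceptual model of memory-semantic access inside a pod (hypothetical)."""

    def __init__(self, num_accelerators: int, words_per_accelerator: int):
        if not 1 <= num_accelerators <= 1024:   # 1,024 is the 1.0 pod ceiling
            raise ValueError("a pod holds between 1 and 1,024 accelerators")
        self._mem = [[0] * words_per_accelerator for _ in range(num_accelerators)]
        self._lock = threading.Lock()           # stands in for hardware atomicity

    def load(self, accel_id: int, offset: int) -> int:
        """Read one word from another accelerator's memory."""
        return self._mem[accel_id][offset]

    def store(self, accel_id: int, offset: int, value: int) -> None:
        """Write one word into another accelerator's memory."""
        self._mem[accel_id][offset] = value

    def atomic_add(self, accel_id: int, offset: int, delta: int) -> int:
        """Add delta to a remote word as one indivisible step; return the old value."""
        with self._lock:
            old = self._mem[accel_id][offset]
            self._mem[accel_id][offset] = old + delta
            return old


if __name__ == "__main__":
    pod = ToyPod(num_accelerators=4, words_per_accelerator=16)
    pod.store(accel_id=2, offset=0, value=7)       # remote store
    pod.atomic_add(accel_id=2, offset=0, delta=3)  # remote atomic
    print(pod.load(accel_id=2, offset=0))          # remote load
```

The caller never builds a message or waits on a receive call. It names a remote location and reads or writes it, which is the programming model the load, store, and atomic operations are meant to support.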
| Attribute | UALink 200G 1.0 |
|---|---|
| Released | 8 April 2025 [1] |
| Per-lane rate | Up to 200 GT/s [1][2] |
| Physical layer | Based on IEEE 802.3dj 200 Gb/s Ethernet PHY [1][2] |
| Link widths | x1, x2, x4 (a 4-lane group is a Station) [1][2] |
| Bandwidth at x4 | Up to 800 Gb/s per direction [1][2] |
| Max accelerators per pod | 1,024 [1][2] |
| Topology | Switched, via UALink switches [1] |
| Operation model | Memory-semantic load, store, atomic [1][2] |
| Protocol stack | Physical, data link, transaction, protocol layers [1][2] |
| Latency | Low and deterministic within a pod [1][3] |
| Governing body | UALink Consortium [1] |
NVLink is NVIDIA's in-house scale-up fabric, and it is the incumbent that UALink set out to challenge. The two pursue the same goal from opposite philosophies. NVLink is closed and works only with NVIDIA accelerators, while UALink is published as a multi-vendor specification that any member can build to. [3][9]
NVIDIA also has a long head start. NVLink is in its fifth generation in the Blackwell products, where it provides 1.8 TB/s of total link bandwidth per GPU, and NVIDIA's NVLink Switch ties 72 Blackwell GPUs into one domain in the GB200 NVL72 design, with a roadmap that points toward far larger domains. [10][11] On raw device count the UALink ceiling of 1,024 accelerators per pod is higher than today's shipping NVLink domains, but that is a specification limit rather than a product that buyers can rack today. [2][12] NVLink is mature and deployed at scale, and UALink had no shipping silicon when the 1.0 document arrived. [12][5]
The bandwidth figures are not directly comparable, because NVIDIA quotes aggregate bidirectional bandwidth across many lanes per GPU while the UALink figures are stated per lane or per link, and per direction. The more useful contrast is the model. NVLink locks the scale-up fabric to one supplier, and UALink tries to turn that fabric into a commodity that several accelerator vendors, switch vendors, and retimer vendors can supply. [3][5]
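One way to see why the figures differ is to normalize both to the same unit. The sketch below converts NVIDIA's 1.8 TB/s aggregate figure to Gb/s per direction, assuming decimal units and an even split between the two directions, and then expresses it as a count of x4 UALink links at 800 Gb/s each. This is unit arithmetic only. The sources summarized here do not say how many links a UALink accelerator would carry, so the result is not a product comparison.

```python
import math

NVLINK5_TOTAL_GBYTES_PER_S = 1800   # 1.8 TB/s aggregate, both directions, per GPU
UALINK_X4_GBITS_PER_S = 800         # one x4 link, per direction


def per_direction_gbits(total_gbytes_per_s: int) -> int:
    """Convert an aggregate bidirectional GB/s figure to Gb/s per direction."""
    gbytes_one_way = total_gbytes_per_s // 2   # assumes a symmetric split
    return gbytes_one_way * 8                  # bytes -> bits


nvlink_one_way = per_direction_gbits(NVLINK5_TOTAL_GBYTES_PER_S)
links_to_match = math.ceil(nvlink_one_way / UALINK_X4_GBITS_PER_S)

print(f"NVLink 5: {nvlink_one_way} Gb/s per direction per GPU")
print(f"x4 UALink links needed to match: {links_to_match}")
```

Under those assumptions the NVLink figure works out to 7,200 Gb/s per direction, the equivalent of nine x4 links.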
UALink does not try to be a full data-center network. It stops at the edge of the pod. For traffic between pods, the same broad coalition has been building Ultra Ethernet through the Ultra Ethernet Consortium, an open profile of Ethernet tuned for AI and high-performance computing. The intended split is clean. UALink carries the memory-semantic, low-latency traffic inside a pod, and Ultra Ethernet carries the routed, longer-distance traffic across pods to form a full cluster. [4][6] Because many of the same companies sit in both groups, an open scale-up layer and an open scale-out layer can be combined into a system that avoids proprietary fabrics at either tier. [4]
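The split can be pictured as a simple dispatch rule: traffic whose source and destination share a pod stays on the scale-up fabric, and everything else goes to the scale-out network. The sketch below is a hypothetical illustration. The `Endpoint` type, the `pick_fabric` function, and the idea of a software-visible pod number are assumptions, not part of either specification.

```python
from dataclasses import dataclass
from enum import Enum


class Fabric(Enum):
    SCALE_UP = "UALink"            # memory-semantic, inside one pod
    SCALE_OUT = "Ultra Ethernet"   # routed packets, between pods


@dataclass(frozen=True)
class Endpoint:
    pod: int          # hypothetical pod number within the cluster
    accelerator: int  # accelerator index within that pod


def pick_fabric(src: Endpoint, dst: Endpoint) -> Fabric:
    """Choose the tier that carries traffic from src to dst."""
    return Fabric.SCALE_UP if src.pod == dst.pod else Fabric.SCALE_OUT


if __name__ == "__main__":
    a = Endpoint(pod=0, accelerator=1)
    b = Endpoint(pod=0, accelerator=9)
    c = Endpoint(pod=3, accelerator=1)
    print(pick_fabric(a, b).value)   # same pod
    print(pick_fabric(a, c).value)   # different pods
```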
The interconnect, not just the accelerator, has become one of the strongest sources of lock-in in AI infrastructure. A buyer who commits to NVLink for scale-up has, in practice, committed to NVIDIA accelerators as well, because nothing else speaks the fabric. UALink is an attempt by the rest of the industry to break that coupling and make the scale-up layer an open, shared standard the way PCIe and Ethernet became open standards in earlier eras. [3][6]
The lineup behind it is what gives the effort weight. The promoters and board include the companies that design competing accelerators, the hyperscalers that buy them by the hundreds of thousands, and the connectivity vendors that would build the switches and retimers. [6][8] If the standard succeeds, accelerator makers gain a credible answer to NVLink without each having to invent a fabric, and cloud operators gain leverage and a second source for a part of the stack that has had only one. [5][9]
As of the 1.0 release in 2025, UALink was a specification rather than a product. No shipping accelerators or switches implemented it at launch, and coverage of the release stressed that finishing the document was the easy part compared with building an ecosystem of compliant silicon. [12][5] Industry expectations put the first UALink-based products in 2026 or later, since chips, switch ASICs, and retimers all have to be designed, fabricated, and validated against the standard. [12][5]
The consortium continues to add members and to develop follow-on work through its contributor tier. [1][9] The near-term reality is that NVIDIA's NVLink remains the dominant scale-up fabric in deployed systems, and UALink's success depends on how fast its backers turn the paper standard into hardware that customers can buy. [12][5]
The clearest limitation is timing. A standard with no shipping silicon cannot yet be benchmarked against a fifth-generation fabric that is already running at scale, and the gap between a published spec and a validated product is where many open standards have stalled. [12][5] There is also a question of commitment. Several backers run their own scale-up interconnects, including Google's in-house links, Amazon's accelerator fabric, and Broadcom's Ethernet-based approach, which leaves open how fully each one will standardize on UALink for its own designs. [5][8]
UALink is also narrow by design. It addresses scale-up only, so a complete cluster still needs a separate scale-out network, and the open vision depends on Ultra Ethernet and the surrounding ecosystem maturing in parallel. [4] Finally, the absence of NVIDIA means the standard cannot become truly universal across all accelerators in the market, and its momentum will rise or fall with the share of compute that the consortium's members ship. [9][3]