Ultra Ethernet
Last reviewed
May 31, 2026
Sources
15 citations
Review status
Source-backed
Revision
v1 · 2,733 words
Ultra Ethernet is an open networking specification that reworks Ethernet for large AI training clusters and high performance computing. It is developed by the Ultra Ethernet Consortium (UEC), a group hosted by the Linux Foundation Joint Development Foundation, and its central piece is a new transport protocol called Ultra Ethernet Transport (UET). The consortium published version 1.0 of the specification on June 11, 2025. [1][2] The goal is easy to state and hard to engineer. Take the cheap, ubiquitous, multi vendor Ethernet that already wires up most data centers, and make it good enough to move the enormous, bursty, synchronized traffic that modern AI fabrics generate, so operators do not have to reach for a proprietary fabric like InfiniBand to get top performance. [3][4]
Training a large model spreads the computation across thousands or tens of thousands of accelerators that have to exchange gradients and activations constantly. The network sits on the critical path. When a collective operation such as an all reduce runs, every participating GPU waits for the slowest message, so a single congested link or one dropped packet can stall the whole step. AI traffic also looks nothing like ordinary data center traffic. It arrives in tight, synchronized bursts from many senders at once, and it uses a small number of very large flows rather than many small ones, so those few flows can easily pile onto the same path and create hotspots. [4][5]
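A rough back-of-the-envelope sketch shows why this gets worse with scale. It assumes, purely for illustration, that each participant's message is delayed independently and with the same small probability. Real congestion is correlated, so the numbers are indicative only. The function name and parameters are hypothetical and do not come from the specification.

```python
def stall_probability(p_delay: float, participants: int) -> float:
    """Chance that at least one of `participants` messages is delayed.

    Assumes independent, equally likely delays. This is an illustrative
    simplification, not a model taken from the UEC specification.
    """
    if not 0.0 <= p_delay <= 1.0:
        raise ValueError("p_delay must be a probability between 0 and 1")
    if participants < 0:
        raise ValueError("participants must be non-negative")
    # The step completes on time only if every single message is on time.
    return 1.0 - (1.0 - p_delay) ** participants
```

Under this simplified model, a one in a thousand chance of delay per message means a collective across 1,000 participants stalls on roughly 63 percent of steps, which is why tail behavior rather than average behavior dominates at cluster scale.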
The usual way to run RDMA over Ethernet is RoCE, short for RDMA over Converged Ethernet. RoCE works, and hyperscalers have run very large clusters on it, but it carries assumptions that fight against AI scale. Classic RoCE wants a lossless network, which it gets through Priority Flow Control, and it expects packets within a flow to arrive in order. Lossless behavior is fragile at scale because a pause frame can spread backward through the fabric and cause congestion to bloom in places far from the original problem. The in order requirement is the bigger limiter. Because each flow must stay on one path to preserve ordering, the standard approach cannot easily spread a single large flow across the many equal cost links that a Clos fabric provides, so expensive bandwidth sits idle while one path saturates. The common workaround, hashing flows onto paths with ECMP, falls apart when there are only a few elephant flows, because two of them can hash to the same link and there is nothing to rebalance them. Tuning RoCE for a giant training cluster has become specialist work, and the result is often vendor specific. [5][6]
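The ECMP collision problem can be made concrete with a small calculation. The sketch below assumes an idealized hash that places each flow on a link uniformly and independently, which real switch hashes only approximate. The function name is hypothetical.

```python
from fractions import Fraction


def collision_probability(flows: int, links: int) -> float:
    """Probability that at least two flows land on the same link.

    Assumes each flow is hashed uniformly and independently onto one of
    `links` equal cost links (an idealization of ECMP hashing).
    """
    if links <= 0:
        raise ValueError("links must be positive")
    if flows > links:
        return 1.0  # pigeonhole: more flows than links forces a collision
    p_all_distinct = Fraction(1)
    for already_placed in range(flows):
        # The next flow must avoid every link that is already occupied.
        p_all_distinct *= Fraction(links - already_placed, links)
    return float(1 - p_all_distinct)
```

Under these assumptions, two elephant flows on eight links collide one time in eight, and eight flows on eight links collide somewhere about 99.8 percent of the time. With many small flows the imbalance averages out, but with a handful of very large ones each collision halves the bandwidth of the flows involved.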
Ultra Ethernet keeps the parts of Ethernet that make it attractive (open standards, a broad supplier base, and the existing optics and switch silicon) and replaces the transport behavior that does not suit AI. It is built to tolerate some loss rather than demand a perfectly lossless fabric, to use every available path at once, and to hand data to the application even when packets show up out of order. A stated design tenet is that no existing Ethernet infrastructure has to be ripped out to adopt it. [3][4]
The Ultra Ethernet Consortium was announced on July 19, 2023, as a Joint Development Foundation project hosted by the Linux Foundation. The founding members were AMD, Arista, Broadcom, Cisco, Eviden (an Atos business), Hewlett Packard Enterprise, Intel, Meta, and Microsoft, a lineup that pairs the companies building the chips and switches with the companies running some of the largest AI fleets in the world. The founders seeded four working groups, for the physical layer, the link layer, the transport layer, and the software layer. [7][8] Membership grew fast. By the end of 2024 the consortium counted more than one hundred member companies and over fifteen hundred participants, and it has been described as one of the fastest growing efforts in the Linux Foundation's history. [4][9]
Notably, NVIDIA joined the consortium in 2024, after the founding group formed. [9] That is worth a moment, because NVIDIA owns InfiniBand through its Mellanox acquisition and also sells Spectrum X, its own Ethernet platform tuned for AI. Its participation signals that even the dominant supplier of AI fabrics expects a standard, multi vendor Ethernet transport to matter, while it continues to sell its own offerings alongside. [9]
Rather than a single short document, the 1.0 release is a large stack of specifications, running to more than five hundred and fifty pages, that together cover the network from the wire up to the software interface. The consortium also published a condensed paper written by the primary designers to make the full text easier to digest. The work is organized into layers, and the released material spans the physical layer, the link layer, the transport layer, and the software API, plus supporting areas such as storage, management, compliance, and debug. [1][2][10]
The table below sketches the layers and what each one contributes.
| Layer | What it covers | Examples of what 1.0 adds |
|---|---|---|
| Physical | Signaling and the electrical or optical link | Profiles aligned with existing high speed Ethernet, with support for QSFP-DD and OSFP optics so current cabling carries over |
| Link | Behavior on a single hop | Link Level Retry (LLR) to recover lost frames locally, and credit based flow control for more precise back pressure than blunt pause |
| Transport | End to end delivery, the heart of the spec | Ultra Ethernet Transport with multipath packet spraying, out of order delivery, congestion control, and built in security |
| Software | The interface applications use | An open, largely connectionless API so RDMA style and collective workloads can adopt it without a rewrite |
The link layer additions are quietly important. Link Level Retry lets two adjacent devices resend a dropped frame on the spot instead of waiting for the endpoints to notice and recover, which keeps a little packet loss from turning into a large latency spike. Credit based flow control gives finer grained back pressure than the all or nothing pause of older Ethernet, so the network can slow a sender gently rather than freezing a whole priority class. [2][10]
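The general idea behind credit based flow control can be sketched in a few lines. This is a toy model of the technique in general, not the mechanism the specification defines: the class name, the one credit per frame accounting, and the instant return of credits are all assumptions made for illustration.

```python
from collections import deque


class CreditedLink:
    """Toy single-hop link where the sender may only transmit with credit."""

    def __init__(self, receiver_buffer_slots: int):
        # The sender starts with one credit per free slot at the receiver.
        self.credits = receiver_buffer_slots
        self.rx_buffer = deque()

    def try_send(self, frame) -> bool:
        """Send one frame if a credit is available; otherwise hold it back."""
        if self.credits == 0:
            return False  # back pressure: no buffer space, so nothing is dropped
        self.credits -= 1
        self.rx_buffer.append(frame)
        return True

    def receiver_drain(self, max_frames: int = 1) -> int:
        """Receiver consumes frames and returns one credit per frame freed."""
        drained = 0
        while drained < max_frames and self.rx_buffer:
            self.rx_buffer.popleft()
            drained += 1
        self.credits += drained
        return drained
```

The point of the sketch is the granularity: the sender is slowed by exactly the amount of buffer space the receiver lacks, one frame at a time, rather than being stopped outright for a whole priority class.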
UET is the reason the specification exists, and it is where the AI specific thinking lives. A few ideas do most of the work.
The first is multipath packet spraying. Instead of pinning a flow to one path, UET sprays the packets of a single message across many paths through the fabric at once. A large all reduce that would have hammered one link now fans out across the whole Clos, which raises link utilization and smooths out the hotspots that wreck tail latency. Spraying only works if the receiver can cope with packets that arrive in a jumble, which leads to the second idea. [3][5]
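The difference in link load is easy to show with a toy model. The sketch below compares pinning each flow to one link with spraying packets round-robin across all links. Round-robin is a deliberate simplification chosen for clarity; it is not a description of how UET actually selects paths, and the function names are hypothetical.

```python
def pinned_load(flow_sizes, flow_to_link, num_links):
    """Packets per link when every flow is pinned to a single link."""
    load = [0] * num_links
    for size, link in zip(flow_sizes, flow_to_link):
        load[link] += size
    return load


def sprayed_load(flow_sizes, num_links):
    """Packets per link when packets are sprayed round-robin over all links."""
    load = [0] * num_links
    next_link = 0
    for size in flow_sizes:
        for _ in range(size):
            load[next_link] += 1
            next_link = (next_link + 1) % num_links
    return load
```

With two flows of 1,000 packets each and four links, an unlucky pinning that puts both flows on link 0 yields `[2000, 0, 0, 0]`, while spraying yields `[500, 500, 500, 500]`. The busiest link carries a quarter of the traffic it did before, at the cost of packets arriving out of order.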
UET supports out of order delivery to the upper layers. Traditional RDMA insists on strict ordering, so the moment you spray packets you break it. UET instead lets the transport place incoming data into the right memory location as it arrives, even if packet seven shows up before packet five, and it signals completion to the application using the operation semantics rather than raw byte order. For the bulk transfers that dominate AI traffic this removes the ordering tax and is what makes spraying practical. The transport offers several delivery modes, so workloads that still want ordered or idempotent semantics can ask for them. [2][3]
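The placement idea can be sketched as follows. The example assumes fixed size packets that carry a sequence number from which the memory offset can be computed. That framing, the class name, and the method names are illustrative assumptions, not the UET wire format.

```python
class MessageBuffer:
    """Toy receiver that writes each packet straight to its final offset."""

    def __init__(self, total_len: int, mtu: int):
        if total_len <= 0 or mtu <= 0:
            raise ValueError("total_len and mtu must be positive")
        self.buf = bytearray(total_len)
        self.mtu = mtu
        self.expected = -(-total_len // mtu)  # ceiling division
        self.received = set()

    def on_packet(self, seq: int, payload: bytes) -> bool:
        """Place one packet; return True once the whole message has arrived."""
        offset = seq * self.mtu
        if seq < 0 or offset + len(payload) > len(self.buf):
            raise ValueError("packet falls outside the message")
        if seq not in self.received:  # duplicates are ignored
            self.buf[offset:offset + len(payload)] = payload
            self.received.add(seq)
        # Completion depends on which packets have arrived, not on their order.
        return len(self.received) == self.expected
```

For a ten byte message split into packets of four bytes, delivering packets 2, 0, 1 in that order completes the message on the third call, and the buffer holds the original bytes with no reordering queue in between.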
Third is congestion control built for this traffic. UET defines new sender driven and receiver driven control algorithms so the fabric can react fast to the synchronized bursts of incast, where many GPUs blast one destination at the same instant, while keeping queues shallow and throughput high. Because the design tolerates a little loss and recovers quickly, it does not lean on a perfectly lossless network the way classic RoCE does, which sidesteps the spreading congestion that a pause frame can cause. [2][5]
Fourth is security as a first class feature rather than a bolt on. UET defines a Transport Security Sub-layer that borrows proven ideas from IPsec and from Google's open PSP protocol, using AES-GCM encryption, key derivation, and replay protection to authenticate and encrypt traffic at the transport level. It is built to keep working with packet spraying, and it uses group keying so the many endpoints in one security domain can talk securely without each pair holding heavy per connection state. That matters when many customers or teams share one cluster and need real isolation without losing performance. [11][12]
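Replay protection has to coexist with reordering, because a sprayed packet that arrives late is not necessarily a replayed one. A common way to reconcile the two is a sliding window of recently seen sequence numbers, in the style of the anti-replay window used by IPsec. The sketch below shows that general technique only. Whether and how UET uses such a window is not stated here, and the class name and default window size are assumptions.

```python
class ReplayWindow:
    """Sliding bitmap of recently accepted sequence numbers."""

    def __init__(self, size: int = 64):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.highest = -1  # highest sequence number accepted so far
        self.bitmap = 0    # bit i set means (highest - i) was already seen

    def accept(self, seq: int) -> bool:
        """Return True for a fresh packet, False for a replay or a stale one.

        A real receiver would verify the packet's authentication tag first
        and only then update the window.
        """
        if seq > self.highest:
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # older than the window can track
        if self.bitmap & (1 << offset):
            return False  # already seen: treat as a replay
        self.bitmap |= 1 << offset
        return True
```

A late but genuine packet inside the window is accepted once, a second copy of it is rejected, and anything older than the window is dropped. The window size therefore bounds how much reordering the receiver can tolerate.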
UET also leans on lightweight, mostly connectionless state instead of the heavy long lived connections of older transports. The API is connectionless, and a peer to peer reliability context can be set up without extra latency, sometimes established by the first packet in nanoseconds. That choice is what lets a single endpoint talk to a vast number of peers, which is exactly the pattern in a many to many collective across a huge cluster. [2][10]
It is worth being precise about where the intelligence sits. In this design the fabric of switches can stay relatively simple, spraying packets and signaling congestion, while the smart, stateful work of reassembly, retransmission, and completion lives at the endpoints in the network interface. That split matters for cost and scale, because plain Ethernet switch silicon is cheap and plentiful, and pushing the hard parts to the NIC means the fabric does not need expensive per flow state in every switch. It does raise the bar for the endpoint hardware, since the NIC or accelerator now has to track out of order arrivals and drive congestion control at line rate, which is one reason mature implementations depend on purpose built silicon. [5][6]
Ultra Ethernet is a scale out technology. Scale out means the back end network that ties together the many servers, racks, and pods of a cluster, the layer where a training job spanning thousands of accelerators does its cross node communication. This is the territory InfiniBand has owned in the largest AI and HPC systems, and matching InfiniBand class performance on open Ethernet at that scale is the explicit aim. The specification is designed to support a single fabric scaling toward a million endpoints, which is the order of magnitude that the biggest planned AI builds are reaching for. [3][4]
Scale out is a different problem from scale up, and the two are often confused. Scale up is the dense, very high bandwidth link inside a single server or rack that lets a handful of accelerators behave almost like one big accelerator with shared memory, the role NVLink plays for NVIDIA. The open standard aimed at that job is UALink, short for Ultra Accelerator Link, a memory semantic interconnect that ties together up to 1,024 accelerators over very short distances inside a pod. UALink and Ultra Ethernet are complementary rather than competing. A typical large AI infrastructure build would use a scale up fabric such as UALink inside each pod to bind accelerators tightly, then use Ultra Ethernet as the scale out fabric to connect those pods into a cluster of many thousands of accelerators. The comparison below lines up the three approaches at the back end layer. [3][13]
| | Ultra Ethernet (UET) | InfiniBand | RoCE v2 (classic) |
|---|---|---|---|
| Type | Open, multi vendor standard | Largely single vendor (NVIDIA) | Open standard, but tuning often vendor specific |
| Role | Scale out back end fabric | Scale out back end fabric | Scale out back end fabric |
| Path use | Multipath packet spraying across many links | Adaptive routing | Typically one path per flow |
| Ordering | Out of order delivery to the application | In order within a flow | In order within a flow |
| Loss model | Tolerates some loss, fast recovery | Engineered lossless | Wants lossless via Priority Flow Control |
| Congestion control | New sender and receiver driven algorithms | Credit based, mature | DCQCN and similar, needs careful tuning |
| Security | Encryption and authentication in the transport | Add on | Add on |
The importance of Ultra Ethernet is mostly economic and strategic. AI fabrics have become one of the larger line items in a data center build, and a single vendor fabric concentrates both cost and supply risk. An open standard backed by nearly every relevant chip, switch, and NIC maker, and by the hyperscalers writing the checks, gives buyers a second path with real competition behind it. [1][7]
Real silicon is the gating factor, and it is arriving. Broadcom began shipping its Tomahawk Ultra switch in July 2025, a chip aimed at low latency AI and HPC fabrics, and NIC and accelerator vendors including AMD, by way of its Pensando networking line, are building UET capable endpoints. [14][15] Because the physical and link choices stay close to mainstream Ethernet, switches and optics can be reused, which lowers the barrier to deployment compared with adopting an entirely separate fabric. The software story helps too. A connectionless API that fits existing models lets the collective communication libraries that AI frameworks already call sit on top of UET, so an operator can in principle swap the fabric underneath a training job rather than rebuilding the stack. The first wave of products that fully implement the 1.0 transport, expected through late 2025 and into 2026, is what turns the specification from paper into running clusters. [2][4]
A published specification is not the same as a deployed, debugged ecosystem. The hard part of any fabric is interoperability at scale, and the proof will come from multi vendor clusters with tens of thousands of endpoints running real training jobs, not from the document itself. The spec is also enormous, well over five hundred pages, and a long, complex standard takes time to implement consistently across vendors. [2][10] InfiniBand has a long head start, a mature software stack, and years of field hardening, so it will not be displaced quickly even where Ultra Ethernet matches it on paper. NVIDIA also continues to push Spectrum X, its own AI tuned Ethernet, which means even within the Ethernet camp there will be more than one way to build a fabric. [9] And out of order delivery, while powerful, pushes more reassembly and state handling onto the endpoint, so the gains depend on NICs and accelerators that implement the transport efficiently in hardware. [6][10]
Even with those caveats, Ultra Ethernet is the clearest sign yet that the industry wants the back end of AI clusters to run on an open standard. Whether it reaches InfiniBand class performance in practice will be settled in production over the next few years, but the direction is set and the membership behind it is broad. [1][3]