InfiniBand
Last reviewed
May 31, 2026
Sources
15 citations
Review status
Source-backed
Revision
v2 · 2,539 words
InfiniBand is a high throughput, low latency networking interconnect standard used to connect servers, storage, and accelerators inside high performance computing systems and large artificial intelligence training clusters. It was designed from the start as a switched fabric, which means every device links to a switch through a dedicated point to point connection rather than sharing a single bus or a contended medium. The standard is governed by the InfiniBand Trade Association, an industry body formed in 1999, and it is maintained as an open specification that multiple vendors can build to [1][2]. In modern practice the technology is closely associated with NVIDIA, which acquired the leading InfiniBand vendor Mellanox and now sells the dominant switch and adapter line under the Quantum and ConnectX brands [3][4].
InfiniBand matters for AI because training a large model spreads the computation across hundreds or thousands of GPUs that must constantly exchange gradients and activations. The speed of that exchange sets a ceiling on how well the cluster scales. When the network stalls, expensive accelerators sit idle waiting for data. InfiniBand reduces that stall by moving data with very low latency and by letting one machine read or write the memory of another without involving the remote processor. That property, together with high per port bandwidth, is why the interconnect shows up so often in the systems near the top of the TOP500 list and in commercial GPU superclusters [5].
A cluster has two very different communication problems. Inside a single server, a handful of GPUs need to talk to each other at enormous bandwidth, and NVIDIA solves that with NVLink, a dedicated GPU to GPU link. Between servers, the traffic crosses a room or a data hall, and that is where InfiniBand fits. It is the scale out fabric that ties many nodes into one logical machine, while NVLink handles scale up inside the node. The two are complementary rather than competing, and a typical NVIDIA GPU pod uses both layers together.
A second requirement is generality. A fabric for AI also carries storage traffic, management traffic, and the collective operations that synchronize a distributed job. InfiniBand was built as a general server interconnect, so it handles all of these on the same wires. Storage systems can present remote volumes over the fabric, and parallel file systems common in high performance computing run well on it because of the low latency.
The defining feature of InfiniBand is Remote Direct Memory Access, usually shortened to RDMA. With ordinary TCP networking, sending data means copying it through kernel buffers, building packets in software, and interrupting the processor on both ends. RDMA removes most of that work. The network adapter transfers data directly between the application memory of two machines, bypassing the operating system kernel and avoiding extra copies. This is often called kernel bypass and zero copy. The result is lower latency, lower processor overhead, and higher effective bandwidth, because the cores are free to compute instead of shuffling bytes [6][7].
The hardware that makes this possible is the channel adapter. A host channel adapter, or HCA, sits in each server and connects the host to the fabric. It implements the InfiniBand transport in silicon, so the work of segmenting messages, handling acknowledgements, and enforcing reliable delivery happens on the adapter rather than on the host processor. NVIDIA sells these adapters under the ConnectX name, and the network interface in a modern AI server is usually one of these parts [4][6]. A second class of device, the target channel adapter, connects peripherals such as storage controllers.
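Segmentation is a simple calculation once the path MTU is fixed. The sketch below assumes the standard InfiniBand MTU sizes of 256 to 4096 bytes and uses a hypothetical helper named `segment`. It only counts packets and their payload sizes; headers, acknowledgements, and retransmission, which the adapter also handles, are left out.

```python
# Assumed set of InfiniBand path MTUs (payload bytes per packet).
VALID_MTUS = (256, 512, 1024, 2048, 4096)


def segment(message_len: int, mtu: int = 4096) -> list[int]:
    """Return the payload size of each packet needed to carry one message."""
    if mtu not in VALID_MTUS:
        raise ValueError(f"unsupported MTU: {mtu}")
    full, rest = divmod(message_len, mtu)
    # Every packet but the last is full; the last carries the remainder.
    return [mtu] * full + ([rest] if rest else [])


sizes = segment(1_000_000)
print(len(sizes), sizes[-1])  # prints: 245 576
```

A one megabyte transfer becomes a few hundred packets, and doing this split in silicon is part of what frees the host processor.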
Applications do not call the operating system for each transfer. Instead they post work requests to queues that the adapter services directly. A pair of these queues, one for sends and one for receives, is called a queue pair, and it is the basic communication endpoint in the InfiniBand model. Completion queues tell the application when a transfer has finished. Because the application talks to the adapter through these queues from user space, the fast path never traps into the kernel; the kernel is involved mainly in setup steps such as creating queue pairs and registering memory [6].
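The queue pair flow can be modeled in a few lines. This is a toy sketch with hypothetical names (`ToyQueuePair`, `post_send`, `adapter_process`, `poll_cq`); the names loosely echo the verbs calls `ibv_post_send` and `ibv_poll_cq`, but the code is not that API and does no networking. It shows the ordering that matters: the application enqueues a descriptor, the adapter later moves the bytes straight into a buffer standing in for remote registered memory, and the application learns the result only by polling a completion queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    wr_id: int
    local_buf: bytes
    remote_mem: bytearray  # stands in for a registered memory region on another host
    remote_offset: int


@dataclass
class Completion:
    wr_id: int
    status: str
    byte_len: int


class ToyQueuePair:
    """Hypothetical queue pair model; loosely echoes verbs, but is not libibverbs."""

    def __init__(self, cq: deque) -> None:
        self.send_queue: deque = deque()
        self.recv_queue: deque = deque()  # two-sided receives would wait here (unused)
        self.cq = cq

    def post_send(self, wr: WorkRequest) -> None:
        # The application only enqueues a descriptor; no bytes move yet.
        self.send_queue.append(wr)

    def adapter_process(self) -> None:
        # Stands in for the adapter draining the send queue without host CPU help.
        while self.send_queue:
            wr = self.send_queue.popleft()
            end = wr.remote_offset + len(wr.local_buf)
            if end > len(wr.remote_mem):
                self.cq.append(Completion(wr.wr_id, "remote_access_error", 0))
                continue
            # One-sided RDMA write: bytes land in remote memory; no remote code runs.
            wr.remote_mem[wr.remote_offset:end] = wr.local_buf
            self.cq.append(Completion(wr.wr_id, "success", len(wr.local_buf)))


def poll_cq(cq: deque, max_entries: int) -> list[Completion]:
    """Pop up to max_entries completions, like a non-blocking poll."""
    done = []
    while cq and len(done) < max_entries:
        done.append(cq.popleft())
    return done


cq: deque = deque()
qp = ToyQueuePair(cq)
remote = bytearray(16)
qp.post_send(WorkRequest(wr_id=1, local_buf=b"gradient", remote_mem=remote, remote_offset=4))
qp.adapter_process()
print(poll_cq(cq, max_entries=8))
print(remote)
```

Nothing on the "remote" side executes during the write, which is the property that makes one-sided RDMA cheap for the target host.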
The fabric itself is switched. Servers connect to switches, switches connect to other switches, and large fabrics are built in multi level topologies, with the fat tree being the most common choice for HPC and AI. A fat tree built without oversubscription provides full bisection bandwidth, meaning every node can in principle send at its full link rate at the same time, which is what training jobs want. Routers join separate InfiniBand subnets when a single subnet is not enough.
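How far a fat tree scales depends on the switch radix. The following sketch uses the textbook sizing of a nonblocking fat tree built from identical switches with k ports each: two levels reach k²/2 hosts and three levels reach k³/4. The function names are hypothetical, and real deployments often depart from these shapes, for example by oversubscribing the leaf layer to save switches.

```python
def fat_tree_capacity(radix: int, levels: int) -> dict[str, int]:
    """Hosts and switches in a nonblocking fat tree built from identical switches."""
    if radix % 2:
        raise ValueError("radix must be even so ports split evenly up and down")
    k = radix
    if levels == 2:
        # k leaf switches (k/2 ports to hosts, k/2 up), each linked to all k/2 spines.
        return {"hosts": k * k // 2, "switches": k + k // 2}
    if levels == 3:
        # k pods of k/2 edge plus k/2 aggregation switches, and (k/2)^2 core switches.
        return {"hosts": k ** 3 // 4, "switches": k * k + (k // 2) ** 2}
    raise ValueError("this sketch only models 2 or 3 levels")


def oversubscription(host_ports: int, uplink_ports: int) -> float:
    """Leaf bandwidth toward hosts divided by leaf bandwidth toward the spine."""
    return host_ports / uplink_ports


print(fat_tree_capacity(64, 2))  # {'hosts': 2048, 'switches': 96}
print(fat_tree_capacity(64, 3))  # {'hosts': 65536, 'switches': 5120}
print(oversubscription(48, 16))  # 3.0, i.e. 3:1 oversubscribed
```

With switches of 64 ports, a third level is what takes the fabric from a couple of thousand endpoints to tens of thousands.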
A piece that has no direct equivalent in plain Ethernet is the subnet manager, often abbreviated SM. One node in each subnet runs this software. It discovers the topology, assigns a local identifier, called a LID, to every channel adapter port and to each switch, computes the forwarding tables that the switches use, and programs them. It repeats this work as links come and go. Because routing is centrally computed and pushed to the switches, the fabric can be tuned deterministically, which supports the predictable performance that collective operations depend on [1][2].
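A simplified version of the subnet manager's routing job is sketched below, assuming plain min-hop routing on a tiny made-up topology. It gives one LID per node (a real SM assigns LIDs per channel adapter port and to switch port 0), measures hop counts with a breadth-first search that only forwards through switches, and fills each switch's table with an output port per destination LID. All function names are hypothetical. Production subnet managers such as OpenSM offer several routing engines, including ones tuned for fat trees, and spread routes across parallel paths, which this sketch does not attempt.

```python
from collections import deque


def assign_lids(nodes: list[str]) -> dict[str, int]:
    """Give each node one unicast LID, starting at 1 (LID 0 is reserved)."""
    return {name: lid for lid, name in enumerate(sorted(nodes), start=1)}


def hop_counts(links: dict, switches: set, start: str) -> dict[str, int]:
    """Breadth-first hop counts from start; only switches forward traffic."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node != start and node not in switches:
            continue  # channel adapters terminate traffic, they do not relay it
        for neighbor in links[node].values():
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def build_forwarding_tables(links: dict, switches: set, lids: dict) -> dict:
    """Min-hop routing: for each switch, map every destination LID to an output port."""
    tables = {sw: {} for sw in switches}
    for dest, dest_lid in lids.items():
        dist = hop_counts(links, switches, dest)
        for sw in switches:
            if sw == dest:
                tables[sw][dest_lid] = 0  # port 0: the switch itself
            elif sw in dist:
                # Lowest-numbered port toward a switch (or dest) one hop closer.
                for port, neighbor in sorted(links[sw].items()):
                    if (neighbor in switches or neighbor == dest) and \
                            dist.get(neighbor) == dist[sw] - 1:
                        tables[sw][dest_lid] = port
                        break
    return tables


# links[node][port] = neighbor; two leaf switches under one spine.
links = {
    "spine": {1: "leaf1", 2: "leaf2"},
    "leaf1": {1: "hostA", 2: "hostB", 3: "spine"},
    "leaf2": {1: "hostC", 3: "spine"},
    "hostA": {1: "leaf1"},
    "hostB": {1: "leaf1"},
    "hostC": {1: "leaf2"},
}
switches = {"spine", "leaf1", "leaf2"}
lids = assign_lids(list(links))
tables = build_forwarding_tables(links, switches, lids)
print(tables["leaf1"])  # {1: 1, 2: 2, 3: 3, 4: 0, 5: 3, 6: 3}
```

Because the whole table is computed in one place from a complete view of the topology, every switch's choice is known in advance, which is the determinism the paragraph above refers to.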
Links are built from lanes. A single lane carries a serial signal in each direction, and ports gang several lanes together for more bandwidth. The common widths are 1x, 4x, and 12x, and the workhorse server port has been 4x for most of the technology's history. The headline data rate of a generation usually refers to a 4x port, so a 400 gigabit per second NDR port is four lanes of roughly 100 gigabits each [2][8].
InfiniBand has advanced through a sequence of named speed grades, each raising the per lane signaling rate, in most steps by roughly a factor of two. The early generations used 8b/10b line encoding, which spends two bits of every ten on encoding overhead, so the usable data rate is 80 percent of the raw signaling rate. From FDR onward the encoding moved to the far more efficient 64b/66b, and from HDR onward the lanes use PAM4 modulation, which sends two bits per signal step and is paired with forward error correction. The table below lists the per lane signaling rate and the effective data rate for a 4x port, which is the figure normally quoted [2][8][9].
| Generation | Per lane signaling rate | Effective 4x port data rate | Encoding and modulation | Era |
|---|---|---|---|---|
| SDR | 2.5 Gb/s | 8 Gb/s | 8b/10b, NRZ | early 2000s |
| DDR | 5 Gb/s | 16 Gb/s | 8b/10b, NRZ | mid 2000s |
| QDR | 10 Gb/s | 32 Gb/s | 8b/10b, NRZ | around 2008 |
| FDR | 14.0625 Gb/s | about 54.5 Gb/s (marketed as 56 Gb/s) | 64b/66b, NRZ | around 2011 |
| EDR | 25.78125 Gb/s | 100 Gb/s | 64b/66b, NRZ | around 2014 |
| HDR | 53.125 Gb/s | 200 Gb/s | PAM4 with FEC | around 2018 |
| NDR | about 106.25 Gb/s | 400 Gb/s | PAM4 with FEC | around 2021 |
| XDR | about 212.5 Gb/s | 800 Gb/s | PAM4 with FEC | from 2024 |
The named grades come from a marketing vocabulary that the IBTA has used for years, where SDR is single data rate, DDR is double, QDR is quad, and then FDR, EDR, HDR, NDR, and XDR continue the pattern as fourteen, enhanced, high, next, and extreme data rate. The association publishes a roadmap that extends past XDR to GDR, so the cadence is expected to continue [2][8]. Higher widths multiply these numbers, so a 12x XDR link would carry about 2.4 terabits per second, though large fabrics are usually built by aggregating many 4x ports rather than by using wider links.
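The effective 4x figures for the NRZ generations follow directly from lanes × per lane signaling rate × encoding efficiency, as the short sketch below reproduces. It assumes every lane runs at the same rate and deliberately leaves out HDR, NDR, and XDR, whose PAM4 links also spend capacity on forward error correction that this formula does not model. It also shows that FDR's commonly quoted 56 Gb/s is the raw 4x signaling rate, while the payload rate after 64b/66b encoding is closer to 54.5 Gb/s. The names are hypothetical.

```python
# Per lane signaling rate (Gb/s) and line code (payload bits, total bits)
# for the NRZ generations only.
NRZ_GRADES = {
    "SDR": (2.5, 8, 10),
    "DDR": (5.0, 8, 10),
    "QDR": (10.0, 8, 10),
    "FDR": (14.0625, 64, 66),
    "EDR": (25.78125, 64, 66),
}


def effective_rate(grade: str, width: int = 4) -> float:
    """Payload rate in Gb/s: lanes x signaling rate x encoding efficiency."""
    signaling, payload_bits, total_bits = NRZ_GRADES[grade]
    return width * signaling * payload_bits / total_bits


def scale_width(rate_4x: float, width: int) -> float:
    """Scale a quoted 4x rate to another link width, assuming identical lanes."""
    return rate_4x / 4 * width


for grade in NRZ_GRADES:
    print(grade, round(effective_rate(grade), 2))
print("12x XDR", scale_width(800, 12))
```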
After NVIDIA completed its purchase of Mellanox in April 2020, in a deal valued at about 6.9 billion dollars and first announced in March 2019, the InfiniBand product line became part of NVIDIA's networking business [3]. NVIDIA brands its InfiniBand switches as Quantum. The HDR generation switch was the original Quantum at 200 gigabits per port. Quantum-2 followed at the NDR generation, delivering 400 gigabit ports, with a single switch providing 64 ports of 400 gigabits built from 32 physical cages that can also split into 128 ports of 200 gigabits [4][10]. At its 2024 developer conference NVIDIA introduced Quantum-X800, an XDR class platform that pushes per port speed to 800 gigabits and was positioned squarely at trillion parameter scale AI [10][11].
A feature that sets the NVIDIA fabric apart is in-network computing through SHARP, the Scalable Hierarchical Aggregation and Reduction Protocol. Distributed training spends a large share of its network time on collective operations, the most important being the all reduce that sums gradients across every GPU and hands the result back to all of them. Normally that sum is computed by passing partial results around the network many times. SHARP moves the arithmetic into the switches themselves. As data flows up the tree, switch silicon adds the contributions together and the finished sum is sent back down to every participant, so the reduction happens once, in the fabric, instead of partial results being carried back and forth among the endpoints. This cuts the volume of data on the wire and lowers the time the collective takes, and NVIDIA reports meaningful speedups for the operations that dominate large training runs [12][13]. SHARP is one of the clearest examples of treating the network as an active part of the computer rather than a passive pipe.
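The traffic difference can be illustrated with a toy model that is not SHARP's actual protocol. `ring_allreduce` is the common endpoint-only ring algorithm (a reduce-scatter followed by an all-gather), in which each of n ranks sends 2(n - 1)/n times the vector length over 2(n - 1) sequential steps. `switch_tree_allreduce` stands in for in-network aggregation: hypothetical switch levels sum their children once, and each endpoint sends its vector up once and receives the result once. Both function names are hypothetical, and integer vectors keep the sums exact.

```python
def ring_allreduce(vectors: list[list[int]]) -> tuple[list[list[int]], list[int]]:
    """Endpoint-only ring all-reduce: reduce-scatter, then all-gather.

    Returns every rank's final vector and how many elements each rank sent.
    """
    n, length = len(vectors), len(vectors[0])
    if length % n:
        raise ValueError("vector length must divide evenly into one chunk per rank")
    c = length // n
    data = [list(v) for v in vectors]
    sent = [0] * n

    def span(chunk: int) -> slice:
        return slice(chunk * c, (chunk + 1) * c)

    # Reduce-scatter: in step s, rank r passes chunk (r - s) to its right neighbor,
    # which adds it in. Afterwards rank r owns the full sum of chunk (r + 1) % n.
    for s in range(n - 1):
        msgs = [(r, (r - s) % n, data[r][span((r - s) % n)]) for r in range(n)]
        for r, chunk, payload in msgs:
            dst = data[(r + 1) % n]
            for i, x in enumerate(payload, start=chunk * c):
                dst[i] += x
            sent[r] += c
    # All-gather: pass the finished chunks around the ring once more.
    for s in range(n - 1):
        msgs = [(r, (r + 1 - s) % n, data[r][span((r + 1 - s) % n)]) for r in range(n)]
        for r, chunk, payload in msgs:
            data[(r + 1) % n][span(chunk)] = payload
            sent[r] += c
    return data, sent


def switch_tree_allreduce(vectors: list[list[int]], fan_in: int):
    """Toy in-network reduction: each switch level sums its children once."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = [list(v) for v in vectors]
    while len(level) > 1:
        level = [
            [sum(column) for column in zip(*level[i:i + fan_in])]
            for i in range(0, len(level), fan_in)
        ]
    total = level[0]
    # Each endpoint sends its vector up once and receives the result once.
    return [list(total) for _ in vectors], [len(total)] * len(vectors)


vectors = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], [100, 200, 300, 400, 500, 600]]
ring_result, ring_sent = ring_allreduce(vectors)
tree_result, tree_sent = switch_tree_allreduce(vectors, fan_in=2)
print(ring_result[0], ring_sent)  # [111, 222, 333, 444, 555, 666] [8, 8, 8]
print(tree_result[0], tree_sent)  # [111, 222, 333, 444, 555, 666] [6, 6, 6]
```

For large n the ring sends close to twice the vector per rank while the tree sends it once, and in this model the tree's number of sequential steps grows with its depth rather than with the number of ranks.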
The natural alternative to InfiniBand is Ethernet, which is cheaper, more familiar, and present in every data center already. The gap in raw capability narrowed once Ethernet gained RDMA through RoCE, short for RDMA over Converged Ethernet. RoCE lets an Ethernet network carry the same kernel bypass transfers that InfiniBand pioneered, and its second version, RoCE v2, wraps them in UDP and IP headers so they can be routed across ordinary IP networks. With RoCE, an organization can get much of the latency benefit of RDMA while staying on Ethernet switches and Ethernet operations skills.
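The encapsulation difference shows up in per packet overhead. The sketch below assumes common header sizes for a native InfiniBand packet within one subnet (local route header, base transport header, invariant CRC, variant CRC) and for a RoCE v2 packet, which keeps InfiniBand's base transport header and invariant CRC but carries them inside Ethernet, IPv4, and UDP. It ignores VLAN tags, IPv6, the InfiniBand global route header, and the Ethernet preamble and inter-frame gap, so it is an illustration rather than a measurement. The function names are hypothetical.

```python
# Per packet header and trailer bytes, assuming no VLAN tag, IPv4, and no
# InfiniBand global route header. Preamble and inter-frame gap are ignored.
IB_NATIVE = {"LRH": 8, "BTH": 12, "ICRC": 4, "VCRC": 2}
ROCE_V2 = {"Ethernet": 14, "IPv4": 20, "UDP": 8, "BTH": 12, "ICRC": 4, "FCS": 4}
ROCE_V2_UDP_PORT = 4791  # well-known UDP destination port for RoCE v2


def overhead_bytes(headers: dict[str, int]) -> int:
    return sum(headers.values())


def wire_efficiency(headers: dict[str, int], payload: int) -> float:
    """Fraction of packet bytes that are payload."""
    return payload / (payload + overhead_bytes(headers))


for name, headers in (("InfiniBand", IB_NATIVE), ("RoCE v2", ROCE_V2)):
    print(name, overhead_bytes(headers), round(wire_efficiency(headers, 4096), 4))
```

At a 4096 byte payload the extra headers cost around one percent, so the practical differences between the two fabrics lie more in congestion and loss handling than in framing.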
The practical differences come down to how each fabric handles congestion and loss. InfiniBand uses a credit based flow control scheme in which a sender only transmits when the receiver has signaled that it has buffer space, so the fabric is effectively lossless by design and behaves predictably under heavy load. Ethernet was built to tolerate dropped packets and to recover from them, and making it lossless enough for RDMA requires careful tuning of priority flow control and congestion notification. When that tuning is right, RoCE performs very well, but it is more sensitive to misconfiguration, and the failure modes can be subtle at the scale of an AI cluster. InfiniBand trades some of Ethernet's ubiquity and lower cost for behavior that is easier to make deterministic [7][14].
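The effect of credit based flow control can be seen in a toy, single link model with hypothetical names. The credit based mode only sends as many packets as the receiver has advertised free slots, while the other mode sends everything and lets the receiver drop the overflow, standing in for Ethernet without priority flow control. Real InfiniBand tracks credits per virtual lane and in buffer blocks rather than whole packets, which this sketch does not model.

```python
def simulate_link(arrivals: list[int], buffer_slots: int, drain_per_tick: int,
                  credit_based: bool) -> tuple[int, int]:
    """Return (delivered, dropped) packets for a toy sender-receiver link."""
    occupied = 0              # packets sitting in the receiver buffer
    credits = buffer_slots    # free slots the receiver has advertised
    backlog = 0               # packets still waiting at the sender
    delivered = dropped = 0
    for new_packets in arrivals:
        backlog += new_packets
        if credit_based:
            send = min(backlog, credits)  # never send more than the receiver can hold
            credits -= send
        else:
            send = backlog                # send everything and hope it fits
        backlog -= send
        accepted = min(send, buffer_slots - occupied)
        dropped += send - accepted        # overflow is discarded
        occupied += accepted
        drained = min(occupied, drain_per_tick)
        occupied -= drained
        delivered += drained
        if credit_based:
            credits += drained            # freed slots are returned as credits
    return delivered, dropped


burst = [10, 10, 0, 0, 0, 0, 0, 0]
print(simulate_link(burst, buffer_slots=4, drain_per_tick=2, credit_based=True))   # (16, 0)
print(simulate_link(burst, buffer_slots=4, drain_per_tick=2, credit_based=False))  # (6, 14)
```

The credit based run delivers more in the same time and loses nothing; the four packets it has not yet delivered are still queued rather than lost.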
A newer challenger is Ultra Ethernet, the work of the Ultra Ethernet Consortium, an effort hosted by the Linux Foundation and launched in 2023 with backing from a broad group of vendors and large operators. Its stated goal is to take Ethernet and rework its transport, congestion control, and packet handling specifically for AI and HPC workloads, so that an open, multi vendor Ethernet fabric can match what a purpose built interconnect offers at this scale [14][15]. Ultra Ethernet is an attempt to close the gap from the Ethernet side, and it reflects how much money is now flowing into AI fabrics. Whether it displaces InfiniBand at the high end or simply gives buyers a credible second option is still playing out, and several of the largest operators are building very large clusters on Ethernet variants today.
InfiniBand is the default compute fabric in NVIDIA's reference architectures for large training. The DGX SuperPOD design, NVIDIA's blueprint for stitching many GPU servers into one supercomputer, uses Quantum InfiniBand for the network that carries traffic between nodes, with the current generation built on Quantum-2 at 400 gigabits and the newer designs moving to 800 [10][11]. Inside each server NVLink connects the GPUs, and InfiniBand connects the servers, so a job sees a smooth hierarchy of bandwidth from the local GPUs out to the rest of the cluster. Many of the named supercomputers and research clusters listed on the TOP500 use InfiniBand for the same reasons, and the interconnect has long held a large share of the systems on that list [5].
The reason the fabric earns its place is the combination of three things. Bandwidth keeps the GPUs fed, low latency keeps synchronization tight, and SHARP offloads the collectives that would otherwise saturate the network. For a training run where every step ends in a global synchronization, shaving microseconds and cutting traffic at each step compounds over the many thousands of steps in a long run into a real reduction in wall clock time and cost. This is why operators who measure their clusters in tens of thousands of GPUs have been willing to pay the premium that InfiniBand carries over commodity Ethernet.
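The compounding can be made concrete with simple arithmetic. In the sketch below every number (step count, compute time per step, communication time not overlapped with compute, GPU count) is hypothetical and chosen only for illustration, and the model assumes that exposed communication simply adds to each step.

```python
def run_hours(steps: int, compute_ms: float, exposed_comm_ms: float) -> float:
    """Wall clock hours when communication that is not overlapped adds to each step."""
    return steps * (compute_ms + exposed_comm_ms) / 3_600_000


def idle_fraction(compute_ms: float, exposed_comm_ms: float) -> float:
    """Share of each step the accelerators spend waiting on the network."""
    return exposed_comm_ms / (compute_ms + exposed_comm_ms)


# Hypothetical numbers chosen only to show the arithmetic.
steps, compute_ms, gpus = 500_000, 900.0, 10_000
before = run_hours(steps, compute_ms, exposed_comm_ms=100.0)
after = run_hours(steps, compute_ms, exposed_comm_ms=60.0)
print(f"idle share before: {idle_fraction(compute_ms, 100.0):.0%}")
print(f"hours saved: {before - after:.1f}, GPU-hours saved: {(before - after) * gpus:,.0f}")
```

Under these assumptions a 40 millisecond saving per step is worth several hours of wall clock time, and multiplied across the cluster it becomes tens of thousands of GPU-hours.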
InfiniBand is not free of drawbacks. It costs more than Ethernet for comparable port speeds, the operational model is different enough that teams need specific expertise, and for most of its life the ecosystem has been dominated by a single supplier. The Mellanox acquisition concentrated it further by putting the leading switches and adapters under the same roof as NVIDIA's GPUs. That concentration gives buyers a coherent, well integrated stack, but it also leaves them dependent on one vendor's roadmap and pricing.
The competitive picture is shifting because Ethernet is improving fast and because several of the largest AI builders prefer an open fabric they can source from many vendors. RoCE already serves large RDMA deployments, and Ultra Ethernet aims to give the open ecosystem a transport tuned for AI. Some hyperscale operators have publicly built enormous GPU clusters on Ethernet rather than InfiniBand, which shows that the high end is no longer a single horse race. At the same time NVIDIA continues to push InfiniBand forward with each Quantum generation and also sells its own Ethernet line, Spectrum-X, for customers who want RDMA on Ethernet from the same vendor. The likely outcome is a market with two strong fabric families, InfiniBand where deterministic low latency and tight integration with NVIDIA's stack matter most, and advanced Ethernet where openness, cost, and reuse of existing skills win. For now InfiniBand remains the reference choice for the largest and most performance sensitive AI training systems, and its role connecting GPUs across the data hall is firmly established [5][10][14].