InfiniBand

Updated Jun 21, 2026

InfiniBand is a high throughput, low latency networking interconnect standard used to connect servers, storage, and accelerators inside high performance computing systems and large artificial intelligence training clusters. It was designed from the start as a switched fabric, which means every device links to a switch through a dedicated point to point connection rather than sharing a single bus or a contended medium. The standard is governed by the InfiniBand Trade Association, an industry body formed in 1999, and it is maintained as an open specification that multiple vendors can build to ^[1]^[2]. As of the November 2024 TOP500 ranking of the world's fastest supercomputers, InfiniBand was the interconnect in 66 of the TOP100 systems and held roughly half of the full list, which is why it is widely described as the default fabric for HPC and AI ^[5]^[16]. In modern practice the technology is closely associated with NVIDIA, which acquired the leading InfiniBand vendor Mellanox for 6.9 billion dollars and now sells the dominant switch and adapter line under the Quantum and ConnectX brands ^[3]^[4].

InfiniBand matters for AI because training a large model spreads the computation across hundreds or thousands of GPUs that must constantly exchange gradients and activations. The speed of that exchange sets a ceiling on how well the cluster scales. When the network stalls, expensive accelerators sit idle waiting for data. InfiniBand reduces that stall by moving data with very low latency and by letting one machine read or write the memory of another without involving the remote processor. That property, together with high per port bandwidth, is why the interconnect shows up so often in the systems near the top of the TOP500 list and in commercial GPU superclusters ^[5].

What is InfiniBand used for?

A cluster has two very different communication problems. Inside a single server, a handful of GPUs need to talk to each other at enormous bandwidth, and NVIDIA solves that with NVLink, a dedicated GPU to GPU link. Between servers, the traffic crosses a room or a data hall, and that is where InfiniBand fits. It is the scale out fabric that ties many nodes into one logical machine, while NVLink handles scale up inside the node. The two are complementary rather than competing, and a typical NVIDIA GPU pod uses both layers together.

Beyond those two layers, a fabric also has to be general. A fabric for AI carries storage traffic, management traffic, and the collective operations that synchronize a distributed job. InfiniBand was built as a general server interconnect, so it handles all of these on the same wires. Storage systems can present remote volumes over the fabric, and parallel file systems common in high performance computing run well on it because of the low latency.

How does InfiniBand work?

The defining feature of InfiniBand is Remote Direct Memory Access, usually shortened to RDMA. With ordinary TCP networking, sending data means copying it through kernel buffers, building packets in software, and interrupting the processor on both ends. RDMA removes most of that work. The network adapter transfers data directly between the application memory of two machines, bypassing the operating system kernel and avoiding extra copies. This is often called kernel bypass and zero copy. The result is lower latency, lower processor overhead, and higher effective bandwidth, because the cores are free to compute instead of shuffling bytes ^[6]^[7].

The hardware that makes this possible is the channel adapter. A host channel adapter, or HCA, sits in each server and connects the host to the fabric. It implements the InfiniBand transport in silicon, so the work of segmenting messages, handling acknowledgements, and enforcing reliable delivery happens on the adapter rather than on the host processor. NVIDIA sells these adapters under the ConnectX name, and the network interface in a modern AI server is usually one of these parts ^[4]^[6]. A second class of device, the target channel adapter, connects peripherals such as storage controllers.

Applications do not call the operating system for each transfer. Instead they post work requests to queues that the adapter services directly. A pair of these queues, one for sends and one for receives, is called a queue pair, and it is the basic communication endpoint in the InfiniBand model. Completion queues tell the application when a transfer has finished. Because the application talks to the adapter through these queues in user space, the fast path never traps into the kernel ^[6].
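The queue model described above can be sketched as a toy, in-memory simulation. This is not the real verbs API: the names `QueuePair`, `Completion`, `post_send`, `post_recv`, `poll_cq`, and `adapter_service` are hypothetical stand-ins chosen for illustration, and the "adapter" is an ordinary function rather than hardware.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Completion:
    """Entry placed on a completion queue when a work request finishes."""
    wr_id: int
    opcode: str      # "send" or "recv"
    byte_len: int


@dataclass
class QueuePair:
    """Toy endpoint: a send queue, a receive queue, and a completion queue."""
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)
    completion_queue: deque = field(default_factory=deque)

    def post_send(self, wr_id: int, payload: bytes) -> None:
        # The application only enqueues a work request; nothing moves yet.
        self.send_queue.append((wr_id, payload))

    def post_recv(self, wr_id: int, buffer: bytearray) -> None:
        # A receive buffer has to be posted before matching data can land.
        self.recv_queue.append((wr_id, buffer))

    def poll_cq(self) -> list:
        # Drain and return every completion that is currently available.
        done = list(self.completion_queue)
        self.completion_queue.clear()
        return done


def adapter_service(sender: QueuePair, receiver: QueuePair) -> None:
    """Stand-in for the adapter: match queued sends to posted receives."""
    while sender.send_queue and receiver.recv_queue:
        send_id, payload = sender.send_queue.popleft()
        recv_id, buffer = receiver.recv_queue.popleft()
        if len(payload) > len(buffer):
            raise ValueError("posted receive buffer is too small")
        buffer[:len(payload)] = payload  # data lands in application memory
        sender.completion_queue.append(Completion(send_id, "send", len(payload)))
        receiver.completion_queue.append(Completion(recv_id, "recv", len(payload)))
```

In this sketch the receiver calls `post_recv` with a `bytearray`, the sender calls `post_send`, and both sides learn the outcome only by polling their completion queues, which mirrors the post and poll pattern of the real model without claiming to reproduce it.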

The fabric itself is switched. Servers connect to switches, switches connect to other switches, and large fabrics are built in multi level topologies, with the fat tree being the most common choice for HPC and AI. A fat tree gives full bandwidth between any two nodes if it is built without oversubscription, which is what training jobs want. Routers join separate InfiniBand subnets when a single subnet is not enough.
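The sizing arithmetic behind a non-oversubscribed fat tree can be sketched in a few lines. The sketch assumes every switch has the same port count and that each switch below the top tier splits its ports evenly between downlinks and uplinks; the function names are hypothetical.

```python
def fat_tree_endpoints(radix: int, levels: int = 2) -> int:
    """Max endpoints in a non-oversubscribed fat tree of identical switches.

    Each switch below the top tier uses half its ports down and half up,
    so a tree with `levels` tiers supports 2 * (radix / 2) ** levels endpoints.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number of ports, at least 2")
    if levels < 1:
        raise ValueError("levels must be at least 1")
    return 2 * (radix // 2) ** levels


def two_level_switch_count(radix: int) -> tuple:
    """Return (leaf_switches, spine_switches) for a full two level tree."""
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number of ports, at least 2")
    # Each spine port reaches a distinct leaf, so there are `radix` leaves;
    # each leaf has radix / 2 uplinks, one to each of radix / 2 spines.
    return radix, radix // 2
```

For a 144 port switch, `fat_tree_endpoints(144)` gives 10,368, which matches the two level figure quoted for the Q3400 later in this article.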

A piece that has no direct equivalent in plain Ethernet is the subnet manager, often abbreviated SM. One node in each subnet runs this software. It discovers the topology, assigns a local identifier, called a LID, to every port, computes the forwarding tables that the switches use, and programs them. It keeps doing this as links come and go. Because routing is centrally computed and pushed to the switches, the fabric can be tuned deterministically, which helps deliver the predictable performance that collective operations depend on ^[1]^[2].
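The three jobs described above (discover, assign LIDs, compute forwarding tables) can be illustrated with a toy sweep over a small graph. This is a simplified sketch, not how any real subnet manager is implemented: it gives one LID per node rather than per port, uses plain shortest path routing, and the function name `run_subnet_manager` is hypothetical.

```python
from collections import deque


def run_subnet_manager(links, sm_node):
    """Toy subnet manager sweep.

    links: iterable of (node_a, node_b) pairs describing cables.
    sm_node: the node that runs the subnet manager.
    Returns (lids, forwarding) where lids maps node -> LID and
    forwarding[node][dest_lid] is the neighbor to send to next.
    """
    # Build an adjacency map from the cable list.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    # 1. Discovery: breadth-first sweep outward from the SM node.
    discovered, frontier, seen = [sm_node], deque([sm_node]), {sm_node}
    while frontier:
        node = frontier.popleft()
        for peer in neighbors.get(node, []):
            if peer not in seen:
                seen.add(peer)
                discovered.append(peer)
                frontier.append(peer)

    # 2. Addressing: hand out LIDs in discovery order, starting at 1.
    lids = {node: lid for lid, node in enumerate(discovered, start=1)}

    # 3. Routing: for each destination, search outward from it so every
    #    other node learns its next hop on a shortest path.
    forwarding = {node: {} for node in discovered}
    for dest in discovered:
        parent = {dest: None}
        frontier = deque([dest])
        while frontier:
            node = frontier.popleft()
            for peer in neighbors.get(node, []):
                if peer not in parent:
                    parent[peer] = node  # peer reaches dest via node
                    frontier.append(peer)
        for node, next_hop in parent.items():
            if next_hop is not None:
                forwarding[node][lids[dest]] = next_hop
    return lids, forwarding
```

Rerunning the function after adding or removing a cable in `links` stands in for the repeated sweeps a subnet manager performs as links come and go.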

Links are built from lanes. A single lane carries a serial signal in each direction, and ports gang several lanes together for more bandwidth. The common widths are 1x, 4x, and 12x, and the workhorse server port has been 4x for most of the technology's history. The headline data rate of a generation usually refers to a 4x port, so a 400 gigabit per second NDR port is four lanes of roughly 100 gigabits each ^[2]^[8].

What are the InfiniBand generations and data rates?

InfiniBand has advanced through a sequence of named speed grades, each roughly doubling the per lane signaling rate of the one before. The early generations used 8b/10b line encoding, which spends two bits in ten on encoding overhead, so the usable data rate is 80 percent of the raw signaling rate. From FDR onward the encoding moved to the far more efficient 64b/66b, and the grades from HDR onward use PAM4 modulation, which sends two bits per symbol and is paired with forward error correction. The table below lists the per lane signaling rate and the effective per port data rate for a 4x link, which is the figure normally quoted ^[2]^[8]^[9].

Generation	Per lane signaling	4x port effective data rate	Encoding or modulation	Era
SDR	2.5 Gb/s	8 Gb/s	8b/10b	early 2000s
DDR	5 Gb/s	16 Gb/s	8b/10b	mid 2000s
QDR	10 Gb/s	32 Gb/s	8b/10b	around 2008
FDR	14.0625 Gb/s	about 54.5 Gb/s (marketed as 56 Gb/s)	64b/66b	around 2011
EDR	25.78125 Gb/s	100 Gb/s	64b/66b	around 2014
HDR	53.125 Gb/s	200 Gb/s	64b/66b with PAM4	around 2018
NDR	about 106.25 Gb/s	400 Gb/s	PAM4	around 2021
XDR	about 212.5 Gb/s	800 Gb/s	PAM4	from 2024
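The relationship between signaling rate, lane count, and line encoding can be checked with a short calculation. The sketch below covers only SDR through EDR, where line code efficiency alone explains the effective rate; the later grades also carry forward error correction overhead, which this simple model does not capture. The name `effective_rate_gbps` is hypothetical.

```python
from fractions import Fraction

# Per lane signaling rate in Gb/s and line code efficiency for the
# generations up to EDR. Signaling rates are taken from the table above.
GENERATIONS = {
    "SDR": (Fraction("2.5"), Fraction(8, 10)),
    "DDR": (Fraction("5"), Fraction(8, 10)),
    "QDR": (Fraction("10"), Fraction(8, 10)),
    "FDR": (Fraction("14.0625"), Fraction(64, 66)),
    "EDR": (Fraction("25.78125"), Fraction(64, 66)),
}


def effective_rate_gbps(generation: str, lanes: int = 4) -> float:
    """Usable data rate of a port: lanes x signaling rate x code efficiency."""
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    signaling, efficiency = GENERATIONS[generation]
    return float(lanes * signaling * efficiency)
```

With four lanes this gives 8, 16, and 32 Gb/s for SDR, DDR, and QDR and exactly 100 Gb/s for EDR. For FDR it gives roughly 54.5 Gb/s, slightly below the familiar 56 Gb/s figure, because 56 is the rounded raw signaling rate of four 14.0625 Gb/s lanes before the 64b/66b overhead is removed.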

The named grades come from a marketing vocabulary that the IBTA has used for years, where SDR is single data rate, DDR is double, QDR is quad, and then FDR, EDR, HDR, NDR, and XDR continue the pattern as fourteen, enhanced, high, next, and extreme data rate. The association publishes a roadmap that extends past XDR to GDR, so the cadence is expected to continue ^[2]^[8]. Higher widths multiply these numbers, so a 12x XDR link reaches about 2.4 terabits per second, though 4x ports, and the aggregation of many of them, are the usual way large fabrics are built.

NVIDIA Quantum and SHARP

NVIDIA completed its purchase of Mellanox on April 27, 2020, in a deal valued at about 6.9 billion dollars and first announced in March 2019, after which the InfiniBand product line became part of NVIDIA's networking business ^[3]. Framing the rationale at the time of the announcement, NVIDIA founder and CEO Jensen Huang said the companies would "unite NVIDIA's accelerated computing platform with Mellanox's world-renowned accelerated networking platform under one roof to create next-generation datacenter-scale computing solutions" ^[3].

NVIDIA brands its InfiniBand switches as Quantum. The HDR generation switch was the original Quantum at 200 gigabits per port. Quantum-2 followed at the NDR generation, delivering a single switch with 64 ports of 400 gigabits, built from 32 physical OSFP cages that can also split into 128 ports of 200 gigabits, for an aggregate 51.2 terabits per second of bidirectional throughput and more than 66.5 billion packets per second ^[4]^[10]. At its 2024 developer conference NVIDIA introduced Quantum-X800, an XDR class platform built on 200 gigabit per lane SerDes that pushes per port speed to 800 gigabits. Its Q3400 switch provides 144 ports of 800 gigabits and can connect up to 10,368 network adapters in a two level fat tree, positioning it squarely at trillion parameter scale AI ^[10]^[11].

A feature that sets the NVIDIA fabric apart is in network computing through SHARP, the Scalable Hierarchical Aggregation and Reduction Protocol. Distributed training spends a large share of its network time on collective operations, the most important being the all reduce that sums gradients across every GPU and hands the result back to all of them. Normally that sum is computed by passing partial results around the network many times. SHARP moves the arithmetic into the switches themselves. As data flows up the tree, switch silicon adds the contributions together, so the reduction happens once, in the fabric, instead of being carried back and forth among the endpoints. This cuts the volume of data on the wire and lowers the time the collective takes, and NVIDIA reports meaningful speedups for the operations that dominate large training runs ^[12]^[13]. The fourth generation of the protocol, SHARPv4, ships on Quantum-X800 and adds FP8 precision and new collective operations such as ReduceScatter and ScatterGather ^[11]. SHARP is one of the clearest examples of treating the network as an active part of the computer rather than a passive pipe.
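The contrast between endpoint based and in network reduction can be illustrated with a toy model. This is not the SHARP protocol: it is a two tier sketch in which "switches" are plain Python loops, the ring formula is the standard textbook count for a ring all reduce, and the names `ring_allreduce_bytes_sent` and `in_network_allreduce` are hypothetical.

```python
def ring_allreduce_bytes_sent(num_endpoints: int, message_bytes: int) -> float:
    """Bytes each endpoint transmits in a classic ring all reduce."""
    n = num_endpoints
    if n < 1:
        raise ValueError("need at least one endpoint")
    return 2 * (n - 1) / n * message_bytes


def in_network_allreduce(host_vectors, hosts_per_switch):
    """Toy two tier reduction: leaf switches sum their hosts, a root sums leaves.

    host_vectors: list of equal-length lists of numbers, one per host.
    Returns (results_per_host, elements_sent_per_host).
    """
    if not host_vectors or hosts_per_switch < 1:
        raise ValueError("need at least one host and one host per switch")
    length = len(host_vectors[0])
    # Leaf tier: each switch adds up the vectors of its attached hosts.
    leaf_sums = []
    for start in range(0, len(host_vectors), hosts_per_switch):
        group = host_vectors[start:start + hosts_per_switch]
        leaf_sums.append([sum(column) for column in zip(*group)])
    # Root tier: one more addition across the leaf partial sums.
    total = [sum(column) for column in zip(*leaf_sums)]
    # The root's result is sent back down, so every host gets a copy.
    results = [list(total) for _ in host_vectors]
    # Each host injected its vector exactly once on the way up.
    return results, length
```

In this model each host sends its vector once regardless of cluster size, while the ring count approaches twice the message size per endpoint as the number of endpoints grows, which is the intuition behind moving the sum into the fabric.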

How does InfiniBand differ from Ethernet and RoCE?

The natural alternative to InfiniBand is Ethernet, which is cheaper, more familiar, and present in every data center already. The gap in raw capability narrowed once Ethernet gained RDMA through RoCE, short for RDMA over Converged Ethernet. RoCE lets an Ethernet network carry the same kernel bypass transfers that InfiniBand pioneered, and the latest version routes those transfers across ordinary IP networks. With RoCE, an organization can get much of the latency benefit of RDMA while staying on Ethernet switches and Ethernet operations skills.

The practical differences come down to how each fabric handles congestion and loss. InfiniBand uses a credit based flow control scheme in which a sender only transmits when the receiver has signaled that it has buffer space, so the fabric is effectively lossless by design and behaves predictably under heavy load. Ethernet was built to tolerate dropped packets and to recover from them, and making it lossless enough for RDMA requires careful tuning of priority flow control and congestion notification. When that tuning is right, RoCE performs very well, but it is more sensitive to misconfiguration, and the failure modes can be subtle at the scale of an AI cluster. InfiniBand trades some of Ethernet's ubiquity and lower cost for behavior that is easier to make deterministic ^[7]^[14].
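The credit mechanism can be sketched as a small simulation. It is a deliberately simplified model with one credit per receive buffer slot and a single link, not the wire level scheme in the InfiniBand specification; `CreditLink` and `transfer` are hypothetical names.

```python
from collections import deque


class CreditLink:
    """Toy link with receiver granted credits, one credit per buffer slot."""

    def __init__(self, receiver_buffer_slots: int):
        if receiver_buffer_slots < 1:
            raise ValueError("receiver needs at least one buffer slot")
        self.credits = receiver_buffer_slots  # sender's view of free slots
        self.receiver_buffer = deque()
        self.peak_occupancy = 0               # highest buffer fill observed

    def try_send(self, packet) -> bool:
        # The sender may transmit only while it holds at least one credit.
        if self.credits == 0:
            return False                      # wait instead of dropping
        self.credits -= 1
        self.receiver_buffer.append(packet)
        self.peak_occupancy = max(self.peak_occupancy, len(self.receiver_buffer))
        return True

    def receiver_drain(self, count: int = 1) -> list:
        # Consuming a packet frees a slot, which returns a credit to the sender.
        drained = []
        for _ in range(min(count, len(self.receiver_buffer))):
            drained.append(self.receiver_buffer.popleft())
            self.credits += 1
        return drained


def transfer(packets, link: CreditLink, drain_every: int = 3) -> list:
    """Push all packets across the link while the receiver drains slowly."""
    if drain_every < 1:
        raise ValueError("drain_every must be at least 1")
    pending, delivered, tick = deque(packets), [], 0
    while pending or link.receiver_buffer:
        tick += 1
        if pending and link.try_send(pending[0]):
            pending.popleft()
        if tick % drain_every == 0:
            delivered.extend(link.receiver_drain())
    return delivered
```

Even though the receiver in this sketch drains more slowly than the sender offers packets, nothing is lost and the buffer never overflows: the sender simply stalls when credits run out, which is the behavior the paragraph above contrasts with drop and recover Ethernet.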

A newer challenger is Ultra Ethernet, the work of the Ultra Ethernet Consortium, an effort hosted by the Linux Foundation and launched in 2023 with backing from a broad group of vendors and large operators. Its stated goal is to take Ethernet and rework its transport, congestion control, and packet handling specifically for AI and HPC workloads, so that an open, multi vendor Ethernet fabric can match what a purpose built interconnect offers at this scale ^[14]^[15]. Ultra Ethernet is an attempt to close the gap from the Ethernet side, and it reflects how much money is now flowing into AI fabrics. Whether it displaces InfiniBand at the high end or simply gives buyers a credible second option is still playing out, and several of the largest operators are building very large clusters on Ethernet variants today.

Why is InfiniBand the default fabric for large AI clusters?

InfiniBand is the default compute fabric in NVIDIA's reference architectures for large training. The DGX SuperPOD design, NVIDIA's blueprint for stitching many GPU servers into one supercomputer, uses Quantum InfiniBand for the network that carries traffic between nodes, with the current generation built on Quantum-2 at 400 gigabits and the newer designs moving to 800 ^[10]^[11]. Inside each server NVLink connects the GPUs, and InfiniBand connects the servers, so a job sees a smooth hierarchy of bandwidth from the local GPUs out to the rest of the cluster. Many of the named supercomputers and research clusters listed on the TOP500 use InfiniBand for the same reasons, and the interconnect has long held a large share of the systems on that list. On the November 2024 list InfiniBand connected 66 of the top 100 systems and powered six of the ten most energy efficient systems on the companion Green500 ranking ^[5]^[16].

The reason the fabric earns its place is the combination of three things. Bandwidth keeps the GPUs fed, low latency keeps synchronization tight, and SHARP offloads the collectives that would otherwise saturate the network. For a training run where every step ends in a global synchronization, shaving microseconds and cutting traffic at each step compounds across millions of steps into a real reduction in wall clock time and cost. This is why operators who measure their clusters in tens of thousands of GPUs have been willing to pay the premium that InfiniBand carries over commodity Ethernet.

What are the limitations of InfiniBand?

InfiniBand is not free of drawbacks. It costs more than Ethernet for comparable port speeds, the operational model is different enough that teams need specific expertise, and for most of its life the ecosystem has been dominated by a single supplier. Supply became even more concentrated after the Mellanox acquisition put the leading switches and adapters under one roof. That concentration gives buyers a coherent, well integrated stack, but it also leaves them dependent on one vendor's roadmap and pricing.

The competitive picture is shifting because Ethernet is improving fast and because several of the largest AI builders prefer an open fabric they can source from many vendors. RoCE already serves large RDMA deployments, and Ultra Ethernet aims to give the open ecosystem a transport tuned for AI. Some hyperscale operators have publicly built enormous GPU clusters on Ethernet rather than InfiniBand, which shows that the high end is no longer a one horse race. At the same time NVIDIA continues to push InfiniBand forward with each Quantum generation and also sells its own Ethernet line, Spectrum-X, for customers who want RDMA on Ethernet from the same vendor. The likely outcome is a market with two strong fabric families: InfiniBand where deterministic low latency and tight integration with NVIDIA's stack matter most, and advanced Ethernet where openness, cost, and reuse of existing skills win. For now InfiniBand remains the reference choice for the largest and most performance sensitive AI training systems, and its role connecting GPUs across the data hall is firmly established ^[5]^[10]^[14].

References

1. InfiniBand Trade Association. About the IBTA. https://www.infinibandta.org/about-infiniband/
2. InfiniBand Trade Association. InfiniBand Roadmap. https://www.infinibandta.org/infiniband-roadmap/
3. NVIDIA. NVIDIA to Acquire Mellanox for $6.9 Billion (March 2019); NVIDIA Completes Acquisition of Mellanox (April 27, 2020). https://nvidianews.nvidia.com/news/nvidia-to-acquire-mellanox-for-6-9-billion
4. NVIDIA. InfiniBand Networking. https://www.nvidia.com/en-us/networking/products/infiniband/
5. TOP500. The List, interconnect statistics. https://www.top500.org/
6. NVIDIA. ConnectX InfiniBand Adapters. https://www.nvidia.com/en-us/networking/infiniband-adapters/
7. RDMA Consortium and IBTA. RDMA over Converged Ethernet (RoCE) overview, IBTA. https://www.infinibandta.org/roce-initiative/
8. InfiniBand Trade Association. InfiniBand Architecture Specification, technical overview. https://www.infinibandta.org/ibta-specification/
9. Wikipedia. InfiniBand. https://en.wikipedia.org/wiki/InfiniBand
10. NVIDIA. NVIDIA Quantum-2 InfiniBand Platform. https://www.nvidia.com/en-us/networking/quantum2/
11. NVIDIA. NVIDIA Quantum-X800 InfiniBand Switches. https://www.nvidia.com/en-us/networking/products/infiniband/quantum-x800/
12. NVIDIA. NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP). https://docs.nvidia.com/networking/display/sharpv300
13. NVIDIA Developer Blog. Advancing Performance with NVIDIA SHARP In-Network Computing. https://developer.nvidia.com/blog/advancing-performance-with-nvidia-sharp-in-network-computing/
14. Ultra Ethernet Consortium. Overview and specification goals. https://ultraethernet.org/
15. Linux Foundation. Ultra Ethernet Consortium launch announcement. https://www.linuxfoundation.org/press/announcing-ultra-ethernet-consortium-dedicated-to-high-performance-networking
16. InfiniBand Trade Association. InfiniBand and RoCE Advances Further in the TOP500 November 2024 List. https://www.infinibandta.org/infiniband-and-roce-advances-further-in-the-top500-november-2024-list/
