NVLink

Updated Jun 21, 2026

NVLink is NVIDIA's proprietary high-bandwidth, low-latency, cache-coherent point-to-point interconnect for directly connecting GPUs to other GPUs and, in some configurations, to host CPUs. Introduced in 2014 and first shipped in 2016 with the Tesla P100, it now reaches 1.8 TB/s of aggregate bidirectional bandwidth per GPU in its fifth generation (Blackwell, 2024), roughly fourteen times the bandwidth of PCIe Gen5 x16.^[6]^[16] Across five generations NVLink has become the dominant scale-up fabric inside modern AI servers, supercomputers, and rack-scale systems such as the DGX GB200 NVL72, where it ties 72 GPUs into a single 130 TB/s memory domain.^[6] The companion NVSwitch chip is the crossbar that connects many NVLink endpoints together, and in 2025 NVIDIA extended the technology to third-party silicon through NVLink Fusion.^[16]

NVLink was created to overcome the bandwidth and latency limitations of PCI Express in multi-GPU systems.^[12] NVLink is the physical interconnect, NVSwitch is the switch chip that joins many NVLink endpoints inside a node or a rack, and NVLink-C2C ("chip-to-chip") is a die-to-die variant used to bond NVIDIA's Grace CPU to Hopper or Blackwell GPUs into a single coherent superchip.^[5] NVIDIA Collective Communications Library (NCCL) collectives such as all-reduce, all-gather, and reduce-scatter run directly on top of NVLink whenever GPUs in a distributed training job share an NVLink fabric, which is why nearly every published frontier LLM training run on NVIDIA hardware depends on it.

What is NVLink used for?

NVLink exists to move tensors between GPUs fast enough that large neural networks can be trained and served across many accelerators as if they were one device. Modern frontier models do not fit on a single GPU, so the work is sharded across many GPUs using a mix of data parallelism, tensor parallelism, pipeline parallelism, and expert parallelism. Each of those strategies generates heavy GPU-to-GPU communication (all-reduce, all-gather, reduce-scatter), and NVLink is the fabric that carries it inside a server or rack at hundreds of gigabytes per second per GPU, far beyond what PCIe can sustain. The detailed mechanics of why each parallelism strategy needs NVLink are covered in the sections below.

Background and motivation

By the early 2010s, the dominant host interconnect for accelerators was PCI Express. PCIe Gen3 x16, the standard interface for GPUs through the Maxwell generation, delivered roughly 16 GB/s per direction (about 32 GB/s aggregate). For a single accelerator that was acceptable, but for the kinds of multi-GPU servers that deep learning was beginning to demand, PCIe became the bottleneck. Gradient exchange, parameter synchronization, and activation transfer between GPUs all traveled across the same PCIe root complex and, often, across the host CPU. As models grew, the time spent shuffling tensors across PCIe began to dominate the time spent on compute.

NVIDIA announced NVLink in March 2014 as an answer to this problem.^[12] The design goals were straightforward: much higher bandwidth than PCIe, lower latency, support for direct GPU-to-GPU memory access, and a cache-coherent memory model so that multiple GPUs could share data without the software overhead of explicit copies. The first commercial implementation arrived two years later inside the Tesla P100 and the original DGX-1 system.^[8]

When was each generation of NVLink released?

Five generations of NVLink have shipped to date, each tied to a specific NVIDIA GPU computing microarchitecture.^[12] The table below summarizes the per-link signaling rate, the data rate per direction per link, the number of links exposed by the flagship data-center GPU of that generation, and the resulting aggregate bidirectional bandwidth per GPU.

Generation	Year	Microarchitecture	Flagship GPU	Per-link signaling	Data rate per direction per link	Links per GPU	Aggregate bandwidth per GPU
NVLink 1.0	2016	Pascal	Tesla P100	20 GT/s	20 GB/s	4	160 GB/s
NVLink 2.0	2017	Volta	Tesla V100	25 GT/s	25 GB/s	6	300 GB/s
NVLink 3.0	2020	Ampere	A100	50 GT/s	25 GB/s	12	600 GB/s
NVLink 4.0	2022	Hopper	H100	50 GT/s	25 GB/s	18	900 GB/s
NVLink 5.0	2024	Blackwell	B100 / B200 / GB200	100 GT/s	50 GB/s	18	1,800 GB/s

A few conventions are worth flagging because the marketing numbers can be confusing. NVIDIA generally quotes "aggregate bandwidth" as the bidirectional sum across all links, so a 900 GB/s H100 is really 450 GB/s in each direction.^[7] Per-link bandwidth is also typically the bidirectional figure: a single NVLink 4.0 link is 25 GB/s in each direction, often written as 50 GB/s bidirectional. Generations 3.0 and 4.0 are listed with the same 50 GT/s per-pair signaling and the same 25 GB/s per-direction-per-link figure.^[3] The improvement from A100 to H100 came from raising the link count from 12 to 18 and from cutting the number of differential pairs per link in half, which let NVIDIA pack more links into roughly the same die area.^[3] Because per-link bandwidth did not change, halving the pairs implies that each remaining pair carries twice as much data as before. The jump to NVLink 5.0 doubled per-link bandwidth from 50 GB/s to 100 GB/s bidirectional while holding the link count at 18, which is how Blackwell reaches 1.8 TB/s, exactly twice the H100.^[16]
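The relationship between the per-link and per-GPU columns can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary and function names are made up for this example, and the inputs are copied from the table above.

```python
# (GB/s per direction per link, links per flagship GPU), copied from the table.
# The names below are hypothetical helpers, not part of any NVIDIA tool.
GENERATIONS = {
    "NVLink 1.0": (20, 4),
    "NVLink 2.0": (25, 6),
    "NVLink 3.0": (25, 12),
    "NVLink 4.0": (25, 18),
    "NVLink 5.0": (50, 18),
}

def per_direction_gbs(per_link_gbs, links):
    """Bandwidth one GPU can send (or receive) across all of its links."""
    return per_link_gbs * links

def aggregate_gbs(per_link_gbs, links):
    """The headline 'aggregate' figure: both directions summed."""
    return 2 * per_direction_gbs(per_link_gbs, links)

for name, (per_link, links) in GENERATIONS.items():
    one_way = per_direction_gbs(per_link, links)
    both_ways = aggregate_gbs(per_link, links)
    print(f"{name}: {one_way} GB/s per direction, {both_ways} GB/s aggregate")
```

Reading the output against the table makes the convention explicit: the headline number is always twice what a GPU can send in one direction.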

NVLink 1.0 (Pascal, 2016)

NVLink 1.0 debuted on the GP100 die inside the Tesla P100. Each link carried 20 GB/s in each direction, and each P100 exposed four links, giving 80 GB/s per direction or 160 GB/s aggregate per GPU.^[8] That was already roughly five times the bandwidth of PCIe Gen3 x16 and made it possible to build the first DGX-1 with eight P100s wired together in a hybrid cube-mesh topology.^[8] NVLink 1.0 was also the version IBM integrated into POWER8+ CPUs, which is how the Summit supercomputer's predecessors first put NVLink directly between a CPU and a GPU instead of going through PCIe.^[12]

NVLink 2.0 (Volta, 2017)

Volta's V100 stepped the per-link signaling from 20 GT/s to 25 GT/s and increased the link count from four to six. The result was 25 GB/s per direction per link and 300 GB/s aggregate per GPU.^[1] NVLink 2.0 also introduced cache coherence with IBM POWER9, which is the configuration Oak Ridge's Summit and Lawrence Livermore's Sierra supercomputers used: each Summit node had two POWER9 CPUs and six V100 GPUs (Sierra nodes paired two POWER9 CPUs with four V100s), all connected with NVLink 2.0, giving the CPU and GPU a shared coherent address space.^[12]

NVLink 3.0 (Ampere, 2020)

Ampere's A100 doubled the per-pair signaling rate to 50 Gbit/s while halving the number of pairs per link, then doubled the link count from six to twelve. Each NVLink 3.0 link still delivered 25 GB/s per direction, but with twelve of them per A100 the aggregate bandwidth reached 600 GB/s.^[12] The A100 generation also introduced second-generation NVSwitch, which NVIDIA used inside the DGX A100 to connect eight A100s in an all-to-all topology where every GPU can reach any other GPU at its full 600 GB/s of NVLink bandwidth.^[12]

NVLink 4.0 (Hopper, 2022)

Hopper's H100 kept the same 50 GT/s per-pair signaling but pushed the link count from twelve to eighteen. Each H100 SXM5 module exposes 18 NVLink 4.0 links for 900 GB/s of aggregate bandwidth, which NVIDIA consistently advertises as roughly seven times the bandwidth of PCIe Gen5 x16 (128 GB/s).^[2]^[7] The H100 NVL variant, which is a PCIe-form-factor product designed for inference of large language models, exposes 600 GB/s of NVLink instead of 900 GB/s.^[7] Hopper paired with third-generation NVSwitch, whose total switching capacity rose to 13.6 Tbit/s from 7.2 Tbit/s in the prior generation, and added in-network reduction (the SHARP accelerator) so that all-reduce can finish inside the switch fabric rather than touching every endpoint.^[10]

NVLink 5.0 (Blackwell, 2024 onward)

Blackwell's B100, B200, and the dual-die GB200 superchip use fifth-generation NVLink. The signaling rate doubles to 100 GT/s, and the per-direction bandwidth per link doubles correspondingly to 50 GB/s. With 18 links per GPU, aggregate bandwidth reaches 1.8 TB/s, twice that of an H100 and roughly fourteen times PCIe Gen5 x16.^[6]^[16] NVIDIA describes a single Blackwell GPU as supporting "up to 18 NVLink 100 gigabyte-per-second (GB/s) connections for a total bandwidth of 1.8 terabytes per second (TB/s)."^[16] NVLink 5.0 is the foundation of the GB200 NVL72 rack, which is the largest single NVLink domain NVIDIA had ever shipped at launch.^[6] Through the NVLink Switch System, fifth-generation NVLink can be extended beyond a single rack to connect up to 576 GPUs in one non-blocking domain with more than 1 petabyte per second of total bandwidth.^[6]^[16]

NVSwitch

NVLink by itself is a point-to-point link. To connect more than a handful of GPUs in a fully connected, high-bandwidth fashion you need a switch, and that switch is NVSwitch. NVSwitch first appeared in 2018 inside the DGX-2, where 16 V100 GPUs were tied together by twelve NVSwitch chips, giving every GPU full 300 GB/s of NVLink bandwidth to every other GPU in the system.^[9] That was the first time a 16-GPU server behaved, from the application's perspective, as a single shared-memory accelerator.^[9]

NVSwitch generations track NVLink generations but are numbered separately. First-generation NVSwitch shipped with V100 in DGX-2.^[9] Second-generation NVSwitch shipped with A100 in DGX A100, and used six switch chips per system rather than twelve.^[12] Third-generation NVSwitch shipped with H100 in DGX H100 and HGX H100 baseboards, with 13.6 Tbit/s per chip and the SHARP in-network reduction engine.^[10] Fourth-generation NVSwitch shipped with Blackwell in 2024, with each switch chip exposing 72 NVLink 5.0 ports.^[6]

The most important thing NVSwitch enables is non-blocking all-to-all bandwidth. Without a switch, an N-GPU system has to share its NVLink budget across every pair, which limits both the topology and the bisection bandwidth. With NVSwitch, every GPU can talk to every other GPU at full link bandwidth simultaneously, which is exactly what NCCL collectives like all-reduce and all-gather want.^[9]
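A rough way to see why the switch matters is to compare how much bandwidth one GPU has toward a single peer with and without it. The sketch below is a deliberately simplified model, not a description of any shipping topology: it assumes a direct full mesh splits links evenly across peers (real designs such as the hybrid cube mesh do not), and the function names are hypothetical.

```python
def links_per_peer_direct(links_per_gpu, num_gpus):
    """Links a GPU can dedicate to each peer in a direct full mesh.

    Assumes an even split; 0 means a full mesh cannot be wired at all.
    """
    if num_gpus < 2:
        raise ValueError("need at least two GPUs")
    return links_per_gpu // (num_gpus - 1)

def per_peer_gbs(links_per_gpu, num_gpus, link_gbs, switched):
    """Bidirectional bandwidth from one GPU to one chosen peer.

    link_gbs is the bidirectional per-link figure (e.g. 50 for NVLink 2.0).
    With a non-blocking switch, all of a GPU's links can serve that peer.
    """
    if switched:
        return links_per_gpu * link_gbs
    return links_per_peer_direct(links_per_gpu, num_gpus) * link_gbs

# 16 x V100 (6 links each): no direct full mesh exists, the switch gives 300 GB/s.
print(per_peer_gbs(6, 16, 50, switched=False), per_peer_gbs(6, 16, 50, switched=True))
# 8 x A100 (12 links each): one link per peer directly, 600 GB/s through a switch.
print(per_peer_gbs(12, 8, 50, switched=False), per_peer_gbs(12, 8, 50, switched=True))
```

The same model also shows why eight P100s or V100s could not be fully connected: with four or six links and seven peers, the even split comes out to zero links per peer.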

DGX systems and the rise of NVLink-based servers

NVIDIA's DGX line has been the reference design for NVLink-based servers from the beginning, and the DGX generations roughly map onto the NVLink generations.^[13]

System	Year	GPUs	NVLink generation	Topology	Notable detail
DGX-1 (P100)	2016	8 x P100	1.0	Hybrid cube mesh	First commercial NVLink server
DGX-1 (V100)	2017	8 x V100	2.0	Hybrid cube mesh	First V100 system, no NVSwitch
DGX-2	2018	16 x V100	2.0	Full NVSwitch fabric	First NVSwitch system, 2.4 TB/s bisection
DGX A100	2020	8 x A100	3.0	Full NVSwitch fabric	Six 2nd-gen NVSwitch chips
DGX H100	2022	8 x H100	4.0	Full NVSwitch fabric	3.6 TB/s bisection, SHARP reductions
DGX GB200 NVL72	2024	72 x B200	5.0	Rack-scale NVSwitch	130 TB/s NVLink domain bandwidth

The DGX-1 with eight P100s used a hybrid cube-mesh because the GPUs only had four NVLinks each, so a fully connected topology was not possible without a switch.^[8] The DGX-2 was the first system to behave as a single 16-GPU device, and it produced 2.4 TB/s of bisection bandwidth and 75 GB/s of all-reduce bandwidth.^[9] The DGX A100 dropped back to eight GPUs but used the much larger A100 NVLink budget to give every pair the full 600 GB/s.^[12] The DGX H100 raised that to 3.6 TB/s bisection and 450 GB/s of all-reduce.^[2]

The big jump arrived with the GB200 NVL72. Instead of stopping at eight GPUs in a single chassis, NVIDIA used fifth-generation NVLink and fourth-generation NVSwitch to extend the NVLink domain across an entire liquid-cooled rack.^[6] Seventy-two Blackwell GPUs and 36 Grace CPUs sit in eighteen compute trays, with nine NVLink Switch trays providing the all-to-all fabric.^[6] The result is a single NVLink domain with 130 TB/s of aggregate GPU communication bandwidth, which NVIDIA treats as the basic unit of an "AI factory" rack.^[6] Before NVL72, the standard NVLink domain in the Hopper generation was eight GPUs (DGX H100), and the DGX-2 had topped out at sixteen. Going from eight to seventy-two in a single coherent fabric is the largest single-generation jump in scale-up topology NVIDIA has ever made.
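The system-level figures in this section follow from the per-GPU numbers. The sketch below assumes a non-blocking fabric, so that bisection bandwidth is simply half the GPUs exchanging data with the other half at full rate; that formula is an inference that happens to reproduce the quoted figures, not something the sources spell out. The function names are hypothetical.

```python
def bisection_tbs(num_gpus, aggregate_gbs_per_gpu):
    """Bidirectional bandwidth across an even split of the GPUs, in TB/s.

    Assumes a non-blocking fabric: num_gpus / 2 GPUs each move their full
    aggregate bandwidth across the cut.
    """
    return num_gpus * aggregate_gbs_per_gpu / 2 / 1000

def domain_aggregate_tbs(num_gpus, aggregate_gbs_per_gpu):
    """Sum of every GPU's aggregate NVLink bandwidth, in TB/s."""
    return num_gpus * aggregate_gbs_per_gpu / 1000

print(bisection_tbs(16, 300))          # DGX-2: 16 V100s at 300 GB/s each
print(bisection_tbs(8, 900))           # DGX H100: 8 H100s at 900 GB/s each
print(domain_aggregate_tbs(72, 1800))  # GB200 NVL72: 72 GPUs at 1.8 TB/s each
```

The NVL72 result comes out just under 130 TB/s, which is consistent with the rounded figure NVIDIA quotes.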

NVLink-C2C

NVLink-C2C is a die-to-die variant of NVLink built specifically to bond a Grace CPU to a Hopper or Blackwell GPU on the same package or board. It first appeared in the Grace Hopper Superchip (GH200) and now anchors the GB200 superchip as well.^[4] NVLink-C2C delivers 900 GB/s of bidirectional bandwidth between the CPU and GPU, which NVIDIA describes as roughly seven times the bandwidth of x16 PCIe Gen5.^[5] It is also dramatically more energy-efficient: 1.3 picojoules per bit transferred, which NVIDIA cites as more than five times better than PCIe Gen5.^[11]
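To put the energy figure in perspective, it can be converted into watts at a given traffic level. This is a back-of-the-envelope sketch only: it assumes the 1.3 pJ/bit figure applies uniformly to every bit and that the link is saturated at the full 900 GB/s, neither of which the sources state. The function name is hypothetical.

```python
def link_power_watts(picojoules_per_bit, gigabytes_per_second):
    """Power spent moving data at a steady rate, given energy per bit."""
    bits_per_second = gigabytes_per_second * 1e9 * 8
    return picojoules_per_bit * 1e-12 * bits_per_second

# NVLink-C2C at its quoted 1.3 pJ/bit, fully saturated at 900 GB/s.
print(f"{link_power_watts(1.3, 900):.2f} W")
```

Under those assumptions the link itself would draw on the order of ten watts, a small slice of a superchip's power budget.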

The more important property is coherence. NVLink-C2C is memory-coherent, meaning the GPU and the CPU share a single, hardware-managed address space.^[4] In NVIDIA's words, "NVLink-C2C memory coherency enables developers to transfer only the data they need, and not migrate memory pages back and forth between the CPU and GPU," with CPU and GPU threads concurrently and transparently accessing both CPU- and GPU-resident memory.^[4] Atomic operations cross the boundary natively.^[4] For model parallelism and large-model inference this matters because the GPU can spill embedding tables, KV caches, and other oversized data structures into the CPU's much larger LPDDR5X memory pool without paying PCIe transfer costs every time it needs them.^[4]

What is NVLink Fusion?

NVLink Fusion is NVIDIA's program, announced May 18, 2025 at Computex, that licenses NVLink interconnect technology to third parties so they can integrate NVLink directly into their own custom silicon and connect it to NVIDIA GPUs and CPUs.^[16] At launch the first custom-silicon partners were MediaTek, Marvell, Alchip Technologies, Astera Labs, Synopsys, and Cadence, while Fujitsu and Qualcomm Technologies were named as CPU partners whose processors can be paired with NVIDIA GPUs over NVLink.^[16] The goal is to let customers build semi-custom AI infrastructure in which custom CPUs or accelerators participate in NVLink-based clusters alongside NVIDIA GPUs, rather than being limited to PCIe.^[16] NVIDIA CEO Jensen Huang framed the launch in sweeping terms: "A tectonic shift is underway: for the first time in decades, data centers must be fundamentally rearchitected, AI is being fused into every computing platform."^[16] NVLink Fusion uses the same fifth-generation NVLink that delivers 1.8 TB/s per GPU, and NVIDIA positions it as a way for cloud providers to scale out AI factories toward millions of GPUs.^[16]

Cache coherence and the memory model

NVLink is more than a fast pipe. From the second generation onward, links can carry coherent memory traffic, which means that two GPUs (or a CPU and a GPU) connected by NVLink behave like nodes in a coherent NUMA system rather than two separate devices that happen to be wired together.^[12] A GPU can issue a load to a memory address that physically lives in another GPU's HBM, and the cache fabric will resolve it without explicit DMA. CUDA programs see this as Unified Virtual Addressing and, with newer GPUs, as full Unified Memory across the NVLink domain.

This matters for two reasons. First, it lets the CUDA runtime move data lazily, only fetching cache lines when they are actually touched, which is much more efficient than pre-staging entire tensors. Second, it lets compilers and frameworks treat the NVLink domain as one big memory pool. Tensor parallelism, in particular, depends on this: when a single matrix multiply is sharded across eight GPUs, each GPU has to read partial results from every other GPU during the all-reduce that follows, and the cost of that all-reduce is dominated by NVLink bandwidth.^[14]

How does NVLink differ from PCIe?

PCIe is still the standard host interconnect, and every NVIDIA data-center GPU also exposes a PCIe interface for talking to the host CPU and to NICs. But for GPU-to-GPU communication inside a server, NVLink wins by an enormous margin. The headline contrast: PCIe Gen5 x16 tops out around 128 GB/s aggregate, while NVLink 5.0 delivers 1,800 GB/s per GPU, roughly fourteen times more.^[16]

Interconnect	Per-direction bandwidth	Aggregate bandwidth	Notes
PCIe Gen3 x16	~16 GB/s	~32 GB/s	P100-era host link
PCIe Gen4 x16	~32 GB/s	~64 GB/s	A100-era host link
PCIe Gen5 x16	~64 GB/s	~128 GB/s	H100 / Blackwell host link
NVLink 4.0 (H100)	450 GB/s	900 GB/s	About 7x PCIe Gen5 x16
NVLink 5.0 (B200)	900 GB/s	1,800 GB/s	About 14x PCIe Gen5 x16
NVLink-C2C (GH200)	450 GB/s	900 GB/s	CPU-to-GPU, coherent

The gap matters most for collectives. At the per-direction rates in the table, moving a 100 GB gradient buffer over PCIe Gen5 x16 takes about 1.6 seconds of pure transfer time per GPU pair. Over NVLink 5.0 it takes roughly 110 milliseconds. When that collective happens after every microbatch in a training step, the difference between PCIe and NVLink can be the difference between a model that trains in three months and a model that never finishes.
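The transfer-time comparison depends on which bandwidth convention is used, so it helps to compute it both ways. The sketch below uses the per-direction and aggregate figures from the table and ignores latency, protocol overhead, and the extra traffic a real all-reduce generates. The function name is hypothetical.

```python
def transfer_seconds(buffer_gb, bandwidth_gbs):
    """Pure transfer time for a buffer at a steady bandwidth."""
    return buffer_gb / bandwidth_gbs

BUFFER_GB = 100

# (per-direction GB/s, aggregate GB/s), copied from the table above.
LINKS = {
    "PCIe Gen5 x16": (64, 128),
    "NVLink 5.0": (900, 1800),
}

for name, (one_way, both_ways) in LINKS.items():
    t_one_way = transfer_seconds(BUFFER_GB, one_way)
    t_both = transfer_seconds(BUFFER_GB, both_ways)
    print(f"{name}: {t_one_way * 1000:.0f} ms per-direction, "
          f"{t_both * 1000:.0f} ms at the aggregate rate")
```

Either way the ratio is the same, about fourteen to one. Only the absolute times change with the convention.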

Why does NVLink matter for AI?

The headline reason NVLink exists today is large-model training. Modern frontier language models are too big to fit on a single GPU, so training (and increasingly inference) is sharded across many GPUs using a mix of data parallelism, tensor parallelism, pipeline parallelism, and expert parallelism.^[14] The communication patterns these strategies generate are very different in their bandwidth requirements.

Tensor parallelism is the most NVLink-hungry. It splits a single matrix multiply across N GPUs and requires an all-reduce after every transformer layer's attention and MLP blocks.^[14] The size of the activations being reduced is large (gigabytes per layer for big models) and the operation has to complete before the next layer can start. In practice this means tensor-parallel groups must live inside a single NVLink domain.^[14] For H100-based clusters that means tensor parallelism is bounded at eight (one DGX H100 node); for GB200 NVL72 it can go up to seventy-two without leaving the NVLink fabric.^[6] That is the single biggest reason hyperscalers care about NVL72: it lets them use much larger tensor-parallel degrees, which in turn lets them serve trillion-parameter models with low latency. NVIDIA claims the 72-GPU NVLink domain delivers up to 30x faster real-time trillion-parameter LLM inference than the prior generation.^[6]
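The sensitivity of tensor parallelism to link bandwidth can be estimated with the standard bandwidth-only cost model for a ring all-reduce, in which each GPU sends 2(N-1)/N times the buffer size. This is a sketch under stated assumptions: it ignores latency, in-switch reduction (SHARP), and overlap with compute, and the 1 GB buffer is an arbitrary illustrative size rather than a figure from the sources.

```python
def ring_all_reduce_seconds(buffer_gb, num_gpus, per_direction_gbs):
    """Bandwidth-only estimate of ring all-reduce time.

    Each GPU sends (and receives) 2 * (N - 1) / N times the buffer size:
    once around the ring to reduce, once more to distribute the result.
    """
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * buffer_gb
    return traffic_gb / per_direction_gbs

# An 8-way tensor-parallel group reducing a 1 GB buffer (illustrative size).
nvlink4 = ring_all_reduce_seconds(1, 8, 450)  # H100 NVLink, per direction
pcie5 = ring_all_reduce_seconds(1, 8, 64)     # PCIe Gen5 x16, per direction

print(f"NVLink 4.0: {nvlink4 * 1000:.2f} ms")
print(f"PCIe Gen5:  {pcie5 * 1000:.2f} ms")
```

Because this all-reduce sits on the critical path of every layer, a sevenfold difference per call compounds across the whole forward and backward pass.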

Pipeline parallelism is gentler on the network because the only thing that crosses GPUs is the activation tensor between pipeline stages, but it still benefits from NVLink because pipeline bubbles get smaller as inter-stage latency drops. Data parallelism uses all-reduce on gradients once per step, and FSDP uses all-gather on parameters before each forward pass; both are friendlier to InfiniBand-class inter-node networks but still benefit from NVLink for the intra-node portion of the collective.

NCCL, NVIDIA's collective communication library, is what actually drives the wires. NCCL automatically discovers the topology, picks the best available transport (NVLink, NVSwitch, PCIe, InfiniBand, or RoCE), and chooses an algorithm (ring, tree, or SHARP) for each collective. When NCCL detects an NVLink fabric it prefers it over everything else, and on NVL72 it will try to route as much of the all-reduce as possible through the in-switch SHARP engine.^[10]
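To make the ring algorithm concrete, the following is a textbook ring all-reduce simulated in pure Python on ordinary lists. It is not NCCL's implementation and makes no GPU or NCCL calls; the function name and the list-of-lists representation are assumptions chosen for illustration. Each "rank" holds one buffer, and every step moves one chunk from each rank to its right-hand neighbor.

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over a ring; returns one result list per rank."""
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers) or length % n != 0:
        raise ValueError("buffers must be equal length and divisible by rank count")
    chunk = length // n
    data = [list(b) for b in buffers]  # work on copies, leave inputs untouched

    def span(index):
        """Slice covering chunk `index` (wrapping around the ring)."""
        start = (index % n) * chunk
        return slice(start, start + chunk)

    # Phase 1, reduce-scatter: after n - 1 steps, rank r holds the fully
    # summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot what every rank sends so the sends are simultaneous.
        outgoing = [data[r][span(r - step)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            s = span(r - step)
            data[dst][s] = [a + b for a, b in zip(data[dst][s], outgoing[r])]

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        outgoing = [data[r][span(r + 1 - step)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            data[dst][span(r + 1 - step)] = outgoing[r]

    return data

ranks = [[1, 2, 3, 4], [10, 20, 30, 40]]
print(ring_all_reduce(ranks))
```

Each rank only ever talks to one neighbor and sends one chunk per step, which is why the algorithm keeps every link busy and why its running time is governed by per-link bandwidth.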

The effect on training is large enough to be visible in published numbers. The Megatron-LM paper from NVIDIA, Stanford, and Microsoft reports training GPT-style models of up to 1 trillion parameters on 3,072 A100 GPUs at about 52 percent of theoretical peak FLOPS, which is only possible because NVLink keeps the tensor-parallel all-reduce time small relative to compute.^[14] Many frontier-class runs have not published their parallelism configurations in detail, but public descriptions of training on NVIDIA hardware consistently treat the NVLink-connected node as the unit of tensor parallelism.

Is NVLink open or proprietary?

NVLink is proprietary, and that has been both its greatest strength and the main complaint against it. Because NVIDIA controls the spec, it can iterate quickly: each generation has roughly doubled aggregate bandwidth, and the addition of NVSwitch and SHARP has happened on NVIDIA's own schedule rather than a standards-body schedule. The cost is that there is no second source. If you want NVLink, you have to buy NVIDIA GPUs, NVIDIA switches, and (increasingly) NVIDIA networking. The H100 PCIe NVL variant, which exposes only 600 GB/s instead of 900 GB/s, is a reminder that even NVIDIA's own product segmentation can hide NVLink behind extra cost.^[7] The 2025 NVLink Fusion program softens this slightly by licensing NVLink to partner silicon, but the fabric, switches, and software ecosystem remain NVIDIA's.^[16]

Competitors have responded with their own scale-up fabrics. AMD's Infinity Fabric connects its Instinct MI300X and MI325X GPUs at up to 896 GB/s aggregate per GPU on Infinity Fabric 4. Intel's Xe Link does the same job for Ponte Vecchio, while its Gaudi accelerators scale up over integrated Ethernet ports instead, and Intel has shifted resources around in this area more than once. None of these is interchangeable with NVLink at the silicon level, and none of them currently has anything comparable to the NVL72 rack-scale topology.

The more interesting development is UALink (Ultra Accelerator Link), an open standard whose consortium was incorporated in 2024 with founding members including AMD, Intel, Broadcom, Cisco, Google, HPE, Meta, and Microsoft.^[15] The UALink 200G 1.0 specification was ratified on April 8, 2025, defining an open scale-up fabric that can connect up to 1,024 accelerators with direct read, write, and atomic semantics at 200 Gbit/s per lane.^[17] The consortium represents more than 85 member companies and its explicit goal is to be an open alternative to NVLink and NVSwitch.^[15]^[17] Whether UALink ships in volume, and whether it can match NVIDIA's bandwidth and software ecosystem, is one of the open questions in AI infrastructure for the next several years.

One practical limitation worth noting: NVIDIA removed NVLink from its consumer Ada Lovelace GeForce cards in 2022, and Blackwell consumer cards (RTX 50 series) likewise do not expose NVLink.^[12] NVLink is now effectively a data-center technology. Hobbyists who want to pool VRAM across multiple GPUs at home cannot do it the way they could with the GeForce RTX 3090, NVLink-capable Quadro cards, and earlier Titan cards.

References

1. NVIDIA, "NVIDIA DGX-1 With Tesla V100 System Architecture" white paper. https://images.nvidia.com/content/pdf/dgx1-v100-system-architecture-whitepaper.pdf
2. NVIDIA, "NVIDIA H100 Tensor Core GPU Architecture" white paper, March 2022. https://www.advancedclustering.com/wp-content/uploads/2022/03/gtc22-whitepaper-hopper.pdf
3. NVIDIA Developer Blog, "NVIDIA Hopper Architecture In-Depth." https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
4. NVIDIA Developer Blog, "NVIDIA Grace Hopper Superchip Architecture In-Depth." https://developer.nvidia.com/blog/nvidia-grace-hopper-superchip-architecture-in-depth/
5. NVIDIA, "NVLink-C2C: Chip Interconnect Technology." https://www.nvidia.com/en-us/data-center/nvlink-c2c/
6. NVIDIA, "GB200 NVL72." https://www.nvidia.com/en-us/data-center/gb200-nvl72/
7. NVIDIA, "H100 GPU." https://www.nvidia.com/en-us/data-center/h100/
8. NVIDIA Developer Blog, "NVIDIA DGX-1: The Fastest Deep Learning System." https://developer.nvidia.com/blog/dgx-1-fastest-deep-learning-system/
9. Alben et al., "NVSwitch and DGX-2," Hot Chips 30, 2018. https://www.old.hotchips.org/hc30/2conf/2.01_Nvidia_NVswitch_HotChips2018_DGX2NVS_Final.pdf
10. Ishii et al., "The NVLink-Network Switch," Hot Chips 34, 2022. https://hc34.hotchips.org/assets/program/conference/day2/Network%20and%20Switches/NVSwitch%20HotChips%202022%20r5.pdf
11. Sury et al., "NVLink-C2C: A Coherent Off Package Chip-to-Chip Interconnect with 40Gbps/pin Single-ended Signaling," ISSCC 2023. https://ieeexplore.ieee.org/document/10067395/
12. Wikipedia, "NVLink." https://en.wikipedia.org/wiki/NVLink
13. Wikipedia, "Nvidia DGX." https://en.wikipedia.org/wiki/Nvidia_DGX
14. Narayanan et al., "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM," SC 2021. https://people.eecs.berkeley.edu/~matei/papers/2021/sc_megatron_lm.pdf
15. UALink Consortium and reporting in The Next Platform, "UALink Fires First GPU Interconnect Salvo at Nvidia NVSwitch," April 2025. https://www.nextplatform.com/2025/04/08/ualink-fires-first-gpu-interconnect-salvo-at-nvidia-nvswitch/
16. NVIDIA Newsroom, "NVIDIA Unveils NVLink Fusion for Industry to Build Semi-Custom AI Infrastructure With NVIDIA Partner Ecosystem," May 18, 2025. https://nvidianews.nvidia.com/news/nvidia-nvlink-fusion-semi-custom-ai-infrastructure-partner-ecosystem
17. UALink Consortium, "UALink Consortium Releases the Ultra Accelerator Link 200G 1.0 Specification," April 8, 2025. https://ualinkconsortium.org/blog/ualink-200g-1-0-specification-overview-802/