NVIDIA HGX
Last reviewed
Jun 3, 2026
Sources
11 citations
Review status
Source-backed
Revision
v1 · 2,438 words
NVIDIA HGX is a family of multi-GPU baseboards and platform reference designs produced by NVIDIA that serve as the building block for most of the world's accelerated servers used in artificial intelligence and high-performance computing. At its core, an HGX platform is a single printed circuit board, often called the GPU baseboard or HGX board, that carries four or eight of NVIDIA's highest-end data-center GPUs in the SXM socket form factor, wires them together with the company's NVLink interconnect, and in the eight-GPU version adds embedded NVSwitch fabric chips so that every GPU can talk to every other GPU at full bandwidth. NVIDIA manufactures the baseboard and the on-board interconnect as a fixed, validated assembly, then sells it to server makers and cloud providers, who add CPUs, memory, storage, networking, power, and cooling to produce a complete system. Because the hard engineering of the GPU complex is already done and certified by NVIDIA, HGX lets a broad ecosystem of partners ship dense AI servers quickly while preserving a consistent multi-GPU architecture across vendors.
HGX is one of three closely related NVIDIA data-center platform brands and is frequently confused with the other two. It is the baseboard building block; DGX is NVIDIA's own fully built and branded server appliance; and MGX is a separate, more flexible modular server reference design. Distinguishing the three is central to understanding NVIDIA's data-center strategy, and is treated in detail below.
An HGX platform is best understood as a self-contained GPU subsystem rather than a complete computer. The deliverable is the baseboard plus its mounted GPUs and interconnect. The GPUs use the SXM mezzanine form factor (commonly expanded as Server PCI Express Module, and versioned as SXM2, SXM4, SXM5 and so on across generations) rather than standard PCIe add-in cards. SXM modules bolt directly onto the baseboard, draw power through it, and are cooled by the host system's air or liquid solution. This arrangement allows far higher per-GPU power limits and much denser, higher-bandwidth NVLink wiring than PCIe slots permit, which is why HGX is the form factor of choice for large-scale training.
The defining feature of the eight-GPU HGX board is its embedded NVSwitch fabric. Each GPU exposes a large number of NVLink ports, and the on-board NVSwitch chips form a non-blocking crossbar so that all eight GPUs reach one another simultaneously at the full per-GPU NVLink rate, making the board behave like a single large "super-GPU" for memory-hungry, communication-intensive workloads such as training large transformer models. The four-GPU HGX board omits NVSwitch and instead connects its GPUs with direct point-to-point NVLink.
What HGX deliberately leaves out is just as important as what it includes. The baseboard does not come with host CPUs, system DRAM, boot storage, a chassis, or its power and cooling solution. NVIDIA also fixes the parts that define the GPU complex: GPU count, GPU model, memory configuration, NVLink wiring, and NVSwitch layout are all set by NVIDIA and cannot be altered by the integrator. Partners differentiate everything around the board, choosing host processors (typically Intel Xeon or AMD EPYC, or NVIDIA's own Arm-based CPUs in some designs), the amount of system memory, networking adapters, drive bays, and the air- or liquid-cooling design. The result is that competing vendors' HGX servers share an identical GPU and interconnect architecture while differing in density, serviceability, thermal design, and price.
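The split between what NVIDIA fixes and what partners choose can be pictured as a small data model. The sketch below is illustrative only: the class names, field names, and the CPU and memory values are hypothetical and do not correspond to any NVIDIA or OEM schema. It uses a frozen dataclass for the GPU complex to mirror the point that integrators cannot alter it.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class GpuBaseboard:
    """The NVIDIA-defined GPU complex; frozen because integrators cannot alter it."""
    gpu_model: str
    gpu_count: int
    memory_per_gpu_gb: int
    nvswitch_count: int


@dataclass
class PartnerServer:
    """Everything the integrator chooses around the fixed baseboard."""
    baseboard: GpuBaseboard
    host_cpu: str
    system_memory_gb: int
    cooling: str


# GPU figures follow the article's HGX H200 description; CPU names and
# system memory sizes are placeholders, not real products.
board = GpuBaseboard("H200", gpu_count=8, memory_per_gpu_gb=141, nvswitch_count=4)
air_cooled = PartnerServer(board, host_cpu="x86 vendor A", system_memory_gb=2048, cooling="air")
liquid_cooled = PartnerServer(board, host_cpu="x86 vendor B", system_memory_gb=4096, cooling="liquid")

# Two competing servers differ around the board but share the same GPU complex.
print(air_cooled.baseboard == liquid_cooled.baseboard)

try:
    board.gpu_count = 16  # attempting to change the fixed GPU complex
except FrozenInstanceError:
    print("baseboard fields are fixed")
```

In this model, differentiation lives entirely in `PartnerServer`, which is the sense in which HGX servers from different vendors are architecturally identical at the GPU level.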
The three platforms target different points on the spectrum between turnkey and build-it-yourself.
DGX systems are NVIDIA's own complete, branded servers. NVIDIA takes an HGX baseboard and integrates it into a finished appliance with a fixed set of CPUs, memory, storage, networking, and a validated software stack including NVIDIA AI Enterprise, then sells and supports the whole unit directly. A DGX is therefore a specific, NVIDIA-defined product (for example the DGX H100 or DGX B300) with one bill of materials, while HGX is a component that many vendors build many different products around. Because a DGX bundles NVIDIA's software and direct engineering support, it typically costs more than an equivalent partner server built on the same HGX board. Multiple DGX nodes can be aggregated into reference-architecture clusters such as the DGX SuperPOD.
HGX sits one level below DGX in the stack. It is the GPU baseboard that NVIDIA supplies to original equipment manufacturers (OEMs) and cloud providers as a building block. Crucially, the very same HGX baseboard sits inside NVIDIA's own DGX systems and inside partner servers alike, which is why a DGX and an OEM HGX server of the same generation share identical GPU-to-GPU bandwidth and topology. HGX targets the highest-bandwidth eight-GPU training and inference nodes.
MGX (Modular GPU eXtension) is a different kind of thing again. Rather than a populated board, MGX is a modular server and rack reference architecture: a set of standardized chassis, sleds, power, cooling, and connector specifications that lets OEMs and ODMs build a wide range of accelerated systems from common building blocks and reuse those designs across GPU generations. MGX supports many configurations, including PCIe GPUs, varying GPU counts, and a mix of x86 or NVIDIA Grace CPUs, and it covers inference servers, mixed CPU-GPU nodes, and smaller deployments that do not need a full eight-GPU NVSwitch island. In short, HGX standardizes the GPU complex itself, DGX is NVIDIA's finished system, and MGX standardizes the surrounding server and rack.
The lineage of HGX traces to 2017, when NVIDIA introduced the HGX-1 hyperscale GPU accelerator at the Open Compute Project (OCP) Summit in collaboration with Microsoft. HGX-1 packaged eight Tesla P100 GPUs based on the Pascal architecture and was built as a component of Microsoft's Project Olympus open server design; the same basic design also underpinned Facebook's Big Basin systems and was closely related to NVIDIA's original DGX-1. HGX-1 was engineered to be drop-in ready for the Volta-generation Tesla V100, and it used a PCIe-switch-based fabric that could link up to 32 GPUs across chassis. Over the following generations the HGX name shifted to describe NVIDIA's standardized SXM baseboard plus NVSwitch design, and the platform became the dominant template for third-party AI servers.
Each HGX generation tracks NVIDIA's flagship data-center GPU and advances the NVLink and NVSwitch fabric. The boards are commonly offered in four-GPU and eight-GPU variants, with the eight-GPU board carrying the embedded NVSwitch fabric. The table below summarizes the major NVSwitch-based generations; figures are for the eight-GPU baseboard unless noted.
| Platform | Architecture | GPUs (board) | NVSwitch (8-GPU) | NVLink generation, per-GPU bandwidth | Total GPU memory (8-GPU) | Peak compute and NVLink bandwidth (8-GPU) |
|---|---|---|---|---|---|---|
| HGX A100 | Ampere | 4 or 8 | 6 NVSwitch (2nd gen) | 3rd-gen NVLink, 600 GB/s | up to 640 GB HBM2e | 2.4 TB/s NVLink bisection |
| HGX H100 | Hopper | 4 or 8 | 4 NVSwitch (3rd gen) | 4th-gen NVLink, 900 GB/s | 640 GB HBM3 | ~32 PFLOPS FP8; 3.6 TB/s NVLink bisection |
| HGX H200 | Hopper | 4 or 8 | 4 NVSwitch (3rd gen) | 4th-gen NVLink, 900 GB/s | over 1.1 TB HBM3e | over 32 PFLOPS FP8; 3.6 TB/s NVLink bisection |
| HGX B200 | Blackwell | 8 | NVSwitch (5th-gen NVLink) | 5th-gen NVLink, 1.8 TB/s | up to 1.4 TB HBM3e | 144 PFLOPS FP4; 14.4 TB/s total NVLink |
| HGX B300 | Blackwell Ultra | 8 | NVSwitch (5th-gen NVLink) | 5th-gen NVLink, 1.8 TB/s | 2.3 TB HBM3e | higher FP4; 14.4 TB/s total NVLink |
The A100-based HGX was the first to define the modern eight-GPU-plus-NVSwitch template. The HGX A100 8-GPU baseboard carries eight A100 SXM GPUs and six second-generation NVSwitch chips, with each A100 exposing 12 NVLink ports. The NVSwitch fabric lets any A100 reach any other at 600 GB/s of bidirectional NVLink bandwidth, roughly ten times a PCIe Gen4 x16 link. NVIDIA also offered a four-GPU HGX A100 baseboard that uses direct NVLink without NVSwitch, delivering 200 GB/s of bandwidth between each pair of GPUs. Two eight-GPU baseboards could be joined back-to-back through NVSwitch to form a fully connected 16-GPU domain.
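The A100 figures above follow from simple link arithmetic. The sketch below assumes 50 GB/s of bidirectional bandwidth per third-generation NVLink port and about 64 GB/s bidirectional for PCIe Gen4 x16; neither per-link rate is stated in the article, and the function names are illustrative.

```python
# Assumed link rates (not stated in the article).
NVLINK3_GB_S_PER_LINK = 50   # bidirectional GB/s per third-generation NVLink port
PCIE_GEN4_X16_GB_S = 64      # bidirectional GB/s, about 32 GB/s each way

A100_NVLINK_PORTS = 12       # from the article


def switched_per_gpu_bandwidth(ports: int, gb_s_per_link: int) -> int:
    """With NVSwitch, all of a GPU's ports are usable toward any single peer."""
    return ports * gb_s_per_link


def direct_peer_bandwidth(ports: int, gpu_count: int, gb_s_per_link: int) -> int:
    """Without NVSwitch, ports are split evenly among the other GPUs."""
    links_per_peer = ports // (gpu_count - 1)
    return links_per_peer * gb_s_per_link


eight_gpu = switched_per_gpu_bandwidth(A100_NVLINK_PORTS, NVLINK3_GB_S_PER_LINK)
four_gpu = direct_peer_bandwidth(A100_NVLINK_PORTS, 4, NVLINK3_GB_S_PER_LINK)

print(eight_gpu)                       # 600
print(four_gpu)                        # 200
print(eight_gpu / PCIE_GEN4_X16_GB_S)  # 9.375
```

Under these assumptions the switched board gives each pair the full 600 GB/s, while the four-GPU board divides the same 12 ports among three peers, which is where the 200 GB/s figure comes from.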
With the Hopper generation, NVIDIA simplified the eight-GPU board to four third-generation NVSwitch chips while increasing per-GPU bandwidth. On the HGX H100 8-GPU board, fourth-generation NVLink connects any two H100 GPUs at 900 GB/s, and the system reaches 3.6 TB/s of NVLink bisection bandwidth, a 1.5x increase over HGX A100's 2.4 TB/s. The eight-GPU board delivers on the order of 32 petaFLOPS of FP8 compute. The four-GPU HGX H100 board uses direct fourth-generation NVLink at 300 GB/s peer bandwidth. HGX H100 also introduced an optional external NVLink Switch System (NVLink Network) that can extend the NVLink domain across multiple eight-GPU nodes to as many as 256 GPUs.
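The bisection figures can be reproduced with a one-line model. This is a sketch under the assumption that the fabric is non-blocking, so each of the four GPUs on one side of an even split can drive its full per-GPU NVLink rate to the other side; the function name is illustrative.

```python
def bisection_bandwidth_gb_s(gpu_count: int, per_gpu_gb_s: int) -> int:
    """Bandwidth across a cut that splits the GPUs into two equal halves.

    Assumes a non-blocking fabric, so each GPU on one side can drive its
    full NVLink rate to the other side.
    """
    return (gpu_count // 2) * per_gpu_gb_s


hgx_a100 = bisection_bandwidth_gb_s(8, 600)
hgx_h100 = bisection_bandwidth_gb_s(8, 900)

print(hgx_a100 / 1000)      # 2.4 (TB/s)
print(hgx_h100 / 1000)      # 3.6 (TB/s)
print(hgx_h100 / hgx_a100)  # 1.5
```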
The HGX H200 board is mechanically and electrically the same eight-GPU, four-NVSwitch design, but populated with H200 GPUs that swap HBM3 for faster, larger HBM3e. Each H200 carries 141 GB of HBM3e at 4.8 TB/s, so an eight-way HGX H200 provides more than 1.1 TB of aggregate GPU memory while retaining the 900 GB/s NVLink and over 32 petaFLOPS of FP8 compute. The extra memory capacity and bandwidth are aimed squarely at large-language-model inference, where model and key-value-cache sizes are the binding constraint.
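A rough capacity estimate shows why the extra memory matters for inference. The sketch below uses a hypothetical 405-billion-parameter model purely as an example and counts only weight storage; it ignores activations, framework overhead, and fragmentation, so the headroom figures are upper bounds rather than planning numbers. The 80 GB per-GPU figure for H100 is derived from the 640 GB board total.

```python
GPUS_PER_BOARD = 8


def aggregate_memory_gb(memory_per_gpu_gb: int) -> int:
    """Total HBM across the eight-GPU board."""
    return GPUS_PER_BOARD * memory_per_gpu_gb


def kv_cache_headroom_gb(params_billion: int, bytes_per_param: int,
                         memory_per_gpu_gb: int) -> int:
    """Memory left for KV cache after storing the weights (negative = does not fit)."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params x bytes = GB
    return aggregate_memory_gb(memory_per_gpu_gb) - weights_gb


print(aggregate_memory_gb(141))           # 1128 GB, i.e. more than 1.1 TB

# Hypothetical 405B-parameter model at 2 bytes per parameter (16-bit weights).
print(kv_cache_headroom_gb(405, 2, 80))   # -170: weights alone exceed HGX H100
print(kv_cache_headroom_gb(405, 2, 141))  # 318 GB left on HGX H200

# Same model at 1 byte per parameter (8-bit weights).
print(kv_cache_headroom_gb(405, 1, 80))   # 235
print(kv_cache_headroom_gb(405, 1, 141))  # 723
```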
The Blackwell generation moved HGX to fifth-generation NVLink at 1.8 TB/s per GPU, giving the eight-GPU board 14.4 TB/s of total NVLink bandwidth. The HGX B200 8-GPU board carries eight Blackwell GPUs with up to 1.4 TB of HBM3e (roughly 180 GB per GPU) and reaches 144 petaFLOPS of FP4 Tensor Core compute for inference and 72 petaFLOPS of FP8 for training. In typical B200 servers, networking is provided by eight discrete ConnectX-7 adapters at 400 Gb/s each (one per GPU), optionally paired with BlueField-3 DPUs.
The HGX B300 board, based on Blackwell Ultra, increases per-GPU HBM3e from roughly 180 GB on the HGX B200 to 288 GB, for about 2.3 TB of GPU memory across the eight-GPU board, and raises FP4 throughput further. A notable system-level change is that the HGX B300 8-GPU baseboard integrates ConnectX-8 networking directly onto the board, with each interface running at 800 Gb/s, doubling per-GPU scale-out bandwidth versus the B200's discrete ConnectX-7 cards. These NVSwitch-connected eight-GPU HGX boards are distinct from NVIDIA's rack-scale Grace-Blackwell systems such as the GB200 NVL72 and GB300 NVL72, which link dozens of GPUs and Grace CPUs across an entire liquid-cooled rack through external NVLink switches rather than on a single baseboard.
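The board-level totals for the two Blackwell boards can be checked with a short calculation. The sketch takes the roughly 180 GB per GPU quoted for the HGX B200 and assumes one network interface per GPU on both boards; the dictionary keys and function name are illustrative.

```python
GPUS_PER_BOARD = 8

# Per-GPU figures from the article; one NIC per GPU is assumed for both boards.
B200 = {"hbm_gb": 180, "nic_gbit_s": 400}
B300 = {"hbm_gb": 288, "nic_gbit_s": 800}


def board_totals(per_gpu: dict) -> dict:
    """Scale per-GPU memory and scale-out bandwidth to the eight-GPU board."""
    return {
        "hbm_gb": GPUS_PER_BOARD * per_gpu["hbm_gb"],
        "scale_out_gbit_s": GPUS_PER_BOARD * per_gpu["nic_gbit_s"],
    }


b200_totals = board_totals(B200)
b300_totals = board_totals(B300)

print(b200_totals)  # {'hbm_gb': 1440, 'scale_out_gbit_s': 3200}
print(b300_totals)  # {'hbm_gb': 2304, 'scale_out_gbit_s': 6400}
print(b300_totals["scale_out_gbit_s"] / b200_totals["scale_out_gbit_s"])  # 2.0
```

The 1,440 GB and 2,304 GB totals correspond to the rounded 1.4 TB and 2.3 TB figures in the text.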
Beyond Blackwell, NVIDIA has signalled that the HGX platform will continue with future GPU architectures (such as the Rubin generation), coupling the GPU baseboard ever more tightly with NVIDIA's own Arm-based server CPUs and networking while preserving the same eight-GPU, all-to-all-NVLink design philosophy.
The eight-GPU HGX board is the platform's signature configuration. Its purpose is to make eight discrete GPUs behave, for the purposes of a parallel job, like one tightly coupled accelerator. Each GPU's NVLink ports are routed not GPU-to-GPU in a fixed ring but into the on-board NVSwitch chips, which act as a non-blocking crossbar. Because every GPU connects to every NVSwitch, any pair of GPUs can communicate at the full per-GPU NVLink rate at the same time as every other pair, with no oversubscription. This switched, all-to-all fabric is what allows efficient tensor-parallel and other collective operations across all eight GPUs, and it is the principal architectural advantage of SXM-based HGX servers over loosely coupled PCIe GPU servers. The four-GPU board provides a lower-cost option for workloads that do not require an eight-way NVSwitch island, connecting its GPUs with direct NVLink. For scale beyond a single node, GPUs communicate over the host's networking fabric, historically InfiniBand or Ethernet via ConnectX and BlueField adapters, and, in the largest configurations, over an external NVLink Switch System.
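The difference between a switched fabric and a fixed ring can be made concrete with a toy model. The sketch below uses the HGX A100 numbers (8 GPUs, 6 NVSwitch chips, 12 ports per GPU) and assumes that ports are spread evenly across switches and that each link carries 50 GB/s; it models connectivity only and says nothing about real routing or contention. All function names are illustrative.

```python
from itertools import combinations


def switched_fabric(gpu_count: int, switch_count: int, ports_per_gpu: int) -> dict:
    """Return {gpu: {switch: link_count}} with ports spread evenly over switches."""
    links_per_switch, remainder = divmod(ports_per_gpu, switch_count)
    if remainder:
        raise ValueError("this sketch assumes ports divide evenly across switches")
    return {g: {s: links_per_switch for s in range(switch_count)}
            for g in range(gpu_count)}


def pair_bandwidth(fabric: dict, a: int, b: int, gb_s_per_link: int) -> int:
    """Peak bandwidth between two GPUs: per switch, the narrower of the two legs."""
    return sum(min(fabric[a][s], fabric[b][s]) for s in fabric[a]) * gb_s_per_link


def ring_hops(gpu_count: int, a: int, b: int) -> int:
    """GPU-to-GPU hops between a and b if the GPUs were wired as a simple ring."""
    distance = abs(a - b)
    return min(distance, gpu_count - distance)


fabric = switched_fabric(gpu_count=8, switch_count=6, ports_per_gpu=12)
pairs = list(combinations(range(8), 2))

# Every one of the 28 GPU pairs sees the same peak bandwidth through the switches.
print({pair_bandwidth(fabric, a, b, 50) for a, b in pairs})  # {600}

# In a ring, the farthest pair is four hops apart and must relay through other GPUs.
print(max(ring_hops(8, a, b) for a, b in pairs))  # 4
```

The uniform result for all pairs is the property that collective operations rely on: no GPU pair is topologically farther away than any other.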
HGX exists precisely so that NVIDIA does not have to build every AI server itself. The platform is the foundation for the dense GPU servers sold by essentially every major server vendor. Dell (PowerEdge XE series, such as the XE9680 for HGX H100 and H200), Supermicro (a broad HGX line such as the SYS-821GB-TNRX), Hewlett Packard Enterprise (Cray XD670), Lenovo (ThinkSystem SR680b and related models), Gigabyte (G383 series), and Cisco (the UCS C885A M8, offered with eight HGX H100 or H200 GPUs) all ship systems built around NVIDIA HGX baseboards, differentiating on chassis design, cooling, CPU choice, and serviceability while sharing the identical NVIDIA GPU complex. The same baseboards underpin the GPU instances offered by the major cloud providers and GPU specialists, including Microsoft Azure, Google Cloud, Oracle Cloud Infrastructure, Amazon Web Services, and dedicated AI clouds such as CoreWeave and Lambda. This breadth is the practical reason HGX matters: when an organization rents or buys an eight-GPU H100, H200, or B200 node from almost any vendor, it is in most cases using an NVIDIA HGX board.
HGX is, in effect, the standard module from which the modern AI data center is assembled. By fixing the hardest and most performance-critical part of an AI server, the multi-GPU NVLink and NVSwitch complex, and delivering it as a validated baseboard, NVIDIA propagated a single high-bandwidth multi-GPU architecture across competing OEMs and clouds, accelerated time to market for new GPU generations, and ensured software portability across vendors. The eight-GPU HGX node became the de facto unit of AI compute for much of the deep-learning era and remains the building block beneath a large share of large language model training and inference capacity worldwide, complemented at the top end by NVIDIA's rack-scale NVL72 systems for the very largest models.