NVIDIA DGX SuperPOD

Updated Jul 17, 2026

The NVIDIA DGX SuperPOD is a reference-architecture artificial intelligence supercomputer designed and sold by NVIDIA. It is a turnkey, factory-validated cluster built from the company's DGX systems, connected by high-speed InfiniBand networking, paired with shared high-performance storage, and managed by NVIDIA's data-center software stack. Rather than asking customers to design a large GPU cluster from scratch, NVIDIA delivers the DGX SuperPOD as a pre-engineered, repeatable blueprint that can be deployed in weeks instead of months. NVIDIA markets it as a complete "AI factory" or "turnkey AI data center solution" that combines compute, networking, storage, and software tuned to work together at scale.^[1]

A defining characteristic of the DGX SuperPOD is that it is built around a modular building block called the scalable unit (SU). Each SU is a fixed group of DGX systems plus the networking and management infrastructure needed to operate them, so that customers can size a deployment predictably by adding units. This design lets organizations grow from a few racks to clusters with tens of thousands of GPUs while keeping the same validated topology, cabling scheme, and software. NVIDIA also uses the DGX SuperPOD architecture to build its own internal supercomputers, most notably Selene (based on DGX A100 systems) and Eos (based on DGX H100 systems), both of which have appeared on the TOP500 list of the world's fastest supercomputers.^[6]^[7]

What a DGX SuperPOD is

A DGX SuperPOD is not a single product with one fixed specification. It is a reference architecture: a documented, supported design that specifies which DGX systems to use, how to wire them together, what storage and management nodes to include, and which software to run. Each generation is published as a formal reference-architecture document by NVIDIA.^[2] The intent is to remove the engineering risk and integration effort of standing up a large training cluster, so an enterprise or research lab receives a system that has already been validated to deliver predictable performance at scale.

The core ingredients of every DGX SuperPOD generation are consistent:

Compute: A number of NVIDIA DGX systems, each containing eight NVIDIA data-center GPUs (or, in the rack-scale Grace Blackwell generations, 72 GPUs per rack system) linked internally by NVLink and NVSwitch.
Compute fabric: A low-latency InfiniBand network, using NVIDIA's Quantum series of switches (acquired with Mellanox), arranged as a rail-optimized fat tree so that every GPU has a dedicated high-bandwidth path across the cluster.^[2]
Storage fabric: A separate high-bandwidth network connecting a parallel file system and object storage, so that data movement does not contend with GPU-to-GPU traffic.
Management and in-band/out-of-band networks: Ethernet networks for provisioning, telemetry, and control, plus management and login nodes.
Software: NVIDIA's cluster-management and workload-orchestration stack, historically NVIDIA Base Command (including Base Command Manager for provisioning, monitoring, and cluster management), and more recently NVIDIA Mission Control for the Blackwell-era systems.^[13]
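
As a rough illustration of what "rail-optimized" means for the compute fabric, the Python sketch below assumes one compute-fabric port per GPU and one leaf switch per rail within a scalable unit, so that GPU k of every node attaches to leaf k. The function names and the eight-rail, 32-node sizes are assumptions chosen for illustration; the reference-architecture documents define the actual cabling.

```python
GPUS_PER_NODE = 8   # assumption: one compute-fabric port (rail) per GPU
NODES_PER_SU = 32   # assumption: a Hopper-style scalable unit


def leaf_for(node: int, gpu: int) -> int:
    """Leaf switch index for a GPU under the assumed rail-optimized layout.

    The node index is validated but does not affect the result: within one
    scalable unit, the rail (local GPU index) alone selects the leaf.
    """
    if not 0 <= node < NODES_PER_SU:
        raise ValueError("node index outside the scalable unit")
    if not 0 <= gpu < GPUS_PER_NODE:
        raise ValueError("GPU index outside the node")
    return gpu


def same_leaf(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two (node, gpu) endpoints attach to the same leaf switch."""
    return leaf_for(*a) == leaf_for(*b)


if __name__ == "__main__":
    print(same_leaf((0, 3), (17, 3)))  # same rail on different nodes
    print(same_leaf((0, 3), (17, 4)))  # different rails
```

Under these assumptions, traffic between GPUs that share a local index on different nodes stays on a single leaf switch, while traffic between different rails has to cross a higher switching tier.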

Because the whole stack is specified and tested together, NVIDIA positions the DGX SuperPOD as a supported alternative to building a bespoke cluster, with the company providing white-glove installation, support, and lifecycle services.

The scalable-unit concept

The scalable unit is the organizing principle of the DGX SuperPOD. An SU bundles a fixed number of DGX nodes together with the leaf-layer InfiniBand switches that connect them, so the cluster can be assembled and grown in well-defined increments. Multiple SUs are then joined through additional spine and, for the largest systems, core switching layers, forming a multi-tier fat-tree fabric.

The size of a scalable unit has changed with each DGX generation:

Generation	DGX system	GPUs per system	Systems per scalable unit	GPUs per scalable unit
DGX A100 SuperPOD	DGX A100	8 x A100	20 nodes	160
DGX H100 / H200 SuperPOD	DGX H100 / H200	8 x H100 or H200	32 nodes	256
DGX B200 SuperPOD	DGX B200	8 x B200	32 nodes	256
DGX GB200 SuperPOD	DGX GB200 (NVL72 rack)	72 x Blackwell per rack	8 rack systems	576
DGX B300 SuperPOD	DGX B300	8 x Blackwell Ultra	64 nodes	512

For example, in the original DGX A100 SuperPOD, four people could rack a single 20-node scalable unit in about an hour, producing a roughly 2-petaflops building block; a standard SuperPOD of seven such units totaled 140 DGX A100 systems and 1,120 A100 GPUs.^[5] The Hopper-generation DGX H100 SuperPOD moved to 32 nodes per SU, giving 256 H100 GPUs per unit, and added an external NVLink Switch System so that GPUs across multiple nodes could share memory bandwidth more tightly.^[2] In the Grace Blackwell DGX GB200 SuperPOD, the unit is defined as eight liquid-cooled DGX GB200 rack systems, connecting 576 Blackwell GPUs in a single NVLink domain.^[3]
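
The sizing arithmetic above can be sketched in a few lines of Python. The per-generation values are copied from the table in this section; the names `ScalableUnit`, `SCALABLE_UNITS`, and `cluster_size` are illustrative and do not correspond to any NVIDIA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalableUnit:
    """One row of the scalable-unit table (values as given in this article)."""
    systems_per_su: int   # DGX nodes, or rack systems for DGX GB200
    gpus_per_system: int  # GPUs inside each node or rack system

    @property
    def gpus_per_su(self) -> int:
        return self.systems_per_su * self.gpus_per_system


# Hypothetical lookup table keyed by generation name.
SCALABLE_UNITS = {
    "DGX A100": ScalableUnit(20, 8),
    "DGX H100": ScalableUnit(32, 8),
    "DGX B200": ScalableUnit(32, 8),
    "DGX GB200": ScalableUnit(8, 72),
    "DGX B300": ScalableUnit(64, 8),
}


def cluster_size(generation: str, num_sus: int) -> tuple[int, int]:
    """Return (systems, GPUs) for a deployment of `num_sus` scalable units."""
    if num_sus < 1:
        raise ValueError("a deployment needs at least one scalable unit")
    su = SCALABLE_UNITS[generation]
    return num_sus * su.systems_per_su, num_sus * su.gpus_per_su


if __name__ == "__main__":
    print(cluster_size("DGX A100", 7))  # standard DGX A100 SuperPOD
    print(cluster_size("DGX H100", 4))  # baseline DGX H100 SuperPOD
```

With seven DGX A100 units the function returns 140 systems and 1,120 GPUs, matching the standard configuration described above.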

Generations

The DGX SuperPOD has tracked NVIDIA's GPU roadmap closely, with each new architecture producing a corresponding SuperPOD reference design.

Origin: the 2019 DGX SuperPOD

NVIDIA first introduced the DGX SuperPOD in June 2019. The original system was built from 96 DGX-2H servers containing 1,536 Volta-based V100 GPUs, linked with Mellanox EDR (100 Gb/s) InfiniBand.^[4] It delivered roughly 9.4 petaflops on the High Performance Linpack (HPL) benchmark against an 11.2-petaflops theoretical peak, consumed about one megawatt of power, and debuted at number 22 on the June 2019 TOP500 list.^[4] NVIDIA reported that it was assembled in only a few weeks and built primarily to support the company's autonomous-vehicle development program, while also running graphics, speech, healthcare, and HPC workloads.^[4]

DGX A100 SuperPOD

Announced in 2020 alongside the A100 GPU and the Ampere architecture, the DGX A100 SuperPOD formalized the scalable-unit model that the product is known for today. Each DGX A100 node carries eight A100 GPUs and two AMD EPYC 7742 64-core CPUs.^[5] A scalable unit was 20 nodes, and a standard SuperPOD comprised seven SUs for 140 nodes and 1,120 GPUs.^[5] The compute fabric used Mellanox Quantum HDR 200 Gb/s InfiniBand switches in a rail-optimized fat tree, with separate InfiniBand links for compute and storage, and storage delivered by a high-throughput parallel file system.^[5] The A100 reference architecture established the validated cabling, switching, and management patterns that later generations refined.

DGX H100 and H200 SuperPOD

With the Hopper generation, NVIDIA grew the scalable unit to 32 DGX H100 nodes, yielding 256 H100 GPUs per SU.^[2] Each DGX H100 node pairs eight H100 GPUs with two Intel Xeon Platinum 8480C CPUs and ConnectX-7 network adapters running 400 Gb/s NDR InfiniBand on NVIDIA's Quantum-2 switches.^[2] The H100 SuperPOD added the NVLink Switch System, extending the in-node NVLink fabric across nodes within a scalable unit. The reference architecture scales from a baseline of four SUs (128 nodes, 1,024 GPUs) up to designs of dozens of units and many thousands of GPUs.^[2] A closely related variant uses the memory-enhanced H200 GPU with the same 32-node SU.
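
To give a feel for the scale of the compute fabric, the following Python sketch estimates aggregate injection bandwidth from the figures in this section. It assumes one 400 Gb/s compute-fabric link per GPU, which this article does not state explicitly, and the function name is made up for the example.

```python
LINK_GBPS = 400      # NDR InfiniBand rate quoted in this section
GPUS_PER_NODE = 8    # GPUs per DGX H100 node
NODES_PER_SU = 32    # nodes per Hopper scalable unit
LINKS_PER_GPU = 1    # assumption: not stated in this article


def compute_fabric_gbps(num_sus: int) -> int:
    """Aggregate per-direction injection bandwidth in Gb/s for `num_sus` SUs."""
    if num_sus < 1:
        raise ValueError("a deployment needs at least one scalable unit")
    gpus = num_sus * NODES_PER_SU * GPUS_PER_NODE
    return gpus * LINKS_PER_GPU * LINK_GBPS


if __name__ == "__main__":
    # One SU, then the four-SU baseline, reported in Tb/s.
    print(compute_fabric_gbps(1) / 1000)
    print(compute_fabric_gbps(4) / 1000)
```

Under that assumption, a single scalable unit injects about 102.4 Tb/s into the fabric and the four-unit baseline about 409.6 Tb/s. These are back-of-the-envelope figures, not published specifications.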

DGX B200 SuperPOD

The Blackwell-based DGX B200 SuperPOD kept the air-cooled eight-GPU node format and the 32-node scalable unit (256 GPUs per SU), upgrading the GPUs to B200 and refreshing the networking. It serves customers who want a Blackwell-class training and inference cluster in the same physical and operational pattern as the Hopper systems.

DGX GB200 and GB300 (Blackwell) SuperPOD

At GTC in March 2024, NVIDIA announced the Grace Blackwell DGX SuperPOD built from DGX GB200 systems, a liquid-cooled, rack-scale design for trillion-parameter generative AI.^[12] Each DGX GB200 is a rack-scale system in which 36 Grace CPUs and 72 Blackwell GPUs (organized as 36 GB200 Superchips, see GB200 NVL72) are connected as one unit by fifth-generation NVLink.^[12] A scalable unit is eight DGX GB200 systems, connecting 576 Blackwell GPUs in a shared NVLink domain, with NVIDIA Quantum InfiniBand joining units into larger systems.^[3] NVIDIA stated that a Grace Blackwell DGX SuperPOD provides 11.5 exaflops of AI compute at FP4 precision with 240 terabytes of fast memory in its base configuration, scaling further with additional racks.^[12]
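
The headline figures in this paragraph can be cross-checked with simple arithmetic. The Python sketch below assumes that the quoted base configuration corresponds to one eight-rack scalable unit; the per-GPU and per-rack values it derives are rough averages, not published per-device specifications.

```python
SUPERCHIPS_PER_RACK = 36    # GB200 Superchips per DGX GB200 system
GPUS_PER_SUPERCHIP = 2      # implied by 72 GPUs across 36 Superchips
RACKS_PER_SU = 8            # DGX GB200 systems per scalable unit
BASE_FP4_EXAFLOPS = 11.5    # quoted for the base configuration
BASE_FAST_MEMORY_TB = 240   # quoted for the base configuration

gpus_per_rack = SUPERCHIPS_PER_RACK * GPUS_PER_SUPERCHIP
gpus_per_su = gpus_per_rack * RACKS_PER_SU

# Assumption: the base configuration is exactly one scalable unit.
fp4_pflops_per_gpu = BASE_FP4_EXAFLOPS * 1000 / gpus_per_su
fast_memory_tb_per_rack = BASE_FAST_MEMORY_TB / RACKS_PER_SU

if __name__ == "__main__":
    print(gpus_per_rack, gpus_per_su)
    print(round(fp4_pflops_per_gpu, 2), fast_memory_tb_per_rack)
```

Under that assumption, the quoted totals work out to roughly 20 petaflops of FP4 compute per GPU and 30 terabytes of fast memory per rack.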

At GTC in March 2025, NVIDIA followed with the Blackwell Ultra DGX SuperPOD using DGX GB300 and DGX B300 systems.^[13] Each DGX GB300 rack system pairs 36 Grace CPUs with 72 Blackwell Ultra GPUs and includes 38 terabytes of fast memory; the systems connect via NVLink, NVIDIA Quantum-X800 800 Gb/s InfiniBand, and NVIDIA Spectrum-X Ethernet, with 72 ConnectX-8 SuperNICs per system.^[13] These Blackwell Ultra deployments are managed by NVIDIA Mission Control software.^[13]

NVIDIA's own deployments: Selene and Eos

NVIDIA validates DGX SuperPOD generations by building large internal systems of its own. Two of them, Selene and Eos, became prominent TOP500 entries.

Selene was NVIDIA's DGX A100 SuperPOD. NVIDIA assembled the first version extremely quickly in mid-2020, and it debuted at number 7 on the June 2020 TOP500 list at about 27.6 petaflops on Linpack.^[8] NVIDIA then expanded Selene and re-ran the benchmark: in its full configuration it comprised 560 DGX A100 nodes, with the TOP500-submitted run using 1,080 AMD EPYC CPUs and 4,320 A100 GPUs to reach 63.46 petaflops (against a 79.22-petaflops peak), which placed it at number 5 on the November 2020 list.^[6] Selene was also one of the most energy-efficient large systems of its era, ranking highly on the Green500.^[14] NVIDIA used Selene for internal research and for record-setting submissions to the MLPerf training and inference benchmarks, and it served as the in-house proof point for the scalable-unit design that NVIDIA sold to customers.^[14]

Eos is NVIDIA's DGX H100 SuperPOD, unveiled in 2022 and brought online in 2023.^[9] The system that NVIDIA submitted to the TOP500 in November 2023 was built from 576 DGX H100 systems, for a total of 4,608 H100 GPUs joined by Quantum-2 NDR400 InfiniBand, and it achieved 121.4 petaflops on Linpack (against a 188.65-petaflops peak), debuting at number 9.^[7] NVIDIA described this Eos as delivering up to 18 exaflops of FP8 AI performance.^[9] NVIDIA also operated a larger, separately configured Eos-class system with 10,752 H100 GPUs that it used for headline MLPerf training runs; the existence of two differently sized machines under the Eos name caused some public confusion about its exact specifications.^[10] The clarifying detail is that the TOP500 entry and the largest MLPerf submission referred to different physical clusters.^[11]

System	SuperPOD generation	DGX nodes	GPUs	Linpack (Rmax)	Highest TOP500 rank
2019 DGX SuperPOD	DGX-2H (Volta V100)	96	1,536	9.44 PFlop/s	No. 22 (Jun 2019)
Selene	DGX A100	up to 560	4,320 (in benchmark run)	63.46 PFlop/s	No. 5 (Nov 2020)
Eos	DGX H100	576	4,608	121.4 PFlop/s	No. 9 (Nov 2023)
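
One way to compare these systems is HPL efficiency, the ratio of measured Linpack performance (Rmax) to theoretical peak (Rpeak). The Python sketch below uses the Rmax and Rpeak figures quoted in this article; the dictionary and function names are illustrative.

```python
# (Rmax, Rpeak) in petaflops, as quoted in this article.
HPL_RESULTS = {
    "2019 DGX SuperPOD": (9.44, 11.2),
    "Selene": (63.46, 79.22),
    "Eos": (121.4, 188.65),
}


def hpl_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak achieved on the Linpack benchmark."""
    if rpeak <= 0:
        raise ValueError("Rpeak must be positive")
    return rmax / rpeak


if __name__ == "__main__":
    for name, (rmax, rpeak) in HPL_RESULTS.items():
        print(f"{name}: {hpl_efficiency(rmax, rpeak):.1%}")
```

On these figures the 2019 system reaches about 84% of peak, Selene about 80%, and Eos about 64%.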

Enterprise and research adoption

The DGX SuperPOD is aimed at organizations that need a large, supported AI training cluster but do not want to design, integrate, and tune one themselves. Because the architecture is validated end to end and delivered with NVIDIA installation and support services, it offers a faster and lower-risk path to a TOP500-class system than assembling commodity components. Customers have included national research centers, universities, government agencies, automotive and pharmaceutical companies, and cloud and sovereign-AI operators.

Notable deployments built on the DGX SuperPOD architecture beyond Selene and Eos include Cambridge-1, an NVIDIA-owned DGX A100 SuperPOD in the United Kingdom dedicated to healthcare and life-sciences research, and large national and commercial systems in the Middle East, Europe, and Asia. The reference architecture has also been offered through partners and integrated into NVIDIA's broader portfolio, including managed access to SuperPOD-class infrastructure via NVIDIA's cloud offerings.

Significance

The DGX SuperPOD has been influential as one of the first widely available, productized blueprints for building large-scale AI supercomputers. By codifying the scalable-unit model, a tested InfiniBand fat-tree fabric, integrated storage, and a unified management stack, NVIDIA turned the previously bespoke task of building an AI training cluster into a repeatable, supported product. Its internal incarnations, Selene and Eos, demonstrated that AI-optimized clusters of GPUs could rank among the most powerful supercomputers in the world while remaining highly energy efficient, and they provided NVIDIA with the platform for its competitive MLPerf results. As generative AI drove demand for ever-larger training systems, the DGX SuperPOD reference architecture became a template that enterprises, governments, and cloud providers used to stand up "AI factories" at scale, and it continues to evolve in lockstep with NVIDIA's GPU roadmap through the Hopper, Blackwell, and Blackwell Ultra generations.

References

1. NVIDIA, "NVIDIA DGX SuperPOD: A Turnkey AI Supercomputer." https://www.nvidia.com/en-us/data-center/dgx-superpod/
2. NVIDIA Docs, "DGX SuperPOD Architecture (Reference Architecture Featuring NVIDIA DGX H100)." https://docs.nvidia.com/dgx-superpod/reference-architecture-scalable-infrastructure-h100/latest/dgx-superpod-architecture.html
3. NVIDIA Docs, "Key Components of the DGX SuperPOD (Reference Architecture Featuring NVIDIA DGX GB200)." https://docs.nvidia.com/dgx-superpod/reference-architecture-scalable-infrastructure-gb200/latest/dgx-superpod-components.html
4. The Next Platform, "Inside Nvidia's DGX SuperPOD Cluster," June 18, 2019. https://www.nextplatform.com/hpc/2019/06/18/inside-nvidias-dgx-superpod-cluster/
5. ServeTheHome, "NVIDIA DGX A100 SuperPod Detailed Look." https://www.servethehome.com/nvidia-dgx-a100-superpod-detailed-look/
6. TOP500, "Selene - NVIDIA DGX A100, AMD EPYC 7742 64C 2.25GHz, NVIDIA A100, Mellanox HDR Infiniband." https://www.top500.org/system/179842/
7. TOP500, "Eos NVIDIA DGX SuperPOD - NVIDIA DGX H100, Xeon Platinum 8480C, NVIDIA H100, Infiniband NDR400." https://top500.org/system/180239/
8. HPCwire, "Nvidia Nabs #7 Spot on Top500 with Selene, Launches A100 PCIe Cards," June 22, 2020. https://www.hpcwire.com/2020/06/22/nvidia-nabs-7-spot-on-top500-with-selene-launches-a100-pcie-cards/
9. NVIDIA Blog, "NVIDIA Eos Revealed: Peek Into Operations of a Top 10 Supercomputer." https://blogs.nvidia.com/blog/eos/
10. The Next Platform, "A Tale Of Two Nvidia Eos Supercomputers," March 6, 2024. https://www.nextplatform.com/2024/03/06/a-tale-of-two-nvidia-eos-supercomputers/
11. The Register, "What's going on with Eos, Nvidia's shrinking supercomputer?" February 19, 2024. https://www.theregister.com/2024/02/19/eos_nvidia_supercomputer/
12. NVIDIA Newsroom, "NVIDIA Launches Blackwell-Powered DGX SuperPOD for Generative AI Supercomputing at Trillion-Parameter Scale," March 18, 2024. https://nvidianews.nvidia.com/news/nvidia-blackwell-dgx-generative-ai-supercomputing
13. NVIDIA Newsroom, "NVIDIA Blackwell Ultra DGX SuperPOD Delivers Out-of-the-Box AI Supercomputer for Enterprises to Build AI Factories," March 18, 2025. https://nvidianews.nvidia.com/news/blackwell-ultra-dgx-superpod-supercomputer-ai-factories
14. Selene (supercomputer), Wikipedia. https://en.wikipedia.org/wiki/Selene_(supercomputer)
