# NVIDIA GB200 NVL72

> Source: https://aiwiki.ai/wiki/nvidia_gb200_nvl72
> Updated: 2026-06-21
> Categories: AI Hardware, NVIDIA
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

The **NVIDIA GB200 NVL72** is a rack-scale [AI](/wiki/ai) computing system that packages 72 [Blackwell](/wiki/nvidia_blackwell) [GPUs](/wiki/gpu) and 36 [Grace](/wiki/grace_cpu) [CPUs](/wiki/cpu) into a single liquid-cooled rack, joined into one [NVLink](/wiki/nvlink) domain that the software stack treats as a single massive accelerator.[1] [NVIDIA](/wiki/nvidia) markets it as delivering up to 30x faster real-time inference and up to 25x lower cost and energy consumption on trillion-parameter [large language models](/wiki/large_language_model) compared with the same number of [H100](/wiki/nvidia_h100) GPUs.[2] The platform was announced at GTC 2024 on March 18, 2024, as the company's flagship answer to the compute demand created by trillion-parameter models and reasoning workloads.[2] The first production racks were delivered to hyperscale customers in late 2024 and the platform ramped to volume through 2025.[13]

The central design move is the unified NVLink domain across all 72 GPUs. Earlier generations of NVIDIA [HGX](/wiki/hgx) and [DGX](/wiki/dgx) servers connected eight GPUs over [NVSwitch](/wiki/nvswitch) at high bandwidth and then dropped to slower [InfiniBand](/wiki/infiniband) or [Ethernet](/wiki/ethernet) for everything beyond the box.[11] The NVL72 keeps NVLink continuous across the whole rack at 1.8 TB/s per GPU, which lets a model with weights and activations far larger than any single GPU's [HBM3e](/wiki/hbm3e) memory run as if it were on one device.[3] NVIDIA cites a peak of 1.4 [exaFLOPS](/wiki/exaflops) of [FP4](/wiki/fp4) inference compute and 720 [petaFLOPS](/wiki/petaflops) of [FP8](/wiki/fp8) training compute per rack, with about 13.4 TB of unified GPU memory and 130 TB/s of aggregate NVLink bandwidth.[1][2]

The NVL72 became the most consequential single product in [data center](/wiki/data_center) infrastructure during 2024 and 2025. Every major hyperscaler placed multi-billion-dollar orders. [Microsoft](/wiki/microsoft), [Meta](/wiki/meta), [Oracle](/wiki/oracle), [AWS](/wiki/aws), Google, and [CoreWeave](/wiki/coreweave) all committed to large GB200 fleets, with the system serving as the substrate for [OpenAI](/wiki/openai) inference, [Llama](/wiki/llama) training, and the next wave of frontier model training. A typical configured rack ships for roughly USD 3.0 to 3.4 million.[7]

## Infobox

| Field | Value |
|---|---|
| Type | Liquid-cooled rack-scale AI computing system |
| GPU architecture | [Blackwell](/wiki/nvidia_blackwell) (B200) |
| GPUs per rack | 72 Blackwell GPUs |
| CPUs per rack | 36 Grace ARM Neoverse V2 CPUs |
| Superchip building block | GB200 Grace Blackwell Superchip (1 Grace + 2 Blackwell) |
| Compute trays | 18 (1U each), 2 Grace + 4 Blackwell per tray |
| NVLink switch trays | 9 (2 NVSwitch chips per tray) |
| GPU memory | ~13.4 TB HBM3e unified |
| CPU memory | ~17 TB LPDDR5X |
| NVLink bandwidth per GPU | 1.8 TB/s bidirectional (fifth-generation) |
| Total NVLink fabric bandwidth | 130 TB/s |
| FP4 inference (dense) | 1.4 exaFLOPS |
| FP8 training (dense) | 720 petaFLOPS |
| Power draw | Up to 120 kW |
| Weight | ~1.36 metric tons (~3,000 lb) |
| Cooling | 100% direct liquid cooling, no fans in rack |
| External networking | NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet, ConnectX-7 or ConnectX-8 |
| Announced | March 18, 2024 (GTC 2024) |
| First customer shipments | Late 2024 |
| Volume ramp | Q2 to Q3 2025 |
| List price per rack | ~USD 3.0 to 3.4 million |
| Successor | [NVIDIA GB300 NVL72](/wiki/nvidia_gb300_nvl72) |

## What is the GB200 NVL72?

NVIDIA introduced the [Blackwell](/wiki/nvidia_blackwell) architecture at GTC 2024 in March 2024 as the successor to [Hopper](/wiki/nvidia_hopper).[2] Each Blackwell GPU is a multi-die package: two reticle-limited dies on TSMC's custom 4NP process joined by NV-HBI, a 10 TB/s die-to-die interconnect, and presented as one logical GPU. The combined package contains 208 billion transistors and ships with 192 GB of HBM3e memory at 8 TB/s.[2] Blackwell adds native [FP4](/wiki/fp4) tensor support to the Hopper-era FP8 [Transformer Engine](/wiki/transformer_engine), roughly doubling raw throughput for inference workloads that tolerate four-bit precision.[3]

The **GB200 Grace Blackwell Superchip** is the building block. It pairs a single [Grace](/wiki/grace_cpu) CPU with two Blackwell B200 GPUs using NVIDIA's NVLink-C2C cache-coherent chip-to-chip interconnect at 900 GB/s.[3] The Grace CPU contributes 72 Arm Neoverse V2 cores and 480 GB of LPDDR5X memory, serving as the host processor for orchestration and data preprocessing. Each superchip carries 384 GB of HBM3e total across its two GPUs.[11]

NVIDIA's earlier rack-scale efforts built up to this point in increments. The GH200 NVL32, launched alongside Hopper, connected 32 Grace-Hopper superchips into a single NVLink domain through external switch boxes. The GB200 NVL72 collapses the whole stack into one 48U cabinet by integrating the switch trays into the same rack as the compute, pushing the entire domain onto a passive copper backplane.[11] Jensen Huang framed Blackwell at GTC 2024 with the line, "Generative AI is the defining technology of our time. Blackwell is the engine to power this new industrial revolution."[2] From the software's perspective, the NVL72 presents a single unified address space and a flat NVLink topology that hides the per-package boundaries.

## What does the rack look like physically?

The rack itself is a 48U cabinet that weighs roughly 1.36 metric tons fully populated.[5] Power draw at peak load reaches 120 kW, which is roughly five to seven times the density of a conventional GPU rack and well beyond what most data center floors were designed to support in 2024.[6] The mechanical design is the result of close collaboration with the [Open Compute Project](/wiki/open_compute_project), and NVIDIA contributed the reference rack design to OCP in October 2024.[4]

A fully populated rack contains:

| Component | Count | Notes |
|---|---|---|
| Compute trays | 18 | 1U, 2 Grace CPUs + 4 Blackwell GPUs each |
| NVLink switch trays | 9 | 2 fifth-generation NVSwitch chips each |
| Power shelves | 4 to 8 | Top and bottom mounted |
| NVLink copper backplane | 1 | ~5,000 passive copper cables |
| Cooling distribution unit | 1 | Liquid manifold, top of rack |

The **compute tray** houses two GB200 superchips in 1U. NVIDIA solders the LPDDR5X directly to the motherboard and routes cooling through cold plates on the GPU and CPU packages. There are no fans in compute trays. Data signaling exits through blind-mate connectors on the rear.[11] The **switch tray** holds two fifth-generation NVSwitch chips, each delivering 7.2 TB/s across 144 NVLink ports at 50 GB/s.[3]

The **NVLink copper backplane** is the headline mechanical element. NVIDIA stuck with passive copper rather than active optics: roughly 5,000 copper twinax cables routed through a custom cartridge backplane behind all 27 trays, carrying 130 TB/s of bandwidth at zero retiming or optical conversion cost.[11] Optical alternatives would burn an extra 20 kW for lasers and SerDes alone.[11] The downside is assembly complexity: each cable is hand-routed by ODM technicians.

### Why does the NVL72 require liquid cooling?

The NVL72 is 100% direct liquid cooled with no air path inside the cabinet. Coolant enters at the top, runs through manifolds to cold plates on every GPU, CPU, NVSwitch, and voltage regulator, and exits at the bottom.[6] Typical supply temperatures are 32 to 45 degrees Celsius, warm enough that many deployments use chiller-less or evaporative-only outdoor cooling rather than mechanical chilling, dramatically improving facility-level Power Usage Effectiveness.[5]

Liquid cooling is not optional. Each Blackwell GPU draws up to 1,200 W and each Grace CPU another 300 W; the combined heat flux through a 1U tray with 4 GPUs and 2 CPUs would not move with air.[6] Facilities need rear-door heat exchangers or direct facility water connections, redundant pumps, leak detection, and chilled water at the proper supply temperature. Most data centers built before 2023 must be retrofitted, one of the gating factors on deployment pace.

## How does the unified NVLink domain work?

The defining feature of the NVL72 is the unified NVLink domain spanning 72 GPUs. Large [mixture-of-experts](/wiki/mixture_of_experts) models can have hundreds of gigabytes of weights, and serving them at low latency requires either fitting the whole model on one accelerator (impossible above ~192 GB) or sharding across many accelerators with constant cross-device traffic for expert routing and KV cache lookups. In a conventional eight-GPU server, that cross-device traffic is fast within the box but slow across boxes. In an NVL72, all 72 GPUs sit on the same NVLink fabric at 1.8 TB/s per-GPU bandwidth, so a sharded model behaves much more like a single-device deployment.[3]

The practical numbers:

| Metric | Value |
|---|---|
| NVLink generation | 5 |
| Per-GPU NVLink bandwidth | 1.8 TB/s bidirectional |
| Per-port bandwidth | 50 GB/s |
| Ports per GPU | 18 |
| NVSwitch chip bandwidth | 7.2 TB/s |
| NVSwitch ports per chip | 144 at 50 GB/s |
| NVSwitch chips per rack | 18 (across 9 trays) |
| Total NVLink fabric bandwidth | 130 TB/s |
| GPUs per NVLink domain | 72 |
| Aggregate GPU memory addressable from any GPU | ~13.4 TB |

NVIDIA's description of "36 times faster than 400 Gbps Ethernet" captures the magnitude.[1] The NVLink fabric is also far lower latency: about 1 microsecond hop-to-hop for adjacent GPUs versus ten to twenty microseconds for InfiniBand or Ethernet.[11] For multi-rack scale-out, each compute tray has slots for ConnectX-7 or ConnectX-8 SuperNICs supporting either Quantum-X800 InfiniBand or Spectrum-X Ethernet at 800 Gbps.[1]

## How fast is the GB200 NVL72?

NVIDIA quotes the following peak compute figures for a single GB200 NVL72 rack:[1]

| Precision | Throughput (dense) |
|---|---|
| FP4 Tensor Core | 1,440 PFLOPS (1.44 exaFLOPS) |
| FP8 Tensor Core | 720 PFLOPS |
| FP16 / BF16 Tensor Core | 360 PFLOPS |
| TF32 Tensor Core | 180 PFLOPS |
| FP64 Tensor Core | 3.24 PFLOPS |

At the platform level, NVIDIA claims the GB200 NVL72 "provides up to a 30x performance increase compared to the same number of NVIDIA H100 Tensor Core GPUs for LLM inference workloads," while reducing cost and energy consumption by up to 25x, with the largest gains on a 1.8 trillion parameter GPT mixture-of-experts model.[2]

NVIDIA's measurements, replicated in third-party MLPerf submissions, place the NVL72 at roughly 30 times higher throughput than an equivalent count of [H200](/wiki/nvidia_h200) GPUs on Llama 3.1 405B inference in MLPerf Inference v5.0, with most of the gain coming from the unified NVLink domain rather than per-GPU compute.[12] On H200 the model must shard across multiple eight-GPU nodes with cross-node InfiniBand for tensor parallelism; on NVL72 the same model fits in a single NVLink domain.

For training, CoreWeave, NVIDIA, and IBM submitted a 2,496-GPU cluster (about 35 NVL72 racks) to MLPerf Training v5.0 in mid-2025 and recorded a more than 2x speedup over Hopper at the same GPU count, with up to 3.2x speedup on the Llama 3.1 405B benchmark.[9] For mixture-of-experts models like [DeepSeek-V3](/wiki/deepseek_v3), independent SGLang benchmarks in mid-2025 measured up to 2.7x higher inference throughput on the NVL72 than on H100 for the 671B-parameter model, primarily because the expert routing layer can scatter and gather across all 72 GPUs at full NVLink bandwidth.[19]

## Networking and scale-out

A single NVL72 rack is the unit of NVLink locality. Multi-rack deployments use either **Quantum-X800 InfiniBand** at 800 Gbps (default for training clusters, used by Microsoft, CoreWeave, and Oracle) or **Spectrum-X Ethernet** at 800 Gbps (used by Meta and several neoclouds with adaptive routing and BlueField-3 DPU congestion control). Per-rack network egress is typically 36 ConnectX-8 SuperNICs (one per superchip), totaling 28.8 Tbps.[1] The egress bandwidth is intentionally a fraction of the in-rack NVLink fabric, since NVLink absorbs the heaviest tensor-parallel and KV cache traffic while InfiniBand or Ethernet handles the slower pipeline-parallel and data-parallel collectives. OCI Superclusters with NVIDIA Blackwell have been announced at the 100,000-GPU scale, which corresponds to roughly 1,400 NVL72 racks.[16]

## Who builds the racks?

NVIDIA does not assemble the racks itself; it licenses the reference design to partners.

| Partner | Role | Notes |
|---|---|---|
| Foxconn (Hon Hai) | ODM | Largest manufacturer, ~1,000 racks shipped April 2025 |
| Quanta Computer | ODM | 300 to 400 racks per month at peak ramp |
| Wistron | ODM | 150-plus racks per month |
| Wiwynn, Inventec, Pegatron, Hyve | ODM | Hyperscaler-focused configurations |
| Supermicro | OEM | SRS-GB200-NVL72 SuperCluster |
| Dell Technologies | OEM | PowerEdge XE9712 and XE9785 |
| Hewlett Packard Enterprise | OEM | HPE GB200 NVL72 racks |
| Lenovo, GIGABYTE, ASUS, QCT | OEM | Branded enterprise racks |
| Inspur, H3C | OEM (China) | Region-specific variants |

ODMs build to a hyperscaler's custom spec and ship under the customer's brand. OEMs ship branded racks through enterprise channels at higher margins. The ODM channel accounts for the majority of NVL72 volume in 2024 and 2025.[18]

## How much does a GB200 NVL72 cost?

NVIDIA does not publish list prices. Supply chain reporting and customer disclosures place the rack-scale server at roughly USD 3.0 to 3.4 million for a hyperscaler configuration.[7] The rack alone (compute, switches, backplane, cooling) is about USD 3.1 million; a fully integrated rack including networking, storage, and installation is about USD 3.9 million.[7] Per-GPU economics work out to roughly USD 40,000 to 45,000 per Blackwell B200 in the NVL72.[17]

NVIDIA's pricing strategy through the Blackwell launch has been to charge for the system rather than the chip, capturing the value of the NVLink fabric, liquid cooling integration, and unified-rack engineering. The NVL72 priced in at roughly 3x the equivalent-GPU-count H100 server cost while delivering 5x to 30x the inference throughput depending on workload.[12]

Demand has exceeded supply. By early 2026, total Blackwell hardware was sold out through Q3 2026 with an estimated backlog of approximately 3.6 million GPU-equivalents. The supply constraint is primarily TSMC CoWoS-L advanced packaging capacity, with HBM3e supply from Micron, SK hynix, and Samsung as a secondary constraint.[18]

## Why was the production ramp delayed?

The NVL72 had a difficult ramp. Initial production was slated for September 2024, but a sequence of issues pushed the volume ramp to Q2 and Q3 of 2025:[14]

- **CoWoS-L transition.** Blackwell uses TSMC's CoWoS-L (chip-on-wafer-on-substrate with local silicon interposers) rather than CoWoS-S used on Hopper. CoWoS-L allows the larger Blackwell package but had lower initial yields. Morgan Stanley estimated B200's 2025 CoWoS-L allocation at roughly 220,000 wafers, more than 30% of TSMC's total advanced packaging capacity.[18]
- **Overheating.** Early NVL72 racks hit thermal management issues at sustained 120 kW load, particularly in the NVSwitch trays and the backplane connector regions. NVIDIA worked through several mechanical and thermal interface material revisions through late 2024.[15]
- **Backplane connector revision.** Amphenol-supplied blind-mate connectors required design changes after initial production runs revealed reliability issues at scale, delaying several customer deployments by a quarter.[15]
- **Forecast cuts.** Ming-Chi Kuo's mid-2025 forecast pegged 2025 NVL72 rack shipments at 25,000 to 35,000 units, well below the 50,000 to 80,000 unit range projected in late 2024.[14]

By April 2025, industry-wide NVL72 shipments reached approximately 1,500 racks per month, with Hon Hai (Foxconn) alone shipping 1,000 racks.[13] Total 2025 NVL72 shipments settled in the 30,000 to 40,000 rack range.

## Customer deployments

- **Microsoft and OpenAI.** Microsoft is one of the largest NVL72 customers, with deployments at scale in Azure for internal workloads and [OpenAI](/wiki/openai). Azure's ND GB200 v6 VM family exposes the system to customers. Microsoft's 2025 buildout included hundreds of thousands of Blackwell GPUs anchoring [GPT-4o](/wiki/gpt_4o) and successor inference; subsequent capacity has shifted toward the [GB300 NVL72](/wiki/nvidia_gb300_nvl72).
- **Oracle.** OCI deployed thousands of NVL72 racks in 2025 and announced plans to scale beyond 100,000 Blackwell GPUs.[16] Oracle is a key partner for OpenAI's [Stargate](/wiki/stargate) project, with reports of a USD 40 billion Oracle order for NVIDIA AI GPUs.[8]
- **Meta.** [Meta](/wiki/meta) committed to a multi-hundred-thousand-GPU Blackwell fleet for 2025, with NVL72 racks anchoring [Llama](/wiki/llama) 4 training. Meta uses Spectrum-X Ethernet rather than InfiniBand.
- **CoreWeave.** [CoreWeave](/wiki/coreweave), NVIDIA, and IBM jointly submitted the largest MLPerf Training v5.0 result on Blackwell with 2,496 GPUs.[9] CoreWeave serves capacity to OpenAI, Microsoft, and other frontier AI labs.
- **AWS** launched EC2 P6e-GB200 UltraServer instances in 2025, exposing single-rack NVL72 capacity integrated with Elastic Fabric Adapter networking.
- **Google Cloud's** A4 family includes NVL72 VM shapes for customers who specifically want Blackwell rather than [TPU](/wiki/tpu) v5p and Trillium.
- **xAI Colossus** in Memphis added thousands of NVL72 racks during 2025 expansions on top of the original 100,000-H100 buildout.
- **G42 / Stargate UAE.** A 5 GW UAE data center is planned to deploy more than 2 million GB200-class chips, equivalent to tens of thousands of NVL72 racks.
- **Sovereign and national-lab customers.** European governments, the Saudi PIF-backed HUMAIN initiative, and U.S. national labs committed to NVL72 deployments.
- **Neoclouds.** Lambda partnered with Pegatron, with similar deployments at RunPod, Crusoe, Nebius, Together AI, and Vultr.

## Software stack

The NVL72 runs the standard NVIDIA software stack with Blackwell-specific support: [CUDA](/wiki/cuda) 12.4+ with Blackwell compute capability sm_100, [cuDNN](/wiki/cudnn) 9.x with Transformer Engine FP8 and FP4 integration, [TensorRT-LLM](/wiki/tensorrt_llm) for production inference with NVL72-aware sharding, NVIDIA [Triton Inference Server](/wiki/triton_inference_server), [NCCL](/wiki/nccl) with topology discovery, [PyTorch](/wiki/pytorch) 2.4+ with native FP8 and FP4 training paths, NVIDIA Dynamo (announced GTC 2025) for disaggregated inference orchestration, NVIDIA NIM microservices, and NVIDIA Mission Control for cluster management.[3]

A 72-GPU all-reduce on the NVL72 runs in roughly the time that the same all-reduce across nine eight-GPU H200 servers would take just to negotiate the cross-node InfiniBand transfers. PyTorch's distributed primitives automatically detect the topology and route through NVLink wherever possible.

## How does the NVL72 compare with related products?

| Product | GPUs | CPUs | NVLink domain | FP4 peak | Year |
|---|---|---|---|---|---|
| [DGX H100](/wiki/dgx_h100) | 8 H100 | 2 x86 | 8-GPU | n/a (FP8: 32 PFLOPS) | 2022 |
| HGX H200 | 8 H200 | host CPU | 8-GPU | n/a (FP8: 32 PFLOPS) | 2024 |
| GH200 NVL32 | 32 GH200 | 32 Grace | 32-GPU | n/a (FP8: 127 PFLOPS) | 2023 |
| **GB200 NVL72** | **72 B200** | **36 Grace** | **72-GPU** | **1.44 EFLOPS** | **2024** |
| [GB300 NVL72](/wiki/nvidia_gb300_nvl72) | 72 B300 | 36 Grace | 72-GPU | ~1.1 EFLOPS dense | 2025 |
| Vera Rubin NVL144 | 144 R100 | 36 Vera | 144-GPU | TBD | 2026 to 2027 |

The immediate successor is the [GB300 NVL72](/wiki/nvidia_gb300_nvl72), announced at GTC 2025, a refresh of the same rack form factor using the Blackwell Ultra B300 GPU. GB300 began customer shipments in mid-2025 and rapidly overtook the GB200 in newer cluster builds. NVIDIA's roadmap calls for the **Vera Rubin NVL144** in 2026 to 2027, doubling per-rack GPU count to 144 with HBM4 memory and a refreshed NVLink generation. Pricing for Vera Rubin NVL144 racks has been reported as high as USD 8.8 million.

## What is the GB200 NVL72 used for?

The NVL72 is most valuable where the unified NVLink domain enables a different deployment shape:

- **Trillion-parameter dense LLM training.** Models in the 500 billion to 2 trillion parameter range fit in the 13.4 TB unified GPU memory pool and benefit from full-NVLink all-reduce; NVIDIA states the architecture supports training and real-time inference for models scaling up to 10 trillion parameters.[2][3]
- **Frontier MoE inference.** [Mixture-of-experts](/wiki/mixture_of_experts) models like DeepSeek-V3 see the largest per-GPU throughput gains because expert routing benefits from the flat NVLink topology.[19]
- **Long-context inference.** 128K-plus context windows producing KV caches in hundreds of gigabytes per request fit in the rack-scale memory pool.
- **Reasoning model inference.** [Test-time compute](/wiki/test_time_compute) workloads like [o1](/wiki/o1) generate orders of magnitude more output tokens per query, making throughput the binding constraint.
- **Agentic AI serving.** Long-running agent loops with large context histories benefit from unified memory across the rack.
- **Multi-tenant inference.** A single NVL72 serves dozens of model replicas because the unified domain lets the scheduler treat the rack as one resource pool.

## Limitations

- **Facility requirements.** The 120 kW liquid-cooled rack requires facilities most operators did not have in 2024. Retrofitting liquid-cooled floors is a multi-year undertaking.[6]
- **Unified failure domain.** A failed NVSwitch tray or backplane fault can take down all 72 GPUs simultaneously, a coarser failure granularity than eight-GPU servers connected by InfiniBand.
- **Repair complexity.** Servicing requires draining the liquid loop or isolating the affected tray.
- **Pricing.** At USD 3 million-plus per rack, the NVL72 is an order of magnitude more expensive per GPU than a CPU server.[7]
- **Supply constraints.** Through 2024 and most of 2025, supply did not meet demand. Smaller customers received little or no allocation while hyperscaler orders consumed available capacity.[14]
- **Power infrastructure.** 120 kW per rack at cluster-scale pushes against grid capacity even at purpose-built AI data center sites. Several deployments have been delayed by interconnection queue or substation timelines rather than hardware availability.

## See also

- [NVIDIA Blackwell](/wiki/nvidia_blackwell), [NVIDIA B200](/wiki/nvidia_b200), [NVIDIA GB200](/wiki/nvidia_gb200), [NVIDIA GB300 NVL72](/wiki/nvidia_gb300_nvl72)
- [NVIDIA Grace](/wiki/grace_cpu), [NVIDIA H100](/wiki/nvidia_h100), [NVIDIA H200](/wiki/nvidia_h200), [NVIDIA Hopper](/wiki/nvidia_hopper)
- [DGX](/wiki/dgx), [HGX](/wiki/hgx), [NVLink](/wiki/nvlink), [NVSwitch](/wiki/nvswitch), [HBM3e](/wiki/hbm3e)
- [CUDA](/wiki/cuda), [TensorRT-LLM](/wiki/tensorrt_llm), [NCCL](/wiki/nccl), [Transformer Engine](/wiki/transformer_engine)
- [Stargate](/wiki/stargate), [OpenAI](/wiki/openai), [Microsoft Azure](/wiki/azure), [Oracle Cloud](/wiki/oracle), [AWS](/wiki/aws), [CoreWeave](/wiki/coreweave)

## References

1. NVIDIA, ["GB200 NVL72" product page](https://www.nvidia.com/en-us/data-center/gb200-nvl72/)
2. NVIDIA Newsroom, ["NVIDIA Blackwell Platform Arrives to Power a New Era of Computing"](https://nvidianews.nvidia.com/news/nvidia-blackwell-platform-arrives-to-power-a-new-era-of-computing), March 18, 2024
3. NVIDIA Developer Blog, ["NVIDIA GB200 NVL72 Delivers Trillion-Parameter LLM Training and Real-Time Inference"](https://developer.nvidia.com/blog/nvidia-gb200-nvl72-delivers-trillion-parameter-llm-training-and-real-time-inference/)
4. NVIDIA Developer Blog, ["NVIDIA Contributes NVIDIA GB200 NVL72 Designs to Open Compute Project"](https://developer.nvidia.com/blog/nvidia-contributes-nvidia-gb200-nvl72-designs-to-open-compute-project/)
5. NVIDIA Docs, ["NVIDIA DGX GB Rack Scale Systems User Guide"](https://docs.nvidia.com/dgx/dgxgb200-user-guide/hardware.html)
6. Data Center Dynamics, ["Nvidia reveals liquid cooled GB200 NVL72 system"](https://www.datacenterdynamics.com/en/news/nvidia-announces-liquid-cooled-gb200-nvl72-system-with-72-blackwell-gpus/)
7. Tom's Hardware, ["Nvidia's next-gen Blackwell AI Superchips could cost up to $70,000"](https://www.tomshardware.com/pc-components/gpus/nvidias-next-gen-blackwell-ai-gpus-to-cost-up-to-dollar70000-fully-equipped-servers-range-up-to-dollar3000000-report)
8. Tom's Hardware, ["Oracle has reportedly placed an order for $40 billion in Nvidia AI GPUs"](https://www.tomshardware.com/pc-components/gpus/oracle-has-reportedly-placed-an-order-for-usd40-billion-in-nvidia-ai-gpus-for-a-new-openai-data-center)
9. CoreWeave, ["CoreWeave, NVIDIA, and IBM Set MLPerf Record"](https://www.coreweave.com/blog/coreweave-nvidia-and-ibm-set-mlperf-record-with-largest-nvidia-gb200-blackwell-cluster-achieving-over-2x-faster-training)
10. Supermicro, ["NVIDIA GB200 NVL72 SuperCluster Brochure"](https://www.supermicro.com/manuals/brochure/Brochure-AI-SuperCluster-NVIDIA-GB200-NVL72.pdf)
11. SemiAnalysis, ["GB200 Hardware Architecture"](https://newsletter.semianalysis.com/p/gb200-hardware-architecture-and-component)
12. SemiAnalysis, ["H100 vs GB200 NVL72 Training Benchmarks"](https://newsletter.semianalysis.com/p/h100-vs-gb200-nvl72-training-benchmarks)
13. TweakTown, ["NVIDIA GB200 NVL72 AI server shipments: 1500 units in April"](https://www.tweaktown.com/news/105151/nvidia-gb200-nvl72-ai-server-shipments-1500-units-in-april-compared-to-1000-q1-2025/index.html)
14. Ming-Chi Kuo, ["Low Mass Production Visibility for GB200 NVL72"](https://medium.com/@mingchikuo/low-mass-production-visibility-for-gb200-nvl72-cautiously-monitoring-short-term-potential-risks-in-eb8068f77202)
15. TrendForce, ["Dell, Foxconn, and Quanta Reassure NVIDIA's GB200 Shipments"](https://www.trendforce.com/news/2024/11/19/news-dell-foxconn-and-quanta-reassure-nvidias-gb200-shipments-remain-on-track-amid-overheat-rumors/)
16. NVIDIA Blog, ["Oracle Cloud Infrastructure Deploys Thousands of NVIDIA Blackwell GPUs"](https://blogs.nvidia.com/blog/oracle-cloud-infrastructure-blackwell-gpus-agentic-ai-reasoning-models/)
17. Spheron, ["NVIDIA GB200 NVL72 Guide: Specs, Pricing & Architecture"](https://www.spheron.network/blog/nvidia-gb200-nvl72-guide/)
18. IntuitionLabs, ["NVIDIA GB200 Supply Chain"](https://intuitionlabs.ai/articles/nvidia-gb200-supply-chain)
19. InfoQ, ["Nvidia's GB200 NVL72 Supercomputer Achieves 2.7x Faster Inference on DeepSeek V3"](https://www.infoq.com/news/2025/06/nvidia-gb200/)

