The NVIDIA GB300 NVL72 is a rack-scale AI computing system built around the Blackwell Ultra GPU architecture. NVIDIA announced the platform on March 18, 2025 at its GTC conference, positioning the system as the company's primary answer to the explosive compute demands of reasoning AI, agentic workloads, and test-time scaling inference. The GB300 NVL72 packs 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs into a single liquid-cooled rack, forming one of the largest unified compute domains ever deployed in commercial AI infrastructure.
At full utilization, the system delivers over 1.1 exaFLOPS of dense FP4 compute, a figure that marks the first time a single commercial rack has crossed the exascale threshold. Against the Hopper-based H100 systems that dominated data centers from 2022 through 2024, NVIDIA claims 50x higher AI factory output and 65x more AI compute. Shipments began with Dell delivering the first production unit to CoreWeave in July 2025, with broad availability across major cloud providers through the second half of 2025.
NVIDIA's Blackwell architecture (the B100/B200 generation) launched in 2024 as the successor to Hopper. The original Blackwell GPUs introduced a dual-die design, FP4 tensor cores, fifth-generation NVLink, and a tight integration with the Grace CPU through a high-speed chip-to-chip interconnect. The GB200 NVL72, built around the standard B200 GPU, became the company's flagship rack-scale product in 2024 and drew commitments from every major hyperscaler.
Blackwell Ultra, designated the B300, is a refined version of the same core architecture on the same TSMC 4NP process. NVIDIA did not redesign the chip from scratch. Instead, the company made targeted changes to the components most relevant to inference and reasoning workloads: the attention-layer compute units, the memory stack configuration, and the FP4 tensor core throughput. The result is a GPU that costs more per unit and draws more power at peak (1,400 W, up from roughly 1,200 W for the B200 in GB200 NVL72 configurations), but delivers roughly 50% more useful throughput for the transformer-based inference workloads that now dominate data center demand.
Jensen Huang, NVIDIA's CEO, framed the timing at GTC 2025: "AI has made a giant leap. Reasoning and agentic AI demand orders of magnitude more computing performance." The GB300 NVL72 was designed specifically to serve that demand, with the Dynamo inference framework and NVIDIA NIM microservices providing the software counterpart.
Like its B200 predecessor, the Blackwell Ultra B300 is not a single monolithic die. Two reticle-limited GPU dies are connected through NVIDIA's NV-HBI (High-Bandwidth Interface), a custom die-to-die interconnect delivering 10 TB/s of internal bandwidth. The two dies operate as a single logical GPU, sharing a unified memory address space and appearing to software as one accelerator.
The combined die contains 208 billion transistors, identical to the standard Blackwell count, manufactured on TSMC's N4P process node. This is an optimized variant of TSMC's 5nm family, sometimes called 4NP, tuned for high-density logic and power efficiency in compute workloads. The 208 billion transistor count exceeds the 185 billion in AMD's competing Instinct MI355X by approximately 12%.
Each B300 GPU contains:

- 160 streaming multiprocessors with 20,480 CUDA cores
- 640 fifth-generation Tensor Cores
- 288 GB of HBM3E across eight 12-Hi stacks
- Fifth-generation NVLink and NVLink-C2C interfaces
The fifth-generation Tensor Cores in the B300 support FP8, FP6, and NVFP4 (four-bit floating point) precision. FP4 inference was the headline addition in the original Blackwell generation, but Blackwell Ultra doubled the attention-layer compute specifically to address a known bottleneck in transformer inference.
In a standard FP8 transformer forward pass, softmax computation in the attention layer consumes roughly the same number of cycles as the matrix multiplication (GEMM) operations. This creates a pipeline bottleneck that requires precise kernel scheduling to avoid performance loss. Blackwell Ultra adds 2x the MUFU (Multi-Function Unit) capacity dedicated to attention operations, providing a 2x speedup on attention compute and relaxing the kernel scheduling constraints that limited practical throughput on B200 systems.
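To see why the softmax exponentials can rival the GEMMs, consider a rough operation count for one attention head. The throughput figures in the sketch below are illustrative assumptions, not published B200/B300 numbers; only the structure of the comparison matters.

```python
# Rough, illustrative comparison of attention GEMM work vs. softmax
# exponentials for one head. Throughput ratios are assumed for
# illustration, not published NVIDIA figures.

seq_len = 8192        # context length (tokens)
head_dim = 128        # per-head dimension

# Matrix-multiply work per head: Q @ K^T and P @ V
gemm_flops = 2 * (2 * seq_len * seq_len * head_dim)

# Softmax work per head: one exponential per attention score
exp_ops = seq_len * seq_len

# Assumed relative throughputs (ops per cycle) for the tensor-core GEMM
# pipeline vs. the MUFU/SFU transcendental pipeline. The real ratio
# depends on precision and architecture.
GEMM_OPS_PER_CYCLE = 4096     # assumption
MUFU_EXP_PER_CYCLE = 16       # assumption

gemm_cycles = gemm_flops / GEMM_OPS_PER_CYCLE
exp_cycles = exp_ops / MUFU_EXP_PER_CYCLE

print(f"GEMM cycles:        {gemm_cycles:,.0f}")
print(f"softmax exp cycles: {exp_cycles:,.0f}")
print(f"exp/GEMM ratio:     {exp_cycles / gemm_cycles:.2f}")
# With these assumptions the exponentials cost a comparable number of
# cycles to the GEMMs, which is the bottleneck the doubled MUFU
# capacity in Blackwell Ultra targets.
```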
Dense NVFP4 throughput per GPU is 15 petaFLOPS, compared to 10 petaFLOPS on the B200 (a 50% increase). FP8 throughput is 7.5 petaFLOPS dense.
The most significant change in the B300 relative to the B200 is memory capacity. The B300 uses 12-Hi HBM3E stacks instead of the 8-Hi stacks in the B200. This allows 288 GB of HBM3E per GPU, compared to 192 GB on the B200, a 50% increase per chip.
The memory interface consists of sixteen 512-bit controllers (8,192-bit total bus width) with a peak bandwidth of 8 TB/s per GPU. This is the same bandwidth figure as the B200 because HBM3E at 12-Hi stacks does not inherently increase per-pin speed; the benefit is purely in capacity. For inference, the 288 GB capacity means very large models (300B+ parameter models) can fit entirely within a single GPU's memory without offloading, eliminating costly inter-GPU communication for certain serving configurations.
Across the 72-GPU GB300 NVL72 rack, total HBM3E memory is approximately 20 TB.
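A quick sanity check ties these per-GPU figures to the rack-level numbers quoted in the specification tables below, and illustrates the single-GPU model-fit claim. The 300-billion-parameter model is a hypothetical example, and the ~12% overhead for block scales and unquantized layers is an assumption, not a measured figure.

```python
# Aggregate per-GPU specs to GB300 NVL72 rack totals, and check the
# "300B+ parameter model fits in one GPU" claim.
GPUS_PER_RACK = 72
FP4_DENSE_PFLOPS = 15        # per B300 GPU, dense
FP8_DENSE_PFLOPS = 7.5
HBM_GB = 288
HBM_TBPS = 8

print(f"Rack FP4 dense: {GPUS_PER_RACK * FP4_DENSE_PFLOPS:,.0f} PFLOPS")  # 1,080
print(f"Rack FP8 dense: {GPUS_PER_RACK * FP8_DENSE_PFLOPS:,.0f} PFLOPS")  # 540
print(f"Rack HBM3E:     {GPUS_PER_RACK * HBM_GB / 1000:.1f} TB")          # ~20.7
print(f"Rack HBM BW:    {GPUS_PER_RACK * HBM_TBPS} TB/s")                 # 576

# Hypothetical 300B-parameter model quantized to NVFP4:
# ~0.5 bytes per weight plus an assumed ~12% overhead.
params = 300e9
weight_gb = params * 0.5 * 1.12 / 1e9
print(f"NVFP4 weights: ~{weight_gb:.0f} GB of {HBM_GB} GB, "
      f"leaving ~{HBM_GB - weight_gb:.0f} GB for KV cache and activations")
```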
Each B300 GPU connects to its paired Grace CPU through NVLink-C2C, a coherent chip-to-chip interconnect running at 900 GB/s. The CPU and GPU share a unified memory address space over this link, allowing CPU-side LPDDR5X memory to be addressed directly from GPU kernels without explicit data transfers.
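On Grace-paired systems, this coherence means a GPU kernel can dereference CPU-side allocations directly. A minimal sketch of the same single-address-space programming model, using generic CUDA managed memory through CuPy (an assumption: CuPy is installed; this is not Grace-specific):

```python
import cupy as cp

# Route CuPy allocations through CUDA managed memory so a single pointer
# is valid from both CPU and GPU code. On Grace-Blackwell systems the
# coherent NVLink-C2C link extends the same single-address-space model
# to ordinary CPU (LPDDR5X) allocations without explicit copies.
cp.cuda.set_allocator(cp.cuda.malloc_managed)

x = cp.arange(1 << 20, dtype=cp.float32)  # allocation in managed memory
x *= 2.0                                  # kernel runs on the GPU
print(float(x[:4].sum()))                 # host-side access to the result
```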
For host system connectivity in server configurations without a Grace CPU, the B300 provides a PCIe Gen 6 x16 interface delivering 256 GB/s bidirectional bandwidth, double that of PCIe Gen 5.
NVIDIA packages the B300 GPU with the Grace CPU in a component called the GB300 Grace Blackwell Ultra Superchip. Each GB300 superchip contains two B300 GPUs and one Grace CPU, connected through NVLink-C2C. This three-chip package is the building block of the GB300 NVL72 rack.
The Grace CPU in each superchip is based on Arm's Neoverse V2 architecture, the same core used in the GB200. Each Grace CPU contains 72 Arm Neoverse V2 cores running at a base frequency of 3.1 GHz. In the GB300 NVL72 rack, 36 superchips provide 36 Grace CPUs with a combined 2,592 Neoverse V2 cores.
Grace CPU memory in the NVL72 system totals approximately 17 TB of LPDDR5X ECC memory with up to 14 TB/s of bandwidth. The CPU memory is available to GPU kernels through the coherent NVLink-C2C interface, giving each pair of B300 GPUs access to a combined pool of CPU DRAM and GPU HBM3E.
The GB300 NVL72 is a pre-integrated 48U rack containing:

- 18 compute trays housing 36 GB300 Grace Blackwell Ultra Superchips (72 Blackwell Ultra GPUs and 36 Grace CPUs)
- 9 NVLink Switch trays forming the 130 TB/s NVLink fabric
- ConnectX-8 SuperNICs and BlueField-3 DPUs for scale-out networking
- Power shelves and liquid-cooling manifolds serving the entire rack
The rack is shipped as a complete, pre-validated unit by NVIDIA's manufacturing partners, including Dell, HPE, Lenovo, Supermicro, GIGABYTE, and others. Dell's PowerEdge XE9712 was the first system to ship, delivered to CoreWeave on July 3, 2025.
All 72 GPUs in the rack are connected through NVIDIA's fifth-generation NVLink Switch chips into a single non-blocking NVLink fabric. Each B300 GPU has fifth-generation NVLink with 1.8 TB/s total bandwidth (900 GB/s in each direction) across 18 links, each providing 100 GB/s of bidirectional bandwidth.
The NVSwitch chips inside the rack aggregate these links into a fabric that delivers 130 TB/s of total GPU-to-GPU bandwidth within the NVL72 domain (72 GPUs × 1.8 TB/s ≈ 130 TB/s). Every GPU can communicate directly with every other GPU at full NVLink speed without congestion. This topology makes the 72-GPU rack function as a single logical compute unit for large model inference and training jobs.
The NVLink Switch also supports NVIDIA SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) with FP8 precision, enabling in-network collective operations that reduce communication overhead for distributed training and inference.
For connecting multiple GB300 NVL72 racks or integrating with storage and front-end systems, each superchip node connects to a ConnectX-8 SuperNIC IO module. The ConnectX-8 provides 800 Gb/s of network bandwidth per GPU node, using either NVIDIA Quantum-X800 InfiniBand or NVIDIA Spectrum-X Ethernet fabric.
The ConnectX-8 SuperNIC integrates PCIe Gen 6 switching directly into the NIC, eliminating the need for a separate PCIe switch chip on the baseboard and reducing latency on the data path between GPU and network. Each ConnectX-8 IO module carries two ConnectX-8 devices, each supporting up to 800 Gb/s.
A BlueField-3 Data Processing Unit (DPU) handles multi-tenant networking isolation, security, and storage offload functions within the rack.
| B300 GPU specification | Value |
|---|---|
| Architecture | Blackwell Ultra |
| Process node | TSMC N4P |
| Transistors | 208 billion |
| Die configuration | Dual-die (NV-HBI, 10 TB/s) |
| Streaming Multiprocessors | 160 |
| CUDA cores | 20,480 |
| Tensor Cores | 640 (5th generation) |
| Precision support | FP4, FP6, FP8, FP16, BF16, TF32, FP64 |
| HBM3E capacity | 288 GB |
| HBM3E bandwidth | 8 TB/s |
| HBM3E stacks | 8 stacks of 12-Hi |
| Memory bus width | 8,192-bit |
| FP4 dense compute | 15 petaFLOPS |
| FP8 dense compute | 7.5 petaFLOPS |
| FP16/BF16 dense compute | 3.75 petaFLOPS |
| TDP | 1,400 W |
| NVLink bandwidth | 1.8 TB/s (bidirectional) |
| NVLink version | 5th generation |
| PCIe interface | Gen 6 x16 (256 GB/s bidirectional) |
| NVLink-C2C bandwidth | 900 GB/s |
| GB300 NVL72 system specification | Value |
|---|---|
| Total GPUs | 72 (Blackwell Ultra B300) |
| Total Grace CPUs | 36 |
| CPU cores | 2,592 (Arm Neoverse V2) |
| GPU memory (HBM3E) | ~20 TB |
| GPU memory bandwidth | ~576 TB/s |
| CPU memory (LPDDR5X) | ~17 TB |
| CPU memory bandwidth | ~14 TB/s |
| Total fast memory | ~37 TB |
| NVLink fabric bandwidth | 130 TB/s |
| Network bandwidth per GPU | 800 Gb/s (ConnectX-8) |
| FP4 peak compute | 1,080 PFLOPS dense (~1.1 exaFLOPS) |
| FP4 peak compute (with sparsity) | 1,440 PFLOPS |
| FP8 peak compute | 540 PFLOPS |
| FP16/BF16 peak compute | 270 PFLOPS |
| Rack power draw | ~120 kW (full load) |
| Cooling | 100% liquid-cooled |
| Form factor | 48U rack |
| Networking options | Quantum-X800 InfiniBand / Spectrum-X Ethernet |
| Metric | H100 SXM5 | H200 SXM5 | B200 SXM | B300 SXM |
|---|---|---|---|---|
| Architecture | Hopper | Hopper | Blackwell | Blackwell Ultra |
| Process node | TSMC 4N | TSMC 4N | TSMC N4P | TSMC N4P |
| Transistors | 80B | 80B | 208B | 208B |
| HBM capacity | 80 GB | 141 GB | 192 GB | 288 GB |
| HBM bandwidth | 3.35 TB/s | 4.8 TB/s | 8 TB/s | 8 TB/s |
| FP8 dense compute | 1.98 PFLOPS | 1.98 PFLOPS | 4.5 PFLOPS | 7.5 PFLOPS |
| FP4 dense compute | N/A | N/A | 9 PFLOPS | 15 PFLOPS |
| Attention-layer speedup | 1x (baseline) | 1x | 5x vs H100 | 2x vs B200 |
| TDP | 700 W | 700 W | 1,000 W | 1,400 W |
The GB300 NVL72 as a full rack system versus a comparable Hopper cluster:
| Workload metric | GB300 NVL72 vs H100 NVL8 equivalent |
|---|---|
| AI factory output (NVIDIA claim) | 50x higher |
| AI compute (FP4 vs FP8 equivalent basis) | 65x more |
| Throughput per megawatt | 5x higher |
| Video generation | 30x faster |
| Tokens per second per user | 10x improvement |
NVLink 5 is the interconnect generation shipping with both Blackwell and Blackwell Ultra GPUs. Each GPU has 18 NVLink links, each providing 100 GB/s of bidirectional bandwidth (50 GB/s per direction), for a total of 1.8 TB/s per GPU. Compared to NVLink 4 (used in Hopper), NVLink 5 doubles per-GPU bandwidth.
Within the NVL72 rack, the NVLink Switch chips aggregate all GPU links. The NVL72 domain runs at 130 TB/s total GPU-to-GPU bandwidth, non-blocking. NVLink Switch also includes SHARP support for in-network reductions, which reduces the amount of data that must traverse the fabric during collective operations like all-reduce, commonly used in distributed training.
For scale-out networking beyond a single rack, the ConnectX-8 SuperNIC provides 800 Gb/s per GPU node. This is double the networking bandwidth available on GB200 NVL72 systems, which used ConnectX-7 at 400 Gb/s. The ConnectX-8 supports both NVIDIA Quantum-X800 InfiniBand (XDR) and NVIDIA Spectrum-X Ethernet fabrics at up to 800 Gb/s.
At scale, NVIDIA's reference architecture for multi-rack GB300 NVL72 deployments uses a two-tier spine-leaf topology with dedicated InfiniBand or Ethernet fabrics connecting racks. Microsoft Azure's first at-scale deployment connected more than 4,600 Blackwell Ultra GPUs across interconnected GB300 NVL72 racks through next-generation InfiniBand for OpenAI workloads.
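As a rough sizing illustration, the sketch below maps a cluster-level GPU count onto NVL72 racks; the 4,608-GPU figure is an assumed round number consistent with "more than 4,600" GPUs, not a confirmed count.

```python
import math

# How many NVL72 racks a given GPU count implies, and the resulting
# aggregate HBM across the cluster.
gpus = 4608                    # assumption: round number above 4,600
gpus_per_rack = 72
racks = math.ceil(gpus / gpus_per_rack)
print(f"{gpus} GPUs -> {racks} GB300 NVL72 racks")            # 64 racks
print(f"HBM3E across the cluster: ~{gpus * 288 / 1e3:.0f} TB")
```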
The GB300 NVL72 requires direct liquid cooling. Air cooling is not sufficient for a rack drawing approximately 120 kW at full load, and NVIDIA designed the system from the ground up for liquid cooling. The primary components (GPUs, CPUs, NVSwitch chips) are all liquid-cooled through direct-contact cold plates. Peripheral components including OSFP transceiver modules, storage drives, and power distribution boards are air-cooled within the rack enclosure.
Each rack requires connection to a facility chilled water loop through a Coolant Distribution Unit (CDU). Target supply water temperatures are between 30°C and 40°C. A single GB300 NVL72 rack generates approximately 409,000 BTU/hr of heat at 120 kW load.
The cooling system bill of materials for one NVL72 rack, according to industry component pricing, totals approximately $49,860. This includes cooling hardware across all compute trays (roughly $40,680 worth) and NVSwitch trays (roughly $9,180 worth). This cooling hardware cost is separate from the rack system price itself.
Data centers deploying GB300 NVL72 racks at scale need to plan power distribution at the row level, not per-rack. A single row of 10 GB300 NVL72 racks draws over one megawatt. Most enterprise data centers built before 2022 require significant infrastructure upgrades before deploying this generation of hardware.
NVIDIA has cited the liquid cooling architecture as a significant efficiency improvement. The company claims GB200 and GB300 NVL72 liquid-cooled systems achieve over 300x greater water efficiency compared to traditional air-cooled data centers running H100s at equivalent output, primarily because liquid cooling allows much higher heat densities and requires less evaporative cooling in the overall facility.
The B300 GPU is fully CUDA-compatible. Existing CUDA code targeting Blackwell or Hopper runs without modification on Blackwell Ultra. NVIDIA assigned compute capability 10.0 (sm_100) to the original Blackwell generation and 10.3 (sm_103) to Blackwell Ultra; the same PTX instruction set supports both, with updated libraries (cuDNN, cuBLAS, NCCL) auto-tuned for the B300's doubled attention throughput and NVFP4 compute.
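A quick way to confirm which Blackwell variant a deployment is running on is to query the compute capability at runtime; the (10, 3) value for Blackwell Ultra follows the mapping discussed above and should be treated as an expectation rather than a guarantee.

```python
import torch

# Query the compute capability of each visible GPU. Original Blackwell
# reports (10, 0); Blackwell Ultra (B300) is expected to report (10, 3),
# per the compute-capability mapping discussed above.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    print(f"GPU {i}: {name}, compute capability {major}.{minor}")
```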
NVIDIA Dynamo is an open-source inference framework released alongside the GB300 NVL72. It was designed specifically for distributed, disaggregated inference serving on large GPU clusters. Dynamo splits the two phases of LLM inference, prefill (context processing) and decode (token generation), across separate pools of GPUs. This disaggregated serving approach allows each phase to be independently scaled and optimized.
On the GB300 NVL72, Dynamo with disaggregated serving delivers approximately 1.5x higher throughput per GPU versus traditional in-flight batching approaches. For Mixture-of-Experts models like DeepSeek-R1, the combination of GB300 NVL72 hardware and Dynamo delivers up to 50x higher throughput than Hopper-based systems with earlier software.
Dynamo supports major LLM serving frameworks as backends including NVIDIA TensorRT-LLM, vLLM, and SGLang. NIM (NVIDIA Inference Microservices) integrates Dynamo capabilities to provide a containerized, optimized deployment option.
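The core idea of disaggregated serving can be sketched in a few lines. The classes and names below are purely illustrative and are not Dynamo's actual API; they only show how prefill and decode become separately scalable pools joined by a KV-cache handoff.

```python
from dataclasses import dataclass, field

# Conceptual sketch of disaggregated LLM serving: prefill (context
# processing) and decode (token generation) run on separate GPU pools,
# connected by a KV-cache handoff. Illustrative only -- not Dynamo's API.

@dataclass
class Request:
    prompt_tokens: list[int]
    max_new_tokens: int
    kv_cache_handle: str | None = None   # reference to KV blocks, not the data
    output_tokens: list[int] = field(default_factory=list)

class PrefillWorker:
    """Compute-bound phase: process the whole prompt once, emit KV cache."""
    def run(self, req: Request) -> Request:
        # ... run the prompt through the model on a prefill GPU ...
        req.kv_cache_handle = f"kv://prefill-pool/{id(req)}"
        return req

class DecodeWorker:
    """Memory-bandwidth-bound phase: generate tokens one at a time."""
    def run(self, req: Request) -> Request:
        assert req.kv_cache_handle is not None
        # ... attach to the transferred KV cache and decode ...
        req.output_tokens = [0] * req.max_new_tokens   # placeholder tokens
        return req

class Router:
    """Sends each request through the prefill pool, then the decode pool,
    so each pool can be sized and batched for its own bottleneck."""
    def __init__(self, prefill: PrefillWorker, decode: DecodeWorker):
        self.prefill, self.decode = prefill, decode

    def serve(self, req: Request) -> Request:
        return self.decode.run(self.prefill.run(req))

result = Router(PrefillWorker(), DecodeWorker()).serve(
    Request(prompt_tokens=[1, 2, 3], max_new_tokens=8))
print(len(result.output_tokens), result.kv_cache_handle)
```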
NVIDIA TensorRT-LLM is the primary compiler and runtime for optimized LLM inference on NVIDIA hardware. For the B300, TensorRT-LLM includes optimized kernels for NVFP4 quantization, FP8 key-value cache compression, and the new attention MUFU units. Models quantized to NVFP4 via TensorRT-LLM's quantization toolkit run at the full 15 petaFLOPS rated throughput of the B300.
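To make the NVFP4 format concrete, the sketch below simulates its block structure in NumPy: 4-bit E2M1 values in 16-element micro-blocks, each block carrying its own higher-precision scale. This is an illustrative simulation of the numerics, not the TensorRT-LLM implementation, and a float32 scale stands in for the FP8 block scale used by the real format.

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 (FP4) value.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # NVFP4 uses 16-element micro-blocks, each with its own scale

def quantize_nvfp4_sim(x: np.ndarray) -> np.ndarray:
    """Simulated NVFP4 quantize->dequantize round trip."""
    x = x.reshape(-1, BLOCK)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)
    scaled = x / scale
    # Round each scaled value to the nearest representable FP4 magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    deq = np.sign(scaled) * FP4_GRID[idx] * scale
    return deq.reshape(-1)

w = np.random.randn(1024).astype(np.float32)
w_q = quantize_nvfp4_sim(w)
err = np.abs(w - w_q).mean() / np.abs(w).mean()
print(f"mean relative quantization error: {err:.3f}")
```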
NVIDIA's NCCL (NVIDIA Collective Communications Library) handles multi-GPU and multi-node communication. On the GB300 NVL72, NCCL operations within the 72-GPU NVLink domain run over the 130 TB/s NVLink fabric. Cross-rack NCCL operations use the ConnectX-8 network at 800 Gb/s per node. NCCL is aware of the NVLink topology and routes intra-rack collectives through the NVSwitch fabric rather than the network interface.
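A minimal example of the kind of collective that runs over this fabric, using PyTorch's distributed API with the NCCL backend (launched with torchrun); NCCL itself then chooses NVLink/NVSwitch paths inside the rack and the NIC across racks.

```python
# Run with: torchrun --nproc_per_node=<gpus> allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    # NCCL backend: intra-rack traffic goes over NVLink/NVSwitch,
    # inter-rack traffic over the network (InfiniBand or Ethernet).
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    # Each rank contributes a tensor; all_reduce sums them in place.
    x = torch.full((1024,), float(rank + 1), device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if rank == 0:
        world = dist.get_world_size()
        expected = world * (world + 1) / 2
        print(f"all_reduce result per element: {x[0].item()} "
              f"(expected {expected})")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```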
The primary design goal of the GB300 NVL72 is reasoning AI inference. Models like OpenAI's o1/o3, DeepSeek-R1, and similar "chain-of-thought" reasoning systems generate many more tokens per query than conventional LLMs. While a standard chat completion might generate 200-500 output tokens, a reasoning model solving a complex problem may generate tens of thousands of tokens across internal reasoning steps. NVIDIA estimates that reasoning variants demand approximately 20x more tokens per query than standard models, and up to 150x more compute than traditional one-shot inference.
This token explosion makes memory capacity and memory bandwidth the primary bottlenecks, not raw FLOPS. The B300's 288 GB of HBM3E addresses both: more capacity means larger KV caches for longer contexts, and the doubled attention throughput means the attention computation itself does not bottleneck the decode phase.
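The KV-cache pressure from long reasoning traces is easy to quantify. The model dimensions below are hypothetical (roughly a 70B-class model with grouped-query attention), chosen only to show how cache size scales with generated tokens.

```python
# KV cache size per request:
#   2 (K and V) x layers x kv_heads x head_dim x bytes_per_element x tokens
# Model dimensions are hypothetical, roughly 70B-class with GQA.
layers, kv_heads, head_dim = 80, 8, 128
bytes_per_elem = 1          # FP8 KV cache (use 2 for FP16)

def kv_cache_gb(tokens: int) -> float:
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

for tokens in (500, 32_000, 100_000):
    print(f"{tokens:>7} tokens -> {kv_cache_gb(tokens):6.2f} GB per request")
# A short chat answer needs a fraction of a GB; a long chain-of-thought
# trace needs several GB, so hundreds of concurrent reasoning requests
# quickly consume the 288 GB of HBM3E on a single B300.
```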
Test-time scaling, the practice of spending more compute during inference to improve answer quality, can demand up to 100x more compute than traditional inference. The GB300 NVL72's exaFLOP-scale compute capacity is intended to make test-time scaling economically viable at production scale.
The GB300 NVL72 is also used for training and fine-tuning large language models. The 130 TB/s NVLink fabric allows large model parallelism within a single rack. For trillion-parameter model training, multiple racks are connected via InfiniBand or Ethernet.
In MLPerf Training v5.1 benchmarks, Lambda's GB300 NVL72 cluster outperformed GB200 NVL72 systems by 27% on training throughput metrics. The increased per-GPU compute and memory capacity allows larger batch sizes and reduces the communication overhead relative to compute time.
Agentic AI systems use LLMs to plan and execute multi-step tasks, often calling external tools, running code, browsing web content, or invoking specialized models. These workloads require both high throughput (to handle many concurrent agent instances) and low latency (to minimize response time per step). The GB300 NVL72's combination of per-GPU memory capacity, FP4 throughput, and attention acceleration serves both requirements simultaneously.
NVIDIA also targets physical AI applications including robotics simulation and synthetic data generation. For video generation using diffusion models, NVIDIA claims 30x faster generation on GB300 NVL72 systems compared to Hopper. This enables synthetic dataset generation for autonomous vehicle training and robotic manipulation at scales that were previously impractical.
The GB300 NVL72 made its official benchmark debut in MLPerf Inference v5.1 in September 2025, with submissions covering DeepSeek-R1 671B in the offline scenario and Llama 3.1 405B in the server and interactive scenarios.
The submissions employed NVFP4 quantization for the majority of model weights (in the DeepSeek-R1 case), FP8 key-value cache compression, and disaggregated prefill serving. These techniques collectively contributed about a 1.5x throughput improvement on top of the hardware gains.
In MLPerf Training v5.1, NVIDIA swept all seven benchmarks with Blackwell and Blackwell Ultra systems; the GB200 NVL72 achieved a record 10-minute training time for Llama 3.1 405B.
System integrators announced GB300 NVL72 products at or shortly after GTC 2025, including Dell (PowerEdge XE9712), HPE, Lenovo, Supermicro, and GIGABYTE.
All major hyperscalers committed to deploying GB300 NVL72 systems:
Amazon Web Services: AWS announced Amazon EC2 P6e-GB300 UltraServers, which became generally available in December 2025 at AWS re:Invent. The P6e-GB300 instances provide 1.5x more GPU memory and 1.5x more FP4 compute compared to the prior P6e-GB200 instances. AWS also deployed HGX B300-based EC2 P6-B300 instances for single-node workloads in November 2025.
Microsoft Azure: Azure deployed the first large-scale GB300 NVL72 cluster in production, with more than 4,600 Blackwell Ultra GPUs in interconnected GB300 NVL72 racks using NVIDIA InfiniBand, running OpenAI workloads. Azure's GB300 NVL72 instances are available as ND GB300 v6 VMs.
Google Cloud: Announced as an early GB300 NVL72 customer at GTC 2025.
Oracle Cloud Infrastructure: Committed to GB300 NVL72 deployment as part of the GTC 2025 announcements.
CoreWeave: The first cloud provider to bring GB300 NVL72 instances into production. CoreWeave made GB300 NVL72-powered instances generally available on August 19, 2025, initially in the US-WEST-01A availability zone. Dell delivered CoreWeave's first production unit on July 3, 2025.
Additional GPU cloud providers committed at GTC 2025 include Lambda, Nebius, Nscale, Crusoe, Yotta, and YTL.
NVIDIA has not published official list prices for the GB300 NVL72. Industry estimates based on ODM quotes and supply chain analysis suggest the full rack system costs approximately $6 million to $6.5 million in AI inference-optimized configurations, though pricing varies by vendor and configuration.
For comparison, the GB200 NVL72 was estimated at approximately $3 million per rack when it launched, suggesting the GB300 NVL72 carries a roughly 2x price premium for 1.5x performance improvement per GPU. The price per token of inference, accounting for the system cost amortized against throughput, is nonetheless favorable because the GB300 NVL72 delivers significantly more tokens per rack per unit of power.
Initial shipments began in July 2025 (Dell to CoreWeave). Volume production ramp was expected in Q3-Q4 2025. AWS P6e-GB300 instances became generally available in December 2025. The broader enterprise market and smaller cloud providers were expected to gain access through 2026.
The HGX B300 NVL16 (an x86-host server board without Grace CPUs that links 16 Blackwell Ultra GPUs over NVLink) is available as a lower-cost entry point to Blackwell Ultra, suitable for workloads that do not require the full rack-scale NVLink domain.
| Specification | NVIDIA B300 SXM | AMD Instinct MI355X |
|---|---|---|
| Architecture | Blackwell Ultra | CDNA 4 |
| Process node | TSMC N4P | TSMC N3P (compute), N6 (IO) |
| Transistors | 208 billion | 185 billion |
| HBM capacity | 288 GB HBM3E | 288 GB HBM3E |
| Memory bandwidth | 8 TB/s | 8 TB/s |
| FP4 dense compute | 15 PFLOPS | 20 PFLOPS |
| FP8 dense compute | 7.5 PFLOPS | 10 PFLOPS |
| FP16/BF16 dense compute | 3.75 PFLOPS | 5 PFLOPS |
| TDP | 1,400 W | 1,400 W |
| NVLink/xGMI | NVLink 5 (1.8 TB/s) | xGMI (no equivalent scale-up fabric) |
| Rack-scale domain | 72 GPUs / 130 TB/s | No equivalent |
| Software ecosystem | CUDA, TensorRT, Dynamo | ROCm, MIOpen |
The AMD MI355X has higher peak FP4 FLOPS on paper (20 vs 15 PFLOPS dense). However, real-world inference benchmarks tell a different story. SemiAnalysis's InferenceX v2 analysis found that NVIDIA's B300 and GB300 NVL72 dominate across most inference scenarios when advanced techniques like disaggregated prefill, wide expert parallelism, and FP4 precision are used together. The MI355X performs well when these optimizations are applied individually but underperforms when combined, due to gaps in the ROCm software stack's kernel and collective optimizations.
In MLPerf Inference v5.1, the MI355X matched or slightly beat the B200 (not B300) on several single-node LLM benchmarks. AMD stated at ISSCC 2026 that the MI355X matches GB200 performance despite lower compute unit count through per-CU throughput improvements.
The GB300 NVL72's primary competitive advantage is not per-GPU FLOPS but rather the 130 TB/s NVLink fabric connecting all 72 GPUs. No AMD product offers an equivalent scale-up domain. For very large model inference (400B+ parameters across a whole rack) and for agentic workloads requiring rapid KV cache sharing, the unified NVLink domain provides practical throughput advantages that peak FLOPS comparisons do not capture.
On cost-per-token for standard FP8 inference at high concurrency, AMD claims competitive cost economics for the MI355X versus the GB300 NVL72 at throughput-optimized operating points.
NVIDIA announced the Vera Rubin platform as the successor to Blackwell Ultra at GTC 2025, with additional details at CES 2026. The Vera Rubin NVL72 uses the R100 GPU (also called the Rubin GPU) and the Vera CPU, and follows the same rack-scale architecture as the GB300 NVL72.
Key announced improvements in Vera Rubin include a move to HBM4 memory on the Rubin GPU, the custom Arm-based Vera CPU, a sixth-generation NVLink fabric, and next-generation ConnectX networking, with NVIDIA projecting a substantial generational increase in FP4 inference performance over the GB300 NVL72.
Jensen Huang confirmed at CES 2026 that the Vera Rubin NVL72 was in production, with delivery expected in the second half of 2026 to the same initial customers (AWS, Google Cloud, Microsoft Azure, OCI, CoreWeave, Lambda, Nebius, Nscale).
NVIDIA has also outlined a longer roadmap including an NVL144 rack (doubling the GPU count per rack) and a "Vera Rubin Ultra" generation, though detailed specifications for these future products had not been disclosed as of early 2026.
The GB300 NVL72 represents a significant step up in data center infrastructure requirements compared to prior GPU generations. Key infrastructure considerations:
Power density: Each rack draws approximately 120 kW at full load (each B300 GPU at 1,400 W, plus Grace CPUs and switching equipment). This is roughly double the power draw of a comparable H100 NVL8 cluster delivering equivalent model throughput.
Liquid cooling infrastructure: 100% of GPU and CPU heat must be removed by liquid. Data centers must provide a chilled water supply loop with sufficient flow rate and heat rejection capacity. Many enterprise data centers built before 2022 lack the piping infrastructure for direct liquid cooling and require capital investment before deploying GB300 NVL72 racks.
Power distribution: A row of 10 GB300 NVL72 racks exceeds 1.2 MW. Power feeds must be redundant and capacity-planned at the row level. Floor loading must accommodate rack weights that include heavy liquid cooling equipment.
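A rough power budget shows how the per-rack and per-row figures arise. The non-GPU allocation below is an assumption used to bridge the GPU total to the ~120 kW rack figure, not a published breakdown.

```python
# Rough GB300 NVL72 power budget. The GPU figure follows from the 1,400 W
# TDP; the remainder (Grace CPUs, NVSwitch trays, NICs, fans, conversion
# losses) is an assumed allocation to reach the ~120 kW rack figure.
gpu_kw = 72 * 1.4                 # 100.8 kW of GPU TDP
other_kw = 120 - gpu_kw           # ~19 kW assumed for everything else
rack_kw = gpu_kw + other_kw

racks_per_row = 10
print(f"GPUs: {gpu_kw:.1f} kW, other: {other_kw:.1f} kW, rack: {rack_kw:.0f} kW")
print(f"Row of {racks_per_row} racks: {racks_per_row * rack_kw / 1000:.1f} MW")
```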
Power smoothing: NVIDIA has incorporated energy storage and power management features in the GB300 NVL72 design to reduce peak demand transients when GPU workloads ramp up simultaneously, a persistent challenge with dense GPU installations.
Facility build time: Analysts at Eliovp estimated a four-month data center construction and infrastructure preparation period as the minimum lead time for a new facility capable of hosting GB300 NVL72 racks, longer than the supply lead time for the hardware itself.