The NVIDIA DGX B300 is an 8-GPU AI supercomputer node built around NVIDIA's Blackwell Ultra architecture. Announced at GTC 2025 in March and shipping from January 2026, the system pairs eight B300 SXM GPUs with Intel Xeon 6776P processors to deliver 144 petaFLOPS of sparse FP4 inference and 72 petaFLOPS of FP8 training inside a 10U chassis. It replaces the DGX B200 as the entry-level node in NVIDIA's DGX family, sitting below the rack-scale NVIDIA GB300 NVL72 in NVIDIA's Blackwell Ultra lineup.
The DGX B300 is designed for organizations that need a self-contained, standardized AI server rather than a full rack deployment. Its 2.3 TB of HBM3e memory is sufficient to hold models with 400 billion or more parameters entirely in GPU memory, reducing the orchestration overhead required when weights must be distributed across multiple nodes. NVIDIA positions the system primarily for large language model training, extended reasoning inference (where test-time compute chains generate intermediate reasoning tokens before producing a final answer), and agentic workloads.
NVIDIA introduced the DGX brand in April 2016 with the DGX-1, an 8-GPU server using Pascal-generation P100 accelerators. The system delivered 170 teraFLOPS of half-precision compute and was priced at $129,000, with the first unit shipped personally by Jensen Huang to OpenAI. It established the pattern that has defined every subsequent DGX generation: a fixed GPU count (typically 8), a reference CPU configuration, NVLink-based GPU-to-GPU interconnect, and bundled software including NVIDIA's container stack.
The 2018 DGX-2 scaled to 16 Volta V100 GPUs interconnected by NVSwitch, the first generation of NVIDIA's dedicated GPU interconnect chip. It delivered 2 petaFLOPS with 512 GB of shared memory at a $399,000 list price.
The DGX A100, released in May 2020, returned to an 8-GPU configuration using Ampere A100 accelerators. It was the first DGX to ship with AMD EPYC processors rather than Intel Xeon CPUs, a choice driven by memory bandwidth and PCIe lane requirements at the time. Priced at $199,000, it became a widely deployed node across commercial data centers and research supercomputers, including NVIDIA's own Selene system, which was built from DGX A100 nodes.
With the Hopper generation, NVIDIA launched the DGX H100 in 2022 with 8 H100 SXM5 GPUs delivering 32 petaFLOPS of FP8 compute, priced around $482,000. The DGX H200 followed in 2023, using the same Hopper die but swapping the H100's HBM3 for higher-capacity HBM3e (141 GB per GPU versus 80 GB on the H100 SXM5). Both Hopper systems returned to Intel Xeon processors.
The first Blackwell-generation node was the DGX B200, announced at GTC 2024. It used eight B200 SXM GPUs with 192 GB of HBM3e each, NVLink 5 interconnect, and ConnectX-7 networking. It delivered 72 petaFLOPS of FP4 inference per node.
The DGX B300 represents the Blackwell Ultra step-up from the B200. The GPU die is the same dual-reticle design manufactured on TSMC's 4NP process but with higher clock speeds, increased HBM3e capacity per package (288 GB versus 192 GB on the B200), and a higher 1,400W TDP compared to the B200's 1,000W. NVIDIA brands these upgraded chips the "Blackwell Ultra" generation.
The B300 GPU at the center of the DGX B300 is a refinement of the original Blackwell die rather than a new design. Like its predecessor, it consists of two reticle-sized silicon dies bonded using NVIDIA's High-Bandwidth Interface (NV-HBI), a chip-to-chip interconnect providing 10 TB/s of die-to-die bandwidth. Together the two dies contain 208 billion transistors, 2.6 times more than the Hopper H100.
The compute substrate is organized across 160 Streaming Multiprocessors (SMs), each containing 128 CUDA cores and four fifth-generation Tensor Cores with NVIDIA's second-generation Transformer Engine. Each SM also includes 256 KB of dedicated Tensor Memory (TMEM), a local buffer that feeds the Tensor Cores without contending with the shared L2 cache. The chip totals 20,480 CUDA cores.
The Tensor Cores support FP64, FP32, TF32, BF16, FP16, FP8, FP6, and the new NVFP4 precision format. NVFP4 is NVIDIA's 4-bit floating-point format, introduced with Blackwell. It delivers 15 petaFLOPS of dense compute per GPU, roughly 1.67 times the B200's 9 petaFLOPS in the same precision. The format reduces memory footprint by approximately 1.8 times compared to FP8 while maintaining inference accuracy close to FP8 levels for most large language model workloads.
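The footprint arithmetic follows from NVFP4's block-scaled layout: each 16-element block stores 4-bit (E2M1) elements plus a shared 8-bit scale, or 4.5 bits per element versus FP8's 8, which is where the ~1.8x figure comes from. The sketch below illustrates block-scaled 4-bit quantization in simplified form; the scale is kept in full precision here, whereas NVFP4 stores an FP8 (E4M3) per-block scale plus a per-tensor FP32 scale.

```python
import numpy as np

# Magnitudes representable by the FP4 E2M1 element format (sign is a separate bit).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # NVFP4 applies one scale factor per 16-element block

def quantize_block(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one 16-element block to FP4 values plus a per-block scale."""
    scale = float(np.abs(x).max()) / FP4_GRID[-1]  # map block max onto FP4's max (6.0)
    if scale == 0.0:
        scale = 1.0
    scaled = x / scale
    # Snap each element to the nearest representable FP4 magnitude, keeping sign.
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * FP4_GRID[idx], scale

rng = np.random.default_rng(0)
block = rng.normal(size=BLOCK)
q, s = quantize_block(block)
print("max abs reconstruction error:", np.abs(q * s - block).max())

# Storage cost: 16 four-bit elements + one 8-bit scale = 4.5 bits per element,
# versus 8 bits per element for FP8 -> 8 / 4.5 = ~1.78x, i.e. the ~1.8x figure.
print("footprint ratio vs FP8:", 8 / ((BLOCK * 4 + 8) / BLOCK))
```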
Attention computation received specific hardware acceleration in Blackwell Ultra: SFU (Special Function Unit) throughput for exponential operations was doubled compared to the original Blackwell, which NVIDIA states delivers up to 2x faster attention-layer compute for transformer models.
The memory subsystem connects eight 12-Hi HBM3e stacks through sixteen 512-bit controllers, providing 288 GB of capacity per GPU at 8 TB/s of bandwidth. That bandwidth figure equals the B200 despite the larger capacity, because the extra capacity comes from taller 12-Hi stacks running at the same HBM3e interface speed, not from a wider or faster bus. Compared to the H100 SXM5 (3.35 TB/s), the B300 provides approximately 2.4 times more bandwidth.
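As a consistency check on those numbers, the per-pin data rate implied by the stated bus configuration and 8 TB/s aggregate can be computed directly:

```python
controllers = 16
bits_per_controller = 512
bus_width_bits = controllers * bits_per_controller   # 8,192-bit aggregate bus
bandwidth_bytes_per_s = 8e12                         # 8 TB/s stated aggregate

# Implied per-pin data rate = total bits per second / number of bus pins.
pin_rate_gbps = bandwidth_bytes_per_s * 8 / bus_width_bits / 1e9
print(f"{bus_width_bits}-bit bus -> {pin_rate_gbps:.2f} Gb/s per pin")  # ~7.81 Gb/s
```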
Each GPU connects to the DGX B300 system fabric via fifth-generation NVLink (NVLink 5), providing 1.8 TB/s of bidirectional bandwidth per GPU.
The DGX B300 places eight B300 SXM GPUs on an HGX-compatible baseboard in a 10U chassis. The physical dimensions are 442 mm tall x 482.6 mm wide x 904.2 mm deep, and the system weighs approximately 370 lbs (168 kg) in the AC PSU configuration.
The host processors are two Intel Xeon Platinum 6776P CPUs, continuing the Intel Xeon choice NVIDIA returned to with the Hopper generation. The 6776P is a 64-core chip based on the Granite Rapids architecture, clocked at 2.3 GHz base with an all-core turbo of 3.6 GHz and a maximum single-core boost of 4.6 GHz via Intel's Priority Core Turbo feature. Each socket has a 350W TDP and supports eight channels of DDR5-6400 memory. The system supports 2 TB of DDR5 system memory in the standard configuration, expandable to 4 TB using 128 GB DIMMs.
This CPU choice reflects NVIDIA's focus on PCIe bandwidth and NVLink host management rather than compute: the Xeon 6776P provides 88 PCIe 5.0 lanes per socket, sufficient to connect the DGX B300's full complement of ConnectX-8 NICs and storage controllers without a bandwidth bottleneck on the CPU side.
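As a rough illustration of that lane budget, the sketch below tallies assumed x16 Gen5 allocations against the two sockets' combined lane count. The allocations are illustrative assumptions, not NVIDIA's documented topology; in particular, the ConnectX-8's integrated PCIe switch lets each GPU and NIC pair share an upstream link.

```python
lanes_total = 2 * 88  # two Xeon 6776P sockets x 88 PCIe 5.0 lanes = 176

# Assumed allocations, for illustration only.
consumers = {
    "ConnectX-8 upstream links (8 x x16)": 8 * 16,
    "BlueField-3 DPUs (2 x x16)": 2 * 16,
    "NVMe storage and management": 16,
}
used = sum(consumers.values())
print(f"{used} of {lanes_total} lanes allocated ({lanes_total - used} spare)")
```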
Local storage consists of two 1.92 TB NVMe M.2 drives for the operating system and eight 3.84 TB E1.S NVMe SSDs (totaling 30.7 TB) for cache and local dataset staging. All drives are self-encrypting.
The system ships with DGX OS 7, based on Ubuntu 24.04 LTS. It also supports Red Hat Enterprise Linux 8 and 9 and Rocky Linux for organizations with enterprise Linux requirements. NVIDIA AI Enterprise and NVIDIA Mission Control (built on Run:ai technology) are included for workload management and cluster operations.
With 288 GB of HBM3e per GPU across eight GPUs, the DGX B300 provides 2.3 TB of total GPU memory. This is large enough to hold a 400-billion-parameter model at FP8 precision without any model parallelism (a 400B FP8 model requires approximately 400 GB), or a 200-billion-parameter model at FP16 precision (approximately 400 GB). For reasoning-focused deployments running models with extended context windows, the memory headroom reduces the latency and coordination overhead associated with tensor parallelism across nodes.
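These sizing figures can be reproduced by multiplying parameter count by bytes per parameter for each precision. The helper below is a minimal sketch covering weights only; KV cache, activations, and optimizer state add significant overhead in practice.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "fp4": 0.5}
DGX_B300_GB = 8 * 288  # 2,304 GB of pooled HBM3e

def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Memory needed for the model weights alone, in GB (1e9 params x bytes/param)."""
    return params_billions * BYTES_PER_PARAM[precision]

for label, params_b, prec in [("400B @ FP8", 400, "fp8"),
                              ("200B @ FP16", 200, "fp16"),
                              ("671B MoE @ FP4", 671, "fp4")]:
    need = weight_footprint_gb(params_b, prec)
    print(f"{label}: {need:.0f} GB of weights, fits in pool: {need < DGX_B300_GB}")
```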
For context, the DGX H200 offered 141 GB per GPU for 1.1 TB total, the DGX B200 offered 192 GB per GPU for 1.5 TB total, and the DGX B300 represents a further 53% increase over the B200 in total GPU memory capacity.
The 14.4 TB/s of aggregate NVLink bandwidth within the node means GPU-to-GPU transfers run at full link speed regardless of whether the workload uses tensor parallelism or pipeline parallelism. The 14.4 TB/s aggregate corresponds to 1.8 TB/s per GPU across eight GPUs, and the two NVSwitch chips in the DGX B300 provide a non-blocking crossbar between all eight GPUs.
Fifth-generation NVLink is central to how the DGX B300 functions as a tightly coupled 8-GPU system rather than 8 independent cards attached to a PCIe bus. Each B300 GPU has NVLink 5 ports providing 1.8 TB/s bidirectional bandwidth per GPU, compared to 900 GB/s per GPU for NVLink 4 in the Hopper H100.
The DGX B300 includes two NVSwitch chips that connect all eight GPUs in a non-blocking fabric. The aggregate GPU-to-GPU bandwidth across the two switches is 14.4 TB/s. NVIDIA describes the resulting 8-GPU pool as effectively a single logical GPU with 2.3 TB of memory for workloads using NVIDIA's NVLink programming model.
NVLink 5 also supports direct load/store access between GPU memory spaces, enabling frameworks like PyTorch to move tensors between GPUs without routing through CPU DRAM. This matters for inference workloads that pipeline prefill and decode phases across multiple GPUs or for training where gradient updates must be all-reduced after each backward pass.
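To illustrate why the per-GPU link rate matters for gradient synchronization, a ring all-reduce over N GPUs moves roughly 2(N-1)/N bytes per byte of payload per GPU. The estimate below is a bandwidth-only lower bound (ignoring latency and software overhead) and assumes the 1.8 TB/s bidirectional figure splits into 900 GB/s per direction.

```python
def ring_allreduce_seconds(payload_gb: float, n_gpus: int = 8,
                           link_gbps_per_dir: float = 900.0) -> float:
    """Bandwidth-only lower bound for a ring all-reduce across the NVLink fabric.

    NVLink 5's 1.8 TB/s per GPU is bidirectional, i.e. ~900 GB/s in each
    direction, which is what a ring pattern can use (send and receive overlap).
    """
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * payload_gb
    return traffic_gb / link_gbps_per_dir

# Example: syncing BF16 gradients for a 70B-parameter model (~140 GB of data).
print(f"{ring_allreduce_seconds(140.0) * 1e3:.0f} ms per all-reduce")  # ~272 ms
```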
External connectivity uses eight single-port NVIDIA ConnectX-8 SuperNICs, each connected via an OSFP port, providing 8 x 800 Gb/s of XDR InfiniBand or Ethernet per system. ConnectX-8 is a significant upgrade from the ConnectX-7 used in the DGX B200: it supports 800 Gb/s per port rather than 400 Gb/s, doubling the external bandwidth per node, and integrates PCIe switching functionality directly on the NIC ASIC rather than requiring a separate PCIe retimer or switch chip. This consolidation allowed NVIDIA to place the ConnectX-8 cards on the HGX baseboard itself rather than in PCIe slots, reducing latency and simplifying the chassis layout.
For storage networking and out-of-band management, the system includes two dual-port NVIDIA BlueField-3 DPUs, each port rated at 400 Gb/s, for a total of 4 x 400 Gb/s storage fabric bandwidth. The BlueField-3 DPUs also provide hardware-accelerated network security (encryption, firewall), storage offload (NVMe-oF), and isolation between the compute and storage fabrics. Additional management connectivity comes via a 1 GbE onboard NIC with RJ45 and a 1 GbE baseboard management controller (BMC).
For large cluster deployments, the DGX B300's external connectivity is designed for NVIDIA Quantum-X800 InfiniBand switches, which support 800 Gb/s XDR InfiniBand. In the DGX SuperPOD reference architecture with DGX B300 nodes, compute traffic runs over InfiniBand XDR while storage traffic uses a separate NDR 400 Gb/s InfiniBand fabric to prevent interference between workloads.
The DGX B300 has a maximum power consumption of approximately 14.5 kW in the AC power supply configuration and a similar figure for the DC busbar variant. The peak transient power can reach approximately 19 to 19.7 kW depending on configuration.
Power delivery uses twelve 3.2 kW AC power supplies (200-240 VAC input) in an N+N redundant configuration. A minimum of six operational PSUs is required to boot and run the system; the remaining six provide the redundant half of the N+N configuration. A DC busbar variant is available for data centers using 54 VDC distribution at up to 300 A.
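The redundancy arithmetic can be checked against the stated figures: six supplies form the active (N) set, and their combined capacity must cover the system's maximum draw.

```python
psu_kw = 3.2
total_psus = 12
n_active = total_psus // 2         # N+N: six active supplies, six redundant

n_capacity_kw = n_active * psu_kw  # 19.2 kW available from the N set alone
max_draw_kw = 14.5                 # approximate maximum system power

print(f"N-set capacity: {n_capacity_kw:.1f} kW, "
      f"headroom over max draw: {n_capacity_kw - max_draw_kw:.1f} kW")
```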
At 1,400W per GPU (versus 1,000W for the B200 and 700W for the H100 SXM5), direct liquid cooling is mandatory. The DGX B300 uses a direct liquid cooling (DLC) loop that circulates coolant to cold plates on each GPU module, the CPUs, and the NVSwitch chips. Air cooling handles remaining components. NVIDIA specifies that the DLC loop requires facility-side coolant at an appropriate supply temperature; exact specifications are in NVIDIA's data center best practices guide for the DGX B300.
The thermal density of the DGX B300 means a fully populated rack of four systems draws approximately 56 kW, well beyond the rack densities that most existing data center power and cooling infrastructure was built for. Organizations planning DGX B300 deployments typically need to verify both their power distribution capacity (requiring a dedicated rack PDU branch at or above the system's peak draw) and their cooling loop capacity before installation.
For training workloads measured in FP8 precision with sparsity, the DGX B300 delivers 72 petaFLOPS. This compares to 32 petaFLOPS for the DGX H100 and matches the DGX B200's 72 petaFLOPS. The gain over Hopper at FP8 is roughly 2.25 times per system. NVIDIA states the DGX B300 provides a 4x speedup for training compared to Hopper-generation systems when accounting for memory capacity and bandwidth improvements that reduce data staging overhead.
For the FP16/BF16 precision commonly used in mixed-precision training, each GPU delivers 3,500 TFLOPS dense. In aggregate across 8 GPUs, the system provides 28 petaFLOPS of dense BF16 compute, compared to approximately 7.9 petaFLOPS for the DGX H100.
Inference is where Blackwell Ultra shows the largest gains. The DGX B300 delivers 144 petaFLOPS of sparse FP4 inference, twice the 72 petaFLOPS the DGX B200 provides in the same precision. Compared to the DGX H100 (which lacks native FP4 support), NVIDIA states the DGX B300 provides 11x faster AI inference on large language models.
For token generation benchmarks on Llama 70B-class models, B300-based systems achieve on the order of 100,000 tokens per second per GPU at FP8 precision, compared to approximately 21,800 tokens per second for the H100 SXM5, according to benchmarks published by cloud providers in early 2026. At NVFP4 precision, throughput is roughly 1.5 times higher still.
The 2x increase in hardware attention throughput (via the doubled SFU capacity for exponential operations) is particularly relevant for reasoning models that use extended chain-of-thought generation, where each inference step involves many attention operations across a long accumulated context.
NVIDIA specifically positions the DGX B300 for "reasoning AI" workloads, meaning models that generate intermediate "thinking" tokens during inference before producing a final output. Examples include OpenAI's o-series, DeepSeek-R1, and similar models using test-time compute scaling. These workloads are computationally expensive because they produce far more tokens per user query than standard next-token prediction, and they require low memory access latency to keep each step fast. The DGX B300's combination of 2.3 TB of high-bandwidth GPU memory and doubled attention throughput addresses both constraints.
NVIDIA's open-source Dynamo inference framework (announced alongside Blackwell Ultra) is designed to maximize DGX B300 utilization for these workloads by disaggregating the prefill phase (processing the input prompt) from the decode phase (generating output tokens) and scheduling them on separate GPU partitions.
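The sketch below illustrates the disaggregation idea only; it is not the Dynamo API, and all names in it are hypothetical. Prefill workers consume prompts and hand off KV-cache state to separate decode workers through a queue, mirroring how the two phases can be scheduled on separate GPU partitions.

```python
import queue
import threading
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    kv_state: object = None                    # stand-in for a KV-cache handle
    tokens: list = field(default_factory=list)

prefill_q: queue.Queue = queue.Queue()
decode_q: queue.Queue = queue.Queue()

def prefill_worker():
    # Prefill is compute-bound: process the full prompt, produce KV state.
    while (req := prefill_q.get()) is not None:
        req.kv_state = f"kv({req.prompt})"     # placeholder for the prompt pass
        decode_q.put(req)                      # hand off to the decode pool
    decode_q.put(None)                         # propagate shutdown

def decode_worker():
    # Decode is memory-bandwidth-bound: stream tokens autoregressively.
    while (req := decode_q.get()) is not None:
        for step in range(3):                  # placeholder generation loop
            req.tokens.append(f"tok{step}")
        print(req.prompt, "->", req.tokens)

threads = [threading.Thread(target=prefill_worker),
           threading.Thread(target=decode_worker)]
for t in threads:
    t.start()
prefill_q.put(Request("What is 2+2?"))
prefill_q.put(None)
for t in threads:
    t.join()
```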
The table below compares the DGX B300 with its immediate predecessors and the rack-scale GB300 system:
| System | GPUs | GPU architecture | GPU memory | FP8 training | FP4 inference | Power | Price (approx.) |
|---|---|---|---|---|---|---|---|
| DGX H100 | 8x H100 SXM5 | Hopper | 640 GB HBM3 | 32 PFLOPS | N/A | ~7 kW | ~$482K |
| DGX H200 | 8x H200 SXM | Hopper | 1.1 TB HBM3e | ~32 PFLOPS | N/A | ~7 kW | ~$400-500K |
| DGX B200 | 8x B200 SXM | Blackwell | 1.5 TB HBM3e | 72 PFLOPS | 72 PFLOPS | ~10 kW | ~$300K+ |
| DGX B300 | 8x B300 SXM | Blackwell Ultra | 2.3 TB HBM3e | 72 PFLOPS | 144 PFLOPS | ~14.5 kW | ~$300-350K |
| GB300 NVL72 | 72x B300 + 36 Grace CPUs | Blackwell Ultra | 21 TB HBM3e | ~648 PFLOPS | ~1,300 PFLOPS | ~132 kW | ~$3M+ |
Compared to the DGX B200 at similar pricing, the DGX B300 provides 53% more GPU memory, twice the FP4 inference throughput, and doubled hardware attention throughput. The tradeoff is a 45% increase in power consumption (from roughly 10 kW to 14.5 kW), which affects both energy costs and cooling infrastructure requirements.
Compared to the DGX H200, the DGX B300 offers approximately 2 times the GPU memory, roughly 1.7 times the memory bandwidth (8 TB/s versus 4.8 TB/s per GPU), native FP4 support (which the H200 lacks entirely), and significantly higher raw compute. The DGX H200 is still relevant for FP64 scientific computing workloads where precision matters more than throughput, since the Hopper H200 has higher FP64 double-precision throughput per GPU than the B300.
The GB300 NVL72 is the rack-scale alternative, integrating 72 Blackwell Ultra GPUs with 36 NVIDIA Grace Arm CPUs into a single liquid-cooled rack. It provides roughly 9 times the GPU count and memory of a single DGX B300 node, but at approximately 10 times the cost and 9 times the power. Critically, the GB300 NVL72 uses NVIDIA Grace CPUs rather than Intel Xeon, which affects software compatibility for CPU-intensive preprocessing pipelines that rely on x86 libraries. For organizations with existing x86 software stacks who want to scale beyond a single node, clusters of DGX B300 systems (each with standard Xeon CPUs) are often preferred over GB300 NVL72 racks.
The DGX B300 is NVIDIA's fully integrated, branded version of the HGX B300 NVL16 platform (the "NVL16" designation counts the 16 compute dies across the eight dual-die B300 packages). The HGX B300 NVL16 is a reference baseboard design that NVIDIA licenses to OEM server manufacturers including Dell, HPE, Lenovo, Supermicro, and GIGABYTE. OEM HGX B300 servers share the same GPU modules and NVSwitch interconnect as the DGX B300 but allow the manufacturer to choose a different CPU, chassis form factor, and cooling implementation.
A notable architectural difference in the HGX B300 NVL16 versus earlier HGX generations is that the eight ConnectX-8 NICs are now integrated directly onto the HGX baseboard itself rather than occupying PCIe expansion slots. This means OEM servers must accommodate eight external network connections on the removable GPU tray, which required significant chassis redesign by server manufacturers. Supermicro began shipping HGX B300 systems using its DLC-2 direct liquid cooling technology in late 2025.
For cloud providers and hyperscalers that want B300 GPU capability without NVIDIA's specific DGX packaging, the HGX B300 route offers more flexibility and typically lower per-unit cost. AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure all announced plans to offer B300-based instances.
For organizations needing more compute than a single DGX B300 node, NVIDIA's DGX SuperPOD architecture scales clusters of DGX B300 nodes with a standardized networking and storage reference design. The basic scaling unit is one "Scalable Unit" (SU) containing 72 DGX B300 systems (576 GPUs), connected via NVIDIA Quantum-X800 800 Gb/s InfiniBand switches in a rail-optimized fat-tree topology.
A rack holds up to four DGX B300 systems at roughly 56 kW per rack, so a single SU spans approximately 18 racks. The standard reference architecture documents an 8-SU configuration with 576 DGX B300 nodes and 4,608 GPUs, but NVIDIA's documentation states that DGX SuperPOD can scale to 72 SUs with more than 2,000 DGX B300 nodes.
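The rack and power arithmetic for a Scalable Unit follows directly from these figures; the short calculation below uses only the numbers stated above.

```python
import math

nodes_per_su = 72        # DGX B300 systems per Scalable Unit
nodes_per_rack = 4       # systems per rack
rack_kw = 56.0           # approximate power per fully populated rack

racks_per_su = math.ceil(nodes_per_su / nodes_per_rack)  # 18 racks
gpus_per_su = nodes_per_su * 8                           # 576 GPUs
su_power_mw = racks_per_su * rack_kw / 1000              # ~1.0 MW per SU

print(f"1 SU: {racks_per_su} racks, {gpus_per_su} GPUs, ~{su_power_mw:.2f} MW")
# The 8-SU reference design scales this to 144 racks, 4,608 GPUs, and ~8 MW.
```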
Networking separates the compute fabric (Quantum-X800 InfiniBand at 800 Gb/s XDR) from the storage fabric (InfiniBand NDR at 400 Gb/s) to prevent storage I/O from interfering with GPU-to-GPU communication during training.
NVIDIA Mission Control manages DGX SuperPOD clusters, providing job scheduling (via Run:ai), health monitoring, automated failure handling, and telemetry aggregated at the cluster level. The software stack supports both SLURM-based HPC workloads and Kubernetes-based cloud-native AI pipelines.
NVIDIA does not publish an official list price for the DGX B300. Based on quotes from resellers and system integrators in early 2026, the system price is in the range of $300,000 to $350,000, which implies a per-GPU cost of approximately $37,500 to $43,750. This is modestly lower than the DGX H100 at launch ($482,000) despite representing two generations of performance improvements, partly because the broader market supply of Blackwell-generation components and competition among resellers have compressed margins.
Cloud rental pricing for B300 GPUs as of April 2026 runs from approximately $2.45 per GPU per hour on spot market platforms to $6.80 per GPU per hour for dedicated reserved capacity, according to aggregated pricing data from GPU cloud marketplaces. The substantial difference between spot and dedicated pricing reflects high demand for B300 capacity relative to supply.
For context, DGX B200 cloud pricing settled around $4.50 to $6.50 per GPU per hour during the same period, meaning the B300 commands a modest premium at the dedicated tier for roughly double the FP4 throughput and 53% more memory per GPU.
The most prominent DGX B300 deployment announced publicly is Eli Lilly's LillyPod, described as the world's largest AI factory wholly owned by a pharmaceutical company. The system was unveiled at an NVIDIA event in Washington, D.C., in November 2025 and became operational in early 2026. LillyPod uses a DGX SuperPOD configuration with 1,016 NVIDIA Blackwell Ultra GPUs, delivering more than 9,000 petaFLOPS of AI performance. Lilly uses the system for biomedical foundation model training, genome sequence analysis, predictive patient outcome modeling, antibody and molecule generation via the NVIDIA BioNeMo platform, manufacturing digital twins in NVIDIA Omniverse, and AI-assisted clinical trial documentation workflows.
AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure all announced support for Blackwell Ultra-based instances, which include both HGX B300 and GB300 NVL72 configurations depending on the workload tier. Specific DGX B300 instance types are available through NVIDIA's own cloud partnerships and through resellers like Lambda Labs, CoreWeave, and Crusoe Energy.
EPRI (the Electric Power Research Institute) is using DGX B300 systems for weather forecasting models supporting electrical grid reliability. Microsoft Research has tested B300 systems for large-scale language model research. Equinix announced plans to offer DGX B300 systems as part of NVIDIA's Instant AI Factory managed service, providing preconfigured AI infrastructure available in 45 global markets.
The DGX B300's combination of large GPU memory, high bandwidth, and strong inference throughput makes it well suited for several categories of workloads.
For large language model training at scale, the 2.3 TB of NVLink-connected GPU memory allows model parallelism strategies that keep large weight tensors resident across the 8-GPU pool without inter-node communication. The 72 petaFLOPS of FP8 compute sustains high throughput for forward and backward passes on transformer models.
For reasoning inference, the doubled attention throughput and NVFP4 precision support allow single DGX B300 nodes to serve high-concurrency inference for models like DeepSeek-R1 (671B parameters with MoE) without requiring multi-node inference serving. At FP4 precision, a 671B MoE model with sparse activation can fit within the 2.3 TB GPU memory pool.
For fine-tuning and post-training (RLHF, DPO, instruction tuning) of models up to 70B parameters, a single DGX B300 can run the entire fine-tuning pipeline on-premises without cloud dependency, which appeals to organizations with regulatory constraints on data residency.
For multimodal workloads combining vision and language, the memory capacity supports large vision-language models with high-resolution image inputs that would overflow smaller GPU memory pools.
Biomedical and scientific applications benefit from the combination of high FP16/BF16 throughput for deep learning and the large memory capacity for genomics and molecular simulation workloads that require keeping large sparse matrices resident on GPU.
The 14.5 kW power consumption requires direct liquid cooling infrastructure, which many existing data centers do not have deployed at the rack level. Organizations moving from air-cooled GPU servers must install facility-side cooling loops before deploying the DGX B300, an upgrade that can take months and add substantial cost beyond the system price.
The 800 Gb/s external networking requires ConnectX-8-compatible switches and cables. Data centers running 400 Gb/s InfiniBand or Ethernet infrastructure for DGX H100 or H200 clusters must upgrade their top-of-rack switching to access full bandwidth, because each of the DGX B300's eight 800 Gb/s ports would otherwise be limited to 400 Gb/s, halving the node's external bandwidth.
The DGX B300's 2.3 TB of GPU memory is large by single-node standards but falls well short of multi-trillion-parameter model requirements. Workloads involving models above roughly 1 trillion parameters at FP8 still require multi-node deployment once KV cache and activation memory are accounted for. The GB300 NVL72, with its 21 TB of unified HBM3e, addresses a different scale tier.
At the software level, full utilization of NVFP4 precision requires TensorRT-LLM 0.15 or later and CUDA 12.x, meaning legacy inference stacks need updating before the DGX B300 can reach its peak FP4 throughput.