NVIDIA Blackwell Ultra

Updated Jun 27, 2026

NVIDIA Blackwell Ultra is a mid-cycle refresh of NVIDIA's Blackwell data-center GPU architecture, announced at NVIDIA's GTC conference on March 18, 2025, and built for what NVIDIA calls the "age of AI reasoning." It centers on the B300 GPU and the GB300 (Grace Blackwell Ultra) superchip. Each B300 carries 288 GB of HBM3e memory, 50 percent more than the 192 GB of the B200, and about 15 petaFLOPS of dense NVFP4 compute, which NVIDIA describes as roughly 1.5x the dense low-precision performance of the original B200. Delivered chiefly through the rack-scale GB300 NVL72 system (72 Blackwell Ultra GPUs and 36 Grace CPUs) and the HGX B300 server board, it began shipping from partners in the second half of 2025. ^[1]^[2]^[3]

Blackwell Ultra is not a new microarchitecture. It reuses the same dual-die Blackwell silicon, the same fifth-generation NVLink interconnect, and the same NVFP4 numerical format as the base generation, while raising memory capacity, attention-layer throughput, and power budget. In NVIDIA's cadence it occupies the slot between the original 2024 Blackwell launch and the next-generation Vera Rubin platform expected in 2026, mirroring the earlier mid-life update of the prior Hopper generation. NVIDIA chief executive Jensen Huang positioned it directly at reasoning and agentic workloads, saying "AI has made a giant leap, reasoning and agentic AI demand orders of magnitude more computing performance." ^[1]^[2]^[4]

What is Blackwell Ultra?

Blackwell Ultra is the mid-cycle upgrade of NVIDIA's Blackwell GPU line, optimized for large-scale, token-heavy inference rather than a wholesale compute redesign. The core changes over base Blackwell are larger memory (288 GB of HBM3e per GPU) and higher dense low-precision math (about 15 petaFLOPS of NVFP4), both aimed at serving reasoning models that emit long chains of thought at test time. NVIDIA frames the platform's purpose as building "AI factories" that convert electrical power and chips into output tokens at higher throughput and lower latency than the base Blackwell platform. Jensen Huang said at launch, "We designed Blackwell Ultra for this moment, it's a single versatile platform that can easily and efficiently do pretraining, post-training and reasoning AI inference." ^[1]^[2]^[3]

When was Blackwell Ultra announced?

NVIDIA unveiled the Blackwell Ultra AI factory platform on March 18, 2025, during Jensen Huang's keynote at GTC in San Jose. The company framed the launch around inference for reasoning models, which generate far more output tokens per query than conventional chatbots and therefore stress both compute and memory during the generation (decode) phase. NVIDIA tied the platform to "test-time scaling inference, the art of applying more compute during inference to improve accuracy," and claimed it increased Blackwell's revenue opportunity for AI factories by 50x compared with systems built on NVIDIA Hopper. ^[1]^[2]

The platform announced at GTC comprised two principal system designs: the rack-scale GB300 NVL72 and the eight-GPU HGX B300 NVL16 board for conventional servers. NVIDIA said Blackwell Ultra products would be available from partners starting in the second half of 2025, and that the GB300 NVL72 would also be offered through NVIDIA DGX Cloud. ^[1]

How is Blackwell Ultra different from Blackwell (B200)?

The defining changes from base Blackwell are larger memory and higher dense low-precision math, both aimed at inference. NVIDIA's technical materials emphasize three deltas over the B200: HBM3e capacity rising from 192 GB to 288 GB per GPU, dense NVFP4 tensor throughput rising from about 10 to about 15 petaFLOPS (roughly 1.5x), and up to 2x higher throughput for the attention layer's softmax (exponential) operations, a major cost in long-context generation. The added memory lets a single GPU hold larger models and far longer key-value caches, which grow with every generated token and are the main bottleneck for reasoning workloads that emit thousands of tokens. ^[3]^[5]
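To make the memory argument concrete, the following sketch estimates how many tokens of key-value cache fit in HBM once the model weights are loaded. Only the 192 GB and 288 GB capacities come from the text; the model shape (80 layers, 8 KV heads of dimension 128, an FP8 cache, 70 GB of weights) is a hypothetical example chosen for illustration, and the function names are not from any NVIDIA tool.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: float) -> float:
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(hbm_gb: float, weights_gb: float,
                      per_token_bytes: float) -> int:
    """Tokens of KV cache that fit in the HBM left over after the weights."""
    free_bytes = (hbm_gb - weights_gb) * 1e9
    return int(free_bytes // per_token_bytes)


# Hypothetical model shape (an assumption, not a published configuration)
per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128,
                               bytes_per_elem=1)
b200_tokens = max_cached_tokens(192, 70, per_token)  # base Blackwell capacity
b300_tokens = max_cached_tokens(288, 70, per_token)  # Blackwell Ultra capacity

print(per_token, b200_tokens, b300_tokens,
      round(b300_tokens / b200_tokens, 2))
```

Under these assumptions the usable cache grows by more than the 1.5x capacity ratio (about 1.79x here), because the weights are a fixed cost and every added gigabyte goes to the cache.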

Several core attributes are unchanged from the base generation. Blackwell Ultra uses the same two reticle-sized GPU dies fused by NVIDIA's NV-HBI die-to-die link (about 10 TB/s) into a single logical GPU totaling 208 billion transistors, the same fifth-generation NVLink at 1.8 TB/s of bidirectional bandwidth per GPU, and the same NVFP4 4-bit format introduced with Blackwell. The principal trade-off for the higher performance is power: the SXM B300 module is rated up to about 1,400 W, above the roughly 1,000 to 1,200 W of base Blackwell modules. ^[3]^[5]^[6]

Specification (per GPU)	Blackwell (B200)	Blackwell Ultra (B300)
HBM memory	192 GB HBM3e	288 GB HBM3e
Memory bandwidth	8 TB/s	8 TB/s
Dense NVFP4 compute	~10 PFLOPS	~15 PFLOPS
Attention (softmax) throughput	baseline	up to ~2x baseline
NVLink (5th gen)	1.8 TB/s	1.8 TB/s
Transistors (dual die)	208 billion	208 billion
Peak board power	up to ~1,200 W	up to ~1,400 W

Sources: NVIDIA technical blog and product materials; Tom's Hardware. ^[3]^[5]^[6]
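The generational ratios, and the power trade-off they imply, can be checked with a few lines of arithmetic. The sketch below uses the approximate upper-bound figures from the table above; the resulting "TFLOPS per watt" is a ratio of peak ratings, not a measured efficiency, and the dictionary layout and function names are illustrative assumptions.

```python
# Approximate per-GPU figures from the comparison table (vendor peak ratings)
SPECS = {
    "B200": {"hbm_gb": 192, "dense_fp4_pflops": 10, "peak_watts": 1200},
    "B300": {"hbm_gb": 288, "dense_fp4_pflops": 15, "peak_watts": 1400},
}


def ratio(field: str) -> float:
    """B300 value divided by B200 value for one table row."""
    return SPECS["B300"][field] / SPECS["B200"][field]


def peak_tflops_per_watt(gpu: str) -> float:
    """Peak dense FP4 TFLOPS per watt of rated board power (not measured)."""
    spec = SPECS[gpu]
    return spec["dense_fp4_pflops"] * 1000 / spec["peak_watts"]


for field in ("hbm_gb", "dense_fp4_pflops", "peak_watts"):
    print(f"{field}: {ratio(field):.2f}x")
for gpu in SPECS:
    print(f"{gpu}: {peak_tflops_per_watt(gpu):.2f} peak TFLOPS/W")
```

On these numbers memory and dense FP4 both scale by 1.5x while rated power rises by about 1.17x, so peak compute per rated watt improves even though absolute power goes up.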

What is the B300 GPU?

The B300 is the Blackwell Ultra GPU. Like the base B200, it is a single package containing two reticle-limit dies built on a TSMC 4-nanometer-class (4NP) process and joined by the NV-HBI interface so software treats them as one device. The GPU exposes 160 streaming multiprocessors organized into eight processing clusters, with 640 fifth-generation Tensor Cores, and it supports NVIDIA's NVFP4, FP6, and FP8 formats through a second-generation Transformer Engine. ^[3]^[5]

The headline figures for the B300 are 288 GB of HBM3e (eight 12-high stacks) delivering about 8 TB/s of bandwidth, and 15 petaFLOPS of dense NVFP4 tensor compute (about 20 petaFLOPS with sparsity). The doubling of the special-function-unit throughput used for exponentials gives up to 2x faster attention-layer compute than base Blackwell, a targeted change because attention's softmax is the hot path during long-sequence decoding. Each GPU connects to its host Grace CPU over NVLink-C2C at 900 GB/s and to peers over fifth-generation NVLink at 1.8 TB/s. ^[3]^[5]
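A rough roofline-style calculation shows why memory, not peak math, tends to bound long-sequence decoding. The sketch below uses the 15 petaFLOPS, 8 TB/s, and 288 GB figures from the text; the 70 GB weight footprint and the assumption that all weights are read once per generated token at batch size 1 are simplifications introduced here for illustration.

```python
PEAK_DENSE_FP4_FLOPS = 15e15  # ~15 petaFLOPS dense NVFP4 (from the text)
HBM_BANDWIDTH_BPS = 8e12      # ~8 TB/s HBM3e bandwidth (from the text)
HBM_CAPACITY_GB = 288         # total HBM3e per GPU (from the text)
HBM_STACKS = 8                # eight 12-high stacks (from the text)


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """FLOPs per byte needed before compute, not memory, becomes the limit."""
    return peak_flops / bandwidth


def min_decode_ms(bytes_read_per_token: float, bandwidth: float) -> float:
    """Lower bound on per-token latency if those bytes must stream from HBM."""
    return bytes_read_per_token / bandwidth * 1e3


print(HBM_CAPACITY_GB / HBM_STACKS)                          # GB per stack
print(ridge_point(PEAK_DENSE_FP4_FLOPS, HBM_BANDWIDTH_BPS))  # FLOP per byte
# Hypothetical: 70 GB of weights read once per token at batch size 1
print(min_decode_ms(70e9, HBM_BANDWIDTH_BPS))                # milliseconds
```

The peak ratings imply a balance point near 1,875 FLOPs per byte, far above the arithmetic intensity of small-batch decoding, so under this simple model a hypothetical 70 GB model could not generate a token faster than about 8.75 ms regardless of tensor throughput.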

What is the GB300 Grace Blackwell Ultra Superchip?

The GB300 superchip pairs NVIDIA's Arm-based Grace CPU with Blackwell Ultra GPUs on a single coherent module, the Blackwell Ultra equivalent of the earlier GB200. In the configuration used inside the GB300 NVL72, each superchip board combines one Grace CPU with two B300 GPUs, which attach to the CPU over the 900 GB/s NVLink-C2C coherent link so that CPU and GPU share a unified memory space. This tight CPU-GPU coupling, together with the large HBM3e pool, is what NVIDIA leans on for serving very large mixture-of-experts and reasoning models. ^[3]^[5]

What is the GB300 NVL72?

The GB300 NVL72 is the flagship Blackwell Ultra system: a single liquid-cooled rack that links 72 B300 GPUs and 36 Grace CPUs through a fifth-generation NVLink switch fabric so the whole rack behaves as one large accelerator. NVIDIA states the rack delivers about 1.1 exaFLOPS of dense NVFP4 compute (1,440 petaFLOPS with sparsity) and 720 petaFLOPS of FP8 for training, with around 20 TB of HBM3e GPU memory, about 37 TB of total fast memory, and 130 TB/s of aggregate NVLink bandwidth. Networking uses NVIDIA ConnectX-8 SuperNICs providing 800 Gb/s per GPU. NVIDIA reports the GB300 NVL72 delivers "1.5x more AI performance than the NVIDIA GB200 NVL72." ^[1]^[3]^[7]

Specification	GB200 NVL72	GB300 NVL72
Blackwell GPUs	72 (B200)	72 (B300)
Grace CPUs	36	36
NVFP4 compute (with sparsity)	1,440 PFLOPS	1,440 PFLOPS
NVFP4 compute (dense)	~720 PFLOPS	~1,100 PFLOPS
GPU (HBM3e) memory	~13.4 TB	~20 TB
NVLink bandwidth	130 TB/s	130 TB/s
Per-GPU SuperNIC	ConnectX-7/8	ConnectX-8, 800 Gb/s

Sources: NVIDIA GB300 NVL72 product page and technical blog; NVIDIA GB200 NVL72 materials. ^[3]^[4]^[8]
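The rack-level figures follow almost directly from multiplying the per-GPU numbers by 72. The sketch below does that multiplication as a sanity check; the per-GPU values are the approximate ones quoted earlier in the article, the names are illustrative, and the results differ slightly from NVIDIA's published totals because those are rounded.

```python
GPUS_PER_RACK = 72

# Per-GPU figures quoted in the article (approximate vendor peak ratings)
PER_GPU = {
    "dense_fp4_pflops": 15,
    "hbm3e_gb": 288,
    "nvlink_tb_per_s": 1.8,
}


def rack_totals(per_gpu: dict, gpus: int = GPUS_PER_RACK) -> dict:
    """Scale each per-GPU figure linearly by the number of GPUs in the rack."""
    return {name: value * gpus for name, value in per_gpu.items()}


totals = rack_totals(PER_GPU)
print(f"dense FP4: {totals['dense_fp4_pflops'] / 1000:.2f} EFLOPS")
print(f"HBM3e:     {totals['hbm3e_gb'] / 1000:.1f} TB")
print(f"NVLink:    {totals['nvlink_tb_per_s']:.1f} TB/s")
```

The products come to about 1.08 EFLOPS, 20.7 TB, and 129.6 TB/s, in line with the roughly 1.1 EFLOPS, 20 TB, and 130 TB/s quoted for the GB300 NVL72.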

For customers using standard eight-GPU servers rather than full racks, NVIDIA offers the HGX B300 NVL16 baseboard, which carries Blackwell Ultra GPUs in the familiar HGX form factor. NVIDIA characterized the HGX B300 as delivering up to 11x faster large-language-model inference, 7x more compute, and 4x larger memory than the Hopper generation. The same silicon underpins the DGX B300 system, which packs eight Blackwell Ultra GPUs with 2.1 TB of total GPU memory, 144 petaFLOPS of FP4 inference (sparse), and 72 petaFLOPS of FP8 training, paired with Intel Xeon host CPUs and ConnectX-8 networking. Eight GB300 NVL72 racks combine into a Blackwell Ultra DGX SuperPOD of 576 GPUs, 288 Grace CPUs, about 300 TB of fast memory (of which roughly 166 TB is HBM3e), and roughly 11.5 exaFLOPS of FP4 compute with sparsity. ^[1]^[6]^[9]
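The SuperPOD totals can be reproduced by scaling one rack by eight. In the sketch below the per-rack inputs are the figures quoted for the GB300 NVL72 (72 GPUs, 36 CPUs, 288 GB of HBM3e per GPU, about 37 TB of total fast memory, 1,440 petaFLOPS of sparse FP4); the `Rack` class and `superpod` function are illustrative names, not part of any NVIDIA software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rack:
    """Per-rack figures quoted for the GB300 NVL72 (approximate)."""
    gpus: int = 72
    cpus: int = 36
    hbm_gb_per_gpu: int = 288
    fast_memory_tb: float = 37.0       # HBM3e plus CPU-attached memory
    sparse_fp4_pflops: float = 1440.0


def superpod(rack: Rack, racks: int = 8) -> dict:
    """Linear scaling of one rack to a multi-rack SuperPOD."""
    return {
        "gpus": rack.gpus * racks,
        "cpus": rack.cpus * racks,
        "hbm_tb": rack.gpus * racks * rack.hbm_gb_per_gpu / 1000,
        "fast_memory_tb": rack.fast_memory_tb * racks,
        "sparse_fp4_eflops": rack.sparse_fp4_pflops * racks / 1000,
    }


pod = superpod(Rack())
for name, value in pod.items():
    print(f"{name}: {value}")
```

Eight racks give 576 GPUs, 288 CPUs, and about 11.5 exaFLOPS of sparse FP4. The memory total near 300 TB corresponds to eight racks of roughly 37 TB of fast memory; HBM3e alone comes to about 166 TB.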

When did Blackwell Ultra ship, and who adopted it?

NVIDIA guided that Blackwell Ultra systems would ship from partners in the second half of 2025, and the rollout broadly tracked that schedule. The company named a wide hardware ecosystem at launch, including server makers Cisco, Dell, HPE, Lenovo, and Supermicro alongside ASUS, Foxconn, GIGABYTE, Pegatron, QCT, Wistron, and Wiwynn, plus cloud providers AWS, Google Cloud, Microsoft Azure, Oracle Cloud Infrastructure, CoreWeave, Crusoe, Lambda, and Nebius. ^[1]

By the second half of 2025 the platform was appearing in benchmarks and live deployments. In the September 2025 round of MLPerf Inference, NVIDIA reported that the GB300 NVL72 set records across the tested workloads and delivered about a 45 percent increase in DeepSeek-R1 inference throughput over the GB200 NVL72. Cloud operators also began standing up very large GB300 clusters, with Microsoft Azure describing a production GB300 NVL72 deployment networking thousands of Blackwell Ultra GPUs into a single inference fabric. ^[7]^[10]

Why does Blackwell Ultra matter?

Blackwell Ultra marked NVIDIA's pivot to an annual product rhythm in which a mid-cycle "Ultra" refresh extends a microarchitecture before the next full generation arrives. Its specific bias toward memory capacity and attention throughput, rather than a wholesale compute redesign, reflected the rise of reasoning models and inference-time scaling, where serving cost is dominated by long output sequences and large key-value caches rather than by raw matrix-multiply peaks. By raising per-GPU HBM3e to 288 GB and dense FP4 to 15 petaFLOPS while keeping the rack-scale NVLink fabric of base Blackwell, NVIDIA aimed to lower the cost per token of these workloads and to bridge the gap to the Vera Rubin platform, the next-generation architecture slated to use HBM4 memory. ^[2]^[3]^[4]

References

1. NVIDIA Newsroom, "NVIDIA Blackwell Ultra AI Factory Platform Paves Way for Age of AI Reasoning," March 18, 2025. https://nvidianews.nvidia.com/news/nvidia-blackwell-ultra-ai-factory-platform-paves-way-for-age-of-ai-reasoning
2. NVIDIA Investor Relations, "NVIDIA Blackwell Ultra AI Factory Platform Paves Way for Age of AI Reasoning," March 18, 2025. https://investor.nvidia.com/news/press-release-details/2025/NVIDIA-Blackwell-Ultra-AI-Factory-Platform-Paves-Way-for-Age-of-AI-Reasoning/default.aspx
3. NVIDIA Technical Blog, "Inside NVIDIA Blackwell Ultra: The Chip Powering the AI Factory Era." https://developer.nvidia.com/blog/inside-nvidia-blackwell-ultra-the-chip-powering-the-ai-factory-era/
4. NVIDIA, "GB300 NVL72" product page. https://www.nvidia.com/en-us/data-center/gb300-nvl72/
5. TweakTown, "NVIDIA details Blackwell Ultra GB300: dual-die design, 208B transistors, up to 288GB HBM3E." https://www.tweaktown.com/news/107373/nvidia-details-blackwell-ultra-gb300-dual-die-design-208b-transistors-up-to-288gb-hbm3e/index.html
6. Tom's Hardware, "Nvidia announces Blackwell Ultra B300, 1.5X faster than B200 with 288GB HBM3e and 15 PFLOPS dense FP4." https://www.tomshardware.com/pc-components/gpus/nvidia-announces-blackwell-ultra-b300-1-5x-faster-than-b200-with-288gb-hbm3e-and-15-pflops-dense-fp4
7. Tom's Hardware, "Nvidia claims software and hardware upgrades allow Blackwell Ultra GB300 to dominate MLPerf benchmarks," September 10, 2025. https://www.tomshardware.com/pc-components/gpus/nvidia-claims-software-and-hardware-upgrades-allow-blackwell-ultra-gb300-to-dominate-mlperf-benchmarks-touts-45-percent-deepseek-r-1-inference-throughput-increase-over-gb200
8. NVIDIA, "GB200 NVL72" product page. https://www.nvidia.com/en-us/data-center/gb200-nvl72/
9. NVIDIA, "DGX B300" product page. https://www.nvidia.com/en-us/data-center/dgx-b300/
10. Tom's Hardware, "Microsoft deploys world's first 'supercomputer-scale' GB300 NVL72 Azure cluster." https://www.tomshardware.com/tech-industry/artificial-intelligence/microsoft-deploys-worlds-first-supercomputer-scale-gb300-nvl72-azure-cluster-4-608-gb300-gpus-linked-together-to-form-a-single-unified-accelerator-capable-of-1-44-pflops-of-inference
