NVIDIA Blackwell is a GPU architecture and data center platform family by NVIDIA introduced on March 18, 2024 at the company's GPU Technology Conference (GTC). It succeeds NVIDIA Hopper for datacenter and Ada Lovelace for consumer graphics, and underpins products such as the B200 Tensor Core GPU, the GB200 Grace Blackwell Superchip (which couples two B200 GPUs with a Grace CPU), rack-scale systems like the GB200 NVL72, and the consumer GeForce RTX 50 series. Datacenter Blackwell GPUs comprise two reticle-limited dies linked by a 10 TB/s chip-to-chip interconnect into a single logical GPU and pack 208 billion transistors; they are fabricated on TSMC's custom 4NP process, while consumer products use the standard 4N process.[1]
With its dual-die design, second-generation Transformer Engine supporting FP4 precision, and fifth-generation NVLink interconnect, Blackwell represents a generational leap in AI compute density. NVIDIA has described Blackwell as the engine behind "AI factories," positioning the architecture as the foundation for training and deploying trillion-parameter large language models.
Details of the architecture first leaked in 2022, and the B40 and B100 accelerators were confirmed in October 2023 via an official NVIDIA roadmap shown during an investors' presentation, in which "Hopper-Next" was replaced with "Blackwell".[2] NVIDIA CEO Jensen Huang officially announced Blackwell at GTC 2024, stating that the company had invested approximately $10 billion in research and development for the NV-HBI die interconnect technology.[2]
Blackwell is named after David Harold Blackwell (April 24, 1919 – July 8, 2010), an American statistician and mathematician whose work profoundly influenced the fields that underpin modern artificial intelligence. Born in Centralia, Illinois, Blackwell showed exceptional mathematical talent from a young age. His passion for mathematics began during a high school geometry course, and he graduated from high school in 1935 at the age of sixteen.[3][4]
Blackwell earned his Bachelor of Arts (1938), Master of Arts (1939), and Ph.D. (1941) from the University of Illinois at Urbana-Champaign, completing his doctoral dissertation on Markov chains under the supervision of Joseph L. Doob. At the time, he was only the seventh African American to earn a Ph.D. in mathematics in the United States. He joined the faculty of Howard University in 1944, where he served as Head of the Mathematics Department from 1947 to 1954.[3][4]
In 1954, Blackwell was recruited to the University of California, Berkeley, where he became the institution's first African American tenured professor. He also served as chairman of Berkeley's statistics department from 1957 to 1961. In 1965, Blackwell became the first African American scholar elected to the U.S. National Academy of Sciences.[3][4]
Blackwell made foundational contributions to game theory, probability theory, information theory, and Bayesian statistics. He is known for the Rao-Blackwell theorem, a fundamental result in mathematical statistics that provides a method for improving estimators. He independently developed key ideas in dynamic programming, which finds applications today in finance, genomic analysis, and reinforcement learning algorithms. His 1954 book Theory of Games and Statistical Decisions, co-authored with Abraham Girshick, became a landmark text. During his time as a consultant at the RAND Corporation (1948 to 1950), Blackwell applied game theory to military strategy by modeling the optimal timing of theoretical armed duels.[3][4]
Blackwell was posthumously awarded the National Medal of Science in 2012. Numerous mathematical concepts and awards bear his name, including Blackwell games, Blackwell determinacy, and the MAA-NAM David Blackwell Lecture. His contributions to game theory, probability theory, and statistics have influenced the machine learning and deep learning algorithms that form the backbone of today's AI systems.
All Blackwell-architecture datacenter GPUs consist of two reticle-limited dies (codenamed GB100, each with 104 billion transistors for a total of 208 billion) connected internally by NVIDIA's NV-High Bandwidth Interface (NV-HBI). The two dies are presented as one GPU to software with full cache coherency.[1][2] This approach was necessary because the transistor count required for Blackwell exceeded the maximum size of a single reticle on TSMC's lithography equipment. The NV-HBI interconnect operates at 10 TB/s of bidirectional bandwidth, allowing the two dies to communicate with latency low enough that workloads see a single unified GPU.
The datacenter B200 GPU exposes 148 streaming multiprocessors (SMs) across both dies (74 per die out of 80 physical SMs, with some disabled for yield), providing approximately 18,432 CUDA cores and 592 fifth-generation Tensor Cores. Each SM offers 228 KB of shared memory and can track up to 64 concurrent warps to maximize utilization.[5][6]
The B200 is fabricated on TSMC's custom 4NP process node, an optimized variant of the 4 nm class process developed in collaboration with NVIDIA specifically for high-performance datacenter chips. Consumer Blackwell products (GeForce RTX 50 series) use the standard TSMC 4N process and feature monolithic (single-die) designs rather than the dual-die approach of datacenter parts.[2]
The second-generation Transformer Engine is one of Blackwell's most significant innovations. It adds micro-tensor scaling and community-defined microscaling formats (MXFP4, MXFP6), enabling FP4 inference and larger effective model sizes while maintaining accuracy. By operating at 4-bit floating-point precision (FP4), Blackwell can effectively double the model size that fits in GPU memory compared to 8-bit formats, while also doubling throughput for inference workloads.[1][3]
The Transformer Engine dynamically adjusts precision during computation, using higher precision where needed for accuracy and lower precision where performance gains can be achieved without sacrificing output quality. This approach is particularly effective for large language model inference, where many layers and operations can tolerate reduced precision without measurable degradation in response quality.
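The mechanics of micro-tensor scaling can be sketched in a few lines of code. The following Python/NumPy snippet fake-quantizes a tensor to an FP4-like representation with one shared power-of-two scale per 32-element block, in the spirit of the MXFP4 format; the block size, the E2M1 value grid, and the rounding policy are illustrative assumptions, not NVIDIA's exact hardware implementation.

```python
import numpy as np

# Representable magnitudes of an FP4 E2M1 value (1 sign, 2 exponent, 1 mantissa bit),
# the element format used by MXFP4.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def quantize_mxfp4(x: np.ndarray, block_size: int = 32) -> np.ndarray:
    """Fake-quantize x with one shared scale per block (micro-tensor scaling)."""
    blocks = x.reshape(-1, block_size)
    # One scale per block so the block's largest magnitude maps near FP4's max (6.0).
    scale = np.abs(blocks).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)        # guard all-zero blocks
    scale = 2.0 ** np.ceil(np.log2(scale))          # MX scales are powers of two (E8M0)
    # Snap each scaled element to the nearest representable FP4 magnitude.
    codes = np.abs(np.abs(blocks / scale)[:, :, None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(blocks) * FP4_GRID[codes] * scale).reshape(x.shape)

x = np.random.randn(4096).astype(np.float32)
xq = quantize_mxfp4(x)
print(f"mean absolute quantization error: {np.abs(x - xq).mean():.4f}")
```

A real MXFP4 tensor stores the 4-bit codes plus one 8-bit exponent scale per block; this sketch returns dequantized values only so the rounding error is visible.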
Blackwell's Tensor Cores support a wide range of precision formats:
| Precision Format | Use Case | Blackwell Support |
|---|---|---|
| FP4 | Inference, lightweight training | New in Blackwell |
| FP6 | Inference with higher accuracy | New in Blackwell |
| FP8 | Training and inference | Enhanced from Hopper |
| FP16 / BF16 | Standard mixed-precision training | Yes |
| TF32 | Drop-in replacement for FP32 training | Yes |
| FP32 | High-precision compute | Yes |
| FP64 | Scientific/HPC workloads | Yes |
| INT8 | Quantized inference | Yes |
Blackwell introduces fifth-generation NVLink, delivering up to 1.8 TB/s bidirectional throughput per GPU through 18 NVLink5 links running at 100 GB/s each. This represents more than 14 times the bandwidth of PCIe Gen5 and double the per-GPU NVLink bandwidth of the Hopper H100.[1][3]
The NVLink Switch Chip is a dedicated ASIC that enables all-to-all GPU communication within a rack-scale domain. In the GB200 NVL72 configuration, nine NVLink switch trays (18 NVSwitch chips) route traffic between all 72 GPUs, creating a 130 TB/s aggregate bandwidth domain with approximately 300 nanosecond switch latency. Each switch tray provides 144 NVLink ports, 1,296 in total, fully connecting the 18 NVLink ports on every one of the 72 Blackwell GPUs.[1][7]
For even larger deployments, the fifth-generation NVLink architecture can scale up to 576 GPUs in a single NVLink domain with over 1 PB/s total bandwidth, enabling efficient training of trillion-parameter and multi-trillion-parameter AI models without requiring slower fabric interconnects for GPU-to-GPU communication. The system also supports SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) with FP8, providing 4x bandwidth efficiency for collective operations.[7]
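A back-of-the-envelope calculation illustrates what this bandwidth means for collective operations. The Python sketch below estimates the ideal time for a ring all-reduce of trillion-parameter-scale BF16 gradients across a 72-GPU NVLink domain; treating the 1.8 TB/s bidirectional figure as 0.9 TB/s per direction and the gradient size are assumptions, and latency, protocol overhead, and SHARP in-network reduction (which cuts traffic further) are ignored.

```python
def ring_allreduce_seconds(buffer_bytes: float, n_gpus: int, per_dir_bw: float) -> float:
    """Ideal ring all-reduce: each GPU sends and receives ~2*(N-1)/N of the buffer."""
    return 2 * (n_gpus - 1) / n_gpus * buffer_bytes / per_dir_bw

grad_bytes = 1e12 * 2   # 1T parameters, BF16 gradients (2 bytes each): an assumption
per_dir_bw = 0.9e12     # 1.8 TB/s bidirectional NVLink ~= 0.9 TB/s per direction
t = ring_allreduce_seconds(grad_bytes, n_gpus=72, per_dir_bw=per_dir_bw)
print(f"ideal 1T-parameter gradient all-reduce on NVL72: {t:.2f} s")
```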
Fifth-Generation Tensor Cores. Support for precision formats including the new FP4 and FP6 alongside FP8, FP16, BF16, TF32, FP32, and FP64 operations for enhanced AI and HPC workloads. The new Tensor Cores deliver approximately 2x the compute throughput of Hopper's fourth-generation Tensor Cores at equivalent precisions, and the addition of FP4 enables up to 5x the effective inference throughput.[5]
AI Management Processor (AMP). A RISC-V-based dedicated scheduler chip on the GPU that offloads scheduling from the CPU, improving resource control via Windows Hardware-Accelerated GPU Scheduling (HAGS). This frees the host CPU from GPU management overhead and improves responsiveness for latency-sensitive workloads.[2]
Confidential Computing (Secure AI). Blackwell is the first TEE-I/O (Trusted Execution Environment I/O) capable GPU family with inline protection over NVLink, offering near-parity throughput to unencrypted modes when paired with compatible hosts. It protects GPU execution, memory, and register states while keeping models, training data, and inference prompts isolated across the entire AI lifecycle. This allows enterprises to deploy sensitive models in shared or multi-tenant cloud environments without sacrificing performance. NVIDIA's Remote Attestation Service provides mechanisms to ensure the integrity and security of devices operating within a TEE.[1][6]
RAS Engine. Dedicated hardware for reliability, availability, and serviceability with AI-based predictive management that monitors thousands of hardware and software data points for early fault detection. At hyperscale, even small improvements in uptime translate to significant cost savings, making this feature particularly valuable for data center operators running thousands of GPUs.[1]
Decompression Engine. Hardware acceleration for LZ4, Snappy, and Deflate formats at 800 GB/s to speed database and analytics pipelines. This engine is tightly coupled with the Grace CPU over a 900 GB/s NVLink-C2C interconnect, enabling efficient data movement for workloads that combine analytics with AI inference.[1][3]
The B200 is the flagship Blackwell datacenter GPU, used standalone in HGX B200 8-GPU servers and within the GB200 Grace Blackwell Superchip. It features two GB100 dies with 208 billion total transistors, 192 GB of HBM3e memory, and 8 TB/s memory bandwidth. The B200 has a thermal design power (TDP) of 1,000 W and requires liquid cooling in most configurations.[3][7]
| Specification | B200 |
|---|---|
| Architecture | Blackwell (GB100 dual-die) |
| Transistors | 208 billion |
| Process Node | TSMC 4NP |
| CUDA Cores | ~18,432 (148 SMs) |
| Tensor Cores | 592 (5th gen) |
| GPU Memory | 192 GB HBM3e |
| Memory Bandwidth | 8 TB/s |
| FP4 Tensor (with sparsity) | 20 PFLOPS |
| FP8 Tensor (with sparsity) | 10 PFLOPS |
| FP16/BF16 Tensor (with sparsity) | 5 PFLOPS |
| TF32 Tensor | 2.5 PFLOPS |
| FP64 Tensor | 40 TFLOPS |
| NVLink Bandwidth | 1.8 TB/s (bidirectional) |
| TDP | 1,000 W |
| Form Factor | SXM |
The B100 is a lower-power variant designed for drop-in compatibility with existing HGX H100 infrastructure. It operates at a TDP of 700 W, matching the H100's thermal envelope, so data centers can upgrade without modifying their cooling or power delivery systems. The B100 uses the same dual-die Blackwell design with 208 billion transistors and 192 GB of HBM3e memory with 8 TB/s bandwidth, but runs at lower clock speeds to stay within the 700 W power budget.[3][7]
The HGX B100 system delivers approximately 14 PFLOPS of FP4 performance (with sparsity) per GPU, or roughly 7 PFLOPS dense, for about 112 PFLOPS of sparse FP4 across its eight GPUs. While this is lower than the B200's output, it offers a straightforward upgrade path: organizations with existing H100-class server infrastructure can replace the GPU baseboards without redesigning their systems.
The GB200 Grace Blackwell Superchip integrates one Grace CPU with two B200 GPUs over NVLink-C2C (900 GB/s bidirectional), providing a tightly coupled CPU-GPU memory system with 864 GB total unified memory for large-scale LLM workloads. The Grace CPU is an Arm-based processor built on Neoverse V2 cores, providing 72 high-performance CPU cores alongside the two Blackwell GPUs.[3]
This integrated design eliminates the PCIe bottleneck between CPU and GPU, enabling faster data transfers for workloads that require frequent CPU-GPU communication, such as data preprocessing pipelines and retrieval-augmented generation (RAG) systems.
The NVIDIA DGX B200 is a complete AI server system featuring eight B200 GPUs interconnected with fifth-generation NVLink. The system is powered by two Intel Xeon Platinum 8570 processors and provides 1,440 GB of total GPU memory (180 GB usable per GPU in the DGX configuration). The DGX B200 delivers 72 PFLOPS of FP8 compute performance and 144 PFLOPS of FP4 (with sparsity), with maximum system power consumption of approximately 10.2 kW.[8]
NVIDIA positions the DGX B200 as delivering 3x the training performance and 15x the inference performance of the previous-generation DGX H100 system. Pricing for the DGX B200 is approximately $500,000 to $515,000, based on early OEM listings.[8]
The HGX B200 is the reference GPU baseboard designed for OEM server manufacturers (such as Dell, HPE, Lenovo, and Supermicro) to integrate into their own server chassis. Each HGX B200 board contains eight B200 GPUs connected via NVLink. Unlike the DGX system, which is a turnkey server from NVIDIA, the HGX baseboard allows partners to customize the CPU, storage, and networking components around the GPU subsystem.
The GB200 NVL72 connects 36 GB200 Grace Blackwell Superchips (72 B200 GPUs + 36 Grace CPUs) in a liquid-cooled rack with a single 72-GPU NVLink domain that behaves like one giant GPU for software. The system uses direct-to-chip liquid cooling, with liquid running through manifolds and cold plates attached to the CPUs and GPUs in each compute tray.[9]
Physically, the GB200 NVL72 rack measures 600 mm wide by 1,068 mm deep by 2,236 mm high and weighs approximately 1.36 metric tons (3,000 pounds). System power consumption is approximately 120 to 132 kW, with 115 kW liquid-cooled and 17 kW air-cooled components.[9]
| Metric | GB200 NVL72 (rack) | GB200 Grace Blackwell Superchip (per node) |
|---|---|---|
| FP4 Tensor Core (with sparsity) | 1,440 PFLOPS | 40 PFLOPS |
| FP8/FP6 Tensor Core (with sparsity) | 720 PFLOPS | 20 PFLOPS |
| INT8 Tensor Core (with sparsity) | 720 POPS | 20 POPS |
| FP16/BF16 Tensor Core (with sparsity) | 360 PFLOPS | 10 PFLOPS |
| TF32 Tensor Core | 180 PFLOPS | 5 PFLOPS |
| FP32 | 5,760 TFLOPS | 160 TFLOPS |
| FP64 / FP64 Tensor Core | 2,880 TFLOPS | 80 TFLOPS |
| GPU Memory (HBM3e) | Up to 13.4 TB, 576 TB/s | Up to 372 GB, 16 TB/s |
| NVLink Bandwidth | 130 TB/s | 3.6 TB/s |
| CPU Cores (Arm Neoverse V2) | 2,592 | 72 |
| CPU Memory (LPDDR5X) | Up to 17 TB, up to 18.4 TB/s | Up to 480 GB, up to 512 GB/s |
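Because the rack column is simply 36 aggregated superchips, the table can be sanity-checked in a few lines of Python; the figures below are copied from the per-node column above.

```python
# Per-GB200-superchip figures from the table above; one NVL72 rack holds 36 of them.
SUPERCHIP = {
    "FP4 Tensor PFLOPS (sparse)": 40,
    "FP8 Tensor PFLOPS (sparse)": 20,
    "FP16/BF16 Tensor PFLOPS (sparse)": 10,
    "FP32 TFLOPS": 160,
    "HBM3e capacity (GB)": 372,
    "Arm CPU cores": 72,
}

for metric, per_node in SUPERCHIP.items():
    print(f"GB200 NVL72 {metric}: {per_node * 36:,}")
# e.g. FP4: 40 * 36 = 1,440 PFLOPS; HBM: 372 * 36 = 13,392 GB (~13.4 TB),
# matching the rack column.
```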
The GB200 NVL36x2 is an alternative rack configuration that splits the 72-GPU domain across two interconnected cabinets, each containing 18 Grace CPUs and 36 Blackwell GPUs. This configuration uses a hybrid cooling approach where the Grace CPUs, Blackwell GPUs, ConnectX-7 NICs, and NVLink Switch ASICs are liquid-cooled, while the remaining components are air-cooled. Power consumption is approximately 66 kW per rack (132 kW total for both racks), roughly 10 kW more than the NVL72 configuration due to additional NVSwitch ASICs and cross-rack interconnect cabling. The NVL36x2 uses 36 NVSwitch5 ASICs in total (compared to 18 in the NVL72) to maintain full-bisection NVLink bandwidth across the two cabinets.[9]
This configuration is expected to be the most commonly deployed form factor for GB200 systems, as it is compatible with a wider range of existing datacenter environments that may not support the density or cooling requirements of the single-rack NVL72.
In March 2025 at GTC 2025, NVIDIA introduced "Blackwell Ultra," the B300 GPU and GB300 NVL72 system. The B300 GPU retains the dual-die Blackwell design with 208 billion transistors but increases the enabled SM count to 160 SMs total (128 CUDA cores per SM, yielding 20,480 CUDA cores). It upgrades to 288 GB of HBM3e memory per GPU and draws 1,400 W of power.[10]
| Specification | B200 | B300 (Blackwell Ultra) |
|---|---|---|
| GPU Memory | 192 GB HBM3e | 288 GB HBM3e |
| Memory Bandwidth | 8 TB/s | 8 TB/s |
| FP4 Dense Performance | 9 PFLOPS | 15 PFLOPS |
| FP4 Sparse Performance | 20 PFLOPS | 30 PFLOPS |
| NVLink Bandwidth | 1.8 TB/s | 1.8 TB/s |
| TDP | 1,000 W | 1,400 W |
| CUDA Cores | ~18,432 | 20,480 |
The GB300 NVL72 rack system delivers 1.5x more AI performance than the GB200 NVL72 and targets order-of-magnitude gains in AI reasoning and real-time generation workloads compared to the Hopper generation. In MLPerf benchmarks, the GB300 NVL72 demonstrated 1.9x faster training than the GB200 NVL72 at equivalent GPU scale, and 4.2x cumulative improvement over the H100 baseline. Blackwell Ultra products are expected to be available from partners starting in the second half of 2025.[10]
NVIDIA also announced the DGX Station with GB300 Grace Blackwell Superchip, a workstation-class system aimed at researchers and developers who need access to Blackwell Ultra performance without deploying full rack-scale infrastructure.
The GeForce RTX 50 series based on the Blackwell architecture was announced at CES 2025 on January 6, 2025. Consumer dies use the TSMC 4N process and do not feature the dual-die design of datacenter parts. Consumer Blackwell GPUs include several features specific to gaming and content creation.[11]
Fourth-Generation RT Cores. Feature Triangle Cluster Intersection Engine for mega geometry and Linear Swept Spheres for rendering fine details like hair, with 2x ray-triangle intersection throughput compared to the previous generation.[2]
DLSS 4 and Multi Frame Generation. The RTX 50 series supports DLSS 4 with Multi Frame Generation, which can generate up to three additional frames per rendered frame, significantly boosting frame rates in supported games.
| Model | GPU | CUDA Cores | Memory | Bus Width | Die Size | Transistors | TDP | MSRP | Release Date |
|---|---|---|---|---|---|---|---|---|---|
| RTX 5090 | GB202-300-A1 | 21,760 | 32 GB GDDR7 | 512-bit | 750 mm2 | 92.2 bn | 575W | $1,999 | January 30, 2025 |
| RTX 5080 | GB203-200-A1 | 10,752 | 16 GB GDDR7 | 256-bit | 378 mm2 | 45.6 bn | 360W | $999 | January 30, 2025 |
| RTX 5070 Ti | GB203-300-A1 | 8,960 | 16 GB GDDR7 | 256-bit | 378 mm2 | 45.6 bn | 300W | $749 | February 20, 2025 |
| RTX 5070 | GB205-300-A1 | 6,144 | 12 GB GDDR7 | 192-bit | 263 mm2 | 31.1 bn | 250W | $549 | March 5, 2025 |
Professional RTX PRO Blackwell series includes RTX PRO 6000 (96 GB VRAM, April 2025) and RTX PRO 5000/4500/4000 (summer 2025).[13]
The following table compares the flagship datacenter GPUs across NVIDIA's three most recent architecture generations:
| Feature | NVIDIA H100 (Hopper) | NVIDIA H200 (Hopper) | NVIDIA B200 (Blackwell) |
|---|---|---|---|
| Transistors | 80 Billion | 80 Billion | 208 Billion (dual-die) |
| Process Node | TSMC 4N | TSMC 4N | TSMC 4NP |
| CUDA Cores | 16,896 | 16,896 | ~18,432 |
| Tensor Cores | 528 (4th gen) | 528 (4th gen) | 592 (5th gen) |
| Max AI Performance (Sparse) | 4 PFLOPS (FP8) | 4 PFLOPS (FP8) | 20 PFLOPS (FP4) / 10 PFLOPS (FP8) |
| Max GPU Memory | 80 GB HBM3 | 141 GB HBM3e | 192 GB HBM3e |
| Memory Bandwidth | 3.35 TB/s | 4.8 TB/s | 8 TB/s |
| NVLink (GPU-to-GPU) | 4th Gen: 900 GB/s | 4th Gen: 900 GB/s | 5th Gen: 1.8 TB/s |
| Lowest Supported Precision | FP8 | FP8 | FP4 |
| TDP | 700W | 700W | 1,000W |
| NVLink Domain (Max GPUs) | 256 | 256 | 576 |
The jump from H100 to B200 is particularly notable in memory capacity (2.4x), memory bandwidth (2.4x), and peak AI throughput at the lowest supported precision (5x at FP4 vs FP8). The transition from 80 billion to 208 billion transistors was enabled by the dual-die design, as no single die at current process nodes could accommodate that transistor count.
Blackwell has demonstrated substantial improvements in AI training performance across multiple benchmarks in MLPerf Training v5.0 and v5.1 submissions.
Blackwell's inference capabilities have been validated through MLPerf Inference v5.0 benchmarks:
| Benchmark | B200 (8-GPU) | H200 (8-GPU) | Speedup |
|---|---|---|---|
| Llama 2 70B (server) | 98,443 tokens/s | ~32,800 tokens/s | 3.0x |
| Llama 2 70B (offline) | 98,858 tokens/s | ~35,300 tokens/s | 2.8x |
| Mixtral 8x7B (server) | 126,845 tokens/s | ~60,400 tokens/s | 2.1x |
| Mixtral 8x7B (offline) | 128,148 tokens/s | ~61,000 tokens/s | 2.1x |
| Stable Diffusion XL (server) | 28.44 samples/s | ~17.8 samples/s | 1.6x |
The GB200 NVL72 system demonstrated even more dramatic gains due to its unified 72-GPU NVLink domain. On the Llama 3.1 405B benchmark, the NVL72 delivered up to 30x higher throughput compared to an H200 NVL8 submission, achieving 3.4x higher per-GPU performance. This multiplied advantage comes from the fact that the 405B model can be distributed across all 72 GPUs without inter-node communication bottlenecks.[15]
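Rough memory arithmetic shows why the 405B model fits comfortably inside one NVL72 domain. The sketch below counts model weights only (ignoring KV cache, activations, and any replication, which are simplifying assumptions) against the per-GPU HBM capacity implied by the NVL72 table.

```python
PARAMS = 405e9                  # Llama 3.1 405B parameter count
HBM_PER_GPU_GB = 13_400 / 72    # ~186 GB per GPU (13.4 TB across the NVL72 rack)

for fmt, bytes_per_param in [("FP16/BF16", 2.0), ("FP8", 1.0), ("FP4", 0.5)]:
    total_gb = PARAMS * bytes_per_param / 1e9
    per_gpu = total_gb / 72
    print(f"{fmt:>9}: {total_gb:5,.0f} GB of weights -> {per_gpu:5.1f} GB/GPU "
          f"({per_gpu / HBM_PER_GPU_GB:.0%} of each GPU's HBM)")
```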
At the original Blackwell announcement, NVIDIA stated that, per chip, Blackwell delivers roughly 2.5x Hopper's peak FP8 training throughput and, by dropping to FP4, up to 5x its peak inference throughput, consistent with the peak figures in the comparison table above.
Blackwell's power consumption represents a significant increase over previous generations, reflecting the architecture's focus on maximizing compute density:
| Product | TDP | Cooling Requirement |
|---|---|---|
| B100 | 700 W | Air-cooled (H100 compatible) |
| B200 | 1,000 W | Liquid cooling required |
| B300 (Blackwell Ultra) | 1,400 W | Liquid cooling required |
| DGX B200 (8-GPU system) | 10,200 W (system) | Liquid cooling for GPUs |
| GB200 NVL72 (rack) | 120,000 to 132,000 W | Direct-to-chip liquid cooling |
Despite the higher absolute power draw, NVIDIA claims a 25x improvement in energy efficiency per token for trillion-parameter model inference compared to the Hopper generation. This improvement stems from the combination of FP4 precision (which doubles throughput per watt relative to FP8), architectural efficiency gains in the Tensor Cores, and the reduced data movement overhead enabled by the NVLink domain architecture.[3]
In practice, measurements have shown that B200 GPUs often draw well below their 1,000 W TDP during typical inference workloads, with actual power consumption around 600 W for individual GPUs under moderate load. Training workloads that fully exercise the Tensor Cores push power consumption closer to the rated TDP.
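Combining the MLPerf throughput above with these power figures yields a rough energy-per-token estimate. The sketch counts GPU power only (no CPUs, networking, or cooling) and assumes sustained throughput, so it is an illustration rather than a measurement.

```python
TOKENS_PER_S = 98_443            # Llama 2 70B server on 8x B200 (MLPerf table above)

for watts_per_gpu in (600, 1_000):   # observed typical draw vs. rated TDP
    system_watts = 8 * watts_per_gpu
    mj_per_token = system_watts / TOKENS_PER_S * 1_000
    print(f"{watts_per_gpu:>5} W/GPU: {mj_per_token:5.1f} mJ/token "
          f"({TOKENS_PER_S / system_watts:4.1f} tokens per joule)")
```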
NVIDIA does not publish official retail prices for its datacenter GPUs, as they are sold through OEM partners and system integrators. However, early listings and OEM quotes provide approximate pricing:
| Product | Estimated Price |
|---|---|
| B200 192GB SXM (individual GPU) | $45,000 to $50,000 |
| DGX B200 (8-GPU server) | ~$515,000 |
| GB200 NVL72 (complete rack) | $2,000,000 to $3,000,000 (estimated) |
These prices are rough estimates and vary based on volume, configuration, and partner. By comparison, the H100 SXM GPU had an estimated street price of $25,000 to $40,000 at launch, though prices declined over 2024 and 2025 as supply improved.
All major cloud service providers have announced or launched Blackwell-based instances:
| Cloud Provider | Instance/Offering | Configuration | Status (as of early 2026) |
|---|---|---|---|
| Amazon Web Services (AWS) | EC2 P6 instances, GB200 NVL72 | 8x B200 (P6), 72-GPU NVL72 | Generally available |
| Google Cloud | A4 VMs (HGX B200), A4X (GB200) | 8x B200 per VM | Generally available |
| Microsoft Azure | ND GB200 v6 series | GB200 Grace Blackwell | Generally available |
| Oracle Cloud (OCI) | OCI Supercluster, OCI Compute | B200 and GB200 | Generally available |
| CoreWeave | Blackwell clusters | GB200 NVL72 racks | Generally available |
| Lambda | Cloud instances | 8x B200 | Generally available |
Cloud pricing for B200 GPUs varies significantly by provider and commitment level. On-demand pricing typically ranges from $5 to $7 per GPU-hour, while spot or preemptible instances can be as low as $2.25 per GPU-hour. Most providers offer B200s only in 8-GPU instances (matching the HGX B200 baseboard configuration), though some providers offer fractional GPU access through serverless platforms.[16]
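At those rates, job-level costs are straightforward to bound. The sketch below prices a hypothetical 1,000 GPU-hour fine-tuning run on a single 8-GPU instance at the quoted rates; the job size is an assumption, pricing is treated as flat, and storage, networking, and egress charges are ignored.

```python
GPU_HOURS = 1_000                # hypothetical fine-tuning job
WALL_CLOCK_H = GPU_HOURS / 8     # on a single 8x B200 instance

for label, usd_per_gpu_hour in [("on-demand (high)", 7.00),
                                ("on-demand (low)", 5.00),
                                ("spot/preemptible", 2.25)]:
    print(f"{label:>17}: ${GPU_HOURS * usd_per_gpu_hour:8,.2f} "
          f"over ~{WALL_CLOCK_H:.0f} h of wall-clock time")
```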
By November 2024, Morgan Stanley reported that the entire 2025 production of Blackwell was sold out. Major cloud providers including Amazon Web Services, Google, Meta, Microsoft, OpenAI, Oracle, and Tesla committed to adopting Blackwell-based systems. Demand for Blackwell GPUs has significantly outstripped supply, continuing a pattern established with the H100 generation.[2]
Initial consumer RTX 50 series availability faced supply constraints, with some units found missing eight render output units (ROPs) due to a "production anomaly" affecting less than 0.5% of cards according to NVIDIA.[15]
Blackwell systems are offered in HGX and DGX platforms, and as managed DGX Cloud instances. They pair with NVIDIA's Quantum-X800 InfiniBand and Spectrum-X Ethernet fabrics (up to 800 Gb/s) and BlueField-3 DPUs for secure, composable acceleration.[3]
For inter-node networking, Blackwell systems support both InfiniBand (Quantum-X800) and Ethernet-based (Spectrum-X) fabrics.
NVIDIA AI Enterprise includes NVIDIA NIM inference microservices for Blackwell deployment, and the TensorRT-LLM library provides optimized kernels that take advantage of Blackwell's FP4 Tensor Cores. The CUDA compute capability is 10.0 for Blackwell datacenter GPUs (10.3 for Blackwell Ultra) and 12.0 for consumer Blackwell, with CUDA 12.8 or later required for full support.[1]
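In practice, software gates architecture-specific code paths on the reported compute capability. Below is a minimal sketch using PyTorch's device-query API; the version mapping mirrors the text above, and the FP4 hook is a hypothetical placeholder rather than a real API.

```python
import torch

# Compute capability majors per the text: 10 = datacenter Blackwell
# (10.3 = Blackwell Ultra), 12 = consumer Blackwell. Requires CUDA 12.8+.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    is_blackwell = major in (10, 12)
    print(f"{name}: sm_{major}{minor}, Blackwell-class: {is_blackwell}")
    if is_blackwell:
        # Hypothetical application hook: enable FP4 inference kernels only here.
        print("FP4 Tensor Core code paths may be enabled.")
```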
NVIDIA maintains an aggressive annual release cadence for its datacenter GPU architectures:
| Year | Architecture | Key Product | Process | Memory |
|---|---|---|---|---|
| 2022 | Hopper | H100 | TSMC 4N | HBM3 |
| 2023 | Hopper (refresh) | H200 | TSMC 4N | HBM3e |
| 2024 | Blackwell | B200 / GB200 | TSMC 4NP | HBM3e |
| 2025 | Blackwell Ultra | B300 / GB300 | TSMC 4NP | HBM3e |
| 2026 | Vera Rubin | Rubin NVL144 | TSMC 3nm | HBM4 |
| 2027 | Rubin Ultra | R300 (expected) | TSMC 3nm | HBM4e |
| 2028+ | Feynman | TBD | TBD | TBD |
The Vera Rubin platform, announced at GTC 2025 and already taped out, will feature the Rubin GPU paired with the new Vera CPU (an Arm-based processor with 88 custom NVIDIA "Olympus" cores). The Rubin NVL144 is projected to deliver 3.6 EFLOPS of dense FP4 compute, roughly 3.3x the 1.1 EFLOPS of the GB300 NVL72. Rubin will also transition from HBM3e to HBM4 memory, increasing per-GPU bandwidth from 8 TB/s to 13 TB/s while maintaining 288 GB capacity. Rubin Ultra is expected in 2027, reportedly using a four-die GPU configuration for approximately 100 PFLOPS of FP4 performance per socket.[17]
After Rubin, NVIDIA's next datacenter architecture will be named after theoretical physicist Richard Feynman.