NVIDIA H100
Last reviewed
May 18, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v3 · 5,103 words
NVIDIA H100 (also called the H100 Tensor Core GPU) is a data-center graphics processing unit built by NVIDIA on the Hopper microarchitecture. It was announced by NVIDIA chief executive Jensen Huang at the GTC conference on March 22, 2022, and reached general availability in the second half of 2022 [1][2]. The product is named after computer scientist Grace Hopper, continuing NVIDIA's tradition of naming data-center architectures after notable scientists. The H100 succeeded the Ampere-based A100 as NVIDIA's flagship AI training and inference accelerator and became the central piece of compute infrastructure during the 2023 to 2024 generative AI buildout [3].
The H100 introduced several features targeting large language model workloads, including a fourth-generation Tensor Core, the Transformer Engine, native FP8 support, fourth-generation NVLink, and HBM3 memory. Demand outstripped supply through most of 2023, with lead times of several months to over a year, per-unit prices in the $25,000 to $40,000 range, and cloud rental rates initially as high as $4 to $8 per GPU-hour [4][5]. By mid-2026 the H100 had transitioned from scarce frontier-training hardware into a commoditized inference workhorse, with on-demand cloud rates dipping below $2 per GPU-hour on neoclouds and used eight-GPU H100 servers trading for $150,000 to $180,000 on the secondary market [6][7]. Major customers included Microsoft, Meta, Google, Amazon, Oracle, CoreWeave, Tesla, and xAI, whose Colossus cluster reached 200,000 H100 and H200 GPUs in 2025 [8].
The H100 was succeeded in the Hopper generation by the memory-upgraded H200, announced November 13, 2023, and by the Blackwell-architecture B200 and GB200 products announced at GTC on March 18, 2024, followed by Blackwell Ultra B300 and GB300 systems announced at GTC on March 18, 2025 [9][10][11].
| Field | Value |
|---|---|
| Type | Data-center GPU accelerator |
| Microarchitecture | Hopper |
| Announced | March 22, 2022 |
| Released | Late 2022 |
| Manufacturer | NVIDIA, fabricated by TSMC |
| Process node | TSMC 4N (custom 4 nm) |
| Transistors | 80 billion |
| Die size | 814 mm² |
| SMs (SXM5) | 132 (of 144 on full GH100) |
| CUDA cores (SXM5) | 16,896 |
| Tensor cores (SXM5) | 528 (4th generation) |
| Memory | Up to 80 GB HBM3 (SXM5); 94 GB HBM3 (NVL) |
| Memory bandwidth | 3.35 TB/s (SXM5) |
| NVLink | 4th generation, 900 GB/s |
| TDP | 700 W (SXM5) / 350 W (PCIe) |
| Predecessor | NVIDIA A100 |
| Successor | NVIDIA H200, NVIDIA B200, Blackwell Ultra B300 |
The H100 is built on the GH100 die, an 814 mm² piece of silicon containing approximately 80 billion transistors fabricated on a custom TSMC process NVIDIA calls 4N, a 4 nm-class process tuned for NVIDIA designs [1][12]. The full GH100 contains 144 streaming multiprocessors (SMs) organized into eight graphics processing clusters and 72 texture processing clusters of two SMs each; the shipping SXM5 variant enables 132 SMs (66 texture processing clusters) and the PCIe variant enables 114 SMs. Each SM has 128 FP32 CUDA cores, four fourth-generation Tensor Cores, and 256 KB of combined L1 cache and shared memory [12].
In its flagship SXM5 configuration the H100 exposes 16,896 CUDA cores and 528 Tensor Cores. The PCIe variant ships with 14,592 CUDA cores and 456 Tensor Cores. The chip carries 50 MB of L2 cache, 25 percent more than the A100, and pairs the GPU die with HBM stacks in the same package using TSMC's CoWoS-S advanced packaging [12].
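The core counts quoted for each variant follow directly from the number of enabled SMs. A minimal Python sketch of that arithmetic, using the per-SM figures given above (the function and constant names are illustrative only):

```python
# Per-SM resources as stated in the text for GH100.
CUDA_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4


def core_counts(enabled_sms: int) -> dict:
    """Return total FP32 CUDA cores and Tensor Cores for a given SM count."""
    return {
        "cuda_cores": enabled_sms * CUDA_CORES_PER_SM,
        "tensor_cores": enabled_sms * TENSOR_CORES_PER_SM,
    }


# SM counts from the text: full die, SXM5 product, PCIe product.
for name, sms in [("full GH100", 144), ("H100 SXM5", 132), ("H100 PCIe", 114)]:
    print(name, core_counts(sms))
```

The SXM5 and PCIe results reproduce the 16,896 / 528 and 14,592 / 456 figures in the paragraph above.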
A major departure from prior generations is asynchronous execution at much larger granularity. Hopper introduces the Tensor Memory Accelerator (TMA), a dedicated hardware unit that copies large multidimensional tensors between global and shared memory without consuming CUDA or Tensor Core cycles. Hopper also adds thread block clusters, a new level in the CUDA execution hierarchy above blocks and below grids, allowing multiple thread blocks on adjacent SMs to cooperate through distributed shared memory [1][12].
The Hopper generation added several capabilities distinguishing the H100 from the A100. The headline addition is the Transformer Engine, combining fourth-generation Tensor Core hardware with a software runtime that selects FP8 or FP16 precision on a per-layer, per-tensor basis. The runtime tracks tensor statistics during training and dynamically chooses scaling factors so FP8 can be used wherever dynamic range allows, falling back to higher precision where needed for numerical stability. NVIDIA reports this can roughly double training throughput on transformer architectures while preserving final accuracy [1][12].
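FP8 is supported in two sub-formats: E4M3 (4 exponent bits, 3 mantissa bits), which favors precision and is typically used for weights and activations, and E5M2 (5 exponent bits, 2 mantissa bits), which favors dynamic range and is typically used for gradients.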
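To make the trade-off between the two FP8 encodings concrete, the sketch below computes the largest finite value of E4M3 and E5M2 and derives a per-tensor scaling factor from a tensor's maximum absolute value. It assumes the commonly published layouts (E5M2 follows IEEE-style conventions; E4M3 reuses the top exponent for normal numbers and reserves only the all-ones mantissa for NaN). The `scale_for` helper is a hypothetical illustration of amax-based scaling, not the Transformer Engine API.

```python
def fp8_max(exp_bits: int, man_bits: int, ieee_like: bool) -> float:
    """Largest finite value representable in an 8-bit float layout."""
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        # Top exponent code is reserved for inf/NaN (E5M2 behaviour).
        max_exp = (2 ** exp_bits - 2) - bias
        max_mantissa = 2 - 2 ** -man_bits
    else:
        # Top exponent code is usable; only mantissa all-ones is NaN (E4M3).
        max_exp = (2 ** exp_bits - 1) - bias
        max_mantissa = 2 - 2 ** -(man_bits - 1)
    return max_mantissa * 2.0 ** max_exp


E4M3_MAX = fp8_max(4, 3, ieee_like=False)
E5M2_MAX = fp8_max(5, 2, ieee_like=True)


def scale_for(amax: float, fmt_max: float) -> float:
    """Hypothetical per-tensor scale mapping the observed amax to the format's max."""
    return fmt_max / amax if amax > 0 else 1.0


print(E4M3_MAX, E5M2_MAX)
print(scale_for(amax=12.5, fmt_max=E4M3_MAX))
```

E5M2 trades one mantissa bit for a far larger representable range, which is why a runtime would prefer it for tensors with wide dynamic range and fall back to higher precision when even that is insufficient.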
Other architectural additions over Ampere include:
Taken together, these additions positioned the H100 as a chip optimized for the transformer-dominated workload mix that emerged in 2022 and 2023.
NVIDIA shipped the H100 in several form factors plus export-restricted derivatives for the Chinese market.
| Variant | Form factor | TDP | Memory | Memory bandwidth | NVLink | Notes |
|---|---|---|---|---|---|---|
| H100 SXM5 | SXM5 module | 700 W | 80 GB HBM3 | 3.35 TB/s | 900 GB/s | Flagship; used in HGX and DGX H100 |
| H100 PCIe | Dual-slot PCIe Gen 5 | 350 W | 80 GB HBM2e (initial); 94 GB HBM3 (later) | 2.0 TB/s (HBM2e); ~3.9 TB/s (HBM3) | 600 GB/s via bridge | Lower thermal envelope |
| H100 NVL | Pair of PCIe cards bridged via NVLink | 2x 400 W | 188 GB HBM3 (94 GB each) | 7.8 TB/s aggregate | 600 GB/s bridge | GTC March 2023, LLM inference |
| GH200 Grace Hopper | H100 + Grace ARM CPU module | ~1000 W | 96 GB HBM3 (later 144 GB HBM3e) | 4 TB/s (HBM3); 4.9 TB/s (HBM3e) | NVLink-C2C 900 GB/s | 72-core Neoverse-V2 plus H100 |
| H800 SXM/PCIe | SXM5 / PCIe | 700 W / 350 W | 80 GB | 3.35 TB/s | 400 GB/s (cut from 900) | China-only, November 2022 [13] |
| H20 | SXM | 400 W | 96 GB HBM3 | 4 TB/s | 900 GB/s | China-only, 2024, reduced compute |
The SXM5 module is mounted on an HGX baseboard inside reference systems such as the DGX H100. The PCIe variant is intended for standard servers where the 700 W envelope is impractical [1][13]. The H100 NVL, announced at GTC in March 2023, packages two H100 PCIe cards joined by NVLink to expose 188 GB of HBM3 to a single workload, targeting inference of 175 billion-parameter-class transformers without a full HGX baseboard [13].
The H800 was the original China-export variant, introduced in November 2022 after the October 7, 2022 U.S. export controls. It retained H100 compute throughput but cut chip-to-chip NVLink bandwidth from 900 GB/s to 400 GB/s, throttling its usefulness for large-cluster training [13][14]. The H800 became the principal training accelerator for several Chinese frontier labs through 2023 and 2024, including DeepSeek, which trained its 671-billion-parameter DeepSeek-V3 model on a cluster of 2,048 H800 SXM5 GPUs over 2.78 million GPU-hours [15]. The October 17, 2023 BIS update closed the H800 loophole and forced NVIDIA to develop the further-degraded H20 under the new total-processing-performance metric [14].
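The GPU-hour figure can be turned into an approximate wall-clock duration. The sketch below is an idealized estimate that assumes every GPU in the cluster was busy for the entire run; the function name is illustrative.

```python
def wall_clock_days(gpu_hours: float, gpu_count: int) -> float:
    """Idealized run length in days if all GPUs are busy the whole time."""
    return gpu_hours / gpu_count / 24


# Figures from the text: 2.78 million GPU-hours on 2,048 GPUs.
days = wall_clock_days(2.78e6, 2048)
print(f"about {days:.1f} days")
```

Under that assumption the run corresponds to a little under two months of continuous cluster time.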
The table below lists peak theoretical throughput from NVIDIA's Hopper whitepaper for the H100 SXM5. Sparse Tensor Core figures assume the 2:4 structured sparsity pattern introduced with Ampere [1][12].
| Precision | Dense throughput | With 2:4 sparsity |
|---|---|---|
| FP64 | 34 TFLOPS | n/a |
| FP64 Tensor Core | 67 TFLOPS | n/a |
| FP32 | 67 TFLOPS | n/a |
| TF32 Tensor Core | ~494 TFLOPS | 989 TFLOPS |
| BF16 Tensor Core | 989 TFLOPS | 1,979 TFLOPS |
| FP16 Tensor Core | 989 TFLOPS | 1,979 TFLOPS |
| FP8 Tensor Core | 1,979 TFLOPS | 3,958 TFLOPS |
| INT8 Tensor Core | 1,979 TOPS | 3,958 TOPS |
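The 2:4 structured sparsity pattern requires that, in every group of four consecutive weights, at most two are non-zero. The sketch below enforces that pattern by keeping the two largest-magnitude values in each group. Magnitude-based selection is an assumption made for illustration; it is one common pruning heuristic, not NVIDIA's tooling.

```python
def prune_2_of_4(weights: list) -> list:
    """Zero the two smallest-magnitude values in each group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


example = [0.1, -0.9, 0.5, 0.2, 1.0, 0.0, -0.3, 0.4]
print(prune_2_of_4(example))
```

Because exactly half of the values in each group are zero, the hardware can skip them, which is where the doubling of peak throughput in the sparse column comes from.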
Memory and interconnect specifications:
| Specification | H100 SXM5 | H100 PCIe (HBM2e) | H100 NVL (per pair) |
|---|---|---|---|
| Memory | 80 GB HBM3 | 80 GB HBM2e | 188 GB HBM3 |
| Bandwidth | 3.35 TB/s | 2.0 TB/s | 7.8 TB/s aggregate |
| Interconnect | NVLink 4.0 (900 GB/s) | PCIe Gen 5 (128 GB/s); optional NVLink bridge | NVLink bridge |
| TDP | 700 W | 300 to 350 W | 2 x 350 to 400 W |
| MIG | Up to 7 instances | Up to 7 instances | Per-card |
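For memory-bound workloads such as LLM token generation, a useful rough bound is the time needed to stream the entire HBM contents once at peak bandwidth. The sketch below computes that from the table's capacity and bandwidth figures, assuming decimal units (1 GB = 10^9 bytes, 1 TB/s = 10^12 bytes/s) and ignoring real-world inefficiency; the function name is illustrative.

```python
def full_sweep_ms(capacity_gb: float, bandwidth_tb_s: float) -> float:
    """Milliseconds to read the whole memory once at peak bandwidth."""
    return capacity_gb * 1e9 / (bandwidth_tb_s * 1e12) * 1e3


variants = {
    "H100 SXM5": (80, 3.35),
    "H100 PCIe (HBM2e)": (80, 2.0),
    "H100 NVL (pair)": (188, 7.8),
}
for name, (gb, tb_s) in variants.items():
    print(f"{name}: {full_sweep_ms(gb, tb_s):.1f} ms per full memory sweep")
```

A model whose weights fill the memory cannot produce tokens faster than one per sweep per sequence, so the SXM5's higher bandwidth translates directly into a lower floor on per-token latency than the HBM2e PCIe card.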
In MLPerf Training v3.0 (June 2023), NVIDIA reported H100 systems completing the GPT-3 175B reference workload about 2.8x faster than equivalent A100 systems at the same GPU count, with the gap widening at larger scale due to NVLink and NVSwitch improvements [16]. MLPerf Inference v3.1 included the first FP8 results on GPT-J 6B, roughly 4x the A100 baseline [16].
The DGX H100 is NVIDIA's reference appliance built around the H100 SXM5. Each DGX H100 contains:
NVIDIA quotes 32 PFLOPS of FP8 performance per DGX H100 (8 x 3.958 PFLOPS, a figure that assumes 2:4 sparsity), or roughly 16 PFLOPS dense. The DGX H100 shipped to customers in late 2022 and early 2023 and replaced the DGX A100 at the top of the product line [17].
The related HGX H100 baseboard exposes the same eight-GPU NVLink topology but is sold to OEMs (Supermicro, Dell, HPE, Lenovo, Quanta, and others) for integration into custom server chassis. The bulk of hyperscale H100 deployments used HGX boards rather than full DGX systems.
NVIDIA's reference design for an H100 supercomputer is the DGX SuperPOD with H100, combining 32 DGX H100 nodes (256 GPUs) into a single scalable unit connected by an NVLink Switch System and InfiniBand fabric. Multiple SuperPODs combine into SuperClusters [17][18].
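The per-node and per-pod totals quoted above are simple multiples of the per-GPU figures. A minimal sketch, taking the 3.958 PFLOPS per-GPU FP8 figure and 80 GB per GPU from the text (constant and function names are illustrative):

```python
GPUS_PER_DGX = 8            # eight-GPU HGX/DGX topology
DGX_PER_SUPERPOD = 32       # one scalable unit
FP8_PFLOPS_PER_GPU = 3.958  # per-GPU peak FP8 figure quoted in the text
HBM_GB_PER_GPU = 80


def aggregate(nodes: int) -> dict:
    """Total GPU count, peak FP8 PFLOPS, and HBM capacity for a number of DGX nodes."""
    gpus = nodes * GPUS_PER_DGX
    return {
        "gpus": gpus,
        "fp8_pflops": gpus * FP8_PFLOPS_PER_GPU,
        "hbm_gb": gpus * HBM_GB_PER_GPU,
    }


print("one DGX H100:", aggregate(1))
print("one SuperPOD:", aggregate(DGX_PER_SUPERPOD))
```

One node comes to about 31.7 PFLOPS, which NVIDIA rounds to 32, and a 32-node scalable unit comes to 256 GPUs and roughly one exaFLOPS of peak FP8.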
Notable H100-based systems in production between 2023 and 2025:
These systems combine an NVLink Switch System for intra-pod connectivity with InfiniBand or Spectrum-X / RoCEv2 Ethernet for cluster-wide scale-out.
The launch of OpenAI's ChatGPT in November 2022 triggered a surge in demand for accelerators capable of training and serving large transformer models. By the first half of 2023 the H100 had become the de facto standard chip for state-of-the-art generative AI workloads, and NVIDIA's lead times stretched to 36 to 52 weeks for many enterprise buyers, according to reporting in Bloomberg and the Financial Times [4][5].
Reported per-unit prices for the H100 SXM5 in 2023 ranged from approximately $25,000 to over $40,000, with the higher figures associated with HGX configurations and resellers in supply-constrained regions. PCIe variants generally sold for $25,000 to $35,000 [4][5]. Cloud rental rates moved in step: hyperscalers and specialty providers initially priced on-demand H100 instances at $4 to $8 per GPU-hour in 2023, with reserved capacity at lower effective rates [4].
The 2025 pricing collapse marked the transition of the H100 from frontier hardware to commodity inference fleet:
Key enterprise customers through 2025 included Microsoft, Meta, Google, Amazon, Oracle, CoreWeave, Tencent, ByteDance (subject to export-control limits), Tesla, and xAI [3]. NVIDIA's data-center revenue grew from roughly $15 billion in fiscal 2023 to about $47.5 billion in fiscal 2024 and $115.2 billion in fiscal 2025, the bulk attributable to H100 and H200 shipments. In Q3 of fiscal 2026 (ending October 2025) NVIDIA reported a record $51.2 billion in single-quarter data-center revenue, up 66 percent year over year, with H200 and GB200/GB300 Blackwell systems leading the mix [23]. NVIDIA's market capitalization, around $300 billion in late 2022, exceeded $3 trillion by mid-2024 and crossed $4 trillion in 2025 [3][23].
The H100 became the workhorse chip for foundation model development between 2023 and 2024 and remained the dominant inference engine through 2025 and 2026. Reported and disclosed workloads include:
The table below summarizes how the H100 fits into NVIDIA's data-center GPU roadmap, drawing on NVIDIA product pages and architecture whitepapers [1][9][10][11][12][24].
| Specification | A100 (SXM4, 80 GB) | H100 (SXM5) | H200 (SXM) | B200 (SXM) | B300 (Blackwell Ultra) |
|---|---|---|---|---|---|
| Architecture | Ampere | Hopper | Hopper | Blackwell | Blackwell Ultra |
| Announced | May 2020 | March 2022 | November 2023 | March 2024 | March 2025 |
| Process | TSMC 7 nm (N7) | TSMC 4N | TSMC 4N | TSMC 4NP | TSMC 4NP |
| Transistors | 54.2 billion | 80 billion | 80 billion | 208 billion (2 dies) | 208 billion (2 dies) |
| Memory | 80 GB HBM2e | 80 GB HBM3 | 141 GB HBM3e | 192 GB HBM3e | 288 GB HBM3e |
| Memory bandwidth | 2.0 TB/s | 3.35 TB/s | 4.8 TB/s | 8.0 TB/s | 8.0 TB/s |
| FP8/FP4 Tensor (dense) | n/a | 1,979 TFLOPS FP8 | 1,979 TFLOPS FP8 | 4,500 TFLOPS FP8 / 9 PFLOPS FP4 | 15 PFLOPS NVFP4 |
| BF16 Tensor (dense) | 312 TFLOPS | 989 TFLOPS | 989 TFLOPS | 2,250 TFLOPS | ~2,250 TFLOPS |
| NVLink bandwidth | 600 GB/s | 900 GB/s | 900 GB/s | 1.8 TB/s | 1.8 TB/s |
| TDP | 400 W | 700 W | 700 W | 1,000 W | 1,400 W |
The A100, launched at GTC in May 2020, established the SXM4 form factor and HGX baseboard pattern the H100 inherited. It was built on TSMC's 7 nm N7 process with 54.2 billion transistors and supported third-generation NVLink at 600 GB/s. It shipped in 40 GB HBM2 and 80 GB HBM2e configurations and was the workhorse for the first wave of large-model training, including the GPT-3.5 models behind the original ChatGPT [24].
The H200, announced on November 13, 2023, retained the GH100 Hopper SM design but upgraded memory to 141 GB of HBM3e at 4.8 TB/s, roughly a 76 percent capacity and 43 percent bandwidth increase over the H100 SXM5. NVIDIA quoted nearly 2x inference speedup on Llama 2 70B in early benchmarks. The H200 reached general availability in Q2 2024 [9].
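The percentage gains quoted for the H200 can be checked from the raw capacity and bandwidth figures. A short sketch (the helper name is illustrative):

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


capacity_gain = percent_increase(80, 141)     # GB: H100 SXM5 to H200
bandwidth_gain = percent_increase(3.35, 4.8)  # TB/s: H100 SXM5 to H200

print(f"capacity: +{capacity_gain:.0f}%, bandwidth: +{bandwidth_gain:.0f}%")
```

Both results round to the 76 percent and 43 percent figures given in the text.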
The Blackwell generation was announced by Jensen Huang at GTC on March 18, 2024. The B200 uses two reticle-limit dies connected by a 10 TB/s die-to-die link, the NVIDIA High-Bandwidth Interface (NV-HBI), to form a single logical GPU with 208 billion transistors, delivering up to 20 PFLOPS FP4 and 2.5x training / 5x inference improvement over the H100 on certain LLM workloads. The GB200 Grace Blackwell Superchip pairs two B200 GPUs with one Grace CPU over NVLink-C2C, and the GB200 NVL72 rack-scale system aggregates 36 GB200 superchips (72 B200 GPUs, 36 Grace CPUs) into a single liquid-cooled rack [10].
Blackwell Ultra (B300 and GB300 NVL72) was announced at GTC on March 18, 2025, and began shipping in the second half of 2025. The B300 delivers 15 PFLOPS of NVFP4, 288 GB of HBM3e, and 8 TB/s memory bandwidth at a 1,400 W TDP. NVIDIA positioned the GB300 NVL72 as offering 1.5x the AI throughput of the GB200 NVL72. Per GPU, the B300's dense NVFP4 rate is approximately 7.5x the H100's dense FP8 rate (15 PFLOPS against roughly 2 PFLOPS; Hopper has no FP4 support), a comparison NVIDIA applied to reasoning-style inference. By Q4 of fiscal 2026 NVIDIA reported that GB300 shipments had surpassed GB200 and accounted for roughly two-thirds of Blackwell revenue [11][23].
The H100 is supported throughout NVIDIA's software stack:
PyTorch supports the H100 through torch.compile, Transformer Engine, and FlashAttention 2 and 3 integrations. TensorFlow 2.13 and later expose H100 kernels through XLA and Transformer Engine.
FlashAttention 3, by Tri Dao and collaborators, was released in mid-2024 specifically tuned for Hopper, exploiting the asynchronous TMA, warp-specialization patterns, and FP8 Tensor Cores. The authors report up to about 740 TFLOPS in FP16 on attention kernels, roughly 75 percent of the H100's dense FP16 peak, and close to 1.2 PFLOPS in FP8 [26].
The H100 sat at the center of an evolving United States export-control regime aimed at restricting advanced AI accelerator flows to China.
The H100 export controls became a major story in 2023 to 2025 coverage of AI, framed both as a national-security measure and as a constraint on Chinese frontier AI development. DeepSeek's 2024 to 2025 releases of V3 and R1, trained largely on H800 capacity, reignited debate over the effectiveness of the regime [15].
The H100 was widely covered as the central product of the generative AI boom. Independent reviews from AnandTech, ServeTheHome, and The Next Platform validated NVIDIA's throughput improvements over the A100 and highlighted the Transformer Engine and NVLink Switch System as the most significant additions [12][30]. Analysts at Morgan Stanley and Bernstein attributed the bulk of NVIDIA's data-center revenue growth in fiscal 2024 to H100 shipments, with unit volume reaching the low millions during 2024 [3].
The phrase "compute as the new oil" became common shorthand in 2023, reflecting the perception that H100 supply, more than algorithmic novelty, was the binding constraint on frontier model development. NVIDIA's market capitalization exceeded $1 trillion in mid-2023, $2 trillion in February 2024, and $3 trillion by June 2024, briefly making it the world's most valuable public company, and crossed $4 trillion during 2025 as Blackwell ramped and H100 inference fleets matured [3][23].
Critics raised three concerns: vendor concentration risk (which led hyperscalers to accelerate internal programs such as AWS Trainium, Google TPU, and Microsoft Maia); the energy footprint, with single training runs consuming gigawatt-hours; and supply-chain concentration at TSMC for fabrication and SK Hynix and Micron for HBM [3][30].
Competing accelerators in 2023 to 2025 included AMD's MI300X (December 2023, 192 GB HBM3, LLM inference focus) and the follow-on MI325X and MI350 series, Google TPU v4, v5p, and v6 Trillium, AWS Trainium and Trainium2, Cerebras CS-3, Tenstorrent, and Groq Language Processing Units. The H100 remained the dominant training accelerator through 2024 and the dominant cloud inference accelerator through 2025, with industry estimates putting NVIDIA's share of the merchant AI accelerator market above 80 percent into 2026 [3].
Despite its commercial success, the H100 had several recognized limitations: