NVIDIA H100
Last reviewed
Apr 30, 2026
Sources
22 citations
Review status
Source-backed
Revision
v2 · 3,919 words
NVIDIA H100 (also called the H100 Tensor Core GPU) is a data-center graphics processing unit built by NVIDIA on the Hopper microarchitecture. It was announced by NVIDIA chief executive Jensen Huang at the GTC conference on March 22, 2022, and reached general availability in the second half of 2022 [1][2]. The product is named after computer scientist Grace Hopper, continuing NVIDIA's tradition of naming data-center architectures after notable scientists. The H100 succeeded the Ampere-based A100 as NVIDIA's flagship AI training and inference accelerator and became the central piece of compute infrastructure during the 2023 to 2024 generative AI buildout [3].
The H100 introduced several features targeting large language model workloads, including a fourth-generation Tensor Core, the Transformer Engine, native FP8 support, fourth-generation NVLink, and HBM3 memory. Demand outstripped supply through most of 2023, with lead times of several months to over a year, per-unit prices in the $25,000 to $40,000 range, and cloud rental rates initially as high as $4 to $8 per GPU-hour [4][5]. Major customers included Microsoft, Meta, Google, Amazon, Oracle, CoreWeave, Tesla, and xAI, whose Colossus cluster, announced September 2024, used over 100,000 H100 GPUs [6].
The H100 was succeeded in the Hopper generation by the memory-upgraded H200, announced November 13, 2023, and by the Blackwell-architecture B200 and GB200 products announced at GTC on March 18, 2024 [7][8].
| Field | Value |
|---|---|
| Type | Data-center GPU accelerator |
| Microarchitecture | Hopper |
| Announced | March 22, 2022 |
| Released | Late 2022 |
| Manufacturer | NVIDIA, fabricated by TSMC |
| Process node | TSMC 4N (custom 4 nm) |
| Transistors | 80 billion |
| Die size | 814 mm² |
| Memory | Up to 80 GB HBM3 (SXM5); 94 GB HBM3 (later NVL) |
| Predecessor | NVIDIA A100 |
| Successor | NVIDIA H200, NVIDIA B200 |
The H100 is built on the GH100 die, an 814 mm² piece of silicon containing approximately 80 billion transistors fabricated on a custom TSMC process NVIDIA calls 4N, a 4 nm-class process tuned for NVIDIA designs [1][9]. The full GH100 contains 144 streaming multiprocessors (SMs) organized into eight graphics processing clusters, although the highest-end shipping configurations enable 132 SMs in the SXM5 variant and 114 SMs in the PCIe variant. Each SM has 128 FP32 CUDA cores, four fourth-generation Tensor Cores, and 256 KB of combined L1 cache and shared memory [9].
In its flagship SXM5 configuration the H100 exposes 16,896 CUDA cores and 528 Tensor Cores. The PCIe variant ships with 14,592 CUDA cores and 456 Tensor Cores. The chip carries 50 MB of L2 cache, 25 percent more than the A100, and pairs the GPU die with HBM stacks in the same package using TSMC's CoWoS-S advanced packaging [9].
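The core counts above follow directly from the number of enabled SMs. A minimal Python sketch of that arithmetic, using only the per-SM figures quoted in this section (the function name `core_counts` is illustrative, not an NVIDIA API):

```python
# Per-SM resources quoted above for GH100.
FP32_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4


def core_counts(enabled_sms: int) -> tuple[int, int]:
    """Return (CUDA cores, Tensor Cores) for a given number of enabled SMs."""
    return enabled_sms * FP32_CORES_PER_SM, enabled_sms * TENSOR_CORES_PER_SM


for name, sms in {"full GH100": 144, "H100 SXM5": 132, "H100 PCIe": 114}.items():
    cuda, tensor = core_counts(sms)
    print(f"{name}: {sms} SMs -> {cuda:,} CUDA cores, {tensor} Tensor Cores")
```

With 132 SMs this gives 16,896 CUDA cores and 528 Tensor Cores; with 114 SMs it gives 14,592 and 456, matching the shipping configurations.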
A major departure from prior generations is asynchronous execution at much larger granularity. Hopper introduces the Tensor Memory Accelerator (TMA), a dedicated hardware unit that copies large multidimensional tensors between global and shared memory without consuming CUDA or Tensor Core cycles. Hopper also adds thread block clusters, a new level in the CUDA execution hierarchy above blocks and below grids, allowing multiple thread blocks on adjacent SMs to cooperate through distributed shared memory [1][9].
The Hopper generation added several capabilities distinguishing the H100 from the A100. The headline addition is the Transformer Engine, combining fourth-generation Tensor Core hardware with a software runtime that selects FP8 or FP16 precision on a per-layer, per-tensor basis. The runtime tracks tensor statistics during training and dynamically chooses scaling factors so FP8 can be used wherever dynamic range allows, falling back to higher precision where needed for numerical stability. NVIDIA reports this can roughly double training throughput on transformer architectures while preserving final accuracy [1][9].
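The per-tensor scaling idea can be illustrated without a GPU. The sketch below is a simplified model, not the Transformer Engine's actual implementation: it keeps a short history of absolute-maximum values for one tensor and derives a scale so that the largest recent value maps to the top of the FP8 range. The class name `AmaxHistoryScaler`, the history length, and the `margin` parameter are assumptions made for illustration. The maximum finite values 448 and 57344 are those of the two FP8 encodings commonly called E4M3 and E5M2.

```python
from collections import deque

# Largest finite magnitudes of the two FP8 encodings.
FP8_MAX = {"e4m3": 448.0, "e5m2": 57344.0}


class AmaxHistoryScaler:
    """Toy per-tensor scaler: scale = fp8_max / (max of recent amax values)."""

    def __init__(self, fmt: str = "e4m3", history_len: int = 16, margin: float = 1.0):
        self.fp8_max = FP8_MAX[fmt]
        self.history = deque(maxlen=history_len)
        self.margin = margin  # values above 1.0 leave headroom (hypothetical knob)

    def update(self, values) -> float:
        """Record this step's amax and return the scale to apply before casting."""
        amax = max(abs(v) for v in values)
        self.history.append(amax)
        recent = max(self.history)
        if recent == 0.0:
            return 1.0  # all-zero tensor: any scale works
        return self.fp8_max / (recent * self.margin)

    def fits(self, values, scale: float) -> bool:
        """True if every scaled value stays inside the representable range."""
        return all(abs(v) * scale <= self.fp8_max for v in values)


scaler = AmaxHistoryScaler("e4m3")
step1 = [0.5, -2.0, 1.25]
scale = scaler.update(step1)  # 448 / 2.0 = 224.0
print(scale, scaler.fits(step1, scale))
```

A tensor whose values cannot be scaled into range without losing too much precision is the case where a runtime would fall back to a 16-bit format.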
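FP8 is supported in two sub-formats: E4M3 (4 exponent bits, 3 mantissa bits), which offers more precision and is typically used for forward-pass weights and activations, and E5M2 (5 exponent bits, 2 mantissa bits), which offers more dynamic range and is typically used for gradients.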
Other architectural additions over Ampere include:
Taken together, these additions positioned the H100 as a chip optimized for the transformer-dominated workload mix that emerged in 2022 and 2023.
NVIDIA shipped the H100 in several form factors plus export-restricted derivatives for the Chinese market.
| Variant | Form factor | TDP | Memory | Memory bandwidth | NVLink | Notes |
|---|---|---|---|---|---|---|
| H100 SXM5 | SXM5 module | 700 W | 80 GB HBM3 | 3.35 TB/s | 900 GB/s | Flagship; used in HGX and DGX H100 |
| H100 PCIe | Dual-slot PCIe Gen 5 | 350 W | 80 GB HBM2e (initial); 94 GB HBM3 (later) | 2.0 TB/s (HBM2e); ~3.9 TB/s (HBM3) | 600 GB/s via bridge | Lower thermal envelope |
| H100 NVL | Pair of PCIe cards bridged via NVLink | 2x 400 W | 188 GB HBM3 (94 GB each) | 7.8 TB/s aggregate | 600 GB/s bridge | GTC March 2023, LLM inference |
| GH200 Grace Hopper | H100 + Grace ARM CPU module | ~1000 W | 96 GB HBM3 (later 144 GB HBM3e) | 4 TB/s (HBM3); 4.9 TB/s (HBM3e) | NVLink-C2C 900 GB/s | 72-core Neoverse-V2 plus H100 |
| H800 | SXM5 / PCIe | 700 W / 350 W | 80 GB | 3.35 TB/s | 400 GB/s (cut from 900) | China-only, October 2022 |
| H20 | SXM | 400 W | 96 GB HBM3 | 4 TB/s | 900 GB/s | China-only, 2024, reduced compute |
The SXM5 module is mounted on an HGX baseboard inside reference systems such as the DGX H100. The PCIe variant is intended for standard servers where the 700 W envelope is impractical [1][10]. The H100 NVL, announced at GTC in March 2023, packages two H100 PCIe cards joined by NVLink to expose 188 GB of HBM3 to a single workload, targeting inference of 175 billion-parameter-class transformers without a full HGX baseboard [10].
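A rough way to see why 188 GB matters for 175 billion-parameter models is to estimate the weight footprint at different precisions. The sketch below counts weights only and ignores KV cache, activations, and framework overhead, so it gives a lower bound rather than a sizing tool. The helper name `weight_footprint_gb` is illustrative.

```python
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_footprint_gb(params: float, precision: str) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


NVL_PAIR_GB = 188  # H100 NVL pair, from the text
for precision in ("fp16", "fp8"):
    need = weight_footprint_gb(175e9, precision)
    verdict = "fits" if need <= NVL_PAIR_GB else "does not fit"
    print(f"175B @ {precision}: {need:.0f} GB -> {verdict} in {NVL_PAIR_GB} GB")
```

Under these assumptions the weights of a 175 billion-parameter model fit on one NVL pair only at 8-bit precision.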
The table below lists peak theoretical throughput from NVIDIA's Hopper whitepaper for the H100 SXM5. Sparse Tensor Core figures assume the 2:4 structured sparsity pattern introduced with Ampere [1][9].
| Precision | Dense throughput | With 2:4 sparsity |
|---|---|---|
| FP64 | 34 TFLOPS | n/a |
| FP64 Tensor Core | 67 TFLOPS | n/a |
| FP32 | 67 TFLOPS | n/a |
| TF32 Tensor Core | 495 TFLOPS | 989 TFLOPS |
| BF16 Tensor Core | 989 TFLOPS | 1,979 TFLOPS |
| FP16 Tensor Core | 989 TFLOPS | 1,979 TFLOPS |
| FP8 Tensor Core | 1,979 TFLOPS | 3,958 TFLOPS |
| INT8 Tensor Core | 1,979 TOPS | 3,958 TOPS |
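The 2:4 pattern referenced in the sparsity column means that in every group of four consecutive weights, at most two are non-zero. A minimal sketch of magnitude-based 2:4 pruning in plain Python follows. It illustrates the pattern only and is not NVIDIA's pruning tooling; the function name `prune_2_4` and the choice to keep the two largest magnitudes are assumptions.

```python
def prune_2_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude values in each group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest magnitudes in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
print(prune_2_4(row))  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because half of the operands in each group are known zeros, the Tensor Cores can skip them, which is where the 2x ratio between the two columns comes from.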
Memory and interconnect specifications:
| Specification | H100 SXM5 | H100 PCIe (HBM2e) | H100 NVL (per pair) |
|---|---|---|---|
| Memory | 80 GB HBM3 | 80 GB HBM2e | 188 GB HBM3 |
| Bandwidth | 3.35 TB/s | 2.0 TB/s | 7.8 TB/s aggregate |
| Interconnect | NVLink 4.0 (900 GB/s) | PCIe Gen 5 (128 GB/s); optional NVLink bridge | NVLink bridge |
| TDP | 700 W | 300 to 350 W | 2 x 350 to 400 W |
| MIG | Up to 7 instances | Up to 7 instances | Per-card |
In MLPerf Training v3.0 (June 2023), NVIDIA reported H100 systems completing the GPT-3 175B reference workload about 2.8x faster than equivalent A100 systems at the same GPU count, with the gap widening at larger scale due to NVLink and NVSwitch improvements [11]. MLPerf Inference v3.1 included the first FP8 results on GPT-J 6B, roughly 4x the A100 baseline [11].
The DGX H100 is NVIDIA's reference appliance built around the H100 SXM5. Each DGX H100 contains:
NVIDIA quotes 32 PFLOPS of peak FP8 performance per DGX H100 (8 x 3.958 PFLOPS), a figure that assumes 2:4 sparsity; dense FP8 throughput is about half that, roughly 16 PFLOPS. The DGX H100 shipped to customers in late 2022 and early 2023 and replaced the DGX A100 at the top of the product line [12].
The related HGX H100 baseboard exposes the same eight-GPU NVLink topology but is sold to OEMs (Supermicro, Dell, HPE, Lenovo, Quanta, and others) for integration into custom server chassis. The bulk of hyperscale H100 deployments used HGX boards rather than full DGX systems.
NVIDIA's reference design for an H100 supercomputer is the DGX SuperPOD with H100, combining 32 DGX H100 nodes (256 GPUs) into a single scalable unit connected by an NVLink Switch System and InfiniBand fabric. Multiple SuperPODs combine into SuperClusters [12][13].
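The scale-out figures above are simple products. A small sketch using only numbers quoted in this section (eight GPUs per DGX H100, 32 nodes per SuperPOD, 3.958 PFLOPS of peak FP8 per GPU); the function name `aggregate` is illustrative:

```python
GPUS_PER_DGX = 8
DGX_PER_SUPERPOD = 32
PEAK_FP8_PFLOPS_PER_GPU = 3.958  # per-GPU figure used in the DGX H100 calculation above


def aggregate(nodes: int) -> tuple[int, float]:
    """Return (GPU count, peak FP8 PFLOPS) for a number of DGX H100 nodes."""
    gpus = nodes * GPUS_PER_DGX
    return gpus, gpus * PEAK_FP8_PFLOPS_PER_GPU


print(aggregate(1))                 # one DGX H100
print(aggregate(DGX_PER_SUPERPOD))  # one SuperPOD scalable unit
```

These are theoretical peaks; sustained throughput on real workloads is lower and depends on the model and interconnect.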
Notable H100-based systems in production between 2023 and 2024:
These systems combine an NVLink Switch System for intra-pod connectivity with InfiniBand or Spectrum-X Ethernet for cluster-wide scale-out.
The launch of OpenAI's ChatGPT in November 2022 triggered a surge in demand for accelerators capable of training and serving large transformer models. By the first half of 2023 the H100 had become the de facto standard chip for state-of-the-art generative AI workloads, and NVIDIA's lead times stretched to 36 to 52 weeks for many enterprise buyers, according to reporting in Bloomberg and the Financial Times [4][5].
Reported per-unit prices for the H100 SXM5 in 2023 ranged from approximately $25,000 to over $40,000, with the higher figures associated with HGX configurations and resellers in supply-constrained regions. PCIe variants generally sold for $25,000 to $35,000 [4][5]. Cloud rental rates moved in step: hyperscalers and specialty providers initially priced on-demand H100 instances at $4 to $8 per GPU-hour in 2023, with reserved capacity at lower effective rates. By the second half of 2024, as production scaled and the H200 and Blackwell pipeline approached, spot prices on neoclouds such as CoreWeave, Lambda, and Together AI fell into the $2 to $3 per GPU-hour range [4].
Key enterprise customers through 2024 included Microsoft, Meta, Google, Amazon, Oracle, CoreWeave, Tencent, ByteDance (subject to export-control limits), Tesla, and xAI [3]. NVIDIA's data-center revenue grew from roughly $15 billion in fiscal 2023 to over $47 billion in fiscal 2024, the bulk attributable to H100 shipments, and the company's market capitalization rose from about $300 billion in late 2022 to more than $3 trillion by mid-2024 [3].
The H100 became the workhorse chip for foundation model development between 2023 and 2024. Reported and disclosed workloads include:
The table below summarizes how the H100 fits into NVIDIA's data-center GPU roadmap, drawing on NVIDIA product pages and architecture whitepapers [1][7][8][9][15].
| Specification | A100 (SXM4, 80 GB) | H100 (SXM5) | H200 (SXM) | B200 (SXM) |
|---|---|---|---|---|
| Architecture | Ampere | Hopper | Hopper | Blackwell |
| Announced | May 2020 | March 2022 | November 2023 | March 2024 |
| Process | TSMC 7 nm (N7) | TSMC 4N | TSMC 4N | TSMC 4NP |
| Transistors | 54.2 billion | 80 billion | 80 billion | 208 billion (2 dies) |
| Memory | 80 GB HBM2e | 80 GB HBM3 | 141 GB HBM3e | 192 GB HBM3e |
| Memory bandwidth | 2.0 TB/s | 3.35 TB/s | 4.8 TB/s | 8.0 TB/s |
| FP8 Tensor (dense) | n/a | 1,979 TFLOPS | 1,979 TFLOPS | 4,500 TFLOPS |
| BF16 Tensor (dense) | 312 TFLOPS | 989 TFLOPS | 989 TFLOPS | 2,250 TFLOPS |
| NVLink bandwidth | 600 GB/s | 900 GB/s | 900 GB/s | 1.8 TB/s |
| TDP | 400 W | 700 W | 700 W | 1,000 W |
The A100, launched at GTC in May 2020, established the SXM4 form factor and HGX baseboard pattern the H100 inherited. It was built on TSMC's 7 nm N7 process with 54.2 billion transistors, supported third-generation NVLink at 600 GB/s, and shipped in 40 GB HBM2 and 80 GB HBM2e configurations. It was the workhorse for the first wave of large-model training, including the GPT-3.5 models behind the original ChatGPT [15].
The H200, announced on November 13, 2023, retained the GH100 Hopper SM design but upgraded memory to 141 GB of HBM3e at 4.8 TB/s, roughly a 76 percent capacity and 43 percent bandwidth increase over the H100 SXM5. NVIDIA quoted nearly 2x inference speedup on Llama 2 70B in early benchmarks. The H200 reached general availability in Q2 2024 [7].
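The percentage uplifts quoted for the H200 can be checked from the memory figures given above. A short sketch (the helper name `pct_increase` is illustrative):

```python
def pct_increase(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new / old - 1.0) * 100.0


print(f"capacity:  +{pct_increase(141, 80):.0f}%")    # 141 GB vs 80 GB
print(f"bandwidth: +{pct_increase(4.8, 3.35):.0f}%")  # 4.8 TB/s vs 3.35 TB/s
```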
The Blackwell generation was announced by Jensen Huang at GTC on March 18, 2024. The B200 uses two reticle-limit dies connected by a 10 TB/s die-to-die link (NV-HBI) to form a single logical GPU with 208 billion transistors. NVIDIA quoted up to 20 PFLOPS of FP4 throughput and up to 2.5x training and 5x inference improvements over the H100 on certain LLM workloads. The GB200 Grace Blackwell Superchip pairs two B200 GPUs with one Grace CPU, and the GB200 NVL72 rack-scale system aggregates 36 GB200 superchips (72 B200 GPUs, 36 Grace CPUs) into a single liquid-cooled rack [8].
The H100 is supported throughout NVIDIA's software stack:
torch.compile, Transformer Engine, and FlashAttention 2 and 3 integrations. TensorFlow 2.13 and later expose H100 kernels through XLA and Transformer Engine.
FlashAttention 3, by Tri Dao and collaborators, was released in mid-2024 and is tuned specifically for Hopper. It exploits the asynchronous TMA, warp-specialization patterns, and FP8 Tensor Cores, reaching about 75 percent of the H100's theoretical FP16 peak on attention kernels and close to 1.2 PFLOPS with FP8 [17].
The H100 sat at the center of an evolving United States export-control regime aimed at restricting advanced AI accelerator flows to China.
The H100 export controls became a major story in 2023 and 2024 coverage of AI, framed both as a national-security measure and as a constraint on Chinese frontier AI development.
The H100 was widely covered as the central product of the generative AI boom. Independent reviews from AnandTech, ServeTheHome, and The Next Platform validated NVIDIA's throughput improvements over the A100 and highlighted the Transformer Engine and NVLink Switch System as the most significant additions [9][20]. Analysts at Morgan Stanley and Bernstein attributed the bulk of NVIDIA's data-center revenue growth in fiscal 2024 to H100 shipments, with unit volume reaching the low millions during 2024 [3].
The phrase "compute as the new oil" became common shorthand in 2023, reflecting the perception that H100 supply, more than algorithmic novelty, was the binding constraint on frontier model development. NVIDIA's market capitalization exceeded $1 trillion in mid-2023, $2 trillion in February 2024, and $3 trillion by June 2024, briefly making it the world's most valuable public company [3].
Critics raised three concerns: vendor concentration risk (which led hyperscalers to accelerate internal programs such as AWS Trainium, Google TPU, and Microsoft Maia); the energy footprint, with single training runs consuming gigawatt-hours; and supply-chain concentration at TSMC for fabrication and SK Hynix and Micron for HBM [3][20].
Competing accelerators in 2023 and 2024 included AMD's MI300X (December 2023, 192 GB HBM3, LLM inference focus), Google TPU v4 and v5p, AWS Trainium, Cerebras CS-3, Tenstorrent, and Groq Language Processing Units. The H100 remained the dominant training accelerator through 2024, with industry estimates putting NVIDIA's share of the merchant AI accelerator market above 80 percent [3].
Despite its commercial success, the H100 had several recognized limitations: