# NVIDIA H100

> Source: https://aiwiki.ai/wiki/nvidia_h100
> Updated: 2026-06-20
> Categories: AI Hardware, AI Infrastructure, NVIDIA
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**NVIDIA H100** (also called the **H100 Tensor Core GPU**) is a data-center [graphics processing unit](/wiki/gpu) built by [NVIDIA](/wiki/nvidia) on the **Hopper** microarchitecture, fabricated with over 80 billion transistors on a custom [TSMC](/wiki/tsmc) 4N (4 nm-class) process and shipping with up to 80 GB of [HBM3](/wiki/hbm3) memory at 3.35 TB/s [^1][^31]. It was announced by NVIDIA chief executive [Jensen Huang](/wiki/jensen_huang) at the [GTC](/wiki/gtc) conference on March 22, 2022, and reached general availability in the second half of 2022 [^1][^2]. NVIDIA described the H100 as "the new engine of the world's AI infrastructures" and "the biggest generational leap ever," claiming up to 9x faster at-scale AI training and up to 30x faster large-language-model inference throughput versus the prior-generation A100 [^32]. The product is named after computer scientist [Grace Hopper](/wiki/grace_hopper), continuing NVIDIA's tradition of naming data-center architectures after notable scientists. The H100 succeeded the [Ampere](/wiki/ampere_microarchitecture)-based [A100](/wiki/nvidia_a100) as NVIDIA's flagship AI training and inference accelerator and became the central piece of compute infrastructure during the 2023 to 2024 generative AI buildout [^3].

The H100 introduced several features targeting [large language model](/wiki/large_language_model) workloads, including a fourth-generation [Tensor Core](/wiki/tensor_core), the [Transformer Engine](/wiki/transformer_engine), native [FP8](/wiki/fp8) support, fourth-generation [NVLink](/wiki/nvlink), and [HBM3](/wiki/hbm3) memory. Demand outstripped supply through most of 2023, with lead times of several months to over a year, per-unit prices in the $25,000 to $40,000 range, and cloud rental rates initially as high as $4 to $8 per GPU-hour [^4][^5]. By mid-2026 the H100 had transitioned from scarce frontier-training hardware into a commoditized inference workhorse, with on-demand cloud rates dipping below $2 per GPU-hour on neoclouds and used eight-GPU H100 servers trading for $150,000 to $180,000 on the secondary market [^6][^7]. Major customers included [Microsoft](/wiki/microsoft), [Meta](/wiki/meta), [Google](/wiki/google), Amazon, [Oracle](/wiki/oracle), [CoreWeave](/wiki/coreweave), [Tesla](/wiki/tesla), and [xAI](/wiki/xai), whose [Colossus](/wiki/colossus_supercomputer) cluster reached 200,000 H100 and H200 GPUs in 2025 [^8].

The H100 was succeeded in the Hopper generation by the memory-upgraded [H200](/wiki/nvidia_h200), announced November 13, 2023, and by the [Blackwell](/wiki/nvidia_blackwell)-architecture [B200](/wiki/nvidia_b200) and [GB200](/wiki/nvidia_gb200) products announced at GTC on March 18, 2024, followed by **Blackwell Ultra** B300 and GB300 systems announced at GTC on March 18, 2025 [^9][^10][^11].

## What is the NVIDIA H100?

The NVIDIA H100 is a data-center GPU accelerator designed primarily for training and running large neural networks, especially [transformer](/wiki/transformer)-based [large language models](/wiki/large_language_model). It is the first product based on the Hopper microarchitecture and the direct successor to the A100. The H100 ships in two main board form factors, an SXM5 module rated at 700 W and a dual-slot PCIe Gen 5 card rated at 350 W, plus the dual-card H100 NVL and the [GH200](/wiki/gh200) Grace Hopper module that pairs an H100 with an ARM-based [Grace](/wiki/grace_hopper) CPU [^1][^12]. Its defining innovation is the Transformer Engine, which combines fourth-generation Tensor Cores with software that automatically switches between FP8 and 16-bit precision per layer to accelerate transformer math while preserving accuracy [^1][^12].

## Infobox

| Field | Value |
|---|---|
| Type | Data-center GPU accelerator |
| Microarchitecture | [Hopper](/wiki/hopper_microarchitecture) |
| Announced | March 22, 2022 |
| Released | Late 2022 |
| Manufacturer | NVIDIA, fabricated by [TSMC](/wiki/tsmc) |
| Process node | TSMC 4N (custom 4 nm) |
| Transistors | 80 billion |
| Die size | 814 mm² |
| SMs (SXM5) | 132 (of 144 on full GH100) |
| CUDA cores (SXM5) | 16,896 |
| Tensor cores (SXM5) | 528 (4th generation) |
| Memory | Up to 80 GB HBM3 (SXM5); 94 GB HBM3 (NVL) |
| Memory bandwidth | 3.35 TB/s (SXM5) |
| NVLink | 4th generation, 900 GB/s |
| TDP | 700 W (SXM5) / 350 W (PCIe) |
| Predecessor | [NVIDIA A100](/wiki/nvidia_a100) |
| Successor | [NVIDIA H200](/wiki/nvidia_h200), [NVIDIA B200](/wiki/nvidia_b200), [Blackwell Ultra B300](/wiki/nvidia_blackwell) |

## What is inside the Hopper GH100 die?

The H100 is built on the GH100 die, an 814 mm² piece of silicon containing approximately 80 billion transistors fabricated on a custom [TSMC](/wiki/tsmc) process NVIDIA calls **4N**, a 4 nm-class process tuned for NVIDIA designs [^1][^12]. NVIDIA's Hopper architecture page describes the chip as "built with over 80 billion transistors using a cutting edge TSMC 4N process" [^31]. The full GH100 contains 144 streaming multiprocessors (SMs) organized into eight graphics processing clusters and 72 texture processing clusters of two SMs each; shipping products enable 132 SMs (66 texture processing clusters) in the SXM5 variant and 114 SMs in the PCIe variant. Each SM has 128 FP32 [CUDA](/wiki/cuda) cores, four fourth-generation [Tensor Cores](/wiki/tensor_core), and 256 KB of combined L1 cache and shared memory [^12].

In its flagship SXM5 configuration the H100 exposes 16,896 CUDA cores and 528 Tensor Cores. The PCIe variant ships with 14,592 CUDA cores and 456 Tensor Cores. The chip carries 50 MB of L2 cache, 25 percent more than the A100, and pairs the GPU die with HBM stacks in the same package using TSMC's CoWoS-S advanced packaging [^12].
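
The core counts quoted above follow directly from the number of enabled SMs and the per-SM resources. The short sketch below only reproduces that arithmetic; the function and constant names are illustrative and are not taken from any NVIDIA tool.

```python
# Per-SM resources as stated for Hopper in this article.
FP32_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4

# Enabled SM counts per configuration (the full GH100 die has 144).
ENABLED_SMS = {"GH100 (full die)": 144, "H100 SXM5": 132, "H100 PCIe": 114}


def core_counts(sm_count: int) -> tuple[int, int]:
    """Return (CUDA cores, Tensor Cores) for a given number of enabled SMs."""
    return sm_count * FP32_CORES_PER_SM, sm_count * TENSOR_CORES_PER_SM


if __name__ == "__main__":
    for name, sms in ENABLED_SMS.items():
        cuda, tensor = core_counts(sms)
        print(f"{name}: {sms} SMs -> {cuda:,} CUDA cores, {tensor} Tensor Cores")
```

Running it yields 16,896 CUDA cores and 528 Tensor Cores for 132 SMs, and 14,592 and 456 for 114 SMs, matching the figures in the text.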

A major departure from prior generations is asynchronous execution at much larger granularity. Hopper introduces the **Tensor Memory Accelerator** (TMA), a dedicated hardware unit that copies large multidimensional tensors between global and shared memory without consuming CUDA or Tensor Core cycles. Hopper also adds **thread block clusters**, a new level in the CUDA execution hierarchy above blocks and below grids, allowing multiple thread blocks on adjacent SMs to cooperate through distributed shared memory [^1][^12].

## How does the H100 differ from the A100?

The Hopper generation added several capabilities distinguishing the H100 from the [A100](/wiki/nvidia_a100). The headline addition is the **Transformer Engine**, combining fourth-generation Tensor Core hardware with a software runtime that selects FP8 or FP16 precision on a per-layer, per-tensor basis. The runtime tracks tensor statistics during training and dynamically chooses scaling factors so FP8 can be used wherever dynamic range allows, falling back to higher precision where needed for numerical stability. NVIDIA reports this can roughly double training throughput on transformer architectures while preserving final accuracy, and states the Transformer Engine "applies mixed FP8 and FP16 precisions to dramatically accelerate AI calculations for transformers" [^1][^12][^31].
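
The scaling-factor idea can be illustrated with a few lines of plain Python. This is a simplified sketch of per-tensor scaling from a history of absolute maxima, not NVIDIA's Transformer Engine implementation: the `AmaxTracker` class, the history length, and the `fits_fp8` helper are hypothetical, and the value 448 for the largest finite E4M3 magnitude is an assumption taken from the commonly published FP8 format definition.

```python
from collections import deque

# Largest finite E4M3 magnitude; stated here as an assumption about the format.
E4M3_MAX = 448.0


class AmaxTracker:
    """Hypothetical helper: remembers recent absolute maxima of one tensor."""

    def __init__(self, history_len: int = 4) -> None:
        self.history = deque(maxlen=history_len)

    def update(self, values: list[float]) -> None:
        """Record the largest magnitude seen in this step."""
        self.history.append(max(abs(v) for v in values))

    def scale(self, fp8_max: float = E4M3_MAX) -> float:
        """Factor that maps the largest remembered magnitude onto fp8_max."""
        amax = max(self.history, default=0.0)
        return fp8_max / amax if amax > 0.0 else 1.0


def fits_fp8(values: list[float], scale: float, fp8_max: float = E4M3_MAX) -> bool:
    """True if every scaled value stays inside the FP8 range."""
    return all(abs(v * scale) <= fp8_max for v in values)


tracker = AmaxTracker()
tracker.update([0.5, -2.0, 1.0])  # this step's amax is 2.0
tracker.update([0.1, 0.2])        # amax is 0.2, so the history maximum stays 2.0
scale = tracker.scale()           # 448 / 2.0

print(fits_fp8([1.5, -0.25], scale))  # values inside the tracked range
print(fits_fp8([8.0], scale))         # an outlier that would overflow FP8
```

In this toy model, a tensor whose values overflow after scaling is the case where a runtime would either refresh the scale or keep that tensor in 16-bit precision.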

FP8 is supported in two sub-formats:

- **E4M3** (4 exponent, 3 mantissa bits), prioritizing precision for forward activations and weights.
- **E5M2** (5 exponent, 2 mantissa bits), prioritizing dynamic range for gradients during backpropagation [^12].
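
The precision versus range trade-off between the two sub-formats can be made concrete by decoding bit patterns by hand. The sketch below assumes the commonly published FP8 conventions (exponent bias 7 for E4M3 and 15 for E5M2, E4M3 reserving only the all-ones pattern for NaN, E5M2 following IEEE-style infinities and NaNs); the function names are illustrative.

```python
import math


def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int,
               ieee_specials: bool) -> float:
    """Decode one 8-bit pattern (sign, exponent, mantissa) into a float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == (1 << exp_bits) - 1:
        if ieee_specials:  # E5M2: all-ones exponent means inf or NaN
            return sign * math.inf if man == 0 else math.nan
        if man == (1 << man_bits) - 1:  # E4M3: only the all-ones pattern is NaN
            return math.nan
    if exp == 0:  # subnormal: no implicit leading one
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def e4m3(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, ieee_specials=False)


def e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, ieee_specials=True)


if __name__ == "__main__":
    print("E4M3 largest finite:", e4m3(0x7E))
    print("E5M2 largest finite:", e5m2(0x7B))
    print("E4M3 step just above 1.0:", e4m3(0x39) - e4m3(0x38))
    print("E5M2 step just above 1.0:", e5m2(0x3D) - e5m2(0x3C))
```

Under these assumptions E4M3 tops out at 448 with a step of 0.125 just above 1.0, while E5M2 reaches 57,344 with a coarser step of 0.25, which is why the former suits weights and activations and the latter suits gradients.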

Other architectural additions over Ampere include:

- **DPX instructions**, hardware-accelerated primitives for dynamic programming algorithms such as Smith-Waterman in genomics and Floyd-Warshall in routing, with up to 7x speedup over the A100 [^12].
- **Fourth-generation [NVLink](/wiki/nvlink)** with 18 links per GPU, providing 900 GB/s aggregate bidirectional bandwidth per H100 SXM5, a 1.5x increase over A100 [^1].
- **[PCIe](/wiki/pcie) Gen 5**, doubling host I/O bandwidth to 128 GB/s bidirectional [^1].
- **HBM3 memory** at 3.35 TB/s on SXM5, a roughly 1.7x bandwidth uplift over A100 HBM2e [^1].
- **Confidential Computing** via a hardware-based trusted execution environment that isolates workloads from the host OS and hypervisor [^12].
- **[Multi-Instance GPU](/wiki/mig) (MIG) 2.0**, supporting up to seven secure tenants per GPU with hardware isolation including media engines and dedicated NVDEC/NVJPEG per partition [^12].
- **Cluster- and grid-level synchronization primitives**, exposed through new CUDA cooperative group APIs.

Taken together, these additions positioned the H100 as a chip optimized for the transformer-dominated workload mix that emerged in 2022 and 2023.
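
The interconnect figures in the list above can be reproduced with simple arithmetic. The sketch assumes 50 GB/s of bidirectional bandwidth per NVLink link in both the third and fourth generations, 12 links on the A100, and 16 and 32 GT/s per lane for PCIe Gen 4 and Gen 5; none of these per-link values is stated in this article, and the PCIe result ignores encoding and protocol overhead.

```python
def nvlink_total_gb_per_s(links: int, per_link_gb_per_s: float = 50.0) -> float:
    """Aggregate bidirectional NVLink bandwidth for one GPU."""
    return links * per_link_gb_per_s


def pcie_x16_bidir_gb_per_s(gt_per_s_per_lane: float, lanes: int = 16) -> float:
    """Raw bidirectional GB/s of a PCIe link, ignoring encoding overhead."""
    per_direction = gt_per_s_per_lane * lanes / 8  # 8 bits per byte
    return 2 * per_direction


if __name__ == "__main__":
    h100 = nvlink_total_gb_per_s(18)  # fourth-generation NVLink, 18 links
    a100 = nvlink_total_gb_per_s(12)  # assumed 12 links on the A100
    print(f"NVLink: {h100:.0f} vs {a100:.0f} GB/s ({h100 / a100:.1f}x)")
    print(f"PCIe Gen 5 x16: {pcie_x16_bidir_gb_per_s(32):.0f} GB/s")
    print(f"PCIe Gen 4 x16: {pcie_x16_bidir_gb_per_s(16):.0f} GB/s")
    print(f"HBM uplift: {3.35 / 2.0:.2f}x")
```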

## What variants of the H100 exist?

NVIDIA shipped the H100 in several form factors plus export-restricted derivatives for the Chinese market.

| Variant | Form factor | TDP | Memory | Memory bandwidth | NVLink | Notes |
|---|---|---|---|---|---|---|
| H100 SXM5 | SXM5 module | 700 W | 80 GB HBM3 | 3.35 TB/s | 900 GB/s | Flagship; used in HGX and [DGX H100](/wiki/dgx_h100) |
| H100 PCIe | Dual-slot PCIe Gen 5 | 350 W | 80 GB HBM2e (initial); 94 GB HBM3 (later) | 2.0 TB/s (HBM2e); ~3.9 TB/s (HBM3) | 600 GB/s via bridge | Lower thermal envelope |
| H100 NVL | Pair of PCIe cards bridged via NVLink | 2x 400 W | 188 GB HBM3 (94 GB each) | 7.8 TB/s aggregate | 600 GB/s bridge | GTC March 2023, LLM inference |
| [GH200](/wiki/gh200) Grace Hopper | H100 + [Grace](/wiki/grace_hopper) ARM CPU module | ~1000 W | 96 GB HBM3 (later 144 GB HBM3e) | 4 TB/s (HBM3); 4.9 TB/s (HBM3e) | NVLink-C2C 900 GB/s | 72-core Neoverse-V2 plus H100 |
| H800 SXM/PCIe | SXM5 / PCIe | 700 W / 350 W | 80 GB | 3.35 TB/s | 400 GB/s (cut from 900) | China-only, November 2022 [^13] |
| [H20](/wiki/h20) | SXM | 400 W | 96 GB HBM3 | 4 TB/s | 900 GB/s | China-only, 2024, reduced compute |

The SXM5 module is mounted on an HGX baseboard inside reference systems such as the [DGX H100](/wiki/dgx_h100). The PCIe variant is intended for standard servers where the 700 W envelope is impractical [^1][^13]. The H100 NVL, announced at GTC in March 2023, packages two H100 PCIe cards joined by NVLink to expose 188 GB of HBM3 to a single workload, targeting inference of 175 billion-parameter-class transformers without a full HGX baseboard [^13].

The **[H800](/wiki/h800)** was the original China-export variant, introduced in November 2022 after the October 7, 2022 U.S. export controls. It retained H100 compute throughput but cut chip-to-chip NVLink bandwidth from 900 GB/s to 400 GB/s, throttling its usefulness for large-cluster training [^13][^14]. The H800 became the principal training accelerator for several Chinese frontier labs through 2023 and 2024, including [DeepSeek](/wiki/deepseek), which trained its 671-billion-parameter **DeepSeek-V3** model on a cluster of 2,048 H800 SXM5 GPUs over 2.78 million GPU-hours [^15]. The October 17, 2023 BIS update closed the H800 loophole and forced NVIDIA to develop the further-degraded **H20** under the new total-processing-performance metric [^14].
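
As a rough sanity check on the DeepSeek-V3 figures, the GPU-hour total can be converted into wall-clock time and an implied hourly rate. The sketch assumes all 2,048 GPUs ran continuously for the whole job and uses the roughly $5.6 million pre-training cost cited elsewhere in this article; the function names are illustrative.

```python
def wall_clock_days(gpu_hours: float, gpu_count: int) -> float:
    """Days of continuous running if every GPU is busy the whole time."""
    return gpu_hours / gpu_count / 24


def implied_rate_per_gpu_hour(total_cost_usd: float, gpu_hours: float) -> float:
    """Average cost per GPU-hour implied by a total compute bill."""
    return total_cost_usd / gpu_hours


if __name__ == "__main__":
    days = wall_clock_days(2.78e6, 2048)
    rate = implied_rate_per_gpu_hour(5.6e6, 2.78e6)
    print(f"about {days:.0f} days of wall-clock time")
    print(f"about ${rate:.2f} per GPU-hour")
```

Under those assumptions the run corresponds to a little under two months on the cluster at an implied rate of about $2 per GPU-hour.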

## Performance specifications

The table below lists peak theoretical throughput from NVIDIA's Hopper whitepaper for the H100 SXM5. Sparse Tensor Core figures assume the 2:4 structured sparsity pattern introduced with Ampere [^1][^12].

| Precision | Dense throughput | With 2:4 sparsity |
|---|---|---|
| FP64 | 34 TFLOPS | n/a |
| FP64 Tensor Core | 67 TFLOPS | n/a |
| FP32 | 67 TFLOPS | n/a |
| TF32 Tensor Core | 495 TFLOPS | 989 TFLOPS |
| BF16 Tensor Core | 989 TFLOPS | 1,979 TFLOPS |
| FP16 Tensor Core | 989 TFLOPS | 1,979 TFLOPS |
| FP8 Tensor Core | 1,979 TFLOPS | 3,958 TFLOPS |
| INT8 Tensor Core | 1,979 TOPS | 3,958 TOPS |
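
The 2:4 pattern behind the right-hand column requires that at most two of every four consecutive weights be non-zero, which lets the Tensor Cores skip half of the multiplications and doubles the peak rate. A minimal sketch of producing such a pattern is shown here; magnitude-based selection is one common pruning choice and is an assumption, not something this article specifies.

```python
def prune_2_of_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude values in every group of four."""
    if len(weights) % 4:
        raise ValueError("length must be a multiple of 4")
    pruned: list[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes; ties keep the earlier index.
        keep = sorted(range(4), key=lambda i: -abs(group[i]))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    dense = [0.1, -0.9, 0.3, 0.05, 2.0, 0.0, -1.0, 0.5]
    print(prune_2_of_4(dense))
```

Real workflows typically fine-tune the model after pruning to recover accuracy, so the sparse peak applies only to networks prepared this way.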

Memory and interconnect specifications:

| Specification | H100 SXM5 | H100 PCIe (HBM2e) | H100 NVL (per pair) |
|---|---|---|---|
| Memory | 80 GB HBM3 | 80 GB HBM2e | 188 GB HBM3 |
| Bandwidth | 3.35 TB/s | 2.0 TB/s | 7.8 TB/s aggregate |
| Interconnect | NVLink 4.0 (900 GB/s) | PCIe Gen 5 (128 GB/s); optional NVLink bridge | NVLink bridge |
| TDP | 700 W | 300 to 350 W | 2 x 350 to 400 W |
| MIG | Up to 7 instances | Up to 7 instances | Per-card |

In [MLPerf](/wiki/mlperf) Training v3.0 (June 2023), NVIDIA reported H100 systems completing the GPT-3 175B reference workload about 2.8x faster than equivalent A100 systems at the same GPU count, with the gap widening at larger scale due to NVLink and NVSwitch improvements [^16]. MLPerf Inference v3.1 included the first FP8 results on GPT-J 6B, roughly 4x the A100 baseline [^16].

## DGX H100 systems

The **[DGX H100](/wiki/dgx_h100)** is NVIDIA's reference appliance built around the H100 SXM5. Each DGX H100 contains:

- 8 H100 SXM5 GPUs on an HGX baseboard, totaling 640 GB HBM3 (8 x 80 GB).
- 4 fourth-generation [NVSwitch](/wiki/nvswitch) chips providing all-to-all NVLink connectivity at 900 GB/s per GPU.
- 2 Intel Xeon Platinum 8480C CPUs (56 cores each) and 2 TB of system DRAM in early SKUs.
- 8 ConnectX-7 400 Gb/s InfiniBand or Ethernet NICs, plus 2 dual-port BlueField-3 DPUs.
- 30 TB of NVMe SSD storage in the reference configuration.
- An 8U rackmount chassis with peak power of approximately 10.2 kW [^17].

NVIDIA quotes 32 PFLOPS of FP8 performance per DGX H100 (8 x 3.958 PFLOPS, the per-GPU peak with 2:4 sparsity), or roughly 16 PFLOPS dense. The DGX H100 shipped to customers in late 2022 and early 2023 and replaced the DGX A100 at the top of the product line [^17].
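
The system-level totals are simple multiples of the per-GPU figures. The sketch below reproduces them, along with the GPU counts of the 32-node SuperPOD unit and the 576-node Eos system described in this article; the helper names are illustrative.

```python
GPUS_PER_DGX = 8


def dgx_totals(hbm_gb_per_gpu: float = 80.0,
               fp8_pflops_per_gpu: float = 3.958) -> tuple[float, float]:
    """Return (total HBM in GB, total quoted FP8 PFLOPS) for one DGX H100."""
    return GPUS_PER_DGX * hbm_gb_per_gpu, GPUS_PER_DGX * fp8_pflops_per_gpu


def cluster_gpus(dgx_nodes: int) -> int:
    """Number of GPUs in a cluster built from whole DGX nodes."""
    return dgx_nodes * GPUS_PER_DGX


if __name__ == "__main__":
    hbm, fp8 = dgx_totals()
    print(f"DGX H100: {hbm:.0f} GB HBM3, {fp8:.1f} PFLOPS FP8 (quoted as 32)")
    print(f"SuperPOD unit: {cluster_gpus(32)} GPUs")
    print(f"Eos: {cluster_gpus(576)} GPUs")
```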

The related **HGX H100** baseboard exposes the same eight-GPU NVLink topology but is sold to OEMs (Supermicro, Dell, HPE, Lenovo, Quanta, and others) for integration into custom server chassis. The bulk of hyperscale H100 deployments used HGX boards rather than full DGX systems.

## SuperPOD and large clusters

NVIDIA's reference design for an H100 supercomputer is the **[DGX SuperPOD](/wiki/dgx_superpod) with H100**, combining 32 DGX H100 nodes (256 GPUs) into a single scalable unit connected by an NVLink Switch System and InfiniBand fabric. Multiple SuperPODs combine into SuperClusters [^17][^18].

Notable H100-based systems in production between 2023 and 2025:

- **[Eos](/wiki/eos_supercomputer)**: NVIDIA's own H100 supercomputer, with 576 DGX H100 systems totaling 4,608 GPUs. Announced at GTC 2022, online in 2023, used for internal model training and MLPerf submissions [^18].
- **Israel-1**: an NVIDIA-owned generative AI supercomputer in Israel, built on Spectrum-X Ethernet networking. Disclosed in May 2023 [^19].
- **[Colossus](/wiki/colossus_supercomputer)** ([xAI](/wiki/xai)): announced by Elon Musk in September 2024 with 100,000 H100 GPUs in Memphis, Tennessee, and scaled by mid-2025 to roughly 150,000 H100s plus 50,000 H200s and 30,000 GB200 GPUs in a single fabric exceeding 250 MW of power draw [^8]. xAI began construction on **Colossus 2** in March 2025, targeting a gigawatt-class follow-on facility with hundreds of thousands of Blackwell-generation GPUs to train future Grok models [^8].
- **Meta H100 fleet**: Meta committed to operating roughly 350,000 H100s and a total of about 600,000 H100-equivalent GPUs by the end of 2024, per Mark Zuckerberg's January 2024 disclosure; **Llama 4** was subsequently trained on a cluster exceeding 100,000 H100s, which Zuckerberg described as "bigger than anything that I've seen reported for what others are doing" [^20].
- **Microsoft Azure** disclosed multiple H100 capacity buildouts during 2023 and 2024 to support OpenAI training and inference [^3].
- **[Oracle Cloud Infrastructure](/wiki/oracle)** offered H100 SuperClusters scaling up to 16,384 GPUs with up to 65 EFLOPS of FP8 throughput, using RDMA over Converged Ethernet (RoCEv2) with NVIDIA ConnectX-7 NICs in place of InfiniBand [^21].
- **[Stargate Project](/wiki/stargate_project)**: announced January 21, 2025 at the White House by President Donald Trump alongside Sam Altman ([OpenAI](/wiki/openai)), Larry Ellison ([Oracle](/wiki/oracle)), and Masayoshi Son ([SoftBank](/wiki/softbank)). The joint venture committed $500 billion over four years, including $100 billion to be deployed immediately, beginning with a data-center campus in Abilene, Texas. While the bulk of the rollout uses GB200 and GB300 Blackwell systems, the initial Abilene buildout incorporated significant H100 and H200 capacity supplied by Oracle [^22].

These systems combine an NVLink Switch System for intra-pod connectivity with InfiniBand or Spectrum-X / RoCEv2 Ethernet for cluster-wide scale-out.

## How much did the H100 cost?

The launch of OpenAI's ChatGPT in November 2022 triggered a surge in demand for accelerators capable of training and serving large transformer models. By the first half of 2023 the H100 had become the de facto standard chip for state-of-the-art generative AI workloads, and NVIDIA's lead times stretched to 36 to 52 weeks for many enterprise buyers, according to reporting in Bloomberg and the Financial Times [^4][^5].

Reported per-unit prices for the H100 SXM5 in 2023 ranged from approximately $25,000 to over $40,000, with the higher figures associated with HGX configurations and resellers in supply-constrained regions. PCIe variants generally sold for $25,000 to $35,000 [^4][^5]. Cloud rental rates moved in step: hyperscalers and specialty providers initially priced on-demand H100 instances at $4 to $8 per GPU-hour in 2023, with reserved capacity at lower effective rates [^4].

The 2025 pricing collapse marked the transition of the H100 from frontier hardware to commodity inference fleet:

- By Q4 2025, neoclouds such as Lambda, RunPod, Vast.ai, and Cudo Compute were quoting on-demand H100 SXM5 instances at $1.49 to $2.99 per GPU-hour, with one-year reservations bottoming near $1.70/hr per GPU in October 2025 [^6][^7].
- AWS cut on-demand H100 instance pricing by roughly 44 percent in June 2025; by May 2026 AWS and GCP listed H100 on-demand capacity in the $3 to $4 range, while marketplace spot pricing repeatedly dipped under $2 [^6][^7].
- One-year H100 contract pricing then rebounded approximately 40 percent between October 2025 and March 2026, reaching about $2.35/hr per GPU as Blackwell Ultra rollouts and tightening on-demand supply re-balanced the market [^6].
- A used eight-GPU H100 HGX server traded for roughly $150,000 to $180,000 in early 2026, versus approximately $500,000 for a new B300 server, making H100 capacity the cheapest path to dense FP8 inference throughput per dollar of capex [^7].
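
The pricing figures in the list above lend themselves to a quick back-of-the-envelope check. The sketch below recomputes the contract-price rebound and the per-GPU cost of a used server, then estimates how many rented GPU-hours would equal that purchase price. The $2 per GPU-hour comparison rate is an illustrative assumption, and the break-even figure ignores power, hosting, networking, and utilization below 100 percent.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


def per_gpu_price(server_price_usd: float, gpus: int = 8) -> float:
    """Purchase price attributed to each GPU in a multi-GPU server."""
    return server_price_usd / gpus


def breakeven_hours(price_per_gpu_usd: float, rental_rate_usd: float) -> float:
    """Rented GPU-hours that cost as much as buying one GPU outright."""
    return price_per_gpu_usd / rental_rate_usd


if __name__ == "__main__":
    print(f"contract rebound: {pct_change(1.70, 2.35):.1f}%")
    low, high = per_gpu_price(150_000), per_gpu_price(180_000)
    print(f"used server: ${low:,.0f} to ${high:,.0f} per GPU")
    print(f"break-even at $2/hr: {breakeven_hours(low, 2.0):,.0f} GPU-hours")
```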

Key enterprise customers through 2025 included Microsoft, Meta, Google, Amazon, Oracle, CoreWeave, Tencent, ByteDance (subject to export-control limits), Tesla, and xAI [^3]. NVIDIA's data-center revenue grew from roughly $15 billion in fiscal 2023 to about $47.5 billion in fiscal 2024 and **$115.2 billion in fiscal 2025**, the bulk attributable to H100 and H200 shipments. In Q3 of fiscal 2026 (ending October 2025) NVIDIA reported a record $51.2 billion in single-quarter data-center revenue, up 66 percent year over year, with H200 and GB200/GB300 Blackwell systems leading the mix [^23]. NVIDIA's market capitalization, around $300 billion in late 2022, exceeded $3 trillion by mid-2024 and crossed $4 trillion in 2025 [^3][^23].

## What is the H100 used for?

The H100 became the workhorse chip for foundation model development between 2023 and 2024 and remained the dominant inference engine through 2025 and 2026. Reported and disclosed workloads include:

- **Frontier large language model training**: OpenAI's [GPT-4](/wiki/gpt-4), trained on a mixture of A100 and H100 capacity through Microsoft Azure; Anthropic's [Claude](/wiki/claude) 2 and Claude 3 families; Meta's [Llama](/wiki/llama) 2 (A100 clusters) and Llama 3, trained on a 24,576-GPU cluster of H100s per Meta's published technical report, then Llama 4 trained on a cluster exceeding 100,000 H100s in 2024 to 2025 [^20]; [Mistral](/wiki/mistral) AI's Mistral and Mixtral models; and the GPU portion of Google [Gemini](/wiki/gemini) training alongside [TPU](/wiki/tpu) capacity [^3].
- **China frontier training under export controls**: [DeepSeek](/wiki/deepseek)-V3 and DeepSeek-R1 were trained on roughly 2,048 H800 SXM5 GPUs, with a disclosed pre-training cost on the order of $5.6 million in compute time, demonstrating that the H100 family, even in its degraded-interconnect form, remained capable of frontier-quality training when paired with aggressive sparsity and reinforcement-learning techniques [^15].
- **LLM inference**: production serving of Llama 3 70B and 405B, GPT-4, Claude, DeepSeek-R1, and Mistral on H100 NVL and SXM5 nodes using TensorRT-LLM, vLLM, and SGLang.
- **Diffusion image and video**: [Stable Diffusion](/wiki/stable_diffusion) XL, Midjourney, and Sora-class video models train and serve on H100, H200, and Blackwell hardware.
- **Speech**: large-scale Whisper inference and transcription pipelines.
- **Recommender systems**: Meta has publicly described migrating large recommender models to H100 to improve embedding-table throughput.
- **Scientific computing**: drug discovery, AlphaFold-style protein structure prediction, climate modeling, computational chemistry, fluid dynamics, and seismic processing, with the FP64 Tensor Core path delivering 67 TFLOPS of double-precision throughput per GPU.
- **Autonomous driving**: training of perception, planning, and end-to-end driving models at Tesla, Waymo, Cruise, and other automakers.

## When was the H100 released, and what came before and after it?

The table below summarizes how the H100 fits into NVIDIA's data-center GPU roadmap, drawing on NVIDIA product pages and architecture whitepapers [^1][^9][^10][^11][^12][^24].

| Specification | A100 (SXM4, 80 GB) | H100 (SXM5) | H200 (SXM) | B200 (SXM) | B300 (Blackwell Ultra) |
|---|---|---|---|---|---|
| Architecture | [Ampere](/wiki/ampere_microarchitecture) | [Hopper](/wiki/hopper_microarchitecture) | Hopper | [Blackwell](/wiki/nvidia_blackwell) | Blackwell Ultra |
| Announced | May 2020 | March 2022 | November 2023 | March 2024 | March 2025 |
| Process | TSMC 7 nm (N7) | TSMC 4N | TSMC 4N | TSMC 4NP | TSMC 4NP |
| Transistors | 54.2 billion | 80 billion | 80 billion | 208 billion (2 dies) | 208 billion (2 dies) |
| Memory | 80 GB HBM2e | 80 GB HBM3 | 141 GB HBM3e | 192 GB HBM3e | 288 GB HBM3e |
| Memory bandwidth | 2.0 TB/s | 3.35 TB/s | 4.8 TB/s | 8.0 TB/s | 8.0 TB/s |
| FP8/FP4 Tensor (dense) | n/a | 1,979 TFLOPS FP8 | 1,979 TFLOPS FP8 | 4,500 TFLOPS FP8 / 9 PFLOPS FP4 | 15 PFLOPS NVFP4 |
| BF16 Tensor (dense) | 312 TFLOPS | 989 TFLOPS | 989 TFLOPS | 2,250 TFLOPS | ~2,250 TFLOPS |
| NVLink bandwidth | 600 GB/s | 900 GB/s | 900 GB/s | 1.8 TB/s | 1.8 TB/s |
| TDP | 400 W | 700 W | 700 W | 1,000 W | 1,400 W |

The **A100**, launched at GTC in May 2020, established the SXM4 form factor and HGX baseboard pattern that the H100 inherited. Built on TSMC's 7 nm N7 process with 54.2 billion transistors and supporting third-generation NVLink at 600 GB/s, it shipped in 40 GB HBM2 and 80 GB HBM2e configurations and was the workhorse for the first wave of large-model training, including the GPT-3.5 models behind the original ChatGPT [^24].

The **[H200](/wiki/nvidia_h200)**, announced on November 13, 2023, retained the GH100 Hopper SM design but upgraded memory to 141 GB of [HBM3e](/wiki/hbm3e) at 4.8 TB/s, roughly a 76 percent capacity and 43 percent bandwidth increase over the H100 SXM5. NVIDIA quoted nearly 2x inference speedup on Llama 2 70B in early benchmarks. The H200 reached general availability in Q2 2024 [^9].

The **[Blackwell](/wiki/nvidia_blackwell)** generation was announced by Jensen Huang at GTC on March 18, 2024. The B200 uses two reticle-limit dies connected by a 10 TB/s die-to-die link, the NVIDIA High-Bandwidth Interface (NV-HBI), to form a single logical GPU with 208 billion transistors, delivering up to 20 PFLOPS FP4 and 2.5x training / 5x inference improvement over the H100 on certain LLM workloads. The [GB200](/wiki/nvidia_gb200) Grace Blackwell Superchip pairs two B200 GPUs with one Grace CPU, and the GB200 NVL72 rack-scale system aggregates 36 GB200 superchips (72 B200 GPUs, 36 Grace CPUs) into a single liquid-cooled rack [^10].

**Blackwell Ultra** (B300 and GB300 NVL72) was announced at GTC on March 18, 2025, and began shipping in the second half of 2025. The B300 delivers 15 PFLOPS of NVFP4, 288 GB of HBM3e, and 8 TB/s memory bandwidth at a 1,400 W TDP. NVIDIA positioned the GB300 NVL72 as offering 1.5x the AI throughput of the GB200 NVL72 and approximately 7.5x the H100's dense FP8 throughput (the H100 has no native FP4 path) on reasoning-style inference; by Q4 of fiscal 2026 NVIDIA reported that GB300 shipments had surpassed GB200 and accounted for roughly two-thirds of Blackwell revenue [^11][^23].

## Software ecosystem

The H100 is supported throughout NVIDIA's software stack:

- **[CUDA](/wiki/cuda) Toolkit 12.0** (December 2022) was the first release to expose all H100 features, including thread block clusters, the Tensor Memory Accelerator, distributed shared memory, and FP8 datatypes [^25].
- **cuBLAS, [cuDNN](/wiki/cudnn), and [NCCL](/wiki/nccl)** received Hopper-specific optimizations for matrix multiplication, convolution, and collective operations, including FP8 GEMM kernels and NVLink Switch System support.
- **[TensorRT-LLM](/wiki/tensorrt_llm)**, released October 2023, provides H100-optimized inference kernels for Llama, GPT, Falcon, Mixtral, and other LLM architectures, with FP8 quantization and in-flight batching.
- **OpenAI [Triton](/wiki/nvidia_triton)** gained a Hopper backend exposing the new asynchronous primitives.
- **[PyTorch](/wiki/pytorch) 2.0** (March 2023) and later added compilation and FP8 paths for Hopper through `torch.compile`, Transformer Engine, and FlashAttention 2 and 3 integrations. **[TensorFlow](/wiki/tensorflow) 2.13** and later expose H100 kernels through XLA and Transformer Engine.
- **Magnum IO** is a set of GPU-accelerated I/O libraries that drive InfiniBand, RDMA, and storage at line rate, while **NVIDIA AI Enterprise** is the commercial production-deployment suite for certified H100 hardware.

**FlashAttention 3**, by Tri Dao and collaborators, was released in mid-2024 specifically tuned for Hopper, exploiting the asynchronous TMA, warp-specialization patterns, and FP8 Tensor Cores to reach up to about 740 TFLOPS in FP16, roughly 75 percent of the H100's dense FP16 peak, and close to 1.2 PFLOPS in FP8 on attention kernels [^26].

## Why was the H100 export-controlled?

The H100 sat at the center of an evolving United States export-control regime aimed at restricting advanced AI accelerator flows to China.

- **October 7, 2022**: the U.S. Commerce Department's [Bureau of Industry and Security](/wiki/bureau_of_industry_and_security) restricted exports of advanced AI chips, including the H100 and A100, to China and several other destinations. The rules used a combined threshold based on chip-to-chip interconnect bandwidth and compute performance [^14].
- **November 2022**: NVIDIA introduced the **[H800](/wiki/h800)** for China, derived from the H100 but with NVLink bandwidth reduced from 900 GB/s to 400 GB/s. A corresponding **A800** was derived from the A100 [^14].
- **October 17, 2023**: BIS dropped the interconnect-bandwidth criterion in favor of thresholds based on total processing performance and performance density, closing the H800/A800 loophole. NVIDIA halted new H800 and A800 shipments to China [^14].
- **2024**: NVIDIA introduced the **[H20](/wiki/h20)**, **L20**, and **L2** China-specific products under the October 2023 rules. The H20 retains 96 GB HBM3 and 4 TB/s bandwidth but with sharply reduced compute, reportedly under 30 percent of H100 training throughput on transformer workloads [^14][^27].
- **April 2025**: the [Trump administration](/wiki/donald_trump) effectively banned H20 exports to China, requiring a license for every shipment indefinitely. NVIDIA disclosed an approximately $5.5 billion inventory and prepaid-contract charge in Q1 of fiscal 2026 [^28].
- **July 2025**: the administration reversed course and authorized resumed H20 shipments under a revenue-share arrangement reported to be 15 percent of relevant China sales; the reversal followed lobbying by Jensen Huang and concern about ceding the Chinese inference market to domestic competitors such as Huawei Ascend [^28][^29].

The H100 [export controls](/wiki/export_controls) became a major story in 2023 to 2025 coverage of AI, framed both as a national-security measure and as a constraint on Chinese frontier AI development. DeepSeek's 2024 to 2025 releases of V3 and R1, trained largely on H800 capacity, reignited debate over the effectiveness of the regime [^15].

## Reception and impact

The H100 was widely covered as the central product of the generative AI boom. Independent reviews from AnandTech, ServeTheHome, and The Next Platform validated NVIDIA's throughput improvements over the A100 and highlighted the Transformer Engine and NVLink Switch System as the most significant additions [^12][^30]. Analysts at Morgan Stanley and Bernstein attributed the bulk of NVIDIA's data-center revenue growth in fiscal 2024 to H100 shipments, with unit volume reaching the low millions during 2024 [^3].

The phrase "compute as the new oil" became common shorthand in 2023, reflecting the perception that H100 supply, more than algorithmic novelty, was the binding constraint on frontier model development. NVIDIA's market capitalization exceeded $1 trillion in mid-2023, $2 trillion in February 2024, and $3 trillion by June 2024, briefly making it the world's most valuable public company, and crossed $4 trillion during 2025 as Blackwell ramped and H100 inference fleets matured [^3][^23].

Critics raised three concerns: vendor concentration risk (which led hyperscalers to accelerate internal programs such as [AWS Trainium](/wiki/aws_trainium), Google [TPU](/wiki/tpu), and Microsoft Maia); the energy footprint, with single training runs consuming gigawatt-hours; and supply-chain concentration at TSMC for fabrication and SK Hynix and Micron for HBM [^3][^30].

Competing accelerators in 2023 to 2025 included [AMD's MI300X](/wiki/amd_mi300x) (December 2023, 192 GB HBM3, LLM inference focus) and the follow-on MI325X and MI350 series, Google [TPU](/wiki/tpu) v4, v5p, and v6 Trillium, [AWS Trainium](/wiki/aws_trainium) and Trainium2, [Cerebras](/wiki/cerebras) CS-3, [Tenstorrent](/wiki/tenstorrent), and [Groq](/wiki/groq_hardware) Language Processing Units. The H100 remained the dominant training accelerator through 2024 and the dominant cloud inference accelerator through 2025, with industry estimates putting NVIDIA's share of the merchant AI accelerator market above 80 percent into 2026 [^3].

## Limitations

Despite its commercial success, the H100 had several recognized limitations:

- **Power**: the SXM5's 700 W TDP and eight GPUs per HGX baseboard drove DGX H100 systems to roughly 10.2 kW peak, stressing air-cooling infrastructure and accelerating the move to direct-to-chip and rear-door liquid cooling.
- **Cost**: at $25,000 to $40,000 per unit during peak demand, full DGX H100 systems exceeded $400,000 each, and SuperPOD deployments cost hundreds of millions of dollars.
- **Memory capacity**: 80 GB HBM3 was a bottleneck for inference of larger models, where weights plus KV caches exceeded the per-GPU budget. The H100 NVL partially addressed this with 188 GB across two cards; lasting fixes arrived with the H200 (141 GB HBM3e), B200 (192 GB HBM3e), and B300 (288 GB HBM3e).
- **Supply chain concentration**: GH100 fabrication relied on a single TSMC 4N line, and HBM3 packaging depended on CoWoS-S capacity with multi-quarter lead times.
- **Software complexity**: extracting full FP8 Transformer Engine throughput required model-side cooperation, and not all open-source frameworks reached parity in the first year.
- **Generational obsolescence on reasoning workloads**: by 2025, long-context reasoning models such as DeepSeek-R1 and o3-class systems shifted the cost-optimal frontier toward Blackwell and Blackwell Ultra, which deliver substantially better throughput per watt on NVFP4 and very-long-sequence attention. H100 capacity remained competitive only on inference of smaller or quantized models at lower price points.
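
The memory-capacity limitation above can be made concrete with a rough weights-only estimate. The sketch assumes one byte per parameter for FP8 and two for FP16 or BF16, uses decimal gigabytes, and ignores activations and framework overhead; the capacities are the per-device figures quoted in this article, and the helper names are illustrative.

```python
CAPACITY_GB = {
    "H100": 80,
    "H100 NVL (pair)": 188,
    "H200": 141,
    "B200": 192,
    "B300": 288,
}


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for model weights alone, in decimal GB."""
    return params_billion * bytes_per_param


def fits(params_billion: float, bytes_per_param: float,
         capacity_gb: float, kv_cache_gb: float = 0.0) -> bool:
    """True if weights plus a given KV-cache budget fit in capacity_gb."""
    return weights_gb(params_billion, bytes_per_param) + kv_cache_gb <= capacity_gb


if __name__ == "__main__":
    for params, nbytes, label in [(70, 2, "70B FP16"), (70, 1, "70B FP8"),
                                  (175, 1, "175B FP8")]:
        homes = [name for name, cap in CAPACITY_GB.items()
                 if fits(params, nbytes, cap)]
        print(f"{label}: {weights_gb(params, nbytes):.0f} GB -> fits on {homes}")
```

By this estimate a 70-billion-parameter model in 16-bit precision does not fit on a single 80 GB H100 even before any KV cache is allocated, while a 175-billion-parameter model in FP8 fits within the 188 GB of an H100 NVL pair.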

## See also

- [Microsoft Maia 200](/wiki/microsoft_maia_200)
- [SambaNova SN50](/wiki/sambanova_sn50)
- [NVIDIA Spectrum-XGS](/wiki/nvidia_spectrum_xgs)
- [OpenAI-Broadcom AI accelerators](/wiki/openai_broadcom_accelerators)
- [NVIDIA DGX Station for Windows](/wiki/nvidia_dgx_station_for_windows)
- [Reliance Jamnagar AI data center](/wiki/reliance_jamnagar_ai_data_center)
- [NVIDIA DSX](/wiki/nvidia_dsx)
- [Hopper microarchitecture](/wiki/hopper_microarchitecture), [Ampere microarchitecture](/wiki/ampere_microarchitecture), [NVIDIA Blackwell](/wiki/nvidia_blackwell)
- [NVIDIA A100](/wiki/nvidia_a100), [NVIDIA H200](/wiki/nvidia_h200), [NVIDIA B200](/wiki/nvidia_b200), [NVIDIA GB200](/wiki/nvidia_gb200), [GH200](/wiki/gh200), [NVIDIA Grace Hopper](/wiki/nvidia_grace_hopper)
- [DGX H100](/wiki/dgx_h100), [DGX SuperPOD](/wiki/dgx_superpod), [Eos supercomputer](/wiki/eos_supercomputer), [Colossus supercomputer](/wiki/colossus_supercomputer), [Stargate Project](/wiki/stargate_project)
- [Transformer Engine](/wiki/transformer_engine), [FP8](/wiki/fp8), [BF16](/wiki/bf16), [HBM3](/wiki/hbm3), [HBM3e](/wiki/hbm3e), [Tensor Core](/wiki/tensor_core), [MIG](/wiki/mig), [MLPerf](/wiki/mlperf)
- [NVLink](/wiki/nvlink), [NVSwitch](/wiki/nvswitch), [PCIe](/wiki/pcie), [TSMC](/wiki/tsmc), [CUDA](/wiki/cuda), [TensorRT-LLM](/wiki/tensorrt_llm), [Triton](/wiki/nvidia_triton), [PyTorch](/wiki/pytorch), [TensorFlow](/wiki/tensorflow), [cuDNN](/wiki/cudnn), [NCCL](/wiki/nccl)
- [AMD MI300X](/wiki/amd_mi300x), [TPU](/wiki/tpu), [AWS Trainium](/wiki/aws_trainium), [Cerebras](/wiki/cerebras), [Groq](/wiki/groq_hardware), [Tenstorrent](/wiki/tenstorrent)
- [Jensen Huang](/wiki/jensen_huang), [Grace Hopper](/wiki/grace_hopper), [GTC](/wiki/gtc), [Bureau of Industry and Security](/wiki/bureau_of_industry_and_security), [H800](/wiki/h800), [H20](/wiki/h20), [Export controls](/wiki/export_controls), [Data center](/wiki/data_center), [DeepSeek](/wiki/deepseek)

## References

[^1]: NVIDIA Corporation. "NVIDIA H100 Tensor Core GPU Architecture." Hopper architecture whitepaper, 2022. https://resources.nvidia.com/en-us-tensor-core/gtc22-whitepaper-hopper

[^2]: NVIDIA Corporation. "NVIDIA H100 Tensor Core GPU." Product page. https://www.nvidia.com/en-us/data-center/h100/

[^3]: NVIDIA Corporation. "NVIDIA Investor Relations: Quarterly earnings releases and conference calls," fiscal 2023 to fiscal 2026. https://investor.nvidia.com/financial-info/financial-reports/

[^4]: Don Clark. "Nvidia's AI Chip Demand Has Customers Waiting Up to a Year." The Wall Street Journal / Bloomberg coverage, August 2023.

[^5]: Richard Waters and Tim Bradshaw. "AI groups spend more than $50bn on Nvidia chips." Financial Times, 2023.

[^6]: Silicon Data. "H100 Rental Price Over Time (2023 to 2025): A Complete Market Analysis." 2026. https://www.silicondata.com/blog/h100-rental-price-over-time

[^7]: Introl. "GPU Cloud Prices Collapse: H100 Market, December 2025." 2025. https://introl.com/blog/gpu-cloud-price-collapse-h100-market-december-2025

[^8]: xAI Corporation and Wikipedia contributors. "Colossus (supercomputer)." Press release and statements by Elon Musk, September 2024 through 2025. https://x.ai/colossus and https://en.wikipedia.org/wiki/Colossus_(supercomputer)

[^9]: NVIDIA Corporation. "NVIDIA H200 Tensor Core GPU." Press release and product page, November 13, 2023. https://www.nvidia.com/en-us/data-center/h200/

[^10]: NVIDIA Corporation. "NVIDIA Blackwell platform: B100, B200, GB200." GTC March 2024 announcement materials. https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/

[^11]: NVIDIA Corporation. "NVIDIA Blackwell Ultra AI Factory Platform Paves Way for Age of AI Reasoning." NVIDIA Newsroom, March 18, 2025. https://nvidianews.nvidia.com/news/nvidia-blackwell-ultra-ai-factory-platform-paves-way-for-age-of-ai-reasoning

[^12]: NVIDIA Corporation. "NVIDIA Hopper Architecture In-Depth." NVIDIA Developer Blog, March 22, 2022. https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/

[^13]: NVIDIA Corporation. "NVIDIA H100 NVL for High-Performance Inference." GTC March 2023 announcement.

[^14]: U.S. Bureau of Industry and Security. "Export Controls on Advanced Computing and Semiconductor Manufacturing Items." Federal Register notices, October 7, 2022 and October 17, 2023. https://www.bis.doc.gov/

[^15]: DeepSeek. "DeepSeek-V3 Technical Report." December 2024. https://arxiv.org/abs/2412.19437; NextPlatform, "How Did DeepSeek Train Its AI Model On A Lot Less - And Crippled - Hardware?", January 27, 2025. https://www.nextplatform.com/ai/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/

[^16]: MLCommons. "MLPerf Training v3.0 and Inference v3.1 Results." 2023. https://mlcommons.org/benchmarks/

[^17]: NVIDIA Corporation. "NVIDIA DGX H100 Datasheet." 2022. https://www.nvidia.com/en-us/data-center/dgx-h100/

[^18]: NVIDIA Corporation. "NVIDIA Eos: Powering AI Innovation." NVIDIA Blog, 2023. https://blogs.nvidia.com/blog/eos-supercomputer-mlperf/

[^19]: NVIDIA Corporation. "Israel-1 Generative AI Supercomputer." Press release, May 2023.

[^20]: Tom's Hardware. "Meta is using more than 100,000 Nvidia H100 AI GPUs to train Llama-4." 2024. https://www.tomshardware.com/tech-industry/artificial-intelligence/meta-is-using-more-than-100-000-nvidia-h100-ai-gpus-to-train-llama-4; HPCwire. "Meta's Zuckerberg Puts Its AI Future in the Hands of 600,000 GPUs." January 25, 2024. https://www.hpcwire.com/2024/01/25/metas-zuckerberg-puts-its-ai-future-in-the-hands-of-600000-gpus/

[^21]: Oracle Corporation. "Oracle Offers First Zettascale Cloud Computing Cluster." September 11, 2024. https://www.oracle.com/news/announcement/ocw24-oracle-offers-first-zettascale-cloud-computing-cluster-2024-09-11/

[^22]: OpenAI. "Announcing The Stargate Project." January 21, 2025. https://openai.com/index/announcing-the-stargate-project/; OpenAI. "Stargate advances with 4.5 GW partnership with Oracle." 2025. https://openai.com/index/stargate-advances-with-partnership-with-oracle/

[^23]: NVIDIA Corporation. "NVIDIA Announces Financial Results for Third Quarter Fiscal 2026." November 2025. https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-third-quarter-fiscal-2026

[^24]: NVIDIA Corporation. "NVIDIA A100 Tensor Core GPU Architecture." Ampere architecture whitepaper, 2020. https://www.nvidia.com/en-us/data-center/a100/

[^25]: NVIDIA Corporation. "CUDA Toolkit 12.0 Release Notes." December 2022. https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/

[^26]: Tri Dao, Daniel Haziza, et al. "FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision." arXiv:2407.08608, July 2024.

[^27]: Reuters. "Nvidia readies new China-tailored chips after U.S. export curbs." November 2023.

[^28]: NPR. "Nvidia discloses that U.S. will limit sales of advanced chips to China after all." April 16, 2025. https://www.npr.org/2025/04/16/nx-s1-5366665/nvidia-china-h20-chips-exports

[^29]: TIME. "What to Know About Trump's Nvidia Deal and China's Response." 2025. https://time.com/7309264/nvidia-trump-china-chips-deal-h20-blackwell-national-security-concerns/

[^30]: ServeTheHome. "NVIDIA H100 Hopper Detailed Architecture and Performance." 2022 to 2023 coverage. https://www.servethehome.com/

[^31]: NVIDIA Corporation. "NVIDIA Hopper Architecture." Product technology page. https://www.nvidia.com/en-us/data-center/technologies/hopper-architecture/

[^32]: NVIDIA Corporation. "NVIDIA Hopper GPU Architecture Vaults NVIDIA's Data Center Computing to New Heights." NVIDIA Blog, March 22, 2022. https://blogs.nvidia.com/blog/2022/03/22/ai-factories-hopper-h100-nvidia-ceo-jensen-huang/