NVIDIA Hopper


Updated Jun 21, 2026


NVIDIA Hopper is the codename for NVIDIA's ninth-generation datacenter GPU microarchitecture, announced March 22, 2022 by CEO Jensen Huang at the GTC keynote.^[1] Hopper succeeds the Ampere architecture used in the NVIDIA A100 and is the foundation for the NVIDIA H100, the NVIDIA H200, and the GH200 Grace Hopper Superchip.^[2] The architecture is named after Rear Admiral Grace Hopper, the U.S. Navy computer scientist who developed the first compiler and helped create COBOL.^[9]

Hopper was the workhorse of the generative AI boom from 2022 through 2024. Many frontier large language models trained during that period ran on H100 clusters, including Meta's Llama 3.1 405B and Databricks' DBRX, although some headline models did not: OpenAI finished training GPT-4 in August 2022, before H100 volume shipments, and Google trained Gemini on its own TPUs. The architecture introduced fourth-generation Tensor Cores with the FP8 numerical format, the Transformer Engine for dynamic precision selection, fourth-generation NVLink at 900 GB/s, the NVLink Switch System for 256-GPU domains, and the Tensor Memory Accelerator for asynchronous data movement.^[2] At launch NVIDIA quantified the generational gain as up to 9x faster training on a 395-billion-parameter Mixture-of-Experts model and up to 30x higher inference throughput on a Megatron 530B chatbot compared to A100.^[17] Hopper was eventually succeeded by NVIDIA Blackwell in 2024, but H100 and H200 inventory continues to dominate datacenter AI capacity well into 2026.

When was Hopper announced?

NVIDIA announced Hopper on March 22, 2022 during the GTC spring keynote. Jensen Huang revealed the H100 GPU as the first product based on the new GH100 die, manufactured by TSMC on a custom 4N process node (a 5nm-class derivative tuned specifically for NVIDIA).^[1] The H100 began shipping in volume in the third quarter of 2022 through SXM5 and PCIe form factors.^[3] Framing the launch, Huang said: "Data centers are becoming AI factories, processing and refining mountains of data to produce intelligence. NVIDIA H100 is the engine of the world's AI infrastructure that enterprises use to accelerate their AI-driven businesses."^[17]

The codename follows NVIDIA's tradition of naming microarchitectures after pioneering scientists. Earlier generations included Tesla, Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere. Hopper sits between Ampere (A100, 2020) and Blackwell (B100/B200, 2024).^[9] NVIDIA's Grace CPU, which targets the Arm server market, is also named after Grace Hopper and combines with a Hopper GPU in the GH200 superchip module.^[5]

Hopper arrived at the moment that demand for large language model training was beginning to explode. By late 2022, after the launch of ChatGPT, every major AI lab and hyperscaler was attempting to acquire as many H100s as possible. The chip was effectively allocated rather than sold for most of 2023 and 2024, with reported lead times stretching past a year at the peak.

What products are in the Hopper family?

The Hopper family covers SXM and PCIe accelerators, an export-restricted China variant, and a CPU+GPU superchip module.

Product	Announced	Memory	Bandwidth	TDP	Form factor
H100 SXM5	March 2022	80 GB HBM3	3.35 TB/s	700 W	SXM5 module
H100 PCIe	2022	80 GB HBM2e	2 TB/s	350 W	Dual-slot PCIe
H100 NVL	March 2023	188 GB total (94 GB per card, 2 cards)	3.9 TB/s per GPU	350 to 400 W per card	Dual PCIe with NVLink bridge
H800	Late 2022	80 GB	3.35 TB/s	700 W	China-export variant with reduced NVLink
H200 SXM	November 13, 2023	141 GB HBM3e	4.8 TB/s	700 W	SXM5 module
H200 NVL	Q3 2024	141 GB HBM3e	4.8 TB/s	600 W	Dual-slot PCIe
GH200 (HBM3)	May 2023 production	96 GB HBM3 + 480 GB LPDDR5X	4 TB/s GPU	Up to 1000 W module	Superchip module
GH200 NVL2	August 2023	282 GB HBM3e (2 superchips)	10 TB/s combined	Not listed	Dual superchip
GH200-141G (HBM3e)	August 2023 announcement, Q2 2024 ship	144 GB HBM3e	Higher bandwidth	Not listed	Superchip module

The H100 SXM5 was the volume part for AI training and shipped on the HGX H100 baseboard, which carries eight GPUs interconnected through fourth-generation NVLink and NVSwitch.^[1] The H100 PCIe targeted enterprise servers that could not accept the higher 700 W power draw of SXM5. The H100 NVL, sometimes called the H100 NVL for Large Language Models, paired two PCIe cards over an NVLink bridge to deliver 188 GB of combined memory specifically for serving very large LLMs.^[3]

The H800 was a stripped-down variant created in response to U.S. export controls in October 2022, with reduced NVLink bandwidth so it could be sold legally into China.^[9] It was later joined by additional restricted SKUs as export rules evolved.

The H200, announced November 13, 2023 at SC23, used the same GH100 die as the H100 but added six 24 GB stacks of HBM3e memory in place of the five 16 GB HBM3 stacks on H100.^[6]^[10] That single change boosted capacity by 76 percent and bandwidth by 43 percent without any other architectural change.^[10] NVIDIA claimed roughly 1.9x the inference throughput on Llama 2 70B compared to H100 thanks to the larger working set and faster bandwidth.^[6]
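
The capacity and bandwidth percentages follow directly from the listed specifications. A minimal check in Python, using only figures from this section; note that six 24 GB stacks give 144 GB raw, while the product is listed at 141 GB, so the sketch compares the listed capacities (the helper name is illustrative):

```python
def pct_increase(new: float, old: float) -> int:
    """Percentage increase from old to new, rounded to the nearest whole percent."""
    return round((new / old - 1) * 100)

# Raw stack capacity: H100 uses five 16 GB HBM3 stacks, H200 six 24 GB HBM3e stacks.
h100_raw_gb = 5 * 16   # 80 GB
h200_raw_gb = 6 * 24   # 144 GB raw; the product is listed at 141 GB

capacity_gain = pct_increase(141, 80)      # listed capacities in GB
bandwidth_gain = pct_increase(4.8, 3.35)   # listed bandwidths in TB/s

print(h100_raw_gb, h200_raw_gb, capacity_gain, bandwidth_gain)  # 80 144 76 43
```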

The GH200 Grace Hopper Superchip integrates a Grace CPU (72 Arm Neoverse V2 cores, up to 480 GB LPDDR5X) with a Hopper GPU on a single module connected by NVLink-Chip-to-Chip (NVLink-C2C) at 900 GB/s coherent bandwidth, roughly seven times faster than PCIe Gen 5.^[5] The original GH200 used 96 GB of HBM3 on the GPU side; an updated GH200-141G announced in August 2023 added HBM3e to push the GPU memory to 144 GB.^[11]

Architecture and the GH100 die

Hopper is built around a single very large monolithic die called GH100, which set new records for transistor count and complexity in a datacenter GPU when it launched.^[2] The GH100 packs 80 billion transistors on an 814 mm squared die manufactured on the TSMC 4N process.^[1]^[2]

Specification	GH100 full die	H100 SXM5	H100 PCIe
Process node	TSMC 4N	TSMC 4N	TSMC 4N
Transistors	80 billion	80 billion	80 billion
Die size	814 mm squared	814 mm squared	814 mm squared
GPCs	8	8	7 or 8
Streaming Multiprocessors	144	132 enabled	114 enabled
FP32 CUDA cores	18,432	16,896	14,592
Fourth-gen Tensor Cores	576	528	456
L2 cache	60 MB	50 MB enabled	50 MB enabled
HBM stacks	6 (one disabled in shipping H100)	5 active	5 active
Memory controllers	12 x 512-bit	10 x 512-bit	10 x 512-bit
Compute capability	9.0	9.0	9.0

Each H100 SM contains 128 FP32 cores and four fourth-generation Tensor Cores. The Tensor Cores are the workhorse of the chip for AI workloads and provide more than ten times the deep learning throughput of the standard CUDA cores.^[2] Each Hopper SM has 256 KB of combined L1 data cache and shared memory, of which up to 228 KB can be configured as shared memory; that is about 1.33 times A100's 192 KB per SM. Hopper also exposes a new Distributed Shared Memory (DSMEM) capability that lets SMs in the same Thread Block Cluster directly load from, store to, and perform atomics on each other's shared memory.^[2]
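
The core counts in the table follow from the number of enabled SMs, since every Hopper SM carries 128 FP32 cores and four Tensor Cores. A quick sketch (the function name is illustrative) that reproduces the table rows and shows how much of the full die each product enables:

```python
CORES_PER_SM = 128        # FP32 CUDA cores per Hopper SM
TENSOR_CORES_PER_SM = 4   # fourth-generation Tensor Cores per SM
FULL_DIE_SMS = 144        # SMs physically present on GH100

def sm_resources(sms: int) -> dict:
    """Derive per-product core counts from the number of enabled SMs."""
    return {"fp32_cores": sms * CORES_PER_SM,
            "tensor_cores": sms * TENSOR_CORES_PER_SM}

for name, sms in [("GH100 full die", 144), ("H100 SXM5", 132), ("H100 PCIe", 114)]:
    r = sm_resources(sms)
    print(f"{name}: {r['fp32_cores']:,} FP32 cores, {r['tensor_cores']} Tensor Cores, "
          f"{sms / FULL_DIE_SMS:.0%} of SMs enabled")
```

Disabling SMs on a die this large is a yield measure: a GH100 with a few defective SMs can still ship as an H100 SXM5 or PCIe part.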

The full GH100 die has a 60 MB L2 cache, but shipping products enable 50 MB because one of the cache slices is paired with the disabled HBM stack.^[2] The L2 is split into two physically separated partitions, so access latency and bandwidth depend on which partition holds the data; ChipsAndCheese and academic microbenchmark papers have analyzed this in detail.^[14]^[15]

What makes Hopper different from Ampere?

Hopper is more than a process shrink of Ampere. It introduces several architectural features specifically designed for transformer-based workloads.

Feature	Description
Fourth-generation Tensor Cores	New Tensor Core design with twice the per-clock throughput of A100's third-gen cores on equivalent precisions
FP8 (E4M3 and E5M2)	New 8-bit floating point formats with FP32 or FP16 accumulators, doubling throughput over FP16
Transformer Engine	Software and hardware library that picks FP8 vs FP16 per layer using running statistics, often delivering 2x throughput on transformers without accuracy loss
Thread Block Clusters	New programming hierarchy above the thread block, letting up to 16 blocks (8 portable) cooperate across SMs in a GPC
Distributed Shared Memory	SMs in the same cluster can directly access each other's shared memory, about 7x faster than exchanging data through global memory
Tensor Memory Accelerator (TMA)	Hardware unit that performs asynchronous bulk transfers of multidimensional tensors between global and shared memory
Asynchronous Transaction Barriers	Atomic synchronization primitive that counts both thread arrivals and bytes transferred
DPX instructions	New instructions that accelerate dynamic programming algorithms (Smith-Waterman, Floyd-Warshall) by up to 7x
Confidential Computing	Hardware-enforced trusted execution environment for encrypted GPU compute, the first on a server GPU
MIG 2.0	Second-generation Multi-Instance GPU with up to 7 isolated instances, now supporting confidential computing per instance
Fourth-gen NVLink	18 links per GPU at 25 GB/s each in each direction (50 GB/s bidirectional per link), 900 GB/s total bidirectional, 1.5x A100
NVLink Switch System	NVLink Switch chips enable a single domain of up to 256 H100 GPUs across 32 nodes
Lossless decompression engine	Dedicated hardware for decompressing data on the fly when streaming from HBM
PCIe Gen 5	First server GPU to use PCIe Gen 5, doubling host bandwidth to 128 GB/s aggregate
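
NVLink and PCIe bandwidth figures are usually quoted as the sum of both directions. A short sketch of that arithmetic, assuming 25 GB/s per direction per NVLink link for both A100 and H100 and a rounded 64 GB/s per direction for one PCIe Gen 5 x16 slot (the per-direction values are assumptions chosen to match the aggregate figures above):

```python
def aggregate_gbps(links: int, per_direction_gbps: float) -> float:
    """Total bidirectional bandwidth: links x per-direction rate x 2 directions."""
    return links * per_direction_gbps * 2

hopper_nvlink = aggregate_gbps(18, 25)   # fourth-generation NVLink on H100
ampere_nvlink = aggregate_gbps(12, 25)   # third-generation NVLink on A100 (12 links)
pcie_gen5_x16 = aggregate_gbps(1, 64)    # one x16 slot, rounded per-direction rate

print(hopper_nvlink, ampere_nvlink, hopper_nvlink / ampere_nvlink)  # 900 600 1.5
print(round(hopper_nvlink / pcie_gen5_x16, 1))                      # 7.0
```

The same 900 GB/s versus 128 GB/s ratio is the basis for the "roughly seven times PCIe Gen 5" comparison made for NVLink-C2C on the GH200.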

The Transformer Engine deserves special attention. It is implemented partly in silicon (the FP8 Tensor Cores) and partly in the open-source Transformer Engine library, which integrates with PyTorch, JAX, and frameworks like Megatron-LM.^[8]^[16] The library tracks per-tensor scaling statistics during the forward and backward passes and updates the scaling factors dynamically; in its hybrid configuration it uses E4M3 (more mantissa bits, smaller range) for forward-pass tensors such as activations and E5M2 (more exponent bits, larger range) for gradients.^[8] The result is that large transformer models can train in FP8 for most operations while keeping a small fraction in FP16 or FP32 to preserve numerical stability.^[8] NVIDIA describes the Transformer Engine as a combination of "software and custom Hopper Tensor Core technology" that automatically handles re-casting and scaling between FP8 and 16-bit precision in each layer.^[8]
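
The trade-off between E4M3 and E5M2, and why per-tensor scaling matters, can be sketched in plain Python. This is a simplified model, not Transformer Engine code: it assumes the E4M3 variant used on Hopper (no infinities, a single NaN mantissa pattern) and IEEE-style E5M2, rounds to nearest-even, saturates at the format maximum, and scales each tensor by its current absolute maximum (amax). Transformer Engine's delayed-scaling recipe instead derives the scale from a history of earlier amax values. All function names here are illustrative.

```python
import math

def fp8_max(exp_bits: int, man_bits: int, ieee_like: bool) -> float:
    """Largest finite value of a sign/exponent/mantissa 8-bit float."""
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        # E5M2: the all-ones exponent is reserved for infinity/NaN, as in IEEE 754.
        top_exp, top_man = 2 ** exp_bits - 2, 2 ** man_bits - 1
    else:
        # E4M3: no infinities; only all-ones exponent with all-ones mantissa is NaN.
        top_exp, top_man = 2 ** exp_bits - 1, 2 ** man_bits - 2
    return (1 + top_man / 2 ** man_bits) * 2.0 ** (top_exp - bias)

def quantize(x: float, exp_bits: int, man_bits: int, ieee_like: bool) -> float:
    """Round x to the nearest representable FP8 value (ties to even), saturating."""
    if x == 0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    fmax = fp8_max(exp_bits, man_bits, ieee_like)
    a = min(abs(x), fmax)                              # saturate instead of overflowing
    e = max(math.floor(math.log2(a)), 1 - bias)        # subnormals share the minimum exponent
    step = 2.0 ** (e - man_bits)                       # spacing of representable values
    q = min(round(a / step) * step, fmax)              # Python's round() ties to even
    return math.copysign(q, x)

def scaled_roundtrip(values, exp_bits=4, man_bits=3, ieee_like=False):
    """Per-tensor scaling: map amax onto the FP8 maximum, quantize, then unscale."""
    amax = max(abs(v) for v in values)
    scale = fp8_max(exp_bits, man_bits, ieee_like) / amax
    return [quantize(v * scale, exp_bits, man_bits, ieee_like) / scale for v in values]

E4M3, E5M2 = (4, 3, False), (5, 2, True)
print(fp8_max(*E4M3), fp8_max(*E5M2))          # 448.0 57344.0
small = [1e-4, 2e-4, 5e-4]
print([quantize(v, *E4M3) for v in small])     # [0.0, 0.0, 0.0]: underflow without scaling
print(scaled_roundtrip(small))                 # close to the originals after scaling
```

E5M2's much larger maximum is why it suits gradients, whose magnitudes vary widely, while E4M3's extra mantissa bit gives finer resolution for forward-pass tensors once scaling has placed them in range.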

Thread Block Clusters and the Tensor Memory Accelerator together change how high-performance Hopper kernels are written. The TMA lets a single thread issue a multi-dimensional tile copy from HBM to shared memory while other warps continue computing, so kernels no longer need many threads spending instructions on address arithmetic and individual loads.^[2] Combined with asynchronous transaction barriers, this enables the producer-consumer pipeline patterns that show up in libraries like CUTLASS 3.x and FlashAttention-3, where dedicated producer warps issue TMA copies while consumer warps run Tensor Core math.
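
The producer-consumer structure can be illustrated with a CPU-thread analogy. This is not GPU code and not how CUTLASS or FlashAttention-3 are implemented: Python semaphores stand in for per-stage "full" and "empty" barriers, a list copy stands in for a TMA tile load, and a sum of squares stands in for Tensor Core math. Python threads do not actually overlap the compute here; only the synchronization structure carries over. The function name is hypothetical.

```python
import threading

def pipelined_sum_of_squares(tiles, stages=2):
    """Multi-stage producer/consumer pipeline over a ring of `stages` buffers."""
    buffers = [None] * stages
    full = [threading.Semaphore(0) for _ in range(stages)]   # released when a slot holds data
    empty = [threading.Semaphore(1) for _ in range(stages)]  # released when a slot may be reused
    result = []

    def producer():
        for i, tile in enumerate(tiles):
            slot = i % stages
            empty[slot].acquire()        # wait until the consumer has freed this slot
            buffers[slot] = list(tile)   # stands in for an asynchronous tile copy
            full[slot].release()         # signal that the slot is ready

    def consumer():
        total = 0
        for i in range(len(tiles)):
            slot = i % stages
            full[slot].acquire()         # wait for the producer's data
            total += sum(x * x for x in buffers[slot])  # stands in for Tensor Core math
            empty[slot].release()        # hand the slot back to the producer
        result.append(total)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]

print(pipelined_sum_of_squares([[1, 2], [3], [4, 5, 6]], stages=2))  # 91
```

With two or more stages, the producer can fill the next buffer while the consumer is still working on the current one, which is the overlap that TMA and transaction barriers make cheap on Hopper.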

How fast is the H100?

The published H100 SXM5 performance numbers from the NVIDIA datasheet are reproduced below. These are peak theoretical throughputs, and the TF32, BF16, FP16, FP8, and INT8 Tensor Core figures include the structured 2:4 sparsity feature; dense throughput is half of each listed sparse figure.^[3]

Precision	H100 SXM	H100 NVL (per GPU)	H200 SXM	H200 NVL
FP64	34 TFLOPS	30 TFLOPS	34 TFLOPS	30 TFLOPS
FP64 Tensor Core	67 TFLOPS	60 TFLOPS	67 TFLOPS	60 TFLOPS
FP32	67 TFLOPS	60 TFLOPS	67 TFLOPS	60 TFLOPS
TF32 Tensor Core (sparse)	989 TFLOPS	835 TFLOPS	989 TFLOPS	835 TFLOPS
BF16 Tensor Core (sparse)	1,979 TFLOPS	1,671 TFLOPS	1,979 TFLOPS	1,671 TFLOPS
FP16 Tensor Core (sparse)	1,979 TFLOPS	1,671 TFLOPS	1,979 TFLOPS	1,671 TFLOPS
FP8 Tensor Core (sparse)	3,958 TFLOPS	3,341 TFLOPS	3,958 TFLOPS	3,341 TFLOPS
INT8 Tensor Core (sparse)	3,958 TOPS	3,341 TOPS	3,958 TOPS	3,341 TOPS
GPU memory	80 GB HBM3	94 GB HBM3	141 GB HBM3e	141 GB HBM3e
Memory bandwidth	3.35 TB/s	3.9 TB/s	4.8 TB/s	4.8 TB/s
NVLink	900 GB/s	600 GB/s	900 GB/s	600 GB/s
Max TDP	700 W	350 to 400 W	700 W	600 W

Real-world FP8 performance during training typically lands in the 700 to 1,200 TFLOPS range per GPU depending on the workload, network topology, and how aggressively the Transformer Engine can stay in FP8. Meta reported around 380 teraFLOP/s sustained per GPU in BF16 during the Llama 3.1 405B training run on a 16,384-GPU H100 cluster, which corresponds to roughly 38 percent of the dense BF16 peak (about 989 TFLOPS, half the sparse figure above).^[13]
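
Because the Tensor Core rows in the table are sparse peaks, utilization figures such as Meta's have to be compared against the dense peak, which is half the listed value. A small sketch of that arithmetic, using the H100 SXM figures from the table and Meta's reported 380 TFLOPS in BF16 (helper names are illustrative):

```python
def dense_from_sparse(sparse_tflops: float) -> float:
    """Structured 2:4 sparsity doubles the listed peak, so dense peak is half."""
    return sparse_tflops / 2

def utilization(sustained_tflops: float, dense_peak_tflops: float) -> float:
    """Sustained throughput as a fraction of the dense peak."""
    return sustained_tflops / dense_peak_tflops

bf16_dense = dense_from_sparse(1979)   # H100 SXM BF16 Tensor Core, TFLOPS
fp8_dense = dense_from_sparse(3958)    # H100 SXM FP8 Tensor Core, TFLOPS

print(f"{utilization(380, bf16_dense):.0%} of dense BF16, "
      f"{utilization(380, fp8_dense):.0%} of dense FP8")  # 38% of dense BF16, 19% of dense FP8
```

Measuring against the sparse figure, or against FP8 for a BF16 run, would understate utilization by a factor of two or four.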

How does Hopper compare with Ampere (A100)?

Hopper is a full generational leap over the A100, the previous datacenter flagship: peak Tensor Core throughput rises roughly 3x at equivalent precisions, and roughly 6x for workloads that can move from FP16 or BF16 to FP8. The improvements come from three factors stacking together: more SMs, faster Tensor Cores per SM, and a new lower-precision format (FP8) that doubles arithmetic density.

Specification	A100 SXM4 80GB	H100 SXM5	Hopper improvement
Process	TSMC 7nm	TSMC 4N	One full node generation
Transistors	54.2 billion	80 billion	1.48x
Die size	826 mm squared	814 mm squared	Slightly smaller
Streaming Multiprocessors	108	132	1.22x
FP64 Tensor Core	19.5 TFLOPS	67 TFLOPS	3.4x
TF32 Tensor (sparse)	312 TFLOPS	989 TFLOPS	3.2x
BF16/FP16 Tensor (sparse)	624 TFLOPS	1,979 TFLOPS	3.2x
FP8 Tensor (sparse)	not supported	3,958 TFLOPS	New
Memory	80 GB HBM2e	80 GB HBM3	Faster generation
Memory bandwidth	2.0 TB/s	3.35 TB/s	1.68x
NVLink bandwidth	600 GB/s	900 GB/s	1.5x
Max TDP	400 W	700 W	1.75x
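
The roughly 3x figure can be decomposed using numbers from this article: the SM ratio from the table and the 2x per-SM, per-clock Tensor Core gain listed for fourth-generation Tensor Cores. The leftover factor is inferred here, not a published NVIDIA number, and attributing it to higher clock speeds is an assumption.

```python
a100 = {"sms": 108, "bf16_sparse": 624}
h100 = {"sms": 132, "bf16_sparse": 1979, "fp8_sparse": 3958}

total = h100["bf16_sparse"] / a100["bf16_sparse"]    # ~3.17x at equal precision
sm_factor = h100["sms"] / a100["sms"]                # ~1.22x from more SMs
per_sm_per_clock = 2.0                               # fourth-gen Tensor Core gain per SM per clock
residual = total / (sm_factor * per_sm_per_clock)    # what remains, presumably clock speed
fp8_gain = h100["fp8_sparse"] / a100["bf16_sparse"]  # A100 BF16 vs H100 FP8

print(f"total {total:.2f}x = SMs {sm_factor:.2f}x * per-SM {per_sm_per_clock:.0f}x "
      f"* residual {residual:.2f}x")
print(f"with FP8: {fp8_gain:.2f}x")
```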

NVIDIA's headline claims of 9x faster training and up to 30x faster LLM inference for H100 vs A100 depend on FP8 being usable for the workload and on the model being large enough to benefit from the larger memory capacity.^[1]^[17] For HPC FP64 workloads the improvement is more modest, in the 3x to 4x range. For dense LLM inference of very large models, the combination of HBM3, larger L2, the Transformer Engine, and faster NVLink can produce more than 10x throughput improvements at equivalent latency.

How does Hopper compare with Blackwell?

Blackwell, introduced in March 2024, is Hopper's successor. Blackwell uses a dual-die package (two reticle-sized chips connected by a 10 TB/s die-to-die link) and adds support for FP4 and a refined FP8 implementation. It targets a roughly 2x to 4x performance improvement over Hopper on most LLM workloads, depending on whether FP4 is used.

Specification	H100 SXM5 (Hopper)	B200 (Blackwell)
Transistors	80 billion	208 billion (dual die)
Process	TSMC 4N	TSMC 4NP
Memory	80 GB HBM3	192 GB HBM3e
Memory bandwidth	3.35 TB/s	8 TB/s
FP8 Tensor (sparse)	3,958 TFLOPS	~9,000 TFLOPS
FP4 Tensor (sparse)	not supported	~18,000 TFLOPS
NVLink	900 GB/s (NVLink 4)	1,800 GB/s (NVLink 5)
TDP	700 W	1,000 W

Despite Blackwell's launch, Hopper continues to ship in volume through 2025 and 2026 because demand for AI compute massively exceeds supply and because TSMC CoWoS advanced packaging capacity is the binding constraint on Blackwell production. H100 and H200 remain the most commonly available datacenter accelerators in cloud catalogs at the time of writing.

Software stack

Hopper requires CUDA 11.8 or later for basic support and CUDA 12.x for full feature support including FP8 Tensor Cores, Thread Block Clusters, the TMA, and the DPX instructions.^[2] The full software ecosystem includes:

Component	Purpose
CUDA 12.x	Core programming model and runtime
cuDNN	Deep learning primitives library, FP8 support added
NCCL	Multi-GPU collective communications, scaled for NVLink Switch
TensorRT and TensorRT-LLM	Inference optimization and LLM serving
Transformer Engine	Open source library that orchestrates FP8 training and inference
Megatron-LM	Reference framework for very large transformer training
NeMo	NVIDIA's enterprise LLM toolkit
vLLM	Open source LLM inference server, Hopper-optimized
SGLang	High-throughput LLM inference engine
Triton Inference Server	Production inference serving framework
CUTLASS 3.x	Template library for high-performance Tensor Core kernels

The Transformer Engine library is open source and lives at github.com/NVIDIA/TransformerEngine.^[16] It integrates with PyTorch through te.LayerNormLinear, te.MultiheadAttention, and similar drop-in modules, and with JAX through Praxis. FlashAttention-3, released in 2024, was rewritten specifically to take advantage of Hopper TMA and asynchronous warp specialization, and pushed attention throughput on H100 to roughly 75 percent of the dense FP16 Tensor Core peak.

Which models were trained on Hopper?

Hopper became the dominant training accelerator for frontier AI models from 2022 through 2024. Most training clusters are reported in approximate H100 counts, since the exact numbers are usually company-confidential.

Model	Lab	Reported scale
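GPT-4	OpenAI	Not Hopper: OpenAI finished training in August 2022, before H100 volume shipments; hardware never officially confirmed
GPT-4o, o1	OpenAI	H100 clusters, sizes not disclosed
Claude 3 family	Anthropic	H100 clusters at scale
Gemini 1.5, Gemini Ultra	Google	Google TPUs, according to Google's technical reports (not Hopper)
Llama 2	Meta	A100 clusters, according to Meta's Llama 2 paper (not Hopper)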
Llama 3 70B	Meta	H100 cluster, 24,000-GPU class
Llama 3.1 405B	Meta	16,384 H100 GPUs over 54 days
Mistral Large, Mixtral	Mistral AI	H100
Grok 2	xAI	H100
DBRX	Databricks (Mosaic)	H100

The Llama 3.1 405B training run is the most thoroughly documented, since Meta published its technical report.^[13] The 16,384-GPU cluster sustained around 380 teraFLOP/s per H100 in BF16 and experienced roughly one component failure every three hours, with HBM3 and GPU faults accounting for about half of all failures.^[12]

Datacenter deployment

Hopper ships in several reference platforms designed for different scales of deployment.

Platform	Description
HGX H100	Eight-GPU SXM5 baseboard with NVSwitch, the building block for most cloud and DGX systems
HGX H200	Same baseboard architecture as HGX H100 but with H200 GPUs
DGX H100	NVIDIA's reference 8-GPU server with dual Intel Sapphire Rapids CPUs, 2 TB system memory, and NVSwitch
DGX H200	DGX with H200 GPUs
DGX SuperPOD	Modular cluster reference design built from 32 DGX H100 nodes (256 H100s) connected through NVLink Switch and InfiniBand
GH200 systems	Single-socket and dual-socket Grace Hopper systems from Supermicro, GIGABYTE, and others
MGX	NVIDIA's modular reference platform for partner-built Hopper servers

Major cloud providers have all built large H100 fleets. AWS offers H100 through P5 instances (eight H100 SXM per instance). Microsoft Azure offers ND H100 v5 series. Google Cloud offers A3 instances. Oracle Cloud Infrastructure built one of the largest H100 superclusters for OpenAI's training. CoreWeave and Lambda Labs operate H100 clouds focused on AI workloads. By 2024 the secondary GPU cloud market had grown to hundreds of operators reselling H100 capacity.

How much does an H100 cost?

H100 supply was the binding constraint on AI development through 2023 and most of 2024. Industry reporting from Raymond James and others in late 2023 and early 2024 placed the per-GPU price for H100 SXM at roughly $25,000 to $30,000 with secondary market prices reportedly above $40,000. Complete DGX H100 systems were widely cited as costing $400,000 to $500,000.

Lead times for direct H100 orders from NVIDIA reached six months to over a year at the peak. Cloud provider allocation became a major business consideration; reports suggested that H100 access drove material revenue at AWS, Azure, GCP, and Oracle Cloud, and was a competitive differentiator for AI-focused clouds like CoreWeave. Several large AI labs disclosed multi-billion dollar H100 commitments. The supply constraint shifted upstream to TSMC's CoWoS advanced packaging capacity and to SK Hynix and Micron HBM3/HBM3e production, both of which became their own well-publicized bottlenecks.

Limitations

Hopper has several well-understood limitations that drove the design of Blackwell. The 700 W TDP per SXM5 module pushed datacenter cooling to its limits and effectively required liquid cooling at rack densities above 40 kW per rack.^[1] Even at lower densities, an HGX H100 server requires careful facility design: its eight GPUs alone draw 5.6 kW at full TDP, and a complete 8-GPU server draws roughly 10 kW.
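
How quickly 8-GPU Hopper servers reach the 40 kW rack threshold can be estimated in a few lines. The 8 x 700 W GPU figure follows from the specifications above; the 10.2 kW per complete server (CPUs, NICs, fans, and power conversion included) is an assumption based on commonly quoted DGX H100 ratings, so substitute your vendor's number. Names are illustrative.

```python
GPU_TDP_W = 700        # H100 SXM5 maximum TDP
GPUS_PER_SERVER = 8    # one HGX H100 baseboard
SERVER_KW = 10.2       # ASSUMPTION: whole-server draw; check the vendor's rating

def gpu_only_kw(servers: int) -> float:
    """Power drawn by the GPUs alone at full TDP, in kW."""
    return servers * GPUS_PER_SERVER * GPU_TDP_W / 1000

def rack_kw(servers: int, server_kw: float = SERVER_KW) -> float:
    """Rough whole-rack draw for a given number of 8-GPU servers, in kW."""
    return servers * server_kw

for n in (1, 2, 4):
    print(f"{n} server(s): GPUs {gpu_only_kw(n):.1f} kW, rack ~{rack_kw(n):.1f} kW")
```

Under these assumptions, four servers already exceed 40 kW, which is why dense Hopper racks pushed operators toward liquid cooling.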

The chip is monolithic and at 814 mm squared sits near the reticle limit of TSMC's lithography tools, which constrains how much further single-die scaling can go.^[2] Blackwell's response was to move to a dual-die package. Memory capacity per GPU was a frequent bottleneck for serving very large models; H100's 80 GB forced the use of tensor parallelism across multiple GPUs even for inference of 70B-class models, which is why H200's jump to 141 GB and the GH200's 480 GB of CPU memory were so well received.^[6]
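
The 70B example can be checked from weight storage alone. A minimal sketch that counts only model weights (no KV cache, activations, or framework overhead), treats 1 GB as 10^9 bytes, and reserves a hypothetical 10 percent of each GPU's memory for runtime overhead:

```python
import math

def gpus_for_weights(params_billion: float, bytes_per_param: float,
                     gpu_gb: float, usable_fraction: float = 0.9) -> int:
    """Minimum GPUs needed just to hold the weights.

    usable_fraction is a hypothetical allowance; real deployments also need
    memory for the KV cache and activations.
    """
    weights_gb = params_billion * bytes_per_param   # 1e9 params x bytes = GB
    return math.ceil(weights_gb / (gpu_gb * usable_fraction))

for label, bytes_pp in (("16-bit", 2), ("FP8", 1)):
    for gpu, gb in (("H100", 80), ("H200", 141)):
        print(f"70B {label} on {gpu}: {gpus_for_weights(70, bytes_pp, gb)} GPU(s)")
```

At 16 bits a 70B model needs about 140 GB for weights, more than one H100 holds, so tensor parallelism is unavoidable; FP8 weights fit on a single GPU of either kind, with far more room left for the KV cache on the H200.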

HBM supply was the most-cited single constraint on H100 production. The chip uses five active stacks of HBM3 per GPU, and the total industry HBM3 output through 2023 and 2024 was insufficient to meet H100 demand, let alone leave headroom for the H200 and Blackwell ramp.^[2]

Finally, FP8 software tooling was immature in the first year after launch. Production training in FP8 only became routine in late 2023 and 2024 as the Transformer Engine library matured, FlashAttention-3 landed, and major training frameworks (Megatron-LM, NeMo, MosaicML/Composer) shipped robust FP8 paths.

References

1. NVIDIA. "NVIDIA H100 Tensor Core GPU Architecture" whitepaper, March 2022. https://resources.nvidia.com/en-us-hopper-architecture/nvidia-h100-tensor-c
2. NVIDIA Developer Blog. "NVIDIA Hopper Architecture In-Depth." https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
3. NVIDIA. "H100 Tensor Core GPU" product page and datasheet. https://www.nvidia.com/en-us/data-center/h100/
4. NVIDIA. "H200 Tensor Core GPU" product page and datasheet. https://www.nvidia.com/en-us/data-center/h200/
5. NVIDIA. "GH200 Grace Hopper Superchip" product page. https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/
6. NVIDIA Newsroom. "NVIDIA Supercharges Hopper, the World's Leading AI Computing Platform." November 13, 2023. https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform
7. NVIDIA Investor Relations. "NVIDIA Grace Hopper Superchips Designed for Accelerated Generative AI Enter Full Production." May 28, 2023.
8. NVIDIA Blog. "H100 Transformer Engine Supercharges AI Training." https://blogs.nvidia.com/blog/h100-transformer-engine/
9. Wikipedia. "Hopper (microarchitecture)." https://en.wikipedia.org/wiki/Hopper_(microarchitecture)
10. Tom's Hardware. "Nvidia Announces H200 GPU: 141GB of HBM3e and 4.8 TB/s Bandwidth." November 13, 2023. https://www.tomshardware.com/news/nvidia-h200-gpu-announced
11. AnandTech. "NVIDIA Unveils Updated GH200 Grace Hopper Superchip with HBM3e Memory." August 2023. https://www.anandtech.com/show/20001/nvidia-unveils-gh200-grace-hopper-gpu-with-hbm3e-memory
12. Tom's Hardware. "Faulty Nvidia H100 GPUs and HBM3 memory caused half of failures during Llama 3 training."
13. Meta AI. "Introducing Llama 3.1: Our Most Capable Models to Date." https://ai.meta.com/blog/meta-llama-3-1/
14. ChipsAndCheese. "Nvidia's H100: Funny L2, and Tons of Bandwidth." https://chipsandcheese.com/p/nvidias-h100-funny-l2-and-tons-of-bandwidth
15. Luo et al. "Benchmarking and Dissecting the Nvidia Hopper GPU Architecture." arXiv:2402.13499. https://arxiv.org/html/2402.13499v1
16. NVIDIA. Transformer Engine library on GitHub. https://github.com/NVIDIA/TransformerEngine
17. NVIDIA Newsroom. "NVIDIA Announces Hopper Architecture, the Next Generation of Accelerated Computing." March 22, 2022. https://nvidianews.nvidia.com/news/nvidia-announces-hopper-architecture-the-next-generation-of-accelerated-computing
