NVIDIA GH200 Grace Hopper Superchip

14 min read

Updated Jul 23, 2026

The NVIDIA GH200 Grace Hopper Superchip is a single-module processor from NVIDIA that combines a 72-core Grace Arm CPU with a Hopper-generation H100-class GPU on one package, joined by a 900 GB/s memory-coherent interconnect called NVLink-C2C ^[1]^[8]. Built for large-scale artificial intelligence and high-performance computing (HPC), it gives the GPU fast, coherent access to a large pool of CPU-attached memory, presenting up to 576 GB of unified memory in the HBM3 configuration or 624 GB in the HBM3e configuration ^[1]^[4]. NVIDIA revealed the Grace Hopper design at its GTC conference in March 2022, announced full production at Computex on May 28, 2023, and the chip reached broad system availability through 2024 ^[1]^[2]^[10]. It is the predecessor of the Grace Blackwell GB200 generation.

The central idea of a "superchip" is to stop treating the CPU and GPU as separate devices connected over a comparatively slow PCI Express link, and instead wire them together tightly enough that the GPU can read and write the CPU's memory almost as if it were its own. That helps the largest models, whose parameters, activations, and key-value caches often exceed the GPU's onboard high-bandwidth memory. NVIDIA describes the GH200 as the first true heterogeneous accelerated platform for HPC and AI, meaning a single building block engineered so the CPU and GPU share one memory space rather than copying data back and forth ^[1]^[3].

What is the NVIDIA GH200?

The GH200 is NVIDIA's first Grace Hopper Superchip: one board that fuses a server-class Arm CPU (Grace) and a data-center GPU (Hopper) so they operate as a single accelerated computing unit with a shared, coherent memory space. It targets memory-bound workloads such as large language models, recommendation systems, and graph analytics, where the working set is too big for GPU memory alone and the conventional alternative is slow paging over PCI Express. NVIDIA founder and CEO Jensen Huang framed the design around that surge in demand: "To meet surging demand for generative AI, data centers require accelerated computing platforms with specialized needs. The new GH200 Grace Hopper Superchip platform delivers this with exceptional memory technology and bandwidth to improve throughput, the ability to connect GPUs to aggregate performance without compromise, and a server design that can be easily deployed across the entire data center." ^[5]

How is the GH200 built: Grace CPU plus Hopper GPU

The GH200 has two halves joined by NVLink-C2C. The Grace half is an NVIDIA-designed server CPU built from 72 Arm Neoverse V2 cores on the Armv9 instruction set, paired with up to 480 gigabytes (GB) of LPDDR5X memory with error-correction code (ECC) ^[1]^[7]. Each core carries 64 kilobytes (KB) of L1 instruction cache plus 64 KB of L1 data cache and 1 megabyte (MB) of L2 cache, backed by 114 MB of L3 cache shared across the chip. The cores run at a 3.1 gigahertz (GHz) base frequency, dropping to a 3.0 GHz all-core SIMD frequency under heavy vector load ^[1]. The LPDDR5X subsystem delivers up to 512 GB/s of memory bandwidth. NVIDIA positions Grace as the fastest Arm data-center CPU of its generation and claims roughly twice the performance per watt of conventional x86-64 platforms, with the LPDDR5X memory subsystem providing up to 53 percent more bandwidth than an eight-channel DDR5 design at one-eighth the power per gigabyte per second ^[1]. The CPU also exposes up to four PCIe x16 Gen5 links for networking and storage.

The Hopper half is an H100 Tensor Core GPU, NVIDIA's ninth-generation data-center GPU built on the Hopper architecture ^[1]. It carries fourth-generation Tensor Cores, a Transformer Engine that accelerates the matrix math behind transformer models, and Multi-Instance GPU (MIG) support for partitioning the GPU into isolated slices. The Hopper GPU in the GH200 delivers 34 teraFLOPS of FP64 and 67 teraFLOPS of FP64 via the Tensor Cores, 67 teraFLOPS of FP32, and, on the Tensor Cores, 989 teraFLOPS of TF32, 1,979 teraFLOPS of BFLOAT16 and FP16, and 3,958 teraFLOPS of FP8 (those last figures quoted with sparsity; dense throughput is half) ^[1]. INT8 reaches 3,958 TOPS with sparsity. These are the same compute capabilities as the standalone H100, since the GH200 uses an H100-class Hopper die; what changes is everything around it.

The whole module is a single superchip board with a programmable thermal design power (TDP) spanning 450 watts to 1,000 watts across the CPU, GPU, and memory combined ^[1]. That envelope lets system builders tune the chip for dense, power-constrained racks or for maximum performance, and the module supports both air cooling and liquid cooling.

What are the GH200 specifications?

Component	Specification
CPU	NVIDIA Grace, 72 Arm Neoverse V2 (Armv9) cores
CPU caches	64 KB L1 i + 64 KB L1 d per core, 1 MB L2 per core, 114 MB L3
CPU clocks	3.1 GHz base, 3.0 GHz all-core SIMD
CPU memory	Up to 480 GB LPDDR5X with ECC, up to 512 GB/s
GPU	Hopper H100 Tensor Core GPU
GPU memory	96 GB HBM3 (up to 4 TB/s) or 144 GB HBM3e (up to 4.9 TB/s)
GPU compute	34 TFLOPS FP64; 989 TFLOPS TF32, 3,958 TFLOPS FP8 (with sparsity)
CPU-to-GPU link	NVLink-C2C, 900 GB/s bidirectional (7x PCIe Gen5)
PCIe	Up to 4x PCIe x16 Gen5
Total unified memory	Up to 576 GB (HBM3) or 624 GB (HBM3e) per superchip
TDP	Programmable 450 W to 1,000 W (CPU + GPU + memory)
Coherency	Unified, coherent address space across CPU and GPU memory
Successor	GB200 Grace Blackwell Superchip

How does NVLink-C2C deliver memory coherence?

NVLink-C2C is the heart of the design. It is a memory-coherent, low-latency chip-to-chip interconnect that provides 900 GB/s of total bidirectional bandwidth, which NVIDIA puts at seven times the bandwidth of the PCIe Gen5 lanes used in conventional accelerated systems ^[1]^[8]. Built on fourth-generation NVLink, it lets the GPU reach peer memory with direct loads, stores, and atomic operations rather than the page-migration model that PCIe-attached accelerators rely on.

Coherence is the part that matters most for software. Because the CPU and GPU share one address space and NVLink-C2C keeps their caches consistent, CPU threads and GPU threads can concurrently and transparently read and write both CPU-resident and GPU-resident memory ^[1]^[3]. Developers transfer only the data they actually need instead of copying entire pages to and from the GPU, and they get lightweight synchronization through native atomics that work from either side. The practical effect is that an application can oversubscribe the GPU's HBM and spill into the much larger LPDDR5X pool at high bandwidth, so a working set that would not fit in GPU memory alone still runs without slow manual staging. NVIDIA pitches this as a way to let scientists and engineers focus on algorithms rather than explicit memory management.

What memory does the GH200 have: HBM3 versus HBM3e?

The GH200 shipped in two GPU-memory variants. The original configuration pairs the Hopper GPU with 96 GB of HBM3 at up to 4 terabytes per second (TB/s) of bandwidth, for 576 GB of total unified memory once the 480 GB of Grace LPDDR5X is included ^[1]^[4]. NVIDIA then announced a second variant at SIGGRAPH on August 8, 2023, upgrading the GPU to 144 GB of faster HBM3e and raising GPU memory bandwidth to up to 4.9 TB/s ^[4]^[5]. NVIDIA described that part as the world's first HBM3e processor, with HBM3e running about 50 percent faster than the HBM3 it replaces ^[5]. Systems built on the HBM3e version were expected from leading manufacturers in the second quarter of calendar 2024, while the HBM3 version was already in full production ^[5]^[6].

The Grace side stays the same across both: up to 480 GB of LPDDR5X. Combined with the GPU's HBM, a single GH200 presents up to 576 GB (HBM3) or 624 GB (HBM3e) of fast-access memory to an application through the coherent address space ^[1]^[4]. NVIDIA frames the advantage in terms of headroom: with up to 480 GB of LPDDR5X, the GPU has direct access to roughly seven times more fast memory than its HBM3 alone, or nearly eight times more than its HBM3e alone, depending on the configuration ^[1]. That is far beyond what a standalone H100 with 80 GB of HBM can offer, and it is the reason the GH200 is attractive for memory-bound workloads.

NVIDIA also offers a dual configuration. The SIGGRAPH 2023 dual-superchip design fully connects two GH200 modules over NVLink to present 144 Arm Neoverse cores, eight petaFLOPS of AI performance, 282 GB of HBM3e, and 1.2 TB of fast memory across the pair, with up to 10 TB/s of combined memory bandwidth, which NVIDIA cited as 3.5 times more memory capacity and 3 times more bandwidth than the prior-generation offering ^[5]. The productized version of this design, the GH200 NVL2, fits into a single 2U node and is specified at up to 288 GB of HBM3e (two 144 GB GPUs) with up to 10 TB/s of memory bandwidth, and NVIDIA describes it as delivering up to 3.5 times more GPU memory capacity and 3 times more bandwidth than a single H100 in one server ^[19].

How does the GH200 scale: GH200 NVL32 and rack-scale deployment

Individual GH200 superchips can be combined into much larger machines using the NVIDIA NVLink Switch System. The GH200 NVL32 is a rack-scale platform that connects 32 superchips into a single NVLink domain so they behave as one large accelerator with 19.5 TB of unified memory ^[9]^[11]. It is built from 16 dual-GH200 server nodes, uses nine NVSwitch chips based on third-generation NVSwitch technology, fits the NVIDIA MGX modular chassis design, and is liquid cooled for density ^[11]. NVIDIA reported large gains for this configuration against H100 systems, including roughly 1.7 times faster GPT-3 training, about 2 times faster large-language-model inference on a GPT-530B class model, up to 7.9 times faster training for recommender systems with 10 TB embedding tables, and up to 5.8 times faster GraphSAGE graph-neural-network training ^[11].

The NVLink Switch System scales further than a single rack. A full fabric can connect up to 256 NVLink-connected GPUs, with all GPU threads able to access up to 144 TB of memory at high bandwidth ^[1]. NVIDIA packaged that 256-superchip topology as the DGX GH200 AI supercomputer, announced at Computex in May 2023 and aimed at memory-intensive AI such as large language models, recommender systems, and graph neural networks ^[2]^[10]. On the cloud side, AWS announced at re:Invent in November 2023 that it would be the first cloud provider to offer NVLink-connected GH200 superchips, delivered through NVIDIA DGX Cloud and Amazon EC2 instances ^[9]^[11].

Where is the GH200 used in real supercomputers?

The GH200 anchored several large public supercomputers. The most prominent is JUPITER at the Jülich Supercomputing Centre in Germany, Europe's first exascale system. Its Booster module uses roughly 24,000 GH200 superchips across about 6,000 compute nodes, with four GH200s per node, built on Eviden's BullSequana XH3000 platform with direct liquid cooling ^[12]^[13]. JUPITER exceeds one exaFLOP/s of FP64 performance on the High Performance Linpack benchmark, and in the May 2024 GREEN500 list its early partition ranked first for energy efficiency at more than 60 gigaFLOPS per watt ^[12]^[14].

In Switzerland, the Swiss National Supercomputing Centre (CSCS) built Alps on the HPE Cray EX254n platform, with 10,752 GH200 superchips arranged in quad-GH200 nodes; it was inaugurated in September 2024 and reached about 270 petaFLOPS in its initial configuration ^[15]^[16]. These machines illustrate the pattern NVIDIA designed the GH200 for: tight CPU-GPU coupling at the node level, scaled out across thousands of nodes with high-speed networking such as Slingshot or InfiniBand. System makers including Supermicro and GIGABYTE shipped GH200-based servers as part of NVIDIA's MGX family, broadening availability beyond the largest labs ^[6].

How does the GH200 differ from the H100?

The GPU compute in the GH200 matches a standalone H100: same Hopper die, same fourth-generation Tensor Cores, same Transformer Engine, and the same peak FLOPS ^[1]. The difference is the package around it. A standalone H100 is a discrete accelerator that plugs into a host server and reaches CPU memory over PCI Express; its onboard memory tops out at 80 GB (or 94 GB on the H100 NVL) of HBM. The GH200 instead bundles the GPU with a 72-core Grace CPU on one board, links them with 900 GB/s NVLink-C2C (7x PCIe Gen5), and makes the CPU's up to 480 GB of LPDDR5X directly and coherently addressable by the GPU ^[1]^[8]. The result is up to 576 GB or 624 GB of unified memory versus 80 GB on a discrete H100, which is why the GH200 wins on memory-bound and CPU-GPU-bandwidth-bound workloads rather than on raw GPU throughput. A separate, later product, the H200, keeps the discrete-GPU form factor but upgrades the H100's memory to 141 GB of HBM3e; the GH200 is the CPU-plus-GPU superchip rather than a GPU-only upgrade.

What came after the GH200?

The GH200 marked NVIDIA's shift from selling GPUs that plug into third-party servers toward selling tightly integrated CPU-plus-GPU building blocks for AI infrastructure. The coherent memory model is especially useful for recommendation systems, graph analytics, and large language models with very large embeddings or key-value caches, where the working set does not fit in GPU memory alone and the alternative is slow paging over PCI Express. Because the GH200 runs the standard 64-bit Arm software ecosystem, the same containers, binaries, and operating systems that run on other Arm servers run on Grace Hopper without modification, alongside the full NVIDIA HPC, AI, and Omniverse software stacks ^[1]^[3].

The approach carried directly into the Grace Blackwell GB200, announced at GTC in March 2024. The GB200 replaces the single Hopper GPU with two Blackwell-architecture GPUs per Grace CPU, again linked by a 900 GB/s NVLink-C2C interconnect, and scales to the rack-level GB200 NVL72, which ties 36 Grace Blackwell Superchips (72 Blackwell GPUs and 36 Grace CPUs) into one NVLink domain over fifth-generation NVLink ^[17]^[18]. The lineage continues toward NVIDIA's later Vera Rubin platform. The GH200 also helped popularize the broader industry move toward Arm-based server CPUs co-designed with accelerators, a pattern visible in later host CPUs such as Google Axion and in the wider competition over rack-scale AI systems, including NVIDIA's own NVLink Fusion effort to open the interconnect to third-party silicon.

A limitation of the GH200 is that its coherent-memory advantage depends on software taking advantage of the unified address space, and the LPDDR5X pool, while large, is much slower than HBM, so performance depends on how well a workload's hottest data stays in HBM. The Hopper compute itself matches a standard H100, so the chip's appeal rests on memory capacity, coherence, and CPU-GPU bandwidth rather than raw GPU throughput.

References

^NVIDIA. "NVIDIA GH200 Grace Hopper Superchip Datasheet." NVIDIA, March 2024. resources.nvidia.com/...grace-hopper-superchip
^NVIDIA. "NVIDIA Grace Hopper Superchips Designed for Accelerated Generative AI Enter Full Production." NVIDIA Newsroom, 28 May 2023. nvidianews.nvidia.com/...-ai-enter-full-production
^NVIDIA. "NVIDIA Grace Hopper Superchip Architecture Whitepaper." NVIDIA. resources.nvidia.com/...nvidia-grace-hopper
^NVIDIA. "NVIDIA GH200 Grace Hopper Superchip." NVIDIA Data Center. nvidia.com/...grace-hopper-superchip
^NVIDIA. "NVIDIA Unveils Next-Generation GH200 Grace Hopper Superchip Platform for Era of Accelerated Computing and Generative AI." NVIDIA Newsroom, 8 August 2023. nvidianews.nvidia.com/...perchip-with-hbm3e-memory
^Supermicro. "Supermicro Starts Shipments of NVIDIA GH200 Grace Hopper Superchip-Based Servers, the Industry's First Family of NVIDIA MGX Systems." Supermicro Press Releases. supermicro.com/...e-hopper-superchip-based-servers
^Arm. "Arm Neoverse V2 platform." Arm. arm.com/...neoverse-v2
^NVIDIA. "NVLink and NVLink-C2C." NVIDIA. nvidia.com/...nvlink
^NVIDIA. "AWS and NVIDIA Announce Strategic Collaboration to Offer New Supercomputing Infrastructure, Software and Services for Generative AI." NVIDIA Newsroom, 28 November 2023. nvidianews.nvidia.com/...oration-for-generative-ai
^The Register. "Nvidia DGX GH200 stitches together 256 superchips." 29 May 2023. theregister.com/...nvidia_dgx_gh200_nvlink
^NVIDIA. "One Giant Superchip for LLMs, Recommenders, and GNNs: Introducing NVIDIA GH200 NVL32." NVIDIA Technical Blog. developer.nvidia.com/...oducing-nvidia-gh200-nvl32
^EuroHPC JU. "JUPITER Officially Propels Europe into the Exascale Era." European High Performance Computing Joint Undertaking, 17 November 2025. eurohpc-ju.europa.eu/...exascale-era-2025-11-17_en
^Forschungszentrum Jülich. "JUPITER Technical Overview." Jülich Supercomputing Centre. fz-juelich.de/...tech
^EuroHPC JU. "European Exascale Supercomputer JUPITER Sets New Energy Efficiency Standards with #1 Ranking in GREEN500." 13 May 2024. eurohpc-ju.europa.eu/...ing-green500-2024-05-13_en
^CSCS. "New research infrastructure: 'Alps' supercomputer inaugurated." Swiss National Supercomputing Centre, 2024. cscs.ch/...tructure-alps-supercomputer-inaugurated
^TOP500. "Alps - HPE Cray EX254n, NVIDIA Grace 72C 3.1GHz, NVIDIA GH200 Superchip, Slingshot-11." TOP500. top500.org/...180259
^NVIDIA. "NVIDIA Blackwell Platform Arrives to Power a New Era of Computing." NVIDIA Newsroom, 18 March 2024. nvidianews.nvidia.com/...er-a-new-era-of-computing
^NVIDIA. "NVIDIA GB200 NVL72." NVIDIA Data Center. nvidia.com/...gb200-nvl72
^NVIDIA. "Simplify System Memory Management with the Latest NVIDIA GH200 NVL2 Enterprise RA." NVIDIA Technical Blog. developer.nvidia.com/...a-gh200-nvl2-enterprise-ra

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

3 revisions by 1 contributors · v4 · 2,783 words · full history

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Suggest edit

What links here

ETH Zurich Google Axion Meta-Amazon Graviton deal NVIDIA DGX Cloud NVIDIA Grace NVIDIA H100 NVIDIA Vera (CPU)

What is the NVIDIA GH200?

How is the GH200 built: Grace CPU plus Hopper GPU

What are the GH200 specifications?

How does NVLink-C2C deliver memory coherence?

What memory does the GH200 have: HBM3 versus HBM3e?

How does the GH200 scale: GH200 NVL32 and rack-scale deployment

Where is the GH200 used in real supercomputers?

How does the GH200 differ from the H100?

What came after the GH200?

References

Improve this article

Related Articles

NVIDIA Picasso

NVIDIA H100

NCCL (NVIDIA Collective Communications Library)

NVIDIA B200

NVIDIA GB300 NVL72

NVIDIA DGX B300

What links here

Related Articles

NVIDIA Picasso

NVIDIA H100

NCCL (NVIDIA Collective Communications Library)

NVIDIA B200

NVIDIA GB300 NVL72

NVIDIA DGX B300

What links here