NVIDIA Grace
Last reviewed
Jun 3, 2026
Sources
14 citations
Review status
Source-backed
Revision
v1 · 1,740 words
NVIDIA Grace is a data-center central processing unit (CPU) developed by Nvidia, based on Arm Neoverse V2 cores. Announced in 2021, Grace is the company's first server-class CPU and marks Nvidia's entry into a market historically dominated by x86 processors from Intel and AMD. Rather than competing as a general-purpose server chip, Grace was designed primarily to act as a tightly coupled host processor for Nvidia's data-center GPUs, joined to them by a high-bandwidth coherent interconnect called NVLink-C2C. Grace anchors several of Nvidia's flagship "superchip" products, including the GH200 Grace Hopper Superchip and the GB200 Grace Blackwell Superchip, and powers a number of large supercomputers. Its successor, the Vera CPU, was introduced at GTC 2026. [1][2][7]
Nvidia announced Grace on April 12, 2021, during the company's GPU Technology Conference (GTC) keynote. It was positioned as a CPU built for giant AI and high-performance computing (HPC) workloads, intended to relieve the data-movement and memory bottlenecks that arise when feeding very large models to accelerators. At the announcement, Nvidia stated that availability was expected in early 2023. [2]
The processor is named after Grace Hopper, the American computer-science pioneer and U.S. Navy rear admiral who helped develop early compilers and the COBOL programming language. The naming is deliberately paired with Nvidia's GPU architecture names: the Hopper GPU generation shares Grace Hopper's surname, so that the combined CPU-plus-GPU module became the "Grace Hopper" Superchip. The first announced adopters were the Swiss National Supercomputing Centre (CSCS) and the U.S. Department of Energy's Los Alamos National Laboratory, both of which planned Grace-powered systems built by Hewlett Packard Enterprise. [2]
Grace is built from Arm Neoverse V2 cores, a high-performance core design implementing the Armv9-A instruction set. A single Grace die integrates 72 cores. Each core implements the Scalable Vector Extension 2 (SVE2) with four 128-bit vector units, alongside the older NEON SIMD instructions. [1][3]
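As a rough illustration of what the "four 128-bit vector units" figure implies, the following Python sketch counts vector lanes per core for a given element width. It rests on assumptions that are not stated in the sources above: that every unit can issue one fused multiply-add per cycle, that an FMA counts as two floating-point operations, and that a clock frequency is supplied by the caller (`clock_ghz` is a hypothetical parameter; no clock speed is quoted in this article). The result is an upper bound, not a measured figure.

```python
# Figures taken from the article text.
VECTOR_UNITS_PER_CORE = 4    # four SVE2 units per core
VECTOR_WIDTH_BITS = 128      # each unit is 128 bits wide
CORES_PER_DIE = 72           # cores on one Grace die


def lanes_per_core(element_bits: int) -> int:
    """Elements of the given width that one core's vector units hold in total."""
    if element_bits <= 0 or VECTOR_WIDTH_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide the vector width")
    return VECTOR_UNITS_PER_CORE * (VECTOR_WIDTH_BITS // element_bits)


def peak_flops_per_cycle(element_bits: int) -> int:
    """Upper bound per core per cycle.

    Assumption (not from the article): one FMA per unit per cycle,
    counted as two floating-point operations per lane.
    """
    return 2 * lanes_per_core(element_bits)


def die_peak_gflops(element_bits: int, clock_ghz: float) -> float:
    """Upper bound for one 72-core die at a caller-supplied (hypothetical) clock."""
    return CORES_PER_DIE * peak_flops_per_cycle(element_bits) * clock_ghz


if __name__ == "__main__":
    print(lanes_per_core(64))        # 8 FP64 lanes per core
    print(lanes_per_core(32))        # 16 FP32 lanes per core
    print(peak_flops_per_cycle(64))  # 16 under the FMA assumption
```

Sustained throughput on real workloads is typically bounded by memory bandwidth and instruction mix rather than by this lane count.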
The cache hierarchy provides 64 KB of L1 instruction cache and 64 KB of L1 data cache per core, plus 1 MB of L2 cache per core. Cores, memory controllers, and I/O are tied together by Nvidia's second-generation Scalable Coherency Fabric (SCF), which the company quotes at more than 3.2 TB/s of total bisection bandwidth. [1]
A defining feature of Grace is its memory subsystem. Instead of using socketed DIMMs, Grace co-packages server-class LPDDR5X memory with error-correcting code (ECC) directly alongside the CPU. Nvidia states that LPDDR5X delivers roughly twice the bandwidth and far better energy efficiency than DDR4 or DDR5 DIMM-based designs, at the cost of fixed, non-upgradeable capacity. A single Grace CPU provides up to 480 GB of LPDDR5X capacity and up to 500 GB/s of memory bandwidth. [1][2]
NVLink-C2C ("chip-to-chip") is the coherent interconnect that allows Grace to function as a high-bandwidth companion to Nvidia GPUs or to a second Grace die. Derived from Nvidia's fourth-generation NVLink technology, the link provides 900 GB/s of total bidirectional bandwidth, which Nvidia describes as seven times the bandwidth of a PCIe Gen5 x16 connection. [1][2][4]
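The "seven times" comparison can be checked with a short calculation. The sketch below derives the bidirectional bandwidth of a PCIe Gen5 x16 link from the per-lane signalling rate (32 GT/s) and 128b/130b line encoding, then divides the 900 GB/s NVLink-C2C figure by it. The PCIe constants come from the PCI Express specification rather than from this article, and packet and protocol overheads are ignored, so the ratio is approximate.

```python
# PCIe Gen5 constants (from the PCI Express specification, not the article).
PCIE_GEN5_GT_PER_S = 32.0        # transfers per second per lane, per direction
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding
LANES = 16

# NVLink-C2C figure quoted in the article (total, both directions).
NVLINK_C2C_GB_PER_S = 900.0


def pcie_gen5_x16_bidirectional_gb_per_s() -> float:
    """Approximate bidirectional bandwidth of a PCIe Gen5 x16 link in GB/s."""
    gbit_per_lane = PCIE_GEN5_GT_PER_S * ENCODING_EFFICIENCY
    gbyte_per_direction = gbit_per_lane * LANES / 8
    return 2 * gbyte_per_direction


def nvlink_to_pcie_ratio() -> float:
    """How many PCIe Gen5 x16 links one NVLink-C2C link is worth."""
    return NVLINK_C2C_GB_PER_S / pcie_gen5_x16_bidirectional_gb_per_s()


if __name__ == "__main__":
    print(f"{pcie_gen5_x16_bidirectional_gb_per_s():.2f} GB/s")  # 126.03 GB/s
    print(f"{nvlink_to_pcie_ratio():.2f}x")                      # 7.14x
```

Using the rounded 128 GB/s figure often quoted for PCIe Gen5 x16 gives 900 / 128 ≈ 7.03, which also rounds to the seven-fold advantage Nvidia cites.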
Crucially, NVLink-C2C is cache-coherent. When Grace is paired with a GPU, the CPU and GPU share a single unified memory address space, so the GPU can directly access the CPU's large LPDDR5X pool and the CPU can access the GPU's high-bandwidth memory without explicit data copies. This coherent CPU-GPU memory model is the central design idea behind the Grace-based superchips and is well suited to AI training, inference, and HPC applications whose working sets exceed GPU memory alone. [4]
Grace appears in several distinct products. As a standalone CPU it is sold as the single-die Grace CPU C1 and the dual-die Grace CPU Superchip; it also serves as the host processor inside Nvidia's GPU-bearing superchips.
| Product | CPU | GPU | NVLink-C2C link | Notable use |
|---|---|---|---|---|
| Grace CPU Superchip | Two Grace dies (144 cores) | None | Grace to Grace, 900 GB/s | HPC, CPU-bound and memory-bound workloads |
| Grace CPU C1 | One Grace die (72 cores) | None | n/a | Single-socket cloud, edge, storage, telco |
| GH200 Grace Hopper | One Grace (72 cores) | One Hopper H100/H200 | Grace to GPU, 900 GB/s | AI and HPC, unified memory |
| GB200 Grace Blackwell | One Grace (72 cores) | Two Blackwell GPUs | Grace to two GPUs, 900 GB/s | GB200 NVL72 rack-scale AI |
| GB300 Grace Blackwell | One Grace (72 cores) | Two Blackwell Ultra GPUs | Grace to two GPUs | GB300 NVL72, reasoning inference |
The Grace CPU Superchip joins two Grace dies on a single module using NVLink-C2C, presenting 144 Arm Neoverse V2 cores to the operating system. It is aimed at HPC and large AI workloads that benefit from high single-thread performance, very high memory bandwidth, and strong data-movement capability, but that do not require a GPU on the same module.
| Specification | Grace CPU Superchip |
|---|---|
| Cores | 144 Arm Neoverse V2 (two dies of 72) |
| Vector | 4x 128-bit SVE2 per core, plus NEON |
| L1 cache | 64 KB instruction + 64 KB data per core |
| L2 cache | 1 MB per core |
| L3 cache | 234 MB per Superchip |
| Memory | Up to 960 GB co-packaged LPDDR5X with ECC |
| Memory bandwidth | Up to 1 TB/s |
| Die-to-die link | NVLink-C2C, 900 GB/s bidirectional |
| I/O | Up to 8x PCIe Gen5 x16 (up to 1 TB/s total) |
| Power | Up to 500 W TDP including memory |
The Superchip carries 234 MB of distributed L3 cache across the two dies and supports up to eight PCIe Gen5 x16 interfaces, which can be bifurcated for flexible I/O. Nvidia rates the complete module, including its co-packaged memory, at up to 500 W. [1][3][6]
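Most of the Superchip figures in the table follow from doubling the per-die numbers given earlier. The Python sketch below makes that arithmetic explicit; `GraceDie` and `superchip_totals` are hypothetical names introduced only for illustration, and the per-die values are the ones quoted in this article. L3 cache is left out because only the module total is stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraceDie:
    """Per-die figures as quoted in the article (illustrative container)."""
    cores: int = 72
    l1_kb_per_core: int = 128       # 64 KB instruction + 64 KB data
    l2_mb_per_core: int = 1
    max_memory_gb: int = 480
    max_memory_bw_gb_s: int = 500


def superchip_totals(die: GraceDie = GraceDie(), dies: int = 2) -> dict:
    """Aggregate per-die figures across the dies on one module."""
    cores = die.cores * dies
    return {
        "cores": cores,
        "l1_mb": cores * die.l1_kb_per_core / 1024,
        "l2_mb": cores * die.l2_mb_per_core,
        "memory_gb": die.max_memory_gb * dies,
        "memory_bw_gb_s": die.max_memory_bw_gb_s * dies,
        # Simple average; real per-core bandwidth depends on the access pattern.
        "memory_bw_per_core_gb_s": die.max_memory_bw_gb_s / die.cores,
    }


if __name__ == "__main__":
    for name, value in superchip_totals().items():
        print(f"{name}: {value}")
```

With the defaults this reproduces the 144 cores, 960 GB, and 1 TB/s (1,000 GB/s) entries in the table, and shows that the per-core L2 adds up to 144 MB across the module.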
The GH200 combines a 72-core Grace CPU with a Hopper-generation GPU (an H100, and later an H200-class part) over NVLink-C2C, producing a single module with a coherent CPU-plus-GPU memory space. Two memory configurations shipped: an HBM3 version with 96 GB of GPU memory for 576 GB of total fast memory, and an HBM3e version with 144 GB of GPU memory for 624 GB of total fast memory. The HBM3e variant pairs faster GPU memory with the same 480 GB of LPDDR5X on the Grace side. A two-socket variant, the GH200 NVL2, links two GH200 modules over NVLink to expose 144 Arm cores and 288 GB of HBM3e in a single node. [5][7][8]
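The "total fast memory" figures are the sum of the Grace-side LPDDR5X and the GPU-side HBM. The sketch below reproduces them and adds a simple, hypothetical `placement` helper that classifies a working set by whether it fits in HBM alone, needs the coherent LPDDR5X pool, or exceeds one module. The capacities are nominal values from this article; usable capacity on a real system is lower once the operating system and runtime reserve memory.

```python
# Nominal capacities quoted in the article, in GB.
GRACE_LPDDR5X_GB = 480
GPU_HBM_GB = {"HBM3": 96, "HBM3e": 144}
CORES_PER_GRACE = 72


def total_fast_memory_gb(variant: str) -> int:
    """CPU LPDDR5X plus GPU HBM for one GH200 module."""
    return GRACE_LPDDR5X_GB + GPU_HBM_GB[variant]


def placement(working_set_gb: float, variant: str) -> str:
    """Classify a working set against one module's nominal capacities (illustrative)."""
    if working_set_gb <= GPU_HBM_GB[variant]:
        return "fits in GPU HBM"
    if working_set_gb <= total_fast_memory_gb(variant):
        return "spills into Grace LPDDR5X"
    return "exceeds one GH200 module"


def nvl2_totals(modules: int = 2) -> dict:
    """Core count and HBM3e capacity for a GH200 NVL2 node."""
    return {
        "arm_cores": CORES_PER_GRACE * modules,
        "hbm3e_gb": GPU_HBM_GB["HBM3e"] * modules,
    }


if __name__ == "__main__":
    print(total_fast_memory_gb("HBM3"))    # 576
    print(total_fast_memory_gb("HBM3e"))   # 624
    print(placement(200, "HBM3e"))         # spills into Grace LPDDR5X
    print(nvl2_totals())                   # {'arm_cores': 144, 'hbm3e_gb': 288}
```

The middle case is the one the coherent memory model targets: a working set larger than HBM that the GPU can still address directly over NVLink-C2C.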
In the Blackwell generation, Nvidia shifted the CPU-to-GPU ratio. The GB200 Grace Blackwell Superchip pairs a single Grace CPU with two Blackwell GPUs over NVLink-C2C, a 1:2 CPU-to-GPU ratio. These superchips are the building blocks of the rack-scale GB200 NVL72, a liquid-cooled system that connects 36 Grace CPUs and 72 Blackwell GPUs into a single 72-GPU NVLink domain that behaves as one very large accelerator for trillion-parameter LLM inference. The successor GB300 NVL72 keeps the same topology, combining 36 Grace CPUs with 72 Blackwell Ultra GPUs and targeting reasoning and test-time-scaling inference workloads. [9][10]
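The rack-level counts follow directly from the superchip composition: 36 superchips, each with one Grace CPU and two GPUs. A minimal Python sketch of that bookkeeping is shown below; `rack_totals` is a hypothetical helper, and the count of 36 superchips is inferred from the CPU and GPU totals stated above.

```python
from fractions import Fraction


def rack_totals(superchips: int = 36,
                cpus_per_superchip: int = 1,
                gpus_per_superchip: int = 2) -> dict:
    """CPU and GPU counts for a rack built from identical superchips."""
    cpus = superchips * cpus_per_superchip
    gpus = superchips * gpus_per_superchip
    return {
        "cpus": cpus,
        "gpus": gpus,
        "cpu_to_gpu": Fraction(cpus, gpus),  # reduced automatically, e.g. 1/2
    }


if __name__ == "__main__":
    totals = rack_totals()
    print(totals["cpus"], totals["gpus"], totals["cpu_to_gpu"])  # 36 72 1/2
```

Setting `gpus_per_superchip=1` gives the 1:1 ratio of the earlier GH200 design for comparison.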
Grace and Grace Hopper hardware underpins a wave of HPC and AI supercomputers delivered largely by Hewlett Packard Enterprise and Eviden. Notable systems include:
Nvidia has also cited additional Grace Hopper systems such as EXA1-HE in France, Helios in Poland, and Miyabi in Japan, reflecting broad adoption across national HPC centers. [11][12]
At GTC in 2026, Nvidia introduced the Vera CPU as Grace's successor and the host processor of the Vera Rubin platform. Where Grace used licensed Arm Neoverse V2 cores, Vera moves to 88 fully custom Nvidia-designed "Olympus" Arm-compatible cores, which Nvidia claims deliver substantially higher per-core performance. Vera also expands memory capacity and bandwidth relative to Grace and pairs with the Rubin generation of GPUs in rack-scale systems, continuing the coherent CPU-plus-GPU design philosophy that Grace established. [13][14]
Grace represents Nvidia's first serious move into data-center CPUs and a strategic bet on the Arm architecture for the server room. By co-packaging high-bandwidth LPDDR5X memory and tying the CPU to its GPUs with a coherent NVLink-C2C link, Nvidia reframed the CPU not as a standalone competitor to x86 server parts but as an integral, high-bandwidth host for accelerated computing. That approach, embodied first in the Grace CPU Superchip and Grace Hopper and then scaled up dramatically in the Grace Blackwell GB200 and GB300 NVL72 racks, has become a defining feature of Nvidia's AI-infrastructure strategy and set the template carried forward by the Vera CPU. [1][7][9]