NVIDIA Vera (CPU)
Last reviewed
Jun 2, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 1,933 words
NVIDIA Vera is a data center central processing unit developed by NVIDIA and positioned by the company as "the CPU for agents," meaning a processor designed first for agentic AI workloads rather than the human-driven, interactive computing that has shaped server CPUs for decades. It was broken out as its own announcement during Jensen Huang's GTC Taipei keynote at COMPUTEX 2026 on June 1, 2026. Vera is built around 88 custom Arm-compatible "Olympus" cores and NVIDIA Spatial Multithreading, and it serves as the host processor for the NVIDIA Vera Rubin platform, where it is paired with NVIDIA's Rubin GPUs over a coherent NVLink-C2C link. NVIDIA describes Vera as the successor to its Grace CPU and claims it completes agentic tasks up to 1.8 times faster than leading x86 processors.[1][2][3]
This article covers the Vera CPU specifically. For the full rack, the GPUs, and the broader system, see NVIDIA Vera Rubin.
| Attribute | Detail |
|---|---|
| Developer | NVIDIA |
| Type | Data center CPU for agentic AI |
| Announced | June 1, 2026 (GTC Taipei keynote at COMPUTEX 2026)[3] |
| CPU cores | 88 custom NVIDIA "Olympus" cores[1][2] |
| Instruction set | Arm-compatible (Armv9.2 per reporting)[4] |
| Threading | NVIDIA Spatial Multithreading, 176 threads[2] |
| Memory | LPDDR5X, up to 1.5 TB capacity[2] |
| Memory bandwidth | Up to 1.2 TB/s[1][2] |
| CPU-GPU link | NVLink-C2C, up to 1.8 TB/s coherent[1][2] |
| On-chip fabric | Second-generation SCF, 3.4 TB/s bisection bandwidth[2] |
| Paired GPU | NVIDIA Rubin (in Vera Rubin)[5] |
| Predecessor | NVIDIA Grace CPU[1] |
| Availability | Full production; shipping fall 2026[1][6] |
Vera is NVIDIA's second-generation data center CPU and its first to use a fully custom CPU core, rather than the licensed Arm Neoverse cores used in the earlier Grace design. NVIDIA frames it as a clean break in CPU philosophy. In Huang's words from the keynote, earlier processors were "created for humans," whereas Vera was "built for agents." The reasoning is that fleets of autonomous AI agents will become the dominant consumers of compute, and that they place very different demands on a CPU than a person clicking through an application does.[3][7]
In practice, that positioning translates into a chip tuned for the parts of an AI workload that a GPU does not handle well: orchestration, control flow, data movement, scheduling, and keeping accelerators fed. NVIDIA explicitly targets three workload classes: agentic AI, reinforcement learning, and large-scale data processing. All three mix latency-sensitive control logic with heavy memory traffic. The company's headline claim is that Vera delivers "up to 1.8x faster task completion compared with x86 CPUs" on these workloads. That figure is an NVIDIA marketing claim measured on its own agentic benchmarks and should be read as such.[1][2]
Vera does not ship as a standalone consumer part. It is sold inside NVIDIA's data center systems, principally as the host CPU of the Vera Rubin platform, and in a dedicated CPU-only rack configuration for customers who want Vera compute without the GPUs.[6][8]
Each Vera CPU integrates 88 of NVIDIA's custom "Olympus" cores on a single monolithic compute die. NVIDIA designed Olympus for high single-thread performance and energy efficiency while remaining fully Arm-compatible, with reporting from launch coverage indicating compatibility with the Armv9.2 instruction set and support for FP8 precision.[2][4]
The core uses a wide, deep microarchitecture aimed at the control-heavy, data-movement-intensive code that surrounds modern AI inference. NVIDIA describes a 10-wide instruction fetch and decode frontend and a neural branch predictor capable of evaluating two taken branches per cycle, alongside improved prefetching and load-store performance. The intent is to extract high instructions-per-clock on the irregular, branchy code that drives agent orchestration, where a GPU is a poor fit.[2]
Keeping everything on one compute die is a deliberate choice. NVIDIA pairs that single die with adjacent dielets that implement the memory and I/O subsystems, but the 88 cores and the coherency fabric live together on one piece of silicon. The company argues this avoids cross-chiplet latency and gives the predictable, low-variance response times that agent workloads need.[2]
The standout architectural feature is what NVIDIA calls Spatial Multithreading, its variant of simultaneous multithreading. Across 88 cores Vera exposes 176 threads, two per core. Where conventional SMT time-shares a single core's execution resources between threads, NVIDIA's approach partitions the core's resources spatially so the two threads occupy separate execution lanes inside the core.[2][4]
The payoff NVIDIA emphasizes is determinism. Because the threads are not competing for the same shared structures cycle by cycle, the design is meant to deliver stronger isolation between threads and more predictable tail latency, which matters when many agents share a machine and one slow request can stall a pipeline. NVIDIA also notes that operators can choose between maximizing per-thread performance and maximizing thread count at runtime, tuning the same hardware toward latency or throughput depending on the workload.[2]
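The isolation argument can be illustrated with a toy model. The sketch below is not NVIDIA's design: the slot count, the proportional arbitration rule, and the names `shared_slots`, `partitioned_slots`, and `simulate` are all hypothetical, chosen only to show why a fixed partition gives a steady thread a constant share while a shared pool lets a bursty neighbour perturb it.

```python
import random
import statistics

SLOTS = 10  # hypothetical execution slots per cycle; not an NVIDIA figure


def shared_slots(demand_a: int, demand_b: int) -> int:
    """Toy shared-pool model: both threads contend for the same slots."""
    if demand_a + demand_b <= SLOTS:
        return demand_a
    # Over-subscribed: split the pool in proportion to demand (assumed rule).
    return SLOTS * demand_a // (demand_a + demand_b)


def partitioned_slots(demand_a: int, demand_b: int) -> int:
    """Toy spatial model: thread A owns half the slots regardless of thread B."""
    return min(demand_a, SLOTS // 2)


def simulate(model, cycles: int = 10_000, seed: int = 0) -> list[int]:
    """Return the slots granted to thread A on each cycle."""
    rng = random.Random(seed)
    served = []
    for _ in range(cycles):
        demand_a = 5                   # steady, latency-sensitive thread
        demand_b = rng.randint(0, 10)  # bursty neighbour
        served.append(model(demand_a, demand_b))
    return served


shared = simulate(shared_slots)
partitioned = simulate(partitioned_slots)
print("shared pool  stdev:", statistics.pstdev(shared))
print("partitioned  stdev:", statistics.pstdev(partitioned))
```

In this model the partitioned thread sees zero cycle-to-cycle variation, at the cost of never receiving more than half the slots. That cap is the toy analogue of the latency-versus-throughput trade-off the article describes.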
Vera uses a second-generation LPDDR5X memory subsystem that delivers up to 1.2 TB/s of total bandwidth and up to 1.5 TB of capacity. NVIDIA reports roughly 14 GB/s of memory bandwidth provisioned per core, which it characterizes as about three times the per-core rate of traditional data center CPUs using DDR5, and about twice the bandwidth at half the power of conventional CPU memory.[1][2]
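The per-core figure follows directly from the headline numbers. A minimal check, assuming decimal units (1 TB/s = 1000 GB/s) and an even split of peak bandwidth across cores; the implied DDR5 baseline is derived from NVIDIA's "about three times" claim, not from a published figure:

```python
TOTAL_BANDWIDTH_GBPS = 1200.0  # "up to 1.2 TB/s", decimal units assumed
CORES = 88


def per_core_bandwidth(total_gbps: float, cores: int) -> float:
    """Return peak memory bandwidth per core in GB/s, assuming an even split."""
    return total_gbps / cores


vera_per_core = per_core_bandwidth(TOTAL_BANDWIDTH_GBPS, CORES)

# NVIDIA's "about three times" claim implies a DDR5 baseline near one third
# of the Vera figure. This is a back-calculation, not a measured value.
implied_ddr5_per_core = vera_per_core / 3

print(f"Vera per-core bandwidth: {vera_per_core:.1f} GB/s")
print(f"Implied DDR5 baseline:   {implied_ddr5_per_core:.1f} GB/s per core")
```

The result, about 13.6 GB/s, rounds to the "roughly 14 GB/s" that NVIDIA quotes.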
A notable packaging change accompanies the move to LPDDR5X. Vera's memory ships on SOCAMM modules (small outline compression-attached memory modules), which replace the soldered LPDDR found in earlier low-power designs with detachable, field-replaceable modules. This brings low-power mobile-class memory into a serviceable data center form factor, addressing one of the practical drawbacks of soldered memory at rack scale.[2]
Binding the cores and memory together is a second-generation Scalable Coherency Fabric (SCF). The SCF connects all 88 Olympus cores to a shared, unified last-level cache and provides 3.4 TB/s of bisection bandwidth, and NVIDIA states the design sustains over 90 percent of peak memory bandwidth under load. Independent launch coverage adds further detail not broken out by NVIDIA, reporting 2 MB of L2 cache per core and a unified L3 cache of 164 MB.[2][4]
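A few derived figures help put these numbers in context. The sketch below uses only the values quoted above; the cache sizes are the reported figures rather than NVIDIA's, and treating the unified L3 as an even per-core share is a simplification made purely for comparison.

```python
CORES = 88
L2_PER_CORE_MB = 2             # reported by launch coverage, not by NVIDIA
L3_UNIFIED_MB = 164            # reported by launch coverage, not by NVIDIA
PEAK_MEMORY_BW_GBPS = 1200.0   # "up to 1.2 TB/s", decimal units assumed
SUSTAINED_FRACTION = 0.90      # "over 90 percent of peak" taken as a floor

total_l2_mb = CORES * L2_PER_CORE_MB
total_cache_mb = total_l2_mb + L3_UNIFIED_MB

# The L3 is unified, so this is a notional share, not a hardware partition.
l3_share_per_core_mb = L3_UNIFIED_MB / CORES

sustained_bw_floor_gbps = PEAK_MEMORY_BW_GBPS * SUSTAINED_FRACTION

print(f"Total L2:              {total_l2_mb} MB")
print(f"L2 + L3:               {total_cache_mb} MB")
print(f"Notional L3 per core:  {l3_share_per_core_mb:.2f} MB")
print(f"Sustained bandwidth:   at least {sustained_bw_floor_gbps:.0f} GB/s")
```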
Vera connects to NVIDIA's GPUs over NVLink-C2C (chip-to-chip), a coherent interconnect that provides up to 1.8 TB/s of bandwidth between the CPU and GPU. This is double the 900 GB/s NVLink-C2C link used in the earlier Grace Hopper generation.[1][2]
Coherence is the important property. The link lets a Vera CPU and the GPUs it serves share a single unified memory address space, so the GPU can reach the CPU's LPDDR5X and the CPU can reach the GPU's HBM as one coherent pool without explicit copies. For agentic and reasoning workloads this is what makes practices like KV-cache offload, where key-value attention state spills from GPU memory into the larger CPU memory pool, efficient enough to be useful. NVIDIA positions Vera not as a passive host but as a high-bandwidth data-movement engine tightly coupled to GPU execution.[1][2][5]
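To get a rough sense of what the link bandwidth means for KV-cache offload, the sketch below estimates an idealized transfer time. The 90 GB cache size is a hypothetical value chosen for illustration, and the calculation optimistically assumes the full quoted bandwidth is available in one direction with no protocol overhead. If the quoted figures are aggregate bidirectional bandwidth, one-way times would be about twice as long.

```python
def transfer_time_ms(size_gb: float, link_gbps: float) -> float:
    """Idealized time to move size_gb at peak link bandwidth, in milliseconds."""
    return size_gb / link_gbps * 1000


KV_CACHE_GB = 90.0               # hypothetical working set, not from the article
VERA_LINK_GBPS = 1800.0          # "up to 1.8 TB/s", decimal units assumed
GRACE_HOPPER_LINK_GBPS = 900.0   # earlier-generation figure cited above

vera_ms = transfer_time_ms(KV_CACHE_GB, VERA_LINK_GBPS)
grace_hopper_ms = transfer_time_ms(KV_CACHE_GB, GRACE_HOPPER_LINK_GBPS)

print(f"Vera NVLink-C2C:         {vera_ms:.0f} ms")
print(f"Grace Hopper NVLink-C2C: {grace_hopper_ms:.0f} ms")
print(f"Speedup:                 {grace_hopper_ms / vera_ms:.1f}x")
```

These are lower bounds on transfer time; real offload traffic is typically moved in smaller pieces and overlapped with compute, so achieved throughput will differ.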
Vera is the host CPU of the Vera Rubin platform, NVIDIA's Blackwell successor for AI factories. The basic building block is the Vera Rubin Superchip, which pairs one Vera CPU with two Rubin GPUs over NVLink-C2C. In the flagship rack, the Vera Rubin NVL72, NVIDIA combines 36 Vera CPUs with 72 Rubin GPUs in a single liquid-cooled, NVLink-connected domain, a two-to-one GPU-to-CPU ratio.[5][9]
Within that system Vera handles orchestration, data movement, and coherent memory access across the rack while the Rubin GPUs do the dense math. The platform also includes NVIDIA's Rubin CPX accelerators and BlueField DPUs for specific phases of inference and for networking and storage offload. Full details of the rack, the GPUs, and the networking are covered on the NVIDIA Vera Rubin page.[5]
Separately, NVIDIA introduced a CPU-only Vera rack for customers who want Vera's general-purpose compute on its own. Launch coverage describes a configuration packing up to 256 liquid-cooled Vera CPUs per rack, with NVIDIA citing large gains in CPU throughput over conventional x86 server racks.[6][8]
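The rack-level totals implied by these configurations can be tallied from the per-CPU figures. In the sketch below the `Rack` class and its property names are illustrative only, and the memory totals are upper bounds that assume every CPU is populated to the 1.5 TB maximum.

```python
from dataclasses import dataclass

CORES_PER_CPU = 88
THREADS_PER_CORE = 2
MAX_MEMORY_TB_PER_CPU = 1.5  # "up to 1.5 TB"; actual populations may be lower


@dataclass(frozen=True)
class Rack:
    """Illustrative container for a rack configuration (hypothetical type)."""
    name: str
    cpus: int
    gpus: int = 0

    @property
    def cores(self) -> int:
        return self.cpus * CORES_PER_CPU

    @property
    def threads(self) -> int:
        return self.cores * THREADS_PER_CORE

    @property
    def max_cpu_memory_tb(self) -> float:
        return self.cpus * MAX_MEMORY_TB_PER_CPU


nvl72 = Rack("Vera Rubin NVL72", cpus=36, gpus=72)
cpu_only = Rack("CPU-only Vera rack", cpus=256)

for rack in (nvl72, cpu_only):
    print(f"{rack.name}: {rack.cores} cores, {rack.threads} threads, "
          f"up to {rack.max_cpu_memory_tb:.0f} TB of CPU memory")
```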
Vera is the direct successor to NVIDIA's Grace CPU, the Arm-based data center processor used in the earlier Grace Hopper and Grace Blackwell generations. NVIDIA says Grace had shipped nearly 2.5 million units by the time Vera was announced, and it presents Vera as building on that base while moving to a fully custom core.[1]
The generational jump is substantial on paper. Reporting comparing the two parts notes that Vera increases the core count from 72 to 88, roughly doubles memory capacity, raises memory bandwidth to 1.2 TB/s, and doubles the NVLink-C2C link to 1.8 TB/s. The biggest qualitative change is the switch from licensed Arm Neoverse cores to NVIDIA's own Olympus cores, which is what lets the company tune the microarchitecture specifically for agentic workloads.[1][4][5]
The table below collects the figures cited at launch. Core platform numbers come from NVIDIA's own materials. Figures marked "Reported" (cache sizes, host I/O standards, FP8 support, and socket power) come from independent launch coverage and should be treated as provisional until NVIDIA documents them.
| Specification | Value | Source basis |
|---|---|---|
| Cores | 88 custom Olympus cores | NVIDIA[1][2] |
| Threads | 176 (Spatial Multithreading) | NVIDIA[2] |
| Instruction set | Arm-compatible; Armv9.2 | NVIDIA / reported[2][4] |
| Frontend | 10-wide fetch and decode | NVIDIA[2] |
| Branch predictor | Neural, 2 taken branches/cycle | NVIDIA[2] |
| L2 cache | 2 MB per core | Reported[4] |
| L3 cache | 164 MB unified | Reported[4] |
| On-chip fabric | 2nd-gen SCF, 3.4 TB/s bisection | NVIDIA[2] |
| Memory | LPDDR5X (SOCAMM), up to 1.5 TB | NVIDIA[2] |
| Memory bandwidth | Up to 1.2 TB/s (~14 GB/s per core) | NVIDIA[1][2] |
| CPU-GPU link | NVLink-C2C, up to 1.8 TB/s coherent | NVIDIA[1][2] |
| Host I/O | PCIe Gen 6, CXL 3.1 | Reported[4] |
| Precision | FP8 supported | Reported[4] |
| Socket power | ~450 W (as tested) | Reported[4] |
NVIDIA said at GTC Taipei that the Vera CPU and the Vera Rubin platform were in full production and on schedule to ship in the fall of 2026, with broader availability through the second half of the year. The company named early adopters spanning AI labs, finance, and cloud, including Anthropic, OpenAI, ByteDance, CoreWeave, Lambda, Nebius, Nscale, and Oracle Cloud Infrastructure, and listed system makers Cisco, Dell, HPE, Lenovo, and Supermicro as building Vera-based servers.[1][2][6]