NVIDIA Vera (CPU)

Updated Jun 28, 2026

NVIDIA Vera is a custom Arm-based data center central processing unit from NVIDIA, unveiled on June 1, 2026 at the GTC Taipei keynote at COMPUTEX 2026. NVIDIA markets it as "the CPU for agents": a processor designed first for agentic AI workloads rather than for the human-driven, interactive computing that has shaped server CPUs for decades. Vera is built around 88 custom Arm-compatible "Olympus" cores with NVIDIA Spatial Multithreading (176 threads), and it serves as the host processor for the NVIDIA Vera Rubin platform, where it is paired with NVIDIA's Rubin GPUs over a coherent NVLink-C2C link that carries up to 1.8 TB/s. NVIDIA presents Vera as the successor to its Grace CPU and claims it completes agentic tasks up to 1.8 times faster than leading x86 processors.^[1]^[2]^[3]

This article covers the Vera CPU specifically. For the full rack, the GPUs, and the broader system, see NVIDIA Vera Rubin.

Key facts

Attribute	Detail
Developer	NVIDIA
Type	Data center CPU for agentic AI
Announced	June 1, 2026 (GTC Taipei keynote at COMPUTEX 2026)^[3]
CPU cores	88 custom NVIDIA "Olympus" cores^[1]^[2]
Instruction set	Arm-compatible (Armv9.2 per reporting)^[4]
Threading	NVIDIA Spatial Multithreading, 176 threads^[2]
Memory	LPDDR5X, up to 1.5 TB capacity^[2]
Memory bandwidth	Up to 1.2 TB/s^[1]^[2]
CPU-GPU link	NVLink-C2C, up to 1.8 TB/s coherent^[1]^[2]
On-chip fabric	Second-generation SCF, 3.4 TB/s bisection bandwidth^[2]
Paired GPU	NVIDIA Rubin (in Vera Rubin)^[5]
Predecessor	NVIDIA Grace CPU^[1]
Availability	Full production; shipping fall 2026^[1]^[6]

What is the NVIDIA Vera CPU?

Vera is NVIDIA's second-generation data center CPU and its first to use a fully custom CPU core, rather than the licensed Arm Neoverse cores used in the earlier Grace design. NVIDIA frames it as a clean break in CPU philosophy. Announcing the chip, NVIDIA chief executive Jensen Huang called Vera "the first CPU designed for that future, built to run agentic AI at hyperscale with extraordinary performance, efficiency and programmability." The future he refers to is one in which fleets of autonomous AI agents become the dominant consumers of compute, and such agents place very different demands on a CPU than a person clicking through an application does.^[1]^[3]^[7]

In practice that positioning translates into a chip tuned for the parts of an AI workload that a GPU does not handle well: orchestration, control flow, data movement, scheduling, and keeping accelerators fed. NVIDIA explicitly targets three workload classes: agentic AI, reinforcement learning, and large-scale data processing, all of which mix latency-sensitive control logic with heavy memory traffic. The company's headline claim is that Vera delivers "up to 1.8x faster task completion compared with x86 CPUs" on these workloads. That figure is an NVIDIA marketing claim measured on its own agentic benchmarks and should be read as a vendor result rather than an independent one.^[1]^[2]
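Because "1.8x faster" is a speed ratio, it is easy to misread as an 80 percent time saving. A minimal Python sketch of the conversion, assuming the claimed speedup applies uniformly to a whole task; the 10-second baseline task is hypothetical:

```python
def projected_time(baseline_seconds: float, speedup: float) -> float:
    """Completion time if a task ran `speedup` times faster than the baseline."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


def time_saved_fraction(speedup: float) -> float:
    """Share of the baseline wall-clock time removed by the speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Hypothetical task that takes 10 s on an x86 server, at NVIDIA's claimed 1.8x.
print(f"{projected_time(10.0, 1.8):.2f} s")         # 5.56 s
print(f"{time_saved_fraction(1.8):.0%} less time")  # 44% less time
```

Since the claim is an "up to" figure on NVIDIA's own benchmarks, the 44 percent result is a best case under those conditions, not an expected saving.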

Vera does not ship as a standalone consumer part. It is sold inside NVIDIA's data center systems, principally as the host CPU of the Vera Rubin platform, and in a dedicated CPU-only rack configuration for customers who want Vera compute without the GPUs.^[6]^[8]

What is the Olympus core architecture?

Each Vera CPU integrates 88 of NVIDIA's custom "Olympus" cores on a single monolithic compute die. NVIDIA designed Olympus for high single-thread performance and energy efficiency while remaining fully Arm-compatible, with reporting from launch coverage indicating compatibility with the Armv9.2 instruction set and support for FP8 precision.^[2]^[4]

The core uses a wide, deep microarchitecture aimed at the control-heavy, data-movement-intensive code that surrounds modern AI inference. NVIDIA describes a 10-wide instruction fetch and decode frontend and a neural branch predictor capable of evaluating two taken branches per cycle, alongside improved prefetching and load-store performance. The intent is to extract high instructions-per-clock on the irregular, branchy code that drives agent orchestration, where a GPU is a poor fit.^[2]
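NVIDIA has not published how its neural branch predictor works. The best-known "neural" design in the literature is the perceptron predictor of Jiménez and Lin, which keeps a vector of signed weights per table entry and predicts "taken" when the weighted sum of recent branch outcomes is non-negative. The Python sketch below implements that classic scheme purely to illustrate the idea. The table size, history length, and 8-bit weight range are illustrative choices, the training threshold uses the heuristic from that paper, and nothing here models Vera's ability to handle two taken branches per cycle.

```python
class PerceptronPredictor:
    """Classic perceptron branch predictor (Jiménez and Lin), not Vera's design."""

    def __init__(self, history_len: int = 16, table_size: int = 256) -> None:
        self.table_size = table_size
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global outcome history, most recent first: +1 taken, -1 not taken.
        self.history = [1] * history_len
        # Keep training until the output magnitude clears this threshold.
        self.threshold = int(1.93 * history_len + 14)

    def _output(self, w: list[int]) -> int:
        # Dot product of the weights with a constant bias input and the history.
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc: int) -> bool:
        return self._output(self.weights[pc % self.table_size]) >= 0

    def update(self, pc: int, taken: bool) -> None:
        w = self.weights[pc % self.table_size]
        y = self._output(w)
        t = 1 if taken else -1
        # Train on a misprediction, or when the prediction was not confident.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w[0] = max(-128, min(127, w[0] + t))
            for i, xi in enumerate(self.history, start=1):
                w[i] = max(-128, min(127, w[i] + t * xi))
        # Shift the actual outcome into the global history.
        self.history = [t] + self.history[:-1]


# Usage: a branch that alternates taken / not taken.
bp = PerceptronPredictor()
pc = 0x4000
outcomes = [i % 2 == 0 for i in range(600)]
correct = 0
for i, taken in enumerate(outcomes):
    if i >= 500 and bp.predict(pc) == taken:
        correct += 1
    bp.update(pc, taken)
print(correct, "of 100 correct after warm-up")
```

A simple saturating counter cannot learn this alternating pattern, while the perceptron learns a strong negative weight on the most recent outcome; that kind of history correlation is what makes learned predictors attractive for branchy orchestration code.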

Keeping everything on one compute die is a deliberate choice. NVIDIA pairs that single die with adjacent dielets that implement the memory and I/O subsystems, but the 88 cores and the coherency fabric live together on one piece of silicon. The company argues this avoids cross-chiplet latency and gives the predictable, low-variance response times that agent workloads need.^[2]

How does Spatial Multithreading work?

The standout architectural feature is what NVIDIA calls Spatial Multithreading, its variant of simultaneous multithreading (SMT). Across 88 cores Vera exposes 176 threads, two per core. In conventional SMT the two threads compete dynamically for the same pool of execution resources every cycle; NVIDIA's approach instead partitions the core's resources spatially, so each thread occupies its own execution lanes inside the core.^[2]^[4]

The payoff NVIDIA emphasizes is determinism. Because the threads are not competing for the same shared structures cycle by cycle, the design is meant to deliver stronger isolation between threads and more predictable tail latency, which matters when many agents share a machine and one slow request can stall a pipeline. NVIDIA also notes that operators can choose between maximizing per-thread performance and maximizing thread count at runtime, tuning the same hardware toward latency or throughput depending on the workload.^[2]
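NVIDIA has not published the lane-level details of Spatial Multithreading, so the following is only a toy Python model of the isolation argument. It assumes a hypothetical core with 8 issue lanes per cycle and compares two policies: a "shared" mode in which both threads compete for all lanes with priority alternating each cycle (a simplification of conventional SMT arbitration), and a "spatial" mode in which each thread owns a fixed half of the lanes. Thread A runs the same stream of ready operations twice, once beside an idle neighbor and once beside a neighbor that always wants every lane.

```python
import random

LANES = 8  # hypothetical issue width per cycle, not a published Vera figure


def issue(demand_a: int, demand_b: int, mode: str, cycle: int) -> tuple[int, int]:
    """Return how many ops each thread issues this cycle under `mode`."""
    if mode == "spatial":
        # Each thread owns a fixed half of the lanes, whatever the other does.
        half = LANES // 2
        return min(demand_a, half), min(demand_b, half)
    if mode == "shared":
        # Both threads compete for all lanes; priority alternates each cycle.
        if cycle % 2 == 0:
            a = min(demand_a, LANES)
            return a, min(demand_b, LANES - a)
        b = min(demand_b, LANES)
        return min(demand_a, LANES - b), b
    raise ValueError(f"unknown mode: {mode}")


def ops_issued_by_a(demands_a: list[int], demands_b: list[int], mode: str) -> int:
    """Total ops thread A issues over the whole run."""
    return sum(
        issue(da, db, mode, cycle)[0]
        for cycle, (da, db) in enumerate(zip(demands_a, demands_b))
    )


rng = random.Random(0)
thread_a = [rng.randint(0, LANES) for _ in range(10_000)]  # ready ops per cycle
idle_b = [0] * len(thread_a)
busy_b = [LANES] * len(thread_a)

for mode in ("shared", "spatial"):
    with_idle = ops_issued_by_a(thread_a, idle_b, mode)
    with_busy = ops_issued_by_a(thread_a, busy_b, mode)
    print(f"{mode:8s} A beside idle B: {with_idle}  A beside busy B: {with_busy}")
```

In the shared mode thread A's progress depends on what its neighbor does, while in the spatial mode it is identical in both runs. The price of that isolation is that A can never use more than half the lanes, even when B is idle.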

How much memory bandwidth does Vera have?

Vera uses a second-generation LPDDR5X memory subsystem that delivers up to 1.2 TB/s of total bandwidth and up to 1.5 TB of capacity. NVIDIA reports roughly 14 GB/s of memory bandwidth provisioned per core, which it characterizes as about three times the bandwidth per core of leading x86 CPUs using DDR5, and about twice the bandwidth at half the power of conventional CPU memory.^[1]^[2]^[9]
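The per-core figure follows directly from NVIDIA's totals. A short Python check, assuming decimal units (1 TB/s = 1,000 GB/s); the x86 per-core value it prints is only what NVIDIA's "about three times" ratio implies, not a measured x86 figure:

```python
def per_core_bandwidth(total_gbps: float, cores: int) -> float:
    """Evenly divide a socket's memory bandwidth across its cores (GB/s)."""
    return total_gbps / cores


vera = per_core_bandwidth(1200, 88)  # up to 1.2 TB/s across 88 Olympus cores
implied_x86 = vera / 3               # back-solved from the "about 3x" claim

print(f"Vera: {vera:.1f} GB/s per core")                      # Vera: 13.6 GB/s per core
print(f"Implied x86 DDR5: {implied_x86:.1f} GB/s per core")   # Implied x86 DDR5: 4.5 GB/s per core
```

The 13.6 GB/s result is what NVIDIA rounds to roughly 14 GB/s per core.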

A notable packaging change accompanies the move to LPDDR5X. Vera's memory ships on SOCAMM modules (small outline compression-attached memory modules), which replace the soldered LPDDR found in earlier low-power designs with detachable, field-replaceable modules. This brings low-power mobile-class memory into a serviceable data center form factor, addressing one of the practical drawbacks of soldered memory at rack scale.^[2]

Binding the cores and memory together is a second-generation Scalable Coherency Fabric (SCF). The SCF connects all 88 Olympus cores to a shared, unified last-level cache and provides 3.4 TB/s of bisection bandwidth, and NVIDIA states the design sustains over 90 percent of peak memory bandwidth under load. Independent launch coverage adds further detail not broken out by NVIDIA, reporting 2 MB of L2 cache per core and a unified L3 cache of 164 MB.^[2]^[4]
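Two sets of numbers in this paragraph can be combined into rough per-socket figures. A Python sketch, assuming the reported 2 MB of L2 applies to all 88 cores and treating NVIDIA's "over 90 percent" efficiency as a floor against the 1.2 TB/s peak:

```python
PEAK_BW_GBPS = 1200     # NVIDIA: up to 1.2 TB/s
MIN_EFFICIENCY = 0.90   # NVIDIA: sustains over 90% of peak under load
CORES = 88
L2_PER_CORE_MB = 2      # reported by launch coverage, not broken out by NVIDIA
L3_MB = 164             # reported unified L3


def sustained_floor_gbps(peak: float, efficiency: float) -> float:
    """Lower bound on sustained bandwidth implied by an efficiency figure."""
    return peak * efficiency


def on_die_cache_mb(cores: int, l2_per_core: int, l3: int) -> int:
    """Total L2 plus L3 capacity across the die."""
    return cores * l2_per_core + l3


print(f"{sustained_floor_gbps(PEAK_BW_GBPS, MIN_EFFICIENCY):.0f} GB/s")  # 1080 GB/s
print(on_die_cache_mb(CORES, L2_PER_CORE_MB, L3_MB), "MB")               # 340 MB
```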

How does Vera connect to the Rubin GPU?

Vera connects to NVIDIA's GPUs over NVLink-C2C (chip-to-chip), a coherent interconnect that provides up to 1.8 TB/s of bandwidth between the CPU and GPU. This is double the 900 GB/s NVLink-C2C link used in the earlier Grace Hopper generation.^[1]^[2]

Coherence is the important property. The link lets a Vera CPU and the GPUs it serves share a single unified memory address space, so the GPU can reach the CPU's LPDDR5X and the CPU can reach the GPU's HBM as one coherent pool without explicit copies. For agentic and reasoning workloads this is what makes practices like KV-cache offload, where key-value attention state spills from GPU memory into the larger CPU memory pool, efficient enough to be useful. NVIDIA positions Vera not as a passive host but as a high-bandwidth data-movement engine tightly coupled to GPU execution.^[1]^[2]^[5]
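To give a sense of scale for KV-cache offload, the Python sketch below estimates the cache for one long request and the best-case time to move it across the link. The model dimensions (80 layers, 8 key-value heads of dimension 128, FP8 storage) and the 128K-token context are hypothetical. The sketch also assumes, as with earlier NVLink-C2C figures, that the quoted bandwidth is a bidirectional total, so half of it is available for a one-way copy; it ignores protocol overhead, so the times are lower bounds.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_value: int) -> int:
    """Size of the key and value tensors for one sequence."""
    # Factor of 2: one tensor for keys and one for values, per layer.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


def one_way_transfer_ms(num_bytes: int, bidirectional_gbps: float) -> float:
    """Best-case copy time if half the quoted bidirectional rate flows one way."""
    one_way_bytes_per_s = bidirectional_gbps / 2 * 1e9
    return num_bytes / one_way_bytes_per_s * 1e3


# Hypothetical model and request: 80 layers, 8 KV heads x 128 dims, FP8, 128K tokens.
size = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                      tokens=128 * 1024, bytes_per_value=1)
print(f"KV cache: {size / 1e9:.1f} GB")                                   # KV cache: 21.5 GB
print(f"Vera C2C (1.8 TB/s): {one_way_transfer_ms(size, 1800):.1f} ms")   # Vera C2C (1.8 TB/s): 23.9 ms
print(f"Grace C2C (900 GB/s): {one_way_transfer_ms(size, 900):.1f} ms")   # Grace C2C (900 GB/s): 47.7 ms
```

Whatever the exact per-direction split, doubling the link rate halves every transfer time, which is why the generational jump in NVLink-C2C bandwidth matters for offload.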

What is the Vera Rubin platform?

Vera is the host CPU of the Vera Rubin platform, NVIDIA's Blackwell successor for AI factories. The basic building block is the Vera Rubin Superchip, which pairs one Vera CPU with two Rubin GPUs over NVLink-C2C. In the flagship rack, the Vera Rubin NVL72, NVIDIA combines 36 Vera CPUs with 72 Rubin GPUs in a single liquid-cooled, NVLink-connected domain, a two-to-one GPU-to-CPU ratio.^[5]^[10]

Within that system Vera handles orchestration, data movement, and coherent memory access across the rack while the Rubin GPUs do the dense math. The platform also includes NVIDIA's Rubin CPX accelerators and BlueField DPUs for specific phases of inference and for networking and storage offload. Full details of the rack, the GPUs, and the networking are covered on the NVIDIA Vera Rubin page.^[5]

Separately, NVIDIA introduced a CPU-only Vera rack for customers who want Vera's general-purpose compute on its own. Launch coverage describes a configuration packing up to 256 liquid-cooled Vera CPUs per rack, with NVIDIA citing up to a 6x gain in CPU throughput over conventional x86 server racks.^[6]^[8]
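The rack-level totals follow from the per-CPU figures. A Python sketch that sums cores, threads, and maximum CPU memory for the NVL72 and the CPU-only rack, assuming every CPU is populated with the full 1.5 TB and using decimal terabytes; the 256-CPU count is the "up to" figure from launch coverage:

```python
from dataclasses import dataclass

CORES_PER_VERA = 88
THREADS_PER_VERA = 176
MAX_MEMORY_GB_PER_VERA = 1500  # up to 1.5 TB LPDDR5X per CPU


@dataclass
class RackTotals:
    cpus: int
    cores: int
    threads: int
    max_cpu_memory_tb: float


def rack_totals(cpus: int) -> RackTotals:
    """Aggregate Vera resources for a rack with `cpus` fully populated CPUs."""
    return RackTotals(
        cpus=cpus,
        cores=cpus * CORES_PER_VERA,
        threads=cpus * THREADS_PER_VERA,
        max_cpu_memory_tb=cpus * MAX_MEMORY_GB_PER_VERA / 1000,
    )


# Vera Rubin NVL72: 72 Rubin GPUs at two per Superchip means 36 Vera CPUs.
print(rack_totals(72 // 2))
# RackTotals(cpus=36, cores=3168, threads=6336, max_cpu_memory_tb=54.0)

# CPU-only Vera rack: up to 256 CPUs per launch coverage.
print(rack_totals(256))
# RackTotals(cpus=256, cores=22528, threads=45056, max_cpu_memory_tb=384.0)
```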

How does Vera differ from Grace?

Vera is the direct successor to NVIDIA's Grace CPU, the Arm-based data center processor introduced earlier in the Grace Hopper and Grace Blackwell generations. NVIDIA says Grace had shipped nearly 2.5 million units by the time Vera was announced, and it presents Vera as building on that base while moving to a fully custom core.^[1]

The generational jump is substantial on paper. Reporting comparing the two parts notes that Vera increases the core count from 72 to 88, roughly doubles memory capacity, raises memory bandwidth to 1.2 TB/s, and doubles the NVLink-C2C link to 1.8 TB/s. The biggest qualitative change is the switch from licensed Arm Neoverse cores to NVIDIA's own Olympus cores, which is what lets the company tune the microarchitecture specifically for agentic workloads.^[1]^[4]^[5]

Metric	Grace	Vera	Source basis
CPU cores	72 (Arm Neoverse)	88 (custom Olympus)	NVIDIA / reported^[1]^[4]^[5]
Threads	72	176 (Spatial Multithreading)	NVIDIA / reported^[2]^[4]
Memory bandwidth	Lower than Vera	Up to 1.2 TB/s	NVIDIA^[1]^[2]
NVLink-C2C	900 GB/s	Up to 1.8 TB/s	NVIDIA^[1]^[2]
Core source	Licensed Arm Neoverse	NVIDIA custom Olympus	NVIDIA / reported^[1]^[4]
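The generational ratios in the table can be computed directly. A Python sketch using only the metrics for which the table gives a number for both parts (memory bandwidth is left out because no Grace figure is listed):

```python
GRACE = {"cores": 72, "threads": 72, "nvlink_c2c_gbps": 900}
VERA = {"cores": 88, "threads": 176, "nvlink_c2c_gbps": 1800}


def gen_over_gen(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Ratio new/old for every metric both parts report."""
    return {k: new[k] / old[k] for k in old if k in new}


for metric, ratio in gen_over_gen(GRACE, VERA).items():
    print(f"{metric}: {ratio:.2f}x")
# cores: 1.22x
# threads: 2.44x
# nvlink_c2c_gbps: 2.00x
```

The thread ratio outpaces the core ratio because Grace runs one thread per core while Vera runs two.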

Reported specifications

The table below collects the figures cited at launch. Core platform numbers come from NVIDIA's own materials; figures marked as reported (cache sizes, host I/O standards, FP8 support, and socket power) come from independent launch coverage and should be treated as provisional until NVIDIA documents them.

Specification	Value	Source basis
Cores	88 custom Olympus cores	NVIDIA^[1]^[2]
Threads	176 (Spatial Multithreading)	NVIDIA^[2]
Instruction set	Arm-compatible; Armv9.2	NVIDIA / reported^[2]^[4]
Frontend	10-wide fetch and decode	NVIDIA^[2]
Branch predictor	Neural, 2 taken branches/cycle	NVIDIA^[2]
L2 cache	2 MB per core	Reported^[4]
L3 cache	164 MB unified	Reported^[4]
On-chip fabric	2nd-gen SCF, 3.4 TB/s bisection	NVIDIA^[2]
Memory	LPDDR5X (SOCAMM), up to 1.5 TB	NVIDIA^[2]
Memory bandwidth	Up to 1.2 TB/s (~14 GB/s per core, ~3x x86 DDR5)	NVIDIA^[1]^[2]^[9]
CPU-GPU link	NVLink-C2C, up to 1.8 TB/s coherent	NVIDIA^[1]^[2]
Host I/O	PCIe Gen 6, CXL 3.1	Reported^[4]
Precision	FP8 supported	Reported^[4]
Socket power	~450 W (as tested)	Reported^[4]

When is Vera available, and who is using it?

NVIDIA said at GTC Taipei that the Vera CPU and the Vera Rubin platform were in full production and on schedule to ship in the fall of 2026, with broader availability through the second half of the year. The company named early adopters spanning AI labs, finance, and cloud, including Anthropic, OpenAI, ByteDance, CoreWeave, Lambda, Nebius, Nscale, and Oracle Cloud Infrastructure, and listed system makers Cisco, Dell, HPE, Lenovo, and Supermicro as building Vera-based servers.^[1]^[2]^[6]

References

1. NVIDIA Newsroom. "NVIDIA Unveils Vera, the CPU for Agents." https://nvidianews.nvidia.com/news/nvidia-unveils-vera-the-cpu-for-agents
2. NVIDIA Technical Blog. "NVIDIA Vera CPU Delivers High Performance, Bandwidth, and Efficiency for AI Factories." https://developer.nvidia.com/blog/nvidia-vera-cpu-delivers-high-performance-bandwidth-and-efficiency-for-ai-factories/
3. NVIDIA Blog. "NVIDIA GTC Taipei at COMPUTEX: Live Updates on What's Next in AI." https://blogs.nvidia.com/blog/nvidia-gtc-taipei-computex-2026-news/
4. SQ Magazine. "NVIDIA Vera ARM CPU Outperforms Intel Xeon and AMD EPYC." https://sqmagazine.co.uk/nvidia-vera-arm-cpu-intel-amd-benchmarks/
5. NVIDIA Technical Blog. "Inside the NVIDIA Vera Rubin Platform: Six New Chips, One AI Supercomputer." https://developer.nvidia.com/blog/inside-the-nvidia-rubin-platform-six-new-chips-one-ai-supercomputer/
6. DataCenterKnowledge. "GTC Taipei: Nvidia Says Vera Rubin, Vera CPU on Track." https://www.datacenterknowledge.com/data-center-chips/nvidia-says-vera-rubin-vera-cpu-on-track-launches-dsx-os-to-run-ai-factories
7. DigiTimes. "Nvidia's Vera CPU is built for agents, not humans; Jensen Huang says it opens a market that never existed before." https://www.digitimes.com/news/a20260601VL217/jensen-huang-nvidia-gtc-ai-agent-cpu-2026.html
8. Tom's Hardware. "Nvidia unveils details of new 88-core Vera CPUs positioned to compete with AMD and Intel." https://www.tomshardware.com/pc-components/gpus/nvidia-unveils-details-of-new-88-core-vera-cpus-positioned-to-compete-with-amd-and-intel-new-vera-cpu-rack-features-256-liquid-cooled-chips-that-deliver-up-to-a-6x-gain-in-cpu-throughput
9. NVIDIA. "Next Gen Data Center CPU | NVIDIA Vera CPU." https://www.nvidia.com/en-us/data-center/vera-cpu/
10. NVIDIA. "NVIDIA Vera Rubin NVL72 | Co-Designed Infrastructure for Agentic AI." https://www.nvidia.com/en-us/data-center/vera-rubin-nvl72/
