NVIDIA Vera Rubin

NVIDIA Vera Rubin is the next generation of NVIDIA's data center AI computing platform, succeeding the NVIDIA Blackwell architecture. The platform combines the Vera CPU with the Rubin GPU into an integrated superchip designed for agentic AI, large-scale inference, and trillion-parameter model training. It was first publicly unveiled at Computex 2024, fleshed out at GTC March 2025, formally announced as a six-chip platform at CES January 2026, and rebranded as a seven-chip platform at GTC March 2026 after the addition of the Groq 3 LPU. NVIDIA shipped its first Vera Rubin samples to customers in February 2026 and has stated that production shipments will commence in the second half of 2026.

The platform is named after American astronomer Vera Rubin, whose observations of galaxy rotation curves provided the first compelling evidence for dark matter. The naming continues NVIDIA's tradition of honoring scientists on its data center GPU architectures, following Hopper (named for Grace Hopper), Ada Lovelace, and Blackwell (after David Blackwell).

Background

From Hopper to Blackwell

NVIDIA's data center GPU roadmap has followed an annual or near-annual cadence since the Hopper generation. The NVIDIA H100 (Hopper, 2022) established the transformer training and inference workloads that have come to define modern AI computing. Its successor, the Blackwell architecture launched in 2024, introduced a dual-die design with the GB200 superchip pairing two B200 GPU dies with a single Grace CPU.

Blackwell Ultra

NVIDIA Blackwell Ultra (GB300 series) followed in 2025 as a mid-generation refresh. Blackwell Ultra upgraded the HBM3E memory to 288 GB per GPU package, raised the TDP to 1,400 W, and pushed FP4 inference performance to approximately 15 PetaFLOPS per chip. The NVIDIA GB300 NVL72 rack configuration became the standard deployment unit for the largest hyperscaler clusters, housing 72 GPU packages and delivering roughly 1 ExaFLOP of FP4 compute.

Blackwell Ultra also introduced improvements to the MUFU softmax unit, improving attention calculation throughput by 2.5x over standard Blackwell, which reduced a latency bottleneck that had limited inference throughput at long sequence lengths.

Announcement timeline

The Vera Rubin disclosures unfolded across four separate keynote events, each adding more architectural detail:

Event	Date	Disclosure
Computex 2024 (Taipei)	June 2, 2024	First public reveal of "Rubin" GPU and "Vera" CPU as the post-Blackwell roadmap, alongside Blackwell Ultra. NVLink 6, CX9 SuperNIC, and the X1600 converged switch named for the first time. HBM4 confirmed.
GTC March 2025 (San Jose)	March 18, 2025	Vera Rubin Superchip shown publicly for the first time. Rubin Ultra (NVL576, 2027) and the Feynman successor architecture (2028) added to the roadmap. Initial NVL144 die-based naming proposed.
CES 2026 (Las Vegas)	January 5, 2026	NVIDIA announced the platform name "Vera Rubin," confirmed all chips were back from the fab, declared full production, and renamed the rack from VR200 NVL144 (die-based) back to VR200 NVL72 (package-based). The platform was presented as comprising six chips.
GTC March 2026 (San Jose)	March 16, 2026	Seven-chip configuration announced with the addition of the Groq 3 LPU, alongside the broader DGX SuperPOD and the five rack-scale system family.

This staged rollout, with each event adding more concrete specifications, gave hyperscalers an unusually long lead time on infrastructure planning. NVIDIA framed the Rubin generation as purpose-built for the agentic AI era, where models reason across long context windows, chain tool calls, and operate autonomously over extended compute sessions. These workloads place different demands on hardware than the batch inference or fine-tuning tasks that dominated earlier generations.

Naming and inspiration

The Vera Rubin platform takes its name from Vera Florence Cooper Rubin (1928 to 2016), an American astronomer who spent much of her career at the Carnegie Institution of Washington. Rubin's most significant contribution came from her meticulous measurements of galactic rotation curves throughout the 1970s, carried out in collaboration with astronomer Kent Ford.

In a spinning galaxy held together only by visible mass, stars near the outer edges should orbit more slowly than those closer to the center, just as outer planets in the solar system move more slowly than inner ones. Rubin and Ford found the opposite: stars at the periphery of galaxies moved at nearly the same speed as those near the center. The only explanation consistent with Newtonian gravity was the presence of a large quantity of invisible mass distributed throughout the galaxy. This was the first direct observational evidence for what physicists call dark matter.

Rubin's findings transformed cosmology. Dark matter is now understood to constitute roughly 27 percent of the energy content of the universe, compared to about 5 percent for ordinary matter. Despite this foundational contribution, Rubin never received the Nobel Prize in Physics, an omission widely regarded as one of the most consequential oversights in Nobel history. She did receive the National Medal of Science in 1993, the Gold Medal of the Royal Astronomical Society in 1996 (the first woman so honored since 1828), and election to the National Academy of Sciences.

The Vera C. Rubin Observatory in Chile, a major ground-based telescope facility dedicated to the Legacy Survey of Space and Time, was named in her honor in 2020.

NVIDIA's choice of name reflects an effort to commemorate scientists who advanced human knowledge through careful, data-intensive observation. The pairing of the CPU name (Vera) and GPU name (Rubin) as a single honorific preserves the full name in the product line rather than splitting it across generations.

Architecture overview

The Vera Rubin platform is organized around a superchip that integrates one Vera CPU with two Rubin GPU dies through NVLink-C2C, a high-bandwidth chip-to-chip interconnect. NVIDIA describes its approach with Vera Rubin as treating the data center as the unit of compute rather than the individual chip. The NVL72 rack, with 72 GPU packages unified by NVLink 6 into a single high-bandwidth domain, functions more like one large accelerator than a cluster of discrete devices. This architecture lets collective operations such as all-reduce passes during training proceed at memory speeds rather than network speeds.

By GTC 2026, the full deployment configuration involved seven chip families:

Chip	Function
Vera CPU	General computation, reinforcement learning environments, agentic orchestration
Rubin GPU	AI training and generation-phase inference
Rubin CPX GPU	Long-context prefill inference acceleration
NVLink 6 Switch	Scale-up GPU interconnect within a rack
ConnectX-9 SuperNIC	1.6 Tb/s scale-out networking per GPU
BlueField-4 DPU	Network offload, storage disaggregation, security
Spectrum-6 Ethernet Switch	Data center scale-out networking
Groq 3 LPU (added at GTC 2026)	High-throughput decode-phase inference for trillion-parameter models

At CES 2026, NVIDIA's official press release described the platform as "six chips" because Rubin CPX was treated as a Rubin GPU variant rather than a separate chip, and the Groq integration had not yet been announced. NVIDIA's GTC March 2026 press release expanded the count to seven by listing the Groq 3 LPU as a distinct platform component, even as some technical documentation continued to describe the core platform as six chips with Groq 3 LPU as an additional partnership.

Vera CPU

The Vera CPU is NVIDIA's second fully custom Arm-architecture processor for data centers, following the Grace CPU introduced with the Hopper generation. Where Grace used Arm Neoverse V2 cores licensed from Arm Holdings, Vera uses entirely custom microarchitecture cores designed by NVIDIA, called Olympus cores.

Olympus cores

The Olympus core implements the Armv9.2 instruction set architecture but departs from the Neoverse V2 design in its internal pipeline. Each Vera CPU die contains 88 Olympus cores. The cores support NVIDIA Spatial Multithreading, a form of simultaneous multithreading that allows up to 176 threads to execute concurrently per socket (two threads per core).

Per-core resources include 2 MB of L2 cache, with 164 MB of unified L3 cache shared across the socket. NVIDIA describes Olympus as the first general-purpose CPU to support FP8 precision natively in the integer and floating-point pipelines, a feature aimed at small-model serving and reinforcement learning environments where mixed precision is useful even on the host side.

The Scalable Coherency Fabric, now in its second generation, connects all 88 cores to a shared L3 cache and the memory subsystem with 3.4 TB/s of bisection bandwidth.

Memory and connectivity

Vera CPU supports up to 1.5 TB of LPDDR5X memory per socket, a threefold increase over Grace's 480 GB maximum. The memory subsystem sustains up to 1.2 TB/s of bandwidth. NVIDIA reports that the CPU sustains over 90 percent of peak memory bandwidth under realistic workloads, a figure that compares favorably with typical x86 server processors that commonly sustain 60 to 70 percent of rated bandwidth.

The GPU-facing interface uses the second generation of NVLink-C2C at 1.8 TB/s of coherent bandwidth per socket. In the Vera Rubin superchip, the Vera CPU and two Rubin GPU dies share a unified memory address space through this interconnect. CPU and GPU threads can access each other's memory without explicit copy operations, which simplifies programming models for heterogeneous workloads.

NVIDIA's marketing materials cite approximately 2x performance over the Grace generation at comparable power, and approximately 50 percent faster reinforcement-learning evaluation cycles versus competing server CPU platforms. These are NVIDIA-published figures and have not yet been independently benchmarked at scale.

Vera CPU rack

GTC 2026 introduced a CPU-only rack variant that houses 256 liquid-cooled Vera CPUs in a single rack, intended for reinforcement-learning rollouts where many environment instances must run in parallel without requiring GPU compute. NVIDIA cited up to 6x throughput on RL workloads versus a comparably configured Grace-based rack.

Rubin GPU

The Rubin GPU (internally designated VR200) is a dual-die design built on TSMC's N3P process node, a 3 nm-class technology. Each of the two compute dies is reticle-sized, approaching the largest possible single die on the lithography step used. The two dies are assembled on a CoWoS-L (Chip on Wafer on Substrate, Large) interposer alongside two smaller I/O tiles that handle the SerDes interfaces for NVLink, PCIe, and NVLink-C2C connections.

Transistor count and die area

The combined Rubin GPU package contains 336 billion transistors across its two compute dies, compared with approximately 208 billion in the NVIDIA B200. This 1.6x increase in transistor count reflects both the process node improvement from TSMC's 4NP (used in Blackwell) to N3P and the larger reticle layout in the CoWoS-L assembly.

Packaging and die organization

The CoWoS-L (Chip on Wafer on Substrate, Large) interposer used for Rubin places two compute dies side by side, with two smaller I/O tiles arranged at the periphery. The I/O tiles handle the off-package SerDes for NVLink 6, PCIe Gen 6, and NVLink-C2C. Decoupling the I/O from the compute dies lets NVIDIA hold the I/O tiles on a more mature node while pushing the compute dies to N3P, which optimizes both yield and cost. This is a deeper version of the partitioning approach that AMD has used for several generations on its Instinct line and that Intel adopted on Ponte Vecchio, and it has become the dominant packaging strategy for reticle-pushing accelerators across the industry.

The HBM4 stacks sit at the edges of the interposer, four stacks per side, with a 2048-bit interface per stack. Total HBM signal count across all eight stacks is in the tens of thousands of pins, and the substrate routing required to fan all of those signals out to the compute dies is one of the most demanding parts of the design. NVIDIA worked closely with TSMC on substrate routing rule changes to enable the Rubin layout, with reporting in 2025 that early CoWoS-L test substrates carrying the full Rubin signal count had higher than usual rework rates that gradually fell as TSMC's process matured.

Tensor cores and Transformer Engine

The Rubin GPU contains 224 Streaming Multiprocessors, each equipped with fifth-generation Tensor Cores. The fifth-generation design adds a hardware-accelerated adaptive compression unit, the third-generation NVIDIA Transformer Engine, which dynamically selects numerical formats for each layer of a transformer model to preserve accuracy while maximizing throughput. The primary compute format is NVFP4, a 4-bit floating point type introduced with Blackwell Ultra and now the standard for inference at scale on NVIDIA hardware.

NVFP4 uses a two-level scaling scheme paired with hardware-accelerated quantization. NVIDIA reports that NVFP4 delivers near-FP8 accuracy (typically within 1 percent on representative inference benchmarks), reduces memory footprint by approximately 1.8x relative to FP8, and approximately 3.5x relative to FP16.

Peak performance per Rubin GPU package, as published by NVIDIA:

Precision	Performance
NVFP4 (inference)	50 PetaFLOPS
NVFP4 (training)	35 PetaFLOPS
FP8	approximately 17 PetaFLOPS
FP16 / BF16	approximately 8 PetaFLOPS

The Tensor Core path is tightly coupled with expanded special-function units and execution pipelines designed for the attention, activation, and sparse-compute paths common in modern reasoning models.

HBM4 memory

Each Rubin GPU package contains 8 stacks of HBM4 (High Bandwidth Memory, 4th generation), each stack using a 12-Hi (12 layer) configuration. Total memory capacity per package is 288 GB. The HBM4 interface operates at 6.4 GT/s per pin across a 2048-bit memory bus, delivering approximately 22 TB/s of aggregate memory bandwidth per package.

This compares with the HBM3E memory in Blackwell Ultra, which delivered approximately 8 TB/s per package. The roughly 2.8x increase in memory bandwidth is one of the largest generational jumps in NVIDIA's roadmap history.

At the rack level, the 72-GPU NVL72 configuration aggregates 20.7 TB of HBM4 and 1.6 PB/s of HBM4 bandwidth.

Rubin CPX

Alongside the primary Rubin GPU, NVIDIA introduced a variant called the Rubin CPX (Context Phase eXtreme). While the standard Rubin GPU is optimized for both training and generation-phase inference, the CPX is purpose-built for the context or prefill phase of inference, where the model processes a large input prompt before generating any output tokens.

The CPX uses 128 GB of GDDR7 memory rather than HBM4, prioritizing high-bandwidth access to a smaller pool. It delivers 30 PetaFLOPS of NVFP4 compute and provides up to 3x faster attention calculation compared with the GB300 NVL72 baseline, according to NVIDIA's published comparisons. The lower cost of GDDR7 relative to HBM4 makes the CPX economical for the context workload, which consumes large amounts of compute but does not need the same memory capacity as the generation phase.

NVIDIA frames Rubin CPX as the answer to million-token context windows used in coding agents, video generation models, and long-form reasoning tasks where the prefill pass dominates total inference cost.

Rack-level CPX deployment

In the Vera Rubin NVL144 CPX rack, 144 Rubin CPX GPUs handle context processing while 144 standard Rubin GPUs handle token generation, alongside 36 Vera CPUs. The combined rack delivers 8 ExaFLOPS of NVFP4 compute, 7.5x more than a GB300 NVL72, with 100 TB of total memory and approximately 1.7 PB/s of memory bandwidth. This is a different SKU from the standard Vera Rubin NVL72 GPU rack, which uses 72 standard Rubin GPU packages without CPX.

NVLink 6

NVLink 6 is the sixth generation of NVIDIA's proprietary high-speed GPU interconnect, introduced with the Vera Rubin platform. It doubles the per-GPU bandwidth compared with NVLink 5 used in Blackwell, from 1.8 TB/s to 3.6 TB/s bidirectional.

SerDes and switch architecture

NVLink 6 uses 400 Gbps SerDes lanes, compared with the 200 Gbps SerDes in NVLink 5. The NVLink 6 Switch chip delivers 28.8 TB/s of aggregate switch bandwidth per tray and provides 260 TB/s of total scale-up connectivity when 72 Rubin GPUs are unified in the NVL72 configuration.

All 72 GPUs in the NVL72 rack communicate through an all-to-all NVLink topology managed by NVSwitch 6 blades. Each GPU can reach every other GPU at full NVLink bandwidth without traversing the slower PCIe or Ethernet fabric. This flat topology eliminates the bandwidth degradation that multi-hop network paths introduce and is what allows NVIDIA to describe the NVL72 rack as a single performance domain.

In-network compute

NVLink 6 Switch includes hardware support for SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) in-network compute. The switch can perform floating-point reduction operations directly on data in flight, delivering 14.4 TFLOPs of FP8 in-network compute per switch tray. This reduces the volume of data that must traverse the network during all-reduce passes in distributed training and cuts collective-operation latency.

NVLink-C2C and CPU-GPU coherence

NVLink-C2C (Chip-to-Chip), in its second generation, provides coherent interconnect between the Vera CPU and the Rubin GPU dies within the superchip at 1.8 TB/s. The coherence protocol means CPU and GPU can share data structures without software-managed copies, which is particularly useful for agentic AI frameworks where orchestration logic running on the CPU needs low-latency access to KV caches resident on GPU memory.

Networking platform

The Vera Rubin platform pairs the scale-up NVLink 6 fabric with a refreshed scale-out networking stack.

ConnectX-9 SuperNIC

ConnectX-9 SuperNICs deliver 1.6 Tb/s of per-GPU scale-out bandwidth using either Ethernet (Spectrum-X) or InfiniBand (Quantum-X800). The card supports programmable RDMA for low-latency, GPU-direct networking. Each Vera Rubin compute blade typically provisions one ConnectX-9 per Rubin GPU.

BlueField-4 DPU

BlueField-4 integrates a 64-core Grace-based control CPU, hardware-accelerated offload engines for storage and security, and an 800 Gb/s network interface. BlueField-4 anchors the storage tier of the platform through a dedicated STX storage rack, which exposes large pools of NVMe storage to the compute racks at line rate.

Spectrum-6 Ethernet Switch

Each Spectrum-6 ASIC delivers 102.4 Tb/s of switch bandwidth and supports lossless Ethernet with adaptive routing tuned for collective operations. Spectrum-6 deploys in a dedicated SPX Ethernet rack that aggregates traffic across an entire AI factory.

Silicon photonics and co-packaged optics

Vera Rubin is the first NVIDIA generation engineered around silicon photonics with co-packaged optics, the technology NVIDIA expects to use for the scale-out fabric between racks as cluster sizes push past the million-GPU mark. The Quantum-X and Spectrum-X photonics switch families, unveiled at GTC 2025 alongside the Rubin disclosure, integrate optical engines directly onto the switch ASIC package rather than using pluggable optical modules. This eliminates the discrete electrical-to-optical transceiver step that consumed both power and real estate in the prior Quantum-2 InfiniBand and Spectrum-4 Ethernet generations.

Switch family	Aggregate bandwidth	Port configuration	Availability
Quantum-X Photonics (InfiniBand)	115 Tb/s per Q3450-LD switch	144 ports at 800 Gb/s	Second half of 2025
Spectrum-X Photonics (Ethernet)	Up to 409.6 Tb/s	512 ports at 800 Gb/s or 128 ports at 800 Gb/s configurations	Second half of 2026

NVIDIA cites four headline improvements from the photonics design: 4x fewer lasers per port, 3.5x better power efficiency relative to pluggable optics, 63x better signal integrity, and 10x better network resiliency at scale. The 1.3x faster deployment claim reflects the reduced cabling and pluggable-module assembly burden in the field. The optical engines are based on a 200 Gbps-per-lane SerDes mated to TSMC's COUPE (compact universal photonic engine) process, with vertical-cavity surface-emitting laser arrays packaged on the same substrate as the switch ASIC.

For a million-GPU AI factory, NVIDIA estimates that the shift from pluggable optics to co-packaged optics saves on the order of tens of megawatts of facility power that would otherwise be consumed by the transceivers themselves, freeing that power budget for more compute. The first photonics-equipped racks are expected to deploy alongside Vera Rubin in late 2026 for InfiniBand and during the Rubin Ultra ramp in 2027 for the Ethernet side.

Production timeline and roadmap

The Vera Rubin development timeline followed NVIDIA's now-standard public roadmap pattern of progressive disclosure across keynotes:

Date	Event
June 2024	Computex 2024: First public reveal of Rubin GPU, Vera CPU, NVLink 6, CX9 SuperNIC, X1600 switch concept
March 2025	GTC 2025: Vera Rubin Superchip shown for the first time; Rubin Ultra (2027) and Feynman (2028) added to roadmap
Late 2025	CFO Colette Kress, on the Q3 FY26 earnings call (November 2025), reported $500B of Blackwell-plus-Rubin revenue visibility through end of calendar 2026
January 2026	CES 2026: Vera Rubin platform officially launched as a six-chip platform; full production declared; rack rebranded from VR200 NVL144 to VR200 NVL72
February 2026	First Vera Rubin samples shipped to early customers (per CFO commentary on Q4 FY26 earnings call)
March 2026	GTC 2026: Seven-chip platform announced with Groq 3 LPU; full DGX SuperPOD and five rack-system family detailed
April 2026	Meta and CoreWeave expand AI cloud agreement by $21 billion, naming Vera Rubin as part of the initial deployment
May 2026	Q1 FY27 earnings reported May 20; NVIDIA reiterates Rubin ramp and pilot production at ODM partners ahead of first hyperscaler shipments
Mid-2026	ODM pilot production at Foxconn and Wistron; first racks slated for delivery to North American hyperscalers in July
H2 2026	Production shipments to early customers commence; rack-scale ramp anticipated in the September to October window

NVIDIA confirmed that manufacturing uses TSMC's N3P process for the Rubin GPU compute dies, with CoWoS-L packaging. At CES in January 2026, Jensen Huang stated that all necessary chips were back from the fab, indicating silicon validation was complete. Full production was confirmed during the same keynote.

CFO Colette Kress, on the Q4 FY26 earnings call in February 2026, told investors that NVIDIA had "shipped our first Vera Rubin samples to customers earlier this week" and that the company was on track to commence production shipments in the second half of 2026. The same call provided the first concrete signal that hyperscalers had silicon in hand. NVIDIA's Q1 FY27 earnings, scheduled for May 20, 2026, were widely expected by analysts to confirm the rack-scale ramp window and reaffirm the gross-margin guidance of approximately 75 percent that the company had carried through the Blackwell production curve.

NVL72 rack system

The Vera Rubin NVL72 is the primary rack-scale deployment unit for the platform. It continues the architecture established with the GB200 NVL72 and GB300 NVL72, using the same Oberon rack chassis with cooling modifications.

NVIDIA initially planned to call this rack VR200 NVL144 to count GPU dies (72 packages times 2 dies per package). In late December 2025, NVIDIA reverted to the package-based NVL72 naming for marketing consistency with the prior Blackwell generation. References published before that date often use NVL144; references published from CES 2026 onwards use NVL72.

Configuration

Parameter	Value
Rubin GPU packages	72
Rubin GPU compute dies (total)	144
Vera CPUs	36
HBM4 memory (total)	20.7 TB
HBM4 bandwidth	1.6 PB/s
LPDDR5X memory	54 TB
NVSwitch 6 blades	9
Compute blades	18
Total transistors	approximately 220 trillion
NVFP4 inference performance	3.6 ExaFLOPS
NVFP4 training performance	2.5 ExaFLOPS
Scale-up bandwidth	260 TB/s

The rack is fully liquid cooled and accepts coolant at up to 45 degrees Celsius supply temperature, which keeps it compatible with facility cooling systems that do not chill water below data center ambient.

A meaningful operational improvement over Blackwell: NVIDIA reports that the NVL72 rack can be assembled in approximately 6 minutes, compared with around 100 minutes for the GB200 NVL72. This is achieved through pre-integrated cable management and a revised blade design.

DGX Vera Rubin SuperPOD

The DGX Vera Rubin SuperPOD groups 14 NVL72 racks together with a shared high-speed network fabric, totaling 1,008 Rubin GPUs and 504 Vera CPUs. NVIDIA cites 50.4 ExaFLOPS of NVFP4 compute and 1,046 TB of fast memory at this scale.

A separate DGX Rubin NVL8 SuperPOD configuration aggregates 64 NVL8 systems (8 Rubin GPUs each) for a total of 512 Rubin GPUs, suitable for workloads that need fewer GPUs unified at NVLink speed.

Rubin Ultra

Rubin Ultra is the planned mid-generation refresh of the Rubin platform, targeted for the second half of 2027. It follows the pattern established by Blackwell Ultra as a substantially upgraded variant on the same platform foundations. NVIDIA disclosed Rubin Ultra at GTC March 2025 alongside the Vera Rubin announcement.

GPU architecture

The Rubin Ultra GPU (VR300) moves from two to four reticle-sized compute chiplets per package. NVIDIA confirmed at GTC 2025 that Rubin Ultra would carry 1 TB of HBM4E across 16 stacks per package, with packaging on TSMC's CoWoS-L. TrendForce reporting in April 2026 indicated that NVIDIA decided to retain the multi-die assembly approach rather than attempt a monolithic super-die for the Ultra refresh.

Key Rubin Ultra GPU specifications, as published or projected:

Parameter	Rubin (VR200)	Rubin Ultra (VR300)
Compute dies	2	4
FP4 inference (per package)	50 PetaFLOPS	approximately 100 PetaFLOPS
Memory	288 GB HBM4	1 TB HBM4E
Memory stacks	8	16
Memory bandwidth (per package)	22 TB/s	approximately 32 TB/s
TDP (per package)	approximately 1,800 W	approximately 3,600 W
NVLink generation	NVLink 6	NVLink 7

The memory upgrade to HBM4E (an enhanced version of HBM4 with higher pin speeds) and 16 stacks per package raises total memory per GPU from 288 GB to approximately 1 TB.

Kyber rack system

The Rubin Ultra platform introduces a new rack architecture called the Kyber rack, which departs significantly from the Oberon design used for Blackwell, Blackwell Ultra, and standard Rubin.

In the Kyber design, compute blades rotate 90 degrees into a vertical blade form factor for higher density. The rack holds four canisters (sometimes called pods), each containing 18 compute blades, for a total of 576 GPU dies in the NVL576 configuration. A PCB backplane replaces the copper cable interconnects used in Oberon, a change made necessary by the higher density and reduced available routing space.

The Kyber rack has an estimated total power draw of 600 kW, making it the highest-power computing platform NVIDIA has designed for commercial deployment. Facilities deploying Rubin Ultra NVL576 racks must be engineered for power densities roughly equivalent to several hundred typical residential homes per rack.

System-level performance

NVIDIA cites a target of approximately 14x higher inference performance than the GB300 NVL72 for the Rubin Ultra NVL576 system, reflecting both the per-GPU performance improvement and the higher GPU count per rack. The company has published a target of approximately 15 ExaFLOPS of dense FP4 inference and 5 ExaFLOPS of FP8 training per Rubin Ultra NVL576 rack, with 365 TB of fast memory and 4.6 PB/s of HBM4E bandwidth.

Feynman successor

Feynman is the planned successor to the Rubin generation, targeting a 2028 launch. It is named after theoretical physicist Richard Feynman. NVIDIA announced Feynman's existence at GTC March 2025 alongside the Rubin announcement.

At GTC 2025, Jensen Huang stated only that "our next generation will be named after Feynman" without providing detailed specifications. By GTC 2026, NVIDIA had added several headline elements to the Feynman roadmap: 3D die stacking (NVIDIA's first use of stacked GPU dies), custom HBM memory co-designed with the GPU, and fabrication on TSMC's A16 (1.6 nm-class) node.

Feynman is expected to be paired with the Rosa CPU, a successor to Vera in the same way Vera succeeded Grace. The networking stack is projected to advance to NVLink 8, NVSwitch 8, ConnectX-10, BlueField-5, and Spectrum-7 Ethernet. These details remain forward-looking and may evolve as Feynman moves through tape-out and validation.

Performance projections

NVIDIA provides performance comparisons relative to prior generations at the system level (NVL72 or equivalent rack), rather than on a per-chip basis, to reflect real-world deployment configurations. The numbers below are NVIDIA-published marketing figures and should be read as such.

Rubin vs. Blackwell (per-GPU comparison)

Metric	Blackwell Ultra (GB300)	Rubin (VR200)	Improvement
FP4 inference	15 PetaFLOPS	50 PetaFLOPS	3.3x
FP4 training	10 PetaFLOPS	35 PetaFLOPS	3.5x
Memory capacity	288 GB HBM3E	288 GB HBM4	Same
Memory bandwidth	8 TB/s	22 TB/s	2.8x
NVLink bandwidth	1.8 TB/s	3.6 TB/s	2x
TDP (typical)	1,400 W	approximately 1,800 W	1.3x

The 3.3x FP4 inference figure reflects per-GPU compute with comparable TDP growth, derived from SemiAnalysis analysis of NVIDIA disclosures. NVIDIA's own marketing materials cite a 5x inference improvement at the rack level, measured by comparing the VR200 NVL72 against a GB200 NVL72 (rather than against Blackwell Ultra), and accounting for the disaggregated CPX configuration where applicable.

Rack-level comparison

Configuration	NVFP4 Inference	Memory	Memory BW
GB200 NVL72 (Blackwell)	approximately 720 PetaFLOPS	13.5 TB HBM3E	approximately 570 TB/s
GB300 NVL72 (Blackwell Ultra)	approximately 1 ExaFLOP	20.7 TB HBM3E	approximately 580 TB/s
VR200 NVL72 (Rubin)	3.6 ExaFLOPS	20.7 TB HBM4	1.6 PB/s
VR200 NVL144 CPX (Rubin + CPX)	8 ExaFLOPS	100 TB	approximately 1.7 PB/s
VR300 NVL576 (Rubin Ultra, 2027 target)	approximately 15 ExaFLOPS	approximately 365 TB	approximately 4.6 PB/s

Efficiency claims

NVIDIA claims approximately 10x higher inference throughput per watt for the Vera Rubin NVL72 versus the GB200 NVL72 generation. Cost per token at scale is projected by NVIDIA at approximately one-tenth of Blackwell costs, a figure that combines the throughput gains with anticipated reductions in HBM bandwidth costs as HBM4 volume production matures. With the addition of the Groq 3 LPX rack at GTC 2026, NVIDIA claimed up to 35x higher tokens-per-megawatt versus the Blackwell NVL72 alone, at a target cost point of approximately $45 per million tokens for trillion-parameter models. Both of these are vendor-published numbers and have not yet been independently validated.

Power requirements

The Rubin GPU TDP is approximately 1,800 W per package in its standard configuration. Reports in early 2026 indicated that NVIDIA raised some performance targets to widen the gap with AMD's Instinct MI400 series, lifting boost clocks and memory bandwidth on certain SKUs at a cost of an additional roughly 500 W per accelerator, bringing those configurations to 2,300 W per GPU.

At the rack level, the NVL72 with 2,300 W GPUs has a TDP of up to 220 kW. The standard 1,800 W configuration draws approximately 160 to 170 kW per rack.

For Rubin Ultra, the higher TDP of 3,600 W per package scales the NVL576 Kyber rack to an estimated 600 kW total. Infrastructure partners including CoreWeave, Lambda, Nebius, Oracle Cloud Infrastructure, and Together AI have announced plans to engineer facilities around 800-volt power distribution to accommodate these loads. NVIDIA has also developed DSX Max-Q power management software that allows up to 30 percent more infrastructure deployment within fixed power budgets by dynamically managing GPU power states.

The DSX Flex capability extends the platform to grid-flexible operation, allowing AI factories to modulate compute load in response to grid availability signals. NVIDIA describes this as a way to access what it calls 100 gigawatts of stranded grid capacity.

Customers and deployment

NVIDIA announced customer commitments for Vera Rubin from a broad set of AI labs, cloud providers, and system manufacturers. The most concrete commitments came in the November 2025 to April 2026 window, when several multi-billion-dollar capacity agreements named Vera Rubin explicitly as the initial deployment target.

AI labs and model developers

Anthropic, Meta, Mistral AI, and OpenAI were named on stage at GTC 2026 as platform customers.
The Microsoft, NVIDIA, and Anthropic three-way partnership announced on November 18, 2025 is the largest single Vera Rubin commitment publicly disclosed: Anthropic agreed to purchase up to $30 billion of compute capacity from Microsoft Azure, with an initial deployment of up to 1 GW of NVIDIA Grace Blackwell and Vera Rubin systems. As part of the same announcement, NVIDIA agreed to invest up to $5 billion in Anthropic, with Microsoft committing up to $10 billion. Anthropic was valued at approximately $350 billion in connection with the deal.

Meta and CoreWeave

On April 9, 2026, CoreWeave announced a $21 billion expansion of its AI cloud capacity agreement with Meta Platforms, running through December 2032. The expansion built on a prior $14.2 billion contract signed in 2025, bringing total committed value to roughly $35 billion. CoreWeave's announcement specifically named the NVIDIA Vera Rubin NVL72 as a meaningful share of the initial deployment, alongside ongoing GB300 NVL72 buildouts.

The deal is structured as take-or-pay committed capacity rather than elastic cloud usage. Meta pays for guaranteed GPU access regardless of utilization, locking in supply ahead of demand and giving CoreWeave the financial runway to prefinance facility construction and Rubin rack procurement. Meta cannot reduce its commitment if internal data centers come online faster than expected, and CoreWeave cannot delay delivery without penalties. The agreement targets inference at scale across Meta's Facebook, Instagram, WhatsApp, and Threads properties, not training, which Meta continues to run primarily on its own infrastructure.

The Meta-CoreWeave commitment is the largest publicly disclosed Vera Rubin order outside the Microsoft-Anthropic agreement and is widely cited by analysts as evidence that hyperscaler buyers are still willing to commit capital years in advance for Rubin-class compute despite the Stargate slowdown.

Stargate and the Microsoft-OpenAI realignment

OpenAI's Stargate program, originally announced in early 2025 as a multi-hundred-billion-dollar AI infrastructure buildout, was a heavy intended consumer of Rubin-class hardware. Through 2026 the picture changed materially:

In April 2026, Microsoft took over the 230 MW Stargate Norway site at Narvik that had been earmarked for OpenAI, contracting an additional 30,000 NVIDIA Vera Rubin GPUs from the neocloud Nscale on top of a prior $6.2 billion site commitment, with delivery scheduled for 2027. OpenAI did not finalize an agreement with Nscale and indicated it would access compute at the site through Microsoft Azure if at all.
In February 2026, OpenAI revised its long-range infrastructure spending guidance to investors, reframing a $600 billion outlay through 2030 as the working figure compared with around $1.4 trillion in long-range commitments it had previously telegraphed. OpenAI also paused a planned UK Stargate site, citing energy costs.

These reversals do not eliminate OpenAI's Vera Rubin commitments; OpenAI remains on the published Vera Rubin customer list. They do indicate that the original Stargate scale projection was elastic and that hyperscaler-led deployments (Microsoft, AWS, Google Cloud) are doing more of the actual buildout than originally pitched.

Sovereign AI deployments

Vera Rubin has become the named platform for several national-scale sovereign AI buildouts that emerged through late 2025 and into 2026. Saudi Arabia's HUMAIN project, a multi-billion-dollar program to build an Arabic-native AI ecosystem, has committed to NVIDIA hardware as its primary compute substrate, with Vera Rubin slated for the larger sites once production reaches volume. The United Arab Emirates announced Stargate UAE, a 1 GW compute cluster project led by G42 and operated jointly with OpenAI and Oracle. The first 200 MW phase, expected online in 2026, will use NVIDIA GB300 systems, with later phases earmarked for Vera Rubin once H2 2026 production shipments are in volume.

In Europe, Nebius confirmed Vera Rubin NVL72 availability across its United States and European regions in H2 2026, positioning the platform for European Union customers facing data-residency constraints. Nscale's Norway site, taken over by Microsoft in April 2026, will host the largest concentrated Rubin deployment in Europe with 30,000 GPUs scheduled for 2027 delivery. France's Mistral AI, named at GTC 2026, has confirmed Rubin capacity for its frontier model training through a combination of Microsoft Azure and Scaleway partnerships.

NVIDIA describes the sovereign AI category as a structurally different opportunity from hyperscaler sales: governments and nationally backed consortia tend to commit longer in advance, accept higher unit prices in exchange for delivery priority, and bundle hardware with NVIDIA's enterprise software stack. The company's public guidance treats sovereign AI as a high-double-digit-billion-dollar revenue line through 2028.

Cloud providers and neoclouds

AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure have announced plans to offer Vera Rubin instances. CoreWeave, Crusoe, Lambda, Nebius, Nscale, and Together AI are among the earliest neocloud deployments. Nebius has confirmed Vera Rubin NVL72 availability in the United States and Europe from H2 2026.

CoreWeave is positioned as one of the first cloud providers to take delivery of Vera Rubin racks in H2 2026 and has developed a Rack Lifecycle Controller, a Kubernetes-native orchestrator that treats an entire Vera Rubin NVL72 rack as a single programmable entity. The controller spans power delivery, liquid cooling, network integration, and firmware management on the rack, exposing rack-level health and reconfiguration primitives to upper-layer schedulers. CoreWeave passed 1 GW of operating data center capacity in early 2026, on track to exceed 1.7 GW by year-end, with Vera Rubin deployments concentrated at the newer self-build sites engineered for 800-volt distribution and direct-to-chip liquid cooling. Lambda, named on the original CES 2026 customer list, has committed to Vera Rubin instances for its training cluster product line, with delivery in H2 2026 and pricing positioned below the hyperscaler reserved-instance rates.

System manufacturers

OEM partners building Vera Rubin systems include Cisco, Dell Technologies, Hewlett Packard Enterprise, Lenovo, and Supermicro. Contract manufacturers ASUS, Foxconn, GIGABYTE, Inventec, Pegatron, QCT, Wistron, and Wiwynn are producing boards and systems for cloud providers and hyperscalers.

Capital and supply chain

Vera Rubin sits at the center of NVIDIA's near-term revenue trajectory. On the Q3 FY26 earnings call (November 19, 2025), CFO Colette Kress stated that NVIDIA had visibility into approximately $500 billion of combined Blackwell and Rubin revenue from the start of calendar 2025 through the end of calendar 2026, of which approximately $150 billion had already shipped. By the Q4 FY26 call (February 25, 2026), the company reported quarterly data center revenue of approximately $50 billion and reiterated that Rubin samples were now in customer hands. Q1 FY27 results were reported on May 20, 2026, with analyst consensus pointing to roughly $78 billion in revenue, accelerating year-over-year growth above 70 percent and excluding any contribution from China data center compute.

The Rubin generation also reshapes how NVIDIA reports revenue mix. Where Hopper and early Blackwell were dominated by individual GPU and HGX baseboard sales, Rubin is sold in much larger units: the standard SKU is a full NVL72 rack, and NVIDIA's revenue per Rubin shipment carries far more value from the rack-integrated networking, switches, and software than from the GPU silicon alone. This has lengthened sales cycles but raised the average selling price per design win.

TSMC has expanded CoWoS-L capacity through 2027 to accommodate Rubin and Rubin Ultra orders, which dominate advanced packaging allocation at TSMC. The transition from Blackwell's CoWoS-L footprint to Rubin's larger 4x reticle layout consumes more wafer-equivalent capacity per package, so the same wafer-out throughput at TSMC produces fewer Rubin packages than Blackwell packages. This is one reason Rubin Ultra's four-die assembly was a significant point of supply-chain analysis: the question of whether four reticle-sized dies could fit on a single CoWoS-L substrate or whether NVIDIA would split into two paired packages was unresolved through early 2026, with TrendForce reporting in April 2026 that the dual-substrate approach (Rubin Ultra retains a multi-die on-package layout rather than splitting into paired packages) remains the path forward.

HBM4 supply, which had been a concern in mid-2025, was reported on track for Rubin's H2 2026 ramp by the end of 2025, with NVIDIA repeatedly refuting earlier delay reports. Samsung, SK Hynix, and Micron all qualified HBM4 production for Rubin, with Hynix taking the largest share of initial volume. A reported NVIDIA decision in early 2026 to push HBM4 pin speeds higher (above the JEDEC reference 6.4 GT/s for some SKUs) introduced additional qualification work but ultimately did not delay the H2 2026 ramp.

NVIDIA also disclosed in December 2025 that it had signed an approximately $20 billion licensing and talent agreement with Groq, the inference startup, to integrate Groq's LPU technology into the Vera Rubin platform. The Groq 3 LPU manufactured on Samsung 4nm is the first chip to emerge from that agreement, and the Groq 3 LPX rack pairs with Vera Rubin NVL72 inside the same AI factory. This is an unusual arrangement: NVIDIA has historically positioned its accelerator as the complete answer for inference, and integrating a competitor's silicon directly into its roadmap is a notable strategic shift.

Software stack

The Vera Rubin platform ships with a refreshed CUDA and inference software stack tuned for agentic AI workloads.

CUDA 13 and runtime

CUDA 13 is the baseline release for Rubin. It introduces native programming-model support for the third-generation Transformer Engine, the new NVFP4 quantization helpers, and the disaggregated CPX scheduling primitives that the Dynamo orchestrator builds on. The runtime extends the unified memory model first introduced with Grace Hopper to a true tri-tier addressing scheme that spans LPDDR5X (Vera CPU), HBM4 (standard Rubin GPU), and GDDR7 (Rubin CPX) without requiring application-side copies. Migration of pages between tiers is handled by hardware coherence protocols on NVLink-C2C and by software prefetch hints on NVLink 6.

CUDA 13 also raises the per-thread-block resource limits on Rubin to take advantage of the larger shared-memory partitions in the fifth-generation SM design. Code paths written for Blackwell remain binary-compatible through CUDA 13's forward-compatible PTX layer, so existing Blackwell deployments can move to Rubin without recompilation, though specific kernels often need retuning to capture the full memory-bandwidth uplift.

Third-generation Transformer Engine

The third-generation Transformer Engine extends the FP8 and FP4 mixed-precision flow with hardware-accelerated adaptive compression, and ties more tightly into the attention kernels that dominate inference time on long contexts. Per-tensor scaling has been refined so that the format selection runs at the granularity of individual transformer layers, with a fallback to BF16 on layers that show numerical sensitivity during calibration. The Transformer Engine library also exposes a programming interface for sparse mixture-of-experts attention, which has become more relevant as model architectures shift toward MoE designs at the trillion-parameter scale.

Dynamo and disaggregated serving

NVIDIA Dynamo is the inference orchestrator built on top of CUDA 13. It is designed specifically to manage Rubin plus Rubin CPX disaggregated serving. Dynamo's Smart Router routes prefill traffic to CPX nodes and decode traffic to standard Rubin nodes, while the Dynamo GPU Planner autoscales the CPX-to-Rubin ratio in response to incoming traffic shape. KV cache transfer between prefill and decode nodes uses GPU-direct RDMA over ConnectX-9, with a software path that explicitly sequences the cache layer-by-layer to overlap with the first decode steps. NVIDIA reports that this overlapped transfer reduces the effective time-to-first-token tax of disaggregation to under 5 percent of total request latency on representative long-context workloads.

Power, telemetry, and orchestration

DSX Max-Q and DSX Flex provide rack and facility-level power management for AI factories built on Vera Rubin. DSX Max-Q applies dynamic frequency and voltage scaling across an entire NVL72 rack, with the goal of fitting more deployed GPUs within a fixed facility power envelope. NVIDIA cites up to 30 percent more deployed compute under fixed power budgets, though the exact figure depends on workload shape and how much headroom an operator was carrying. DSX Flex extends this to coordinated grid-flexible operation, where racks can throttle in response to grid demand-response signals, with NVIDIA arguing that this unlocks access to so-called stranded grid capacity for AI factories sited near constrained interconnect points.

Mission Control

NVIDIA Mission Control, the cluster-management plane introduced with Blackwell, has been extended with Rubin-aware health monitoring, optical-link telemetry from the Spectrum-6 fabric, and unified observability across the new disaggregated CPX-and-Rubin topology. Mission Control includes scheduler hooks that surface CPX availability as a separate pool to upper-layer schedulers like Slurm, Run:ai, and Kubernetes-based AI platforms.

Disaggregated inference

The introduction of the Rubin CPX accelerates a broader architectural trend in large language model serving: disaggregation of the prefill (context) and decode (generation) phases. This pattern was popularized by research papers including Splitwise (Microsoft) and DistServe (UC San Diego), and was already in use at major labs running Blackwell. Vera Rubin is the first NVIDIA platform to ship dedicated silicon for the prefill side rather than asking operators to repurpose a single SKU.

In traditional inference serving, a single GPU handles both the prefill pass (processing the user's input prompt) and the decode loop (generating output tokens one at a time). These two phases have very different compute profiles. Prefill is compute-intensive and parallelizes well across many GPU cores. Decode is memory-bandwidth-intensive, as it reads the full KV cache at each generation step.

With the Rubin platform, NVIDIA's Dynamo framework can route prefill requests to Rubin CPX nodes (optimized for compute-dense prefill on GDDR7 memory) and decode requests to standard Rubin nodes (optimized for memory bandwidth on token generation with HBM4). The Groq 3 LPX rack, added at GTC 2026, takes this further by providing an even more decode-specialized accelerator with 512 MB of on-chip SRAM per die and 150 TB/s of memory bandwidth, sitting alongside Rubin GPUs in the same platform.

This three-tier disaggregation (CPX for prefill, Rubin for general inference and training, Groq 3 LPX for decode at extreme throughput) is the operational shape NVIDIA expects for trillion-parameter inference at scale. It also represents an unusual choice: NVIDIA integrating a third-party accelerator (Groq) into its own platform rather than blocking competition. The licensing arrangement is reportedly worth approximately $20 billion across the term of the partnership.

Comparison with Blackwell Ultra

Feature	Blackwell Ultra (GB300)	Vera Rubin (VR200)
Architecture	Blackwell	Rubin
CPU	72-core Grace (Arm Neoverse V2)	88-core Vera (Olympus, custom Arm)
GPU dies per package	2	2
Process node (GPU)	TSMC 4NP	TSMC N3P
HBM generation	HBM3E	HBM4
Memory per package	288 GB	288 GB
Memory bandwidth	approximately 8 TB/s	22 TB/s
FP4 inference	15 PetaFLOPS	50 PetaFLOPS
FP4 training	10 PetaFLOPS	35 PetaFLOPS
NVLink generation	NVLink 5	NVLink 6
NVLink BW per GPU	1.8 TB/s	3.6 TB/s
Transistor count	approximately 208 billion	336 billion
GPU TDP	1,400 W	approximately 1,800 to 2,300 W
NVL rack GPU count	72	72
NVL72 NVFP4 inference	approximately 1 ExaFLOP	3.6 ExaFLOPS
Production	2025	H2 2026

References

NVIDIA Newsroom. "NVIDIA Vera Rubin Opens Agentic AI Frontier." March 16, 2026. https://nvidianews.nvidia.com/news/nvidia-vera-rubin-platform
NVIDIA Newsroom. "NVIDIA Kicks Off the Next Generation of AI With Rubin: Six New Chips, One Incredible AI Supercomputer." January 5, 2026. https://nvidianews.nvidia.com/news/rubin-platform-ai-supercomputer
NVIDIA Newsroom. "NVIDIA Unveils Rubin CPX: A New Class of GPU Designed for Massive-Context Inference." https://nvidianews.nvidia.com/news/nvidia-unveils-rubin-cpx-a-new-class-of-gpu-designed-for-massive-context-inference
NVIDIA Technical Blog. "Inside the NVIDIA Vera Rubin Platform: Six New Chips, One AI Supercomputer." https://developer.nvidia.com/blog/inside-the-nvidia-rubin-platform-six-new-chips-one-ai-supercomputer/
NVIDIA Technical Blog. "NVIDIA Vera CPU Delivers High Performance, Bandwidth, and Efficiency for AI Factories." https://developer.nvidia.com/blog/nvidia-vera-cpu-delivers-high-performance-bandwidth-and-efficiency-for-ai-factories/
NVIDIA Technical Blog. "NVIDIA Rubin CPX Accelerates Inference Performance and Efficiency for 1M+ Token Context Workloads." https://developer.nvidia.com/blog/nvidia-rubin-cpx-accelerates-inference-performance-and-efficiency-for-1m-token-context-workloads/
NVIDIA Technical Blog. "NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI Supercomputer." https://developer.nvidia.com/blog/nvidia-vera-rubin-pod-seven-chips-five-rack-scale-systems-one-ai-supercomputer/
NVIDIA. "Infrastructure for Scalable AI Reasoning: NVIDIA Vera Rubin Platform." https://www.nvidia.com/en-us/data-center/technologies/rubin/
NVIDIA. "Next Gen Data Center CPU: NVIDIA Vera CPU." https://www.nvidia.com/en-us/data-center/vera-cpu/
NVIDIA. "NVIDIA Vera Rubin NVL72: Co-Designed Infrastructure for Agentic AI." https://www.nvidia.com/en-us/data-center/vera-rubin-nvl72/
NVIDIA Blog. "NVIDIA DGX SuperPOD Sets the Stage for Rubin-Based Systems." https://blogs.nvidia.com/blog/dgx-superpod-rubin/
NVIDIA Blog. "Microsoft, NVIDIA and Anthropic Announce Strategic Partnerships." November 18, 2025. https://blogs.nvidia.com/blog/microsoft-nvidia-anthropic-announce-partnership/
NVIDIA Blog. "Accelerate Everything: NVIDIA CEO Says Ahead of COMPUTEX." June 2024. https://blogs.nvidia.com/blog/computex-2024-jensen-huang/
NVIDIA Investor Relations. "Q3 FY26 Earnings Conference Call." November 19, 2025. https://s201.q4cdn.com/141608511/files/doc_financials/2026/q3/NVDA-Q3-2026-Earnings-Call-19-November-2025-5_00-PM-ET.pdf
NVIDIA Newsroom. "NVIDIA Announces Spectrum-X Photonics, Co-Packaged Optics Networking Switches to Scale AI Factories to Millions of GPUs." March 2025. https://nvidianews.nvidia.com/news/nvidia-spectrum-x-co-packaged-optics-networking-switches-ai-factories
NVIDIA. "Silicon Photonics Networking for Agentic AI." https://www.nvidia.com/en-us/networking/products/silicon-photonics/
NVIDIA Technical Blog. "Scaling AI Factories with Co-Packaged Optics for Better Power Efficiency." https://developer.nvidia.com/blog/scaling-ai-factories-with-co-packaged-optics-for-better-power-efficiency/
ServeTheHome. "NVIDIA Computex 2024 Keynote: NVIDIA Rubin GPU, Vera CPU and 1.6T Networking." June 2024. https://www.servethehome.com/nvidia-computex-2024-keynote/
ServeTheHome. "NVIDIA Launches Next-Generation Rubin AI Compute Platform at CES 2026." January 2026. https://www.servethehome.com/nvidia-launches-next-generation-rubin-ai-compute-platform-at-ces-2026/
Tom's Hardware. "Nvidia's Vera Rubin platform in depth: Inside Nvidia's most complex AI and HPC platform to date." https://www.tomshardware.com/pc-components/gpus/nvidias-vera-rubin-platform-in-depth-inside-nvidias-most-complex-ai-and-hpc-platform-to-date
Tom's Hardware. "Nvidia enterprise GPU and CPU roadmaps: Rubin, Rubin Ultra, Feynman, and silicon photonics." https://www.tomshardware.com/tech-industry/semiconductors/nvidia-enterprise-roadmap-rubin-rubin-ultra-feynman-and-silicon-photonics
Tom's Hardware. "Nvidia launches Vera Rubin NVL72 AI supercomputer at CES." https://www.tomshardware.com/pc-components/gpus/nvidia-launches-vera-rubin-nvl72-ai-supercomputer-at-ces-promises-up-to-5x-greater-inference-performance-and-10x-lower-cost-per-token-than-blackwell-coming-2h-2026
Tom's Hardware. "Nvidia announces Rubin GPUs in 2026, Rubin Ultra in 2027, Feynman also added to roadmap." https://www.tomshardware.com/pc-components/gpus/nvidia-announces-rubin-gpus-in-2026-rubin-ultra-in-2027-feynam-after
Tom's Hardware. "Nvidia delivers first Vera Rubin AI GPU samples to customers." https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidia-delivers-first-vera-rubin-ai-gpu-samples-to-customers-88-core-vera-cpu-paired-with-rubin-gpus-with-288-gb-of-hbm4-memory-apiece
Tom's Hardware. "How Nvidia's $20 billion Groq 3 LPU deal reshapes the Nvidia Vera Rubin Platform." https://www.tomshardware.com/tech-industry/semiconductors/nvidias-20-billion-groq-deal-produces-its-first-chip
Tom's Hardware. "Nvidia's new silicon photonics-based 400 Tb/s switch platforms enable clusters with millions of GPUs." https://www.tomshardware.com/networking/nvidias-silicon-photonics-based-1-6-tb-s-switch-platforms-enable-clusters-with-millions-of-gpus
Tom's Hardware. "Nvidia unveils details of new 88-core Vera CPUs positioned to compete with AMD and Intel." https://www.tomshardware.com/pc-components/gpus/nvidia-unveils-details-of-new-88-core-vera-cpus-positioned-to-compete-with-amd-and-intel-new-vera-cpu-rack-features-256-liquid-cooled-chips-that-deliver-up-to-a-6x-gain-in-cpu-throughput
The Register. "Nvidia unpacks Vera Rubin rack system at CES." January 5, 2026. https://www.theregister.com/2026/01/05/ces_rubin_nvidia/
Data Center Dynamics. "Nvidia CEO announces Vera Rubin chips are in full production during CES keynote." https://www.datacenterdynamics.com/en/news/nvidia-ceo-announces-vera-rubin-chips-are-in-full-production-during-ces-keynote/
Data Center Dynamics. "Nvidia's Rubin Ultra NVL576 rack expected to be 600kW, coming second half of 2027." https://www.datacenterdynamics.com/en/news/nvidias-rubin-ultra-nvl576-rack-expected-to-be-600kw-coming-second-half-of-2027/
Data Center Dynamics. "Microsoft contracts 30,000 GPUs from Nscale at Norway data center as OpenAI drops out of latest Stargate." https://www.datacenterdynamics.com/en/news/microsoft-contracts-30000-nvidia-rubin-gpus-from-nscale-at-data-center-in-narvik-norway/
Data Center Dynamics. "CoreWeave surpasses 1GW of data center capacity, pushes more into self builds." https://www.datacenterdynamics.com/en/news/coreweave-surpasses-1gw-of-data-center-capacity-pushes-more-into-self-builds/
SemiAnalysis. "NVIDIA GTC 2025: Built For Reasoning, Vera Rubin, Kyber, CPO, Dynamo Inference, Jensen Math, Feynman." https://newsletter.semianalysis.com/p/nvidia-gtc-2025-built-for-reasoning-vera-rubin-kyber-cpo-dynamo-inference-jensen-math-feynman
SemiAnalysis. "Vera Rubin: Extreme Co-Design: An Evolution from Grace Blackwell Oberon." https://newsletter.semianalysis.com/p/vera-rubin-extreme-co-design-an-evolution
SemiAnalysis. "Another Giant Leap: The Rubin CPX Specialized Accelerator and Rack." https://newsletter.semianalysis.com/p/another-giant-leap-the-rubin-cpx-specialized-accelerator-rack
CNBC. "Anthropic valued in range of $350 billion following investment deal with Microsoft, Nvidia." November 18, 2025. https://www.cnbc.com/2025/11/18/anthropic-ai-azure-microsoft-nvidia.html
CNBC. "First look at Nvidia's AI system Vera Rubin and how it beats Blackwell." February 25, 2026. https://www.cnbc.com/2026/02/25/first-look-at-nvidias-ai-system-vera-rubin-and-how-it-beats-blackwell.html
CNBC. "OpenAI pulls back from Stargate Norway data center deal as Microsoft takes over." April 15, 2026. https://www.cnbc.com/2026/04/15/openai-stargate-norway-project-microsoft.html
CNBC. "Meta commits to spending additional $21 billion with CoreWeave as AI costs keep rising." April 9, 2026. https://www.cnbc.com/2026/04/09/meta-commits-to-spending-additional-21-billion-with-coreweave-.html
CoreWeave. "CoreWeave and Meta Announce $21 Billion Expanded AI Infrastructure Agreement." April 9, 2026. https://www.coreweave.com/news/coreweave-and-meta-announce-21-billion-expanded-ai-infrastructure-agreement
CoreWeave. "CoreWeave Extends Its Cloud Platform with NVIDIA Rubin Platform." https://www.coreweave.com/news/coreweave-extends-its-cloud-platform-with-nvidia-rubin-platform
G42. "Global Tech Alliance Launches Stargate UAE." https://www.g42.ai/resources/news/global-tech-alliance-launches-stargate-uae
CNN Business. "Vera Rubin: Nvidia just mapped out its future in AI." January 5, 2026. https://www.cnn.com/2026/01/05/tech/vera-rubin-nvidia-ai-ces
CIO Dive. "Nvidia's Rubin platform aims to cut AI training, inference costs." https://www.ciodive.com/news/nvidia-rubin-cut-ai-training-inference-costs/808915/
Constellation Research. "Nvidia outlines roadmap including Rubin GPU platform, new Arm-based CPU Vera." https://www.constellationr.com/blog-news/insights/nvidia-outlines-roadmap-including-rubin-gpu-platform-new-arm-based-cpu-vera
VideoCardz. "NVIDIA Vera Rubin NVL72 Detailed: 72 GPUs, 36 CPUs, 260 TB/s Scale-Up Bandwidth." https://videocardz.com/newz/nvidia-vera-rubin-nvl72-detailed-72-gpus-36-cpus-260-tb-s-scale-up-bandwidth
The Next Platform. "Nvidia's Vera-Rubin Platform Obsoletes Current AI Iron Six Months Ahead Of Launch." January 2026. https://www.nextplatform.com/ai/2026/01/06/nvidias-vera-rubin-platform-obsoletes-current-ai-iron-six-months-ahead-of-launch/4092179
TrendForce. "GTC 2025: NVIDIA Unveils More on Rubin GPUs, Announces Feynman for 2028 in Roadmap Update." March 2025. https://www.trendforce.com/news/2025/03/19/news-gtc-2025-nvidia-unveils-more-on-rubin-gpus-announces-feynman-for-2028-in-roadmap-update/
TrendForce. "NVIDIA's Rubin Ultra Seen Sticking to Dual-Die Design on Packaging Constraints; TSMC 3nm Demand Intact." April 2026. https://www.trendforce.com/news/2026/04/01/news-nvidias-rubin-ultra-seen-sticking-to-dual-die-design-on-packaging-constraints-tsmc-3nm-demand-intact/
VentureBeat. "Nvidia introduces Vera Rubin, a seven-chip AI platform with OpenAI, Anthropic and Meta on board." https://venturebeat.com/infrastructure/nvidia-introduces-vera-rubin-a-seven-chip-ai-platform-with-openai-anthropic
Wikipedia. "Vera Rubin." https://en.wikipedia.org/wiki/Vera_Rubin
Wikipedia. "Rubin (microarchitecture)." https://en.wikipedia.org/wiki/Rubin_(microarchitecture)
Wikipedia. "Feynman (microarchitecture)." https://en.wikipedia.org/wiki/Feynman_(microarchitecture)

NVIDIA Vera Rubin

Background

From Hopper to Blackwell

Blackwell Ultra

Announcement timeline

Naming and inspiration

Architecture overview

Vera CPU

Olympus cores

Memory and connectivity

Vera CPU rack

Rubin GPU

Transistor count and die area

Packaging and die organization

Tensor cores and Transformer Engine

HBM4 memory

Rubin CPX

Rack-level CPX deployment

NVLink 6

SerDes and switch architecture

In-network compute

NVLink-C2C and CPU-GPU coherence

Networking platform

ConnectX-9 SuperNIC

BlueField-4 DPU

Spectrum-6 Ethernet Switch

Silicon photonics and co-packaged optics

Production timeline and roadmap

NVL72 rack system

Configuration

DGX Vera Rubin SuperPOD

Rubin Ultra

GPU architecture

Kyber rack system

System-level performance

Feynman successor

Performance projections

Rubin vs. Blackwell (per-GPU comparison)

Rack-level comparison

Efficiency claims

Power requirements

Customers and deployment

AI labs and model developers

Meta and CoreWeave

Stargate and the Microsoft-OpenAI realignment

Sovereign AI deployments

Cloud providers and neoclouds

System manufacturers

Capital and supply chain

Software stack

CUDA 13 and runtime

Third-generation Transformer Engine

Dynamo and disaggregated serving

Power, telemetry, and orchestration

Mission Control

Disaggregated inference

Comparison with Blackwell Ultra

See also

References

Improve this article

NVIDIA Vera Rubin

Background

From Hopper to Blackwell

Blackwell Ultra

Announcement timeline

Naming and inspiration

Architecture overview

Vera CPU

Olympus cores

Memory and connectivity

Vera CPU rack

Rubin GPU

Transistor count and die area

Packaging and die organization

Tensor cores and Transformer Engine

HBM4 memory

Rubin CPX

Rack-level CPX deployment

NVLink 6

SerDes and switch architecture