NVIDIA Rubin Ultra
Last reviewed
Jun 3, 2026
Sources
8 citations
Review status
Source-backed
Revision
v1 · 1,829 words
NVIDIA Rubin Ultra is a planned data-center GPU platform from NVIDIA, positioned on the company's roadmap as the mid-cycle "Ultra" refresh of the Rubin generation. It was disclosed by chief executive Jensen Huang during the GTC 2025 keynote in March 2025 and is scheduled for the second half of 2027.[1][2][3] Rubin Ultra follows the base Vera Rubin platform that ships in 2026, and it is most closely associated with a new rack-scale system called NVL576, built around a redesigned rack architecture that NVIDIA codenamed Kyber.[1][4]
Where the base Rubin platform pairs two reticle-sized compute dies in each GPU package, Rubin Ultra doubles that to four dies per package, lifts memory to roughly one terabyte of HBM4e per package, and scales up to a 576-die NVLink domain drawing about 600 kilowatts per rack.[1][2][5] NVIDIA presented Rubin Ultra as delivering 15 exaflops of FP4 inference and 5 exaflops of FP8 training per NVL576 rack, which the company characterized as roughly a 14-fold increase over its Blackwell-generation GB300 NVL72 system.[2][3][6] As a future product, every figure here reflects what NVIDIA showed on its 2025 roadmap rather than measured silicon, and several details remained subject to change as the design matured.
At GTC 2025, NVIDIA laid out a yearly data-center cadence in which each architecture is followed roughly twelve months later by an "Ultra" variant that increases compute and memory within the same family before the next architecture arrives.[2][6] The sequence presented ran from Blackwell (2024) and Blackwell Ultra (2025) to Rubin (2026), Rubin Ultra (2027), and then Feynman, a subsequent architecture named for physicist Richard Feynman and targeted for 2028.[2][6][7] Rubin and Rubin Ultra are named for the American astronomer Vera Rubin, and both pair their GPUs with NVIDIA's Arm-based Vera CPU.[1][2]
The roadmap also drew a sharp distinction between the base Rubin rack and the Rubin Ultra rack. Base Rubin ships in the Oberon-style NVL144 rack, while Rubin Ultra moves to the larger Kyber-based NVL576.[4][5] NVIDIA additionally introduced a specialized companion product, the Rubin CPX, aimed at the long-context "prefill" phase of inference, which sits alongside the Rubin and Rubin Ultra line rather than replacing it.
The table below summarizes the cadence and the headline figures NVIDIA presented for each step. Entries are limited to figures the company disclosed or that were widely reported from its GTC 2025 slides.
| Platform | Target window | Rack / system | GPU memory | Per-rack FP4 (dense) | Notes |
|---|---|---|---|---|---|
| Blackwell | 2024 | GB200 NVL72 | 192 GB HBM3E | reference baseline | B200 GPU |
| Blackwell Ultra | 2025 | GB300 NVL72 | 288 GB HBM3E | reference baseline | B300, 15 PFLOPS FP4 per GPU |
| Rubin | 2026 | Vera Rubin NVL144 | 288 GB HBM4 | 3.6 exaflops | 50 PFLOPS FP4 per package; 2 dies |
| Rubin Ultra | 2027 (2H) | NVL576 (Kyber) | ~1 TB HBM4e | 15 exaflops | 100 PFLOPS FP4 per package; 4 dies |
| Feynman | 2028 | (not detailed) | next-generation HBM | not disclosed | named for Richard Feynman |
NVIDIA used the GTC 2025 keynote, held in San Jose in March 2025, to extend its previously published roadmap and to put hardware mockups of the future systems on stage.[1][2] Rubin itself had been named at the prior year's event; GTC 2025 added concrete configuration names (NVL144 for Rubin, NVL576 for Rubin Ultra), confirmed Rubin Ultra for the second half of 2027, and revealed the Kyber rack and the Feynman architecture beyond it.[2][6][7]
Because Rubin Ultra was more than two years out at the time of the announcement, NVIDIA presented it largely through engineering renderings, rack mockups, and roadmap slides rather than shipping parts. The most concrete physical artifact shown was the Kyber rack design and its midplane, displayed publicly at the show.[5] As with any pre-release roadmap, the specifications were forward-looking targets, and independent analysts noted that some details (such as the exact die count per package) could still shift before production.
The defining Rubin Ultra system is the NVL576. The "576" in the name reflects a counting convention NVIDIA adopted with this generation, in which each reticle-sized compute die is counted as a "GPU."[4][8] An NVL576 rack contains 144 Rubin Ultra GPU packages, and because each package carries four compute dies, the system totals 576 GPU dies inside a single NVLink domain.[4][5][8] This contrasts with the base Vera Rubin NVL144, which uses 72 packages of two dies each for 144 dies.[4][5] Industry observers nicknamed this die-counting approach "Jensen math," noting that it makes generation-over-generation GPU counts grow faster than the number of physical packages.[4]
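The die-counting convention described above comes down to simple multiplication. The following is a minimal sketch of that arithmetic using only the package and die counts quoted in this article; the `RackConfig` class and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackConfig:
    """Hypothetical container for the rack figures quoted in the text."""
    name: str
    packages: int          # physical GPU packages in the rack
    dies_per_package: int  # reticle-sized compute dies per package

    @property
    def gpu_dies(self) -> int:
        # Under the die-counting convention, each compute die is one "GPU".
        return self.packages * self.dies_per_package


nvl144 = RackConfig("Vera Rubin NVL144", packages=72, dies_per_package=2)
nvl576 = RackConfig("Rubin Ultra NVL576", packages=144, dies_per_package=4)

for rack in (nvl144, nvl576):
    print(f"{rack.name}: {rack.packages} packages -> {rack.gpu_dies} GPU dies")

# The package count doubles between the two racks while the die count quadruples.
package_growth = nvl576.packages / nvl144.packages
die_growth = nvl576.gpu_dies / nvl144.gpu_dies
print(f"package growth: {package_growth:.0f}x, die growth: {die_growth:.0f}x")
```

The gap between the two growth factors is what the "Jensen math" nickname points at: the headline GPU count rises faster than the number of physical packages.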
Kyber is the codename for the new rack architecture that houses NVL576. Reporting from the show described the rack as organized into pods, with multiple compute blades per pod and several GPU packages per blade, all liquid cooled.[1][5] A notable engineering change is the move from the cabled backplane of the Blackwell-era racks to a large copper midplane: ServeTheHome, which photographed the design at GTC, described an NVL576 Kyber midplane arranged in roughly eighteen columns and four rows of connectors, replacing the cable cartridges used in the GB200 NVL72 generation and using copper rather than optics for the NVLink switch-to-blade links.[5] NVIDIA paired the Kyber generation with an 800-volt DC power architecture intended to reduce conversion losses at rack scale.[3]
NVIDIA stated that an NVL576 rack would draw on the order of 600 kilowatts, a large increase over the roughly 120 to 140 kilowatts of a current Blackwell NVL72 rack, and described each rack as comprising on the order of 2.5 million parts.[1][3][6] The scale-up fabric for the rack uses NVIDIA's seventh-generation NVLink switching, which the company cited at around 1.5 petabytes per second of aggregate NVLink bandwidth, with scale-out networking handled by ConnectX-9 class NICs.[3][6]
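To put the power figures in proportion, the short sketch below computes the rack-power increase implied by the approximate numbers quoted above, and a rough FP4-per-kilowatt figure using the 15-exaflop rack target cited elsewhere in this article. All inputs are roadmap approximations, and the variable names are illustrative only.

```python
NVL576_KW = 600.0                    # approximate NVL576 rack power
BLACKWELL_NVL72_KW = (120.0, 140.0)  # approximate range for a Blackwell NVL72 rack
NVL576_FP4_EXAFLOPS = 15.0           # roadmap FP4 target per NVL576 rack

low_kw, high_kw = BLACKWELL_NVL72_KW

# Dividing by the high end of the Blackwell range gives the smallest increase;
# dividing by the low end gives the largest.
ratio_vs_high = NVL576_KW / high_kw
ratio_vs_low = NVL576_KW / low_kw
print(f"rack power increase: {ratio_vs_high:.1f}x to {ratio_vs_low:.1f}x")

# Roadmap FP4 throughput per kilowatt of rack power (1 exaflop = 1,000 petaflops).
pflops_per_kw = NVL576_FP4_EXAFLOPS * 1000 / NVL576_KW
print(f"FP4 per kilowatt: {pflops_per_kw:.0f} PFLOPS/kW")
```

On these inputs the rack draws somewhere between four and five times the power of a Blackwell NVL72 rack, which is the scale of facility change the 800-volt architecture is meant to address.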
Each Rubin Ultra GPU package is a multi-die assembly. NVIDIA's roadmap described four reticle-sized GPU dies per package, doubling the two-die arrangement of base Rubin, in order to increase compute density within a single socket.[2][6][8] The package integrates these compute dies with high-bandwidth memory stacks using advanced 2.5D packaging; analysts described the assembly as combining multiple reticle-scale tiles, consistent with NVIDIA's stated move to four dies.[8]
On memory, NVIDIA specified roughly one terabyte of HBM4e per Rubin Ultra package, distributed across sixteen HBM sites, compared with 288 GB of HBM4 across eight sites on the base Rubin package.[2][6][8] HBM4e is the extended, higher-bandwidth iteration of the HBM4 standard. Per-package memory bandwidth was cited at around 32 terabytes per second.[8] Aggregated across the 144 packages, NVIDIA's NVL576 figures included roughly 147 terabytes of HBM and about 365 terabytes of total "fast memory" once the Vera CPUs' attached memory is included, with HBM bandwidth on the order of 4.6 petabytes per second for the full rack.[3][6] By comparison, NVIDIA put the base Rubin NVL144 at roughly 75 terabytes of fast memory per rack, underscoring the step change Rubin Ultra represents within the same generation.[6]
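The rack-level memory figures can be cross-checked against the per-package numbers. The sketch below assumes that "roughly one terabyte" means 1,024 GB per package and that rack totals use decimal units (1 TB = 1,000 GB); both are assumptions chosen because they reproduce the quoted 147 TB, not values confirmed by the source. The CPU-attached share is derived by subtraction and is not a figure NVIDIA stated.

```python
PACKAGES = 144
HBM_PER_PACKAGE_GB = 1024        # assumption: "roughly one terabyte" taken as 1,024 GB
HBM_BW_PER_PACKAGE_TBPS = 32     # cited per-package HBM bandwidth
NVL576_FAST_MEMORY_TB = 365      # cited rack total including CPU-attached memory
NVL144_FAST_MEMORY_TB = 75       # cited base Rubin rack total

# Aggregate HBM capacity and bandwidth across all packages in the rack.
rack_hbm_tb = PACKAGES * HBM_PER_PACKAGE_GB / 1000
rack_hbm_bw_pbps = PACKAGES * HBM_BW_PER_PACKAGE_TBPS / 1000

# Derived remainder: the portion of "fast memory" attached to the Vera CPUs.
cpu_memory_tb = NVL576_FAST_MEMORY_TB - rack_hbm_tb

# Step from the base Rubin rack to the Rubin Ultra rack.
fast_memory_ratio = NVL576_FAST_MEMORY_TB / NVL144_FAST_MEMORY_TB

print(f"rack HBM capacity:   {rack_hbm_tb:.0f} TB")
print(f"rack HBM bandwidth:  {rack_hbm_bw_pbps:.1f} PB/s")
print(f"CPU-attached memory: {cpu_memory_tb:.0f} TB (derived)")
print(f"fast memory vs NVL144: {fast_memory_ratio:.1f}x")
```

Under these assumptions the per-package figures reproduce the quoted rack totals for HBM capacity and bandwidth, and the remaining fast memory is the share attached to the Vera CPUs.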
Like base Rubin, Rubin Ultra is a CPU-plus-GPU platform rather than a GPU alone. It continues to pair the Rubin GPUs with NVIDIA's custom Vera CPU, an Arm-based processor that NVIDIA described as having 88 custom Arm cores with two-way simultaneous multithreading for 176 threads, connected to the GPUs over an NVLink chip-to-chip interface running at about 1.8 terabytes per second.[1][2] The Vera CPU first appears with the 2026 Rubin generation and carries forward into Rubin Ultra; NVIDIA did not change the CPU design for the Ultra step, instead scaling the GPU side and the rack around it.[2][6] The Vera CPU's on-package memory contributes the LPDDR portion of the NVL576 system's "fast memory" total alongside the GPUs' HBM4e.[6]
The headline performance figures NVIDIA attached to Rubin Ultra are stated at the NVL576 rack level. The company said a fully populated NVL576 rack would deliver up to 15 exaflops of FP4 (NVFP4) compute for AI inference and 5 exaflops of FP8 compute for AI training, using dense (non-sparse) matrices.[2][3][6] At the package level, NVIDIA cited about 100 petaflops of FP4 performance per Rubin Ultra GPU package, double the roughly 50 petaflops of the base Rubin package.[2][6][8]
NVIDIA framed these numbers as roughly a 14-fold improvement in inference and training throughput over its Blackwell Ultra GB300 NVL72 system, the contemporaneous high-end rack.[2][3][6] For context, the company put the base Rubin NVL144 at 3.6 exaflops of FP4 inference and 1.2 exaflops of FP8 training per rack, so the NVL576 figures represent a further large jump within the Rubin family.[6] These are vendor-supplied projections for unreleased hardware, stated in low-precision formats, and they should be read as roadmap targets rather than independently benchmarked results.
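The relationship between the per-package and per-rack performance figures can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this article; the helper `rack_exaflops` is a hypothetical name, and the "implied baseline" values are obtained by dividing the rack targets by the stated 14-fold factor rather than taken from any NVIDIA figure.

```python
def rack_exaflops(packages: int, pflops_per_package: float) -> float:
    """Sum per-package petaflops over a rack and convert to exaflops."""
    return packages * pflops_per_package / 1000


# Per-package FP4 figures multiplied out to rack level.
nvl144_fp4 = rack_exaflops(72, 50)     # 72 * 50 PFLOPS = 3.6 exaflops
nvl576_fp4 = rack_exaflops(144, 100)   # 144 * 100 PFLOPS = 14.4 exaflops

# Headline rack-level targets as quoted (FP4 inference, FP8 training).
NVL576_FP4_EF, NVL576_FP8_EF = 15.0, 5.0
NVL144_FP4_EF, NVL144_FP8_EF = 3.6, 1.2

# Step within the Rubin family, from NVL144 to NVL576.
fp4_step = NVL576_FP4_EF / NVL144_FP4_EF
fp8_step = NVL576_FP8_EF / NVL144_FP8_EF

# Baseline implied by the "roughly 14-fold" claim (derived, not a stated figure).
CLAIMED_FACTOR = 14
implied_baseline_fp4 = NVL576_FP4_EF / CLAIMED_FACTOR
implied_baseline_fp8 = NVL576_FP8_EF / CLAIMED_FACTOR

print(f"NVL576 from packages: {nvl576_fp4:.1f} EF vs headline {NVL576_FP4_EF:.0f} EF")
print(f"NVL144 -> NVL576: FP4 {fp4_step:.2f}x, FP8 {fp8_step:.2f}x")
print(f"implied baseline: FP4 {implied_baseline_fp4:.2f} EF, FP8 {implied_baseline_fp8:.2f} EF")
```

The per-package product lands slightly below the 15-exaflop headline, which is consistent with both figures being rounded roadmap approximations rather than exact specifications.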
NVIDIA scheduled Rubin Ultra for the second half of 2027, one year after the base Rubin platform's 2026 launch and one year before the Feynman architecture targeted for 2028.[1][2][6] This places it squarely within the annual "tick-tock" rhythm NVIDIA adopted starting with Blackwell, in which a new architecture is followed by an Ultra refresh before the next architecture arrives.
The significance of Rubin Ultra lies less in any single specification than in the scale of the system NVIDIA proposed around it. By counting reticle-sized dies as GPUs, moving to four dies per package, adopting roughly a terabyte of HBM4e per package, and binding 576 dies into one NVLink domain inside a 600-kilowatt Kyber rack, NVIDIA signaled a continued push toward ever-larger single-image AI computers optimized for large-scale reasoning and training workloads.[3][4][6] The accompanying jump in rack power, from roughly 120 to 140 kilowatts toward 600 kilowatts, together with the shift to an 800-volt power architecture and a copper midplane, also reflects the data-center facility changes NVIDIA expects its largest customers to undertake to host these systems.[3][5] Because the platform remained a roadmap item rather than shipping hardware as of its 2025 disclosure, its final specifications and exact availability could differ from the figures NVIDIA presented at GTC 2025.