HBM4
Last reviewed
Jun 3, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 4,289 words
HBM4 is the fourth principal generation of high bandwidth memory (HBM), a stacked DRAM standard that the JEDEC Solid State Technology Association published as JESD270-4 on April 17, 2025.[1] The specification doubles the per-stack interface from 1,024 to 2,048 data bits, raises peak bandwidth to roughly 2 TB/s per stack at the JEDEC reference pin rate of 8 Gb/s, and supports stack heights from 4-high through 16-high using 24-gigabit or 32-gigabit DRAM dies for cube capacities up to 64 GB.[1] SK hynix, Samsung Electronics, and Micron all began shipping HBM4 samples to customers during 2025, with mass production tied to next-generation AI accelerators including the NVIDIA Vera Rubin platform and the AMD MI400 family.[2][3][4] HBM4 is the first HBM generation in which the base logic die is widely fabricated on an advanced foundry process at TSMC rather than purely on a memory vendor's in-house technology, opening room for customer-specific base dies and tighter integration with AI accelerator silicon.[5] By mid-2026 all three vendors had entered HBM4 mass production, and on June 1, 2026 NVIDIA named SK hynix, Samsung, and Micron as suppliers for the Vera Rubin platform as it entered full production.[26]
High Bandwidth Memory was first standardized by JEDEC as JESD235 in late 2013. It was conceived as a way for graphics and high-performance compute parts to escape the bandwidth and power limits of conventional GDDR memory: DRAM dies are stacked on top of a base logic die and connected to the host processor through a silicon interposer with a very wide interface.[6] The successive revisions HBM2 (2016), HBM2E (2018-2020), HBM3 (2022), and HBM3e (2023-2024) raised capacity per stack and lifted per-pin signaling speeds while keeping the 1,024-bit per-stack interface intact.[6][7] HBM3 set the per-pin reference at 6.4 Gb/s with up to 819 GB/s per stack; HBM3e pushed signaling past 9 Gb/s in production parts and peaked above 1.2 TB/s per stack in 12-high configurations shipping in late 2024.[7]
The pressure to widen the interface again came from the rise of large language model training and inference, where the ratio of arithmetic throughput on accelerators such as NVIDIA's Hopper and Blackwell GPUs to memory bandwidth had been climbing faster than DRAM signaling could sustain.[8] Industry analysts framed this as the "memory wall" or "bandwidth wall" problem for AI accelerators, noting that effective floating-point utilization on transformer workloads is often bounded by how fast model weights and KV caches can be read from HBM rather than by compute.[8] Widening the interface from 1,024 to 2,048 bits is the principal lever HBM4 uses to address that gap, because doubling the wires per stack approximately doubles the achievable bandwidth at any given per-pin clock rate.[1]
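The bandwidth-bound behavior described above is often reasoned about with a roofline-style estimate, a standard model that this article's sources do not themselves name. The sketch below is a minimal illustration under stated assumptions: the accelerator's 1,000 TFLOP/s peak, the two bandwidth values, and the arithmetic intensity of 2 FLOP/byte (roughly what reading one-byte weights for batch-1 token generation implies) are hypothetical numbers chosen for illustration, not figures for any real part.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: throughput is capped by compute or by memory traffic."""
    memory_bound = bandwidth_tb_s * flops_per_byte  # TB/s * FLOP/byte = TFLOP/s
    return min(peak_tflops, memory_bound)


def ridge_point(peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return peak_tflops / bandwidth_tb_s


# Hypothetical accelerator and workload, chosen only for illustration.
PEAK_TFLOPS = 1000.0
INTENSITY = 2.0  # assumed: about 2 FLOPs per 1-byte weight read

for bandwidth in (2.0, 4.0):  # TB/s, before and after doubling the interface
    bound = attainable_tflops(PEAK_TFLOPS, bandwidth, INTENSITY)
    ridge = ridge_point(PEAK_TFLOPS, bandwidth)
    print(f"{bandwidth:.1f} TB/s: {bound:.1f} TFLOP/s attainable, "
          f"ridge at {ridge:.0f} FLOP/byte")
```

Under these assumptions the workload sits far below the ridge point, so doubling memory bandwidth doubles attainable throughput while the compute peak stays unused.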
The wider interface is feasible because the same advanced packaging techniques that AI accelerators already use, in particular TSMC's CoWoS family of 2.5D interposers, have grown to support far more interconnect routing per stack than the original HBM-era interposers.[9] HBM4 was designed alongside packaging vendors so that 2,048-bit stacks would fit within reticle-scale interposers used by NVIDIA, AMD, and Google.[9]
JEDEC published JESD270-4 "High Bandwidth Memory (HBM4) DRAM" on April 17, 2025 through its JC-42 memory committee.[1] The press release credited contributions from AMD, Cadence, Google, Meta, Micron, NVIDIA, Samsung, SK hynix, and Synopsys.[1] JEDEC framed HBM4 as an evolutionary step beyond HBM3 that retains backwards compatibility at the controller interface so that a single memory controller can in principle drive either HBM3 or HBM4 stacks.[1]
The standard fixes a reference per-pin signaling rate of 8 Gb/s, with implementations free to clock individual stacks higher within the electrical envelope.[1] At 8 Gb/s the 2,048-bit interface yields approximately 2.048 TB/s of bandwidth per stack, roughly double what a 1,024-bit HBM3e interface delivers at the same signaling rate.[1] HBM4 doubles the number of independent channels per stack from 16 (in HBM3) to 32, each of which is further split into two pseudo-channels, giving 64 pseudo-channels per stack for finer-grained access.[1]
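The bandwidth and channel figures above follow from simple arithmetic, sketched here in Python. The function names are illustrative and not part of any JEDEC document; the calculation assumes one bit per data pin per transfer and decimal units (1 TB/s = 1,000 GB/s).

```python
def peak_bandwidth_gb_s(interface_bits: int, pin_rate_gbps: float) -> float:
    """Peak per-stack bandwidth in GB/s: data pins * Gb/s per pin / 8 bits per byte."""
    return interface_bits * pin_rate_gbps / 8


def pseudo_channels(channels: int, per_channel: int = 2) -> int:
    """Each independent channel is split into two pseudo-channels."""
    return channels * per_channel


hbm3 = peak_bandwidth_gb_s(1024, 6.4)  # HBM3 reference rate
hbm4 = peak_bandwidth_gb_s(2048, 8.0)  # HBM4 reference rate

print(f"HBM3: {hbm3:.1f} GB/s")  # HBM3: 819.2 GB/s
print(f"HBM4: {hbm4:.1f} GB/s")  # HBM4: 2048.0 GB/s
print(f"HBM4 pseudo-channels: {pseudo_channels(32)}")  # HBM4 pseudo-channels: 64
```

The same function applied to a 1,024-bit interface at 8 Gb/s gives 1,024 GB/s, which is where the "roughly double" comparison comes from.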
JEDEC also defined a wider range of supply voltages than prior HBM generations. JESD270-4 lists four allowed VDDQ values (0.7 V, 0.75 V, 0.8 V, and 0.9 V) and two VDDC core voltages (1.0 V or 1.05 V), giving vendors latitude to trade power against headroom.[1] The standard supports 4-high, 8-high, 12-high, and 16-high stack configurations using 24-gigabit or 32-gigabit dies, putting the top cube density at 64 GB (32-gigabit dies stacked 16-high).[1] To accommodate taller stacks within existing thermal and mechanical budgets, JEDEC kept a 775-micrometer nominal package thickness for 12-high and 16-high stacks; TrendForce noted this relaxed the immediate pressure on memory makers to adopt hybrid bonding for HBM4.[9]
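As a quick check on the capacity figures, the sketch below enumerates every combination of the stack heights and die densities listed above. It is plain arithmetic (dies per stack times gigabits per die, divided by eight); the names are illustrative, and the code says nothing about which combinations vendors actually ship.

```python
STACK_HEIGHTS = (4, 8, 12, 16)   # dies per stack listed for HBM4
DIE_DENSITIES_GBIT = (24, 32)    # gigabits per DRAM die


def stack_capacity_gb(stack_height: int, die_gbit: int) -> float:
    """Cube capacity in gigabytes: dies * gigabits per die / 8 bits per byte."""
    return stack_height * die_gbit / 8


capacities = {
    (height, die): stack_capacity_gb(height, die)
    for height in STACK_HEIGHTS
    for die in DIE_DENSITIES_GBIT
}

for (height, die), gigabytes in capacities.items():
    print(f"{height:>2}-high x {die} Gb dies = {gigabytes:.0f} GB")
```

The largest entry is the 16-high stack of 32-gigabit dies at 64 GB, and the 12-high stack of 24-gigabit dies gives the 36 GB capacity quoted for first-generation products.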
Reliability features in JESD270-4 include Directed Refresh Management (DRFM), a mechanism intended to harden DRAM rows against row-hammer style disturbance attacks, plus broader RAS (reliability, availability, serviceability) provisions.[1] These are aimed at hyperscale deployments where soft error rates aggregated over thousands of accelerators matter for system reliability.[1]
The table below compares the principal JEDEC reference parameters for HBM3, HBM3e, and HBM4. HBM3e is not a separately numbered standard but an extension of HBM3 (JESD238) with higher signaling rates published in JEDEC's HBM3 update; production HBM3e parts from SK hynix, Micron, and Samsung commonly run at 9.2-9.6 Gb/s and 36 GB per 12-high stack.[7][10]
| Parameter | HBM3 | HBM3e | HBM4 |
|---|---|---|---|
| JEDEC document | JESD238 (2022) | JESD238 update (2023-2024) | JESD270-4 (April 2025)[1] |
| Interface width per stack | 1,024 bits | 1,024 bits | 2,048 bits[1] |
| Reference per-pin rate | 6.4 Gb/s | up to 9.2-9.6 Gb/s (vendor) | 8 Gb/s (JEDEC reference)[1][7] |
| Peak per-stack bandwidth at reference rate | ~819 GB/s | ~1.2 TB/s (12-high parts) | ~2.0 TB/s[1][7] |
| Channels per stack | 16 | 16 | 32 (64 pseudo-channels)[1] |
| Supported stack heights | up to 12-high | up to 12-high | 4, 8, 12, 16-high[1] |
| Per-die capacity | 16 Gb / 24 Gb | 24 Gb / 32 Gb | 24 Gb / 32 Gb[1] |
| Top cube capacity | 24 GB (12-high, 16 Gb dies) | 36 GB (12-high, 24 Gb dies) | 64 GB (16-high, 32 Gb dies)[1][7] |
| VDDQ | 0.4 V (HBM3) | vendor extended | 0.7 / 0.75 / 0.8 / 0.9 V[1] |
| VDDC core voltage | 1.1 V | 1.1 V | 1.0 V or 1.05 V[1] |
Production HBM4 parts have already exceeded the JEDEC reference signaling rate. SK hynix, in its September 12, 2025 development announcement, said its first-generation HBM4 product targets operating speeds above 10 Gb/s while improving power efficiency by more than 40 percent compared with its HBM3e parts.[11] Micron's June 2025 sampling announcement quoted "over 11 Gb/s" pin speeds and bandwidth of greater than 2.8 TB/s per stack for the 36 GB 12-high product line.[4] Samsung's qualification samples sent to NVIDIA and AMD in late 2025 ran at 11.7 Gb/s in customer testing, above NVIDIA's reported 10 Gb/s qualification floor.[12][13] NVIDIA's Rubin program ultimately pushed required pin speeds above 11 Gb/s (around 11.7 Gb/s), which forced all three vendors to revise their HBM4 designs to hold signal integrity at the higher data rate.[26][28]
SK hynix has consistently been the first HBM vendor to ship each new generation and HBM4 is no exception. The company announced on March 18, 2025 that it had shipped the industry's first 12-high HBM4 samples to major customers, using its Advanced MR-MUF (Mass Reflow Molded Underfill) process and reaching 36 GB per cube with bandwidth above 2 TB/s, roughly 60 percent faster than its HBM3e.[3] SK hynix said it expected to complete preparations for mass production within the second half of 2025.[3]
On September 12, 2025, SK hynix announced that it had "completed development" of HBM4 and was ready to begin mass production.[11] The product runs at over 10 Gb/s per pin, exceeding the JEDEC standard of 8 Gb/s, and improves power efficiency by more than 40 percent over its prior HBM3e generation.[11] SK hynix also said HBM4 could improve AI service performance by up to 69 percent in customer testing.[11] The September release describes manufacturing using SK hynix's 1bnm (fifth-generation 10 nm class) DRAM process combined with Advanced MR-MUF stacking.[11]
TrendForce reported in mid-December 2025 that SK hynix was supplying around 20,000 to 30,000 final HBM4 samples to NVIDIA for final testing and validation, with HBM4 shipments beginning in Q4 2025 and a full ramp through 2026.[14] In late January 2026 TrendForce additionally reported that SK hynix was expected to supply roughly two-thirds of NVIDIA's HBM4 demand in 2026.[15]
By February 2026 SK hynix had moved into HBM4 mass production for the Rubin generation, having put its production system in place the previous September and cleared NVIDIA's validation on large volumes of paid samples.[15][26] Reporting at the time put SK hynix at roughly two-thirds (about 60 to 70 percent) of NVIDIA's 2026 HBM4 allocation, a share estimate attributed to industry analysts including UBS rather than to either company.[15][28] NVIDIA's Rubin design uses eight 12-high HBM4 stacks per GPU, and the push to data rates above 11 Gb/s required SK hynix to refine its 2.5D packaging and circuit design to keep stacks stable at the higher speed.[26][28]
Samsung's HBM4 effort started later than SK hynix's. The Korea Economic Daily reported on December 3, 2025 that Samsung had cleared a critical internal qualification step for its HBM4 chips and was moving toward mass production.[16] Samsung delivered engineering samples to NVIDIA in late 2025 with final qualification targeted for early 2026.[16][12]
By late January 2026 TrendForce reported that Samsung had cleared NVIDIA's and AMD's final HBM4 qualification tests at 11.7 Gb/s per pin and was set to begin mass production and shipments in February 2026.[13] Samsung's HBM4 is fabricated on its sixth-generation 1c DRAM process and pairs the memory dies with its own 4 nm logic base die rather than depending on TSMC for the base die.[13] Samsung also planned to raise its overall HBM production capacity by roughly 50 percent in 2026, with HBM4 taking a growing share.[17]
By mid-2026 Samsung had become the first vendor to pass NVIDIA's HBM4 quality certification and had begun mass production and supply for the Rubin platform.[26][27] Analyst estimates put Samsung at roughly 25 to 30 percent of NVIDIA's 2026 HBM4 allocation, behind SK hynix.[26][28] Samsung also began shipping early samples of HBM4E, the seventh-generation extension aimed at the Vera Rubin Ultra platform expected in late 2027.[26][27]
Micron sampled HBM4 to multiple major AI customers on June 10, 2025.[4] The samples are 36 GB 12-high stacks built on Micron's 1-beta (1β) DRAM process, with a 2,048-bit interface delivering bandwidth above 2.0 TB/s per stack at the initial product clock, then increasing to over 2.8 TB/s at pin speeds above 11 Gb/s for the production-class part.[4] Micron quotes more than 60 percent better performance and over 20 percent better power efficiency than its HBM3e.[4]
On March 16, 2026 Micron announced that its HBM4 36 GB 12-high parts had entered high-volume production designed for NVIDIA's Vera Rubin platform, alongside Micron's PCIe Gen 6 data center SSD and SOCAMM2 module products for the same generation of NVIDIA systems.[18] Micron also said it was sampling 48 GB 16-high HBM4 stacks to customers, increasing per-placement capacity by a third over the 12-high 36 GB part.[18] When NVIDIA named its Rubin HBM4 suppliers in June 2026, Micron was the third named supplier alongside SK hynix and Samsung, taking the remaining share of allocation after the two Korean vendors.[26]
A defining structural change in HBM4 is the move away from memory-vendor-owned base dies toward base dies fabricated on advanced foundry logic processes. SK hynix and TSMC announced a memorandum of understanding on April 19, 2024 under which TSMC would manufacture the HBM4 base die using TSMC's advanced logic process, with the intent of packing additional functionality (memory controller features, custom interfaces, in-package compute hooks) into the limited base-die area.[5] The same agreement covered tighter integration between SK hynix HBM stacks and TSMC's CoWoS packaging.[5]
TrendForce later reported that SK hynix was considering TSMC's 3 nm node for the logic die of HBM4E, the post-HBM4 extension expected in 2027. For HBM4 itself, SK hynix was reported as likely to use TSMC's 12 nm process for mainstream base dies while reserving 3 nm for premium designs targeting NVIDIA's flagship GPUs and Google's TPUs.[19] Samsung took a different approach and used its own 4 nm logic process for its HBM4 base die, preserving more vertical integration.[13]
The custom base-die concept has become central to vendor differentiation. The Next Platform, SemiAnalysis, and other analysts describe HBM4 base dies as a new arena for "custom HBM," where hyperscale customers can request specific logic, I/O, or in-memory-compute features fused into the bottom of the stack and then 2.5D-integrated with their accelerator.[20]
NVIDIA's Vera Rubin platform, announced as the successor to Blackwell at GTC 2025 in March 2025 with full production targeted for 2026, is the highest-profile design win for HBM4.[21] The Rubin GPU as presented at GTC 2025 carries 288 GB of HBM4 with 13 TB/s of aggregate memory bandwidth, up from 8 TB/s on the prior Blackwell Ultra generation.[21] Rubin Ultra, scheduled for the second half of 2027, uses HBM4E (the post-JEDEC vendor extension) and carries 1 TB of memory across four reticle-sized GPUs per package with roughly 100 petaFLOPS of FP4 compute.[21]
At GTC 2026 in March 2026, Micron, Samsung, and SK hynix all positioned HBM4 supply around the Vera Rubin ramp.[18][13][11] Micron explicitly named NVIDIA Vera Rubin as the target platform for its volume HBM4 36 GB 12-high product.[18]
On June 1, 2026, at the GTC keynote held during Computex in Taipei, NVIDIA CEO Jensen Huang confirmed that the Vera Rubin platform was in full production and named SK hynix, Samsung, and Micron as its HBM4 suppliers.[26][27] NVIDIA did not publish official per-vendor allocation figures; analyst estimates circulating around the announcement placed SK hynix at roughly 60 to 70 percent of Rubin HBM4, Samsung at about 25 to 30 percent, and Micron supplying the remainder.[26][28] Huang said NVIDIA and its manufacturing partners, including Foxconn, Quanta, and Wistron, were building Vera Rubin NVL72 rack-scale systems at scale, and that rack assembly time had fallen from roughly two hours per rack to about five minutes.[26]
AMD disclosed the MI400 series at its 2025 Financial Analyst Day and provided more detail at CES 2026.[22][23] The MI400 family is built on AMD's CDNA 5 architecture and uses HBM4 to reach roughly 432 GB of memory per accelerator with 19.6 TB/s of bandwidth, up from 288 GB and 8 TB/s on the prior MI355X generation.[22] AMD also said the MI400 family adopts CoWoS-L packaging in place of the CoWoS-S used in earlier Instinct parts to handle the larger 2,048-bit interface and the increased die count.[22] The MI455X variant targets large-scale AI training, the MI430X targets HPC and government workloads, and the MI440X is positioned for rack-mounted servers that combine eight MI400 GPUs with an AMD EPYC Venice CPU.[23]
Reporting from late 2025 and early 2026 indicates that Samsung's HBM4 has passed Google qualification and will appear in Google's seventh-generation TPU lineage, branded TPU Ironwood and its successors.[24] Google's Ironwood announcement at Cloud Next described 192 GB of HBM per chip and 7.4 TB/s of memory bandwidth, with public materials referring to HBM3e for the launch generation and HBM4 for upcoming derivatives.[24] TPU Ironwood therefore sits at the transition point in Google's TPU memory roadmap from HBM3e to HBM4.[24]
HBM4 production rests on three tightly coupled manufacturing technologies: the DRAM die process at each memory vendor, the base-die logic process at TSMC or the memory vendor's own foundry, and 2.5D advanced packaging using TSMC's CoWoS family.[5][9] In 2026 TSMC is the dominant supplier of CoWoS packaging for HBM4-based accelerators, with capacity expansion plans aimed at supporting the Vera Rubin and MI400 ramps.[9]
CoWoS-L, the variant that uses smaller silicon bridges instead of a monolithic silicon interposer, is the version most associated with HBM4-class designs because it allows package sizes approaching six times the conventional reticle limit, which is needed to host the larger GPU dies plus eight or more HBM4 stacks.[9][22] NVIDIA's Rubin platform and AMD's MI400 are both reported to use CoWoS-L; AMD has stated this directly.[22]
On the memory side, TrendForce reported on December 30, 2025 that Samsung planned to raise its overall HBM capacity by roughly 50 percent in 2026, with much of the new capacity dedicated to HBM4.[17] SK hynix similarly raised guidance for HBM capacity in 2026 and 2027.[14][15] Industry coverage at the start of 2026 framed HBM4 supply as the principal bottleneck for AI accelerator shipments through the rest of the year.[17][25]
A January 2026 report from i-Connect007 noted that broad HBM4 mass production for some customers had been delayed to the end of Q1 2026 due to spec upgrades requested by NVIDIA and product strategy changes, although Samsung was on track for a February 2026 production start with NVIDIA and AMD.[25][13]
By the first half of 2026 all three HBM4 vendors were in mass production for the Rubin generation. SK hynix entered HBM4 mass production around February 2026, Samsung began mass production and supply after becoming the first vendor to pass NVIDIA's HBM4 quality certification, and Micron reached high-volume production of its 36 GB 12-high HBM4 in March 2026.[15][18][26][27] NVIDIA's June 1, 2026 confirmation that Vera Rubin had entered full production, with all three vendors named as suppliers, marked the point at which HBM4 became a volume production part rather than a sampling-and-qualification effort.[26]
The supply picture stayed concentrated. SK hynix held the largest share of NVIDIA's 2026 HBM4 allocation at an estimated 60 to 70 percent, Samsung an estimated 25 to 30 percent, and Micron the remainder, figures attributed to industry analysts rather than to NVIDIA.[26][28] These allocation estimates should be read as analyst projections, not official splits, since NVIDIA has not published per-vendor numbers.[26] Attention also began shifting to the next extension: Samsung shipped early HBM4E samples ahead of the Vera Rubin Ultra generation due in late 2027, signaling that the HBM4 ramp and the start of HBM4E sampling overlapped through 2026.[26][27]
HBM4 is best understood as a step change in interface width relative to HBM3e, paired with a comparatively modest increase in per-pin signaling. HBM3e in production typically runs at 9.2-9.6 Gb/s on a 1,024-bit per-stack interface to give roughly 1.2 TB/s per stack and 36 GB per 12-high cube using 24-gigabit dies.[7] HBM4 at the JEDEC reference of 8 Gb/s on 2,048 bits gives about 2.0 TB/s per stack, with 36 GB to 64 GB per cube in 12-high and 16-high configurations depending on die density.[1] Vendors clocking HBM4 at 10-11 Gb/s in production lift per-stack bandwidth to roughly 2.5-2.8 TB/s.[4][11][13]
Capacity per accelerator therefore scales by about one-third to three-quarters from HBM3e to HBM4 at constant stack count: an eight-stack accelerator goes from 288 GB (HBM3e, 12-high, 36 GB stacks) to 384 GB (HBM4, 12-high, 48 GB stacks using 32 Gb dies) or 512 GB with 16-high 64 GB stacks.[1][22] Shipping designs do not map onto that example directly: the AMD MI400 series claims 432 GB and 19.6 TB/s, and NVIDIA's Rubin claims 288 GB and 13 TB/s, using different stack counts and configurations.[21][22]
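As a cross-check on accelerator totals like these, the following sketch enumerates which combinations of stack count, stack height, and die density add up to a given capacity. It uses only the JEDEC stack heights and die densities cited in this article; the upper bound of 16 stacks per package is an arbitrary assumption for the search, and a match shows only that a configuration is arithmetically consistent, not that any vendor uses it.

```python
from itertools import product

STACK_HEIGHTS = (4, 8, 12, 16)   # dies per stack
DIE_DENSITIES_GBIT = (24, 32)    # gigabits per die


def configs_for_total(total_gb: float, max_stacks: int = 16):
    """Return (stacks, height, die_gbit, per_stack_gb) tuples matching total_gb.

    max_stacks is a hypothetical search limit, not a packaging rule.
    """
    matches = []
    for stacks, height, die in product(
        range(1, max_stacks + 1), STACK_HEIGHTS, DIE_DENSITIES_GBIT
    ):
        per_stack_gb = height * die / 8
        if stacks * per_stack_gb == total_gb:
            matches.append((stacks, height, die, per_stack_gb))
    return matches


for total in (288, 432):
    print(f"{total} GB:")
    for stacks, height, die, per_stack in configs_for_total(total):
        print(f"  {stacks} stacks x {height}-high x {die} Gb ({per_stack:.0f} GB each)")
```

Eight 12-high stacks of 24-gigabit dies appear among the matches for 288 GB. No eight-stack combination reaches 432 GB, since that would need 54 GB per stack; the matches for 432 GB use nine or twelve stacks.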
Power efficiency improves through a combination of the VDDQ options JEDEC allows and the wider interface, which lowers the per-pin signaling rate needed to hit a given bandwidth. HBM4 permits VDDQ of 0.7 V to 0.9 V; the HBM3 reference lists 0.4 V, but vendor HBM3e extensions run at significantly higher VDDQ.[1] Micron quotes over 20 percent and SK hynix more than 40 percent better power efficiency than their respective HBM3e products.[4][11]
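The effect of interface width on the required signaling rate can be sketched by inverting the bandwidth formula. The function below is illustrative only: it assumes one bit per data pin per transfer and decimal units, and it does not model power, which depends on voltage, circuit design, and other factors outside this calculation.

```python
def required_pin_rate_gbps(target_gb_s: float, interface_bits: int) -> float:
    """Per-pin rate in Gb/s needed to reach target_gb_s on a given interface width."""
    return target_gb_s * 8 / interface_bits


for target in (1024.0, 2048.0):  # GB/s per stack
    narrow = required_pin_rate_gbps(target, 1024)
    wide = required_pin_rate_gbps(target, 2048)
    print(f"{target:.0f} GB/s: {narrow:.1f} Gb/s on 1,024 bits, "
          f"{wide:.1f} Gb/s on 2,048 bits")
```

For any target bandwidth the 2,048-bit interface needs half the per-pin rate of a 1,024-bit one; reaching 2,048 GB/s takes 8 Gb/s per pin on the wide interface versus 16 Gb/s on the narrow one.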
A second important difference is the base-die strategy. HBM3 and HBM3e base dies were fabricated by the memory vendor on a relatively mature DRAM-compatible process. HBM4 base dies are commonly fabricated on advanced foundry logic processes (TSMC 12 nm or 3 nm; Samsung 4 nm) to support more sophisticated controller logic, configurable I/O, and the option of in-stack compute.[5][13][19] This shift makes HBM4 a more flexible building block for custom accelerator designs.
HBM4 inherits the basic constraints of stacked HBM. Per-cube capacity, while higher than HBM3e, is still capped at 64 GB for the 16-high 32-gigabit configuration, far less than the hundreds of gigabytes of DDR5 a server CPU can attach.[1] HBM stacks must be 2.5D-integrated next to the accelerator on a CoWoS-class interposer; TSMC's CoWoS capacity has historically been the principal bottleneck for AI accelerator volumes, and HBM4's larger package sizes have intensified pressure on CoWoS-L supply.[9]
Thermal management is a growing concern as 12-high and 16-high stacks dissipate more power per square millimeter of stack footprint. JEDEC's decision to keep 775-micrometer package thickness for taller stacks reduced the urgency of moving to hybrid bonding for HBM4, but analysts expect HBM4E and HBM5 to require hybrid bonding to keep scaling stack height.[9]
Cost remains high. HBM accounts for a large fraction of accelerator bill-of-materials at the HBM3e generation already, and HBM4's wider interface, taller stacks, and advanced-node base die push that share higher.[4][11] Reports from late 2025 and early 2026 describe a multi-quarter shortage of HBM4 supply relative to demand from NVIDIA, AMD, and hyperscale TPU customers, with Samsung's late ramp and the dependency on TSMC for both base dies and CoWoS-L packaging as principal risk factors.[14][17][25] The higher signaling NVIDIA demanded for Rubin, above 11 Gb/s rather than the JEDEC 8 Gb/s reference, also forced all three vendors to retool circuits and 2.5D packaging, which contributed to the qualification delays seen in late 2025 and early 2026.[25][26][28]
Finally, HBM4 still relies on a 2.5D side-by-side topology rather than true 3D integration of memory directly on top of the compute die. This bounds total bandwidth by the cross-section of the interposer and by the I/O ring around the compute die, which is why HBM4's 2,048-bit interface required parallel advances in CoWoS-L and large reticle packaging.[9] Future generations are widely expected to move portions of HBM stacks onto the compute die through 3D hybrid bonding, with HBM4E and HBM5 forming intermediate steps.[19][24]