High Bandwidth Memory (HBM)

Updated Jun 23, 2026

High Bandwidth Memory (HBM) is a JEDEC-standardised, 3D-stacked DRAM technology that delivers very high memory bandwidth and good energy efficiency to a nearby logic die, typically a GPU or AI accelerator, by vertically stacking multiple DRAM dies over a base die using through-silicon vias (TSVs) and connecting them across a very wide parallel interface: 1024 bits per stack for HBM1 through HBM3e, doubled to 2048 bits for HBM4. ^[22] It is the defining memory of the AI accelerator era: HBM is forecast to exceed 30% of total DRAM market value in 2025 while staying just above 10% of DRAM bit volume ^[21]^[41], and it carried SK hynix past Samsung to the top of the global DRAM market in the first quarter of 2025 for the first time since 1992. ^[32] The first HBM standard, JESD235, was published by JEDEC in October 2013 ^[1], and the first commercial HBM-equipped product was AMD's Fiji GPU in the Radeon R9 Fury X, launched on 24 June 2015. ^[18]

Nearly every flagship training and inference chip from NVIDIA, AMD, Google, Intel, AWS, and Huawei uses HBM, because large language model training and serving are frequently limited by memory bandwidth rather than arithmetic throughput, most visibly in attention and in the small-batch matrix multiplications of token generation. By stacking DRAM dies vertically and using a very wide bus, HBM delivers between 3x and 10x the per-package bandwidth of GDDR or LPDDR alternatives, at lower energy per bit, in exchange for significant cost and packaging complexity.

What is high bandwidth memory used for?

HBM is used as the primary working memory of datacenter AI accelerators and top-tier HPC processors, where it holds model weights, the key/value (KV) cache, activations, and optimizer state, and feeds them to the compute units fast enough to keep thousands of matrix-multiply lanes busy. It is the memory tier behind essentially every modern large language model training cluster and high-throughput inference deployment.

Why does HBM matter for AI?

Large language model inference, especially the autoregressive decode phase that generates one token at a time, is dominated by the cost of streaming model weights and the key/value cache from memory into the compute units. The arithmetic per byte transferred is low, so the accelerator's effective throughput is set by memory bandwidth rather than peak floating point rate. This is the classic memory wall, and it gets worse as models scale.
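
As a rough illustration of this bandwidth bound (a back-of-the-envelope sketch, not a performance model), the ceiling on single-sequence decode rate is memory bandwidth divided by the bytes streamed per token. The function name and the example figures (140 GB of weights, 4.8 TB/s) are illustrative assumptions. Real throughput is lower because of KV-cache traffic, kernel overheads, and imperfect bandwidth utilisation, while batching amortises weight reads over several sequences.

```python
GB = 1e9   # decimal gigabyte, as used in vendor bandwidth figures
TB = 1e12


def decode_rate_ceiling(weight_bytes, kv_bytes_per_token, bandwidth_bytes_per_s):
    """Upper bound on tokens/s for one sequence, assuming each token streams
    all weights plus its KV-cache reads from memory exactly once."""
    bytes_per_token = weight_bytes + kv_bytes_per_token
    return bandwidth_bytes_per_s / bytes_per_token


# Assumed example: 140 GB of FP16 weights, KV traffic ignored, 4.8 TB/s of bandwidth.
ceiling = decode_rate_ceiling(140 * GB, 0.0, 4.8 * TB)
print(f"{ceiling:.1f} tokens/s ceiling")  # 34.3 tokens/s ceiling
```

Doubling the bandwidth doubles the ceiling, while adding floating-point throughput does not move it at all, which is the sense in which decode is bandwidth-bound.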

Training is more compute-bound than decode but still benefits enormously from bandwidth, because activation memory, optimizer state, and weight gradients all live in HBM and must be moved between operators many times per step. Larger HBM capacity per accelerator also reduces the need for tensor or pipeline parallelism across multiple GPUs, which lowers communication overhead and lets a single device serve a larger model.

A practical example: an NVIDIA H200 with 141 GB of HBM3e at 4.8 TB/s ^[6] can hold the weights of a Llama 3.1 70B parameter model in FP16 (around 140 GB) on a single GPU, although that leaves only about 1 GB for the KV cache and activations, too little for serving at useful batch sizes or context lengths. The previous-generation H100 with 80 GB of HBM3 cannot fit the FP16 weights on one GPU at all, forcing tensor parallelism across two or more devices. The capacity growth continues with the Blackwell B200 (192 GB) and AMD's MI325X (256 GB) ^[15] and MI355X (288 GB) ^[11], where a 70B-class model in FP16 fits on one device with ample headroom and larger models at 16-bit precision can sit on one or two devices.
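
The fit check in this example is simple arithmetic and can be sketched as follows. This is a minimal illustration that counts only weight bytes. The helper names are hypothetical, the capacities are the ones quoted in the text, and a real deployment would also budget for the KV cache, activations, and framework overhead.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}

# Per-device HBM capacity in GB, as quoted in the text.
HBM_CAPACITY_GB = {"H100": 80, "H200": 141, "B200": 192, "MI325X": 256, "MI355X": 288}


def weights_gb(params_billion, dtype):
    """Weight footprint in GB: 1e9 params x bytes per param / 1e9 bytes per GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


def headroom_gb(device, params_billion, dtype):
    """Capacity left after the weights; a negative value means they do not fit."""
    return HBM_CAPACITY_GB[device] - weights_gb(params_billion, dtype)


for device in HBM_CAPACITY_GB:
    print(f"{device}: {headroom_gb(device, 70, 'fp16'):+.0f} GB after 70B FP16 weights")
```

Under these assumptions the H100 comes out 60 GB short, and the H200 fits the weights with about 1 GB to spare.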

Generations of HBM

The table below summarises the JEDEC standards and headline specifications for each HBM generation. Per-stack bandwidth is the JEDEC ceiling at the highest data rate; commercial products often ship at lower rates initially.

Generation	JEDEC standard	Year (standard)	Data rate per pin	Bus width per stack	Max bandwidth per stack	Max die stack	Max capacity per stack
HBM (HBM1)	JESD235	October 2013 ^[1]	1.0 Gbps	1024-bit	128 GB/s	4-Hi	1 GB
HBM2	JESD235A	January 2016 ^[5]	2.0 Gbps	1024-bit	256 GB/s	8-Hi	8 GB
HBM2E	JESD235B/C/D	2018-2021	3.6 Gbps	1024-bit	461 GB/s	12-Hi	24 GB
HBM3	JESD238	January 2022 ^[2]	6.4 Gbps	1024-bit	819 GB/s	16-Hi	64 GB
HBM3e	JESD238A	2024	9.6-9.8 Gbps	1024-bit	~1.2 TB/s	12-Hi (16-Hi qual.)	36 GB (commercial), 48 GB (spec ceiling)
HBM4	JESD270-4	16 April 2025 ^[3]	8.0 Gbps	2048-bit	2.0 TB/s	16-Hi	64 GB
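
The per-stack bandwidth column follows directly from the data rate and bus width: bandwidth in GB/s equals Gbps per pin times the number of data pins, divided by 8 bits per byte. The short sketch below recomputes the column from the table's own figures. The function and dictionary names are illustrative, and the HBM3e entry uses the 9.6 Gbps end of the quoted range.

```python
def stack_bandwidth_gb_s(gbps_per_pin, bus_width_bits):
    """Peak per-stack bandwidth in GB/s: pins x Gbit/s per pin / 8 bits per byte."""
    return gbps_per_pin * bus_width_bits / 8


# (data rate in Gbps per pin, bus width in bits), taken from the table above.
GENERATIONS = {
    "HBM1": (1.0, 1024),
    "HBM2": (2.0, 1024),
    "HBM2E": (3.6, 1024),
    "HBM3": (6.4, 1024),
    "HBM3e": (9.6, 1024),
    "HBM4": (8.0, 2048),
}

for name, (rate, width) in GENERATIONS.items():
    print(f"{name}: {stack_bandwidth_gb_s(rate, width):.0f} GB/s")
```

The HBM4 row shows the effect of the wider bus: at a lower pin rate than HBM3e (8.0 versus 9.6 Gbps) it still reaches 2,048 GB/s per stack because the interface is twice as wide.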

HBM1 (JESD235, October 2013)

JEDEC published the first HBM standard, JESD235, in October 2013 after a multi-year collaboration that started with a 2010 proposal from AMD and SK hynix. ^[1]^[5] The original engineering work was presented by Lee et al. at the 2014 IEEE International Solid-State Circuits Conference (ISSCC) in the paper "A 1.2 V 8 Gb 8-Channel 128 GB/s High-Bandwidth Memory (HBM) Stacked DRAM with Effective I/O Test Circuits." ^[4]

HBM1 specified eight independent 128-bit channels per stack, a 1.0 Gbps per-pin data rate, a 1024-bit total interface, and 128 GB/s of bandwidth per stack. ^[1] Stacks topped out at 4-Hi with 1 GB capacity at 2 Gb per die.

The technology shipped commercially in AMD's Fiji GPU, the Radeon R9 Fury X, on 24 June 2015. ^[18] Fury X used four HBM1 stacks for 4 GB of total memory across a 4096-bit aggregate bus, delivering 512 GB/s, the highest of any single GPU at the time. ^[18] The 4 GB capacity ceiling proved to be Fiji's biggest practical limitation as games started exceeding 4 GB working sets in 2015 and 2016.

HBM2 (JESD235A, January 2016)

JEDEC accepted HBM2 as JESD235A in January 2016. ^[5] HBM2 doubled the per-pin data rate to 2.0 Gbps, kept the 1024-bit interface, and reached 256 GB/s per stack. Stack heights expanded to 8-Hi and per-stack capacity to 8 GB at 8 Gb per die.

NVIDIA introduced the first HBM2 product, the Tesla P100 (Pascal), announced on 5 April 2016. P100 carried 16 GB of HBM2 across four 4-Hi stacks, delivering 720 GB/s on TSMC's CoWoS substrate. ^[7] P100 proved that compute accelerators could be built around HBM as a primary memory tier. The Tesla V100 (Volta, 2017) followed with 16 GB or 32 GB HBM2 and 900 GB/s. AMD's Vega-architecture Radeon Vega 56 and Vega 64 also used HBM2. ^[5]

HBM2E (JESD235B/C/D, 2018-2021)

HBM2E is the marketing name for JEDEC updates JESD235B (November 2018), JESD235C (January 2020), and JESD235D (February 2021), which together pushed the per-pin rate to 3.2 then 3.6 Gbps and allowed 12-Hi stacks up to 24 GB.

Samsung announced 16 GB Flashbolt HBM2E at 3.2 Gbps on 20 March 2019 (410 GB/s per stack). ^[5] SK hynix followed on 12 August 2019 with 16 GB HBM2E at 3.6 Gbps (460 GB/s per stack). ^[5] NVIDIA used HBM2E in the A100 80 GB upgrade in late 2020, reaching about 2 TB/s of memory bandwidth (1.94 TB/s PCIe, around 2.04 TB/s SXM). ^[8] Intel's Xeon CPU Max series (Sapphire Rapids HBM), launched on 10 January 2023, was the first x86 server CPU with on-package HBM, carrying 64 GB of HBM2E for around 1 TB/s. ^[20]

HBM3 (JESD238, January 2022)

JEDEC published HBM3 as JESD238 on 27 January 2022. ^[2] HBM3 raised the per-pin data rate to 6.4 Gbps (double the 3.2 Gbps of early HBM2E) for 819 GB/s per stack, doubled the channel count from 8 (HBM2) to 16 with two pseudo-channels each (32 effective channels), and supported 4-Hi through 16-Hi stacks with capacities up to 64 GB at the spec ceiling. ^[2] Commercial parts shipped at 16 GB (as in the H100's 80 GB across five stacks) and 24 GB (12-Hi with 2 GB dies).

NVIDIA's H100 (Hopper, March 2022) was the first HBM3 product, carrying 80 GB across five active stacks (one disabled for yield) and reaching 3.35 TB/s in the SXM variant. ^[9] AMD's Instinct MI300X, launched in late 2023, used eight HBM3 stacks for 192 GB and 5.3 TB/s, the largest HBM capacity on a single GPU at the time. ^[10]

HBM3e (JESD238A, 2024)

HBM3e is the JEDEC JESD238A revision, finalised in 2024, extending HBM3 to 9.6 Gbps per pin (some vendors push to 9.8 Gbps) for roughly 1.2 TB/s per stack. SK hynix began volume production of 8-Hi HBM3e at 24 GB per stack in March 2024, then announced volume production of the world's first 12-Hi HBM3e at 36 GB per stack on 26 September 2024. ^[12]

NVIDIA's H200 was the first HBM3e product: 141 GB across six active 24 GB stacks at 4.8 TB/s, announced at SC23 in November 2023 and reaching customers in Q2 2024. ^[6] The B200 Blackwell GPU, launched in 2024, uses eight 24 GB HBM3e stacks for 192 GB and 8 TB/s across an 8192-bit aggregate bus. AMD's MI325X (256 GB at 6 TB/s) ^[15] and MI355X (288 GB at 8 TB/s, eight 36 GB 12-Hi stacks) ^[11] also use HBM3e.

HBM4 (JESD270-4, April 2025)

JEDEC published the HBM4 standard as JESD270-4 on 16 April 2025. ^[3] HBM4's most consequential change is the doubling of the per-stack interface from 1024 to 2048 bits, which lets the standard hit 2 TB/s per stack at a relatively conservative 8 Gbps per pin. ^[3] The number of independent channels per stack doubles from 16 to 32, each with two pseudo-channels, and HBM4 supports 4-Hi, 8-Hi, 12-Hi, and 16-Hi configurations with 24 Gb or 32 Gb dies, allowing a 64 GB ceiling per stack at 16-Hi with 32 Gb dies. ^[3] Vendor-specific VDDQ levels of 0.7 V, 0.75 V, 0.8 V, or 0.9 V and VDDC of 1.0 V or 1.05 V give designers room to trade power against speed, and the standard adds Directed Refresh Management (DRFM) for stronger row-hammer mitigation. ^[13]
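
The capacity ceilings quoted for each generation are the product of stack height and die density, converted from gigabits to gigabytes. The sketch below is a minimal illustration using a hypothetical helper name; the die densities are the ones stated in the text (2 Gb for HBM1, 8 Gb for HBM2, 24 Gb or 32 Gb for HBM4).

```python
def stack_capacity_gb(dies, die_density_gbit):
    """Stack capacity in GB: number of DRAM dies x density per die in Gbit / 8."""
    return dies * die_density_gbit / 8


examples = {
    "HBM1 4-Hi, 2 Gb dies": stack_capacity_gb(4, 2),
    "HBM2 8-Hi, 8 Gb dies": stack_capacity_gb(8, 8),
    "HBM4 12-Hi, 24 Gb dies": stack_capacity_gb(12, 24),
    "HBM4 16-Hi, 24 Gb dies": stack_capacity_gb(16, 24),
    "HBM4 16-Hi, 32 Gb dies": stack_capacity_gb(16, 32),
}

for label, capacity in examples.items():
    print(f"{label}: {capacity:g} GB")
```

The 12-Hi and 16-Hi results with 24 Gb dies (36 GB and 48 GB) correspond to the sample parts vendors have announced, and 16-Hi with 32 Gb dies gives the 64 GB ceiling.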

The wider bus is partly an admission that further increases in pin speed are running into power and signal-integrity ceilings; widening the bus is more expensive in interposer area but cheaper in joules per bit. HBM4 also opens the door to a custom logic base die fabricated on a leading-edge node such as TSMC N3, which can host more sophisticated controllers, error correction, and (in some vendor proposals) compute-in-memory primitives. ^[14]

SK hynix delivered the world's first 12-layer HBM4 samples in March 2025, at that point targeting mass production in the second half of 2025. ^[14] The first HBM4-equipped accelerators are due in 2026, with NVIDIA's Rubin generation and AMD's MI400 series among the named consumers.

The mass production race tightened through late 2025 and early 2026. SK hynix announced on 12 September 2025 that it had completed HBM4 development and established what it called the industry's first HBM4 mass production system, with stacks running above 10 Gbps per pin, faster than the 8 Gbps JEDEC base speed, and a claimed power efficiency improvement of more than 40% over the previous generation. ^[23] The company framed the milestone as the start of a new memory era: "SK hynix has become the world's first to complete the development of HBM4 and build a mass-production system," it said, calling HBM4 "a new paradigm beyond the limits of memory." ^[23] Micron shipped 36 GB 12-Hi HBM4 samples to key customers on 10 June 2025, built on its 1-beta DRAM process and quoting more than 2.0 TB/s of bandwidth per stack. ^[24] At GTC in March 2026 it announced that the part was in high-volume production for NVIDIA's Vera Rubin platform, with pin speeds above 11 Gbps and per-stack bandwidth greater than 2.8 TB/s, which Micron described as a 2.3x bandwidth gain and a power-efficiency gain of more than 20% over its HBM3E; it also announced samples of a 48 GB 16-Hi part. ^[25] Samsung was reported in January 2026 to have cleared final HBM4 qualification at NVIDIA and AMD with an 11.7 Gbps data rate that exceeds the 10 Gbps both customers required, with official shipments set to begin in February 2026. ^[26]

On the demand side, NVIDIA detailed the Vera Rubin platform in January 2026: each Rubin GPU carries up to 288 GB of HBM4 across eight stacks at an aggregate bandwidth of up to 22 TB/s ^[27], roughly 2.75x the 8 TB/s of HBM3e on Blackwell, and the first Vera Rubin samples reached customers in February 2026 ahead of production systems planned for the second half of 2026. ^[28] AMD's Instinct MI400 series, previewed in June 2025 and due in 2026, attaches twelve HBM4 stacks per GPU for 432 GB of capacity and about 19.6 TB/s of bandwidth, and a 72-GPU Helios rack aggregates 31 TB of HBM4. ^[29] The HBM4 supply itself was reported to be heavily concentrated: TrendForce and Korean press in January 2026 put SK hynix at between the mid-50% range and nearly 70% of NVIDIA's Vera Rubin HBM4 allocation, with Samsung in the mid-20% range and Micron near 20%, depending on the report. ^[42]
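
The device-level figures above can be cross-checked against the per-stack arithmetic. The sketch below is a rough consistency check rather than a vendor specification: it assumes bandwidth is spread evenly over full-width 2048-bit stacks, and the function name is hypothetical.

```python
def implied_pin_rate_gbps(total_tb_s, stacks, bus_width_bits=2048):
    """Per-pin data rate implied by an aggregate bandwidth figure,
    assuming it is spread evenly over `stacks` full-width interfaces."""
    per_stack_gb_s = total_tb_s * 1000 / stacks
    return per_stack_gb_s * 8 / bus_width_bits


rubin_gbps = implied_pin_rate_gbps(22.0, 8)    # Rubin: up to 22 TB/s over eight stacks
mi400_gbps = implied_pin_rate_gbps(19.6, 12)   # MI400: about 19.6 TB/s over twelve stacks

rubin_gb_per_stack = 288 / 8      # GB per stack on Rubin
mi400_gb_per_stack = 432 / 12     # GB per stack on MI400
helios_rack_tb = 72 * 432 / 1000  # 72 GPUs x 432 GB, in TB

print(f"Rubin: {rubin_gbps:.1f} Gbps/pin, {rubin_gb_per_stack:g} GB/stack")
print(f"MI400: {mi400_gbps:.1f} Gbps/pin, {mi400_gb_per_stack:g} GB/stack")
print(f"Helios rack: {helios_rack_tb:.1f} TB of HBM4")
```

Under these assumptions Rubin's headline figure implies about 10.7 Gbps per pin, well above the 8 Gbps JEDEC baseline, while the MI400 figure implies about 6.4 Gbps across its larger number of stacks. Both designs work out to 36 GB per stack.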

HBM4E (planned 2026-2027)

Industry roadmaps from SK hynix, Samsung, and Micron point to an HBM4E refresh in the 2026-2027 window with pin speeds of around 10 Gbps or more (a rate that announced HBM4 parts from all three vendors already meet or exceed), larger 16-Hi commercial stacks, and continued integration with custom logic base dies. ^[14] Specific targets cluster around 2.5-3 TB/s and 48-64 GB per stack.

NVIDIA's GTC 2025 roadmap placed HBM4E in its Rubin Ultra generation, due in the second half of 2027: each Rubin Ultra package combines four reticle-sized GPU dies with 1 TB of HBM4E, and the NVL576 rack built around it is specified at 365 TB of fast memory. ^[30]

Architecture

An HBM stack is a vertically integrated structure. At the bottom sits a base die (sometimes called a buffer or logic die) that handles the interface to the host, including the physical layer (PHY), arbitration, refresh control, and built-in self-test. Above the base die sit the DRAM dies, four to sixteen of them in the configurations discussed here: 4-Hi (four dies), 8-Hi (eight dies), 12-Hi (twelve dies), or 16-Hi (sixteen dies).

The dies in the stack are interconnected by through-silicon vias (TSVs), vertical conductors that run through holes etched in the silicon and filled with copper. TSVs are the enabling technology that makes HBM possible: traditional wire bonding cannot support the thousands of signals needed to expose a 1024-bit interface, but a 2D arrangement of TSVs can. Microbumps connect adjacent dies. ^[22]

For HBM1 through HBM3e the per-stack interface is 1024 data bits, organised as 8 channels of 128 bits (HBM1, HBM2) or 16 channels of 64 bits with two pseudo-channels each (HBM3, HBM3e). ^[2] HBM4 doubles this to 2048 data bits across 32 channels. ^[3]
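
The channel organisation can be written down as a small data structure to confirm that each generation's channels add up to the stated interface width. This is an illustrative sketch with hypothetical names. The 64-bit HBM4 channel width is inferred from 2048 bits over 32 channels, and HBM1 and HBM2 are modelled without pseudo-channels as a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackInterface:
    channels: int
    bits_per_channel: int
    pseudo_channels_per_channel: int = 1

    @property
    def data_bits(self) -> int:
        """Total data width of the stack interface."""
        return self.channels * self.bits_per_channel

    @property
    def pseudo_channel_count(self) -> int:
        """Number of independently addressable pseudo-channels."""
        return self.channels * self.pseudo_channels_per_channel

    @property
    def bits_per_pseudo_channel(self) -> int:
        return self.bits_per_channel // self.pseudo_channels_per_channel


INTERFACES = {
    "HBM1/HBM2": StackInterface(channels=8, bits_per_channel=128),
    "HBM3/HBM3e": StackInterface(channels=16, bits_per_channel=64, pseudo_channels_per_channel=2),
    "HBM4": StackInterface(channels=32, bits_per_channel=64, pseudo_channels_per_channel=2),
}

for name, iface in INTERFACES.items():
    print(f"{name}: {iface.data_bits} data bits, {iface.pseudo_channel_count} pseudo-channels")
```

More, narrower channels keep the total width constant within a generation while giving the memory controller more independent request streams to schedule.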

Because the interface is so wide, HBM cannot use a normal printed circuit board. The dominant approach is a passive silicon interposer (TSMC's CoWoS-S, for example), where the GPU die and the HBM stacks sit side by side on top of a large piece of silicon that carries thousands of fine-pitch traces between them. CoWoS-L uses a redistribution layer with embedded silicon bridges to cover larger areas; Intel's EMIB takes a related approach with embedded silicon bridges in an organic substrate. ^[16]

A modern AI accelerator typically integrates four to eight HBM stacks around a central logic die. NVIDIA's H100 SXM has six HBM3 sites with five active (the sixth disabled for yield), the H200 populates all six sites with 24 GB HBM3e stacks (141 GB of the 144 GB is exposed), and the B200 has eight HBM3e stacks across two compute chiplets. AMD's MI300X and MI325X have eight HBM stacks each. ^[10]

Manufacturing and packaging

HBM is produced by exactly three companies: SK hynix, Samsung Electronics, and Micron Technology. ^[17] The market structure has been unusually concentrated and unusually dynamic over the last few years.

SK hynix has led through most of the HBM3 and HBM3e ramp. The company shipped the first 8-layer HBM3e to NVIDIA in March 2024, followed by the first 12-layer 36 GB HBM3e in volume production in September 2024. ^[12] SK hynix delivered the first 12-layer HBM4 samples in March 2025. ^[14]

Micron entered HBM3e late but qualified its 24 GB 8-Hi HBM3e parts at 9.6 Gbps for the H200 and B200 ramps in 2024 and is now a meaningful supplier of HBM3e and HBM4.

Samsung was historically the largest DRAM maker but stumbled on HBM3e qualification at NVIDIA through 2024, with reports citing thermal and power-consumption issues. Samsung passed NVIDIA's qualification for 12-layer HBM3e in September 2025 and began HBM3e shipments to NVIDIA in Q3 2025. In the interim, Samsung remained the dominant HBM3e supplier to Google's TPU program. TrendForce reported that the September 2025 approval ended an 18-month qualification setback at NVIDIA. ^[34] KED Global reported in October 2025 that Samsung had sold out its 2026 HBM supply after the NVIDIA shipments began. ^[33]

On the packaging side, the binding constraint for the AI accelerator industry has been TSMC's CoWoS capacity, the advanced packaging line that integrates HBM stacks with logic dies on a silicon interposer. ^[16] TSMC CEO C.C. Wei has stated that CoWoS capacity is sold out through 2025 and into 2026. NVIDIA reportedly secures over 60% of TSMC's CoWoS allocation. TSMC plans to scale CoWoS from roughly 35,000 wafers per month at the end of 2024 to about 130,000 wafers per month by the end of 2026.

Yield is a real constraint. Stacking 12 or 16 thin DRAM dies, drilling thousands of TSVs through each, and bonding them all to a base die without a single short or open is hard. The shift to hybrid bonding, where copper-to-copper bonds replace microbumps, is one path to denser stacks at acceptable yield, and SK hynix and Samsung have both begun that transition for HBM4 and beyond.

AI accelerators using HBM

The table below lists the major AI and HPC accelerators that use HBM, their HBM generation, and their per-device memory configuration.

Accelerator	Vendor	HBM generation	Capacity	Bandwidth	Year
Tesla P100	NVIDIA	HBM2	16 GB	720 GB/s	2016 ^[7]
Tesla V100	NVIDIA	HBM2	16 GB / 32 GB	900 GB/s	2017
A100 (40 GB)	NVIDIA	HBM2	40 GB	1.55 TB/s	2020
A100 (80 GB)	NVIDIA	HBM2E	80 GB	1.94-2.04 TB/s	2020 ^[8]
H100 SXM	NVIDIA	HBM3	80 GB	3.35 TB/s	2022 ^[9]
H200	NVIDIA	HBM3e	141 GB	4.8 TB/s	2024 ^[6]
B200 (Blackwell)	NVIDIA	HBM3e	192 GB	8.0 TB/s	2024-2025
GB200 NVL72 (per GPU)	NVIDIA	HBM3e	192 GB	8.0 TB/s	2024-2025
B300 (Blackwell Ultra)	NVIDIA	HBM3e	288 GB	8.0 TB/s	2025 ^[30]
Rubin (Vera Rubin)	NVIDIA	HBM4	up to 288 GB	up to 22 TB/s	2026 ^[27]
MI200 / MI250X	AMD	HBM2E	128 GB	3.2 TB/s	2021-2022
MI300A	AMD	HBM3	128 GB	5.3 TB/s	2023
MI300X	AMD	HBM3	192 GB	5.3 TB/s	2023 ^[10]
MI325X	AMD	HBM3e	256 GB	6.0 TB/s	2024 ^[15]
MI355X (MI350 series)	AMD	HBM3e	288 GB	8.0 TB/s	2025 ^[11]
MI455X (MI400 series)	AMD	HBM4	432 GB	19.6 TB/s	2026 ^[29]
TPU v4	Google	HBM2	32 GB	1.2 TB/s	2021
TPU v5p	Google	HBM2E/HBM3	95 GB	2.8 TB/s	2023 ^[38]
TPU v7 (Ironwood)	Google	HBM3e	192 GB	7.37 TB/s	2025
Trainium2	AWS	HBM3	96 GB	~2.9 TB/s	2024
Gaudi 2	Intel	HBM2E	96 GB	2.45 TB/s	2022
Gaudi 3	Intel	HBM2E	128 GB	3.7 TB/s	2024
Ascend 910B	Huawei	HBM2E	64 GB	~1.6 TB/s	2023
Dojo D1	Tesla	HBM	n/a (training tile uses local SRAM and HBM-class peripheral memory)	n/a	2022
SN40L	SambaNova	HBM3	64 GB on-package + 1.5 TB DDR	1.6 TB/s HBM	2024
Xeon CPU Max	Intel	HBM2E	64 GB	~1.0 TB/s	2023 ^[20]
A64FX (Fugaku)	Fujitsu	HBM2	32 GB	1.0 TB/s	2020

A notable counterexample is Cerebras's Wafer-Scale Engine (WSE-2, WSE-3), which relies on on-wafer SRAM rather than HBM (40 GB on WSE-2, 44 GB on WSE-3), trading capacity for bandwidth and avoiding HBM's packaging constraints entirely. Most other large accelerator vendors have concluded that HBM is the right operating point for the foreseeable future.

Comparison with alternative memory technologies

Memory type	Typical bandwidth per device	Typical capacity per device	Energy per bit	Packaging	Where it is used
HBM3e	1.2 TB/s per stack, ~8 TB/s per GPU with 8 stacks	24-36 GB per stack, up to 288 GB per accelerator	Lowest per bit in this comparison (rough rule of thumb: about 3-4 pJ/bit including PHY)	2.5D silicon interposer (CoWoS, EMIB)	Datacenter AI accelerators, top-tier HPC
GDDR7	~1.5 TB/s on a 384-bit bus	24-32 GB	Higher than HBM (around 7-8 pJ/bit)	Standard PCB, on-board chips	Consumer GPUs (RTX 5090), some workstation cards
GDDR6/6X	~1 TB/s on a 384-bit bus	16-24 GB	Higher than HBM	Standard PCB, on-board chips	Consumer GPUs (RTX 4090), inference cards (L40, L40S)
LPDDR5/5X	~100-500 GB/s depending on width	8-512 GB	Lowest absolute power, but lower bandwidth	LPCAMM or soldered on-board	Apple unified memory (M-series, Mac Studio, Mac Pro), edge AI, mobile, NVIDIA Grace and GB200 CPU complex
DDR5	~40-50 GB/s per channel, roughly 300-600 GB/s per server socket with 8-12 channels	up to terabytes per socket via DIMMs	Moderate	DIMM modules, off-board	Server CPUs and general-purpose computing

The shorthand that captures the trade-off: HBM is the fastest and most energy-efficient on a per-bit basis but the most expensive and packaging-constrained. GDDR is the next step down in bandwidth but uses cheap on-board chips and standard PCBs. LPDDR draws the least power and offers large capacities at moderate bandwidth, which is why Apple uses it for unified memory and why NVIDIA pairs LPDDR5X with HBM3e on the GB200 (HBM3e on the Blackwell GPUs, LPDDR5X on the Grace CPU). DDR is the workhorse of CPU memory, stays on standard DIMM form factors, and scales to the largest capacities per socket.

The prefill/decode split in LLM serving has also begun to erode the assumption that every datacenter inference GPU needs HBM. In September 2025 NVIDIA announced Rubin CPX, a GPU aimed specifically at the compute-bound prefill (context-processing) phase of inference, pairing 30 petaflops of NVFP4 compute with 128 GB of GDDR7 and no HBM at all; the bandwidth-bound decode phase stays on HBM-equipped GPUs in the same rack. ^[31]

Why does HBM dominate AI accelerators?

The core reason is bandwidth per package. During autoregressive decoding, a dense transformer reads essentially all of its weights from memory for every generated token (or every batch of tokens), so the larger the model, the more bytes move per token. At a given hardware spend, the most direct way to push token-generation latency down is to push memory bandwidth up. HBM's 3D-stacked architecture and 1024-bit (now 2048-bit) interface deliver several times the bandwidth per package that GDDR can manage at sane power budgets.

Capacity matters too. HBM stacks have grown from 1 GB (HBM1, 4-Hi) to 36 GB (HBM3e, 12-Hi), letting a single accelerator hold more model weights and a larger KV cache. The H200's 141 GB ^[6] can hold a 70B-parameter model at 16-bit precision, and the MI355X's 288 GB ^[11] can hold a model of roughly 200B parameters at 8-bit precision, in both cases on a single device and without dropping to 4-bit quantisation.

Energy per bit is the third lever. At the scale of a 100,000-GPU cluster running at 700-1400 W per accelerator, HBM's joules-per-bit advantage over GDDR translates into measurable power savings at the rack and datacenter level. NVIDIA CEO Jensen Huang has argued that HBM, rather than narrowly optimised on-chip SRAM, remains the default because of its flexibility: HBM-equipped GPUs can serve a wide range of model sizes and workloads. At Computex 2026 Huang underscored the resulting supply pressure by writing "Please Make More" and signing his name on an HBM4E wafer at SK hynix's booth. ^[43]
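
To make the energy argument concrete, memory interface power is energy per bit multiplied by bits moved per second. The sketch below uses the midpoints of the rough rule-of-thumb figures from the comparison table (3.5 pJ/bit for HBM3e, 7.5 pJ/bit for GDDR7) and a hypothetical GDDR system delivering the same 8 TB/s. These are illustrative assumptions, not measured values.

```python
def interface_power_watts(bandwidth_tb_s, picojoules_per_bit):
    """Power needed to sustain a given bandwidth: bits per second x joules per bit."""
    bits_per_second = bandwidth_tb_s * 1e12 * 8
    return bits_per_second * picojoules_per_bit * 1e-12


hbm_watts = interface_power_watts(8.0, 3.5)    # assumed midpoint of 3-4 pJ/bit
gddr_watts = interface_power_watts(8.0, 7.5)   # assumed midpoint of 7-8 pJ/bit, hypothetical 8 TB/s system

print(f"HBM3e at 8 TB/s:  {hbm_watts:.0f} W")
print(f"GDDR7 at 8 TB/s:  {gddr_watts:.0f} W")
print(f"Difference:       {gddr_watts - hbm_watts:.0f} W per accelerator at full bandwidth")
```

With these inputs the gap is a few hundred watts per accelerator at sustained peak bandwidth, which is why the per-bit difference matters at cluster scale.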

Limitations

The most-cited drawback is cost. HBM3e parts cost roughly $7-10 per GB at wholesale according to industry analyst estimates, compared to perhaps $2-3 per GB for GDDR6 and well under $1 per GB for commodity DDR5. ^[17] A B200 GPU with eight 24 GB HBM3e stacks is reported to carry roughly $2,400 of memory cost, more than the logic die itself. HBM is now estimated to account for 30-40% of the manufacturing cost of a flagship AI accelerator.
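
The cost figures in this paragraph can be related with simple arithmetic. The sketch below is illustrative only and uses a hypothetical helper name: it applies the quoted $7-10 per GB range to 192 GB and computes the per-GB price implied by the reported $2,400 figure. The two estimates do not line up exactly, which suggests they come from different sources or include different cost components.

```python
def memory_cost_usd(capacity_gb, usd_per_gb):
    """Memory bill of materials from capacity and a per-GB price."""
    return capacity_gb * usd_per_gb


capacity_gb = 8 * 24                      # eight 24 GB HBM3e stacks
low = memory_cost_usd(capacity_gb, 7)     # lower end of the quoted per-GB range
high = memory_cost_usd(capacity_gb, 10)   # upper end of the quoted per-GB range
implied_usd_per_gb = 2400 / capacity_gb   # price implied by the reported $2,400

print(f"{capacity_gb} GB at $7-10/GB: ${low:,.0f} to ${high:,.0f}")
print(f"Implied by $2,400: ${implied_usd_per_gb:.2f}/GB")
```

At the quoted range the memory comes to $1,344 to $1,920, whereas $2,400 corresponds to $12.50 per GB.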

Advanced packaging is the second drawback. HBM cannot be soldered onto a normal PCB and requires CoWoS, EMIB, or a similar 2.5D substrate. ^[16] CoWoS capacity at TSMC has been the binding constraint on AI accelerator volumes through 2024 and 2025.

Thermal management is the third issue. Stacking 12 or 16 DRAM dies creates a vertical column with non-trivial thermal resistance, so dies farther from the heat sink run hotter than those close to it. The stack also shares an interposer with a logic die that dissipates hundreds of watts, which raises the temperature floor the DRAM has to operate above.

Finally, IO area on the logic die is a hard constraint. The PHY for an HBM3e stack consumes a non-trivial fraction of the GPU's perimeter, and the number of stacks that fit around a single die is bounded by the reticle limit and bump pitch. NVIDIA's B200 splits the GPU into two reticle-limited dies on a CoWoS-L substrate partly to expose enough perimeter for eight HBM3e stacks.

Industry dynamics in 2024-2026

The HBM market has been one of the most-watched corners of the semiconductor industry through the LLM boom. SK hynix's lead in HBM3 and HBM3e gave it the dominant supply position into NVIDIA's H100, H200, and B200 ramps. ^[17] TrendForce projected that HBM would account for over 20% of total DRAM market value in 2024 and over 30% in 2025, despite representing only a small fraction of bit volume. ^[21] By bit volume, HBM is estimated to have risen from about 2% of total DRAM in 2023 to 5% in 2024 and to surpass 10% in 2025, a small share of capacity that captures an outsized share of revenue. ^[41]

Micron's HBM3e qualification in 2024 added a meaningful second supplier to NVIDIA. Samsung's qualification stumbles in 2024 cost the company a generation of HBM3e supply at NVIDIA, but Samsung remained the dominant HBM supplier to Google's TPU program (estimates suggest over 60% of TPU HBM3e share) and recovered HBM3e qualification at NVIDIA in September 2025. Reports in late 2025 indicated all three suppliers had sold out 2026 HBM production. The Korean government has formally designated HBM as a strategic technology.

The HBM cycle reordered the memory industry's league tables. Counterpoint Research reported that SK hynix overtook Samsung in global DRAM revenue in the first quarter of 2025, the first change at the top since Samsung took the lead in 1992, with roughly 36% of the market on the strength of about 70% of HBM revenue. ^[32] Industry trackers put second-quarter 2025 HBM shipment shares at about 62% for SK hynix, 21% for Micron, and 17% for Samsung. ^[40] For full-year 2025, SK hynix also passed Samsung in annual operating profit for the first time. ^[35] The boom spilled into conventional memory: with HBM absorbing wafer capacity and AI servers soaking up DDR5 output, TrendForce raised its forecast for fourth-quarter 2025 conventional DRAM contract prices to 18-23% growth quarter on quarter ^[36], and in June 2026 projected that the tight supply would push HBM contract prices sharply higher again in 2027. ^[39]

Recent developments and the road to HBM4

Three threads dominate the HBM roadmap from 2024 through 2027.

First, HBM3e capacity continues to ramp. 8-Hi 24 GB stacks at 9.6 Gbps are in volume production at all three vendors; 12-Hi 36 GB stacks went into volume at SK hynix in September 2024 ^[12], with Micron and Samsung following. 16-Hi HBM3e qualification was targeted for 2025.

Second, HBM4 is now standardised (JESD270-4, April 2025) and moving into volume production. The doubled 2048-bit interface and the option of a custom logic base die fabricated on a leading-edge process are the two structural changes that distinguish HBM4 from a routine speed bump. SK hynix delivered the first 12-Hi HBM4 samples in March 2025 ^[14] and announced a mass production system in September 2025. ^[23] NVIDIA's Rubin generation and AMD's MI400 series are both HBM4-based, with shipments in 2026.

Third, processing-in-memory (PIM) remains on the roadmap. Samsung's HBM2-PIM (Aquabolt-XL), introduced in 2021 at Hot Chips 33, embeds 16-wide SIMD engines inside the DRAM banks, claiming around 2x system-level performance and 60-70% energy reduction on memory-bound workloads. ^[19] SK hynix and Samsung are cooperating on an LPDDR6-PIM standard at JEDEC. None of these PIM variants has displaced standard HBM in mainstream accelerators yet.

The other structural shift is hybrid bonding, where adjacent dies in the stack are joined by direct copper-to-copper bonds rather than microbumps. Hybrid bonding allows finer pitches, lower thermal resistance, and taller stacks at acceptable yield, and is on the roadmap for 16-Hi HBM and beyond. Custom HBM, where a customer specifies a non-JEDEC-standard base die optimised for its accelerator, is also emerging.

The custom-HBM turn became concrete in April 2024, when SK hynix and TSMC signed a memorandum of understanding to build HBM4 base dies on TSMC's advanced logic processes (SK hynix had fabricated base dies in-house through HBM3e) and to co-optimise HBM integration with TSMC's CoWoS packaging for shared customers. ^[37]

References

JEDEC Solid State Technology Association. "JESD235: High Bandwidth Memory (HBM) DRAM." October 2013. https://www.jedec.org/standards-documents/docs/jesd235a
JEDEC. "JEDEC Publishes HBM3 Update to High Bandwidth Memory (HBM) Standard." Press release, 27 January 2022. https://www.jedec.org/news/pressreleases/jedec-publishes-hbm3-update-high-bandwidth-memory-hbm-standard
JEDEC. "JEDEC and Industry Leaders Collaborate to Release JESD270-4 HBM4 Standard." Press release, 16 April 2025. https://www.jedec.org/news/pressreleases/jedec%C2%AE-and-industry-leaders-collaborate-release-jesd270-4-hbm4-standard-advancing
Lee, D. U., Kim, K. W., Kim, K. W., et al. "A 1.2 V 8 Gb 8-Channel 128 GB/s High-Bandwidth Memory (HBM) Stacked DRAM with Effective I/O Test Circuits." 2014 IEEE International Solid-State Circuits Conference (ISSCC), Digest of Technical Papers, pp. 432-433, 9-13 February 2014. https://ieeexplore.ieee.org/document/6757501
Wikipedia. "High Bandwidth Memory." https://en.wikipedia.org/wiki/High_Bandwidth_Memory
NVIDIA Corporation. "H200 Tensor Core GPU" product page and datasheet. https://www.nvidia.com/en-us/data-center/h200/
NVIDIA Corporation. "NVIDIA Tesla P100 Datasheet." October 2016. https://images.nvidia.com/content/tesla/pdf/nvidia-tesla-p100-PCIe-datasheet.pdf
NVIDIA Corporation. "NVIDIA A100 Tensor Core GPU Datasheet." June 2021. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-us-nvidia-1758950-r4-web.pdf
NVIDIA Corporation. "NVIDIA Hopper Architecture In-Depth." NVIDIA Technical Blog. https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
AMD Corporation. "Instinct MI300X Platform Data Sheet." https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/data-sheets/amd-instinct-mi300x-platform-data-sheet.pdf
AMD Corporation. "AMD Instinct MI350 Series GPUs." https://www.amd.com/en/products/accelerators/instinct/mi350.html
SK hynix. "SK hynix Begins Volume Production of the World's First 12-Layer HBM3E." Press release, 26 September 2024. https://news.skhynix.com/sk-hynix-begins-volume-production-of-the-world-first-12-layer-hbm3e/
Tom's Hardware. "JEDEC finalizes HBM4 memory standard with major bandwidth and efficiency upgrades." 17 April 2025. https://www.tomshardware.com/pc-components/ram/jedec-finalizes-hbm4-memory-standard-with-major-bandwidth-and-efficiency-upgrades
Tom's Hardware. "HBM roadmaps for Micron, Samsung, and SK hynix: To HBM4 and beyond." https://www.tomshardware.com/tech-industry/semiconductors/hbm-roadmaps-for-micron-samsung-and-sk-hynix-to-hbm4-and-beyond
Tom's Hardware. "AMD's Instinct MI325X smiles for the camera: 256 GB of HBM3E." https://www.tomshardware.com/tech-industry/artificial-intelligence/amds-instinct-mi325x-smiles-for-the-camera-256-gb-of-hbm3e
SemiAnalysis. "AI Capacity Constraints, CoWoS and HBM Supply Chain." https://newsletter.semianalysis.com/p/ai-capacity-constraints-cowos-and
SemiAnalysis. "Scaling the Memory Wall: The Rise and Roadmap of HBM." https://newsletter.semianalysis.com/p/scaling-the-memory-wall-the-rise-and-roadmap-of-hbm
AnandTech / VideoCardz. "AMD launches Radeon R9 Fury X, first graphics card with High-Bandwidth-Memory." June 2015. https://videocardz.com/56911/amd-launches-radeon-r9-fury-x-first-graphics-card-with-high-bandwidth-memory
Kim, J. H., Kim, J. H., et al. "Aquabolt-XL: Samsung HBM2-PIM with In-Memory Processing for ML Accelerators and Beyond." Hot Chips 33 (2021). https://www.hc33.hotchips.org/assets/program/conference/day1/20210813_HC33_Aquabolt-XL_PIM_Jin_Kim_slide.pdf
Intel Corporation. "Intel Launches 4th Gen Xeon Scalable Processors, Max Series CPUs and GPUs." Press release, 10 January 2023. https://www.intc.com/news-events/press-releases/detail/1598/intel-launches-4th-gen-xeon-scalable-processors-max-series
TrendForce. "HBM Prices to Increase by 5-10% in 2025, Accounting for Over 30% of Total DRAM Value." 6 May 2024. https://www.trendforce.com/presscenter/news/20240506-12125.html
Wevolver. "What is HBM (High Bandwidth Memory)? Deep Dive into Architecture, Packaging, and Applications." https://www.wevolver.com/article/what-is-hbm-high-bandwidth-memory-deep-dive-into-architecture-packaging-and-applications
SK hynix. "SK hynix Completes World's First HBM4 Development and Readies Mass Production." Press release, 12 September 2025. https://news.skhynix.com/sk-hynix-completes-worlds-first-hbm4-development-and-readies-mass-production/
Micron Technology. "Micron Ships HBM4 to Key Customers to Power Next-Gen AI Platforms." Press release, 10 June 2025. https://www.globenewswire.com/news-release/2025/06/10/3096784/14450/en/Micron-Ships-HBM4-to-Key-Customers-to-Power-Next-Gen-AI-Platforms.html
Micron Technology. "Micron in High-Volume Production of HBM4 Designed for NVIDIA Vera Rubin, PCIe Gen6 SSD and SOCAMM2." Press release, 16 March 2026. https://investors.micron.com/news-releases/news-release-details/micron-high-volume-production-hbm4-designed-nvidia-vera-rubin
TrendForce. "Samsung Reportedly Set to Begin Official HBM4 Shipments to NVIDIA and AMD in February." 26 January 2026. https://www.trendforce.com/news/2026/01/26/news-samsung-reportedly-set-to-begin-official-hbm4-shipments-to-nvidia-and-amd-in-february/
NVIDIA Corporation. "Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer." NVIDIA Technical Blog, 5 January 2026. https://developer.nvidia.com/blog/inside-the-nvidia-rubin-platform-six-new-chips-one-ai-supercomputer/
Tom's Hardware. "Nvidia delivers first Vera Rubin AI GPU samples to customers - 88-core Vera CPU paired with Rubin GPUs with 288 GB of HBM4 memory apiece." February 2026. https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidia-delivers-first-vera-rubin-ai-gpu-samples-to-customers-88-core-vera-cpu-paired-with-rubin-gpus-with-288-gb-of-hbm4-memory-apiece
TechPowerUp. "AMD Previews 432 GB HBM4 Instinct MI400 GPUs and Helios Rack-Scale AI Solution." June 2025. https://www.techpowerup.com/337987/amd-previews-432-gb-hbm4-instinct-mi400-gpus-and-helios-rack-scale-ai-solution ↩
Tom's Hardware. "Nvidia announces Rubin GPUs in 2026, Rubin Ultra in 2027, Feynman also added to roadmap." 18 March 2025. https://www.tomshardware.com/pc-components/gpus/nvidia-announces-rubin-gpus-in-2026-rubin-ultra-in-2027-feynam-after ↩
NVIDIA Newsroom. "NVIDIA Unveils Rubin CPX: A New Class of GPU Designed for Massive-Context Inference." Press release, 9 September 2025. https://nvidianews.nvidia.com/news/nvidia-unveils-rubin-cpx-a-new-class-of-gpu-designed-for-massive-context-inference ↩
The Korea Herald. "SK hynix overtakes Samsung in global DRAM market for 1st time in Q1: report." April 2025. https://www.koreaherald.com/article/10502835 ↩
KED Global. "Samsung sells out 2026 HBM supply after starting Nvidia shipments in Q3." 30 October 2025. https://www.kedglobal.com/earnings/newsView/ked202510300005 ↩
TrendForce. "Samsung 12H HBM3e Reportedly Clears NVIDIA Tests After 18-Month Setback, HBM4 Reaches Final Phase." 22 September 2025. https://www.trendforce.com/news/2025/09/22/news-samsung-12h-hbm3e-reportedly-clears-nvidia-tests-after-18-month-setback-hbm4-reaches-final-phase/ ↩
CNBC. "SK Hynix overtakes Samsung in annual profit for the first time as AI reshapes rivalry." 29 January 2026. https://www.cnbc.com/2026/01/29/sk-hynix-beats-samsung-2025-profit-ai-memory-hbm.html ↩
TrendForce. "Tight DRAM Supply to Boost DDR5 Contract Prices - Profitability in 2026 Expected to Surpass HBM3e." Press release, 29 October 2025. https://www.trendforce.com/presscenter/news/20251029-12758.html ↩
SK hynix. "SK hynix Partners With TSMC to Strengthen HBM Technological Leadership." Press release, 19 April 2024. https://news.skhynix.com/sk-hynix-partners-with-tsmc-to-strengthen-hbm-technological-leadership/ ↩
Google Cloud. "TPU v5p" documentation. https://docs.cloud.google.com/tpu/docs/v5p ↩
TrendForce. "Tight DRAM Supply Gives Suppliers Greater Pricing Power in HBM, with HBM Contract Prices Expected to Surge Multiples Higher in 2027." Press release, 2 June 2026. https://www.trendforce.com/presscenter/news/20260602-13074.html ↩
The Korea Herald. "SK hynix overtakes Samsung to lead global memory market for 1st time with HBM surge." 2025. https://www.koreaherald.com/article/10544988 ↩
TrendForce. "Memory Industry Revenue Expected to Reach Record High in 2025 Due to Increasing Average Prices and the Rise of HBM and QLC." 22 July 2024. https://www.trendforce.com/presscenter/news/20240722-12228.html ↩
TrendForce. "SK hynix Reportedly to Supply About Two-Thirds of NVIDIA HBM4; Samsung Targets Early Delivery." 28 January 2026. https://www.trendforce.com/news/2026/01/28/news-sk-hynix-reportedly-to-supply-about-two-thirds-of-nvidia-hbm4-samsung-targets-early-delivery/ ↩
Tom's Hardware. "Nvidia CEO Jensen Huang explains why SRAM isn't here to eat HBM's lunch - high bandwidth memory offers more flexibility in AI deployments across a range of workloads." January 2026. https://www.tomshardware.com/tech-industry/nvidia-ceo-jensen-huang-makes-the-case-against-optimizing-ai-hardware-too-narrowly-at-ces ↩

