HBM3e
Last reviewed
May 25, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,532 words
HBM3e (also written HBM3E and HBM3 Extended) is the extended-bandwidth variant of HBM3 high bandwidth memory, a stacked DRAM technology used as on-package memory for AI accelerators and high-performance computing devices.[^1][^2] Built on JEDEC's JESD238 HBM3 specification with vendor-specific speed extensions up to 9.6 Gb/s per pin, HBM3e raised per-stack capacity to 24 GB in an 8-Hi configuration and 36 GB in a 12-Hi configuration, with per-stack bandwidth above 1.2 TB/s.[^1][^3][^4] Micron announced volume production of 8-Hi HBM3e in February 2024 and SK hynix in March 2024, both targeting NVIDIA's H200, the successor to the H100; Samsung followed after a delayed qualification cycle.[^5][^6][^7][^18][^25] HBM3e powers the dominant AI training and inference accelerators of the 2024 to 2025 cycle, including the NVIDIA H200, the NVIDIA B200, the NVIDIA GB200 NVL72 platform, the NVIDIA GB300 NVL72 with 288 GB per GPU, the AMD Instinct MI325X and MI355X, and AWS Trainium2; AMD's earlier Instinct MI300X used HBM3.[^8][^9][^10][^11][^12][^13][^14] The technology is succeeded by HBM4, which doubles the interface width to 2048 bits and reaches 2 TB/s per stack under JEDEC JESD270-4 (April 2025).[^15]
The High Bandwidth Memory family was first standardized by JEDEC as JESD235 in 2013 and reached commercial deployment in AMD's Fiji GPU in 2015.[^16] Successive generations, HBM2 (JESD235A, 2016), HBM2e (JESD235C, 2018), and HBM3 (JESD238, 2022), increased per-pin data rates and per-stack capacity while keeping the wide 1024-bit interface common to the family.[^1][^16] HBM3 doubled the per-pin data rate of HBM2 to 6.4 Gb/s, supported 8 Gb to 32 Gb per memory layer with stack heights up to 16-Hi, and reached 819 GB/s per device at peak data rate.[^1] JEDEC published the JESD238 specification on January 27, 2022.[^1]
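Peak per-stack bandwidth is the interface width times the per-pin data rate, divided by eight bits per byte. The Python sketch below (the helper name is illustrative) assumes decimal units (1 GB/s = 10^9 bytes per second) and ignores refresh and protocol overhead; under those assumptions it reproduces the 819 GB/s HBM3 figure and the roughly 1.2 TB/s HBM3e figures cited in this article.

```python
def stack_bandwidth_gb_s(pin_rate_gbit_s: float, interface_bits: int = 1024) -> float:
    """Peak bandwidth of one HBM stack in GB/s (decimal), ignoring refresh and protocol overhead."""
    # Every data pin moves pin_rate_gbit_s gigabits per second; divide by 8 to convert bits to bytes.
    return interface_bits * pin_rate_gbit_s / 8


for label, rate in [("HBM3 at 6.4 Gb/s", 6.4),
                    ("HBM3e at 9.2 Gb/s", 9.2),
                    ("HBM3e at 9.6 Gb/s", 9.6)]:
    print(f"{label}: {stack_bandwidth_gb_s(rate):.1f} GB/s")
```

At 9.6 Gb/s the arithmetic gives 1,228.8 GB/s, in line with the roughly 1.2 TB/s quoted for the fastest 12-Hi parts.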
HBM3e was not released as a separate JEDEC specification but rather as a vendor-driven extension of HBM3 with higher pin speeds, taller stacks (12-Hi), and higher-capacity 24 Gb DRAM dies.[^4][^17] SK hynix referred to the technology as the fifth generation in the HBM lineage when describing initial samples in August 2023.[^5] The 'e' designation signals an extended revision of HBM3, with the performance gain driven by higher per-pin data rates that scale from 9.2 Gb/s in early Micron and Samsung silicon to 9.6 Gb/s in SK hynix's 12-Hi product.[^4][^7][^18][^19]
Large transformer models are memory-bandwidth bound during inference: every generated token requires reading the model's parameters from memory, and for attention layers the key-value cache grows with sequence length.[^20] GDDR-style memory, mounted as discrete packages around the processor and connected over PCB traces, could not match the bandwidth or energy efficiency required by trillion-parameter models, even at the high pin speeds of GDDR6X and GDDR7.[^20] HBM addresses this bandwidth wall by mounting stacked DRAM dies on the same silicon interposer or substrate as the compute die, exposing a 1024-bit interface per stack and packing multiple stacks (typically 4, 6, or 8) around the accelerator.[^21] Each individual HBM die exposes 256 bits of I/O, roughly 16 times the bus width of an LPDDR chip.[^21] SemiAnalysis estimates HBM3e cost at roughly 5 to 10 times the per-bit price of commodity DDR5, but the additional bandwidth and lower energy per bit make it the only viable choice for high-end AI accelerators.[^20][^22] By 2024, HBM revenue accounted for more than 30 percent of total DRAM value, up from a single-digit share in 2020.[^41]
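The bandwidth bound can be made concrete with a back-of-the-envelope Python sketch. It assumes batch-1 decoding in which every generated token reads all weights plus the sequence's key-value cache once from HBM, and it ignores compute, on-chip caching, and batching (which amortizes weight reads across requests). The 30-billion-parameter model and its layer dimensions are hypothetical; 3.35 TB/s and 4.8 TB/s are the H100 SXM5 and H200 bandwidths quoted in this article. Helper names are illustrative.

```python
def kv_cache_bytes(seq_len: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """KV cache for one sequence: one key and one value vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len


def decode_ceiling_tokens_per_s(bandwidth_bytes_per_s: float, weight_bytes: float,
                                kv_bytes: float = 0.0) -> float:
    """Bandwidth-only upper bound on batch-1 decode speed: each token rereads weights and KV cache."""
    return bandwidth_bytes_per_s / (weight_bytes + kv_bytes)


# Hypothetical model: 30B parameters stored in BF16 (2 bytes each),
# 60 layers, 8 KV heads of dimension 128.
WEIGHT_BYTES = 30e9 * 2
for seq_len in (1_024, 32_768):
    kv = kv_cache_bytes(seq_len, layers=60, kv_heads=8, head_dim=128)
    for label, bw in (("3.35 TB/s", 3.35e12), ("4.8 TB/s", 4.8e12)):
        ceiling = decode_ceiling_tokens_per_s(bw, WEIGHT_BYTES, kv)
        print(f"{seq_len:>6} tokens of context, {label}: KV cache {kv / 1e9:.2f} GB, "
              f"<= {ceiling:.0f} tokens/s")
```

Growing the context from 1,024 to 32,768 tokens raises the KV cache in this example from about 0.25 GB to about 8 GB, which lowers the ceiling, while the higher-bandwidth device raises it in proportion to its bandwidth.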
HBM3e is implemented as a vertical stack of DRAM dies bonded to a base logic die using through-silicon vias (TSVs) and microbumps. The stack connects to the host accelerator through a 1024-bit interface arranged as 16 independent 64-bit channels, each split into two 32-bit pseudo-channels, a structure inherited from the HBM3 JEDEC specification.[^1][^4]
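The channel arithmetic can be checked with a small Python model. It counts data bits only (ECC, command/address, and other pins are left out), and the class and field names are illustrative rather than taken from the JEDEC specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HbmInterface:
    channels: int = 16            # independent channels per stack
    channel_width_bits: int = 64  # data bits per channel
    pseudo_per_channel: int = 2   # pseudo-channels per channel

    @property
    def total_width_bits(self) -> int:
        return self.channels * self.channel_width_bits

    @property
    def pseudo_channels(self) -> int:
        return self.channels * self.pseudo_per_channel

    @property
    def pseudo_width_bits(self) -> int:
        # Each pseudo-channel takes an equal share of its parent channel's data bits.
        return self.channel_width_bits // self.pseudo_per_channel


hbm3 = HbmInterface()
print(hbm3.total_width_bits, hbm3.pseudo_channels, hbm3.pseudo_width_bits)
```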
| Parameter | HBM3 baseline (JESD238) | HBM3e 8-Hi (Micron, SK hynix) | HBM3e 12-Hi (Micron, Sep 2024) | HBM3e 12-Hi (SK hynix, Sep 2024) |
|---|---|---|---|---|
| Interface width per stack | 1024 bits | 1024 bits | 1024 bits | 1024 bits |
| Pin speed | up to 6.4 Gb/s | greater than 9.2 Gb/s | greater than 9.2 Gb/s | 9.6 Gb/s |
| Per-stack bandwidth | up to 819 GB/s | greater than 1.2 TB/s | greater than 1.2 TB/s | up to 1.22 TB/s |
| Stack height | 4-Hi to 16-Hi | 8-Hi | 12-Hi | 12-Hi |
| Per-die capacity | 8 Gb to 32 Gb | 24 Gb (3 GB) | 24 Gb (3 GB) | 24 Gb (3 GB) |
| Per-stack capacity | up to 64 GB | 24 GB | 36 GB | 36 GB |
| Independent channels | 16 | 16 | 16 | 16 |
| Pseudo-channels per channel | 2 | 2 | 2 | 2 |
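Raw per-stack capacity is the number of DRAM dies times the per-die density, converted from gigabits to gigabytes. This Python sketch (helper name illustrative, decimal units, raw capacity before any sparing or disabled dies) reproduces the capacity rows of the table above.

```python
def stack_capacity_gb(die_gbit: int, stack_height: int) -> float:
    """Raw per-stack capacity in GB: number of dies x per-die density in Gb / 8 bits per byte."""
    return die_gbit * stack_height / 8


CONFIGS = {
    "HBM3 8-Hi, 16 Gb dies": (16, 8),
    "HBM3e 8-Hi, 24 Gb dies": (24, 8),
    "HBM3e 12-Hi, 24 Gb dies": (24, 12),
    "HBM3 16-Hi, 32 Gb dies (spec maximum)": (32, 16),
}
for name, (die_gbit, height) in CONFIGS.items():
    print(f"{name}: {stack_capacity_gb(die_gbit, height):g} GB")
```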
The standard HBM3 specification limited total stack height to 720 micrometers, a constraint preserved in early HBM3e 8-Hi parts.[^21] To fit 12 dies in the same physical envelope, SK hynix made each DRAM die roughly 40 percent thinner than the 8-Hi version and used its Advanced MR-MUF (Mass Reflow Molded Underfill) process to provide thermal dissipation and warpage control across the taller stack.[^6] Micron's 12-Hi 36 GB product targeted similar height envelopes while reporting "significantly lower power consumption" than competing 8-Hi 24 GB devices.[^7]
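Why a fixed height envelope forces thinner dies can be sketched with a simple budget model. In the Python below, only the 720 micrometer total comes from this article; the 100 micrometer allowance for the base die and mold cap and the 15 micrometer bond line per layer are hypothetical placeholders chosen for illustration, not vendor figures.

```python
def max_core_die_um(budget_um: float, overhead_um: float,
                    layers: int, bond_line_um: float) -> float:
    """Thickest core DRAM die that lets `layers` dies fit inside a fixed stack-height budget.

    overhead_um covers the base logic die and mold cap; bond_line_um is the gap
    consumed per bonded layer. Both are treated as fixed, which is an assumption.
    """
    return (budget_um - overhead_um) / layers - bond_line_um


BUDGET_UM = 720.0     # HBM3 stack-height limit cited in the article
OVERHEAD_UM = 100.0   # hypothetical base die + mold cap allowance
BOND_LINE_UM = 15.0   # hypothetical per-layer bond line

t8 = max_core_die_um(BUDGET_UM, OVERHEAD_UM, 8, BOND_LINE_UM)
t12 = max_core_die_um(BUDGET_UM, OVERHEAD_UM, 12, BOND_LINE_UM)
print(f"8-Hi: {t8:.1f} um, 12-Hi: {t12:.1f} um, reduction {1 - t12 / t8:.0%}")
```

With these placeholders the die must shrink by about 41 percent, more than the one-third reduction that the 8/12 ratio alone suggests, because the per-layer bond line stays fixed while the die thins. Real figures depend on each vendor's process.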
Three differences distinguish HBM3e from baseline HBM3 in shipping products. First, pin speed rises from a 6.4 Gb/s ceiling to roughly 9.2 to 9.6 Gb/s, increasing per-stack bandwidth by 44 to 50 percent.[^1][^4][^6] Second, per-die capacity rises by half, from the 16 Gb dies used in shipping HBM3 to 24 Gb, allowing 24 GB in 8-Hi and 36 GB in 12-Hi configurations.[^7][^18] Third, the technology is positioned as AI accelerator memory: shipping HBM3e products are almost always paired with a TSMC CoWoS-S or CoWoS-L package containing an NVIDIA, AMD, or hyperscaler accelerator die.[^23][^24]
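Because the interface stays 1024 bits wide, the per-stack bandwidth gain equals the pin-rate gain. A quick Python check of the 44 to 50 percent range (the helper name is illustrative):

```python
def relative_gain(new: float, old: float) -> float:
    """Fractional increase of `new` over `old`."""
    return new / old - 1


HBM3_PIN_GBIT_S = 6.4  # JESD238 per-pin ceiling
for rate in (9.2, 9.6):
    # Same 1024-bit width, so bandwidth scales with the pin rate alone.
    print(f"{rate} Gb/s: +{relative_gain(rate, HBM3_PIN_GBIT_S):.0%} per-stack bandwidth vs HBM3")
```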
The HBM3e supply chain is concentrated among three DRAM vendors: SK hynix, Micron, and Samsung. Each shipped products to NVIDIA and AMD on overlapping but distinct timelines.
SK hynix announced HBM3e samples on August 21, 2023, claiming 1.15 TB/s of per-stack bandwidth at a 9 GT/s data rate and a 40 percent speed improvement over its HBM3 product.[^5] The company began volume production of 8-Hi HBM3e on March 19, 2024, a milestone it described as an industry first, although Micron had announced volume production on February 26, 2024.[^25][^18] On September 26, 2024, SK hynix announced volume production of the world's first 12-layer HBM3e product, with 36 GB capacity and 9.6 Gb/s per-pin speed, equivalent to about 1.22 TB/s per stack.[^6][^26] SK hynix held an estimated 54 percent share of the HBM market in 2024, rising to 62 percent in Q2 2025, according to industry trackers.[^27]
Micron began volume production of HBM3e on February 26, 2024, announcing 24 GB 8-Hi modules running at pin speeds above 9.2 Gb/s and bandwidth above 1.2 TB/s per stack.[^18] The company targeted NVIDIA's H200 platform for Q2 2024 shipments.[^18] On September 9, 2024, Micron announced 12-Hi 36 GB HBM3e sampling to customers, claiming over 1.2 TB/s of memory bandwidth at a pin speed greater than 9.2 Gb/s, and asserted "significantly lower power consumption" than 8-Hi 24 GB competing offerings.[^7] Micron's HBM share rose from 7 percent in 2024 to roughly 21 percent in Q2 2025, overtaking Samsung in that quarter.[^27]
Samsung introduced its HBM3e product line under the brand name "Shinebolt," which the company describes as offering up to 1,180 GB/s per stack at 9.2 Gb/s in a 12-layer configuration.[^28] Samsung's path to NVIDIA qualification was extended: the company demonstrated 12-Hi HBM3e at GTC 2024 in March 2024 but did not reach full NVIDIA qualification at that time.[^29] Reuters and KED Global reported that Samsung passed NVIDIA's qualification tests for the 12-layer HBM3e product in September 2025, roughly 18 months after the company completed development of the chip.[^30] Samsung's HBM market share declined from 39 percent in 2024 to 17 percent in Q2 2025 before recovering to 22 percent in Q3 2025 as additional qualifications closed.[^27]
HBM3e was the dominant on-package memory for AI training and inference accelerators announced from late 2023 through 2025. Five accelerator families are the most prominent HBM3e designs: NVIDIA's Hopper-refresh H200; NVIDIA's Blackwell B100, B200, and B300; AMD's Instinct MI325X and MI355X (the earlier MI300X used HBM3); AWS Trainium 2; and Etched's Sohu.[^8][^9][^10][^11][^12][^14][^31][^32]
| Accelerator | On-package memory | Memory bandwidth | Stack count and height | Announced or shipped |
|---|---|---|---|---|
| NVIDIA H200 | 141 GB | 4.8 TB/s | six 24 GB 8-Hi stacks | announced November 13, 2023; shipping Q2 2024 |
| NVIDIA B100 / B200 | 192 GB | 8 TB/s | eight 24 GB 8-Hi stacks | announced March 18, 2024 at GTC |
| NVIDIA B300 (Blackwell Ultra) | 288 GB | up to 8 TB/s | eight 36 GB 12-Hi stacks | announced GTC 2025; shipping mid-2025 |
| AMD Instinct MI300X | 192 GB HBM3 (not HBM3e) | 5.2 TB/s | eight 24 GB 8-Hi stacks | launched December 2023 |
| AMD Instinct MI325X | 256 GB HBM3e | 6 TB/s | eight stacks, 32 GB each | launched Q4 2024 |
| AMD Instinct MI355X | 288 GB HBM3e | 8 TB/s | eight 36 GB 12-Hi stacks | announced June 2025 |
| AWS Trainium2 | 96 GB HBM3e | 2.9 TB/s | four stacks | announced Dec 2023; GA late 2024 |
| Etched Sohu | 144 GB HBM3e | not disclosed | six 24 GB 8-Hi stacks | announced June 2024 |
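Dividing each accelerator's quoted bandwidth by its stack count gives the average per-pin rate it actually runs, which can be compared with the 9.2 to 9.6 Gb/s rated speeds of the memory parts. The Python sketch assumes every listed stack is active, 1024 data bits per stack, and decimal units; the helper name is illustrative, and Sohu is omitted because its bandwidth is not disclosed.

```python
def implied_pin_rate_gbit_s(total_tb_s: float, stacks: int, interface_bits: int = 1024) -> float:
    """Average per-pin data rate (Gb/s) implied by an aggregate bandwidth spread evenly over `stacks`."""
    per_stack_gb_s = total_tb_s * 1000 / stacks
    return per_stack_gb_s * 8 / interface_bits


# (aggregate TB/s, HBM3e stack count) as listed in the accelerator table
ACCELERATORS = {
    "NVIDIA H200": (4.8, 6),
    "NVIDIA B200": (8.0, 8),
    "AMD Instinct MI325X": (6.0, 8),
    "AMD Instinct MI355X": (8.0, 8),
    "AWS Trainium2": (2.9, 4),
}
for name, (tb_s, stacks) in ACCELERATORS.items():
    print(f"{name}: ~{implied_pin_rate_gbit_s(tb_s, stacks):.2f} Gb/s per pin")
```

Under these assumptions each accelerator runs its stacks at roughly 5.7 to 7.8 Gb/s, below the rated speeds, which is consistent with SemiAnalysis's report that Trainium 2 initially ran its HBM3e at HBM3 speeds.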
The H200 was announced on November 13, 2023 at the Supercomputing 23 (SC23) conference and is the first GPU to ship with HBM3e.[^8] It uses the same GH100 die as the H100 with the memory subsystem replaced: six stacks of 24 GB HBM3e for a total of 141 GB at 4.8 TB/s, compared to five stacks of 16 GB HBM3 for 80 GB at 3.35 TB/s on the H100 SXM5.[^8][^33] NVIDIA reported the configuration delivers up to 1.9 times the inference throughput of the H100 on Llama 2 70B.[^33] H200-powered systems from server manufacturers and cloud providers began shipping in Q2 2024.[^33]
NVIDIA announced the Blackwell architecture at GTC on March 18, 2024 with the B100 and B200 data center accelerators and the GB200 Grace Blackwell Superchip.[^9] The B200 is a dual-die package totaling 208 billion transistors and ships with eight HBM3e stacks for 192 GB at 8 TB/s of memory bandwidth on an 8192-bit aggregate interface.[^9][^34] The GB200 NVL72 rack-scale system contains 72 B200 GPUs and 36 Grace CPUs delivering 13.5 TB of HBM3e at 576 TB/s of aggregate memory bandwidth.[^9][^35] NVIDIA announced the Blackwell Ultra B300 at GTC 2025; the part raises per-GPU HBM3e to 288 GB by replacing the 24 GB 8-Hi stacks with 36 GB 12-Hi stacks, while keeping the same eight-stack layout.[^10] The B300 reaches dense FP4 performance of 15 PFLOPS, compared to 9 PFLOPS on the B200, with GB300 NVL72 systems shipping in mid-2025.[^10]
AMD's first-generation Instinct MI300X uses HBM3 (not HBM3e), shipping with 192 GB across eight 24 GB 8-Hi stacks at 5.2 TB/s and launching in late 2023.[^11][^36] AMD upgraded to HBM3e with the MI325X, announced at the Advancing AI event in October 2024 and shipping in Q4 2024: 256 GB of HBM3e at 6 TB/s, structured as eight 32 GB stacks.[^12][^37] The MI355X, announced in June 2025 with shipments later that year, increased on-package memory to 288 GB of HBM3e at 8 TB/s using eight 36 GB 12-Hi stacks, and added native FP4 and FP6 precision.[^13][^38]
Amazon Web Services announced Trainium 2 at re:Invent 2023; the accelerator integrates 96 GB of HBM3e per chip with 2.9 TB/s of memory bandwidth, organized in four HBM stacks.[^14][^39] SemiAnalysis reported that Trn2 instances initially operated the HBM3e at HBM3 speeds through firmware, with the option to clock the memory to roughly 3.2 TB/s in custom configurations.[^39] Trainium 2 entered general availability in late 2024 in the Trn2 instance family, and the larger UltraServer configuration provides 6 TB of HBM3e at 185 TB/s aggregate bandwidth across 64 chips.[^14][^40]
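Rack-scale figures are per-chip figures multiplied by the chip count. A short Python check, using only per-chip numbers quoted in this article and decimal units (1 TB = 1,000 GB), compares the GB200 NVL72 and Trn2 UltraServer totals; the helper name is illustrative.

```python
def rack_totals(per_chip_gb: float, per_chip_tb_s: float, chips: int) -> tuple[float, float]:
    """Return (total capacity in TB, total bandwidth in TB/s) for `chips` identical accelerators."""
    return per_chip_gb * chips / 1000, per_chip_tb_s * chips


SYSTEMS = {
    "GB200 NVL72 (72 x B200)": (192, 8.0, 72),
    "Trn2 UltraServer (64 x Trainium2)": (96, 2.9, 64),
}
for name, (gb, tb_s, n) in SYSTEMS.items():
    cap_tb, bw_tb_s = rack_totals(gb, tb_s, n)
    print(f"{name}: {cap_tb:.2f} TB HBM3e, {bw_tb_s:.1f} TB/s")
```

The bandwidth totals (576 TB/s and 185.6 TB/s) match the quoted figures. The raw NVL72 capacity of 13.82 TB is slightly above NVIDIA's quoted 13.5 TB, which implies somewhat less than 192 GB is usable per GPU in that system, and the UltraServer's 6.14 TB rounds to the quoted 6 TB.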
Etched's Sohu, a transformer-only inference ASIC unveiled in June 2024, uses six stacks of 24 GB HBM3e for 144 GB of on-package memory.[^32] Cerebras's WSE-3, by contrast, uses on-die SRAM rather than HBM, as does Groq's LPU. Various startups, including Tenstorrent and MatX, have published HBM3e-based designs at conferences and in marketing materials, but production volumes remain small compared to NVIDIA and AMD.[^20]
HBM3e is integrated with the host accelerator through 2.5D advanced packaging, almost exclusively using TSMC's CoWoS (Chip-on-Wafer-on-Substrate) technology. CoWoS-S, which uses a monolithic silicon interposer, was the packaging path for the NVIDIA Hopper and AMD MI300 generations; CoWoS-L, which embeds local silicon interconnect (LSI) bridges in a larger organic interposer, is used for NVIDIA Blackwell and Blackwell Ultra to accommodate the larger dual-die compute layout with eight HBM stacks.[^23][^24] TSMC planned to expand CoWoS capacity from approximately 35,000 wafers per month in late 2024 to roughly 130,000 wafers per month by the end of 2026, with NVIDIA reportedly securing about 60 percent of the global CoWoS allocation.[^24]
Supply of HBM3e was constrained throughout 2024 and 2025. TrendForce reported that HBM3e supply was tight enough for suppliers to raise contract prices 5 to 10 percent in 2024 alone.[^41] SemiAnalysis reported that TSV yields for HBM3e ranged between 40 and 60 percent, lower than commodity DRAM yields, which compounded the production bottleneck.[^22] By late 2025, Samsung and SK hynix reportedly negotiated HBM3e contract price increases of nearly 20 percent for 2026 supply, as demand from NVIDIA's H200 refresh inventory and the Blackwell Ultra ramp exceeded available capacity.[^42] SK hynix remained the largest supplier, with an estimated 62 percent of HBM revenue in Q2 2025 and 57 percent in Q3 2025; the Q3 decline reflected Samsung's recovery after the late qualification of its 12-Hi product for NVIDIA.[^27][^30]
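Stacking compounds yield loss, which helps explain why taller stacks aggravate the bottleneck. The Python below is a deliberately simplified model that assumes every layer fails independently with the same probability; the 95 percent per-layer figure is a hypothetical input, not a number reported by SemiAnalysis or any vendor, and real processes use known-good-die testing and repair that this model ignores.

```python
def stack_yield(per_layer_yield: float, layers: int, base_die_yield: float = 1.0) -> float:
    """Fraction of good stacks if the base die and every stacked layer must be good independently."""
    return base_die_yield * per_layer_yield ** layers


PER_LAYER = 0.95  # hypothetical per-layer (die plus bond) yield, not a reported figure
for layers in (8, 12, 16):
    print(f"{layers}-Hi: {stack_yield(PER_LAYER, layers):.0%} of stacks good")
```

Even with a high per-layer yield, each added layer multiplies in another loss, so a 12-Hi stack in this model yields noticeably fewer good parts than an 8-Hi stack.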
Direct per-GB pricing for HBM3e is not publicly disclosed by SK hynix, Samsung, or Micron because the parts ship under accelerator-vendor contracts rather than spot DRAM channels. Industry sources triangulate the cost. SemiAnalysis reports that the HBM stacks on a single high-end AI accelerator represent 40 to 50 percent of the bill of materials, with six to eight HBM3e stacks adding $700 to $1,500 to the manufacturing cost of a single GPU.[^22] Industry analysts including IntuitionLabs estimate HBM3e at approximately $15 to $20 per GB in 2024 to 2025, roughly 5 to 10 times the per-bit price of DDR5.[^43] Channel pricing for the resulting accelerators reflects this premium: NVIDIA H200 NVL cards retail in the $30,000 to $40,000 range through authorized resellers, with H200 SXM5 modules priced higher within OEM bundles.[^44]
Compared to baseline HBM3 (JESD238), HBM3e increases the per-pin data rate from 6.4 Gb/s to roughly 9.2 to 9.6 Gb/s, expands per-stack capacity from the 16 GB of typical shipping HBM3 stacks to 36 GB through 24 Gb dies and 12-Hi stacking, and lifts per-stack bandwidth from 819 GB/s to about 1.2 TB/s.[^1][^4][^6][^7] The interface remains 1024 bits per stack with 16 channels and 32 pseudo-channels, preserving backward compatibility with HBM3 memory controllers.[^4]
HBM4 was finalized by JEDEC as JESD270-4 in April 2025.[^15] Its principal change is doubling the per-stack interface width to 2048 bits, allowing per-stack bandwidth above 2 TB/s even with conservative per-pin data rates of 8 Gb/s.[^15] HBM4 also normalizes 12-Hi and 16-Hi stack heights and targets capacities up to 64 GB per stack with 32 Gb dies.[^15] Initial NVIDIA Rubin and AMD MI400-series accelerators are expected to use HBM4 starting in 2026.[^15][^45] Compared to HBM3e, HBM4 roughly doubles bandwidth at the cost of a wider interposer footprint and tighter alignment tolerances during 2.5D packaging.[^15]
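The HBM4 figures follow from the same width-times-rate arithmetic used for HBM3e. In the Python sketch below (decimal units, helper names illustrative), the 8 Gb/s HBM4 pin rate is the conservative figure cited above, compared with the fastest 9.6 Gb/s HBM3e part.

```python
def stack_bandwidth_gb_s(pin_rate_gbit_s: float, interface_bits: int) -> float:
    """Peak per-stack bandwidth in GB/s: data pins x per-pin rate / 8 bits per byte."""
    return interface_bits * pin_rate_gbit_s / 8


def stack_capacity_gb(die_gbit: int, height: int) -> float:
    """Raw per-stack capacity in GB: dies x per-die density in Gb / 8."""
    return die_gbit * height / 8


hbm3e = stack_bandwidth_gb_s(9.6, 1024)
hbm4 = stack_bandwidth_gb_s(8.0, 2048)
print(f"HBM3e, 9.6 Gb/s x 1024 bits: {hbm3e:.1f} GB/s")
print(f"HBM4, 8.0 Gb/s x 2048 bits: {hbm4:.1f} GB/s ({hbm4 / hbm3e:.2f}x)")
print(f"HBM4 16-Hi of 32 Gb dies: {stack_capacity_gb(32, 16):g} GB")
```

At matched pin rates, doubling the width exactly doubles bandwidth; at the conservative 8 Gb/s the gain over the fastest HBM3e is about 1.67x, so the "roughly doubles" comparison is best read as a like-for-like pin-rate statement.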
HBM3e has three principal limitations. First, capacity remains modest relative to large-model memory footprints: even at 36 GB per stack and eight stacks, a single GPU tops out at 288 GB, far below the roughly 2 TB needed just to hold the weights of a trillion-parameter model in BF16 (2 bytes per parameter).[^10] Second, the technology is supply-constrained, with TSV yields and CoWoS packaging capacity as the binding limits across 2024 and 2025.[^22][^24] Third, HBM3e's higher per-pin data rates push thermal density: SK hynix's MR-MUF process and Samsung's advanced TC-NCF technology aim to dissipate the additional heat across 12-Hi stacks, but stack thermals become a more pronounced constraint at 16-Hi.[^6][^28] These limitations motivate the move to HBM4 and its wider interface, which delivers more bandwidth per stack without further increasing per-pin data rates.[^15]
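The capacity gap can be made concrete by counting how many 288 GB devices it takes just to hold a model's weights. The Python sketch below counts weights only; KV cache, activations, and (for training) gradients and optimizer state add substantially more. The helper name is illustrative and units are decimal.

```python
import math


def gpus_to_hold_weights(params: float, bytes_per_param: float, gb_per_gpu: float) -> int:
    """Minimum number of devices whose combined HBM holds the raw weights (decimal GB)."""
    weight_gb = params * bytes_per_param / 1e9
    return math.ceil(weight_gb / gb_per_gpu)


ONE_TRILLION = 1e12
for fmt, nbytes in (("BF16", 2), ("FP8", 1), ("FP4", 0.5)):
    n = gpus_to_hold_weights(ONE_TRILLION, nbytes, 288)
    print(f"{fmt}: {ONE_TRILLION * nbytes / 1e9:,.0f} GB of weights -> at least {n} x 288 GB GPUs")
```

Lower-precision formats shrink the count, but in BF16 even a 288 GB HBM3e device needs at least six peers to hold a trillion-parameter model's weights.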