The AMD Instinct MI325X is a data center GPU accelerator designed for AI training and inference workloads. Announced by Advanced Micro Devices (AMD) on October 10, 2024, the MI325X is a mid-cycle refresh of the AMD Instinct MI300X, retaining the same CDNA 3 compute architecture but replacing its HBM3 memory subsystem with 256 GB of higher-density, higher-bandwidth HBM3E delivering 6 TB/s. At launch the MI325X held the largest memory capacity of any production GPU accelerator, surpassing the NVIDIA H200's 141 GB of HBM3E by a wide margin. It was announced at AMD's Advancing AI 2024 event alongside the 5th-generation EPYC server processors.
The MI325X targets large language model (LLM) inference and training workloads where memory capacity constrains which models can be served on a given number of GPUs. AMD positioned it as a direct answer to the NVIDIA H200 while production shipments of NVIDIA's next-generation Blackwell-based GPUs were still ramping. Production shipments of the MI325X began in Q4 2024, with broad server platform availability from OEM partners beginning in Q1 2025.
AMD introduced the AMD Instinct MI300X in December 2023 as its flagship AI accelerator, packing 192 GB of HBM3 memory across eight stacks onto a single OAM module. The MI300X marked a significant leap from the prior generation MI250X and attracted major cloud customers including Microsoft Azure, which deployed the accelerator to support OpenAI's GPT model serving workloads.
By early 2024, the memory capacity advantage the MI300X held over NVIDIA's H100 SXM5 (80 GB HBM3) had become a genuine selling point: large models that required multiple H100s to serve could sometimes run on a single MI300X or a smaller cluster of MI300Xs. However, NVIDIA's own H200 arrived with 141 GB of HBM3E and narrowed the gap. At its Computex 2024 appearance in June, AMD revealed a roadmap that included the MI325X as a Q4 2024 refresh, followed by the CDNA 4-based MI350 series in 2025.
The MI325X does not introduce new compute dies. AMD made a deliberate architectural decision to keep the same Aqua Vanjaram chip with its eight Compute Dies (XCDs) and four I/O Dies (IODs), fabricated on TSMC's 5nm and 6nm nodes respectively, and redirect engineering effort toward swapping the memory type. Each HBM3 stack was replaced by a denser HBM3E stack, raising per-GPU capacity from 192 GB to 256 GB and per-GPU bandwidth from approximately 5.3 TB/s to 6.0 TB/s. The result is a chip that an existing MI300X server can adopt with only firmware and cooling updates, since the physical OAM socket and server infrastructure carry over unchanged.
The power envelope, however, did increase. The MI325X has a rated TDP of 1,000 W per OAM module, compared to the MI300X's 750 W. This 33% rise in power reflects the higher energy draw of HBM3E at elevated bandwidth, and it meaningfully changes the cooling and power delivery requirements for data center deployment.
The MI325X shares its full compute architecture with the MI300X under AMD's CDNA 3 (Compute DNA 3) generation. The chip contains 153 billion transistors across a multi-chip module (MCM) design.
The MI325X uses a 3D-stacked chiplet design with the following die configuration:

- Eight accelerator compute dies (XCDs) on TSMC's 5nm node, each with 38 active compute units (304 in total)
- Four I/O dies (IODs) on TSMC's 6nm node, carrying the Infinity Cache and memory controllers
- Eight HBM3E stacks of 32 GB each, for 256 GB of total capacity
This arrangement, sometimes called the Aqua Vanjaram design, places eight compute tiles in a ring around four I/O tiles, with the memory stacks integrated vertically. The 2D footprint fits within the standard Open Accelerator Module (OAM) form factor defined by the Open Compute Project (OCP).
Each compute unit in CDNA 3 contains a Matrix Core Engine capable of processing FP8, FP16, BF16, and FP64 matrix multiplications. AMD refers to these as second-generation Matrix Core engines relative to the CDNA 2 generation. AMD does not use NVIDIA's "tensor core" terminology; its compute-unit hierarchy organizes scalar, vector, and matrix operations differently.
With all 304 compute units active, the MI325X achieves:

- 81.7 TFLOPS of FP64 performance
- 163.4 TFLOPS of FP32 performance
- 1,307.4 TFLOPS of FP16/BF16 dense matrix performance
- 2,614.9 TFLOPS of FP8 dense matrix performance (roughly double with structured sparsity)
These figures match the MI300X's compute performance, since the compute dies are identical. The bandwidth increase from HBM3E improves throughput for memory-bound operations without changing peak theoretical compute.
The HBM3E upgrade is the sole hardware change distinguishing the MI325X from the MI300X. High Bandwidth Memory 3E (HBM3E) is a denser and faster variant of the HBM3 standard, with per-stack density increasing from 24 GB (12-high stacks in the MI300X) to 32 GB (12-high stacks in the MI325X). The memory bus width remains 1,024 bits per stack, but HBM3E operates at higher transfer rates.
The resulting system-level numbers:

- 256 GB of HBM3E capacity (eight 32 GB stacks)
- 6.0 TB/s of peak memory bandwidth
- 8,192-bit aggregate memory bus (eight 1,024-bit stack interfaces)
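As a sanity check, these headline figures follow from the stack configuration. A minimal sketch, assuming a per-pin rate of roughly 5.9 Gbps (inferred from the published totals; AMD has not disclosed the exact pin speed):

```python
# Cross-check of the MI325X memory headline figures from its stack
# configuration. The ~5.9 Gbps per-pin rate is an inference from the
# published 6.0 TB/s total, not an AMD-published specification.
STACKS = 8
GB_PER_STACK = 32             # HBM3E capacity per stack
BITS_PER_STACK = 1024         # interface width per HBM stack
PIN_RATE_GBPS = 5.9           # assumed per-pin transfer rate

capacity_gb = STACKS * GB_PER_STACK                         # 256 GB
bus_width_bits = STACKS * BITS_PER_STACK                    # 8,192 bits
bandwidth_tb_s = bus_width_bits * PIN_RATE_GBPS / 8 / 1000  # ~6.0 TB/s

print(f"capacity:  {capacity_gb} GB")
print(f"bus width: {bus_width_bits} bits")
print(f"bandwidth: {bandwidth_tb_s:.2f} TB/s")
```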
The Infinity Cache sits between the compute dies and the memory stacks and serves as a last-level cache to reduce latency for frequently reused data. Its capacity is unchanged from the MI300X.
The MI325X uses AMD's Infinity Fabric for GPU-to-GPU communication within a server node. Each OAM module carries seven Infinity Fabric links, each running at 128 GB/s, for an aggregate inter-GPU bandwidth of 896 GB/s per device. In an eight-GPU node this creates a fully connected topology within a single compute partition, enabling the eight GPUs to share 2.048 TB of pooled HBM3E.
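The node-level aggregates reduce to simple multiplication; the short sketch below restates the per-link and per-GPU figures quoted above:

```python
# Node-level aggregates for an eight-GPU MI325X baseboard, derived from
# the per-link and per-GPU figures quoted in the text.
LINKS_PER_GPU = 7
LINK_BW_GB_S = 128            # GB/s per Infinity Fabric link
GPUS_PER_NODE = 8
HBM_PER_GPU_GB = 256

fabric_bw_per_gpu = LINKS_PER_GPU * LINK_BW_GB_S        # 896 GB/s
pooled_hbm_tb = GPUS_PER_NODE * HBM_PER_GPU_GB / 1000   # 2.048 TB

print(f"Infinity Fabric bandwidth per GPU: {fabric_bw_per_gpu} GB/s")
print(f"pooled HBM3E per node: {pooled_hbm_tb:.3f} TB")
```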
Infinity Fabric's per-link bandwidth trails NVIDIA's NVLink 4 interconnect, which the H200 uses in the HGX form factor. NVIDIA's NVSwitch 3 fabric in an HGX H200 node provides an all-to-all bisection bandwidth that exceeds what Infinity Fabric delivers at eight GPU scale. AMD has acknowledged this tradeoff: single-GPU and small-cluster advantages from greater memory capacity do not always translate to proportional multi-GPU training speedups because of the interconnect bottleneck.
The MI325X supports PCIe 5.0 x16 for host CPU communication alongside the Infinity Fabric peer links.
HBM3E is an extended, higher-speed variant of High Bandwidth Memory 3 (HBM3), whose JEDEC standard was published in 2022. It raises per-pin data rates above the 6.4 Gbps achievable with HBM3 while keeping the same physical interface dimensions. SK Hynix and Micron were among the first manufacturers to ship HBM3E in volume, with Samsung following. AMD's choice of supplier for the MI325X HBM3E stacks has not been publicly disclosed.
The practical advantage of 256 GB per accelerator is most visible in inference workloads for large models. A fully dense GPT-4-scale model (estimated around 1.8 trillion parameters in some analyses, though the exact count is not public) would require multiple accelerators regardless of memory size. More concretely, models in the roughly 70-to-200-billion-parameter range fit in one or two MI325X GPUs at FP16 precision, whereas they require more H200 GPUs to serve comparable batches. Llama 3.1 405B at BF16 occupies roughly 810 GB of weights; four MI325X GPUs (1,024 GB) hold the weights with headroom for KV cache, whereas six H200 GPUs (846 GB) barely cover the weights alone, with a seventh needed once KV cache is accounted for.
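A rough sizing helper makes the comparison concrete. The 20% KV-cache allowance below is an illustrative assumption; actual KV demand depends on batch size and context length:

```python
import math

def gpus_needed(params_b: float, bytes_per_param: float,
                gpu_mem_gb: int, kv_overhead: float = 0.2) -> int:
    """Estimate accelerators needed to hold a model's weights plus a
    KV-cache allowance. The 20% overhead is an illustrative assumption."""
    weights_gb = params_b * bytes_per_param   # params in billions * bytes -> GB
    total_gb = weights_gb * (1 + kv_overhead)
    return math.ceil(total_gb / gpu_mem_gb)

# Llama 3.1 405B at BF16 (2 bytes/param): ~810 GB of weights.
print(gpus_needed(405, 2.0, 256))  # MI325X (256 GB) -> 4
print(gpus_needed(405, 2.0, 141))  # H200 (141 GB)   -> 7
print(gpus_needed(70, 2.0, 256))   # a 70B model fits on one MI325X -> 1
```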
The 6.0 TB/s bandwidth directly accelerates decode throughput in autoregressive inference. Because each output token requires loading the full model's weight matrices and key/value cache from memory, bandwidth determines tokens-per-second at a given batch size more than compute does in low-to-moderate batch regimes. AMD claimed 1.3x the peak theoretical FP16 and FP8 throughput of the H200, a ratio that follows directly from the dense TFLOPS ratings (1,307.4 vs. 989.4 TFLOPS at FP16) rather than from the bandwidth increase.
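A first-order roofline sketch shows why bandwidth sets the decode ceiling: each decode step must stream the full weights (plus any KV cache) from HBM, so the token rate cannot exceed bandwidth divided by bytes moved per step. The figures below are illustrative assumptions, not measurements:

```python
def decode_tokens_per_sec(bandwidth_tb_s: float, params_b: float,
                          bytes_per_param: float,
                          kv_gb_per_seq: float = 0.0, batch: int = 1) -> float:
    """Bandwidth-bound ceiling on decode throughput: each decode step
    streams the full weights once (shared across the batch) plus each
    sequence's KV cache."""
    weights_gb = params_b * bytes_per_param
    gb_per_step = weights_gb + batch * kv_gb_per_seq
    steps_per_sec = bandwidth_tb_s * 1000 / gb_per_step
    return steps_per_sec * batch

# 70B model at FP8 (1 byte/param), batch 1, ignoring KV traffic:
print(f"MI325X: {decode_tokens_per_sec(6.0, 70, 1.0):.0f} tok/s ceiling")
print(f"H200:   {decode_tokens_per_sec(4.8, 70, 1.0):.0f} tok/s ceiling")
```

The 1.25x gap between the two ceilings mirrors the raw bandwidth ratio, which is the point: in this regime, bandwidth, not TFLOPS, is the binding constraint.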
At 1,000 W TDP, the MI325X consumes approximately 33% more power than the MI300X, with maximum transient power specified at 1.1 kW. This places it well above NVIDIA's H200 SXM5 (700 W), though still within the envelope that some air-cooled OAM platforms support; the absolute figure nonetheless demands significant power delivery and cooling infrastructure.
AMD lists two board power figures:

- 1,000 W rated TDP per OAM module
- 1,100 W maximum (transient) board power
The dominant deployment scenario for the MI325X in high-performance data centers is direct liquid cooling. The OAM modules plug into an OCP-compliant Universal Baseboard, with cold plates mounted on each module in liquid-cooled configurations. HPE's ProLiant Compute XD685, one of the first server platforms designed around the MI325X, offered both air and direct liquid cooling configurations in its eight-GPU, 5U chassis. Supermicro's H14 series similarly provided a liquid-cooled eight-GPU OAM board.
The power increase from 750 W to 1,000 W per GPU means an eight-GPU node draws roughly 8,000 W from the accelerators alone. Combined with host CPU, networking, and storage power, a dense MI325X node can approach 12-14 kW in total, requiring power distribution units and facility infrastructure rated accordingly.
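A back-of-the-envelope budget, with assumed (not vendor-specified) host-side figures, illustrates the arithmetic:

```python
# Back-of-the-envelope node power budget. The host-side numbers are
# illustrative assumptions, not vendor specifications.
GPU_TDP_W = 1_000
GPUS_PER_NODE = 8
HOST_CPUS_W = 2 * 500          # assumed: two EPYC-class host CPUs
NICS_STORAGE_FANS_W = 3_000    # assumed: networking, storage, fans, PSU losses

accelerators_w = GPUS_PER_NODE * GPU_TDP_W      # 8,000 W
node_w = accelerators_w + HOST_CPUS_W + NICS_STORAGE_FANS_W

print(f"accelerators: {accelerators_w / 1000:.1f} kW")
print(f"node total:   {node_w / 1000:.1f} kW")  # ~12 kW with these assumptions
```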
AMD and its OEM partners addressed this by retaining physical compatibility with MI300X server chassis wherever possible. The OAM socket and Universal Baseboard design are unchanged; operators upgrading from MI300X to MI325X primarily need to verify that existing liquid cooling loops can handle the higher thermal load and that power delivery circuits meet the 1,000 W per module spec.
Because the compute dies are identical, the MI325X's performance gains over the MI300X arise entirely from the memory subsystem upgrade. Operations that are compute-bound (i.e., where the GPU's matrix engines are the bottleneck) see little change. Operations that are memory-bandwidth-bound see gains proportional to the bandwidth increase, approximately 13% improvement in raw bandwidth.
The capacity increase of 33% (192 GB to 256 GB) yields larger practical gains for workloads that previously required model sharding across GPUs. A model of roughly 100 billion parameters (~200 GB of weights at FP16) can run on a single MI325X without tensor parallelism, whereas on an MI300X it would need FP8 quantization or sharding to fit within 192 GB alongside typical KV cache overhead. This consolidation reduces inter-GPU communication and simplifies deployment.
AMD's own benchmarking at launch cited a 1.3x improvement in inference throughput for large model serving compared to the MI300X on bandwidth-bound workloads.
AMD submitted its first MI325X results in the MLPerf Inference v5.0 round, with partner submissions from Supermicro, ASUS, and Gigabyte. The workloads tested included Llama 2 70B (server and offline scenarios) and Stable Diffusion XL (text-to-image generation).
On Llama 2 70B, MI325X partner results were within 3% of AMD's reference submission performance, and the accelerator traded blows with the H200 across offline and server scenarios. The MLPerf submission used FP8 quantization via the OCP FP8-e4m3 format, multi-step vLLM scheduling to reduce CPU overhead, and GEMM tuning targeting critical matrix operations.
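For illustration only, a vLLM configuration in the spirit of that submission might look like the sketch below; the checkpoint name and flag values are assumptions, and vLLM option names have shifted across releases:

```python
# Hypothetical vLLM setup echoing the MLPerf submission's techniques:
# FP8 weight quantization and multi-step scheduling. The model path,
# parallelism degree, and flag values are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # assumed checkpoint
    quantization="fp8",           # OCP FP8-E4M3 weight quantization
    tensor_parallel_size=8,       # one eight-GPU MI325X node
    num_scheduler_steps=8,        # multi-step scheduling to cut CPU overhead
)

out = llm.generate(
    ["Explain high-bandwidth memory in one paragraph."],
    SamplingParams(max_tokens=128, temperature=0.0),
)
print(out[0].outputs[0].text)
```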
In MLPerf Inference v5.1 (September 2025), AMD submitted the MI355X in addition to the MI325X. The MI355X delivered 2.7x the tokens per second of the MI325X on Llama 2 70B server inference in FP8, reflecting the architectural improvements of CDNA 4 rather than a memory-only change.
The H200 carries 141 GB of HBM3E with 4.8 TB/s bandwidth. At a single accelerator level, the MI325X's advantages in memory capacity and bandwidth are real:
| Specification | AMD Instinct MI325X | NVIDIA H200 SXM5 |
|---|---|---|
| Architecture | CDNA 3 | Hopper |
| Process node | TSMC 5nm/6nm | TSMC 4N |
| Compute units / SMs | 304 CUs | 132 SMs |
| FP64 (TFLOPS) | 81.7 | 67.0 |
| FP32 (TFLOPS) | 163.4 | 67.0 |
| FP16 (TFLOPS) | 1,307.4 | 989.4 |
| FP8 (TFLOPS) | 2,614.9 | 1,978.8 |
| FP8 with sparsity (TFLOPS) | ~5,229.8 | 3,957.6 |
| HBM memory | 256 GB HBM3E | 141 GB HBM3E |
| Memory bandwidth | 6.0 TB/s | 4.8 TB/s |
| TDP | 1,000 W | 700 W |
| Inter-GPU interconnect | Infinity Fabric | NVLink 4 |
| Memory capacity advantage | 1.81x vs H200 | baseline |
| Memory bandwidth advantage | 1.25x vs H200 | baseline |
In terms of peak theoretical TFLOPS, the MI325X's numbers exceed the H200's because CDNA 3's Matrix Cores are rated for higher dense throughput at each precision. However, the H200 uses NVLink 4 and NVSwitch 3 for multi-GPU communication, which provides substantially higher all-to-all bisection bandwidth in eight-GPU nodes. This means the MI325X's single-GPU advantage partially erodes in multi-GPU training workloads.
AMD claimed the MI325X delivered an eight-GPU Llama 2 70B inference throughput within 3 to 7 percent of an equivalent eight-GPU H200 system, and image generation on SDXL within 10 percent of H200. Independent analysis from The Next Platform noted that single-device advantages in bandwidth mostly held, but scaling efficiency at eight GPUs was limited by Infinity Fabric compared to NVSwitch.
NVIDIA's Blackwell-based NVIDIA B200 was announced in March 2024 and entered production in late 2024. The B200 belongs to a different performance class: it carries 192 GB of HBM3E with 8.0 TB/s bandwidth and substantially higher FP8 compute (4,500+ TFLOPS), making direct comparisons with the MI325X less favorable for AMD. AMD positioned the MI325X against the H200 explicitly, not against Blackwell.
| Specification | AMD Instinct MI325X | NVIDIA B200 |
|---|---|---|
| Architecture | CDNA 3 | Blackwell |
| FP16 (TFLOPS) | 1,307.4 | ~2,250 |
| FP8 (TFLOPS) | 2,614.9 | ~4,500 |
| HBM memory | 256 GB HBM3E | 192 GB HBM3E |
| Memory bandwidth | 6.0 TB/s | 8.0 TB/s |
| TDP | 1,000 W | 1,000 W |
The MI325X holds a memory capacity edge over the B200, but the B200 exceeds it in bandwidth and compute. AMD's response to Blackwell was the CDNA 4-based MI350 series, not the MI325X.
The MI325X was announced with commercial support from a broad set of OEM server builders and cloud providers.
At the Advancing AI 2024 event, AMD confirmed system support from Dell Technologies, Eviden, Gigabyte, Hewlett Packard Enterprise, Lenovo, and Supermicro. Production shipments from AMD started in Q4 2024, with OEM system availability broadening in Q1 2025.
HPE's ProLiant Compute XD685 was among the first validated platforms. It houses eight MI325X OAM modules alongside two AMD EPYC 9005-series CPUs in a 5U chassis supporting both air and direct liquid cooling. Supermicro's H14 8U 8-GPU MI325X platform offered a similarly dense configuration with optional liquid cooling.
Vultr was the first cloud provider to make MI325X instances commercially available, launching in early 2025 with configurations pairing eight MI325X GPUs and 2.048 TB of pooled HBM3E. Vultr marketed the offering to enterprises that needed large-memory GPU instances for LLM serving, RAG pipelines, and fine-tuning.
Microsoft Azure had already deployed AMD Instinct MI300X accelerators in its ND MI300X V5 virtual machine series to power Azure OpenAI Service workloads. At the Advancing AI 2024 event, Microsoft was named among the cloud and AI ecosystem partners supporting the MI325X roadmap, though Azure's public MI325X VM SKUs were not announced at that time.
Oracle Cloud Infrastructure (OCI) has been a consistent AMD Instinct partner for both EPYC CPUs and Instinct accelerators. OCI announced expanded AMD EPYC compute instances at the same event, maintaining its status as a key AMD cloud partner.
Google Cloud participated in the Advancing AI 2024 ecosystem announcements. Its primary AMD deployments have been EPYC-based rather than Instinct GPU-based in public disclosures.
Meta was named among customers participating in AMD's launch announcements. Meta has deployed AMD Instinct accelerators as part of its diversified AI infrastructure strategy.
By mid-2025, the cloud GPU rental market showed MI325X pricing typically ranging from $2.00 to $2.25 per GPU per hour, compared to H200 pricing of approximately $3.72 to $10.60 per GPU per hour depending on provider and contract terms. The MI325X's lower rental price, combined with its larger memory, made it attractive for inference workloads that were not compute-bound.
The MI325X runs on AMD's ROCm (Radeon Open Compute) open-source software platform. ROCm provides the GPU runtime, HIP (Heterogeneous-compute Interface for Portability), math libraries (rocBLAS, MIOpen, rocFFT), and communication libraries (RCCL, AMD's equivalent of NCCL).
Framework support covers PyTorch, TensorFlow, and JAX via ROCm backends. vLLM and SGLang, two widely used LLM inference frameworks, added MI300X and MI325X support throughout 2024, making inference deployment accessible without custom kernel development. ROCm 6.x releases during 2024 and into 2025 substantially improved PyTorch and JAX compatibility.
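Because ROCm builds of PyTorch expose HIP devices through the familiar torch.cuda namespace, most CUDA-targeted scripts run unchanged; a quick sanity check looks like this:

```python
# Quick sanity check on a ROCm build of PyTorch: HIP devices surface
# through the torch.cuda namespace, so CUDA-targeted code runs unchanged.
import torch

print(torch.cuda.is_available())      # True on a working ROCm install
print(torch.version.hip)              # ROCm/HIP version (None on CUDA builds)
print(torch.cuda.get_device_name(0))  # e.g. an Instinct accelerator name

x = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
y = x @ x  # dispatched to ROCm BLAS kernels under the hood
print(y.shape)
```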
ROCm's software maturity still trails CUDA's in several areas. Most AI frameworks have been developed and optimized for NVIDIA's CUDA ecosystem first. Operators running training workloads at scale reported that achieving performance parity with H200 on MI300X or MI325X required more effort: custom Docker builds, AMD engineering engagement, and manual GEMM tuning. RCCL, used for collective communication in distributed training, has shown lower efficiency than NCCL for some topology configurations. NVIDIA's tight vertical integration between its networking (InfiniBand, NVLink), communication libraries, and CUDA gave H200 clusters an advantage in multi-node training throughput that was not fully offset by AMD's memory capacity lead.
For inference workloads with vLLM or SGLang, the ROCm experience in 2024-2025 was substantially more accessible than earlier generations, with most popular models running without modification.
AMD has not published a suggested retail price for the MI325X. Data center GPU accelerators at this tier are sold through OEM channel deals and direct enterprise agreements rather than consumer retail. List prices for the MI300X in the $10,000-$15,000 USD per unit range have been cited by third-party market researchers, and the MI325X is expected to be priced comparably given the incremental nature of the upgrade.
Cloud rental pricing (as of mid-2025) places MI325X instances at approximately $2.00-$2.25 per GPU per hour. For reference, H200 SXM5 cloud instances on major hyperscalers ranged from roughly $3.72 to higher depending on provider and reservation type. The more favorable rental economics for the MI325X made it competitive for cost-sensitive inference deployments, particularly when memory capacity requirements ruled out H100.
Several limitations have shaped MI325X adoption relative to NVIDIA's offerings.
No architectural compute improvement over MI300X. Because the compute dies are unchanged, the MI325X handles compute-bound training workloads no better than its predecessor. Customers who needed more raw TFLOPS rather than more memory saw no benefit from upgrading.
Higher power draw. The jump from 750 W to 1,000 W TDP affects total cost of ownership. Data centers optimized for MI300X power budgets need reassessment before deploying MI325X. Liquid cooling infrastructure becomes more strongly recommended rather than optional.
Infinity Fabric scaling ceiling. As noted in independent benchmarks, the Infinity Fabric interconnect limits scaling efficiency at eight-GPU node configurations relative to NVLink 4/NVSwitch 3. Training on large models that require tight GPU-to-GPU communication sees less benefit from the MI325X's memory advantages.
ROCm ecosystem maturity. Despite improvements in 2024-2025, the ROCm software ecosystem continues to require more operator expertise than CUDA for production deployments. Sparse library support, less mature debugging tooling, and lower community availability of ROCm-specific optimizations create real friction in training pipelines.
Rapid succession by MI350 series. AMD's MI350X and AMD Instinct MI355X launched in 2025 with CDNA 4 architecture, 288 GB of HBM3E, and 8.0 TB/s bandwidth. The short interval between MI325X availability and MI355X availability compressed the MI325X's window as AMD's flagship accelerator, and some cloud providers moved to offering MI355X rather than building out MI325X capacity.
No FP4 support. The CDNA 3 architecture does not natively support FP4 or FP6 data types that NVIDIA's Blackwell GPUs and AMD's own CDNA 4 chips added. For workloads using aggressive quantization to maximize throughput per watt, MI325X is at a disadvantage compared to B200 and MI355X.
AMD's accelerator roadmap places the MI325X as the last CDNA 3 product before the CDNA 4-based MI350 series.
The MI350X (air-cooled) and AMD Instinct MI355X (liquid-cooled) are built on the CDNA 4 architecture, manufactured on TSMC's 3nm process node. AMD production shipments of MI350 platforms began in May 2025. Both share:

- the CDNA 4 compute architecture on TSMC 3nm
- 288 GB of HBM3E memory capacity
- 8.0 TB/s of memory bandwidth
- native support for FP4 and FP6 data types
In MLPerf Inference v5.1, the MI355X achieved 2.7x the Llama 2 70B tokens per second of the MI325X. AMD also claims up to 35x faster inference performance versus the MI300X on certain configurations, though these figures represent favorable workload selection.
AMD's MI400 series, powered by CDNA "Next" architecture, is planned for 2026 with the "Helios" rack architecture. AMD has previewed UALink over Ethernet connectivity in the MI400 platform, targeting a 72-accelerator scale-up domain analogous to NVIDIA's NVL72 configuration. AMD projects up to 10x performance improvement over the MI350 series for AI frontier model workloads.
| Specification | Value |
|---|---|
| Launch date | October 10, 2024 |
| Architecture | CDNA 3 (Aqua Vanjaram) |
| Process node | TSMC N5 (compute), TSMC N6 (I/O) |
| Transistor count | 153 billion |
| Compute units | 304 |
| FP64 performance | 81.7 TFLOPS |
| FP32 performance | 163.4 TFLOPS |
| FP16/BF16 performance | 1,307.4 TFLOPS |
| FP8 performance | 2,614.9 TFLOPS |
| FP8 with sparsity | ~5,229.8 TFLOPS |
| Memory type | HBM3E |
| Memory capacity | 256 GB |
| Memory bandwidth | 6.0 TB/s |
| Memory bus width | 8,192 bits |
| L2 / Infinity Cache | 256 MB |
| Inter-GPU interconnect | Infinity Fabric (7x links, 128 GB/s each) |
| Host interface | PCIe 5.0 x16 |
| Form factor | OAM (Open Accelerator Module) |
| TDP | 1,000 W |
| Max power | 1,100 W |
| Predecessor | AMD Instinct MI300X (750 W, 192 GB HBM3) |
| Successor | AMD Instinct MI355X (CDNA 4, 288 GB HBM3E) |