AMD Instinct MI400

AI Hardware


Updated Jun 23, 2026


RawGraph


AMD Instinct MI400 is AMD's 2026 family of data center GPU accelerators for artificial intelligence training and inference, built on the CDNA 5 architecture and equipped with 432 GB of HBM4 memory, 19.6 TB/s of memory bandwidth, and up to 40 PFLOPS of FP4 compute per accelerator. AMD first announced the MI400 series at its Advancing AI 2025 event in San Jose on June 12, 2025, then detailed the full product family at the CES 2026 keynote on January 5, 2026. The lineup spans the flagship MI455X for hyperscale AI training, the MI440X for enterprise AI servers, and the MI430X for sovereign AI and HPC, and it succeeds the AMD Instinct MI350 series.^[1]^[2]^[25]

The MI400 series anchors AMD's first complete rack-scale AI system, codenamed Helios, which couples 72 MI455X accelerators with next-generation EPYC "Venice" Zen 6 CPUs and Pensando "Vulcano" 800 Gb/s NICs in a double-wide Open Compute Project (OCP) rack. Helios is positioned as AMD's direct competitor to NVIDIA's Vera Rubin platform, and it is the hardware foundation for a multibillion-dollar, 6-gigawatt strategic partnership announced between AMD and OpenAI in October 2025. "We are thrilled to partner with OpenAI to deliver AI compute at massive scale," AMD chair and CEO Lisa Su said when the agreement was unveiled.^[3]^[4]^[26] Production shipments of MI400 silicon are expected to begin in mid-2026, with the first gigawatt of OpenAI deployments, using AMD Instinct MI450 GPUs, arriving in the second half of 2026.^[4]^[5]^[26]

When was the MI400 announced?

AMD first publicly committed to the MI400 series at its Advancing AI 2025 event in San Jose, California, on June 12, 2025. During the keynote, AMD CEO Lisa Su outlined a multi-year accelerator roadmap that included the just-launched MI350 and MI355X parts, the MI400 series for 2026, and a follow-on MI500 generation expected in 2027. AMD framed the MI400 launch as the moment when its data center GPU business would transition from selling individual accelerators to selling complete, integrated rack-scale systems on equal footing with NVIDIA's GB200 NVL72 and the forthcoming Vera Rubin rack.^[1]^[5]^[25]

The full MI400 product family was unveiled at CES 2026 on January 5, 2026, when Lisa Su delivered the opening keynote in Las Vegas. The CES presentation introduced three named variants (MI430X, MI440X, MI455X), confirmed the Helios reference design, and showed working silicon mounted in a Helios compute tray for the first time.^[6]^[7]

Architecture

CDNA 5

The MI400 series is the first product family built on AMD's CDNA 5 (Compute DNA, fifth generation) GPU architecture. CDNA 5 is a clean-sheet redesign of the matrix and vector compute engines; like earlier CDNA generations, it omits graphics rasterization hardware in favor of denser AI and HPC math units. AMD describes CDNA 5 as its most ambitious chiplet design to date, combining multiple compute and memory dies across two TSMC process nodes inside a single accelerator package.^[8]^[9]

The MI455X uses 12 compute chiplets on TSMC's N2 (2 nm class) process node, paired with 3 I/O chiplets on TSMC N3. The package stacks these dies on a base interposer that hosts the cache and memory fabric, which in turn connects to 12 stacks of HBM4 memory. The full MI455X reaches approximately 320 billion transistors, compared with 208 billion in NVIDIA's Blackwell B200 and roughly 1.7 times the 185 billion in the MI355X.^[8]^[10]

Compute and precision support

Each MI400 series accelerator delivers up to 40 PFLOPS of dense FP4 throughput and 20 PFLOPS of dense FP8 throughput, double the MI355X's matrix throughput at the same precisions. The CDNA 5 Matrix Cores keep native support for the OCP MX (microscaling) FP4 and FP6 formats introduced with CDNA 4 and also supported by NVIDIA Blackwell, along with FP8, BF16, FP16, TF32, FP32, and FP64.

Precision format	MI400 dense throughput	Primary use case
FP4 (MXFP4)	40 PFLOPS	Quantized inference, ultra-low-precision training
FP6 (MXFP6)	30 PFLOPS	Inference with accuracy headroom
FP8	20 PFLOPS	Mixed-precision training, high-quality inference
BF16 / FP16	10 PFLOPS	Standard training, scientific AI
TF32	5 PFLOPS	Drop-in training replacement for FP32
FP32	2.5 PFLOPS	Traditional dense compute
FP64 (MI430X)	1.3 PFLOPS	HPC, scientific simulation
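The OCP MX formats in the table pair low-precision elements with one shared power-of-two scale per block. The following is a minimal Python sketch of MXFP4 quantization, assuming the OCP Microscaling v1.0 layout (32-element blocks, an E8M0 shared exponent, FP4 E2M1 elements). It simplifies rounding (ties go to the smaller magnitude) and ignores the E8M0 exponent range, and the function names are hypothetical. It illustrates the format only and is not AMD's or any library's implementation.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (OCP Microscaling v1.0).
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_EMAX = 2          # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
MX_BLOCK_SIZE = 32    # elements that share one E8M0 scale


def to_fp4(x: float) -> float:
    """Round x to the nearest E2M1 value, saturating at +/-6.0.

    Ties go to the smaller magnitude (a simplification of the spec's rounding).
    """
    mag = min(abs(x), 6.0)
    best = min(FP4_E2M1_MAGNITUDES, key=lambda v: abs(v - mag))
    return math.copysign(best, x)


def quantize_mx_block(block: list[float]) -> tuple[int, list[float]]:
    """Return (shared_exponent, fp4_elements) for one MX block."""
    if len(block) > MX_BLOCK_SIZE:
        raise ValueError("an MX block holds at most 32 elements")
    amax = max((abs(v) for v in block), default=0.0)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared exponent: floor(log2(amax)) minus the element emax, so the
    # largest value in the block lands near the top of the FP4 range.
    shared_exp = math.floor(math.log2(amax)) - FP4_EMAX
    scale = 2.0 ** shared_exp
    return shared_exp, [to_fp4(v / scale) for v in block]


def dequantize_mx_block(shared_exp: int, elements: list[float]) -> list[float]:
    """Multiply each FP4 element by the block's power-of-two scale."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elements]


exp, elems = quantize_mx_block([0.1, -0.7, 3.0, 12.0])
print(exp, elems)                       # 1 [0.0, -0.5, 1.5, 6.0]
print(dequantize_mx_block(exp, elems))  # [0.0, -1.0, 3.0, 12.0]
```

The example shows the trade-off behind the format: the large value survives exactly, while small values in the same block lose precision (0.1 becomes 0.0) because they share one scale.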

The MI430X variant emphasizes FP64 throughput, restoring the full-rate double-precision pipeline that was reduced in some MI350 SKUs. AMD positions the MI430X for traditional HPC customers as a successor to the MI300A and to the MI250X GPUs that power the Frontier supercomputer, while the MI440X and MI455X are tuned more aggressively toward low-precision AI throughput.^[2]^[9]

HBM4 memory

Each MI400 GPU is paired with 432 GB of HBM4 memory across 12 stacks for an aggregate 19.6 TB/s of memory bandwidth, which works out to roughly 6.4 Gb/s per pin across each stack's 2048-bit interface. This represents a 1.5 times capacity improvement and a 2.45 times bandwidth improvement compared to the MI355X (288 GB of HBM3e at 8 TB/s). HBM4 also puts the MI400 family well ahead of NVIDIA's first Rubin parts on raw capacity (432 GB versus 288 GB), even though Rubin runs its HBM4 stacks at a higher per-pin data rate.^[2]^[11]
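Stack count, interface width, and per-pin data rate determine peak memory bandwidth, so the 19.6 TB/s figure can be checked with a few lines of Python. This sketch assumes the JEDEC HBM4 2048-bit data interface per stack, decimal units, and no protocol overhead; the helper names are illustrative.

```python
# Assumption: every HBM4 stack exposes a 2048-bit data interface (JEDEC HBM4).
HBM4_BUS_WIDTH_BITS = 2048


def peak_bandwidth_tbs(stacks: int, gbps_per_pin: float,
                       bus_width_bits: int = HBM4_BUS_WIDTH_BITS) -> float:
    """Aggregate peak bandwidth in decimal TB/s."""
    gigabits_per_s = stacks * bus_width_bits * gbps_per_pin
    return gigabits_per_s / 8 / 1000  # bits -> bytes, then GB/s -> TB/s


def implied_pin_rate_gbps(stacks: int, total_tbs: float,
                          bus_width_bits: int = HBM4_BUS_WIDTH_BITS) -> float:
    """Per-pin data rate (Gb/s) needed to reach total_tbs across all stacks."""
    return total_tbs * 1000 * 8 / (stacks * bus_width_bits)


print(round(implied_pin_rate_gbps(12, 19.6), 2))  # 6.38
print(round(peak_bandwidth_tbs(12, 6.4), 2))      # 19.66
print(round(peak_bandwidth_tbs(12, 8.0), 2))      # 24.58
```

Under these assumptions, 12 stacks reach about 19.6 TB/s at roughly 6.4 Gb/s per pin, while 8 Gb/s per pin would give about 24.6 TB/s.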

The larger HBM4 footprint matters for very large mixture-of-experts (MoE) and trillion-parameter dense models, which can fit more experts and more KV cache per accelerator, and for agentic AI workloads that hold long context windows in memory. AMD has pitched this capacity as reducing the need to shard model weights across multiple accelerators; at FP4 (half a byte per parameter), 432 GB holds the weights of a dense model of up to roughly 860 billion parameters, so a 1 trillion parameter dense model (about 500 GB of FP4 weights) still spans at least two accelerators.^[6]
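A weights-only sizing sketch shows how precision and HBM capacity interact. It assumes 4 bits per parameter for plain FP4 and 4.25 bits for MXFP4 (one 8-bit shared scale per 32-element block) and uses decimal gigabytes. It ignores KV cache, activations, and runtime overhead, so real deployments need more headroom than these numbers suggest. The function names are illustrative.

```python
import math


def weights_gb(num_params: float, bits_per_param: float) -> float:
    """Weights-only footprint in decimal GB."""
    return num_params * bits_per_param / 8 / 1e9


def max_params_billions(hbm_gb: float, bits_per_param: float) -> float:
    """Largest dense model (billions of parameters) whose weights fit in hbm_gb."""
    return hbm_gb * 1e9 * 8 / bits_per_param / 1e9


def accelerators_needed(num_params: float, bits_per_param: float,
                        hbm_gb: float) -> int:
    """Minimum accelerators needed to hold the weights alone."""
    return math.ceil(weights_gb(num_params, bits_per_param) / hbm_gb)


HBM_GB = 432
# Plain 4-bit weights versus MXFP4, which adds one 8-bit scale per 32 elements.
for label, bits in (("FP4", 4.0), ("MXFP4", 4 + 8 / 32)):
    print(label,
          weights_gb(1e12, bits),                     # GB for a 1T-parameter model
          round(max_params_billions(HBM_GB, bits)),   # billions of params that fit
          accelerators_needed(1e12, bits, HBM_GB))    # GPUs for weights alone
```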

Interconnect

The MI400 series introduces two new interconnect technologies. For scale-up communication within a Helios rack, MI400 accelerators support the Ultra Accelerator Link (UALink) standard alongside AMD's existing Infinity Fabric, making the MI400 family the first commercial silicon shipped with UALink support. The internal scale-up fabric provides 3.6 TB/s of GPU-to-GPU bandwidth per accelerator, with up to 260 TB/s of aggregate scale-up bandwidth across all 72 GPUs in a Helios rack.^[3]^[12]

For scale-out communication between racks, each MI400 GPU exposes 300 GB/s of dedicated scale-out bandwidth through Pensando "Vulcano" 800 Gb/s NICs. The Vulcano NIC implements the Ultra Ethernet Consortium (UEC) specification and supports both standard Ethernet and AMD's proprietary congestion control extensions. A single Helios rack provides up to 43 TB/s of aggregate Ethernet-based scale-out bandwidth.^[3]^[13]
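The rack-level interconnect figures can be reproduced from the per-GPU numbers under one interpretation: that the 300 GB/s scale-out rate is per direction and the 43 TB/s rack figure counts both directions. That reading is an assumption, not something stated here. Under it, 300 GB/s per GPU equals 2.4 Tb/s, or the capacity of three 800 Gb/s ports. The text does not say how those ports map onto physical NICs. Constant and function names are illustrative.

```python
GPUS_PER_RACK = 72
SCALE_UP_TBS_PER_GPU = 3.6   # per-GPU scale-up bandwidth from the text
SCALE_OUT_GBS_PER_GPU = 300  # assumed to be per direction
NIC_PORT_GBPS = 800          # one Vulcano port, in gigabits per second


def rack_scale_up_tbs() -> float:
    """Aggregate scale-up bandwidth across all GPUs in the rack (TB/s)."""
    return GPUS_PER_RACK * SCALE_UP_TBS_PER_GPU


def nic_ports_per_gpu() -> float:
    """800 Gb/s ports needed to carry the per-GPU scale-out rate."""
    return SCALE_OUT_GBS_PER_GPU * 8 / NIC_PORT_GBPS


def rack_scale_out_tbs(bidirectional: bool = True) -> float:
    """Aggregate scale-out bandwidth (TB/s), optionally counting both directions."""
    per_direction = GPUS_PER_RACK * SCALE_OUT_GBS_PER_GPU / 1000
    return per_direction * (2 if bidirectional else 1)


print(round(rack_scale_up_tbs(), 1))        # 259.2
print(nic_ports_per_gpu())                  # 3.0
print(round(rack_scale_out_tbs(), 1))       # 43.2
print(round(rack_scale_out_tbs(False), 1))  # 21.6
```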

Product lineup

MI455X

The MI455X is the flagship of the MI400 family and the only variant qualified for the Helios rack-scale platform. It delivers 40 PFLOPS of FP4 compute, 20 PFLOPS of FP8, 432 GB of HBM4 at 19.6 TB/s, and 3.6 TB/s of scale-up bandwidth per accelerator. The MI455X targets hyperscale AI training clusters, the largest distributed inference deployments, and frontier model training runs. It is sold only in rack-scale configurations.^[2]^[6]

Specification	MI455X
Architecture	CDNA 5
Transistors	~320 billion
Process node	TSMC N2 (compute) + TSMC N3 (I/O)
Chiplets	12 compute + 3 I/O + base interposer
HBM memory	432 GB HBM4 (12 stacks)
Memory bandwidth	19.6 TB/s
FP4 dense	40 PFLOPS
FP8 dense	20 PFLOPS
Scale-up bandwidth	3.6 TB/s per GPU
Scale-out bandwidth	300 GB/s per GPU
Interconnect	UALink, Infinity Fabric, Ultra Ethernet
TDP	~1.4 to 1.8 kW (liquid-cooled, rack-scale)

MI440X

The MI440X is the enterprise variant, designed to slot into standard 8-GPU OAM servers next to a single EPYC Venice CPU. It carries the same 432 GB of HBM4 and 19.6 TB/s memory bandwidth as the MI455X, but is optimized for lower-precision workloads (FP4, FP8, BF16) and ships in air-cooled or liquid-cooled form factors. The MI440X powers AMD's new Enterprise AI Platform reference design, which targets corporate AI training and inference at single-server to multi-rack scale.^[2]^[14]

MI430X

The MI430X is the HPC and sovereign AI variant. It uses a different chiplet mix that emphasizes FP64 and FP32 throughput alongside the same FP4 and FP8 matrix units. The MI430X targets national laboratories, government cloud customers, defense contractors, and traditional supercomputing centers that need both high-precision scientific compute and modern AI capability in the same accelerator.^[2]^[7]

Variant	FP4 / FP8	FP64	Workload emphasis
MI455X	40 / 20 PFLOPS	Reduced rate	Hyperscale AI training, frontier models
MI440X	40 / 20 PFLOPS	Reduced rate	Enterprise AI, mainstream inference
MI430X	40 / 20 PFLOPS	~1.3 PFLOPS (full rate)	HPC, sovereign AI, scientific computing

What is the AMD Helios rack-scale system?

Overview

Helios is AMD's first complete rack-scale AI system, and the primary delivery vehicle for the MI455X. AMD describes Helios as a reference design rather than a single SKU, meaning OEM partners (Dell, HPE, Lenovo, Supermicro) and ODMs will sell their own Helios-compatible racks that meet the AMD specification. The system is built on the Open Compute Project (OCP) 2025 double-wide rack standard, which Meta originally contributed to OCP. See AMD Helios for the full system article.^[12]^[15]

A single Helios rack integrates 72 MI455X accelerators, 18 EPYC "Venice" Zen 6 CPUs (paired with four GPUs each in a 1:4 ratio), and 72 Pensando Vulcano 800 Gb/s NICs (one per GPU for scale-out). The system uses direct-to-chip liquid cooling across all major components, with power consumption estimated at 130 to 170 kW per rack depending on configuration.^[12]^[13]
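To put the rack power estimate in per-GPU terms, a simple division spreads rack power across the 72 GPU slots. This sketch assumes the 130 to 170 kW range above is IT power and that the share per slot also covers CPUs, NICs, and cooling pumps, so it is an upper bound on GPU power, not a TDP. Cooling overhead (PUE) is ignored. The names are illustrative.

```python
GPUS_PER_RACK = 72
RACK_POWER_KW = (130, 170)  # estimated range from the text


def kw_per_gpu_slot(rack_kw: float) -> float:
    """Rack power split evenly across GPU slots (includes CPUs, NICs, pumps)."""
    return rack_kw / GPUS_PER_RACK


def racks_per_megawatt(rack_kw: float) -> float:
    """Racks supported by 1 MW of IT power, ignoring cooling overhead."""
    return 1000 / rack_kw


for kw in RACK_POWER_KW:
    print(kw, round(kw_per_gpu_slot(kw), 2), round(racks_per_megawatt(kw), 2))
```

At the ends of the range, this gives roughly 1.8 to 2.4 kW per GPU slot and about 5.9 to 7.7 racks per megawatt.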

Performance and specifications

At the rack level, Helios delivers approximately 2.9 EFLOPS of FP4 inference compute and 1.4 EFLOPS of FP8 training compute, with 31 TB of aggregate HBM4 memory and 1.4 PB/s of total memory bandwidth. The rack-internal scale-up fabric provides 260 TB/s of bandwidth, and the scale-out network adds another 43 TB/s of Ethernet capacity for connecting Helios racks into larger superpods.^[3]^[12]

Helios rack metric	Value
MI455X accelerators per rack	72
EPYC Venice CPUs per rack	18 (Zen 6, 2 nm)
Pensando Vulcano NICs	72 (800 Gb/s each)
Total HBM4 memory	31 TB
Aggregate memory bandwidth	1.4 PB/s
Scale-up bandwidth	260 TB/s
Scale-out Ethernet bandwidth	43 TB/s
FP4 compute	~2.9 EFLOPS
FP8 compute	~1.4 EFLOPS
Form factor	OCP 2025 double-wide rack, direct-to-chip liquid cooling
Rack power	~130 to 170 kW
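The rack-level rows follow from multiplying the MI455X's per-GPU peaks by 72. The table's figures are rounded versions of these products. This sketch assumes every GPU contributes its full rated dense throughput and bandwidth, which real workloads do not sustain. The dataclass and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    fp4_pflops: float
    fp8_pflops: float
    hbm_gb: float
    hbm_tbs: float
    scale_up_tbs: float


def rack_totals(gpu: GpuSpec, gpus: int = 72) -> dict[str, float]:
    """Multiply per-GPU peaks by the GPU count and convert to rack-level units."""
    return {
        "fp4_eflops": gpu.fp4_pflops * gpus / 1000,  # PFLOPS -> EFLOPS
        "fp8_eflops": gpu.fp8_pflops * gpus / 1000,
        "hbm_tb": gpu.hbm_gb * gpus / 1000,          # GB -> TB
        "hbm_pbs": gpu.hbm_tbs * gpus / 1000,        # TB/s -> PB/s
        "scale_up_tbs": gpu.scale_up_tbs * gpus,
    }


mi455x = GpuSpec(fp4_pflops=40, fp8_pflops=20, hbm_gb=432,
                 hbm_tbs=19.6, scale_up_tbs=3.6)
for name, value in rack_totals(mi455x).items():
    print(f"{name}: {value:.2f}")
```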

AMD claims roughly 10 times the AI performance per rack of the MI300X generation, driven by the FP4 matrix path, the higher GPU count per scale-up domain (72 versus 8 in a standard MI300X server), and the new UALink scale-up fabric.^[6]^[16]

How does the MI400 compare to other accelerators?

Generational comparison

Specification	MI300X (CDNA 3, 2023)	MI325X (CDNA 3, 2024)	MI355X (CDNA 4, 2025)	MI455X (CDNA 5, 2026)
HBM capacity	192 GB HBM3	256 GB HBM3e	288 GB HBM3e	432 GB HBM4
HBM bandwidth	5.3 TB/s	6 TB/s	8 TB/s	19.6 TB/s
FP8 dense	2.6 PFLOPS	2.6 PFLOPS	10 PFLOPS	20 PFLOPS
FP4 dense	not supported	not supported	20 PFLOPS	40 PFLOPS
Scale-up bandwidth	896 GB/s (Infinity Fabric)	896 GB/s	1.075 TB/s	3.6 TB/s (UALink + IF)
Process node	TSMC 5 nm	TSMC 5 nm	TSMC N3	TSMC N2 (compute) + N3 (I/O)
Rack-scale platform	None	None	Helios reference (MI355X)	Helios (72-GPU rack)

The jump from CDNA 4 to CDNA 5 is the largest generational step AMD has taken in the Instinct line. Memory capacity grew 1.5 times, bandwidth grew 2.45 times, dense FP4 throughput doubled, and per-GPU scale-up bandwidth more than tripled (from 1.075 TB/s to 3.6 TB/s). AMD compares the MI455X-based Helios rack to the MI300X-based 8-GPU server it replaces, claiming approximately 35 times the AI compute and 18 times the HBM capacity per system, before accounting for software efficiency gains.^[6]^[9]
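The generational multipliers can be computed directly from the comparison table. This sketch copies the MI355X and MI455X peak values from that table and divides metric by metric. It compares peak specifications only and says nothing about delivered application performance. The names are illustrative.

```python
# Peak values copied from the generational comparison table.
MI355X = {"hbm_gb": 288, "hbm_tbs": 8.0, "fp8_pflops": 10, "fp4_pflops": 20,
          "scale_up_tbs": 1.075}
MI455X = {"hbm_gb": 432, "hbm_tbs": 19.6, "fp8_pflops": 20, "fp4_pflops": 40,
          "scale_up_tbs": 3.6}


def uplift(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Ratio new/old for every metric present in both specs."""
    return {k: new[k] / old[k] for k in old.keys() & new.keys()}


for metric, ratio in sorted(uplift(MI355X, MI455X).items()):
    print(f"{metric}: {ratio:.2f}x")
```

The output gives 1.50x capacity, 2.45x bandwidth, 2.00x FP4 and FP8, and 3.35x scale-up bandwidth.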

How does the MI400 differ from NVIDIA Vera Rubin?

The MI455X is positioned as a direct competitor to NVIDIA's Vera Rubin platform, which features the Rubin R100 GPU paired with NVIDIA's new Vera CPU. The two platforms entered customer evaluation roughly simultaneously in 2026, with NVIDIA targeting partner availability in the second half of 2026.^[17]^[18]

Metric	AMD MI455X	NVIDIA Rubin R100 (announced)
Transistors	~320 billion	~336 billion
Process node	TSMC N2 (compute)	TSMC N3
HBM capacity	432 GB HBM4	288 GB HBM4
HBM bandwidth	19.6 TB/s	~22 TB/s
FP4 dense compute	40 PFLOPS	~50 PFLOPS (NVFP4)
FP8 dense compute	20 PFLOPS	~35 PFLOPS
Scale-up bandwidth	3.6 TB/s (UALink)	3.6 TB/s (NVLink 6)
Rack scale unit	Helios, 72 GPUs	Rubin NVL144, 144 GPU dies
Launch window	H2 2026	H2 2026

AMD leads on memory capacity, with 1.5 times as much HBM4 per accelerator (432 GB versus 288 GB) and a stated 50 percent memory-capacity advantage at the rack level. NVIDIA leads on peak FP4 and FP8 throughput and on aggregate HBM4 bandwidth and per-pin data rate. Analysts have generally framed the matchup this way: AMD's MI400 family offers better price-performance for inference-heavy workloads and large context windows, while NVIDIA's Rubin offers higher peak training throughput per accelerator and benefits from a more mature CUDA software stack.^[18]^[19]
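A roofline-style calculation shows why memory figures weigh so heavily in inference comparisons. The ridge point (peak FLOPS divided by bandwidth) is the arithmetic intensity below which a kernel is memory-bound. This sketch uses the table's peak FP4 and bandwidth numbers. Two assumptions are simplifications: NVIDIA's figure is NVFP4, which may not be directly comparable to MXFP4, and batch-1 decoding of FP4 weights is taken to run at about 4 FLOPs per byte (2 FLOPs per parameter, 0.5 bytes per parameter).

```python
def ridge_point_flops_per_byte(peak_pflops: float, bandwidth_tbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) where a kernel stops being memory-bound."""
    return (peak_pflops * 1e15) / (bandwidth_tbs * 1e12)


def attainable_pflops(intensity: float, peak_pflops: float,
                      bandwidth_tbs: float) -> float:
    """Roofline model: the lower of the compute roof and bandwidth * intensity."""
    memory_roof = bandwidth_tbs * 1e12 * intensity / 1e15
    return min(peak_pflops, memory_roof)


# Peak FP4 PFLOPS and HBM TB/s from the comparison table.
specs = {"MI455X": (40, 19.6), "Rubin R100": (50, 22)}
DECODE_INTENSITY = 4  # assumed FLOP/byte for batch-1 FP4 decode
for name, (pflops, tbs) in specs.items():
    ridge = ridge_point_flops_per_byte(pflops, tbs)
    print(name, round(ridge), round(attainable_pflops(DECODE_INTENSITY, pflops, tbs), 3))
```

Both chips need roughly 2,000 FLOPs per byte to reach peak, so low-batch decoding reaches only a small fraction of peak compute. In that regime, bandwidth and capacity (which sets how much batch and KV cache fit) matter more than peak FLOPS.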

Software

The MI400 series uses the AMD ROCm open-source software stack, which provides functionality broadly equivalent to NVIDIA's CUDA platform across compilers, libraries, runtimes, and AI frameworks. AMD released ROCm 7.0 in mid-2025 alongside the MI350 launch, and the version of ROCm shipping with MI400 hardware extends that stack with native support for the new CDNA 5 instructions, FP4 and FP6 matrix paths, UALink-aware collective operations, and integration with the Vulcano scale-out NICs.^[20]^[9]

AMD's headline performance claim for ROCm 7.0 is up to 4 times faster inference and up to 3 times faster training compared to ROCm 6.0 on equivalent hardware. The improvements come from kernel-level GEMM optimizations, new attention implementations tuned for long context, expanded support for vLLM and SGLang, and a redesigned distributed inference runtime that takes advantage of the MI400's UALink fabric for tensor-parallel and expert-parallel sharding.^[20]

The ROCm stack for MI400 supports the standard set of AI frameworks: PyTorch, JAX, TensorFlow, vLLM, SGLang, and the Hugging Face Transformers library, with optimized container images for popular open models including Llama 4, DeepSeek, and Qwen.
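ROCm builds of PyTorch expose AMD GPUs through the existing `torch.cuda` API (backed by HIP), so most PyTorch code written for CUDA devices runs without source changes. The following sketch detects which kind of build is installed. It relies only on `torch.cuda.is_available()` and `torch.version.hip`, which is set on ROCm builds and `None` otherwise. The helper name is hypothetical, and the function also handles machines without PyTorch.

```python
from typing import Optional, Tuple


def detect_backend() -> Tuple[str, Optional[str]]:
    """Return (backend, device_string) for the installed PyTorch build."""
    try:
        import torch
    except ImportError:
        return "torch-not-installed", None
    if not torch.cuda.is_available():
        return "cpu", "cpu"
    # ROCm builds set torch.version.hip and reuse the "cuda" device type.
    backend = "rocm" if getattr(torch.version, "hip", None) else "cuda"
    return backend, "cuda"


backend, device = detect_backend()
print(backend, device)
```

On an MI400 system with a ROCm build, tensors would still be placed with `device="cuda"`, which is why the sketch returns that string for both GPU backends.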

Customers and deployments

What is the AMD and OpenAI partnership?

On October 6, 2025, AMD and OpenAI announced a multi-year strategic partnership covering 6 gigawatts of AMD GPU capacity, beginning with a 1 gigawatt deployment of AMD Instinct MI450 series GPUs in the second half of 2026.^[4]^[26] The deal is the largest single non-NVIDIA AI customer commitment ever signed, with industry estimates valuing the GPU portion of the agreement between $15 billion and $25 billion over its multi-year term.^[4]^[21] "This partnership is a major step in building the compute capacity needed to realize AI's full potential," OpenAI co-founder and CEO Sam Altman said.^[26]

As part of the agreement, AMD granted OpenAI a warrant to purchase up to 160 million shares of AMD common stock at a nominal exercise price, with vesting tied to deployment milestones and AMD share-price targets. At full vesting, the warrant represents roughly a 10 percent stake in AMD, aligning OpenAI's commercial interest with AMD's data center GPU success. The first gigawatt of capacity is scheduled to come online in H2 2026, with the remaining 5 gigawatts deploying across later MI400 and MI500 generations through roughly 2030.^[4]^[22]^[26]

Oracle Cloud Infrastructure

In October 2025, Oracle and AMD expanded their existing partnership to include MI400 series deployments on Oracle Cloud Infrastructure (OCI), following Oracle's June 2025 commitment to deploy 130,000 MI355X accelerators on OCI. Oracle intends to take delivery of approximately 50,000 MI400 series GPUs in 2026, with the first Helios racks coming online in OCI regions in H2 2026. OCI plans to use MI400 hardware to back its OCI Compute and OCI Supercluster products. Oracle's MI400 deployment is also tied indirectly to the OpenAI partnership, since OpenAI has separately contracted Oracle for tens of billions of dollars of compute capacity that will run on a mix of NVIDIA and AMD silicon.^[23]^[24]

Other customers

AMD has named Meta, Microsoft, Oracle, Tesla, and HUMAIN as customers committed to MI400 deployments. Meta is a particularly important partner because the OCP 2025 double-wide rack standard that Helios uses was originally contributed by Meta. Several national supercomputing centers in the United States, Europe, and Asia have committed to MI430X procurements for upcoming exascale and post-exascale systems.^[6]^[14]

Manufacturing and supply

The MI400 series compute chiplets are manufactured on TSMC's N2 (2 nm class) process node, making the MI455X one of the first high-volume products on N2. The I/O dies, base interposer, and memory controller chiplets use TSMC's N3 process. Packages are assembled using TSMC's chip-on-wafer-on-substrate (CoWoS) advanced packaging, with HBM4 stacks supplied by SK hynix and Samsung. Production shipments are scheduled to begin in Q2 2026, with broad availability in H2 2026. Analysts estimate AMD could ship between 200,000 and 300,000 MI400 series units in 2026, ramping in 2027 as N2 capacity and HBM4 supply expand. The largest capacity constraint on the MI400 ramp is HBM4 production, since the MI455X uses 12 HBM4 stacks per accelerator and the early HBM4 supply is shared across AMD, NVIDIA, and other AI accelerator makers.^[8]^[16]^[19]
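The HBM4 constraint scales directly with unit volume. This back-of-the-envelope sketch multiplies the analyst shipment range by 12 stacks per accelerator. It assumes every MI400 variant uses 12 stacks and ignores packaging yield loss and spares, so real stack demand would be somewhat higher. The names are illustrative.

```python
STACKS_PER_ACCELERATOR = 12
SHIPMENT_ESTIMATE = (200_000, 300_000)  # analyst range for 2026 from the text


def hbm4_stacks_needed(units: int,
                       stacks_per_unit: int = STACKS_PER_ACCELERATOR) -> int:
    """HBM4 stacks consumed by a given number of accelerators (no yield loss)."""
    return units * stacks_per_unit


for units in SHIPMENT_ESTIMATE:
    print(f"{units:,} accelerators -> {hbm4_stacks_needed(units):,} HBM4 stacks")
```

The range works out to 2.4 to 3.6 million HBM4 stacks for AMD alone in 2026, drawn from the same early supply that other accelerator makers depend on.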

Reception and analysis

Industry reception of the MI400 announcement has been broadly positive, particularly around the HBM4 capacity lead over Rubin, the OpenAI partnership as a multi-year demand anchor independent of NVIDIA's customer base, and the Helios rack-scale system, which moves AMD into direct competition with NVIDIA on integrated rack products rather than just individual accelerators.^[19]^[22]

Areas of skepticism include the maturity of the ROCm software stack relative to CUDA, the limited public benchmark data available before silicon ships in volume, and the operational complexity of running 72-GPU UALink fabrics at scale. Financial analysts have raised their data center GPU revenue forecasts for AMD into 2026 and 2027 on the strength of the MI400 ramp and the OpenAI partnership, with AMD's forward guidance citing the MI400 series as a major driver of a $10 billion or larger annual AI accelerator revenue run-rate.^[16]^[22]

References

1. AMD Accelerates Pace of Data Center AI Innovation and Leadership with Expanded AMD Instinct GPU Roadmap. AMD Investor Relations. June 12, 2025.
2. AMD touts Instinct MI430X, MI440X, and MI455X AI accelerators and Helios rack-scale AI architecture at CES. Tom's Hardware. January 2026.
3. AMD Helios: AI Rack Built on Meta's 2025 OCP Design. AMD Blogs. 2025.
4. AMD and OpenAI Announce Strategic Partnership to Deploy 6 Gigawatts of AMD GPUs. AMD Newsroom. October 6, 2025.
5. AMD Says Helios Racks And MI400 Series GPUs On Track For 2H 2026. The Next Platform. February 23, 2026.
6. AMD unveils full MI400 product lineup, claims MI500 chips will deliver 1,000x increase in AI performance. Data Center Dynamics. 2026.
7. AMD's EPYC Venice, Instinct MI455X, & Helios Hardware On Display for First Time at CES 2026. ServeTheHome. January 2026.
8. CES 2026: Taking the Lids off AMD's Venice and MI400 SoCs. Chips and Cheese. January 2026.
9. AMD Instinct MI400 Launches in 2026 with CDNA 5 Architecture. Guru3D. 2025.
10. AMD's Next-Gen Instinct MI400 Accelerator Doubles The Compute To 40 PFLOPs. Wccftech. 2025.
11. AMD launches Instinct MI350 series, confirms MI400 in 2026 with 432GB HBM4 memory. VideoCardz. June 2025.
12. AMD Liquid Cooled 72-GPU Helios Racks. NextBigFuture. October 2025.
13. AMD Vulcano 800G NIC Coming As AMD Outlines its UALink and UEC Scale Plans. ServeTheHome. 2025.
14. CES 2026: AMD Details Helios AI Rack and Next-Gen Instinct MI400 GPUs. TechTimes. January 6, 2026.
15. AMD Delivering Open Rack Scale AI Infrastructure. AMD Blogs. 2025.
16. AMD MI400 Series: $7.2B AI GPU Challenging Nvidia. Tech Insider. 2026.
17. AMD Instinct MI450 Is Coming In 2026 To Challenge NVIDIA's Vera Rubin. HotHardware. 2025.
18. AMD MI400 vs NVIDIA B300: Performance, Pricing, and Migration Guide (2026). Spheron. 2026.
19. AMD Advancing AI: MI350X and MI400 UALoE72, MI500 UAL256. SemiAnalysis. 2025.
20. AMD Delivers Breakthrough MLPerf Inference 6.0 Results. AMD Blogs. 2026.
21. AMD to supply 6GW of compute capacity to OpenAI in chip deal worth tens of billions. TechCrunch. October 6, 2025.
22. AMD and OpenAI: The 6 Gigawatt Bet. More Than Moore. 2025.
23. Oracle and AMD Expand Partnership to Help Customers Achieve Next-Generation AI Scale. Oracle Newsroom. October 14, 2025.
24. AMD to Provide 50,000 AI Chips to Oracle in 2026 Following Mega Deal with OpenAI. TMTPost. 2025.
25. AMD Unveils Vision for an Open AI Ecosystem, Detailing New Silicon, Software and Systems at Advancing AI 2025. AMD Newsroom. June 12, 2025.
26. AMD and OpenAI Announce Strategic Partnership to Deploy 6 Gigawatts of AMD GPUs. AMD Investor Relations. October 6, 2025.


Related Articles

Cloud TPU

CuDNN

Jetson Thor

Nvidia

NVIDIA Blackwell

NVIDIA DGX Spark
