AWS Trainium 2
Last reviewed
May 17, 2026
Sources
26 citations
Review status
Source-backed
Revision
v2 · 5,821 words
AWS Trainium 2 (also written as Trainium2 and abbreviated Trn2) is the second generation of Amazon Web Services' custom machine learning training accelerator, developed by Amazon's in-house chip design team Annapurna Labs. Amazon first previewed Trainium 2 at AWS re:Invent 2023 and announced general availability at AWS re:Invent 2024 on December 3, 2024. According to AWS, the second generation delivers up to four times the compute performance, four times the memory bandwidth, and three times the memory capacity of the original AWS Trainium. Each chip integrates eight NeuronCore-v3 compute cores, 96 GiB of HBM3e memory, and dedicated collective communication hardware for scale-out training. The chip powers Amazon EC2 Trn2 instances and the Trn2 UltraServer, a rack-scale system that connects 64 chips into a single shared-memory domain. AWS deployed Trainium 2 at massive scale through Project Rainier, a cluster of nearly 500,000 Trainium 2 chips activated in October 2025 in partnership with Anthropic for training and serving the Claude family of models. By early 2026, Anthropic was running more than one million Trainium 2 chips across Project Rainier and adjacent inference regions, and AWS had publicly named Apple and OpenAI as additional Trainium 2 customers.
Amazon introduced the original AWS Trainium accelerator in late 2021 as a purpose-built chip for deep learning training on AWS. The first generation, which powered the EC2 Trn1 and Trn1n instances, used two large NeuronCore-v2 compute cores per chip and shipped with 32 GiB of HBM2e memory. While Trn1 established a cost-per-training-step advantage for certain workloads, the architecture was designed before large language models dominated the training landscape, and its memory capacity and interconnect bandwidth proved limiting for models with hundreds of billions of parameters.
Amazon announced Trainium 2 at AWS re:Invent in November 2023, positioning it as a ground-up redesign for the generative AI era. The revised architecture increased the NeuronCore count from two to eight per chip, tripled device memory to 96 GiB, moved to HBM3e for higher bandwidth, and introduced a new generation of NeuronLink interconnect. The chip also added 16 dedicated Collective Communication Cores (CC-Cores), up from six in the first generation, to handle the all-reduce and all-gather operations that dominate large-scale distributed training without competing with the tensor cores for execution resources.
AWS made Trn2 instances generally available in the US East (Ohio) region on December 3, 2024, at the same AWS re:Invent event where the company also previewed Trainium 3 and disclosed Project Rainier details. The simultaneous GA announcement and next-generation preview reflected how quickly Amazon had accelerated its custom silicon roadmap in response to demand from Anthropic and other large-scale training customers. By early 2026, AWS reported more than 1.4 million Trainium chips deployed across all three generations, with Trainium 2 representing the dominant share.
Each Trainium 2 chip contains eight NeuronCore-v3 compute cores. Like NeuronCore-v2, the third generation of NeuronCore is organized around four functional engines per core: a tensor engine, a vector engine, a scalar engine, and a GPSIMD engine.
NeuronCore-v3 also introduces native support for dynamic shapes and control flow through ISA extensions, which earlier generations lacked. This matters for workloads like reinforcement learning from human feedback (RLHF) and sequence-to-sequence tasks where batch dimensions vary at runtime. Trainium 2 additionally supports user-programmable rounding modes, giving researchers control over numerical behavior in reduced-precision formats.
The chip provides the following aggregate compute across all eight cores:
| Precision | Dense TFLOPS | Sparse TFLOPS |
|---|---|---|
| FP8 | 1,299 | 2,563 |
| BF16 / FP16 / TF32 | 667 | 2,563 |
| FP32 | 181 | -- |
Note: sparse figures reflect structured sparsity (2:4 pattern). Per-core FP8 dense throughput is approximately 162 TFLOPS and per-core BF16/FP16/TF32 dense throughput is approximately 83 TFLOPS.
Starting with Trainium 2, AWS Neuron supports Logical NeuronCore Configuration (LNC), which lets software combine the resources of two or four adjacent physical NeuronCores into a single logical NeuronCore. This feature is useful for workloads that benefit from a larger scratchpad or higher sustained memory bandwidth per logical compute unit, at the cost of reduced parallelism. LNC gives operators a tuning knob that did not exist on Trainium 1.
Trainium 2 includes 16 dedicated CC-Cores per chip, compared to six in Trainium 1. These cores handle collective operations (AllReduce, AllGather, ReduceScatter, AllToAll) independently of the tensor cores, so gradient synchronization across chips proceeds in parallel with the forward and backward passes. On NVIDIA GPU clusters, collective operations share streaming multiprocessors with compute, which can create resource contention at large batch sizes. The dedicated CC-Core design is one of the architectural choices AWS made to optimize the chip for distributed training throughput rather than single-chip peak compute.
| Specification | Value |
|---|---|
| Architecture | NeuronCore-v3 |
| NeuronCores per chip | 8 |
| FP8 dense compute | 1,299 TFLOPS |
| BF16 dense compute | 667 TFLOPS |
| FP32 compute | 181 TFLOPS |
| HBM type | HBM3e |
| HBM capacity | 96 GiB |
| HBM bandwidth | 2.9 TB/s |
| DMA bandwidth | 3.5 TB/s |
| Scratchpad (SBUF) | 224 MiB |
| CC-Cores | 16 |
| NeuronLink-v3 bandwidth | 1.28 TB/s per chip |
| Process node | TSMC N5 (5 nm class) |
| Thermal design power | ~500 W |
| Specification | trn2.3xlarge | trn2.48xlarge |
|---|---|---|
| Trainium 2 chips | 1 | 16 |
| NeuronCores | 8 | 128 |
| Accelerator memory (HBM) | 96 GiB | 1.5 TiB |
| HBM bandwidth | 2.9 TB/s | 46 TB/s |
| FP8 dense compute | 1.3 PFLOPs | 20.8 PFLOPs |
| System vCPUs | 12 | 192 |
| System memory | 128 GB | 2 TB |
| Instance storage | 1x 470 GB NVMe | 4x 1.92 TB NVMe |
| Network bandwidth (EFA v3) | 0.2 Tbps | 3.2 Tbps |
| EBS bandwidth | 5 Gbps | 80 Gbps |
| Specification | Trn2 UltraServer |
|---|---|
| Trainium 2 chips | 64 |
| NeuronCores | 512 |
| Accelerator memory | 6 TiB |
| HBM bandwidth | 185 TB/s |
| FP8 dense compute | 83.2 PFLOPs |
| FP8 sparse compute | 332 PFLOPs |
| Constituent instances | 4 x trn2u.48xlarge |
Each Trainium 2 chip carries 96 GiB of HBM3e across four stacks, with an aggregate bandwidth of 2.9 TB/s. Relative to the first-generation Trn1 chip (32 GiB HBM2e at 820 GB/s), this represents three times the capacity and roughly 3.5 times the bandwidth.
Physically, the chip is a multi-chiplet design in which two compute chiplets share access to all four HBM stacks. The layout creates non-uniform memory access (NUMA) characteristics: a compute chiplet accesses adjacent HBM stacks at full bandwidth but incurs a penalty when accessing stacks attached to the other chiplet. This is similar to the NUMA behavior observed in AMD's Instinct MI300X. Applications that require maximum HBM throughput benefit from NUMA-aware memory placement, and the Neuron SDK exposes controls for this at the operator level.
Trainium 2 also provides 224 MiB of on-chip scratchpad memory (SBUF), 4.7 times larger than Trainium 1's scratchpad. The SBUF is partitioned among NeuronCores and acts as a programmer-controlled tile buffer for the tensor engine, avoiding redundant HBM loads during matrix operations.
DMA bandwidth reaches 3.5 TB/s with inline memory compression and decompression. The inline compression capability reduces effective memory footprint for certain weight formats, which is useful during mixed-precision training where intermediate activations can be stored in a compressed representation before being expanded for the backward pass.
The 16-chip Trn2 instance aggregates to 1.5 TiB of HBM at 46 TB/s combined bandwidth. Memory pooling across up to 64 chips is supported in the UltraServer configuration, enabling trillion-parameter models to distribute their weight matrices across a shared 6 TiB address space.
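The aggregate figures quoted for the 16-chip instance and the 64-chip UltraServer can be reproduced by multiplying the per-chip values. The following is a minimal sketch of that arithmetic; the function and constant names are illustrative and not part of any AWS tool.

```python
# Per-chip values as stated in this article.
HBM_GIB_PER_CHIP = 96
HBM_TBPS_PER_CHIP = 2.9
FP8_DENSE_TFLOPS_PER_CHIP = 1299


def aggregate(chips: int) -> dict:
    """Scale per-chip figures linearly to a system with `chips` Trainium 2 chips."""
    return {
        "hbm_tib": chips * HBM_GIB_PER_CHIP / 1024,        # GiB -> TiB
        "hbm_tbps": chips * HBM_TBPS_PER_CHIP,             # TB/s
        "fp8_dense_pflops": chips * FP8_DENSE_TFLOPS_PER_CHIP / 1000,
    }


if __name__ == "__main__":
    for label, chips in (("trn2.48xlarge", 16), ("Trn2 UltraServer", 64)):
        print(label, aggregate(chips))
```

The memory results land exactly on 1.5 TiB and 6 TiB. The bandwidth and compute results come out slightly different from the published 46 TB/s, 185 TB/s, 20.8 PFLOPs, and 83.2 PFLOPs, which appear to be rounded.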
Amazon Web Services offers two Trn2 instance sizes in the EC2 family:
The trn2.3xlarge is the entry-level instance with a single Trainium 2 chip, 12 vCPUs, 128 GB of system memory, and 96 GiB of accelerator memory. It targets inference workloads, fine-tuning of smaller models, and other workloads that do not require multi-chip connectivity. Network bandwidth is 200 Gbps.
The trn2.48xlarge is the full-instance configuration with all 16 Trainium 2 chips enabled, 192 vCPUs, 2 TiB of system memory, 1.5 TiB of HBM, and 3.2 Tbps of Elastic Fabric Adapter (EFA) v3 network bandwidth. The 16 chips are connected in a 4x4 two-dimensional torus via NeuronLink-v3 at 1 TB/s aggregate chip-to-chip bandwidth. This instance is the building block for both standalone large-model training and for composition into UltraServer systems.
A third variant, the trn2u.48xlarge, has the same compute and memory specs as the trn2.48xlarge but includes UltraServer-capable hardware to support NeuronLink connections across server boundaries. Four trn2u.48xlarge instances combine to form one Trn2 UltraServer.
Trainium 2 instances can be reserved via Amazon EC2 Capacity Blocks for ML, which allows customers to book blocks of up to 64 instances for periods up to six months, with instant start times. Capacity Blocks are available up to eight weeks in advance, giving training teams predictable access without long-term commitment.
AWS Deep Learning AMIs (DLAMIs) ship with PyTorch, JAX, and Hugging Face Transformers pre-installed and optimized for Trn2, so training frameworks are available out of the box.
Trn2 instances launched in US East (Ohio) and progressively expanded throughout 2025 and into 2026 as AWS provisioned additional Trainium 2 facilities. By the start of 2026, Trn2 capacity was available in additional US regions and through cross-region inference endpoints, and AWS publicly committed to bringing Trn2 inference capacity online in Asia-Pacific and European regions during 2026 in response to international demand for Claude and other Bedrock models. The Indiana campus that anchors Project Rainier represents the single largest concentration of Trn2 hardware.
The Trn2 UltraServer is a rack-scale system that connects four trn2u.48xlarge instances using NeuronLink-v3 to form a single scale-up domain with 64 Trainium 2 chips. Unlike traditional multi-server GPU clusters that rely on network-layer collectives for weight synchronization, the UltraServer exposes all 6 TiB of HBM as a shared accelerator memory pool accessible by any chip at memory bus speeds.
This design removes the latency and bandwidth penalties of inter-server networking during the portions of training where all-reduce operations dominate. For large language models where gradient synchronization represents a significant fraction of step time, moving that communication onto the NeuronLink fabric rather than over EFA can reduce effective training time.
The UltraServer extends the chip topology from a 4x4 two-dimensional torus (within a single instance) to a 4x4x4 three-dimensional torus (across the four instances). The Z-axis connections between instances use OSFP-XD active electrical cables and provide 64 GB/s of point-to-point bandwidth per chip pair, while X and Y axis connections within a server run at 128 GB/s per connection. AWS deliberately avoided optical transceivers for these links, using passive and active copper cables instead. Copper cables provide roughly 100 times better mean time to failure and far lower cable flapping rates than optical transceivers, which is a meaningful reliability advantage in a cluster with thousands of interconnects.
Multiple UltraServers can be connected into an UltraCluster. AWS supports UltraClusters of up to 2,048 Trainium 2 chips at present, interconnected via a petabit-scale non-blocking Ethernet fabric. UltraClusters integrate with Amazon FSx for Lustre for high-throughput checkpoint storage, enabling fast save and restore of model states during long training runs.
The UltraServer entered general availability shortly after the December 2024 Trn2 launch, and through 2025 AWS rolled UltraServers out as the default building block for both Project Rainier and customer-facing Trn2 reservations of any meaningful size.
NeuronLink-v3 is the scale-up interconnect that connects Trainium 2 chips within and across servers. Unlike NVIDIA NVLink, which uses a switch topology enabling all-to-all communication between GPUs in a node, NeuronLink uses direct point-to-point connections arranged in torus configurations. Each chip has six physical NeuronLink ports: four for intra-server connections (forming the 4x4 2D torus) and two additional ports activated in UltraServer mode (extending to the 4x4x4 3D torus).
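The port counts follow directly from the torus shape: with wraparound links, every node on a 2D torus has four direct neighbors and every node on a 3D torus has six. The sketch below enumerates those neighbors. The coordinate scheme is hypothetical and chosen only for illustration; it is not how AWS numbers chips.

```python
from itertools import product


def torus_neighbors(coord, dims):
    """Return the coordinates one hop from `coord` on a torus with wraparound links."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % size  # wrap around at the edges
            neighbors.add(tuple(nxt))
    neighbors.discard(tuple(coord))  # only relevant for axes of size 1
    return sorted(neighbors)


if __name__ == "__main__":
    print(torus_neighbors((0, 0), (4, 4)))        # standalone instance: 4 neighbors
    print(torus_neighbors((0, 0, 0), (4, 4, 4)))  # UltraServer: 6 neighbors
    assert all(len(torus_neighbors(c, (4, 4, 4))) == 6
               for c in product(range(4), repeat=3))
```

Because no chip is more than two hops from any other along a single axis of length four, a message crosses at most six links in the 4x4x4 arrangement.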
The torus topology determines which collective communication algorithms are most efficient. Ring AllReduce, which travels around the torus, achieves near-peak bandwidth utilization on a 2D or 3D torus. AWS's NeuronX Collective Communication Library (NXCCL) implements AllReduce, AllGather, ReduceScatter, and AllToAll collectives optimized for the specific torus dimensions. The library is conceptually similar to NVIDIA's NCCL but tuned for Trainium 2's topology rather than NVLink's all-to-all connectivity.
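To make the ring AllReduce pattern concrete, here is a small pure-Python simulation of the textbook algorithm: a reduce-scatter phase followed by an all-gather phase, with each rank only ever talking to its ring neighbor. It is a teaching sketch under simplifying assumptions (one ring, in-process lists instead of hardware links) and does not represent how AWS's collective library is implemented.

```python
def ring_allreduce(buffers):
    """Sum equal-length lists across ranks arranged in a ring.

    buffers[r] is rank r's data; its length must be divisible by the rank count.
    Returns new per-rank buffers, each holding the elementwise sum.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers) or length % n != 0:
        raise ValueError("buffers must share a length divisible by the rank count")
    chunk = length // n
    data = [list(b) for b in buffers]

    def span(idx):
        start = (idx % n) * chunk
        return range(start, start + chunk)

    # Phase 1, reduce-scatter: at step s, rank r sends chunk (r - s) to rank r + 1,
    # which adds it to its own copy. Sends are snapshotted so they happen "at once".
    for s in range(n - 1):
        sends = [[data[r][i] for i in span(r - s)] for r in range(n)]
        for r, payload in enumerate(sends):
            dst = (r + 1) % n
            for i, value in zip(span(r - s), payload):
                data[dst][i] += value

    # Phase 2, all-gather: rank r now owns the finished chunk (r + 1) and the
    # finished chunks are passed around the ring, overwriting stale copies.
    for s in range(n - 1):
        sends = [[data[r][i] for i in span(r + 1 - s)] for r in range(n)]
        for r, payload in enumerate(sends):
            dst = (r + 1) % n
            for i, value in zip(span(r + 1 - s), payload):
                data[dst][i] = value
    return data


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
    print(ring_allreduce(ranks)[0])
```

Each rank sends 2(n - 1) chunks of size 1/n of the buffer, so per-rank traffic stays close to twice the buffer size regardless of ring length, which is why the pattern uses neighbor links efficiently.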
The aggregate NeuronLink-v3 bandwidth per chip is 1.28 TB/s, which is the sum across all active ports. This figure captures the total bidirectional chip-to-chip capacity rather than a single-port measure.
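One way to arrive at this figure from the per-link numbers given earlier is sketched below. It assumes the 128 GB/s and 64 GB/s link rates are per direction and that all six ports are active; that interpretation is an assumption made here for illustration, not something AWS states in this form.

```python
# Link rates and port counts as given in this article (GB/s per connection).
INTRA_SERVER_LINK_GBPS = 128   # X and Y axis links inside one server
CROSS_SERVER_LINK_GBPS = 64    # Z axis links between UltraServer instances
INTRA_SERVER_PORTS = 4
CROSS_SERVER_PORTS = 2


def neuronlink_aggregate_gbps(ultraserver: bool) -> int:
    """Sum all active ports, counting both directions (assumed per-direction rates)."""
    one_way = INTRA_SERVER_PORTS * INTRA_SERVER_LINK_GBPS
    if ultraserver:
        one_way += CROSS_SERVER_PORTS * CROSS_SERVER_LINK_GBPS
    return 2 * one_way


if __name__ == "__main__":
    print(neuronlink_aggregate_gbps(ultraserver=False))  # four ports active
    print(neuronlink_aggregate_gbps(ultraserver=True))   # all six ports active
```

Under these assumptions the six-port case gives 1,280 GB/s, matching the 1.28 TB/s aggregate, and the four-port case gives 1,024 GB/s, close to the roughly 1 TB/s quoted for a standalone instance.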
For scale-out communication between UltraServer systems, Trn2 instances use Elastic Fabric Adapter v3, which provides 3.2 Tbps of network bandwidth per instance. EFA v3 supports AWS-designed RDMA and is used for data-parallel gradient synchronization across racks, while NeuronLink handles tensor-parallel and pipeline-parallel communication within a scale-up domain.
All Trainium 2 instances run the AWS Neuron SDK, a comprehensive software stack that includes a compiler, runtime, training libraries, inference libraries, and developer tools. The Neuron SDK is shipped on a roughly quarterly cadence, with major versions tracking new framework releases and incremental versions delivering performance and stability improvements.
Neuron supports PyTorch and JAX as primary training frameworks. PyTorch support uses TorchDynamo bytecode capture to extract computation graphs for compilation, a significant improvement over the earlier PyTorch/XLA path used for Trainium 1, which relied on lazy tensor tracing and was prone to graph fragmentation. JAX support, which reached beta status in 2024, benefits from JAX's functional execution model and static compilation assumptions, which map well to Trainium 2's torus topology and the Neuron compiler's optimization pipeline.
Neuron SDK 2.26.0, released in September 2025, added support for PyTorch 2.8 and JAX 0.6.2 and shipped expanded inference capabilities and new parallelism modes for Trn2. The following Neuron SDK 2.27.0 release, announced at AWS re:Invent in December 2025 alongside Trainium 3, introduced support for Trn3 UltraServers while continuing to update the Trn2 toolchain, and brought several Trn2-relevant features out of private beta or into broader availability.
For inference, Neuron supports vLLM, Hugging Face Transformers, and TensorRT-style optimizations through its inference libraries. PyTorch Lightning and Amazon SageMaker AI are supported as training orchestration layers.
The Neuron SDK integrates with OpenXLA through StableHLO and GSPMD interfaces, which means models written for TPU with JAX/XLA can be ported to Trainium 2 with relatively minor changes. GSPMD (General and Scalable Parallelization for ML Computation Graphs) enables automatic partitioning of models across chips.
Neuron Kernel Interface (NKI), introduced in 2024, gives performance engineers direct access to the NeuronCore-v3 instruction set architecture, memory allocation, and execution scheduling. NKI uses a tile-based programming model similar to Triton (used for GPU kernel development) and allows developers to write custom operators in Python that compile to native NeuronCore instructions. This is useful for fused attention kernels, custom normalization layers, and other operators that would otherwise be decomposed into multiple passes through HBM.
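The idea behind tile-based kernels can be shown without any accelerator-specific API. The sketch below is plain Python, not NKI code: it multiplies two matrices one tile at a time, copying each tile into small local buffers (standing in for on-chip scratchpad) and accumulating the output tile locally before writing it back once. All names are hypothetical.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply matrices given as lists of rows, processing one tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    out = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            rows = range(i0, min(i0 + tile, m))
            cols = range(j0, min(j0 + tile, n))
            # Local accumulator for one output tile (stands in for on-chip memory).
            acc = {(i, j): 0 for i in rows for j in cols}
            for k0 in range(0, k, tile):
                inner = range(k0, min(k0 + tile, k))
                # "Load" the input tiles once, then reuse them for every product.
                a_tile = {(i, p): a[i][p] for i in rows for p in inner}
                b_tile = {(p, j): b[p][j] for p in inner for j in cols}
                for i in rows:
                    for j in cols:
                        for p in inner:
                            acc[i, j] += a_tile[i, p] * b_tile[p, j]
            # Write the finished tile back to the output exactly once.
            for (i, j), value in acc.items():
                out[i][j] = value
    return out


if __name__ == "__main__":
    print(tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
```

A fused kernel follows the same shape: intermediate results stay in the local buffers between steps instead of making a round trip through main memory.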
The NKI Library of optimized reference kernels expanded through 2025 from seven to sixteen documented kernel APIs by the re:Invent 2025 release. Coverage now includes a flash-attention-style fused attention kernel, top-k and expert-routing kernels for Mixture of Experts (MoE) models, several variants of layer normalization, and speculative decoding primitives used by Bedrock latency-optimized inference. AWS ships these kernels as open-source Python with documentation and benchmark scripts.
At re:Invent 2025, AWS also previewed an open-source NKI compiler built on MLIR (private beta), a native PyTorch operator path called TorchNeuron that exposes NKI kernels as standard PyTorch operators, and Neuron DRA (Dynamic Resource Allocation) for Kubernetes-native scheduling of Trainium chips. Together these narrow the tooling gap with CUDA for engineers willing to write low-level code.
The Neuron compiler is built on MLIR and is open source. It accepts models in HLO, TorchScript, or traced JAX form and produces optimized binaries for NeuronCore-v3 execution. Key optimizations include operator fusion to minimize HBM traffic, tiling selection to maximize scratchpad utilization, and NUMA-aware weight placement.
Neuron Explorer provides profiling and debugging tools within the SDK, giving developers visibility into NeuronCore utilization, memory bandwidth consumption, and collective communication efficiency. The 2025 SDK releases added trace timelines aligned to NeuronLink transfers, which is the main tool customers use to identify whether a step time is bound by tensor compute, HBM traffic, scratchpad spills, or collective communication.
AWS's published performance figures position Trn2 against prior-generation Trainium and against GPU-based EC2 instances:
| Benchmark | Trn2 vs Trn1 | Trn2 vs P5e/P5en (H200) |
|---|---|---|
| Training throughput | 4x faster | 30-40% better price-performance |
| Memory bandwidth | 4x higher | -- |
| Memory capacity | 3x larger | -- |
| Energy efficiency | ~3x better | -- |
For inference, AWS reported that Trainium 2 delivers three times higher token-generation throughput for Meta's Llama 3.1 405B model on Amazon Bedrock compared to other available cloud provider offerings at the time of launch in December 2024.
Claude 3.5 Haiku running on Trainium 2 with latency optimization delivers up to 60 percent faster inference compared to the non-latency-optimized version of the same model, according to Anthropic's November 2024 announcement.
According to SemiAnalysis's analysis of the Trainium 2 architecture, the chip's arithmetic intensity (peak compute divided by memory bandwidth) is approximately 225.9 BF16 FLOP per byte, which is lower than competing accelerators such as the NVIDIA H100 (approximately 300 FLOP/byte) or Google TPU v6e (approximately 500 FLOP/byte). Lower arithmetic intensity is not inherently a disadvantage: it suits workloads such as Mixture of Experts architectures and memory-bandwidth-bound inference tasks where the balance between compute and memory access favors higher bandwidth over raw TFLOPS.
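The ratio can be approximated from headline specifications by dividing peak dense BF16 throughput by HBM bandwidth. The sketch below does this for Trainium 2 using the figures in this article and for the H100 SXM using NVIDIA's datasheet value of 989 dense BF16 TFLOPS, an input that does not appear in the paragraph above. The helper names are illustrative.

```python
def flop_per_byte(dense_tflops: float, hbm_tb_per_s: float) -> float:
    """Peak FLOP available per byte of HBM traffic (TFLOPS / TB/s cancels to FLOP/byte)."""
    return dense_tflops / hbm_tb_per_s


def limiting_resource(kernel_flop_per_byte: float, chip_flop_per_byte: float) -> str:
    """Roofline-style check: kernels below the chip's ratio are limited by bandwidth."""
    return "memory bandwidth" if kernel_flop_per_byte < chip_flop_per_byte else "compute"


if __name__ == "__main__":
    trainium2 = flop_per_byte(667, 2.9)   # figures from this article
    h100 = flop_per_byte(989, 3.35)       # 989 dense BF16 TFLOPS from NVIDIA's datasheet
    print(f"Trainium 2: {trainium2:.0f} FLOP/byte, H100 SXM: {h100:.0f} FLOP/byte")
    print(limiting_resource(50, trainium2))
```

The simple division gives about 230 FLOP/byte for Trainium 2, slightly above the 225.9 quoted, which suggests the published figure used marginally different inputs. A kernel that performs fewer FLOP per byte than the chip's ratio cannot saturate the compute engines, so a lower ratio wastes less peak compute on bandwidth-bound work.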
When Trainium 3 launched at re:Invent 2025, AWS published internal benchmarks comparing the two generations on OpenAI's GPT-OSS open-weight model. Trainium 3 delivered three times higher throughput per chip and four times faster response time than Trn2 UltraServers on the same workload, implicitly establishing a Trn2 UltraServer baseline as the headline reference point for serving large open-weight LLMs in 2026.
The following table compares Trainium 2 specifications with NVIDIA's H100 SXM5, H200 SXM, and Blackwell B200 accelerators:
| Specification | Trainium 2 (chip) | NVIDIA H100 SXM5 | NVIDIA H200 SXM | NVIDIA B200 |
|---|---|---|---|---|
| FP8 dense compute | 1,299 TFLOPS | 1,979 TFLOPS | 1,979 TFLOPS | 4,500 TFLOPS |
| BF16 dense compute | 667 TFLOPS | 989 TFLOPS | 989 TFLOPS | 2,250 TFLOPS |
| FP32 compute | 181 TFLOPS | 67 TFLOPS | 67 TFLOPS | 160 TFLOPS |
| HBM capacity | 96 GiB | 80 GiB | 141 GiB | 192 GiB |
| HBM bandwidth | 2.9 TB/s | 3.35 TB/s | 4.8 TB/s | 8.0 TB/s |
| Scale-up interconnect | NeuronLink-v3 (1.28 TB/s) | NVLink 4 (0.9 TB/s) | NVLink 4 (0.9 TB/s) | NVLink 5 (1.8 TB/s) |
| TDP | ~500 W | 700 W | 700 W | 1,000 W |
Note: NVIDIA's FP8 figures reflect the Tensor Core with sparsity disabled for the dense row. Trainium 2 FP8 figures from AWS Neuron documentation. H100 and H200 values are for SXM variants. B200 values are from NVIDIA's published product brief.
Trainium 2's raw TFLOPS at FP8 and BF16 are lower than the H100, H200, and B200. AWS argues that the relevant metric for training cost is throughput per dollar rather than peak TFLOPS, and that Trainium 2's combination of lower on-demand pricing, HBM capacity, and the scale-up topology of the UltraServer produces a better cost-per-trained-token outcome for large models.
From a software ecosystem standpoint, NVIDIA's CUDA platform has a much larger library of hand-optimized kernels, profiling tools, and community examples than the Neuron SDK. AWS has invested in closing this gap through NKI, open-source Neuron compiler development, and partnerships with framework teams at PyTorch and Hugging Face. NVIDIA's larger scale-up domain in GB200 NVL72 systems (72 GPUs sharing NVLink) gives Blackwell-generation clusters more flexibility in tensor-parallel degree, which can reduce inference costs at large model sizes. AWS's response combines the Trn2 UltraServer's 64-chip domain in the current generation with explicit support for NVIDIA NVLink Fusion in Trainium 4, which will allow heterogeneous clusters of Trn4 and Blackwell-class GPUs to share an NVLink fabric.
The most consequential deployment of Trainium 2 is Project Rainier, an AI compute cluster built by AWS for Anthropic to train and serve the Claude model family.
AWS detailed Project Rainier at AWS re:Invent in December 2024, describing a cluster with hundreds of thousands of Trainium 2 chips spread across multiple U.S. data centers. AWS brought Project Rainier online in October 2025, completing the deployment in under twelve months from announcement, a pace AWS described as historically fast for infrastructure at this scale. At activation the cluster contained nearly 500,000 Trainium 2 chips spread across a 1,200-acre data center campus in St. Joseph County, Indiana, and additional U.S. sites. AWS publicly stated that Project Rainier represented an approximately 70 percent expansion of its AI compute infrastructure at activation, and that the cluster delivered more than five times the compute Anthropic had used to train its prior generation of Claude models.
Project Rainier is named after Mount Rainier, the 4,392-meter stratovolcano visible from Amazon's Seattle headquarters on clear days. The Indiana campus represents an investment of approximately $11 billion and was constructed in a partnership with local utilities to secure dedicated power capacity. Initial commissioned power was over 400 MW, with expansion plans publicly disclosed to reach more than 1,000 MW as additional buildings are energized.
After activation, Anthropic scaled the Project Rainier footprint aggressively through late 2025 and early 2026. By the first quarter of 2026 Anthropic was running Claude on more than one million Trainium 2 chips across Rainier and adjacent inference regions. This scale-up coincided with sharp growth in Anthropic's run-rate revenue, which the company reported as approximately $9 billion at the end of 2025 and over $30 billion by early 2026, almost all of it generated by Claude inference traffic on Trainium hardware.
Project Rainier disclosures during 2025 confirmed that Anthropic uses the same Trainium 2 chips for both inference and training but time-shifts the two workloads against diurnal demand patterns. During North American daytime hours, when interactive Claude usage peaks, the majority of the cluster serves inference traffic. As inference demand falls overnight, Anthropic re-allocates capacity to training runs and large-scale evaluation. This shared use of training-class hardware for production inference is enabled by Trainium 2's 96 GiB HBM per chip, the UltraServer shared-memory domain, and the latency-oriented optimizations in Neuron SDK 2.26 and 2.27.
Anthropic's engineering teams work closely with Annapurna Labs on hardware co-design, providing feedback from Claude training and inference runs to influence future Trainium generations. This collaboration directly informed the increased memory bandwidth and 3 nm process choice in Trainium 3 and the NVLink Fusion compatibility planned for Trainium 4.
Project Rainier was initially kept out of public reporting. Amazon CEO Andy Jassy disclosed that Trainium was "fully subscribed" and represented a "multibillion-dollar business that grew 150 percent quarter-over-quarter" at the time of the re:Invent 2024 announcements. Amazon's original investment in Anthropic, announced in 2023, was up to $4 billion, and the total was raised to $8 billion in November 2024. In November 2025, the two companies announced an expanded multi-year compute agreement under which Anthropic committed to spend more than $100 billion on AWS over approximately a decade, with up to 5 GW of total capacity to be brought online for training and serving Claude, including new Trainium 2 capacity arriving in the first half of 2026 and roughly 1 GW of combined Trainium 2 and Trainium 3 capacity by the end of 2026.
Amazon Bedrock is the fully managed service through which AWS customers access foundation models from Anthropic, Meta, Mistral, and other providers via API. Trainium 2 powers two distinct capabilities in Bedrock: latency-optimized inference for specific models and background training infrastructure.
Bedrock's latency-optimized inference tier uses Trainium 2 to accelerate models that benefit from the chip's high memory bandwidth and capacity. At launch in December 2024, latency-optimized Trainium 2 inference was available for:
Latency-optimized Trainium 2 inference was initially restricted to the US East (Ohio) region via cross-region inference. Through 2025 AWS extended the option to additional US regions and added newer Anthropic and Amazon Nova models to the supported list as they became available. The Asia-Pacific and European expansion announced in the November 2025 Anthropic agreement is intended to make Trn2-backed inference natively available in those geographies during 2026, reducing the need to rely on cross-region inference for non-US Bedrock customers.
Beyond the publicly visible inference tier, Trainium 2 runs workloads for Bedrock model providers, including the Anthropic Claude family. By 2025, more than 50 percent of token throughput on Amazon Bedrock ran on Trainium hardware, the majority of which is Trainium 2. The share continued to grow into 2026 as Project Rainier capacity came online and as additional models were enabled on the latency-optimized tier.
Claude 3.5 Haiku accessed through Bedrock with latency optimization on Trainium 2 was priced at $1.00 per million input tokens and $5.00 per million output tokens in the US East (Ohio) region at launch in late 2024. Standard (non-latency-optimized) Claude 3.5 Haiku was priced at $0.80 per million input tokens and $4.00 per million output tokens across all regions. Pricing for later Claude versions and other models on the latency-optimized tier is published per-model on the Bedrock pricing page.
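As a worked example of what the two launch price points mean for a workload, the sketch below computes the bill for a given token volume on each tier. The prices are the late-2024 figures quoted above; the tier labels and function name are made up for illustration and are not Bedrock API identifiers.

```python
# (input, output) price in USD per million tokens, Claude 3.5 Haiku at launch.
PRICES_PER_MILLION = {
    "standard": (0.80, 4.00),
    "latency_optimized": (1.00, 5.00),
}


def workload_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload on the given (hypothetically named) tier."""
    input_price, output_price = PRICES_PER_MILLION[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    standard = workload_cost("standard", 2_000_000, 500_000)
    optimized = workload_cost("latency_optimized", 2_000_000, 500_000)
    print(f"standard: ${standard:.2f}, latency-optimized: ${optimized:.2f}")
    print(f"premium: {optimized / standard - 1:.0%}")
```

Because both the input and output prices are 25 percent higher on the latency-optimized tier, the premium is 25 percent for any mix of input and output tokens.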
Anthropic is by far the largest Trainium 2 customer. The company moved from roughly 500,000 chips at Project Rainier activation in October 2025 to more than one million chips by early 2026, with additional capacity committed under the November 2025 multi-year agreement. Anthropic uses Trainium 2 for the full Claude lifecycle: pretraining, fine-tuning, reinforcement learning, evaluation, and production inference through both first-party APIs and Amazon Bedrock.
In late 2025 OpenAI and Amazon disclosed a $50 billion partnership under which OpenAI committed to consume approximately 2 GW of AWS Trainium capacity for inference and training of the GPT family, with AWS becoming the exclusive cloud distributor for an OpenAI Frontier enterprise platform. Trainium 2 was the initial deployment target, with later expansion to Trainium 3 as that chip ramped. The OpenAI agreement was the first publicly disclosed use of AWS custom silicon by another frontier model provider and was treated as strategic validation of the Trainium architecture beyond the Anthropic relationship.
Apple uses Trainium 2 as part of the server-side infrastructure for Apple Intelligence, specifically within the Private Cloud Compute service that handles requests too large for on-device processing. Apple has publicly cited efficiency gains of more than 40 percent versus comparable GPU instances for its workload mix. Apple was named alongside Anthropic and OpenAI as a Trainium customer during AWS's spring 2026 press tour of the Annapurna Labs facility.
Other disclosed Trainium 2 customers include:
Amazon CEO Andy Jassy acknowledged in late 2024 that Trainium 2 was "fully subscribed" at launch, meaning demand exceeded available capacity. The supply-constrained dynamic persisted through 2025 and 2026, with each capacity expansion typically committed to existing customers under multi-year agreements before reaching the broader on-demand pool.
EC2 Trn2 pricing as of launch in December 2024:
| Instance | Listed price (reservation mode) |
|---|---|
| trn2.48xlarge | $0.125 per hour (Capacity Blocks for ML, normalized) |
AWS has not publicly listed a single uniform on-demand price for Trn2 across all reservation modes; most large customers consume Trn2 capacity via Capacity Blocks for ML, multi-year reserved agreements, or via Bedrock's per-token pricing rather than spot or hourly on-demand instances.
Spot pricing for trn2.48xlarge has been observed at approximately $8.72 per hour. Because Spot capacity is interruptible, this figure reflects demand for scarce Trainium 2 capacity rather than a price for guaranteed access. Reserved Instance and Savings Plans pricing for Trn2 was not generally available at GA launch, and most reserved capacity through 2025 was absorbed by the Anthropic, OpenAI, and Apple agreements before reaching the public reservation system.
AWS claims Trn2 instances deliver 30 to 40 percent better price-performance than P5e and P5en instances (which use NVIDIA H200 GPUs), citing internal benchmark results for large language model training workloads. The Trn2 UltraServer is positioned for the largest training and inference workloads, with the standalone Trn2 instance targeting fine-tuning and smaller inference deployments.
Trainium 2 chips have a thermal design power of approximately 500 W per chip. A fully populated trn2.48xlarge with 16 chips draws roughly 8 kW from the accelerator subsystem alone before host CPU, memory, networking, and cooling overhead. The Trn2 UltraServer's four-instance rack-scale unit is in the 35 to 40 kW class, comparable to NVIDIA HGX-based racks of similar generation but with higher chip density per rack thanks to the lower per-chip TDP.
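The accelerator share of these power figures is simple multiplication, sketched below. The 500 W value is the approximate per-chip TDP stated above; the overhead estimate is just the gap between the accelerator total and the quoted rack class, not a measured number.

```python
CHIP_TDP_W = 500  # approximate per-chip thermal design power from this article


def accelerator_power_kw(chips: int) -> float:
    """Power drawn by the accelerators alone, ignoring host, network, and cooling."""
    return chips * CHIP_TDP_W / 1000


if __name__ == "__main__":
    instance_kw = accelerator_power_kw(16)
    ultraserver_kw = accelerator_power_kw(64)
    print(f"trn2.48xlarge accelerators: {instance_kw} kW")
    print(f"UltraServer accelerators: {ultraserver_kw} kW")
    # Gap to the quoted 35 to 40 kW rack class, attributable to everything else.
    print(f"implied overhead: {35 - ultraserver_kw} to {40 - ultraserver_kw} kW")
```

The 64 chips account for about 32 kW, which leaves roughly 3 to 8 kW of the quoted 35 to 40 kW class for host CPUs, memory, networking, and power conversion.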
Project Rainier's Indiana campus was designed with dedicated electrical infrastructure and an initial commissioned power of more than 400 MW for the Trn2 phase, scaling to more than 1 GW as additional buildings energize during 2026 and 2027. AWS reported that the campus uses liquid-assisted cooling for Trn2 UltraServers and that the choice of copper rather than optical interconnect for NeuronLink reduces both peak power and steady-state failure rates relative to comparable optical fabrics. Trainium 3, at a 3 nm process, was disclosed at re:Invent 2025 as delivering four times better performance per watt than Trn2 UltraServers, framing Trn2 as the first AWS custom-silicon generation broadly competitive on energy efficiency with the contemporary NVIDIA GPU generation.
AWS announced Trainium 3 at AWS re:Invent 2024, one year after announcing Trainium 2, establishing an annual chip cadence. Trainium 3 reached general availability at AWS re:Invent 2025 in December 2025.
Trainium 3 is built on a 3 nm process, compared to Trainium 2's 5 nm node. Per-chip specifications as disclosed by AWS include:
The Trn3 UltraServer scales to 144 chips, delivering 362 FP8 PFLOPs, 20.7 TiB of HBM, and 706 TB/s of aggregate memory bandwidth. AWS claims Trn3 delivers 4.4 times higher performance and 3.9 times higher memory bandwidth than Trn2 UltraServers, with four times better performance per watt.
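The generational multipliers can be cross-checked against the UltraServer totals given in this article. The short sketch below divides the Trn3 UltraServer figures by the Trn2 UltraServer figures; it uses only numbers quoted here and the variable names are illustrative.

```python
# UltraServer totals as quoted in this article.
TRN2_ULTRASERVER = {"chips": 64, "fp8_pflops": 83.2, "hbm_tbps": 185}
TRN3_ULTRASERVER = {"chips": 144, "fp8_pflops": 362, "hbm_tbps": 706}


def generation_ratio(metric: str) -> float:
    """Ratio of the Trn3 UltraServer value to the Trn2 UltraServer value."""
    return TRN3_ULTRASERVER[metric] / TRN2_ULTRASERVER[metric]


if __name__ == "__main__":
    for metric in ("chips", "fp8_pflops", "hbm_tbps"):
        print(f"{metric}: {generation_ratio(metric):.2f}x")
```

The compute and bandwidth ratios come out near 4.35 and 3.82, in line with the rounded 4.4 and 3.9 claims. Since the chip count rises 2.25 times, a little under half of the system-level gain comes from per-chip improvement and the rest from the larger scale-up domain.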
At the same re:Invent 2025 event, AWS previewed Trainium 4, targeting late 2026 or early 2027 availability. Trainium 4 is expected to deliver three times the FP8 performance and four times the memory bandwidth of Trainium 3, and will support NVIDIA NVLink Fusion, enabling heterogeneous clusters where Trainium 4 chips and NVIDIA Blackwell-generation GPUs share an NVLink fabric. The Trainium 2 fleet is expected to remain in production for several years alongside Trn3 and Trn4 capacity, both because Anthropic, OpenAI, and Apple have multi-year commitments that pre-date Trn3 GA and because the Trn2 cost-per-token economics on inference workloads remain favorable for many model classes even after newer generations ship.