AWS Trainium 3


Updated Jun 24, 2026


AWS Trainium 3 (also written as Trainium3 and abbreviated Trn3) is the third-generation custom AI training and inference accelerator from Amazon Web Services, designed by Amazon's in-house chip team Annapurna Labs and the first AWS-designed AI chip built on a 3 nanometer process. AWS reached general availability through Amazon EC2 Trn3 UltraServers on December 1, 2025, stating that Trn3 UltraServers deliver up to 4.4 times the compute performance, 3.9 times the memory bandwidth, and 4 times the energy efficiency of the prior-generation Trn2 UltraServer.^[1]^[2] Each chip delivers 2.52 petaflops of dense FP8 compute, carries 144 GB of HBM3e memory, and provides 4.9 TB/s of memory bandwidth, roughly doubling the per-chip FP8 throughput of AWS Trainium 2.^[1]^[3]

Trainium 3 was first previewed at AWS re:Invent in December 2024 and formally introduced with full specifications at re:Invent in December 2025.^[2]^[4] At the system level, Trn3 UltraServers scale from 64 chips in the Gen1 configuration to 144 chips in the Gen2 configuration, with the larger system reaching 362 petaflops of FP8 compute, 20.7 TB of HBM3e capacity, and 706 TB/s of aggregate memory bandwidth.^[1]^[3] Customers including Anthropic, Karakuri, Metagenomi, NetoAI, Ricoh, and Splash Music have reported "reducing training and inference costs by up to 50%" relative to alternative accelerators, according to AWS.^[1]

Background and development

AWS designed its first Trainium accelerator in 2020 and shipped it in EC2 Trn1 instances in 2022 as a deep learning training chip aimed at lowering the cost of large model training relative to general-purpose GPUs. The original chip used two NeuronCore-v2 cores with 32 GiB of HBM2e on a 7 nanometer process. The second-generation AWS Trainium 2, announced at re:Invent 2023 and made generally available in late 2024, was a wholesale redesign for the generative AI era: it raised the core count from two to eight with the new NeuronCore-v3 design, tripled on-package memory to 96 GiB of HBM3e, and introduced the Trn2 UltraServer, a rack-scale shared-memory domain of 64 chips connected by NeuronLink-v3. Trainium 2 became the basis of Project Rainier, a cluster of nearly 500,000 chips that AWS brought online in less than twelve months in partnership with Anthropic for training the Claude family of models; AWS says the cluster provides more than five times the compute Anthropic used to train its previous models.^[5]

Amazon previewed Trainium 3 during the Trainium 2 launch at re:Invent 2024, telling attendees that the next chip would use a 3 nanometer process, would offer roughly four times the performance of Trainium 2, and would be available in late 2025 with fuller volumes in early 2026.^[4] Annapurna Labs worked closely with Anthropic on the silicon design: over a collaboration that ran for more than two years, the AI safety company provided direct input into the chip's instruction set architecture and its interconnect fabric.^[6] The program reflected a broader strategic shift inside AWS toward owning more of the AI compute stack, with a design philosophy that explicitly optimizes performance per total cost of ownership rather than peak performance.^[7]

How is Trainium 3 built? Chip architecture

What process and packaging does Trainium 3 use?

Trainium 3 is fabricated on TSMC's N3P process, the company's performance-enhanced 3 nanometer node, a step up from the N5 node used for Trainium 2.^[7] SemiAnalysis notes that Trainium 3 is one of the first adopters of N3P, alongside Nvidia's Vera Rubin and AMD's MI450X active interposer die.^[7] The accelerator is packaged using TSMC's CoWoS-R variant, which employs an organic thin-film interposer (six copper redistribution layers on a polymer substrate) rather than the silicon interposer used in CoWoS-S packaging on high-end GPUs.^[7] The package is composed of two CoWoS-R assemblies rather than one large interposer, a topology AWS chose to prioritize yield and supply continuity given industry-wide competition for advanced packaging supply through 2025 and 2026.^[7]

The package integrates four stacks of HBM3e in a 12-high configuration, each providing 36 GiB for a total of 144 GiB per chip, versus the 8-high stacks on Trainium 2.^[7] Pin speeds reach 9.6 gigabits per second, which SemiAnalysis described as "the highest HBM3E pin speeds we've seen yet" in 2025 and a substantial step up from the 5.7 gigabits per second used on Trainium 2.^[7]

How are the NeuronCore-v4 compute cores organized?

Each Trainium 3 accelerator contains eight NeuronCore-v4 cores, the same count as Trainium 2 but with substantially upgraded internal organization.^[8] A NeuronCore-v4 consists of four execution engines: a Tensor Engine for matrix and convolution operations, a Vector Engine for elementwise operations and reductions, a Scalar Engine for control flow, and a GPSIMD Engine for general-purpose work such as sorts and gathers.^[8] The core includes 32 MiB of SBUF software-managed scratchpad memory (up from 28 MiB on NeuronCore-v3) and a 2 MiB PSUM buffer for accumulating partial sums during matrix multiplication.^[8]

The Tensor Engine contains two separate systolic arrays optimized for different numeric formats. A 128 by 128 array handles bfloat16, float16, TF32, and FP32 computations, while a larger 512 by 128 array is dedicated to the OCP-compliant MXFP8 and MXFP4 microscaling formats, doubling the MXFP8 capability of Trainium 2.^[7] Each core's Tensor Engine delivers 315 teraflops of MXFP8 or MXFP4 throughput, 79 teraflops of BF16/FP16/TF32, and 20 teraflops of FP32. With eight cores per chip, a single Trainium 3 device reaches 2.52 petaflops of dense MXFP8 or MXFP4 performance.^[3]^[7] Notably, BF16 throughput per chip is roughly unchanged from Trainium 2 at around 0.65 petaflops. AWS put the majority of the additional silicon area into MXFP8 and MXFP4 capability rather than scaling all numeric formats uniformly, on the basis that production-scale generative AI workloads in 2025 and 2026 are increasingly dominated by FP8 training, FP8 inference, and FP4 inference.^[7]
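The per-chip figures follow from the per-core numbers by multiplication. The sketch below uses only the values quoted above; the dictionary and function names are illustrative and do not belong to any AWS tool.

```python
# Per-core Tensor Engine throughput in teraflops, as quoted in the text.
TENSOR_ENGINE_TFLOPS = {
    "MXFP8/MXFP4": 315,
    "BF16/FP16/TF32": 79,
    "FP32": 20,
}
CORES_PER_CHIP = 8  # NeuronCore-v4 cores per Trainium 3 chip


def per_chip_pflops(tflops_per_core: float, cores: int = CORES_PER_CHIP) -> float:
    """Scale a per-core teraflops figure to a per-chip petaflops figure."""
    return tflops_per_core * cores / 1000


if __name__ == "__main__":
    for fmt, tflops in TENSOR_ENGINE_TFLOPS.items():
        print(f"{fmt}: {per_chip_pflops(tflops):.2f} PF per chip")
```

Eight cores at 79 teraflops give about 0.63 petaflops of BF16, slightly below the rounded figure of around 0.65 petaflops quoted for the chip.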

Memory and on-chip bandwidth

Trainium 3 integrates 144 GiB of HBM3e per chip across four stacks, providing 4.9 TB/s of memory bandwidth.^[3] The capacity is 1.5 times that of Trainium 2 and the bandwidth is roughly 70 percent higher. The larger memory footprint allows Trainium 3 to hold significantly larger model weights, optimizer states, and KV caches per chip, which reduces the number of chips that must be enlisted simply to fit a given model and improves utilization on inference workloads with long context windows.
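The capacity and bandwidth figures can be cross-checked against the stack count and pin speed given earlier. The sketch below assumes the standard 1024-bit interface per HBM3e stack and four stacks on Trainium 2; neither is stated in this article, and the function names are illustrative.

```python
# Assumption: each HBM3e stack exposes a 1024-bit data interface
# (the standard HBM width; not stated in the article).
HBM_BUS_BITS_PER_STACK = 1024


def hbm_capacity_gib(stacks: int, gib_per_stack: int) -> int:
    """Total on-package HBM capacity in GiB."""
    return stacks * gib_per_stack


def hbm_bandwidth_tb_s(stacks: int, pin_gbit_s: float,
                       bus_bits: int = HBM_BUS_BITS_PER_STACK) -> float:
    """Peak bandwidth in TB/s: stacks * pins * per-pin Gb/s, divided by 8 bits per byte."""
    return stacks * bus_bits * pin_gbit_s / 8 / 1000


if __name__ == "__main__":
    trn3 = hbm_bandwidth_tb_s(stacks=4, pin_gbit_s=9.6)
    # Assumption: Trainium 2 also uses four stacks (96 GiB / 24 GiB per 8-high stack).
    trn2 = hbm_bandwidth_tb_s(stacks=4, pin_gbit_s=5.7)
    print(f"Trainium 3: {hbm_capacity_gib(4, 36)} GiB, {trn3:.2f} TB/s")
    print(f"Trainium 2: {hbm_capacity_gib(4, 24)} GiB, {trn2:.2f} TB/s")
    print(f"Bandwidth ratio: {trn3 / trn2:.2f}x")
```

Under these assumptions the pin speeds reproduce the quoted 4.9 TB/s and 2.9 TB/s figures, and the ratio of about 1.68 matches the "roughly 70 percent higher" claim.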

The table below compares the key per-chip specifications of Trainium 3 against its predecessors.

Specification	AWS Trainium	AWS Trainium 2	AWS Trainium 3
Launch year	2022	2024	2025
Process node	TSMC N7	TSMC N5	TSMC N3P
NeuronCore version	v2 (2 cores)	v3 (8 cores)	v4 (8 cores)
FP8 / MXFP8 compute (per chip)	~0.19 PF (FP16)	1.3 PF	2.52 PF
BF16 compute (per chip)	0.19 PF	0.65 PF	0.65 PF
HBM capacity	32 GiB HBM2e	96 GiB HBM3e	144 GiB HBM3e
HBM bandwidth	0.82 TB/s	2.9 TB/s	4.9 TB/s
HBM pin speed	3.2 Gb/s	5.7 Gb/s	9.6 Gb/s
Packaging	Standard	CoWoS-R	CoWoS-R
Scale-up domain	16 chips (Trn1)	64 chips	64 or 144 chips

How do Trn3 UltraServers scale? System architecture

A Trn3 UltraServer is a rack-scale system that ties multiple Trainium 3 chips into a single shared-memory scale-up domain. AWS ships Trn3 UltraServers in two configurations: a Gen1 with 64 chips and a Gen2 with 144 chips.^[1]^[3] Both expose a single logical memory and compute pool, which lets large model parallelism strategies place tensor parallel groups and pipeline parallel stages across the entire scale-up domain without crossing network protocol boundaries.

Sleds and racks

Each Trn3 server, internally referred to as a sled, contains four Trainium 3 chips connected through an intra-server PCIe switch. Every chip exposes four PCIe Gen6 x8 links to this switch, providing 256 GB/s of bidirectional bandwidth within a sled.^[7] The Gen1 UltraServer uses 16 sleds for 64 chips, while the Gen2 uses 36 sleds for 144 chips.
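The sled arithmetic can be written down as a small configuration model. This is only an illustrative sketch: the class and constant names are hypothetical, and the numbers are the ones given in the paragraph above.

```python
from dataclasses import dataclass

CHIPS_PER_SLED = 4        # Trainium 3 chips per sled, from the text
PCIE_LINKS_PER_CHIP = 4   # PCIe Gen6 links per chip, from the text
LANES_PER_LINK = 8        # each link is x8


@dataclass(frozen=True)
class UltraServerConfig:
    """Hypothetical description of a Trn3 UltraServer build."""
    name: str
    sleds: int

    @property
    def chips(self) -> int:
        return self.sleds * CHIPS_PER_SLED

    @property
    def pcie_lanes_to_sled_switches(self) -> int:
        """Total chip-side PCIe lanes across all sleds."""
        return self.chips * PCIE_LINKS_PER_CHIP * LANES_PER_LINK


GEN1 = UltraServerConfig("Trn3 Gen1", sleds=16)
GEN2 = UltraServerConfig("Trn3 Gen2", sleds=36)

if __name__ == "__main__":
    for cfg in (GEN1, GEN2):
        print(cfg.name, cfg.chips, "chips,", cfg.pcie_lanes_to_sled_switches, "lanes")
```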

NeuronLink-v4 and NeuronSwitch-v1

The defining architectural feature of the Trn3 UltraServer is its NeuronSwitch-v1 fabric, the first switched all-to-all interconnect that AWS has shipped on a Trainium generation.^[1]^[7] Prior Trn2 UltraServers used a directly cabled 2D torus topology over NeuronLink-v3, which kept costs low but produced bottlenecks for all-to-all collective patterns such as the exchanges that mixture-of-experts routing requires. NeuronSwitch-v1 replaces the torus with an all-to-all switched fabric over NeuronLink-v4, which AWS says doubles interchip interconnect bandwidth over the Trn2 UltraServer.^[1]^[7] In the Gen2 UltraServer, any chip can communicate with any other chip at NeuronLink speeds without multi-hop routing.^[7]
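Why a torus struggles with all-to-all traffic can be seen by counting hops. The sketch below measures the worst-case and average hop count on a 2D torus; the 8 by 8 grid is a hypothetical layout for 64 chips chosen for illustration, since the actual Trn2 wiring is not described here. On a switched fabric every pair of chips is a single logical hop apart.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Minimum hop count between nodes a and b on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def torus_stats(dims):
    """Return (diameter, mean hop count) over all ordered pairs of distinct nodes."""
    nodes = list(product(*(range(d) for d in dims)))
    dists = [torus_hops(a, b, dims) for a in nodes for b in nodes if a != b]
    return max(dists), sum(dists) / len(dists)


if __name__ == "__main__":
    # Assumption: 64 chips arranged as an 8 x 8 torus (illustrative only).
    diameter, mean = torus_stats((8, 8))
    print(f"8x8 torus: diameter {diameter} hops, mean {mean:.2f} hops")
    print("switched fabric: 1 logical hop between any pair")
```

In an all-to-all exchange every chip sends to every other chip, so traffic that needs several hops occupies several links at once, which is the bottleneck the switched design removes.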

The fabric supports any-to-any communication across all 144 chips of the Gen2 UltraServer without oversubscription, alongside 706 TB/s of aggregate HBM bandwidth.^[3] AWS plans three successive switch generations during Trainium 3's commercial lifetime: a first-generation 160-lane, 20-port PCIe Gen6 switch used at launch, a higher-radix 320-lane PCIe switch, and a third switch that will move the fabric onto the open UALink protocol.^[7]

Scale-out networking

Beyond the UltraServer, Trn3 systems connect to larger training clusters through Elastic Fabric Adapter version 4 (EFAv4), AWS's custom RDMA-over-Ethernet stack. Each chip is provisioned with 200 Gb/s of EFAv4 bandwidth by default, with a 400 Gb/s option, delivered through Nitro-v6 400G SmartNICs.^[7] Multiple UltraServers are aggregated into EC2 UltraClusters, AWS's training cluster architecture, which can connect hundreds of thousands of Trainium chips into a single training workload.
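The gap between scale-up and scale-out bandwidth explains why parallelism strategies try to keep the heaviest traffic inside one UltraServer. The sketch below compares the per-chip EFAv4 rates above with the roughly 2 TB/s per-chip NeuronLink figure this article lists for Trn3 UltraServers; the function names are illustrative.

```python
NEURONLINK_TB_S_PER_CHIP = 2.0  # approximate per-chip scale-up bandwidth quoted in this article


def gbit_to_gbyte(gbit_s: float) -> float:
    """Convert Gb/s to GB/s."""
    return gbit_s / 8


def scale_up_to_scale_out_ratio(neuronlink_tb_s: float, efa_gbit_s: float) -> float:
    """How many times faster the scale-up link is than the scale-out link, per chip."""
    return neuronlink_tb_s * 1000 / gbit_to_gbyte(efa_gbit_s)


def aggregate_efa_tb_s(chips: int, efa_gbit_s: float) -> float:
    """Total scale-out bandwidth leaving one UltraServer, in TB/s."""
    return chips * gbit_to_gbyte(efa_gbit_s) / 1000


if __name__ == "__main__":
    for efa in (200, 400):
        ratio = scale_up_to_scale_out_ratio(NEURONLINK_TB_S_PER_CHIP, efa)
        total = aggregate_efa_tb_s(144, efa)
        print(f"EFAv4 {efa} Gb/s: scale-up is {ratio:.0f}x faster per chip; "
              f"Gen2 UltraServer egress {total:.1f} TB/s")
```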

How fast is Trainium 3? Performance and benchmarks

AWS publicly compares Trn3 UltraServers to Trn2 UltraServers across several axes. The headline figures are 4.4 times higher peak compute, 3.9 times higher aggregate memory bandwidth, and 4 times better energy efficiency at the UltraServer level.^[1]^[2] On the Amazon Bedrock managed inference platform, AWS reports that Trainium 3 is its fastest accelerator, delivering up to 3 times faster performance than Trainium 2 with over 5 times higher output tokens per megawatt at similar latency per user.^[1]

The table below summarizes the published UltraServer-level specifications.

Metric	Trn2 UltraServer	Trn3 Gen1 UltraServer	Trn3 Gen2 UltraServer
Chips per UltraServer	64	64	144
Peak MXFP8 compute	~83 PF	161 PF	362 PF
Aggregate HBM capacity	6 TiB	9 TiB	20.25 TiB (quoted by AWS as 20.7 TB)
Aggregate HBM bandwidth	~185 TB/s	314 TB/s	706 TB/s
Interchip fabric	NeuronLink-v3 torus	NeuronSwitch-v1	NeuronSwitch-v1
Per-chip NeuronLink bandwidth	~1 TB/s	2 TB/s	2 TB/s
Per-chip EFA bandwidth	200 Gb/s	200 or 400 Gb/s	200 or 400 Gb/s
Relative performance	1.0x	~2.0x	up to 4.4x
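
The UltraServer totals are, to a good approximation, the per-chip figures multiplied by the chip count. The sketch below recomputes them from the per-chip values given in this article; the data structure and function name are illustrative.

```python
# Per-chip figures as given in this article (FP8/MXFP8 petaflops, HBM GiB, HBM TB/s).
CHIPS = {
    "Trainium 2": {"pf": 1.3, "hbm_gib": 96, "bw_tb_s": 2.9},
    "Trainium 3": {"pf": 2.52, "hbm_gib": 144, "bw_tb_s": 4.9},
}


def ultraserver_totals(chip: str, n_chips: int) -> dict:
    """Multiply per-chip figures by the number of chips in the scale-up domain."""
    spec = CHIPS[chip]
    return {
        "compute_pf": spec["pf"] * n_chips,
        "hbm_tib": spec["hbm_gib"] * n_chips / 1024,
        "hbm_bw_tb_s": spec["bw_tb_s"] * n_chips,
    }


if __name__ == "__main__":
    trn2 = ultraserver_totals("Trainium 2", 64)
    gen1 = ultraserver_totals("Trainium 3", 64)
    gen2 = ultraserver_totals("Trainium 3", 144)
    for name, t in (("Trn2", trn2), ("Trn3 Gen1", gen1), ("Trn3 Gen2", gen2)):
        print(f"{name}: {t['compute_pf']:.1f} PF, {t['hbm_tib']:.2f} TiB, "
              f"{t['hbm_bw_tb_s']:.1f} TB/s")
    print(f"Gen2 vs Trn2 compute: {gen2['compute_pf'] / trn2['compute_pf']:.2f}x")
    print(f"Gen2 vs Trn2 bandwidth: {gen2['hbm_bw_tb_s'] / trn2['hbm_bw_tb_s']:.2f}x")
```

The compute ratio of about 4.36 lines up with the "up to 4.4x" headline. The bandwidth ratio comes out near 3.8 rather than 3.9, which may simply reflect rounding in the per-chip inputs. In binary units, 144 chips with 144 GiB each total 20.25 TiB; the 20.7 TB figure corresponds to 144 × 144 = 20,736 decimal gigabytes.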

Mark Carroll, an AWS director of engineering on the Trainium team, attributed the gains to the combination of the new chip and the new Neuron switches, telling TechCrunch in 2026, "that's why Trainium3 is breaking all kinds of records."^[6] Anthropic has continued to scale Claude training and serving on Trainium hardware beyond the footprint already deployed for Project Rainier.^[9]

What software supports Trainium 3? The Neuron SDK

Trainium 3 is supported by the AWS Neuron SDK, the same compiler and runtime toolchain used by Trainium 1 and Trainium 2. The SDK exposes Trainium hardware to higher-level frameworks through native PyTorch and JAX integrations, with the Neuron compiler responsible for partitioning models and scheduling NeuronCore execution.^[8] AWS says moving a supported PyTorch model to Trainium is "basically a one-line change, and then recompile, and then run on Trainium."^[6] For workloads that require lower-level control, the Neuron Kernel Interface (NKI) allows engineers to write custom kernels using a Python-based programming model that compiles directly to NeuronCore instructions.^[8] A Trainium 3 era addition is a profiling and debugging toolset that visualizes how a model is executing across the engines of each NeuronCore, the SBUF and PSUM memory hierarchy, and the NeuronLink and EFA networking fabrics.

The supported numeric formats include FP32, BF16, FP16, TF32, MXFP8, and MXFP4, with the compiler automatically managing mixed-precision policies.^[8] Additional framework integrations include vLLM for high-throughput inference, Hugging Face's Optimum Neuron for transformer model deployment, PyTorch Lightning for training orchestration, and TorchTitan for very large language model training. Trn3 UltraServers are exposed through the standard set of AWS managed services: Amazon SageMaker offers managed training jobs and inference endpoints; Amazon EKS and ECS provide container orchestration; AWS Batch and AWS ParallelCluster support HPC-style scheduling; and Amazon Bedrock uses Trn3 hardware as an inference backend for hosted foundation models.^[1]
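To make the microscaling formats concrete, the sketch below quantizes one block of values in the style of MXFP4. It assumes the OCP Microscaling convention of a 32-element block sharing one power-of-two scale, with FP4 (E2M1) elements. It is a pure-Python illustration with simplified nearest-value rounding, not the Neuron compiler's implementation, and all names are hypothetical.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign is stored separately).
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2    # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32  # assumption: OCP MX block size


def quantize_block_mxfp4(values):
    """Return (shared_exp, elements): one power-of-two scale plus FP4 elements."""
    if len(values) > BLOCK_SIZE:
        raise ValueError("an MX block holds at most 32 elements")
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0.0] * len(values)
    # Pick the shared exponent so the largest value lands in the top FP4 binade.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    elements = []
    for v in values:
        mag = min(abs(v) / scale, FP4_E2M1_GRID[-1])  # clamp to the largest element
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))  # simplified rounding
        elements.append(math.copysign(nearest, v))
    return shared_exp, elements


def dequantize_block(shared_exp, elements):
    """Reconstruct approximate values from the shared scale and the elements."""
    return [e * 2.0 ** shared_exp for e in elements]


if __name__ == "__main__":
    block = [1.0, -0.5, 0.25, 0.3, 0.0]
    exp, elems = quantize_block_mxfp4(block)
    print("shared exponent:", exp)
    print("elements:", elems)
    print("reconstructed:", dequantize_block(exp, elems))
```

Because the scale is shared, a single outlier in a block sets the exponent for all 32 elements, so small values in the same block lose precision. That is the trade-off that lets 4-bit and 8-bit elements cover a wide dynamic range.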

Who uses Trainium 3? Customers and adoption

At launch, AWS announced a range of customers using or planning to use Trainium 3. The headline adopter is Anthropic, which uses more than one million Trainium 2 chips to train and serve Claude and has committed to expanding its Trainium capacity through 2026.^[5]^[6] In late 2025, Anthropic and Amazon expanded their collaboration to secure up to 5 gigawatts of new compute for training and deploying Claude, including new Trainium capacity coming online through 2026.^[10]

OpenAI signed a multi-year agreement with AWS that includes approximately 2 gigawatts of Trainium capacity spanning the Trainium 3 and Trainium 4 generations, part of a $38 billion AWS compute agreement that the two companies subsequently expanded.^[11] The deal was widely read as a notable diversification of OpenAI's supplier base away from a near-exclusive reliance on Nvidia GPUs.

Apple is an AWS AI-silicon customer that evaluates Trainium for pre-training future Apple Intelligence models. At re:Invent 2024, Apple's senior director for AI and machine learning Benoit Dupin said Apple expected up to 50 percent efficiency improvement in pre-training with AWS; the same statements made clear the Trainium chips "would not be used to actually run Apple Intelligence features for customers," which run on Apple's own silicon.^[12] Databricks has used Trainium for parts of its training services. Other named Trainium 3 customers include Decart (real-time generative video), poolside, Ricoh, Karakuri, Metagenomi, NetoAI, and Splash Music.^[1]

The table below shows major announced Trainium 3 customers and use cases as of mid 2026.

Customer	Use case	Notes
Anthropic	Training and serving Claude	Primary Trainium customer, up to 5 GW expanded partnership
OpenAI	Frontier model training and inference	Multi-year deal, ~2 GW of Trainium capacity
Apple	Pre-training Apple Intelligence models	Up to 50% pre-training efficiency vs prior; not used for customer inference
Databricks	AI model training	Foundation and customer model training
Decart	Real-time generative video	Reports 4x faster inference at half the cost of GPUs
poolside	AI coding assistant	Training and inference
Ricoh	Enterprise document AI	Up to 50% cost reduction
Karakuri	Japanese-language LLMs	Up to 50% cost reduction
Metagenomi	Bioinformatics models	Up to 50% cost reduction
Splash Music	Generative music	Up to 50% cost reduction

Energy efficiency and power

A central design objective of Trainium 3 was to reduce the energy intensity of large-scale model training and serving. AWS reports that Trn3 UltraServers deliver 4 times the energy efficiency of Trn2 UltraServers, driven by the move from TSMC N5 to N3P, the higher utilization of the MXFP8 and MXFP4 systolic arrays, and the switched NeuronSwitch-v1 fabric that reduces wasted communication time.^[1]^[7] In real-world serving workloads, the combined effect is over 5 times higher output tokens per megawatt.^[1]

AWS deploys Trainium 3 in two rack-level configurations that differ primarily in chip density and cooling: an air-cooled NL32x2 configuration and a liquid-cooled NL72x2 configuration.^[7] AWS has not published precise per-chip TDP numbers, though industry analysis indicates Trainium 3 falls in the high-hundreds-of-watts range per accelerator, on par with contemporary Nvidia Blackwell and Blackwell Ultra accelerators.^[7] Press reports describe Trainium 3 as reducing data center power consumption for equivalent AI workloads by around 40 percent compared to Trainium 2 deployments.^[13]

Pricing and economics

AWS does not publish per-chip-hour list pricing for Trn3 UltraServers, instead offering them primarily through capacity reservations, savings plans, and multi-year contracts.^[3] AWS says its Trn3 UltraServers "cost up to 50% less to run for comparable performance than using classic cloud servers," with customer-reported cost reductions of up to 50 percent attributed to price-performance, energy savings, and operational density.^[1]^[6] For workloads that map well to Trainium's strengths (large transformer training and inference with MXFP8 or MXFP4, mixture-of-experts models, long-context inference), the headline economics are competitive with Nvidia GPUs; for workloads that depend on the CUDA ecosystem or specialized GPU libraries that have no Neuron equivalent, Trainium 3 has been less attractive.

How does Trainium 3 compare to Nvidia GPUs and Google TPUs?

Trainium 3 entered the market in late 2025 against a maturing set of competing AI accelerators. The closest direct competitor is Nvidia's Blackwell architecture, particularly the B200 and Blackwell Ultra (GB300) parts, which dominated frontier AI training shipments through 2025.^[14] On peak FP8 throughput, Trainium 3 is broadly competitive with B200 on a per-chip basis, though Nvidia retains advantages on aggregate per-system bandwidth, software ecosystem maturity, and inference-optimized FP4 throughput on Blackwell Ultra.^[14] SemiAnalysis observed that AWS's liquid-cooled NL72x2 switched topology, which spans 144 chips across two racks, is "a potential challenger approaching" Nvidia's 72-package NVL72 reference rack on aggregate bandwidth.^[7]

Other competitors in this generation include Google's Tensor Processing Unit line, specifically TPU v6e (Trillium) and TPU v7 (Ironwood), which target similar workloads on Google Cloud, and AMD's MI355X and MI400 series accelerators.^[7] Trainium 3 differs from all of these by being available only through AWS, which constrains its market reach but allows AWS to closely co-design the chip with its broader cloud infrastructure and pricing model.

Roadmap: Trainium 4 and beyond

Alongside the Trn3 UltraServer launch at re:Invent 2025, AWS previewed the fourth-generation Trainium 4 chip, scheduled to begin delivery in 2027.^[15] AWS describes Trainium 4 as bringing significant performance improvements across all dimensions, including at least 3 times the FP8 processing power and 4 times the memory bandwidth of Trainium 3, with higher FP4 performance and support for Nvidia's NVLink Fusion technology as part of a broader interoperability strategy.^[15] AWS has indicated that intermediate networking and software improvements will continue to arrive on Trainium 3 systems through 2026, and has signaled interest in adopting the UALink open standard for chip-to-chip interconnect in a future Trn3 switch revision and in Trainium 4.^[7]

Reception

Industry analyst response to Trainium 3 was generally positive. The semiconductor analysis firm SemiAnalysis titled its December 2025 deep dive "AWS Trainium3 Deep Dive: A Potential Challenger Approaching," framing the chip as opening "yet another front" for Nvidia's Jensen Huang alongside Google's TPU v7 and AMD's MI450X, while cautioning that "Nvidia will stay King of the Jungle" if its development pace accelerates further.^[7] Coverage in HPCwire, Tom's Hardware, and The Next Platform highlighted the move to a switched scale-up topology with NeuronSwitch-v1 as the most consequential architectural change, since it brings AWS's training systems closer to the topological flexibility of Nvidia's NVLink switched domains.^[16]^[14]^[15]

Some observers cautioned that Trainium 3's per-chip BF16 throughput, unchanged from Trainium 2, would limit its appeal for workloads that have not yet migrated to FP8 or FP4 numerics, and that the Neuron software stack still trails CUDA in library breadth and third-party tool support.^[7] The broad customer adoption at launch, particularly the OpenAI and continued Anthropic commitments, was widely interpreted as evidence that frontier AI labs see Trainium 3 economics as competitive enough to justify the engineering investment required to port major training pipelines onto a non-Nvidia stack.

References

1. AWS, "Trainium3 UltraServers Now Available: Enabling Customers to Train and Deploy AI Models Faster at Lower Cost," December 1, 2025. https://press.aboutamazon.com/2025/12/trainium3-ultraservers-now-available-enabling-customers-to-train-and-deploy-ai-models-faster-at-lower-cost
2. About Amazon, "Trainium3 UltraServers now available: Enabling customers to train and deploy AI models faster at lower cost," December 2025. https://www.aboutamazon.com/news/aws/trainium-3-ultraserver-faster-ai-training-lower-cost
3. AWS, "Gen AI Compute Instance - Amazon EC2 Trn3 UltraServers," 2025-2026. https://aws.amazon.com/ec2/instance-types/trn3/
4. TechCrunch, "AWS Trainium2 chips for building LLMs are now generally available, with Trainium3 coming in late 2025," December 3, 2024. https://techcrunch.com/2024/12/03/aws-trainium2-chips-for-building-llms-are-now-generally-available-with-trainium3-coming-in-late-2025/
5. About Amazon, "AWS activates Project Rainier: One of the world's largest AI compute clusters comes online," 2025. https://www.aboutamazon.com/news/aws/aws-project-rainier-ai-trainium-chips-compute-cluster
6. TechCrunch, "An exclusive tour of Amazon's Trainium lab, the chip that's won over Anthropic, OpenAI, even Apple," March 22, 2026. https://techcrunch.com/2026/03/22/an-exclusive-tour-of-amazons-trainium-lab-the-chip-thats-won-over-anthropic-openai-even-apple/
7. SemiAnalysis, "AWS Trainium3 Deep Dive: A Potential Challenger Approaching," December 2025. https://newsletter.semianalysis.com/p/aws-trainium3-deep-dive-a-potential
8. AWS Neuron Documentation, "Trainium3 Architecture" and "NeuronCore-v4 Architecture," 2025-2026. https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trainium3.html
9. About Amazon, "Announcing Amazon EC2 Trn3 UltraServers for faster, lower-cost generative AI training," December 2025. https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-ec2-trn3-ultraservers/
10. Anthropic, "Anthropic and Amazon expand collaboration for up to 5 gigawatts of new compute," 2025. https://www.anthropic.com/news/anthropic-amazon-compute
11. About Amazon, "OpenAI and Amazon announce strategic partnership," November 3, 2025. https://www.aboutamazon.com/news/aws/amazon-open-ai-strategic-partnership-investment
12. Data Center Dynamics, "Apple uses Amazon's Graviton and Inferentia chips, also explores Trainium2," December 2024. https://www.datacenterdynamics.com/en/news/apple-uses-amazons-graviton-and-inferentia-chips-also-explores-trainium2/
13. Data Centre Magazine, "AWS Trainium3 Cuts AI Data Centre Power Consumption by 40%," December 2025. https://datacentremagazine.com/news/trainium3-new-aws-chip-promises-4x-performance-boost
14. Tom's Hardware, "Amazon launches Trainium3 AI accelerator, competing directly against Blackwell Ultra in FP8 performance," December 2025. https://www.tomshardware.com/tech-industry/artificial-intelligence/amazon-launches-trainium3-ai-accelerator-competing-directly-against-blackwell-ultra-in-fp8-performance
15. The Next Platform, "With Trainium4, AWS Will Crank Up Everything But The Clocks," December 3, 2025. https://www.nextplatform.com/2025/12/03/with-trainium4-aws-will-crank-up-everything-but-the-clocks/
16. HPCwire, "AWS Brings the Trainium3 Chip to Market With New EC2 UltraServers," December 2, 2025. https://www.hpcwire.com/2025/12/02/aws-brings-the-trainium3-chip-to-market-with-new-ec2-ultraservers/
