Ampere (microarchitecture)
Last reviewed
Jun 3, 2026
Sources
12 citations
Review status
Source-backed
Revision
v1 · 1,971 words
Ampere is a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was introduced in 2020 and spanned the full range of Nvidia's product lines, from the data-center A100 accelerator that became the workhorse of large-scale deep learning to the consumer GeForce RTX 30 series of gaming graphics cards. Across these segments Ampere served as Nvidia's principal compute and graphics platform, and the A100 in particular was the dominant GPU for training and serving neural networks during the period of roughly 2020 to 2022, underpinning much of the early wave of large language models.[1][2]
The architecture is named after André-Marie Ampère, the French mathematician and physicist who was one of the founders of the science of electromagnetism and after whom the ampere, the SI unit of electric current, is named. The naming follows Nvidia's long-standing convention of honoring noted scientists and mathematicians, a series that includes the preceding Volta (Alessandro Volta) and Turing (Alan Turing) architectures and the succeeding Hopper (Grace Hopper) and Ada Lovelace architectures.[1]
Nvidia unveiled the Ampere architecture on May 14, 2020, during the keynote of its GPU Technology Conference (GTC), which was held online that year. The launch vehicle was the data-center A100 Tensor Core GPU, which Nvidia announced as already being in full production and shipping to customers. Chief executive Jensen Huang presented the A100 alongside the DGX A100 system, an integrated server built around eight A100 accelerators.[2][3]
The consumer side of the architecture followed several months later. Nvidia announced the GeForce RTX 30 series on September 1, 2020, and the first card, the RTX 3080, reached the market on September 17, 2020. Professional and workstation Ampere products, marketed under the new NVIDIA RTX A-series and Nvidia A-series names, were announced beginning in October 2020.[4][5]
The flagship compute die of the Ampere generation is GA100, which powers the A100 accelerator. GA100 is fabricated on TSMC's 7 nm (N7) process and contains 54.2 billion transistors on an 826 mm² die, making it one of the largest chips in volume production at the time. The full GA100 design comprises 128 streaming multiprocessors (SMs) and 8,192 FP32 CUDA cores; to improve manufacturing yields, the shipping A100 enables 108 SMs, giving 6,912 FP32 CUDA cores and 3,456 FP64 cores.[1][6]
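The core counts above are simple multiples of the SM count. The following is a minimal sketch that assumes the per-SM figures from Nvidia's published GA100 layout (64 FP32 and 32 FP64 cores per SM) and an A100 SXM4 boost clock of 1,410 MHz, neither of which is stated in this article. `core_count` and `peak_tflops` are illustrative helpers, and a fused multiply-add is counted as two floating-point operations:

```python
# Per-SM resources of a GA100 streaming multiprocessor. Assumed from Nvidia's
# published GA100 layout; these figures are not stated in this article.
FP32_CORES_PER_SM = 64
FP64_CORES_PER_SM = 32

# Assumed A100 SXM4 boost clock in GHz (also not stated in this article).
A100_BOOST_GHZ = 1.41


def core_count(sms: int, cores_per_sm: int) -> int:
    """Total cores of one type for a given number of enabled SMs."""
    return sms * cores_per_sm


def peak_tflops(cores: int, clock_ghz: float) -> float:
    """Peak TFLOPS, counting one fused multiply-add as two operations."""
    return cores * 2 * clock_ghz / 1000.0


full_ga100_fp32 = core_count(128, FP32_CORES_PER_SM)  # full GA100 die
a100_fp32 = core_count(108, FP32_CORES_PER_SM)        # shipping A100
a100_fp64 = core_count(108, FP64_CORES_PER_SM)

print(full_ga100_fp32, a100_fp32, a100_fp64)             # 8192 6912 3456
print(round(peak_tflops(a100_fp32, A100_BOOST_GHZ), 1))  # 19.5
```

Under these assumptions the same arithmetic reproduces the A100's quoted 19.5 TFLOPS FP32 peak, which makes it a quick sanity check on published specifications.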
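The A100 introduced several capabilities that proved central to AI workloads:

- Third-generation Tensor Cores, which added the TensorFloat-32 (TF32) format for FP32 matrix math, along with BF16 and FP64 Tensor Core operations.
- Fine-grained structured sparsity, which doubles Tensor Core throughput on weights pruned to a 2:4 pattern (at most two nonzero values in each group of four).
- Multi-Instance GPU (MIG), which partitions a single A100 into as many as seven isolated GPU instances.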
The original A100 carried 40 GB of HBM2 memory delivering about 1,555 GB/s of bandwidth, a 40 MB L2 cache (nearly seven times that of the V100), and third-generation NVLink providing 600 GB/s of total bidirectional bandwidth across 12 links, double the 300 GB/s of the V100. Its peak FP32 throughput is 19.5 TFLOPS, while TF32 Tensor Core math reaches 156 TFLOPS dense (and 312 TFLOPS with sparsity). The SXM4 module form factor carries a 400 W thermal design power.[1][6]
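TF32 keeps FP32's 8-bit exponent but only 10 explicit mantissa bits, and the sparse figure relies on a fine-grained 2:4 pattern in which at most two of every four consecutive weights are nonzero. The sketch below emulates both in plain Python under stated assumptions: `to_tf32` assumes round-to-nearest-even when narrowing (libraries may round or truncate differently) and ignores NaN and infinity, and `prune_2_4` uses magnitude pruning, one common heuristic, rather than any specific Nvidia tool. Neither function reflects how the hardware actually stores compressed sparse weights.

```python
import struct


def to_tf32(x: float) -> float:
    """Narrow a float to TF32 precision: 8-bit exponent, 10 mantissa bits.

    Assumes round-to-nearest-even; NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # FP32 stores 23 mantissa bits and TF32 keeps 10, so the low 13 bits go.
    lsb = (bits >> 13) & 1                      # lowest bit that survives
    bits = (bits + 0x0FFF + lsb) & 0xFFFFE000   # round half to even, then clear
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def prune_2_4(row: list[float]) -> list[float]:
    """Zero all but the two largest-magnitude values in each group of four."""
    if len(row) % 4:
        raise ValueError("row length must be a multiple of 4")
    out: list[float] = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out


print(to_tf32(1.0 + 3 * 2**-11))  # 1.001953125 (a tie, rounded to even)
print(prune_2_4([0.1, -0.9, 0.5, 0.2, 3.0, 0.0, -1.0, 2.0]))
# [0.0, -0.9, 0.5, 0.0, 3.0, 0.0, 0.0, 2.0]
```

Because half of the multiplications in a 2:4-pruned matrix are known zeros, the Tensor Cores can skip them, which is why the sparse TF32 figure (312 TFLOPS) is exactly twice the dense one (156 TFLOPS).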
In November 2020, at the SC20 supercomputing conference, Nvidia announced an upgraded A100 80GB variant that doubled the high-bandwidth memory to 80 GB using HBM2e and pushed memory bandwidth past 2 TB/s (about 2,039 GB/s) while keeping the same 400 W power envelope. Nvidia marketed the A100 as delivering up to 20 times the AI performance of the preceding V100.[6][7]
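Peak HBM bandwidth is the interface width in bytes multiplied by the per-pin data rate. The following minimal sketch assumes a 5,120-bit interface (five active 1,024-bit HBM stacks) and per-pin rates of 2.43 Gb/s for the HBM2 model and 3.186 Gb/s for the HBM2e model. These inputs are not given in this article; they were chosen to be consistent with the published bandwidth figures:

```python
def hbm_bandwidth_gbs(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps_per_pin


BUS_WIDTH_BITS = 5120  # assumed: five active HBM stacks x 1,024 bits each

a100_40gb = hbm_bandwidth_gbs(BUS_WIDTH_BITS, 2.43)   # HBM2, assumed pin rate
a100_80gb = hbm_bandwidth_gbs(BUS_WIDTH_BITS, 3.186)  # HBM2e, assumed pin rate

print(round(a100_40gb), round(a100_80gb))  # 1555 2039
```

Under these assumptions the 80 GB model gains about 31 percent more bandwidth entirely from a faster per-pin rate, with no change in interface width.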
For gaming and creative use, Ampere appeared in the GeForce RTX 30 series, built on a family of dies that Nvidia designated the "GA10x" class: GA102, GA104, GA106, and GA107. Unlike the GA100, these consumer dies are manufactured on a custom version of Samsung's 8 nm process branded "8N," a node tuned specifically for Nvidia.[4][8]
The largest consumer die, GA102, contains 28.3 billion transistors on a 628.4 mm² die and is organized into 7 graphics processing clusters (GPCs). It powered the top of the launch lineup, the GeForce RTX 3090 (10,496 CUDA cores, 24 GB of GDDR6X) and the RTX 3080 (8,704 CUDA cores). The smaller GA104 die (17.4 billion transistors, 392 mm²) powered the RTX 3070 (5,888 CUDA cores) and the RTX 3060 Ti, while GA106 and GA107 served lower tiers such as the RTX 3060 and RTX 3050.[4][8]
The defining change in the consumer Ampere SM was a redesigned data path that doubled FP32 throughput relative to Turing. In each SM partition, one data path executes only FP32 operations while the other can execute either FP32 or INT32 operations, whereas Turing dedicated the second path to integer work. The RTX 30 series also added second-generation ray-tracing (RT) cores, which roughly doubled ray-triangle intersection throughput over Turing, and third-generation Tensor Cores used to drive the DLSS (Deep Learning Super Sampling) upscaling feature. The series launched with GDDR6X memory on its flagship cards and met unusually high demand, with availability constrained from late 2020 through 2021 by the COVID-19 pandemic, a cryptocurrency-mining boom, and broad semiconductor shortages.[4][8]
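The doubled FP32 data path is why GA10x CUDA-core counts are twice what Turing-style SMs would give. The following minimal sketch assumes 128 FP32-capable lanes per GA10x SM versus 64 for Turing, enabled-SM counts of 82 (RTX 3090), 68 (RTX 3080), and 46 (RTX 3070), and a 1,695 MHz RTX 3090 boost clock. None of these inputs appears in this article:

```python
# Assumed FP32-capable lanes per SM (not stated in this article): Turing
# pairs 64 FP32 lanes with 64 INT32 lanes, while a GA10x SM has 64 FP32-only
# lanes plus 64 lanes that run either FP32 or INT32.
TURING_FP32_PER_SM = 64
GA10X_FP32_PER_SM = 128

# Assumed enabled-SM counts for three launch cards.
ENABLED_SMS = {"RTX 3090": 82, "RTX 3080": 68, "RTX 3070": 46}
RTX_3090_BOOST_GHZ = 1.695  # assumed boost clock


def cuda_cores(sms: int, fp32_per_sm: int = GA10X_FP32_PER_SM) -> int:
    """CUDA cores as marketed: FP32-capable lanes summed over enabled SMs."""
    return sms * fp32_per_sm


def peak_fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS, counting one fused multiply-add as two operations."""
    return cores * 2 * clock_ghz / 1000.0


cores = {name: cuda_cores(sms) for name, sms in ENABLED_SMS.items()}
print(cores)  # {'RTX 3090': 10496, 'RTX 3080': 8704, 'RTX 3070': 5888}

rtx3090 = peak_fp32_tflops(cores["RTX 3090"], RTX_3090_BOOST_GHZ)
print(round(rtx3090, 1))  # 35.6

# The same 82 SMs with Turing-style lanes would expose half as many.
print(cuda_cores(82, TURING_FP32_PER_SM))  # 5248
```

The peak figure assumes every lane issues FP32. Because half of the lanes share their issue slots with INT32 work, shaders that mix integer and floating-point instructions see less than the full doubling.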
Beyond the A100 and the GeForce line, Nvidia shipped a wide range of Ampere accelerators aimed at enterprise inference, virtualization, visualization, and edge servers. As part of this generation Nvidia consolidated its professional branding, retiring the Quadro and Tesla names in favor of "NVIDIA RTX A-series" workstation cards and plain "Nvidia A-series" data-center accelerators.[5][9]
| Product | Die | Process | Memory | Primary use |
|---|---|---|---|---|
| A100 (40 GB / 80 GB) | GA100 | TSMC 7 nm (N7) | 40 / 80 GB HBM2 / HBM2e | AI training and HPC; flagship data center |
| A800 | GA100 | TSMC 7 nm (N7) | 40 / 80 GB HBM2 / HBM2e | China-market A100 variant |
| A30 | GA100 | TSMC 7 nm (N7) | 24 GB HBM2 | Mainstream AI inference and training |
| A40 | GA102 | Samsung 8 nm (8N) | 48 GB GDDR6 | Data-center visualization and virtualization |
| A10 | GA102 | Samsung 8 nm (8N) | 24 GB GDDR6 | Graphics, virtualization, and inference |
| A16 | 4 x GA107 | Samsung 8 nm (8N) | 64 GB GDDR6 (4 x 16 GB) | Virtual desktop infrastructure |
| A2 | GA107 | Samsung 8 nm (8N) | 16 GB GDDR6 | Low-power edge and entry inference |
| RTX A6000 | GA102 | Samsung 8 nm (8N) | 48 GB GDDR6 | Professional workstation graphics |
| GeForce RTX 3090 | GA102 | Samsung 8 nm (8N) | 24 GB GDDR6X | Enthusiast gaming and content creation |
Among the data-center parts, the A30 is also built on the GA100 die, and it joined the A100 as the second Ampere product to support Multi-Instance GPU partitioning. The A40 and A10 used the GA102 die for graphics-rich and virtualization workloads, the A16 combined four GA107 dies for virtual-desktop deployments, and the compact, low-power A2 (GA107) targeted edge inference. On the workstation side, the RTX A6000 paired the GA102 die's full complement of 10,752 CUDA cores with 48 GB of GDDR6 memory (the data-center A40 and the later GeForce RTX 3090 Ti also enabled the full die), and it was joined by the RTX A5000, RTX A4000, and other RTX A-series models.[5][9][10]
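Multi-Instance GPU mode divides a GA100 part into fixed compute and memory slices. The following simplified sketch assumes the A100 40 GB geometry of seven compute slices and eight 5 GB memory slices, and `nvidia-smi`-style profile names such as `3g.20gb` (compute slices, then memory). `slices` and `fits` are hypothetical helpers that check capacity only; the real driver also enforces placement rules, so passing this check does not guarantee that a layout is valid:

```python
import re

# Assumed A100 40 GB MIG geometry (not stated in this article): seven compute
# slices and eight memory slices of 5 GB each.
COMPUTE_SLICES = 7
MEMORY_SLICES = 8
GB_PER_MEMORY_SLICE = 5


def slices(profile: str) -> tuple[int, int]:
    """Parse a profile name such as '3g.20gb' into (compute, memory) slices."""
    match = re.fullmatch(r"(\d+)g\.(\d+)gb", profile)
    if match is None:
        raise ValueError(f"unrecognised profile: {profile}")
    compute, memory_gb = int(match.group(1)), int(match.group(2))
    return compute, memory_gb // GB_PER_MEMORY_SLICE


def fits(profiles: list[str]) -> bool:
    """Capacity check only; the real driver also enforces placement rules."""
    used_compute = sum(slices(p)[0] for p in profiles)
    used_memory = sum(slices(p)[1] for p in profiles)
    return used_compute <= COMPUTE_SLICES and used_memory <= MEMORY_SLICES


print(fits(["1g.5gb"] * 7))                    # True: 7 compute, 7 memory
print(fits(["4g.20gb", "3g.20gb"]))            # True: 7 compute, 8 memory
print(fits(["3g.20gb", "3g.20gb", "1g.5gb"]))  # False: needs 9 memory slices
```

The last case fails on memory rather than compute: two `3g.20gb` instances already use all eight memory slices while leaving one compute slice idle.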
Following United States export controls announced in 2022 that restricted shipments of high-performance accelerators such as the A100 to China, Nvidia created the A800, a China-market variant of the A100 based on the same GA100 die. The principal modification was a reduction of the NVLink interconnect bandwidth from the A100's 600 GB/s down to 400 GB/s, placing it below the 600 GB/s interconnect threshold set by the U.S. Bureau of Industry and Security and thereby keeping the part exportable. Nvidia put the A800 into production in the third quarter of 2022; it was later itself restricted by tightened 2023 export rules.[11][12]
A distinctive feature of the Ampere generation is its split manufacturing strategy. The compute-oriented GA100 (and the A100, A30, and A800 built from it) is fabricated on TSMC's 7 nm N7 process, while the GA102 through GA107 dies, which power the entire GeForce line and most professional and visualization products, are built on a custom version of Samsung's 8 nm process branded 8N. Using two foundries and two nodes allowed Nvidia to reserve the denser, more expensive TSMC process for its largest data-center chip while securing high-volume capacity at Samsung for the consumer GeForce line.[1][8]
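The density gap behind this choice can be estimated from the transistor counts and die areas given earlier in this article. This is a short sketch, and the helper name is illustrative. Average density also depends on each die's mix of logic, cache, and I/O, so the figures compare whole designs rather than the two processes in isolation:

```python
# Transistor counts (billions) and die areas (mm²) as given in this article.
DIES = {
    "GA100 (TSMC N7)": (54.2, 826.0),
    "GA102 (Samsung 8N)": (28.3, 628.4),
    "GA104 (Samsung 8N)": (17.4, 392.0),
}


def density_mtr_per_mm2(transistors_billion: float, area_mm2: float) -> float:
    """Average density in millions of transistors per square millimetre."""
    return transistors_billion * 1000.0 / area_mm2


for name, (transistors, area) in DIES.items():
    print(f"{name}: {density_mtr_per_mm2(transistors, area):.1f} MTr/mm²")
# GA100 (TSMC N7): 65.6 MTr/mm²
# GA102 (Samsung 8N): 45.0 MTr/mm²
# GA104 (Samsung 8N): 44.4 MTr/mm²
```

On these numbers GA100 packs roughly 1.5 times as many transistors per square millimetre as the Samsung-built GA10x dies.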
Ampere occupies a pivotal position in Nvidia's GPU roadmap. It unified the compute lineage that had run through Volta (the V100) and the graphics-plus-ray-tracing lineage of Turing under a single architectural generation, even though it continued to use distinct dies and process nodes for the two segments. It was in turn succeeded by two specialized architectures: Hopper (the H100) took over the data-center role in 2022, while Ada Lovelace (the GeForce RTX 40 series) succeeded it in consumer graphics, also in 2022.[1]
The A100's combination of third-generation Tensor Cores, TF32, structural sparsity, large HBM2/HBM2e capacity, and high-bandwidth NVLink made it the standard accelerator for training and deploying large neural networks from 2020 to 2022. It anchored Nvidia's DGX A100 systems and HGX A100 baseboards and was deployed at scale in cloud data centers, where it powered much of the foundational research and early production deployment of large language models before the H100 became broadly available. In this sense Ampere, and the A100 in particular, was the architecture on which the modern era of large-scale generative AI was largely built.[2][6]