# Ampere (microarchitecture)

> Source: https://aiwiki.ai/wiki/nvidia_ampere
> Updated: 2026-06-03
> Categories: AI Hardware, NVIDIA
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Ampere** is a graphics processing unit (GPU) microarchitecture developed by [Nvidia](/wiki/nvidia) as the successor to both the Volta and Turing architectures. It was introduced in 2020 and spanned the full range of Nvidia's product lines, from the data-center A100 accelerator that became the workhorse of large-scale [deep learning](/wiki/deep_learning) to the consumer GeForce RTX 30 series of gaming graphics cards. Across these segments Ampere served as Nvidia's principal compute and graphics platform, and the A100 in particular was the dominant GPU for training and serving [neural networks](/wiki/neural_network) during the period of roughly 2020 to 2022, underpinning much of the early wave of large [language models](/wiki/large_language_model).[1][2]

The architecture is named after André-Marie Ampère, the French mathematician and physicist who was one of the founders of the science of electromagnetism and after whom the ampere, the SI unit of electric current, is named. The naming follows Nvidia's long-standing convention of honoring noted scientists and mathematicians, a series that includes the preceding Volta (Alessandro Volta) and Turing (Alan Turing) architectures and the succeeding Hopper (Grace Hopper) and Ada Lovelace architectures.[1]

## Announcement and release

Nvidia unveiled the Ampere architecture on May 14, 2020, during the keynote of its GPU Technology Conference (GTC), which was held online that year. The launch vehicle was the data-center A100 Tensor Core GPU, which Nvidia announced as already being in full production and shipping to customers. Chief executive Jensen Huang presented the A100 alongside the DGX A100 system, an integrated server built around eight A100 accelerators.[2][3]

The consumer side of the architecture followed several months later. Nvidia announced the GeForce RTX 30 series on September 1, 2020, with the first cards reaching the market beginning September 17, 2020. Professional and workstation Ampere products, marketed under the new NVIDIA RTX A-series and Nvidia A-series names, were announced beginning in October 2020.[4][5]

## The GA100 data-center die and the A100

The flagship compute die of the Ampere generation is GA100, which powers the [A100](/wiki/nvidia_a100) accelerator. GA100 is fabricated on [TSMC](/wiki/tsmc)'s 7 nm (N7) process and contains 54.2 billion transistors on a die measuring 826 mm², making it one of the largest chips ever put into volume production at the time. The full GA100 design comprises 128 streaming multiprocessors (SMs) and 8,192 FP32 [CUDA](/wiki/cuda) cores, although the shipping A100 enabled 108 SMs and 6,912 FP32 CUDA cores to improve manufacturing yields, together with 3,456 FP64 cores.[1][6]
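The SM and core counts above imply fixed per-SM resources, which a few lines of arithmetic make explicit. This is a sketch that uses only the figures quoted in the paragraph; the variable names are illustrative.

```python
# Figures quoted in the text for the full GA100 die and the shipping A100.
FULL_SMS, FULL_FP32_CORES = 128, 8192
A100_SMS, A100_FP32_CORES, A100_FP64_CORES = 108, 6912, 3456

# Per-SM resources implied by the full-die figures.
fp32_per_sm = FULL_FP32_CORES // FULL_SMS
fp64_per_sm = A100_FP64_CORES // A100_SMS

# Share of the die's SMs that the shipping A100 enables.
enabled_fraction = A100_SMS / FULL_SMS

# The A100's FP32 core count follows from its enabled SM count.
assert fp32_per_sm * A100_SMS == A100_FP32_CORES

print(f"FP32 cores per SM: {fp32_per_sm}")
print(f"FP64 cores per SM: {fp64_per_sm}")
print(f"SMs enabled on A100: {enabled_fraction:.1%}")
```

The quoted numbers are consistent with one another: the same per-SM FP32 count reproduces both the full-die and the shipping totals.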

The A100 introduced several capabilities that proved central to AI workloads:

- **Third-generation Tensor Cores.** Each A100 SM contains four third-generation Tensor Cores, and each core performs 256 FP16/FP32 fused multiply-add (FMA) operations per clock, so a single SM delivers 1,024 dense FP16/FP32 FMA operations per clock, roughly double the per-SM matrix throughput of Volta and Turing. The cores support a broad set of numeric formats, including FP16, BF16 (bfloat16), INT8, INT4, and binary, and they extend Tensor Core acceleration to FP64 for high-performance computing, reaching about 2.5 times the FP64 throughput of the V100.[1]
- **TensorFloat-32 (TF32).** Ampere introduced a new numeric mode for Tensor Cores called TF32, which combines the wide 8-bit exponent (and therefore the numeric range) of FP32 with the 10-bit mantissa precision of FP16. TF32 lets the Tensor Cores accelerate FP32 inputs automatically, without code changes, delivering large speedups on training math while preserving the dynamic range that deep-learning training requires.[1]
- **Fine-grained structural sparsity.** The Tensor Cores can exploit a 2:4 structured-sparsity pattern, in which two of every four elements in a weight vector are zero. Skipping the zeros doubles effective Tensor Core throughput for models pruned to this pattern.[1]
- **Multi-Instance GPU (MIG).** A single A100 can be partitioned into as many as seven isolated GPU instances, each with its own dedicated slice of compute, L2 cache, memory controllers, and high-bandwidth memory. MIG lets a single physical GPU be shared securely among multiple users or jobs, improving utilization for inference and multi-tenant workloads.[1][6]
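
Two of the features above, TF32 and 2:4 structured sparsity, can be illustrated in software. The following is a minimal sketch, not Nvidia's implementation: `to_tf32` and `prune_2_4` are hypothetical helper names, the TF32 conversion simply truncates the low mantissa bits (the hardware's actual rounding behavior is not described in the text), and the pruning rule of keeping the two largest-magnitude weights in each group of four is one common choice rather than something the source specifies.

```python
import struct


def to_tf32(x: float) -> float:
    """Reduce an FP32 value to TF32 precision by truncation (a simplification).

    TF32 keeps FP32's sign bit and 8-bit exponent but only 10 mantissa bits,
    so the low 13 of FP32's 23 mantissa bits are cleared.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the 13 least-significant mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def prune_2_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude values in every group of four weights."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


if __name__ == "__main__":
    # Precision: a step of 2**-10 survives, a step of 2**-11 is lost.
    print(to_tf32(1.0 + 2**-10), to_tf32(1.0 + 2**-11))
    # Range: 1e38 is far beyond FP16's range but remains finite in TF32.
    print(to_tf32(1e38))
    print(prune_2_4([0.1, -0.9, 0.5, 0.05, 1.0, 2.0, -3.0, 0.0]))
```

After pruning, every group of four contains exactly two zeros, which is the pattern the Tensor Cores are described as exploiting.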

The original A100 carried 40 GB of HBM2 memory delivering about 1,555 GB/s of bandwidth, a 40 MB L2 cache (nearly seven times that of the V100), and third-generation NVLink providing 600 GB/s of total bidirectional bandwidth across 12 links, double the 300 GB/s of the V100. Its peak FP32 throughput is 19.5 TFLOPS, while TF32 Tensor Core math reaches 156 TFLOPS dense (and 312 TFLOPS with sparsity). The SXM4 module form factor carries a 400 W thermal design power.[1][6]
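
The peak-throughput figures can be approximately reproduced from the core counts given earlier. The sketch below rests on two assumptions that are not stated in the text: a boost clock of 1.41 GHz, and TF32 Tensor Core math running at half the dense FP16 rate. Each FMA is counted as two floating-point operations.

```python
# Assumption (not from the text): A100 boost clock of 1.41 GHz.
BOOST_CLOCK_HZ = 1.41e9

# Figures quoted in the text.
A100_SMS = 108
FP32_CORES = 6912
FP16_FMA_PER_SM_PER_CLOCK = 1024  # four Tensor Cores x 256 FMA each
NVLINK_TOTAL_GBPS = 600
NVLINK_LINKS = 12

FLOPS_PER_FMA = 2  # one multiply plus one add


def peak_tflops(fma_per_clock: float) -> float:
    """Peak throughput in TFLOPS for a given number of FMAs issued per clock."""
    return fma_per_clock * FLOPS_PER_FMA * BOOST_CLOCK_HZ / 1e12


fp32_tflops = peak_tflops(FP32_CORES)
fp16_tensor_tflops = peak_tflops(A100_SMS * FP16_FMA_PER_SM_PER_CLOCK)
# Assumption: TF32 runs at half the dense FP16 Tensor Core rate.
tf32_tensor_tflops = fp16_tensor_tflops / 2
# 2:4 structured sparsity doubles effective Tensor Core throughput.
tf32_sparse_tflops = tf32_tensor_tflops * 2

nvlink_per_link_gbps = NVLINK_TOTAL_GBPS / NVLINK_LINKS

print(f"FP32:          {fp32_tflops:.1f} TFLOPS")
print(f"TF32 (dense):  {tf32_tensor_tflops:.1f} TFLOPS")
print(f"TF32 (sparse): {tf32_sparse_tflops:.1f} TFLOPS")
print(f"NVLink per link: {nvlink_per_link_gbps:.0f} GB/s")
```

Under these assumptions the results land within rounding of the quoted 19.5, 156, and 312 TFLOPS, and the 600 GB/s NVLink total works out to 50 GB/s per link.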

In November 2020, at the SC20 supercomputing conference, Nvidia announced an upgraded **A100 80GB** variant that doubled the high-bandwidth memory to 80 GB using HBM2e and pushed memory bandwidth past 2 TB/s (about 2,039 GB/s) while keeping the same 400 W power envelope. Nvidia marketed the A100 as delivering up to 20 times the AI performance of the preceding V100, a peak figure that compares sparse TF32 Tensor Core math on the A100 with standard FP32 math on the V100.[6][7]

## The consumer GeForce RTX 30 series (GA10x)

For gaming and creative use, Ampere appeared in the GeForce RTX 30 series, built on a family of dies that Nvidia designated the "GA10x" class: GA102, GA104, GA106, and GA107. Unlike the GA100, these consumer dies are manufactured on a custom version of Samsung's 8 nm process branded "8N," a node tuned specifically for Nvidia.[4][8]

The largest consumer die, GA102, contains 28.3 billion transistors on a 628.4 mm² die and is organized into 7 graphics processing clusters (GPCs). It powered the top of the launch lineup, the GeForce RTX 3090 (10,496 CUDA cores, 24 GB of GDDR6X) and the RTX 3080 (8,704 CUDA cores). The smaller GA104 die (17.4 billion transistors, 392 mm²) powered the RTX 3070 (5,888 CUDA cores) and the RTX 3060 Ti, while GA106 and GA107 served lower tiers such as the RTX 3060 and RTX 3050.[4][8]

The defining change in the consumer Ampere SM was a redesigned data path that doubled FP32 throughput relative to Turing: each SM can issue floating-point math on both of its data paths, whereas Turing SMs dedicated one path to integer operations. The RTX 30 series also added second-generation ray-tracing (RT) cores, which roughly doubled ray-triangle intersection throughput over Turing, and third-generation Tensor Cores used to drive the DLSS (Deep Learning Super Sampling) upscaling feature. The series launched with GDDR6X memory on its flagship cards and arrived to unusually high demand, with availability constrained throughout 2020 and 2021 by the COVID-19 pandemic, a cryptocurrency-mining boom, and broad semiconductor shortages.[4][8]
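
The doubled FP32 data path shows up directly in the advertised CUDA core counts. The sketch below assumes 64 FP32 lanes per Turing SM and 128 per GA10x SM, along with the enabled SM count for each card; none of these per-SM or SM-count values appear in the text, so they should be treated as assumptions to check against the GA102 whitepaper.

```python
# Assumptions (not from the text): FP32 lanes per SM in each generation.
FP32_PER_SM_TURING = 64
FP32_PER_SM_AMPERE = 128

# Assumption (not from the text): enabled SMs on each card.
ENABLED_SMS = {"RTX 3090": 82, "RTX 3080": 68, "RTX 3070": 46}

# CUDA core counts quoted in the text.
QUOTED_CUDA_CORES = {"RTX 3090": 10496, "RTX 3080": 8704, "RTX 3070": 5888}

for card, sms in ENABLED_SMS.items():
    derived = sms * FP32_PER_SM_AMPERE
    status = "matches" if derived == QUOTED_CUDA_CORES[card] else "differs from"
    print(f"{card}: {sms} SMs x {FP32_PER_SM_AMPERE} = {derived} "
          f"({status} the quoted {QUOTED_CUDA_CORES[card]})")

# The same SM count with Turing-style SMs would yield half as many FP32 lanes.
fp32_ratio = FP32_PER_SM_AMPERE / FP32_PER_SM_TURING
print(f"FP32 lanes per SM, Ampere vs Turing: {fp32_ratio:.0f}x")
```

Because the second data path is shared with integer work, the doubling is a peak figure; real workloads that mix integer and floating-point instructions see a smaller gain.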

## Professional, workstation, and other data-center products

Beyond the A100 and the GeForce line, Nvidia shipped a wide range of Ampere accelerators aimed at enterprise inference, virtualization, visualization, and edge servers. As part of this generation Nvidia consolidated its professional branding, retiring the Quadro and Tesla names in favor of "NVIDIA RTX A-series" workstation cards and plain "Nvidia A-series" data-center accelerators.[5][9]

| Product | Die | Process | Memory | Primary use |
| --- | --- | --- | --- | --- |
| A100 (40 GB / 80 GB) | GA100 | TSMC 7 nm (N7) | 40 / 80 GB HBM2 / HBM2e | AI training and HPC; flagship data center |
| A800 | GA100 | TSMC 7 nm (N7) | 40 / 80 GB HBM2 / HBM2e | China-market A100 variant |
| A30 | GA100 | TSMC 7 nm (N7) | 24 GB HBM2 | Mainstream AI inference and training |
| A40 | GA102 | Samsung 8 nm (8N) | 48 GB GDDR6 | Data-center visualization and virtualization |
| A10 | GA102 | Samsung 8 nm (8N) | 24 GB GDDR6 | Graphics, virtualization, and inference |
| A16 | 4 x GA107 | Samsung 8 nm (8N) | 64 GB GDDR6 (4 x 16 GB) | Virtual desktop infrastructure |
| A2 | GA107 | Samsung 8 nm (8N) | 16 GB GDDR6 | Low-power edge and entry inference |
| RTX A6000 | GA102 | Samsung 8 nm (8N) | 48 GB GDDR6 | Professional workstation graphics |
| GeForce RTX 3090 | GA102 | Samsung 8 nm (8N) | 24 GB GDDR6X | Enthusiast gaming and content creation |

Among the data-center parts, the A30 joined the A100 as the second Ampere product to support Multi-Instance GPU partitioning, since both are based on the GA100 die. The A40 and A10 used the GA102 die for graphics-rich and virtualization workloads, the A16 combined four GA107 dies for virtual-desktop deployments, and the compact, low-power A2 (GA107) targeted edge inference. On the workstation side, the RTX A6000 was notable for enabling the GA102 die's full complement of 10,752 CUDA cores, more than the 10,496 of the GeForce RTX 3090, paired with 48 GB of GDDR6 memory. It was joined by the RTX A5000, RTX A4000, and other RTX A-series models.[5][9][10]

### The China-specific A800

Following United States export controls announced in 2022 that restricted shipments of high-performance accelerators such as the A100 to China, Nvidia created the [A800](/wiki/nvidia_a800), a China-market variant of the A100 based on the same GA100 die. The principal modification was a reduction of the NVLink interconnect bandwidth from the A100's 600 GB/s down to 400 GB/s, placing it below the 600 GB/s interconnect threshold set by the U.S. Bureau of Industry and Security and thereby keeping the part exportable. Nvidia put the A800 into production in the third quarter of 2022; it was later itself restricted by tightened 2023 export rules.[11][12]

## Process nodes

A distinctive feature of the Ampere generation is its split manufacturing strategy. The compute-oriented GA100 (and the A100, A30, and A800 built from it) is fabricated on TSMC's 7 nm N7 process, while the GA10x dies (GA102 through GA107), which serve every consumer product and most professional and visualization products, are built on a custom version of Samsung's 8 nm process branded 8N. Using two foundries and two nodes allowed Nvidia to reserve the denser, more expensive TSMC process for its largest data-center chip while securing high-volume capacity at Samsung for the consumer GeForce line.[1][8]

## Place in the lineage and significance

Ampere occupies a pivotal position in Nvidia's GPU roadmap. It unified the compute lineage that had run through Volta (the V100) and the graphics-plus-ray-tracing lineage of Turing under a single architectural generation, even though it continued to use distinct dies and process nodes for the two segments. It was in turn succeeded by two specialized architectures: Hopper (the H100) took over the data-center role in 2022, while Ada Lovelace (the GeForce RTX 40 series) succeeded it in consumer graphics, also in 2022.[1]

The A100's combination of third-generation Tensor Cores, TF32, structural sparsity, large HBM2/HBM2e capacity, and high-bandwidth NVLink made it the standard accelerator for training and deploying large neural networks from 2020 to 2022. It anchored Nvidia's DGX A100 systems and HGX A100 baseboards and was deployed at scale in cloud data centers, where it powered much of the foundational research and early production of large language models before the H100 became broadly available. In this sense Ampere, and the A100 in particular, was the architecture on which the modern era of large-scale generative AI was largely built.[2][6]

## References

[1] NVIDIA, "NVIDIA Ampere Architecture In-Depth," NVIDIA Technical Blog. https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/

[2] NVIDIA, "NVIDIA's New Ampere Data Center GPU in Full Production," NVIDIA Newsroom, May 14, 2020. https://nvidianews.nvidia.com/news/nvidias-new-ampere-data-center-gpu-in-full-production

[3] Ryan Smith, "NVIDIA Announces Ampere Architecture and A100 Products," AnandTech, May 14, 2020. https://www.anandtech.com/show/15801/nvidia-announces-ampere-architecture-and-a100-products

[4] "GeForce RTX 30 series," Wikipedia. https://en.wikipedia.org/wiki/GeForce_RTX_30_series

[5] "Ampere (microarchitecture)," Wikipedia. https://en.wikipedia.org/wiki/Ampere_(microarchitecture)

[6] NVIDIA, "NVIDIA A100 Tensor Core GPU Architecture" (whitepaper). https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf

[7] NVIDIA, "NVIDIA Doubles Down: Announces A100 80GB GPU, Supercharging World's Most Powerful GPU for AI Supercomputing," NVIDIA Newsroom, November 16, 2020. https://nvidianews.nvidia.com/news/nvidia-doubles-down-announces-a100-80gb-gpu-supercharging-worlds-most-powerful-gpu-for-ai-supercomputing

[8] NVIDIA, "NVIDIA Ampere GA102 GPU Architecture" (whitepaper). https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.1.pdf

[9] VideoCardz, "NVIDIA announces RTX A6000 workstation graphics card with full GA102 GPU is now available." https://videocardz.com/newz/nvidia-announces-rtx-a6000-workstation-graphics-card-with-full-ga102-gpu-is-now-available

[10] Tom's Hardware, "Nvidia Launches New Professional Ampere Graphics for Desktops and Laptops." https://www.tomshardware.com/news/nvidia-launches-a-host-of-professional-ampere-gpus

[11] WCCFTech, "NVIDIA & Partners Reallocating 'China-Only' A800 AI GPU Supply As US Restrictions Go Into Affect." https://wccftech.com/nvidia-reallocating-china-only-a800-ai-gpu-supply-as-us-restrictions-go-into-affect/

[12] U.S. Securities and Exchange Commission, NVIDIA Corporation Form 8-K, August 26, 2022. https://www.sec.gov/Archives/edgar/data/0001045810/000104581022000146/nvda-20220826.htm

