Ampere (microarchitecture)

Updated Jun 27, 2026

Ampere is a graphics processing unit (GPU) microarchitecture from Nvidia, announced on May 14, 2020, as the successor to the Volta and Turing architectures. It spans Nvidia's full product range, from the data-center A100 accelerator that became the standard chip for training large neural networks to the consumer GeForce RTX 30 series of gaming cards. Built around third-generation Tensor Cores with new features such as the TensorFloat-32 (TF32) numeric format, structured sparsity, and Multi-Instance GPU (MIG) partitioning, the A100 was the dominant accelerator for deep learning from roughly 2020 to 2022 and underpinned much of the early wave of large language models before being succeeded by Hopper (the H100) in 2022.^[1]^[2]

The architecture is named after André-Marie Ampère, the French mathematician and physicist who was one of the founders of the science of electromagnetism and after whom the ampere, the SI unit of electric current, is named. The naming follows Nvidia's long-standing convention of honoring noted scientists and mathematicians, a series that includes the preceding Volta (Alessandro Volta) and Turing (Alan Turing) architectures and the succeeding Hopper (Grace Hopper) and Ada Lovelace architectures.^[1]

What is NVIDIA Ampere?

NVIDIA Ampere is the 2020 GPU architecture that unified Nvidia's compute and graphics lines under a single generation. On the data-center side it is best known for the A100 Tensor Core GPU; on the consumer side it is the basis of the GeForce RTX 30 series. Across both segments Ampere served as Nvidia's principal compute and graphics platform, and the A100 in particular was the workhorse GPU for large-scale AI training and inference during the period of roughly 2020 to 2022.^[1]^[2]

When was Ampere announced and released?

Nvidia unveiled the Ampere architecture on May 14, 2020, during the keynote of its GPU Technology Conference (GTC), which was held online that year. The launch vehicle was the data-center A100 Tensor Core GPU, which Nvidia announced as already being in full production and shipping to customers. Chief executive Jensen Huang presented the A100 alongside the DGX A100 system, an integrated server built around eight A100 accelerators.^[2]^[3]

At the announcement, Huang framed the A100 as a generational shift for the data center: "The powerful trends of cloud computing and AI are driving a tectonic shift in data center designs so that what was once a sea of CPU-only servers is now GPU-accelerated computing. NVIDIA A100 GPU is a 20x AI performance leap and an end-to-end machine learning accelerator, from data analytics to training to inference."^[2]

The consumer side of the architecture followed several months later. Nvidia announced the GeForce RTX 30 series on September 1, 2020, with the first cards reaching the market beginning September 17, 2020. Professional and workstation Ampere products, marketed under the new NVIDIA RTX A-series and Nvidia A-series names, were announced beginning in October 2020.^[4]^[5]

What is the A100 (the GA100 data-center die)?

The flagship compute die of the Ampere generation is GA100, which powers the A100 accelerator. GA100 is fabricated on TSMC's 7 nm (N7) process and contains 54.2 billion transistors on a die measuring 826 mm², making it the largest 7 nm chip put into volume production at the time of its launch. The full GA100 design comprises 128 streaming multiprocessors (SMs) and 8,192 FP32 CUDA cores, although the shipping A100 enabled 108 SMs and 6,912 FP32 CUDA cores to improve manufacturing yields, together with 3,456 FP64 cores.^[1]^[6]

The A100 introduced several capabilities that proved central to AI workloads:

Third-generation Tensor Cores. Each A100 SM contains four third-generation Tensor Cores, and each core performs 256 FP16/FP32 fused-multiply-add (FMA) operations per clock, so a single SM delivers 1,024 dense FP16/FP32 FMA operations per clock, roughly double the per-SM matrix throughput of Volta and Turing. The cores support a broad set of numeric formats, including FP16, BF16 (bfloat16), INT8, INT4, and binary, and they also extended Tensor Core acceleration to FP64 for high-performance computing, reaching about 2.5 times the FP64 throughput of the V100.^[1]
TensorFloat-32 (TF32). Ampere introduced a new numeric mode for Tensor Cores called TF32, which combines the wide 8-bit exponent (and therefore the numeric range) of FP32 with the 10-bit mantissa precision of FP16. TF32 lets the Tensor Cores accelerate FP32 inputs automatically, without code changes; Nvidia states that TF32 Tensor Core operations run up to 10 times faster than V100 FP32 FMA operations, or up to 20 times faster with structured sparsity, while preserving the dynamic range that deep-learning training requires.^[1]
Fine-grained structural sparsity. The Tensor Cores can exploit a 2:4 structured-sparsity pattern, in which two of every four elements in a weight vector are zero. Skipping the zeros doubles effective Tensor Core throughput for models pruned to this pattern.^[1]
Multi-Instance GPU (MIG). A single A100 can be partitioned into as many as seven isolated GPU instances, each with its own dedicated slice of compute, L2 cache, memory controllers, and high-bandwidth memory. MIG lets a single physical GPU be shared securely among multiple users or jobs, improving utilization for inference and multi-tenant workloads.^[1]^[6]
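As a rough illustration of two of the features above, the following Python sketch emulates TF32 precision by clearing the low 13 mantissa bits of an FP32 value, and prunes a weight list to the 2:4 pattern. The function names `to_tf32` and `prune_2_of_4` are hypothetical. Truncation stands in for whatever rounding the hardware applies, and keeping the two largest-magnitude weights in each group is an assumed pruning criterion; the text above specifies only the bit layout and the two-zeros-in-four pattern.

```python
import struct


def to_tf32(x: float) -> float:
    """Emulate TF32 precision: 1 sign bit, 8 exponent bits, 10 mantissa bits.

    Assumption: the 13 low mantissa bits of the FP32 encoding are simply
    truncated. Real hardware rounding may differ.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # FP32 bit pattern
    bits &= 0xFFFFE000  # keep sign, exponent, and top 10 of 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def prune_2_of_4(weights):
    """Zero two of every four consecutive weights (2:4 structured sparsity).

    Assumption: the two smallest-magnitude weights in each group are dropped.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    # 2**-10 is the smallest mantissa step TF32 can represent next to 1.0.
    print(to_tf32(1.0 + 2**-10), to_tf32(1.0 + 2**-11))
    print(prune_2_of_4([0.1, -0.9, 0.5, 0.05, 1.0, 2.0, 3.0, 4.0]))
```

Because the exponent field is unchanged, a value such as 1e38, which would overflow FP16, stays finite after `to_tf32`; only the mantissa loses precision.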

The original A100 carried 40 GB of HBM2 memory delivering about 1,555 GB/s of bandwidth, a 40 MB L2 cache (nearly seven times that of the V100), and third-generation NVLink providing 600 GB/s of total bidirectional bandwidth across 12 links, double the 300 GB/s of the V100. Its peak FP32 throughput is 19.5 TFLOPS, while TF32 Tensor Core math reaches 156 TFLOPS dense (and 312 TFLOPS with sparsity). The SXM4 module form factor carries a 400 W thermal design power.^[1]^[6]
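The headline throughput figures above can be approximately reproduced from the unit counts given earlier in this section. The sketch below assumes a boost clock of 1,410 MHz and a TF32 rate of half the FP16 Tensor Core rate; neither value is stated in this article, so both should be read as assumptions. Each fused multiply-add is counted as two floating-point operations.

```python
# Back-of-envelope peak throughput for the shipping A100.
SM_COUNT = 108                            # enabled SMs
TENSOR_CORES_PER_SM = 4
FP16_FMA_PER_TENSOR_CORE_PER_CLOCK = 256
FP32_CUDA_CORES = 6912
BOOST_CLOCK_HZ = 1.41e9                   # assumption: 1,410 MHz boost clock
TF32_RATE_VS_FP16 = 0.5                   # assumption: TF32 at half the FP16 rate


def peak_tflops(fma_per_clock, clock_hz=BOOST_CLOCK_HZ):
    """Convert FMA operations per clock to TFLOPS (one FMA = two FLOPs)."""
    return fma_per_clock * 2 * clock_hz / 1e12


fp32_tflops = peak_tflops(FP32_CUDA_CORES)

fp16_tensor_fma_per_clock = (
    SM_COUNT * TENSOR_CORES_PER_SM * FP16_FMA_PER_TENSOR_CORE_PER_CLOCK
)
tf32_dense_tflops = peak_tflops(fp16_tensor_fma_per_clock * TF32_RATE_VS_FP16)
tf32_sparse_tflops = 2 * tf32_dense_tflops   # 2:4 sparsity doubles the rate

nvlink_gb_per_s_per_link = 600 / 12          # total bandwidth over 12 links

if __name__ == "__main__":
    print(f"FP32:        {fp32_tflops:.1f} TFLOPS")
    print(f"TF32 dense:  {tf32_dense_tflops:.0f} TFLOPS")
    print(f"TF32 sparse: {tf32_sparse_tflops:.0f} TFLOPS")
    print(f"NVLink:      {nvlink_gb_per_s_per_link:.0f} GB/s per link")
```

Under these assumptions the results round to the quoted 19.5, 156, and 312 TFLOPS, which suggests the published peaks are simple products of unit count, per-clock rate, and clock speed rather than measured figures.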

In November 2020, at the SC20 supercomputing conference, Nvidia announced an upgraded A100 80GB variant that doubled the high-bandwidth memory to 80 GB using HBM2e and pushed memory bandwidth past 2 TB/s (about 2,039 GB/s) while keeping the same 400 W power envelope. Nvidia marketed the A100 as delivering up to 20 times the AI performance of the preceding V100.^[6]^[7]

A100 specification	40 GB (SXM4)	80 GB (SXM4)
GPU die	GA100	GA100
Process node	TSMC 7 nm (N7)	TSMC 7 nm (N7)
Transistors	54.2 billion	54.2 billion
FP32 CUDA cores	6,912	6,912
Memory	40 GB HBM2	80 GB HBM2e
Memory bandwidth	~1,555 GB/s	~2,039 GB/s
Peak FP32	19.5 TFLOPS	19.5 TFLOPS
TF32 Tensor (dense / sparse)	156 / 312 TFLOPS	156 / 312 TFLOPS
NVLink bandwidth	600 GB/s	600 GB/s
TDP	400 W	400 W

What is the consumer GeForce RTX 30 series (GA10x)?

For gaming and creative use, Ampere appeared in the GeForce RTX 30 series, built on a family of dies that Nvidia designated the "GA10x" class: GA102, GA104, GA106, and GA107. Unlike the GA100, these consumer dies are manufactured on a custom version of Samsung's 8 nm process branded "8N," a node tuned specifically for Nvidia.^[4]^[8]

The largest consumer die, GA102, contains 28.3 billion transistors on a 628.4 mm² die and is organized into 7 graphics processing clusters (GPCs). It powered the top of the launch lineup, the GeForce RTX 3090 (10,496 CUDA cores, 24 GB of GDDR6X) and the RTX 3080 (8,704 CUDA cores). The smaller GA104 die (17.4 billion transistors, 392 mm²) powered the RTX 3070 (5,888 CUDA cores) and the RTX 3060 Ti, while GA106 and GA107 served lower tiers such as the RTX 3060 and RTX 3050.^[4]^[8]

The defining change in the consumer Ampere SM was a redesigned data path that doubled FP32 throughput relative to Turing: each SM can issue floating-point math on both of its data paths, whereas Turing SMs dedicated one path to integer operations. The RTX 30 series also added second-generation ray-tracing (RT) cores, which roughly doubled ray-triangle intersection throughput over Turing, and third-generation Tensor Cores used to drive the DLSS (Deep Learning Super Sampling) upscaling feature. The series launched with GDDR6X memory on its flagship cards and arrived to unusually high demand, with availability constrained from launch until 2022 by the COVID-19 pandemic, a cryptocurrency-mining boom, and the broad 2020-2023 global semiconductor shortage.^[4]^[8]
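The effect of the doubled FP32 data path on the advertised CUDA core counts can be checked with a short calculation. The sketch below assumes 128 FP32 lanes per GA10x SM against 64 per Turing SM, and enabled SM counts of 82, 68, and 46 for the RTX 3090, RTX 3080, and RTX 3070; these per-SM and per-card figures are not given in this article and are assumptions here.

```python
# FP32 lanes counted as "CUDA cores" per SM (assumed values).
FP32_LANES_PER_SM = {"Turing": 64, "Ampere GA10x": 128}

# Enabled SMs per card (assumed values, not stated in the article).
SM_COUNTS = {
    "GeForce RTX 3090": 82,
    "GeForce RTX 3080": 68,
    "GeForce RTX 3070": 46,
}


def cuda_cores(sm_count, architecture="Ampere GA10x"):
    """Advertised CUDA core count = enabled SMs x FP32 lanes per SM."""
    return sm_count * FP32_LANES_PER_SM[architecture]


if __name__ == "__main__":
    for card, sms in SM_COUNTS.items():
        ampere = cuda_cores(sms)
        turing_style = cuda_cores(sms, "Turing")
        print(f"{card}: {ampere} cores ({turing_style} if counted Turing-style)")
```

With these inputs the products match the core counts quoted above (10,496, 8,704, and 5,888). The same SM count counted Turing-style gives exactly half, which is why core counts are not directly comparable across the two generations.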

What professional, workstation, and other data-center Ampere products were there?

Beyond the A100 and the GeForce line, Nvidia shipped a wide range of Ampere accelerators aimed at enterprise inference, virtualization, visualization, and edge servers. As part of this generation Nvidia consolidated its professional branding, retiring the Quadro and Tesla names in favor of "NVIDIA RTX A-series" workstation cards and plain "Nvidia A-series" data-center accelerators.^[5]^[9]

Product	Die	Process	Memory	Primary use
A100 (40 GB / 80 GB)	GA100	TSMC 7 nm (N7)	40 / 80 GB HBM2 / HBM2e	AI training and HPC; flagship data center
A800	GA100	TSMC 7 nm (N7)	40 / 80 GB HBM2 / HBM2e	China-market A100 variant
A30	GA100	TSMC 7 nm (N7)	24 GB HBM2	Mainstream AI inference and training
A40	GA102	Samsung 8 nm (8N)	48 GB GDDR6	Data-center visualization and virtualization
A10	GA102	Samsung 8 nm (8N)	24 GB GDDR6	Graphics, virtualization, and inference
A16	4 x GA107	Samsung 8 nm (8N)	64 GB GDDR6 (4 x 16 GB)	Virtual desktop infrastructure
A2	GA107	Samsung 8 nm (8N)	16 GB GDDR6	Low-power edge and entry inference
RTX A6000	GA102	Samsung 8 nm (8N)	48 GB GDDR6	Professional workstation graphics
GeForce RTX 3090	GA102	Samsung 8 nm (8N)	24 GB GDDR6X	Enthusiast gaming and content creation

Among the data-center parts, the A30 joined the A100 as the second Ampere product to support Multi-Instance GPU partitioning, since both are based on the GA100 die. The A40 and A10 used the GA102 die for graphics-rich and virtualization workloads, the A16 combined four GA107 dies for virtual-desktop deployments, and the compact, low-power A2 (GA107) targeted edge inference. On the workstation side, the RTX A6000 enabled the GA102 die's full complement of 10,752 CUDA cores, which none of the launch GeForce cards did, and paired it with 48 GB of GDDR6 memory. It was joined by the RTX A5000, RTX A4000, and other RTX A-series models.^[5]^[9]^[10]

What was the China-specific A800?

Following United States export controls announced in 2022 that restricted shipments of high-performance accelerators such as the A100 to China, Nvidia created the A800, a China-market variant of the A100 based on the same GA100 die. The principal modification was a reduction of the NVLink interconnect bandwidth from the A100's 600 GB/s down to 400 GB/s, placing it below the 600 GB/s interconnect threshold set by the U.S. Bureau of Industry and Security and thereby keeping the part exportable. Nvidia put the A800 into production in the third quarter of 2022; it was later itself restricted by tightened 2023 export rules.^[11]^[12]

What process nodes does Ampere use?

A distinctive feature of the Ampere generation is its split manufacturing strategy. The compute-oriented GA100 (and the A100, A30, and A800 built from it) is fabricated on TSMC's 7 nm N7 process, while the GA10x dies used in consumer and most professional and visualization products, GA102 through GA107, are built on a custom version of Samsung's 8 nm process branded 8N. Using two foundries and two nodes allowed Nvidia to reserve the denser, more expensive TSMC process for its largest data-center chip while securing high-volume capacity at Samsung for the consumer GeForce line.^[1]^[8]
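The density gap between the two nodes can be estimated from the transistor counts and die areas quoted elsewhere in this article. The following sketch is only a whole-die average; it ignores differences in how much of each die is logic, cache, or I/O, so it should not be read as a like-for-like process comparison. The dictionary and function names are illustrative.

```python
# (transistors, die area in mm^2, process), using figures quoted in this article.
DIES = {
    "GA100": (54.2e9, 826.0, "TSMC 7 nm (N7)"),
    "GA102": (28.3e9, 628.4, "Samsung 8 nm (8N)"),
    "GA104": (17.4e9, 392.0, "Samsung 8 nm (8N)"),
}


def density_mtr_per_mm2(transistors, area_mm2):
    """Average density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


density = {
    name: density_mtr_per_mm2(transistors, area)
    for name, (transistors, area, _process) in DIES.items()
}

if __name__ == "__main__":
    for name, (_t, _a, process) in DIES.items():
        print(f"{name} ({process}): {density[name]:.1f} MTr/mm^2")
```

On these figures GA100 averages roughly 66 million transistors per mm², against about 45 million for GA102 and 44 million for GA104, consistent with the description of the TSMC node as the denser of the two.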

How does Ampere compare to Hopper, and why does it matter?

Ampere occupies a pivotal position in Nvidia's GPU roadmap. It unified the compute lineage that had run through Volta (the V100) and the graphics-plus-ray-tracing lineage of Turing under a single architectural generation, even though it continued to use distinct dies and process nodes for the two segments. It was in turn succeeded by two specialized architectures: Hopper (the H100) took over the data-center role in 2022, while Ada Lovelace (the GeForce RTX 40 series) succeeded it in consumer graphics, also in 2022.^[1]

The successor Hopper H100, announced at GTC 2022, moved the data-center die to a custom TSMC 4 nm process with about 80 billion transistors, roughly doubled A100 memory bandwidth to over 3 TB/s using HBM3, and added a dedicated Transformer Engine for FP8 training, which Nvidia positioned as an order-of-magnitude performance leap over the A100 for large-scale AI.^[13]

Generation	Flagship GPU	Die	Process	Transistors	Year
Volta	V100	GV100	TSMC 12 nm	21.1 billion	2017
Ampere	A100	GA100	TSMC 7 nm	54.2 billion	2020
Hopper	H100	GH100	TSMC 4 nm	~80 billion	2022

The A100's combination of third-generation Tensor Cores, TF32, structural sparsity, large HBM2/HBM2e capacity, and high-bandwidth NVLink made it the standard accelerator for training and deploying large neural networks in the period from 2020 to 2022. It anchored Nvidia's DGX A100 systems and HGX A100 baseboards and was deployed at scale in cloud data centers, where it powered much of the foundational research and early production of large language models before the H100 became broadly available. In this sense Ampere, and the A100 in particular, marked the architecture on which the modern era of large-scale generative AI was largely built.^[2]^[6]

References

1. NVIDIA, "NVIDIA Ampere Architecture In-Depth," NVIDIA Technical Blog. https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/
2. NVIDIA, "NVIDIA's New Ampere Data Center GPU in Full Production," NVIDIA Newsroom, May 14, 2020. https://nvidianews.nvidia.com/news/nvidias-new-ampere-data-center-gpu-in-full-production
3. Ryan Smith, "NVIDIA Announces Ampere Architecture and A100 Products," AnandTech, May 14, 2020. https://www.anandtech.com/show/15801/nvidia-announces-ampere-architecture-and-a100-products
4. "GeForce RTX 30 series," Wikipedia. https://en.wikipedia.org/wiki/GeForce_RTX_30_series
5. "Ampere (microarchitecture)," Wikipedia. https://en.wikipedia.org/wiki/Ampere_(microarchitecture)
6. NVIDIA, "NVIDIA A100 Tensor Core GPU Architecture" (whitepaper). https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf
7. NVIDIA, "NVIDIA Doubles Down: Announces A100 80GB GPU, Supercharging World's Most Powerful GPU for AI Supercomputing," NVIDIA Newsroom, November 16, 2020. https://nvidianews.nvidia.com/news/nvidia-doubles-down-announces-a100-80gb-gpu-supercharging-worlds-most-powerful-gpu-for-ai-supercomputing
8. NVIDIA, "NVIDIA Ampere GA102 GPU Architecture" (whitepaper). https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.1.pdf
9. VideoCardz, "NVIDIA announces RTX A6000 workstation graphics card with full GA102 GPU is now available." https://videocardz.com/newz/nvidia-announces-rtx-a6000-workstation-graphics-card-with-full-ga102-gpu-is-now-available
10. Tom's Hardware, "Nvidia Launches New Professional Ampere Graphics for Desktops and Laptops." https://www.tomshardware.com/news/nvidia-launches-a-host-of-professional-ampere-gpus
11. WCCFTech, "NVIDIA & Partners Reallocating 'China-Only' A800 AI GPU Supply As US Restrictions Go Into Affect." https://wccftech.com/nvidia-reallocating-china-only-a800-ai-gpu-supply-as-us-restrictions-go-into-affect/
12. U.S. Securities and Exchange Commission, NVIDIA Corporation Form 8-K, August 26, 2022. https://www.sec.gov/Archives/edgar/data/0001045810/000104581022000146/nvda-20220826.htm
13. NVIDIA, "NVIDIA Announces Hopper Architecture, the Next Generation of Accelerated Computing," NVIDIA Newsroom, March 22, 2022. https://nvidianews.nvidia.com/news/nvidia-announces-hopper-architecture-the-next-generation-of-accelerated-computing
