Volta (microarchitecture)
Last reviewed
Jun 3, 2026
Sources
12 citations
Review status
Source-backed
Revision
v1 · 2,004 words
Volta is a GPU microarchitecture developed by Nvidia and introduced in 2017. It is best known as the architecture of the Tesla V100, the data center accelerator that brought the first generation of Tensor Cores to market and established matrix-math acceleration as a defining feature of GPUs used for deep learning. Volta succeeded the Pascal architecture in Nvidia's data center and professional lines. Unusually, it was never used for a mainstream consumer GeForce gaming product: those markets moved instead to the Turing architecture, while the data center line continued with Ampere and its A100. Volta is widely regarded as a pivotal architecture in the history of AI hardware, both for popularizing dedicated tensor math units and for powering the Summit and Sierra supercomputers, which held the top two positions on the TOP500 list in November 2018.
Following Nvidia's convention of naming microarchitectures after scientists, Volta is named after Alessandro Volta (1745–1827), the Italian physicist and chemist who invented the voltaic pile, an early electrochemical battery, and after whom the volt is named. The codename had appeared on Nvidia's public GPU roadmaps as early as 2013, several years before silicon shipped.
Nvidia formally unveiled Volta on May 10, 2017, when chief executive Jensen Huang introduced the Tesla V100 accelerator during the opening keynote of the company's GPU Technology Conference (GTC) in San Jose, California. Huang framed the launch as a step beyond what conventional process scaling alone could deliver, citing roughly a 5x increase in peak deep learning throughput over the preceding Pascal-based Tesla P100. Volume shipments of the V100 began in the third quarter of 2017. Additional Volta products followed, including the Titan V in December 2017 and, at GTC 2018, a 32 GB version of the V100 and the Quadro GV100 workstation card.
Volta's flagship silicon is the GV100 GPU. It contains 21.1 billion transistors on a die of approximately 815 mm², manufactured on TSMC's 12 nm FFN ("FinFET NVIDIA") process, a refinement of TSMC's 16 nm FinFET technology customized for Nvidia. At the time it was one of the largest production chips ever fabricated, approaching the practical reticle limit of contemporary lithography.
The full GV100 is organized into six Graphics Processing Clusters (GPCs) containing a total of 84 streaming multiprocessors (SMs). Each Volta SM holds 64 single-precision FP32 cores, 64 integer INT32 cores, 32 double-precision FP64 cores, and 8 Tensor Cores, giving the complete die 5,376 FP32 (CUDA) cores, 2,688 FP64 cores, and 672 Tensor Cores. Shipping products did not enable the full die: to improve manufacturing yield, the Tesla V100 and the other Volta cards activate 80 of the 84 SMs, for 5,120 CUDA cores and 640 Tensor Cores.
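The totals above follow directly from the per-SM unit counts. A minimal sketch of that arithmetic, using only figures stated in the text (the names `PER_SM` and `totals` are illustrative, not Nvidia terminology):

```python
# Per-SM execution resources, as stated in the text above.
PER_SM = {"fp32": 64, "int32": 64, "fp64": 32, "tensor": 8}


def totals(sm_count):
    """Multiply each per-SM unit count by the number of active SMs."""
    return {unit: count * sm_count for unit, count in PER_SM.items()}


full_die = totals(84)   # complete GV100 die
shipping = totals(80)   # Tesla V100 and the other shipping Volta cards

print("Full GV100:", full_die)
print("Shipping  :", shipping)
```

Disabling 4 of the 84 SMs costs about 4.8 percent of every unit type, which is the trade made for manufacturing yield.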
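Beyond raw core counts, Volta introduced several changes to the SM that influenced later Nvidia designs, including separate FP32 and INT32 datapaths that can execute concurrently, independent thread scheduling, and a combined L1 data cache and shared memory.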
The Tensor Core was Volta's signature contribution. Before Volta, GPUs accelerated deep learning using their general-purpose CUDA cores; Volta added hardware dedicated specifically to the dense matrix multiplications that dominate the training and inference of deep neural networks. By computing in reduced-precision FP16 while accumulating in FP32, the cores delivered far higher throughput than FP32 math while preserving enough numerical accuracy for many models, a technique known as mixed-precision training.
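Why the accumulator precision matters can be illustrated in software. The sketch below is not how a Tensor Core is implemented; it only emulates FP16 and FP32 rounding with Python's `struct` module (format codes `e` and `f`) and compares a dot product accumulated in FP16 with one accumulated in FP32. The helper names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(a, b, round_acc):
    """Dot product with FP16 inputs; round_acc sets the accumulator precision."""
    acc = 0.0
    for x, y in zip(a, b):
        # The product of two FP16 values is exact in a Python double.
        product = to_fp16(x) * to_fp16(y)
        # Round the running sum to the chosen accumulator format.
        acc = round_acc(acc + product)
    return acc


ones = [1.0] * 4096
fp16_sum = dot(ones, ones, to_fp16)   # stalls once the sum reaches 2048
fp32_sum = dot(ones, ones, to_fp32)   # reaches the exact answer

print(fp16_sum)  # 2048.0
print(fp32_sum)  # 4096.0
```

FP16 has an 11-bit significand, so once the running sum reaches 2048 adding 1.0 no longer changes it. Keeping the accumulator in FP32 avoids this stall while the inputs stay in the cheaper FP16 format.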
The performance gap was substantial. Nvidia rated the V100's 640 Tensor Cores at roughly 120 deep learning TFLOPS at launch (later revised to about 125 TFLOPS for the SXM2 form factor), compared with the card's roughly 15 TFLOPS of FP32 throughput. Nvidia claimed up to 12x higher peak throughput for training and up to 6x for inference relative to the Pascal-based P100. The approach proved durable: Tensor Cores became a permanent fixture of Nvidia data center and professional GPUs, expanding in later architectures to support additional numeric formats such as INT8, TF32, BF16, and FP8 through Ampere, Hopper, and Blackwell.
The following table summarizes the GV100 die and the principal Tesla V100 configurations.
| Specification | Full GV100 | Tesla V100 (SXM2) | Tesla V100 (PCIe) |
|---|---|---|---|
| Process | TSMC 12 nm FFN | TSMC 12 nm FFN | TSMC 12 nm FFN |
| Transistors | 21.1 billion | 21.1 billion | 21.1 billion |
| Die size | ~815 mm² | ~815 mm² | ~815 mm² |
| SMs | 84 | 80 | 80 |
| CUDA (FP32) cores | 5,376 | 5,120 | 5,120 |
| FP64 cores | 2,688 | 2,560 | 2,560 |
| Tensor Cores | 672 | 640 | 640 |
| FP64 peak | n/a | ~7.8 TFLOPS | ~7 TFLOPS |
| FP32 peak | n/a | ~15.7 TFLOPS | ~14 TFLOPS |
| Tensor (deep learning) | n/a | ~125 TFLOPS | ~112 TFLOPS |
| Memory | up to 32 GB HBM2 | 16 or 32 GB HBM2 | 16 or 32 GB HBM2 |
| Memory bus | 4,096-bit | 4,096-bit | 4,096-bit |
| Memory bandwidth | up to 900 GB/s | 900 GB/s | 900 GB/s |
| Interconnect | NVLink 2.0 | NVLink 2.0, ~300 GB/s | PCIe 3.0, ~32 GB/s |
| TDP | n/a | 300 W | 250 W |
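The SXM2 peak figures in the table can be approximately reproduced from the unit counts. The sketch below rests on assumptions not stated in this article: a boost clock of about 1.53 GHz, one fused multiply-add (FMA) per core per clock counted as two floating-point operations, and 64 FMAs per Tensor Core per clock. Treat the constants as illustrative.

```python
BOOST_CLOCK_HZ = 1.53e9       # assumed SXM2 boost clock (not given in the article)
FLOPS_PER_FMA = 2             # one multiply plus one add
TENSOR_FMAS_PER_CLOCK = 64    # assumed FMAs per Tensor Core per clock


def peak_tflops(units, fmas_per_unit_per_clock=1, clock_hz=BOOST_CLOCK_HZ):
    """Peak throughput in TFLOPS for `units` execution units at `clock_hz`."""
    return units * fmas_per_unit_per_clock * FLOPS_PER_FMA * clock_hz / 1e12


fp32 = peak_tflops(5120)                            # CUDA (FP32) cores
fp64 = peak_tflops(2560)                            # FP64 cores
tensor = peak_tflops(640, TENSOR_FMAS_PER_CLOCK)    # Tensor Cores

print(f"FP32   ~{fp32:.1f} TFLOPS")
print(f"FP64   ~{fp64:.1f} TFLOPS")
print(f"Tensor ~{tensor:.0f} TFLOPS")
```

Under these assumptions the results land close to the table's ~15.7, ~7.8, and ~125 TFLOPS. The PCIe card's lower figures would correspond to a lower clock within its 250 W envelope.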
The V100 paired the GV100 with four stacks of second-generation high-bandwidth memory (HBM2) on a 4,096-bit interface, delivering 900 GB/s of peak bandwidth, about 25 percent more than the P100's roughly 720 GB/s. The card launched with 16 GB of HBM2; a 32 GB variant arrived in 2018. The SXM2 module form factor exposed Nvidia's second-generation NVLink interconnect, with six links of 25 GB/s in each direction (50 GB/s bidirectional per link) for an aggregate 300 GB/s of GPU-to-GPU bandwidth, roughly twice that of the first-generation NVLink on Pascal. A PCI Express add-in-card variant communicated over the slower PCIe 3.0 bus and carried a lower 250 W power envelope versus 300 W for the SXM2 module.
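The bandwidth figures can be checked with simple arithmetic. This sketch assumes the 25 GB/s NVLink figure is per direction, so each link carries 50 GB/s bidirectionally, and it derives the per-pin HBM2 data rate implied by the stated totals rather than quoting a memory clock. The function names are hypothetical.

```python
def implied_pin_rate_gbps(bandwidth_gb_s, bus_bits):
    """Per-pin data rate (Gbit/s) implied by a total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_bits


def nvlink_aggregate_gb_s(links, gb_s_per_direction, directions=2):
    """Aggregate NVLink bandwidth, counting both directions of every link."""
    return links * gb_s_per_direction * directions


hbm2_pin_rate = implied_pin_rate_gbps(900, 4096)   # V100: 900 GB/s on 4,096 bits
nvlink_total = nvlink_aggregate_gb_s(6, 25)        # six links, 25 GB/s each way
nvlink_vs_pcie = nvlink_total / 32                 # versus ~32 GB/s for PCIe 3.0

print(f"HBM2 per-pin rate: ~{hbm2_pin_rate:.2f} Gbit/s")
print(f"NVLink aggregate : {nvlink_total} GB/s")
print(f"NVLink vs PCIe   : ~{nvlink_vs_pcie:.1f}x")
```

The 300 GB/s aggregate only follows from six links when both directions are counted; six links at 25 GB/s in one direction give 150 GB/s.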
Volta reached the market in four main products, none of them a mainstream gaming card.
| Product | Released | CUDA cores | Tensor Cores | Memory | Notes |
|---|---|---|---|---|---|
| Tesla V100 | 2017 (16 GB), 2018 (32 GB) | 5,120 | 640 | 16 or 32 GB HBM2 | Data center accelerator; SXM2 and PCIe forms |
| Titan V | December 2017 | 5,120 | 640 | 12 GB HBM2 | "Prosumer" card, USD 2,999 |
| Titan V CEO Edition | 2018 | 5,120 | 640 | 32 GB HBM2 | Limited edition given to researchers |
| Quadro GV100 | March 2018 | 5,120 | 640 | 32 GB HBM2 | Workstation card, USD 8,999 |
The Tesla V100 was the workhorse of the lineup and the basis for nearly all of Volta's deployments in data centers and supercomputers. The Titan V, launched in December 2017 at USD 2,999, was marketed to researchers and enthusiasts as the first GV100-based card sold through Nvidia's own store; notably, it carried a cut-down memory subsystem with 12 GB of HBM2 across a 3,072-bit interface (three stacks rather than four) for roughly 653 GB/s of bandwidth. A limited Titan V CEO Edition with the full 32 GB of memory was later distributed to selected AI researchers. The Quadro GV100, announced at GTC 2018 at USD 8,999, targeted professional visualization and added support for Nvidia's RTX real-time ray tracing software stack, pairing 32 GB of HBM2 with NVLink for dual-card configurations.
Volta never appeared in a mainstream GeForce gaming graphics card. The architecture's emphasis on FP64 throughput, Tensor Cores, and expensive HBM2 memory suited data center and professional workloads but was poorly matched to cost-sensitive gaming. Nvidia instead carried Volta's tensor and scheduling advances forward into the Turing architecture, launched in 2018, which served as the consumer and professional graphics successor and added dedicated ray-tracing (RT) cores. The data center lineage continued separately with the Ampere A100 in 2020.
Volta GPUs anchored several of the most powerful computers of their era. The U.S. Department of Energy's Summit system at Oak Ridge National Laboratory, built by IBM, used 4,608 nodes each containing two IBM POWER9 CPUs and six Tesla V100 GPUs, for a total of 27,648 V100s. Summit topped the TOP500 list in June 2018 and later reached 148.6 petaflops on the High Performance Linpack benchmark, with a theoretical peak near 200 petaflops, the great majority of it supplied by the GPUs. Its companion system, Sierra, at Lawrence Livermore National Laboratory, used a similar design with 4,320 nodes carrying two POWER9 CPUs and four V100s each, for 17,280 GPUs in total. Sierra ranked among the fastest systems in the world and was used for nuclear stockpile simulation.
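The GPU totals and Summit's Linpack efficiency follow from the node counts quoted above. A small sketch using only figures from the text (the `System` class is an illustrative construct, and the 200 petaflops peak is the rounded value given here):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    nodes: int
    gpus_per_node: int
    cpus_per_node: int = 2   # both systems pair two POWER9 CPUs per node

    @property
    def gpus(self):
        return self.nodes * self.gpus_per_node

    @property
    def cpus(self):
        return self.nodes * self.cpus_per_node


summit = System("Summit", nodes=4608, gpus_per_node=6)
sierra = System("Sierra", nodes=4320, gpus_per_node=4)

# Fraction of theoretical peak achieved on High Performance Linpack.
summit_hpl_efficiency = 148.6 / 200

for s in (summit, sierra):
    print(f"{s.name}: {s.gpus} V100s, {s.cpus} POWER9 CPUs")
print(f"Summit HPL efficiency: ~{summit_hpl_efficiency:.0%}")
```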
Nvidia also packaged Volta into its own integrated AI appliances. The DGX-1 with Volta combined eight Tesla V100 GPUs in a single 3U server, providing 128 GB of aggregate HBM2 memory, 40,960 CUDA cores, and 5,120 Tensor Cores, and was rated at about 1 petaflops of mixed-precision deep learning performance. Its successor, the DGX-2, doubled the count to sixteen 32 GB V100s linked by twelve NVSwitch chips, an all-to-all interconnect fabric that let any GPU reach any other at 300 GB/s. The DGX-2 offered 512 GB of GPU memory and approximately 2 petaflops of performance, and it established the GPU-rich appliance template that Nvidia continued in later DGX generations.
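The DGX aggregate figures are per-GPU values multiplied by the GPU count. A minimal sketch, assuming roughly 125 Tensor TFLOPS per SXM2 V100 as quoted earlier in the article (`dgx_totals` is a hypothetical helper):

```python
CUDA_CORES_PER_V100 = 5120
TENSOR_CORES_PER_V100 = 640
TENSOR_TFLOPS_PER_V100 = 125   # approximate SXM2 rating


def dgx_totals(gpus, memory_gb_per_gpu):
    """Aggregate resources of a DGX system built from `gpus` Tesla V100s."""
    return {
        "memory_gb": gpus * memory_gb_per_gpu,
        "cuda_cores": gpus * CUDA_CORES_PER_V100,
        "tensor_cores": gpus * TENSOR_CORES_PER_V100,
        "tensor_pflops": gpus * TENSOR_TFLOPS_PER_V100 / 1000,
    }


dgx1 = dgx_totals(gpus=8, memory_gb_per_gpu=16)
dgx2 = dgx_totals(gpus=16, memory_gb_per_gpu=32)

print("DGX-1:", dgx1)
print("DGX-2:", dgx2)
```

The DGX-2 doubles the GPU count but quadruples the GPU memory, because it uses the 32 GB V100 rather than the 16 GB part.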
Volta occupies an important transitional place in Nvidia's architectural history. It split the previously unified GPU roadmap into distinct branches: a compute-focused data center line that ran from Volta through Ampere, Hopper, and Blackwell, and a graphics line that began with Turing for consumer and professional users. More consequentially for the broader field, Volta normalized the idea of building dedicated matrix-multiply hardware into a GPU. The Tensor Core it introduced became standard equipment across the industry's AI accelerators, and the V100 served as a primary training platform for many landmark deep learning models in the late 2010s. Even years after its release, V100-based systems remained in heavy use for research and production machine learning, a testament to the architecture's lasting influence.