Volta (microarchitecture)

Updated Jul 17, 2026

Volta is a GPU microarchitecture developed by Nvidia and introduced in 2017. It is best known as the architecture of the Tesla V100, the data center accelerator that brought the first generation of Tensor Cores to market and established matrix-math acceleration as a defining feature of GPUs used for deep learning. Volta succeeded the Pascal architecture in Nvidia's data center and professional lines. Unusually, it was never used for a mainstream consumer GeForce gaming product: those markets moved instead to the Turing architecture, while the data center line continued with Ampere and its A100. Volta is widely regarded as a pivotal architecture in the history of AI hardware, both for popularizing dedicated tensor math units and for powering the Summit and Sierra supercomputers, which debuted in first and third place on the June 2018 TOP500 list.^[8]

Naming

Following Nvidia's convention of naming microarchitectures after scientists, Volta is named after Alessandro Volta (1745 to 1827), the Italian physicist and chemist who invented the voltaic pile, an early electrochemical battery, and after whom the volt is named. The codename had appeared on Nvidia's public GPU roadmaps as early as 2013, several years before silicon shipped.

Announcement and release

Nvidia formally unveiled Volta on May 10, 2017, when chief executive Jensen Huang introduced the Tesla V100 accelerator during the opening keynote of the company's GPU Technology Conference (GTC) in San Jose, California.^[2] Huang framed the launch as a step beyond what conventional process scaling alone could deliver, citing roughly a 5x increase in peak deep learning throughput over the preceding Pascal-based Tesla P100.^[2] Volume shipments of the V100 began in the third quarter of 2017.^[3] Additional Volta products followed, including the Titan V in December 2017 and, at GTC 2018, a 32 GB version of the V100 and the Quadro GV100 workstation card.^[6]^[7]

The GV100 die

Volta's flagship silicon is the GV100 GPU. It contains 21.1 billion transistors on a die of approximately 815 mm², manufactured on TSMC's 12 nm FFN ("FinFET NVIDIA") process, a refinement of TSMC's 16 nm FinFET technology customized for Nvidia.^[1]^[4] At the time it was one of the largest production chips ever fabricated, approaching the practical reticle limit of contemporary lithography.

The full GV100 is organized into six Graphics Processing Clusters (GPCs) containing a total of 84 streaming multiprocessors (SMs).^[1] Each Volta SM holds 64 single-precision FP32 cores, 64 integer INT32 cores, 32 double-precision FP64 cores, and 8 Tensor Cores, giving the complete die 5,376 FP32 (CUDA) cores, 2,688 FP64 cores, and 672 Tensor Cores.^[1]^[5] Shipping products did not enable the full die: to improve manufacturing yield, the Tesla V100 and the other Volta cards activate 80 of the 84 SMs, for 5,120 CUDA cores and 640 Tensor Cores.^[1]
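
The core totals above follow directly from the per-SM counts. A minimal Python sketch of that arithmetic is given here; the names `PER_SM` and `die_totals` are illustrative and do not come from any Nvidia tool.

```python
# Per-SM execution resources as stated in the article for Volta GV100.
PER_SM = {"fp32": 64, "int32": 64, "fp64": 32, "tensor": 8}


def die_totals(sm_count):
    """Return total unit counts for a GPU with `sm_count` active SMs."""
    return {unit: per_sm * sm_count for unit, per_sm in PER_SM.items()}


full_die = die_totals(84)  # every SM on the GV100 die
shipping = die_totals(80)  # SMs enabled on the Tesla V100 and other Volta cards

print("full GV100:", full_die)
print("shipping  :", shipping)
```

Disabling four SMs removes 256 FP32 cores and 32 Tensor Cores from the full-die figures, which accounts for the gap between 5,376 and 5,120 CUDA cores.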

Architectural innovations

Beyond raw core counts, Volta introduced several changes to the SM that influenced later Nvidia designs:

Tensor Cores. Each Tensor Core performs a fused multiply-accumulate on small matrices, multiplying two 4x4 FP16 matrices and adding a third 4x4 matrix in FP16 or FP32 to produce a result in FP16 or FP32. A single core executes 64 floating-point fused multiply-add operations per clock, and the units are designed to accelerate the General Matrix Multiply (GEMM) operations at the heart of neural network training and inference.^[1]
Independent thread scheduling. Volta gave each thread its own program counter and call stack, relaxing the lockstep execution of earlier single-instruction multiple-thread (SIMT) designs. This allowed threads within a warp to diverge and reconverge with finer-grained synchronization, simplifying certain parallel algorithms.^[1]^[12]
Unified L1 and shared memory. Volta combined the L1 data cache and shared memory into a single 128 KB block per SM, of which up to 96 KB can be configured as shared memory by the programmer. This was a large increase over Pascal, and the design was carried into subsequent architectures.^[1]^[12]
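
The Tensor Core operation described above, D = A x B + C on 4x4 matrices, can be imitated in software to show where the figure of 64 multiply-adds comes from. The sketch below is only an emulation under stated assumptions: it models the FP32-accumulate variant, it rounds values with Python's `struct` module rather than reproducing the hardware data path, and the helper names (`to_fp16`, `to_fp32`, `tensor_core_mma`) are hypothetical. It is not bit-exact with a real GPU.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_mma(a, b, c):
    """Emulate D = A x B + C for 4x4 matrices.

    A and B are rounded to FP16; C and the running sums are held in FP32.
    Returns the result matrix and the number of multiply-adds performed.
    """
    n = 4
    fma_count = 0
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                product = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = to_fp32(acc + product)
                fma_count += 1
            d[i][j] = acc
    return d, fma_count


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]  # 1..16
ones = [[1.0] * 4 for _ in range(4)]

d, fma_count = tensor_core_mma(identity, b, ones)
print(d)
print("multiply-adds:", fma_count)
```

Each of the 16 output elements needs a 4-term dot product, so one 4x4 operation amounts to 16 x 4 = 64 multiply-adds, matching the per-clock figure quoted for a single Tensor Core.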

Tensor Cores and deep learning

The Tensor Core was Volta's signature contribution. Before Volta, GPUs accelerated deep learning using their general-purpose CUDA cores; Volta added hardware dedicated specifically to the dense matrix multiplications that dominate the training and inference of deep neural networks. By computing in reduced-precision FP16 while accumulating in FP32, the cores delivered far higher throughput than FP32 math while preserving enough numerical accuracy for many models, a technique known as mixed-precision training.
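
A small software experiment illustrates why accumulating in FP32 matters. The sketch below computes the same dot product twice with FP16 inputs, rounding the running sum to FP16 in one case and to FP32 in the other. It uses Python's `struct` module to perform the rounding and is an illustration only, not a model of the actual hardware; `round_to` and `dot` are hypothetical helper names.

```python
import struct


def round_to(fmt, x):
    """Round x to the float format named by a struct code ('<e' FP16, '<f' FP32)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def dot(a, b, acc_fmt):
    """Dot product with FP16 inputs; the accumulator is rounded after every add."""
    acc = 0.0
    for x, y in zip(a, b):
        product = round_to("<e", x) * round_to("<e", y)
        acc = round_to(acc_fmt, acc + product)
    return acc


ones = [1.0] * 4096

fp16_acc = dot(ones, ones, "<e")  # accumulator kept in FP16
fp32_acc = dot(ones, ones, "<f")  # accumulator kept in FP32

print("FP16 accumulation:", fp16_acc)
print("FP32 accumulation:", fp32_acc)
```

With an FP16 accumulator the sum stops growing at 2,048, because the next representable FP16 value is 2,050 and adding 1 rounds back down. The FP32 accumulator reaches the correct total of 4,096.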

The performance gap was substantial. Nvidia rated the V100's 640 Tensor Cores at roughly 120 deep learning TFLOPS at launch (later revised to about 125 TFLOPS for the SXM2 form factor), compared with the card's roughly 15 TFLOPS of FP32 throughput.^[1] Nvidia claimed up to 12x higher peak TFLOPS for training and up to 6x for inference relative to the Pascal-based P100.^[2] The approach proved durable: Tensor Cores became a permanent fixture of Nvidia data center and professional GPUs, expanding in later architectures (Turing, Ampere, Hopper, and Blackwell) to support additional numeric formats such as INT8, TF32, BF16, and FP8.
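
The peak figures can be approximated from unit counts and clock speed. The sketch below assumes a boost clock of about 1.53 GHz for the SXM2 module, a value that is not stated in this article, and counts each fused multiply-add as two floating-point operations. The function name `peak_tflops` is illustrative.

```python
def peak_tflops(units, flops_per_unit_per_clock, clock_ghz):
    """Peak throughput in TFLOPS: units x FLOPs per clock x clock (GHz) / 1000."""
    return units * flops_per_unit_per_clock * clock_ghz / 1000.0


# Assumption (not from the article): approximate SXM2 boost clock in GHz.
BOOST_GHZ = 1.53

# Each Tensor Core performs 64 FMAs per clock; one FMA counts as 2 FLOPs.
tensor = peak_tflops(640, 64 * 2, BOOST_GHZ)
# Each FP32 or FP64 core performs one FMA per clock.
fp32 = peak_tflops(5120, 2, BOOST_GHZ)
fp64 = peak_tflops(2560, 2, BOOST_GHZ)

print(f"Tensor: {tensor:.1f} TFLOPS")
print(f"FP32  : {fp32:.1f} TFLOPS")
print(f"FP64  : {fp64:.1f} TFLOPS")
print(f"Tensor / FP32 ratio: {tensor / fp32:.0f}x")
```

Under that clock assumption the results land close to the quoted 125, 15.7, and 7.8 TFLOPS. The 8x ratio between Tensor Core and FP32 peak throughput does not depend on the clock at all, only on the unit counts.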

Specifications

The following table summarizes the GV100 die and the principal Tesla V100 configurations.

Specification	Full GV100	Tesla V100 (SXM2)	Tesla V100 (PCIe)
Process	TSMC 12 nm FFN	TSMC 12 nm FFN	TSMC 12 nm FFN
Transistors	21.1 billion	21.1 billion	21.1 billion
Die size	~815 mm²	~815 mm²	~815 mm²
SMs	84	80	80
CUDA (FP32) cores	5,376	5,120	5,120
FP64 cores	2,688	2,560	2,560
Tensor Cores	672	640	640
FP64 peak	n/a	~7.8 TFLOPS	~7 TFLOPS
FP32 peak	n/a	~15.7 TFLOPS	~14 TFLOPS
Tensor (deep learning)	n/a	~125 TFLOPS	~112 TFLOPS
Memory	up to 32 GB HBM2	16 or 32 GB HBM2	16 or 32 GB HBM2
Memory bus	4,096-bit	4,096-bit	4,096-bit
Memory bandwidth	up to 900 GB/s	900 GB/s	900 GB/s
Interconnect	NVLink 2.0	NVLink 2.0, ~300 GB/s	PCIe 3.0, ~32 GB/s
TDP	n/a	300 W	250 W

The V100 paired the GV100 with four stacks of second-generation high-bandwidth memory (HBM2) on a 4,096-bit interface, delivering 900 GB/s of peak bandwidth, about 25 percent more than the P100's roughly 720 GB/s; Nvidia also claimed about 1.5x the delivered bandwidth, crediting a more efficient memory controller.^[1] The card launched with 16 GB of HBM2; a 32 GB variant arrived in 2018.^[7] The SXM2 module form factor exposed Nvidia's second-generation NVLink interconnect, with six links of 25 GB/s in each direction (50 GB/s bidirectional per link) for an aggregate 300 GB/s of GPU-to-GPU bandwidth, roughly twice that of the first-generation NVLink on Pascal.^[1] A PCI Express add-in-card variant communicated over the slower PCIe 3.0 bus and carried a lower 250 W power envelope versus 300 W for the SXM2 module.^[1]
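
The bandwidth figures in this paragraph can be sanity-checked with simple arithmetic. The sketch below derives the per-pin data rate implied by 900 GB/s on a 4,096-bit bus, and the NVLink aggregate on the assumption that the 25 GB/s per-link figure is per direction, so that each link carries 50 GB/s bidirectionally. Both function names are illustrative.

```python
def implied_pin_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Per-pin data rate in Gbit/s implied by a total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


def nvlink_aggregate_gb_s(links, gb_s_per_direction):
    """Aggregate bidirectional NVLink bandwidth across all links."""
    return links * gb_s_per_direction * 2


# 900 GB/s across a 4,096-bit HBM2 interface (four 1,024-bit stacks).
v100_pin_rate = implied_pin_rate_gbps(900, 4096)

# Six links, assumed to be 25 GB/s in each direction.
nvlink_total = nvlink_aggregate_gb_s(6, 25)

print(f"Implied HBM2 pin rate: {v100_pin_rate:.3f} Gbit/s")
print(f"NVLink aggregate     : {nvlink_total} GB/s")
```

The HBM2 interface reaches its bandwidth through width rather than speed: each pin runs at under 2 Gbit/s, and the 4,096-bit bus supplies the rest.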

Products

Volta reached the market in four main products, none of them a mainstream gaming card.

Product	Released	CUDA cores	Tensor Cores	Memory	Notes
Tesla V100	2017 (16 GB), 2018 (32 GB)	5,120	640	16 or 32 GB HBM2	Data center accelerator; SXM2 and PCIe forms
Titan V	December 2017	5,120	640	12 GB HBM2	"Prosumer" card, USD 2,999
Titan V CEO Edition	2018	5,120	640	32 GB HBM2	Limited edition given to researchers
Quadro GV100	March 2018	5,120	640	32 GB HBM2	Workstation card, USD 8,999

The Tesla V100 was the workhorse of the lineup and the basis for nearly all of Volta's deployments in data centers and supercomputers. The Titan V, launched in December 2017 at USD 2,999, was marketed to researchers and enthusiasts as the first GV100-based card sold through Nvidia's own store; notably, it carried a cut-down memory subsystem with 12 GB of HBM2 across a 3,072-bit interface (three stacks rather than four) for roughly 653 GB/s of bandwidth.^[6] A limited Titan V CEO Edition with the full 32 GB of memory was later distributed to selected AI researchers. The Quadro GV100, announced at GTC 2018 at USD 8,999, targeted professional visualization and added support for Nvidia's RTX real-time ray tracing software stack, pairing 32 GB of HBM2 with NVLink for dual-card configurations.^[7]

No consumer GeForce Volta

Volta never appeared in a mainstream GeForce gaming graphics card. The architecture's emphasis on FP64 throughput, Tensor Cores, and expensive HBM2 memory suited data center and professional workloads but was poorly matched to cost-sensitive gaming. Nvidia instead carried Volta's tensor and scheduling advances forward into the Turing architecture, launched in 2018, which served as the consumer and professional graphics successor and added dedicated ray-tracing (RT) cores. The data center lineage continued separately with the Ampere A100 in 2020.

Systems and supercomputers

Volta GPUs anchored several of the most powerful computers of their era. The U.S. Department of Energy's Summit system at Oak Ridge National Laboratory, built by IBM, used 4,608 nodes each containing two IBM POWER9 CPUs and six Tesla V100 GPUs, for a total of 27,648 V100s.^[9] Summit topped the TOP500 list at its June 2018 debut with 122.3 petaflops on the High Performance Linpack benchmark, a score later raised to 148.6 petaflops, against a theoretical peak near 200 petaflops, the great majority of it supplied by the GPUs.^[8] Its companion system, Sierra, at Lawrence Livermore National Laboratory, used a similar design with 4,320 nodes carrying two POWER9 CPUs and four V100s each, or 17,280 GPUs in total; it debuted in third place on the same list and was used for nuclear stockpile simulation.^[8]

Nvidia also packaged Volta into its own integrated AI appliances. The DGX-1 with Volta combined eight Tesla V100 GPUs in a single 3U server, providing 128 GB of aggregate HBM2 memory, 40,960 CUDA cores, and 5,120 Tensor Cores, and was rated at about 1 petaflops of mixed-precision deep learning performance.^[10] Its successor, the DGX-2, doubled the count to sixteen 32 GB V100s linked by twelve NVSwitch chips, an all-to-all interconnect fabric that let any GPU reach any other at 300 GB/s.^[11] The DGX-2 offered 512 GB of GPU memory and approximately 2 petaflops of performance, and it established the GPU-rich appliance template that Nvidia continued in later DGX generations.^[11]
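
The system totals in this section follow from per-node and per-GPU figures given elsewhere in the article. The following sketch recomputes them; the helper names `cluster_gpus` and `appliance_totals` are illustrative only.

```python
# Per-GPU figures for the Tesla V100, as stated in the article.
CUDA_CORES_PER_V100 = 5120
TENSOR_CORES_PER_V100 = 640


def cluster_gpus(nodes, gpus_per_node):
    """Total GPU count for a cluster of identical nodes."""
    return nodes * gpus_per_node


def appliance_totals(gpu_count, memory_gb_per_gpu):
    """Aggregate memory and core counts for a multi-GPU appliance."""
    return {
        "memory_gb": gpu_count * memory_gb_per_gpu,
        "cuda_cores": gpu_count * CUDA_CORES_PER_V100,
        "tensor_cores": gpu_count * TENSOR_CORES_PER_V100,
    }


summit = cluster_gpus(4608, 6)
sierra = cluster_gpus(4320, 4)
dgx1 = appliance_totals(8, 16)   # eight 16 GB V100s
dgx2 = appliance_totals(16, 32)  # sixteen 32 GB V100s

print("Summit GPUs:", summit)
print("Sierra GPUs:", sierra)
print("DGX-1:", dgx1)
print("DGX-2:", dgx2)
```

The DGX-2 doubles the GPU count and the memory per GPU relative to the DGX-1, which is why its aggregate memory is four times larger while its core counts only double.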

Lineage and significance

Volta occupies an important transitional place in Nvidia's architectural history. It split the previously unified GPU roadmap into distinct branches: a compute-focused data center line that ran from Volta through Ampere, Hopper, and Blackwell, and a graphics line that began with Turing for consumer and professional users. More consequentially for the broader field, Volta normalized the idea of building dedicated matrix-multiply hardware into a GPU. The Tensor Core it introduced became standard equipment across the industry's AI accelerators, and the V100 served as a primary training platform for many landmark deep learning models in the late 2010s. Even years after its release, V100-based systems remained in heavy use for research and production machine learning, a testament to the architecture's lasting influence.

References

1. NVIDIA. "NVIDIA Tesla V100 GPU Architecture (Whitepaper, WP-08608-001_v1.1)." August 2017. https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
2. NVIDIA Newsroom. "NVIDIA Launches Revolutionary Volta GPU Platform, Fueling Next Era of AI and High Performance Computing." May 10, 2017. https://nvidianews.nvidia.com/news/nvidia-launches-revolutionary-volta-gpu-platform-fueling-next-era-of-ai-and-high-performance-computing
3. Wikipedia. "Volta (microarchitecture)." https://en.wikipedia.org/wiki/Volta_(microarchitecture)
4. WCCFtech. "NVIDIA Volta GV100 12nm FinFET GPU Detailed: Tesla V100 Specifications Include 21 Billion Transistors, 5120 CUDA Cores, 16 GB HBM2 With 900 GB/s Bandwidth." https://wccftech.com/nvidia-volta-gv100-gpu-tesla-v100-architecture-specifications-deep-dive/
5. Tom's Hardware. "Nvidia Details Volta GV100 GPU, Tesla V100 Accelerator." https://www.tomshardware.com/news/nvidia-tesla-v100-volta-gpu,34379.html
6. PC Perspective. "NVIDIA Launches Titan V, the World's First Consumer Volta GPU with HBM2." December 2017. https://pcper.com/2017/12/nvidia-launches-titan-v-the-worlds-first-consumer-volta-gpu-with-hbm2/
7. Tom's Hardware. "Nvidia Announces Quadro GV100, New NVSwitch, 32GB Tesla V100 At GTC 2018." https://www.tomshardware.com/news/nvidia-gtc-2018-v100-nvswitch,36748.html
8. TOP500. "US Regains TOP500 Crown with Summit Supercomputer, Sierra Grabs Number Three Spot." June 2018. https://www.top500.org/news/us-regains-top500-crown-with-summit-supercomputer-sierra-grabs-number-three-spot/
9. Wikipedia. "Summit (supercomputer)." https://en.wikipedia.org/wiki/Summit_(supercomputer)
10. NVIDIA. "NVIDIA DGX-1 With Tesla V100 System Architecture (Whitepaper)." https://images.nvidia.com/content/pdf/dgx1-v100-system-architecture-whitepaper.pdf
11. Wikipedia. "Nvidia DGX." https://en.wikipedia.org/wiki/Nvidia_DGX
12. NVIDIA. "Volta Tuning Guide." https://docs.nvidia.com/cuda/volta-tuning-guide/index.html
