NVIDIA L40S
Last reviewed
Jun 3, 2026
Sources
8 citations
Review status
Source-backed
Revision
v1 · 1,597 words
The NVIDIA L40S is a data-center graphics processing unit (GPU) based on the Ada Lovelace architecture, announced by NVIDIA at the SIGGRAPH conference on August 8, 2023. NVIDIA marketed it as a "universal" data-center accelerator: a single card intended to handle artificial-intelligence training and inference, generative AI, model fine-tuning, 3D graphics and rendering, video processing, and industrial digitalization workloads on the Omniverse platform. Built on the same large AD102 die used in NVIDIA's flagship Ada Lovelace products, the L40S pairs 48 GB of GDDR6 memory with fourth-generation Tensor Cores and an FP8 Transformer Engine. It became commercially significant during the 2023 to 2024 GPU supply crunch, when it was widely adopted as a more available and less expensive alternative to the H100 for inference and some training work.
The L40S is the compute-rebalanced sibling of the earlier NVIDIA L40, a graphics- and visualization-focused card released in late 2022. The two share the same die, memory configuration, and physical form factor, but the L40S raises clock speeds and power, and NVIDIA positions it as a compute accelerator rather than primarily a visualization product.
The L40S is built on NVIDIA's Ada Lovelace architecture and the AD102 graphics processor, the largest die in the Ada generation. Alongside its CUDA cores, Ada Lovelace provides fourth-generation Tensor Cores with a Transformer Engine that supports the FP8 (8-bit floating point) data format, and third-generation ray-tracing (RT) cores. The L40S exposes 18,176 CUDA cores, 568 fourth-generation Tensor Cores, and 142 third-generation RT Cores.
A defining characteristic of the L40S is its memory subsystem. Rather than the high-bandwidth memory (HBM) used on NVIDIA's top-tier compute accelerators such as the A100 and H100, the L40S uses 48 GB of conventional GDDR6 memory with error-correcting code (ECC) on a 384-bit bus, providing 864 GB/s of bandwidth. This is considerably less bandwidth than an H100 (which offers roughly 3.35 TB/s with HBM3), but GDDR6 is cheaper and more readily manufactured, and 48 GB is a large pool relative to the card's class. The card also omits NVLink, NVIDIA's high-speed GPU-to-GPU interconnect; multiple L40S cards communicate over the standard PCIe Gen4 x16 host interface (about 64 GB/s bidirectional) rather than a dedicated GPU fabric. The L40S also lacks Multi-Instance GPU (MIG) partitioning, though it does support NVIDIA virtual-GPU (vGPU) software.
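The bandwidth figures in this section can be reproduced with simple arithmetic. The sketch below is illustrative only: the 18 Gbit/s per-pin GDDR6 data rate is an assumption not stated in this article (it is the rate that yields 864 GB/s on a 384-bit bus), and the PCIe calculation uses the Gen4 signalling rate of 16 GT/s per lane with 128b/130b encoding. The function names are hypothetical.

```python
# Back-of-the-envelope bandwidth arithmetic for the figures quoted above.
# Assumption (not stated in the article): GDDR6 runs at 18 Gbit/s per pin.

def gddr_bandwidth_gbs(bus_width_bits: int, gbit_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: pins * per-pin rate / 8 bits per byte."""
    return bus_width_bits * gbit_per_pin / 8


def pcie_bandwidth_gbs(lanes: int, gt_per_s: float, encoding: float = 128 / 130) -> float:
    """Per-direction PCIe bandwidth in GB/s after line-encoding overhead."""
    return lanes * gt_per_s * encoding / 8


l40s_mem = gddr_bandwidth_gbs(384, 18.0)     # 384-bit bus at the assumed pin rate
h100_mem = 3350.0                            # roughly 3.35 TB/s, as quoted in the text
pcie_one_way = pcie_bandwidth_gbs(16, 16.0)  # PCIe Gen4 x16, one direction

print(f"L40S memory bandwidth: {l40s_mem:.0f} GB/s")
print(f"H100 / L40S memory bandwidth: {h100_mem / l40s_mem:.2f}x")
print(f"PCIe Gen4 x16: {pcie_one_way:.1f} GB/s each way, "
      f"{2 * pcie_one_way:.1f} GB/s bidirectional")
```

On these numbers the H100's memory is a little under four times faster, and the "about 64 GB/s" PCIe figure is the raw bidirectional rate before encoding overhead.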
Physically, the L40S is a dual-slot, full-height PCIe card with a passive (fanless) cooler designed for the high-airflow chassis of data-center servers, drawing up to 350 W through a single 16-pin power connector.
The following table lists the principal specifications of the NVIDIA L40S. Tensor throughput figures are quoted as dense, then with structured sparsity (dense | sparse), following NVIDIA's datasheet convention.
| Specification | NVIDIA L40S |
|---|---|
| Architecture | Ada Lovelace (AD102) |
| Announced | August 8, 2023 (SIGGRAPH) |
| CUDA cores | 18,176 |
| Tensor Cores | 568 (4th generation) |
| RT Cores | 142 (3rd generation) |
| GPU memory | 48 GB GDDR6 with ECC |
| Memory interface | 384-bit |
| Memory bandwidth | 864 GB/s |
| FP32 | 91.6 TFLOPS |
| TF32 Tensor Core | 183 \| 366 TFLOPS |
| FP16 / BF16 Tensor Core | 362 \| 733 TFLOPS |
| FP8 Tensor Core | 733 \| 1,466 TFLOPS |
| INT8 Tensor Core | 733 \| 1,466 TOPS |
| RT Core performance | 212 TFLOPS |
| Interconnect | PCIe Gen4 x16 (no NVLink) |
| Form factor | Dual-slot, full-height; passive cooling |
| Max power | 350 W |
| Media engines | 3x NVENC (AV1), 3x NVDEC |
NVIDIA summarizes the card's AI capability as more than 1.45 petaflops of tensor processing. The FP8 Transformer Engine, part of the Ada Lovelace architecture, can automatically recast computations between FP8 and FP16 precision to accelerate transformer-based neural networks while managing accuracy. In NVIDIA's own comparisons against the previous-generation A100 Tensor Core GPU, the L40S was claimed to deliver up to 1.2 times higher generative-AI inference performance and up to 1.7 times higher training performance.
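The headline petaflops figure appears to line up with the sparse FP8 entry in the specification table. The following sketch, using only the table's values, converts that entry to petaflops and checks how closely the sparse figures track twice the dense ones; the dictionary and function names are hypothetical.

```python
# (dense, sparse) tensor throughput from the specification table,
# in TFLOPS (TOPS for INT8).
L40S_TENSOR = {
    "TF32": (183, 366),
    "FP16/BF16": (362, 733),
    "FP8": (733, 1466),
    "INT8": (733, 1466),
}


def sparsity_speedup(dense: float, sparse: float) -> float:
    """Ratio of the structured-sparsity figure to the dense figure."""
    return sparse / dense


for fmt, (dense, sparse) in L40S_TENSOR.items():
    ratio = sparsity_speedup(dense, sparse)
    print(f"{fmt:10s} dense={dense:5d} sparse={sparse:5d} ratio={ratio:.2f}")

# The largest entry in the table, expressed in petaflops.
peak_pflops = L40S_TENSOR["FP8"][1] / 1000
print(f"Peak FP8 with sparsity: {peak_pflops:.3f} PFLOPS")
```

Every ratio comes out at roughly 2, so the dense figures are the more conservative basis when comparing against hardware or workloads that cannot exploit structured sparsity.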
The plain NVIDIA L40 and the L40S are closely related but aimed at different priorities. Both are built on the AD102 die, carry 18,176 CUDA cores, 568 Tensor Cores, and 142 RT Cores, and use 48 GB of GDDR6 with ECC at 864 GB/s. Neither supports NVLink. The differences are in clocks, power, and emphasis.
The L40, introduced in 2022, is positioned chiefly for graphics, visualization, virtual workstations, and as "the engine of NVIDIA Omniverse in the data center," with AI acceleration as a secondary capability. The L40S runs at a higher power limit (350 W versus 300 W) and a modestly higher boost clock (around 2,520 MHz versus 2,490 MHz), which lifts its sustained compute throughput. The most consequential gap is in AI tensor math: the L40S roughly doubles the L40's dense FP8 and FP16 tensor throughput (in FP8, about 733 TFLOPS dense versus about 362 TFLOPS dense). Both cards physically support FP8 because they share the Ada architecture, but NVIDIA tunes and markets the L40S around the Transformer Engine and AI compute, whereas the L40 leans toward rendering and visual computing. The ray-tracing figures are nearly identical (about 212 TFLOPS on the L40S versus about 209 TFLOPS on the L40), reflecting the shared RT-core count.
| Attribute | NVIDIA L40 | NVIDIA L40S |
|---|---|---|
| Launch | 2022 | 2023 |
| Primary focus | Graphics, visualization, Omniverse | AI compute, inference, generative AI |
| FP32 | 90.5 TFLOPS | 91.6 TFLOPS |
| FP8 Tensor (dense \| sparse) | ~362 \| 724 TFLOPS | 733 \| 1,466 TFLOPS |
| RT Core performance | ~209 TFLOPS | 212 TFLOPS |
| Max power | 300 W | 350 W |
| Memory | 48 GB GDDR6, 864 GB/s | 48 GB GDDR6, 864 GB/s |
| NVLink | No | No |
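The FP32 rows of the comparison can be reproduced from the shared core count and the two boost clocks quoted above. This is a minimal sketch that assumes the usual peak-throughput convention of one fused multiply-add (two floating-point operations) per CUDA core per clock; the function name is hypothetical.

```python
def fp32_tflops(cuda_cores: int, boost_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Peak FP32 throughput in TFLOPS.

    Assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock.
    """
    return cuda_cores * flops_per_core_per_clock * boost_mhz * 1e6 / 1e12


CUDA_CORES = 18_176  # identical on the L40 and the L40S

l40 = fp32_tflops(CUDA_CORES, 2490)   # L40 boost clock, MHz
l40s = fp32_tflops(CUDA_CORES, 2520)  # L40S boost clock, MHz

print(f"L40:  {l40:.1f} TFLOPS")
print(f"L40S: {l40s:.1f} TFLOPS")
print(f"Clock-driven uplift: {(l40s / l40 - 1) * 100:.1f}%")
```

The results round to the 90.5 and 91.6 TFLOPS shown in the table. The clock difference accounts for only about 1.2 percent, so the much larger tensor-throughput gap between the two cards is not explained by clock speed alone.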
The L40S arrived at a moment of acute scarcity for NVIDIA's flagship data-center accelerator. Through late 2023 and into 2024, the H100 was extremely difficult to obtain, with reported lead times stretching to many months as TSMC's CoWoS advanced-packaging capacity and HBM supply constrained output. Because the L40S uses standard GDDR6 rather than HBM and does not require CoWoS packaging, it could be manufactured and shipped in far greater volume, often described as available in weeks rather than quarters.
This availability, combined with lower cost and power, made the L40S an attractive substitute for buyers who could not secure Hopper-class hardware. Industry analysis at the time characterized the trade-off plainly: across a broad range of AI training and inference workloads the H100 offered roughly two to three times the performance of an L40S, but the H100 also cost on the order of three times as much and consumed about twice the power. For inference and for fine-tuning smaller models, where the L40S's 48 GB of memory and strong FP8 throughput were sufficient, the card frequently delivered better performance per dollar and per watt. Its principal limitations relative to the H100 were the lower memory bandwidth of GDDR6, the absence of NVLink for tightly coupled multi-GPU scaling, and very limited double-precision (FP64) throughput, which restricts its usefulness for traditional high-performance computing.
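The trade-off can be made explicit with the rough ratios quoted above (an H100 speed-up of two to three times, about three times the cost, and about twice the power). The sketch below is illustrative arithmetic on those ratios only, not measured data; the function and constant names are hypothetical.

```python
def relative_efficiency(speedup: float, resource_multiple: float) -> float:
    """H100 efficiency relative to the L40S for one resource (cost or power).

    Values below 1 favour the L40S; values above 1 favour the H100.
    """
    return speedup / resource_multiple


PRICE_MULTIPLE = 3.0  # H100 cost relative to the L40S, per the text
POWER_MULTIPLE = 2.0  # H100 power relative to the L40S, per the text

for speedup in (1.5, 2.0, 2.5, 3.0):
    per_dollar = relative_efficiency(speedup, PRICE_MULTIPLE)
    per_watt = relative_efficiency(speedup, POWER_MULTIPLE)
    print(f"H100 speed-up {speedup:.1f}x -> "
          f"perf/dollar {per_dollar:.2f}, perf/watt {per_watt:.2f} (relative to L40S)")
```

On these ratios the L40S leads per dollar whenever the H100's speed-up on a given workload is below three times, and leads per watt only when that speed-up is below two times, which is consistent with its advantage being reported mainly for inference and smaller fine-tuning jobs.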
NVIDIA reinforced this positioning by making the L40S a centerpiece of its OVX server line for generative AI and Omniverse, with configurations supporting up to eight L40S GPUs per system. Because the card lacks NVLink, deployments that needed many GPUs typically relied on PCIe and, in some composable-infrastructure designs, attached large numbers of L40S cards to a single host to maximize density and throughput.
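A rough sizing sketch shows why multi-card configurations matter. It estimates how many 48 GB cards are needed just to hold a model's weights at FP16 (two bytes per parameter) or FP8 (one byte per parameter). The model sizes are hypothetical examples, and the estimate deliberately ignores the key-value cache, activations, and framework overhead, all of which need additional headroom in practice.

```python
import math

BYTES_PER_PARAM = {"FP16": 2, "FP8": 1}
CARD_GB = 48    # memory per L40S
NODE_GPUS = 8   # largest per-system configuration mentioned in the text


def weights_gb(params_billion: float, fmt: str) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * BYTES_PER_PARAM[fmt]


def min_cards(params_billion: float, fmt: str) -> int:
    """Smallest number of 48 GB cards whose combined memory holds the weights."""
    return math.ceil(weights_gb(params_billion, fmt) / CARD_GB)


# Hypothetical model sizes, in billions of parameters.
for size in (7, 13, 70):
    for fmt in ("FP16", "FP8"):
        cards = min_cards(size, fmt)
        fits = "fits in one node" if cards <= NODE_GPUS else "exceeds one node"
        print(f"{size:>3}B {fmt}: {weights_gb(size, fmt):6.1f} GB -> {cards} card(s), {fits}")
```

Because the cards communicate over PCIe rather than NVLink, a model split across several of them pays a higher communication cost than the card count alone suggests.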
At launch, NVIDIA lined up broad original-equipment-manufacturer (OEM) support for the L40S, with ASUS, Dell Technologies, GIGABYTE, Hewlett Packard Enterprise, Lenovo, QCT, and Supermicro all planning OVX systems built around the card, and general availability beginning in the fall of 2023. The L40S was subsequently offered widely by cloud and GPU-rental providers as a mid-tier accelerator for inference, generative-AI serving, fine-tuning, rendering, and virtual-desktop workloads, frequently paired with software such as NVIDIA's TensorRT-LLM inference library to extract maximum performance from its FP8 Transformer Engine.
The L40S's significance lies in how it broadened access to capable AI compute during a period when the most powerful accelerators were rationed. By repackaging the large Ada Lovelace die with a generous pool of conventional GDDR6 memory and a high power budget, NVIDIA produced a versatile card that, while well short of the H100 in raw throughput and memory bandwidth, was abundant, affordable, and good enough for a wide swath of inference and generative-AI deployments. It sits in NVIDIA's data-center lineup above the compact, low-power L4 (also an Ada Lovelace product) and below the HBM-based Hopper accelerators, occupying the role of a high-volume, general-purpose GPU for organizations that prioritized availability and cost efficiency over peak performance.