NVIDIA L40S
Last reviewed
Sources
8 citations
Review status
Source-backed
Revision
v2 · 1,895 words
The NVIDIA L40S is a data-center GPU based on the Ada Lovelace architecture that NVIDIA markets as a single "universal" accelerator for AI inference and training, generative AI, 3D graphics, rendering, and video processing. NVIDIA announced it at the SIGGRAPH conference on August 8, 2023. The card pairs 48 GB of GDDR6 memory with fourth-generation Tensor Cores and an FP8 Transformer Engine, and NVIDIA rates it at more than 1.45 petaflops of tensor processing (the FP8 figure with structured sparsity) on a standard PCIe card with no NVLink.[1][2][3] During the 2023 to 2024 GPU supply crunch it became commercially significant as a more available and less expensive alternative to the H100 for inference and mixed AI-plus-graphics work, including workloads on the Omniverse platform.[7]
Built on the same large AD102 die used in NVIDIA's flagship Ada Lovelace products, the L40S is the compute-rebalanced sibling of the earlier NVIDIA L40, a graphics- and visualization-focused card released in late 2022.[4][5] The two share the same die, 48 GB memory configuration, and physical form factor, but the L40S raises clock speeds and power and is positioned as a compute accelerator rather than primarily a visualization product.[4]
The NVIDIA L40S is a dual-slot PCIe data-center GPU built on the Ada Lovelace architecture and announced at SIGGRAPH on August 8, 2023.[3] NVIDIA describes it as "a powerful, universal data center processor designed to accelerate the most compute-intensive, complex applications, including AI training and inference, 3D design and visualization, video processing and industrial digitalization with the NVIDIA Omniverse platform."[3] In NVIDIA's framing, the card's defining trait is breadth: rather than specializing in either rendering (like the L40) or peak AI throughput (like the H100), it is tuned to do a wide range of AI, graphics, and video work acceptably well on widely available hardware.[1][4]
Bob Pette, vice president of professional visualization at NVIDIA, summarized the positioning at launch: "As generative AI transforms every industry, enterprises are increasingly seeking large-scale compute resources in the data center. OVX systems with NVIDIA L40S GPUs accelerate AI, graphics and video processing workloads, and meet the demanding performance requirements of an ever-increasing set of complex and diverse applications."[3]
The L40S is built on NVIDIA's Ada Lovelace architecture and the AD102 graphics processor, the largest die in the Ada generation.[2][4] The architecture combines fourth-generation Tensor Cores, whose Transformer Engine supports the FP8 (8-bit floating point) data format, with third-generation ray-tracing (RT) cores and Ada-generation CUDA cores.[1][2] The L40S exposes 18,176 CUDA cores, 568 fourth-generation Tensor Cores, and 142 third-generation RT Cores.[2]
A defining characteristic of the L40S is its memory subsystem. Rather than the high-bandwidth memory (HBM) used on NVIDIA's top-tier compute accelerators such as the A100 and H100, the L40S uses 48 GB of conventional GDDR6 memory with error-correcting code (ECC) on a 384-bit bus, providing 864 GB/s of bandwidth.[2][8] This is considerably less bandwidth than an H100 (which offers roughly 3.35 TB/s with HBM3), but GDDR6 is cheaper and more readily manufactured, and 48 GB is a large pool relative to the card's class.[7] The card also omits NVLink, NVIDIA's high-speed GPU-to-GPU interconnect; multiple L40S cards communicate over the standard PCIe Gen4 x16 host interface (about 64 GB/s bidirectional) rather than a dedicated GPU fabric.[2][8] The L40S also lacks Multi-Instance GPU (MIG) partitioning, though it does support NVIDIA virtual-GPU (vGPU) software.[8]
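The bandwidth figures in this paragraph can be reproduced with simple arithmetic. The sketch below is illustrative only: the 18 Gbps per-pin GDDR6 data rate is an assumption inferred from 864 GB/s over a 384-bit bus, the PCIe numbers use the standard Gen4 signalling rate (16 GT/s per lane with 128b/130b encoding), and the function names are hypothetical.

```python
# Back-of-the-envelope bandwidth arithmetic for the figures quoted in the text.

def gddr_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def pcie_bandwidth_gb_s(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Usable one-direction PCIe bandwidth in GB/s after line encoding."""
    return gt_per_s * lanes * encoding / 8


# Assumption: 18 Gbps per pin, inferred from 864 GB/s on a 384-bit bus.
L40S_MEM_GB_S = gddr_bandwidth_gb_s(384, 18.0)       # 864.0
PCIE4_X16_ONE_WAY = pcie_bandwidth_gb_s(16.0, 16)    # about 31.5
PCIE4_X16_BIDIR = 2 * PCIE4_X16_ONE_WAY              # about 63, usually rounded to 64
H100_MEM_GB_S = 3350.0                               # roughly 3.35 TB/s, from the text

if __name__ == "__main__":
    print(f"L40S memory bandwidth:  {L40S_MEM_GB_S:.0f} GB/s")
    print(f"PCIe Gen4 x16 (bidir):  {PCIE4_X16_BIDIR:.1f} GB/s")
    print(f"H100 / L40S memory:     {H100_MEM_GB_S / L40S_MEM_GB_S:.1f}x")
    print(f"L40S memory / PCIe:     {L40S_MEM_GB_S / PCIE4_X16_BIDIR:.1f}x")
```

Under these assumptions, local GDDR6 is more than ten times faster than the PCIe link between cards, which is why the lack of NVLink matters most for workloads that exchange large amounts of data between GPUs.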
Physically, the L40S is a dual-slot, full-height PCIe card with a passive (fanless) cooler designed for the high-airflow chassis of data-center servers, drawing up to 350 W through a single 16-pin power connector.[2][8]
The following table lists the principal specifications of the NVIDIA L40S. Tensor throughput figures are quoted as dense, then with structured sparsity (dense | sparse), following NVIDIA's datasheet convention.[2]
| Specification | NVIDIA L40S |
|---|---|
| Architecture | Ada Lovelace (AD102) |
| Announced | August 8, 2023 (SIGGRAPH) |
| CUDA cores | 18,176 |
| Tensor Cores | 568 (4th generation) |
| RT Cores | 142 (3rd generation) |
| GPU memory | 48 GB GDDR6 with ECC |
| Memory interface | 384-bit |
| Memory bandwidth | 864 GB/s |
| FP32 | 91.6 TFLOPS |
| TF32 Tensor Core | 183 \| 366 TFLOPS |
| FP16 / BF16 Tensor Core | 362 \| 733 TFLOPS |
| FP8 Tensor Core | 733 \| 1,466 TFLOPS |
| INT8 Tensor Core | 733 \| 1,466 TOPS |
| RT Core performance | 212 TFLOPS |
| Interconnect | PCIe Gen4 x16 (no NVLink) |
| Form factor | Dual-slot, full-height; passive cooling |
| Max power | 350 W |
| Media engines | 3x NVENC (AV1), 3x NVDEC |
NVIDIA summarizes the card's AI capability as more than 1.45 petaflops of tensor processing, which corresponds to the 1,466 TFLOPS FP8 figure with structured sparsity in the table above.[2][3] The FP8 Transformer Engine "intelligently recasts between FP8 and FP16 precisions" to accelerate transformer-based neural networks while managing accuracy.[1] In NVIDIA's own comparisons against the previous-generation A100 Tensor Core GPU, the L40S was claimed to deliver up to 1.2 times higher generative-AI inference performance and up to 1.7 times higher training performance.[3]
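As a quick consistency check on the datasheet figures, the following sketch computes the sparse-to-dense ratio for each tensor precision and converts the largest figure to petaflops. The numbers are copied from the specification table; the dictionary and helper names are hypothetical.

```python
# (dense, sparse) Tensor Core throughput in TFLOPS (TOPS for INT8),
# copied from the specification table.
L40S_TENSOR = {
    "TF32": (183, 366),
    "FP16/BF16": (362, 733),
    "FP8": (733, 1466),
    "INT8": (733, 1466),
}


def sparsity_speedup(dense: float, sparse: float) -> float:
    """Ratio of the structured-sparsity figure to the dense figure."""
    return sparse / dense


def tflops_to_pflops(tflops: float) -> float:
    """Convert teraflops to petaflops (1 PFLOPS = 1,000 TFLOPS)."""
    return tflops / 1000


# Precision with the largest sparse figure; ties resolve to the first entry.
PEAK_PRECISION = max(L40S_TENSOR, key=lambda name: L40S_TENSOR[name][1])
PEAK_PFLOPS = tflops_to_pflops(L40S_TENSOR[PEAK_PRECISION][1])

if __name__ == "__main__":
    for name, (dense, sparse) in L40S_TENSOR.items():
        print(f"{name:10s} sparse/dense = {sparsity_speedup(dense, sparse):.2f}")
    print(f"Peak: {PEAK_PRECISION} at {PEAK_PFLOPS:.3f} PFLOPS")
```

Each sparse figure is about twice its dense counterpart, so the headline petaflops number applies only to workloads that can exploit structured sparsity; dense FP8 work should be sized against roughly half of it.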
The original NVIDIA L40 and the L40S are closely related but aimed at different priorities. Both are built on the AD102 die, carry 18,176 CUDA cores, 568 Tensor Cores, and 142 RT Cores, and use 48 GB of GDDR6 with ECC at 864 GB/s. Neither supports NVLink.[2][6] The differences are in clocks, power, and emphasis.
The L40, introduced in 2022, is positioned chiefly for graphics, visualization, virtual workstations, and as "the engine of NVIDIA Omniverse in the data center," with AI acceleration as a secondary capability.[5][6] The L40S runs at a higher power limit (350 W versus 300 W) and a modestly higher boost clock (around 2,520 MHz versus 2,490 MHz), which lifts its sustained compute throughput.[4][8] The most consequential gap is in AI tensor math: on the published datasheet figures, the L40S roughly doubles the L40's dense tensor throughput (for FP8, about 733 TFLOPS versus about 362 TFLOPS).[2][6] Both cards physically support FP8 because they share the Ada architecture, but NVIDIA tunes and markets the L40S around the Transformer Engine and AI compute, whereas the L40 leans toward rendering and visual computing.[1][5] The ray-tracing figures are nearly identical (about 212 TFLOPS on the L40S versus about 209 TFLOPS on the L40), reflecting the shared RT-core count.[2][6]
| Attribute | NVIDIA L40 | NVIDIA L40S |
|---|---|---|
| Launch | 2022 | 2023 |
| Primary focus | Graphics, visualization, Omniverse | AI compute, inference, generative AI |
| FP32 | 90.5 TFLOPS | 91.6 TFLOPS |
| FP8 Tensor (dense \| sparse) | ~362 \| 724 TFLOPS | 733 \| 1,466 TFLOPS |
| RT Core performance | ~209 TFLOPS | 212 TFLOPS |
| Max power | 300 W | 350 W |
| Memory | 48 GB GDDR6, 864 GB/s | 48 GB GDDR6, 864 GB/s |
| NVLink | No | No |
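The FP32 rows in this comparison follow from the shared core count and the two boost clocks. A minimal sketch, assuming the conventional peak-throughput formula of one fused multiply-add (two floating-point operations) per CUDA core per clock, and using the approximate boost clocks quoted in the text; the function name is hypothetical.

```python
CUDA_CORES = 18_176          # shared by the L40 and the L40S
FLOPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add counts as two operations


def fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS at the given boost clock."""
    flops_per_second = cuda_cores * FLOPS_PER_CORE_PER_CLOCK * boost_mhz * 1e6
    return flops_per_second / 1e12


# Boost clocks are the approximate values quoted in the text.
L40S_FP32 = fp32_tflops(CUDA_CORES, 2520)  # about 91.6
L40_FP32 = fp32_tflops(CUDA_CORES, 2490)   # about 90.5

if __name__ == "__main__":
    print(f"L40S: {L40S_FP32:.1f} TFLOPS")
    print(f"L40:  {L40_FP32:.1f} TFLOPS")
    print(f"Uplift from clock alone: {(L40S_FP32 / L40_FP32 - 1) * 100:.1f}%")
```

The clock difference accounts for only about a 1% gain in peak FP32, which suggests the much larger tensor-throughput gap between the two datasheets is not explained by boost clock alone.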
The L40S arrived at a moment of acute scarcity for NVIDIA's flagship data-center accelerator. Through late 2023 and into 2024, the H100 (an NVIDIA Hopper architecture product) was extremely difficult to obtain, with reported lead times stretching to many months as TSMC's CoWoS advanced-packaging capacity and HBM supply constrained output.[7] Because the L40S uses standard GDDR6 rather than HBM and does not require CoWoS packaging, it could be manufactured and shipped in far greater volume, often described as available in weeks rather than quarters.[7]
This availability, combined with lower cost and power, made the L40S an attractive substitute for buyers who could not secure Hopper-class hardware. Industry analysis at the time characterized the trade-off plainly: across a broad range of AI training and inference workloads the H100 offered roughly two to three times the performance of an L40S, but the H100 also cost on the order of three times as much and consumed about twice the power.[7] For inference and for fine-tuning smaller models, where the L40S's 48 GB of memory and strong FP8 throughput were sufficient, the card frequently delivered better performance per dollar and per watt.[7] Its principal limitations relative to the H100 were the lower memory bandwidth of GDDR6, the absence of NVLink for tightly coupled multi-GPU scaling, and very limited double-precision (FP64) throughput. One analysis put FP64 at about 1.4 teraflops running in emulation mode, calling it "basically non-existent", which restricts the card's usefulness for traditional high-performance computing.[7]
| Attribute | NVIDIA L40S | NVIDIA H100 (SXM) |
|---|---|---|
| Architecture | Ada Lovelace (AD102) | Hopper (GH100) |
| Memory | 48 GB GDDR6, 864 GB/s | 80 GB HBM3, ~3.35 TB/s |
| NVLink | No (PCIe Gen4 only) | Yes |
| Relative AI performance | Baseline | ~2-3x the L40S |
| Relative cost | Baseline | ~3x the L40S |
| Relative power | Baseline | ~2x the L40S |
| FP64 | Minimal | High (HPC-class) |
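The relative figures in this table can be turned into rough efficiency ratios. The sketch below uses only the approximate multipliers quoted above (H100 at about 2 to 3 times the performance, about 3 times the cost, and about 2 times the power of an L40S); the class and variable names are hypothetical, and real prices and measured throughput vary by workload and vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelativeGpu:
    """A GPU described only by ratios against the L40S baseline of 1.0."""
    name: str
    perf: float
    cost: float
    power: float

    @property
    def perf_per_dollar(self) -> float:
        return self.perf / self.cost

    @property
    def perf_per_watt(self) -> float:
        return self.perf / self.power


L40S = RelativeGpu("L40S", perf=1.0, cost=1.0, power=1.0)
# Two ends of the quoted "roughly two to three times" performance range.
H100_LOW = RelativeGpu("H100 (2x perf)", perf=2.0, cost=3.0, power=2.0)
H100_HIGH = RelativeGpu("H100 (3x perf)", perf=3.0, cost=3.0, power=2.0)

if __name__ == "__main__":
    for gpu in (L40S, H100_LOW, H100_HIGH):
        print(f"{gpu.name:15s} perf/$ = {gpu.perf_per_dollar:.2f}  "
              f"perf/W = {gpu.perf_per_watt:.2f}")
```

With these ratios alone, the L40S leads or ties on performance per dollar across the whole range, while performance per watt breaks even when the H100 is twice as fast and favors the H100 beyond that. Read this way, a per-watt advantage for the L40S would apply to workloads where the H100's speedup falls below about 2x; that is an inference from the quoted multipliers, not a figure taken from the cited analysis.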
NVIDIA reinforced this positioning by making the L40S a centerpiece of its OVX server line for generative AI and Omniverse, with configurations supporting up to eight L40S GPUs per system.[3] Because the card lacks NVLink, deployments that needed many GPUs typically relied on PCIe and, in some composable-infrastructure designs, attached large numbers of L40S cards to a single host to maximize density and throughput.[7]
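For capacity planning, the per-card figures scale linearly across a multi-GPU system. The sketch below totals memory and power for an eight-card configuration and estimates how many cards are needed just to hold a model's weights. It is a rough, weights-only estimate: the 80% usable-memory fraction and the example model sizes are hypothetical, and it ignores activations, KV cache, and the cost of splitting a model across cards connected by PCIe.

```python
import math

L40S_MEMORY_GB = 48      # per card, from the specification table
L40S_MAX_POWER_W = 350   # per card, from the specification table
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def system_totals(num_gpus: int) -> tuple[int, int]:
    """Aggregate GPU memory (GB) and peak GPU power (W) for a system."""
    return num_gpus * L40S_MEMORY_GB, num_gpus * L40S_MAX_POWER_W


def weights_gb(params_billions: float, dtype: str) -> float:
    """Weight footprint in GB: one billion parameters at one byte each is 1 GB."""
    return params_billions * BYTES_PER_PARAM[dtype]


def min_gpus_for_weights(params_billions: float, dtype: str,
                         usable_fraction: float = 0.8) -> int:
    """Smallest card count whose usable memory holds the weights.

    usable_fraction is a hypothetical allowance for runtime overhead.
    """
    usable_per_gpu = L40S_MEMORY_GB * usable_fraction
    return math.ceil(weights_gb(params_billions, dtype) / usable_per_gpu)


if __name__ == "__main__":
    memory_gb, power_w = system_totals(8)
    print(f"8x L40S: {memory_gb} GB GPU memory, {power_w} W peak GPU power")
    for size, dtype in [(13, "fp16"), (70, "fp16"), (70, "fp8")]:
        print(f"{size}B params at {dtype}: "
              f"{min_gpus_for_weights(size, dtype)} card(s) for weights")
```

Under these assumptions, FP8 halves the number of cards needed for a given model compared with FP16, which matters more on a PCIe-only system because fewer cards means less traffic over the host interface.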
At launch, NVIDIA lined up broad original-equipment-manufacturer (OEM) support for the L40S, with ASUS, Dell Technologies, GIGABYTE, Hewlett Packard Enterprise, Lenovo, QCT, and Supermicro all planning OVX systems built around the card, and general availability beginning in the fall of 2023.[3] The L40S was subsequently offered widely by cloud and GPU-rental providers as a mid-tier accelerator for inference, generative-AI serving, fine-tuning, rendering, and virtual-desktop workloads, frequently paired with software such as NVIDIA's TensorRT-LLM inference library to extract maximum performance from its FP8 Transformer Engine.[1]
The L40S's significance lies in how it broadened access to capable AI compute during a period when the most powerful accelerators were rationed. By repackaging the large Ada Lovelace die with a generous pool of conventional GDDR6 memory and a high power budget, NVIDIA produced a versatile card that, while well short of the H100 in raw throughput and memory bandwidth, was abundant, affordable, and good enough for a wide swath of inference and generative-AI deployments.[7] It sits in NVIDIA's data-center lineup above the compact, low-power L4 (also an Ada Lovelace product) and below the HBM-based Hopper accelerators, occupying the role of a high-volume, general-purpose GPU for organizations that prioritized availability and cost efficiency over peak performance.[1][7]