NVIDIA L40S

Updated Jun 27, 2026

The NVIDIA L40S is a data-center GPU based on the Ada Lovelace architecture that NVIDIA markets as a single "universal" accelerator for AI inference and training, generative AI, 3D graphics, rendering, and video processing. NVIDIA announced it at the SIGGRAPH conference on August 8, 2023. The card pairs 48 GB of GDDR6 memory with fourth-generation Tensor Cores and an FP8 Transformer Engine, and NVIDIA rates it at more than 1.45 petaflops of tensor processing on a standard PCIe card with no NVLink.^[1]^[2]^[3] During the 2023 to 2024 GPU supply crunch it became commercially significant as a more available and less expensive alternative to the H100 for inference and mixed AI-plus-graphics work, including workloads on the Omniverse platform.^[7]

Built on the same large AD102 die used in NVIDIA's flagship Ada Lovelace products, the L40S is the compute-rebalanced sibling of the earlier NVIDIA L40, a graphics- and visualization-focused card released in late 2022.^[4]^[5] The two share the same die, 48 GB memory configuration, and physical form factor, but the L40S raises clock speeds and power and is positioned as a compute accelerator rather than primarily a visualization product.^[4]

What is the NVIDIA L40S?

The NVIDIA L40S is a dual-slot PCIe data-center GPU built on the Ada Lovelace architecture and announced at SIGGRAPH on August 8, 2023.^[3] NVIDIA describes it as "a powerful, universal data center processor designed to accelerate the most compute-intensive, complex applications, including AI training and inference, 3D design and visualization, video processing and industrial digitalization with the NVIDIA Omniverse platform."^[3] In NVIDIA's framing, the card's defining trait is breadth: rather than specializing in either rendering (like the L40) or peak AI throughput (like the H100), it is tuned to do a wide range of AI, graphics, and video work acceptably well on widely available hardware.^[1]^[4]

Bob Pette, vice president of professional visualization at NVIDIA, summarized the positioning at launch: "As generative AI transforms every industry, enterprises are increasingly seeking large-scale compute resources in the data center. OVX systems with NVIDIA L40S GPUs accelerate AI, graphics and video processing workloads, and meet the demanding performance requirements of an ever-increasing set of complex and diverse applications."^[3]

What architecture and design does the L40S use?

The L40S is built on NVIDIA's Ada Lovelace architecture and the AD102 graphics processor, the largest die in the Ada generation.^[2]^[4] Ada Lovelace combines fourth-generation Tensor Cores, whose Transformer Engine supports the FP8 (8-bit floating point) data format, with third-generation ray-tracing (RT) cores and the architecture's CUDA cores.^[1]^[2] The L40S exposes 18,176 CUDA cores, 568 fourth-generation Tensor Cores, and 142 third-generation RT Cores.^[2]

A defining characteristic of the L40S is its memory subsystem. Rather than the high-bandwidth memory (HBM) used on NVIDIA's top-tier compute accelerators such as the A100 and H100, the L40S uses 48 GB of conventional GDDR6 memory with error-correcting code (ECC) on a 384-bit bus, providing 864 GB/s of bandwidth.^[2]^[8] This is considerably less bandwidth than an H100 (which offers roughly 3.35 TB/s with HBM3), but GDDR6 is cheaper and more readily manufactured, and 48 GB is a large pool relative to the card's class.^[7] The card also omits NVLink, NVIDIA's high-speed GPU-to-GPU interconnect; multiple L40S cards communicate over the standard PCIe Gen4 x16 host interface (about 64 GB/s bidirectional) rather than a dedicated GPU fabric.^[2]^[8] The L40S also lacks Multi-Instance GPU (MIG) partitioning, though it does support NVIDIA virtual-GPU (vGPU) software.^[8]
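
The bandwidth figures in this paragraph can be reproduced with simple arithmetic. The sketch below is illustrative only. The 18 Gbit/s per-pin GDDR6 data rate is an assumption that the article does not state (it is the rate at which a 384-bit bus yields 864 GB/s), the PCIe calculation assumes the standard Gen4 signaling rate of 16 GT/s per lane with 128b/130b encoding, and the function names are hypothetical.

```python
# Peak-bandwidth arithmetic for the L40S memory and host interfaces.


def gddr_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus pins * per-pin rate / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def pcie_bandwidth_gbs(lanes: int, gt_per_s: float, encoding: float = 128 / 130) -> float:
    """Peak one-direction PCIe bandwidth in GB/s after line-encoding overhead."""
    return lanes * gt_per_s * encoding / 8


# ASSUMPTION: 18 Gbit/s per pin is not stated in the article.
memory_bw = gddr_bandwidth_gbs(bus_width_bits=384, data_rate_gbps=18.0)
pcie_one_way = pcie_bandwidth_gbs(lanes=16, gt_per_s=16.0)  # PCIe Gen4 x16
pcie_both_ways = 2 * pcie_one_way
h100_bw = 3350.0  # GB/s, the article's approximate HBM3 figure

print(f"L40S memory bandwidth: {memory_bw:.0f} GB/s")
print(f"PCIe Gen4 x16: {pcie_one_way:.1f} GB/s per direction, "
      f"{pcie_both_ways:.1f} GB/s bidirectional")
print(f"H100 / L40S memory bandwidth: {h100_bw / memory_bw:.2f}x")
print(f"L40S memory / PCIe bidirectional: {memory_bw / pcie_both_ways:.1f}x")
```

The last ratio shows why the missing NVLink matters for multi-GPU work: under these assumptions, data crossing between cards moves more than ten times slower than data read from a card's own memory.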

Physically, the L40S is a dual-slot, full-height PCIe card with a passive (fanless) cooler designed for the high-airflow chassis of data-center servers, drawing up to 350 W through a single 16-pin power connector.^[2]^[8]

What are the L40S specifications?

The following table lists the principal specifications of the NVIDIA L40S. Tensor throughput figures are quoted as dense, then with structured sparsity (dense | sparse), following NVIDIA's datasheet convention.^[2]

Specification	NVIDIA L40S
Architecture	Ada Lovelace (AD102)
Announced	August 8, 2023 (SIGGRAPH)
CUDA cores	18,176
Tensor Cores	568 (4th generation)
RT Cores	142 (3rd generation)
GPU memory	48 GB GDDR6 with ECC
Memory interface	384-bit
Memory bandwidth	864 GB/s
FP32	91.6 TFLOPS
TF32 Tensor Core	183 | 366 TFLOPS
FP16 / BF16 Tensor Core	362 | 733 TFLOPS
FP8 Tensor Core	733 | 1,466 TFLOPS
INT8 Tensor Core	733 | 1,466 TOPS
RT Core performance	212 TFLOPS
Interconnect	PCIe Gen4 x16 (no NVLink)
Form factor	Dual-slot, full-height; passive cooling
Max power	350 W
Media engines	3x NVENC (AV1), 3x NVDEC

NVIDIA summarizes the card's AI capability as more than 1.45 petaflops of tensor processing, a headline that corresponds to the 1,466 TFLOPS FP8 figure with structured sparsity in the table above.^[2]^[3] The FP8 Transformer Engine, inherited from the Ada Lovelace architecture, "intelligently recasts between FP8 and FP16 precisions" to accelerate transformer-based neural networks while managing accuracy.^[1] In NVIDIA's own comparisons against the previous-generation A100 Tensor Core GPU, the L40S was claimed to deliver up to 1.2 times higher generative-AI inference performance and up to 1.7 times higher training performance.^[3]
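
The relationships among the Tensor Core rows can be checked directly from the table. The following sketch copies the dense and sparse values listed above and computes two ratios: sparse over dense within each precision, and the dense gain from each step down in precision. The dictionary and function names are hypothetical, and the code draws on nothing beyond the table's own numbers.

```python
# (dense, sparse) Tensor Core throughput from the specification table, in TFLOPS.
TENSOR_TFLOPS = {
    "TF32": (183, 366),
    "FP16/BF16": (362, 733),
    "FP8": (733, 1466),
}


def sparsity_ratio(dense: float, sparse: float) -> float:
    """How much the structured-sparsity figure exceeds the dense figure."""
    return sparse / dense


for name, (dense, sparse) in TENSOR_TFLOPS.items():
    ratio = sparsity_ratio(dense, sparse)
    print(f"{name:10s} dense={dense:5d}  sparse={sparse:5d}  sparse/dense={ratio:.2f}")

# Dense throughput gained by each step down in precision.
fp16_over_tf32 = TENSOR_TFLOPS["FP16/BF16"][0] / TENSOR_TFLOPS["TF32"][0]
fp8_over_fp16 = TENSOR_TFLOPS["FP8"][0] / TENSOR_TFLOPS["FP16/BF16"][0]
print(f"FP16 vs TF32 (dense): {fp16_over_tf32:.2f}x")
print(f"FP8 vs FP16 (dense):  {fp8_over_fp16:.2f}x")

# The largest figure in the table, expressed in petaflops.
peak_pflops = max(sparse for _, sparse in TENSOR_TFLOPS.values()) / 1000
print(f"Peak listed tensor throughput: {peak_pflops:.3f} PFLOPS")
```

Every ratio comes out close to 2, so the headline petaflops number combines two doublings (FP8 instead of FP16, and structured sparsity) on top of the dense FP16 rate. Workloads that use neither should expect the lower rows of the table.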

How does the L40S compare to the L40?

The NVIDIA L40 and the L40S are closely related but aimed at different priorities. Both are built on the AD102 die, carry 18,176 CUDA cores, 568 Tensor Cores, and 142 RT Cores, and use 48 GB of GDDR6 with ECC at 864 GB/s. Neither supports NVLink.^[2]^[6] The differences are in clocks, power, and emphasis.

The L40, introduced in 2022, is positioned chiefly for graphics, visualization, virtual workstations, and as "the engine of NVIDIA Omniverse in the data center," with AI acceleration as a secondary capability.^[5]^[6] The L40S runs at a higher power limit (350 W versus 300 W) and a modestly higher boost clock (around 2,520 MHz versus 2,490 MHz), which lifts its sustained compute throughput.^[4]^[8] The most consequential gap is in AI tensor math: the L40S roughly doubles the L40's dense FP8 and FP16 tensor throughput (for FP8, about 733 TFLOPS dense versus about 362 TFLOPS dense).^[2]^[6] Both cards physically support FP8 because they share the Ada architecture, but NVIDIA tunes and markets the L40S around the Transformer Engine and AI compute, whereas the L40 leans toward rendering and visual computing.^[1]^[5] The ray-tracing figures are nearly identical (about 212 TFLOPS on the L40S versus about 209 TFLOPS on the L40), reflecting the shared RT-core count.^[2]^[6]

Attribute	NVIDIA L40	NVIDIA L40S
Launch	2022	2023
Primary focus	Graphics, visualization, Omniverse	AI compute, inference, generative AI
FP32	90.5 TFLOPS	91.6 TFLOPS
FP8 Tensor (dense | sparse)	~362 | 724 TFLOPS	733 | 1,466 TFLOPS
RT Core performance	~209 TFLOPS	212 TFLOPS
Max power	300 W	350 W
Memory	48 GB GDDR6, 864 GB/s	48 GB GDDR6, 864 GB/s
NVLink	No	No
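
The FP32 rows of this comparison follow from the shared core count and the two boost clocks quoted above. As a rough sketch, the code below assumes the usual peak-rate convention of one fused multiply-add (two floating-point operations) per CUDA core per clock. That convention is an assumption here rather than something the article states, and the function name is hypothetical.

```python
CUDA_CORES = 18_176  # identical on the L40 and the L40S


def peak_fp32_tflops(cores: int, boost_mhz: float) -> float:
    """Peak FP32 rate, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * boost_mhz / 1e6


l40_fp32 = peak_fp32_tflops(CUDA_CORES, 2490)
l40s_fp32 = peak_fp32_tflops(CUDA_CORES, 2520)
clock_gain = 2520 / 2490
fp8_gain = 733 / 362  # dense FP8 figures from the table above

print(f"L40  FP32: {l40_fp32:.1f} TFLOPS")
print(f"L40S FP32: {l40s_fp32:.1f} TFLOPS")
print(f"Boost-clock gain:      {clock_gain:.3f}x")
print(f"Listed dense FP8 gain: {fp8_gain:.2f}x")
```

Under that assumption the results round to the 90.5 and 91.6 TFLOPS in the table. The clock difference accounts for the roughly 1% FP32 gap, but it cannot by itself account for the roughly 2x gap in listed FP8 throughput.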

How does the L40S compare to the H100?

The L40S arrived at a moment of acute scarcity for NVIDIA's flagship data-center accelerator. Through late 2023 and into 2024, the H100 (an NVIDIA Hopper architecture product) was extremely difficult to obtain, with reported lead times stretching to many months as TSMC's CoWoS advanced-packaging capacity and HBM supply constrained output.^[7] Because the L40S uses standard GDDR6 rather than HBM and does not require CoWoS packaging, it could be manufactured and shipped in far greater volume, often described as available in weeks rather than quarters.^[7]

This availability, combined with lower cost and power, made the L40S an attractive substitute for buyers who could not secure Hopper-class hardware. Industry analysis at the time characterized the trade-off plainly: across a broad range of AI training and inference workloads the H100 offered roughly two to three times the performance of an L40S, but the H100 also cost on the order of three times as much and consumed about twice the power.^[7] For inference and for fine-tuning smaller models, where the L40S's 48 GB of memory and strong FP8 throughput were sufficient, the card frequently delivered better performance per dollar and per watt.^[7] Its principal limitations relative to the H100 were the lower memory bandwidth of GDDR6, the absence of NVLink for tightly coupled multi-GPU scaling, and very limited double-precision (FP64) throughput (one analysis put it at about 1.4 teraflops running in emulation mode, calling it "basically non-existent"), which restricts its usefulness for traditional high-performance computing.^[7]

Attribute	NVIDIA L40S	NVIDIA H100 (SXM)
Architecture	Ada Lovelace (AD102)	Hopper (GH100)
Memory	48 GB GDDR6, 864 GB/s	80 GB HBM3, ~3.35 TB/s
NVLink	No (PCIe Gen4 only)	Yes
Relative AI performance	Baseline	~2-3x the L40S
Relative cost	Baseline	~3x the L40S
Relative power	Baseline	~2x the L40S
FP64	Minimal	High (HPC-class)
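
The relative figures in this table can be combined into performance per dollar and performance per watt. The sketch below treats the L40S as the 1.0 baseline and uses only the round ratios quoted above (about 2 to 3 times the performance, about 3 times the cost, about 2 times the power). These are coarse industry estimates rather than measurements, and the names are hypothetical.

```python
# H100 figures relative to an L40S baseline of 1.0, from the table above.
H100_PERF_RANGE = (2.0, 3.0)
H100_COST = 3.0
H100_POWER = 2.0


def efficiency_range(perf_range: tuple[float, float], divisor: float) -> tuple[float, float]:
    """Divide a (low, high) relative-performance range by a relative cost or power."""
    low, high = perf_range
    return (low / divisor, high / divisor)


h100_per_dollar = efficiency_range(H100_PERF_RANGE, H100_COST)
h100_per_watt = efficiency_range(H100_PERF_RANGE, H100_POWER)

print(f"H100 perf per dollar vs L40S: {h100_per_dollar[0]:.2f}x to {h100_per_dollar[1]:.2f}x")
print(f"H100 perf per watt vs L40S:   {h100_per_watt[0]:.2f}x to {h100_per_watt[1]:.2f}x")
```

With these round numbers the L40S matches or beats the H100 per dollar across the whole range. Per watt it only draws level at the low end, so a per-watt advantage for the L40S depends on workloads where the H100's lead is smaller than about 2x.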

NVIDIA reinforced this positioning by making the L40S a centerpiece of its OVX server line for generative AI and Omniverse, with configurations supporting up to eight L40S GPUs per system.^[3] Because the card lacks NVLink, deployments that needed many GPUs typically relied on PCIe and, in some composable-infrastructure designs, attached large numbers of L40S cards to a single host to maximize density and throughput.^[7]

How was the L40S adopted, and why does it matter?

At launch, NVIDIA lined up broad original-equipment-manufacturer (OEM) support for the L40S, with ASUS, Dell Technologies, GIGABYTE, Hewlett Packard Enterprise, Lenovo, QCT, and Supermicro all planning OVX systems built around the card, and general availability beginning in the fall of 2023.^[3] The L40S was subsequently offered widely by cloud and GPU-rental providers as a mid-tier accelerator for inference, generative-AI serving, fine-tuning, rendering, and virtual-desktop workloads, frequently paired with software such as NVIDIA's TensorRT-LLM inference library to extract maximum performance from its FP8 Transformer Engine.^[1]

The L40S's significance lies in how it broadened access to capable AI compute during a period when the most powerful accelerators were rationed. By repackaging the large Ada Lovelace die with a generous pool of conventional GDDR6 memory and a high power budget, NVIDIA produced a versatile card that, while well short of the H100 in raw throughput and memory bandwidth, was abundant, affordable, and good enough for a wide swath of inference and generative-AI deployments.^[7] It sits in NVIDIA's data-center lineup above the compact, low-power L4 (also an Ada Lovelace product) and below the HBM-based Hopper accelerators, occupying the role of a high-volume, general-purpose GPU for organizations that prioritized availability and cost efficiency over peak performance.^[1]^[7]

References

1. NVIDIA, "L40S GPU for AI and Graphics Performance." https://www.nvidia.com/en-us/data-center/l40s/
2. NVIDIA, "NVIDIA L40S Datasheet." https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413
3. NVIDIA Newsroom, "NVIDIA, Global Data Center System Manufacturers to Supercharge Generative AI and Industrial Digitalization," August 8, 2023. https://nvidianews.nvidia.com/news/nvidia-global-data-center-system-manufacturers-to-supercharge-generative-ai-and-industrial-digitalization
4. ServeTheHome, "NVIDIA L40S GPU for Data Center Visualization Launched." https://www.servethehome.com/nvidia-l40s-gpu-for-data-center-visualization-launched/
5. NVIDIA, "NVIDIA L40 GPU for Data Center." https://www.nvidia.com/en-us/data-center/l40/
6. NVIDIA, "NVIDIA L40 Datasheet," January 2023. https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/support-guide/NVIDIA-L40-Datasheet-January-2023.pdf
7. The Next Platform, "What To Do When You Can't Get Nvidia H100 GPUs," November 17, 2023. https://www.nextplatform.com/2023/11/17/what-to-do-when-you-cant-get-nvidia-h100-gpus/
8. Lenovo Press, "ThinkSystem NVIDIA L40S 48GB PCIe Gen4 Passive GPU Product Guide." https://lenovopress.lenovo.com/lp1812-nvidia-l40s-48gb-pcie-gen4-passive-gpu
