NVIDIA L4
Last reviewed
Jun 3, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 · 1,501 words
The NVIDIA L4 is a compact, power-efficient data-center GPU built on the Ada Lovelace architecture and optimized for artificial-intelligence inference and video processing. Announced by NVIDIA at the GTC conference on March 21, 2023, the L4 is a single-slot, low-profile card that operates within a 72-watt power envelope, so it can be powered entirely by a PCIe slot with no supplemental power connector. It is positioned as the successor to the Turing-based Tesla T4 and is aimed at mainstream and edge servers where energy efficiency, server density, and a broad mix of inference, video, and graphics workloads matter more than raw peak throughput.[1][2]
NVIDIA markets the L4 as a "universal" accelerator: a single card intended to handle generative AI inference, large language model serving, image classification, object detection, speech recognition, recommendation, video transcoding, and cloud graphics. Its combination of fourth-generation Tensor Cores, dedicated video engines with AV1 support, and a low thermal footprint made it a widely deployed inference and video GPU across major cloud providers following its launch.[2][3]
The L4 is the inference-and-video member of NVIDIA's Ada Lovelace data-center lineup, which also includes the higher-end L40S. It is built on the AD104 graphics processor, the same die family used in several GeForce RTX 40-series and professional Ada cards, fabricated on TSMC's 4N process. Unlike consumer cards, the L4 ships with no display outputs, uses passive cooling intended for server airflow, and is tuned for sustained data-center duty cycles rather than peak clocks.[4][5]
Architecturally, the L4 inherits the key Ada Lovelace features: fourth-generation Tensor Cores that add FP8 precision through the Transformer Engine, third-generation RT cores for ray tracing, and Shader Execution Reordering. FP8 support is significant for AI inference because it roughly doubles Tensor Core throughput relative to FP16 while preserving accuracy for many transformer models, and it pairs with NVIDIA's inference software such as TensorRT-LLM and the Triton Inference Server. The card does not support NVLink or Multi-Instance GPU (MIG) partitioning.[2][5]
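Besides throughput, lower precision also shrinks model weights, which matters on a card with 24 GB of memory. The following is a minimal sketch of a weights-only sizing estimate, not a measurement: the model sizes, the `headroom_gb` allowance, and the function names are illustrative assumptions, and a real deployment also needs room for the KV cache, activations, and runtime overhead.

```python
# Rough, weights-only sizing sketch. All names here are hypothetical helpers.
L4_MEMORY_GB = 24  # memory capacity listed in the L4 specifications

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP16": 2, "FP8": 1}


def weight_footprint_gb(params_billion, precision):
    """Weights-only size in GB (1 GB = 1e9 bytes) for a model of the given size."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits_on_l4(params_billion, precision, headroom_gb=4.0):
    """True if the weights plus an assumed headroom allowance fit in 24 GB."""
    needed = weight_footprint_gb(params_billion, precision) + headroom_gb
    return needed <= L4_MEMORY_GB


if __name__ == "__main__":
    for size in (7, 13):  # hypothetical model sizes, in billions of parameters
        for precision in ("FP16", "FP8"):
            gb = weight_footprint_gb(size, precision)
            print(f"{size}B @ {precision}: {gb} GB, fits: {fits_on_l4(size, precision)}")
```

Under these assumptions, a hypothetical 13-billion-parameter model would not fit at FP16 but would at FP8, which illustrates why FP8 is relevant to single-card inference beyond its throughput benefit.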
A distinguishing trait of the L4 is its media subsystem. The GPU integrates two NVENC encoders, four NVDEC decoders, and four JPEG decoders, with hardware AV1 encode and decode. NVIDIA states a single L4 can process more than 1,000 concurrent AV1 video streams at 720p30, making the card attractive for video-on-demand, live transcoding, and AI video pipelines that combine decoding, inference, and re-encoding on one device.[2][3]
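A GPU-resident transcode of the kind described above can be sketched with FFmpeg's NVIDIA hardware paths. The example below only builds the command line and does not run it. It assumes an FFmpeg build with CUDA/NVENC support that includes the `av1_nvenc` encoder and the `scale_cuda` filter; the file names, the 2 Mbit/s bitrate, and the helper name are illustrative assumptions rather than recommended settings.

```python
import shlex


def build_av1_transcode_command(src, dst, width=1280, height=720, bitrate="2M"):
    """Return an ffmpeg argument list that decodes, scales, and AV1-encodes on the GPU.

    Hypothetical helper: it only assembles arguments and does not invoke ffmpeg.
    """
    return [
        "ffmpeg", "-y",
        "-hwaccel", "cuda",                # decode on the GPU (NVDEC)
        "-hwaccel_output_format", "cuda",  # keep decoded frames in GPU memory
        "-i", src,
        "-vf", f"scale_cuda={width}:{height}",  # resize on the GPU
        "-c:v", "av1_nvenc",               # hardware AV1 encode (NVENC)
        "-b:v", bitrate,
        "-c:a", "copy",                    # pass audio through untouched
        dst,
    ]


if __name__ == "__main__":
    command = build_av1_transcode_command("input.mp4", "output.mkv")
    print(shlex.join(command))
```

Keeping frames in GPU memory between decode, scaling, and encode avoids round trips over PCIe, which is the main reason such pipelines are usually kept on one device.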
The following figures are drawn from NVIDIA's L4 product page and product brief. Tensor Core throughput numbers are quoted with structured sparsity enabled; the dense (non-sparse) values are half of those listed.[1][3]
| Specification | NVIDIA L4 |
|---|---|
| Architecture | Ada Lovelace |
| GPU die | AD104 |
| Process | TSMC 4N |
| CUDA cores | 7,424 |
| Streaming multiprocessors | 60 |
| Tensor Cores | 240 (4th generation) |
| RT cores | 60 (3rd generation) |
| FP32 | 30.3 TFLOPS |
| TF32 Tensor Core | 120 TFLOPS |
| FP16 / BF16 Tensor Core | 242 TFLOPS |
| FP8 Tensor Core | 485 TFLOPS |
| INT8 Tensor Core | 485 TOPS |
| Memory | 24 GB GDDR6 (with ECC) |
| Memory interface | 192-bit |
| Memory bandwidth | 300 GB/s |
| Interconnect | PCIe Gen4 x16 |
| Video engines | 2 NVENC, 4 NVDEC, 4 JPEG decoders (AV1 encode/decode) |
| Form factor | Single-slot, low-profile (169 mm x 69 mm) |
| Cooling | Passive |
| Maximum power | 72 W |
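Because the Tensor Core rows in the table are quoted with structured sparsity, the dense figures can be recovered by halving them. A minimal sketch, using only the values in the table (the dictionary and function names are illustrative):

```python
# Sparse Tensor Core figures copied from the specification table
# (TFLOPS for floating-point formats, TOPS for INT8).
SPARSE_THROUGHPUT = {
    "TF32": 120.0,
    "FP16/BF16": 242.0,
    "FP8": 485.0,
    "INT8": 485.0,
}


def dense_throughput(sparse):
    """Halve each sparse figure to obtain the dense (non-sparse) value."""
    return {precision: value / 2 for precision, value in sparse.items()}


DENSE_THROUGHPUT = dense_throughput(SPARSE_THROUGHPUT)

if __name__ == "__main__":
    for precision, value in DENSE_THROUGHPUT.items():
        print(f"{precision}: {value:g} dense")
```

Dense figures are the fairer basis when comparing against accelerators whose published numbers do not assume sparsity.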
The L4 is the direct successor to the Tesla T4, NVIDIA's previous low-profile inference GPU, which launched in 2018 on the Turing architecture. The two cards share the same physical concept (a single-slot, low-profile, bus-powered form factor for high-density servers), but the L4 improves on nearly every metric. NVIDIA reports that customers migrating from the T4 to the L4 can expect roughly 2x to 4x better performance, with up to 2.7x more generative AI throughput and up to 3x more AI video encode and decode performance on Ada Lovelace's upgraded media engines.[2][6]
| Feature | NVIDIA L4 (2023) | NVIDIA T4 (2018) |
|---|---|---|
| Architecture | Ada Lovelace | Turing |
| GPU die | AD104 | TU104 |
| CUDA cores | 7,424 | 2,560 |
| Tensor Cores | 240 (4th gen) | 320 (2nd gen) |
| Memory | 24 GB GDDR6 | 16 GB GDDR6 |
| Memory bandwidth | 300 GB/s | 320 GB/s |
| FP32 | 30.3 TFLOPS | 8.1 TFLOPS |
| Max power | 72 W | 70 W |
| Form factor | Single-slot low-profile | Single-slot low-profile |
The comparison shows the L4's design philosophy: it keeps the T4's tight power and physical budget (only 2 watts higher TDP and the same slot footprint) while adding 50 percent more memory, far more compute, modern FP8 precision, and AV1 video. The slightly lower memory bandwidth versus the T4 reflects the L4's narrower 192-bit bus, a tradeoff that NVIDIA offsets with the larger 48 MB L2 cache and higher clocks of the Ada architecture.[4][6]
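The ratios behind this comparison can be checked directly from the two tables. The sketch below uses only figures quoted in this article; the helper names are illustrative, and the per-pin data rate is derived from the stated bandwidth and bus width rather than taken from a datasheet.

```python
# Figures copied from the specification and comparison tables.
L4 = {"cuda_cores": 7424, "memory_gb": 24, "bandwidth_gbs": 300,
      "fp32_tflops": 30.3, "power_w": 72}
T4 = {"cuda_cores": 2560, "memory_gb": 16, "bandwidth_gbs": 320,
      "fp32_tflops": 8.1, "power_w": 70}


def ratio(metric):
    """L4 value divided by T4 value for one metric."""
    return L4[metric] / T4[metric]


def fp32_per_watt(card):
    """Peak FP32 TFLOPS per watt of maximum board power."""
    return card["fp32_tflops"] / card["power_w"]


def per_pin_gbps(bandwidth_gbs, bus_width_bits):
    """Per-pin data rate in Gbit/s implied by total bandwidth and bus width."""
    return bandwidth_gbs * 8 / bus_width_bits


if __name__ == "__main__":
    for metric in L4:
        print(f"{metric}: {ratio(metric):.2f}x")
    print(f"FP32 per watt: {fp32_per_watt(L4) / fp32_per_watt(T4):.2f}x")
    print(f"L4 implied data rate: {per_pin_gbps(300, 192)} Gbit/s per pin")
```

The memory ratio of 1.5 corresponds to the "50 percent more memory" noted above, and the bandwidth ratio of about 0.94 shows how small the regression is compared with the gains elsewhere.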
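The L4's primary strengths follow from its balance of capability and efficiency: a 72-watt, bus-powered, single-slot form factor that suits dense servers; 24 GB of memory and FP8-capable Tensor Cores for inference; and dedicated media engines with AV1 encode and decode for video.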
Because of these traits, the L4 is frequently chosen for edge AI, retail and smart-city video analytics, cloud gaming and virtual workstations, and as an entry point for generative AI inference where the larger memory and power of cards like the L40S or H100 are unnecessary.
Google Cloud was the first cloud provider to offer the L4, debuting it in the G2 accelerator-optimized machine family. NVIDIA and Google announced the partnership alongside the GTC launch, with the L4 entering private preview on Google Cloud on March 22, 2023, public preview on April 4, 2023, and general availability on May 9, 2023. G2 instances can be configured with 1 to 8 L4 GPUs and are also offered through Vertex AI for model serving. Google described the G2 as targeting cost-efficient inference, video transcoding, and graphics workloads, citing up to 40 percent better cost efficiency than comparable A10G-based instances and 2x to 4x better performance than T4 instances.[7][8]
Following the Google Cloud debut, the L4 became broadly available across other major cloud platforms and through server vendors including Cisco, Dell, Lenovo, Supermicro, and others, who offer it as a passively cooled add-in card validated for their rack servers. Its small footprint and bus-powered design allow OEMs to qualify it across a wide range of existing server platforms.[5][9]
The L4 represents NVIDIA's strategy of segmenting its data-center portfolio by workload rather than offering a single accelerator for all tasks. While flagship parts such as the H100 target large-scale training and high-end inference, the L4 addresses the high-volume, cost-sensitive tier: inference servers, video infrastructure, and edge deployments where efficiency and density dominate purchasing decisions. By packaging Ada Lovelace's FP8 Tensor Cores and modern AV1 video engines into a 72-watt single-slot card, NVIDIA extended the popular T4 template into the generative AI era, and the L4 became one of the most widely deployed inference and video GPUs across cloud and enterprise data centers in the years after its 2023 launch.[2][6]