NVIDIA L4

Updated Jul 16, 2026

The NVIDIA L4 is a compact, power-efficient data-center GPU built on the Ada Lovelace architecture and optimized for artificial-intelligence inference and video processing. Announced by NVIDIA at the GTC conference on March 21, 2023, the L4 is a single-slot, low-profile card that operates within a 72 watt power envelope, allowing it to be powered entirely by a PCIe slot with no supplemental power connector. It is positioned as the successor to the Turing-based Tesla T4 and is aimed at mainstream and edge servers where energy efficiency, server density, and a broad mix of inference, video, and graphics workloads matter more than raw peak throughput.^[1]^[2]

NVIDIA markets the L4 as a "universal" accelerator: a single card intended to handle generative AI inference, large language model serving, image classification, object detection, speech recognition, recommendation, video transcoding, and cloud graphics. Its combination of fourth-generation Tensor Cores, dedicated video engines with AV1 support, and a low thermal footprint made it a widely deployed inference and video GPU across major cloud providers following its launch.^[2]^[3]

What the L4 is

The L4 is the inference-and-video member of NVIDIA's Ada Lovelace data-center lineup, which also includes the higher-end L40S. It is built on the AD104 graphics processor, the same die family used in several GeForce RTX 40-series and professional Ada cards, fabricated on TSMC's 4N process. Unlike consumer cards, the L4 ships with no display outputs, uses passive cooling intended for server airflow, and is tuned for sustained data-center duty cycles rather than peak clocks.^[4]^[5]

Architecturally, the L4 inherits the key Ada Lovelace features: fourth-generation Tensor Cores that add FP8 precision through the Transformer Engine, third-generation RT cores for ray tracing, and Shader Execution Reordering. FP8 support is significant for AI inference because it roughly doubles Tensor Core throughput relative to FP16 while preserving accuracy for many transformer models, and it pairs with NVIDIA's inference software such as TensorRT-LLM and the Triton Inference Server. The card does not support NVLink or Multi-Instance GPU (MIG) partitioning.^[2]^[5]

A distinguishing trait of the L4 is its media subsystem. The GPU integrates two NVENC encoders, four NVDEC decoders, and four JPEG decoders, with hardware AV1 encode and decode. NVIDIA states that a server equipped with eight L4 GPUs can host more than 1,000 concurrent AV1 video streams at 720p30, making the card attractive for video-on-demand, live transcoding, and AI video pipelines that combine decoding, inference, and re-encoding on one device.^[2]^[3]

Specifications

The following figures are drawn from NVIDIA's L4 product page and product brief. Tensor Core throughput numbers are quoted with structured sparsity enabled; the dense (non-sparse) values are half of those listed.^[1]^[3]

Specification	NVIDIA L4
Architecture	Ada Lovelace
GPU die	AD104
Process	TSMC 4N
CUDA cores	7,424
Streaming multiprocessors	58
Tensor Cores	232 (4th generation)
RT cores	58 (3rd generation)
FP32	30.3 TFLOPS
TF32 Tensor Core	120 TFLOPS
FP16 / BF16 Tensor Core	242 TFLOPS
FP8 Tensor Core	485 TFLOPS
INT8 Tensor Core	485 TOPS
Memory	24 GB GDDR6 (with ECC)
Memory interface	192-bit
Memory bandwidth	300 GB/s
Interconnect	PCIe Gen4 x16
Video engines	2 NVENC, 4 NVDEC, 4 JPEG decoders (AV1 encode/decode)
Form factor	Single-slot, low-profile (169 mm x 69 mm)
Cooling	Passive
Maximum power	72 W
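
The relationship between the sparse and dense figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: it halves the Tensor Core values from the table to get dense throughput, and it estimates the clock implied by the FP32 figure under the assumption that each CUDA core retires one fused multiply-add (2 FLOPs) per cycle. The function and variable names are hypothetical, not part of any NVIDIA tool.

```python
# Tensor Core figures from the specification table, quoted with structured
# sparsity (TFLOPS, or TOPS for INT8).
SPARSE_THROUGHPUT = {
    "TF32": 120.0,
    "FP16/BF16": 242.0,
    "FP8": 485.0,
    "INT8": 485.0,
}


def dense_throughput(sparse: float) -> float:
    """Dense (non-sparse) throughput is half the structured-sparsity figure."""
    return sparse / 2.0


def implied_clock_ghz(fp32_tflops: float, cuda_cores: int) -> float:
    """Clock implied by peak FP32, ASSUMING 2 FLOPs (one FMA) per core per cycle."""
    flops_per_cycle = cuda_cores * 2
    return fp32_tflops * 1e12 / flops_per_cycle / 1e9


dense = {name: dense_throughput(value) for name, value in SPARSE_THROUGHPUT.items()}

for name, value in dense.items():
    print(f"{name}: {value:g} dense")
print(f"Implied clock: {implied_clock_ghz(30.3, 7424):.2f} GHz")
```

The implied clock is a derived estimate rather than a published specification; it is useful mainly as a consistency check between the core count and the FP32 figure.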

Lineage: the T4 successor

The L4 is the direct successor to the Tesla T4, NVIDIA's previous low-profile inference GPU, which launched in 2018 on the Turing architecture. The two cards share the same physical concept (a single-slot, low-profile, bus-powered form factor for high-density servers), but the L4 improves on most headline metrics. NVIDIA reports that customers migrating from the T4 to the L4 can expect roughly 2x to 4x better performance, with up to 2.7x more generative AI throughput and up to 3x more AI video encode and decode performance on Ada Lovelace's upgraded media engines.^[2]^[6]

Feature	NVIDIA L4 (2023)	NVIDIA T4 (2018)
Architecture	Ada Lovelace	Turing
GPU die	AD104	TU104
CUDA cores	7,424	2,560
Tensor Cores	232 (4th gen)	320 (2nd gen)
Memory	24 GB GDDR6	16 GB GDDR6
Memory bandwidth	300 GB/s	320 GB/s
FP32	30.3 TFLOPS	8.1 TFLOPS
Max power	72 W	70 W
Form factor	Single-slot low-profile	Single-slot low-profile

The comparison shows the L4's design philosophy: it keeps the T4's tight power and physical budget (only 2 watts higher TDP and the same slot footprint) while adding 50 percent more memory, far more compute, modern FP8 precision, and AV1 video. The slightly lower memory bandwidth versus the T4 reflects the L4's narrower 192-bit bus, a tradeoff that NVIDIA offsets with the larger 48 MB L2 cache and higher clocks of the Ada architecture.^[4]^[6]
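
The tradeoffs in the comparison table can be made concrete with a short calculation. This is a rough sketch using only the figures listed above: it computes L4-to-T4 ratios, an FP32-per-watt ratio that treats maximum board power as a proxy for actual draw (an assumption), and the per-pin memory data rate implied by the L4's 300 GB/s over a 192-bit bus. The dictionary keys and helper names are hypothetical.

```python
# Figures taken from the specification and comparison tables.
L4 = {"memory_gb": 24, "bandwidth_gbs": 300, "fp32_tflops": 30.3, "power_w": 72}
T4 = {"memory_gb": 16, "bandwidth_gbs": 320, "fp32_tflops": 8.1, "power_w": 70}
L4_BUS_BITS = 192


def ratio(key: str) -> float:
    """L4 value divided by T4 value for one table row."""
    return L4[key] / T4[key]


def fp32_per_watt(card: dict) -> float:
    """Peak FP32 TFLOPS per watt, using maximum power as a stand-in for real draw."""
    return card["fp32_tflops"] / card["power_w"]


def per_pin_gbps(bandwidth_gbs: float, bus_bits: int) -> float:
    """Per-pin data rate (Gbit/s) implied by total bandwidth (GB/s) and bus width."""
    return bandwidth_gbs * 8 / bus_bits


print(f"Memory capacity: {ratio('memory_gb'):.2f}x")
print(f"Memory bandwidth: {ratio('bandwidth_gbs'):.2f}x")
print(f"FP32 throughput: {ratio('fp32_tflops'):.2f}x")
print(f"FP32 per watt: {fp32_per_watt(L4) / fp32_per_watt(T4):.2f}x")
print(f"Power difference: {L4['power_w'] - T4['power_w']} W")
print(f"Implied L4 per-pin rate: {per_pin_gbps(L4['bandwidth_gbs'], L4_BUS_BITS):.1f} Gbit/s")
```

Peak-figure ratios like these say nothing about delivered application performance, which depends on the workload, precision, and software stack.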

Strengths

The L4's primary strengths follow from its balance of capability and efficiency:

AI inference. Fourth-generation Tensor Cores with FP8 and INT8 make the L4 well suited to serving small and mid-sized language models, image and recommendation models, and other latency-sensitive inference workloads at low cost per query.^[2]
Video processing. The dedicated NVENC, NVDEC, and JPEG engines, together with AV1, let the L4 transcode large numbers of streams and feed AI video analytics pipelines without consuming general compute resources. NVIDIA cites up to 120x higher AI video performance than a CPU-only server in some pipelines.^[2]^[3]
Low power and density. At 72 watts and a single low-profile slot, the L4 can be packed in large numbers into mainstream and edge servers, often without supplemental power cabling, which simplifies deployment and lowers cooling demands.^[1]^[3]
Energy efficiency. NVIDIA states the L4 can deliver up to 99 percent better energy efficiency than traditional CPU-based infrastructure for AI video, a claim aligned with its positioning for cost- and power-constrained inference fleets.^[2]^[3]

Because of these traits, the L4 is frequently chosen for edge AI, retail and smart-city video analytics, cloud gaming and virtual workstations, and as an entry point for generative AI inference where the larger memory and higher power of cards like the L40S or H100 are unnecessary.
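
Whether a given language model is "small or mid-sized" enough for the L4 depends largely on whether its weights fit in 24 GB. The sketch below is a back-of-the-envelope estimate, not a sizing tool: it counts weight storage only, at 2 bytes per parameter for FP16 and 1 byte for FP8 or INT8, and reserves a hypothetical 4 GB of headroom for the KV cache, activations, and runtime overhead. The model sizes and the headroom value are illustrative assumptions.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP16": 2, "FP8": 1, "INT8": 1}
L4_MEMORY_GB = 24  # from the specification table


def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Weight storage in decimal GB: 1e9 params * bytes/param / 1e9 bytes per GB."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_l4(params_billions: float, precision: str, headroom_gb: float = 4.0) -> bool:
    """True if weights plus an ASSUMED headroom fit in the L4's 24 GB."""
    return weight_footprint_gb(params_billions, precision) + headroom_gb <= L4_MEMORY_GB


# Hypothetical model sizes, in billions of parameters.
for size in (7, 13):
    for precision in ("FP16", "FP8"):
        footprint = weight_footprint_gb(size, precision)
        verdict = "fits" if fits_on_l4(size, precision) else "does not fit"
        print(f"{size}B @ {precision}: {footprint:g} GB weights, {verdict}")
```

Under these assumptions, a 13-billion-parameter model exceeds the card's memory at FP16 but fits at FP8, which illustrates why low-precision formats matter for single-card serving. Real deployments need more careful accounting for context length and batch size.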

Cloud availability and adoption

Google Cloud was the first cloud provider to offer the L4, debuting it in the G2 accelerator-optimized machine family. NVIDIA and Google announced the partnership alongside the GTC launch, with the L4 entering private preview on Google Cloud on March 22, 2023, public preview on April 4, 2023, and general availability on May 9, 2023. G2 instances can be configured with 1 to 8 L4 GPUs and are also offered through Vertex AI for model serving. Google described the G2 as targeting cost-efficient inference, video transcoding, and graphics workloads, citing up to 40 percent better cost efficiency than comparable A10G-based instances and 2x to 4x better performance than T4 instances.^[7]^[8]
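
For capacity planning, the 1-to-8 GPU range translates directly into aggregate GPU memory and card power. The following is a minimal sketch using the per-card figures from the specification table; the specific GPU counts iterated over are illustrative, the helper name is hypothetical, and the power figure covers the GPUs only, not the host.

```python
L4_MEMORY_GB = 24    # per-card memory, from the specification table
L4_MAX_POWER_W = 72  # per-card maximum power, from the specification table


def aggregate(gpu_count: int) -> dict:
    """Total GPU memory and maximum GPU power for an instance with 1 to 8 L4s."""
    if not 1 <= gpu_count <= 8:
        raise ValueError("gpu_count must be between 1 and 8")
    return {
        "gpus": gpu_count,
        "memory_gb": gpu_count * L4_MEMORY_GB,
        "max_gpu_power_w": gpu_count * L4_MAX_POWER_W,
    }


# Illustrative GPU counts within the 1-to-8 range.
for count in (1, 2, 4, 8):
    totals = aggregate(count)
    print(f"{totals['gpus']} GPU(s): {totals['memory_gb']} GB, "
          f"{totals['max_gpu_power_w']} W max GPU power")
```

Because the L4 does not support NVLink, memory on separate cards is not pooled; the aggregate figure is a sum across independent devices.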

Following the Google Cloud debut, the L4 became broadly available across other major cloud platforms and through server vendors including Cisco, Dell, Lenovo, Supermicro, and others, who offer it as a passively cooled add-in card validated for their rack servers. Its small footprint and bus-powered design allow OEMs to qualify it across a wide range of existing server platforms.^[5]^[9]

Significance

The L4 represents NVIDIA's strategy of segmenting its data-center portfolio by workload rather than offering a single accelerator for all tasks. While flagship parts such as the H100 target large-scale training and high-end inference, the L4 addresses the high-volume, cost-sensitive tier: inference servers, video infrastructure, and edge deployments where efficiency and density dominate purchasing decisions. By packaging Ada Lovelace's FP8 Tensor Cores and modern AV1 video engines into a 72 watt single-slot card, NVIDIA extended the popular T4 template into the generative AI era, and the L4 became one of the most widely deployed inference and video GPUs across cloud and enterprise data centers in the years after its 2023 launch.^[2]^[6]

References

1. NVIDIA, "L4 Tensor Core GPU for AI & Graphics," NVIDIA Data Center. https://www.nvidia.com/en-us/data-center/l4/
2. NVIDIA Developer Blog, "Supercharging AI Video and AI Inference Performance with NVIDIA L4 GPUs." https://developer.nvidia.com/blog/supercharging-ai-video-and-ai-inference-performance-with-nvidia-l4-gpus/
3. NVIDIA, "NVIDIA L4 GPU Accelerator Product Brief," PB-11316-001_v01, March 2023. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/l4/PB-11316-001_v01.pdf
4. ServeTheHome, "NVIDIA L4 24GB Released Upgrading the NVIDIA T4." https://www.servethehome.com/nvidia-l4-24gb-released-upgrading-the-nvidia-t4/
5. Lenovo Press, "ThinkSystem NVIDIA L4 24GB PCIe Gen4 Passive GPU Product Guide." https://lenovopress.lenovo.com/lp1717-thinksystem-nvidia-l4-24gb-pcie-gen4-passive-gpu
6. NVIDIA Investor Relations, "NVIDIA and Google Cloud Deliver Powerful New Generative AI Platform, Built on the New L4 GPU and Vertex AI," March 21, 2023. https://investor.nvidia.com/news/press-release-details/2023/NVIDIA-and-Google-Cloud-Deliver-Powerful-New-Generative-AI-Platform-Built-on-the-New-L4-GPU-and-Vertex-AI/default.aspx
7. Google Cloud Blog, "Introducing G2 VMs with NVIDIA L4 GPUs." https://cloud.google.com/blog/products/compute/introducing-g2-vms-with-nvidia-l4-gpus
8. Google Cloud Documentation, "GPU machine types: Compute Engine." https://docs.cloud.google.com/compute/docs/gpus
9. Cisco, "NVIDIA L4 Tensor Core GPU Datasheet." https://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/nvidia-l4-gpu.pdf
