# NVIDIA L4

> Source: https://aiwiki.ai/wiki/nvidia_l4
> Updated: 2026-07-16
> Categories: AI Hardware, NVIDIA
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

The **NVIDIA L4** is a compact, power-efficient data-center GPU built on the [Ada Lovelace](/wiki/nvidia_ada_lovelace) architecture and optimized for artificial-intelligence inference and video processing. Announced by [NVIDIA](/wiki/nvidia) at the GTC conference on March 21, 2023, the L4 is a single-slot, low-profile card that operates within a 72-watt power envelope, so it can be powered entirely by a PCIe slot with no supplemental power connector. It is positioned as the successor to the Turing-based [Tesla T4](/wiki/nvidia_t4) and is aimed at mainstream and edge servers where energy efficiency, server density, and a broad mix of inference, video, and graphics workloads matter more than raw peak throughput.[1][2]

NVIDIA markets the L4 as a "universal" accelerator: a single card intended to handle generative AI inference, large language model serving, image classification, object detection, speech recognition, recommendation, video transcoding, and cloud graphics. Its combination of fourth-generation Tensor Cores, dedicated video engines with AV1 support, and a low thermal footprint made it a widely deployed inference and video GPU across major cloud providers following its launch.[2][3]

## What the L4 is

The L4 is the inference-and-video member of NVIDIA's Ada Lovelace data-center lineup, which also includes the higher-end [L40S](/wiki/nvidia_l40s). It is built on the AD104 graphics processor, the same die family used in several GeForce RTX 40-series and professional Ada cards, fabricated on TSMC's 4N process. Unlike consumer cards, the L4 ships with no display outputs, uses passive cooling intended for server airflow, and is tuned for sustained data-center duty cycles rather than peak clocks.[4][5]

Architecturally, the L4 inherits the key Ada Lovelace features: fourth-generation Tensor Cores that add FP8 precision through the Transformer Engine, third-generation RT cores for ray tracing, and Shader Execution Reordering. FP8 support is significant for AI inference because it roughly doubles Tensor Core throughput relative to FP16 while preserving accuracy for many transformer models, and it pairs with NVIDIA's inference software such as [TensorRT-LLM](/wiki/tensorrt_llm) and the [Triton Inference Server](/wiki/triton_inference_server). The card does not support NVLink or Multi-Instance GPU (MIG) partitioning.[2][5]

A distinguishing trait of the L4 is its media subsystem. The GPU integrates two NVENC encoders, four NVDEC decoders, and four JPEG decoders, with hardware AV1 encode and decode. NVIDIA states that a server equipped with eight L4 GPUs can host more than 1,000 concurrent AV1 video streams at 720p30, making the card attractive for video-on-demand, live transcoding, and AI video pipelines that combine decoding, inference, and re-encoding on one device.[2][3]

## Specifications

The following figures are drawn from NVIDIA's L4 product page and product brief. Tensor Core throughput numbers are quoted with structured sparsity enabled; the dense (non-sparse) values are half of those listed.[1][3]

| Specification | NVIDIA L4 |
| --- | --- |
| Architecture | Ada Lovelace |
| GPU die | AD104 |
| Process | TSMC 4N |
| CUDA cores | 7,424 |
| Streaming multiprocessors | 58 |
| Tensor Cores | 232 (4th generation) |
| RT cores | 58 (3rd generation) |
| FP32 | 30.3 TFLOPS |
| TF32 Tensor Core | 120 TFLOPS |
| FP16 / BF16 Tensor Core | 242 TFLOPS |
| FP8 Tensor Core | 485 TFLOPS |
| INT8 Tensor Core | 485 TOPS |
| Memory | 24 GB GDDR6 (with ECC) |
| Memory interface | 192-bit |
| Memory bandwidth | 300 GB/s |
| Interconnect | PCIe Gen4 x16 |
| Video engines | 2 NVENC, 4 NVDEC, 4 JPEG decoders (AV1 encode/decode) |
| Form factor | Single-slot, low-profile (169 mm x 69 mm) |
| Cooling | Passive |
| Maximum power | 72 W |
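
The sparse-versus-dense relationship described above the table can be checked with a few lines of arithmetic. The following is a minimal sketch, not an NVIDIA tool: the dictionary simply restates the Tensor Core rows of the table, and the names `SPARSE_THROUGHPUT` and `dense_throughput` are hypothetical helpers introduced here for illustration.

```python
# Tensor Core figures as listed in the table above (TFLOPS, or TOPS for INT8).
# These are the structured-sparsity numbers, per the note preceding the table.
SPARSE_THROUGHPUT = {
    "TF32": 120,
    "FP16/BF16": 242,
    "FP8": 485,
    "INT8": 485,
}


def dense_throughput(sparse_value: float) -> float:
    """Return the dense figure, which is half the sparse figure."""
    return sparse_value / 2


# Dense equivalents for every listed precision.
dense = {name: dense_throughput(value) for name, value in SPARSE_THROUGHPUT.items()}

# Ratio of FP8 to FP16 peak throughput, taken directly from the listed figures.
fp8_over_fp16 = SPARSE_THROUGHPUT["FP8"] / SPARSE_THROUGHPUT["FP16/BF16"]

if __name__ == "__main__":
    for name, value in dense.items():
        print(f"{name}: {value:g} dense")
    print(f"FP8 / FP16 peak ratio: {fp8_over_fp16:.2f}x")
```

These are peak figures from the specification sheet; sustained throughput on a real workload depends on the model, batch size, and memory bandwidth.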

## Lineage: the T4 successor

The L4 is the direct successor to the [Tesla T4](/wiki/nvidia_t4), NVIDIA's previous low-profile inference GPU, which launched in 2018 on the Turing architecture. The two cards share the same physical concept (a single-slot, low-profile, bus-powered form factor for high-density servers), but the L4 improves on most headline metrics. NVIDIA reports that customers migrating from the T4 to the L4 can expect roughly 2x to 4x better performance, with up to 2.7x more generative AI throughput and up to 3x more AI video encode and decode performance on Ada Lovelace's upgraded media engines.[2][6]

| Feature | NVIDIA L4 (2023) | NVIDIA T4 (2018) |
| --- | --- | --- |
| Architecture | Ada Lovelace | Turing |
| GPU die | AD104 | TU104 |
| CUDA cores | 7,424 | 2,560 |
| Tensor Cores | 232 (4th gen) | 320 (2nd gen) |
| Memory | 24 GB GDDR6 | 16 GB GDDR6 |
| Memory bandwidth | 300 GB/s | 320 GB/s |
| FP32 | 30.3 TFLOPS | 8.1 TFLOPS |
| Max power | 72 W | 70 W |
| Form factor | Single-slot low-profile | Single-slot low-profile |

The comparison shows the L4's design philosophy: it keeps the T4's tight power and physical budget (only 2 watts higher TDP and the same slot footprint) while adding 50 percent more memory, far more compute, modern FP8 precision, and AV1 video. The slightly lower memory bandwidth versus the T4 reflects the L4's narrower 192-bit bus, a tradeoff that NVIDIA offsets with the larger 48 MB L2 cache and higher clocks of the Ada architecture.[4][6]
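
The generational deltas quoted above follow directly from the comparison table. The sketch below recomputes them; the dictionaries restate the table rows, and the key names and the `ratio` helper are hypothetical names chosen for this example. The per-watt figure divides peak FP32 throughput by maximum board power, which is a paper metric and an assumption of this sketch rather than a measured result.

```python
# Values restated from the L4 vs. T4 comparison table.
L4 = {"fp32_tflops": 30.3, "memory_gb": 24, "bandwidth_gbs": 300, "power_w": 72}
T4 = {"fp32_tflops": 8.1, "memory_gb": 16, "bandwidth_gbs": 320, "power_w": 70}


def ratio(key: str) -> float:
    """Return the L4 value divided by the T4 value for one table row."""
    return L4[key] / T4[key]


# Absolute difference in maximum board power, in watts.
power_delta_w = L4["power_w"] - T4["power_w"]

# Peak FP32 TFLOPS per watt for each card, and the resulting improvement.
l4_tflops_per_watt = L4["fp32_tflops"] / L4["power_w"]
t4_tflops_per_watt = T4["fp32_tflops"] / T4["power_w"]
efficiency_gain = l4_tflops_per_watt / t4_tflops_per_watt

if __name__ == "__main__":
    print(f"Memory capacity: {ratio('memory_gb'):.2f}x")
    print(f"Memory bandwidth: {ratio('bandwidth_gbs'):.2f}x")
    print(f"Peak FP32: {ratio('fp32_tflops'):.2f}x")
    print(f"Power delta: {power_delta_w} W")
    print(f"Peak FP32 per watt: {efficiency_gain:.2f}x")
```

The capacity ratio of 1.5 corresponds to the "50 percent more memory" figure, and the bandwidth ratio below 1 reflects the narrower bus discussed above.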

## Strengths

The L4's primary strengths follow from its balance of capability and efficiency:

- **AI inference.** Fourth-generation Tensor Cores with FP8 and INT8 make the L4 well suited to serving small and mid-sized language models, image and recommendation models, and other latency-sensitive inference workloads at low cost per query.[2]
- **Video processing.** The dedicated NVENC, NVDEC, and JPEG engines, together with AV1, let the L4 transcode large numbers of streams and feed AI video analytics pipelines without consuming general compute resources. NVIDIA cites up to 120x higher AI video performance than a CPU-only server in some pipelines.[2][3]
- **Low power and density.** At 72 watts and a single low-profile slot, the L4 can be packed in large numbers into mainstream and edge servers, often without supplemental power cabling, which simplifies deployment and lowers cooling demands.[1][3]
- **Energy efficiency.** NVIDIA states the L4 can deliver up to 99 percent better energy efficiency than traditional CPU-based infrastructure for AI video, a claim aligned with its positioning for cost- and power-constrained inference fleets.[2][3]
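
The density argument in the list above reduces to a simple budget check. The following is a rough sketch under stated assumptions: the 75 W limit is the power a PCIe x16 slot can supply under the PCI Express add-in card specification, while the slot counts and power budgets in the example are hypothetical and do not describe any particular server. `fits_on_slot_power` and `max_cards` are illustrative names, not part of any vendor tool.

```python
PCIE_X16_SLOT_POWER_W = 75  # slot power limit for a x16 add-in card
L4_MAX_POWER_W = 72         # maximum board power from the specification table


def fits_on_slot_power(card_power_w: int, slot_power_w: int = PCIE_X16_SLOT_POWER_W) -> bool:
    """Return True if the card can run without a supplemental power connector."""
    return card_power_w <= slot_power_w


def max_cards(free_slots: int, gpu_power_budget_w: int,
              card_power_w: int = L4_MAX_POWER_W) -> int:
    """Return how many cards fit, limited by both slot count and power budget."""
    return min(free_slots, gpu_power_budget_w // card_power_w)


if __name__ == "__main__":
    print("Bus-powered:", fits_on_slot_power(L4_MAX_POWER_W))
    # Hypothetical chassis: 8 free slots, 500 W reserved for accelerators.
    print("Cards that fit:", max_cards(free_slots=8, gpu_power_budget_w=500))
```

A real qualification also depends on airflow for the passive heatsink and on the server vendor's supported configurations, which this check does not model.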

Because of these traits, the L4 is frequently chosen for edge AI, retail and smart-city video analytics, cloud gaming and virtual workstations, and as an entry point for generative AI inference where the larger memory and higher power of cards like the [L40S](/wiki/nvidia_l40s) or [H100](/wiki/nvidia_h100) are unnecessary.

## Cloud availability and adoption

Google Cloud was the first cloud provider to offer the L4, debuting it in the G2 accelerator-optimized machine family. NVIDIA and Google announced the partnership alongside the GTC launch, with the L4 entering private preview on Google Cloud on March 22, 2023, public preview on April 4, 2023, and general availability on May 9, 2023. G2 instances can be configured with 1 to 8 L4 GPUs and are also offered through Vertex AI for model serving. Google described the G2 as targeting cost-efficient inference, video transcoding, and graphics workloads, citing up to 40 percent better cost efficiency than comparable A10G-based instances and 2x to 4x better performance than T4 instances.[7][8]
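
For sizing purposes, the 1-to-8 GPU range translates into total GPU memory as sketched below. This is an illustrative calculation only: it assumes the G2 family offers 1, 2, 4, or 8 GPUs per instance, and `aggregate_memory_gb` and `min_gpus_for` are hypothetical helpers. Because the L4 has no NVLink, the memory is not pooled; a model larger than 24 GB must be sharded across cards over PCIe, and real footprints include overheads (activations, caches) that the caller must account for.

```python
from typing import Optional

L4_MEMORY_GB = 24
G2_GPU_COUNTS = (1, 2, 4, 8)  # assumed per-instance GPU counts in the G2 family


def aggregate_memory_gb(gpu_count: int) -> int:
    """Return the total GPU memory across all cards in one instance."""
    return gpu_count * L4_MEMORY_GB


def min_gpus_for(footprint_gb: float) -> Optional[int]:
    """Return the smallest GPU count whose total memory covers the footprint.

    Returns None when even the largest configuration is too small.
    """
    for count in G2_GPU_COUNTS:
        if aggregate_memory_gb(count) >= footprint_gb:
            return count
    return None


if __name__ == "__main__":
    for count in G2_GPU_COUNTS:
        print(f"{count} x L4: {aggregate_memory_gb(count)} GB total")
    # Hypothetical 40 GB model footprint, including runtime overhead.
    print("GPUs needed for 40 GB:", min_gpus_for(40))
```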

Following the Google Cloud debut, the L4 became broadly available across other major cloud platforms and through server vendors including Cisco, Dell, Lenovo, Supermicro, and others, who offer it as a passively cooled add-in card validated for their rack servers. Its small footprint and bus-powered design allow OEMs to qualify it across a wide range of existing server platforms.[5][9]

## Significance

The L4 represents NVIDIA's strategy of segmenting its data-center portfolio by workload rather than offering a single accelerator for all tasks. While flagship parts such as the [H100](/wiki/nvidia_h100) target large-scale training and high-end inference, the L4 addresses the high-volume, cost-sensitive tier: inference servers, video infrastructure, and edge deployments where efficiency and density dominate purchasing decisions. By packaging Ada Lovelace's FP8 Tensor Cores and modern AV1 video engines into a 72-watt single-slot card, NVIDIA extended the popular T4 template into the generative AI era, and the L4 became one of the most widely deployed inference and video GPUs across cloud and enterprise data centers in the years after its 2023 launch.[2][6]

## References

1. NVIDIA, "L4 Tensor Core GPU for AI & Graphics," NVIDIA Data Center. https://www.nvidia.com/en-us/data-center/l4/
2. NVIDIA Developer Blog, "Supercharging AI Video and AI Inference Performance with NVIDIA L4 GPUs." https://developer.nvidia.com/blog/supercharging-ai-video-and-ai-inference-performance-with-nvidia-l4-gpus/
3. NVIDIA, "NVIDIA L4 GPU Accelerator Product Brief," PB-11316-001_v01, March 2023. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/l4/PB-11316-001_v01.pdf
4. ServeTheHome, "NVIDIA L4 24GB Released Upgrading the NVIDIA T4." https://www.servethehome.com/nvidia-l4-24gb-released-upgrading-the-nvidia-t4/
5. Lenovo Press, "ThinkSystem NVIDIA L4 24GB PCIe Gen4 Passive GPU Product Guide." https://lenovopress.lenovo.com/lp1717-thinksystem-nvidia-l4-24gb-pcie-gen4-passive-gpu
6. NVIDIA Investor Relations, "NVIDIA and Google Cloud Deliver Powerful New Generative AI Platform, Built on the New L4 GPU and Vertex AI," March 21, 2023. https://investor.nvidia.com/news/press-release-details/2023/NVIDIA-and-Google-Cloud-Deliver-Powerful-New-Generative-AI-Platform-Built-on-the-New-L4-GPU-and-Vertex-AI/default.aspx
7. Google Cloud Blog, "Introducing G2 VMs with NVIDIA L4 GPUs." https://cloud.google.com/blog/products/compute/introducing-g2-vms-with-nvidia-l4-gpus
8. Google Cloud Documentation, "GPU machine types: Compute Engine." https://docs.cloud.google.com/compute/docs/gpus
9. Cisco, "NVIDIA L4 Tensor Core GPU Datasheet." https://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/nvidia-l4-gpu.pdf