NVIDIA H20
Last reviewed
Jun 3, 2026
Sources
17 citations
Review status
Source-backed
Revision
v1 · 2,138 words
The NVIDIA H20 is a data-center graphics processing unit (GPU) based on NVIDIA's Hopper architecture, designed specifically for the Chinese market to comply with United States export controls on advanced artificial-intelligence accelerators. Announced in late 2023 and shipping from early 2024, the H20 is a deliberately constrained variant of the flagship H100: it offers substantially lower compute throughput but retains large, high-bandwidth memory and full-speed chip-to-chip interconnect, a combination that makes it relatively strong for AI inference and other memory-bound workloads. The chip became the centerpiece of an unusually public US-China export-control saga during 2025, including a de facto US ban that triggered a multibillion-dollar charge against NVIDIA's earnings, a subsequent reversal tied to a reported revenue-sharing arrangement with the US government, and a security review and purchasing crackdown by Chinese authorities.[1][2][3]
In October 2022 the US Department of Commerce's Bureau of Industry and Security (BIS) introduced rules restricting exports to China of the most capable AI accelerators, using thresholds based on processing performance and chip-to-chip interconnect bandwidth. NVIDIA responded by creating cut-down versions of its existing products, the A800 (derived from the A100) and the H800 (derived from the H100), which reduced interconnect bandwidth to stay below the limits.[4]
On October 17, 2023, BIS tightened the rules again. It added a "performance density" metric and removed the interconnect-bandwidth criterion that the 800-series chips had been designed to stay under. NVIDIA was informed that the new license requirements applied to the H800 and A800, which ended those exports, and the company moved to design new compliant parts. The H20 emerged as the most capable of three new China-market products NVIDIA developed under the revised rules, alongside the L20 and L2.[3][4]
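The "performance density" idea can be illustrated with a short arithmetic sketch. It relies on several assumptions that are not stated in this article: that total processing performance (TPP) is 2 × MacTOPS × bit length, computed from dense throughput; that performance density (PD) is TPP divided by die area; that the ECCN 3A090.a test is TPP ≥ 4800, or TPP ≥ 1600 with PD ≥ 5.92; and that the GH100 die measures about 814 mm². The rule's lower 3A090.b tier is not modeled, so this is an illustration of the arithmetic, not a compliance check.

```python
# Illustrative TPP / performance-density arithmetic (NOT a compliance tool).
# ASSUMPTIONS: TPP = 2 * MacTOPS * bit_length using dense throughput;
# PD = TPP / die area; the 3A090.a thresholds below; GH100 die ~814 mm^2.

GH100_DIE_AREA_MM2 = 814.0  # assumed die area, shared by H100 and H20


def tpp(dense_tflops: float, bit_length: int) -> float:
    """Total processing performance: 2 * MacTOPS * bit length.

    One multiply-accumulate counts as two operations, so MacTOPS is half
    the dense TFLOPS figure.
    """
    mac_tops = dense_tflops / 2
    return 2 * mac_tops * bit_length


def performance_density(tpp_value: float, die_area_mm2: float) -> float:
    """TPP divided by die area in mm^2."""
    return tpp_value / die_area_mm2


def meets_3a090a(tpp_value: float, pd: float) -> bool:
    """Assumed 3A090.a test: TPP >= 4800, or TPP >= 1600 with PD >= 5.92."""
    return tpp_value >= 4800 or (tpp_value >= 1600 and pd >= 5.92)


chips = {
    # name: (dense TFLOPS from the article's tables, operand bit length)
    "H20 (FP8)": (296, 8),
    "H20 (FP16)": (148, 16),
    "H100 (FP8)": (1979, 8),
}

for name, (tflops, bits) in chips.items():
    t = tpp(tflops, bits)
    pd = performance_density(t, GH100_DIE_AREA_MM2)
    print(f"{name}: TPP={t:.0f}, PD={pd:.2f}, 3A090.a={meets_3a090a(t, pd)}")
```

Under these assumptions the H20 lands at a TPP of about 2,368 whether computed from FP8 or FP16, with a PD of about 2.9, while the H100 comes out near 15,800. Because the chips share a die, the gap is driven entirely by the enabled compute, which is the lever the design section below describes.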
The design philosophy of the H20 is to trade compute for memory. Export thresholds are tied largely to arithmetic throughput, so NVIDIA reduced the number of active streaming multiprocessors (and therefore peak floating-point performance) while preserving large memory capacity and very high memory bandwidth. Because modern AI inference is frequently limited by how quickly model weights and key-value caches can be read from memory rather than by arithmetic throughput, the resulting chip can be competitive with, and in some inference scenarios faster than, the far more powerful but export-restricted H100.[1][5]
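The memory-bound argument can be made concrete with a back-of-envelope roofline estimate for single-stream decoding. This sketch assumes a hypothetical 70-billion-parameter model with FP8 weights (1 byte per parameter), that each generated token reads every weight once and costs about 2 FLOPs per parameter, and it ignores KV-cache traffic, batching, and software overheads. The bandwidth and dense FP8 figures are the approximate values from the comparison table in this article.

```python
# Back-of-envelope upper bounds for batch-size-1 LLM decoding.
# ASSUMPTIONS: hypothetical 70B-parameter model, FP8 weights (1 byte each),
# all weights read once per token, ~2 FLOPs per parameter per token,
# KV-cache traffic and all overheads ignored.


def decode_tokens_per_s(params: float, bytes_per_param: float,
                        mem_bw_bytes_s: float,
                        peak_flops: float) -> tuple[float, float]:
    """Return (bandwidth-bound, compute-bound) limits in tokens per second."""
    bytes_per_token = params * bytes_per_param
    flops_per_token = 2 * params
    return mem_bw_bytes_s / bytes_per_token, peak_flops / flops_per_token


PARAMS = 70e9          # hypothetical model size
BYTES_PER_PARAM = 1.0  # FP8 weights

gpus = {
    # name: (memory bandwidth in bytes/s, dense FP8 FLOP/s)
    "H20": (4.0e12, 296e12),
    "H100": (3.35e12, 1979e12),
}

for name, (bw, flops) in gpus.items():
    mem_bound, compute_bound = decode_tokens_per_s(PARAMS, BYTES_PER_PARAM, bw, flops)
    limit = min(mem_bound, compute_bound)  # the tighter bound wins
    print(f"{name}: memory limit {mem_bound:.0f} tok/s, "
          f"compute limit {compute_bound:.0f} tok/s, ceiling ~{limit:.0f} tok/s")
```

In this simplified model both GPUs hit the bandwidth ceiling long before the compute ceiling, so the H20's edge equals the bandwidth ratio, about 4.0 / 3.35 ≈ 1.19. Larger batches reuse each weight read across more tokens, raising the FLOPs performed per byte, and that is where the H100's much higher compute takes over.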
The H20 uses the same GH100 silicon as the H100 but enables roughly 78 of the 144 streaming multiprocessors (SMs) present on a full Hopper die. Since the H100 SXM enables 132 SMs, that leaves the H20 with about 41 percent fewer GPU cores than the top H100 configuration (78 / 132 ≈ 0.59). It pairs this reduced compute with 96 GB of HBM3 memory and 4.0 TB/s of memory bandwidth, and it retains full fourth-generation NVLink at 900 GB/s. Tensor-core throughput figures below are dense (without structured sparsity).[1][5][6]
| Specification | NVIDIA H20 (SXM) |
|---|---|
| Architecture | Hopper |
| Streaming multiprocessors | ~78 (of 144 on a full die) |
| Memory | 96 GB HBM3 |
| Memory bandwidth | 4.0 TB/s |
| FP16 / BF16 (dense) | ~148 TFLOPS |
| FP8 / INT8 (dense) | ~296 TFLOPS / TOPS |
| TF32 (dense) | ~74 TFLOPS |
| FP32 | ~44 TFLOPS |
| NVLink bandwidth | 900 GB/s |
| L2 cache | 60 MB |
| Multi-Instance GPU | up to 7 instances |
| Thermal design power | ~400 W |
| Target system | 8-way HGX |
The H20's defining characteristic is the gap between its memory subsystem, which is near or above flagship class, and its compute, which is heavily curtailed. The table below compares dense tensor-core throughput so the figures are directly comparable across parts.[1][5][6][7]
| Metric | H20 (SXM) | H100 (SXM) | H200 (SXM) |
|---|---|---|---|
| Memory | 96 GB HBM3 | 80 GB HBM3 | 141 GB HBM3e |
| Memory bandwidth | 4.0 TB/s | ~3.35 TB/s | 4.8 TB/s |
| FP16 / BF16 (dense) | ~148 TFLOPS | ~989 TFLOPS | ~989 TFLOPS |
| FP8 (dense) | ~296 TFLOPS | ~1,979 TFLOPS | ~1,979 TFLOPS |
| NVLink bandwidth | 900 GB/s | 900 GB/s | 900 GB/s |
| TDP | ~400 W | 700 W | 700 W |
In dense FP16, the H20 delivers roughly 15 percent of the H100's tensor-core throughput (about 148 versus 989 TFLOPS), reflecting its sharply reduced core count. Yet the H20 carries more memory than the H100 (96 GB versus 80 GB) and more bandwidth (4.0 TB/s versus about 3.35 TB/s). As a result, independent analysts found it could outperform the H100 on certain inference tasks: reports described it as roughly 20 percent faster than the H100 for some inference workloads at low batch sizes, even as the H100 remained far superior for large-scale model pre-training. Against the H200, the H20 trails on every performance metric except NVLink bandwidth, where both offer 900 GB/s. This profile, weak compute but strong memory and interconnect, is what drew scrutiny from US policymakers, who argued the H20 was well suited to inference and to clustering into large systems.[1][5][7]
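One way to read the comparison table is through the roofline model's ridge point: peak compute divided by peak memory bandwidth, which gives the arithmetic intensity (FLOPs per byte) above which a kernel stops being memory-bound. The sketch below uses the table's approximate dense FP16/BF16 and bandwidth figures; the 1 FLOP/byte example intensity, roughly what batch-1 decoding with FP16 weights performs, is an illustrative assumption, and real kernels rarely reach either peak.

```python
# Roofline comparison using approximate peak figures from the comparison table.
# Ridge point = peak compute / peak bandwidth = arithmetic intensity (FLOP/byte)
# above which a kernel is compute-bound rather than memory-bound.

specs = {
    # name: (dense FP16/BF16 TFLOPS, memory bandwidth in TB/s)
    "H20": (148, 4.0),
    "H100": (989, 3.35),
    "H200": (989, 4.8),
}


def ridge_point(tflops: float, tb_per_s: float) -> float:
    """TFLOP/s divided by TB/s gives FLOPs per byte."""
    return tflops / tb_per_s


def attainable_tflops(tflops: float, tb_per_s: float, intensity: float) -> float:
    """Classic roofline: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(tflops, tb_per_s * intensity)


h100_tflops = specs["H100"][0]
for name, (tflops, bw) in specs.items():
    print(f"{name}: ridge ~{ridge_point(tflops, bw):.0f} FLOP/byte, "
          f"{tflops / h100_tflops:.0%} of H100 FP16, "
          f"{attainable_tflops(tflops, bw, 1.0):.2f} TFLOPS at 1 FLOP/byte")

# Intensity at which the H100's bandwidth-limited rate reaches the H20's
# compute ceiling; below it the H20's roofline is at least as high.
crossover = specs["H20"][0] / specs["H100"][1]
print(f"H20 roofline >= H100 below ~{crossover:.0f} FLOP/byte")
```

Under these peak figures the H20's roofline matches or exceeds the H100's only for kernels below about 44 FLOP/byte (148 / 3.35). Above that point the H100's compute advantage dominates, which fits the split described above between low-batch inference and large-scale pre-training.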
NVIDIA began informing Chinese customers about the H20 in late 2023 and started volume shipments in early 2024, after some initial delays. Demand was strong: Chinese cloud providers and technology companies, including major internet firms, placed large orders, and by various reports NVIDIA shipped on the order of one million H20 units during 2024. The chip became the primary high-end Western AI accelerator legally available in China, and for several quarters it represented a significant share of NVIDIA's China data-center revenue. Some Chinese buyers initially complained that the part was overpriced relative to its reduced performance and weighed domestic alternatives, but the H20's software compatibility with NVIDIA's CUDA ecosystem and its inference strengths sustained demand.[3][8]
The H20's legal status in the United States changed sharply over the course of 2025.[2][9][10]
| Date (2025) | Event |
|---|---|
| April 9 | The US government informed NVIDIA that a license would be required to export the H20 to China and "D:5" arms-embargoed countries.[2] |
| April 14 | The US government told NVIDIA the license requirement would remain in effect "for the indefinite future."[2] |
| April 15 | NVIDIA disclosed in an SEC filing that it expected to record charges of about $5.5 billion in its fiscal first quarter (ended April 27, 2025) for H20 inventory, purchase commitments, and related reserves.[9] |
| May 28 | In its first-quarter fiscal 2026 results, NVIDIA reported an actual H20 charge of $4.5 billion, about $1 billion less than the estimate because it was able to reuse certain materials; it had recognized $4.6 billion of H20 sales before the new rules and was unable to ship a further $2.5 billion.[1] |
| July 14 to 15 | NVIDIA said the US administration had assured it that H20 export licenses would be granted, and that it was filing license applications; CEO Jensen Huang relayed the news to customers during a visit to China.[10] |
| July 31 | The Cyberspace Administration of China (CAC) summoned NVIDIA to explain alleged "backdoor" tracking and remote-shutdown risks in the H20; NVIDIA denied that its chips contain any such backdoors.[11] |
| August 11 | The Financial Times reported that NVIDIA and AMD had agreed to remit 15 percent of their China AI-chip sales revenue to the US government in connection with export licenses, covering the H20 and AMD's MI308.[12] |
| August 22 | Reports said NVIDIA had moved to halt H20 production, instructing suppliers to suspend manufacturing, after Chinese regulators discouraged domestic firms from buying the chip.[13] |
| August 27 | In its second-quarter fiscal 2026 results, NVIDIA reported no H20 sales to China-based customers in the quarter, a $180 million release of previously reserved H20 inventory tied to a roughly $650 million sale to a customer outside China, and said it was awaiting US guidelines before booking China H20 revenue.[14] |
The April action effectively halted H20 exports to China. The $5.5 billion estimate disclosed on April 15, later revised to a $4.5 billion realized charge, reflected inventory and supplier purchase commitments that NVIDIA could no longer fulfill. The US rationale cited the risk that the H20 could be used in a supercomputer in China.[1][2][9]
The reported 15 percent arrangement was widely characterized as unprecedented: it would mark the first time a US company agreed to share revenue with the federal government in exchange for export licenses. According to the reporting, President Donald Trump initially sought 20 percent and the rate was negotiated down to 15 percent after a meeting with Jensen Huang. NVIDIA, declining detailed comment, said it follows the rules the US government sets for its participation in global markets. Reporting and subsequent NVIDIA disclosures noted that, as of late 2025, the arrangement had not been codified in a published regulation, leaving its precise legal mechanics and finalization uncertain.[12][14]
China's reaction complicated NVIDIA's hoped-for sales recovery. On July 31, 2025, the CAC summoned NVIDIA over what it described as security risks in the H20, pointing to claims that the chips could incorporate "tracking and positioning" or "remote shutdown" capabilities, and asked the company to submit supporting materials. NVIDIA publicly rejected the allegations, stating that its products contain no backdoors that would allow remote access or control.[11]
Subsequently, Chinese authorities reportedly discouraged or instructed domestic technology companies to refrain from purchasing the H20 on national-security grounds, encouraging use of domestic accelerators such as those from Huawei and Cambricon. Amid this pressure, reports in August 2025 indicated that NVIDIA had asked component and packaging suppliers to suspend H20 production. The combination of the US license uncertainty and Chinese discouragement left the H20 in a difficult position in both jurisdictions during the second half of 2025.[13][15]
NVIDIA was widely reported to be developing a new China-market accelerator based on its newer Blackwell architecture, referred to in coverage as the "B30A," which would be more capable than the H20 while remaining below the company's flagship Blackwell products. NVIDIA publicly stated that there was no product called B30A "planned, designed, or produced," so the existence and naming of any such chip remained unconfirmed as of late 2025, even as executives signaled interest in offering more advanced parts to China if US rules permitted.[16][17]
The H20 illustrates the central tension in US controls on AI hardware: rules anchored to compute throughput can be navigated by trading arithmetic performance for memory and bandwidth, producing a chip that remains valuable for the inference workloads that increasingly dominate AI deployment. Its trajectory through 2025, from a multibillion-dollar de facto ban, to a reported revenue-sharing condition for resumed exports, to a security review and purchasing crackdown inside China, made it a high-profile case study in how export policy, corporate strategy, and geopolitics intersect in the semiconductor industry. The episode also underscored the financial stakes for NVIDIA, which had counted China among its larger markets, and intensified debate in both Washington and Beijing over the wisdom of allowing or accepting constrained Western AI accelerators.[1][2][12]