NVIDIA A800
Last reviewed
Jun 3, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 1,896 words
The NVIDIA A800 is a datacenter graphics processing unit that Nvidia created for the Chinese market as an export-compliant variant of the A100. Built on the same Ampere GA100 silicon as the A100, the A800 is differentiated by a single deliberately limited specification: its NVLink chip-to-chip interconnect bandwidth was reduced from 600 GB/s to 400 GB/s so that the part would fall below the interconnect threshold defined by the United States export controls announced in October 2022.[1][2][3] The core compute, memory, and Tensor Core capabilities were left unchanged; only the interconnect was throttled.[2][4]
Nvidia confirmed the A800 in a statement on Monday, November 7, 2022, roughly one month after the US Bureau of Industry and Security (BIS) first restricted exports of advanced computing chips to China.[1][5] The chip had entered production in the third quarter of 2022.[1][4] It served as the workhorse AI accelerator inside China for much of 2023 until a tightening of the export rules on October 17, 2023 closed the loophole it relied on and barred its sale to Chinese customers.[6][7] The A800 is the Ampere-generation counterpart to the Hopper-based H800, which Nvidia introduced for China through the same strategy in early 2023.[8][9]
On October 7, 2022, BIS published a sweeping set of export controls intended to slow China's access to the most advanced AI and supercomputing hardware.[10][11] For datacenter accelerators the rules turned on two technical criteria that had to be met together. A chip required a license for export to China if it reached a processing-performance score of 4,800 (the measure later formalized as "total processing performance", or TPP) and also had an aggregate bidirectional interconnect (input/output) bandwidth of 600 GB/s or more.[10][11] The A100, with third-generation NVLink running at exactly 600 GB/s and Tensor-Core throughput above the compute threshold, met both criteria, as did the newer H100.[3][10] Because both conditions had to hold, lowering either one below its threshold was enough to take a chip out of scope.
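The 2022 test can be sketched in a few lines. This is an illustration only, under two stated assumptions: that the processing-performance score is computed as peak dense throughput (in TOPS or TFLOPS) multiplied by the operand bit length, and that a chip is controlled only when it reaches both thresholds at once. The class and function names are hypothetical; the throughput and NVLink figures are the ones in the comparison table in this article.

```python
from dataclasses import dataclass

# Thresholds as described in this article for the October 2022 rule.
PERFORMANCE_THRESHOLD = 4800    # processing-performance score
INTERCONNECT_THRESHOLD = 600.0  # aggregate bidirectional GB/s


@dataclass(frozen=True)
class Accelerator:
    name: str
    peak_dense_tops: float    # peak dense throughput (TOPS or TFLOPS)
    bit_length: int           # operand width for that throughput figure
    interconnect_gb_s: float  # aggregate bidirectional chip-to-chip GB/s


def processing_performance(chip: Accelerator) -> float:
    # Assumption: score = peak dense throughput x operand bit length.
    return chip.peak_dense_tops * chip.bit_length


def needs_license_2022(chip: Accelerator) -> bool:
    # Assumption: both criteria must be met for the control to apply.
    return (
        processing_performance(chip) >= PERFORMANCE_THRESHOLD
        and chip.interconnect_gb_s >= INTERCONNECT_THRESHOLD
    )


# FP16 Tensor figures from the comparison table: 312 TFLOPS at 16 bits.
a100 = Accelerator("A100 SXM 80GB", 312, 16, 600.0)
a800 = Accelerator("A800 SXM 80GB", 312, 16, 400.0)

for chip in (a100, a800):
    print(chip.name, processing_performance(chip), needs_license_2022(chip))
```

Under these assumptions both chips score 312 × 16 = 4,992 on compute, and only the interconnect figure separates them, which is why a bandwidth cut alone was sufficient.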
China was a material part of Nvidia's datacenter business, and the company moved quickly to engineer a part that stayed under the line. Because the interconnect-bandwidth criterion was a fixed, easily measured number, Nvidia chose to keep the A100's full compute and simply dial the NVLink interconnect down from 600 GB/s to 400 GB/s.[2][3] An Nvidia spokesperson said at launch that "the A800 meets the US Government's clear test for reduced export control and cannot be programmed to exceed it."[1][5] The result was a chip that was, in raw matrix-math throughput, essentially an A100, but that communicated with its peers across an 8-GPU server at two-thirds of the A100's bandwidth, so the same transfer took about 50% longer.
The interconnect, rather than per-chip compute, is often the binding constraint when training very large models, because gradients and activations must be exchanged between many GPUs at every step. Slower NVLink lengthens that communication and stretches out training time on large clusters, which is precisely the effect the controls were designed to produce. For single-GPU inference and smaller workloads, the A800 behaved almost identically to an A100.[2][8]
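A rough sense of the communication cost comes from a bandwidth-only model of a ring all-reduce, a common way to sum gradients across GPUs. The sketch below is an estimate, not a benchmark: it assumes each GPU sends and receives 2(N-1)/N times the payload, that half of the quoted bidirectional NVLink figure is available in each direction, and that latency and switch topology can be ignored. The function name and the 14 GB payload are hypothetical.

```python
def ring_allreduce_seconds(payload_gb: float, num_gpus: int,
                           bidir_bandwidth_gb_s: float) -> float:
    """Bandwidth-only estimate of one ring all-reduce, in seconds."""
    if num_gpus < 2:
        return 0.0  # nothing to exchange on a single GPU
    # Assumption: half of the bidirectional figure is usable per direction.
    per_direction_gb_s = bidir_bandwidth_gb_s / 2.0
    # Each GPU moves 2 * (N - 1) / N of the payload over the ring.
    traffic_gb = 2.0 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / per_direction_gb_s


# Hypothetical payload: 7e9 parameters x 2 bytes of BF16 gradients = 14 GB.
payload_gb = 14.0
t_a100 = ring_allreduce_seconds(payload_gb, 8, 600.0)
t_a800 = ring_allreduce_seconds(payload_gb, 8, 400.0)
slowdown = t_a800 / t_a100

print(f"A100: {t_a100 * 1000:.1f} ms, A800: {t_a800 * 1000:.1f} ms, "
      f"ratio: {slowdown:.2f}x")
```

In this model the ratio depends only on the two bandwidth figures (600/400), so the A800 takes about 1.5 times as long for the same exchange regardless of payload size. Real frameworks overlap communication with compute, so the end-to-end penalty is usually smaller than this upper bound.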
The two products share the same GA100 die, the same TSMC 7nm process, the same memory configurations, and the same Tensor-Core performance figures. The single substantive difference is NVLink bandwidth. The comparison below summarizes the relationship.
| Specification | A100 (SXM 80GB) | A800 (SXM 80GB) |
|---|---|---|
| Architecture | Ampere (GA100) | Ampere (GA100) |
| Process node | TSMC 7nm (N7) | TSMC 7nm (N7) |
| FP32 CUDA cores | 6,912 | 6,912 |
| Third-gen Tensor Cores | 432 | 432 |
| Memory | 80 GB HBM2e | 80 GB HBM2e |
| Memory bandwidth | ~2,039 GB/s | ~2,039 GB/s |
| NVLink (3rd gen) | 600 GB/s | 400 GB/s |
| TF32 / BF16 / FP16 Tensor | 156 / 312 / 312 TFLOPS | 156 / 312 / 312 TFLOPS |
| INT8 Tensor | 624 TOPS | 624 TOPS |
| TDP (SXM) | 400 W | 400 W |
| Market | Worldwide | China only |
Sources: NVIDIA A100 datasheets and architecture whitepaper; A800 reporting and OEM spec sheets.[1][2][4][12]
Because the per-chip math is unchanged, on most benchmarks a single A800 scores within measurement noise of an A100. The gap appears in multi-GPU scaling efficiency: the reduced NVLink throughput means an 8-way A800 server loses some of the scaling that an equivalent A100 server retains, with the penalty growing as the number of cooperating GPUs and the communication-to-compute ratio of the workload increase.[2][8]
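How the penalty grows with the communication share of a workload can be illustrated with a simple Amdahl-style estimate. It assumes that only the exposed (non-overlapped) communication portion of a training step slows down, in proportion to the bandwidth cut, and that compute time is identical on both chips. The function name and the sample fractions are hypothetical and do not come from measured data.

```python
def relative_throughput(exposed_comm_fraction: float,
                        bandwidth_ratio: float = 400.0 / 600.0) -> float:
    """A800 throughput relative to an A100 (1.0 means no loss).

    exposed_comm_fraction: share of the A100 step time spent on
    communication that is not hidden behind compute (0.0 to 1.0).
    """
    if not 0.0 <= exposed_comm_fraction <= 1.0:
        raise ValueError("exposed_comm_fraction must be between 0 and 1")
    compute_time = 1.0 - exposed_comm_fraction            # unchanged
    comm_time = exposed_comm_fraction / bandwidth_ratio   # stretched
    return 1.0 / (compute_time + comm_time)


for fraction in (0.0, 0.1, 0.2, 0.5):
    print(f"exposed comm {fraction:.0%}: "
          f"{relative_throughput(fraction):.3f} of A100 throughput")
```

With no exposed communication the estimate is 1.0, matching the single-GPU case; as the exposed share rises toward 100%, it approaches the 2/3 bandwidth ratio.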
As with the A100, the A800 shipped in both the SXM mezzanine form factor and as a PCI Express add-in card, and across more than one memory capacity. OEM product guides and distributor listings document three principal SKUs: the A800 80GB SXM module, the A800 80GB PCIe card, and a lower-capacity A800 40GB PCIe card.[12][13]
| Variant | Form factor | Memory | NVLink | TDP |
|---|---|---|---|---|
| A800 80GB SXM | SXM4 mezzanine | 80 GB HBM2e | 400 GB/s | 400 W |
| A800 80GB PCIe | Dual-slot PCIe Gen 4 | 80 GB HBM2e | 400 GB/s (bridge) | 300 W |
| A800 40GB PCIe | Dual-slot PCIe Gen 4 | 40 GB HBM2 | 400 GB/s (bridge) | 240 W |
Sources: Lenovo ThinkSystem A800 product guide and distributor specifications.[12][13]
The SXM module plugged into HGX-style NVLink baseboards and was the form factor used to build the large multi-GPU servers that Chinese cloud providers and AI labs deployed at scale. The PCIe cards, like their A100 equivalents, fit standard rack servers and could be paired through an NVLink bridge, but with the interconnect capped at the 400 GB/s ceiling. Major server vendors including Lenovo and Dell listed A800 configurations for the China market before the product was withdrawn following the 2023 controls.[12][13]
(A workstation product also named the NVIDIA A800 40GB Active, an actively cooled card that is likewise based on Ampere GA100 silicon and is sold outside China, is a separate product despite sharing the "A800" name. This article concerns the Ampere datacenter accelerator made for the Chinese market.)[14]
On October 17, 2023, BIS issued an update to the advanced-computing rules that was explicitly designed to capture the China-specific parts that Nvidia and others had engineered around the 2022 thresholds.[6][7][11] The most consequential change was the removal of the interconnect-bandwidth criterion, the exact parameter the A800 and H800 had been built to evade.[11][15] In its place BIS leaned on compute, lowering the effective control point and adding a new "performance density" metric (a measure of processing performance per unit of silicon area) intended as a proxy for how advanced a chip is.[11][15] Under the revised framework a datacenter chip required a license if its TPP reached 4,800, or if its TPP reached 1,600 and its performance density reached 5.92.[11]
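The revised test described above can be sketched as follows. This is a simplified illustration that models only the tier quoted in this article (TPP of at least 4,800, or TPP of at least 1,600 together with performance density of at least 5.92) and assumes performance density is TPP divided by die area in square millimetres. The 826 mm² die area used for GA100 is a commonly published figure and is an assumption here, as is the second, purely hypothetical chip.

```python
def performance_density(tpp: float, die_area_mm2: float) -> float:
    # Assumption: density = TPP per square millimetre of die area.
    return tpp / die_area_mm2


def needs_license_2023(tpp: float, die_area_mm2: float) -> bool:
    # Only the tier described in this article is modeled.
    density = performance_density(tpp, die_area_mm2)
    return tpp >= 4800 or (tpp >= 1600 and density >= 5.92)


# A800: 312 TFLOPS FP16 x 16 bits = 4,992; GA100 die assumed to be 826 mm^2.
a800_tpp = 312 * 16
a800_area_mm2 = 826.0

# Hypothetical compute-limited part, for contrast only.
limited_tpp = 2000
limited_area_mm2 = 800.0

print("A800 density:", round(performance_density(a800_tpp, a800_area_mm2), 2))
print("A800 needs license:", needs_license_2023(a800_tpp, a800_area_mm2))
print("Limited part needs license:",
      needs_license_2023(limited_tpp, limited_area_mm2))
```

Because interconnect bandwidth no longer appears anywhere in the test, the A800's 400 GB/s NVLink has no bearing on the outcome; under these assumptions the chip is caught on its compute score alone.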
Because the A800 retained the A100's full compute, it now exceeded the recast thresholds and lost the protection of the interconnect carve-out. The A800 and H800 were therefore prohibited from export to China, alongside the A100, H100, and several other accelerators reported as affected, including the L40, L40S, and the GeForce RTX 4090.[6][7][16] Commerce Secretary Gina Raimondo framed the update as an effort to "protect technologies that have clear national security or human rights implications" while stating that the vast majority of semiconductors would remain unrestricted.[7][15] The same update added Chinese GPU developers Biren Technology and Moore Threads to the Entity List.[6][15]
The A800 established a template that Nvidia reused as each new flagship was caught by the controls. When the Hopper-based H100 became restricted, Nvidia introduced the H800 in March 2023, again leaving the compute largely intact while cutting the chip-to-chip interconnect, reported by Reuters at roughly 300 GB/s, about half of the H100's bandwidth and a steeper relative reduction than the A800's cut from 600 to 400 GB/s.[8][9][17] Nvidia described its "800-series products" as fully compliant with the export rules.[9] The H800 was widely deployed by Chinese firms such as Alibaba, Baidu, and Tencent and, like the A800, was barred by the October 2023 update.[9][17]
After the 2023 rules eliminated the interconnect loophole, Nvidia could no longer simply throttle bandwidth, and its subsequent China parts reduced raw compute instead. The Hopper-based H20, introduced for China across late 2023 and 2024, was the next part in this lineage, designed to fit under the compute and performance-density thresholds rather than the now-removed interconnect limit.[16][18]
The A800 is a textbook illustration of how narrowly drawn technical thresholds can be engineered around. By fixing on a single measurable interconnect number, the October 2022 controls left room for a chip that delivered essentially full A100 compute to Chinese buyers within weeks of the rules taking effect.[2][3] For roughly a year the A800 (and later the H800) became the default high-end AI accelerator inside China, sold openly through channels including domestic marketplaces and stockpiled by cloud providers and AI developers.[1][17]
That episode directly shaped the October 2023 redesign, in which BIS pivoted from interconnect bandwidth to compute and performance density precisely to neutralize the 800-series workaround.[11][15] The cycle of control, compliant redesign, and tightened control became the recurring pattern of US-China AI hardware policy, with the A800 as its first and clearest case, succeeded by the H800 and then by the compute-limited H20. The A800 thus marks the moment the export-control contest shifted from a question of which chips Nvidia could sell to one of how each rule would be drafted, measured, and circumvented.[11][18]