Nvidia Corporation is an American multinational technology company that designs graphics processing units (GPUs), systems on chips (SoCs), and related software for gaming, professional visualization, data centers, and automotive markets. The company is fabless: it designs its chips and outsources fabrication to foundries, chiefly TSMC. Originally a graphics chip company serving the video game industry, Nvidia has become the dominant supplier of hardware for artificial intelligence training and inference, with an estimated 90% share of the AI accelerator market as of 2025. The company is headquartered in Santa Clara, California, and employed approximately 42,000 people as of early 2026.
Nvidia's market capitalization surpassed $4 trillion in July 2025, making it one of the most valuable publicly traded companies in the world. Its rapid growth has been driven almost entirely by the explosive demand for AI computing infrastructure from hyperscale cloud providers, AI research labs, and enterprises adopting large language models and generative AI.
Nvidia was founded on April 5, 1993, by Jensen Huang, Chris Malachowsky, and Curtis Priem. The three co-founders famously planned the company during a meeting at a Denny's restaurant on Berryessa Road in East San Jose, California, in late 1992. They began working out of Priem's townhouse in Fremont, California, with $40,000 in initial capital.
Jensen Huang, a Taiwanese-American electrical engineer, had previously worked as a microprocessor designer at AMD and as director of CoreWare at LSI Logic. He has served as president and CEO of Nvidia since the company's founding. Malachowsky came from Sun Microsystems, and Priem had worked at both Sun Microsystems and IBM.
The company's name is derived from the Latin word "invidia," meaning envy. In its first two years, Nvidia developed the NV1 multimedia accelerator, which was released in 1995. The chip was not commercially successful, but the lessons learned from its development shaped the company's future direction.
In 1999, Nvidia released the GeForce 256, which it marketed as "the world's first GPU" (graphics processing unit). While earlier graphics accelerators existed, the GeForce 256 was the first consumer chip to integrate transform and lighting calculations on the GPU itself, offloading these tasks from the CPU. This product established Nvidia as a leader in consumer graphics and defined the GPU as a product category.
Throughout the 2000s, Nvidia dominated the gaming GPU market alongside rival ATI (later acquired by AMD). The company expanded into professional visualization with its Quadro product line and into high-performance computing with the Tesla line of compute accelerators.
Nvidia went public on January 22, 1999, listing on the Nasdaq stock exchange under the ticker symbol NVDA. The company's initial market capitalization was approximately $563 million.
GPUs were originally designed for a single task: rendering pixels on a screen for video games and graphics applications. However, researchers in the early 2000s recognized that the massively parallel architecture of GPUs could be applied to other computationally intensive problems. A GPU contains thousands of small cores that can execute the same operation on many data points simultaneously, making it well suited for tasks like matrix multiplication, physics simulations, and scientific computing.
This approach, known as General-Purpose computing on Graphics Processing Units (GPGPU), initially required developers to "trick" the GPU by reformulating their computations as graphics rendering tasks. The process was cumbersome and error-prone, which limited adoption.
In 2006, Nvidia released CUDA (Compute Unified Device Architecture), a parallel computing platform and programming model that allowed developers to write general-purpose programs for Nvidia GPUs using extensions to the C programming language. CUDA eliminated the need to express computations as graphics shaders and provided a straightforward way to harness the parallel processing power of GPUs.
CUDA's release was a turning point. For the first time, scientists, engineers, and researchers could write GPU-accelerated code without expertise in graphics programming. Nvidia invested heavily in developer tools, documentation, and university outreach programs to build the CUDA ecosystem.
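The programming model can be illustrated without GPU hardware. The sketch below simulates, in plain Python, how a CUDA kernel launch assigns one lightweight thread per data element using the standard `blockIdx * blockDim + threadIdx` index calculation; a real kernel would be written in CUDA C++ and compiled with nvcc, so this is only a conceptual model.

```python
# Plain-Python illustration of CUDA's SIMT launch model (conceptual only;
# variable names mirror CUDA's built-ins). Each simulated "thread" handles
# one element of y = a*x + y (SAXPY).

def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y):
    """Body of one simulated GPU thread."""
    i = block_idx * block_dim + thread_idx   # global element index
    if i < len(x):                           # guard: the grid may overshoot n
        y[i] = a * x[i] + y[i]

def launch(grid_dim, block_dim, kernel, *args):
    """Sequentially simulate a <<<grid_dim, block_dim>>> kernel launch."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

n = 1000
x = [1.0] * n
y = [2.0] * n
blocks = (n + 255) // 256                    # enough 256-thread blocks to cover n
launch(blocks, 256, saxpy_kernel, 3.0, x, y)
print(y[0])  # 3.0 * 1.0 + 2.0 = 5.0
```

On a GPU, those thread bodies execute concurrently across thousands of cores rather than in a Python loop, which is where the speedup comes from.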
The impact on deep learning became clear in 2012, when Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton trained AlexNet, a convolutional neural network that won the ImageNet competition by a large margin. AlexNet was trained on two Nvidia GeForce GTX 580 GPUs. This result demonstrated that GPUs were dramatically faster than CPUs for training neural networks, and it sparked the modern deep learning revolution.
As deep learning frameworks like TensorFlow (2015) and PyTorch (2016) emerged, Nvidia worked closely with framework developers to optimize performance on its hardware. The company built specialized libraries such as cuDNN (CUDA Deep Neural Network library) for accelerating neural network primitives and cuBLAS for linear algebra. These libraries became deeply integrated into every major AI framework, creating a powerful software moat that competitors have struggled to replicate.
By 2020, CUDA had become the default compute backend for virtually all serious AI research and production workloads. The combination of mature libraries, extensive documentation, a large developer community, and years of optimization meant that switching to an alternative platform involved significant friction, even when competitive hardware was available.
Nvidia has released a series of GPU architectures, each one bringing major improvements in AI training and inference performance. The company has maintained a roughly two-year cadence for new data center GPU architectures.
The Tesla architecture, launched in 2007, was Nvidia's first GPU family designed specifically for general-purpose computing. The Tesla C870 and subsequent models were marketed toward high-performance computing (HPC) and scientific research rather than gaming. Tesla GPUs supported CUDA and offered double-precision floating point, making them suitable for computational physics and molecular dynamics.
The Fermi architecture (2010) improved upon Tesla with better double-precision performance and support for error-correcting code (ECC) memory, which was important for scientific applications. Fermi also introduced a unified address space and support for C++ in CUDA programs.
Kepler (2012) introduced dynamic parallelism and Hyper-Q technology, allowing the GPU to manage workloads more efficiently. Maxwell (2014) focused on energy efficiency and delivered a significant performance-per-watt improvement.
The Pascal architecture, realized in the Tesla P100 accelerator, was Nvidia's first data center GPU built on the 16nm FinFET process. The P100 featured 3,584 CUDA cores, 16 GB of HBM2 memory, and up to 720 GB/s of memory bandwidth. Pascal also introduced NVLink, a high-speed interconnect for GPU-to-GPU communication that was faster than PCIe.
Volta was a landmark architecture for AI. The Tesla V100 introduced Tensor Cores, specialized hardware units designed to accelerate matrix multiply-and-accumulate operations that are central to deep learning. The V100 featured 5,120 CUDA cores, 640 Tensor Cores, 16 or 32 GB of HBM2 memory, 900 GB/s of memory bandwidth, and approximately 21.1 billion transistors fabricated on a 12nm process.
Tensor Cores enabled mixed-precision training, where computations are performed in FP16 (half precision) while maintaining FP32 (single precision) accuracy for accumulation. This approach roughly doubled training throughput compared to FP32-only execution, with minimal impact on model quality.
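The benefit of FP32 accumulation can be seen without a GPU. The pure-Python sketch below (using `struct`'s half-precision codec) sums 10,000 copies of 0.001: a running sum kept entirely in FP16 stalls far short of the true value, while a higher-precision accumulator stays accurate. This is exactly the failure mode that FP32 accumulation in Tensor Cores avoids.

```python
import struct

def fp16(v):
    """Round a Python float to the nearest IEEE 754 half-precision value
    (struct's 'e' format is a binary16 codec)."""
    return struct.unpack('e', struct.pack('e', v))[0]

step = fp16(0.001)      # ~0.0010004 after rounding to FP16
n = 10_000              # true sum: ~10.0

# Accumulate entirely in FP16: once the sum reaches 4.0, the addend is
# smaller than half the local FP16 spacing, so every addition rounds away.
acc16 = 0.0
for _ in range(n):
    acc16 = fp16(acc16 + step)

# Accumulate in higher precision (as Tensor Cores do with FP32): accurate.
acc_hi = sum(step for _ in range(n))

print(acc16)            # 4.0 -- the FP16 sum stalls
print(round(acc_hi, 2)) # 10.0
```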
Although primarily a gaming architecture (GeForce RTX 20 series), Turing introduced second-generation Tensor Cores and RT cores for ray tracing. The data center variant, the T4, became widely used for inference workloads due to its low power consumption and INT8 support.
The Ampere architecture, embodied in the A100 accelerator, brought third-generation Tensor Cores with support for additional data types including TF32 (TensorFloat-32), BF16 (bfloat16), and FP64 Tensor Core operations. The A100 was built on a 7nm process with 54 billion transistors, 6,912 CUDA cores, 432 Tensor Cores, and 40 or 80 GB of HBM2e memory providing up to 2 TB/s of bandwidth.
The A100 also introduced Multi-Instance GPU (MIG) technology, allowing a single GPU to be partitioned into up to seven independent instances for running multiple workloads concurrently. The A100's third-generation NVLink provided 600 GB/s of GPU-to-GPU bandwidth.
The Hopper architecture, named after computer scientist Grace Hopper, produced the H100 accelerator. Built on a 4nm process with approximately 80 billion transistors, the H100 featured 16,896 CUDA cores, 528 Tensor Cores, and 80 GB of HBM3 memory delivering 3.35 TB/s of bandwidth.
The H100 introduced the Transformer Engine, a hardware feature that automatically manages mixed-precision computation between FP8 and FP16 formats on a layer-by-layer basis. It was designed specifically to accelerate the transformer architecture that underpins modern LLMs. The H100 SXM variant delivered approximately 67 TFLOPS of FP32 performance and up to 1,979 TFLOPS of FP16 Tensor performance (the peak figure assumes structured sparsity; dense throughput is roughly half).
The H100 became the most sought-after chip in the AI industry during 2023 and 2024. Wait times stretched to months, and the chip traded on secondary markets at significant premiums above list price.
Nvidia later released the H200, an updated version with 141 GB of HBM3e memory and 4.8 TB/s of bandwidth, offering substantially improved performance for large model inference due to the increased memory capacity and bandwidth.
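The inference benefit of added bandwidth follows from a simple roofline argument: during single-stream autoregressive decoding, every output token requires streaming the full set of model weights from memory, so bandwidth sets a hard ceiling on tokens per second. The back-of-envelope sketch below uses a hypothetical 70B-parameter model with 8-bit weights; the model size and precision are illustrative assumptions, not benchmark results.

```python
def decode_ceiling_tok_s(params_billion, bytes_per_param, bandwidth_tb_s):
    """Rough upper bound on single-stream decode throughput for a
    memory-bound LLM: each generated token must stream all weights
    from HBM at least once."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes

# Hypothetical 70B-parameter model with 8-bit weights (1 byte per parameter):
h100 = decode_ceiling_tok_s(70, 1, 3.35)   # H100: 3.35 TB/s
h200 = decode_ceiling_tok_s(70, 1, 4.8)    # H200: 4.8 TB/s
print(round(h100), round(h200))            # ceilings of ~48 vs ~69 tokens/s
```

Real throughput is lower (KV-cache reads, kernel overheads), but the ratio of the ceilings tracks the bandwidth ratio, which is why the H200's faster memory translates directly into inference gains.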
The Blackwell architecture, named after mathematician David Blackwell, represented another major leap. Blackwell GPUs use a novel dual-die design in which two GPU dies are connected by a high-bandwidth on-chip link and function as a single unified GPU.
The B200 accelerator features approximately 208 billion transistors (104 billion per die), 20,480 CUDA cores, 192 GB of HBM3e memory, and 8 TB/s of memory bandwidth. The B200 introduced fifth-generation Tensor Cores with native FP4 (4-bit floating point) support for inference and second-generation Transformer Engine support.
Nvidia also released the B300 (Blackwell Ultra) variant with 288 GB of HBM3e memory and enhanced compute capabilities, delivering up to 15,000 TFLOPS in FP4 Tensor operations.
The GB200 NVL72 is a rack-scale system that combines 72 Blackwell GPUs and 36 Grace CPUs connected via fifth-generation NVLink. This configuration delivers up to 1,440 petaflops of FP4 inference performance and is designed for training and running trillion-parameter models.
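These rack-level figures can be sanity-checked with simple arithmetic on the per-GPU specifications quoted above (the 8-bit weight estimate at the end is an illustrative assumption):

```python
# Per-GPU FP4 throughput implied by the quoted rack-level figure:
total_fp4_pflops = 1_440
gpus = 72
per_gpu_pflops = total_fp4_pflops / gpus
print(per_gpu_pflops)          # 20.0 PFLOPS of FP4 per Blackwell GPU

# Pooled HBM across the rack, using the 192 GB per-B200 figure:
total_hbm_tb = gpus * 192 / 1000
print(total_hbm_tb)            # 13.824 TB

# A 1-trillion-parameter model needs ~1 TB for weights at 8-bit precision,
# so the pooled memory leaves substantial room for activations and KV cache.
```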
Blackwell GPUs began volume production in late 2024 and ramped aggressively through 2025, with every major cloud provider deploying them at scale.
At GTC 2026, Nvidia unveiled the Rubin architecture, its next-generation platform built on TSMC's 3nm process node. The Rubin R100 GPU features 336 billion transistors, 288 GB of HBM4 memory, 22 TB/s of memory bandwidth, and 50 petaflops of FP4 compute performance. Nvidia has described Rubin as being designed for the "reasoning era" of AI, targeting workloads that require extended chain-of-thought computation. Volume production is expected in the second half of 2026, with Rubin Ultra planned for 2027.
The following table summarizes key specifications of Nvidia's major data center GPU accelerators used for AI workloads.
| GPU | Architecture | Year | Process | CUDA Cores | Memory | Memory Type | Bandwidth (TB/s) | FP32 (TFLOPS) | FP16 Tensor (TFLOPS) | TDP (W) | Transistors (B) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tesla P100 | Pascal | 2016 | 16nm | 3,584 | 16 GB | HBM2 | 0.72 | 10.6 | N/A | 300 | 15.3 |
| Tesla V100 | Volta | 2017 | 12nm | 5,120 | 32 GB | HBM2 | 0.90 | 15.7 | 125 | 300 | 21.1 |
| A100 (SXM) | Ampere | 2020 | 7nm | 6,912 | 80 GB | HBM2e | 2.0 | 19.5 | 312 | 400 | 54.2 |
| H100 (SXM) | Hopper | 2022 | 4nm | 16,896 | 80 GB | HBM3 | 3.35 | 67 | 1,979 | 700 | 80 |
| H200 | Hopper | 2024 | 4nm | 16,896 | 141 GB | HBM3e | 4.8 | 67 | 1,979 | 700 | 80 |
| B200 | Blackwell | 2025 | 4nm | 20,480 | 192 GB | HBM3e | 8.0 | N/A | 2,250 | 1,000 | 208 |
| B300 | Blackwell Ultra | 2025 | 4nm | 20,480 | 288 GB | HBM3e | 8.0 | N/A | ~2,500 | 1,400 | 208 |
| R100 | Rubin | 2026 | 3nm | TBD | 288 GB | HBM4 | 22.0 | TBD | TBD | TBD | 336 |
Nvidia sells GPUs through two distinct product lines: data center accelerators (A100, H100, H200, B200) designed for professional AI workloads, and GeForce consumer GPUs (RTX 4090, RTX 5090) designed primarily for gaming.
Although consumer GPUs can be used for AI tasks, there are important differences.
| Feature | Data Center GPUs (e.g., H100) | Consumer GPUs (e.g., RTX 5090) |
|---|---|---|
| Memory capacity | 80 to 288 GB (HBM) | 24 to 32 GB (GDDR) |
| Memory bandwidth | 3.35 to 8+ TB/s | 1.0 to 1.8 TB/s |
| Tensor Core support | Full precision range (FP64, FP32, FP16, FP8, FP4) | Limited precision support |
| Multi-GPU interconnect | NVLink (up to 1.8 TB/s) | PCIe only (~128 GB/s bidirectional on PCIe 5.0 x16) |
| ECC memory | Yes | No |
| MIG support | Yes (A100, H100) | No |
| Price | $25,000 to $40,000+ | $1,800 to $2,600 |
| Typical use case | Large-scale training, enterprise inference | Fine-tuning, small model inference, research |
The limited memory capacity of consumer GPUs is the primary bottleneck for AI workloads. At 16-bit precision (about 2 bytes per parameter), a single RTX 5090 with 32 GB of GDDR7 cannot hold the weights of a model much beyond roughly 15 billion parameters once activation and KV-cache overhead is accounted for, whereas an H200 with 141 GB of HBM3e can serve far larger models. The lack of NVLink on consumer GPUs also makes multi-GPU training significantly less efficient, as GPUs must communicate over the much slower PCIe bus.
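The constraint can be estimated with simple arithmetic. In the sketch below, the 1.2x overhead factor is an assumed allowance for activations and KV cache, not an official figure:

```python
def model_memory_gb(params_billion, bytes_per_param, overhead=1.2):
    """Back-of-envelope inference memory estimate. The 1.2x overhead is an
    assumed allowance for activations/KV cache, not an official figure."""
    return params_billion * bytes_per_param * overhead

for p in (7, 15, 70):
    print(f"{p}B params: {model_memory_gb(p, 2):.0f} GB at FP16, "
          f"{model_memory_gb(p, 4):.0f} GB at FP32")

# 15B parameters at FP16 needs ~30 GB for weights alone, right at the edge
# of an RTX 5090's 32 GB, while a 70B model at FP16 (~140 GB of weights)
# requires an H200-class 141 GB part or quantization to 8-bit.
```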
That said, consumer GPUs offer strong price-to-performance ratios for smaller workloads. The RTX 5090 delivers roughly 30 to 45% better deep learning performance than the RTX 4090, and research teams on tight budgets sometimes build multi-GPU workstations with consumer cards for experimentation and fine-tuning.
Nvidia's DGX product line provides turnkey AI computing systems that bundle multiple GPUs with optimized networking, storage, and software.
The DGX-1, announced in April 2016, was marketed as "the world's first deep learning supercomputer." It contained eight Tesla P100 GPUs connected via NVLink and was designed to deliver the computational equivalent of approximately 250 conventional servers. Jensen Huang personally delivered the first DGX-1 unit to the OpenAI research lab.
The DGX-2 doubled the GPU count to 16 Tesla V100 GPUs connected through NVSwitch, delivering 2 petaflops of deep learning performance in a single system.
The DGX A100 featured eight A100 GPUs, 15 TB of NVMe storage, 1 TB of system RAM, and eight 200 Gb/s InfiniBand ConnectX-6 network interfaces, providing 5 petaflops of AI performance (FP16 Tensor with structured sparsity; the A100 does not support FP8).
The DGX H100 contained eight H100 GPUs delivering 32 petaflops of FP8 AI compute, 640 GB of total HBM3 memory, and fourth-generation NVLink for GPU-to-GPU communication at 900 GB/s per GPU.
The DGX B200 features eight Blackwell B200 GPUs delivering up to 40 petaflops of FP8 AI performance. Nvidia claims the DGX B200 provides 3x faster training and 15x faster inference on large Mixture-of-Experts models compared to the DGX H100.
The DGX SuperPOD is a large-scale cluster configuration that combines multiple DGX systems with high-bandwidth networking and shared storage. SuperPODs scale from dozens to thousands of GPUs and are designed for training frontier AI models. Multiple organizations, including Meta, Microsoft, and various national laboratories, have deployed DGX SuperPOD configurations.
In March 2025, Nvidia announced the DGX Spark, a desktop-sized AI computer based on Blackwell with 128 GB of unified memory. The DGX Spark is targeted at AI researchers and developers who want to prototype and fine-tune models locally before deploying to larger cloud-based infrastructure.
Nvidia's competitive advantage extends well beyond hardware. The company has built a comprehensive software ecosystem that spans the entire AI development pipeline, from data preparation through model training to production inference.
CUDA is the foundational layer of Nvidia's software stack. Released in 2006, it provides a parallel computing platform and programming model that allows developers to use Nvidia GPUs for general-purpose computation. CUDA includes a compiler (nvcc), runtime libraries, debugging tools, and profiling utilities. As of 2025, there are over 5 million CUDA developers worldwide, and more than 1,000 GPU-accelerated applications have been built on the platform.
cuDNN (CUDA Deep Neural Network library) provides highly optimized implementations of common neural network operations such as convolution, pooling, normalization, and activation functions. Every major deep learning framework, including PyTorch, TensorFlow, and JAX, relies on cuDNN for GPU-accelerated training and inference.
TensorRT is a high-performance inference optimization SDK. It takes trained neural network models and applies graph optimizations, layer fusion, kernel auto-tuning, precision calibration (FP16, INT8, FP8), and other techniques to maximize inference throughput and minimize latency. TensorRT can speed up inference by up to 6x compared to running the same model in a standard framework. TensorRT-LLM is a specialized version designed for optimizing and serving large language models.
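To illustrate what INT8 precision calibration involves at its simplest, the sketch below implements symmetric per-tensor quantization in plain Python. This illustrates the general technique only; it is not TensorRT's actual API, which additionally calibrates activation ranges from representative sample data.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: map [-max|v|, +max|v|]
    linearly onto the integer range [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate real values from the INT8 codes."""
    return [qi * scale for qi in q]

weights = [0.02 * i - 0.5 for i in range(51)]   # toy weights in [-0.5, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Quantization error is bounded by half a quantization step (scale / 2).
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.6f}, max abs error={max_err:.6f}")
```

Storing weights as one INT8 byte instead of four FP32 bytes quarters memory traffic, which is the main source of the inference speedups described above.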
Triton Inference Server is an open-source inference serving platform that supports models from multiple frameworks (PyTorch, TensorFlow, ONNX, TensorRT) and can run on both GPUs and CPUs. Triton handles model versioning, dynamic batching, ensemble pipelines, and provides HTTP/gRPC endpoints for serving predictions at scale. It has become widely adopted for production AI deployments.
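As an illustration of how a model is declared to Triton, here is a minimal, hypothetical `config.pbtxt` for an ONNX model; the model and tensor names are invented, while the field names follow Triton's model-configuration schema:

```
name: "my_model"              # hypothetical model name
platform: "onnxruntime_onnx"
max_batch_size: 32
input [
  { name: "input_ids", data_type: TYPE_INT64, dims: [ -1 ] }
]
output [
  { name: "logits", data_type: TYPE_FP32, dims: [ -1 ] }
]
dynamic_batching {
  max_queue_delay_microseconds: 100   # wait briefly to form larger batches
}
instance_group [
  { count: 2, kind: KIND_GPU }        # run two copies of the model per GPU
]
```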
Nvidia NeMo is an end-to-end framework for building, customizing, and deploying large language models and conversational AI systems. NeMo provides tools for data curation, supervised fine-tuning, reinforcement learning from human feedback (RLHF), and model alignment. It integrates with Nvidia's hardware optimizations and supports distributed training across large GPU clusters.
RAPIDS is a suite of open-source GPU-accelerated libraries for data science and analytics. It includes cuDF (a GPU DataFrame library similar to pandas), cuML (GPU-accelerated machine learning algorithms), and cuGraph (graph analytics). RAPIDS allows data scientists to accelerate their existing workflows by moving computation from CPUs to GPUs with minimal code changes.
| Software | Purpose |
|---|---|
| NCCL | Multi-GPU and multi-node collective communication library |
| cuBLAS | GPU-accelerated basic linear algebra |
| Nsight Systems | System-wide performance profiling |
| DALI | GPU-accelerated data loading and preprocessing pipeline |
| Magnum IO | Optimized I/O for data center workloads |
| Nvidia AI Enterprise | Supported software suite for enterprise AI deployment |
Training a modern large language model requires three primary resources: data, algorithms, and compute. Nvidia GPUs are central to the compute component. The training process involves repeatedly performing forward passes (computing predictions) and backward passes (computing gradients and updating model weights) over massive datasets. These operations are dominated by matrix multiplications, which map efficiently onto the parallel architecture of GPUs.
A typical large-scale training run for a frontier model uses thousands of GPUs working in parallel. For example, training a model with hundreds of billions of parameters might use a cluster of 8,000 to 32,000 H100 GPUs running continuously for several weeks. The GPUs communicate gradient updates using high-speed NVLink and InfiniBand networking.
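The scale of such a run can be estimated with the widely used ~6ND rule of thumb (about 6 floating-point operations per parameter per training token). The model size, token count, and utilization figure below are illustrative assumptions, not a description of any specific training run:

```python
def training_days(params_billion, tokens_trillion, gpus,
                  peak_tflops=989, mfu=0.40):
    """Estimate wall-clock training time via the ~6*N*D FLOPs rule of thumb.
    peak_tflops defaults to H100 dense FP16/BF16 (~989 TFLOPS); an MFU
    (model FLOPs utilization) of 40% is an assumed, typical cluster figure."""
    total_flops = 6 * params_billion * 1e9 * tokens_trillion * 1e12
    cluster_flops_per_s = gpus * peak_tflops * 1e12 * mfu
    return total_flops / cluster_flops_per_s / 86_400

# Hypothetical run: a 400B-parameter model on 15T tokens with 16,384 H100s.
days = training_days(params_billion=400, tokens_trillion=15, gpus=16_384)
print(f"~{days:.0f} days")
```

The estimate lands in the "several weeks on tens of thousands of GPUs" regime described above, and it makes clear why both raw FLOPS and sustained utilization (which depends on interconnect bandwidth) matter.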
Nvidia's hardware and software together address several bottlenecks in this pipeline.
Nvidia holds a near-monopoly on the AI training hardware market. Estimates from industry analysts in 2025 placed Nvidia's share of the AI accelerator market at approximately 90%, with the remaining 10% split among AMD, Google, Intel, Amazon, and various startups.
Several factors contribute to this dominance.
Software ecosystem lock-in: The CUDA ecosystem has been developing for nearly two decades. Most AI researchers, framework developers, and MLOps engineers have deep expertise in CUDA-based tools. Switching to a different hardware platform means rewriting or adapting code, revalidating model behavior, and retraining operations teams.
Performance leadership: Each generation of Nvidia GPUs has delivered substantial performance improvements over the previous generation and over competing offerings. While competitors have occasionally matched Nvidia on paper specifications, real-world training performance (which depends heavily on software optimization) has consistently favored Nvidia.
Supply chain and manufacturing: Nvidia has secured priority access to advanced manufacturing capacity at TSMC and to HBM supply from Samsung and SK Hynix, giving it the ability to ship large volumes of cutting-edge chips.
Full-stack integration: By providing hardware, interconnects (NVLink, NVSwitch), networking (acquired Mellanox in 2020 for $6.9 billion), and software in a single optimized stack, Nvidia reduces the integration burden for customers.
Nvidia's revenue growth over the past several years illustrates the scale of the AI computing boom.
| Fiscal Year (ending January) | Total Revenue | Data Center Revenue | YoY Revenue Growth |
|---|---|---|---|
| FY2022 | $26.9B | $10.6B | 61% |
| FY2023 | $27.0B | $15.0B | 0.2% |
| FY2024 | $60.9B | $47.5B | 126% |
| FY2025 | $130.5B | ~$115B | 114% |
| FY2026 | $215.9B | ~$194B | 65% |
The data center segment has become the overwhelming driver of Nvidia's business, growing from about 40% of total revenue in FY2022 to approximately 90% in FY2026. Quarterly revenue accelerated throughout FY2026, reaching a record $68.1 billion in Q4 FY2026 (ending January 2026).
Nvidia's gross margins have also been exceptionally high for a semiconductor company, consistently exceeding 70% during the AI boom, reflecting the combination of strong demand, limited competition, and the high value that customers place on AI training performance.
Nvidia's stock price has experienced extraordinary growth, driven by the AI boom.
| Date | Milestone |
|---|---|
| January 1999 | IPO on Nasdaq; market cap ~$563 million |
| May 2023 | Market cap crosses $1 trillion |
| February 2024 | Market cap crosses $2 trillion |
| June 2024 | Market cap crosses $3 trillion |
| January 2025 | Single-day market cap loss of ~$600 billion (largest in US history at the time) following the DeepSeek announcement |
| July 2025 | Market cap briefly touches $4 trillion |
| March 2026 | Market cap approximately $4.4 trillion |
Since its IPO, Nvidia's market capitalization has increased by more than 770,000%. The stock (NVDA) underwent a 10-for-1 stock split in June 2024.
Although Nvidia dominates the AI accelerator market, several competitors are working to challenge its position.
AMD is Nvidia's most direct competitor in the GPU market. AMD's Instinct MI300X, launched in late 2023, offered 192 GB of HBM3 memory and competitive inference performance. The MI350 series, launched in mid-2025, featured 288 GB of HBM3E memory and claimed 35x inference performance improvement over the previous generation. At CES 2026, AMD unveiled the MI455X, described as the world's first 2nm AI GPU with 432 GB of HBM4 memory.
AMD has secured deployment commitments from Microsoft, Meta, and OpenAI. However, AMD's ROCm software ecosystem, while improving, is still considered less mature than Nvidia's CUDA stack, and many AI workloads do not yet run as efficiently on AMD hardware.
Google has developed its own Tensor Processing Units (TPUs) since 2015. TPUs are custom ASICs designed specifically for tensor operations. Google uses TPUs extensively for internal AI workloads and offers them to external customers through Google Cloud. The seventh-generation TPU, Ironwood, released in late 2025, delivers 4,614 TFLOPS per chip, which analysts have described as being on par with Blackwell in some workloads.
TPUs are tightly integrated with Google's JAX and TensorFlow frameworks. However, they are only available through Google Cloud, limiting their reach compared to Nvidia GPUs, which can be purchased outright or rented from any cloud provider.
Amazon Web Services (AWS) developed the Trainium series of custom AI training chips. Trainium2, launched in 2024, is used by Anthropic to train its Claude models, with deployments reportedly exceeding 500,000 chips. AWS launched Trainium3 in December 2025 with 2.52 petaflops of FP8 compute and 144 GB of HBM3e memory. Amazon's approach is to offer Trainium as a lower-cost alternative to Nvidia GPUs within its cloud ecosystem.
Intel entered the AI accelerator market through its acquisition of Habana Labs in 2019. The Gaudi line of AI accelerators was positioned as a cost-effective alternative to Nvidia's offerings. However, Intel confirmed plans to discontinue the Gaudi line when its next-generation GPU architecture launches in 2026 or 2027, signaling a strategic pivot.
| Competitor | Product Line | Key Advantage | Key Limitation |
|---|---|---|---|
| AMD | Instinct MI series | High memory capacity; competitive pricing | ROCm ecosystem less mature than CUDA |
| Google | TPU (Ironwood) | Tight JAX/TF integration; strong for training | Only available on Google Cloud |
| Amazon | Trainium | Lower cost on AWS; good for inference | Limited to AWS; less community support |
| Intel | Gaudi (discontinued) | Budget-friendly | Being phased out |
| Hyperscalers (in-house silicon) | Microsoft Maia, Meta MTIA | Optimized for specific workloads | Not available to the general market |
Despite growing competition, Nvidia's combination of hardware performance, software maturity, and ecosystem breadth has maintained its dominant position. Custom ASIC shipments from cloud providers are projected to grow 44.6% in 2026, compared to 16.1% growth for GPU shipments, indicating that the competitive dynamics are slowly shifting.
Nvidia GPUs are available through all major cloud platforms, and the company has established deep partnerships with each of the leading providers.
Nvidia and AWS have a partnership spanning over 15 years. AWS offers Nvidia GPU instances across multiple GPU generations, including P4d (A100), P5 (H100), and P6 (Blackwell) instances. In 2025, Nvidia launched DGX Cloud on AWS, a fully managed AI training platform. AWS has committed to deploying more than one million Nvidia GPUs, including Blackwell and Rubin architectures, across global cloud regions starting in 2026.
Microsoft Azure offers extensive Nvidia GPU availability, including NC, ND, and NV series virtual machines. Azure has deployed large-scale clusters using Nvidia GB300 NVL72 systems for training frontier AI models, including those built by OpenAI. Microsoft has also used Nvidia RTX PRO 6000 Blackwell GPUs for Azure workloads and has announced plans for further expansion.
While Google also develops its own TPUs, Google Cloud offers Nvidia GPU instances (A100, H100, and Blackwell-based) for customers who prefer or require Nvidia hardware. Nvidia DGX Cloud is also available on Google Cloud.
These partnerships reflect a mutually dependent relationship. Cloud providers need Nvidia GPUs to attract AI workloads, while Nvidia benefits from the massive purchasing power of hyperscale data centers. Hyperscalers accounted for just over 50% of Nvidia's data center revenue in FY2026.
The US government has imposed a series of export restrictions on advanced AI chips to China, significantly affecting Nvidia's business in one of its largest markets.
October 2022: The Bureau of Industry and Security (BIS) introduced the first round of semiconductor export controls targeting China. The rules restricted the export of chips above certain performance thresholds, effectively banning sales of the A100 and H100 to Chinese customers.
2023: Nvidia designed and sold the A800 and H800, modified versions of the A100 and H100 with reduced interconnect bandwidth to comply with the initial export controls. In October 2023, BIS broadened the rules to close this loophole, and Nvidia was notified to immediately halt exports of the H800.
2024: Nvidia introduced the H20, a further downgraded chip designed to fall below the revised performance thresholds. Nvidia sold approximately one million H20 chips to Chinese customers in 2024.
January 2025: The Biden administration issued the "AI Diffusion Rule," establishing global performance thresholds that blocked sales of flagship GPUs like the H100 and H200 to China, while creating a tiered system of export permissions for different countries.
April 2025: The Trump administration imposed a license requirement on H20 exports to China, effectively halting Nvidia's remaining AI chip sales there.
July 2025: The administration reversed course, allowing Nvidia to resume shipments of H20 processors and AMD to restart MI308 sales to China.
December 2025: Further adjustments allowed export of the Nvidia H200 to approved Chinese customers under specific licensing conditions.
The export controls have had a meaningful but not devastating impact on Nvidia's overall revenue, as growth in other markets (particularly US hyperscalers) has more than offset reduced China sales. However, the restrictions have prompted Chinese companies to accelerate development of domestic alternatives, with Huawei's Ascend 910B being the most notable competitor in the Chinese market.
Beyond building hardware and software platforms, Nvidia conducts significant AI research through Nvidia Research and applies AI to several application domains.
Nvidia has developed the Nemotron family of large language models for enterprise and agentic AI applications. The Nemotron models are released under permissive open licenses and are designed to be customized for specific business use cases through NeMo. The family includes Nemotron Speech (for speech recognition), Nemotron RAG (for retrieval-augmented generation with embedding and reranking models), and general-purpose Nemotron chat models.
The Artificial Analysis Open Index has rated the Nemotron family among the most open model releases in the AI ecosystem based on license permissibility, data transparency, and technical documentation availability.
Nvidia Isaac GR00T is a platform for developing AI-powered humanoid robots. The latest version, GR00T N1.6, is a reasoning vision-language-action (VLA) model that enables full-body control for humanoid robots, allowing them to move and manipulate objects simultaneously based on natural language instructions and visual input.
The GR00T platform includes simulation environments built on Nvidia Omniverse and Isaac Sim, allowing developers to train and validate robot behaviors in simulation before deploying them on physical hardware. Companies including Franka Robotics and NEURA Robotics are using the GR00T platform.
Nvidia has released large datasets to support robotics research, including over 500,000 robotics trajectories and extensive simulation assets.
Nvidia DRIVE is the company's platform for autonomous vehicle development. It includes the DRIVE Orin and DRIVE Thor system-on-chip processors for in-vehicle AI computing, as well as software tools for perception, mapping, and planning.
At CES 2026, Nvidia introduced DRIVE Alpamayo-R1 (AR1), described as the world's first open reasoning VLA model for autonomous vehicle research. AR1 integrates chain-of-thought AI reasoning with path planning, allowing the vehicle's AI system to explain its driving decisions.
Mercedes-Benz has announced that its new CLA model will be the first passenger car to feature a system built on the Nvidia DRIVE platform with Alpamayo capabilities. Other automotive partners include Toyota, BYD, Hyundai, and several Chinese EV manufacturers.
Nvidia contributes open-source training data for autonomous driving research, including over 100 terabytes of vehicle sensor data.
Nvidia Omniverse is a platform for building and simulating 3D virtual worlds, or "digital twins." It is used in manufacturing, architecture, robotics, and autonomous vehicle simulation. Omniverse is built on the Universal Scene Description (OpenUSD) framework and allows real-time collaboration and physics-accurate simulation.
Nvidia has made several acquisitions that strengthened its position in AI and data center computing.
| Year | Company | Price | Significance |
|---|---|---|---|
| 2019 | Cumulus Networks | ~$100M | Data center networking software |
| 2020 | Mellanox Technologies | $6.9B | High-speed data center networking (InfiniBand) |
| 2020 | Arm Ltd. (attempted) | $40B | Blocked by regulators; deal terminated in February 2022 |
| 2024 | Run:ai | ~$700M | GPU orchestration and workload management |
The Mellanox acquisition was particularly strategic, as it gave Nvidia control over the InfiniBand networking technology that is used to connect GPUs in large-scale AI training clusters. By owning both the GPU and the network fabric, Nvidia can optimize the entire data path for distributed training workloads.
The attempted acquisition of Arm for $40 billion would have given Nvidia ownership of the CPU architecture used in most mobile devices and an increasing number of data center servers. However, the deal was blocked by regulators in multiple jurisdictions due to competition concerns and was officially terminated in February 2022.
Jensen Huang has led Nvidia as CEO since its founding in 1993, making him one of the longest-serving CEOs in the technology industry. He holds a bachelor's degree in electrical engineering from Oregon State University and a master's degree from Stanford University.
Huang's leadership style emphasizes long-term technical bets. The decision to invest in CUDA in 2006, years before deep learning became mainstream, is frequently cited as one of the most prescient strategic decisions in technology history. Under Huang's leadership, Nvidia pivoted from a gaming-focused GPU company to the dominant platform company for artificial intelligence.
As of 2026, Jensen Huang's personal net worth is estimated at over $130 billion, largely derived from his Nvidia shares.