Nvidia
Last reviewed
May 31, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v7 · 7,814 words
Nvidia Corporation is an American multinational technology company that designs graphics processing units (GPUs), systems on chips (SoCs), and related software for gaming, professional visualization, data centers, and automotive markets; fabrication of its chips is outsourced to foundry partners such as TSMC. Originally a graphics chip company serving the video game industry, Nvidia has become the dominant supplier of hardware for artificial intelligence training and inference, with an estimated 80 to 90% share of the AI accelerator market as of 2026. The company is headquartered in Santa Clara, California, and employed approximately 42,000 people as of early 2026.
Nvidia's market capitalization first surpassed $4 trillion in July 2025 and crossed $5 trillion on October 29, 2025, making it the first company in history to reach that valuation; it briefly traded above $5.5 trillion in early 2026.[17][28] Its rapid growth has been driven almost entirely by the explosive demand for AI computing infrastructure from hyperscale cloud providers, AI research labs, and enterprises adopting large language models and generative AI. For fiscal 2026 (ending January 2026), Nvidia reported revenue of $215.9 billion, of which approximately $193.7 billion came from its Data Center segment.[4]
Nvidia was founded on April 5, 1993, by Jensen Huang, Chris Malachowsky, and Curtis Priem. The three co-founders famously planned the company during a meeting at a Denny's restaurant on Berryessa Road in East San Jose, California, in late 1992. They began working out of Priem's townhouse in Fremont, California, with $40,000 in initial capital.
Jensen Huang, a Taiwanese-American electrical engineer, had previously worked as a microprocessor designer at AMD and as director of CoreWare at LSI Logic. He has served as president and CEO of Nvidia since the company's founding. Malachowsky came from Sun Microsystems, and Priem had worked at both Sun Microsystems and IBM.
The company's name is derived from the Latin word "invidia," meaning envy. In its first two years, Nvidia developed the NV1 multimedia accelerator, which was released in 1995. The chip was not commercially successful, but the lessons learned from its development shaped the company's future direction.
In 1999, Nvidia released the GeForce 256, which it marketed as "the world's first GPU" (graphics processing unit). While earlier graphics accelerators existed, the GeForce 256 was the first consumer chip to integrate transform and lighting calculations on the GPU itself, offloading these tasks from the CPU. This product established Nvidia as a leader in consumer graphics and defined the GPU as a product category.
Throughout the 2000s, Nvidia dominated the gaming GPU market alongside rival ATI (later acquired by AMD). The company expanded into professional visualization with its Quadro product line and into high-performance computing with the Tesla line of compute accelerators.
Nvidia went public on January 22, 1999, listing on the Nasdaq stock exchange under the ticker symbol NVDA. The company's initial market capitalization was approximately $563 million.
GPUs were originally designed for a single task: rendering pixels on a screen for video games and graphics applications. However, researchers in the early 2000s recognized that the massively parallel architecture of GPUs could be applied to other computationally intensive problems. A GPU contains thousands of small cores that can execute the same operation on many data points simultaneously, making it well suited for tasks like matrix multiplication, physics simulations, and scientific computing.
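To illustrate why matrix multiplication suits this kind of hardware, the following minimal sketch computes each output element as an independent piece of work. It is plain Python and runs sequentially; the function names are illustrative, and the mapping of one output element to one GPU thread is a simplifying assumption rather than a description of any particular Nvidia kernel.

```python
# Illustrative only: pure Python executes these steps one after another,
# whereas a GPU could hand each (row, col) output element to its own thread.

def output_element(a, b, row, col):
    """Compute one element of A @ B; it depends on no other output element."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul(a, b):
    """Assemble the product from independent per-element computations."""
    rows, cols = len(a), len(b[0])
    return [[output_element(a, b, r, c) for c in range(cols)] for r in range(rows)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

Because no call to `output_element` reads the result of another, all of them could in principle run at the same time, which is the property that parallel hardware exploits.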
This approach, known as General-Purpose computing on Graphics Processing Units (GPGPU), initially required developers to "trick" the GPU by reformulating their computations as graphics rendering tasks. The process was cumbersome and error-prone, which limited adoption.
In 2006, Nvidia released CUDA (Compute Unified Device Architecture), a parallel computing platform and programming model that allowed developers to write general-purpose programs for Nvidia GPUs using extensions to the C programming language. CUDA eliminated the need to express computations as graphics shaders and provided a straightforward way to harness the parallel processing power of GPUs.
CUDA's release was a turning point. For the first time, scientists, engineers, and researchers could write GPU-accelerated code without expertise in graphics programming. Nvidia invested heavily in developer tools, documentation, and university outreach programs to build the CUDA ecosystem.
The impact on deep learning became clear in 2012, when Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton trained AlexNet, a convolutional neural network that won the ImageNet competition by a large margin. AlexNet was trained on two Nvidia GeForce GTX 580 GPUs. This result demonstrated that GPUs were dramatically faster than CPUs for training neural networks, and it sparked the modern deep learning revolution.
As deep learning frameworks like TensorFlow (2015) and PyTorch (2016) emerged, Nvidia worked closely with framework developers to optimize performance on its hardware. The company built specialized libraries such as cuDNN (CUDA Deep Neural Network library) for accelerating neural network primitives and cuBLAS for linear algebra. These libraries became deeply integrated into every major AI framework, creating a powerful software moat that competitors have struggled to replicate.
By 2020, CUDA had become the default compute backend for virtually all serious AI research and production workloads. The combination of mature libraries, extensive documentation, a large developer community, and years of optimization meant that switching to an alternative platform involved significant friction, even when competitive hardware was available.
Nvidia has released a series of GPU architectures, each one bringing major improvements in AI training and inference performance. The company had long maintained a roughly two-year cadence for new data center GPU architectures, but in 2024 it announced a shift to an annual cadence covering both GPUs and full platforms.
The Tesla architecture, launched in 2007, was Nvidia's first GPU family designed specifically for general-purpose computing. The Tesla C870 and subsequent models were marketed toward high-performance computing (HPC) and scientific research rather than gaming. Tesla GPUs supported CUDA and offered double-precision floating point, making them suitable for computational physics and molecular dynamics.
The Fermi architecture improved upon Tesla with better double-precision performance and support for error-correcting code (ECC) memory, which was important for scientific applications. Fermi also introduced a unified address space and support for C++ in CUDA programs.
Kepler introduced dynamic parallelism and Hyper-Q technology, allowing the GPU to manage workloads more efficiently. Maxwell focused on energy efficiency and delivered a significant performance-per-watt improvement.
The Pascal architecture, realized in the Tesla P100 accelerator, was Nvidia's first data center GPU built on the 16nm FinFET process. The P100 featured 3,584 CUDA cores, 16 GB of HBM2 memory, and up to 720 GB/s of memory bandwidth. Pascal also introduced NVLink, a high-speed interconnect for GPU-to-GPU communication that was faster than PCIe.
Volta was a landmark architecture for AI. The Tesla V100 introduced Tensor Cores, specialized hardware units designed to accelerate matrix multiply-and-accumulate operations that are central to deep learning. The V100 featured 5,120 CUDA cores, 640 Tensor Cores, 16 or 32 GB of HBM2 memory, 900 GB/s of memory bandwidth, and approximately 21.1 billion transistors fabricated on a 12nm process.
Tensor Cores enabled mixed-precision training, where computations are performed in FP16 (half precision) while maintaining FP32 (single precision) accuracy for accumulation. This approach roughly doubled training throughput compared to FP32-only execution, with minimal impact on model quality.
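The value of accumulating in FP32 can be seen in a small numerical experiment. The sketch below is a software simulation, not a model of Tensor Core hardware: it uses the standard library's `struct` module to round values to IEEE half and single precision, and the helper names (`to_fp16`, `to_fp32`, `dot`) and the input values are assumptions chosen for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(xs, ws, round_acc):
    """Dot product with FP16 inputs; the running sum is rounded by round_acc."""
    acc = 0.0
    for x, w in zip(xs, ws):
        product = to_fp16(x) * to_fp16(w)  # product of two FP16 values
        acc = round_acc(acc + product)     # accumulator precision under test
    return acc


n = 10_000
xs = [0.1] * n
ws = [0.1] * n

fp16_acc = dot(xs, ws, to_fp16)   # accumulator kept in half precision
fp32_acc = dot(xs, ws, to_fp32)   # accumulator kept in single precision
reference = n * to_fp16(0.1) ** 2  # double-precision reference value

print(f"FP16 accumulate: {fp16_acc:.3f}")
print(f"FP32 accumulate: {fp32_acc:.3f}")
print(f"reference:       {reference:.3f}")
```

With these inputs the half-precision accumulator should stall well short of the reference, because each product eventually becomes smaller than half the spacing between adjacent FP16 values, while the single-precision accumulator stays close to it.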
Although primarily a gaming architecture (GeForce RTX 20 series), Turing introduced second-generation Tensor Cores and RT cores for ray tracing. The data center variant, the T4, became widely used for inference workloads due to its low power consumption and INT8 support.
The Ampere architecture, embodied in the A100 accelerator, brought third-generation Tensor Cores with support for additional data types including TF32 (TensorFloat-32), BF16 (bfloat16), and FP64 Tensor Core operations. The A100 was built on a 7nm process with 54 billion transistors, 6,912 CUDA cores, 432 Tensor Cores, and 40 or 80 GB of HBM2e memory providing up to 2 TB/s of bandwidth.
The A100 also introduced Multi-Instance GPU (MIG) technology, allowing a single GPU to be partitioned into up to seven independent instances for running multiple workloads concurrently. The A100's third-generation NVLink provided 600 GB/s of GPU-to-GPU bandwidth.
The Hopper architecture, named after computer scientist Grace Hopper, produced the H100 accelerator. Built on a 4nm process with approximately 80 billion transistors, the H100 featured 16,896 CUDA cores, 528 Tensor Cores, and 80 GB of HBM3 memory delivering 3.35 TB/s of bandwidth.
The H100 introduced the Transformer Engine, a hardware feature that automatically manages mixed-precision computation between FP8 and FP16 formats on a layer-by-layer basis. This was specifically designed to accelerate transformer architectures, which underpin modern LLMs. The H100 SXM variant delivered approximately 67 TFLOPS of FP32 performance and up to 1,979 TFLOPS of FP16 Tensor performance.
The H100 became the most sought-after chip in the AI industry during 2023 and 2024. Wait times stretched to months, and the chip traded on secondary markets at significant premiums above list price.
Nvidia later released the H200, an updated version with 141 GB of HBM3e memory and 4.8 TB/s of bandwidth, offering substantially improved performance for large model inference due to the increased memory capacity and bandwidth.
The Blackwell architecture, named after mathematician David Blackwell, represented another major leap. Blackwell GPUs use a novel dual-die design in which two GPU dies are connected by a high-bandwidth die-to-die link and function as a single unified GPU.
The B200 accelerator features approximately 208 billion transistors (104 billion per die), 20,480 CUDA cores, 192 GB of HBM3e memory, and 8 TB/s of memory bandwidth. The B200 introduced fifth-generation Tensor Cores with native FP4 (4-bit floating point) support for inference and second-generation Transformer Engine support.
Nvidia also released the B300 (Blackwell Ultra) variant with 288 GB of HBM3e memory and enhanced compute capabilities, delivering up to 15,000 TFLOPS in FP4 Tensor operations.
The GB200 NVL72 is a rack-scale system that combines 72 Blackwell GPUs and 36 Grace CPUs connected via fifth-generation NVLink. This configuration delivers up to 1,440 petaflops of FP4 inference performance and is designed for training and running trillion-parameter models.
Blackwell GPUs began volume production in late 2024 and ramped aggressively through 2025, with every major cloud provider deploying them at scale. In Nvidia's Q3 FY2026 earnings call (November 2025), CEO Jensen Huang stated that "Blackwell sales are off the charts, and cloud GPUs are sold out."[29]
At GTC 2026, held March 16 to 19 at the SAP Center in San Jose, Nvidia unveiled the Vera Rubin platform, the company's next-generation architecture. The platform pairs the new Rubin GPU with a custom Arm-based CPU called Vera, named after astronomer Vera Rubin.[30][31]
The Rubin GPU uses two reticle-sized dies and is targeted at delivering up to 50 PFLOPS of NVFP4 (FP4) compute, 288 GB of HBM4 memory, and 22 TB/s of memory bandwidth. The Vera CPU contains 88 custom Arm v9-class cores with 2-way simultaneous multithreading (176 threads) and 1.8 TB/s of NVLink-C2C bandwidth to its companion GPU, approximately double the bandwidth of the prior Grace CPU.[31]
A full Vera Rubin NVL144 rack integrates 144 Rubin GPU dies (in 72 packages) with 36 Vera CPUs and delivers up to 3.6 NVFP4 exaflops of inference performance and approximately 1.2 FP8 exaflops of training performance, according to Nvidia's specifications.[31] Volume production of Vera Rubin is targeted for the second half of 2026. Rubin Ultra is planned for 2027, and a successor architecture, codenamed Feynman, has been previewed for 2028.
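The rack-level figures quoted for these systems follow from simple multiplication of the per-GPU numbers. The sketch below is only a consistency check on values already stated in this article (50 PFLOPS per Rubin package, 72 packages per rack, and 1,440 petaflops for the 72-GPU GB200 NVL72); the function name is illustrative, and the result is peak arithmetic, not measured performance.

```python
def rack_exaflops(packages, pflops_per_package):
    """Peak rack throughput in exaflops, assuming perfect scaling."""
    return packages * pflops_per_package / 1000


# Vera Rubin NVL144: 72 GPU packages at up to 50 PFLOPS of NVFP4 each.
vera_rubin_rack = rack_exaflops(72, 50)

# GB200 NVL72: 1,440 petaflops of FP4 spread over 72 GPUs.
gb200_per_gpu_pflops = 1440 / 72

print(f"Vera Rubin NVL144: {vera_rubin_rack} exaflops")
print(f"GB200 NVL72 per GPU: {gb200_per_gpu_pflops} petaflops")
```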
In September 2025, Nvidia separately introduced the Rubin CPX, described as a new class of GPU purpose-built for "massive-context" inference workloads such as million-token coding and generative video. Rubin CPX combines a monolithic die with 128 GB of GDDR7 memory and is rated at 30 NVFP4 petaflops, with integrated video encode and decode acceleration; it is intended to be paired with standard Rubin GPUs in a disaggregated inference architecture and is expected to ship at the end of 2026.[32]
The following table summarizes key specifications of Nvidia's major data center GPU accelerators used for AI workloads.
| GPU | Architecture | Year | Process | CUDA Cores | Memory | Memory Type | Bandwidth (TB/s) | FP32 (TFLOPS) | FP16 Tensor (TFLOPS) | TDP (W) | Transistors (B) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tesla P100 | Pascal | 2016 | 16nm | 3,584 | 16 GB | HBM2 | 0.72 | 10.6 | N/A | 300 | 15.3 |
| Tesla V100 | Volta | 2017 | 12nm | 5,120 | 32 GB | HBM2 | 0.90 | 15.7 | 125 | 300 | 21.1 |
| A100 (SXM) | Ampere | 2020 | 7nm | 6,912 | 80 GB | HBM2e | 2.0 | 19.5 | 312 | 400 | 54.2 |
| H100 (SXM) | Hopper | 2022 | 4nm | 16,896 | 80 GB | HBM3 | 3.35 | 67 | 1,979 | 700 | 80 |
| H200 | Hopper | 2024 | 4nm | 16,896 | 141 GB | HBM3e | 4.8 | 67 | 1,979 | 700 | 80 |
| B200 | Blackwell | 2025 | 4nm | 20,480 | 192 GB | HBM3e | 8.0 | N/A | 2,250 (FP16) | 1,000 | 208 |
| B300 | Blackwell Ultra | 2025 | 4nm | 20,480 | 288 GB | HBM3e | 8.0 | N/A | ~2,500 (FP16) | 1,400 | 208 |
| Rubin CPX | Rubin | 2026 | 3nm | TBD | 128 GB | GDDR7 | TBD | TBD | 30 PFLOPS (NVFP4) | TBD | TBD |
| Rubin (VR200) | Rubin | 2026 | 3nm | TBD | 288 GB | HBM4 | 22.0 | TBD | 50 PFLOPS (NVFP4) | TBD | TBD |
Nvidia sells GPUs through two distinct product lines: data center accelerators (A100, H100, H200, B200) designed for professional AI workloads, and GeForce consumer GPUs (RTX 4090, RTX 5090) designed primarily for gaming.
Although consumer GPUs can be used for AI tasks, there are important differences.
| Feature | Data Center GPUs (e.g., H100) | Consumer GPUs (e.g., RTX 5090) |
|---|---|---|
| Memory capacity | 80 to 288 GB (HBM) | 24 to 32 GB (GDDR) |
| Memory bandwidth | 3.35 to 8+ TB/s | 1.0 to 1.8 TB/s |
| Tensor Core support | Full precision range (FP64, FP32, FP16, FP8, FP4) | Limited precision support |
| Multi-GPU interconnect | NVLink (up to 1.8 TB/s) | PCIe only (~128 GB/s) |
| ECC memory | Yes | No |
| MIG support | Yes (A100, H100) | No |
| Price | $25,000 to $40,000+ | $1,800 to $2,600 |
| Typical use case | Large-scale training, enterprise inference | Fine-tuning, small model inference, research |
The limited memory capacity of consumer GPUs is the primary bottleneck for AI workloads. A single RTX 5090 with 32 GB of GDDR7 cannot hold the weights of models larger than about 15 billion parameters at 16-bit precision (FP16 or BF16), or about 8 billion at full FP32 precision, whereas an H200 with 141 GB of HBM3e can handle much larger models. The lack of NVLink on consumer GPUs also makes multi-GPU training significantly less efficient, as GPUs must communicate over the much slower PCIe bus.
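As a rough check on these capacity limits, weight memory is approximately the parameter count multiplied by the bytes stored per parameter. The sketch below assumes weights only (no activations, optimizer state, or KV cache), decimal gigabytes, and 4, 2, and 1 bytes per parameter for FP32, FP16/BF16, and FP8; the helper names are illustrative.

```python
# Bytes stored per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weights_gb(num_params, dtype):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def max_params_billions(memory_gb, dtype):
    """Largest parameter count (in billions) whose weights fit in memory_gb."""
    return memory_gb / BYTES_PER_PARAM[dtype]


for name, capacity_gb in [("RTX 5090", 32), ("H200", 141)]:
    for dtype in ("fp32", "fp16"):
        limit = max_params_billions(capacity_gb, dtype)
        print(f"{name} at {dtype}: about {limit:.1f}B parameters")
```

Under these assumptions a 32 GB card tops out near 16 billion parameters at 16-bit precision and 8 billion at FP32, so a limit of roughly 15 billion corresponds to 16-bit weights with a small amount of headroom. Real workloads need additional memory beyond the weights, so practical limits are lower.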
That said, consumer GPUs offer strong price-to-performance ratios for smaller workloads. The RTX 5090 delivers roughly 30 to 45% better deep learning performance than the RTX 4090, and research teams on tight budgets sometimes build multi-GPU workstations with consumer cards for experimentation and fine-tuning.
Nvidia's DGX product line provides turnkey AI computing systems that bundle multiple GPUs with optimized networking, storage, and software.
The DGX-1, announced in April 2016, was marketed as "the world's first deep learning supercomputer." It contained eight Tesla P100 GPUs connected via NVLink and was designed to deliver the computational equivalent of approximately 250 conventional servers. Jensen Huang personally delivered the first DGX-1 unit to the OpenAI research lab.
The DGX-2 doubled the GPU count to 16 Tesla V100 GPUs connected through NVSwitch, delivering 2 petaflops of deep learning performance in a single system.
The DGX A100 featured eight A100 GPUs, 15 TB of NVMe storage, 1 TB of system RAM, and eight 200 Gb/s InfiniBand ConnectX-6 network interfaces, providing 5 petaflops of AI performance. (The A100 predates hardware FP8 support, which arrived with Hopper.)
The DGX H100 contained eight H100 GPUs delivering 32 petaflops of FP8 AI compute, 640 GB of total HBM3 memory, and fourth-generation NVLink for GPU-to-GPU communication at 900 GB/s per GPU.
The DGX B200 features eight Blackwell B200 GPUs delivering up to 40 petaflops of FP8 AI performance. Nvidia claims the DGX B200 provides 3x faster training and 15x faster inference on large Mixture-of-Experts models compared to the DGX H100. The DGX B300, released in 2025, is based on the Blackwell Ultra B300 GPU and is the flagship of the current generation.
The DGX SuperPOD is a large-scale cluster configuration that combines multiple DGX systems with high-bandwidth networking and shared storage. SuperPODs scale from dozens to thousands of GPUs and are designed for training frontier AI models. Multiple organizations, including Meta, Microsoft, and various national laboratories, have deployed DGX SuperPOD configurations.
Announced as "Project DIGITS" at CES 2025 in January and rebranded as DGX Spark at GTC in March 2025, the system is a desktop-sized AI computer built around the new GB10 Grace Blackwell Superchip. The first generation provided 128 GB of unified memory and approximately 1,000 sparse FP4 TOPS of compute. It is sold through system builders including ASUS, Dell, HP, and Lenovo and is targeted at researchers and developers who want to prototype and fine-tune models locally before deploying to cloud-based infrastructure.[33]
A larger sibling, DGX Station, was unveiled alongside Spark at GTC 2025; it is built on the Blackwell Ultra platform and packaged in a workstation form factor for solo developers and small teams.
Nvidia's competitive advantage extends well beyond hardware. The company has built a comprehensive software ecosystem that spans the entire AI development pipeline, from data preparation through model training to production inference.
CUDA is the foundational layer of Nvidia's software stack. Released in 2006, it provides a parallel computing platform and programming model that allows developers to use Nvidia GPUs for general-purpose computation. CUDA includes a compiler (nvcc), runtime libraries, debugging tools, and profiling utilities. As of 2025, there are over 5 million CUDA developers worldwide, and more than 1,000 GPU-accelerated applications have been built on the platform.
cuDNN (CUDA Deep Neural Network library) provides highly optimized implementations of common neural network operations such as convolution, pooling, normalization, and activation functions. Every major deep learning framework, including PyTorch, TensorFlow, and JAX, relies on cuDNN for GPU-accelerated training and inference.
TensorRT is a high-performance inference optimization SDK. It takes trained neural network models and applies graph optimizations, layer fusion, kernel auto-tuning, precision calibration (FP16, INT8, FP8), and other techniques to maximize inference throughput and minimize latency. TensorRT can speed up inference by up to 6x compared to running the same model in a standard framework. TensorRT-LLM is a specialized version designed for optimizing and serving large language models.
Triton Inference Server is an open-source inference serving platform that supports models from multiple frameworks (PyTorch, TensorFlow, ONNX, TensorRT) and can run on both GPUs and CPUs. Triton handles model versioning, dynamic batching, ensemble pipelines, and provides HTTP/gRPC endpoints for serving predictions at scale. It has become widely adopted for production AI deployments.
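The idea behind dynamic batching can be sketched in a few lines. This is a generic illustration of the technique and not Triton's implementation or API: the function `form_batches`, its parameters, and the request tuples are all hypothetical.

```python
def form_batches(requests, max_batch_size, max_delay_ms):
    """Group (arrival_ms, payload) requests into batches.

    A batch is closed when it is full, or when the next request arrives
    more than max_delay_ms after the first request in the batch.
    """
    batches = []
    current = []  # (arrival_ms, payload) pairs in the open batch
    for arrival, payload in sorted(requests):
        batch_full = len(current) == max_batch_size
        waited_too_long = bool(current) and arrival - current[0][0] > max_delay_ms
        if batch_full or waited_too_long:
            batches.append([p for _, p in current])
            current = []
        current.append((arrival, payload))
    if current:
        batches.append([p for _, p in current])
    return batches


requests = [(0, "a"), (1, "b"), (2, "c"), (10, "d"), (11, "e")]
print(form_batches(requests, max_batch_size=2, max_delay_ms=5))
# [['a', 'b'], ['c'], ['d', 'e']]
```

The trade-off is between throughput and latency: larger batches use the GPU more efficiently, while the delay bound limits how long an early request waits for companions.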
Nvidia NeMo is an end-to-end framework for building, customizing, and deploying large language models and conversational AI systems. NeMo provides tools for data curation, supervised fine-tuning, reinforcement learning from human feedback (RLHF), and model alignment. It integrates with Nvidia's hardware optimizations and supports distributed training across large GPU clusters.
NVIDIA AI Enterprise is a supported software platform that bundles CUDA, cuDNN, TensorRT, Triton, NeMo, and a suite of pre-built tools with enterprise-grade support contracts. A major component is NIM (NVIDIA Inference Microservices), introduced at GTC 2024 and expanded through 2025-2026: each NIM packages a model (LLM, vision, speech, or domain-specific) together with an optimized inference engine and OpenAI-compatible API endpoints, deployable as a container on any CUDA-capable infrastructure. NIM is sold per-GPU-per-year as part of the AI Enterprise subscription.
RAPIDS is a suite of open-source GPU-accelerated libraries for data science and analytics. It includes cuDF (a GPU DataFrame library similar to pandas), cuML (GPU-accelerated machine learning algorithms), and cuGraph (graph analytics). RAPIDS allows data scientists to accelerate their existing workflows by moving computation from CPUs to GPUs with minimal code changes.
| Software | Purpose |
|---|---|
| NCCL | Multi-GPU and multi-node collective communication library |
| cuBLAS | GPU-accelerated basic linear algebra |
| Nsight Systems | System-wide performance profiling |
| DALI | GPU-accelerated data loading and preprocessing pipeline |
| Magnum IO | Optimized I/O for data center workloads |
| Run:ai | GPU orchestration and workload management (acquired 2024) |
Training a modern large language model requires three primary resources: data, algorithms, and compute. Nvidia GPUs are central to the compute component. The training process involves repeatedly performing forward passes (computing predictions) and backward passes (computing gradients and updating model weights) over massive datasets. These operations are dominated by matrix multiplications, which map efficiently onto the parallel architecture of GPUs.
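The forward pass, backward pass, and weight update can be seen in miniature in the sketch below, which fits a two-parameter linear model with gradient descent. It is a toy in plain Python, assuming a mean-squared-error loss and hand-derived gradients; real LLM training applies the same loop to billions of parameters using matrix operations on GPUs.

```python
def forward(w, b, x):
    """Forward pass: compute the model's prediction for input x."""
    return w * x + b


def train(data, lr=0.1, epochs=500):
    """Fit y = w * x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = forward(w, b, x) - y   # forward pass and residual
            grad_w += 2 * error * x / n    # backward pass: dLoss/dw
            grad_b += 2 * error / n        # backward pass: dLoss/db
        w -= lr * grad_w                   # weight update
        b -= lr * grad_b
    return w, b


# Synthetic data generated from y = 3x + 1.
data = [(x, 3.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = train(data)
print(f"w = {w:.4f}, b = {b:.4f}")
```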
A typical large-scale training run for a frontier model uses thousands of GPUs working in parallel. For example, training a model with hundreds of billions of parameters might use a cluster of 8,000 to 32,000 H100 GPUs running continuously for several weeks. The GPUs communicate gradient updates using high-speed NVLink and InfiniBand networking.
Nvidia's hardware and software together address several bottlenecks in this pipeline.
Following the 2020 acquisition of Mellanox Technologies, Nvidia has built one of the largest networking businesses in the data center industry; networking revenue grew approximately 142% year-over-year to roughly $11 billion in Q4 FY2026 alone.[4]
Spectrum-X is Nvidia's Ethernet platform tuned for AI workloads, combining Spectrum-4 (and successor Spectrum-6) ASICs with BlueField-3 SuperNICs. At GTC 2025, Nvidia announced Spectrum-X Photonics and Quantum-X Photonics silicon-photonics switches that integrate the optical engine directly into the switch package. Configurations include up to 512 ports of 800 Gb/s Ethernet (400 Tb/s total throughput) and 144 ports of 800 Gb/s InfiniBand. Nvidia states the photonics designs use roughly 4x fewer lasers and deliver about 3.5x better power efficiency than equivalent pluggable-optics deployments.[34]
Spectrum-XGS Ethernet, introduced in 2025, is a "scale-across" technology designed to link geographically distributed data centers into a unified, giga-scale AI super-factory. Meta and Oracle were named as early Spectrum-X Ethernet customers.[35]
BlueField-3 DPUs continue to ship in volume; BlueField-4 STX storage processors and Spectrum-6 SPX Ethernet switches were announced at GTC 2026 as components of the Vera Rubin rack-scale platform.[30]
Nvidia holds a near-monopoly on the AI training hardware market. Estimates from industry analysts in 2026 placed Nvidia's share of the AI accelerator market at approximately 80 to 90%, with the remaining share split among AMD, Google, Intel, Amazon, and various startups.
Several factors contribute to this dominance.
Software ecosystem lock-in: The CUDA ecosystem has been developing for nearly two decades. Most AI researchers, framework developers, and MLOps engineers have deep expertise in CUDA-based tools. Switching to a different hardware platform means rewriting or adapting code, revalidating model behavior, and retraining operations teams.
Performance leadership: Each generation of Nvidia GPUs has delivered substantial performance improvements over the previous generation and over competing offerings. While competitors have occasionally matched Nvidia on paper specifications, real-world training performance (which depends heavily on software optimization) has consistently favored Nvidia.
Supply chain and manufacturing: Nvidia has secured priority access to advanced manufacturing capacity at TSMC and to HBM supply from Samsung and SK Hynix, giving it the ability to ship large volumes of cutting-edge chips.
Full-stack integration: By providing hardware, interconnects (NVLink, NVSwitch), networking (acquired Mellanox in 2020 for $6.9 billion), and software in a single optimized stack, Nvidia reduces the integration burden for customers.
Nvidia's revenue growth over the past several years illustrates the scale of the AI computing boom.
| Fiscal Year (ending January) | Total Revenue | Data Center Revenue | YoY Revenue Growth |
|---|---|---|---|
| FY2022 | $26.9B | $10.6B | 61% |
| FY2023 | $27.0B | $15.0B | 0.2% |
| FY2024 | $60.9B | $47.5B | 126% |
| FY2025 | $130.5B | ~$115B | 114% |
| FY2026 | $215.9B | $193.7B | 65% |
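The growth rates and the Data Center share of revenue can be recomputed from the table. The sketch below uses the rounded figures as printed, so results may differ slightly from the reported percentages (the FY2023 growth rate in particular is sensitive to rounding); the FY2025 Data Center value is the approximate figure given above, and the function names are illustrative.

```python
# Fiscal year: (total revenue, data center revenue) in billions of USD,
# copied from the table; the FY2025 data center figure is approximate.
revenue = {
    "FY2022": (26.9, 10.6),
    "FY2023": (27.0, 15.0),
    "FY2024": (60.9, 47.5),
    "FY2025": (130.5, 115.0),
    "FY2026": (215.9, 193.7),
}


def yoy_growth(current, previous):
    """Year-over-year growth as a percentage."""
    return (current / previous - 1) * 100


def share(part, total):
    """Part as a percentage of total."""
    return part / total * 100


years = list(revenue)
for prev, cur in zip(years, years[1:]):
    growth = yoy_growth(revenue[cur][0], revenue[prev][0])
    print(f"{cur}: revenue growth {growth:.1f}%")

for year in ("FY2022", "FY2026"):
    total, data_center = revenue[year]
    print(f"{year}: data center share {share(data_center, total):.1f}%")
```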
The data center segment has become the overwhelming driver of Nvidia's business, growing from about 40% of total revenue in FY2022 to approximately 90% in FY2026. Quarterly revenue accelerated throughout FY2026:
| Quarter | Revenue | Data Center | YoY |
|---|---|---|---|
| Q1 FY2026 (Apr 2025) | $44.1B | $39.1B | 69% |
| Q2 FY2026 (Jul 2025) | $46.7B | $41.1B | 56% |
| Q3 FY2026 (Oct 2025) | $57.0B | $51.2B | 62% |
| Q4 FY2026 (Jan 2026) | $68.1B | $62.3B | 73% |
Within Q4 FY2026, networking hardware alone contributed $10.98 billion (up 263% year-over-year) while compute GPUs contributed $51.3 billion.[4]
For Q1 FY2027 (ending April 2026), Nvidia guided to revenue of $78.0 billion plus or minus 2%, explicitly noting that the outlook assumes no Data Center compute revenue from China.[4] Results for Q1 FY2027 are scheduled to be reported on May 20, 2026.[36]
Nvidia's gross margins have remained exceptionally high for a semiconductor company. FY2026 non-GAAP gross margin was 71.3%, down from the FY2024 peak as Blackwell ramped through a more complex bring-up cycle; non-GAAP gross margins were 73.6% in Q3 FY2026 alone.[29]
Nvidia's stock price has experienced extraordinary growth, driven by the AI boom.
| Date | Milestone |
|---|---|
| January 1999 | IPO on Nasdaq; market cap ~$563 million |
| May 2023 | Market cap crosses $1 trillion |
| February 2024 | Market cap crosses $2 trillion |
| June 2024 | 10-for-1 stock split; market cap crosses $3 trillion |
| January 2025 | Single-day loss of ~$600 billion following the DeepSeek R1 announcement |
| July 2025 | Market cap briefly touches $4 trillion |
| October 29, 2025 | First company in history to close above $5 trillion market cap |
| Q1 2026 | Market cap reached approximately $5.5 trillion[28] |
Since its IPO, Nvidia's market capitalization has increased by approximately 1,000,000% in nominal terms. The stock (NVDA) underwent a 10-for-1 stock split in June 2024 (its sixth split overall).
Although Nvidia dominates the AI accelerator market, several competitors are working to challenge its position.
AMD is Nvidia's most direct competitor in the GPU market. AMD's Instinct MI300X, launched in late 2023, offered 192 GB of HBM3 memory and competitive inference performance. The MI325X and MI355X followed in 2024-2025. At CES 2026, AMD unveiled the MI400 series, with 432 GB of HBM4 memory targeting Nvidia's Rubin generation.
AMD has secured deployment commitments from Microsoft, Meta, and OpenAI. However, AMD's ROCm software ecosystem, while improving, is still considered less mature than Nvidia's CUDA stack, and many AI workloads do not yet run as efficiently on AMD hardware.
Google has developed its own Tensor Processing Units (TPUs) since 2015. TPUs are custom ASICs designed specifically for tensor operations. Google uses TPUs extensively for internal AI workloads and offers them to external customers through Google Cloud. The seventh-generation TPU, Ironwood, released in late 2025, delivers 4,614 TFLOPS per chip, which analysts have described as being on par with Blackwell in some workloads.
TPUs are tightly integrated with Google's JAX and TensorFlow frameworks. However, they are only available through Google Cloud, limiting their reach compared to Nvidia GPUs, which can be purchased outright or rented from any cloud provider.
Amazon Web Services (AWS) developed the Trainium series of custom AI training chips. Trainium2, launched in 2024, is used by Anthropic to train its Claude models, with deployments reportedly exceeding 500,000 chips. AWS launched Trainium3 in December 2025 with 2.52 petaflops of FP8 compute and 144 GB of HBM3e memory. Amazon's approach is to offer Trainium as a lower-cost alternative to Nvidia GPUs within its cloud ecosystem.
Intel entered the AI accelerator market through its acquisition of Habana Labs in 2019. The Gaudi line of AI accelerators was positioned as a cost-effective alternative to Nvidia's offerings. However, Intel confirmed plans to discontinue the Gaudi line when its next-generation GPU architecture launches in 2026 or 2027, signaling a strategic pivot.
| Competitor | Product Line | Key Advantage | Key Limitation |
|---|---|---|---|
| AMD | Instinct MI series | High memory capacity; competitive pricing | ROCm ecosystem less mature than CUDA |
| Google | TPU (Ironwood) | Tight JAX/TF integration; strong for training | Only available on Google Cloud |
| Amazon | Trainium | Lower cost on AWS; good for inference | Limited to AWS; less community support |
| Intel | Gaudi (discontinued) | Budget-friendly | Being phased out |
| Custom silicon (various) | Microsoft Maia, Meta MTIA | Optimized for specific workloads | Not available to general market |
Despite growing competition, Nvidia's combination of hardware performance, software maturity, and ecosystem breadth has maintained its dominant position. Custom ASIC shipments from cloud providers are projected to grow 44.6% in 2026, compared to 16.1% growth for GPU shipments, indicating that the competitive dynamics are slowly shifting.
Nvidia GPUs are available through all major cloud platforms, and the company has established deep partnerships with each of the leading providers.
Nvidia and AWS have a partnership spanning over 15 years. AWS offers Nvidia GPU instances across multiple GPU generations, including P4d (A100), P5 (H100), and P6 (Blackwell) instances. In 2025, Nvidia launched DGX Cloud on AWS, a fully managed AI training platform. AWS has committed to deploying more than one million Nvidia GPUs, including Blackwell and Rubin architectures, across global cloud regions starting in 2026.
Microsoft Azure offers extensive Nvidia GPU availability, including NC, ND, and NV series virtual machines. Azure has deployed large-scale clusters using Nvidia GB300 NVL72 systems for training frontier AI models, including those built by OpenAI. Microsoft has also used Nvidia RTX PRO 6000 Blackwell GPUs for Azure workloads and has announced plans for further expansion.
While Google also develops its own TPUs, Google Cloud offers Nvidia GPU instances (A100, H100, and Blackwell-based) for customers who prefer or require Nvidia hardware. Nvidia DGX Cloud is also available on Google Cloud.
CoreWeave, a GPU-focused cloud provider that went public on the Nasdaq on March 28, 2025 (raising $1.5 billion), has emerged as Nvidia's most strategically intertwined cloud customer. In January 2026, Nvidia invested an additional $2 billion in CoreWeave at $87.20 per share to support CoreWeave's planned buildout of more than 5 gigawatts of AI factory capacity by 2030. As of early 2026, Nvidia held a stake of approximately 13% in CoreWeave, up from 7% at IPO.[37] An expanded master services agreement signed in fall 2025 commits Nvidia to purchase up to $6.3 billion in unsold CoreWeave capacity through 2032.
These partnerships reflect a mutually dependent relationship. Cloud providers need Nvidia GPUs to attract AI workloads, while Nvidia benefits from the massive purchasing power of hyperscale data centers. Hyperscalers accounted for just over 50% of Nvidia's data center revenue in FY2026. Microsoft, Amazon, Meta, and Alphabet have publicly disclosed combined fiscal 2026 AI-related capital expenditure of roughly $700 billion.
In September 2025, Nvidia and OpenAI announced a strategic partnership in which Nvidia would invest up to $100 billion in OpenAI in stages tied to the deployment of at least 10 gigawatts of NVIDIA AI systems for OpenAI training and inference workloads.[38] The first gigawatt is scheduled to be deployed in the second half of 2026 on the Vera Rubin platform. In December 2025, Nvidia CFO Colette Kress disclosed that a definitive agreement had not yet been completed, noting the deal remained at the letter-of-intent stage.[39]
The partnership sits alongside the Stargate Project, a joint venture announced on January 21, 2025 by OpenAI, SoftBank, Oracle, and Abu Dhabi-based MGX. Stargate plans to deploy $500 billion in AI infrastructure over four years in the United States, with $100 billion committed at launch. Stargate's flagship Abilene, Texas site opened in September 2025 running Oracle Cloud Infrastructure on racks of Nvidia GPUs; additional sites have been announced in New Mexico, Ohio, and elsewhere.[40]
In November 2025, Nvidia and Microsoft jointly announced a combined investment of up to $15 billion in Anthropic, with Nvidia committing up to $10 billion and Microsoft up to $5 billion, at a valuation of approximately $350 billion. As part of the deal, Anthropic committed to purchase $30 billion of Microsoft Azure compute capacity and to deploy up to 1 gigawatt of NVIDIA Grace Blackwell and Vera Rubin systems. Anthropic and Nvidia will also co-engineer future architectures, optimizing Claude models for Nvidia silicon and tailoring future Nvidia hardware to Anthropic workloads.[41]
The deal made Claude the only frontier model commercially available on all three major hyperscale clouds (AWS, Azure, Google Cloud) and is widely viewed as Nvidia's hedge against single-customer concentration on OpenAI.
The US government has imposed a series of export restrictions on advanced AI chips to China, significantly affecting Nvidia's business in one of its largest markets.
October 2022: The Bureau of Industry and Security (BIS) introduced the first round of semiconductor export controls targeting China. The rules restricted the export of chips above certain performance thresholds, effectively banning sales of the A100 and H100 to Chinese customers.
2023: Nvidia designed and sold the A800 and H800, modified versions of the A100 and H100 with reduced interconnect bandwidth to comply with the initial export controls. In October 2023, BIS broadened the rules to close this loophole, and Nvidia was notified to immediately halt exports of the H800.
2024: Nvidia introduced the H20, a further downgraded chip designed to fall below the revised performance thresholds. Nvidia sold approximately one million H20 chips to Chinese customers in 2024.
January 2025: The outgoing Biden administration issued the "AI Diffusion Rule" (Framework for AI Diffusion), establishing global performance thresholds that blocked sales of flagship GPUs like the H100 and H200 to China while creating a tiered system of export permissions for different countries.
April 2025: The Trump administration tightened controls further, effectively halting all H20 exports to China; Nvidia took an approximately $5.5 billion inventory and commitment charge as a result.[42]
July 2025: After intensive lobbying by Jensen Huang, including meetings in both Washington and Beijing, the administration reversed course and allowed Nvidia to resume H20 shipments to China and AMD to restart MI308 sales. The new policy paired resumed sales with a reported 15% US government fee on chip sales into China.[43]
August 2025: Chinese regulators and several large Chinese internet companies signaled reluctance to absorb H20 inventory, citing both performance concerns relative to domestic alternatives and political pressure to favor local suppliers.
December 2025: Further adjustments allowed export of the H200 to approved Chinese customers under specific licensing conditions.
For Q1 FY2027, Nvidia's guidance explicitly assumes zero Data Center compute revenue from China, reflecting ongoing uncertainty around the policy regime.[4] The export controls have prompted Chinese companies to accelerate development of domestic alternatives, including Huawei's Ascend 910C, Cambricon's Siyuan 590, and MetaX's C500.
Beyond building hardware and software platforms, Nvidia conducts significant AI research through Nvidia Research and applies AI to several application domains.
Nvidia has developed the Nemotron family of large language models for enterprise and agentic AI applications. The Nemotron models are released under permissive open licenses and are designed to be customized for specific business use cases through NeMo. The family includes Nemotron 3, Nemotron-CC (a CommonCrawl-based training corpus), the Jet-Nemotron efficient variant, Nemotron Speech (for speech recognition), and Nemotron RAG (for retrieval-augmented generation with embedding and reranking models).
The Artificial Analysis Open Index has rated the Nemotron family among the most open model releases in the AI ecosystem based on license permissiveness, data transparency, and availability of technical documentation.
At CES 2025 on January 6, 2025, Nvidia introduced Cosmos, a platform of generative world foundation models (WFMs) for physical AI, including autonomous vehicles and robotics. Cosmos models generate physics-aware video from text, image, and sensor inputs. The first set of Cosmos Predict models was released under an open model license on Hugging Face and the NVIDIA NGC catalog, with early adopters including Agility Robotics, Figure AI, Foretellix, Skild AI, and Uber.[44] Nvidia stated that the Cosmos data pipeline can process 20 million hours of video in roughly 14 days on Blackwell hardware (versus more than three years on CPU-only systems).
A major Cosmos release in November 2025 added Cosmos Reason and Cosmos Transfer models for synthetic data generation and reasoning-on-pixels workflows.
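The data-pipeline throughput figures Nvidia stated for Cosmos can be turned into rough rates with simple arithmetic. The sketch below uses only the stated numbers (20 million hours of video, roughly 14 days on Blackwell, more than three years on CPU-only systems). Treating a year as 365 days is an assumption, and because the CPU figure is a lower bound, the resulting speedup is a lower bound as well.

```python
VIDEO_HOURS = 20_000_000          # hours of video, as stated by Nvidia
BLACKWELL_DAYS = 14               # "roughly 14 days" on Blackwell hardware
CPU_DAYS_LOWER_BOUND = 3 * 365    # "more than three years"; 365-day year assumed

# Video-hours processed per wall-clock day on the Blackwell pipeline.
video_hours_per_day = VIDEO_HOURS / BLACKWELL_DAYS

# Minimum speedup implied by the two stated durations.
min_speedup = CPU_DAYS_LOWER_BOUND / BLACKWELL_DAYS

print(f"Blackwell pipeline: about {video_hours_per_day:,.0f} video-hours per day")
print(f"Implied speedup over CPU-only: at least {min_speedup:.0f}x")
```

These are back-of-the-envelope rates derived from a vendor statement, not independently measured benchmarks.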
Isaac GR00T is Nvidia's platform for AI-powered humanoid robots. GR00T N1, announced at GTC on March 18, 2025, was described by Nvidia as "the world's first open, fully customizable foundation model for generalized humanoid reasoning and skills." It uses a dual-system vision-language-action (VLA) architecture: a vision-language module interprets the environment and instructions, while a diffusion-transformer module generates motor actions. Early access partners included Agility Robotics, Boston Dynamics, Mentee Robotics, and NEURA Robotics.[45]
Subsequent releases through 2025-2026 culminated in GR00T N1.6, a reasoning VLA that enables full-body control for humanoids. Nvidia has also released large open datasets, including hundreds of thousands of robotics trajectories generated in Isaac Sim and Omniverse.
Nvidia DRIVE is the company's platform for autonomous vehicle development. It includes the DRIVE Orin and DRIVE Thor system-on-chip processors for in-vehicle AI computing, as well as software tools for perception, mapping, and planning. The DRIVE AGX Thor superchip provides up to 2,000 FP8 TFLOPS for production vehicles.
At CES 2026, Nvidia introduced DRIVE Alpamayo-R1 (AR1), an open reasoning VLA model for autonomous-driving research that combines chain-of-thought reasoning with path planning. Mercedes-Benz announced that its new CLA model would be the first production passenger car to ship with the complete NVIDIA DRIVE AV software stack and Alpamayo capabilities; the system was demonstrated driving for 90% of test rides through San Francisco in early 2026. Hyundai Motor Group announced a separate AI factory partnership in November 2025 covering vehicle platforms, smart factories, and robotics. Other DRIVE customers include Toyota, BYD, Lucid, and several Chinese EV manufacturers.
Nvidia Omniverse is a platform for building and simulating 3D virtual worlds, or "digital twins." It is used in manufacturing, architecture, robotics, and autonomous vehicle simulation. Omniverse is built on the Universal Scene Description (OpenUSD) framework and allows real-time collaboration and physics-accurate simulation. Industrial digital-twin deployments in 2025-2026 included partnerships with Foxconn (for manufacturing-floor optimization) and BMW.
Nvidia has made several acquisitions that strengthened its position in AI and data center computing.
| Year | Company | Price | Significance |
|---|---|---|---|
| 2020 | Mellanox Technologies | $6.9B | High-speed data center networking (InfiniBand) |
| 2020 | Cumulus Networks | ~$100M | Data center networking software |
| 2020 | Arm Ltd. (attempted) | $40B | Failed acquisition of Arm; abandoned in February 2022 after regulatory pushback |
| 2024 | Run:ai (closed Dec 30, 2024) | ~$700M | GPU orchestration and Kubernetes scheduler for AI workloads |
The Mellanox acquisition was particularly strategic, as it gave Nvidia control over the InfiniBand networking technology that is used to connect GPUs in large-scale AI training clusters. By owning both the GPU and the network fabric, Nvidia can optimize the entire data path for distributed training workloads.
The attempted acquisition of Arm for $40 billion would have given Nvidia ownership of the CPU architecture used in most mobile devices and an increasing number of data center servers. However, the deal faced opposition from regulators in multiple jurisdictions over competition concerns, including a US Federal Trade Commission lawsuit to block it, and was officially terminated in February 2022.
The Run:ai acquisition closed on December 30, 2024 after unconditional approval from the European Commission. Nvidia subsequently committed to open-sourcing the Run:ai platform, which orchestrates GPU resources across Kubernetes clusters and can support non-Nvidia accelerators over time.[46]
Jensen Huang has led Nvidia as CEO since its founding in 1993, making him one of the longest-serving CEOs in the technology industry. He holds a bachelor's degree in electrical engineering from Oregon State University and a master's degree from Stanford University.
Huang's leadership style emphasizes long-term technical bets. The decision to invest in CUDA in 2006, years before deep learning became mainstream, is frequently cited as one of the most prescient strategic decisions in technology history. Under Huang's leadership, Nvidia pivoted from a gaming-focused GPU company to the dominant platform company for artificial intelligence.
Huang's personal net worth, derived almost entirely from his approximately 3.5% stake in Nvidia, was reported at $191.5 billion on Forbes' real-time billionaires list in mid-May 2026, placing him among the world's ten wealthiest individuals.[47]
Colette Kress has served as chief financial officer since 2013. Manuvir Das, president of the Enterprise Computing division, and Jonah Alben, SVP of GPU engineering, are among the most senior technical executives, while Jay Puri, EVP of Worldwide Field Operations, oversees go-to-market.