NVIDIA DGX Cloud
Last reviewed
Jun 3, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,919 words
NVIDIA DGX Cloud is a managed artificial intelligence computing service from NVIDIA that gives enterprises access to the company's DGX supercomputing infrastructure and software stack over the internet, rented on a subscription basis rather than purchased outright. Announced at the company's GTC conference on March 21, 2023, the service was pitched as a way for "every enterprise" to reach an AI supercomputer "from a browser," removing the procurement, deployment and operational burden of building on-premises clusters. [1] DGX Cloud is hosted in the data centers of third-party cloud and colocation partners as well as on NVIDIA's own infrastructure, but NVIDIA owns the offering, the pricing and the customer relationship. The service marked NVIDIA's most direct move into selling accelerated computing as a service, layering a software-and-services business on top of the GPU hardware it already sold to the same cloud providers. [2]
Over its first three years the offering broadened considerably. It expanded from training-focused clusters to a portfolio that includes DGX Cloud Serverless Inference for deploying models, and DGX Cloud Lepton, a global GPU marketplace launched in 2025 that grew out of NVIDIA's acquisition of the startup Lepton AI.
NVIDIA unveiled DGX Cloud during chief executive Jensen Huang's keynote at GTC on March 21, 2023, framing the moment in characteristically expansive terms. "We are at the iPhone moment of AI," Huang said. "Startups are racing to build disruptive products and business models, and incumbents are looking to respond. DGX Cloud gives customers instant access to NVIDIA AI supercomputing in global-scale clouds." [1]
The core proposition was that an enterprise could rent a dedicated cluster of NVIDIA DGX systems on a monthly basis and reach it through a web browser, sidestepping the long lead times, capital expenditure and engineering effort required to acquire and run scarce accelerated computing hardware in-house. Each instance was configured as an eight-GPU node and came bundled with NVIDIA's software for managing and developing AI workloads. NVIDIA positioned the service for the training of large, multi-node models, including the large language models and other generative AI systems that were driving demand for its chips at the time. [1]
DGX Cloud inverted NVIDIA's traditional relationship with the cloud. Rather than only selling GPUs to hyperscalers who then resold capacity, NVIDIA placed its own DGX hardware inside partner data centers, wrapped it in NVIDIA software, and sold the bundle directly to enterprises as a first-party service. The pricing model and the customer billing relationship belonged to NVIDIA even when the technology was accessed through a partner's cloud marketplace. [2]
Analysts read the strategy as a deliberate shift toward software and services revenue. NVIDIA was, as one assessment put it, "not rich enough, or dumb enough" to build a cloud to rival Amazon Web Services, Microsoft Azure or Google Cloud, so it used those clouds as a distribution channel while capturing recurring, higher-margin software income on top of hardware sales. The arrangement also let NVIDIA sell the same GPUs twice in economic terms: once to the cloud provider building the data center, and again, through DGX Cloud subscriptions, to the enterprise consuming the capacity. [2]
At launch NVIDIA named a specific set of hosting partners rather than the full roster of major hyperscalers. Oracle Cloud Infrastructure was first to make DGX Cloud generally available, paired with a purpose-built remote direct memory access (RDMA) network, bare-metal compute and high-performance storage scaling to tens of thousands of GPUs. The colocation provider Equinix also offered the service from its data centers. NVIDIA said Microsoft Azure and Google Cloud would follow. [1][3]
Notably, Amazon Web Services was not among the original DGX Cloud hosts. AWS and NVIDIA announced a DGX Cloud collaboration later, on November 28, 2023, under which AWS would host the service. That deployment was described as the first DGX Cloud built on the GH200 NVL32 multi-node platform based on the NVIDIA GH200 Grace Hopper Superchip, with a single Amazon EC2 instance able to provide up to 20 terabytes of shared memory for terabyte-scale workloads. The same collaboration produced Project Ceiba, a supercomputer for NVIDIA's own research featuring 16,384 GH200 Superchips and rated at 65 exaflops of AI performance. [4] DGX Cloud subsequently became available to purchase through AWS Marketplace, a step covered at AWS re:Invent in 2024. [5]
| Partner | Role | Status / timing |
|---|---|---|
| Oracle Cloud Infrastructure | Hosting cloud | First generally available, March 2023 |
| Equinix | Colocation host | Available at launch, March 2023 |
| Microsoft Azure | Hosting cloud | Announced as coming, 2023 |
| Google Cloud | Hosting cloud | Announced as coming, 2023 |
| Amazon Web Services | Hosting cloud | DGX Cloud collaboration announced November 2023; on AWS Marketplace by 2024 |
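The GH200 figures quoted above for the AWS deployment and Project Ceiba can be reduced to per-Superchip terms with some simple arithmetic. The sketch below is illustrative only: it assumes that "NVL32" denotes 32 Superchips per platform and that terabytes and exaflops are decimal units, neither of which is stated in the text, and the function names are hypothetical.

```python
# Figures quoted in the article for the AWS collaboration.
CEIBA_SUPERCHIPS = 16_384        # GH200 Superchips in Project Ceiba
CEIBA_AI_EXAFLOPS = 65           # rated AI performance
NVL32_SHARED_MEMORY_TB = 20      # "up to 20 terabytes" per EC2 instance

# Assumption (not stated in the article): the "32" in GH200 NVL32
# is the number of Superchips in one platform.
NVL32_SUPERCHIPS = 32


def petaflops_per_superchip(exaflops: float, superchips: int) -> float:
    """Average AI petaflops per Superchip (1 exaflop = 1,000 petaflops)."""
    return exaflops * 1_000 / superchips


def memory_per_superchip_gb(total_tb: float, superchips: int) -> float:
    """Average shared memory per Superchip in GB (1 TB = 1,000 GB assumed)."""
    return total_tb * 1_000 / superchips


if __name__ == "__main__":
    pf = petaflops_per_superchip(CEIBA_AI_EXAFLOPS, CEIBA_SUPERCHIPS)
    gb = memory_per_superchip_gb(NVL32_SHARED_MEMORY_TB, NVL32_SUPERCHIPS)
    print(f"Project Ceiba: about {pf:.2f} AI petaflops per Superchip")
    print(f"GH200 NVL32: up to about {gb:.0f} GB of shared memory per Superchip")
```

These are averages derived from the headline numbers, not vendor specifications for an individual Superchip.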
A defining feature of DGX Cloud was that the rented hardware shipped with NVIDIA's full enterprise AI software, so customers did not have to assemble their own stack. The bundle centered on the NVIDIA Base Command Platform, which handled cluster management, job scheduling and orchestration of large training runs, and NVIDIA AI Enterprise, the supported suite of more than a hundred frameworks, libraries and pretrained models, including the RAPIDS data-science libraries. [1][3]
Alongside the launch NVIDIA introduced cloud services that ran on this infrastructure, including NeMo for building and customizing large language models, Picasso for image, video and 3D generation, and BioNeMo for drug discovery and the language of proteins. These frameworks let DGX Cloud customers fine-tune and deploy generative AI models without leaving NVIDIA's environment. [1]
Early adopters cited by NVIDIA spanned several industries: the biotechnology firm Amgen for drug discovery, CCC Intelligent Solutions for AI in insurance claims, and ServiceNow for AI research and enterprise software. [1]
NVIDIA set the entry price for DGX Cloud at 36,999 US dollars per instance per month. Each instance comprised eight NVIDIA H100 or A100 80GB Tensor Core GPUs, for a total of 640GB of GPU memory per node, plus the bundled Base Command Platform and AI Enterprise software. [1][3] Independent coverage confirmed the figure and noted the unusual commercial structure: customers paid NVIDIA directly even when they reached the service through a partner cloud's marketplace, which shielded NVIDIA from the margin compression it would have faced as a mere hardware supplier and preserved its pricing power. [2]
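The launch figures above imply a rough per-GPU cost, which the following sketch works through. The 730-hour month is an assumption made only to express the monthly price as an hourly equivalent (the source quotes a monthly price, not an hourly one), and the names used are hypothetical.

```python
# Launch figures quoted in the article.
MONTHLY_PRICE_USD = 36_999   # per instance per month
GPUS_PER_INSTANCE = 8
GPU_MEMORY_GB = 80           # H100 or A100 80GB

# Assumption (not from the source): an average month of 730 hours.
HOURS_PER_MONTH = 730


def total_gpu_memory_gb(gpus: int = GPUS_PER_INSTANCE,
                        per_gpu_gb: int = GPU_MEMORY_GB) -> int:
    """Aggregate GPU memory of one instance, in GB."""
    return gpus * per_gpu_gb


def monthly_price_per_gpu(price: float = MONTHLY_PRICE_USD,
                          gpus: int = GPUS_PER_INSTANCE) -> float:
    """Monthly instance price divided evenly across its GPUs."""
    return price / gpus


def hourly_price_per_gpu(price: float = MONTHLY_PRICE_USD,
                         gpus: int = GPUS_PER_INSTANCE,
                         hours: float = HOURS_PER_MONTH) -> float:
    """Hourly equivalent per GPU under the assumed hours per month."""
    return monthly_price_per_gpu(price, gpus) / hours


if __name__ == "__main__":
    print(f"GPU memory per instance: {total_gpu_memory_gb()} GB")
    print(f"Price per GPU per month: ${monthly_price_per_gpu():,.2f}")
    print(f"Price per GPU per hour:  ${hourly_price_per_gpu():.2f}")
```

The per-GPU figures are a convenience for comparison; the price also covered the bundled software, so they overstate the cost of the hardware alone.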
DGX Cloud crystallized a broader repositioning of NVIDIA from a chip vendor into a full-stack AI platform company that increasingly monetized software and services. By the time of the launch, analysts observed that NVIDIA had more employees working on software than on hardware, and DGX Cloud represented a "cloud-first" turn that let the company sell recurring access to its technology rather than only one-time hardware. [2]
The model was also strategically delicate. NVIDIA's hosting partners were simultaneously its largest customers for GPUs and, through their own AI services, its competitors for enterprise AI spending. By owning the DGX Cloud relationship and price while running on partner infrastructure, NVIDIA inserted itself into the value chain without building a hyperscale cloud of its own, capturing software revenue that was independent of who owned the underlying data center. [2] The approach foreshadowed NVIDIA's later expansion into adjacent services and acquisitions, including the cluster-management company Run:ai, as it built out the software layer around its accelerated computing.
As the AI market shifted from a near-exclusive focus on training toward large-scale inference and agentic applications, DGX Cloud grew into a family of offerings.
At GTC in March 2025 NVIDIA introduced DGX Cloud Serverless Inference, built on NVIDIA Cloud Functions. Positioned as a horizontal aggregator, it abstracts the underlying infrastructure across AWS, Azure, Google Cloud, private clouds and on-premises data centers, and provides auto-scaling, global load balancing and multi-cloud deployment for production AI workloads. Developers can deploy models packaged as NVIDIA NIM microservices, custom containers or Helm charts, and the service was opened to independent software vendors and NVIDIA Cloud Partners. [6]
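The source does not describe how the service decides when to scale, so the following is only a generic sketch of one common auto-scaling rule: size the replica count to the number of in-flight requests, clamped between a minimum and a maximum. It is not the NVIDIA Cloud Functions API, and `ScalingPolicy`, `desired_replicas` and their parameters are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy; not taken from any NVIDIA interface."""
    min_replicas: int        # floor, 0 allows scaling to zero when idle
    max_replicas: int        # ceiling, bounds cost under load spikes
    target_concurrency: int  # in-flight requests one replica should serve


def desired_replicas(in_flight_requests: int, policy: ScalingPolicy) -> int:
    """Return the replica count needed for the current load, within bounds."""
    if policy.target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    if in_flight_requests < 0:
        raise ValueError("in_flight_requests must not be negative")
    # Round up so that no replica exceeds its target concurrency.
    needed = math.ceil(in_flight_requests / policy.target_concurrency)
    # Clamp the result to the configured floor and ceiling.
    return max(policy.min_replicas, min(policy.max_replicas, needed))


if __name__ == "__main__":
    policy = ScalingPolicy(min_replicas=0, max_replicas=10, target_concurrency=4)
    for load in (0, 9, 100):
        print(load, "in-flight requests ->", desired_replicas(load, policy), "replicas")
```

A production system would typically smooth the load signal and add cooldown periods to avoid oscillation; those details are omitted here.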
The most significant addition was DGX Cloud Lepton, announced on May 18, 2025, at Computex. DGX Cloud Lepton is described as an AI platform with a compute marketplace that connects developers building agentic and physical AI to tens of thousands of GPUs drawn from a global network of cloud providers, through a single unified interface. Developers can either purchase capacity from participating providers or bring their own clusters, and the platform integrates the NVIDIA software stack, including NIM and NeMo microservices, NVIDIA Blueprints and NVIDIA Cloud Functions. [7]
DGX Cloud Lepton grew directly out of NVIDIA's acquisition of Lepton AI, a startup that rented NVIDIA GPU servers and built cloud management software. The deal, reportedly worth several hundred million dollars and covering a team of roughly 20 people, closed in April 2025. Both Lepton AI co-founders joined NVIDIA: Yangqing Jia, the creator of the Caffe deep-learning framework and a former vice president at Alibaba, and Junjie Bai. The acquired technology and team became the foundation for the rebranded marketplace, extending NVIDIA's cloud and software ambitions in a market dominated by the major hyperscalers. [8][9]
The initial set of NVIDIA Cloud Partners contributing GPUs based on NVIDIA Blackwell and other architectures to the DGX Cloud Lepton marketplace included CoreWeave, Crusoe, Firmus, Foxconn, GMI Cloud, Lambda, Nebius, Nscale, SoftBank Corp. and Yotta Data Services, with AWS and Microsoft Azure named as the first large-scale cloud providers to participate. [7] On June 11, 2025, NVIDIA expanded DGX Cloud Lepton in Europe, adding regional providers including Mistral AI, Nebius, Nscale, Firebird, Fluidstack, Hydra Host, Scaleway and Together AI, and partnering with venture firms such as Accel, Elaia, Partech and Sofinnova Partners to offer eligible startups up to 100,000 US dollars in GPU capacity credits. [10]
| Offering | Introduced | Purpose |
|---|---|---|
| DGX Cloud (clusters) | March 2023 | Rented multi-node DGX clusters for large-scale training |
| DGX Cloud on AWS (GH200 NVL32) | November 2023 | Grace Hopper-based instances with large shared memory |
| DGX Cloud Serverless Inference | March 2025 | Multi-cloud, auto-scaling inference via Cloud Functions |
| DGX Cloud Lepton | May 2025 | Global GPU marketplace aggregating many cloud providers |