# NVIDIA DGX Cloud

> Source: https://aiwiki.ai/wiki/nvidia_dgx_cloud
> Updated: 2026-06-03
> Categories: AI Infrastructure, NVIDIA
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**NVIDIA DGX Cloud** is a managed artificial intelligence computing service from [Nvidia](/wiki/nvidia) that gives enterprises access to NVIDIA's [DGX](/wiki/nvidia_dgx) supercomputing infrastructure and software stack over the internet, rented on a subscription basis rather than purchased outright. Announced at the company's GTC conference on March 21, 2023, the service was pitched as a way for "every enterprise" to reach an AI supercomputer "from a browser," removing the procurement, deployment and operational burden of building on-premises clusters. [1] DGX Cloud is hosted in the data centers of third-party cloud and colocation partners as well as on NVIDIA's own infrastructure, yet NVIDIA owns the offering, the pricing and the customer relationship. The service marked NVIDIA's most direct move into selling accelerated computing as a service, layering a software-and-services business on top of the GPU hardware it already sold to the same cloud providers. [2]

Over its first three years the offering broadened considerably. It expanded from training-focused clusters to a portfolio that includes DGX Cloud Serverless Inference for deploying models, and DGX Cloud Lepton, a global GPU marketplace launched in 2025 that grew out of NVIDIA's acquisition of the startup [Lepton AI](/wiki/lepton_ai).

## The GTC 2023 announcement

NVIDIA unveiled DGX Cloud during chief executive Jensen Huang's keynote at GTC on March 21, 2023, framing the moment in characteristically expansive terms. "We are at the iPhone moment of AI," Huang said. "Startups are racing to build disruptive products and business models, and incumbents are looking to respond. DGX Cloud gives customers instant access to NVIDIA AI supercomputing in global-scale clouds." [1]

The core proposition was that an enterprise could rent a dedicated cluster of NVIDIA DGX systems on a monthly basis and reach it through a web browser, sidestepping the long lead times, capital expenditure and engineering effort required to acquire and run scarce accelerated computing hardware in-house. Each instance was configured as an eight-GPU node and came bundled with NVIDIA's software for managing and developing AI workloads. NVIDIA positioned the service for the training of large, multi-node models, including the large language models and other generative AI systems that were driving demand for its chips at the time. [1]

## The rent-the-infrastructure model

DGX Cloud inverted NVIDIA's traditional relationship with the cloud. Rather than only selling GPUs to hyperscalers who then resold capacity, NVIDIA placed its own DGX hardware inside partner data centers, wrapped it in NVIDIA software, and sold the bundle directly to enterprises as a first-party service. The pricing model and the customer billing relationship belonged to NVIDIA even when the technology was accessed through a partner's cloud marketplace. [2]

Analysts read the strategy as a deliberate shift toward software and services revenue. NVIDIA was, as one assessment put it, "not rich enough, or dumb enough" to build a cloud to rival Amazon Web Services, Microsoft Azure or Google Cloud, so it used those clouds as a distribution channel while capturing recurring, higher-margin software income on top of hardware sales. The arrangement also let NVIDIA sell the same GPUs twice in economic terms: once to the cloud provider building the data center, and again, through DGX Cloud subscriptions, to the enterprise consuming the capacity. [2]

## Cloud and colocation partners

At launch NVIDIA named a specific set of hosting partners rather than the full roster of major hyperscalers. Oracle Cloud Infrastructure was the first to make DGX Cloud generally available, pairing it with a purpose-built remote direct memory access (RDMA) network, bare-metal compute and high-performance storage that could scale to tens of thousands of GPUs. The colocation provider Equinix also offered the service from its data centers. NVIDIA said Microsoft Azure and Google Cloud would follow. [1][3]

Notably, Amazon Web Services was not among the original DGX Cloud hosts. AWS and NVIDIA announced a DGX Cloud collaboration later, on November 28, 2023, under which AWS would host the service. That deployment was described as the first DGX Cloud built on the GH200 NVL32 multi-node platform based on the NVIDIA [GH200 Grace Hopper Superchip](/wiki/nvidia_gh200), with a single Amazon EC2 instance able to provide up to 20 terabytes of shared memory for terabyte-scale workloads. The same collaboration produced Project Ceiba, a supercomputer for NVIDIA's own research featuring 16,384 GH200 Superchips and rated at 65 exaflops of AI performance. [4] DGX Cloud subsequently became available to purchase through AWS Marketplace, a step covered at AWS re:Invent in 2024. [5]

| Partner | Role | Status / timing |
| --- | --- | --- |
| Oracle Cloud Infrastructure | Hosting cloud | First generally available, March 2023 |
| Equinix | Colocation host | Available at launch, March 2023 |
| Microsoft Azure | Hosting cloud | Announced as coming, 2023 |
| Google Cloud | Hosting cloud | Announced as coming, 2023 |
| Amazon Web Services | Hosting cloud | DGX Cloud collaboration announced November 2023; on AWS Marketplace by 2024 |

## Bundled software stack

A defining feature of DGX Cloud was that the rented hardware shipped with NVIDIA's full enterprise AI software, so customers did not have to assemble their own stack. The bundle centered on the [NVIDIA Base Command Platform](/wiki/nvidia_base_command), which handled cluster management, job scheduling and orchestration of large training runs, and [NVIDIA AI Enterprise](/wiki/nvidia_ai_enterprise), the supported suite of more than a hundred frameworks, libraries and pretrained models, including the [RAPIDS](/wiki/rapids) data-science libraries. [1][3]

Alongside the launch NVIDIA introduced cloud services that ran on this infrastructure, including [NeMo](/wiki/nvidia_nemo) for building and customizing large language models, Picasso for image, video and 3D generation, and BioNeMo for drug discovery, including models of protein sequences. These services let DGX Cloud customers fine-tune and deploy generative AI models without leaving NVIDIA's environment. [1]

Early adopters cited by NVIDIA spanned several industries: the biotechnology firm Amgen for drug discovery, CCC Intelligent Solutions for AI in insurance claims, and ServiceNow for AI research and enterprise software. [1]

## Pricing as reported

NVIDIA set the entry price for DGX Cloud at 36,999 US dollars per instance per month. Each instance comprised eight NVIDIA H100 or A100 80GB Tensor Core GPUs, for a total of 640GB of GPU memory per node, plus the bundled Base Command Platform and AI Enterprise software. [1][3] Independent coverage confirmed the figure and noted the unusual commercial structure: customers paid NVIDIA directly even when they reached the service through a partner cloud's marketplace, which shielded NVIDIA from the margin compression it would have faced as a mere hardware supplier and preserved its pricing power. [2]
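The per-instance figures above reduce to simple arithmetic. The Python sketch below derives the aggregate GPU memory per node and an effective per-GPU price from the reported numbers. The 730-hour month and the function names are assumptions made for illustration. NVIDIA did not publish a per-GPU-hour rate, and the monthly price also covered the bundled software, so the hourly result is only a rough basis for comparison.

```python
# Figures reported in the text above.
PRICE_PER_INSTANCE_MONTH_USD = 36_999
GPUS_PER_INSTANCE = 8
MEMORY_PER_GPU_GB = 80

# Assumption (not from the source): an average month of 730 hours (8,760 / 12),
# with the instance reserved around the clock.
HOURS_PER_MONTH = 730


def instance_memory_gb(gpus: int = GPUS_PER_INSTANCE,
                       memory_per_gpu_gb: int = MEMORY_PER_GPU_GB) -> int:
    """Total GPU memory of one instance, in GB."""
    return gpus * memory_per_gpu_gb


def price_per_gpu_month(price: float = PRICE_PER_INSTANCE_MONTH_USD,
                        gpus: int = GPUS_PER_INSTANCE) -> float:
    """Monthly instance price divided evenly across its GPUs."""
    return price / gpus


def price_per_gpu_hour(price: float = PRICE_PER_INSTANCE_MONTH_USD,
                       gpus: int = GPUS_PER_INSTANCE,
                       hours: int = HOURS_PER_MONTH) -> float:
    """Effective hourly price per GPU under the 730-hour assumption."""
    return price / gpus / hours


if __name__ == "__main__":
    print(f"GPU memory per instance: {instance_memory_gb()} GB")
    print(f"Price per GPU per month: ${price_per_gpu_month():,.2f}")
    print(f"Price per GPU per hour:  ${price_per_gpu_hour():,.2f}")
```

Under these assumptions an instance works out to 640 GB of GPU memory, about 4,625 US dollars per GPU per month, and roughly 6.34 US dollars per GPU per hour.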

## Strategic significance

DGX Cloud crystallized a broader repositioning of NVIDIA from a chip vendor into a full-stack AI platform company that increasingly monetized software and services. By the time of the launch, analysts observed that NVIDIA had more employees working on software than on hardware, and DGX Cloud represented a "cloud-first" turn that let the company sell recurring access to its technology rather than only one-time hardware sales. [2]

The model was also strategically delicate. NVIDIA's hosting partners were simultaneously its largest customers for GPUs and, through their own AI services, its competitors for enterprise AI spending. By owning the DGX Cloud relationship and price while running on partner infrastructure, NVIDIA inserted itself into the value chain without building a hyperscale cloud of its own, capturing software revenue that was independent of who owned the underlying data center. [2] The approach foreshadowed NVIDIA's later expansion into adjacent services and acquisitions, including the cluster-management company [Run:ai](/wiki/run_ai), as it built out the software layer around its accelerated computing.

## Evolution: Serverless Inference and DGX Cloud Lepton

As the AI market shifted from a near-exclusive focus on training toward large-scale inference and agentic applications, DGX Cloud grew into a family of offerings.

At GTC in March 2025 NVIDIA introduced **DGX Cloud Serverless Inference**, built on NVIDIA Cloud Functions. Positioned as a horizontal aggregator, it abstracts the underlying infrastructure across AWS, Azure, Google Cloud, private clouds and on-premises data centers, and provides auto-scaling, global load balancing and multi-cloud deployment for production AI workloads. Developers can deploy models packaged as NVIDIA NIM microservices, custom containers or Helm charts, and the service was opened to independent software vendors and NVIDIA Cloud Partners. [6]

The most significant addition was **DGX Cloud Lepton**, announced on May 18, 2025, at Computex. DGX Cloud Lepton is described as an AI platform with a compute marketplace that connects developers building agentic and physical AI to tens of thousands of GPUs drawn from a global network of cloud providers, through a single unified interface. Developers can either purchase capacity from participating providers or bring their own clusters, and the platform integrates the NVIDIA software stack, including [NIM](/wiki/nvidia_nim) and NeMo microservices, NVIDIA Blueprints and NVIDIA Cloud Functions. [7]

DGX Cloud Lepton grew directly out of NVIDIA's acquisition of **Lepton AI**, a startup that rented NVIDIA GPU servers and built cloud management software. The deal, reported at several hundred million dollars and covering a team of roughly 20 people, closed in April 2025; Lepton AI's co-founders Yangqing Jia, the creator of the Caffe deep-learning framework and a former vice president at Alibaba, and Junjie Bai joined NVIDIA. The acquired technology and team became the foundation for the rebranded marketplace, extending NVIDIA's cloud and software ambitions against the dominance of the major hyperscalers. [8][9]

The initial set of NVIDIA Cloud Partners contributing NVIDIA Blackwell and other NVIDIA architecture GPUs to the DGX Cloud Lepton marketplace included CoreWeave, Crusoe, Firmus, Foxconn, GMI Cloud, Lambda, Nebius, Nscale, SoftBank Corp. and Yotta Data Services, with AWS and Microsoft Azure named as the first large-scale cloud providers to participate. [7] On June 11, 2025, NVIDIA expanded DGX Cloud Lepton in Europe, adding regional providers including Mistral AI, Nebius, Nscale, Firebird, Fluidstack, Hydra Host, Scaleway and Together AI. It also partnered with venture firms such as Accel, Elaia, Partech and Sofinnova Partners to offer eligible startups up to 100,000 US dollars in GPU capacity credits. [10]

| Offering | Introduced | Purpose |
| --- | --- | --- |
| DGX Cloud (clusters) | March 2023 | Rented multi-node DGX clusters for large-scale training |
| DGX Cloud on AWS (GH200 NVL32) | November 2023 | Grace Hopper based instances with large shared memory |
| DGX Cloud Serverless Inference | March 2025 | Multi-cloud, auto-scaling inference via Cloud Functions |
| DGX Cloud Lepton | May 2025 | Global GPU marketplace aggregating many cloud providers |

## See also

- [Nvidia](/wiki/nvidia)
- [NVIDIA AI Enterprise](/wiki/nvidia_ai_enterprise)
- [Lepton AI](/wiki/lepton_ai)
- [DGX Spark](/wiki/dgx_spark)
- [Run:ai](/wiki/run_ai)

## References

[1] NVIDIA. "NVIDIA Launches DGX Cloud, Giving Every Enterprise Instant Access to AI Supercomputer From a Browser." NVIDIA Newsroom, March 21, 2023. https://nvidianews.nvidia.com/news/nvidia-launches-dgx-cloud-giving-every-enterprise-instant-access-to-ai-supercomputer-from-a-browser

[2] Timothy Prickett Morgan. "Nvidia Bends The Clouds To Its Own Financial Will." The Next Platform, March 21, 2023. https://www.nextplatform.com/2023/03/21/nvidia-bends-the-clouds-to-its-own-financial-will/

[3] "NVIDIA DGX Cloud." Oracle Cloud Marketplace listing, Oracle Corporation. https://cloudmarketplace.oracle.com/marketplace/en_US/listing/154619827

[4] NVIDIA. "AWS and NVIDIA Announce Strategic Collaboration to Offer New Supercomputing Infrastructure, Software and Services for Generative AI." NVIDIA Newsroom, November 28, 2023. https://nvidianews.nvidia.com/news/aws-nvidia-strategic-collaboration-for-generative-ai

[5] "Nvidia DGX Cloud now available via AWS." Data Center Dynamics, 2024. https://www.datacenterdynamics.com/en/news/nvidia-dgx-cloud-now-available-via-aws/

[6] "NVIDIA Announces DGX Cloud Serverless Inference at GTC 2025." C# Corner, March 2025. https://www.c-sharpcorner.com/news/nvidia-announces-dgx-cloud-serverless-inference-at-gtc-20252

[7] NVIDIA. "NVIDIA Announces DGX Cloud Lepton to Connect Developers to NVIDIA's Global Compute Ecosystem." NVIDIA Newsroom, May 18, 2025. https://nvidianews.nvidia.com/news/nvidia-announces-dgx-cloud-lepton-to-connect-developers-to-nvidias-global-compute-ecosystem

[8] "NVIDIA acquires Chinese GPU cloud startup Lepton AI: report." TechNode, April 9, 2025. https://technode.com/2025/04/09/nvidia-acquires-chinese-gpu-cloud-startup-lepton-ai-report/

[9] "Nvidia launches 'Lepton' AI platform connecting GPU cloud providers' resources." Data Center Dynamics, May 2025. https://www.datacenterdynamics.com/en/news/nvidia-launches-lepton-ai-platform-connecting-gpu-cloud-providers-resources/

[10] NVIDIA. "NVIDIA DGX Cloud Lepton Connects Europe's Developers to Global NVIDIA Compute Ecosystem." NVIDIA Newsroom, June 11, 2025. https://nvidianews.nvidia.com/news/nvidia-dgx-cloud-lepton-connects-europes-developers-to-global-nvidia-compute-ecosystem

