Cloud computing
Last reviewed
Apr 28, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 · 4,221 words
Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, delivered over the internet without active management by the end user. Instead of owning and operating physical servers, customers rent capacity from a service provider, paying only for what they consume. By 2024 the model had grown into a global market worth roughly 330 billion US dollars per year, with generative AI workloads driving a large share of recent growth.[1]
The canonical technical definition comes from the United States National Institute of Standards and Technology, which in Special Publication 800-145 (Mell and Grance, 2011) describes cloud computing as a model that exhibits five essential characteristics, three service models, and four deployment models.[2] The five characteristics are on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. The three service models are Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The four deployment models are private, public, hybrid, and community cloud.
Cloud computing underpins most modern AI infrastructure. Large language models such as Anthropic's Claude and OpenAI's GPT family are trained and served on hyperscale cloud clusters of GPUs and other AI accelerators; managed services such as Amazon SageMaker, Amazon Bedrock, Azure OpenAI, and Vertex AI abstract away the complexity of provisioning GPUs and TPUs; and GPU-as-a-service "neoclouds" such as CoreWeave, Lambda Labs, and Crusoe have emerged specifically to host AI training and inference.
The NIST definition is the most widely cited reference text for cloud computing in industry, government, and academia.[2] It is short enough to quote in full in procurement documents yet broad enough to cover the range of public, private, and hybrid deployments seen today.
If any one of the five essential characteristics is missing, the system is usually classified as a hosted or managed service rather than true cloud computing.
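The NIST rule of thumb reduces to a checklist. The sketch below is illustrative only. The five characteristic names come from SP 800-145, but the helper `classify_offering` and its set-of-strings input are hypothetical conventions chosen for this example.

```python
# The five essential characteristics listed in NIST SP 800-145.
NIST_CHARACTERISTICS = (
    "on-demand self-service",
    "broad network access",
    "resource pooling",
    "rapid elasticity",
    "measured service",
)


def classify_offering(features: set[str]) -> str:
    """Hypothetical helper: label an offering by which NIST characteristics it exhibits."""
    missing = [c for c in NIST_CHARACTERISTICS if c not in features]
    if not missing:
        return "cloud computing"
    # Rule of thumb: any missing property suggests a hosted or managed service.
    return "hosted or managed service (missing: " + ", ".join(missing) + ")"


if __name__ == "__main__":
    # A rented server with network access and metered billing, but manual provisioning
    # and fixed capacity.
    vps = {"broad network access", "resource pooling", "measured service"}
    print(classify_offering(vps))
```

In practice, classification depends on how an offering is actually operated, not only on a feature list, so treat the output as a starting point for discussion.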
Industry practice extends these categories with Function as a Service (FaaS), Container as a Service (CaaS), Database as a Service (DBaaS), and Backend as a Service (BaaS), each of which fits within or between the three NIST categories.
The conceptual roots of cloud computing reach back to the time-sharing era of the 1960s, when systems such as MIT's Compatible Time-Sharing System (CTSS, 1961) and the multi-organization MULTICS project demonstrated that a single mainframe could serve many remote users at once.[3] In a 1961 talk at MIT's centennial, John McCarthy anticipated the commercial form of this idea, predicting that "computing may someday be organized as a public utility," sold by the cycle the way electricity was sold by the kilowatt-hour.[4] In the following decades, IBM and DEC commercialized time-sharing, and remote application hosting matured into the application service provider model of the late 1990s.
The phrase "cloud computing" appears in internal Compaq business plans dated 1996 and was used publicly in a 1997 INFORMS keynote by University of Texas professor Ramnath Chellappa, who described it as "a new computing paradigm where the boundaries of computing will be determined by economic rationale rather than technical limits alone."[5] The term entered mainstream business vocabulary in August 2006 when Google chief executive Eric Schmidt used it at a Search Engine Strategies conference to describe the data-services-and-architecture model that Google operated.[6]
The modern public cloud era is generally dated to 2006. Salesforce.com, founded in 1999 by Marc Benioff and Parker Harris, had already popularized the multi-tenant SaaS delivery model with its slogan "No software," but Salesforce was a single application. The launch of Amazon Web Services reframed the cloud as a set of general-purpose primitives. Amazon S3 launched on March 14, 2006 as the first publicly available AWS service, followed by EC2 in beta on August 25, 2006.[7][8] These two services, object storage and rentable virtual machines billed by the hour, became the template for almost every IaaS provider that followed. Google App Engine launched in preview on April 7, 2008 as a PaaS for Python web applications. Microsoft announced Windows Azure (later renamed Microsoft Azure) at its Professional Developers Conference on October 27, 2008, and the service became commercially available on February 1, 2010.[9] Google's broader IaaS and developer offerings were unified under the Google Cloud Platform brand from 2011 to 2012.
A second wave of cloud platforms grew out of open-source infrastructure software. Docker was first released in March 2013, popularizing application containers. Kubernetes, a container orchestrator inspired by Google's internal Borg cluster manager, was open-sourced in June 2014; when version 1.0 shipped in July 2015, Google donated it to the newly created Cloud Native Computing Foundation. AWS Lambda was announced on November 13, 2014 at AWS re:Invent, ushering in the serverless era.[10] Demand accelerated again from 2022 onward as generative AI workloads consumed enormous amounts of GPU capacity, and Synergy Research Group reported that the global cloud infrastructure market reached 330 billion US dollars in 2024, up roughly 60 billion year over year, with GenAI-related services driving about half of that growth.[1]
The boundary between IaaS, PaaS, and SaaS is often blurry, since most providers sell products from all three categories.
| Layer | Customer manages | Provider manages | AWS | Azure | Google Cloud |
|---|---|---|---|---|---|
| SaaS | Data, configuration | Application, runtime, OS, virtualization, hardware | WorkMail | Microsoft 365 | Google Workspace |
| FaaS / Serverless | Function code, triggers | Runtime, OS, scaling, hardware | AWS Lambda | Azure Functions | Cloud Functions |
| PaaS | Application code, data | Runtime, OS, scaling, hardware | Elastic Beanstalk | App Service | App Engine |
| CaaS | Container images, manifests | Orchestrator, OS, hardware | ECS, EKS, Fargate | AKS | GKE |
| IaaS | OS, runtime, code, data | Virtualization, hardware, network | EC2, S3, VPC | VMs, Blob Storage | Compute Engine, Cloud Storage |
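To make the table queryable, its rows can be transcribed into a lookup structure. This is a sketch. The dictionary mirrors the table's wording (lower-cased), and `who_manages` is a hypothetical helper, not part of any provider SDK.

```python
# Responsibility split per service layer, transcribed from the table above.
RESPONSIBILITIES = {
    "SaaS": {"customer": ["data", "configuration"],
             "provider": ["application", "runtime", "os", "virtualization", "hardware"]},
    "FaaS": {"customer": ["function code", "triggers"],
             "provider": ["runtime", "os", "scaling", "hardware"]},
    "PaaS": {"customer": ["application code", "data"],
             "provider": ["runtime", "os", "scaling", "hardware"]},
    "CaaS": {"customer": ["container images", "manifests"],
             "provider": ["orchestrator", "os", "hardware"]},
    "IaaS": {"customer": ["os", "runtime", "code", "data"],
             "provider": ["virtualization", "hardware", "network"]},
}


def who_manages(layer: str, component: str) -> str:
    """Return 'customer', 'provider', or 'unspecified' for a component at a given layer."""
    split = RESPONSIBILITIES[layer]
    component = component.lower()
    for party in ("customer", "provider"):
        if component in split[party]:
            return party
    # The table does not list every component for every layer.
    return "unspecified"


if __name__ == "__main__":
    print(who_manages("IaaS", "OS"), who_manages("PaaS", "OS"))
```

The same component can change hands between layers. The operating system, for example, belongs to the customer on IaaS but to the provider on PaaS.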
A typical AI startup in 2025 might run inference on a managed model API (SaaS), package agent logic on serverless functions (FaaS), keep training data in object storage (IaaS), and use a managed vector database (DBaaS).
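On the FaaS layer, the customer supplies only a function, and the platform invokes it once per event. The sketch below follows AWS Lambda's Python handler convention, `handler(event, context)`. It returns a dictionary in the `statusCode`/`headers`/`body` shape that an API Gateway proxy integration expects. The `prompt` field and the `route_request` logic are hypothetical stand-ins for agent logic, and no model API is called.

```python
import json


def route_request(prompt: str) -> str:
    """Hypothetical agent step: choose a downstream tool by keyword."""
    return "search" if "find" in prompt.lower() else "answer"


def lambda_handler(event, context):
    # With an API Gateway proxy integration, the HTTP body arrives as a JSON string (or None).
    payload = json.loads(event.get("body") or "{}")
    prompt = payload.get("prompt", "")
    if not prompt:
        return {"statusCode": 400, "body": json.dumps({"error": "missing prompt"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"route": route_request(prompt)}),
    }


if __name__ == "__main__":
    # Local test with a hand-built event; the platform supplies a real context object.
    print(lambda_handler({"body": json.dumps({"prompt": "Find pricing docs"})}, None))
```

Because the handler is a plain function, it can be tested locally like this before deployment. The runtime, scaling, and hardware remain the provider's responsibility.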
Most public-facing internet services run on public cloud. Banks, defense agencies, and large healthcare providers often operate private cloud environments, sometimes on hardware in their own data centers and sometimes on dedicated hardware inside a public cloud region. Hybrid cloud is common for enterprises with regulated data: workloads with low latency or strict residency requirements stay on premises while elastic workloads burst to the public cloud. Community cloud examples include AWS GovCloud (US) for federal, state, and local government agencies, and consortium clouds operated by European banks. Multi-cloud, where an organization deliberately uses two or more public clouds, is now a common procurement strategy used to avoid vendor lock-in and to access provider-specific services such as Google's TPUs or Azure's OpenAI integration.
Three hyperscalers dominate the public IaaS and PaaS market. According to Synergy Research Group's Q4 2024 report, AWS held roughly 30 percent global market share, Microsoft Azure 21 percent, and Google Cloud 12 percent, for a combined 63 percent of a market growing at 22 percent year over year.[1] Alibaba Cloud leads China and ranks fourth globally; Oracle Cloud and IBM Cloud round out the top six.
| Provider | Founded / launched | HQ | Primary services |
|---|---|---|---|
| Amazon Web Services (AWS) | 2006 (S3, EC2) | Seattle, US | EC2, S3, Lambda, SageMaker, Bedrock |
| Microsoft Azure | 2010 | Redmond, US | VMs, Blob Storage, Azure ML, Azure OpenAI |
| Google Cloud Platform | 2008 / 2011 rebrand | Mountain View, US | Compute Engine, BigQuery, Vertex AI, TPUs |
| Alibaba Cloud | 2009 | Hangzhou, China | ECS, OSS, MaxCompute, PAI |
| Oracle Cloud Infrastructure | 2016 (Gen 2) | Austin, US | OCI Compute, Autonomous Database, GPU clusters |
| IBM Cloud | 2011 (SmartCloud) | Armonk, US | Cloud VPC, watsonx, Power Systems |
| Tencent Cloud | 2010 | Shenzhen, China | CVM, COS, TI Platform |
| Salesforce | 1999 | San Francisco, US | Sales Cloud, Service Cloud, Einstein |
| CoreWeave | 2017 | Roseland, US | GPU IaaS for AI training and inference |
| Lambda Labs | 2012 | San Francisco, US | GPU cloud, H100 / H200 instances |
| Crusoe Energy | 2018 | Denver, US | Stranded-gas-powered AI data centers and GPU cloud |
A newer category of providers, sometimes called "neoclouds," specializes in renting GPU capacity for AI workloads. CoreWeave, Lambda Labs, Crusoe, Together AI, and Nebius all operate clusters of NVIDIA accelerators sold by the hour or as reserved capacity. By Q4 2024, Synergy Research Group reported that Oracle and these specialized AI clouds had begun to inch market share away from the big three, though their share remained in low single digits.[1]
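Whether hourly or reserved GPU capacity is cheaper depends mostly on utilization. The arithmetic below uses hypothetical prices, not quotes from any provider. Reserved capacity is paid for every hour of the term, so it wins once expected usage exceeds the ratio of the reserved rate to the on-demand rate. The 730-hour month is a common billing approximation (8,760 hours divided by 12).

```python
HOURS_PER_MONTH = 730  # common billing approximation: 8,760 hours / 12


def monthly_cost_on_demand(rate_per_hour: float, hours_used: float) -> float:
    # On-demand capacity is billed only for the hours actually used.
    return rate_per_hour * hours_used


def monthly_cost_reserved(rate_per_hour: float) -> float:
    # Reserved capacity is billed for every hour, whether or not it is used.
    return rate_per_hour * HOURS_PER_MONTH


def break_even_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Fraction of the month above which the reservation is cheaper."""
    return reserved_rate / on_demand_rate


# Hypothetical 8-GPU node prices, for illustration only.
ON_DEMAND = 32.0  # USD per node-hour
RESERVED = 20.0   # USD per node-hour, effective rate under a commitment

if __name__ == "__main__":
    u = break_even_utilization(ON_DEMAND, RESERVED)
    print(f"Reserve if expected utilization exceeds {u:.1%}")  # 62.5%
```

Training clusters that run around the clock sit well above such a threshold. Bursty inference or experimentation often sits below it, which helps explain why neoclouds sell both models.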
Cloud platforms now ship comprehensive stacks for training and serving AI models, spanning four layers: raw GPU and accelerator infrastructure, managed training and tuning, model hosting and inference APIs, and higher-level agents and applications.
AWS launched Amazon SageMaker at re:Invent 2017 as a unified platform for building, training, and deploying machine learning models, with managed Jupyter notebooks, distributed training, hyperparameter tuning, model registry, and autoscaling inference endpoints. For foundation-model workloads, AWS introduced Amazon Bedrock, announced April 13, 2023 and generally available on September 28, 2023.[11] Bedrock provides a single API for invoking models from Anthropic (Claude), AI21 Labs, Cohere, Meta (Llama), Stability AI, Mistral, and Amazon's own Titan and Nova families, plus fine-tuning, knowledge bases for retrieval-augmented generation, agents, and guardrails. At the hardware tier, AWS offers EC2 P5 instances with eight NVIDIA H100 GPUs and newer P5e instances with NVIDIA H200 GPUs, alongside custom Trainium training chips and Inferentia inference chips made by Amazon's Annapurna Labs subsidiary. Anthropic is a major Bedrock partner: Amazon invested up to 4 billion US dollars in Anthropic in September 2023 and an additional 4 billion in November 2024, with AWS named as Anthropic's primary cloud and training partner.
Microsoft's AI portfolio centers on the Azure OpenAI Service, which became generally available on January 17, 2023, giving enterprise customers access to OpenAI's GPT, DALL-E, and embedding models with Azure compliance and virtual-network isolation.[12] Microsoft has invested an estimated 13 billion US dollars in OpenAI since 2019 and runs much of OpenAI's training and inference on dedicated Azure capacity. Azure Machine Learning offers a similar workflow to SageMaker, and Azure ND-series VMs host NVIDIA H100 and H200 GPUs in 8-GPU configurations connected by NVLink and InfiniBand. In late 2024 Microsoft consolidated its agent and model-customization tools under the Azure AI Foundry brand.
Google's flagship managed AI service is Vertex AI, launched at Google I/O in May 2021. Vertex AI integrates AutoML, custom training, model registry, and a Model Garden with first-party models such as Gemini and Imagen alongside open models such as Llama, Mistral, and Anthropic's Claude. Google Cloud is the only hyperscaler that offers Tensor Processing Units, Google's custom AI accelerators. Public TPU generations include TPU v5e (efficiency-optimized), TPU v5p (announced December 2023), and TPU v6 "Trillium" (announced May 2024). Google bundles TPUs, GPUs, fast networking, storage, and a unified scheduler under the AI Hypercomputer brand. Anthropic also runs Claude on Vertex AI, and Google has committed billions of US dollars of investment to Anthropic since 2023.
NVIDIA's DGX Cloud, announced in March 2023, is a multi-cloud AI training service that offers reserved DGX nodes hosted inside Oracle, Microsoft, Google, and AWS data centers under a single NVIDIA-managed software stack. Neoclouds occupy the middle ground between hyperscalers and on-premises hardware: CoreWeave, founded in 2017 as a cryptocurrency mining business, pivoted to GPU rental and by 2024 operated more than 250,000 NVIDIA GPUs across about 30 data centers, becoming a key training partner for Microsoft and OpenAI workloads. Lambda Labs sells on-demand and reserved H100 / H200 capacity to AI startups and researchers; Crusoe Energy operates AI-optimized data centers powered partly by stranded natural gas; Together AI and Nebius mix model APIs with bare-GPU rental. OpenAI's compute strategy expanded beyond Azure during 2024 and 2025, with reported multi-year capacity agreements with Oracle Cloud and CoreWeave to supplement Microsoft Azure capacity.
Virtualization lets multiple virtual machines share a single physical server. Public clouds use a mix of open-source and proprietary hypervisors: Xen powered the original EC2 fleet from 2006, and AWS later migrated most instance families to its custom Nitro hypervisor, which offloads networking, storage, and security to dedicated hardware cards. Azure runs primarily on Hyper-V; Google Cloud uses a modified KVM. Container runtimes such as containerd and gVisor add lighter-weight isolation above VMs. Multi-tenancy means a single instance of infrastructure or software serves many customers with logical isolation: SaaS applications usually share a single database with tenant identifiers on each row, while IaaS providers separate tenants at the hypervisor or dedicated-host level.
Most public clouds organize physical capacity into regions, each composed of multiple availability zones. AWS popularized this terminology: a region is a geographic area such as US East (N. Virginia) or Europe (Ireland), identified in APIs as us-east-1 and eu-west-1, and an availability zone is one or more discrete data centers with independent power, cooling, and networking. As of 2025, AWS operates more than 33 regions and over 100 availability zones; Azure and Google Cloud each operate more than 30 regions. Cloud storage splits into three tiers: object storage (S3, Azure Blob, Google Cloud Storage) for opaque binary objects accessed via HTTP, block storage (EBS, Azure Disks, Persistent Disk) for raw volumes that look like local disks, and file storage (EFS, Azure Files, Filestore) for shared file systems via NFS or SMB. Customer networks are carved out as Virtual Private Clouds (VPCs) with customer-controlled IP ranges, route tables, and firewall rules, connected to corporate networks via AWS Direct Connect or Azure ExpressRoute. CDNs such as AWS CloudFront, Google Cloud CDN, Azure Front Door, Cloudflare, and Fastly cache content at hundreds of edge locations.
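Carving a VPC's customer-controlled IP range into one subnet per availability zone is a routine planning step. The sketch below uses Python's standard `ipaddress` module. The function name, the CIDR block, the prefix length, and the zone names are illustrative choices, not provider defaults.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, zones: list[str], new_prefix: int) -> dict[str, str]:
    """Give each availability zone one equally sized, non-overlapping subnet of the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    available = 2 ** (new_prefix - vpc.prefixlen)  # number of /new_prefix blocks in the VPC
    if len(zones) > available:
        raise ValueError(f"{vpc_cidr} holds only {available} /{new_prefix} subnets")
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator, lowest addresses first
    return {zone: str(next(blocks)) for zone in zones}


if __name__ == "__main__":
    # Example zone names; the CIDR and prefix length are illustrative.
    plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"], 20)
    for zone, cidr in plan.items():
        print(zone, cidr)  # 10.0.0.0/20, 10.0.16.0/20, 10.0.32.0/20
```

A /20 holds 4,096 addresses, so a /16 VPC leaves room for 16 such subnets. That headroom allows more zones or separate public and private tiers to be added later. Providers reserve a few addresses in each subnet (AWS reserves five), so usable capacity is slightly lower.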
Identity, authorization, and audit are handled by AWS IAM, Azure Active Directory (renamed Microsoft Entra ID in 2023), and Google Cloud IAM. These services support fine-grained policy languages, role assumption, federation via SAML or OIDC, and short-lived credentials. Misconfigured IAM policies remain among the most common causes of cloud data leaks.
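The core of IAM policy evaluation can be sketched in a few lines. This is a deliberately simplified model of AWS's documented logic: an explicit Deny overrides any Allow, and anything not explicitly allowed is implicitly denied. It ignores conditions, `NotAction`, permission boundaries, session policies, and organization-level controls. It also matches patterns case-sensitively. The bucket names are hypothetical.

```python
from fnmatch import fnmatchcase


def statement_matches(stmt: dict, action: str, resource: str) -> bool:
    # Action and Resource may be a string or a list, and may use '*' wildcards.
    actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
    resources = stmt["Resource"] if isinstance(stmt["Resource"], list) else [stmt["Resource"]]
    return any(fnmatchcase(action, a) for a in actions) and any(
        fnmatchcase(resource, r) for r in resources
    )


def is_allowed(policies: list[dict], action: str, resource: str) -> bool:
    """Simplified evaluation: explicit Deny wins, then explicit Allow, else implicit deny."""
    matched = [s for p in policies for s in p["Statement"] if statement_matches(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in matched):
        return False
    return any(s["Effect"] == "Allow" for s in matched)


# Hypothetical policy: read access to a reports bucket, except its secret/ prefix.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::reports/*"},
        {"Effect": "Deny", "Action": "s3:*", "Resource": "arn:aws:s3:::reports/secret/*"},
    ],
}

if __name__ == "__main__":
    print(is_allowed([POLICY], "s3:GetObject", "arn:aws:s3:::reports/q1.csv"))         # True
    print(is_allowed([POLICY], "s3:GetObject", "arn:aws:s3:::reports/secret/k.txt"))  # False
    print(is_allowed([POLICY], "s3:PutObject", "arn:aws:s3:::reports/q1.csv"))         # False
```

The third case shows why misconfigurations tend to fail open rather than closed. Nothing blocks a write here except the absence of an Allow, so one overly broad wildcard such as `"Action": "s3:*"` in an Allow statement would quietly grant it.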
Cloud pricing is one of the most-debated topics in enterprise IT. Most providers offer a menu of pricing modes for the same underlying resources:
Cloud bills also include less-visible cost categories. Egress fees, charged for data leaving the provider's network, have been criticized as a hidden lock-in mechanism: at list prices, moving data out of a public cloud can cost tens of thousands of dollars per petabyte. The European Union's Data Act, which entered into force on January 11, 2024, requires cloud providers to make customer switching easier and to phase out egress fees; Google Cloud, AWS, and Azure each announced free egress for customers leaving the platform during 2024 in response.[13] The FinOps movement, formalized by the founding of the FinOps Foundation in 2019 and its move to the Linux Foundation in 2020, defines a discipline for managing cloud cost and value across engineering, finance, and product teams. By 2024 over 90 percent of large enterprises with significant cloud spend reported having a dedicated FinOps function, according to the FinOps Foundation's annual State of FinOps survey.[14] Microsoft, Alphabet, Amazon, and Meta together announced more than 200 billion US dollars of planned 2024 capital spending on data centers and AI infrastructure, with higher figures projected for 2025; this AI capex has become large enough to show up in macroeconomic statistics on US business investment.[15]
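Egress is usually priced in volume tiers, so the total is computed like a marginal tax schedule: each slice of the transfer is charged at the rate of the tier it falls into. The price sheet below is hypothetical, for illustration only, and treats 1 TB as 1,000 GB. Consult a provider's current pricing page for real rates.

```python
# Hypothetical tiered egress price sheet: (tier size in GB, USD per GB).
# None marks the final, unbounded tier. 1 TB is treated as 1,000 GB here.
TIERS = [
    (10_000, 0.09),
    (40_000, 0.085),
    (100_000, 0.07),
    (None, 0.05),
]


def egress_cost(gb: float, tiers=TIERS) -> float:
    """Charge each slice of the transfer at its own tier's rate; return USD rounded to cents."""
    remaining, total = gb, 0.0
    for size, rate in tiers:
        if remaining <= 0:
            break
        chunk = remaining if size is None else min(remaining, size)
        total += chunk * rate
        remaining -= chunk
    return round(total, 2)


if __name__ == "__main__":
    print(egress_cost(1_000_000))  # one petabyte under the hypothetical tiers
```

Under these assumed rates, one petabyte comes to about 53,800 USD. Most of that comes from the bottom tier, so the per-gigabyte rate for the largest volumes dominates the cost of a full migration.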
Cloud security is governed by the shared responsibility model, popularized by AWS but adopted in some form by every major provider. The provider is responsible for the security "of" the cloud (physical data centers, hardware, hypervisor, network fabric, managed-service control planes), while the customer is responsible for security "in" the cloud (operating systems, application code, data classification, IAM and network configuration). Most major cloud breaches have stemmed from customer-side misconfiguration rather than provider compromise.
Providers pursue extensive compliance attestations: common frameworks include SOC 2 Type II, ISO/IEC 27001, ISO/IEC 27017, ISO/IEC 27018, PCI DSS, HIPAA, FedRAMP (Moderate and High), the UK's Cyber Essentials Plus, and Australia's IRAP. The European Union's GDPR imposes additional data-residency and data-subject-rights obligations. Encryption at rest is on by default for most cloud storage; encryption in transit is generally TLS 1.2 or 1.3. Customers often layer customer-managed keys on top using AWS KMS, Azure Key Vault, or Google Cloud KMS, sometimes with hardware security module backing.
A newer category of "sovereign cloud" offerings addresses concerns among European, Middle Eastern, and Asian governments that data stored in US-owned clouds might fall under US law, in particular the CLOUD Act of 2018, which authorizes US authorities to compel disclosure of data held by US companies even when stored abroad. Microsoft Cloud for Sovereignty, Google Sovereign Cloud, AWS European Sovereign Cloud (announced October 2023, scheduled to launch in 2025), and partnerships such as Bleu, a joint venture of Capgemini and Orange that offers Microsoft cloud services to the French market, are direct responses.
Major cloud-related security and reliability incidents include the 2019 Capital One breach (disclosed in July 2019), in which a former AWS engineer exploited a server-side request forgery (SSRF) flaw in a misconfigured web application firewall on AWS; the SolarWinds supply-chain compromise disclosed in December 2020, in which trojanized Orion software updates gave attackers a foothold that they later used to reach some victims' Microsoft 365 and Azure AD environments; and the July 19, 2024 CrowdStrike Falcon sensor incident, in which a faulty content update sent millions of Windows hosts into boot loops, disrupting airlines, banks, and hospitals worldwide.
Hybrid and multi-cloud. Tools such as AWS Outposts, Azure Stack, Azure Arc, and Google Anthos bring cloud control planes onto customer premises. Service meshes (Istio, Linkerd) and platform-engineering tools (Backstage, Crossplane) help organizations operate workloads across multiple clouds and on-premises environments without rewriting them for each platform.
Edge computing. Edge computing extends cloud capabilities closer to data sources and users to reduce latency. Cloudflare Workers (2017), AWS Wavelength (2020), AWS Local Zones, Azure Edge Zones, and Google Distributed Cloud Edge target real-time gaming, vehicle workloads, low-latency 5G applications, and on-premises industrial AI inference.
Sustainability. Hyperscale data centers consume large amounts of electricity, and providers have responded with climate commitments. Microsoft pledged in January 2020 to be carbon-negative by 2030 and remove its historical emissions by 2050. Google announced in 2020 a goal of running on 24/7 carbon-free energy in all data centers by 2030. Amazon's Climate Pledge, signed in 2019, aims for net-zero by 2040, supported by power purchase agreements that made Amazon the world's largest corporate buyer of renewable electricity from 2020 through 2024 according to BloombergNEF. Critics note that GenAI workloads are pushing electricity demand and water usage upward at rates that strain these commitments.
AI-driven demand surge. Synergy Research Group estimated that GenAI accounted for about half of all cloud market growth in 2024.[1] This has driven hyperscaler capex to record highs, contributed to long lead times for NVIDIA H100 and H200 GPUs, and made power and grid interconnection capacity the primary bottleneck to data-center expansion in many regions. Infrastructure-as-code tools such as HashiCorp Terraform (2014), Pulumi (2018), and AWS CloudFormation are standard practice, and cloud configurations are increasingly generated and reviewed by AI assistants.
Vendor lock-in and egress fees. Migrating from one cloud to another involves rewriting infrastructure-as-code templates, IAM policies, and managed-service integrations; the deeper a customer adopts proprietary services such as DynamoDB, BigQuery, or Cosmos DB, the higher the switching cost. Charging customers to extract their own data has long been criticized as anti-competitive, though the 2024 EU Data Act has begun to constrain this practice.
Outages and concentration risk. Because a few hyperscalers host a large fraction of internet services, single-region failures cause cascading effects. Notable examples include the AWS US-East-1 outages of February 28, 2017 (an S3 typo took down portions of the public web) and December 7, 2021, Azure outages in September 2018 and July 2024, and a June 2019 Google Cloud networking outage. UK and EU regulators opened formal market studies of hyperscaler concentration in 2023 and 2024.
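The benefit of spreading a workload across zones or regions is often estimated with a simple redundancy formula, A = 1 − (1 − a)^n, for n replicas that each have availability a. The sketch below assumes replica failures are independent and that any single healthy replica can serve all traffic with instant failover. Region-wide events like those listed above are correlated failures that break this assumption, so the result is an optimistic estimate, not a forecast. The 99.5 percent per-zone figure is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (non-leap year)


def composite_availability(per_replica: float, replicas: int) -> float:
    """Probability that at least one of N replicas is up, assuming independent failures."""
    return 1 - (1 - per_replica) ** replicas


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical 99.5% availability per zone.
    for n in (1, 2, 3):
        a = composite_availability(0.995, n)
        print(n, f"{a:.9f}", f"{downtime_minutes_per_year(a):.2f} min/yr")
```

With one zone, the estimate is about 2,628 minutes (roughly 44 hours) of downtime a year. With three independent zones, it drops below a minute. The gap between that figure and real outage histories is a measure of how correlated cloud failures actually are.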
Privacy and government access. The CLOUD Act and analogous laws have created concerns about cross-border data access. The Court of Justice of the European Union's 2020 Schrems II ruling invalidated the EU-US Privacy Shield, leading to years of legal uncertainty for transatlantic data transfers and contributing to the rise of sovereign cloud offerings.
Over-provisioning, waste, and workforce shifts. Industry surveys including Flexera's annual State of the Cloud have estimated that 25 to 35 percent of cloud spend is wasted on idle or oversized resources, one motivator for the FinOps movement. Critics also argue that the AI capex surge undermines hyperscalers' climate commitments, that water consumption for evaporative cooling at large data centers can be locally significant in arid regions, and that traditional system-administration roles have shrunk in favor of platform-engineering, site-reliability-engineering, and FinOps roles. Some enterprises characterize "lift and shift" cloud migrations as exercises that move costs from capital to operating expense without delivering proportionate productivity gains.
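A first pass at finding idle or oversized resources is often a utilization scan. The sketch below flags instances whose average CPU is near zero (idle) or whose peak never approaches capacity (oversized). The thresholds, instance names, and costs are hypothetical. CPU alone is a crude signal, since memory, network, or licensing constraints can justify a large instance.

```python
from statistics import mean

# Hypothetical thresholds for illustration; real FinOps teams tune these per workload.
IDLE_AVG_CPU = 5.0         # percent: average CPU below this counts as idle
OVERSIZED_PEAK_CPU = 20.0  # percent: peak CPU below this suggests a smaller size would do


def classify(cpu_samples: list[float]) -> str:
    """Label an instance from its CPU utilization samples (percent)."""
    if mean(cpu_samples) < IDLE_AVG_CPU:
        return "idle"
    if max(cpu_samples) < OVERSIZED_PEAK_CPU:
        return "oversized"
    return "ok"


def flagged_share(fleet: dict[str, tuple[float, list[float]]]) -> float:
    """Fraction of monthly spend on instances flagged idle or oversized."""
    total = sum(cost for cost, _ in fleet.values())
    flagged = sum(cost for cost, samples in fleet.values() if classify(samples) != "ok")
    return flagged / total


# Hypothetical fleet: name -> (monthly cost in USD, CPU samples in percent).
FLEET = {
    "web-1": (700.0, [45.0, 60.0, 70.0]),
    "batch-7": (100.0, [1.0, 2.0, 1.0]),
    "db-2": (200.0, [8.0, 12.0, 15.0]),
}

if __name__ == "__main__":
    for name, (_, samples) in FLEET.items():
        print(name, classify(samples))
    print(f"flagged share of spend: {flagged_share(FLEET):.0%}")
```

Flagged spend is a list to review, not guaranteed savings. An oversized instance can usually be downsized rather than removed, so realized savings are typically smaller than the flagged share.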