# AI Infrastructure

> Source: https://aiwiki.ai/wiki/ai_infrastructure
> Updated: 2026-06-23
> Categories: AI Infrastructure
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**AI infrastructure** is the stack of hardware, facilities, and software that trains and serves artificial intelligence models: accelerator chips, the servers and networks that bind them into clusters, the [data centers](/wiki/data_center) and power supplies that house and feed them, the orchestration and inference software that keeps them utilized, and the cloud platforms that sell the resulting capacity. Between 2024 and 2026 it became one of the largest capital programs in business history. The four largest cloud companies spent close to $400 billion on capital projects in 2025 and announced 2026 plans approaching $700 billion [1][2], while [NVIDIA](/wiki/nvidia), the dominant accelerator vendor, booked $193.7 billion of data center revenue in its fiscal year ending January 25, 2026 [3]. "Blackwell sales are off the charts, and cloud GPUs are sold out," NVIDIA chief executive Jensen Huang said in November 2025, summarizing demand that ran ahead of supply across the entire stack [31].

## Overview

AI workloads split into training, which concentrates enormous compute on a single job for weeks or months, and inference, which serves billions of smaller requests with strict latency targets. Training pushed the industry toward ever-larger synchronized clusters; the shift to reasoning models and AI agents in 2025 and 2026 made inference capacity the faster-growing demand. Both run on the same layered stack, and each layer has its own bottleneck: memory and packaging at the silicon layer, interconnect bandwidth at the systems layer, electricity at the facilities layer, and capital itself at the platform layer.

| Layer | Role | Representative technologies and providers |
|---|---|---|
| Silicon | Accelerators, memory, fabrication | NVIDIA and [AMD](/wiki/amd) GPUs, [TPUs](/wiki/tpu), [Trainium](/wiki/trainium), [HBM](/wiki/high_bandwidth_memory), [TSMC](/wiki/tsmc) fabs and [CoWoS](/wiki/cowos) packaging |
| Systems and networking | Racks, scale-up and scale-out fabrics, cooling | [NVLink](/wiki/nvlink)/[NVSwitch](/wiki/nvswitch), [InfiniBand](/wiki/infiniband), [Spectrum-X](/wiki/spectrum_x) Ethernet, liquid-cooled racks |
| Facilities and power | Buildings, grid connections, generation | Gigawatt campuses, substations, nuclear PPAs, on-site gas turbines |
| Software | Drivers, orchestration, training and serving engines | [CUDA](/wiki/cuda), [PyTorch](/wiki/pytorch)/[JAX](/wiki/jax), [Kubernetes](/wiki/kubernetes)/[Slurm](/wiki/slurm), [vLLM](/wiki/vllm), [TensorRT-LLM](/wiki/tensorrt_llm) |
| Platforms | Selling compute as a service | AWS, Azure, Google Cloud, [Oracle](/wiki/oracle) OCI, [neoclouds](/wiki/neocloud) such as [CoreWeave](/wiki/coreweave) and [Nebius](/wiki/nebius) |

## What is the compute layer?

NVIDIA [GPUs](/wiki/gpu) anchor the market. The [H100](/wiki/h100) generation defined the 2023 to 2024 shortage era; the [Blackwell](/wiki/blackwell) generation ramped through 2025; and the successor Vera Rubin platform is slated for late 2026. NVIDIA's fiscal 2026 results (year ending January 25, 2026) showed total revenue of $215.9 billion, up 65 percent, with data center revenue of $193.7 billion, up 68 percent [3]. AMD is the principal merchant rival: its Instinct MI300X and MI355X parts won deployments at Microsoft, Meta, and OpenAI, and its rack-scale [Helios](/wiki/amd_helios_rack) system, built around the MI450 series, ships in 2026 under a 6 gigawatt supply agreement with [OpenAI](/wiki/openai) [4].

Hyperscalers also build their own silicon to cut costs and reduce NVIDIA dependence. [Google](/wiki/google) deploys TPUs, including the seventh-generation Ironwood offered from late 2025; [Anthropic](/wiki/anthropic) contracted in October 2025 for access to up to one million TPUs, with more than a gigawatt of capacity due in 2026 [5]. Amazon's Trainium2 powers [Project Rainier](/wiki/project_rainier), a multi-site cluster of hundreds of thousands of chips built largely for Anthropic. [Microsoft](/wiki/microsoft) (Maia) and [Meta](/wiki/meta) (MTIA) field their own parts, and OpenAI announced a partnership with [Broadcom](/wiki/broadcom) in October 2025 to co-develop 10 gigawatts of custom accelerators.

Upstream, supply is concentrated. TSMC fabricates nearly all leading-edge accelerators, and its CoWoS advanced packaging was the binding constraint on GPU output from 2023 onward; capacity is being expanded from roughly 35,000 wafers per month in late 2024 toward a planned 130,000 by the end of 2026, with NVIDIA reportedly taking more than 60 percent of it [6]. [High-bandwidth memory](/wiki/high_bandwidth_memory) is a three-supplier market in which [SK Hynix](/wiki/sk_hynix) held 57 to 62 percent of revenue across 2025, ahead of Samsung and Micron, with the contest shifting to HBM4 in 2026 [7][8].
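
The scale of the packaging ramp is easier to judge as a growth rate. The sketch below uses only the wafer figures quoted above; treating "late 2024" to "end of 2026" as exactly two years, and applying NVIDIA's reported share to the planned capacity, are simplifying assumptions made here, not figures from the source.

```python
# CoWoS capacity figures quoted in the text (wafers per month).
start_wpm = 35_000    # roughly, late 2024
target_wpm = 130_000  # planned, end of 2026

# Assumption: the ramp spans about two years.
years = 2.0

growth_multiple = target_wpm / start_wpm
annual_growth = growth_multiple ** (1 / years) - 1

# Assumption: "more than 60 percent" applied to the planned capacity (lower bound).
nvidia_floor_wpm = 0.60 * target_wpm

print(f"Capacity multiple: {growth_multiple:.2f}x")
print(f"Implied annual growth: {annual_growth:.0%}")
print(f"NVIDIA share, lower bound: {nvidia_floor_wpm:,.0f} wafers per month")
```

Under these assumptions, capacity grows about 3.7 times, which works out to an implied annual growth rate above 90 percent.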

## How are AI clusters networked?

Clusters are wired at two scales. Scale-up fabrics make a rack behave like one giant accelerator: NVIDIA's [GB200 NVL72](/wiki/gb200_nvl72) connects 72 Blackwell GPUs and 36 Grace CPUs over fifth-generation NVLink at 1.8 TB/s per GPU, presenting the rack as a single 72-GPU domain [9]. Such racks draw on the order of 120 kW, roughly ten times a conventional rack, which forced a rapid industry shift to direct-to-chip liquid cooling. An open alternative, the Ultra Accelerator Link (UALink) consortium, published its 1.0 specification in April 2025.
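
A back-of-the-envelope calculation shows what the rack-level figures imply. The sketch uses only the numbers quoted above; the per-GPU power figure is a rough average that also covers CPUs, switches, and other rack components, so it should not be read as a chip specification.

```python
# GB200 NVL72 figures quoted in the text.
gpus_per_rack = 72
nvlink_tb_s_per_gpu = 1.8   # fifth-generation NVLink, per GPU
rack_power_kw = 120.0       # "on the order of" 120 kW

# Aggregate scale-up bandwidth across the 72-GPU domain.
aggregate_nvlink_tb_s = gpus_per_rack * nvlink_tb_s_per_gpu

# Rack power divided evenly across GPUs (includes CPUs, switches, fans, pumps).
rack_kw_per_gpu = rack_power_kw / gpus_per_rack

# "Roughly ten times a conventional rack" implies a conventional rack near this value.
implied_conventional_rack_kw = rack_power_kw / 10

print(f"Aggregate NVLink bandwidth: {aggregate_nvlink_tb_s:.1f} TB/s")
print(f"Rack power per GPU slot: {rack_kw_per_gpu:.2f} kW")
print(f"Implied conventional rack: {implied_conventional_rack_kw:.0f} kW")
```

The aggregate comes to about 130 TB/s inside one rack, and the rack budget is a little under 1.7 kW per GPU slot.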

Scale-out networks stitch racks into clusters of tens or hundreds of thousands of GPUs. InfiniBand, inherited from NVIDIA's 2020 [Mellanox](/wiki/mellanox) acquisition, long dominated training back-ends, but Ethernet adapted for AI is gaining: NVIDIA sells Spectrum-X, while the [Ultra Ethernet Consortium](/wiki/ultra_ethernet_consortium) (Broadcom, AMD, Arista, and others) released its 1.0 specification in June 2025 [10]. Networking is now a major business in its own right; NVIDIA's networking revenue grew 142 percent in fiscal 2026 on the NVLink fabric ramp [3]. Storage rounds out the system layer: parallel file systems and flash platforms from vendors such as [VAST Data](/wiki/vast_data) and WEKA feed training data and absorb multi-terabyte model checkpoints.

## How big are AI data centers and how are they powered?

Training-class facilities have grown from tens of megawatts to gigawatt campuses. The [Stargate](/wiki/stargate) venture of OpenAI, SoftBank, and Oracle was announced in January 2025 with a target of $500 billion and 10 gigawatts of capacity across the United States by 2029. By September 2025 it had reached nearly 7 gigawatts of planned capacity and over $400 billion of committed investment, anchored by a 1.2 gigawatt flagship campus in Abilene, Texas, developed with [Crusoe](/wiki/crusoe) [11]. (See the [Stargate Project](/wiki/stargate_project) for the full site-by-site buildout.) Meta's Hyperion campus in Richland Parish, Louisiana, is designed to scale to about 5 gigawatts [12][13]. [xAI](/wiki/xai) assembled its [Colossus](/wiki/colossus) cluster in Memphis in 2024, standing up more than 100,000 GPUs in under four months. Anthropic announced a $50 billion US data center program with [Fluidstack](/wiki/fluidstack) in November 2025, starting in Texas and New York [14].

Electricity is the hardest constraint. The International Energy Agency estimated that data centers consumed about 415 TWh in 2024, around 1.5 percent of global electricity, and projected roughly 945 TWh by 2030, with the United States accounting for the largest share of the growth (US data center consumption rising about 130 percent from 2024) [15]. The IEA also expects electricity use by accelerated AI servers to grow about 30 percent per year through 2030, far faster than the 9 percent annual growth of conventional servers [15]. Multi-year waits for grid interconnections, transformers, and turbines pushed developers toward on-site gas generation as a bridge at sites including Abilene and Memphis, and toward long-term nuclear contracts:

| Buyer | Supplier and project | Capacity | Announced |
|---|---|---|---|
| Microsoft | [Constellation Energy](/wiki/constellation_energy), Three Mile Island Unit 1 restart (Crane Clean Energy Center) | 835 MW, 20-year PPA | September 2024; restart targeted 2027 or 2028 [16] |
| Google | [Kairos Power](/wiki/kairos_power) [small modular reactors](/wiki/small_modular_reactor) | 500 MW by 2035, first unit via TVA | October 2024 [17] |
| Amazon | [X-energy](/wiki/x_energy) Xe-100 SMRs (about $700 million invested) | Up to 12 reactors, ambition of 5 GW by 2039 | October 2024 [17] |
| Meta | Constellation's Clinton plant, plus deals with Vistra, [TerraPower](/wiki/terrapower), and [Oklo](/wiki/oklo) | 1,121 MW from 2027; up to 6.6 GW in total, according to one tracker | June 2025 onward [17] |

SMR deliveries remain unproven before the 2030s, so near-term load growth is met mostly by gas, grid purchases, and existing nuclear and renewables.
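
The IEA figures quoted in this section can be turned into compound growth rates for comparison. The sketch below uses only those quoted numbers; compounding the 30 percent and 9 percent rates over the full six years from 2024 to 2030 is an assumption made here for illustration.

```python
# IEA figures quoted in the text.
twh_2024 = 415.0
twh_2030 = 945.0
years = 2030 - 2024

# Compound annual growth implied by the two endpoints.
implied_cagr = (twh_2030 / twh_2024) ** (1 / years) - 1

# Assumption: each quoted annual rate compounds over all six years.
ai_server_multiple = 1.30 ** years
conventional_multiple = 1.09 ** years

# "Around 1.5 percent of global electricity" implies this global total for 2024.
implied_global_twh = twh_2024 / 0.015

print(f"Implied data center CAGR: {implied_cagr:.1%}")
print(f"AI server electricity multiple by 2030: {ai_server_multiple:.2f}x")
print(f"Conventional server multiple by 2030: {conventional_multiple:.2f}x")
print(f"Implied global electricity use, 2024: {implied_global_twh:,.0f} TWh")
```

On these inputs, total data center demand grows at just under 15 percent a year, while AI server electricity use would multiply nearly fivefold against under twofold for conventional servers.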

## What software runs on AI infrastructure?

NVIDIA's CUDA platform remains the default target for AI software and a key competitive moat; AMD's ROCm and compiler-level efforts narrow the gap. Models are built in frameworks such as PyTorch and JAX, with distributed-training libraries (Megatron, FSDP, DeepSpeed) sharding work across thousands of accelerators. Clusters are scheduled with Kubernetes, Slurm, or [Ray](/wiki/ray), and large jobs depend on telemetry and checkpointing to survive hardware failures that occur daily at scale.
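
The checkpoint-and-resume pattern mentioned above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not how any particular framework does it: the function names, the JSON state format, and the simulated failure are all hypothetical, and a real job would save sharded model and optimizer state rather than a small dictionary.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss_sum": 0.0}
    with open(path) as f:
        return json.load(f)


def train(path, total_steps, checkpoint_every, fail_at_step=None):
    """Toy training loop that resumes from the latest checkpoint."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if fail_at_step is not None and state["step"] == fail_at_step:
            raise RuntimeError("simulated hardware failure")
        state["step"] += 1
        state["loss_sum"] += 1.0 / state["step"]  # stand-in for a real training step
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If a run that checkpoints every 10 steps fails at step 37, calling `train` again resumes from step 30, so at most one checkpoint interval of work is lost. Choosing the interval is a trade-off between that lost work and the time spent writing checkpoints.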

Inference serving became its own discipline as usage exploded. vLLM, an open-source engine from UC Berkeley, introduced PagedAttention in 2023 to manage key-value cache memory and raise throughput several-fold [18]; [SGLang](/wiki/sglang) and NVIDIA's TensorRT-LLM and Dynamo compete on the same problem. Techniques such as continuous batching, speculative decoding, quantization, and prefill-decode disaggregation directly set the cost per token, making serving software one of the highest-leverage layers of the stack.
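
The core idea behind paged key-value cache management is to allocate cache memory in fixed-size blocks on demand, tracked by a per-sequence block table, instead of reserving one contiguous region per request. The toy allocator below sketches only that bookkeeping. The class and method names are hypothetical and are not vLLM's API, and the sketch omits the attention kernel, block sharing, and preemption.

```python
class PagedKVCache:
    """Toy block allocator in the spirit of PagedAttention (illustrative only)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # sequence id -> list of physical block ids
        self.seq_lens = {}      # sequence id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a cache slot for one new token; return (physical_block, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:
            # First token, or the current block is full: take a new block.
            if not self.free_blocks:
                raise MemoryError("no free blocks; a real scheduler would queue or preempt")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=16)
for _ in range(20):
    cache.append_token("request-a")  # 20 tokens need 2 blocks
for _ in range(5):
    cache.append_token("request-b")  # 5 tokens need 1 block
print(len(cache.free_blocks))        # blocks still available
cache.free("request-a")
print(len(cache.free_blocks))        # blocks available after request-a finishes
```

Because blocks are claimed only as tokens arrive, each sequence wastes at most one partially filled block, which is what lets a server pack more concurrent requests into the same memory.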

## Who builds and sells AI infrastructure?

The hyperscalers dominate spending. Their reported capital expenditures and guidance:

| Company | 2025 capex (actual) | 2026 plan |
|---|---|---|
| Amazon | $131.8B | About $200B [19] |
| Microsoft | About $118B (calendar-year basis) | Near $190B (analyst estimate) [2] |
| Alphabet | $91B | $175-185B guided February 2026, raised to $180-190B in April 2026 [2][20] |
| Meta | About $72B | $125-145B, raised from $115-135B in April 2026 [2] |

Combined, that is roughly $700 billion planned for 2026, about 1.7 to 1.9 times 2025 spending, for which tallies range from about $370 billion to over $410 billion depending on lease accounting [1][2]. Oracle vaulted into the top tier through a reported five-year cloud agreement with OpenAI worth about $300 billion [21], part of the roughly $1.4 trillion in total compute commitments OpenAI disclosed across Oracle, Microsoft, NVIDIA, AMD, Broadcom, AWS, and CoreWeave [22].
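
The combined totals can be checked directly from the capex table. The sketch below sums the table's figures as printed; treating the "about" and "near" values as point estimates, and using the most recent guidance range for each company, are simplifications made here.

```python
# 2025 capex from the table, in billions of US dollars.
capex_2025_bn = {
    "Amazon": 131.8,
    "Microsoft": 118.0,  # "about", calendar-year basis
    "Alphabet": 91.0,
    "Meta": 72.0,        # "about"
}

# 2026 plans as (low, high) ranges; single figures are repeated.
plan_2026_bn = {
    "Amazon": (200.0, 200.0),     # "about $200B"
    "Microsoft": (190.0, 190.0),  # analyst estimate
    "Alphabet": (180.0, 190.0),   # raised guidance, April 2026
    "Meta": (125.0, 145.0),       # raised guidance, April 2026
}

total_2025 = sum(capex_2025_bn.values())
total_2026_low = sum(low for low, _ in plan_2026_bn.values())
total_2026_high = sum(high for _, high in plan_2026_bn.values())

print(f"2025 total: ${total_2025:.1f}B")
print(f"2026 plan: ${total_2026_low:.0f}B to ${total_2026_high:.0f}B")
print(f"Growth multiple: {total_2026_low / total_2025:.2f}x to {total_2026_high / total_2025:.2f}x")
```

The table sums to about $413 billion for 2025 and $695 billion to $725 billion for 2026, a multiple of roughly 1.7 to 1.8. Measured against the lower $370 billion tally of 2025 spending, the same 2026 plans come closer to double.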

Below the hyperscalers sit GPU-specialist neoclouds. CoreWeave, which pivoted from cryptocurrency mining and listed on Nasdaq in March 2025, became, in its own words, "the fastest cloud in history to reach $5 billion in annual revenue," reporting $5.13 billion of 2025 revenue (up 168 percent) and a $66.8 billion contract backlog, with 3.1 gigawatts of contracted power and 2026 revenue guidance of $12 billion to $13 billion [23][32]. Nebius signed a five-year deal worth $17.4 billion (expandable to $19.4 billion) to supply Microsoft with dedicated GPU capacity [24], and Microsoft struck a similar multibillion-dollar agreement with [Lambda](/wiki/lambda) in November 2025. [Together AI](/wiki/together_ai), Crusoe, and Fluidstack compete in the same tier. A notable pattern is hyperscalers renting from neoclouds to relieve their own shortages, while model developers spread across every platform: Anthropic, for example, simultaneously uses Google TPUs [5], Amazon's Trainium-based Project Rainier, and a $30 billion Azure compute commitment under which NVIDIA and Microsoft agreed to invest up to $10 billion and $5 billion in the company respectively [25].

## Is the AI infrastructure boom a bubble?

The boom is increasingly financed with debt and structured vehicles. CoreWeave pioneered GPU-backed credit, borrowing $2.3 billion in 2023 and $7.5 billion in May 2024 in a Blackstone- and Magnetar-led facility secured against its NVIDIA fleet, then raising about $18 billion more in debt and equity during 2025 [23][26]. Meta moved its Hyperion campus into a joint venture with Blue Owl Capital that raised $27 billion of debt and $2.5 billion of equity in October 2025, the largest private credit transaction on record, keeping 80 percent of the project off Meta's balance sheet [12][13].

Critics focus on circularity: chip vendors investing in customers who use the money to buy chips. NVIDIA's September 2025 letter of intent to invest up to $100 billion in OpenAI alongside a 10 gigawatt deployment, and AMD's grant to OpenAI of warrants for up to 160 million shares tied to its 6 gigawatt order, drew bubble comparisons in late 2025 [4][27]. NVIDIA ultimately scaled the OpenAI plan back to a $30 billion equity stake in the company's $110 billion round announced in February 2026 [28]. Skeptics also question whether five-to-six-year GPU depreciation schedules match the chips' economic life, and whether revenue can ever cover commitments: OpenAI's roughly $1.4 trillion in obligations stood against about $20 billion of annualized revenue in November 2025 [22], and by February 2026 the company had reset investor expectations to around $600 billion of compute spending through 2030 [29]. Defenders point to sold-out capacity, NVIDIA's continued beat-and-raise results [30], and inference demand from reasoning models. Whether the buildout proves to be a railroad-style overshoot or simply early remains the central open question of the field.
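
Two of the skeptics' arguments reduce to simple arithmetic. The sketch below uses the commitment and revenue figures quoted above. The $100 billion fleet cost and the three-year useful life are purely hypothetical inputs chosen to show how sensitive annual expense is to the depreciation schedule.

```python
# Figures quoted in the text, in billions of US dollars.
commitments_bn = 1400.0         # roughly $1.4 trillion in obligations
annualized_revenue_bn = 20.0    # about $20 billion, November 2025

# Years of revenue needed to cover commitments if revenue stayed flat.
coverage_years = commitments_bn / annualized_revenue_bn
print(f"Commitments equal {coverage_years:.0f} years of current revenue")


def straight_line_annual_expense(cost, useful_life_years):
    """Annual depreciation expense under a straight-line schedule."""
    return cost / useful_life_years


# Hypothetical fleet cost; 5 and 6 years are the schedules named in the text,
# 3 years is an assumed shorter economic life for comparison.
hypothetical_fleet_cost_bn = 100.0
for life in (3, 5, 6):
    expense = straight_line_annual_expense(hypothetical_fleet_cost_bn, life)
    print(f"{life}-year life: ${expense:.1f}B per year")
```

If the chips' economic life turned out to be three years rather than six, the annual depreciation charge on the same fleet would double, which is why the choice of schedule matters to reported profits.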

## References

1. CNBC, "Tech megacaps to spend nearly $700 billion in 2026 as cash takes big hit," February 6, 2026. https://www.cnbc.com/2026/02/06/google-microsoft-meta-amazon-ai-cash.html
2. Om Malik, "What I Learned about Hyperscalers' AI Spend," April 30, 2026. https://om.co/2026/04/30/what-i-learned-about-hyperscalers-ai-spend/
3. NVIDIA, Q4 and fiscal 2026 results press release (SEC Form 8-K), February 25, 2026. https://www.sec.gov/Archives/edgar/data/0001045810/000104581026000019/q4fy26pr.htm
4. Bloomberg, "OpenAI's Nvidia, AMD Deals Boost $1 Trillion AI Boom With Circular Deals," October 7, 2025. https://www.bloomberg.com/news/features/2025-10-07/openai-s-nvidia-amd-deals-boost-1-trillion-ai-boom-with-circular-deals
5. CNBC, "Google and Anthropic announce cloud deal worth tens of billions of dollars," October 23, 2025. https://www.cnbc.com/2025/10/23/anthropic-google-cloud-deal-tpu.html
6. FinancialContent, "The Great Packaging Pivot: How TSMC is Doubling CoWoS Capacity to Break the AI Supply Bottleneck through 2026," January 2026. https://markets.financialcontent.com/stocks/article/tokenring-2026-1-1-the-great-packaging-pivot-how-tsmc-is-doubling-cowos-capacity-to-break-the-ai-supply-bottleneck-through-2026
7. Counterpoint Research, "Global DRAM and HBM Market Share: Quarterly." https://counterpointresearch.com/en/insights/global-dram-and-hbm-market-share
8. TrendForce, "SK hynix 2026 Outlook: HBM3E Remains Mainstream, HBM4 Dual Strategy," January 5, 2026. https://www.trendforce.com/news/2026/01/05/news-sk-hynix-2026-outlook-hbm3e-remains-mainstream-hbm4-dual-strategy-amid-triple-market-headwinds/
9. NVIDIA, "GB200 NVL72" product page. https://www.nvidia.com/en-us/data-center/gb200-nvl72/
10. Ultra Ethernet Consortium. https://ultraethernet.org/
11. OpenAI, "OpenAI, Oracle, and SoftBank expand Stargate with five new AI data center sites," September 23, 2025. https://openai.com/index/five-new-stargate-sites/
12. Meta, "Meta Announces Joint Venture with Funds Managed by Blue Owl Capital to Develop Hyperion Data Center," October 2025. https://about.fb.com/news/2025/10/meta-blue-owl-capital-develop-hyperion-data-center/
13. Data Center Dynamics, "Meta forms $27 billion joint venture with Blue Owl to fund gigawatt-scale AI data center campus in Louisiana," October 2025. https://www.datacenterdynamics.com/en/news/meta-forms-27-billion-joint-venture-with-blue-owl-to-fund-gigawatt-scale-ai-data-center-campus-in-louisiana/
14. Anthropic, "Anthropic invests $50 billion in American AI infrastructure," November 2025. https://www.anthropic.com/news/anthropic-invests-50-billion-in-american-ai-infrastructure
15. International Energy Agency, "Energy and AI: Energy demand from AI," April 2025. https://www.iea.org/reports/energy-and-ai/energy-demand-from-ai
16. World Nuclear News, "Constellation to restart Three Mile Island unit, powering Microsoft," September 2024. https://www.world-nuclear-news.org/articles/constellation-to-restart-three-mile-island-unit-powering-microsoft
17. SMR Intel, "Every Nuclear-Powered Data Center Deal: Google, Amazon, Meta and Microsoft," 2026. https://smrintel.com/nuclear-data-center-deals/
18. Kwon et al., "Efficient Memory Management for Large Language Model Serving with PagedAttention," arXiv, September 2023. https://arxiv.org/abs/2309.06180
19. Futurum Group, "Amazon Q4 FY 2025: Revenue Beat, AWS +24% Amid $200B Capex Plan," February 2026. https://futurumgroup.com/insights/amazon-q4-fy-2025-revenue-beat-aws-24-amid-200b-capex-plan/
20. Alphabet, Q4 and fiscal year 2025 results (SEC Form 8-K), February 4, 2026. https://www.sec.gov/Archives/edgar/data/0001652044/000165204426000012/googexhibit991q42025.htm
21. IntuitionLabs, "Oracle and OpenAI's $300B Deal: AI Infrastructure Analysis," 2025. https://intuitionlabs.ai/articles/oracle-openai-300b-deal-analysis
22. TechCrunch, "Sam Altman says OpenAI has $20B ARR and about $1.4 trillion in data center commitments," November 6, 2025. https://techcrunch.com/2025/11/06/sam-altman-says-openai-has-20b-arr-and-about-1-4-trillion-in-data-center-commitments/
23. CoreWeave, Q4 and full year 2025 results press release (SEC Form 8-K), February 26, 2026. https://www.sec.gov/Archives/edgar/data/0001769628/000176962826000094/coreweave4q25earningspress.htm
24. Data Center Dynamics, "Microsoft to use Nebius GPU data centers, in deal worth $17.4bn over five years," September 2025. https://www.datacenterdynamics.com/en/news/microsoft-to-use-nebius-gpu-data-centers-in-deal-worth-174bn-over-five-years/
25. Microsoft, "Microsoft, NVIDIA and Anthropic announce strategic partnerships," November 18, 2025. https://blogs.microsoft.com/blog/2025/11/18/microsoft-nvidia-and-anthropic-announce-strategic-partnerships/
26. CNBC, "AI infrastructure startup CoreWeave raises $7.5 billion in debt deal led by Blackstone," May 17, 2024. https://www.cnbc.com/2024/05/17/ai-startup-coreweave-raises-7point5-billion-in-debt-blackstone-leads.html
27. NBC News, "The AI boom's reliance on circular deals is raising fears of a bubble," October 2025. https://www.nbcnews.com/business/economy/openai-nvidia-amd-deals-risks-rcna234806
28. Crowdfund Insider, "Nvidia Set To Acquire $30 Billion Stake In OpenAI, Potentially Reshaping AI Alliances," February 2026. https://www.crowdfundinsider.com/2026/02/263154-nvidia-set-to-acquire-30-billion-stake-in-openai-potentially-reshaping-ai-alliances/
29. CNBC, "OpenAI resets spending expectations, tells investors compute target is around $600 billion by 2030," February 20, 2026. https://www.cnbc.com/2026/02/20/openai-resets-spend-expectations-targets-around-600-billion-by-2030.html
30. Fortune, "Nvidia smashes Q4 2026 with $68 billion in revenue, and a Q1 outlook that quashes AI bubble talk," February 25, 2026. https://fortune.com/2026/02/25/nvidia-nvda-earnings-q4-results-jensen-huang/
31. NVIDIA, "NVIDIA Announces Financial Results for Third Quarter Fiscal 2026," November 19, 2025. https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-third-quarter-fiscal-2026
32. CoreWeave, "CoreWeave Reports Strong Fourth Quarter and Fiscal Year 2025 Results," February 26, 2026. https://investors.coreweave.com/news/news-details/2026/CoreWeave-Reports-Strong-Fourth-Quarter-and-Fiscal-Year-2025-Results/