AI Infrastructure
Last reviewed
Jun 9, 2026
Sources
30 citations
Review status
Source-backed
Revision
v1 ยท 2,359 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 9, 2026
Sources
30 citations
Review status
Source-backed
Revision
v1 ยท 2,359 words
Add missing citations, update stale details, or suggest a clearer explanation.
AI infrastructure is the stack of hardware, facilities, and software that trains and serves artificial intelligence models: accelerator chips, the servers and networks that bind them into clusters, the data centers and power supplies that house and feed them, the orchestration and inference software that keeps them utilized, and the cloud platforms that sell the resulting capacity. Between 2024 and 2026 it became one of the largest capital programs in business history. The four largest cloud companies spent close to $400 billion on capital projects in 2025 and announced 2026 plans approaching $700 billion [1][2], while NVIDIA, the dominant accelerator vendor, booked $193.7 billion of data center revenue in its fiscal year ending January 25, 2026 [3].
AI workloads split into training, which concentrates enormous compute on a single job for weeks or months, and inference, which serves billions of smaller requests with strict latency targets. Training pushed the industry toward ever-larger synchronized clusters; the shift to reasoning models and AI agents in 2025 and 2026 made inference capacity the faster-growing demand. Both run on the same layered stack, and each layer has its own bottleneck: memory and packaging at the silicon layer, interconnect bandwidth at the systems layer, electricity at the facilities layer, and capital itself at the platform layer.
| Layer | Role | Representative technologies and providers |
|---|---|---|
| Silicon | Accelerators, memory, fabrication | NVIDIA and AMD GPUs, TPUs, Trainium, HBM, TSMC fabs and CoWoS packaging |
| Systems and networking | Racks, scale-up and scale-out fabrics, cooling | NVLink/NVSwitch, InfiniBand, Spectrum-X Ethernet, liquid-cooled racks |
| Facilities and power | Buildings, grid connections, generation | Gigawatt campuses, substations, nuclear PPAs, on-site gas turbines |
| Software | Drivers, orchestration, training and serving engines | CUDA, PyTorch/JAX, Kubernetes/Slurm, vLLM, TensorRT-LLM |
| Platforms | Selling compute as a service | AWS, Azure, Google Cloud, Oracle OCI, neoclouds such as CoreWeave and Nebius |
NVIDIA GPUs anchor the market. The H100 generation defined the 2023 to 2024 shortage era; the Blackwell generation ramped through 2025; and the successor Vera Rubin platform is slated for late 2026. NVIDIA's fiscal 2026 results (year ending January 25, 2026) showed total revenue of $215.9 billion, up 65 percent, with data center revenue of $193.7 billion, up 68 percent [3]. AMD is the principal merchant rival: its Instinct MI300X and MI355X parts won deployments at Microsoft, Meta, and OpenAI, and its rack-scale Helios system, built around the MI450 series, ships in 2026 under a 6 gigawatt supply agreement with OpenAI [4].
Hyperscalers also build their own silicon to cut costs and reduce NVIDIA dependence. Google deploys TPUs, including the seventh-generation Ironwood offered from late 2025; Anthropic contracted in October 2025 for access to up to one million TPUs, with more than a gigawatt of capacity due in 2026 [5]. Amazon's Trainium2 powers Project Rainier, a multi-site cluster of hundreds of thousands of chips built largely for Anthropic. Microsoft (Maia) and Meta (MTIA) field their own parts, and OpenAI announced a partnership with Broadcom in October 2025 to co-develop 10 gigawatts of custom accelerators.
Upstream, supply is concentrated. TSMC fabricates nearly all leading-edge accelerators and its CoWoS advanced packaging was the binding constraint on GPU output from 2023 onward; capacity is being expanded from roughly 35,000 wafers per month in late 2024 toward a planned 130,000 by the end of 2026, with NVIDIA reportedly taking more than 60 percent of it [6]. High-bandwidth memory is a three-supplier market in which SK Hynix held 57 to 62 percent of revenue across 2025, ahead of Samsung and Micron, with the contest shifting to HBM4 in 2026 [7][8].
Clusters are wired at two scales. Scale-up fabrics make a rack behave like one giant accelerator: NVIDIA's GB200 NVL72 connects 72 Blackwell GPUs and 36 Grace CPUs over fifth-generation NVLink at 1.8 TB/s per GPU, presenting the rack as a single 72-GPU domain [9]. Such racks draw on the order of 120 kW, roughly ten times a conventional rack, which forced a rapid industry shift to direct-to-chip liquid cooling. An open alternative, the Ultra Accelerator Link (UALink) consortium, published its 1.0 specification in April 2025.
Scale-out networks stitch racks into clusters of tens or hundreds of thousands of GPUs. InfiniBand, inherited from NVIDIA's 2020 Mellanox acquisition, long dominated training back-ends, but Ethernet adapted for AI is gaining: NVIDIA sells Spectrum-X, while the Ultra Ethernet Consortium (Broadcom, AMD, Arista, and others) released its 1.0 specification in June 2025 [10]. Networking is now a major business in its own right; NVIDIA's networking revenue grew 142 percent in fiscal 2026 on the NVLink fabric ramp [3]. Storage rounds out the system layer: parallel file systems and flash platforms from vendors such as VAST Data and WEKA feed training data and absorb multi-terabyte model checkpoints.
Training-class facilities have grown from tens of megawatts to gigawatt campuses. The Stargate venture of OpenAI, SoftBank, and Oracle, announced in January 2025 with a target of $500 billion and 10 gigawatts, had reached nearly 7 gigawatts of planned capacity and over $400 billion of committed investment by September 2025, anchored by a 1.2 gigawatt flagship campus in Abilene, Texas developed with Crusoe [11]. Meta's Hyperion campus in Richland Parish, Louisiana is designed to scale to about 5 gigawatts [12][13]. xAI assembled its Colossus cluster in Memphis in 2024, standing up more than 100,000 GPUs in under four months, and Anthropic announced a $50 billion US data center program with Fluidstack in November 2025, starting in Texas and New York [14].
Electricity is the hardest constraint. The International Energy Agency estimated data centers consumed about 415 TWh in 2024, around 1.5 percent of global electricity, and projects roughly 945 TWh by 2030, with the United States accounting for the largest share of growth (up about 130 percent from 2024) [15]. Multi-year waits for grid interconnections, transformers, and turbines pushed developers toward on-site gas generation as a bridge at sites including Abilene and Memphis, and toward long-term nuclear contracts:
| Buyer | Supplier and project | Capacity | Announced |
|---|---|---|---|
| Microsoft | Constellation Energy, Three Mile Island Unit 1 restart (Crane Clean Energy Center) | 835 MW, 20-year PPA | September 2024; restart targeted 2027 or 2028 [16] |
| Kairos Power small modular reactors | 500 MW by 2035, first unit via TVA | October 2024 [17] | |
| Amazon | X-energy Xe-100 SMRs (about $700 million invested) | Up to 12 reactors, ambition of 5 GW by 2039 | October 2024 [17] |
| Meta | Constellation's Clinton plant, plus deals with Vistra, TerraPower, and Oklo | 1,121 MW from 2027; up to 6.6 GW total per one tracker | June 2025 onward [17] |
SMR deliveries remain unproven before the 2030s, so near-term load growth is met mostly by gas, grid purchases, and existing nuclear and renewables.
NVIDIA's CUDA platform remains the default target for AI software and a key competitive moat; AMD's ROCm and compiler-level efforts narrow the gap. Models are built in frameworks such as PyTorch and JAX, with distributed-training libraries (Megatron, FSDP, DeepSpeed) sharding work across thousands of accelerators. Clusters are scheduled with Kubernetes, Slurm, or Ray, and large jobs depend on telemetry and checkpointing to survive hardware failures that occur daily at scale.
Inference serving became its own discipline as usage exploded. vLLM, an open-source engine from UC Berkeley, introduced PagedAttention in 2023 to manage key-value cache memory and raise throughput several-fold [18]; SGLang and NVIDIA's TensorRT-LLM and Dynamo compete on the same problem. Techniques such as continuous batching, speculative decoding, quantization, and prefill-decode disaggregation directly set the cost per token, making serving software one of the highest-leverage layers of the stack.
The hyperscalers dominate spending. Their reported capital expenditures and guidance:
| Company | 2025 capex (actual) | 2026 plan |
|---|---|---|
| Amazon | $131.8B | About $200B [19] |
| Microsoft | About $118B (calendar-year basis) | Near $190B (analyst estimate) [2] |
| Alphabet | $91B | $175-185B guided February 2026, raised to $180-190B in April 2026 [2][20] |
| Meta | About $72B | $125-145B, raised from $115-135B in April 2026 [2] |
Combined, that is roughly $700 billion planned for 2026, nearly double 2025; tallies of 2025 spending range from about $370 billion to over $410 billion depending on lease accounting [1][2]. Oracle vaulted into the top tier through a reported five-year cloud agreement with OpenAI worth about $300 billion [21], part of the roughly $1.4 trillion in total compute commitments OpenAI disclosed across Oracle, Microsoft, NVIDIA, AMD, Broadcom, AWS, and CoreWeave [22].
Below the hyperscalers sit GPU-specialist neoclouds. CoreWeave, which pivoted from cryptocurrency mining and listed on Nasdaq in March 2025, passed $5 billion in 2025 revenue with a $66.8 billion contract backlog [23]. Nebius signed a five-year deal worth $17.4 billion (expandable to $19.4 billion) to supply Microsoft with dedicated GPU capacity [24], and Microsoft struck a similar multibillion-dollar agreement with Lambda in November 2025. Together AI, Crusoe, and Fluidstack compete in the same tier. A notable pattern is hyperscalers renting from neoclouds to relieve their own shortages, while model developers spread across every platform: Anthropic, for example, simultaneously uses Google TPUs [5], Amazon's Trainium-based Project Rainier, and a $30 billion Azure compute commitment under which NVIDIA and Microsoft agreed to invest up to $10 billion and $5 billion in the company respectively [25].
The boom is increasingly debt- and structure-financed. CoreWeave pioneered GPU-backed credit, borrowing $2.3 billion in 2023 and $7.5 billion in May 2024 in a Blackstone- and Magnetar-led facility secured against its NVIDIA fleet, then raising about $18 billion more in debt and equity during 2025 [23][26]. Meta moved its Hyperion campus into a joint venture with Blue Owl Capital that raised $27 billion of debt and $2.5 billion of equity in October 2025, the largest private credit transaction on record, keeping 80 percent of the project off Meta's balance sheet [12][13].
Critics focus on circularity: chip vendors investing in customers who use the money to buy chips. NVIDIA's September 2025 letter of intent to invest up to $100 billion in OpenAI alongside a 10 gigawatt deployment, and AMD's grant to OpenAI of warrants for up to 160 million shares tied to its 6 gigawatt order, drew bubble comparisons in late 2025 [4][27]. NVIDIA ultimately scaled the OpenAI plan back to a $30 billion equity stake in the company's $110 billion round announced in February 2026 [28]. Skeptics also question whether five-to-six-year GPU depreciation schedules match the chips' economic life, and whether revenue can ever cover commitments: OpenAI's roughly $1.4 trillion in obligations stood against about $20 billion of annualized revenue in November 2025 [22], and by February 2026 the company had reset investor expectations to around $600 billion of compute spending through 2030 [29]. Defenders point to sold-out capacity, NVIDIA's continued beat-and-raise results [30], and inference demand from reasoning models. Whether the buildout proves to be a railroad-style overshoot or simply early remains the central open question of the field.