xAI Colossus
Last reviewed
Jun 3, 2026
Sources
20 citations
Review status
Source-backed
Revision
v2 ยท 1,982 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 3, 2026
Sources
20 citations
Review status
Source-backed
Revision
v2 ยท 1,982 words
Add missing citations, update stale details, or suggest a clearer explanation.
Colossus is an artificial-intelligence supercomputer and GPU training cluster operated by xAI in Memphis, Tennessee. First brought online in 2024, it is widely described as one of the largest single-site AI GPU clusters in the world. Its initial phase comprised roughly 100,000 NVIDIA H100 accelerators installed in about 122 days, a build schedule that drew attention across the industry because comparable systems typically take a year or more to deploy. [1][2] The cluster was constructed primarily to train xAI's Grok family of large language models and to provide computing support to the social platform X. [3]
Colossus has also become the focus of an environmental and public-health dispute in Memphis over the on-site methane gas turbines used to power the facility, which a coalition led by the NAACP and the Southern Environmental Law Center alleges were operated without the air-quality permits required under the Clean Air Act. [4][5] In February 2026, the cluster's ownership changed when SpaceX acquired xAI in an all-stock deal, folding the AI lab and Colossus into Elon Musk's spaceflight company. [17][18]
Colossus occupies a repurposed factory in the South Memphis area and is built around NVIDIA Hopper-generation GPUs networked with high-bandwidth Ethernet. [1][6] xAI markets it as the world's largest AI supercomputer, a claim that is contested because several rival clusters do not publish comparable figures and because total GPU counts have shifted rapidly with each expansion. [3][7] The first site is now commonly referred to as Colossus 1 to distinguish it from a much larger second campus, Colossus 2, under construction nearby. [7][8]
The system's defining characteristics are its scale and the speed at which it was assembled. xAI and its hardware partners reported that the facility went from an empty building to a running 100,000-GPU cluster in 122 days, and that only 19 days elapsed between the first server rack reaching the data hall floor and the start of model training. [6] The cluster then roughly doubled in size over an additional 92 days. [1][2]
xAI sited Colossus in a former Electrolux appliance plant in South Memphis, a roughly 785,000-square-foot building that closed in 2020 and was acquired in late 2023. [3] Reusing an existing industrial shell allowed the company to skip the multi-year timeline normally associated with greenfield data center construction. [2][9]
Build partners Dell Technologies and Supermicro supplied and integrated the server hardware, while NVIDIA provided the GPUs and networking platform. [3][6] The first phase reached an operational 100,000-GPU configuration in mid-to-late 2024, with public announcements that summer and fall. [1][6] xAI subsequently doubled the installed GPU count, and through 2025 it continued adding newer accelerators and began the separate Colossus 2 buildout. [1][7]
The cluster is built from Supermicro 4U liquid-cooled systems, each containing one NVIDIA HGX H100 board with eight GPUs and two x86 CPUs. Eight such systems fill a rack, giving 64 GPUs per rack. [6][9] Networking uses the NVIDIA Spectrum-X Ethernet platform rather than InfiniBand, with Spectrum SN5600 switches and BlueField-3 SuperNICs delivering 400 Gb/s of bandwidth to each GPU. [6][10] NVIDIA reported that the design sustained 95 percent data throughput with no packet loss from flow collisions across the network fabric. [10]
Reported GPU counts by phase are summarized below. Figures combine xAI, NVIDIA, and press reporting; later totals mix multiple GPU generations and vary between sources.
| Phase | Approx. date | GPUs | Composition | Notes |
|---|---|---|---|---|
| Phase 1 | 2024 | ~100,000 | NVIDIA H100 | Built in ~122 days [1][6] |
| Phase 2 | Late 2024 | ~200,000 | H100 and H200 | Doubled in ~92 days [1][2] |
| Phase 3 | 2025 | ~230,000 reported | ~150,000 H100, ~50,000 H200, ~30,000 GB200 | Mixed-generation totals vary by source [1][9] |
| Networking element | Specification |
|---|---|
| Fabric | NVIDIA Spectrum-X Ethernet [6][10] |
| Switch | NVIDIA Spectrum SN5600 (up to 800 Gb/s ports) [10] |
| Per-GPU NIC | NVIDIA BlueField-3 SuperNIC, 400 Gb/s [6][10] |
| Per-server bandwidth | ~3.6 Tb/s per HGX H100 system [6] |
Power has been the most constrained resource at the site. The local grid connection was initially limited and was upgraded toward 150 megawatts as the cluster grew, while the running load of the 200,000-GPU configuration was reported at roughly 250 megawatts. [3][9] To bridge the gap between demand and available grid capacity, xAI installed on-site natural-gas turbines. By April 2025, reporting and thermal-imaging analyses indicated about 35 turbines on the property producing on the order of 420 megawatts. [3][9] The company also deployed a large fleet of Tesla Megapack battery units, reported at roughly 168 to 208 units, to smooth the power delivery that GPU training workloads demand. [3][9]
Because the H100-class systems are liquid-cooled, the facility relies on water for heat rejection. Estimates of full-capacity water use ran into the millions of gallons per day, and Memphis officials approved an associated wastewater treatment project to support the campus. [3]
xAI has been developing a second and substantially larger campus, generally called Colossus 2, in the Memphis metropolitan area, including a site in Southaven, Mississippi, just across the state line. [7][8] Reporting describes Colossus 2 as targeting hundreds of thousands of additional GPUs, including a large allocation of NVIDIA GB200 systems, with a long-term ambition of reaching roughly one million GPUs across the combined sites. [1][7] Industry analysts have characterized Colossus 2 as among the first data-center projects designed around more than a gigawatt of power. [7]
The expansion accelerated at the end of 2025. On December 30, 2025, Musk said on X that xAI had bought a third building next to Colossus 2 near Southaven and would convert it into a data center, naming the structure "MACROHARDRR" in a continuation of his "Macrohard" jab at Microsoft. [17][19] With the added building, reporting put the combined Memphis site on track for nearly 2 gigawatts of power, and Colossus 2 was slated to receive more than 555,000 NVIDIA GPUs, a mix of H200, GB200, and newer Blackwell-class GB300 parts. [17][19] The GPUs alone were reported to cost roughly 18 billion dollars, a figure that excludes the servers, networking, and cooling around them, and the 555,000 units represent more than half of xAI's stated one-million-GPU goal at a single location. [17][19]
Colossus exists primarily to train and serve xAI's models. The cluster's stated purpose is to train the company's Grok chatbot and to provide compute for related products and for the X platform owned by Elon Musk. [3] In May 2026, the cluster's role broadened when Anthropic agreed to rent the full capacity of Colossus 1, described in reporting as more than 220,000 NVIDIA GPUs and over 300 megawatts of compute, under a contract reported at roughly 1.25 billion dollars per month. [11][12] The arrangement, struck between competing AI developers, was widely covered as a marker of how scarce large-scale training capacity had become. [11][12]
On February 2, 2026, SpaceX agreed to acquire xAI in an all-stock transaction, making the AI lab a wholly owned subsidiary and bringing Colossus under the same corporate roof as SpaceX's launch business and the Starlink satellite-internet network. Reporting valued SpaceX at about 1 trillion dollars and xAI at about 250 billion dollars, a combined roughly 1.25 trillion dollars that contemporaneous coverage called the largest merger ever announced. [17][18] Musk framed the deal in part as a step toward eventual orbital data centers and as a way to fund xAI's cash-intensive buildout from SpaceX's balance sheet ahead of a planned SpaceX public offering; later in 2026 the AI unit was rebranded as a SpaceX division. The mechanics, valuations, and downstream effects on Colossus are covered in detail in the SpaceX-xAI merger article. [18][20]
Colossus has been at the center of a sustained dispute over air quality in South Memphis. The facility sits near Boxtown and other predominantly Black neighborhoods that environmental advocates describe as already carrying a heavy industrial-pollution burden; the Southern Environmental Law Center (SELC) has cited an existing cancer-risk estimate for the area at roughly four times the national average. [13][14]
The dispute centers on the gas turbines. The SELC, acting on behalf of the NAACP, contended that xAI ran as many as 35 unpermitted methane gas turbines at Colossus 1 without the air-quality permits required for major pollution sources, and that the turbines emit nitrogen oxides and other pollutants linked to respiratory and cardiovascular disease. [13][14] xAI later obtained a county permit for 15 permanent turbines and reported removing unpermitted units at the South Memphis site. [13][15]
The conflict escalated to litigation over the second campus. In April 2026, the NAACP and its Mississippi State Conference, represented by Earthjustice and the SELC, brought a Clean Air Act case alleging that xAI was operating 27 unpermitted methane gas turbines at the Colossus 2 power plant in Southaven, Mississippi. [4][5][16] The complaint asserts that the plant has the potential to emit more than 1,700 tons of smog-forming nitrogen oxides per year, which would make it among the largest industrial NOx sources in the greater Memphis area. [4][16] xAI has defended its operations and the speed of its Memphis buildout, and a regulatory backdrop shifted in early 2026 when the U.S. Environmental Protection Agency clarified that large methane gas turbines require permits even for temporary operation. [3][16]
Colossus is frequently cited as a turning point in how quickly hyperscale AI training infrastructure can be deployed, compressing what had been multi-year schedules into months by reusing an existing building, standardizing on liquid-cooled NVIDIA GPU racks, and adopting Ethernet rather than InfiniBand at extreme scale. [2][6] At the same time, it has become a prominent example of the tension between rapid AI expansion and local environmental and public-health concerns, particularly where on-site fossil-fuel generation is used to meet power demand that the grid cannot immediately supply. [4][13] The combination of record build speed, contested scale claims, and an active Clean Air Act lawsuit has made the facility a recurring reference point in debates over the costs and siting of AI compute. [7][16]