Microsoft Azure Maia 100
Last reviewed
May 18, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,261 words
The Microsoft Azure Maia 100, often referred to as Maia 100, is a custom artificial intelligence accelerator designed by Microsoft for large-scale generative AI workloads running inside its Azure cloud. It was unveiled on November 15, 2023, during the opening day of Microsoft Ignite in Seattle, alongside an Arm-based server CPU called Cobalt 100.[^1][^2][^3] The chip was Microsoft's first internally designed AI accelerator and the public-facing product of a multi-year effort codenamed "Project Athena" that had been under way inside the company since roughly 2019.[^4][^5]
Maia 100 was positioned as a vertically integrated piece of silicon, co-designed with OpenAI and built specifically to run workloads such as Microsoft Copilot, the Azure OpenAI Service, GitHub Copilot and Bing's generative features.[^1][^2][^6] Manufactured on TSMC's 5-nanometer (N5) process with CoWoS-S advanced packaging, the chip packs 105 billion transistors on a reticle-sized die of roughly 820 mm², making it one of the largest production silicon devices on N5 at the time of its announcement.[^1][^7][^8] It is paired with 64 GB of HBM2E memory delivering 1.8 TB/s of bandwidth, a design point that observers noted was conservative compared to the HBM3-equipped accelerators shipping from NVIDIA and AMD at the same time.[^7][^9][^10]
Although Microsoft initially indicated that Maia 100 would begin rolling out in early 2024 to power Copilot, Azure OpenAI Service and other internal workloads, reporting in 2024 and 2025 indicated that the chip was never deployed at the volumes Microsoft had originally targeted. Multiple sources, including The Information and Tom's Hardware, reported that Maia 100 was used primarily for internal staff training and validation rather than as a production accelerator for OpenAI's flagship models. They also reported that its inference-oriented successor, codenamed "Braga" and later released as Maia 200, had been delayed into 2026.[^11][^12][^13] Microsoft formally unveiled Maia 200 on January 26, 2026, with deployments beginning in the company's US Central region near Des Moines, Iowa.[^14][^15][^16]
Microsoft's drive toward custom silicon dates back to the mid-2010s, when the company moved from buying turnkey datacenter equipment to designing its own servers, racks and networking hardware in collaboration with the Open Compute Project (OCP).[^2][^17] Microsoft had been contributing custom server designs to OCP since joining the project in 2014, and by 2019 it had begun work on what would become its first AI accelerator. The Information first reported the existence of the project, internally codenamed "Athena," in April 2023, several months before the public Ignite 2023 unveiling.[^5][^18]
The motivation for Athena was both economic and strategic. By 2023 Microsoft had become the world's largest external customer of NVIDIA H100 GPUs through its partnership with OpenAI, and the global H100 backlog stretched into 2024 with TSMC's CEO publicly warning that the GPU shortage could persist through 2025.[^4][^18] Microsoft was simultaneously paying for billions of dollars of capacity at neoclouds like CoreWeave and offering refunds for unused GPU reservations.[^4] Owning a portion of the inference and training stack would, in principle, give Microsoft leverage over pricing, supply timing, and ultimately the per-token economics of its AI services.[^9][^11]
The chip was developed by Microsoft's Azure Hardware Systems and Infrastructure (AHSI) group under corporate vice president Rani Borkar, a former Intel executive who joined Microsoft in 2019 to lead its silicon teams.[^2][^5] Microsoft technical fellow Brian Harry led the engineering organization that designed the chip, and Pat Stemen served as a partner program manager for the broader AHSI effort.[^2] By 2023 SemiAnalysis reported that Microsoft had spent on the order of two billion dollars on its silicon initiatives and employed close to a thousand engineers across its various chip teams, including former Apple chip architect Mike Filippo.[^5]
Microsoft has not published a formal etymology for the name "Maia." In Greco-Roman mythology, Maia is the eldest of the seven Pleiades — daughters of Atlas and Pleione — and the mother of Hermes; the name also denotes "midwife" or "mother" in archaic Greek and lends its name to the month of May.[^19] The Maia 100 is the first product in a numbered series that continues with Maia 200 (codename "Braga"), Maia 200-Refresh ("Braga-R") and a planned 2027 successor known internally as "Clea."[^13][^16] The companion Arm CPU line is named for the cobalt-blue stars in the Pleiades cluster, reinforcing the celestial-naming theme.[^2]
When Maia 100 launched in November 2023 it was one half of a coordinated two-chip announcement at Ignite. The companion product, the Azure Cobalt 100, was a 128-core Arm-based server CPU built on Arm's Neoverse Compute Subsystems (CSS) N2 platform, combining two 64-core Neoverse N2 tiles on a single TSMC 5-nm die running at 3.4 GHz.[^20][^21][^22] Cobalt 100 entered preview in May 2024 and reached general availability in October 2024, by which time it had been deployed in 32 Azure regions to host workloads such as Microsoft Teams and Azure SQL Database.[^21][^22] At Ignite 2024 Microsoft expanded the family with two more in-house parts: the Azure Boost DPU, its first data processing unit, derived from technology obtained through its January 2023 acquisition of Fungible for approximately $190 million, and the Azure Integrated HSM, a hardware security module.[^23][^24] Together, Maia, Cobalt, Boost and the HSM make up Microsoft's in-house cloud silicon portfolio, spanning AI compute, general-purpose compute, network and storage offload, and confidential computing.[^23]
The most authoritative public technical description of Maia 100 came at the Hot Chips 2024 conference in August 2024, where partner SoC architect Sherry Xu and engineering lead Sairam Ramakrishnan presented a 17-slide deck titled "Inside Maia 100."[^6][^7][^25] Microsoft followed up with a companion technical post on its Azure Infrastructure blog the same month.[^6]
Maia 100 is fabricated on TSMC's N5 (5-nanometer) process and uses TSMC's CoWoS-S 2.5D interposer packaging to integrate the main system-on-chip die with four stacks of HBM2E memory.[^6][^7][^8] The SoC die is reticle-sized at approximately 820 mm², and the device contains 105 billion transistors — what SemiAnalysis described as "the highest transistor count monolithic die that has ever been disclosed publicly" at the time of unveiling.[^5][^7][^8] The chip is provisioned at 500 watts in production, but is designed to support a thermal envelope of up to 700 watts.[^7][^8][^25]
Maia 100's SoC is organized into 16 clusters of four "tiles" each, for a total of 64 tiles per chip.[^6][^7] Every cluster includes a Cluster Control Processor (CCP) and a Cluster Data Movement Engine (CDMA) that manage access to the L2 cache.[^7] Each tile contains:
In aggregate the chip exposes 500 MB of L1/L2 scratchpad SRAM, among the largest on-die SRAM footprints disclosed for a contemporary AI accelerator.[^7][^9] According to Microsoft, because the scratchpad is managed by software rather than by hardware caching, the compiler can schedule data movement explicitly and avoid cache pollution during long-running matrix multiplications.[^6][^7]
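To make the contrast with a hardware cache concrete, the following is a minimal, generic Python sketch of a tiled matrix multiply in which the program itself decides when each operand tile is copied into a small local buffer. The tile size and the `scratch_a`/`scratch_b` buffers are hypothetical stand-ins for a software-managed scratchpad; the sketch illustrates the general technique only and is not Microsoft's compiler or Maia's actual tiling scheme.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m) one tile at a time.

    Each operand tile is explicitly copied into a small local buffer
    (a stand-in for a software-managed scratchpad) before it is used,
    so the schedule of data movement is decided by this code rather
    than by a hardware cache.
    """
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            # The output tile accumulator stays resident for the whole k loop.
            acc = [[0.0] * min(tile, m - j0) for _ in range(min(tile, n - i0))]
            for k0 in range(0, k, tile):
                # Explicit "DMA" of one A tile and one B tile into local buffers.
                scratch_a = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                scratch_b = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                for i, a_row in enumerate(scratch_a):
                    for kk, a_val in enumerate(a_row):
                        for j, b_val in enumerate(scratch_b[kk]):
                            acc[i][j] += a_val * b_val
            # Write the finished output tile back to "main memory".
            for i, row in enumerate(acc):
                c[i0 + i][j0:j0 + len(row)] = row
    return c


if __name__ == "__main__":
    print(tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=1))
```

Because every copy is explicit, a compiler working in this style can size tiles to fit a known scratchpad budget and, in principle, overlap the next tile's copy with the current tile's compute, instead of relying on cache replacement heuristics.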
Microsoft has been reticent about conventional headline FLOPS figures for Maia 100. The Hot Chips presentation instead listed peak dense-tensor throughput in petaoperations per second (POPS): 3 POPS at 6-bit precision, 1.5 POPS at 9-bit precision and approximately 0.8 POPS at BF16.[^25] Third-party reporters highlighted the chip's emphasis on the OCP Microscaling (MX) formats, which Microsoft, Meta, NVIDIA, AMD, Arm, Intel and Qualcomm co-developed under the Open Compute Project, and on a separate data-compression engine designed to reduce the bandwidth and capacity pressure that often bottlenecks large-language-model inference.[^6][^9][^10][^25] The Hot Chips deck also disclosed an onboard image-decoder block and confidential-compute features for tenant isolation.[^25]
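Microscaling formats store a block of values as low-precision elements plus one shared power-of-two scale. The Python sketch below shows that idea in simplified form, assuming 32-element blocks (the block size used in the OCP MX specification) and signed 8-bit integer elements. It is not a bit-exact implementation of any OCP MX element encoding, nor of Maia's 6-bit and 9-bit formats, and `quantize_mx` and `dequantize_mx` are hypothetical names.

```python
import math

BLOCK_SIZE = 32                  # elements sharing one scale (as in OCP MX)
ELEM_BITS = 8                    # assumption: signed 8-bit integer elements
QMAX = 2 ** (ELEM_BITS - 1) - 1  # largest element magnitude (127)


def quantize_mx(values):
    """Split values into blocks; return a list of (shared_exponent, ints) pairs."""
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        max_abs = max(abs(v) for v in block)
        if max_abs == 0.0:
            blocks.append((0, [0] * len(block)))
            continue
        # Smallest power-of-two scale such that max_abs / scale <= QMAX.
        exp = math.ceil(math.log2(max_abs / QMAX))
        scale = 2.0 ** exp
        # Round each element to the nearest integer multiple of the shared scale.
        ints = [max(-QMAX, min(QMAX, round(v / scale))) for v in block]
        blocks.append((exp, ints))
    return blocks


def dequantize_mx(blocks):
    """Expand (shared_exponent, ints) pairs back into a flat list of floats."""
    out = []
    for exp, ints in blocks:
        scale = 2.0 ** exp
        out.extend(q * scale for q in ints)
    return out


if __name__ == "__main__":
    data = [10 * math.sin(i) for i in range(64)]
    restored = dequantize_mx(quantize_mx(data))
    worst = max(abs(x - y) for x, y in zip(data, restored))
    print(f"max abs error over 2 blocks: {worst:.4f}")
```

Sharing one exponent per block keeps per-element storage small while letting each block adapt its dynamic range; the cost is that small values in a block containing one large outlier lose precision.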
The accelerator pairs the SoC with four HBM2E stacks, providing 64 GB of high-bandwidth memory at an aggregate 1.8 TB/s.[^1][^7][^8] Microsoft's decision to use HBM2E rather than the HBM3 or HBM3e that NVIDIA, AMD and Google were adopting at the same time has been widely commented on; observers describe it as a deliberately conservative choice intended to control cost and de-risk supply, but one that left Maia 100 capacity- and bandwidth-constrained relative to competing accelerators by the time it was deployed.[^9][^10][^11] SemiAnalysis later wrote that "as expected of first generation silicon, Maia 100 was not manufactured in high volume or deployed for production workloads. The chip was architected before the Gen AI boom, leaving it short on memory bandwidth suitable for inference."[^11]
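The bandwidth constraint can be quantified with simple roofline arithmetic. The Python sketch below uses only figures quoted in this article (four stacks, 64 GB, 1.8 TB/s and roughly 0.8 POPS at BF16) to derive per-stack figures and the arithmetic intensity a kernel needs before Maia 100 becomes compute-bound rather than memory-bound. The one-operation-per-byte estimate for a BF16 matrix-vector product is a textbook approximation that ignores activation traffic and on-chip reuse.

```python
# Figures as quoted in this article.
HBM_STACKS = 4
HBM_CAPACITY_GB = 64
HBM_BANDWIDTH_GBS = 1800   # 1.8 TB/s aggregate
PEAK_BF16_OPS = 0.8e15     # "approximately 0.8 POPS" at BF16

per_stack_capacity_gb = HBM_CAPACITY_GB / HBM_STACKS       # 16 GB per stack
per_stack_bandwidth_gbs = HBM_BANDWIDTH_GBS / HBM_STACKS   # 450 GB/s per stack

# Roofline ridge point: operations per byte of HBM traffic at which peak
# compute and peak memory bandwidth are exactly balanced.
ridge_ops_per_byte = PEAK_BF16_OPS / (HBM_BANDWIDTH_GBS * 1e9)

# A batch-1 matrix-vector product in BF16 performs one multiply and one add
# (2 ops) per 2-byte weight it reads: roughly 1 op per byte.
gemv_ops_per_byte = 2 / 2

# Fraction of peak BF16 throughput such a memory-bound kernel can reach.
gemv_fraction_of_peak = gemv_ops_per_byte / ridge_ops_per_byte

print(f"{per_stack_capacity_gb:.0f} GB and {per_stack_bandwidth_gbs:.0f} GB/s per stack")
print(f"ridge point ~{ridge_ops_per_byte:.0f} ops/byte; "
      f"batch-1 GEMV reaches ~{gemv_fraction_of_peak:.2%} of peak BF16")
```

Batching raises arithmetic intensity because each weight read from HBM is reused across several requests, which is why a bandwidth-limited accelerator generally depends on large batches to approach its compute peak.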
Maia 100's most distinctive system-level feature is its networking. Rather than adopting NVIDIA's NVLink and InfiniBand interconnects, Microsoft built an Ethernet-based scale-up and scale-out fabric.[^6][^7][^25] Each Maia 100 chip exposes 12 ports of 400 Gigabit Ethernet, yielding 4.8 Tb/s of aggregate per-chip bandwidth.[^1][^7][^25] Microsoft uses a custom protocol similar to RoCE (RDMA over Converged Ethernet) with AES-GCM encryption, supports confidential-compute traffic over the back-end fabric, and reports collective-operation bandwidths of up to 4,800 Gbps for all-gather/reduce-scatter and 1,200 Gbps for all-to-all on a single accelerator.[^6][^7][^25] Three of the twelve Ethernet ports are typically used for intra-node communication between Maia chips on the same server tray, and the remaining nine connect to top-of-rack and aggregation switches in a multi-plane topology.[^7]
This Ethernet-first design — paired with Microsoft's contributions to the Open Compute Project and its membership in the Ultra Ethernet Consortium — marked a clear strategic divergence from NVIDIA's vertically integrated NVLink + InfiniBand stack, and aligned with similar bets by Meta and Google.[^7][^9][^25]
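The per-chip port figures translate into a few back-of-the-envelope numbers. The Python sketch below reproduces the 4.8 Tb/s total, splits it according to the three intra-node and nine scale-out ports described earlier, and estimates an idealized ring all-gather time using the standard (n - 1)/n traffic factor. Treating the quoted 4,800 Gb/s all-gather figure as a sustained per-chip rate, using a four-accelerator group, and ignoring latency and protocol overhead are all simplifying assumptions; `ring_allgather_seconds` is a hypothetical helper.

```python
PORTS = 12
PORT_GBPS = 400
INTRA_NODE_PORTS = 3                        # typically used within the server tray
SCALE_OUT_PORTS = PORTS - INTRA_NODE_PORTS  # to top-of-rack / aggregation switches

aggregate_gbps = PORTS * PORT_GBPS               # 4800 Gb/s = 4.8 Tb/s
intra_node_gbps = INTRA_NODE_PORTS * PORT_GBPS   # 1200 Gb/s
scale_out_gbps = SCALE_OUT_PORTS * PORT_GBPS     # 3600 Gb/s


def ring_allgather_seconds(total_bytes, n_ranks, gbps):
    """Idealized ring all-gather time (a lower bound).

    In a ring, every rank sends (n - 1) / n of the gathered data; latency,
    protocol overhead and congestion are ignored.
    """
    bytes_sent = total_bytes * (n_ranks - 1) / n_ranks
    return bytes_sent * 8 / (gbps * 1e9)


# Example: gather 1 GB (10**9 bytes) across 4 accelerators at 4,800 Gb/s.
t = ring_allgather_seconds(10**9, 4, aggregate_gbps)
print(f"{aggregate_gbps} Gb/s total, {intra_node_gbps} intra-node, {scale_out_gbps} scale-out")
print(f"idealized all-gather of 1 GB over 4 ranks: {t * 1e3:.2f} ms")
```

An all-reduce built as a reduce-scatter followed by an all-gather moves twice this traffic, so under the same assumptions its floor is roughly double the all-gather time.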
Maia 100 ships with a dedicated software development kit, often called the Maia SDK, designed to make AI workloads portable from existing GPU code-bases.[^6] The SDK exposes two programming models:
The SDK includes a first-class PyTorch backend that supports both eager and graph execution modes; integrations with ONNX Runtime; a kernel debugger, profiler and visualizer; model-quantization tooling; an inter-Maia communications library that maps NCCL-style collectives onto the Ethernet fabric; and an administration utility called maia-smi modeled on NVIDIA's nvidia-smi.[^6] Microsoft says that simple PyTorch models can be retargeted to Maia with a single line of code.[^6] Models destined for Azure OpenAI Service are typically traced into ONNX or compiled directly through the Triton path before being loaded onto Maia hardware.[^6]
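Microsoft describes its inter-Maia library only at the level of NCCL-style collectives. As a generic illustration of what such a collective computes, the Python sketch below simulates the classic ring all-reduce (a reduce-scatter followed by an all-gather), with each list standing in for one accelerator's buffer. It is a textbook algorithm chosen for illustration, not a description of how Microsoft's library is implemented, and `ring_allreduce` is a hypothetical name.

```python
def ring_allreduce(buffers):
    """Sum-all-reduce equal-length buffers using the ring algorithm.

    buffers[r] is rank r's local vector; its length must be divisible by
    the number of ranks so it splits into one chunk per rank.
    Returns new per-rank buffers, each holding the element-wise sum.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers) and length % n == 0
    size = length // n
    data = [list(b) for b in buffers]  # copy so the caller's lists are untouched

    def chunk(c):
        return slice(c * size, (c + 1) * size)

    # Phase 1: reduce-scatter. Rank r sends chunk (r - step) to rank r + 1,
    # which adds it in; afterwards rank r owns the full sum of chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot every send first so all exchanges in a step are simultaneous.
        sends = [((r - step) % n, data[r][chunk((r - step) % n)]) for r in range(n)]
        for r, (c, payload) in enumerate(sends):
            dst = (r + 1) % n
            data[dst][chunk(c)] = [x + y for x, y in zip(data[dst][chunk(c)], payload)]

    # Phase 2: all-gather. Circulate each fully reduced chunk around the ring.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, data[r][chunk((r + 1 - step) % n)]) for r in range(n)]
        for r, (c, payload) in enumerate(sends):
            data[(r + 1) % n][chunk(c)] = payload
    return data


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], [100, 200, 300, 400, 500, 600]]
    print(ring_allreduce(ranks)[0])
```

Over the two phases each rank sends 2(n - 1) chunks of L/n elements, or 2(n - 1)/n of its buffer, so per-link traffic stays nearly constant as the number of accelerators in the ring grows.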
Maia 100 is not delivered as a drop-in PCIe card. Microsoft built an entirely new server and rack assembly to host the part because none of the company's existing datacenter racks could accommodate the chip's power, networking and cooling profile.[^2][^17][^27]
The Maia rack is substantially wider than the standard 19-inch racks Microsoft uses elsewhere in Azure, to accommodate the dense networking cable harness and high-current power distribution required by Maia 100's 500 W operating point and its 12×400 GbE ports per chip.[^2][^17][^27] Each rack hosts custom-designed server boards containing multiple Maia 100 accelerators along with general-purpose CPU heads. Microsoft has declined to publish exact accelerator counts per rack for the Maia 100 generation. However, in later reporting on the follow-on Maia 200 system, The Next Platform indicated that a coherent Maia 100 cluster domain consisted of 576 nodes hosting a total of 2,304 compute engines, which implies four accelerators per node.[^16]
Because Maia 100 is a liquid-cooled part operating at 500–700 watts and Microsoft's existing Azure datacenters were not built with the large facility-level chilled-liquid plant required for traditional direct-to-chip cooling, Microsoft engineered an in-aisle companion called the sidekick.[^2][^17][^27] The sidekick is a self-contained cooling unit that sits adjacent to each Maia rack and operates analogously to an automotive radiator: cold liquid flows from the sidekick to cold plates affixed to each Maia 100 chip, the plates' microchannels absorb heat from the silicon, and the warmed liquid returns to the sidekick where it is recooled and recirculated.[^2][^17][^27]
This closed-loop, rack-adjacent design allowed Microsoft to deploy Maia hardware into existing datacenters that lacked facility-scale liquid plant — a real constraint given the company's global footprint of pre-AI-era halls — and was later described by Microsoft as the basis for its broader move toward direct-to-chip and microfluidic cooling for future generations of silicon.[^27][^28]
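The radiator analogy rests on the standard heat-balance relation Q = ṁ·c_p·ΔT, which sets how much coolant must circulate to carry a chip's heat away. Microsoft has not disclosed the sidekick's coolant, flow rates or temperature rise, so the Python sketch below assumes water and a 10 K rise across the cold plate purely for illustration; `coolant_flow_l_per_min` is a hypothetical helper.

```python
WATER_CP_J_PER_KG_K = 4186.0   # assumption: coolant is liquid water
WATER_DENSITY_KG_PER_L = 1.0   # approximation near room temperature


def coolant_flow_l_per_min(heat_watts, delta_t_kelvin,
                           cp=WATER_CP_J_PER_KG_K, density=WATER_DENSITY_KG_PER_L):
    """Coolant flow needed to remove heat_watts with a rise of delta_t_kelvin.

    From Q = m_dot * c_p * delta_T, so m_dot = Q / (c_p * delta_T) in kg/s,
    then converted to litres per minute.
    """
    mass_flow_kg_s = heat_watts / (cp * delta_t_kelvin)
    return mass_flow_kg_s / density * 60.0


if __name__ == "__main__":
    for watts in (500, 700):  # Maia 100 provisioned power and design maximum
        print(f"{watts} W -> {coolant_flow_l_per_min(watts, 10):.2f} L/min per chip")
```

Doubling the permitted temperature rise halves the required flow, which is one of the trade-offs a closed-loop design has to balance against cold-plate and silicon temperature limits.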
When Microsoft unveiled Maia 100 on November 15, 2023, the company said the chip would begin "rolling out early next year" to Azure datacenters, where it would initially run Microsoft Copilot, Azure OpenAI Service, GitHub Copilot, Bing search and other internal AI workloads.[^1][^2][^4] In a Bloomberg interview around the announcement, Rani Borkar said Microsoft was already testing Maia 100 on the Bing chatbot, GitHub Copilot and GPT-3.5-Turbo (run on OpenAI's behalf), but notably not on GPT-4.[^29]
OpenAI's involvement was central to Microsoft's framing of the chip. In the announcement materials, OpenAI CEO Sam Altman said: "Since first partnering with Microsoft, we've collaborated to co-design Azure's AI infrastructure at every layer for our models and unprecedented training needs… Azure's end-to-end AI architecture, now optimized down to the silicon with Maia, paves the way for training more capable models and making those models cheaper for our customers."[^2] Microsoft executive vice president of Cloud and AI Scott Guthrie was quoted in parallel: "At scale, it's important for us to optimize and integrate every layer of the infrastructure stack to maximize performance, diversify our supply chain and give customers infrastructure choice."[^1][^2]
Public information about Maia 100's production deployment is sparse, and by 2024 it had become clear that the chip was not deployed at the scale Microsoft had implied. Several reports converged on a picture in which Maia 100 served primarily as a learning vehicle for Microsoft's silicon and software teams rather than as a major production accelerator:
Microsoft itself has continued to point to Maia 100 as a deployed accelerator powering Copilot and Azure OpenAI workloads, but the company has not disclosed concrete deployment volumes, performance numbers against NVIDIA H100 or H200, or customer-facing Maia VM products, and external Maia 100 instances have not been generally available on Azure.[^1][^6][^9][^11]
In the wake of Maia 100's limited rollout, Microsoft built a multi-generation roadmap of inference-oriented successors:
The Maia 200 announcement omitted any external Azure VM availability and was framed entirely around Microsoft's internal AI services such as OpenAI API inference, Microsoft Foundry, and the Office 365 family of copilots — a continuation of the internal-only deployment pattern set by Maia 100.[^16]
Maia 100 arrived in a market that had already produced multiple generations of hyperscaler-designed accelerators. Google's TPU line had been in production since 2015; AWS had made its first-generation Trainium accelerator generally available in 2022; and Meta had begun publicly discussing its MTIA accelerator in 2023.[^10][^30] Maia 100 made Microsoft the last of the four major US hyperscalers (Amazon, Google, Meta and Microsoft) to unveil a public-facing in-house AI accelerator.[^5]
Reaction within the technical press was mixed. ServeTheHome, Tom's Hardware and The Next Platform praised the chip's Ethernet-first scale-up fabric, large software-managed SRAM, and aggressive adoption of the OCP microscaling number formats.[^7][^25][^31] Less flattering coverage centered on Maia 100's HBM2E choice. TechRadar wrote that Microsoft "deliberately chose to use old tech for its NVIDIA GPU rival," noting that the chip's HBM2E memory capped it at 64 GB and 1.8 TB/s at a time when NVIDIA H100 was shipping with 80 GB of HBM3 at 3.35 TB/s and NVIDIA H200 would shortly arrive with HBM3e.[^10]
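The practical effect of that bandwidth gap on token generation can be estimated with a memory-bound floor: at batch size 1, each generated token requires streaming the model's weights from HBM roughly once. The Python sketch below applies that rule to the capacity and bandwidth figures quoted in this paragraph; the 30 GB model (about 15 billion parameters in BF16) is a hypothetical example, and the estimate ignores KV-cache traffic, compute time and reuse across batched requests.

```python
ACCELERATORS = {
    # name: (HBM capacity in GB, HBM bandwidth in GB/s), as quoted above
    "Maia 100": (64, 1800),
    "H100": (80, 3350),
}

MODEL_WEIGHTS_GB = 30  # hypothetical: ~15B parameters stored in BF16


def decode_step_floor_ms(weights_gb, bandwidth_gbs):
    """Memory-bound floor for one batch-1 decode step, in milliseconds.

    Assumes every weight byte is read from HBM exactly once per generated
    token and ignores KV-cache traffic, caching and compute time.
    """
    return weights_gb / bandwidth_gbs * 1000.0


for name, (capacity_gb, bandwidth_gbs) in ACCELERATORS.items():
    fits = MODEL_WEIGHTS_GB <= capacity_gb
    ms = decode_step_floor_ms(MODEL_WEIGHTS_GB, bandwidth_gbs)
    print(f"{name}: fits={fits}, >= {ms:.1f} ms/token, <= {1000 / ms:.0f} tokens/s")
```

Under these assumptions the per-token floor scales inversely with bandwidth, so the H100's roughly 1.9x bandwidth advantage translates almost directly into single-stream decode speed.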
Against the leading merchant and hyperscaler accelerators in 2024, Maia 100 occupied a distinct niche. Approximate publicly disclosed figures are summarized below:
Maia 100's defining differentiator is therefore not raw tensor throughput — where it lagged H100 and MI300X — but its very large on-chip SRAM, its Ethernet-based fabric, and Microsoft's tight vertical integration of compiler, runtime and rack hardware.[^7][^9][^25][^31]
By the time Microsoft announced Maia 200 in January 2026, public assessments of Maia 100's commercial significance were uniformly muted. Bloomberg, Tom's Hardware and DCD wrote that Microsoft had effectively used Maia 100 as a first iteration to debug its silicon, packaging, networking and software pipeline rather than as a major production accelerator, and that it was placing its real volume bet on Maia 200, which DIGITIMES reported was expected to ship at more than ten times the volume of Maia 100.[^11][^12][^13][^15] In parallel, Microsoft expanded rather than contracted its commitments to NVIDIA and AMD. At Ignite 2023 the company simultaneously announced new NC H100 v5 and ND H200 v5 GPU instances and the ND MI300X v5 series based on the AMD Instinct MI300X, and it remained one of NVIDIA's largest customers through 2025 and 2026.[^1][^2][^32]
The Maia 100 launch nevertheless had two lasting consequences. First, it established a durable Microsoft AI silicon team and product cadence: by 2026 the company had a multi-year roadmap covering Maia 200, Braga-R and Clea on the AI side and Cobalt 100 and Cobalt 200 on the CPU side, supplemented by the Azure Boost DPU and Azure Integrated HSM.[^13][^16][^23] Second, it cemented Microsoft's strategic choice of an Ethernet-based, OCP-aligned, open-standards scale-up fabric in opposition to NVIDIA's vertically integrated NVLink + InfiniBand stack, a choice Microsoft reaffirmed with Maia 200's "AI Transport Layer" Ethernet design and with its leadership role in the Ultra Ethernet Consortium.[^6][^7][^16][^25]
In that sense, even though the chip itself never powered OpenAI's flagship models at scale, Maia 100's most important legacy may lie not in the silicon it shipped but in the platform — racks, sidekicks, compilers, ONNX integration and Ethernet fabric — that it forced Microsoft to design and that succeeding generations of Azure AI silicon now inherit.