Microsoft Maia 200
Last reviewed
Jun 3, 2026
Sources
13 citations
Review status
Source-backed
Revision
v2 · 2,011 words
Microsoft Maia 200 is a custom artificial intelligence accelerator designed by Microsoft for inference workloads in the Azure cloud. It is the second chip in the Azure Maia line, following the Maia 100 that Microsoft announced in November 2023. Microsoft introduced Maia 200 on January 26, 2026, in a blog post by Scott Guthrie, the executive vice president for Cloud and AI, describing it as an accelerator built to lower the cost of generating AI tokens at scale [1][2]. The chip is built on TSMC 3 nanometer technology and is tuned for low precision inference rather than training, which sets it apart from general purpose accelerators that try to serve both jobs [1][3].
The launch came after a long stretch of reporting about delays. Through 2025, several outlets described a next generation Maia design, codenamed Braga, that had slipped from a planned 2025 deployment into 2026 because of design changes, staffing churn, and late feature requests tied to OpenAI [4][5]. The January 2026 announcement confirmed that the chip reported under the Braga codename had shipped as Maia 200 [6]. As of mid 2026 the chip is running inside Microsoft data centers but is not generally available to Azure customers, so much of what follows is Microsoft's own framing rather than independently benchmarked results.
Microsoft started designing its own data center chips to gain more control over cost, supply, and the way hardware and software fit together. At its Ignite conference in November 2023 the company revealed two parts at once. Maia 100 was an AI accelerator aimed at training and serving large language models, and Cobalt 100 was a 128 core Arm based CPU for general cloud compute [7]. Microsoft pitched both as pieces of a systems approach, where the company designs the silicon, the server, the rack, the cooling, and the network together instead of buying parts off the shelf [7].
The motivation is money and supply. Renting and buying NVIDIA GPUs at the scale Azure needs is expensive, and demand has often outrun supply. A first party inference chip lets Microsoft run its own services, and the heaviest external workload it hosts, on hardware it controls. Maia 200 is the clearest expression of that strategy so far, because Microsoft positioned it specifically against the cost of inference rather than as a general replacement for GPUs [1][3].
Microsoft and several technical outlets published a largely consistent set of specifications on the day of the announcement. The chip is built on TSMC's 3 nanometer process. Microsoft's own materials describe it as carrying over 140 billion transistors, while The Register reported 144 billion and TechCrunch wrote that it has over 100 billion, so the exact figure depends on the source [1][3][8]. The memory subsystem uses 216 GB of HBM3e running at 7 TB/s, paired with 272 MB of on chip SRAM that The Register reported is split into compute and transfer partitions [1][3]. Tom's Hardware noted that the on package memory is spread across six HBM3e stacks [9].
For compute, Microsoft states that each chip delivers more than 10 petaFLOPS of FP4 performance and more than 5 petaFLOPS of FP8 performance inside a 750 watt system on chip power envelope [1][2]. The hardware tensor cores are built for low precision formats. The Register reported that Maia 200 supports FP8, FP6, and FP4 in hardware, which fits its inference focus [3]. Chips connect to each other at 2.8 TB/s of bidirectional bandwidth, and Microsoft says clusters scale over standard Ethernet to as many as 6,144 accelerators, which The Register translated to roughly 61 exaFLOPS of aggregate FP4 compute and about 1.3 petabytes of HBM3e across a full cluster [1][3].
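The cluster totals quoted above follow from multiplying the per chip figures by the maximum cluster size. The sketch below reproduces that arithmetic. It assumes the vendor's "more than" figures are taken at their stated floor, uses decimal units (1 PB = 1,000,000 GB), and ignores any scaling losses across the network. The function and constant names are illustrative and do not come from Microsoft or The Register.

```python
# Per chip figures as stated by Microsoft (vendor claims, taken at their floor).
FP4_PFLOPS_PER_CHIP = 10    # "more than 10 petaFLOPS" of FP4
HBM_GB_PER_CHIP = 216       # HBM3e capacity per chip
MAX_CLUSTER_CHIPS = 6144    # largest cluster size Microsoft describes


def cluster_totals(chips: int = MAX_CLUSTER_CHIPS) -> dict:
    """Scale per chip figures linearly by chip count (no efficiency loss modeled)."""
    return {
        # 1 exaFLOPS = 1,000 petaFLOPS
        "fp4_exaflops": chips * FP4_PFLOPS_PER_CHIP / 1_000,
        # Decimal units: 1 petabyte = 1,000,000 gigabytes
        "hbm_petabytes": chips * HBM_GB_PER_CHIP / 1_000_000,
    }


totals = cluster_totals()
print(f"Aggregate FP4: {totals['fp4_exaflops']:.2f} exaFLOPS")
print(f"Aggregate HBM3e: {totals['hbm_petabytes']:.2f} PB")
```

Under these assumptions the products come to about 61.4 exaFLOPS and 1.33 PB, in line with the rounded figures The Register reported. These are peak totals on paper, not measured throughput.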
The table below lists the figures Microsoft and reporters published. Treat them as vendor stated specifications, since there is no independent benchmarking yet.
| Attribute | Reported value | Source |
|---|---|---|
| Process node | TSMC 3 nm (N3) | [1][3] |
| Transistors | Over 140 billion (Microsoft); 144 billion (The Register); over 100 billion (TechCrunch) | [1][3][8] |
| HBM memory | 216 GB HBM3e at 7 TB/s | [1][3] |
| On chip SRAM | 272 MB | [1][3] |
| FP4 compute | Over 10 petaFLOPS | [1][2] |
| FP8 compute | Over 5 petaFLOPS | [1][2] |
| Power envelope | 750 W SoC TDP | [1][3] |
| Chip to chip bandwidth | 2.8 TB/s bidirectional | [1][3] |
| Cluster scale | Up to 6,144 accelerators over Ethernet | [1][3] |
| Efficiency claim | About 30 percent better performance per dollar than current fleet hardware | [1][10] |
Microsoft also said it is previewing a Maia software development kit with PyTorch integration, a Triton compiler, an optimized kernel library, and access to a low level programming language for the chip [1]. The company has invited developers, academics, and frontier AI labs to try the kit, which is a softer form of access than general availability [8].
Maia 200 is meant to serve Microsoft's own AI products and the OpenAI models that run on Azure. Microsoft says the chip is already handling inference and that it supports Copilot, and the company tied it to serving OpenAI's GPT-5.2 family along with Microsoft Foundry models [1][8]. CNBC reported that the chip began running in Microsoft data centers and that, on an April 2026 earnings call, chief executive Satya Nadella said Maia 200 offers over 30 percent improved tokens per dollar compared with the latest silicon in the company's fleet [10].
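A gain in tokens per dollar is not the same size as the resulting drop in cost per token, so the 30 percent figure is easy to misread. The sketch below converts one to the other. It assumes cost per token is simply the reciprocal of tokens per dollar, and it says nothing about the baseline hardware, which Microsoft has described only as the latest silicon in its fleet. The function name is hypothetical.

```python
def cost_per_token_change(tokens_per_dollar_gain: float) -> float:
    """Fractional change in cost per token for a fractional gain in tokens per dollar.

    Assumes cost per token = 1 / (tokens per dollar), so a gain g scales the
    cost by 1 / (1 + g).
    """
    if tokens_per_dollar_gain <= -1.0:
        raise ValueError("gain must be greater than -1.0")
    return 1.0 / (1.0 + tokens_per_dollar_gain) - 1.0


# Vendor stated figure: over 30 percent more tokens per dollar.
change = cost_per_token_change(0.30)
print(f"Cost per token change: {change:.1%}")  # prints "Cost per token change: -23.1%"
```

Read this way, a 30 percent improvement in tokens per dollar corresponds to roughly a 23 percent lower cost per token, assuming the claim holds on a given workload.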
Deployment so far is limited to a few regions. The Register reported that Maia 200 is running in Microsoft's US Central region near Des Moines, Iowa, with the US West 3 region near Phoenix, Arizona, planned next [3]. Microsoft has framed the rollout as serving internal and partner workloads first. The strongest signal of outside interest came in May 2026, when The Information reported, as relayed by CNBC and Bloomberg, that Anthropic was in early talks with Microsoft about using Maia chips. No deal had been signed, and Microsoft had not opened the chip to Azure customers at that point [10][11].
On May 21, 2026, The Information reported that Anthropic was in early stage talks with Microsoft to rent Azure servers running Maia 200 and use them to serve its Claude models. CNBC and Data Center Dynamics relayed the report the same week [11][13]. The discussions, as described, would move a subset of Claude inference jobs onto Maia instances in Azure data centers. Inference is the cost heavy stage of answering user prompts, as opposed to the separate work of training new models [11][13]. The talks were described as unsigned and exploratory, and Microsoft had not opened Maia to general Azure customers, so any outside deployment remained conditional at the time of reporting.
What makes the report notable, according to the coverage, is that it would be the first time Maia silicon ran a workload outside Microsoft itself. Until then, Maia 200 had served only Microsoft's own products and the OpenAI models hosted on Azure, so an Anthropic deployment would make Anthropic the first external customer and give the chip its first test on a frontier model that Microsoft did not build, under latency and reliability targets set by another company [11][13]. Several outlets framed this as the clearest outside validation the program had drawn so far, while cautioning that early stage AI infrastructure talks often do not close.
Reporting tied the discussions to Anthropic's appetite for compute. Demand for the Claude assistant and the Claude Code programming tool grew sharply through early 2026, which pushed Anthropic to look for inference capacity across more than one supplier [11][13]. The arrangement, if it closed, would sit alongside Anthropic's existing compute relationships rather than replace them, and it would give Microsoft a paying customer for Maia beyond its own services. As of early June 2026 neither company had confirmed a signed agreement, and the figures above for Maia's efficiency remained Microsoft's own.
Maia 200 lands in a market where the large cloud providers are all building inference silicon to cut their dependence on NVIDIA. Microsoft drew direct comparisons in its launch materials. It claimed three times the FP4 performance of AWS Trainium 3, Amazon's third generation accelerator, and FP8 performance above Google's seventh generation TPU, the chip Google calls Ironwood [1][8]. These are Microsoft's numbers and have not been verified by outside testing, so they are best read as marketing claims rather than settled results.
Against NVIDIA the picture is more measured. The Register framed its coverage around the idea that Maia 200 promises the performance level of NVIDIA's Blackwell GPUs at about two thirds of the power [3]. Even so, the same report noted that Microsoft is not expected to slow its NVIDIA GPU purchases, in part because NVIDIA's Vera Rubin generation is promised to deliver a large jump in inference performance [3]. Maia 200 is an inference part, so it does not replace the GPUs Microsoft uses for training. The realistic role is to take a share of Microsoft's own serving load and improve the economics of that work, not to displace NVIDIA across Azure.
Several things about Maia 200 are still open. The chip is not generally available to Azure customers, so independent users cannot yet test it or buy capacity on it [10][11]. The performance figures all come from Microsoft, and no third party benchmark has confirmed the FP4 and FP8 numbers or the comparisons to Trainium and TPU [1][8]. The transistor count is reported inconsistently across sources, which is a small sign of how much detail still comes secondhand [1][3][8].
Pricing, broad regional availability, and any committed roadmap beyond the first two regions are unconfirmed. The relationship to the earlier Braga reporting is now clear, but the production ramp and yield on TSMC's 3 nanometer line are still the subject of reporting rather than disclosed numbers [6][12]. The Anthropic discussions were described as early stage and unsigned, so the prospect of Anthropic becoming Maia's first external customer is reported but not confirmed [11][13]. In short, Maia 200 is a real, announced, and deployed chip, but its place in the wider market still rests largely on claims that the industry has not yet checked.