Microsoft MAI
Last reviewed
Jun 3, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 · 2,012 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 3, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 · 2,012 words
Add missing citations, update stale details, or suggest a clearer explanation.
Microsoft MAI is the family of first-party artificial intelligence models built in-house by Microsoft AI, the consumer AI division that Microsoft created in March 2024 and put under Mustafa Suleyman. The name is short for "Microsoft AI." What started in mid-2025 as a pair of experimental models has grown into a full stack of reasoning, coding, image, voice, and transcription systems, most of them announced at the Build 2026 developer conference on June 2, 2026. The push matters because for years Microsoft mostly shipped someone else's models. Its products ran on OpenAI systems, the result of a partnership that began in 2019 and eventually totaled around 13 billion dollars in investment. MAI is Microsoft's attempt to own the model layer itself rather than rent it.
When Suleyman, a co-founder of DeepMind and of Inflection AI, joined Microsoft in March 2024, the company was in an unusual spot. Its OpenAI deal was lucrative, but it also limited what Microsoft could do on its own. Reporting around the partnership described an arrangement that effectively kept Microsoft from pursuing its own artificial general intelligence research and even capped the size of models it could train internally. Suleyman's mandate was to build a real first-party capability anyway, and Microsoft AI spent its first year and a half assembling the talent, data pipelines, and GPU clusters to do it.
The strategic logic is straightforward. Relying on a single outside provider for the core technology in Copilot, Office, Windows, and Azure is a concentration risk, both commercially and operationally. By training its own models, Microsoft can tune them for its products, control costs, and avoid being locked to one vendor's roadmap. Satya Nadella framed the Build 2026 launch as a move from "consuming a frontier model to fully participating at the frontier," and he leaned hard on the word "optionality" throughout the keynote.[1][2] The company has been careful to say this is not a divorce. The OpenAI partnership has been extended, reportedly through 2030, and OpenAI models still ship inside Microsoft products alongside the new MAI ones.[1][3]
The table below lists the MAI models that Microsoft has publicly confirmed. The two 2025 entries were the first proof that the in-house effort could ship; the 2026 entries, unveiled at Build, fill out the family across most major modalities.
| Model | Type | Announced | Key facts |
|---|---|---|---|
| MAI-Voice-1 | Speech generation | Aug 28, 2025 | Generates a full minute of audio in under a second on a single GPU; powers Copilot Daily, Podcasts, and Copilot Labs[4] |
| MAI-1-preview | Foundation LLM | Aug 28, 2025 | First end-to-end-trained Microsoft model; mixture-of-experts; pre- and post-trained on roughly 15,000 NVIDIA H100 GPUs; tested publicly on LMArena[4] |
| MAI-DxO | Medical diagnostic orchestrator | Jun 30, 2025 | Coordinates a virtual panel of model "physicians"; 85.5% accuracy on 304 hard NEJM cases versus about 20% for human doctors[5] |
| MAI-Thinking-1 | Reasoning | Jun 2, 2026 | 35B active / roughly 1T total parameters, sparse MoE, 256K context; trained with zero distillation; matches Claude Opus 4.6 on SWE-Bench Pro[2][6] |
| MAI-Code-1 / MAI-Code-1-Flash | Coding | Jun 2, 2026 | Inference-efficient coding models tuned for GitHub and VS Code; rolling out across all Copilot tiers (Free, Pro, Pro+, Max)[1][6] |
| MAI-Image-2.5 (+ Flash) | Image generation | Jun 2, 2026 | First MAI model for both text-to-image and image-to-image; reported high placements on the Arena image leaderboards[7] |
| MAI-Voice-2 (+ Flash) | Speech generation | Jun 2, 2026 | Successor to MAI-Voice-1; adds 15-plus languages and new voices[7] |
| MAI-Transcribe-1.5 | Speech-to-text | Jun 2, 2026 | Transcription across 43 languages, with streaming support said to be coming[7] |
Microsoft introduced its first two in-house models on August 28, 2025.[4] MAI-1-preview was the headliner, the division's first foundation model trained end to end rather than fine-tuned from something else. It uses a mixture-of-experts design and, according to Microsoft, was pre-trained and post-trained on about 15,000 NVIDIA H100 GPUs. That is a serious but not enormous cluster by frontier standards, which fit the framing at the time: this was a capable consumer-oriented model good at following instructions, not a bid to top every leaderboard. Microsoft put MAI-1-preview on LMArena for public, blind comparison, opened a trusted-tester API, and began rolling it into Copilot for some text tasks.[4]
MAI-Voice-1 launched the same day and, in some ways, made the stronger impression. It is a high-fidelity, expressive speech model that can produce a full minute of audio in under a second on a single GPU, which Microsoft pitched as one of the most efficient speech systems around.[4] It went straight into shipping features, including Copilot Daily and the Podcasts experience, plus a more playful home in Copilot Labs for things like guided meditations and choose-your-own-adventure stories. Voice has been central to Suleyman's vision of an AI "companion," so it is not a coincidence that speech was one of the first two models out the door.
A couple of months earlier, on June 30, 2025, Microsoft AI had published work on MAI-DxO, the Microsoft AI Diagnostic Orchestrator.[5] It is worth treating a little differently from the others. MAI-DxO is not a single trained model but an orchestration system that coordinates multiple language models to act like a panel of physicians, asking follow-up questions, ordering tests, running a cost check, and verifying its own reasoning before committing to a diagnosis. On a benchmark Microsoft built from 304 of the New England Journal of Medicine's hardest case records (the Sequential Diagnosis Benchmark, or SDBench), the orchestrator reached 85.5 percent accuracy when paired with a strong base model, against a mean of roughly 20 percent for 21 practicing physicians on the same cases, and at lower simulated testing cost.[5] Because it is model-agnostic, MAI-DxO is more a preview of the "domain-specific superintelligence" thesis than a product, and it directly informed the medical focus of the team Microsoft would announce that fall.
At Build 2026 on June 2, 2026, Microsoft unveiled seven first-party MAI models at once, the clearest signal yet that the in-house effort had matured from experiment to platform.[1][2][7] The reasoning model, MAI-Thinking-1, drew the most attention. It is a mid-sized sparse mixture-of-experts model with 35 billion active parameters and roughly one trillion total, paired with a 256,000-token context window. Its headline claim is provenance: Microsoft says it was trained from scratch on clean, commercially licensed data with no distillation from any third-party frontier model, and with AI-generated content excluded from pre-training.[6] The team summed up the philosophy as the idea that capabilities should be learned rather than inherited. On benchmarks, MAI-Thinking-1 posts 97.0 percent on AIME 2025 and 94.5 percent on AIME 2026, scores 52.8 percent on SWE-Bench Pro (which Microsoft says puts it level with Claude Opus 4.6 on coding), and was preferred over Claude Sonnet 4.6 in a blind side-by-side of 1,276 tasks run by the data firm Surge.[2][6] It launched in private preview on Microsoft Foundry. The deeper technical detail lives in its own article: MAI-Thinking-1.
Alongside it came the coding line, MAI-Code-1 and a faster MAI-Code-1-Flash, tuned for GitHub and Visual Studio Code and described as inference-efficient. MAI-Code-1-Flash began rolling out across every GitHub Copilot tier (Free, Pro, Pro+, and Max) starting June 2, beginning with a small group of users and widening from there.[1][6] As with the reasoning model, the deep dive belongs in the dedicated MAI-Code-1 article; this overview just situates it in the family.
The rest of the lineup covered the other modalities. MAI-Image-2.5 and a Flash variant are Microsoft's first models to handle both text-to-image and image-to-image generation, landing in PowerPoint, rolling out on OneDrive, and coming to Foundry.[7] MAI-Voice-2 and its Flash variant extend the original voice model to more than 15 additional languages with new voice options, while MAI-Transcribe-1.5 targets state-of-the-art speech-to-text across 43 languages.[7] Taken together, the seven models gave Microsoft a homegrown option in nearly every category where it had previously depended on outside providers.
The research engine behind much of this is the MAI Superintelligence Team, which Microsoft announced on November 6, 2025. Suleyman leads it, joined by Microsoft AI Chief Scientist Karen Simonyan, and the group set out to hire more than 100 researchers across several US cities and London.[8][9] Its stated north star is what Microsoft calls Humanist Superintelligence, or HSI: in the company's words, "incredibly advanced AI capabilities that always work for, in service of, people and humanity more generally."[8] The framing is deliberately narrower than the open-ended AGI race. Microsoft describes HSI as problem-oriented and domain-specific, "not an unbounded and unlimited entity with high degrees of autonomy" but AI that is "carefully calibrated, contextualized, within limits."[8] The team's early focus areas are medical diagnosis (building directly on MAI-DxO), AI companions, and clean energy.
The timing tells its own story. The superintelligence team was announced only after Microsoft and OpenAI restructured their agreement, which reportedly removed the constraints that had kept Microsoft from chasing its own advanced AI and capping model sizes.[9] Freed from those terms, Microsoft could pursue frontier research in-house while keeping the OpenAI relationship intact, which is roughly the "best of both" position Suleyman has described.
A model family is only as useful as the channels it ships through, and for MAI that channel is increasingly Microsoft Foundry (the rebranded Azure AI platform). Most of the 2026 models landed in Foundry first, often in private or public preview, where developers can call them through the same Chat Completions API surface used for other models.[6] Microsoft has positioned Foundry as a model-agnostic layer where MAI models sit next to OpenAI, open-source, and partner models, which is the practical expression of Nadella's "optionality" pitch: customers pick the model that fits the task and the budget rather than being locked to one. The consumer side runs in parallel, with MAI models woven into Copilot, Office apps like PowerPoint, OneDrive, GitHub Copilot, and Windows features.
The reaction has been a mix of genuine respect and healthy skepticism. The "no distillation, clean licensed data" claim for MAI-Thinking-1 is the kind of thing enterprises with compliance and copyright worries actually care about, and if the benchmark numbers hold up under independent testing, a 35-billion-active-parameter model trading blows with the top frontier systems on coding is a real efficiency story. The harder questions are about positioning. Microsoft is still OpenAI's largest backer and biggest distribution partner, so building competing models is a delicate balance, and it is fair to wonder how independent these models really are when the same company keeps extending its OpenAI deal. There is also the usual gap between launch-day benchmarks and lived product quality, which only shows up once millions of Copilot users are leaning on these models day to day. What is clear is the direction: after years of being mainly a distributor of someone else's AI, Microsoft now intends to build its own, top to bottom.