MAI-Code-1
Last reviewed
Jun 3, 2026
Sources
7 citations
Review status
Source-backed
Revision
v1 · 1,581 words
MAI-Code-1 is the name commonly used for Microsoft's first in-house coding model, which the company shipped as MAI-Code-1-Flash at the Microsoft Build 2026 developer conference on June 2, 2026. It is a text-to-text model built end to end by Microsoft AI for fast, low-cost assistance inside GitHub Copilot, and it began rolling out the same day to Copilot users in Visual Studio Code. MAI-Code-1-Flash is one of seven models in the MAI family that Microsoft trained from scratch without distilling from any third-party system, a move the company framed as a step toward reducing its dependence on OpenAI.[1][2][3]
A note on naming first. Some coverage refers to a model called "MAI-Code-1," but Microsoft has only released the Flash variant. There is no separately announced larger "MAI-Code-1" base model as of June 2026. The "Flash" suffix follows Microsoft's house convention for its smaller, faster, cheaper models, as with MAI-Image-2.5 Flash.[2][4]
MAI-Code-1-Flash is built specifically for coding agents and interactive developer help: code generation and completion, repository question answering, refactoring, and tool-using "agentic" tasks that run inside a real editor. According to the official model card, it uses a transformer architecture with self-attention and sparse Mixture-of-Experts layers, supports a 256,000-token context window, and takes text in and produces text out. Its training cut-off is December 2025, and Microsoft says it was trained between March and May 2026 on Microsoft infrastructure using carefully curated, appropriately licensed data.[4][5]
The detail Microsoft emphasizes most is how the model was trained, not just what it scored. MAI-Code-1-Flash started from the mid-training checkpoint of MAI-Thinking-1, Microsoft's first in-house reasoning model, then went through supervised fine-tuning and a large reinforcement-learning stage run across more than 150,000 environments. Crucially, the company says it trained and evaluated the model directly inside the GitHub Copilot production harness, the same tool-calling and verification loop that serves real users, rather than against a stripped-down benchmark setup. The idea is that the model learns to interact with the surrounding tools the way it actually will in production. It also uses what Microsoft calls adaptive solution-length control: it stays terse on simple requests and spends more reasoning budget on hard ones.[4][5]
The reported parameter count is inconsistent across sources. The MAI family announcement and most press coverage describe MAI-Code-1-Flash as a 5-billion-parameter model. The official model card, however, lists its size as 137B and describes it as a sparse Mixture-of-Experts system. Microsoft has not said whether those figures refer to active parameters, total parameters, or different configurations, so the two numbers sit side by side without an official reconciliation.[2][4][6]
The likeliest reading, given the sparse MoE design, is that the model has roughly 137 billion total parameters with only a small fraction (plausibly around 5 billion) active per token, which would explain both numbers and the model's low serving cost. But Microsoft has not confirmed that interpretation, so it should be treated as inference rather than fact. The table below gives both figures as reported.
| Attribute | Reported value | Source |
|---|---|---|
| Developer | Microsoft (Microsoft AI) | Model card[4] |
| Release date | June 2, 2026 (including EU) | Model card[4] |
| Architecture | Transformer with sparse Mixture-of-Experts layers | Model card[4] |
| Parameters | 137B (model card); ~5B (family announcement) | Conflicting[2][4][6] |
| Context length | 256,000 tokens | Model card[4] |
| Training window | March 2026 to May 2026 | Model card[4] |
| Training cut-off | December 2025 | Model card[4] |
| Built from | MAI-Thinking-1 mid-training checkpoint | Model card[4] |
| Pricing | "To be finalized" | Model card[4] |
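The arithmetic behind the active-versus-total reading of the two parameter figures can be sketched in a few lines of Python. This is an illustration under a stated assumption, not a confirmed description of the model: it treats 137B as the total parameter count and 5B as the parameters active per token, which Microsoft has not confirmed. The function name `active_fraction` is hypothetical.

```python
# Reported figures; the active-vs-total interpretation is an unconfirmed assumption.
TOTAL_PARAMS_B = 137.0   # size listed in the model card (billions)
ACTIVE_PARAMS_B = 5.0    # size in the family announcement (billions), assumed active per token


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used per token in a sparse MoE model."""
    if total_b <= 0 or active_b < 0 or active_b > total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share per token: {fraction:.1%}")  # prints "Active share per token: 3.6%"
```

Under that reading, roughly one parameter in 27 would be exercised for any given token, which would be consistent with a low serving cost for a model of that total size.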
Microsoft positions MAI-Code-1-Flash against Claude Haiku 4.5, Anthropic's small, fast model, and reports that its own model leads across the coding benchmarks the company ran. The headline figure is SWE-Bench Pro, where the model card lists a 51.2% pass rate against 35.2% for Claude Haiku 4.5, a margin of 16 percentage points. Microsoft also reports wins on SWE-Bench Verified, SWE-Bench Multilingual, and Terminal Bench 2, in each case using fewer tokens per completed task. The company says the model solves harder problems with up to 60% fewer tokens than comparable models, which is the basis for its price-to-performance claim.[1][4][5]
All of these numbers come from Microsoft's own runs inside its Copilot harness, with both models evaluated under the same settings. They have not been independently reproduced, and the model card itself notes the Claude figures are from Microsoft's internal benchmark system. The coding results, as published in the model card, are below.[4]
| Benchmark | MAI-Code-1-Flash pass rate (%) | MAI-Code-1-Flash avg tokens (K) | Claude Haiku 4.5 pass rate (%) | Claude Haiku 4.5 avg tokens (K) |
|---|---|---|---|---|
| SWE-Bench Verified | 71.6 | 10.8 | 66.6 | 27.3 |
| SWE-Bench Pro | 51.2 | 28.0 | 35.2 | 29.8 |
| SWE-Bench Multilingual | 65.5 | 15.3 | 62.7 | 17.2 |
| Terminal Bench 2 | 54.8 | 21.6 | 41.6 | 25.0 |
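The margins and token savings implied by the table can be computed directly. The sketch below uses only the figures Microsoft published; the dictionary layout and the `compare` helper are illustrative names, and the token-saving metric (one minus the ratio of average tokens) is one reasonable way to read the table rather than Microsoft's own definition.

```python
# Published figures per benchmark:
# (MAI pass rate %, MAI avg tokens K, Haiku pass rate %, Haiku avg tokens K)
RESULTS = {
    "SWE-Bench Verified": (71.6, 10.8, 66.6, 27.3),
    "SWE-Bench Pro": (51.2, 28.0, 35.2, 29.8),
    "SWE-Bench Multilingual": (65.5, 15.3, 62.7, 17.2),
    "Terminal Bench 2": (54.8, 21.6, 41.6, 25.0),
}


def compare(mai_pass, mai_tokens, haiku_pass, haiku_tokens):
    """Return (pass-rate margin in percentage points, fractional token saving)."""
    margin = round(mai_pass - haiku_pass, 1)
    saving = 1.0 - mai_tokens / haiku_tokens
    return margin, saving


SUMMARY = {name: compare(*row) for name, row in RESULTS.items()}

for name, (margin, saving) in SUMMARY.items():
    print(f"{name}: +{margin:.1f} points, {saving:.1%} fewer tokens")
```

On these numbers the largest token saving, about 60%, is on SWE-Bench Verified, while the largest pass-rate margin, 16 points, is on SWE-Bench Pro, where the token saving is only about 6%.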
Beyond coding, the model card reports stronger results on instruction following (an IF Bench suite averaging 75.0 versus 46.1 for Claude Haiku 4.5) and a set of reasoning evaluations including AIME 2026, GPQA Diamond, and an agentic tool-use benchmark. Microsoft also cited an 85.8% adjusted-accuracy result on an internal adversarial coding benchmark spanning 186 questions across 34 categories. The token-usage columns matter as much as the pass rates here: Microsoft's pitch is that a small model can match or beat its rivals while being cheaper to run, so shorter solutions at equal quality are the point.[1][4][5]
MAI-Code-1-Flash did not arrive alone. At Build 2026 the Microsoft AI group, led by CEO Mustafa Suleyman, unveiled seven models trained from scratch, including the MAI-Thinking-1 reasoning model, the MAI-Image-2.5 image models, MAI-Voice-2 for speech, and MAI-Transcribe-1.5 for transcription. Suleyman described the launch as being "about long term self-sufficiency for Microsoft and our partners." CNBC and GeekWire both read it the same way: as Microsoft's most concrete move yet to lessen its reliance on OpenAI and to lower costs for developers and enterprises.[1][2][3]
That framing has a specific backdrop. Microsoft and OpenAI renegotiated their partnership in 2026, and Microsoft has been steadily standing up its own model line under the MAI banner ever since. Running these models on Microsoft's own Azure infrastructure lets the company avoid paying a partner for inference and pass some of that saving along. Suleyman has said internally that after tuning its models for enterprise customers such as McKinsey, Microsoft was able to beat OpenAI's GPT-5.5 on those workloads at roughly ten times better cost efficiency, a claim that, like the benchmark numbers, comes from Microsoft and has not been independently verified.[1][2]
For Copilot specifically, a small in-house model is attractive because Copilot serves an enormous volume of routine, latency-sensitive requests. A cheap, fast model that Microsoft controls end to end, trained on the exact harness it will run in, is well suited to that high-frequency tier. It does not need to be the smartest model in the picker to be the most economical default for everyday completions and lightweight agentic edits.[4][5]
At launch MAI-Code-1-Flash was available only through GitHub Copilot in Visual Studio Code, starting with a fraction of individual users and expanding over the following weeks. It is rolling out across the Copilot Free, Pro, Pro+, and Max plans. Developers can pick it in the VS Code model picker, and Copilot's automatic model router (the "Auto" picker) may route eligible tasks to it without any manual selection or extra setup. Microsoft says support for GitHub Copilot CLI is planned for a later phase, and any API release would come with its own documentation.[4][5][7]
Several outlets reported that the model is also reachable through third-party inference providers including Fireworks AI, Baseten, and OpenRouter, which would let developers use it outside the Microsoft and GitHub stack. Pricing on those channels had not been published at launch, and the model card lists Copilot pricing as still "to be finalized." Some reporting cited indicative rates of $0.75 per million input tokens and $4.50 per million output tokens, with cached input at $0.075 per million, but those figures were described as not yet final.[2][4][7]
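For a rough sense of scale, the indicative rates can be turned into a per-request cost estimate. The sketch assumes the reported, non-final rates apply as stated and that cached input tokens are billed at the cached rate in place of the standard input rate; the function name and the token counts in the example are hypothetical.

```python
# Indicative, non-final rates from press reports (USD per million tokens).
INPUT_RATE = 0.75
OUTPUT_RATE = 4.50
CACHED_INPUT_RATE = 0.075


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Estimate one request's cost in USD.

    cached_input_tokens is the portion of input_tokens served from cache
    (an assumed billing model, not a documented one).
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")
    uncached = input_tokens - cached_input_tokens
    total = (uncached * INPUT_RATE
             + cached_input_tokens * CACHED_INPUT_RATE
             + output_tokens * OUTPUT_RATE)
    return total / 1_000_000


# Hypothetical request: 200K input tokens and 28K output tokens.
cold = estimate_cost(200_000, 28_000)
warm = estimate_cost(200_000, 28_000, cached_input_tokens=150_000)
print(f"no cache: ${cold:.3f}; 150K input tokens cached: ${warm:.3f}")
```

Because the output rate is six times the input rate, output length tends to dominate the estimate once most of the input is cached, which is why shorter solutions translate directly into lower cost under these figures.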
The model card lists English as the supported language, even though the model posts strong numbers on SWE-Bench Multilingual, so coverage of non-English natural-language interaction may be narrower than the multilingual coding result suggests. As with any large language model, Microsoft cautions that generated code can be wrong and should be reviewed and tested before it is relied on.[4]