Mistral Medium 3.5
Last reviewed
Jun 2, 2026
Sources
8 citations
Review status
Source-backed
Revision
v1 · 1,728 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 2, 2026
Sources
8 citations
Review status
Source-backed
Revision
v1 · 1,728 words
Add missing citations, update stale details, or suggest a clearer explanation.
Mistral Medium 3.5 is an open-weight large language model released by Mistral AI on 28 April 2026. The company describes it as its "first flagship merged model": a single dense network of about 128 billion parameters that folds instruction-following, reasoning, and coding into one set of weights, replacing the separate models Mistral had previously shipped for each of those jobs.[1][2] It became the default model in Le Chat, Mistral's consumer assistant, and in the company's Vibe coding tool, and it is published under a modified MIT license on Hugging Face.[1][3]
The "merged" framing is the model's main selling point. Where Mistral had run a dedicated reasoning model (Magistral) and a dedicated coding model (Devstral 2) alongside its general Medium line, Medium 3.5 collapses those capabilities into a single checkpoint whose reasoning depth can be turned up or down per request.[2][4]
Mistral Medium 3.5 is a dense, multimodal transformer with roughly 128 billion parameters and a context window of 256,000 tokens.[1][3] It accepts text and image input and returns text, and it exposes a per-request control over how much reasoning it spends on a query, so the same weights can produce a fast chat reply or grind through a multi-step agentic task.[1][3] Because the model is dense rather than a mixture of experts, every parameter is active for every generated token, which Mistral and several reviewers contrast with the sparse architectures used by some competing systems.[4]
The model is distributed as open weights. Mistral published the checkpoint on Hugging Face under a "modified MIT" license, a permissive license that allows commercial use and fine-tuning but adds carve-outs for very large companies.[1][5] In Mistral's own product surfaces it is reachable through Le Chat, the Vibe CLI, and the paid API; it is also offered through third-party platforms including NVIDIA's hosted endpoints and Microsoft Copilot Studio.[1][6]
Mistral AI is a French AI company founded in 2023 that has built much of its reputation on open-weight releases, from Mistral 7B and Mixtral through the closed Mistral Large frontier line. The "Medium" tier sits below Large and is positioned as a cost-efficient workhorse rather than the absolute top of the range.
Mistral Medium 3 opened that tier in 2025 as a model aimed at strong performance at a fraction of the price of frontier systems. Medium 3.5 is the consolidation point of the line: secondary coverage describes it as superseding the intermediate Medium 3.1 release while also retiring Magistral and the Codestral-derived Devstral 2 coding model into the same weights.[2][4] Mistral's own announcement is narrower, stating that Medium 3.5 becomes the default in Le Chat and replaces Devstral 2 in the Vibe CLI; the broader claim that it also displaces Medium 3.1 and Magistral in Le Chat comes from press reporting and the Ollama model listing rather than the launch post.[1][2][3]
Mistral's documentation dates the model to 28 April 2026 and tags it version v26.04, the date that the model-card slug mistral-medium-3-5-26-04 also encodes.[7] The weights and the API listing went up around the same time, and most early coverage placed the launch in late April or the first days of May 2026.[2][4] A follow-up product post, "Remote agents in Vibe. Powered by Mistral Medium 3.5," tied the model to two new features, asynchronous cloud coding agents in Vibe and a "Work mode" preview in Le Chat, and that post carries a May 2026 date.[1] Microsoft listed the model in Copilot Studio later in May.[6]
| Attribute | Value | Source |
|---|---|---|
| Parameters | ~128 billion (dense) | [1][3] |
| Architecture | Dense transformer, multimodal | [1][3] |
| Context window | 256,000 tokens | [1][3] |
| Modalities | Text and image input, text output | [3] |
| Reasoning control | reasoning_effort per request (e.g. none / high) | [1][3] |
| Vision encoder | Retrained from scratch for variable image sizes and aspect ratios | [1] |
| License | Modified MIT (open weights) | [1][5] |
| Self-hosting | Runs on as few as four GPUs | [1][2] |
The single most repeated technical detail is that Medium 3.5 is dense, not sparse. Reviewers note that all 128 billion parameters load and activate for every token, which makes the memory footprint predictable but means the model cannot lean on the conditional compute tricks of a mixture-of-experts design.[4] Mistral says the model self-hosts on as few as four GPUs, which several writers read as four H100-class accelerators at reduced precision.[1][2] The vision stack was rebuilt: Mistral states it trained the vision encoder "from scratch to handle variable image sizes and aspect ratios," making Medium 3.5 a genuine multimodal model rather than a text model with a bolted-on adapter.[1]
The headline capability is the merge itself. A single request can be answered with reasoning switched off for low latency, or with reasoning set high for complex prompts and agentic runs, all from the same weights.[1][3] Mistral pitches the model at long-horizon work: calling multiple tools reliably, sustaining multi-step coding sessions, and emitting structured output that downstream code can parse.[1][6] Native function calling is supported, which is what lets Le Chat's "Work mode" drive tools in parallel until a task is finished.[1]
On the product side, Medium 3.5 powers Vibe's remote agents, cloud-hosted coding sessions that can be spawned from the CLI or from Le Chat and can take over a local session, and the Work mode preview in Le Chat that chains tool calls across a multi-step job.[1]
Mistral published only a small set of headline scores at launch, both on agentic tasks, and did not release full results for general-knowledge or reasoning suites such as MMLU-Pro or GPQA Diamond; numbers for those circulating in the community are not Mistral-published and should be treated with caution.[1][5]
| Benchmark | Score | Notes | Source |
|---|---|---|---|
| SWE-Bench Verified | 77.6% | Real-world software-engineering bug fixes; reported ahead of Devstral 2 and Qwen3.5 397B A17B | [1][8] |
| τ³-Telecom | 91.4 | Multi-turn agentic tool calling | [1][3] |
| MMLU-Pro / GPQA Diamond | Not disclosed at launch | Mistral did not publish these in the launch post | [5] |
The SWE-Bench Verified figure of 77.6% is the number most often cited, and Mistral presents it as beating its own retired Devstral 2 coding model along with Qwen3.5 397B A17B.[1][8] Independent writers were more measured: TechSifted and others reported that the model leads on coding and telecom-style agent benchmarks but lags Claude badly on banking-domain tasks, and that it sits below proprietary frontier models overall while being roughly competitive with Claude Sonnet 4.5 on some coding tests.[4]
Mistral ships Medium 3.5 through its own assistant and developer surfaces and through several partners. In Le Chat it is the default model, with Work mode available on the Pro, Team, and Enterprise plans; in Vibe it powers the CLI and the new remote agents; and the open weights are downloadable from Hugging Face for self-hosting.[1] The model is also offered for prototyping on NVIDIA-accelerated endpoints and as an NVIDIA NIM container, and it appears in Microsoft Copilot Studio's model lineup for agent builders.[1][6]
| Channel | Access | Notes | Source |
|---|---|---|---|
| Le Chat | Default model; Work mode (Preview) | Pro, Team, Enterprise plans for Work mode | [1] |
| Vibe CLI | Default model; remote cloud agents | Replaces Devstral 2 | [1] |
| API | $1.50 / 1M input tokens | Pay-per-token | [1][4] |
| API | $7.50 / 1M output tokens | Pay-per-token | [1][4] |
| Hugging Face | Open weights, modified MIT | Self-host (~4 GPUs) | [1][5] |
| NVIDIA / Copilot Studio | Hosted endpoints, NIM, Copilot Studio | Third-party hosting | [1][6] |
API pricing of $1.50 per million input tokens and $7.50 per million output tokens is consistent across the announcement and independent coverage.[1][4]
Reaction split along familiar lines. The consolidation drew praise as an operational simplification: one model to deploy and bill instead of separate reasoning, coding, and general checkpoints, with weights you can run on your own hardware.[2][4] Mistral's continued commitment to open weights was treated as a notable counterpoint to the closed frontier labs.[5]
The criticism centered on price and on the "open" label. Some developers argued that $1.50 in / $7.50 out is steep for a 128-billion-parameter model relative to comparably sized open competitors.[2] Others pushed on the license: a "modified MIT" license that adds use-case and large-company restrictions is, as one reviewer put it, harder to justify as genuinely open in regulated industries than a plain permissive license would be.[4] On capability, independent testing found the model strong on coding and agentic tool use but uneven across domains and below the proprietary frontier overall.[4]
Mistral's published evaluation is thin: only two agentic benchmarks were disclosed at launch, so the model's general-knowledge and hard-reasoning ceilings are harder to triangulate than its coding numbers, and the community scores filling that gap are not vendor-verified.[5] The dense 128-billion-parameter design gives a predictable but non-trivial hardware footprint, with self-hosting pitched at four GPUs rather than something a single consumer card can run.[1][2] Domain coverage is uneven, with reported weakness on banking-style tasks.[4] And the "modified MIT" license, while permitting most commercial use, carries carve-outs that limit how cleanly the weights can be called open.[4][5] At launch the Le Chat Work mode was labeled a preview rather than a finished feature.[1]