Mistral Small 4
Last reviewed
Sources
16 citations
Review status
Source-backed
Revision
v1 · 1,864 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Sources
16 citations
Review status
Source-backed
Revision
v1 · 1,864 words
Add missing citations, update stale details, or suggest a clearer explanation.
Mistral Small 4 is an open-weight large language model released by the French artificial intelligence company Mistral AI on March 16, 2026 under the Apache 2.0 license. It marks a sharp break with earlier members of the Mistral Small line, which were dense networks of roughly 24 billion parameters. Mistral Small 4 is instead a sparse Mixture-of-Experts (MoE) model with 119 billion total parameters and about 6.5 billion active per token. Mistral describes it as the first model in its catalogue to fold the abilities of three previously separate specialist lines into one general-purpose model: Magistral for step-by-step reasoning, Pixtral for vision and multimodal input, and Devstral for agentic coding. The model is served through Mistral's API under the identifier mistral-small-2603 and ships with a configurable reasoning control, a 256K-token context window, and native image input.
Mistral Small 4 continues the company's practice of shipping capable open-weight models that can be downloaded, fine-tuned, and self-hosted at no licensing cost. The headline change from the 3.x generation is structural. Earlier Small models (3.1 and 3.2) were dense 24 billion parameter transformers capped at a 128K context window. Mistral Small 4 moves to a granular Mixture-of-Experts design with 119 billion total parameters but only about 6.5 billion active on any given token, and it doubles the context window to 256K. Despite the much larger total parameter count, the "Small" designation persists for two reasons: the per-token compute is set by the active parameters (fewer than the 24 billion of the previous dense models), and the lineage runs directly from the older Small releases. When its reasoning control is turned off, Mistral says the model behaves like the prior Mistral-Small-3.2-24B chat model.
The defining feature is a single per-request control, reasoning_effort, that lets a developer switch the same weights between a fast instruct mode and a deliberate reasoning mode. Setting reasoning_effort to "none" produces quick, lightweight answers comparable to Mistral Small 3.2, while "high" produces long chain-of-thought output with verbosity comparable to Mistral's earlier Magistral-Small-2509 reasoning model. This hybrid behavior, rather than any single benchmark record, is what Mistral presents as the model's main selling point.
Through 2025 and early 2026 Mistral maintained several specialist small-model families in parallel: Magistral for reasoning, Pixtral for vision and document understanding, and Devstral for coding agents, alongside the general Mistral Small instruct models. Mistral Small 4 is positioned as the convergence point. The official announcement calls it "the first Mistral model to unify the capabilities of our flagship models, Magistral for reasoning, Pixtral for multimodal, and Devstral for agentic coding, into a single, versatile model."
It is worth being precise about what that means. Mistral Small 4 is a newly trained MoE model whose training was designed to cover all of these capability areas at once. It is not a literal weight-merge of the three existing specialist models, and the announcement does not describe it that way. The framing is about capabilities and about deployment convenience: instead of standing up separate endpoints for reasoning, vision, and coding, a team can run one model and select behavior per request. Several third-party write-ups summarized this as "one model to replace three" or as collapsing three deployments into a single endpoint. As of this writing Mistral has not publicly stated that the standalone Magistral, Pixtral, or Devstral lines are being discontinued, so the accurate description is consolidation of capabilities into a new model rather than a confirmed retirement of the older families. The vision capability associated with Pixtral is delivered through a multimodal encoder that accepts image input alongside text; the reasoning capability associated with Magistral is exposed through the reasoning_effort control; and the coding and agentic capability associated with Devstral is covered by the model's native function calling, JSON output, and tool-use behavior.
Mistral Small 4 uses a granular Mixture-of-Experts architecture with 128 experts, of which 4 are activated for each token. Mistral reports 119 billion total parameters with about 6.5 billion active per token; the announcement cites roughly 6 billion active in the expert layers, or about 8 billion when the embedding and output layers are included. The context window is 256K tokens (262,144). The model accepts text and image input and produces text output, supports native function calling and structured JSON output, and honors a system prompt. Mistral lists broad multilingual coverage, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic, among others. All specifications here are as reported by Mistral.
Because the active-parameter count is low, Mistral emphasizes efficiency. Relative to Mistral Small 3, the company cites about a 40 percent reduction in end-to-end completion time in a latency-optimized configuration and roughly three times the requests per second in a throughput-optimized configuration. The move from a dense 24 billion parameter design to the sparse MoE design is what enabled both the larger context window and these throughput gains.
| Specification | Detail (per Mistral) |
|---|---|
| Developer | Mistral AI |
| Release date | March 16, 2026 |
| API identifier | mistral-small-2603 |
| Hugging Face repo | mistralai/Mistral-Small-4-119B-2603 |
| Architecture | Mixture-of-Experts, 128 experts, 4 active per token |
| Total parameters | 119 billion |
| Active parameters | about 6.5 billion per token |
| Context window | 256K tokens (262,144) |
| Modalities | Text and image input, text output |
| Reasoning control | reasoning_effort = none or high |
| License | Apache 2.0 |
| List price (API) | 0.15 USD per million input tokens, 0.60 USD per million output tokens |
Mistral's headline claim is that Mistral Small 4 matches or surpasses OpenAI's open-weight GPT-OSS 120B across reasoning, coding, and long-context tasks while generating noticeably shorter outputs, which lowers cost and latency at equal quality. On the AA-LCR long-context reasoning evaluation the model reportedly reaches a score of 0.72 using about 1,600 characters of output, where a comparable Qwen 3.5 result is cited at roughly 5,800 to 6,100 characters. On LiveCodeBench it is said to outperform GPT-OSS 120B while producing about 20 percent less output, and on AIME 2025 it is reported to match or exceed GPT-OSS 120B, again with shorter generations. On GPQA Diamond the model scores about 57.6 with reasoning turned off and about 71.2 with reasoning set to high, illustrating the gain from the deliberate mode. For multimodal understanding, early third-party testing reports an MMMU-Pro score near 60, above Mistral Small 3.2 and Mistral Medium 3.1. A widely cited third-party guide also lists an MMLU-Pro figure around 78.
These numbers should be read with care. The reasoning, coding, and long-context comparisons against GPT-OSS 120B come from Mistral's own announcement, and several of the peer figures (GPQA, MMMU-Pro, MMLU-Pro) appear in independent write-ups rather than a single audited table. The recurring theme across sources is efficiency at parity: similar accuracy to larger or competing open models such as GPT-OSS 120B, Qwen 3.5, and Gemma 4, but with fewer active parameters and shorter outputs.
| Benchmark | Mistral Small 4 | Note |
|---|---|---|
| GPQA Diamond (reasoning off) | about 57.6 | standard mode |
| GPQA Diamond (reasoning high) | about 71.2 | deliberate mode |
| MMMU-Pro (vision) | about 60 | above Small 3.2 and Medium 3.1 |
| AA-LCR (long context) | 0.72 | about 1.6K characters of output |
| LiveCodeBench | beats GPT-OSS 120B | about 20 percent less output |
| AIME 2025 | matches or beats GPT-OSS 120B | shorter generations |
Mistral Small 4 is released under the permissive Apache 2.0 license, which allows commercial and non-commercial use, modification, and redistribution. The full weights (about 242 GB in Safetensors format) are published on Hugging Face as mistralai/Mistral-Small-4-119B-2603. Mistral also published an NVFP4 quantized checkpoint of about 70.8 GB, produced in collaboration with the vLLM and Red Hat teams, plus an "eagle" head for speculative decoding; community GGUF quantizations appeared quickly for local use. The model can be run with vLLM (Mistral's recommended path), llama.cpp, LM Studio, SGLang, and Hugging Face Transformers, and fine-tuned with frameworks such as Axolotl. It is also packaged as an NVIDIA NIM container and offered for free prototyping on NVIDIA's build platform.
For hosted access the model is available on Mistral's La Plateforme API and AI Studio under the identifier mistral-small-2603, at a list price of 0.15 USD per million input tokens and 0.60 USD per million output tokens, and it is mirrored by aggregators such as OpenRouter. The launch itself drew some criticism. Reviewers noted that the reasoning_effort parameter, the feature that defines the model, was not functional or documented in the public API on launch day; documentation was added roughly a week later. Independent tester Simon Willison confirmed the model on the API while flagging the missing parameter docs and a weak SVG drawing in his standard pelican test, a reminder that the model targets understanding and generation of text and code rather than image synthesis.
Mistral Small 4 is notable less for topping any leaderboard than for what it represents about the open-weight model landscape in 2026. By packing reasoning, vision, and agentic coding into one Apache 2.0 model with a low active-parameter cost, Mistral targets the operational pain of running a fleet of separate specialist endpoints. The sparse MoE approach lets the company advertise frontier-adjacent quality at the inference cost of a much smaller dense model, which matters for self-hosting and for price-sensitive generative AI deployments. It also continues Mistral's strategy, distinct among well-funded labs, of pairing a hosted API with genuinely downloadable weights, positioning the model directly against other open releases such as GPT-OSS 120B, Qwen 3.5, and Gemma 4. Released alongside larger Mistral models including Mistral Large 3, Mistral Small 4 anchors the affordable, self-hostable tier of the company's 2026 lineup and reinforces the broader Mistral push toward open, customizable AI.