Mistral Small 4

Large Language Models Open Source AI

9 min read

Updated Jul 17, 2026

Suggest edit History Talk

RawGraph

Last edited

Jul 17, 2026

Fact-checked

In review queue

Sources

16 citations

Revision

v2 · 1,864 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Mistral Small 4 is an open-weight large language model released by the French artificial intelligence company Mistral AI on March 16, 2026 under the Apache 2.0 license.^[1] It marks a sharp break with earlier members of the Mistral Small line, which were dense networks of roughly 24 billion parameters. Mistral Small 4 is instead a sparse Mixture-of-Experts (MoE) model with 119 billion total parameters and about 6.5 billion active per token.^[1] Mistral describes it as the first model in its catalogue to fold the abilities of three previously separate specialist lines into one general-purpose model: Magistral for step-by-step reasoning, Pixtral for vision and multimodal input, and Devstral for agentic coding.^[1] The model is served through Mistral's API under the identifier mistral-small-2603 and ships with a configurable reasoning control, a 256K-token context window, and native image input.^[2]

Overview

Mistral Small 4 continues the company's practice of shipping capable open-weight models that can be downloaded, fine-tuned, and self-hosted at no licensing cost. The headline change from the 3.x generation is structural. Earlier Small models (3.1 and 3.2) were dense 24 billion parameter transformers capped at a 128K context window. Mistral Small 4 moves to a granular Mixture-of-Experts design with 119 billion total parameters but only about 6.5 billion active on any given token, and it doubles the context window to 256K.^[1] Despite the much larger total parameter count, the "Small" designation persists for two reasons: the per-token compute is set by the active parameters (fewer than the 24 billion of the previous dense models), and the lineage runs directly from the older Small releases. When its reasoning control is turned off, Mistral says the model behaves like the prior Mistral-Small-3.2-24B chat model.^[1]

The defining feature is a single per-request control, reasoning_effort, that lets a developer switch the same weights between a fast instruct mode and a deliberate reasoning mode. Setting reasoning_effort to "none" produces quick, lightweight answers comparable to Mistral Small 3.2, while "high" produces long chain-of-thought output with verbosity comparable to Mistral's earlier Magistral-Small-2509 reasoning model.^[1] This hybrid behavior, rather than any single benchmark record, is what Mistral presents as the model's main selling point.

The consolidation of Magistral, Pixtral and Devstral

Through 2025 and early 2026 Mistral maintained several specialist small-model families in parallel: Magistral for reasoning, Pixtral for vision and document understanding, and Devstral for coding agents, alongside the general Mistral Small instruct models. Mistral Small 4 is positioned as the convergence point. The official announcement calls it "the first Mistral model to unify the capabilities of our flagship models, Magistral for reasoning, Pixtral for multimodal, and Devstral for agentic coding, into a single, versatile model."^[1]

It is worth being precise about what that means. Mistral Small 4 is a newly trained MoE model whose training was designed to cover all of these capability areas at once. It is not a literal weight-merge of the three existing specialist models, and the announcement does not describe it that way. The framing is about capabilities and about deployment convenience: instead of standing up separate endpoints for reasoning, vision, and coding, a team can run one model and select behavior per request. Several third-party write-ups summarized this as "one model to replace three" or as collapsing three deployments into a single endpoint.^[10] As of this writing Mistral has not publicly stated that the standalone Magistral, Pixtral, or Devstral lines are being discontinued, so the accurate description is consolidation of capabilities into a new model rather than a confirmed retirement of the older families. The vision capability associated with Pixtral is delivered through a multimodal encoder that accepts image input alongside text; the reasoning capability associated with Magistral is exposed through the reasoning_effort control; and the coding and agentic capability associated with Devstral is covered by the model's native function calling, JSON output, and tool-use behavior.

Architecture and specifications

Mistral Small 4 uses a granular Mixture-of-Experts architecture with 128 experts, of which 4 are activated for each token. Mistral reports 119 billion total parameters with about 6.5 billion active per token; the announcement cites roughly 6 billion active in the expert layers, or about 8 billion when the embedding and output layers are included.^[1] The context window is 256K tokens (262,144). The model accepts text and image input and produces text output, supports native function calling and structured JSON output, and honors a system prompt. Mistral lists broad multilingual coverage, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic, among others.^[2] All specifications here are as reported by Mistral.

Because the active-parameter count is low, Mistral emphasizes efficiency. Relative to Mistral Small 3, the company cites about a 40 percent reduction in end-to-end completion time in a latency-optimized configuration and roughly three times the requests per second in a throughput-optimized configuration.^[1] The move from a dense 24 billion parameter design to the sparse MoE design is what enabled both the larger context window and these throughput gains.

Specification	Detail (per Mistral)
Developer	Mistral AI
Release date	March 16, 2026^[1]
API identifier	mistral-small-2603^[2]
Hugging Face repo	mistralai/Mistral-Small-4-119B-2603^[3]
Architecture	Mixture-of-Experts, 128 experts, 4 active per token^[1]
Total parameters	119 billion^[1]
Active parameters	about 6.5 billion per token^[1]
Context window	256K tokens (262,144)^[2]
Modalities	Text and image input, text output^[2]
Reasoning control	reasoning_effort = none or high^[2]
License	Apache 2.0^[1]
List price (API)	0.15 USD per million input tokens, 0.60 USD per million output tokens^[13]

Benchmarks

Mistral's headline claim is that Mistral Small 4 matches or surpasses OpenAI's open-weight GPT-OSS 120B across reasoning, coding, and long-context tasks while generating noticeably shorter outputs, which lowers cost and latency at equal quality.^[1] On the AA-LCR long-context reasoning evaluation the model reportedly reaches a score of 0.72 using about 1,600 characters of output, where a comparable Qwen 3.5 result is cited at roughly 5,800 to 6,100 characters.^[1] On LiveCodeBench it is said to outperform GPT-OSS 120B while producing about 20 percent less output, and on AIME 2025 it is reported to match or exceed GPT-OSS 120B, again with shorter generations.^[1] On GPQA Diamond the model scores about 57.6 with reasoning turned off and about 71.2 with reasoning set to high, illustrating the gain from the deliberate mode.^[11] For multimodal understanding, early third-party testing reports an MMMU-Pro score near 60, above Mistral Small 3.2 and Mistral Medium 3.1.^[14] A widely cited third-party guide also lists an MMLU-Pro figure around 78.^[10]

These numbers should be read with care. The reasoning, coding, and long-context comparisons against GPT-OSS 120B come from Mistral's own announcement,^[1] and several of the peer figures (GPQA, MMMU-Pro, MMLU-Pro) appear in independent write-ups rather than a single audited table. The recurring theme across sources is efficiency at parity: similar accuracy to larger or competing open models such as GPT-OSS 120B, Qwen 3.5, and Gemma 4, but with fewer active parameters and shorter outputs.

Benchmark	Mistral Small 4	Note
GPQA Diamond (reasoning off)	about 57.6	standard mode^[11]
GPQA Diamond (reasoning high)	about 71.2	deliberate mode^[11]
MMMU-Pro (vision)	about 60	above Small 3.2 and Medium 3.1^[14]
AA-LCR (long context)	0.72	about 1.6K characters of output^[1]
LiveCodeBench	beats GPT-OSS 120B	about 20 percent less output^[1]
AIME 2025	matches or beats GPT-OSS 120B	shorter generations^[1]

License and availability

Mistral Small 4 is released under the permissive Apache 2.0 license, which allows commercial and non-commercial use, modification, and redistribution.^[1] The full weights (about 242 GB in Safetensors format) are published on Hugging Face as mistralai/Mistral-Small-4-119B-2603.^[3] Mistral also published an NVFP4 quantized checkpoint of about 70.8 GB, produced in collaboration with the vLLM and Red Hat teams, plus an "eagle" head for speculative decoding; community GGUF quantizations appeared quickly for local use.^[4] The model can be run with vLLM (Mistral's recommended path), llama.cpp, LM Studio, SGLang, and Hugging Face Transformers, and fine-tuned with frameworks such as Axolotl. It is also packaged as an NVIDIA NIM container and offered for free prototyping on NVIDIA's build platform.^[16]

For hosted access the model is available on Mistral's La Plateforme API and AI Studio under the identifier mistral-small-2603, at a list price of 0.15 USD per million input tokens and 0.60 USD per million output tokens, and it is mirrored by aggregators such as OpenRouter.^[12] The launch itself drew some criticism. Reviewers noted that the reasoning_effort parameter, the feature that defines the model, was not functional or documented in the public API on launch day; documentation was added roughly a week later.^[9] Independent tester Simon Willison confirmed the model on the API while flagging the missing parameter docs and a weak SVG drawing in his standard pelican test, a reminder that the model targets understanding and generation of text and code rather than image synthesis.^[5]

Significance

Mistral Small 4 is notable less for topping any leaderboard than for what it represents about the open-weight model landscape in 2026. By packing reasoning, vision, and agentic coding into one Apache 2.0 model with a low active-parameter cost, Mistral targets the operational pain of running a fleet of separate specialist endpoints. The sparse MoE approach lets the company advertise frontier-adjacent quality at the inference cost of a much smaller dense model, which matters for self-hosting and for price-sensitive generative AI deployments.^[6] It also continues Mistral's strategy, distinct among well-funded labs, of pairing a hosted API with genuinely downloadable weights, positioning the model directly against other open releases such as GPT-OSS 120B, Qwen 3.5, and Gemma 4. Released alongside larger Mistral models including Mistral Large 3, Mistral Small 4 anchors the affordable, self-hostable tier of the company's 2026 lineup and reinforces the broader Mistral push toward open, customizable AI.

References

Mistral AI, "Introducing Mistral Small 4," March 16, 2026. https://mistral.ai/news/mistral-small-4/ ↩
Mistral AI Docs, "Mistral Small 4 model card (mistral-small-4-0-26-03)." https://docs.mistral.ai/models/model-cards/mistral-small-4-0-26-03 ↩
Hugging Face, "mistralai/Mistral-Small-4-119B-2603." https://huggingface.co/mistralai/Mistral-Small-4-119B-2603 ↩
Hugging Face, "mistralai/Mistral-Small-4-119B-2603-NVFP4." https://huggingface.co/mistralai/Mistral-Small-4-119B-2603-NVFP4 ↩
Simon Willison, "Introducing Mistral Small 4," March 16, 2026. https://simonwillison.net/2026/Mar/16/mistral-small-4/ ↩
VentureBeat, "Mistral's Small 4 consolidates reasoning, vision and coding into one model at a fraction of the inference cost." https://venturebeat.com/technology/mistrals-small-4-consolidates-reasoning-vision-and-coding-into-one-model-at ↩
MarkTechPost, "Mistral AI Releases Mistral Small 4: A 119B-Parameter MoE Model that Unifies Instruct, Reasoning, and Multimodal Workloads," March 16, 2026. https://www.marktechpost.com/2026/03/16/mistral-ai-releases-mistral-small-4-a-119b-parameter-moe-model-that-unifies-instruct-reasoning-and-multimodal-workloads/
StartupHub.ai, "Mistral Small 4 Unifies AI Capabilities," 2026. https://www.startuphub.ai/ai-news/artificial-intelligence/2026/mistral-small-4-unifies-ai-capabilities
CTOL Digital Solutions, "Mistral Small 4 Review: Impressive Specs, Shaky Launch." https://ctol.digital/news/mistral-small-4-review-impressive-specs-shaky-launch-is-it-worth-the-hype/ ↩
Emelia, "Mistral Small 4: One AI Model to Replace Three (Complete Guide and Benchmarks 2026)." https://emelia.io/hub/mistral-small-4-complete-guide-benchmarks ↩
BenchLM.ai, "Mistral Small 4 Benchmarks 2026: Scores, Rankings and Performance." https://benchlm.ai/models/mistral-small-4 ↩
OpenRouter, "Mistral Small 4 (mistralai/mistral-small-2603) API Pricing and Benchmarks." https://openrouter.ai/mistralai/mistral-small-2603 ↩
Mistral AI, "Pricing." https://mistral.ai/pricing/ ↩
ComputerTech, "Mistral Small 4 Review 2026: The First MoE Model That Does It All." https://computertech.co/mistral-small-4-review/ ↩
Analytics Vidhya, "Mistral Small 4: The One Model That Codes, Reasons, and Chats," March 2026. https://www.analyticsvidhya.com/blog/2026/03/mistral-small-4/
NVIDIA NGC Catalog, "Mistral Small-4 119b-2603 (NIM)." https://catalog.ngc.nvidia.com/orgs/nim/teams/mistralai/containers/mistral-small-4-119b-2603 ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

1 revision by 1 contributors · full history

Suggest edit

What links here

Magistral Mistral Large 3

Overview

The consolidation of Magistral, Pixtral and Devstral

Architecture and specifications

Benchmarks

License and availability

Significance

References

Improve this article

Related Articles

LLaMA

Proprietary vs. Open Source Large Language Models (LLMs)

DeepSeek

LangChain

Meta AI

Mistral AI

What links here

Related Articles

LLaMA

Proprietary vs. Open Source Large Language Models (LLMs)

DeepSeek

LangChain

Meta AI

Mistral AI

What links here