Magistral


Updated Jul 23, 2026

Magistral is the first family of reasoning models from Mistral AI, the French AI company, released on June 10, 2025. ^[1] Mistral describes it as "the first reasoning model by Mistral AI, excelling in domain-specific, transparent, and multilingual reasoning," built to work through problems step by step rather than answering in a single pass. ^[1] The launch included two models: Magistral Small, a 24-billion-parameter model released with open weights under the Apache 2.0 license, and Magistral Medium, a larger, more capable variant offered through Mistral's commercial products and API. ^[1]^[2] According to Mistral, Magistral Medium scored 73.6 percent on AIME 2024 (90 percent with majority voting over 64 samples), and Magistral Small scored 70.7 percent (83.3 percent with majority voting). ^[1]^[3]

A few days after the announcement, Mistral published a technical report on arXiv that described the training pipeline and the reasoning behind the design. ^[3] Magistral arrived during a stretch when reasoning models had moved to the center of the field. OpenAI's o-series and DeepSeek-R1 had shown that letting a model produce a long internal chain of thought before its final answer could lift performance on math, coding, and logic problems by a wide margin. Mistral had until that point shipped general purpose chat and instruct models, so Magistral marked its entry into the same category, and it did so partly on its own terms by training the models with its own infrastructure rather than borrowing reasoning traces from existing systems. ^[3]

What is Magistral?

Magistral is a large language model tuned to reason explicitly. Given a question, it writes out a working-through stage, usually wrapped in dedicated thinking tokens, and then gives a final answer that follows from that work. ^[2] This is the chain-of-thought pattern that defines the current generation of reasoning systems, and it tends to help most on tasks where a single forward guess is unreliable, such as competition mathematics, multi-step coding, and structured logical problems.

Mistral framed two qualities as the point of the model. The first is transparency. Because the reasoning trace is exposed rather than hidden, a user can follow how the model reached a conclusion, which Mistral pitched at regulated settings where every step of a decision may need to be auditable. ^[1] Mistral describes this as "transparent reasoning that you can follow and verify." ^[1] The second quality is multilingual reasoning, where the chain of thought itself happens in the user's language rather than defaulting to English. As the company puts it, "Magistral's chain-of-thought works across global languages and alphabets," so the model can "reason natively" in the user's own language. ^[1]^[3]

The two variants split along an open versus commercial line that runs through much of Mistral's catalog. Magistral Small is downloadable and modifiable under Apache 2.0, which lets developers run it locally and fine-tune it freely. ^[2] Magistral Medium is not open. It is reached through Le Chat, Mistral's assistant, and through the La Plateforme API, with availability also planned across cloud marketplaces including Amazon SageMaker, IBM watsonx, Azure AI, and Google Cloud. ^[1]

What are Magistral Small and Magistral Medium?

Magistral does not introduce a new pretrained backbone. Each variant adds reasoning behavior on top of an existing Mistral model. Magistral Small is built on Mistral Small 3.1, specifically the Mistral-Small-3.1-24B-Instruct-2503 checkpoint, which fixes its size at 24 billion parameters. ^[2] Magistral Medium is built on Mistral Medium 3, a larger model that Mistral keeps closed, so its parameter count has not been published. ^[3]

The table below summarizes how the two initial variants differ.

Attribute	Magistral Small	Magistral Medium
Release date	June 10, 2025	June 10, 2025
Parameters	24 billion	Unpublished (closed)
Base model	Mistral Small 3.1	Mistral Medium 3
License	Apache 2.0 (open weights)	Proprietary (API only)
Access	Hugging Face download, self-host	Le Chat, La Plateforme API, cloud marketplaces
Context window	128k tokens	128k tokens

Both variants carry a 128k token context window, though Mistral notes that quality can fall off past roughly 40k tokens of generation and recommends keeping within that range for the open model. ^[2] Because Magistral Small inherits a 24B footprint, it is small enough to run on a single high-end consumer GPU or a well-equipped laptop once quantized, which is part of the appeal of the open release. ^[4]
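The claim that a 24B model fits on consumer hardware once quantized can be checked with rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement: it counts memory for the weights only and ignores the KV cache, activations, and any per-format quantization overhead, so real requirements will be somewhat higher. The function name and the bit widths chosen are illustrative assumptions.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


PARAMS = 24e9  # Magistral Small parameter count, as stated in the text

# Bit widths are illustrative: 16-bit is a common unquantized format,
# 8-bit and 4-bit stand in for typical quantization levels.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```

At 16 bits the weights alone come to about 48 GB, while 4-bit quantization brings them to about 12 GB, which is the range where a single high-end consumer GPU or a laptop with ample unified memory becomes plausible.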

How was Magistral trained?

The training method is the most distinctive part of the project. Mistral built Magistral with reinforcement learning using a setup it describes as a ground-up approach that relied only on its own models and infrastructure, without distilling reasoning traces from any outside model. ^[3] The two variants reached their reasoning ability by different routes. Magistral Medium was trained with reinforcement learning alone on top of Mistral Medium 3, and the report states that this pure RL run produced close to a 50 percent gain in AIME 2024 pass@1 over the starting checkpoint. ^[3] Magistral Small then learned from the larger model. It was trained with a round of supervised fine-tuning on reasoning traces drawn from Magistral Medium, used as cold-start data, followed by its own reinforcement learning stage. ^[2]^[3]

The reinforcement learning algorithm builds on GRPO, the group relative policy optimization method that DeepSeek popularized with DeepSeek-R1, but Mistral changed several pieces. ^[3] The report describes removing the KL divergence penalty entirely, normalizing the loss by the total length of generations in a group, normalizing advantages within each minibatch, and widening the upper clipping bound through a clip-higher setting in roughly the 0.26 to 0.28 range to keep the policy from collapsing to low entropy. It also filters out groups whose samples all receive the same reward, and therefore zero advantage, so they do not waste a training step. ^[3] This style of training, where rewards come from automatically checkable signals such as whether a math answer is correct or code passes its tests, is often grouped under reinforcement learning with verifiable rewards.
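The modifications listed above can be illustrated with a small pure-Python sketch. This is not Mistral's implementation: the function names are hypothetical, the lower clipping bound of 0.2 is an assumed value (the text only gives the upper bound), and each "ratio" stands for the per-token probability ratio between the new and old policy, which a real trainer would compute from model log-probabilities.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Group-relative baseline: each sample's reward minus its group's mean."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def keep_informative_groups(groups):
    """Drop groups where every sample got the same reward (all advantages zero)."""
    return [g for g in groups if max(g) != min(g)]


def normalize_minibatch(advantages, eps=1e-8):
    """Rescale advantages to zero mean and unit standard deviation in a minibatch."""
    mu, sigma = mean(advantages), pstdev(advantages)
    return [(a - mu) / (sigma + eps) for a in advantages]


def clipped_token_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped term with an asymmetric range (the clip-higher idea).

    eps_low=0.2 is an assumption; eps_high=0.28 is the top of the range in the text.
    """
    clipped = min(max(ratio, 1 - eps_low), 1 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def grpo_style_loss(samples, eps_low=0.2, eps_high=0.28):
    """samples: iterable of (token_ratios, advantage) pairs, one per generation.

    The sum over all tokens is divided by the total token count, so long and
    short generations are weighted per token. There is no KL penalty term.
    """
    samples = list(samples)
    total_tokens = sum(len(ratios) for ratios, _ in samples)
    total = sum(
        clipped_token_objective(r, adv, eps_low, eps_high)
        for ratios, adv in samples
        for r in ratios
    )
    return -total / total_tokens


# Two groups of four samples with 0/1 correctness rewards (toy data).
groups = [[1.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
kept = keep_informative_groups(groups)  # the all-correct group is dropped
advantages = normalize_minibatch([a for g in kept for a in group_advantages(g)])
ratios = [[1.0, 1.5], [0.9], [1.1, 1.0, 0.7], [1.4]]  # made-up token ratios
print(kept, advantages, grpo_style_loss(zip(ratios, advantages)))
```

The asymmetric clip matters for positive advantages: a token whose ratio has grown to 1.5 is capped at 1.28 instead of 1.2, which leaves slightly more room to raise the probability of initially unlikely tokens.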

Multilingual reasoning was handled through reward shaping rather than a separate model. During training Mistral translated a fraction of the problems into languages including French, Spanish, Italian, German, Russian, and Chinese, then used a classifier to check that the question, the reasoning, and the answer all stayed in the same language, granting a small extra reward when they did. ^[3] The result is that the chain of thought is written in the user's language instead of being translated after the fact. The report also notes a useful side effect: running reinforcement learning on text alone tended to preserve, and sometimes improve, the base model's other abilities such as instruction following, function calling, and multimodal understanding, rather than eroding them. ^[3]
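The language-consistency reward described above can be sketched as a bonus added to the main correctness reward. Everything specific here is an assumption for illustration: the bonus size of 0.1 stands in for the "small extra reward" in the text, the function names are hypothetical, and the keyword-based detector is a toy placeholder for the trained classifier Mistral used.

```python
from typing import Callable

LANGUAGE_BONUS = 0.1  # assumed value; the text only says the bonus is small


def language_consistency_bonus(
    question: str,
    reasoning: str,
    answer: str,
    detect_language: Callable[[str], str],
    bonus: float = LANGUAGE_BONUS,
) -> float:
    """Return the bonus only if all three parts are detected as the same language."""
    languages = {detect_language(part) for part in (question, reasoning, answer)}
    return bonus if len(languages) == 1 else 0.0


def shaped_reward(correctness: float, consistency_bonus: float) -> float:
    """Total reward: the verifiable correctness signal plus the language bonus."""
    return correctness + consistency_bonus


def toy_detect_language(text: str) -> str:
    """Toy stand-in for a real language classifier (hypothetical, French vs English only)."""
    french_markers = {"le", "la", "est", "donc", "combien"}
    return "fr" if set(text.lower().split()) & french_markers else "en"


consistent = language_consistency_bonus(
    "Combien font 2 + 2 ?", "2 + 2 est 4 donc", "La réponse est 4", toy_detect_language
)
mixed = language_consistency_bonus(
    "Combien font 2 + 2 ?", "2 plus 2 equals 4", "La réponse est 4", toy_detect_language
)
print(shaped_reward(1.0, consistent), shaped_reward(1.0, mixed))
```

Keeping the bonus small relative to the correctness reward means the model is nudged toward staying in the user's language without being rewarded for a fluent but wrong answer.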

To make large-scale online reinforcement learning practical, Mistral leaned on an asynchronous system. Generators produce completions continuously while training proceeds, and updated weights are pushed to those generators using NCCL without stopping generation or discarding the in-progress cache. ^[3] Much of the technical report is given over to these infrastructure choices, since keeping the generation and training loops fed is a large part of what makes pure RL at this scale workable.
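One consequence of updating weights without stopping generation is that a single completion can contain tokens produced by several versions of the policy. The toy simulation below illustrates only that bookkeeping. It is not a model of Mistral's system: the class and function names are hypothetical, there is no GPU or NCCL communication involved, and "tokens" are just (position, weights version) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGenerator:
    """Hypothetical generator that keeps its cache when weights are swapped."""

    weights_version: int = 0
    kv_cache: list = field(default_factory=list)

    def step(self):
        # Record which weights version produced the token at this position.
        token = (len(self.kv_cache), self.weights_version)
        self.kv_cache.append(token)
        return token

    def load_weights(self, version: int) -> None:
        # Swap weights in place; the in-progress cache is deliberately kept.
        self.weights_version = version


def run(total_steps: int, update_every: int):
    """Generate total_steps tokens, receiving new weights every update_every steps."""
    generator = ToyGenerator()
    for step in range(total_steps):
        if step > 0 and step % update_every == 0:
            generator.load_weights(generator.weights_version + 1)
        generator.step()
    return generator.kv_cache


print(run(total_steps=6, update_every=2))
# [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
```

The printed trace shows one sequence spanning three weight versions, which is the trade-off the asynchronous design accepts in exchange for never idling the generators.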

How good is Magistral at reasoning?

Mistral reported results on the standard reasoning benchmarks of the moment, including the AIME competition mathematics sets for 2024 and 2025, GPQA Diamond for graduate-level science questions, and LiveCodeBench for coding. The figures below are the pass@1 numbers from the technical report, with majority-vote results shown where Mistral provided them. ^[3]

Benchmark	Magistral Medium	Magistral Small
AIME 2024 (pass@1)	73.6%	70.7%
AIME 2024 (maj@64)	90.0%	83.3%
AIME 2025 (pass@1)	64.9%	62.8%
AIME 2025 (maj@64)	83.3%	76.7%
GPQA Diamond	70.8%	68.2%
LiveCodeBench v5	59.4%	55.8%

The pattern is what you would expect from the two-tier design. Magistral Medium leads on every measure, while Magistral Small lands a few points behind despite being the open, smaller model that learned partly from its larger sibling. ^[3] The majority-vote rows, where the model samples many answers and takes the most common one, show how much headroom the sampling strategy can add on AIME, lifting Magistral Medium from 73.6 percent to 90 percent on the 2024 set. ^[1]^[3] These are the headline figures Mistral reported for the first release, and comparisons against other vendors' models should be read with care, since benchmark setups and decoding settings differ between labs.
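The difference between the pass@1 and maj@64 rows comes down to how sampled answers are scored. The sketch below shows the two calculations for a single problem, under the assumption that pass@1 is estimated as the fraction of independent samples that are correct and that answers can be compared as exact strings. The function names and the sample answers are made up for illustration, and eight samples are used instead of 64 to keep the example short.

```python
from collections import Counter


def pass_at_1(samples, reference):
    """Estimate pass@1 as the fraction of sampled answers that match the reference."""
    return sum(answer == reference for answer in samples) / len(samples)


def majority_vote(samples):
    """Return the most common answer among the samples."""
    return Counter(samples).most_common(1)[0][0]


def maj_at_k_correct(samples, reference):
    """maj@k for one problem: is the most common of the k answers correct?"""
    return majority_vote(samples) == reference


# Eight hypothetical sampled answers to one problem whose correct answer is "204".
samples = ["204", "204", "210", "204", "96", "204", "210", "204"]

print(pass_at_1(samples, "204"))         # 0.625
print(majority_vote(samples))            # 204
print(maj_at_k_correct(samples, "204"))  # True
```

Here the model is right only 62.5 percent of the time on a single draw, yet the vote still lands on the correct answer because the wrong answers are scattered across different values. Averaged over a benchmark, that effect is what separates the two rows in the table.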

How multilingual is Magistral?

Multilingual reasoning is the feature Mistral leaned on most when separating Magistral from its rivals. The blog post highlighted strong reasoning in English, French, Spanish, German, Italian, Arabic, Russian, and Simplified Chinese, and the Magistral Small model card lists support across roughly two dozen languages in total. ^[1]^[2] What matters here is not only that the model can read a non-English prompt but that its internal reasoning stays in that language, which is the behavior the language-consistency reward was designed to produce. ^[3] For a European company that has often positioned itself around sovereignty and language coverage, keeping the reasoning trace in the user's own language fits the broader pitch.

How does Magistral compare to DeepSeek-R1 and OpenAI's o-series?

Magistral sits in the same space as DeepSeek-R1 and OpenAI's o-series, and the comparison is mostly about how open the models are. Like DeepSeek-R1, the Small variant ships with open weights, so it can be inspected, self-hosted, and fine-tuned, none of which is possible with the closed o-series models. ^[4] Unlike DeepSeek-R1, which is a very large mixture-of-experts model, Magistral Small is a compact 24B dense model that an individual can run on local hardware, trading peak benchmark scores for accessibility. ^[2]^[4] Several outlets described Magistral as a European entry into reasoning models, with Mistral pairing the open Small release against an enterprise Medium tier reached through Le Chat and the API. ^[1]^[5]

The reasoning-heavy approach was also tied to product features. In Le Chat, a Flash Answers mode used Magistral Medium to return responses at "up to 10x faster token throughput than most competitors," a speed-up Mistral built in collaboration with Cerebras and its Wafer Scale Engine inference hardware. ^[1] Mistral framed this as enabling "real-time reasoning and user feedback, at scale." ^[1]

What changed in Magistral 1.1 and 1.2?

Magistral has moved through point releases since launch. Mistral shipped Magistral 1.1 on July 24, 2025, covering both Magistral Medium 1.1 and Magistral Small 1.1 under the version codes magistral-medium-2507 and magistral-small-2507. ^[6]
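The version codes appear to follow a year-and-month pattern: the 2507 suffix matches the July 2025 release of Magistral 1.1, and the Mistral-Small-3.1-24B-Instruct-2503 checkpoint and the Magistral-Small-2506 model card fit the same scheme. Assuming that convention holds (it is inferred from the dates in this article, not taken from a published specification), a small helper can decode the suffix. The function name is hypothetical.

```python
import re


def parse_version_code(model_id: str) -> tuple:
    """Decode a trailing four-digit suffix as (year, month), assuming a YYMM scheme."""
    match = re.search(r"-(\d{2})(\d{2})$", model_id)
    if match is None:
        raise ValueError(f"no four-digit version suffix in {model_id!r}")
    year, month = 2000 + int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"suffix in {model_id!r} does not look like YYMM")
    return year, month


for model_id in (
    "magistral-medium-2507",
    "magistral-small-2507",
    "Mistral-Small-3.1-24B-Instruct-2503",
):
    print(model_id, parse_version_code(model_id))
```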

The larger step came with Magistral 1.2 in mid-September 2025, released as magistral-medium-2509 and magistral-small-2509. ^[6]^[7] This update rebased Magistral Small on Mistral Small 3.2 and added a vision encoder, so the model could take image inputs and extend its reasoning to visual questions for the first time. ^[7]^[8] Mistral also listed quality-of-life changes, including better tone, cleaner LaTeX and Markdown formatting, shorter answers on easy prompts, and a lower tendency to fall into runaway generation loops. The release also lists dedicated [THINK] and [/THINK] tokens that wrap the reasoning span. ^[7] The benchmark gains were sizable. On Mistral's reported figures, Magistral Small 1.2 reached 86.14 percent on AIME 2024 and 77.34 percent on AIME 2025, well above the 70.52 percent and 62.03 percent posted by Magistral Small 1.1 on the same measures, with GPQA Diamond and LiveCodeBench also improving. ^[7] Magistral Small 1.2 stayed open under Apache 2.0 and small enough to run on a single consumer card or a 32GB laptop once quantized. ^[7]^[8]
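Because the reasoning span is delimited by [THINK] and [/THINK], an application can separate the trace from the final answer before showing or logging either. The sketch below assumes the output has already been decoded to text with the markers present as literal strings. That is an assumption: some serving stacks may strip the markers or return the reasoning as a separate structured field, in which case no parsing is needed. The function name is hypothetical.

```python
def split_reasoning(text: str, open_tag: str = "[THINK]", close_tag: str = "[/THINK]"):
    """Split decoded model output into (reasoning, answer).

    If a complete tagged span is not found, the whole text is treated as the answer.
    """
    start = text.find(open_tag)
    end = text.find(close_tag, start + len(open_tag)) if start != -1 else -1
    if start == -1 or end == -1:
        return "", text.strip()
    reasoning = text[start + len(open_tag):end].strip()
    answer = (text[:start] + text[end + len(close_tag):]).strip()
    return reasoning, answer


raw = "[THINK]2 + 2 = 4, so the sum is 4.[/THINK]The answer is 4."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # 2 + 2 = 4, so the sum is 4.
print(answer)     # The answer is 4.
```

Treating an unterminated span as plain answer text is a conservative choice for display. A pipeline that audits reasoning traces might prefer to raise an error instead, since a missing closing marker can indicate a truncated generation.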

What are Magistral's limitations?

Some limits follow directly from the design. Reasoning models spend extra tokens thinking before they answer, which raises latency and cost on hard questions and can be wasteful on simple ones, a point Mistral partly addressed in 1.2 by shortening answers on easy prompts. ^[7] Magistral Small's quality can degrade past roughly 40k tokens of generation even though the context window is 128k, so the practical reasoning budget is smaller than the headline figure suggests. ^[2] The first release of Magistral Small was text only, with image support arriving only with the 1.2 vision encoder. ^[7] And because Magistral Medium is closed and its parameters are unpublished, outside researchers cannot directly study the stronger of the two models, so independent verification rests mainly on the open Small variant and on Mistral's own technical report. ^[3] As with any single-vendor benchmark disclosure, the reported scores are best treated as a starting point rather than a settled ranking.

References

[1] Mistral AI. "Magistral." Mistral AI, 2025. mistral.ai/...magistral
[2] Mistral AI. "Magistral-Small-2506 Model Card." Hugging Face, 2025. huggingface.co/...Magistral-Small-2506
[3] Mistral AI. "Magistral." arXiv:2506.10910, 2025. arxiv.org/...2506.10910
[4] Simon Willison. "Magistral, the first reasoning model by Mistral AI." simonwillison.net, 2025. simonwillison.net/...magistral
[5] Kyle Wiggers. "Mistral releases a pair of AI reasoning models." TechCrunch, 2025. techcrunch.com/...es-a-pair-of-ai-reasoning-models
[6] Mistral AI. "Changelog." Mistral Docs, 2025. docs.mistral.ai/...changelog
[7] Mistral AI. "Magistral-Small-2509 Model Card." Hugging Face, 2025. huggingface.co/...Magistral-Small-2509
[8] Carl Franzen. "Mistral's updated Magistral Small 1.2 reasoning model can analyze images and fit on a MacBook." VentureBeat, 2025. venturebeat.com/...ng-model-can-analyze-images-and
[9] Mistral AI. "Magistral Small 1.2." Mistral Docs, 2025. docs.mistral.ai/...magistral-small-1-2-25-09
[10] Mistral AI. "Magistral Medium 1.0." Mistral Docs, 2025. docs.mistral.ai/...magistral-medium-1-0-25-06
