Mistral Large 3

AI Companies AI Models Large Language Models Mixture of Experts

35 min read

Updated Jun 28, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 28, 2026

Fact-checked

In review queue

Sources

31 citations

Revision

v6 · 7,042 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

What is Mistral Large 3?

Mistral Large 3 is a sparse mixture-of-experts large language model released on December 2, 2025 by the French AI company Mistral AI, distributed as open weights under the Apache 2.0 license with roughly 675 billion total parameters, about 41 billion active per token, and a 256,000-token context window.^[1]^[3]^[17] It is the third generation of the company's flagship Large series and Mistral's first mixture-of-experts flagship since the original Mixtral models of 2023 and 2024, and Mistral describes it as "our most capable model to date."^[1] The model is multimodal, accepting both text and images through a roughly 2.5 billion parameter vision encoder fused into the same checkpoint, and Mistral positions it as a general-purpose, multilingual, non-reasoning generalist rather than a dedicated reasoning model.^[1]^[3]

The published model card lists roughly 675 billion total parameters with about 41 billion active per token, a 256,000-token context window, and integrated image understanding via the fused vision encoder.^[3] Mistral published both a base and an instruction-tuned checkpoint under Apache 2.0, continuing the open-weight tradition that distinguishes it from most other companies competing at the frontier.^[1]^[3]

The model was trained from scratch on approximately 3,000 NVIDIA H200 GPUs and is positioned as a general-purpose, multilingual, multimodal generalist rather than a dedicated reasoning model.^[1]^[14] At launch, Mistral and external evaluators placed it among the strongest open-weight non-reasoning models on the market, with an LMArena Elo of roughly 1,418 putting it second among open-source non-reasoning entrants and sixth among open-weight models overall.^[1]^[6] In Mistral's own words, the model "debuts at #2 in the OSS non-reasoning models category (#6 amongst OSS models overall)."^[1] The company also announced that a dedicated reasoning variant would follow, stating plainly that "a reasoning version is coming soon"; as of May 2026 that reasoning Large 3 has still not shipped, while configurable-reasoning capability has reached the smaller Mistral Small 4 and Medium 3.5 lines.^[1]^[22]^[24]

Mistral Large 3 is the technical centerpiece of the broader Mistral 3 family, which also includes the smaller dense Ministral 3 models at 3B, 8B, and 14B parameters, each shipped in base, instruct, and reasoning variants under the same Apache 2.0 license.^[1] The flagship is distributed through Mistral's own La Plateforme and Mistral AI Studio, Amazon Bedrock, Azure AI Foundry, Hugging Face, IBM watsonx, OpenRouter, Fireworks, Together AI, Modal, Unsloth AI, and several other hosts.^[10]^[12]^[13] API pricing at launch is $0.50 per million input tokens and $1.50 per million output tokens, which undercuts most closed frontier offerings by a wide margin.^[4]^[9]

Key facts

Attribute	Value
Developer	Mistral AI
Released	December 2, 2025^[1]^[17]
Architecture	Granular sparse MoE transformer + fused vision encoder^[3]
Parameters	~675B total / ~41B active (language 673B / 39B; vision ~2.5B)^[3]
Context window	256,000 tokens^[3]
Input modalities	Text, image (recommended ~1:1 aspect ratio, up to 10 images per prompt)^[3]
License	Apache 2.0^[1]^[3]
Training hardware	~3,000 NVIDIA H200 GPUs^[1]
Tokenizer	`mistral` (mistral_common >= 1.8.6)^[3]
Quantizations	FP8 (default), NVFP4, BF16^[3]
Reference engine	vLLM >= 1.12.0^[3]
API price (input / output)	$0.50 / $1.50 per million tokens^[4]
LMArena Elo at launch	~1,418 (#2 OSS non-reasoning, #6 OSS overall)^[1]^[6]
Artificial Analysis Intelligence Index	23^[4]
Hugging Face ID	`mistralai/Mistral-Large-3-675B-Instruct-2512`^[3]

Why did Mistral build Large 3?

Mistral AI was founded in 2023 in Paris by Arthur Mensch, Guillaume Lample, and Timothee Lacroix, three former research scientists from Google DeepMind and Meta AI. The company quickly became the public face of European foundation-model development, building a reputation for technical efficiency by releasing dense models like Mistral 7B and the original Mixtral 8x7B and 8x22B mixture-of-experts models under permissive open-weight licenses while operating with a smaller compute budget than U.S. competitors. By late 2025, the company had raised more than 1.7 billion euros in a Series C round led by ASML at a roughly 11.7 billion euro valuation and had become one of the few non-American labs producing models that competed directly with offerings from OpenAI, Anthropic, and Google.^[19]

The Mistral Large line is the company's flagship generalist family. Mistral Large launched in February 2024 as a closed-weights commercial dense model, followed by Mistral Large 2 in July 2024, which had 123 billion parameters and was released under the Mistral Research License with a separate commercial license track. The intermediate refresh, Mistral Medium 3, arrived in May 2025 as a mid-tier offering and was deliberately positioned as a smaller model with strong cost-performance characteristics. Mistral Large 3 returns to the flagship slot after a roughly seventeen-month gap and reverses two trends from the second generation: it is much larger in nominal parameter count, and it is once again fully open weight rather than gated behind a research-only license.^[1]^[3]

The move back to MoE is also notable. Mistral popularized open-source mixture-of-experts work with Mixtral 8x7B in late 2023 and Mixtral 8x22B in 2024, but Mistral Large 1 and 2 were dense models. Throughout 2025, several competing labs released large MoE flagships, including DeepSeek V3, DeepSeek V4, Qwen3 Max, and Kimi K2.5, all of which use sparse activation to push past 600 billion parameters while keeping inference costs roughly comparable to dense 30B to 70B systems. Mistral Large 3 is the company's direct answer to that wave.

The broader Mistral 3 release was a coordinated launch rather than a single model drop. Alongside Mistral Large 3, the company published the Ministral 3 family of small dense models in 3B, 8B, and 14B sizes, each shipped in base, instruct, and reasoning variants with optional image understanding, and each released under Apache 2.0.^[1] Mistral framed the combined release as a deliberate full-stack push: a frontier-grade flagship at the high end and a set of laptop, drone, and edge-device models at the low end, all sharing tokenizer infrastructure and a common training lineage. Mistral also announced a branding move at the same launch, renaming its commercial cloud product La Plateforme as Mistral AI Studio and unifying its enterprise APIs under a single billing umbrella.^[1]

What is Mistral Large 3's architecture?

Mistral Large 3 is built on a sparse mixture-of-experts transformer. The MoE language backbone accounts for approximately 673 billion parameters with about 39 billion active per token, and an integrated vision encoder of roughly 2.5 billion parameters brings the published totals to 675 billion total and 41 billion active.^[3] Only a small subset of experts is selected by the router for each token, so the per-token compute and memory bandwidth requirements at inference are far below what a comparable dense 675B model would need. The activation ratio of roughly 16 to 1 between total and active parameters sits in the same broad range as DeepSeek V3 at 671B total and 37B active, and the convergence on that ratio across several recent open-weight flagships is itself a notable finding about where the sparse-MoE design space has settled.

Mistral describes the architecture as "granular" MoE, the same broad family of sparse designs popularized by DeepSeek and adopted by several other recent open-weight flagships.^[1] Granular MoE uses many small experts rather than a few large ones, which generally improves expert specialization and routing coherence at the cost of more complex implementation. Mistral has not published a precise expert count or the top-k routing setting for the production model, although the company has stated that the routing layer was redesigned to improve coherence compared with the earlier Mixtral series. The launch blog emphasizes that the routing scheme was co-designed with NVIDIA to scale across many GPUs and long contexts, and the launch was accompanied by specialized Blackwell attention and MoE kernels for GB200 NVL72 systems.^[14]

The vision encoder ingests image inputs and projects them into the language model's token space, allowing the same backbone to handle text-only and image-plus-text prompts without a separate model. Mistral recommends near-square aspect ratios for image inputs and cropping wide or tall images before submission, and the model card sets a soft limit of ten images per prompt.^[3] The vision pathway supports OCR, document understanding, and standard image question answering, although Mistral notes that dedicated vision-first models still outperform Large 3 on some narrow multimodal benchmarks. The fact that the vision encoder is fused into the main checkpoint rather than bolted on as a separate adapter is one of the structural distinctions between Large 3 and several rival open-weight flagships that ship multimodality as a follow-on.

The context window is 256,000 tokens, which puts the model in roughly the same range as Claude Opus 4.7 and well above GPT-5 at standard settings, although below the million-token context offered by Gemini 3 Pro and the multi-million token context of Anthropic's 1M Sonnet variant.^[3] The instruction-tuned checkpoint is post-trained in FP8 precision, which is unusual for an open-weight release and is intended to make it easier to serve on a single eight-GPU node of NVIDIA H200 or B200 hardware. An NVFP4 quantization is also distributed for serving on H100 or A100 nodes, and a BF16 version is available for users who need it. The official model card notes that the NVFP4 variant is best used at contexts below 64,000 tokens, while the FP8 variant supports the full 256,000-token window without quality regressions.^[3]

The tokenizer is the Mistral tokenizer family rather than the Tekken tokenizer used in some earlier Mistral releases, and the company explicitly designed it for short, dense outputs. Artificial Analysis evaluators noted that Mistral Large 3 produced roughly 5.2 million output tokens across their full benchmark suite compared with a median of 11 million, which translates into substantially lower output costs per task even before the lower headline price per million tokens is taken into account.^[4] Mistral's launch blog highlighted this verbosity reduction as a deliberate design point intended to make the model competitive on cost per task and not just on cost per token.

How was Mistral Large 3 trained?

Mistral has been more restrained about training details than some of its peers, and the disclosed information is correspondingly narrow. The company has confirmed that the model was trained from scratch rather than upcycled from a smaller checkpoint, and that the training run used approximately 3,000 NVIDIA H200 GPUs.^[1] The H200 was Nvidia's late-2024 successor to the H100, with 141 GB of HBM3e memory per accelerator and substantially higher memory bandwidth, which is particularly useful for the high-bandwidth communication patterns of MoE training.

Neither the total token count nor the data mixture has been published. Mistral has said that the training data covers more than forty languages and a wide range of programming languages, and that the post-training stage focused on instruction following, function calling, and tool use rather than long chain-of-thought reasoning.^[1] The reasoning-tuned variant announced for a later release will be the dedicated reasoning post-train, and as of May 2026 the company had indicated that it remained in development while the smaller Ministral 3 reasoning variants, the merged Mistral Small 4 (March 16, 2026), and Medium 3.5 (April 28, 2026) with configurable reasoning shipped first.^[22]^[24]

Mistral has not disclosed a compute estimate in FLOPs, a final loss, or the precise blend of pretraining versus annealing stages. Several third-party writeups have estimated the training compute at the lower end of the frontier range based on the GPU count and likely duration, but those figures are not from Mistral and should be treated as informed speculation rather than published facts. Independent observers have nevertheless noted that a 3,000-GPU H200 run is small by frontier-lab standards. Most U.S. labs at the same time were training frontier models on clusters of 20,000 to 100,000 or more accelerators, and the compute gap relative to that cohort is part of what makes Mistral's competitive performance on standard benchmarks notable.^[6]

The instruction tuning recipe includes preference-based fine-tuning on conversational and agentic data, with explicit support for function calling, structured JSON output, system prompts, and multi-turn tool use. Mistral has confirmed that the tool-call parser used in deployment is the same mistral parser used across the rest of the Mistral 3 family, and the company recommends limiting the number of tools exposed to the model in a single prompt to reduce routing errors.^[3] Mistral has not published a paper or a detailed technical report for Large 3, although the Hugging Face model card and the launch blog post provide most of the architectural and benchmark numbers cited above.^[1]^[3]

For reasoning-heavy tasks, Mistral has explicitly positioned Large 3 as a System 1 pattern-matcher rather than a System 2 deliberative reasoner.^[1] The company's design intent is that the model handles fast, high-throughput tasks like chat, document drafting, code completion, and tool selection, while a separate reasoning variant will eventually handle long chain-of-thought workloads. This positioning matches the broader industry split between non-reasoning and reasoning models that emerged across 2024 and 2025, in which the two families share architectures but use different post-training recipes and inference modes.

How good is Mistral Large 3? Benchmark performance

Mistral and several independent evaluators have published a partial set of benchmark scores for Mistral Large 3. The table below collects the most widely reported figures, all of which refer to the instruction-tuned 2512 checkpoint unless noted otherwise.

Benchmark	Score	Notes
LMArena Elo	~1,418	#2 among open-source non-reasoning at launch, #6 among open-weight overall^[1]^[6]
MMLU (8-language multilingual)	~85.5%	Reported in third-party analyses^[6]
MMLU-Pro	~73.1%	Harder MMLU variant; independent evaluation^[20]
HumanEval (Python)	~90-92% pass@1	Code generation benchmark^[20]
GPQA Diamond	~43.9%	Graduate-level science questions^[20]
SimpleQA	~23.8%	Factual recall benchmark^[6]
Artificial Analysis Intelligence Index	23	Composite score, below average for open-weight cohort^[4]
Time to first token	1.02 to 1.07 seconds	Across hosted providers in third-party tests^[4]
Output throughput	~49.3 tokens per second	Mistral-hosted endpoint in Artificial Analysis testing^[4]
Total output tokens (full Intelligence Index suite)	~5.2M	vs. median 11M, reflecting concise outputs^[4]

A few of these results are worth flagging individually. On LMArena, the model's roughly 1,418 Elo placed it second among open-source non-reasoning entrants at the time of release.^[1] On HumanEval, the high pass@1 score is consistent with Mistral's claim that Large 3 was the strongest open-source coding model on LMArena at launch, although different evaluators report 90-92% depending on prompt formatting.^[20] On GPQA Diamond and other reasoning-heavy benchmarks, the model trails dedicated reasoning systems such as DeepSeek R1, GPT-5 in reasoning mode, and Claude Opus 4.7 with extended thinking enabled, which is consistent with Mistral's own positioning of the December 2025 release as a non-reasoning generalist with a reasoning variant to follow.^[6]

The Artificial Analysis Intelligence Index of 23 is a composite score that aggregates several public benchmarks and places Mistral Large 3 in the middle of the pack against the broader landscape of frontier and near-frontier models.^[4] The figure is below the index scores for the latest frontier reasoning systems but above many older non-reasoning open-weight releases. Time to first token in Artificial Analysis's hosted evaluation was 1.02 seconds, and output throughput averaged 49.3 tokens per second, both of which are below average among the models evaluated and reflect the relatively heavy serving footprint of a 41B-active MoE compared with smaller dense systems. When the model is hosted on Amazon Bedrock rather than Mistral's own API, the same evaluators measured 166.9 tokens per second of output, which suggests that the gating factor in the Mistral-hosted case is serving capacity rather than the model itself.^[4]

The lower SimpleQA score is the most pointed area of weakness in the published results. SimpleQA is designed to probe long-tail factual recall, and a sub-25 percent score implies that the model still hallucinates on obscure named entities and specific dates. This is a common weakness among open-weight generalists and is not unique to Mistral Large 3, but it is worth keeping in mind for applications that depend on grounded factual accuracy. Retrieval augmentation or function calling against authoritative knowledge sources is the standard mitigation, and Mistral has explicitly recommended pairing the model with retrieval pipelines for knowledge-heavy workloads.^[6]

On coding benchmarks beyond HumanEval, the picture is more mixed. Mistral Large 3 performs solidly on LiveCodeBench but does not lead the open-weight cohort, with GLM-4.6 and the strongest variants of Kimi K2 scoring higher on the most modern coding suites.^[6] Early SWE-Bench Verified figures, where third-party evaluators ran the model against real GitHub issues, placed Large 3 in a middle tier on agentic coding tasks. Mistral has acknowledged this and continued to recommend its specialized Devstral models for SWE-Bench-style workflows; the merged Mistral Small 4 release in March 2026 absorbed the Devstral agentic-coding lineage into a configurable reasoning model, and the Mistral Medium 3.5 release in April 2026 reported 77.6% on SWE-Bench Verified, comfortably above the launch Large 3 figure.^[22]^[24]

Comparison benchmark	Mistral Large 3	Mistral Medium 3	Notes
Context window	256K tokens	~128K tokens	Both released as multimodal-capable
API input price per 1M tokens	$0.50	$0.40	Medium 3 is slightly cheaper on input
API output price per 1M tokens	$1.50	$2.00	Large 3 is cheaper on output
Architecture	Sparse MoE (675B/41B)	Mid-tier (size not fully disclosed)	Medium 3 size not fully published
LMArena Elo (non-reasoning)	~1,418	Lower tier	Large 3 ranks higher
MMLU (multilingual)	~85.5%	~82% reported	Large 3 leads on multilingual
Release date	December 2, 2025	May 7, 2025	Seven-month gap

The head-to-head with Mistral Medium 3 is a useful reference point because Medium 3 had been the company's most capable hosted model for the seven months leading up to the December 2025 launch.^[5] Large 3 is positioned above Medium 3 on most public benchmarks, with the gap widening on multilingual prompts and on long-context tasks where the 256K window offers a clear advantage. Medium 3 remained in service as a less expensive option for production workloads that did not require the full Large 3 capabilities, and Mistral continued to update Medium 3 in parallel with Medium 3.1 and Medium 3.5 refreshes rather than positioning Large 3 as a direct replacement.^[24]

Is Mistral Large 3 open source? API and licensing

The instruction-tuned weights are published on Hugging Face as mistralai/Mistral-Large-3-675B-Instruct-2512, and a base checkpoint is available alongside the instruct version.^[3] Both checkpoints are released under the Apache 2.0 license, which permits commercial use, modification, and redistribution without a separate license fee.^[1]^[3] Mistral calls Large 3 "one of the best permissive open weight models in the world," and the Apache 2.0 choice is a notable shift from Mistral Large 2, which was released under the more restrictive Mistral Research License and required a separate commercial license for production use.^[1] Mistral has framed the Apache 2.0 choice as a deliberate return to the company's open-source roots after a period in 2024 and early 2025 during which several Mistral releases had been gated by research-only licenses.

The model is served through Mistral's own La Plateforme API and Mistral AI Studio at $0.50 per million input tokens and $1.50 per million output tokens.^[4]^[9] Mistral does not separately price image inputs at the standard tier, and the published rates apply to the full 256K context window. Comparable hosting on Amazon Bedrock, Azure AI Foundry, IBM watsonx, OpenRouter, Fireworks, Together AI, and Modal is available with provider-specific pricing that tracks the Mistral list price closely.^[10]^[12]^[13]^[26] NVIDIA NIM and AWS SageMaker integrations were announced as coming soon at launch and shipped during the first quarter of 2026. The model is also offered as a managed Amazon Bedrock endpoint with a serverless billing model, which means customers pay only for tokens consumed and not for provisioned capacity.^[12]

Provider	Input price per 1M tokens	Output price per 1M tokens	Notes
Mistral La Plateforme / AI Studio	$0.50	$1.50	First-party hosting, ~1.02-1.07s time to first token^[4]^[9]
Amazon Bedrock	$0.50	$1.50	Serverless, ~166.9 t/s output throughput in third-party tests^[4]^[12]
Azure AI Foundry / Microsoft Foundry	$0.50	$1.50	Available across eastus, westeurope, and other regions^[13]
IBM watsonx	Enterprise tier	Enterprise tier	Targeted at regulated industries^[26]
OpenRouter	Pass-through	Pass-through	Multi-provider routing^[10]
Fireworks / Together AI	Provider-specific	Provider-specific	Hosted on each provider's infrastructure
Modal	Self-managed	Self-managed	Customer brings own quotas
Self-hosted (Hugging Face)	Hardware cost only	Hardware cost only	Free weights under Apache 2.0^[3]

For self-hosters, Mistral recommends vLLM 1.12.0 or later as the inference engine of choice, and the FP8 checkpoint is designed to fit on a single eight-GPU node of H200 or B200 hardware.^[3] The NVFP4 quantization targets eight-GPU H100 and A100 nodes, and the BF16 release is provided for users with sufficient memory and bandwidth. The recommended vLLM command sets a maximum context length of 262,144 tokens with tensor parallelism of eight and the Mistral tokenizer, config, and load format flags. Hugging Face Transformers support was not yet available at launch, with Mistral noting that community contributions were welcome to add Transformers compatibility, and a working Transformers integration appeared in the open-source ecosystem during the first months of 2026.^[3]

Red Hat published a same-day deployment guide alongside the launch, using its OpenShift AI distribution to serve both the FP8 and NVFP4 checkpoints, and several other inference platforms including Hyperstack, DigitalOcean, and Lambda Cloud announced day-zero support.^[15] Mistral worked directly with NVIDIA on the launch and shipped TensorRT-LLM and SGLang kernel integrations in parallel with the model, along with optimizations for Blackwell GPUs and edge deployment paths through NVIDIA DGX Spark, RTX PCs, and Jetson devices.^[14] Most of the latter optimizations target the smaller Ministral 3 models rather than the flagship.

The Apache 2.0 license carries the usual provision that users must not use the model to infringe third-party intellectual property rights, but otherwise places no restrictions on commercial use or derivative works. Fine-tuning and continued pretraining on the published weights are explicitly permitted, and several third-party fine-tunes of Large 3 appeared on Hugging Face within days of release. The Mistral model card recommends a sampling temperature below 0.1 for production deployments and notes that higher temperatures are appropriate for creative use cases.^[3]

How does Mistral Large 3 compare to other models?

Mistral Large 3 is one of several heavy-MoE flagships that shipped in late 2025 and early 2026, and the most informative comparisons are with the other open-weight or open-API models in that cohort. The table below collects the headline specifications and the highest-confidence comparative numbers from public sources. Some figures are approximate, and the comparison is across non-reasoning baselines unless noted otherwise.

Model	Developer	Architecture	Total / active params	Context window	Weights	API input price per 1M tokens
Mistral Large 3	Mistral AI	Sparse MoE	675B / 41B	256K	Apache 2.0 open	$0.50
GPT-5	OpenAI	Closed; rumored MoE	Not disclosed	~400K	Closed	Higher tier
Claude Opus 4.7	Anthropic	Closed	Not disclosed	200K (1M variant available)	Closed	Higher tier
Gemini 3 Pro	Google DeepMind	Closed	Not disclosed	1M+	Closed	Mid tier
DeepSeek V4	DeepSeek	Sparse MoE	Not disclosed (V3 was 671B / 37B)	128K+	Open	Very low
Kimi K2.5	Moonshot AI	Sparse MoE	~1T / ~32B	256K+	Open	Low
Mistral Medium 3	Mistral AI	Mid-tier (size not disclosed)	Not disclosed	~128K	Closed-hosted	$0.40

Against the closed frontier triumvirate of GPT-5, Claude Opus 4.7, and Gemini 3 Pro, Mistral Large 3 trails the leaders on the hardest reasoning, agentic, and tool-use benchmarks, particularly when the closed models are run in reasoning or extended-thinking mode. That gap is consistent with the fact that Mistral has positioned the December 2025 release as a non-reasoning generalist; the reasoning variant announced for a later release is the more direct comparison and was not available at the time of writing.^[1] The Gemini 3 Pro LMArena Elo of roughly 1,501 is approximately 80 points higher than the Mistral Large 3 figure, which is a meaningful gap in pairwise preference terms but still leaves Mistral comfortably above older closed flagships.

Against the other large open-weight MoEs, the comparison is closer and more interesting. DeepSeek V4 and Kimi K2.5 are similarly sized sparse-MoE generalists released around the same window, and head-to-head benchmark differences on standard suites like MMLU-Pro and HumanEval are typically within a few points. Mistral's advantages relative to that group are stronger multilingual coverage outside English and Chinese, integrated multimodality with a vision encoder built into the same checkpoint, the Apache 2.0 license, and a hosted API priced below most U.S. competitors. DeepSeek V4 in particular is often cheaper to self-serve and stronger on some math and code reasoning benchmarks, while Mistral Large 3 tends to do better on multilingual dialogue and general instruction following. Independent reviewers have noted that on a composite general-intelligence index, Mistral Large 3 finishes behind DeepSeek 3.2, Kimi K2 Thinking, and GLM-4.6 when those models are evaluated in their best modes, but stays ahead of older open-weight generalists from 2024.^[6]

How does Mistral Large 3 differ from Mistral Large 2?

Against the previous Mistral generation, the differences are stark. Mistral Large 2 was a 123B-parameter dense model with a 128K context window, no native multimodality, and a research license. Mistral Large 3 doubles the context window, adds image inputs, adopts a permissive open-source license, and roughly quintuples the nominal parameter count while keeping per-token inference compute manageable through sparse activation. The non-reasoning generalist quality gain over Large 2 is substantial on most public benchmarks, although the gain on specialized reasoning tasks is modest given that neither model was reasoning-tuned at release.

Attribute	Mistral Large 2 (July 2024)	Mistral Large 3 (December 2025)
Architecture	Dense transformer	Sparse MoE + fused vision encoder^[3]
Total parameters	123B	~675B (~41B active)^[3]
Context window	128K tokens	256K tokens^[3]
Modalities	Text only	Text and image^[3]
License	Mistral Research License	Apache 2.0^[1]^[3]
Reasoning mode	No	No at launch (variant announced)^[1]

The in-family comparison with Mistral Medium 3 is informative as well. Medium 3 is positioned as a mid-tier model with strong cost-performance characteristics, while Large 3 is positioned as the most capable Mistral model overall.^[5] Medium 3 has a 128K context window and a slightly lower input price, but Large 3 leads on the harder reasoning benchmarks, on multilingual coverage, and on long-context tasks. The two models were complementary at launch, with Medium 3 covering high-volume cost-sensitive workloads and Large 3 covering the most demanding generalist tasks until later refreshes shifted the picture (see below).

How was Mistral Large 3 received?

Reception was generally positive at launch, with reviewers describing the model as the strongest open-weight non-reasoning generalist released to date by a Western lab.^[17]^[27] Mistral's choice to relicense under Apache 2.0 was widely highlighted as a meaningful shift, since it removes the commercial licensing friction that had constrained adoption of Mistral Large 2 in some enterprise settings. Coverage at Analytics Vidhya, DataCamp, and several developer-focused outlets noted that the model produced compact and readable code on first-look tests and held up well on multilingual prompts including Levantine Arabic dialect and several South and Southeast Asian languages.^[7]^[8] Reviewers at VentureBeat, eWeek, TechCrunch, and CNBC framed the release as a notable competitive move against the U.S. and Chinese open-weight cohorts.^[17]^[20]^[27]

The Artificial Analysis Intelligence Index score of 23 generated some discussion.^[4] Critics noted that the figure placed Mistral Large 3 in roughly the middle of the broader frontier and near-frontier cohort rather than at the top, and pointed out that the model's lack of a reasoning mode at release made head-to-head comparisons with the latest GPT-5 and Claude variants unfavorable. Supporters countered that the index aggregates benchmarks where reasoning models have a built-in advantage, and that the comparison is more meaningful once the reasoning variant of Mistral Large 3 ships. Independent reviewers who tested the model on practical agentic tasks generally reported that Mistral Large 3 handled function calling and structured output well, with reliable tool selection and good adherence to schema constraints, although they noted that the model is not designed to run long autonomous chains with hundreds of tool calls in the way some specialized agentic systems are.

The SimpleQA result drew the most pointed criticism. A sub-25 percent score on factual recall is a real limitation for any deployment that asks the model to answer trivia-style questions about people, places, or events without retrieval support. Several writeups recommended pairing Mistral Large 3 with a retrieval pipeline for production knowledge work, which is unsurprising advice for any non-reasoning generalist but worth stating explicitly here.^[6] The companion enterprise documentation from Mistral AI Studio echoes this recommendation and provides direct integrations with retrieval connectors for SharePoint, Google Drive, and several enterprise content management systems.

Within the open-source community, the most-discussed point was the deployment footprint. A 675B-parameter checkpoint, even with sparse activation and FP8 weights, requires a single eight-GPU node of H200 or B200 hardware to serve at full quality, and that is well above the hardware budget of most individual developers or small research groups.^[3] The NVFP4 quantization eases the cost somewhat by enabling H100 and A100 deployments, but the model remains out of reach for hobbyist single-GPU serving. Several community members called for a distilled or smaller dense variant in the Large 3 line; Mistral has indicated that the Ministral 3 family at 3B, 8B, and 14B parameters serves part of that role, although those models are dense and architecturally separate from the flagship.

French and European observers framed the release as a notable moment for European AI sovereignty.^[27] Mistral remains one of the few non-U.S. labs operating at the frontier of generalist language modeling, and the Apache 2.0 release positions Mistral Large 3 as a model that can be deployed in regulated settings without depending on a U.S. cloud provider. Several European cloud and enterprise vendors began offering hosted Mistral Large 3 inference within days of release. The launch was widely covered in French-language press and was framed in some commentary as a deliberate statement of European intent in the face of U.S. and Chinese model dominance.

On balance, Mistral Large 3 was received as a strong, well-engineered open-weight generalist with a clear product position: a permissive open-source flagship priced below most closed alternatives, with the trade-off that the highest-end reasoning, math, and agentic benchmarks still favor the closed competition until the reasoning variant ships.

Who uses Mistral Large 3? Enterprise adoption

One of the more striking features of the Mistral Large 3 launch was that it was accompanied by a large enterprise deal rather than left to find customers in the months after release. On December 1, 2025, one day before the public model announcement, Mistral and HSBC announced a multi-year strategic partnership giving the bank access to Mistral's full commercial model range, including future developments, with the explicit intent of integrating Mistral Large 3 and its successors into HSBC's internal systems on a self-hosted basis.^[16] The deal covered banking applications including credit processing, customer onboarding, fraud detection, and anti-money-laundering workflows.

Mistral followed the HSBC deal with the broader Mistral Forge platform, announced at NVIDIA GTC on March 17, 2026. Forge is positioned as an enterprise platform that lets organizations build frontier-grade AI models trained on their own proprietary data, with support for full pre-training, post-training, and reinforcement learning on internal datasets.^[18] Early adopters announced at GTC included Ericsson for 5G and 5G Advanced network management, the European Space Agency, the Italian consulting company Reply, Singapore's DSO and HTX, and ASML for semiconductor manufacturing optimization.^[18]^[28] Mistral Large 3 is the most capable foundation model in the Forge stack as of the platform's launch, and Forge customers typically use the model as a base for domain-specific fine-tunes rather than running it directly in inference.

The Ericsson partnership was formalized in February 2026 with a multi-year agreement to apply Mistral's customization stack to telecom-specific use cases, including automation of legacy code translation, AI-assisted development for 6G research, and custom AI agents for complex network workflows; Ericsson also acts as a design partner for the Forge platform.^[28] Accenture announced a parallel co-development partnership in February 2026 covering more than 700,000 of its own employees, and Singapore's Singtel RE:AI announced a sovereign offering built on Mistral models in April 2026.^[31]

Within Le Chat Enterprise, Mistral's hosted conversational product, Mistral Large 3 replaced Mistral Medium 3 as the default model within weeks of release. Le Chat Enterprise had been launched earlier in 2025 and tripled its revenue in the hundred days following Large 3's integration, according to the company's own figures.^[18] The product features enterprise integrations with SharePoint, Google Drive, and a range of enterprise content management systems, plus flexible deployment options including self-hosted, sovereign-cloud, and on-premises deployments. Mistral has reported that enterprise traction has been particularly strong in Europe, with roughly 60 percent of company revenue coming from European customers as of mid-2025 and that share holding steady through early 2026.

In parallel with the commercial rollout, Mistral committed substantial capital to European infrastructure. In March 2026, the company raised approximately 830 million dollars in debt to fund the installation of about 13,800 NVIDIA GB300 GPUs at a Paris-region data center, bringing the facility to a total capacity of 44 megawatts.^[21] A separate 1.2 billion euro deal for a Swedish facility was announced to host follow-on training runs. The combined infrastructure footprint is one of the largest single-customer GPU orders in European AI history, and it signals that Mistral intends to train successor models, including the eventual Large 3 reasoning variant and any Large 4 generation, on its own sovereign infrastructure rather than relying on U.S. cloud providers.

What other Mistral 3 models followed Large 3?

Although Mistral Large 3 has remained the company's frontier flagship through May 2026, the Mistral 3 family expanded significantly in the months after launch. The official Mistral documentation changelog lists the following major model events between December 2025 and May 2026:^[25]

Date	Release	Highlights
Dec 2, 2025	Mistral Large 3 + Ministral 3 family	Flagship MoE + 3B/8B/14B dense models, Apache 2.0^[1]
Mar 12, 2026	Mistral Moderation 2603	Content moderation classifier; custom guardrails added to Agents API^[25]
Mar 16, 2026	Mistral Small 4 (`mistral-small-2603`) + Leanstral	Hybrid model unifying instruct, reasoning, and coding; 256K context. Leanstral is Mistral's first open-source code agent for Lean 4 formal proofs^[25]
Mar 17, 2026	Mistral Forge	Enterprise AI training platform announced at NVIDIA GTC, with ASML, Ericsson, ESA, Reply, DSO, HTX as early adopters^[18]
Mar 23, 2026	Voxtral TTS (`voxtral-tts-2603`)	Zero-shot voice cloning, 9 languages, real-time streaming; ~4B parameters; weights on Hugging Face^[25]^[29]^[30]
Apr 28, 2026	Mistral Medium 3.5	Frontier-class multimodal model with adjustable `reasoning_effort` parameter; 128B dense, 77.6% SWE-Bench Verified, 256K context, Modified MIT license^[22]^[24]^[25]

The release cadence has two clear consequences for the position of Mistral Large 3 itself. First, configurable reasoning, the design pattern that Mistral has chosen for its long chain-of-thought workloads, has now shipped in both Small 4 and Medium 3.5, but not yet in Large 3. The Large 3 reasoning variant remains in development; industry observers expect it to follow the same reasoning_effort toggle design rather than ship as a separate fixed-mode checkpoint.^[22]^[23] Second, Medium 3.5 now meaningfully overlaps with Large 3 on agentic and coding workloads, scoring 77.6% on SWE-Bench Verified versus Large 3's middle-tier figures on the same benchmark, while also offering adjustable reasoning that Large 3 lacks.^[24] Medium 3.5 is priced higher than Large 3 on output ($7.50 vs. $1.50 per million tokens), which preserves Large 3's cost-per-token advantage for general workloads even as Medium 3.5 takes the lead on the hardest reasoning and coding tasks.^[24]

Voxtral TTS, released on March 23-26, 2026, marked Mistral's first foray into speech synthesis and reported a 68.4% win rate against ElevenLabs Flash v2.5 on multilingual voice cloning, with weights distributed under CC BY-NC 4.0 for non-commercial use.^[25]^[29]^[30] It is architecturally separate from the language models but uses the Mistral 3 tokenizer infrastructure.

When will the Mistral Large 3 reasoning variant ship? Future development

Mistral indicated at launch that a dedicated reasoning variant of Large 3 would follow, stating that "a reasoning version is coming soon."^[1] As of May 2026 that variant had not yet shipped, but Mistral had released configurable-reasoning models elsewhere in its lineup, including the Ministral 3 reasoning variants in December 2025, the merged Mistral Small 4 in March 2026, and Mistral Medium 3.5 in April 2026.^[22]^[24] Industry observers expect the Large 3 reasoning variant to follow a similar configurable-reasoning design rather than a fixed chain-of-thought mode, and to ship during 2026 once the company's new Paris and Swedish data centers come online.^[21]^[23]

Beyond the reasoning variant, Mistral has continued to expand the Mistral 3 family. The launch of Mistral Forge in March 2026 established the company's enterprise customization platform, and the Voxtral TTS model in late March 2026 marked Mistral's first foray into speech synthesis.^[18]^[29] Subsequent updates to Mistral Medium 3, including the Medium 3.1 and Medium 3.5 refreshes in early 2026, brought integrated reasoning, coding, and vision capabilities into the mid-tier line.^[24] Mistral Large 3 has remained the company's frontier-grade flagship throughout this period and is expected to hold that position until either the reasoning variant of Large 3 ships or a Mistral Large 4 generation is announced.

The broader strategic context is that Mistral has positioned itself as a full-stack provider of European sovereign AI, with foundation models, an enterprise platform, a custom-model training service, and an expanding set of specialized applications. Mistral Large 3 is the technical anchor of that stack, and its open-weight Apache 2.0 release means that customers can self-host the model in regulated environments without depending on either Mistral or any U.S. cloud provider. That combination of frontier-grade capability, permissive licensing, and European hosting options has been the central selling point of the model in the months after launch.

References

Mistral AI. "Introducing Mistral 3." Mistral AI News, December 2, 2025. https://mistral.ai/news/mistral-3 ↩
Mistral AI. "Mistral Large 3." Mistral Documentation, 2025. https://docs.mistral.ai/models/mistral-large-3-25-12
Mistral AI. "Mistral-Large-3-675B-Instruct-2512." Hugging Face model card, December 2025. https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512 ↩
Artificial Analysis. "Mistral Large 3: Intelligence, Performance and Price Analysis." December 2025. https://artificialanalysis.ai/models/mistral-large-3 ↩
Artificial Analysis. "Mistral Large 3 vs Mistral Medium 3: Model Comparison." December 2025. https://artificialanalysis.ai/models/comparisons/mistral-large-3-vs-mistral-medium-3 ↩
IntuitionLabs. "Mistral Large 3: An Open-Source MoE LLM Explained." December 2025. https://intuitionlabs.ai/articles/mistral-large-3-moe-llm-explained ↩
DataCamp. "Mistral 3: A Look at the Model Family, Benchmarks, and More." December 2025. https://www.datacamp.com/blog/mistral-3 ↩
Analytics Vidhya. "Mistral Large 3: First Look and Testing." December 2025. https://www.analyticsvidhya.com/blog/2025/12/mistral-large-3/ ↩
LLM Stats. "Mistral Large 3 (675B Instruct 2512) Benchmarks, Pricing and Context Window." December 2025. https://llm-stats.com/models/mistral-large-latest ↩
OpenRouter. "Mistral Large 3 2512 API Pricing and Benchmarks." December 2025. https://openrouter.ai/mistralai/mistral-large-2512 ↩
Vercel AI Gateway. "Mistral Large 3 by Mistral AI: Specs, Pricing and API." December 2025. https://vercel.com/ai-gateway/models/mistral-large-3
AWS. "Mistral Large 3 and Ministral 3 family now available first on Amazon Bedrock." December 2025. https://aws.amazon.com/about-aws/whats-new/2025/12/mistral-large-3-ministral-3-family-available-amazon-bedrock/ ↩
Microsoft. "Mistral 3 on Microsoft Foundry: Open, multimodal, enterprise-ready." December 2025. https://azure.microsoft.com/en-us/blog/introducing-mistral-large-3-in-microsoft-foundry-open-capable-and-ready-for-production-workloads/ ↩
NVIDIA. "NVIDIA Partners With Mistral AI to Accelerate New Family of Open Models." December 2025. https://blogs.nvidia.com/blog/mistral-frontier-open-models/ ↩
Red Hat Developer. "Run Mistral Large 3 and Ministral 3 on vLLM with Red Hat AI on Day 0." December 2, 2025. https://developers.redhat.com/articles/2025/12/02/run-mistral-large-3-ministral-3-vllm-red-hat-ai ↩
HSBC. "HSBC and Mistral AI join forces to accelerate AI adoption across global bank." December 1, 2025. https://www.hsbc.com/news-and-views/news/media-releases/2025/hsbc-and-mistral-ai-join-forces-to-accelerate-ai-adoption-across-global-bank ↩
TechCrunch. "Mistral closes in on Big AI rivals with new open-weight frontier and small models." December 2, 2025. https://techcrunch.com/2025/12/02/mistral-closes-in-on-big-ai-rivals-with-mistral-3-open-weight-frontier-and-small-models/ ↩
TechCrunch. "Mistral bets on build-your-own AI as it takes on OpenAI, Anthropic in the enterprise." March 17, 2026. https://techcrunch.com/2026/03/17/mistral-forge-nvidia-gtc-build-your-own-ai-enterprise/ ↩
CNBC. "AI firm Mistral valued at $14 billion as chip giant ASML takes major stake." September 9, 2025. https://www.cnbc.com/2025/09/09/ai-firm-mistral-valued-at-14-billion-as-asml-takes-major-stake.html ↩
Medium (Barnacle Goose). "Mistral Large 3 (2512) Review." December 2025. https://medium.com/@leucopsis/mistral-large-3-2512-review-7788c779a5e4 ↩
CNBC. "Mistral secures $830 million in debt financing to fund AI data center near Paris." March 30, 2026. https://www.cnbc.com/2026/03/30/mistral-ai-paris-data-center-cluster-debt-financing.html ↩
Mistral AI. "Mistral Small 4: hybrid instruct, reasoning, and coding model." March 16, 2026. https://docs.mistral.ai/getting-started/changelog ↩
Aizolo. "Mistral AI Models 2026: A Complete Guide." 2026. https://aizolo.com/blog/mistral-ai-models-2026/ ↩
The Decoder. "Mistral's new flagship Medium 3.5 folds chat, reasoning, and code into one model." April 2026. https://the-decoder.com/mistrals-new-flagship-medium-3-5-folds-chat-reasoning-and-code-into-one-model/ ↩
Mistral AI. "Changelog." Mistral Documentation, 2026. https://docs.mistral.ai/getting-started/changelog ↩
IBM. "Mistral Large 3 now available on IBM watsonx." December 2025. https://www.ibm.com/new/announcements/mistral-large-3-now-available-on-ibm-watsonx ↩
CNBC. "French AI lab Mistral releases new AI models as it looks to keep pace with OpenAI and Google." December 2, 2025. https://www.cnbc.com/2025/12/02/mistral-unveils-new-ai-models-in-bid-to-compete-with-openai-google.html ↩
Ericsson. "Mistral AI and Ericsson partner to drive AI innovation in telecom." February 2026. https://www.ericsson.com/en/news/2026/2/mistral-ai-and-ericsson-partner-to-drive-ai-innovation-in-telecom ↩
TechCrunch. "Mistral releases a new open source model for speech generation." March 26, 2026. https://techcrunch.com/2026/03/26/mistral-releases-a-new-open-source-model-for-speech-generation/ ↩
VentureBeat. "Mistral AI just released a text-to-speech model it says beats ElevenLabs, and it's giving away the weights for free." March 2026. https://venturebeat.com/orchestration/mistral-ai-just-released-a-text-to-speech-model-it-says-beats-elevenlabs-and ↩
RCR Wireless. "Mistral AI targets industrial AI with Stellantis deal, ASML partnership." October 2025. https://www.rcrwireless.com/20251001/industry-4-0/mistral-ai-industrial-stellantis-asml ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

5 revisions by 1 contributors · full history

Suggest edit

What links here

AI Model Release Timeline (2022-2026)Best Open-Source LLMs LLM Size and Parameter Comparison Mistral 7B Mistral Small 4 Pixtral Pixtral Large

What is Mistral Large 3?

Key facts

Why did Mistral build Large 3?

What is Mistral Large 3's architecture?

How was Mistral Large 3 trained?

How good is Mistral Large 3? Benchmark performance

Is Mistral Large 3 open source? API and licensing

How does Mistral Large 3 compare to other models?

How does Mistral Large 3 differ from Mistral Large 2?

How was Mistral Large 3 received?

Who uses Mistral Large 3? Enterprise adoption

What other Mistral 3 models followed Large 3?

When will the Mistral Large 3 reasoning variant ship? Future development

See also

References

Improve this article

Related Articles

Mixtral 8x22B

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

What links here

Related Articles

Mixtral 8x22B

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

What links here