Ministral 3B / 8B
Last reviewed
Jun 8, 2026
Sources
5 citations
Review status
Source-backed
Revision
v1 · 1,473 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 8, 2026
Sources
5 citations
Review status
Source-backed
Revision
v1 · 1,473 words
Add missing citations, update stale details, or suggest a clearer explanation.
Ministral is a family of two small language models released by the French artificial intelligence company Mistral AI on October 16, 2024. Marketed under the wordplay name "les Ministraux," the family consists of Ministral 3B and Ministral 8B, named for their approximately 3 billion and 8 billion parameters. The release marked the first anniversary of Mistral 7B, the model that established Mistral AI as a leading open-weight developer. Both Ministral models target on-device and edge AI use cases, including local inference, privacy-sensitive applications, low-latency assistants, and lightweight agentic sub-tasks such as function calling within larger pipelines.[1][2]
The Ministral models occupy the smallest tier of Mistral AI's lineup, sitting below mid-sized and frontier models such as Mistral NeMo and Mistral Large. Mistral positioned the pair for what it described as a growing demand for "local, privacy-first inference for critical applications," citing examples such as on-device translation, internet-less smart assistants, local analytics, and autonomous robotics.[1] Beyond standalone deployment, Mistral promoted the models as efficient intermediaries inside agentic workflows, where a small model can perform input parsing, task routing, and API function-calling at low latency and cost before handing more demanding work to a larger model.[1]
Both models support a context window of up to 128,000 tokens, although the inference framework vLLM supported 32,000 tokens at launch.[1] Mistral released the family with benchmark claims asserting state-of-the-art performance in the sub-10-billion-parameter category, though some independent evaluations later tempered those claims (see Architecture and benchmarks).[3]
Mistral AI, founded in Paris in 2023 by former DeepMind and Meta AI researchers, built its early reputation on releasing capable open-weight models. Mistral 7B, released in September 2023 under the permissive Apache 2.0 license, demonstrated that a relatively compact model could outperform larger contemporaries on many tasks, and it became a widely adopted base for fine-tuning.[1]
The Ministral release arrived amid an industry-wide push toward smaller, more efficient models that can run on consumer hardware rather than data-center accelerators. In the months surrounding the launch, competing small models included Meta's Llama 3.2 1B and 3B, Google's Gemma 2 2B and 9B, and Microsoft's Phi series. These models reflected a shift in emphasis from raw scale toward efficiency, on-device privacy, and cost per token, trends that the Ministral family was explicitly designed to address.[2][3]
The family comprises two instruction-tuned models intended for different deployment points. Ministral 3B is the smaller and cheaper option, aimed at the most constrained environments, while Ministral 8B offers higher quality at a still-modest footprint. According to Mistral, Ministral 3B already outperformed the year-old Mistral 7B on most internal benchmarks despite having less than half the parameters, illustrating the efficiency gains achieved over the preceding year.[1][4]
The two models differ significantly in availability. Mistral released the weights for Ministral 8B Instruct on Hugging Face for research use, under the model identifier mistralai/Ministral-8B-Instruct-2410. Ministral 3B was not released as open weights and was offered through Mistral's commercial API only.[1][2] Both were made available on Mistral's hosted platform, La Plateforme, under the API model names ministral-8b-latest and ministral-3b-latest.[1]
The specifications below describe the original October 2024 release. In December 2025, Mistral introduced a successor generation branded "Ministral 3" (Ministral 3 3B, 8B, and 14B), released under the Apache 2.0 license, which should not be confused with the original Ministral family described here.[5]
| Specification | Ministral 3B | Ministral 8B (Instruct-2410) |
|---|---|---|
| Developer | Mistral AI | Mistral AI |
| Release date | October 16, 2024 | October 16, 2024 |
| Parameters | ~3 billion | 8,019,808,256 (~8B) |
| Architecture | Dense transformer | Dense transformer |
| Layers | Not publicly disclosed | 36 |
| Hidden dimension | Not publicly disclosed | 4,096 |
| Attention heads / KV heads | Not publicly disclosed | 32 / 8 (grouped-query attention) |
| Head dimension | Not publicly disclosed | 128 |
| Vocabulary size | Not publicly disclosed | 131,072 (V3-Tekken tokenizer) |
| Context window | Up to 128,000 tokens | Up to 128,000 tokens |
| Attention pattern | Standard | Interleaved sliding-window attention |
| Weights released | No (API only) | Yes (Instruct, research use) |
| License | Mistral Commercial License | Mistral Research License + Mistral Commercial License |
| La Plateforme price (input and output) | $0.04 / million tokens | $0.10 / million tokens |
Sources: Mistral AI announcement and the Ministral-8B-Instruct-2410 model card.[1][4] Detailed architecture figures for Ministral 3B were not published by Mistral.
Both Ministral models are dense transformer language models. The headline architectural feature is Ministral 8B's interleaved sliding-window attention, which Mistral describes as a "special interleaved sliding-window attention pattern for faster and memory-efficient inference."[1] The Hugging Face model card characterizes the attention configuration as a ragged pattern, alternating a full 128,000-token window with shorter 32,000-token sliding windows across layers, which reduces the memory cost of long-context inference while retaining the ability to attend across the full sequence.[4] Ministral 8B uses grouped-query attention with 8 key-value heads against 32 query heads, and the V3-Tekken tokenizer with a vocabulary of 131,072 tokens.[4]
On benchmarks, Mistral reported that the models set "a new frontier in knowledge, commonsense, reasoning, function-calling, and efficiency in the sub-10B category."[1] On the MMLU knowledge benchmark, Mistral's figures placed Ministral 3B at 60.9 and Ministral 8B at 65.0. For comparison, Mistral cited Gemma 2 2B at 52.4, Llama 3.2 3B at 56.2, and Llama 3.1 8B at 64.7, positioning each Ministral model ahead of its size-class peers.[3] Mistral further stated that Ministral 3B surpassed Gemma 2 2B and Llama 3.2 3B across AGIEval, TriviaQA, GSM8K, HumanEval, and multilingual tasks, while Ministral 8B outperformed both Mistral 7B and Llama 3.1 8B on most reported benchmarks.[3]
These comparisons should be read with attribution, because they come from Mistral's own evaluations. Independent testing by Artificial Analysis, reported by DeepLearning.AI's The Batch, reached more modest conclusions, placing Ministral 3B behind Llama 3.2 3B and Ministral 8B behind both Llama 3.1 8B and Gemma 2 9B on MMLU and MATH.[3] The discrepancy reflects the well-known sensitivity of small-model benchmarks to prompting, few-shot configuration, and evaluation methodology. The Ministral 8B Instruct model card also reports chat and coding results, including an MT-Bench score of 8.3, an Arena Hard score of 70.9, and a HumanEval pass@1 of 76.8.[4]
Licensing differs between the two models and is one of the most frequently misstated aspects of the release. Ministral 8B was published under two options: the Mistral Research License, a non-commercial license that permits research and evaluation use of the released weights, and the Mistral Commercial License for production deployment. The Hugging Face model card states plainly that commercial use of the open weights requires contacting Mistral AI for a license.[1][4] Ministral 3B was offered under the Mistral Commercial License only and was not distributed as open weights; access was through the API.[1]
Both models were available immediately on La Plateforme, Mistral's hosted API, priced at $0.04 per million tokens for Ministral 3B and $0.10 per million tokens for Ministral 8B, applied uniformly to input and output tokens.[1] Mistral indicated that the models would also become available through cloud partners, and that customers requiring self-deployment of either model could arrange commercial licensing directly. The open weights for Ministral 8B Instruct were distributed in formats compatible with the vLLM inference library, which Mistral recommended for production pipelines.[4]
The Ministral release reinforced the strategic importance of small, efficient models in 2024, a period when several major developers competed to deliver capable models that could run locally on phones, laptops, and edge devices rather than in the cloud. By pairing an API-only 3-billion-parameter model with a partially open 8-billion-parameter model, Mistral pursued a hybrid distribution strategy that balanced community access against commercial revenue, a shift from the fully permissive Apache 2.0 approach of Mistral 7B.[1][2]
The family's emphasis on function-calling and agentic sub-tasks also anticipated the broader move toward agent-based systems, in which inexpensive small models handle routing and tool use while larger models reason over complex problems. Although independent benchmarks suggested the original models did not uniformly lead their class, the Ministral line established a durable product tier for Mistral that the company continued to develop, culminating in the Apache 2.0-licensed Ministral 3 generation released in December 2025.[3][5]