# Mistral 7B

> Source: https://aiwiki.ai/wiki/mistral_7b
> Updated: 2026-06-21
> Categories: AI Companies, Large Language Models, Open Source AI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Mistral 7B** is a 7.3-billion-parameter, decoder-only [large language model](/wiki/large_language_model) released by [mistral ai](/wiki/mistral_ai) on September 27, 2023, under the [apache 2 license](/wiki/apache_2_license). It was the company's first publicly released model and one of the first 7B-class systems to outperform Meta's [llama 2](/wiki/llama_2) 13B across most standard benchmarks at the time of release, while also matching or beating the much larger [LLaMA 1](/wiki/llama) 34B on reasoning, math and code tasks.[^1][^2] The launch made a simple but consequential point: with the right architectural choices and a careful training mix, a 7B model could match a 13B competitor on most evaluations while costing far less to serve. Mistral 7B established Mistral AI as a serious player in foundation-model research less than five months after the company was founded, and it set the template that most subsequent dense open-weights LLMs would follow.[^1][^3]

The paper states the claim in one sentence: "Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation."[^1] The model achieves this with two architectural levers, again from the abstract: it "leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost."[^1]

The model shipped under the [apache 2 license](/wiki/apache_2_license), with weights distributed both through [hugging face](/wiki/hugging_face) and through a direct BitTorrent magnet link that Mistral posted on X (formerly Twitter) the day before the official blog announcement.[^2][^3][^4] That magnet link became something of a meme in open-source AI circles, partly because Llama 2's license at the time included acceptable-use restrictions and a 700-million-monthly-active-user clause that some saw as not quite "open." Mistral 7B contained no such restrictions. The launch blog put it plainly: "We're releasing Mistral 7B under the Apache 2.0 license, it can be used without restrictions."[^2][^5]

## Quick answer: what is Mistral 7B and why does it matter?

Mistral 7B is a 7.3-billion-parameter open-weights base language model, released September 27, 2023 as Mistral AI's first model and one of the most prominent open-weights releases from a European lab. It is decoder-only and built on the [LLaMA](/wiki/llama)-style recipe (RoPE, RMSNorm, SwiGLU), with two distinguishing efficiency features: [grouped-query attention](/wiki/grouped_query_attention) (32 query heads, 8 key-value heads) to shrink the [KV cache](/wiki/kv_cache) by 4x, and [sliding window attention](/wiki/sliding_window_attention) (a 4,096-token window across 32 layers) to process long sequences cheaply. At launch it outperformed [llama 2](/wiki/llama_2) 13B on every reported benchmark except NaturalQuestions while serving at roughly half the cost, and Mistral marketed it as performing "equivalently to a Llama 2 that would be more than 3x its size."[^1][^2] Its permissive [Apache 2.0](/wiki/apache_2_license) license made it the default starting point for a wave of commercial open-source fine-tunes such as Zephyr 7B, OpenHermes 2.5, and Starling-LM.[^27][^29][^31]

## Infobox

| Field | Value |
|---|---|
| Developer | [mistral ai](/wiki/mistral_ai) |
| Initial release | September 27, 2023[^2] |
| Latest version | Mistral 7B v0.3 / Instruct v0.3 (May 22, 2024)[^6] |
| Parameter count | ~7.24 billion (rounded to "7B" in the name; ~7.3 billion as quoted in the announcement)[^1][^2] |
| Architecture | Decoder-only Transformer with GQA + SWA, RoPE, RMSNorm, SwiGLU[^1] |
| Context length | 8,192 tokens (v0.1); 32,768 tokens (v0.2, v0.3)[^7][^8] |
| Vocabulary | 32,000 (v0.1) / 32,768 (v0.3)[^1][^6] |
| Tokenizer | SentencePiece byte-fallback BPE (v3 in v0.3)[^1][^6] |
| License | [Apache License 2.0](/wiki/apache_2_license)[^2][^3] |
| Paper | arXiv:2310.06825 (October 10, 2023)[^1] |

## When was Mistral 7B released, and by whom?

[mistral ai](/wiki/mistral_ai) was founded in April 2023 in Paris by Arthur Mensch, Guillaume Lample, and Timothée Lacroix.[^9] The three co-founders had originally met as students at the École Polytechnique outside Paris. Mensch had been a research scientist at [google deepmind](/wiki/google_deepmind), where he was one of the lead authors on the [Chinchilla](/wiki/chinchilla) scaling-laws paper. Lample and Lacroix had been research scientists at [meta ai](/wiki/meta_ai), where they were among the lead authors of the original [LLaMA](/wiki/llama) paper. Mensch took the CEO role, Lample became Chief Scientist, and Lacroix became Chief Technology Officer.[^9]

The new company raised a roughly €105 million ($113 million) seed round in June 2023, led by Lightspeed Venture Partners, with participation from Xavier Niel, JCDecaux Holding, Eric Schmidt, Bpifrance, Rodolphe Saadé, and others.[^10][^11] Reports at the time framed it as the largest seed round in European history, valuing the four-week-old company at roughly €240 million (around $260 million in USD).[^10][^11] The fundraise was widely cited as evidence that European investors were now willing to put nine-figure cheques behind frontier-AI research; the founders pitched the company as building open, sovereign foundation models as an alternative to closed US labs.

The first model was promised within months of the company's founding. Internally Mistral AI was building toward something larger (the mixture-of-experts model that would eventually ship as [Mixtral 8x7B](/wiki/mixtral)), but the team wanted an open release out the door first.[^12] That release was Mistral 7B, shipped on September 27, 2023, less than five months after the company was founded.[^2]

## Why did the Mistral 7B release matter?

A few things made Mistral 7B more than just another open-weights checkpoint:

- It demonstrated that a 7B-class model could be competitive with [llama 2](/wiki/llama_2) 13B at roughly half the inference cost. That mattered for both consumer hardware and large-scale serving.[^1][^2]
- The [apache 2 license](/wiki/apache_2_license) allowed full commercial use, modification, and redistribution, with no acceptable-use clause and no large-platform carve-outs. For many companies and researchers it was the most capable base model of its size yet available under such a permissive license.[^2][^3]
- Mistral AI was a young European startup rather than a US big-tech lab, and its release of competitive open weights became part of a larger argument about European AI sovereignty.[^9][^11]
- The model shipped with day-one support in [vllm](/wiki/vllm), [Text Generation Inference (TGI)](/wiki/huggingface_tgi), and [llama.cpp](/wiki/llama_cpp), which meant that within hours people were running it on consumer GPUs, laptops, and cloud instances.[^2][^3]
- The torrent-link release style, which Mistral repeated for [Mixtral 8x7B](/wiki/mixtral) in December 2023, set the tone for a particular kind of "drop the weights, write the paper later" engineering culture.[^4][^12]

For a company that had existed for under five months at the time of the release, all of this was unusually self-confident. It worked.

## The paper and its authors

The Mistral 7B technical report was posted to arXiv on October 10, 2023 as arXiv:2310.06825.[^1] The eighteen listed authors are Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed.[^1] Many of them carried over experience from [DeepMind](/wiki/google_deepmind), [Meta AI](/wiki/meta_ai), and [hugging face](/wiki/hugging_face), where Le Scao had been a lead author of the BigScience BLOOM paper. The paper's abstract opens with the model's design goal in the authors' own words: "We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency."[^1]

The blog post announcing the release went up on mistral.ai on September 27, 2023, with the headline "Mistral 7B, the best 7B model to date."[^2] It attributed the model to about three months of work by the newly assembled team.[^2]

## Architecture

Mistral 7B is a decoder-only [transformer](/wiki/transformer) in the same broad family as [LLaMA](/wiki/llama) and [llama 2](/wiki/llama_2). It keeps the now-standard combination of pre-normalisation with [rmsnorm](/wiki/rmsnorm), [swiglu](/wiki/swiglu) feed-forward layers, and [rotary position embeddings (RoPE)](/wiki/rotary_position_embedding) on the queries and keys.[^1][^13] The notable choices are at the attention level, where Mistral pairs [grouped query attention](/wiki/grouped_query_attention) (GQA) with [sliding window attention](/wiki/sliding_window_attention) (SWA).[^1]

The full configuration from Table 1 of the paper:[^1]

| Parameter | Value |
|---|---|
| Total parameters | ~7.24 billion |
| Layers (`n_layers`) | 32 |
| Model dimension (`dim`) | 4096 |
| Feed-forward hidden dimension (`hidden_dim`) | 14,336 |
| Attention heads (`n_heads`) | 32 |
| Key-value heads (`n_kv_heads`) | 8 |
| Head dimension (`head_dim`) | 128 |
| Vocabulary size (`vocab_size`) | 32,000 (byte-fallback BPE, Llama-style) |
| Sliding window (`window_size`) | 4096 tokens |
| Context length (`context_len`) | 8192 tokens |
| Positional encoding | RoPE |
| Normalisation | RMSNorm (pre-norm) |
| Activation | SwiGLU |
| Tokenizer | SentencePiece, Llama-style byte-fallback BPE |
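
As a sanity check on the ~7.24 billion figure, the Table 1 hyperparameters can be turned into a weight count. This is a minimal sketch that assumes bias-free linear layers and an output head that is not tied to the input embedding (an assumption about the released checkpoint, not something Table 1 states); `MistralConfig` and `count_parameters` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MistralConfig:
    # Hyperparameters from Table 1 of the Mistral 7B paper (v0.1).
    dim: int = 4096
    n_layers: int = 32
    head_dim: int = 128
    hidden_dim: int = 14336
    n_heads: int = 32
    n_kv_heads: int = 8
    window_size: int = 4096
    context_len: int = 8192
    vocab_size: int = 32000


def count_parameters(cfg: MistralConfig) -> int:
    """Approximate weight count, assuming no biases and an untied LM head."""
    q_out = cfg.n_heads * cfg.head_dim        # 4096 query features
    kv_out = cfg.n_kv_heads * cfg.head_dim    # 1024 features each for K and V
    attention = cfg.dim * q_out + 2 * cfg.dim * kv_out + q_out * cfg.dim
    mlp = 3 * cfg.dim * cfg.hidden_dim        # gate, up and down projections
    norms = 2 * cfg.dim                       # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = cfg.vocab_size * cfg.dim     # input embedding table
    lm_head = cfg.vocab_size * cfg.dim        # separate output projection
    final_norm = cfg.dim
    return cfg.n_layers * per_layer + embeddings + lm_head + final_norm


total = count_parameters(MistralConfig())
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")  # 7,241,732,096 parameters (~7.24B)
```

Almost 97% of the total sits in the 32 transformer blocks; the two vocabulary-sized matrices account for the remaining ~0.26 billion.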

The 32-to-8 ratio of query heads to key-value heads is the GQA factor, and it cuts the size of the [KV cache](/wiki/kv_cache) by 4x compared to standard multi-head attention; the GQA paper found that grouped configurations like this stay close to multi-head quality.[^1][^14] This is the single most important change for inference cost on long contexts.

### Why these particular ingredients

Mistral 7B did not introduce any individually new components. RMSNorm comes from Zhang and Sennrich's 2019 root-mean-square normalisation paper, SwiGLU from Shazeer's GLU variants work, and RoPE from Su et al. 2021.[^13][^15] The combination, however, is the one [LLaMA](/wiki/llama) popularised in early 2023, and Mistral 7B inherits it almost wholesale, swapping only the attention layer. The pre-norm placement (normalisation before each sub-block instead of after) has been the standard recipe for training stability since the GPT-2 and GPT-3 era.[^13]
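
A minimal NumPy sketch of RMSNorm and the pre-norm residual placement described above. The epsilon of 1e-5 is an assumption (it is the value commonly seen in published configs, but the paper does not state it), and `rms_norm` / `pre_norm_residual` are hypothetical helper names rather than any library's API.

```python
import numpy as np


def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """RMSNorm over the last axis: divide by the root mean square, then scale.

    Unlike LayerNorm there is no mean subtraction and no bias term.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight


def pre_norm_residual(x: np.ndarray, weight: np.ndarray, sublayer) -> np.ndarray:
    """Pre-norm placement: normalise the sub-block's input, then add its
    output back onto the un-normalised residual stream."""
    return x + sublayer(rms_norm(x, weight))


x = np.array([[3.0, 4.0]])
print(rms_norm(x, np.ones(2)))  # each entry divided by sqrt((9 + 16) / 2)
```

Because the residual stream itself is never normalised, gradients can flow through the `x + ...` path unchanged, which is the usual explanation for why pre-norm trains more stably than post-norm.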

The MLP follows the standard SwiGLU geometry: two parallel projections up to `hidden_dim = 14,336`, one passed through a SiLU gate and multiplied elementwise with the other, followed by a projection back down to `dim = 4096`. The 14,336 figure is exactly 3.5 times the model dimension, noticeably wider than the roughly 8/3 multiplier that [LLaMA](/wiki/llama) popularised for SwiGLU MLPs (LLaMA 7B uses 11,008 at the same 4096 width).[^15]
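
The sketch below shows the SwiGLU feed-forward computation and contrasts Mistral's 3.5x width with the LLaMA-style 8/3 rule. The rounding helper mirrors the "round up to a multiple of 256" step in Meta's LLaMA reference code; all function names are hypothetical, and the tiny weight matrices exist only to check shapes.

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))


def swiglu_mlp(x, w_gate, w_up, w_down):
    """down(silu(x @ W_gate) * (x @ W_up)); weights (dim, hidden) and (hidden, dim)."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def llama_style_hidden_dim(dim: int, multiple_of: int = 256) -> int:
    """8/3 * dim, rounded up to a multiple of 256 (LLaMA-style sizing)."""
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


print(llama_style_hidden_dim(4096))  # 11008 (LLaMA 7B)
print(14336 / 4096)                  # 3.5 (Mistral 7B)

# Tiny random weights with the same 3.5x ratio, just to check shapes.
rng = np.random.default_rng(0)
dim, hidden = 8, 28
x = rng.standard_normal((2, dim))
y = swiglu_mlp(x, rng.standard_normal((dim, hidden)),
               rng.standard_normal((dim, hidden)), rng.standard_normal((hidden, dim)))
print(y.shape)  # (2, 8)
```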

### What is grouped-query attention in Mistral 7B?

[Grouped-Query Attention](/wiki/grouped_query_attention) was introduced in the GQA paper by Joshua Ainslie and colleagues at Google in May 2023 (arXiv:2305.13245).[^14] The idea is a middle ground between vanilla multi-head attention, where every query head has its own key and value projections, and multi-query attention (MQA), where all query heads share a single set of key and value projections. GQA partitions the query heads into a smaller number of groups and gives each group its own key and value projection. Mistral 7B uses 32 query heads grouped into 8 KV heads, so each group of 4 query heads shares one set of K and V matrices.[^1][^14]
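
The grouping can be sketched in a few lines of NumPy: store 8 K/V heads, then let each one serve 4 consecutive query heads. This toy version materialises the repeated K/V tensors for readability; production kernels usually avoid that copy. `grouped_query_attention` is a hypothetical name, and the toy sizes (5 tokens, head_dim 16) are arbitrary.

```python
import numpy as np


def grouped_query_attention(q, k, v):
    """Causal GQA for one sequence.

    q: (n_heads, seq, head_dim); k, v: (n_kv_heads, seq, head_dim).
    Query head h reads KV head h // group, where group = n_heads // n_kv_heads.
    """
    n_heads, seq, head_dim = q.shape
    group = n_heads // k.shape[0]             # 32 // 8 = 4 for Mistral 7B
    # Share each KV head across `group` consecutive query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    causal = np.tril(np.ones((seq, seq), dtype=bool))
    scores = np.where(causal, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
q = rng.standard_normal((32, 5, 16))   # 32 query heads
k = rng.standard_normal((8, 5, 16))    # only 8 key heads ...
v = rng.standard_normal((8, 5, 16))    # ... and 8 value heads are stored
out = grouped_query_attention(q, k, v)
print(out.shape)  # (32, 5, 16)
```

Query heads 4 to 7 all read KV head 1, so their outputs differ only because their queries differ.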

The practical effect is that the [KV cache](/wiki/kv_cache), which dominates memory during autoregressive generation at long sequence lengths, shrinks by the group factor. That makes batched serving cheaper and lets the model fit longer contexts in the same memory budget. The GQA paper showed that uptraining a multi-head model into a GQA configuration recovers nearly all of the original quality, and Mistral 7B showed that a model trained from scratch with GQA could lead its size class.[^14]
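
The saving is easy to quantify. This sketch assumes 16-bit (2-byte) cache entries, batch size 1, and no rolling-buffer truncation; `kv_cache_bytes` is a hypothetical helper, and the 32-KV-head case is a hypothetical multi-head baseline with otherwise identical dimensions.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values (two tensors per layer)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


GIB = 1024 ** 3
per_token = kv_cache_bytes(32, 8, 128, seq_len=1)    # 128 KiB per token
gqa = kv_cache_bytes(32, 8, 128, seq_len=8192)       # Mistral 7B, 8 KV heads
mha = kv_cache_bytes(32, 32, 128, seq_len=8192)      # same model with 32 KV heads
print(per_token, gqa / GIB, mha / GIB, mha / gqa)    # 131072 1.0 4.0 4.0
```

At the full 8,192-token context, one sequence needs 1 GiB of cache with GQA versus 4 GiB without it, and that gap scales linearly with batch size.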

[llama 2](/wiki/llama_2) had already adopted GQA at the 34B and 70B sizes but kept full multi-head attention for the 7B and 13B variants.[^16] Mistral 7B was one of the first widely released sub-10B open models to ship with GQA, and the pattern was picked up almost immediately by the rest of the field. Within a year, GQA was the default for new dense decoder LLMs in roughly the 1B to 100B range, including [llama 3](/wiki/llama_3), [Gemma](/wiki/gemma), and [Qwen](/wiki/qwen) families.[^17][^18]

### How does sliding window attention work?

The second architectural choice is [sliding window attention](/wiki/sliding_window_attention), popularised for long documents by the Longformer model of Iz Beltagy, Matthew Peters, and Arman Cohan in April 2020 (arXiv:2004.05150).[^19] In a sliding window of size W, each token only attends to the previous W tokens rather than to the entire history. The cost of attention drops from O(n²) to O(n·W), and the receptive field grows linearly with depth: with 32 layers and a window of 4096, the theoretical receptive field reaches 32 × 4096 = 131,072 tokens, far beyond the nominal 8192 context length.[^1][^2] The launch blog summarised the mechanism this way: "each layer attends to the previous `4,096` hidden states."[^2]
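
A sliding-window mask is just a banded lower-triangular matrix. This sketch follows the paper's description that position i attends to positions i − W through i; `sliding_window_mask` is a hypothetical name, and the toy sizes (16 tokens, W = 4) are chosen so the counts are easy to check by hand.

```python
import numpy as np


def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: query i may attend to key j iff i - window <= j <= i."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j >= i - window)


n, W = 16, 4
swa = sliding_window_mask(n, W)
full = np.tril(np.ones((n, n), dtype=bool))   # ordinary causal mask
print(swa.sum(), full.sum())  # 70 136 attended query-key pairs
```

The causal mask's pair count grows as n²/2, while the sliding-window count grows as roughly n·(W + 1), which is where the O(n·W) cost comes from.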

Mistral pairs sliding-window attention with a rolling KV-cache buffer. At position i, only the keys and values for positions i − W to i − 1 are kept in memory; older entries are overwritten in place inside a fixed-size circular buffer of size W. The cache index at timestep i is simply i mod W. The result is that memory per layer stays constant once the sequence grows past the window size, however long the prompt and generated output become.[^1] On a 32k-token sequence, the paper reports the rolling buffer reduces cache memory usage by 8x relative to full attention without hurting quality, and the launch blog highlighted a 2x speed improvement over standard attention for a 16k-token sequence at a 4k window, on top of the GQA savings.[^1][^2]
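
The `i mod W` indexing can be illustrated with a toy circular buffer. `RollingKVCache` is a hypothetical class, and the string keys and values stand in for real tensors; sorting by position in `entries` is only for readability, whereas a real implementation would unroll the buffer by slot index.

```python
class RollingKVCache:
    """Fixed-size circular buffer holding keys/values for the last `window` positions."""

    def __init__(self, window: int):
        self.window = window
        self.slots = [None] * window   # each slot holds (position, key, value)

    def write(self, pos: int, key, value):
        # Position i always lands in slot i mod W, overwriting position i - W.
        self.slots[pos % self.window] = (pos, key, value)

    def entries(self):
        """Cached (position, key, value) tuples in increasing position order."""
        return sorted((s for s in self.slots if s is not None), key=lambda s: s[0])


cache = RollingKVCache(window=4)
for pos in range(10):
    cache.write(pos, key=f"k{pos}", value=f"v{pos}")
print([p for p, _, _ in cache.entries()])  # [6, 7, 8, 9]
```

After ten writes the buffer still has four slots, holding positions 6 to 9; the next token (position 10) attends to those plus itself, which is exactly the i − W to i window.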

For very long prompts, the paper also describes pre-fill chunking: split the prompt into chunks of size W, process them sequentially using a causal mask within each chunk and a sliding-window mask against cached prior chunks, and let the rolling cache accumulate the relevant state.[^1] In the original release Mistral promoted an "effective" context of 32K thanks to SWA plus rolling buffer, although in practice quality at very long contexts depended heavily on the use case.[^2] The v0.2 instruct model later increased the nominal context window to 32,768 tokens and dropped sliding-window attention from the default configuration, signalling that full attention with a longer base context had become the more common pattern across the field.[^7][^20]
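
The chunking scheme can be checked numerically: process the prompt W tokens at a time, carry only the last W keys and values forward, and compare the result with single-pass sliding-window attention. This is a single-head, single-layer sketch under the paper's i − W to i window convention; all names are hypothetical and it is not Mistral's reference code.

```python
import numpy as np


def _attend(q, k, v, q_pos, k_pos, window):
    # Allow key positions in [query_pos - window, query_pos].
    allowed = (k_pos[None, :] <= q_pos[:, None]) & (k_pos[None, :] >= q_pos[:, None] - window)
    scores = np.where(allowed, q @ k.T / np.sqrt(q.shape[-1]), -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v


def prefill_full(q, k, v, window):
    pos = np.arange(len(q))
    return _attend(q, k, v, pos, pos, window)


def prefill_chunked(q, k, v, window):
    """Process the prompt in chunks of size `window`, caching only the last W keys/values."""
    d = q.shape[-1]
    cache_k, cache_v = np.empty((0, d)), np.empty((0, d))
    cache_pos = np.empty(0, dtype=int)
    outputs = []
    for start in range(0, len(q), window):
        stop = min(start + window, len(q))
        pos = np.arange(start, stop)
        keys = np.concatenate([cache_k, k[start:stop]])
        vals = np.concatenate([cache_v, v[start:stop]])
        key_pos = np.concatenate([cache_pos, pos])
        outputs.append(_attend(q[start:stop], keys, vals, pos, key_pos, window))
        # Keep only the most recent W entries for the next chunk.
        cache_k, cache_v, cache_pos = keys[-window:], vals[-window:], key_pos[-window:]
    return np.concatenate(outputs)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((37, 8)) for _ in range(3))
print(np.allclose(prefill_full(q, k, v, 4), prefill_chunked(q, k, v, 4)))  # True
```

Each chunk only ever sees a 2W-sized key block (W cached plus W new), so peak attention memory during prefill is bounded by the window rather than by the prompt length.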

### The sliding window receptive-field math

The receptive-field calculation deserves a closer look because it is one of the easier-to-misread numbers in the paper. At layer 1, a token sees the previous W = 4096 tokens through one attention operation. At layer 2, each of those 4096 tokens has already aggregated information from a 4096-token window of its own, so the layer-2 query effectively reaches back roughly 2W tokens. At layer k the reachable span is approximately k·W. With k = 32 layers and W = 4096, the theoretical span is 32 × 4096 = 131,072 tokens.[^1]
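
The k·W growth can be verified by composing the one-layer mask with itself, treating each layer as a boolean "can read from" relation. This sketch uses the same i − W to i window convention and toy sizes (64 tokens, W = 4); `receptive_span` is a hypothetical helper.

```python
import numpy as np


def receptive_span(seq_len: int, window: int, n_layers: int) -> int:
    """How many tokens back the last position can draw information from
    after stacking `n_layers` sliding-window attention layers."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    one_layer = ((j <= i) & (j >= i - window)).astype(np.int64)
    reach = np.eye(seq_len, dtype=np.int64)
    for _ in range(n_layers):
        reach = np.minimum(reach @ one_layer, 1)   # boolean matrix product
    last = seq_len - 1
    earliest = np.flatnonzero(reach[last])[0]
    return last - earliest


for layers in (1, 2, 3):
    print(layers, receptive_span(64, 4, layers))   # spans 4, 8, 12: layers x W
print(32 * 4096)                                    # 131072, Mistral 7B's theoretical span
```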

The practical span is smaller, since information attenuates as it has to be re-aggregated layer by layer, but the construction explains how a model with a 4k attention window and an 8k training context can carry useful long-range signal much further than naïve attention would suggest. The Mistral team's reported [FlashAttention](/wiki/flash_attention) modifications also yielded a 2x speed boost for 16k-token sequences over the vanilla attention baseline.[^1]

### Tokenizer

The original v0.1 tokenizer was a [LLaMA](/wiki/llama)-style byte-fallback [BPE](/wiki/byte_pair_encoding) trained with [SentencePiece](/wiki/sentencepiece), with a vocabulary size of 32,000 tokens.[^1][^3] Byte-fallback BPE means that any character that is not covered by the learned merge vocabulary is encoded as a sequence of raw UTF-8 bytes; the tokenizer therefore never fails on unfamiliar characters or scripts. Mistral 7B v0.3 extended the vocabulary to 32,768 entries to make room for new control tokens and to improve efficiency on certain scripts.[^6] The v0.3 update introduced the "v3 tokenizer" packaged via the `mistral_common` library; later Mistral releases (NeMo, Pixtral, Mistral Large 2) used yet newer tokenizers, including the Tekken tokenizer derived from OpenAI's tiktoken.[^21]
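
Byte fallback is simple to illustrate at the character level. This is a toy sketch, not SentencePiece: `byte_fallback_encode` and its tiny `vocab` are hypothetical, and a real BPE tokenizer merges multi-character pieces before falling back. The `<0xNN>` spelling follows the byte-token naming used in Llama-style vocabularies.

```python
def byte_fallback_encode(text: str, vocab: set) -> list:
    """Emit known characters as-is; spell anything else as one <0xNN> token
    per UTF-8 byte, so encoding never fails."""
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens


print(byte_fallback_encode("hi€", vocab={"h", "i"}))
# ['h', 'i', '<0xE2>', '<0x82>', '<0xAC>']
```

The euro sign is a single character but three UTF-8 bytes, so an out-of-vocabulary character costs several tokens; that cost is why vocabulary coverage still matters for efficiency on some scripts.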

## Training

Mistral AI has not published a complete account of the training data or compute budget for Mistral 7B. The paper notes that the model was pretrained on data extracted from "the open Web" and emphasises that the model is a base model with no built-in moderation, leaving safety alignment to downstream fine-tuners.[^1] Total parameter count is approximately 7.24 billion when summed across embeddings, attention projections, and MLP layers, hence the "7B" name; the announcement blog rounds this to "7.3 billion."[^1][^2]

Hardware and exact token counts have not been disclosed in print. What is documented is the architectural recipe (the hyperparameters in Table 1) and the published evaluation numbers. The instruct variant was trained via [supervised fine-tuning](/wiki/supervised_fine-tuning) on publicly available instruction-following datasets, without [rlhf](/wiki/rlhf) or [DPO](/wiki/dpo) in the v0.1 release.[^1] The v0.1 paper additionally describes a content-moderation experiment in which the model was prompted to self-classify its own outputs into categories such as illegal activities, hateful content, and unqualified advice, with the authors reporting 99.4% precision and 95.6% recall on a curated adversarial test set.[^1]

## How does Mistral 7B compare to Llama 2 on benchmarks?

The Mistral 7B paper benchmarks the base model against [LLaMA 1](/wiki/llama) (7B, 13B, 33B), [llama 2](/wiki/llama_2) (7B, 13B), and [Code Llama](/wiki/code_llama) 7B across a standard suite of evaluations. The headline numbers from Table 2 of the paper (Mistral 7B vs the closest competitor, Llama 2 13B):[^1]

| Benchmark | Mistral 7B | Llama 2 13B | Llama 2 7B | Code-Llama 7B |
|---|---|---|---|---|
| [MMLU](/wiki/mmlu) (5-shot) | 60.1% | 55.6% | 44.4% | 36.9% |
| [HellaSwag](/wiki/hellaswag) (0-shot) | 81.3% | 80.7% | 77.1% | 62.9% |
| [WinoGrande](/wiki/winogrande) (0-shot) | 75.3% | 72.9% | 69.5% | 62.3% |
| PIQA (0-shot) | 83.0% | 80.8% | 77.9% | 72.8% |
| Arc-Easy | 80.0% | 75.2% | 68.7% | 59.4% |
| Arc-Challenge | 55.5% | 48.8% | 43.2% | 34.5% |
| NaturalQuestions | 28.8% | 29.0% | 24.7% | 11.0% |
| TriviaQA | 69.9% | 69.6% | 63.8% | 34.9% |
| [HumanEval](/wiki/humaneval) (pass@1) | 30.5% | 18.9% | 11.6% | 31.1% |
| MBPP | 47.5% | 35.4% | 26.1% | 52.5% |
| MATH | 13.1% | 6.0% | 3.9% | 5.2% |
| [GSM8K](/wiki/gsm8k) (8-shot, maj@8) | 52.2% | 34.3% | 16.0% | 20.8% |

Mistral 7B beat Llama 2 13B on every benchmark in the table except NaturalQuestions, where Llama 2 13B led by 0.2 points (29.0% vs 28.8%). On MMLU the gap was about 4.5 points, on GSM8K it was about 18 points, and on HumanEval it was about 11.6 points.[^1] The math and reasoning gaps were big enough that Mistral 7B was also competitive with or better than the much larger Llama 1 33B on those tasks, a comparison the launch blog turned into one of its headline framings.[^1][^2]
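
The per-benchmark gaps quoted above can be recomputed directly from the table. The scores below are copied from the table; rounding to one decimal only removes floating-point noise.

```python
# (Mistral 7B, Llama 2 13B) scores in percent, copied from the table above.
scores = {
    "MMLU": (60.1, 55.6), "HellaSwag": (81.3, 80.7), "WinoGrande": (75.3, 72.9),
    "PIQA": (83.0, 80.8), "Arc-Easy": (80.0, 75.2), "Arc-Challenge": (55.5, 48.8),
    "NaturalQuestions": (28.8, 29.0), "TriviaQA": (69.9, 69.6),
    "HumanEval": (30.5, 18.9), "MBPP": (47.5, 35.4), "MATH": (13.1, 6.0),
    "GSM8K": (52.2, 34.3),
}
# Positive gap means Mistral 7B is ahead.
gaps = {name: round(mistral - llama, 1) for name, (mistral, llama) in scores.items()}
print([name for name, gap in gaps.items() if gap < 0])  # ['NaturalQuestions']
print(gaps["MMLU"], gaps["GSM8K"], gaps["HumanEval"])    # 4.5 17.9 11.6
```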

The paper additionally reports an [MT-Bench](/wiki/mt_bench) score of 6.84 ± 0.07 for **Mistral-7B-Instruct-v0.1**, ahead of [llama 2](/wiki/llama_2) 13B Chat at 6.65 and ahead of all other 7B chat models at the time of publication.[^1] The paper also cites an independent human-preference evaluation on llmboxing.com/leaderboard, in which Mistral's outputs were preferred 5,020 times versus 4,143 times for Llama 2 13B Chat.[^1] On MMLU specifically, Mistral 7B Instruct v0.1 scored 56.3%, which is several points below the base model's 60.1%, a typical pattern for early instruction-tuned 7B models, where the tuning data was not designed to preserve knowledge benchmarks.[^22]

The headline framing in the release blog was that Mistral 7B "performs equivalently to a Llama 2 that would be more than 3x its size" on reasoning and reading comprehension.[^2] That framing was marketing-flavoured, but the underlying numbers held up to independent scrutiny on Hugging Face's Open LLM Leaderboard, where Mistral 7B sat near the top of its weight class for most of late 2023.[^23]

## Instruct variants and version history

Mistral has shipped several iterations under the Mistral 7B name. The headline differences are tokenizer changes, instruction-following data, function-calling support, and the move from 8K to 32K context.

| Variant | Release | Notes |
|---|---|---|
| Mistral-7B-v0.1 (base) | Sept 27, 2023 | Original base model. 8K context, 32k vocab, GQA + SWA.[^1][^2] |
| Mistral-7B-Instruct-v0.1 | Sept 27, 2023 | First instruct version, supervised fine-tune on public instruction data; MT-Bench 6.84.[^1][^22] |
| Mistral-7B-Instruct-v0.2 | Dec 11, 2023 | Improved instruction following. 32K context (RoPE θ = 1e6), SWA disabled.[^7][^20] |
| Mistral-7B-v0.2 (base) | March 23, 2024 | Base release matching v0.2 instruct architecture, posted during a hackathon at SHACK15 in San Francisco co-hosted with Cerebral Valley.[^24][^25] |
| Mistral-7B-v0.3 (base) | May 22, 2024 | Vocabulary extended to 32,768 entries; v3 tokenizer.[^6] |
| Mistral-7B-Instruct-v0.3 | May 22, 2024 | v3 tokenizer, function calling via `[TOOL_CALLS]`, `[AVAILABLE_TOOLS]`, `[TOOL_RESULTS]` control tokens.[^6][^26] |

### v0.1 to v0.2

v0.1 used a strict 8K context window and 4K sliding window. v0.2, released as an instruct fine-tune on December 11, 2023, raised the nominal context to 32,768 tokens, removed sliding-window attention from the default configuration, and increased the RoPE base frequency to θ = 1 × 10⁶ to better support long-context extrapolation.[^7][^20] In `config.json` terms, `sliding_window` was set to `null` and `rope_theta` was raised from 10,000.0 to 1,000,000.0, with `max_position_embeddings` at 32,768.[^7][^20]
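
One way to see what the `rope_theta` change does is to compute the wavelength of each rotary frequency pair under the standard RoPE formula, inv_freq_k = θ^(−2k/d). This sketch uses head_dim = 128 from Table 1; `rope_wavelengths` is a hypothetical helper, and the "longer wavelengths keep distant positions distinguishable" reading is a common intuition rather than a claim from the Mistral papers.

```python
import math


def rope_wavelengths(head_dim: int, theta: float) -> list:
    """Wavelength in tokens of each rotary frequency pair k = 0 .. head_dim/2 - 1,
    using inv_freq_k = theta ** (-2k / head_dim)."""
    return [2 * math.pi * theta ** (2 * k / head_dim) for k in range(head_dim // 2)]


for theta in (10_000.0, 1_000_000.0):   # v0.1 vs v0.2 rope_theta
    longest = rope_wavelengths(128, theta)[-1]
    print(f"rope_theta={theta:,.0f}: slowest pair repeats every ~{longest:,.0f} tokens")
```

The fastest pair is unchanged (wavelength 2π tokens in both cases), while the slowest pair stretches by roughly 93x, so the low-frequency dimensions rotate far more slowly across a 32k-token context.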

The matching base model was released about three months later in March 2024 at Mistral's hackathon at SHACK15 in San Francisco, co-hosted with the Cerebral Valley community.[^24][^25] It was the first official release of non-instruct v0.2 weights. Because the official `mistralai` organisation on Hugging Face did not initially host the v0.2 base weights, the early redistribution lived at `mistral-community/Mistral-7B-v0.2` and `alpindale/Mistral-7B-v0.2-hf`.[^25] v0.2 became the workhorse for fine-tuning experiments throughout 2024 because it kept the 7.3B parameter count and Apache 2.0 license but added the longer context that downstream applications had started to expect.

### v0.3

The v0.3 generation extended the vocabulary from 32,000 to 32,768 entries, using the new slots for control tokens such as `[TOOL_CALLS]`, `[AVAILABLE_TOOLS]`, and `[TOOL_RESULTS]` that drive the structured [function-calling](/wiki/function_calling) format.[^6][^26] The model issues a function call by emitting `[TOOL_CALLS]` followed by a JSON payload, and tool results are passed back to it wrapped in `[TOOL_RESULTS]` markers; tool-call IDs are constrained to exactly nine alphanumeric characters.[^6] The v0.3 release accompanied a broader push by Mistral to support agentic workloads alongside the [Mixtral 8x7B](/wiki/mixtral) and Mistral Large product lines.[^26]
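
Client code that constructs tool calls by hand has to respect the nine-character ID rule. The helpers below are a hypothetical sketch of generating and validating such IDs; they are not part of `mistral_common`, which handles the actual message serialisation.

```python
import random
import re
import string

_TOOL_CALL_ID = re.compile(r"[A-Za-z0-9]{9}")


def new_tool_call_id(rng=random) -> str:
    """Generate a random nine-character alphanumeric tool-call ID."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(9))


def is_valid_tool_call_id(call_id: str) -> bool:
    """True only for exactly nine ASCII letters or digits."""
    return _TOOL_CALL_ID.fullmatch(call_id) is not None


call_id = new_tool_call_id()
print(call_id, is_valid_tool_call_id(call_id))
```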

### Chat templates

The instruct variants use a chat template centred on the `[INST]` and `[/INST]` control tokens. The very first user instruction is preceded by the `<s>` begin-of-sentence token; subsequent instructions are not. Assistant generation ends with the `</s>` end-of-sentence token. A typical multi-turn sequence looks like:

```
<s>[INST] What is your favourite condiment? [/INST]
Well, I'm quite partial to a good squeeze of fresh lemon juice...</s>
[INST] Do you have mayonnaise recipes? [/INST]
```

The template is built into the `tokenizer.apply_chat_template` method in [Hugging Face Transformers](/wiki/transformers_library), which handles the formatting automatically when supplied with a list of `{"role": ..., "content": ...}` messages.[^7][^22]
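
For readers who want to see the formatting rules spelled out, here is a hypothetical plain-Python approximation of the v0.1-style template. It emits no line breaks between turns, and the exact whitespace has varied between template revisions, so `apply_chat_template` remains the safer choice in practice. Note also that `<s>` and `</s>` appear here as literal text, whereas a real tokenizer maps them to special token IDs.

```python
def format_mistral_instruct(messages: list) -> str:
    """Build an [INST]-style prompt from alternating user/assistant turns:
    <s> once at the start, user turns wrapped in [INST] ... [/INST],
    and every assistant turn closed with </s>."""
    prompt = "<s>"
    for index, message in enumerate(messages):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if expected == "user":
            prompt += f"[INST] {message['content']} [/INST]"
        else:
            prompt += f"{message['content']}</s>"
    return prompt


conversation = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice..."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
print(format_mistral_instruct(conversation))
```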

## Reception and downstream fine-tunes

Mistral 7B was downloaded heavily within hours of release and immediately became the base model for a wave of community fine-tunes. A few of the most influential:

- **Zephyr 7B Beta** from Hugging Face H4 (released October 2023, paper "Zephyr: Direct Distillation of LM Alignment," arXiv:2310.16944) used [Direct Preference Optimization (DPO)](/wiki/direct_preference_optimization_dpo) on UltraFeedback over an UltraChat-tuned Mistral 7B base. It reached an [MT-Bench](/wiki/mt_bench) score of 7.34 and a 90.6% [AlpacaEval](/wiki/alpacaeval) win rate, the highest MT-Bench for a 7B open chat model at the time and even ahead of Llama 2 Chat 70B on chat tasks.[^27][^28]
- **OpenHermes 2.5 Mistral 7B** from Teknium (November 3, 2023) was fine-tuned on roughly 1 million examples of primarily GPT-4-generated instruction data plus code; it became one of the most downloaded community chat models of late 2023 and popularised the ChatML format on Mistral. Reported scores included HumanEval pass@1 of 50.7%, TruthfulQA 53.04%, AGI-Eval 43.07%, and a GPT4All average of 73.12.[^29]
- **Notus 7B** from Argilla (November 2023) was a DPO fine-tune that started from Hugging Face H4's `zephyr-7b-sft-full` and applied a re-binarised version of UltraFeedback using preference ratings rather than the original critique scores. It reached MT-Bench 7.30 and AlpacaEval 91.42%, slightly above Zephyr Beta on the latter.[^30]
- **Starling-LM-7B-alpha** from a UC Berkeley team (Banghua Zhu and colleagues, November 2023) used Reinforcement Learning from AI Feedback ([rlaif](/wiki/rlaif)) with an Advantage-induced Policy Alignment (APA) algorithm, the Nectar preference dataset, and the Starling-RM-7B-alpha reward model, on top of OpenChat 3.5, which was itself a Mistral 7B fine-tune. Starling reached MT-Bench 8.09, behind only GPT-4 and GPT-4 Turbo at the time.[^31]
- **Dolphin-2.x-Mistral-7B** from Eric Hartford and the Cognitive Computations community (December 2023 onwards) focused on producing an uncensored chat model on the Mistral base, widely used for role-play and creative applications. Dolphin-2.8-Mistral-7B-v02 (March 2024) was a full-weights fine-tune of Mistral 7B v0.2 with a 16K sequence length, trained for about three days on a node of ten L40S GPUs provided by Crusoe Cloud.[^32]

The architectural pattern of GQA plus sliding-window attention plus RoPE plus RMSNorm plus SwiGLU, with a roughly 4x query-to-KV-head ratio, became the default recipe for new dense open-weights LLMs in the 2024 to 2026 period. Models from Alibaba's [qwen](/wiki/qwen) line, Google's [gemma](/wiki/gemma) line, Meta's [llama 3](/wiki/llama_3) line, and several others adopted the same general blueprint, with variations on window size or whether to keep SWA at all.[^17][^18]

The release pattern (weights first, paper later, no application form) has also stuck. Within a year, the default community expectation for a serious open release was Apache 2.0 or a similarly permissive license, weights on [hugging face](/wiki/hugging_face), day-one support in popular inference engines, and at most a brief blog post. Anything more restrictive started to look defensive.[^4][^5]

## Strategic and commercial impact

For Mistral AI itself, the success of the 7B release set up a sequence of larger funding rounds:

- **December 11, 2023:** €385 million ($428 million) Series A at a roughly €2 billion valuation, led by Andreessen Horowitz with participation from Salesforce, BNP Paribas, Lightspeed, and others.[^33][^34]
- **February 26, 2024:** $16 million strategic investment from Microsoft alongside an Azure distribution partnership that made Mistral Large the second LLM hosted natively on Azure AI Studio after OpenAI's.[^35][^36]
- **June 11, 2024:** €600 million ($640 million) Series B at a €5.8 billion ($6 billion) valuation, led by General Catalyst, with €468 million in equity and €132 million in debt; participating investors included Nvidia, Andreessen Horowitz, Salesforce Ventures, IBM, Samsung Venture, Cisco, ServiceNow, and others.[^37]
- **September 9, 2025:** €1.7 billion ($2 billion) Series C at an €11.7 billion ($13.7-14 billion) valuation in which the Dutch lithography giant ASML invested €1.3 billion and took an 11% stake on a fully diluted basis, becoming Mistral's largest single shareholder and lead investor.[^38][^39]

Existing investors Nvidia, DST Global, Andreessen Horowitz, Bpifrance, General Catalyst, Index Ventures, and Lightspeed also participated in the Series C alongside ASML.[^38][^39] Over the same period the company became one of the most-cited examples of European AI capacity in policy discussions about sovereignty and competitiveness.

The wider ecosystem of fine-tunes built on Mistral 7B is hard to count precisely. As of 2026, the Hugging Face hub lists thousands of derivative models, including instruction-tuned variants in dozens of languages, role-play and uncensored models, code-focused fine-tunes, retrieval-augmented setups, and small-scale reasoning models. Many of the early "Mistral" community fine-tunes were the first widely used non-Meta open-weights chat models that people felt they could deploy commercially without legal review.[^5][^17]

## How do you run Mistral 7B locally?

Part of the reason Mistral 7B took off so quickly is that the inference story was extremely friendly. Day-one support landed in [vllm](/wiki/vllm), [text-generation-inference (TGI)](/wiki/huggingface_tgi), and [llama.cpp](/wiki/llama_cpp).[^2][^3] Within a week there were quantised [GGUF](/wiki/gguf), [GGML](/wiki/ggml), GPTQ, AWQ, and [EXL2](/wiki/exl2) builds on Hugging Face from community contributors, several of which fit comfortably on a single 8 GB consumer GPU.[^40]

Concrete deployment numbers worth noting:

- In bf16, the model weights take roughly 14 GB. A single 16 GB consumer GPU like an RTX 4080 can serve it without offload.
- 4-bit GGUF [quantization](/wiki/quantization) brings the file size to around 4 GB, which means it runs at acceptable speed on M1 and M2 MacBooks and on CPUs with enough RAM.[^40]
- The KV cache savings from GQA mean that, for a given batch and context, Mistral 7B uses about a quarter of the cache memory of an equivalent multi-head 7B model.[^14]
- The official launch blog cited roughly a 2x speed improvement over standard attention for a 16k-token sequence with a 4k sliding window.[^2]
- The rolling buffer cache reported in the paper reduced cache memory usage by 8x on a 32k-token sequence relative to dense attention.[^1]
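
The weight-memory figures above follow from the parameter count times bits per weight. This sketch uses the ~7.24 billion count derived from the Table 1 configuration and ignores KV cache, activations, and runtime overhead; `weight_gb` is a hypothetical helper. The 4.5 bits/weight case assumes a block format that stores one 16-bit scale per 32 four-bit weights, which is why real "4-bit" files land nearer 4 GB than 3.6 GB.

```python
N_PARAMS = 7_241_732_096  # approximate weight count from the Table 1 config


def weight_gb(bits_per_weight: float, n_params: int = N_PARAMS) -> float:
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


print(f"bf16:      {weight_gb(16):.1f} GB")   # 14.5 GB
print(f"pure 4-bit: {weight_gb(4):.1f} GB")   # 3.6 GB
print(f"4.5 bpw:   {weight_gb(4.5):.1f} GB")  # 4.1 GB with per-block scales
```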

Tooling support spread quickly. [ollama](/wiki/ollama) added a pre-packaged Mistral 7B build very early, and the model became one of the most-downloaded entries in the Ollama library through 2024 and 2025.[^41] LM Studio, Jan, and GPT4All offered local builds, and the major commercial inference hosts (Together AI, Anyscale, Fireworks, Replicate, OpenRouter, and others) offered hosted Mistral 7B endpoints within weeks of release. The official `mistralai/Mistral-7B-v0.1` repository logged over 500,000 downloads in its first month and well above 900,000 monthly downloads for stretches of 2024 and 2025, putting it among the most-downloaded open-weights causal-LM repositories on the platform.[^42][^43]

### Recommended fine-tuning tools

Because v0.1 was released under Apache 2.0 with no acceptable-use clause, the post-launch fine-tuning ecosystem was unusually wide. The most commonly used wrappers for Mistral 7B fine-tuning include Hugging Face's [Transformers](/wiki/transformers_library) plus [TRL](/wiki/huggingface_trl) (TRL ships built-in SFT and DPO trainers), [PEFT](/wiki/huggingface_peft) for parameter-efficient training, [LoRA](/wiki/lora) and [QLoRA](/wiki/qlora) for low-cost adapter fine-tunes, and Mistral's own `mistral-finetune` repository released alongside v0.3.[^44] [LLaMA-Factory](/wiki/llama_factory) also added Mistral support among its first batch of non-LLaMA architectures.
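LoRA freezes each pretrained weight matrix W and trains a low-rank update B·A alongside it. The number of trainable parameters therefore depends only on the rank and on the shapes of the targeted matrices. The sketch below counts them for the four attention projections of Mistral 7B. It assumes the paper's dimensions (model dim 4,096, 8 KV heads of size 128) and the `q_proj`/`k_proj`/`v_proj`/`o_proj` module names used by the Hugging Face Transformers implementation. The choice of targets and ranks is illustrative, not a recommended recipe.

```python
HIDDEN = 4096          # model dimension
KV_DIM = 8 * 128       # 8 KV heads x head_dim 128 under GQA
N_LAYERS = 32

# (in_features, out_features) for each attention projection in one decoder layer.
ATTN_PROJ = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(rank: int, targets: dict = ATTN_PROJ, n_layers: int = N_LAYERS) -> int:
    """Trainable LoRA parameters: A is (rank x d_in) and B is (d_out x rank) per target."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in targets.values())
    return per_layer * n_layers


for r in (8, 16, 64):
    n = lora_params(r)
    print(f"rank {r:>2}: {n:,} trainable params ({n / 7.3e9:.3%} of 7.3B)")
```

At rank 16 this comes to 13,631,488 trainable parameters, under 0.2% of the base model. That small count is a large part of why adapter fine-tunes are cheap. QLoRA reduces memory further by keeping the frozen base weights in 4-bit.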

## Is Mistral 7B open source?

Mistral 7B is released under the [Apache License, version 2.0](/wiki/apache_2_license), one of the most permissive licenses in use for foundation models. The release used two distribution channels at once. The official Hugging Face repository at `mistralai/Mistral-7B-v0.1` (and the corresponding instruct variants) hosted the SafeTensors weights.[^3] Separately, the Mistral team posted a BitTorrent magnet link on social media a day before the blog post went live. The torrent contained the same weights plus a sample inference script.[^4][^45]

The license carries no acceptable-use addendum, no platform-size restrictions, no separate research-only clause, and no requirement to identify model outputs.[^2][^3] By contrast, [llama 2](/wiki/llama_2)'s "Community License" at the time included a 700-million-monthly-active-user threshold above which a separate license from Meta was required, an acceptable-use policy, and a ban on using Llama 2 outputs to improve other large language models.[^16][^46] Mistral's own framing in the launch blog was unambiguous: the model "can be used without restrictions."[^2]

The combination of a recognised permissive license, a clean state-of-the-art claim at the 7B size, and a low barrier to actually running the thing was the trifecta that drove adoption.[^4][^5]

## The Mistral model family after 7B

Mistral 7B was the first in what has become a wide line of releases. The most relevant follow-ups for understanding its place in the family:

| Model | Released | Notes |
|---|---|---|
| Mistral 7B | Sept 27, 2023 | Dense 7.3B, Apache 2.0.[^2] |
| [Mixtral 8x7B](/wiki/mixtral) | Dec 11, 2023 | Sparse [mixture-of-experts](/wiki/mixture_of_experts): 8 feed-forward experts per layer, 2 routed per token. About 46.7B total parameters and ~13B active; the experts share the attention layers, so the total is well below 8 × 7B. Apache 2.0.[^12] |
| Mistral Medium | Dec 2023 | First proprietary commercial model (closed weights).[^9] |
| [Mistral Large](/wiki/mistral_large) | Feb 26, 2024 | Closed-weights commercial flagship, first hosted on Azure via the Microsoft partnership.[^35][^36] |
| [Mixtral 8x22B](/wiki/mixtral_8x22b) | April 2024 | Bigger MoE successor to Mixtral 8x7B. Apache 2.0.[^9] |
| Mistral 7B v0.3 | May 22, 2024 | Updated tokenizer (32,768-entry vocab), function calling.[^6] |
| Codestral 22B | May 29, 2024 | Code-focused dense model under the Mistral Non-Production License.[^47] |
| Codestral [Mamba](/wiki/mamba) 7B | July 16, 2024 | First Mistral model using the Mamba state-space architecture.[^9] |
| Mathstral 7B | July 16, 2024 | Math-focused fine-tune.[^9] |
| Mistral NeMo 12B | July 18, 2024 | 12B model built with NVIDIA, 128K context, Tekken tokenizer.[^21] |
| [Pixtral 12B](/wiki/pixtral) | September 2024 | First multimodal Mistral release; based on the NeMo 12B text backbone.[^48] |
| [Ministral 3B / 8B](/wiki/ministral) | October 2024 | Smaller models for edge use.[^9] |
| Mistral Small 3 (24B) | January 30, 2025 | 24B dense, Apache 2.0, ~81% MMLU, 32K context.[^49] |
| Mistral Small 3.1 / 3.2 | March 2025 / June 2025 | Successive updates to the 24B Small line.[^9] |
| [Mistral Medium 3](/wiki/mistral_medium_3) | May 2025 | Enterprise-grade dense model.[^9] |
| Magistral Small / Medium | June 2025 | Reasoning-focused models.[^9] |
| [Mistral Large 3](/wiki/mistral_large_3) | December 2, 2025 | Sparse MoE flagship successor with 675B total / 41B active parameters.[^50] |

By the time Mistral Large 3 shipped in late 2025, the original 7B was no longer the company's headline product, but it had not been retired. The base v0.3 weights remained one of the most heavily downloaded checkpoints on Hugging Face and stayed in active use for fine-tuning, distillation, and edge deployment.[^43][^50]

## What are the limitations of Mistral 7B?

Mistral 7B is, by 2026 standards, a small model. There are clear limits.

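- **Knowledge is dated.** Mistral has not published an exact training-data cutoff, but the pretraining data predates the September 2023 release, putting the effective cutoff at roughly mid-2023. Anything after that has to come from retrieval, fine-tuning, or in-context examples.[^1]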
- **It is strongest in English.** Multilingual coverage is acceptable but not on par with later Mistral releases like NeMo 12B or with comparable multilingual-first models.[^21]
- **Short context.** The 8K context of v0.1 is short by current standards. v0.2 and v0.3 raised this to 32K but still fall short of the very long contexts (200K+) now common in flagship proprietary models.[^7][^20]
- **No built-in moderation.** Mistral has been explicit that safety alignment is left to downstream users, which some see as an advantage and others as a shortcoming.[^1][^3]
- **Surpassed by newer 7B to 9B-class models.** [Llama 3.1 8B](/wiki/llama_3_1), Qwen2 and Qwen2.5 7B, [Gemma 2 9B](/wiki/gemma_2), and several task-specific fine-tunes all surpass v0.1 on most public leaderboards. For pure capability per parameter, Mistral 7B is no longer state of the art.[^17][^18]
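- **Sliding-window trade-offs.** In v0.1 each layer attends to at most the previous 4,096 tokens, so information from further back can reach a position only indirectly, by propagating through successive layers. Mistral dropped SWA entirely in v0.2, which moved to a full 32K context with rope-theta raised to 1e6.[^7][^20]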
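A minimal sketch of a v0.1-style sliding-window mask makes this trade-off concrete. Each query position sees only a band of recent keys, yet stacking layers lets information hop further back, up to roughly layers × window positions. The paper gives about 131K tokens for 32 layers with W = 4,096. This sketch assumes a query attends to itself and the W - 1 preceding tokens. Implementations differ by one at the boundary, and real attention kernels never materialise the full boolean matrix.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend to key j (causal and within the window)."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def theoretical_span(n_layers: int, window: int) -> int:
    """Upper bound on how far back information can travel after n_layers of windowed attention."""
    return n_layers * window


for row in sliding_window_mask(6, 3):
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx

print(theoretical_span(32, 4096))  # 131072
```

Each row contains at most `window` allowed keys, so per-layer attention cost grows linearly with sequence length rather than quadratically. The price is that long-range dependencies must be relayed through depth instead of being attended to directly.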

What it remains useful for: a strong, well-documented, permissively licensed baseline for fine-tuning research and a standard reference architecture for understanding the GQA-plus-SWA design pattern.

## Recent status (2025 to 2026)

In 2025 and into 2026 Mistral 7B continues to show up as the default starting point for academic fine-tuning papers, for university courses on LLM internals, and for production deployments where a small, locally hosted, permissively licensed model is the right fit. Mistral AI has not deprecated it. The v0.3 weights are still served from the official Hugging Face organisation, and [ollama](/wiki/ollama), [llama.cpp](/wiki/llama_cpp), [vllm](/wiki/vllm), and [TGI](/wiki/huggingface_tgi) all maintain support.[^41][^43]

Mistral AI itself has shifted its public emphasis toward larger commercial models ([Mistral Large 3](/wiki/mistral_large_3), [Mistral Medium 3](/wiki/mistral_medium_3), Magistral Medium) and toward the [Mixtral](/wiki/mixtral) MoE line. The September 2025 partnership and €1.3 billion investment from ASML, which gave the Dutch lithography company an 11% stake and made it Mistral's biggest single shareholder, signalled that the company is positioning itself as a long-term European AI champion with deep ties to the European semiconductor industry.[^38][^39] In late 2025 and early 2026 Mistral also broke ground on data centres near Paris and in Sweden, supported by a roughly $830 million infrastructure round, the first dedicated computing build-out of that scale for a European AI lab.[^9]

The original 7B sits in the company's history the same way [LLaMA 1](/wiki/llama) sits in Meta's: the first one out the door, the proof of concept, the model that made everything afterward easier to ship.

## See also

- [mistral ai](/wiki/mistral_ai)
- [Mixtral 8x7B](/wiki/mixtral)
- [mixtral 8x22b](/wiki/mixtral_8x22b)
- [mistral large](/wiki/mistral_large)
- [mistral large 3](/wiki/mistral_large_3)
- [mistral medium 3](/wiki/mistral_medium_3)
- [pixtral](/wiki/pixtral)
- [codestral](/wiki/codestral)
- [llama 2](/wiki/llama_2)
- [llama 3](/wiki/llama_3)
- [grouped query attention](/wiki/grouped_query_attention)
- [sliding window attention](/wiki/sliding_window_attention)
- [rotary position embedding](/wiki/rotary_position_embedding)
- [rmsnorm](/wiki/rmsnorm)
- [swiglu](/wiki/swiglu)
- [byte pair encoding](/wiki/byte_pair_encoding)
- [sentencepiece](/wiki/sentencepiece)
- [apache 2 license](/wiki/apache_2_license)
- [hugging face](/wiki/hugging_face)
- [vllm](/wiki/vllm)
- [llama cpp](/wiki/llama_cpp)
- [ollama](/wiki/ollama)
- [kv cache](/wiki/kv_cache)
- [gguf](/wiki/gguf)

## References

[^1]: Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., de las Casas, D., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R., Lachaux, M.-A., Stock, P., Le Scao, T., Lavril, T., Wang, T., Lacroix, T., El Sayed, W., "Mistral 7B", arXiv preprint, 2023-10-10. https://arxiv.org/abs/2310.06825. Accessed 2026-06-21.

[^2]: Mistral AI, "Mistral 7B: The best 7B model to date, Apache 2.0", Mistral AI blog, 2023-09-27. https://mistral.ai/news/announcing-mistral-7b. Accessed 2026-06-21.

[^3]: Hugging Face, "mistralai/Mistral-7B-v0.1 model card", Hugging Face Hub, 2023. https://huggingface.co/mistralai/Mistral-7B-v0.1. Accessed 2026-06-21.

[^4]: Franzen, C., "Mistral AI bucks release trend by dropping torrent link to new open source LLM", VentureBeat, 2023-09-27. https://venturebeat.com/ai/mistral-ai-bucks-release-trend-by-dropping-torrent-link-to-new-open-source-llm. Accessed 2026-06-21.

[^5]: Maiberg, E., "$260 Million AI Company Releases Undeletable Chatbot That Gives Detailed Instructions on Murder, Ethnic Cleansing", 404 Media, 2023-09-29. https://www.404media.co/260-million-ai-company-releases-chatbot-that-gives-detailed-instructions-on-murder-ethnic-cleansing/. Accessed 2026-06-21.

[^6]: Hugging Face, "mistralai/Mistral-7B-Instruct-v0.3 model card", Hugging Face Hub, 2024-05-22. https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3. Accessed 2026-06-21.

[^7]: Hugging Face, "mistralai/Mistral-7B-Instruct-v0.2 model card", Hugging Face Hub, 2023-12-11. https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2. Accessed 2026-06-21.

[^8]: Together AI, "Mistral (7B) Instruct v0.2 API", Together AI model catalog, 2024. https://www.together.ai/models/mistral-7b-instruct-v0-2. Accessed 2026-06-21.

[^9]: Wikipedia contributors, "Mistral AI", Wikipedia, 2026 revision. https://en.wikipedia.org/wiki/Mistral_AI. Accessed 2026-06-21.

[^10]: Dillet, R., "France's Mistral AI blows in with a $113M seed round at a $260M valuation to take on OpenAI", TechCrunch, 2023-06-13. https://techcrunch.com/2023/06/13/frances-mistral-ai-blows-in-with-a-113m-seed-round-at-a-260m-valuation-to-take-on-openai/. Accessed 2026-06-21.

[^11]: Tech.eu, "FantAstIque! French start-up Mistral AI raises a €105 million Seed round in its first month of existence", Tech.eu, 2023-06-14. https://tech.eu/2023/06/14/fantastique-french-start-up-mistral-ai-raises-a-105-million-seed-round-in-its-first-month-of-existence/. Accessed 2026-06-21.

[^12]: Mistral AI, "Mixtral of experts: A high quality Sparse Mixture-of-Experts", Mistral AI blog, 2023-12-11. https://mistral.ai/news/mixtral-of-experts. Accessed 2026-06-21.

[^13]: Touvron, H., Lavril, T., Izacard, G. et al., "LLaMA: Open and Efficient Foundation Language Models", arXiv:2302.13971, 2023-02-27. https://arxiv.org/abs/2302.13971. Accessed 2026-06-21.

[^14]: Ainslie, J., Lee-Thorp, J., de Jong, M., Zemlyanskiy, Y., Lebron, F., Sanghai, S., "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints", arXiv:2305.13245, 2023-05-22. https://arxiv.org/abs/2305.13245. Accessed 2026-06-21.

[^15]: Shazeer, N., "GLU Variants Improve Transformer", arXiv:2002.05202, 2020-02-12. https://arxiv.org/abs/2002.05202. Accessed 2026-06-21.

[^16]: Touvron, H., Martin, L., Stone, K. et al., "Llama 2: Open Foundation and Fine-Tuned Chat Models", arXiv:2307.09288, 2023-07-18. https://arxiv.org/abs/2307.09288. Accessed 2026-06-21.

[^17]: Llama Team, "The Llama 3 Herd of Models", arXiv:2407.21783, 2024-07-31. https://arxiv.org/abs/2407.21783. Accessed 2026-06-21.

[^18]: Google DeepMind, "Gemma: Open Models Based on Gemini Research and Technology", arXiv:2403.08295, 2024-03-13. https://arxiv.org/abs/2403.08295. Accessed 2026-06-21.

[^19]: Beltagy, I., Peters, M. E., Cohan, A., "Longformer: The Long-Document Transformer", arXiv:2004.05150, 2020-04-10. https://arxiv.org/abs/2004.05150. Accessed 2026-06-21.

[^20]: Mistral AI Labs (@MistralAILabs), "New release: Mistral 7B v0.2 Base", X (Twitter), 2024-03-23. https://x.com/MistralAILabs/status/1771670765521281370. Accessed 2026-06-21.

[^21]: Mistral AI, "Mistral NeMo", Mistral AI blog, 2024-07-18. https://mistral.ai/news/mistral-nemo. Accessed 2026-06-21.

[^22]: Hugging Face, "mistralai/Mistral-7B-Instruct-v0.1 model card", Hugging Face Hub, 2023-09-27. https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1. Accessed 2026-06-21.

[^23]: Beeching, E. et al., "Open LLM Leaderboard", Hugging Face Spaces, 2023 to 2024. https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard. Accessed 2026-06-21.

[^24]: Cerebral Valley (@cerebral_valley), "@MistralAI just announced Mistral 7B v0.2 Base Model at our hackathon at @SHACK15sf", X (Twitter), 2024-03-23. https://x.com/cerebral_valley/status/1771630171679776900. Accessed 2026-06-21.

[^25]: Hugging Face, "mistral-community/Mistral-7B-v0.2 (base, March 2024 release)", Hugging Face Hub, 2024-03. https://huggingface.co/mistral-community/Mistral-7B-v0.2. Accessed 2026-06-21.

[^26]: MarkTechPost, "Mistral AI Team Releases the Mistral-7B-Instruct-v0.3", MarkTechPost, 2024-05-22. https://www.marktechpost.com/2024/05/22/mistral-ai-team-releases-the-mistral-7b-instruct-v0-3-an-instruct-fine-tuned-version-of-the-mistral-7b-v0-3/. Accessed 2026-06-21.

[^27]: Hugging Face, "HuggingFaceH4/zephyr-7b-beta model card", Hugging Face Hub, 2023-10. https://huggingface.co/HuggingFaceH4/zephyr-7b-beta. Accessed 2026-06-21.

[^28]: Tunstall, L., Beeching, E., Lambert, N. et al., "Zephyr: Direct Distillation of LM Alignment", arXiv:2310.16944, 2023-10-25. https://arxiv.org/abs/2310.16944. Accessed 2026-06-21.

[^29]: Hugging Face, "teknium/OpenHermes-2.5-Mistral-7B model card", Hugging Face Hub, 2023-11-03. https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B. Accessed 2026-06-21.

[^30]: Bartolomé, A. and Vila-Suero, D., "Introducing Notus: A DPO fine-tune of Zephyr with a focus on high-quality data", Hugging Face Blog, 2023-11-29. https://huggingface.co/blog/alvarobartt/notus-7b-v1. Accessed 2026-06-21.

[^31]: Hugging Face, "berkeley-nest/Starling-LM-7B-alpha model card", Hugging Face Hub, 2023-11. https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha. Accessed 2026-06-21.

[^32]: Hugging Face, "cognitivecomputations/dolphin-2.8-mistral-7b-v02 model card", Hugging Face Hub, 2024-03. https://huggingface.co/cognitivecomputations/dolphin-2.8-mistral-7b-v02. Accessed 2026-06-21.

[^33]: Tech.eu, "Mistral AI confirms €385M Series A funding round", Tech.eu, 2023-12-11. https://tech.eu/2023/12/11/mistral-ai-confirms-385m-series-a-funding-round/. Accessed 2026-06-21.

[^34]: Latham & Watkins, "Latham Represents a16z on €385 Million Series A Funding Round of Mistral AI", Latham & Watkins press release, 2023-12. https://www.lw.com/en/news/2023/12/latham-advises-a16z-on-mistral-ai-series-a-funding-round. Accessed 2026-06-21.

[^35]: Microsoft Azure, "Microsoft and Mistral AI announce new partnership to accelerate AI innovation and introduce Mistral Large first on Azure", Microsoft Azure blog, 2024-02-26. https://azure.microsoft.com/en-us/blog/microsoft-and-mistral-ai-announce-new-partnership-to-accelerate-ai-innovation-and-introduce-mistral-large-first-on-azure/. Accessed 2026-06-21.

[^36]: Browne, R., "Microsoft invests in Europe's Mistral AI to expand beyond OpenAI", CNBC, 2024-02-26. https://www.cnbc.com/2024/02/26/microsoft-invests-in-europes-mistral-ai-to-expand-beyond-openai.html. Accessed 2026-06-21.

[^37]: Dillet, R., "Paris-based AI startup Mistral AI raises $640M", TechCrunch, 2024-06-11. https://techcrunch.com/2024/06/11/paris-based-ai-startup-mistral-ai-raises-640-million/. Accessed 2026-06-21.

[^38]: CNBC, "AI firm Mistral valued at $14 billion as chip giant ASML takes major stake", CNBC, 2025-09-09. https://www.cnbc.com/2025/09/09/ai-firm-mistral-valued-at-14-billion-as-asml-takes-major-stake.html. Accessed 2026-06-21.

[^39]: Mistral AI, "Mistral AI raises €1.7B to accelerate technological progress with AI", Mistral AI blog, 2025-09-09. https://mistral.ai/news/mistral-ai-raises-1-7-b-to-accelerate-technological-progress-with-ai. Accessed 2026-06-21.

[^40]: Hugging Face, "TheBloke/Mistral-7B-v0.1-GGUF model card", Hugging Face Hub, 2023-09. https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF. Accessed 2026-06-21.

[^41]: Ollama, "mistral", Ollama model library, 2023 to 2026. https://ollama.com/library/mistral. Accessed 2026-06-21.

[^42]: Quantumrun Foresight, "Mistral 7B Statistics and User Trends", Quantumrun, 2025. https://www.quantumrun.com/consulting/mistral-7b-statistics/. Accessed 2026-06-21.

[^43]: Hugging Face, "mistralai/Mistral-7B-v0.3 model card", Hugging Face Hub, 2024-05-22. https://huggingface.co/mistralai/Mistral-7B-v0.3. Accessed 2026-06-21.

[^44]: Mistral AI, "mistral-finetune (GitHub repository)", GitHub, 2024. https://github.com/mistralai/mistral-finetune. Accessed 2026-06-21.

[^45]: Slashdot, "$260 Million AI Startup Releases 'Unmoderated' Chatbot Via Torrent", Slashdot, 2023-09-29. https://slashdot.org/story/23/09/29/2024216/260-million-ai-startup-releases-unmoderated-chatbot-via-torrent. Accessed 2026-06-21.

[^46]: Meta AI, "Llama 2 Community License Agreement", Meta AI, 2023-07. https://ai.meta.com/llama/license/. Accessed 2026-06-21.

[^47]: Mistral AI, "Codestral: Hello, World!", Mistral AI blog, 2024-05-29. https://mistral.ai/news/codestral. Accessed 2026-06-21.

[^48]: Wiggers, K., "Mistral releases Pixtral 12B, its first multimodal model", TechCrunch, 2024-09-11. https://techcrunch.com/2024/09/11/mistral-releases-pixtral-its-first-multimodal-model/. Accessed 2026-06-21.

[^49]: Mistral AI, "Mistral Small 3", Mistral AI blog, 2025-01-30. https://mistral.ai/news/mistral-small-3. Accessed 2026-06-21.

[^50]: Wikipedia contributors, "Mistral AI (model timeline)", Wikipedia, 2026 revision. https://en.wikipedia.org/wiki/Mistral_AI#Models. Accessed 2026-06-21.

