Pixtral Large
Last reviewed
Jun 8, 2026
Sources
4 citations
Review status
Source-backed
Revision
v1 · 1,299 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 8, 2026
Sources
4 citations
Review status
Source-backed
Revision
v1 · 1,299 words
Add missing citations, update stale details, or suggest a clearer explanation.
Pixtral Large is a multimodal (vision-language) large language model released by Mistral AI on November 18, 2024. With roughly 124 billion total parameters, it was the company's frontier multimodal flagship at launch, pairing a large text decoder built on Mistral Large 2 with a dedicated vision encoder so the model can interpret images, documents, charts, and diagrams alongside text. It is the second and larger entry in Mistral's Pixtral family of vision-language models, following the smaller open-weights Pixtral 12B released in September 2024. [1][2]
Pixtral Large extends Mistral Large 2 with image understanding while retaining that model's text-only capabilities. Mistral positioned it as demonstrating "frontier-level image understanding," with particular strength on documents, charts, and natural images. The model accepts interleaved text and image inputs and produces text outputs, making it suited to tasks such as reading scanned documents, answering questions about figures and infographics, and reasoning over mathematical content presented visually. [1][2]
The model carries a 128,000-token context window, which Mistral states is large enough to fit a minimum of 30 high-resolution images, allowing it to process a long document together with many accompanying images in a single request. At release, Mistral made the model available through its developer API as pixtral-large-latest, in its Le Chat consumer assistant, and as downloadable open weights on Hugging Face under a research license. [1][2]
Mistral AI is a Paris-based artificial intelligence company founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothee Lacroix, several of whom previously worked at Meta AI and Google DeepMind. The company built its reputation on releasing capable open-weights language models, including the Mistral 7B and Mixtral mixture-of-experts series, alongside commercial models served through its La Plateforme API. [1]
The Pixtral line is Mistral's family of multimodal models. The first member, Pixtral 12B, launched in September 2024 as an open-weights model under the permissive Apache 2.0 license, combining a 12-billion-parameter language backbone with a 400-million-parameter vision encoder. Pixtral Large, announced roughly two months later, scaled the same approach up to the company's flagship Mistral Large 2 decoder, trading Pixtral 12B's fully open license for a research-and-commercial licensing split appropriate to a frontier-scale model. [1][2]
Mistral has since continued the lineage with larger and newer multimodal systems; the company later released Mistral Large 3 and other models, and by 2026 listed Pixtral Large as a legacy model superseded by newer vision-capable releases. [3]
Pixtral Large uses a two-part design. The language component is a 123-billion-parameter multimodal decoder derived from Mistral Large 2 (released internally as Mistral-Large-Instruct-2407), which provides the model's text reasoning, generation, and instruction-following abilities. Attached to it is a roughly 1-billion-parameter vision encoder that converts input images into representations the decoder can attend to, bringing the combined total to approximately 124 billion parameters. [2]
This architecture follows the common vision-language model pattern of bolting a vision encoder onto a pretrained language decoder, so that the system inherits the text model's broad knowledge and reasoning while adding the ability to ground that reasoning in pixels. The model card distributes Pixtral Large as Mistral Large 24.11, and Mistral recommended the vLLM serving framework for self-deployment, noting at release that the standard Hugging Face Transformers implementation was not yet supported. The instruct version uses Mistral's V7 chat template with explicit system-prompt, instruction, and image tokens. [2]
| Property | Detail |
|---|---|
| Developer | Mistral AI |
| Announced | November 18, 2024 |
| Model type | Multimodal (vision-language) large language model |
| Total parameters | Approximately 124 billion |
| Language decoder | 123 billion parameters (built on Mistral Large 2) |
| Vision encoder | Approximately 1 billion parameters |
| Context window | 128,000 tokens (minimum of 30 high-resolution images) |
| Inputs / outputs | Text and images in; text out |
| API identifier | pixtral-large-latest |
| Weights name | Pixtral-Large-Instruct-2411 (Mistral Large 24.11) |
| License | Mistral Research License (non-commercial); Mistral Commercial License |
| Recommended serving | vLLM |
At release, Mistral reported that Pixtral Large delivered frontier-class results across a range of multimodal benchmarks, comparing it against contemporary models including OpenAI's GPT-4o (August 2024 version), Anthropic's Claude 3.5 Sonnet (the October 2024 update), Google's Gemini 1.5 Pro, and Meta's Llama 3.2 90B. The figures below are drawn from Mistral's published model card and should be read as the company's own reported results. [1][2]
| Benchmark (metric) | Pixtral Large score |
|---|---|
| MathVista (chain-of-thought) | 69.4 |
| MMMU (chain-of-thought) | 64.0 |
| ChartQA (chain-of-thought) | 88.1 |
| DocVQA (ANLS) | 93.3 |
| VQAv2 (VQA match) | 80.9 |
| AI2D (bounding box) | 93.8 |
| MM-MT-Bench | 7.4 |
Mistral highlighted MathVista, a benchmark for mathematical reasoning over visual data, where Pixtral Large scored 69.4 and which the company said outperformed all other models in its comparison set. On document and chart understanding, Mistral claimed the model surpassed GPT-4o and Gemini 1.5 Pro on DocVQA and ChartQA. Independent coverage noted that the model's 88.1 on ChartQA edged GPT-4o's reported 85.2 but trailed Claude 3.5 Sonnet's 89.1, illustrating that the leadership was task-dependent rather than uniform. On MM-MT-Bench, an open, judge-based multimodal evaluation meant to reflect real-world use, Mistral reported that Pixtral Large outperformed Claude 3.5 Sonnet, Gemini 1.5 Pro, and the latest GPT-4o. Mistral also pointed to the LMSys vision arena, where it said Pixtral Large ranked as the leading open-weights model. [1][2][4]
As with all vendor-reported benchmarks, exact methodology (prompting, chain-of-thought, and evaluation harness) can affect the numbers, and results on rapidly updated competitor models can shift; the comparisons above reflect the specific model versions Mistral evaluated in November 2024. Other vision-language models from the same period not in Mistral's comparison table, such as Alibaba's Qwen2-VL, were also competitive on several of these tasks. [2]
Pixtral Large was released under a dual-licensing model. The open weights are governed by the Mistral Research License (MRL), which permits use for research and educational purposes on a non-commercial basis. Any commercial use, including experimentation, testing, and production deployment, requires a separate Mistral Commercial License obtained from the company. This split mirrors the licensing Mistral applied to Mistral Large 2 and differs from the fully open Apache 2.0 terms of the smaller Pixtral 12B. [1][2]
For developers who prefer hosted access, the model was offered through Mistral's La Plateforme API under the name pixtral-large-latest and surfaced in the company's Le Chat assistant. Mistral also stated that Mistral Large 24.11 would become available through cloud partners shortly after launch, beginning with Google Cloud and Microsoft Azure. The downloadable weights were published on Hugging Face as Pixtral-Large-Instruct-2411 for self-hosting under the MRL. [1][2]
Pixtral Large marked Mistral AI's entry into frontier-scale multimodal modeling, extending the company's strategy of competitive open-weights releases from text into vision. By grounding a flagship-size language decoder in images, it gave developers and researchers a high-capacity alternative to proprietary multimodal systems such as GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, with weights available for inspection and non-commercial use. Its release reinforced the broader 2024 trend toward multimodal AI as a default capability for flagship models rather than a specialized add-on. While Mistral later superseded Pixtral Large with newer vision-capable models, it remained a notable milestone as the largest member of the Pixtral family and Mistral's first frontier-class vision-language release. [1][2][3]