# Pixtral Large

> Source: https://aiwiki.ai/wiki/pixtral_large
> Updated: 2026-06-28
> Categories: AI Models, Large Language Models, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Pixtral Large** is a 124-billion-parameter multimodal (vision-language) large language model released by [Mistral AI](/wiki/mistral_ai) on November 18, 2024. It pairs a 123-billion-parameter text decoder built on [Mistral Large 2](/wiki/mistral_large) with a roughly 1-billion-parameter vision encoder, giving the model a 128,000-token context window and the ability to interpret images, documents, charts, and diagrams alongside text. It is the second and larger entry in Mistral's [Pixtral](/wiki/pixtral) family of [vision-language models](/wiki/vision_language_model), following the smaller open-weights Pixtral 12B released in September 2024. [1][2]

## What is Pixtral Large?

Pixtral Large extends Mistral Large 2 with image understanding while retaining that model's text-only capabilities. Mistral describes it as the company's frontier multimodal flagship at launch: "Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding," the company wrote in its November 18, 2024 announcement. [1] The model accepts interleaved text and image inputs and produces text outputs, making it suited to tasks such as reading scanned documents, answering questions about figures and infographics, and reasoning over mathematical content presented visually. [1][2]

The model carries a 128,000-token context window, which Mistral states is large enough to fit a minimum of 30 high-resolution images, allowing it to process a long document together with many accompanying images in a single request. At release, Mistral made the model available through its developer API as `pixtral-large-latest`, in its [Le Chat](/wiki/le_chat) consumer assistant, and as downloadable open weights on Hugging Face under a research license. [1][2]

## Who built Pixtral Large?

Mistral AI is a Paris-based artificial intelligence company founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothee Lacroix, several of whom previously worked at [Meta AI](/wiki/meta_ai) and Google DeepMind. The company built its reputation on releasing capable open-weights language models, including the Mistral 7B and [Mixtral](/wiki/mixtral) mixture-of-experts series, alongside commercial models served through its La Plateforme API. [1]

The Pixtral line is Mistral's family of multimodal models. The first member, Pixtral 12B, launched in September 2024 as an open-weights model under the permissive Apache 2.0 license, combining a 12-billion-parameter language backbone with a 400-million-parameter vision encoder. Pixtral Large, announced roughly two months later, scaled the same approach up to the company's flagship Mistral Large 2 decoder, trading Pixtral 12B's fully open license for a research-and-commercial licensing split appropriate to a frontier-scale model. [1][2]

Mistral has since continued the lineage with larger and newer multimodal systems; the company later released [Mistral Large 3](/wiki/mistral_large_3) and other models, and by 2026 listed Pixtral Large as a legacy model superseded by newer vision-capable releases. [3]

## How is Pixtral Large built?

Pixtral Large uses a two-part design. The language component is a 123-billion-parameter multimodal decoder derived from Mistral Large 2 (released internally as Mistral-Large-Instruct-2407), which provides the model's text reasoning, generation, and instruction-following abilities. Attached to it is a roughly 1-billion-parameter vision encoder, about 2.5 times larger than the 400-million-parameter encoder in Pixtral 12B, that converts input images into representations the decoder can attend to, bringing the combined total to approximately 124 billion parameters. [2]

This architecture follows the common vision-language model pattern of bolting a vision encoder onto a pretrained language decoder, so that the system inherits the text model's broad knowledge and reasoning while adding the ability to ground that reasoning in pixels. The model card distributes Pixtral Large as Mistral Large 24.11, and Mistral recommended the vLLM serving framework for self-deployment, noting at release that the standard Hugging Face Transformers implementation was not yet supported. The instruct version uses Mistral's V7 chat template with explicit system-prompt, instruction, and image tokens. [2]

### What are Pixtral Large's specifications?

| Property | Detail |
|---|---|
| Developer | Mistral AI |
| Announced | November 18, 2024 |
| Model type | Multimodal (vision-language) large language model |
| Total parameters | Approximately 124 billion |
| Language decoder | 123 billion parameters (built on Mistral Large 2) |
| Vision encoder | Approximately 1 billion parameters |
| Context window | 128,000 tokens (minimum of 30 high-resolution images) |
| Inputs / outputs | Text and images in; text out |
| API identifier | `pixtral-large-latest` |
| Weights name | Pixtral-Large-Instruct-2411 (Mistral Large 24.11) |
| License | Mistral Research License (non-commercial); Mistral Commercial License |
| Recommended serving | vLLM |

## How does Pixtral Large perform on benchmarks?

At release, Mistral reported that Pixtral Large delivered frontier-class results across a range of multimodal benchmarks, comparing it against contemporary models including OpenAI's [GPT-4o](/wiki/gpt_4o) (August 2024 version), Anthropic's [Claude 3.5 Sonnet](/wiki/claude_3_5_sonnet) (the October 2024 update), Google's [Gemini 1.5 Pro](/wiki/gemini_1_5_pro), and Meta's [Llama 3.2](/wiki/llama_3_2_vision) 90B. The figures below are drawn from Mistral's published model card and should be read as the company's own reported results. [1][2]

| Benchmark (metric) | Pixtral Large score |
|---|---|
| MathVista (chain-of-thought) | 69.4 |
| MMMU (chain-of-thought) | 64.0 |
| ChartQA (chain-of-thought) | 88.1 |
| DocVQA (ANLS) | 93.3 |
| VQAv2 (VQA match) | 80.9 |
| AI2D (bounding box) | 93.8 |
| MM-MT-Bench | 7.4 |

Mistral highlighted MathVista, a benchmark for mathematical reasoning over visual data, where Pixtral Large scored 69.4 and which the company said outperformed all other models in its comparison set. On document and chart understanding, Mistral claimed the model surpassed GPT-4o and Gemini 1.5 Pro on DocVQA and ChartQA. Independent coverage noted that the model's 88.1 on ChartQA edged GPT-4o's reported 85.2 but trailed Claude 3.5 Sonnet's 89.1, illustrating that the leadership was task-dependent rather than uniform. On MM-MT-Bench, an open, judge-based multimodal evaluation meant to reflect real-world use, Mistral reported that Pixtral Large outperformed Claude 3.5 Sonnet, Gemini 1.5 Pro, and the latest GPT-4o. Mistral also pointed to the LMSys vision arena, where it said Pixtral Large ranked as the leading open-weights model, roughly 50 Elo points ahead of the nearest competitor at the time. [1][2][4]

As with all vendor-reported benchmarks, exact methodology (prompting, chain-of-thought, and evaluation harness) can affect the numbers, and results on rapidly updated competitor models can shift; the comparisons above reflect the specific model versions Mistral evaluated in November 2024. Other vision-language models from the same period not in Mistral's comparison table, such as Alibaba's [Qwen2-VL](/wiki/qwen2_vl), were also competitive on several of these tasks. [2]

## Is Pixtral Large open source, and how is it licensed?

Pixtral Large was released under a dual-licensing model. The open weights are governed by the Mistral Research License (MRL), which permits use for research and educational purposes on a non-commercial basis. Any commercial use, including experimentation, testing, and production deployment, requires a separate Mistral Commercial License obtained from the company. This split mirrors the licensing Mistral applied to Mistral Large 2 and differs from the fully open Apache 2.0 terms of the smaller Pixtral 12B. Because of the non-commercial research license, Pixtral Large is open-weights rather than fully open source in the OSI sense. [1][2]

For developers who prefer hosted access, the model was offered through Mistral's La Plateforme API under the name `pixtral-large-latest` and surfaced in the company's Le Chat assistant. Mistral also stated that Mistral Large 24.11 would become available through cloud partners shortly after launch, beginning with Google Cloud and Microsoft Azure. The downloadable weights were published on Hugging Face as Pixtral-Large-Instruct-2411 for self-hosting under the MRL. In early 2025 Mistral shipped an updated snapshot of the model, Pixtral Large 25.02, which Amazon Web Services made available as a fully managed, serverless option on Amazon Bedrock on April 8, 2025, while keeping the same 124-billion-parameter, 128,000-token configuration. [1][2][5]

## How does Pixtral Large compare to Pixtral 12B?

Pixtral Large and Pixtral 12B share the same overall vision-language design but sit at opposite ends of Mistral's multimodal range. Pixtral 12B is a compact, fully open model aimed at efficient local and commercial use, while Pixtral Large is a frontier-scale flagship built on Mistral's largest text decoder. [1][2]

| Property | Pixtral 12B | Pixtral Large |
|---|---|---|
| Released | September 2024 | November 18, 2024 |
| Language backbone | 12-billion-parameter model | Mistral Large 2 (123 billion) |
| Vision encoder | ~400 million parameters | ~1 billion parameters |
| Total parameters | ~12 billion | ~124 billion |
| Context window | 128,000 tokens | 128,000 tokens |
| License | Apache 2.0 (fully open) | Mistral Research / Commercial License |

## Why does Pixtral Large matter?

Pixtral Large marked Mistral AI's entry into frontier-scale multimodal modeling, extending the company's strategy of competitive open-weights releases from text into vision. By grounding a flagship-size language decoder in images, it gave developers and researchers a high-capacity alternative to proprietary multimodal systems such as GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, with weights available for inspection and non-commercial use. Its release reinforced the broader 2024 trend toward [multimodal AI](/wiki/multimodal_ai) as a default capability for flagship models rather than a specialized add-on. While Mistral later superseded Pixtral Large with newer vision-capable models, it remained a notable milestone as the largest member of the Pixtral family and Mistral's first frontier-class vision-language release. [1][2][3]

## References

1. Mistral AI. "Pixtral Large." Mistral AI News, November 18, 2024. https://mistral.ai/news/pixtral-large
2. Mistral AI. "Pixtral-Large-Instruct-2411." Hugging Face model card. https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411
3. Mistral AI. "Models Overview." Mistral AI Documentation. https://docs.mistral.ai/getting-started/models/models_overview/
4. MarkTechPost. "Mistral AI Releases Pixtral Large: A 124B Open-Weights Multimodal Model Built on Top of Mistral Large 2." November 18, 2024. https://www.marktechpost.com/2024/11/18/mistral-ai-releases-pixtral-large-a-124b-open-weights-multimodal-model-built-on-top-of-mistral-large-2/
5. Amazon Web Services. "Amazon Bedrock now offers Pixtral Large 25.02, a multimodal model from Mistral AI." AWS What's New, April 8, 2025. https://aws.amazon.com/about-aws/whats-new/2025/04/amazon-bedrock-pixtral-large-25-02-mistral-ai/