Mistral OCR 3 is a document-understanding and optical character recognition model from Mistral AI. Released on December 18, 2025, it is the third generation of Mistral's OCR product line and the most capable version to date, delivering a 74% overall win rate over its predecessor on forms, scanned documents, complex tables, and handwritten content.[1][2] The model is exposed through Mistral's API as mistral-ocr-2512 and carries the rolling alias mistral-ocr-latest. It also powers the Document AI Playground inside Mistral AI Studio and is integrated into the Le Chat consumer assistant.[2][3]
Unlike classical pipeline-based OCR engines such as Tesseract, Mistral OCR 3 is a specialized vision-language model that processes entire document pages as images and generates structured markdown output enriched with HTML table reconstruction in a single pass. This end-to-end architecture lets the model preserve merged cells, reading order, and mixed content types without relying on separate segmentation, line detection, or character-recognition stages.[4][10]
The model accepts PDF files, PPTX and DOCX files, and image inputs in PNG, JPEG, and AVIF formats. Documents can be passed as public URLs, private file IDs, or base64 payloads. Structured JSON annotations, bounding box extraction, and document question-answering are available through the same /v1/ocr endpoint.[2][3]
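For the base64 path, the request's document object embeds the file as a data URI. A minimal sketch in Python (the `document_url`/data-URI shape follows Mistral's published examples, but field names should be verified against the current API reference before relying on this):

```python
import base64

def document_payload(data: bytes, mime: str = "application/pdf") -> dict:
    """Build the `document` object for a base64 upload (assumed wire shape)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "document_url",
        "document_url": f"data:{mime};base64,{encoded}",
    }

payload = document_payload(b"%PDF-1.4 minimal test bytes")
```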
Mistral AI is a Paris-based artificial intelligence company founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothée Lacroix, former researchers at Google DeepMind and Meta AI. The company was established with an explicit goal of building a European AI competitor to US-based technology giants, and it has consistently emphasized open weights, EU data residency, and enterprise reliability as differentiators.[13]
Mistral's product portfolio spans large language models such as Mistral Large and Mistral Small, the Le Chat consumer assistant, the La Plateforme API developer suite, and a growing enterprise document AI stack. OCR 3 sits at the center of the document stack, powering both the developer-facing API and the end-user playground inside Mistral AI Studio.
Enterprise organizations generate enormous volumes of documents in formats that are not natively machine-readable: paper forms scanned to PDF, photographed receipts, historical archives, printed contracts, and scientific papers with embedded equations and figures. Classical OCR tools such as Tesseract treat document analysis as a series of independent steps: page segmentation, text line detection, character recognition, and optional table reconstruction. Each transition is a potential failure point, and none of the steps understand semantic context.
The shift to large vision-language models created an alternative: train a single model to read a document page as an image and generate structured text directly. Tools like GPT-4o vision and Gemini could do this as a side effect of general-purpose visual reasoning, but at high per-page cost and with inconsistent table handling. By 2025, a market for purpose-built document-AI models had opened up, combining the accuracy of VLM-based recognition with pricing closer to commodity OCR pipelines. Mistral OCR was Mistral's entry into that market.
The OCR product line went from zero to three generations in under a year, which is faster than Mistral's flagship LLM cadence and reflects the competitive pressure in the enterprise document AI segment.
| Generation | Model identifier | Public release | Key changes |
|---|---|---|---|
| Mistral OCR (first release) | mistral-ocr-2503 | March 6, 2025 | First proprietary OCR API from Mistral. Claimed 99%+ fuzzy-match accuracy across more than 100 languages and scripts. Throughput of up to 2,000 pages per minute on a single node. Free trials on Le Chat.[5][7] |
| Mistral OCR 2 (Document AI) | mistral-ocr-2505 | May 22, 2025 | Added structured annotations (bounding boxes, document-level JSON), removed the eight-page limit from OCR 1, rebranded the product around enterprise document understanding, and became available on Google Cloud Vertex AI Model Garden.[8][9][14] |
| Mistral OCR 3 | mistral-ocr-2512 | December 18, 2025 | Major accuracy uplift on handwriting, forms, scanned documents, and complex tables. mistral-ocr-latest alias points here. Fully backward compatible with OCR 2 API contracts.[1][2] |
Mistral OCR 2 carries a published deprecation date of February 27, 2026, after which OCR 3 becomes the sole supported version.[8]
The initial release was announced on March 6, 2025, billed as "the world's best document understanding API." Mistral described the model as capable of processing PDFs and images, extracting interleaved text and embedded images, handling tables, mathematical expressions, and LaTeX-formatted layouts, and outputting everything in structured markdown. The pricing was set at $1 per 1,000 pages (standard) with a batch discount bringing it to 2,000 pages per dollar.[5]
The launch announcement included benchmark numbers: 94.89% overall accuracy on an internal benchmark, 94.29% on math-heavy content, 96.12% on tables, and 98.96% on scanned documents. Le Chat offered free document trials so consumers could try the capability without an API key. Within weeks, the model was also described as available for selective on-premises deployment.[5][7]
TechCrunch characterized the release as turning "any PDF document into an AI-ready Markdown file," noting the importance for downstream RAG and agent pipelines. InfoQ highlighted the 2,000-page-per-minute throughput on a single node as unusually high for a VLM-based approach.[7]
Released on May 22, 2025, OCR 2 added structured annotations to the response. The bbox_annotation field returned bounding box coordinates for detected regions including charts, figures, and text blocks, allowing downstream systems to reference exact positions rather than just text content. A document_annotation field returned a JSON object structured around any schema the caller provided, enabling custom extraction templates for invoices, receipts, ID documents, or compliance forms without additional prompting.[8][9]
OCR 2 also lifted the eight-page limit from the original release, making it practical for full-length contracts and multi-chapter reports. Mistral packaged OCR 2 alongside Le Chat Enterprise and made both available on Google Cloud Vertex AI Model Garden as Model-as-a-Service, giving Google Cloud customers a managed deployment option without running their own API key setup.[14]
Pricing also shifted: standard pages were set at $2 per 1,000 (an adjustment from the $1 price at the OCR 1 launch), and annotated pages were priced at $3 per 1,000.
Mistral has not published the parameter count, training data composition, or full architectural specification for OCR 3. Based on public documentation and independent technical reviews, the following describes what is known about the model's design.
Mistral OCR 3 is not a classical OCR pipeline. Classical OCR systems string together multiple specialized models: a layout analysis model identifies page regions, a text line detector finds baselines, a character recognition network transcribes characters, and a separate table reconstruction model tries to infer rows and columns from spatial coordinates. Each model is trained independently, and errors in early stages propagate through later ones.
OCR 3 replaces this chain with a single vision-language model. Mistral describes the processing pipeline as four stages, running from page-image encoding through structured markdown decoding in a single pass.[4]
This architecture is why OCR 3 can maintain merged-cell structure in HTML tables (using colspan and rowspan attributes) and switch fluidly between cursive handwriting and printed text on the same page. The decoder understands context across the entire page rather than working character by character.
Because OCR 3 was trained specifically for document conversion, it performs structured extraction faster and more consistently than a generalist model like GPT-4o on the same tasks. The trade-off is that it cannot answer questions about document content or reason about visual elements; it only extracts and structures what is present. For document QnA use cases, the intended workflow is to feed OCR 3 output into a separate language model.[3][4]
Mistral markets this specialization explicitly as a cost advantage. Running GPT-4o with vision on 1,000 pages costs substantially more than $2, and the output table HTML tends to be less consistently structured than OCR 3's dedicated table reconstruction.
OCR 3 returns markdown with embedded HTML table blocks. Callers can choose between markdown table syntax and full HTML (with colspan/rowspan support) depending on whether their downstream pipeline needs to preserve complex table structures. Optional parameters control header and footer extraction, hyperlink detection, and image embedding as base64 data.[2][3]
When annotations are enabled, the response adds bounding box coordinates for each detected region and a structured JSON object whose schema the caller defines. This lets a single API call both extract the full document text and populate a custom data model, eliminating the need for a second LLM call to parse the OCR output.
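As an illustration, a caller-defined invoice schema might be attached to the request like this. The schema envelope and parameter shape below are hypothetical, not Mistral's confirmed wire format, so check the annotation documentation before use:

```python
import json

# Hypothetical caller-defined schema for document-level annotation.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
}

request = {
    "model": "mistral-ocr-latest",
    "document": {"type": "document_url",
                 "document_url": "https://example.com/invoice.pdf"},
    "document_annotation_format": invoice_schema,  # assumed parameter shape
}
body = json.dumps(request)  # payload that would be POSTed to /v1/ocr
```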
Mistral OCR 3 supports more than 35 languages with confirmed coverage including Latin-based scripts, Cyrillic, Arabic, Hindi, Traditional Chinese, and Simplified Chinese.[12] The original Mistral OCR announcement cited "thousands of scripts, fonts, and languages across all continents" with 99%+ fuzzy-match accuracy in benchmarks.[5]
Independent benchmarks on OmniDocBench show English text accuracy at 94.6% and Chinese text accuracy at 86.1%, with a mixed-language composite of 86.2%.[11] Mistral's own benchmark from the March 2025 launch showed strong performance on Russian, French, Hindi, Mandarin, Portuguese, German, Spanish, Turkish, Ukrainian, Italian, and Romanian.
The multilingual range is relevant to Mistral's European enterprise positioning: a single model that handles French, German, Spanish, and Eastern European languages without separate regional models matters for organizations serving the EU market. Arabic and Hindi support extends the addressable market into financial services and government document processing in the Middle East and South Asia.
Mistral OCR 3 handles mathematical notation natively, rendering equations in LaTeX syntax inside the markdown output. This makes it directly useful for digitizing scientific papers, textbooks, and technical reports where other OCR tools either skip equations or output garbled character sequences.
On the OmniDocBench benchmark, OCR 3 achieved a formula accuracy of 78.2% and particularly strong results on academic content: 97.9% text accuracy on academic literature and 95.8% on research reports.[11] These numbers reflect documents with dense inline math and mixed figure-and-text layouts.
Scientific publishers and research institutions have experimented with Mistral OCR to convert journal archives into AI-readable formats for downstream retrieval-augmented generation pipelines. The ability to extract not just text but structural elements like section headings, figure captions, and equation blocks in a single pass reduces the preprocessing burden for knowledge base construction.[15]
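When building such corpora, equation blocks can be pulled out of the markdown for separate indexing. A minimal sketch, assuming display math is delimited with `$$` (the delimiter convention is an assumption about the output, not documented behavior):

```python
import re

def extract_display_math(markdown: str) -> list:
    """Return display-math blocks delimited by $$...$$ (assumed convention)."""
    return [m.strip() for m in re.findall(r"\$\$(.+?)\$\$", markdown, re.DOTALL)]

sample = "The mass-energy relation\n\n$$E = mc^2$$\n\nappears in section 2."
equations = extract_display_math(sample)
```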
Table handling is one of the areas where OCR 3 shows the clearest improvement over both its predecessors and competing products. The model produces HTML table markup with colspan and rowspan attributes, preserving merged cells, multi-row headers, and nested column hierarchies that simpler OCR tools flatten or drop entirely.[2]
On the complex table accuracy benchmark from the PyImageSearch review, OCR 3 scored 96.6%, compared to 84.8% for AWS Textract and 85.9% for Azure Document Intelligence.[10] On the OmniDocBench table TEDS (Tree Edit Distance Similarity) metric, OCR 3 scored 70.9% overall, with higher scores on structured document types: 83.0% on academic literature, 88.0% on exam papers, and 82.7% on books.[11]
The HTML table output is designed to feed directly into LLM prompts without intermediate parsing. A table with merged headers comes through as valid HTML rather than as a confusing sequence of pipe-delimited rows, which improves the quality of downstream question-answering and data extraction.
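When a downstream pipeline does need cell-level access rather than raw HTML, the standard-library parser is enough to flatten such tables. A sketch that records each cell together with its colspan:

```python
from html.parser import HTMLParser

class TableFlattener(HTMLParser):
    """Flatten an OCR-style HTML table into rows of (text, colspan) cells."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell, self._span = [], None, None, 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = ""
            self._span = int(dict(attrs).get("colspan", 1))

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append((self._cell.strip(), self._span))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

html = ("<table><tr><th colspan='2'>Q1</th></tr>"
        "<tr><td>Revenue</td><td>12.4</td></tr></table>")
parser = TableFlattener()
parser.feed(html)
```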
Handwriting is the hardest category for classical OCR tools because character forms vary widely between individuals, cursive strokes connect letters in ways that character-level models cannot decompose, and real documents often mix handwritten annotations with printed text on the same page.
OCR 3 scored 88.9% on the handwriting accuracy benchmark in the PyImageSearch review, compared to 78.2% for Azure Document Intelligence, 73.9% for Google Document AI, and 57.2% for DeepSeek-OCR.[10] Qualitatively, the model handles cursive text, field annotations on printed forms, and signatures layered over typed content more reliably than OCR 2 did.
The 74% win rate figure Mistral cites for OCR 3 over OCR 2 was calculated specifically on a set of handwritten, form-heavy, and scanned document types that OCR 2 handled poorly, using fuzzy-match scoring against human-verified ground truth.[2]
For non-text elements such as charts, graphs, and photographs, OCR 3 extracts the image data and returns it as a base64-encoded block inside the markdown, with a placeholder in the text stream indicating where the image appeared. The model does not analyze or describe image content; that step is left to a downstream VLM or vision model if needed.[3][4]
This is a deliberate design choice. Attempting to interpret charts or figures would require a general visual reasoning capability that goes beyond OCR and would add latency and cost. The current approach ensures the extracted image can be passed to a model better suited to visual analysis, such as Pixtral or GPT-4o, without re-uploading the source document.
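Pulling those embedded images back out is straightforward. This sketch assumes the markdown embeds them as standard data-URI image references; the exact output shape should be confirmed against the API documentation:

```python
import base64, re

def extract_embedded_images(markdown: str) -> dict:
    """Decode markdown images embedded as data URIs (assumed output shape)."""
    pattern = r"!\[([^\]]*)\]\(data:image/[a-z]+;base64,([A-Za-z0-9+/=]+)\)"
    return {name: base64.b64decode(data)
            for name, data in re.findall(pattern, markdown)}

fake_png = base64.b64encode(b"\x89PNG fake bytes").decode("ascii")
doc = f"Intro text\n\n![img-0](data:image/png;base64,{fake_png})\n\nMore text."
images = extract_embedded_images(doc)
```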
Beyond individual element types, OCR 3 preserves the reading order and hierarchical structure of the document: headings map to markdown heading levels, nested lists are nested in the output, footnotes are extracted in position rather than dumped at the end of the file, and page breaks are annotated. The OmniDocBench reading order score was 91.6%.[11]
This matters for legal and compliance workflows where the document structure carries semantic meaning. A contract section that appears as a top-level heading versus a subheading can affect how an extraction pipeline interprets the hierarchy of obligations. Classical OCR tools that output plain text lose this structure entirely.
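Downstream pipelines can recover that hierarchy from the markdown with a few lines of parsing; a minimal sketch over standard `#` heading syntax:

```python
def heading_outline(markdown: str) -> list:
    """Map markdown heading lines to (level, title) pairs, in reading order."""
    outline = []
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped[level:].strip()
            if title:
                outline.append((level, title))
    return outline

contract = "# Obligations\n## Payment Terms\nNet 30.\n## Termination\n"
```

Applied to a contract extraction, the outline makes clause nesting explicit for the analysis step.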
Mistral did not publish a single consolidated benchmark table in the OCR 3 launch post. The numbers below come from a mix of Mistral's own internal evaluations, the independent CodeSOTA benchmark run in December 2025, and the PyImageSearch technical review from December 23, 2025. All third-party numbers should be read with awareness that vendor-sponsored and single-reviewer benchmarks have inherent limitations.
| Document type | Mistral OCR 3 | Azure Document Intelligence | AWS Textract | Google Document AI | DeepSeek-OCR |
|---|---|---|---|---|---|
| Handwriting accuracy | 88.9% | 78.2% | N/A | 73.9% | 57.2% |
| Complex tables | 96.6% | 85.9% | 84.8% | N/A | N/A |
| Forms | 95.9% | 86.2% | 84.5% | N/A | N/A |
| Historical scanned documents | 96.7% | 83.7% | N/A | 87.1% | 81.1% |
| Multilingual English | 98.6% | 93.5% | 93.9% | N/A | N/A |
OmniDocBench is an open benchmark of 1,355 document images maintained by the OpenDataLab group, released with the CVPR 2025 paper. It covers academic papers, books, exam papers, newspapers, and research reports in English and Chinese.
| Metric | Score | Notes |
|---|---|---|
| Composite score | 79.75 | Overall across all document types |
| Text accuracy | 90.1% | All document types |
| English text accuracy | 94.6% | English subset |
| Chinese text accuracy | 86.1% | Chinese subset |
| Mixed-language text accuracy | 86.2% | |
| Table TEDS | 70.9% | Tree edit distance similarity |
| Formula accuracy | 78.2% | LaTeX equation extraction |
| Reading order accuracy | 91.6% | |
By document category within OmniDocBench:[11]
| Document category | Text accuracy | Table TEDS |
|---|---|---|
| Academic literature | 97.9% | 83.0% |
| Exam papers | 92.8% | 88.0% |
| Books | 93.9% | 82.7% |
| Research reports | 95.8% | 82.0% |
| Newspapers | 67.0% | 58.3% |
The lower newspaper scores reflect the challenge of irregular multi-column magazine-style layouts, where OCR 3 sometimes misreads reading order across columns.
OCRBench v2 is a 7,400-sample benchmark that mixes pure text extraction with visual question-answering tasks. Because OCR 3 is a specialist extraction model with no VQA capability, it scores high on the extraction sub-tasks and low on the reasoning sub-tasks.
| Sub-task | Score |
|---|---|
| Overall | 25.2% |
| Full-page OCR | 79.1% |
| Document parsing | 55.2% |
| Text recognition | 32.5% |
The 25.2% overall figure versus 79.1% on full-page OCR shows the penalty OCR 3 pays for skipping visual reasoning. General-purpose vision-language models score 55 to 62% overall on OCRBench v2 because they can answer visual questions even if their table reconstruction is less precise.[11]
For comparison, PaddleOCR-VL scored 92.86 on OmniDocBench, though it is a hybrid model that combines traditional OCR with visual understanding and operates differently from a pure API service like Mistral OCR 3.
The pricing structure as of the December 2025 release is straightforward.
| Tier | Price | Notes |
|---|---|---|
| Standard OCR | $2 per 1,000 pages | Via the synchronous /v1/ocr endpoint.[2] |
| Annotated OCR | $3 per 1,000 annotated pages | When structured annotations (bounding boxes, JSON schema extraction) are requested.[3] |
| Batch API | $1 per 1,000 pages | 50% discount for asynchronous batch jobs submitted as .jsonl files.[2] |
For comparison, AWS Textract basic text detection costs $1.50 per 1,000 pages, rising to $15 per 1,000 pages for its forms and tables features. Google Document AI Document OCR costs $1.50 per 1,000 pages for standard document processing, with enterprise features priced higher. Azure Document Intelligence Read starts at $1.50 per 1,000 pages.
Mistral's batch price of $1 per 1,000 pages puts it at the low end of the enterprise OCR pricing band, particularly given the accuracy claims on forms and tables that would otherwise require more expensive specialized features from Textract or Document AI.[10]
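The per-tier arithmetic is simple enough to sketch directly, using the list prices from the table above:

```python
# USD per 1,000 pages at the December 2025 list prices.
PRICE_PER_1000 = {"standard": 2.00, "annotated": 3.00, "batch": 1.00}

def ocr_cost(pages: int, tier: str = "standard") -> float:
    """Estimated Mistral OCR 3 cost in USD for a given page count and tier."""
    return pages * PRICE_PER_1000[tier] / 1000

standard_run = ocr_cost(10_000)           # $20.00
archive_run = ocr_cost(500_000, "batch")  # $500.00
```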
One practical note: the free tier on La Plateforme may use submitted documents for model training. Organizations with data sensitivity requirements should use the paid tier or the self-hosted option, which does not send data to Mistral's training pipeline.
The /v1/ocr endpoint follows a simple request-response pattern. A caller sends a document reference (URL, file ID, or base64 payload) with the model name and optional parameters, and receives a JSON response containing the markdown text, page-level metadata, and any requested annotations.
Key request parameters include:
- `model`: `mistral-ocr-2512` or `mistral-ocr-latest`
- `document`: object with `type` (`document_url` for PDFs/DOCX/PPTX, `image_url` for images) and `url` or `data`
- `include_image_base64`: returns embedded images as base64 in the response
- `document_annotation_format`: JSON schema for custom structured extraction
- `bbox_annotation_format`: schema for bounding box annotations

A minimal Python call using the official SDK:
```python
from mistralai import Mistral

client = Mistral(api_key="your-api-key")
response = client.ocr.process(
    model="mistral-ocr-latest",
    document={"type": "document_url",
              "document_url": "https://example.com/document.pdf"}
)
print(response.pages[0].markdown)
```
The SDK is installed via `pip install mistralai`.[11]
Mistral AI Studio includes a Document AI Playground that lets analysts and product managers test OCR 3 without writing code. Users can drag and drop a PDF or image, select output parameters, and receive the markdown and JSON output immediately. The playground is useful for prototyping extraction schemas before committing to an API integration.[2][3]
The batch API accepts .jsonl files where each line is a self-contained OCR request. Batch jobs are processed asynchronously, which lets callers submit large archives without holding open long-running connections. The 50% price discount on batch jobs makes the pipeline economical for high-volume archiving and migration projects.[2]
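Each batch line is just a serialized OCR request. A sketch of building the input file contents; the `custom_id`/`body` envelope is an assumed shape, so confirm field names against the batch API documentation:

```python
import json

def build_batch_lines(urls):
    """One self-contained OCR request per .jsonl line (assumed envelope shape)."""
    return [
        json.dumps({
            "custom_id": f"doc-{i}",
            "body": {
                "model": "mistral-ocr-latest",
                "document": {"type": "document_url", "document_url": url},
            },
        })
        for i, url in enumerate(urls)
    ]

lines = build_batch_lines(["https://example.com/a.pdf",
                           "https://example.com/b.pdf"])
jsonl = "\n".join(lines)  # contents of the batch input file to upload
```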
Le Chat, Mistral's consumer AI assistant, uses the OCR stack as its document understanding layer. When a user uploads a PDF or image to Le Chat, Mistral OCR processes the document in the background and passes the extracted markdown to the conversation context before the language model generates a response.[13][15]
This integration was present from the original OCR 1 release in March 2025. The OCR 3 upgrade improved the quality of document understanding visible to Le Chat users without any change to the chat interface itself. Le Chat Enterprise, available on Google Cloud Marketplace since May 2025, also uses the OCR stack for its enterprise document library features.[14]
The /v1/ocr endpoint integrates with Mistral's chat completions and agents APIs. A developer can extract a document with OCR 3 and then pass the markdown directly into a /v1/chat/completions call with one of Mistral's language models, creating a single-API-key workflow for document QnA, summarization, or structured data extraction without switching providers.[3]
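A sketch of the hand-off step: compose the OCR markdown into a chat-completions message list. The system-prompt wording here is illustrative, not Mistral's:

```python
def qa_messages(ocr_markdown: str, question: str) -> list:
    """Ground a question in OCR output for a chat completions call."""
    return [
        {"role": "system",
         "content": "Answer strictly from the document below.\n\n" + ocr_markdown},
        {"role": "user", "content": question},
    ]

messages = qa_messages("# Invoice\nTotal: $412.50", "What is the invoice total?")
# messages would then be passed to the chat completions endpoint.
```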
Mistral OCR is available as a managed API on both Microsoft Azure AI Foundry and Google Cloud Vertex AI Model Garden, giving enterprise buyers access to the model through their existing cloud contracts and billing relationships. These deployments run on Mistral's infrastructure but are accessible via the respective cloud management consoles.[14]
Legal workflows generate dense, structure-heavy documents: contracts with numbered sections and cross-references, court filings with exhibit attachments, compliance disclosures with standardized table formats, and regulatory submissions mixing text, forms, and signatures. Classical OCR tools can extract text but typically lose the hierarchical structure that determines how clauses relate to each other.
OCR 3's heading-level preservation and HTML table output make it practical for building contract review pipelines where section hierarchy matters. A compliance team can extract a regulatory filing into structured markdown, pass it to a language model for analysis, and receive responses that correctly attribute obligations to the right clause levels. Law firms and legal technology companies are among the enterprise buyers Mistral highlights for self-hosted deployments.[2][15]
Invoice processing, receipt capture, and financial statement digitization are high-volume, accuracy-sensitive workflows. OCR errors in financial documents propagate directly into accounting systems. OCR 3's 95.9% forms accuracy benchmark and its ability to output extraction results as structured JSON against a caller-defined schema make it applicable to accounts payable automation, expense management, and KYC document processing.[10]
The batch API pricing at $1 per 1,000 pages matters here because finance teams often need to process large archives of historical invoices when migrating accounting systems or preparing for audits. Running 500,000 invoice pages through a batch job costs $500, compared to several thousand dollars for equivalent volume through Textract's forms feature.
Banks and financial institutions with strict data sovereignty requirements are explicitly called out by Mistral as the target audience for the self-hosted deployment option, which keeps document content inside the buyer's own infrastructure.[2]
Scientific papers contain a mix of dense prose, mathematical equations, structured tables, figure captions, and references. Converting a PDF paper to clean text for downstream processing is harder than it looks because equation rendering in PDF uses arbitrary character substitutions that classical OCR cannot decode, and table cells in papers often span multiple rows with no consistent formatting.
OCR 3's 78.2% formula accuracy on OmniDocBench, combined with 97.9% text accuracy on academic literature, makes it one of the better options for bulk scientific paper processing. Research institutions and publishers have begun using it to build RAG corpora from journal archives, where the alternative is often paying for commercial PDF-to-text conversion services that cost more per page and produce less clean output.[11][15]
Historical archives contain documents scanned at varying quality levels, with age-related degradation, staining, skew from improper scanning, and handwritten marginalia layered over printed text. OCR 3's 96.7% accuracy on historical scanned documents in the PyImageSearch benchmark reflects training on degraded real-world scans rather than clean synthetic test sets.[10]
Museums, national archives, libraries, and government agencies processing historical records are a natural fit. The model handles early-20th-century typed documents with physical damage, mid-century forms with handwritten completions, and mixed-period collections with varying scan quality within a single batch job.
Organizations maintain large internal document libraries: product manuals, technical specifications, policy documents, internal reports, and training materials. Making this content searchable and usable by AI assistants requires converting it into clean text. OCR 3's throughput and per-page pricing make it practical to process entire enterprise document repositories that would take weeks with manual conversion or more expensive API services.
The Document QnA integration (combining OCR output with a language model in a single API call) reduces the engineering effort for building internal knowledge assistants, since the extraction and question-answering steps share one API key and one billing account.
The document AI market in 2025 covers three distinct product categories: traditional enterprise document AI services from cloud providers, generalist VLMs used for document extraction as a secondary capability, and open-source OCR tools. Mistral OCR 3 is positioned to compete across all three.
| Product | Operator | Standard price per 1,000 pages | Key strength | Known limitation |
|---|---|---|---|---|
| Mistral OCR 3 | Mistral AI | $2 (standard), $1 (batch) | Handwriting, complex tables, EU data residency | No fine-tuning, no chart interpretation |
| AWS Textract | Amazon Web Services | $1.50 (text), up to $15 (forms and tables) | Deep AWS integration, mature compliance features | Complex table accuracy 84.8% in benchmarks |
| Google Document AI | Google Cloud | $1.50 (standard) | Large document type library, form parser variants | Handwriting 73.9% in benchmarks |
| Azure Document Intelligence | Microsoft | $1.50 (read) | Microsoft ecosystem, pre-built form models | Handwriting 78.2% in benchmarks |
| Adobe PDF Extract | Adobe | Variable (enterprise contract) | Native PDF fidelity, Acrobat ecosystem | High cost for bulk processing |
GPT-4o vision and Gemini can extract text from document images, but they are general-purpose models priced for reasoning tasks rather than bulk page processing. GPT-4o charges per token, and a dense document page can generate thousands of output tokens, making cost-per-page substantially higher than Mistral OCR 3's $2 flat rate. Table reconstruction is less consistently structured from generalist models, and VLMs introduce latency and cost from image understanding that OCR 3 avoids by being purpose-built.
The trade-off runs the other way for tasks requiring visual reasoning: GPT-4o can interpret a chart, answer a question about a graph, or describe an architectural diagram, while OCR 3 passes such elements through as base64 image data without interpretation.
| Tool | Operator | License | Key strength | Limitation vs OCR 3 |
|---|---|---|---|---|
| Tesseract | Google (maintained by community) | Apache 2.0 | Free, widely deployed, extensive language support | Poor table and handwriting accuracy, no structure preservation |
| Surya | VikParuchuri | GPL-3.0 | Open weights, good layout analysis | Slower throughput, GPU required for batch |
| olmOCR | Allen AI | Apache 2.0 | Academic focus, formula-aware | Less tested on business document types |
| MinerU | OpenDataLab | AGPL-3.0 | Strong on scientific papers, open | Complex deployment, high resource requirements |
| Nougat | Meta AI | MIT | LaTeX math output, academic papers | Narrow domain, not production-hardened |
| DocLing | IBM | Apache 2.0 | Enterprise design, multiple connectors | Accuracy behind Mistral OCR 3 on benchmarks |
Open-source tools avoid per-page fees and allow fine-tuning on domain-specific document types. The trade-off is deployment complexity, infrastructure cost for GPU-accelerated batch processing, and generally lower accuracy on handwriting and complex forms.
The initial Mistral OCR release in March 2025 drew positive coverage from technical press. TechCrunch described it as straightforward and well-priced for the emerging document-to-LLM pipeline use case.[7] VentureBeat framed the positioning against cloud OCR incumbents as ambitious but plausible given the accuracy benchmarks.[13]
The OCR 3 release in December 2025 received similarly positive initial coverage. InfoQ noted the improvement methodology was notable for using fuzzy-match scoring on real business documents rather than synthetic benchmarks, which is harder to game.[1] The VentureBeat headline focused on the 74% win rate and $2 pricing as the commercial argument for enterprise buyers who had been locked into AWS or Google cloud OCR services.
The CodeSOTA independent benchmark run in December 2025 provided verification outside Mistral's own marketing. The OmniDocBench composite score of 79.75 put OCR 3 behind hybrid models like PaddleOCR-VL (92.86) but well ahead of simpler pipeline OCR tools.[11] The OCRBench v2 overall score of 25.2% drew some attention because it looks low, though reviewers noted this reflects the benchmark's VQA component rather than any failure in text extraction.
Independent testing by Parsio uncovered a format sensitivity issue: PDF-to-JPEG conversions of the same document sometimes outperform direct PDF uploads, which suggests the PDF rasterization pipeline has inconsistencies.[16] The PyImageSearch review flagged hallucination risk for financial documents, noting that high-fidelity markdown output can look correct to a human reviewer even when specific digits are wrong.[10]
Analytics Vidhya summarized the community view after several weeks of testing: the model is well-suited for high-structure documents like invoices, forms, and academic papers, but irregular layouts like magazines and multi-column newspaper pages remain a weakness.[12]
Several limitations have been identified through independent testing and community reports:
Format sensitivity: Direct PDF uploads sometimes produce worse results than rasterizing the PDF to JPEG first and submitting the image. This is the opposite of the expected behavior and creates inconsistency for production pipelines that cannot predict which format will work better for a given document.[16]
Multi-column layout handling: OCR 3 can struggle with magazine-style layouts where text flows across irregular columns. The model sometimes attempts to represent non-tabular columnar text as an HTML table, which introduces structure where none was intended.[10][11] The newspaper accuracy on OmniDocBench (67.0% text, 58.3% table TEDS) reflects this.
No fine-tuning: The model is a managed SaaS endpoint with no customer fine-tuning option, outside of the self-hosted offering for large enterprise buyers. Organizations with highly domain-specific document types (specialized medical forms, unusual financial instruments, proprietary engineering drawings) cannot adapt the model to their vocabulary and layout conventions.[10]
Digit hallucination risk: The PyImageSearch review noted that OCR 3 can generate confident-looking output with flipped or wrong digits in numeric fields. Unlike a character-level model that fails visibly on ambiguous characters, OCR 3's language-model backbone may interpolate plausible-looking numbers. Human verification or cross-checking against known data is advisable for financial documents.[10]
No visual reasoning: OCR 3 does not interpret charts, graphs, diagrams, or photographs. These are returned as base64 image data with a position placeholder in the text stream. Any interpretation requires a second model call.[3][4]
Connectivity requirement: The standard offering is a SaaS API with no offline mode. The self-hosted option addresses this for buyers with infrastructure budgets, but it is not a drop-in substitute for a locally runnable open-source tool.
Free tier data usage: On the free tier of La Plateforme, submitted documents may be used for model training. This is a standard clause for free AI APIs but can be a showstopper for documents containing sensitive personal or commercial information.
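One cheap, automatable mitigation for the digit-hallucination risk above is an arithmetic consistency check on extracted numeric fields; a minimal sketch for invoice-style documents:

```python
def totals_consistent(line_items, stated_total, tol=0.01):
    """Flag possible digit errors: do the extracted line items sum to the
    extracted total within a rounding tolerance?"""
    return abs(sum(line_items) - stated_total) <= tol

ok = totals_consistent([120.00, 45.50], 165.50)       # items add up
flagged = totals_consistent([120.00, 45.50], 195.50)  # mismatch: route to review
```

Documents that fail the check can be routed to human review instead of flowing straight into an accounting system.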
Mistral is a Paris-based company and has consistently used EU data residency as a competitive argument against US hyperscalers. OCR 3 fits that positioning in two ways.
The standard managed service runs on Mistral's own infrastructure rather than AWS, Azure, or Google Cloud, which addresses the concern some EU customers have about data processed by US cloud services under CLOUD Act jurisdiction. The self-hosted option goes further, allowing banks, hospitals, and public-sector organizations to run the model entirely inside their own perimeter with no document content leaving their network.[2]
Mistral's governance pages classify OCR 3 as a specialized base model with active lifecycle status and no scheduled retirement date beyond the OCR 2 deprecation on February 27, 2026.[1][8] The active classification signals to enterprise buyers that they can build long-running pipelines on mistral-ocr-2512 without an imminent forced migration.
The combination of EU-based infrastructure, GDPR-native data handling, and a self-hosted deployment path has been Mistral's clearest differentiation from AWS Textract and Google Document AI, which both store and process data in US-based data centers by default (though both offer EU region options).
Mistral's enterprise product portfolio in 2025 had four main components: La Plateforme for general API access to LLMs, Mistral AI Studio for hosted UI tools including the Document AI Playground, the Document AI API suite (OCR, annotations, and document QnA), and Le Chat for consumer and enterprise assistant use cases. OCR 3 connects the document side of this stack.
The pricing structure is part of the competitive strategy. At $1 per 1,000 pages in batch mode, Mistral is priced below the entry tier of AWS Textract's forms and tables feature while claiming higher accuracy on the most difficult enterprise workloads. The gap between what Textract charges for full form extraction ($15 per 1,000 pages) and what OCR 3 charges ($2 standard, $1 batch) is substantial enough to drive procurement conversations at large organizations processing millions of documents per year.
OCR 3 also strengthens Le Chat Enterprise as a product. A business deploying Le Chat Enterprise through Google Cloud Marketplace gets document understanding through OCR 3 without a separate contract, which simplifies procurement and makes Le Chat a more complete platform for knowledge worker use cases beyond text chat.