Mistral OCR 3 is a document-understanding and optical character recognition model from Mistral AI. Released on December 18, 2025, it is the third generation of Mistral's OCR product line and the most capable version to date, delivering a 74% overall win rate over its predecessor on forms, scanned documents, complex tables, and handwritten content.[1][2] The model is exposed through Mistral's API as mistral-ocr-2512 and carries the rolling alias mistral-ocr-latest. It also powers the Document AI Playground inside Mistral AI Studio and is integrated into the Le Chat consumer assistant.[2][3]
Unlike classical pipeline-based OCR engines such as Tesseract, Mistral OCR 3 is a specialized vision-language model that processes entire document pages as images and generates structured markdown output enriched with HTML table reconstruction in a single pass. This end-to-end architecture lets the model preserve merged cells, reading order, and mixed content types without relying on separate segmentation, line detection, or character-recognition stages.[4][10]
The model accepts PDF files, PPTX and DOCX files, and image inputs in PNG, JPEG, and AVIF formats. Documents can be passed as public URLs, private file IDs, or base64 payloads. Structured JSON annotations, bounding box extraction, and document question-answering are available through the same /v1/ocr endpoint.[2][3]
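For the base64 path, the request's document object embeds the file as a data URI. A minimal sketch in Python (the `document_url`/data-URI shape follows Mistral's published examples, but field names should be verified against the current API reference before relying on this):

```python
import base64

def document_payload(data: bytes, mime: str = "application/pdf") -> dict:
    """Build the `document` object for a base64 upload (assumed wire shape)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "document_url",
        "document_url": f"data:{mime};base64,{encoded}",
    }

payload = document_payload(b"%PDF-1.4 minimal test bytes")
```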
Mistral AI is a Paris-based artificial intelligence company founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothée Lacroix, former researchers at Google DeepMind and Meta AI. The company was established with an explicit goal of building a European AI competitor to US-based technology giants, and it has consistently emphasized open weights, EU data residency, and enterprise reliability as differentiators.[13]
Mistral's product portfolio spans large language models such as Mistral Large and Mistral Small, the Le Chat consumer assistant, the La Plateforme API developer suite, and a growing enterprise document AI stack. OCR 3 sits at the center of the document stack, powering both the developer-facing API and the end-user playground inside Mistral AI Studio.
Enterprise organizations generate enormous volumes of documents in formats that are not natively machine-readable: paper forms scanned to PDF, photographed receipts, historical archives, printed contracts, and scientific papers with embedded equations and figures. Classical OCR tools such as Tesseract treat document analysis as a series of independent steps: page segmentation, text line detection, character recognition, and optional table reconstruction. Each transition is a potential failure point, and none of the steps understand semantic context.
The shift to large vision-language models created an alternative: train a single model to read a document page as an image and generate structured text directly. Tools like GPT-4o vision and Gemini could do this as a side effect of general-purpose visual reasoning, but at high per-page cost and with inconsistent table handling. By 2025, a market for purpose-built document-AI models had opened up, combining the accuracy of VLM-based recognition with pricing closer to commodity OCR pipelines. Mistral OCR was Mistral's entry into that market.
The OCR product line went from zero to three generations in under a year, which is faster than Mistral's flagship LLM cadence and reflects the competitive pressure in the enterprise document AI segment.
| Generation | Model identifier | Public release | Key changes |
|---|---|---|---|
| Mistral OCR (first release) | mistral-ocr-2503 | March 6, 2025 | First proprietary OCR API from Mistral. Claimed 99%+ fuzzy-match accuracy across more than 100 languages and scripts. Throughput of up to 2,000 pages per minute on a single node. Free trials on Le Chat.[5][7] |
| Mistral OCR 2 (Document AI) | mistral-ocr-2505 | May 22, 2025 | Added structured annotations (bounding boxes, document-level JSON), removed the eight-page limit from OCR 1, rebranded the product around enterprise document understanding, and became available on Google Cloud Vertex AI Model Garden.[8][9][14] |
| Mistral OCR 3 | mistral-ocr-2512 | December 18, 2025 | Major accuracy uplift on handwriting, forms, scanned documents, and complex tables. mistral-ocr-latest alias points here. Fully backward compatible with OCR 2 API contracts.[1][2] |
Mistral OCR 2 carries a published deprecation date of February 27, 2026, after which OCR 3 becomes the sole supported version.[8]
The initial release was announced on March 6, 2025, billed as "the world's best document understanding API." Mistral described the model as capable of processing PDFs and images, extracting interleaved text and embedded images, handling tables, mathematical expressions, and LaTeX-formatted layouts, and outputting everything in structured markdown. The pricing was set at $1 per 1,000 pages (standard) with a batch discount bringing it to 2,000 pages per dollar.[5]
The launch announcement included benchmark numbers: 94.89% overall accuracy on an internal benchmark, 94.29% on math-heavy content, 96.12% on tables, and 98.96% on scanned documents. Le Chat offered free document trials so consumers could try the capability without an API key. Within weeks, the model was also described as available for selective on-premises deployment.[5][7]
TechCrunch characterized the release as turning "any PDF document into an AI-ready Markdown file," noting the importance for downstream RAG and agent pipelines. InfoQ highlighted the 2,000-page-per-minute throughput on a single node as unusually high for a VLM-based approach.[7]
Released on May 22, 2025, OCR 2 added structured annotations to the response. The bbox_annotation field returned bounding box coordinates for detected regions including charts, figures, and text blocks, allowing downstream systems to reference exact positions rather than just text content. A document_annotation field returned a JSON object structured around any schema the caller provided, enabling custom extraction templates for invoices, receipts, ID documents, or compliance forms without additional prompting.[8][9]
OCR 2 also lifted the eight-page limit from the original release, making it practical for full-length contracts and multi-chapter reports. Mistral packaged OCR 2 alongside Le Chat Enterprise and made both available on Google Cloud Vertex AI Model Garden as Model-as-a-Service, giving Google Cloud customers a managed deployment option without running their own API key setup.[14]
Pricing also shifted: standard pages were set at $2 per 1,000 (an adjustment from the $1 price at the OCR 1 launch), and annotated pages were priced at $3 per 1,000.
Mistral has not published the parameter count, training data composition, or full architectural specification for OCR 3. Based on public documentation and independent technical reviews, the following describes what is known about the model's design.
Mistral OCR 3 is not a classical OCR pipeline. Classical OCR systems string together multiple specialized models: a layout analysis model identifies page regions, a text line detector finds baselines, a character recognition network transcribes characters, and a separate table reconstruction model tries to infer rows and columns from spatial coordinates. Each model is trained independently, and errors in early stages propagate through later ones.
OCR 3 replaces this chain with a single vision-language model. Mistral describes the processing pipeline as four stages, running from page-image encoding through structured markdown decoding in a single pass.[4]
This architecture is why OCR 3 can maintain merged-cell structure in HTML tables (using colspan and rowspan attributes) and switch fluidly between cursive handwriting and printed text on the same page. The decoder understands context across the entire page rather than working character by character.
Because OCR 3 was trained specifically for document conversion, it performs structured extraction faster and more consistently than a generalist model like GPT-4o on the same tasks. The trade-off is that it cannot answer questions about document content or reason about visual elements; it only extracts and structures what is present. For document QnA use cases, the intended workflow is to feed OCR 3 output into a separate language model.[3][4]
Mistral markets this specialization explicitly as a cost advantage. Running GPT-4o with vision on 1,000 pages costs substantially more than $2, and the output table HTML tends to be less consistently structured than OCR 3's dedicated table reconstruction.
OCR 3 returns markdown with embedded HTML table blocks. Callers can choose between markdown table syntax and full HTML (with colspan/rowspan support) depending on whether their downstream pipeline needs to preserve complex table structures. Optional parameters control header and footer extraction, hyperlink detection, and image embedding as base64 data.[2][3]
When annotations are enabled, the response adds bounding box coordinates for each detected region and a structured JSON object whose schema the caller defines. This lets a single API call both extract the full document text and populate a custom data model, eliminating the need for a second LLM call to parse the OCR output.
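As an illustration, a caller-defined invoice schema might be attached to the request like this. The schema envelope and parameter shape below are hypothetical, not Mistral's confirmed wire format, so check the annotation documentation before use:

```python
import json

# Hypothetical caller-defined schema for document-level annotation.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
}

request = {
    "model": "mistral-ocr-latest",
    "document": {"type": "document_url",
                 "document_url": "https://example.com/invoice.pdf"},
    "document_annotation_format": invoice_schema,  # assumed parameter shape
}
body = json.dumps(request)  # payload that would be POSTed to /v1/ocr
```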
Mistral OCR 3 supports more than 35 languages with confirmed coverage including Latin-based scripts, Cyrillic, Arabic, Hindi, Traditional Chinese, and Simplified Chinese.[12] The original Mistral OCR announcement cited "thousands of scripts, fonts, and languages across all continents" with 99%+ fuzzy-match accuracy in benchmarks.[5]
Independent benchmarks on OmniDocBench show English text accuracy at 94.6% and Chinese text accuracy at 86.1%, with a mixed-language composite of 86.2%.[11] Mistral's own benchmark from the March 2025 launch showed strong performance on Russian, French, Hindi, Mandarin, Portuguese, German, Spanish, Turkish, Ukrainian, Italian, and Romanian.
The multilingual range is relevant to Mistral's European enterprise positioning: a single model that handles French, German, Spanish, and Eastern European languages without separate regional models matters for organizations serving the EU market. Arabic and Hindi support extends the addressable market into financial services and government document processing in the Middle East and South Asia.
Mistral OCR 3 handles mathematical notation natively, rendering equations in LaTeX syntax inside the markdown output. This makes it directly useful for digitizing scientific papers, textbooks, and technical reports where other OCR tools either skip equations or output garbled character sequences.
On the OmniDocBench benchmark, OCR 3 achieved a formula accuracy of 78.2% and particularly strong results on academic content: 97.9% text accuracy on academic literature and 95.8% on research reports.[11] These numbers reflect documents with dense inline math and mixed figure-and-text layouts.
Scientific publishers and research institutions have experimented with Mistral OCR to convert journal archives into AI-readable formats for downstream retrieval-augmented generation pipelines. The ability to extract not just text but structural elements like section headings, figure captions, and equation blocks in a single pass reduces the preprocessing burden for knowledge base construction.[15]
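When building such corpora, equation blocks can be pulled out of the markdown for separate indexing. A minimal sketch, assuming display math is delimited with `$$` (the delimiter convention is an assumption about the output, not documented behavior):

```python
import re

def extract_display_math(markdown: str) -> list:
    """Return display-math blocks delimited by $$...$$ (assumed convention)."""
    return [m.strip() for m in re.findall(r"\$\$(.+?)\$\$", markdown, re.DOTALL)]

sample = "The mass-energy relation\n\n$$E = mc^2$$\n\nappears in section 2."
equations = extract_display_math(sample)
```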
Table handling is one of the areas where OCR 3 shows the clearest improvement over both its predecessors and competing products. The model produces HTML table markup with colspan and rowspan attributes, preserving merged cells, multi-row headers, and nested column hierarchies that simpler OCR tools flatten or drop entirely.[2]
On the complex table accuracy benchmark from the PyImageSearch review, OCR 3 scored 96.6%, compared to 84.8% for AWS Textract and 85.9% for Azure Document Intelligence.[10] On the OmniDocBench table TEDS (Tree Edit Distance Similarity) metric, OCR 3 scored 70.9% overall, with higher scores on structured document types: 83.0% on academic literature, 88.0% on exam papers, and 82.7% on books.[11]
The HTML table output is designed to feed directly into LLM prompts without intermediate parsing. A table with merged headers comes through as valid HTML rather than as a confusing sequence of pipe-delimited rows, which improves the quality of downstream question-answering and data extraction.
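When a downstream pipeline does need cell-level access rather than raw HTML, the standard-library parser is enough to flatten such tables. A sketch that records each cell together with its colspan:

```python
from html.parser import HTMLParser

class TableFlattener(HTMLParser):
    """Flatten an OCR-style HTML table into rows of (text, colspan) cells."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell, self._span = [], None, None, 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = ""
            self._span = int(dict(attrs).get("colspan", 1))

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append((self._cell.strip(), self._span))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

html = ("<table><tr><th colspan='2'>Q1</th></tr>"
        "<tr><td>Revenue</td><td>12.4</td></tr></table>")
parser = TableFlattener()
parser.feed(html)
```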
Handwriting is the hardest category for classical OCR tools because character forms vary widely between individuals, cursive strokes connect letters in ways that character-level models cannot decompose, and real documents often mix handwritten annotations with printed text on the same page.
OCR 3 scored 88.9% on the handwriting accuracy benchmark in the PyImageSearch review, compared to 78.2% for Azure Document Intelligence, 73.9% for Google Document AI, and 57.2% for DeepSeek-OCR.[10] Qualitatively, the model handles cursive text, field annotations on printed forms, and signatures layered over typed content more reliably than OCR 2 did.
The 74% win rate figure Mistral cites for OCR 3 over OCR 2 was calculated specifically on a set of handwritten, form-heavy, and scanned document types that OCR 2 handled poorly, using fuzzy-match scoring against human-verified ground truth.[2]
For non-text elements such as charts, graphs, and photographs, OCR 3 extracts the image data and returns it as a base64-encoded block inside the markdown, with a placeholder in the text stream indicating where the image appeared. The model does not analyze or describe image content; that step is left to a downstream VLM or vision model if needed.[3][4]
This is a deliberate design choice. Attempting to interpret charts or figures would require a general visual reasoning capability that goes beyond OCR and would add latency and cost. The current approach ensures the extracted image can be passed to a model better suited to visual analysis, such as Pixtral or GPT-4o, without re-uploading the source document.
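Pulling those embedded images back out is straightforward. This sketch assumes the markdown embeds them as standard data-URI image references; the exact output shape should be confirmed against the API documentation:

```python
import base64, re

def extract_embedded_images(markdown: str) -> dict:
    """Decode markdown images embedded as data URIs (assumed output shape)."""
    pattern = r"!\[([^\]]*)\]\(data:image/[a-z]+;base64,([A-Za-z0-9+/=]+)\)"
    return {name: base64.b64decode(data)
            for name, data in re.findall(pattern, markdown)}

fake_png = base64.b64encode(b"\x89PNG fake bytes").decode("ascii")
doc = f"Intro text\n\n![img-0](data:image/png;base64,{fake_png})\n\nMore text."
images = extract_embedded_images(doc)
```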
Beyond individual element types, OCR 3 preserves the reading order and hierarchical structure of the document: headings map to markdown heading levels, nested lists are nested in the output, footnotes are extracted in position rather than dumped at the end of the file, and page breaks are annotated. The OmniDocBench reading order score was 91.6%.[11]
This matters for legal and compliance workflows where the document structure carries semantic meaning. A contract section that appears as a top-level heading versus a subheading can affect how an extraction pipeline interprets the hierarchy of obligations. Classical OCR tools that output plain text lose this structure entirely.
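Downstream pipelines can recover that hierarchy from the markdown with a few lines of parsing; a minimal sketch over standard `#` heading syntax:

```python
def heading_outline(markdown: str) -> list:
    """Map markdown heading lines to (level, title) pairs, in reading order."""
    outline = []
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped[level:].strip()
            if title:
                outline.append((level, title))
    return outline

contract = "# Obligations\n## Payment Terms\nNet 30.\n## Termination\n"
```

Applied to a contract extraction, the outline makes clause nesting explicit for the analysis step.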
Mistral did not publish a single consolidated benchmark table in the OCR 3 launch post. The numbers below come from a mix of Mistral's own internal evaluations, the independent CodeSOTA benchmark run in December 2025, and the PyImageSearch technical review from December 23, 2025. All third-party numbers should be read with awareness that vendor-sponsored and single-reviewer benchmarks have inherent limitations.
| Document type | Mistral OCR 3 | Azure Document Intelligence | AWS Textract | Google Document AI | DeepSeek-OCR |
|---|---|---|---|---|---|
| Handwriting accuracy | 88.9% | 78.2% | N/A | 73.9% | 57.2% |
| Complex tables | 96.6% | 85.9% | 84.8% | N/A | N/A |
| Forms | 95.9% | 86.2% | 84.5% | N/A | N/A |
| Historical scanned documents | 96.7% | 83.7% | N/A | 87.1% | 81.1% |
| Multilingual English | 98.6% | 93.5% | 93.9% | N/A | N/A |
OmniDocBench is an open benchmark of 1,355 document images maintained by the OpenDataLab group, released with the CVPR 2025 paper. It covers academic papers, books, exam papers, newspapers, and research reports in English and Chinese.
| Metric | Score | Notes |
|---|---|---|
| Composite score | 79.75 | Overall across all document types |
| Text accuracy | 90.1% | All document types |
| English text accuracy | 94.6% | English subset |
| Chinese text accuracy | 86.1% | Chinese subset |
| Mixed-language text accuracy | 86.2% | |
| Table TEDS | 70.9% | Tree edit distance similarity |
| Formula accuracy | 78.2% | LaTeX equation extraction |
| Reading order accuracy | 91.6% | |
By document category within OmniDocBench:[11]
| Document category | Text accuracy | Table TEDS |
|---|---|---|
| Academic literature | 97.9% | 83.0% |
| Exam papers | 92.8% | 88.0% |
| Books | 93.9% | 82.7% |
| Research reports | 95.8% | 82.0% |
| Newspapers | 67.0% | 58.3% |
The lower newspaper scores reflect the challenge of irregular multi-column magazine-style layouts, where OCR 3 sometimes misreads reading order across columns.
OCRBench v2 is a 7,400-sample benchmark that mixes pure text extraction with visual question-answering tasks. Because OCR 3 is a specialist extraction model with no VQA capability, it scores high on the extraction sub-tasks and low on the reasoning sub-tasks.
| Sub-task | Score |
|---|---|
| Overall | 25.2% |
| Full-page OCR | 79.1% |
| Document parsing | 55.2% |
| Text recognition | 32.5% |
The 25.2% overall figure versus 79.1% on full-page OCR shows the penalty OCR 3 pays for skipping visual reasoning. General-purpose vision-language models score 55 to 62% overall on OCRBench v2 because they can answer visual questions even if their table reconstruction is less precise.[11]
For comparison, PaddleOCR-VL scored 92.86 on OmniDocBench, though it is a hybrid model that combines traditional OCR with visual understanding and operates differently from a pure API service like Mistral OCR 3.
The pricing structure as of the December 2025 release is straightforward.
| Tier | Price | Notes |
|---|---|---|
| Standard OCR | $2 per 1,000 pages | Via the synchronous /v1/ocr endpoint.[2] |
| Annotated OCR | $3 per 1,000 annotated pages | When structured annotations (bounding boxes, JSON schema extraction) are requested.[3] |
| Batch API | $1 per 1,000 pages | 50% discount for asynchronous batch jobs submitted as .jsonl files.[2] |
For comparison, AWS Textract basic text detection costs $1.50 per 1,000 pages, rising to $15 per 1,000 pages for its forms and tables features. Google Document AI Document OCR costs $1.50 per 1,000 pages for standard document processing, with enterprise features priced higher. Azure Document Intelligence Read starts at $1.50 per 1,000 pages.
Mistral's batch price of $1 per 1,000 pages puts it at the low end of the enterprise OCR pricing band, particularly given the accuracy claims on forms and tables that would otherwise require more expensive specialized features from Textract or Document AI.[10]
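The per-tier arithmetic is simple enough to sketch directly, using the list prices from the table above:

```python
# USD per 1,000 pages at the December 2025 list prices.
PRICE_PER_1000 = {"standard": 2.00, "annotated": 3.00, "batch": 1.00}

def ocr_cost(pages: int, tier: str = "standard") -> float:
    """Estimated Mistral OCR 3 cost in USD for a given page count and tier."""
    return pages * PRICE_PER_1000[tier] / 1000

standard_run = ocr_cost(10_000)           # $20.00
archive_run = ocr_cost(500_000, "batch")  # $500.00
```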
One practical note: the free tier on La Plateforme may use submitted documents for model training. Organizations with data sensitivity requirements should use the paid tier or the self-hosted option, which does not send data to Mistral's training pipeline.
The /v1/ocr endpoint follows a simple request-response pattern. A caller sends a document reference (URL, file ID, or base64 payload) with the model name and optional parameters, and receives a JSON response containing the markdown text, page-level metadata, and any requested annotations.
Key request parameters include:
- `model`: `mistral-ocr-2512` or `mistral-ocr-latest`
- `document`: object with `type` (`document_url` for PDFs/DOCX/PPTX, `image_url` for images) and `url` or `data`
- `include_image_base64`: returns embedded images as base64 in the response
- `document_annotation_format`: JSON schema for custom structured extraction
- `bbox_annotation_format`: schema for bounding box annotations

A minimal Python call using the official SDK:
```python
from mistralai import Mistral

client = Mistral(api_key="your-api-key")
response = client.ocr.process(
    model="mistral-ocr-latest",
    document={"type": "document_url",
              "document_url": "https://example.com/document.pdf"}
)
print(response.pages[0].markdown)
```
The SDK is installed via `pip install mistralai`.[11]
Mistral AI Studio includes a Document AI Playground that lets analysts and product managers test OCR 3 without writing code. Users can drag and drop a PDF or image, select output parameters, and receive the markdown and JSON output immediately. The playground is useful for prototyping extraction schemas before committing to an API integration.[2][3]
The batch API accepts .jsonl files where each line is a self-contained OCR request. Batch jobs are processed asynchronously, which lets callers submit large archives without holding open long-running connections. The 50% price discount on batch jobs makes the pipeline economical for high-volume archiving and migration projects.[2]
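Each batch line is just a serialized OCR request. A sketch of building the input file contents; the `custom_id`/`body` envelope is an assumed shape, so confirm field names against the batch API documentation:

```python
import json

def build_batch_lines(urls):
    """One self-contained OCR request per .jsonl line (assumed envelope shape)."""
    return [
        json.dumps({
            "custom_id": f"doc-{i}",
            "body": {
                "model": "mistral-ocr-latest",
                "document": {"type": "document_url", "document_url": url},
            },
        })
        for i, url in enumerate(urls)
    ]

lines = build_batch_lines(["https://example.com/a.pdf",
                           "https://example.com/b.pdf"])
jsonl = "\n".join(lines)  # contents of the batch input file to upload
```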
Le Chat, Mistral's consumer AI assistant, uses the OCR stack as its document understanding layer. When a user uploads a PDF or image to Le Chat, Mistral OCR processes the document in the background and passes the extracted markdown to the conversation context before the language model generates a response.[13][15]
This integration was present from the original OCR 1 release in March 2025. The OCR 3 upgrade improved the quality of document understanding visible to Le Chat users without any change to the chat interface itself. Le Chat Enterprise, available on Google Cloud Marketplace since May 2025, also uses the OCR stack for its enterprise document library features.[14]
The /v1/ocr endpoint integrates with Mistral's chat completions and agents APIs. A developer can extract a document with OCR 3 and then pass the markdown directly into a /v1/chat/completions call with one of Mistral's language models, creating a single-API-key workflow for document QnA, summarization, or structured data extraction without switching providers.[3]
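A sketch of the hand-off step: compose the OCR markdown into a chat-completions message list. The system-prompt wording here is illustrative, not Mistral's:

```python
def qa_messages(ocr_markdown: str, question: str) -> list:
    """Ground a question in OCR output for a chat completions call."""
    return [
        {"role": "system",
         "content": "Answer strictly from the document below.\n\n" + ocr_markdown},
        {"role": "user", "content": question},
    ]

messages = qa_messages("# Invoice\nTotal: $412.50", "What is the invoice total?")
# messages would then be passed to the chat completions endpoint.
```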
Mistral OCR is available as a managed API on both Microsoft Azure AI Foundry and Google Cloud Vertex AI Model Garden, giving enterprise buyers access to the model through their existing cloud contracts and billing relationships. These deployments run on Mistral's infrastructure but are accessible via the respective cloud management consoles.[14]
Legal workflows generate dense, structure-heavy documents: contracts with numbered sections and cross-references, court filings with exhibit attachments, compliance disclosures with standardized table formats, and regulatory submissions mixing text, forms, and signatures. Classical OCR tools can extract text but typically lose the hierarchical structure that determines how clauses relate to each other.
OCR 3's heading-level preservation and HTML table output make it practical for building contract review pipelines where section hierarchy matters. A compliance team can extract a regulatory filing into structured markdown, pass it to a language model for analysis, and receive responses that correctly attribute obligations to the right clause levels. Law firms and legal technology companies are among the enterprise buyers Mistral highlights for self-hosted deployments.[2][15]
Invoice processing, receipt capture, and financial statement digitization are high-volume, accuracy-sensitive workflows. OCR errors in financial documents propagate directly into accounting systems. OCR 3's 95.9% forms accuracy benchmark and its ability to output extraction results as structured JSON against a caller-defined schema make it applicable to accounts payable automation, expense management, and KYC document processing.[10]
The batch API pricing at $1 per 1,000 pages matters here because finance teams often need to process large archives of historical invoices when migrating accounting systems or preparing for audits. Running 500,000 invoice pages through a batch job costs $500, compared to several thousand dollars for equivalent volume through Textract's forms feature.
Banks and financial institutions with strict data sovereignty requirements are explicitly called out by Mistral as the target audience for the self-hosted deployment option, which keeps document content inside the buyer's own infrastructure.[2]
Scientific papers contain a mix of dense prose, mathematical equations, structured tables, figure captions, and references. Converting a PDF paper to clean text for downstream processing is harder than it looks because equation rendering in PDF uses arbitrary character substitutions that classical OCR cannot decode, and table cells in papers often span multiple rows with no consistent formatting.
OCR 3's 78.2% formula accuracy on OmniDocBench, combined with 97.9% text accuracy on academic literature, makes it one of the better options for bulk scientific paper processing. Research institutions and publishers have begun using it to build RAG corpora from journal archives, where the alternative is often paying for commercial PDF-to-text conversion services that cost more per page and produce less clean output.[11][15]
Historical archives contain documents scanned at varying quality levels, with age-related degradation, staining, skew from improper scanning, and handwritten marginalia layered over printed text. OCR 3's 96.7% accuracy on historical scanned documents in the PyImageSearch benchmark reflects training on degraded real-world scans rather than clean synthetic test sets.[10]
Museums, national archives, libraries, and government agencies processing historical records are a natural fit. The model handles early-20th-century typed documents with physical damage, mid-century forms with handwritten completions, and mixed-period collections with varying scan quality within a single batch job.
Organizations maintain large internal document libraries: product manuals, technical specifications, policy documents, internal reports, and training materials. Making this content searchable and usable by AI assistants requires converting it into clean text. OCR 3's throughput and per-page pricing make it practical to process entire enterprise document repositories that would take weeks with manual conversion or more expensive API services.
The Document QnA integration (combining OCR output with a language model in a single API call) reduces the engineering effort for building internal knowledge assistants, since the extraction and question-answering steps share one API key and one billing account.
The document AI market in 2025 covers three distinct product categories: traditional enterprise document AI services from cloud providers, generalist VLMs used for document extraction as a secondary capability, and open-source OCR tools. Mistral OCR 3 is positioned to compete across all three.
| Product | Operator | Standard price per 1,000 pages | Key strength | Known limitation |
|---|---|---|---|---|
| Mistral OCR 3 | Mistral AI | $2 (standard), $1 (batch) | Handwriting, complex tables, EU data residency | No fine-tuning, no chart interpretation |
| AWS Textract | Amazon Web Services | $1.50 (text), up to $15 (forms and tables) | Deep AWS integration, mature compliance features | Complex table accuracy 84.8% in benchmarks |
| Google Document AI | Google Cloud | $1.50 (standard) | Large document type library, form parser variants | Handwriting 73.9% in benchmarks |
| Azure Document Intelligence | Microsoft | $1.50 (read) | Microsoft ecosystem, pre-built form models | Handwriting 78.2% in benchmarks |
| Adobe PDF Extract | Adobe | Variable (enterprise contract) | Native PDF fidelity, Acrobat ecosystem | High cost for bulk processing |
GPT-4o vision and Gemini can extract text from document images, but they are general-purpose models priced for reasoning tasks rather than bulk page processing. GPT-4o charges per token, and a dense document page can generate thousands of output tokens, making cost-per-page substantially higher than Mistral OCR 3's $2 flat rate. Table reconstruction is less consistently structured from generalist models, and VLMs introduce latency and cost from image understanding that OCR 3 avoids by being purpose-built.
The trade-off runs the other way for tasks requiring visual reasoning: GPT-4o can interpret a chart, answer a question about a graph, or describe an architectural diagram, while OCR 3 passes such elements through as base64 image data without interpretation.
| Tool | Operator | License | Key strength | Limitation vs OCR 3 |
|---|---|---|---|---|
| Tesseract | Google (maintained by community) | Apache 2.0 | Free, widely deployed, extensive language support | Poor table and handwriting accuracy, no structure preservation |
| Surya | VikParuchuri | GPL-3.0 | Open weights, good layout analysis | Slower throughput, GPU required for batch |
| olmOCR | Allen AI | Apache 2.0 | Academic focus, formula-aware | Less tested on business document types |
| MinerU | OpenDataLab | AGPL-3.0 | Strong on scientific papers, open | Complex deployment, high resource requirements |
| Nougat | Meta AI | MIT | LaTeX math output, academic papers | Narrow domain, not production-hardened |
| DocLing | IBM | Apache 2.0 | Enterprise design, multiple connectors | Accuracy behind Mistral OCR 3 on benchmarks |
Open-source tools avoid per-page fees and allow fine-tuning on domain-specific document types. The trade-off is deployment complexity, infrastructure cost for GPU-accelerated batch processing, and generally lower accuracy on handwriting and complex forms.
The initial Mistral OCR release in March 2025 drew positive coverage from technical press. TechCrunch described it as straightforward and well-priced for the emerging document-to-LLM pipeline use case.[7] VentureBeat framed the positioning against cloud OCR incumbents as ambitious but plausible given the accuracy benchmarks.[13]
The OCR 3 release in December 2025 received similarly positive initial coverage. InfoQ noted the improvement methodology was notable for using fuzzy-match scoring on real business documents rather than synthetic benchmarks, which is harder to game.[1] The VentureBeat headline focused on the 74% win rate and $2 pricing as the commercial argument for enterprise buyers who had been locked into AWS or Google cloud OCR services.
The CodeSOTA independent benchmark run in December 2025 provided verification outside Mistral's own marketing. The OmniDocBench composite score of 79.75 put OCR 3 behind hybrid models like PaddleOCR-VL (92.86) but well ahead of simpler pipeline OCR tools.[11] The OCRBench v2 overall score of 25.2% drew some attention because it looks low, though reviewers noted this reflects the benchmark's VQA component rather than any failure in text extraction.
Independent testing by Parsio uncovered a format sensitivity issue: PDF-to-JPEG conversions of the same document sometimes outperform direct PDF uploads, which suggests the PDF rasterization pipeline has inconsistencies.[16] The PyImageSearch review flagged hallucination risk for financial documents, noting that high-fidelity markdown output can look correct to a human reviewer even when specific digits are wrong.[10]
Analytics Vidhya summarized the community view after several weeks of testing: the model is well-suited for high-structure documents like invoices, forms, and academic papers, but irregular layouts like magazines and multi-column newspaper pages remain a weakness.[12]
Several limitations have been identified through independent testing and community reports:
Format sensitivity: Direct PDF uploads sometimes produce worse results than rasterizing the PDF to JPEG first and submitting the image. This is the opposite of the expected behavior and creates inconsistency for production pipelines that cannot predict which format will work better for a given document.[16]
Multi-column layout handling: OCR 3 can struggle with magazine-style layouts where text flows across irregular columns. The model sometimes attempts to represent non-tabular columnar text as an HTML table, which introduces structure where none was intended.[10][11] The newspaper accuracy on OmniDocBench (67.0% text, 58.3% table TEDS) reflects this.
No fine-tuning: The model is a managed SaaS endpoint with no customer fine-tuning option, outside of the self-hosted offering for large enterprise buyers. Organizations with highly domain-specific document types (specialized medical forms, unusual financial instruments, proprietary engineering drawings) cannot adapt the model to their vocabulary and layout conventions.[10]
Digit hallucination risk: The PyImageSearch review noted that OCR 3 can generate confident-looking output with flipped or wrong digits in numeric fields. Unlike a character-level model that fails visibly on ambiguous characters, OCR 3's language-model backbone may interpolate plausible-looking numbers. Human verification or cross-checking against known data is advisable for financial documents.[10]
No visual reasoning: OCR 3 does not interpret charts, graphs, diagrams, or photographs. These are returned as base64 image data with a position placeholder in the text stream. Any interpretation requires a second model call.[3][4]
Connectivity requirement: The standard offering is a SaaS API with no offline mode. The self-hosted option addresses this for buyers with infrastructure budgets, but it is not a drop-in substitute for a locally runnable open-source tool.
Free tier data usage: On the free tier of La Plateforme, submitted documents may be used for model training. This is a standard clause for free AI APIs but can be a showstopper for documents containing sensitive personal or commercial information.
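One cheap, automatable mitigation for the digit-hallucination risk above is an arithmetic consistency check on extracted numeric fields; a minimal sketch for invoice-style documents:

```python
def totals_consistent(line_items, stated_total, tol=0.01):
    """Flag possible digit errors: do the extracted line items sum to the
    extracted total within a rounding tolerance?"""
    return abs(sum(line_items) - stated_total) <= tol

ok = totals_consistent([120.00, 45.50], 165.50)       # items add up
flagged = totals_consistent([120.00, 45.50], 195.50)  # mismatch: route to review
```

Documents that fail the check can be routed to human review instead of flowing straight into an accounting system.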
Mistral is a Paris-based company and has consistently used EU data residency as a competitive argument against US hyperscalers. OCR 3 fits that positioning in two ways.
The standard managed service runs on Mistral's own infrastructure rather than AWS, Azure, or Google Cloud, which addresses the concern some EU customers have about data processed by US cloud services under CLOUD Act jurisdiction. The self-hosted option goes further, allowing banks, hospitals, and public-sector organizations to run the model entirely inside their own perimeter with no document content leaving their network.[2]
Mistral's governance pages classify OCR 3 as a specialized base model with active lifecycle status and no scheduled retirement date beyond the OCR 2 deprecation on February 27, 2026.[1][8] The active classification signals to enterprise buyers that they can build long-running pipelines on mistral-ocr-2512 without an imminent forced migration.
The combination of EU-based infrastructure, GDPR-native data handling, and a self-hosted deployment path has been Mistral's clearest differentiation from AWS Textract and Google Document AI, which both store and process data in US-based data centers by default (though both offer EU region options).
Mistral's enterprise product portfolio in 2025 had four main components: La Plateforme for general API access to LLMs, Mistral AI Studio for hosted UI tools including the Document AI Playground, the Document AI API suite (OCR, annotations, and document QnA), and Le Chat for consumer and enterprise assistant use cases. OCR 3 connects the document side of this stack.
The pricing structure is part of the competitive strategy. At $1 per 1,000 pages in batch mode, Mistral is priced below the entry tier of AWS Textract's forms and tables feature while claiming higher accuracy on the most difficult enterprise workloads. The gap between what Textract charges for full form extraction ($15 per 1,000 pages) and what OCR 3 charges ($2 standard, $1 batch) is substantial enough to drive procurement conversations at large organizations processing millions of documents per year.
OCR 3 also strengthens Le Chat Enterprise as a product. A business deploying Le Chat Enterprise through Google Cloud Marketplace gets document understanding through OCR 3 without a separate contract, which simplifies procurement and makes Le Chat a more complete platform for knowledge worker use cases beyond text chat.