Gemini 3.1 Pro
Last reviewed
Jun 2, 2026
Sources
12 citations
Review status
Source-backed
Revision
v1 · 1,618 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 2, 2026
Sources
12 citations
Review status
Source-backed
Revision
v1 · 1,618 words
Add missing citations, update stale details, or suggest a clearer explanation.
Gemini 3.1 Pro is a large language model developed by Google DeepMind, announced on 19 February 2026 as a point-release upgrade to Gemini 3 Pro [1][2]. Built on the same model family as its predecessor, it was positioned as a stronger reasoning baseline for hard analytical and agentic work rather than a full generational redesign, with Google reporting more than double the reasoning performance of Gemini 3 Pro on the ARC-AGI-2 benchmark [1][3]. The model is natively multimodal, accepts up to one million tokens of input, and shipped in preview across Google's developer, enterprise, and consumer surfaces [1][4].
Gemini 3.1 Pro sits in the Gemini 3 line as the flagship "Pro" tier, above the smaller Gemini 3 Flash and alongside a Deep Think variant intended for extended deliberate reasoning [4]. According to its model card, it is built on Gemini 3 Pro and shares that model's underlying architecture and training recipe, which the card does not re-disclose and instead defers to the Gemini 3 Pro documentation [3]. Google describes it as a "smarter, more capable baseline for complex problem-solving" and recommends it for agentic performance, advanced coding, long-context or multimodal understanding, and algorithmic development [1][3].
The release follows a cadence Google adopted across 2025 and into 2026 of issuing dot-releases that sharpen a model's reasoning and tool-use behavior without renaming the generation. Where Gemini 3 Pro launched in late 2025, Gemini 3.1 Pro arrived roughly three months later and was framed primarily around gains in core reasoning and reliability on multi-step tasks [1][2].
Google introduced Gemini 3.1 Pro through a post on the company blog and a Google DeepMind announcement on 19 February 2026, with availability on Google Cloud surfaces beginning the following day [1][2][4]. The model was released in preview rather than general availability, with Google saying the preview window would let it validate the update and push further on ambitious agentic workflows before a broader rollout [1][4].
At launch Google paired the announcement with statements from early-access partners. JetBrains reported up to a 15% improvement over its best Gemini 3 Pro Preview runs, Databricks said the model reached best-in-class results on its OfficeQA grounded-reasoning benchmark, and Cartwheel cited substantially improved understanding of 3D transformations [4].
Gemini 3.1 Pro is an incremental successor to Gemini 3 Pro, not a new generation. The two share the same context window, price, and broad capability profile, so the practical distinction is reasoning quality and consistency on difficult problems [1][3][5]. Google's headline comparison was ARC-AGI-2, where Gemini 3.1 Pro's verified 77.1% was described as more than double Gemini 3 Pro's score on the same test [1][3]. On RE-Bench, a machine-learning research-and-engineering evaluation, the model card lists a human-normalized score of 1.27 for Gemini 3.1 Pro against 1.04 for Gemini 3 Pro [3][6].
Because the upgrade kept pricing and the 1M-token window unchanged, Google presented it as a drop-in replacement: developers already on Gemini 3 Pro could switch model identifiers and gain reasoning depth at the same cost per token [4][5]. A "3.5 Pro coming soon" note later appeared on the DeepMind Gemini page, indicating that 3.1 Pro held the current flagship position within the Gemini 3 series at release [4].
Google did not publish new architectural details specific to Gemini 3.1 Pro. The model card states only that it is built on Gemini 3 Pro and points to that model's card for training data, methodology, and design [3]. Gemini 3 Pro is a sparse mixture-of-experts transformer trained for native multimodality, and 3.1 Pro inherits that lineage [3][4]. The knowledge cutoff is reported as January 2025 [4][7].
The 3.1 Pro model card emphasizes evaluation and safety over architecture. It reports that the model remained below critical capability thresholds across the CBRN, cyber, harmful manipulation, machine-learning R&D, and misalignment domains that Google tracks under its Frontier Safety Framework [3].
Gemini 3.1 Pro is natively multimodal. It accepts text, audio, images, video, PDFs, and entire code repositories within a single input window and produces text output [3][4]. The model card frames its strengths around four areas: agentic performance, advanced coding, long-context and multimodal understanding, and algorithmic development [3].
Google highlighted agentic coding and tool use as a focus of the release, with the model tuned to use fewer output tokens while returning more reliable results on multi-step engineering tasks [4]. It is available inside developer tooling including Google AI Studio, the Gemini CLI, Antigravity, and Android Studio, where the agentic and coding behaviors are most directly exercised [1][4].
The figures below are drawn from Google DeepMind's Gemini 3.1 Pro model card unless otherwise noted. ARC-AGI-2, an abstract-reasoning test from the ARC Prize designed to resist memorization, was the launch's headline result [1][3]. GPQA Diamond is a graduate-level science benchmark, and the model card's GPQA and ARC-AGI figures were widely reported as among the strongest published at the time [3][5].
| Benchmark | Domain | Gemini 3.1 Pro | Source |
|---|---|---|---|
| ARC-AGI-2 (verified) | Abstract reasoning | 77.1% | [1][3] |
| GPQA Diamond | Graduate-level science | 94.3% | [3][5] |
| Humanity's Last Exam (with search/code) | Broad reasoning | 51.4% | [3] |
| SWE-bench Verified | Software engineering | 80.6% | [3] |
| SWE-Bench Pro (Public) | Software engineering | 54.2% | [3] |
| LiveCodeBench Pro | Competitive coding | 2887 Elo | [3][4] |
| MRCR v2 (128k) | Long-context retrieval | 84.9% | [3] |
| MRCR v2 (1M) | Long-context retrieval | 26.3% | [3] |
| RE-Bench | ML research/engineering | 1.27 (human-normalized) | [3][6] |
In head-to-head coverage, the press reported Gemini 3.1 Pro leading most categories against GPT-5.2, including ARC-AGI-2 (77.1% versus 52.9%) and GPQA Diamond (94.3% versus 92.4%), while Claude Opus 4.6 held narrow leads on some agentic and professional-task evaluations [5][8]. Reported competitor figures came from third-party aggregation and varied between outlets, so they should be read as indicative rather than official Google numbers [5][8].
Gemini 3.1 Pro launched in preview across developer, enterprise, and consumer channels: the Gemini API via Google AI Studio, the Gemini CLI, Antigravity, and Android Studio for developers; Vertex AI and Gemini Enterprise for organizations; and the Gemini app plus NotebookLM for consumers on paid plans [1][4]. It supports a context window of up to one million input tokens with an output ceiling of roughly 64,000 tokens [3][7].
API pricing matched Gemini 3 Pro, with a tiered structure based on prompt size and a discounted batch tier for asynchronous work [9][10]. Cache hits are billed at a fraction of the input rate [9][11].
| Tier | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Standard, prompt <= 200k | $2.00 | $12.00 |
| Standard, prompt > 200k | $4.00 | $18.00 |
| Batch / Flex, prompt <= 200k | $1.00 | $6.00 |
| Batch / Flex, prompt > 200k | $2.00 | $9.00 |
| Context caching (read) | $0.20 | n/a |
Source: Google Gemini API pricing documentation [9].
Coverage treated the release as a meaningful reasoning bump delivered without a price increase. Outlets noted the ARC-AGI-2 result as the standout, since that benchmark is built to penalize pattern memorization and the jump over Gemini 3 Pro was large [1][5]. Artificial Analysis placed Gemini 3.1 Pro Preview near the top of its Intelligence Index, scoring it 57 and ranking it among the highest models it tracks, while observing that the model ran slower and produced more verbose output than some peers during evaluation [11].
Commentators also pushed back on benchmark-led messaging. Writing on his Substack, Zvi Mowshowitz characterized the launch as one that "aces benchmarks" while questioning how much the headline scores translate into day-to-day capability gains, a recurring theme in reactions to closely spaced dot-releases [12].
As a preview release, Gemini 3.1 Pro carried the usual caveats Google attaches to pre-GA models, including rate limits on consumer apps and the possibility of behavior changes before general availability [1][4]. Output is text only; despite multimodal input, the model does not generate images, audio, or video [3]. Long-context performance degrades at the extreme end of the window, as shown by the MRCR v2 score falling from 84.9% at 128k tokens to 26.3% at the full 1M-token range [3]. On some professional-task and agentic evaluations, competing frontier models from Anthropic and OpenAI matched or exceeded it, so its lead was not uniform across every category [5][8]. Pricing also steps up above the 200k-token prompt threshold, which raises cost for the largest-context workloads [9].