Gemini 3.1 Pro

11 min read

Updated Jul 23, 2026

Gemini 3.1 Pro is a large language model developed by Google DeepMind and released on 19 February 2026 as a point-release upgrade to Gemini 3 Pro ^[1]^[2]. Built on the same model family as its predecessor, Google describes it on the model card as "Google's most advanced model for complex tasks" and positions it as a stronger reasoning baseline for hard analytical and agentic work rather than a full generational redesign, reporting more than double the reasoning performance of Gemini 3 Pro on the ARC-AGI-2 benchmark (77.1% versus 31.1%) ^[1]^[3]. The model is natively multimodal, accepts up to one million tokens of input, and shipped in preview across Google's developer, enterprise, and consumer surfaces at the same price as Gemini 3 Pro ^[1]^[3]^[4].

What is Gemini 3.1 Pro?

Gemini 3.1 Pro sits in the Gemini 3 line as the flagship "Pro" tier, above the smaller Gemini 3 Flash and alongside a Deep Think variant intended for extended deliberate reasoning ^[4]. According to its model card, it is built on Gemini 3 Pro and shares that model's underlying architecture and training recipe, which the card does not re-disclose and instead defers to the Gemini 3 Pro documentation ^[3]. Google describes it as a smarter, more capable baseline for complex problem-solving and recommends it for agentic performance, advanced coding, long-context or multimodal understanding, and algorithmic development ^[1]^[3]. In the launch blog post Google framed the release simply: "3.1 Pro is designed for tasks where a simple answer isn't enough." ^[1]

The release follows a cadence Google adopted across 2025 and into 2026 of issuing dot-releases that sharpen a model's reasoning and tool-use behavior without renaming the generation. Where Gemini 3 Pro launched in late 2025, Gemini 3.1 Pro arrived roughly three months later and was framed primarily around gains in core reasoning and reliability on multi-step tasks ^[1]^[2].

When was Gemini 3.1 Pro released?

Google introduced Gemini 3.1 Pro through a post on the company blog and a Google DeepMind announcement on 19 February 2026, with availability on Google Cloud surfaces beginning the following day ^[1]^[2]^[4]. The model was released in preview rather than general availability, with Google saying the preview window would let it validate the update and push further on ambitious agentic workflows before a broader rollout ^[1]^[4].

At launch Google paired the announcement with statements from early-access partners on the Google Cloud blog. JetBrains reported up to a 15% improvement over its best Gemini 3 Pro Preview runs, Databricks said the model reached best-in-class results on its OfficeQA grounded-reasoning benchmark, and Cartwheel cited substantially improved understanding of 3D transformations ^[4]. Vladislav Tankov, Director of AI at JetBrains, said: "Gemini 3.1 Pro represents a clear quality leap in our evaluations. We observed up to 15% improvement over the best Gemini 3 Pro Preview runs." ^[4] Andrew Carr, co-founder and chief scientist at Cartwheel, added that the model "has a substantially improved understanding of 3D transformations" and let his team "close a long standing rotation order bug in one of our export pipelines." ^[4]

How does Gemini 3.1 Pro compare to Gemini 3 Pro?

Gemini 3.1 Pro is an incremental successor to Gemini 3 Pro, not a new generation. The two share the same context window, price, and broad capability profile, so the practical distinction is reasoning quality and consistency on difficult problems ^[1]^[3]^[5]. Google's headline comparison was ARC-AGI-2, where Gemini 3.1 Pro's verified 77.1% was described as more than double Gemini 3 Pro's 31.1% on the same test ^[1]^[3]. On RE-Bench, a machine-learning research-and-engineering evaluation, the model card lists a human-normalized score of 1.27 for Gemini 3.1 Pro against 1.04 for Gemini 3 Pro ^[3]^[6].

Because the upgrade kept pricing and the 1M-token window unchanged, Google presented it as a drop-in replacement: developers already on Gemini 3 Pro could switch model identifiers and gain reasoning depth at the same cost per token ^[4]^[5]. A "3.5 Pro coming soon" note later appeared on the DeepMind Gemini page, indicating that 3.1 Pro held the current flagship position within the Gemini 3 series at release ^[4].

The table below pairs the figures Google published for both models on the same benchmarks in the 3.1 Pro model card.

Benchmark	Domain	Gemini 3.1 Pro	Gemini 3 Pro
ARC-AGI-2 (verified)	Abstract reasoning	77.1%	31.1%
GPQA Diamond	Graduate-level science	94.3%	91.9%
Humanity's Last Exam (with search/code)	Broad reasoning	51.4%	45.8%
SWE-bench Verified	Software engineering	80.6%	76.2%
SWE-Bench Pro (Public)	Software engineering	54.2%	43.3%
LiveCodeBench Pro (Elo)	Competitive coding	2887	2439
Terminal-Bench 2.0	Agentic terminal use	68.5%	56.9%
MRCR v2 (128k average)	Long-context retrieval	84.9%	77.0%
MRCR v2 (1M pointwise)	Long-context retrieval	26.3%	26.3%
RE-Bench	ML research/engineering	1.27	1.04
MMMU-Pro	Multimodal reasoning	80.5%	81.0%

Source: Gemini 3.1 Pro model card ^[3]. The two models post nearly identical scores at the extreme 1M-token retrieval setting and on the MMMU-Pro multimodal benchmark, so the gains are concentrated in abstract reasoning, agentic coding, and tool use rather than in every category ^[3].

What architecture and training does Gemini 3.1 Pro use?

Google did not publish new architectural details specific to Gemini 3.1 Pro. The model card states only that it is built on Gemini 3 Pro and points to that model's card for training data, methodology, and design ^[3]. Gemini 3 Pro is a sparse mixture-of-experts transformer trained for native multimodality, and 3.1 Pro inherits that lineage ^[3]^[4]. The knowledge cutoff is reported as January 2025, the same as Gemini 3 Pro ^[4]^[7].

The 3.1 Pro model card emphasizes evaluation and safety over architecture. It reports that the model remained below critical capability thresholds across the CBRN, cyber, harmful manipulation, machine-learning R&D, and misalignment domains that Google tracks under its Frontier Safety Framework ^[3].

What can Gemini 3.1 Pro do?

Gemini 3.1 Pro is natively multimodal. It accepts text, images, audio, and video files, including PDFs and entire code repositories, within a single input window and produces text output ^[3]^[4]. The model card frames its strengths around four areas: agentic performance, advanced coding, long-context and multimodal understanding, and algorithmic development ^[3].

Google highlighted agentic coding and tool use as a focus of the release, with the model tuned to use fewer output tokens while returning more reliable results on multi-step engineering tasks ^[4]. Hanlin Tang, Databricks' CTO of Neural Networks, said the model "achieved best-in-class results on OfficeQA, Databricks' benchmark for grounded reasoning that combines both tabular and unstructured data" ^[4]. It is available inside developer tooling including Google AI Studio, the Gemini CLI, Antigravity, and Android Studio, where the agentic and coding behaviors are most directly exercised ^[1]^[4].

What are Gemini 3.1 Pro's benchmark results?

The figures below are drawn from Google DeepMind's Gemini 3.1 Pro model card unless otherwise noted. ARC-AGI-2, an abstract-reasoning test from the ARC Prize designed to resist memorization, was the launch's headline result ^[1]^[3]. GPQA Diamond is a graduate-level science benchmark, and the model card's GPQA and ARC-AGI figures were widely reported as among the strongest published at the time ^[3]^[5].

Benchmark	Domain	Gemini 3.1 Pro	Source
ARC-AGI-2 (verified)	Abstract reasoning	77.1%	^[1]^[3]
GPQA Diamond	Graduate-level science	94.3%	^[3]^[5]
Humanity's Last Exam (with search/code)	Broad reasoning	51.4%	^[3]
SWE-bench Verified	Software engineering	80.6%	^[3]
SWE-Bench Pro (Public)	Software engineering	54.2%	^[3]
Terminal-Bench 2.0	Agentic terminal use	68.5%	^[3]
LiveCodeBench Pro	Competitive coding	2887 Elo	^[3]^[4]
MMMU-Pro	Multimodal reasoning	80.5%	^[3]
MMMLU	Multilingual knowledge	92.6%	^[3]
MRCR v2 (128k)	Long-context retrieval	84.9%	^[3]
MRCR v2 (1M)	Long-context retrieval	26.3%	^[3]
RE-Bench	ML research/engineering	1.27 (human-normalized)	^[3]^[6]

In head-to-head coverage, the press reported Gemini 3.1 Pro leading most categories against GPT-5.2, including ARC-AGI-2 (77.1% versus 52.9%) and GPQA Diamond (94.3% versus 92.4%), while Claude Opus 4.6 held narrow leads on some agentic and professional-task evaluations and posted 68.8% on ARC-AGI-2 ^[5]^[8]. Reported competitor figures came from third-party aggregation and varied between outlets, so they should be read as indicative rather than official Google numbers ^[5]^[8].

How much does Gemini 3.1 Pro cost and what is its context window?

Gemini 3.1 Pro launched in preview across developer, enterprise, and consumer channels: the Gemini API via Google AI Studio, the Gemini CLI, Antigravity, and Android Studio for developers; Vertex AI and Gemini Enterprise for organizations; and the Gemini app plus NotebookLM for consumers on paid plans ^[1]^[4]. It supports a context window of up to one million input tokens with an output ceiling of roughly 64,000 tokens ^[3]^[7].

API pricing matched Gemini 3 Pro, with a tiered structure based on prompt size and a discounted batch tier for asynchronous work ^[9]^[10]. Cache hits are billed at a fraction of the input rate ^[9]^[11].

Tier	Input (per 1M tokens)	Output (per 1M tokens)
Standard, prompt <= 200k	$2.00	$12.00
Standard, prompt > 200k	$4.00	$18.00
Batch / Flex, prompt <= 200k	$1.00	$6.00
Batch / Flex, prompt > 200k	$2.00	$9.00
Context caching (read), prompt <= 200k	$0.20	n/a

Source: Google Gemini API pricing documentation ^[9]. Context caching also carries a storage charge of $4.50 per 1M tokens per hour, and a higher-throughput Priority tier is offered at roughly 1.8x the Standard rate ^[9].

How was Gemini 3.1 Pro received?

Coverage treated the release as a meaningful reasoning bump delivered without a price increase. Outlets noted the ARC-AGI-2 result as the standout, since that benchmark is built to penalize pattern memorization and the jump over Gemini 3 Pro's 31.1% was large ^[1]^[5]. Artificial Analysis placed Gemini 3.1 Pro Preview in the top tier of its Intelligence Index, scoring it well above the median for its price class and ranking it among the strongest models it tracks; its public snapshot also described the preview as generating output at a fast rate and producing relatively concise responses compared with the average model in the index ^[11]. Because preview models change before general availability, these standings reflect a point-in-time evaluation rather than a final number ^[11].

Commentators also pushed back on benchmark-led messaging. Writing on his Substack, Zvi Mowshowitz characterized the launch as one that "aces benchmarks" while questioning how much the headline scores translate into day-to-day capability gains, a recurring theme in reactions to closely spaced dot-releases ^[12]. Several outlets noted that some rivals, such as a coding-focused GPT variant, left key benchmarks unpublished, so a number of Gemini 3.1 Pro's category wins were against absent or partial competitor results ^[5]^[12].

What are Gemini 3.1 Pro's limitations?

As a preview release, Gemini 3.1 Pro carried the usual caveats Google attaches to pre-GA models, including rate limits on consumer apps and the possibility of behavior changes before general availability ^[1]^[4]. Output is text only; despite multimodal input, the model does not generate images, audio, or video ^[3]. Long-context performance degrades at the extreme end of the window, as shown by the MRCR v2 score falling from 84.9% at 128k tokens to 26.3% at the full 1M-token range ^[3]. On some professional-task and agentic evaluations, competing frontier models from Anthropic and OpenAI matched or exceeded it, so its lead was not uniform across every category ^[5]^[8]. Pricing also steps up above the 200k-token prompt threshold, which raises cost for the largest-context workloads ^[9].

ELI5

Gemini 3.1 Pro is a smarter version of Google's earlier Gemini 3 Pro robot brain. It is not a brand-new robot, just the same one with a much better thinking upgrade, so it is far better at hard puzzles, tricky coding, and reading huge piles of text, pictures, and video all at once. Google says it can think more than twice as well on a famously tough reasoning test, and it costs the same as the old version to use.

References

^Gemini 3.1 Pro: A smarter model for your most complex tasks - The Keyword, Google
^Gemini 3.1 Pro: A smarter model for your most complex tasks - Google DeepMind
^Gemini 3.1 Pro - Model Card - Google DeepMind
^Gemini 3.1 Pro on Gemini CLI, Gemini Enterprise, and Vertex AI - Google Cloud Blog
^Gemini 3.1 Pro: Pricing, Context Window, Benchmarks, API & More - LLM Stats
^Google rolls out Gemini 3.1 Pro preview - TechInformed
^Gemini 3.1 Pro Review in 2026: Release Date, Price, Benchmarks - ICON
^Gemini 3.1 Pro Review 2026: #1 Ranked AI Model? - ComputerTech
^Gemini Developer API pricing - Google AI for Developers
^Gemini 3.1 Pro Preview - API Pricing & Benchmarks - OpenRouter
^Gemini 3.1 Pro Preview - Intelligence, Performance & Price Analysis - Artificial Analysis
^Gemini 3.1 Pro Aces Benchmarks, I Suppose - Zvi Mowshowitz

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

2 revisions by 1 contributors · v3 · 2,282 words · full history

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Suggest edit

Gemini 3.1 Pro

What is Gemini 3.1 Pro?

When was Gemini 3.1 Pro released?

How does Gemini 3.1 Pro compare to Gemini 3 Pro?

What architecture and training does Gemini 3.1 Pro use?

What can Gemini 3.1 Pro do?

What are Gemini 3.1 Pro's benchmark results?

How much does Gemini 3.1 Pro cost and what is its context window?

How was Gemini 3.1 Pro received?

What are Gemini 3.1 Pro's limitations?

See also

ELI5

References

Improve this article

What links here (24 of 27)

What links here (24 of 27)

What is Gemini 3.1 Pro?

When was Gemini 3.1 Pro released?

How does Gemini 3.1 Pro compare to Gemini 3 Pro?

What architecture and training does Gemini 3.1 Pro use?

What can Gemini 3.1 Pro do?

What are Gemini 3.1 Pro's benchmark results?

How much does Gemini 3.1 Pro cost and what is its context window?

How was Gemini 3.1 Pro received?

What are Gemini 3.1 Pro's limitations?

See also

ELI5

References

Improve this article

Related Articles

Gemini 3

Gemma 3

Gemini 2.5 Flash

Gemini 3.5 Flash

Gemini 1.5 Pro

Gemini 1.5 Flash

What links here (24 of 27)

Related Articles

Gemini 3

Gemma 3

Gemini 2.5 Flash

Gemini 3.5 Flash

Gemini 1.5 Pro

Gemini 1.5 Flash

What links here (24 of 27)