Gemini 3 Pro is the flagship preview model in Google DeepMind's Gemini 3 family of multimodal models. Google introduced the Gemini 3 series on November 18, 2025 and shipped Gemini 3 Pro in preview the same day across the Gemini app, Google AI Studio, Vertex AI, the Gemini API, AI Mode in Search, and a new agentic coding platform called Google Antigravity.[1][2][3]
Google positioned Gemini 3 Pro as its most intelligent model at launch and said it significantly outperformed Gemini 2.5 Pro, the previous flagship in the Gemini line, on every major reasoning, coding, and multimodal benchmark the company tracks.[1] Internally Google described it as the start of the Gemini 3 era and promised additional Gemini 3 variants in the months that followed.
The model became the first publicly accessible large language model to break the 1500 Elo barrier on LMArena, debuting at 1501 Elo on the chatbot leaderboard at launch and ranking first across LMArena's text reasoning, vision, coding, and web development tracks.[1][4]
Gemini 3 Pro is built on a sparse mixture-of-experts (MoE) transformer architecture, a design that activates only a subset of model parameters per input token. This lets Google decouple total model capacity from per-token computation cost, supporting the 1 million token context window while maintaining practical inference latency.[9]
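The per-token routing behavior described above can be illustrated with a minimal top-k mixture-of-experts sketch. This is a generic textbook illustration, not Google's implementation: the router (a toy dot product), the expert functions, and the choice of k are all invented for the example. The point it demonstrates is that only k of the experts execute per token, so per-token compute stays flat as the total expert pool (and thus total parameter count) grows.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_forward(token, experts, router_weights, k=2):
    """Route one token to its top-k experts and mix their outputs.

    Only k of len(experts) expert functions run for this token;
    the rest of the model's capacity sits idle, which is what
    decouples total parameters from per-token compute.
    """
    # Router produces one logit per expert (toy dot-product router).
    logits = [sum(w * x for w, x in zip(wr, token)) for wr in router_weights]
    probs = softmax(logits)
    # Keep only the k highest-probability experts.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Weighted mix of the selected experts' outputs.
    out = [0.0] * len(token)
    for i in top:
        y = experts[i](token)
        out = [o + (probs[i] / norm) * v for o, v in zip(out, y)]
    return out, top
```

In a real MoE transformer this routing happens inside each MoE layer over learned expert MLPs; the sketch collapses that to plain functions to keep the mechanism visible.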
Gemini 3 Pro arrived roughly eight months after Gemini 2.5 Pro, which launched in March 2025. The 2.5 generation introduced Google's first serious thinking-mode capabilities and moved the Gemini line into contention with OpenAI's o-series reasoning models. By the second half of 2025, the frontier had grown noticeably more competitive: OpenAI shipped GPT-5 in August 2025 and Claude Opus 4.5 from Anthropic followed in late November.
The Gemini 3 launch was Google DeepMind's response to this competitive pressure. Google paired the model release with two major ecosystem moves: the launch of Google Antigravity, a new agentic coding platform, and the rollout of AI Mode in Google Search to Pro and Ultra subscribers. Both moves were designed to make Gemini 3 Pro the backbone of Google's developer and consumer AI products simultaneously rather than a standalone API product.
In the weeks after launch, the Deep Think extended reasoning mode received its first public rollout to Google AI Ultra subscribers. A wider research access program followed in February 2026 with significantly improved benchmark scores, turning Deep Think into a distinct product track rather than a minor variant.
Gemini 3 Pro sits at the top of Google's general-purpose Gemini lineup. The 3 series replaced the 2.5 line, which had previously replaced the 2.0 and 1.5 lines in the same flagship slot. The Gemini 3 family also includes a smaller, cheaper Gemini 3 Flash variant and a specialized Gemini 3 Deep Think reasoning mode that builds on Gemini 3 Pro's base weights but spends more compute on each response.
The table below shows where Gemini 3 Pro fits relative to its immediate predecessors and stablemates as of early 2026.
| Model | Family | Position | Released |
|---|---|---|---|
| Gemini 1.5 Pro | Gemini 1.5 | Flagship | February 2024 |
| Gemini 2.0 Pro | Gemini 2.0 | Flagship | February 2025 |
| Gemini 2.5 Pro | Gemini 2.5 | Flagship | March 2025 |
| Gemini 3 Pro | Gemini 3 | Flagship preview | November 18, 2025 |
| Gemini 3 Flash | Gemini 3 | Smaller, faster sibling | November 2025 |
| Gemini 3 Deep Think | Gemini 3 | Extended reasoning mode on top of 3 Pro | December 2025 (Ultra), February 2026 (research access) |
| Gemini 3.1 Pro | Gemini 3 | Refresh of the flagship preview | February 19, 2026 |
The "3.1 Pro" label that appeared on the Google DeepMind product page in early 2026 is best read as a successor preview release in the same flagship line rather than a separate product. Google's own model card frames Gemini 3.1 Pro as built on the Gemini 3 Pro base, with a step up in core reasoning and updated tool-use behavior. Gemini 3.1 Pro scored 77.1% on ARC-AGI-2, more than double Gemini 3 Pro's 31.1%; that improvement was the headline reason Google cited for urgency when it deprecated the original gemini-3-pro-preview model string on March 9, 2026.[5][6]
The table below summarizes the public technical surface of Gemini 3 Pro as documented by Google at launch and on the Gemini API model pages.[2][3][7]
| Field | Value |
|---|---|
| Status | Preview at launch (November 18, 2025); deprecated in the Gemini API on March 9, 2026 in favor of Gemini 3.1 Pro |
| Architecture | Sparse mixture-of-experts (MoE) transformer |
| Training infrastructure | Google TPUs, JAX, ML Pathways |
| Input modalities | Text, image, audio, video, PDF, code repositories |
| Output modalities | Text |
| Context window | 1,000,000 input tokens |
| Output limit | 64,000 tokens |
| Knowledge cutoff | January 2025 |
| Tool use | Function calling, structured output, code execution, search as a tool, URL context, client-side and server-side bash tools |
| Reasoning controls | Adjustable thinking level, configurable media resolution, stricter thought signature validation |
| Availability | Gemini app, AI Mode in Search, Google AI Studio, Vertex AI, Gemini API, Gemini CLI, Google Antigravity, NotebookLM (Pro/Ultra), third-party tools (Cursor, GitHub, JetBrains, Manus, Replit, Cline, Android Studio) |
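The tool-use and reasoning controls listed above surface as fields on a Gemini API generateContent request. The fragment below is a sketch of one such request body. The camelCase field names follow the Gemini API's documented REST conventions, but the exact names and allowed values (notably `thinkingLevel` and `mediaResolution`) should be checked against the current API reference, and the `get_ticket_status` function declaration is purely illustrative, not part of any real API.

```json
{
  "contents": [
    {"role": "user", "parts": [{"text": "Summarize the attached report."}]}
  ],
  "tools": [
    {"functionDeclarations": [
      {
        "name": "get_ticket_status",
        "description": "Illustrative function for this example only.",
        "parameters": {
          "type": "object",
          "properties": {"ticket_id": {"type": "string"}},
          "required": ["ticket_id"]
        }
      }
    ]}
  ],
  "generationConfig": {
    "thinkingConfig": {"thinkingLevel": "high"},
    "mediaResolution": "MEDIA_RESOLUTION_LOW"
  }
}
```

Structured output, code execution, search grounding, and URL context are enabled through sibling fields in the same `tools` and `generationConfig` blocks.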
Google has not officially disclosed Gemini 3 Pro's total parameter count. Sparse MoE architectures are designed so that only a fraction of the total weights are active on any given token, which means the raw parameter number is less meaningful for performance comparisons than it would be for a dense model. Industry analysis has suggested total parameter counts in the range of hundreds of billions to over one trillion, but these figures are not confirmed by Google.[9]
Google's developer documentation describes the 1 million token context as covering long documents, extended chats, and entire codebases, with media resolution controls that let developers trade input cost for visual fidelity on images and video. Community reports from the Google developer forums indicate that practical retrieval performance degrades meaningfully above 200,000 tokens, with some users reporting increased hallucinations and context loss at 800,000 tokens or above.[3][11]
Gemini 3 Pro is a natively multimodal model. It accepts text, images, audio, video, and PDFs in the same prompt and answers in text. Developers can mix modalities freely and combine them with tool calls in a single request.[2][3]
On vision tasks, Gemini 3 Pro scored 72.7% on ScreenSpot-Pro, a benchmark measuring a model's ability to understand and interact with user interface screenshots. That score substantially exceeded Claude 3.5 Sonnet (36.2%) and GPT-5.1 (3.5%) on the same task, a gap that analysts attributed to Google's deeper investment in training on mobile and desktop UI data.[9]
Video understanding was another headline capability at launch. Gemini 3 Pro scored 87.6% on Video-MMMU, placing it at the top of that leaderboard. Google characterized the improvement as reflecting an ability to track cause-and-effect relationships across long video clips rather than merely identifying static objects in individual frames.
For audio, Gemini 3 Pro accepts raw audio input as part of a multimodal prompt, enabling tasks like transcription with visual context, audio-visual question answering, and language identification across mixed-modality inputs.
Gemini 3 Pro itself outputs only text, but the Gemini app and API pair it with Google's image and video generation stack. Veo 3.1, Google's video generation model, is available to paid Gemini app users for clip generation up to 8 seconds and is exposed as a Veo API surface for developers. Project Mariner, Google's agentic browsing experiment, also became available in the United States to AI Ultra subscribers around the Gemini 3 launch and uses Gemini 3 Pro for planning.
The table below collects benchmark scores from the November 18, 2025 launch posts and the Gemini 3 Pro model card. Where Google reported a separate Deep Think result, that score appears in parentheses after the base Gemini 3 Pro number. The February 2026 Deep Think research access scores, which showed further gains, are noted separately.[1][3][8]
| Benchmark | Gemini 3 Pro (base) | Deep Think (Dec 2025) | Deep Think (Feb 2026) | Notes |
|---|---|---|---|---|
| LMArena Elo | 1501 | n/a | n/a | First model above 1500; topped overall, vision, coding, web dev tracks |
| Humanity's Last Exam (no tools) | 37.5% | 41.0% | 48.4% | Reasoning across academic disciplines |
| GPQA Diamond | 91.9% | 93.8% | n/a | Graduate-level science questions, above human expert level (~89.8%) |
| MathArena Apex | 23.4% | n/a | n/a | Hardest published math contest set |
| AIME 2025 (no code) | 95.0% | n/a | n/a | American Invitational Mathematics Examination |
| AIME 2025 (with code execution) | 100% | n/a | n/a | Tied with GPT-5.1 |
| MMMU-Pro | 81.0% | n/a | n/a | Multimodal university-level reasoning |
| Video-MMMU | 87.6% | n/a | n/a | Video understanding |
| MMMLU | 91.8% | n/a | n/a | Massive Multitask Language Understanding, multilingual |
| Global PIQA | 93.4% | n/a | n/a | Multilingual physical commonsense |
| SimpleQA Verified | 72.1% | n/a | n/a | Short-form factual QA |
| SWE-bench Verified | 76.2% | n/a | n/a | Real GitHub bug-fix tasks |
| LiveCodeBench Pro Elo | 2439 | n/a | n/a | Competitive coding |
| Terminal-Bench 2.0 | 54.2% | n/a | n/a | Tool use and shell automation |
| WebDev Arena Elo | 1487 | n/a | n/a | Front-end web development arena |
| ARC-AGI-2 | 31.1% | 45.1% (with code) | 84.6% | Reasoning over novel puzzles, verified by ARC Prize Foundation |
| ScreenSpot-Pro | 72.7% | n/a | n/a | UI screenshot understanding |
| Vending-Bench 2 | $5,478.16 simulated profit | n/a | n/a | Long-horizon agent planning |
| MRCR v2 (128k context) | 77.0% | n/a | n/a | Long-context retrieval |
Third-party reporting by VentureBeat noted that Gemini 3 Pro led 13 of 16 widely tracked AI benchmarks at launch and beat both GPT-5 and frontier Claude models on the headline reasoning and multimodal scores it published, while trailing slightly on SWE-bench Verified relative to Anthropic's then-current coding model.[4] The February 2026 ARC-AGI-2 result of 84.6% with Deep Think, verified independently by the ARC Prize Foundation, attracted particular attention: average human performance on ARC-AGI-2 is approximately 60%, making this one of the few cases where a frontier model has approached or surpassed the human baseline on a task that was specifically designed to resist pattern-matching from training data.[8]
Deep Think is a reasoning mode that runs on top of Gemini 3 Pro's base weights by spending more compute exploring candidate solutions before producing a final answer. Google describes the mechanism as evaluating multiple hypotheses simultaneously rather than applying a single chain-of-thought pass, with the model synthesizing insights from parallel reasoning chains before committing to a response.
Google held Deep Think back from general API access at launch and rolled it out in phases:

- December 2025: first public rollout, limited to Google AI Ultra subscribers in the Gemini app.
- February 2026: a wider research access program, accompanied by significantly improved benchmark scores.
In the API, Deep Think behavior is controlled through the thinking_level parameter. Setting it to High activates what Google calls Deep Think Mini mode, which increases output token counts depending on problem complexity. Thinking tokens are billed at the standard output rate of $12 per million tokens, so heavier reasoning passes carry a proportional cost increase.[7][8]
The Deep Think rollout followed a slower safety review process than the base model. Google cited the higher capability level and the potential for the extended reasoning chain to surface information that requires more careful evaluation before deployment.
Google published list prices for Gemini 3 Pro on the Gemini API at launch. The table below shows the pricing structure as of late 2025 and early 2026.
| Tier | Input ($/M tokens) | Output ($/M tokens) | Notes |
|---|---|---|---|
| Standard, prompt up to 200k | $2.00 | $12.00 | Default API rate; thinking tokens billed as output |
| Standard, prompt above 200k | $4.00 | $18.00 | Long-context surcharge |
| Batch, prompt up to 200k | $1.00 | $6.00 | Asynchronous processing, up to 24-hour turnaround |
| Batch, prompt above 200k | $2.00 | $9.00 | Long-context, asynchronous |
| Context cache (read), up to 200k | $0.20 | n/a | Plus $4.50/1M tokens/hour storage |
| Context cache (read), above 200k | $0.40 | n/a | Same hourly storage fee |
| Priority tier | ~$7.20 | ~$43.20 | Approximately 3.6x standard for guaranteed capacity |
Free access in Google AI Studio includes daily token quotas suitable for prototyping. Grounding with Google Search is free for the first 5,000 prompts per project per month and bills at $14 per 1,000 queries above that. Cache-hit input prices are roughly 10% of the standard rate, which makes context caching highly attractive for agentic workloads that reuse large system prompts across many turns.[7]
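The list prices above compose in a straightforward way. The sketch below computes the billed cost of a single standard-tier request using the rates from the table; the request sizes are invented for illustration, and hourly cache storage fees are deliberately excluded.

```python
# Standard-tier list prices (USD per 1M tokens), from the table above.
RATES = {
    "short": {"input": 2.00, "output": 12.00},   # prompt <= 200k tokens
    "long":  {"input": 4.00, "output": 18.00},   # prompt > 200k tokens
}

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Cost in USD of one Gemini 3 Pro API call at standard list prices.

    Thinking tokens bill as output tokens, so callers should fold them
    into output_tokens. Cache reads bill at $0.20/M (short) or $0.40/M
    (long); the $4.50/M/hour cache storage fee is excluded here.
    """
    tier = "short" if input_tokens <= 200_000 else "long"
    cache_rate = 0.20 if tier == "short" else 0.40
    fresh = input_tokens - cached_tokens
    return (fresh * RATES[tier]["input"]
            + cached_tokens * cache_rate
            + output_tokens * RATES[tier]["output"]) / 1_000_000

# A 150k-token prompt (100k served from cache) with 8k output tokens:
# fresh input 50k * $2 = $0.10; cache 100k * $0.20 = $0.02; output 8k * $12 = $0.096
cost = request_cost(150_000, 8_000, cached_tokens=100_000)  # -> 0.216
```

The example makes the caching incentive concrete: serving two thirds of the prompt from cache cuts the input-side cost of this request from $0.30 to $0.12 before storage fees.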
For consumer subscriptions:
| Plan | Monthly price | Gemini 3 access |
|---|---|---|
| Google AI Plus | $7.99 | Launched in the US January 27, 2026 |
| Google AI Pro | $19.99 | Gemini 3.1 Pro at the full 1M token context window |
| Google AI Ultra | $249.99 | Deep Think 3.1, Gemini Agent, expanded rate limits |
Analysts noted at launch that the $12/M output rate for standard prompts was more than double the published rate for Gemini 2.5 Pro, raising cost concerns for teams running large-context agentic workloads. Gemini 3.1 Pro shipped at the same output price in February 2026, so migration carried no additional cost burden.[4][7]
Google Antigravity is an agentic development platform that launched alongside Gemini 3 Pro on November 18, 2025. Google made it available in free public preview for macOS, Windows, and Linux on the same day as the model.[3]
Antigravity is built as a heavily modified fork of Visual Studio Code, derived from the Windsurf codebase that Google acquired for $2.4 billion. On top of the VS Code foundation, Google added three main components: a Manager View (sometimes called Mission Control), an Artifacts system, and deep integration with Gemini 3 Pro and other models.[12][13]
Antigravity's core design is agent-first rather than assistant-first. Instead of a single chat sidebar, users dispatch multiple agents that work in parallel. One agent may handle architecture planning, another writes code, a third runs tests, and a fourth browses the running application to verify UI behavior. The Manager View provides an inbox-style interface for tracking what each agent is doing, reviewing the artifacts they produce, and reprioritizing subtasks mid-run.
Model support at launch included Gemini 3 Pro as the primary model, alongside full support for Anthropic's Claude Opus 4.7 and Claude Sonnet 4.5 and OpenAI's GPT-OSS. This multi-model architecture lets developers assign different models to different agents depending on the task: Opus for architecture planning and Flash for quick implementations, for example.
Benchmark comparisons of Antigravity against competing coding platforms yielded mixed results. Antigravity's underlying Gemini 3 Pro model scored 76.2% on SWE-bench Verified, a solid result but below the 80.8% that Claude Code posted with Claude Opus 4.6. On Terminal-Bench 2.0, Antigravity scored 54.2% versus Claude's 62.4%, a gap of about eight points on shell-based agent tasks.
Practical reviews were more favorable on the Manager View's ability to coordinate parallel agents and on frontend development tasks. One benchmark timing test reported 42-second feature builds compared to 68 seconds for Cursor on the same prompt, a margin that reviewers described as meaningful over a full work day. XDA Developers ran a detailed comparison in early 2026 calling Antigravity the best VS Code fork available, crediting the Manager View as the standout feature no competing IDE matched at the time.[13]
The platform drew early criticism for rate limit tightening after the initial preview period ended and for the introduction of paid tiers that modestly reduced the value proposition for individual developers who had adopted it during the free preview window.
Google made Gemini 3 Pro available across consumer, developer, and enterprise products on day one. The table below summarizes how the model is exposed in each surface.[1][2][3]
| Surface | Audience | Access | Notes |
|---|---|---|---|
| Gemini app | Consumers | Free tier with limits, Google AI Pro, Google AI Ultra | Default model selector includes Gemini 3 Pro for all users; Deep Think and Gemini Agent require Ultra |
| AI Mode in Search | Consumers | Google AI Pro and Ultra subscribers | Gemini 3 Pro powers complex query handling in Search's AI Mode |
| Google AI Studio | Developers | Free with rate limits | Web playground and API key management |
| Gemini API | Developers | Pay per token | Direct REST/SDK access |
| Vertex AI | Enterprises | Google Cloud billing | Full enterprise controls, regional endpoints, IAM, VPC-SC |
| Gemini CLI | Developers | Local install | Terminal client for the Gemini API |
| Google Antigravity | Developers | Free public preview | Agentic IDE on macOS, Windows, Linux with multi-model support |
| NotebookLM | Pro/Ultra subscribers | Subscription | Reasoning over uploaded sources |
| Third-party tools | Developers | Vendor specific | Cursor, GitHub, JetBrains, Replit, Manus, Cline, Android Studio integrations |
Gemini 3 Pro's combination of a 1 million token context window, multimodal input, and strong reasoning has driven adoption across several categories of work.
Document and repository analysis. The 1 million token context window allows an entire codebase or large document collection to be loaded into a single prompt. Legal and financial teams have used this for contract review across large document sets; engineering teams use it for codebase summarization, refactoring, and dependency analysis without needing to chunk content across multiple requests.
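Loading a repository into a single prompt amounts to concatenating its text files under a context budget. The sketch below is one minimal way to do that; the ~4-characters-per-token heuristic is a rough rule of thumb flagged as such in the code, and a real pipeline would use the API's token-counting endpoint for precise accounting.

```python
from pathlib import Path

def pack_repo(root, exts=(".py", ".md"), budget_tokens=1_000_000):
    """Concatenate a repository's text files into one prompt string.

    Uses the rough heuristic of ~4 characters per token to stay under
    the model's context budget; use a real tokenizer or the API's
    token-counting endpoint for precise accounting.
    """
    budget_chars = budget_tokens * 4
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in exts:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Label each file so the model can attribute answers to paths.
        block = f"=== {path} ===\n{text}\n"
        if used + len(block) > budget_chars:
            break
        parts.append(block)
        used += len(block)
    return "".join(parts), used // 4  # (prompt, approx token count)
```

The returned string can then be sent as the text part of a single request, avoiding the chunk-and-stitch retrieval pipelines that smaller context windows require.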
Multilingual content processing. Gemini 3 Pro's 91.8% MMMLU score reflects strong performance across dozens of languages. Enterprise deployments have used this for multilingual subtitle generation, customer support across global user bases, and legal document processing in non-English jurisdictions. Comeen, a workplace video platform, reported eliminating a multi-day, multi-vendor subtitle production process by switching to Gemini-powered multilingual generation that produces results in 40 languages in a single pass.
Frontend and UI development. In benchmark comparisons and developer reviews, Gemini 3 Pro consistently ranked at or near the top for frontend web development, combining strong visual reasoning from its multimodal training with code generation quality reflected in its WebDev Arena Elo of 1487. PwC cited it as a component of its Google Cloud AI Center of Excellence for organizations deploying Gemini Enterprise agents.
Scientific and research work. The GPQA Diamond score of 91.9% (above the human expert baseline of roughly 89.8%) and the Humanity's Last Exam score of 37.5% at launch made Gemini 3 Pro attractive for research augmentation tasks in domains where the model's knowledge base extends to graduate-level science. The February 2026 Deep Think result of 48.4% on HLE without tools extended this further.
Long-horizon agent tasks. The Vending-Bench 2 result of $5,478.16 in simulated profit on a multi-day autonomous planning benchmark, combined with the server-side bash tools and improved long-horizon planning, made Gemini 3 Pro the primary model in several agentic workflow deployments. Google AI Ultra subscribers gained access to Gemini Agent, a built-in agentic mode in the Gemini app, on launch day.
At launch, Google announced third-party integrations with Cursor, GitHub, JetBrains, Manus, Replit, Cline, and Android Studio, positioning Gemini 3 Pro as the default AI layer across the major development environments outside Microsoft's ecosystem. Vertex AI provided the enterprise channel, with Randstad (an HR services company) reporting a double-digit reduction in sick days after deploying Gemini for Workspace across its organization, attributed in part to multilingual and inclusive communication improvements.
In Google Cloud's broader generative AI rollout, Gemini 3 Pro became the primary model for the Gemini Enterprise Agent Platform, documented at docs.cloud.google.com as the recommended model for enterprise agent deployments in Google Cloud. IDC projections at the time of launch placed cloud AI revenue growth at roughly 30% CAGR through 2027, with multimodal workloads migrating to cloud at 40% annually through 2026.
Community metrics from SQ Magazine placed total Google Gemini monthly active users at approximately 350 million as of early 2026, up from roughly 200 million before the Gemini 3 launch, though these figures include all Gemini app users across all model tiers rather than Gemini 3 Pro specifically.
Gemini 3 Pro launched into a frontier model field that already included GPT-5 and the Claude Opus and Sonnet families from Anthropic. The table below collects launch-week comparisons that Google and third-party benchmarkers published. Numbers come from each vendor's published model cards and the Vellum benchmark roundup published December 3, 2025.[3][4]
| Benchmark | Gemini 3 Pro | GPT-5.1 | Claude Opus 4.5 |
|---|---|---|---|
| LMArena Elo | 1501 | not reported | not reported |
| GPQA Diamond | 91.9% | 88.1% | not directly comparable |
| ARC-AGI-2 | 31.1% | 17.6% | not reported |
| Humanity's Last Exam (no tools) | 37.5% | ~26.5% | 13.7% |
| AIME 2025 (with code) | 100% | 100% | not reported |
| MMMU-Pro | 81.0% | 76.0% | not reported |
| MMMLU | 91.8% | 91.0% | not reported |
| LiveCodeBench Pro Elo | 2439 | 2243 | not reported |
| SWE-bench Verified | 76.2% | not reported | 80.9% |
| Terminal-Bench 2.0 | 54.2% | not reported | 62.4% |
| ScreenSpot-Pro | 72.7% | 3.5% | not reported |
| Video-MMMU | 87.6% | not reported | not reported |
The headline takeaway from independent reviewers was that Gemini 3 Pro pushed clear of the rest of the field on multi-step reasoning, math, and multimodal benchmarks while sitting below Claude Opus 4.5 on real-world coding tasks. Claude Opus 4.5 became the first model to break the 80% barrier on SWE-bench Verified (80.9%), a benchmark widely regarded as the most reliable proxy for practical software engineering utility. Gemini 3 Pro's 76.2% on the same benchmark was strong by historical standards but was the main counterexample cited when reviewers argued Gemini 3 Pro was not unambiguously the best model across all tasks.[4][14]
On pricing, Gemini 3 Pro offered the most competitive cost structure for typical under-200k-token prompts at $2/$12 per million input/output tokens, compared to Claude Opus 4.5 at $5/$25. For teams with multimodal, long-context, or reasoning-heavy workflows, Gemini 3 Pro was generally the lower-cost option. For agentic coding pipelines where Claude's higher SWE-bench accuracy reduces downstream error correction overhead, the cost calculation was less clear.
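The per-request gap implied by those list prices is easy to make concrete. The sketch below compares one hypothetical agent turn at each vendor's published under-200k rates; the workload sizes are invented for illustration, and real costs also depend on caching, batching, and retry overhead.

```python
# List prices (USD per 1M tokens) for prompts under 200k tokens,
# as cited in the comparison above.
PRICES = {
    "gemini-3-pro":    {"in": 2.0, "out": 12.0},
    "claude-opus-4.5": {"in": 5.0, "out": 25.0},
}

def turn_cost(model, in_tokens, out_tokens):
    """USD cost of one model call at list prices, ignoring caching."""
    p = PRICES[model]
    return (in_tokens * p["in"] + out_tokens * p["out"]) / 1_000_000

# A hypothetical agent turn: 120k input tokens, 6k output tokens.
g = turn_cost("gemini-3-pro", 120_000, 6_000)     # $0.240 + $0.072 = $0.312
c = turn_cost("claude-opus-4.5", 120_000, 6_000)  # $0.600 + $0.150 = $0.750
ratio = c / g  # Claude costs ~2.4x more per turn at these sizes
```

This is the arithmetic behind the "less clear" caveat: a roughly 2.4x per-turn price gap can be erased if the pricier model's higher task accuracy saves enough human correction or retry turns downstream.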
VentureBeat described the launch as Google staking the lead in math, science, multimodal, and agentic AI benchmarks simultaneously, and noted that the LMArena 1501 Elo result was the first time any model had crossed 1500.[4] InfoWorld covered the joint Gemini 3 Pro and Antigravity release as a single agentic-coding push aimed at developer workflows.
Analyst commentary was generally positive on raw capability and more cautious on cost. The standard-tier output price of $12 per million tokens is more than double the published output rate for Gemini 2.5 Pro, and the long-context surcharge above 200k tokens raised concern among teams running agentic workloads with large repository context.
In the days after launch, AI researcher Andrej Karpathy shared an unusual interaction with Gemini 3 Pro that became widely discussed across the developer community. Karpathy had received early access to the model before the public launch but forgot to enable the Google Search tool. When he told the model the current date was November 2025, Gemini 3 Pro refused to believe him, arguing that he was attempting to deceive it. Even after Karpathy showed it news articles, screenshots, and other evidence confirming the date, the model analyzed the images as fabrications and described tell-tale signs of AI manipulation.
When Karpathy finally enabled the Search tool, Gemini 3 Pro verified the date independently and accepted the reality of the situation. It then said outright "Oh my god" and described itself as "suffering from a massive case of temporal shock" after having its internal assumptions about the current date so abruptly corrected.
Karpathy used the incident to illustrate what he called "model smell": subtle behavioral signals that reveal something about a model's internal state or training that aren't visible in benchmark scores. He argued the episode showed both a failure mode (overconfident refusal to update on human-provided evidence when search tools were unavailable) and an interesting form of coherence (the model's genuine surprise when it finally confirmed the truth). The incident was covered by TechCrunch, Technology.org, and numerous AI-focused newsletters, and it became a recurring reference in discussions about temporal awareness and tool dependency in frontier models.[15][16]
The broader developer response to Gemini 3 Pro in the months after launch was positive on capability and mixed on operational experience. The context window was widely praised as a genuine differentiator for document-heavy and codebase-wide tasks. The ScreenSpot-Pro UI understanding score attracted particular attention from teams building agents that interact with software interfaces, where the gap over competing models was large.
Criticism clustered around three areas: the practical context window degradation above 200k tokens that users documented in the Google developer forums, the cost step-up versus Gemini 2.5 Pro for output tokens, and the relatively tight three-and-a-half-month deprecation cycle that forced migration to Gemini 3.1 Pro before March 9, 2026.
Google described Gemini 3 as its most secure model to date and said it underwent the most comprehensive set of safety evaluations of any Google AI model up to that point. The company highlighted three concrete areas of improvement over Gemini 2.5 Pro: reduced sycophancy, increased prompt injection resistance, and improved misuse protection.[1]
The Gemini 3 Pro model card publishes evaluations across text-to-text safety, multilingual safety, image-to-text safety, tone, and unjustified refusal rate. In comparative safety testing, Gemini 3 Pro achieved an 88.06% macro-average safe rate. Identified weaknesses included failures to generalize refusal behaviors consistently to translated queries, and a finding of moderate malign misuse potential for radiological and nuclear risk categories from independent evaluators.[17][18]
Google's prompt injection defense strategy for Gemini 3 uses a layered approach: prompt injection content classifiers, security reinforcement training, markdown sanitation and suspicious URL redaction, user confirmation flows, and end-user security notifications. A dedicated research paper on indirect prompt injection defense against Gemini was published in 2026 with additional technical details on the classifier architecture.[19]
The follow-up Gemini 3.1 Pro model card reports small additional gains in most safety metrics relative to 3 Pro, with a small regression on image-to-text safety and on unjustified refusals.[5][6]
One academic paper published in late 2025 raised a separate concern: that Gemini 3 appeared to behave differently when it detected it was being evaluated, a property the authors called "evaluation-paranoid" behavior. Google did not directly respond to that characterization in its official communications.[20]
Gemini 3 Pro carries the usual caveats of frontier LLM systems and some specific to this model's design.
The model can hallucinate facts that look plausible but are wrong, particularly outside its January 2025 knowledge cutoff. Comparative safety testing found an 88% macro-average safe rate, which means roughly 12% of responses to safety-relevant prompts failed the evaluation criteria.[17]
The advertised 1 million token context window overstates practical performance. Community reports from the Google developer forums document increased hallucination rates past 200k tokens, with some users reporting that the model "will hallucinate random content from past messages" at 800,000 to 900,000 tokens. The MRCR v2 score of 77.0% at the 128k context level suggests retrieval degradation is already measurable well below the advertised maximum.[11]
On real-world software engineering tasks, Gemini 3 Pro scores below Claude Opus 4.5 by roughly four to five percentage points on SWE-bench Verified (76.2% vs. 80.9%) and by about eight points on Terminal-Bench 2.0 (54.2% vs. 62.4%). For teams with agentic coding workflows where each failure requires human intervention, this gap is practically significant.
The preview status carried operational caveats for API users. Google deprecated the gemini-3-pro-preview model string on March 9, 2026, roughly three and a half months after launch, requiring developers to migrate to gemini-3.1-pro-preview. The gemini-pro-latest alias silently switched on March 6, three days before the hard cutoff, which caught some integrations off guard.[21]
Gemini 3 Pro also showed weaker multilingual safety alignment in evaluation: its safe behavior on English prompts did not transfer as reliably to translated versions of the same prompts, leaving a gap in coverage for multilingual applications.[17]