Gemini 3 is the third major generation of the Gemini family of multimodal models developed by Google DeepMind. The family debuted on November 18, 2025 with Gemini 3 Pro as the flagship preview, and has since expanded into a tiered lineup that mirrors the structure introduced with Gemini 1.5 and refined through Gemini 2.5. As of May 2026 the family includes Gemini 3.1 Pro, Gemini 3 Flash, Gemini 3.1 Flash-Lite, the Gemini 3 Deep Think reasoning mode, and a pair of native image variants marketed as Nano Banana Pro and Nano Banana 2. All Gemini 3 models share a 1 million token input context window, a 64,000 token output limit, and a January 2025 knowledge cutoff.[1][2][6]
The Gemini 3 generation followed Gemini 2.5 Pro by roughly seven months and was positioned by Google as the company's largest single jump in reasoning capability since the original Gemini launch in December 2023. Gemini 3 Pro became the first publicly accessible large language model to cross 1500 Elo on LMArena and posted state-of-the-art scores on Humanity's Last Exam, GPQA Diamond, and ARC-AGI-2 at launch. The family's second wave, anchored by Gemini 3.1 Pro on February 19, 2026, more than doubled the ARC-AGI-2 score over the November preview and became the new flagship after Google deprecated the original gemini-3-pro-preview model string on March 9, 2026.[1][3][4][5]
Gemini 3 was the first Gemini generation Google made available simultaneously across consumer products (the Gemini app and AI Mode in Search), developer tools (Google AI Studio, Gemini API, Gemini CLI), enterprise platforms (Vertex AI, Gemini Enterprise, Gemini Code Assist), and the new agentic Google Antigravity IDE on launch day. Gemini 3 Pro replaced Gemini 2.5 Pro as the default model in the Gemini app immediately, and Gemini 3 Flash took over the default slot when it shipped on December 17, 2025.[1][2][7]
The Gemini 3 family is organized along the same three capability tiers Google has used since Gemini 1.5: a flagship Pro line for the hardest reasoning, coding, and long-context tasks; a Flash line that targets production cost and latency; and a Flash-Lite line for very high volume work where price per token matters more than headline benchmark scores. A reasoning-mode variant called Deep Think layers on top of the Pro weights, and image generation is split into two dedicated Nano Banana models that share Gemini 3's multimodal backbone but produce pixels rather than tokens.
What distinguishes Gemini 3 from earlier generations at the family level is consistency. Every member of the family supports the same 1 million token context window, the same January 2025 knowledge cutoff, the same Thinking Level reasoning control, the same set of built-in tools (Google Search grounding, Google Maps, code execution, URL context, file search, computer use), and the same multimodal input surface across text, images, video, audio, code, and PDFs up to 1,000 pages. In Gemini 2.5 these features were spread unevenly across Pro, Flash, and Flash-Lite. Gemini 3 collapsed that variance, which simplifies tier selection: developers choose by latency and cost rather than by capability gaps.[2][6][7]
The second generational shift involves how reasoning is exposed. In Gemini 2.5 Pro, thinking was opt-in through a token-budget parameter that could be set to zero. Gemini 3 makes thinking the default for every variant and replaces the budget with a categorical Thinking Level (minimal, low, medium, high) that the model uses to allocate internal reasoning compute. The model emits encrypted thought signatures that are preserved across multi-turn conversations, which keeps function calling and image editing workflows internally consistent without forcing developers to manually pass reasoning state.[1][2]
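The categorical control can be illustrated with a small sketch that assembles a request payload locally, without calling the API. The field name `thinking_level`, the payload shape, and the model string are assumptions patterned on the description above, not verified against the live API reference.

```python
# Sketch of a generateContent-style payload using the categorical Thinking
# Level described above. Field names and the model string are illustrative
# assumptions; the payload is built and validated locally, never sent.

THINKING_LEVELS = ("minimal", "low", "medium", "high")

def build_request(prompt: str, thinking_level: str = "high") -> dict:
    """Assemble a hypothetical request carrying a thinking level."""
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"thinking_level must be one of {THINKING_LEVELS}")
    return {
        "model": "gemini-3.1-pro",  # placeholder model string
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generation_config": {"thinking_level": thinking_level},
    }

req = build_request("Prove that sqrt(2) is irrational.", "medium")
print(req["generation_config"]["thinking_level"])  # → medium
```

Because the level is categorical rather than a token budget, client code selects from a closed set instead of tuning a numeric parameter per request.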
Google has not published a detailed technical paper for Gemini 3, but the developer documentation and the Gemini 3 Pro model card confirm a sparse mixture-of-experts (MoE) transformer architecture trained on Google TPUs through JAX and the ML Pathways stack. Total parameter counts have not been disclosed.[2][9]
Gemini 2.5, released March 25, 2025, was the line that first put the Gemini family in serious contention with OpenAI's o-series reasoning models and Anthropic's Claude Opus 4 family. Gemini 2.5 Pro topped LMArena at launch by close to 40 Elo points, scored 18.8% on Humanity's Last Exam, and led the WebDev Arena. Its Deep Think experimental reasoning mode took gold-medal level performance at the 2025 International Mathematical Olympiad and the 2025 ICPC World Finals, the competition wins that directly informed the reasoning architecture carried into Gemini 3.[1][13][14]
The table below shows the headline shifts between the two generations, drawing on benchmark numbers Google reported at each launch. Gemini 3 Pro's gains were largest on the hardest reasoning sets: HLE roughly doubled, ARC-AGI-2 rose more than sixfold, and SWE-bench Verified climbed by about 12 percentage points. Multimodal and long-context performance also improved, though by smaller margins.
| Benchmark | Gemini 2.5 Pro | Gemini 3 Pro | Gemini 3 Deep Think (Feb 2026) |
|---|---|---|---|
| Humanity's Last Exam (no tools) | 18.8% | 37.5% | 48.4% |
| GPQA Diamond | 84.0% | 91.9% | 93.8% |
| ARC-AGI-2 | 4.9% | 31.1% | 84.6% |
| SWE-bench Verified | 63.8% | 76.2% | n/a |
| LMArena Elo | ~1452 | 1501 | n/a |
| Video-MMMU | n/a | 87.6% | n/a |
Beyond raw benchmarks, the family-level shifts that mattered most to early adopters were practical. Tool combinations that previously required separate API calls on Gemini 2.5 (function calling alongside Google Search, code execution alongside file search, computer use alongside structured outputs) became composable inside a single request on Gemini 3. The 1 million token context window was extended from Pro down to Flash and Flash-Lite, whereas in 2.5 only Pro had the full window. Pricing also shifted: Gemini 3 Pro standard output rose to $12 per million tokens, more than double Gemini 2.5 Pro's published rate, while Gemini 3 Flash undercut Gemini 2.5 Flash on input pricing and Gemini 3.1 Flash-Lite undercut Gemini 2.5 Flash-Lite on every dimension.[1][7][12]
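As an illustration of the composability change, the sketch below assembles one hypothetical request carrying several tools at once. The tool-object shapes and field names are illustrative assumptions rather than the verified API schema, and the payload is only built and inspected locally.

```python
# Sketch of a single request combining tools that required separate calls
# on Gemini 2.5. Tool-object shapes here are illustrative assumptions,
# not the verified API schema; nothing is sent over the network.

def build_agent_request(prompt: str) -> dict:
    """Assemble one hypothetical request carrying multiple tools."""
    return {
        "model": "gemini-3-flash",  # placeholder model string
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [
            {"google_search": {}},   # Search grounding
            {"code_execution": {}},  # sandboxed code runs
            {"file_search": {}},     # retrieval over uploaded documents
        ],
    }

req = build_agent_request("Audit this filing for revenue restatements.")
print(len(req["tools"]))  # → 3
```

On Gemini 2.5, orchestration code had to issue one request per tool pairing and stitch the results together; the single-request form removes that client-side plumbing.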
Gemini 3 Pro was the flagship preview at the November 18, 2025 launch and the public face of the new generation. It targets complex reasoning, agentic coding, long-context document understanding, and advanced multimodal analysis. Gemini 3 Pro accepts text, images, audio, video, code, and PDFs up to 1,000 pages, returns text only, and exposes a 1 million token input context with a 64,000 token output limit. Google deprecated the underlying gemini-3-pro-preview model string on March 9, 2026, replacing it with Gemini 3.1 Pro at the same pricing tier.[5][7]
For full coverage of capabilities, technical specifications, the Antigravity launch, the Andrej Karpathy temporal-shock incident, and detailed benchmark commentary, see the dedicated Gemini 3 Pro article.
Gemini 3.1 Pro shipped on February 19, 2026 as a successor preview release in the same flagship slot. Google's model card frames it as built on the Gemini 3 Pro base, with what the company describes as a step up in core reasoning and updated tool-use behavior. The headline benchmark gain was on ARC-AGI-2, where Gemini 3.1 Pro scored 77.1%, more than double Gemini 3 Pro's 31.1%. The model retains the 1 million token context window, the 64,000 token output limit, and the same input/output modality surface as Gemini 3 Pro, and inherits the parent model's training data and infrastructure (Google TPUs, JAX, ML Pathways).[3][4]
Gemini 3.1 Pro pricing matches Gemini 3 Pro at $2.00 per million input tokens and $12.00 per million output tokens for prompts up to 200,000 tokens (rising to $4.00 and $18.00 for longer prompts), so the migration imposed no additional list-price cost on developers. The deprecation timing was unusually tight: the gemini-pro-latest alias silently switched to 3.1 Pro on March 6, 2026, three days before the hard cutoff for 3 Pro, which caught some early integrations off guard.[5][12]
Gemini 3 Flash launched December 17, 2025 and replaced Gemini 2.5 Flash as the default model in the Gemini app the same day. Google described it as "frontier intelligence at a fraction of the cost," delivering Pro-level capability at roughly one-quarter of Pro's standard pricing and three times the throughput. Gemini 3 Flash inherits Gemini 3 Pro's 1 million token context window, 64,000 token output limit, multimodal input set, and full tool integration including Google Search grounding, code execution, URL context, computer use, and function calling.[6][8]
The Flash benchmark profile includes 90.4% on GPQA Diamond, 33.7% on Humanity's Last Exam without tools, and 78% on SWE-bench Verified. The SWE-bench score is the data point that surprised many reviewers, because it exceeded Gemini 3 Pro's 76.2% on the same benchmark at launch. For developers building agentic coding pipelines or high-volume multimodal classification systems, Gemini 3 Flash became the recommended entry point in the family.[6][8]
Pricing on the standard tier is $0.50 per million input tokens for text, image, and video inputs, $1.00 per million for audio inputs, and $3.00 per million output tokens. Context caching is available with up to 90% cost reduction for repeated long-context prompts, and the Batch API provides a 50% discount for asynchronous processing.[8][12]
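The caching and batch discounts compound with the base rates. The back-of-the-envelope sketch below uses the Flash figures quoted above, approximating cached input at 10% of the standard input rate (consistent with the family-wide "up to 90% reduction" description); it is illustrative only, not a substitute for the live pricing page.

```python
# Back-of-the-envelope cost model for Gemini 3 Flash on the Standard tier:
# $0.50/M text input, $3.00/M output, cached input approximated at 10% of
# the input rate, and a 50% Batch API discount. Illustrative only.

FLASH_TEXT_IN = 0.50 / 1_000_000   # $ per text/image/video input token
FLASH_OUT = 3.00 / 1_000_000       # $ per output token

def flash_cost(fresh_in: int, cached_in: int, out: int, batch: bool = False) -> float:
    """Dollar cost of one request; cached tokens billed at ~10% of input rate."""
    cost = (fresh_in * FLASH_TEXT_IN
            + cached_in * FLASH_TEXT_IN * 0.10
            + out * FLASH_OUT)
    return cost * 0.5 if batch else cost

# A 500k-token cached system prompt, 10k fresh input tokens, 2k output:
print(round(flash_cost(10_000, 500_000, 2_000), 4))  # → 0.036
```

The same request with the cached 500k tokens billed at the full input rate would cost $0.261, which is why caching dominates the economics of agents that reuse a large system prompt across many turns.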
Gemini 3.1 Flash-Lite arrived on March 3, 2026 as the cost-optimized member of the family, targeting high-volume, latency-sensitive workloads such as translation, classification, content moderation, UI generation, and large-scale data extraction. It shares the 1 million token context window of the rest of the family and supports the same multimodal input set. Google reports a 2.5x faster time-to-first-token than Gemini 2.5 Flash and a 45% increase in output throughput while maintaining comparable or superior quality on the benchmarks Google tracks for Flash-Lite.[10]
Flash-Lite scored 86.9% on GPQA Diamond, 76.8% on MMMU Pro, and posted an Elo score of 1432 on the LMArena leaderboard. The standard pricing is $0.25 per million input tokens for text, image, and video, $0.50 per million for audio, and $1.50 per million output tokens, which makes Flash-Lite roughly one-eighth the standard input cost of Gemini 3 Pro and half the input cost of Gemini 3 Flash. The model became generally available on May 7, 2026 after roughly two months in preview.[10][12]
Deep Think is a reasoning mode that runs on Gemini 3 Pro's base weights by spending additional inference compute exploring multiple candidate solution paths before producing a final answer, rather than following a single chain-of-thought pass. It is exposed in the API through the Thinking Level parameter at the High setting, billed at the standard output token rate, and gated behind separate access tiers for safety reasons rather than packaged as a standalone API model.
Deep Think rolled out in three phases. Initial Ultra-tier access opened in December 2025 with the headline launch numbers (41.0% HLE, 93.8% GPQA Diamond, 45.1% ARC-AGI-2 with code execution). On February 12, 2026, Google opened a research access program that extended Deep Think to API developers and select enterprises and published a second round of results showing further gains: 48.4% on HLE without tools and 84.6% on ARC-AGI-2, the latter independently verified by the ARC Prize Foundation. Average human performance on ARC-AGI-2 is approximately 60%, which made Deep Think one of the few frontier models to approach or surpass the human baseline on a benchmark designed to resist pattern matching from training data.[11]
The staged rollout was attributed by Google to additional safety evaluation. The same multi-path reasoning that produces strong scores on hard mathematical and scientific benchmarks can produce more creative or unexpected outputs on open-ended tasks, and Google said it required additional alignment work before broader release.[1][11]
Gemini 3 Pro Image, marketed as Nano Banana Pro, shipped on November 20, 2025 as the family's first dedicated image generation variant. It supports image generation at 2K and 4K resolution, character consistency across up to five subjects, what Google describes as state-of-the-art text rendering inside images, image editing and localization, and grounding through Google Search for images that need real-world information. All Nano Banana Pro outputs carry SynthID digital watermarks. The model is available to consumers through the Gemini app and to developers through the Gemini API and Vertex AI under the model ID gemini-3-pro-image-preview.[15]
Nano Banana 2, released on February 26, 2026 under the API name Gemini 3.1 Flash Image, is a faster, lower-cost image variant aimed at high-volume generation and editing workflows. Google's developer guidance is to use Nano Banana 2 for cost and latency, and Nano Banana Pro for quality at higher cost. Both image variants share the Gemini 3 multimodal backbone but produce images rather than text outputs.[15]
Pricing for Nano Banana Pro on the standard tier runs at $2.00 per million text or image input tokens, $12.00 per million for text or thinking output, and $120.00 per million tokens for image output. The Batch API offers per-image rates of approximately $0.067 for 1K and 2K images and $0.12 for 4K images.[12]
In addition to the main lineup, the Gemini API exposes several specialized Flash variants released as 3.1-line previews in early 2026. Gemini 3.1 Flash Live is a low-latency Live API model for real-time dialogue and voice-first applications. Gemini 3.1 Flash TTS provides low-latency speech generation for text-to-speech workloads. These are listed in the Gemini API release notes as gemini-3-1-flash-live and gemini-3-1-flash-tts and are positioned as audio-modality companions to the text and image members of the family.[16]
The table below summarizes the main Gemini 3 family variants as they stand in May 2026.
| Variant | Released | Status | Context | Output | Standard input ($/M) | Standard output ($/M) | Headline benchmark |
|---|---|---|---|---|---|---|---|
| Gemini 3 Pro | Nov 18, 2025 | Deprecated Mar 9, 2026 | 1M tokens | 64K | $2.00 / $4.00 | $12.00 / $18.00 | LMArena 1501 Elo |
| Gemini 3.1 Pro | Feb 19, 2026 | Preview (current flagship) | 1M tokens | 64K | $2.00 / $4.00 | $12.00 / $18.00 | ARC-AGI-2 77.1% |
| Gemini 3 Flash | Dec 17, 2025 | GA | 1M tokens | 64K | $0.50 (text/image/video), $1.00 (audio) | $3.00 | SWE-bench Verified 78% |
| Gemini 3.1 Flash-Lite | Mar 3, 2026 (preview); May 7, 2026 (GA) | GA | 1M tokens | 64K | $0.25 (text/image/video), $0.50 (audio) | $1.50 | GPQA Diamond 86.9% |
| Gemini 3 Deep Think | Dec 2025 (Ultra); Feb 12, 2026 (research access) | Mode on Pro weights | 1M tokens | 64K | matches Pro | matches Pro | HLE 48.4%, ARC-AGI-2 84.6% |
| Nano Banana Pro (Gemini 3 Pro Image) | Nov 20, 2025 | Preview | 1M tokens | image up to 4K | $2.00 (text/image input) | $12 (text/thinking), $120 (images) | 2K/4K image generation |
| Nano Banana 2 (Gemini 3.1 Flash Image) | Feb 26, 2026 | Preview | 1M tokens | image | n/a | n/a | High-volume image generation |
Standard input pricing for Pro and 3.1 Pro is shown as a pair because Google charges $2.00 for prompts up to 200,000 tokens and $4.00 for prompts above that threshold. Output pricing follows the same tiering ($12.00 for short prompts, $18.00 for long prompts).
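The tiering described above amounts to a step function on prompt size. A minimal sketch using the list prices quoted in this article:

```python
# Step-function pricing for Gemini 3 / 3.1 Pro on the Standard tier:
# prompts at or under 200,000 tokens bill at $2.00/M in, $12.00/M out;
# longer prompts bill at $4.00/M and $18.00/M for the whole request.
# Rates are the list prices quoted in this article; illustrative only.

LONG_CONTEXT_THRESHOLD = 200_000

def pro_request_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one Pro-tier request at the quoted list prices."""
    if prompt_tokens <= LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 2.00, 12.00
    else:
        in_rate, out_rate = 4.00, 18.00
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(pro_request_cost(150_000, 8_000))  # short prompt: 0.30 + 0.096 = 0.396
print(pro_request_cost(600_000, 8_000))  # long prompt:  2.40 + 0.144 = 2.544
```

Note that crossing the threshold raises the rate on every token in the request, not just the tokens beyond 200,000, so a prompt slightly over the threshold can cost materially more than one slightly under it.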
The Gemini 3 family rolled out in two waves: an initial 3.x preview cluster between November and December 2025 that established the architecture and the Pro and Flash flagships, and a 3.1 refresh wave between February and May 2026 that improved benchmark scores, deprecated the original Pro preview, and added the cost-optimized Flash-Lite variant.
| Date | Event |
|---|---|
| November 18, 2025 | Gemini 3 Pro launches in preview across the Gemini app, AI Studio, Gemini API, Vertex AI, and Gemini CLI; Google Antigravity launches as agent-first IDE; AI Mode in Search rolls Gemini 3 Pro to Pro and Ultra subscribers |
| November 20, 2025 | Nano Banana Pro (Gemini 3 Pro Image) released for consumer and developer image generation |
| December 2025 | Gemini 3 Deep Think becomes available to Google AI Ultra subscribers in the Gemini app |
| December 17, 2025 | Gemini 3 Flash launches and replaces Gemini 2.5 Flash as the default model in the Gemini app |
| January 27, 2026 | Google AI Plus consumer tier launches in the United States at $7.99 per month with Gemini 3 access |
| February 12, 2026 | Gemini 3 Deep Think research access program opens to API developers; updated Deep Think benchmarks published (HLE 48.4%, ARC-AGI-2 84.6%) |
| February 19, 2026 | Gemini 3.1 Pro launches as the new flagship preview |
| February 26, 2026 | Nano Banana 2 (Gemini 3.1 Flash Image) ships as a faster, cheaper image variant |
| March 3, 2026 | Gemini 3.1 Flash-Lite ships in preview |
| March 6, 2026 | The gemini-pro-latest alias silently switches to Gemini 3.1 Pro |
| March 9, 2026 | The original gemini-3-pro-preview model string is deprecated and shut down |
| May 7, 2026 | Gemini 3.1 Flash-Lite reaches general availability |
Gemini 3 retains the tiered pricing structure Google introduced with Gemini 2.5: a Standard rate for typical synchronous workloads, a Batch rate at roughly 50% of Standard for asynchronous processing with up to 24-hour turnaround, and a Priority tier at 1.8 times Standard for guaranteed capacity on production deployments. Context caching (read) is billed at roughly 10% of the standard input rate, plus an hourly storage fee of $4.50 per million tokens for cached content. Google Search grounding is free for the first 5,000 prompts per project per month across the Gemini 3 family, with additional queries billed at $14 per 1,000.
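The grounding allowance translates into a simple piecewise schedule. A sketch using the figures quoted above:

```python
# Monthly Google Search grounding cost per project, per the allowance
# described above: first 5,000 grounded prompts free, then $14 per 1,000.
# Figures are the quoted list prices; illustrative only.

FREE_GROUNDED_PROMPTS = 5_000
RATE_PER_1000 = 14.00

def grounding_cost(grounded_prompts: int) -> float:
    """Dollar cost of grounded prompts in one project-month."""
    billable = max(0, grounded_prompts - FREE_GROUNDED_PROMPTS)
    return billable * RATE_PER_1000 / 1_000

print(grounding_cost(4_000))   # within the free allowance → 0.0
print(grounding_cost(12_000))  # 7,000 billable prompts → 98.0
```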
The table below collects the standard tier per-million-token pricing for the main family members in May 2026. Prompts above 200,000 tokens carry a long-context surcharge for Pro variants only.
| Variant | Standard input | Standard output | Batch input | Batch output | Priority input | Priority output |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (<=200k) | $2.00 | $12.00 | $1.00 | $6.00 | $3.60 | $21.60 |
| Gemini 3.1 Pro (>200k) | $4.00 | $18.00 | $2.00 | $9.00 | $7.20 | $32.40 |
| Gemini 3 Flash (text) | $0.50 | $3.00 | $0.25 | $1.50 | $0.90 | $5.40 |
| Gemini 3 Flash (audio) | $1.00 | $3.00 | $0.50 | $1.50 | $1.80 | $5.40 |
| Gemini 3.1 Flash-Lite (text) | $0.25 | $1.50 | $0.125 | $0.75 | $0.45 | $2.70 |
| Gemini 3.1 Flash-Lite (audio) | $0.50 | $1.50 | $0.25 | $0.75 | $0.90 | $2.70 |
| Nano Banana Pro | $2.00 | $12.00 (text), $120.00 (image) | $1.00 (text) | $6.00 (text), per-image rates for batch | n/a | n/a |
Flash-Lite at $0.25 per million input tokens for text sits at one-eighth of Gemini 3 Pro's Standard-tier input price and just above the cheapest model in the wider Gemini line, which makes it the recommended choice for translation pipelines, content moderation, large-scale extraction, and similar workloads where intelligence per dollar is the binding constraint. Gemini 3 Flash at one-quarter of Pro pricing is the family's sweet spot for production agent and reasoning workloads. Pro and 3.1 Pro are the family's premium options for the hardest reasoning, long-context analysis, or coding tasks where the additional headline benchmark capability justifies the cost.
For consumer-facing access, Google offers three subscription tiers that bundle Gemini 3 family models with rate-limit and feature differences:
| Plan | Monthly price (US) | Gemini 3 family access |
|---|---|---|
| Google AI Plus | $7.99 | Gemini 3.1 Pro at lower limits; Gemini 3 Flash; Nano Banana Pro |
| Google AI Pro | $19.99 | Gemini 3.1 Pro at full 1 million token context; expanded NotebookLM access |
| Google AI Ultra | $249.99 | Gemini 3 Deep Think; Gemini Agent; highest rate limits across all family members |
Google made Gemini 3 available across consumer, developer, and enterprise surfaces from launch day. Every variant in the family is exposed through the same set of products, with feature gating handled per-tier rather than per-surface.
| Surface | Audience | Access | Notes |
|---|---|---|---|
| Gemini app | Consumers | Free, Plus, Pro, Ultra | Default model rotates with each Flash release; Deep Think and Gemini Agent require Ultra |
| AI Mode in Search | Consumers | Pro and Ultra subscribers | Gemini 3.1 Pro powers complex multi-step research queries |
| Google AI Studio | Developers | Free with rate limits | Web playground and API key management |
| Gemini API | Developers | Pay per token | Direct REST and SDK access to all family members |
| Vertex AI | Enterprises | Google Cloud billing | Regional endpoints, IAM, VPC-SC, Gemini Enterprise Agent Platform |
| Gemini CLI | Developers | Local install | Terminal client for the Gemini API |
| Google Antigravity | Developers | Free public preview | Agentic IDE on macOS, Windows, Linux with multi-model support |
| NotebookLM | Pro and Ultra subscribers | Subscription | Reasoning over uploaded sources with Gemini 3.1 Pro |
| Gemini Code Assist | Developers, enterprises | Free tier and paid | IDE assistant for VS Code, JetBrains, Android Studio |
| Third-party tools | Developers | Vendor specific | Cursor, GitHub Copilot, JetBrains AI, Replit, Manus, Cline |
The day-one availability across all of these surfaces simultaneously was a deliberate departure from the staged rollouts of earlier Gemini generations, where Pro variants typically appeared in the API days or weeks before reaching consumer products. Constellation Research and other industry analysts noted the coordinated launch as a maturation of Google's deployment strategy, framing Gemini 3 as a product ecosystem rather than a standalone model release.[17]
Gemini 3 inherited an unusually large installed base. The Gemini app reported 650 million monthly active users at the time of the November 18, 2025 launch, and Google's AI Overviews in Search served roughly 2 billion monthly users over the same period. Google said 70% of its Google Cloud customers were using AI in some form and reported 13 million developers building with its generative models. These figures cover the Gemini line as a whole rather than Gemini 3 specifically, but they describe the surface area into which Gemini 3 was deployed.[1]
By early 2026, community estimates from SQ Magazine placed total Gemini app monthly active users at approximately 350 million, up from roughly 200 million before the Gemini 3 launch. Those estimates sit well below Google's own launch-time figure of 650 million; they track app users specifically rather than all surfaces (Search, Workspace, third-party integrations) and exclude users whose only Gemini exposure runs through AI Overviews or AI Mode.
Enterprise adoption was anchored around named launch partners. Box, Shopify, Wayfair, Rakuten, and Thomson Reuters piloted Gemini 3 Pro for tasks ranging from content management and presentation generation to legal reasoning, multi-lingual transcription, and supply chain planning. Thomson Reuters reported significant gains on legal reasoning and contract understanding. Shopify cited improvements on structured business tasks requiring precision and consistency. Comeen, a workplace video platform, reported eliminating a multi-day, multi-vendor subtitle production process by switching to Gemini-powered multilingual generation that produces results across 40 languages in a single pass.
Third-party developer tooling integration at launch included Cursor, GitHub (Copilot), JetBrains, Replit, Manus, Cline, and Android Studio. Several of these partners reported double-digit improvements in success rates on difficult coding tasks compared to Gemini 2.5 Pro within weeks of the launch.[1][2]
Every Gemini 3 family member accepts text, images, audio, video, code, and PDFs through a unified token space. Google describes the approach as a continuation of the "true, native multimodality" established in Gemini 1.0, where modalities share a common representation rather than being routed through specialized encoders. Per-image and per-video resolution settings (low, medium, high, ultra-high) let developers manage token usage and latency for vision-heavy workloads. PDF processing supports documents up to 1,000 pages, which makes it practical to load entire contract libraries, multi-year financial filings, or comprehensive clinical trial reports into a single prompt.
Family-level multimodal performance is led by Gemini 3 Pro's 87.6% score on Video-MMMU, the top score on that leaderboard at launch, and 81.0% on MMMU-Pro. Gemini 3 Flash trades small amounts of multimodal accuracy for substantial speed and cost gains, while Flash-Lite reaches 76.8% on MMMU Pro at one-eighth the input cost of Pro.[2][6][10]
The 1 million token input context window applies uniformly across the family. In practice, that capacity equates to roughly 750,000 words of text, several hours of audio, or a combination of modalities. The 64,000 token output limit applies to all variants. Context caching is available across all members and offers up to 90% cost reduction for repeated long-context prompts, which makes the family more economical than Gemini 2.5 for agentic workflows that reuse large system prompts across many turns.
Community reports from the Google developer forums document degraded retrieval quality above 200,000 tokens of input on Pro and Flash, with hallucination rates rising in the 800,000 to 900,000 token range. The MRCR v2 score of 77.0% at the 128,000 token level reported for Gemini 3 Pro suggests measurable retrieval quality decline well below the advertised maximum, which is a family-wide caveat for very long-context workloads.[18]
Thinking is on by default for every Gemini 3 variant. The Thinking Level parameter takes four values (minimal, low, medium, high) and allocates internal reasoning compute accordingly. At the High setting, Gemini 3 Pro and Gemini 3.1 Pro shift into a Deep Think Mini mode that spends substantially more output tokens exploring solution paths before producing a final answer. Tool use is consistent across the family: function calling, code execution, Google Search grounding, Google Maps, URL context retrieval, file search, and computer use can be combined within a single API call, simplifying agentic architectures that previously required orchestration code to chain tools in Gemini 2.5.[2][7]
Google described Gemini 3 as its most secure model line to date and reported reduced sycophancy, increased prompt injection resistance, and improved misuse protection compared to Gemini 2.5 Pro. The Gemini 3 Pro model card publishes safety evaluations across text-to-text, multilingual, and image-to-text categories, with an 88.06% macro-average safe rate at launch. Identified weaknesses included less reliable refusal generalization across translated queries and a finding of moderate malign misuse potential for radiological and nuclear risk categories from independent evaluators. Gemini 3.1 Pro's model card reports small additional gains in most safety metrics, with a small regression on image-to-text safety.[1][19][20]
A 2026 research paper published by Google detailed the layered indirect prompt injection defense used across the Gemini 3 family: prompt injection content classifiers, security reinforcement training, markdown sanitation, suspicious URL redaction, user confirmation flows, and end-user security notifications.[21]
The Gemini 3 family was designed with agentic workloads as a first-class use case rather than an add-on. Three changes at the family level support this. First, the model exposes server-side bash tools so agents can execute shell commands inside the Gemini API runtime rather than passing commands back to a client process. Second, the file search tool can index and retrieve over uploaded documents within the same prompt as the inference, reducing the round-trips that typical retrieval-augmented generation pipelines require. Third, computer use is a standard tool across the family, enabling automation that can read screenshots, click UI elements, type text, and verify outcomes inside the same workflow that runs the model.[2][7]
Gemini 3 Pro's launch result on Vending-Bench 2, $5,478.16 in simulated profit on a multi-day autonomous planning benchmark in which the model runs a virtual vending business, was widely cited as a concrete signal that the family's agent capabilities had crossed a threshold of practical usefulness for unattended workloads. Google AI Ultra subscribers received Gemini Agent, an agentic mode in the Gemini app that uses Gemini 3 Pro (later 3.1 Pro) for planning and a mix of Flash and Flash-Lite variants for execution steps.
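The planner/executor split described above can be sketched as a simple routing table: a flagship model drafts the plan, and cheaper Flash-tier models run individual steps. The model IDs and step taxonomy below are illustrative assumptions, not Gemini Agent's actual (unpublished) internals.

```python
# Sketch of a planner/executor routing table in the spirit of the split
# described above. Model IDs and step kinds are illustrative assumptions;
# Gemini Agent's real routing logic is not public.

PLANNER_MODEL = "gemini-3.1-pro"  # flagship drafts the overall plan

EXECUTOR_BY_STEP = {
    "reasoning": "gemini-3-flash",          # harder intermediate reasoning
    "extraction": "gemini-3.1-flash-lite",  # high-volume, low-cost steps
    "classification": "gemini-3.1-flash-lite",
}

def route(step_kind: str) -> str:
    """Pick an executor model for a plan step; default to Flash."""
    return EXECUTOR_BY_STEP.get(step_kind, "gemini-3-flash")

plan = ["extraction", "reasoning", "summarize"]
print([route(s) for s in plan])
```

The economics follow directly from the pricing tables earlier in the article: routing high-volume extraction steps to Flash-Lite at $0.25 per million input tokens rather than Pro at $2.00 reduces per-step input cost by a factor of eight.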
For enterprise developers, the Gemini Enterprise Agent Platform on Vertex AI exposes the family's tool stack with IAM controls, regional endpoints, and audit logging, which became the recommended deployment path for production agentic systems where compliance and observability matter more than the bare-minimum inference cost.
Gemini 3 family multilingual coverage extends across more than 100 languages, with strong performance in the major world languages. Gemini 3 Pro scored 91.8% on MMMLU (the multilingual version of the Massive Multitask Language Understanding benchmark) and 93.4% on Global PIQA (multilingual physical commonsense), with the smaller family members trailing by single-digit margins on those benchmarks. This is one area where Flash-Lite's lower price floor translates directly into product viability: translation pipelines that previously routed through specialized cloud translation APIs migrated to Flash-Lite at $0.25 per million input tokens for use cases where the additional reasoning capability is valuable, such as preserving formatting during translation, maintaining domain-specific terminology, or handling code-switching within a single document.[2][10]
Comeen, the workplace video platform mentioned in the customer adoption discussion above, used the multilingual capability across Gemini 3 family models to produce subtitles in 40 languages from a single inference call. Several launch-partner enterprise customers cited multilingual coverage as a primary reason for selecting Gemini 3 over alternative frontier models, particularly for global customer support and compliance documentation workloads.
The Gemini 3 family's combination of long context, multimodal input, agentic tooling, and the spread of pricing tiers from $0.25 to $4.00 per million input tokens covers a wide range of practical workloads. The pattern that emerged within the first six months after launch is that teams typically choose a tier based on per-task cost sensitivity and the criticality of marginal reasoning quality, rather than on capability cliffs of the kind that segmented earlier Gemini generations.
Gemini 3 Pro and Gemini 3 Flash are extensively used for code generation, multi-file refactoring, automated testing, and end-to-end agentic software development. Platforms including Cursor, GitHub Copilot, JetBrains AI, Replit, Manus, Cline, Android Studio, Google's Antigravity, and Gemini Code Assist integrate the models. Flash's 78% SWE-bench Verified score made it the recommended default for autonomous bug-fix and feature-addition workflows, while Pro and 3.1 Pro remained the choice for tasks requiring deeper architectural reasoning. Early adopter reports at launch cited double-digit improvements in coding task success rates compared to Gemini 2.5 Pro.[2][6]
The 1 million token context window combined with PDF support up to 1,000 pages makes the family practical for analyzing large document collections in a single prompt. Use cases at launch partners included entire contract libraries, multi-year financial filings, comprehensive clinical trial reports, and full codebase audits. Thomson Reuters reported significant gains on legal reasoning and contract understanding by switching to Gemini 3 Pro from prior models. For code-specific repository analysis, full repositories of medium-sized projects can be loaded into a single prompt for refactoring, security review, or dependency analysis without the chunking overhead that earlier generations required.
Box piloted Gemini 3 Pro for content management; Shopify cited improvements on structured business tasks requiring precision and consistency; Wayfair, Rakuten, and Thomson Reuters used it across content workflows from product copy to legal review. PwC integrated Gemini 3 into its Google Cloud AI Center of Excellence offering for organizations deploying Gemini Enterprise agents. Randstad reported a double-digit reduction in sick days after deploying Gemini for Workspace, attributed in part to multilingual and inclusive communication improvements made possible by the model's stronger non-English performance.
Gemini 3 Pro's 91.9% on GPQA Diamond exceeded the human expert baseline of approximately 89.8%, and Gemini 3 Deep Think extended that further. The combination of strong scientific reasoning, the 1 million token context window for absorbing large literature collections, and Google Search grounding for current publications made the family attractive for research augmentation tasks: literature review, hypothesis generation, experimental design, and synthesis across scientific domains. Deep Think's gold-medal level performance on the 2025 International Mathematical Olympiad and the 2025 ICPC World Finals carried into the Gemini 3 generation as evidence that the family's reasoning architecture extends to formal mathematical and algorithmic work requiring proof construction or multi-step derivation.[1][13][14]
The Gemini 3 Deep Think February 2026 research access program was specifically positioned around scientific and engineering use cases. The model card for that release includes results on CMT-Benchmark for advanced theoretical physics (50.5%) and gold-medal level performance on the 2025 International Physics and Chemistry Olympiad written sections, alongside the ARC-AGI-2 and HLE numbers cited above. Google described these results as the strongest signal yet that Gemini 3 Deep Think can support genuine scientific reasoning rather than simply retrieving and rephrasing prior knowledge.[11]
The 1 million token context window supports long-form content tasks: book-length manuscript analysis, full-season television script review, comprehensive market research synthesis across many documents, and academic literature review at scale. For video-heavy workflows, Gemini 3 Pro's 87.6% Video-MMMU score combined with audio processing supports tasks such as meeting transcription and summarization, educational video indexing, automated compliance monitoring of recorded content, and security review of long surveillance footage. Per-video resolution settings let developers manage the cost tradeoff between visual fidelity and token consumption.
Gemini 3 Pro and Gemini 3 Flash are the primary models powering the Gemini app for Google's 650+ million monthly active users. The app's AI Mode in Search uses Gemini 3 family models for complex multi-step research queries. Personal use cases include academic synthesis, interactive learning, travel and event planning, coding assistance, and creative projects. Nano Banana Pro and Nano Banana 2 add image generation to the consumer product set, with all outputs marked by SynthID watermarks for provenance tracking.
Gemini 3.1 Flash-Lite's $0.25 per million input tokens (text/image/video) and $1.50 per million output tokens make economically viable a class of workloads that was marginal at Gemini 2.5 Flash-Lite's higher rates. Examples include real-time content moderation at scale, large-batch translation for localization pipelines, automated UI-to-design extraction, telemetry classification, and customer support intent routing. The 2.5x faster time-to-first-token relative to Gemini 2.5 Flash also makes Flash-Lite suitable for interactive consumer applications where response latency is the binding user-experience constraint.[10]
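The economics can be sketched with a back-of-envelope cost model using the list prices cited above. The per-task token counts in the example are illustrative assumptions, not published figures:

```python
# Cost model for Gemini 3.1 Flash-Lite at the list prices cited above:
# $0.25 per 1M input tokens, $1.50 per 1M output tokens.
FLASH_LITE_INPUT_PER_M = 0.25
FLASH_LITE_OUTPUT_PER_M = 1.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single Flash-Lite request at list price."""
    return (input_tokens * FLASH_LITE_INPUT_PER_M
            + output_tokens * FLASH_LITE_OUTPUT_PER_M) / 1_000_000

# Example (assumed workload shape): customer-support intent routing at
# ~400 input and ~20 output tokens per ticket, 1M tickets per month.
per_ticket = request_cost(400, 20)
monthly = per_ticket * 1_000_000
print(f"${per_ticket:.6f} per ticket, ${monthly:.2f} per month")
```

At these rates a million routing calls per month lands in the low hundreds of dollars, which is the kind of arithmetic behind the "economically viable at scale" claim.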
The Gemini 3 family launched into a frontier model field that already included OpenAI's GPT-5 and xAI's Grok 4, with Anthropic's Claude Opus 4.5 shipping shortly after Gemini 3 Pro. The table below collects launch-window comparisons drawn from each vendor's published model cards and third-party benchmark roundups. Numbers reflect the publicly reported scores for each vendor's top-tier flagship model at the equivalent point in their generational cycle.
| Benchmark | Gemini 3 Pro | Gemini 3.1 Pro | GPT-5.1 | Claude Opus 4.5 | Grok 4.1 |
|---|---|---|---|---|---|
| Humanity's Last Exam (no tools) | 37.5% | not reported | ~26.5% | 13.7% | not reported |
| GPQA Diamond | 91.9% | not reported | 88.1% | 92.8% | 94% |
| ARC-AGI-2 | 31.1% | 77.1% | 17.6% | not reported | not reported |
| SWE-bench Verified | 76.2% | not reported | ~72% | 80.9% | 75% |
| LMArena Elo | 1501 | not reported | not reported | not reported | not reported |
| Context window | 1M | 1M | 400K | 200K | 2M |
| Standard input ($/M, short prompts) | $2.00 | $2.00 | not directly comparable | $5.00 | substantially cheaper |
| Standard output ($/M, short prompts) | $12.00 | $12.00 | not directly comparable | $25.00 | substantially cheaper |
At launch, Gemini 3 Pro led on Humanity's Last Exam, ARC-AGI-2, and LMArena Elo. Grok 4.1 offered the largest context window. Claude Opus 4.5 led on SWE-bench Verified at 80.9%, the first model to cross the 80% threshold on that benchmark, and matched or exceeded Gemini 3 Pro on GPQA Diamond. The five models occupied different cost tiers: Grok 4.1 was substantially cheaper per token, Claude Opus 4.5 was substantially more expensive, and Gemini 3 Pro and Gemini 3.1 Pro sat in the middle. Reviewers generally concluded that Gemini 3 Pro pushed clear of the field on multi-step reasoning, math, and multimodal benchmarks while sitting below Claude Opus 4.5 on real-world coding tasks.[3][22][23]
The February 2026 Gemini 3 Deep Think research access scores (48.4% HLE, 84.6% ARC-AGI-2, Codeforces 3455 Elo, gold-medal IMO 2025) and the Gemini 3.1 Pro ARC-AGI-2 result (77.1%) raised the family's headline benchmark profile materially during the first quarter of 2026, though direct comparison to peer models requires caution because of differences in evaluation methodology, tools allowed, and time-to-result.
The Gemini 3 family is exposed through the Gemini API, the Google AI Studio web playground, the Vertex AI enterprise endpoint, the Gemini CLI terminal client, and language SDKs in Python, JavaScript, Go, and Java. The same underlying API supports all family members, with the model selected through a model parameter that takes IDs such as gemini-3.1-pro-preview, gemini-3-flash-preview, and gemini-3.1-flash-lite. Migration between family members typically requires only changing this model ID parameter and adjusting prompt structure for the cost and latency budget of the chosen tier.[7][16]
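The migration pattern described above can be sketched as a simple tier lookup. The model ID strings below are the preview IDs named in this article and will change as variants reach general availability:

```python
# Tier-based model selection for the Gemini API. Switching tiers is a
# change of one string; the IDs are the preview strings named in this
# article and are expected to change as variants reach GA.
GEMINI_3_MODELS = {
    "pro": "gemini-3.1-pro-preview",
    "flash": "gemini-3-flash-preview",
    "flash-lite": "gemini-3.1-flash-lite",
}

def pick_model(tier: str) -> str:
    """Return the model ID for a capability tier."""
    try:
        return GEMINI_3_MODELS[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None

# Usage with the Python SDK would then look roughly like:
#   client.models.generate_content(model=pick_model("flash"), contents=...)
```

Centralizing the ID in one place is also what softens deprecation events like the March 2026 retirement of gemini-3-pro-preview: only the lookup table changes, not every call site.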
Three developer-experience changes at the family level differentiate Gemini 3 from Gemini 2.5. Tool use composition, mentioned in the reasoning section above, allows multiple built-in tools to be combined within a single API request. The Thinking Level parameter replaces the token-budget reasoning control with a categorical setting that requires less manual tuning. And the model now produces encrypted thought signatures that are preserved across multi-turn conversations, removing the need for developers to manually pass reasoning state between turns of a function-calling or agentic workflow.
Google's prompt engineering guidance for Gemini 3 differs from earlier generations in two notable ways. The recommended default temperature was lowered from 1.0 to a model-managed value, with documentation explicitly noting that lowering temperature manually can degrade performance on certain tasks. And Google recommends direct, concise prompts rather than the verbose, instruction-heavy prompts that worked well on Gemini 2.5 Pro, citing the model's improved instruction-following capability.
For cost management, Google introduced a tier-based long-context surcharge on Pro variants only: prompts above 200,000 tokens shift to a higher per-token rate. Flash and Flash-Lite do not carry this surcharge. Context caching is available across all family members and offers up to 90% cost reduction for repeated long-context prompts, which is particularly attractive for agentic workloads that reuse large system prompts across many turns.[12]
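A minimal sketch of the Pro-tier input cost rule: the whole prompt bills at the higher rate once it exceeds 200,000 tokens, and cached tokens bill at a discount. The $2.00 rate is from the comparison table in this article and the $4.00 long-context rate is the upper figure cited earlier; modelling the cache discount as a flat 90% off cached input tokens is an assumption based on the "up to 90%" language:

```python
# Pro-tier input cost sketch. SHORT_RATE and LONG_RATE are the $2.00
# and $4.00 per-1M-token figures cited in this article; the flat 90%
# cache discount is an assumption ("up to 90%" in the docs).
SHORT_RATE = 2.00       # USD per 1M input tokens, prompts <= 200k
LONG_RATE = 4.00        # USD per 1M input tokens, prompts > 200k
CACHE_DISCOUNT = 0.90

def pro_input_cost(total_tokens: int, cached_tokens: int = 0) -> float:
    """Input cost in USD; the surcharge applies to the whole prompt."""
    rate = LONG_RATE if total_tokens > 200_000 else SHORT_RATE
    fresh = total_tokens - cached_tokens
    return (fresh * rate + cached_tokens * rate * (1 - CACHE_DISCOUNT)) / 1e6

# A 500k-token agentic prompt where 400k tokens are a cached repository:
# 100_000 * $4.00/M + 400_000 * $0.40/M = $0.56, versus $2.00 uncached.
```

The example makes the attraction for agentic workloads concrete: reusing a large cached system prompt across turns cuts the dominant input cost by most of its value.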
The migration from Gemini 3 Pro to Gemini 3.1 Pro in early 2026 illustrated both the benefits and the friction of the family's preview-heavy release model. The benefit was a substantial benchmark improvement (most visibly on ARC-AGI-2) at no list-price increase. The friction was a compressed three-and-a-half-month deprecation cycle: integrations that hard-coded the original gemini-3-pro-preview model ID had to migrate before March 9, 2026, and the silent switch of the gemini-pro-latest alias on March 6 caught some teams by surprise. Google's developer forum post on the migration acknowledged the unusually short deprecation window and committed to longer support periods on subsequent preview generations.[5]
Gemini 3's November 18, 2025 launch drew widespread coverage. TechCrunch led with "Google launches Gemini 3 with new coding app and record benchmark scores," highlighting the HLE record and the simultaneous Antigravity debut. VentureBeat described the release as Google "claiming the lead in math, science, multimodal, and agentic AI benchmarks," and CNBC framed it competitively as "battle with OpenAI intensifies." The 1501 LMArena Elo score, the first time any model crossed 1500 on that leaderboard, was widely cited as a concrete signal of benchmark leadership at launch.[1][22][23][24]
Developer reception of the family as a whole has been broadly positive. Reviewers most often noted two things: the surprising strength of Gemini 3 Flash on coding benchmarks (its 78% SWE-bench Verified score exceeded Gemini 3 Pro's 76.2% at launch), and the unusually large jump in ARC-AGI-2 scores from Gemini 3 Pro through Gemini 3 Deep Think to Gemini 3.1 Pro. Those gains collectively turned ARC-AGI-2 from a benchmark where frontier models scored in single digits into one where the leading family member crossed 84% within ten months. Average human performance on ARC-AGI-2 is roughly 60%, and the verified Gemini 3 Deep Think result of 84.6% became one of the few independently confirmed cases of a frontier LLM substantially exceeding the human baseline on a task designed to resist memorization.[6][11]
Criticism has clustered around three areas. The most-cited is cost: Gemini 3 Pro's $12 per million output tokens at the standard rate is more than double Gemini 2.5 Pro's published rate, and the long-context surcharge above 200,000 tokens raised concerns among teams running agentic workloads with large repository context. Second, the practical context degradation past 200,000 tokens documented in Google's own developer forums has tempered enthusiasm for the headline 1 million token figure. Third, the deprecation cycle for Gemini 3 Pro was unusually compressed: roughly three and a half months between launch and the March 9, 2026 shutdown of the original preview model string, which forced developers into earlier-than-usual migration work.[5][12][18]
Several limitations apply at the family level rather than to individual variants.
Knowledge cutoff. All Gemini 3 family models share a January 2025 training data cutoff. Information about events, software releases, scientific publications, and other developments after that date is not present in the model weights. Applications that need current information must rely on Google Search grounding or pass context directly in the prompt. The widely shared "temporal shock" exchange between AI researcher Andrej Karpathy and Gemini 3 Pro illustrated the failure mode where the model refused to update on user-provided evidence of the date when search tools were unavailable. Details of that incident are covered in the Gemini 3 Pro article.[25]
Practical context window. While 1 million tokens is the advertised capacity for all family members, retrieval performance degrades meaningfully above 200,000 tokens, and community reports document hallucinations at 800,000 tokens and above. Long-context workflows should plan for retrieval-augmented generation rather than raw context loading at the upper end of the window.[18]
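The guidance above amounts to a routing decision, sketched below. The 200,000-token threshold mirrors the degradation figure reported in developer forums; the 4-characters-per-token estimate is a rough heuristic, not a tokenizer:

```python
# Illustrative guard: fall back to retrieval-augmented generation once
# a prompt approaches the range where long-context recall degrades.
# The 200k threshold is the figure cited above; chars/4 is a rough
# token estimate, not a real tokenizer.
DEGRADATION_THRESHOLD = 200_000  # tokens

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def choose_strategy(documents: list[str]) -> str:
    """Return 'raw_context' or 'rag' based on estimated prompt size."""
    total = sum(estimate_tokens(d) for d in documents)
    return "rag" if total > DEGRADATION_THRESHOLD else "raw_context"
```

In practice the threshold would be tuned per workload, but the structure (estimate, compare, route) is the pattern the limitation implies.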
Preview status. All Gemini 3 family members shipped in preview, and several remained in preview as of May 2026. Production SLA commitments, stable model versioning, and the most stringent enterprise compliance certifications were either unavailable or evolving during this period. Gemini 3.1 Flash-Lite was the first family member to reach general availability (May 7, 2026), and Gemini 3 Flash followed shortly thereafter. Gemini 3.1 Pro and the Nano Banana variants remained in preview.[5][16]
Hallucinations and factual accuracy. Like all large language models, Gemini 3 family members can produce factually incorrect outputs, particularly in long multi-turn conversations or when processing very long context. Comparative safety testing on Gemini 3 Pro found an 88.06% macro-average safe rate, meaning roughly 12% of responses to safety-relevant prompts failed evaluation criteria. High-stakes deployments in legal, medical, or financial domains should layer retrieval and human verification on top of model outputs.[19]
No native image or audio output from text variants. Gemini 3 Pro, Gemini 3.1 Pro, Gemini 3 Flash, and Gemini 3.1 Flash-Lite output text only. Image generation is handled by the Nano Banana family (Nano Banana Pro and Nano Banana 2). Audio generation is handled separately by Gemini 3.1 Flash TTS, and real-time audio dialogue is handled by Gemini 3.1 Flash Live.[16]
Latency at high thinking levels. The Thinking Level parameter at its highest setting produces substantially increased time-to-first-token across all variants. Interactive applications sensitive to response latency should choose lower thinking levels, or default to Gemini 3 Flash or Flash-Lite, which are tuned for faster response.