Gemini 3 Flash
Last reviewed
May 16, 2026
Sources
12 citations
Review status
Source-backed
Revision
v1 ยท 3,001 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 16, 2026
Sources
12 citations
Review status
Source-backed
Revision
v1 ยท 3,001 words
Add missing citations, update stale details, or suggest a clearer explanation.
Gemini 3 Flash is a multimodal large language model released by Google on December 17, 2025 as the fast, lower-cost sibling to Gemini 3 Pro in the Gemini 3 family. Built by Google DeepMind, it is positioned as a frontier reasoning model with throughput and pricing tuned for high-volume use, agentic workloads, and consumer products. On launch day Google made it the default model in the Gemini app and in AI Mode for Google Search, displacing the older Gemini 2.5 Flash. Developers can access it through the Gemini API in Google AI Studio, Vertex AI, Gemini Enterprise, Google Antigravity, Gemini CLI, and Android Studio.
The model lists a 1 million token context window, supports text, image, audio, video, and PDF inputs with text output, and has a stated knowledge cutoff of January 2025. API pricing was set at $0.50 per million input tokens and $3.00 per million output tokens at launch, with a $1.00 per million rate for audio input. On Google's published charts, Gemini 3 Flash matches or beats the prior generation Gemini 2.5 Pro on several reasoning and coding benchmarks while running roughly three times faster and using about 30 percent fewer tokens for the same tasks. That combination of price, speed, and quality is the main reason coverage from outlets such as TechCrunch and 9to5Google framed the launch as the moment a mid-tier Flash model started to credibly compete with last year's flagships.
Google has used the "Flash" label since 2024 for the smaller, faster member of each Gemini generation. Gemini 1.5 Flash arrived in May 2024 as a distilled sibling to Gemini 1.5 Pro with a long context window and lower latency. Gemini 2.0 Flash followed in December 2024 with native tool use and faster multimodal output. Gemini 2.5 Flash, released in 2025, introduced explicit "thinking" controls that let developers trade extra reasoning tokens for higher quality on harder prompts. By the time Gemini 3 Pro shipped in November 2025, the Flash tier had become Google's volume product, powering the free tier of the Gemini app, AI Mode in Search, and a large share of Vertex AI traffic.
Gemini 3 Flash continues that lineage but closes more of the gap to the Pro model than any prior Flash release. Google's product team described it as the place where "frontier intelligence" meets a price point compatible with serving billions of queries a day. Tulsee Doshi, senior director and head of product for Gemini Models, told TechCrunch that "Flash is just a much cheaper offering from an input and output price perspective," framing the model as the right default for most users while reserving Gemini 3 Pro for advanced math and coding work. The launch came roughly one month after Gemini 3 Pro and was timed to fold the new Flash tier into the Gemini app's holiday rollout, with global availability for the AI Mode feature in Search starting the same week.
Gemini 3 Flash inherits the same multimodal architecture as Gemini 3 Pro and trims it for speed rather than removing modalities. The model accepts text, images, audio, video, and PDFs, and writes back text. Inside the Gemini app the model is exposed as two modes: a default "Fast" mode for low-latency answers and a "Thinking" mode that allocates extra reasoning tokens before producing a response. Developers using the API can switch between these modes per request.
| Capability | Details |
|---|---|
| Context window | 1,048,576 input tokens (about 1 million) |
| Maximum output | 65,536 tokens on developer surfaces |
| Input modalities | Text, image, audio, video, PDF |
| Output modality | Text |
| Reasoning modes | Fast and Thinking, selectable in app and API |
| Tool use | Function calling, code execution, Google Search grounding |
| Knowledge cutoff | January 2025 |
| Knowledge access | Live web grounding via Search when enabled |
| Caching | Context caching at $0.05 per million tokens (text, image, video) |
| Batch mode | 50 percent discount on input and output |
Google highlighted several capability themes for the launch. The first is agentic coding, where the model can plan multi-step changes, call tools, and run iterative test loops inside environments like Google Antigravity and Android Studio. The second is document and data extraction at scale, including handwriting, contracts, and financial statements. The third is video and image understanding for visual question answering, slide generation, and creative work. Google also surfaced "generative UI" features in the Gemini app, where the model produces images, tables, and small interactive components alongside text.
On the consumer side, the Gemini app gained the ability to brainstorm a product launch, draft an email, build a lesson plan, generate a practice quiz, or summarize a long report using Gemini 3 Flash without users having to opt into a paid plan. For Workspace customers the model rolled out across Business Starter, Standard, and Plus tiers, Enterprise variants, Education plans, Frontline editions, and Nonprofits subscriptions, with admins able to gate access through the Generative AI settings in the Workspace Admin console.
Google published a small set of benchmark numbers at launch and Artificial Analysis added independent measurements over the following weeks. The figures below are the ones Google or Artificial Analysis stated directly. Where a benchmark is not listed here, no public number was available at the time of writing and so it has been omitted.
| Benchmark | Gemini 3 Flash | Notes |
|---|---|---|
| GPQA Diamond | 90.4% | Graduate-level science questions, with thinking enabled |
| Humanity's Last Exam | 33.7% | No tool use |
| MMMU Pro | 81.2% | Google described this as state-of-the-art at launch |
| SWE-bench Verified | 78% | Software engineering bug-fix benchmark |
| Toolathlon | 49.4% | Multi-tool agent evaluation |
| MCP Atlas | 57.4% | Model Context Protocol benchmark |
| Artificial Analysis Intelligence Index | 46 | Composite of ten reasoning, agent, and coding tests |
| Output speed | 167 to 180 tokens per second | Measured by Artificial Analysis |
| Time to first token | 7.4 seconds | Reasoning mode, measured by Artificial Analysis |
The headline result is that Gemini 3 Flash matches or exceeds Gemini 2.5 Pro on every benchmark Google chose to publish, while sitting close behind Gemini 3 Pro and ahead of it on MMMU Pro, Toolathlon, and MCP Atlas. On Google's enterprise blog, partner reports added a second set of numbers from production workloads. Box measured a 15 percent accuracy lift over Gemini 2.5 Flash on document extraction. Harvey reported a 7 percent improvement on BigLaw Bench reasoning. Geotab saw a 10 percent baseline improvement on agentic coding tasks. Warp recorded an 8 percent lift on command-line error-fix accuracy.
On the Artificial Analysis Intelligence Index, the reasoning configuration of Gemini 3 Flash scored 46, placing it in the upper third of the 146 models tracked at the time. The same evaluation cost about $278 to run across the index, with a blended cost of $1.13 per million tokens at a 3:1 input-to-output ratio. Output speed of 167 tokens per second placed it among the fastest reasoning models tested.
Google priced Gemini 3 Flash slightly above the 2.5 generation, reflecting the higher quality at equivalent throughput. The standard rate is $0.50 per million input tokens for text, image, and video, $1.00 per million for audio input, and $3.00 per million for output. Context caching costs $0.05 per million cached text, image, or video tokens, $0.10 per million for audio, and $1.00 per hour for storage. Batch mode applies a 50 percent discount on both input and output.
| Tier | Input (text, image, video) | Input (audio) | Output | Cache read |
|---|---|---|---|---|
| Standard | $0.50 / 1M | $1.00 / 1M | $3.00 / 1M | $0.05 / 1M |
| Batch | $0.25 / 1M | $0.50 / 1M | $1.50 / 1M | $0.025 / 1M |
For reference, Gemini 2.5 Flash was priced at $0.30 per million input and $2.50 per million output tokens, so the 3 Flash standard tier is roughly 67 percent more expensive on input and 20 percent more expensive on output. Google's argument for the increase is that the model uses about 30 percent fewer tokens on thinking-heavy tasks and runs about three times faster, which typically nets out to a lower total bill for agentic and reasoning workloads. Gemini 3 Pro, by contrast, lists at $2.00 per million input and $12.00 per million output tokens for prompts under 200,000 tokens, so Flash is four times cheaper on input and four times cheaper on output than the Pro model in its preview pricing.
Gemini 3 Flash sits in the fast, high-volume tier of frontier models alongside Claude Haiku 4.5 from Anthropic, GPT-5 mini from OpenAI, and DeepSeek V4 from the Chinese open-weights lab. The fastest way to read the landscape is by price and benchmark score; the differences in design philosophy then explain why developers pick one over the other.
| Model | Vendor | Input price | Output price | Context | Notable strength |
|---|---|---|---|---|---|
| Gemini 3 Flash | $0.50 / 1M | $3.00 / 1M | 1M tokens | Multimodal, speed, agentic coding | |
| Claude Haiku 4.5 | Anthropic | $1.00 / 1M | $5.00 / 1M | 200K tokens | Agentic reliability, computer use |
| GPT-5 mini | OpenAI | Variable | Variable | 128K output | Math reasoning, broad availability |
| DeepSeek V4 | DeepSeek | Lower | Lower | 128K tokens | Open weights, cost |
Against Claude Haiku 4.5, Gemini 3 Flash is roughly 1.7 times cheaper on input and output and reports a higher score on SWE-bench Verified, where Haiku 4.5 sits at 73.3 percent compared to 78 percent for Flash. Haiku 4.5 retains an edge on agentic autonomy benchmarks such as computer use, where Anthropic's safety and instruction-following training produces more reliable multi-step behavior. Reviewers writing about both models tend to recommend Haiku 4.5 for long-horizon agents that must operate without supervision, and Gemini 3 Flash for high-throughput backends, multimodal pipelines, and anything that needs the 1 million token context window.
Google did not publish a direct benchmark comparison to GPT-5 mini, but third-party reviewers have repeatedly noted that GPT-5 mini leads on mathematical competition problems such as AIME 2025 while Gemini 3 Flash leads on multimodal and coding benchmarks. The two models cover broadly similar ground for general chat, summarization, and lightweight agent work. DeepSeek V4, where pricing has been published, undercuts both on raw token cost and is the obvious choice for teams that want open weights or local inference, with the trade-off that multimodal coverage and grounding integrations are narrower than what Google ships through Vertex AI.
Within Google's own lineup, Gemini 3 Pro remains the model of choice for the hardest reasoning and coding tasks, particularly those that benefit from the extra capacity of the Pro architecture. The Gemini app exposes the choice directly: "Pro" for advanced math and coding, "Fast" for everyday questions, and "Thinking" inside Flash for cases where users want more deliberation without paying for Pro.
Google's launch material and partner case studies pointed to several workloads where Gemini 3 Flash is the recommended default.
Agentic coding is the most prominent example. Inside Google Antigravity and Android Studio, the model handles planning, code generation, and tool execution with low enough latency to support iterative debugging. Geotab cited a 10 percent improvement on baseline agentic coding tasks compared to the previous Flash generation, while Warp reported an 8 percent lift on command-line error fixing. The 1 million token context window is particularly useful for working across large repositories without context window juggling.
Document and data extraction is the second major use case. Box uses Gemini 3 Flash to read handwritten notes, contracts, financial statements, and other document types at scale, reporting a 15 percent accuracy improvement compared to Gemini 2.5 Flash. Harvey applies the model to legal reasoning tasks on BigLaw Bench. Bridgewater Associates uses it for long-context multimodal reasoning across mixed financial datasets.
Real-time assistants and chat backends are a third category. Because Flash returns roughly 180 output tokens per second, it can drive conversational interfaces, customer support bots, and in-app assistants without the perceptible pauses that larger reasoning models introduce. The Gemini app itself is the most visible deployment: every free-tier and most paid-tier user conversations now route through Gemini 3 Flash by default.
Creative and generative work makes up the fourth bucket. Figma uses the model for rapid prototype generation from natural-language briefs. Salesforce integrated it into Agentforce for sales and service workflows. Workday rolled it out across its AI-first productivity strategy. On the consumer side, the model handles brainstorming, lesson planning, quiz generation, and visual answer composition with images and tables.
Finally, search and grounded answer generation is the largest workload by volume. AI Mode in Google Search uses Gemini 3 Flash to read web results, synthesize summaries, and generate the visual answer cards that appear at the top of many queries. The combination of low cost, fast response, and the Google Search grounding tool makes the model a natural fit for that surface, which is why Google was willing to roll it out as the default across hundreds of millions of users on launch day.
The reception in technology press was broadly positive, with most coverage framing Gemini 3 Flash as the first mid-tier model that genuinely threatens the previous generation of flagship offerings. TechCrunch's launch story focused on the price-per-quality argument and quoted Doshi on the cost benefits. 9to5Google described it as offering "Pro-level performance" at Flash pricing and noted the immediate rollout to the Gemini app and AI Mode. Google Cloud's enterprise blog leaned on partner testimonials from Box, Harvey, Geotab, Warp, Salesforce, Workday, Figma, and Bridgewater to argue the model was production-ready on day one.
Independent benchmarkers reached similar conclusions. Artificial Analysis placed the reasoning configuration at 46 on its Intelligence Index, top third of all models tracked, and rated its output speed of around 167 tokens per second as among the fastest reasoning models in the field. The LLM Stats database recorded the 1 million token context window and January 2025 knowledge cutoff and confirmed Google's pricing.
Some reviewers pushed back on the headline that Flash had "made the flagship obsolete." The argument they make is that the price differential matters most for very high-volume workloads, while Pro retains a meaningful lead on the hardest mathematics and coding problems. A widely circulated Medium piece argued that Flash had collapsed 69 percent of the Pro price for tasks where the quality gap was small to invisible, but conceded that Pro remained the right choice for frontier research applications.
Within developer communities the most common complaint has been the price increase compared to Gemini 2.5 Flash. The standard input rate of $0.50 per million tokens is 67 percent more expensive than the older Flash model, and teams that ran high-volume but quality-tolerant pipelines on 2.5 Flash had to redo their cost models. Google's counter-argument is that the 30 percent reduction in tokens used and the three times speed-up offset the per-token increase for most realistic workloads, but the math depends heavily on how reasoning-heavy a particular application is.
The "Fast" and "Thinking" mode split in the Gemini app drew mixed reviews. Power users praised the ability to dial reasoning effort up or down without switching to Pro, while casual users reported confusion over when to use each mode. Google's documentation describes Fast as the right choice for everyday questions and Thinking as the choice for complex problem-solving, with Pro reserved for advanced mathematics and coding.
Reports from partners using the enterprise preview were uniformly positive. Salesforce, Workday, Figma, Box, Harvey, Geotab, Warp, and Bridgewater each contributed quotes to Google's launch material. While these are vendor-curated testimonials, the breadth of named partners is unusual for a Flash-class launch and suggests Google had invested in pre-release enablement well before the public rollout.
Overall, the launch was seen as confirmation that the Gemini 3 series, started by Gemini 3 Pro one month earlier, had moved Google into a leadership position on the frontier price-performance frontier for the first half of 2026. Reviewers writing in early 2026 routinely cite Gemini 3 Flash as the default option for anyone building a new multimodal application without a strong reason to pick a competitor.