Gemini 3 Flash

AI Models Google Large Language Models Multimodal AI

18 min read

Updated Jun 27, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 27, 2026

Fact-checked

In review queue

Sources

12 citations

Revision

v3 · 3,694 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Gemini 3 Flash is a multimodal large language model released by Google on December 17, 2025 as the fast, lower-cost sibling to Gemini 3 Pro in the Gemini 3 family. Built by Google DeepMind, it pairs a 1 million token context window with a controllable reasoning budget and is priced at $0.50 per million input tokens and $3.00 per million output tokens, which Google describes as delivering "Pro-grade reasoning with Flash-level latency, efficiency and cost" at "less than a quarter the cost of 3 Pro."^[1]^[2] On launch day Google made it the default model in the Gemini app and in AI Mode for Google Search, displacing the older Gemini 2.5 Flash.^[3] Developers can access it through the Gemini API in Google AI Studio, Vertex AI, Gemini Enterprise, Google Antigravity, Gemini CLI, Gemini Code Assist, and Android Studio.^[7]

The model supports text, image, audio, video, and PDF inputs with text output, lists a stated knowledge cutoff of January 2025, and can return up to about 64,000 output tokens.^[6]^[8] API pricing was set at $0.50 per million input tokens, $3.00 per million output tokens, and $1.00 per million for audio input.^[2]^[6] On Google's published charts, Gemini 3 Flash matches or beats the prior generation Gemini 2.5 Pro on several reasoning and coding benchmarks while running roughly three times faster and using about 30 percent fewer tokens for the same everyday tasks.^[1] That combination of price, speed, and quality is the main reason coverage from outlets such as TechCrunch and 9to5Google framed the launch as the moment a mid-tier Flash model started to credibly compete with last year's flagships.^[2]^[3]

What is Gemini 3 Flash?

Gemini 3 Flash is the speed-and-cost tier of Google's Gemini 3 model line, sitting one rung below Gemini 3 Pro. Google positions it as "frontier intelligence built for speed at a fraction of the cost," a thinking model that keeps the multimodal reach of the Pro tier while cutting latency and per-token price for high-volume use, agentic workloads, and consumer products.^[1] In Google's framing, "speed and scale don't have to come at the cost of intelligence."^[1]

Google has used the "Flash" label since 2024 for the smaller, faster member of each Gemini generation. Gemini 1.5 Flash arrived in May 2024 as a distilled sibling to Gemini 1.5 Pro with a long context window and lower latency. Gemini 2.0 Flash followed in December 2024 with native tool use and faster multimodal output. Gemini 2.5 Flash, released in 2025, introduced explicit "thinking" controls that let developers trade extra reasoning tokens for higher quality on harder prompts. By the time Gemini 3 Pro shipped in November 2025, the Flash tier had become Google's volume product, powering the free tier of the Gemini app, AI Mode in Search, and a large share of Vertex AI traffic.

Gemini 3 Flash continues that lineage but closes more of the gap to the Pro model than any prior Flash release. Tulsee Doshi, senior director and head of product for Gemini Models, told TechCrunch that "Flash is just a much cheaper offering from an input and output price perspective," framing the model as the right default for most users while reserving Gemini 3 Pro for advanced math and coding work.^[2] The launch came roughly one month after Gemini 3 Pro and was timed to fold the new Flash tier into the Gemini app's holiday rollout, with global availability for the AI Mode feature in Search starting the same week.^[3]

What can Gemini 3 Flash do?

Gemini 3 Flash inherits the same multimodal architecture as Gemini 3 Pro and trims it for speed rather than removing modalities. The model accepts text, images, audio, video, and PDFs, and writes back text.^[6] A single request can take in large amounts of media, with Google citing limits on the order of hundreds of images, several hours of audio, or tens of minutes of video within the 1 million token context window.^[1] Inside the Gemini app the model is exposed as two modes: a default "Fast" mode for low-latency answers and a "Thinking" mode that allocates extra reasoning tokens before producing a response. Developers using the API control the same behavior through a thinking level parameter rather than a binary toggle.

Capability	Details
Context window	1,048,576 input tokens (about 1 million)
Maximum output	About 64,000 tokens on developer surfaces
Input modalities	Text, image, audio, video, PDF
Output modality	Text
Reasoning control	thinking_level: minimal, low, medium, high (defaults to high)
Tool use	Function calling, code execution, Google Search grounding
Knowledge cutoff	January 2025
Knowledge access	Live web grounding via Search when enabled
Caching	Context caching at $0.05 per million tokens (text, image, video)
Batch mode	50 percent discount on input and output

Google highlighted several capability themes for the launch. The first is agentic coding, where the model can plan multi-step changes, call tools, and run iterative test loops inside environments like Google Antigravity and Android Studio.^[7] The second is document and data extraction at scale, including handwriting, contracts, and financial statements.^[4] The third is video and image understanding for visual question answering, slide generation, and creative work. Google also surfaced "generative UI" features in the Gemini app, where the model produces images, tables, and small interactive components alongside text.

On the consumer side, the Gemini app gained the ability to brainstorm a product launch, draft an email, build a lesson plan, generate a practice quiz, or summarize a long report using Gemini 3 Flash without users having to opt into a paid plan.^[5] For Workspace customers the model rolled out across Business Starter, Standard, and Plus tiers, Enterprise variants, Education plans, Frontline editions, and Nonprofits subscriptions, with admins able to gate access through the Generative AI settings in the Workspace Admin console.^[5]

How does the thinking budget work in Gemini 3 Flash?

Gemini 3 Flash is a "thinking" model, meaning it can spend extra internal reasoning tokens before answering. The amount of that reasoning is controllable. In the Gemini API the relevant parameter is thinking_level, which accepts four settings: minimal, low, medium, and high.^[8] Google describes these as relative allowances for the depth of the model's internal reasoning rather than strict token guarantees, and if the parameter is left unset the model defaults to high.^[8] The older thinking_budget parameter is still accepted for backward compatibility, but Google recommends migrating to thinking_level for more predictable behavior and warns against sending both in the same request.^[8]

The payoff of the higher thinking levels is accuracy on hard prompts, and the payoff of the lower levels is latency and cost. Google's headline efficiency claim is that even at the highest thinking level Gemini 3 Flash "uses 30% fewer tokens on average than 2.5 Pro, as measured on typical traffic, to accurately complete everyday tasks with higher performance," while running about three times faster.^[1] In the consumer Gemini app the same control is surfaced more simply as a choice between a default Fast response and a slower Thinking response.

How well does Gemini 3 Flash perform on benchmarks?

Google published a small set of benchmark numbers at launch and Artificial Analysis added independent measurements over the following weeks.^[1]^[7] The figures below are the ones Google or Artificial Analysis stated directly. Where a benchmark is not listed here, no public number was available at the time of writing and so it has been omitted.

Benchmark	Gemini 3 Flash	Notes
GPQA Diamond	90.4%	Graduate-level science questions, with thinking enabled
Humanity's Last Exam	33.7%	No tool use
MMMU Pro	81.2%	Google described this as state-of-the-art at launch
SWE-bench Verified	78%	Software engineering bug-fix benchmark, ahead of Gemini 3 Pro
Toolathlon	49.4%	Multi-tool agent evaluation
MCP Atlas	57.4%	Model Context Protocol benchmark
Artificial Analysis Intelligence Index	46	Composite of ten reasoning, agent, and coding tests
Output speed	167 to 180 tokens per second	Measured by Artificial Analysis
Time to first token	7.4 seconds	Reasoning mode, measured by Artificial Analysis

The headline result is that Gemini 3 Flash matches or exceeds Gemini 2.5 Pro on every benchmark Google chose to publish, while sitting close behind Gemini 3 Pro and ahead of it on MMMU Pro, Toolathlon, and MCP Atlas.^[1] Google singled out the GPQA Diamond score of 90.4% and the Humanity's Last Exam score of 33.7% (without tools) as evidence of "frontier-class performance on PhD-level reasoning and knowledge benchmarks," and the 81.2% on MMMU Pro as comparable to Gemini 3 Pro.^[1] On Google's enterprise blog, partner reports added a second set of numbers from production workloads. Box measured a 15 percent accuracy lift over Gemini 2.5 Flash on document extraction. Harvey reported a 7 percent improvement on BigLaw Bench reasoning. Geotab saw a 10 percent baseline improvement on agentic coding tasks. Warp recorded an 8 percent lift on command-line error-fix accuracy.^[4]

On the Artificial Analysis Intelligence Index, the reasoning configuration of Gemini 3 Flash scored 46, placing it in the upper third of the 146 models tracked at the time.^[7] The same evaluation cost about $278 to run across the index, with a blended cost of $1.13 per million tokens at a 3:1 input-to-output ratio. Output speed of 167 tokens per second placed it among the fastest reasoning models tested.^[7]

How much does Gemini 3 Flash cost?

Google priced Gemini 3 Flash slightly above the 2.5 generation, reflecting the higher quality at equivalent throughput. The standard rate is $0.50 per million input tokens for text, image, and video, $1.00 per million for audio input, and $3.00 per million for output.^[2]^[6] Context caching costs $0.05 per million cached text, image, or video tokens, $0.10 per million for audio, and $1.00 per hour for storage, which Google says can deliver up to 90 percent cost reductions when token reuse crosses certain thresholds.^[6]^[7] Batch mode applies a 50 percent discount on both input and output.^[6]

Tier	Input (text, image, video)	Input (audio)	Output	Cache read
Standard	$0.50 / 1M	$1.00 / 1M	$3.00 / 1M	$0.05 / 1M
Batch	$0.25 / 1M	$0.50 / 1M	$1.50 / 1M	$0.025 / 1M

For reference, Gemini 2.5 Flash was priced at $0.30 per million input and $2.50 per million output tokens, so the 3 Flash standard tier is roughly 67 percent more expensive on input and 20 percent more expensive on output.^[2] Google's argument for the increase is that the model uses about 30 percent fewer tokens on thinking-heavy tasks and runs about three times faster, which typically nets out to a lower total bill for agentic and reasoning workloads.^[1] Gemini 3 Pro, by contrast, lists at $2.00 per million input and $12.00 per million output tokens for prompts under 200,000 tokens, so Flash is four times cheaper on input and four times cheaper on output than the Pro model in its preview pricing, consistent with Google's claim of "less than a quarter the cost of 3 Pro."^[1]

How does Gemini 3 Flash compare to Gemini 3 Pro?

Within Google's own lineup, Gemini 3 Pro remains the model of choice for the hardest reasoning and coding tasks, particularly those that benefit from the extra capacity of the Pro architecture. Gemini 3 Flash trails it on several frontier evaluations but actually beats it on a few, including SWE-bench Verified (78%), MMMU Pro (81.2%), Toolathlon, and MCP Atlas, while costing roughly one quarter as much per token and running about three times faster.^[1] The Gemini app exposes the choice directly: "Pro" for advanced math and coding, "Fast" for everyday questions, and "Thinking" inside Flash for cases where users want more deliberation without paying for Pro. Doshi's summary to TechCrunch was that Flash is the right default for most users, with Pro reserved for advanced math and coding work.^[2]

How does Gemini 3 Flash compare to other fast models?

Gemini 3 Flash sits in the fast, high-volume tier of frontier models alongside Claude Haiku 4.5 from Anthropic, GPT-5 mini from OpenAI, and DeepSeek V4 from the Chinese open-weights lab. The fastest way to read the landscape is by price and benchmark score; the differences in design philosophy then explain why developers pick one over the other.

Model	Vendor	Input price	Output price	Context	Notable strength
Gemini 3 Flash	Google	$0.50 / 1M	$3.00 / 1M	1M tokens	Multimodal, speed, agentic coding
Claude Haiku 4.5	Anthropic	$1.00 / 1M	$5.00 / 1M	200K tokens	Agentic reliability, computer use
GPT-5 mini	OpenAI	Variable	Variable	128K output	Math reasoning, broad availability
DeepSeek V4	DeepSeek	Lower	Lower	128K tokens	Open weights, cost

Against Claude Haiku 4.5, Gemini 3 Flash is roughly 1.7 times cheaper on input and output and reports a higher score on SWE-bench Verified, where Haiku 4.5 sits at 73.3 percent compared to 78 percent for Flash. Haiku 4.5 retains an edge on agentic autonomy benchmarks such as computer use, where Anthropic's safety and instruction-following training produces more reliable multi-step behavior. Reviewers writing about both models tend to recommend Haiku 4.5 for long-horizon agents that must operate without supervision, and Gemini 3 Flash for high-throughput backends, multimodal pipelines, and anything that needs the 1 million token context window.

Google did not publish a direct benchmark comparison to GPT-5 mini, but third-party reviewers have repeatedly noted that GPT-5 mini leads on mathematical competition problems such as AIME 2025 while Gemini 3 Flash leads on multimodal and coding benchmarks. The two models cover broadly similar ground for general chat, summarization, and lightweight agent work. DeepSeek V4, where pricing has been published, undercuts both on raw token cost and is the obvious choice for teams that want open weights or local inference, with the trade-off that multimodal coverage and grounding integrations are narrower than what Google ships through Vertex AI.

What is Gemini 3 Flash used for?

Google's launch material and partner case studies pointed to several workloads where Gemini 3 Flash is the recommended default.

Agentic coding is the most prominent example. Inside Google Antigravity and Android Studio, the model handles planning, code generation, and tool execution with low enough latency to support iterative debugging.^[7] Geotab cited a 10 percent improvement on baseline agentic coding tasks compared to the previous Flash generation, while Warp reported an 8 percent lift on command-line error fixing.^[4] The 1 million token context window is particularly useful for working across large repositories without context window juggling.

Document and data extraction is the second major use case. Box uses Gemini 3 Flash to read handwritten notes, contracts, financial statements, and other document types at scale, reporting a 15 percent accuracy improvement compared to Gemini 2.5 Flash.^[4] Harvey applies the model to legal reasoning tasks on BigLaw Bench. Bridgewater Associates uses it for long-context multimodal reasoning across mixed financial datasets.^[4]

Real-time assistants and chat backends are a third category. Because Flash returns roughly 180 output tokens per second, it can drive conversational interfaces, customer support bots, and in-app assistants without the perceptible pauses that larger reasoning models introduce.^[7] The Gemini app itself is the most visible deployment: every free-tier and most paid-tier user conversations now route through Gemini 3 Flash by default.^[3]

Creative and generative work makes up the fourth bucket. Figma uses the model for rapid prototype generation from natural-language briefs. Salesforce integrated it into Agentforce for sales and service workflows. Workday rolled it out across its AI-first productivity strategy.^[4] On the consumer side, the model handles brainstorming, lesson planning, quiz generation, and visual answer composition with images and tables.^[5]

Finally, search and grounded answer generation is the largest workload by volume. AI Mode in Google Search uses Gemini 3 Flash to read web results, synthesize summaries, and generate the visual answer cards that appear at the top of many queries.^[3] The combination of low cost, fast response, and the Google Search grounding tool makes the model a natural fit for that surface, which is why Google was willing to roll it out as the default across hundreds of millions of users on launch day.^[3]

How was Gemini 3 Flash received?

The reception in technology press was broadly positive, with most coverage framing Gemini 3 Flash as the first mid-tier model that genuinely threatens the previous generation of flagship offerings. TechCrunch's launch story focused on the price-per-quality argument and quoted Doshi on the cost benefits.^[2] 9to5Google described it as offering "Pro-level performance" at Flash pricing and noted the immediate rollout to the Gemini app and AI Mode.^[3] Google Cloud's enterprise blog leaned on partner testimonials from Box, Harvey, Geotab, Warp, Salesforce, Workday, Figma, and Bridgewater to argue the model was production-ready on day one.^[4]

Independent benchmarkers reached similar conclusions. Artificial Analysis placed the reasoning configuration at 46 on its Intelligence Index, top third of all models tracked, and rated its output speed of around 167 tokens per second as among the fastest reasoning models in the field.^[7] The LLM Stats database recorded the 1 million token context window and January 2025 knowledge cutoff and confirmed Google's pricing.^[8]

Some reviewers pushed back on the headline that Flash had "made the flagship obsolete." The argument they make is that the price differential matters most for very high-volume workloads, while Pro retains a meaningful lead on the hardest mathematics and coding problems. A widely circulated Medium piece argued that Flash had collapsed 69 percent of the Pro price for tasks where the quality gap was small to invisible, but conceded that Pro remained the right choice for frontier research applications.^[11]

Within developer communities the most common complaint has been the price increase compared to Gemini 2.5 Flash. The standard input rate of $0.50 per million tokens is 67 percent more expensive than the older Flash model, and teams that ran high-volume but quality-tolerant pipelines on 2.5 Flash had to redo their cost models.^[2] Google's counter-argument is that the 30 percent reduction in tokens used and the three times speed-up offset the per-token increase for most realistic workloads, but the math depends heavily on how reasoning-heavy a particular application is.^[1]

The "Fast" and "Thinking" mode split in the Gemini app drew mixed reviews. Power users praised the ability to dial reasoning effort up or down without switching to Pro, while casual users reported confusion over when to use each mode. Google's documentation describes Fast as the right choice for everyday questions and Thinking as the choice for complex problem-solving, with Pro reserved for advanced mathematics and coding.

Reports from partners using the enterprise preview were uniformly positive. Salesforce, Workday, Figma, Box, Harvey, Geotab, Warp, and Bridgewater each contributed quotes to Google's launch material.^[4] While these are vendor-curated testimonials, the breadth of named partners is unusual for a Flash-class launch and suggests Google had invested in pre-release enablement well before the public rollout.

Overall, the launch was seen as confirmation that the Gemini 3 series, started by Gemini 3 Pro one month earlier, had moved Google into a leadership position on the frontier price-performance frontier for the first half of 2026. Reviewers writing in early 2026 routinely cite Gemini 3 Flash as the default option for anyone building a new multimodal application without a strong reason to pick a competitor.

ELI5: Gemini 3 Flash in plain terms

Think of Google's Gemini 3 models as two cars built on the same engine. Gemini 3 Pro is the heavy, powerful one you take on the hardest mountain roads (the trickiest math and coding). Gemini 3 Flash is the lighter, much cheaper, much faster one you drive every day. It can still read pictures, listen to audio, watch video, and remember a huge amount at once (about a million words' worth), and you can tell it to "think harder" when a question is tough or "answer quickly" when it is easy. Because it costs about a quarter of what the Pro version costs and answers about three times faster, Google made it the default brain inside the Gemini app and Google's AI search answers.

References

Google, "Introducing Gemini 3 Flash: Benchmarks, global availability," blog.google/products/gemini/gemini-3-flash/, December 17, 2025. ↩
Maxwell Zeff, "Google launches Gemini 3 Flash, makes it the default model in the Gemini app," TechCrunch, December 17, 2025. ↩
Abner Li, "Google announces Gemini 3 Flash with Pro-level performance, rolling out now," 9to5Google, December 17, 2025. ↩
Google Cloud, "Gemini 3 Flash for enterprises," cloud.google.com/blog/products/ai-machine-learning/gemini-3-flash-for-enterprises, December 17, 2025. ↩
Google Workspace Updates, "Introducing Gemini 3 Flash for the Gemini app," workspaceupdates.googleblog.com/2025/12/gemini-3-flash-for-gemini-app.html, December 17, 2025. ↩
Google AI for Developers, "Gemini Developer API pricing," ai.google.dev/gemini-api/docs/pricing. ↩
Google, "Build with Gemini 3 Flash: frontier intelligence that scales with you," blog.google/technology/developers/build-with-gemini-3-flash/, December 17, 2025; Artificial Analysis, "Gemini 3 Flash Preview (Reasoning) Intelligence, Performance and Price Analysis," artificialanalysis.ai/models/gemini-3-flash-reasoning. ↩
Google AI for Developers, "Gemini 3 Developer Guide" (thinking_level minimal/low/medium/high, defaults to high) and "Models," ai.google.dev/gemini-api/docs/gemini-3; LLM Stats, "Gemini 3 Flash Benchmarks, Pricing and Context Window," llm-stats.com/models/gemini-3-flash-preview. ↩
Data Studios, "Google Gemini 3 Flash: release, technical profile, platform rollout, and more," datastudios.org, December 2025.
OpenRouter, "Gemini 3 Flash Preview API Pricing and Benchmarks," openrouter.ai/google/gemini-3-flash-preview.
Cogni Down Under, "Google's Gemini 3 Flash Just Made the Flagship Model Obsolete, even Gemini 3 Pro, and It Costs 69% Less," Medium, December 2025. ↩
Case Western Reserve University, "Google Releases Gemini 3 Flash," case.edu/utech/about/utech-news/google-releases-gemini-3-flash, December 2025.

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

2 revisions by 1 contributors · full history

Suggest edit

What links here

AI Model Release Timeline (2022-2026)Gemini (language model)Gemini 2.5 Flash Gemini 3.5 Flash Gemini CLI Google Vertex AI LegalBench SWE-Bench Pro

What is Gemini 3 Flash?

What can Gemini 3 Flash do?

How does the thinking budget work in Gemini 3 Flash?

How well does Gemini 3 Flash perform on benchmarks?

How much does Gemini 3 Flash cost?

How does Gemini 3 Flash compare to Gemini 3 Pro?

How does Gemini 3 Flash compare to other fast models?

What is Gemini 3 Flash used for?

How was Gemini 3 Flash received?

ELI5: Gemini 3 Flash in plain terms

See also

References

Improve this article

Related Articles

Gemini 3

Gemma 3

Gemini 3 Pro

Gemma 2

Llama 3.2

Pixtral

What links here

Related Articles

Gemini 3

Gemma 3

Gemini 3 Pro

Gemma 2

Llama 3.2

Pixtral

What links here