Gemini 3.5 Flash

Google DeepMind Large Language Models Multimodal AI

9 min read

Updated Jun 2, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 2, 2026

Fact-checked

In review queue

Sources

13 citations

Revision

v1 · 1,727 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Gemini 3.5 Flash is a fast frontier large language model developed by Google DeepMind, announced at Google I/O 2026 on May 19, 2026 and made generally available the same day ^[1]^[2]. It is the first model in the Gemini 3.5 family and the successor to Gemini 3 Flash, which shipped in December 2025 ^[3]^[4]. Google positioned the model around agentic execution and coding rather than conversational chat, describing it as the company's "most intelligent Flash model" and claiming it surpasses the larger Gemini 3.1 Pro on most coding and agentic evaluations while running about four times faster on output ^[1]^[5].

Overview

Gemini 3.5 Flash sits in the "Flash" tier of the Gemini line, the segment Google tunes for low latency and high throughput rather than maximum raw capability. With this release Google argued that the speed-optimized tier had caught up to, and on several axes overtaken, the previous flagship Pro model ^[1]^[6]. The pitch centered on long-horizon agentic work: tasks where a model plans, calls tools, writes and runs code, and iterates over many steps with limited human input ^[5].

The model is multimodal on input. It accepts text, images, video, audio, and PDF documents, and returns text ^[7]. It carries a one-million-token context window and supports tool use including Google Search grounding, code execution, file search, URL context, and function calling ^[8]. Notably, the computer-use capability available in some earlier Gemini releases is not supported in Gemini 3.5 Flash at launch ^[8].

Announcement and release

Google unveiled Gemini 3.5 Flash during the Google I/O 2026 developer keynote on May 19, 2026, and shipped it to production the same day rather than as a staged preview ^[1]^[2]. Koray Kavukcuoglu, who leads Google DeepMind's model work, framed the launch around the shift from "AI as a conversational tool to AI as an agentic tool," and said the model offers "an incredible combination of quality and low latency" ^[2]. Tulsee Doshi noted the model could "run autonomously for multiple hours," pausing to check in with a person at decision points ^[2].

At launch the model carried the API identifier gemini-3.5-flash and replaced the earlier preview identifier used for Gemini 3 Flash ^[8]. Google described it as generally available, stable, and ready for scaled production use ^[9]. The same release powered consumer-facing features announced at I/O, including the model behind the Gemini app and AI Mode in Google Search ^[2].

Place in the Gemini line

Gemini 3.5 Flash is the opening model of the Gemini 3.5 generation. Its direct predecessor in the Flash tier is Gemini 3 Flash (released December 17, 2025), and the wider 3.x family includes Gemini 3 Pro and the intermediate Gemini 3.1 Pro that Google used as its comparison baseline at launch ^[3]^[4]^[6]. Google reported that 3.5 Flash outperforms Gemini 3.1 Pro on challenging coding and agentic benchmarks, which several outlets described as the first time a Flash-tier model had surpassed a Pro-tier model on those workloads ^[1]^[5].

A larger sibling, Gemini 3.5 Pro, was previewed as "rolling out next month," with reporting putting that timeline in June 2026 ^[1]^[9]^[10]. Some coverage noted audience disappointment that the Pro variant was not shipping at the event itself ^[10]. For background on the broader generation, see Gemini 3.

Architecture and training

Google disclosed little about the underlying architecture or training data. The published model card and developer documentation describe the model's interface and behavior rather than its internals, and Google did not release parameter counts or training-corpus details ^[7]^[8]. The stated knowledge cutoff is January 2025 ^[7]^[9].

What Google did detail is the reasoning interface. Gemini 3.5 Flash uses a thinking_level parameter with values minimal, low, medium, and high, where medium is the default. This changed the default from the high setting used by Gemini 3 Flash, and Google said the low setting was retuned to be markedly stronger on code and agentic tasks ^[8]. The model also preserves intermediate reasoning across turns of a multi-turn conversation automatically, a behavior Google labels "thought preservation" ^[8].

Capabilities and modalities

The model accepts text, image, video, audio, and PDF input and produces text output ^[7]. Google lists its primary strengths as everyday tasks, agentic coding, advanced reasoning, multimodal understanding, and long-context understanding ^[7]. Tool capabilities include function calling, structured output, Google Search and Google Maps grounding, file search, URL context, and code execution ^[7]^[8].

The agentic framing was the centerpiece. Google described the model as able to execute long, multi-step workflows, and reporting cited early uptake among banks and fintechs automating multi-week processes ^[2]^[5]. On raw output speed, Kavukcuoglu said the model is "4x faster than other frontier models," and that Google had developed an optimized version that is "12x faster with the same quality" ^[2].

Benchmark performance

Google reported that Gemini 3.5 Flash leads Gemini 3.1 Pro across most coding, agentic, and multimodal evaluations, though it trails on a few reasoning and long-context tests. The table below lists scores published by Google and compiled in independent launch analyses; comparison figures are versus Gemini 3.1 Pro ^[1]^[7]^[6].

Benchmark	Category	Gemini 3.5 Flash	Gemini 3.1 Pro
Terminal-Bench 2.1	Agentic coding	76.2%	70.3%
SWE-Bench Pro (public)	Coding	55.1%	54.2%
MCP Atlas	Multi-step tool use	83.6%	78.2%
Toolathlon	Real-world tool use	56.5%	49.4%
OSWorld-Verified	Computer tasks	78.4%	76.2%
Finance Agent v2	Agentic finance	57.9%	43.0%
GDPval-AA	Agentic (Elo)	1656	1314
CharXiv Reasoning	Chart reasoning	84.2%	83.3%
MMMU-Pro	Multimodal reasoning	83.6%	80.5%
Blueprint-Bench 2	Multimodal	33.6%	26.5%
MRCR v2 (128k)	Long context	77.3%	84.9%
MRCR v2 (1M)	Long context	26.6%	26.3%
Humanity's Last Exam	Reasoning	40.2%	44.4%
ARC-AGI-2	Reasoning	72.1%	77.1%

On the Artificial Analysis Intelligence Index, the high-effort configuration of Gemini 3.5 Flash scored 55 and measured roughly 173 output tokens per second ^[11]. Google highlighted four headline numbers from the model card: 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, 84.2% on CharXiv Reasoning, and an Elo of 1656 on GDPval-AA ^[1]^[7].

Availability, context window, and pricing

Gemini 3.5 Flash launched across Google's developer and enterprise surfaces as well as its consumer apps. It is available through the Google AI Studio interface and the Gemini API, in Android Studio, on Google Vertex AI and the Gemini Enterprise Agent Platform, in Google Antigravity, and to everyone in the Gemini app and AI Mode in Search ^[1]^[2]^[7].

The context window is 1,048,576 input tokens with a maximum of 65,536 output tokens ^[8]^[12]. Pricing is volume-based per million tokens; rates differ slightly between Google's global and non-global serving regions, and batch processing is offered at a discount ^[12]^[13].

Item	Rate (per 1M tokens)
Input (global)	$1.50
Output (global)	$9.00
Cached input	$0.15
Cache storage	$1.00 per 1M token-hours
Input / output (non-global)	$1.65 / $9.90
Batch input / output	$0.75 / $4.50

These rates are higher than the prior generation. Simon Willison noted the model is about three times the price of Gemini 3 Flash Preview and six times that of Gemini 3.1 Flash-Lite, and read the increase as the major labs probing how much their API customers will pay ^[12]. Despite the higher per-token cost, Google deployed the model broadly across free consumer products, which Willison read as an aggressive bet on reach ^[12].

Reception

Coverage of the launch split between the benchmark story and the pricing story. Outlets including TechCrunch and MarkTechPost emphasized Google's claim that a Flash-tier model now beat its own Pro-tier model on coding and agentic work, and the explicit reorientation toward agents over chatbots ^[2]^[5]. The model landed in the top-right quadrant of Artificial Analysis's intelligence-versus-speed chart, the region Google wanted to occupy ^[1]^[11].

Other commentary was cooler. Trending Topics characterized the release as "more of a solid incremental improvement than a milestone" and pointed out that the version number is 3.5, not 4, while flagging the "considerable" price increase over the predecessor ^[10]. The same report noted a muted early standing on community arena rankings and described audience disappointment that Gemini 3.5 Pro was delayed rather than shipped at the event ^[10].

Limitations

Several limits were disclosed or evident at launch. Computer Use is not supported, unlike some earlier Gemini models ^[8]. The model output is text only, so it cannot generate images or audio directly ^[7]. On a handful of evaluations it trails Gemini 3.1 Pro, including Humanity's Last Exam (40.2% versus 44.4%), ARC-AGI-2 (72.1% versus 77.1%), and the 128k-token slice of MRCR v2 (77.3% versus 84.9%), which suggests the gains concentrated in agentic and coding workloads rather than across the board ^[6]. As with prior Gemini releases, Google did not publish architecture or training-data details, so independent verification of those aspects is not possible ^[7]^[8].

References

Google DeepMind. "Gemini 3.5: frontier intelligence with action." blog.google, May 19, 2026. https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/ ↩
Maxwell Zeff. "With Gemini 3.5 Flash, Google bets its next AI wave on agents, not chatbots." TechCrunch, May 19, 2026. https://techcrunch.com/2026/05/19/with-gemini-3-5-flash-google-bets-its-next-ai-wave-on-agents-not-chatbots/ ↩
DataNorth AI. "Google Releases Gemini 3.5 Flash: Frontier-Level Coding and Agentic Performance at 4x Speed." https://datanorth.ai/news/google-releases-gemini-3-5-flash ↩
Digital Applied. "Gemini 3.5 Flash: Benchmarks, Thinking & API Guide 2026." https://www.digitalapplied.com/blog/gemini-3-5-flash-benchmarks-api-guide ↩
MarkTechPost. "Google Introduces Gemini 3.5 Flash at I/O 2026: A Faster and Cheaper Model for AI Agents and Coding." May 20, 2026. https://www.marktechpost.com/2026/05/20/google-introduces-gemini-3-5-flash-at-i-o-2026-a-faster-and-cheaper-model-for-ai-agents-and-coding/ ↩
LLM Stats. "Gemini 3.5 Flash: Benchmarks, Pricing, and Complete Specs." https://llm-stats.com/blog/research/gemini-3.5-flash-launch ↩
Google DeepMind. "Gemini 3.5 Flash" (model card). https://deepmind.google/models/gemini/flash/ ↩
Google. "What's new in Gemini 3.5 Flash." Gemini API documentation. https://ai.google.dev/gemini-api/docs/interactions/whats-new-gemini-3.5 ↩
Google. "Gemini 3 Developer Guide." Gemini API documentation. https://ai.google.dev/gemini-api/docs/gemini-3 ↩
Trending Topics. "Google Launches Gemini 3.5 Flash With Higher Prices but No Generational Leap." https://www.trendingtopics.eu/google-launches-gemini-3-5-flash-with-higher-prices-but-no-generational-leap/ ↩
Artificial Analysis. "Gemini 3.5 Flash: Intelligence, Performance & Price Analysis." https://artificialanalysis.ai/models/gemini-3-5-flash ↩
Simon Willison. "Gemini 3.5 Flash: more expensive, but Google plan to use it for everything." May 19, 2026. https://simonwillison.net/2026/May/19/gemini-35-flash/ ↩
OpenRouter. "Gemini 3.5 Flash - API Pricing & Benchmarks." https://openrouter.ai/google/gemini-3.5-flash ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

Suggest edit

What links here

AI Model Release Timeline (2022-2026)Best AI Writing Tools Gemini Spark Gemini vs ChatGPT LLM API Pricing Comparison

Overview

Announcement and release

Place in the Gemini line

Architecture and training

Capabilities and modalities

Benchmark performance

Availability, context window, and pricing

Reception

Limitations

References

Improve this article

Related Articles

Gemini 3

Gemma 3

Gemini 2.5 Flash

Gemini 3.1 Pro

Gemini 1.5 Pro

Gemini 1.5 Flash

What links here

Related Articles

Gemini 3

Gemma 3

Gemini 2.5 Flash

Gemini 3.1 Pro

Gemini 1.5 Pro

Gemini 1.5 Flash

What links here