Gemini 3.5 Flash
Last reviewed: Jun 2, 2026
Sources: 13 citations
Review status: Source-backed
Revision: v1 · 1,727 words
Gemini 3.5 Flash is a fast frontier large language model developed by Google DeepMind, announced at Google I/O 2026 on May 19, 2026 and made generally available the same day [1][2]. It is the first model in the Gemini 3.5 family and the successor to Gemini 3 Flash, which shipped in December 2025 [3][4]. Google positioned the model around agentic execution and coding rather than conversational chat, describing it as the company's "most intelligent Flash model" and claiming it surpasses the larger Gemini 3.1 Pro on most coding and agentic evaluations while running about four times faster on output [1][5].
Gemini 3.5 Flash sits in the "Flash" tier of the Gemini line, the segment Google tunes for low latency and high throughput rather than maximum raw capability. With this release Google argued that the speed-optimized tier had caught up to, and on several axes overtaken, the previous flagship Pro model [1][6]. The pitch centered on long-horizon agentic work: tasks where a model plans, calls tools, writes and runs code, and iterates over many steps with limited human input [5].
The model is multimodal on input. It accepts text, images, video, audio, and PDF documents, and returns text [7]. It carries a one-million-token context window and supports tool use including Google Search grounding, code execution, file search, URL context, and function calling [8]. Notably, the computer-use capability available in some earlier Gemini releases is not supported in Gemini 3.5 Flash at launch [8].
Google unveiled Gemini 3.5 Flash during the Google I/O 2026 developer keynote on May 19, 2026, and shipped it to production the same day rather than as a staged preview [1][2]. Koray Kavukcuoglu, who leads Google DeepMind's model work, framed the launch around the shift from "AI as a conversational tool to AI as an agentic tool," and said the model offers "an incredible combination of quality and low latency" [2]. Tulsee Doshi noted the model could "run autonomously for multiple hours," pausing to check in with a person at decision points [2].
At launch the model carried the API identifier gemini-3.5-flash and replaced the earlier preview identifier used for Gemini 3 Flash [8]. Google described it as generally available, stable, and ready for scaled production use [9]. The same release also became the model behind consumer-facing products highlighted at I/O, including the Gemini app and AI Mode in Google Search [2].
Gemini 3.5 Flash is the opening model of the Gemini 3.5 generation. Its direct predecessor in the Flash tier is Gemini 3 Flash (released December 17, 2025), and the wider 3.x family includes Gemini 3 Pro and the intermediate Gemini 3.1 Pro that Google used as its comparison baseline at launch [3][4][6]. Google reported that 3.5 Flash outperforms Gemini 3.1 Pro on challenging coding and agentic benchmarks, which several outlets described as the first time a Flash-tier model had surpassed a Pro-tier model on those workloads [1][5].
A larger sibling, Gemini 3.5 Pro, was previewed as "rolling out next month," with reporting putting that timeline in June 2026 [1][9][10]. Some coverage noted audience disappointment that the Pro variant was not shipping at the event itself [10]. For background on the broader generation, see Gemini 3.
Google disclosed little about the underlying architecture or training data. The published model card and developer documentation describe the model's interface and behavior rather than its internals, and Google did not release parameter counts or training-corpus details [7][8]. The stated knowledge cutoff is January 2025 [7][9].
What Google did detail is the reasoning interface. Gemini 3.5 Flash uses a thinking_level parameter with values minimal, low, medium, and high, where medium is the default. This changed the default from the high setting used by Gemini 3 Flash, and Google said the low setting was retuned to be markedly stronger on code and agentic tasks [8]. The model also preserves intermediate reasoning across turns of a multi-turn conversation automatically, a behavior Google labels "thought preservation" [8].
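The reasoning-level setting can be illustrated with a minimal sketch. It uses only the level names, the default, and the model identifier given in this article; the `build_request` helper and the layout of the dictionary it returns are hypothetical and do not reproduce the actual Gemini API request schema.

```python
from typing import Optional

# Level names and default as described in the article.
THINKING_LEVELS = ("minimal", "low", "medium", "high")
DEFAULT_THINKING_LEVEL = "medium"
MODEL_ID = "gemini-3.5-flash"


def build_request(prompt: str, thinking_level: Optional[str] = None) -> dict:
    """Build an illustrative request payload (hypothetical layout, not the real API)."""
    # Fall back to the documented default when no level is given.
    level = DEFAULT_THINKING_LEVEL if thinking_level is None else thinking_level
    if level not in THINKING_LEVELS:
        raise ValueError(
            f"thinking_level must be one of {THINKING_LEVELS}, got {level!r}"
        )
    return {"model": MODEL_ID, "prompt": prompt, "thinking_level": level}


if __name__ == "__main__":
    print(build_request("Summarize this changelog."))
    print(build_request("Refactor this module.", thinking_level="low"))
```

Validating the level before sending a request is a design choice of this sketch; a real client would follow the parameter placement given in Google's developer documentation.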
The model accepts text, image, video, audio, and PDF input and produces text output [7]. Google lists its primary strengths as everyday tasks, agentic coding, advanced reasoning, multimodal understanding, and long-context understanding [7]. Tool capabilities include function calling, structured output, Google Search and Google Maps grounding, file search, URL context, and code execution [7][8].
The agentic framing was the centerpiece. Google described the model as able to execute long, multi-step workflows, and reporting cited early uptake among banks and fintechs automating multi-week processes [2][5]. On raw output speed, Kavukcuoglu said the model is "4x faster than other frontier models," and that Google had developed an optimized version that is "12x faster with the same quality" [2].
Google reported that Gemini 3.5 Flash leads Gemini 3.1 Pro across most coding, agentic, and multimodal evaluations, though it trails on two reasoning tests (Humanity's Last Exam and ARC-AGI-2) and on the 128k-token slice of the MRCR v2 long-context test. The table below lists scores published by Google and compiled in independent launch analyses, with Gemini 3.1 Pro as the comparison column [1][6][7].
| Benchmark | Category | Gemini 3.5 Flash | Gemini 3.1 Pro |
|---|---|---|---|
| Terminal-Bench 2.1 | Agentic coding | 76.2% | 70.3% |
| SWE-Bench Pro (public) | Coding | 55.1% | 54.2% |
| MCP Atlas | Multi-step tool use | 83.6% | 78.2% |
| Toolathlon | Real-world tool use | 56.5% | 49.4% |
| OSWorld-Verified | Computer tasks | 78.4% | 76.2% |
| Finance Agent v2 | Agentic finance | 57.9% | 43.0% |
| GDPval-AA | Agentic (Elo) | 1656 | 1314 |
| CharXiv Reasoning | Chart reasoning | 84.2% | 83.3% |
| MMMU-Pro | Multimodal reasoning | 83.6% | 80.5% |
| Blueprint-Bench 2 | Multimodal | 33.6% | 26.5% |
| MRCR v2 (128k) | Long context | 77.3% | 84.9% |
| MRCR v2 (1M) | Long context | 26.6% | 26.3% |
| Humanity's Last Exam | Reasoning | 40.2% | 44.4% |
| ARC-AGI-2 | Reasoning | 72.1% | 77.1% |
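The table can be tallied with a short script to show where each model leads. The scores are copied from the table above; the `split_by_leader` helper is a hypothetical name used only for illustration. GDPval-AA is an Elo rating rather than a percentage, so the sketch compares values only within a row.

```python
# (benchmark, Gemini 3.5 Flash, Gemini 3.1 Pro), copied from the table above.
SCORES = [
    ("Terminal-Bench 2.1", 76.2, 70.3),
    ("SWE-Bench Pro (public)", 55.1, 54.2),
    ("MCP Atlas", 83.6, 78.2),
    ("Toolathlon", 56.5, 49.4),
    ("OSWorld-Verified", 78.4, 76.2),
    ("Finance Agent v2", 57.9, 43.0),
    ("GDPval-AA", 1656, 1314),  # Elo rating, not a percentage
    ("CharXiv Reasoning", 84.2, 83.3),
    ("MMMU-Pro", 83.6, 80.5),
    ("Blueprint-Bench 2", 33.6, 26.5),
    ("MRCR v2 (128k)", 77.3, 84.9),
    ("MRCR v2 (1M)", 26.6, 26.3),
    ("Humanity's Last Exam", 40.2, 44.4),
    ("ARC-AGI-2", 72.1, 77.1),
]


def split_by_leader(scores):
    """Return (benchmarks where Flash leads, benchmarks where Flash trails)."""
    leads = [name for name, flash, pro in scores if flash > pro]
    trails = [name for name, flash, pro in scores if flash < pro]
    return leads, trails


leads, trails = split_by_leader(SCORES)
print(f"Flash leads on {len(leads)} of {len(SCORES)} listed benchmarks")
print("Flash trails on:", ", ".join(trails))
```

The tally treats any positive margin as a lead, so narrow differences such as the 1M-token MRCR v2 result count the same as wide ones.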
On the Artificial Analysis Intelligence Index, the high-effort configuration of Gemini 3.5 Flash scored 55 and measured roughly 173 output tokens per second [11]. Google highlighted four headline numbers from the model card: 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, 84.2% on CharXiv Reasoning, and an Elo of 1656 on GDPval-AA [1][7].
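As a rough illustration of what the measured throughput implies, the sketch below converts an output length into generation time. It assumes a constant rate of 173 tokens per second and ignores time to first token, network latency, and thinking overhead; the function name is hypothetical.

```python
# Output speed quoted above from Artificial Analysis (high-effort configuration).
MEASURED_TOKENS_PER_SECOND = 173.0


def estimate_generation_seconds(
    output_tokens: int,
    tokens_per_second: float = MEASURED_TOKENS_PER_SECOND,
) -> float:
    """Estimate streaming time for a response, assuming a constant output rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


for n in (500, 5_000, 50_000):
    print(f"{n:>6} tokens: about {estimate_generation_seconds(n):.1f} s")
```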
Gemini 3.5 Flash launched across Google's developer and enterprise surfaces as well as its consumer apps. It is available through the Google AI Studio interface and the Gemini API, in Android Studio, on Google Vertex AI and the Gemini Enterprise Agent Platform, in Google Antigravity, and to everyone in the Gemini app and AI Mode in Search [1][2][7].
The context window is 1,048,576 input tokens with a maximum of 65,536 output tokens [8][12]. Pricing is charged per million tokens; the listed non-global rates are 10% higher than the global rates, and batch processing is billed at half the global rate [12][13].
| Item | Rate (per 1M tokens) |
|---|---|
| Input (global) | $1.50 |
| Output (global) | $9.00 |
| Cached input | $0.15 |
| Cache storage | $1.00 per 1M token-hours |
| Input / output (non-global) | $1.65 / $9.90 |
| Batch input / output | $0.75 / $4.50 |
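The listed rates translate into per-request costs as in the following sketch. It uses the input and output rates from the table and the token limits stated above; the tier names and the `estimate_cost` helper are hypothetical, and cached-input and cache-storage charges are left out because the article does not describe how they combine with the other rates.

```python
# USD per 1M tokens as (input, output), taken from the pricing table above.
RATES = {
    "global": (1.50, 9.00),
    "non_global": (1.65, 9.90),
    "batch": (0.75, 4.50),
}

# Per-request limits stated in the article.
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "global") -> float:
    """Estimate the USD cost of one request, excluding any caching charges."""
    if tier not in RATES:
        raise ValueError(f"unknown tier {tier!r}; expected one of {sorted(RATES)}")
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError("input_tokens is outside the context window")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens exceeds the output limit")
    input_rate, output_rate = RATES[tier]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    for tier in RATES:
        cost = estimate_cost(200_000, 8_000, tier)
        print(f"{tier:>10}: ${cost:.4f}")
```

For a request with 200,000 input tokens and 8,000 output tokens, the global rate works out to $0.30 for input plus $0.072 for output, or $0.372 in total.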
These rates are higher than those of the prior generation. Simon Willison noted the model is about three times the price of Gemini 3 Flash Preview and six times that of Gemini 3.1 Flash-Lite, and interpreted the increase as the major labs probing how much their API customers will pay [12]. Despite the higher per-token cost, Google deployed the model broadly across free consumer products, which Willison read as an aggressive bet on reach [12].
Coverage of the launch split between the benchmark story and the pricing story. Outlets including TechCrunch and MarkTechPost emphasized Google's claim that a Flash-tier model now beat its own Pro-tier model on coding and agentic work, and the explicit reorientation toward agents over chatbots [2][5]. The model landed in the top-right quadrant of Artificial Analysis's intelligence-versus-speed chart, the region Google wanted to occupy [1][11].
Other commentary was cooler. Trending Topics characterized the release as "more of a solid incremental improvement than a milestone" and pointed out that the version number is 3.5, not 4, while flagging the "considerable" price increase over the predecessor [10]. The same report noted a muted early standing on community arena rankings and described audience disappointment that Gemini 3.5 Pro was delayed rather than shipped at the event [10].
Several limits were disclosed or evident at launch. Computer use is not supported, unlike in some earlier Gemini models [8]. The model's output is text only, so it cannot generate images or audio directly [7]. On a handful of evaluations it trails Gemini 3.1 Pro, including Humanity's Last Exam (40.2% versus 44.4%), ARC-AGI-2 (72.1% versus 77.1%), and the 128k-token slice of MRCR v2 (77.3% versus 84.9%), which suggests the gains were concentrated in agentic and coding workloads rather than spread across the board [6]. As with prior Gemini releases, Google did not publish architecture or training-data details, so independent verification of those aspects is not possible [7][8].