Gemini 2.0 Flash
Last reviewed
Jun 3, 2026
Sources
13 citations
Review status
Source-backed
Revision
v1 · 1,406 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 3, 2026
Sources
13 citations
Review status
Source-backed
Revision
v1 · 1,406 words
Add missing citations, update stale details, or suggest a clearer explanation.
Gemini 2.0 Flash is a multimodal large language model developed by Google DeepMind as part of the Gemini family. Google introduced it on December 11, 2024 as the first model in the Gemini 2.0 generation, pitching it as a "workhorse" model for what the company called the agentic era: low latency, broad multimodal support, native tool use, and the ability to generate images and audio directly rather than calling separate systems [1][2]. It launched first as an experimental preview and reached general availability on February 5, 2025 [3].
Gemini 2.0 Flash succeeded Gemini 1.5 Flash, the fast, cost-efficient tier of the previous generation. Google framed the new model as keeping the speed of 1.5 Flash while pushing quality upward, and it stated that 2.0 Flash "even outperforms 1.5 Pro on key benchmarks, at twice the speed" [1]. The launch was led by Google DeepMind CEO Demis Hassabis and CTO Koray Kavukcuoglu, who positioned the 2.0 family around three themes: multimodality, agentic capabilities, and speed [1][2].
The December 11 release was deliberately staged. Developers could begin testing an experimental build (gemini-2.0-flash-exp) immediately, while a chat-optimized version appeared in the consumer Gemini product the same day [1][4]. Broader general availability, additional sizes, and the more advanced output modalities were promised for early 2025 [1].
Gemini 2.0 Flash accepts text, images, audio, and video as input. Its defining feature relative to earlier Gemini models is native multimodal output: the model can produce text, generate images, and synthesize speech from a single model rather than routing requests to a separate image or text-to-speech system [1][2].
At announcement, Google described two new output modes. The first was native image generation, in which the model interleaves generated images with text and supports multi-turn conversational editing, where a user can refine an image across successive prompts [1][5]. The second was steerable, multilingual text-to-speech audio, with Google citing a set of high-quality voices spanning multiple languages [1][2]. All generated images and audio carry an invisible SynthID watermark for provenance [1].
These output features were restricted to early-access partners at launch rather than being broadly available. Google opened native image generation to general developer experimentation through Google AI Studio and the Gemini API on March 12, 2025 [5][6]. The image, audio, and live-streaming capabilities were described as reaching wider availability "in the coming months" after the February general-availability milestone, which initially shipped with text output [3][7].
Gemini 2.0 Flash can call tools on its own as part of generating a response. Supported tools include Google Search, code execution, and developer-defined functions via function calling [2][3]. Google noted that the model can issue multiple searches in parallel to gather and combine information, which it presented as a building block for agent-style applications [2]. Alongside the model, Google released a Multimodal Live API that handles real-time audio and video streaming input, with support for natural conversational patterns such as interruptions and voice-activity detection [2].
Gemini 2.0 Flash provides a context window of 1 million tokens for input [3][7]. That matched the entry tier of the Gemini 1.5 line and was carried over to the budget-oriented Gemini 2.0 Flash-Lite. The larger Gemini 2.0 Pro Experimental, released the same day as the Flash general-availability launch, doubled this to 2 million tokens [3][7].
Google reported that Gemini 2.0 Flash improved on Gemini 1.5 Flash across a range of academic benchmarks and beat the larger Gemini 1.5 Pro on several of them while running roughly twice as fast [1]. The figures below are drawn from Google's published comparison for the experimental release. Where a directly comparable 1.5 Pro score was published, it is shown for reference [8][9].
| Benchmark | Capability | Gemini 2.0 Flash | Gemini 1.5 Pro |
|---|---|---|---|
| MMLU-Pro | General knowledge and reasoning | 76.4% | 75.8% |
| Natural2Code | Code generation | 92.9% | 85.4% |
| MATH | Mathematics | 89.7% | not published in set |
| MMMU | Multimodal (image) reasoning | 70.7% | not published in set |
| FACTS Grounding | Factual grounding | 83.6% | not published in set |
| CoVoST2 | Audio translation | 71.5% | not published in set |
Independent reporting on the experimental model corroborated the headline MMLU-Pro, Natural2Code, MATH, MMMU, and FACTS Grounding figures [8][9]. Google did not publish a full 1.5 Pro column for every benchmark in the launch comparison, so several reference cells are left blank rather than estimated.
A separate reasoning-focused variant, Gemini 2.0 Flash Thinking Experimental, was released on December 19, 2024. Built on the 2.0 Flash base, it exposes a summary of its step-by-step reasoning before answering and was widely compared to OpenAI's o1 family of reasoning models [10].
During the experimental phase, developers reached Gemini 2.0 Flash through the Gemini API in Google AI Studio and in Vertex AI [1][2]. On the consumer side, a chat-optimized version became selectable in the model drop-down of the Gemini app on desktop and mobile web from December 11, 2024, for both standard and Advanced users, with the mobile app following shortly after [1][4]. According to Wikipedia's account of the rollout, the model became the default in the Gemini app on January 30, 2025 [11].
At general availability on February 5, 2025, Google said 2.0 Flash shipped with higher rate limits, stronger performance, and simplified pricing across Google AI Studio, Vertex AI, and the Gemini app [3][7]. The general-availability pricing was set at $0.10 per million input tokens and $0.40 per million output tokens, replacing the earlier split between short and long context tiers [7][12].
Google presented Gemini 2.0 Flash as the engine behind several research prototypes shown at the December 2024 launch. Project Astra is a research effort toward a universal AI assistant that uses the model's real-time multimodal understanding through a phone's camera and microphone [1]. Project Mariner is an experimental agent that operates inside the Chrome browser to complete tasks on a user's behalf; Google reported that it reached a state-of-the-art result of 83.5% on the WebVoyager benchmark for web task completion [1]. A coding agent prototype, Jules, was also introduced for developers [1].
The Gemini 2.0 family grew over the following weeks. On February 5, 2025, Google released Gemini 2.0 Flash-Lite into public preview as its most cost-efficient model, offering better quality than 1.5 Flash at the same speed and cost, with the same 1 million token context window [3][7]. The same announcement brought Gemini 2.0 Pro Experimental, aimed at coding and complex prompts, with a 2 million token context window and tool calling for Google Search and code execution [3][7].
The next generation arrived with Gemini 2.5. Google released Gemini 2.5 Pro Experimental on March 25, 2025, a model that built reasoning into the architecture while retaining native multimodality, and the line later became the recommended migration target for 2.0 users [11]. Google subsequently scheduled the gemini-2.0-flash and gemini-2.0-flash-lite identifiers for shutdown on June 1, 2026, directing developers toward Gemini 2.5 Pro and the 2.5 Flash tiers [13].