Gemini 2.0 Flash
Last reviewed
Sources
15 citations
Review status
Source-backed
Revision
v2 · 1,770 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Sources
15 citations
Review status
Source-backed
Revision
v2 · 1,770 words
Add missing citations, update stale details, or suggest a clearer explanation.
Gemini 2.0 Flash is a fast, low-cost multimodal large language model built by Google DeepMind as the flagship workhorse of the Gemini 2.0 generation, designed for the agentic era with native tool use, a 1 million token context window, and the ability to generate images and audio directly. Google introduced it on December 11, 2024 and stated that the model "even outperforms 1.5 Pro on key benchmarks, at twice the speed" while costing far less to run [1]. It launched first as an experimental preview (gemini-2.0-flash-exp) and reached general availability on February 5, 2025, priced at $0.10 per million input tokens and $0.40 per million output tokens [1][3][7].
Gemini 2.0 Flash is distinguished from earlier Gemini models by native multimodal output: a single model accepts text, images, audio, and video as input and can return text, generated images, and synthesized speech without routing to separate systems. It powers Google's agentic prototypes, including Project Astra and Project Mariner, and was succeeded by the reasoning-native Gemini 2.5 line in March 2025.
Gemini 2.0 Flash succeeded Gemini 1.5 Flash, the fast, cost-efficient tier of the previous generation. Google framed the new model as keeping the speed of 1.5 Flash while pushing quality upward, and it stated that 2.0 Flash "even outperforms 1.5 Pro on key benchmarks, at twice the speed" [1]. The launch was led by Google DeepMind CEO Demis Hassabis and CTO Koray Kavukcuoglu, who positioned the 2.0 family around three themes: multimodality, agentic capabilities, and speed [1][2]. Announcing the release, Google described the goal in terms of agents, writing that "with new advances in multimodality, like native image and audio output, and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant" [1].
The December 11 release was deliberately staged. Developers could begin testing an experimental build (gemini-2.0-flash-exp) immediately, while a chat-optimized version appeared in the consumer Gemini product the same day [1][4]. Broader general availability, additional sizes, and the more advanced output modalities were promised for early 2025 [1].
Gemini 2.0 Flash accepts text, images, audio, and video as input. Its defining feature relative to earlier Gemini models is native multimodal output: the model can produce text, generate images, and synthesize speech from a single model rather than routing requests to a separate image or text-to-speech system [1][2].
At announcement, Google described two new output modes. The first was native image generation, in which the model interleaves generated images with text and supports multi-turn conversational editing, where a user can refine an image across successive prompts [1][5]. The second was steerable, multilingual text-to-speech audio, with Google citing a set of high-quality voices spanning multiple languages [1][2]. All generated images and audio carry an invisible SynthID watermark for provenance [1].
These output features were restricted to early-access partners at launch rather than being broadly available. Google opened native image generation to general developer experimentation through Google AI Studio and the Gemini API on March 12, 2025 [5][6]. The image, audio, and live-streaming capabilities were described as reaching wider availability "in the coming months" after the February general-availability milestone, which initially shipped with text output [3][7].
Gemini 2.0 Flash can call tools on its own as part of generating a response. Supported tools include Google Search, code execution, and developer-defined functions via function calling [2][3]. Google noted that the model can issue multiple searches in parallel to gather and combine information, which it presented as a building block for agent-style applications [2]. Alongside the model, Google released a Multimodal Live API that handles real-time audio and video streaming input, with support for natural conversational patterns such as interruptions and voice-activity detection [2].
Gemini 2.0 Flash provides a context window of 1 million tokens for input [3][7]. That matched the entry tier of the Gemini 1.5 line and was carried over to the budget-oriented Gemini 2.0 Flash-Lite. The larger Gemini 2.0 Pro Experimental, released the same day as the Flash general-availability launch, doubled this to 2 million tokens [3][7].
Google reported that Gemini 2.0 Flash improved on Gemini 1.5 Flash across a range of academic benchmarks and beat the larger Gemini 1.5 Pro on several of them while running roughly twice as fast [1]. The figures below are drawn from Google's published comparison for the experimental release. Where a directly comparable 1.5 Pro score was published, it is shown for reference [8][9].
| Benchmark | Capability | Gemini 2.0 Flash | Gemini 1.5 Pro |
|---|---|---|---|
| MMLU-Pro | General knowledge and reasoning | 76.4% | 75.8% |
| Natural2Code | Code generation | 92.9% | 85.4% |
| MATH | Mathematics | 89.7% | not published in set |
| MMMU | Multimodal (image) reasoning | 70.7% | not published in set |
| FACTS Grounding | Factual grounding | 83.6% | not published in set |
| CoVoST2 | Audio translation | 71.5% | not published in set |
Independent reporting on the experimental model corroborated the headline MMLU-Pro, Natural2Code, MATH, MMMU, and FACTS Grounding figures [8][9]. Google did not publish a full 1.5 Pro column for every benchmark in the launch comparison, so several reference cells are left blank rather than estimated.
A separate reasoning-focused variant, Gemini 2.0 Flash Thinking Experimental, was released on December 19, 2024. Built on the 2.0 Flash base, it exposes a summary of its step-by-step reasoning before answering and was widely compared to OpenAI's o1 family of reasoning models [10]. Jeff Dean, Google DeepMind's chief scientist, described it as "an experimental model that explicitly shows its thoughts" and "trained to use thoughts to strengthen its reasoning" [14].
During the experimental phase, developers reached Gemini 2.0 Flash through the Gemini API in Google AI Studio and in Vertex AI [1][2]. On the consumer side, a chat-optimized version became selectable in the model drop-down of the Gemini app on desktop and mobile web from December 11, 2024, for both standard and Advanced users, with the mobile app following shortly after [1][4]. According to Wikipedia's account of the rollout, the model became the default in the Gemini app on January 30, 2025 [11].
At general availability on February 5, 2025, Google described 2.0 Flash as "our highly efficient workhorse model for developers with low latency and enhanced performance" and said it shipped with higher rate limits, stronger performance, and simplified pricing across Google AI Studio, Vertex AI, and the Gemini app [3][7]. The general-availability pricing was set at $0.10 per million input tokens and $0.40 per million output tokens, replacing the earlier split between short and long context tiers [7][12]. The table below summarizes the launch pricing for the 2.0 Flash tiers.
| Model | Status on Feb 5, 2025 | Input ($/M tokens) | Output ($/M tokens) | Context window |
|---|---|---|---|---|
| Gemini 2.0 Flash | Generally available | $0.10 | $0.40 | 1M tokens |
| Gemini 2.0 Flash-Lite | Public preview | $0.075 | $0.30 | 1M tokens |
| Gemini 2.0 Pro Experimental | Experimental | not priced at launch | not priced at launch | 2M tokens |
Google presented Gemini 2.0 Flash as the engine behind several research prototypes shown at the December 2024 launch. Project Astra is a research effort toward a universal AI assistant that uses the model's real-time multimodal understanding through a phone's camera and microphone [1]. Project Mariner is an experimental agent that operates inside the Chrome browser to complete tasks on a user's behalf; Google reported that it reached a state-of-the-art result of 83.5% on the WebVoyager benchmark, which evaluates agents on end-to-end real-world web tasks [1]. A coding agent prototype, Jules, was also introduced for developers [1].
The Gemini 2.0 family grew over the following weeks. On February 5, 2025, Google released Gemini 2.0 Flash-Lite into public preview as its most cost-efficient model, offering better quality than 1.5 Flash at the same speed and cost, with the same 1 million token context window [3][7]. Flash-Lite then reached general availability for production use in Google AI Studio and Vertex AI on February 25, 2025 [15]. The February 5 announcement also brought Gemini 2.0 Pro Experimental, aimed at coding and complex prompts, with a 2 million token context window and tool calling for Google Search and code execution [3][7].
The next generation arrived with Gemini 2.5. Google released Gemini 2.5 Pro Experimental on March 25, 2025, a model that built reasoning into the architecture while retaining native multimodality, and the line later became the recommended migration target for 2.0 users [11]. Google subsequently scheduled the gemini-2.0-flash and gemini-2.0-flash-lite identifiers for shutdown on June 1, 2026, directing developers toward Gemini 2.5 Pro and the 2.5 Flash tiers [13].