Gemini 2.0 Flash Thinking
Last reviewed
Jun 3, 2026
Sources
12 citations
Review status
Source-backed
Revision
v1 · 1,305 words
Gemini 2.0 Flash Thinking is an experimental reasoning model released by Google as part of the Gemini 2.0 family. First made available on December 19, 2024, it was Google's first model trained to generate an explicit "thinking" process before producing a final answer, a technique aimed at improving accuracy on multi-step problems in mathematics, science, and coding [1][2]. Built on top of Gemini 2.0 Flash, it was offered as a free experimental option in Google AI Studio and through the Gemini API, and it was widely viewed as Google's answer to OpenAI's o1 family of reasoning models [1][3]. The model later served as a precursor to the thinking capabilities that Google DeepMind built natively into the Gemini 2.5 generation [4].
By late 2024, several AI labs were exploring "inference-time" or "test-time" scaling, in which a model spends additional computation reasoning through a problem before answering rather than producing a response in a single pass. OpenAI's o1, released earlier in 2024, was the most prominent example. Google introduced Gemini 2.0 Flash Thinking on December 19, 2024, the same month it unveiled the broader Gemini 2.0 line, positioning it as its first entrant in this class of reasoning models [1][3]. Jeff Dean, Google's chief scientist, described it as a model "trained to use thoughts to strengthen its reasoning" [2].
The initial release carried the model identifier gemini-2.0-flash-thinking-exp-1219 and was clearly labeled experimental, with Google cautioning that it was an early version [2][5]. Because it was derived from Gemini 2.0 Flash, it inherited that model's multimodal input support, including the ability to accept images alongside text [5].
The defining feature of Gemini 2.0 Flash Thinking was that, given a prompt, it paused to work through the problem, considered intermediate steps and alternatives, and "explained" its reasoning before summarizing what it judged to be the most accurate answer [1][2]. Rather than returning only a final response, it exposed the chain of intermediate reasoning, which Google argued made the process more transparent and helped the model catch its own mistakes [2][6].
This reasoning was surfaced to users in two ways. In Google AI Studio, selecting the model enabled a dedicated "Thoughts" panel that displayed how the model broke down and worked through a task [6]. Through the Gemini API, the thinking content was returned as part of the response, appearing as the first element of the response content, so developers could inspect the thought summaries to understand how the model arrived at its conclusion [6].
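The API behaviour described above, with the thinking content arriving as the first element of the response content, can be illustrated with a minimal sketch. It does not use the Gemini SDK: the list-of-dicts layout, the `"text"` key, and the names `SplitResponse` and `split_thoughts` are all hypothetical stand-ins chosen for illustration, and the example strings are invented.

```python
from typing import NamedTuple


class SplitResponse(NamedTuple):
    thoughts: str
    answer: str


def split_thoughts(parts: list[dict]) -> SplitResponse:
    """Separate a leading thinking part from the remaining answer parts.

    Assumption: `parts` is a hypothetical list of {"text": ...} dicts in which
    the first element holds the model's thinking and the rest hold the answer.
    This mirrors the ordering described in the article, not a real SDK type.
    """
    if not parts:
        return SplitResponse(thoughts="", answer="")
    if len(parts) == 1:
        # Only one part: treat it as the answer, with no visible thoughts.
        return SplitResponse(thoughts="", answer=parts[0].get("text", ""))
    thoughts = parts[0].get("text", "")
    answer = "".join(p.get("text", "") for p in parts[1:])
    return SplitResponse(thoughts=thoughts, answer=answer)


# Invented example content, shaped like the hypothetical layout above.
example = [
    {"text": "Spell it out: s-t-r-a-w-b-e-r-r-y. The letter r is at positions 3, 8, 9."},
    {"text": "There are 3 letters 'r' in 'strawberry'."},
]
result = split_thoughts(example)
print(result.answer)
```

Keeping the thoughts and the answer in separate fields lets a caller log or display the reasoning without mixing it into the text shown as the final response.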
Because it generated additional reasoning tokens before answering, the model produced longer outputs and took more time to respond than the base Flash model, with latencies ranging from seconds to minutes depending on the prompt [1]. It was also still error-prone in its early form. Contemporary coverage noted it could fail simple tasks, for instance miscounting the number of "R" letters in the word "strawberry," and could occasionally emit slightly malformed output [1][5].
Google shipped two main experimental versions. The original December release had a 32,000-token context window; the January update expanded it to one million tokens and added native code execution.
| Version | Model ID | Released | Context window | Notable additions |
|---|---|---|---|---|
| Initial | gemini-2.0-flash-thinking-exp-1219 | Dec 19, 2024 | 32,000 tokens | First public reasoning model from Google [1][5][7] |
| Updated | gemini-2.0-flash-thinking-exp-01-21 | Jan 21, 2025 | 1,000,000 tokens | Native code execution; stronger benchmarks; fewer contradictions between reasoning and answer [7][8][9] |
The January 21, 2025 update increased the context window from 32,000 tokens to one million tokens, allowing the model to ingest large inputs such as an entire codebase or a collection of research papers [7][8]. It added native code execution as a tool, letting the model write and run code during its reasoning, and Google reported reduced instances of the model contradicting itself between its intermediate reasoning and its final answer [7][9]. The experimental Flash Thinking models remained free to test in Google AI Studio and via the API throughout this period [3][8].
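To make the difference between the two context windows concrete, the following sketch checks whether an input would plausibly fit in each version. The window sizes come from the table above; everything else is an assumption for illustration. In particular, the four-characters-per-token ratio is a rough heuristic, not the model's real tokenizer, and `estimate_tokens` and `fits_in_context` are hypothetical helper names.

```python
# Context windows as listed in the version table.
CONTEXT_WINDOWS = {
    "gemini-2.0-flash-thinking-exp-1219": 32_000,
    "gemini-2.0-flash-thinking-exp-01-21": 1_000_000,
}

# Assumption: a crude average of 4 characters per token. Real token counts
# depend on the tokenizer and the text, so treat this as an estimate only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` using ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(model_id: str, text: str) -> bool:
    """Return True if the estimated token count is within the model's window."""
    return estimate_tokens(text) <= CONTEXT_WINDOWS[model_id]


# A 400,000-character input is roughly 100,000 tokens under this heuristic.
document = "x" * 400_000
for model_id in CONTEXT_WINDOWS:
    print(model_id, fits_in_context(model_id, document))
```

Under these assumptions the input exceeds the December version's window but fits comfortably in the January version's. The estimate covers only the input; the reasoning and answer tokens the model generates are not accounted for here.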
Google reported gains across math, science, and multimodal benchmarks between the December and January versions. The figures below reflect Google's reported results for the two experimental releases as cited in contemporary coverage.
| Benchmark | Domain | Exp-1219 (Dec 2024) | Exp-01-21 (Jan 2025) |
|---|---|---|---|
| AIME 2024 | Mathematics | about 70% | 73.3% [8][9] |
| GPQA Diamond | Science | about 66% | 74.2% [8][9] |
| MMMU | Multimodal reasoning | not reported here | 75.4% [7][9] |
Press coverage framed the model's appeal partly in terms of cost and capacity. VentureBeat noted that the free, one-million-token model contrasted sharply with paid premium reasoning offerings, handling far more context than some competing reasoning products while keeping response times relatively fast [3]. The benchmark figures above came from Google itself and should be read as self-reported.
Gemini 2.0 Flash Thinking was released as an experimental model rather than a generally available production one. It could be selected from the model dropdown in Google AI Studio and called through the Gemini API using identifiers such as gemini-2.0-flash-thinking-exp or the dated variants gemini-2.0-flash-thinking-exp-1219 and gemini-2.0-flash-thinking-exp-01-21 [6][8]. The experimental tier was free but rate limited, with the December version constrained to a small number of requests per minute and per day [10].
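Because the experimental tier was rate limited, a client calling it would typically need to throttle its own requests. The sketch below shows one common way to do that, a sliding-window limiter. It is a generic technique rather than anything Google prescribed, the class name `SlidingWindowLimiter` is hypothetical, and the limit of two requests per 60 seconds is a placeholder for demonstration, not the actual quota.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of accepted requests, oldest first

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` if the window has room for it."""
        # Drop timestamps that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False


# Placeholder limit for demonstration only; not the real experimental quota.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60.0)
decisions = [limiter.try_acquire(t) for t in (0.0, 1.0, 2.0, 61.0)]
print(decisions)  # [True, True, False, True]
```

Passing the clock value in as `now` keeps the example deterministic; a real client would supply `time.monotonic()` and could run a second limiter with a one-day window to respect a daily cap.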
Being experimental, it carried no stability guarantees. Google later folded thinking into its mainline lineup, and the standalone Flash Thinking experimental models were retired as the Gemini 2.5 generation and its successors took over reasoning duties [4][11].
Gemini 2.0 Flash Thinking was the direct ancestor of the reasoning approach that defined the next generation. When Google announced Gemini 2.5 Pro on March 25, 2025, it explicitly referred back to the earlier model, stating, "we recently introduced our first thinking model, Gemini 2.0 Flash Thinking" [4]. The key shift was that thinking moved from a separate experimental variant to a capability built into the models themselves. Google said it was "building these thinking capabilities directly into all of our models," and described Gemini 2.5 as a family of "thinking models" that reason through problems before responding by default [4].
Where Gemini 2.0 Flash Thinking was strongest in mathematics and coding, the Gemini 2.5 series generalized native thinking across domains and combined it with multimodal input and long context windows, while the underlying training recipe evolved from the original experiment [4][12]. The Gemini API documentation for thinking later centered on the Gemini 2.5 and 3 series as the production thinking models, with the 2.0 experimental version no longer the recommended path [11]. In this sense Gemini 2.0 Flash Thinking functioned as a proof of concept that validated visible, test-time reasoning for Google before the company made it a standard feature of its models.