Gemini 1.5 Flash

Google DeepMind Large Language Models Multimodal AI

9 min read

Updated Jun 28, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 28, 2026

Fact-checked

In review queue

Sources

9 citations

Revision

v2 · 1,727 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Gemini 1.5 Flash is a lightweight, low-latency multimodal large language model from Google DeepMind, released at Google I/O on May 14, 2024, as the fast and cost-efficient member of the Gemini 1.5 family.^[1]^[3] It was distilled from the larger Gemini 1.5 Pro through online distillation, shipped with a 1 million token context window, accepts text, image, audio, and video input, and was priced at $0.075 per million input tokens (prompts under 128K tokens) after an August 2024 cut, making it Google's cheapest and fastest Gemini model at launch.^[1]^[2]^[5] Google positioned it for high-volume, latency-sensitive workloads such as summarization, chat, captioning, and data extraction.^[2]

What is Gemini 1.5 Flash?

Gemini 1.5 Flash is the speed-optimized tier of Google's Gemini 1.5 generation: a dense Transformer-based multimodal model built to deliver most of Gemini 1.5 Pro's quality at a fraction of the cost and latency.^[2]^[4] Google first detailed the Gemini 1.5 generation in February 2024, when it released Gemini 1.5 Pro as a model capable of reasoning over very long inputs. Three months later, at I/O 2024, the company expanded the line with Gemini 1.5 Flash, positioning it as the fastest Gemini model offered through its API.^[1]^[3] The motivation was practical: developers running production applications frequently needed a model that responded quickly and cheaply at scale, and Pro was more capability than many of those tasks required. Flash was the answer, described by Google as a model that is "lighter weight than 1.5 Pro" yet "highly capable of multimodal reasoning across vast amounts of information."^[2]

The model shipped initially in public preview through Google's developer surfaces, alongside Gemini 1.5 Pro's expansion to a 2 million token window and a refreshed lineup of smaller open Gemma 2 models announced at the same event.^[2]

When was Gemini 1.5 Flash released?

Google announced Gemini 1.5 Flash at the I/O developer conference on May 14, 2024, and launched it in public preview the same day.^[1]^[3] The model reached general availability later in the year, with the stable build addressed by the version aliases gemini-1.5-flash and gemini-1.5-flash-002.^[6] The table below tracks the model's key milestones.

Date	Milestone
May 14, 2024	Announced at Google I/O; public preview begins
August 8, 2024	Tuning rolled out to all developers; input price cut 78% to $0.075/1M tokens
September 24, 2024	Updated -002 builds released; experimental Flash-8B introduced
October 3, 2024	Flash-8B made production ready (billing from October 14)
December 11, 2024	Succeeded by Gemini 2.0 Flash
September 24, 2025	gemini-1.5-flash-002 listed for discontinuation

How was Gemini 1.5 Flash built?

Gemini 1.5 Flash was not trained from scratch as an independent system. According to the Gemini 1.5 technical report, it is "a more lightweight variant designed for efficiency with minimal regression in quality," created through online distillation in which the smaller model learns to imitate the behavior of its more capable counterpart, Gemini 1.5 Pro.^[4] In Google's own framing, distillation transfers "the most essential knowledge and skills from a larger model" into a smaller, more efficient one.^[2]

The report describes Flash as a dense Transformer-based model that performs parallel computation of its attention and feedforward components, in contrast to the sparse mixture-of-experts approach used for the Pro tier.^[4] Google has not published an official parameter count for the standard Flash model, and this encyclopedia does not state one. The design goal was to preserve as much of Pro's quality as possible while cutting the cost and latency of serving, which is what makes the model attractive for high-frequency tasks.^[2]^[4]

What is Gemini 1.5 Flash's context window?

Gemini 1.5 Flash launched with the same 1 million token context window that defined the 1.5 generation, allowing it to ingest large documents, long transcripts, and extended multimedia inputs in a single request.^[2] In research evaluations reported in the technical report, the 1.5 models maintained near-perfect retrieval, above 99 percent, on long-context tasks out to at least 10 million tokens, although the context made available to users through the public API was held at the 1 to 2 million token range.^[4] For Flash specifically, the production limit was 1 million tokens, whereas Pro was later raised to 2 million.^[2]

What can Gemini 1.5 Flash do?

Flash accepts multimodal input, handling text, images, audio, and video, and returns text output.^[2] Google highlighted summarization, chat applications, image and video captioning, and data and structured extraction from long documents and tables as representative use cases.^[2] The combination of a long context window with low serving cost was the central selling point: developers could feed large amounts of material to the model and get fast responses suitable for interactive or batch processing.

The model also supported text tuning. On August 8, 2024, Google completed the rollout of Gemini 1.5 Flash tuning to all developers through both the Gemini API and Google AI Studio, letting teams fine-tune the base model on their own data.^[5]

How does Gemini 1.5 Flash compare to Gemini 1.5 Pro?

Gemini 1.5 Flash is the smaller, faster, and cheaper sibling of Gemini 1.5 Pro. Flash trades some peak capability for substantially lower latency and price, while sharing the same multimodal interface and long-context foundation. Independent and reported benchmark figures placed Flash close to Pro on general knowledge tests despite its smaller footprint, consistent with the technical report's claim of minimal quality regression.^[4] The contrast is summarized below.

Attribute	Gemini 1.5 Flash	Gemini 1.5 Pro
Architecture	Dense Transformer, distilled from Pro	Sparse mixture-of-experts
Context window (API)	1 million tokens	Up to 2 million tokens
Input pricing (under 128K)	$0.075 per 1M tokens	Higher (reduced over 1.5 lifetime)
Positioning	High-volume, low-latency tasks	Highest-quality multimodal reasoning
Input modalities	Text, image, audio, video	Text, image, audio, video

What benchmark gains did the Gemini 1.5 Flash -002 update bring?

In September 2024, Google released updated -002 builds of both Flash and Pro and reported the following gains for the refreshed models over their May versions:^[6]

Benchmark area	Reported change (002 update)
MMLU-Pro	About +7%
MATH and HiddenMath	About +20%
Vision and code tasks	About +2% to +7%

Google also said the updated models produced more concise default outputs and delivered roughly 2x faster output with about 3x lower latency on the new builds.^[6]

How much does Gemini 1.5 Flash cost?

Pricing fell sharply over the model's lifetime. On August 8, 2024, Google cut Flash input pricing by 78 percent to $0.075 per million tokens and output pricing by 71 percent to $0.30 per million tokens, for prompts under 128,000 tokens.^[5] The September update raised rate limits and reduced 1.5 Pro pricing as well.^[6] The standard Flash rate card is summarized below.

Item	Price (prompts under 128K tokens)
Input	$0.075 per 1M tokens
Output	$0.30 per 1M tokens

Usage through Google AI Studio was free of charge, with the paid rates applying to the Gemini API and Vertex AI.^[6]

What is Gemini 1.5 Flash-8B?

Gemini 1.5 Flash-8B is a smaller, cheaper, and faster variant of Flash. In the September 24, 2024 refresh, Google released it as experimental, then made it production ready on October 3, 2024, with billing for paid usage beginning October 14, 2024.^[6]^[7] Google described it as a model that "nearly matches the performance of the 1.5 Flash model launched in May across many benchmarks," while costing half as much.^[7]

Flash-8B carried the lowest published prices in the Gemini lineup at the time. For prompts under 128,000 tokens, it cost $0.0375 per million input tokens, $0.15 per million output tokens, and $0.01 per million cached tokens, with prices doubling for prompts above that threshold.^[7] Google also doubled the variant's throughput ceiling to 4,000 requests per minute, twice the standard Flash limit.^[7] The company recommended it for chat, transcription, long-context translation, and high-volume multimodal summarization.^[7]

Flash-8B item	Price (prompts under 128K tokens)
Input	$0.0375 per 1M tokens
Output	$0.15 per 1M tokens
Cached input	$0.01 per 1M tokens

Where could you use Gemini 1.5 Flash?

Gemini 1.5 Flash was distributed through three Google channels: Google AI Studio for prototyping, the Gemini API for developers, and Vertex AI for enterprise and Google Cloud customers.^[2]^[6] It launched in public preview in May 2024 and reached general availability later in the year, with the stable build addressed by the version aliases gemini-1.5-flash and gemini-1.5-flash-002.^[6]

The 1.5 generation has since been retired. Google listed September 24, 2025, as the discontinuation date for gemini-1.5-flash-002, the model that the gemini-1.5-flash alias pointed to, and the broader Gemini 1.0 and 1.5 families were subsequently shut down on the Gemini API.^[8]

What replaced Gemini 1.5 Flash?

Gemini 1.5 Flash was directly succeeded by Gemini 2.0 Flash, which Google announced on December 11, 2024. Google called 1.5 Flash "our most popular model yet for developers" and said 2.0 Flash built on it with stronger performance at similarly fast response times, even outperforming 1.5 Pro on key benchmarks at twice the speed.^[9] Gemini 2.0 Flash became the default model on January 30, 2025, with 1.5 Flash remaining available for a transition period before its later retirement.^[9] Subsequent generations, including the Gemini 2.5 and 3.x Flash models, continued the line.

References

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

1 revision by 1 contributors · full history

Suggest edit

What links here

Gemini 1.5 Pro Gemini 2.0 Flash Gemini 2.0 Flash-Lite

What is Gemini 1.5 Flash?

When was Gemini 1.5 Flash released?

How was Gemini 1.5 Flash built?

What is Gemini 1.5 Flash's context window?

What can Gemini 1.5 Flash do?

How does Gemini 1.5 Flash compare to Gemini 1.5 Pro?

What benchmark gains did the Gemini 1.5 Flash -002 update bring?

How much does Gemini 1.5 Flash cost?

What is Gemini 1.5 Flash-8B?

Where could you use Gemini 1.5 Flash?

What replaced Gemini 1.5 Flash?

References

Improve this article

Related Articles

Gemini 3

Gemma 3

Gemini 2.5 Flash

Gemini 3.5 Flash

Gemini 3.1 Pro

Gemini 1.5 Pro

What links here

Related Articles

Gemini 3

Gemma 3

Gemini 2.5 Flash

Gemini 3.5 Flash

Gemini 3.1 Pro

Gemini 1.5 Pro

What links here