# Gemini 1.5 Pro

> Source: https://aiwiki.ai/wiki/gemini_1_5_pro
> Updated: 2026-06-24
> Categories: Google DeepMind, Large Language Models, Multimodal AI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Gemini 1.5 Pro** is a multimodal large language model developed by [Google DeepMind](/wiki/google_deepmind) and announced on February 15, 2024, as the flagship model of the [Gemini](/wiki/gemini) 1.5 generation. It was the first widely available model to ship with a context window of up to one million tokens, later expanded to two million, and Google described it as carrying "the longest context window of any large-scale foundation model yet."[1] Built around a sparse [mixture-of-experts](/wiki/mixture_of_experts) (MoE) Transformer architecture, it accepted text, code, images, audio, and video as input and reached roughly the performance of the much larger [Gemini Ultra](/wiki/gemini_ultra) 1.0 while using less compute.[1][2][3]

## Background

Google introduced the first Gemini models in December 2023, led by Gemini 1.0 Ultra, Pro, and Nano. Barely two months later, on February 15, 2024, the company unveiled Gemini 1.5 and made Gemini 1.5 Pro available in a limited private preview through [Google AI Studio](/wiki/google_ai_studio) and [Vertex AI](/wiki/google_vertex_ai). Sundar Pichai, chief executive of Google and Alphabet, and Demis Hassabis, chief executive of Google DeepMind, both introduced the release.[1][2] Hassabis described Gemini 1.5 Pro as "a mid-size multimodal model, optimized for scaling across a wide-range of tasks," performing at a level comparable to 1.0 Ultra, Google's largest model up to that point.[1]

The release drew attention mainly for its context length. Before the announcement, the largest context window among publicly available large language models was around 200,000 tokens. Gemini 1.5 Pro ran consistently with up to one million tokens, a figure Google said it had stress-tested far higher in research.[1][3]

## How does the mixture-of-experts architecture work?

The technical report, "Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context," posted to arXiv on March 8, 2024, describes Gemini 1.5 Pro as a sparse mixture-of-expert Transformer-based model. In an MoE design, a learned routing function directs each input to a subset of the model's parameters rather than activating the entire network. This lets the total parameter count grow while the compute used for any single token stays bounded, which made the model cheaper to train and serve than a dense model of equivalent size.[3] The report builds on the multimodal foundations of the Gemini 1.0 family and a body of Google research on long-context architectures and mixture-of-experts routing.[3]

Google did not publish the parameter count for Gemini 1.5 Pro.

## How long is the context window?

Gemini 1.5 Pro shipped with a standard context window of 128,000 tokens. From the February 2024 announcement, a limited group of developers and enterprise customers could request access to a one-million-token window in private preview through AI Studio and Vertex AI.[1][2] Google illustrated the scale by noting that one million tokens corresponded to roughly 700,000 words, over 30,000 lines of code, one hour of video, or about 11 hours of audio.[1]

In the technical report, Google described stress tests well beyond the production limit. On "needle in a haystack" retrieval, where a single planted fact is hidden at varying depths inside a large body of text, the model maintained 100% recall up to 530,000 tokens and greater than 99.7% recall up to one million tokens across all modalities (text, video, and audio). Extended experiments reported 99.2% recall at ten million tokens, alongside near-perfect retrieval across roughly 10.5 hours of video and up to about 107 hours of audio.[3] Google framed these multi-million-token results as research findings rather than features of the shipping product. The report contrasted the model's reach with then-current systems such as Claude 2.1 (200,000 tokens) and GPT-4 Turbo (128,000 tokens).[3]

The one-million-token window carried a latency cost. In early demonstrations, queries over very large inputs took anywhere from about 20 seconds to a minute to return.[4]

## What can Gemini 1.5 Pro do?

Gemini 1.5 Pro was natively multimodal and could reason over text, code, images, and video. When the model entered public preview on Vertex AI on April 9, 2024, Google added audio (speech) understanding, allowing it to process spoken content directly rather than relying on a separate transcription step. The company highlighted uses such as analyzing recordings of TV broadcasts, films, radio, and conference calls.[4][5]

The long window enabled tasks that smaller-context models could not attempt in a single pass, such as reasoning over an entire codebase, a long film, or a lengthy collection of documents at once. To demonstrate the reach, Google fed the model the 402-page transcripts from Apollo 11's mission to the moon; the company said it could "reason about conversations, events and details found across the document."[1]

A second demonstration tested in-context learning rather than retrieval. The model was given a single grammar manual for Kalamang, a language spoken by fewer than 200 people and with little presence in its training data, together with a dictionary and roughly 400 parallel sentences, and learned to translate to and from it at a level comparable to a person studying from the same material.[3]

## How does Gemini 1.5 Pro perform on benchmarks?

Across a standard suite of text, vision, and audio benchmarks, the technical report compared the updated Gemini 1.5 Pro against the earlier Gemini 1.0 models. The model won the large majority of comparisons against 1.0 Pro and a narrow majority against the larger 1.0 Ultra, while requiring less compute than Ultra.[3]

| Benchmark / metric | Gemini 1.5 Pro result | Source |
| --- | --- | --- |
| Win rate vs Gemini 1.0 Pro (core benchmarks) | 87.9% (29 of 33) | Technical report[3] |
| Win rate vs Gemini 1.0 Ultra (core benchmarks) | 57.6% (19 of 33) | Technical report[3] |
| Text "needle in a haystack" recall, to 530K tokens | 100% | Technical report[3] |
| Text recall, to 1M tokens | greater than 99.7% | Technical report[3] |
| Text recall, to 10M tokens | 99.2% | Technical report[3] |
| Audio haystack recall, to ~107 hours of audio | 100% | Technical report[3] |

Independent commentary at the time treated the long-context recall numbers as among the model's most notable results, while noting that real-world performance on very large inputs could depend heavily on the task and prompt.[3][4]

## When was Gemini 1.5 Pro released and where could you use it?

Gemini 1.5 Pro reached users through several channels over 2024. After the February 15 private preview, it entered public preview on Vertex AI on April 9, 2024, with the one-million-token window and newly added audio input.[4][5]

At Google I/O on May 14, 2024, Google brought Gemini 1.5 Pro to consumers through the [Gemini](/wiki/gemini_app) app's paid Gemini Advanced tier, with the one-million-token context window, available in more than 150 countries and over 35 languages.[6] Google described it as the longest context window of any widely available consumer chatbot.[6] At the same event the company announced that the developer context window would double to two million tokens, initially behind a waitlist, and debuted the lighter [Gemini 1.5 Flash](/wiki/gemini_1_5_flash) sibling.[7]

On June 27, 2024, Google removed the waitlist and made the two-million-token context window for Gemini 1.5 Pro available to all developers through the Gemini API and Google AI Studio. The same update introduced context caching, to lower the cost of reusing the same tokens across requests, along with code execution and access to the Gemma 2 open models.[7][8] Google reported that two million tokens could correspond to about two hours of video, roughly 22 hours of audio, more than 60,000 lines of code, or over 1.4 million words.[8] Stable production versions of the model were released later in 2024 under API names such as `gemini-1.5-pro-001` and `gemini-1.5-pro-002`.

The Gemini 1.5 models were later deprecated. Google restricted access for new projects to the Gemini 1.5 line through the Gemini API in April 2025, while existing integrations retained access for a transition period.[9]

## What replaced Gemini 1.5 Pro?

Gemini 1.5 Pro was accompanied by a lighter sibling, [Gemini 1.5 Flash](/wiki/gemini_1_5_flash), announced at Google I/O on May 14, 2024, as a faster and more efficient model distilled for high-volume use with smaller quality regressions.[3][7] The line was succeeded by the Gemini 2.x generation, beginning with [Gemini 2.0 Flash](/wiki/gemini_2_0_flash), which Google introduced in December 2024 and made generally available in early 2025, and continued with later 2.x and 3.x releases.[9]

## References

1. ["Introducing Gemini 1.5, Google's next-generation AI model"](https://blog.google/innovation-and-ai/products/google-gemini-next-generation-model-february-2024/). Google, February 15, 2024.
2. ["Gemini 1.5: Our next-generation model, now available for Private Preview in Google AI Studio"](https://developers.googleblog.com/gemini-15-our-next-generation-model-now-available-for-private-preview-in-google-ai-studio/). Google Developers Blog, February 15, 2024.
3. Gemini Team, Google. ["Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context"](https://arxiv.org/abs/2403.05530). arXiv:2403.05530, March 8, 2024.
4. ["Google's Gemini 1.5 Pro enters public preview on Vertex AI"](https://techcrunch.com/2024/04/09/googles-gemini-pro-1-5-enters-public-preview-on-vertex-ai/). TechCrunch, April 9, 2024.
5. ["Google announces Gemini 1.5 with greatly expanded context window"](https://9to5google.com/2024/02/15/gemini-1-5-announcement/). 9to5Google, February 15, 2024.
6. ["Google Gemini update: Access to 1.5 Pro and new features"](https://blog.google/products-and-platforms/products/gemini/google-gemini-update-may-2024/). Google, May 14, 2024.
7. ["Gemini 1.5 Pro now offers a 2 million token context window for devs"](https://9to5google.com/2024/06/27/gemini-1-5-pro-2-million/). 9to5Google, June 27, 2024.
8. ["Gemini 1.5 Pro 2M context window, code execution capabilities, and Gemma 2 are available today"](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/). Google Developers Blog, June 27, 2024.
9. ["Gemini (language model)"](https://en.wikipedia.org/wiki/Gemini_(language_model)). Wikipedia.

