Gemini is a family of multimodal large language models developed by Google DeepMind. First announced on December 6, 2023, Gemini represents Google's most ambitious AI effort to date, designed from the ground up to natively process and reason across text, images, audio, video, and code. The model family has evolved rapidly through multiple generations, from Gemini 1.0 through Gemini 3.1 Pro, and powers a wide range of Google products including the Gemini consumer app, Google Search AI Overviews, Google Workspace, and Android.
As of March 2026, the latest models in the family are Gemini 3.1 Pro, which achieved a 77.1% score on the ARC-AGI-2 benchmark (more than doubling its predecessor's reasoning performance), and Gemini 3.1 Flash Lite, a cost-optimized model released on March 3, 2026, that delivers time-to-first-token 2.5 times faster than Gemini 2.5 Flash [1][2]. Gemini competes directly with OpenAI's GPT-5 and Anthropic's Claude model family in the frontier AI market.
The story of Gemini begins with an organizational restructuring at Google. In April 2023, Google merged its two leading AI research divisions: Google Brain and DeepMind. Google Brain, which started in 2011 as a part-time research collaboration led by Jeff Dean and Greg Corrado, had produced foundational work on large language models including LaMDA and PaLM. DeepMind, a London-based lab acquired by Google in 2014 for a reported $500 million, had built its reputation on reinforcement learning breakthroughs and neuroscience-inspired AI approaches, producing models like Gopher and Sparrow [3].
The merger, announced by Alphabet CEO Sundar Pichai, created a single unit called Google DeepMind under the leadership of Demis Hassabis. The consolidation was widely seen as a response to the competitive pressure from OpenAI's ChatGPT, which had launched in November 2022 and rapidly gained over 100 million users. By unifying the two research groups, Google aimed to accelerate AI development and reduce internal duplication of effort [4].
Google first mentioned Gemini at its I/O developer conference on May 10, 2023, when Pichai described it as a next-generation model still in early development. The model was formally announced on December 6, 2023, with CEO Pichai and Hassabis positioning it as Google's most capable model ever [5].
Gemini 1.0 launched in three sizes: Ultra (for highly complex tasks), Pro (a general-purpose model), and Nano (optimized for on-device use on mobile phones). The Ultra variant achieved a score of 90.0% on the MMLU (Massive Multitask Language Understanding) benchmark, making it the first language model to surpass human expert performance on that test. Google reported that Gemini 1.0 outperformed existing models on 30 out of 32 major academic benchmarks [6].
The Nano variant came in two sizes: Nano-1 with approximately 1.8 billion parameters, and Nano-2 with around 3.25 billion parameters. These compact models were designed for real-time inference on mobile devices without requiring cloud connectivity. Gemini 1.0 models supported a 32,000-token context window and used a Transformer-based decoder architecture [7].
On February 8, 2024, Google rebranded its Bard chatbot as Gemini. The company simultaneously retired the "Duet AI" branding used for AI features in Google Cloud and Workspace, consolidating everything under the Gemini name. Alongside the rebrand, Google launched a dedicated Gemini mobile app on Android and integrated the service into the Google app on iOS [8].
The rebranding was motivated by several factors. Google wanted to unify its AI products under a single recognizable brand rather than maintaining a confusing mix of names. The move was also a strategic reset: Bard had suffered reputational damage from a widely publicized hallucination error during its launch demo in February 2023, which contributed to a $100 billion drop in Alphabet's market value [9].
At the same time, Google introduced "Gemini Advanced" powered by Gemini Ultra 1.0, available through the Google One AI Premium subscription plan.
Google announced Gemini 1.5 Pro in February 2024, describing it as more capable than Gemini 1.0 Ultra despite being more efficient. The model introduced two major architectural changes: a Mixture-of-Experts (MoE) approach and a dramatically expanded context window of up to 1 million tokens [10].
The 1-million-token context window was a significant technical achievement. It allowed the model to process roughly 700,000 words, 1 hour of video, 11 hours of audio, or codebases with over 30,000 lines of code in a single prompt. Google demonstrated the model's capabilities by showing near-perfect recall across these long contexts, including analyzing 3 hours of video and 22 hours of audio [11].
At Google I/O on May 14, 2024, Google introduced Gemini 1.5 Flash, a lighter-weight model designed to be fast and cost-efficient at scale. Flash was trained through a process called distillation, where essential knowledge from the larger 1.5 Pro model was transferred into a smaller, faster model. 1.5 Flash excelled at summarization, chat, image and video captioning, and data extraction from long documents [12].
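Google has not disclosed the exact distillation recipe used for Flash; the classic formulation matches the student's token distribution to a temperature-softened copy of the teacher's via a KL divergence. A minimal NumPy sketch of that loss (toy vocabulary and logits, purely illustrative):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the vocabulary axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    A higher temperature exposes the teacher's relative preferences over
    non-argmax tokens ("dark knowledge") to the smaller student model.
    """
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures
    return float(np.mean(kl) * temperature ** 2)

# Toy example: a 4-token vocabulary at 2 sequence positions.
teacher = np.array([[4.0, 1.0, 0.5, 0.1], [0.2, 3.5, 0.3, 0.1]])
student = np.array([[3.0, 1.2, 0.4, 0.2], [0.1, 2.9, 0.5, 0.2]])
loss = distillation_loss(teacher, student)
# The loss vanishes when the student matches the teacher exactly.
assert distillation_loss(teacher, teacher) < loss
```

In practice this term is typically mixed with the ordinary next-token cross-entropy on ground-truth data; the weighting and temperature here are generic defaults, not Gemini-specific values.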
Also at I/O 2024, Google made the 2-million-token context window available for Gemini 1.5 Pro through a developer waitlist, pushing the boundaries of what language models could process in a single query.
On December 11, 2024, Google announced Gemini 2.0 Flash Experimental, framing it as the beginning of a new "agentic era" for AI. The model introduced several notable capabilities: native multimodal output (generating text, audio, and images in a single API call), a Multimodal Live API for real-time audio and video streaming interactions, and native tool use including Google Search and code execution [13].
Gemini 2.0 Flash was twice as fast as Gemini 1.5 Pro while achieving stronger benchmark performance. It could natively generate images and support conversational, multi-turn image editing. The Multimodal Live API supported natural conversational patterns including interruptions and voice activity detection [14].
On December 19, 2024, Google released Gemini 2.0 Flash Thinking, an experimental variant with enhanced reasoning capabilities. Gemini 2.0 Flash became the default model on January 30, 2025. This was followed by the release of Gemini 2.0 Pro on February 5, 2025, and Gemini 2.0 Flash-Lite on February 25, 2025. Flash-Lite was positioned as the most affordable option in the lineup, priced at $0.075 per million input tokens and $0.30 per million output tokens [15].
The 2.0 launch also introduced two experimental AI agent projects. Jules, an AI coding agent that integrates with GitHub to autonomously fix bugs and update code in cloud virtual machines, debuted alongside Project Mariner, a Chrome-based AI agent that can browse websites and complete web tasks on behalf of users [13].
In March 2025, Google released Gemini 2.5 Pro, its first "thinking model." The 2.5 series introduced a new paradigm where models could reason through their internal thought process before generating a response. This approach, similar to what OpenAI had introduced with its o1 model series, allowed for significantly improved accuracy on complex reasoning tasks [16].
Gemini 2.5 Pro debuted at #1 on the LMArena leaderboard by a significant margin. It supported a 1-million-token context window (later extended to 2 million tokens) and featured configurable "thinking budgets" that let developers control how much computation the model spent on reasoning before responding. Key benchmark scores included 84.0% on GPQA Diamond, 92.0% on AIME 2024, 86.7% on AIME 2025, and 81.7% on MMMU [17].
Google also introduced Deep Think mode, an enhanced reasoning capability using novel research techniques that enabled the model to consider multiple hypotheses before responding. With Deep Think, Gemini 2.5 Pro reached the 65th percentile of participants on the 2025 USAMO (one of the hardest math competitions), led on LiveCodeBench, scored 84.0% on MMMU, a multimodal reasoning benchmark, and set a new record on FrontierMath Tiers 1-3 (29%) [18].
Gemini 2.5 Flash followed on May 20, 2025, becoming the first Flash-series model with thinking capabilities. It offered the same reasoning features at a lower cost ($0.30/1M input tokens, $2.50/1M output tokens) and faster speed, maintaining the Flash line's emphasis on efficiency. Both 2.5 Pro and Flash reached general availability on June 17, 2025 [19].
Gemini 2.5 Flash-Lite launched on June 17, 2025 (stable on July 22), as a balanced model optimized for low-latency use cases, priced at $0.10 per million input tokens and $0.40 per million output tokens, with a 1-million-token context window and 64K output [20].
Gemini 3 Pro launched on November 18, 2025, marking a major generational leap. Google positioned it as the company's most advanced AI model with greater emphasis on long-term reasoning, multimodal understanding, persistent memory, and reliable agentic behavior [21].
The model achieved a 1501 Elo score on LMArena (claiming the top leaderboard position), scored 91.9% on GPQA Diamond (surpassing human expert performance), and reached 76.2% on SWE-bench Verified for coding tasks. Gemini 3 Pro scored 37.5% on Humanity's Last Exam without tools. The model supported a 1-million-token context window and was priced at $2 per million input tokens and $12 per million output tokens [22].
The release was accompanied by the launch of Google Antigravity, a new agentic development platform that combines an AI-powered IDE with a "Manager Surface" for spawning and orchestrating multiple AI agents working asynchronously. Antigravity was released in public preview at no cost for individuals, with support for macOS, Windows, and Linux [23]. This was the first time Google launched a Gemini model across multiple products on day one.
Gemini 3 Flash followed, announced on December 17, 2025, and released on December 22. It combined Gemini 3 Pro's reasoning capabilities with the Flash line's low latency and cost efficiency, achieving a 78% SWE-bench Verified score for agentic coding and outperforming Gemini 3 Pro on that particular benchmark. It was priced at $0.50 per million input tokens and $3.00 per million output tokens, and became the default model in the Gemini app globally [24].
On February 12, 2026, Google DeepMind released Gemini 3 Deep Think, a major upgrade to the specialized reasoning mode first introduced with Gemini 2.5. Deep Think is not a separate model but rather a reasoning mode within Gemini 3 that allocates additional compute at inference time, enabling the model to explore multiple hypotheses before producing a final answer [25].
Gemini 3 Deep Think achieved a record-breaking 84.6% on ARC-AGI-2 (verified by the ARC Prize Foundation), representing a 53.5-percentage-point improvement over standard Gemini 3 Pro's 31.1% on the same benchmark and a 15.8-point lead over the next best AI system. On Humanity's Last Exam, it scored 48.4% without tools. It also reached 3,455 Elo on Codeforces, 93.8% on GPQA Diamond, 87.7% on International Physics Olympiad, and 82.8% on International Chemistry Olympiad 2025 [25][26].
Deep Think can generate up to 192,000 tokens of output, allowing it to work through extended reasoning chains. It is available to Google AI Ultra subscribers in the Gemini app, with a limit of 10 prompts per day [27].
Google DeepMind released Gemini 3.1 Pro Preview on February 19, 2026. This marked a naming convention change: Google had previously used ".5" for mid-cycle updates (as in 1.5 and 2.5), but shifted to ".1" to reflect the significant capability jump in reasoning and agentic performance [1].
The headline result was a 77.1% score on ARC-AGI-2, a benchmark that evaluates a model's ability to solve novel visual-logic puzzles requiring multi-step abstraction and deduction outside its training distribution. This was more than double the 31.1% score of Gemini 3 Pro, representing the highest score among frontier models operating without extended deep thinking mode [1].
Other benchmark results included a 2887 Elo on LiveCodeBench Pro, 94.3% on GPQA Diamond, and #1 rankings on 12 of 18 tracked benchmarks. The model supports a 1-million-token context window, 64K token output, and three configurable thinking levels (Low, Medium, High). It also introduced the ability to generate, animate, and visually render SVG graphics and 3D code directly from natural language descriptions [1].
On March 3, 2026, Google released Gemini 3.1 Flash Lite in preview, the fastest and most cost-efficient model in the Gemini 3 series. It was designed for high-volume developer workloads at scale, priced at $0.25 per million input tokens and $1.50 per million output tokens [2].
Gemini 3.1 Flash Lite delivers 2.5 times faster time-to-first-token and 45% faster output speed compared to Gemini 2.5 Flash, according to Artificial Analysis benchmarks. It achieved an Elo score of 1432 on the Arena.ai leaderboard, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. The model comes with configurable thinking levels in AI Studio and Vertex AI, giving developers control over how much reasoning the model applies to each task [2].
Target use cases include high-volume translation, content moderation, user interface generation, dashboard creation, simulation building, and instruction following at scale.
The following table summarizes the major Gemini model releases, their dates, and key specifications.
| Model | Release Date | Context Window | Key Features |
|---|---|---|---|
| Gemini 1.0 Ultra | December 6, 2023 | 32K tokens | First model to surpass human experts on MMLU (90.0%); most capable Gemini 1.0 variant |
| Gemini 1.0 Pro | December 6, 2023 | 32K tokens | General-purpose model; 71.8% MMLU; integrated into Bard |
| Gemini 1.0 Nano | December 6, 2023 | 32K tokens | On-device model (1.8B and 3.25B parameter variants); runs on mobile without cloud |
| Gemini 1.5 Pro | February 2024 | 1M tokens (2M via waitlist) | Mixture-of-Experts architecture; major context window expansion; stronger than 1.0 Ultra |
| Gemini 1.5 Flash | May 14, 2024 | 1M tokens | Distilled from 1.5 Pro; optimized for speed and cost; fast summarization and captioning |
| Gemini 2.0 Flash (Experimental) | December 11, 2024 | 1M tokens | Multimodal Live API; native image generation; native tool use; 2x faster than 1.5 Pro |
| Gemini 2.0 Flash (Stable) | January 30, 2025 | 1M tokens | Became default model; general availability |
| Gemini 2.0 Flash-Lite | February 25, 2025 | 1M tokens | Most affordable option ($0.075/1M input); 8K max output; deprecated June 2026 |
| Gemini 2.0 Pro | February 5, 2025 | 1M tokens | Higher-capability 2.0 variant for complex tasks |
| Gemini 2.5 Pro | March 25, 2025 | 1M tokens (2M later) | First thinking model; #1 on LMArena; configurable thinking budget; Deep Think mode |
| Gemini 2.5 Flash | May 20, 2025 | 1M tokens | First Flash model with thinking capabilities; cost-efficient reasoning |
| Gemini 2.5 Flash-Lite | June 17, 2025 | 1M tokens (64K output) | Low-latency balanced model; $0.10/1M input; stable July 22, 2025 |
| Gemini 3 Pro | November 18, 2025 | 1M tokens | 1501 LMArena Elo; 91.9% GPQA Diamond; persistent memory; agentic behavior |
| Gemini 3 Flash | December 17, 2025 | 1M tokens | 78% SWE-bench Verified; default Gemini app model globally |
| Gemini 3 Deep Think | February 12, 2026 | 1M tokens (192K output) | 84.6% ARC-AGI-2; 48.4% Humanity's Last Exam; 3,455 Codeforces Elo |
| Gemini 3.1 Pro (Preview) | February 19, 2026 | 1M tokens (64K output) | 77.1% ARC-AGI-2; 94.3% GPQA Diamond; three thinking levels; SVG/3D generation |
| Gemini 3.1 Flash Lite (Preview) | March 3, 2026 | 1M tokens | 1432 Arena.ai Elo; 86.9% GPQA Diamond; 2.5x faster TTFT than 2.5 Flash |
Gemini is built on a decoder-only Transformer architecture that receives interleaved multimodal tokens (text, visual, audio, video). Unlike earlier Google models that processed text primarily and used separate modules for other modalities, Gemini was designed from the ground up to be natively multimodal. Input from different modalities is converted into a shared token space through learned encoders, allowing the model to reason across modalities within the same forward pass [28].
The architecture employs Multi-Query Attention for latency improvements and supports scalable context windows reaching up to 2 million tokens in the 2.5 series. The decoder processes these interleaved tokens sequentially, generating output that can include text, images, audio, or code depending on the task [29].
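The Multi-Query Attention variant mentioned above can be sketched in NumPy. This is an illustrative toy implementation (small dimensions, no batching, causal masking), not Gemini's actual code; the essential property is that all query heads attend over a single shared key/value projection, which shrinks the key/value cache by a factor of the head count and reduces decoding latency:

```python
import numpy as np

def multi_query_attention(x, W_q, W_k, W_v, n_heads):
    """Multi-Query Attention: n_heads query projections share one K/V head.

    x:   (seq, d_model) token representations
    W_q: (d_model, n_heads * d_head)  per-head query projections
    W_k: (d_model, d_head)            single shared key projection
    W_v: (d_model, d_head)            single shared value projection
    """
    seq, d_model = x.shape
    d_head = W_k.shape[1]
    q = (x @ W_q).reshape(seq, n_heads, d_head)   # one query per head
    k = x @ W_k                                    # shared across all heads
    v = x @ W_v                                    # shared across all heads
    # scores: (n_heads, seq, seq) — every head attends over the same K
    scores = np.einsum('shd,td->hst', q, k) / np.sqrt(d_head)
    # causal mask so position i only attends to positions <= i
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum('hst,td->shd', weights, v)     # (seq, n_heads, d_head)
    return out.reshape(seq, n_heads * d_head)

rng = np.random.default_rng(0)
seq, d_model, n_heads, d_head = 5, 16, 4, 8
x = rng.normal(size=(seq, d_model))
out = multi_query_attention(
    x,
    rng.normal(size=(d_model, n_heads * d_head)),
    rng.normal(size=(d_model, d_head)),  # K/V cache is n_heads times smaller
    rng.normal(size=(d_model, d_head)),
    n_heads,
)
assert out.shape == (seq, n_heads * d_head)
```

Compared to standard multi-head attention, only the K/V projections change: storing one key/value head instead of `n_heads` of them is what cuts cache memory during long-context decoding.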
Starting with Gemini 1.5, the model family adopted a Mixture-of-Experts (MoE) architecture. In a traditional dense Transformer, every parameter is activated for every input token. In an MoE model, the network is divided into smaller specialized "expert" sub-networks, and a routing mechanism learns to activate only the most relevant experts for each input token [10].
This approach allows the model to have a very large total parameter count while keeping the computational cost per token low. For Gemini 3 Pro, the model has over 1 trillion total parameters, but only 15 to 20 billion parameters are activated per query. Typical configurations use top-k expert routing with k=2, meaning two experts are selected per token. This selective activation enables massive parameter scaling while optimizing compute efficiency and reducing serving costs [29].
Gemini's multimodal architecture follows a unified design philosophy. Rather than bolting separate vision or audio models onto a text backbone (as some competitors do), Gemini uses a single core Transformer that handles all modalities through a shared token representation. Visual inputs are processed through learned visual encoders, audio through audio encoders, and so on, with all modalities mapped into the same embedding space before being fed to the Transformer [28].
Beginning with Gemini 2.0, the model also gained native multimodal output capabilities. It can generate interleaved text and images in a single response, produce controllable text-to-speech with watermarking, and generate structured outputs combining multiple formats. This bidirectional multimodal capability (understanding and generating across modalities) distinguishes Gemini from many competitors that handle multimodal input but produce text-only output [13].
One of Gemini's most distinctive features is its extremely large context window. Starting with Gemini 1.5 Pro's 1-million-token context (expanded to 2 million tokens by late 2024), the model can ingest and reason over entire books, hours of video, full codebases, and lengthy document collections in a single query [11].
Google has demonstrated near-perfect recall across these long contexts. In testing, Gemini 1.5 Pro could find specific details in 3 hours of video footage and maintain coherence across 22 hours of audio. The model also generalizes zero-shot to tasks involving long inputs, such as following complex instructions spanning thousands of pages [11].
The Gemini 3 series maintains the 1-million-token context window, while the 2.5 series offered up to 2 million tokens. Gemini 3.1 Pro supports 1 million tokens of input and up to 64,000 tokens of output [1].
With the introduction of the 2.5 series in March 2025, Gemini gained explicit reasoning ("thinking") capabilities. In this mode, the model works through a problem step by step internally before producing its final answer, similar to chain-of-thought prompting but as a native capability [16].
Developers can control the thinking budget, choosing how much computation the model dedicates to reasoning. The 3.1 Pro model offers three discrete thinking levels: Low, Medium, and High. Higher thinking budgets generally produce more accurate results on complex mathematical, scientific, and coding problems, at the cost of increased latency and token usage [1].
Deep Think mode goes further, using advanced techniques including parallel thinking to enable the model to explore multiple hypotheses before selecting the best answer. Deep Think explicitly dedicates more inference-time compute and can generate up to 192,000 tokens of reasoning output. This mode has proven especially effective on competition-level mathematics (reaching gold-medal standard at the 2025 International Mathematical Olympiad in research evaluations), complex coding benchmarks, and scientific reasoning challenges [18][25].
Gemini models have shown strong performance across coding benchmarks. Gemini 3 Pro achieved a 2,439 Elo on LiveCodeBench (a competitive programming benchmark) and 76.2% on SWE-bench Verified, which measures the ability to solve real-world software engineering tasks from GitHub issues [22].
Gemini 3 Flash scored even higher on SWE-bench Verified at 78%, while Gemini 3.1 Pro reached a 2887 Elo on LiveCodeBench Pro. Gemini 3 Deep Think pushed competitive programming even further, achieving 3,455 Elo on Codeforces. These scores place Gemini models among the top performers for agentic coding tasks, where the model must understand a codebase, diagnose issues, and generate correct patches [24][1][25].
Gemini natively processes text, images, audio, video, PDFs, and code. This goes beyond simple image captioning: the model can reason about relationships between objects in images, understand video narratives across time, transcribe and analyze audio content, and work with documents that combine text and visual elements [28].
The multimodal reasoning capability was tested extensively on MMMU, a benchmark requiring understanding of images, charts, diagrams, and text together. Gemini 2.5 Pro with Deep Think scored 84.0% on MMMU, while the 3 series continued to improve on multimodal benchmarks [18].
The following table shows selected benchmark results across Gemini generations, illustrating the model family's progression.
| Benchmark | Gemini 1.0 Ultra | Gemini 1.5 Pro | Gemini 2.5 Pro | Gemini 3 Pro | Gemini 3 Deep Think | Gemini 3.1 Pro |
|---|---|---|---|---|---|---|
| MMLU | 90.0% | N/A | N/A | N/A | N/A | N/A |
| GPQA Diamond | N/A | N/A | 84.0% | 91.9% | 93.8% | 94.3% |
| LMArena Elo | N/A | N/A | #1 (debut) | 1501 | N/A | N/A |
| SWE-bench Verified | N/A | N/A | 63.8% | 76.2% | N/A | N/A |
| ARC-AGI-2 | N/A | N/A | N/A | 31.1% | 84.6% | 77.1% |
| LiveCodeBench / Codeforces | N/A | N/A | N/A | 2,439 Elo (LiveCodeBench) | 3,455 Elo (Codeforces) | 2,887 Elo (LiveCodeBench Pro) |
| MMMU (Deep Think) | N/A | N/A | 84.0% | N/A | N/A | N/A |
| Humanity's Last Exam (no tools) | N/A | N/A | 18.8% | 37.5% | 48.4% | N/A |
| AIME 2025 | N/A | N/A | 86.7% | N/A | N/A | N/A |
These results reflect different time periods and testing conditions; direct cross-generation comparisons should be made cautiously since benchmarks evolve and testing methodologies change.
The Gemini app is Google's consumer-facing AI assistant, accessible through the web (gemini.google.com), a dedicated Android app, and integration within the Google app on iOS.
The app originated as Bard, which Google launched on March 21, 2023, as a direct competitor to ChatGPT. Bard initially ran on LaMDA and later upgraded to PaLM 2. On February 8, 2024, Google rebranded Bard as Gemini, aligning the consumer product with the underlying model family [8].
The rebrand coincided with the launch of Gemini Advanced, a premium tier powered by Gemini Ultra 1.0 (later upgraded to newer models). In May 2025, Google restructured its consumer AI subscriptions into a three-tier system: Google AI Plus, Google AI Pro, and Google AI Ultra, replacing the previous unified AI Premium offering [9].
As of March 2026, Google offers three paid subscription tiers for Gemini access.
| Feature | AI Plus ($7.99/mo) | AI Pro ($19.99/mo) | AI Ultra ($249.99/mo) |
|---|---|---|---|
| Context window | 128K tokens | 1M tokens | 1M tokens |
| Thinking prompts/day | 90 | 300 | 1,500 |
| Pro model prompts/day | 30 | 100 | 500 |
| Deep Think 3.1 access | No | No | 10 prompts/day (192K output) |
| Deep Research reports/day | 12 | 20 | 120 |
| Image generation/day | 50 | 100 | 1,000 |
| Cloud storage | 200 GB | 2 TB | 30 TB |
| Jules coding agent | Basic | 5x limits | 20x limits |
| Project Mariner | No | No | Yes (10 simultaneous tasks) |
| Workspace AI integration | No | Yes | Yes |
| YouTube Premium | No | No | Included |
Google AI Plus launched in the US on January 27, 2026, at $7.99 per month (with a promotional rate of $3.99 for the first two months), providing an entry-level paid tier between the free account and the $19.99 Pro plan. The plan includes access to Gemini 3 Pro, Deep Research, NotebookLM Plus, and can be shared with up to five family members [30].
Google AI Ultra, priced at $249.99 per month, includes exclusive access to Deep Think 3.1, Project Mariner for autonomous web browsing, Veo 3.1 video generation with audio, 30 TB of storage, YouTube Premium, and the highest limits for Jules and other agentic features [27].
As of March 2026, the Gemini app defaults to Gemini 3 Flash globally, with paid subscribers having access to higher-tier models. The app supports text, voice, and image inputs. Users can upload documents, images, and files for analysis. The app integrates with Google services, allowing it to access Gmail, Google Drive, Google Maps, and other tools to answer queries in context [24].
Google began rolling out AI Overviews in Search in 2024, using custom Gemini models to generate summary answers that appear at the top of search results. By the end of 2024, AI Overviews had reached users in the U.S. and was expanding globally. At Google I/O 2025, the company introduced "AI Mode" in Search, powered by Gemini 2.5, providing more interactive and conversational search experiences [31].
Gemini has been integrated across Google Workspace applications including Gmail, Docs, Sheets, Slides, Drive, and Chat. The integration provides AI assistance directly in the workflow: summarizing email threads in Gmail, drafting documents in Docs, generating formulas and analysis in Sheets, and creating presentations in Slides [32].
In 2025, Google included Gemini AI features in the cost of Google Workspace Business and Enterprise plans, removing the need for a separate add-on. Gemini in Workspace can reference information from multiple files within Google Drive when generating content or suggestions, making it context-aware across an organization's documents [32].
Gemini serves as the default AI assistant on Android devices, replacing Google Assistant for many tasks. Gemini Nano runs on-device on supported phones (starting with the Google Pixel 8 Pro and Samsung Galaxy S24 series), enabling AI features like summarization and smart reply without requiring an internet connection. The full Gemini app provides more advanced capabilities through cloud-based models [8].
Google has used Gemini as the foundation for several AI agent initiatives that go beyond traditional chat interactions.
Project Astra is a Google DeepMind research prototype exploring the development of a universal AI assistant. Built on Gemini 2.5 Pro (and later Gemini 3), Astra processes real-time video, audio, and text as a single continuous stream, enabling it to see, hear, remember, and reason about its environment [33].
First previewed in 2024, Project Astra was significantly expanded at Google I/O 2025. It powers a new "Search Live" feature in Google Search, where users can click a "Live" button while using AI Mode or Google Lens to ask questions about what they are seeing through their smartphone camera. Astra streams live video and audio into the model and responds with minimal latency [33].
Google's collaboration with Samsung on Android XR smart glasses (under the "Project Moohan" codename) aims to provide a physical form factor for Astra, enabling features like real-time visual labeling, instant sign translation, and step-by-step repair instructions overlaid on physical objects [33].
Project Mariner is an AI agent prototype from Google DeepMind that autonomously executes tasks within the Chrome browser. It can browse websites, fill out forms, retrieve information, and complete multi-step web tasks while keeping the user informed and allowing intervention at any time [34].
First unveiled in late 2024, Mariner was expanded at Google I/O 2025 with the ability to handle up to 10 tasks simultaneously by running on virtual machines in the cloud. This allows users to continue working while Mariner completes tasks in the background. As of March 2026, Project Mariner is available exclusively to Google AI Ultra subscribers at $249.99 per month [34].
Jules is an asynchronous AI coding agent that integrates with GitHub. It clones codebases into secure Google Cloud virtual machines and uses Gemini to autonomously fix bugs, implement features, and update code while developers focus on other work. Jules launched out of beta in August 2025, with a free plan capped at 15 daily tasks and elevated limits for Pro and Ultra subscribers [35].
In October 2025, Google introduced Jules Tools, a command-line interface that brings Jules directly into the developer's terminal. With the release of Gemini 3, Jules was upgraded to use Gemini 3 Pro for improved code understanding and generation [35].
Google AI Studio is the primary entry point for developers experimenting with Gemini models. It offers a web-based interface for testing prompts, tuning models, and generating API keys. The free tier requires no credit card and provides access to multiple models, including Gemini 2.5 Flash, 3 Flash, and 3.1 Flash Lite, with rate limits of 5 to 15 requests per minute and up to 1,000 requests per day [36].
For production workloads, Google offers Gemini through Vertex AI, its enterprise AI platform on Google Cloud. Vertex AI provides the same models at the same per-token pricing but adds features for high-volume deployment, including fine-tuning, batch processing, model evaluation, and enterprise security controls [36].
Launched alongside Gemini 3 on November 18, 2025, Google Antigravity is an agentic development platform that combines a traditional AI-powered code editor with an agent-first "Manager Surface" interface. In the Manager Surface, developers can spawn, orchestrate, and observe multiple AI agents working asynchronously across different workspaces. Agents produce "Artifacts" (task lists, implementation plans, screenshots, browser recordings) that developers can review and comment on. Antigravity is available in public preview at no cost for individuals and supports Gemini 3 Pro, Gemini 3 Flash, and third-party models [23].
Gemini API pricing follows a per-token model. The following table shows pricing as of early 2026.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Most affordable 2.0 option; deprecated June 2026 |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Low-latency balanced model |
| Gemini 2.5 Flash | $0.30 | $2.50 | Thinking model, cost-efficient |
| Gemini 3.1 Flash Lite | $0.25 | $1.50 | Fastest Gemini 3 series model |
| Gemini 3 Flash | $0.50 | $3.00 | Default app model |
| Gemini 3 Pro / 3.1 Pro | $2.00 | $12.00 | Standard context (up to 200K tokens) |
| Gemini 3 Pro / 3.1 Pro | $4.00 | $18.00 | Extended context (over 200K tokens) |
Google also offers a free tier with generous limits for experimentation and prototyping. In 2026, Google introduced Project Spend Caps in AI Studio, allowing developers to set monthly dollar limits on API expenses, and revamped its Usage Tiers system to help developers scale access more quickly [36].
Alongside the proprietary Gemini family, Google DeepMind has released the Gemma series of open-weight models built from the same research and technology that powers Gemini.
Google released the original Gemma models on February 21, 2024, in two sizes: 2B and 7B parameters. Both used a dense decoder architecture with Rotary Positional Embeddings (RoPE) and were available in pre-trained and instruction-tuned variants. Gemma 1 models had an 8,192-token context length [37].
Gemma 2 launched on June 27, 2024, with 9B and 27B parameter sizes (a 2B variant followed on July 31). The 27B model became one of the highest-ranking open models on the LMSYS Chatbot Arena leaderboard, outperforming popular models more than twice its size. Gemma 2 delivered higher performance and greater inference efficiency compared to the first generation, with significant safety improvements [38].
Released on March 12, 2025, Gemma 3 is built from the same research that produced Gemini 2.0. It comes in four sizes (1B, 4B, 12B, and 27B), with a later 270M variant released in August 2025 for task-specific fine-tuning. Key advances include support for over 140 languages, vision understanding, a 128K-token context window, and function calling. In preliminary human preference evaluations on LMArena, Gemma 3 outperformed LLaMA 3 405B, DeepSeek-V3, and o3-mini [39].
Announced at Google I/O 2025 and released on June 26, 2025, Gemma 3n is a mobile-first architecture developed in collaboration with Qualcomm, MediaTek, and Samsung. It uses a novel MatFormer (Matryoshka Transformer) design that nests a smaller model inside a larger one, enabling elastic inference where developers can choose between the full model and a smaller, faster sub-model [40].
Gemma 3n is available in E2B and E4B sizes (with raw parameter counts of 5B and 8B, respectively) but runs with a memory footprint comparable to traditional 2B and 4B models, operating with as little as 2 GB of memory. It was the first sub-10B parameter model to exceed 1,300 points on LMArena. Gemma 3n supports text, image, audio, and video understanding across 140 languages [40].
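The nesting idea behind MatFormer can be illustrated with a toy feed-forward layer: the sub-model's weights are a prefix slice of the full model's weight matrices, so a single set of parameters serves both an accurate full path and a cheaper elastic path. This is a minimal conceptual sketch under that assumption, not Gemma 3n's actual architecture; all names and dimensions here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 8    # hidden size, shared by full model and sub-model
FF_FULL = 32   # full model's feed-forward width
FF_SUB = 8     # nested sub-model uses only the first 8 units

# One set of weights; the sub-model is a prefix slice of the same tensors,
# mirroring how a Matryoshka-style design nests a small model in a large one.
W_up = rng.normal(size=(D_MODEL, FF_FULL))
W_down = rng.normal(size=(FF_FULL, D_MODEL))


def ffn(x: np.ndarray, ff_width: int) -> np.ndarray:
    """Feed-forward block using only the first `ff_width` hidden units.

    ff_width=FF_FULL runs the full model; smaller values run a cheaper
    nested sub-model with no separate weights to store or load.
    """
    h = np.maximum(x @ W_up[:, :ff_width], 0.0)  # ReLU over a weight slice
    return h @ W_down[:ff_width, :]


x = rng.normal(size=(1, D_MODEL))
full_out = ffn(x, FF_FULL)  # full-capacity path
sub_out = ffn(x, FF_SUB)    # elastic, faster path on the same weights
print(full_out.shape, sub_out.shape)  # both (1, 8)
```

Because both paths read from the same tensors, a deployment can pick the sub-model for latency-sensitive requests and the full model for hard ones, which is the "elastic inference" trade-off the Gemma 3n announcement describes.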
Gemini competes primarily with OpenAI's GPT series and Anthropic's Claude family, along with open-weight alternatives such as Meta's LLaMA and DeepSeek.
OpenAI released GPT-5 on August 7, 2025, and has since followed with GPT-5.2, which offers a 400,000-token context window and achieved a perfect 100% score on AIME 2025 without external tools. On SWE-bench Verified, GPT-5.2 scored 74.9%, trailing both Gemini 3 Pro (76.2%) and Claude Opus 4.5 (80.9%). Gemini maintains a significant advantage in context window size, with its 1-million-token window being 2.5 times larger than GPT-5.2's [41].
Anthropic released Claude Opus 4.5 on November 24, 2025. Claude leads on real-world coding tasks with an 80.9% SWE-bench Verified score. However, Gemini 3 Pro dominates algorithmic and competitive programming with its 2,439 LiveCodeBench Elo (and Gemini 3 Deep Think reaches 3,455 Codeforces Elo). Claude offers a 200,000-token context window, significantly smaller than Gemini's 1-million-token window. The models have different strengths: Gemini excels in multimodal reasoning and long-context tasks, while Claude is known for strong instruction following and coding accuracy [41].
All three families continue to leapfrog each other on benchmarks with each new release, and the competitive gap between frontier models has narrowed considerably since 2024.
As of March 2026, Google offers a full spectrum of Gemini models serving different needs. Gemini 3.1 Pro Preview is the most capable standard model, leading on 12 of 18 tracked benchmarks, while Gemini 3 Deep Think holds the record for abstract reasoning with its 84.6% ARC-AGI-2 score. Gemini 3.1 Flash Lite provides the fastest and most cost-efficient option in the Gemini 3 family for high-volume workloads.
Gemini 3 Flash serves as the default model for most consumer interactions, balancing high performance with low cost and latency. Gemini models power AI features across virtually all Google products, from Search to Workspace to Android, serving hundreds of millions of users daily.
Google's three-tier subscription structure (AI Plus at $7.99, AI Pro at $19.99, and AI Ultra at $249.99 per month) segments consumer access, with the Ultra tier unlocking advanced agentic capabilities through Project Mariner, elevated Jules limits, and exclusive Deep Think access.
The Gemini API has become one of the most widely used AI APIs globally, competing with OpenAI's API for developer mindshare. Google's strategy of offering a generous free tier and competitive pricing has helped attract developers, particularly those already within the Google Cloud ecosystem.
The Gemma open model family continues to expand, with Gemma 3 and Gemma 3n providing competitive open-weight alternatives that run efficiently on consumer hardware and mobile devices.