Gemini is a family of multimodal large language models developed by Google DeepMind. First announced on December 6, 2023, Gemini represents Google's most ambitious AI effort to date, designed from the ground up to natively process and reason across text, images, audio, video, and code. The model family has evolved rapidly through multiple generations, from Gemini 1.0 through Gemini 3.1 Pro, and powers a wide range of Google products including the Gemini consumer app, Google Search AI Overviews, Google Workspace, and Android.
As of March 2026, the latest models in the family are Gemini 3.1 Pro, which achieved a 77.1% score on the ARC-AGI-2 benchmark (more than doubling its predecessor's reasoning performance), and Gemini 3.1 Flash Lite, a cost-optimized model released on March 3, 2026, that delivers time-to-first-token 2.5 times faster than Gemini 2.5 Flash [1][2]. Gemini competes directly with OpenAI's GPT-5 and Anthropic's Claude model family in the frontier AI market.
The story of Gemini begins with an organizational restructuring at Google. In April 2023, Google merged its two leading AI research divisions: Google Brain and DeepMind. Google Brain, which started in 2011 as a part-time research collaboration led by Jeff Dean and Greg Corrado, had produced foundational work on large language models including LaMDA and PaLM. DeepMind, a London-based lab acquired by Google in 2014 for a reported $500 million, had built its reputation on reinforcement learning breakthroughs and neuroscience-inspired AI approaches, producing models like Gopher and Sparrow [3].
The merger, announced by Alphabet CEO Sundar Pichai, created a single unit called Google DeepMind under the leadership of Demis Hassabis. The consolidation was widely seen as a response to the competitive pressure from OpenAI's ChatGPT, which had launched in November 2022 and rapidly gained over 100 million users. By unifying the two research groups, Google aimed to accelerate AI development and reduce internal duplication of effort [4].
Google first mentioned Gemini at its I/O developer conference on May 10, 2023, when Pichai described it as a next-generation model still in early development. The model was formally announced on December 6, 2023, with CEO Pichai and Hassabis positioning it as Google's most capable model ever [5].
Gemini 1.0 launched in three sizes: Ultra (for highly complex tasks), Pro (a general-purpose model), and Nano (optimized for on-device use on mobile phones). The Ultra variant achieved a score of 90.0% on the MMLU (Massive Multitask Language Understanding) benchmark, making it the first language model to surpass human expert performance on that test. Google reported that Gemini 1.0 outperformed existing models on 30 out of 32 major academic benchmarks [6].
The Nano variant came in two sizes: Nano-1 with approximately 1.8 billion parameters, and Nano-2 with around 3.25 billion parameters. These compact models were designed for real-time inference on mobile devices without requiring cloud connectivity. Gemini 1.0 models supported a 32,000-token context window and used a Transformer-based decoder architecture [7].
On February 8, 2024, Google rebranded its Bard chatbot as Gemini. The company simultaneously retired the "Duet AI" branding used for AI features in Google Cloud and Workspace, consolidating everything under the Gemini name. Alongside the rebrand, Google launched a dedicated Gemini mobile app on Android and integrated the service into the Google app on iOS [8].
The rebranding was motivated by several factors. Google wanted to unify its AI products under a single recognizable brand rather than maintaining a confusing mix of names. The move was also a strategic reset: Bard had suffered reputational damage from a widely publicized hallucination error during its launch demo in February 2023, which contributed to a $100 billion drop in Alphabet's market value [9].
At the same time, Google introduced "Gemini Advanced" powered by Gemini Ultra 1.0, available through the Google One AI Premium subscription plan.
Google announced Gemini 1.5 Pro in February 2024, describing it as more capable than Gemini 1.0 Ultra despite being more efficient. The model introduced two major architectural changes: a Mixture-of-Experts (MoE) approach and a dramatically expanded context window of up to 1 million tokens [10].
The 1-million-token context window was a significant technical achievement. It allowed the model to process roughly 700,000 words, 1 hour of video, 11 hours of audio, or codebases with over 30,000 lines of code in a single prompt. Google demonstrated the model's capabilities by showing near-perfect recall across these long contexts, including analyzing 3 hours of video and 22 hours of audio [11].
At Google I/O on May 14, 2024, Google introduced Gemini 1.5 Flash, a lighter-weight model designed to be fast and cost-efficient at scale. Flash was trained through a process called distillation, where essential knowledge from the larger 1.5 Pro model was transferred into a smaller, faster model. 1.5 Flash excelled at summarization, chat, image and video captioning, and data extraction from long documents [12].
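Google has not disclosed the exact distillation recipe used for Flash; the classic formulation matches the student's token distribution to a temperature-softened copy of the teacher's via a KL divergence. A minimal NumPy sketch of that loss (toy vocabulary and logits, purely illustrative):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the vocabulary axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    A higher temperature exposes the teacher's relative preferences over
    non-argmax tokens ("dark knowledge") to the smaller student model.
    """
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures
    return float(np.mean(kl) * temperature ** 2)

# Toy example: a 4-token vocabulary at 2 sequence positions.
teacher = np.array([[4.0, 1.0, 0.5, 0.1], [0.2, 3.5, 0.3, 0.1]])
student = np.array([[3.0, 1.2, 0.4, 0.2], [0.1, 2.9, 0.5, 0.2]])
loss = distillation_loss(teacher, student)
# The loss vanishes when the student matches the teacher exactly.
assert distillation_loss(teacher, teacher) < loss
```

In practice this term is typically mixed with the ordinary next-token cross-entropy on ground-truth data; the weighting and temperature here are generic defaults, not Gemini-specific values.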
Also at I/O 2024, Google made the 2-million-token context window available for Gemini 1.5 Pro through a developer waitlist, pushing the boundaries of what language models could process in a single query.
On December 11, 2024, Google announced Gemini 2.0 Flash Experimental, framing it as the beginning of a new "agentic era" for AI. The model introduced several notable capabilities: native multimodal output (generating text, audio, and images in a single API call), a Multimodal Live API for real-time audio and video streaming interactions, and native tool use including Google Search and code execution [13].
Gemini 2.0 Flash was twice as fast as Gemini 1.5 Pro while achieving stronger benchmark performance. It could natively generate images and support conversational, multi-turn image editing. The Multimodal Live API supported natural conversational patterns including interruptions and voice activity detection [14].
On December 19, 2024, Google released Gemini 2.0 Flash Thinking, an experimental variant with enhanced reasoning capabilities. Gemini 2.0 Flash became the default model on January 30, 2025. This was followed by the release of Gemini 2.0 Pro on February 5, 2025, and Gemini 2.0 Flash-Lite on February 25, 2025. Flash-Lite was positioned as the most affordable option in the lineup, priced at $0.075 per million input tokens and $0.30 per million output tokens [15].
The 2.0 launch also introduced two experimental AI agent projects. Jules, an AI coding agent that integrates with GitHub to autonomously fix bugs and update code in cloud virtual machines, debuted alongside Project Mariner, a Chrome-based AI agent that can browse websites and complete web tasks on behalf of users [13].
In March 2025, Google released Gemini 2.5 Pro, its first "thinking model." The 2.5 series introduced a new paradigm where models could reason through their internal thought process before generating a response. This approach, similar to what OpenAI had introduced with its o1 model series, allowed for significantly improved accuracy on complex reasoning tasks [16].
Gemini 2.5 Pro debuted at #1 on the LMArena leaderboard by a significant margin. It supported a 1-million-token context window (later extended to 2 million tokens) and featured configurable "thinking budgets" that let developers control how much computation the model spent on reasoning before responding. Key benchmark scores included 84.0% on GPQA Diamond, 92.0% on AIME 2024, 86.7% on AIME 2025, and 81.7% on MMMU [17].
Google also introduced Deep Think mode, an enhanced reasoning capability using novel research techniques that enabled the model to consider multiple hypotheses before responding. With Deep Think, Gemini 2.5 Pro reached the 65th percentile of participants on the 2025 USAMO (one of the hardest math competitions), led on LiveCodeBench, scored 84.0% on MMMU, a multimodal reasoning benchmark, and set a new record on FrontierMath Tiers 1-3 (29%) [18].
Gemini 2.5 Flash followed on May 20, 2025, becoming the first Flash-series model with thinking capabilities. It offered the same reasoning features at a lower cost ($0.30/1M input tokens, $2.50/1M output tokens) and faster speed, maintaining the Flash line's emphasis on efficiency. Both 2.5 Pro and Flash reached general availability on June 17, 2025 [19].
Gemini 2.5 Flash-Lite launched on June 17, 2025 (stable on July 22), as a balanced model optimized for low-latency use cases, priced at $0.10 per million input tokens and $0.40 per million output tokens, with a 1-million-token context window and 64K output [20].
Gemini 3 Pro launched on November 18, 2025, marking a major generational leap. Google positioned it as the company's most advanced AI model with greater emphasis on long-term reasoning, multimodal understanding, persistent memory, and reliable agentic behavior [21].
The model achieved a 1501 Elo score on LMArena (claiming the top leaderboard position), scored 91.9% on GPQA Diamond (surpassing human expert performance), and reached 76.2% on SWE-bench Verified for coding tasks. Gemini 3 Pro scored 37.5% on Humanity's Last Exam without tools. The model supported a 1-million-token context window and was priced at $2 per million input tokens and $12 per million output tokens [22].
The release was accompanied by the launch of Google Antigravity, a new agentic development platform that combines an AI-powered IDE with a "Manager Surface" for spawning and orchestrating multiple AI agents working asynchronously. Antigravity was released in public preview at no cost for individuals, with support for macOS, Windows, and Linux [23]. This was the first time Google launched a Gemini model across multiple products on day one.
Gemini 3 Flash followed, announced on December 17, 2025, and released on December 22. It combined Gemini 3 Pro's reasoning capabilities with the Flash line's low latency and cost efficiency, achieving a 78% SWE-bench Verified score for agentic coding and outperforming Gemini 3 Pro on that particular benchmark. It was priced at $0.50 per million input tokens and $3.00 per million output tokens, and became the default model in the Gemini app globally [24].
On February 12, 2026, Google DeepMind released Gemini 3 Deep Think, a major upgrade to the specialized reasoning mode first introduced with Gemini 2.5. Deep Think is not a separate model but rather a reasoning mode within Gemini 3 that allocates additional compute at inference time, enabling the model to explore multiple hypotheses before producing a final answer [25].
Gemini 3 Deep Think achieved a record-breaking 84.6% on ARC-AGI-2 (verified by the ARC Prize Foundation), representing a 53.5-percentage-point improvement over standard Gemini 3 Pro's 31.1% on the same benchmark and a 15.8-point lead over the next best AI system. On Humanity's Last Exam, it scored 48.4% without tools. It also reached 3,455 Elo on Codeforces, 93.8% on GPQA Diamond, 87.7% on International Physics Olympiad, and 82.8% on International Chemistry Olympiad 2025 [25][26].
Deep Think can generate up to 192,000 tokens of output, allowing it to work through extended reasoning chains. It is available to Google AI Ultra subscribers in the Gemini app, with a limit of 10 prompts per day [27].
Google DeepMind released Gemini 3.1 Pro Preview on February 19, 2026. This marked a naming convention change: Google had previously used ".5" for mid-cycle updates (as in 1.5 and 2.5), but shifted to ".1" to reflect the significant capability jump in reasoning and agentic performance [1].
The headline result was a 77.1% score on ARC-AGI-2, a benchmark that evaluates a model's ability to solve novel visual-logic puzzles requiring multi-step abstraction and deduction outside its training distribution. This was more than double the 31.1% score of Gemini 3 Pro, representing the highest score among frontier models operating without extended deep thinking mode [1].
Other benchmark results included a 2887 Elo on LiveCodeBench Pro, 94.3% on GPQA Diamond, and #1 rankings on 12 of 18 tracked benchmarks. The model supports a 1-million-token context window, 64K token output, and three configurable thinking levels (Low, Medium, High). It also introduced the ability to generate, animate, and visually render SVG graphics and 3D code directly from natural language descriptions [1].
On March 3, 2026, Google released Gemini 3.1 Flash Lite in preview, the fastest and most cost-efficient model in the Gemini 3 series. It was designed for high-volume developer workloads at scale, priced at $0.25 per million input tokens and $1.50 per million output tokens [2].
Gemini 3.1 Flash Lite delivers 2.5 times faster time-to-first-token and 45% faster output speed compared to Gemini 2.5 Flash, according to Artificial Analysis benchmarks. It achieved an Elo score of 1432 on the Arena.ai leaderboard, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. The model comes with configurable thinking levels in AI Studio and Vertex AI, giving developers control over how much reasoning the model applies to each task [2].
Target use cases include high-volume translation, content moderation, user interface generation, dashboard creation, simulation building, and instruction following at scale.
The following table summarizes the major Gemini model releases, their dates, and key specifications.
| Model | Release Date | Context Window | Key Features |
|---|---|---|---|
| Gemini 1.0 Ultra | December 6, 2023 | 32K tokens | First model to surpass human experts on MMLU (90.0%); most capable Gemini 1.0 variant |
| Gemini 1.0 Pro | December 6, 2023 | 32K tokens | General-purpose model; 71.8% MMLU; integrated into Bard |
| Gemini 1.0 Nano | December 6, 2023 | 32K tokens | On-device model (1.8B and 3.25B parameter variants); runs on mobile without cloud |
| Gemini 1.5 Pro | February 2024 | 1M tokens (2M via waitlist) | Mixture-of-Experts architecture; major context window expansion; stronger than 1.0 Ultra |
| Gemini 1.5 Flash | May 14, 2024 | 1M tokens | Distilled from 1.5 Pro; optimized for speed and cost; fast summarization and captioning |
| Gemini 2.0 Flash (Experimental) | December 11, 2024 | 1M tokens | Multimodal Live API; native image generation; native tool use; 2x faster than 1.5 Pro |
| Gemini 2.0 Flash (Stable) | January 30, 2025 | 1M tokens | Became default model; general availability |
| Gemini 2.0 Flash-Lite | February 25, 2025 | 1M tokens | Most affordable option ($0.075/1M input); 8K max output; deprecated June 2026 |
| Gemini 2.0 Pro | February 5, 2025 | 1M tokens | Higher-capability 2.0 variant for complex tasks |
| Gemini 2.5 Pro | March 25, 2025 | 1M tokens (2M later) | First thinking model; #1 on LMArena; configurable thinking budget; Deep Think mode |
| Gemini 2.5 Flash | May 20, 2025 | 1M tokens | First Flash model with thinking capabilities; cost-efficient reasoning |
| Gemini 2.5 Flash-Lite | June 17, 2025 | 1M tokens (64K output) | Low-latency balanced model; $0.10/1M input; stable July 22, 2025 |
| Gemini 3 Pro | November 18, 2025 | 1M tokens | 1501 LMArena Elo; 91.9% GPQA Diamond; persistent memory; agentic behavior |
| Gemini 3 Flash | December 17, 2025 | 1M tokens | 78% SWE-bench Verified; default Gemini app model globally |
| Gemini 3 Deep Think | February 12, 2026 | 1M tokens (192K output) | 84.6% ARC-AGI-2; 48.4% Humanity's Last Exam; 3,455 Codeforces Elo |
| Gemini 3.1 Pro (Preview) | February 19, 2026 | 1M tokens (64K output) | 77.1% ARC-AGI-2; 94.3% GPQA Diamond; three thinking levels; SVG/3D generation |
| Gemini 3.1 Flash Lite (Preview) | March 3, 2026 | 1M tokens | 1432 Arena.ai Elo; 86.9% GPQA Diamond; 2.5x faster TTFT than 2.5 Flash |
Gemini is built on a decoder-only Transformer architecture that receives interleaved multimodal tokens (text, visual, audio, video). Unlike earlier Google models that processed text primarily and used separate modules for other modalities, Gemini was designed from the ground up to be natively multimodal. Input from different modalities is converted into a shared token space through learned encoders, allowing the model to reason across modalities within the same forward pass [28].
The architecture employs Multi-Query Attention for latency improvements and supports scalable context windows reaching up to 2 million tokens in the 2.5 series. The decoder processes these interleaved tokens sequentially, generating output that can include text, images, audio, or code depending on the task [29].
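The Multi-Query Attention variant mentioned above can be sketched in NumPy. This is an illustrative toy implementation (small dimensions, no batching, causal masking), not Gemini's actual code; the essential property is that all query heads attend over a single shared key/value projection, which shrinks the key/value cache by a factor of the head count and reduces decoding latency:

```python
import numpy as np

def multi_query_attention(x, W_q, W_k, W_v, n_heads):
    """Multi-Query Attention: n_heads query projections share one K/V head.

    x:   (seq, d_model) token representations
    W_q: (d_model, n_heads * d_head)  per-head query projections
    W_k: (d_model, d_head)            single shared key projection
    W_v: (d_model, d_head)            single shared value projection
    """
    seq, d_model = x.shape
    d_head = W_k.shape[1]
    q = (x @ W_q).reshape(seq, n_heads, d_head)   # one query per head
    k = x @ W_k                                    # shared across all heads
    v = x @ W_v                                    # shared across all heads
    # scores: (n_heads, seq, seq) — every head attends over the same K
    scores = np.einsum('shd,td->hst', q, k) / np.sqrt(d_head)
    # causal mask so position i only attends to positions <= i
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum('hst,td->shd', weights, v)     # (seq, n_heads, d_head)
    return out.reshape(seq, n_heads * d_head)

rng = np.random.default_rng(0)
seq, d_model, n_heads, d_head = 5, 16, 4, 8
x = rng.normal(size=(seq, d_model))
out = multi_query_attention(
    x,
    rng.normal(size=(d_model, n_heads * d_head)),
    rng.normal(size=(d_model, d_head)),  # K/V cache is n_heads times smaller
    rng.normal(size=(d_model, d_head)),
    n_heads,
)
assert out.shape == (seq, n_heads * d_head)
```

Compared to standard multi-head attention, only the K/V projections change: storing one key/value head instead of `n_heads` of them is what cuts cache memory during long-context decoding.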
Starting with Gemini 1.5, the model family adopted a Mixture-of-Experts (MoE) architecture. In a traditional dense Transformer, every parameter is activated for every input token. In an MoE model, the network is divided into smaller specialized "expert" sub-networks, and a routing mechanism learns to activate only the most relevant experts for each input token [10].
This approach allows the model to have a very large total parameter count while keeping the computational cost per token low. For Gemini 3 Pro, the model has over 1 trillion total parameters, but only 15 to 20 billion parameters are activated per query. Typical configurations use top-k expert routing with k=2, meaning two experts are selected per token. This selective activation enables massive parameter scaling while optimizing compute efficiency and reducing serving costs [29].
Gemini's multimodal architecture follows a unified design philosophy. Rather than bolting separate vision or audio models onto a text backbone (as some competitors do), Gemini uses a single core Transformer that handles all modalities through a shared token representation. Visual inputs are processed through learned visual encoders, audio through audio encoders, and so on, with all modalities mapped into the same embedding space before being fed to the Transformer [28].
Beginning with Gemini 2.0, the model also gained native multimodal output capabilities. It can generate interleaved text and images in a single response, produce controllable text-to-speech with watermarking, and generate structured outputs combining multiple formats. This bidirectional multimodal capability (understanding and generating across modalities) distinguishes Gemini from many competitors that handle multimodal input but produce text-only output [13].
One of Gemini's most distinctive features is its extremely large context window. Starting with Gemini 1.5 Pro's 1-million-token context (expanded to 2 million tokens by late 2024), the model can ingest and reason over entire books, hours of video, full codebases, and lengthy document collections in a single query [11].
Google has demonstrated near-perfect recall across these long contexts. In testing, Gemini 1.5 Pro could find specific details in 3 hours of video footage and maintain coherence across 22 hours of audio. The model also generalizes zero-shot to tasks involving long inputs, such as following complex instructions spanning thousands of pages [11].
The Gemini 3 series maintains the 1-million-token context window, while the 2.5 series offered up to 2 million tokens. Gemini 3.1 Pro supports 1 million tokens of input and up to 64,000 tokens of output [1].
With the introduction of the 2.5 series in March 2025, Gemini gained explicit reasoning ("thinking") capabilities. In this mode, the model works through a problem step by step internally before producing its final answer, similar to chain-of-thought prompting but as a native capability [16].
Developers can control the thinking budget, choosing how much computation the model dedicates to reasoning. The 3.1 Pro model offers three discrete thinking levels: Low, Medium, and High. Higher thinking budgets generally produce more accurate results on complex mathematical, scientific, and coding problems, at the cost of increased latency and token usage [1].
Deep Think mode goes further, using advanced techniques including parallel thinking to enable the model to explore multiple hypotheses before selecting the best answer. Deep Think explicitly dedicates more inference-time compute and can generate up to 192,000 tokens of reasoning output. This mode has proven especially effective on competition-level mathematics (reaching gold-medal standard at the 2025 International Mathematical Olympiad in research evaluations), complex coding benchmarks, and scientific reasoning challenges [18][25].
Gemini models have shown strong performance across coding benchmarks. Gemini 3 Pro achieved a 2,439 Elo on LiveCodeBench (a competitive programming benchmark) and 76.2% on SWE-bench Verified, which measures the ability to solve real-world software engineering tasks from GitHub issues [22].
Gemini 3 Flash scored even higher on SWE-bench Verified at 78%, while Gemini 3.1 Pro reached a 2887 Elo on LiveCodeBench Pro. Gemini 3 Deep Think pushed competitive programming even further, achieving 3,455 Elo on Codeforces. These scores place Gemini models among the top performers for agentic coding tasks, where the model must understand a codebase, diagnose issues, and generate correct patches [24][1][25].
Gemini natively processes text, images, audio, video, PDFs, and code. This goes beyond simple image captioning: the model can reason about relationships between objects in images, understand video narratives across time, transcribe and analyze audio content, and work with documents that combine text and visual elements [28].
The multimodal reasoning capability was tested extensively on MMMU, a benchmark requiring understanding of images, charts, diagrams, and text together. Gemini 2.5 Pro with Deep Think scored 84.0% on MMMU, while the 3 series continued to improve on multimodal benchmarks [18].
The following table shows selected benchmark results across Gemini generations, illustrating the model family's progression.
| Benchmark | Gemini 1.0 Ultra | Gemini 1.5 Pro | Gemini 2.5 Pro | Gemini 3 Pro | Gemini 3 Deep Think | Gemini 3.1 Pro |
|---|---|---|---|---|---|---|
| MMLU | 90.0% | N/A | N/A | N/A | N/A | N/A |
| GPQA Diamond | N/A | N/A | 84.0% | 91.9% | 93.8% | 94.3% |
| LMArena Elo | N/A | N/A | #1 (debut) | 1501 | N/A | N/A |
| SWE-bench Verified | N/A | N/A | 63.8% | 76.2% | N/A | N/A |
| ARC-AGI-2 | N/A | N/A | N/A | 31.1% | 84.6% | 77.1% |
| LiveCodeBench / Codeforces | N/A | N/A | N/A | 2,439 Elo (LiveCodeBench) | 3,455 Elo (Codeforces) | 2,887 Elo (LiveCodeBench Pro) |
| MMMU (Deep Think) | N/A | N/A | 84.0% | N/A | N/A | N/A |
| Humanity's Last Exam (no tools) | N/A | N/A | 18.8% | 37.5% | 48.4% | N/A |
| AIME 2025 | N/A | N/A | 86.7% | N/A | N/A | N/A |
These results reflect different time periods and testing conditions; direct cross-generation comparisons should be made cautiously since benchmarks evolve and testing methodologies change.
The Gemini app is Google's consumer-facing AI assistant, accessible through the web (gemini.google.com), a dedicated Android app, and integration within the Google app on iOS.
The app originated as Bard, which Google launched on March 21, 2023, as a direct competitor to ChatGPT. Bard initially ran on LaMDA and later upgraded to PaLM 2. On February 8, 2024, Google rebranded Bard as Gemini, aligning the consumer product with the underlying model family [8].
The rebrand coincided with the launch of Gemini Advanced, a premium tier powered by Gemini Ultra 1.0 (later upgraded to newer models). In May 2025, Google restructured its consumer AI subscriptions into a three-tier system: Google AI Plus, Google AI Pro, and Google AI Ultra, replacing the previous unified AI Premium offering [9].
As of March 2026, Google offers three paid subscription tiers for Gemini access.
| Feature | AI Plus ($7.99/mo) | AI Pro ($19.99/mo) | AI Ultra ($249.99/mo) |
|---|---|---|---|
| Context window | 128K tokens | 1M tokens | 1M tokens |
| Thinking prompts/day | 90 | 300 | 1,500 |
| Pro model prompts/day | 30 | 100 | 500 |
| Deep Think 3.1 access | No | No | 10 prompts/day (192K output) |
| Deep Research reports/day | 12 | 20 | 120 |
| Image generation/day | 50 | 100 | 1,000 |
| Cloud storage | 200 GB | 2 TB | 30 TB |
| Jules coding agent | Basic | 5x limits | 20x limits |
| Project Mariner | No | No | Yes (10 simultaneous tasks) |
| Workspace AI integration | No | Yes | Yes |
| YouTube Premium | No | No | Included |
Google AI Plus launched in the US on January 27, 2026, at $7.99 per month (with a promotional rate of $3.99 for the first two months), providing an entry-level paid tier between the free account and the $19.99 Pro plan. The plan includes access to Gemini 3 Pro, Deep Research, NotebookLM Plus, and can be shared with up to five family members [30].
Google AI Ultra, priced at $249.99 per month, includes exclusive access to Deep Think 3.1, Project Mariner for autonomous web browsing, Veo 3.1 video generation with audio, 30 TB of storage, YouTube Premium, and the highest limits for Jules and other agentic features [27].
As of March 2026, the Gemini app defaults to Gemini 3 Flash globally, with paid subscribers having access to higher-tier models. The app supports text, voice, and image inputs. Users can upload documents, images, and files for analysis. The app integrates with Google services, allowing it to access Gmail, Google Drive, Google Maps, and other tools to answer queries in context [24].
Google began rolling out AI Overviews in Search in 2024, using custom Gemini models to generate summary answers that appear at the top of search results. By the end of 2024, AI Overviews had reached users in the U.S. and was expanding globally. At Google I/O 2025, the company introduced "AI Mode" in Search, powered by Gemini 2.5, providing more interactive and conversational search experiences [31].
Gemini has been integrated across Google Workspace applications including Gmail, Docs, Sheets, Slides, Drive, and Chat. The integration provides AI assistance directly in the workflow: summarizing email threads in Gmail, drafting documents in Docs, generating formulas and analysis in Sheets, and creating presentations in Slides [32].
In 2025, Google included Gemini AI features in the cost of Google Workspace Business and Enterprise plans, removing the need for a separate add-on. Gemini in Workspace can reference information from multiple files within Google Drive when generating content or suggestions, making it context-aware across an organization's documents [32].
Gemini serves as the default AI assistant on Android devices, replacing Google Assistant for many tasks. Gemini Nano runs on-device on supported phones (starting with the Google Pixel 8 Pro and Samsung Galaxy S24 series), enabling AI features like summarization and smart reply without requiring an internet connection. The full Gemini app provides more advanced capabilities through cloud-based models [8].
Google has used Gemini as the foundation for several AI agent initiatives that go beyond traditional chat interactions.
Project Astra is a Google DeepMind research prototype exploring the development of a universal AI assistant. Built on Gemini 2.5 Pro (and later Gemini 3), Astra processes real-time video, audio, and text as a single continuous stream, enabling it to see, hear, remember, and reason about its environment [33].
First previewed in 2024, Project Astra was significantly expanded at Google I/O 2025. It powers a new "Search Live" feature in Google Search, where users can click a "Live" button while using AI Mode or Google Lens to ask questions about what they are seeing through their smartphone camera. Astra streams live video and audio into the model and responds with minimal latency [33].
Google's collaboration with Samsung on Android XR smart glasses (under the "Project Moohan" codename) aims to provide a physical form factor for Astra, enabling features like real-time visual labeling, instant sign translation, and step-by-step repair instructions overlaid on physical objects [33].
Project Mariner is an AI agent prototype from Google DeepMind that autonomously executes tasks within the Chrome browser. It can browse websites, fill out forms, retrieve information, and complete multi-step web tasks while keeping the user informed and allowing intervention at any time [34].
First unveiled in late 2024, Mariner was expanded at Google I/O 2025 with the ability to handle up to 10 tasks simultaneously by running on virtual machines in the cloud. This allows users to continue working while Mariner completes tasks in the background. As of March 2026, Project Mariner is available exclusively to Google AI Ultra subscribers at $249.99 per month [34].
Jules is an asynchronous AI coding agent that integrates with GitHub. It clones codebases into secure Google Cloud virtual machines and uses Gemini to autonomously fix bugs, implement features, and update code while developers focus on other work. Jules launched out of beta in August 2025, with a free plan capped at 15 daily tasks and elevated limits for Pro and Ultra subscribers [35].
In October 2025, Google introduced Jules Tools, a command-line interface that brings Jules directly into the developer's terminal. With the release of Gemini 3, Jules was upgraded to use Gemini 3 Pro for improved code understanding and generation [35].
Google AI Studio is the primary entry point for developers experimenting with Gemini models. It offers a web-based interface for testing prompts, tuning models, and generating API keys. The free tier requires no credit card and provides access to multiple models, including Gemini 2.5 Flash, 3 Flash, and 3.1 Flash Lite, with rate limits of 5 to 15 requests per minute and up to 1,000 requests per day [36].
For production workloads, Google offers Gemini through Vertex AI, its enterprise AI platform on Google Cloud. Vertex AI provides the same models at the same per-token pricing but adds features for high-volume deployment, including fine-tuning, batch processing, model evaluation, and enterprise security controls [36].
Launched alongside Gemini 3 on November 18, 2025, Google Antigravity is an agentic development platform that combines a traditional AI-powered code editor with an agent-first "Manager Surface" interface. In the Manager Surface, developers can spawn, orchestrate, and observe multiple AI agents working asynchronously across different workspaces. Agents produce "Artifacts" (task lists, implementation plans, screenshots, browser recordings) that developers can review and comment on. Antigravity is available in public preview at no cost for individuals and supports Gemini 3 Pro, Gemini 3 Flash, and third-party models [23].
Gemini API pricing follows a per-token model. The following table shows pricing as of early 2026.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Most affordable 2.0 option; deprecated June 2026 |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Low-latency balanced model |
| Gemini 2.5 Flash | $0.30 | $2.50 | Thinking model, cost-efficient |
| Gemini 3.1 Flash Lite | $0.25 | $1.50 | Fastest Gemini 3 series model |
| Gemini 3 Flash | $0.50 | $3.00 | Default app model |
| Gemini 3 Pro / 3.1 Pro | $2.00 | $12.00 | Standard context (up to 200K tokens) |
| Gemini 3 Pro / 3.1 Pro | $4.00 | $18.00 | Extended context (over 200K tokens) |
Google also offers a free tier with generous limits for experimentation and prototyping. In 2026, Google introduced Project Spend Caps in AI Studio, allowing developers to set monthly dollar limits on API expenses, and revamped its Usage Tiers system to help developers scale access more quickly [36].
Alongside the proprietary Gemini family, Google DeepMind has released the Gemma series of open-weight models built from the same research and technology that powers Gemini.
Google released the original Gemma models on February 21, 2024, in two sizes: 2B and 7B parameters. Both used a dense decoder architecture with Rotary Positional Embeddings (RoPE) and were available in pre-trained and instruction-tuned variants. Gemma 1 models had an 8,192-token context length [37].
Gemma 2 launched on June 27, 2024, with 9B and 27B parameter sizes (a 2B variant followed on July 31). The 27B model became one of the highest-ranking open models on the LMSYS Chatbot Arena leaderboard, outperforming popular models more than twice its size. Gemma 2 delivered higher performance and greater inference efficiency compared to the first generation, with significant safety improvements [38].
Released on March 12, 2025, Gemma 3 is built from the same research that produced Gemini 2.0. It comes in four sizes (1B, 4B, 12B, and 27B), with a later 270M variant released in August 2025 for task-specific fine-tuning. Key advances include support for over 140 languages, vision understanding, a 128K-token context window, and function calling. In preliminary human preference evaluations on LMArena, Gemma 3 outperformed LLaMA 3 405B, DeepSeek-V3, and o3-mini [39].
Announced at Google I/O 2025 and released on June 26, 2025, Gemma 3n is a mobile-first architecture developed in collaboration with Qualcomm, MediaTek, and Samsung. It uses a novel MatFormer (Matryoshka Transformer) design that nests a smaller model inside a larger one, enabling elastic inference where developers can choose between the full model and a smaller, faster sub-model [40].
Gemma 3n is available in E2B and E4B sizes (with raw parameter counts of 5B and 8B, respectively) but runs with a memory footprint comparable to traditional 2B and 4B models, operating with as little as 2 GB of memory. It was the first sub-10B parameter model to exceed 1,300 points on LMArena. Gemma 3n supports text, image, audio, and video understanding across 140 languages [40].
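The nesting idea behind MatFormer can be illustrated with a toy feed-forward layer: the sub-model's weights are a prefix slice of the full model's weight matrices, so a single set of parameters serves both an accurate full path and a cheaper elastic path. This is a minimal conceptual sketch under that assumption, not Gemma 3n's actual architecture; all names and dimensions here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 8    # hidden size, shared by full model and sub-model
FF_FULL = 32   # full model's feed-forward width
FF_SUB = 8     # nested sub-model uses only the first 8 units

# One set of weights; the sub-model is a prefix slice of the same tensors,
# mirroring how a Matryoshka-style design nests a small model in a large one.
W_up = rng.normal(size=(D_MODEL, FF_FULL))
W_down = rng.normal(size=(FF_FULL, D_MODEL))


def ffn(x: np.ndarray, ff_width: int) -> np.ndarray:
    """Feed-forward block using only the first `ff_width` hidden units.

    ff_width=FF_FULL runs the full model; smaller values run a cheaper
    nested sub-model with no separate weights to store or load.
    """
    h = np.maximum(x @ W_up[:, :ff_width], 0.0)  # ReLU over a weight slice
    return h @ W_down[:ff_width, :]


x = rng.normal(size=(1, D_MODEL))
full_out = ffn(x, FF_FULL)  # full-capacity path
sub_out = ffn(x, FF_SUB)    # elastic, faster path on the same weights
print(full_out.shape, sub_out.shape)  # both (1, 8)
```

Because both paths read from the same tensors, a deployment can pick the sub-model for latency-sensitive requests and the full model for hard ones, which is the "elastic inference" trade-off the Gemma 3n announcement describes.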
Gemini competes primarily with OpenAI's GPT series and Anthropic's Claude family, along with open-weight alternatives such as Meta's LLaMA and DeepSeek.
OpenAI released GPT-5 on August 7, 2025, and has since followed with GPT-5.2, which offers a 400,000-token context window and achieved a perfect 100% score on AIME 2025 without external tools. On SWE-bench Verified, GPT-5.2 scored 74.9%, trailing both Gemini 3 Pro (76.2%) and Claude Opus 4.5 (80.9%). Gemini maintains a significant advantage in context window size, with its 1-million-token window being 2.5 times larger than GPT-5.2's [41].
Anthropic released Claude Opus 4.5 on November 24, 2025. Claude leads on real-world coding tasks with an 80.9% SWE-bench Verified score. However, Gemini 3 Pro dominates algorithmic and competitive programming with its 2,439 LiveCodeBench Elo (and Gemini 3 Deep Think reaches 3,455 Codeforces Elo). Claude offers a 200,000-token context window, significantly smaller than Gemini's 1-million-token window. The models have different strengths: Gemini excels in multimodal reasoning and long-context tasks, while Claude is known for strong instruction following and coding accuracy [41].
All three families continue to leapfrog each other on benchmarks with each new release, and the competitive gap between frontier models has narrowed considerably since 2024.
As of March 2026, Google offers a full spectrum of Gemini models serving different needs. Gemini 3.1 Pro Preview is the most capable standard model, leading on 12 of 18 tracked benchmarks, while Gemini 3 Deep Think holds the record for abstract reasoning with its 84.6% ARC-AGI-2 score. Gemini 3.1 Flash Lite provides the fastest and most cost-efficient option in the Gemini 3 family for high-volume workloads.
Gemini 3 Flash serves as the default model for most consumer interactions, balancing high performance with low cost and latency. Gemini models power AI features across virtually all Google products, from Search to Workspace to Android, serving hundreds of millions of users daily.
Google's three-tier subscription structure (AI Plus at $7.99, AI Pro at $19.99, and AI Ultra at $249.99 per month) segments consumer access, with the Ultra tier unlocking advanced agentic capabilities through Project Mariner, elevated Jules limits, and exclusive Deep Think access.
The Gemini API has become one of the most widely used AI APIs globally, competing with OpenAI's API for developer mindshare. Google's strategy of offering a generous free tier and competitive pricing has helped attract developers, particularly those already within the Google Cloud ecosystem.
The Gemma open model family continues to expand, with Gemma 3 and Gemma 3n providing competitive open-weight alternatives that run efficiently on consumer hardware and mobile devices.