Gemini Nano
Last reviewed
May 1, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,000 words
Gemini Nano is the smallest variant of Google's Gemini family of multimodal large language models, designed to run efficiently on mobile devices and other edge hardware rather than in cloud data centers. It was introduced on December 6, 2023 by Google CEO Sundar Pichai alongside the larger Gemini Pro and Gemini Ultra models, and first shipped on the Pixel 8 Pro in the December 2023 Pixel Feature Drop. Nano powers on-device features such as summarization in the Recorder app, Smart Reply in Gboard, Magic Compose in Google Messages, Pixel Screenshots search, and image descriptions in TalkBack, while keeping user data on the device.
The original Gemini technical report (arXiv:2312.11805) describes Nano as a pair of small models distilled from larger Gemini variants and quantized to 4 bits. Gemini Nano-1 has 1.8 billion parameters and targets low-memory devices, while Gemini Nano-2 has 3.25 billion parameters and targets higher-memory devices. Google has since expanded Nano beyond Pixel through the AICore system service and the Google AI Edge SDK on Android, brought it into desktop Chrome through the built-in Prompt API, and added a multimodal Nano with image and audio understanding on the Pixel 9 series. With Apple's 2024 release of Apple Intelligence, Gemini Nano became the most prominent counterpart to Apple's on-device foundation models in the smartphone market.
Google announced the Gemini family in a blog post by Sundar Pichai and DeepMind CEO Demis Hassabis on December 6, 2023. Pichai described Gemini as Google's "most capable and general model yet" and said it had been "built to be multimodal" from the start, handling text, code, audio, images, and video together rather than stitching modalities together later. The first generation, Gemini 1.0, was released in three sizes: Ultra for highly complex tasks, Pro for a balance of capability and scale, and Nano for on-device tasks.
The accompanying technical report, "Gemini: A Family of Highly Capable Multimodal Models," appeared on arXiv as paper 2312.11805 on December 19, 2023, with later revisions through 2024 and 2025. The report describes Gemini Nano as two small models, Nano-1 with 1.8 billion parameters and Nano-2 with 3.25 billion parameters, both trained by distilling from larger Gemini models and quantized to 4 bits. The paper frames Nano as the variant intended for memory-constrained, on-device deployment, in contrast to Pro and Ultra, which were initially served from Google's data centers.
The Pixel 8 Pro was the first device to ship Gemini Nano. Google's December 2023 Pixel Feature Drop, also published on December 6, 2023, announced that Nano was running on the phone and powering two production features: Summarize in Recorder, which generated bullet-point summaries of voice recordings without a network connection, and Smart Reply in Gboard, which produced contextual responses for messaging apps including WhatsApp, Line, and KakaoTalk. Both ran locally on the Tensor G3 chip. Google's blog called the Pixel 8 Pro "the first smartphone with AI built in," referring to the system-level integration of an on-device large language model.
Gemini Nano sits inside a wider model family that has grown across multiple generations.
| Variant | First released | Size class | Runs in | Notes |
|---|---|---|---|---|
| Gemini Ultra | Dec 2023 (1.0) | Frontier | Cloud | Largest 1.0 model, served via Bard / Gemini Advanced |
| Gemini Pro | Dec 2023 (1.0) | Mid-tier | Cloud | Default API model in early 2024 |
| Gemini Nano | Dec 2023 (1.0) | 1.8B and 3.25B | On-device | Distilled and 4-bit quantized |
| Gemini 1.5 Pro / Flash | Feb and May 2024 | Mid-tier | Cloud | Long context, Mixture-of-Experts (MoE) |
| Gemini 2.0 Flash | Jan 2025 | Compact cloud | Cloud | Default for the Gemini app and API |
| Gemini 2.0 Pro | Feb 2025 | Mid-tier | Cloud | Two-million-token context |
| Gemini 2.5 family | 2025 | Mid-tier and large | Cloud | DeepMind-led generation introduced in March 2025 and expanded at I/O 2025 |
| Gemini 3 Pro | Late 2025 | Frontier | Cloud | Successor to the 2.5 line |
Nano is the only branch explicitly designed for on-device use. The cloud Flash models are smaller and cheaper than Pro or Ultra but still run in Google's data centers. Nano carries the Gemini brand into phones, watches, and browsers, with Google shipping updated Nano variants alongside each generation, usually a release or two behind the cloud models.
Gemini Nano shares the broader Gemini architecture, a decoder-only Transformer trained on text, code, images, and audio. Two design choices distinguish Nano from the Pro and Ultra variants: distillation and aggressive quantization.
Distillation means the Nano models are trained to imitate the outputs of larger Gemini models rather than learning purely from raw training data. The student model, Nano-1 or Nano-2, sees both the original training corpus and the probability distributions over tokens produced by a larger Gemini teacher. This lets a small model carry over a useful share of the teacher's behavior in a much smaller parameter budget, and the Gemini paper credits this approach for Nano's relatively strong performance per parameter on MMLU and reading comprehension.
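The Gemini report does not publish the exact training objective, so the following is only a minimal sketch of the textbook knowledge-distillation loss, not Google's implementation. It blends a hard-label term (the corpus token) with a soft-label term (the teacher's token distribution). The `temperature` and `alpha` values, and all function names, are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_index,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL(teacher || student).

    temperature and alpha are assumed values, not figures from the report.
    """
    # Hard-label term: score the student against the token from the corpus.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[true_index])

    # Soft-label term: pull the student toward the teacher's distribution.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher_soft, student_soft))

    # Scaling by temperature**2 is the conventional correction that keeps
    # the soft term comparable in magnitude to the hard term.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl
```

When the student's logits equal the teacher's, the soft term vanishes and only the hard-label term remains; the further the student drifts from the teacher, the larger the loss.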
Quantization to 4 bits per weight roughly quarters the storage and memory bandwidth required compared to a 16-bit floating point representation. With 4-bit weights, Nano-1 fits in roughly 1 to 1.5 gigabytes of memory and Nano-2 in around 2 to 3 gigabytes, depending on activation precision and any LoRA (Low-Rank Adaptation) adapters loaded on top. Google has not published every detail of the quantization scheme, but Android documentation refers to 4-bit Nano weights running through the AICore runtime on Tensor and partner NPUs.
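The "roughly quarters" claim can be checked with simple arithmetic. The sketch below uses the parameter counts from the technical report and counts only the weights themselves; it assumes decimal gigabytes and ignores quantization scales, activations, and runtime overhead, none of which Google has itemized.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Storage for the weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# Parameter counts as stated in the Gemini technical report.
MODELS = {"Nano-1": 1.8e9, "Nano-2": 3.25e9}

for name, params in MODELS.items():
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{name}: {fp16:.2f} GB at 16-bit, {int4:.2f} GB at 4-bit "
          f"({fp16 / int4:.0f}x smaller)")
```

Under these assumptions the raw 4-bit weights come to about 0.9 GB for Nano-1 and about 1.6 GB for Nano-2. The larger in-memory ranges quoted above would then reflect whatever sits on top of the weights, such as quantization metadata, activations, and any loaded adapters.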
Google has kept Nano as a closed-weights model. There are no public weight downloads, and developers interact with Nano only through Google's APIs on Android, Chrome, and partner devices. The related Gemma family is Google's open-weights line of small Transformer models, but it is a separate brand with its own checkpoints; Nano remains the proprietary on-device model that Google ships through its own surfaces.
Gemini Nano covers a narrower range of tasks than its cloud siblings, but it handles the everyday workloads that benefit most from low latency and on-device privacy. These include text generation and rewriting (including the tone shifts used by Magic Compose in Google Messages), summarization of long inputs such as Recorder audio transcripts and chat threads, smart reply and short-message proofreading, and, on Pixel 9 and later, multimodal text, image, and audio understanding through the Nano with Multimodality variant. The same multimodal model powers richer transcript summaries and detailed image descriptions for accessibility tools. Sensor- and context-aware features combine Nano output with on-device signals such as call audio, screenshots, or app context.
Pixel Screenshots, introduced with the Pixel 9 family, is a representative case. It lets a user search saved screenshots in plain English and pulls out details like flight numbers, Wi-Fi passwords, or recipe ingredients. Nano runs locally so the screenshots themselves never leave the phone.
Google has expanded Gemini Nano well beyond the original Pixel 8 Pro launch.
| Platform | First Nano support | Surface |
|---|---|---|
| Pixel 8 Pro | Dec 2023 Feature Drop | Recorder summaries, Gboard Smart Reply |
| Pixel 8 / 8a | Jun 2024 Feature Drop | Same features as 8 Pro, gated by 8 GB RAM |
| Pixel 9, 9 Pro, 9 Pro XL, 9 Pro Fold | Aug 2024 | Multimodal Nano, Pixel Screenshots, Pixel Studio, Call Notes |
| Pixel 9a | 2025 | Smaller "Nano XXS" variant |
| Pixel 10 family | 2025 | Tensor G5: Magic Cue, Voice Translate, Pro Res Zoom |
| Samsung Galaxy S24 | Jan 2024 | On-device Circle to Search, Magic Compose, Photomoji |
| Galaxy S24 FE, Z Fold 6, Z Flip 6 | 2024 | Nano-backed Google features via AICore |
| Motorola Edge 50 Ultra, Razr 50 Ultra | 2024 | AICore-based Nano features |
| Xiaomi 14T series, MIX Flip | 2024 | Announced AICore support |
| Android via AI Edge SDK | Oct 2024 (experimental) | Text-to-text prompts on Pixel 9, expanding |
| Chrome desktop (Prompt API) | Chrome 138 (2025) | LanguageModel JavaScript API in the browser |
Samsung's Galaxy S24 series was the first non-Pixel phone to run Gemini Nano, bundled through AICore and powering Google-branded features such as Magic Compose in Messages and the on-device portions of Circle to Search; heavier work falls back to cloud-served Gemini Pro. The same AICore plumbing has since shipped on Motorola and Xiaomi flagships, with Tensor, Qualcomm Snapdragon, and MediaTek Dimensity NPUs as the supported acceleration paths.
The layer between user-facing apps and Gemini Nano on Android is built around two pieces. AICore is an Android system service introduced with Android 14. It manages the Nano model weights, runs inference on the Tensor TPU or partner NPU, applies safety filters, and updates the model out of band so apps do not have to bundle large weights themselves. Google describes AICore as "private by design": the service has restricted internet access, runs each request in isolation, and uses the Private Compute Core architecture from earlier Pixel features such as Smart Reply.
The Google AI Edge SDK is the developer-facing API for AICore. The first SDK package, com.google.ai.edge.aicore:aicore:0.0.1-exp01, opened experimental access on October 1, 2024, initially limited to text-to-text prompts on Pixel 9 series devices. The SDK lets a developer set parameters such as temperature, max output tokens, and number of candidates, attach a LoRA adapter for task-specific tuning, and stream tokens as they are produced. The wider Google AI Edge effort also covers LiteRT (formerly TensorFlow Lite) and a broader runtime called LiteRT-LM aimed at small language models on edge devices.
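To make the temperature and candidate-count parameters concrete, the sketch below shows the generic sampling mechanics they usually control. It is not AI Edge SDK code: the function names, the logits, and the seed are all hypothetical, and the SDK's internal sampler is not documented here.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Return token probabilities after dividing logits by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_candidates(logits, temperature, num_candidates, seed=0):
    """Draw one token index per candidate from the tempered distribution."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    probs = apply_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=num_candidates)

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three tokens
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])
```

Lower temperatures concentrate probability on the top-scoring token, which suits tasks like proofreading; higher temperatures flatten the distribution, which makes multiple candidates more likely to differ from one another.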
ML Kit GenAI APIs sit one level above the AI Edge SDK and give Android developers ready-made entry points: Prompt, Summarization, Proofreading, Rewriting, Image Description, and Speech Recognition. These were announced in May 2025.
In Chrome, Google added a built-in AI stack starting in 2024 with experimental flags and stabilized parts of it through Chrome 137 and 138 in 2025. The Prompt API exposes Gemini Nano through a JavaScript LanguageModel global object, with promise-based and streaming interfaces for text generation. Browser support requires Windows 10 or 11, macOS 13 or later, a recent Linux distribution, or ChromeOS on Chromebook Plus, plus around 22 gigabytes of free disk space on the Chrome profile volume to hold the model. Earlier reporting referred to the namespace as window.ai, but the released API uses LanguageModel directly and ships in Chrome rather than as a third-party shim.
Production features using Gemini Nano on Android and Chrome include Summarize in Recorder (offline bullet summaries of voice recordings), Smart Reply in Gboard (contextual chips for WhatsApp, Line, KakaoTalk), Magic Compose in Google Messages (tone rewriting), Pixel Screenshots (natural-language search through saved screenshots), Pixel Studio (on-device image generation parts), Call Notes (call transcription and summary), Weather AI Reports, and TalkBack image descriptions. On Pixel 10, Nano powers Magic Cue, Voice Translate, and Pro Res Zoom. Pixel Watch 4 and Pixel Buds 2a use Nano for Raise to Talk, message summaries, and recommendations. In Chrome, the Prompt API lets web pages run summarization, translation, and generation directly in the browser. Third-party Android apps build on the same model through ML Kit GenAI, the AI Edge SDK, or AICore.
Google has not published a full benchmark suite for the shipping Nano models, but several public numbers give a sense of the range. The Gemini 1.0 paper reports MMLU and BoolQ scores for Nano-1 and Nano-2 well below Pro and Ultra but competitive with other small Transformer models of similar size. Google's October 2024 AI Edge SDK post notes that the experimental on-device Nano (described in academic literature as Nano 2) scored 56% on MMLU compared with 46% for the earlier version, 23% on math compared with 14%, 90% on paraphrasing compared with 44%, and 82% on smart reply compared with 44%. These are internal task evaluations, not standardized public benchmarks, but they show the trajectory of the on-device model since launch.
Latency on Tensor G3 and G5 hardware is in the low hundreds of milliseconds for short prompts, with the Pixel 8 Pro launch features designed for real-time keyboard suggestions and short summaries. Memory footprint is dominated by the 4-bit weights, which fit in roughly 1 to 1.5 gigabytes for Nano-1 and 2 to 3 gigabytes for Nano-2, plus working memory for the KV cache. Battery cost is low enough for Recorder summaries and Gboard suggestions to run frequently without unusual drain, although Google has gated longer-form features such as Recorder summarization of multi-hour audio to higher-RAM Pixel 9 devices.
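The KV cache term grows with context length, which is one reason long inputs are harder to serve on low-RAM phones. The sketch below applies the standard transformer estimate. Every architectural number in it (layer count, key-value heads, head dimension, context length, 16-bit cache values) is a hypothetical placeholder, because Google has not published Nano's layout.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Memory for cached keys and values across all layers."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value

# Hypothetical configuration, chosen only to show the scaling behavior.
cache = kv_cache_bytes(layers=24, kv_heads=8, head_dim=128, context_tokens=4096)
print(f"{cache / 1e6:.0f} MB of KV cache")
```

The estimate is linear in `context_tokens`, so doubling the prompt length doubles the cache, while the weight footprint stays fixed.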
Quality on hard tasks lags behind cloud-served Gemini Pro, Gemini Flash, and frontier models from other vendors. Reviews of the launch features were mixed: AndroidPolice's Pixel 8 Pro hands-on found the Recorder summaries useful but the Gboard Smart Reply chips often bland or off-topic, and 9to5Google noted that Smart Reply latency was visible on the first builds. Features have improved with subsequent Nano updates and tighter prompt tuning, but the on-device model is best understood as a fast assistant for short, well-scoped tasks rather than a frontier reasoning system.
Gemini Nano is one of several small language models positioned for on-device use. The table compares Nano with the most prominent peers, using parameter counts disclosed in papers, blog posts, or model cards.
| Model | Vendor | Parameters | Quantization | On-device | Multimodal | First disclosed |
|---|---|---|---|---|---|---|
| Gemini Nano-1 | Google | ~1.8B | 4-bit | Yes | Text, image, audio (multimodal variant) | Dec 2023 |
| Gemini Nano-2 | Google | ~3.25B | 4-bit | Yes | Text, image, audio | Dec 2023 |
| Apple Foundation Models (on-device) | Apple | ~3B | Mixed group | Yes | Text, image | Jun 2024 |
| Phi-3 Mini | Microsoft | 3.8B | 4-bit possible | Yes (community) | Text (image variant separate) | Apr 2024 |
| Llama 3.2 1B | Meta | ~1.2B | 4-bit possible | Yes (community) | Text | Sep 2024 |
| Llama 3.2 3B | Meta | ~3.2B | 4-bit possible | Yes (community) | Text | Sep 2024 |
| Llama 3.2 11B Vision | Meta | ~11B | Mixed | Edge servers / high-end | Text, image | Sep 2024 |
| Mistral 7B (small variants) | Mistral AI | 7B | 4-bit possible | Yes (community) | Text | Sep 2023 |
| Qwen2 0.5B / 1.5B | Alibaba | 0.5B / 1.5B | 4-bit possible | Yes | Text | Jun 2024 |
| GPT-4o mini | OpenAI | Undisclosed | n/a | No (cloud only) | Text, image, audio | Jul 2024 |
Gemini Nano is unusual on this list for two reasons. It ships pre-installed on consumer phones through a system service rather than as a hobbyist download, and it is gated behind Google's API rather than released as open weights. Llama and Phi are open-weights models that rely on third-party runtimes such as llama.cpp or Ollama to run on phones, while Apple Intelligence is a parallel system-level integration locked to Apple silicon. GPT-4o mini is a cloud-hosted OpenAI model, included in the table only to make clear that it is not a peer of Nano in the on-device sense.
Apple's response to Gemini Nano arrived at WWDC in June 2024 as Apple Intelligence. Both systems target similar features (writing tools, summaries, smart replies, image descriptions, on-device assistants), but they differ on hardware, model layout, and trust model.
| Aspect | Gemini Nano | Apple Intelligence |
|---|---|---|
| Vendor | Google | Apple |
| On-device model size | ~1.8B (Nano-1), ~3.25B (Nano-2) | ~3B on-device foundation model |
| Cloud fallback | Cloud Gemini Pro / Flash / Ultra | ~8B Private Cloud Compute model on Apple Silicon |
| Hardware | Tensor G3 / G4 / G5, Qualcomm and MediaTek NPUs via AICore | Apple A17 Pro and later, M-series for iPad and Mac |
| Operating systems | Android 14+, ChromeOS, desktop Chrome | iOS 18+, iPadOS 18+, macOS Sequoia+ |
| Trust model | Private Compute Core on-device, Google Cloud terms for fallback | Private Cloud Compute with attested servers, no logging |
| Developer access | AI Edge SDK, ML Kit GenAI, Chrome Prompt API | Writing Tools, Image Playground, App Intents, Foundation Models framework |
| Open weights | No | No |
The two systems represent different bets about what "system-level AI" looks like. Google's bet is that Nano can be a portable layer running on Pixel, partner Android phones, and desktop Chrome, with cloud Gemini available when the on-device model is not enough. Apple's bet is a tighter pair of one on-device model and one private-cloud model, both restricted to Apple Silicon, with third-party cloud models such as ChatGPT plugged in only on user request.
Gemini Nano is constrained by both its size and its deployment context. It is much smaller than cloud Gemini models, so it is weaker on hard reasoning, long-context tasks, and complex coding. The Gemini paper's own benchmarks show Nano-2 well behind Gemini Pro on MMLU, GSM8K, and similar evaluations. For tasks like explaining a legal contract or refactoring a large codebase, the cloud-served Gemini models and competing frontier systems are still the right tools.
4-bit quantization gives up some quality compared to higher-precision weights, in exchange for faster inference and lower memory. Battery, thermal, and memory constraints also limit how aggressively apps can use Nano. Continuous generation works for short bursts, but real-time agent loops and long-running summaries of multi-hour audio are gated to higher-RAM Pixel models, and Google's documentation recommends batching work and avoiding tight loops.
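The quality cost comes from rounding: a 4-bit code can take only 16 distinct values. Google has not published Nano's quantization scheme, so the sketch below uses a simple symmetric scheme with one shared scale as a stand-in; the weights are made-up values and the function names are illustrative.

```python
def quantize_4bit(weights):
    """Map floats to integers in [-8, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs else 1.0
    # Round to the nearest code, then clamp to the signed 4-bit range.
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]

weights = [0.42, -0.17, 0.03, -0.91, 0.66]  # made-up values for illustration
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(codes)
print(f"largest rounding error: {max(errors):.4f}")
```

Each restored weight can differ from the original by up to half the scale. Production schemes typically reduce this error with finer-grained scales per group of weights, at the cost of extra metadata.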
Device support is uneven. Many Nano features are exclusive to Pixel, with Samsung, Motorola, and Xiaomi getting subsets through partnerships. The Pixel 9a uses a smaller "Nano XXS" variant, and older or lower-tier Android phones do not support Nano at all. Chrome's Prompt API requires substantial disk space and is still flagged as experimental for many origins. Nano is also a closed-weights model: there are no public weight downloads, and developers cannot fine-tune the base model directly. Google supports LoRA adapters on top of Nano through AICore for some customization, but the base model remains under Google's control.
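A LoRA adapter leaves the base weights frozen and learns a small low-rank correction, which is why it can be offered without exposing the base model. The sketch below shows the general technique only; the layer size, rank, and function names are hypothetical, since Nano's dimensions and AICore's adapter format are not public.

```python
def lora_param_count(d_out, d_in, rank):
    """Trainable parameters added by one LoRA adapter on a d_out x d_in weight."""
    return rank * (d_out + d_in)

def matmul(a, b):
    """Plain nested-list matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_lora(weight, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * B @ A without modifying the frozen W."""
    delta = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Hypothetical layer size, used only to show the parameter savings.
full = 2048 * 2048
adapter = lora_param_count(2048, 2048, rank=8)
print(f"adapter trains {adapter:,} of {full:,} weights ({adapter / full:.2%})")
```

For this assumed layer the adapter holds 32,768 trainable values against roughly 4.2 million frozen ones, which illustrates why adapters are cheap to ship and swap per task.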
In January 2024, the Galaxy S24 series brought Gemini Nano to its first non-Pixel phone, alongside Circle to Search and Magic Compose. The March and June 2024 Pixel Feature Drops extended Nano features to the Pixel 8 and Pixel 8a, initially through a developer option and then as default features once Google was satisfied the lower-RAM phones could handle them.
Google I/O 2024 previewed Gemini Nano with Multimodality, the variant that adds image and audio understanding alongside text. The Pixel 9 series, launched in August 2024, became the first phones to ship the multimodal Nano. 9to5Google reported that the model is roughly twice the size of the original and substantially more capable on Recorder summaries and image descriptions. The same release added Pixel Screenshots, Pixel Studio, and richer TalkBack image descriptions.
On October 1, 2024, Google opened experimental access to Gemini Nano for all Android developers through the AI Edge SDK and AICore, initially limited to Pixel 9 devices for text-to-text prompts. In May 2025, ML Kit GenAI APIs added higher-level entry points for prompt, summarization, proofreading, rewriting, image description, and speech recognition. Google I/O 2025 included a session titled "Gemini Nano on Android: building with on-device gen AI," framing Nano as a stable platform for third-party apps rather than a Pixel-only experiment.
In 2025, the Pixel 10 series launched on the Tensor G5 with Nano powering Magic Cue, Voice Translate, and Pro Res Zoom, plus on-device portions of the photo-to-video feature in the Gemini app. The Pixel Watch 4 added Raise to Talk, and Pixel Buds 2a added on-device message summaries. Chrome's built-in AI APIs moved from a developer preview to broader availability through Chrome 137 and 138 in 2025, with the Prompt API exposing Nano through a stable LanguageModel JavaScript object on Windows, macOS, Linux, and ChromeOS. A 2026 Android Developers Blog post announcing Gemma 4 in the AICore Developer Preview noted that Gemma 4 was the foundation for the next Gemini Nano generation, called "Gemini Nano 4," and that AICore-targeted Gemma 4 code would run automatically on Nano 4 devices later in 2026.
Gemini Nano is the on-device anchor of Google's AI strategy. The cloud-served Pro, Flash, and Ultra models compete directly with OpenAI's GPT line and Anthropic's Claude in the API and chatbot markets, but Nano competes in a different fight: which AI assistant is built into the operating system a user picks up first thing in the morning. By shipping Nano on Pixel, partner Android phones, Wear OS watches, Pixel Buds, and desktop Chrome, Google offers a system-level AI layer no pure API vendor can match.
Nano is also the most direct counterweight to Apple Intelligence. Both companies have settled on a similar pattern: a small on-device model for fast, private operations and a larger cloud model for harder work. The differences between them, on hardware, on disclosure, and on third-party integration, will shape how the phone industry thinks about AI for years.
For developers, Nano expands the surface area of generative AI. A prompt through the AI Edge SDK costs nothing per call, runs offline, and keeps user data on the device, which makes it attractive for features hard to justify with a cloud API: real-time keyboard help, sensitive document summaries, accessibility tools, and long-running background work. The trade-off is that the model is smaller, the API is gated to specific devices and chipsets, and Google sets the rules for what runs through AICore. Nano shifts some on-device AI design decisions from app developers to the platform vendor, much as camera pipelines and notification systems have done on smartphones. Generative AI is moving from a service users go to into a feature ambient in the phone, the watch, and the browser, and Gemini Nano is one of the clearest early examples of that shift in production.