Gemini Nano
Last reviewed
May 24, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 6,293 words
Gemini Nano is the smallest variant of Google's Gemini family of multimodal large language models, designed to run efficiently on mobile devices and other edge hardware rather than in cloud data centers.[1] It was introduced on December 6, 2023 by Google CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis alongside the larger Gemini Pro and Gemini Ultra models, and first shipped on the Pixel 8 Pro in the December 2023 Pixel Feature Drop.[1][3] Nano powers on-device features such as summarization in the Recorder app, Smart Reply in Gboard, Magic Compose in Google Messages, Pixel Screenshots search, and image descriptions in TalkBack, while keeping user data on the device.[3][11]
The original Gemini technical report (arXiv:2312.11805) describes Nano as a pair of small models distilled from larger Gemini variants and quantized to 4 bits.[2][6] Gemini Nano-1 has 1.8 billion parameters and targets low-memory devices, while Gemini Nano-2 has 3.25 billion parameters and targets higher-memory devices.[2][33] Google has since expanded Nano beyond Pixel through the AICore system service and the Google AI Edge SDK on Android, brought it into desktop Chrome through the built-in Prompt API, and added a multimodal Nano with image and audio understanding on the Pixel 9 series.[7][9][17] With Apple's 2024 release of Apple Intelligence, Gemini Nano became the most prominent counterpart to Apple's on-device foundation models in the smartphone market.[21]
Google announced the Gemini family in a blog post by Sundar Pichai and Google DeepMind CEO Demis Hassabis on December 6, 2023.[1][6] Pichai described Gemini as Google's "most capable and general model yet" and said it had been "built to be multimodal" from the start, handling text, code, audio, images, and video together rather than stitching modalities together later.[1] The first generation, Gemini 1.0, was released in three sizes: Ultra for highly complex tasks, Pro for a balance of capability and scale, and Nano for on-device tasks.[1]
The accompanying technical report, "Gemini: A Family of Highly Capable Multimodal Models," appeared on arXiv as paper 2312.11805 on December 19, 2023, with later revisions through 2024 and 2025.[2] The report describes Gemini Nano as two small models, Nano-1 with 1.8 billion parameters and Nano-2 with 3.25 billion parameters, both trained by distilling from larger Gemini models and quantized to 4 bits.[2] The paper frames Nano as the variant intended for memory-constrained, on-device deployment, in contrast to Pro and Ultra, which were initially served from Google's data centers.[2]
The Pixel 8 Pro was the first device to ship Gemini Nano. Google's December 2023 Pixel Feature Drop, also published on December 6, 2023, announced that Nano was running on the phone and powering two production features: Summarize in Recorder, which generated bullet-point summaries of voice recordings without a network connection, and Smart Reply in Gboard, which produced contextual responses for messaging apps including WhatsApp, Line, and KakaoTalk.[3] Both ran locally on the Tensor G3 chip.[3][4] Google's blog called the Pixel 8 Pro "the first smartphone with AI built in," referring to the system-level integration of an on-device large language model.[3]
Behind the scenes, Google introduced Android AICore the same day as a new Android 14 system service that provides apps with access to Gemini Nano while managing the model, runtime, and safety features.[5] AICore launched with support for Pixel 8 Pro and was designed from the start to accept neural-network co-processors from Qualcomm, Samsung S.LSI, and MediaTek, signalling that Google intended Nano to be a portable component across the Android ecosystem rather than a Pixel-exclusive.[5]
Gemini Nano sits inside a wider model family that has grown across multiple generations.
| Variant | First released | Size class | Runs in | Notes |
|---|---|---|---|---|
| Gemini Ultra | Dec 2023 (1.0) | Frontier | Cloud | Largest 1.0 model, served via Bard / Gemini Advanced[1] |
| Gemini Pro | Dec 2023 (1.0) | Mid-tier | Cloud | Default API model in early 2024[1] |
| Gemini Nano | Dec 2023 (1.0) | 1.8B and 3.25B | On-device | Distilled and 4-bit quantized[2] |
| Gemini 1.5 Pro / Flash | Feb and May 2024 | Mid-tier | Cloud | Long context, Mixture of Experts (MoE) |
| Gemini 2.0 Flash | Jan 2025 | Compact cloud | Cloud | Default for the Gemini app and API |
| Gemini 2.0 Pro | Feb 2025 | Mid-tier | Cloud | Two-million-token context |
| Gemini 2.5 family | 2025 | Mid-tier and large | Cloud | DeepMind-led generation introduced at I/O 2025 |
| Gemini 3 Pro | Late 2025 | Frontier | Cloud | Successor to 2.5 line, see Gemini 3 Pro |
Nano is the only branch explicitly designed for on-device use. The cloud Flash models are smaller and cheaper than Pro or Ultra but still run in Google's data centers. Nano carries the Gemini brand into phones, watches, and browsers, with Google shipping updated Nano variants alongside each generation, usually a release or two behind the cloud models.
Gemini Nano shares the broader Gemini architecture, a decoder-only Transformer trained on text, code, images, and audio.[2] Two design choices distinguish Nano from the Pro and Ultra variants: distillation and aggressive quantization.
Knowledge distillation means the Nano models are trained to imitate the outputs of larger Gemini models rather than learning purely from raw training data.[2] The student model, Nano-1 or Nano-2, sees both the original training corpus and the probability distributions over tokens produced by a larger Gemini teacher.[2] This lets a small model carry over a useful share of the teacher's behavior in a much smaller parameter budget, and the Gemini paper credits this approach for Nano's relatively strong performance per parameter on MMLU and reading comprehension.[2] The Gemini 1.0 report frames the Nano series as "the most efficient model for on-device tasks" and explains that the small footprint comes from a combination of distillation, careful data curation, and post-training quantization.[2]
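The distillation idea can be made concrete with a small numerical sketch. The Gemini report does not publish its exact training loss, so this assumes the widely used soft-target formulation: a weighted sum of cross-entropy on the true next token and the KL divergence from the teacher's softened distribution. The names `distillation_loss`, `alpha`, and `temperature` are illustrative, not taken from the report.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with optional temperature smoothing."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target_index,
                      alpha=0.5, temperature=2.0):
    """Blend hard-label cross-entropy with KL(teacher || student).

    alpha and temperature are illustrative hyperparameters, not values
    reported for Gemini Nano.
    """
    # Hard-label term: negative log-likelihood of the true next token.
    student_probs = softmax(student_logits)
    hard = -math.log(student_probs[target_index])

    # Soft-label term: distance between the softened teacher and
    # student distributions over the vocabulary.
    s = softmax(student_logits, temperature)
    t = softmax(teacher_logits, temperature)
    soft = sum(ti * math.log(ti / si) for ti, si in zip(t, s) if ti > 0)

    # The temperature**2 factor keeps the soft term's gradient scale
    # comparable to the hard term in the standard formulation.
    return alpha * hard + (1 - alpha) * (temperature ** 2) * soft
```

When the student already matches the teacher, the soft term is zero and only the hard-label term remains; a student that disagrees with the teacher is penalized on both terms.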
Quantization to 4 bits per weight roughly quarters the storage and memory bandwidth required compared to a 16-bit floating point representation. With 4-bit weights, Nano-1 fits in roughly 1 to 1.5 gigabytes of memory and Nano-2 in around 2 to 3 gigabytes, depending on activation precision and any LoRA (Low-Rank Adaptation) adapters loaded on top.[2][5] Google has not published every detail of the quantization scheme, but Android documentation refers to 4-bit Nano weights running through the AICore runtime on Tensor and partner NPUs.[5][7]
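The storage arithmetic behind these figures can be checked directly. The sketch below counts weight storage only, using the parameter counts from the Gemini 1.0 report; the helper name `weight_bytes` is illustrative, and the calculation deliberately ignores activations, the KV cache, and adapters.

```python
def weight_bytes(num_params, bits_per_weight):
    """Storage for the weights alone, ignoring activations and the KV cache."""
    return num_params * bits_per_weight / 8


# Parameter counts as stated in the Gemini 1.0 technical report.
NANO_PARAMS = {"Nano-1": 1.8e9, "Nano-2": 3.25e9}

for name, params in NANO_PARAMS.items():
    gb_4bit = weight_bytes(params, 4) / 1e9
    gb_16bit = weight_bytes(params, 16) / 1e9
    print(f"{name}: {gb_4bit:.3f} GB at 4-bit vs {gb_16bit:.3f} GB at 16-bit")
```

The weights alone come to about 0.9 GB for Nano-1 and about 1.6 GB for Nano-2 at 4 bits, a quarter of the 16-bit size. The larger in-memory ranges quoted above are consistent with additional room for activations and adapters on top of the raw weights.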
Apple's on-device foundation model uses a mixed 2-bit and 4-bit configuration averaging 3.7 bits per weight, so Nano's 4-bit weights are slightly less aggressively quantized while landing on a similar memory budget.[22] Phi-3 Mini from Microsoft can also be quantized to 4 bits and occupies roughly 1.8 gigabytes in that form, which falls between the memory ranges quoted for Nano-1 and Nano-2.[23]
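As a back-of-the-envelope illustration of what a 3.7 bits-per-weight average implies, the sketch below assumes that every weight is stored at exactly 2 or 4 bits with no metadata overhead. That simplification is an assumption made here, not a disclosed detail of Apple's scheme, and the function names are hypothetical.

```python
def average_bits(fraction_high, low_bits=2, high_bits=4):
    """Average bits per weight for a two-level mixed-precision scheme."""
    return fraction_high * high_bits + (1 - fraction_high) * low_bits


def high_bit_fraction(target_average, low_bits=2, high_bits=4):
    """Share of weights that must use high_bits to hit target_average."""
    return (target_average - low_bits) / (high_bits - low_bits)


share_4bit = high_bit_fraction(3.7)            # about 0.85
apple_weights_gb = 3e9 * 3.7 / 8 / 1e9         # ~3B parameters at 3.7 bits
nano2_weights_gb = 3.25e9 * 4 / 8 / 1e9        # 3.25B parameters at 4 bits
print(f"{share_4bit:.2f} of weights at 4-bit; "
      f"{apple_weights_gb:.2f} GB vs {nano2_weights_gb:.2f} GB")
```

Under that assumption roughly 85 percent of the weights would sit at 4 bits, and the weight storage for a 3 billion parameter model lands slightly below Nano-2's.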
Google has kept Nano as a closed-weights model. There are no public weight downloads, and developers interact with Nano only through Google's APIs on Android, Chrome, and partner devices. The related Gemma family is Google's open-weights line of small Transformer models, but it is a separate brand with its own checkpoints; Nano remains the proprietary on-device model that Google ships through its own surfaces. The two lines share research lineage and tooling. A 2026 Android Developers Blog noted that Gemma 4 will serve as the foundation for the next Gemini Nano generation, with AICore-targeted Gemma 4 code expected to run automatically on future Nano 4 devices, but the on-device Nano weights themselves remain proprietary.[7]
The original 1.0 Nano models were text-only.[2] At Google I/O in May 2024, Google previewed "Gemini Nano with Multimodality," a variant that adds image and audio understanding in addition to text.[16] The multimodal Nano shipped on the Pixel 9 series in August 2024 and is "significantly larger than the previous one," roughly twice the size of Nano 1.0 and described by Google as "three times more capable and sophisticated" than the version that shipped on the Pixel 8 Pro.[17] The multimodal model continues to use distillation and 4-bit quantization, with the Pixel 9 series shipping 12 to 16 gigabytes of RAM in part to reserve room for the larger weights.[17][19]
Gemini Nano covers a narrower range of tasks than its cloud siblings, but it handles the everyday workloads that benefit most from low latency and on-device privacy. These include text generation and rewriting (including the tone shifts used by Magic Compose in Google Messages), summarization of long inputs such as Recorder audio transcripts and chat threads, smart reply and short-message proofreading, and, on Pixel 9 and later, multimodal text-image-audio understanding through the Nano with Multimodality variant.[11][14][17] The same multimodal model powers richer transcript summaries and detailed image descriptions for accessibility tools. Sensor- and context-aware features combine Nano output with on-device signals such as call audio, screenshots, or app context.
Pixel Screenshots, introduced with the Pixel 9 family in August 2024, is a representative case. It lets a user search saved screenshots in plain English and pulls out details like flight numbers, Wi-Fi passwords, or recipe ingredients.[20] Nano runs locally so the screenshots themselves never leave the phone, and Google says all the processing remains offline for near-immediate results even with thousands of screenshots indexed.[20]
Call Notes, also on Pixel 9 and later, takes the audio of a phone call and produces a private on-device summary with the key details; both parties on the call are notified when the feature is active.[20] Pixel Weather generates a one-sentence custom forecast pulling in UV index and air quality on top of the temperature.[20] Pixel Studio combines Nano text understanding with a separate on-device diffusion model that runs on Tensor G4 for fast image generation.[20]
Google has expanded Gemini Nano well beyond the original Pixel 8 Pro launch.
| Platform | First Nano support | Surface |
|---|---|---|
| Pixel 8 Pro | Dec 2023 Feature Drop | Recorder summaries, Gboard Smart Reply[3] |
| Pixel 8 / 8a | Jun 2024 Feature Drop | Same features as 8 Pro, gated by 8 GB RAM[15] |
| Pixel 9, 9 Pro, 9 Pro XL, 9 Pro Fold | Aug 2024 | Multimodal Nano, Pixel Screenshots, Pixel Studio, Call Notes[17][20] |
| Pixel 9a | 2025 | Smaller "Nano XXS" variant[18] |
| Pixel 10 family | 2025 | Tensor G5: Magic Cue, Voice Translate, Pro Res Zoom[25][26] |
| Samsung Galaxy S24 | Jan 2024 | On-device Circle to Search, Magic Compose, Photomoji[12] |
| Galaxy S24 FE, Z Fold 6, Z Flip 6 | 2024 | Nano-backed Google features via AICore[11] |
| Motorola Edge 50 Ultra, Razr 50 Ultra | 2024 | AICore-based Nano features[11] |
| Xiaomi 14T series, MIX Flip | 2024 | Announced AICore support[11] |
| Android via AI Edge SDK | Oct 2024 (experimental) | Text-to-text prompts on Pixel 9, expanding[9] |
| Chrome desktop (Prompt API) | Chrome 138 (2025) | LanguageModel JavaScript API in the browser[29] |
Samsung's Galaxy S24 series was the first non-Pixel phone to run Gemini Nano, bundled through AICore and powering Google-branded features such as Magic Compose in Messages and the on-device portions of Circle to Search; heavier work falls back to cloud-served Gemini Pro.[12][13] The same AICore plumbing has since shipped on Motorola and Xiaomi flagships, with Tensor, Qualcomm Snapdragon, and MediaTek Dimensity NPUs as the supported acceleration paths.[5][7]
The Pixel 9a, released in 2025, uses a smaller "Nano 1.0 XXS" variant; Google reduced the model because the device ships with only 8 gigabytes of RAM compared to the 12 to 16 gigabytes on the rest of the Pixel 9 series.[18] Nano XXS is text-only, does not run in the background, and the Pixel 9a therefore loses several multimodal AI features available on the Pixel 9, including Pixel Screenshots and Call Notes.[18]
The layer between user-facing apps and Gemini Nano on Android is built around two pieces. AICore is an Android system service introduced with Android 14. It manages the Nano model weights, runs inference on the Tensor TPU or partner NPU, applies safety filters, and updates the model out of band so apps do not have to bundle large weights themselves.[5][7] Google describes AICore as "private by design": the service has restricted internet access, runs each request in isolation, and uses the Private Compute Core architecture from earlier Pixel features such as Smart Reply.[8]
The Google AI Edge SDK is the developer-facing API for AICore. The first SDK package, com.google.ai.edge.aicore:aicore:0.0.1-exp01, opened experimental access on October 1, 2024, initially limited to text-to-text prompts on Pixel 9 series devices.[9] The SDK lets a developer set parameters such as temperature, top K, candidate count, and max output tokens, and stream tokens as they are produced.[9] The wider Google AI Edge effort also covers LiteRT (formerly TensorFlow Lite) and a broader runtime called LiteRT-LM aimed at small language models on edge devices, with Google describing LiteRT-LM as the "battle-tested infrastructure powering Gemini Nano deployment across Google products, including Chrome and Pixel Watch."[28][34]
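To show what the temperature and top K parameters control, here is a minimal sketch of one decoding step. It is a generic illustration of the sampling technique, not the AI Edge SDK's API or AICore's implementation, and the function name `sample_top_k` is hypothetical. Candidate count and max output tokens are not modeled.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=3, rng=None):
    """Pick one token index using temperature scaling and top-K filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Top K: keep only the k highest-scoring token indices.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]

    # Temperature: values below 1 sharpen the distribution toward the
    # best token, values above 1 flatten it toward uniform.
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]

    # Draw one index in proportion to the unnormalized weights.
    return rng.choices(kept, weights=weights, k=1)[0]
```

With `top_k=1` the function always returns the highest-scoring token (greedy decoding), which suits deterministic features; larger values of `top_k` and `temperature` trade consistency for variety.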
AICore exposes LoRA (Low-Rank Adaptation) fine-tuning on top of the base Nano weights, so feature owners can specialize the model for a single task without retraining or duplicating the full network.[5][8] Google uses the same mechanism internally. ML Kit's summarization, proofreading, and image-description APIs each ship with a small API-specific LoRA adapter trained on representative data; the adapter loads on demand once the base Nano model is on the device.[10] Google reports that feature-specific LoRA tuning lifts the summarization quality benchmark from 77.2 to 92.1 and image description from 86.9 to 92.3 on its internal raters.[10]
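The LoRA mechanism can be sketched in a few lines. Google has not published the adapter rank, scaling, or which layers carry adapters in Nano, so the dimensions and the `alpha` scaling below are illustrative assumptions; the sketch follows the generic LoRA formulation, in which a frozen weight matrix W is augmented with a trainable low-rank product B·A.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    are trained for the task.
    """
    r = len(A)  # adapter rank
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def lora_param_count(d_out, d_in, r):
    """Return (adapter parameters, full-matrix parameters)."""
    return r * (d_in + d_out), d_out * d_in
```

For a hypothetical 4096 by 4096 projection with rank 8, `lora_param_count(4096, 4096, 8)` gives 65,536 adapter parameters against 16,777,216 in the full matrix, which is why a per-feature adapter is cheap to ship and load on demand next to a shared base model.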
LoRA tuning also doubles as the integration point for safety controls. The LoRA blocks are trained against app-specific safety data alongside the task data, and AICore runs additional input and output safety classifiers around each call so that the same base model can satisfy different safety standards for different surfaces, from Recorder summaries to keyboard Smart Reply.[8][10]
ML Kit GenAI APIs sit one level above the AI Edge SDK and give Android developers ready-made entry points: Prompt, Summarization, Proofreading, Rewriting, Image Description, and Speech Recognition.[10][35] These were announced at Google I/O in May 2025 and ship as the com.google.mlkit:genai-* libraries.[10] Unlike the earlier AI Edge SDK preview, ML Kit GenAI runs on a wider range of Android phones, supports image inputs for the Image Description API, and is intended for production use on devices with optimized MediaTek Dimensity, Qualcomm Snapdragon, or Google Tensor hardware through AICore.[10][35]
In Chrome, Google added a built-in AI stack starting in 2024 with experimental flags and stabilized parts of it through Chrome 137 and 138 in 2025.[29][30] The Prompt API exposes Gemini Nano through a JavaScript LanguageModel global object, with promise-based and streaming interfaces for text generation.[29] Key methods include LanguageModel.availability() for checking whether the model can be used, LanguageModel.create() for instantiating a session, and session.prompt() or session.promptStreaming() for single-result and streaming responses, respectively.[29]
Browser support requires Windows 10 or 11, macOS 13 and later, recent Linux, or ChromeOS on Chromebook Plus, plus around 22 gigabytes of free disk on the Chrome profile volume to hold the model.[29] Hardware-wise, Chrome requires a GPU with more than 4 gigabytes of VRAM or a CPU with at least 16 gigabytes of RAM and four or more cores, with audio input restricted to GPU paths.[29] Earlier reporting referred to the namespace as window.ai, but the released API uses LanguageModel directly and ships in Chrome rather than as a third-party shim.[29]
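The hardware rule above combines a disk requirement with an either-or choice between a GPU path and a CPU path. The sketch below encodes only the thresholds quoted in this paragraph; it is a hypothetical helper for reasoning about eligibility, not Chrome's actual check, and it omits the operating system requirement. In a real page, LanguageModel.availability() is the way to find out whether the model can run.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    free_disk_gb: float
    vram_gb: float
    ram_gb: float
    cpu_cores: int


def meets_prompt_api_requirements(m: Machine) -> bool:
    """Apply the thresholds quoted in the text (illustrative only)."""
    has_disk = m.free_disk_gb >= 22                   # room for the model
    gpu_path = m.vram_gb > 4                          # more than 4 GB VRAM
    cpu_path = m.ram_gb >= 16 and m.cpu_cores >= 4    # CPU fallback
    return has_disk and (gpu_path or cpu_path)
```

For example, a laptop with integrated graphics, 16 gigabytes of RAM, and four cores qualifies through the CPU path, while a machine with a large GPU but only 10 gigabytes of free disk does not.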
The Prompt API is generally available in Chrome 138 for Chrome Extensions and remains behind a flag for web pages.[29][30] Three sibling APIs reached stable in Chrome 138: the Summarizer API, the Translator API, and the Language Detector API.[30] The Writer and Rewriter APIs are in origin trials, the Proofreader API entered an origin trial through Chrome 139 Canary, and a multimodal variant of the Prompt API that accepts audio and images is available to participants in the Early Preview Program.[30] From Chrome 149, Gemini Nano in the browser supports English, Spanish, Japanese, German, and French for input and output text.[29]
Production features using Gemini Nano on Android and Chrome include Summarize in Recorder (offline bullet summaries of voice recordings), Smart Reply in Gboard (contextual chips for WhatsApp, Line, KakaoTalk), Magic Compose in Google Messages (tone rewriting with formal, excited, chill, and Shakespearean styles operating on the last 20 messages), Pixel Screenshots (natural-language search through saved screenshots), Pixel Studio (on-device image generation paired with a Tensor G4 diffusion model), Call Notes (call transcription and summary), Pixel Weather AI Reports, and TalkBack image descriptions.[3][11][12][20] On Pixel 10, Nano powers Magic Cue, Voice Translate, and Pro Res Zoom.[25][26]
Magic Cue surfaces relevant information across Gmail, Calendar, Messages, and Screenshots when it predicts the user might want it.[25] Voice Translate runs real-time translation on device while preserving the speaker's voice across English, Spanish, German, Japanese, French, Hindi, Italian, Portuguese, Swedish, Russian, and Indonesian.[25] Pro Res Zoom uses Nano-driven prompts alongside the Pixel 10 camera pipeline to enhance detail on zoom levels up to 100 times.[25]
Pixel Watch 4 and Pixel Buds 2a extend Nano features into wearables. Raise to Talk on Pixel Watch 4 starts a Gemini session by lifting the wrist, while Pixel Buds 2a are the first A-series buds to summarize and reply to messages without checking the phone.[25][27] Wear OS 6 began rolling Gemini out to Pixel Watch, Samsung, OnePlus, OPPO, and Xiaomi wearables through 2025.[27]
In Chrome, the Prompt API lets web pages run summarization, translation, and generation directly in the browser. Third-party Android apps build on the same model through ML Kit GenAI, the AI Edge SDK, or AICore, paying nothing per call and keeping user data on the device.[10][29]
Google has not published a full benchmark suite for the shipping Nano models, but several public numbers give a sense of the range. The Gemini 1.0 paper reports MMLU and BoolQ scores for Nano-1 and Nano-2 well below Pro and Ultra but competitive with other small Transformer models of similar size.[2] In particular, Gemini Nano-1 scored 45.9% on MMLU (5-shot) and 71.6% on BoolQ in the launch paper, which the paper normalizes to 0.64 and 0.81 relative to Gemini Pro.[2] Phi-3 Mini, released roughly five months later, is reported to score 68.8% on the same MMLU benchmark, highlighting that small-model quality moved quickly in the year after Nano shipped.[23]
Google's October 2024 AI Edge SDK post notes that the experimental on-device Nano (described as Nano 2) scored 56% on MMLU compared with 46% for the earlier version, 23% on math compared with 14%, 90% on paraphrasing compared with 44%, and 82% on smart reply compared with 44%.[9] These are internal task evaluations, not standardized public benchmarks, but they show the trajectory of the on-device model since launch.
Latency on Tensor G3 and G5 hardware is in the low hundreds of milliseconds for short prompts, with the Pixel 8 Pro launch features designed for real-time keyboard suggestions and short summaries.[4] Tensor G5, the Pixel 10 chip manufactured on a TSMC 3 nanometer process, runs the newest Gemini Nano model approximately 2.6 times faster and 2 times more efficiently than the Tensor G4 inside the Pixel 9, and expands the on-device token window from roughly 12,000 to 32,000 tokens.[26] The Tensor G4 itself reuses the "rio" Edge TPU codename from the G3 with the same clock speed, so the bulk of the multimodal Nano speedups between Pixel 8 Pro and Pixel 9 came from software and the larger RAM budget rather than the accelerator.[31]
Memory footprint is dominated by the 4-bit weights, which fit in roughly 1 to 1.5 gigabytes for Nano-1 and 2 to 3 gigabytes for Nano-2, plus working memory for the KV cache.[2] Battery cost is low enough for Recorder summaries and Gboard suggestions to run frequently without unusual drain, although Google has gated longer-form features such as Recorder summarization of multi-hour audio to higher-RAM Pixel 9 devices.[17]
Quality on hard tasks lags behind cloud-served Gemini Pro, Gemini Flash, and frontier models from other vendors. Reviews of the launch features were mixed: Android Police's Pixel 8 Pro hands-on found the Recorder summaries useful but the Gboard Smart Reply chips often bland or off-topic, and 9to5Google noted that Smart Reply latency was visible on the first builds.[4] Features have improved with subsequent Nano updates and tighter prompt tuning, but the on-device model is best understood as a fast assistant for short, well-scoped tasks rather than a frontier reasoning system.
Google's internal benchmarks released alongside the AI Edge SDK preview showed clear gains as the on-device Nano matured. The MMLU score climbed from 46% to 56% between the launch model and the experimental build offered to developers in October 2024, while paraphrasing rose from 44% to 90% and smart reply rose from 44% to 82%.[9] Math also improved from 14% to 23%. Even so, the 56% MMLU figure remains well below Phi-3 Mini's reported 68.8%, and Apple reports that its larger on-device model leads Phi-3 Mini and Mistral 7B on human evaluation.[22][23] The story across vendors is that the same parameter budget is delivering more useful capability over time, driven by data curation, distillation improvements, and feature-specific LoRA adapters rather than larger weights.[10][22]
Gemini Nano is one of several small language model systems positioned for on-device use. The table compares Nano with the most prominent peers, using parameter counts disclosed in papers, blog posts, or model cards.
| Model | Vendor | Parameters | Quantization | On-device | Multimodal | First disclosed |
|---|---|---|---|---|---|---|
| Gemini Nano-1 | Google | ~1.8B | 4-bit | Yes | Text, image, audio (multimodal variant)[2][17] | Dec 2023 |
| Gemini Nano-2 | Google | ~3.25B | 4-bit | Yes | Text, image, audio[2][17] | Dec 2023 |
| Apple Foundation Models (on-device) | Apple | ~3B | Mixed 2/4-bit, avg 3.7 bpw[22] | Yes | Text, image | Jun 2024 |
| Phi-3 Mini | Microsoft | 3.8B[23] | 4-bit possible | Yes (community) | Text (image variant separate) | Apr 2024 |
| Llama 3.2 1B | Meta | ~1.2B | 4-bit possible | Yes (community) | Text | Sep 2024 |
| Llama 3.2 3B | Meta | ~3.2B | 4-bit possible | Yes (community) | Text | Sep 2024 |
| Llama 3.2 11B Vision | Meta | ~11B | Mixed | Edge servers / high-end | Text, image | Sep 2024 |
| Mistral 7B (small variants) | Mistral AI | 7B | 4-bit possible | Yes (community) | Text | Sep 2023 |
| Qwen 2 0.5B / 1.5B | Alibaba | 0.5B / 1.5B | 4-bit possible | Yes | Text | Jun 2024 |
| GPT-4o mini | OpenAI | Undisclosed | n/a | No (cloud only) | Text, image, audio | Jul 2024 |
Gemini Nano is unusual on this list for two reasons. It ships pre-installed on consumer phones through a system service rather than as a hobbyist download, and it is gated behind Google's API rather than released as open weights. Llama and Phi-3 are open-weights models that rely on third-party runtimes such as llama.cpp or Ollama to run on phones, while Apple Intelligence is a parallel system-level integration locked to Apple silicon. GPT-4o mini is a cloud-hosted OpenAI model included in the table only to make clear that it is not a peer of Nano in the on-device sense.
Microsoft's Phi-3 Mini, with 3.8 billion parameters and a similar 4-bit deployment profile, posts higher absolute MMLU scores than the Gemini Nano numbers in the original 1.0 paper but does not have an equivalent system-level integration in any consumer operating system.[23] In practice, that distinction is what makes Nano commercially significant: a Pixel user does not pick Nano off a model hub, the operating system runs it on their behalf, much like Apple Intelligence on iOS.
Apple's response to Gemini Nano arrived at WWDC in June 2024 as Apple Intelligence.[21][22] Both systems target similar features (writing tools, summaries, smart replies, image descriptions, on-device assistants), but they differ on hardware, model layout, and trust model.
| Aspect | Gemini Nano | Apple Intelligence |
|---|---|---|
| Vendor | Google | Apple |
| On-device model size | ~1.8B (Nano-1), ~3.25B (Nano-2)[2] | ~3B on-device foundation model[22] |
| Cloud fallback | Cloud Gemini Pro / Flash / Ultra | Private Cloud Compute model on Apple Silicon servers[22] |
| Hardware | Tensor G3 / G4 / G5, Qualcomm and MediaTek NPUs via AICore[5][26] | Apple A17 Pro and later, M-series for iPad and Mac[22] |
| Operating systems | Android 14+, ChromeOS, desktop Chrome[5][29] | iOS 18+, iPadOS 18+, macOS Sequoia+[22] |
| Quantization | 4-bit weights[2] | Mixed 2-bit and 4-bit, ~3.7 bits per weight average[22] |
| Trust model | Private Compute Core on-device[8] | Private Cloud Compute with attested servers, no logging[22] |
| Developer access | AI Edge SDK, ML Kit GenAI, Chrome Prompt API[9][10][29] | Writing Tools, Image Playground, App Intents, Foundation Models framework[22] |
| Open weights | No | No |
Apple's machine-learning research paper claims that its 3 billion parameter on-device model outperforms Phi-3 Mini, Mistral 7B, Gemma 7B, and Llama 3 8B on its human evaluation suite, and Apple reports a generation rate of about 30 tokens per second on the iPhone 15 Pro with a time-to-first-token latency of around 0.6 milliseconds per prompt token.[22] Both systems use task-specific adapters, but Apple has elected to keep its adapter format internal while Google exposes the LoRA mechanism to third-party developers through AICore.[8][22]
The two systems represent different bets about what "system-level AI" looks like. Google's bet is that Nano can be a portable layer running on Pixel, partner Android phones, and desktop Chrome, with cloud Gemini available when the on-device model is not enough. Apple's bet is a tighter pair of one on-device model and one private-cloud model, both restricted to Apple Silicon, with third-party cloud models such as ChatGPT plugged in only on user request.
Gemini Nano is constrained by both its size and its deployment context. It is much smaller than cloud Gemini models, so it is weaker on hard reasoning, long-context tasks, and complex coding. The Gemini paper's own benchmarks show Nano-2 well behind Gemini Pro on MMLU, GSM8K, and similar evaluations.[2] For tasks like explaining a legal contract or refactoring a large codebase, the cloud-served Gemini models and competing frontier systems are still the right tools.
4-bit quantization gives up some quality compared to higher-precision weights, in exchange for faster inference and lower memory. Battery, thermal, and memory constraints also limit how aggressively apps can use Nano. Continuous generation works for short bursts, but real-time agent loops and long-running summaries of multi-hour audio are gated to higher-RAM Pixel models, and Google's documentation recommends batching work and avoiding tight loops.
Device support is uneven. Many Nano features are exclusive to Pixel, with Samsung, Motorola, and Xiaomi getting subsets through partnerships.[11][12] The Pixel 9a uses a smaller "Nano XXS" variant with only text input and no background execution, and older or lower-tier Android phones do not support Nano at all.[18] Google's newer Gemini Intelligence layer, as of 2026, requires 12 gigabytes of RAM, a flagship chip, and a Gemini Nano v3 or higher runtime, leaving devices below that bar with reduced features.[32]
Chrome's Prompt API requires substantial disk space (around 22 gigabytes for the model and runtime), is unsupported on mobile and non-Plus ChromeOS, and is still flagged as experimental for many origins.[29] Nano is also closed source: there are no public weights, and developers cannot fine-tune the base model directly. Google supports LoRA adapters on top of Nano through AICore for some customization, but the base model remains under Google's control.[7][8]
The original Pixel 8 Pro launch features drew mixed reviews. Recorder summarization was widely praised as useful and reliable, but the first Gboard Smart Reply suggestions were criticized for being generic, occasionally off-topic, and slower than the on-server replies they were meant to replace, and several outlets noted that updates through 2024 and 2025 were needed before the on-device experience felt competitive.[4] Even the multimodal Nano on the Pixel 9 was limited to English at launch for several features, with broader language support arriving incrementally.[3]
In January 2024, the Galaxy S24 series became the first non-Pixel phones to ship Gemini Nano, alongside Circle to Search and Magic Compose.[12][13] The March and June 2024 Pixel Feature Drops extended Nano features to the Pixel 8 and Pixel 8a, initially through a developer option and then as default features once Google was satisfied the lower-RAM phones could handle them.[15]
Google I/O 2024 previewed Gemini Nano with Multimodality, the variant that adds image and audio understanding alongside text.[16] The Pixel 9 series, launched in August 2024, became the first phones to ship the multimodal Nano. 9to5Google reported that the model is roughly twice the size of the original and substantially more capable on Recorder summaries and image descriptions.[17] The same release added Pixel Screenshots, Pixel Studio, richer TalkBack image descriptions, and Call Notes.[20]
On October 1, 2024, Google opened experimental access to Gemini Nano for all Android developers through the AI Edge SDK and AICore, initially limited to Pixel 9 devices for text-to-text prompts.[9] In May 2025, ML Kit GenAI APIs added higher-level entry points for prompt, summarization, proofreading, rewriting, image description, and speech recognition, with the Gemini Nano variant shipping on Pixel 10 reaching ML Kit later in the year.[10] Google I/O 2025 included a session titled "Gemini Nano on Android: building with on-device gen AI," framing Nano as a stable platform for third-party apps rather than a Pixel-only experiment.[24]
In 2025, the Pixel 10 series launched on the Tensor G5 with Nano powering Magic Cue, Voice Translate, and Pro Res Zoom, plus on-device portions of the photo-to-video feature in the Gemini app.[25][26] The Tensor G5 expanded the on-device token window from roughly 12,000 to 32,000 tokens and ran the newest Nano model roughly 2.6 times faster than Tensor G4 while consuming half the power.[26] The Pixel Watch 4 added Raise to Talk, and Pixel Buds 2a added on-device message summaries.[25][27]
The 2026 Android Developers Blog post announcing Gemma 4 in the AICore Developer Preview noted that Gemma 4 was the foundation for the next Gemini Nano generation, called "Gemini Nano 4," and that AICore-targeted Gemma 4 code would run automatically on Nano 4 devices later in 2026.[7] Chrome's built-in AI APIs moved from a developer preview to broader availability through Chrome 137 and 138 in 2025, with the Prompt API exposing Nano through a stable LanguageModel JavaScript object on Windows, macOS, Linux, and ChromeOS.[29][30]
Gemini Nano is the on-device anchor of Google's AI strategy. The cloud-served Pro, Flash, and Ultra models compete directly with OpenAI's GPT line and Anthropic's Claude in the API and chatbot markets, but Nano competes in a different fight: which AI assistant is built into the operating system a user picks up first thing in the morning. By shipping Nano on Pixel, partner Android phones, Wear OS watches, Pixel Buds, and desktop Chrome, Google offers a system-level AI layer no pure API vendor can match.[25][27][29]
Nano is also the most direct counterweight to Apple Intelligence. Both companies have settled on a similar pattern: a small on-device model for fast, private operations and a larger cloud model for harder work. The differences between them, on hardware, on disclosure, and on third-party integration, will shape how the phone industry thinks about AI for years.[21][22]
For developers, Nano expands the surface area of generative AI. A prompt through the AI Edge SDK costs nothing per call, runs offline, and keeps user data on the device, which makes it attractive for features hard to justify with a cloud API: real-time keyboard help, sensitive document summaries, accessibility tools, and long-running background work.[9][10] The trade-off is that the model is smaller, the API is gated to specific devices and chipsets, and Google sets the rules for what runs through AICore.[7][8] Nano shifts some on-device AI design decisions from app developers to the platform vendor, much as camera pipelines and notification systems have done on smartphones. Generative AI is moving from a service users go to into a feature ambient in the phone, the watch, and the browser, and Gemini Nano is one of the clearest early examples of that shift in production.[25][27][29]
For the browser ecosystem, the Chrome Prompt API turns Gemini Nano into the first widely available on-device LLM exposed through a standard web platform interface.[29] That changes what a web developer can ship without infrastructure costs or privacy disclosures: a translation widget, a summarizer for long articles, a structured-output JSON generator for an internal tool, or a generative search box on a personal blog can all run entirely on the user's machine.[29][30] The chief constraints are the 22 gigabyte disk requirement for the model and the relatively narrow set of supported operating systems, which Google has confirmed will expand over time.[29] The combination of Android AICore on phones and Chrome's Prompt API in the browser gives Google two reference surfaces that no third-party LLM vendor can replicate without a comparable platform footprint.