Gemini Nano

Updated Jun 24, 2026

Gemini Nano is the smallest and most efficient variant of Google's Gemini family of multimodal large language models, designed to run directly on phones and other edge hardware instead of in cloud data centers.^[1] Google introduced it on December 6, 2023 as the on-device tier of Gemini 1.0 and first deployed it on the Pixel 8 Pro to power Summarize in Recorder and Smart Reply in Gboard.^[1]^[3] It ships in two sizes, Nano-1 with 1.8 billion parameters and Nano-2 with 3.25 billion parameters, both distilled from larger Gemini models and quantized to 4 bits.^[2] Because the model runs locally, it works offline and keeps user data on the device: Google says Gemini Nano on the Pixel 8 Pro "offers several advantages by design, helping prevent sensitive data from leaving the phone, as well as offering the ability to use features without a network connection."^[3]

Gemini Nano was announced by Google CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis alongside the larger Gemini Pro and Gemini Ultra models, and first shipped on the Pixel 8 Pro in the December 2023 Pixel Feature Drop.^[1]^[3] Nano powers on-device features such as summarization in the Recorder app, Smart Reply in Gboard, Magic Compose in Google Messages, Pixel Screenshots search, and image descriptions in TalkBack, while keeping user data on the device.^[3]^[11]

The original Gemini technical report (arXiv:2312.11805) describes Nano as a pair of small models distilled from larger Gemini variants and quantized to 4 bits, and frames the series as "the best-in-class small language models for a wide variety of tasks, such as summarization and reading comprehension, which power next generation on-device experiences."^[2]^[6] Gemini Nano-1 has 1.8 billion parameters and targets low-memory devices, while Gemini Nano-2 has 3.25 billion parameters and targets higher-memory devices.^[2]^[33] Google has since expanded Nano beyond Pixel through the Android AICore system service and the Google AI Edge SDK, brought it into desktop Chrome through the built-in Prompt API, and added a multimodal Nano with image and audio understanding on the Pixel 9 series.^[7]^[9]^[17] With Apple's 2024 release of Apple Intelligence, Gemini Nano became the most prominent counterpart to Apple's on-device AI foundation models in the smartphone market.^[21]

When was Gemini Nano released?

Google announced the Gemini family in a blog post by Sundar Pichai and Google DeepMind CEO Demis Hassabis on December 6, 2023.^[1]^[6] Pichai described Gemini as Google's "most capable and general model yet" and said it had been "built to be multimodal" from the start, handling text, code, audio, images, and video together rather than stitching modalities together later.^[1] The first generation, Gemini 1.0, was released in three sizes: Ultra for highly complex tasks, Pro for a balance of capability and scale, and Nano for on-device tasks.^[1]

The accompanying technical report, "Gemini: A Family of Highly Capable Multimodal Models," appeared on arXiv as paper 2312.11805 on December 19, 2023, with later revisions through 2024 and 2025.^[2] The report describes Gemini Nano as two small models, Nano-1 with 1.8 billion parameters and Nano-2 with 3.25 billion parameters, both trained by distilling from larger Gemini models and quantized to 4 bits.^[2] The paper frames Nano as the variant intended for memory-constrained, on-device deployment, in contrast to Pro and Ultra, which were initially served from Google's data centers.^[2]

The Pixel 8 Pro was the first device to ship Gemini Nano. Google's December 2023 Pixel Feature Drop, also published on December 6, 2023, announced that Nano was running on the phone and powering two production features: Summarize in Recorder, which generated bullet-point summaries of voice recordings without a network connection, and Smart Reply in Gboard, which produced contextual responses for messaging apps including WhatsApp, Line, and KakaoTalk.^[3] Both ran locally on the Tensor G3 chip.^[3]^[4] Google's blog called the Pixel 8 Pro "the first smartphone with AI built in," referring to the system-level integration of an on-device large language model, and described Gemini Nano as "our most efficient model built for on-device tasks."^[3]

Behind the scenes, Google introduced Android AICore the same day as a new Android 14 system service that provides apps with access to Gemini Nano while managing the model, runtime, and safety features.^[5] AICore launched with support for Pixel 8 Pro and was designed from the start to accept neural-network co-processors from Qualcomm, Samsung S.LSI, and MediaTek, signalling that Google intended Nano to be a portable component across the Android ecosystem rather than a Pixel-exclusive.^[5]

How does Gemini Nano fit in the Gemini family?

Gemini Nano sits inside a wider model family that has grown across multiple generations.

Variant	First released	Size class	Runs in	Notes
Gemini Ultra	Dec 2023 (1.0)	Frontier	Cloud	Largest 1.0 model, served via Bard / Gemini Advanced^[1]
Gemini Pro	Dec 2023 (1.0)	Mid-tier	Cloud	Default API model in early 2024^[1]
Gemini Nano	Dec 2023 (1.0)	1.8B and 3.25B	On-device	Distilled and 4-bit quantized^[2]
Gemini 1.5 Pro / Flash	Feb and May 2024	Mid-tier	Cloud	Long context, Mixture of Experts (MoE)
Gemini 2.0 Flash	Jan 2025	Compact cloud	Cloud	Default for the Gemini app and API
Gemini 2.0 Pro	Feb 2025	Mid-tier	Cloud	Two-million-token context
Gemini 2.5 family	2025	Mid-tier and large	Cloud	DeepMind-led generation introduced at I/O 2025
Gemini 3 Pro	Late 2025	Frontier	Cloud	Successor to 2.5 line, see Gemini 3 Pro

Nano is the only branch explicitly designed for on-device use. The cloud Flash models are smaller and cheaper than Pro or Ultra but still run in Google's data centers. Nano carries the Gemini brand into phones, watches, and browsers, with Google shipping updated Nano variants alongside each generation, usually a release or two behind the cloud models.

How is Gemini Nano built?

Gemini Nano shares the broader Gemini architecture, a decoder-only Transformer trained on text, code, images, and audio.^[2] Two design choices distinguish Nano from the Pro and Ultra variants: distillation and aggressive quantization.

Distillation from larger Gemini models

Knowledge distillation means the Nano models are trained to imitate the outputs of larger Gemini models rather than learning purely from raw training data.^[2] The student model, Nano-1 or Nano-2, sees both the original training corpus and the probability distributions over tokens produced by a larger Gemini teacher.^[2] This lets a small model carry over a useful share of the teacher's behavior in a much smaller parameter budget, and the Gemini paper credits this approach for Nano's relatively strong performance per parameter on MMLU and reading comprehension.^[2] The Gemini 1.0 report frames the Nano series as "the best-in-class small language models for a wide variety of tasks, such as summarization and reading comprehension," and explains that the small footprint comes from a combination of distillation, careful data curation, and post-training quantization.^[2]
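The training signal described above can be illustrated with a minimal sketch of a standard distillation objective. The Gemini report does not publish its exact loss, so the blend weight `alpha`, the softening `temperature`, and the function names here are assumptions chosen for illustration, not Google's implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, target_index,
                      alpha=0.5, temperature=2.0):
    """Blend hard-label cross-entropy with KL(teacher || student).

    alpha and temperature are illustrative hyperparameters (assumed).
    """
    # Hard-label term: the student predicts the token from the training corpus.
    hard = -math.log(softmax(student_logits)[target_index])
    # Soft-label term: the student matches the teacher's softened distribution.
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    soft = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return alpha * hard + (1 - alpha) * (temperature ** 2) * soft

teacher_logits = [2.0, 1.0, 0.1]
print(distillation_loss([0.1, 1.0, 2.0], teacher_logits, target_index=0))
print(distillation_loss(teacher_logits, teacher_logits, target_index=0))
```

A student whose logits already match the teacher pays only the hard-label term, which is why the second call prints the smaller value.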

4-bit quantization

Quantization to 4 bits per weight roughly quarters the storage and memory bandwidth required compared to a 16-bit floating point representation. With 4-bit weights, Nano-1 fits in roughly 1 to 1.5 gigabytes of memory and Nano-2 in around 2 to 3 gigabytes, depending on activation precision and any LoRA (Low-Rank Adaptation) adapters loaded on top.^[2]^[5] Google has not published every detail of the quantization scheme, but Android documentation refers to 4-bit Nano weights running through the AICore runtime on Tensor and partner NPUs.^[5]^[7]
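Since Google has not published the scheme, the following is only a generic sketch of symmetric 4-bit quantization for one group of weights, showing where the storage saving comes from. The group layout, the single per-group scale, and all function names are assumptions.

```python
def quantize_4bit(weights):
    """Map floats to integer codes in [-8, 7] plus one shared float scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_4bit(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

def pack_nibbles(codes):
    """Store two 4-bit codes per byte (each code is offset by 8 into 0..15)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad odd-length groups with a zero code
    return bytes(((a + 8) << 4) | (b + 8) for a, b in zip(codes[0::2], codes[1::2]))

weights = [0.12, -0.50, 0.33, 0.07, -0.21, 0.49, -0.02, 0.30]
codes, scale = quantize_4bit(weights)
packed = pack_nibbles(codes)
print(len(packed), "bytes packed versus", 2 * len(weights), "bytes at 16-bit")
print(dequantize_4bit(codes, scale))
```

Eight weights occupy four bytes instead of sixteen, at the cost of a rounding error of at most half a quantization step per weight; a real scheme also stores the scales, which is one reason the saving is only roughly a factor of four.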

Apple's on-device foundation model uses a mixed 2-bit and 4-bit configuration averaging 3.7 bits per weight, so Nano's uniform 4-bit weights are quantized slightly less aggressively while landing on a similar memory budget.^[22] Phi-3 Mini from Microsoft can also be quantized to 4 bits and occupies roughly 1.8 gigabytes in that form, which places its memory footprint between the Nano-1 and Nano-2 ranges above even though it has more parameters than either.^[23]
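The memory comparison reduces to simple arithmetic: parameters times bits per weight, divided by eight. The sketch below uses only the parameter counts and bit widths quoted in this article; it counts weights alone, in decimal gigabytes, and ignores quantization scales, activations, and adapters.

```python
def weight_gigabytes(params_billions, bits_per_weight):
    """Weight storage in decimal gigabytes: parameters * bits / 8 bytes."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# (parameters in billions, average bits per weight), as quoted in the article
models = {
    "Gemini Nano-1 (4-bit)": (1.8, 4.0),
    "Gemini Nano-2 (4-bit)": (3.25, 4.0),
    "Apple on-device (~3B, 3.7 bits per weight)": (3.0, 3.7),
    "Phi-3 Mini (4-bit)": (3.8, 4.0),
}

for name, (params, bits) in models.items():
    quantized = weight_gigabytes(params, bits)
    full = weight_gigabytes(params, 16)
    print(f"{name}: {quantized:.2f} GB quantized, {full:.2f} GB at 16-bit")
```

The weights-only estimates come out at about 0.9 GB for Nano-1 and about 1.6 GB for Nano-2, below the in-memory ranges quoted earlier because those include runtime overhead. The Phi-3 Mini estimate of 1.9 decimal gigabytes is close to the quoted 1.8 figure, a gap that may simply reflect binary versus decimal units.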

Is Gemini Nano open source?

Google has kept Nano as a closed-weights model. There are no public weight downloads, and developers interact with Nano only through Google's APIs on Android, Chrome, and partner devices. The related Gemma family is Google's open-weights line of small Transformer models, but it is a separate brand with its own checkpoints; Nano remains the proprietary on-device model that Google ships through its own surfaces. The two lines share research lineage and tooling. A 2026 Android Developers Blog post noted that Gemma 4 will serve as the foundation for the next Gemini Nano generation, with AICore-targeted Gemma 4 code expected to run automatically on future Nano 4 devices, but the on-device Nano weights themselves remain proprietary.^[7]

Multimodal Nano

The original 1.0 Nano models were text-only.^[2] At Google I/O in May 2024, Google previewed "Gemini Nano with Multimodality," a variant that adds image and audio understanding in addition to text.^[16] The multimodal Nano shipped on the Pixel 9 series in August 2024 and is "significantly larger than the previous one," roughly twice the size of Nano 1.0 and described by Google as "three times more capable and sophisticated" than the version that shipped on the Pixel 8 Pro.^[17] The multimodal model continues to use distillation and 4-bit quantization, with the Pixel 9 series shipping 12 to 16 gigabytes of RAM in part to reserve room for the larger weights.^[17]^[19]

What can Gemini Nano do?

Gemini Nano covers a narrower range of tasks than its cloud siblings, but it handles the everyday workloads that benefit most from low latency and on-device privacy. The model handles text generation and rewriting (including tone shifts used by Magic Compose in Google Messages), summarization of long inputs such as Recorder audio transcripts and chat threads, smart reply and short-message proofreading, and, on Pixel 9 and later, multimodal text-image-audio understanding through the Nano with Multimodality variant.^[11]^[14]^[17] The same multimodal model powers richer transcript summaries and detailed image descriptions for accessibility tools. Sensor- and context-aware features combine Nano output with on-device signals such as call audio, screenshots, or app context.

Pixel Screenshots, introduced with the Pixel 9 family in August 2024, is a representative case. It lets a user search saved screenshots in plain English and pulls out details like flight numbers, Wi-Fi passwords, or recipe ingredients.^[20] Nano runs locally so the screenshots themselves never leave the phone, and Google says all the processing remains offline for near-immediate results even with thousands of screenshots indexed.^[20]

Call Notes, also on Pixel 9 and later, takes the audio of a phone call and produces a private on-device summary with the key details; both parties on the call are notified when the feature is active.^[20] Pixel Weather generates a one-sentence custom forecast pulling in UV index and air quality on top of the temperature.^[20] Pixel Studio combines Nano text understanding with a separate on-device diffusion model that runs on Tensor G4 for fast image generation.^[20]

Which devices and platforms run Gemini Nano?

Google has expanded Gemini Nano well beyond the original Pixel 8 Pro launch.

Platform	First Nano support	Surface
Pixel 8 Pro	Dec 2023 Feature Drop	Recorder summaries, Gboard Smart Reply^[3]
Pixel 8 / 8a	Jun 2024 Feature Drop	Same features as 8 Pro, gated by 8 GB RAM^[15]
Pixel 9, 9 Pro, 9 Pro XL, 9 Pro Fold	Aug 2024	Multimodal Nano, Pixel Screenshots, Pixel Studio, Call Notes^[17]^[20]
Pixel 9a	2025	Smaller "Nano XXS" variant^[18]
Pixel 10 family	2025	Tensor G5: Magic Cue, Voice Translate, Pro Res Zoom^[25]^[26]
Samsung Galaxy S24	Jan 2024	On-device Circle to Search, Magic Compose, Photomoji^[12]
Galaxy S24 FE, Z Fold 6, Z Flip 6	2024	Nano-backed Google features via AICore^[11]
Motorola Edge 50 Ultra, Razr 50 Ultra	2024	AICore-based Nano features^[11]
Xiaomi 14T series, MIX Flip	2024	Announced AICore support^[11]
Android via AI Edge SDK	Oct 2024 (experimental)	Text-to-text prompts on Pixel 9, expanding^[9]
Chrome desktop (Prompt API)	Chrome 138 (2025)	LanguageModel JavaScript API in the browser^[29]

Samsung's Galaxy S24 series was the first non-Pixel phone to run Gemini Nano, bundled through AICore and powering Google-branded features such as Magic Compose in Messages and the on-device portions of Circle to Search; heavier work falls back to cloud-served Gemini Pro.^[12]^[13] The same AICore plumbing has since shipped on Motorola and Xiaomi flagships, with Tensor, Qualcomm Snapdragon, and MediaTek Dimensity NPUs as the supported acceleration paths.^[5]^[7]

The Pixel 9a, released in 2025, uses a smaller "Nano 1.0 XXS" variant; Google reduced the model because the device ships with only 8 gigabytes of RAM compared to the 12 to 16 gigabytes on the rest of the Pixel 9 series.^[18] Nano XXS is text-only, does not run in the background, and the Pixel 9a therefore loses several multimodal AI features available on the Pixel 9, including Pixel Screenshots and Call Notes.^[18]

How do developers access Gemini Nano?

The layer between user-facing apps and Gemini Nano on Android is built around two pieces. AICore is an Android system service introduced with Android 14. It manages the Nano model weights, runs inference on the Tensor TPU or partner NPU, applies safety filters, and updates the model out of band so apps do not have to bundle large weights themselves.^[5]^[7] Google describes AICore as "private by design": the service has restricted internet access, runs each request in isolation, and uses the Private Compute Core architecture from earlier Pixel features such as Smart Reply.^[8]

The Google AI Edge SDK is the developer-facing API for AICore. The first SDK package, com.google.ai.edge.aicore:aicore:0.0.1-exp01, opened experimental access on October 1, 2024, initially limited to text-to-text prompts on Pixel 9 series devices.^[9] The SDK lets a developer set parameters such as temperature, top K, candidate count, and max output tokens, and stream tokens as they are produced.^[9] The wider Google AI Edge effort also covers LiteRT (formerly TensorFlow Lite) and a broader runtime called LiteRT-LM aimed at small language models on edge devices, with Google describing LiteRT-LM as the "battle-tested infrastructure powering Gemini Nano deployment across Google products, including Chrome and Pixel Watch."^[28]^[34]
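To make the sampling parameters concrete, here is a generic sketch of what temperature, top K, and a max-output-token budget do during decoding. It does not use the AI Edge SDK; `next_logits` is a hypothetical stand-in for a model, and the function names are assumptions.

```python
import math
import random

def sample_top_k(logits, temperature=0.8, top_k=3, rng=None):
    """Keep the top_k highest logits, rescale by temperature, sample one index."""
    if rng is None:
        rng = random.Random()
    # Rank token indices by logit and keep only the k most likely candidates.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in ranked]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # unnormalized softmax
    return rng.choices(ranked, weights=weights, k=1)[0]

def generate(next_logits, max_output_tokens=4, **sampling):
    """Sample tokens from a hypothetical next_logits(tokens) until the budget is spent."""
    tokens = []
    for _ in range(max_output_tokens):
        tokens.append(sample_top_k(next_logits(tokens), **sampling))
    return tokens

toy_logits = [0.1, 2.5, 1.0, -3.0]
print(generate(lambda tokens: toy_logits, max_output_tokens=5, top_k=2))
```

Lower temperatures and smaller top K values make output more deterministic (top K of 1 is greedy decoding). A candidate count greater than one would correspond to calling `generate` several times and returning each result.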

LoRA adapters and safety pipeline

AICore exposes LoRA (Low-Rank Adaptation) fine-tuning on top of the base Nano weights, so feature owners can specialize the model for a single task without retraining or duplicating the full network.^[5]^[8] Google uses the same mechanism internally. ML Kit's summarization, proofreading, and image-description APIs each ship with a small API-specific LoRA adapter trained on representative data; the adapter loads on demand once the base Nano model is on the device.^[10] Google reports that feature-specific LoRA tuning lifts the summarization quality benchmark from 77.2 to 92.1 and image description from 86.9 to 92.3 on its internal raters.^[10]
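The reason an adapter can be small and loaded on demand follows from the LoRA formulation itself: the frozen weight matrix W is left untouched and a low-rank product B·A is added to its output. The sketch below shows the general technique with toy matrices; the shapes, the `alpha` scaling, and the example layer size are illustrative assumptions, not Nano's actual dimensions.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(base_w, lora_a, lora_b, x, alpha=1.0):
    """Compute y = W x + (alpha / r) * B (A x); only A and B are trained."""
    rank = len(lora_a)  # A has shape (r, d_in), B has shape (d_out, r)
    base_out = matvec(base_w, x)
    low_rank = matvec(lora_b, matvec(lora_a, x))
    return [b + (alpha / rank) * d for b, d in zip(base_out, low_rank)]

def lora_param_counts(d_out, d_in, rank):
    """Return (frozen base parameters, trainable adapter parameters) for one layer."""
    return d_out * d_in, rank * (d_in + d_out)

base_w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weights
lora_a = [[1.0, 1.0]]              # rank-1 down-projection
lora_b = [[0.5], [-0.5]]           # rank-1 up-projection
print(lora_forward(base_w, lora_a, lora_b, [2.0, 3.0]))
print(lora_param_counts(2048, 2048, 8))
```

For a hypothetical 2048 by 2048 layer at rank 8, the adapter holds 32,768 parameters against roughly 4.2 million in the base matrix, which is why several task-specific adapters can share one copy of the base model.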

LoRA tuning also doubles as the integration point for safety controls. The LoRA blocks are trained against app-specific safety data alongside the task data, and AICore runs additional input and output safety classifiers around each call so that the same base model can satisfy different safety standards for different surfaces, from Recorder summaries to keyboard Smart Reply.^[8]^[10]
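The pattern of wrapping each call in input and output checks can be sketched as follows. This is not AICore's code or API: the class, the callables, and the toy policies are all hypothetical stand-ins meant only to show how one base generator can serve surfaces with different safety rules.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GuardedModel:
    """Wrap a text generator with per-surface input and output checks (hypothetical)."""
    generate: Callable[[str], str]
    input_is_safe: Callable[[str], bool]
    output_is_safe: Callable[[str], bool]
    refusal: str = "[blocked by safety policy]"

    def run(self, prompt: str) -> str:
        # Check the request before it reaches the model.
        if not self.input_is_safe(prompt):
            return self.refusal
        reply = self.generate(prompt)
        # Check the response before it reaches the app.
        return reply if self.output_is_safe(reply) else self.refusal

def base_model(prompt: str) -> str:
    """Toy stand-in for the shared base model."""
    return prompt.upper()

# Two surfaces share one base model but apply different toy policies.
keyboard = GuardedModel(base_model, lambda p: "password" not in p, lambda r: len(r) <= 40)
recorder = GuardedModel(base_model, lambda p: True, lambda r: True)

print(keyboard.run("my password is hunter2"))
print(recorder.run("meeting notes from monday"))
```

Keeping the checks outside the generator means a surface can tighten or relax its policy without touching the shared model.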

ML Kit GenAI APIs

ML Kit GenAI APIs sit one level above the AI Edge SDK and give Android developers ready-made entry points: Prompt, Summarization, Proofreading, Rewriting, Image Description, and Speech Recognition.^[10]^[35] These were announced at Google I/O in May 2025 and ship as the com.google.mlkit:genai-* libraries.^[10] Unlike the earlier AI Edge SDK preview, ML Kit GenAI runs on a wider range of Android phones, supports image inputs for the Image Description API, and is intended for production use on devices with optimized MediaTek Dimensity, Qualcomm Snapdragon, or Google Tensor hardware through AICore.^[10]^[35]

How does the Chrome Prompt API use Gemini Nano?

In Chrome, Google added a built-in AI stack starting in 2024 with experimental flags and stabilized parts of it through Chrome 137 and 138 in 2025.^[29]^[30] The Prompt API exposes Gemini Nano through a JavaScript LanguageModel global object, with promise-based and streaming interfaces for text generation; Google states that "with the Prompt API, you can send natural language requests to Gemini Nano in Chrome" and that "no data is sent to Google or any third party when using the model."^[29] Key methods include LanguageModel.availability() for checking whether the model can be used, LanguageModel.create() for instantiating a session, and session.prompt() or session.promptStreaming() for promise-based full responses and streaming responses respectively.^[29]

Browser support requires Windows 10 or 11, macOS 13 or later, a recent Linux distribution, or ChromeOS on Chromebook Plus, plus around 22 gigabytes of free disk space on the volume that holds the Chrome profile to store the model.^[29] On the hardware side, Chrome requires either a GPU with more than 4 gigabytes of VRAM or, for CPU inference, at least 16 gigabytes of RAM and four or more CPU cores; audio input is restricted to the GPU path.^[29] The network is only needed for the one-time model download: Google notes that "subsequent use of the model does not require a network connection."^[29] Earlier reporting referred to the namespace as window.ai, but the released API uses LanguageModel directly and ships in Chrome rather than as a third-party shim.^[29]
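The hardware rules above combine into a simple decision: enough disk, plus either a capable GPU or a capable CPU configuration. The sketch below encodes only the thresholds quoted in this article; the `Machine` type and function name are hypothetical, the operating-system check is omitted, and Chrome's real eligibility logic may differ.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    """Hypothetical description of a desktop's resources."""
    free_disk_gb: float
    gpu_vram_gb: float
    ram_gb: float
    cpu_cores: int

def can_host_model(machine, needs_audio_input=False):
    """Apply the article's quoted thresholds for running Gemini Nano in Chrome."""
    enough_disk = machine.free_disk_gb >= 22        # "around 22 gigabytes"
    gpu_ok = machine.gpu_vram_gb > 4                # more than 4 GB of VRAM
    cpu_ok = machine.ram_gb >= 16 and machine.cpu_cores >= 4
    if needs_audio_input:
        return enough_disk and gpu_ok               # audio input is GPU-only
    return enough_disk and (gpu_ok or cpu_ok)

laptop = Machine(free_disk_gb=60, gpu_vram_gb=2, ram_gb=16, cpu_cores=8)
print(can_host_model(laptop))
print(can_host_model(laptop, needs_audio_input=True))
```

The example laptop qualifies through the CPU path for text prompts but not for audio input, since its GPU falls below the VRAM threshold.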

The Prompt API is generally available in Chrome 138 for Chrome Extensions and remains behind a flag for web pages.^[29]^[30] Three sibling APIs reached stable in Chrome 138: the Summarizer API, the Translator API, and the Language Detector API.^[30] The Writer and Rewriter APIs are in origin trials, the Proofreader API entered an origin trial through Chrome 139 Canary, and a multimodal variant of the Prompt API that accepts audio and images is available to participants in the Early Preview Program.^[30] From Chrome 149, Gemini Nano in the browser supports English, Spanish, Japanese, German, and French for input and output text.^[29]

What is Gemini Nano used for?

Production features using Gemini Nano on Android and Chrome include Summarize in Recorder (offline bullet summaries of voice recordings), Smart Reply in Gboard (contextual chips for WhatsApp, Line, KakaoTalk), Magic Compose in Google Messages (tone rewriting with formal, excited, chill, and Shakespearean styles operating on the last 20 messages), Pixel Screenshots (natural-language search through saved screenshots), Pixel Studio (on-device image generation paired with a Tensor G4 diffusion model), Call Notes (call transcription and summary), Pixel Weather AI Reports, and TalkBack image descriptions.^[3]^[11]^[12]^[20] On Pixel 10, Nano powers Magic Cue, Voice Translate, and Pro Res Zoom.^[25]^[26]

Magic Cue surfaces relevant information across Gmail, Calendar, Messages, and Screenshots when it predicts the user might want it.^[25] Voice Translate runs real-time translation on device while preserving the speaker's voice across English, Spanish, German, Japanese, French, Hindi, Italian, Portuguese, Swedish, Russian, and Indonesian.^[25] Pro Res Zoom uses Nano-driven prompts alongside the Pixel 10 camera pipeline to enhance detail at zoom levels of up to 100x.^[25]

Pixel Watch 4 and Pixel Buds 2a extend Nano features into wearables. Raise to Talk on Pixel Watch 4 starts a Gemini session by lifting the wrist, while Pixel Buds 2a are the first A-series buds to summarize and reply to messages without checking the phone.^[25]^[27] Wear OS 6 began rolling Gemini out to Pixel Watch, Samsung, OnePlus, OPPO, and Xiaomi wearables through 2025.^[27]

In Chrome, the Prompt API lets web pages run summarization, translation, and generation directly in the browser. Third-party Android apps build on the same model through ML Kit GenAI, the AI Edge SDK, or AICore, paying nothing per call and keeping user data on the device.^[10]^[29]

How fast is Gemini Nano and how does it score?

Google has not published a full benchmark suite for the shipping Nano models, but several public numbers give a sense of the range. The Gemini 1.0 paper reports MMLU and BoolQ scores for Nano-1 and Nano-2 well below Pro and Ultra but competitive with other small Transformer models of similar size.^[2] In particular, Gemini Nano-1 scored 45.9% on MMLU (5-shot) and 71.6% on BoolQ in the launch paper, or 0.64 and 0.81 when normalized to Gemini Pro's scores.^[2] Phi-3 Mini, released roughly five months later, is reported to score 68.8% on the same MMLU benchmark, highlighting that small-model quality moved quickly in the year after Nano shipped.^[23]
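The normalized figures are ratios of a Nano score to the Gemini Pro score on the same benchmark, so the Pro scores can be backed out from the numbers quoted above. Because the ratios are rounded to two decimals, the results below are approximations rather than values taken from the paper.

```python
def implied_reference(score, normalized):
    """Solve score / reference = normalized for the reference model's score."""
    return score / normalized

# (benchmark, Nano score in percent, score normalized to Gemini Pro)
quoted = [("MMLU (5-shot)", 45.9, 0.64), ("BoolQ", 71.6, 0.81)]

for benchmark, score, normalized in quoted:
    pro = implied_reference(score, normalized)
    print(f"{benchmark}: Nano {score}%, implied Gemini Pro score about {pro:.1f}%")

# Gap in percentage points to the Phi-3 Mini MMLU figure quoted above.
print(f"Phi-3 Mini MMLU lead: {68.8 - 45.9:.1f} points")
```

The implied Gemini Pro scores are about 71.7% on MMLU and about 88.4% on BoolQ.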

Google's October 2024 AI Edge SDK post notes that the experimental on-device Nano (described as Nano 2) scored 56% on MMLU compared with 46% for the earlier version, 23% on math compared with 14%, 90% on paraphrasing compared with 44%, and 82% on smart reply compared with 44%.^[9] These are internal task evaluations, not standardized public benchmarks, but they show the trajectory of the on-device model since launch.

Latency on Tensor G3 and G5 hardware is in the low hundreds of milliseconds for short prompts, with the Pixel 8 Pro launch features designed for real-time keyboard suggestions and short summaries.^[4] Tensor G5, the Pixel 10 chip manufactured on a TSMC 3 nanometer process, runs the newest Gemini Nano model approximately 2.6 times faster and 2 times more efficiently than the Tensor G4 inside the Pixel 9, and expands the on-device token window from roughly 12,000 to 32,000 tokens.^[26] The Tensor G4 itself reuses the "rio" Edge TPU codename from the G3 with the same clock speed, so the bulk of the multimodal Nano speedups between Pixel 8 Pro and Pixel 9 came from software and the larger RAM budget rather than the accelerator.^[31]

Memory footprint is dominated by the 4-bit weights, which fit in roughly 1 to 1.5 gigabytes for Nano-1 and 2 to 3 gigabytes for Nano-2, plus working memory for the KV cache.^[2] Battery cost is low enough for Recorder summaries and Gboard suggestions to run frequently without unusual drain, although Google has gated longer-form features such as Recorder summarization of multi-hour audio to higher-RAM Pixel 9 devices.^[17]

Quality on hard tasks lags behind cloud-served Gemini Pro, Gemini Flash, and frontier models from other vendors. Reviews of the launch features were mixed: Android Police's Pixel 8 Pro hands-on found the Recorder summaries useful but the Gboard Smart Reply chips often bland or off-topic, and 9to5Google noted that Smart Reply latency was visible on the first builds.^[4] Features have improved with subsequent Nano updates and tighter prompt tuning, but the on-device model is best understood as a fast assistant for short, well-scoped tasks rather than a frontier reasoning system.

Google's internal benchmarks released alongside the AI Edge SDK preview showed clear gains as the on-device Nano matured. Between the launch model and the experimental build offered to developers in October 2024, the MMLU score climbed from 46% to 56%, paraphrasing rose from 44% to 90%, and smart reply rose from 44% to 82%.^[9] Math also improved from 14% to 23%, although the model still trails Phi-3 Mini's reported 68.8% on MMLU, and Apple reports that its larger on-device model leads Phi-3 Mini and Mistral 7B on human evaluation.^[22]^[23] Across vendors, the same parameter budget is delivering more useful capability with each release, driven by data curation, distillation improvements, and feature-specific LoRA adapters rather than larger weights.^[10]^[22]

How does Gemini Nano compare with other on-device language models?

Gemini Nano is one of several small language model systems positioned for on-device use. The table compares Nano with the most prominent peers, using parameter counts disclosed in papers, blog posts, or model cards.

Model	Vendor	Parameters	Quantization	On-device	Multimodal	First disclosed
Gemini Nano-1	Google	~1.8B	4-bit	Yes	Text, image, audio (multimodal variant)^[2]^[17]	Dec 2023
Gemini Nano-2	Google	~3.25B	4-bit	Yes	Text, image, audio^[2]^[17]	Dec 2023
Apple Foundation Models (on-device)	Apple	~3B	Mixed 2/4-bit, avg 3.7 bpw^[22]	Yes	Text, image	Jun 2024
Phi-3 Mini	Microsoft	3.8B^[23]	4-bit possible	Yes (community)	Text (image variant separate)	Apr 2024
Llama 3.2 1B	Meta	~1.2B	4-bit possible	Yes (community)	Text	Sep 2024
Llama 3.2 3B	Meta	~3.2B	4-bit possible	Yes (community)	Text	Sep 2024
Llama 3.2 11B Vision	Meta	~11B	Mixed	Edge servers / high-end	Text, image	Sep 2024
Mistral 7B (small variants)	Mistral AI	7B	4-bit possible	Yes (community)	Text	Sep 2023
Qwen 2 0.5B / 1.5B	Alibaba	0.5B / 1.5B	4-bit possible	Yes	Text	Jun 2024
GPT-4o mini	OpenAI	Undisclosed	n/a	No (cloud only)	Text, image, audio	Jul 2024

Gemini Nano is unusual on this list for two reasons. It ships pre-installed on consumer phones through a system service rather than as a hobbyist download, and it is gated behind Google's API rather than released as open weights. Llama 3.2 and Phi-3 Mini are open-weights models that rely on third-party runtimes such as llama.cpp or Ollama to run on phones, while Apple Intelligence is a parallel system-level integration locked to Apple silicon. GPT-4o mini is a cloud-hosted OpenAI model included in the table only to clarify that it is not a peer of Nano in the on-device sense.

Microsoft's Phi-3 Mini, with 3.8 billion parameters and a similar 4-bit deployment profile, posts higher absolute MMLU scores than the Gemini Nano numbers in the original 1.0 paper but does not have an equivalent system-level integration in any consumer operating system.^[23] In practice, that distinction is what makes Nano commercially significant: a Pixel user does not pick Nano off a model hub; the operating system runs it on their behalf, much like Apple Intelligence on iOS.

How does Gemini Nano differ from Apple Intelligence?

Apple's response to Gemini Nano arrived at WWDC in June 2024 as Apple Intelligence.^[21]^[22] Both systems target similar features (writing tools, summaries, smart replies, image descriptions, on-device assistants), but they differ on hardware, model layout, and trust model.

Aspect	Gemini Nano	Apple Intelligence
Vendor	Google	Apple
On-device model size	~1.8B (Nano-1), ~3.25B (Nano-2)^[2]	~3B on-device foundation model^[22]
Cloud fallback	Cloud Gemini Pro / Flash / Ultra	Private Cloud Compute model on Apple Silicon servers^[22]
Hardware	Tensor G3 / G4 / G5, Qualcomm and MediaTek NPUs via AICore^[5]^[26]	Apple A17 Pro and later, M-series for iPad and Mac^[22]
Operating systems	Android 14+, ChromeOS, desktop Chrome^[5]^[29]	iOS 18+, iPadOS 18+, macOS Sequoia+^[22]
Quantization	4-bit weights^[2]	Mixed 2-bit and 4-bit, ~3.7 bits per weight average^[22]
Trust model	Private Compute Core on-device^[8]	Private Cloud Compute with attested servers, no logging^[22]
Developer access	AI Edge SDK, ML Kit GenAI, Chrome Prompt API^[9]^[10]^[29]	Writing Tools, Image Playground, App Intents, Foundation Models framework^[22]
Open weights	No	No

Apple's machine-learning research paper claims that its 3 billion parameter on-device model outperforms Phi-3 Mini, Mistral 7B, Gemma 7B, and Llama 3 8B on its human evaluation suite, and Apple reports a generation rate of about 30 tokens per second on the iPhone 15 Pro with a time-to-first-token latency of about 0.6 milliseconds per prompt token.^[22] Both systems use task-specific adapters, but Apple has elected to keep its adapter format internal while Google exposes the LoRA mechanism to third-party developers through AICore.^[8]^[22]
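Apple's two figures translate into a rough end-to-end response time. The sketch below assumes the 0.6 millisecond figure is a cost per prompt token, which is how Apple's report phrases it, and that generation proceeds at a constant 30 tokens per second; real latency depends on device load and prompt content, and the function name is an assumption.

```python
def estimated_response_seconds(prompt_tokens, output_tokens,
                               ms_per_prompt_token=0.6, tokens_per_second=30.0):
    """Estimate total latency as prompt processing time plus generation time."""
    time_to_first_token = prompt_tokens * ms_per_prompt_token / 1000.0
    generation = output_tokens / tokens_per_second
    return time_to_first_token + generation

# A 1,000-token prompt that produces a 60-token summary.
print(f"{estimated_response_seconds(1000, 60):.1f} seconds")
```

Under these assumptions the example takes about 0.6 seconds before the first token and 2 seconds to generate, so output length rather than prompt length dominates for short summaries.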

The two systems represent different bets about what "system-level AI" looks like. Google's bet is that Nano can be a portable layer running on Pixel, partner Android phones, and desktop Chrome, with cloud Gemini available when the on-device model is not enough. Apple's bet is a tighter pair of one on-device model and one private-cloud model, both restricted to Apple Silicon, with third-party cloud models such as ChatGPT plugged in only on user request.

What are the limitations of Gemini Nano?

Gemini Nano is constrained by both its size and its deployment context. It is much smaller than cloud Gemini models, so it is weaker on hard reasoning, long-context tasks, and complex coding. The Gemini paper's own benchmarks show Nano-2 well behind Gemini Pro on MMLU, GSM8K, and similar evaluations.^[2] For tasks like explaining a legal contract or refactoring a large codebase, the cloud-served Gemini models and competing frontier systems are still the right tools.

4-bit quantization gives up some quality compared to higher-precision weights, in exchange for faster inference and lower memory. Battery, thermal, and memory constraints also limit how aggressively apps can use Nano. Continuous generation works for short bursts, but real-time agent loops and long-running summaries of multi-hour audio are gated to higher-RAM Pixel models, and Google's documentation recommends batching work and avoiding tight loops.

Device support is uneven. Many Nano features are exclusive to Pixel, with Samsung, Motorola, and Xiaomi getting subsets through partnerships.^[11]^[12] The Pixel 9a uses a smaller "Nano XXS" variant with only text input and no background execution, and older or lower-tier Android phones do not support Nano at all.^[18] Google's newer Gemini Intelligence layer, as of 2026, requires 12 gigabytes of RAM, a flagship chip, and a Gemini Nano v3 or higher runtime, leaving devices below that bar with reduced features.^[32]

Chrome's Prompt API requires substantial disk space (around 22 gigabytes for the model and runtime), is unsupported on mobile and non-Plus ChromeOS, and is still flagged as experimental for many origins.^[29] Nano is also closed source: there are no public weights, and developers cannot fine-tune the base model directly. Google supports LoRA adapters on top of Nano through AICore for some customization, but the base model remains under Google's control.^[7]^[8]

The original Pixel 8 Pro launch features drew mixed reviews. Recorder summarization was widely praised as useful and reliable, but the first Gboard Smart Reply suggestions were criticized for being generic, occasionally off-topic, and slower than the on-server replies they were meant to replace, and several outlets noted that updates through 2024 and 2025 were needed before the on-device experience felt competitive.^[4] Even the multimodal Nano on the Pixel 9 was limited to English at launch for several features, with broader language support arriving incrementally.^[3]

Recent developments

In January 2024, the Galaxy S24 series became the first non-Pixel phones to run Gemini Nano, launching alongside Circle to Search and Magic Compose.^[12]^[13] The March and June 2024 Pixel Feature Drops extended Nano features to the Pixel 8 and Pixel 8a, initially through a developer option and then as default features once Google was satisfied the lower-RAM phones could handle them.^[15]

Google I/O 2024 previewed Gemini Nano with Multimodality, the variant that adds image and audio understanding alongside text.^[16] The Pixel 9 series, launched in August 2024, became the first phones to ship the multimodal Nano. 9to5Google reported the model is roughly twice the size of the original and substantially more capable on Recorder summaries and image descriptions.^[17] The same release added Pixel Screenshots, Pixel Studio, richer TalkBack image descriptions, and Call Notes.^[20]

On October 1, 2024, Google opened experimental access to Gemini Nano for all Android developers through the AI Edge SDK and AICore, initially limited to Pixel 9 devices for text-to-text prompts.^[9] In May 2025, ML Kit GenAI APIs added higher-level entry points for prompt, summarization, proofreading, rewriting, image description, and speech recognition, with the Gemini Nano variant shipping on Pixel 10 reaching ML Kit later in the year.^[10] Google I/O 2025 included a session titled "Gemini Nano on Android: building with on-device gen AI," framing Nano as a stable platform for third-party apps rather than a Pixel-only experiment.^[24]

In 2025, the Pixel 10 series launched on the Tensor G5 with Nano powering Magic Cue, Voice Translate, and Pro Res Zoom, plus on-device portions of the photo-to-video feature in the Gemini app.^[25]^[26] The Tensor G5 expanded the on-device token window from roughly 12,000 to 32,000 tokens and ran the newest Nano model roughly 2.6 times faster than Tensor G4 while consuming half the power.^[26] The Pixel Watch 4 added Raise to Talk, and Pixel Buds 2a added on-device message summaries.^[25]^[27]

The 2026 Android Developers Blog post announcing Gemma 4 in the AICore Developer Preview noted that Gemma 4 was the foundation for the next Gemini Nano generation, called "Gemini Nano 4," and that AICore-targeted Gemma 4 code would run automatically on Nano 4 devices later in 2026.^[7] Chrome's built-in AI APIs moved from a developer preview to broader availability through Chrome 137 and 138 in 2025, with the Prompt API exposing Nano through a stable LanguageModel JavaScript object on Windows, macOS, Linux, and ChromeOS.^[29]^[30]

Strategic significance

Gemini Nano is the on-device anchor of Google's AI strategy. The cloud-served Pro, Flash, and Ultra models compete directly with OpenAI's GPT line and Anthropic's Claude in the API and chatbot markets, but Nano competes on a different front: which AI assistant is built into the device a user picks up first thing in the morning. By shipping Nano on Pixel, partner Android phones, Wear OS watches, Pixel Buds, and desktop Chrome, Google offers a system-level AI layer no pure API vendor can match.^[25]^[27]^[29]

Nano is also the most direct counterweight to Apple Intelligence. Both companies have settled on a similar pattern: a small on-device model for fast, private operations and a larger cloud model for harder work. The differences between them, on hardware, on disclosure, and on third-party integration, will shape how the phone industry thinks about AI for years.^[21]^[22]

For developers, Nano expands the surface area of generative AI. A prompt through the AI Edge SDK costs nothing per call, runs offline, and keeps user data on the device, which makes it attractive for features that are hard to justify with a cloud API: real-time keyboard help, sensitive document summaries, accessibility tools, and long-running background work.^[9]^[10] The trade-off is that the model is smaller, the API is gated to specific devices and chipsets, and Google sets the rules for what runs through AICore.^[7]^[8] Nano shifts some on-device AI design decisions from app developers to the platform vendor, much as camera pipelines and notification systems have done on smartphones. Generative AI is moving from a service users visit to an ambient feature of the phone, the watch, and the browser, and Gemini Nano is one of the clearest early examples of that shift in production.^[25]^[27]^[29]
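Since on-device access is gated to specific devices, one common way to handle the trade-off is to try the local model first and fall back to a cloud model only when the local one is missing or fails. The sketch below illustrates that routing pattern in general terms. It is not drawn from the AI Edge SDK or any Google API: `TextProvider`, `RoutedResult`, and `promptOnDeviceFirst` are hypothetical names, and the providers are plain functions standing in for whatever on-device and cloud clients an app actually uses.

```typescript
// Hypothetical provider shape: any async function that turns a prompt into text.
type TextProvider = (prompt: string) => Promise<string>;

// Result tagged with where the answer came from, so callers can log or disclose it.
interface RoutedResult {
  text: string;
  source: "on-device" | "cloud";
}

// Try the on-device provider first; use the cloud provider only if the
// on-device one is absent (unsupported device) or throws (model not ready).
async function promptOnDeviceFirst(
  prompt: string,
  onDevice: TextProvider | undefined,
  cloud: TextProvider,
): Promise<RoutedResult> {
  if (onDevice) {
    try {
      return { text: await onDevice(prompt), source: "on-device" };
    } catch {
      // Local inference failed; fall through to the cloud provider.
    }
  }
  return { text: await cloud(prompt), source: "cloud" };
}
```

A fallback like this gives up the offline and data-locality benefits whenever the cloud path is taken, so features that handle sensitive content may be better served by disabling the feature than by falling back.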

For the browser ecosystem, the Chrome Prompt API turns Gemini Nano into the first widely available on-device LLM exposed through a standard web platform interface.^[29] That changes what a web developer can ship without infrastructure costs or privacy disclosures: a translation widget, a summarizer for long articles, a structured-output JSON generator for an internal tool, or a generative search box on a personal blog can all run entirely on the user's machine.^[29]^[30] The chief constraints are the requirement for 22 gigabytes of free disk space and the relatively narrow set of supported operating systems, a set Google has confirmed will expand over time.^[29] The combination of Android AICore on phones and Chrome's Prompt API in the browser gives Google two reference surfaces that no third-party LLM vendor can replicate without a comparable platform footprint.
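The structured-output case can be sketched as follows. This is an illustration under stated assumptions rather than Chrome's sample code: it assumes the Prompt API's `create()` accepts `initialPrompts`, that `prompt()` accepts a JSON Schema through `responseConstraint`, and that sessions expose `destroy()`, all of which should be checked against the current Chrome documentation. The `PromptApi` and `PromptSession` types are declared locally so the block type-checks without browser typings, and `ArticleTags`, `articleTagsSchema`, `extractTags`, and `isArticleTags` are hypothetical names.

```typescript
// Local structural types for the parts of the Prompt API this sketch touches.
interface PromptSession {
  prompt(input: string, options?: { responseConstraint?: object }): Promise<string>;
  destroy(): void;
}
interface PromptApi {
  create(options?: {
    initialPrompts?: { role: "system" | "user" | "assistant"; content: string }[];
  }): Promise<PromptSession>;
}

// Hypothetical output shape for an internal tagging tool.
interface ArticleTags {
  title: string;
  tags: string[];
}

// JSON Schema describing ArticleTags, passed as the response constraint.
const articleTagsSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    tags: { type: "array", items: { type: "string" } },
  },
  required: ["title", "tags"],
};

// Runtime check, since model output should not be trusted to match the schema.
function isArticleTags(v: unknown): v is ArticleTags {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  const tags = o["tags"];
  return (
    typeof o["title"] === "string" &&
    Array.isArray(tags) &&
    tags.every((t) => typeof t === "string")
  );
}

// Ask the model for a title and tags, parse the JSON, and always free the session.
async function extractTags(api: PromptApi, article: string): Promise<ArticleTags> {
  const session = await api.create({
    initialPrompts: [
      { role: "system", content: "Extract a title and topic tags from the article." },
    ],
  });
  try {
    const raw = await session.prompt(article, { responseConstraint: articleTagsSchema });
    const parsed: unknown = JSON.parse(raw);
    if (!isArticleTags(parsed)) {
      throw new Error("Model output did not match the expected shape");
    }
    return parsed;
  } finally {
    session.destroy();
  }
}
```

In Chrome, the `api` argument would be the global `LanguageModel` object. Validating the parsed result separately is a deliberate choice here, so that a malformed response surfaces as an error instead of propagating into the calling tool.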

References

1. Pichai, Sundar and Hassabis, Demis, "Introducing Gemini: our largest and most capable AI model", Google, 2023-12-06. https://blog.google/technology/ai/google-gemini-ai/. Accessed 2026-05-24.
2. Gemini Team, Google, "Gemini: A Family of Highly Capable Multimodal Models", arXiv:2312.11805, 2023-12-19. https://arxiv.org/abs/2312.11805. Accessed 2026-05-24.
3. Google, "Pixel 8 Pro the first smartphone with AI built in is now running Gemini Nano, plus more AI updates coming to the Pixel portfolio", Google, 2023-12-06. https://blog.google/products/pixel/pixel-feature-drop-december-2023/. Accessed 2026-06-24.
4. Li, Abner, "Google rolling out Gemini Nano to Pixel 8 Pro with Android AICore", 9to5Google, 2023-12-06. https://9to5google.com/2023/12/06/pixel-8-pro-gemini-nano/. Accessed 2026-05-24.
5. Android Developers, "A New Foundation for AI on Android", Android Developers Blog, 2023-12-06. https://android-developers.googleblog.com/2023/12/a-new-foundation-for-ai-on-android.html. Accessed 2026-05-24.
6. TechCrunch, "Pixel 8 Pro becomes the first smartphone powered by Google's new AI model, Gemini", TechCrunch, 2023-12-06. https://techcrunch.com/2023/12/06/pixel-8-pro-becomes-the-first-smartphone-powered-by-googles-new-ai-model-gemini/. Accessed 2026-05-24.
7. Google, "Gemini Nano on Android", Android Developers, 2026. https://developer.android.com/ai/gemini-nano. Accessed 2026-05-24.
8. Android Developers, "An introduction to privacy and safety for Gemini Nano", Android Developers Blog, 2024-10. https://android-developers.googleblog.com/2024/10/introduction-to-privacy-and-safety-gemini-nano.html. Accessed 2026-05-24.
9. Android Developers, "Gemini Nano is now available on Android via experimental access", Android Developers Blog, 2024-10-01. https://android-developers.googleblog.com/2024/10/gemini-nano-experimental-access-available-on-android.html. Accessed 2026-05-24.
10. Android Developers, "On-device GenAI APIs as part of ML Kit help you easily build with Gemini Nano", Android Developers Blog, 2025-05. https://android-developers.googleblog.com/2025/05/on-device-gen-ai-apis-ml-kit-gemini-nano.html. Accessed 2026-05-24.
11. Triggs, Robert, "Here are all the Gemini Nano features, and the phones that support them", Android Authority, 2024. https://www.androidauthority.com/gemini-nano-features-devices-3490062/. Accessed 2026-05-24.
12. Google, "The power of Google AI comes to the new Samsung Galaxy S24 series", Google, 2024-01-17. https://blog.google/products-and-platforms/platforms/android/google-ai-samsung-galaxy-s24/. Accessed 2026-05-24.
13. Li, Abner, "Google Messages getting on-device Magic Compose with Gemini Nano", 9to5Google, 2024-01-17. https://9to5google.com/2024/01/17/new-google-messages-features-beta/. Accessed 2026-05-24.
14. Google, "Gemini Nano Multimodal Capabilities on Pixel Phones", Google Store, 2024. https://store.google.com/intl/en/ideas/articles/gemini-nano-offline/. Accessed 2026-05-24.
15. Schoon, Ben, "Google bringing Gemini Nano to Pixel 8 with next Feature Drop", 9to5Google, 2024-03-28. https://9to5google.com/2024/03/28/pixel-8-gemini-nano-feature-drop/. Accessed 2026-05-24.
16. Triggs, Robert, "Pixel 9 series will debut a more powerful, multimodal Gemini Nano", Android Authority, 2024-05. https://www.androidauthority.com/gemini-nano-multimodal-pixel-9-3442273/. Accessed 2026-05-24.
17. Li, Abner, "How Pixel Recorder is using Gemini Nano with Multimodality, which is approximately 2x larger than 1.0", 9to5Google, 2024-09-01. https://9to5google.com/2024/09/01/pixel-recorder-gemini-nano-multimodality/. Accessed 2026-06-24.
18. Roque, Christian de Looper et al, "Google's Pixel 9a has a smaller, less capable Gemini", MobileSyrup, 2025-03-20. https://mobilesyrup.com/2025/03/20/google-pixel-9a-runs-gemini-nano-xxs/. Accessed 2026-05-24.
19. Pi, Anshel, "14 new AI features in Google's Pixel 9 series devices", Google, 2024-08. https://blog.google/products-and-platforms/devices/pixel/google-pixel-9-new-ai-features/. Accessed 2026-05-24.
20. Google, "Pixel Studio, Pixel Screenshots, Call Notes and Pixel Weather features", Google, 2024-08. https://blog.google/products-and-platforms/devices/pixel/google-pixel-9-new-ai-features/. Accessed 2026-05-24.
21. Apple, "Introducing Apple Intelligence", Apple Newsroom, 2024-06-10. https://www.apple.com/newsroom/2024/06/introducing-apple-intelligence-for-iphone-ipad-and-mac/. Accessed 2026-05-24.
22. Apple Machine Learning Research, "Introducing Apple's On-Device and Server Foundation Models", Apple, 2024-06. https://machinelearning.apple.com/research/introducing-apple-foundation-models. Accessed 2026-05-24.
23. Abdin, Marah et al, "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone", arXiv:2404.14219, 2024-04. https://arxiv.org/abs/2404.14219. Accessed 2026-05-24.
24. Google, "Gemini Nano on Android: Building with on-device gen AI", Google I/O 2025. https://io.google/2025/explore/technical-session-14/. Accessed 2026-05-24.
25. Google, "5 new things Gemini can do on Pixel", Google, 2025-09. https://blog.google/products/gemini/gemini-nano-pixel-10-updates/. Accessed 2026-05-24.
26. Google, "Pixel 10 introduces new chip, Tensor G5", Google, 2025-08. https://blog.google/products-and-platforms/devices/pixel/tensor-g5-pixel-10/. Accessed 2026-05-24.
27. Google, "How to use Gemini on a Wear OS smartwatch", Google, 2025-07. https://blog.google/products-and-platforms/platforms/wear-os/gemini-wear-os-watches/. Accessed 2026-05-24.
28. Google Developers, "Blazing fast on-device GenAI with LiteRT-LM", Google Developers Blog, 2025. https://developers.googleblog.com/blazing-fast-on-device-genai-with-litert-lm/. Accessed 2026-05-24.
29. Google, "The Prompt API", Chrome for Developers, 2025-09-21. https://developer.chrome.com/docs/ai/prompt-api. Accessed 2026-06-24.
30. Chrome for Developers, "AI APIs are in stable and origin trials, with new Early Preview Program APIs", Chrome for Developers Blog, 2025-05. https://developer.chrome.com/blog/ai-api-updates-io25. Accessed 2026-05-24.
31. Westenberg, Mishaal, "Exclusive: Google Pixel 9's Tensor G4 is the smallest upgrade to the series so far", Android Authority, 2024-08. https://www.androidauthority.com/exclusive-tensor-g4-small-upgrade-3466398/. Accessed 2026-05-24.
32. Li, Abner, "Gemini Intelligence has high Android spec requirements, likely won't support Pixel 9 or Galaxy Z Fold 7", 9to5Google, 2026-05-15. https://9to5google.com/2026/05/15/gemini-intelligence-android-spec-requirements/. Accessed 2026-05-24.
33. ML Journey, "Gemini AI Model Parameters and Performance Benchmarks", ML Journey, 2024. https://mljourney.com/gemini-ai-model-parameters-and-performance-benchmarks/. Accessed 2026-05-24.
34. Google Developers, "Blazing fast on-device GenAI with LiteRT-LM", Google Developers Blog, 2025. https://developers.googleblog.com/blazing-fast-on-device-genai-with-litert-lm/. Accessed 2026-05-24.
35. InfoQ, "Google Brings Gemini Nano to ML Kit with New On-Device GenAI APIs", InfoQ, 2025-06. https://www.infoq.com/news/2025/06/google-mlkit-genai-gemini-nano/. Accessed 2026-05-24.
