Gemini 1.0
Last reviewed
Jun 3, 2026
Sources
15 citations
Review status
Source-backed
Revision
v1 · 1,597 words
Gemini 1.0 is the first generation of Gemini, the family of multimodal models that Google DeepMind announced on 6 December 2023 [1][2]. Presented by Google chief executive Sundar Pichai and DeepMind chief executive Demis Hassabis at a virtual briefing, it shipped in three sizes built for different deployment targets: Gemini Ultra for the most complex tasks, Gemini Pro for general use, and Gemini Nano for running directly on devices [1][3]. Google described the family as natively multimodal, meaning the models were pre-trained from the start across text, code, images, audio, and video rather than assembled from separate single-modality systems [1][4]. The launch drew wide attention both for the benchmark figures Google reported and for a controversy over a demonstration video that was later shown to be heavily edited [5][6].
Gemini 1.0 was the first model family released after Google merged its Google Brain and DeepMind research groups into a single unit, Google DeepMind, earlier in 2023 [1][2]. Google framed the launch as the start of what it called the "Gemini era" and the first realization of the combined lab's plans, positioning the family as a successor to its earlier PaLM 2 and LaMDA models [2]. The work drew on teams across Google Research in addition to the merged labs [2].
The project name nods to two histories at once: the Brain and DeepMind merger that produced the team, and NASA's two-seat Project Gemini spaceflight program [2]. Google released the models alongside a technical report, "Gemini: A Family of Highly Capable Multimodal Models," posted to arXiv [4].
Gemini 1.0 was offered in three model sizes so a single family could span data-center workloads down to phones. Ultra and Pro were cloud models, while Nano was distilled into two compact variants meant to run on hardware with limited memory [1][4].
| Model | Positioning | Notes |
|---|---|---|
| Gemini Ultra | Largest and most capable, for highly complex tasks | Held back past launch for safety testing; reached consumers in February 2024 [1][7] |
| Gemini Pro | Best model for scaling across a wide range of tasks | Powered Bard from launch day and shipped to developers on 13 December 2023 [1][8] |
| Gemini Nano | Most efficient model, for on-device tasks | Distilled from larger models; debuted on the Pixel 8 Pro [1][9] |
Nano itself came in two versions. Per the technical report, Nano-1 has 1.8 billion parameters and Nano-2 has 3.25 billion parameters, targeting low-memory and higher-memory devices respectively [4][10]. Both were produced by distilling knowledge from the larger Gemini models and were quantized to 4-bit for deployment [4][10].
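To give a rough sense of why 4-bit quantization matters on a phone, the arithmetic below estimates the storage needed for the weights alone. The parameter counts and the 4-bit figure come from the text above; the function name, the 16-bit comparison point, and the choice to ignore activations, caches, and quantization metadata are assumptions made for illustration.

```python
# Back-of-the-envelope weight-storage estimate for the two Nano variants.
# Only the parameter counts and the 4-bit precision are taken from the text;
# real on-device memory use also depends on factors not modeled here.

NANO_PARAMS = {"Nano-1": 1.8e9, "Nano-2": 3.25e9}


def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bits_per_weight / 8


for name, params in NANO_PARAMS.items():
    gb_4bit = weight_bytes(params, 4) / 1e9
    gb_16bit = weight_bytes(params, 16) / 1e9  # hypothetical unquantized baseline
    print(f"{name}: ~{gb_4bit:.2f} GB at 4-bit vs ~{gb_16bit:.2f} GB at 16-bit")
```

Under these assumptions the 4-bit weights come to roughly 0.9 GB for Nano-1 and roughly 1.6 GB for Nano-2, a quarter of what 16-bit storage would need.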
The central design claim for Gemini 1.0 was native multimodality. Where many earlier systems bolted a vision or audio component onto a text model, Google said Gemini was built from the ground up to be multimodal and pre-trained simultaneously on different modalities, then fine-tuned with additional multimodal data [1][4]. In practice the cloud models could take in and reason over interleaved inputs of text, images, audio, and video, and produce text and image outputs depending on the configuration [1][4].
Google promoted this design as enabling capabilities that were difficult for text-first models, such as reasoning across a problem that mixes a diagram with a written question, or parsing spoken language directly rather than transcribing it first [1][4]. The on-device Nano models were aimed at narrower tasks: on the Pixel 8 Pro they powered a Summarize feature in the Recorder app and, as a developer preview, Smart Reply suggestions in Gboard, both able to work without a network connection [9].
The headline result Google reported was on MMLU (Massive Multitask Language Understanding), a test spanning 57 subjects including math, physics, history, law, medicine, and ethics. Gemini Ultra scored 90.0% and was described as the first model to outperform human experts on the benchmark, whose human-expert baseline the benchmark's authors put at 89.8% [1][3][11]. Google also said Ultra exceeded the prior state of the art on 30 of the 32 widely used academic benchmarks it tested [1][2].
| Benchmark | Domain | Gemini Ultra |
|---|---|---|
| MMLU | General knowledge and reasoning (57 subjects) | 90.0% [1][11] |
| GSM8K | Grade-school math word problems | 94.4% [12] |
| HumanEval | Python code generation (pass@1) | 74.4% [12] |
| MMMU | Multimodal reasoning across domains | 59.4% [12] |
These figures came from Google's own technical report and announcement. Independent commentators noted at the time that comparisons against OpenAI's GPT-4 depended heavily on prompting method; the much-cited MMLU number, for example, used a particular chain-of-thought prompting setup, and results varied under other conditions [11][13]. The MMLU comparison Google highlighted placed GPT-4 at 86.4% [3][11].
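One way to see why prompting method can move a benchmark score is to look at schemes that sample several reasoning chains and vote on the final answer. The sketch below shows a generic consensus-thresholded majority vote: if enough sampled chains agree, the majority answer is used, otherwise a single greedy answer is used instead. It is not presented as Google's evaluation code; the function name, the threshold value, and the sample answers are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def routed_answer(sampled: Sequence[str], greedy: str, threshold: float) -> str:
    """Return the majority answer among sampled reasoning chains when its
    share of the votes reaches `threshold`; otherwise fall back to `greedy`."""
    if not sampled:
        return greedy
    answer, votes = Counter(sampled).most_common(1)[0]
    return answer if votes / len(sampled) >= threshold else greedy


# Hypothetical final answers extracted from eight sampled chains.
samples = ["B", "B", "C", "B", "B", "A", "B", "B"]
print(routed_answer(samples, greedy="C", threshold=0.6))  # 6 of 8 agree: majority is used

# No answer reaches the threshold, so the greedy answer is used instead.
print(routed_answer(["A", "B", "C", "D"], greedy="C", threshold=0.6))
```

Because the same model can give different final answers under greedy decoding and under voting, a score reported with one setup is not directly comparable to a score reported with another.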
Alongside the launch, Google posted a roughly six-minute video titled "Hands-on with Gemini: Interacting with multimodal AI." It appeared to show a person speaking to Gemini in real time while drawing pictures and handling objects, with the model responding conversationally and seemingly watching the world as it changed [5][6].
Within days, reporting established that the video was not a real-time demonstration. Google confirmed to Bloomberg and others that the sequences were assembled from still image frames and text prompts rather than live video and voice, and that the spoken dialogue heard in the clip had been added afterward [5][6]. The YouTube description carried a note that, for the demo, latency had been reduced and Gemini's outputs shortened for brevity, and Google published a separate post, "How it's made: Interacting with Gemini through multimodal prompting," walking through the actual prompting [6].
Google's position was that the underlying interactions were genuine: the company said the video was an illustrative depiction based on real multimodal prompts and outputs from testing [5][6]. Critics countered that the editing implied a fluid, real-time conversational ability that the model had not demonstrated, and that what was actually tested differed materially from what the presentation suggested [5][6]. The episode became a recurring example in discussion of how AI capabilities are marketed.
Gemini Pro reached users first. From the 6 December 2023 announcement, a fine-tuned version of Gemini Pro powered Google's Bard chatbot in English across a large set of countries, and on 13 December Google opened the Gemini Pro API to developers through Google AI Studio and Vertex AI [1][8]. Gemini Nano arrived the same week on the Pixel 8 Pro via the December feature drop, making that phone the first to run a Gemini model on-device [9].
Gemini Ultra was held back at launch for additional trust and safety review and red-teaming [1]. It reached consumers on 8 February 2024, when Google retired the Bard name, rebranded the chatbot as Gemini, and introduced a premium tier called Gemini Advanced powered by Ultra 1.0 [7]. Access came through a new Google One AI Premium plan priced at roughly $20 per month, available in English in more than 150 countries and territories at launch [7].
Google trained Gemini 1.0 on its own Tensor Processing Units, using a mix of TPU v5e and TPU v4 accelerators across multiple data centers [4][14]. According to the technical report and contemporary reporting, the largest Ultra model relied on a large fleet of TPU v4 accelerators, with the training run distributed across data centers and arranged so that model parallelism was used inside pods of chips and data parallelism between them [4][14].
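The split described above, model parallelism inside a pod and data parallelism between pods, can be illustrated with a toy forward pass in pure Python. This is a minimal sketch, not a description of Google's training stack: the "pods" and "chips" are just loops, the helper names are hypothetical, and sizes are assumed to divide evenly. Each chip holds one column shard of the weight matrix, and each pod processes its own slice of the batch.

```python
# Toy illustration: data parallelism across "pods", model (tensor)
# parallelism across "chips" inside each pod. No accelerators involved.

def matmul(x, w):
    """Multiply a (rows x k) matrix by a (k x cols) matrix, as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, parts):
    """Split a weight matrix into `parts` contiguous column shards."""
    step = len(w[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def split_rows(x, parts):
    """Split a batch into `parts` contiguous row slices."""
    step = len(x) // parts
    return [x[i * step:(i + 1) * step] for i in range(parts)]


def hybrid_forward(batch, weights, num_pods, chips_per_pod):
    shards = split_columns(weights, chips_per_pod)   # one shard per chip
    outputs = []
    for pod_batch in split_rows(batch, num_pods):    # one batch slice per pod
        partials = [matmul(pod_batch, shard) for shard in shards]
        # Stitch each row back together from the chips' column blocks.
        for row_parts in zip(*partials):
            outputs.append([v for part in row_parts for v in part])
    return outputs


batch = [[1, 2], [3, 4], [5, 6], [7, 8]]       # 4 examples, 2 features
weights = [[1, 0, 2, 1], [0, 1, 1, 3]]         # 2 x 4 weight matrix
result = hybrid_forward(batch, weights, num_pods=2, chips_per_pod=2)
print(result == matmul(batch, weights))        # sharded result matches the unsharded one
```

The sketch covers only the forward pass. In real data-parallel training, the replicas would also need to combine their gradients after each step, which is where the network between pods comes into play.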
The report noted that operating at this scale introduced engineering challenges absent at smaller sizes, including a higher rate of hardware failures and even disruptions from cosmic-ray events affecting memory, which the team had to design around to keep training stable [4][14]. Google stated that the training compute exceeded that used for its earlier PaLM 2 model but did not publish exact parameter counts for the Ultra and Pro models [4].
Gemini 1.0 was quickly followed by a next-generation model. On 15 February 2024, only a week after Gemini Advanced launched, Google introduced Gemini 1.5 Pro, a mixture-of-experts model with a dramatically larger context window of up to one million tokens [15]. Later generations, including Gemini 2.0 and Gemini 3, continued the line. The general Gemini overview covers the full family and its evolution beyond this first release.