Gemini 1.0

Google DeepMind Large Language Models Multimodal AI

9 min read

Updated Jun 28, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 28, 2026

Fact-checked

In review queue

Sources

15 citations

Revision

v2 · 1,755 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Gemini 1.0 is the first generation of Gemini, the family of natively multimodal AI models that Google DeepMind announced on 6 December 2023 ^[1]^[2]. It shipped in three sizes built for different deployment targets: Gemini Ultra for the most complex tasks, Gemini Pro for general use, and Gemini Nano for running directly on devices ^[1]^[3]. Google reported that Gemini Ultra scored 90.0% on the MMLU knowledge-and-reasoning benchmark, which the company said made it the first model to outperform human experts on that test ^[1]^[11]. Presented by Google chief executive Sundar Pichai and DeepMind chief executive Demis Hassabis at a virtual briefing, the launch drew wide attention for those benchmark figures and also a controversy over a demonstration video that was later shown to be heavily edited ^[5]^[6].

What was the context for Gemini 1.0?

Gemini 1.0 was the first model family released after Google merged its Google Brain and DeepMind research groups into a single unit, Google DeepMind, earlier in 2023 ^[1]^[2]. Google framed the launch as the start of what it called the "Gemini era" and the first realization of the combined lab's plans, positioning the family as a successor to its earlier PaLM 2 and LaMDA models ^[2]. The work drew on teams across Google Research in addition to the merged labs ^[2].

The project name nods to two histories at once: the Brain and DeepMind merger that produced the team, and NASA's two-seat Project Gemini spaceflight program ^[2]. Google released the models alongside a technical report, "Gemini: A Family of Highly Capable Multimodal Models," posted to arXiv ^[4].

What were the three Gemini 1.0 model sizes?

Gemini 1.0 was offered in three model sizes so a single family could span data-center workloads down to phones. Ultra and Pro were cloud models, while Nano was distilled into two compact variants meant to run on hardware with limited memory ^[1]^[4]. In Google's words, the company "optimized Gemini 1.0, our first version, for three different sizes: Gemini Ultra, our largest and most capable model for highly complex tasks; Gemini Pro, our best model for scaling across a wide range of tasks; Gemini Nano, our most efficient model for on-device tasks" ^[1].

Model	Positioning	Notes
Gemini Ultra	Largest and most capable, for highly complex tasks	Held back past launch for safety testing; reached consumers in February 2024 ^[1]^[7]
Gemini Pro	Best model for scaling across a wide range of tasks	Powered Bard from launch day and shipped to developers on 13 December 2023 ^[1]^[8]
Gemini Nano	Most efficient model, for on-device tasks	Distilled from larger models; debuted on the Pixel 8 Pro ^[1]^[9]

Nano itself came in two versions. Per the technical report, Nano-1 has 1.8 billion parameters and Nano-2 has 3.25 billion parameters, targeting low-memory and higher-memory devices respectively ^[4]^[10]. Both were produced by distilling knowledge from the larger Gemini models and were quantized to 4-bit for deployment ^[4]^[10].

What made Gemini 1.0 natively multimodal?

The central design claim for Gemini 1.0 was native multimodality. Where many earlier systems bolted a vision or audio component onto a text model, Google said Gemini was built from the ground up to be multimodal and pre-trained simultaneously on different modalities, then fine-tuned with additional multimodal data ^[1]^[4]. In Google's own framing, Gemini "was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video" ^[1]. In practice the cloud models could take in and reason over interleaved inputs of text, images, audio, and video, and produce text and image outputs depending on the configuration ^[1]^[4].

Google promoted this design as enabling capabilities that were difficult for text-first models, such as reasoning across a problem that mixes a diagram with a written question, or parsing spoken language directly rather than transcribing it first ^[1]^[4]. The on-device Nano models were aimed at narrower tasks: on the Pixel 8 Pro they powered a Summarize feature in the Recorder app and, as a developer preview, Smart Reply suggestions in Gboard, both able to work without a network connection ^[9].

How did Gemini 1.0 perform on benchmarks?

The headline result Google reported was on MMLU (Massive Multitask Language Understanding), a test spanning 57 subjects including math, physics, history, law, medicine, and ethics. Gemini Ultra scored 90.0% and was described as the first model to outperform human experts on the benchmark, whose human-expert baseline the benchmark's authors put at 89.8% ^[1]^[3]^[11]. According to the technical report, Ultra reached that figure using an uncertainty-routed chain-of-thought method with 32 samples (CoT@32); under plain greedy decoding it scored 84.0% on the same test ^[4]^[11]. Google also said Ultra exceeded the prior state of the art on 30 of the 32 widely used academic benchmarks it tested ^[1]^[2].

Benchmark	Domain	Gemini Ultra
MMLU	General knowledge and reasoning (57 subjects)	90.0% ^[1]^[11]
GSM8K	Grade-school math word problems	94.4% ^[12]
HumanEval	Python code generation (pass@1)	74.4% ^[12]
MMMU	Multimodal reasoning across domains	59.4% ^[12]

These figures came from Google's own technical report and announcement. Independent commentators noted at the time that comparisons against OpenAI's GPT-4 depended heavily on prompting method; the much-cited MMLU number, for example, used a particular chain-of-thought prompting setup, and results varied under other conditions ^[11]^[13]. The MMLU comparison Google highlighted placed GPT-4 at 86.4% ^[3]^[11].

Why was the Gemini 1.0 demo video controversial?

Alongside the launch, Google posted a roughly six-minute video titled "Hands-on with Gemini: Interacting with multimodal AI." It appeared to show a person speaking to Gemini in real time while drawing pictures and handling objects, with the model responding conversationally and seemingly watching the world as it changed ^[5]^[6].

Within days, reporting established that the video was not a real-time demonstration. Google confirmed to Bloomberg and others that the sequences were assembled from still image frames and text prompts rather than live video and voice, and that the spoken dialogue heard in the clip had been added afterward ^[5]^[6]. The YouTube description carried a note that, for the demo, latency had been reduced and Gemini's outputs shortened for brevity, and Google published a separate post, "How it's made: Interacting with Gemini through multimodal prompting," walking through the actual prompting ^[6].

Google's position was that the underlying interactions were genuine. The company said the video was "an illustrative depiction of the possibilities of interacting with Gemini, based on real multimodal prompts and outputs from testing" ^[5]^[6]. Critics countered that the editing implied a fluid, real-time conversational ability that the model did not actually demonstrate as shown, and that this was materially different from what the presentation suggested ^[5]^[6]. The episode became a recurring example in discussion of how AI capabilities are marketed.

When was Gemini 1.0 available?

Gemini Pro reached users first. From the 6 December 2023 announcement, a fine-tuned version of Gemini Pro powered Google's Bard chatbot in English across a large set of countries, and on 13 December Google opened the Gemini Pro API to developers through Google AI Studio and Vertex AI ^[1]^[8]. Gemini Nano arrived the same week on the Pixel 8 Pro via the December feature drop, making that phone the first to run a Gemini model on-device ^[9].

Gemini Ultra was held back at launch for additional trust and safety review and red-teaming ^[1]. It reached consumers on 8 February 2024, when Google retired the Bard name, rebranded the chatbot as Gemini, and introduced a premium tier called Gemini Advanced powered by Ultra 1.0 ^[7]. Access came through a new Google One AI Premium plan priced at $19.99 per month, available in English in more than 150 countries and territories at launch ^[7].

How was Gemini 1.0 trained?

Google trained Gemini 1.0 on its own Tensor Processing Units, using a mix of TPU v5e and TPU v4 accelerators across multiple data centers ^[4]^[14]. According to the technical report and contemporary reporting, the largest Ultra model relied on a large fleet of TPU v4 accelerators, with the training run distributed across data centers and arranged so that model parallelism was used inside pods of chips and data parallelism between them ^[4]^[14].

The report noted that operating at this scale introduced engineering challenges absent at smaller sizes, including a higher rate of hardware failures and even disruptions from cosmic-ray events affecting memory, which the team had to design around to keep training stable ^[4]^[14]. Google stated that the training compute exceeded that used for its earlier PaLM 2 model but did not publish exact parameter counts for the Ultra and Pro models ^[4].

What came after Gemini 1.0?

Gemini 1.0 was quickly followed by a next-generation model. On 15 February 2024, only a week after Gemini Advanced launched, Google introduced Gemini 1.5 Pro, a mixture-of-experts model with a dramatically larger context window of up to one million tokens ^[15]. Later generations, including Gemini 2.0 and Gemini 3, continued the line. The general Gemini overview covers the full family and its evolution beyond this first release.

References

Introducing Gemini: our largest and most capable AI model - The Keyword, Google ↩
Gemini (language model)) - Wikipedia ↩
Google announces Gemini 1.0 with Nano, Pro, and Ultra sizes, launching today in Bard - 9to5Google ↩
Gemini: A Family of Highly Capable Multimodal Models - Gemini Team, Google (arXiv) ↩
Google faces controversy over edited Gemini AI demo video - CNBC ↩
Google admits the Gemini AI demo hands-on video was staged - Tom's Hardware ↩
Google launches Gemini Ultra, its most powerful LLM yet, and a $20 paid tier for Gemini Advanced - TechCrunch ↩
Google brings Gemini Pro to Vertex AI - TechCrunch ↩
Pixel 8 Pro, the first smartphone with AI built in, is now running Gemini Nano - The Keyword, Google ↩
Brief Review: Gemini, A Family of Highly Capable Multimodal Models - Sik-Ho Tsang, Medium ↩
Google Gemini: Fact or Fiction? - Cameron R. Wolfe, Deep Learning Focus ↩
Gemini AI Model Parameters and Performance Benchmarks - ML Journey ↩
Google DeepMind Gemini Is the World's Best AI - NextBigFuture ↩
Training Google's Gemini: TPUs, multiple data centers, and risks of cosmic rays - Data Center Dynamics ↩
Introducing Gemini 1.5, Google's next-generation model - The Keyword, Google ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

1 revision by 1 contributors · full history

Suggest edit

What links here

AI Model Release Timeline (2022-2026)AlphaCode 2 Gemini Ultra PaLM 2

What was the context for Gemini 1.0?

What were the three Gemini 1.0 model sizes?

What made Gemini 1.0 natively multimodal?

How did Gemini 1.0 perform on benchmarks?

Why was the Gemini 1.0 demo video controversial?

When was Gemini 1.0 available?

How was Gemini 1.0 trained?

What came after Gemini 1.0?

References

Improve this article

Related Articles

Gemini 3

Gemma 3

Gemini 2.5 Flash

Gemini 3.5 Flash

Gemini 3.1 Pro

Gemini 1.5 Pro

What links here

Related Articles

Gemini 3

Gemma 3

Gemini 2.5 Flash

Gemini 3.5 Flash

Gemini 3.1 Pro

Gemini 1.5 Pro

What links here