Gemini Ultra
Last reviewed
Jun 3, 2026
Sources
16 citations
Review status
Source-backed
Revision
v1 · 1,587 words
Gemini Ultra is the largest and most capable model in the Gemini 1.0 family, the first generation of multimodal large language models released by Google DeepMind. It was announced on December 6, 2023, as the flagship tier of the three-model Gemini lineup that also included Gemini Pro and the on-device Gemini Nano. Positioned by Google as a direct rival to OpenAI's GPT-4, Ultra was the model Google used to make the headline claim that Gemini could outperform a human-expert baseline on a widely used knowledge benchmark. Unlike the smaller tiers, which shipped almost immediately, Ultra did not reach the public until February 8, 2024, when it became the engine behind the paid Gemini Advanced product.[1][2][3]
Google introduced the Gemini family at a virtual press briefing led by chief executive Sundar Pichai and DeepMind chief executive Demis Hassabis. The launch was the company's most prominent answer to ChatGPT, which had triggered what Pichai called a "code red" inside Google a year earlier. Gemini 1.0 was offered in three sizes tuned for different jobs: Ultra for the most demanding tasks, Pro as a general-purpose model, and Nano for running on devices such as the Pixel 8 Pro. Google described all three as "natively multimodal," meaning they were pre-trained from the start on a mixture of text, code, images, audio, and video rather than having separate modalities bolted on afterward.[1][2][4]
At announcement, Pro was already powering Google's Bard chatbot and was being rolled out to developers. Ultra, by contrast, was held back. Google said it was still undergoing "trust and safety checks, including red-teaming by trusted external parties," and would arrive in a forthcoming product first referred to as "Bard Advanced."[1][2] That gap between announcement and release became one of the first points of friction in the model's reception, because reporters were not given hands-on access or live demonstrations at the briefing.[5]
Google presented Ultra as its state-of-the-art model across text, image, audio, and video understanding, with particular strength in tasks that require multi-step reasoning, coding, and the interpretation of complex visual or scientific material. Because the model was trained natively across modalities, Google emphasized examples in which it reasoned jointly over images and text, extracted information from charts and documents, and worked through problems in mathematics and physics.[1][2] On code generation, the technical report credited Ultra with strong results, including a 74.4% score on the HumanEval Python benchmark, and Google framed coding as one of Ultra's standout abilities.[2][6]
The model was the basis for Gemini Advanced, where it was branded "Ultra 1.0." Google said this configuration was far more capable than Pro at highly complex tasks such as following nuanced instructions, collaborative coding, and detailed reasoning, and that it would gain features over time, including the ability to handle longer interactions and larger amounts of context.[3][7]
Google's launch materials leaned heavily on benchmark comparisons against GPT-4, reporting that Ultra exceeded the prior state of the art on 30 of 32 widely used academic benchmarks.[2] The most-publicized figure was a 90.0% score on Massive Multitask Language Understanding (MMLU), a 57-subject test of knowledge and problem-solving. Google called Ultra "the first model to outperform human experts" on MMLU, citing a human-expert baseline of 89.8%.[2][8]
That claim carried an important caveat that drew contemporary criticism. The 90.0% result (reported as 90.04% in the technical report) was obtained using a method Google called uncertainty-routed chain-of-thought with 32 samples, abbreviated CoT@32. In this method the model generates many reasoning chains, takes the majority answer when it is confident above a threshold, and otherwise falls back to its greedy answer. Under the same CoT@32 method, GPT-4 scored 87.29%. But when both models were compared with the more conventional 5-shot prompting setup, Ultra scored 83.7% and GPT-4 scored 86.4%, so GPT-4 came out ahead under standard prompting.[6][9] Critics, including commentators on Hacker News and academics quoted in trade coverage, argued that setting Ultra's best-case prompting result against GPT-4's standard one made the headline table misleading.[9][10] Reviewers also noted that across many benchmarks the margins were narrow, and that on at least one commonsense-reasoning test (HellaSwag) Ultra trailed GPT-4 by a wide margin.[5]
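The answer-selection step behind CoT@32 can be sketched in a few lines. This is a minimal illustration, not Google's code: it assumes the procedure the technical report describes (sample 32 reasoning chains, keep the majority answer if confidence clears a threshold, otherwise defer to a greedy answer), and it further assumes that "confidence" means the share of samples agreeing with the majority. The function name `uncertainty_routed_answer` and the threshold values used below are hypothetical.

```python
from collections import Counter
from typing import Sequence


def uncertainty_routed_answer(
    sampled_answers: Sequence[str],
    greedy_answer: str,
    threshold: float,
) -> str:
    """Hypothetical sketch of uncertainty-routed chain-of-thought selection.

    sampled_answers: final answers extracted from k sampled reasoning chains
        (k = 32 for CoT@32).
    greedy_answer: the answer from a single greedy decode.
    threshold: minimum share of samples that must agree (assumed to be tuned
        on a validation split).
    """
    if not sampled_answers:
        return greedy_answer
    # Majority vote over the sampled chains.
    majority_answer, votes = Counter(sampled_answers).most_common(1)[0]
    # Assumption: confidence is the fraction of samples agreeing with the majority.
    agreement = votes / len(sampled_answers)
    if agreement >= threshold:
        return majority_answer  # confident: trust the vote
    return greedy_answer  # uncertain: defer to the greedy answer


# 32 hypothetical sampled answers: 20 say "B", 12 say "C".
samples = ["B"] * 20 + ["C"] * 12
print(uncertainty_routed_answer(samples, greedy_answer="A", threshold=0.6))  # B
print(uncertainty_routed_answer(samples, greedy_answer="A", threshold=0.7))  # A
```

With 20 of 32 samples agreeing (62.5%), a threshold of 0.6 returns the majority answer, while a threshold of 0.7 routes to the greedy answer instead.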
The table below summarizes figures from Google's December 2023 technical report. Where two methods are shown for MMLU, both models are compared under each method.
| Benchmark | Gemini Ultra | GPT-4 | Notes |
|---|---|---|---|
| MMLU (CoT@32, uncertainty-routed) | 90.04% | 87.29% | Basis for the "first to beat human experts" claim; human-expert baseline 89.8% |
| MMLU (5-shot) | 83.7% | 86.4% | Standard prompting; GPT-4 scores higher |
| MMMU (multimodal) | 59.4% | 56.8% | GPT-4 figure is GPT-4V; majority voting over 32 samples (Maj1@32) lifts Ultra to 62.4% |
| GSM8K (grade-school math) | 94.4% | 92.0% | Math word problems |
| MATH (competition math) | 53.2% | 52.9% | |
| HumanEval (code) | 74.4% | 67.0% | Python code generation |
Sources: Gemini technical report and Google's announcement.[2][6][8]
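The margins in the table are simple differences in percentage points. As a quick check, the snippet below copies the scores from the table and subtracts GPT-4's score from Ultra's for each row, so positive values favor Ultra. The dictionary layout and the `point_gaps` name are illustrative, and a raw subtraction ignores the differing prompting setups behind some rows, which is the same issue raised about MMLU.

```python
# Scores in percent, copied from the table above: (Gemini Ultra, GPT-4).
SCORES = {
    "MMLU (CoT@32)": (90.04, 87.29),
    "MMLU (5-shot)": (83.7, 86.4),
    "MMMU": (59.4, 56.8),
    "GSM8K": (94.4, 92.0),
    "MATH": (53.2, 52.9),
    "HumanEval": (74.4, 67.0),
}


def point_gaps(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return Ultra minus GPT-4 for each benchmark, in percentage points."""
    # Round to two decimals to hide floating-point noise from the subtraction.
    return {name: round(ultra - gpt4, 2) for name, (ultra, gpt4) in scores.items()}


for name, gap in point_gaps(SCORES).items():
    print(f"{name:15} {gap:+.2f} pp")
```

Only the 5-shot MMLU row comes out negative (about -2.7 points). HumanEval shows the widest gap in Ultra's favor (about 7.4 points), while MATH differs by just 0.3.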
Ultra reached the public on February 8, 2024. On the same day, Google renamed Bard to Gemini across its services and launched Gemini Advanced, "a new experience that gives you access to Ultra 1.0." Access was sold through a new Google One AI Premium plan priced at $19.99 per month in the United States, which also bundled 2 TB of storage and the rest of Google One's features, and which opened with a two-month free trial.[3][7][11] The price effectively matched the $20 monthly fee for OpenAI's ChatGPT Plus subscription.[12]
Gemini Advanced launched in more than 150 countries and territories, initially in English only, with Japanese and Korean named as the next languages on the roadmap. Google also introduced a dedicated Gemini app for Android, which could replace Google Assistant, and brought Gemini to the Google app on iOS in the following weeks.[3][7][11] At the 2025 Google I/O conference, the Google One AI Premium plan and the Gemini Advanced brand were folded into a new tier called Google AI Pro, while the $19.99 monthly price point in the US remained.[13]
Press coverage of Ultra was attentive but skeptical. TechCrunch wrote that Ultra "scores only marginally better than GPT-4" across many benchmarks and pointed to the narrow gaps: 94.4% versus 92.0% on math, 82.4% versus 80.9% on one reading-comprehension test, and just 0.6 percentage points on an image-understanding task, while noting Ultra fell "a fair bit behind" GPT-4 on commonsense reasoning.[5] The MMLU methodology mismatch became the central technical complaint, with observers stressing that the dramatic 90% figure depended on the multi-sample CoT@32 setup and was not a like-for-like comparison with GPT-4's standard 5-shot score.[9][10]
Once Gemini Advanced was in users' hands, the reaction was mixed. A number of testers reported that the product did not feel decisively better than GPT-4 in everyday use, and some pointed to overcautious refusals on sensitive topics.[12][14] The mismatch between benchmark claims and lived experience became a recurring theme in commentary about the model.
No successor model named Ultra was ever released. When Google moved to its next generation, Gemini 1.5, it shipped Gemini 1.5 Pro and a smaller Gemini 1.5 Flash but never a 1.5 Ultra, and the company did not commit to whether an Ultra-class 1.5 model would arrive. The same pattern held through the 2.x generations, which centered on Pro and Flash variants alongside dedicated reasoning configurations. In effect, the Ultra tier as a distinct model line ended with version 1.0, with its role absorbed by improved Pro models and, later, by reasoning-focused systems.[15][16]
The "Ultra" name did resurface in 2025, but as a subscription tier rather than a model: Google AI Ultra, a high-end plan introduced at I/O 2025, bundled access to Google's most advanced models and features at a far higher price than the original Gemini Advanced subscription. The underlying models in that plan were later Gemini versions, not a revived Gemini Ultra model.[13][16] The original Gemini 1.0 models, including Ultra, were eventually retired as Google's API shifted to newer generations.[16]