Imagen 2
Last reviewed
Jun 3, 2026
Sources
14 citations
Review status
Source-backed
Revision
v1 · 1,274 words
Imagen 2 is the second generation of Google's text-to-image diffusion model, developed by Google DeepMind and first announced for developers and enterprises on December 13, 2023. It generates photorealistic images from natural-language prompts and was positioned as Google's most advanced image-generation system at the time, with notable gains in image quality, prompt understanding, and the ability to render legible text and logos. Over the following months it became the engine behind image creation in Google's consumer products, including the Bard chatbot and the ImageFX tool in Google Labs, before being succeeded by Imagen 3 in 2024. [1][2]
The original Imagen was introduced by Google Research in a May 2022 paper titled "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding." Its central finding was that a large, frozen text encoder trained only on text, specifically the T5-XXL language model, was unexpectedly effective at conditioning an image generator, and that scaling the language model improved both fidelity and text-image alignment more than scaling the image model itself. The architecture used a 64x64 base diffusion model followed by two super-resolution diffusion stages that upscaled outputs to 1024x1024. On the COCO benchmark Imagen reported a then-state-of-the-art FID score of 7.27 without ever training on COCO, and human raters preferred it over contemporaries such as DALL-E 2, GLIDE, and Latent Diffusion in side-by-side tests on the authors' DrawBench benchmark. [7][8]
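The cascade described above can be pictured as simple resolution bookkeeping. The sketch below is illustrative only: the stage labels and the `cascade_resolutions` helper are hypothetical names, and the 256x256 intermediate size comes from the 2022 paper's 64 to 256 to 1024 pipeline, not from any released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str      # hypothetical label, not an identifier from Google's code
    out_size: int  # side length in pixels of the square output


# Base diffusion model followed by two super-resolution diffusion stages.
IMAGEN_CASCADE = (
    Stage("base", 64),
    Stage("super_res_1", 256),
    Stage("super_res_2", 1024),
)


def cascade_resolutions(stages):
    """Return (name, out_size, upscale_factor) for each stage, in order."""
    rows = []
    previous_size = None
    for stage in stages:
        # The base stage has nothing to upscale, so its factor is 1.
        factor = 1 if previous_size is None else stage.out_size // previous_size
        rows.append((stage.name, stage.out_size, factor))
        previous_size = stage.out_size
    return rows


for name, size, factor in cascade_resolutions(IMAGEN_CASCADE):
    print(f"{name}: {size}x{size} (x{factor})")
```

Each super-resolution stage quadruples the side length, so the two stages together raise the pixel count by a factor of 256 over the base output.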
The first Imagen was a research system rather than a public product. It was never released for general use, and Google cited concerns about harmful content and social bias as reasons for withholding it. Google also separately developed Parti, an autoregressive text-to-image model, around the same period. Imagen 2 turned this research lineage into a deployable product line.
Imagen 2 was trained on what Google described as higher-quality image and description pairings, which it credited for more detailed outputs and tighter alignment between a prompt and the resulting image. Google highlighted progress on failure modes that commonly plague diffusion models, including more realistic rendering of human hands and faces and fewer distracting visual artifacts. [2][4]
The headline new capabilities concerned typography and branding. Earlier text-to-image systems were notorious for producing garbled, gibberish text inside images. Imagen 2 could render cleaner, legible text and overlay it onto generated scenes, and it could produce logos such as emblems, lettermarks, and abstract marks that could then be placed on products, clothing, business cards, and similar surfaces. The model also supported multilingual text rendering at launch in Chinese, Hindi, Japanese, Korean, Portuguese, English, and Spanish, with more languages planned for 2024. On the Vertex AI platform it additionally offered visual question answering and long-form image captioning. [1][3][5]
| Aspect | Detail |
|---|---|
| Announced (Cloud/Vertex AI) | December 13, 2023; generally available December 14, 2023 [1][6] |
| Consumer rollout (Bard, ImageFX) | February 1, 2024 [2][9] |
| Text rendering languages | Chinese, Hindi, Japanese, Korean, Portuguese, English, Spanish [3][6] |
| Watermarking | SynthID invisible watermark [1][6] |
| Editing (added 2024) | Inpainting, outpainting [10][11] |
| Text-to-video "live images" (preview, 2024) | Four-second clips at 360x640 [10][12] |
Imagen 2 reached different audiences along two tracks. The first was enterprise. Google Cloud made Imagen 2 generally available on Vertex AI on December 14, 2023 for customers on an approved allowlist, marketing it as part of the "Imagen on Vertex AI" offering and extending Google's copyright indemnification commitment to cover it. Early enterprise users cited by Google included Snap, which used the technology in an AI Camera Mode for Snapchat+ subscribers, Shutterstock, which built an AI image generator on it, and Canva. [1][6]
The second track was consumer products. On February 1, 2024, Google added a free Imagen 2-powered image generator to Bard, available in English across most countries, and launched ImageFX, a standalone experimental tool in Google Labs and the AI Test Kitchen. ImageFX introduced "expressive chips," keyword suggestions that let users quickly vary parts of a prompt, and was initially limited to English-language users in the United States, Kenya, New Zealand, and Australia. Google also used Imagen 2 to power image generation within its Search Generative Experience. [2][9]
Every image Imagen 2 produced could carry a SynthID watermark, an approach developed by Google DeepMind to embed an identifier directly into the pixels of an image. SynthID had been announced earlier, on August 29, 2023, as a beta tool built with Google Cloud and was first offered to a limited set of Vertex AI customers using Imagen. The watermark is imperceptible to the human eye yet remains detectable after common manipulations such as added filters, color changes, cropping, and lossy compression. It relies on two jointly trained deep-learning models, one that adds the mark and one that detects it, and the detector can report whether a watermark is present, absent, or possibly present. On Vertex AI, the watermarking and verification service for Imagen 2 was offered to allowlisted customers as an experimental feature. [1][13]
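The three-way outcome of the detector can be illustrated with a small thresholding sketch. This is not SynthID's actual interface: the `classify_score` function, the assumption that the detector yields a confidence between 0 and 1, and the two threshold values are all hypothetical, chosen only to show how a single score can map to "present," "absent," or "possibly present."

```python
from enum import Enum


class WatermarkVerdict(Enum):
    PRESENT = "present"
    POSSIBLY_PRESENT = "possibly present"
    ABSENT = "absent"


def classify_score(score: float, low: float = 0.3, high: float = 0.7) -> WatermarkVerdict:
    """Map a detector confidence in [0, 1] to one of three verdicts.

    The thresholds are placeholders for illustration, not published values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score >= high:
        return WatermarkVerdict.PRESENT
    if score <= low:
        return WatermarkVerdict.ABSENT
    # Scores between the thresholds are too ambiguous to call either way.
    return WatermarkVerdict.POSSIBLY_PRESENT


for s in (0.9, 0.5, 0.1):
    print(s, classify_score(s).value)
```

Keeping a middle band avoids forcing a binary answer on images whose watermark signal has been weakened by heavy editing or compression.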
Through 2024, Google expanded what Imagen 2 could do on Vertex AI beyond single-shot image generation. Image editing moved to general availability and gained inpainting and outpainting. With inpainting, a user supplies a reference image and a mask to insert or remove content within the picture; with outpainting, the model extends an image beyond its original borders to widen the field of view. [10][11]
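The role the mask plays in both operations can be shown with plain nested lists standing in for images. This is a minimal sketch under stated assumptions, not the Vertex AI API: `composite_inpaint` and `outpaint_canvas` are hypothetical helpers, and a real editing model generates the new pixels conditioned on the mask instead of receiving them ready-made as `generated`.

```python
def composite_inpaint(original, generated, mask):
    """Keep original pixels where the mask is 0 and generated pixels where it is 1."""
    return [
        [g if m else o for o, g, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


def outpaint_canvas(original, pad, fill=0):
    """Pad an image by `pad` pixels on every side.

    Returns (canvas, mask), where the mask is 1 for the new border pixels
    that a model would need to fill and 0 for the preserved original.
    """
    width = len(original[0])
    new_width = width + 2 * pad
    canvas = [[fill] * new_width for _ in range(pad)]
    mask = [[1] * new_width for _ in range(pad)]
    for row in original:
        canvas.append([fill] * pad + list(row) + [fill] * pad)
        mask.append([1] * pad + [0] * width + [1] * pad)
    canvas += [[fill] * new_width for _ in range(pad)]
    mask += [[1] * new_width for _ in range(pad)]
    return canvas, mask


# Inpainting: only the masked pixel is replaced.
print(composite_inpaint([[1, 2], [3, 4]], [[9, 9], [9, 9]], [[0, 1], [0, 0]]))

# Outpainting: a 1x1 image grows to 3x3, and the mask marks the new border.
canvas, mask = outpaint_canvas([[5]], pad=1)
print(canvas)
print(mask)
```

Viewed this way, outpainting is inpainting on an enlarged canvas whose mask covers everything outside the original frame.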
At the Google Cloud Next conference in April 2024, Google previewed a text-to-video capability sometimes branded "live images." Imagen 2 could generate short, four-second video clips at a resolution of 360 by 640 pixels from a text prompt. The feature was aimed at marketers and creatives and positioned against tools from Runway and Pika. These live images also carried SynthID watermarks, and Google noted that the feature was in preview and excluded from its generative AI indemnification policy. [10][12]
Google introduced Imagen 3 at its I/O developer conference in May 2024, describing it as a higher-quality successor with better detail, richer lighting, fewer artifacts, and improved text rendering. DeepMind chief executive Demis Hassabis characterized Imagen 3 as more accurate at following prompts and more creative than Imagen 2. Imagen 3 entered private preview around that announcement and was opened more broadly to United States users later in 2024. Like its predecessor, it applied SynthID watermarks to its outputs. The series continued with Imagen 4, which added faster generation, higher resolution, and further gains in text and detail rendering. In retrospect, Imagen 2 served as the transitional release that brought Google's image-generation research into widely used products. [14]