Imagen 2

Diffusion Models Google DeepMind Image Generation

Updated Jun 3, 2026


Imagen 2 is the second generation of Google's text-to-image diffusion model, developed by Google DeepMind and first announced for developers and enterprises on December 13, 2023. It generates photorealistic images from natural-language prompts and was positioned as Google's most advanced image-generation system at the time, with notable gains in image quality, prompt understanding, and the ability to render legible text and logos. Over the following months it became the engine behind image creation in Google's consumer products, including the Bard chatbot and the ImageFX tool in Google Labs, before being succeeded by Imagen 3 in 2024. ^[1]^[2]

Background (Imagen)

The original Imagen was introduced by Google Research in a May 2022 paper titled "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding." Its central finding was that a large, frozen text encoder trained only on text, specifically the T5-XXL language model, was unexpectedly effective at conditioning an image generator, and that scaling the language model improved both fidelity and text-image alignment more than scaling the image model itself. The architecture used a 64x64 base diffusion model followed by two super-resolution diffusion stages that upscaled outputs to 1024x1024. On the COCO benchmark Imagen reported a then-state-of-the-art FID score of 7.27 without ever training on COCO, and human raters preferred it over contemporaries such as DALL-E 2, GLIDE, and Latent Diffusion in side-by-side tests on the authors' DrawBench benchmark. ^[7]^[8]
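
The cascade described above can be made concrete with a little arithmetic. The following is a minimal sketch, not code from the Imagen paper: the helper name `cascade_resolutions` is hypothetical, and it assumes every super-resolution stage applies the same integer upscale factor.

```python
def cascade_resolutions(base: int, final: int, num_sr_stages: int) -> list[int]:
    """Side length (in pixels) after the base model and after each
    super-resolution stage.

    Assumption: every stage uses the same integer upscale factor.
    """
    if base <= 0 or num_sr_stages <= 0 or final % base != 0:
        raise ValueError("expected positive sizes and final divisible by base")
    ratio = final // base
    # Equal per-stage factor: the num_sr_stages-th root of the total ratio.
    factor = round(ratio ** (1 / num_sr_stages))
    if base * factor ** num_sr_stages != final:
        raise ValueError("final size is not reachable with equal integer factors")
    return [base * factor ** i for i in range(num_sr_stages + 1)]


if __name__ == "__main__":
    sizes = cascade_resolutions(base=64, final=1024, num_sr_stages=2)
    print(sizes)
    # Pixel count grows with the square of the side length.
    print((sizes[-1] ** 2) // (sizes[0] ** 2))
```

Under that assumption each stage upscales by 4x, so the base model works on only a small fraction of the pixels in the final image, and the later stages add detail rather than composing the scene.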

The first Imagen was a research system rather than a public product. It was never released for general use, and Google cited concerns about harmful content and social bias as reasons for withholding it. Google also separately developed Parti, an autoregressive text-to-image model, around the same period. Imagen 2 turned this research lineage into a deployable product line.

Improvements and features

Imagen 2 was trained on what Google described as higher-quality image and description pairings, which it credited for more detailed outputs and tighter alignment between a prompt and the resulting image. Google highlighted progress on failure modes that commonly plague diffusion models, including more realistic rendering of human hands and faces and fewer distracting visual artifacts. ^[2]^[4]

The headline new capabilities concerned typography and branding. Earlier text-to-image systems were notorious for producing garbled, gibberish text inside images. Imagen 2 could render cleaner, legible text and overlay it onto generated scenes, and it could produce logos such as emblems, lettermarks, and abstract marks that could then be placed on products, clothing, business cards, and similar surfaces. The model also supported multilingual text rendering at launch in Chinese, Hindi, Japanese, Korean, Portuguese, English, and Spanish, with more languages planned for 2024. On the Vertex AI platform it additionally offered visual question answering and long-form image captioning. ^[1]^[3]^[5]

Aspect	Detail
Announced (Cloud/Vertex AI)	December 13, 2023; generally available December 14, 2023 ^[1]^[6]
Consumer rollout (Bard, ImageFX)	February 1, 2024 ^[2]^[9]
Text rendering languages	Chinese, Hindi, Japanese, Korean, Portuguese, English, Spanish ^[3]^[6]
Watermarking	SynthID invisible watermark ^[1]^[6]
Editing (added 2024)	Inpainting, outpainting ^[10]^[11]
Text-to-video "live images" (previewed 2024)	Four-second clips at 360x640 ^[10]^[12]

Availability (Bard, ImageFX, Vertex AI)

Imagen 2 reached different audiences along two tracks. The first was enterprise. Google Cloud made Imagen 2 generally available on Vertex AI on December 14, 2023 for customers on an approved allowlist, marketing it as part of the "Imagen on Vertex AI" offering and extending Google's copyright indemnification commitment to cover it. Early enterprise users cited by Google included Snap, which used the technology in an AI Camera Mode for Snapchat+ subscribers, Shutterstock, which built an AI image generator on it, and Canva. ^[1]^[6]

The second track was consumer products. On February 1, 2024, Google added a free Imagen 2-powered image generator to Bard, available in English across most countries, and launched ImageFX, a standalone experimental tool in Google Labs and the AI Test Kitchen. ImageFX introduced "expressive chips," keyword suggestions that let users quickly vary parts of a prompt, and was initially limited to English-language users in the United States, Kenya, New Zealand, and Australia. Google also used Imagen 2 to power image generation within its Search Generative Experience. ^[2]^[9]

SynthID watermarking

Every image Imagen 2 produced could carry a SynthID watermark, an approach developed by Google DeepMind to embed an identifier directly into the pixels of an image. SynthID had been announced earlier, on August 29, 2023, as a beta tool built with Google Cloud and was first offered to a limited set of Vertex AI customers using Imagen. The watermark is imperceptible to the human eye yet remains detectable after common manipulations such as added filters, color changes, cropping, and lossy compression. It relies on two jointly trained deep-learning models, one that adds the mark and one that detects it, and the detector can report whether a watermark is present, absent, or possibly present. On Vertex AI, the watermarking and verification service for Imagen 2 was offered to allowlisted customers as an experimental feature. ^[1]^[13]
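
The three-way detector result can be pictured as a pair of thresholds on a confidence score. The sketch below is purely illustrative: SynthID's internal scoring is not described in the sources for this article, so the score range, the threshold values, and the names `WatermarkVerdict` and `classify_score` are all assumptions.

```python
from enum import Enum


class WatermarkVerdict(Enum):
    PRESENT = "present"
    POSSIBLY_PRESENT = "possibly present"
    ABSENT = "absent"


def classify_score(score: float, low: float = 0.3, high: float = 0.7) -> WatermarkVerdict:
    """Map a hypothetical detector confidence in [0, 1] to one of three verdicts.

    The thresholds are placeholders, not values used by SynthID.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if not low < high:
        raise ValueError("low threshold must be below high threshold")
    if score >= high:
        return WatermarkVerdict.PRESENT
    if score <= low:
        return WatermarkVerdict.ABSENT
    # Anything between the two thresholds is reported as uncertain.
    return WatermarkVerdict.POSSIBLY_PRESENT


if __name__ == "__main__":
    for s in (0.95, 0.5, 0.05):
        print(s, classify_score(s).value)
```

A middle band of this kind lets a detector avoid a hard yes-or-no answer for images that have been heavily edited or compressed.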

Later updates

Through 2024, Google expanded what Imagen 2 could do on Vertex AI beyond single-shot image generation. Image editing moved to general availability and gained inpainting and outpainting. With inpainting, a user supplies a reference image and a mask to insert or remove content within the picture; with outpainting, the model extends an image beyond its original borders to widen the field of view. ^[10]^[11]
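
The relationship between the two editing modes can be illustrated with plain mask arithmetic: outpainting behaves like inpainting on an enlarged canvas whose new border is the masked region. This is a toy sketch on nested lists, not the Vertex AI API. The function names and the convention that a mask value of 1 means "generate here" are assumptions, and the "generated" pixels are a constant stand-in for model output.

```python
Grid = list[list[int]]


def outpaint_canvas(image: Grid, pad: int, fill: int = 0) -> tuple[Grid, Grid]:
    """Pad `image` on all sides and return (canvas, mask).

    mask is 1 where new content should be generated, 0 where the
    original pixels must be preserved (assumed convention).
    """
    h, w = len(image), len(image[0])
    new_h, new_w = h + 2 * pad, w + 2 * pad
    canvas = [[fill] * new_w for _ in range(new_h)]
    mask = [[1] * new_w for _ in range(new_h)]
    for y in range(h):
        for x in range(w):
            canvas[y + pad][x + pad] = image[y][x]
            mask[y + pad][x + pad] = 0
    return canvas, mask


def merge_generated(original: Grid, generated: Grid, mask: Grid) -> Grid:
    """Inpainting-style merge: take generated pixels only where mask is 1."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


if __name__ == "__main__":
    image = [[5, 6], [7, 8]]
    canvas, mask = outpaint_canvas(image, pad=1)
    generated = [[9] * len(canvas[0]) for _ in canvas]  # stand-in for model output
    for row in merge_generated(canvas, generated, mask):
        print(row)
```

In this picture, inpainting corresponds to a mask drawn inside the original image, while outpainting corresponds to a mask covering only the padded border.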

At the Google Cloud Next conference in April 2024, Google previewed a text-to-video capability sometimes branded "live images." Imagen 2 could generate short, four-second video clips at a resolution of 360 by 640 pixels from a text prompt. The feature was aimed at marketers and creatives and positioned against tools from Runway and Pika. These live images also carried SynthID watermarks, and Google noted that the feature was in preview and excluded from its generative AI indemnification policy. ^[10]^[12]

Succession by Imagen 3

Google introduced Imagen 3 at its I/O developer conference in May 2024, describing it as a higher-quality successor with better detail, richer lighting, fewer artifacts, and improved text rendering. DeepMind chief executive Demis Hassabis characterized it as more accurate at following prompts and more creative than Imagen 2. Imagen 3 entered private preview around that announcement and was opened more broadly to United States users later in 2024. Like its predecessor, it applied SynthID watermarks to its outputs. The series continued with Imagen 4, which added faster generation, higher resolution, and further gains in text and detail rendering. In retrospect, Imagen 2 was the transitional release that brought Google's image-generation research into widely used products. ^[14]

References


Related Articles

Imagen (text-to-image model)

Stable Diffusion

DALL-E

Midjourney

Runwayml/stable-diffusion-v1-5 model

Flux (text-to-image model)
