DALL-E 2

Updated Jun 22, 2026

DALL-E 2 (stylized DALL·E 2) is a text-to-image generation system that OpenAI announced on April 6, 2022. It produces photorealistic 1024 by 1024 pixel images from a written prompt, four times the linear resolution of the original DALL·E.^[1]^[3]^[5] It is built on a method OpenAI's researchers called "unCLIP," a two-stage pipeline that pairs the CLIP contrastive image-text model with a diffusion model decoder.^[1]^[2]^[3] DALL-E 2 succeeded the original DALL·E (January 2021) and added editing tools such as inpainting, outpainting, and image variations. It opened to the public on September 28, 2022. OpenAI deprecated it in November 2025 and scheduled its removal from the API for May 12, 2026, after it had been superseded by DALL·E 3 and gpt-image-1.^[4]

What came before DALL-E 2?

The original DALL·E, announced on January 5, 2021, was a 12-billion-parameter version of the GPT-3 language model that generated 256 by 256 pixel images autoregressively from text, using a discrete variational autoencoder to tokenize images and a CLIP model to rank candidate outputs.^[5] In the months that followed, OpenAI shifted its image-generation research toward diffusion models. In December 2021 the company published GLIDE ("Guided Language to Image Diffusion for Generation and Editing"), a 3.5-billion-parameter text-conditional diffusion model whose samples human evaluators preferred over those of the original DALL·E. GLIDE also demonstrated text-driven inpainting.^[6] DALL-E 2 built directly on this work, reusing the GLIDE decoder architecture while changing how the model is conditioned.^[2]

Despite the shared "DALL·E" branding, DALL-E 2 is architecturally distinct from DALL·E 1. The original used an autoregressive transformer over discrete image tokens, whereas DALL-E 2 uses a diffusion-based pipeline conditioned on CLIP embeddings.^[2]^[5]

How does DALL-E 2 work?

DALL-E 2 is described in the paper "Hierarchical Text-Conditional Image Generation with CLIP Latents," submitted to arXiv on April 13, 2022 by Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen.^[1] The paper names the approach "unCLIP" because it effectively inverts the CLIP image encoder: rather than mapping an image to an embedding, the system starts from a CLIP embedding and generates a corresponding image.^[1]^[3]

The pipeline is a two-stage model that operates on CLIP's joint text-and-image embedding space:^[1]^[2]

Stage	Component	Function
CLIP encoding	Frozen CLIP text encoder	Converts the text prompt into a CLIP text embedding
Stage 1	Prior	Maps the CLIP text embedding to a CLIP image embedding
Stage 2	Decoder	Generates an image conditioned on the CLIP image embedding
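
The data flow in the table can be sketched as three chained functions. This is a toy illustration only: the function names, the 8-dimensional embedding, and the random stand-ins for each model are assumptions made for the example, not OpenAI's implementation. It shows only that text goes in, a text embedding is mapped to an image embedding, and an image comes out.

```python
import hashlib
import random
from typing import List

EMBED_DIM = 8  # toy size (assumption); real CLIP embeddings are far larger


def clip_text_embed(prompt: str) -> List[float]:
    """Stand-in for the frozen CLIP text encoder: prompt -> text embedding."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


def prior(text_embedding: List[float], rng: random.Random) -> List[float]:
    """Stand-in for stage 1: text embedding -> image embedding."""
    return [x + rng.gauss(0.0, 0.1) for x in text_embedding]


def decoder(image_embedding: List[float], rng: random.Random,
            size: int = 4) -> List[List[float]]:
    """Stand-in for stage 2: image embedding -> size x size grid of values."""
    bias = sum(image_embedding) / len(image_embedding)
    return [[bias + rng.gauss(0.0, 1.0) for _ in range(size)]
            for _ in range(size)]


def generate(prompt: str, seed: int = 0) -> List[List[float]]:
    """Chain the three steps in the order the table lists them."""
    rng = random.Random(seed)
    z_text = clip_text_embed(prompt)   # CLIP encoding
    z_image = prior(z_text, rng)       # Stage 1
    return decoder(z_image, rng)       # Stage 2


image = generate("a corgi playing a trumpet")
print(len(image), "x", len(image[0]))
```

The only part of the real system this mirrors is the ordering of the stages and the fact that the decoder never sees the text directly, only the image embedding produced by the prior.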

For the prior, the authors experimented with both an autoregressive model and a diffusion model, and reported that the diffusion prior was more computationally efficient and produced higher-quality samples. The diffusion prior is a Transformer with a width of 2048 across 24 blocks.^[1]
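
To get a feel for the scale implied by a width of 2048 and 24 blocks, a common rule of thumb for Transformer blocks can be applied. The sketch below assumes standard blocks with four attention projections and an MLP with 4x expansion, and it ignores embeddings, biases, and normalization layers. Those assumptions are not taken from the article's sources, so the result is an order-of-magnitude estimate rather than a reported figure.

```python
def approx_transformer_params(width: int, blocks: int, mlp_ratio: int = 4) -> int:
    """Rough weight count for a stack of standard Transformer blocks.

    Assumes (hypothetically) Q, K, V and output projections of size
    width x width, plus an MLP that expands to mlp_ratio * width.
    """
    attention = 4 * width * width        # Q, K, V, and output projections
    mlp = 2 * mlp_ratio * width * width  # up-projection and down-projection
    return blocks * (attention + mlp)


estimate = approx_transformer_params(width=2048, blocks=24)
print(f"{estimate / 1e9:.2f}B parameters (rule-of-thumb estimate)")
```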

The decoder is a diffusion model that generates a 64 by 64 pixel image. The paper states that the decoder uses "the 3.5 billion parameter GLIDE model, with the same architecture and diffusion hyperparameters," modified to condition on CLIP image embeddings rather than directly on text.^[1] Two cascaded diffusion upsampler models then increase the resolution, first from 64 by 64 to 256 by 256 and then from 256 by 256 to 1024 by 1024. Neither upsampler uses attention.^[1] The final 1024 by 1024 output has four times the linear resolution (16 times the pixel count) of the original DALL·E's 256 by 256 images, a comparison OpenAI summarized as "4x greater resolution."^[3]^[5]
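
The resolution figures in this paragraph can be checked with a few lines of arithmetic. The stage names below are labels chosen for the example; the side lengths are the ones stated above.

```python
# Side length in pixels after each stage of the cascade.
STAGES = [("decoder", 64), ("upsampler 1", 256), ("upsampler 2", 1024)]
ORIGINAL_DALLE_SIDE = 256  # output side length of the original DALL·E


def scale_factors(stages):
    """Linear scale factor applied by each stage after the first."""
    return [(name, side // prev_side)
            for (_, prev_side), (name, side) in zip(stages, stages[1:])]


factors = scale_factors(STAGES)
final_side = STAGES[-1][1]
linear_gain = final_side // ORIGINAL_DALLE_SIDE
pixel_gain = (final_side ** 2) // (ORIGINAL_DALLE_SIDE ** 2)

for name, factor in factors:
    print(f"{name}: {factor}x per side")
print(f"vs. original DALL·E: {linear_gain}x per side, {pixel_gain}x pixels")
```

Each upsampler multiplies the side length by four, and the "4x" comparison with the original DALL·E refers to side length rather than pixel count.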

Because the decoder is conditioned on a CLIP image embedding rather than a fixed image, the same embedding can be decoded multiple times to produce different but semantically consistent outputs. This mechanism underlies the model's image-variation feature, in which DALL-E 2 generates new images that preserve the content and style of an input while varying incidental details. Operating in CLIP's embedding space also lets the model interpolate between two images or apply text-guided edits in a zero-shot manner.^[1]^[2]
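
Both ideas can be illustrated with toy vectors. In the sketch below, `toy_decode` is a hypothetical stand-in for the stochastic decoder, and spherical interpolation (slerp) is used to blend two embeddings. The article's text does not name the interpolation method, so treat slerp here as an assumption that is common for unit-length embeddings.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def slerp(a, b, t):
    """Spherical interpolation between two embeddings, with t in [0, 1]."""
    a, b = normalize(a), normalize(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if sin_theta < 1e-8:  # (anti)parallel vectors: no unique arc to follow
        return list(a)
    w_a = math.sin((1.0 - t) * theta) / sin_theta
    w_b = math.sin(t * theta) / sin_theta
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def toy_decode(embedding, seed):
    """Hypothetical decoder stand-in: same embedding, different noise per seed."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, 0.05) for x in embedding]


z_a, z_b = [1.0, 0.0], [0.0, 1.0]

# Variations: decode one embedding several times with different noise.
variations = [toy_decode(z_a, seed) for seed in range(3)]

# Interpolation: walk along the arc between the two embeddings.
midpoint = slerp(z_a, z_b, 0.5)
print(variations)
print(midpoint)
```

The variations differ from one another only through the sampled noise, while the interpolated embedding stays on the unit sphere between the two inputs.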

The unCLIP design placed DALL-E 2 within a broader 2022 wave of diffusion-based text-to-image systems, alongside Google's Imagen and the open-source Stable Diffusion, which emerged later the same year.^[2]

What can DALL-E 2 do?

DALL-E 2's core capability is generating images from natural-language descriptions, including photorealistic photographs, paintings, illustrations, and other styles, and combining concepts, attributes, and styles that may not co-occur in its training data.^[3] Beyond text-to-image generation, the system offered several editing tools:

Feature	Description	Availability
Text-to-image	Generates images from a written prompt	April 2022 (research); July 2022 (beta)
Inpainting (Edit)	Edits a region of an existing image using a prompt, matching style, lighting, and shadows	At launch
Image variations	Produces alternative versions of an uploaded or generated image	At launch
Outpainting	Extends an image beyond its original borders, adding new content in the same style	August 31, 2022

Outpainting, announced on August 31, 2022, let users expand an image past its frame, for example extending a painting into a larger scene while maintaining its visual style and direction.^[7]

The system also had well-documented limitations. It struggled with binding attributes to the correct objects (for instance, reliably distinguishing "a red cube on a blue cube" from the reverse), with rendering coherent text within images, and with prompts involving many objects, negations, or complex spatial relationships.^[2]

When was DALL-E 2 released to the public?

OpenAI followed a staged rollout. At the April 6, 2022 research announcement, access was limited to a small group of trusted testers.^[3] On July 20, 2022, OpenAI moved DALL-E 2 into a paid beta and began expanding access toward roughly one million people on its waitlist. Alongside the beta it introduced a credit-based pricing model: users received 50 free credits in their first month and 15 free credits each subsequent month, with additional credits sold in batches (115 credits for US$15). A single credit generated four images from a prompt, or three images for an edit or variation. The beta also granted users full commercial rights to the images they created, including the right to reprint, sell, and merchandise them.^[8]
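
The effective per-image price under the beta's credit scheme follows directly from the figures above. The variable names are chosen for this example; the numbers are those quoted in the paragraph.

```python
CREDITS_PER_PACK = 115
PACK_PRICE_USD = 15.00
IMAGES_PER_PROMPT_CREDIT = 4  # one credit -> four images from a prompt
IMAGES_PER_EDIT_CREDIT = 3    # one credit -> three images for an edit or variation
FREE_CREDITS_FIRST_MONTH = 50
FREE_CREDITS_LATER_MONTHS = 15

cost_per_credit = PACK_PRICE_USD / CREDITS_PER_PACK
cost_per_prompt_image = cost_per_credit / IMAGES_PER_PROMPT_CREDIT
cost_per_edit_image = cost_per_credit / IMAGES_PER_EDIT_CREDIT

free_images_first_month = FREE_CREDITS_FIRST_MONTH * IMAGES_PER_PROMPT_CREDIT
free_images_later_months = FREE_CREDITS_LATER_MONTHS * IMAGES_PER_PROMPT_CREDIT

print(f"per credit:       ${cost_per_credit:.4f}")
print(f"per prompt image: ${cost_per_prompt_image:.4f}")
print(f"per edit image:   ${cost_per_edit_image:.4f}")
print(f"free prompt images: {free_images_first_month} in month one, "
      f"{free_images_later_months} per month afterwards")
```

Under these figures a paid credit costs about 13 cents, which works out to roughly 3 cents per generated image and a little over 4 cents per edited or varied image.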

On September 28, 2022, OpenAI removed the waitlist entirely, opening DALL-E 2 to anyone who wished to sign up. The company said that more than 1.5 million users were then actively creating with the system, generating over 2 million images per day, with about 100,000 users sharing work and feedback in OpenAI's Discord community.^[9] On November 3, 2022, OpenAI released the DALL·E API in public beta, allowing developers to integrate image generation into their own applications and products.^[10]
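
For orientation, the sketch below builds the kind of JSON body a developer would send to the image-generation endpoint, without making a network call. The field names (`model`, `prompt`, `n`, `size`), the allowed sizes, and the limit of 10 images per request reflect the OpenAI Images API as generally documented for `dall-e-2`, not anything stated in this article's sources, and `build_image_request` is a hypothetical helper written for this example. Because the model is scheduled for removal from the API, a real request using this body may be rejected.

```python
import json

# Sizes and per-request limit assumed from OpenAI's public API documentation.
ALLOWED_SIZES = {"256x256", "512x512", "1024x1024"}
MAX_IMAGES_PER_REQUEST = 10


def build_image_request(prompt: str, n: int = 1, size: str = "1024x1024") -> dict:
    """Return a request body for an image-generation call (no network I/O)."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    if not 1 <= n <= MAX_IMAGES_PER_REQUEST:
        raise ValueError(f"n must be between 1 and {MAX_IMAGES_PER_REQUEST}")
    return {"model": "dall-e-2", "prompt": prompt, "n": n, "size": size}


body = build_image_request("an astronaut riding a horse", n=2, size="512x512")
print(json.dumps(body, indent=2))
```

Validating the size and count locally keeps malformed requests from reaching the API at all, which matters when each request consumes paid quota.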

How did OpenAI handle safety and bias?

OpenAI published a system card for the DALL-E 2 preview in April 2022 documenting risks including bias and representation, mis- and disinformation, explicit content, economic effects, harassment and hate, and copyright and memorization.^[11] To reduce harms, OpenAI filtered the training data to remove images with obvious violent, sexual, or hateful content, which it said reduced the model's ability to produce such material.^[3]^[9]

The content policy prohibited generating sexual, violent, hateful, and other disallowed imagery. OpenAI also rejected uploads containing realistic human faces and blocked attempts to generate the likenesses of public figures, including political leaders and celebrities.^[9] Each generated image carried a visible signature in the bottom-right corner consisting of five colored squares (in muted yellow, cyan, green, red, and blue) to mark it as DALL-E output, and OpenAI said it was continuing to explore watermarking and other image-provenance techniques.^[11]

Like other systems trained on internet data, DALL-E 2 reflected social biases: neutral occupational prompts often produced outputs skewed by gender, race, and other attributes. In July 2022 OpenAI announced a technique that, for prompts not specifying demographics, would steer outputs toward greater diversity by adjusting how prompts were applied internally.^[11]

When was DALL-E 2 retired?

DALL-E 2 was succeeded by DALL·E 3, which OpenAI announced in 2023, and later by the gpt-image-1 family of image models. On November 14, 2025, OpenAI notified developers that the DALL·E model snapshots were being deprecated. According to OpenAI's API deprecation documentation, both dall-e-2 and dall-e-3 are scheduled to be removed from the API on May 12, 2026, with the gpt-image-2, gpt-image-1, and gpt-image-1-mini models recommended as replacements.^[4]

Why does DALL-E 2 matter?

DALL-E 2 was one of the systems that brought text-to-image generation to mainstream public attention in 2022. Its unCLIP architecture demonstrated that conditioning a diffusion decoder on CLIP image embeddings could improve the diversity of generated images while retaining photorealism and adherence to the prompt. Its staged public rollout, content policies, and bias-mitigation efforts influenced how subsequent generative-image systems were released and governed.^[1]^[2]^[9]

References

1. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., Chen, M. "Hierarchical Text-Conditional Image Generation with CLIP Latents." arXiv:2204.06125. https://arxiv.org/abs/2204.06125
2. "DALL-E." Wikipedia. https://en.wikipedia.org/wiki/DALL-E
3. "OpenAI showcases DALL-E 2, a powerful A.I. for creating photorealistic scenes from text descriptions." Fortune, April 6, 2022. https://fortune.com/2022/04/06/openai-dall-e-2-photorealistic-images-from-text-descriptions/
4. "Deprecations." OpenAI API documentation. https://developers.openai.com/api/docs/deprecations
5. "DALL·E: Creating images from text." OpenAI, January 5, 2021. https://openai.com/index/dall-e/
6. Nichol, A., Dhariwal, P., et al. "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models." arXiv:2112.10741. https://arxiv.org/abs/2112.10741
7. "DALL·E: Introducing Outpainting." OpenAI, August 31, 2022. https://openai.com/index/dall-e-introducing-outpainting/
8. "OpenAI expands access to DALL-E 2, its powerful image-generating AI system." TechCrunch, July 20, 2022. https://techcrunch.com/2022/07/20/openai-expands-access-to-dall-e-2-its-powerful-image-generating-ai-system/
9. "OpenAI removes the waitlist for DALL-E 2, allowing anyone to sign up." TechCrunch, September 28, 2022. https://techcrunch.com/2022/09/28/openai-removes-the-waitlist-for-dall-e-2-allowing-anyone-to-sign-up/
10. "OpenAI removes DALL-E waitlist, allowing anyone to sign up, and tests API." VentureBeat, September 28, 2022. https://venturebeat.com/ai/openai-removes-dall-e-beta-waitlist-tests-api
11. "DALL·E 2 Preview - Risks and Limitations (system card)." OpenAI, GitHub. https://github.com/openai/dalle-2-preview/blob/main/system-card.md
