gpt-image-1
Last reviewed
May 21, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 2,916 words
gpt-image-1 is OpenAI's natively multimodal image generation model that powers image creation inside ChatGPT and is exposed to developers as an OpenAI API endpoint of the same name.[^1][^2] The model first appeared publicly on March 25, 2025, when OpenAI replaced its diffusion-based DALL-E 3 system inside ChatGPT with a new image generator branded "4o image generation," embedded directly into the GPT-4o family.[^1] On April 23, 2025, OpenAI released the same model to the API under the identifier gpt-image-1.[^2] Unlike DALL-E 3's two-stage text-to-diffusion pipeline, gpt-image-1 generates pictures autoregressively as a sequence of image tokens, sharing the same omnimodal backbone that produces text and audio.[^1][^3] OpenAI says the model is substantially better at text rendering, multi-turn editing, and instruction following than its predecessors, and reported that more than 130 million users created over 700 million images in the first week after launch.[^2]
| Attribute | Value |
|---|---|
| Developer | OpenAI[^1] |
| Initial ChatGPT release | March 25, 2025 (as "4o image generation")[^1] |
| API release | April 23, 2025 (as gpt-image-1)[^2] |
| Family | GPT-4o (natively multimodal)[^1] |
| Generation method | Autoregressive image-token generation[^3] |
| API output sizes | 1024x1024, 1536x1024, 1024x1536[^4] |
| Text-input price | $5 / 1M tokens[^2] |
| Image-input price | $10 / 1M tokens[^2] |
| Image-output price | $40 / 1M tokens[^2] |
| Per-image (low/medium/high square) | ~$0.02 / $0.07 / $0.19[^2] |
| Safety provenance | C2PA metadata; ChatGPT outputs also include SynthID watermarks[^5][^6] |
From October 2023 until March 2025, OpenAI's image generator inside ChatGPT was DALL-E 3, a text-to-image diffusion system that ran as a separate model invoked through a tool call.[^7] When OpenAI unveiled GPT-4o ("omni") on May 13, 2024, it demonstrated that the same model could in principle ingest and emit images directly, and OpenAI president Greg Brockman posted a sample image generated by the new omnimodal model the next day.[^8] However, native image output was not enabled for public users at that time; ChatGPT continued to route image requests to DALL-E 3, and OpenAI said it was "working hard to bring those to the world."[^8]
On March 25, 2025, OpenAI published the blog post "Introducing 4o Image Generation," which announced that GPT-4o's built-in image generation was rolling out that day to ChatGPT users on the Plus, Pro, Team and Free tiers, plus Sora.[^1] OpenAI described the feature as "a significantly more capable image generation approach than DALL-E 3," able to "create photorealistic output," take images as inputs, edit them, follow detailed instructions, and "reliably incorporate text into images."[^1] CEO Sam Altman introduced the launch on X, writing "we are launching a new thing today, images in chatgpt!" and predicting users would love it.[^9] OpenAI's launch material emphasized that, because image generation was now "native" to GPT-4o, it could "use everything it knows" from training and chat context, rather than producing pictures in a separate model with no awareness of the surrounding conversation.[^1]
Within hours of release, ChatGPT users discovered the model would readily restyle uploaded photos in the look of Japanese animation house Studio Ghibli, and a viral wave of "Ghibli-fied" portraits, news photos and memes spread across X, Instagram and Reddit.[^10] By March 28, Sam Altman told users, "It's super fun seeing people love images in ChatGPT. But our GPUs are melting. We are going to temporarily introduce some rate limits while we work on making it more efficient. Hopefully it won't be long! ChatGPT free tier will get 3 generations per day soon."[^10] Altman later wrote that ChatGPT had added one million new users in a single hour after the feature went out, the fastest growth burst the product had ever seen.[^11] On March 31, 2025, he posted "chatgpt image gen now rolled out to all free users!" confirming general availability.[^12] The Ghibli trend also drew commentary about copyright and the studio's living co-founder Hayao Miyazaki, who had previously called animation by AI "an insult to life itself," and OpenAI tightened style controls during the wave, with free ChatGPT refusing some "Studio Ghibli style" prompts.[^10] The trend further drew public attention when the official White House X account posted a Ghibli-style image depicting an immigration arrest, which generated more than 51 million views and significant backlash.[^13]
On April 23, 2025, OpenAI published "Introducing our latest image generation model in the API," which made the same model available to developers under the API name gpt-image-1.[^2] The blog post confirmed that "when image generation was introduced in ChatGPT" the previous month, more than 130 million users had generated over 700 million images in the first week.[^2] Adobe, Airtable, Wix, Instacart, GoDaddy, Canva, Figma, HubSpot and Photoroom were named as launch partners or early adopters.[^2][^14] Adobe Firefly's redesigned web app added gpt-image-1 as a partner model alongside Google Imagen 3 and Flux 1.1 Pro the day after the API launch.[^14] Figma integrated the model for in-canvas generation and editing, and Canva began rolling it into its Magic Studio tools.[^14] OpenAI priced the model per token: $5 per million text input tokens, $10 per million image input tokens, and $40 per million image output tokens, which OpenAI translated to roughly $0.02, $0.07 and $0.19 per low, medium and high-quality square image.[^2]
OpenAI launched a cheaper variant, gpt-image-1-mini, at DevDay on October 6, 2025, priced about 80% below the base model.[^4] A larger refresh, GPT Image 1.5, began rolling out in December 2025 and was followed by GPT Image 2 in April 2026, both of which remain part of the same gpt-image lineage and continue to be served through the same Images API.[^4]
gpt-image-1 is part of the GPT-4o family of "omnimodal" models, meaning a single transformer-based network is trained to read and write tokens across text, image and audio modalities rather than chaining a language model to a separate image decoder.[^1][^3] OpenAI's launch material described the feature as "embedded natively, deep in the architecture of the omnimodal GPT-4o model," so it can "use everything it knows" from its pretraining to apply visual style, factual knowledge and chat context to a picture.[^1] In OpenAI's API documentation, gpt-image-1 is described as "a natively multimodal language model that accepts both text and image inputs, and produces image outputs."[^15]
Unlike DALL-E 2, DALL-E 3 and most contemporary art generators, which are diffusion models that iteratively denoise a random latent into an image, gpt-image-1 generates pictures autoregressively, predicting image tokens one after another in sequence.[^3] OpenAI confirmed in the API documentation that the model uses image tokens as both input and output, and bills them separately from text tokens at a higher rate.[^2][^15] Independent analyses note that this token-by-token construction is the same paradigm that underlies large language models, and that it lets the same model handle text and image outputs through a unified transformer backbone instead of handing off to a separate diffusion network.[^3][^16] OpenAI has not publicly disclosed parameter count, training corpus details, or the exact tokenizer used to encode images.
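The difference between the two paradigms can be illustrated with a toy loop. This sketch is not OpenAI's implementation, which is undisclosed: the scoring function `toy_next_token`, the 16-entry codebook and the 4x4 token grid are hypothetical stand-ins chosen only to show raster-order autoregression, in which each image token is sampled conditioned on the text prompt and on every image token already emitted.

```python
import random
from typing import List, Sequence

VOCAB_SIZE = 16        # hypothetical codebook size; real image tokenizers are far larger
GRID_H, GRID_W = 4, 4  # hypothetical token grid; each token would decode to one image patch


def toy_next_token(context: Sequence[int], rng: random.Random) -> int:
    """Hypothetical stand-in for the transformer's next-token prediction.

    A real model would compute a distribution over the codebook from the whole
    context (prompt tokens plus earlier image tokens). This toy only looks at the
    last token and makes repeating it three times as likely as any other id.
    """
    prev = context[-1] if context else None
    weights = [3.0 if tok == prev else 1.0 for tok in range(VOCAB_SIZE)]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate_image_tokens(prompt_tokens: List[int], seed: int = 0) -> List[List[int]]:
    """Generate a GRID_H x GRID_W grid of image tokens in raster order."""
    rng = random.Random(seed)
    context = list(prompt_tokens)        # the text prompt is the initial prefix
    image_tokens: List[int] = []
    for _ in range(GRID_H * GRID_W):     # one prediction step per image token
        tok = toy_next_token(context, rng)
        image_tokens.append(tok)
        context.append(tok)              # each new token conditions all later ones
    # Reshape the flat raster-order sequence into rows of the token grid.
    return [image_tokens[r * GRID_W:(r + 1) * GRID_W] for r in range(GRID_H)]


if __name__ == "__main__":
    for row in generate_image_tokens(prompt_tokens=[101, 7, 42], seed=1):
        print(row)
```

Once a token is emitted it stays fixed and conditions everything after it; a diffusion sampler, by contrast, starts from noise over the whole image and revises every position at each denoising step.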
The API accepts text prompts and optional reference images, including PNG, JPEG, WebP and non-animated GIF input up to 50 MB each.[^15] In edit mode, callers can pass up to 16 input images, plus an optional alpha-mask PNG that marks the region to modify.[^15] Output sizes are 1024x1024 (square), 1536x1024 (landscape) and 1024x1536 (portrait), with three quality tiers: low, medium and high.[^4] OpenAI exposes a background parameter that can request transparent backgrounds when the output format is PNG or WebP.[^15] Token consumption scales with both size and quality: a low-quality 1024x1024 image consumes roughly 272 output tokens, while a high-quality large image can consume several thousand, which is what drives the headline per-image prices.[^15]
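The limits in this paragraph can be checked client-side before a request is sent. The sketch below assumes the OpenAI Python SDK's keyword names for image requests (`model`, `prompt`, `size`, `quality`, `background`, `output_format`); the helpers `build_generate_kwargs` and `check_edit_inputs` are hypothetical, and the limits are copied from the text above rather than queried from the API, so they may drift as OpenAI revises the endpoint.

```python
import os
from typing import Dict, List, Optional

# Limits as described in the article; treat them as assumptions and confirm
# against OpenAI's current Images API reference before relying on them.
ALLOWED_SIZES = {"1024x1024", "1536x1024", "1024x1536"}
ALLOWED_QUALITIES = {"low", "medium", "high"}
TRANSPARENT_FORMATS = {"png", "webp"}
ALLOWED_INPUT_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
MAX_INPUT_IMAGES = 16
MAX_INPUT_BYTES = 50 * 1000 * 1000  # "50 MB" read conservatively as 50,000,000 bytes


def build_generate_kwargs(prompt: str, size: str = "1024x1024", quality: str = "medium",
                          transparent: bool = False,
                          output_format: str = "png") -> Dict[str, str]:
    """Return keyword arguments for a gpt-image-1 generation request (hypothetical helper)."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size {size!r}")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality {quality!r}")
    kwargs = {"model": "gpt-image-1", "prompt": prompt, "size": size,
              "quality": quality, "output_format": output_format}
    if transparent:
        # Transparency needs an output format with an alpha channel.
        if output_format not in TRANSPARENT_FORMATS:
            raise ValueError("transparent background needs png or webp output")
        kwargs["background"] = "transparent"
    return kwargs


def check_edit_inputs(paths: List[str], mask_path: Optional[str] = None) -> None:
    """Raise ValueError if edit-mode inputs break the limits described above."""
    if not 1 <= len(paths) <= MAX_INPUT_IMAGES:
        raise ValueError(f"expected 1..{MAX_INPUT_IMAGES} input images, got {len(paths)}")
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in ALLOWED_INPUT_EXTENSIONS:
            raise ValueError(f"unsupported input format: {path}")
        if os.path.getsize(path) > MAX_INPUT_BYTES:
            raise ValueError(f"input larger than 50 MB: {path}")
    # Only the mask's extension is checked; confirming that it actually carries
    # an alpha channel would require decoding the PNG.
    if mask_path is not None and not mask_path.lower().endswith(".png"):
        raise ValueError("mask must be a PNG")
```

With the SDK installed, the returned dictionary could be unpacked into a call such as `OpenAI().images.generate(**kwargs)`; that network call is not exercised here. Detecting animated GIFs is also left out, since it needs frame-level parsing.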
gpt-image-1 is billed in three streams: text input tokens at $5 per million, image input tokens at $10 per million, and image output tokens at $40 per million (Batch API workflows are priced at half-rate).[^2] OpenAI's headline rule of thumb is $0.02, $0.07 and $0.19 for low, medium and high-quality square images at launch.[^2] Image input tokens apply both for editing flows and when an image is supplied as visual context, so multi-turn editing of a high-quality picture compounds quickly.[^15]
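Because the three streams are billed independently, a request's cost is a simple weighted sum of its token counts. This minimal sketch hard-codes the launch prices quoted above and assumes the Batch half-rate applies equally to all three streams; `estimate_cost_usd` is a hypothetical helper, and real token counts should come from the usage figures the API reports rather than from guesses.

```python
# Launch prices in USD per one million tokens, as quoted in the text above.
TEXT_INPUT_PER_M = 5.00
IMAGE_INPUT_PER_M = 10.00
IMAGE_OUTPUT_PER_M = 40.00


def estimate_cost_usd(text_in: int, image_in: int, image_out: int,
                      batch: bool = False) -> float:
    """Estimate one request's cost from its three token streams (hypothetical helper)."""
    cost = (text_in * TEXT_INPUT_PER_M
            + image_in * IMAGE_INPUT_PER_M
            + image_out * IMAGE_OUTPUT_PER_M) / 1_000_000
    # Assumption: the Batch API's half-rate applies equally to all three streams.
    return cost / 2 if batch else cost


if __name__ == "__main__":
    # Output stream alone for a low-quality 1024x1024 image (~272 tokens per the text).
    print(f"{estimate_cost_usd(0, 0, 272):.5f}")  # 0.01088

    # Hypothetical three-turn edit session: each later turn re-sends the previous
    # image as input. These token counts are illustrative, not measured values.
    turns = [(40, 0, 4000), (25, 1500, 4000), (25, 1500, 4000)]
    session = sum(estimate_cost_usd(t, i, o) for t, i, o in turns)
    print(f"three-turn session: ${session:.3f}")
```

At $40 per million, 272 output tokens come to about $0.011, so the ~$0.02 rule of thumb for a low-quality square image is best read as a rounded approximation rather than something derivable from the output-token count alone.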
OpenAI and independent reviewers consistently flag four capability jumps over DALL-E 3:
OpenAI shipped gpt-image-1 with an "Addendum to GPT-4o System Card: Native image generation" that catalogues launch-time mitigations.[^17] Notable measures include:
- A `moderation` parameter that defaults to `auto` (standard filtering) and can be lowered to `low` for less restrictive enforcement, subject to OpenAI's usage policies.[^15]

The April 23 API release was accompanied by named integrations across the design, marketing, e-commerce and productivity stacks:
| Partner | Use case |
|---|---|
| Adobe Firefly and Express | Partner model alongside Adobe's own Firefly Image Model 4[^14] |
| Figma Design | In-canvas generation and edit, including object add/remove and background expand[^14] |
| Canva AI / Magic Studio | Image generation in Canva templates and marketing assets[^2][^14] |
| Airtable | Image generation tied to records[^2] |
| Wix | Site visuals[^2] |
| GoDaddy | Logo and brand asset creation[^2][^14] |
| Instacart | Recipe and shopping-list illustrations[^2][^14] |
| HubSpot | Marketing imagery[^14] |
| Photoroom | Product photo editing[^14] |
By August 2025, OpenAI was reporting that ChatGPT was on track to reach 700 million weekly active users, growth that company executives credited in part to the image release.[^11]
Reported uses of gpt-image-1 since launch include marketing creative for advertisements and landing pages; concept art for designers; storyboard frames for animators; product photography for e-commerce; image editing such as background replacement, retouching and "expand canvas" workflows; infographics and slides that previously required hand-drawn text; and conversion of photographs into stylized portraits.[^1][^2][^14] News outlets and platforms have also used the C2PA metadata attached to its outputs to flag AI-generated images in viral content.[^5]
OpenAI and independent reviewers have documented several rough edges in the launch model:
Beyond technical artefacts, gpt-image-1 attracted three main strands of criticism in 2025. Copyright groups argued that the Ghibli-style craze relied on training the model on copyrighted animation frames, and asked whether OpenAI should bar style imitation of identifiable living creators.[^10] Civil-society groups raised concerns about the looser default policy on adult public-figure likenesses, noting the deepfake risk during an election cycle.[^17][^18] Information integrity researchers questioned whether C2PA metadata and SynthID alone are enough to slow misinformation, given that metadata can be removed and watermarks degrade under heavy editing.[^5][^6] OpenAI's own system card acknowledges these as known residual risks and points to layered mitigations rather than a single technical fix.[^17]
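The fragility of metadata-based provenance is visible at the file level. In PNG files, C2PA manifests live in a dedicated chunk, which this sketch assumes is typed `caBX` (worth confirming against the current C2PA specification); `png_chunk_types`, `has_c2pa_chunk` and `make_chunk` are hypothetical helpers. The check only detects that a manifest is present and does not validate its signature, and dropping that single chunk, or re-encoding the image, removes it.

```python
import struct
import zlib
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
C2PA_CHUNK_TYPE = b"caBX"  # assumption: chunk type used for C2PA manifests in PNG


def png_chunk_types(data: bytes) -> List[bytes]:
    """Walk a PNG byte string and return the type of every chunk, in order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types: List[bytes] = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        payload_end = pos + 8 + length
        if payload_end + 4 > len(data):
            raise ValueError("truncated chunk")
        (crc,) = struct.unpack(">I", data[payload_end:payload_end + 4])
        # The CRC covers the chunk type and payload, not the length field.
        if zlib.crc32(data[pos + 4:payload_end]) != crc:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        types.append(ctype)
        pos = payload_end + 4
        if ctype == b"IEND":
            break
    return types


def has_c2pa_chunk(data: bytes) -> bool:
    """True if a C2PA manifest chunk is present (presence only, no signature check)."""
    return C2PA_CHUNK_TYPE in png_chunk_types(data)


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk with a correct CRC (used here only for the demo)."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


if __name__ == "__main__":
    # Chunk skeletons only (no IDAT), enough to exercise the chunk walker.
    ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0))
    iend = make_chunk(b"IEND", b"")
    tagged = PNG_SIGNATURE + ihdr + make_chunk(C2PA_CHUNK_TYPE, b"placeholder") + iend
    stripped = PNG_SIGNATURE + ihdr + iend  # same file with the manifest chunk dropped
    print(has_c2pa_chunk(tagged), has_c2pa_chunk(stripped))  # True False
```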
The 2025 image-generation landscape settled into a handful of leading models with distinct architectural choices. The table below summarizes the contemporaneous competitive set at gpt-image-1's launch.
| Model | Vendor | Architecture | Notable strengths | Notes |
|---|---|---|---|---|
| gpt-image-1 | OpenAI | Native multimodal, autoregressive image tokens[^3] | Text rendering, chat-style editing, prompt following[^1][^2] | Replaced DALL-E 3 in ChatGPT on 2025-03-25[^1] |
| DALL-E 3 | OpenAI | Diffusion (text-conditioned)[^7] | Legacy DALL-E pipeline kept in API briefly after gpt-image-1 launched[^7] | Effectively superseded inside ChatGPT[^1] |
| Imagen 3 | Google DeepMind | Diffusion[^14] | Strong photorealism in lifestyle and environment scenes[^14] | Surfaced as partner model in Adobe Firefly alongside gpt-image-1[^14] |
| Reve Image 1.0 | Reve | Diffusion | Aesthetic precision; launched the same week as 4o image generation[^16] | Free preview tier emphasized accessibility[^16] |
| Midjourney V7 | Midjourney | Diffusion | Aesthetic quality and stylization[^14] | Weaker at in-image text rendering[^14] |
| Ideogram 3.0 | Ideogram | Diffusion | Specialist text rendering[^16] | Often compared head-to-head with gpt-image-1 for typographic work[^16] |
| FLUX.1 (Pro) | Black Forest Labs | Diffusion (flow-matching) | Photoreal generation; integrated into Adobe Firefly partner roster[^14] | Open weights for smaller variants |
Architecturally, gpt-image-1 is unusual in this comparison because it is autoregressive rather than diffusion-based, and because it shares the same backbone as the chatbot calling it, which is what enables the conversational edit loop.[^1][^3]