GPT Image 1

Updated Jun 21, 2026

GPT Image 1 (API identifier gpt-image-1) is a natively multimodal image generation model developed by OpenAI, integrated into ChatGPT on March 25, 2025, and released as a standalone API on April 23, 2025.^[1]^[2] Unlike its predecessor DALL-E 3, which operated as a separate diffusion service called by ChatGPT, GPT Image 1 is built directly into the GPT-4o transformer architecture, so it can generate, analyze, and edit images within the same unified context as text without handing off requests to an external system.^[1] OpenAI describes it as "the natively multimodal model that powers this experience in ChatGPT," brought to developers so they can integrate "high-quality, professional-grade image generation directly into their own tools and platforms."^[2]

The launch produced one of the largest immediate adoption events in generative AI history. Within the first week, more than 130 million users created over 700 million images, overwhelming OpenAI's infrastructure and triggering temporary rate limits.^[2]^[8] CEO Sam Altman noted publicly that the company's GPUs were "melting" from the demand.^[7] The viral catalyst was a trend of recreating photographs in the style of Studio Ghibli, which spread across social media within hours of launch and ignited a wider debate about copyright, artistic attribution, and the ethics of style imitation.^[4]^[5]

Overview

Attribute	Value
Developer	OpenAI
Type	Native multimodal image generation
Architecture	Autoregressive transformer with rolling diffusion (Transfusion-style)
Predecessor	DALL-E 3 (diffusion)
Parent model	GPT-4o
ChatGPT launch	March 25, 2025
API launch	April 23, 2025
API model identifier	gpt-image-1
Successor variants	GPT Image 1 Mini (October 6, 2025), GPT Image 1.5 (December 16, 2025)
Output resolutions	1024x1024, 1024x1536, 1536x1024
Watermarking	C2PA Content Credentials
Availability	ChatGPT Free, Plus, Pro, Team, Enterprise; OpenAI API; Azure OpenAI Service
API endpoints	/v1/images/generations, /v1/images/edits, /v1/responses, /v1/chat/completions

What is GPT Image 1 and how does it differ from DALL-E?

OpenAI's image generation history began with DALL-E, released in January 2021, which used a discrete variational autoencoder (dVAE) to compress images into tokens that a GPT-like transformer could then generate. DALL-E 2, released in April 2022, shifted to a diffusion-based approach using CLIP guidance, which improved photorealism substantially. DALL-E 3, released in September 2023 and integrated into ChatGPT in October 2023, retained diffusion but introduced better prompt adherence through training on descriptive captions generated by GPT-4.

Despite these improvements, the DALL-E series operated as a separate subsystem. When a ChatGPT user requested an image, the model would pass the request to DALL-E via an internal tool call, losing conversational context.^[1] DALL-E 3 also struggled with legible text rendering, producing garbled or misspelled characters in most cases.^[16] The lack of true iterative editing limited its usefulness for professional workflows.

GPT Image 1 represents a departure from that architecture. Rather than calling out to a separate diffusion system, the model generates images natively within the same transformer backbone that handles text.^[1] The earliest public hint came from the original GPT-4o announcement on May 13, 2024, which included demonstrations of native image output that OpenAI did not ship at the time.^[14] Those capabilities took roughly ten more months of internal alignment, safety review, and infrastructure preparation before being exposed to users.

How does GPT Image 1 work? Native multimodality and the Transfusion approach

GPT Image 1 is built on the same transformer architecture as GPT-4o, but extends it to produce image outputs directly. The technical approach draws on the Transfusion method, which combines autoregressive language modeling with diffusion-based image generation inside a single model.^[10]

In standard diffusion models like DALL-E 2 and Stable Diffusion, images are generated by iteratively denoising random noise, guided by a text condition signal. The process produces high-quality results but separates text understanding from pixel-level generation.^[15] GPT Image 1 instead represents images as sequences of continuous-valued latent patches, encoded by a variational autoencoder (VAE) into a lower-dimensional latent space.^[10] These patches are interleaved with text tokens in a unified sequence, and the model learns a combined objective: autoregressive next-token prediction for text, and a denoising diffusion probabilistic model (DDPM) objective for image patches.^[10]
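The combined objective described above can be written compactly. The form below follows the notation of the Transfusion paper. Whether GPT Image 1 uses exactly this loss, and what value the weighting coefficient \lambda takes, has not been published by OpenAI, so both are assumptions here.

```latex
\mathcal{L} = \mathcal{L}_{\text{LM}} + \lambda \, \mathcal{L}_{\text{DDPM}},
\qquad
\mathcal{L}_{\text{LM}} = \mathbb{E}_{y}\big[ -\log P_\theta(y_i \mid y_{<i}) \big],
\qquad
\mathcal{L}_{\text{DDPM}} = \mathbb{E}_{x_0, t, \epsilon}\big[ \lVert \epsilon - \epsilon_\theta(x_t, t, c) \rVert_2^2 \big]
```

Here y ranges over text tokens, x_0 is a clean latent image patch sequence, x_t is its noised version at diffusion step t, \epsilon is the added noise, and c is the surrounding context the model conditions on.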

The attention masking scheme is hybrid. Text tokens use causal (left-to-right) attention. Image patches, which appear between special begin-of-image (BOI) and end-of-image (EOI) tokens, use bidirectional attention among themselves, allowing the model to consider spatial relationships.^[10] Across text-image boundaries, autoregressive masking preserves sequence integrity. This means the transformer can treat an image as a two-dimensional entity within what is otherwise a one-dimensional sequence.
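The masking rule above can be sketched in a few lines. This is an illustrative toy, not OpenAI's implementation: the function name, the `image_ids` encoding, and the choice to treat the BOI and EOI markers as ordinary causal tokens are all assumptions made for the example.

```python
from typing import List, Optional


def hybrid_attention_mask(image_ids: List[Optional[int]]) -> List[List[bool]]:
    """Build a boolean attention mask for a mixed text/image sequence.

    image_ids[i] is None for a text token (BOI/EOI markers are treated as
    text here) and an integer image index for a patch of that image.
    mask[q][k] is True when query position q may attend to key position k.
    """
    n = len(image_ids)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            causal = k <= q  # standard left-to-right attention
            same_image = image_ids[q] is not None and image_ids[q] == image_ids[k]
            mask[q][k] = causal or same_image
    return mask


# Hypothetical layout: text, BOI, patch, patch, EOI, text (one image, index 0)
layout = [None, None, 0, 0, None, None]
mask = hybrid_attention_mask(layout)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In the printed grid, the first patch can attend to the second patch even though it comes later in the sequence, while every text position still sees only what precedes it.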

For inference, GPT Image 1 uses a rolling diffusion process that denoises image patches row-by-row rather than all at once.^[14] This allows streaming-compatible generation where the image builds progressively. Users in ChatGPT see this as a top-down progressive reveal, with the image appearing to paint itself from the top of the frame downward.^[14]

The practical effect is that the model retains full conversational context when generating images.^[1] If a user has been discussing a product design concept for several turns, the generation request can draw on that entire context rather than receiving an isolated prompt. The model can also accept images as inputs, analyze their content, and produce edited versions within the same context window.^[1] OpenAI states the model "can create images across diverse styles, faithfully follow custom guidelines, leverage world knowledge, and accurately render text."^[2]

Output images are available in three sizes: 1024x1024 (square), 1536x1024 (landscape), and 1024x1536 (portrait).^[3] Generation typically takes 30 to 60 seconds at the high quality setting and 10 to 25 seconds at the low and medium settings.

Why does autoregressive image generation matter?

A diffusion model treats image generation as a denoising problem, conditioned on a text embedding from a separate encoder. In a chat product built on such a model, the text encoder, the diffusion U-Net, and the ChatGPT language model are three systems with three training regimes.^[9] Autoregressive image generation collapses that pipeline. The same model that holds the conversation also draws the image, using the same weights and the same attention mechanism.^[9] If you ask GPT Image 1 to draw "a chemistry beaker labeled with the structural formula for caffeine," the language model already knows the formula, and the image-generating part of the same network can use that knowledge directly. This explains the model's strengths in legible text, instruction following, and recognizable objects. It also explains why generation can be slower than parallel diffusion: each row of image patches in the rolling diffusion step depends on the rows before it.^[9]

C2PA watermarking

All images generated by GPT Image 1 are watermarked using the Coalition for Content Provenance and Authenticity (C2PA) standard, often referred to as Content Credentials.^[1] C2PA metadata is embedded invisibly in the image file, records that the image was generated rather than captured, and can be read by compatible tools to verify that the image was AI-generated.^[21] OpenAI joined the C2PA steering committee in 2024, and GPT Image 1 was the first OpenAI image product to apply the standard at scale.^[1] The metadata persists through most file transformations, though some editing tools strip it.^[21] This provenance tagging was presented as a transparency measure to help platforms and viewers distinguish AI-generated content from photographs.^[1]

Free tier ChatGPT users see an additional visible watermark in the bottom-right corner of generated images.^[1] Plus, Pro, Team, and Enterprise subscribers, along with API customers, do not receive the visible watermark, though the embedded C2PA metadata remains in all outputs regardless of tier.^[1]

When was GPT Image 1 released in ChatGPT? (March 2025)

OpenAI announced GPT Image 1 on March 25, 2025, during a livestreamed event.^[1]^[19] The feature rolled out first to ChatGPT Plus, Pro, and Team subscribers.^[1] Rollout to free-tier users was delayed by one day, to March 26, because the volume of requests was overwhelming server capacity.^[1] Announcing the feature, Sam Altman called it "a new high-water mark for us in allowing creative freedom."^[29]

The launch coincided with a significant relaxation of OpenAI's style-replication policies.^[6] Previous versions of ChatGPT refused to generate images closely imitating the style of named studios or contemporary artists. The new version was more permissive toward broad studio styles while maintaining restrictions on imitating individual living artists.^[6] This policy shift, combined with the model's substantially improved ability to capture distinctive visual aesthetics, produced the Studio Ghibli viral moment.^[4]^[6]

The ChatGPT implementation supports several interaction modes beyond simple prompt-to-image generation. Users can upload an existing image and request modifications, such as changing a background, adding an object, or transforming the overall style.^[1] Multi-turn editing is supported, meaning a user can refine an image across several messages with natural language instructions like "make the lighting warmer" or "remove the figure on the left" without starting over from scratch.^[1] The model also accepts up to 10 reference images in a single request, enabling complex compositional tasks.^[3]

The Studio Ghibli moment

The cultural event most closely associated with GPT Image 1 began within hours of the ChatGPT rollout. Users discovered that asking the model to "redraw this image in the style of Studio Ghibli" produced convincing results that matched the warm color palette, soft linework, and characteristic facial proportions associated with films like My Neighbor Totoro and Spirited Away.^[4] Elon Musk, Donald Trump, scenes from The Lord of the Rings, and personal photographs were all Ghiblified and shared widely.^[5] Sam Altman changed his own profile picture to a Ghibli-style image, effectively endorsing the trend before the company had processed the implications.^[5]

By the end of the first week, OpenAI reported that 130 million users had generated more than 700 million images.^[2]^[8] Altman said in a March 27 post that the team was rationing GPU capacity. "Our GPUs are melting," he wrote, a phrase that ended up in headlines across most major tech sites within 24 hours.^[7]

The trend resurfaced a 2016 quote from Studio Ghibli co-founder Hayao Miyazaki. In NHK documentary footage, Miyazaki had been shown an early AI animation demo and reacted with what reviewers described as visible disgust.^[20] "I am utterly disgusted," he said. "If you really want to make creepy stuff you can, but I would never wish to incorporate this technology into my work at all. I strongly feel that this is an insult to life itself."^[20] The 2016 quote was about a different technology and a different demonstration, but the contrast between Miyazaki's stated revulsion and millions of users imitating his studio's house style was hard to miss. TechCrunch, CNN, and the Washington Post all ran stories pairing the old quote with the new trend.^[4]^[5]^[6]

Studio Ghibli itself did not issue a formal statement during the trend, and as of mid-2026 has not filed legal action against OpenAI over GPT Image 1.^[6] The studio's silence was widely read as discomfort given Miyazaki's documented views, but it stopped short of any formal objection such as a takedown notice. OpenAI's policy at the time allowed generation in "broader studio styles" while prohibiting imitation of "individual living artists," and the company quietly tightened its filters on certain Ghibli-adjacent prompts in the weeks after the launch.^[6]

The White House posted an AI-generated Ghibli-style image depicting the arrest of a migrant by ICE officers, drawing widespread criticism for using a warmly nostalgic artistic style to represent a contested law enforcement action.^[5] The post became a reference case for critics arguing that powerful generation tools could be used for emotional manipulation. The cultural reaction split into camps: lightweight creative use, working-illustrator concerns about style imitation at scale, and copyright-and-consent objections about training data.^[4] None of these debates reached a clean resolution, but the trend set the template for how future image-model launches would be received.

When was the gpt-image-1 API released, and what does it cost? (April 2025)

OpenAI made GPT Image 1 available through its public API on April 23, 2025, under the model identifier gpt-image-1.^[2] The API exposes the model through four endpoints: the dedicated image generation endpoint at /v1/images/generations, the image editing endpoint at /v1/images/edits, the responses endpoint at /v1/responses, and the chat completions endpoint at /v1/chat/completions.^[3] The listing opened the model to developers who had previously been calling DALL-E 3, with most of the same calling conventions but a richer set of parameters covering quality, output resolution, and reference images.^[2]

The API follows a token-based pricing model consistent with other OpenAI API offerings. Text input tokens cost $5.00 per million, with cached text input priced at $1.25 per million. Image input tokens cost $10.00 per million, with cached image input at $2.50 per million. Image output tokens are priced at $40.00 per million.^[12]

For developers pricing per generated image rather than per token, the practical costs work out as follows:

Quality level	Resolution	Output tokens (approx)	Approximate cost per image
Low	1024x1024	272	$0.011
Low	1024x1536	408	$0.016
Low	1536x1024	400	$0.016
Medium	1024x1024	1056	$0.042
Medium	1024x1536	1584	$0.063
Medium	1536x1024	1568	$0.063
High	1024x1024	4160	$0.167
High	1024x1536	6240	$0.25
High	1536x1024	6208	$0.25
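The per-image figures in the table follow from the output token counts and the $40.00 per million output-token price quoted above. A minimal sketch of that arithmetic is shown below. The function and dictionary names are illustrative, and the estimate deliberately ignores text and image input tokens.

```python
OUTPUT_PRICE_PER_MILLION_USD = 40.00  # image output token price quoted above

# Approximate output tokens per image, copied from the table above.
OUTPUT_TOKENS = {
    ("low", "1024x1024"): 272,
    ("low", "1024x1536"): 408,
    ("low", "1536x1024"): 400,
    ("medium", "1024x1024"): 1056,
    ("medium", "1024x1536"): 1584,
    ("medium", "1536x1024"): 1568,
    ("high", "1024x1024"): 4160,
    ("high", "1024x1536"): 6240,
    ("high", "1536x1024"): 6208,
}


def image_cost_usd(quality: str, size: str, n: int = 1) -> float:
    """Estimate output-token cost for n images; input tokens are not included."""
    tokens = OUTPUT_TOKENS[(quality, size)]
    return n * tokens * OUTPUT_PRICE_PER_MILLION_USD / 1_000_000


for (quality, size), tokens in OUTPUT_TOKENS.items():
    cost = image_cost_usd(quality, size)
    print(f"{quality:<6} {size:<9} {tokens:>5} tokens  ${cost:.4f}")
```

Because the table's dollar figures are rounded, the computed values can differ from them in the third decimal place.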

Access is tiered by usage level. Tier 1 developers (new accounts with limited usage history) receive 100,000 tokens per minute and 5 images per minute. Tier 5 accounts (established high-volume users) receive 8,000,000 tokens per minute and 250 images per minute.^[3] OpenAI states that API-submitted images and prompts are not used for model training.^[2]
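The images-per-minute limits translate directly into batch planning. The sketch below uses only the Tier 1 and Tier 5 figures quoted above; the helper names are illustrative, and the estimate ignores the tokens-per-minute limit and per-image generation latency, either of which may be the real bottleneck.

```python
import math

# Images-per-minute limits quoted above.
IMAGES_PER_MINUTE = {"tier1": 5, "tier5": 250}


def rate_limit_windows(num_images: int, tier: str) -> int:
    """Number of one-minute rate-limit windows needed to request num_images."""
    return math.ceil(num_images / IMAGES_PER_MINUTE[tier])


def seconds_between_requests(tier: str) -> float:
    """Even spacing between single-image requests that stays at the limit."""
    return 60.0 / IMAGES_PER_MINUTE[tier]


for tier in IMAGES_PER_MINUTE:
    windows = rate_limit_windows(1000, tier)
    gap = seconds_between_requests(tier)
    print(f"{tier}: 1000 images need {windows} windows, one request every {gap:.2f}s")
```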

The API does not support streaming, function calling, structured outputs, or fine-tuning as of the initial release.^[3] Developers calling through the Responses API can register image_generation as a tool and let the model decide when to invoke it within a longer agentic flow, which is the recommended path for chat-style applications that need conditional image output rather than every-turn rendering.^[3]
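The tool-registration pattern described above can be sketched without making a network call. The example builds a Responses API request body and decodes images from a response-shaped dictionary. Treat the details as assumptions to check against current documentation: the text model name is a placeholder, and the sketch assumes image results arrive as output items of type `image_generation_call` with base64 data in a `result` field.

```python
import base64
import json
from typing import Any, Dict, List


def build_responses_request(user_text: str, text_model: str) -> Dict[str, Any]:
    """Request body that lets the model decide whether to generate an image."""
    return {
        "model": text_model,
        "input": user_text,
        "tools": [{"type": "image_generation"}],
    }


def extract_images(response_body: Dict[str, Any]) -> List[bytes]:
    """Decode generated images from a response body, if any were produced.

    Assumes image results are output items of type "image_generation_call"
    that carry base64-encoded image data in a "result" field.
    """
    images = []
    for item in response_body.get("output", []):
        if item.get("type") == "image_generation_call" and item.get("result"):
            images.append(base64.b64decode(item["result"]))
    return images


# "your-text-model" is a placeholder, not a real model identifier.
request = build_responses_request("Draw a lighthouse at dusk.", "your-text-model")
print(json.dumps(request, indent=2))

# A hand-built stand-in for a server response, used only to exercise the parser.
fake_response = {
    "output": [
        {"type": "message", "content": []},
        {
            "type": "image_generation_call",
            "result": base64.b64encode(b"not-a-real-png").decode(),
        },
    ]
}
print(len(extract_images(fake_response)), "image(s) decoded")
```

Because the model chooses when to call the tool, a turn that needs no image simply yields an empty list from `extract_images`.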

Available API parameters

The image generation and edit endpoints (/v1/images/generations and /v1/images/edits) accept a set of parameters mirroring the ChatGPT controls:^[3]

Parameter	Values	Purpose
`prompt`	Text up to 32,000 characters	Description of the desired image
`quality`	`low`, `medium`, `high`, `auto`	Cost, speed, and quality tradeoff
`size`	`1024x1024`, `1024x1536`, `1536x1024`, `auto`	Output dimensions
`n`	1 to 10	Number of images per request
`output_format`	`png`, `jpeg`, `webp`	Encoding of returned image
`background`	`transparent`, `opaque`, `auto`	Whether to leave the background as alpha
`moderation`	`auto`, `low`	Content filter strictness
`user`	Free-form string	End-user identifier for abuse tracking

The webp output format and transparent background parameter were added in summer 2025 updates to support design and branding workflows where assets need an alpha channel for compositing.^[3]
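The parameter table above can be turned into a small client-side check that catches invalid values before a request is sent. This is a sketch under stated assumptions: the function name is illustrative, the allowed values are copied from the table rather than from a live schema, and the rule rejecting a transparent background with JPEG output is a local sanity check based on JPEG having no alpha channel.

```python
from typing import Any, Dict

# Allowed values copied from the parameter table above.
ALLOWED = {
    "quality": {"low", "medium", "high", "auto"},
    "size": {"1024x1024", "1024x1536", "1536x1024", "auto"},
    "output_format": {"png", "jpeg", "webp"},
    "background": {"transparent", "opaque", "auto"},
    "moderation": {"auto", "low"},
}
MAX_PROMPT_CHARS = 32_000


def build_generation_request(prompt: str, n: int = 1, **options: str) -> Dict[str, Any]:
    """Validate options against the table and return a JSON-ready body."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt must be 1 to 32,000 characters")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    for name, value in options.items():
        if name == "user":
            continue  # free-form end-user identifier, nothing to validate
        if name not in ALLOWED:
            raise ValueError(f"unknown parameter: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid {name}: {value!r}")
    if options.get("background") == "transparent" and options.get("output_format") == "jpeg":
        raise ValueError("jpeg has no alpha channel; use png or webp")
    return {"model": "gpt-image-1", "prompt": prompt, "n": n, **options}


body = build_generation_request(
    "A flat-design rocket icon",
    n=2,
    quality="low",
    size="1024x1024",
    output_format="png",
    background="transparent",
)
print(body)
```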

What can GPT Image 1 do? Capabilities

Text rendering

The most widely cited improvement over DALL-E 3 is the model's handling of text within images.^[16] DALL-E 3 frequently produced garbled, misspelled, or warped text in images, with one independent comparison finding roughly 37% accuracy on text rendering tasks. GPT Image 1 substantially improves this, with the same comparison reporting approximately 87% accuracy.^[16] The improvement comes from the model's native integration with the text transformer: because the same architecture that processes language also generates the image, the model has a more direct path to rendering the specific characters, words, and layouts specified in a prompt.^[16]

This capability opens GPT Image 1 to use cases that were essentially closed to DALL-E 3 and most other diffusion models, including generating mockups of book covers, product packaging, poster designs, banners with readable headlines, and infographics containing labeled data.^[16] It also explains why the model was the first widely usable image generation tool for meme creation at scale: text-on-image is the meme format, and text-on-image is precisely what DALL-E 3 could not do reliably.

Photorealism and style control

GPT Image 1 produces photorealistic output with better lighting coherence and texture handling than DALL-E 3.^[16] Reviewers note the model tends toward a polished, commercially appealing aesthetic by default: skin tones are smooth, lighting is balanced, and object surfaces look consistent. This default suits marketing and product photography but can work against users who want grittier visual styles.

Style transfer is substantially more reliable than in DALL-E 3.^[16] Users can describe a reference style by name or description, provide a reference image, or both, and the model will apply that style with reasonable consistency across generations.^[1] The conversational memory means a style can be established in one turn and maintained across edits.^[1] The model supports a range of artistic styles, photographic styles, and illustration techniques: oil painting, watercolor, anime, photorealistic portraits, architectural visualizations, product shots on white backgrounds, and flat design graphics, among others.

Image editing and inpainting

GPT Image 1 supports mask-based inpainting, where a user specifies a region of an image to be replaced while the rest remains intact.^[3] Common applications include removing objects from photographs, replacing backgrounds, changing clothing or accessories on a figure, or dropping new elements into an existing scene. The API accepts mask images in addition to the source image and text prompt.^[3]

Outpainting (extending the canvas beyond the original image boundaries) is also supported, allowing users to expand a composition in any direction. This is useful for adapting an image to a different aspect ratio or adding context around a central subject.

One documented limitation in the initial release is that the edited region can look visually distinct from the surrounding content in some cases, particularly in texture-sensitive applications like product photography or portraits where seamless blending is required.

Multi-image composition

GPT Image 1 accepts up to 10 reference images in a single request.^[3] Common patterns include taking a photo of a person, a photo of a product, and a photo of a setting and producing a composite image where the person is using the product in that setting. Designers also use this feature for moodboard-driven generation, where a small set of inspiration images influences the style and color palette of a generated output without dictating its exact content.

Instruction following

Because GPT Image 1 generates images through the same transformer that processes the instruction, it can follow detailed, multi-clause prompts more reliably than diffusion models that encode the prompt through a separate text encoder.^[9] In a 1,000-task grounded image editing benchmark conducted after the API release, GPT Image 1 achieved the highest functional-correctness scores among all models tested.

The model also benefits from conversational iteration. Instead of writing a complete new prompt for each revision, users can say "change the background to late afternoon light" or "make the text larger" in a follow-up message, and the model applies the change within context.^[1] This iterative workflow is not available through the standalone DALL-E 3 API, which requires a complete new prompt for each generation.

How much does GPT Image 1 cost? Pricing and access tiers

ChatGPT subscribers access the image generation feature within their plan without per-image charges, subject to fair-use rate limits that tighten as usage volume increases. Free-tier users have access but face stricter rate limits than paid subscribers, plus the visible watermark described above.^[1]

For API developers, the token-based pricing structure described above applies.^[12] The total cost per image varies significantly by quality setting and resolution. A low-quality 1024x1024 image costs approximately $0.011, making high-volume generation feasible for many applications. High-quality large-format images at $0.25 per image are positioned for professional and commercial use cases where output quality justifies the cost.^[11]

Plan or tier	Cost	Image generation access
ChatGPT Free	$0	Limited generations per day, visible watermark
ChatGPT Plus	$20/month	Higher daily limits, no visible watermark
ChatGPT Pro	$200/month	Effectively unrestricted personal use
ChatGPT Team	$25 to $30/seat/month	Per-seat limits, no visible watermark
ChatGPT Enterprise	Custom	Contracted limits, data residency options
API (gpt-image-1)	$0.011 to $0.25 per image	Per-call pricing, tier-based rate limits
Azure OpenAI Service	Comparable to OpenAI direct	Enterprise SLAs, regional residency

Enterprise customers working with OpenAI directly can negotiate volume pricing. Azure OpenAI Service also hosts the gpt-image-1 model, providing enterprise-grade SLAs, regional data residency options, and integration with Microsoft AI governance tooling.

OpenAI offers adjustable content moderation sensitivity through the API, with an auto setting applying standard policy filters and a low setting relaxing some restrictions for verified enterprise applications.^[3] The company states that neither images submitted through the API nor the prompts used to generate them are included in training data for future model versions.^[2]

Who uses the gpt-image-1 API?

A notable feature of the GPT Image 1 launch was the speed at which large software vendors integrated the API.^[2] OpenAI's launch announcement on April 23 named several launch partners, and additional integrations followed in the months after.^[2] OpenAI reported that "leading enterprises and startups across industries including creative tools, e-commerce, education, enterprise software, and gaming" were already building on the model at launch.^[2]

Company	Product	Integration date	Use case
Adobe	Firefly Boards (and broader Firefly)	April 23, 2025 (launch partner)	Model option alongside Adobe's own Firefly models
Figma	Figma Design (FigJam)	April 2025	In-canvas image generation and editing
Wix	Wix site builder	April 2025	On-brand site imagery, product photography
Canva	Canva editor	May 2025	Direct generation inside design templates
Microsoft	Designer, Copilot, Bing Image Creator	April 2025	Default image model for Copilot consumer surfaces
HeyGen	Avatar and presentation tooling	May 2025	Background and slide visuals
Quora (Poe)	Poe assistant platform	April 2025	Selectable image bot for paid users
GoDaddy	Studio brand assets	June 2025	Logo and marketing asset generation
Photoroom	Mobile photo editing	May 2025	AI fill and product shot generation
Instacart	Recipe and product imagery	June 2025	Recipe card and meal visual generation
Salesforce	Marketing Cloud	Q3 2025	On-brand campaign asset generation

Microsoft was a high-volume integrator. Within weeks of the launch, Microsoft Designer, the consumer Copilot apps, and Bing Image Creator had moved to gpt-image-1 as the default image model.^[23] Adobe positioned its integration as additive rather than a replacement for Firefly, reflecting its commercial commitment to its own training-data-licensed model.^[24] Canva's integration arrived a few weeks after launch, available to Pro subscribers initially and then expanded to all paid users.^[25]

What is GPT Image 1 used for? Use cases

Marketing and content creation

Marketing teams adopted GPT Image 1 rapidly after launch. The ability to generate ad mockups with readable text, create on-brand visual assets from natural language descriptions, and iterate within a conversation substantially reduced the time from concept to usable visual.^[18] Enterprise early adopters reported double-digit speed improvements in prompt-to-asset workflows.^[18] Gamma, a presentation platform, reported generating five million presentation graphics daily after integrating the API.^[2] For e-commerce, the model's ability to generate product images on clean backgrounds, apply consistent lighting, and produce variants at scale opened applications in catalog generation and product visualization.

Design and prototyping

Design teams use GPT Image 1 for rapid prototyping of UI concepts, packaging designs, logo ideas, and illustration styles. The model's ability to accept a reference image and apply transformations means designers can start from an existing asset and explore variations without rebuilding from scratch. The conversational iteration model suits exploratory early stages where requirements are still evolving.

Education and documentation

The improved text rendering makes the model useful for generating instructional diagrams, flowcharts, labeled illustrations, and infographics.^[16] Technical writers and educators can produce figures that previously required a dedicated graphic designer.

Game development and entertainment

Game studios and independent developers use the model for concept art generation, character design exploration, environment mood boards, and UI element prototyping. The model's style consistency across multiple outputs helps establish a coherent visual language for a project before committing to expensive manual production.

Personal and social media use

A large share of GPT Image 1's usage comes from individuals creating personalized images for social media, profile pictures, greeting cards, and hobbyist creative projects. The Ghibli moment demonstrated how this casual creative use can reach mass scale when the model's capabilities align with a popular trend.^[4] Portrait-to-illustration conversion became one of the most common use cases among non-professional users.

How does GPT Image 1 compare with other image models?

GPT Image 1 launched into a market that already had several capable image models competing for both consumer and developer attention. The table below compares it with the most widely discussed peers as of mid-2026. Pricing is approximate and shifts often, especially on the API side.

Model	Developer	Architecture	Text rendering	Photorealism	Native conversational editing	Approx. API cost (medium quality, 1024x1024)
GPT Image 1	OpenAI	Autoregressive (Transfusion)	Excellent	Very good	Yes	$0.042
GPT Image 1 Mini	OpenAI	Distilled from GPT Image 1	Very good	Good	Yes	$0.008 to $0.015
DALL-E 3	OpenAI	Diffusion (separate from LM)	Poor	Good	No	$0.040
FLUX.1 Pro	Black Forest Labs	Rectified flow transformer	Good	Excellent	No	$0.05 to $0.06
FLUX1.1 Pro Ultra	Black Forest Labs	Rectified flow transformer	Good	Excellent	No	$0.06 to $0.08
FLUX Krea	Black Forest Labs	Aesthetic-tuned FLUX variant	Good	Excellent (artistic)	No	$0.04 to $0.05
Midjourney v7	Midjourney	Proprietary diffusion	Fair	Excellent (stylized)	Partial (variations)	Subscription, ~$0.04 to $0.10 effective
Imagen 3	Google DeepMind	Diffusion	Good	Very good	No	$0.04 (Vertex)
Imagen 4	Google DeepMind	Diffusion	Very good	Excellent	No	$0.03 to $0.06
Stable Diffusion 3	Stability AI	Multimodal diffusion transformer	Fair	Good	No	Free (open weights)
Stable Diffusion 3.5 Large	Stability AI	MMDiT (8B parameters)	Good	Very good	No	Free (open weights)
Recraft V3	Recraft	Diffusion (vector-aware)	Excellent	Good	No	$0.04
Ideogram 3	Ideogram	Diffusion (text-rendering specialty)	Excellent	Very good	No	$0.06 to $0.08
Reve Image 1.0	Reve	Diffusion	Very good	Very good	No	$0.04

At launch, GPT Image 1 ranked first on the Artificial Analysis Image Arena leaderboard, a crowd-sourced benchmark that computes an Elo score from blind preference votes. FLUX1.1 Pro from Black Forest Labs scored comparably in photorealism benchmarks and is generally preferred for tasks where raw visual quality and anatomical accuracy are the priority. DALL-E 3 remains available through the OpenAI API and ChatGPT for legacy compatibility, but is largely superseded by GPT Image 1 for most use cases. OpenAI announced that DALL-E 2 and DALL-E 3 API access would be deprecated on May 12, 2026, with new applications encouraged to migrate to gpt-image-1.^[22]
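For readers unfamiliar with how an arena leaderboard turns blind preference votes into a ranking, the standard Elo update is sketched below. The leaderboard's exact scoring method is not described here, so the K-factor of 32 and the starting rating of 1000 are hypothetical values chosen only for illustration.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(
    rating_a: float, rating_b: float, a_won: bool, k: float = 32.0
) -> Tuple[float, float]:
    """Apply one blind preference vote and return the two new ratings."""
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level; model A wins three votes in a row.
a, b = 1000.0, 1000.0
for _ in range(3):
    a, b = update(a, b, a_won=True)
    print(f"A={a:.1f}  B={b:.1f}")
```

Each successive win moves the ratings by less than the one before, because the higher-rated model is already expected to win.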

Imagen 3 and Imagen 4, Google DeepMind's image models, produce highly photorealistic output and have strong text rendering, though they are primarily accessible through Google's Vertex AI platform and Gemini products rather than as a standalone consumer feature. In head-to-head comparisons, Imagen 4 holds an advantage on certain photorealism tests, while GPT Image 1 holds an advantage in conversational editing and instruction following.

Midjourney v7 retains a strong following for artistic and aesthetic work where users want the model to exercise significant creative interpretation. GPT Image 1 is generally less opinionated aesthetically, following instructions more literally, which suits professional production workflows but can produce less surprising or artistically distinctive outputs than Midjourney for abstract creative prompts.

Reve, a newer entrant from a small team formerly at Adobe and Snap, achieved Image Arena leaderboard wins in late 2024 with a focus on prompt adherence at low cost. Ideogram 3 is the model most directly competitive with GPT Image 1 on text rendering specifically, and remains a common choice for users whose primary need is typography in images.

Reception

The immediate reception was defined more by the scale of adoption than by critical review. The 700 million images generated in the first week placed the launch among the fastest consumer technology adoptions ever recorded.^[8] TechCrunch, The Verge, CNN, and the Washington Post all covered the launch and the Ghibli trend within 24 hours.^[4]^[5]^[6]

Technical reviewers praised the text rendering improvement and the conversational editing workflow as genuine advances over existing tools.^[16] The native integration with ChatGPT's context window was identified as a structural advantage.^[15] The model's tendency toward a polished, smooth aesthetic drew criticism from designers who preferred the grittier output they could obtain from FLUX or Midjourney. Reviewers also noted that aggressive safety filters sometimes declined reasonable requests without clear explanation, and that the lack of seed control in ChatGPT made exact reproducibility difficult.

Enterprise adoption was faster than for any previous OpenAI image product.^[18] By the third quarter of 2025, OpenAI was reporting that gpt-image-1 had become the most-used image generation API on its platform, surpassing combined DALL-E 2 and DALL-E 3 traffic. Industry analysts also tracked an impact on creative labor. Freelance graphic design job postings declined by approximately 18% in the months following the launch, according to freelance platform tracking. Defenders argued the change represented a shift in the type of work designers do rather than a reduction in total demand. Critics argued the displacement was real and concentrated among entry-level designers.

Controversies

Style replication and copyright

The Studio Ghibli trend raised copyright questions that remained legally unresolved as of mid-2026.^[4] Under U.S. copyright law, artistic style is generally not protectable, and intellectual property attorneys quoted during the trend confirmed that generating Ghibli-style images does not straightforwardly constitute infringement.^[4] The deeper question is whether OpenAI's training data included copyrighted Ghibli films and, if so, whether training on that material constitutes fair use.^[5] Several related lawsuits brought by publishers and news organizations against OpenAI over training data remained ongoing.^[4]

The 2016 Miyazaki quote referred to a specific zombie animation demonstration rather than to text-to-image models like GPT Image 1, but it became the most widely shared statement of artistic objection to the trend.^[20] OpenAI's policy at launch allowed generation in "broader studio styles" while prohibiting imitation of "individual living artists."^[6] Critics questioned whether Miyazaki, as the creative director behind Ghibli's visual identity, qualified as an individual artist whose style should be protected, and whether the studio/individual distinction was coherent. The studio itself made no public statement and filed no legal action.^[6] OpenAI quietly tightened the model's response to certain Ghibli-specific prompts in the weeks after the launch, though Ghibli-style outputs remained achievable through descriptive prompting.^[6]

A broader criticism applied to GPT Image 1 and other large image models is that the model's ability to convincingly replicate specific artistic styles implies that the training data included large quantities of work in those styles, without compensation to or consent from the artists whose work was used.^[5] OpenAI has not released details of the training data composition for GPT Image 1, and the company faces several ongoing lawsuits from artists, publishers, and news organizations over its use of copyrighted material in training corpora.^[4]

Infrastructure strain and access inequality

The temporary rate limits imposed at launch, which primarily affected free-tier users, drew criticism from users who felt the rollout prioritized paying subscribers.^[7] On the API side, OpenAI's tiered access structure means that the highest rate limits for GPT Image 1 are available only to Tier 5 customers, with lower tiers facing substantially tighter limits.^[3]
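For API clients on lower tiers, the tighter limits described above typically surface as rate-limit errors. The following is a minimal sketch of client-side exponential backoff, not an OpenAI-provided mechanism: `RateLimited` and `call_with_backoff` are hypothetical names introduced here, and the delay values are arbitrary rather than published figures.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for whatever rate-limit error a client raises."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry with exponential backoff plus jitter on RateLimited."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles on each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; in production the default `time.sleep` would be used.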

Geographic restrictions

GPT Image 1 is not available in all countries. Users in mainland China, Russia, North Korea, and Iran cannot access the model through ChatGPT or the API, mirroring broader geographic restrictions on OpenAI products. The European Union received the model with a short delay relative to the US rollout, while the UK received it on the same schedule as the US.

Limitations

Several documented limitations affected GPT Image 1 in its initial release form. Some have been mitigated by later updates, but the underlying tendencies are still recognizable in current outputs.

Limitation	Description	Status as of mid-2026
Slow generation	High-quality outputs take 30 to 60 seconds	About 4x faster in GPT Image 1.5
Anatomy artifacts	Hands, fingers, and complex poses occasionally distorted	Improved in 1.5, still occasional
Non-Latin text	Chinese, Arabic, Hebrew text often malformed	Partial improvement, Latin scripts still strongest
Over-smoothing	Default aesthetic is polished and synthetic-looking	Mitigable with prompting; default unchanged
Color bias	Warm/amber bias in shadows and neutrals	Partial mitigation in 1.5
Inpainting seams	Mask boundaries sometimes visible	Partial improvement
Crowd scenes	Multiple faces in one frame frequently distorted	Slowly improving
No streaming, fine-tuning, or function calling	API limitations	Streaming added in late 2025 update
Knowledge-intensive accuracy	Medical, scientific diagrams sometimes incorrect	Tied to underlying model knowledge, slow to fix
Geographic blocks	Restricted in mainland China, Russia, Iran, North Korea	Unchanged

The model struggles with complex human poses, particularly when limbs overlap or when a scene requires multiple figures in close physical contact. Hand and finger artifacts have improved across versions but remain the most common defect in human-figure outputs. Crowd scenes with multiple faces frequently produce distorted results.

Text rendering accuracy drops considerably for scripts other than Latin-derived alphabets. Chinese, Arabic, and Hebrew text in particular showed unreliable results in early testing, with characters frequently incorrect or malformed. The improvements in GPT Image 1.5 narrowed this gap somewhat, but Latin scripts remain the most reliable.

The default aesthetic is smooth and polished. Fine-grained surface textures, grain, physical wear, and other visual imperfections that convey realism in photography are often absent. Designers chasing a documentary or photojournalistic feel typically prompt explicitly for film grain or use a different model. Reviewers also noted a warm color bias, with shadows tending toward amber or yellow rather than neutral; this can be corrected with prompting, but only if the user knows to look for it.
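The prompting workaround described above can be made repeatable by keeping the corrective phrases in one place. This is a small illustrative sketch: the `STYLE_HINTS` wording and the `build_prompt` helper are hypothetical and are not taken from OpenAI guidance, so the phrases would need tuning against real outputs.

```python
# Hypothetical prompt fragments; the wording is illustrative only.
STYLE_HINTS = {
    "grain": "visible 35mm film grain, natural surface texture and wear",
    "neutral_color": "neutral white balance, cool-to-neutral shadows, no amber cast",
}


def build_prompt(subject, hints=()):
    """Append the selected corrective style hints to a base subject description."""
    unknown = [h for h in hints if h not in STYLE_HINTS]
    if unknown:
        raise ValueError(f"unknown hints: {unknown}")
    parts = [subject.strip().rstrip(".")]
    parts.extend(STYLE_HINTS[h] for h in hints)
    return ", ".join(parts) + "."


print(build_prompt("Street market at dusk, documentary photograph", ["grain", "neutral_color"]))
```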

Mask-based inpainting can show visible discontinuities at the boundary between the edited area and the preserved original, particularly in high-frequency texture regions. The API does not support fine-tuning on custom datasets, so brands that want a model trained on their specific visual style continue to use Adobe Firefly or open-weight alternatives. The model is also less reliable for scientifically accurate illustrations, medical diagrams, or other domain-specific visuals where precise accuracy matters.
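One generic post-processing mitigation for visible mask boundaries is to feather the mask and blend the edited result back over the original, so the transition is spread across several pixels. The sketch below is a general image-compositing technique, not something the source attributes to OpenAI. It assumes single-channel pixel values stored as nested lists and a mask convention of 1 for edited pixels, both chosen here for illustration.

```python
def feather_mask(mask, radius=1):
    """Box-blur a binary mask (rows of 0/1) into a soft alpha map in [0, 1]."""
    height, width = len(mask), len(mask[0])
    alpha = []
    for y in range(height):
        row = []
        for x in range(width):
            total = count = 0
            # Average the mask over a (2*radius+1)^2 window, clipped at the image edges.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += mask[ny][nx]
                    count += 1
            row.append(total / count)
        alpha.append(row)
    return alpha


def composite(original, edited, mask, radius=1):
    """Blend edited pixels over the original using the feathered mask as alpha."""
    alpha = feather_mask(mask, radius)
    return [
        [a * e + (1 - a) * o for o, e, a in zip(orow, erow, arow)]
        for orow, erow, arow in zip(original, edited, alpha)
    ]
```

A larger `radius` gives a wider, softer transition at the cost of letting some of the original image show through near the edge of the edited region.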

Successors and updates

OpenAI has updated GPT Image 1 several times since the initial release. The two most prominent post-launch versions are GPT Image 1 Mini and GPT Image 1.5.

Variant	Release date	Key changes
GPT Image 1 (initial)	April 23, 2025	Initial API launch, following the March 25, 2025 ChatGPT integration
GPT Image 1 (transparent backgrounds)	June 2025	Added `background: transparent`, WebP output
GPT Image 1 Mini	October 6, 2025	Distilled smaller variant, ~80% cheaper
GPT Image 1 (streaming preview)	November 2025	Streaming partial-image previews in API
GPT Image 1.5	December 16, 2025	~4x faster, ~20% cheaper, anatomy and color fixes
GPT Image 1.5 (extended ratios)	February 2026	Added widescreen and tall mobile aspect ratios
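As a rough illustration of how the transparent-background and WebP options in the table might be combined in a request, the sketch below assembles the keyword arguments for a generation call. The `build_image_request` helper is hypothetical. The `background` value comes from the table; `output_format` and `size` are assumed parameter names based on the public Images API and should be checked against the current reference.

```python
def build_image_request(prompt, transparent=False, output_format="png", size="1024x1024"):
    """Assemble keyword arguments for an image generation call (sketch only)."""
    if transparent and output_format not in ("png", "webp"):
        # JPEG has no alpha channel, so a transparent background cannot be encoded.
        raise ValueError("transparent backgrounds need png or webp output")
    request = {
        "model": "gpt-image-1",
        "prompt": prompt,
        "size": size,
        "output_format": output_format,
    }
    if transparent:
        request["background"] = "transparent"
    return request


request = build_image_request("Flat vector logo of a paper crane", transparent=True, output_format="webp")
print(request)
```

With the official `openai` Python package, a dictionary like this would typically be unpacked into `client.images.generate(**request)`; that call is not made here so the example runs without credentials.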

GPT Image 1 Mini was announced on October 6, 2025, during OpenAI DevDay.^[27] It is approximately 80% less expensive than the standard model while retaining most core capabilities, targeting high-volume applications where cost efficiency outweighs peak quality.^[27] Mini-tier pricing puts a low-quality 1024x1024 image at roughly $0.0025 and a medium-quality square image at about $0.008.^[27]
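To make the Mini pricing concrete, the following sketch estimates the cost of a batch using only the two per-image figures quoted above. The `estimate_batch_cost` helper and the batch sizes are hypothetical, and the estimate ignores any input-token charges or other sizes and quality levels.

```python
# Per-image Mini prices for 1024x1024 output, taken from the figures quoted above (USD).
MINI_PRICE_USD = {"low": 0.0025, "medium": 0.008}


def estimate_batch_cost(counts):
    """Return the estimated USD cost for a dict of {quality: image_count}."""
    return sum(MINI_PRICE_USD[quality] * n for quality, n in counts.items())


# 10,000 low-quality plus 2,000 medium-quality images.
print(f"${estimate_batch_cost({'low': 10_000, 'medium': 2_000}):.2f}")  # prints $41.00
```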

GPT Image 1.5 was released on December 16, 2025.^[28] OpenAI stated that image inputs and outputs were "20% cheaper than in GPT Image 1," and the update made generation approximately four times faster.^[28] The release partially addressed the warm color bias and made noticeable progress on hand and finger rendering, though crowd scenes remain weak.^[28]

OpenAI's broader frontier work on GPT-5 and the Sora 2 video model has spilled over into the image side. Internal documentation cited by reporters in early 2026 suggests that the next major image generation update will be branded as part of the GPT-5 family rather than continuing the GPT Image numbering, though as of mid-2026 GPT Image 1.5 remains the production model.

References

OpenAI. "Introducing 4o Image Generation." OpenAI Blog, March 25, 2025. https://openai.com/index/introducing-4o-image-generation/
OpenAI. "Introducing our latest image generation model in the API." OpenAI Blog, April 23, 2025. https://openai.com/index/image-generation-api/
OpenAI. "GPT Image 1 Model Documentation." OpenAI API Docs. https://developers.openai.com/api/docs/models/gpt-image-1
Metz, Cade, and Nico Grant. "OpenAI's viral Studio Ghibli moment highlights AI copyright concerns." TechCrunch, March 26, 2025. https://techcrunch.com/2025/03/26/openais-viral-studio-ghibli-moment-highlights-ai-copyright-concerns/
Milmo, Dan. "ChatGPT's viral Studio Ghibli-style images highlight AI copyright concerns." CNN, March 27, 2025. https://www.cnn.com/2025/03/27/style/chatgpt-studio-ghibli-ai-images-intl-hnk/index.html
Bass, Dina. "AI generated Ghibli images go viral as OpenAI loosens its rules." Washington Post, March 28, 2025. https://www.washingtonpost.com/technology/2025/03/28/chatgpt-ghibli-ai-images-copyright/
Brewster, Thomas. "Sam Altman says ChatGPT's Studio Ghibli-style images are 'melting' OpenAI's GPUs." Fortune, March 28, 2025. https://fortune.com/2025/03/28/sam-altman-chatgpt-gpus-melting-ai-images/
GIGAZINE. "130 million people use ChatGPT's new image generation feature to generate over 700 million images in a week." April 4, 2025. https://gigazine.net/gsc_news/en/20250404-chatgpt-users-generated-over-700m-images/
Robison, Greg. "Tokens Not Noise: How GPT-4o's Approach Changes Everything About AI Art." Medium, 2025. https://gregrobison.medium.com/tokens-not-noise-how-gpt-4os-approach-changes-everything-about-ai-art-99ab8ef5195d
Marktechpost. "Transformer Meets Diffusion: How the Transfusion Architecture Empowers GPT-4o's Creativity." April 6, 2025. https://www.marktechpost.com/2025/04/06/transformer-meets-diffusion-how-the-transfusion-architecture-empowers-gpt-4os-creativity/
Cursor IDE Blog. "GPT-image-1 Pricing Guide 2025: Complete Cost Breakdown for Developers." https://www.cursor-ide.com/blog/gpt-image-1-pricing-guide-2025
OpenAI. "OpenAI API Pricing." https://openai.com/api/pricing/
GPT Image Wikipedia article. https://en.wikipedia.org/wiki/GPT_Image
Aman's AI Journal. "GPT-4o Native Image Generation." https://aman.ai/primers/ai/gpt4o-native-image-generation/
The AI Enterprise. "ChatGPT-4o's New Image Capabilities: Beyond Diffusion." https://www.theaienterprise.io/p/chatgpt-new-image-capabilities-beyond-diffusion
OpenAI Tools Hub. "GPT Image vs DALL-E 3: Real Differences." https://www.openaitoolshub.org/en/blog/gpt-image-vs-dall-e
MindStudio. "What Is GPT Image 1? OpenAI's Native Image Generation Model." https://www.mindstudio.ai/blog/what-is-gpt-image-1-openai
Inc. Magazine. "OpenAI's New Image-Gen Tech Is Now Available for Businesses." https://www.inc.com/ben-sherry/openais-new-image-gen-tech-is-now-available-for-businesses-and-some-are-already-using-it/91180264
TechRadar. "OpenAI unveiled image generation for 4o: everything you need to know." https://www.techradar.com/news/live/openai-march-25-livestream-event
NHK. Documentary footage of Hayao Miyazaki on AI animation, originally aired 2016 (footage widely re-circulated in March 2025).
Coalition for Content Provenance and Authenticity. "C2PA Specification." https://c2pa.org/specifications/specifications/2.0/index.html
OpenAI. "DALL-E 2 and DALL-E 3 API deprecation notice." OpenAI Help Center, 2026.
Microsoft. "Microsoft Copilot adopts gpt-image-1 as default image model." Microsoft Blog, April 2025.
Adobe. "Firefly Boards introduces external model support, including gpt-image-1." Adobe Blog, April 23, 2025.
Canva. "AI image generation in Canva: gpt-image-1 integration." Canva Newsroom, May 2025.
Figma. "AI in Figma: image generation integration." Figma Blog, April 2025.
OpenAI. "GPT Image 1 Mini at OpenAI DevDay 2025." OpenAI Blog, October 6, 2025.
OpenAI. "GPT Image 1.5 rolling out in the API and ChatGPT." OpenAI Blog and Developer Community, December 16, 2025. https://community.openai.com/t/gpt-image-1-5-rolling-out-in-the-api-and-chatgpt/1369443
Altman, Sam. Comments at OpenAI livestream announcing 4o image generation, March 25, 2025 (reported by Fortune). https://fortune.com/2025/03/28/sam-altman-chatgpt-gpus-melting-ai-images/
