GPT Image 2
Last reviewed: Jun 3, 2026
Sources: 12 citations
Review status: Source-backed
Revision: v1 · 1,661 words
GPT Image 2 (model name gpt-image-2, marketed inside the product as ChatGPT Images 2.0) is an image-generation model released by OpenAI on April 21, 2026. It is the company's first image model that "thinks" before it draws, running a reasoning pass to plan layout, spatial relationships, and text placement before any pixels are generated. It can also search the web mid-generation, produce up to eight images from a single prompt, and double-check its own output. [1][2][3] The model succeeds GPT Image 1 (March 2025) and the interim GPT Image 1.5 (December 2025). Within hours of release it took the top spot in every category of the independent Image Arena leaderboard, with a Text-to-Image lead that was the largest the leaderboard had recorded. [4][5]
The launch arrived in the middle of a competitive sprint. OpenAI had declared an internal "code red" in early December 2025 after Google's Gemini 3 and the viral Nano Banana image models cut into ChatGPT's lead, and CEO Sam Altman had named image generation a priority. [6] GPT Image 1.5 was the first counterpunch. GPT Image 2 was the heavier one, launching with Google's Nano Banana 2 as the model to beat. [4][7]
| Attribute | Detail |
|---|---|
| Developer | OpenAI |
| Model name | gpt-image-2 (product brand: ChatGPT Images 2.0) |
| Released | April 21, 2026; broad ChatGPT and Codex rollout April 22 [1][2] |
| Predecessor | GPT Image 1.5 (gpt-image-1.5, Dec 2025); earlier GPT Image 1 |
| Type | Native multimodal image generation with a reasoning step |
| Max resolution | Up to 2K in ChatGPT; higher resolutions (reported up to 4K) offered in beta via the API [2][3] |
| Aspect ratios | Several presets from ultra-wide 3:1 to ultra-tall 1:3 [1][3] |
| Modes | Instant (fast) and Thinking (reasoning, web search) [3] |
| Availability | All ChatGPT plans, Codex, and the API [1][2] |
| Knowledge cutoff | December 2025 [2] |
GPT Image 2 is a native multimodal generator, meaning the same model that understands a text prompt is the one that renders the picture, rather than handing the request off to a separate system. That design carries over from GPT Image 1 and 1.5. What changed is the addition of a reasoning step. Before generating, the model interprets what the user actually wants, considers spatial relationships, plans where text should sit, and works through the visual logic of the scene. [1][8] OpenAI frames the result as images that "feel less AI-generated," with a better sense of composition and visual taste. [1]
OpenAI declined to confirm the underlying architecture. It would not say whether gpt-image-2 is diffusion-based, autoregressive, or a hybrid, describing it instead as a new generalist model and, informally, as a "GPT for images." Several outlets noted the shift away from the earlier two-stage inference pipeline toward a single-pass generator, but the company has not published a technical paper or system card with architecture details as of this writing. [2][8]
The lineage runs from DALL-E to a native pipeline and now to a reasoning pipeline. The original DALL-E models, including DALL-E 3, worked as a separate diffusion service that ChatGPT called when a user asked for a picture. GPT Image 1 collapsed that separation by building generation directly into the GPT-4o model, so images and text shared one context. GPT Image 1.5 moved the same idea onto the GPT-5 model and made edits up to four times faster. GPT Image 2 keeps the native design and adds the thinking step plus web search. [9][3]
| Model | Released | Key change |
|---|---|---|
| DALL-E 3 | 2023 | Separate diffusion service invoked by ChatGPT |
| GPT Image 1 | Mar/Apr 2025 | Generation built natively into GPT-4o |
| GPT Image 1.5 | Dec 2025 | GPT-5 backbone, much faster editing |
| GPT Image 2 | Apr 2026 | Reasoning before generation, web search, up to 2K |
The most visible practical gap is text. DALL-E 3 routinely misspelled words in images. A widely shared comparison used a Mexican restaurant menu: the older model produced items like "enchuita" and "churiros," while GPT Image 2 rendered the menu cleanly. [2] The other big difference is agency. Where earlier models took a prompt and produced an image in one shot, GPT Image 2 can pause to research, batch several candidates, and verify the result against the prompt before returning it. [3][8]
Reasoning is the headline feature. In Thinking mode the model spends more time planning, which OpenAI says yields better instruction-following, more accurate object placement, and steadier composition than its predecessors. The same mode unlocks web search, so the model can pull real-time information into a generation, for example current facts for an infographic. [1][3][8]
Text rendering is the most cited concrete improvement. GPT Image 2 reads the prompt, plans the layout, and renders legible type across non-Latin scripts including Japanese, Korean, Chinese, Hindi, and Bengali, without the warped letters and invented glyphs that affected earlier generators. [1][2] Reviewers reported that it handles dense, text-heavy outputs such as full infographics, slides, maps, and multi-panel comics with surprising accuracy. [10] The model still has limits: very long blocks of text can break down after a few hundred characters, and fine details like exact logo geometry remain unreliable. [3]
On resolution and layout, ChatGPT outputs go up to 2K across several aspect ratios from 3:1 to 1:3, with the API offering higher resolutions in beta. [2][3] The model is built for editing as well as generation, handling small text, icons, UI elements, and subtle stylistic instructions, and it can produce up to eight images from one prompt. [1][3]
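The limits described above can be expressed as a small pre-flight check. This is a minimal sketch only: the function and constant names are hypothetical, and the bounds (aspect ratios between 3:1 and 1:3, at most eight images per prompt) are taken from the text, not from a published API schema. Because the sources describe presets, the real service may accept only specific ratios inside this range.

```python
from fractions import Fraction

# Bounds taken from the article text; names are illustrative, not an official API.
MAX_IMAGES_PER_PROMPT = 8          # "up to eight images from one prompt"
WIDEST_RATIO = Fraction(3, 1)      # ultra-wide 3:1
TALLEST_RATIO = Fraction(1, 3)     # ultra-tall 1:3


def check_request(width: int, height: int, n: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if width <= 0 or height <= 0:
        problems.append("width and height must be positive")
        return problems
    ratio = Fraction(width, height)
    if not (TALLEST_RATIO <= ratio <= WIDEST_RATIO):
        problems.append(f"aspect ratio {ratio} is outside 1:3 to 3:1")
    if not (1 <= n <= MAX_IMAGES_PER_PROMPT):
        problems.append(f"n={n} is outside 1 to {MAX_IMAGES_PER_PROMPT}")
    return problems


print(check_request(1536, 1024, 4))   # within both limits
print(check_request(4096, 1024, 9))   # too wide and too many images
```

A check like this only catches requests that contradict the published limits; it says nothing about which sizes a given plan or mode actually allows.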
ChatGPT Images 2.0 rolled out to all ChatGPT and Codex users beginning April 22, 2026, with API access opening at the same time. [1][2] Instant mode is free for everyone. Thinking mode, with its extended reasoning and web search, is reserved for paid subscribers on the Plus, Pro, Business, and Enterprise plans. [2][3]
In the API, gpt-image-2 uses token-based pricing that scales with quality and resolution. Reported figures put image input at about $8 per million tokens and output in the region of $30 per million tokens. A standard 1024 by 1024 image at high quality works out to roughly $0.21, with low and medium quality costing far less. [3] At larger sizes the per-image cost can fall below the equivalent on GPT Image 1.5, while at the standard 1024 by 1024 high-quality setting it runs somewhat higher. [3]
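The per-image figure follows from the token rates. As a rough sketch, assuming the reported rates and treating the roughly $0.21 price as entirely output tokens, one high-quality 1024 by 1024 image implies about 7,000 output tokens. That token count is back-calculated from the figures above, not a published number, and the function names are hypothetical.

```python
# Reported rates from the text above (USD per one million tokens).
INPUT_RATE_PER_M = 8.0
OUTPUT_RATE_PER_M = 30.0


def estimate_image_cost(output_tokens: int, input_tokens: int = 0) -> float:
    """Estimate the USD cost of one generation under token-based pricing."""
    input_cost = input_tokens * INPUT_RATE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE_PER_M / 1_000_000
    return input_cost + output_cost


def implied_output_tokens(price_usd: float) -> int:
    """Back out the output-token count implied by a per-image price."""
    return round(price_usd * 1_000_000 / OUTPUT_RATE_PER_M)


# Assumption: the ~$0.21 high-quality 1024x1024 price is all output tokens.
tokens = implied_output_tokens(0.21)
print(tokens)
print(round(estimate_image_cost(tokens), 2))
```

Edits that pass a reference image would add input tokens at the lower rate, so the real cost of an edit depends on how many tokens the input image consumes.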
The reception was driven by the Image Arena leaderboard run by Arena (formerly LMArena), which ranks models by blind human preference votes. Within about 12 hours of release, GPT Image 2 swept first place across all three Arena categories: Text-to-Image, Single-Image Edit, and Multi-Image Edit. In Text-to-Image it scored 1512, ahead of the next model, Nano Banana 2 at roughly 1271, a lead of about 242 Elo points that Arena called the largest gap it had ever seen. [4][5][7] Coverage citing Arena's data put the model's blind-comparison win rate near 93 percent. [7]
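To put the gap in perspective, the textbook Elo expected-score formula converts a rating difference into a head-to-head win probability. The sketch below uses the standard logistic form with a 400-point scale; whether Arena computes its scores exactly this way is an assumption. The rounded scores quoted above give a difference of 241, consistent with the reported lead of about 242.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as reported above; the runner-up figure is approximate.
gpt_image_2 = 1512
nano_banana_2 = 1271

gap = gpt_image_2 - nano_banana_2
p = elo_expected_score(gpt_image_2, nano_banana_2)
print(gap)
print(round(p, 2))
```

Under this model the lead corresponds to winning roughly four of every five blind pairings against the runner-up alone. That is a different quantity from an overall win rate, which averages over matchups against many opponents.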
The picture on Artificial Analysis, which runs its own image arena, is more measured. Later snapshots there placed a gpt-image-2 variant well inside the top tier but not as a runaway leader, a reminder that different leaderboards use different prompts, voters, and quality settings. [11] For context, GPT Image 1.5 had also reached the top of the Arena in December 2025 yet never caught on with consumers, so a high ranking did not by itself guarantee adoption. [7]
Press coverage was broadly positive and focused on two things: the thinking step and text. TechCrunch led with how good the model is at generating text, the weakness that had defined AI image tools. [2] The New Stack and The Decoder framed the launch around the idea that OpenAI now reasons before it draws, putting gpt-image-2 in the same conceptual lane as Google's Nano Banana Pro, which had popularized clean in-image text months earlier. [8][12] VentureBeat described near-flawless multilingual text, infographics, slides, maps, and even manga. [10]
The competitive subtext was hard to miss. Reporting tied GPT Image 2 directly to the December "code red" and the race with Google, casting it as the product that finally let OpenAI reclaim attention in image generation after Nano Banana's viral run. [6][7] OpenAI followed the launch by retiring its older DALL-E 2 and DALL-E 3 models in May 2026, closing the book on the diffusion-service era that GPT Image 2 had superseded. [7]