Imagen 4 is the fourth-generation text-to-image model developed by Google DeepMind, announced on May 20, 2025, at Google I/O 2025. The model introduces native 2K resolution output, substantially improved text rendering and typography, and enhanced photorealism across fine details such as fabric textures, water reflections, and animal fur. It is available in three variants -- Imagen 4 (standard), Imagen 4 Ultra, and Imagen 4 Fast -- through the Gemini API, Google AI Studio, Vertex AI, and directly within the Gemini app, Google Workspace applications, and the experimental Whisk tool. All outputs carry an invisible SynthID digital watermark. All three variants reached general availability on August 14, 2025.
The model is positioned as Google DeepMind's most capable image generation system at the time of release, competing directly with FLUX 2, GPT Image 1 from OpenAI, Midjourney V7, and other frontier text-to-image systems. Independent benchmark evaluations place Imagen 4 Ultra among the top-ranked models globally for photorealism and prompt adherence, while the standard and Fast variants offer a cost-efficient middle tier for high-volume commercial applications.
The Imagen series began at Google Brain with the publication of "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding" in May 2022. The original Imagen model used a two-stage architecture: a transformer-based language model (T5-XXL) for text encoding, followed by a cascade of diffusion models that generated images at 64x64 pixels and then progressively upsampled to 256x256 and finally 1024x1024 resolution. The approach demonstrated strong alignment between text prompts and image outputs, and the original Imagen outperformed DALL-E 2 on human preference evaluations such as the DrawBench benchmark. Google did not release Imagen 1 publicly, offering only a limited demonstration website.
In April 2023, Google Brain and Google DeepMind merged into a single organization under Demis Hassabis. Development of the Imagen series transferred to the combined entity, marking a shift in the organizational home of the research.
Imagen 2 was announced in December 2023 and represented the first version of the series made broadly available to developers, initially through Vertex AI. The standout capability in Imagen 2 was text and logo generation: the model could render coherent, legible text strings within generated images, a task that prior diffusion models handled poorly. Imagen 2 also introduced watermarking via SynthID and was integrated into Workspace applications including Slides and Docs. Researchers at Google DeepMind noted that Imagen 2 was trained on a corpus filtered for quality, PII, and duplicates, and that a portion of training captions were generated synthetically.
Imagen 3 was released in August 2024 and further advanced the series on photorealism, prompt adherence, and art style diversity. Google described it as generating brighter, better-composed images with richer details and textures, and extended its capability to a wider range of art styles from photorealism to impressionism, abstract, and anime-influenced illustration. Imagen 3 improved upon Imagen 2 on fine spatial details and lighting, and on the GeckoNum benchmark for object counting and spatial reasoning it outperformed DALL-E 3 by 12 percentage points. Imagen 3 became the baseline against which Imagen 4's improvements were measured.
Imagen 4 was announced publicly on May 20, 2025, during the Google I/O 2025 keynote. Google described the model as its "highest quality text-to-image" system and highlighted three headline capabilities: native 2K resolution generation, substantially better text rendering and typography, and up to 10x faster generation in its Fast variant relative to Imagen 3.
At announcement, Imagen 4 was immediately made available in the Gemini app for paid Gemini Advanced subscribers, within the experimental Whisk image generation tool, and across Google Workspace applications in Slides, Vids, and Docs.
On June 24, 2025, Google launched Imagen 4 and Imagen 4 Ultra into paid preview through the Gemini API and Google AI Studio for developers, at $0.04 and $0.06 per image respectively.
On August 14, 2025, all three variants -- Imagen 4, Imagen 4 Ultra, and Imagen 4 Fast -- reached general availability on both the Gemini API and Vertex AI, with the model IDs imagen-4.0-generate-001, imagen-4.0-ultra-generate-001, and imagen-4.0-fast-generate-001 stabilized for production use.
Imagen 4 ships as a family of three models, each tuned for different points on the quality-speed-cost tradeoff.
The standard variant, accessed via the model ID imagen-4.0-generate-001, is the primary general-purpose offering. It supports native generation up to 2048x2048 pixels (the 2K capability) and can produce images at resolutions including 1024x1024, 896x1280, 1280x896, 768x1408, 1408x768, 2048x2048, 1792x2560, 2560x1792, 1536x2816, and 2816x1536, across aspect ratios of 1:1, 3:4, 4:3, 9:16, and 16:9. The standard model accepts prompts up to 480 tokens and can return between 1 and 4 images per request. Its rate limit on Vertex AI is 75 requests per minute per region. Pricing on both Vertex AI and the Gemini API is $0.04 per output image.
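The supported-resolution matrix above can be encoded as a small client-side lookup, useful for validating a request before paying for a generation. The sketch below is an illustration derived from the resolutions listed in this section, not an official SDK helper; the assumption that Imagen 4 Fast accepts the same aspect ratios up to a 1408-pixel long edge follows from the Fast variant's documented ceiling.

```python
# Output sizes listed for Imagen 4 (standard/Ultra), grouped by aspect ratio.
# Each aspect ratio has a 1K-class size and a 2K-class size.
SIZES = {
    "1:1":  [(1024, 1024), (2048, 2048)],
    "3:4":  [(896, 1280), (1792, 2560)],
    "4:3":  [(1280, 896), (2560, 1792)],
    "9:16": [(768, 1408), (1536, 2816)],
    "16:9": [(1408, 768), (2816, 1536)],
}

def supported(variant: str, width: int, height: int) -> bool:
    """Return True if (width, height) is a listed output size for the variant."""
    for sizes in SIZES.values():
        if (width, height) in sizes:
            # Fast has no 2K tier: reject anything over 1408 px on the long edge.
            if variant == "fast" and max(width, height) > 1408:
                return False
            return True
    return False
```

For example, `supported("fast", 2048, 2048)` is false, while the same size is valid for the standard and Ultra variants.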
The standard variant is positioned for production applications requiring high visual quality without the added cost of the Ultra tier. Use cases include marketing collateral, editorial illustration, branded content, and product imagery.
Imagen 4 Ultra, accessed via imagen-4.0-ultra-generate-001, is the premium flagship in the family. It shares the same resolution and aspect ratio support as the standard model but is trained with additional quality optimization to produce outputs with stronger prompt adherence, more precise rendering of complex details, and superior alignment with abstract or symbolic language in prompts.
Google positions Imagen 4 Ultra for professional applications such as advertising, branding, architectural visualization, and editorial photography, where the highest possible output quality justifies the additional cost. Generation times for Ultra are longer than the standard variant, typically in the 15-30 second range depending on resolution, and the rate limit is 30 requests per minute on Vertex AI -- lower than the standard tier. Pricing is $0.06 per output image. On the Artificial Analysis Image Arena quality leaderboard, Imagen 4 Ultra is consistently ranked among the top-tier models globally, close behind GPT Image 1 in overall preference votes according to evaluations conducted through 2025.
Notably, the MindStudio documentation for Imagen 4 Ultra reports that the model was trained using Google's sixth-generation Tensor Processing Units (Trillium TPUs) and leveraged synthetic captions generated by Gemini to improve prompt-following robustness. These captions were used alongside the filtered real-data corpus to ensure that the model could interpret not only literal descriptions but also metaphorical, stylistic, or mood-based prompt language.
Imagen 4 Fast, accessed via imagen-4.0-fast-generate-001, is a speed-optimized variant designed for rapid iteration, high-volume pipelines, and cost-sensitive use cases. Google describes it as "up to 10x faster than Imagen 3," with generation times typically in the 3-8 second range at standard resolutions. The Fast model supports the same set of aspect ratios as the other variants but is limited to lower resolutions -- up to 1408x768 or 768x1408 on the long edge -- and does not support the 2K output of the standard and Ultra models. Its rate limit is 150 requests per minute, the highest in the family.
Imagen 4 Fast is priced at $0.02 per output image, making it the most economical option in the Imagen 4 family and competitive with mid-tier offerings from other providers. It is positioned for applications such as real-time design prototyping, social media content at scale, automated illustration pipelines, and any workflow where latency and throughput matter more than maximum output quality.
Google DeepMind has not published a detailed technical paper for Imagen 4 at the time of general availability. Based on available documentation, third-party analysis, and the known lineage of the series, the following architectural characteristics have been confirmed or credibly reported.
Imagen 4 uses a latent diffusion model architecture, following the same broad design lineage as Imagen 3. In latent diffusion, the diffusion process operates in a compressed latent space rather than directly on pixel values, which reduces computational cost and enables generation of high-resolution images that would be prohibitively expensive to produce with pixel-space diffusion. The model employs a transformer-based language encoder (consistent with the T5 lineage established in Imagen 1) to process text prompts and produce conditioning representations that guide the denoising process.
Training used Google's TPU infrastructure, with the Ultra variant specifically reported to have leveraged Trillium (sixth-generation) TPU pods. The training corpus is described as having been aggressively filtered to remove low-quality images, PII, AI-generated synthetic imagery, and duplicate content. A portion of training captions were generated synthetically using Gemini, a technique that improves instruction-following by ensuring that captions describe images in natural-language terms similar to how users write prompts.
The architecture is implemented in JAX and Flax, consistent with Google DeepMind's standard ML framework for production model development. The model does not expose negative prompting as a user-facing feature, and does not support image editing, inpainting, outpainting, style transfer, or upscaling through the same endpoints -- these capabilities are absent from the Imagen 4 API as of general availability.
The 2K resolution capability reflects advances in the latent compression and upsampling stages, allowing the model to generate images at 2048x2048 or equivalent non-square resolutions in a single pass without a separate super-resolution postprocessing step.
Text rendering within generated images has historically been one of the most conspicuous weaknesses of diffusion-based text-to-image systems. Baseline diffusion models tend to generate plausible-looking letterforms but fail at spelling consistency, producing garbled words, repeated characters, or mixed-up letterforms even for short strings. Imagen 4 makes the most significant advance in this area of any Imagen generation.
Google describes the improvements as enabling "significantly better spelling, longer text strings, and new layouts." In practice, reviewers and independent testers report that Imagen 4 can reliably render complete sentences on product labels, posters, greeting cards, menus, and comic book panels -- use cases that required manual post-editing with every prior version of the series. The AllAboutAI review awarded the text rendering capability a 5 out of 5 rating, describing the results as "the best I've seen from any AI model so far" in a head-to-head evaluation.
The improvements are attributable to several interacting factors. The larger training corpus with higher-quality synthetic captions gave the model more examples of real-world typography and its associated textual descriptions. The higher base resolution of training and inference reduces the effective pixel-per-character density problem that caused letterforms to degrade at 512x512 outputs. The transformer-based text encoder already encodes the correct sequence of characters and passes strong conditioning signals; the diffusion backbone in Imagen 4 appears better at propagating those character-level constraints through to pixel generation.
Despite the substantial progress, limitations remain. Tiny fonts at small physical sizes, text with complex wrapping or circular layout, and text embedded in highly detailed or busy backgrounds can still produce errors. Non-Latin scripts, while supported for prompts in Chinese, Hindi, Japanese, Korean, Portuguese, and Spanish, have less reliable text rendering in outputs compared to Latin-alphabet strings.
Imagen 4 and Imagen 4 Ultra support native generation at up to 2048 pixels on the long edge, including true 2048x2048 square images and extended rectangular formats such as 2816x1536 and 1536x2816. This is a meaningful step over the effective maximum resolutions of Imagen 3 and prior systems, which topped out at 1024x1024 for standard outputs.
Native 2K generation means that the model produces fine detail at full resolution in a single inference pass, without separate post-generation upscaling. At 2048x2048, outputs can be used directly in print materials, large-format displays, and high-resolution product imagery without the artifacts introduced by AI upscalers. For the 2K square format and the ultra-wide 2816x1536 landscape format, the model maintains composition balance and visual coherence across the full canvas -- a challenge that naive scaling approaches fail at because they cannot re-compose or re-render at the larger scale.
Imagen 4 Fast does not support 2K output, with its resolution ceiling at 1408x768 on the long edge. The pricing reflects this distinction: the Fast variant is cheaper in part because it does not pay the compute cost of 2K generation.
Outputs are delivered in PNG or JPEG format, with a maximum file size of 10MB per image. Each request can return between 1 and 4 images simultaneously.
Developers access Imagen 4 programmatically through the Gemini API using standard REST or SDK calls. The model IDs for the three variants are imagen-4.0-generate-001, imagen-4.0-ultra-generate-001, and imagen-4.0-fast-generate-001. The API reached paid preview on June 24, 2025, and general availability on August 14, 2025.
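A minimal request sketch is shown below. It assumes the predict-style REST shape used by Imagen models on the Gemini API (an `instances` list carrying the prompt and a `parameters` object carrying generation options) and a `bytesBase64Encoded` field in each prediction; the exact endpoint path and field names are assumptions to verify against the current API reference.

```python
import base64
import json

MODEL = "imagen-4.0-generate-001"
# Assumed predict-style endpoint shape; verify against the current API docs.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:predict"
)

def build_request(prompt: str, n: int = 1, aspect_ratio: str = "1:1") -> str:
    """Serialize a predict-style request body for Imagen 4."""
    if not 1 <= n <= 4:
        raise ValueError("Imagen 4 returns between 1 and 4 images per request")
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": n, "aspectRatio": aspect_ratio},
    }
    return json.dumps(body)

def save_images(response: dict, stem: str = "out") -> list[str]:
    """Decode base64 image bytes from a predict response and write files."""
    paths = []
    for i, pred in enumerate(response.get("predictions", [])):
        path = f"{stem}-{i}.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(pred["bytesBase64Encoded"]))
        paths.append(path)
    return paths
```

The serialized body would be POSTed to `ENDPOINT` with an API-key header; in practice the official Google SDKs wrap this exchange, so the sketch mainly shows the request and response shapes involved.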
Google AI Studio provides a browser-based testing environment where developers can experiment with the models without writing code. Free-tier access to Imagen 4 in Google AI Studio is limited to 2 images per minute across all three variants. Upgrading to a billing-linked account raises this to 10 images per minute (Tier 1) with no minimum spend. Google AI Studio usage itself is free; charges apply only when usage exceeds free quotas or when calling the paid Gemini API.
Official code samples and cookbooks are available through the Google Gemini GitHub repository.
Vertex AI is Google Cloud's managed ML platform and provides enterprise-grade access to Imagen 4 with additional governance, security, and compliance features. The three model IDs are the same as in the Gemini API. Rate limits on Vertex AI are 75 requests per minute for the standard variant, 30 for Ultra, and 150 for Fast, per region. Provisioned throughput is available for enterprise customers requiring higher sustained throughput.
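These per-minute limits can be respected client-side with a simple sliding-window throttle so that a pipeline never submits faster than the quota allows. The sketch below is generic rate-limiting code, not part of any Google SDK; only the RPM figures come from the limits documented above.

```python
import time
from collections import deque

class MinuteThrottle:
    """Block until a request can be sent without exceeding `rpm` requests/minute."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.stamps = deque()  # send times within the last 60 seconds

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window.
        while self.stamps and now - self.stamps[0] >= 60:
            self.stamps.popleft()
        if len(self.stamps) >= self.rpm:
            # Wait until the oldest in-window request expires, then retry.
            time.sleep(60 - (now - self.stamps[0]))
            return self.acquire()
        self.stamps.append(now)

# Throttles matching the documented per-region Vertex AI limits.
THROTTLES = {
    "imagen-4.0-generate-001": MinuteThrottle(75),
    "imagen-4.0-ultra-generate-001": MinuteThrottle(30),
    "imagen-4.0-fast-generate-001": MinuteThrottle(150),
}
```

Calling `THROTTLES[model_id].acquire()` before each request keeps a single-process pipeline under quota; multi-worker deployments would need a shared counter instead.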
Vertex AI access includes the full suite of Responsible AI controls described in the Safety Filters section, including configurable safety thresholds, person generation controls, and audit logging. Enterprise customers can request modified safety thresholds through their Google Cloud account team.
Imagen 4 powers image generation in the Gemini consumer app, available to Gemini Advanced and AI Ultra subscribers. The Gemini app provides a conversational interface for prompt-driven image creation without requiring API credentials.
Across Google Workspace, Imagen 4 is integrated into Google Slides, Google Docs, and Google Vids for generating visuals directly within productivity workflows. Users can generate custom images, backgrounds, and illustrative elements without leaving the Workspace environment. This integration was announced as generally available at Google I/O 2025, coinciding with the model announcement.
Whisk is a Google Labs experimental tool that uses Imagen 4 as its generation backbone. Whisk allows users to combine reference images for subject, scene, and style, then generate new variations through a visual drag-and-drop interface. The tool is available as a free experiment through labs.google.
ImageFX, Google's standalone public text-to-image demo site at aitestkitchen.withgoogle.com, was updated to use Imagen 4 after the May 2025 announcement, providing free-tier access for casual users who want to test the model without creating developer accounts.
Imagen 4 is also accessible via third-party API aggregators including fal.ai and the Vercel AI Gateway, at pricing close to the native Gemini API rates. These platforms allow developers already using multi-provider SDKs to access Imagen 4 alongside models from other vendors through unified endpoints.
Imagen 4 uses a per-image pricing model across all access paths. The pricing is consistent between the Gemini API and Vertex AI.
| Variant | Model ID | Price per image | Rate limit (Vertex AI) |
|---|---|---|---|
| Imagen 4 Fast | imagen-4.0-fast-generate-001 | $0.02 | 150 RPM |
| Imagen 4 (Standard) | imagen-4.0-generate-001 | $0.04 | 75 RPM |
| Imagen 4 Ultra | imagen-4.0-ultra-generate-001 | $0.06 | 30 RPM |
There are no per-token input charges. The prompt (up to 480 tokens) is included in the per-image price. Requests returning 4 images are billed at 4x the single-image rate. Image upscaling, when available as a separate capability, is priced at $0.06 per image regardless of the source model.
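With per-image pricing and no input-token charge, batch costs reduce to a straightforward multiplication. A small estimator using the GA prices above:

```python
# Per-image GA prices (USD) for the three Imagen 4 variants.
PRICE_PER_IMAGE = {
    "imagen-4.0-fast-generate-001": 0.02,
    "imagen-4.0-generate-001": 0.04,
    "imagen-4.0-ultra-generate-001": 0.06,
}

def batch_cost(model_id: str, requests: int, images_per_request: int = 1) -> float:
    """Estimated cost in USD: each request bills per image returned (1-4)."""
    if not 1 <= images_per_request <= 4:
        raise ValueError("each request returns between 1 and 4 images")
    return round(requests * images_per_request * PRICE_PER_IMAGE[model_id], 2)
```

For instance, 1,000 Fast requests returning 4 images each is 4,000 images at $0.02, or $80.00 -- versus $240.00 for the same volume on Ultra.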
For comparison, DALL-E 3 via the OpenAI API is priced at approximately $0.040-$0.080 per image depending on resolution and quality setting. FLUX 2 Pro on the fal.ai platform is priced at approximately $0.05 per image. Midjourney does not offer a public API; it is subscription-only via Discord. The Imagen 4 Fast tier at $0.02 per image is among the lowest prices for a frontier-quality text-to-image API at the time of its launch.
A free tier exists for experimental access: Google AI Studio provides Imagen 4 generation at no charge to users with a Google account, limited to 2 images per minute across the variants -- a reduced limit applied in December 2025 as an abuse-mitigation measure.
All Imagen 4 variants are scheduled for discontinuation on June 30, 2026, at which point Google recommends migrating to gemini-2.5-flash-image, the successor model that integrates image generation directly into Gemini 2.5.
The text-to-image model landscape in mid-2025 includes several high-capability systems competing on photorealism, prompt adherence, text rendering, speed, and price. The table below summarizes the principal competitors at the time of Imagen 4's general availability.
| Model | Developer | Price per image | Resolution | Text rendering | Notes |
|---|---|---|---|---|---|
| Imagen 4 Ultra | Google DeepMind | $0.06 | Up to 2048x2048 | Excellent | Top-tier photorealism; Elo near top of leaderboards |
| Imagen 4 (Standard) | Google DeepMind | $0.04 | Up to 2048x2048 | Very good | Best value for high-resolution output |
| Imagen 4 Fast | Google DeepMind | $0.02 | Up to 1408x768 | Good | 10x faster than Imagen 3; lowest price in class |
| GPT Image 1 | OpenAI | $0.04-$0.08 | Up to 1024x1024 | Excellent | Best text rendering overall; strong prompt adherence |
| FLUX 2 Pro | Black Forest Labs | ~$0.05 | Up to 1440x1440 | Good | Elo ~1,265, near the top of leaderboards; fast generation |
| Midjourney V7 | Midjourney | Subscription only | Up to 2048x2048 | Moderate | Best for stylized/artistic output; no API |
| DALL-E 3 | OpenAI | $0.040-$0.080 | Up to 1024x1024 | Good | Integrated with ChatGPT; superseded by GPT Image 1 |
GPT Image 1 from OpenAI, released in April 2025, is Imagen 4's primary competitor in the enterprise API segment. Both models excel at photorealism and text rendering. Independent evaluations from mid-2025 consistently place GPT Image 1 (especially its high-quality variant) at or near the top of quality leaderboards, with Imagen 4 Ultra ranked close behind. GPT Image 1 has an advantage in dense text, small-scale letterforms, and complex typographic layouts. Imagen 4 Ultra holds an advantage in photorealistic skin tones and fine material textures, and offers native 2K output at a lower price point than GPT Image 1's high-quality tier. Both models apply mandatory watermarking to outputs.
FLUX 2, developed by Black Forest Labs and released in late 2024, uses a flow-matching transformer architecture rather than conventional latent diffusion, and consistently ranks at or near the top of the Artificial Analysis Image Arena Elo leaderboard alongside GPT Image 1. FLUX 2 Pro and FLUX 2 [max] generate images at high quality with relatively fast inference. Imagen 4 Ultra competes closely with FLUX 2 on overall quality; FLUX 2 has an edge in speed and style variety, while Imagen 4 Ultra leads in photorealistic detail rendering and has the advantage of native 2K output. For developers already on Google Cloud infrastructure, Imagen 4's Vertex AI integration offers compliance and governance features that third-party FLUX deployments cannot match.
Midjourney V7, released in early 2025, remains the leading model for stylized, artistic, and aesthetically distinctive output. Midjourney's aesthetic quality -- particularly its rendering of lighting, mood, and abstract visual concepts -- is widely considered unmatched in subjective preference evaluations focused on artistic work. However, Midjourney does not offer a public API, requires a Discord subscription, and does not support the kind of programmatic, high-volume, or enterprise-integrated workflows that Imagen 4 targets. For photorealism and text rendering, Imagen 4 Ultra outperforms Midjourney V7; for artistic stylization and creative ambiguity, Midjourney remains the preferred choice for many creative professionals.
Every image generated by Imagen 4, regardless of variant or access method, carries an invisible SynthID digital watermark. SynthID was developed by Google DeepMind as a perceptually invisible, robust watermarking system for AI-generated media. For images, SynthID embeds a watermark directly into the pixel values during generation in a way that is imperceptible to human observers under normal viewing conditions but detectable by a trained classifier.
The watermark is designed to survive common image transformations including cropping, resizing, JPEG compression, color adjustments, and moderate blur. The SynthID-Image technical paper, published at ICLR 2025 (arXiv:2510.09263), describes the system as a post-hoc, model-independent approach: it was designed to operate as a wrapper around the generation pipeline rather than being embedded in the model weights themselves, which allows it to be applied to any generative model without retraining.
Detection of SynthID watermarks requires access to Google's detection classifier, which is not publicly available. Users cannot verify SynthID watermarks independently; verification is handled by Google's internal infrastructure. The watermark cannot be disabled or removed by API users, nor can outputs be obtained without it.
In addition to SynthID, Imagen 4 outputs carry C2PA (Coalition for Content Provenance and Authenticity) metadata labels, which mark images as AI-generated in a standardized schema supported by a broad coalition of media companies, hardware manufacturers, and software platforms. C2PA metadata can be stripped by tools that remove EXIF data, so it provides weaker persistence than the pixel-level SynthID watermark.
Imagen 4 incorporates a multi-layered content safety system that filters both inputs (prompts) and outputs (generated images). The system is described in Google's Responsible AI documentation for Imagen on Vertex AI.
The safety filter evaluates content across twelve categories.
For each category, the filter assigns a confidence score. Whether content is blocked depends on a configurable threshold: block_low_and_above (most restrictive), block_medium_and_above (default), or block_only_high (least restrictive, available by request). Enterprise customers on Vertex AI can request adjusted thresholds through their Google Cloud account team.
Regardless of safety threshold settings, certain content categories are always blocked and cannot be re-enabled by any configuration.
Imagen 4 enables person generation by default, including realistic depictions of faces. API users can configure the personGeneration parameter to allow all people (default), allow adults only (excluding minors), or disable person generation entirely. This control is separate from the main safety filter thresholds.
When content is blocked, the API can return a filtering reason if the includeRaiFilteredReason parameter is set. This allows developers to understand why a prompt was rejected and to modify their prompts accordingly, rather than receiving only a generic refusal. Independent testing has noted that the safety filters can be over-sensitive, occasionally blocking innocuous prompts that use words or concepts that incidentally match filter categories -- a known limitation shared with other major text-to-image systems.
Imagen 4's capabilities make it suited to a broad range of commercial and creative applications.
The combination of high resolution, photorealism, and improved text rendering makes Imagen 4 particularly effective for advertising creative production. Agencies can generate campaign imagery, product lifestyle shots, and branded graphic assets with embedded typographic elements (headlines, taglines, product names) without the post-production text overlay step that AI-generated advertising previously required. The Imagen 4 Fast variant supports rapid ideation and A/B testing at scale, while the Ultra variant serves final-asset production.
E-commerce merchants use Imagen 4 to generate product lifestyle images, seasonal campaign visuals, and scene compositions that would otherwise require expensive photography sessions. The model handles consistent material rendering -- fabric, leather, metal, glass -- with the level of quality required for product detail pages. Google's documentation and third-party guides specifically highlight food photography as a strength, citing the model's lighting consistency and material realism.
Media publishers, content platforms, and individual creators use Imagen 4 through the Gemini app and AI Studio to generate editorial illustrations, blog header images, social media visuals, and infographics. The Workspace integration allows Google Slides and Docs users to generate contextually relevant images directly within their documents, reducing the need to switch to external tools.
Design teams use Imagen 4 Fast for rapid concept visualization and mood boarding. The model can generate UI mockup backgrounds, architectural concept renders, brand identity explorations, and packaging designs quickly enough to support iterative feedback loops within design sprints.
Native 2K resolution enables use cases in print-format media where earlier AI models produced insufficient resolution for large-format output. Poster design, magazine layout, and book cover illustration are practical at 2048x2048 without requiring separate upscaling workflows.
Larger organizations access Imagen 4 through Vertex AI to build automated image generation pipelines with enterprise governance. Use cases include generating product images at scale for e-commerce catalogs, personalizing marketing images for different audience segments, and automating visual asset creation for global campaigns. The Vertex AI integration provides audit logging, VPC Service Controls, and data residency options that are not available through the consumer Gemini API.
The reception to Imagen 4 at launch was broadly positive, though more nuanced than Google's announcement framing implied.
The Verge and TechCrunch coverage at Google I/O 2025 highlighted the text rendering improvements as the most meaningful advance, noting that the ability to generate posters and greeting cards with correctly spelled text addresses a longstanding user frustration. Photorealism in fabric, lighting, and texture detail also drew praise from hands-on reviewers.
Independent evaluators on AllAboutAI scored Imagen 4 highly across text rendering (5/5) and prompt understanding, while noting that photorealism in outdoor backgrounds could still feel artificial and that complex crowd scenes lost facial distinctiveness. The Artificial Analysis Image Arena ranked Imagen 4 fifth among text-to-image models as of mid-2025 evaluations, behind GPT Image 1 (high quality), FLUX 2 [max], FLUX 2 Pro, and GPT Image 1.5.
Some reviewers, including coverage on Pollo AI, found Imagen 4 to be inconsistent relative to Imagen 3 in certain style categories, suggesting that the shift in training data composition or architecture may have introduced regressions in specific artistic modes while advancing others. Reddit community feedback was mixed: users praised the aspect ratio flexibility and text improvements while reporting occasional blurry or distorted results in face rendering and fine textures -- observations consistent with the model's documented limitations.
The pricing structure was received positively by developers, particularly the $0.02 Fast tier, which was seen as highly competitive for high-volume applications.
Despite the advances in Imagen 4, several limitations are documented in Google's own materials and confirmed through independent testing.
Imagen 4 continues to produce occasional anatomical errors in human subjects: extra or malformed fingers, distorted facial geometry, and inconsistent proportions in complex poses. These are known failure modes across all major diffusion-based text-to-image systems and represent an ongoing research challenge. The errors are less frequent than in Imagen 3 but are not eliminated.
The model struggles with precise spatial instructions and exact object counts. Requests for a specific number of objects ("exactly seven apples"), precise geometric arrangements ("a perfect grid of four by four tiles"), or strictly centered compositions produce inconsistent results. This reflects a fundamental limitation of diffusion models in learning countable and spatially exact representations.
Imagen 4's training and optimization prioritize photorealism and clean illustration over highly stylized or experimental art directions. For abstract, painterly, or aggressively stylized outputs, Midjourney V7 and FLUX 2 are generally preferred by creative users. The model's safety filters also limit certain artistic directions that involve ambiguously violent or mature visual elements.
While greatly improved, text rendering still fails on small-scale letterforms at high detail levels, non-Latin scripts in generated image content (as opposed to prompt input), text with circular or curved layouts, and multi-line wrapped text in non-standard containers. Dense typographic compositions that mix multiple font sizes and styles remain unreliable.
Imagen 4's API endpoints do not support image editing, inpainting, outpainting, or style transfer as of general availability. Developers who need to modify existing images must use separate systems or wait for future capabilities. This contrasts with GPT Image 1, which supports in-context editing, and with some FLUX 2 deployments that include inpainting functionality.
The safety filter system occasionally blocks innocuous prompts with no clear violation. Developers have reported that prompts involving medical terminology, historical events, or culturally specific visual concepts are rejected without actionable explanations, even with the includeRaiFilteredReason parameter enabled. The non-removable SynthID watermark and mandatory safety filters also constrain use cases in contexts where watermark-free outputs are required by downstream tools.
All three Imagen 4 variants are scheduled for discontinuation on June 30, 2026, a notably short runway of less than 11 months from the August 2025 GA date. Users building production systems on Imagen 4 will need to plan migration to gemini-2.5-flash-image or a successor model within this window.