ControlNet

Updated Jun 22, 2026

ControlNet is a neural network architecture that adds spatial and structural control to large pretrained text-to-image diffusion models. It was introduced in a paper submitted on 10 February 2023 by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala at Stanford University, titled "Adding Conditional Control to Text-to-Image Diffusion Models" (arXiv:2302.05543) ^[1]. The architecture lets users constrain image generation with auxiliary inputs such as Canny edge maps, depth maps, human pose skeletons, semantic segmentation masks, or scribbles, in addition to the usual text prompt. As the abstract puts it, ControlNet "locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls" ^[1]. It is most commonly paired with Stable Diffusion, one of the first widely available open-weights text-to-image models.

ControlNet works by freezing the pretrained diffusion U-Net and training a parallel copy of its encoder, connected back to the frozen network through "zero convolutions": 1x1 convolution layers whose weights and biases are initialized to zero. The zero initialization keeps the auxiliary branch from disrupting the base model at the start of training, which lets ControlNet be fine-tuned reliably even on relatively small datasets; the authors report that training is robust with datasets both smaller than 50,000 images and larger than 1 million images ^[1]. The paper received the Marr Prize (Best Paper Award) at the International Conference on Computer Vision (ICCV) 2023, one of two papers to share the honor that year ^[2]^[3].

What problem does ControlNet solve?

Diffusion models, originally proposed by Sohl-Dickstein and colleagues in 2015 and refined into Denoising Diffusion Probabilistic Models by Ho et al. in 2020, learn to generate samples by reversing a gradual noising process applied to training data ^[4]. By 2022 they had become the dominant approach to text-to-image generation, powering systems including DALL-E 2, Google's Imagen, and Stable Diffusion. Stable Diffusion, released in August 2022 by Rombach and colleagues at the CompVis group with collaborators at Runway and Stability AI, used a latent diffusion formulation that ran the denoising process in a compressed VAE latent space rather than at full pixel resolution ^[14]. Crucially, its model weights were released openly, which let researchers and hobbyists experiment with fine-tuning, custom samplers, and downstream tools.

Text prompts alone gave impressive results, but they were a blunt instrument when an artist or designer needed precise spatial control. A prompt could request "a cat sitting on a striped rug," yet the cat's pose, the rug's perspective, and the camera angle remained at the model's discretion. Earlier customization techniques addressed identity and style rather than layout. Textual inversion, proposed by Gal et al. in 2022, learned new word embeddings from a handful of reference images. DreamBooth, from Ruiz et al. at Google, fine-tuned the diffusion weights to bind a unique token to a specific subject. LoRA (Low-Rank Adaptation) injected small trainable matrices into the cross-attention layers to teach a style or character with minimal parameters. None of these methods provided pixel-level spatial conditioning, so they could not, for example, force a generated person to match a specific pose or a generated room to match a specific architectural plan.

Classifier-free guidance and image-to-image initialization gave partial control. The img2img mode in Stable Diffusion accepts a reference image, encodes it to latents, adds noise, and then denoises with a prompt; this preserves rough composition but blurs and distorts geometry, especially at high denoising strength. ControlNet was designed to fill this gap by accepting structured visual inputs alongside the prompt and respecting their geometry throughout the diffusion process ^[1].
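The noising step that img2img applies to the encoded reference can be written in the standard DDPM parameterization ^[4]. This is a sketch of the general formulation, not of any particular implementation; the symbols z_0 (the VAE latent of the reference image), z_t (the noised latent at timestep t), and \bar{\alpha}_t (the cumulative noise schedule) are notation introduced here.

```latex
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
```

A higher denoising strength corresponds to starting from a larger t, where \bar{\alpha}_t is smaller and less of z_0 survives, which is consistent with geometry degrading as the strength increases.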

How does ControlNet work?

The central design idea of ControlNet is to add a side branch to the denoising U-Net while leaving the original network's weights untouched. Given a pretrained diffusion U-Net with encoder, middle, and decoder blocks, ControlNet creates a trainable copy of the encoder and the middle block. The copy is initialized from the frozen weights, so it starts with the same representational capacity as the base model rather than from random noise ^[1].

The trainable copy receives the conditioning image (for example, a Canny edge map) as input. The conditioning image is first projected to the same spatial resolution as the model's noisy latent through a small four-layer convolutional preprocessor that ends in a stride-1 layer. The encoder copy then processes the combined signal. After each block, the output is passed through a zero convolution, a 1x1 convolution layer whose weights and biases are both initialized to zero. The zero-convolution outputs are added into the corresponding skip connections feeding the frozen U-Net's decoder.
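The behavior of a zero convolution can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the nested-list tensor layout, and the toy values are all assumptions made for illustration. It shows only that a 1x1 convolution with zero weights and biases produces an all-zero residual, so adding it to a frozen skip feature leaves that feature unchanged.

```python
"""Illustrative zero convolution: a 1x1 conv whose weights and bias start at zero."""


def make_zero_conv(in_channels, out_channels):
    # weights[o][i] and bias[o] all start at exactly zero, as described in the text.
    weights = [[0.0] * in_channels for _ in range(out_channels)]
    bias = [0.0] * out_channels
    return weights, bias


def conv1x1(feature_map, weights, bias):
    # feature_map is indexed [channel][row][col]; a 1x1 conv mixes channels per pixel.
    height, width = len(feature_map[0]), len(feature_map[0][0])
    out = []
    for o, row_w in enumerate(weights):
        plane = [
            [
                bias[o] + sum(row_w[i] * feature_map[i][r][c] for i in range(len(row_w)))
                for c in range(width)
            ]
            for r in range(height)
        ]
        out.append(plane)
    return out


def add_residual(frozen_skip, control_residual):
    # Element-wise sum, standing in for "added into the skip connection".
    return [
        [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(plane_a, plane_b)]
        for plane_a, plane_b in zip(frozen_skip, control_residual)
    ]


# Toy 2-channel, 2x2 feature maps (hypothetical values).
frozen_skip = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
control_features = [[[0.5, -1.0], [2.0, 0.0]], [[9.0, 9.0], [9.0, 9.0]]]

weights, bias = make_zero_conv(in_channels=2, out_channels=2)
residual = conv1x1(control_features, weights, bias)
combined = add_residual(frozen_skip, residual)
```

At initialization `combined` equals `frozen_skip` regardless of what the control branch computes; the branch starts to matter only once training moves the weights away from zero.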

Because the zero convolutions output exactly zero at initialization, the entire ControlNet branch contributes nothing on the very first training step. The frozen U-Net therefore produces the same output it would have produced without conditioning, and any gradient that flows back through ControlNet is well behaved rather than dominated by random noise. The authors describe these as "zero-initialized convolution layers that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning" ^[1]. Mathematically, if the frozen feature is y = F(x) and the ControlNet branch produces y_c = Z(F'(x + Z(c))), then at step zero Z(.) = 0 and y_c = 0, so the combined output y + y_c equals y exactly ^[1].
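The same argument can be spelled out with gradients. The sketch below reuses the notation of the paragraph above, with two additions made here for clarity: subscripts distinguish the input-side and output-side zero convolutions, and a single pixel of a 1x1 convolution is written as a linear map with weight matrix W and bias b.

```latex
\begin{aligned}
y_c &= \mathcal{Z}_2\big(\mathcal{F}'(x + \mathcal{Z}_1(c))\big),
\qquad y + y_c = \mathcal{F}(x) + y_c, \\
\mathcal{Z}(u)_o &= \sum_i W_{oi}\, u_i + b_o,
\qquad W = 0,\; b = 0 \ \text{at initialization}, \\
\frac{\partial \mathcal{Z}(u)_o}{\partial W_{oi}} &= u_i,
\qquad \frac{\partial \mathcal{Z}(u)_o}{\partial b_o} = 1,
\qquad \frac{\partial \mathcal{Z}(u)_o}{\partial u_i} = W_{oi} = 0 .
\end{aligned}
```

Because the gradient with respect to W equals the input u rather than W itself, the output-side zero convolution receives a nonzero update whenever its input is nonzero, and its weights leave zero after the first optimizer step. On that first step the gradient passed back into \mathcal{F}' and \mathcal{Z}_1 is zero, since it is multiplied by W = 0, so the trainable copy begins to change only from the second step onward.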

A characteristic behavior reported in the paper is the sudden convergence phenomenon: the model does not learn to follow the conditioning gradually during training. Instead, the network appears to ignore the control input for several thousand steps, then abruptly aligns its outputs to the condition, usually in fewer than 10,000 optimization steps. In the authors' Canny edge experiment the model fails to follow the condition for roughly the first 6,100 iterations, then at step 6,133 it suddenly begins generating images that closely match the input edge map ^[1]. The phenomenon is generally attributed to the zero convolutions, which need many gradient updates before their weights grow large enough to influence the decoder.

During training, the authors randomly replace 50% of text prompts with empty strings. This forces the ControlNet to rely on the visual condition alone for guidance and prevents the text branch from carrying too much of the burden, which improves the strength of spatial alignment.
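The prompt-dropping step is simple enough to sketch directly. This is an illustrative version, not the authors' training code: the function name `drop_prompts`, its parameters, and the example captions are assumptions.

```python
import random


def drop_prompts(prompts, drop_probability=0.5, rng=None):
    """Return a copy of `prompts` in which each entry is independently
    replaced by an empty string with probability `drop_probability`."""
    rng = rng or random.Random()
    return ["" if rng.random() < drop_probability else prompt for prompt in prompts]


# Hypothetical captions for one training batch.
batch = [
    "a cat sitting on a striped rug",
    "a red barn in a field",
    "a city street at night",
    "portrait of a sailor",
]
augmented = drop_prompts(batch, drop_probability=0.5, rng=random.Random(0))
```

Each caption is dropped independently, so a given batch may keep more or fewer than half of its prompts; the 50% figure holds on average over training.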

What conditioning types does ControlNet support?

The original ControlNet 1.0 release in February 2023 shipped eight conditioning modalities for Stable Diffusion 1.5, with additional modalities added in ControlNet 1.1 later that year ^[5]. Each modality is a separately trained ControlNet checkpoint paired with a preprocessor that converts an arbitrary input image into the expected condition. The paper itself tests "various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts" ^[1].

Modality	Preprocessor	Typical use
Canny edges	OpenCV Canny detector	Reproducing line structure of a reference photo
Hough lines (M-LSD)	Mobile Line Segment Detector	Architecture, interiors, perspective scenes
HED soft edges	Holistically-Nested Edge Detection	Painterly recoloring and stylizing
Sketch / scribble	Thinning + simplification	User-drawn input from rough strokes
Human pose	OpenPose body / hand / face keypoints	Pose-specified character generation
Semantic segmentation	ADE20K-style class map	Scene layout with explicit class regions
Depth	MiDaS monocular depth network	3D-aware composition, room layouts
Normal map	Computed from MiDaS depth	Surface-aware re-lighting and stylization
Anime line drawing	Manga-line preprocessor	Coloring of anime / cartoon line art

The Canny model uses the classic Canny edge detector from 1986. The pose model uses OpenPose, the body and hand keypoint estimator developed at Carnegie Mellon ^[16]. The depth model uses Intel's MiDaS monocular depth network ^[15]. The semantic segmentation model is trained against the ADE20K protocol, which defines 150 scene-parsing classes. Each preprocessor is shipped with the ControlNet repository so that users can run an off-the-shelf computer vision model on their reference image to generate the input ControlNet expects ^[5].
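As an example of what a preprocessor does, the table lists the normal map as being computed from depth. One generic way to derive normals from a depth grid is by finite differences, sketched below in pure Python. This is an assumption-laden illustration, not the preprocessor shipped with ControlNet: the function name, the `z_scale` parameter, and the border handling are all choices made here.

```python
import math


def normals_from_depth(depth, z_scale=1.0):
    """Estimate a unit surface normal per pixel from a 2D depth grid.

    Uses central differences with replicated borders, so gradients at the
    image edge are underestimated. `z_scale` must be positive; it controls
    how strongly depth changes tilt the normal.
    """
    height, width = len(depth), len(depth[0])
    normals = []
    for r in range(height):
        row = []
        for c in range(width):
            left = depth[r][max(c - 1, 0)]
            right = depth[r][min(c + 1, width - 1)]
            up = depth[max(r - 1, 0)][c]
            down = depth[min(r + 1, height - 1)][c]
            dzdx = (right - left) / 2.0
            dzdy = (down - up) / 2.0
            nx, ny, nz = -dzdx, -dzdy, z_scale
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / length, ny / length, nz / length))
        normals.append(row)
    return normals


# A flat surface faces the camera; a ramp tilts the normal against the slope.
flat = normals_from_depth([[5.0] * 3 for _ in range(3)])
ramp = normals_from_depth([[0.0, 1.0, 2.0] for _ in range(3)])
```

A real pipeline would typically also remap the three components into an RGB image before passing it to the ControlNet.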

How is ControlNet trained?

The authors train each ControlNet on a separate dataset that pairs natural images with the corresponding condition. Edge models are trained on roughly 3 million image-edge pairs, semantic segmentation models on roughly 164,000 COCO-Stuff samples (the ADE20K dataset itself contains far fewer images, on the order of 20,000), and pose models on roughly 80,000 image-keypoint pairs because labeled pose data is scarcer than auto-extracted edges. The paper demonstrates that the same architecture is robust across this range, training successfully on datasets smaller than 50,000 images and larger than 1 million images ^[1].

Training runs use the standard latent diffusion noise prediction loss, augmented only by the additional ControlNet branch. The authors report a single-condition training cost on the order of 600 NVIDIA A100 GPU hours per modality for the original 1.0 release, with finer-grained 1.1 retraining adding 200 to 2,160 GPU hours per checkpoint depending on dataset size and modality difficulty ^[5]. Compatibility was originally limited to Stable Diffusion 1.5, with SDXL checkpoints arriving later in 2023 from both the community and Stability AI.
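Written out, the noise prediction objective takes the following form in the paper's notation ^[1], where z_t is the noisy latent at timestep t, c_t is the text prompt, c_f is the task-specific condition produced from the conditioning image, and \epsilon_\theta is the denoising network (the frozen U-Net together with the ControlNet branch):

```latex
\mathcal{L} = \mathbb{E}_{z_0,\, t,\, c_t,\, c_f,\, \epsilon \sim \mathcal{N}(0, 1)}
\Big[ \big\lVert \epsilon - \epsilon_\theta(z_t, t, c_t, c_f) \big\rVert_2^2 \Big]
```

The only difference from the base latent diffusion loss is the extra argument c_f; gradients are applied to the ControlNet branch alone because the base U-Net stays frozen.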

The sudden convergence phenomenon described under "How does ControlNet work?" is reproducible across modalities. Practitioners observed that stopping training too early can yield checkpoints that ignore the condition, while training past the convergence point usually produces clean alignment, with diminishing returns from further steps.

What changed in ControlNet 1.1?

In April 2023, Zhang released ControlNet 1.1 as a nightly build in its own GitHub repository (ControlNet-v1-1-nightly), then promoted it to stable later that year. ControlNet 1.1 was not a single new model but a refresh of all 1.0 checkpoints plus several new ones, retrained on better data and renamed to a stricter convention. Files now followed the pattern control_v11<status>_sd15_<name>, where the status code is p for production or e for experimental, optionally preceded by f1 to mark a bug-fix release on top of an earlier checkpoint (as in control_v11f1p_sd15_depth) ^[5].

The production checkpoints for ControlNet 1.1 are control_v11p_sd15_canny, control_v11p_sd15_mlsd, control_v11f1p_sd15_depth, control_v11p_sd15_normalbae, control_v11p_sd15_seg, control_v11p_sd15_inpaint, control_v11p_sd15_lineart, control_v11p_sd15s2_lineart_anime, control_v11p_sd15_openpose, control_v11p_sd15_scribble, and control_v11p_sd15_softedge. The three experimental checkpoints in 1.1 are control_v11e_sd15_shuffle for content shuffling, control_v11e_sd15_ip2p for InstructPix2Pix-style instruction following, and control_v11f1e_sd15_tile for high-resolution tile-based upscaling.
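The naming convention is regular enough to parse mechanically. The sketch below is inferred only from the checkpoint names listed above, in which the f1 marker appears before the status letter and the base-model field can carry a suffix such as s2; the regular expression, the function name, and the returned field names are assumptions rather than an official specification.

```python
import re

# Pattern inferred from the checkpoint names in this article (an assumption).
CHECKPOINT_PATTERN = re.compile(
    r"^control_v11(?P<fix>f\d+)?(?P<status>[pe])_(?P<base>sd15(?:s\d+)?)_(?P<name>\w+)$"
)
STATUS_LABELS = {"p": "production", "e": "experimental"}


def parse_checkpoint_name(filename):
    """Split a ControlNet 1.1 style checkpoint name into its fields."""
    match = CHECKPOINT_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a ControlNet 1.1 style name: {filename!r}")
    return {
        "status": STATUS_LABELS[match.group("status")],
        "bug_fix": match.group("fix"),  # e.g. "f1", or None when absent
        "base": match.group("base"),
        "name": match.group("name"),
    }


depth_info = parse_checkpoint_name("control_v11f1p_sd15_depth")
anime_info = parse_checkpoint_name("control_v11p_sd15s2_lineart_anime")
tile_info = parse_checkpoint_name("control_v11f1e_sd15_tile")
```

A tool that scans a models folder could use such a parser to group checkpoints by modality or to warn when an experimental checkpoint is selected.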

Notable changes in 1.1 included swapping the older HED preprocessor for a soft-edge model, retraining several checkpoints after Zhang found that earlier datasets had contained quality issues such as a small group of grayscale human images duplicated thousands of times, and improving robustness to imperfect preprocessor outputs. The Inpaint and Tile models in particular became popular because they extended ControlNet beyond strict structural conditioning into more general image enhancement and editing.

ControlNet for SDXL

Stability AI released SDXL in July 2023, replacing Stable Diffusion 1.5 with a larger U-Net (roughly 2.6 billion parameters) and a higher native output resolution of 1024x1024. ControlNet checkpoints had to be retrained to match the new backbone. Both the open-source community and Stability AI released SDXL ControlNet variants over the second half of 2023.

In August 2023, Stability AI released a set of four official Control-LoRA checkpoints for SDXL covering Canny, Depth, Recolor, and Sketch, formatted as LoRA modules rather than full ControlNet branches. Where the original SDXL ControlNet weights were roughly 5 GB on disk, Stability's Control-LoRA versions came in 800 MB rank-256 and 400 MB rank-128 sizes, trading a small amount of quality for substantial savings in disk and VRAM usage ^[6]. Independent groups including Diffusers (canny-sdxl-1.0, depth-sdxl-1.0-small) and InstantX produced additional SDXL ControlNets, including pose models that the community had been requesting.
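The size reduction follows from generic low-rank arithmetic, sketched below. This is not a description of Stability's actual Control-LoRA layout: the layer width of 1280 is a hypothetical value chosen for illustration, and real checkpoints mix many layer shapes and may keep some weights at full rank.

```python
def full_param_count(d_in, d_out):
    # A dense weight matrix stores one value per input-output pair.
    return d_in * d_out


def low_rank_param_count(d_in, d_out, rank):
    # W is approximated as B @ A, with A of shape (rank, d_in) and B of shape (d_out, rank).
    return rank * (d_in + d_out)


layer_width = 1280  # hypothetical square layer, for illustration only
compression = {
    rank: full_param_count(layer_width, layer_width)
    / low_rank_param_count(layer_width, layer_width, rank)
    for rank in (256, 128)
}
```

Halving the rank halves the stored parameters for such a layer, which matches the 2:1 ratio between the rank-256 and rank-128 file sizes quoted above.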

How does ControlNet compare to similar methods?

ControlNet's release in February 2023 was rapidly followed by a wave of related conditioning approaches. The table below compares the main families.

Method	Year	Approach	Strength
ControlNet	2023	Locked U-Net plus trainable encoder copy with zero convolutions	High-fidelity spatial control
T2I-Adapter	2023	Lightweight adapter (around 77M params) injected at U-Net features	Smaller and faster than ControlNet
Composer	2023	Train a single diffusion model jointly on many decomposed factors	Composable conditions in one model
Uni-ControlNet	2023	Two adapters covering all local and global conditions	One model handles many condition types
Control-LoRA	2023	LoRA-style decomposition of ControlNet weights	Smaller files, easier merging
IP-Adapter	2023	Image prompt adapter with decoupled cross-attention	Conditions on a reference image's content
InstantID	2024	IdentityNet combining IP-Adapter with face-landmark ControlNet	Face-preserving generation from one photo

T2I-Adapter, from Mou and colleagues at Tencent ARC, was posted to arXiv on 16 February 2023, only six days after ControlNet (arXiv:2302.08453), and reached similar conditioning quality with about 77 million parameters and 300 MB of storage ^[7]. Composer, from Huang and colleagues at Alibaba's DAMO Academy (arXiv:2302.09778), trained a single diffusion model jointly on a large set of decomposed image factors and remixed them at inference time ^[8]. Uni-ControlNet, accepted to NeurIPS 2023, used two lightweight adapters to handle all local conditions and all global conditions through a single backbone, regardless of how many condition types were combined at inference ^[9]. IP-Adapter, also from Tencent (Ye et al. 2023), generalized the idea further by accepting an arbitrary reference image as a visual prompt, and InstantID combined an IP-Adapter with a face-landmark ControlNet to preserve identity from a single face photo ^[10].

What tools support ControlNet?

The most widely used integration for the original Stable Diffusion 1.5 ControlNet is sd-webui-controlnet, a third-party extension for AUTOMATIC1111's Stable Diffusion web UI maintained by GitHub user Mikubill. The extension exposes all ControlNet 1.0 and 1.1 checkpoints, runs the corresponding preprocessors automatically, and supports stacked ControlNets so that a user can combine, for example, an OpenPose constraint with a depth map ^[11]. By 2024 the extension had over 17,000 GitHub stars and was a standard feature of most Stable Diffusion installations.
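Stacking can be pictured as a weighted sum of residuals. The sketch below follows the approach used by the multi-ControlNet wrapper in Diffusers, where each ControlNet's residuals are multiplied by its own conditioning scale and then summed block by block; whether other front ends combine them in exactly the same way is an assumption. Residuals are represented as flat lists of floats, and all names and values are illustrative.

```python
def scale_residuals(residuals, conditioning_scale):
    """Multiply every value in every residual block by the scale."""
    return [[value * conditioning_scale for value in block] for block in residuals]


def combine_controlnets(per_net_residuals, scales):
    """Sum the scaled residuals of several ControlNets, block by block."""
    if len(per_net_residuals) != len(scales):
        raise ValueError("need exactly one scale per ControlNet")
    combined = None
    for residuals, scale in zip(per_net_residuals, scales):
        scaled = scale_residuals(residuals, scale)
        if combined is None:
            combined = scaled
        else:
            combined = [
                [a + b for a, b in zip(block_a, block_b)]
                for block_a, block_b in zip(combined, scaled)
            ]
    return combined


# Two hypothetical ControlNets, each emitting two residual blocks.
pose_residuals = [[1.0, 2.0], [3.0, 4.0]]
depth_residuals = [[10.0, 20.0], [30.0, 40.0]]
combined = combine_controlnets([pose_residuals, depth_residuals], scales=[1.0, 0.5])
```

Lowering one scale weakens that condition without touching the other, which is how a pose constraint can be kept strict while a depth map acts as a looser hint.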

ComfyUI, the node-based interface for diffusion workflows, includes native ControlNet nodes out of the box and supports T2I-Adapter, ControlLoRA, ControlLLLite, SparseCtrls, and SVD-ControlNets through built-in or community node packs. InvokeAI, a polished open-source GUI for Stable Diffusion, similarly ships with built-in ControlNet support. The Hugging Face Diffusers library exposes ControlNet through pipelines including StableDiffusionControlNetPipeline, StableDiffusionXLControlNetPipeline, StableDiffusionControlNetInpaintPipeline, and the multi-ControlNet variants, and provides a controlnet_conditioning_scale parameter for tuning how strongly the condition is enforced ^[12].
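A minimal Diffusers call might look like the sketch below. It uses `ControlNetModel`, `StableDiffusionControlNetPipeline`, and the `controlnet_conditioning_scale` argument from the library, but the wrapper function, its parameter names, and its defaults are assumptions made here. Model identifiers are left to the caller (for example, a Stable Diffusion 1.5 base model and a Canny ControlNet checkpoint), and the code assumes a CUDA GPU plus an already-preprocessed edge map supplied as a PIL image.

```python
def generate_with_canny_control(prompt, canny_image, base_model_id, controlnet_id,
                                conditioning_scale=1.0, steps=30):
    """Run one ControlNet-guided generation and return a PIL image.

    Imports are deferred so that defining this function does not require
    torch or diffusers to be installed.
    """
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_model_id, controlnet=controlnet, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumption: a CUDA-capable GPU is available

    result = pipe(
        prompt,
        image=canny_image,  # the preprocessed edge map, not the original photo
        num_inference_steps=steps,
        controlnet_conditioning_scale=conditioning_scale,
    )
    return result.images[0]
```

Values of `conditioning_scale` below 1.0 loosen the structural constraint, which can help when the edge map is noisy or the prompt asks for something that conflicts with it.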

Zhang himself authored two further tools that lean on ControlNet's design lessons. Fooocus, released in August 2023, is a simplified Stable Diffusion XL front end that hides most knobs and uses GPT-2 to expand prompts; it integrates ControlNet-style conditioning under the hood. Stable Diffusion WebUI Forge, released in early 2024, is a fork of AUTOMATIC1111's web UI optimized for memory efficiency and ControlNet performance, with a UNet patcher system that allows ControlNets, LoRAs, and other adapters to be applied without rebuilding the model graph each time ^[13].

Proprietary systems also adopted similar conditioning ideas after ControlNet's release. Adobe Firefly added structure reference and style reference features in 2023 and 2024, Midjourney v6 introduced character reference and style reference modes, and Runway's video models accepted pose and depth conditions, although none of these systems publicly disclosed whether they reused ControlNet code or simply borrowed the concept.

What is ControlNet used for?

ControlNet expanded the practical range of image generation and image-to-image workflows. Architects and interior designers used the depth and M-LSD ControlNets to turn rough sketches and 3D mock-ups into photorealistic renderings while preserving floor plans and sight lines. Fashion designers fed pose skeletons into the OpenPose ControlNet to generate consistent figures across a clothing lookbook. Storyboard artists and animators used pose plus scribble conditioning to keep characters on-model across many frames. Visual effects studios and indie game developers adopted ControlNet to generate environment art, texture references, and concept variations from depth-rendered geometry.

In 2D art tools, plugins for Krita and Photoshop wrapped ControlNet pipelines so that an artist could paint a rough composition and have the model fill in details while respecting line work. Avatar generation services used ControlNet for face-consistent stylization, and InstantID later took this further by combining a face-landmark ControlNet with an IP-Adapter for one-photo identity preservation. In scientific visualization, researchers experimented with ControlNet as a way to render molecular structures, fluid simulations, and microscopy outputs in a controllable artistic style.

Video extensions also built on the same idea. Sparse-frame ControlNets and motion ControlNets were used with AnimateDiff and Stable Video Diffusion to keep character motion aligned with reference dance footage or pose sequences.

Computational considerations

Because ControlNet runs both the frozen U-Net and the trainable copy of its encoder forward at inference, it roughly doubles the compute cost of one diffusion step compared with the base model. Memory overhead is typically smaller than a 2x increase because only the encoder half is duplicated, but practitioners running on consumer GPUs still report measurable VRAM pressure when stacking multiple ControlNets. The Forge web UI is one practical response: it implements aggressive UNet patching and offloading to keep multiple ControlNets resident at once on cards with 8 to 12 GB of VRAM ^[13].

LoRA-style ControlNets such as Stability's Control-LoRA and the community's ControlLLLite reduce the on-disk size of each conditioning model by an order of magnitude, although they often sacrifice some conditioning fidelity for the savings. Distillation methods including ControlNet-XS and ControlNeXt have been proposed to compress the architecture further by stripping away parts of the encoder copy that contribute little to the output.

Reception and impact

ControlNet was met with immediate enthusiasm in the open-source generative AI community. Within weeks of release, demonstrations of pose-controlled and edge-controlled image generation were widely shared on the Stable Diffusion subreddit, on X (formerly Twitter), and in publications including The Verge and Ars Technica. The paper accumulated several thousand citations within its first year on Google Scholar and was widely treated as the reference architecture for adding any structured conditioning signal to a frozen diffusion model.

At ICCV 2023 in Paris, the paper received the Marr Prize (Best Paper Award), putting it alongside earlier Marr Prize honorees such as the Mask R-CNN paper ^[2]^[3]. Lead author Lvmin Zhang, widely known online by his GitHub handle lllyasviel, became one of the most influential individual contributors in the open-source diffusion ecosystem; alongside ControlNet he authored the manga-line preprocessor used by the Anime Lineart model, the Fooocus interface, the Stable Diffusion WebUI Forge fork, and contributions to layered diffusion control. Co-authors Anyi Rao and Maneesh Agrawala had backgrounds in cinematic video understanding and human-computer interaction respectively, which the authors credited as influencing the focus on practical user-driven control.

ControlNet's broader influence is visible in the family of follow-up papers it inspired: T2I-Adapter, Composer, UniControl, Uni-ControlNet, IP-Adapter, ControlNet-XS, ControlNet++, ControlNeXt, and InstantID all build on the basic recipe of pairing a frozen large diffusion backbone with a smaller trainable conditioning network ^[7]^[8]^[9]^[10]. Major commercial and open-weights systems, including Adobe Firefly, Stable Diffusion 3, Flux, and Hunyuan-DiT, now expose ControlNet-style structure and reference inputs as a standard feature.

Limitations

The original ControlNet design has several practical limitations. Each conditioning modality requires a separately trained model, which inflates the total disk footprint when a user wants to combine many control types. The unified successors Uni-ControlNet and UniControl partly address this by sharing weights across modalities, but they have not displaced the original per-modality checkpoints in mainstream tools.

Quality of the generated image is bounded by the quality of the conditioning input. A noisy Canny edge map, an OpenPose skeleton with missing keypoints, or a depth map with halo artifacts will all propagate into the output. Sparse or ambiguous conditions, such as a very rough scribble or a low-resolution depth map, often fail to constrain the model strongly enough and produce images that drift away from the user's intent.

Residual artifacts can also arise from the locked-versus-trainable mismatch. Because only the encoder copy is fine-tuned, conditioning information has to be smuggled into the frozen decoder through the skip connections, which sometimes manifests as faint texture inconsistencies or color shifts at object boundaries. Practitioners often work around this by reducing the controlnet_conditioning_scale or by mixing ControlNet output with a plain text-only generation in latent space.

Finally, ControlNet's compute cost roughly doubles a base diffusion step, and using multiple ControlNets at once multiplies the overhead. Subsequent research, including ControlNet-XS, ControlLoRA, and ControlNeXt, has aimed at reducing this cost while preserving conditioning fidelity, but the basic two-branch architecture still defines the upper bound on how lightweight a ControlNet-style conditioner can be while remaining as expressive as the original ^[1]^[6].

References

1. Zhang, L., Rao, A., Agrawala, M. (2023). "Adding Conditional Control to Text-to-Image Diffusion Models." arXiv:2302.05543. https://arxiv.org/abs/2302.05543
2. Zhang, L., Rao, A., Agrawala, M. (2023). "Adding Conditional Control to Text-to-Image Diffusion Models." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 2023, pp. 3836-3847. https://openaccess.thecvf.com/content/ICCV2023/html/Zhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.html
3. "ControlNet best paper at ICCV23 (Marr Prize)," GitHub Discussion #552, lllyasviel/ControlNet repository. https://github.com/lllyasviel/ControlNet/discussions/552
4. Ho, J., Jain, A., Abbeel, P. (2020). "Denoising Diffusion Probabilistic Models." arXiv:2006.11239. https://arxiv.org/abs/2006.11239
5. lllyasviel (Lvmin Zhang). "ControlNet-v1-1-nightly" GitHub repository. https://github.com/lllyasviel/ControlNet-v1-1-nightly
6. Stability AI. "Stability AI Control LoRAs" model card on Hugging Face. https://huggingface.co/stabilityai/control-lora
7. Mou, C., Wang, X., Xie, L., Wu, Y., Zhang, J., Qi, Z., Shan, Y. (2023). "T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models." arXiv:2302.08453. https://arxiv.org/abs/2302.08453
8. Huang, L., Chen, D., Liu, Y., Shen, Y., Zhao, D., Zhou, J. (2023). "Composer: Creative and Controllable Image Synthesis with Composable Conditions." arXiv:2302.09778. https://arxiv.org/abs/2302.09778
9. Zhao, S., Chen, D., Chen, Y., Bao, J., Hao, S., Yuan, L., Wong, K. (2023). "Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models." NeurIPS 2023. arXiv:2305.16322. https://arxiv.org/abs/2305.16322
10. Wang, Q., Bai, X., Wang, H., Qin, Z., Chen, A. (2024). "InstantID: Zero-shot Identity-Preserving Generation in Seconds." arXiv:2401.07519. https://arxiv.org/abs/2401.07519
11. Mikubill. "sd-webui-controlnet: WebUI extension for ControlNet," GitHub repository. https://github.com/Mikubill/sd-webui-controlnet
12. Hugging Face. "ControlNet" Diffusers library documentation. https://huggingface.co/docs/diffusers/using-diffusers/controlnet
13. lllyasviel. "stable-diffusion-webui-forge" GitHub repository. https://github.com/lllyasviel/stable-diffusion-webui-forge
14. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B. (2022). "High-Resolution Image Synthesis with Latent Diffusion Models." CVPR 2022. arXiv:2112.10752. https://arxiv.org/abs/2112.10752
15. Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., Koltun, V. (2020). "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer" (MiDaS). IEEE TPAMI. https://arxiv.org/abs/1907.01341
16. Cao, Z., Hidalgo, G., Simon, T., Wei, S., Sheikh, Y. (2019). "OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields." IEEE TPAMI. https://arxiv.org/abs/1812.08008
