StyleGAN
Last reviewed: May 1, 2026
Sources: 16 citations
Review status: Source-backed
Revision: v1 · 3,841 words
StyleGAN is a family of generative adversarial network architectures developed by NVIDIA Research for high-quality unconditional image synthesis. Its signature contribution is a style-based generator that injects controllable style information at multiple resolutions through a separate mapping network, producing photorealistic images with disentangled, editable attributes such as pose, hair, skin texture, and background. The first StyleGAN paper appeared as an arXiv preprint on December 12, 2018 and was published at CVPR 2019, with successor models StyleGAN2 (2020), StyleGAN2-ADA (2020), and StyleGAN3 (2021) extending the family. For several years after its release, StyleGAN defined the state of the art for unconditional photorealistic image generation, especially of human faces, and its publicly released checkpoints powered viral demonstrations such as ThisPersonDoesNotExist.com.
StyleGAN belongs to the broader family of generative adversarial networks introduced by Ian Goodfellow and colleagues in 2014. Where prior face-generation GAN work tended to produce 256x256 or smaller images with visible artifacts, StyleGAN pushed unconditional sampling to a 1024x1024 resolution with high enough fidelity that human raters routinely failed to distinguish synthetic faces from real photographs.
StyleGAN was developed at NVIDIA Research, primarily out of the Helsinki office. The lead author of the original paper is Tero Karras, a senior research scientist at NVIDIA who also led the predecessor Progressive Growing of GANs (PGGAN), the FFHQ dataset release, StyleGAN2, and StyleGAN3. The original paper credits Tero Karras, Samuli Laine, and Timo Aila. Subsequent papers in the series added co-authors Miika Aittala, Janne Hellsten, Jaakko Lehtinen (joint with Aalto University), and Erik Härkönen.
The StyleGAN family was released in stages between 2018 and 2021:
| Version | First release | Venue | Lead authors |
|---|---|---|---|
| PGGAN (predecessor) | October 2017 (arXiv) | ICLR 2018 | Karras, Aila, Laine, Lehtinen |
| StyleGAN | December 12, 2018 (arXiv); code February 2019 | CVPR 2019 | Karras, Laine, Aila |
| StyleGAN2 | December 3, 2019 (arXiv); code February 5, 2020 | CVPR 2020 | Karras, Laine, Aittala, Hellsten, Lehtinen, Aila |
| StyleGAN2-ADA | June 11, 2020 (arXiv) | NeurIPS 2020 | Karras, Aittala, Hellsten, Laine, Lehtinen, Aila |
| StyleGAN3 | June 23, 2021 (arXiv); code October 12, 2021 | NeurIPS 2021 | Karras, Aittala, Laine, Härkönen, Hellsten, Lehtinen, Aila |
StyleGAN inherits much of its training recipe from Progressive Growing of GANs (PGGAN), introduced by Karras, Aila, Laine, and Lehtinen in October 2017 and presented at ICLR 2018. PGGAN trained a generator and a discriminator with matching architectures starting at 4x4 pixels, then doubled the resolution in stages by inserting new convolutional blocks and fading them in smoothly. The result was the first published method to reliably synthesize 1024x1024 photorealistic faces, trained on a curated CelebA-HQ subset. PGGAN also introduced minibatch standard deviation as a discriminator trick to improve sample diversity and an equalised learning rate that became standard in later NVIDIA GAN papers.
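The resolution-doubling schedule and the fade-in step can be illustrated with a small, dependency-free sketch. It is not the PGGAN code: the helper names (`resolution_schedule`, `upsample_nearest`, `fade_in`) are hypothetical, images are plain nested lists, and the new block's output is simply passed in as data instead of being produced by a convolution.

```python
def resolution_schedule(start=4, final=1024):
    """Resolutions visited when doubling from `start` up to `final`."""
    res, out = start, []
    while res <= final:
        out.append(res)
        res *= 2
    return out


def upsample_nearest(img):
    """Double the size of a 2D image (list of rows) by repeating pixels."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fade_in(old_low_res, new_high_res, alpha):
    """Blend the upsampled old output with the output of the new block.

    alpha ramps from 0.0 (old path only) to 1.0 (new block only) while
    the freshly inserted block is being faded in.
    """
    old_up = upsample_nearest(old_low_res)
    return [
        [(1.0 - alpha) * o + alpha * n for o, n in zip(row_o, row_n)]
        for row_o, row_n in zip(old_up, new_high_res)
    ]


if __name__ == "__main__":
    print(resolution_schedule())  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
    old = [[0.0, 1.0], [1.0, 0.0]]          # stand-in for the 2x2 output
    new = [[0.5] * 4 for _ in range(4)]     # stand-in for the new 4x4 output
    print(fade_in(old, new, alpha=0.25)[0])
```

Early in the fade (`alpha` near 0) the image is dominated by the already-trained low-resolution path, which is why inserting a new block does not destabilise training.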
StyleGAN inherits PGGAN's resolution-doubling schedule, the equalised learning rate, and the leaky-ReLU non-linearities. It departs from PGGAN by replacing the conventional generator (which feeds the latent vector directly into the first convolution) with a style-based generator that decouples the latent code from the synthesis pipeline.
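As a rough illustration of the equalised learning rate, the sketch below stores weights drawn from a unit Gaussian and multiplies them by a per-layer constant at run time, so every layer sees a similar effective update size. The class name `EqualizedLinear` and the He-style constant `sqrt(2 / fan_in)` are assumptions made for this example, not a copy of the NVIDIA code.

```python
import math
import random


def he_constant(fan_in, gain=math.sqrt(2.0)):
    """Per-layer runtime scale: gain / sqrt(fan_in)."""
    return gain / math.sqrt(fan_in)


class EqualizedLinear:
    """Fully connected layer with runtime weight scaling (illustrative)."""

    def __init__(self, in_features, out_features, rng):
        # Weights are stored as plain N(0, 1) samples ...
        self.weight = [
            [rng.gauss(0.0, 1.0) for _ in range(in_features)]
            for _ in range(out_features)
        ]
        self.bias = [0.0] * out_features
        # ... and rescaled on every forward pass instead of at init time.
        self.scale = he_constant(in_features)

    def __call__(self, x):
        return [
            self.scale * sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(self.weight, self.bias)
        ]


if __name__ == "__main__":
    layer = EqualizedLinear(8, 4, random.Random(0))
    print(layer.scale)             # runtime scale for fan_in = 8
    print(len(layer([1.0] * 8)))   # number of outputs
```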
The original 2019 paper describes a generator with two distinct sub-networks:
In AdaIN, each feature map is first normalized to zero mean and unit variance per sample, then scaled and shifted by the affine-projected style. Because the same w is delivered at every layer, but each layer's affine transform is independent, the network learns to use coarse resolutions (4x4 to 8x8) for global attributes such as pose and face shape, middle resolutions (16x16 to 32x32) for features such as hairstyle and eye openness, and fine resolutions (64x64 to 1024x1024) for color scheme and micro-texture.
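The AdaIN step described above can be sketched for a single sample as follows. The function name `adain` and its arguments are hypothetical; `scales` and `biases` stand in for the per-channel outputs of the learned affine projection of w, which are simply given as numbers here.

```python
import math


def adain(feature_maps, scales, biases, eps=1e-8):
    """Adaptive instance normalization for ONE sample.

    feature_maps: list of channels, each a 2D list of floats.
    scales, biases: one style scale and one style bias per channel
    (assumed to come from an affine projection of w).
    """
    out = []
    for fmap, s, b in zip(feature_maps, scales, biases):
        flat = [v for row in fmap for v in row]
        mean = sum(flat) / len(flat)
        var = sum((v - mean) ** 2 for v in flat) / len(flat)
        inv_std = 1.0 / math.sqrt(var + eps)
        # Normalize to zero mean / unit variance, then apply the style.
        out.append([[s * (v - mean) * inv_std + b for v in row] for row in fmap])
    return out


if __name__ == "__main__":
    channel = [[1.0, 2.0], [3.0, 4.0]]
    styled = adain([channel], scales=[2.0], biases=[3.0])
    print(styled[0])
```

After the call, the channel has (approximately) mean 3 and standard deviation 2 regardless of its original statistics, which is why each layer's style fully overrides the statistics set by earlier layers.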
Four additional components round out the design:
On FFHQ at 1024x1024, the released StyleGAN model achieves an FID50k of 4.40 (4.4159 in the official repository), compared to 8.04 for the PGGAN baseline at the same resolution. The paper also introduced two new disentanglement metrics, perceptual path length (PPL) and linear separability, which both showed measurable gains over the baseline.
The second paper, Analyzing and Improving the Image Quality of StyleGAN, was released as an arXiv preprint on December 3, 2019 and the code on February 5, 2020. It catalogues several artifacts in StyleGAN1 and proposes architectural fixes:
Training Generative Adversarial Networks with Limited Data (Karras, Aittala, Hellsten, Laine, Lehtinen, Aila), released on June 11, 2020 and presented at NeurIPS 2020, introduced Adaptive Discriminator Augmentation (ADA). Every image shown to the discriminator, real or generated, first passes through a set of strong augmentations (color jitter, geometric warps, cutout, and similar transforms), each applied with probability p, and p is adjusted on the fly using the discriminator's output statistics as a proxy for overfitting. Because the augmentations are designed to be non-leaking, the generator still learns the clean data distribution and its own outputs carry no augmentation artifacts.
The practical impact is that StyleGAN2-ADA can train competitively with only 1,000 to 10,000 images, a regime in which plain StyleGAN2 training collapses. Reported results include improving the record CIFAR-10 FID from 5.59 to 2.42 and producing strong face models on the MetFaces dataset (1,336 art-museum portraits) and the AFHQ dataset (animal faces, three categories). The official PyTorch port, NVlabs/stylegan2-ada-pytorch, has become the de facto reference implementation for the entire family.
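The feedback loop that adjusts p can be sketched as a simple controller. This is a minimal illustration under stated assumptions: the overfitting proxy is taken to be the mean sign of the discriminator's outputs on real training images, and the target value `0.6`, the step size, and the clamp to [0, 1] are illustrative choices, not values confirmed by the text above. The function name `update_p` is hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def update_p(p, d_real_outputs, target=0.6, step=0.01):
    """One adjustment of the augmentation probability p.

    d_real_outputs: raw discriminator outputs on real training images.
    r_t near 0 suggests no overfitting; r_t near 1 suggests the
    discriminator is confidently separating the training set.
    """
    r_t = sum(sign(v) for v in d_real_outputs) / len(d_real_outputs)
    # Augment more when the proxy exceeds the target, less otherwise.
    p = p + step * sign(r_t - target)
    return min(max(p, 0.0), 1.0), r_t


if __name__ == "__main__":
    p = 0.0
    # Pretend the discriminator is overconfident on real images.
    for _ in range(5):
        p, r_t = update_p(p, [2.3, 1.1, 0.7, 3.0])
    print(p, r_t)
```

Because the controller only nudges p by a fixed step, the augmentation strength drifts up gradually as overfitting sets in and falls back when the discriminator stops being overconfident.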
Alias-Free Generative Adversarial Networks (Karras, Aittala, Laine, Härkönen, Hellsten, Lehtinen, Aila), released on June 23, 2021 with code on October 12, 2021, addresses an issue informally known as texture sticking: when the latent code w is animated smoothly, fine details such as beard hairs or pores appear glued to image-coordinate positions instead of tracking the underlying face. The authors trace this to aliasing introduced by pointwise non-linearities (ReLU, leaky ReLU) and naive upsampling, both of which violate signal-processing assumptions about band-limited continuous signals.
The StyleGAN3 generator treats every intermediate feature map as a continuous 2D signal sampled on a regular grid. Each operation that could introduce aliasing is wrapped in an upsample-filter-nonlinearity-filter-downsample sandwich using carefully designed Kaiser low-pass filters. Two trained variants are released:
FID is roughly comparable to StyleGAN2 (slightly worse on a few benchmarks, slightly better on others), but the equivariance metrics EQ-T, EQ-Tfrac, and EQ-R show large improvements. The practical payoff is much smoother latent-space animation and video, which made StyleGAN3 the preferred choice for animated avatars and video synthesis pipelines built on the family.
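The upsample-filter-nonlinearity-filter-downsample idea can be illustrated in one dimension. The sketch below is a toy under explicit assumptions, not the StyleGAN3 filter design: it uses a generic Kaiser-windowed sinc filter, a fixed 2x oversampling factor, and zero padding, and all function names and parameter values (`beta=8.0`, `num_taps=25`) are hypothetical.

```python
import math


def bessel_i0(x, terms=30):
    """Modified Bessel function I0 via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_lowpass(num_taps, cutoff, beta=8.0):
    """Kaiser-windowed sinc low-pass filter (odd num_taps >= 3).

    cutoff is in cycles per sample (0 to 0.5); taps sum to 1.
    """
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            sinc = 2.0 * cutoff
        else:
            sinc = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        r = t / mid
        window = bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / bessel_i0(beta)
        taps.append(sinc * window)
    gain = sum(taps)
    return [v / gain for v in taps]


def filter_same(signal, taps):
    """Apply a symmetric FIR filter with zero padding, keeping the length."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            k = i + j - half
            if 0 <= k < len(signal):
                acc += signal[k] * h
        out.append(acc)
    return out


def filtered_leaky_relu(signal, up=2, slope=0.2, num_taps=25):
    """Leaky ReLU evaluated at a higher sampling rate to limit aliasing."""
    taps = kaiser_lowpass(num_taps, cutoff=0.5 / up)
    # 1. Upsample: insert zeros (scaled to keep amplitude), then low-pass.
    stuffed = []
    for v in signal:
        stuffed.append(v * up)
        stuffed.extend([0.0] * (up - 1))
    hi = filter_same(stuffed, taps)
    # 2. Apply the pointwise nonlinearity at the higher rate.
    hi = [v if v >= 0 else slope * v for v in hi]
    # 3. Low-pass again to remove the new high frequencies, then downsample.
    hi = filter_same(hi, taps)
    return hi[::up]


if __name__ == "__main__":
    wave = [math.sin(2.0 * math.pi * 3 * n / 64) for n in range(64)]
    print(len(filtered_leaky_relu(wave)))
```

The second low-pass is the important part: the nonlinearity creates frequencies above what the original grid can represent, and removing them before downsampling is what keeps them from folding back as aliasing.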
Several follow-ups extend the family beyond NVIDIA Research:
| Model | Venue | Authors | Contribution |
|---|---|---|---|
| StyleGAN-XL | SIGGRAPH 2022 | Axel Sauer, Katja Schwarz, Andreas Geiger (University of Tübingen) | First class-conditional StyleGAN3 trained successfully on full ImageNet at up to 1024x1024 using projected GAN losses and progressive growing of synthesis stages. |
| StyleGAN-V | CVPR 2022 | Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny | Treats videos as continuous-time signals and adds a temporal mapping branch to StyleGAN2; trained on 1024x1024 video clips. |
| StyleGAN-T | ICML 2023 | Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, Timo Aila | Scales the StyleGAN architecture to text-to-image generation on LAION-style datasets, generating 512x512 images in under 0.1 seconds. |
| GigaGAN | CVPR 2023 | Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park (Adobe / POSTECH / CMU) | Demonstrates that a billion-parameter GAN with text conditioning, attention layers, and adaptive kernel selection can match diffusion-model quality on text-to-image while running orders of magnitude faster. Builds on StyleGAN's mapping network and modulated convolutions. |
Research on StyleGAN inversion and editing distinguishes several latent spaces:
| Space | Dimensionality | Origin | Notes |
|---|---|---|---|
| Z | 512 per sample | Input Gaussian noise | Entangled; less useful for editing. |
| W | 512 per sample | Output of the mapping network | More disentangled than Z; the canonical latent space. |
| W+ | 512 per layer (so 18 x 512 for 1024x1024) | Per-layer copies of W used in image inversion | Far more expressive; most pSp, e4e, and ReStyle inversions live here. |
| S | 9,088 dimensions in StyleGAN2 1024x1024 | Per-channel style scalars after the affine transforms | Identified by Wu, Lischinski, and Shechtman in StyleSpace; even more disentangled than W+. |
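The 18 x 512 shape of W+ follows from two style inputs per resolution block between 4x4 and 1024x1024. A minimal sketch of that arithmetic, plus a per-layer edit of the kind used in W+ editing, is shown below. The helper names and the example direction are hypothetical; real editing directions are learned or discovered by the methods cited in this article.

```python
import math


def num_style_layers(resolution):
    """Two style inputs per block, for blocks 4x4, 8x8, ..., resolution."""
    return 2 * int(math.log2(resolution)) - 2


def broadcast_to_w_plus(w, resolution=1024):
    """Copy a single W vector to every layer, giving a W+ code."""
    return [list(w) for _ in range(num_style_layers(resolution))]


def edit_layers(w_plus, direction, layers, strength):
    """Shift only the chosen layers of a W+ code along `direction`."""
    chosen = set(layers)
    return [
        [v + strength * d for v, d in zip(row, direction)] if i in chosen else list(row)
        for i, row in enumerate(w_plus)
    ]


if __name__ == "__main__":
    print(num_style_layers(1024))  # 18
    w = [0.0] * 512
    direction = [1.0] + [0.0] * 511          # made-up direction for illustration
    w_plus = broadcast_to_w_plus(w)
    # Layers 0-3 feed the 4x4 and 8x8 blocks (coarse attributes).
    edited = edit_layers(w_plus, direction, layers=range(4), strength=2.0)
    print(edited[0][0], edited[4][0])
```

Restricting an edit to a subset of layers is what makes W+ more expressive than W: the same direction can change coarse structure while leaving fine texture layers untouched.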
Path length regularisation draws a random image-space direction y and penalises the squared difference between the magnitude of the Jacobian-transpose-vector product J_w^T y (the gradient with respect to w of the output image projected onto y) and a running average a. Formally,
L_pl = E_{w, y} [ (||J_w^T * y|| - a)^2 ]
where J_w is the Jacobian of the synthesis network with respect to w, the perturbation y is drawn from a unit-variance Gaussian over pixels, and a is updated as an exponential moving average of the observed magnitudes. The effect is that walking a fixed distance in W produces a roughly fixed perceptual change in image space, a property that is critical for clean latent-space editing.
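The penalty can be made concrete with a toy linear generator g(w) = A w, whose Jacobian is simply the matrix A, so J_w^T y is A^T y. The sketch below is illustrative only: the function names, the 2x2 matrix, and the decay value `0.99` are assumptions, and a real implementation would obtain J_w^T y by backpropagation through the synthesis network.

```python
import math
import random


def jt_y(A, y):
    """J^T y for the toy generator g(w) = A w (rows: pixels, cols: latents)."""
    n_latent = len(A[0])
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n_latent)]


def path_length_penalty(A, ys, a):
    """Mean of (||J^T y|| - a)^2 over a batch of image-space directions ys."""
    lengths = [math.sqrt(sum(v * v for v in jt_y(A, y))) for y in ys]
    penalty = sum((length - a) ** 2 for length in lengths) / len(lengths)
    return penalty, lengths


def update_ema(a, lengths, decay=0.99):
    """Exponential moving average of the observed path lengths."""
    mean_len = sum(lengths) / len(lengths)
    return decay * a + (1.0 - decay) * mean_len


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[3.0, 0.0], [0.0, 4.0]]   # anisotropic toy generator
    a = 0.0
    for _ in range(3):
        ys = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(8)]
        penalty, lengths = path_length_penalty(A, ys, a)
        a = update_ema(a, lengths)
        print(round(penalty, 3), round(a, 3))
```

In this toy the penalty cannot reach zero because A stretches the two latent axes by different amounts (3 and 4); training against the penalty pushes a generator toward mappings where every direction in W changes the image by a similar magnitude.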
StyleGAN3 measures translation equivariance by translating the synthesis network's input (its Fourier-feature coordinate grid) and checking whether the output image translates by the same amount. The score is the peak signal-to-noise ratio (PSNR), in decibels, between the translated reference output and the re-rendered image; EQ-T uses integer-pixel translations and EQ-Tfrac uses subpixel ones. Rotation equivariance (EQ-R) does the same for rotations. StyleGAN3-R reaches EQ-T values above 60 dB, compared to roughly 20 dB for StyleGAN2.
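A one-dimensional toy shows how a PSNR-based translation-equivariance score behaves. It is a sketch under assumptions: signals live in [-1, 1] (so the peak-to-peak range is 2), shifts are circular and integer, and the two stand-in "generators" (`blur` and `sticky`) are invented for illustration and have nothing to do with the real networks.

```python
import math


def psnr(a, b, peak=2.0):
    """Peak signal-to-noise ratio in dB; infinite for identical signals."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def shift(sig, k):
    """Circularly shift a list to the right by k samples."""
    k %= len(sig)
    return sig[-k:] + sig[:-k] if k else list(sig)


def eq_t(generator, signal, k=1):
    """Compare 'shift the output' with 'shift the input, then generate'."""
    return psnr(shift(generator(signal), k), generator(shift(signal, k)))


def blur(sig):
    """Shift-equivariant toy generator: circular 3-tap average."""
    n = len(sig)
    return [(sig[i - 1] + sig[i] + sig[(i + 1) % n]) / 3.0 for i in range(n)]


def sticky(sig):
    """Toy generator with a texture glued to absolute positions."""
    return [v + 0.1 * (-1) ** i for i, v in enumerate(sig)]


if __name__ == "__main__":
    signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    print(eq_t(blur, signal))     # perfectly equivariant toy
    print(eq_t(sticky, signal))   # position-locked detail lowers the score
```

The `sticky` generator mimics texture sticking: its fine detail depends on pixel position, not on content, so shifting the input does not shift the detail and the PSNR drops.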
StyleGAN papers and downstream work primarily use the following datasets:
| Dataset | Size | Resolution | Domain |
|---|---|---|---|
| FFHQ | 70,000 | 1024x1024 | Human faces, Flickr-sourced, MTurk-filtered |
| CelebA-HQ | 30,000 | 1024x1024 | Curated celebrity faces, used in PGGAN |
| LSUN Bedrooms | 3 million | 256x256 | Indoor scenes |
| LSUN Cars | 5.5 million | 512x384 | Cars |
| LSUN Cats | 1.6 million | 256x256 | Cats |
| LSUN Churches | 126,000 | 256x256 | Outdoor church facades |
| AFHQ | 15,000 | 512x512 | Animal faces (cats, dogs, wildlife) |
| MetFaces | 1,336 | 1024x1024 | Art-museum portraits, used to demonstrate StyleGAN2-ADA |
| BreCaHAD | ~160 ROIs | 1360x1024 | Histopathology, used for ADA |
FID (Fréchet Inception Distance) numbers reported by the original papers on FFHQ at 1024x1024:
| Model | FID50k on FFHQ-1024 | Path length (PPL) | Notes |
|---|---|---|---|
| PGGAN (Karras et al. 2018) | 8.04 | high | Baseline before style injection |
| StyleGAN1 (config F) | 4.40 | reduced vs PGGAN | First style-based generator |
| StyleGAN2 (config F) | 2.84 | sharply reduced vs StyleGAN1 | Weight demodulation, no progressive growing |
| StyleGAN2-ADA | 2.92 with full data; remains usable on 5,000 images | comparable to StyleGAN2 | Adaptive discriminator augmentation |
| StyleGAN3-T | 2.79 | comparable | Translation-equivariant |
| StyleGAN3-R | 3.07 | comparable | Translation- and rotation-equivariant |
StyleGAN2-ADA also set a CIFAR-10 FID record of 2.42 in the class-conditional setting (down from the previous record of 5.59), alongside an unconditional FID of 2.92. On limited-data benchmarks such as MetFaces, StyleGAN2-ADA reached an FID of 18.22 against 57.26 for plain StyleGAN2.
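For readers unfamiliar with how FID is computed: it is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. The sketch below is a simplification that assumes diagonal covariances, in which case the distance reduces to a sum over feature dimensions; the full metric uses complete covariance matrices and a matrix square root. The function name and the toy statistics are hypothetical.

```python
import math


def frechet_distance_diag(mu_r, var_r, mu_g, var_g):
    """Fréchet distance between two Gaussians with DIAGONAL covariance.

    General form: |mu_r - mu_g|^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)).
    With diagonal covariances the trace term becomes the sum of
    (sigma_r - sigma_g)^2 over dimensions.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_r, var_g)
    )
    return mean_term + cov_term


if __name__ == "__main__":
    real_mu, real_var = [0.0, 1.0], [1.0, 4.0]
    fake_mu, fake_var = [0.5, 1.0], [1.0, 1.0]
    print(frechet_distance_diag(real_mu, real_var, fake_mu, fake_var))
```

Lower is better: the distance is zero only when both the means and the (co)variances of the two feature distributions match.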
Precision and recall, in the sense of Kynkäänniemi et al. 2019, are also widely reported. StyleGAN2 substantially improves recall over StyleGAN1, indicating that more of the training distribution is covered, not merely a sharper subset.
The main public-facing uses of StyleGAN cluster in a few categories:
| Application area | Examples |
|---|---|
| Synthetic faces | ThisPersonDoesNotExist.com (Phillip Wang, February 2019), Generated Photos stock photo library, Which Face Is Real (University of Washington), and dozens of artist sites |
| Game and avatar creation | Procedural NPC faces, virtual influencers, customizable avatars in indie games |
| Art and design | NFT projects in 2021, Refik Anadol installations, MetFaces-style artistic portraits, Anna Ridler's tulip series |
| Data augmentation | Synthetic face datasets for face recognition training; has been studied at face-recognition vendors and in academic papers |
| Privacy | Anonymisation by replacing real faces in photos with StyleGAN-generated ones whose features differ; e.g., DeepPrivacy and follow-ups |
| Image editing via inversion | Project a real photo into W+ using e4e, pSp, or ReStyle, then edit with InterFaceGAN, GANSpace, StyleSpace, or StyleCLIP |
| Domain adaptation | StyleGAN-NADA uses CLIP losses to retarget a face model to cartoons, sketches, or other styles without paired data; MyStyle personalises the generator to a single subject |
| Toonification, age progression | Pixel2Style2Pixel and Toonify (Justin Pinkney) layer two StyleGAN checkpoints to swap face style between domains |
| Anime / illustration | Specialised StyleGAN2 fine-tunes such as Gwern's Anime Faces project and the early Waifu Labs models |
| Medical imaging | Synthetic histopathology, retinal images, and chest X-rays for augmentation and de-identification, using StyleGAN2-ADA |
| Video and animation | StyleGAN-V for continuous video, MoCoGAN-HD, and StyleGAN3-driven facial animation pipelines |
The ThisPersonDoesNotExist.com launch in mid-February 2019 is widely cited as the moment that brought GAN-generated faces into mainstream awareness. Phillip Wang, then a software engineer at Uber, had been studying AI on his own for six months and posted the site on a Facebook AI group on February 11, 2019, only days after NVIDIA released the StyleGAN code. Within weeks the site had received millions of visitors and prompted news coverage in CNN, The Verge, and Inverse.
The StyleGAN papers have been cited tens of thousands of times on Google Scholar, with StyleGAN1 alone above ten thousand citations within a few years of publication. The mapping network and modulated convolution have been adopted in unrelated architectures including text-to-image GANs (GigaGAN), super-resolution networks, and even some diffusion model decoders.
The broader research subfields of GAN inversion and latent-space image editing effectively grew out of StyleGAN. The StyleSpace paper (Wu, Lischinski, Shechtman 2021) and InterFaceGAN (Shen et al. 2020) have themselves spawned hundreds of follow-ups. StyleGAN3 remains the preferred backbone for video and avatar work in 2024 and 2025 because no diffusion model has matched its temporal smoothness without much heavier compute.
| Model | Year | Max resolution | Best reported FID | Key innovation |
|---|---|---|---|---|
| DCGAN (Radford et al.) | 2015 | 64x64 | n/a | First convolutional GAN that trained reliably |
| WGAN (Arjovsky et al.) | 2017 | 64x64 | n/a | Wasserstein loss for stable training |
| PGGAN (Karras et al.) | 2017 | 1024x1024 | 8.04 (FFHQ) | Progressive resolution growing |
| BigGAN (Brock et al.) | 2018 | 512x512 | 7.4 (ImageNet) | Class-conditional, very large batches |
| StyleGAN1 | 2018 | 1024x1024 | 4.40 (FFHQ) | Style-based generator with mapping network and AdaIN |
| StyleGAN2 | 2019 | 1024x1024 | 2.84 (FFHQ) | Weight demodulation, path length regularisation |
| StyleGAN2-ADA | 2020 | 1024x1024 | 2.42 (CIFAR-10) | Adaptive discriminator augmentation |
| StyleGAN3 | 2021 | 1024x1024 | 2.79 (FFHQ) | Alias-free, equivariant generator |
| StyleGAN-XL | 2022 | 1024x1024 | 2.30 (ImageNet) | Projected losses, conditional training on ImageNet |
| GigaGAN | 2023 | 1024x1024 | 9.09 (COCO 2014) | Billion-parameter text-conditional GAN |
The original StyleGAN1 code on NVlabs/stylegan was released under a Creative Commons BY-NC 4.0 license, restricting it to non-commercial use. StyleGAN2 was released under the NVIDIA Source Code License, also research-only. StyleGAN2-ADA and StyleGAN3 use the same NVIDIA Source Code License. The FFHQ dataset is distributed under Creative Commons BY-NC-SA 4.0 by NVIDIA Corporation, with each individual image retaining the license under which it was originally posted to Flickr.
The community has produced several permissively licensed re-implementations, but the official NVlabs checkpoints inherit the non-commercial restriction. This has been a recurring issue in commercial deployments of face-generation tools and is one reason that some downstream products (especially in advertising) train fresh checkpoints on properly licensed data.