StyleGAN

Updated Jun 24, 2026

StyleGAN is a family of style-based generative adversarial network (GAN) architectures developed by NVIDIA Research for high-quality unconditional image synthesis, best known for generating photorealistic human faces at 1024x1024 resolution. Its signature contribution is a style-based generator that injects controllable style information at multiple resolutions through a separate mapping network, producing images with disentangled, editable attributes such as pose, hair, skin texture, and background. The original paper, "A Style-Based Generator Architecture for Generative Adversarial Networks" by Tero Karras, Samuli Laine, and Timo Aila, appeared as an arXiv preprint on December 12, 2018 and was published at CVPR 2019 (pages 4401-4410). ^[1] Its authors state the architecture yields "an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair)." ^[1] Successor models StyleGAN2 (2020), StyleGAN2-ADA (2020), and StyleGAN3 (2021) extended the family, and for several years StyleGAN defined the state of the art for photorealistic face generation. Its publicly released checkpoints powered viral demonstrations such as ThisPersonDoesNotExist.com. ^[15]^[16]

StyleGAN belongs to the broader family of generative adversarial networks introduced by Ian Goodfellow and colleagues in 2014. Where prior face-generation GAN work tended to produce 256x256 or smaller images with visible artifacts, StyleGAN pushed unconditional sampling to 1024x1024 resolution with high enough fidelity that human raters routinely failed to distinguish synthetic faces from real photographs.

Who created StyleGAN, and when was it released?

StyleGAN was developed at NVIDIA Research, primarily out of the Helsinki office. The lead author of the original paper is Tero Karras, a senior research scientist at NVIDIA who also led the predecessor Progressive Growing of GANs (PGGAN), the FFHQ dataset release, StyleGAN2, and StyleGAN3. The original paper credits Tero Karras, Samuli Laine, and Timo Aila. ^[1] Subsequent papers in the series added co-authors Miika Aittala, Janne Hellsten, Jaakko Lehtinen (joint with Aalto University), and Erik Harkonen. ^[2]^[3]^[4]

The StyleGAN family was released in stages between 2018 and 2021:

Version	First release	Venue	Authors
PGGAN (predecessor)	October 2017 (arXiv)	ICLR 2018	Karras, Aila, Laine, Lehtinen
StyleGAN	December 12, 2018 (arXiv); code February 2019	CVPR 2019	Karras, Laine, Aila
StyleGAN2	December 3, 2019 (arXiv); code February 5, 2020	CVPR 2020	Karras, Laine, Aittala, Hellsten, Lehtinen, Aila
StyleGAN2-ADA	June 11, 2020 (arXiv)	NeurIPS 2020	Karras, Aittala, Hellsten, Laine, Lehtinen, Aila
StyleGAN3	June 23, 2021 (arXiv); code October 12, 2021	NeurIPS 2021	Karras, Aittala, Laine, Harkonen, Hellsten, Lehtinen, Aila

The original paper was distributed in December 2018 and its source code was made publicly available in February 2019; StyleGAN3 was introduced on June 23, 2021 with code released on October 12, 2021. ^[1]^[4]^[15]

Predecessor: progressive growing of GANs

StyleGAN inherits much of its training recipe from Progressive Growing of GANs (PGGAN), introduced by Karras, Aila, Laine, and Lehtinen in October 2017 and presented at ICLR 2018. ^[5] PGGAN trained a generator and a discriminator with matching architectures starting at 4x4 pixels, then doubled the resolution in stages by inserting new convolutional blocks and fading them in smoothly. The result was the first published method to reliably synthesize 1024x1024 photorealistic faces, trained on a curated CelebA-HQ subset. PGGAN also introduced minibatch standard deviation as a discriminator trick to improve sample diversity and an equalised learning rate that became standard in later NVIDIA GAN papers.

StyleGAN inherits PGGAN's resolution-doubling schedule, the equalised learning rate, and the leaky-ReLU non-linearities. It departs from PGGAN by replacing the conventional generator (which feeds the latent vector directly into the first convolution) with a style-based generator that decouples the latent code from the synthesis pipeline. ^[1]

How does the StyleGAN1 generator work?

The original 2019 paper describes a generator with two distinct sub-networks: ^[1]

Mapping network. An eight-layer multi-layer perceptron (MLP) that maps a 512-dimensional latent vector z, drawn from a standard normal distribution N(0, I) in the input latent space Z, to an intermediate latent code w in a learned space W of the same dimensionality.
Synthesis network. A convolutional network that starts from a learned 4x4x512 constant tensor and progressively upsamples to 1024x1024. At each resolution, the intermediate code w is broadcast through a learned affine transformation to produce per-channel scale and bias values that modulate the activations through Adaptive Instance Normalization (AdaIN).

In AdaIN, each feature map is first normalized to zero mean and unit variance per sample, then scaled and shifted by the affine-projected style. Because the same w is delivered at every layer, but each layer's affine transform is independent, the network learns to use coarse resolutions (4x4 to 8x8) for global attributes such as pose and face shape, middle resolutions (16x16 to 32x32) for features such as hairstyle and eye openness, and fine resolutions (64x64 to 1024x1024) for color scheme and micro-texture. ^[1]
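The AdaIN step described above can be sketched in a few lines of NumPy. This is a minimal illustration for a single sample, not the official implementation: the function name `adain`, the parameter names `affine_weight` and `affine_bias`, and the toy tensor sizes are all assumptions made for the example.

```python
import numpy as np

def adain(x, w, affine_weight, affine_bias, eps=1e-8):
    """Adaptive instance normalization for one sample (illustrative sketch).

    x:             feature maps, shape (C, H, W)
    w:             intermediate latent code, shape (D,)
    affine_weight: learned affine matrix, shape (2*C, D)
    affine_bias:   learned affine bias, shape (2*C,)
    """
    channels = x.shape[0]
    # Learned affine transform: w -> per-channel (scale, bias).
    style = affine_weight @ w + affine_bias
    scale, bias = style[:channels], style[channels:]
    # Normalize each feature map to zero mean and unit variance.
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    normalized = (x - mean) / (std + eps)
    # Re-scale and shift with the style.
    return scale[:, None, None] * normalized + bias[:, None, None]

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(4, 8, 8))   # toy activations
w = rng.normal(size=16)                              # toy latent (512-d in the paper)
affine_weight = rng.normal(size=(8, 16)) * 0.1
affine_bias = np.concatenate([np.ones(4), np.zeros(4)])
out = adain(x, w, affine_weight, affine_bias)
```

After the call, each output channel has the mean and standard deviation dictated by the style, regardless of the statistics of the incoming activations. That is why the style at one layer overrides, rather than accumulates with, the styles applied earlier.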

Four additional components round out the design: ^[1]

Per-pixel noise injection. At each layer of the synthesis network, a single-channel image of uncorrelated Gaussian noise is broadcast across feature maps with learned per-feature scaling factors and added to the activations after the convolution. This separates stochastic detail (freckles, individual hair strands, pores) from the deterministic style controlled by w.
Style mixing regularisation. During training, two latent codes z1 and z2 are sampled and mapped to w1 and w2. A random crossover point splits the network so that w1 is used at layers below the crossover and w2 above. This prevents the synthesis network from assuming that adjacent resolution scales correspond to the same w, which improves disentanglement and enables the well-known style-mixing visualisations at inference time.
Truncation trick. At sampling time, w is replaced by w_truncated = w_avg + psi * (w - w_avg), where w_avg is the running mean of w over training and psi in [0, 1] is a tunable scalar. Lower psi produces a more typical, higher-quality face at the cost of diversity. Karras et al. report psi around 0.7 as a common choice.
Released training set: FFHQ. The paper introduced Flickr-Faces-HQ (FFHQ), 70,000 high-quality 1024x1024 PNGs crawled from Flickr under permissive licenses, automatically aligned and cropped with dlib, and human-filtered through Amazon Mechanical Turk to remove statues and photos of photos. ^[1]^[9] FFHQ has substantially more variation in age, ethnicity, accessories (eyeglasses, hats), and background than the older CelebA-HQ benchmark.
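Two of the components above, the truncation trick and style mixing, reduce to simple array operations on latent codes. The following is a hedged sketch: the helper names `truncate` and `style_mix` are hypothetical, `w_avg` is a zero placeholder instead of a real running mean, and 18 layers corresponds to a 1024x1024 generator.

```python
import numpy as np

def truncate(w, w_avg, psi=0.7):
    """Truncation trick: pull w toward the average latent w_avg."""
    return w_avg + psi * (w - w_avg)

def style_mix(w1, w2, crossover, num_layers=18):
    """Per-layer styles: w1 for layers before `crossover`, w2 from it onward."""
    layers = np.arange(num_layers)[:, None]
    return np.where(layers < crossover, w1[None, :], w2[None, :])

rng = np.random.default_rng(0)
w_avg = np.zeros(512)                      # placeholder for the running mean of w
w1, w2 = rng.normal(size=512), rng.normal(size=512)

w1_truncated = truncate(w1, w_avg, psi=0.7)
mixed = style_mix(w1, w2, crossover=4)     # coarse layers from w1, the rest from w2
```

With `crossover=4`, the first four layers (the 4x4 and 8x8 resolutions) take their style from `w1`, so the mixed image would inherit coarse attributes such as pose from the first source and everything else from the second.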

On FFHQ at 1024x1024, the released StyleGAN model achieves an FID50k of 4.40 (4.4159 in the official repository), compared to 8.04 for the PGGAN baseline at the same resolution. ^[1]^[6] The paper also introduced two new disentanglement metrics, perceptual path length (PPL) and linear separability, which both showed measurable gains over the baseline. ^[1]

What did StyleGAN2 (2020) change?

The second paper, Analyzing and Improving the Image Quality of StyleGAN, was released as an arXiv preprint on December 3, 2019 and the code on February 5, 2020. ^[2] Its authors summarise the work as follows: "We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images." ^[2] The paper catalogues several artifacts in StyleGAN1 and proposes architectural fixes:

Blob artifacts. StyleGAN1 occasionally produced bright droplet-like blobs that, while not always visible in the output, appear in all feature maps starting from the 64x64 resolution; the authors trace them to AdaIN's per-sample normalization. The fix replaces AdaIN entirely with weight modulation and demodulation, often called modconv. Instead of normalizing activations, the convolution kernel itself is multiplied by the per-channel style scale before being applied, then renormalized so each output feature map has unit standard deviation in expectation. This removes the need to renormalize activations, which eliminated the droplet artifact. ^[2]
Phase artifacts and texture sticking to coordinates. StyleGAN1 sometimes generated teeth or earrings that snapped to absolute pixel positions instead of moving with the face during interpolation. StyleGAN2 partially addresses this by removing progressive growing and training a single fixed network end to end. After comparing skip-connection and residual designs for both networks, the paper adopts output skip connections in the generator and residual connections in the discriminator, the arrangement used in the default configuration F. ^[2]
Path length regularisation. A new regulariser encourages a fixed-size step in W to produce a fixed-magnitude change in the generated image, measured directly in image space. The authors note this "path length regularizer yields the additional benefit that the generator becomes significantly easier to invert," which is critical for downstream image editing. ^[2]
Lazy regularisation. Both the R1 gradient penalty on the discriminator and path length regularisation on the generator are evaluated only every 16 minibatches, with no measurable quality drop and meaningful speed gains. ^[2]
Larger model. Configuration F doubles the number of feature maps in the highest-resolution layers, raising the parameter count and pushing FFHQ-1024 FID50k down to 2.84 from StyleGAN1's 4.40. ^[2]
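The weight modulation and demodulation step from the first item above can be illustrated on a bare weight tensor. This is a sketch under stated assumptions: the function name `modulate_demodulate` and the tensor sizes are invented for the example, and the convolution itself is omitted.

```python
import numpy as np

def modulate_demodulate(weight, style, eps=1e-8):
    """Scale conv weights by the style, then renormalize per output channel.

    weight: shape (C_out, C_in, K, K)
    style:  shape (C_in,), one scale per input feature map
    """
    # Modulation: scale each input channel of the kernel by its style.
    modulated = weight * style[None, :, None, None]
    # Demodulation: give every output channel's kernel unit L2 norm.
    norm = np.sqrt((modulated ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    return modulated / norm

rng = np.random.default_rng(0)
weight = rng.normal(size=(6, 4, 3, 3))
style = rng.uniform(0.5, 2.0, size=4)

w_mod = modulate_demodulate(weight, style)
per_output_norm = np.sqrt((w_mod ** 2).sum(axis=(1, 2, 3)))
```

Because normalization acts on the weights and not on the activations, multiplying every style by the same constant leaves the result unchanged, while the relative emphasis between input channels is preserved.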

How does StyleGAN2-ADA train on limited data?

Training Generative Adversarial Networks with Limited Data (Karras, Aittala, Hellsten, Laine, Lehtinen, Aila), released on June 11, 2020 and presented at NeurIPS 2020, introduced Adaptive Discriminator Augmentation (ADA). ^[3] Every image shown to the discriminator, real or generated, first passes through a pipeline of strong augmentations (color jitter, geometric warps, cutout, and similar transforms), each applied with probability p, and p is adjusted on the fly using the discriminator's output statistics as a proxy for overfitting. Because the augmentations are differentiable and designed not to leak into the learned distribution, the generator is trained through them yet still produces un-augmented images. The paper's central claim is that "good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images." ^[3]
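The feedback loop that sets p can be sketched as a small controller. The version below is illustrative only: it uses the mean sign of the discriminator's outputs on real images as the overfitting proxy, and the function name `update_augment_p`, the target value, and the step size are assumptions chosen for the example.

```python
def update_augment_p(p, real_logits, target=0.6, step=0.01):
    """One adjustment of the augmentation probability p (illustrative sketch)."""
    # Overfitting proxy: mean sign of discriminator outputs on real images.
    # Values near 1 mean the discriminator is very confident on training data.
    signs = [1.0 if x > 0 else (-1.0 if x < 0 else 0.0) for x in real_logits]
    r_t = sum(signs) / len(signs)
    # Raise p when the proxy exceeds the target, lower it otherwise.
    direction = 1.0 if r_t > target else -1.0
    p = p + direction * step
    # Keep p a valid probability.
    return min(max(p, 0.0), 1.0), r_t

p = 0.5
p, r_t = update_augment_p(p, real_logits=[1.0, 2.0, 0.5, 3.0])
```

In this sketch a confident discriminator pushes p up, which makes its task harder, and a struggling discriminator pulls p back down.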

The practical impact is that StyleGAN2-ADA can train competitively with as few as 1,000 to 10,000 images, where the original StyleGAN2 collapses. The authors also report that the widely used CIFAR-10 is in fact a limited-data benchmark and "improve the record FID from 5.59 to 2.42," and produce strong face models on the MetFaces dataset (1,336 art-museum portraits) and the AFHQ dataset (animal faces, three categories). ^[3] The official PyTorch port, NVlabs/stylegan2-ada-pytorch, has become the de facto reference implementation for the entire family. ^[7]

What problem does StyleGAN3 (2021) solve?

Alias-Free Generative Adversarial Networks (Karras, Aittala, Laine, Harkonen, Hellsten, Lehtinen, Aila), released on June 23, 2021 with code on October 12, 2021, addresses an issue informally known as texture sticking. ^[4] The authors observe that "the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner," manifesting as "detail appearing to be glued to image coordinates instead of the surfaces of depicted objects." ^[4] When the latent code w is animated smoothly, fine details such as beard hairs or pores appear glued to image-coordinate positions instead of tracking the underlying face. The authors trace this to aliasing introduced by pointwise non-linearities (ReLU, leaky ReLU) and naive upsampling, both of which violate signal-processing assumptions about band-limited continuous signals.

The StyleGAN3 generator treats every intermediate feature map as a continuous 2D signal sampled on a regular grid. Each operation that could introduce aliasing is wrapped in an upsample-filter-nonlinearity-filter-downsample sandwich using carefully designed Kaiser low-pass filters. Two trained variants are released: ^[4]

StyleGAN3-T. Translation-equivariant. Sub-pixel translations of the generator's input features produce matching sub-pixel translations of the output image.
StyleGAN3-R. Translation- and rotation-equivariant. It replaces the 3x3 convolutions with 1x1 convolutions and uses radially symmetric low-pass filters, so that rotations of the input features also carry through to the output.
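The upsample-filter-nonlinearity-filter-downsample sandwich described above can be illustrated in one dimension. The sketch below is a simplification and not the StyleGAN3 implementation: the function names, the 49-tap filter length, and the Kaiser `beta` value are assumptions, and the input should be longer than about 25 samples so the filter is shorter than the upsampled signal.

```python
import numpy as np

def lowpass_kernel(cutoff, num_taps=49, beta=8.0):
    """Kaiser-windowed sinc low-pass filter; cutoff in cycles/sample (< 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    kernel = 2 * cutoff * np.sinc(2 * cutoff * n) * np.kaiser(num_taps, beta)
    return kernel / kernel.sum()   # unit gain at zero frequency

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def filtered_leaky_relu(x, up=2):
    """Apply leaky ReLU at a higher sampling rate to limit aliasing."""
    kernel = lowpass_kernel(cutoff=0.5 / up)
    # 1. Upsample by zero insertion, then interpolate with the low-pass filter.
    hi = np.zeros(len(x) * up)
    hi[::up] = x
    hi = np.convolve(hi, kernel * up, mode="same")
    # 2. Pointwise nonlinearity on the finer grid.
    hi = leaky_relu(hi)
    # 3. Low-pass again to remove frequencies the original grid cannot represent.
    hi = np.convolve(hi, kernel, mode="same")
    # 4. Downsample back to the original rate.
    return hi[::up]

signal = np.ones(64)
out = filtered_leaky_relu(signal)
```

The nonlinearity creates new high frequencies. Applying it on a finer grid and filtering before returning to the original grid keeps those frequencies from folding back as aliasing.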

The resulting networks "match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales." ^[4] The equivariance metrics EQ-T, EQ-Tfrac, and EQ-R show large improvements, while FID is roughly comparable to StyleGAN2 (slightly worse on a few benchmarks, slightly better on others). The authors note the results "pave the way for generative models better suited for video and animation," which made StyleGAN3 the preferred choice for animated avatars and video synthesis pipelines built on the family. ^[4]

StyleGAN-XL, StyleGAN-T, StyleGAN-V, and GigaGAN

Several follow-ups extend the family beyond NVIDIA Research:

Model	Venue	Authors	Contribution
StyleGAN-XL	SIGGRAPH 2022	Axel Sauer, Katja Schwarz, Andreas Geiger (University of Tubingen)	First class-conditional StyleGAN3 trained successfully on full ImageNet at up to 1024x1024 using projected GAN losses and progressive growing of synthesis stages. ^[10]
StyleGAN-V	CVPR 2022	Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny	Treats videos as continuous-time signals and adds a temporal mapping branch to StyleGAN2; trained on 1024x1024 video clips. ^[12]
StyleGAN-T	ICML 2023	Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, Timo Aila	Scales the StyleGAN architecture to text-to-image generation on LAION-style datasets, generating 512x512 images in under 0.1 seconds. ^[11]
GigaGAN	CVPR 2023	Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park (Adobe / POSTECH / CMU)	Demonstrates that a billion-parameter GAN with text conditioning, attention layers, and adaptive kernel selection can match diffusion-model quality on text-to-image while running orders of magnitude faster. Builds on StyleGAN's mapping network and modulated convolutions. ^[13]

Architecture details

What are the W and W+ latent spaces?

Research on StyleGAN inversion and editing distinguishes several latent spaces:

Space	Dimensionality	Origin	Notes
Z	512 per sample	Input Gaussian noise	Entangled; less useful for editing.
W	512 per sample	Output of the mapping network	More disentangled than Z; the canonical latent space.
W+	512 per layer (so 18 x 512 for 1024x1024)	Per-layer copies of W used in image inversion	Far more expressive; most pSp, e4e, and ReStyle inversions live here.
S	9,088 dimensions in StyleGAN2 1024x1024	Per-channel style scalars after the affine transforms	Identified by Wu, Lischinski, and Shechtman in StyleSpace; even more disentangled than W+. ^[14]
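The 18 x 512 shape of W+ in the table follows from two style inputs per resolution level from 4x4 up to the output resolution. A small sketch, with hypothetical helper names `num_style_layers` and `to_w_plus`, makes the arithmetic explicit:

```python
import math

def num_style_layers(resolution):
    """Number of per-layer style inputs for a power-of-two output resolution."""
    return 2 * int(math.log2(resolution)) - 2

def to_w_plus(w, resolution=1024):
    """Broadcast a single W code to W+ by repeating it once per layer."""
    return [list(w) for _ in range(num_style_layers(resolution))]

w = [0.0] * 512
w_plus = to_w_plus(w, resolution=1024)
```

A code produced this way still lies in W. Inversion methods that work in W+ let the 18 rows differ from one another, which is what gives the space its extra expressiveness.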

Path length regularisation

Path length regularisation draws a random image y with independent, unit-variance Gaussian pixel values and penalises the squared difference between the length of the Jacobian-vector product J_w^T y and a running average a. Formally,

L_pl = E_{w, y ~ N(0, I)} [ (||J_w^T y||_2 - a)^2 ]

where J_w is the Jacobian of the synthesis network with respect to w, evaluated at w, and a is updated as an exponential moving average of the observed lengths. The effect is that walking a fixed distance in W produces a roughly fixed-magnitude change in image space, a property that is critical for clean latent-space editing. ^[2]
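A toy version of the penalty can be written for a linear "generator", where the Jacobian is a fixed matrix and no automatic differentiation is needed. This is a sketch under stated assumptions: the function name `path_length_penalty`, the decay constant, and the linear map `A` are illustrative, and a real generator would obtain J_w^T y by backpropagation.

```python
import numpy as np

def path_length_penalty(jacobian_t_y, pl_mean, decay=0.01):
    """Compute the penalty and the updated running average a.

    jacobian_t_y: array of shape (N, D); row n holds J_w^T y for sample n.
    pl_mean:      current running average a of the path lengths.
    """
    lengths = np.sqrt((jacobian_t_y ** 2).sum(axis=1))
    # Exponential moving average of the observed lengths.
    pl_mean = pl_mean + decay * (lengths.mean() - pl_mean)
    # Squared deviation of each length from the running average.
    penalty = ((lengths - pl_mean) ** 2).mean()
    return penalty, pl_mean

rng = np.random.default_rng(0)
latent_dim, num_pixels, batch = 8, 64, 16
A = rng.normal(size=(num_pixels, latent_dim))   # toy generator: image = A @ w
y = rng.normal(size=(batch, num_pixels))        # random images, unit-variance pixels
jt_y = y @ A                                    # for a linear map, J^T y = A^T y
penalty, pl_mean = path_length_penalty(jt_y, pl_mean=0.0)
```

The penalty is zero only when every sampled length equals the running average, which is the "fixed step in W gives a fixed-size change in the image" condition stated above.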

Equivariance constraints

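StyleGAN3 measures translation equivariance by translating the generator's input features and checking whether the output image translates by the same amount. The score is a peak signal-to-noise ratio (PSNR, in dB) between the output rendered from the translated input and the translated version of the original output, reported for integer (EQ-T) and fractional-pixel (EQ-Tfrac) shifts. Rotation equivariance (EQ-R) does the same for rotations. The paper reports translation-equivariance PSNR above 60 dB for the StyleGAN3 configurations, compared to less than 20 dB for StyleGAN2; the rotation score of StyleGAN3-R is lower than its translation score but still far above StyleGAN2's. ^[4]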
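The idea behind these PSNR-based scores can be shown with a one-dimensional toy. This is a simplified sketch, not the paper's metric: it uses integer circular shifts via `np.roll`, the two toy "generators" are invented for the example, and the peak value of 2 assumes outputs in the range -1 to 1.

```python
import numpy as np

def eq_t_psnr(generator, inputs, shift, peak=2.0):
    """PSNR (dB) between generate(translate(z)) and translate(generate(z))."""
    errors = []
    for z in inputs:
        out_of_shifted = generator(np.roll(z, shift))
        shifted_out = np.roll(generator(z), shift)
        errors.append(np.mean((out_of_shifted - shifted_out) ** 2))
    mse = max(float(np.mean(errors)), 1e-20)   # floor avoids log of zero
    return 10.0 * np.log10(peak ** 2 / mse)

def equivariant_gen(z):
    # Circular 3-tap blur: commutes with circular shifts.
    return 0.25 * np.roll(z, 1) + 0.5 * z + 0.25 * np.roll(z, -1)

def sticky_gen(z):
    # Adds a pattern tied to absolute coordinates, mimicking texture sticking.
    return equivariant_gen(z) + 0.1 * np.sin(np.arange(len(z)))

rng = np.random.default_rng(0)
inputs = rng.uniform(-1, 1, size=(32, 64))
psnr_equivariant = eq_t_psnr(equivariant_gen, inputs, shift=3)
psnr_sticky = eq_t_psnr(sticky_gen, inputs, shift=3)
```

The coordinate-dependent term in `sticky_gen` does not move with the input, so the two outputs disagree and the PSNR drops sharply relative to the shift-commuting blur.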

What datasets is StyleGAN trained on?

StyleGAN papers and downstream work primarily use the following datasets:

Dataset	Size	Resolution	Domain
FFHQ	70,000	1024x1024	Human faces, Flickr-sourced, MTurk-filtered
CelebA-HQ	30,000	1024x1024	Curated celebrity faces, used in PGGAN
LSUN Bedrooms	3 million	256x256	Indoor scenes
LSUN Cars	5.5 million	512x384	Cars
LSUN Cats	1.6 million	256x256	Cats
LSUN Churches	126,000	256x256	Outdoor church facades
AFHQ	15,000	512x512	Animal faces (cats, dogs, wildlife)
MetFaces	1,336	1024x1024	Art-museum portraits, used to demonstrate StyleGAN2-ADA
BreCaHAD	~160 ROIs	1360x1024	Histopathology, used for ADA

The FFHQ dataset itself is distributed under a Creative Commons BY-NC-SA 4.0 license by NVIDIA, while each individual photo retains the Creative Commons or public-domain license under which it was originally posted to Flickr (CC BY 2.0, CC BY-NC 2.0, Public Domain Mark 1.0, CC0 1.0, or U.S. Government Works). ^[9]

How good is StyleGAN? (FID and other metrics)

FID (Frechet Inception Distance) numbers reported by the original papers on FFHQ at 1024x1024:

Model	FID50k on FFHQ-1024	Path length (PPL)	Notes
PGGAN (Karras et al. 2018)	8.04	high	Baseline before style injection
StyleGAN1 (config F)	4.40	reduced vs PGGAN	First style-based generator
StyleGAN2 (config F)	2.84	sharply reduced vs StyleGAN1	Weight demodulation, no progressive growing
StyleGAN2-ADA	2.92 with full data; remains usable on 5,000 images	comparable to StyleGAN2	Adaptive discriminator augmentation
StyleGAN3-T	2.79	comparable	Translation-equivariant
StyleGAN3-R	3.07	comparable	Translation- and rotation-equivariant

StyleGAN2-ADA also set a class-conditional CIFAR-10 FID record of 2.42 (down from 5.59), with an FID of 2.92 in the unconditional setting. ^[3] On limited-data benchmarks such as MetFaces, StyleGAN2-ADA reached an FID of 18.22 against 57.26 for plain StyleGAN2. ^[3]
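For readers unfamiliar with the metric, FID is the Frechet distance between two Gaussians fitted to feature vectors of real and generated images: ||mu_1 - mu_2||^2 + Tr(S_1 + S_2 - 2 (S_1 S_2)^(1/2)). The sketch below computes that distance with NumPy. Random arrays stand in for the Inception features, and the function names are assumptions made for the example.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals the sum of square roots of the eigenvalues of S1 S2.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

def fid_from_features(feats_a, feats_b):
    """Fit a Gaussian to each feature set (rows are samples) and compare them."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)
    return frechet_distance(mu_a, sigma_a, mu_b, sigma_b)

rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 8))            # stand-in for real-image features
fake_feats = rng.normal(loc=0.5, size=(200, 8))   # stand-in for generated-image features
score = fid_from_features(real_feats, fake_feats)
```

Lower is better, and two identical feature sets score zero. The "50k" in FID50k refers to the number of generated images used to estimate the statistics.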

Precision and recall, in the sense of Kynkaanniemi et al. 2019, are also widely reported. StyleGAN2 substantially improves recall over StyleGAN1, indicating that more of the training distribution is covered, not merely a sharper subset.

What is StyleGAN used for?

The main public-facing uses of StyleGAN cluster in a few categories:

Application area	Examples
Synthetic faces	ThisPersonDoesNotExist.com (Phillip Wang, February 2019), Generated Photos stock photo library, Which Face Is Real (University of Washington), and dozens of artist sites
Game and avatar creation	Procedural NPC faces, virtual influencers, customizable avatars in indie games
Art and design	NFT projects in 2021, Refik Anadol installations, MetFaces-style artistic portraits, Anna Ridler's tulip series
Data augmentation	Synthetic face datasets for face recognition training; has been studied at face-recognition vendors and in academic papers
Privacy	Anonymisation by replacing real faces in photos with StyleGAN-generated ones whose features differ; e.g., DeepPrivacy and follow-ups
Image editing via inversion	Project a real photo into W+ using e4e, pSp, or ReStyle, then edit with InterfaceGAN, GANSpace, StyleSpace, or StyleCLIP
Domain adaptation	StyleGAN-NADA uses CLIP losses to retarget a face model to cartoons, sketches, or other styles without paired data; MyStyle personalises the generator to a single subject
Toonification, age progression	Pixel2Style2Pixel and Toonify (Justin Pinkney) layer two StyleGAN checkpoints to swap face style between domains
Anime / illustration	Specialised StyleGAN2 fine-tunes such as Gwern's Anime Faces project and the early Waifu Labs models
Medical imaging	Synthetic histopathology, retinal images, and chest X-rays for augmentation and de-identification, using StyleGAN2-ADA
Video and animation	StyleGAN-V for continuous video, MoCoGAN-HD, and StyleGAN3-driven facial animation pipelines

StyleGAN-generated faces also became a reference point in discussions of deepfakes and synthetic media, since the same fidelity that makes a sampled face convincing can be misused for fake profiles and disinformation.

The ThisPersonDoesNotExist.com launch in mid-February 2019 is widely cited as the moment that brought GAN-generated faces into mainstream awareness. Phillip Wang, then a software engineer at Uber, had been studying AI on his own for six months and posted the site on a Facebook AI group on February 11, 2019, only days after NVIDIA released the StyleGAN code. ^[15]^[16] Wang told Inverse the project was meant "to raise awareness for this technology." ^[15] CNN reported in February 2019 that the site, which displays a freshly generated face on each page reload, had drawn millions of visitors within weeks; the launch was also covered by outlets including The Verge and Inverse. ^[16]

Strengths and limitations

Strengths

Image quality. Held the state of the art for unconditional image generation from 2019 through roughly 2022.
Disentangled latents. W and S spaces support clean attribute editing, latent traversals, and style mixing without retraining.
Fast inference. A single forward pass of the synthesis network produces a 1024x1024 image in milliseconds on a modern GPU, compared with 25 to 50 denoising steps for a typical diffusion model.
Large open-source ecosystem. Pretrained checkpoints exist for FFHQ, MetFaces, AFHQv2, LSUN classes, and dozens of fine-tunes; PyTorch, TensorFlow, and JAX implementations are available.
Strong inversion methods. Techniques such as pSp, e4e, ReStyle, HyperInverter, and PTI allow real images to be edited in latent space.

Limitations

Unconditional by default. Vanilla StyleGAN is not text-conditioned; pairing with CLIP or training a text-conditional variant such as StyleGAN-T or GigaGAN is required for prompt-driven generation.
Mode collapse on diverse data. StyleGAN performs poorly on highly diverse datasets such as ImageNet without the larger backbones and projected losses introduced in StyleGAN-XL.
Sensitivity to alignment. StyleGAN-quality faces on FFHQ depend on the consistent dlib alignment used in the training data; non-aligned web photos require pre-processing.
Maximum resolution. 1024x1024 is the largest commonly published resolution; higher resolutions require careful engineering and substantial compute.
Largely supplanted for general-purpose generation. Diffusion models such as Stable Diffusion, DALL-E 2, and Imagen have taken over text-to-image work, although GANs remain competitive on speed and on narrow domains such as faces and animation.

How influential was StyleGAN?

The StyleGAN papers have been cited tens of thousands of times on Google Scholar, with StyleGAN1 alone above ten thousand citations within a few years of publication. The mapping network and modulated convolution have been adopted in unrelated architectures including text-to-image GANs (GigaGAN), super-resolution networks, and even some diffusion model decoders.

The broader research subfields of GAN inversion and latent-space image editing effectively grew out of StyleGAN. The StyleSpace paper (Wu, Lischinski, Shechtman 2021) and InterfaceGAN (Shen et al. 2020) have themselves spawned hundreds of follow-ups. ^[14] StyleGAN3 remains the preferred backbone for video and avatar work in 2024 and 2025 because no diffusion model has matched its temporal smoothness without much heavier compute.

How does StyleGAN compare to other GAN architectures?

Model	Year	Max resolution	Best reported FID	Key innovation
DCGAN (Radford et al.)	2015	64x64	n/a	First convolutional GAN that trained reliably
WGAN (Arjovsky et al.)	2017	64x64	n/a	Wasserstein loss for stable training
PGGAN (Karras et al.)	2017	1024x1024	8.04 (FFHQ)	Progressive resolution growing
BigGAN (Brock et al.)	2018	512x512	7.4 (ImageNet)	Class-conditional, very large batches
StyleGAN1	2018	1024x1024	4.40 (FFHQ)	Style-based generator with mapping network and AdaIN
StyleGAN2	2019	1024x1024	2.84 (FFHQ)	Weight demodulation, path length regularisation
StyleGAN2-ADA	2020	1024x1024	2.42 (CIFAR-10)	Adaptive discriminator augmentation
StyleGAN3	2021	1024x1024	2.79 (FFHQ)	Alias-free, equivariant generator
StyleGAN-XL	2022	1024x1024	2.30 (ImageNet)	Projected losses, conditional training on ImageNet
GigaGAN	2023	1024x1024	9.09 (COCO 2014)	Billion-parameter text-conditional GAN

Is StyleGAN open source?

The original StyleGAN1 code on NVlabs/stylegan was released under a Creative Commons BY-NC 4.0 license, restricting it to non-commercial use. ^[6] StyleGAN2 was released under the NVIDIA Source Code License, also research-only. StyleGAN2-ADA and StyleGAN3 use the same NVIDIA Source Code License. ^[7]^[8] The FFHQ dataset is distributed under Creative Commons BY-NC-SA 4.0 by NVIDIA Corporation, with each individual image retaining the license under which it was originally posted to Flickr. ^[9]

The community has produced several permissively licensed re-implementations, but the official NVlabs checkpoints inherit the non-commercial restriction. This has been a recurring issue in commercial deployments of face-generation tools and is one reason that some downstream products (especially in advertising) train fresh checkpoints on properly licensed data.

References

1. Karras, T., Laine, S., & Aila, T. (2019). "A Style-Based Generator Architecture for Generative Adversarial Networks." CVPR 2019, pp. 4401-4410. arXiv:1812.04948. Submitted December 12, 2018. https://arxiv.org/abs/1812.04948
2. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). "Analyzing and Improving the Image Quality of StyleGAN." CVPR 2020. arXiv:1912.04958. Submitted December 3, 2019. https://arxiv.org/abs/1912.04958
3. Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., & Aila, T. (2020). "Training Generative Adversarial Networks with Limited Data." NeurIPS 2020. arXiv:2006.06676. Submitted June 11, 2020. https://arxiv.org/abs/2006.06676
4. Karras, T., Aittala, M., Laine, S., Harkonen, E., Hellsten, J., Lehtinen, J., & Aila, T. (2021). "Alias-Free Generative Adversarial Networks." NeurIPS 2021. arXiv:2106.12423. Submitted June 23, 2021. https://arxiv.org/abs/2106.12423
5. Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). "Progressive Growing of GANs for Improved Quality, Stability, and Variation." ICLR 2018. arXiv:1710.10196. https://arxiv.org/abs/1710.10196
6. NVlabs. "StyleGAN: Official TensorFlow Implementation." GitHub repository. https://github.com/NVlabs/stylegan
7. NVlabs. "StyleGAN2-ADA: Official PyTorch Implementation." GitHub repository. https://github.com/NVlabs/stylegan2-ada-pytorch
8. NVlabs. "StyleGAN3." GitHub repository. https://github.com/NVlabs/stylegan3
9. NVlabs. "Flickr-Faces-HQ Dataset (FFHQ)." GitHub repository. https://github.com/NVlabs/ffhq-dataset
10. Sauer, A., Schwarz, K., & Geiger, A. (2022). "StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets." SIGGRAPH 2022. arXiv:2202.00273. https://arxiv.org/abs/2202.00273
11. Sauer, A., Karras, T., Laine, S., Geiger, A., & Aila, T. (2023). "StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis." ICML 2023. arXiv:2301.09515. https://arxiv.org/abs/2301.09515
12. Skorokhodov, I., Tulyakov, S., & Elhoseiny, M. (2022). "StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2." CVPR 2022. arXiv:2112.14683. https://arxiv.org/abs/2112.14683
13. Kang, M., Zhu, J.-Y., Zhang, R., Park, J., Shechtman, E., Paris, S., & Park, T. (2023). "Scaling up GANs for Text-to-Image Synthesis." CVPR 2023. arXiv:2303.05511. https://arxiv.org/abs/2303.05511
14. Wu, Z., Lischinski, D., & Shechtman, E. (2021). "StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation." CVPR 2021. arXiv:2011.12799. https://arxiv.org/abs/2011.12799
15. Matsakis, L. (2019). "'This Person Does Not Exist' Creator Reveals His Site's Creepy Origin Story." Inverse, February 20, 2019. https://www.inverse.com/article/53414-this-person-does-not-exist-creator-interview
16. Metz, R., & Garcia, A. (2019). "These people do not exist. Why websites are churning out fake images of people (and cats)." CNN Business, February 28, 2019. https://www.cnn.com/2019/02/28/tech/ai-fake-faces
