See also: Machine learning terms, Generative model, Deep learning
A Generative Adversarial Network (GAN) is a machine learning framework introduced by Ian Goodfellow and colleagues in 2014 [1]. The core idea involves two neural networks competing against each other in a game-theoretic setup: a generator that produces synthetic data and a discriminator that attempts to distinguish real data from generated samples. Through this adversarial process, the generator gradually learns to produce increasingly realistic outputs, while the discriminator becomes better at detecting fakes. When training succeeds, the generator captures the underlying data distribution well enough that the discriminator can no longer reliably tell real samples from generated ones.
GANs represented a major breakthrough in generative modeling and have been applied to image synthesis, super-resolution, image-to-image translation, data augmentation, drug discovery, and many other tasks. While diffusion models have overtaken GANs on some image generation benchmarks since 2021, GANs remain widely used and studied for their fast inference and their ability to produce high-fidelity outputs.
Imagine two people playing a game. One person is an art forger who tries to paint fake copies of famous paintings. The other person is a detective whose job is to figure out which paintings are real and which are fake. At first, the forger is terrible and the detective catches every fake easily. But the forger keeps practicing and getting better. The detective also keeps improving at spotting fakes. Over time, the forger becomes so skilled that even the detective has trouble telling real from fake. In a GAN, the "forger" is the generator network and the "detective" is the discriminator network, and they keep pushing each other to improve through competition.
Ian Goodfellow conceived the idea of GANs during a discussion with colleagues at a Montreal bar in 2014. He implemented the first prototype that same evening, and the resulting paper, "Generative Adversarial Nets," was published at the Conference on Neural Information Processing Systems (NIPS, since renamed NeurIPS) in December 2014 [1]. The paper proposed a fundamentally new approach to generative modeling that required neither explicit density estimation nor Markov chain sampling, setting it apart from previous methods such as restricted Boltzmann machines and variational autoencoders.
Yann LeCun, a Turing Award laureate, famously described adversarial training as "the most interesting idea in the last 10 years in machine learning" [2]. The original GAN paper has since accumulated over 65,000 citations and spawned hundreds of architectural variants.
GANs consist of two primary components: the generator and the discriminator. Both are typically implemented as deep neural networks and are trained simultaneously.
The generator takes a random noise vector z, sampled from a prior distribution (usually a Gaussian or uniform distribution), and maps it to the data space through a series of learned transformations. The goal of the generator is to produce samples that are indistinguishable from real data. In image generation, the generator typically uses transposed convolutions (sometimes called fractional-strided convolutions) to progressively upsample the noise vector into a full-resolution image.
The discriminator is a binary classifier that receives both real data samples and generated samples. It outputs a probability indicating whether a given input is real (drawn from the training set) or fake (produced by the generator). In image tasks, the discriminator typically uses strided convolutions to downsample the input and produce a scalar classification output.
During training, the generator never sees the real data directly. Instead, it receives learning signals through the gradients that flow back from the discriminator. The discriminator, on the other hand, sees both real and generated samples and learns to tell them apart. This indirect feedback loop is what drives the adversarial training process.
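As a concrete illustration, the following is a minimal PyTorch sketch of the two networks; the layer widths, the 64-dimensional latent vector, and the flattened 28x28 output are arbitrary choices for this example rather than part of any canonical GAN. The same modules are reused in the training-loop sketch further below.

```python
import torch
import torch.nn as nn

LATENT_DIM = 64     # size of the noise vector z (arbitrary choice)
DATA_DIM = 28 * 28  # e.g. flattened grayscale images

# Generator: maps noise z to a synthetic sample in data space.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, DATA_DIM),
    nn.Tanh(),  # outputs in [-1, 1], matching data scaled to that range
)

# Discriminator: maps a sample to the probability that it is real.
discriminator = nn.Sequential(
    nn.Linear(DATA_DIM, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),  # probability of "real"
)

z = torch.randn(16, LATENT_DIM)  # batch of noise vectors
fake = generator(z)              # synthetic samples, shape (16, 784)
p = discriminator(fake)          # estimated probability each sample is real
```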
The training objective of a GAN is formulated as a minimax game between the generator G and discriminator D. The original loss function proposed by Goodfellow et al. is:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

where:

- x is a real sample drawn from the data distribution p_data
- z is a noise vector drawn from the prior distribution p_z
- D(x) is the discriminator's estimated probability that x is real, and G(z) is the generator's output for noise z
- E denotes the expectation over the indicated distribution
The discriminator tries to maximize this objective by correctly classifying real samples (pushing D(x) toward 1) and generated samples (pushing D(G(z)) toward 0). The generator tries to minimize it by fooling the discriminator (pushing D(G(z)) toward 1).
Goodfellow et al. proved that, given sufficient model capacity and training time, this minimax game has a unique global optimum where G perfectly replicates the real data distribution (p_g = p_data) and D outputs 0.5 for all inputs, meaning it cannot distinguish real from fake [1].
In practice, the loss is decomposed into two separate optimization steps that alternate during training:

1. Discriminator step: holding G fixed, update D to maximize log D(x) + log(1 - D(G(z))) on a minibatch of real and generated samples.
2. Generator step: holding D fixed, update G to minimize log(1 - D(G(z))), or, in the widely used non-saturating variant, to maximize log D(G(z)).
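The alternation is straightforward to express in code. Below is a minimal sketch of one training iteration, reusing the `generator`, `discriminator`, and `LATENT_DIM` definitions from the earlier sketch; the Adam learning rates and batch handling are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

# generator, discriminator, and LATENT_DIM as defined in the previous sketch
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

def train_step(real_batch):
    batch_size = real_batch.size(0)
    ones = torch.ones(batch_size, 1)
    zeros = torch.zeros(batch_size, 1)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z))).
    z = torch.randn(batch_size, LATENT_DIM)
    fake = generator(z).detach()  # do not backpropagate into G here
    d_loss = (F.binary_cross_entropy(discriminator(real_batch), ones)
              + F.binary_cross_entropy(discriminator(fake), zeros))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: non-saturating variant, maximize log D(G(z)).
    z = torch.randn(batch_size, LATENT_DIM)
    g_loss = F.binary_cross_entropy(discriminator(generator(z)), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```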
The connection to information theory is notable: when the discriminator is optimal, the original GAN objective equals 2·JSD(p_data, p_g) − log 4, where JSD denotes the Jensen-Shannon divergence between the real and generated distributions. Training the generator against an optimal discriminator therefore amounts to minimizing this divergence [1].
GAN training seeks a Nash equilibrium, a state from game theory where neither the generator nor the discriminator can improve by changing its strategy alone. In the ideal case, the generator produces perfect samples and the discriminator assigns equal probability to real and fake inputs. However, finding this equilibrium in the high-dimensional, non-convex parameter space of neural networks is notoriously difficult, and training often fails to converge in practice.
Mode collapse is one of the most common failure modes in GAN training. It occurs when the generator learns to produce only a small subset of the possible outputs, ignoring the full diversity of the real data distribution. For example, a GAN trained on handwritten digits might learn to generate only the digit "1" while ignoring all other digits. This happens because the generator finds a few outputs that reliably fool the discriminator and has no incentive to explore other modes of the distribution.
The competing objectives of the generator and discriminator can lead to oscillations where neither network converges. If the discriminator becomes too powerful, the generator receives vanishing gradients and stops learning. If the generator becomes too powerful, the discriminator cannot provide useful feedback. Balancing the training of both networks is a persistent challenge.
When the discriminator is very confident in its classifications, the gradients flowing back to the generator become extremely small, effectively halting generator learning. This problem is particularly acute with the original GAN loss function that uses log(1 - D(G(z))), which saturates when D(G(z)) is close to 0. The non-saturating loss variant (maximizing log D(G(z))) partially addresses this issue.
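A small numerical illustration of this saturation, written against the discriminator's pre-sigmoid logits (the specific values are arbitrary):

```python
import torch

# Discriminator logits for generated samples; very negative logits mean
# the discriminator confidently rejects them as fake.
logits = torch.tensor([-6.0, -4.0, -2.0], requires_grad=True)
d_fake = torch.sigmoid(logits)  # D(G(z)) close to 0

# Saturating loss (original paper): minimize log(1 - D(G(z))).
loss_sat = torch.log(1 - d_fake).sum()
loss_sat.backward()
print(logits.grad)  # ~[-0.0025, -0.0180, -0.1192]: vanishingly small gradients

logits.grad = None
# Non-saturating variant: maximize log D(G(z)), i.e. minimize -log D(G(z)).
loss_ns = -torch.log(d_fake).sum()
loss_ns.backward()
print(logits.grad)  # ~[-0.9975, -0.9820, -0.8808]: strong gradients remain
```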
GANs are notoriously sensitive to hyperparameter choices, including learning rates, batch normalization settings, network architectures, and latent space dimensions. Small changes in these settings can dramatically affect whether training converges, diverges, or collapses.
Since the original 2014 paper, hundreds of GAN variants have been proposed. The following table summarizes the most influential ones.
| Variant | Year | Authors | Key contribution |
|---|---|---|---|
| CGAN (Conditional GAN) | 2014 | Mirza and Osindero | Conditions generation on class labels or other auxiliary information |
| DCGAN | 2015 | Radford, Metz, Chintala | Introduced convolutional architecture guidelines for stable GAN training |
| Pix2Pix | 2016 | Isola et al. | Paired image-to-image translation using conditional GANs |
| WGAN | 2017 | Arjovsky, Chintala, Bottou | Replaced JS divergence with Wasserstein distance for stable training |
| WGAN-GP | 2017 | Gulrajani et al. | Added gradient penalty to enforce Lipschitz constraint instead of weight clipping |
| ProGAN | 2017 | Karras et al. (NVIDIA) | Progressive growing of both networks from low to high resolution |
| CycleGAN | 2017 | Zhu et al. | Unpaired image-to-image translation using cycle consistency loss |
| StackGAN | 2017 | Zhang et al. | Multi-stage text-to-image generation with conditioning augmentation |
| SAGAN | 2018 | Zhang et al. | Self-attention mechanism for capturing long-range dependencies |
| BigGAN | 2018 | Brock et al. | Large-scale class-conditional generation with truncation trick |
| StyleGAN | 2018 | Karras et al. (NVIDIA) | Style-based generator with adaptive instance normalization |
| StyleGAN2 | 2020 | Karras et al. (NVIDIA) | Weight demodulation, removed blob artifacts, perceptual path length regularization |
| StyleGAN3 | 2021 | Karras et al. (NVIDIA) | Alias-free generation eliminating texture sticking artifacts |
Deep Convolutional GAN (DCGAN), introduced by Radford, Metz, and Chintala in 2015, was one of the first architectures to demonstrate that convolutional neural networks could be used effectively in GANs [3]. DCGAN established several architectural guidelines that became standard practice: replacing pooling layers with strided convolutions, using batch normalization in both the generator and discriminator, removing fully connected hidden layers, using ReLU activation in the generator (except for the output layer, which uses Tanh), and using LeakyReLU in the discriminator. These guidelines significantly improved training stability and became the default starting point for many subsequent GAN architectures.
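The guidelines translate directly into code. Below is a sketch of a generator in the DCGAN style; the channel counts and the 64x64 RGB output are illustrative choices rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Maps a latent vector to a 64x64 RGB image, following DCGAN guidelines."""
    def __init__(self, latent_dim=100, base_channels=128):
        super().__init__()
        c = base_channels
        self.net = nn.Sequential(
            # 1x1 -> 4x4: project the latent vector with a transposed convolution
            nn.ConvTranspose2d(latent_dim, c * 8, 4, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(c * 8),
            nn.ReLU(inplace=True),
            # 4x4 -> 8x8
            nn.ConvTranspose2d(c * 8, c * 4, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(c * 4),
            nn.ReLU(inplace=True),
            # 8x8 -> 16x16
            nn.ConvTranspose2d(c * 4, c * 2, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(c * 2),
            nn.ReLU(inplace=True),
            # 16x16 -> 32x32
            nn.ConvTranspose2d(c * 2, c, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True),
            # 32x32 -> 64x64, Tanh output as the guidelines prescribe
            nn.ConvTranspose2d(c, 3, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

img = DCGANGenerator()(torch.randn(2, 100))  # -> shape (2, 3, 64, 64)
```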
The Wasserstein GAN (WGAN), proposed by Arjovsky, Chintala, and Bottou in 2017, addressed fundamental training stability issues by replacing the Jensen-Shannon divergence in the original GAN loss with the Wasserstein distance (also called Earth Mover's distance) [4]. The Wasserstein distance provides smoother gradients even when the generator's distribution and the real distribution have minimal overlap, a situation where the original GAN loss produces uninformative gradients.
To compute the Wasserstein distance, WGAN requires the discriminator (called the "critic" in this context) to satisfy a Lipschitz continuity constraint. The original WGAN enforced this through weight clipping, which worked but introduced problems such as underused model capacity and exploding or vanishing gradients.
WGAN-GP, proposed by Gulrajani et al. later in 2017, replaced weight clipping with a gradient penalty term that directly penalizes the norm of the critic's gradient with respect to its input [5]. The gradient penalty encourages the critic's gradient norm to stay close to 1, providing a more principled and effective way to enforce the Lipschitz constraint. WGAN-GP became one of the most popular GAN training methods due to its improved stability and reduced sensitivity to architectural choices and hyperparameters.
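A sketch of the penalty term as the paper describes it: the critic's gradient is evaluated at random interpolations between real and generated samples, and its norm is pushed toward 1. The `critic` module is a placeholder assumed to output an unbounded scalar score per sample, and the default weight of 10 follows the paper's choice of lambda.

```python
import torch

def gradient_penalty(critic, real, fake, gp_weight=10.0):
    """WGAN-GP penalty: push the critic's gradient norm toward 1
    at random points interpolated between real and fake samples."""
    batch_size = real.size(0)
    # Per-sample interpolation coefficients, broadcastable over data dims
    eps_shape = [batch_size] + [1] * (real.dim() - 1)
    eps = torch.rand(eps_shape, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)

    scores = critic(interp)
    grads, = torch.autograd.grad(
        outputs=scores.sum(), inputs=interp, create_graph=True
    )
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return gp_weight * ((grad_norm - 1) ** 2).mean()
```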
Progressive Growing of GANs (ProGAN), proposed by Karras et al. at NVIDIA in 2017, introduced the idea of training GANs incrementally, starting at low resolution (such as 4x4 pixels) and progressively adding layers to both the generator and discriminator to handle higher resolutions (up to 1024x1024) [6]. This approach made high-resolution image generation feasible for the first time and greatly improved training stability, since each resolution stage could be trained relatively quickly. ProGAN also introduced minibatch standard deviation as a technique to increase output diversity and equalized learning rate to normalize weight updates across layers.
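A simplified sketch of the minibatch standard deviation idea: the per-feature standard deviation across the batch is summarized into one scalar and appended as an extra feature map, giving the discriminator a direct signal about batch diversity. Using a single group for the whole batch is a simplification here; the paper splits the batch into groups.

```python
import torch

def minibatch_stddev(x, eps=1e-8):
    """Append a feature map holding the mean per-feature batch stddev.

    x: (N, C, H, W) feature maps. Returns (N, C + 1, H, W).
    """
    std = x.std(dim=0, unbiased=False)               # (C, H, W): variation across the batch
    mean_std = (std + eps).mean().view(1, 1, 1, 1)   # one scalar summarizing diversity
    extra = mean_std.expand(x.size(0), 1, x.size(2), x.size(3))
    return torch.cat([x, extra], dim=1)
```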
The StyleGAN family, developed by Tero Karras and colleagues at NVIDIA, represents one of the most significant advances in GAN-based image generation.
StyleGAN (2018): Introduced a style-based generator architecture that uses a mapping network to transform the latent code into an intermediate latent space, then applies the style information at each layer through adaptive instance normalization (AdaIN; a sketch appears after this overview) [7]. This design allows control over different levels of detail, with earlier layers controlling coarse features (pose, face shape) and later layers controlling fine details (hair texture, skin). StyleGAN also added per-layer noise inputs to control stochastic variation such as freckles and hair placement.
StyleGAN2 (2020): Addressed several artifacts present in StyleGAN outputs, most notably the "blob" artifacts caused by AdaIN [8]. StyleGAN2 replaced AdaIN with weight demodulation, which applies style information by modulating and demodulating convolution weights rather than normalizing feature maps. It also introduced perceptual path length regularization to encourage smoother latent space interpolation and used residual connections. The result was significantly improved image quality.
StyleGAN3 (2021): Tackled the "texture sticking" problem, where generated textures appear fixed to pixel coordinates rather than moving naturally with underlying features [9]. The solution involved a thorough analysis of aliasing in the generator network and the introduction of alias-free operations throughout the architecture. StyleGAN3 treats all internal signals as continuous, ensuring that features transform smoothly under translation and rotation.
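Returning to the AdaIN operation used by the original StyleGAN (and replaced in StyleGAN2), here is a minimal sketch: the feature map is normalized per channel, then rescaled and shifted by style-derived parameters. The affine mapping that produces the per-channel scale and bias from the intermediate latent w is assumed to be learned elsewhere.

```python
import torch

def adain(x, style_scale, style_bias, eps=1e-5):
    """Adaptive instance normalization.

    x: (N, C, H, W) feature maps.
    style_scale, style_bias: (N, C) per-channel style parameters,
    produced by a learned affine map of the intermediate latent w.
    """
    mu = x.mean(dim=(2, 3), keepdim=True)  # per-sample, per-channel statistics
    sigma = x.std(dim=(2, 3), keepdim=True, unbiased=False) + eps
    normalized = (x - mu) / sigma
    return style_scale[:, :, None, None] * normalized + style_bias[:, :, None, None]
```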
Pix2Pix (2016): Proposed by Isola et al., Pix2Pix uses conditional GANs for paired image-to-image translation [10]. Given paired training data (for example, satellite images and corresponding maps), Pix2Pix learns a mapping from one image domain to another. The generator uses a U-Net architecture with skip connections, and the discriminator operates on local image patches (PatchGAN) rather than the entire image, which improves texture quality.
CycleGAN (2017): Proposed by Zhu et al., CycleGAN enables unpaired image-to-image translation by introducing cycle consistency loss [11]. Instead of requiring matched image pairs, CycleGAN uses two generator-discriminator pairs (one for each translation direction) and enforces that translating an image to the target domain and back should recover the original image. This approach enabled applications such as converting photographs to paintings, transforming horses into zebras, and seasonal landscape changes.
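A minimal sketch of the cycle consistency term, with `g_ab` and `g_ba` as placeholder generator modules for the two translation directions; the L1 distance and a weight of 10 follow the paper's formulation.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(g_ab, g_ba, real_a, real_b, weight=10.0):
    """||G_BA(G_AB(a)) - a||_1 + ||G_AB(G_BA(b)) - b||_1"""
    recon_a = g_ba(g_ab(real_a))  # A -> B -> back to A
    recon_b = g_ab(g_ba(real_b))  # B -> A -> back to B
    return weight * (F.l1_loss(recon_a, real_a) + F.l1_loss(recon_b, real_b))
```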
Conditional GANs (CGANs), introduced by Mirza and Osindero in 2014, extend the original GAN framework by conditioning both the generator and discriminator on additional information, such as class labels, text descriptions, or other data [12]. This conditioning allows users to control what the generator produces. CGANs formed the basis for text-to-image generation models that preceded diffusion models, including StackGAN (2017), which generated 256x256 images from text descriptions using a multi-stage approach, and AttnGAN (2018), which incorporated attention mechanisms to align image regions with specific words in the input description.
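A minimal sketch of label conditioning in the CGAN style: the class label is embedded and concatenated with the noise vector before entering the generator (a discriminator would symmetrically receive the label alongside its input). All sizes here are illustrative.

```python
import torch
import torch.nn as nn

NUM_CLASSES, LATENT_DIM, EMBED_DIM = 10, 64, 16

label_embedding = nn.Embedding(NUM_CLASSES, EMBED_DIM)
cond_generator = nn.Sequential(
    nn.Linear(LATENT_DIM + EMBED_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, 28 * 28),
    nn.Tanh(),
)

z = torch.randn(8, LATENT_DIM)
labels = torch.randint(0, NUM_CLASSES, (8,))
# Conditioning: the generator sees both the noise and the desired class.
fake = cond_generator(torch.cat([z, label_embedding(labels)], dim=1))
```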
BigGAN, introduced by Brock et al. in 2018, demonstrated that scaling up GAN training (with larger batch sizes, more parameters, and class conditioning) could dramatically improve image quality [13]. Trained on ImageNet at 128x128 resolution, BigGAN achieved an Inception Score of 166.5 and an FID of 7.4, far surpassing previous state-of-the-art results. BigGAN introduced the "truncation trick," which trades off sample diversity for fidelity by reducing the variance of the noise input at inference time, and used orthogonal regularization in the generator to improve amenability to truncation.
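A sketch of the truncation trick under the common resampling formulation: latent components whose magnitude exceeds a threshold are redrawn, concentrating samples near the mode of the prior. The threshold value is illustrative.

```python
import torch

def truncated_noise(batch_size, dim, threshold=0.5):
    """Sample z ~ N(0, I), redrawing any component with |z_i| > threshold.

    Smaller thresholds concentrate samples near the mode of the latent
    prior: higher fidelity, lower diversity.
    """
    z = torch.randn(batch_size, dim)
    out_of_range = z.abs() > threshold
    while out_of_range.any():
        z[out_of_range] = torch.randn(int(out_of_range.sum()))
        out_of_range = z.abs() > threshold
    return z
```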
GANs have found widespread use across many domains. The table below summarizes key application areas.
| Application | Description | Notable models |
|---|---|---|
| Image synthesis | Generating photorealistic images of faces, objects, scenes | StyleGAN, BigGAN, ProGAN |
| Image super-resolution | Enhancing low-resolution images to higher resolution | SRGAN, ESRGAN |
| Image inpainting | Filling in missing or damaged regions of images | DeepFill, EdgeConnect |
| Image-to-image translation | Converting images from one domain to another | Pix2Pix, CycleGAN, SPADE/GauGAN |
| Text-to-image generation | Creating images from text descriptions | StackGAN, AttnGAN, GigaGAN |
| Data augmentation | Generating synthetic training data to improve classifier performance | Various domain-specific GANs |
| Video generation | Synthesizing realistic video sequences | MoCoGAN, DVD-GAN |
| Medical imaging | Synthesizing medical images for training and augmentation | MedGAN, various specialized architectures |
| Drug discovery | Generating novel molecular structures with desired properties | ORGAN, MolGAN, MedGAN |
| Style transfer | Applying artistic styles to photographs | AdaIN-based GANs, CycleGAN |
| 3D object generation | Creating three-dimensional shapes and scenes | 3D-GAN, EG3D |
| Audio synthesis | Generating realistic speech and music | WaveGAN, GANSynth |
SRGAN (Super-Resolution GAN), proposed by Ledig et al. in 2017, was one of the first models to use a GAN framework for image super-resolution. By training the generator to produce upscaled images that fool a discriminator, SRGAN produces perceptually sharper results than methods trained solely with pixel-wise loss functions such as mean squared error. ESRGAN (Enhanced SRGAN) further improved on this approach with a residual-in-residual dense block architecture and relativistic discriminator.
GANs have been applied to drug discovery by generating novel molecular structures with desired pharmacological properties. Models such as ORGAN (Objective-Reinforced GAN) and MolGAN use GAN-based frameworks to generate molecular graphs or SMILES strings. Recent work, such as the MedGAN model combining Wasserstein GANs with graph convolutional networks, has demonstrated the ability to generate novel, chemically valid, and diverse molecules for pharmaceutical research.
In domains where training data is scarce or expensive to collect, GANs can generate synthetic samples to augment existing datasets. This approach has proven particularly valuable in medical imaging, where annotated data is limited. GAN-augmented datasets have been shown to improve the performance of downstream classifiers in tasks such as tumor detection and pathology image analysis.
Evaluating GANs is challenging because there is no single metric that captures all aspects of generation quality. The following metrics are most commonly used.
| Metric | What it measures | How it works | Strengths | Limitations |
|---|---|---|---|---|
| Frechet Inception Distance (FID) | Overall quality and diversity | Compares mean and covariance of Inception v3 features between real and generated image sets; lower is better | Captures both quality and diversity in a single score | Sensitive to sample size; relies on Inception network features |
| Inception Score (IS) | Quality and diversity | Uses Inception v3 to classify generated images; measures confidence and class diversity; higher is better | Easy to compute; correlates with human judgment for ImageNet | Does not compare to real data; can be gamed; biased toward ImageNet classes |
| LPIPS | Perceptual similarity | Uses deep network features to compute perceptual distance between image pairs; lower means more similar | Aligns well with human perception of image similarity | Measures pairwise similarity, not distributional properties |
| Precision and Recall | Quality vs. diversity separately | Measures what fraction of generated samples are realistic (precision) and what fraction of the real distribution is covered (recall) | Disentangles quality from diversity | Requires density estimation in feature space |
The Frechet Inception Distance (FID) has become the de facto standard for evaluating GAN outputs. It works by feeding both real and generated images through a pretrained Inception v3 network, extracting feature vectors from an intermediate layer, fitting multivariate Gaussian distributions to both sets of features, and computing the Frechet distance between the two Gaussians. A lower FID indicates that the generated images are closer to the real images in both quality and diversity.
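A minimal sketch of the final distance computation, assuming the Inception v3 feature extraction has already been performed elsewhere and the activations are available as NumPy arrays:

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    """FID core: Frechet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: (N, D) arrays of Inception v3 activations.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)

    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerical error

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```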
GANs are one of several major approaches to generative modeling. The following table compares GANs with variational autoencoders (VAEs) and diffusion models.
| Property | GANs | VAEs | Diffusion models |
|---|---|---|---|
| Training objective | Adversarial minimax game | Evidence lower bound (ELBO) maximization | Denoising score matching |
| Sample quality | High fidelity, sharp images | Often blurry due to pixel-wise reconstruction loss | State-of-the-art fidelity and diversity |
| Sample diversity | Can suffer from mode collapse | Generally good diversity | Excellent diversity |
| Training stability | Notoriously unstable; requires careful tuning | Stable; straightforward optimization | Stable; simple loss function |
| Inference speed | Fast (single forward pass through generator) | Fast (single forward pass through decoder) | Slow (requires many iterative denoising steps) |
| Likelihood estimation | No explicit likelihood | Provides lower bound on likelihood | Can compute exact likelihood via probability flow ODE |
| Mode coverage | Partial (prone to mode collapse) | Good | Excellent |
| Latent space | Learned implicitly; can be less smooth | Explicitly regularized; typically smooth | Defined by diffusion process |
Since 2021, diffusion models (such as DALL-E 2, Stable Diffusion, and Imagen) have surpassed GANs on many image generation benchmarks, particularly in diversity and mode coverage. However, GANs retain advantages in inference speed, since they generate samples in a single forward pass rather than requiring hundreds of denoising steps. Some recent work explores hybrid approaches that combine the strengths of both paradigms.
Based on years of community experience, the following practices have been found to improve GAN training (several of them are combined in a code sketch at the end of this list):
Use WGAN-GP or spectral normalization as a starting point. Both provide more stable training than the original GAN loss. Spectral normalization constrains the Lipschitz constant of the discriminator by normalizing the spectral norm of each weight matrix.
Apply the two time-scale update rule (TTUR). Use different learning rates for the generator and discriminator, typically with a lower learning rate for the generator. Heusel et al. showed that GANs trained with TTUR converge to a local Nash equilibrium.
Use batch normalization in the generator and consider spectral normalization or layer normalization in the discriminator. Avoid batch normalization in the discriminator when using WGAN-GP, as it can interfere with the gradient penalty computation.
Monitor FID during training rather than relying solely on the loss values. GAN losses are not directly interpretable in the same way as standard classification or regression losses.
Use the non-saturating loss for the generator (maximize log D(G(z)) instead of minimize log(1 - D(G(z)))). This provides stronger gradients early in training when the generator is still poor.
Add label smoothing for real samples (use target values of 0.9 instead of 1.0 for real data labels) to prevent the discriminator from becoming overconfident.
Use minibatch discrimination or minibatch standard deviation to encourage the generator to produce diverse outputs and reduce mode collapse.
Start with proven architectures such as DCGAN guidelines before experimenting with novel designs. Debug the implementation thoroughly before tuning hyperparameters.
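A compact sketch combining several of these practices: spectral normalization on the discriminator (via `torch.nn.utils.spectral_norm`), TTUR-style asymmetric learning rates, one-sided label smoothing, and the non-saturating generator loss. The architecture, learning rates, and momentum settings are illustrative defaults, not prescriptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

LATENT_DIM = 64

generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.BatchNorm1d(256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),
)
# Spectral normalization constrains each layer's Lipschitz constant.
discriminator = nn.Sequential(
    spectral_norm(nn.Linear(28 * 28, 256)), nn.LeakyReLU(0.2),
    spectral_norm(nn.Linear(256, 1)),  # raw logit; no sigmoid
)

# TTUR: a higher learning rate for the discriminator than the generator.
opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.9))
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))

def d_loss(real, fake):
    # One-sided label smoothing: real targets of 0.9 instead of 1.0.
    real_targets = torch.full((real.size(0), 1), 0.9)
    fake_targets = torch.zeros(fake.size(0), 1)
    return (F.binary_cross_entropy_with_logits(discriminator(real), real_targets)
            + F.binary_cross_entropy_with_logits(discriminator(fake), fake_targets))

def g_loss(fake):
    # Non-saturating generator loss: maximize log D(G(z)).
    return F.binary_cross_entropy_with_logits(
        discriminator(fake), torch.ones(fake.size(0), 1))
```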
The ability of GANs to generate highly realistic synthetic media has raised serious ethical and societal concerns. The most prominent issue is deepfakes, which are synthetic images, videos, or audio created by AI that convincingly depict people saying or doing things they never actually did.
Research has documented a 550% increase in AI-manipulated media between 2019 and 2023, highlighting the rapid proliferation of this technology. GAN-generated deepfakes have been used for identity fraud, non-consensual synthetic pornography, political misinformation, and financial scams.
Detecting GAN-generated content remains an active area of research. Early detection methods relied on identifying GAN-specific artifacts, such as spectral patterns in the frequency domain or inconsistencies in facial features. However, as generation quality improves, detection becomes increasingly difficult. Models trained to detect GAN-based deepfakes often struggle with outputs from diffusion-based generators, which produce different artifact patterns.
Regulatory responses have emerged worldwide. The European Union's AI Act includes transparency requirements for synthetic media, and China has established rules requiring watermarking, traceability, and identity verification for deep synthesis content. Best practices include watermarking AI-generated content, implementing use restrictions in generation APIs, and developing robust detection tools.
| Year | Development |
|---|---|
| 2014 | Goodfellow et al. publish the original GAN paper; Mirza and Osindero propose Conditional GAN |
| 2015 | Radford et al. introduce DCGAN with convolutional architecture guidelines |
| 2016 | Isola et al. propose Pix2Pix for paired image-to-image translation |
| 2017 | Arjovsky et al. propose WGAN; Gulrajani et al. introduce WGAN-GP; Karras et al. propose ProGAN; Zhu et al. introduce CycleGAN; Zhang et al. propose StackGAN |
| 2018 | Karras et al. introduce StyleGAN; Brock et al. introduce BigGAN; Zhang et al. propose SAGAN |
| 2020 | Karras et al. release StyleGAN2 with weight demodulation |
| 2021 | Karras et al. release alias-free StyleGAN3; diffusion models begin surpassing GANs on benchmarks |
| 2022-present | Hybrid GAN-diffusion approaches emerge; GANs remain dominant for real-time and low-latency applications |