# Generative Model

> Source: https://aiwiki.ai/wiki/generative_model
> Updated: 2026-07-12
> Categories: Deep Learning, Generative AI, Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

*See also: [Machine learning terms](/wiki/machine_learning_terms), [Discriminative model](/wiki/discriminative_model)*

## What is a generative model?

A **generative model** is a statistical or [machine learning](/wiki/machine_learning) model that learns the probability distribution P(X) of the observed data, or the joint distribution P(X, Y) of inputs and labels, in order to generate new data samples that resemble the training distribution. In contrast to [discriminative models](/wiki/discriminative_model), which learn the conditional probability P(Y|X) to classify inputs, generative models capture how the data itself is produced.[3] This is the defining difference: a discriminative model learns the boundary between classes, while a generative model learns the full data-generating process and can therefore be used to synthesize images, write text, compose music, design molecules, and augment datasets.

Generative models underpin many of the most widely used modern AI systems: [GPT](/wiki/gpt_generative_pre-trained_transformer), [DALL-E](/wiki/dall-e), and [Stable Diffusion](/wiki/stable_diffusion) are all built on generative modeling principles. The field draws on decades of research in probability theory, Bayesian statistics, information theory, and [deep learning](/wiki/deep_model), and it has expanded rapidly since 2014 with the introduction of [generative adversarial networks](/wiki/generative_adversarial_network_gan) (GANs) and variational autoencoders (VAEs).[1][2] The category reached mainstream awareness in late 2022 when [ChatGPT](/wiki/chatgpt) attracted an estimated 100 million monthly active users within roughly two months of launch, which a UBS analysis called the fastest-growing consumer application in history.[13]

## Formal Definition

In probabilistic terms, a generative model specifies a distribution p_model(x) over the data space X, or a joint distribution p_model(x, y) over inputs and labels. The goal during training is to adjust the model parameters theta so that p_model(x; theta) approximates the true data distribution p_data(x) as closely as possible.[9] Once trained, the model can:

1. **Generate new samples** by drawing from p_model(x).
2. **Estimate likelihoods** by evaluating p_model(x) for a given input (in models with tractable density).
3. **Perform classification** by computing $$P(Y \mid X)$$ via Bayes' rule, since $$P(Y \mid X) = \frac{P(X \mid Y) P(Y)}{P(X)}$$.[3]
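
The three capabilities above can be illustrated with a deliberately small model. The sketch below assumes one-dimensional inputs, two classes, and a Gaussian class-conditional density p(x | y) per class; the data values and all function names are made up for illustration and do not come from any particular library.

```python
import math
import random

def fit_gaussian(xs):
    """Maximum-likelihood mean and standard deviation of a 1-D sample."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)

def log_normal_pdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Toy labelled data (illustrative values only).
data = {0: [-2.1, -1.9, -2.4, -1.6, -2.0], 1: [1.8, 2.2, 2.5, 1.5, 2.0]}
n_total = sum(len(xs) for xs in data.values())
params = {y: fit_gaussian(xs) for y, xs in data.items()}   # p(x | y)
prior = {y: len(xs) / n_total for y, xs in data.items()}   # p(y)

def sample(rng):
    """1. Generate: draw y ~ p(y), then x ~ p(x | y)."""
    y = rng.choices(list(prior), weights=list(prior.values()))[0]
    mu, sigma = params[y]
    return rng.gauss(mu, sigma), y

def log_px(x):
    """2. Estimate likelihood: log p(x) = log sum_y p(x | y) p(y)."""
    return math.log(sum(math.exp(log_normal_pdf(x, *params[y])) * prior[y] for y in prior))

def posterior(x):
    """3. Classify: p(y | x) = p(x | y) p(y) / p(x), i.e. Bayes' rule."""
    joint = {y: math.exp(log_normal_pdf(x, *params[y])) * prior[y] for y in prior}
    evidence = sum(joint.values())
    return {y: j / evidence for y, j in joint.items()}
```

Calling `sample(random.Random(0))` draws a new (x, y) pair, `log_px(2.0)` scores an input, and `posterior(2.0)` returns class probabilities, all from the same fitted joint distribution.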

The training objective often involves maximizing the log-likelihood of the observed data under the model, though many modern approaches use alternative objectives such as adversarial losses, score matching, or variational lower bounds.[10]

## How do generative and discriminative models differ?

The distinction between generative and discriminative models is one of the most fundamental concepts in machine learning.[3]

A **[discriminative model](/wiki/discriminative_model)** learns the decision boundary between classes directly. It models P(Y|X) and answers the question: "Given this input, what is the most likely label?" Examples include [logistic regression](/wiki/logistic_regression), [support vector machines](/wiki/support_vector_machine_svm), and most [neural network](/wiki/neural_network) classifiers.[3]

A **generative model** learns the full data distribution. It models P(X, Y) or P(X) and answers the question: "How was this data generated?" Because it captures the complete data-generating process, it can also be used for classification through Bayes' rule, but its primary strength lies in generating new data.[3]

### The Ng and Jordan Analysis (2002)

In a landmark 2002 paper, Andrew Ng and Michael Jordan compared generative and discriminative classifiers using [naive Bayes](/wiki/naive_bayes) (generative) and logistic regression (discriminative) as representative examples.[3] Their key findings include:

- **Asymptotic performance**: Discriminative models achieve lower classification error when training data is abundant, because they directly optimize for the classification task.[3]
- **Sample efficiency**: Generative models converge to their (higher) asymptotic error much faster, requiring fewer training examples to reach reasonable accuracy. The paper quantified this: generative naive Bayes approaches its asymptotic error with a number of training examples on the order of $$\log(n)$$, where n is the dimension of the input, whereas logistic regression requires on the order of n examples.[3]
- **Two regimes**: There exist two distinct performance regimes as training set size grows. With small datasets, the generative classifier (naive Bayes) often outperforms the discriminative one (logistic regression). As data increases, the discriminative model eventually surpasses the generative model.[3]

Ng and Jordan summarized the counterintuitive result directly, writing that "there can often be two distinct regimes of performance as the training set size is increased, one in which each algorithm does better," contrary to the then-widely-held belief that discriminative classifiers are almost always preferable.[3] This analysis remains influential and is frequently cited when practitioners choose between model families based on available data volume.[3]

| Property | Generative Model | Discriminative Model |
|---|---|---|
| What it models | Joint distribution P(X, Y) or P(X) | Conditional distribution $$P(Y \mid X)$$ |
| Primary use | Data generation, density estimation | Classification, regression |
| Can generate new samples | Yes | No |
| Can classify | Yes (via Bayes' rule) | Yes (directly) |
| Sample efficiency | Higher (needs less data) | Lower (needs more data) |
| Asymptotic accuracy | Lower | Higher |
| Classic examples | Naive Bayes, GMM, HMM | Logistic regression, SVM, neural network classifier |

## Taxonomy of Generative Models

Ian Goodfellow proposed an influential taxonomy that divides generative models based on how they represent the data distribution.[9] The primary split is between models that define an **explicit density** function and those that use an **implicit density** approach.

### Explicit Density Models

Explicit density models define a parametric form for p_model(x) and optimize it directly. These are further divided into:

- **Tractable density models**: Models where the likelihood p_model(x) can be computed exactly. Examples include autoregressive models (PixelCNN, GPT) and normalizing flows (RealNVP, Glow).
- **Approximate density models**: Models where the exact likelihood is intractable, so a surrogate objective is used. Examples include variational autoencoders (VAEs), which maximize a variational lower bound (ELBO), and Boltzmann machines, which rely on Markov chain Monte Carlo sampling.

### Implicit Density Models

Implicit density models do not define an explicit density function. Instead, they learn to sample from p_model(x) directly without ever computing the likelihood. The most prominent example is the [GAN](/wiki/generative_adversarial_network_gan), which trains a generator network to produce samples that fool a discriminator network.[9] Generative stochastic networks (GSNs) also fall into this category.

## Major Families of Generative Models

### Generative Adversarial Networks (GANs)

[Generative adversarial networks](/wiki/generative_adversarial_network_gan) were introduced by Ian Goodfellow and seven co-authors in 2014.[1] A GAN consists of two [neural networks](/wiki/neural_network) trained simultaneously in a minimax game:

- The **generator** G takes random noise z as input and produces synthetic data G(z).
- The **discriminator** D receives both real data and generated data and outputs the probability that its input is real.

The two networks optimize a single value function, $$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$, where the first expectation is over real data and the second is over the generator's input noise. The generator tries to minimize the discriminator's ability to distinguish real from fake, while the discriminator tries to maximize its classification accuracy. The original paper proved that, for G and D with unlimited capacity, this game has a unique global optimum at which the generator recovers the true data distribution and the discriminator outputs 1/2 everywhere, meaning generated samples are indistinguishable from real data.[1] The authors framed the setup as an analogy: the generative model is "analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency."[1]
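
The optimum described above can be checked numerically on a toy problem. The sketch below assumes a discrete data space with three outcomes, so that the generator is represented directly by the distribution p_g it induces and the expectation over z becomes an expectation over x ~ p_g. The probability vectors are arbitrary illustrative values with full support, and the closed form for the best discriminator against a fixed generator, D*(x) = p_data(x) / (p_data(x) + p_g(x)), is the one derived in the original paper.

```python
import math

def optimal_discriminator(p_data, p_g):
    """Best response to a fixed generator: D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def value(p_data, p_g, d):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))]."""
    real = sum(pd * math.log(dx) for pd, dx in zip(p_data, d) if pd > 0)
    fake = sum(pg * math.log(1.0 - dx) for pg, dx in zip(p_g, d) if pg > 0)
    return real + fake

p_data = [0.5, 0.3, 0.2]          # illustrative "true" distribution
p_g_mismatched = [0.2, 0.3, 0.5]  # a generator that has not converged
p_g_matched = list(p_data)        # a generator that recovered p_data

v_mismatched = value(p_data, p_g_mismatched, optimal_discriminator(p_data, p_g_mismatched))
d_star = optimal_discriminator(p_data, p_g_matched)
v_matched = value(p_data, p_g_matched, d_star)
```

When the generator matches the data, `d_star` is 0.5 for every outcome and `v_matched` equals -log 4, the minimum of the game; any mismatch leaves the value strictly higher.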

GANs have achieved remarkable results in image generation (StyleGAN, BigGAN), image-to-image translation (pix2pix, CycleGAN), and super-resolution (SRGAN). However, they can be difficult to train due to mode collapse (where the generator produces limited variety) and training instability.[8]

### Variational Autoencoders (VAEs)

Variational autoencoders, introduced by Diederik Kingma and Max Welling in a paper submitted on December 20, 2013, combine [neural networks](/wiki/neural_network) with variational Bayesian inference.[2] A VAE consists of:

- An **encoder** network that maps input data x to a distribution over latent variables z, typically parameterized by a mean vector mu and a standard deviation vector sigma.
- A **decoder** network that maps samples from the latent space back to the data space.

The model is trained by maximizing the evidence lower bound (ELBO), which balances reconstruction accuracy against a KL divergence regularization term that keeps the latent distribution close to a prior (usually a standard Gaussian). The reparameterization trick, also introduced in the original paper, allows gradients to flow through the sampling operation during backpropagation. The authors showed that "a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods," which is what makes the VAE trainable end to end.[2]
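
A minimal sketch of the two ingredients of this objective follows, assuming a diagonal Gaussian encoder output, a standard normal prior, and a unit-variance Gaussian likelihood for the decoder. The `toy_decoder` function is a hypothetical stand-in for a trained network, and the code computes a single-sample ELBO estimate without any gradient machinery.

```python
import math
import random

def reparameterize(mu, sigma, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); the randomness lives only in eps."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + s * s - 1.0 - 2.0 * math.log(s) for m, s in zip(mu, sigma))

def gaussian_log_lik(x, x_hat):
    """log p(x | z) for a unit-variance Gaussian centred on the reconstruction x_hat."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (a - b) ** 2 for a, b in zip(x, x_hat))

def toy_decoder(z):
    """Hypothetical decoder: a fixed linear map from a 1-D latent to 2-D data."""
    return [2.0 * z[0] + 1.0, -z[0]]

def elbo_estimate(x, mu, sigma, decode, rng):
    """Single-sample ELBO: reconstruction term minus KL regularizer."""
    z = reparameterize(mu, sigma, rng)
    return gaussian_log_lik(x, decode(z)) - kl_to_standard_normal(mu, sigma)
```

Because `z` is a deterministic function of `mu`, `sigma`, and the noise `eps`, an automatic differentiation framework could propagate gradients from the ELBO back to the encoder outputs, which is the point of the reparameterization.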

VAEs produce a smooth, continuous latent space that supports meaningful interpolation between data points. They are widely used in drug discovery, molecular generation, and representation learning.

### Autoregressive Models

Autoregressive models decompose the joint distribution into a product of conditional distributions using the chain rule of probability:

$$
p(x) = p(x_1) p(x_2 \mid x_1) p(x_3 \mid x_1, x_2) \cdots p(x_n \mid x_1, \ldots, x_{n-1})
$$

Each element in the sequence is generated one at a time, conditioned on all previous elements. This approach yields tractable, exact likelihoods and has been enormously successful in both text and image generation.
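
The factorization can be made concrete with a tiny hand-written model. The sketch below is a simplification: each token is conditioned only on the previous token (a first-order special case of the chain rule), and the vocabulary and probability table are hypothetical values chosen for illustration.

```python
import math
import random

START = "<s>"
# Hypothetical conditional table: cond[prev][tok] = p(tok | prev).
cond = {
    START: {"a": 0.6, "b": 0.4},
    "a": {"a": 0.1, "b": 0.9},
    "b": {"a": 0.5, "b": 0.5},
}

def log_prob(seq):
    """Exact log-likelihood: sum of log p(x_t | x_{t-1}) over the sequence."""
    prev, total = START, 0.0
    for tok in seq:
        total += math.log(cond[prev][tok])
        prev = tok
    return total

def sample(length, rng):
    """Generate one token at a time, each conditioned on what came before."""
    prev, out = START, []
    for _ in range(length):
        toks = list(cond[prev])
        tok = rng.choices(toks, weights=[cond[prev][t] for t in toks])[0]
        out.append(tok)
        prev = tok
    return out
```

The same two operations, scoring a sequence exactly and sampling left to right, carry over to neural autoregressive models, where a network replaces the lookup table and conditions on the full history.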

For text, the [Transformer](/wiki/transformer) architecture powers modern autoregressive [large language models](/wiki/large_language_model) such as the [GPT](/wiki/gpt_generative_pre-trained_transformer) family (GPT-2, GPT-3, GPT-4), [LLaMA](/wiki/llama), and [Claude](/wiki/claude).[12] These models predict the next token in a sequence and can generate coherent paragraphs, code, poetry, and more.

For images, PixelCNN and PixelRNN generate images one pixel at a time. More recent approaches like VQ-VAE-2 and Parti use discrete token representations of images and apply autoregressive transformers to generate them.

### Diffusion Models

[Diffusion models](/wiki/diffusion_model) (also called score-based generative models) have emerged as one of the most powerful generative frameworks since 2020. The approach involves two processes:

- **Forward process**: Gaussian noise is incrementally added to the data over many timesteps until the data is completely destroyed, becoming pure noise.
- **Reverse process**: A [neural network](/wiki/neural_network) is trained to reverse each noising step, gradually denoising random noise back into coherent data.
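
The forward process does not have to be simulated step by step: with Gaussian noise, the noised sample at any timestep t can be drawn directly from the clean data. The sketch below assumes the linear variance schedule reported by Ho et al. (1,000 steps, with per-step variance rising from 1e-4 to 0.02) and shows the noise-prediction loss without any network; `eps_pred` stands for the output of a hypothetical denoising model.

```python
import math
import random

T = 1000
# Linear variance schedule beta_1..beta_T (the DDPM setting; other schedules exist).
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar[t] = product of (1 - beta_s) up to step t: how much signal survives.
alpha_bar = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)

def q_sample(x0, t, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) I) in one shot."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def denoising_loss(eps, eps_pred):
    """Mean squared error between the true noise and the model's prediction."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred)) / len(eps)
```

By the final step `alpha_bar[-1]` is close to zero, so `q_sample(x0, T - 1, rng)` is almost pure noise regardless of `x0`, matching the description of the data being completely destroyed.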

Denoising diffusion probabilistic models (DDPMs), introduced by Jonathan Ho, Ajay Jain, and Pieter Abbeel in 2020, demonstrated that this framework could generate high-quality images.[4] The paper opened by presenting "high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics," and reported a then-state-of-the-art Frechet Inception Distance (FID) of 3.17 and an Inception score of 9.46 on unconditional CIFAR-10.[4] In closely related work, Song and Ermon trained networks with score matching to estimate the gradient (score) of the log data density and generated samples from that estimate;[5] Ho et al. made the connection between the two approaches explicit by relating their training objective to denoising score matching.[4]

Diffusion models have several advantages over GANs: stable training with a simple mean squared error [loss function](/wiki/loss_function), much less susceptibility to mode collapse, and a principled likelihood-based framework.[4] They power state-of-the-art image generators such as [Stable Diffusion](/wiki/stable_diffusion), [DALL-E](/wiki/dall-e) 2 and 3, and [Imagen](/wiki/imagen). They have also been extended to video generation ([Sora](/wiki/sora)), audio synthesis, and 3D content creation.

### Normalizing Flows

Normalizing flows construct a complex probability distribution by applying a sequence of invertible, differentiable transformations to a simple base distribution (typically a Gaussian).[6] Because each transformation is invertible, the exact likelihood can be computed using the change-of-variables formula:

$$
\log p(x) = \log p(z_0) - \sum_{i=1}^{K} \log \left\lvert \det\left(\frac{\partial f_i}{\partial z_{i-1}}\right) \right\rvert
$$

where $$z_0$$ is a sample from the base distribution, $$z_i = f_i(z_{i-1})$$ is the output of the i-th of K flow layers $$f_i$$, and $$x = z_K$$. Key models in this family include RealNVP (2016), Glow (2018), and Neural Spline Flows (2019).
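
The change-of-variables computation can be sketched with the simplest possible invertible layer. The example below assumes one-dimensional data, a standard normal base distribution, and affine layers x = a * z + b, for which the Jacobian determinant is just a; the `AffineFlow` class is a hypothetical illustration rather than any library's API.

```python
import math

def log_std_normal(z):
    """Log-density of the standard normal base distribution."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * z * z

class AffineFlow:
    """One flow layer: x = a * z + b, invertible whenever a != 0."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def forward(self, z):
        return self.a * z + self.b

    def inverse(self, x):
        return (x - self.b) / self.a

    def log_abs_det(self):
        # The Jacobian of an affine map in 1-D is the constant a.
        return math.log(abs(self.a))

def flow_log_prob(x, layers):
    """Exact log p(x): invert the layers last to first, subtracting each log|det J|."""
    log_det = 0.0
    for layer in reversed(layers):
        x = layer.inverse(x)
        log_det += layer.log_abs_det()
    return log_std_normal(x) - log_det
```

A single layer `AffineFlow(3.0, 1.0)` turns the standard normal into a Gaussian with mean 1 and standard deviation 3, and `flow_log_prob` reproduces that density exactly, which is an easy way to sanity-check the sign of the log-determinant term.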

The main advantage of normalizing flows is exact likelihood computation, which is useful for density estimation and anomaly detection. The main limitation is that the input and output dimensions must match, and the transformations must be carefully designed so that both the function and its Jacobian determinant are efficiently computable.

### Energy-Based Models (EBMs)

Energy-based models assign a scalar energy value E(x) to each data point. Lower energy corresponds to higher probability. The probability distribution is defined as:

$$
p(x) = \frac{\exp(-E(x))}{Z}
$$

where Z is the partition function (normalizing constant). Because Z is typically intractable, training EBMs requires specialized techniques such as contrastive divergence or score matching.[10]
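
On a small discrete state space the partition function can be computed by brute force, which makes the definition easy to inspect. The energy function and the range of states below are hypothetical choices for illustration; in realistic, high-dimensional settings this sum is exactly the quantity that becomes intractable.

```python
import math

def energy(x):
    """Hypothetical energy over integer states, lowest at x = 2."""
    return 0.5 * (x - 2) ** 2

states = range(-3, 8)

# Z sums exp(-E(x)) over every state; feasible only because the space is tiny.
Z = sum(math.exp(-energy(x)) for x in states)
p = {x: math.exp(-energy(x)) / Z for x in states}

# Ratios of probabilities depend only on energy differences, never on Z.
ratio = p[3] / p[2]
```

The last line hints at why MCMC samplers remain usable when Z is unknown: they only need probability ratios such as `ratio`, which here equals exp(-(E(3) - E(2))).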

EBMs are flexible and expressive, but sampling from them is computationally expensive, often requiring Markov chain Monte Carlo (MCMC) methods. Research by Yann LeCun and others has explored combining EBMs with modern deep learning architectures.

### Boltzmann Machines

Boltzmann machines, proposed by Geoffrey Hinton and Terry Sejnowski in 1985, are stochastic [neural networks](/wiki/neural_network) inspired by statistical mechanics.[11] They define a joint distribution over visible and hidden units using an energy function, where states with lower energy are more probable, analogous to how a physical system at thermal equilibrium is more likely to occupy low-energy configurations. Restricted Boltzmann Machines (RBMs) simplify the architecture by removing connections between units in the same layer, making training more tractable through contrastive divergence.

RBMs were foundational to early [deep learning](/wiki/deep_model). Stacking multiple RBMs produces Deep Belief Networks (DBNs), which were among the first successful deep generative models. While largely superseded by VAEs, GANs, and diffusion models for generation tasks, Boltzmann machines remain important in the history of generative modeling and continue to inspire research in energy-based approaches.

### Naive Bayes as a Generative Classifier

[Naive Bayes](/wiki/naive_bayes) is one of the simplest generative classifiers. It models the joint distribution P(X, Y) by assuming that all features are conditionally independent given the class label:

$$
P(X \mid Y = c) = \prod_i P(x_i \mid Y = c)
$$

Despite this strong independence assumption, naive Bayes performs surprisingly well in many practical settings, especially text classification.[3] It can also be used generatively by sampling from the learned class-conditional distributions, though the generated samples tend to be less realistic than those from deep generative models.
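
A minimal sketch of the factorization for binary features follows. It assumes Bernoulli features with Laplace smoothing, and the six-row dataset (columns could stand for the presence of three words) is invented for illustration.

```python
import math

def fit(X, y, alpha=1.0):
    """Estimate P(Y=c) and P(x_i=1 | Y=c), with Laplace smoothing alpha."""
    prior, theta = {}, {}
    n_features = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = len(rows) / len(X)
        theta[c] = [
            (sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_features)
        ]
    return prior, theta

def log_joint(x, c, prior, theta):
    """log P(X=x, Y=c) = log P(Y=c) + sum_i log P(x_i | Y=c)."""
    total = math.log(prior[c])
    for xi, t in zip(x, theta[c]):
        total += math.log(t if xi else 1.0 - t)
    return total

def predict(x, prior, theta):
    """Pick the class with the largest joint probability (equivalently, posterior)."""
    return max(prior, key=lambda c: log_joint(x, c, prior, theta))

# Toy data: three binary features per example (illustrative only).
X = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
y = ["spam", "spam", "spam", "ham", "ham", "ham"]
prior, theta = fit(X, y)
```

Because the model defines a full joint distribution, `log_joint` sums to one over all feature vectors and classes once exponentiated, and new examples could be sampled by drawing a class from `prior` and then each feature independently from `theta`.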

## Comparison of Generative Model Families

| Model Family | Year Introduced | Density Type | Likelihood | Training Stability | Sample Quality | Key Use Cases |
|---|---|---|---|---|---|---|
| Naive Bayes | 1960s | Explicit (tractable) | Exact | Very stable | Low | Text classification, spam filtering |
| Boltzmann Machines / RBMs | 1985 | Explicit (approximate) | Approximate (MCMC) | Moderate | Low to moderate | Feature learning, pretraining |
| Gaussian Mixture Models | Classical | Explicit (tractable) | Exact | Stable (EM) | Low | Clustering, density estimation |
| Variational Autoencoders | 2013 | Explicit (approximate) | Lower bound (ELBO) | Stable | Moderate (can be blurry) | Molecule design, representation learning |
| GANs | 2014 | Implicit | Not available | Unstable (mode collapse) | High (sharp images) | Image synthesis, style transfer |
| Normalizing Flows | 2015 | Explicit (tractable) | Exact | Stable | Moderate to high | Density estimation, anomaly detection |
| Autoregressive (GPT, PixelCNN) | 2016 | Explicit (tractable) | Exact | Stable | High | Text generation, language modeling |
| Diffusion Models | 2020 | Explicit (approximate) | Approximate (ELBO) | Very stable | Very high | Image and video generation |

A 2022 comparative review of deep generative models in IEEE Transactions on Pattern Analysis and Machine Intelligence surveyed VAEs, GANs, normalizing flows, energy-based, and autoregressive families together, noting that no single family dominates on all axes of likelihood tractability, sample quality, and training stability simultaneously.[10]

## Latent Variable Models

Many generative models introduce **latent variables** z that represent unobserved factors of variation in the data. The observed data x is assumed to be generated from these hidden factors according to:

$$
p(x) = \int p(x \mid z) p(z) \, dz
$$

Latent variable models include VAEs, GANs (where the noise input z serves as the latent variable), and classical models like factor analysis and probabilistic PCA. The latent space often captures meaningful, interpretable features of the data. For example, in a face generation model, different latent dimensions might control pose, lighting, or expression.

The key challenge in latent variable models is **inference**: computing the posterior distribution $$p(z \mid x)$$. In VAEs, this is addressed through amortized variational inference using an encoder network.[2] In GANs, the latent-to-data mapping is learned, but inferring z from x typically requires additional techniques such as encoder networks (BiGAN) or optimization-based inversion.
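
When the latent variable is discrete, the integral becomes a finite sum and the posterior can be computed exactly, which makes the marginalization and inference steps easy to see. The sketch below assumes a two-component Gaussian mixture with hypothetical weights and parameters; continuous latent spaces, as in VAEs, need the approximations described above instead.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical mixture: z in {0, 1} is the latent variable.
prior_z = [0.3, 0.7]                     # p(z)
components = [(-1.0, 0.5), (2.0, 1.0)]   # (mean, std) of p(x | z)

def marginal(x):
    """p(x) = sum_z p(x | z) p(z), the discrete analogue of the integral."""
    return sum(pz * normal_pdf(x, mu, s) for pz, (mu, s) in zip(prior_z, components))

def posterior_z(x):
    """Inference: p(z | x) = p(x | z) p(z) / p(x)."""
    px = marginal(x)
    return [pz * normal_pdf(x, mu, s) / px for pz, (mu, s) in zip(prior_z, components)]
```

For an observation near -1 the posterior puts almost all of its mass on the first component, and near 2 on the second, which is the sense in which the latent variable explains the data.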

## Training Objectives

Different generative model families use different [loss functions](/wiki/loss_function) and training objectives:

- **Maximum Likelihood Estimation (MLE)**: Maximizes the log-probability of the observed data. Used by autoregressive models, normalizing flows, and (approximately) by VAEs.
- **Adversarial Loss**: The generator minimizes the discriminator's ability to distinguish real from generated samples. Used by GANs.[1]
- **Evidence Lower Bound (ELBO)**: Maximizes a tractable lower bound on the log-likelihood, combining a reconstruction term with a KL divergence regularizer. Used by VAEs.[2]
- **Score Matching**: Learns the gradient of the log-density (the score function) rather than the density itself. Used by score-based diffusion models.[5]
- **Denoising Objective**: Trains the model to predict the noise added to a data sample, which is equivalent to learning the score function. Used by DDPMs.[4]
- **Contrastive Divergence**: Approximates the gradient of the log-likelihood using short Markov chain runs. Used by RBMs and Boltzmann machines.[11]

## How are generative models evaluated?

Evaluating generative models is challenging because there is no single metric that captures all aspects of generation quality. Common metrics include:

### Frechet Inception Distance (FID)

FID, introduced by Heusel et al. in 2017, is the most widely used metric for evaluating generated images.[7] It extracts features from both real and generated images using a pretrained Inception v3 network, fits a multivariate Gaussian to each set of features, and computes the Frechet distance between the two Gaussians. Lower FID indicates higher quality and diversity. The authors introduced FID specifically because it "captures the similarity of generated images to real ones better than the Inception Score," and showed that it agrees better with human judgment and rises consistently as more distortion is injected into the images.[7]
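
The distance between two Gaussians has a closed form: the squared difference of the means plus a term comparing the covariances. The sketch below makes a simplifying assumption that the real metric does not: it treats the covariance matrices as diagonal, so no matrix square root is needed. It illustrates the formula only and is not a drop-in FID implementation, and the feature vectors it expects are assumed to have been extracted already.

```python
import math

def gaussian_stats(features):
    """Per-dimension mean and variance of a list of feature vectors."""
    n, d = len(features), len(features[0])
    mu = [sum(f[j] for f in features) / n for j in range(d)]
    var = [sum((f[j] - mu[j]) ** 2 for f in features) / n for j in range(d)]
    return mu, var

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Squared Frechet distance between two Gaussians with diagonal covariances.

    Full FID uses ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)); with diagonal
    S1 and S2 the trace term reduces to the per-dimension sum below.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2) for v1, v2 in zip(var1, var2))
    return mean_term + cov_term
```

Identical statistics give a distance of zero, and the value grows with either a shift in the means or a mismatch in spread, which is why the metric is sensitive to both quality and diversity.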

### Inception Score (IS)

The Inception Score, introduced by Salimans et al. in 2016, evaluates generated images based on two criteria: each image should be clearly classifiable (low entropy in the class prediction), and the set of generated images should cover many classes (high entropy in the marginal class distribution).[8] Higher IS indicates better quality and diversity. However, IS has notable limitations: it does not compare against real data, and it can be fooled by generators that produce high-confidence but unrealistic images.
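
Both criteria are combined in a single expression: the exponential of the average KL divergence between each image's class prediction p(y | x) and the marginal class distribution p(y). The sketch below assumes the classifier outputs are already available as rows of class probabilities; the function name is illustrative.

```python
import math

def inception_score(probs):
    """exp( mean over images of KL( p(y | x) || p(y) ) )."""
    n, k = len(probs), len(probs[0])
    # Marginal class distribution p(y), averaged over all generated images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    total_kl = 0.0
    for p in probs:
        total_kl += sum(pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0)
    return math.exp(total_kl / n)

confident_and_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
confident_but_collapsed = [[1.0, 0.0, 0.0]] * 3
uncertain = [[1 / 3, 1 / 3, 1 / 3]] * 3
```

With three classes the score ranges from 1 to 3: `confident_and_diverse` reaches the maximum, while both `confident_but_collapsed` and `uncertain` score 1, since each fails one of the two criteria.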

### Perplexity

For text generation models, [perplexity](/wiki/perplexity) measures how well the model predicts a held-out test set. It is defined as the exponentiation of the average negative log-likelihood per token. Lower perplexity indicates that the model assigns higher probability to the actual text, suggesting better language modeling. Perplexity is the standard evaluation metric for [large language models](/wiki/large_language_model).[12]
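
The definition translates directly into code. The sketch below assumes the per-token log-probabilities (natural logarithm) assigned by a model to a held-out text have already been collected.

```python
import math

def perplexity(token_log_probs):
    """exp of the average negative log-likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that assigns probability 1/4 to every token has perplexity 4.
uniform_over_four = [math.log(0.25)] * 10
```

This gives perplexity an intuitive reading: a value of 4 means the model is, on average, as uncertain as if it were choosing uniformly among four tokens at each step.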

### Additional Metrics

Other evaluation approaches include Kernel Inception Distance (KID), precision and recall metrics for generative models (measuring fidelity and diversity separately), CLIP score (for text-image alignment), and human evaluation studies.

## What are generative models used for?

Generative models have found applications across a wide range of fields:

### Image Generation and Editing

Diffusion models and GANs can generate photorealistic images from text descriptions, perform style transfer, fill in missing regions (inpainting), increase resolution (super-resolution), and edit specific attributes of existing images. Products like [Midjourney](/wiki/midjourney), [DALL-E](/wiki/dall-e), and [Stable Diffusion](/wiki/stable_diffusion) have made these capabilities accessible to millions of users.

### Text Generation

Autoregressive [large language models](/wiki/large_language_model) generate coherent, contextually appropriate text for a vast range of applications: creative writing, code generation, translation, summarization, question answering, and conversational AI. Systems like [ChatGPT](/wiki/chatgpt), [Claude](/wiki/claude), and [Gemini](/wiki/gemini) represent the current state of the art.

### Drug Discovery and Molecular Design

Generative models, particularly VAEs and diffusion models, are used to design novel molecular structures with desired pharmacological properties. These models can explore the vast chemical space more efficiently than traditional high-throughput screening, generating candidates optimized for drug-likeness, synthetic accessibility, and target binding affinity. Real-world successes include the AI-driven discovery of novel antibiotics effective against multidrug-resistant infections.

### Data Augmentation

Generative models create synthetic training data to improve the performance of supervised learning systems, especially when real labeled data is scarce or expensive to collect. This is particularly valuable in medical imaging, where patient privacy concerns and annotation costs limit dataset size.

### Anomaly Detection

Models that learn the normal data distribution can flag anomalies as inputs with low likelihood under p_model(x). This approach is used in fraud detection, manufacturing quality control, network intrusion detection, and medical diagnostics.
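
A minimal sketch of likelihood-based anomaly detection follows, assuming one-dimensional sensor-style readings and a single Gaussian as the density model. The readings and the thresholding rule (flag anything less likely than the least likely training point) are hypothetical choices for illustration; practical systems typically set the threshold on held-out data.

```python
import math

def fit_gaussian(xs):
    """Maximum-likelihood mean and standard deviation."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)

def log_lik(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Readings assumed to represent normal operation (illustrative values).
train = [9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.0]
mu, sigma = fit_gaussian(train)

# Hypothetical rule: the least likely training point sets the cutoff.
threshold = min(log_lik(x, mu, sigma) for x in train)

def is_anomaly(x):
    """Flag inputs whose log-likelihood under the fitted model is below the cutoff."""
    return log_lik(x, mu, sigma) < threshold
```

Here `is_anomaly(15.0)` is true and `is_anomaly(10.0)` is false; swapping the Gaussian for a richer density model such as a normalizing flow leaves the thresholding logic unchanged.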

### Music, Audio, and Video

Generative models now create music compositions, voice synthesis, sound effects, and even full video clips. Models like [Sora](/wiki/sora) (video), AudioCraft (audio), and Suno (music) demonstrate the breadth of creative applications.

### Scientific Research

Beyond drug discovery, generative models assist in protein structure prediction, materials science (designing new materials with target properties), weather forecasting, and particle physics simulation.

## The Generative AI Revolution (2022 Onward)

The period from 2022 onward marked a transformative moment for generative models, bringing them from research labs into mainstream use:

- **April 2022**: OpenAI announced [DALL-E](/wiki/dall-e) 2 on April 6, 2022, demonstrating high-quality text-to-image generation with diffusion models.[14]
- **August 2022**: Stability AI released [Stable Diffusion](/wiki/stable_diffusion) on August 22, 2022, developed with the CompVis group at LMU Munich, Runway, and LAION, with code and weights released publicly to democratize access to powerful image generation.[15]
- **November 2022**: OpenAI launched [ChatGPT](/wiki/chatgpt), based on GPT-3.5. It reached an estimated 100 million monthly active users within about two months, which UBS analysts described as the fastest-growing consumer application in history.[13]
- **March 2023**: OpenAI released [GPT-4](/wiki/gpt-4) on March 14, 2023, a large multimodal model that accepts image and text inputs; OpenAI reported it passed a simulated bar exam scoring around the top 10 percent of test takers, versus the bottom 10 percent for GPT-3.5.[16]
- **2023 onward**: Google launched [Gemini](/wiki/gemini), Anthropic scaled [Claude](/wiki/claude), Meta released the [LLaMA](/wiki/llama) model weights, and major technology companies began racing to develop and deploy generative AI.

This revolution was driven by three converging factors: advances in model architectures (transformers, diffusion), massive increases in compute and training data, and breakthroughs in alignment and instruction tuning that made models useful for everyday tasks.

## Challenges and Open Problems

Despite remarkable progress, generative models face several ongoing challenges:

- **Hallucinations**: Language models sometimes generate plausible-sounding but factually incorrect content, a significant barrier to deployment in high-stakes applications.
- **Bias and fairness**: Generative models can amplify biases present in their training data, producing stereotyped or harmful content.
- **Evaluation difficulty**: No single metric fully captures generation quality, and human evaluation is expensive and subjective.
- **Computational cost**: Training large generative models requires enormous compute resources and energy, raising sustainability concerns.
- **Copyright and ethics**: The use of copyrighted material in training data and the potential for deepfakes raise unresolved legal and ethical questions.
- **Mode collapse**: Particularly in GANs, models may fail to capture the full diversity of the training distribution.[8]
- **Controllability**: Steering generative models to produce exactly what users want, while avoiding unwanted outputs, remains an active area of research.

## Explain Like I'm 5 (ELI5)

Imagine you spend a long time looking at thousands of pictures of cats. Eventually, you get so good at understanding what cats look like that you can close your eyes and draw a brand new cat from your imagination. It would not be a copy of any cat you saw before, but it would still look like a real cat because you learned the "rules" of what makes a cat look like a cat.

That is basically what a generative model does. It looks at tons of examples (pictures, sentences, molecules, or anything else), figures out the hidden patterns and rules behind them, and then uses those rules to create brand new examples that look just like the real ones. Some generative models are really good at making pictures, others are great at writing stories, and some even help scientists invent new medicines.

## References

1. Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al. (2014). "Generative Adversarial Nets." *Advances in Neural Information Processing Systems*, 27. https://proceedings.neurips.cc/paper/2014/hash/f033ed80deb0234979a61f95710dbe25-Abstract.html
2. Kingma, D. P. and Welling, M. (2013). "Auto-Encoding Variational Bayes." *arXiv preprint arXiv:1312.6114*. https://arxiv.org/abs/1312.6114
3. Ng, A. Y. and Jordan, M. I. (2002). "On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes." *Advances in Neural Information Processing Systems*, 14. https://papers.nips.cc/paper/2020-on-discriminative-vs-generative-classifiers-a-comparison-of-logistic-regression-and-naive-bayes
4. Ho, J., Jain, A., and Abbeel, P. (2020). "Denoising Diffusion Probabilistic Models." *Advances in Neural Information Processing Systems*, 33. https://arxiv.org/abs/2006.11239
5. Song, Y. and Ermon, S. (2019). "Generative Modeling by Estimating Gradients of the Data Distribution." *Advances in Neural Information Processing Systems*, 32.
6. Rezende, D. J. and Mohamed, S. (2015). "Variational Inference with Normalizing Flows." *International Conference on Machine Learning*.
7. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2017). "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium." *Advances in Neural Information Processing Systems*, 30. https://arxiv.org/abs/1706.08500
8. Salimans, T., Goodfellow, I., Zaremba, W., et al. (2016). "Improved Techniques for Training GANs." *Advances in Neural Information Processing Systems*, 29.
9. Goodfellow, I. (2016). "NIPS 2016 Tutorial: Generative Adversarial Networks." *arXiv preprint arXiv:1701.00160*. https://arxiv.org/abs/1701.00160
10. Bond-Taylor, S., Leach, A., Long, Y., and Sherrington, C. G. (2022). "Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models." *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 44(11), 7327-7347.
11. Hinton, G. E. and Sejnowski, T. J. (1986). "Learning and Relearning in Boltzmann Machines." In *Parallel Distributed Processing*, Vol. 1, MIT Press.
12. Radford, A., Wu, J., Child, R., et al. (2019). "Language Models are Unsupervised Multitask Learners." *OpenAI Technical Report*.
13. Hu, K. (2023). "ChatGPT sets record for fastest-growing user base." *Reuters*, February 2, 2023, citing a UBS study estimating 100 million monthly active users in January 2023. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/
14. OpenAI (2022). "DALL-E 2." Announced April 6, 2022. https://openai.com/index/dall-e-2/
15. Stability AI (2022). "Stable Diffusion Public Release." August 22, 2022. https://stability.ai/news/stable-diffusion-public-release
16. OpenAI (2023). "GPT-4." Released March 14, 2023. https://openai.com/index/gpt-4-research/