Model collapse
Last reviewed
Sources
10 citations
Review status
Source-backed
Revision
v4 · 4,327 words
Model collapse is a degenerative process in which generative AI models trained recursively on data produced by previous-generation models progressively lose information, especially the rare events in the tails of the original training distribution, and converge toward low-diversity or nonsensical output. It was formally defined in a July 2024 Nature paper by Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Nicolas Papernot, Ross Anderson, and Yarin Gal, who write: "Model collapse is a degenerative process affecting generations of learned generative models, in which the data they generate end up polluting the training set of the next generation" [1]. Each generation of model-generated data introduces small approximation errors that compound over successive training cycles, so over time rare events and low-probability features are systematically lost and outputs converge toward a narrow, homogeneous subset of what the original distribution contained [1]. The phenomenon has been demonstrated in large language models, variational autoencoders (VAEs), and Gaussian mixture models (GMMs), and it has raised significant concerns about the long-term viability of training AI systems on internet-scale data as the web becomes increasingly saturated with AI-generated content.
The paper distinguishes two stages: "In early model collapse, the model begins losing information about the tails of the distribution; in late model collapse, the model converges to a distribution that carries little resemblance to the original one, often with substantially reduced variance" [1]. In a headline experiment, recursively fine-tuning Meta's OPT-125m language model on its own generated text caused output about medieval architecture to degenerate, by the ninth generation, into a repetitive list of multi-colored "tailed jackrabbits" [1].
The rapid proliferation of generative AI tools since 2022, including ChatGPT, Stable Diffusion, and other systems built on large language models and diffusion models, has led to an explosion of AI-generated text, images, audio, and video on the internet. Estimates from various research groups suggest that a substantial and growing fraction of publicly available web content is now machine-generated. This creates a feedback loop: new AI models are typically trained on data scraped from the web, which increasingly contains outputs from previous AI models. The central question motivating model collapse research is what happens when this cycle repeats across multiple generations of models.
Before the Shumailov et al. paper, several research groups had observed symptoms of what would later be called model collapse. The precursor paper, "The Curse of Recursion: Training on Generated Data Makes Models Forget," posted as a preprint in May 2023 by many of the same authors, introduced the core concept and provided early experimental evidence [2]. The 2024 Nature publication (Nature, volume 631, pages 755-759) formalized the theoretical framework, expanded the experimental scope, and brought the phenomenon to wide attention in both the research community and the general public [1].
Model collapse unfolds through a process that can be understood in several stages. At its core, the problem is one of compounding approximation errors across generations of model training. The Nature authors decompose these into three sources: a statistical approximation error (finite sampling loses rare cases), a functional expressivity error (a limited model class cannot represent the true distribution), and a functional approximation error (biases in the learning procedure itself) [1].
Consider a sequence of generative models, where model 0 is trained on authentic human-generated data, model 1 is trained on data generated by model 0, model 2 is trained on data generated by model 1, and so on. Each model in this chain learns an approximation of the data distribution it was trained on. No model perfectly captures every feature of its training data; there are always small errors introduced by finite sample sizes, limited model capacity, and the stochastic nature of training.
When the next model in the chain trains on the outputs of the previous model, it learns from data that already contains these approximation errors. The new model then introduces its own approximation errors on top of those inherited from its predecessor. With each successive generation, the errors accumulate and amplify. As the authors put it, models "trained on polluted data... then mis-perceive reality" [1].
The first visible effect is what Shumailov et al. term "early model collapse." In this stage, the model begins to lose information about the tails of the data distribution. The tails represent rare events, unusual patterns, minority viewpoints, and low-probability outcomes. These are precisely the features that are most likely to be underrepresented in any finite sample drawn from the distribution.
When model 0 generates a dataset, rare events that occur with probability less than approximately 1/M (where M is the number of samples generated) are unlikely to appear in that dataset at all. Model 1, trained on this dataset, has no opportunity to learn these rare patterns. Even events that do appear in the generated dataset may be underrepresented relative to their true frequency, causing the next model to assign them even lower probability. Over successive generations, this trimming propagates inward from the tails, eliminating first the rarest features and then features that were progressively more common in the original distribution.
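The trimming effect can be reproduced in a toy setting. The sketch below is an illustrative assumption, not the paper's experiment: a Zipf-like categorical distribution over 50 categories stands in for the data, and each "model" is simply the empirical frequency table of the M samples drawn from its predecessor. The function name `surviving_categories` and all parameter values are hypothetical.

```python
import random
from collections import Counter

def surviving_categories(num_categories=50, m=200, generations=30, seed=0):
    """Track how many categories keep non-zero probability across generations."""
    rng = random.Random(seed)
    categories = list(range(num_categories))
    # Zipf-like "true" distribution: category k has weight 1 / (k + 1).
    weights = [1.0 / (k + 1) for k in categories]
    support_sizes = [num_categories]
    for _ in range(generations):
        # Sample M items from the current model ...
        sample = rng.choices(categories, weights=weights, k=m)
        counts = Counter(sample)
        # ... and refit the next model as the empirical frequencies.
        weights = [counts.get(k, 0) for k in categories]
        support_sizes.append(sum(1 for w in weights if w > 0))
    return support_sizes

sizes = surviving_categories()
print(sizes[0], sizes[-1])
```

In this setup a category that receives zero samples can never be drawn again, so the number of surviving categories can only fall, and the low-weight categories are the ones most likely to vanish first.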
If the recursive training process continues long enough, the models enter "late model collapse." In this stage, the learned distribution has lost so much information that it bears little resemblance to the original data distribution. The variance of the distribution shrinks dramatically, and the model's outputs converge toward a narrow cluster of high-probability outputs. In the most extreme cases, the model effectively learns to produce a single mode or a small set of nearly identical outputs, regardless of the prompt or conditioning. The authors note that, over generations, the sampled data is likely to "collapse to a delta function" [1].
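The variance shrinkage described here can be seen in the simplest possible case. The following is a minimal sketch, assuming a single one-dimensional Gaussian refit by maximum likelihood to a small sample from its predecessor; it is a toy illustration rather than a reproduction of the published experiments, and `gaussian_chain` and its defaults are hypothetical.

```python
import random
import statistics

def gaussian_chain(m=20, generations=200, seed=0):
    """Refit a Gaussian to its own samples and record the fitted std each generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the "real" distribution behind generation 0
    fitted_stds = []
    for _ in range(generations):
        # Draw a finite sample from the current model.
        sample = [rng.gauss(mu, sigma) for _ in range(m)]
        # Maximum-likelihood fit: sample mean and population standard deviation.
        mu = statistics.fmean(sample)
        sigma = statistics.pstdev(sample)
        fitted_stds.append(sigma)
    return fitted_stds

stds = gaussian_chain()
print(f"first fit: {stds[0]:.3f}, last fit: {stds[-1]:.3g}")
```

Each refit underestimates the spread slightly on average, and because no fresh real data ever enters the loop, those small losses compound until the fitted standard deviation is close to zero.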
The mathematical explanation of model collapse rests on the statistical properties of sampling and density estimation across multiple generations.
Suppose the true data distribution P has a probability density function p(x). When we draw M samples from P to create a training set, the empirical distribution P_hat approximates P but necessarily loses information about regions where p(x) is very small. Specifically, any event with probability less than about 1/M is likely to be completely absent from the sample. This is not a failure of the sampling procedure; it is a fundamental statistical limitation.
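The 1/M threshold follows from a short calculation: under independent sampling, an event of probability p is missing from all M draws with probability (1 - p)^M, which is approximately e^(-pM). At p = 1/M that is about 0.37, and it approaches 1 as p falls further below 1/M. The helper below (the name `prob_absent` is chosen for illustration) evaluates this for a few values.

```python
import math

def prob_absent(p, m):
    """Probability that an event of probability p never appears in m i.i.d. draws."""
    return (1.0 - p) ** m

m = 1000
for p in (1e-2, 1e-3, 1e-5):
    # Compare the exact value with the e^(-p * m) approximation.
    exact = prob_absent(p, m)
    approx = math.exp(-p * m)
    print(f"p={p:g}: exact={exact:.4f}, approx={approx:.4f}")
```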
When a generative model is trained on these M samples, it learns an approximation Q of P_hat, which is itself an approximation of P. The model Q may further smooth, distort, or truncate the distribution due to its own inductive biases and capacity limitations.
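Let P_0 = P be the original distribution, and let P_n be the distribution learned by the nth generation model. Each generation introduces an error term. Following the two steps described above, the total error after n generations can be decomposed into two components [1]: a statistical error from drawing only a finite sample at each generation, and a functional error from fitting a model whose capacity and inductive biases prevent it from reproducing that sample's distribution exactly.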
Shumailov et al. showed that for distributions with different tail characteristics (light-tailed vs. heavy-tailed), the resampling process causes the observed distributions to converge. Although the original distributions may be very different, after sufficient rounds of resampling and refitting, they become indistinguishable because the tails that differentiated them have been eliminated [1].
The mathematical analysis reveals that tail behavior is the key factor determining vulnerability to model collapse. Distributions with heavier tails (such as power-law or Pareto distributions) lose more information per generation because their tails extend further into low-probability regions. Light-tailed distributions (such as Gaussian distributions) are somewhat more resilient but still degrade over sufficient generations.
The critical insight is that the information lost in each generation is not random; it is systematically biased toward rare events. This means that model collapse disproportionately affects the representation of minority groups, unusual patterns, creative outliers, and rare but important knowledge. The authors stress that such low-probability events "are often relevant to marginalized groups" and are "vital to understand complex systems" [1].
| Distribution Type | Tail Behavior | Vulnerability to Model Collapse | Information Loss Pattern |
|---|---|---|---|
| Gaussian (light-tailed) | Exponential decay | Moderate | Gradual variance reduction |
| Power-law (heavy-tailed) | Polynomial decay | High | Rapid tail trimming |
| Bounded distributions | Hard cutoff | Lower (but still present) | Edge erosion |
| Mixture models | Multiple modes | High | Minor modes disappear first |
Shumailov et al. provided experimental evidence of model collapse across three different types of generative models, demonstrating that the phenomenon is not specific to a single architecture but is a general property of recursive training on synthetic data.
The simplest experimental setting involved Gaussian mixture models (GMMs). A GMM was first fitted to a dataset, then used to generate synthetic data, which was used to fit a new GMM, and so on. After several generations, the fitted GMMs showed dramatically reduced variance, with modes collapsing toward the overall mean. Minor modes (representing less frequent clusters in the original data) disappeared first, followed by progressive convergence of the remaining modes. This experiment served as a clean illustration of the mathematical principles because GMMs have well-understood statistical properties.
Experiments with variational autoencoders (VAEs) trained on image data showed similar degradation. When VAEs were recursively trained on their own generated images, the quality and diversity of generated images decreased with each generation. Fine details were lost first, followed by broader structural features. After several generations, the generated images became blurry and homogeneous, lacking the variety present in the original training set.
The most consequential experiments involved large language models. The researchers fine-tuned OPT-125m, an open-source 125-million-parameter language model released by Meta, in a recursive chain where each generation was trained on text produced by its predecessor. The training data was based on the WikiText-2 dataset [1].
The quantitative results were striking. The base model fine-tuned on real WikiText-2 data reached a mean perplexity of 34, down from a zero-shot baseline of 115, confirming that it had learned the task (perplexity is a standard metric of language-model quality where lower values indicate better performance) [1]. When subsequent generations were trained for five epochs with no original data retained, performance degraded by roughly 20 to 28 perplexity points across the chain [1]. By contrast, in a setting where 10% of the original training data was re-sampled at every generation, the paper reports that "preservation of the original data allows for better model fine-tuning and leads to only minor degradation of performance" [1].
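For readers unfamiliar with the metric, perplexity is conventionally defined as the exponential of the mean negative log-probability the model assigns to the observed tokens. The sketch below applies that standard definition to hand-written probabilities; it does not reproduce the paper's evaluation pipeline, and the input values are made up for illustration.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that is always unsure among 34 equally likely tokens.
uniform = perplexity([1 / 34] * 10)
# A model that assigns high probability to each observed token.
confident = perplexity([0.9, 0.8, 0.95])
print(uniform, confident)
```

Under this definition, a perplexity of 34 corresponds to a model that is, on average, as uncertain as one choosing uniformly among 34 tokens at each step.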
The qualitative collapse was even more vivid. Starting from a generation-0 passage about Perpendicular and Revival church architecture (referencing St. John's Cathedral in London), later generations drifted off-topic; by the ninth generation the model produced text about "some of the world's largest populations of black-tailed jackrabbits, white-tailed jackrabbits, blue-tailed jackrabbits, red-tailed jackrabbits," content entirely unrelated to the original prompt [1]. The authors also documented heavy phrase repetition, and found that adding a repetition penalty made matters worse, causing perplexity to roughly double compared with the original [1].
| Generation | Observed Behavior | Perplexity Trend |
|---|---|---|
| 0 (original) | Coherent, diverse text matching training distribution | Baseline (34 mean perplexity) |
| 1-2 | Slight loss of rare vocabulary and unusual phrasings | Modest increase |
| 3-5 | Noticeable reduction in topic diversity; repetitive patterns emerge | Significant increase |
| 6-8 | Severe topic drift; nonsensical or irrelevant outputs | Steep increase |
| 9+ | Near-total collapse; outputs unrelated to original distribution | Very high |
The Shumailov et al. findings prompted significant follow-up research, including both supporting studies and critical responses.
A 2025 paper published at ICLR, "Strong Model Collapse," extended the theoretical analysis and provided additional experimental evidence [3]. The paper demonstrated that model collapse can occur even when models have access to a mixture of real and synthetic data, although the rate of collapse depends on the proportion of synthetic data in the training mix.
Researchers at various institutions replicated the core findings using different model architectures and datasets, confirming that model collapse is a robust phenomenon rather than an artifact of specific experimental choices.
Ali Borji published a detailed critique in October 2024, "A Note on Shumailov et al. (2024)," arguing that some aspects of the experimental setup may not reflect real-world training practices [4]. The critique noted that actual AI training pipelines typically involve data curation, filtering, and mixing of multiple data sources rather than naive recursive training on a single model's outputs. Other researchers pointed out that the severity of model collapse depends heavily on the ratio of synthetic to real data in the training mix, and that careful data curation can substantially mitigate the effect.
A key finding from the paper "Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data" (Gerstgrasser et al., 2024) showed that if real data is accumulated alongside synthetic data rather than replaced by it, model collapse can be avoided [5]. The authors prove that when each new generation of synthetic data is added to (rather than substituted for) the original real data, the test error has a finite upper bound that is independent of the number of model-fitting iterations, so collapse no longer occurs; if the data are instead replaced, test error grows without bound across iterations [5]. They confirmed this by pretraining sequences of language models on text corpora. This result suggests that model collapse is not an inevitable fate but rather a consequence of specific (and avoidable) data-management practices: in practice, web archives accumulate rather than discard human data, which may explain why collapse has not yet been observed at scale in production models.
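The difference between replacing and accumulating data can be illustrated with a toy Gaussian chain. This is a sketch under simplifying assumptions (a one-dimensional Gaussian fit by maximum likelihood, 100 samples per generation), not the language-model setup used by Gerstgrasser et al.; `run_chain` and its parameters are hypothetical names.

```python
import math
import random

def run_chain(accumulate, generations=1000, m=100, seed=0):
    """Return the fitted std after repeatedly refitting a Gaussian to generated data."""
    rng = random.Random(seed)
    # Generation 0 trains on "real" data drawn from N(0, 1).
    batch = [rng.gauss(0.0, 1.0) for _ in range(m)]
    n, total, total_sq = 0, 0.0, 0.0
    sigma = 1.0
    for _ in range(generations):
        if not accumulate:
            # Replace: forget everything except the newest batch.
            n, total, total_sq = 0, 0.0, 0.0
        # Accumulate: running sums keep the real data and all earlier batches.
        n += len(batch)
        total += sum(batch)
        total_sq += sum(x * x for x in batch)
        mu = total / n
        sigma = math.sqrt(max(total_sq / n - mu * mu, 0.0))
        # The fitted model generates the next synthetic batch.
        batch = [rng.gauss(mu, sigma) for _ in range(m)]
    return sigma

replace_std = run_chain(accumulate=False)
accumulate_std = run_chain(accumulate=True)
print(f"replace: {replace_std:.3g}, accumulate: {accumulate_std:.3g}")
```

In the replace setting the fitted spread drifts toward zero, while in the accumulate setting each new batch is a shrinking fraction of the pool, so the fit stays close to the original distribution.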
The original Nature authors reach a complementary conclusion, framing the takeaway as a "first mover advantage": "Our evaluation suggests a 'first mover advantage' when it comes to training models such as LLMs" [1]. They argue that to sustain learning over time, "we need to make sure that access to the original data source is preserved and that further data not generated by LLMs remain available over time" [1].
Model collapse has several important implications for the AI industry and research community.
The most immediate concern is that the internet, which has been the primary source of training data for large AI models, is becoming contaminated with AI-generated content at an accelerating pace. Common Crawl, the web-scraping dataset used as a foundation for many large language models, does not currently have reliable mechanisms for distinguishing human-generated content from AI-generated content. As the proportion of synthetic content on the web grows, future models trained on web-scraped data will increasingly be training on outputs of previous models, creating exactly the conditions that lead to model collapse.
Model collapse has elevated the value of datasets known to contain only human-generated content, particularly those collected before the widespread deployment of generative AI tools (roughly pre-2022). The Nature authors warn that without provenance tracking it "may become increasingly difficult to train newer versions of LLMs without access to data that were crawled from the Internet before the mass adoption of the technology" [1]. Organizations and researchers have begun treating these "pre-AI" datasets as particularly valuable resources, and some have advocated for the creation of curated data repositories with strong provenance guarantees [6].
Because model collapse preferentially eliminates rare events and low-probability patterns, it poses particular risks for the representation of minority groups, less common languages, specialized domains, and creative outliers. If models progressively lose the ability to generate or understand content from the tails of the distribution, the resulting AI systems may become less useful for users whose needs or perspectives are not part of the dominant mode.
As generative AI systems play an increasingly central role in knowledge retrieval and content creation, model collapse raises the specter of gradual knowledge loss. Obscure but accurate information, rare cultural references, specialized technical knowledge, and other forms of "long-tail" content could be progressively lost as models collapse toward the most common patterns in their training data.
Researchers and practitioners have proposed several strategies for preventing or mitigating model collapse.
The most straightforward mitigation is to ensure that training sets always include a substantial fraction of verified human-generated data. Shumailov et al. showed that access to even a modest amount of original human data can significantly slow or prevent model collapse; in their OPT-125m experiment, re-sampling just 10% of the original data each generation reduced the damage to only minor degradation [1]. Several organizations have begun investing in high-quality human annotation and data collection programs specifically to maintain this anchor.
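In a data pipeline, anchoring of this kind amounts to reserving a fixed share of every training set for verified real examples. The helper below is one possible sketch; the 10% default mirrors the figure quoted above, while the function name, the tagging scheme, and the sampling approach are assumptions made for illustration.

```python
import random

def build_training_set(real, synthetic, size, real_fraction=0.1, seed=0):
    """Mix real and synthetic examples so that real data fills a fixed share."""
    rng = random.Random(seed)
    n_real = round(size * real_fraction)
    n_synth = size - n_real
    if n_real > len(real) or n_synth > len(synthetic):
        raise ValueError("not enough examples to build the requested mix")
    # Sample without replacement from each pool, then shuffle the combined set.
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synth)
    rng.shuffle(mixed)
    return mixed

# Hypothetical pools; each item carries a tag so the mix can be audited.
real_docs = [("real", i) for i in range(100)]
synthetic_docs = [("synthetic", i) for i in range(1000)]
train = build_training_set(real_docs, synthetic_docs, size=500)
print(sum(1 for tag, _ in train if tag == "real"), "real examples of", len(train))
```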
Knowing whether a piece of training data was generated by a human or a machine is essential for preventing model collapse. Data provenance systems track the origin, history, and transformations of training data. Technologies such as AI watermarking, content credentials (C2PA), and metadata tagging can help identify synthetic content in training pipelines [6]. Strong provenance practices include recording source URLs, collection dates, generation flags (human-origin, machine-origin, or unknown), and licensing information.
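The fields listed above map naturally onto a small record type. The following is a minimal sketch of what such a record might look like; the class and field names are hypothetical and do not correspond to C2PA or any other standard's schema, and the example URLs and licenses are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class Origin(Enum):
    HUMAN = "human-origin"
    MACHINE = "machine-origin"
    UNKNOWN = "unknown"

@dataclass(frozen=True)
class ProvenanceRecord:
    source_url: str      # where the item was collected
    collected_on: date   # collection date
    origin: Origin       # generation flag
    license: str         # licensing information

def human_only(records):
    """Keep only items positively flagged as human-origin."""
    return [r for r in records if r.origin is Origin.HUMAN]

records = [
    ProvenanceRecord("https://example.com/a", date(2019, 5, 1), Origin.HUMAN, "CC-BY-4.0"),
    ProvenanceRecord("https://example.com/b", date(2024, 3, 9), Origin.MACHINE, "unknown"),
    ProvenanceRecord("https://example.com/c", date(2023, 8, 2), Origin.UNKNOWN, "CC0-1.0"),
]
anchor_set = human_only(records)
print(len(anchor_set), "of", len(records), "records are verified human-origin")
```

Treating "unknown" as distinct from "human-origin" is a deliberate choice in this sketch: items without a positive flag are excluded from the anchor set rather than assumed to be human-written.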
Rather than excluding all synthetic data, some researchers advocate for careful curation and quality filtering. The key insight is that not all synthetic data is harmful; the problem arises from indiscriminate mixing of synthetic and real data without quality controls. Research has shown that training on filtered synthetic data can not only avoid model degradation but can sometimes enhance performance relative to training on unfiltered data [7]. Filtering strategies include deduplication, quality scoring, diversity checks, and comparison against known real-data benchmarks.
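Two of the filtering strategies named above, deduplication and a diversity check, can be sketched in a few lines. The example assumes exact-match deduplication after whitespace and case normalization, and uses the share of distinct bigrams as a crude diversity score; the 0.5 threshold, the function names, and the sample sentences are all illustrative choices rather than values from the cited research.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())

def distinct_ngram_ratio(text, n=2):
    """Share of n-grams that are unique; low values indicate heavy repetition."""
    tokens = normalize(text).split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)

def filter_synthetic(samples, min_ratio=0.5):
    """Drop normalized duplicates, then drop samples that are too repetitive."""
    seen = set()
    kept = []
    for text in samples:
        key = normalize(text)
        if key in seen:
            continue
        seen.add(key)
        if distinct_ngram_ratio(text) >= min_ratio:
            kept.append(text)
    return kept

samples = [
    "The nave was rebuilt in the fifteenth century.",
    "the nave was  rebuilt in the fifteenth century.",
    "red jackrabbits red jackrabbits red jackrabbits red jackrabbits",
]
kept = filter_synthetic(samples)
print(kept)
```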
Research on the optimal mixing of real and synthetic data has shown that there exist ratios at which the benefits of additional data (even synthetic data) outweigh the risks of model collapse. A 2024 study derived theoretical bounds on the optimal fraction of synthetic data as a function of the total dataset size and the quality of the synthetic data [5]. The practical recommendation is to keep synthetic data as a supplement to, not a replacement for, real data, and to control its proportion carefully.
A 2025 approach proposed "escaping model collapse via synthetic data verification," where synthetic data is validated against quality criteria before being included in training sets [8]. This can involve checking the synthetic data for statistical consistency with known real-data distributions, evaluating its diversity, and removing samples that show signs of mode collapse or other degradation.
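One simple way to check statistical consistency for a scalar feature is a two-sample Kolmogorov-Smirnov distance between a trusted real sample and the candidate synthetic batch. The sketch below uses that test as a stand-in; it is not the verification procedure of the cited work, and the 0.2 threshold and the function names are assumptions.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    distance = 0.0
    # The gap between two step functions can only change at observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance

def passes_verification(real, synthetic, max_distance=0.2):
    """Accept a synthetic batch only if it stays close to the real reference."""
    return ks_statistic(real, synthetic) <= max_distance

real = [0.1 * i for i in range(100)]
good_batch = [0.1 * i + 0.05 for i in range(100)]  # same spread, slightly shifted
collapsed_batch = [5.0] * 100                      # all mass on a single value
print(passes_verification(real, good_batch), passes_verification(real, collapsed_batch))
```

A batch that has collapsed onto one value is rejected because its empirical distribution sits far from the reference, even though every individual sample looks plausible on its own.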
Continuous monitoring of model performance on distribution tails is an important practical mitigation. By evaluating models on slice-based benchmarks that specifically test performance on rare events, minority categories, and low-frequency patterns, practitioners can detect early signs of model collapse before it becomes severe. This approach allows for corrective action (such as augmenting training data with more real examples from underrepresented categories) before the damage becomes irreversible.
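Slice-based monitoring can be as simple as computing accuracy per slice and comparing it with a stored baseline. The sketch below assumes evaluation results arrive as (slice name, correct?) pairs; the slice names, baseline values, and the 0.05 tolerance are hypothetical.

```python
from collections import defaultdict

def per_slice_accuracy(records):
    """records: iterable of (slice_name, is_correct) pairs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for name, ok in records:
        totals[name] += 1
        hits[name] += int(ok)
    return {name: hits[name] / totals[name] for name in totals}

def flag_regressions(baseline, current, tolerance=0.05):
    """Return slices whose accuracy fell by more than `tolerance` versus baseline."""
    return sorted(
        name for name, base in baseline.items()
        if base - current.get(name, 0.0) > tolerance
    )

baseline = {"common": 0.92, "rare_dialect": 0.80}
current = per_slice_accuracy(
    [("common", True)] * 9 + [("common", False)]
    + [("rare_dialect", True)] * 5 + [("rare_dialect", False)] * 5
)
flagged = flag_regressions(baseline, current)
print(flagged)
```

Aggregate accuracy in this example barely moves, which is why the rare slice is tracked separately: a drop confined to the tail is easy to miss in a single headline metric.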
| Mitigation Strategy | Mechanism | Effectiveness | Practical Challenges |
|---|---|---|---|
| Real data anchoring | Maintain fraction of verified human data | High (prevents collapse when sufficient) | Sourcing and verifying human data at scale |
| Data provenance tracking | Identify and label synthetic content | Moderate to high | No universal standard; watermarks can be stripped |
| Synthetic data filtering | Remove low-quality synthetic samples | Moderate | Requires reliable quality metrics |
| Optimal mixing ratios | Control synthetic/real data proportion | High (when ratios are well-calibrated) | Optimal ratios depend on domain and model |
| Tail performance monitoring | Detect early collapse on rare categories | Moderate (detection, not prevention) | Requires representative evaluation sets |
| Data verification | Validate synthetic data before inclusion | Moderate to high | Computationally expensive at scale |
Model collapse is related to several other phenomena in machine learning and statistics.
Mode collapse in generative adversarial networks (GANs) is a well-known training failure where the generator produces only a small subset of the possible outputs, ignoring other modes of the data distribution. While the mechanism is different (mode collapse in GANs arises from the adversarial training dynamics rather than recursive training on synthetic data), the outcome is similar: a loss of diversity in generated outputs. Model collapse can be seen as a multi-generational analog of mode collapse.
Catastrophic forgetting occurs when a neural network trained on a new task loses its ability to perform previously learned tasks. Model collapse involves a related but distinct form of forgetting: the model loses its knowledge of the tails of the distribution rather than entire tasks. Both phenomena reflect the limited capacity of neural networks to retain information when exposed to new data that does not adequately represent what came before.
In traditional machine learning operations (MLOps), data drift refers to changes in the distribution of input data over time, which can degrade model performance. Model collapse can be viewed as a self-inflicted form of data drift, where the model's own outputs contaminate future training data, causing the training distribution to shift away from the original real-world distribution.
As of mid-2026, model collapse is a widely recognized risk in the AI research and industry communities, but it is not yet a fully solved problem.
The consensus view is that naive recursive training on model-generated data without real-data anchoring will lead to collapse, and this has been confirmed across multiple model types and experimental settings. However, the practical severity of the problem depends heavily on how training data is managed. Real-world AI training pipelines already incorporate some degree of data curation, filtering, and mixing, which provides partial protection against model collapse.
Major AI companies have responded by investing in data provenance infrastructure, curating high-quality human-generated datasets, and developing watermarking and detection tools to identify AI-generated content in their training pipelines. The EU AI Act, which includes transparency requirements for AI-generated content labeling and watermarking that become enforceable in August 2026, is expected to create additional incentives for provenance tracking that could help mitigate model collapse risks [9].
However, several challenges remain. There is no universal standard for identifying AI-generated content, and existing detection tools have significant limitations. The proportion of AI-generated content on the internet continues to grow, and it is unclear whether current mitigation strategies will be sufficient at the scale of the entire web. Research published at ICLR 2025 confirmed that the risk of model collapse is not yet fully addressed, even as practical mitigations continue to improve [3].
A study published in Physical Review Letters in May 2026 by researchers from King's College London, the Norwegian University of Science and Technology, and the Abdus Salam International Centre for Theoretical Physics found that introducing as little as a single real-world data point into an otherwise synthetic training set can prevent model collapse entirely. The effect was demonstrated analytically for exponential family models, with similar results observed in restricted Boltzmann machines. The researchers framed recursive self-training as "data cannibalism" and showed that even a vanishingly small anchor of human-generated data is sufficient to prevent the recursive degradation process [10]. "By focusing on a simple model, we can establish why adding just one data point prevents them from generating gibberish from an objective, statistical standpoint," said co-author Professor Yasser Roudi, adding that "from this foundation, we can establish principles that will be vital in future AI construction" [10]. The result strengthens the theoretical case for real-data anchoring as the primary mitigation strategy.
The model collapse problem has also prompted broader reflection on the sustainability of current approaches to AI training. Some researchers have argued that it highlights the need for fundamentally new approaches to data collection and model training, rather than relying on ever-larger web scrapes. Others have pointed out that model collapse is ultimately a problem of information loss, and that solving it requires treating training data as a carefully managed resource rather than an abundant commodity.