# Feedback Loop

> Source: https://aiwiki.ai/wiki/feedback_loop
> Updated: 2026-06-27
> Categories: AI Ethics, Machine Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

A feedback loop in [machine learning](/wiki/machine_learning) is a cycle in which a deployed model's [predictions](/wiki/prediction) influence the real world, and the resulting data is then collected and used to retrain the same model, so the model partly shapes its own future training data. Because the model never sees a random sample of all possible outcomes (only the slice of reality its own decisions produced), feedback loops can amplify [bias](/wiki/bias_ethics_fairness), narrow data diversity, and drive [concept drift](/wiki/concept_drift), even when each individual prediction looks reasonable. The canonical example is a [recommender system](/wiki/recommender_system) that learns from clicks it caused: it recommends content, users engage with what they are shown, and that engagement is read back as preference, steering future recommendations toward what was already promoted [1][11].

## What is a feedback loop?

A feedback loop is a systemic mechanism in which the output of a process serves as input for subsequent iterations of that same process, creating a self-reinforcing or self-correcting cycle. In [machine learning](/wiki/machine_learning), feedback loops arise when a model's [predictions](/wiki/prediction) influence the real-world environment, and the resulting data is then used to retrain or update the model. This circular dependency can improve model performance over time, but it can also amplify [biases](/wiki/bias_ethics_fairness), narrow data diversity, and cause unintended societal consequences.

Feedback loops are ubiquitous in both natural and artificial systems, appearing in ecology, economics, engineering, and software. In AI systems, understanding feedback loops is critical because models increasingly make decisions that shape the very data they learn from. A [recommendation system](/wiki/recommender_system) changes what users see, which changes what users click, which changes what the model recommends next. A hiring algorithm decides who gets interviewed, and only data from interviewed candidates feeds back into the model. These cycles can be virtuous or vicious depending on design, monitoring, and intervention.

Google's widely cited engineering guide *Rules of Machine Learning* warns of one of the most common forms directly in Rule #36: "The position of content dramatically affects how likely the user is to interact with it. If you put an app in the first position it will be clicked more often, and you will be convinced it is more likely to be clicked" [11]. Rule #35 of the same guide makes the general principle explicit: "When you switch your ranking algorithm radically enough that different results show up, you have effectively changed the data that your algorithm is going to see in the future" [11].
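
One hedged way to see how position inflates raw click counts is to divide an item's clicks by the clicks its slots would be expected to produce anyway. The sketch below is illustrative only: `POSITION_BASELINE_CTR`, its made-up rates, and the function name are assumptions for this example and do not come from Google's guide.

```python
# Hypothetical baseline click-through rate for each ranking slot (made-up values).
POSITION_BASELINE_CTR = {1: 0.10, 2: 0.06, 3: 0.04, 4: 0.03, 5: 0.02}


def clicks_over_expected(impression_positions, clicks):
    """Ratio of observed clicks to the clicks expected from position alone.

    impression_positions: the slot in which the item was shown, one entry per impression.
    clicks: total clicks the item received across those impressions.
    A value near 1.0 means the item performed about as its slots would predict.
    """
    expected = sum(POSITION_BASELINE_CTR[pos] for pos in impression_positions)
    if expected == 0:
        raise ValueError("no expected clicks; cannot normalize")
    return clicks / expected


# Item A: 100 impressions in slot 1, 10 clicks. Item B: 100 impressions in slot 5, 4 clicks.
item_a = clicks_over_expected([1] * 100, clicks=10)
item_b = clicks_over_expected([5] * 100, clicks=4)
print(f"A: {item_a:.2f}  B: {item_b:.2f}")
```

On raw click-through rate, item A looks stronger (10% versus 4%), yet relative to what its slot would be expected to earn, item B does about twice as well. A model trained on the raw counts would keep A in first position and never find this out.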

## Explain Like I'm 5 (ELI5)

Imagine you have a robot that picks your lunch every day. On Monday, the robot gives you pizza, and you eat it because it is there. The robot sees you ate pizza, so on Tuesday it gives you pizza again. You eat it again because, once more, it is the only option. Now the robot is really sure you love pizza, so it gives you pizza every day forever. You never get to try tacos or sushi because the robot only learns from what it already gave you. That is a feedback loop: the robot's choices affect what data it gets back, which makes it keep making the same choices.

## What are positive and negative feedback loops?

Feedback loops are broadly classified into two types: positive (self-reinforcing) and negative (self-correcting).

| Type | Behavior | Effect | ML Example |
|------|----------|--------|------------|
| Positive feedback loop | Each iteration's output reinforces the previous output | Exponential growth or runaway behavior | A trending video gets recommended more, gains more views, and gets recommended even more |
| Negative feedback loop | Each iteration's output dampens the input for future iterations | Stability and convergence | A model detects it is over-predicting fraud, reduces its sensitivity, and reaches a balanced false-positive rate |

In [machine learning](/wiki/machine_learning), positive feedback loops are more commonly discussed because they present greater risks. When a model's output amplifies certain patterns in its [training](/wiki/training) data, the model can become increasingly confident in narrow predictions while losing the ability to handle diverse inputs. Negative feedback loops, on the other hand, are often built intentionally into systems as stabilizing mechanisms, such as threshold-based corrections that pull a drifting metric back toward a target. Regularization techniques are sometimes grouped here as well, although they constrain training rather than respond to a deployed model's outputs.
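
The difference between the two types can be shown with a pair of toy update rules. This is a minimal sketch rather than a model of any real system; the function names, the `gain` parameter, and the starting values are assumptions chosen for illustration.

```python
def positive_loop(value, gain, steps):
    """Each step adds a fraction of the current value, so growth compounds."""
    history = [value]
    for _ in range(steps):
        value = value + gain * value
        history.append(value)
    return history


def negative_loop(value, target, gain, steps):
    """Each step removes a fraction of the gap to the target, so the gap shrinks."""
    history = [value]
    for _ in range(steps):
        value = value - gain * (value - target)
        history.append(value)
    return history


growth = positive_loop(value=1.0, gain=0.5, steps=10)
settle = negative_loop(value=10.0, target=2.0, gain=0.5, steps=10)
print(round(growth[-1], 2), round(settle[-1], 3))
```

With a gain between 0 and 1, the first sequence grows geometrically while the second closes half of its remaining distance to the target at every step, which is the runaway versus convergent behavior described in the table.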

## What are direct and hidden feedback loops?

A second axis, beyond positive versus negative, distinguishes how directly a model's output reaches its own training data. In a direct feedback loop, the model's predictions immediately become labeled training examples: a system that ranks search results, then trains on the clicks those rankings produced, closes the loop in a single step. In a hidden (or indirect) feedback loop, the influence travels through the environment first and is often many steps removed, so the connection between a model's decision and the data it later sees is unintuitive and easy to miss. Hidden loops are generally considered more dangerous precisely because they are harder to detect: the model's output changes user behavior, market conditions, or what a downstream model does, and only that altered world is observed [11][12].

Google's *Rules of Machine Learning* describes the hidden case in Rule #28, noting that "identical short-term behavior does not imply identical long-term behavior," because a system that only surfaces what was historically clicked prevents new items from ever being discovered, quietly reshaping the data it will be trained on next [11]. Researchers studying online recommender systems have formalized the hidden feedback loop as a degenerate equilibrium: a 2024 analysis derived "a mathematical model of the hidden feedback loop effect" showing how a system can collapse toward serving an ever-narrowing band of content even when each step optimizes a reasonable objective [12].

## How do feedback loops form in machine learning?

A feedback loop in ML forms through a recurring cycle with several stages:

1. **Model deployment**: A trained model is deployed to make [predictions](/wiki/prediction) or decisions in a real-world environment.
2. **Environmental influence**: The model's outputs change user behavior, business processes, or data collection patterns.
3. **Data generation**: New data is generated as a result of the model's influence on the environment.
4. **Model retraining**: The new data is collected and used to retrain or fine-tune the model.
5. **Cycle repeats**: The updated model is redeployed, and the cycle begins again.

The key issue is that the model does not receive a random sample of all possible outcomes. It only observes outcomes that resulted from its own decisions. This creates a selection bias known as the "closed feedback loop" problem, where the model learns only from the slice of reality it helped create.
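
The stages above can be sketched as a small simulation. Everything here is a simplifying assumption: five items with identical true appeal, a first round of uniform exposure standing in for initial data collection, and a "model" that simply shows whichever item has the most logged clicks.

```python
import random


def simulate_greedy_recommender(n_items=5, rounds=10, impressions_per_round=100,
                                click_prob=0.3, seed=0):
    """Toy loop: every item is equally appealing, but only shown items can be clicked."""
    rng = random.Random(seed)
    clicks = [0] * n_items
    impressions = [0] * n_items

    for round_index in range(rounds):
        # "Retrain": the model's only knowledge is the click log so far.
        best_item = max(range(n_items), key=lambda i: clicks[i])
        for _ in range(impressions_per_round):
            if round_index == 0:
                shown = rng.randrange(n_items)   # initial data: uniform exposure
            else:
                shown = best_item                # deployed model: always show the leader
            impressions[shown] += 1
            if rng.random() < click_prob:        # same true appeal for every item
                clicks[shown] += 1
    return impressions, clicks


impressions, clicks = simulate_greedy_recommender()
top_share = max(impressions) / sum(impressions)
print(f"share of impressions going to one item: {top_share:.0%}")
```

Whichever item happens to lead after the first round receives every later impression, because the other items can no longer generate the clicks that would let them catch up. The log then looks like strong evidence of preference even though all items were equally likely to be clicked.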

## How do feedback loops affect recommendation systems?

Recommendation systems provide one of the most studied examples of feedback loops in AI. When a platform recommends content to users, users engage with some of that content through clicks, views, likes, or shares. The algorithm interprets this engagement as a signal of preference and recommends similar content in the future. Over time, this cycle can narrow the range of content a user sees.

### What are filter bubbles and echo chambers?

The terms "filter bubble" and "echo chamber" describe situations where feedback loops in [recommendation systems](/wiki/recommender_system) progressively reduce a user's exposure to diverse perspectives. Most algorithms do not distinguish between inherent user interest and engagement driven simply by what was presented. This means the system inadvertently reinforces the popularity of already-popular content and steers users toward increasingly homogeneous information diets.

Researchers have identified several contributing factors:

| Factor | Description |
|--------|-------------|
| Algorithmic bias | The recommendation algorithm optimizes for engagement metrics rather than diversity or accuracy |
| Data bias | Historical interaction data reflects past recommendations, not the full range of user preferences |
| Cognitive bias | Users tend to engage with content that confirms existing beliefs (confirmation bias) |
| Popularity bias | Popular items receive more exposure, generating more data, which further increases their prominence |

However, the empirical evidence for filter bubbles is mixed. A 2023 systematic review found that while all components of the engagement-recommendation-belief feedback loop have been demonstrated in some contexts, the effect sizes tend to be small or context-dependent, and researchers use widely varying definitions, making it difficult to draw firm conclusions about the overall magnitude of the problem [1].

## How do feedback loops cause bias in predictive policing?

Predictive policing systems illustrate how feedback loops can cause serious societal harm. These systems use historical crime data to predict where crimes are likely to occur and direct police patrols accordingly. The feedback loop operates as follows: the algorithm identifies a neighborhood as high-risk, police are deployed there in greater numbers, more crimes are detected in that area due to increased surveillance, and the resulting data reinforces the algorithm's assessment that the neighborhood is high-risk [2].

This is a textbook example of a runaway positive feedback loop. Ensign et al. (2018) note that predictive policing systems have been empirically shown to be susceptible to "runaway feedback loops, where police are repeatedly sent back to the same neighborhoods regardless of the true crime rate" [3]. Because discovered-crime data such as arrest counts is used to update the model, the system feeds on the patrols it already ordered. The authors developed a mathematical model that proves why this occurs and showed that it can be corrected by changing the system's inputs in a black-box manner so that the runaway loop does not occur [3]. The problem is compounded by the fact that historically overpoliced communities, which are often communities of color, have disproportionately higher recorded crime rates, not necessarily because more crime occurs there but because more policing occurs there.
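
The runaway dynamic can be illustrated with a deliberately simplified, deterministic toy model. It is not the model from Ensign et al.: it tracks only expected values, and the function name, rates, and starting counts are assumptions made for this sketch.

```python
def simulate_patrol_allocation(true_rates, initial_counts, days):
    """Deterministic, expected-value toy model of patrols driven by discovered crime.

    true_rates: chance that a patrol discovers an incident in each neighborhood.
    initial_counts: historical recorded incidents used to seed the model.
    Each day patrols are split in proportion to recorded incidents, and only
    patrolled areas can add to the record.
    """
    counts = list(initial_counts)
    for _ in range(days):
        total = sum(counts)
        patrol_share = [c / total for c in counts]
        # Expected new records: incidents are only found where patrols are sent.
        counts = [c + share * rate
                  for c, share, rate in zip(counts, patrol_share, true_rates)]
    total = sum(counts)
    return [c / total for c in counts]


rates = [0.3, 0.2]   # neighborhood A has somewhat more crime than B
final_share = simulate_patrol_allocation(rates, initial_counts=[10, 10], days=20000)
fair_share = rates[0] / sum(rates)
print(f"patrol share for A: {final_share[0]:.2f} (share of true crime: {fair_share:.2f})")
```

In this sketch the two neighborhoods start with identical records and differ only modestly in true rates, yet the allocation keeps drifting toward the higher-rate neighborhood well past its share of actual crime, because each record both results from and justifies further patrols.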

In June 2020, Santa Cruz, California became the first U.S. city to ban the use of predictive policing algorithms, after having placed a moratorium on the technology in 2017. Santa Cruz had been one of the first cities to pilot PredPol, the predictive-policing company whose product was central to the runaway-feedback-loop critique. Mayor Justin Cummings cited how "predictive policing and facial recognition can be disproportionately biased against people of color" as the reason for the ban [13]. Several other cities have since restricted these systems, and the debate over them continues.

## How do feedback loops affect hiring algorithms?

AI-powered hiring systems are vulnerable to feedback loops that entrench discrimination. When an algorithm screens resumes and selects candidates for interviews, only the selected candidates proceed through the hiring pipeline. The algorithm then receives performance data only for the candidates it chose, never learning about the potentially strong candidates it rejected [4].

This creates a self-reinforcing cycle: if the initial model has even a slight bias toward certain demographic profiles (due to biased historical hiring data), it will select more candidates from those groups, generate more positive outcome data for those groups, and become increasingly confident that its biased selection criteria are correct. Research from MIT Sloan has shown that AI does not just replicate human [biases](/wiki/bias_ethics_fairness) in hiring; it can amplify them through this feedback mechanism [5].

Mitigation strategies for hiring feedback loops include:

- Collecting outcome data on a random sample of rejected candidates
- Regular audits of selection rates across demographic groups
- Using fairness constraints during model [training](/wiki/training)
- Maintaining human oversight for a percentage of all hiring decisions
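
The audit of selection rates mentioned in the list can be sketched in a few lines. The group labels, the screening log, and the helper names are hypothetical, and the ratio shown is only one of several possible disparity measures.

```python
from collections import defaultdict


def selection_rates(decisions):
    """decisions: iterable of (group, was_selected) pairs -> {group: selection rate}."""
    selected = defaultdict(int)
    total = defaultdict(int)
    for group, was_selected in decisions:
        total[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / total[group] for group in total}


def min_max_rate_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal selection rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group
    return min(rates.values()) / highest


# Made-up screening log: group label and whether the candidate got an interview.
log = ([("group_a", True)] * 30 + [("group_a", False)] * 70
       + [("group_b", True)] * 15 + [("group_b", False)] * 85)
rates = selection_rates(log)
ratio = min_max_rate_ratio(rates)
print(rates, round(ratio, 2))
```

Tracking this ratio across retraining cycles, rather than at a single point in time, is what makes it useful for spotting a loop: a value that keeps falling suggests the model is reinforcing its earlier selections.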

## Is RLHF a feedback loop?

[Reinforcement learning](/wiki/reinforcement_learning_rl) from human feedback (RLHF) represents a deliberately engineered feedback loop designed to align AI models with human preferences. Unlike the unintended loops above, RLHF is a human feedback loop built on purpose: a language model generates outputs, human annotators rank those outputs by quality, a reward model is trained on these rankings, and the language model is then fine-tuned using [reinforcement learning](/wiki/reinforcement_learning_rl) to maximize the reward model's scores [6].

The RLHF process typically follows three stages:

| Stage | Description |
|-------|-------------|
| Supervised fine-tuning | A pre-trained model is fine-tuned on high-quality demonstration data to establish a baseline policy |
| Reward model training | Human annotators rank multiple model outputs for the same prompt; a reward model learns to predict human preferences |
| RL optimization | The model is optimized to generate outputs that score highly according to the reward model |

While RLHF is powerful for alignment, it carries its own feedback loop risks. The model may learn to exploit weaknesses in the reward model rather than genuinely improving its outputs, a phenomenon known as [reward hacking](/wiki/reward_hacking). Additionally, the reward model reflects the preferences of its specific annotator pool, which may not represent diverse global perspectives. A single reward function cannot always capture the opinions of varied groups, and conflicting preferences may cause the model to favor majority opinions while disadvantaging underrepresented viewpoints.
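
The reward model stage is commonly trained with a pairwise logistic loss over human rankings. The text above does not specify a loss function, so the formulation below is an assumption based on common practice, and the function name is hypothetical.

```python
import math


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """Negative log-probability that the preferred output wins, under a logistic model.

    The loss is small when the reward model scores the human-preferred output
    well above the rejected one, and large when it gets the order wrong.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(margin)) written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


agree = pairwise_preference_loss(2.0, 0.0)      # reward model agrees with annotators
undecided = pairwise_preference_loss(0.0, 0.0)  # reward model cannot tell them apart
disagree = pairwise_preference_loss(0.0, 2.0)   # reward model prefers the rejected output
print(round(agree, 3), round(undecided, 3), round(disagree, 3))
```

Because the loss depends only on what annotators happened to prefer, any systematic quirk in the annotator pool is learned as if it were a property of good outputs, which is how the risks described above enter the loop.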

## How do feedback loops affect content moderation?

Automated content moderation systems on social media platforms are subject to feedback loops that can affect freedom of expression. These systems are trained on datasets of content that was previously flagged and reviewed by human moderators. When the automated system removes content, that removal decision influences what content remains visible on the platform and, consequently, what new content users create and what new reports moderators review.

Feedback loops in content moderation can lead to several problems. Over-enforcement occurs when the system becomes increasingly aggressive at removing borderline content, because its training data skews toward removal decisions. Under-enforcement can also occur in languages or cultural contexts that are underrepresented in the training data, since fewer moderation decisions in those contexts generate less training signal. Platforms like Meta have faced criticism for relying on machine translation rather than investing in native-language moderation resources, leading to errors and biases in moderating content in languages such as Burmese, Amharic, and Sinhala [7].

The most effective content moderation approaches combine automated systems with human review, creating a hybrid feedback loop where human moderators can correct systematic errors before they compound.

## How do feedback loops affect fraud detection?

Fraud detection presents a unique feedback loop challenge because the environment is adversarial. Unlike most ML applications where the data distribution changes passively, fraudsters actively adapt their tactics in response to detection systems. This creates a cat-and-mouse dynamic:

1. A fraud detection model is trained on known fraud patterns and deployed.
2. The model catches fraud that matches known patterns.
3. Fraudsters observe which tactics get caught and shift to new methods.
4. The model's performance degrades on the new fraud tactics (concept drift).
5. New fraud data is collected and the model is retrained.

This adversarial feedback loop means that fraud detection models face a form of [concept drift](/wiki/concept_drift) that is deliberately induced by the actors being monitored. Research has shown that traditional drift detectors cannot always distinguish between genuine concept drift and adversarial poisoning, where bad actors intentionally inject false data to manipulate the model [8].

Organizations that excel at fraud detection address this through frequent model retraining (sometimes daily), ensemble methods that combine multiple detection approaches, and online learning algorithms that update model weights with each new confirmed fraud instance.
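
An online update of the kind mentioned above can be sketched as a single stochastic-gradient step on a logistic model. This is a minimal illustration under assumed names and values (`online_update`, the two-feature vector, the learning rate), not a description of any production fraud system.

```python
import math


def predict_fraud_probability(weights, bias, features):
    """Logistic model: probability that a transaction is fraudulent."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def online_update(weights, bias, features, label, learning_rate=0.1):
    """One stochastic-gradient step on log loss for a single confirmed case.

    label is 1 for confirmed fraud and 0 for confirmed legitimate.
    Returns the updated (weights, bias) without mutating the inputs.
    """
    error = predict_fraud_probability(weights, bias, features) - label
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias


weights, bias = [0.0, 0.0], 0.0
new_tactic = [1.0, 3.0]          # hypothetical feature vector for a newly confirmed fraud
before = predict_fraud_probability(weights, bias, new_tactic)
weights, bias = online_update(weights, bias, new_tactic, label=1)
after = predict_fraud_probability(weights, bias, new_tactic)
print(f"{before:.2f} -> {after:.2f}")
```

Updating per confirmed case shortens the window in which a new tactic goes undetected, but it also means that mislabeled or deliberately injected cases shift the model just as quickly.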

## How do feedback loops cause data drift?

Feedback loops are a significant cause of data drift, which is the phenomenon where the statistical distribution of input data changes over time relative to the data used during model [training](/wiki/training). When a model's decisions shape the environment, the data collected from that environment will increasingly diverge from the original training distribution. This divergence between the data a model was trained on and the data it actually sees in production is closely related to [training-serving skew](/wiki/training-serving_skew).

There are several types of drift that feedback loops can trigger:

| Drift Type | Description | Feedback Loop Cause |
|-----------|-------------|--------------------|
| Covariate drift | Input feature distributions shift | Model decisions change which data points are observed |
| Concept drift | The relationship between inputs and outputs changes | Model-influenced environment alters the underlying patterns |
| Label drift | The distribution of target labels shifts | Model actions change real-world outcomes |

For example, if a loan approval model approves more applicants from a certain income bracket, the repayment data it receives will be skewed toward that bracket. Over time, the model's understanding of default risk becomes less representative of the broader population, and its accuracy on underrepresented groups deteriorates.
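
The same effect appears whenever outcomes are recorded only for approved cases. The sketch below uses a made-up list of applicants and a hypothetical approval threshold to compare the repayment rate the model gets to see with the rate in the full applicant pool.

```python
# Made-up applicants: (model_score, would_repay). In reality, would_repay is
# only ever observed for applicants the model approves.
applicants = [
    (0.9, True), (0.8, True), (0.8, True), (0.7, False),
    (0.4, True), (0.3, True), (0.3, False), (0.2, False),
]
APPROVAL_THRESHOLD = 0.5  # hypothetical cut-off


def repayment_rate(rows):
    """Fraction of rows whose outcome is a repayment."""
    return sum(1 for _, repaid in rows if repaid) / len(rows)


approved = [row for row in applicants if row[0] >= APPROVAL_THRESHOLD]
rejected = [row for row in applicants if row[0] < APPROVAL_THRESHOLD]

observed_rate = repayment_rate(approved)      # what the retraining data shows
population_rate = repayment_rate(applicants)  # what a random sample would show
hidden_rate = repayment_rate(rejected)        # never observed in production
print(observed_rate, population_rate, hidden_rate)
```

In this toy data half of the rejected applicants would have repaid, but the retraining set contains no evidence of that, so nothing pushes the model to revise its view of the rejected group.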

## How do you monitor for feedback loops?

Detecting and monitoring feedback loops in production ML systems requires a combination of statistical testing, performance tracking, and data quality checks. Key monitoring strategies include:

**Performance monitoring**: Track model accuracy, precision, recall, and error rates over time on new data. A consistent decline relative to the original baseline can indicate that a feedback loop is distorting the training data distribution, although ordinary drift and data quality problems produce the same symptom and should be ruled out.

**Distribution monitoring**: Use statistical tests such as the Kolmogorov-Smirnov test, chi-square test, or Population Stability Index (PSI) to compare current input feature distributions against the [training](/wiki/training) data baseline. Significant distributional shifts can reveal the influence of feedback loops.
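
As an illustration, the Population Stability Index for one numeric feature can be computed as below. This is a minimal sketch: the equal-width binning over the baseline range, the `epsilon` floor for empty bins, and the sample data are all assumptions, and real deployments often use quantile bins instead. Both samples are assumed to be non-empty.

```python
import math


def population_stability_index(expected, actual, n_bins=10, epsilon=1e-6):
    """PSI between a baseline sample and a current sample of one numeric feature.

    Bins are equal-width over the baseline's range; values outside that range
    fall into the first or last bin. epsilon avoids log(0) for empty bins.
    """
    low, high = min(expected), max(expected)
    width = (high - low) / n_bins or 1.0  # guard against a constant baseline

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            index = int((v - low) / width)
            counts[min(max(index, 0), n_bins - 1)] += 1
        return [max(c / len(values), epsilon) for c in counts]

    e_frac = bin_fractions(expected)
    a_frac = bin_fractions(actual)
    return sum((a - e) * math.log(a / e) for a, e in zip(a_frac, e_frac))


baseline = [i / 100 for i in range(100)]          # roughly uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]     # concentrated on [0.5, 1)
psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, shifted)
print(round(psi_same, 4), round(psi_shifted, 2))
```

Identical samples give a PSI of zero, while the shifted sample, which leaves half of the baseline's bins empty, produces a large value. The alert threshold is a policy choice and should be set per feature.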

**Outcome diversity tracking**: Monitor whether the range of model outputs is narrowing over successive retraining cycles. A decreasing diversity of [predictions](/wiki/prediction) may indicate a positive feedback loop that is collapsing the model's output space.
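
One simple diversity measure is the Shannon entropy of the model's predicted labels, tracked per retraining cycle. The sketch below assumes categorical outputs; the category names and batches are made up for illustration.

```python
import math
from collections import Counter


def prediction_entropy_bits(predictions):
    """Shannon entropy (in bits) of the empirical distribution of predicted labels."""
    counts = Counter(predictions)
    total = len(predictions)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical recommended categories after successive retraining cycles.
cycles = [
    ["news", "sports", "music", "cooking"] * 25,
    ["news"] * 70 + ["sports"] * 20 + ["music"] * 10,
    ["news"] * 100,
]
entropies = [prediction_entropy_bits(batch) for batch in cycles]
print([round(h, 2) for h in entropies])
```

A steadily falling entropy across cycles is the numerical signature of the narrowing described above; a value of zero means the model is producing a single output for every case.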

**A/B testing and randomization**: Periodically expose a small fraction of users or cases to randomized decisions rather than model-driven ones. This provides an unbiased sample of outcomes that can be compared against model-influenced outcomes to quantify the feedback loop's effect.
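
A randomized slice can be implemented by occasionally overriding the model and recording that the override happened. The following is a sketch under assumed names (`choose_action`, `explore_rate`) and an assumed 5% rate; the right rate depends on how costly a random decision is in the application.

```python
import random


def choose_action(model_action, all_actions, explore_rate, rng):
    """Return (action, source) with a small randomized holdout.

    With probability explore_rate the action is drawn uniformly at random and
    tagged "random"; otherwise the model's action is used and tagged "model".
    Logging the tag lets later analysis separate the two populations.
    """
    if rng.random() < explore_rate:
        return rng.choice(all_actions), "random"
    return model_action, "model"


rng = random.Random(42)
actions = ["approve", "review", "reject"]
log = [choose_action("approve", actions, explore_rate=0.05, rng=rng) for _ in range(10_000)]
random_fraction = sum(1 for _, source in log if source == "random") / len(log)
print(f"randomized fraction: {random_fraction:.3f}")
```

Only the cases tagged "random" form an unbiased sample, so outcome comparisons and retraining checks should be run on that slice separately from the model-driven traffic.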

**Subgroup analysis**: Break down model performance by demographic groups, geographic regions, or other meaningful segments. Feedback loops often affect subgroups differently, and aggregate metrics can mask localized degradation.
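
A per-segment breakdown can be as simple as grouping evaluation records before computing the metric. The region names and records below are made up to show how a healthy aggregate can hide a weak segment.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, prediction, label) -> {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, prediction, label in records:
        total[group] += 1
        correct[group] += int(prediction == label)
    return {group: correct[group] / total[group] for group in total}


# Made-up evaluation set: the aggregate looks fine, one region does not.
records = ([("north", 1, 1)] * 90 + [("north", 1, 0)] * 10
           + [("south", 1, 1)] * 6 + [("south", 1, 0)] * 4)
overall = sum(int(p == y) for _, p, y in records) / len(records)
per_group = accuracy_by_group(records)
print(round(overall, 3), per_group)
```

The smaller segment contributes so few records that its lower accuracy barely moves the overall figure, which is why aggregate dashboards alone tend to miss loop-driven degradation.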

## How do you mitigate harmful feedback loops?

Several strategies have been developed to mitigate the harmful effects of feedback loops in AI systems:

| Strategy | Description | Application |
|----------|-------------|-------------|
| Exploration-exploitation balancing | Introduce controlled randomness in model decisions to gather diverse data | Recommendation systems, hiring |
| Counterfactual evaluation | Estimate what would have happened under different model decisions | Lending, criminal justice |
| Fairness constraints | Enforce demographic parity or equalized odds during training | Hiring, credit scoring |
| Human-in-the-loop review | Require human oversight for high-stakes decisions | Content moderation, healthcare |
| Regular retraining with fresh data | Incorporate data from outside the feedback loop | Fraud detection, advertising |
| Diversity injection | Explicitly promote diverse outputs in recommendation or ranking | Search engines, news feeds |
| Causal modeling | Use causal inference to separate model influence from organic trends | Policy evaluation, A/B testing |

A comprehensive approach to feedback loop mitigation combines multiple strategies. AWS's Well-Architected Machine Learning Lens recommends establishing feedback loops across all phases of the ML lifecycle, with automated alerts that trigger investigation or retraining when distributional shifts exceed predefined thresholds [9].
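
The counterfactual evaluation row in the table can be illustrated with inverse propensity scoring, one standard estimator for this purpose. The sketch assumes the logging probabilities were recorded at decision time and are greater than zero; the log contents and the candidate policy `always_b` are hypothetical.

```python
def ips_estimate(logged, target_policy):
    """Inverse propensity scoring estimate of a new policy's average reward.

    logged: list of (context, action, reward, logging_probability), where
        logging_probability is the chance the deployed policy had of taking
        that action in that context (must be > 0).
    target_policy: function (context, action) -> probability that the new
        policy would take that action.
    """
    weighted = [
        reward * target_policy(context, action) / logging_probability
        for context, action, reward, logging_probability in logged
    ]
    return sum(weighted) / len(weighted)


# Made-up log from a policy that showed "a" 80% of the time and "b" 20%.
logged = ([("user", "a", 1.0, 0.8)] * 40 + [("user", "a", 0.0, 0.8)] * 40
          + [("user", "b", 1.0, 0.2)] * 18 + [("user", "b", 0.0, 0.2)] * 2)


def always_b(context, action):
    """Hypothetical candidate policy that always picks action "b"."""
    return 1.0 if action == "b" else 0.0


estimate = ips_estimate(logged, always_b)
print(round(estimate, 3))
```

The estimate is only defined for actions the deployed policy sometimes took, which is one reason controlled randomness and counterfactual evaluation are usually adopted together.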

## How are feedback loops classified?

Researchers have proposed formal taxonomies for feedback loops in automated decision-making systems. A 2023 study published in the ACM Conference on Fairness, Accountability, and Transparency classified feedback loops along several dimensions [10]:

- **Direct vs. indirect**: Direct feedback loops occur when model outputs immediately become part of the retraining data. Indirect feedback loops occur when model outputs influence the environment, which then influences future data collection.
- **Observed vs. unobserved**: In observed feedback loops, the system can measure the outcome of its decisions (for example, whether an approved loan was repaid). In unobserved feedback loops, the outcome of the counterfactual decision is never known (for example, whether a rejected loan applicant would have repaid).
- **Beneficial vs. degenerative**: Beneficial feedback loops incorporate unbiased external data and improve model performance. Degenerative feedback loops amplify existing biases and degrade model fairness.

## References

1. Areeb, Q. M., et al. "Filter Bubbles in Recommender Systems: Fact or Fallacy - A Systematic Review." arXiv preprint arXiv:2307.01221 (2023).
2. GT Law. "The Perils of Feedback Loops in Machine Learning: Predictive Policing." Gilbert + Tobin Insights.
3. Ensign, D., Friedler, S. A., Neville, S., Scheidegger, C., Venkatasubramanian, S. "Runaway Feedback Loops in Predictive Policing." Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81:160-171 (2018). arXiv:1706.09847.
4. Fabris, A., et al. "Fairness and Bias in Algorithmic Hiring: A Multidisciplinary Survey." ACM Transactions on Intelligent Systems and Technology (2024).
5. MIT Sloan. "AI is Reinventing Hiring, With the Same Old Biases." MIT Sloan Management Review.
6. Hugging Face. "Illustrating Reinforcement Learning from Human Feedback (RLHF)." Hugging Face Blog.
7. Meta Oversight Board. "Content Moderation in a New Era for AI and Automation." Oversight Board Report.
8. Korycki, Ł., Krawczyk, B. "Adversarial Concept Drift Detection Under Poisoning Attacks for Robust Data Stream Mining." PMC (2022).
9. AWS. "Establish Feedback Loops Across ML Lifecycle Phases." AWS Well-Architected Machine Learning Lens.
10. Pagan, N., et al. "A Classification of Feedback Loops and Their Relation to Biases in Automated Decision-Making Systems." ACM Conference on Fairness, Accountability, and Transparency (2023). arXiv:2305.06055.
11. Zinkevich, M. "Rules of Machine Learning: Best Practices for ML Engineering." Google for Developers (Rules #28, #35, #36).
12. Veprikov, A., Afanasiev, A., Khritankov, A. "A Mathematical Model of the Hidden Feedback Loop Effect in Machine Learning Systems." arXiv preprint arXiv:2405.02726 (2024).
13. Route Fifty / Governing. "Santa Cruz, Calif., Becomes First to Ban Predictive Policing" (June 2020).

