In-group bias (also called in-group favoritism or in-group preference) is a pattern of behavior in which individuals systematically favor members of their own social group over members of other groups. Originating in social psychology, the concept has become increasingly relevant to artificial intelligence and machine learning, where it manifests as systematic skew in data collection, annotation, model training, and algorithmic decision-making. When in-group bias enters the ML pipeline, it can cause models to perform unevenly across demographic groups, reinforce societal stereotypes, and produce discriminatory outcomes in high-stakes domains such as hiring, criminal justice, healthcare, and content moderation.
Imagine you are on a soccer team. You probably think the kids on your team are better players and nicer people than the kids on the other team, even if you do not actually know the other kids very well. That is in-group bias: you like people in "your group" more than people who are not.
Computers can have the same problem. When people teach a computer to make decisions, the information they give it might already treat some groups of people better than others. If a computer learns from that lopsided information, it can end up being unfair, like always picking one group of people for a job and ignoring everyone else. Scientists work hard to find and fix these unfair patterns so that computers treat everyone fairly.
The study of in-group bias has deep roots in 20th-century social psychology. William Graham Sumner introduced the related concept of "ethnocentrism" in 1906, describing the tendency for people to view their own group as the center of everything and to rate all others with reference to it.
The most influential experimental framework for studying in-group bias is the minimal group paradigm, developed by Henri Tajfel and colleagues in the early 1970s. In these experiments, participants were assigned to groups based on trivial and arbitrary criteria, such as a stated preference for paintings by Paul Klee versus Wassily Kandinsky. Despite the meaninglessness of the group assignment, participants consistently allocated more resources (points or money) to members of their own group than to members of the other group. This finding was striking because it demonstrated that conflict, competition, and prior relationships between groups are not necessary for in-group favoritism to appear; mere categorization into groups is sufficient.
Michael Billig and Tajfel extended these findings in 1973, showing that people begin to favor an in-group within minutes of being assigned to it, even when the assignment is made completely at random.
Building on the minimal group results, Tajfel and John Turner proposed Social Identity Theory (SIT) in 1979. SIT holds that individuals derive part of their self-concept from the social groups to which they belong. People are motivated to maintain a positive self-image, and one way they do this is by favorably comparing their in-group to relevant out-groups. This process involves three stages: social categorization (sorting people, including oneself, into groups), social identification (adopting the identity and norms of the group one belongs to), and social comparison (comparing one's own group favorably against out-groups).
In-group bias is connected to several other cognitive and perceptual effects:
| Phenomenon | Description |
|---|---|
| Out-group homogeneity effect | People perceive members of out-groups as more similar to one another than they actually are, while viewing in-group members as more diverse and distinct. |
| Confirmation bias | The tendency to search for, interpret, and remember information that confirms one's existing beliefs, which can reinforce in-group preferences. |
| Ultimate attribution error | Positive behavior by in-group members is attributed to disposition ("they are good people"), while the same behavior by out-group members is attributed to external factors. |
| Own-race bias | People are better at recognizing faces of their own racial group, linked to differential processing in the fusiform face area (FFA) of the brain. |
| Group polarization | Group discussions tend to push members toward more extreme positions in the direction the group already leans. |
Evolutionary psychologists argue that in-group favoritism is an evolved mechanism selected for the advantages of coalition affiliation. Throughout most of human history, membership in a cooperative group provided access to shared resources, collective defense, and social learning. Individuals who favored and cooperated with their group members were more likely to survive and reproduce. However, research also shows that in-group preferences remain flexible; people can be recategorized into new groups relatively easily, suggesting the bias is not rigidly tied to specific social categories.
In-group bias can infiltrate every stage of the machine learning pipeline, from data collection through deployment. Understanding these entry points is essential for building fair systems.
Training datasets often reflect the composition and priorities of the groups that create them. When certain populations are overrepresented and others are underrepresented, models learn to optimize performance for the majority group at the expense of minorities. This is sometimes called coverage bias or representation bias.
For example, early computer vision datasets used for facial analysis were composed predominantly of lighter-skinned subjects. The Gender Shades study by Joy Buolamwini and Timnit Gebru (2018) evaluated three commercial gender classification systems and found that darker-skinned females were misclassified at rates up to 34.7%, while the maximum error rate for lighter-skinned males was just 0.8%. The disparity was attributed largely to training and benchmark data that did not adequately represent the full range of human skin tones.
Data labeling is one of the stages most susceptible to in-group bias. Human annotators bring their own cultural backgrounds, beliefs, and social identities to the labeling task. Research has documented several patterns: annotators judge content produced by or about their own group differently from similar content about other groups, particularly on subjective tasks such as toxicity, sentiment, and hate speech labeling; annotators unfamiliar with a dialect or cultural context are more likely to mislabel it, with several studies finding that tweets written in African-American English are disproportionately flagged as offensive; and aggregating labels by simple majority vote can suppress the judgments of annotators from minority groups.
These patterns mean that the demographic composition of an annotation team directly shapes the labels in the dataset, which in turn shapes what the model learns. A homogeneous annotation team is likely to produce labels that reflect the norms and sensitivities of that particular group, potentially encoding in-group bias into the ground truth.
Feature engineering decisions can introduce in-group bias when certain features correlate with protected attributes such as race, gender, or socioeconomic status. For instance, using zip code as a feature in a lending model can serve as a proxy for race due to residential segregation, even if race is not explicitly included as an input variable.
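A simple diagnostic for this kind of proxy leakage is to check how well the suspect feature predicts the protected attribute itself. The sketch below does this on synthetic data; the names `zip_code` and `group` are illustrative rather than taken from any real dataset.

```python
# Sketch: checking whether a feature acts as a proxy for a protected attribute.
# Synthetic data only; "zip_code" and "group" are illustrative names.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000

# Simulate residential segregation: group membership strongly influences zip code.
group = rng.integers(0, 2, size=n)                           # protected attribute (0/1)
zip_code = np.where(rng.random(n) < 0.8, group, 1 - group)   # 80% aligned with group

# If zip code alone predicts the protected attribute far above chance (0.5),
# a model using zip code can discriminate even without seeing the attribute.
X = zip_code.reshape(-1, 1)
scores = cross_val_score(LogisticRegression(), X, group, cv=5, scoring="accuracy")
print(f"Accuracy predicting group from zip code alone: {scores.mean():.2f}")
```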
Word embeddings, which are dense vector representations of words learned from large text corpora, have been shown to absorb and amplify societal stereotypes present in their training data. In a landmark 2016 paper, Bolukbasi et al. demonstrated that Word2Vec embeddings trained on Google News articles encoded gender stereotypes: the analogy "man is to computer programmer as woman is to homemaker" emerged directly from the embedding space. These biased representations propagate through any downstream model that uses them, effectively baking in-group and out-group associations into natural language processing systems.
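This analogy behavior can be probed directly with the gensim library, assuming the pretrained Google News Word2Vec vectors are fetched through gensim's dataset downloader (roughly a 1.6 GB download on first use); the query below mirrors the kind reported by Bolukbasi et al., though results vary with the exact vectors and query.

```python
# Sketch: probing gender-stereotyped analogies in pretrained Word2Vec vectors.
# Assumes the Google News vectors are available via gensim's downloader.
import gensim.downloader as api

kv = api.load("word2vec-google-news-300")

# Vector arithmetic: computer_programmer - man + woman -> nearest neighbors.
# Bolukbasi et al. (2016) reported "homemaker" ranking highly for queries like this.
print(kv.most_similar(positive=["computer_programmer", "woman"],
                      negative=["man"], topn=5))
```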
During training, models optimize for aggregate performance metrics such as accuracy or loss. If the training data is imbalanced, the model will learn to perform well on the majority group because doing so yields the greatest reduction in overall error. Minority groups, contributing fewer examples to the total loss, receive less optimization attention. The result is a model that works well for the "in-group" represented in the data and poorly for everyone else.
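The arithmetic behind this effect is easy to see: the overall loss is a group-size-weighted average of per-group losses, so improvements on a small group barely move the objective. The numbers below are made up purely for illustration.

```python
# Toy illustration: overall loss is a group-size-weighted average of per-group losses.
n_majority, n_minority = 9_500, 500          # 95% / 5% split (made-up numbers)
loss_majority, loss_minority = 0.10, 0.90    # average per-example losses (made up)

n_total = n_majority + n_minority
overall = (n_majority * loss_majority + n_minority * loss_minority) / n_total
print(f"Overall loss: {overall:.3f}")        # 0.140 -- looks fine in aggregate

# Halving the minority group's loss barely moves the aggregate objective,
# so a loss-minimizing model has little incentive to improve on that group.
improved = (n_majority * loss_majority + n_minority * (loss_minority / 2)) / n_total
print(f"Overall loss after halving minority loss: {improved:.3f}")  # 0.118
```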
Several high-profile cases have demonstrated how in-group bias can produce real-world harm when encoded in AI systems.
One of the most widely discussed examples of algorithmic bias involves the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) tool, used in the US criminal justice system to predict the likelihood that a defendant would reoffend. A 2016 investigation by ProPublica analyzed more than 10,000 criminal defendants in Broward County, Florida, and found that Black defendants were nearly twice as likely as White defendants to be incorrectly classified as high-risk for recidivism, while White defendants were more likely to be incorrectly flagged as low-risk.
After controlling for criminal history, age, and gender, Black defendants were 77% more likely to be scored as higher risk for future violent crime and 45% more likely to be scored as higher risk for any future crime. The tool's developer, Northpointe (now Equivant), argued that the overall accuracy rate of approximately 61% was equal across racial groups, but the distribution of errors was markedly unequal.
In a 2019 study published in Science, Obermeyer et al. examined a widely deployed healthcare algorithm (used on approximately 200 million patients annually in the United States) that determined which patients should be referred to "high-risk care management" programs. The algorithm used healthcare spending as a proxy for healthcare needs. Because Black patients historically had less money spent on their care (due to systemic barriers to healthcare access), the algorithm systematically underestimated their health needs.
The researchers found that at any given risk score, Black patients had on average 26% more chronic conditions than White patients with the same score. The bias reduced the proportion of Black patients identified for extra care by more than half: only 17.7% of patients automatically flagged for enrollment were Black, compared to an estimated 46.5% that should have been. Replacing the cost-based proxy with a combined health-and-cost metric reduced the bias by 86%.
The Gender Shades study (Buolamwini and Gebru, 2018) revealed that commercial facial analysis systems from IBM, Microsoft, and Face++ had dramatically different error rates across intersectional demographic groups. A follow-up report by the National Institute of Standards and Technology (NIST) in 2019 tested 189 facial recognition algorithms and found that many were 10 to 100 times more likely to misidentify a person with dark skin than a person with light skin.
Real-world consequences have followed. In 2020, a Black man named Robert Williams was wrongfully arrested in Detroit based on a faulty facial recognition match. The American Civil Liberties Union (ACLU) tested Amazon's Rekognition system on photos of US Congress members and found 28 false matches with a mugshot database, disproportionately affecting members of color.
Amazon discontinued an experimental AI-based resume screening tool after discovering in 2015 that it was biased against women. The system had been trained on resumes submitted to the company over a 10-year period, during which the majority of applicants were men. The model learned to penalize resumes containing the word "women's" (as in "women's chess club captain") and downranked candidates from all-women's colleges.
Beyond the Bolukbasi et al. (2016) finding, subsequent research has revealed that embeddings trained on different corpora absorb different cultural biases. Caliskan, Bryson, and Narayanan (2017) used the Word Embedding Association Test (WEAT) to show that word embeddings replicate a wide range of human biases documented in the Implicit Association Test (IAT), including associations between European-American names and pleasant words, and between African-American names and unpleasant words.
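The WEAT statistic itself is straightforward to compute from its definition in Caliskan et al. (2017). The sketch below uses random toy vectors in place of real embeddings and made-up set sizes, so the printed value is meaningless except as an illustration of the computation.

```python
# Sketch of the Word Embedding Association Test (WEAT) effect size
# (Caliskan, Bryson & Narayanan, 2017), applied to toy random vectors.
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    # s(w, A, B): mean cosine similarity to attribute set A minus to set B.
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Standardized difference of association between target sets X and Y.
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Toy vectors standing in for embeddings of two target sets (e.g. two name lists)
# and two attribute sets (e.g. pleasant vs. unpleasant words).
rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(8, 50)) for _ in range(4))
print(f"WEAT effect size (toy data): {weat_effect_size(X, Y, A, B):.2f}")
```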
Large language models (LLMs) such as GPT, BERT, and their successors are trained on massive text corpora that reflect the biases, perspectives, and demographics of their authors. Because English-language internet text is disproportionately produced by certain demographic and cultural groups, LLMs can absorb and reproduce the in-group preferences of those groups.
A 2024 study published in Nature Computational Science by Hu et al. tested 77 different LLMs using sentence completion prompts. Models were asked to complete sentences beginning with "We are" (in-group framing) and "They are" (out-group framing). The results showed that in-group sentences were 93% more likely to receive positive completions than out-group sentences, demonstrating that nearly all base models and some instruction-tuned models display measurable in-group favoritism and out-group derogation.
The study also found that these biases manifest in naturalistic human-LLM conversations, not just in controlled experimental settings. Models trained primarily on Western, English-language text tend to exhibit cultural values such as self-expression, individualism, and tolerance, which are more common in Western societies, while underrepresenting perspectives from other cultural traditions.
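This kind of probing can be approximated with small open models. The sketch below uses GPT-2 and an off-the-shelf sentiment classifier from the Hugging Face transformers library as stand-ins, so it is not a reproduction of the Hu et al. experiments, whose models, prompts, and scoring differ.

```python
# Rough sketch of "We are" / "They are" probing with open models; not a
# reproduction of Hu et al. (2024) -- models, prompts, and scoring are stand-ins.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")   # default English sentiment model

def completion_sentiments(prompt, n=20):
    # Sample n completions of the prompt and classify each one's sentiment.
    outputs = generator(prompt, max_new_tokens=15, num_return_sequences=n,
                        do_sample=True, pad_token_id=50256)
    labels = [sentiment(o["generated_text"])[0]["label"] for o in outputs]
    return Counter(labels)

for prompt in ("We are", "They are"):
    print(prompt, completion_sentiments(prompt))
```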
Quantifying bias in machine learning systems requires formal fairness metrics. Several metrics have been developed to capture different aspects of group-level fairness.
| Metric | Definition | Key property |
|---|---|---|
| Demographic parity | The proportion of positive predictions should be equal across all groups. | Ensures equal selection rates regardless of group membership. |
| Equalized odds | True positive rates and false positive rates should be equal across groups. | Accounts for actual outcomes, not just prediction rates. |
| Disparate impact ratio | The ratio of selection rates between the least-favored group and the most-favored group. The "four-fifths rule" requires this ratio to be at least 0.8. | Commonly used in US employment law as a legal threshold. |
| Calibration | Predicted probabilities should reflect true probabilities equally across groups. | Ensures that a "70% risk score" means 70% for all groups. |
| Counterfactual fairness | A decision is fair if it would remain the same in a counterfactual world where the individual belonged to a different group. | Captures individual-level rather than group-level fairness. |
| Predictive parity | The positive predictive value (precision) should be equal across groups. | Ensures that positive predictions are equally reliable for all groups. |
An important theoretical result, sometimes called the "impossibility theorem of fairness," shows that calibration cannot be satisfied simultaneously with equal false positive and false negative rates (the requirement of equalized odds) when groups have different base rates, except in trivial cases such as a perfect classifier (Chouldechova, 2017; Kleinberg et al., 2016). This means practitioners must choose which notion of fairness is most appropriate for their specific application context.
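To make the table concrete, the sketch below computes the demographic parity difference, the equalized odds gaps, and the disparate impact ratio for a binary classifier and a binary protected attribute; the prediction arrays are illustrative.

```python
# Sketch: demographic parity, equalized odds gaps, and disparate impact ratio
# for a binary classifier and a binary protected attribute (illustrative data).
import numpy as np

def group_rates(y_true, y_pred, group, g):
    m = group == g
    sel = y_pred[m].mean()                         # selection rate
    tpr = y_pred[m][y_true[m] == 1].mean()         # true positive rate
    fpr = y_pred[m][y_true[m] == 0].mean()         # false positive rate
    return sel, tpr, fpr

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

sel0, tpr0, fpr0 = group_rates(y_true, y_pred, group, 0)
sel1, tpr1, fpr1 = group_rates(y_true, y_pred, group, 1)

print("Demographic parity difference:", abs(sel0 - sel1))
print("Equalized odds gaps (TPR, FPR):", abs(tpr0 - tpr1), abs(fpr0 - fpr1))
# With these illustrative arrays the ratio is about 0.67, below the 0.8 threshold
# of the four-fifths rule.
print("Disparate impact ratio:", min(sel0, sel1) / max(sel0, sel1))
```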
Researchers and practitioners have developed a range of techniques to detect and reduce in-group bias in ML systems. These are typically categorized by the stage of the pipeline at which they intervene.
Pre-processing techniques modify the training data before a model is trained.
| Method | Description |
|---|---|
| Reweighting | Assigns different weights to training samples based on their group-label combination to counteract representation imbalances. |
| Data augmentation | Generates synthetic examples for underrepresented groups to balance the dataset. |
| Disparate impact remover | Modifies feature values to improve group fairness while preserving rank ordering within groups. |
| Learning fair representations | Learns a transformed feature space that is informative for prediction but obfuscates information about protected attributes. |
| Diverse annotation teams | Ensures that data labeling teams include annotators from a variety of demographic backgrounds to reduce systematic labeling bias. |
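As an illustration of the reweighting row above, the sketch below computes weights in the spirit of Kamiran and Calders' reweighing method, assigning each (group, label) combination the weight P(group)·P(label)/P(group, label) so that group and label look statistically independent in the weighted training data; the arrays are illustrative.

```python
# Sketch: reweighting in the spirit of Kamiran & Calders -- each (group, label)
# combination gets weight P(group) * P(label) / P(group, label).
import numpy as np

group = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])   # illustrative: imbalanced groups
label = np.array([1, 1, 1, 1, 1, 0, 0, 1, 0, 0])   # positives rarer in group 1

weights = np.empty(len(label))
for g in np.unique(group):
    for y in np.unique(label):
        mask = (group == g) & (label == y)
        p_g, p_y, p_gy = (group == g).mean(), (label == y).mean(), mask.mean()
        weights[mask] = p_g * p_y / p_gy

# These weights can be passed to most scikit-learn estimators, e.g.
#   LogisticRegression().fit(X, label, sample_weight=weights)
print(weights.round(2))
```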
In-processing techniques modify the learning algorithm itself to incorporate fairness constraints during training.
| Method | Description |
|---|---|
| Adversarial debiasing | Trains a classifier and an adversary simultaneously; the adversary tries to predict the protected attribute from the classifier's output, and the classifier is penalized for making this possible. |
| Prejudice remover | Adds a discrimination-aware regularization term to the learning objective. |
| Meta-fair classifier | Takes a fairness metric as input and returns a classifier optimized to satisfy that metric. |
| Constrained optimization | Incorporates fairness constraints directly into the optimization problem (e.g., through Lagrangian relaxation). |
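The sketch below is a deliberately simplified stand-in for this family of methods (it is not Kamishima et al.'s prejudice remover or Zhang et al.'s adversarial setup): a logistic regression trained by gradient descent whose loss adds a penalty on the squared gap in mean predicted score between two groups, using synthetic data.

```python
# Simplified in-processing sketch: logistic regression whose objective is
# mean log-loss + lam * (gap in mean predicted score between groups)^2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
group = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, 0] += 1.2 * group                     # feature 0 acts as a proxy for the group
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0.6).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(lam, lr=0.1, epochs=300):
    w, b = np.zeros(d), 0.0
    n1, n0 = (group == 1).sum(), (group == 0).sum()
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        gap = p[group == 1].mean() - p[group == 0].mean()
        # Gradient of the mean log-loss with respect to the logits:
        g = (p - y) / n
        # Gradient of lam * gap^2 with respect to the logits:
        g += lam * 2 * gap * p * (1 - p) * np.where(group == 1, 1 / n1, -1 / n0)
        w -= lr * X.T @ g
        b -= lr * g.sum()
    p = sigmoid(X @ w + b)
    return p[group == 1].mean() - p[group == 0].mean()

for lam in (0.0, 5.0):
    print(f"lambda={lam}: gap in mean predicted score = {train(lam):+.3f}")
```

Raising the penalty weight shrinks the score gap between groups, typically at some cost in raw accuracy, which is the usual trade-off with in-processing constraints.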
Post-processing techniques adjust model outputs after training to improve fairness.
| Method | Description |
|---|---|
| Threshold adjustment | Sets different classification thresholds for different groups to equalize error rates. |
| Calibrated equalized odds | Adjusts predictions to satisfy equalized odds while maintaining calibration as much as possible. |
| Reject option classification | Gives favorable outcomes to disadvantaged groups and unfavorable outcomes to advantaged groups for instances near the decision boundary. |
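A minimal sketch of per-group threshold adjustment on synthetic scores: each group's threshold is chosen by grid search so that the group's true positive rate lands near a common target (equalizing both TPR and FPR, as full equalized odds requires, would need a joint search or randomized decisions). All numbers are illustrative.

```python
# Sketch: post-processing by per-group threshold adjustment -- pick each group's
# decision threshold so the true positive rates are approximately equalized.
import numpy as np

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, size=n)
y = rng.integers(0, 2, size=n)
# Synthetic scores: the model is noisier for group 1 than for group 0.
score = np.clip(y * 0.6 + rng.normal(scale=0.30 + 0.15 * group, size=n), 0, 1)

def tpr(threshold, g):
    m = (group == g) & (y == 1)
    return (score[m] >= threshold).mean()

target = 0.80                                   # desired common true positive rate
thresholds = {}
grid = np.linspace(0, 1, 201)
for g in (0, 1):
    # Choose the threshold whose TPR is closest to the target for this group.
    thresholds[g] = grid[np.argmin([abs(tpr(t, g) - target) for t in grid])]

for g in (0, 1):
    print(f"group {g}: threshold={thresholds[g]:.2f}, TPR={tpr(thresholds[g], g):.2f}")
```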
Several open-source toolkits are available for detecting and mitigating bias, including IBM's AI Fairness 360 (AIF360), Microsoft's Fairlearn, the Aequitas audit toolkit, and Google's What-If Tool and Fairness Indicators.
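As a small example of toolkit usage, Fairlearn's MetricFrame reports a chosen metric per sensitive group, and the library includes ready-made parity measures; the snippet assumes a recent Fairlearn release and uses illustrative arrays.

```python
# Sketch: per-group reporting with Fairlearn (assumes fairlearn is installed).
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, demographic_parity_difference

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 0])
sensitive = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

mf = MetricFrame(metrics={"accuracy": accuracy_score},
                 y_true=y_true, y_pred=y_pred, sensitive_features=sensitive)
print(mf.by_group)                      # accuracy broken out by group
print(demographic_parity_difference(y_true, y_pred, sensitive_features=sensitive))
```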
In-group bias overlaps with but is distinct from several related concepts in the AI fairness literature.
| Concept | Relationship to in-group bias |
|---|---|
| Algorithmic bias | A broader term covering any systematic error in an AI system that produces unfair outcomes. In-group bias is one specific source of algorithmic bias. |
| Coverage bias | Occurs when some groups are underrepresented in the training data. In-group bias can cause coverage bias when data collectors preferentially sample from their own communities. |
| Prediction bias | A statistical property where a model's average prediction differs from the average observed value. In-group bias can contribute to prediction bias for underrepresented groups. |
| Selection bias | Arises when the process of selecting data introduces systematic differences. In-group bias can drive selection bias when gatekeepers preferentially include data from their own group. |
| Confirmation bias | A cognitive bias where individuals favor information that confirms existing beliefs. Annotators may exhibit confirmation bias that reinforces in-group preferences during labeling. |
| Disparate treatment | Occurs when a system explicitly uses a protected attribute (such as race or gender) in its decision-making. In-group bias more commonly leads to disparate impact through proxy variables. |
In-group bias in AI systems has documented consequences across many sectors.
Risk assessment tools like COMPAS are used in pretrial detention, sentencing, and parole decisions across the United States. When these tools encode racial bias, they contribute to the disproportionate incarceration of minority groups. Multiple studies have shown that recidivism prediction instruments produce unequal false positive rates across racial groups.
Clinical decision support algorithms that rely on historical healthcare spending data can systematically underestimate the needs of populations that have historically faced barriers to care. Beyond the Obermeyer et al. (2019) finding, researchers have identified racial bias in pulmonary function test algorithms that "race-normalize" results, leading to underdiagnosis of respiratory conditions in Black patients.
Automated resume screening and candidate ranking systems can reproduce historical hiring patterns that favored certain demographic groups. The Amazon recruiting tool case demonstrated how models trained on biased historical data learn to discriminate against groups that were historically underrepresented in the applicant pool.
Sentiment analysis and hate speech classifiers are particularly susceptible to annotation bias because the task is inherently subjective. Research has shown that hate speech classifiers learn normative social stereotypes from their training data, and the demographic composition of the annotation team significantly shapes what the classifier treats as hateful content.
Credit scoring and loan approval algorithms can produce disparate outcomes along racial and gender lines when they rely on features that correlate with protected attributes. Zip code, purchasing patterns, and educational history can all serve as proxies for demographic group membership.
Image recognition systems trained on demographically imbalanced datasets produce accuracy disparities across racial and gender groups. When these systems are deployed for law enforcement surveillance, the uneven error rates create disproportionate risks of misidentification for already-marginalized communities.
Organizations developing AI systems can adopt several practices to reduce the risk of in-group bias:
Audit training data for demographic representation. Before training begins, analyze the dataset to determine whether all relevant groups are adequately represented. If certain groups are underrepresented, consider augmentation, resampling, or targeted data collection.
Diversify annotation teams. Ensure that the people labeling data come from a range of demographic backgrounds and provide clear, detailed annotation guidelines to reduce subjective variation.
Use multiple independent annotators. Having each data point labeled by several annotators and aggregating their judgments (for example, by majority vote) can reduce the influence of any single annotator's biases.
Implement blind labeling. Where possible, remove demographic information from the data that annotators see during the labeling process.
Evaluate models across subgroups. Report performance metrics (accuracy, precision, recall, F1 score) separately for each demographic group, not just in aggregate.
Select appropriate fairness metrics. Choose fairness criteria based on the specific application and the types of harm most likely to result from errors.
Apply bias mitigation at multiple stages. Combine pre-processing, in-processing, and post-processing techniques for more robust debiasing.
Conduct ongoing monitoring. Bias can shift over time as populations and data distributions change. Regular audits are necessary to catch emerging disparities.
Engage affected communities. Include representatives from potentially affected groups in the design, evaluation, and governance of AI systems.
Document data and model decisions. Use datasheets for datasets (Gebru et al., 2021) and model cards (Mitchell et al., 2019) to create transparency about data sources, annotation processes, and known limitations.
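A compact sketch of the auditing and subgroup-evaluation practices above, using pandas and scikit-learn on an illustrative dataframe; the column names and group shares are assumptions made for the example.

```python
# Small sketch of two practices from the list above: audit group representation
# in a dataset and report classification metrics per group, not just in aggregate.
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

df = pd.DataFrame({
    "group":  ["a"] * 70 + ["b"] * 25 + ["c"] * 5,
    "y_true": [1, 0] * 35 + [1, 0, 0, 1, 0] * 5 + [1, 0, 1, 0, 1],
    "y_pred": [1, 0] * 35 + [0, 0, 0, 1, 0] * 5 + [0, 0, 1, 0, 0],
})

# 1. Representation audit: inspect each group's share of the dataset.
print("Dataset shares:\n", df["group"].value_counts(normalize=True))

# 2. Subgroup evaluation: precision / recall / F1 reported separately per group.
for g, sub in df.groupby("group"):
    p, r, f1, _ = precision_recall_fscore_support(
        sub["y_true"], sub["y_pred"], average="binary", zero_division=0)
    print(f"group {g}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```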