In-group bias (also called in-group favoritism or in-group preference) is a pattern of behavior in which individuals systematically favor members of their own social group over members of other groups. Originating in social psychology, the concept has become increasingly relevant to artificial intelligence and machine learning, where it manifests as systematic skew in data collection, annotation, model training, and algorithmic decision-making. When in-group bias enters the ML pipeline, it can cause models to perform unevenly across demographic groups, reinforce societal stereotypes, and produce discriminatory outcomes in high-stakes domains such as hiring, criminal justice, healthcare, and content moderation.
Imagine you are on a soccer team. You probably think the kids on your team are better players and nicer people than the kids on the other team, even if you do not actually know the other kids very well. That is in-group bias: you like people in "your group" more than people who are not.
Computers can have the same problem. When people teach a computer to make decisions, the information they give it might already treat some groups of people better than others. If a computer learns from that lopsided information, it can end up being unfair, like always picking one group of people for a job and ignoring everyone else. Scientists work hard to find and fix these unfair patterns so that computers treat everyone fairly.
The study of in-group bias has deep roots in 20th-century social psychology. William Graham Sumner introduced the related concept of "ethnocentrism" in 1906, describing the tendency for people to view their own group as the center of everything and to rate all others with reference to it.
The most influential experimental framework for studying in-group bias is the minimal group paradigm, developed by Henri Tajfel and colleagues in the early 1970s. In these experiments, participants were assigned to groups based on trivial and arbitrary criteria, such as a stated preference for paintings by Paul Klee versus Wassily Kandinsky. Despite the meaninglessness of the group assignment, participants consistently allocated more resources (points or money) to members of their own group than to members of the other group. This finding was striking because it demonstrated that conflict, competition, and prior relationships between groups are not necessary for in-group favoritism to appear; mere categorization into groups is sufficient.
Michael Billig and Tajfel extended these findings in 1973, showing that people begin to favor an in-group within minutes of being assigned to it, even when the assignment is made completely at random.
Building on the minimal group results, Tajfel and John Turner proposed Social Identity Theory (SIT) in 1979. SIT holds that individuals derive part of their self-concept from the social groups to which they belong. People are motivated to maintain a positive self-image, and one way they do this is by favorably comparing their in-group to relevant out-groups. This process involves three stages: social categorization (sorting people, including oneself, into groups), social identification (adopting the identity and norms of the group one belongs to), and social comparison (comparing one's own group favorably against out-groups).
In-group bias is connected to several other cognitive and perceptual effects:
| Phenomenon | Description |
|---|---|
| Out-group homogeneity effect | People perceive members of out-groups as more similar to one another than they actually are, while viewing in-group members as more diverse and distinct. |
| Confirmation bias | The tendency to search for, interpret, and remember information that confirms one's existing beliefs, which can reinforce in-group preferences. |
| Ultimate attribution error | Positive behavior by in-group members is attributed to disposition ("they are good people"), while the same behavior by out-group members is attributed to external factors. |
| Own-race bias | People are better at recognizing faces of their own racial group, linked to differential processing in the fusiform face area (FFA) of the brain. |
| Group polarization | Group discussions tend to push members toward more extreme positions in the direction the group already leans. |
Evolutionary psychologists argue that in-group favoritism is an evolved mechanism selected for the advantages of coalition affiliation. Throughout most of human history, membership in a cooperative group provided access to shared resources, collective defense, and social learning. Individuals who favored and cooperated with their group members were more likely to survive and reproduce. However, research also shows that in-group preferences remain flexible; people can be recategorized into new groups relatively easily, suggesting the bias is not rigidly tied to specific social categories.
In-group bias can infiltrate every stage of the machine learning pipeline, from data collection through deployment. Understanding these entry points is essential for building fair systems.
Training datasets often reflect the composition and priorities of the groups that create them. When certain populations are overrepresented and others are underrepresented, models learn to optimize performance for the majority group at the expense of minorities. This is sometimes called coverage bias or representation bias.
For example, early computer vision datasets used for facial analysis were composed predominantly of lighter-skinned subjects. The Gender Shades study by Joy Buolamwini and Timnit Gebru (2018) evaluated three commercial gender classification systems and found that darker-skinned females were misclassified at rates up to 34.7%, while the maximum error rate for lighter-skinned males was just 0.8%. The disparity was attributed largely to training and benchmark data that did not adequately represent the full range of human skin tones.
Data labeling is one of the stages most susceptible to in-group bias. Human annotators bring their own cultural backgrounds, beliefs, and social identities to the labeling task. Research has documented several patterns: annotators judge content produced by or about their own group differently from similar content about other groups, particularly on subjective tasks such as toxicity, sentiment, and hate speech labeling; annotators unfamiliar with a dialect or cultural context are more likely to mislabel it, with several studies finding that tweets written in African-American English are disproportionately flagged as offensive; and aggregating labels by simple majority vote can suppress the judgments of annotators from minority groups.
These patterns mean that the demographic composition of an annotation team directly shapes the labels in the dataset, which in turn shapes what the model learns. A homogeneous annotation team is likely to produce labels that reflect the norms and sensitivities of that particular group, potentially encoding in-group bias into the ground truth.
Feature engineering decisions can introduce in-group bias when certain features correlate with protected attributes such as race, gender, or socioeconomic status. For instance, using zip code as a feature in a lending model can serve as a proxy for race due to residential segregation, even if race is not explicitly included as an input variable.
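A simple diagnostic for this kind of proxy leakage is to check how well the suspect feature predicts the protected attribute itself. The sketch below does this on synthetic data; the names `zip_code` and `group` are illustrative rather than taken from any real dataset.

```python
# Sketch: checking whether a feature acts as a proxy for a protected attribute.
# Synthetic data only; "zip_code" and "group" are illustrative names.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000

# Simulate residential segregation: group membership strongly influences zip code.
group = rng.integers(0, 2, size=n)                           # protected attribute (0/1)
zip_code = np.where(rng.random(n) < 0.8, group, 1 - group)   # 80% aligned with group

# If zip code alone predicts the protected attribute far above chance (0.5),
# a model using zip code can discriminate even without seeing the attribute.
X = zip_code.reshape(-1, 1)
scores = cross_val_score(LogisticRegression(), X, group, cv=5, scoring="accuracy")
print(f"Accuracy predicting group from zip code alone: {scores.mean():.2f}")
```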
Word embeddings, which are dense vector representations of words learned from large text corpora, have been shown to absorb and amplify societal stereotypes present in their training data. In a landmark 2016 paper, Bolukbasi et al. demonstrated that Word2Vec embeddings trained on Google News articles encoded gender stereotypes: the analogy "man is to computer programmer as woman is to homemaker" emerged directly from the embedding space. These biased representations propagate through any downstream model that uses them, effectively baking in-group and out-group associations into natural language processing systems.
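This analogy behavior can be probed directly with the gensim library, assuming the pretrained Google News Word2Vec vectors are fetched through gensim's dataset downloader (roughly a 1.6 GB download on first use); the query below mirrors the kind reported by Bolukbasi et al., though results vary with the exact vectors and query.

```python
# Sketch: probing gender-stereotyped analogies in pretrained Word2Vec vectors.
# Assumes the Google News vectors are available via gensim's downloader.
import gensim.downloader as api

kv = api.load("word2vec-google-news-300")

# Vector arithmetic: computer_programmer - man + woman -> nearest neighbors.
# Bolukbasi et al. (2016) reported "homemaker" ranking highly for queries like this.
print(kv.most_similar(positive=["computer_programmer", "woman"],
                      negative=["man"], topn=5))
```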
During training, models optimize for aggregate performance metrics such as accuracy or loss. If the training data is imbalanced, the model will learn to perform well on the majority group because doing so yields the greatest reduction in overall error. Minority groups, contributing fewer examples to the total loss, receive less optimization attention. The result is a model that works well for the "in-group" represented in the data and poorly for everyone else.
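The arithmetic behind this effect is easy to see: the overall loss is a group-size-weighted average of per-group losses, so improvements on a small group barely move the objective. The numbers below are made up purely for illustration.

```python
# Toy illustration: overall loss is a group-size-weighted average of per-group losses.
n_majority, n_minority = 9_500, 500          # 95% / 5% split (made-up numbers)
loss_majority, loss_minority = 0.10, 0.90    # average per-example losses (made up)

n_total = n_majority + n_minority
overall = (n_majority * loss_majority + n_minority * loss_minority) / n_total
print(f"Overall loss: {overall:.3f}")        # 0.140 -- looks fine in aggregate

# Halving the minority group's loss barely moves the aggregate objective,
# so a loss-minimizing model has little incentive to improve on that group.
improved = (n_majority * loss_majority + n_minority * (loss_minority / 2)) / n_total
print(f"Overall loss after halving minority loss: {improved:.3f}")  # 0.118
```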
Several high-profile cases have demonstrated how in-group bias can produce real-world harm when encoded in AI systems.
One of the most widely discussed examples of algorithmic bias involves the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) tool, used in the US criminal justice system to predict the likelihood that a defendant would reoffend. A 2016 investigation by ProPublica analyzed more than 10,000 criminal defendants in Broward County, Florida, and found that Black defendants were nearly twice as likely as White defendants to be incorrectly classified as high-risk for recidivism, while White defendants were more likely to be incorrectly flagged as low-risk.
After controlling for criminal history, age, and gender, Black defendants were 77% more likely to be scored as higher risk for future violent crime and 45% more likely to be scored as higher risk for any future crime. The tool's developer, Northpointe (now Equivant), argued that the overall accuracy rate of approximately 61% was equal across racial groups, but the distribution of errors was markedly unequal.
In a 2019 study published in Science, Obermeyer et al. examined a widely deployed healthcare algorithm (used on approximately 200 million patients annually in the United States) that determined which patients should be referred to "high-risk care management" programs. The algorithm used healthcare spending as a proxy for healthcare needs. Because Black patients historically had less money spent on their care (due to systemic barriers to healthcare access), the algorithm systematically underestimated their health needs.
The researchers found that at any given risk score, Black patients had on average 26% more chronic conditions than White patients with the same score. The bias reduced the proportion of Black patients identified for extra care by more than half: only 17.7% of patients automatically flagged for enrollment were Black, compared to an estimated 46.5% that should have been. Replacing the cost-based proxy with a combined health-and-cost metric reduced the bias by 86%.
The Gender Shades study (Buolamwini and Gebru, 2018) revealed that commercial facial analysis systems from IBM, Microsoft, and Face++ had dramatically different error rates across intersectional demographic groups. A follow-up report by the National Institute of Standards and Technology (NIST) in 2019 tested 189 facial recognition algorithms and found that many were 10 to 100 times more likely to misidentify a person with dark skin than a person with light skin.
Real-world consequences have followed. In 2020, a Black man named Robert Williams was wrongfully arrested in Detroit based on a faulty facial recognition match. The American Civil Liberties Union (ACLU) tested Amazon's Rekognition system on photos of US Congress members and found 28 false matches with a mugshot database, disproportionately affecting members of color.
Amazon discontinued an experimental AI-based resume screening tool after discovering in 2015 that it was biased against women. The system had been trained on resumes submitted to the company over a 10-year period, during which the majority of applicants were men. The model learned to penalize resumes containing the word "women's" (as in "women's chess club captain") and downranked candidates from all-women's colleges.
Beyond the Bolukbasi et al. (2016) finding, subsequent research has revealed that embeddings trained on different corpora absorb different cultural biases. Caliskan, Bryson, and Narayanan (2017) used the Word Embedding Association Test (WEAT) to show that word embeddings replicate a wide range of human biases documented in the Implicit Association Test (IAT), including associations between European-American names and pleasant words, and between African-American names and unpleasant words.
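The WEAT statistic itself is straightforward to compute from its definition in Caliskan et al. (2017). The sketch below uses random toy vectors in place of real embeddings and made-up set sizes, so the printed value is meaningless except as an illustration of the computation.

```python
# Sketch of the Word Embedding Association Test (WEAT) effect size
# (Caliskan, Bryson & Narayanan, 2017), applied to toy random vectors.
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    # s(w, A, B): mean cosine similarity to attribute set A minus to set B.
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Standardized difference of association between target sets X and Y.
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Toy vectors standing in for embeddings of two target sets (e.g. two name lists)
# and two attribute sets (e.g. pleasant vs. unpleasant words).
rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(8, 50)) for _ in range(4))
print(f"WEAT effect size (toy data): {weat_effect_size(X, Y, A, B):.2f}")
```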
Large language models (LLMs) such as GPT, BERT, and their successors are trained on massive text corpora that reflect the biases, perspectives, and demographics of their authors. Because English-language internet text is disproportionately produced by certain demographic and cultural groups, LLMs can absorb and reproduce the in-group preferences of those groups.
A 2024 study published in Nature Computational Science by Hu et al. tested 77 different LLMs using sentence completion prompts. Models were asked to complete sentences beginning with "We are" (in-group framing) and "They are" (out-group framing). The results showed that in-group sentences were 93% more likely to receive positive completions than out-group sentences, demonstrating that nearly all base models and some instruction-tuned models display measurable in-group favoritism and out-group derogation.
The study also found that these biases manifest in naturalistic human-LLM conversations, not just in controlled experimental settings. Models trained primarily on Western, English-language text tend to exhibit cultural values such as self-expression, individualism, and tolerance, which are more common in Western societies, while underrepresenting perspectives from other cultural traditions.
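This kind of probing can be approximated with small open models. The sketch below uses GPT-2 and an off-the-shelf sentiment classifier from the Hugging Face transformers library as stand-ins, so it is not a reproduction of the Hu et al. experiments, whose models, prompts, and scoring differ.

```python
# Rough sketch of "We are" / "They are" probing with open models; not a
# reproduction of Hu et al. (2024) -- models, prompts, and scoring are stand-ins.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")   # default English sentiment model

def completion_sentiments(prompt, n=20):
    # Sample n completions of the prompt and classify each one's sentiment.
    outputs = generator(prompt, max_new_tokens=15, num_return_sequences=n,
                        do_sample=True, pad_token_id=50256)
    labels = [sentiment(o["generated_text"])[0]["label"] for o in outputs]
    return Counter(labels)

for prompt in ("We are", "They are"):
    print(prompt, completion_sentiments(prompt))
```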
Quantifying bias in machine learning systems requires formal fairness metrics. Several metrics have been developed to capture different aspects of group-level fairness.
| Metric | Definition | Key property |
|---|---|---|
| Demographic parity | The proportion of positive predictions should be equal across all groups. | Ensures equal selection rates regardless of group membership. |
| Equalized odds | True positive rates and false positive rates should be equal across groups. | Accounts for actual outcomes, not just prediction rates. |
| Disparate impact ratio | The ratio of selection rates between the least-favored group and the most-favored group. The "four-fifths rule" requires this ratio to be at least 0.8. | Commonly used in US employment law as a legal threshold. |
| Calibration | Predicted probabilities should reflect true probabilities equally across groups. | Ensures that a "70% risk score" means 70% for all groups. |
| Counterfactual fairness | A decision is fair if it would remain the same in a counterfactual world where the individual belonged to a different group. | Captures individual-level rather than group-level fairness. |
| Predictive parity | The positive predictive value (precision) should be equal across groups. | Ensures that positive predictions are equally reliable for all groups. |
An important theoretical result, sometimes called the "impossibility theorem of fairness," shows that calibration cannot be satisfied simultaneously with equal false positive and false negative rates (the requirement of equalized odds) when groups have different base rates, except in trivial cases such as a perfect classifier (Chouldechova, 2017; Kleinberg et al., 2016). This means practitioners must choose which notion of fairness is most appropriate for their specific application context.
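To make the table concrete, the sketch below computes the demographic parity difference, the equalized odds gaps, and the disparate impact ratio for a binary classifier and a binary protected attribute; the prediction arrays are illustrative.

```python
# Sketch: demographic parity, equalized odds gaps, and disparate impact ratio
# for a binary classifier and a binary protected attribute (illustrative data).
import numpy as np

def group_rates(y_true, y_pred, group, g):
    m = group == g
    sel = y_pred[m].mean()                         # selection rate
    tpr = y_pred[m][y_true[m] == 1].mean()         # true positive rate
    fpr = y_pred[m][y_true[m] == 0].mean()         # false positive rate
    return sel, tpr, fpr

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

sel0, tpr0, fpr0 = group_rates(y_true, y_pred, group, 0)
sel1, tpr1, fpr1 = group_rates(y_true, y_pred, group, 1)

print("Demographic parity difference:", abs(sel0 - sel1))
print("Equalized odds gaps (TPR, FPR):", abs(tpr0 - tpr1), abs(fpr0 - fpr1))
# With these illustrative arrays the ratio is about 0.67, below the 0.8 threshold
# of the four-fifths rule.
print("Disparate impact ratio:", min(sel0, sel1) / max(sel0, sel1))
```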
Researchers and practitioners have developed a range of techniques to detect and reduce in-group bias in ML systems. These are typically categorized by the stage of the pipeline at which they intervene.
Pre-processing techniques modify the training data before a model is trained.
| Method | Description |
|---|---|
| Reweighting | Assigns different weights to training samples based on their group-label combination to counteract representation imbalances. |
| Data augmentation | Generates synthetic examples for underrepresented groups to balance the dataset. |
| Disparate impact remover | Modifies feature values to improve group fairness while preserving rank ordering within groups. |
| Learning fair representations | Learns a transformed feature space that is informative for prediction but obfuscates information about protected attributes. |
| Diverse annotation teams | Ensures that data labeling teams include annotators from a variety of demographic backgrounds to reduce systematic labeling bias. |
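As an illustration of the reweighting row above, the sketch below computes weights in the spirit of Kamiran and Calders' reweighing method, assigning each (group, label) combination the weight P(group)·P(label)/P(group, label) so that group and label look statistically independent in the weighted training data; the arrays are illustrative.

```python
# Sketch: reweighting in the spirit of Kamiran & Calders -- each (group, label)
# combination gets weight P(group) * P(label) / P(group, label).
import numpy as np

group = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])   # illustrative: imbalanced groups
label = np.array([1, 1, 1, 1, 1, 0, 0, 1, 0, 0])   # positives rarer in group 1

weights = np.empty(len(label))
for g in np.unique(group):
    for y in np.unique(label):
        mask = (group == g) & (label == y)
        p_g, p_y, p_gy = (group == g).mean(), (label == y).mean(), mask.mean()
        weights[mask] = p_g * p_y / p_gy

# These weights can be passed to most scikit-learn estimators, e.g.
#   LogisticRegression().fit(X, label, sample_weight=weights)
print(weights.round(2))
```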
In-processing techniques modify the learning algorithm itself to incorporate fairness constraints during training.
| Method | Description |
|---|---|
| Adversarial debiasing | Trains a classifier and an adversary simultaneously; the adversary tries to predict the protected attribute from the classifier's output, and the classifier is penalized for making this possible. |
| Prejudice remover | Adds a discrimination-aware regularization term to the learning objective. |
| Meta-fair classifier | Takes a fairness metric as input and returns a classifier optimized to satisfy that metric. |
| Constrained optimization | Incorporates fairness constraints directly into the optimization problem (e.g., through Lagrangian relaxation). |
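The sketch below is a deliberately simplified stand-in for this family of methods (it is not Kamishima et al.'s prejudice remover or Zhang et al.'s adversarial setup): a logistic regression trained by gradient descent whose loss adds a penalty on the squared gap in mean predicted score between two groups, using synthetic data.

```python
# Simplified in-processing sketch: logistic regression whose objective is
# mean log-loss + lam * (gap in mean predicted score between groups)^2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
group = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, 0] += 1.2 * group                     # feature 0 acts as a proxy for the group
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0.6).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(lam, lr=0.1, epochs=300):
    w, b = np.zeros(d), 0.0
    n1, n0 = (group == 1).sum(), (group == 0).sum()
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        gap = p[group == 1].mean() - p[group == 0].mean()
        # Gradient of the mean log-loss with respect to the logits:
        g = (p - y) / n
        # Gradient of lam * gap^2 with respect to the logits:
        g += lam * 2 * gap * p * (1 - p) * np.where(group == 1, 1 / n1, -1 / n0)
        w -= lr * X.T @ g
        b -= lr * g.sum()
    p = sigmoid(X @ w + b)
    return p[group == 1].mean() - p[group == 0].mean()

for lam in (0.0, 5.0):
    print(f"lambda={lam}: gap in mean predicted score = {train(lam):+.3f}")
```

Raising the penalty weight shrinks the score gap between groups, typically at some cost in raw accuracy, which is the usual trade-off with in-processing constraints.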
Post-processing techniques adjust model outputs after training to improve fairness.
| Method | Description |
|---|---|
| Threshold adjustment | Sets different classification thresholds for different groups to equalize error rates. |
| Calibrated equalized odds | Adjusts predictions to satisfy equalized odds while maintaining calibration as much as possible. |
| Reject option classification | Gives favorable outcomes to disadvantaged groups and unfavorable outcomes to advantaged groups for instances near the decision boundary. |
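A minimal sketch of per-group threshold adjustment on synthetic scores: each group's threshold is chosen by grid search so that the group's true positive rate lands near a common target (equalizing both TPR and FPR, as full equalized odds requires, would need a joint search or randomized decisions). All numbers are illustrative.

```python
# Sketch: post-processing by per-group threshold adjustment -- pick each group's
# decision threshold so the true positive rates are approximately equalized.
import numpy as np

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, size=n)
y = rng.integers(0, 2, size=n)
# Synthetic scores: the model is noisier for group 1 than for group 0.
score = np.clip(y * 0.6 + rng.normal(scale=0.30 + 0.15 * group, size=n), 0, 1)

def tpr(threshold, g):
    m = (group == g) & (y == 1)
    return (score[m] >= threshold).mean()

target = 0.80                                   # desired common true positive rate
thresholds = {}
grid = np.linspace(0, 1, 201)
for g in (0, 1):
    # Choose the threshold whose TPR is closest to the target for this group.
    thresholds[g] = grid[np.argmin([abs(tpr(t, g) - target) for t in grid])]

for g in (0, 1):
    print(f"group {g}: threshold={thresholds[g]:.2f}, TPR={tpr(thresholds[g], g):.2f}")
```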
Several open-source toolkits are available for detecting and mitigating bias, including IBM's AI Fairness 360 (AIF360), Microsoft's Fairlearn, the Aequitas audit toolkit, and Google's What-If Tool and Fairness Indicators.
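As a small example of toolkit usage, Fairlearn's MetricFrame reports a chosen metric per sensitive group, and the library includes ready-made parity measures; the snippet assumes a recent Fairlearn release and uses illustrative arrays.

```python
# Sketch: per-group reporting with Fairlearn (assumes fairlearn is installed).
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, demographic_parity_difference

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 0])
sensitive = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

mf = MetricFrame(metrics={"accuracy": accuracy_score},
                 y_true=y_true, y_pred=y_pred, sensitive_features=sensitive)
print(mf.by_group)                      # accuracy broken out by group
print(demographic_parity_difference(y_true, y_pred, sensitive_features=sensitive))
```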
In-group bias overlaps with but is distinct from several related concepts in the AI fairness literature.
| Concept | Relationship to in-group bias |
|---|---|
| Algorithmic bias | A broader term covering any systematic error in an AI system that produces unfair outcomes. In-group bias is one specific source of algorithmic bias. |
| Coverage bias | Occurs when some groups are underrepresented in the training data. In-group bias can cause coverage bias when data collectors preferentially sample from their own communities. |
| Prediction bias | A statistical property where a model's average prediction differs from the average observed value. In-group bias can contribute to prediction bias for underrepresented groups. |
| Selection bias | Arises when the process of selecting data introduces systematic differences. In-group bias can drive selection bias when gatekeepers preferentially include data from their own group. |
| Confirmation bias | A cognitive bias where individuals favor information that confirms existing beliefs. Annotators may exhibit confirmation bias that reinforces in-group preferences during labeling. |
| Disparate treatment | Occurs when a system explicitly uses a protected attribute (such as race or gender) in its decision-making. In-group bias more commonly leads to disparate impact through proxy variables. |
In-group bias in AI systems has documented consequences across many sectors.
Risk assessment tools like COMPAS are used in pretrial detention, sentencing, and parole decisions across the United States. When these tools encode racial bias, they contribute to the disproportionate incarceration of minority groups. Multiple studies have shown that recidivism prediction instruments produce unequal false positive rates across racial groups.
Clinical decision support algorithms that rely on historical healthcare spending data can systematically underestimate the needs of populations that have historically faced barriers to care. Beyond the Obermeyer et al. (2019) finding, researchers have identified racial bias in pulmonary function test algorithms that "race-normalize" results, leading to underdiagnosis of respiratory conditions in Black patients.
Automated resume screening and candidate ranking systems can reproduce historical hiring patterns that favored certain demographic groups. The Amazon recruiting tool case demonstrated how models trained on biased historical data learn to discriminate against groups that were historically underrepresented in the applicant pool.
Sentiment analysis and hate speech classifiers are particularly susceptible to annotation bias because the task is inherently subjective. Research has shown that hate speech classifiers learn normative social stereotypes from their training data, and the demographic composition of the annotation team significantly shapes what the classifier treats as hateful content.
Credit scoring and loan approval algorithms can produce disparate outcomes along racial and gender lines when they rely on features that correlate with protected attributes. Zip code, purchasing patterns, and educational history can all serve as proxies for demographic group membership.
Image recognition systems trained on demographically imbalanced datasets produce accuracy disparities across racial and gender groups. When these systems are deployed for law enforcement surveillance, the uneven error rates create disproportionate risks of misidentification for already-marginalized communities.
Organizations developing AI systems can adopt several practices to reduce the risk of in-group bias:
Audit training data for demographic representation. Before training begins, analyze the dataset to determine whether all relevant groups are adequately represented. If certain groups are underrepresented, consider augmentation, resampling, or targeted data collection.
Diversify annotation teams. Ensure that the people labeling data come from a range of demographic backgrounds and provide clear, detailed annotation guidelines to reduce subjective variation.
Use multiple independent annotators. Having each data point labeled by several annotators and aggregating their judgments (for example, by majority vote) can reduce the influence of any single annotator's biases.
Implement blind labeling. Where possible, remove demographic information from the data that annotators see during the labeling process.
Evaluate models across subgroups. Report performance metrics (accuracy, precision, recall, F1 score) separately for each demographic group, not just in aggregate.
Select appropriate fairness metrics. Choose fairness criteria based on the specific application and the types of harm most likely to result from errors.
Apply bias mitigation at multiple stages. Combine pre-processing, in-processing, and post-processing techniques for more robust debiasing.
Conduct ongoing monitoring. Bias can shift over time as populations and data distributions change. Regular audits are necessary to catch emerging disparities.
Engage affected communities. Include representatives from potentially affected groups in the design, evaluation, and governance of AI systems.
Document data and model decisions. Use datasheets for datasets (Gebru et al., 2021) and model cards (Mitchell et al., 2019) to create transparency about data sources, annotation processes, and known limitations.
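A compact sketch of the auditing and subgroup-evaluation practices above, using pandas and scikit-learn on an illustrative dataframe; the column names and group shares are assumptions made for the example.

```python
# Small sketch of two practices from the list above: audit group representation
# in a dataset and report classification metrics per group, not just in aggregate.
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

df = pd.DataFrame({
    "group":  ["a"] * 70 + ["b"] * 25 + ["c"] * 5,
    "y_true": [1, 0] * 35 + [1, 0, 0, 1, 0] * 5 + [1, 0, 1, 0, 1],
    "y_pred": [1, 0] * 35 + [0, 0, 0, 1, 0] * 5 + [0, 0, 1, 0, 0],
})

# 1. Representation audit: inspect each group's share of the dataset.
print("Dataset shares:\n", df["group"].value_counts(normalize=True))

# 2. Subgroup evaluation: precision / recall / F1 reported separately per group.
for g, sub in df.groupby("group"):
    p, r, f1, _ = precision_recall_fscore_support(
        sub["y_true"], sub["y_pred"], average="binary", zero_division=0)
    print(f"group {g}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```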