Out-Group Homogeneity Bias

Out-group homogeneity bias, also called the out-group homogeneity effect, is the cognitive bias in which people perceive members of an out-group as more similar to one another than they perceive members of their own in-group to be. In everyday terms, it is the "they all look alike" or "they're all the same" tendency. While individuals tend to see their own group as varied and composed of distinct people, they compress the members of other groups into a single, undifferentiated mass. The effect has been documented across racial, ethnic, gender, age, political, occupational, and even arbitrary laboratory-created groups.

In artificial intelligence and machine learning, out-group homogeneity bias is significant because it can be encoded in training data, learned by models, and reproduced in outputs. Facial recognition systems that fail disproportionately on minority faces, large language models that portray subordinate groups with less diversity, and hiring algorithms that treat non-majority applicants as interchangeable all reflect this bias in computational form.

ELI5 (Explain like I'm 5)

Imagine you have a big box of crayons at home. You know every single crayon: the slightly bent red one, the short blue one with the torn label, the green one that smells weird. Now imagine your friend also has a box of crayons, but you have never opened it. You just think, "Oh, those are just crayons." You do not notice the differences between them the way you notice differences in your own set.

People do the same thing with groups of other people. You see lots of differences among people in your own group (your school, your neighborhood, your team), but when you look at a group you do not belong to, everyone in that group seems the same to you. This is not because they really are all the same; it is because you have not paid close enough attention to notice their differences.

When computers learn from data that humans created, they can pick up this same mistake. A computer trained mostly on photos of one group of people might have trouble telling apart people from a different group, just like how you might think all the crayons in your friend's box look the same.

Origins and key studies

Research on the out-group homogeneity effect began in the early 1980s, although related observations about stereotyping and intergroup perception date back further.

Quattrone and Jones (1980)

George Quattrone and Edward E. Jones provided one of the earliest experimental demonstrations using Princeton University eating club members. Sixty male undergraduates rated their own club as significantly more heterogeneous on traits such as friendliness and intelligence than three rival clubs. The researchers proposed that the tendency to generalize from the behavior of a single group member to the group as a whole is proportional to the observer's perception of the group's homogeneity.

Park and Rothbart (1982)

Bernadette Park and Myron Rothbart conducted four experiments that established the out-group homogeneity effect as a robust phenomenon. In one study, members of three different sororities each rated their own sorority as more internally diverse than members of the other two sororities rated them. Members of every group saw themselves as varied and the others as uniform. This work is often cited as the foundational empirical demonstration of the effect.

Linville, Fischer, and Salovey (1989)

Patricia Linville, Gregory Fischer, and Peter Salovey advanced the exemplar-based model of perceived group variability. They proposed that people form mental representations of groups by storing individual exemplars (remembered instances of group members). Because individuals typically encounter more exemplars of their in-group, their mental representation of the in-group contains more instances and therefore more variability. Their PDIST (perceived distributions) model provided a computational account showing that smaller exemplar samples naturally produce less perceived variability.
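A short simulation can illustrate the sampling intuition behind this account. This is a minimal sketch and not the PDIST model itself. It assumes perceived variability is simply the range (maximum minus minimum) of the exemplars a perceiver has stored. It also draws the "in-group" and "out-group" samples from the same hypothetical population, so that only the number of exemplars differs. The population, the sample sizes, and the helper names (`perceived_range`, `mean_perceived_range`) are illustrative assumptions.

```python
import random
import statistics

def perceived_range(population, n_exemplars, rng):
    """Range (max - min) of a random set of stored exemplars."""
    sample = rng.sample(population, n_exemplars)
    return max(sample) - min(sample)

def mean_perceived_range(population, n_exemplars, trials=2000, seed=0):
    """Average perceived range across many simulated perceivers."""
    rng = random.Random(seed)
    return statistics.mean(
        perceived_range(population, n_exemplars, rng) for _ in range(trials)
    )

# Hypothetical population: 1,000 members with trait scores on a 1-10 scale.
# Both views sample this SAME population; only the exemplar count differs.
pop_rng = random.Random(42)
population = [pop_rng.randint(1, 10) for _ in range(1000)]

in_group_view = mean_perceived_range(population, n_exemplars=30)  # many exemplars
out_group_view = mean_perceived_range(population, n_exemplars=4)  # few exemplars

print(f"in-group view (30 exemplars): {in_group_view:.2f}")
print(f"out-group view (4 exemplars): {out_group_view:.2f}")
```

Both views sample the same population, so any gap between the two averages comes from sample size alone. That is the core of the exemplar argument.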

Park and Judd (1990)

Park and Judd published a methodological framework that distinguished two components of perceived group variability. The first is dispersion: how spread out group members are on a given trait. The second is stereotypicality: how strongly the group as a whole is seen as conforming to its stereotype, for example by having many members with stereotypic traits and few with counterstereotypic ones. They also introduced several measurement tools, including the range measure and frequency distribution estimates, that became standard in later research.
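Two hypothetical perceived distributions can make the distinction between these components concrete. The sketch uses one simple operationalization, which is an assumption and is not taken from Park and Judd:

- Dispersion is the standard deviation of a perceiver's frequency estimates on a 1-7 trait scale.
- Stereotypicality is how far the estimated mean sits from the scale midpoint toward the stereotypic end (7).

The distributions and names are invented for illustration.

```python
import math

def distribution_stats(freqs, scale):
    """Mean and standard deviation of a perceived frequency distribution."""
    total = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, scale)) / total
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, scale)) / total
    return mean, math.sqrt(var)

SCALE = [1, 2, 3, 4, 5, 6, 7]  # 7 = most stereotypic end of the trait (assumed)
MIDPOINT = 4

# Hypothetical estimates: how many of 100 group members fall at each scale point.
group_a = [0, 10, 20, 40, 20, 10, 0]  # centered on the midpoint
group_b = [0, 0, 10, 20, 40, 20, 10]  # same shape, shifted toward the stereotype

for name, freqs in [("A", group_a), ("B", group_b)]:
    mean, sd = distribution_stats(freqs, SCALE)
    stereotypicality = mean - MIDPOINT  # assumed operationalization
    print(f"group {name}: dispersion (SD) = {sd:.2f}, "
          f"stereotypicality = {stereotypicality:+.2f}")
```

Group B is shifted toward the stereotype without becoming any less spread out. It therefore differs from group A in stereotypicality but not in dispersion.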

Study	Year	Method	Key finding
Quattrone & Jones	1980	Eating club trait ratings	Own club rated more heterogeneous than rival clubs
Park & Rothbart	1982	Sorority member ratings	Each sorority saw itself as more diverse than outsiders saw it
Simon & Brown	1987	Minimal group paradigm	Minority group members accentuate in-group homogeneity for distinctiveness
Linville, Fischer & Salovey	1989	Exemplar-based PDIST model	Fewer stored exemplars produce lower perceived variability
Park & Judd	1990	Methodological framework	Distinguished dispersion from stereotypicality as components of perceived variability
Boldry, Gaertner & Quinn	2007	Meta-analysis of measures	Measurement method moderates the size and direction of homogeneity effects

Psychological mechanisms

Several theoretical accounts explain why people perceive out-groups as more homogeneous than in-groups.

Differential familiarity

The simplest explanation is that people have more experience with in-group members and therefore encode more individuating information about them. With less exposure to out-group members, people rely on category-level representations, which are inherently less detailed. However, familiarity alone does not fully account for the effect. Men and women interact frequently, yet gender-based out-group homogeneity persists, suggesting that additional mechanisms are involved.

Subgroup complexity

Research indicates that people spontaneously organize in-group members into meaningful subgroups (athletes, bookworms, artists, and so on) but generate fewer subgroups for out-groups. When experimenters asked participants to learn about a novel group by organizing members into subgroups, perceived variability increased substantially. When the number of generated subgroups was statistically controlled, differences in perceived group variability between in-groups and out-groups disappeared. This suggests that the cognitive organization of group knowledge, not just the amount of it, drives the effect.

Self-categorization theory

Self-categorization theory (Turner et al., 1987) argues that when people perceive an out-group, they operate in an intergroup context: they attend primarily to the differences between their group and the other group, which draws attention away from differences among out-group members. When perceiving the in-group, people can shift to an intragroup context, where within-group differences become salient. This context-dependent processing explains why even minimal groups (arbitrary, lab-created groupings with no history) show the out-group homogeneity effect.

Social identity theory

Social identity theory (Tajfel & Turner, 1979) holds that people derive part of their self-concept from the groups they belong to. Perceiving the out-group as homogeneous reinforces the distinctiveness of the in-group and protects positive social identity. Simon and Brown (1987) found that minority group members sometimes accentuate in-group homogeneity on group-defining dimensions, a strategy that strengthens group cohesion and collective identity even at the cost of reversing the typical pattern.

Exemplar retrieval

According to exemplar models of semantic memory, perceptions of groups are formed by retrieving previously encountered instances. When a person encounters a new out-group member, they compare that individual to a small set of stored exemplars. Because the exemplar pool for out-groups is typically smaller and less varied, the resulting impression of the group is more homogeneous. Linville and Fischer's (1993) computational simulations confirmed that even unbiased sampling from a smaller pool produces lower perceived variability.
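The claim that unbiased sampling from a smaller pool lowers perceived variability can be checked exactly for the probability of differentiation, Pd = 1 - Σ p_i². Suppose a perceiver stores n exemplars drawn at random (with replacement) and computes Pd from the sample proportions. The expected result is then (1 - 1/n) × Pd, so small exemplar pools underestimate variability on average.

The sketch below verifies this by brute-force enumeration with exact fractions. The identity is a standard property of the plug-in estimator, offered here as an illustration. It is not a reproduction of Linville and Fischer's simulations, and the trait proportions are hypothetical.

```python
from fractions import Fraction
from itertools import product

def prob_differentiation(props):
    """Pd = 1 - sum(p_i^2): chance that two randomly drawn members differ in category."""
    return 1 - sum(p * p for p in props)

def expected_sample_pd(true_props, n):
    """Exact expected Pd when n exemplars are drawn at random (with replacement)
    and Pd is computed from the sample proportions.

    Enumerates all k**n draw sequences, so it is only practical for small n.
    """
    k = len(true_props)
    expected = Fraction(0)
    for draw in product(range(k), repeat=n):
        prob = Fraction(1)
        for category in draw:
            prob *= true_props[category]  # probability of this exact sequence
        sample_props = [Fraction(draw.count(c), n) for c in range(k)]
        expected += prob * prob_differentiation(sample_props)
    return expected

# Hypothetical group: a trait with three categories held by 50%, 30%, and 20% of members.
true_props = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
true_pd = prob_differentiation(true_props)  # 31/50

for n in (2, 4, 8):
    expected = expected_sample_pd(true_props, n)
    print(n, expected, expected == (1 - Fraction(1, n)) * true_pd)
```

Each line prints `True`. With n = 2 the expected Pd is only half the true value (31/100 versus 31/50), and the shortfall shrinks as n grows.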

Mechanism	Core claim	Limitation
Differential familiarity	Less contact with out-group produces fewer individuating memories	Cannot explain homogeneity effects for highly familiar out-groups (e.g., gender)
Subgroup complexity	In-group knowledge is organized into more subgroups	Does not explain why subgroup generation differs in the first place
Self-categorization theory	Intergroup context suppresses within-group differentiation	Predicts in-group homogeneity in some conditions, complicating the picture
Social identity theory	Homogeneity perception serves identity-protection motives	Motivational component is difficult to measure directly
Exemplar retrieval	Smaller exemplar sets yield lower perceived variability	Assumes people store and retrieve exemplars rather than abstractions

Measurement approaches

Researchers have used several distinct methods to quantify perceived group variability. Each method captures a slightly different aspect of homogeneity perception.

Range estimates: Participants estimate the range (minimum to maximum) of a trait within a group. Narrower ranges indicate greater perceived homogeneity.
Variance or standard deviation estimates: Participants directly estimate how spread out a group is on a given dimension.
Percentage estimates (stereotypicality): Participants estimate what percentage of a group possesses a given trait. Higher percentages for stereotypical traits and lower percentages for counterstereotypical traits indicate greater perceived homogeneity.
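Probability of differentiation (Pd): This measure calculates the likelihood that two randomly selected group members would fall into different categories of a given attribute. For an attribute whose categories are held by proportions p_i of the group, Pd = 1 - Σ p_i². Lower Pd values indicate greater perceived homogeneity. Linville, Fischer, and Salovey (1989) introduced this as a formal metric.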
Similarity ratings: Participants rate how similar group members are to one another. Higher within-group similarity ratings indicate greater perceived homogeneity.
Trait attribution breadth: Participants list traits they believe characterize a group. Fewer distinct traits attributed to a group suggest greater perceived homogeneity.
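Several of these indices can be computed from the same frequency-distribution estimate, and they need not agree. In this sketch, two hypothetical participants place 100 group members on a 1-7 scale. Both use every scale point, so the range measure treats the two views as equally variable. The standard deviation and Pd both show the second view as far more homogeneous. Treating each scale point as a Pd category is an assumption of the sketch, and the `measures` helper is hypothetical.

```python
import math

def measures(freqs, scale):
    """Range, standard deviation, and Pd from one frequency-distribution estimate.

    freqs[i] is the number of group members the participant places at scale[i].
    """
    total = sum(freqs)
    occupied = [x for x, f in zip(scale, freqs) if f > 0]
    mean = sum(f * x for f, x in zip(freqs, scale)) / total
    sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, scale)) / total)
    p_diff = 1 - sum((f / total) ** 2 for f in freqs)  # each scale point = one category
    return {"range": max(occupied) - min(occupied), "sd": sd, "pd": p_diff}

SCALE = [1, 2, 3, 4, 5, 6, 7]
# Two hypothetical participants' estimates (members out of 100 at each point).
spread_out = [10, 15, 15, 20, 15, 15, 10]
peaked = [1, 2, 5, 84, 5, 2, 1]

for name, freqs in [("spread_out", spread_out), ("peaked", peaked)]:
    m = measures(freqs, SCALE)
    print(f"{name:10s} range={m['range']}  sd={m['sd']:.2f}  pd={m['pd']:.2f}")
```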

Boldry, Gaertner, and Quinn (2007) conducted a meta-analysis of these measures and found that the choice of measurement method significantly moderates the observed size of the out-group homogeneity effect.

The cross-race effect

The cross-race effect (also called the own-race bias or other-race effect) is a closely related phenomenon in which people show superior recognition memory for faces of their own race compared to faces of other races. This effect has direct connections to out-group homogeneity bias.

Chance and Goldstein (1981) found that white students viewing racially mixed sets of faces showed better recognition for same-race faces while falsely recognizing previously unseen Black faces at higher rates. Black participants showed the opposite pattern.

Hughes et al. (2019) used functional MRI to show that the neural basis of this effect appears at early stages of face perception. Face-selective brain regions (including the fusiform face area) showed reduced sensitivity to variability among other-race faces compared with own-race faces. In other words, the brain processes out-group faces with less differentiation at a perceptual level, not just at the level of memory or judgment.

Social-cognitive theories propose that faces are rapidly classified as in-group or out-group. In-group faces receive individuated processing (attention to unique features), while out-group faces receive categorical processing (attention only to category-defining features). This differential processing mode produces both the cross-race recognition deficit and the broader perception of out-group homogeneity.

Implications for machine learning and AI

The out-group homogeneity bias has direct consequences for AI systems at multiple stages of the ML pipeline: data collection, model training, and deployment.

Training data imbalances

When training datasets are collected or curated by people who belong to a particular demographic group, those datasets tend to contain more varied and nuanced representations of the in-group and more stereotypical or homogeneous representations of out-groups. This is a structural manifestation of out-group homogeneity bias. For example, Buolamwini and Gebru (2018) found that two widely used facial analysis benchmarks, IJB-A and Adience, consisted of roughly 80% or more lighter-skinned subjects. This underrepresented darker-skinned people, and especially darker-skinned women.
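Auditing composition is a first step toward detecting this kind of imbalance. The sketch below tallies the share of each demographic label in a dataset's metadata, both for a single attribute and for intersectional (attribute-pair) subgroups. The records are hypothetical counts chosen for illustration, not figures from any real benchmark.

```python
from collections import Counter

def composition(labels):
    """Share of each label in the metadata, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}

# Hypothetical (skin tone, gender) metadata for 1,000 training images.
records = (
    [("lighter", "male")] * 600
    + [("lighter", "female")] * 200
    + [("darker", "male")] * 150
    + [("darker", "female")] * 50
)

print(composition(skin for skin, _ in records))  # single attribute
print(composition(records))                      # intersectional subgroups
```

The single-attribute view (80% lighter-skinned) hides that darker-skinned women make up only 5% of these hypothetical records. This is why audits such as Gender Shades report intersectional subgroups.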

Facial recognition disparities

The Gender Shades study by Joy Buolamwini and Timnit Gebru (2018) evaluated commercial gender classification systems from IBM, Microsoft, and Face++. It found error rates of up to 34.7% for darker-skinned females, while the maximum error rate for lighter-skinned males was 0.8%. This gap of more than 40-fold is often read as a computational parallel to out-group homogeneity: systems trained predominantly on majority-group faces appear to capture less of the variation within underrepresented groups.

The NIST Face Recognition Vendor Test (2019) evaluated 189 algorithms from 99 developers and found false positive rates that varied by factors of 10 to 100 across demographic groups. With U.S.-developed algorithms, African Americans, Asian Americans, and Native Americans showed the highest false positive rates. Notably, some algorithms developed in Asian countries did not show this disparity between East Asian and white faces. This suggests that the bias may be tied to training data composition rather than to inherent algorithmic limitations.
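Differentials like these can be measured as follows:

1. Fix one decision threshold for all groups.
2. Compute the false positive (false match) rate separately for each demographic group.
3. Compare the highest rate with the lowest.

The sketch below follows these steps. The impostor similarity scores (comparisons between images of different people), the group names, and the threshold are hypothetical. This is not NIST's evaluation code.

```python
def false_match_rate(impostor_scores, threshold):
    """Share of impostor (different-person) comparisons scored at or above threshold."""
    return sum(score >= threshold for score in impostor_scores) / len(impostor_scores)

def max_min_ratio(rates):
    """Ratio of the highest to the lowest group rate (inf when the lowest is zero)."""
    lo, hi = min(rates.values()), max(rates.values())
    return float("inf") if lo == 0 else hi / lo

# Hypothetical impostor similarity scores per demographic group.
impostor_scores = {
    "group_a": [0.12, 0.20, 0.31, 0.18, 0.25, 0.40, 0.15, 0.22, 0.28, 0.61],
    "group_b": [0.35, 0.52, 0.64, 0.47, 0.71, 0.58, 0.66, 0.49, 0.73, 0.55],
}
THRESHOLD = 0.6  # one global operating threshold shared by all groups

fmr = {g: false_match_rate(s, THRESHOLD) for g, s in impostor_scores.items()}
ratio = max_min_ratio(fmr)
print(fmr, f"ratio={ratio:.1f}x")
```

A single shared threshold mirrors how a deployed system typically operates. A group whose impostor scores run higher therefore accumulates more false matches at that threshold.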

System / study	Year	Finding
Gender Shades (Buolamwini & Gebru)	2018	Up to 34.7% error rate on darker-skinned females vs. 0.8% on lighter-skinned males
NIST FRVT Part 3	2019	10x to 100x false positive rate differentials across race and gender groups
Hughes et al. (fMRI study)	2019	Neural face-selective regions show reduced sensitivity to other-race face variability

Homogeneity bias in large language models

Recent research has shown that large language models reproduce out-group homogeneity patterns in their text generation.

Lee, Montgomery, and Lai (2024) generated 52,000 text completions using GPT-3.5-turbo across 13 writing prompts and 8 intersectional demographic groups. They measured homogeneity by computing pairwise cosine similarity between sentence embeddings. Key findings included:

Texts about African Americans were 0.33 standard deviations more homogeneous than texts about White Americans
Texts about Asian Americans were 0.31 standard deviations more homogeneous
Texts about Hispanic Americans were 0.18 standard deviations more homogeneous
Texts about women were 0.037 standard deviations more homogeneous than texts about men
The bias persisted even after controlling for topic (e.g., when all groups were prompted about neutral topics like cooking)
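The homogeneity measure described above, mean pairwise cosine similarity, is straightforward to compute once each text has an embedding. The sketch below applies it to hypothetical 3-dimensional vectors standing in for sentence embeddings. It omits the sentence encoder and the statistical modeling: the standardized effect sizes above come from the authors' models, which are not reproduced here.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs; higher = more homogeneous."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Hypothetical embeddings of texts generated about two groups.
# Real studies use high-dimensional vectors from a sentence encoder.
texts_group_x = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.0, 0.2, 0.9], [0.5, 0.5, 0.5]]
texts_group_y = [[0.9, 0.2, 0.1], [0.8, 0.3, 0.1], [0.9, 0.1, 0.2], [0.85, 0.25, 0.15]]

hx = mean_pairwise_similarity(texts_group_x)
hy = mean_pairwise_similarity(texts_group_y)
print(f"group X: {hx:.3f}  group Y: {hy:.3f}")
```

The texts about group Y point in nearly the same direction, so group Y scores as more homogeneous.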

In a follow-up study, Lee (2025) tested whether adjusting sampling hyperparameters such as temperature and top-p could eliminate homogeneity bias in GPT-4. The bias persisted in 19 of the 20 hyperparameter configurations tested. Gender bias proved more resistant to hyperparameter tuning than racial bias. The effects were also non-linear: increasing temperature to extreme values sometimes worsened the bias rather than reducing it.
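Both hyperparameters reshape the distribution a model samples each next token from:

- Temperature divides the next-token logits before the softmax. Values below 1 sharpen the distribution, and values above 1 flatten it.
- Top-p (nucleus) sampling keeps only the smallest set of most likely tokens whose cumulative probability reaches p.

The sketch below implements both standard definitions on hypothetical logits. It does not show how any particular API implements them.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Nucleus filtering: keep the smallest set of most likely tokens whose
    cumulative probability reaches top_p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical next-token scores
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, 0.9)
    print(temperature, [round(p, 3) for p in probs], [round(p, 3) for p in nucleus])
```

These settings change how much of the model's existing distribution gets sampled, not the distribution the model has learned. That may help explain why tuning them alone did not remove the bias.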

Word embeddings and representational bias

Word embeddings trained on large text corpora encode societal biases, including patterns related to group homogeneity. Bolukbasi et al. (2016) showed that word2vec embeddings trained on Google News articles associated professions like "doctor" and "programmer" with male, and "nurse" and "homemaker" with female. These embeddings compress demographic groups into stereotypical representations, a computational analogue of out-group homogeneity bias where the diversity within a group is replaced by a single stereotypical profile.
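One idea from Bolukbasi et al. can be sketched in a few lines:

1. Define a gender direction from a definitional word pair (he minus she).
2. Project occupation words onto that direction.

The paper derives the direction from several definitional pairs using principal component analysis. The single-pair direction and the toy 3-dimensional vectors below are simplifying assumptions, and the numbers are invented for illustration.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def gender_projection(word_vec, direction):
    """Cosine between a word vector and the gender direction (positive = 'he' side)."""
    return sum(a * b for a, b in zip(unit(word_vec), unit(direction)))

# Hypothetical 3-d vectors; real word2vec vectors have 300 dimensions.
emb = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "programmer": [0.4, 0.8, 0.3],
    "homemaker": [-0.5, 0.7, 0.4],
}

# Single-pair gender direction: he - she.
direction = [a - b for a, b in zip(emb["he"], emb["she"])]

for word in ("programmer", "homemaker"):
    print(word, round(gender_projection(emb[word], direction), 3))
```

A nonzero projection for a word that is not inherently gendered, such as an occupation, is the kind of learned association the paper identifies as bias.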

Hiring and decision-making algorithms

Amazon began developing an automated resume screening system in 2014, trained on resumes submitted to the company over a 10-year period. Most of those resumes came from men, reflecting male dominance in the tech industry. As a result, the system learned to penalize resumes containing the word "women's" (as in "women's chess club captain") and to downgrade graduates of two all-women's colleges. In effect, the system treated female applicants as an undifferentiated out-group rather than individuating their qualifications. Amazon eventually abandoned the project after efforts to remove the bias proved insufficient; Reuters reported on the system in 2018.

Relationship to other biases

Out-group homogeneity bias is connected to several other cognitive and algorithmic biases.

Related bias	Relationship to out-group homogeneity bias
In-group bias	The complementary tendency to favor one's own group. Out-group homogeneity and in-group favoritism often co-occur but are distinct phenomena.
Confirmation bias	Tendency to notice information that confirms existing stereotypes about out-groups, reinforcing the perception of homogeneity
Sampling bias	When training data overrepresents one group and underrepresents others, the model has fewer exemplars for the underrepresented group, mirroring the cognitive mechanism of out-group homogeneity
Group attribution bias	Tendency to generalize from one member's behavior to the entire group, which is stronger when the group is perceived as homogeneous
Automation bias	Tendency to trust algorithmic outputs uncritically, which can amplify harm when those outputs reflect out-group homogeneity bias
Out-group polarization	Linville and Jones (1980) found that out-group members receive more extreme evaluations (both positive and negative), a consequence of simplified mental representations with fewer evaluative dimensions

Mitigation strategies

Addressing out-group homogeneity bias requires interventions at both the human-cognitive level and the technical-algorithmic level.

In human cognition

Intergroup contact: The contact hypothesis (Allport, 1954) predicts that meaningful, equal-status interactions between group members reduce prejudice and stereotyping. Increased personal contact with out-group members adds exemplars to one's mental representation, increasing perceived variability.
Individuation training: Encouraging people to attend to individuating features of out-group members (rather than category-level features) has been shown to reduce both the cross-race effect and broader out-group homogeneity perceptions.
Subgroup awareness: Asking people to identify subgroups within an out-group increases perceived variability. This directly targets the subgroup complexity mechanism.
Perspective-taking: Actively imagining the perspective of an out-group member can shift processing from categorical to individuated mode.

In AI and ML systems

Dataset diversification: Ensuring that training data contains balanced, varied representation from all demographic groups reduces the structural analogue of out-group homogeneity in the data itself.
Fairness auditing: Systematic evaluation of model performance across demographic subgroups (as in the Gender Shades methodology) can reveal homogeneity-related performance gaps.
Disaggregated evaluation: Reporting model accuracy, precision, recall, and error rates separately for each demographic group, rather than only as aggregate metrics, prevents homogeneity-related disparities from being masked by overall averages.
Social contact debiasing (SCD): Raj et al. (2024) proposed instruction-tuning LLMs with prompts that simulate positive intergroup contact, inspired by the contact hypothesis from social psychology. This approach reduced bias by up to 40% in one epoch of fine-tuning without degrading performance on downstream tasks.
Adversarial testing: Probing models with inputs designed to reveal differential treatment of demographic groups can uncover hidden homogeneity biases before deployment.
Synthetic data balancing: NYU researchers demonstrated that pre-training facial recognition models on large, balanced synthetic datasets significantly reduced demographic bias while boosting overall accuracy.
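Disaggregated evaluation can be sketched in a few lines: compute the same metrics once per group and once overall. In this hypothetical example, 90 majority-group predictions are all correct and 10 minority-group predictions are only half correct. Aggregate accuracy therefore looks high while the minority group fares much worse. The function name, the reserved `"ALL"` key, and the data are illustrative assumptions.

```python
from collections import defaultdict

def disaggregated_metrics(y_true, y_pred, groups):
    """Accuracy, precision, and recall for binary labels, per group and overall.

    Results for the whole dataset are stored under the key "ALL"
    (assumes no real group uses that name).
    """
    buckets = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g].append((t, p))
        buckets["ALL"].append((t, p))
    report = {}
    for g, pairs in buckets.items():
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        correct = sum(1 for t, p in pairs if t == p)
        report[g] = {
            "n": len(pairs),
            "accuracy": correct / len(pairs),
            "precision": tp / (tp + fp) if tp + fp else float("nan"),
            "recall": tp / (tp + fn) if tp + fn else float("nan"),
        }
    return report

# Hypothetical data: 90 majority-group examples (all correct), 10 minority-group examples.
y_true = [1, 0] * 50
y_pred = [1, 0] * 45 + [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
groups = ["majority"] * 90 + ["minority"] * 10

for group, m in disaggregated_metrics(y_true, y_pred, groups).items():
    print(f"{group:9s} n={m['n']:3d} acc={m['accuracy']:.2f} "
          f"prec={m['precision']:.2f} rec={m['recall']:.2f}")
```

The overall row reports 95% accuracy, while the minority row shows 50% accuracy and 40% recall. Reporting only the aggregate would hide exactly this gap.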

Strategy	Domain	Mechanism
Intergroup contact	Human cognition	Adds exemplars, increases perceived variability
Individuation training	Human cognition	Shifts processing from categorical to individuated
Dataset diversification	ML/AI	Provides more exemplars of underrepresented groups
Fairness auditing	ML/AI	Detects differential performance across groups
Social contact debiasing	LLMs	Instruction-tunes models with simulated positive contact
Disaggregated evaluation	ML/AI	Prevents aggregate metrics from hiding group disparities
Synthetic data balancing	Computer vision	Supplements real data with balanced synthetic exemplars

References

Boldry, J. G., Gaertner, L., & Quinn, J. (2007). Measuring the measures: A meta-analytic investigation of the measures of outgroup homogeneity. *Group Processes & Intergroup Relations*, 10(2), 157-178.
Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. *Advances in Neural Information Processing Systems*, 29, 4349-4357.
Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional accuracy disparities in commercial gender classification. *Proceedings of Machine Learning Research*, 81, 1-15.
Grother, P., Ngan, M., & Hanaoka, K. (2019). Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects. *NIST Interagency Report 8280*.
Hughes, B. L., Camp, N. P., Gomez, J., Natu, V. S., Grill-Spector, K., & Eberhardt, J. L. (2019). Neural adaptation to faces reveals racial outgroup homogeneity effects in early perception. *Proceedings of the National Academy of Sciences*, 116(29), 14532-14537.
Lee, M. H. J. (2025). Examining the robustness of homogeneity bias to hyperparameter adjustments in GPT-4. *arXiv preprint arXiv:2501.02211*.
Lee, M. H. J., Montgomery, J. M., & Lai, C. K. (2024). Large language models portray socially subordinate groups as more homogeneous, consistent with a bias observed in humans. *arXiv preprint arXiv:2401.08495*.
Linville, P. W., & Fischer, G. W. (1993). Exemplar and abstraction models of perceived group variability and stereotypicality. *Social Cognition*, 11(1), 92-125.
Linville, P. W., Fischer, G. W., & Salovey, P. (1989). Perceived distributions of the characteristics of in-group and out-group members: Empirical evidence and a computer simulation. *Journal of Personality and Social Psychology*, 57(2), 165-188.
Park, B., & Judd, C. M. (1990). Measures and models of perceived group variability. *Journal of Personality and Social Psychology*, 59(2), 173-191.
Park, B., & Rothbart, M. (1982). Perception of out-group homogeneity and levels of social categorization: Memory for the subordinate attributes of in-group and out-group members. *Journal of Personality and Social Psychology*, 42(6), 1051-1068.
Quattrone, G. A., & Jones, E. E. (1980). The perception of variability within in-groups and out-groups: Implications for the law of small numbers. *Journal of Personality and Social Psychology*, 38(1), 141-152.
Raj, S., et al. (2024). Breaking bias, building bridges: Evaluation and mitigation of social biases in LLMs via contact hypothesis. *Proceedings of the 2024 AAAI/ACM Conference on AI, Ethics, and Society*.
Simon, B., & Brown, R. (1987). Perceived intragroup homogeneity in minority-majority contexts. *Journal of Personality and Social Psychology*, 53(4), 703-711.
Tajfel, H., & Turner, J. C. (1979). An integrative theory of intergroup conflict. In W. G. Austin & S. Worchel (Eds.), *The Social Psychology of Intergroup Relations* (pp. 33-47). Brooks/Cole.