Out-Group Homogeneity Bias
Last reviewed
Jun 2, 2026
Sources
26 citations
Review status
Source-backed
Revision
v4 · 4,760 words
Out-group homogeneity bias, also called the out-group homogeneity effect, is the cognitive bias in which people perceive members of an out-group as more similar to one another than members of their own in-group.[1] In everyday terms, it is the "they all look alike" or "they're all the same" tendency. While individuals tend to see their own group as varied and composed of distinct people, they compress the members of other groups into a single, undifferentiated mass. The effect has been documented across racial, ethnic, gender, age, political, occupational, and even arbitrary laboratory-created groups.[4]
The concept appears explicitly in Google's machine learning fairness glossary, which defines it as "the tendency to see out-group members as more alike than in-group members when comparing attitudes, values, personality traits, and other characteristics," and classifies it as a form of group attribution bias. The glossary illustrates the idea with the contrast that members of a group might describe each other's homes in vivid detail while declaring that out-group members "all live in identical houses."[16]
In artificial intelligence and machine learning, out-group homogeneity bias is significant because it can be encoded in training data, learned by models, and reproduced in outputs. Facial recognition systems that fail disproportionately on minority faces, large language models that portray subordinate groups with less diversity, and hiring algorithms that treat non-majority applicants as interchangeable all reflect this bias in computational form.
Imagine you have a big box of crayons at home. You know every single crayon: the slightly-bent red one, the short blue one with the torn label, the green one that smells weird. Now imagine your friend also has a box of crayons, but you have never opened it. You just think, "Oh, those are just crayons." You do not notice the differences between them the way you notice differences in your own set.
People do the same thing with groups of other people. You see lots of differences among people in your own group (your school, your neighborhood, your team), but when you look at a group you do not belong to, everyone in that group seems the same to you. This is not because they really are all the same; it is because you have not paid close enough attention to notice their differences.
When computers learn from data that humans created, they can pick up this same mistake. A computer trained mostly on photos of one group of people might have trouble telling apart people from a different group, just as you might think all the crayons in your friend's box look the same.
Research on the out-group homogeneity effect began in the early 1980s, although related observations about stereotyping and intergroup perception date back further.
George Quattrone and Edward E. Jones provided one of the earliest experimental demonstrations of the effect.[1] In their core experiment, Rutgers and Princeton undergraduates watched a videotaped person make a choice and were then asked to estimate what percentage of students at that person's university would make the same choice. Participants generalized far more readily from a single behavior to the whole group when that group was an out-group (the other university) than when it was their own in-group. Quattrone and Jones framed this as an "implication for the law of small numbers": observers treat a small, possibly unrepresentative sample of out-group behavior as if it reliably characterizes the entire group, which is exactly the inferential error that follows from perceiving an out-group as homogeneous.[1]
A closely associated study by Jones, Wood, and Quattrone (1981) supplied the often-cited eating-club demonstration. Members of four Princeton eating clubs rated their own club as more heterogeneous on a series of personal characteristics than they rated the three other clubs. Notably, this in-group/out-group difference was unrelated to how many in-group or out-group members a participant actually knew, an early hint that mere familiarity does not fully explain the effect.[17]
Bernadette Park and Myron Rothbart conducted four experiments that established the out-group homogeneity effect as a robust phenomenon.[2] In one study, members of three University of Oregon sororities (roughly 90 participants) each rated their own sorority and two rival sororities on ten dimensions such as studiousness, partying frequency, and group cohesiveness, plus an overall intragroup-similarity judgment. Members of every group saw themselves as more internally varied and the other two as more uniform, while also attributing favorable traits more strongly to their own group. Complementary memory experiments in the same paper showed that people remember less individuating, more category-level information about out-group members than about in-group members, which is consistent with the perception of out-group uniformity. This work is often cited as the foundational empirical demonstration of the effect.[2]
Patricia Linville, Gregory Fischer, and Peter Salovey advanced the exemplar-based model of perceived group variability.[4] They proposed that people form mental representations of groups by storing individual exemplars (remembered instances of group members). Because individuals typically encounter more exemplars of their in-group, their mental representation of the in-group contains more instances and therefore more variability. Across four experiments with groups defined by age and by nationality, greater familiarity with a group predicted both greater perceived differentiation among its members and greater perceived variance. Their PDIST (perceived distributions) model activates a set of stored exemplars and judges the relative likelihood of different feature values from their activation strengths; a computer simulation confirmed that this mechanism is sufficient to reproduce the empirical pattern, with smaller exemplar samples naturally producing less perceived variability.[4]
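The sampling argument can be illustrated with a small simulation. The sketch below is not the PDIST model itself. It assumes a hypothetical trait scored from 1 to 7 with the same true distribution for every group, draws exemplars at random, and measures how much of the scale the stored exemplars span. The function names, the weights, and the exemplar counts (5 versus 50) are illustrative assumptions.

```python
import random
import statistics

SCALE = [1, 2, 3, 4, 5, 6, 7]        # hypothetical 7-point trait scale
WEIGHTS = [1, 2, 4, 6, 4, 2, 1]      # assumed true distribution, identical for all groups


def perceived_range(n_exemplars, rng):
    """Range (max - min) of trait values among n randomly stored exemplars."""
    sample = rng.choices(SCALE, weights=WEIGHTS, k=n_exemplars)
    return max(sample) - min(sample)


def mean_perceived_range(n_exemplars, n_perceivers=2000, seed=0):
    """Average perceived range across many simulated perceivers."""
    rng = random.Random(seed)
    return statistics.mean(
        perceived_range(n_exemplars, rng) for _ in range(n_perceivers)
    )


if __name__ == "__main__":
    few = mean_perceived_range(n_exemplars=5)     # sparse out-group memory
    many = mean_perceived_range(n_exemplars=50)   # rich in-group memory
    print(f"mean range with 5 exemplars:  {few:.2f}")
    print(f"mean range with 50 exemplars: {many:.2f}")
```

Because both groups are drawn from the same distribution, any gap between the two averages comes from sample size alone, which is the point of the exemplar account.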
Park and Judd published a methodological framework that distinguished two components of perceived group variability: dispersion (how widely group members are spread around the group's central tendency on a given trait) and stereotypicality (how closely the group as a whole is seen to fit its stereotype).[5] They demonstrated that these two components are statistically independent: a perceiver can rate a group as tightly clustered yet not especially stereotypical, or vice versa. They also introduced several measurement tools, including the range measure and frequency distribution estimates, that became standard in later research.[5]
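A frequency-distribution estimate can be turned into numeric summaries. The sketch below assumes a participant has reported what percentage of a group falls at each point of a hypothetical 1-to-7 trait scale, then computes the mean (central tendency) and standard deviation (dispersion) of that perceived distribution. The ratings are invented for illustration, and the function name is an assumption rather than one of Park and Judd's instruments.

```python
import math


def summarize_perceived_distribution(percentages):
    """Return (mean, standard deviation) of a perceived trait distribution.

    `percentages` maps each scale point to the estimated share of group
    members at that point; shares are normalized so they sum to 1.
    """
    total = sum(percentages.values())
    if total <= 0:
        raise ValueError("percentages must sum to a positive number")
    probs = {point: share / total for point, share in percentages.items()}
    mean = sum(point * p for point, p in probs.items())
    variance = sum(p * (point - mean) ** 2 for point, p in probs.items())
    return mean, math.sqrt(variance)


# Invented ratings from one hypothetical participant.
in_group = {1: 5, 2: 10, 3: 20, 4: 30, 5: 20, 6: 10, 7: 5}
out_group = {1: 0, 2: 0, 3: 20, 4: 60, 5: 20, 6: 0, 7: 0}

if __name__ == "__main__":
    for label, ratings in (("in-group", in_group), ("out-group", out_group)):
        mean, sd = summarize_perceived_distribution(ratings)
        print(f"{label}: mean={mean:.2f}, sd={sd:.2f}")
```

In this invented example both groups have the same perceived mean but different perceived spread, which shows how dispersion can vary while the central tendency stays fixed.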
| Study | Year | Method | Key finding |
|---|---|---|---|
| Quattrone & Jones[1] | 1980 | Videotape generalization (Rutgers/Princeton) | Stronger generalization from one behavior to the whole out-group (law of small numbers) |
| Jones, Wood & Quattrone[17] | 1981 | Eating club trait ratings | Own club rated more heterogeneous than three rival clubs, unrelated to acquaintance |
| Park & Rothbart[2] | 1982 | Sorority member ratings | Each sorority saw itself as more diverse than outsiders saw it |
| Simon & Brown[15] | 1987 | Minimal group paradigm | Minority group members accentuate in-group homogeneity for distinctiveness |
| Linville, Fischer & Salovey[4] | 1989 | Exemplar-based PDIST model | Fewer stored exemplars produce lower perceived variability |
| Park & Judd[5] | 1990 | Methodological framework | Distinguished dispersion from stereotypicality as components of perceived variability |
| Boldry, Gaertner & Quinn[7] | 2007 | Meta-analysis of measures | Measurement method moderates the size and direction of homogeneity effects |
Several theoretical accounts explain why people perceive out-groups as more homogeneous than in-groups.
The simplest explanation is that people have more experience with in-group members and therefore encode more individuating information about them. With less exposure to out-group members, people rely on category-level representations, which are inherently less detailed. However, familiarity alone does not fully account for the effect. Men and women interact frequently, yet gender-based out-group homogeneity persists, suggesting that additional mechanisms are involved.
Research indicates that people spontaneously organize in-group members into meaningful subgroups (athletes, bookworms, artists, and so on) but generate fewer subgroups for out-groups. When experimenters asked participants to learn about a novel group by organizing members into subgroups, perceived variability increased substantially. When the number of generated subgroups was statistically controlled, differences in perceived group variability between in-groups and out-groups disappeared. This suggests that the cognitive organization of group knowledge, not just the amount of it, drives the effect.
Self-categorization theory (Turner et al., 1987) argues that when people perceive an out-group, they operate in an intergroup context: they attend primarily to the differences between their group and the other group, which draws attention away from differences among out-group members.[26] When perceiving the in-group, people can shift to an intragroup context, where within-group differences become salient. This context-dependent processing explains why even minimal groups (arbitrary, lab-created groupings with no history) show the out-group homogeneity effect.
Social identity theory (Tajfel & Turner, 1979) holds that people derive part of their self-concept from the groups they belong to.[3] Perceiving the out-group as homogeneous reinforces the distinctiveness of the in-group and protects positive social identity. Simon and Brown (1987) found that minority group members sometimes accentuate in-group homogeneity on group-defining dimensions, a strategy that strengthens group cohesion and collective identity even at the cost of reversing the typical pattern.[15]
According to exemplar models of semantic memory, perceptions of groups are formed by retrieving previously encountered instances. When a person encounters a new out-group member, they compare that individual to a small set of stored exemplars. Because the exemplar pool for out-groups is typically smaller and less varied, the resulting impression of the group is more homogeneous. Linville and Fischer's (1993) computational comparison of exemplar-based and abstraction-based accounts confirmed that even unbiased sampling from a smaller pool produces lower perceived variability.[6]
| Mechanism | Core claim | Limitation |
|---|---|---|
| Differential familiarity | Less contact with out-group produces fewer individuating memories | Cannot explain homogeneity effects for highly familiar out-groups (e.g., gender) |
| Subgroup complexity | In-group knowledge is organized into more subgroups | Does not explain why subgroup generation differs in the first place |
| Self-categorization theory | Intergroup context suppresses within-group differentiation | Predicts in-group homogeneity in some conditions, complicating the picture |
| Social identity theory | Homogeneity perception serves identity-protection motives | Motivational component is difficult to measure directly |
| Exemplar retrieval | Smaller exemplar sets yield lower perceived variability | Assumes people store and retrieve exemplars rather than abstractions |
Researchers have used several distinct methods to quantify perceived group variability. Each method captures a slightly different aspect of homogeneity perception.
Boldry, Gaertner, and Quinn (2007) conducted a meta-analysis of 177 effect sizes drawn from 173 independent samples (12,078 participants) and covering 11 distinct measures of perceived variability. They found a small but reliable overall tendency toward out-group homogeneity that was stronger for real ("non-minimal") groups than for arbitrary minimal groups. Critically, the choice of measurement method significantly moderated the result: some measures yielded out-group homogeneity while others, in the same contexts, yielded in-group homogeneity, and these discrepancies themselves varied with the relative status of the groups.[7]
The cross-race effect (also called the own-race bias or other-race effect) is a closely related phenomenon in which people show superior recognition memory for faces of their own race compared to faces of other races. This effect has direct connections to out-group homogeneity bias: if other-race faces are encoded with less individuating detail, they become harder to tell apart, which is the perceptual signature of perceived homogeneity. The effect is well documented and emerges over development: children recognize own- and other-race faces equally well at age six, and the own-race advantage appears around age ten.[18]
Hughes et al. (2019) used functional MRI to demonstrate that the neural basis of this effect lies in the earliest stages of face perception.[10] In an fMRI adaptation paradigm with White (in-group) and Black (out-group) faces, face-selective brain regions (including the fusiform face area) showed reduced sensitivity to variability among other-race faces compared to own-race faces. In other words, the brain processes out-group faces with less differentiation at a perceptual level, not just at the level of memory or judgment.[10] An independent, preregistered study by Reggev, Brodie, Cikara, and Mitchell (2020) conceptually replicated this result, reporting that the fusiform face area produced as much repetition suppression for two different Black individuals as it did for two identical faces, while it clearly distinguished different White faces. The convergence of two labs strengthens the claim that out-group homogeneity has an early perceptual basis.[19]
Social-cognitive theories propose that faces are rapidly classified as in-group or out-group. In-group faces receive individuated processing (attention to unique features), while out-group faces receive categorical processing (attention only to category-defining features). This differential processing mode produces both the cross-race recognition deficit and the broader perception of out-group homogeneity.
The out-group homogeneity bias has direct consequences for AI systems at multiple stages of the ML pipeline: data collection, model training, and deployment.
When training datasets are collected or curated by people who belong to a particular demographic group, those datasets tend to contain more varied and nuanced representations of the in-group and more stereotypical or homogeneous representations of out-groups. This is a structural manifestation of out-group homogeneity bias. For example, the existing benchmark datasets examined in the Gender Shades audit were composed of 79.6% to 86.2% lighter-skinned subjects; the IJB-A benchmark contained only 4.4% darker-skinned women against 59.4% lighter-skinned men.[8] When a group is represented by few and similar examples, a model has little basis on which to learn the within-group variation that distinguishes its members.
The Gender Shades study by Joy Buolamwini and Timnit Gebru (2018) evaluated the commercial gender-classification systems of Microsoft, IBM, and Face++ (Megvii) and found that darker-skinned females were misclassified at rates of 20.8%, 34.5%, and 34.7% across the three systems, rising to 46.5% and 46.8% for the darkest-skinned women (Fitzpatrick type VI) on two of them. The maximum error rate for lighter-skinned males was 0.8%; for the best-performing classifier, darker-skinned females were 32 times more likely to be misclassified than lighter-skinned males.[8] The authors traced these gaps to the skewed benchmarks and released the more balanced Pilot Parliaments Benchmark (44.4% female, 47% darker-skinned) to expose them. Such disparities reflect how systems trained predominantly on majority-group faces treat minority-group faces as more interchangeable.
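Disaggregated evaluation of this kind can be sketched in a few lines. The example below assumes each test record carries a subgroup label, a true label, and a predicted label; the records are invented and are not Gender Shades data, and the function names are illustrative. It shows how an overall error rate can look low while one subgroup's error rate is far higher.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Error rate per subgroup from (group, true_label, predicted_label) records."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def overall_error_rate(records):
    """Single aggregate error rate that ignores subgroup membership."""
    records = list(records)
    return sum(truth != predicted for _, truth, predicted in records) / len(records)


# Invented test set: 90 records from one subgroup, 10 from another.
records = (
    [("lighter_male", "M", "M")] * 89
    + [("lighter_male", "M", "F")] * 1
    + [("darker_female", "F", "F")] * 7
    + [("darker_female", "F", "M")] * 3
)

if __name__ == "__main__":
    print("overall:", overall_error_rate(records))
    for group, rate in error_rates_by_group(records).items():
        print(f"{group}: {rate:.3f}")
```

Reporting the per-group dictionary alongside the aggregate is the design choice that matters: a skewed test set lets the majority subgroup dominate the aggregate number.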
The NIST Face Recognition Vendor Test Part 3 (2019) evaluated 189 algorithms from 99 developers using 18 million images of 8 million people, and found that false positive rates often varied by a factor of 10 to 100 across demographic groups.[9] African Americans and Asian Americans showed elevated false positives in one-to-one matching, while the American Indian group showed the highest rates among U.S.-developed algorithms; in one-to-many search, false positives were highest for African American women, the scenario most likely to produce a wrongful identification. Notably, several algorithms developed in Asian countries did not show the same elevated error rates on East Asian faces, which NIST suggested points to training-data composition rather than an inherent algorithmic limitation.[9]
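A false positive differential in one-to-one matching can be illustrated with a toy calculation. The sketch below assumes a verification system that outputs a similarity score for each pair of images and accepts the pair as the same person when the score meets a single shared threshold. The scores, group names, and threshold are invented for illustration and do not come from the NIST report.

```python
def false_match_rate(impostor_scores, threshold):
    """Share of impostor (different-person) pairs scored at or above the threshold."""
    if not impostor_scores:
        raise ValueError("need at least one impostor score")
    matches = sum(score >= threshold for score in impostor_scores)
    return matches / len(impostor_scores)


# Invented similarity scores for pairs of *different* people within each group.
# Higher scores for group_b mean the model sees its members as more alike.
impostor_scores = {
    "group_a": [0.10, 0.20, 0.15, 0.30, 0.25, 0.05, 0.35, 0.20, 0.10, 0.62],
    "group_b": [0.40, 0.55, 0.61, 0.70, 0.30, 0.65, 0.50, 0.45, 0.68, 0.35],
}
THRESHOLD = 0.60  # one operating threshold applied to every group

fmr = {
    group: false_match_rate(scores, THRESHOLD)
    for group, scores in impostor_scores.items()
}

if __name__ == "__main__":
    for group, rate in fmr.items():
        print(f"{group}: false match rate = {rate:.2f}")
    print("ratio:", fmr["group_b"] / fmr["group_a"])
```

When a model separates members of one group less well, impostor scores for that group sit closer to the threshold, so the same threshold produces more false matches for that group.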
| System / study | Year | Finding |
|---|---|---|
| Gender Shades (Buolamwini & Gebru)[8] | 2018 | Up to 34.7% error rate on darker-skinned females vs. 0.8% on lighter-skinned males (32x gap on the best classifier) |
| NIST FRVT Part 3[9] | 2019 | 10x to 100x false positive rate differentials across race and gender groups (189 algorithms, 99 developers) |
| Hughes et al. (fMRI study)[10] | 2019 | Neural face-selective regions show reduced sensitivity to other-race face variability |
| Reggev et al. (preregistered fMRI replication)[19] | 2020 | Fusiform face area fails to distinguish two different out-group faces |
Recent research has shown that large language models reproduce out-group homogeneity patterns in their text generation.
Lee, Montgomery, and Lai (2024) generated 52,000 text completions (500 each across 13 writing prompts and 8 intersectional demographic groups) using GPT-3.5-turbo. They measured homogeneity by computing pairwise cosine similarity between BERT-based sentence embeddings, where higher similarity means more uniform output.[12]
The same research group (Lee, 2025) tested whether adjusting hyperparameters such as sampling temperature and top-p could eliminate homogeneity bias in GPT-4. The bias persisted in 19 out of 20 hyperparameter configurations tested. Gender bias proved more resistant to hyperparameter tuning than racial bias, and the effects were non-linear: increasing temperature to extreme values sometimes worsened the bias rather than reducing it.[13] A follow-up study (Lee and Jeon, 2025) tested the mechanistic hypothesis that the bias is simply an artifact of the model being less "uncertain" (lower token-sampling entropy) when writing about minority groups. Across six LLMs they found robust homogeneity bias in sentence similarity but very little corresponding difference in token-sampling uncertainty, concluding that inference-time fixes like temperature scaling will not address the problem and that interventions must target representation learning and training data instead.[20]
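The link between sampling temperature and token-sampling uncertainty can be made concrete. The sketch below assumes a toy next-token distribution defined by four invented logits, applies temperature scaling before the softmax, and reports Shannon entropy in bits. It illustrates the quantities involved and does not reproduce the measurements in the studies cited above.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


TOY_LOGITS = [2.0, 1.0, 0.5, 0.1]            # invented scores for four candidate tokens

if __name__ == "__main__":
    for temperature in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(TOY_LOGITS, temperature)
        print(f"T={temperature}: entropy={shannon_entropy(probs):.3f} bits")
```

Raising the temperature flattens the token distribution and raises entropy at each step, but that is a property of individual sampling steps; the finding reported above is that similarity between whole generated texts can remain biased even when this step-level uncertainty barely differs across groups.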
The pattern is not confined to text-only models. Lee and Jeon (2024) extended the finding to vision-language models: prompting VLMs to write stories about computer-generated faces that varied in racial phenotypicality, they found that the models produced more homogeneous stories for individuals with more pronounced Black phenotypic features, and that stories about Black women were consistently more uniform than those about Black men.[21]
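The homogeneity measure used in the text-completion studies, pairwise cosine similarity between sentence embeddings, can be sketched as follows. The example assumes embeddings have already been computed by some sentence encoder and are available as plain lists of floats; the three-dimensional vectors below are invented stand-ins, and the function names are illustrative.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Invented embeddings of texts generated about two hypothetical groups.
varied_group = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform_group = [[1.0, 0.1, 0.0], [1.0, 0.0, 0.1], [0.9, 0.1, 0.1]]

if __name__ == "__main__":
    print("varied: ", round(mean_pairwise_similarity(varied_group), 3))
    print("uniform:", round(mean_pairwise_similarity(uniform_group), 3))
```

A higher mean for one group's texts than another's, under the same prompts, is what these studies read as homogeneity bias.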
Word embeddings trained on large text corpora encode societal biases, including patterns related to group homogeneity. Bolukbasi et al. (2016) showed that word2vec embeddings trained on Google News text, with a vocabulary of about three million words and phrases, associated professions like "doctor" and "programmer" with male, and "nurse" and "homemaker" with female, to the point that the analogy "man is to computer programmer as woman is to X" returned "homemaker."[11] These embeddings compress demographic groups into stereotypical representations, a computational analogue of out-group homogeneity bias where the diversity within a group is replaced by a single stereotypical profile. The authors also showed that the gender bias lies along an identifiable direction in the vector space, which let them propose a debiasing transformation that removed stereotypical associations while preserving legitimate ones such as "queen" being female.[11]
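The idea of removing a bias direction can be illustrated with a simplified projection step. The sketch below assumes the gender direction is approximated by the difference between two toy vectors standing in for "he" and "she", a simplification adopted here for brevity, and that the word being adjusted is one treated as gender-neutral. All vectors are invented three-dimensional examples, not real word2vec embeddings, and the function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]


def neutralize(word_vec, direction):
    """Remove the component of word_vec along direction, then renormalize."""
    g = normalize(direction)
    projection = dot(word_vec, g)
    residual = [w - projection * gi for w, gi in zip(word_vec, g)]
    return normalize(residual)


# Invented toy vectors; a real embedding would have hundreds of dimensions.
he = [1.0, 0.2, 0.1]
she = [-1.0, 0.2, 0.1]
programmer = [0.4, 0.8, 0.3]

gender_direction = [a - b for a, b in zip(he, she)]
debiased_programmer = neutralize(programmer, gender_direction)

if __name__ == "__main__":
    g = normalize(gender_direction)
    print("before:", round(dot(normalize(programmer), g), 3))
    print("after: ", round(dot(debiased_programmer, g), 3))
```

Words whose gender association is definitional, such as "queen", would be left out of this step so that legitimate associations survive.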
Amazon developed an automated resume screening system starting in 2014, trained on ten years of company resumes. Because the tech industry is male-dominated, most of those resumes came from men, and the system learned to penalize resumes containing terms like "women's" (as in "women's chess club captain") and the names of certain all-women's colleges, while favoring verbs such as "executed" and "captured" that appeared more often on male engineers' resumes. The system treated female applicants as an undifferentiated out-group, failing to individuate their qualifications. Amazon edited the program to neutralize those terms, but that did not guarantee the system would not find other discriminatory proxies, and the company abandoned the project, as Reuters reported in 2018.[22]
Out-group homogeneity bias is connected to several other cognitive and algorithmic biases.
| Related bias | Relationship to out-group homogeneity bias |
|---|---|
| In-group bias | The complementary tendency to favor one's own group. Out-group homogeneity and in-group favoritism often co-occur but are distinct phenomena. |
| Confirmation bias | Tendency to notice information that confirms existing stereotypes about out-groups, reinforcing the perception of homogeneity |
| Sampling bias | When training data overrepresents one group and underrepresents others, the model has fewer exemplars for the underrepresented group, mirroring the cognitive mechanism of out-group homogeneity |
| Group attribution bias | Tendency to generalize from one member's behavior to the entire group, which is stronger when the group is perceived as homogeneous |
| Automation bias | Tendency to trust algorithmic outputs uncritically, which can amplify harm when those outputs reflect out-group homogeneity bias |
| Out-group polarization | Linville and Jones (1980) found that out-group members receive more extreme evaluations (both positive and negative), a consequence of simplified mental representations with fewer evaluative dimensions[23] |
Addressing out-group homogeneity bias requires interventions at both the human-cognitive level and the technical-algorithmic level.
| Strategy | Domain | Mechanism |
|---|---|---|
| Intergroup contact | Human cognition | Adds exemplars, increases perceived variability |
| Individuation training | Human cognition | Shifts processing from categorical to individuated |
| Dataset diversification | ML/AI | Provides more exemplars of underrepresented groups |
| Fairness auditing | ML/AI | Detects differential performance across groups |
| Social contact debiasing | LLMs | Instruction-tunes models with simulated positive contact |
| Disaggregated evaluation | ML/AI | Prevents aggregate metrics from hiding group disparities |
| Synthetic data balancing | Computer vision | Supplements real data with balanced synthetic exemplars |