# Out-Group Homogeneity Bias

> Source: https://aiwiki.ai/wiki/out-group_homogeneity_bias
> Updated: 2026-06-28
> Categories: AI Ethics, Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Out-group homogeneity bias**, also called the **out-group homogeneity effect**, is the [cognitive bias](/wiki/bias) in which people perceive members of an out-group as more similar to one another than members of their own in-group.[^1] In everyday terms, it is the "they all look alike" or "they're all the same" tendency: individuals see their own group as varied and composed of distinct people, while they compress the members of other groups into a single, undifferentiated mass. The effect has been documented since the early 1980s across racial, ethnic, gender, age, political, occupational, and even arbitrary laboratory-created groups, and it is now recognized as a measurable failure mode of [artificial intelligence](/wiki/artificial_intelligence) systems as well.[^4]

The concept appears explicitly in Google's machine learning fairness glossary, which defines it as "the tendency to see out-group members as more alike than in-group members when comparing attitudes, values, personality traits, and other characteristics," and classifies it as a form of [group attribution bias](/wiki/group_attribution_bias). The glossary illustrates the idea with the contrast that members of a group might describe each other's homes in vivid detail while declaring that out-group members "all live in identical houses."[^16]

In artificial intelligence and [machine learning](/wiki/machine_learning), out-group homogeneity bias is significant because it can be encoded in [training data](/wiki/training_set), learned by models, and reproduced in outputs. [Facial recognition](/wiki/image_recognition) systems that fail disproportionately on minority faces, [large language models](/wiki/large_language_model) that portray subordinate groups with less diversity, and hiring algorithms that treat non-majority applicants as interchangeable all reflect this bias in computational form.

## ELI5 (Explain like I'm 5)

Imagine you have a big box of crayons at home. You know every single crayon: the slightly-bent red one, the short blue one with the torn label, the green one that smells weird. Now imagine your friend also has a box of crayons, but you have never opened it. You just think, "Oh, those are just crayons." You do not notice the differences between them the way you notice differences in your own set.

People do the same thing with groups of other people. You see lots of differences among people in your own group (your school, your neighborhood, your team), but when you look at a group you do not belong to, everyone in that group seems the same to you. This is not because they really are all the same; it is because you have not paid close enough attention to notice their differences.

When computers learn from data that humans created, they can pick up this same mistake. A computer trained mostly on photos of one group of people might have trouble telling apart people from a different group, just like how you might think all the crayons in your friend's box look the same.

## What research established out-group homogeneity bias?

Research on the out-group homogeneity effect began in the early 1980s, although related observations about [stereotyping](/wiki/bias) and intergroup perception date back further.

### Quattrone and Jones (1980)

George Quattrone and Edward E. Jones provided one of the earliest experimental demonstrations of the effect.[^1] In their core experiment, Rutgers and Princeton undergraduates watched a videotaped person make a choice and were then asked to estimate what percentage of students at that person's university would make the same choice. Participants generalized far more readily from a single behavior to the whole group when that group was an out-group (the other university) than when it was their own in-group. Quattrone and Jones framed this as an "implication for the law of small numbers": observers treat a small, possibly unrepresentative sample of out-group behavior as if it reliably characterizes the entire group, which is exactly the inferential error that follows from perceiving an out-group as homogeneous.[^1]

A closely associated study by Jones, Wood, and Quattrone (1981) supplied the often-cited eating-club demonstration. Members of four Princeton eating clubs rated their own club as more heterogeneous on a series of personal characteristics than they rated the three other clubs. Notably, this in-group/out-group difference was unrelated to how many in-group or out-group members a participant actually knew, an early hint that mere familiarity does not fully explain the effect.[^17]

### Park and Rothbart (1982)

Bernadette Park and Myron Rothbart conducted four experiments that established the out-group homogeneity effect as a robust phenomenon.[^2] In one study, members of three University of Oregon sororities (roughly 90 participants) each rated their own sorority and two rival sororities on ten dimensions such as studiousness, partying frequency, and group cohesiveness, plus an overall intragroup-similarity judgment. Members of every group saw themselves as more internally varied and the other two as more uniform, while also attributing favorable traits more strongly to their own group. Complementary memory experiments in the same paper showed that people remember less individuating, more category-level information about out-group members than about in-group members, which is consistent with the perception of out-group uniformity. This work is often cited as the foundational empirical demonstration of the effect, and its title names the bias directly: "Perception of out-group homogeneity and levels of social categorization."[^2]

### Linville, Fischer, and Salovey (1989)

Patricia Linville, Gregory Fischer, and Peter Salovey advanced the **exemplar-based model** of perceived group variability.[^4] They proposed that people form mental representations of groups by storing individual exemplars (remembered instances of group members). Because individuals typically encounter more exemplars of their in-group, their mental representation of the in-group contains more instances and therefore more variability. Across four experiments with groups defined by age and by nationality, greater familiarity with a group predicted both greater perceived differentiation among its members and greater perceived variance. Their PDIST (perceived distributions) model activates a set of stored exemplars and judges the relative likelihood of different feature values from their activation strengths; a computer simulation confirmed that this mechanism is sufficient to reproduce the empirical pattern, with smaller exemplar samples naturally producing less perceived variability.[^4]
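
The sample-size argument can be illustrated with a small simulation. The sketch below is not the PDIST model itself. It assumes a hypothetical trait that follows the same normal distribution (mean 50, SD 10) in both groups, and it uses the range of stored exemplars as a stand-in for perceived variability. All function names and parameter values are illustrative assumptions.

```python
import random
import statistics


def perceived_range(exemplars):
    """Range (max - min) of the trait values a perceiver has stored."""
    return max(exemplars) - min(exemplars)


def mean_perceived_range(n_exemplars, n_perceivers=2000, seed=0):
    """Average perceived range across simulated perceivers who each
    store n_exemplars drawn from the same underlying population."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(n_perceivers):
        # Assumption: one shared true distribution for both groups.
        exemplars = [rng.gauss(50, 10) for _ in range(n_exemplars)]
        ranges.append(perceived_range(exemplars))
    return statistics.mean(ranges)


out_group = mean_perceived_range(n_exemplars=5)   # few stored exemplars
in_group = mean_perceived_range(n_exemplars=50)   # many stored exemplars
print(f"mean perceived range with 5 exemplars:  {out_group:.1f}")
print(f"mean perceived range with 50 exemplars: {in_group:.1f}")
```

Because the two simulated groups are statistically identical, any gap in the printed ranges comes only from how many exemplars each perceiver stored.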

### Park and Judd (1990)

Park and Judd published a methodological framework that distinguished two components of perceived group variability: **dispersion** (how spread out group members are on a given trait) and **stereotypicality** (how closely the group as a whole is seen to match its stereotype).[^5] They demonstrated that these two components are statistically independent: a perceiver can rate a group as tightly clustered yet not especially stereotypical, or vice versa. They also introduced several measurement tools, including the range measure and frequency distribution estimates, that became standard in later research.[^5]

| Study | Year | Method | Key finding |
|---|---|---|---|
| Quattrone & Jones[^1] | 1980 | Videotape generalization (Rutgers/Princeton) | Stronger generalization from one behavior to the whole out-group (law of small numbers) |
| Jones, Wood & Quattrone[^17] | 1981 | Eating club trait ratings | Own club rated more heterogeneous than three rival clubs, unrelated to acquaintance |
| Park & Rothbart[^2] | 1982 | Sorority member ratings | Each sorority saw itself as more diverse than outsiders saw it |
| Simon & Brown[^15] | 1987 | Minimal group paradigm | Minority group members accentuate in-group homogeneity for distinctiveness |
| Linville, Fischer & Salovey[^4] | 1989 | Exemplar-based PDIST model | Fewer stored exemplars produce lower perceived variability |
| Park & Judd[^5] | 1990 | Methodological framework | Distinguished dispersion from stereotypicality as components of perceived variability |
| Boldry, Gaertner & Quinn[^7] | 2007 | Meta-analysis of measures | Measurement method moderates the size and direction of homogeneity effects |

## Why do people perceive out-groups as more homogeneous?

Several theoretical accounts explain why people perceive out-groups as more homogeneous than in-groups.

### Differential familiarity

The simplest explanation is that people have more experience with in-group members and therefore encode more individuating information about them. With less exposure to out-group members, people rely on category-level representations, which are inherently less detailed. However, familiarity alone does not fully account for the effect. Men and women interact frequently, yet gender-based out-group homogeneity persists, suggesting that additional mechanisms are involved.

### Subgroup complexity

Research indicates that people spontaneously organize in-group members into meaningful subgroups (athletes, bookworms, artists, and so on) but generate fewer subgroups for out-groups. When experimenters asked participants to learn about a novel group by organizing members into subgroups, perceived variability increased substantially. When the number of generated subgroups was statistically controlled, differences in perceived group variability between in-groups and out-groups disappeared. This suggests that the cognitive organization of group knowledge, not just the amount of it, drives the effect.

### Self-categorization theory

Self-categorization theory (Turner et al., 1987) argues that when people perceive an out-group, they operate in an **intergroup context**: they attend primarily to the differences between their group and the other group, which draws attention away from differences among out-group members.[^26] When perceiving the in-group, people can shift to an **intragroup context**, where within-group differences become salient. This context-dependent processing explains why even minimal groups (arbitrary, lab-created groupings with no history) show the out-group homogeneity effect.

### Social identity theory

[Social identity theory](/wiki/bias) (Tajfel & Turner, 1979) holds that people derive part of their self-concept from the groups they belong to.[^3] Perceiving the out-group as homogeneous reinforces the distinctiveness of the in-group and protects positive social identity. Simon and Brown (1987) found that minority group members sometimes accentuate in-group homogeneity on group-defining dimensions, a strategy that strengthens group cohesion and collective identity even at the cost of reversing the typical pattern.[^15]

### Exemplar retrieval

According to exemplar models of [semantic memory](/wiki/word_embedding), perceptions of groups are formed by retrieving previously encountered instances. When a person encounters a new out-group member, they compare that individual to a small set of stored exemplars. Because the exemplar pool for out-groups is typically smaller and less varied, the resulting impression of the group is more homogeneous. Linville and Fischer's (1993) computational comparison of exemplar-based and abstraction-based accounts confirmed that even unbiased sampling from a smaller pool produces lower perceived variability.[^6]

| Mechanism | Core claim | Limitation |
|---|---|---|
| Differential familiarity | Less contact with out-group produces fewer individuating memories | Cannot explain homogeneity effects for highly familiar out-groups (e.g., gender) |
| Subgroup complexity | In-group knowledge is organized into more subgroups | Does not explain why subgroup generation differs in the first place |
| Self-categorization theory | Intergroup context suppresses within-group differentiation | Predicts in-group homogeneity in some conditions, complicating the picture |
| Social identity theory | Homogeneity perception serves identity-protection motives | Motivational component is difficult to measure directly |
| Exemplar retrieval | Smaller exemplar sets yield lower perceived variability | Assumes people store and retrieve exemplars rather than abstractions |

## How is perceived group variability measured?

Researchers have used several distinct methods to quantify perceived group variability. Each method captures a slightly different aspect of homogeneity perception.

- **Range estimates**: Participants estimate the range (minimum to maximum) of a trait within a group. Narrower ranges indicate greater perceived homogeneity.
- **Variance or standard deviation estimates**: Participants directly estimate how spread out a group is on a given dimension.
- **Percentage estimates (stereotypicality)**: Participants estimate what percentage of a group possesses a given trait. Higher percentages for stereotypical traits and lower percentages for counterstereotypical traits indicate greater perceived homogeneity.
- **Probability of differentiation (Pd)**: This measure calculates the likelihood that two randomly selected group members would differ on a given attribute. Lower Pd values indicate greater perceived homogeneity. The measure originates in Linville and colleagues' work on perceived distributions, and Park and Judd (1990) evaluated it alongside other variability measures.
- **Similarity ratings**: Participants rate how similar group members are to one another. Higher within-group similarity ratings indicate greater perceived homogeneity.
- **Trait attribution breadth**: Participants list traits they believe characterize a group. Fewer distinct traits attributed to a group suggest greater perceived homogeneity.
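
Two of these measures are simple enough to compute by hand. The probability of differentiation is commonly written as Pd = 1 − Σ pᵢ², where pᵢ is the proportion of group members a participant places at level i of the attribute. The sketch below applies that formula and a range estimate to hypothetical frequency distributions. The counts are invented for illustration and do not come from any cited study.

```python
def probability_of_differentiation(counts):
    """Pd = 1 - sum(p_i ** 2): the chance that two members drawn at
    random (with replacement) fall at different attribute levels."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must include at least one member")
    return 1.0 - sum((c / total) ** 2 for c in counts)


def perceived_range(levels, counts):
    """Distance between the lowest and highest level that the
    participant assigned any members to."""
    used = [lvl for lvl, c in zip(levels, counts) if c > 0]
    return max(used) - min(used)


levels = [1, 2, 3, 4, 5]
# Hypothetical estimates: how many of 100 members sit at each level.
in_group_counts = [15, 20, 30, 20, 15]
out_group_counts = [0, 10, 80, 10, 0]

pd_in = probability_of_differentiation(in_group_counts)
pd_out = probability_of_differentiation(out_group_counts)
print(f"Pd in-group:  {pd_in:.3f}")
print(f"Pd out-group: {pd_out:.3f}")
print("range in-group: ", perceived_range(levels, in_group_counts))
print("range out-group:", perceived_range(levels, out_group_counts))
```

In this toy case both measures point the same way, but as the meta-analytic evidence shows, different measures need not agree in practice.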

Boldry, Gaertner, and Quinn (2007) conducted a meta-analysis of 177 effect sizes from 173 independent samples (12,078 participants) and 11 distinct measures of perceived variability. They found a small but reliable overall tendency toward out-group homogeneity that was stronger for real ("non-minimal") groups than for arbitrary minimal groups. Critically, the choice of measurement method significantly moderated the result: some measures yielded out-group homogeneity while others, in the same contexts, yielded in-group homogeneity, and these discrepancies themselves varied with the relative status of the groups.[^7]

## What is the cross-race effect?

The **cross-race effect** (also called the own-race bias or other-race effect) is a closely related phenomenon in which people show superior recognition memory for faces of their own race compared to faces of other races. This effect has direct connections to out-group homogeneity bias: if other-race faces are encoded with less individuating detail, they become harder to tell apart, which is the perceptual signature of perceived homogeneity. The effect is well documented and emerges developmentally: children recognize own- and other-race faces equally well at age six but show the effect by around age ten.[^18]

Hughes et al. (2019) used functional MRI to demonstrate that the neural basis of this effect lies in the earliest stages of face perception.[^10] In an fMRI adaptation paradigm using White (in-group) and Black (out-group) faces, face-selective brain regions (including the fusiform face area) showed reduced sensitivity to variability among other-race faces compared to own-race faces. In other words, the brain processes out-group faces with less differentiation at a perceptual level, not just at the level of memory or judgment.[^10] An independent, preregistered study by Reggev, Brodie, Cikara, and Mitchell (2020) conceptually replicated this result, reporting that the fusiform face area produced as much repetition suppression for two different Black individuals as it did for two identical faces, while it clearly distinguished different White faces. The convergence of two labs strengthens the claim that out-group homogeneity has an early perceptual basis.[^19]

Social-cognitive theories propose that faces are rapidly classified as in-group or out-group. In-group faces receive **individuated processing** (attention to unique features), while out-group faces receive **categorical processing** (attention only to category-defining features). This differential processing mode produces both the cross-race recognition deficit and the broader perception of out-group homogeneity.

## How does out-group homogeneity bias affect AI and machine learning?

The out-group homogeneity bias has direct consequences for [AI systems](/wiki/artificial_intelligence) at multiple stages of the ML pipeline: data collection, model training, and deployment.

### Training data imbalances

When training datasets are collected or curated by people who belong to a particular demographic group, those datasets tend to contain more varied and nuanced representations of the in-group and more stereotypical or homogeneous representations of out-groups. This is a structural manifestation of out-group homogeneity bias. For example, the Gender Shades audit found that two widely used [facial recognition](/wiki/image_recognition) benchmark [datasets](/wiki/data_set_or_dataset), IJB-A and Adience, were 79.6% and 86.2% lighter-skinned subjects respectively; IJB-A contained only 4.4% darker-skinned women against 59.4% lighter-skinned men.[^8] When a group is represented by few and similar examples, a model has little basis on which to learn the within-group variation that distinguishes its members.

### Facial recognition disparities

The Gender Shades study by Joy Buolamwini and Timnit Gebru (2018) evaluated the commercial gender-classification systems of Microsoft, IBM, and Face++ (Megvii) and found that darker-skinned females were misclassified at rates of 20.8% (Microsoft), 34.7% (IBM), and 34.5% (Face++), rising to 46.5% and 46.8% for the darkest-skinned women (Fitzpatrick type VI) on two of them. The maximum error rate for lighter-skinned males was 0.8%; for the best-performing classifier, darker-skinned females were 32 times more likely to be misclassified than lighter-skinned males.[^8] The authors traced these gaps to the skewed benchmarks and released the more balanced Pilot Parliaments Benchmark (44.4% female, 47% darker-skinned) to expose them. Such disparities reflect how systems trained predominantly on majority-group faces treat minority-group faces as more interchangeable.

The [NIST](/wiki/artificial_intelligence) Face Recognition Vendor Test Part 3 (2019) evaluated 189 algorithms from 99 developers using 18 million images of 8 million people, and found large demographic differentials in false positive rates.[^9] Of the one-to-one matching results, Patrick Grother, the report's primary author, stated that "the differentials often ranged from a factor of 10 to 100 times, depending on the individual algorithm."[^27] African Americans and Asian Americans showed elevated false positives in one-to-one matching, while the American Indian group showed the highest rates among U.S.-developed algorithms; in one-to-many search, false positives were highest for African American women, the scenario most likely to produce a wrongful identification. Notably, several algorithms developed in Asian countries did not show the same elevated error rates on East Asian faces, which NIST suggested points to training-data composition rather than an inherent algorithmic limitation.[^9]
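
A false positive differential of this kind is the ratio of per-group false positive rates at one fixed decision threshold. The sketch below shows the arithmetic on invented similarity scores for different-person ("impostor") pairs. The group labels, scores, and threshold are hypothetical and are not taken from the NIST report.

```python
def false_positive_rate(impostor_scores, threshold):
    """Share of different-person comparisons whose similarity score
    meets or exceeds the match threshold."""
    if not impostor_scores:
        raise ValueError("need at least one impostor comparison")
    hits = sum(1 for score in impostor_scores if score >= threshold)
    return hits / len(impostor_scores)


# Hypothetical scores: 10,000 impostor comparisons per group.
impostor_scores = {
    "group_a": [0.20] * 9990 + [0.95] * 10,
    "group_b": [0.20] * 9900 + [0.95] * 100,
}
THRESHOLD = 0.90  # assumed operating point, shared by both groups

fpr = {
    group: false_positive_rate(scores, THRESHOLD)
    for group, scores in impostor_scores.items()
}
# Ratio of the worst to the best group (undefined if the best FPR is 0).
differential = max(fpr.values()) / min(fpr.values())

for group, rate in fpr.items():
    print(f"{group}: FPR = {rate:.4f}")
print(f"differential: {differential:.1f}x")
```

Holding the threshold constant across groups is the key design choice: a single global threshold is what lets one group absorb far more false matches than another.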

| System / study | Year | Finding |
|---|---|---|
| Gender Shades (Buolamwini & Gebru)[^8] | 2018 | Up to 34.7% error rate on darker-skinned females vs. 0.8% on lighter-skinned males (32x gap on the best classifier) |
| [NIST](/wiki/artificial_intelligence) FRVT Part 3[^9] | 2019 | 10x to 100x false positive rate differentials across race and gender groups (189 algorithms, 99 developers) |
| Hughes et al. (fMRI study)[^10] | 2019 | Neural face-selective regions show reduced sensitivity to other-race face variability |
| Reggev et al. (preregistered fMRI replication)[^19] | 2020 | Fusiform face area fails to distinguish two different out-group faces |

### Homogeneity bias in large language models

Recent research has shown that [large language models](/wiki/large_language_model) reproduce out-group homogeneity patterns in their text generation.

Lee, Montgomery, and Lai (2024) generated 52,000 text completions (500 each across 13 writing prompts and 8 intersectional demographic groups) using [GPT](/wiki/gpt_generative_pre-trained_transformer)-3.5-turbo. They measured homogeneity by computing pairwise cosine similarity between BERT-based sentence [embeddings](/wiki/embeddings), where higher similarity means more uniform output.[^12] In their FAccT 2024 paper, the authors report: "We consistently found that ChatGPT portrayed African, Asian, and Hispanic Americans as more homogeneous than White Americans, indicating that the model described racial minority groups with a narrower range of human experience. ChatGPT also portrayed women as more homogeneous than men, but these differences were small."[^12] Key findings included:

- Texts about African Americans were 0.33 standard deviations more homogeneous than texts about White Americans
- Texts about Asian Americans were 0.31 standard deviations more homogeneous
- Texts about Hispanic Americans were 0.18 standard deviations more homogeneous
- Texts about women were 0.037 standard deviations more homogeneous than texts about men
- The bias persisted across all six alternative measurement strategies and remained even when prompts explicitly excluded hardship and adversity, ruling out the simplest topic-based explanation[^12]
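
The core of this measurement, mean pairwise cosine similarity within each group's texts, can be sketched in a few lines. The example below is a simplified illustration rather than the authors' pipeline: it replaces sentence embeddings with tiny hand-written vectors, and it standardizes the gap by the pooled standard deviation of all similarities instead of fitting the statistical models used in the paper. All data and names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(embeddings):
    """Cosine similarity for every unordered pair of texts in a group."""
    return [cosine_similarity(u, v) for u, v in combinations(embeddings, 2)]


# Toy stand-ins for sentence embeddings of generated texts.
group_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
group_b = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1], [0.8, 0.1, 0.1]]

sims_a = pairwise_similarities(group_a)
sims_b = pairwise_similarities(group_b)

# Express the gap in standard-deviation units of the pooled similarities.
pooled_sd = pstdev(sims_a + sims_b)
standardized_gap = (mean(sims_b) - mean(sims_a)) / pooled_sd

print(f"mean similarity, group A: {mean(sims_a):.3f}")
print(f"mean similarity, group B: {mean(sims_b):.3f}")
print(f"group B is {standardized_gap:.2f} SD more homogeneous")
```

A positive standardized gap means the second group's texts sit closer together in embedding space, which is the operational definition of "more homogeneous" used here.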

The same research group (Lee, 2025) tested whether adjusting [hyperparameters](/wiki/hyperparameter) such as sampling [temperature](/wiki/temperature) and top-p could eliminate homogeneity bias in [GPT-4](/wiki/gpt4). The bias persisted in 19 out of 20 hyperparameter configurations tested. Gender bias proved more resistant to hyperparameter tuning than racial bias, and the effects were non-linear: increasing temperature to extreme values sometimes worsened the bias rather than reducing it.[^13] A follow-up study (Lee and Jeon, 2025) tested the mechanistic hypothesis that the bias is simply an artifact of the model being less "uncertain" (lower token-sampling entropy) when writing about minority groups. Across six LLMs they found robust homogeneity bias in sentence similarity but very little corresponding difference in token-sampling uncertainty, concluding that inference-time fixes like temperature scaling will not address the problem and that interventions must target representation learning and training data instead.[^20]
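
To make the "uncertainty" hypothesis concrete, the sketch below shows how sampling temperature reshapes a next-token distribution and how Shannon entropy summarizes the resulting uncertainty. The logits are invented, and the functions are a generic illustration of temperature scaling, not code from the cited studies.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means more sampling uncertainty."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]

entropies = {
    t: entropy(softmax_with_temperature(logits, t)) for t in (0.5, 1.0, 1.5)
}
for t, h in entropies.items():
    print(f"temperature {t}: entropy = {h:.3f} nats")
```

Raising the temperature flattens the distribution and raises token-level entropy, yet the findings above indicate that sentence-level homogeneity differences remain, which is why the authors look beyond inference-time settings.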

The pattern is not confined to text-only models. Lee and Jeon (2024) extended the finding to vision-language models: prompting VLMs to write stories about computer-generated faces that varied in racial phenotypicality, they found that the models produced more homogeneous stories for individuals with more pronounced Black phenotypic features, and that stories about Black women were consistently more uniform than those about Black men.[^21]

### Word embeddings and representational bias

[Word embeddings](/wiki/word_embedding) trained on large text corpora encode societal biases, including patterns related to group homogeneity. Bolukbasi et al. (2016) showed that word2vec embeddings with a vocabulary of three million words, trained on Google News text, associated professions like "doctor" and "programmer" with male, and "nurse" and "homemaker" with female, to the point that the analogy "man is to computer programmer as woman is to X" returned "homemaker."[^11] These embeddings compress demographic groups into stereotypical representations, a computational analogue of out-group homogeneity bias where the diversity within a group is replaced by a single stereotypical profile. The authors also showed that the gender bias lies along an identifiable direction in the vector space, which let them propose a debiasing transformation that removed stereotypical associations while preserving legitimate ones such as "queen" being female.[^11]
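
The idea of a bias direction can be illustrated with a projection. The sketch below is a deliberately reduced version of the approach: it assumes toy three-dimensional vectors, estimates the direction from a single "he"/"she" pair rather than from several definitional pairs, and omits the renormalization and equalization steps. The vectors and helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def neutralize(word_vec, direction):
    """Remove the component of word_vec that lies along the (unit-length)
    bias direction, leaving the rest of the vector unchanged."""
    projection = dot(word_vec, direction)
    return [w - projection * d for w, d in zip(word_vec, direction)]


# Toy embeddings; real embeddings have hundreds of dimensions.
he = [1.0, 0.2, 0.1]
she = [-1.0, 0.2, 0.1]
programmer = [0.4, 0.7, 0.3]

# Assumption: one definitional pair is enough to define the direction.
gender_direction = normalize([a - b for a, b in zip(he, she)])

before = dot(programmer, gender_direction)
debiased = neutralize(programmer, gender_direction)
after = dot(debiased, gender_direction)

print(f"gender component before: {before:.2f}")
print(f"gender component after:  {after:.2f}")
```

Only words that should be gender-neutral are neutralized in this way; definitional words such as "queen" are left with their gender component intact.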

### Hiring and decision-making algorithms

Amazon developed an automated resume screening system starting in 2014, trained on ten years of company resumes. Because the tech industry is male-dominated, most of those resumes came from men, and the system learned to penalize resumes containing terms like "women's" (as in "women's chess club captain") and the names of certain all-women's colleges, while favoring verbs such as "executed" and "captured" that appeared more often on male engineers' resumes. The system treated female applicants as an undifferentiated out-group, failing to individuate their qualifications. As Reuters reported in 2018, Amazon abandoned the project because editing the program to neutralize those terms could not guarantee that it would not find other discriminatory proxies.[^22]

## How does out-group homogeneity bias relate to other biases?

Out-group homogeneity bias is connected to several other cognitive and algorithmic biases.

| Related bias | Relationship to out-group homogeneity bias |
|---|---|
| [In-group bias](/wiki/in-group_bias) | The complementary tendency to favor one's own group. Out-group homogeneity and in-group favoritism often co-occur but are distinct phenomena. |
| [Confirmation bias](/wiki/confirmation_bias) | Tendency to notice information that confirms existing stereotypes about out-groups, reinforcing the perception of homogeneity |
| [Sampling bias](/wiki/sampling_bias) | When training data overrepresents one group and underrepresents others, the model has fewer exemplars for the underrepresented group, mirroring the cognitive mechanism of out-group homogeneity |
| [Group attribution bias](/wiki/group_attribution_bias) | Tendency to generalize from one member's behavior to the entire group, which is stronger when the group is perceived as homogeneous |
| [Automation bias](/wiki/automation_bias) | Tendency to trust algorithmic outputs uncritically, which can amplify harm when those outputs reflect out-group homogeneity bias |
| Out-group polarization | Linville and Jones (1980) found that out-group members receive more extreme evaluations (both positive and negative), a consequence of simplified mental representations with fewer evaluative dimensions[^23] |

## How can out-group homogeneity bias be reduced?

Addressing out-group homogeneity bias requires interventions at both the human-cognitive level and the technical-algorithmic level.

### In human cognition

- **Intergroup contact**: The contact hypothesis (Allport, 1954) predicts that meaningful interaction between groups reduces prejudice and stereotyping when four conditions hold: equal status, common goals, cooperation rather than competition, and institutional support. Increased personal contact with out-group members adds exemplars to one's mental representation, increasing perceived variability.[^24]
- **Individuation training**: Encouraging people to attend to individuating features of out-group members (rather than category-level features) has been shown to reduce both the cross-race effect and broader out-group homogeneity perceptions.
- **Subgroup awareness**: Asking people to identify subgroups within an out-group increases perceived variability. This directly targets the subgroup complexity mechanism.
- **Perspective-taking**: Actively imagining the perspective of an out-group member can shift processing from categorical to individuated mode.

### In AI and ML systems

- **Dataset diversification**: Ensuring that training data contains balanced, varied representation from all demographic groups reduces the structural analogue of out-group homogeneity in the data itself.
- **[Fairness](/wiki/fairness_metric) auditing**: Systematic evaluation of model performance across demographic subgroups (as in the Gender Shades methodology) can reveal homogeneity-related performance gaps.
- **Disaggregated evaluation**: Reporting model accuracy, [precision](/wiki/precision), [recall](/wiki/recall), and error rates separately for each demographic group, rather than only as aggregate metrics, prevents homogeneity-related disparities from being masked by overall averages.
- **Social contact debiasing (SCD)**: Raj et al. (2024) built a dataset of 108,000 contact-probing prompts to measure bias across 13 social dimensions in three LLMs (LLaMA 2, Tulu, and NousHermes), then proposed instruction-tuning the model on prompts that simulate positive intergroup contact, inspired by the contact hypothesis from social psychology. This approach reduced measured bias in LLaMA 2 by up to 40% in a single epoch of fine-tuning without degrading performance on downstream tasks.[^14]
- **Adversarial testing**: Probing models with inputs designed to reveal differential treatment of demographic groups can uncover hidden homogeneity biases before deployment.
- **Synthetic data balancing**: NYU Tandon researchers demonstrated that training facial recognition models on large, demographically balanced synthetic face datasets (with the demographic distribution specified during generation) significantly reduced demographic bias while boosting overall accuracy relative to models trained on existing imbalanced data.[^25]
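
Disaggregated evaluation in particular is straightforward to implement. The sketch below computes accuracy, precision, and recall per group for a binary classifier and compares them with the aggregate accuracy. The records are invented to show how a strong overall number can hide a weak subgroup, and the function names are illustrative.

```python
from collections import defaultdict


def safe_div(numerator, denominator):
    """Return NaN instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else float("nan")


def disaggregated_metrics(records):
    """records: iterable of (group, y_true, y_pred) with 0/1 labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_pred == 1:
            key = "tp" if y_true == 1 else "fp"
        else:
            key = "tn" if y_true == 0 else "fn"
        counts[group][key] += 1

    report = {}
    for group, c in counts.items():
        n = sum(c.values())
        report[group] = {
            "n": n,
            "accuracy": safe_div(c["tp"] + c["tn"], n),
            "precision": safe_div(c["tp"], c["tp"] + c["fp"]),
            "recall": safe_div(c["tp"], c["tp"] + c["fn"]),
        }
    return report


# Hypothetical predictions: group_a dominates the test set.
records = (
    [("group_a", 1, 1)] * 45 + [("group_a", 0, 0)] * 45
    + [("group_b", 1, 1)] * 3 + [("group_b", 0, 0)] * 3
    + [("group_b", 0, 1)] * 2 + [("group_b", 1, 0)] * 2
)

overall_accuracy = sum(t == p for _, t, p in records) / len(records)
report = disaggregated_metrics(records)

print(f"overall accuracy: {overall_accuracy:.2f}")
for group, metrics in report.items():
    print(group, {k: round(v, 2) for k, v in metrics.items()})
```

In this toy data the aggregate accuracy is dominated by the larger group, so the smaller group's much lower accuracy only becomes visible in the per-group report.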

| Strategy | Domain | Mechanism |
|---|---|---|
| Intergroup contact | Human cognition | Adds exemplars, increases perceived variability |
| Individuation training | Human cognition | Shifts processing from categorical to individuated |
| Dataset diversification | ML/AI | Provides more exemplars of underrepresented groups |
| Fairness auditing | ML/AI | Detects differential performance across groups |
| Social contact debiasing | LLMs | Instruction-tunes models with simulated positive contact |
| Disaggregated evaluation | ML/AI | Prevents aggregate metrics from hiding group disparities |
| Synthetic data balancing | Computer vision | Supplements real data with balanced synthetic exemplars |

## See also

- [Bias (ethics and fairness)](/wiki/bias)
- [In-group bias](/wiki/in-group_bias)
- [Fairness (machine learning)](/wiki/fairness_metric)
- [Confirmation bias](/wiki/confirmation_bias)
- [Sampling bias](/wiki/sampling_bias)
- [Group attribution bias](/wiki/group_attribution_bias)
- [Automation bias](/wiki/automation_bias)
- [Demographic parity](/wiki/demographic_parity)
- [Disparate impact](/wiki/disparate_impact)
- [Counterfactual fairness](/wiki/counterfactual_fairness)

## References

1. Quattrone, G. A., & Jones, E. E. (1980). The perception of variability within in-groups and out-groups: Implications for the law of small numbers. *Journal of Personality and Social Psychology*, 38(1), 141-152. https://psycnet.apa.org/record/1980-29699-001
2. Park, B., & Rothbart, M. (1982). Perception of out-group homogeneity and levels of social categorization: Memory for the subordinate attributes of in-group and out-group members. *Journal of Personality and Social Psychology*, 42(6), 1051-1068. https://psycnet.apa.org/record/1982-25281-001
3. Tajfel, H., & Turner, J. C. (1979). An integrative theory of intergroup conflict. In W. G. Austin & S. Worchel (Eds.), *The Social Psychology of Intergroup Relations* (pp. 33-47). Brooks/Cole.
4. Linville, P. W., Fischer, G. W., & Salovey, P. (1989). Perceived distributions of the characteristics of in-group and out-group members: Empirical evidence and a computer simulation. *Journal of Personality and Social Psychology*, 57(2), 165-188. https://pubmed.ncbi.nlm.nih.gov/2760805/
5. Park, B., & Judd, C. M. (1990). Measures and models of perceived group variability. *Journal of Personality and Social Psychology*, 59(2), 173-191. https://psycnet.apa.org/record/1991-04492-001
6. Linville, P. W., & Fischer, G. W. (1993). Exemplar and abstraction models of perceived group variability and stereotypicality. *Social Cognition*, 11(1), 92-125. https://guilfordjournals.com/doi/10.1521/soco.1993.11.1.92
7. Boldry, J. G., Gaertner, L., & Quinn, J. (2007). Measuring the measures: A meta-analytic investigation of the measures of outgroup homogeneity. *Group Processes & Intergroup Relations*, 10(2), 157-178. https://journals.sagepub.com/doi/10.1177/1368430207075153
8. Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional accuracy disparities in commercial gender classification. *Proceedings of Machine Learning Research*, 81, 77-91. https://proceedings.mlr.press/v81/buolamwini18a.html
9. Grother, P., Ngan, M., & Hanaoka, K. (2019). Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects. *NIST Interagency Report 8280*. https://nvlpubs.nist.gov/nistpubs/ir/2019/nist.ir.8280.pdf
10. Hughes, B. L., Camp, N. P., Gomez, J., Natu, V. S., Grill-Spector, K., & Eberhardt, J. L. (2019). Neural adaptation to faces reveals racial outgroup homogeneity effects in early perception. *Proceedings of the National Academy of Sciences*, 116(29), 14532-14537. https://www.pnas.org/doi/10.1073/pnas.1822084116
11. Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. *Advances in Neural Information Processing Systems*, 29, 4349-4357. https://arxiv.org/abs/1607.06520
12. Lee, M. H. J., Montgomery, J. M., & Lai, C. K. (2024). Large language models portray socially subordinate groups as more homogeneous, consistent with a bias observed in humans. *Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT)*, 1321-1340. arXiv:2401.08495. https://arxiv.org/abs/2401.08495
13. Lee, M. H. J. (2025). Examining the robustness of homogeneity bias to hyperparameter adjustments in GPT-4. *arXiv preprint arXiv:2501.02211*. https://arxiv.org/abs/2501.02211
14. Raj, C., Mukherjee, A., Caliskan, A., Anastasopoulos, A., & Zhu, Z. (2024). Breaking bias, building bridges: Evaluation and mitigation of social biases in LLMs via contact hypothesis. *Proceedings of the 2024 AAAI/ACM Conference on AI, Ethics, and Society (AIES)*. arXiv:2407.02030. https://arxiv.org/abs/2407.02030
15. Simon, B., & Brown, R. (1987). Perceived intragroup homogeneity in minority-majority contexts. *Journal of Personality and Social Psychology*, 53(4), 703-711. https://psycnet.apa.org/record/1988-13355-001
16. Google for Developers (2024). Machine Learning Glossary: Responsible AI, "out-group homogeneity bias." https://developers.google.com/machine-learning/glossary/fairness
17. Jones, E. E., Wood, G. C., & Quattrone, G. A. (1981). Perceived variability of personal characteristics in in-groups and out-groups: The role of knowledge and evaluation. *Personality and Social Psychology Bulletin*, 7(3), 523-528. https://journals.sagepub.com/doi/10.1177/014616728173024
18. Chance, J. E., Turner, A. L., & Goldstein, A. G. (1982). Development of differential recognition for own- and other-race faces. *The Journal of Psychology*, 112(1), 29-37. https://www.tandfonline.com/doi/abs/10.1080/00223980.1982.9923531
19. Reggev, N., Brodie, K., Cikara, M., & Mitchell, J. P. (2020). Human face-selective cortex does not distinguish between members of a racial outgroup. *eNeuro*, 7(3), ENEURO.0431-19.2020. https://www.eneuro.org/content/7/3/ENEURO.0431-19.2020
20. Lee, M. H. J., & Jeon, S. (2025). Token sampling uncertainty does not explain homogeneity bias in large language models. *arXiv preprint arXiv:2501.19337*. https://arxiv.org/abs/2501.19337
21. Lee, M. H. J., & Jeon, S. (2024). Vision-language models generate more homogeneous stories for phenotypically Black individuals. *arXiv preprint arXiv:2412.09668*. https://arxiv.org/abs/2412.09668
22. Dastin, J. (2018). Amazon scraps secret AI recruiting tool that showed bias against women. *Reuters*. https://www.reuters.com/article/world/insight-amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK0AG/
23. Linville, P. W., & Jones, E. E. (1980). Polarized appraisals of out-group members. *Journal of Personality and Social Psychology*, 38(5), 689-703. https://psycnet.apa.org/record/1981-12770-001
24. Allport, G. W. (1954). *The Nature of Prejudice*. Addison-Wesley. https://archive.org/details/natureofprejudic0000allp
25. Melzi, P., Rathgeb, C., Tolosana, R., Vera-Rodriguez, R., Morales, A., Lawatsch, D., Domin, F., & Schaubert, M. (2024). Synthetic data for the mitigation of demographic biases in face recognition. *arXiv preprint arXiv:2402.01472*. https://arxiv.org/abs/2402.01472
26. Turner, J. C., Hogg, M. A., Oakes, P. J., Reicher, S. D., & Wetherell, M. S. (1987). *Rediscovering the Social Group: A Self-Categorization Theory*. Blackwell.
27. National Institute of Standards and Technology (2019). NIST study evaluates effects of race, age, sex on face recognition software. NIST News, December 19, 2019. https://www.nist.gov/news-events/news/2019/12/nist-study-evaluates-effects-race-age-sex-face-recognition-software