Disparate impact refers to a legal and statistical concept describing situations where a seemingly neutral policy, practice, or algorithm produces disproportionately adverse outcomes for members of a protected class (such as a racial, ethnic, or gender group), regardless of whether there was any intent to discriminate. Originating in United States employment discrimination law, the concept has become central to algorithmic fairness and the study of bias in machine learning systems.
Unlike disparate treatment, which requires proof of intentional discrimination, disparate impact focuses entirely on outcomes. A hiring algorithm, credit scoring model, or criminal risk assessment tool can violate disparate impact standards even if it was designed without any discriminatory purpose, so long as its results fall disproportionately on a protected group.
The disparate impact doctrine traces its origins to the landmark U.S. Supreme Court case Griggs v. Duke Power Co., 401 U.S. 424 (1971). The case involved thirteen Black employees at Duke Power Company's Dan River Steam Station in Draper, North Carolina. Duke Power had a documented history of racial segregation: Black workers were confined to the Labor Department, where the highest-paid worker earned less than the lowest-paid employee in the four other departments reserved for white workers.
After the passage of the Civil Rights Act of 1964, Duke Power imposed two new requirements for transfer out of the Labor Department: a high school diploma and a minimum score on two standardized aptitude tests. These requirements appeared race-neutral on their face, but they effectively screened out a disproportionate number of Black applicants. Neither requirement had been shown to predict job performance.
In a unanimous decision authored by Chief Justice Warren Burger, the Court held that Title VII of the Civil Rights Act of 1964 prohibits employment practices that have a discriminatory effect on protected groups, even when the employer harbors no discriminatory intent. The Court wrote that "the Act proscribes not only overt discrimination, but also practices that are fair in form, but discriminatory in operation." The employer bore the burden of showing that any requirement with a disparate impact had "a manifest relationship to the employment in question."
Before Griggs, plaintiffs alleging employment discrimination had to prove discriminatory intent. After Griggs, they needed to show only discriminatory effects.
In Wards Cove Packing Co. v. Atonio, 490 U.S. 642 (1989), the Supreme Court narrowed the Griggs framework. The case involved nonwhite cannery workers at Alaskan salmon canneries who alleged that hiring practices led to racial stratification, with skilled noncannery jobs filled predominantly by white workers and unskilled cannery positions filled by nonwhite workers.
The Court held that plaintiffs must identify the specific employment practice responsible for the statistical disparity, rather than pointing to overall workforce imbalances. It also shifted the burden of proof: rather than requiring employers to prove business necessity, the Court ruled that employers needed only to produce evidence of a legitimate business justification, while the burden of persuasion remained with the plaintiffs. This decision was widely criticized for weakening disparate impact protections.
Congress responded to Wards Cove by passing the Civil Rights Act of 1991, which codified the disparate impact framework into Title VII at 42 U.S.C. Section 2000e-2(k). The Act restored the burden of proof to employers, requiring them to demonstrate that any challenged practice is "job related for the position in question and consistent with business necessity." Even when an employer meets this burden, the plaintiff can still prevail by showing that an alternative practice with less disparate impact could serve the employer's legitimate needs equally well. The Act did not define "business necessity"; its accompanying interpretive memorandum stated that the terms "business necessity" and "job related" were intended to reflect the concepts enunciated in Griggs and in the Supreme Court decisions before Wards Cove, restoring the standard set in Griggs.
In Texas Department of Housing and Community Affairs v. Inclusive Communities Project, Inc., 576 U.S. 519 (2015), the Supreme Court extended disparate impact doctrine beyond employment. In a 5-4 decision written by Justice Kennedy, the Court held that disparate impact claims are cognizable under the Fair Housing Act of 1968. The case involved allegations that the Texas housing agency allocated low-income housing tax credits in a pattern that reinforced racial segregation. The ruling confirmed that housing discrimination claims can proceed based on discriminatory effects, without requiring proof of discriminatory intent.
The four-fifths rule (also called the 80% rule) is a practical guideline for identifying potential adverse impact. It was established by the Equal Employment Opportunity Commission (EEOC), along with other federal agencies, in the 1978 Uniform Guidelines on Employee Selection Procedures (29 CFR Section 1607.4).
The rule states that a selection rate for any race, sex, or ethnic group that is less than four-fifths (80%) of the selection rate for the group with the highest rate will generally be regarded as evidence of adverse impact. The calculation proceeds in three steps:

1. Compute the selection rate for each group (the number selected divided by the number of applicants in that group).
2. Identify the group with the highest selection rate.
3. Divide each group's selection rate by the highest rate to obtain the impact ratio.

If the impact ratio for any group falls below 0.80 (80%), the selection process may have adverse impact. The table below shows a worked example:
| Group | Applicants | Selected | Selection rate | Impact ratio |
|---|---|---|---|---|
| Group A | 400 | 120 | 30.0% | 1.00 (reference) |
| Group B | 300 | 60 | 20.0% | 0.67 |
| Group C | 200 | 50 | 25.0% | 0.83 |
In this example, Group A has the highest selection rate (30%). Group B's impact ratio is 20% / 30% = 0.67, which is below 0.80 and suggests possible adverse impact. Group C's impact ratio is 25% / 30% = 0.83, which is above the threshold.
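The arithmetic is simple enough to script directly. The sketch below, written in plain Python with variable names chosen for this example, reproduces the worked table above and flags any group whose impact ratio falls below the 0.80 guideline.

```python
# Four-fifths (80%) rule check: selection rates and impact ratios.
# Counts are the example figures from the table above.
applicants = {"Group A": 400, "Group B": 300, "Group C": 200}
selected   = {"Group A": 120, "Group B": 60,  "Group C": 50}

# Step 1: selection rate per group.
rates = {g: selected[g] / applicants[g] for g in applicants}

# Step 2: the group with the highest selection rate is the reference.
reference_rate = max(rates.values())

# Step 3: impact ratio = group rate / reference rate.
for group, rate in rates.items():
    ratio = rate / reference_rate
    flag = "possible adverse impact" if ratio < 0.80 else "above threshold"
    print(f"{group}: rate={rate:.1%}, impact ratio={ratio:.2f} ({flag})")
```

Running this prints impact ratios of 1.00, 0.67, and 0.83 for Groups A, B, and C, matching the table.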
The four-fifths rule was designed as a practical rule of thumb, not a definitive legal standard. According to the EEOC's own guidance, the rule "speaks only to the question of adverse impact, and is not intended to resolve the ultimate question of unlawful discrimination." It merely establishes a numerical basis for drawing an initial inference and for requiring additional information. Several limitations apply:

- Selection rates computed from small numbers of applicants are statistically unstable; the Uniform Guidelines themselves note that smaller differences in selection rates may constitute adverse impact where they are significant in both statistical and practical terms, while larger differences may not where they rest on small numbers.
- The rule incorporates no test of statistical significance, so agencies and courts often supplement it with significance-based analyses of the same data.
- It speaks only to selection rates, saying nothing about other disparities such as differences in pay, error rates, or the quality of outcomes.
Disparate impact and disparate treatment are the two primary theories of discrimination under U.S. civil rights law. They differ in several ways.
| Dimension | Disparate impact | Disparate treatment |
|---|---|---|
| Intent required | No; focuses on outcomes | Yes; requires proof of intentional discrimination |
| What plaintiff must show | A neutral practice causes disproportionate harm to a protected group | The employer treated the plaintiff differently because of a protected characteristic |
| Employer defense | The practice is job-related and consistent with business necessity | The employer had a legitimate, nondiscriminatory reason for the action |
| Plaintiff rebuttal | An alternative practice exists with less disparate impact | The employer's stated reason is a pretext for discrimination |
| Typical evidence | Statistical analysis of selection rates or outcomes | Direct evidence of bias, comparative evidence, statements, or patterns |
| Relevance to AI | High, because algorithms rarely have demonstrable "intent" | Lower, unless a system was explicitly programmed to use protected attributes |
The distinction between these two theories is particularly important for artificial intelligence systems. When an algorithm produces discriminatory outputs, proving intentional discrimination is often impractical. Intent is a concept ascribed to human beings, and machines do not possess it in any meaningful legal sense. They execute instructions, even when those instructions produce biased results. For this reason, disparate impact doctrine has been described by legal scholars as the theory most likely to provide meaningful recourse against algorithmic discrimination.
In the machine learning fairness literature, disparate impact is typically measured using the disparate impact ratio (DIR), which directly adapts the four-fifths rule. The formula is:
Disparate Impact Ratio = P(Y_hat = positive | Group = unprivileged) / P(Y_hat = positive | Group = privileged)
where Y_hat is the model's predicted outcome. A ratio of 1.0 indicates perfect parity between groups. A ratio below 0.80 is typically flagged as evidence of disparate impact. Values above 1.0 indicate that the unprivileged group receives favorable outcomes at a higher rate than the privileged group.
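As a concrete sketch, the ratio can be computed directly from a vector of binary predictions and group labels. The function name, arguments, and toy data below are illustrative assumptions for this example, not part of any particular toolkit.

```python
import numpy as np

def disparate_impact_ratio(y_pred, group, unprivileged, privileged):
    """P(Y_hat = 1 | unprivileged) / P(Y_hat = 1 | privileged)."""
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    rate_unpriv = y_pred[group == unprivileged].mean()
    rate_priv = y_pred[group == privileged].mean()
    return rate_unpriv / rate_priv

# Toy example: positive-prediction rates of 20% vs. 30% give a ratio of about 0.67.
y_pred = [1] * 60 + [0] * 240 + [1] * 120 + [0] * 280   # predictions
group  = ["B"] * 300 + ["A"] * 400                       # group membership
print(disparate_impact_ratio(y_pred, group, unprivileged="B", privileged="A"))
```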
IBM's AI Fairness 360 (AIF360) toolkit implements this metric as the disparate_impact_ratio function. Google's Responsible AI toolkit and Microsoft's Fairlearn library also provide implementations.
Demographic parity (also called statistical parity or group fairness) is a closely related fairness criterion. Demographic parity requires that the probability of a positive prediction be equal across groups:
P(Y_hat = positive | Group = A) = P(Y_hat = positive | Group = B)
The disparate impact ratio quantifies how close a model comes to demographic parity. A ratio of exactly 1.0 means perfect demographic parity has been achieved. The 0.80 threshold from the four-fifths rule provides a practical standard for "close enough" to parity.
However, a paper by Wachter, Mittelstadt, and Russell (FAccT 2024) argued that the algorithmic fairness community has created an "imperfect synecdoche" by equating the four-fifths rule with the legal concept of disparate impact. The authors contend that the four-fifths rule was never a legal rule for establishing discrimination; it is merely a screening tool. Treating it as a hard threshold in ML fairness introduces new ethical problems absent from the original legal framework.
Disparate impact in machine learning arises from multiple sources across the model development pipeline.
| Source | Description | Example |
|---|---|---|
| Historical bias | Training data reflects past societal discrimination | A hiring model trained on historical decisions inherits past biases against women or minorities |
| Representation bias | Some groups are underrepresented in training data | Medical imaging datasets with few samples from darker-skinned patients lead to lower diagnostic accuracy for those groups |
| Measurement bias | Features are measured or recorded differently across groups | Using arrest records as a proxy for criminal behavior, when policing patterns vary by neighborhood |
| Proxy discrimination | Neutral features serve as proxies for protected attributes | Zip code, university name, or browsing history correlated with race or socioeconomic status |
| Label bias | Ground truth labels encode human prejudices | Performance ratings used as training labels may reflect supervisor bias |
| Aggregation bias | A single model is applied to populations with different characteristics | A medical risk model calibrated on one demographic produces inaccurate predictions for another |
| Feedback loops | Biased predictions influence future data collection | Predictive policing systems direct officers to neighborhoods that are already over-policed, generating more arrests in those areas and reinforcing the model's predictions |
A fundamental theoretical result constrains what any fairness intervention can achieve. Chouldechova (2017) and Kleinberg, Mullainathan, and Raghavan (2016) independently proved that when base rates (the underlying rate of the positive outcome) differ between groups, it is mathematically impossible to simultaneously satisfy three desirable fairness properties:

- Calibration (predictive parity): among individuals who receive the same score, the actual rate of the outcome is the same across groups.
- Equal false positive rates across groups.
- Equal false negative rates across groups.
This impossibility result means that achieving disparate impact parity (or demographic parity) may require accepting tradeoffs in other fairness criteria, or in model accuracy. Every system design involves normative choices about which fairness criterion to prioritize.
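A small numerical sketch makes the tension concrete. The identity used in Chouldechova's proof expresses a group's false positive rate in terms of its base rate, positive predictive value, and false negative rate; if two groups share the same PPV and FNR but have different base rates, their false positive rates cannot also be equal. The base rates, PPV, and FNR below are invented for illustration.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False positive rate implied by base rate, PPV, and FNR
    (the identity used in Chouldechova, 2017)."""
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)

# Hypothetical numbers: same PPV (0.7) and same FNR (0.3) for both groups,
# but different base rates -- the false positive rates cannot also be equal.
for name, base_rate in [("Group A", 0.5), ("Group B", 0.3)]:
    print(name, round(implied_fpr(base_rate, ppv=0.7, fnr=0.3), 3))
# Group A 0.3, Group B 0.129 -- unequal false positive rates.
```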
In 2018, Reuters reported that Amazon had developed an experimental AI recruiting tool, built starting in 2014, that rated job candidates on a one-to-five-star scale. By 2015, the company discovered that the system was penalizing resumes containing the word "women's" (as in "women's chess club captain") and downgrading graduates of two all-women's colleges. The tool favored language patterns common in male engineers' resumes, such as verbs like "executed" and "captured." Because the training data consisted of resumes submitted over a ten-year period during which the majority of hires were male, the algorithm learned to reproduce the existing demographic skew. Amazon disbanded the team and scrapped the project.
The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system, developed by Northpointe (now Equivant), assigns risk scores to criminal defendants to predict likelihood of reoffending. A 2016 investigation by ProPublica found that Black defendants were nearly twice as likely as white defendants to be incorrectly labeled as high risk (false positives), while white defendants were more likely to be incorrectly labeled as low risk (false negatives). Northpointe disputed the findings, arguing that COMPAS was calibrated: among defendants who received the same risk score, the actual recidivism rates were similar across races. This disagreement illustrated the impossibility theorem in practice, since calibration and equal error rates could not both be achieved when base rates differed.
Algorithmic credit scoring models have faced scrutiny for disparate impact on racial and ethnic minorities. Because many features used in credit models (such as zip code, educational history, and spending patterns) correlate with race, seemingly neutral models can produce racially disparate approval rates. The Consumer Financial Protection Bureau (CFPB) has issued guidance emphasizing that lenders must test models for disparate impact and, when impact is found, must demonstrate that the model serves a legitimate business need with no less discriminatory alternative available. Research by CFPB analysts identified alternative credit scoring models that reduced racial disparities while maintaining comparable predictive performance.
A 2019 study published in Science by Obermeyer et al. found that a widely used algorithm for identifying patients who would benefit from extra medical care exhibited significant racial bias. The algorithm used healthcare spending as a proxy for healthcare need. Because Black patients historically had less access to healthcare and therefore lower spending, the algorithm systematically underestimated their medical needs. At a given risk score, Black patients were considerably sicker than white patients with the same score. Correcting the algorithm by using health outcomes rather than spending as the target variable reduced the racial disparity substantially.
Approaches to mitigating disparate impact in machine learning are typically organized into three categories based on when they intervene in the modeling pipeline.
Pre-processing methods modify the training data before any model is built.
| Method | How it works |
|---|---|
| Reweighting | Assigns different sample weights to training instances so that protected groups are represented more equitably in the learning process |
| Resampling | Uses oversampling of underrepresented groups or undersampling of overrepresented groups to balance the training distribution |
| Disparate impact remover | Adjusts feature distributions so they are identical across protected groups while preserving rank ordering within groups. Proposed by Feldman et al. (2015) at KDD |
| Learning fair representations | Transforms the feature space into a new representation that encodes useful information while obscuring membership in protected groups |
| Relabeling | Modifies a subset of training labels near the decision boundary to reduce bias |
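As a concrete illustration of the reweighting row above, the sketch below computes instance weights using the scheme of Kamiran and Calders, in which each (group, label) combination receives the weight P(group) × P(label) / P(group, label), making group membership and label statistically independent in the weighted data. The DataFrame columns and toy data are assumptions for this example.

```python
import pandas as pd

def reweighting_weights(df, group_col, label_col):
    """Kamiran-Calders style reweighting: weight = P(group) * P(label) / P(group, label)."""
    n = len(df)
    p_group = df[group_col].value_counts(normalize=True)
    p_label = df[label_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, label_col]).size() / n

    def weight(row):
        g, y = row[group_col], row[label_col]
        return p_group[g] * p_label[y] / p_joint[(g, y)]

    return df.apply(weight, axis=1)

# Hypothetical data: the positive label is rarer for group "B".
df = pd.DataFrame({
    "group": ["A"] * 6 + ["B"] * 6,
    "label": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})
df["weight"] = reweighting_weights(df, "group", "label")
print(df.groupby(["group", "label"])["weight"].first())
```

In this toy data, positive examples from group "B" receive a weight of 2.0, so the weighted positive rate is the same in both groups.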
In-processing methods incorporate fairness constraints directly into the model training procedure.
| Method | How it works |
|---|---|
| Fairness-constrained optimization | Adds a regularization term to the loss function that penalizes violations of a chosen fairness metric |
| Adversarial debiasing | Trains a primary predictor alongside an adversary that tries to predict the protected attribute from the predictor's output; the primary model is trained to minimize the adversary's accuracy |
| Prejudice remover | Integrates a fairness-aware regularization term into logistic regression, penalizing the mutual information between predictions and the protected attribute |
| Exponentiated gradient reduction | Solves a sequence of cost-sensitive classification problems to find a model that approximately satisfies fairness constraints |
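As a toy illustration of fairness-constrained optimization (the first row of the table above), the sketch below adds a squared demographic-parity penalty to a logistic regression loss and trains with plain gradient descent on synthetic data. It is a minimal demonstration under invented data and parameter choices, not any library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one feature correlated with both the label and the group.
n = 2000
group = rng.integers(0, 2, n)                      # 0 = privileged, 1 = unprivileged
x = rng.normal(loc=1.0 - 0.8 * group, scale=1.0, size=n)
y = (x + rng.normal(scale=0.5, size=n) > 0.5).astype(float)
X = np.column_stack([x, np.ones(n)])               # feature + intercept

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(lam, lr=0.1, steps=2000):
    """Logistic regression with a demographic-parity penalty of strength lam."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad_ll = X.T @ (p - y) / n                # gradient of the log-loss
        # Penalty: squared gap between mean predicted scores of the two groups.
        gap = p[group == 1].mean() - p[group == 0].mean()
        dp = p * (1 - p)
        grad_gap = (X[group == 1] * dp[group == 1][:, None]).mean(axis=0) \
                 - (X[group == 0] * dp[group == 0][:, None]).mean(axis=0)
        w -= lr * (grad_ll + lam * 2 * gap * grad_gap)
    return w

for lam in [0.0, 5.0]:
    w = train(lam)
    pred = sigmoid(X @ w) > 0.5
    rates = [pred[group == g].mean() for g in (0, 1)]
    print(f"lambda={lam}: selection rates {rates[0]:.2f} vs {rates[1]:.2f}, "
          f"accuracy {(pred == y).mean():.2f}")
```

Increasing the penalty strength narrows the gap in selection rates at some cost in accuracy, which previews the tradeoff discussed below.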
Post-processing methods adjust model predictions after training, without modifying the model itself.
| Method | How it works |
|---|---|
| Threshold adjustment | Sets different classification thresholds for different groups to equalize selection rates or error rates |
| Equalized odds post-processing | Adjusts predictions to ensure that false positive rates and true positive rates are equal across groups |
| Calibrated equalized odds | Modifies predictions to balance equalized odds with calibration |
| Reject option classification | Gives favorable outcomes to unprivileged groups and unfavorable outcomes to privileged groups for instances near the decision boundary where the model is uncertain |
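A minimal sketch of threshold adjustment (the first row of the table above): pick a separate score cutoff for each group so that all groups are selected at roughly the same target rate. The scores, group labels, and 30% target below are invented for illustration.

```python
import numpy as np

def group_thresholds(scores, groups, target_rate):
    """Per-group score cutoffs so each group is selected at roughly target_rate."""
    thresholds = {}
    for g in np.unique(groups):
        g_scores = scores[groups == g]
        # The (1 - target_rate) quantile selects about target_rate of the group.
        thresholds[g] = np.quantile(g_scores, 1 - target_rate)
    return thresholds

# Hypothetical scores: group "B" systematically scores lower than group "A".
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(0.60, 0.15, 400), rng.normal(0.45, 0.15, 300)])
groups = np.array(["A"] * 400 + ["B"] * 300)

cutoffs = group_thresholds(scores, groups, target_rate=0.30)
selected = scores >= np.vectorize(cutoffs.get)(groups)
for g in ("A", "B"):
    print(g, round(float(cutoffs[g]), 3), round(float(selected[groups == g].mean()), 2))
```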
All mitigation techniques involve tradeoffs. Improving demographic parity or the disparate impact ratio typically reduces overall model accuracy, since the model is being constrained from learning patterns that correlate with both the target variable and the protected attribute. The magnitude of the accuracy cost depends on the degree of correlation between protected attributes and legitimate predictive features, the base rate differences between groups, and the specific fairness criterion being optimized.
The enforcement of disparate impact doctrine in the United States has undergone significant changes.
EEOC guidance on AI (2023). On May 18, 2023, the EEOC released guidance titled "Select Issues: Assessing Adverse Impact in Software, Algorithms, and Artificial Intelligence Used in Employment Selection Procedures Under Title VII." The guidance confirmed that employers can be held liable for disparate impact caused by AI tools used in hiring, even when the tools are developed and operated by third-party vendors. Employers must ensure that algorithmic selection procedures do not produce adverse impact unless the employer can demonstrate that the tool is job-related and consistent with business necessity.
Executive Order 14281 (2025). On April 23, 2025, President Trump signed Executive Order 14281, titled "Restoring Equality of Opportunity and Meritocracy." The order directed all federal agencies to "deprioritize enforcement of all statutes and regulations to the extent they include disparate-impact liability." It instructed the Attorney General to repeal relevant Title VI regulations and directed agencies such as the EEOC, HUD, and the CFPB to review active cases relying on disparate impact theories. The EEOC announced plans to close all pending disparate impact charges. However, the executive order does not repeal the statutory provisions of Title VII or the Fair Housing Act. Private plaintiffs retain the right to bring disparate impact claims in court, and the statutory framework established by the Civil Rights Act of 1991 remains in force absent new legislation or Supreme Court rulings.
State and local regulation. Several states and cities have enacted their own rules. New York City's Local Law 144 (effective July 5, 2023) requires employers using automated employment decision tools (AEDTs) to conduct annual bias audits calculating selection rates and impact ratios by sex, race, and ethnicity, publish audit summaries on their websites, and notify candidates that an AEDT is being used. Illinois, Maryland, and Colorado have also enacted legislation addressing AI in employment decisions.
The EU AI Act, which entered into force on August 1, 2024, takes a risk-based approach to regulating AI systems. AI systems used in employment, credit scoring, law enforcement, and other high-risk domains are subject to requirements for bias testing, technical documentation, and post-deployment monitoring. The Act does not use the term "disparate impact" directly but draws on the EU's existing non-discrimination framework, where the concept of "indirect discrimination" serves a similar function. Under EU law, indirect discrimination occurs when an apparently neutral provision, criterion, or practice puts persons of a particular protected group at a particular disadvantage compared with other persons.
Several open-source libraries provide implementations of disparate impact metrics and mitigation algorithms.
| Toolkit | Developer | Key capabilities |
|---|---|---|
| AI Fairness 360 (AIF360) | IBM | Disparate impact ratio, disparate impact remover, reweighting, adversarial debiasing, equalized odds post-processing, and 70+ fairness metrics |
| Fairlearn | Microsoft | Demographic parity, equalized odds, threshold optimization, exponentiated gradient, and integration with scikit-learn |
| What-If Tool | Google | Visual exploration of model performance across subgroups, with fairness metric computation |
| Aequitas | University of Chicago | Bias audit toolkit for computing group-level fairness metrics including disparate impact |
| Themis-ML | Bantilan (2018) | Fairness-aware machine learning library with pre- and post-processing methods |
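As one toolkit-level illustration, the snippet below calls Fairlearn's demographic_parity_ratio, which returns the ratio of the lowest group selection rate to the highest (1.0 indicating parity). It assumes a recent Fairlearn release is installed, and the labels, predictions, and sensitive feature are invented for this sketch.

```python
import numpy as np
from fairlearn.metrics import demographic_parity_ratio

# Invented predictions and a binary sensitive feature.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 0])
sex    = np.array(["F", "F", "M", "M", "F", "M", "F", "M", "M", "F"])

# Ratio of the lowest group selection rate to the highest (1.0 = parity).
print(demographic_parity_ratio(y_true, y_pred, sensitive_features=sex))
```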
Imagine a school is picking kids for the soccer team. The coach says, "Everyone has to pass a juggling test." The test sounds fair because the same rule applies to everyone. But it turns out that kids who grew up with a soccer ball at home do much better on the juggling test than kids who did not. If most kids from one neighborhood had soccer balls and most kids from another neighborhood did not, then the juggling test would keep out more kids from the second neighborhood, even though the coach did not mean to be unfair. That is disparate impact: a rule that looks fair but ends up hurting one group more than another. In the computer world, this happens when a program makes decisions (like who gets a job or a loan) using patterns that accidentally work against certain groups of people.