Disparate Impact
Last reviewed
Jun 2, 2026
Sources
24 citations
Review status
Source-backed
Revision
v5 · 5,253 words
Disparate impact refers to a legal and statistical concept describing situations where a seemingly neutral policy, practice, or algorithm produces disproportionately adverse outcomes for members of a protected class (such as a racial, ethnic, or gender group), regardless of whether there was any intent to discriminate. Originating in United States employment discrimination law, the concept has become central to algorithmic fairness and the study of bias in machine learning systems.
Unlike disparate treatment, which requires proof of intentional discrimination, disparate impact focuses entirely on outcomes. A hiring algorithm, credit scoring model, or criminal risk assessment tool can violate disparate impact standards even if it was designed without any discriminatory purpose, so long as its results fall disproportionately on a protected group.
The disparate impact doctrine traces its origins to the landmark U.S. Supreme Court case Griggs v. Duke Power Co., 401 U.S. 424 (1971).[1] The case involved thirteen Black employees at Duke Power Company's Dan River Steam Station in Draper, North Carolina. Duke Power had a documented history of racial segregation: Black workers were confined to the Labor Department, where the highest-paid worker earned less than the lowest-paid employee in the four other departments reserved for white workers.
After the passage of the Civil Rights Act of 1964, Duke Power imposed two new requirements for transfer out of the Labor Department: a high school diploma and a minimum score on two standardized aptitude tests. These requirements appeared race-neutral on their face, but they effectively screened out a disproportionate number of Black applicants. Neither requirement had been shown to predict job performance.
In a unanimous decision authored by Chief Justice Warren Burger, the Court held that Title VII of the Civil Rights Act of 1964 prohibits employment practices that have a discriminatory effect on protected groups, even when the employer harbors no discriminatory intent. The Court wrote that "the Act proscribes not only overt discrimination, but also practices that are fair in form, but discriminatory in operation." It added that "good intent or absence of discriminatory intent does not redeem employment procedures or testing mechanisms that operate as 'built-in headwinds' for minority groups and are unrelated to measuring job capability," and that "Congress directed the thrust of the Act to the consequences of employment practices, not simply the motivation."[1] The employer bore the burden of demonstrating that any requirement with a disparate impact was "reasonably related" to the job in question.
Before Griggs, plaintiffs alleging employment discrimination had to prove discriminatory intent. After Griggs, they needed to show only discriminatory effects.
In Wards Cove Packing Co. v. Atonio, 490 U.S. 642 (1989), the Supreme Court narrowed the Griggs framework.[2] The case involved nonwhite cannery workers at Alaskan salmon canneries who alleged that hiring practices led to racial stratification, with skilled noncannery jobs filled predominantly by white workers and unskilled cannery positions filled by nonwhite workers.
The Court held that plaintiffs must identify the specific employment practice responsible for the statistical disparity, rather than pointing to overall workforce imbalances. It also shifted the burden of proof: rather than requiring employers to prove business necessity, the Court ruled that employers needed only to produce evidence of a legitimate business justification, while the burden of persuasion remained with the plaintiffs. This decision was widely criticized for weakening disparate impact protections.
Congress responded to Wards Cove by passing the Civil Rights Act of 1991, which codified the disparate impact framework into Title VII at 42 U.S.C. Section 2000e-2(k).[3] The Act restored the burden of proof to employers, requiring them to demonstrate that any challenged practice is "job related for the position in question and consistent with business necessity." Even when an employer meets this burden, the plaintiff can still prevail by showing that an alternative practice with less disparate impact could serve the employer's legitimate needs equally well. Several of the 1991 Act's provisions were expressly intended to overturn Wards Cove and restore the pre-1989 understanding of business necessity set out in Griggs.[3]
In Texas Department of Housing and Community Affairs v. Inclusive Communities Project, Inc., 576 U.S. 519 (2015), decided on June 25, 2015, the Supreme Court extended disparate impact doctrine beyond employment.[5] In a 5-4 decision written by Justice Kennedy, the Court held that disparate impact claims are cognizable under the Fair Housing Act of 1968. The case involved allegations that the Texas housing agency allocated low-income housing tax credits in a pattern that reinforced racial segregation. The ruling confirmed that housing discrimination claims can proceed based on discriminatory effects, without requiring proof of discriminatory intent. At the same time, the majority imposed a "robust causality requirement," holding that a plaintiff must point to a specific policy causing the disparity and that "racial imbalance does not, without more, establish a prima facie case of disparate impact." Justice Alito dissented, joined by Chief Justice Roberts and Justices Scalia and Thomas.[5] As of 2026, the Supreme Court has not revisited or overruled Inclusive Communities.[16]
The four-fifths rule (also called the 80% rule) is a practical guideline for identifying potential adverse impact. It was established by the Equal Employment Opportunity Commission (EEOC), along with the Civil Service Commission, the Department of Labor, and the Department of Justice, in the 1978 Uniform Guidelines on Employee Selection Procedures (29 CFR Section 1607.4).[4]
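The rule states that a selection rate for any race, sex, or ethnic group that is less than four-fifths (80%) of the selection rate for the group with the highest rate will generally be regarded as evidence of adverse impact. The calculation proceeds in three steps:

1. Calculate the selection rate for each group by dividing the number of people selected by the number of applicants from that group.
2. Identify the group with the highest selection rate.
3. Divide each group's selection rate by the highest selection rate to obtain that group's impact ratio.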
If the impact ratio falls below 0.80 (80%), the selection process may have adverse impact.
| Group | Applicants | Selected | Selection rate | Impact ratio |
|---|---|---|---|---|
| Group A | 400 | 120 | 30.0% | 1.00 (reference) |
| Group B | 300 | 60 | 20.0% | 0.67 |
| Group C | 200 | 50 | 25.0% | 0.83 |
In this example, Group A has the highest selection rate (30%). Group B's impact ratio is 20% / 30% = 0.67, which is below 0.80 and suggests possible adverse impact. Group C's impact ratio is 25% / 30% = 0.83, which is above the threshold.
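The three-step calculation can be sketched in a few lines of Python. This is an illustrative sketch, not an official EEOC tool: the function names `impact_ratios` and `flag_adverse_impact` are hypothetical, and the input simply reproduces the example table. Following the Guidelines' wording ("less than four-fifths"), a ratio of exactly 0.80 is not flagged.

```python
from typing import Dict, Tuple

def impact_ratios(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[float, float]]:
    """Map {group: (applicants, selected)} to {group: (selection_rate, impact_ratio)}."""
    # Step 1: selection rate for each group.
    rates = {g: selected / applicants for g, (applicants, selected) in counts.items()}
    # Step 2: the highest selection rate is the reference.
    highest = max(rates.values())
    # Step 3: divide each group's rate by the highest rate.
    return {g: (rate, rate / highest) for g, rate in rates.items()}

def flag_adverse_impact(counts: Dict[str, Tuple[int, int]], threshold: float = 0.8) -> Dict[str, bool]:
    """True for groups whose impact ratio is strictly below the threshold."""
    return {g: ratio < threshold for g, (_, ratio) in impact_ratios(counts).items()}

counts = {"Group A": (400, 120), "Group B": (300, 60), "Group C": (200, 50)}
for group, (rate, ratio) in impact_ratios(counts).items():
    status = "below 0.80" if ratio < 0.8 else "at or above 0.80"
    print(f"{group}: selection rate {rate:.1%}, impact ratio {ratio:.2f} ({status})")
```

Run on the table's figures, this reports Group B at 0.67 and Group C at 0.83, matching the hand calculation. As the Guidelines stress, a flag is only a screening signal, not a finding of discrimination.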
The four-fifths rule was designed as a practical rule of thumb, not a definitive legal standard. According to the EEOC's own guidance, the rule "speaks only to the question of adverse impact, and is not intended to resolve the ultimate question of unlawful discrimination."[4] It merely establishes a numerical basis for drawing an initial inference and for requiring additional information. Several limitations apply:
Disparate impact and disparate treatment are the two primary theories of discrimination under U.S. civil rights law. They differ in several ways.
| Dimension | Disparate impact | Disparate treatment |
|---|---|---|
| Intent required | No; focuses on outcomes | Yes; requires proof of intentional discrimination |
| What plaintiff must show | A neutral practice causes disproportionate harm to a protected group | The employer treated the plaintiff differently because of a protected characteristic |
| Employer defense | The practice is job-related and consistent with business necessity | The employer had a legitimate, nondiscriminatory reason for the action |
| Plaintiff rebuttal | An alternative practice exists with less disparate impact | The employer's stated reason is a pretext for discrimination |
| Typical evidence | Statistical analysis of selection rates or outcomes | Direct evidence of bias, comparative evidence, statements, or patterns |
| Relevance to AI | High, because algorithms rarely have demonstrable "intent" | Lower, unless a system was explicitly programmed to use protected attributes |
The distinction between these two theories is particularly important for artificial intelligence systems. When an algorithm produces discriminatory outputs, proving intentional discrimination is often impractical. Intent is a concept ascribed to human beings, and machines do not possess it in any meaningful legal sense. They execute instructions, even when those instructions produce biased results. For this reason, legal scholars Solon Barocas and Andrew Selbst argued in their influential 2016 article "Big Data's Disparate Impact" that disparate impact is the doctrine most likely to provide meaningful recourse against algorithmic discrimination, while also cautioning that data mining can both reflect prior prejudice and produce new forms of discrimination that existing antidiscrimination law struggles to reach.[7]
In the machine learning fairness literature, disparate impact is typically measured using the disparate impact ratio (DIR), which directly adapts the four-fifths rule. The formula is:
Disparate Impact Ratio = P(Y_hat = positive | Group = unprivileged) / P(Y_hat = positive | Group = privileged)
where Y_hat is the model's predicted outcome. A ratio of 1.0 indicates perfect parity between groups. A ratio below 0.80 is typically flagged as evidence of potential disparate impact. Values above 1.0 indicate that the unprivileged group receives favorable outcomes at a higher rate than the privileged group.
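A minimal sketch of the ratio computed directly from a model's predictions, assuming binary predictions where 1 is the favorable outcome. The function names and the group labels "A" (privileged) and "B" (unprivileged) are hypothetical and do not correspond to any particular toolkit's API.

```python
from typing import Hashable, Sequence

def positive_rate(y_pred: Sequence[int], groups: Sequence[Hashable],
                  group: Hashable, positive: int = 1) -> float:
    """Share of records in `group` whose prediction equals `positive`."""
    preds = [p for p, g in zip(y_pred, groups) if g == group]
    if not preds:
        raise ValueError(f"no records for group {group!r}")
    return sum(p == positive for p in preds) / len(preds)

def disparate_impact_ratio(y_pred: Sequence[int], groups: Sequence[Hashable],
                           unprivileged: Hashable, privileged: Hashable,
                           positive: int = 1) -> float:
    """P(Y_hat = positive | unprivileged) / P(Y_hat = positive | privileged)."""
    denominator = positive_rate(y_pred, groups, privileged, positive)
    if denominator == 0:
        raise ValueError("privileged group has no positive predictions; ratio undefined")
    return positive_rate(y_pred, groups, unprivileged, positive) / denominator

# Hypothetical predictions for ten applicants in two groups.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
groups = ["A"] * 5 + ["B"] * 5
ratio = disparate_impact_ratio(y_pred, groups, unprivileged="B", privileged="A")
print(f"DIR = {ratio:.2f}, below 0.80: {ratio < 0.8}")
```

Here group A receives a positive prediction 60% of the time and group B 20% of the time, so the ratio is about 0.33, well under the 0.80 screening threshold.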
This framing was formalized by Feldman et al. (2015) in "Certifying and Removing Disparate Impact," presented at KDD.[6] Their key technical insight was that the presence of disparate impact in a dataset is closely tied to how well the protected attribute can be predicted from the remaining features. If a classifier can accurately infer group membership from the other attributes, those attributes can serve as proxies and the data is said to admit disparate impact. The authors framed disparate impact removal as reducing this "information leakage" of the protected class.[6]
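Feldman et al. quantify this leakage with the balanced error rate (BER) of a classifier that tries to recover the protected attribute from the other features: BER is the average of the error rates on the two protected classes, and the lower the best achievable BER, the more the data reveals group membership. The sketch below is a simplified illustration under our own assumptions, not the authors' procedure: it searches only one-feature threshold rules rather than arbitrary classifiers, so it yields an upper bound on the best achievable BER. The feature values are hypothetical.

```python
from typing import Sequence

def balanced_error_rate(protected: Sequence[int], guessed: Sequence[int]) -> float:
    """Mean of the per-class error rates when guessing a binary protected attribute."""
    errors = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for actual, guess in zip(protected, guessed):
        totals[actual] += 1
        errors[actual] += actual != guess
    return (errors[0] / totals[0] + errors[1] / totals[1]) / 2

def best_threshold_ber(feature: Sequence[float], protected: Sequence[int]) -> float:
    """Lowest BER over rules of the form 'guess 1 if feature >= cut' (or the flipped rule)."""
    best = 0.5  # a constant guess always has BER 0.5
    for cut in sorted(set(feature)):
        guessed = [int(v >= cut) for v in feature]
        ber = balanced_error_rate(protected, guessed)
        # Flipping every guess turns a BER of b into 1 - b, so keep the better of the two.
        best = min(best, ber, 1 - ber)
    return best

protected = [0, 0, 0, 0, 1, 1, 1, 1]
leaky_feature = [1, 2, 3, 4, 10, 11, 12, 13]    # separates the groups perfectly
neutral_feature = [1, 2, 3, 4, 1, 2, 3, 4]      # identical distribution in both groups
print(best_threshold_ber(leaky_feature, protected))
print(best_threshold_ber(neutral_feature, protected))
```

The leaky feature lets a simple rule recover group membership with a BER of 0.0, so it can act as a perfect proxy; the neutral feature cannot beat the chance-level BER of 0.5.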
IBM's AI Fairness 360 (AIF360) toolkit implements this metric as the disparate_impact_ratio function and treats 0.8 as the acceptable lower bound.[17] Microsoft's Fairlearn library provides the equivalent demographic_parity_ratio, and Google's What-If Tool also computes group selection rates for fairness analysis.[18]
Demographic parity (also called statistical parity or group fairness) is a closely related fairness criterion. Demographic parity requires that the probability of a positive prediction be equal across groups:
P(Y_hat = positive | Group = A) = P(Y_hat = positive | Group = B)
The disparate impact ratio quantifies how close a model comes to demographic parity. A ratio of exactly 1.0 means perfect demographic parity has been achieved. The 0.80 threshold from the four-fifths rule provides a practical standard for "close enough" to parity.
However, a paper by Elizabeth Anne Watkins, Michael McKenna, and Jiahao Chen (first posted in 2022 and published at FAccT 2024) argued that the algorithmic fairness community has created an "imperfect synecdoche" by abstracting the four-fifths rule, which is only one part of disparate impact discrimination law, into a standalone disparate impact metric.[13] The authors contend that the four-fifths rule was never a legal rule for establishing discrimination; it is merely a screening tool. They argue that codifying it as a hard threshold in machine learning, including in popular fairness toolkits, introduces new deontic assumptions and potentials for ethical harm that were absent from the original rule, a phenomenon they characterize as "epistemic trespassing" when computer scientists translate legal concepts without the relevant legal context.[13]
Disparate impact in machine learning arises from multiple sources across the model development pipeline.
| Source | Description | Example |
|---|---|---|
| Historical bias | Training data reflects past societal discrimination | A hiring model trained on historical decisions inherits past biases against women or minorities |
| Representation bias | Some groups are underrepresented in training data | Medical imaging datasets with few samples from darker-skinned patients lead to lower diagnostic accuracy for those groups |
| Measurement bias | Features are measured or recorded differently across groups | Using arrest records as a proxy for criminal behavior, when policing patterns vary by neighborhood |
| Proxy discrimination | Neutral features serve as proxies for protected attributes | Zip code, university name, or browsing history correlated with race or socioeconomic status |
| Label bias | Ground truth labels encode human prejudices | Performance ratings used as training labels may reflect supervisor bias |
| Aggregation bias | A single model is applied to populations with different characteristics | A medical risk model calibrated on one demographic produces inaccurate predictions for another |
| Feedback loops | Biased predictions influence future data collection | Predictive policing systems direct officers to neighborhoods that are already over-policed, generating more arrests in those areas and reinforcing the model's predictions |
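A fundamental theoretical result constrains what any fairness intervention can achieve. Chouldechova (2017)[8] and Kleinberg, Mullainathan, and Raghavan (2016)[9] independently proved that when base rates (the underlying rate of the positive outcome) differ between groups, it is mathematically impossible, outside degenerate cases such as a perfect predictor, to simultaneously satisfy three desirable fairness properties:

1. Calibration: among individuals who receive the same risk score, the actual rate of the positive outcome is the same in every group.
2. Equal false positive rates across groups.
3. Equal false negative rates across groups.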
This impossibility result means that achieving disparate impact parity (or demographic parity) may require accepting tradeoffs in other fairness criteria, or in model accuracy. Every system design involves normative choices about which fairness criterion to prioritize.
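Chouldechova's version of the argument rests on an identity that ties a group's false positive rate to its base rate p, positive predictive value (PPV), and false negative rate (FNR): FPR = p / (1 - p) × (1 - PPV) / PPV × (1 - FNR). The sketch below evaluates that identity for two hypothetical groups that share the same PPV and FNR but differ in base rate; the numbers are illustrative, not drawn from any real system.

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate forced by a group's base rate, PPV, and FNR:
    FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)."""
    if not (0 < base_rate < 1 and 0 < ppv <= 1 and 0 <= fnr <= 1):
        raise ValueError("rates out of range")
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

# Hypothetical groups: identical PPV (0.7) and FNR (0.3), different base rates.
for label, p in [("base rate 0.5", 0.5), ("base rate 0.3", 0.3)]:
    print(label, round(implied_fpr(p, ppv=0.7, fnr=0.3), 3))
```

Holding PPV and FNR equal forces false positive rates of about 0.30 and 0.13, so equal false positive rates become unattainable as soon as base rates differ. This is the same tension that surfaced in the COMPAS dispute.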
In 2018, Reuters reported that Amazon had developed an experimental AI recruiting tool, built starting in 2014, that rated job candidates on a one-to-five-star scale.[19] By 2015, the company discovered that the system was penalizing resumes containing the word "women's" (as in "women's chess club captain") and downgrading graduates of two all-women's colleges. The tool favored language patterns common in male engineers' resumes, such as verbs like "executed" and "captured." Because the training data consisted of resumes submitted over a ten-year period during which the majority of hires were male, the algorithm learned to reproduce the existing demographic skew.[19] Amazon disbanded the team and scrapped the project. The episode is frequently cited as an example of how historical bias in training data can produce disparate impact even when protected attributes are never used as explicit inputs.
The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system, developed by Northpointe (now Equivant), assigns risk scores to criminal defendants to predict likelihood of reoffending. A 2016 investigation by ProPublica found that Black defendants were nearly twice as likely as white defendants to be incorrectly labeled as high risk (false positives), while white defendants were more likely to be incorrectly labeled as low risk (false negatives).[10] Northpointe disputed the findings, arguing that COMPAS was calibrated: among defendants who received the same risk score, the actual recidivism rates were similar across races. This disagreement illustrated the impossibility theorem in practice, since calibration and equal error rates (a component of equalized odds) could not both be achieved when base rates differed.[8]
Algorithmic credit scoring models have faced scrutiny for disparate impact on racial and ethnic minorities. Because many features used in credit models (such as zip code, educational history, and spending patterns) correlate with race, seemingly neutral models can produce racially disparate approval rates. The Consumer Financial Protection Bureau (CFPB) has issued guidance emphasizing that lenders must test models for disparate impact and, when impact is found, must demonstrate that the model serves a legitimate business need with no less discriminatory alternative available. Research by CFPB analysts identified alternative credit scoring models that reduced racial disparities while maintaining comparable predictive performance.
A 2019 study published in Science by Obermeyer et al. found that a widely used algorithm for identifying patients who would benefit from extra medical care exhibited significant racial bias.[11] Algorithms of this type are applied to roughly 200 million people in the United States each year. The algorithm used healthcare spending as a proxy for healthcare need. Because Black patients historically had less access to healthcare and therefore lower spending, the algorithm systematically underestimated their medical needs. At a given risk score, Black patients were considerably sicker than white patients with the same score. The researchers calculated that correcting the algorithm to predict health outcomes rather than spending would increase the share of Black patients flagged for additional care from 17.7% to 46.5%, illustrating how the choice of a target variable (label bias) can create severe disparate impact.[11]
Algorithmic tenant screening produced one of the first major settlements applying disparate impact theory to an AI scoring tool. In Louis v. SafeRent Solutions, filed in 2022 in the U.S. District Court for the District of Massachusetts, plaintiffs alleged that SafeRent's tenant screening score disproportionately assigned low scores to Black and Hispanic rental applicants and to applicants who used federally funded housing vouchers, in violation of the Fair Housing Act and Massachusetts law. A central allegation was that the score did not account for the value of housing vouchers, which on average cover a large majority of a tenant's monthly rent. In July 2023, the court denied SafeRent's motion to dismiss the Fair Housing Act and state-law claims. On November 20, 2024, Judge Angel Kelley granted final approval of a roughly $2.275 million settlement under which SafeRent agreed to stop issuing "accept" or "decline" recommendations based on its score for applicants using housing vouchers unless the model was independently validated for fairness.[20]
Approaches to mitigating disparate impact in machine learning are typically organized into three categories based on when they intervene in the modeling pipeline.
Pre-processing methods modify the training data before any model is built.
| Method | How it works |
|---|---|
| Reweighting | Assigns different sample weights to training instances so that protected groups are represented more equitably in the learning process |
| Resampling | Uses oversampling of underrepresented groups or undersampling of overrepresented groups to balance the training distribution |
| Disparate impact remover | Adjusts feature distributions so they are identical across protected groups while preserving rank ordering within groups. Proposed by Feldman et al. (2015) at KDD[6] |
| Learning fair representations | Transforms the feature space into a new representation that encodes useful information while obscuring membership in protected groups |
| Relabeling | Modifies a subset of training labels near the decision boundary to reduce bias |
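One common reweighting scheme, due to Kamiran and Calders and, to our understanding, the basis of AIF360's `Reweighing` preprocessor, gives each training example the weight P(group) × P(label) / P(group, label). Under these weights the label is statistically independent of group membership, so every group has the same weighted positive rate. The function names and data below are hypothetical; this is a sketch of the idea, not the library's code.

```python
from collections import Counter
from typing import Hashable, List, Sequence

def reweighing_weights(groups: Sequence[Hashable], labels: Sequence[int]) -> List[float]:
    """Weight each example by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] / n) * (label_counts[y] / n) / (joint_counts[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, group) -> float:
    """Weighted share of positive labels within one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den

# Hypothetical training set: group "a" is 75% positive, group "b" is 25% positive.
groups = ["a"] * 4 + ["b"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
for g in ("a", "b"):
    print(g, round(weighted_positive_rate(groups, labels, weights, g), 3))
```

Overrepresented cells (positive examples in group "a", negative examples in group "b") receive weight 2/3, and underrepresented cells receive weight 2. Both groups end up with a weighted positive rate of 0.5, and the weights can be passed to any learner that accepts sample weights.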
In-processing methods incorporate fairness constraints directly into the model training procedure.
| Method | How it works |
|---|---|
| Fairness-constrained optimization | Adds a regularization term to the loss function that penalizes violations of a chosen fairness metric |
| Adversarial debiasing | Trains a primary predictor alongside an adversary that tries to predict the protected attribute from the predictor's output; the primary model is trained to minimize the adversary's accuracy |
| Prejudice remover | Integrates a fairness-aware regularization term into logistic regression, penalizing the mutual information between predictions and the protected attribute |
| Exponentiated gradient reduction | Solves a sequence of cost-sensitive classification problems to find a model that approximately satisfies fairness constraints |
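Fairness-constrained optimization can be illustrated with a small NumPy logistic regression whose loss adds a penalty on the demographic parity gap, defined here as the squared difference in mean predicted score between two groups. The penalty form, the hyperparameters (`lam`, `lr`, `steps`), and the synthetic data are all assumptions made for this sketch. It does not reproduce any specific published method or toolkit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_intercept(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_fair_logreg(X, y, group, lam=0.0, lr=0.1, steps=2000):
    """Gradient descent on: mean log-loss + lam * gap**2, where gap is the mean
    predicted score in group 1 minus that in group 0. lam=0 is plain logistic regression."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    in_g1, in_g0 = group == 1, group == 0
    for _ in range(steps):
        s = sigmoid(Xb @ w)
        grad_loss = Xb.T @ (s - y) / len(y)           # gradient of the mean log-loss
        gap = s[in_g1].mean() - s[in_g0].mean()        # demographic parity gap in scores
        ds_dw = (s * (1 - s))[:, None] * Xb            # row i: gradient of s_i w.r.t. w
        grad_gap = ds_dw[in_g1].mean(axis=0) - ds_dw[in_g0].mean(axis=0)
        w -= lr * (grad_loss + lam * 2 * gap * grad_gap)
    return w

def score_gap(X, w, group):
    s = sigmoid(add_intercept(X) @ w)
    return abs(s[group == 1].mean() - s[group == 0].mean())

# Hypothetical synthetic data: x_proxy is correlated with group membership.
rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, size=n)
x_proxy = group + rng.normal(0.0, 0.5, size=n)
x_other = rng.normal(0.0, 1.0, size=n)
X = np.column_stack([x_proxy, x_other])
y = (x_proxy + x_other + rng.normal(0.0, 1.0, size=n) > 0.5).astype(float)

gap_plain = score_gap(X, fit_fair_logreg(X, y, group, lam=0.0), group)
gap_fair = score_gap(X, fit_fair_logreg(X, y, group, lam=10.0), group)
print(f"score gap without penalty: {gap_plain:.3f}, with penalty: {gap_fair:.3f}")
```

Raising `lam` pushes the model to rely less on the proxy feature, which narrows the gap at some cost in log-loss. This is the accuracy tradeoff that all mitigation techniques share.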
Post-processing methods adjust model predictions after training, without modifying the model itself.
| Method | How it works |
|---|---|
| Threshold adjustment | Sets different classification thresholds for different groups to equalize selection rates or error rates |
| Equalized odds post-processing | Adjusts predictions to ensure that false positive rates and true positive rates are equal across groups |
| Calibrated equalized odds | Modifies predictions to balance equalized odds with calibration |
| Reject option classification | Gives favorable outcomes to unprivileged groups and unfavorable outcomes to privileged groups for instances near the decision boundary where the model is uncertain |
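Threshold adjustment for equal selection rates can be sketched as choosing, for each group, the cutoff that selects the top share of that group's scores. The function names, the target rate, and the scores below are hypothetical. With tied scores a cutoff can select slightly more than the target share.

```python
from typing import Dict, Hashable, Sequence

def group_thresholds(scores: Sequence[float], groups: Sequence[Hashable],
                     target_rate: float) -> Dict[Hashable, float]:
    """Per-group cutoff selecting roughly the top `target_rate` share of each group."""
    thresholds = {}
    for g in set(groups):
        ranked = sorted((s for s, gg in zip(scores, groups) if gg == g), reverse=True)
        k = max(1, round(target_rate * len(ranked)))  # number to select in this group
        thresholds[g] = ranked[k - 1]                  # select scores >= the k-th highest
    return thresholds

def selection_rates(scores, groups, thresholds) -> Dict[Hashable, float]:
    """Share of each group at or above its own threshold."""
    rates = {}
    for g in set(groups):
        member_scores = [s for s, gg in zip(scores, groups) if gg == g]
        rates[g] = sum(s >= thresholds[g] for s in member_scores) / len(member_scores)
    return rates

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.6, 0.5, 0.4, 0.3, 0.2]
groups = ["a"] * 5 + ["b"] * 5
per_group = group_thresholds(scores, groups, target_rate=0.4)
print(per_group, selection_rates(scores, groups, per_group))
print(selection_rates(scores, groups, {"a": 0.65, "b": 0.65}))  # one shared cutoff
```

A single cutoff of 0.65 selects 60% of group "a" and none of group "b", while the per-group cutoffs (0.8 and 0.5) select 40% of each. One caution specific to U.S. employment: 42 U.S.C. Section 2000e-2(l), added by the Civil Rights Act of 1991, makes it unlawful to use different cutoff scores for employment-related tests on the basis of race, color, religion, sex, or national origin. Group-specific thresholds of this kind may therefore be unlawful in that setting.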
All mitigation techniques involve tradeoffs. Improving demographic parity or the disparate impact ratio typically reduces overall model accuracy, since the model is being constrained from learning patterns that correlate with both the target variable and the protected attribute. The magnitude of the accuracy cost depends on the degree of correlation between protected attributes and legitimate predictive features, the base rate differences between groups, and the specific fairness criterion being optimized.
The enforcement of disparate impact doctrine in the United States has undergone significant changes.
EEOC guidance on AI (2023) and its 2025 removal. On May 18, 2023, the EEOC released a technical assistance document titled "Select Issues: Assessing Adverse Impact in Software, Algorithms, and Artificial Intelligence Used in Employment Selection Procedures Under Title VII."[12] The guidance confirmed that employers can be held liable for disparate impact caused by AI tools used in hiring, even when the tools are developed and operated by third-party vendors. Employers must ensure that algorithmic selection procedures do not produce adverse impact unless the employer can demonstrate that the tool is job related and consistent with business necessity. Under the second Trump administration, the EEOC removed this AI guidance from its website on January 27, 2025, as part of a broader rollback that accompanied the revocation of the Biden-era Executive Order 14110 on AI. Commentators noted that the document was non-binding technical assistance and that its removal did not change employers' underlying obligations under Title VII.[21]
Mobley v. Workday (ongoing). The leading test case for algorithmic disparate impact in hiring is Mobley v. Workday, Inc. (No. 3:23-cv-00770, N.D. Cal.). Derek Mobley alleged that Workday's applicant-screening software produced a disparate impact on applicants based on race, age, and disability. On July 12, 2024, Judge Rita Lin allowed the case to proceed on the theory that an AI vendor can be directly liable under Title VII, the ADEA, and the ADA as an "agent" of the employers who delegate screening decisions to it, while dismissing the intentional discrimination claim. On May 16, 2025, the court granted preliminary (conditional) certification of a nationwide ADEA collective of applicants age 40 and older, a group that could be very large given Workday's scale.[22] The case is widely viewed as a key signal of how courts will apply disparate impact doctrine to third-party AI hiring tools.
Executive Order 14281 (2025). On April 23, 2025, President Trump signed Executive Order 14281, titled "Restoring Equality of Opportunity and Meritocracy."[14] Section 2 declares it "the policy of the United States to eliminate the use of disparate-impact liability in all contexts to the maximum degree possible." The order directs all federal agencies to "deprioritize enforcement of all statutes and regulations to the extent they include disparate-impact liability," instructs the Attorney General to repeal or amend the Title VI implementing regulations (including provisions at 28 CFR 42.104), and requires agencies to review pending investigations, civil suits, consent judgments, and permanent injunctions that rely on disparate impact theories.[14] However, an executive order cannot repeal the statutory provisions of Title VII or the Fair Housing Act or override judicial precedent. Private plaintiffs retain the right to bring disparate impact claims in court, and the statutory framework established by the Civil Rights Act of 1991 remains in force absent new legislation or Supreme Court rulings.
Fair housing rulemaking (2026). Consistent with the executive order, HUD issued a proposed rule on January 14, 2026 that would rescind its regulations implementing the Fair Housing Act's disparate impact (discriminatory effects) standard, regulations that HUD had only restored to their 2013 form in 2023. Removing the regulation does not amend the Fair Housing Act itself, and disparate impact claims remain available to private plaintiffs and advocacy groups under Inclusive Communities.[16]
State and local regulation. Several states and cities have enacted their own rules. New York City's Local Law 144 (enforcement began July 5, 2023) requires employers using automated employment decision tools (AEDTs) to conduct annual independent bias audits calculating selection rates and impact ratios by sex, race, and ethnicity, publish audit summaries on their websites, and notify candidates that an AEDT is being used.[15] Illinois, Maryland, and Colorado have also enacted legislation addressing AI in employment or consumer decisions. Colorado's experience illustrates how unsettled this area is: the Colorado AI Act (SB 24-205), the first comprehensive U.S. state law to require developers and deployers of "high-risk" AI systems to use reasonable care against algorithmic discrimination, was signed in 2024 but had its effective date pushed back from February 2026 to June 30, 2026, and in May 2026 the legislature passed SB 26-189 to repeal and replace it with a narrower, disclosure-focused framework taking effect January 1, 2027.[23]
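For tools that output selection decisions, Local Law 144 impact ratios follow the same arithmetic as the four-fifths rule. As we read the city's implementing rules, tools that output scores use a "scoring rate" instead: the share of a category scoring above the median score of the whole sample, divided by the highest category's scoring rate. The sketch below assumes that reading; the category names and scores are hypothetical, and it is not a substitute for an independent bias audit.

```python
from statistics import median
from typing import Dict, Hashable, Sequence

def scoring_rate_impact_ratios(scores: Sequence[float],
                               categories: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Impact ratio per category based on scoring rates (share above the sample median)."""
    cutoff = median(scores)  # median over the entire sample, not per category
    rates = {}
    for c in set(categories):
        member = [s for s, cc in zip(scores, categories) if cc == c]
        rates[c] = sum(s > cutoff for s in member) / len(member)
    top = max(rates.values())
    if top == 0:
        raise ValueError("no category scores above the median; ratios undefined")
    return {c: r / top for c, r in rates.items()}

scores = [90, 85, 80, 75, 70, 65, 60, 55]
categories = ["x", "x", "y", "x", "y", "y", "x", "y"]
print(scoring_rate_impact_ratios(scores, categories))
```

With a sample median of 72.5, category "x" has a scoring rate of 0.75 and category "y" has 0.25, giving impact ratios of 1.0 and about 0.33.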
The EU AI Act, which entered into force on August 1, 2024, takes a risk-based approach to regulating AI systems. Under Annex III, AI systems used for the recruitment or selection of workers, for decisions on promotion or termination, and for tasks such as creditworthiness assessment and law enforcement are classified as high-risk. Article 10 requires that training, validation, and testing data for high-risk systems be relevant, sufficiently representative, and examined for possible biases, and other provisions impose technical documentation, logging, human oversight, and post-market monitoring obligations.[24] The Act does not use the term "disparate impact" directly but draws on the EU's existing non-discrimination framework, where the concept of "indirect discrimination" serves a similar function. Under EU law, indirect discrimination occurs when an apparently neutral provision, criterion, or practice puts persons of a particular protected group at a particular disadvantage compared with other persons. The obligations for high-risk systems were originally set to apply from August 2, 2026, but under a "Digital Omnibus" simplification package advanced in late 2025 and 2026 the European Commission proposed postponing the application date for Annex III high-risk systems, including recruitment tools, to December 2, 2027.[24]
Several open-source libraries provide implementations of disparate impact metrics and mitigation algorithms.
| Toolkit | Developer | Key capabilities |
|---|---|---|
| AI Fairness 360 (AIF360) | IBM | Disparate impact ratio, disparate impact remover, reweighting, adversarial debiasing, equalized odds post-processing, and 70+ fairness metrics |
| Fairlearn | Microsoft | Demographic parity, equalized odds, threshold optimization, exponentiated gradient, and integration with scikit-learn |
| What-If Tool | Google | Visual exploration of model performance across subgroups, with fairness metric computation |
| Aequitas | University of Chicago | Bias audit toolkit for computing group-level fairness metrics including disparate impact |
| Themis-ML | Bantilan (2018) | Fairness-aware machine learning library with pre- and post-processing methods |
Imagine a school is picking kids for the soccer team. The coach says, "Everyone has to pass a juggling test." The test sounds fair because the same rule applies to everyone. But it turns out that kids who grew up with a soccer ball at home do much better on the juggling test than kids who did not. If most kids from one neighborhood had soccer balls and most kids from another neighborhood did not, then the juggling test would keep out more kids from the second neighborhood, even though the coach did not mean to be unfair. That is disparate impact: a rule that looks fair but ends up hurting one group more than another. In the computer world, this happens when a program makes decisions (like who gets a job or a loan) using patterns that accidentally work against certain groups of people.