# Demographic Parity

> Source: https://aiwiki.ai/wiki/demographic_parity
> Updated: 2026-07-12
> Categories: AI Ethics, Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Demographic parity**, also called **statistical parity** or **acceptance rate parity**, is a [fairness](/wiki/ai_fairness) criterion in [machine learning](/wiki/machine_learning) that requires a model's predictions to be statistically independent of a protected attribute such as race, gender, or age. In a [binary classification](/wiki/binary_classification) setting, demographic parity is satisfied when each demographic group receives positive predictions at the same rate, regardless of the true labels. The concept is one of the earliest and most widely studied group fairness definitions, dating back to work by Calders, Kamiran, and Pechenizkiy in 2009,[1] and it remains a central reference point in discussions about [algorithmic bias](/wiki/algorithmic_fairness), anti-discrimination law, and [responsible AI](/wiki/responsible_ai). In the influential 2012 paper "Fairness Through Awareness," Dwork and coauthors define statistical parity as the requirement that "the demographics of the set of individuals receiving any classification" be "the same as the demographics of the underlying population."[3]

## Explain like I'm 5 (ELI5)

Imagine a teacher is handing out gold star stickers to students. Demographic parity means the teacher gives stickers to the same percentage of boys as girls, and the same percentage of tall kids as short kids. It does not matter whether each individual student did something to earn a sticker; the rule only checks that each group gets roughly the same share. This makes it easy to spot when one group is being left out, but it also means the teacher might give stickers to some students who did not earn them, or skip students who did, just to keep the numbers even.

## What is the mathematical definition of demographic parity?

Let $$X$$ denote the feature vector, $$A$$ the protected (sensitive) attribute, and $$Y$$ the true label. A classifier $$h$$ satisfies demographic parity if its predicted outcome is independent of the protected attribute:

> $$P(h(X) = 1 \mid A = a) = P(h(X) = 1 \mid A = b)$$ for all values $$a, b$$ of $$A$$

Equivalently, the condition can be written as:

> $$P(h(X) = 1 \mid A = a) = P(h(X) = 1)$$ for every value $$a$$ of $$A$$

This means the selection rate (the proportion of individuals receiving a positive prediction) must be the same across all groups defined by the protected attribute. In the regression setting, a predictor $$f$$ satisfies demographic parity if the cumulative distribution of $$f(X)$$ is identical across groups:[12]

> $$P(f(X) \ge z \mid A = a) = P(f(X) \ge z)$$ for all $$a$$ and all thresholds $$z$$
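
The regression condition can be checked on a sample by comparing, for each threshold $$z$$, the share of each group scoring at least $$z$$. The sketch below is one possible empirical check, not a standard library routine; the function name `max_survival_gap` and the toy scores are assumptions made for illustration. A result of 0 means the groups' empirical score distributions coincide.

```python
import numpy as np


def max_survival_gap(scores, group):
    """Largest gap, over thresholds z, between group values of P(f(X) >= z | A = a)."""
    scores = np.asarray(scores, dtype=float)
    group = np.asarray(group)
    groups = np.unique(group)
    gaps = []
    # The empirical rates can only change at observed score values.
    for z in np.unique(scores):
        rates = [np.mean(scores[group == g] >= z) for g in groups]
        gaps.append(max(rates) - min(rates))
    return float(max(gaps))


# Hypothetical scores: identical distributions, then a shifted one.
group = np.array(["a"] * 4 + ["b"] * 4)
same = np.array([1, 2, 3, 4, 1, 2, 3, 4])
shifted = np.array([1, 2, 3, 4, 3, 4, 5, 6])

gap_same = max_survival_gap(same, group)
gap_shifted = max_survival_gap(shifted, group)
print(gap_same, gap_shifted)
```

With these toy scores the gap is 0 for the identical distributions and 0.5 for the shifted ones.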

### How is demographic parity measured?

In practice, perfect independence rarely holds. Practitioners therefore measure how close a system is to demographic parity using two related quantities.

| Metric | Formula | Perfect fairness value |
|---|---|---|
| Demographic parity difference (DPD) | $$\max_a P(h(X)=1 \mid A=a) - \min_a P(h(X)=1 \mid A=a)$$ | 0 |
| Demographic parity ratio (DPR) | $$\min_a P(h(X)=1 \mid A=a) / \max_a P(h(X)=1 \mid A=a)$$ | 1 |

A DPD close to 0 or a DPR close to 1 indicates that the system is approximately satisfying demographic parity. The widely cited "four-fifths rule" (discussed below) uses a threshold of 0.8 on the ratio.
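
Both quantities follow directly from per-group selection rates. The following is a minimal NumPy sketch of the two formulas in the table; the helper names (`selection_rates`, `dp_difference`, `dp_ratio`) and the toy data are illustrative assumptions, not any library's API.

```python
import numpy as np


def selection_rates(y_pred, group):
    """Share of positive predictions within each group."""
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    return {str(g): float(y_pred[group == g].mean()) for g in np.unique(group)}


def dp_difference(y_pred, group):
    """Highest selection rate minus lowest selection rate (0 is perfect parity)."""
    rates = selection_rates(y_pred, group).values()
    return max(rates) - min(rates)


def dp_ratio(y_pred, group):
    """Lowest selection rate divided by highest (1 is perfect parity)."""
    rates = selection_rates(y_pred, group).values()
    return min(rates) / max(rates)  # undefined when no group receives positives


# Toy data (hypothetical): 10 people per group, 6 positives in "a", 3 in "b".
group = np.array(["a"] * 10 + ["b"] * 10)
y_pred = np.array([1] * 6 + [0] * 4 + [1] * 3 + [0] * 7)

rates = selection_rates(y_pred, group)
dpd = dp_difference(y_pred, group)
dpr = dp_ratio(y_pred, group)
print(rates, dpd, dpr)
```

Here the selection rates are 0.6 and 0.3, giving a DPD of about 0.3 and a DPR of 0.5. Note that neither function needs the true labels.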

## When was demographic parity introduced?

The idea that selection rates across demographic groups should be comparable predates machine learning. In United States employment law, the Uniform Guidelines on Employee Selection Procedures, adopted in 1978 by the Equal Employment Opportunity Commission (EEOC), the Department of Labor, the Department of Justice, and the Civil Service Commission, introduced the four-fifths (80%) rule as a practical test for [disparate impact](/wiki/disparate_impact).[13] Under this rule, a selection process is flagged for further review if the selection rate for any protected group is less than 80% of the rate for the group with the highest selection rate.[13]

Within [artificial intelligence](/wiki/artificial_intelligence) and machine learning research, Calders, Kamiran, and Pechenizkiy (2009) formally introduced demographic parity (under the name "discrimination-free classification") as a constraint for learned classifiers.[1] Calders and Verwer (2010) extended this with a latent-variable naive Bayes approach.[2] The term "statistical parity" became common in subsequent literature, including the influential paper "Fairness Through Awareness" by Dwork, Hardt, Pitassi, Reingold, and Zemel (2012), which distinguished between group fairness criteria such as statistical parity and [individual fairness](/wiki/individual_fairness) criteria that require similar individuals to be treated similarly.[3]

## How does demographic parity relate to other fairness criteria?

Demographic parity is one of several group fairness criteria. Each criterion conditions on different variables and therefore encodes a different notion of what it means for a classifier to be "fair." The table below summarizes the main criteria and how they relate to demographic parity.

| Criterion | Condition | What it requires | Relationship to demographic parity |
|---|---|---|---|
| Demographic parity (statistical parity) | $$P(h(X)=1 \mid A=a) = P(h(X)=1 \mid A=b)$$ | Equal selection rates across groups | (This criterion) |
| [Equalized odds](/wiki/equalized_odds) | $$P(h(X)=1 \mid A=a, Y=y) = P(h(X)=1 \mid A=b, Y=y)$$ for $$y \in \{0,1\}$$ | Equal [true positive rate](/wiki/true_positive_rate) and equal false positive rate across groups | Conditions on true label Y; incompatible with demographic parity when base rates differ |
| Equal opportunity | $$P(h(X)=1 \mid A=a, Y=1) = P(h(X)=1 \mid A=b, Y=1)$$ | Equal true positive rate across groups | A relaxed form of equalized odds; also conflicts with demographic parity under unequal base rates |
| [Predictive parity](/wiki/predictive_parity) | $$P(Y=1 \mid h(X)=1, A=a) = P(Y=1 \mid h(X)=1, A=b)$$ | Equal [precision](/wiki/precision) across groups | Focuses on the predictive value of positive predictions; generally incompatible with demographic parity |
| [Counterfactual fairness](/wiki/counterfactual_fairness) | Prediction unchanged if protected attribute were different | Individual-level causal criterion | Operates at the individual rather than group level |
| [Individual fairness](/wiki/individual_fairness) | Similar individuals receive similar predictions | Requires a task-specific similarity metric | Complementary to demographic parity; can conflict if group distributions differ |

Demographic parity belongs to the family of **independence-based** fairness criteria because it requires the prediction to be statistically independent of the protected attribute. Equalized odds and equal opportunity belong to the **separation-based** family because they require conditional independence given the true label. Predictive parity belongs to the **sufficiency-based** family because it requires conditional independence given the prediction.[12]

## Can demographic parity be satisfied alongside other fairness criteria?

A key result in algorithmic fairness is that any two of independence (demographic parity), separation ([equalized odds](/wiki/equalized_odds)), and sufficiency ([predictive parity](/wiki/predictive_parity)) are mutually exclusive except in degenerate cases.[12] If the base rates are identical across groups, the conflict disappears. If the classifier is perfect, separation and sufficiency hold together, but demographic parity still fails whenever base rates differ, because a perfect classifier's selection rates equal the base rates.

The conflict between separation and sufficiency was established in work by Chouldechova (2017)[8] and by Kleinberg, Mullainathan, and Raghavan (2016).[7] Chouldechova showed that when base rates differ between groups, it is impossible for an imperfect classifier to simultaneously achieve equal false positive rates, equal false negative rates, and equal positive predictive values.[8] Kleinberg, Mullainathan, and Raghavan proved a closely related result showing that calibration and balance (a condition related to equalized odds) are mutually exclusive except in trivial settings.[7]

The impossibility theorem has practical consequences: any team building a fair [classification](/wiki/classification) system must choose which fairness criterion to prioritize, because satisfying one will generally violate another when group base rates differ. This makes demographic parity a deliberate design choice rather than a universally optimal criterion.
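
A small numeric sketch can make the trade-off concrete. It uses two hypothetical groups with base rates of 50% and 20% (the data and the helper `group_report` are assumptions for illustration). A perfect classifier equalizes error rates and precision but not selection rates, while forcing equal selection rates introduces false positives in the lower-base-rate group.

```python
import numpy as np


def group_report(y_true, y_pred):
    """Selection rate, true/false positive rates, and precision for one group."""
    return {
        "selection_rate": float(y_pred.mean()),
        "tpr": float(y_pred[y_true == 1].mean()),
        "fpr": float(y_pred[y_true == 0].mean()),
        "ppv": float(y_true[y_pred == 1].mean()),
    }


# Hypothetical groups with different base rates: 50% in "a", 20% in "b".
y_a = np.array([1] * 5 + [0] * 5)
y_b = np.array([1] * 2 + [0] * 8)

# A perfect classifier predicts the true label for everyone.
perfect_a = group_report(y_a, y_a)
perfect_b = group_report(y_b, y_b)

# Force demographic parity by also selecting 5 of 10 people in group "b".
# Only 2 are true positives, so at least 3 selections are false positives.
forced_b_pred = np.array([1] * 5 + [0] * 5)
forced_b = group_report(y_b, forced_b_pred)

print(perfect_a, perfect_b, forced_b)
```

In this toy setting the perfect classifier has selection rates of 0.5 and 0.2. Matching group "b" to a 0.5 selection rate raises its false positive rate from 0 to 0.375 and lowers its precision from 1.0 to 0.4.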

## How do you build a classifier that satisfies demographic parity?

Techniques for building classifiers that satisfy or approximate demographic parity fall into three categories based on where in the modeling pipeline the intervention occurs.

### Pre-processing methods

Pre-processing methods modify the [training data](/wiki/training_data) before the model is trained, with the goal of removing or reducing bias in the data itself.

| Method | Authors | Description |
|---|---|---|
| Reweighing | Kamiran and Calders (2012) | Assigns each training sample a weight based on its group and label combination so that, in the weighted data, the label is statistically independent of the protected attribute (equal positive-label rates across groups). The classifier is then trained on the reweighted data. |
| Disparate impact removal | Feldman, Friedler, Moeller, Scheidegger, Venkatasubramanian (2015) | Modifies feature distributions so that they are indistinguishable across groups (measured by earth mover's distance) while preserving within-group rank ordering.[5] |
| Learning fair representations | Zemel, Wu, Swersky, Pitassi, Dwork (2013) | Learns an intermediate representation of the data that encodes information useful for prediction while obfuscating membership in the protected group.[4] |
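
As an illustration of the reweighing row above, the sketch below computes sample weights of the form $$P(A=a)\,P(Y=y) / P(A=a, Y=y)$$, the ratio of the expected to the observed frequency of each group-label cell. The function name `reweighing_weights` and the toy data are assumptions for illustration; library implementations such as the one in AIF360 differ in interface.

```python
import numpy as np


def reweighing_weights(group, label):
    """Weight each sample by expected / observed frequency of its (group, label) cell."""
    group = np.asarray(group)
    label = np.asarray(label)
    weights = np.empty(len(label), dtype=float)
    for g in np.unique(group):
        for y in np.unique(label):
            cell = (group == g) & (label == y)
            if cell.any():
                expected = (group == g).mean() * (label == y).mean()
                observed = cell.mean()
                weights[cell] = expected / observed
    return weights


def weighted_positive_rate(label, weights, mask):
    """Weighted share of positive labels among the samples selected by mask."""
    return float(np.sum(weights[mask] * label[mask]) / np.sum(weights[mask]))


# Hypothetical training data: 6 of 8 positives in "a", 3 of 12 in "b".
group = np.array(["a"] * 8 + ["b"] * 12)
label = np.array([1] * 6 + [0] * 2 + [1] * 3 + [0] * 9)

w = reweighing_weights(group, label)
rate_a = weighted_positive_rate(label, w, group == "a")
rate_b = weighted_positive_rate(label, w, group == "b")
print(rate_a, rate_b)
```

After reweighing, both groups have a weighted positive-label rate of 0.45, the overall rate in the toy data. The weights would typically be passed to a learner through a sample-weight argument.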

### In-processing methods

In-processing methods incorporate fairness constraints directly into the model training procedure.

| Method | Authors | Description |
|---|---|---|
| Constrained optimization | Zafar, Valera, Gomez-Rodriguez, Gummadi (2017) | Adds a fairness constraint (based on the covariance between the protected attribute and the decision boundary) to the objective function of [logistic regression](/wiki/logistic_regression) or [support vector machines](/wiki/support_vector_machine_svm).[9] |
| Exponentiated gradient reduction | Agarwal, Beygelzimer, Dudik, Langford, Wallach (2018) | Reduces fair classification to a sequence of cost-sensitive classification problems solved via an exponentiated gradient algorithm. Compatible with any base learner and supports both demographic parity and [equalized odds](/wiki/equalized_odds) constraints.[10] |
| Adversarial debiasing | Zhang, Lemoine, Mitchell (2018) | Trains a predictor and an adversary simultaneously. The adversary tries to predict the protected attribute from the predictor's output, while the predictor is trained to maximize [accuracy](/wiki/accuracy) and fool the adversary. |
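
The covariance proxy mentioned in the constrained optimization row can be sketched for a linear model as the empirical covariance between the protected attribute and the signed distance $$\theta^\top x$$ to the decision boundary. The code below only evaluates that quantity; it does not solve the constrained training problem. The function name `boundary_covariance`, the feature layout, and the parameter vectors are assumptions for illustration.

```python
import numpy as np


def boundary_covariance(theta, X, a):
    """Empirical covariance between protected attribute a and theta^T x."""
    X = np.asarray(X, dtype=float)
    a = np.asarray(a, dtype=float)
    signed_distance = X @ np.asarray(theta, dtype=float)
    return float(np.mean((a - a.mean()) * signed_distance))


# Hypothetical data: columns are [bias, proxy feature, neutral feature].
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
proxy = np.array([0, 0, 0, 1, 1, 1, 1, 0])    # correlated with a
neutral = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # same mean in both groups
X = np.column_stack([np.ones(8), proxy, neutral])

cov_using_proxy = boundary_covariance([0.0, 2.0, 1.0], X, a)
cov_neutral_only = boundary_covariance([0.0, 0.0, 1.0], X, a)
print(cov_using_proxy, cov_neutral_only)
```

A model that leans on the proxy feature shows a covariance of 0.25, while one that uses only the neutral feature shows 0. A training procedure in this style would bound the absolute value of this quantity while fitting the model.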

### Post-processing methods

Post-processing methods adjust the outputs of an already-trained model to satisfy fairness constraints.

| Method | Authors | Description |
|---|---|---|
| Threshold adjustment | Hardt, Price, Srebro (2016) | Sets group-specific decision thresholds so that the resulting selection rates satisfy demographic parity (or equalized odds). Requires access to the protected attribute at prediction time.[6] |
| Reject option classification | Kamiran, Karim, Zhang (2012) | Reassigns predictions for instances near the decision boundary. Instances from the disadvantaged group near the boundary are moved to the positive class, and instances from the advantaged group near the boundary are moved to the negative class. |
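
Threshold adjustment for demographic parity can be sketched as choosing, for each group, the score cutoff that selects the same share of that group. This is a simplified illustration, not the full procedure from the cited paper; the function names, the target rate, and the scores are assumptions. Tied scores at the cutoff can make the achieved rates deviate from the target.

```python
import numpy as np


def group_thresholds(scores, group, target_rate):
    """Per-group cutoffs that select roughly the top target_rate share of each group."""
    scores = np.asarray(scores, dtype=float)
    group = np.asarray(group)
    thresholds = {}
    for g in np.unique(group):
        ranked = np.sort(scores[group == g])[::-1]  # highest score first
        k = min(max(int(round(target_rate * len(ranked))), 1), len(ranked))
        thresholds[str(g)] = float(ranked[k - 1])
    return thresholds


def predict_with_thresholds(scores, group, thresholds):
    """Positive prediction when a score meets its own group's cutoff."""
    cutoffs = np.array([thresholds[str(g)] for g in group])
    return (np.asarray(scores, dtype=float) >= cutoffs).astype(int)


# Hypothetical model scores for two groups of five people each.
group = np.array(["a"] * 5 + ["b"] * 5)
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.2, 0.6, 0.5, 0.3, 0.2, 0.1])

single = (scores >= 0.5).astype(int)  # one shared threshold
thresholds = group_thresholds(scores, group, target_rate=0.4)
adjusted = predict_with_thresholds(scores, group, thresholds)

rate = lambda pred, g: float(pred[group == g].mean())
print(rate(single, "a"), rate(single, "b"), thresholds)
print(rate(adjusted, "a"), rate(adjusted, "b"))
```

With a shared threshold of 0.5 the selection rates are 0.6 and 0.4. The group-specific cutoffs (0.8 for "a", 0.5 for "b") bring both to 0.4, at the cost of needing the protected attribute at prediction time.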

## What software measures and enforces demographic parity?

Several open-source libraries provide built-in support for measuring and enforcing demographic parity.

| Library | Organization | Language | Key capabilities |
|---|---|---|---|
| Fairlearn | Microsoft | Python | Metrics (demographic parity difference/ratio), exponentiated gradient reduction, threshold optimizer |
| AI Fairness 360 (AIF360) | IBM | Python | Reweighing, disparate impact remover, adversarial debiasing, calibrated equalized odds, and over 70 fairness metrics |
| Aequitas | University of Chicago | Python | Bias audit toolkit for group fairness metrics including demographic parity |
| What-If Tool | Google | Python / Web | Interactive fairness exploration for TensorFlow and XGBoost models |
| yardstick | RStudio / Posit | R | Demographic parity metric for tidymodels workflows |

## Advantages of demographic parity

1. **Simplicity.** The definition is straightforward and does not require access to true labels, which makes it easy to compute and communicate to non-technical stakeholders.
2. **Proportional representation.** It ensures that each demographic group is represented among positive outcomes in proportion to its size in the applicant or candidate pool.
3. **Legal alignment.** The criterion maps naturally onto the four-fifths rule used in U.S. employment law and onto disparate impact analysis more generally.
4. **Label-free evaluation.** Because demographic parity depends only on predictions and group membership (not on ground truth labels), it can be monitored in production settings where true labels are delayed or unavailable.

## What are the limitations and criticisms of demographic parity?

Demographic parity has several well-documented shortcomings.

1. **Ignores prediction quality.** The criterion says nothing about whether the model's predictions are correct. A system could satisfy demographic parity by randomly selecting members of the disadvantaged group while systematically selecting qualified members of the advantaged group. Both groups would receive positive predictions at the same rate, but the quality of those predictions would differ sharply.
2. **Conflicts with calibration.** When base rates differ across groups, enforcing demographic parity requires the model to over-predict for the lower-base-rate group or under-predict for the higher-base-rate group (or both), which destroys calibration.[7]
3. **Permits laziness.** Nothing in the definition prevents a model from using completely different decision rules for different groups, as long as the selection rates match. A model that carefully evaluates one group but flips a coin for another would still satisfy demographic parity.
4. **Rejects the optimal classifier.** When the fractions of genuinely positive individuals differ across groups, the Bayes-optimal classifier (the most accurate possible classifier) will generally not satisfy demographic parity. Enforcing the constraint therefore typically reduces overall accuracy.
5. **May harm the intended beneficiaries.** If demographic parity is enforced in a college admissions setting, students from a disadvantaged group may be admitted at higher rates than their current qualification levels would predict, potentially leading to higher dropout rates. This is related to the "mismatch" debate in affirmative action discussions, though the empirical evidence on the magnitude of this effect is mixed.
6. **Insensitive to subgroup differences.** Demographic parity checks balance only at the level of the groups defined by the protected attribute. It can be satisfied overall while subgroups (defined by intersections of protected attributes) experience significant disparities.
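
The subgroup limitation in the last item can be illustrated with a constructed example in which selection rates are equal for each protected attribute taken alone but differ sharply across intersections. The attribute values, counts, and the helper `selection_rate_by` are hypothetical.

```python
import numpy as np


def selection_rate_by(y_pred, *attrs):
    """Selection rate for every observed combination of the given attributes."""
    y_pred = np.asarray(y_pred)
    keys = list(zip(*attrs))
    rates = {}
    for key in sorted(set(keys)):
        mask = np.array([k == key for k in keys])
        rates[key] = float(y_pred[mask].mean())
    return rates


# Hypothetical data: 10 people per intersection; value = number selected.
selected_counts = {("m", "x"): 8, ("m", "y"): 2, ("f", "x"): 2, ("f", "y"): 8}
gender, race, y_pred = [], [], []
for (g, r), n_selected in selected_counts.items():
    gender += [g] * 10
    race += [r] * 10
    y_pred += [1] * n_selected + [0] * (10 - n_selected)

by_gender = selection_rate_by(y_pred, gender)
by_race = selection_rate_by(y_pred, race)
by_both = selection_rate_by(y_pred, gender, race)
print(by_gender, by_race, by_both)
```

Each gender and each race value has a selection rate of 0.5, so demographic parity holds on either attribute alone, yet the intersectional rates range from 0.2 to 0.8.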

## Legal and regulatory context

### United States

The concept of demographic parity is closely tied to **disparate impact** doctrine under Title VII of the Civil Rights Act of 1964. Disparate impact claims do not require proof of intentional discrimination; instead, a plaintiff can establish a prima facie case by showing that a facially neutral employment practice has a disproportionately adverse effect on a protected class.

The **four-fifths rule**, codified in the 1978 Uniform Guidelines on Employee Selection Procedures, operationalizes this idea: if the selection rate for a protected group is less than 80% (four-fifths) of the rate for the group with the highest selection rate, the practice is flagged for further scrutiny.[13] The guidance itself stresses that the rule is only a screening device, stating that the "4/5ths" or "80%" rule of thumb "is not intended as a legal definition, but is a practical means of keeping the attention of the enforcement agencies on serious discrepancies in rates of hiring, promotion and other selection decisions."[15] Courts often supplement it with more rigorous statistical tests, such as tests of statistical significance.

In 2023, the EEOC issued guidance on the use of [artificial intelligence](/wiki/artificial_intelligence) in employment, reaffirming that AI-based selection tools are subject to the same Title VII analysis as traditional selection procedures. The guidance emphasized that employers can be held liable for [disparate impact](/wiki/disparate_impact) even when they rely on third-party AI vendors.

### European Union

The EU AI Act, which entered into force on August 1, 2024, establishes a risk-based regulatory framework for artificial intelligence.[14] High-risk AI systems, including those used in employment, education, and law enforcement, are subject to requirements for data governance, bias testing, technical documentation, and post-deployment monitoring.[14] While the AI Act does not mandate any specific fairness metric, its emphasis on preventing discriminatory outcomes means that demographic parity and related metrics are likely to play a role in compliance assessments.

The General Data Protection Regulation (GDPR), specifically Article 22, gives individuals the right not to be subject to a decision based solely on automated processing that produces legal or similarly significant effects. This provision creates additional pressure on organizations to demonstrate that their automated systems do not produce discriminatory outcomes.

## Real-world applications and case studies

### Hiring and recruitment

Automated resume screening tools are one of the most common settings where demographic parity is applied. For example, if a screening tool selects 30% of male applicants for interviews but only 15% of female applicants, the demographic parity ratio is 0.50, well below the four-fifths threshold. Organizations use demographic parity monitoring to detect and correct such imbalances before deployment.
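
The arithmetic in this example can be reproduced with a short four-fifths check. The applicant counts below are assumed (200 per group) so that the selection rates match the 30% and 15% in the text; the function names are illustrative.

```python
def impact_ratios(selected, applicants):
    """Each group's selection rate divided by the highest group's selection rate."""
    rates = {g: selected[g] / applicants[g] for g in applicants}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}


def flagged_groups(selected, applicants, threshold=0.8):
    """Groups whose impact ratio falls below the four-fifths threshold."""
    ratios = impact_ratios(selected, applicants)
    return sorted(g for g, ratio in ratios.items() if ratio < threshold)


# Assumed counts: 200 applicants per group, 30% vs. 15% selected.
applicants = {"male": 200, "female": 200}
selected = {"male": 60, "female": 30}

ratios = impact_ratios(selected, applicants)
flags = flagged_groups(selected, applicants)
print(ratios, flags)
```

The female impact ratio is 0.5, so that group is flagged. As a screening heuristic, a flag indicates a need for further review rather than a conclusion of discrimination.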

### Credit and lending

Fair lending laws in the United States (including the Equal Credit Opportunity Act and the Fair Housing Act) prohibit discrimination in credit decisions. Lenders who use [machine learning](/wiki/machine_learning) models for credit scoring monitor demographic parity ratios across race and ethnicity groups to identify potential disparate impact. However, regulators and practitioners generally recognize that raw demographic parity may not be the appropriate standard for credit decisions, since differences in creditworthiness across groups may reflect real differences in financial history rather than discrimination.

### Criminal justice and risk assessment

The COMPAS recidivism prediction tool, developed by Northpointe (now Equivant), became the subject of intense public scrutiny after a 2016 ProPublica investigation found that the tool's false positive rate was significantly higher for Black defendants than for white defendants. The COMPAS controversy illustrates the tension between different fairness criteria: Northpointe argued that the tool satisfied [predictive parity](/wiki/predictive_parity) (equal positive predictive value across groups), while ProPublica's analysis showed it violated separation-based criteria.[8] Demographic parity was not the primary metric in that debate, but the case highlighted the practical importance of choosing and justifying a fairness criterion before deployment.

### Healthcare

In 2019, Obermeyer, Powers, Vogeli, and Mullainathan published a study in *Science* showing that a widely used commercial healthcare algorithm exhibited significant racial bias.[11] The algorithm used healthcare costs as a proxy for healthcare needs, but because Black patients on average incurred lower costs due to systemic barriers to care, the algorithm systematically assigned lower risk scores to Black patients.[11] The authors found that the bias was large: at a fixed risk score, Black patients were considerably sicker than white patients, and the researchers estimated that remedying the disparity "would increase the percentage of Black patients receiving additional help from 17.7 to 46.5%."[11] The study did not frame the problem in terms of demographic parity specifically, but correcting the proxy variable brought the algorithm much closer to satisfying demographic parity in its allocation of resources.

## Demographic parity in practice: guidance for practitioners

1. **Clarify the context.** Determine whether demographic parity is the appropriate fairness criterion for your application. In settings where the protected attribute is clearly irrelevant to the decision (e.g., jury selection), demographic parity is a natural choice. In settings where group-level differences in the target variable may be legitimate (e.g., medical diagnosis), other criteria such as [equalized odds](/wiki/equalized_odds) may be more appropriate.
2. **Measure before and after.** Compute the demographic parity difference and ratio on your training data, validation data, and production data. Track these metrics over time to detect drift.
3. **Use relaxed thresholds.** Perfect demographic parity is rarely achievable or desirable. Use the four-fifths rule (DPR >= 0.8) as a starting point, and adjust based on context, stakeholder input, and legal requirements.
4. **Combine with other metrics.** No single fairness metric captures all aspects of fairness. Report demographic parity alongside accuracy, [precision](/wiki/precision), [recall](/wiki/recall), false positive rates, and other fairness criteria to give a complete picture.
5. **Document decisions.** Record which fairness criteria you chose, why you chose them, what thresholds you used, and how you handled trade-offs. This documentation is increasingly expected by regulators and auditors.

## See also

- [Equalized odds](/wiki/equalized_odds)
- [Predictive parity](/wiki/predictive_parity)
- [Individual fairness](/wiki/individual_fairness)
- [Counterfactual fairness](/wiki/counterfactual_fairness)
- [Disparate impact](/wiki/disparate_impact)
- [Bias](/wiki/bias)
- [AI ethics](/wiki/ai_ethics)
- [Responsible AI](/wiki/responsible_ai)
- [Confusion matrix](/wiki/confusion_matrix)

## References

1. Calders, T., Kamiran, F., & Pechenizkiy, M. (2009). "Building classifiers with independency constraints." *IEEE International Conference on Data Mining Workshops (ICDMW)*, pp. 13-18.
2. Calders, T. & Verwer, S. (2010). "Three naive Bayes approaches for discrimination-free classification." *Data Mining and Knowledge Discovery*, 21(2), pp. 277-292.
3. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). "Fairness through awareness." *Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS)*, pp. 214-226. arXiv:1104.3913.
4. Zemel, R., Wu, Y., Swersky, K., Pitassi, T., & Dwork, C. (2013). "Learning fair representations." *Proceedings of the 30th International Conference on Machine Learning (ICML)*, pp. 325-333.
5. Feldman, M., Friedler, S., Moeller, J., Scheidegger, C., & Venkatasubramanian, S. (2015). "Certifying and removing disparate impact." *Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, pp. 259-268.
6. Hardt, M., Price, E., & Srebro, N. (2016). "Equality of opportunity in supervised learning." *Advances in Neural Information Processing Systems (NeurIPS)*, pp. 3315-3323.
7. Kleinberg, J., Mullainathan, S., & Raghavan, M. (2016). "Inherent trade-offs in the fair determination of risk scores." *Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS)*.
8. Chouldechova, A. (2017). "Fair prediction with disparate impact: A study of bias in recidivism prediction instruments." *Big Data*, 5(2), pp. 153-163.
9. Zafar, M. B., Valera, I., Gomez-Rodriguez, M., & Gummadi, K. P. (2017). "Fairness constraints: Mechanisms for fair classification." *Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS)*, pp. 962-970.
10. Agarwal, A., Beygelzimer, A., Dudik, M., Langford, J., & Wallach, H. (2018). "A reductions approach to fair classification." *Proceedings of the 35th International Conference on Machine Learning (ICML)*, pp. 60-69.
11. Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). "Dissecting racial bias in an algorithm used to manage the health of populations." *Science*, 366(6464), pp. 447-453.
12. Barocas, S., Hardt, M., & Narayanan, A. (2023). *Fairness and Machine Learning: Limitations and Opportunities.* MIT Press. Available at https://fairmlbook.org.
13. U.S. Equal Employment Opportunity Commission (1978). "Uniform Guidelines on Employee Selection Procedures." 29 CFR Part 1607.
14. European Parliament and Council of the European Union (2024). "Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act)."
15. U.S. Equal Employment Opportunity Commission. "Questions and Answers to Clarify and Provide a Common Interpretation of the Uniform Guidelines on Employee Selection Procedures." Question 11.