Demographic parity, also called statistical parity or acceptance rate parity, is a fairness criterion in machine learning that requires a model's predictions to be statistically independent of a protected attribute such as race, gender, or age. In a binary classification setting, demographic parity is satisfied when each demographic group receives positive predictions at the same rate. The concept is one of the earliest and most widely studied group fairness definitions, dating back to work by Calders, Kamiran, and Pechenizkiy in 2009, and it remains a central reference point in discussions about algorithmic bias, anti-discrimination law, and responsible AI.
Imagine a teacher is handing out gold star stickers to students. Demographic parity means the teacher gives stickers to the same percentage of boys as girls, and the same percentage of tall kids as short kids. It does not matter whether each individual student did something to earn a sticker; the rule only checks that each group gets roughly the same share. This makes it easy to spot when one group is being left out, but it also means the teacher might give stickers to some students who did not earn them, or skip students who did, just to keep the numbers even.
Let X denote the feature vector, A the protected (sensitive) attribute, and Y the true label. A classifier h satisfies demographic parity if its predicted outcome is independent of the protected attribute:
P( h(X) = 1 | A = a ) = P( h(X) = 1 | A = b ) for all groups a, b in A
Equivalently, the condition can be written as:
P( h(X) = 1 | A = a ) = P( h(X) = 1 ) for every value a of A
This means the selection rate (the proportion of individuals receiving a positive prediction) must be the same across all groups defined by the protected attribute. In the regression setting, a predictor f satisfies demographic parity if the distribution of f(X) is identical across groups:
P( f(X) >= z | A = a ) = P( f(X) >= z ) for all a and all thresholds z
In practice, perfect independence rarely holds. Practitioners therefore measure how close a system is to demographic parity using two related quantities.
| Metric | Formula | Perfect fairness value |
|---|---|---|
| Demographic parity difference (DPD) | max_a P(h(X)=1 \| A=a) - min_a P(h(X)=1 \| A=a) | 0 |
| Demographic parity ratio (DPR) | min_a P(h(X)=1 \| A=a) / max_a P(h(X)=1 \| A=a) | 1 |
A DPD close to 0 or a DPR close to 1 indicates that the system is approximately satisfying demographic parity. The widely cited "four-fifths rule" (discussed below) uses a threshold of 0.8 on the ratio.
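Both quantities can be computed directly from per-group selection rates. The following is a minimal sketch in plain NumPy with hypothetical helper names; the Fairlearn library discussed later in this article provides equivalent built-in functions.

```python
import numpy as np

def selection_rates(y_pred, groups):
    """Selection rate P(h(X)=1 | A=a) for each value a of the protected attribute."""
    return {g: y_pred[groups == g].mean() for g in np.unique(groups)}

def demographic_parity_difference(y_pred, groups):
    """DPD: largest minus smallest group selection rate (0 means perfect parity)."""
    rates = list(selection_rates(y_pred, groups).values())
    return max(rates) - min(rates)

def demographic_parity_ratio(y_pred, groups):
    """DPR: smallest divided by largest group selection rate (1 means perfect parity)."""
    rates = list(selection_rates(y_pred, groups).values())
    return min(rates) / max(rates)

# Toy data: group "m" is selected at 30% (3 of 10), group "f" at 15% (3 of 20).
y_pred = np.array([1] * 3 + [0] * 7 + [1] * 3 + [0] * 17)
groups = np.array(["m"] * 10 + ["f"] * 20)
print(demographic_parity_difference(y_pred, groups))  # 0.15
print(demographic_parity_ratio(y_pred, groups))       # 0.5, below the 0.8 threshold
```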
The idea that selection rates across demographic groups should be comparable predates machine learning. In United States employment law, the Uniform Guidelines on Employee Selection Procedures, adopted in 1978 by the Equal Employment Opportunity Commission (EEOC), the Department of Labor, the Department of Justice, and the Civil Service Commission, introduced the four-fifths (80%) rule as a practical test for disparate impact. Under this rule, a selection process is flagged for further review if the selection rate for any protected group is less than 80% of the rate for the group with the highest selection rate.
Within artificial intelligence and machine learning research, Calders, Kamiran, and Pechenizkiy (2009) formally introduced demographic parity (under the name "discrimination-free classification") as a constraint for learned classifiers. Calders and Verwer (2010) extended this with a latent-variable naive Bayes approach. The term "statistical parity" became common in subsequent literature, including the influential paper "Fairness Through Awareness" by Dwork, Hardt, Pitassi, Reingold, and Zemel (2012), which distinguished between group fairness criteria such as statistical parity and individual fairness criteria that require similar individuals to be treated similarly.
Demographic parity is one of several group fairness criteria. Each criterion conditions on different variables and therefore encodes a different notion of what it means for a classifier to be "fair." The table below summarizes the main criteria and how they relate to demographic parity.
| Criterion | Condition | What it requires | Relationship to demographic parity |
|---|---|---|---|
| Demographic parity (statistical parity) | P(h(X)=1 \| A=a) = P(h(X)=1 \| A=b) | Equal selection rates across groups | (This criterion) |
| Equalized odds | P(h(X)=1 \| A=a, Y=y) = P(h(X)=1 \| A=b, Y=y) for y in {0,1} | Equal true positive rate and equal false positive rate across groups | Conditions on true label Y; incompatible with demographic parity when base rates differ |
| Equal opportunity | P(h(X)=1 \| A=a, Y=1) = P(h(X)=1 \| A=b, Y=1) | Equal true positive rate across groups | A relaxed form of equalized odds; also conflicts with demographic parity under unequal base rates |
| Predictive parity | P(Y=1 \| h(X)=1, A=a) = P(Y=1 \| h(X)=1, A=b) | Equal precision across groups | Focuses on the predictive value of positive predictions; generally incompatible with demographic parity |
| Counterfactual fairness | Prediction unchanged if protected attribute were different | Individual-level causal criterion | Operates at the individual rather than group level |
| Individual fairness | Similar individuals receive similar predictions | Requires a task-specific similarity metric | Complementary to demographic parity; can conflict if group distributions differ |
Demographic parity belongs to the family of independence-based fairness criteria because it requires the prediction to be statistically independent of the protected attribute. Equalized odds and equal opportunity belong to the separation-based family because they require conditional independence given the true label. Predictive parity belongs to the sufficiency-based family because it requires conditional independence given the prediction.
A key result in algorithmic fairness is that independence (demographic parity), separation (equalized odds), and sufficiency (predictive parity) cannot all be satisfied simultaneously except in two degenerate cases: (1) the classifier is perfect, or (2) the base rates are identical across groups.
This result was established in work by Chouldechova (2017) and by Kleinberg, Mullainathan, and Raghavan (2016). Chouldechova showed that when base rates differ between groups, it is impossible for a classifier to simultaneously achieve equal false positive rates, equal false negative rates, and equal positive predictive values. Kleinberg, Mullainathan, and Raghavan proved a closely related result showing that calibration and balance (a condition related to equalized odds) are mutually exclusive except in trivial settings.
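To see the tension concretely, consider a perfect classifier, h(X) = Y. Its selection rate in each group equals that group's base rate, P(h(X)=1 | A=a) = P(Y=1 | A=a), so it satisfies demographic parity only when the base rates are identical; conversely, any classifier that satisfies demographic parity under unequal base rates must make some errors.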
The impossibility theorem has practical consequences: any team building a fair classification system must choose which fairness criterion to prioritize, because satisfying one will generally violate another when group base rates differ. This makes demographic parity a deliberate design choice rather than a universally optimal criterion.
Techniques for building classifiers that satisfy or approximate demographic parity fall into three categories based on where in the modeling pipeline the intervention occurs.
Pre-processing methods modify the training data before the model is trained, with the goal of removing or reducing bias in the data itself.
| Method | Authors | Description |
|---|---|---|
| Reweighing | Kamiran and Calders (2012) | Assigns weights to training samples based on their group and label combination so that the weighted data satisfies demographic parity. The classifier is then trained on the reweighted data. |
| Disparate impact removal | Feldman, Friedler, Moeller, Scheidegger, Venkatasubramanian (2015) | Modifies feature distributions so that they are indistinguishable across groups (measured by earth mover's distance) while preserving within-group rank ordering. |
| Learning fair representations | Zemel, Wu, Swersky, Pitassi, Dwork (2013) | Learns an intermediate representation of the data that encodes information useful for prediction while obfuscating membership in the protected group. |
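As an illustration of reweighing (the first row above), each training instance with group a and label y receives the weight P(A=a)P(Y=y) / P(A=a, Y=y), so that group membership and label are independent under the reweighted empirical distribution. A minimal sketch, with a hypothetical function name and no special handling beyond zero weights for empty group-label cells:

```python
import numpy as np

def reweighing_weights(y, a):
    """Kamiran-Calders reweighing: weight each sample by P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    y, a = np.asarray(y), np.asarray(a)
    weights = np.zeros(len(y), dtype=float)
    for group in np.unique(a):
        for label in np.unique(y):
            mask = (a == group) & (y == label)
            p_joint = mask.mean()                                      # P(A=a, Y=y)
            p_independent = (a == group).mean() * (y == label).mean()  # P(A=a) * P(Y=y)
            if p_joint > 0:
                weights[mask] = p_independent / p_joint
    return weights
```

The resulting weights can then be passed as sample weights to any learner that supports them, for example via the sample_weight argument accepted by most scikit-learn estimators.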
In-processing methods incorporate fairness constraints directly into the model training procedure.
| Method | Authors | Description |
|---|---|---|
| Constrained optimization | Zafar, Valera, Gomez-Rodriguez, Gummadi (2017) | Adds a fairness constraint (based on the covariance between the protected attribute and the decision boundary) to the objective function of logistic regression or support vector machines. |
| Exponentiated gradient reduction | Agarwal, Beygelzimer, Dudik, Langford, Wallach (2018) | Reduces fair classification to a sequence of cost-sensitive classification problems solved via an exponentiated gradient algorithm. Compatible with any base learner and supports both demographic parity and equalized odds constraints. |
| Adversarial debiasing | Zhang, Lemoine, Mitchell (2018) | Trains a predictor and an adversary simultaneously. The adversary tries to predict the protected attribute from the predictor's output, while the predictor is trained to maximize accuracy and fool the adversary. |
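The exponentiated gradient reduction above is implemented in the Fairlearn library (see the software table below). A usage sketch, assuming the fairlearn and scikit-learn packages are installed; the synthetic data exists only to make the example self-contained:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Synthetic data for illustration: a binary protected attribute A and a label
# whose base rate differs across groups.
rng = np.random.default_rng(0)
n = 2000
A = rng.integers(0, 2, size=n)
X = np.column_stack([rng.normal(size=n) + A, rng.normal(size=n)])
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

# Wrap an ordinary scikit-learn base learner in the reduction and impose a
# demographic parity constraint during training.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=A)
y_pred = mitigator.predict(X)
```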
Post-processing methods adjust the outputs of an already-trained model to satisfy fairness constraints.
| Method | Authors | Description |
|---|---|---|
| Threshold adjustment | Hardt, Price, Srebro (2016) | Sets group-specific decision thresholds so that the resulting selection rates satisfy demographic parity (or equalized odds). Requires access to the protected attribute at prediction time. |
| Reject option classification | Kamiran, Karim, Zhang (2012) | Reassigns predictions for instances near the decision boundary. Instances from the disadvantaged group near the boundary are moved to the positive class, and instances from the advantaged group near the boundary are moved to the negative class. |
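To make the threshold adjustment idea concrete, the sketch below (hypothetical helper names) chooses a per-group score quantile so that every group is selected at the same target rate; the published post-processing methods additionally handle ties and randomization, and Fairlearn's ThresholdOptimizer offers a full implementation with a demographic parity constraint.

```python
import numpy as np

def parity_thresholds(scores, groups, target_rate):
    """Per-group score thresholds so each group's selection rate is roughly target_rate."""
    return {g: np.quantile(scores[groups == g], 1.0 - target_rate)
            for g in np.unique(groups)}

def predict_with_thresholds(scores, groups, thresholds):
    """Apply each group's own threshold; the protected attribute is needed at prediction time."""
    return np.array([int(s >= thresholds[g]) for s, g in zip(scores, groups)])
```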
Several open-source libraries provide built-in support for measuring and enforcing demographic parity.
| Library | Organization | Language | Key capabilities |
|---|---|---|---|
| Fairlearn | Microsoft | Python | Metrics (demographic parity difference/ratio), exponentiated gradient reduction, threshold optimizer |
| AI Fairness 360 (AIF360) | IBM | Python | Reweighing, disparate impact remover, adversarial debiasing, calibrated equalized odds, and over 70 fairness metrics |
| Aequitas | University of Chicago | Python | Bias audit toolkit for group fairness metrics including demographic parity |
| What-If Tool | Google | Python / Web | Interactive fairness exploration for TensorFlow and XGBoost models |
| yardstick | RStudio / Posit | R | Demographic parity metric for tidymodels workflows |
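For example, Fairlearn exposes the demographic parity difference and ratio defined earlier in this article as ready-made functions. A usage sketch, assuming y_true, y_pred, and the protected attribute A are one-dimensional arrays prepared elsewhere:

```python
from fairlearn.metrics import (
    MetricFrame,
    selection_rate,
    demographic_parity_difference,
    demographic_parity_ratio,
)

# The metric functions accept the true labels for API consistency, although
# demographic parity itself depends only on the predictions and the groups.
dpd = demographic_parity_difference(y_true, y_pred, sensitive_features=A)
dpr = demographic_parity_ratio(y_true, y_pred, sensitive_features=A)

# Per-group selection rates, useful for seeing which group drives a disparity.
by_group = MetricFrame(metrics=selection_rate, y_true=y_true, y_pred=y_pred,
                       sensitive_features=A).by_group
```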
Demographic parity has several well-documented shortcomings. Because the criterion ignores the true label Y, it can be satisfied in undesirable ways: a decision maker could select members of one group on merit while selecting members of another group arbitrarily and still report equal selection rates, a failure mode Dwork et al. (2012) describe as a self-fulfilling prophecy. For the same reason, as discussed in the impossibility results above, enforcing demographic parity necessarily costs accuracy whenever base rates differ across groups. Finally, the criterion is purely group-level: it guarantees nothing about how individuals within each group are treated, and common enforcement methods require access to the protected attribute at training or prediction time.
The concept of demographic parity is closely tied to disparate impact doctrine under Title VII of the Civil Rights Act of 1964. Disparate impact claims do not require proof of intentional discrimination; instead, a plaintiff can establish a prima facie case by showing that a facially neutral employment practice has a disproportionately adverse effect on a protected class.
The four-fifths rule, codified in the 1978 Uniform Guidelines on Employee Selection Procedures, operationalizes this idea: if the selection rate for a protected group is less than 80% (four-fifths) of the rate for the group with the highest selection rate, the practice is flagged for further scrutiny. The four-fifths rule is not a legal definition of discrimination; it is a screening tool that triggers closer examination. Courts often supplement it with more rigorous statistical tests.
In 2023, the EEOC issued guidance on the use of artificial intelligence in employment, reaffirming that AI-based selection tools are subject to the same Title VII analysis as traditional selection procedures. The guidance emphasized that employers can be held liable for disparate impact even when they rely on third-party AI vendors.
The EU AI Act, which entered into force on August 1, 2024, establishes a risk-based regulatory framework for artificial intelligence. High-risk AI systems, including those used in employment, education, and law enforcement, are subject to requirements for data governance, bias testing, technical documentation, and post-deployment monitoring. While the AI Act does not mandate any specific fairness metric, its emphasis on preventing discriminatory outcomes means that demographic parity and related metrics are likely to play a role in compliance assessments.
The General Data Protection Regulation (GDPR), specifically Article 22, gives individuals the right not to be subject to a decision based solely on automated processing that produces legal or similarly significant effects. This provision creates additional pressure on organizations to demonstrate that their automated systems do not produce discriminatory outcomes.
Automated resume screening tools are one of the most common settings where demographic parity is applied. For example, if a screening tool selects 30% of male applicants for interviews but only 15% of female applicants, the demographic parity ratio is 0.50, well below the four-fifths threshold. Organizations use demographic parity monitoring to detect and correct such imbalances before deployment.
Fair lending laws in the United States (including the Equal Credit Opportunity Act and the Fair Housing Act) prohibit discrimination in credit decisions. Lenders who use machine learning models for credit scoring monitor demographic parity ratios across race and ethnicity groups to identify potential disparate impact. However, regulators and practitioners generally recognize that raw demographic parity may not be the appropriate standard for credit decisions, since differences in creditworthiness across groups may reflect real differences in financial history rather than discrimination.
The COMPAS recidivism prediction tool, developed by Northpointe (now Equivant), became the subject of intense public scrutiny after a 2016 ProPublica investigation found that the tool's false positive rate was significantly higher for Black defendants than for white defendants. The COMPAS controversy illustrates the tension between different fairness criteria: Northpointe argued that the tool satisfied predictive parity (equal positive predictive value across groups), while ProPublica's analysis showed it violated separation-based criteria. Demographic parity was not the primary metric in that debate, but the case highlighted the practical importance of choosing and justifying a fairness criterion before deployment.
In 2019, Obermeyer, Powers, Vogeli, and Mullainathan published a study in Science showing that a widely used commercial healthcare algorithm exhibited significant racial bias. The algorithm used healthcare costs as a proxy for healthcare needs, but because Black patients on average incurred lower costs due to systemic barriers to care, the algorithm systematically assigned lower risk scores to Black patients. The study did not frame the problem in terms of demographic parity specifically, but correcting the proxy variable brought the algorithm much closer to satisfying demographic parity in its allocation of resources.