Demographic parity, also called statistical parity or acceptance rate parity, is a fairness criterion in machine learning that requires a model's predictions to be statistically independent of a protected attribute such as race, gender, or age. In a binary classification setting, demographic parity is satisfied when each demographic group receives positive predictions at the same rate. The concept is one of the earliest and most widely studied group fairness definitions, dating back to work by Calders, Kamiran, and Pechenizkiy in 2009, and it remains a central reference point in discussions about algorithmic bias, anti-discrimination law, and responsible AI.
Imagine a teacher is handing out gold star stickers to students. Demographic parity means the teacher gives stickers to the same percentage of boys as girls, and the same percentage of tall kids as short kids. It does not matter whether each individual student did something to earn a sticker; the rule only checks that each group gets roughly the same share. This makes it easy to spot when one group is being left out, but it also means the teacher might give stickers to some students who did not earn them, or skip students who did, just to keep the numbers even.
Let X denote the feature vector, A the protected (sensitive) attribute, and Y the true label. A classifier h satisfies demographic parity if its predicted outcome is independent of the protected attribute:
P( h(X) = 1 | A = a ) = P( h(X) = 1 | A = b ) for all groups a, b in A
Equivalently, the condition can be written as:
P( h(X) = 1 | A = a ) = P( h(X) = 1 ) for every value a of A
This means the selection rate (the proportion of individuals receiving a positive prediction) must be the same across all groups defined by the protected attribute. In the regression setting, a predictor f satisfies demographic parity if the distribution of f(X) is identical across groups:
P( f(X) >= z | A = a ) = P( f(X) >= z ) for all a and all thresholds z
In practice, perfect independence rarely holds. Practitioners therefore measure how close a system is to demographic parity using two related quantities.
| Metric | Formula | Perfect fairness value |
|---|---|---|
| Demographic parity difference (DPD) | max_a P(h(X)=1 \| A=a) - min_a P(h(X)=1 \| A=a) | 0 |
| Demographic parity ratio (DPR) | min_a P(h(X)=1 \| A=a) / max_a P(h(X)=1 \| A=a) | 1 |
A DPD close to 0 or a DPR close to 1 indicates that the system is approximately satisfying demographic parity. The widely cited "four-fifths rule" (discussed below) uses a threshold of 0.8 on the ratio.
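Both quantities can be computed directly from per-group selection rates. The following is a minimal sketch in plain NumPy with hypothetical helper names; the Fairlearn library discussed later in this article provides equivalent built-in functions.

```python
import numpy as np

def selection_rates(y_pred, groups):
    """Selection rate P(h(X)=1 | A=a) for each value a of the protected attribute."""
    return {g: y_pred[groups == g].mean() for g in np.unique(groups)}

def demographic_parity_difference(y_pred, groups):
    """DPD: largest minus smallest group selection rate (0 means perfect parity)."""
    rates = list(selection_rates(y_pred, groups).values())
    return max(rates) - min(rates)

def demographic_parity_ratio(y_pred, groups):
    """DPR: smallest divided by largest group selection rate (1 means perfect parity)."""
    rates = list(selection_rates(y_pred, groups).values())
    return min(rates) / max(rates)

# Toy data: group "m" is selected at 30% (3 of 10), group "f" at 15% (3 of 20).
y_pred = np.array([1] * 3 + [0] * 7 + [1] * 3 + [0] * 17)
groups = np.array(["m"] * 10 + ["f"] * 20)
print(demographic_parity_difference(y_pred, groups))  # 0.15
print(demographic_parity_ratio(y_pred, groups))       # 0.5, below the 0.8 threshold
```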
The idea that selection rates across demographic groups should be comparable predates machine learning. In United States employment law, the Uniform Guidelines on Employee Selection Procedures, adopted in 1978 by the Equal Employment Opportunity Commission (EEOC), the Department of Labor, the Department of Justice, and the Civil Service Commission, introduced the four-fifths (80%) rule as a practical test for disparate impact. Under this rule, a selection process is flagged for further review if the selection rate for any protected group is less than 80% of the rate for the group with the highest selection rate.
Within artificial intelligence and machine learning research, Calders, Kamiran, and Pechenizkiy (2009) formally introduced demographic parity (under the name "discrimination-free classification") as a constraint for learned classifiers. Calders and Verwer (2010) extended this with a latent-variable naive Bayes approach. The term "statistical parity" became common in subsequent literature, including the influential paper "Fairness Through Awareness" by Dwork, Hardt, Pitassi, Reingold, and Zemel (2012), which distinguished between group fairness criteria such as statistical parity and individual fairness criteria that require similar individuals to be treated similarly.
Demographic parity is one of several group fairness criteria. Each criterion conditions on different variables and therefore encodes a different notion of what it means for a classifier to be "fair." The table below summarizes the main criteria and how they relate to demographic parity.
| Criterion | Condition | What it requires | Relationship to demographic parity |
|---|---|---|---|
| Demographic parity (statistical parity) | P(h(X)=1 \| A=a) = P(h(X)=1 \| A=b) | Equal selection rates across groups | (This criterion) |
| Equalized odds | P(h(X)=1 \| A=a, Y=y) = P(h(X)=1 \| A=b, Y=y) for y in {0,1} | Equal true positive rate and equal false positive rate across groups | Conditions on true label Y; incompatible with demographic parity when base rates differ |
| Equal opportunity | P(h(X)=1 \| A=a, Y=1) = P(h(X)=1 \| A=b, Y=1) | Equal true positive rate across groups | A relaxed form of equalized odds; also conflicts with demographic parity under unequal base rates |
| Predictive parity | P(Y=1 \| h(X)=1, A=a) = P(Y=1 \| h(X)=1, A=b) | Equal precision across groups | Focuses on the predictive value of positive predictions; generally incompatible with demographic parity |
| Counterfactual fairness | Prediction unchanged if protected attribute were different | Individual-level causal criterion | Operates at the individual rather than group level |
| Individual fairness | Similar individuals receive similar predictions | Requires a task-specific similarity metric | Complementary to demographic parity; can conflict if group distributions differ |
Demographic parity belongs to the family of independence-based fairness criteria because it requires the prediction to be statistically independent of the protected attribute. Equalized odds and equal opportunity belong to the separation-based family because they require conditional independence given the true label. Predictive parity belongs to the sufficiency-based family because it requires conditional independence given the prediction.
A key result in algorithmic fairness is that independence (demographic parity), separation (equalized odds), and sufficiency (predictive parity) cannot all be satisfied simultaneously except in two degenerate cases: (1) the classifier is perfect, or (2) the base rates are identical across groups.
This result was established in work by Chouldechova (2017) and by Kleinberg, Mullainathan, and Raghavan (2016). Chouldechova showed that when base rates differ between groups, it is impossible for a classifier to simultaneously achieve equal false positive rates, equal false negative rates, and equal positive predictive values. Kleinberg, Mullainathan, and Raghavan proved a closely related result showing that calibration and balance (a condition related to equalized odds) are mutually exclusive except in trivial settings.
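To see the tension concretely, consider a perfect classifier, h(X) = Y. Its selection rate in each group equals that group's base rate, P(h(X)=1 | A=a) = P(Y=1 | A=a), so it satisfies demographic parity only when the base rates are identical; conversely, any classifier that satisfies demographic parity under unequal base rates must make some errors.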
The impossibility theorem has practical consequences: any team building a fair classification system must choose which fairness criterion to prioritize, because satisfying one will generally violate another when group base rates differ. This makes demographic parity a deliberate design choice rather than a universally optimal criterion.
Techniques for building classifiers that satisfy or approximate demographic parity fall into three categories based on where in the modeling pipeline the intervention occurs.
Pre-processing methods modify the training data before the model is trained, with the goal of removing or reducing bias in the data itself.
| Method | Authors | Description |
|---|---|---|
| Reweighing | Kamiran and Calders (2012) | Assigns weights to training samples based on their group and label combination so that the weighted data satisfies demographic parity. The classifier is then trained on the reweighted data. |
| Disparate impact removal | Feldman, Friedler, Moeller, Scheidegger, Venkatasubramanian (2015) | Modifies feature distributions so that they are indistinguishable across groups (measured by earth mover's distance) while preserving within-group rank ordering. |
| Learning fair representations | Zemel, Wu, Swersky, Pitassi, Dwork (2013) | Learns an intermediate representation of the data that encodes information useful for prediction while obfuscating membership in the protected group. |
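As an illustration of reweighing (the first row above), each training instance with group a and label y receives the weight P(A=a)P(Y=y) / P(A=a, Y=y), so that group membership and label are independent under the reweighted empirical distribution. A minimal sketch, with a hypothetical function name and no special handling beyond zero weights for empty group-label cells:

```python
import numpy as np

def reweighing_weights(y, a):
    """Kamiran-Calders reweighing: weight each sample by P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    y, a = np.asarray(y), np.asarray(a)
    weights = np.zeros(len(y), dtype=float)
    for group in np.unique(a):
        for label in np.unique(y):
            mask = (a == group) & (y == label)
            p_joint = mask.mean()                                      # P(A=a, Y=y)
            p_independent = (a == group).mean() * (y == label).mean()  # P(A=a) * P(Y=y)
            if p_joint > 0:
                weights[mask] = p_independent / p_joint
    return weights
```

The resulting weights can then be passed as sample weights to any learner that supports them, for example via the sample_weight argument accepted by most scikit-learn estimators.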
In-processing methods incorporate fairness constraints directly into the model training procedure.
| Method | Authors | Description |
|---|---|---|
| Constrained optimization | Zafar, Valera, Gomez-Rodriguez, Gummadi (2017) | Adds a fairness constraint (based on the covariance between the protected attribute and the decision boundary) to the objective function of logistic regression or support vector machines. |
| Exponentiated gradient reduction | Agarwal, Beygelzimer, Dudik, Langford, Wallach (2018) | Reduces fair classification to a sequence of cost-sensitive classification problems solved via an exponentiated gradient algorithm. Compatible with any base learner and supports both demographic parity and equalized odds constraints. |
| Adversarial debiasing | Zhang, Lemoine, Mitchell (2018) | Trains a predictor and an adversary simultaneously. The adversary tries to predict the protected attribute from the predictor's output, while the predictor is trained to maximize accuracy and fool the adversary. |
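The exponentiated gradient reduction above is implemented in the Fairlearn library (see the software table below). A usage sketch, assuming the fairlearn and scikit-learn packages are installed; the synthetic data exists only to make the example self-contained:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Synthetic data for illustration: a binary protected attribute A and a label
# whose base rate differs across groups.
rng = np.random.default_rng(0)
n = 2000
A = rng.integers(0, 2, size=n)
X = np.column_stack([rng.normal(size=n) + A, rng.normal(size=n)])
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

# Wrap an ordinary scikit-learn base learner in the reduction and impose a
# demographic parity constraint during training.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=A)
y_pred = mitigator.predict(X)
```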
Post-processing methods adjust the outputs of an already-trained model to satisfy fairness constraints.
| Method | Authors | Description |
|---|---|---|
| Threshold adjustment | Hardt, Price, Srebro (2016) | Sets group-specific decision thresholds so that the resulting selection rates satisfy demographic parity (or equalized odds). Requires access to the protected attribute at prediction time. |
| Reject option classification | Kamiran, Karim, Zhang (2012) | Reassigns predictions for instances near the decision boundary. Instances from the disadvantaged group near the boundary are moved to the positive class, and instances from the advantaged group near the boundary are moved to the negative class. |
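To make the threshold adjustment idea concrete, the sketch below (hypothetical helper names) chooses a per-group score quantile so that every group is selected at the same target rate; the published post-processing methods additionally handle ties and randomization, and Fairlearn's ThresholdOptimizer offers a full implementation with a demographic parity constraint.

```python
import numpy as np

def parity_thresholds(scores, groups, target_rate):
    """Per-group score thresholds so each group's selection rate is roughly target_rate."""
    return {g: np.quantile(scores[groups == g], 1.0 - target_rate)
            for g in np.unique(groups)}

def predict_with_thresholds(scores, groups, thresholds):
    """Apply each group's own threshold; the protected attribute is needed at prediction time."""
    return np.array([int(s >= thresholds[g]) for s, g in zip(scores, groups)])
```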
Several open-source libraries provide built-in support for measuring and enforcing demographic parity.
| Library | Organization | Language | Key capabilities |
|---|---|---|---|
| Fairlearn | Microsoft | Python | Metrics (demographic parity difference/ratio), exponentiated gradient reduction, threshold optimizer |
| AI Fairness 360 (AIF360) | IBM | Python | Reweighing, disparate impact remover, adversarial debiasing, calibrated equalized odds, and over 70 fairness metrics |
| Aequitas | University of Chicago | Python | Bias audit toolkit for group fairness metrics including demographic parity |
| What-If Tool | Google | Python / Web | Interactive fairness exploration for TensorFlow and XGBoost models |
| yardstick | RStudio / Posit | R | Demographic parity metric for tidymodels workflows |
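For example, Fairlearn exposes the demographic parity difference and ratio defined earlier in this article as ready-made functions. A usage sketch, assuming y_true, y_pred, and the protected attribute A are one-dimensional arrays prepared elsewhere:

```python
from fairlearn.metrics import (
    MetricFrame,
    selection_rate,
    demographic_parity_difference,
    demographic_parity_ratio,
)

# The metric functions accept the true labels for API consistency, although
# demographic parity itself depends only on the predictions and the groups.
dpd = demographic_parity_difference(y_true, y_pred, sensitive_features=A)
dpr = demographic_parity_ratio(y_true, y_pred, sensitive_features=A)

# Per-group selection rates, useful for seeing which group drives a disparity.
by_group = MetricFrame(metrics=selection_rate, y_true=y_true, y_pred=y_pred,
                       sensitive_features=A).by_group
```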
Demographic parity has several well-documented shortcomings. Because the criterion ignores the true label Y, it can be satisfied in undesirable ways: a decision maker could select members of one group on merit while selecting members of another group arbitrarily and still report equal selection rates, a failure mode Dwork et al. (2012) describe as a self-fulfilling prophecy. For the same reason, as discussed in the impossibility results above, enforcing demographic parity necessarily costs accuracy whenever base rates differ across groups. Finally, the criterion is purely group-level: it guarantees nothing about how individuals within each group are treated, and common enforcement methods require access to the protected attribute at training or prediction time.
The concept of demographic parity is closely tied to disparate impact doctrine under Title VII of the Civil Rights Act of 1964. Disparate impact claims do not require proof of intentional discrimination; instead, a plaintiff can establish a prima facie case by showing that a facially neutral employment practice has a disproportionately adverse effect on a protected class.
The four-fifths rule, codified in the 1978 Uniform Guidelines on Employee Selection Procedures, operationalizes this idea: if the selection rate for a protected group is less than 80% (four-fifths) of the rate for the group with the highest selection rate, the practice is flagged for further scrutiny. The four-fifths rule is not a legal definition of discrimination; it is a screening tool that triggers closer examination. Courts often supplement it with more rigorous statistical tests.
In 2023, the EEOC issued guidance on the use of artificial intelligence in employment, reaffirming that AI-based selection tools are subject to the same Title VII analysis as traditional selection procedures. The guidance emphasized that employers can be held liable for disparate impact even when they rely on third-party AI vendors.
The EU AI Act, which entered into force on August 1, 2024, establishes a risk-based regulatory framework for artificial intelligence. High-risk AI systems, including those used in employment, education, and law enforcement, are subject to requirements for data governance, bias testing, technical documentation, and post-deployment monitoring. While the AI Act does not mandate any specific fairness metric, its emphasis on preventing discriminatory outcomes means that demographic parity and related metrics are likely to play a role in compliance assessments.
The General Data Protection Regulation (GDPR), specifically Article 22, gives individuals the right not to be subject to a decision based solely on automated processing that produces legal or similarly significant effects. This provision creates additional pressure on organizations to demonstrate that their automated systems do not produce discriminatory outcomes.
Automated resume screening tools are one of the most common settings where demographic parity is applied. For example, if a screening tool selects 30% of male applicants for interviews but only 15% of female applicants, the demographic parity ratio is 0.50, well below the four-fifths threshold. Organizations use demographic parity monitoring to detect and correct such imbalances before deployment.
Fair lending laws in the United States (including the Equal Credit Opportunity Act and the Fair Housing Act) prohibit discrimination in credit decisions. Lenders who use machine learning models for credit scoring monitor demographic parity ratios across race and ethnicity groups to identify potential disparate impact. However, regulators and practitioners generally recognize that raw demographic parity may not be the appropriate standard for credit decisions, since differences in creditworthiness across groups may reflect real differences in financial history rather than discrimination.
The COMPAS recidivism prediction tool, developed by Northpointe (now Equivant), became the subject of intense public scrutiny after a 2016 ProPublica investigation found that the tool's false positive rate was significantly higher for Black defendants than for white defendants. The COMPAS controversy illustrates the tension between different fairness criteria: Northpointe argued that the tool satisfied predictive parity (equal positive predictive value across groups), while ProPublica's analysis showed it violated separation-based criteria. Demographic parity was not the primary metric in that debate, but the case highlighted the practical importance of choosing and justifying a fairness criterion before deployment.
In 2019, Obermeyer, Powers, Vogeli, and Mullainathan published a study in Science showing that a widely used commercial healthcare algorithm exhibited significant racial bias. The algorithm used healthcare costs as a proxy for healthcare needs, but because Black patients on average incurred lower costs due to systemic barriers to care, the algorithm systematically assigned lower risk scores to Black patients. The study did not frame the problem in terms of demographic parity specifically, but correcting the proxy variable brought the algorithm much closer to satisfying demographic parity in its allocation of resources.