# Fairness Constraint

> Source: https://aiwiki.ai/wiki/fairness_constraint
> Updated: 2026-07-11
> Categories: AI Ethics, Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

A **fairness constraint** is an explicit mathematical condition imposed on a [machine learning](/wiki/machine_learning) model during training, evaluation, or post-processing that forces its predictions to satisfy a specified group-fairness criterion, such as demographic parity, equalized odds, or equal opportunity, with respect to protected attributes like race, gender, age, or disability status. In formal terms, a classifier is trained to minimize prediction error subject to a constraint of the form $$g(h, D) \le \epsilon$$, where g measures the disparity between demographic groups and epsilon is the tolerance for how much unfairness is permitted. Fairness constraints are the central tool of [algorithmic fairness](/wiki/algorithmic_fairness): rather than auditing a model after the fact, they encode equity requirements directly into the optimization problem so that the resulting model trades a controlled amount of predictive accuracy for more equitable outcomes.

Fairness constraints have become a central topic in responsible [artificial intelligence](/wiki/artificial_intelligence) research and practice. They appear in hiring algorithms, credit scoring systems, criminal justice risk assessments, and healthcare resource allocation, where biased predictions can cause measurable harm to individuals and communities. A foundational result in the field is that the most common fairness constraints are mutually incompatible: Kleinberg, Mullainathan, and Raghavan (2016) proved that except in highly constrained special cases, no classifier can simultaneously satisfy calibration and equal error rates across groups when the groups have different base rates.[1][2]

## Explain like I'm 5 (ELI5)

Imagine you are dividing candy among your friends. You want to be fair, so you make a rule: everyone should get roughly the same amount, no matter what color shirt they are wearing. A fairness constraint in machine learning works the same way. When a computer learns to make decisions (like who gets a loan or who gets called for a job interview), a fairness constraint is a rule that tells the computer it cannot give better results to one group of people over another just because of something like their gender or race. There are different rules for different kinds of fairness, and the computer has to follow whichever rule the designer picks while still trying to make good decisions overall.

## Background and motivation

Machine learning models learn patterns from historical data. When that data reflects past societal biases, the resulting models can replicate or amplify those biases, a problem studied under the heading of [AI bias](/wiki/ai_bias). For example, a resume screening tool trained on historical hiring decisions may learn to penalize female applicants if the training data reflects a period when women were hired at lower rates. Similarly, a recidivism prediction model may assign higher risk scores to individuals from certain racial groups if the underlying arrest data is skewed by discriminatory policing practices.

Fairness constraints address this problem by encoding equity requirements directly into the learning process. Rather than relying solely on post-hoc audits, they allow practitioners to specify what "fair" means in mathematical terms and to optimize the model subject to those requirements.

The formal study of fairness constraints accelerated after several high-profile incidents. In May 2016, ProPublica's "Machine Bias" investigation reported that the COMPAS recidivism prediction tool exhibited racial disparities: among defendants who did not go on to reoffend within two years, 45 percent of Black defendants were labeled high risk versus 23 percent of white defendants, while among defendants who did reoffend, white defendants were mislabeled as low risk 48 percent of the time versus 28 percent for Black defendants.[3] The same year, researchers demonstrated that word embeddings trained on large text corpora encoded gender stereotypes (for instance, associating "nurse" with female and "engineer" with male). These cases motivated a wave of research on how to formalize and enforce fairness in algorithmic systems.

## What fairness definitions are used as constraints?

A fairness constraint requires a precise mathematical definition of what constitutes fair behavior. Multiple definitions have been proposed, each capturing a different aspect of fairness. The table below summarizes the most widely used definitions.

| Fairness definition | Mathematical condition | Intuition | Introduced by |
|---|---|---|---|
| [Demographic parity](/wiki/demographic_parity) | $$P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b)$$ for all groups a, b | Positive prediction rates are equal across groups | Calders and Verwer (2010) |
| [Equalized odds](/wiki/equalized_odds) | $$P(\hat{Y} = 1 \mid A = a, Y = y) = P(\hat{Y} = 1 \mid A = b, Y = y)$$ for all y | True positive and false positive rates are equal across groups | Hardt, Price, and Srebro (2016) |
| [Equality of opportunity](/wiki/equality_of_opportunity) | $$P(\hat{Y} = 1 \mid A = a, Y = 1) = P(\hat{Y} = 1 \mid A = b, Y = 1)$$ | True positive rates are equal across groups (relaxation of equalized odds) | Hardt, Price, and Srebro (2016) |
| Predictive parity | $$P(Y = 1 \mid \hat{Y} = 1, A = a) = P(Y = 1 \mid \hat{Y} = 1, A = b)$$ | Positive predictive values are equal across groups | Chouldechova (2017) |
| Calibration | $$P(Y = 1 \mid S = s, A = a) = P(Y = 1 \mid S = s, A = b)$$ for all scores s | Predicted probabilities reflect true outcome rates equally across groups | Kleinberg, Mullainathan, and Raghavan (2016) |
| [Individual fairness](/wiki/individual_fairness) | $$d(f(x_i), f(x_j)) \le L \cdot d(x_i, x_j)$$ | Similar individuals receive similar predictions (Lipschitz condition) | Dwork, Hardt, Pitassi, Reingold, and Zemel (2012) |
| [Counterfactual fairness](/wiki/counterfactual_fairness) | $$P(\hat{Y}_{A \leftarrow a} \mid X = x, A = a) = P(\hat{Y}_{A \leftarrow b} \mid X = x, A = a)$$ | Prediction does not change in a counterfactual world where the individual belongs to a different group | Kusner, Loftus, Russell, and Silva (2017) |
| [Disparate impact](/wiki/disparate_impact) ratio | $$P(\hat{Y} = 1 \mid A = \text{unprivileged}) / P(\hat{Y} = 1 \mid A = \text{privileged}) \ge 0.8$$ | Selection rate for the unprivileged group is at least 80% of the privileged group (the "four-fifths rule") | EEOC (1978), adapted for ML |

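In this table, $$\hat{Y}$$ denotes the model's prediction, Y denotes the true outcome, A denotes the sensitive (protected) attribute, X denotes the remaining features, S denotes a score or predicted probability, and f denotes the classifier mapping. In the individual fairness row, d denotes a distance metric (in general a different one for inputs than for outputs) and L denotes the Lipschitz constant.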
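The group-level conditions in the table reduce to comparing a few conditional rates. The following is a minimal sketch in plain Python, using made-up predictions for two hypothetical groups `a` and `b`; the helper names `rate` and `group_rates` are illustrative and not taken from any library.

```python
def rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def group_rates(y_true, y_pred, groups, g):
    """Selection rate, true positive rate and false positive rate within group g."""
    rows = [(y, p) for y, p, a in zip(y_true, y_pred, groups) if a == g]
    selection = rate([p for _, p in rows])
    tpr = rate([p for y, p in rows if y == 1])
    fpr = rate([p for y, p in rows if y == 0])
    return selection, tpr, fpr

# Hypothetical labels and predictions for eight individuals.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

sel_a, tpr_a, fpr_a = group_rates(y_true, y_pred, groups, "a")
sel_b, tpr_b, fpr_b = group_rates(y_true, y_pred, groups, "b")

dp_gap = abs(sel_a - sel_b)                             # demographic parity gap
eopp_gap = abs(tpr_a - tpr_b)                           # equality of opportunity gap
eo_gap = max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))    # equalized odds gap
print(dp_gap, eopp_gap, eo_gap)
```

In this toy data both groups have the same true positive rate, so equality of opportunity holds, while the selection rates and false positive rates differ, so demographic parity and equalized odds do not.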

The disparate impact threshold of 0.8 is not an arbitrary choice. It is adapted from the four-fifths rule in the U.S. Uniform Guidelines on Employee Selection Procedures, jointly adopted in 1978 by the Equal Employment Opportunity Commission and three other federal agencies, which treat "a selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate" as evidence of adverse impact.[4]
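As a worked example of the four-fifths check, the sketch below compares two selection rates. The counts are invented for illustration, and `disparate_impact_ratio` is a hypothetical helper name.

```python
def disparate_impact_ratio(selected_unpriv, total_unpriv, selected_priv, total_priv):
    """Selection rate of the unprivileged group divided by that of the privileged group."""
    return (selected_unpriv / total_unpriv) / (selected_priv / total_priv)

# Hypothetical counts: 30 of 100 selected versus 50 of 100 selected.
ratio = disparate_impact_ratio(30, 100, 50, 100)
passes_four_fifths = ratio >= 0.8
print(round(ratio, 3), passes_four_fifths)
```

A ratio of 0.30 / 0.50 = 0.6 falls below 0.8, so this hypothetical selection process would be flagged under the rule.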

### Group fairness versus individual fairness

Group fairness definitions (demographic parity, equalized odds, equality of opportunity, predictive parity, calibration) require statistical properties to hold at the level of demographic groups. Individual fairness, by contrast, requires that any two individuals who are similar with respect to the task at hand receive similar predictions. The similarity is measured by a task-specific distance metric, and the constraint is expressed as a Lipschitz condition on the classifier.[5]

Group fairness is easier to measure and enforce because it only requires aggregate statistics. Individual fairness is more demanding because it requires specifying what "similar" means for each pair of individuals, which itself may be contested. In practice, most fairness constraint methods target group fairness definitions.
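To make the Lipschitz condition concrete, the sketch below audits a set of scored individuals for pairs that violate it. The choice of an L1 distance on inputs, an absolute difference on scores, and a Lipschitz constant of 1 are all assumptions made for illustration; in practice the input metric is the contested, task-specific part.

```python
from itertools import combinations

def lipschitz_violations(points, scores, lipschitz_constant=1.0):
    """Index pairs (i, j) whose score gap exceeds L times their input distance."""
    violations = []
    for i, j in combinations(range(len(points)), 2):
        # Assumed input metric: L1 distance between feature vectors.
        input_distance = sum(abs(u - v) for u, v in zip(points[i], points[j]))
        # Assumed output metric: absolute difference of scores.
        output_distance = abs(scores[i] - scores[j])
        if output_distance > lipschitz_constant * input_distance:
            violations.append((i, j))
    return violations

# Hypothetical individuals: the first two are nearly identical but scored very differently.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
scores = [0.20, 0.90, 0.95]
violating_pairs = lipschitz_violations(points, scores)
print(violating_pairs)
```

Only the first two individuals form a violating pair: their inputs are 0.1 apart but their scores differ by 0.7.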

### Why can't all fairness definitions hold at once? (Impossibility results)

A set of impossibility results, demonstrated independently by Chouldechova (2017) and Kleinberg, Mullainathan, and Raghavan (2016), shows that several common fairness definitions cannot be satisfied simultaneously except in trivial cases. Specifically, when the base rates (the proportion of positive outcomes) differ across groups, it is impossible for a classifier to simultaneously achieve calibration, false positive rate balance (equal false positive rates), and false negative rate balance (equal false negative rates), unless the classifier is a perfect predictor.[1][2] Kleinberg and co-authors state the result starkly: except in special cases, "there is no method that can satisfy these three conditions simultaneously."[2]

This is precisely the tension that played out in the COMPAS debate. Northpointe, the tool's maker, defended it on the grounds that it was calibrated and equally accurate (about 60 percent) for Black and white defendants, while ProPublica criticized it for unequal false positive rates.[3] The impossibility theorem shows that, given the different base rates of reoffense, both properties could not hold at once. These impossibility results mean that practitioners must choose which fairness definition to prioritize. The choice depends on the application context, the stakeholders involved, and the specific harms that the system could cause.
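The arithmetic behind this tension can be checked directly. Chouldechova's argument rests on an identity that follows from the definitions of the confusion-matrix rates, $$\text{FPR} = \frac{p}{1-p} \cdot \frac{1 - \text{PPV}}{\text{PPV}} \cdot (1 - \text{FNR})$$, where p is a group's base rate and PPV is the positive predictive value (the quantity equalized by predictive parity, which is closely related to calibration). The sketch below plugs in invented numbers, not the COMPAS figures, to show that two groups with the same PPV and the same false negative rate but different base rates are forced to have different false positive rates.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False positive rate implied by a base rate, PPV and false negative rate."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

# Hypothetical groups: same PPV and FNR, different base rates.
fpr_low_base_rate = implied_fpr(base_rate=0.3, ppv=0.6, fnr=0.35)
fpr_high_base_rate = implied_fpr(base_rate=0.5, ppv=0.6, fnr=0.35)
print(round(fpr_low_base_rate, 3), round(fpr_high_base_rate, 3))
```

With these assumed inputs the implied false positive rates are roughly 0.19 and 0.43, so equalizing them would require giving up equal PPV or equal false negative rates.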

## What methods enforce fairness constraints?

Techniques for incorporating fairness constraints into machine learning fall into three broad categories based on when the intervention occurs in the modeling pipeline: pre-processing (before training), in-processing (during training), and post-processing (after training).

### Pre-processing methods

Pre-processing methods modify the training data before the model is trained. The goal is to remove or reduce bias in the data so that a standard (unconstrained) learning algorithm produces fair predictions.

**Reweighing.** Proposed by Kamiran and Calders (2012), reweighing assigns instance-level weights to the training data such that, in the weighted data, the label is statistically independent of the protected attribute, so the weighted positive rate is the same in every group. Instances from underrepresented (group, label) combinations receive higher weights, and instances from overrepresented combinations receive lower weights. The classifier then trains on the reweighted data using standard methods.[6]
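A minimal sketch of the reweighing computation follows. Each (group, label) cell receives the ratio of its expected probability under independence to its observed probability, that is P(A = a) P(Y = y) / P(A = a, Y = y). The data and the helper names are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight for each (group, label) cell: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return {
        (a, y): (group_counts[a] / n) * (label_counts[y] / n) / (count / n)
        for (a, y), count in joint_counts.items()
    }

def weighted_positive_rate(groups, labels, weights, group):
    """Share of a group's total weight that falls on positive labels."""
    total = sum(weights[(a, y)] for a, y in zip(groups, labels) if a == group)
    positive = sum(weights[(a, y)] for a, y in zip(groups, labels) if a == group and y == 1)
    return positive / total

# Hypothetical data: group a is 4/6 positive, group b is 1/4 positive.
groups = ["a"] * 6 + ["b"] * 4
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
print(weights, rate_a, rate_b)
```

After reweighing, both groups have a weighted positive rate of 0.5, which matches the overall positive rate of the toy data. The rare cell (group b, positive label) receives the largest weight.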

**Disparate impact remover.** Proposed by Feldman, Friedler, Moeller, Scheidegger, and Venkatasubramanian (2015), this method transforms feature values so that the distributions of each feature are the same across groups, while preserving within-group ranking. The transformed data can then be used with any classifier.[7]

**Learning fair representations.** Proposed by Zemel, Wu, Swersky, Pitassi, and Dwork (2013), this approach learns a latent representation of the data that encodes the information needed for prediction while being statistically independent of the protected attribute. Any classifier trained on this representation inherits the fairness property.[8]

Pre-processing methods are attractive because they are model-agnostic: once the data is transformed, any learning algorithm can be applied. However, they may lose information that is useful for prediction, and they do not provide formal guarantees on the fairness of the downstream classifier.

### In-processing methods

In-processing methods incorporate fairness constraints directly into the model training procedure. They modify the [optimization](/wiki/convex_optimization) problem that the learning algorithm solves so that the resulting model satisfies the desired fairness criterion.

#### Regularization-based approaches

The simplest in-processing strategy adds a fairness penalty term to the [loss function](/wiki/loss_function). The modified objective takes the form:

$$
\min_w L(w) + \lambda F(w)
$$

where L(w) is the standard loss (for example, [cross-entropy](/wiki/cross-entropy) for classification), F(w) is a measure of unfairness (for example, the difference in positive prediction rates across groups), and lambda is a hyperparameter controlling the strength of the fairness penalty.

This approach is simple to implement and compatible with standard [gradient descent](/wiki/stochastic_gradient_descent_sgd) optimizers, provided F(w) is differentiable; in practice a smooth surrogate, such as the gap in mean predicted probability, stands in for the gap in hard prediction rates. However, it does not guarantee that the fairness criterion is met: the penalty only pushes the model toward fairness, and the level actually achieved depends on the choice of lambda, which must be tuned empirically.
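The penalized objective can be sketched with a one-feature logistic regression trained by plain gradient descent. Everything here is an illustrative assumption: the toy data, the choice of F(w) as the squared gap in mean predicted probability between groups `a` and `b`, and the learning rate and step count.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean(values):
    return sum(values) / len(values)

def soft_gap(w, b, xs, groups):
    """Difference in mean predicted probability between group a and group b."""
    p_a = [sigmoid(w * x + b) for x, g in zip(xs, groups) if g == "a"]
    p_b = [sigmoid(w * x + b) for x, g in zip(xs, groups) if g == "b"]
    return mean(p_a) - mean(p_b)

def train(xs, ys, groups, lam, lr=0.1, steps=3000):
    """Gradient descent on mean cross-entropy + lam * soft_gap**2."""
    w, b = 0.0, 0.0
    size = {g: groups.count(g) for g in ("a", "b")}
    sign = {"a": 1.0, "b": -1.0}
    for _ in range(steps):
        p = [sigmoid(w * x + b) for x in xs]
        # Gradient of the standard loss L(w).
        grad_w = mean([(pi - y) * x for pi, y, x in zip(p, ys, xs)])
        grad_b = mean([pi - y for pi, y in zip(p, ys)])
        # Gradient of the gap, using the sigmoid slope p * (1 - p).
        gap = soft_gap(w, b, xs, groups)
        dgap_w = sum(sign[g] * pi * (1 - pi) * x / size[g] for pi, x, g in zip(p, xs, groups))
        dgap_b = sum(sign[g] * pi * (1 - pi) / size[g] for pi, g in zip(p, groups))
        # Gradient of lam * F(w) with F(w) = gap**2 is lam * 2 * gap * dgap.
        w -= lr * (grad_w + lam * 2 * gap * dgap_w)
        b -= lr * (grad_b + lam * 2 * gap * dgap_b)
    return w, b

# Hypothetical data in which the feature is correlated with group membership.
xs = [0.5, 1.0, 1.0, 1.5, 2.0, -1.5, -1.0, -0.5, 0.0, 0.5]
ys = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
groups = ["a"] * 5 + ["b"] * 5

gap_plain = soft_gap(*train(xs, ys, groups, lam=0.0), xs, groups)
gap_penalized = soft_gap(*train(xs, ys, groups, lam=5.0), xs, groups)
print(gap_plain, gap_penalized)
```

The penalized model ends with a smaller gap than the unpenalized one, but nothing in the procedure pins the gap to a chosen tolerance, which is the limitation the constrained formulations address.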

Zafar, Valera, Gomez-Rodriguez, and Gummadi (2017), in a paper titled "Fairness Constraints: Mechanisms for Fair Classification," proposed a decision-boundary fairness mechanism that adds covariance-based fairness constraints to logistic regression and [support vector machine](/wiki/support_vector_machine_svm) classifiers. Their method approximates disparate impact using the covariance between the protected attribute and the signed distance to the decision boundary, providing fine-grained control over the trade-off between accuracy and fairness.[9]

#### Constrained optimization approaches

Constrained optimization formulates the fairness requirement as a hard constraint rather than a soft penalty:

$$
\min_w L(w) \quad \text{subject to} \quad F(w) \le \epsilon
$$

where epsilon is a tolerance parameter specifying how much unfairness is acceptable. This formulation is mathematically more principled than regularization because it separates the accuracy objective from the fairness requirement, and the tolerance epsilon has a direct interpretation.

Solving this constrained optimization problem is challenging because fairness constraints are typically non-convex and non-differentiable (they involve indicator functions over model predictions and group memberships). Several algorithmic frameworks have been developed to handle these challenges.

#### The exponentiated gradient (reductions) method

Agarwal, Beygelzimer, Dudik, Langford, and Wallach (2018) proposed a reductions approach to fair classification that converts the constrained fairness problem into a sequence of cost-sensitive classification problems. The method uses the exponentiated gradient algorithm to iteratively adjust dual variables (Lagrange multipliers) corresponding to the fairness constraints.[10]

The algorithm works as follows:

1. Initialize the Lagrange multipliers (dual variables) uniformly.
2. At each iteration, solve a cost-sensitive classification problem with costs determined by the current multipliers.
3. Update the multipliers using an exponentiated gradient step, increasing multipliers for violated constraints and decreasing multipliers for satisfied constraints.
4. Return a randomized classifier that mixes the classifiers obtained across iterations.

The exponentiated gradient method supports demographic parity, equalized odds, and other linear fairness constraints. It produces a randomized classifier (a distribution over at most T base classifiers, where T is the number of iterations), and it comes with finite-sample convergence guarantees: the resulting classifier is approximately optimal and approximately feasible. This approach is implemented in Microsoft's [Fairlearn](/wiki/fairlearn) library as the ExponentiatedGradient reduction.[10][19]
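The four steps can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the algorithm of Agarwal et al. in full and not Fairlearn's implementation: the hypothesis class is a small grid of group-specific thresholds, the cost-sensitive step is solved by brute force, the duality-gap stopping test is omitted, and the returned mixture is uniform over all iterations. The data, `bound`, `eta`, and `eps` are invented values.

```python
import math
from itertools import product

# Hypothetical toy data: (score, true label) pairs for two groups.
data = {
    "a": [(0.9, 1), (0.8, 1), (0.7, 1), (0.6, 1), (0.3, 0)],
    "b": [(0.7, 1), (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0)],
}
n_total = sum(len(rows) for rows in data.values())
thresholds = sorted({s for rows in data.values() for s, _ in rows}) + [1.1]

def stats(h):
    """Error rate and signed selection-rate gap (a minus b) of threshold pair h."""
    errors, rates = 0, {}
    for group, t in zip(("a", "b"), h):
        preds = [int(s >= t) for s, _ in data[group]]
        errors += sum(p != y for p, (_, y) in zip(preds, data[group]))
        rates[group] = sum(preds) / len(preds)
    return errors / n_total, rates["a"] - rates["b"]

# Finite hypothesis class: one threshold per group, with precomputed statistics.
table = {h: stats(h) for h in product(thresholds, repeat=2)}

def exponentiated_gradient(eps=0.05, bound=5.0, eta=0.2, iterations=500):
    theta = [0.0, 0.0]  # step 1: equal starting weights for both multipliers
    played = []
    for _ in range(iterations):
        norm = 1.0 + sum(math.exp(t) for t in theta)
        lam = [bound * math.exp(t) / norm for t in theta]
        # Step 2: best response to the current multipliers (brute force here).
        h = min(table, key=lambda c: table[c][0]
                + lam[0] * (table[c][1] - eps) + lam[1] * (-table[c][1] - eps))
        played.append(h)
        gap = table[h][1]
        # Step 3: multiplicative update; a violated constraint gains weight.
        theta[0] += eta * (gap - eps)
        theta[1] += eta * (-gap - eps)
    return played  # step 4: the uniform mixture over these classifiers

played = exponentiated_gradient()
mixture_error = sum(table[h][0] for h in played) / len(played)
mixture_gap = sum(table[h][1] for h in played) / len(played)
unconstrained = min(table, key=lambda c: table[c][0])
unconstrained_gap = table[unconstrained][1]
print(mixture_error, mixture_gap, unconstrained_gap)
```

Each individual best response is an extreme classifier, and it is only the mixture over iterations that approximately satisfies the demographic parity tolerance. This is why the method returns a randomized classifier rather than a single deterministic one.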

#### The proxy-Lagrangian method

Cotter, Jiang, Gupta, Wang, Narayan, You, and Sridharan (2019) addressed the problem that many fairness constraints are non-differentiable (for example, constraints on false positive rates involve indicator functions). Since standard Lagrangian methods require differentiable constraints, they introduced the proxy-Lagrangian formulation.[11]

The proxy-Lagrangian replaces the original non-differentiable constraints with smooth, differentiable proxy constraints for the purpose of gradient computation, while the original constraints are still used by the dual player (the "auditor") to update the Lagrange multipliers. This leads to a two-player non-zero-sum game:

- **Player 1 (the learner)** minimizes the Lagrangian with respect to model parameters, using proxy constraints.
- **Player 2 (the auditor)** maximizes the Lagrangian with respect to the Lagrange multipliers, using the original constraints.

The solution concept is a semi-coarse correlated equilibrium. The resulting stochastic classifier can be expressed as a mixture of at most $$m+1$$ deterministic classifiers, where m is the number of constraints. This method is particularly useful for deep learning models where the loss landscape is non-convex.[11]
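The reason the learner needs a proxy can be seen numerically. In the sketch below, a hypothetical bias parameter `shift` is nudged by a small amount: the indicator-based positive rate does not move, so it offers no gradient signal, while a sigmoid-based proxy does. The scores and the choice of a sigmoid as the proxy are assumptions for illustration.

```python
import math

def hard_rate(scores, shift):
    """Positive rate under the original indicator-based decision rule."""
    return sum(1 for s in scores if s + shift > 0) / len(scores)

def proxy_rate(scores, shift):
    """Smooth stand-in: a sigmoid replaces the indicator function."""
    return sum(1.0 / (1.0 + math.exp(-(s + shift))) for s in scores) / len(scores)

scores = [-1.2, -0.4, 0.3, 0.8]  # hypothetical model scores
step = 1e-3                      # small change to the bias parameter

hard_slope = (hard_rate(scores, step) - hard_rate(scores, 0.0)) / step
proxy_slope = (proxy_rate(scores, step) - proxy_rate(scores, 0.0)) / step
print(hard_slope, proxy_slope)
```

The learner therefore follows the proxy's slope, while the auditor still evaluates `hard_rate`, the quantity the constraint is actually about, when updating the multipliers.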

#### Adversarial debiasing

Zhang, Lemoine, and Mitchell (2018) proposed an in-processing method based on [adversarial learning](/wiki/generative_adversarial_network_gan). The approach trains two networks simultaneously:[12]

- A **predictor network** that maps inputs to predictions of the target variable.
- An **adversary network** that takes the predictor's output and attempts to predict the protected attribute.

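The predictor is trained to maximize prediction accuracy while minimizing the adversary's ability to infer the protected attribute from the predictions. In the predictor's weight update, the component of its own loss gradient that lies along the adversary's loss gradient is projected out, and a scaled copy of the adversary's gradient is subtracted as well. The predictor therefore never moves in a direction that helps the adversary and actively moves to increase the adversary's loss, which pushes it toward predictions that are informative for the task but uninformative about group membership.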

Adversarial debiasing can approximately enforce demographic parity (by making predictions independent of the protected attribute) or equalized odds (by conditioning the adversary on the true label). It is implemented in IBM's [AI Fairness 360](/wiki/aif360) toolkit.[12][13]
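The predictor's update direction in Zhang et al. (2018) combines the predictor's loss gradient, its projection onto the adversary's loss gradient, and a scaled adversary gradient. The sketch below computes that direction for plain Python lists; the two-dimensional gradients and the value of `alpha` are made up, and a real implementation would operate on the network's weight tensors.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def debiased_gradient(grad_predictor, grad_adversary, alpha):
    """grad_predictor - proj_{grad_adversary}(grad_predictor) - alpha * grad_adversary."""
    norm_sq = dot(grad_adversary, grad_adversary)
    scale = dot(grad_predictor, grad_adversary) / norm_sq if norm_sq > 0 else 0.0
    return [gp - scale * ga - alpha * ga
            for gp, ga in zip(grad_predictor, grad_adversary)]

# Hypothetical gradients of the predictor loss and the adversary loss
# with respect to the predictor's weights.
update = debiased_gradient([1.0, 1.0], [1.0, 0.0], alpha=0.5)
print(update)
```

In this example the first coordinate, where the two gradients align, is cancelled and then pushed against the adversary, while the second coordinate, which the adversary does not use, is left untouched.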

### Post-processing methods

Post-processing methods adjust the predictions of a trained model to satisfy fairness constraints, without modifying the training procedure itself.

**Threshold adjustment.** Hardt, Price, and Srebro (2016) showed that for any binary classifier producing a continuous score, one can derive group-specific thresholds, randomizing between thresholds where necessary, that convert the scores into binary predictions satisfying equalized odds. The method solves a linear program to find the thresholds that maximize accuracy subject to the equalized odds constraint. This approach is model-agnostic and requires access only to the model's predictions and the protected attribute values.[5]
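A simplified sketch of group-specific thresholding follows. It targets only equality of opportunity (equal true positive rates), for which deterministic per-group thresholds suffice on this toy data; it does not solve the linear program needed for full equalized odds. The scores, labels, and the target rate of 0.75 are hypothetical.

```python
import math

def threshold_for_tpr(scores, labels, target_tpr):
    """Highest threshold at which the true positive rate reaches target_tpr."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    needed = max(1, math.ceil(target_tpr * len(positives) - 1e-9))
    return positives[needed - 1]

def tpr(scores, labels, threshold):
    """Share of true positives scored at or above the threshold."""
    hits = [s >= threshold for s, y in zip(scores, labels) if y == 1]
    return sum(hits) / len(hits)

# Hypothetical (scores, labels) per group; group b is scored systematically lower.
groups = {
    "a": ([0.9, 0.8, 0.6, 0.4, 0.5, 0.2], [1, 1, 1, 1, 0, 0]),
    "b": ([0.7, 0.5, 0.3, 0.2, 0.4, 0.1], [1, 1, 1, 1, 0, 0]),
}

thresholds = {g: threshold_for_tpr(s, y, 0.75) for g, (s, y) in groups.items()}
tprs = {g: tpr(s, y, thresholds[g]) for g, (s, y) in groups.items()}
shared = {g: tpr(s, y, 0.6) for g, (s, y) in groups.items()}  # single threshold, for contrast
print(thresholds, tprs, shared)
```

With a single shared threshold of 0.6 the true positive rates are 0.75 and 0.25; with thresholds of 0.6 and 0.3 they are both 0.75.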

**Reject option classification.** Kamiran, Karim, and Zhang (2012) proposed changing predictions for instances near the decision boundary. For individuals in the unprivileged group who are close to the boundary and receive a negative prediction, the prediction is flipped to positive. For individuals in the privileged group who are close to the boundary and receive a positive prediction, the prediction is flipped to negative.
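The flipping rule can be sketched in a few lines. The decision threshold of 0.5, the margin of 0.1 that defines "near the boundary", and the group labels are illustrative assumptions; in practice the margin is tuned against a fairness metric.

```python
def reject_option(scores, groups, unprivileged, threshold=0.5, margin=0.1):
    """Inside the critical region, favor the unprivileged group; elsewhere, threshold as usual."""
    predictions = []
    for score, group in zip(scores, groups):
        if abs(score - threshold) <= margin:
            # Near the boundary: positive for unprivileged, negative for privileged.
            predictions.append(1 if group == unprivileged else 0)
        else:
            predictions.append(int(score >= threshold))
    return predictions

# Hypothetical scores; "u" is the unprivileged group and "p" the privileged group.
scores = [0.45, 0.55, 0.45, 0.55, 0.90, 0.10]
groups = ["u", "u", "p", "p", "p", "u"]
adjusted = reject_option(scores, groups, unprivileged="u")
print(adjusted)
```

Only the four scores within 0.1 of the threshold are affected; the confident predictions at 0.90 and 0.10 keep their original labels regardless of group.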

**Calibrated equalized odds.** Pleiss, Raghavan, Wu, Kleinberg, and Weinberger (2017) developed a post-processing method that adjusts a calibrated classifier to satisfy equalized odds while preserving calibration as much as possible. Consistent with the impossibility results, they showed that calibration is compatible only with a single equalized-odds error constraint (such as equal false negative rates), not both simultaneously.[14]

Post-processing methods have the advantage of being applicable to any existing model without retraining. However, they generally provide weaker fairness guarantees than in-processing methods because they cannot change the model's internal representations.

## How do the enforcement approaches compare?

The following table compares the three categories of fairness constraint enforcement.

| Property | Pre-processing | In-processing | Post-processing |
|---|---|---|---|
| When applied | Before training | During training | After training |
| Model access required | Data only | Full model and training loop | Predictions only |
| Model-agnostic | Yes | No (method-specific) | Yes |
| Formal guarantees | Typically weak | Can be strong (e.g., exponentiated gradient) | Moderate (e.g., linear programming for threshold adjustment) |
| Computational cost | Low to moderate | Moderate to high | Low |
| Accuracy impact | Can degrade if too much information is removed | Can be controlled via constraint tolerance | Minimal if original model is strong |
| Supported fairness definitions | Primarily demographic parity | Flexible (DP, EO, equality of opportunity, etc.) | Primarily equalized odds, demographic parity |
| Example methods | Reweighing, disparate impact remover, fair representations | Exponentiated gradient, adversarial debiasing, regularization | Threshold adjustment, reject option, calibrated EO |

## Mathematical formulation

This section provides a more detailed mathematical treatment of constrained optimization for fairness.

### The constrained learning problem

Let X denote the feature space, Y the label space ($$Y = \{0, 1\}$$ for binary classification), and A the set of protected attribute values. Let D be the joint distribution over (X, A, Y). Let H be a hypothesis class of classifiers $$h: X \to Y$$.

The goal is to find a classifier h* that minimizes the expected loss while satisfying fairness constraints:

$$
h^* = \arg\min_{h \in H} \mathbb{E}_{(x,a,y) \sim D}[l(h(x), y)]
$$

$$
\text{subject to:} \quad g_j(h, D) \le \epsilon_j \quad \text{for } j = 1, \ldots, m
$$

where l is a loss function, g_j are constraint functions encoding fairness requirements, and epsilon_j are tolerance parameters.

For demographic parity, the constraint function might be:

$$
g(h, D) = \left| P(h(X) = 1 \mid A = a) - P(h(X) = 1 \mid A = b) \right|
$$

For equalized odds, there are two constraint functions (one for each value of Y):

$$
g_y(h, D) = \left| P(h(X) = 1 \mid A = a, Y = y) - P(h(X) = 1 \mid A = b, Y = y) \right| \quad \text{for } y \in \{0, 1\}
$$

### Lagrangian relaxation

The standard approach to constrained optimization is Lagrangian relaxation. Define the Lagrangian:

$$
L(h, \lambda) = \mathbb{E}_{(x,a,y) \sim D}[l(h(x), y)] + \sum_{j=1}^{m} \lambda_j \left( g_j(h, D) - \epsilon_j \right)
$$

where $$\lambda = (\lambda_1, \ldots, \lambda_m)$$ are non-negative Lagrange multipliers. The constrained problem is equivalent to the saddle-point problem:

$$
\min_h \max_{\lambda \ge 0} L(h, \lambda)
$$

Under certain convexity conditions, strong duality holds and the saddle-point solution coincides with the constrained optimum. For non-convex problems (such as training [neural networks](/wiki/neural_network)), strong duality may not hold, but the Lagrangian relaxation still provides a useful algorithmic framework.
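The equivalence with the constrained problem can be seen by carrying out the inner maximization for a fixed classifier h, writing each penalty term as $$\lambda_j (g_j(h, D) - \epsilon_j)$$. If every constraint holds, each term is non-positive and the best the multipliers can do is zero; if any constraint is violated, the corresponding multiplier can grow without bound:

$$
\max_{\lambda \ge 0} \left[ \mathbb{E}[l(h(x), y)] + \sum_{j=1}^{m} \lambda_j \left( g_j(h, D) - \epsilon_j \right) \right] =
\begin{cases}
\mathbb{E}[l(h(x), y)] & \text{if } g_j(h, D) \le \epsilon_j \text{ for all } j \\
+\infty & \text{otherwise}
\end{cases}
$$

The outer minimization over h therefore never selects an infeasible classifier, and among feasible classifiers it minimizes the expected loss.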

### Convergence and feasibility guarantees

The exponentiated gradient method of Agarwal et al. (2018) provides the following guarantee: after T iterations, the algorithm returns a randomized classifier whose expected loss is at most $$\text{OPT} + O(1/\sqrt{T})$$ and whose constraint violation is at most $$O(1/\sqrt{T})$$, where OPT is the optimal loss among all feasible classifiers. These are finite-sample bounds that hold for a fixed dataset.[10]

The proxy-Lagrangian method of Cotter et al. (2019) achieves similar convergence rates for non-convex, non-differentiable constraints, at the cost of requiring smooth proxy constraints.[11]

## Fairness constraints in gradient boosting

Gradient boosted decision trees ([GBDT](/wiki/gradient_boosting)) are widely used in high-stakes applications like credit scoring and fraud detection. Standard GBDT implementations (XGBoost, LightGBM, CatBoost) do not natively support fairness constraints. Several methods have been proposed to fill this gap.

**FairGBM.** Cruz, Belem, Jesus, Bravo, Saleiro, and Bizarro (2023, ICLR) developed FairGBM, a framework for training gradient boosted trees with fairness constraints. FairGBM uses smooth convex proxies for non-differentiable fairness metrics and optimizes them using a dual ascent (proxy-Lagrangian) procedure. The method is implemented as a fork of Microsoft's [LightGBM](/wiki/lightgbm) and supports demographic parity, equalized odds, and equality of opportunity. The authors report that FairGBM achieves comparable predictive performance to unconstrained GBDT while satisfying fairness constraints, with "an order of magnitude speedup in training time relative to related work": the total time to train all FairGBM models was under a tenth of the time needed for the equivalent exponentiated gradient (reductions) models.[15]

**Fair regularization for GBDT.** An alternative approach adds fairness-aware regularization terms to the GBDT objective function, penalizing splits that increase group disparity. This is conceptually simpler but provides weaker control over the fairness-accuracy trade-off.

## What software toolkits implement fairness constraints?

Several open-source libraries implement fairness constraint methods. The table below compares the major toolkits.

| Toolkit | Developer | Language | Key fairness constraint methods | Fairness metrics |
|---|---|---|---|---|
| [Fairlearn](/wiki/fairlearn) | Microsoft Research | Python | ExponentiatedGradient, GridSearch, ThresholdOptimizer, CorrelationRemover, AdversarialFairnessClassifier | Demographic parity, equalized odds, bounded group loss, error rate parity |
| [AI Fairness 360](/wiki/aif360) (AIF360) | IBM Research | Python, R | Reweighing, adversarial debiasing, prejudice remover, meta-fair classifier, reject option classification, calibrated equalized odds | Disparate impact, statistical parity difference, equal opportunity difference, average odds difference, Theil index |
| FairGBM | Feedzai Research | Python (C++ core) | Proxy-Lagrangian dual ascent for GBDT | Demographic parity, equalized odds, equality of opportunity |
| Themis-ML | Various | Python | Relabeling, additive counterfactually fair model | Demographic parity |
| What-If Tool | Google | JavaScript/Python | Threshold adjustment (interactive) | Multiple (user-configurable) |

## Real-world applications

Fairness constraints have been applied in numerous domains where algorithmic decisions affect people's lives.

### Criminal justice

Recidivism prediction instruments like COMPAS are used by courts and parole boards across the United States to estimate the likelihood of reoffense. After ProPublica's 2016 analysis revealed racial disparities in COMPAS scores (45 percent of non-reoffending Black defendants flagged high risk versus 23 percent of white defendants), researchers applied equalized odds and calibration constraints to recidivism prediction models.[3] Studies have shown that fairness constraints can reduce racial disparities in false positive rates without a large cost in overall predictive accuracy.

### Lending and credit

Credit scoring models determine who receives loans and at what interest rates. The Equal Credit Opportunity Act in the United States prohibits discrimination based on race, sex, and other protected attributes. Fairness constraints, particularly demographic parity and equalized odds, have been applied to ensure that approval rates and error rates do not systematically differ across racial or gender groups. The original equalized-odds paper used FICO credit scores as its central case study, examining how group-specific thresholds affect loan qualification across racial groups.[5] Microsoft Research has demonstrated the use of Fairlearn's exponentiated gradient method for fair credit modeling.

### Hiring and recruitment

Automated resume screening and candidate ranking tools can perpetuate historical hiring biases. Amazon began building an experimental recruiting tool in 2014 that scored applicants from one to five stars, but according to a 2018 Reuters report the company scrapped it after finding that it penalized resumes containing the word "women's" (as in "women's chess club captain") and downgraded graduates of two all-women colleges, because it had been trained on ten years of mostly male resumes.[16] Fairness constraints such as demographic parity can ensure that the selection rate for each demographic group meets a minimum threshold, consistent with the four-fifths (80 percent) rule used by the U.S. Equal Employment Opportunity Commission.[4]

### Healthcare

Algorithms that allocate healthcare resources (such as prioritizing patients for care management programs) have been found to exhibit racial bias. Obermeyer, Powers, Vogeli, and Mullainathan (2019), writing in Science, showed that a widely used commercial algorithm exhibited significant racial bias because it used healthcare cost as a proxy for health need; the authors found that at a given risk score, Black patients were considerably sicker than white patients, and that reformulating the algorithm to predict need rather than cost would raise the share of Black patients flagged for extra help from 17.7 percent to 46.5 percent.[17] Applying fairness constraints to equalize positive prediction rates or error rates across racial groups can help mitigate such disparities.

## What is the fairness-accuracy trade-off?

Imposing fairness constraints generally reduces the model's predictive accuracy relative to an unconstrained model. This happens because the unconstrained optimum may involve predictions that are correlated with the protected attribute, and the fairness constraint forces the model away from this optimum.

The magnitude of the trade-off depends on several factors:

- **The strength of the constraint.** Tighter constraints (smaller epsilon) impose a larger accuracy cost.
- **The base rate difference.** When the outcome rates differ substantially across groups, achieving equalized odds or demographic parity requires larger deviations from the unconstrained optimum.
- **The informativeness of the protected attribute.** If the protected attribute is strongly correlated with the target variable (possibly through legitimate features), removing its influence has a larger accuracy cost.
- **The choice of fairness definition.** Some definitions are more restrictive than others. For example, equalized odds constrains both false positive and false negative rates, while equality of opportunity constrains only the false negative rate (or equivalently, the true positive rate).

Empirically, the trade-off is often small for moderate levels of fairness. For example, Agarwal et al. (2018) showed that their exponentiated gradient method achieved near-zero constraint violation with only a small accuracy loss (on the order of 1-2 percentage points) on several benchmark datasets.[10] However, in cases with large base rate differences, the trade-off can be substantial.
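The dependence of the accuracy cost on the tolerance can be traced on a toy problem by brute force. The sketch below enumerates deterministic classifiers made of one threshold per group and, for each tolerance epsilon on the demographic parity gap, reports the lowest achievable error. The data are invented and chosen so that base rates differ sharply between the groups.

```python
from itertools import product

# Hypothetical toy data: (score, true label) pairs for two groups.
data = {
    "a": [(0.9, 1), (0.8, 1), (0.7, 1), (0.6, 1), (0.3, 0)],
    "b": [(0.7, 1), (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0)],
}
n_total = sum(len(rows) for rows in data.values())
thresholds = sorted({s for rows in data.values() for s, _ in rows}) + [1.1]

def stats(t_a, t_b):
    """Error rate and selection-rate gap for one threshold per group."""
    errors, rates = 0, {}
    for group, t in (("a", t_a), ("b", t_b)):
        preds = [int(s >= t) for s, _ in data[group]]
        errors += sum(p != y for p, (_, y) in zip(preds, data[group]))
        rates[group] = sum(preds) / len(preds)
    return errors / n_total, rates["a"] - rates["b"]

candidates = [stats(t_a, t_b) for t_a, t_b in product(thresholds, repeat=2)]

def best_error(eps):
    """Lowest error among classifiers whose parity gap is at most eps."""
    # The small slack absorbs floating-point rounding in the rate differences.
    return min(err for err, gap in candidates if abs(gap) <= eps + 1e-9)

tolerances = [0.0, 0.2, 0.4, 0.6]
best_errors = [best_error(eps) for eps in tolerances]
print(list(zip(tolerances, best_errors)))
```

On this data the best error falls from 0.3 at exact parity to 0.2, 0.1, and 0.0 as the tolerance is relaxed to 0.2, 0.4, and 0.6, illustrating both the effect of constraint strength and the cost of a large base rate difference.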

## Limitations and open challenges

Despite significant progress, fairness constraints have several limitations.

**Choice of fairness definition.** The impossibility results of Chouldechova (2017) and Kleinberg et al. (2016) show that no single classifier can satisfy all common fairness definitions simultaneously.[1][2] Practitioners must choose which definition to enforce, and this choice is inherently a normative judgment rather than a purely technical one.

**Intersectionality.** Most fairness constraint methods consider protected attributes independently (for example, enforcing equal rates separately for race and gender). Enforcing fairness across intersectional groups (for example, Black women or elderly disabled individuals) is more challenging because the number of groups grows combinatorially, and sample sizes within each intersectional group may be small. Cotter, Gupta, Jiang, Srebro, Sridharan, Wang, Woodworth, and You (2019) proposed learning multiplier models to handle large numbers of constraints arising from intersectional groups.

**Distributional shift.** Fairness constraints are typically enforced on the training distribution. If the deployment distribution differs (for instance, because of population changes or feedback loops), the trained model may violate the fairness constraints on the new distribution.
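The effect can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not a method from the cited literature: scores are drawn from Gaussians, demographic parity is enforced by choosing a separate threshold per group on the training sample, and the deployment population for group `b` is shifted upward. All names and distribution parameters are hypothetical.

```python
import random


def selection_rate(scores, threshold):
    """Fraction of scores at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def threshold_for_rate(scores, target_rate):
    """Threshold that selects roughly target_rate of the given scores."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_rate * len(ranked)))
    return ranked[k - 1]


def parity_gap(data, thresholds):
    """Largest difference in selection rate between any two groups."""
    rates = [selection_rate(data[g], thresholds[g]) for g in data]
    return max(rates) - min(rates)


rng = random.Random(0)
n = 5000

# Training population: group b scores are lower on average.
train = {
    "a": [rng.gauss(0.0, 1.0) for _ in range(n)],
    "b": [rng.gauss(-0.5, 1.0) for _ in range(n)],
}
# Per-group thresholds fitted so each group is selected at a 30% rate.
thresholds = {g: threshold_for_rate(s, 0.3) for g, s in train.items()}

# Deployment population: group b's score distribution has moved.
deploy = {
    "a": [rng.gauss(0.0, 1.0) for _ in range(n)],
    "b": [rng.gauss(0.5, 1.0) for _ in range(n)],
}

train_gap = parity_gap(train, thresholds)
deploy_gap = parity_gap(deploy, thresholds)
print(f"parity gap on training data:   {train_gap:.3f}")
print(f"parity gap on deployment data: {deploy_gap:.3f}")
```

The thresholds satisfy the constraint on the sample they were fitted to, but nothing ties them to the shifted population, so the gap reopens at deployment unless the constraint is re-estimated or monitored.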

**Fairness gerrymandering.** Kearns, Neel, Roth, and Wu (2018) showed that a classifier can satisfy fairness constraints for a predefined set of groups while being unfair to subgroups not included in the constraint set. They proposed algorithms for preventing this by ensuring fairness over a rich (exponentially large) class of structured subgroups.[18]
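A minimal construction illustrates the problem. The sketch below uses two hypothetical binary attributes, four equally sized subgroups, and a classifier that predicts the positive class only when the two attributes match. It is a toy example in the spirit of the gerrymandering argument, not an implementation of the auditing algorithms from the cited paper.

```python
from collections import defaultdict
from itertools import product

# 25 individuals in each of the four (attr_1, attr_2) combinations.
population = [(a, b) for a, b in product((0, 1), repeat=2) for _ in range(25)]


def classifier(attr_1, attr_2):
    """Toy classifier: positive only when the two attributes are equal."""
    return int(attr_1 == attr_2)


def selection_rates(group_key):
    """Positive-prediction rate for each group defined by group_key."""
    totals, positives = defaultdict(int), defaultdict(int)
    for attr_1, attr_2 in population:
        key = group_key(attr_1, attr_2)
        totals[key] += 1
        positives[key] += classifier(attr_1, attr_2)
    return {key: positives[key] / totals[key] for key in totals}


by_attr_1 = selection_rates(lambda a, b: a)
by_attr_2 = selection_rates(lambda a, b: b)
by_subgroup = selection_rates(lambda a, b: (a, b))

print("by attr_1:  ", by_attr_1)
print("by attr_2:  ", by_attr_2)
print("by subgroup:", by_subgroup)
```

Every group defined by a single attribute is selected at a rate of 0.5, so demographic parity holds for each attribute taken separately, yet two of the four intersectional subgroups are never selected.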

**Proxy attributes.** Even when a model does not directly use the protected attribute, other features may serve as proxies (for example, zip code as a proxy for race). Fairness constraints that operate only on the explicit protected attribute may not capture discrimination mediated through proxy features. The Obermeyer study is a canonical example of proxy-driven bias: the algorithm predicted healthcare costs as a stand-in for health needs, and that choice of proxy introduced racial bias.[17]

**Causal reasoning.** Observational fairness definitions (demographic parity, equalized odds) do not distinguish between legitimate and illegitimate uses of the protected attribute. Causal fairness definitions (such as counterfactual fairness) attempt to address this but require a causal model of the data generating process, which is often unavailable or contested.

## Historical timeline

The following table traces the development of key ideas in fairness constraints for machine learning.

| Year | Development | Authors |
|---|---|---|
| 2008 | Formalization of discrimination-aware data mining | Pedreshi, Ruggieri, and Turini |
| 2010 | Demographic parity for classifiers | Calders and Verwer |
| 2012 | Individual fairness via Lipschitz constraints ("Fairness Through Awareness") | Dwork, Hardt, Pitassi, Reingold, and Zemel |
| 2012 | Data preprocessing by reweighing for fairness | Kamiran and Calders |
| 2016 | Equalized odds and equality of opportunity | Hardt, Price, and Srebro |
| 2016 | Impossibility results for simultaneous calibration and error rate balance | Kleinberg, Mullainathan, and Raghavan; Chouldechova |
| 2017 | Decision-boundary fairness constraints for classifiers | Zafar, Valera, Gomez-Rodriguez, and Gummadi |
| 2017 | Counterfactual fairness via causal models | Kusner, Loftus, Russell, and Silva |
| 2018 | Exponentiated gradient reductions approach to fair classification | Agarwal, Beygelzimer, Dudik, Langford, and Wallach |
| 2018 | Adversarial debiasing | Zhang, Lemoine, and Mitchell |
| 2018 | Release of IBM AI Fairness 360 toolkit | Bellamy et al. (IBM Research) |
| 2019 | Proxy-Lagrangian method for non-differentiable fairness constraints | Cotter, Jiang, Gupta, et al. |
| 2020 | Release of Microsoft Fairlearn toolkit | Bird, Dudik, Edgar, et al. |
| 2023 | FairGBM for gradient boosting with fairness constraints | Cruz, Belem, Jesus, et al. |

## See also

- [Algorithmic fairness](/wiki/algorithmic_fairness)
- [AI bias](/wiki/ai_bias)
- [Demographic parity](/wiki/demographic_parity)
- [Equalized odds](/wiki/equalized_odds)
- [Equality of opportunity](/wiki/equality_of_opportunity)
- [Disparate impact](/wiki/disparate_impact)
- [Disparate treatment](/wiki/disparate_treatment)
- [Counterfactual fairness](/wiki/counterfactual_fairness)
- [Individual fairness](/wiki/individual_fairness)
- [Bias (ethics/fairness)](/wiki/bias_ethics_fairness)
- [AI Fairness 360](/wiki/aif360)
- [Fairlearn](/wiki/fairlearn)
- [Loss function](/wiki/loss_function)
- [Regularization](/wiki/regularization)

## References

1. Chouldechova, A. (2017). "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments." *Big Data*, 5(2), pp. 153-163. https://doi.org/10.1089/big.2016.0047
2. Kleinberg, J., Mullainathan, S., and Raghavan, M. (2016). "Inherent Trade-Offs in the Fair Determination of Risk Scores." *Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS 2017)*, LIPIcs 67, pp. 43:1-43:23. arXiv:1609.05807. https://arxiv.org/abs/1609.05807
3. Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). "Machine Bias." *ProPublica*, May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
4. U.S. Equal Employment Opportunity Commission et al. (1978). "Uniform Guidelines on Employee Selection Procedures," 29 CFR Part 1607, Section 1607.4(D) (the four-fifths / 80% rule). https://www.ecfr.gov/current/title-29/subtitle-B/chapter-XIV/part-1607
5. Hardt, M., Price, E., and Srebro, N. (2016). "Equality of Opportunity in Supervised Learning." *Advances in Neural Information Processing Systems (NeurIPS) 29*, pp. 3315-3323. arXiv:1610.02413. https://arxiv.org/abs/1610.02413
6. Kamiran, F. and Calders, T. (2012). "Data Preprocessing Techniques for Classification without Discrimination." *Knowledge and Information Systems*, 33(1), pp. 1-33. https://doi.org/10.1007/s10115-011-0463-8
7. Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C., and Venkatasubramanian, S. (2015). "Certifying and Removing Disparate Impact." *Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, pp. 259-268. arXiv:1412.3756. https://arxiv.org/abs/1412.3756
8. Zemel, R., Wu, Y., Swersky, K., Pitassi, T., and Dwork, C. (2013). "Learning Fair Representations." *Proceedings of the 30th International Conference on Machine Learning (ICML)*, PMLR 28(3), pp. 325-333. https://proceedings.mlr.press/v28/zemel13.html
9. Zafar, M. B., Valera, I., Gomez-Rodriguez, M., and Gummadi, K. P. (2017). "Fairness Constraints: Mechanisms for Fair Classification." *Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS)*, PMLR 54, pp. 962-970. https://proceedings.mlr.press/v54/zafar17a.html
10. Agarwal, A., Beygelzimer, A., Dudik, M., Langford, J., and Wallach, H. (2018). "A Reductions Approach to Fair Classification." *Proceedings of the 35th International Conference on Machine Learning (ICML)*, PMLR 80, pp. 60-69. arXiv:1803.02453. https://arxiv.org/abs/1803.02453
11. Cotter, A., Jiang, H., Gupta, M., Wang, S., Narayan, T., You, S., and Sridharan, K. (2019). "Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals." *Journal of Machine Learning Research*, 20(172), pp. 1-59. https://jmlr.org/papers/v20/18-616.html
12. Zhang, B. H., Lemoine, B., and Mitchell, M. (2018). "Mitigating Unwanted Biases with Adversarial Learning." *Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES)*, pp. 335-340. arXiv:1801.07593. https://arxiv.org/abs/1801.07593
13. Bellamy, R. K. E., Dey, K., Hind, M., et al. (2019). "AI Fairness 360: An Extensible Toolkit for Detecting and Mitigating Algorithmic Bias." *IBM Journal of Research and Development*, 63(4/5), pp. 4:1-4:15. arXiv:1810.01943. https://arxiv.org/abs/1810.01943
14. Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., and Weinberger, K. Q. (2017). "On Fairness and Calibration." *Advances in Neural Information Processing Systems (NeurIPS) 30*. https://proceedings.neurips.cc/paper/2017/hash/b8b9c74ac526fffbeb2d39ab038d1cd7-Abstract.html
15. Cruz, A. F., Belem, C., Jesus, S., Bravo, J., Saleiro, P., and Bizarro, P. (2023). "FairGBM: Gradient Boosting with Fairness Constraints." *Proceedings of the Eleventh International Conference on Learning Representations (ICLR)*. arXiv:2209.07850. https://arxiv.org/abs/2209.07850
16. Dastin, J. (2018). "Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women." *Reuters*, October 10, 2018. https://www.reuters.com/article/world/insight-amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK0AG/
17. Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. (2019). "Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations." *Science*, 366(6464), pp. 447-453. https://doi.org/10.1126/science.aax2342
18. Kearns, M., Neel, S., Roth, A., and Wu, Z. S. (2018). "Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness." *Proceedings of the 35th International Conference on Machine Learning (ICML)*, PMLR 80, pp. 2564-2572. arXiv:1711.05144. https://arxiv.org/abs/1711.05144
19. Bird, S., Dudik, M., Edgar, R., et al. (2020). "Fairlearn: A Toolkit for Assessing and Improving Fairness in AI." Microsoft Research Technical Report MSR-TR-2020-32. https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/