A fairness constraint is a mathematical condition imposed on a machine learning model during training, evaluation, or post-processing to ensure that its predictions or decisions satisfy a specified fairness criterion with respect to protected attributes such as race, gender, age, or disability status. Fairness constraints formalize the intuition that automated systems should not systematically disadvantage particular demographic groups, and they provide a principled mechanism for balancing predictive accuracy against equitable outcomes.
Fairness constraints have become a central topic in responsible artificial intelligence research and practice. They appear in hiring algorithms, credit scoring systems, criminal justice risk assessments, and healthcare resource allocation, where biased predictions can cause measurable harm to individuals and communities.
Imagine you are dividing candy among your friends. You want to be fair, so you make a rule: everyone should get roughly the same amount, no matter what color shirt they are wearing. A fairness constraint in machine learning works the same way. When a computer learns to make decisions (like who gets a loan or who gets called for a job interview), a fairness constraint is a rule that tells the computer it cannot give better results to one group of people over another just because of something like their gender or race. There are different rules for different kinds of fairness, and the computer has to follow whichever rule the designer picks while still trying to make good decisions overall.
Machine learning models learn patterns from historical data. When that data reflects past societal biases, the resulting models can replicate or amplify those biases. For example, a resume screening tool trained on historical hiring decisions may learn to penalize female applicants if the training data reflects a period when women were hired at lower rates. Similarly, a recidivism prediction model may assign higher risk scores to individuals from certain racial groups if the underlying arrest data is skewed by discriminatory policing practices.
Fairness constraints address this problem by encoding equity requirements directly into the learning process. Rather than relying solely on post-hoc audits, they allow practitioners to specify what "fair" means in mathematical terms and to optimize the model subject to those requirements.
The formal study of fairness constraints accelerated after several high-profile incidents. In 2016, ProPublica reported that the COMPAS recidivism prediction tool exhibited racial disparities, with Black defendants roughly twice as likely as white defendants to be incorrectly flagged as high risk. The same year, researchers demonstrated that word embeddings trained on large text corpora encoded gender stereotypes (for instance, associating "nurse" with female and "engineer" with male). These cases motivated a wave of research on how to formalize and enforce fairness in algorithmic systems.
A fairness constraint requires a precise mathematical definition of what constitutes fair behavior. Multiple definitions have been proposed, each capturing a different aspect of fairness. The table below summarizes the most widely used definitions.
| Fairness definition | Mathematical condition | Intuition | Introduced by |
|---|---|---|---|
| Demographic parity | P(Ŷ = 1 | A = a) = P(Ŷ = 1 | A = b) for all groups a, b | Positive prediction rates are equal across groups | Calders and Verwer (2010) |
| Equalized odds | P(Ŷ = 1 | A = a, Y = y) = P(Ŷ = 1 | A = b, Y = y) for all y | True positive and false positive rates are equal across groups | Hardt, Price, and Srebro (2016) |
| Equality of opportunity | P(Ŷ = 1 | A = a, Y = 1) = P(Ŷ = 1 | A = b, Y = 1) | True positive rates are equal across groups (relaxation of equalized odds) | Hardt, Price, and Srebro (2016) |
| Predictive parity | P(Y = 1 | Ŷ = 1, A = a) = P(Y = 1 | Ŷ = 1, A = b) | Positive predictive values are equal across groups | Chouldechova (2017) |
| Calibration | P(Y = 1 | S = s, A = a) = P(Y = 1 | S = s, A = b) for all scores s | Predicted probabilities reflect true outcome rates equally across groups | Kleinberg, Mullainathan, and Raghavan (2016) |
| Individual fairness | d(f(x_i), f(x_j)) ≤ L · d(x_i, x_j) | Similar individuals receive similar predictions (Lipschitz condition) | Dwork, Hardt, Pitassi, Reingold, and Zemel (2012) |
| Counterfactual fairness | P(Ŷ_{A←a} = y | X = x, A = a) = P(Ŷ_{A←b} = y | X = x, A = a) | Prediction does not change in a counterfactual world where the individual belongs to a different group | Kusner, Loftus, Russell, and Silva (2017) |
| Disparate impact ratio | P(Ŷ = 1 | A = unprivileged) / P(Ŷ = 1 | A = privileged) ≥ 0.8 | Selection rate for the unprivileged group is at least 80% of the privileged group (the "four-fifths rule") | EEOC (1978), adapted for ML |
In this table, Ŷ denotes the model's prediction, Y denotes the true outcome, A denotes the sensitive (protected) attribute, S denotes a score or predicted probability, and f denotes the classifier mapping.
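As a concrete illustration, the group fairness conditions above can be estimated directly from a sample of predictions. The following minimal sketch (plain NumPy; function and array names are illustrative assumptions, with binary groups encoded as A ∈ {0, 1}) computes the demographic parity difference and the equalized odds gaps:

```python
import numpy as np

def demographic_parity_difference(y_pred, a):
    """Gap in positive prediction rates between groups A = 1 and A = 0."""
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())

def equalized_odds_gaps(y_true, y_pred, a):
    """Gaps in the true positive rate (Y = 1 slice) and the false positive
    rate (Y = 0 slice) between groups A = 1 and A = 0."""
    gaps = {}
    for yv in (0, 1):
        rate_1 = y_pred[(a == 1) & (y_true == yv)].mean()
        rate_0 = y_pred[(a == 0) & (y_true == yv)].mean()
        gaps["TPR" if yv == 1 else "FPR"] = abs(rate_1 - rate_0)
    return gaps
```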
Group fairness definitions (demographic parity, equalized odds, equality of opportunity, predictive parity, calibration) require statistical properties to hold at the level of demographic groups. Individual fairness, by contrast, requires that any two individuals who are similar with respect to the task at hand receive similar predictions. The similarity is measured by a task-specific distance metric, and the constraint is expressed as a Lipschitz condition on the classifier.
Group fairness is easier to measure and enforce because it only requires aggregate statistics. Individual fairness is more demanding because it requires specifying what "similar" means for each pair of individuals, which itself may be contested. In practice, most fairness constraint methods target group fairness definitions.
A set of impossibility results, demonstrated independently by Chouldechova (2017) and Kleinberg, Mullainathan, and Raghavan (2016), shows that several common fairness definitions cannot be satisfied simultaneously except in trivial cases. Specifically, when the base rates (the proportion of positive outcomes) differ across groups, it is impossible for a classifier to simultaneously achieve calibration, false positive rate balance (equal false positive rates), and false negative rate balance (equal false negative rates), unless the classifier is a perfect predictor.
These impossibility results mean that practitioners must choose which fairness definition to prioritize. The choice depends on the application context, the stakeholders involved, and the specific harms that the system could cause.
Techniques for incorporating fairness constraints into machine learning fall into three broad categories based on when the intervention occurs in the modeling pipeline.
Pre-processing methods modify the training data before the model is trained. The goal is to remove or reduce bias in the data so that a standard (unconstrained) learning algorithm produces fair predictions.
Reweighing. Proposed by Kamiran and Calders (2012), reweighing assigns instance-level weights to the training data such that the weighted distribution satisfies demographic parity with respect to the protected attribute. Instances from underrepresented (group, label) combinations receive higher weights, and instances from overrepresented combinations receive lower weights. The classifier then trains on the reweighted data using standard methods.
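A minimal sketch of the reweighing computation (variable names are illustrative): each instance receives the weight P(A = a) · P(Y = y) / P(A = a, Y = y), which makes the protected attribute and the label statistically independent under the weighted distribution.

```python
import numpy as np

def reweighing_weights(y, a):
    """Weight each instance by P(A = a) * P(Y = y) / P(A = a, Y = y) so that
    the protected attribute and the label are independent under the weighted
    training distribution."""
    w = np.ones(len(y), dtype=float)
    for av in np.unique(a):
        for yv in np.unique(y):
            mask = (a == av) & (y == yv)
            if mask.any():
                w[mask] = (a == av).mean() * (y == yv).mean() / mask.mean()
    return w

# The weights can be passed to most scikit-learn estimators, e.g.
# LogisticRegression().fit(X, y, sample_weight=reweighing_weights(y, a)).
```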
Disparate impact remover. Proposed by Feldman, Friedler, Moeller, Scheidegger, and Venkatasubramanian (2015), this method transforms feature values so that the distributions of each feature are the same across groups, while preserving within-group ranking. The transformed data can then be used with any classifier.
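The core idea can be sketched for a single numeric feature as a rank-preserving quantile mapping. The simplified version below targets the pooled distribution rather than the "median" distribution of the original method and performs full repair only (names are illustrative):

```python
import numpy as np

def repair_feature(x, a):
    """Full-repair sketch: replace each value with the pooled-distribution
    quantile matching its within-group rank, so per-group distributions
    coincide while within-group ordering is preserved."""
    x_repaired = np.empty(len(x), dtype=float)
    pooled = np.sort(x)
    for g in np.unique(a):
        idx = np.where(a == g)[0]
        ranks = np.argsort(np.argsort(x[idx])) / max(len(idx) - 1, 1)
        x_repaired[idx] = np.quantile(pooled, ranks)
    return x_repaired
```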
Learning fair representations. Proposed by Zemel, Wu, Swersky, Pitassi, and Dwork (2013), this approach learns a latent representation of the data that encodes the information needed for prediction while being statistically independent of the protected attribute. Any classifier trained on this representation inherits the fairness property.
Pre-processing methods are attractive because they are model-agnostic: once the data is transformed, any learning algorithm can be applied. However, they may lose information that is useful for prediction, and they do not provide formal guarantees on the fairness of the downstream classifier.
In-processing methods incorporate fairness constraints directly into the model training procedure. They modify the optimization problem that the learning algorithm solves so that the resulting model satisfies the desired fairness criterion.
The simplest in-processing strategy adds a fairness penalty term to the loss function. The modified objective takes the form:
min_w L(w) + lambda * F(w)
where L(w) is the standard loss (for example, cross-entropy for classification), F(w) is a measure of unfairness (for example, the difference in positive prediction rates across groups), and lambda is a hyperparameter controlling the strength of the fairness penalty.
This approach is simple to implement and compatible with standard gradient descent optimizers. However, it does not guarantee that the fairness constraint is satisfied exactly; it only encourages the model in the direction of fairness. The level of fairness achieved depends on the choice of lambda, which must be tuned empirically.
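A minimal sketch of the penalty approach for logistic regression trained by gradient descent (NumPy; the squared demographic-parity gap on predicted probabilities serves as the differentiable unfairness measure F, and all names are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_penalized_logreg(X, y, a, lam=1.0, lr=0.1, iters=2000):
    """Logistic regression whose loss is augmented with
    lam * (mean predicted probability in group 1 - mean in group 0)^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ w)
        gap = p[a == 1].mean() - p[a == 0].mean()
        grad_loss = X.T @ (p - y) / n          # cross-entropy gradient
        s = p * (1.0 - p)                      # sigmoid derivative
        grad_gap = ((X[a == 1].T @ s[a == 1]) / (a == 1).sum()
                    - (X[a == 0].T @ s[a == 0]) / (a == 0).sum())
        w -= lr * (grad_loss + lam * 2.0 * gap * grad_gap)  # chain rule on gap^2
    return w
```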
Zafar, Valera, Gomez-Rodriguez, and Gummadi (2017) proposed a decision-boundary fairness mechanism that adds covariance-based fairness constraints to logistic regression and support vector machine classifiers. Their method provides fine-grained control over the trade-off between accuracy and fairness.
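The covariance quantity at the heart of this approach is straightforward to compute; a small sketch for a linear decision boundary parameterized by w (function and variable names are illustrative):

```python
import numpy as np

def boundary_covariance(X, a, w):
    """Empirical covariance between the protected attribute and the signed
    distance to the decision boundary, (1/n) * sum_i (a_i - mean(a)) * (w . x_i),
    which the constraint bounds in absolute value."""
    return np.mean((a - a.mean()) * (X @ w))
```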
Constrained optimization formulates the fairness requirement as a hard constraint rather than a soft penalty:
min_w L(w) subject to F(w) ≤ epsilon
where epsilon is a tolerance parameter specifying how much unfairness is acceptable. This formulation is mathematically more principled than regularization because it separates the accuracy objective from the fairness requirement, and the tolerance epsilon has a direct interpretation.
Solving this constrained optimization problem is challenging because fairness constraints are typically non-convex and non-differentiable (they involve indicator functions over model predictions and group memberships). Several algorithmic frameworks have been developed to handle these challenges.
Agarwal, Beygelzimer, Dudik, Langford, and Wallach (2018) proposed a reductions approach to fair classification that converts the constrained fairness problem into a sequence of cost-sensitive classification problems. The method uses the exponentiated gradient algorithm to iteratively adjust dual variables (Lagrange multipliers) corresponding to the fairness constraints.
The algorithm works as follows:

1. At each iteration, the dual player updates the Lagrange multipliers with an exponentiated gradient step based on how much each fairness constraint is currently violated.
2. The primal player then computes a best response by solving a cost-sensitive classification problem with any off-the-shelf learner, where the costs encode both the loss and the current multipliers.
3. The process repeats for T iterations, and the iterates are combined into the final randomized classifier.
The exponentiated gradient method supports demographic parity, equalized odds, and other linear fairness constraints. It produces a randomized classifier (a distribution over at most T base classifiers, where T is the number of iterations), and it comes with finite-sample convergence guarantees: the resulting classifier is approximately optimal and approximately feasible. This approach is implemented in Microsoft's Fairlearn library.
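A minimal usage sketch with Fairlearn's implementation follows; the synthetic dataset and variable names are illustrative assumptions (the sensitive feature A is deliberately correlated with the label so that an unconstrained model would exhibit a demographic-parity gap):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Illustrative synthetic data: A agrees with the label 70% of the time.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)
A = np.where(rng.random(2000) < 0.7, y, 1 - y)

mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(max_iter=1000),  # any sklearn-style estimator
    constraints=DemographicParity(),              # or EqualizedOdds()
    eps=0.02,                                     # allowed constraint violation
)
mitigator.fit(X, y, sensitive_features=A)
y_pred = mitigator.predict(X)
```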
Cotter, Jiang, Gupta, Wang, Narayan, You, and Sridharan (2019) addressed the problem that many fairness constraints are non-differentiable (for example, constraints on false positive rates involve indicator functions). Since standard Lagrangian methods require differentiable constraints, they introduced the proxy-Lagrangian formulation.
The proxy-Lagrangian replaces the original non-differentiable constraints with smooth, differentiable proxy constraints for the purpose of gradient computation, while the original constraints are still used by the dual player (the "auditor") to update the Lagrange multipliers. This leads to a two-player non-zero-sum game: the learner minimizes the proxy-Lagrangian over the model parameters, while the auditor maximizes the original (non-proxy) Lagrangian over the multipliers.
The solution concept is a semi-coarse correlated equilibrium. The resulting stochastic classifier can be expressed as a mixture of at most m+1 deterministic classifiers, where m is the number of constraints. This method is particularly useful for deep learning models where the loss landscape is non-convex.
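The following toy sketch illustrates the idea (it is not Cotter et al.'s algorithm): for a linear model, the learner descends on the log-loss plus a smooth proxy for the demographic-parity gap, while the auditor updates the multiplier using the original thresholded, non-differentiable gap. All names and the simple update scheme are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_proxy_lagrangian(X, y, A, eps=0.02, lr=0.1, lr_dual=0.5, iters=2000):
    """Toy proxy-Lagrangian loop: learner uses the smooth proxy constraint,
    auditor uses the original indicator-based constraint."""
    n, d = X.shape
    w = np.zeros(d)
    lam = 0.0
    for _ in range(iters):
        p = sigmoid(X @ w)
        yhat = (p >= 0.5).astype(float)
        # Original constraint: gap in positive prediction rates (non-differentiable).
        true_gap = abs(yhat[A == 1].mean() - yhat[A == 0].mean())
        # Proxy constraint: gap in mean predicted probabilities (differentiable).
        proxy_gap = p[A == 1].mean() - p[A == 0].mean()
        s = p * (1.0 - p)
        grad_loss = X.T @ (p - y) / n
        grad_proxy = np.sign(proxy_gap) * (
            (X[A == 1].T @ s[A == 1]) / max((A == 1).sum(), 1)
            - (X[A == 0].T @ s[A == 0]) / max((A == 0).sum(), 1)
        )
        # Learner step uses the proxy; auditor step uses the original constraint.
        w -= lr * (grad_loss + lam * grad_proxy)
        lam = max(0.0, lam + lr_dual * (true_gap - eps))
    return w
```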
Zhang, Lemoine, and Mitchell (2018) proposed an in-processing method based on adversarial learning. The approach trains two networks simultaneously: a predictor that estimates the target label from the input features, and an adversary that attempts to infer the protected attribute from the predictor's output (optionally also conditioned on the true label).
The predictor is trained to maximize prediction accuracy while minimizing the adversary's ability to infer the protected attribute from the predictions. The gradient update for the predictor subtracts the projection of the adversary's gradient, pushing the predictor toward representations that are informative for the task but uninformative about group membership.
Adversarial debiasing can approximately enforce demographic parity (by making predictions independent of the protected attribute) or equalized odds (by conditioning the adversary on the true label). It is implemented in IBM's AI Fairness 360 toolkit.
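A simplified sketch of the training loop is shown below (PyTorch, with synthetic data and illustrative names). Note that Zhang et al.'s predictor update subtracts the projection of the adversary's gradient; the sketch approximates that with a plain penalty term and targets the demographic-parity variant.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic data: the protected attribute a is correlated with the label y.
torch.manual_seed(0)
n, d = 2000, 10
X = torch.randn(n, d)
y = ((X[:, 0] + X[:, 1] + 0.3 * torch.randn(n)) > 0).float().unsqueeze(1)
a = torch.where(torch.rand(n, 1) < 0.7, y, 1.0 - y)
loader = DataLoader(TensorDataset(X, y, a), batch_size=64, shuffle=True)

predictor = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
adversary = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
alpha = 1.0  # strength of the adversarial term

for epoch in range(5):
    for X_b, y_b, a_b in loader:
        # Adversary step: learn to infer the protected attribute from the logit.
        logits = predictor(X_b).detach()
        loss_a = bce(adversary(logits), a_b)
        opt_a.zero_grad(); loss_a.backward(); opt_a.step()

        # Predictor step: stay accurate while making the adversary fail,
        # pushing predictions toward independence from the attribute
        # (condition the adversary on y_b for the equalized-odds variant).
        logits = predictor(X_b)
        loss_p = bce(logits, y_b) - alpha * bce(adversary(logits), a_b)
        opt_p.zero_grad(); loss_p.backward(); opt_p.step()
```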
Post-processing methods adjust the predictions of a trained model to satisfy fairness constraints, without modifying the training procedure itself.
Threshold adjustment. Hardt, Price, and Srebro (2016) showed that for any binary classifier producing a continuous score, one can find group-specific thresholds that convert the scores into binary predictions satisfying equalized odds. The method solves a linear program to find the optimal thresholds that maximize accuracy subject to the equalized odds constraint. This approach is model-agnostic and requires access only to the model's predictions and the protected attribute values.
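A minimal usage sketch with Fairlearn's post-processing implementation of this idea (the synthetic dataset and variable names are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from fairlearn.postprocessing import ThresholdOptimizer

# Illustrative synthetic data with a sensitive feature correlated with the label.
rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 5))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=3000) > 0).astype(int)
A = np.where(rng.random(3000) < 0.7, y, 1 - y)
X_tr, X_te, y_tr, y_te, A_tr, A_te = train_test_split(X, y, A, random_state=0)

base = GradientBoostingClassifier().fit(X_tr, y_tr)

# Learn group-specific thresholds on the trained model's scores.
postprocessor = ThresholdOptimizer(
    estimator=base,
    constraints="equalized_odds",
    prefit=True,                      # do not refit the wrapped model
    predict_method="predict_proba",
)
postprocessor.fit(X_tr, y_tr, sensitive_features=A_tr)
y_pred = postprocessor.predict(X_te, sensitive_features=A_te)
```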
Reject option classification. Kamiran, Karim, and Zhang (2012) proposed changing predictions for instances near the decision boundary. For individuals in the unprivileged group who are close to the boundary and receive a negative prediction, the prediction is flipped to positive. For individuals in the privileged group who are close to the boundary and receive a positive prediction, the prediction is flipped to negative.
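A small sketch of this flipping rule (names, the critical-region width theta, and the 0/1 group encoding are illustrative assumptions):

```python
import numpy as np

def reject_option_predict(scores, a, theta=0.1, privileged=1):
    """Flip predictions inside the critical region |score - 0.5| < theta:
    unprivileged instances near the boundary get the favourable label,
    privileged instances near the boundary get the unfavourable one."""
    y_hat = (scores >= 0.5).astype(int)
    near_boundary = np.abs(scores - 0.5) < theta
    y_hat[near_boundary & (a != privileged)] = 1
    y_hat[near_boundary & (a == privileged)] = 0
    return y_hat
```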
Calibrated equalized odds. Pleiss, Raghavan, Wu, Kleinberg, and Weinberger (2017) developed a post-processing method that adjusts a calibrated classifier to satisfy equalized odds while preserving calibration as much as possible.
Post-processing methods have the advantage of being applicable to any existing model without retraining. However, they generally provide weaker fairness guarantees than in-processing methods because they cannot change the model's internal representations.
The following table compares the three categories of fairness constraint enforcement.
| Property | Pre-processing | In-processing | Post-processing |
|---|---|---|---|
| When applied | Before training | During training | After training |
| Model access required | Data only | Full model and training loop | Predictions only |
| Model-agnostic | Yes | No (method-specific) | Yes |
| Formal guarantees | Typically weak | Can be strong (e.g., exponentiated gradient) | Moderate (e.g., linear programming for threshold adjustment) |
| Computational cost | Low to moderate | Moderate to high | Low |
| Accuracy impact | Can degrade if too much information is removed | Can be controlled via constraint tolerance | Minimal if original model is strong |
| Supported fairness definitions | Primarily demographic parity | Flexible (DP, EO, equality of opportunity, etc.) | Primarily equalized odds, demographic parity |
| Example methods | Reweighing, disparate impact remover, fair representations | Exponentiated gradient, adversarial debiasing, regularization | Threshold adjustment, reject option, calibrated EO |
This section provides a more detailed mathematical treatment of constrained optimization for fairness.
Let X denote the feature space, Y the label space (Y = {0, 1} for binary classification), and A the set of protected attribute values. Let D be the joint distribution over (X, A, Y). Let H be a hypothesis class of classifiers h: X -> Y.
The goal is to find a classifier h* that minimizes the expected loss while satisfying fairness constraints:
h* = argmin_{h in H} E_{(x,a,y) ~ D} [l(h(x), y)]
subject to: g_j(h, D) ≤ epsilon_j for j = 1, ..., m
where l is a loss function, g_j are constraint functions encoding fairness requirements, and epsilon_j are tolerance parameters.
For demographic parity, the constraint function might be:
g(h, D) = |P(h(X) = 1 | A = a) - P(h(X) = 1 | A = b)|
For equalized odds, there are two constraint functions (one for each value of Y):
g_y(h, D) = |P(h(X) = 1 | A = a, Y = y) - P(h(X) = 1 | A = b, Y = y)| for y in {0, 1}
The standard approach to constrained optimization is Lagrangian relaxation. Define the Lagrangian:
L(h, lambda) = E[l(h(x), y)] + sum_{j=1}^{m} lambda_j * g_j(h, D)
where lambda = (lambda_1, ..., lambda_m) are non-negative Lagrange multipliers. The constrained problem is equivalent to the saddle-point problem:
min_h max_{lambda ≥ 0} L(h, lambda)
Under certain convexity conditions, strong duality holds and the saddle-point solution coincides with the constrained optimum. For non-convex problems (such as training neural networks), strong duality may not hold, but the Lagrangian relaxation still provides a useful algorithmic framework.
The exponentiated gradient method of Agarwal et al. (2018) provides the following guarantee: after T iterations, the algorithm returns a randomized classifier whose expected loss is at most OPT + O(1/sqrt(T)) and whose constraint violation is at most O(1/sqrt(T)), where OPT is the optimal loss among all feasible classifiers. These are finite-sample bounds that hold for a fixed dataset.
The proxy-Lagrangian method of Cotter et al. (2019) achieves similar convergence rates for non-convex, non-differentiable constraints, at the cost of requiring smooth proxy constraints.
Gradient boosted decision trees (GBDT) are widely used in high-stakes applications like credit scoring and fraud detection. Standard GBDT implementations (XGBoost, LightGBM, CatBoost) do not natively support fairness constraints. Several methods have been proposed to fill this gap.
FairGBM. Cruz, Belem, Jesus, Bravo, Saleiro, and Bizarro (2023) developed FairGBM, a framework for training gradient boosted trees with fairness constraints. FairGBM uses smooth convex proxies for non-differentiable fairness metrics and optimizes them using a dual ascent (proxy-Lagrangian) procedure. The method is implemented on top of LightGBM and supports demographic parity, equalized odds, and equality of opportunity. Experiments show that FairGBM achieves comparable predictive performance to unconstrained GBDT while satisfying fairness constraints, with training times that are an order of magnitude faster than alternative fair ML methods.
Fair regularization for GBDT. An alternative approach adds fairness-aware regularization terms to the GBDT objective function, penalizing splits that increase group disparity. This is conceptually simpler but provides weaker control over the fairness-accuracy trade-off.
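A conceptual sketch of this regularization idea on top of LightGBM is given below. This is not FairGBM's API: the custom objective, variable names, and synthetic data are illustrative assumptions, and the penalty's contribution to the Hessian is ignored for simplicity.

```python
import numpy as np
import lightgbm as lgb

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_fair_logloss(a_train, lam=10.0):
    """Binary log-loss plus lam * (demographic-parity gap)^2 on predicted
    probabilities, closed over the row-aligned protected attribute."""
    g1, g0 = a_train == 1, a_train == 0

    def objective(y_true, raw_score):
        p = sigmoid(raw_score)
        grad = p - y_true                      # log-loss gradient w.r.t. raw score
        hess = p * (1.0 - p)
        gap = p[g1].mean() - p[g0].mean()
        d_gap = np.where(g1, 1.0 / g1.sum(), -1.0 / g0.sum()) * hess
        grad = grad + lam * 2.0 * gap * d_gap  # penalty's Hessian is ignored
        return grad, hess

    return objective

# Illustrative synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 6))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=3000) > 0).astype(int)
a = np.where(rng.random(3000) < 0.7, y, 1 - y)

model = lgb.LGBMClassifier(objective=make_fair_logloss(a), n_estimators=200)
model.fit(X, y)
# With a custom objective LightGBM returns raw scores, so apply the sigmoid.
probs = sigmoid(model.predict(X, raw_score=True))
```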
Several open-source libraries implement fairness constraint methods. The table below compares the major toolkits.
| Toolkit | Developer | Language | Key fairness constraint methods | Fairness metrics |
|---|---|---|---|---|
| Fairlearn | Microsoft Research | Python | ExponentiatedGradient, GridSearch, ThresholdOptimizer, CorrelationRemover, AdversarialFairnessClassifier | Demographic parity, equalized odds, bounded group loss, error rate parity |
| AI Fairness 360 (AIF360) | IBM Research | Python, R | Reweighing, adversarial debiasing, prejudice remover, meta-fair classifier, reject option classification, calibrated equalized odds | Disparate impact, statistical parity difference, equal opportunity difference, average odds difference, Theil index |
| FairGBM | Feedzai Research | Python (C++ core) | Proxy-Lagrangian dual ascent for GBDT | Demographic parity, equalized odds, equality of opportunity |
| Themis-ML | Various | Python | Relabeling, additive counterfactually fair model | Demographic parity |
| What-If Tool | Google | JavaScript/Python | Threshold adjustment (interactive) | Multiple (user-configurable) |
Fairness constraints have been applied in numerous domains where algorithmic decisions affect people's lives.
Recidivism prediction instruments like COMPAS are used by courts and parole boards across the United States to estimate the likelihood of reoffense. After ProPublica's 2016 analysis revealed racial disparities in COMPAS scores, researchers applied equalized odds and calibration constraints to recidivism prediction models. Studies have shown that fairness constraints can reduce racial disparities in false positive rates without a large cost in overall predictive accuracy.
Credit scoring models determine who receives loans and at what interest rates. The Equal Credit Opportunity Act in the United States prohibits discrimination based on race, sex, and other protected attributes. Fairness constraints, particularly demographic parity and equalized odds, have been applied to ensure that approval rates and error rates do not systematically differ across racial or gender groups. Microsoft Research and Ernst & Young have demonstrated the use of Fairlearn's exponentiated gradient method for fair credit modeling.
Automated resume screening and candidate ranking tools can perpetuate historical hiring biases. In 2018, Amazon discontinued an internal hiring tool after discovering it penalized resumes containing the word "women's" (as in "women's chess club"). Fairness constraints such as demographic parity can ensure that the selection rate for each demographic group meets a minimum threshold, consistent with the four-fifths rule used by the U.S. Equal Employment Opportunity Commission.
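The four-fifths check itself is a one-line calculation; a small sketch (function and argument names are illustrative):

```python
import numpy as np

def disparate_impact_ratio(y_pred, a, unprivileged=0, privileged=1):
    """Ratio of selection rates used in the four-fifths (80%) rule."""
    return y_pred[a == unprivileged].mean() / y_pred[a == privileged].mean()

# A screening tool passes the four-fifths check when the ratio is at least 0.8.
```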
Algorithms that allocate healthcare resources (such as prioritizing patients for care management programs) have been found to exhibit racial bias. Obermeyer, Powers, Vogeli, and Mullainathan (2019) showed that a widely used healthcare algorithm exhibited significant racial bias because it used healthcare cost as a proxy for health need. Applying fairness constraints to equalize positive prediction rates or error rates across racial groups can help mitigate such disparities.
Imposing fairness constraints generally reduces the model's predictive accuracy relative to an unconstrained model. This happens because the unconstrained optimum may involve predictions that are correlated with the protected attribute, and the fairness constraint forces the model away from this optimum.
The magnitude of the trade-off depends on several factors: how much the base rates differ across groups, how strongly the predictive features correlate with the protected attribute, how strict the constraint tolerance is, and which fairness definition is enforced.
Empirically, the trade-off is often small for moderate levels of fairness. For example, Agarwal et al. (2018) showed that their exponentiated gradient method achieved near-zero constraint violation with less than 1-2 percentage points of accuracy loss on several benchmark datasets. However, in cases with large base rate differences, the trade-off can be substantial.
Despite significant progress, fairness constraints have several limitations.
Choice of fairness definition. The impossibility results of Chouldechova (2017) and Kleinberg et al. (2016) show that no single classifier can satisfy all common fairness definitions simultaneously. Practitioners must choose which definition to enforce, and this choice is inherently a normative judgment rather than a purely technical one.
Intersectionality. Most fairness constraint methods consider protected attributes independently (for example, enforcing equal rates separately for race and gender). Enforcing fairness across intersectional groups (for example, Black women or elderly disabled individuals) is more challenging because the number of groups grows combinatorially, and sample sizes within each intersectional group may be small. Cotter, Gupta, Jiang, Srebro, Sridharan, Wang, Woodworth, and You (2019) proposed learning multiplier models to handle large numbers of constraints arising from intersectional groups.
Distributional shift. Fairness constraints are typically enforced on the training distribution. If the deployment distribution differs (for instance, because of population changes or feedback loops), the trained model may violate the fairness constraints on the new distribution.
Fairness gerrymandering. Kearns, Neel, Roth, and Wu (2018) showed that a classifier can satisfy fairness constraints for a predefined set of groups while being unfair to subgroups not included in the constraint set. They proposed algorithms for preventing this by ensuring fairness over a rich class of subgroups.
Proxy attributes. Even when a model does not directly use the protected attribute, other features may serve as proxies (for example, zip code as a proxy for race). Fairness constraints that operate only on the explicit protected attribute may not capture discrimination mediated through proxy features.
Causal reasoning. Observational fairness definitions (demographic parity, equalized odds) do not distinguish between legitimate and illegitimate uses of the protected attribute. Causal fairness definitions (such as counterfactual fairness) attempt to address this but require a causal model of the data generating process, which is often unavailable or contested.
The following table traces the development of key ideas in fairness constraints for machine learning.
| Year | Development | Authors |
|---|---|---|
| 2008 | Formalization of discrimination-aware data mining | Pedreshi, Ruggieri, and Turini |
| 2010 | Demographic parity for classifiers | Calders and Verwer |
| 2012 | Individual fairness via Lipschitz constraints ("Fairness Through Awareness") | Dwork, Hardt, Pitassi, Reingold, and Zemel |
| 2012 | Data preprocessing by reweighing for fairness | Kamiran and Calders |
| 2016 | Equalized odds and equality of opportunity | Hardt, Price, and Srebro |
| 2016 | Impossibility results for simultaneous calibration and error rate balance | Kleinberg, Mullainathan, and Raghavan; Chouldechova |
| 2017 | Decision-boundary fairness constraints for classifiers | Zafar, Valera, Gomez-Rodriguez, and Gummadi |
| 2017 | Counterfactual fairness via causal models | Kusner, Loftus, Russell, and Silva |
| 2018 | Exponentiated gradient reductions approach to fair classification | Agarwal, Beygelzimer, Dudik, Langford, and Wallach |
| 2018 | Adversarial debiasing | Zhang, Lemoine, and Mitchell |
| 2018 | Release of IBM AI Fairness 360 toolkit | Bellamy et al. (IBM Research) |
| 2019 | Proxy-Lagrangian method for non-differentiable fairness constraints | Cotter, Jiang, Gupta, et al. |
| 2020 | Release of Microsoft Fairlearn toolkit | Bird, Dudik, Edgar, et al. |
| 2023 | FairGBM for gradient boosting with fairness constraints | Cruz, Belem, Jesus, et al. |