A fairness metric is a quantitative measure used to evaluate whether a machine learning model's predictions or decisions treat different demographic groups equitably. Fairness metrics provide formal, mathematical criteria that allow researchers and practitioners to audit algorithms for bias, compare the behavior of different models across protected groups, and guide the design of bias mitigation strategies. Because no single metric captures all aspects of fairness, selecting the right metric depends on the application domain, the type of harm being measured, and the values of the stakeholders involved.
Fairness metrics have become a central topic in responsible AI research, driven in part by high-profile controversies such as the 2016 ProPublica investigation of the COMPAS recidivism prediction tool and by growing regulatory attention from bodies like the U.S. Equal Employment Opportunity Commission (EEOC) and the European Union.
Imagine a teacher is handing out gold stars to students. A fairness metric is like a rule that checks whether the teacher is giving stars fairly. For example, one rule might say "the same percentage of boys and girls should get stars." Another rule might say "if a boy and a girl both did great work, they should both get a star." These rules sometimes disagree with each other, and that is one of the tricky parts about fairness. In machine learning, computers make decisions about people (like who gets a loan or who gets called for a job interview), and fairness metrics are the rules we use to check whether those decisions are being made fairly for everyone.
The intellectual roots of fairness metrics extend well beyond computer science. Quantitative fairness testing first emerged in the 1960s and 1970s, following the passage of the Civil Rights Act of 1964 in the United States. The landmark Supreme Court case Griggs v. Duke Power Co. (1971) established the legal concept of "disparate impact," ruling that employment practices with discriminatory effects violate Title VII of the Civil Rights Act even if the employer had no discriminatory intent. The court found that aptitude tests used by Duke Power Company resulted in a pass rate of 58% for white applicants but only 6% for Black applicants, and that the tests bore no demonstrable relationship to job performance.
Following Griggs, the EEOC adopted the "four-fifths rule" (also known as the 80% rule) as a guideline for detecting adverse impact: if the selection rate for a protected group is less than 80% of the rate for the group with the highest selection rate, this may indicate discrimination. Although the four-fifths rule is not a strict legal standard and has known statistical limitations, it has influenced the development of disparate impact metrics in machine learning.
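As a rough numeric illustration of the four-fifths rule (the group labels and counts below are hypothetical), the check reduces to comparing each group's selection rate against the highest group's rate:

```python
# Illustration of the four-fifths (80%) rule as an adverse-impact screen.
# Group names and counts are hypothetical.
selected = {"group_a": 48, "group_b": 27}       # applicants selected per group
applicants = {"group_a": 100, "group_b": 100}   # applicants per group

rates = {g: selected[g] / applicants[g] for g in selected}
highest = max(rates.values())

for group, rate in rates.items():
    ratio = rate / highest
    flag = "possible adverse impact" if ratio < 0.8 else "passes 80% screen"
    print(f"{group}: selection rate {rate:.2f}, ratio {ratio:.2f} -> {flag}")
```

Here group_b's ratio is roughly 0.56, well below the 0.8 guideline, so the tool would flag it for closer review.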
Modern fairness metrics research in machine learning accelerated after 2012, when Dwork et al. introduced the concept of individual fairness in their paper "Fairness Through Awareness." The field gained broader public attention in May 2016, when ProPublica published its "Machine Bias" investigation. ProPublica analyzed COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a risk assessment tool developed by Northpointe (now Equivant) and used in U.S. courts to predict recidivism. ProPublica found that Black defendants were almost twice as likely as white defendants to be incorrectly flagged as high risk (higher false positive rate), while white defendants were more likely to be incorrectly labeled low risk despite going on to reoffend (higher false negative rate). Northpointe responded that its tool satisfied predictive parity, meaning that among defendants assigned the same risk score, Black and white defendants reoffended at similar rates.
Both sides were correct by their own chosen metric. This disagreement highlighted a fundamental tension that was soon formalized by Chouldechova (2017) and Kleinberg, Mullainathan, and Raghavan (2016) in what became known as the impossibility theorems of fairness.
Fairness metrics can be organized into several broad families based on what they measure and at what level of granularity they operate.
| Family | Level | Core idea | Key examples |
|---|---|---|---|
| Group fairness (independence) | Group | Predictions should be statistically independent of protected attributes | Demographic parity, conditional statistical parity |
| Group fairness (separation) | Group | Prediction errors should be equal across groups, given the true outcome | Equalized odds, equal opportunity, false positive rate balance |
| Group fairness (sufficiency) | Group | True outcomes should be independent of group membership, given the prediction | Predictive parity, calibration, conditional use accuracy equality |
| Individual fairness | Individual | Similar individuals should receive similar predictions | Fairness through awareness (Lipschitz condition) |
| Causal fairness | Individual/Structural | Outcomes should not change in counterfactual scenarios where group membership differs | Counterfactual fairness |
| Subgroup fairness | Subgroup | Fairness should hold for intersections of protected attributes, not just single groups | Subgroup fairness (Kearns et al., 2018) |
Group fairness metrics compare the statistical behavior of a model's predictions across demographic groups defined by one or more protected attributes (such as race, gender, or age). These metrics can be further divided based on which statistical property they require to be equal across groups.
Demographic parity requires that the probability of receiving a positive prediction is the same for all groups. Formally, a classifier h satisfies demographic parity if:
P(h(X) = 1 | A = a) = P(h(X) = 1 | A = b)
for all values a and b of the protected attribute A.
In other words, the selection rate (the fraction of individuals who receive a positive outcome) must be equal across groups, regardless of whether those individuals actually qualified for the positive outcome. Demographic parity corresponds to the statistical independence criterion: the prediction R and the sensitive attribute A are statistically independent (R is independent of A).
Demographic parity is straightforward to compute and easy to interpret, but it has significant limitations. It ignores the true label Y entirely, meaning it can be satisfied by a classifier that performs poorly on all groups as long as it assigns positive predictions at equal rates. It can also conflict with accuracy when base rates (the prevalence of the positive outcome) differ between groups.
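A minimal sketch of how the demographic parity gap can be computed from binary predictions and group labels; the arrays, group names, and helper functions below are illustrative assumptions rather than a standard library API:

```python
import numpy as np

def selection_rates(y_pred, groups):
    """Fraction of positive predictions within each group."""
    return {g: y_pred[groups == g].mean() for g in np.unique(groups)}

def demographic_parity_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical predictions (1 = positive outcome) and group membership.
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

print(selection_rates(y_pred, groups))                # {'a': 0.75, 'b': 0.25}
print(demographic_parity_difference(y_pred, groups))  # 0.5
```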
Equalized odds requires that the model's true positive rate (TPR) and false positive rate (FPR) are equal across groups. Formally:
P(h(X) = 1 | Y = y, A = a) = P(h(X) = 1 | Y = y, A = b)
for y in {0, 1} and all values a, b of the protected attribute A.
This criterion corresponds to the separation condition: the prediction R is independent of A given the true outcome Y (R is independent of A given Y). Equalized odds is stricter than demographic parity because it conditions on the true label, requiring that the model makes errors at equal rates across groups.
Equal opportunity is a relaxation of equalized odds proposed by Hardt, Price, and Srebro (2016). It requires only that the true positive rate be equal across groups:
P(h(X) = 1 | Y = 1, A = a) = P(h(X) = 1 | Y = 1, A = b)
This means that among individuals who actually deserve the positive outcome, each group should have the same chance of receiving it. Equal opportunity ignores the false positive rate, making it a less restrictive requirement than equalized odds but still stronger than demographic parity.
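The quantities compared by equalized odds and equal opportunity are group-wise true positive and false positive rates. A short sketch, again with hypothetical data and helper names:

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """True positive rate and false positive rate within each group."""
    out = {}
    for g in np.unique(groups):
        yt, yp = y_true[groups == g], y_pred[groups == g]
        tpr = yp[yt == 1].mean() if (yt == 1).any() else float("nan")
        fpr = yp[yt == 0].mean() if (yt == 0).any() else float("nan")
        out[g] = {"TPR": tpr, "FPR": fpr}
    return out

# Hypothetical labels, predictions, and group membership.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# Equal opportunity compares only the TPRs; equalized odds compares both TPRs and FPRs.
print(group_rates(y_true, y_pred, groups))
```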
Predictive parity requires that the positive predictive value (PPV, or precision) be the same across groups:
P(Y = 1 | h(X) = 1, A = a) = P(Y = 1 | h(X) = 1, A = b)
In words, among individuals who receive a positive prediction, the proportion who truly belong to the positive class should be equal for all groups. This is the metric that Northpointe claimed COMPAS satisfied in response to ProPublica's critique.
Predictive parity belongs to the sufficiency family: the true outcome Y is independent of A given the prediction R (Y is independent of A given R).
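Predictive parity can be audited by computing the positive predictive value within each group; the sketch below uses hypothetical data and an illustrative helper name:

```python
import numpy as np

def ppv_by_group(y_true, y_pred, groups):
    """Positive predictive value (precision) within each group."""
    out = {}
    for g in np.unique(groups):
        yt, yp = y_true[groups == g], y_pred[groups == g]
        predicted_pos = yp == 1
        out[g] = yt[predicted_pos].mean() if predicted_pos.any() else float("nan")
    return out

# Hypothetical data; predictive parity asks whether these PPVs are (approximately) equal.
y_true = np.array([1, 0, 1, 1, 1, 0, 0, 1])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 1])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(ppv_by_group(y_true, y_pred, groups))
```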
Calibration, sometimes called test fairness or well-calibration, extends predictive parity to risk scores rather than binary predictions. A model is calibrated across groups if, for any predicted probability score s:
P(Y = 1 | S = s, A = a) = P(Y = 1 | S = s, A = b)
where S is the model's predicted probability score. In a well-calibrated model, a predicted probability of 0.7 should correspond to an actual positive rate of approximately 70% regardless of group membership. Calibration is a sufficiency-based criterion.
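One common way to check group-wise calibration is to bin the predicted scores and compare the observed positive rate to the mean score in each bin, separately per group. A minimal sketch with hypothetical data and bin edges:

```python
import numpy as np

def calibration_by_group(y_true, scores, groups, bins=(0.0, 0.5, 1.0)):
    """(Mean predicted score, observed positive rate) per score bin, within each group."""
    out = {}
    for g in np.unique(groups):
        yt, s = y_true[groups == g], scores[groups == g]
        per_bin = {}
        for lo, hi in zip(bins[:-1], bins[1:]):
            in_bin = (s >= lo) & (s < hi) if hi < 1.0 else (s >= lo) & (s <= hi)
            if in_bin.any():
                per_bin[f"[{lo}, {hi}]"] = (s[in_bin].mean(), yt[in_bin].mean())
        out[g] = per_bin
    return out

# Hypothetical scores; a calibrated model has observed rates close to the mean
# predicted score in every bin, for every group.
y_true = np.array([1, 0, 1, 0, 1, 1, 0, 0])
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.8, 0.6, 0.3, 0.1])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(calibration_by_group(y_true, scores, groups))
```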
Several other group fairness metrics appear in the literature:
| Metric | Definition | Condition equalized across groups |
|---|---|---|
| False positive rate balance | P(h(X)=1 \| Y=0, A=a) = P(h(X)=1 \| Y=0, A=b) | False positive rate |
| False negative rate balance | P(h(X)=0 \| Y=1, A=a) = P(h(X)=0 \| Y=1, A=b) | False negative rate |
| Overall accuracy equality | Accuracy(A=a) = Accuracy(A=b) | Overall accuracy |
| Treatment equality | FN/FP ratio for group a = FN/FP ratio for group b | Ratio of false negatives to false positives |
| Conditional use accuracy equality | PPV(A=a) = PPV(A=b) and NPV(A=a) = NPV(A=b) | Both positive and negative predictive values |
| Balance for the positive class | E[S \| Y=1, A=a] = E[S \| Y=1, A=b] | Mean predicted score among positive instances |
| Balance for the negative class | E[S \| Y=0, A=a] = E[S \| Y=0, A=b] | Mean predicted score among negative instances |
While group fairness metrics compare aggregate statistics across demographic groups, individual fairness focuses on whether a model treats each individual appropriately relative to other similar individuals.
The concept of individual fairness was formalized by Dwork, Hardt, Pitassi, Reingold, and Zemel in their 2012 paper "Fairness Through Awareness." Their definition is grounded in a Lipschitz condition: a mapping M from individuals to outcome distributions satisfies individual fairness if, for any two individuals x and y:
D(M(x), M(y)) <= d(x, y)
where D is a distance metric on outcome distributions (such as total variation distance or statistical distance) and d is a task-specific similarity metric on individuals. The inequality states that if two individuals are close in the feature space according to d, they must receive similar outcome distributions according to D.
The main advantage of individual fairness is that it provides guarantees at the individual level rather than only at the group level. However, it requires defining an appropriate task-specific similarity metric d, which can be difficult in practice and may itself encode biases. The choice of d is domain-dependent and often requires expert knowledge or stakeholder input.
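A brute-force sketch of auditing the Lipschitz condition over all pairs of individuals. The choice of Euclidean distance for d and total variation distance for D is an illustrative assumption; in practice d must be supplied by domain experts, as noted above:

```python
import numpy as np
from itertools import combinations

def lipschitz_violations(X, probs, similarity, outcome_distance, eps=1e-9):
    """Return pairs (i, j) where D(M(x_i), M(x_j)) > d(x_i, x_j).

    X:                feature vectors for individuals
    probs:            each individual's outcome distribution, e.g. [P(neg), P(pos)]
    similarity:       task-specific metric d over individuals (assumed given)
    outcome_distance: metric D over outcome distributions
    """
    violations = []
    for i, j in combinations(range(len(X)), 2):
        if outcome_distance(probs[i], probs[j]) > similarity(X[i], X[j]) + eps:
            violations.append((i, j))
    return violations

# Hypothetical choices: Euclidean distance as d, total variation distance as D.
d = lambda x, y: float(np.linalg.norm(x - y))
D = lambda p, q: 0.5 * float(np.abs(p - q).sum())

X = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]])
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
print(lipschitz_violations(X, probs, d, D))  # individuals 0 and 1 are close but treated very differently
```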
A simpler and weaker notion is fairness through unawareness (FTU), which simply excludes protected attributes from the model's input features. While intuitive, FTU is widely considered insufficient because other features (such as zip code or surname) may serve as proxies for the protected attribute, allowing the model to reconstruct group membership indirectly. This phenomenon is known as redundant encoding.
Counterfactual fairness, introduced by Kusner, Loftus, Russell, and Silva (2017), uses causal inference to define fairness. A decision is counterfactually fair toward an individual if the decision would remain the same in a counterfactual world where the individual's protected attribute had been different, with all other non-descendant variables held at their observed values.
Formally, a predictor h is counterfactually fair if:
P(h(X)_{A <- a} = 1 | A = a, X = x) = P(h(X)_{A <- b} = 1 | A = a, X = x)
for all attribute values a and b. Here, h(X)_{A <- a} denotes the prediction that would have been made if the protected attribute had been set to value a through intervention.
Counterfactual fairness requires constructing a causal model (a directed acyclic graph) that specifies the causal relationships between the protected attribute, other features, and the outcome. In practice, the predictor achieves counterfactual fairness by using only features that are non-descendants of the protected attribute in the causal graph.
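A minimal sketch of that graph-based recipe: given a causal DAG, compute the descendants of the protected attribute and keep only the remaining features. The DAG, variable names, and edges below are hypothetical, and real counterfactual fairness methods additionally model latent background variables rather than simply dropping features:

```python
# Hypothetical causal DAG, expressed as variable -> list of its parents.
# Here 'zip_code' and 'education' are (directly or indirectly) caused by 'race',
# while 'work_sample_score' is not.
parents = {
    "race": [],
    "zip_code": ["race"],
    "education": ["race", "zip_code"],
    "work_sample_score": [],
    "hired": ["education", "work_sample_score"],
}

def descendants(node, parents):
    """All variables reachable from `node` by following parent -> child edges."""
    children = {v: [c for c, ps in parents.items() if v in ps] for v in parents}
    seen, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

protected = "race"
blocked = descendants(protected, parents) | {protected}
allowed_features = [v for v in parents if v not in blocked and v != "hired"]
print(allowed_features)  # ['work_sample_score'] -- the only non-descendant feature
```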
The causal approach addresses a key limitation of purely statistical fairness metrics: it distinguishes between legitimate and illegitimate uses of information that correlates with group membership. For example, in a hiring scenario, education level may correlate with race due to historical inequities, but whether this correlation constitutes unfairness depends on whether education is causally downstream of race in a way that the decision-maker considers illegitimate.
Traditional group fairness metrics evaluate fairness with respect to a single protected attribute (e.g., race or gender). However, individuals belong to multiple demographic groups simultaneously, and a model that appears fair for each attribute in isolation may be unfair for intersectional subgroups (e.g., Black women).
Kearns, Neel, Roth, and Wu (2018) formalized this problem in their paper "Preventing Fairness Gerrymandering." They showed that a classifier can satisfy a fairness constraint for each protected group individually while violating it for structured subgroups formed by combinations of protected attributes. The authors proved that auditing subgroup fairness is computationally equivalent to the problem of weak agnostic learning, making it hard in the worst case even for simple subgroup classes.
Intersectional fairness metrics extend standard group fairness definitions (such as demographic parity or equalized odds) to hold across a rich collection of subgroups rather than just the groups defined by a single attribute.
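A small sketch of an intersectional audit: the same selection-rate check applied to every combination of two protected attributes. The hypothetical data are constructed so that rates are equal for each attribute alone but diverge at the intersections, illustrating fairness gerrymandering:

```python
import numpy as np

def subgroup_selection_rates(y_pred, *attributes):
    """Selection rate for every intersection of the given protected attributes."""
    combos = np.array(list(zip(*attributes)))
    rates = {}
    for combo in {tuple(c) for c in combos.tolist()}:
        mask = np.all(combos == combo, axis=1)
        rates[combo] = y_pred[mask].mean()
    return rates

# Hypothetical data: selection rates are equal for race alone and for gender alone,
# but differ sharply across the four intersectional subgroups.
y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 1])
race   = np.array(["black"] * 4 + ["white"] * 4)
gender = np.array(["f", "f", "m", "m", "f", "f", "m", "m"])

print(subgroup_selection_rates(y_pred, race))          # 0.5 for each race
print(subgroup_selection_rates(y_pred, gender))        # 0.5 for each gender
print(subgroup_selection_rates(y_pred, race, gender))  # 1.0, 0.0, 0.0, 1.0 at the intersections
```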
One of the most significant theoretical results in the fairness metrics literature is that several commonly desired fairness properties cannot be satisfied simultaneously, except in trivial or degenerate cases.
In "Fair Prediction with Disparate Impact," Alexandra Chouldechova proved that when the base rate (prevalence of the positive outcome) differs between two groups, no imperfect classifier can simultaneously satisfy:
The intuition is that when one group has a higher base rate, maintaining equal PPV forces the classifier to have different error rate distributions across groups. The only exceptions are a perfect classifier (which makes no errors) and the case where both groups have identical base rates.
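This intuition can be made concrete through the identity FPR = [p / (1 - p)] * [(1 - PPV) / PPV] * (1 - FNR), which follows from the definitions of the three quantities, where p is a group's base rate. A short numeric check with hypothetical values:

```python
# Numeric illustration (hypothetical numbers): if two groups share the same PPV
# and the same FNR but have different base rates p, their FPRs must differ.
def implied_fpr(p, ppv, fnr):
    # Follows from the definitions of PPV, FPR, and FNR for a binary classifier.
    return (p / (1 - p)) * ((1 - ppv) / ppv) * (1 - fnr)

ppv, fnr = 0.7, 0.3          # held equal across groups (predictive parity, FNR balance)
for p in (0.3, 0.5):         # differing base rates
    print(f"base rate {p:.1f} -> implied FPR {implied_fpr(p, ppv, fnr):.3f}")
# The two implied FPRs differ, so FPR balance cannot also hold.
```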
In "Inherent Trade-Offs in the Fair Determination of Risk Scores," Kleinberg, Mullainathan, and Raghavan formalized three conditions for risk score assignments:
They proved that except when the classifier is perfect or the base rates are equal across groups, no risk score assignment can satisfy all three conditions simultaneously.
The impossibility theorems do not mean that pursuing fairness is futile. Instead, they establish that fairness is an inherently normative concept requiring practitioners to make explicit choices about which fairness criteria to prioritize. The appropriate choice depends on the specific application, the types of harm at stake, and the values of the affected communities. In criminal justice, for example, one might prioritize equal false positive rates (to avoid disproportionately punishing one group) over predictive parity. In lending, predictive parity might be preferred to ensure that approved applicants from all groups have similar repayment rates.
The following table summarizes the key impossibility results:
| Result | Year | Conditions shown to be incompatible | Exception cases |
|---|---|---|---|
| Chouldechova | 2017 | Predictive parity + FPR balance + FNR balance | Perfect classifier; equal base rates |
| Kleinberg, Mullainathan, Raghavan | 2016 | Calibration + balance for positive class + balance for negative class | Perfect classifier; equal base rates |
| General incompatibility | Various | Independence (demographic parity) + separation (equalized odds) + sufficiency (calibration) | Degenerate cases only (e.g., equal base rates or a perfect predictor); in general, any two of the three are already mutually exclusive |
Enforcing fairness constraints typically comes at some cost to overall predictive accuracy. This trade-off arises because fairness constraints restrict the space of allowable classifiers, potentially excluding the most accurate model.
The magnitude of the trade-off depends on several factors, including how much base rates differ across groups, which fairness criterion is enforced, and how strongly the protected attribute is correlated with the features and the outcome.
Researchers have studied the Pareto frontier of fairness-accuracy trade-offs, characterizing the best achievable accuracy for a given level of fairness and vice versa. In some settings, the trade-off is modest, and fair classifiers achieve accuracy close to unconstrained models. In other settings, especially when base rates differ substantially, the trade-off can be significant.
Fairness metrics are typically used in conjunction with bias mitigation algorithms, which can be categorized by where they intervene in the machine learning pipeline.
Pre-processing methods modify the training data before it is fed to the learning algorithm. The goal is to remove or reduce correlations between the protected attribute and the features or labels.
| Method | Description |
|---|---|
| Reweighting | Assigns sample weights to balance the representation of different groups and outcomes. The weight for each sample is calculated as the ratio of the expected probability under independence to the observed probability. |
| Disparate impact remover | Transforms feature values to reduce their correlation with the protected attribute while preserving their rank ordering within each group. |
| Learning fair representations | Learns a new feature representation that is independent of the protected attribute while retaining information about the outcome (Zemel et al., 2013). |
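A minimal sketch of reweighting as described above: each sample receives the weight P(A=a)P(Y=y) / P(A=a, Y=y), so that group membership and outcome are statistically independent in the weighted data. Data and function names are hypothetical:

```python
import numpy as np

def reweighting_weights(y, groups):
    """Sample weights w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y), i.e. the ratio of
    the expected probability under independence to the observed probability."""
    weights = np.empty(len(y), dtype=float)
    for a in np.unique(groups):
        for label in np.unique(y):
            mask = (groups == a) & (y == label)
            if mask.any():
                expected = (groups == a).mean() * (y == label).mean()
                observed = mask.mean()
                weights[mask] = expected / observed
    return weights

# Hypothetical labels and groups; the weighted positive rate becomes equal across groups.
y      = np.array([1, 1, 1, 0, 1, 0, 0, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
w = reweighting_weights(y, groups)
for g in ("a", "b"):
    m = groups == g
    print(g, np.average(y[m], weights=w[m]))  # both 0.5 after reweighting
```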
In-processing methods incorporate fairness constraints directly into the model's training objective.
| Method | Description |
|---|---|
| Adversarial debiasing | Trains a predictor and an adversary simultaneously. The predictor tries to predict the outcome, while the adversary tries to predict the protected attribute from the predictor's output. The predictor is penalized when the adversary succeeds. |
| Prejudice remover | Adds a regularization term to the loss function that penalizes dependence between the prediction and the protected attribute (Kamishima et al., 2012). |
| Constrained optimization | Formulates fairness as an explicit constraint in the optimization problem and uses methods from constrained optimization to find the best model that satisfies the constraint. |
| Meta-fair classifier | Uses a meta-learning approach to find classifiers that optimize a chosen fairness metric. |
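As an illustration of the in-processing idea (not a reproduction of any specific published method), the sketch below trains a logistic regression by gradient descent with an added penalty on the covariance between the protected attribute and the predicted probability; all data and hyperparameters are hypothetical:

```python
import numpy as np

def train_fair_logreg(X, y, a, lam=5.0, lr=0.5, epochs=2000):
    """Logistic regression with a fairness regularizer: in addition to the
    cross-entropy loss, penalize lam * cov(a, p)^2, the squared covariance
    between the protected attribute a and the predicted probability p."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    a_centered = a - a.mean()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # predicted probabilities
        cov = (a_centered * p).mean()              # (linear) dependence on the group
        # Gradient of cross-entropy + lam * cov^2 with respect to the logits.
        dz = (p - y) / n + 2 * lam * cov * a_centered * p * (1 - p) / n
        w -= lr * (X.T @ dz)
        b -= lr * dz.sum()
    return w, b

# Hypothetical data: x1 is a proxy feature correlated with the protected attribute a.
rng = np.random.default_rng(0)
n = 400
a = rng.integers(0, 2, n).astype(float)
x1 = rng.normal(a, 1.0)                      # proxy feature
x2 = rng.normal(0.0, 1.0, n)                 # group-neutral feature
y = ((x1 + x2 + rng.normal(0, 0.5, n)) > 0.5).astype(float)
X = np.column_stack([x1, x2])

for lam in (0.0, 5.0):
    w, b = train_fair_logreg(X, y, a, lam=lam)
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # With lam > 0 the gap in mean predicted score between groups shrinks.
    print(f"lam={lam}: mean score group 0 = {p[a == 0].mean():.2f}, "
          f"group 1 = {p[a == 1].mean():.2f}")
```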
Post-processing methods adjust the model's predictions after training to improve fairness.
| Method | Description |
|---|---|
| Equalized odds post-processing | Finds group-specific thresholds or randomization probabilities that satisfy equalized odds while maximizing accuracy (Hardt et al., 2016). |
| Calibrated equalized odds | Modifies predictions to satisfy a relaxed version of equalized odds that preserves calibration as much as possible. |
| Reject option classification | Gives favorable outcomes to unprivileged groups and unfavorable outcomes to privileged groups in the region near the decision boundary where the model is least certain (Kamiran et al., 2012). |
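A simplified, deterministic sketch in the spirit of threshold-based post-processing: choose a separate decision threshold per group so that each group reaches (at least) a common target true positive rate. This is not the exact Hardt et al. (2016) procedure, which also randomizes between thresholds to hit rates exactly; data and parameter names are hypothetical:

```python
import numpy as np

def pick_group_thresholds(scores, y_true, groups, target_tpr=0.8):
    """For each group, the highest score threshold whose TPR is at least the target."""
    thresholds = {}
    for g in np.unique(groups):
        s, yt = scores[groups == g], y_true[groups == g]
        positives = s[yt == 1]
        candidates = np.sort(np.unique(s))[::-1]       # high to low
        chosen = candidates[-1]
        for t in candidates:
            if (positives >= t).mean() >= target_tpr:  # group TPR at threshold t
                chosen = t
                break
        thresholds[g] = chosen
    return thresholds

# Hypothetical scores where one group's scores are systematically lower.
rng = np.random.default_rng(1)
n = 1000
groups = rng.integers(0, 2, n)
y_true = rng.integers(0, 2, n)
scores = np.clip(0.6 * y_true + 0.2 * groups + rng.normal(0, 0.2, n), 0, 1)

thresholds = pick_group_thresholds(scores, y_true, groups, target_tpr=0.8)
print(thresholds)  # the group with lower scores gets a lower threshold to reach the same TPR
```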
Several software libraries implement fairness metrics and bias mitigation algorithms:
| Tool | Developer | Language | Key features |
|---|---|---|---|
| AI Fairness 360 (AIF360) | IBM Research | Python, R | Over 70 fairness metrics, 11 bias mitigation algorithms spanning pre-processing, in-processing, and post-processing |
| Fairlearn | Microsoft | Python | Fairness assessment dashboard, mitigation algorithms including exponentiated gradient and threshold optimizer, regression support |
| What-If Tool | Google PAIR | JavaScript/Python | Interactive visualization for exploring model behavior, supports TensorFlow and XGBoost models, Jupyter integration |
| Aequitas | University of Chicago | Python | Bias audit toolkit focused on group fairness metrics for classification models |
| Responsible AI Toolbox | Microsoft | Python | Integrates fairness assessment with interpretability, error analysis, and causal inference |
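As a usage illustration, the snippet below computes several group fairness metrics with Fairlearn's MetricFrame; it assumes a recent Fairlearn release and uses hypothetical data:

```python
import numpy as np
from fairlearn.metrics import (MetricFrame, selection_rate,
                               true_positive_rate, false_positive_rate)

# Hypothetical labels, predictions, and a single sensitive feature.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 0])
sex    = np.array(["f", "f", "f", "f", "m", "m", "m", "m"])

mf = MetricFrame(
    metrics={"selection_rate": selection_rate,
             "TPR": true_positive_rate,
             "FPR": false_positive_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sex,
)
print(mf.by_group)      # one row of metrics per group
print(mf.difference())  # largest between-group gap for each metric
```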
Fairness metrics have taken on increased importance as governments introduce regulations governing automated decision-making.
The EEOC applies existing anti-discrimination laws (Title VII, the Age Discrimination in Employment Act, and the Americans with Disabilities Act) to automated decision-making tools, including those powered by machine learning. In May 2023, the EEOC issued technical guidance reaffirming that employers using AI tools in hiring must ensure those tools do not produce disparate impact against protected groups. The four-fifths rule remains a common initial screening tool, though researchers have noted that it is a rough heuristic and not a formal legal threshold for discrimination.
The EU AI Act, which entered into force on August 1, 2024, classifies AI systems used in employment, credit scoring, criminal justice, and other high-stakes domains as "high-risk." These systems must meet requirements for transparency, human oversight, and non-discrimination. Organizations deploying high-risk AI must conduct conformity assessments and maintain documentation demonstrating that their systems do not produce discriminatory outcomes. The Act's obligations for high-risk systems embedded in regulated products become fully applicable by August 2027.
Regulatory frameworks generally do not prescribe specific fairness metrics but instead require that systems avoid discriminatory outcomes. This leaves practitioners with the responsibility of choosing metrics appropriate to their context. The choice should be guided by the type of decision being made, the potential harms of errors, the presence of base rate differences, and consultation with affected communities.
Despite significant progress, fairness metrics face several unresolved challenges: commonly desired criteria are mutually incompatible whenever base rates differ, auditing intersectional subgroups is computationally hard, individual and counterfactual fairness depend on similarity metrics and causal models that are difficult to specify and may themselves encode bias, and regulations generally do not indicate which metric is appropriate in a given context.
The following table lists influential papers that have shaped the fairness metrics field:
| Year | Authors | Contribution |
|---|---|---|
| 2012 | Dwork, Hardt, Pitassi, Reingold, Zemel | Introduced individual fairness via the Lipschitz condition ("Fairness Through Awareness") |
| 2016 | Hardt, Price, Srebro | Defined equalized odds and equal opportunity; proposed post-processing methods |
| 2016 | Kleinberg, Mullainathan, Raghavan | Proved impossibility of simultaneously satisfying calibration and balance conditions |
| 2016 | ProPublica (Angwin, Larson, Mattu, Kirchner) | Published "Machine Bias" analysis of COMPAS recidivism tool |
| 2017 | Chouldechova | Proved impossibility of satisfying predictive parity and error rate balance simultaneously |
| 2017 | Kusner, Loftus, Russell, Silva | Introduced counterfactual fairness using causal inference |
| 2018 | Kearns, Neel, Roth, Wu | Formalized subgroup fairness and fairness gerrymandering |
| 2018 | Verma and Rubin | Provided a comprehensive taxonomy of fairness definitions ("Fairness Definitions Explained") |
| 2018 | IBM Research | Released AI Fairness 360 open-source toolkit |
| 2020 | Fairlearn team (Microsoft) | Released Fairlearn open-source toolkit |