Counterfactual fairness is a formal definition of algorithmic fairness rooted in causal inference. Introduced by Matt Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva in their 2017 paper presented at the Conference on Neural Information Processing Systems (NeurIPS), the concept defines a prediction as fair toward an individual if the prediction would remain the same in a counterfactual world where that individual belonged to a different demographic group. Unlike statistical fairness criteria such as demographic parity or equalized odds, counterfactual fairness explicitly models the causal mechanisms through which protected attributes influence outcomes, drawing on Judea Pearl's framework of structural causal models (SCMs).
The framework has become one of the most widely studied approaches to individual-level fairness in machine learning, with applications in hiring, lending, criminal justice, and healthcare. It has also sparked active debate about the relationship between causal and statistical fairness definitions.
Imagine a teacher is picking students for the school spelling bee. The teacher should choose students based on how well they can spell, not based on whether they are a boy or a girl. Counterfactual fairness asks a simple question: if we could magically change one thing about a student (like whether they are a boy or a girl) but keep everything else the same (how much they practiced, how many words they know), would the teacher still make the same choice? If the answer is yes, the decision is fair. If the answer is no, then the decision is being influenced by something it should not depend on.
Machine learning systems are increasingly used to make high-stakes decisions that affect people's lives, including loan approvals, hiring decisions, bail and sentencing recommendations, and medical diagnoses. These systems learn from historical data that often reflects decades or centuries of systemic discrimination. A hiring model trained on past hiring decisions, for example, may learn to penalize applicants from underrepresented groups simply because those groups were historically hired at lower rates, regardless of their actual qualifications.
Early approaches to algorithmic fairness focused on statistical definitions. Demographic parity requires that the proportion of positive outcomes be equal across groups. Equalized odds, introduced by Hardt, Price, and Srebro in 2016, requires that a classifier have equal true positive and false positive rates across groups. Individual fairness, proposed by Dwork et al. in 2012, requires that similar individuals receive similar predictions, but leaves the definition of "similar" as a task-specific design choice.
These definitions, while useful, do not capture the causal pathways through which protected attributes influence predictions. A model that achieves demographic parity might still rely on proxies for race or gender. Equalized odds conditions on the true outcome, which may itself be tainted by historical bias. Individual fairness requires a similarity metric that can be difficult to specify in practice.
Kusner et al. argued that fairness is inherently a causal concept. To determine whether a decision is fair, one must ask: would this decision have been different if the individual had belonged to a different group? Answering this question requires reasoning about counterfactuals, which in turn requires a causal model of how variables relate to one another.
Counterfactual fairness is defined within the framework of structural causal models (SCMs), a formalism developed by Judea Pearl. An SCM is a mathematical object that represents the causal relationships among a set of variables. Formally, an SCM is a triple (U, V, F) consisting of three components.
Exogenous variables (U) are variables whose values are determined by factors outside the model. They represent unobserved background conditions, individual characteristics, or sources of randomness. In the fairness setting, exogenous variables often represent innate traits or abilities that are independent of a person's membership in a protected group.
Endogenous variables (V) are variables whose values are determined by other variables in the model through structural equations. In the fairness context, endogenous variables include the protected attribute A (such as race or gender), observed features X (such as test scores or work experience), and the outcome Y (such as whether someone is hired).
Structural equations (F) define how each endogenous variable is determined by its parents in the causal graph and by relevant exogenous variables. Each equation takes the form V_i = f_i(pa_i, U_pa_i), where pa_i denotes the direct causes (parents) of V_i in the causal graph.
The structural equations induce a directed acyclic graph (DAG) that represents the causal relationships among variables. Arrows in the graph point from causes to effects. The DAG makes explicit which variables are causally affected by the protected attribute, either directly or through intermediate variables.
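For illustration, the following sketch encodes a small hypothetical SCM in Python. The variable names, coefficients, and graph are assumptions chosen purely for this example, not part of any standard model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n):
    """Sample n individuals from a toy SCM with protected attribute A,
    feature X, and outcome Y. U_A, U_X, U_Y are the exogenous variables."""
    u_a = rng.uniform(size=n)
    u_x = rng.normal(size=n)
    u_y = rng.normal(size=n)

    a = (u_a < 0.5).astype(float)   # A := f_A(U_A), binary protected attribute
    x = 1.0 + 2.0 * a + u_x         # X := f_X(A, U_X), feature influenced by A
    y = 0.5 * x - 1.0 * a + u_y     # Y := f_Y(X, A, U_Y), outcome
    return a, x, y

a, x, y = sample_scm(1000)
```

In this toy model the induced graph is A -> X, A -> Y, and X -> Y, so both X and Y are descendants of A.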
A central feature of SCMs is their ability to answer counterfactual questions. Counterfactual reasoning in the Pearl framework follows a three-step procedure known as abduction-action-prediction.
| Step | Name | Description |
|---|---|---|
| 1 | Abduction | Given the observed evidence (the actual values of all variables for an individual), update the probability distribution over the exogenous variables U. This step infers what the individual's background characteristics must be, given what we observe about them. |
| 2 | Action | Modify the structural equations by intervening on the variable of interest. In the fairness setting, this means setting the protected attribute A to a different value (for example, changing race from one group to another). |
| 3 | Prediction | Using the modified model with the updated exogenous variables, compute the values of all other variables under the intervention. This produces the counterfactual outcome: what would have happened to this specific individual if their protected attribute had been different. |
This three-step procedure distinguishes counterfactual queries from interventional queries. An intervention asks what would happen to a randomly selected individual if we changed their group membership. A counterfactual asks what would have happened to a specific observed individual, taking into account everything we know about them.
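Continuing the toy SCM above (all numbers remain hypothetical), the three steps can be carried out directly because the structural equations are invertible in their noise terms: abduction solves for U_X and U_Y, action overwrites A, and prediction re-runs the equations.

```python
def counterfactual(a_obs, x_obs, y_obs, a_cf):
    """Abduction-action-prediction for the toy SCM sketched above."""
    # 1. Abduction: recover the exogenous noise consistent with the observation.
    u_x = x_obs - (1.0 + 2.0 * a_obs)
    u_y = y_obs - (0.5 * x_obs - 1.0 * a_obs)

    # 2. Action: set the protected attribute to the counterfactual value.
    a = a_cf

    # 3. Prediction: propagate the intervention using the recovered noise.
    x_cf = 1.0 + 2.0 * a + u_x
    y_cf = 0.5 * x_cf - 1.0 * a + u_y
    return x_cf, y_cf

# What would this individual's X and Y have been, had A been 1 instead of 0?
x_cf, y_cf = counterfactual(a_obs=0.0, x_obs=1.3, y_obs=0.2, a_cf=1.0)
```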
The formal definition of counterfactual fairness, as stated by Kusner et al. (2017), is as follows.
A predictor Y-hat is counterfactually fair if, for any individual with observed features X = x and protected attribute A = a:
P(Y-hat_{A <- a}(U) = y | X = x, A = a) = P(Y-hat_{A <- a'}(U) = y | X = x, A = a)
for all y and for any value a' that A could take.
In plain language, this definition states that the distribution of predictions for an individual must be identical whether we consider the actual world (where A = a) or a counterfactual world (where A = a') while holding fixed the individual's background characteristics U. The conditioning on X = x and A = a reflects our knowledge of the individual from the observed data. The subscript A <- a' denotes the counterfactual operation of setting A to a'.
The definition operates at the individual level, not the group level. It does not require that aggregate statistics (such as acceptance rates) be equal across groups. Instead, it requires that each individual person would receive the same prediction regardless of which group they belong to, after accounting for the causal structure of the data.
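Continuing the same toy example, the following sketch checks the definition for a single individual using the counterfactual helper defined above. Because the toy structural equations are invertible, the posterior over U given X = x and A = a is a point mass, so each side of the definition reduces to a single prediction; the predictor shown is hypothetical.

```python
def unaware_predictor(x):
    """A hypothetical 'unaware' predictor that uses only X, not A."""
    return 0.6 * x

def counterfactually_fair_for(predictor, a_obs, x_obs, y_obs, a_values, tol=1e-9):
    """Check the definition for one individual under the toy SCM above."""
    yhat_factual = predictor(x_obs)
    for a_cf in a_values:
        x_cf, _ = counterfactual(a_obs, x_obs, y_obs, a_cf)
        if abs(predictor(x_cf) - yhat_factual) > tol:
            return False
    return True

# False: X is a descendant of A, so even an "unaware" predictor fails.
print(counterfactually_fair_for(unaware_predictor, 0.0, 1.3, 0.2, [0.0, 1.0]))
```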
Kusner et al. proved a result that simplifies the implementation of counterfactual fairness in practice.
Lemma 1: A predictor Y-hat is counterfactually fair if it is a function only of the non-descendants of the protected attribute A in the causal graph.
This lemma provides a practical criterion: if the predictor uses only variables that are not causally affected by the protected attribute (either directly or indirectly), then it is automatically counterfactually fair. Variables that are descendants of A in the causal graph carry information that is causally influenced by group membership. Using such variables in a predictor can transmit the effect of the protected attribute into the prediction, even if the protected attribute is not used directly.
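Lemma 1 can be applied mechanically once a causal graph has been agreed on. The sketch below uses networkx to list the features that are not descendants of A; the graph and variable names are hypothetical.

```python
import networkx as nx

# Hypothetical causal graph; edges point from cause to effect.
graph = nx.DiGraph([
    ("A", "X1"), ("A", "X2"),    # X1 and X2 are causally affected by A
    ("U", "X3"),                 # X3 depends only on background factors
    ("X1", "Y"), ("X2", "Y"), ("X3", "Y"),
])

features = ["X1", "X2", "X3"]
descendants_of_a = nx.descendants(graph, "A")

# By Lemma 1, a predictor restricted to these features is counterfactually fair.
safe_features = [f for f in features if f not in descendants_of_a]
print(safe_features)  # ['X3']
```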
Kusner et al. described three levels at which counterfactual fairness can be implemented, with each level requiring progressively stronger causal assumptions but also allowing the use of more information for prediction.
| Level | Approach | Causal assumptions | Information used | Practical notes |
|---|---|---|---|---|
| Level 1 | Use only observable non-descendants of A | Minimal: only requires knowledge of which observed variables are not descendants of A | Only observed variables that are causally unaffected by A | Simple to implement, but in many real-world problems most observed variables are descendants of protected attributes, leaving few usable features |
| Level 2 | Infer latent variables U from observed data using domain knowledge | Moderate: requires specifying a causal model with latent variables and learning conditional distributions P(U given X, A) | Latent "fair" variables inferred from the data, which capture individual characteristics independent of group membership | Requires explicit domain knowledge about the causal structure; latent variables act as debiased versions of observed features |
| Level 3 | Specify fully deterministic structural equations with additive error terms | Strong: requires a complete specification of structural equations, typically as additive noise models V_i = f_i(pa_i) + e_i | Error terms (residuals) that are independent of A by construction, capturing individual-specific variation not attributable to group membership | Maximizes the amount of information available for prediction; the error terms serve as counterfactually fair inputs to the predictor |
Level 3 extracts the most information from the data while maintaining counterfactual fairness, but it depends on the correctness of the assumed structural equations. If the causal model is misspecified, the resulting predictor may not be truly fair.
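As an illustration of the Level 3 idea, the sketch below assumes each observed feature follows a linear additive-noise model in the protected attribute, estimates the structural equations by regression, and keeps the residuals as fair inputs. The function name and the use of ordinary least squares are choices made for this example, not the estimation procedure of the original paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def level3_residual_features(A, X):
    """Level 3 sketch: assume each feature X[:, j] = f_j(A) + e_j with noise
    independent of A, estimate f_j by regressing on A, and return the
    residuals e_j, which serve as counterfactually fair inputs."""
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    residuals = np.empty_like(X)
    for j in range(X.shape[1]):
        f_j = LinearRegression().fit(A, X[:, j])
        residuals[:, j] = X[:, j] - f_j.predict(A)
    return residuals

# Hypothetical usage: A is a 2D (one-hot) encoding of the protected attribute,
# X holds the observed features, y is the training target.
# fair_inputs = level3_residual_features(A, X)
# fair_predictor = LinearRegression().fit(fair_inputs, y)
```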
Kusner et al. demonstrated their framework using a dataset from the Law School Admission Council (LSAC), which contains records of 21,790 law students across 163 law schools in the United States. The data was originally collected for the LSAC National Longitudinal Bar Passage Study (Wightman, 1998).
The dataset includes the following key variables.
| Variable | Description | Role in the causal model |
|---|---|---|
| Race (R) | Student's race | Protected attribute |
| Sex (S) | Student's sex | Protected attribute |
| LSAT | Law School Admission Test score | Observed feature (descendant of A) |
| GPA | Undergraduate grade point average | Observed feature (descendant of A) |
| FYA | First-year law school average grade | Outcome variable (target) |
| Knowledge (K) | Latent variable representing a student's underlying knowledge and ability | Exogenous variable (not directly observed) |
The causal model posits that a student's latent knowledge K, together with their race and sex, causally determines their observable test scores (LSAT, GPA) and their law school performance (FYA). Race and sex affect LSAT, GPA, and FYA both through their influence on K (for example, through differential access to educational resources) and through direct effects (for example, through stereotype threat or grading bias).
In the model, each structural equation expresses one of the observed variables (GPA, LSAT, and FYA) as a function of the latent knowledge K, race, and sex, together with an independent noise term.
The key insight is that using raw LSAT and GPA scores to predict FYA is not counterfactually fair, because these scores are descendants of race and sex in the causal graph. They carry information about the causal effect of protected attributes on test performance.
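As a rough sketch of how a "Fair Add"-style model could be set up on data of this shape, the code below reuses the level3_residual_features helper from the levels section above. The column names and pandas preprocessing are assumptions, and the simple least-squares fit stands in for the paper's actual estimation procedure.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical LSAC-style data frame with columns: race, sex, LSAT, GPA, FYA.
# df = pd.read_csv("law_school.csv")

def fair_add_fya_model(df):
    """Sketch of a 'Fair Add'-style predictor of first-year average (FYA)."""
    # Encode the protected attributes and extract their descendants.
    A = pd.get_dummies(df[["race", "sex"]], drop_first=True).to_numpy(dtype=float)
    X = df[["LSAT", "GPA"]].to_numpy(dtype=float)
    # Residuals of LSAT and GPA after removing the part explained by race and
    # sex (see level3_residual_features above) are the fair inputs.
    fair_inputs = level3_residual_features(A, X)
    return LinearRegression().fit(fair_inputs, df["FYA"])
```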
Kusner et al. compared four approaches.
| Model | Description | RMSE | Counterfactually fair? |
|---|---|---|---|
| Full | Uses all variables including race and sex | 0.873 | No |
| Unaware | Uses LSAT and GPA but not race and sex directly | 0.894 | No (LSAT and GPA are descendants of A) |
| Fair K (Level 2) | Infers latent knowledge K and uses it for prediction | 0.929 | Yes |
| Fair Add (Level 3) | Uses additive error terms from structural equations | 0.918 | Yes |
The fair models achieve counterfactual fairness with a modest increase in prediction error. The "Unaware" model, which simply removes the protected attributes from the input, is not counterfactually fair because it still uses features (LSAT and GPA) that are causally influenced by race and sex. This illustrates why fairness through unawareness (simply removing sensitive attributes) is generally insufficient.
Counterfactual fairness occupies a specific position in the broader landscape of fairness definitions. The following table compares it with other commonly studied criteria.
| Fairness definition | Type | Key requirement | Uses causal model? | Limitations |
|---|---|---|---|---|
| Demographic parity | Group | Equal positive prediction rates across groups | No | May require different thresholds for different groups; ignores differences in base rates |
| Equalized odds | Group | Equal true positive and false positive rates across groups | No | Conditions on the true label, which may itself be biased |
| Individual fairness | Individual | Similar individuals receive similar predictions | No (requires a similarity metric) | Defining the similarity metric is challenging and task-specific |
| Counterfactual fairness | Individual (causal) | Prediction unchanged under counterfactual change of protected attribute | Yes | Requires specification and correctness of a causal model |
| Calibration | Group | Among individuals assigned probability p, the fraction of positives is p, regardless of group | No | Can be satisfied by a biased model if base rates differ |
| Path-specific counterfactual fairness | Individual (causal) | Only effects through "unfair" causal pathways are blocked | Yes | Requires distinguishing fair from unfair causal pathways |
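For contrast with the causal definition, the two group criteria in the table can be computed directly from predictions and group labels; the sketch below assumes binary 0/1 labels, predictions, and group membership stored in NumPy arrays.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between the two groups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equalized_odds_gap(y_true, y_pred, group):
    """Largest gap in true-positive or false-positive rates across the groups."""
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        rates = [y_pred[(group == g) & (y_true == label)].mean() for g in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)
```

Counterfactual fairness, by contrast, cannot be evaluated from predictions, labels, and group membership alone: it additionally requires the causal model used in the sketches above.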
In 2023, Rosenblatt and Witter published a paper titled "Counterfactual Fairness Is Basically Demographic Parity" at the AAAI Conference on Artificial Intelligence. They argued that any algorithm satisfying counterfactual fairness also satisfies demographic parity, and that any algorithm satisfying demographic parity can be trivially modified to satisfy counterfactual fairness. Their empirical analysis found that simple benchmark algorithms outperformed existing counterfactual fairness algorithms in terms of fairness, accuracy, and efficiency on several datasets.
In 2024, Ricardo Silva (one of the original authors of the counterfactual fairness paper) published a rebuttal titled "Counterfactual Fairness Is Not Demographic Parity, and Other Observations." Silva argued that the equivalence claim does not hold under careful examination and cautioned against making blanket statements of equivalence between causal concepts and purely probabilistic concepts. The debate remains active and highlights the subtle relationship between causal and statistical notions of fairness.
A notable extension of counterfactual fairness is path-specific counterfactual fairness (PC fairness), introduced by Chiappa in 2019. Standard counterfactual fairness requires that the prediction be invariant to any change in the protected attribute. However, in some settings certain causal pathways from the protected attribute to the outcome may be considered acceptable, while others are not.
For example, consider a university admissions decision where the protected attribute is socioeconomic status. The effect of socioeconomic status on admission through educational quality (attending under-resourced schools) might be considered unfair, while the effect through genuine effort and talent development might be considered acceptable. Path-specific counterfactual fairness allows practitioners to specify which causal pathways are "fair" and which are "unfair," and requires invariance only along the unfair pathways.
This extension provides greater flexibility but also introduces additional complexity, since the practitioner must make normative judgments about which pathways constitute fair versus unfair influence. It subsumes several other causal fairness notions as special cases.
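To make the pathway distinction concrete, the following sketch uses a hypothetical linear SCM (not taken from Chiappa's paper) in which the path through a mediator M is deemed fair and the direct edge from A to the outcome is deemed unfair; the path-specific counterfactual changes A only along the direct edge.

```python
# Hypothetical linear SCM:
#   M := 0.8 * A + U_M             (A -> M -> Y, deemed fair, e.g. effort)
#   Y := 1.5 * M - 0.7 * A + U_Y   (direct edge A -> Y, deemed unfair)

def path_specific_counterfactual_y(a_obs, m_obs, y_obs, a_cf):
    """Outcome had A been a_cf along the direct (unfair) edge only.

    The mediator keeps its factual value m_obs, so the fair pathway
    A -> M -> Y is untouched; only the unfair direct effect is switched."""
    u_y = y_obs - (1.5 * m_obs - 0.7 * a_obs)   # abduction for U_Y
    return 1.5 * m_obs - 0.7 * a_cf + u_y       # action + prediction
```

A predictor is path-specifically fair with respect to the direct edge if its output is unchanged under this operation.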
Counterfactual fairness has been studied in several application domains where algorithmic decisions have significant consequences for individuals.
Recidivism prediction tools such as the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system have faced criticism for racial bias. ProPublica's 2016 investigation found that Black defendants were significantly more likely than white defendants to be incorrectly classified as high risk for reoffending. Counterfactual fairness provides a framework for evaluating whether a risk assessment tool's prediction for a defendant would change if that defendant's race were counterfactually altered while their background circumstances remained the same. Applying counterfactual fairness to such systems requires a causal model of how race, socioeconomic factors, prior criminal history, and recidivism are related.
Fair lending laws in the United States (such as the Equal Credit Opportunity Act and the Fair Housing Act) prohibit discrimination in credit decisions on the basis of race, sex, religion, and other protected characteristics. Machine learning models used for credit scoring may inadvertently discriminate through proxy variables such as zip code, which correlate with race due to historical patterns of residential segregation. A counterfactually fair credit scoring model would produce the same credit decision for an applicant regardless of their race, after accounting for the causal relationships between race, socioeconomic factors, and creditworthiness indicators.
Automated resume screening and hiring recommendation systems may perpetuate historical biases present in training data. Counterfactual fairness in this context asks whether a hiring decision would remain the same if a candidate's gender (or other protected attribute) were changed while their skills, qualifications, and experience remained causally the same. This requires distinguishing between qualifications that are genuinely job-relevant and those that are artifacts of historical discrimination.
Clinical decision support systems that use patient data may produce different recommendations for patients of different racial or ethnic groups, even when clinical indicators are similar. Counterfactual fairness can be applied to ensure that treatment recommendations are based on clinical need rather than on features that are causally influenced by race or ethnicity. Obermeyer et al. (2019) documented a widely used healthcare algorithm that exhibited racial bias in identifying patients who need extra care, underscoring the importance of fairness-aware approaches in medical AI.
Counterfactual fairness offers several properties that distinguish it from purely statistical fairness definitions.
Causal grounding. By requiring a causal model, counterfactual fairness forces practitioners to make their assumptions about the data-generating process explicit. This transparency can reveal hidden sources of bias that statistical approaches might miss.
Individual-level fairness. The definition operates at the level of individual predictions rather than group-level statistics. This means it can detect unfairness even when aggregate statistics appear balanced.
Handling proxy discrimination. Because counterfactual fairness traces causal pathways, it can identify cases where a model discriminates through proxy variables (such as zip code as a proxy for race), even when the protected attribute is not directly used as an input.
Principled treatment of descendants. The framework provides a clear criterion for which variables can be safely used in prediction (non-descendants of A) and which carry the causal influence of the protected attribute.
Despite its theoretical appeal, counterfactual fairness faces several practical challenges.
Causal model specification. The definition requires a structural causal model, but the true causal model is rarely known. Constructing a causal model requires domain expertise and involves assumptions that may be contested. Different stakeholders may disagree about the correct causal graph, and there is no purely data-driven way to resolve such disagreements.
Identifiability. Counterfactual quantities are not always identifiable from observational data alone. In some settings, the causal effect of the protected attribute on the outcome cannot be uniquely determined without additional assumptions (such as the absence of hidden confounders) or experimental data.
Model misspecification. If the assumed causal model is incorrect, a predictor that appears counterfactually fair under the assumed model may not be fair under the true causal model. Sensitivity analysis methods can help assess how robust conclusions are to model misspecification, but they do not eliminate the problem.
Scalability. Computing counterfactuals in complex causal models with many variables can be computationally expensive. For large-scale machine learning systems with hundreds or thousands of features, constructing and fitting a full structural causal model may be impractical.
Few non-descendant features. In many real-world problems, most observed features are descendants of protected attributes. Race and gender, for example, causally influence education, income, neighborhood, health outcomes, and many other commonly used features. At Level 1 (using only non-descendant features), this may leave very few usable features for prediction.
Ontological questions about protected attributes. The framework treats protected attributes as variables that can be "set" to different values in a counterfactual world. Some scholars have questioned whether it is coherent to ask what would have happened if a person had been a different race, given that race is deeply intertwined with life experience, identity, and social context.
Hidden confounders. Unobserved confounders that affect both the protected attribute and the outcome can lead to incorrect counterfactual estimates. If important variables are omitted from the causal model, the resulting fairness guarantees may be unreliable.
Several software packages and toolkits support the implementation of causal fairness methods, including counterfactual fairness.
| Tool | Language | Description |
|---|---|---|
| DoWhy | Python | A library for causal inference that includes support for counterfactual fairness evaluation, developed by Microsoft Research and now part of the PyWhy ecosystem |
| faircause | R | An R package implementing the causal fairness analysis framework of Plecko and Bareinboim (2024), including methods for decomposing observed disparities into causal components |
| AI Fairness 360 (AIF360) | Python, R | An extensible toolkit developed by IBM Research for detecting and mitigating algorithmic bias, including multiple fairness metrics and bias mitigation algorithms |
| Fairlearn | Python | A Microsoft toolkit for assessing and improving fairness of machine learning models, supporting multiple fairness definitions and mitigation techniques |
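As an example of tooling support, the sketch below outlines how counterfactuals might be computed with DoWhy's graphical causal model (gcm) module; the graph, column names, and data file are hypothetical, and the exact API may differ between DoWhy versions.

```python
import networkx as nx
from dowhy import gcm

# Hypothetical causal graph: A is the protected attribute, X a feature, Y the outcome.
causal_graph = nx.DiGraph([("A", "X"), ("A", "Y"), ("X", "Y")])
scm = gcm.InvertibleStructuralCausalModel(causal_graph)

# With a pandas DataFrame `data` holding columns A, X, Y (hypothetical file):
# data = pd.read_csv("training_data.csv")
# gcm.auto.assign_causal_mechanisms(scm, data)   # pick mechanisms automatically
# gcm.fit(scm, data)
#
# Counterfactual for one observed individual: what would X and Y have been
# had A taken the value 1? (abduction-action-prediction under the hood)
# individual = data.iloc[[0]]
# cf = gcm.counterfactual_samples(scm, {"A": lambda a: 1}, observed_data=individual)
```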
Research on counterfactual fairness has expanded in several directions since the original 2017 paper.
Deep learning extensions. Researchers have developed methods for enforcing counterfactual fairness in deep neural networks. The Generative Counterfactual Fairness Network (GCFN) uses generative adversarial networks (GANs) to learn the counterfactual distribution of descendants of the protected attribute and enforces fair predictions through a counterfactual mediator regularization term.
Graph neural networks. Counterfactual fairness has been extended to graph neural networks (GNNs), where the graph structure itself can encode discriminatory patterns. Methods such as contrastive learning on counterfactual graph augmentations aim to produce fair node representations that are invariant to changes in sensitive attributes.
Counterfactual fairness with imperfect models. Recognizing that perfect causal models are rarely available, researchers have developed methods for achieving approximate counterfactual fairness when the structural causal model is imperfect or partially specified. These methods aim to provide fairness guarantees that degrade gracefully with the degree of model misspecification.
Partial identification. When counterfactual quantities are not point-identified from observational data, partial identification approaches compute bounds on the degree of counterfactual unfairness. This allows practitioners to assess fairness even when the causal model does not fully determine counterfactual outcomes.
Combining factual and counterfactual predictions. A 2024 NeurIPS paper explored methods for combining factual predictions (based on observed data) with counterfactual predictions (based on counterfactual reasoning) to achieve a balance between predictive accuracy and counterfactual fairness.
Comprehensive causal fairness frameworks. Plecko and Bareinboim (2024) published a comprehensive monograph on causal fairness analysis in Foundations and Trends in Machine Learning. Their framework introduces the Fairness Map, which links observed disparities to underlying causal mechanisms and provides a unified toolkit for causal fairness analysis.