Algorithmic fairness
Last reviewed
Apr 30, 2026
Sources
51 citations
Review status
Source-backed
Revision
v2 · 4,139 words
Algorithmic fairness (also called fairness in machine learning, ML fairness, or AI fairness) is the study of how automated decision systems can produce decisions that are equitable across protected attributes such as race, gender, age, religion, and disability. A subfield of AI ethics, it combines computer science, law, philosophy, statistics, and the social sciences. Researchers develop mathematical definitions of fairness, audit deployed systems for algorithmic bias, design mitigation techniques, and analyze the social and legal context in which decisions operate [1][2].
The field grew rapidly after audits in the mid-2010s exposed disparities in tools used for criminal sentencing, hiring, lending, healthcare, and face analysis. By the early 2020s, fairness had become a recognized requirement in major regulatory frameworks including the EU AI Act and NYC Local Law 144 [3][4]. Many central questions, including which fairness definition to apply, whether sensitive attributes should be collected, and how to weigh trade-offs against accuracy, remain contested.
The legal roots of algorithmic fairness predate machine learning. In the United States, the doctrine of disparate impact was articulated in Griggs v. Duke Power Co. (1971), which held that facially neutral employment practices that disproportionately exclude protected groups can be unlawful unless justified by business necessity [5]. Title VII of the Civil Rights Act of 1964, the Equal Credit Opportunity Act of 1974, and the Fair Housing Act of 1968 form the backbone of US anti-discrimination law. In the EU, the Racial Equality Directive (2000/43/EC), the Employment Equality Directive (2000/78/EC), and Article 21 of the Charter of Fundamental Rights cover similar ground.
The COMPAS controversy began in May 2016 when ProPublica published "Machine Bias" by Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. The investigation analyzed risk scores produced by Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), a tool sold by Northpointe (now Equivant) and used by judges in several US states. ProPublica reported that Black defendants who did not reoffend were almost twice as likely to be misclassified as high risk as white defendants who did not reoffend, while white reoffenders were more often misclassified as low risk [6]. Northpointe responded that COMPAS was calibrated within groups: a given score corresponded to the same probability of reoffense for Black and white defendants [7]. The dispute was later shown to reflect a mathematical impossibility rather than an error on either side.
The Amazon recruiting tool was an internal experiment to score job applicants. Reuters reported in October 2018 that Amazon scrapped the tool around 2017 after engineers discovered it penalized resumes containing the word "women's" and downgraded graduates of two all-women's colleges. The model had been trained on a decade of resumes submitted to Amazon, predominantly from men [8].
Gender Shades, a 2018 audit by Joy Buolamwini and Timnit Gebru presented at the Conference on Fairness, Accountability, and Transparency (then FAT*, now FAccT), evaluated commercial gender classification systems from IBM, Microsoft, and Face++ on a curated set of 1,270 face images. Error rates were under 1% for lighter-skinned men but as high as 34.7% for darker-skinned women in one system [9]. The study became a touchstone for intersectional analysis, and IBM announced in 2020 that it would stop offering general-purpose face recognition products.
In October 2019, Obermeyer, Powers, Vogeli, and Mullainathan published a study in Science showing that a widely used commercial healthcare risk algorithm assigned similar risk scores to Black and white patients with different levels of underlying illness. The system used past healthcare costs as a proxy for need; because Black patients historically received less care, costs were a biased proxy. Correcting it would more than double the proportion of Black patients identified for high-risk care management [10].
The Apple Card credit limit dispute began in November 2019 when entrepreneur David Heinemeier Hansson posted that he had received a credit limit roughly 20 times higher than his wife's despite shared finances. The New York State Department of Financial Services investigated Goldman Sachs and concluded in March 2021 that no violation of New York fair lending law had been substantiated, while noting the case raised questions about explainability [11].
The Twitter image cropping algorithm came under scrutiny in 2020 when users observed that the saliency model appeared to favor lighter-skinned and female faces; Twitter audited the model internally and changed the product to display full images on mobile timelines in 2021 [12].
The NIST Face Recognition Vendor Test demographic effects study (December 2019) evaluated 189 algorithms from 99 developers and reported that many systems showed higher false match rates for African and East Asian faces than for Eastern European faces [13].
Suresh and Guttag (2021) and Mehrabi et al. (2021) provide widely cited taxonomies of the sources of bias in machine learning systems [14][2].
Historical bias arises when training data accurately reflects past discrimination. A model predicting who has historically been promoted mirrors past managerial preferences, including unlawful patterns. Representation bias occurs when the training population underrepresents some groups; ImageNet and IJB-A were criticized for skewing toward lighter-skinned subjects, and the Pilot Parliaments Benchmark used in Gender Shades was constructed to balance representation [9]. Measurement bias appears when features or labels are differential proxies for the construct of interest. Using arrest rates as a proxy for crime encodes disparities in policing; using healthcare cost as a proxy for need encodes disparities in access, as in Obermeyer et al. [10]. Aggregation bias results from fitting one model to a heterogeneous population. Evaluation bias occurs when benchmarks do not represent the deployment population; deployment bias occurs when a model is used in a context for which it was not designed. Feedback loops arise when predictions influence future training data. Lum and Isaac (2016) showed that predictive policing tools directing patrols to historically over-policed neighborhoods can generate more arrests there, reinforcing the pattern when the model is retrained [15].
Using notation A for a sensitive attribute, Y for the true outcome, hat-Y for a binary prediction, and S for a continuous score, researchers have proposed many statistical fairness criteria.
| Criterion | Statistical statement | Intuition | Key reference |
|---|---|---|---|
| Demographic parity (statistical parity) | P(hat-Y=1 \| A=a) = P(hat-Y=1 \| A=b) | Each group receives positive predictions at the same rate | Dwork et al. 2012 [1] |
| Conditional demographic parity | P(hat-Y=1 \| A=a, L=l) = P(hat-Y=1 \| A=b, L=l) | Equal positive rates after conditioning on legitimate features L | Kamiran et al. 2013 [16] |
| Equalized odds | P(hat-Y=1 \| Y=y, A=a) = P(hat-Y=1 \| Y=y, A=b) for y in {0,1} | Equal true positive and false positive rates across groups | Hardt, Price, Srebro 2016 [17] |
| Equality of opportunity | P(hat-Y=1 \| Y=1, A=a) = P(hat-Y=1 \| Y=1, A=b) | Among individuals whose true outcome is positive (Y=1), equal selection rates across groups, i.e. equal true positive rates | Hardt, Price, Srebro 2016 [17] |
| Predictive parity (calibration within groups) | P(Y=1 \| S=s, A=a) = P(Y=1 \| S=s, A=b) | Score s means the same probability regardless of group | Chouldechova 2017 [18] |
| Counterfactual fairness | hat-Y(A=a, U) = hat-Y(A=b, U) | Prediction is unchanged in a counterfactual world where the sensitive attribute is altered | Kusner et al. 2017 [19] |
| Individual fairness | If d(x, x') is small, then \|hat-Y(x) - hat-Y(x')\| is small | Similar individuals receive similar predictions, given a task-specific metric | Dwork et al. 2012 [1] |
| Treatment equality | FN_a / FP_a = FN_b / FP_b | Ratio of false negatives to false positives is equal across groups | Berk et al. 2018 [20] |
Demographic parity is among the oldest formalizations and roughly corresponds to disparate impact. It is sometimes operationalized using the four-fifths rule from US EEOC guidance, which treats a selection rate ratio below 0.8 as evidence of adverse impact. Critics note it can require accepting weaker candidates in one group purely to equalize rates.
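As a rough illustration of the four-fifths check described above, the sketch below computes per-group selection rates and the ratio of the lowest rate to the highest. The function names and the toy data are hypothetical, invented for this example; real audits typically also consider sample sizes and statistical significance, which this sketch ignores.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Return the fraction of positive predictions (hat-Y = 1) per group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for y_hat, g in zip(predictions, groups):
        totals[g] += 1
        positives[g] += int(y_hat == 1)
    return {g: positives[g] / totals[g] for g in totals}

def adverse_impact_ratio(rates):
    """Ratio of the lowest group selection rate to the highest."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group received any positive predictions")
    return min(rates.values()) / highest

# Made-up toy data: 10 applicants in group "a", 10 in group "b".
y_hat = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
group = ["a"] * 10 + ["b"] * 10

rates = selection_rates(y_hat, group)   # group "a": 6/10, group "b": 3/10
ratio = adverse_impact_ratio(rates)     # (3/10) / (6/10)
flagged = ratio < 0.8                   # True when below the four-fifths threshold
```

Exact demographic parity corresponds to a ratio of 1.0; the four-fifths rule tolerates ratios down to 0.8.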
Equalized odds and equality of opportunity, introduced by Hardt, Price, and Srebro in 2016, condition on the true outcome and so do not require equal positive rates when base rates differ [17]. Calibration within groups was the criterion satisfied by COMPAS as analyzed by Northpointe and by Corbett-Davies, Pierson, Feller, Goel, and Huq (2017): a given score corresponds to the same risk regardless of group [18][21]. Counterfactual fairness, proposed by Kusner, Loftus, Russell, and Silva in 2017, formalizes fairness using structural causal models [19]. Individual fairness, introduced by Dwork et al. in 2012, requires that similar individuals receive similar predictions; the challenge is choosing the similarity metric, which encodes much of the normative content of the judgment [1].
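The error-rate criteria can be checked directly from a confusion matrix computed per group. The following is a minimal sketch under simplifying assumptions: binary labels and predictions, every group containing at least one positive and one negative example, and hypothetical helper names and toy data chosen for illustration.

```python
def group_error_rates(y_true, y_pred, groups):
    """Return {group: {"tpr": ..., "fpr": ...}} for binary labels and predictions."""
    counts = {}
    for y, y_hat, g in zip(y_true, y_pred, groups):
        c = counts.setdefault(g, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if y == 1:
            c["tp" if y_hat == 1 else "fn"] += 1
        else:
            c["fp" if y_hat == 1 else "tn"] += 1
    return {
        g: {
            "tpr": c["tp"] / (c["tp"] + c["fn"]),  # assumes the group has positives
            "fpr": c["fp"] / (c["fp"] + c["tn"]),  # assumes the group has negatives
        }
        for g, c in counts.items()
    }

def max_gap(rates, key):
    """Largest difference in the chosen rate between any two groups."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)

# Made-up toy data: 8 individuals per group, 4 with Y=1 and 4 with Y=0.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y_pred = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 1, 0, 0, 1, 1, 0, 0]
group = ["a"] * 8 + ["b"] * 8

rates = group_error_rates(y_true, y_pred, group)
tpr_gap = max_gap(rates, "tpr")  # equality of opportunity asks for a gap of 0
fpr_gap = max_gap(rates, "fpr")  # equalized odds asks for both gaps to be 0
```

In this toy data group "a" has a true positive rate of 0.75 and a false positive rate of 0.25, while group "b" has 0.5 for both, so neither criterion holds.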
A central result is that several intuitive fairness criteria cannot be satisfied simultaneously when base rates differ across groups. This was shown independently by Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan in their 2016 paper "Inherent Trade-Offs in the Fair Determination of Risk Scores" and by Alexandra Chouldechova in her 2017 paper "Fair Prediction with Disparate Impact" [22][18].
Kleinberg, Mullainathan, and Raghavan formalized three properties: calibration within groups, balance for the positive class (equal average score among Y=1 in each group), and balance for the negative class (equal average score among Y=0 in each group). They proved these conditions cannot all hold unless the predictor is perfect or base rates are equal across groups. Chouldechova showed an analogous result for binary classifiers: predictive parity and equal false positive and false negative rates cannot all hold simultaneously when base rates differ, outside the trivial cases.
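The binary-classifier version of the result can be seen numerically from an identity that follows from the definitions of the confusion-matrix rates: with base rate p, positive predictive value PPV, and false negative rate FNR, the false positive rate must equal p / (1 - p) * (1 - PPV) / PPV * (1 - FNR). The sketch below, with a hypothetical function name and made-up numbers, holds PPV and FNR equal across two groups (reading predictive parity as equal PPV) and shows that different base rates then force different false positive rates.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False positive rate forced by the identity
    FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

# Same PPV and FNR in both groups (illustrative values, not real data).
ppv, fnr = 0.7, 0.3

fpr_a = implied_fpr(0.5, ppv, fnr)  # group with base rate 0.5
fpr_b = implied_fpr(0.3, ppv, fnr)  # group with base rate 0.3

# The false positive rates differ even though PPV and FNR were held equal.
rates_differ = abs(fpr_a - fpr_b) > 1e-9
```

Here the group with the higher base rate ends up with a false positive rate of about 0.30 versus about 0.13 for the other group. Equalizing the false positive rates instead would require giving up equal PPV or equal FNR.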
The COMPAS dispute is the canonical illustration. Black and white defendants in Broward County had different observed two-year recidivism rates, so no non-perfect classifier could satisfy both the calibration property emphasized by Northpointe and the equal error-rate property emphasized by ProPublica. Both critiques were mathematically correct; the disagreement was about which criterion mattered more, not about the data. Pleiss et al. (2017) extended the result by showing that calibration is compatible with at most a single relaxed error-rate constraint, such as equal false negative rates across groups [23]. Fairness cannot be reduced to a single metric: every deployed system embeds a choice about which errors are acceptable for which groups, and that choice has moral and legal content that mathematics alone cannot settle.
Methods to reduce unfairness are typically grouped by where in the pipeline they intervene.
| Stage | Method | Mechanism | Reference |
|---|---|---|---|
| Pre-processing | Reweighting | Weight examples so group-conditional prevalence matches | Kamiran & Calders 2012 [24] |
| Pre-processing | Disparate impact remover | Modify features so they are not predictive of group | Feldman et al. 2015 [25] |
| Pre-processing | Fair representation learning | Learn embedding obscuring protected attribute while preserving task signal | Zemel et al. 2013 [26] |
| Pre-processing | Optimized preprocessing | Transform features and labels under fairness and utility constraints | Calmon et al. 2017 [27] |
| In-processing | Constrained optimization | Train under explicit fairness constraints | Zafar et al. 2017 [28] |
| In-processing | Adversarial debiasing | Train predictor jointly with adversary recovering sensitive attribute | Zhang et al. 2018 [29] |
| In-processing | Fair regularization | Add regularization penalizing disparity | Kamishima et al. 2012 [30] |
| In-processing | Reductions | Reduce fair classification to cost-sensitive problems | Agarwal et al. 2018 [31] |
| Post-processing | Group-specific thresholds | Choose different decision thresholds per group | Hardt et al. 2016 [17] |
| Post-processing | Reject option | Reassign predictions near boundary in favor of disadvantaged group | Kamiran et al. 2012 [32] |
| Post-processing | Calibrated equalized odds | Randomized post-processing that preserves calibration while equalizing a single relaxed error-rate constraint | Pleiss et al. 2017 [23] |
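To make the reweighting row of the table above concrete, the sketch below assigns each (group, label) combination the weight P(A=a) * P(Y=y) / P(A=a, Y=y), so that label and group are statistically independent in the weighted data. This is a simplified illustration with hypothetical function names and made-up data, not a reproduction of any toolkit's implementation.

```python
from collections import Counter

def reweighing_weights(labels, groups):
    """Weight for each observed (group, label) pair:
    P(A=a) * P(Y=y) / P(A=a, Y=y), estimated from counts."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for (g, y) in joint_counts
    }

def weighted_positive_rate(labels, groups, weights, group):
    """Weighted share of Y=1 examples within one group."""
    num = sum(weights[(g, y)] for y, g in zip(labels, groups) if g == group and y == 1)
    den = sum(weights[(g, y)] for y, g in zip(labels, groups) if g == group)
    return num / den

# Made-up toy data: group "a" has 4 of 6 positives, group "b" has 1 of 4.
groups = ["a"] * 6 + ["b"] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]

weights = reweighing_weights(labels, groups)
rate_a = weighted_positive_rate(labels, groups, weights, "a")
rate_b = weighted_positive_rate(labels, groups, weights, "b")
```

After weighting, both groups have the same weighted positive rate, equal to the overall base rate of 0.5 in this toy data. The weights would then be passed to a learner that accepts per-example weights.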
Pre-processing methods modify training data so downstream learners are less likely to encode unfair patterns; they are model-agnostic but may distort genuinely predictive signal. In-processing methods build fairness into training itself; adversarial debiasing treats fairness as a min-max problem [29], and the reductions approach reduces fair classification to a sequence of weighted classification problems with theoretical guarantees [31]. Post-processing methods adjust an already trained predictor; they can satisfy specific criteria exactly but typically require access to the sensitive attribute at decision time. Friedler et al. (2019) reported that mitigation effects depend strongly on dataset, base rate, and choice of fairness metric [33].
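A simplified form of post-processing with group-specific thresholds is sketched below: for each group it picks the highest score threshold that reaches a target true positive rate, which equalizes true positive rates on the fitting data. This is a deterministic illustration with hypothetical function names and made-up scores; it is not the randomized procedure of Hardt et al., and it reads the sensitive attribute at decision time, as the paragraph above notes.

```python
import math

def threshold_for_tpr(scores, labels, target_tpr):
    """Highest threshold t such that predicting 1 when score >= t
    accepts at least target_tpr of the group's positive examples."""
    positive_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positive_scores:
        raise ValueError("group has no positive examples")
    # Number of positives that must be accepted; the small epsilon guards
    # against floating-point products such as 7.000000000000001.
    needed = max(1, math.ceil(target_tpr * len(positive_scores) - 1e-9))
    return positive_scores[needed - 1]

def fit_group_thresholds(scores, labels, groups, target_tpr):
    """Fit one threshold per group for the same target true positive rate."""
    by_group = {}
    for s, y, g in zip(scores, labels, groups):
        group_scores, group_labels = by_group.setdefault(g, ([], []))
        group_scores.append(s)
        group_labels.append(y)
    return {g: threshold_for_tpr(ss, ys, target_tpr) for g, (ss, ys) in by_group.items()}

def predict(scores, groups, thresholds):
    """Apply each individual's group threshold to their score."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]

def true_positive_rate(labels, preds):
    accepted = [p for y, p in zip(labels, preds) if y == 1]
    return sum(accepted) / len(accepted)

# Made-up toy data: scores run lower for group "b" at the same true outcome.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4] + [0.8, 0.5, 0.4, 0.3, 0.45, 0.1]
labels = [1, 1, 1, 1, 0, 0] + [1, 1, 1, 1, 0, 0]
groups = ["a"] * 6 + ["b"] * 6

thresholds = fit_group_thresholds(scores, labels, groups, target_tpr=0.75)
preds = predict(scores, groups, thresholds)
tpr_a = true_positive_rate(labels[:6], preds[:6])
tpr_b = true_positive_rate(labels[6:], preds[6:])
```

Both groups reach a true positive rate of 0.75, but group "b" also picks up a false positive (the negative example scored 0.45), so this sketch targets equality of opportunity rather than full equalized odds.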
Several open-source toolkits implement fairness metrics and mitigation methods. AIF360 was released by IBM in 2018 with Python and R interfaces to dozens of metrics and mitigation algorithms [34]. Fairlearn was released by Microsoft in 2018 and is now an independent project; it implements the reductions approach, post-processing including ThresholdOptimizer, and a trade-off dashboard [35]. The What-If Tool from Google PAIR (2018) integrates with TensorBoard. Aequitas, developed at the Center for Data Science and Public Policy (originally at the University of Chicago, later at Carnegie Mellon University), targets policymakers and journalists.
Common benchmarks include the UCI Adult Income dataset (a 1994 census extract), the COMPAS dataset released by ProPublica, German Credit, MIMIC-III for clinical applications, the Pilot Parliaments Benchmark from Gender Shades, and CelebA. Critiques of Adult by Ding, Hardt, Miller, and Schmidt (2021) led to Folktables, drawn from the American Community Survey [36].
Governments have begun translating fairness research into binding rules.
The EU AI Act was politically agreed in December 2023 and entered into force on August 1, 2024. Provisions phase in over the following years, with prohibitions applying six months after entry into force and most high-risk system requirements applying twenty-four months after entry into force. The Act categorizes systems by risk and imposes obligations on high-risk systems, including those in employment, education, credit scoring, law enforcement, and the administration of justice. High-risk systems must undergo conformity assessment addressing data quality, documentation, transparency, oversight, accuracy, robustness, and bias. Article 10 requires that training, validation, and testing datasets be examined for biases that may lead to discrimination [3]. The EU General Data Protection Regulation (GDPR), applicable since May 2018, regulates automated decision-making in Article 22, granting data subjects a right not to be subject to decisions based solely on automated processing that have legal or similarly significant effects; Recital 71 mentions preventing discriminatory effects [37].
In the US, the Equal Employment Opportunity Commission issued guidance in May 2022 and May 2023 explaining how the Americans with Disabilities Act and Title VII apply to employer use of AI tools, including testing for adverse impact [38]. NYC Local Law 144 took effect on January 1, 2023, with enforcement from July 5, 2023; it requires employers using automated employment decision tools for New York City residents to commission an annual independent bias audit, publish a summary, and notify candidates [4]. The Federal Trade Commission issued a 2021 business guidance post warning that biased algorithms in consumer decisions can violate the Equal Credit Opportunity Act and Section 5 of the FTC Act [39]. The NIST AI Risk Management Framework (AI RMF 1.0), released on January 26, 2023, is voluntary but widely referenced and includes fairness among the characteristics of trustworthy AI [40]. The Algorithmic Accountability Act has been proposed since 2019 but as of early 2026 has not been enacted. California AB 2930 was introduced in 2024 to regulate automated decision tools. Colorado's SB 24-205, enacted in May 2024, imposes developer and deployer obligations on high-risk AI systems; it was originally scheduled to apply from February 2026, and the effective date was later postponed to June 30, 2026.
In "Fairness and Abstraction in Sociotechnical Systems" (2019), Andrew Selbst, danah boyd, Sorelle Friedler, Suresh Venkatasubramanian, and Janet Vertesi argue that fair machine learning research often abstracts away from the social context in which a system is embedded, treating fairness as a property of an algorithm rather than of a sociotechnical assemblage including data collection, deployment, oversight, and contestation [41]. They identify "abstraction traps," including framing, portability, and solutionism traps.
A related debate concerns process-based versus outcome-based fairness. Process-based accounts emphasize equal treatment, transparency, and the right to explanation; outcome-based accounts measure disparities in results. The two can come apart: a process that treats individuals identically can still produce uneven outcomes if relevant features are unequally distributed. The distinction between disparate treatment and disparate impact is closely related. US anti-discrimination law generally permits disparate impact analysis but treats explicit use of protected attributes with greater suspicion. Mitigation techniques that adjust thresholds by group raise questions under the disparate treatment doctrine, especially after Students for Fair Admissions v. Harvard (2023).
Trade-offs with accuracy are contested: some papers argue fairness and accuracy can improve together when unfairness stems from poor data quality, while others find a clearer trade-off when criteria are imposed strictly across multiple groups. Fairness versus privacy is a structural tension: measuring fairness across protected attributes typically requires collecting them, which conflicts with privacy norms and laws limiting data collection. Methods using proxy sensitive attributes raise their own concerns. Andrus, Spitzer, Brown, and Xiang (2021) document the challenges firms face when they cannot or will not collect race and gender data [42].
Intersectionality is a sustained concern. Many analyses report metrics by single attributes and miss patterns affecting specific subgroups; Gender Shades centered intersectionality by reporting error rates separately for darker-skinned women, lighter-skinned women, darker-skinned men, and lighter-skinned men [9]. Long-term effects of fairness interventions are a newer area. Liu, Dean, Rolf, Simchowitz, and Hardt (2018) showed that criteria optimized at one point in time can worsen disparities for the disadvantaged group when decisions affect future qualifications [43]. Mouzannar, Ohannessian, and Srebro (2019) reported similar dynamics in education and lending [44]. Hu and Kohler-Hausmann argue that race in particular cannot be cleanly separated from other features for causal counterfactuals [45].
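The way single-attribute reporting can hide a subgroup pattern is easy to show with a small computation. The sketch below, using a hypothetical helper and made-up data (not the Gender Shades results), computes error rates for each attribute alone and for their intersection.

```python
from collections import defaultdict

def error_rate_by_subgroup(y_true, y_pred, *attributes):
    """Error rate for every observed combination of the given attribute columns.
    Keys are tuples with one entry per attribute column."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for y, y_hat, key in zip(y_true, y_pred, zip(*attributes)):
        totals[key] += 1
        errors[key] += int(y != y_hat)
    return {key: errors[key] / totals[key] for key in totals}

# Made-up toy data: 16 individuals, 4 in each skin-type and gender combination.
skin = ["darker"] * 8 + ["lighter"] * 8
gender = (["female"] * 4 + ["male"] * 4) * 2
y_true = [1] * 16
y_pred = [0, 0, 1, 1] + [1] * 12  # all errors fall on darker-skinned women

by_skin = error_rate_by_subgroup(y_true, y_pred, skin)
by_gender = error_rate_by_subgroup(y_true, y_pred, gender)
by_both = error_rate_by_subgroup(y_true, y_pred, skin, gender)
```

In this toy data each single-attribute view reports an error rate of at most 0.25, while the intersectional view shows an error rate of 0.5 for one subgroup and 0.0 for the other three.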
Researchers who have shaped the field include Cynthia Dwork, Moritz Hardt, Solon Barocas, Arvind Narayanan, Toniann Pitassi, Omer Reingold, Richard Zemel, Joy Buolamwini, Timnit Gebru, Margaret Mitchell, Inioluwa Deborah Raji, Sandra Wachter, Chris Russell, Alexandra Chouldechova, Jon Kleinberg, Sendhil Mullainathan, Manish Raghavan, Sorelle Friedler, Suresh Venkatasubramanian, danah boyd, Andrew Selbst, Alex Hanna, Hanna Wallach, Aaron Roth, and Lily Hu.
The principal venue is the ACM Conference on Fairness, Accountability, and Transparency (FAccT), founded in 2018 as FAT*. The AAAI/ACM Conference on AI, Ethics, and Society (AIES) covers similar ground, and NeurIPS, ICML, and ICLR also publish fairness research. Research and advocacy centers include the Algorithmic Justice League founded by Buolamwini, the Distributed AI Research Institute (DAIR) founded by Gebru in 2021, the AI Now Institute at NYU, Stanford HAI, and the Center for Human-Compatible AI at Berkeley. The textbook Fairness and Machine Learning by Barocas, Hardt, and Narayanan, free at fairmlbook.org since 2018 and in print from MIT Press in 2023, is a standard reference [46].
From roughly 2022 onward, fairness research has expanded to cover foundation models and generative systems alongside earlier work on tabular and image classification.
Large language model fairness is now a significant subfield. The BBQ benchmark (Bias Benchmark for Question Answering), released by Parrish et al. in 2022, contains over 58,000 question-answer pairs covering age, disability, gender identity, nationality, physical appearance, race, religion, socioeconomic status, and sexual orientation [47]. The BOLD benchmark (Bias in Open-Ended Language Generation Dataset), released by Dhamala and colleagues at Amazon in 2021, contains roughly 23,000 prompts evaluating bias across profession, gender, race, religion, and political ideology [48]. StereoSet, CrowS-Pairs, RealToxicityPrompts, and HELM cover related ground.
Text-to-image models have been audited for racial and gender skew. Bianchi et al. (2023) showed that Stable Diffusion amplified demographic stereotypes for prompts about professions and adjectives [49]. Bloomberg Graphics published a 2023 visual analysis reaching similar conclusions across DALL-E, Midjourney, and Stable Diffusion [50]. Algorithmic auditing has matured as a practice: Raji et al. (2020) proposed a framework for end-to-end internal audits, and external audits are commissioned under regimes such as NYC Local Law 144 [51]. Algorithmic fairness is not solely a technical problem and cannot be resolved by technical means alone.
See also: AI ethics, algorithmic bias, COMPAS recidivism, Gender Shades, equalized odds, demographic parity, calibration within groups, counterfactual fairness, individual fairness, AIF360, Fairlearn, EU AI Act, GDPR, NIST AI RMF, NYC Local Law 144, FAccT, AIES, DAIR Institute, AI Now Institute, BBQ benchmark, BOLD benchmark.