Machine learning terms/Fairness
Last reviewed
May 9, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 3,963 words
See also: Machine learning terms
Machine learning fairness is the subfield of machine learning and artificial intelligence ethics concerned with detecting, measuring, and mitigating unjust outcomes that ML systems produce for individuals or groups defined by protected attributes such as race, sex, age, disability, religion, or national origin. The field draws on statistics, computer science, law, and moral philosophy, and has grown rapidly since the early 2010s as automated decision systems have moved into hiring, lending, criminal justice, healthcare, education, advertising, and content moderation.
Fairness matters for several reasons. Models trained on historical data can reproduce or amplify human discrimination encoded in that data, disadvantaging already marginalized groups. Because the same model is applied at scale, a small disparity in error rate can translate into thousands of harmful decisions. Fairness is also a legal requirement in many jurisdictions: statutes such as the U.S. Civil Rights Act of 1964 (Title VII), the Fair Housing Act, the Equal Credit Opportunity Act, and the U.K. Equality Act 2010 apply to automated decisions just as to human ones.
The technical literature distinguishes three related notions. Bias in the statistical sense is systematic error of a model relative to ground truth. Bias in the social sense, sometimes called bias in the ethics or fairness sense, is a systematic difference in outcomes across demographic groups. Discrimination in the legal sense is differential treatment of, or differential impact on, members of a protected class. A system can be statistically unbiased yet socially biased, or socially fair yet legally non-compliant, so practitioners must be precise about which definition is in play.
Machine learning bias arises at every stage of the development and deployment pipeline. The list below groups common sources by where they enter the system.
Data bias is bias that originates in the dataset used to train a model. The training set may not represent the population the model will be deployed on, the labels may be unreliable, or the features may measure different constructs across groups.
| Type | Description | Example |
|---|---|---|
| sampling bias | The training sample is not drawn uniformly at random from the deployment population. | A face recognition system trained mostly on images of light-skinned men. |
| coverage bias | Some segments of the population are absent from the data entirely. | A speech model with no training data from non-native speakers of a language. |
| non-response bias | Selected individuals decline to participate at different rates across groups. | A patient survey where rural respondents reply less often than urban ones. |
| participation bias | Self-selection into the dataset correlates with the outcome. | An online course model trained only on users who voluntarily completed an opt-in survey. |
| selection bias | Umbrella term for any non-random inclusion of examples in training data. | Loan default predictions trained only on applicants who were previously approved. |
| label bias | The labels in the training data reflect historical human decisions that were themselves biased. | A hiring model trained on past hires made by managers who preferred male candidates. |
| measurement bias | Features measure different things, or measure with different accuracy, across groups. | Arrest records used as a proxy for criminal behavior, when arrest rates vary by neighborhood policing intensity. |
| reporting bias | Frequency of events in data does not match real-world frequency. | Negative product reviews are over-represented online relative to positive experiences. |
Algorithmic bias is bias introduced by the choice of model family, loss function, feature engineering, or optimization procedure, even when the data itself is balanced. Examples include models that minimize average loss and therefore perform worse on minority subgroups, regularizers that disproportionately shrink coefficients on rare features, and feature pipelines that drop sparse signals important to a smaller subgroup.
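The point about average loss hiding subgroup performance can be illustrated with a small sketch. The data, group names, and the helper `error_rate_by_group` below are hypothetical and chosen only for illustration; the sketch shows how a low overall error rate can coexist with a much higher error rate on a small subgroup.

```python
from collections import defaultdict

def error_rate_by_group(y_true, y_pred, groups):
    """Return (overall_error, {group: error}) for parallel lists."""
    wrong = defaultdict(int)
    total = defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        total[g] += 1
        wrong[g] += int(yt != yp)
    overall = sum(wrong.values()) / sum(total.values())
    per_group = {g: wrong[g] / total[g] for g in total}
    return overall, per_group

# Hypothetical toy data: 90 majority examples and 10 minority examples.
groups = ["majority"] * 90 + ["minority"] * 10
y_true = [1] * 100
# The model is wrong on 3 majority examples and 4 minority examples.
y_pred = [0] * 3 + [1] * 87 + [0] * 4 + [1] * 6

overall, per_group = error_rate_by_group(y_true, y_pred, groups)
print(round(overall, 3))                # 0.07
print(round(per_group["majority"], 3))  # 0.033
print(round(per_group["minority"], 3))  # 0.4
```

An objective that only tracks the overall figure would report 7% error here, while the minority group experiences 40% error.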
Deployment bias appears after the model is in production. A model used differently from how it was designed, used in a different population than its training set, or used in a feedback loop where its own predictions influence future training data can produce unfair outcomes even if it was fair on the original test set. Automation bias is the human tendency to trust machine outputs even when they are wrong, which can compound model bias by removing the human check.
Human cognitive biases shape labeling, model design, and evaluation. Recognized examples include confirmation bias, implicit bias, in-group bias, out-group homogeneity bias, group attribution bias, and experimenter's bias. These biases can be embedded into a system through choices about which features to collect, which labels to apply, and how to interpret model errors.
A handful of high-profile audits drove fairness research from a niche academic concern into a mainstream engineering priority. Each case study illustrates how a different stage of the pipeline can produce disparate outcomes.
| Case | Year | System | Finding |
|---|---|---|---|
| COMPAS recidivism risk | 2016 | Pretrial risk assessment used by U.S. courts | ProPublica reported that Black defendants were nearly twice as likely as white defendants to be incorrectly labeled high risk, while white defendants were more likely to be incorrectly labeled low risk. |
| Amazon resume screening | 2018 | Internal hiring tool | Reuters reported the system penalized resumes containing the word "women's" (as in "women's chess club captain") because it was trained on a decade of resumes dominated by men. Amazon scrapped the tool in 2017. |
| Optum/UnitedHealth healthcare risk | 2019 | Algorithm used to identify patients for high-risk care management programs | Obermeyer, Powers, Vogeli, and Mullainathan, in Science, showed the algorithm used healthcare spending as a proxy for health needs, which under-identified Black patients because less money is spent on Black patients with the same level of need. The researchers estimated that only 17.7% of flagged patients were Black, whereas removing the bias would have raised that share to 46.5%. |
| Gender Shades | 2018 | Commercial face analysis APIs | Buolamwini and Gebru tested IBM, Microsoft, and Face++ APIs and found error rates up to 34.7% for darker-skinned women versus 0.8% for lighter-skinned men. |
| Apple Card credit limits | 2019 | Goldman Sachs credit decisioning for Apple Card | David Heinemeier Hansson reported a credit limit twenty times higher than his wife's despite joint taxes. The New York Department of Financial Services closed the case in 2021 without finding fair lending violations, but the episode brought scrutiny to opaque underwriting models. |
| Twitter image cropping | 2020 | Saliency-based auto-crop | Users showed the algorithm preferentially cropped to lighter-skinned faces. Twitter removed the auto-crop. |
| Dutch childcare benefits scandal | 2013 to 2019 | Tax authority risk-classification algorithm | A risk-scoring system falsely accused tens of thousands of families, disproportionately non-Dutch nationals, of benefits fraud. The scandal contributed to the resignation of the Rutte III cabinet in January 2021. |
These cases share a common pattern: a model deployed at scale, a sensitive demographic characteristic correlated with the prediction target through a flawed proxy, and a downstream harm that disproportionately affected one group.
Fairness researchers have proposed dozens of formal criteria. The most widely cited are listed below. Notation: let A be a sensitive attribute (such as race), Y the true outcome, and Ŷ the model's prediction. P denotes probability.
| Criterion | Definition | Intuition |
|---|---|---|
| demographic parity | P(Ŷ = 1 \| A = a) equal for all groups a. Also called statistical parity or independence. | Positive prediction rate is the same across groups. |
| disparate impact | Ratio of positive prediction rates between the least-favored and most-favored groups stays at or above a threshold, often 0.8 (the four-fifths rule used by the U.S. EEOC). | A version of demographic parity tied to U.S. employment law. |
| disparate treatment | Absence of disparate treatment means the model's decisions do not depend directly on a sensitive attribute. | Removing the attribute from the inputs is sometimes called fairness through unawareness. It is insufficient on its own because proxy features can encode the protected attribute. |
| equalized odds | True positive and false positive rates equal across groups. | Equally accurate at identifying members and avoiding false alarms in each group. |
| equality of opportunity | True positive rate equal across groups (a relaxation of equalized odds). | Among those deserving a positive outcome, the model identifies them at equal rates. |
| predictive parity | Positive predictive value (precision) equal across groups: P(Y = 1 \| Ŷ = 1, A = a) constant in a. | When the model says yes, it is right at the same rate for each group. |
| predictive rate parity | Both positive and negative predictive values equal across groups. | Stronger version of predictive parity. |
| calibration | For each score s, P(Y = 1 \| score = s, A = a) is the same across groups. | A score of 0.7 means the same thing for every group. |
| individual fairness | Similar individuals receive similar predictions, similarity measured by a task-specific metric. Formalized by Dwork et al. (2012). | A person should not be treated worse than a near-identical person in another group. |
| counterfactual fairness | Prediction would be the same if the individual belonged to a different group, holding non-descendant variables fixed. Proposed by Kusner et al. (2017). | Causal version of individual fairness. |
| fairness constraint | A formal constraint added to the optimization so the trained model satisfies a chosen criterion. | Operationalizes the criterion at training time. |
| fairness metric | Scalar measurement of distance from a chosen criterion. | Used to compare models or monitor production. |
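Several of the group criteria in the table reduce to comparing a few per-group rates. The following is a minimal sketch, assuming binary labels and predictions held in plain Python lists; the helper names (`group_metrics`, `gap`, `disparate_impact_ratio`) and the toy data are hypothetical and do not come from any particular fairness library.

```python
def rate(values):
    """Mean of a list of 0/1 values, or None if the list is empty."""
    return sum(values) / len(values) if values else None

def group_metrics(y_true, y_pred, groups):
    """Per-group selection rate, TPR, FPR and PPV for binary labels."""
    out = {}
    for g in sorted(set(groups)):
        rows = [(yt, yp) for yt, yp, gg in zip(y_true, y_pred, groups) if gg == g]
        out[g] = {
            "selection_rate": rate([yp for _, yp in rows]),       # demographic parity
            "tpr": rate([yp for yt, yp in rows if yt == 1]),      # equal opportunity
            "fpr": rate([yp for yt, yp in rows if yt == 0]),      # with TPR: equalized odds
            "ppv": rate([yt for yt, yp in rows if yp == 1]),      # predictive parity
        }
    return out

def gap(metrics, key):
    """Largest difference in one rate across groups (assumes no empty cells)."""
    vals = [m[key] for m in metrics.values()]
    return max(vals) - min(vals)

def disparate_impact_ratio(metrics):
    """Lowest selection rate divided by the highest."""
    rates = [m["selection_rate"] for m in metrics.values()]
    return min(rates) / max(rates)

# Hypothetical toy data: two groups of ten people with equal base rates.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0] * 2
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0] + [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
groups = ["a"] * 10 + ["b"] * 10

metrics = group_metrics(y_true, y_pred, groups)
print(round(gap(metrics, "selection_rate"), 3))    # 0.2
print(round(disparate_impact_ratio(metrics), 3))   # 0.6
print(round(gap(metrics, "tpr"), 3))               # 0.4
print(round(gap(metrics, "fpr"), 3))               # 0.0
```

In this toy example the ratio of 0.6 falls below the 0.8 threshold of the four-fifths rule, and the groups match on false positive rate but not on true positive rate, so equalized odds is not satisfied.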
A central result of the field is that the most popular fairness criteria are mutually incompatible except in trivial cases. Three formulations of this result appeared close together.
This cluster of results is sometimes called the incompatibility of fairness metrics or simply the impossibility theorem. It does not mean fairness is unattainable; it means choosing a definition is a value judgment that cannot be deferred to mathematics. Practitioners must decide which kind of fairness is appropriate for the application at hand.
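One way to see the tension, at least for binary classifiers, is a confusion-matrix identity that ties the false positive rate to the base rate p, the positive predictive value, and the true positive rate: FPR = p / (1 - p) × (1 - PPV) / PPV × TPR. The sketch below uses hypothetical numbers and a hypothetical helper name, `implied_fpr`, to show that two groups with equal PPV and equal TPR but different base rates are forced to have different false positive rates.

```python
def implied_fpr(base_rate, ppv, tpr):
    """False positive rate forced by the base rate, PPV and TPR.

    Uses the identity FPR = p / (1 - p) * (1 - PPV) / PPV * TPR,
    which follows from the definitions of the confusion-matrix rates.
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr

# Hypothetical groups: same PPV and TPR, different base rates.
fpr_a = implied_fpr(base_rate=0.5, ppv=0.8, tpr=0.6)
fpr_b = implied_fpr(base_rate=0.2, ppv=0.8, tpr=0.6)
print(round(fpr_a, 4), round(fpr_b, 4))  # 0.15 0.0375
```

With predictive parity and equal true positive rates held fixed, the false positive rates differ by a factor of four here, so equalized odds cannot hold at the same time unless the base rates are equal or the classifier is perfect.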
Mitigation techniques fall into three families based on where in the pipeline they intervene. Most production systems combine techniques from more than one family.
Pre-processing methods modify the training data before model fitting. Examples:
The term preprocessing covers these techniques. They are model-agnostic but can degrade overall accuracy.
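As an illustration of the pre-processing idea, the sketch below implements reweighing, a commonly described technique that the text above does not name explicitly and that is used here only as an example. Each training example receives the weight P(A = a) × P(Y = y) / P(A = a, Y = y), so that the sensitive attribute and the label become statistically independent in the weighted data. The function names and toy data are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each example by P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    n = len(labels)
    count_a = Counter(groups)
    count_y = Counter(labels)
    count_ay = Counter(zip(groups, labels))
    return [
        (count_a[a] / n) * (count_y[y] / n) / (count_ay[(a, y)] / n)
        for a, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den

# Hypothetical data: group "a" has 3 of 4 positive labels, group "b" has 1 of 4.
groups = ["a"] * 4 + ["b"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)

print(round(weighted_positive_rate(groups, labels, weights, "a"), 3))  # 0.5
print(round(weighted_positive_rate(groups, labels, weights, "b"), 3))  # 0.5
```

The resulting weights would typically be passed to a learner that accepts per-example weights; the model itself is left unchanged, which is what makes this family model-agnostic.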
In-processing methods change the training procedure, typically by adding fairness terms to the loss function or imposing constraints on the optimization.
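A fairness term in the loss can be as simple as a penalty on the difference between groups. The sketch below is one possible formulation, not a standard or library-defined objective: it adds `lam` times the squared gap in mean predicted score between two groups (a soft demographic parity penalty) to the mean log loss. The function name, the `lam` parameter, and the data are assumptions made for illustration.

```python
import math

def penalized_loss(y_true, scores, groups, lam=1.0):
    """Mean log loss plus lam * (gap in mean score between two groups) ** 2."""
    eps = 1e-12  # guards against log(0)
    log_loss = -sum(
        y * math.log(max(s, eps)) + (1 - y) * math.log(max(1 - s, eps))
        for y, s in zip(y_true, scores)
    ) / len(y_true)

    names = sorted(set(groups))
    if len(names) != 2:
        raise ValueError("this sketch assumes exactly two groups")
    means = [
        sum(s for s, g in zip(scores, groups) if g == name) / groups.count(name)
        for name in names
    ]
    return log_loss + lam * (means[0] - means[1]) ** 2

y_true = [1, 0, 1, 0]
groups = ["a", "a", "b", "b"]

balanced = [0.8, 0.2, 0.6, 0.4]   # mean score 0.5 in both groups
skewed = [0.9, 0.7, 0.3, 0.1]     # mean score 0.8 versus 0.2

# The penalty vanishes when mean scores match and grows with the gap.
print(penalized_loss(y_true, balanced, groups, lam=2.0)
      - penalized_loss(y_true, balanced, groups, lam=0.0))
print(round(penalized_loss(y_true, skewed, groups, lam=2.0)
            - penalized_loss(y_true, skewed, groups, lam=0.0), 4))  # 0.72
```

During training, an optimizer would minimize this combined quantity, with `lam` controlling how much accuracy is traded for a smaller gap between groups.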
Post-processing methods adjust the outputs of an already-trained model. They are useful when retraining is expensive or the model is supplied by a third party. The umbrella term is post-processing.
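A simple instance of post-processing is choosing a separate decision threshold for each group so that true positive rates line up. The sketch below is a minimal illustration under stated assumptions: scores and labels are available per group on a validation set, the target rate of 0.75 is arbitrary, and the helper names are hypothetical. It equalizes true positive rates only (equality of opportunity), not false positive rates, and it is not the algorithm of any specific library.

```python
import math

def threshold_for_tpr(scores, y_true, target_tpr):
    """Highest threshold whose true positive rate is at least target_tpr."""
    positives = sorted((s for s, y in zip(scores, y_true) if y == 1), reverse=True)
    if not positives:
        raise ValueError("group has no positive examples")
    # Number of positives that must score at or above the threshold.
    k = max(1, math.ceil(target_tpr * len(positives)))
    return positives[k - 1]

def tpr_at(scores, y_true, threshold):
    """Share of true positives predicted positive (score >= threshold)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    return sum(s >= threshold for s in pos) / len(pos)

# Hypothetical validation data: (scores, labels) for each group.
data = {
    "a": ([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1], [1, 1, 1, 1, 0, 0, 0, 0]),
    "b": ([0.6, 0.5, 0.4, 0.2, 0.3, 0.25, 0.1, 0.05], [1, 1, 1, 1, 0, 0, 0, 0]),
}

shared = {g: tpr_at(s, y, 0.5) for g, (s, y) in data.items()}
thresholds = {g: threshold_for_tpr(s, y, 0.75) for g, (s, y) in data.items()}
adjusted = {g: tpr_at(s, y, thresholds[g]) for g, (s, y) in data.items()}

print(shared)      # {'a': 0.75, 'b': 0.5}
print(thresholds)  # {'a': 0.7, 'b': 0.4}
print(adjusted)    # {'a': 0.75, 'b': 0.75}
```

Because only the thresholds change, the underlying model does not need to be retrained, which is why this approach suits third-party or expensive-to-train models.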
A growing ecosystem of open-source libraries supports fairness measurement and mitigation. The most widely used are listed below.
| Tool | Maintainer | Notes |
|---|---|---|
| AI Fairness 360 (AIF360) | IBM Research | Released 2018. Over 70 metrics and 11 mitigation algorithms. Python and R. |
| Fairlearn | Microsoft | Released 2018. Reductions and post-processing algorithms, plus an assessment dashboard. |
| What-If Tool | Google | Released 2018 as a TensorBoard plugin. Interactive code-free model inspection across data slices. |
| Aequitas | University of Chicago | Released 2018. Audit-oriented toolkit producing fairness reports for policymakers and analysts. |
| LinkedIn Fairness Toolkit (LiFT) | LinkedIn | Released 2020. Spark-based library used in LinkedIn's own ranking systems. |
| Themis-ML | Niels Bantilan | Open-source Python library predating AIF360, for auditing and mitigation. |
| Captum / SHAP | Meta / independent | Interpretability tools used extensively in fairness audits. |
Larger MLOps platforms (Amazon SageMaker Clarify, Google Vertex AI Model Monitoring, Azure Responsible AI dashboard) bundle subsets of these capabilities with the rest of the model lifecycle.
Fairness is increasingly a matter of statute, not only voluntary best practice. The most consequential current rules are summarized below.
| Jurisdiction | Instrument | Status | Scope |
|---|---|---|---|
| European Union | EU AI Act (Regulation 2024/1689) | Adopted June 2024, entry into force 1 August 2024, phased application through 2026 | Classifies AI systems by risk. High-risk systems, including those used in employment, education, essential services, law enforcement, and migration, must meet data-quality, documentation, transparency, human oversight, and post-market monitoring obligations. Bias-monitoring is explicitly required. |
| United States (federal) | NIST AI Risk Management Framework (AI RMF 1.0) | Published January 2023, voluntary | Defines four functions: govern, map, measure, manage. The associated AI RMF Playbook and the Generative AI Profile (NIST AI 600-1, July 2024) provide concrete fairness guidance. |
| United States (federal) | EEOC technical assistance on AI in hiring | Issued 2022 to 2023 | Confirms that the four-fifths rule and Title VII liability apply to algorithmic hiring tools. |
| New York City | Local Law 144 (Automated Employment Decision Tools) | Effective 1 January 2023, enforcement from 5 July 2023 | Employers using automated hiring tools on NYC residents must commission an independent annual bias audit and publish a summary. |
| Illinois | Artificial Intelligence Video Interview Act and HB 3773 (amending the Illinois Human Rights Act) | 2020 (video) and effective 1 January 2026 (HB 3773) | Notice and consent for video interviews; HB 3773 expands employer liability for discriminatory effects of AI in employment decisions. |
| Colorado | Colorado AI Act (SB 24-205) | Signed May 2024; effective date postponed from 1 February 2026 to 30 June 2026 | First comprehensive U.S. state AI law. Requires developers and deployers of high-risk AI systems to use reasonable care to protect consumers from algorithmic discrimination, complete impact assessments, and notify consumers of adverse decisions. |
| California | Civil Rights Department regulations on automated decision systems | Adopted 21 March 2025, effective 1 October 2025 | Clarifies that California's Fair Employment and Housing Act applies to algorithmic decision tools used in employment. |
| United Kingdom | Equality Act 2010 plus AI white paper guidance | Existing law plus 2023 white paper | Principles-based regime; sectoral regulators apply existing equality law to AI. |
| Canada | Artificial Intelligence and Data Act (AIDA), part of Bill C-27 | Not enacted; Bill C-27 lapsed when Parliament was prorogued in January 2025 | Would have required risk assessments and bias mitigation for high-impact AI systems. |
| China | Algorithmic Recommendations Provisions and Generative AI Measures | Effective 2022 and 2023 respectively | Require that algorithms not discriminate based on user attributes. |
In addition, U.S. financial regulators (CFPB, OCC, Federal Reserve) have issued guidance reaffirming that the Equal Credit Opportunity Act and Regulation B apply to algorithmic underwriting and that creditors must provide specific reasons for adverse actions, including those produced by complex models.
A short, non-exhaustive list of researchers whose work has shaped the field:
| Researcher | Affiliations | Key contributions |
|---|---|---|
| Cynthia Dwork | Harvard, Microsoft Research | Co-author of the 2012 paper "Fairness through awareness," introducing individual fairness. Foundational figure in differential privacy. |
| Moritz Hardt | Max Planck Institute for Intelligent Systems | Co-author of "Equality of Opportunity in Supervised Learning" (2016), formalizing equalized odds; co-author of Fairness and Machine Learning. |
| Solon Barocas | Microsoft Research and Cornell | Co-author of Fairness and Machine Learning; "Big Data's Disparate Impact" (2016) with Andrew Selbst. |
| Timnit Gebru | DAIR, formerly Google | Co-author of Gender Shades (2018), "Datasheets for Datasets" (2018), and "Stochastic Parrots" (2021) on large language models. Founded DAIR in 2021. |
| Joy Buolamwini | MIT Media Lab, Algorithmic Justice League | Co-author of Gender Shades; founder of the Algorithmic Justice League; author of Unmasking AI (2023). |
| Arvind Narayanan | Princeton | Co-author of Fairness and Machine Learning; co-author of AI Snake Oil (2024). |
| Suresh Venkatasubramanian | Brown University, formerly White House OSTP | Co-architect of the U.S. AI Bill of Rights Blueprint (2022); co-author of disparate impact remover (2015). |
| Sorelle Friedler | Haverford College | Co-author of disparate impact remover; long-running editor of FAccT. |
| Aaron Roth | University of Pennsylvania | Co-author of The Ethical Algorithm (2019); work on multicalibration (2018). |
| Sendhil Mullainathan | University of Chicago Booth | Co-author of the 2019 Science healthcare-algorithm study and the 2016 impossibility result. |
| Ziad Obermeyer | UC Berkeley School of Public Health | Lead author of the 2019 Science healthcare-algorithm study. |
| Alexandra Chouldechova | Microsoft Research, formerly Carnegie Mellon | Author of the 2017 impossibility result for binary classifiers. |
| Jon Kleinberg | Cornell | Co-author of the 2016 inherent trade-offs paper. |
| Margaret Mitchell | Hugging Face, formerly Google | Co-author of "Model Cards for Model Reporting" (2019) and "Stochastic Parrots." |
| Ruha Benjamin | Princeton | Author of Race After Technology (2019), framing algorithmic discrimination as part of the "New Jim Code." |
| Safiya Umoja Noble | UCLA | Author of Algorithms of Oppression (2018), a book-length critique of search-engine bias. |
The annual ACM Conference on Fairness, Accountability, and Transparency (FAccT, formerly FAT* and FAT/ML) launched in 2018 and serves as the field's primary publication venue.
A typical fairness workflow combines several activities:
The following pages on this wiki cover individual fairness concepts in detail.