Algorithmic fairness

Updated Jun 21, 2026

Algorithmic fairness is the study of how automated decision systems can be made to produce decisions that are equitable across protected attributes such as race, gender, age, religion, and disability. Also called fairness in machine learning, ML fairness, or AI fairness, it is a subfield of AI ethics that combines computer science, law, philosophy, statistics, and the social sciences. Researchers develop mathematical definitions of fairness, audit deployed systems for algorithmic bias, design mitigation techniques, and analyze the social and legal context in which decisions operate ^[1]^[2].

The field's defining insight is that fairness cannot be reduced to a single number: a foundational 2016 result proved that several intuitive fairness criteria are mathematically incompatible whenever base rates differ across groups, so every deployed system embeds a value-laden choice about which errors are acceptable for which people ^[22]^[18]. The field grew rapidly after mid-2010s audits exposed disparities in tools used for criminal sentencing, hiring, lending, healthcare, and face analysis. In the 2016 ProPublica investigation of the COMPAS recidivism tool, the false-positive rate for Black defendants in Broward County was 44.9 percent versus 23.5 percent for white defendants, while the 2018 Gender Shades audit found commercial gender-classification error rates as high as 34.7 percent for darker-skinned women against 0.8 percent for lighter-skinned men ^[6]^[9]. By the early 2020s, fairness had become a recognized requirement in major regulatory frameworks including the EU AI Act and NYC Local Law 144 ^[3]^[4]. Many central questions, including which fairness definition to apply, whether sensitive attributes should be collected, and how to weigh trade-offs against accuracy, remain contested.

What problems motivated the field?

The legal roots of algorithmic fairness predate machine learning. In the United States, the doctrine of disparate impact was articulated in Griggs v. Duke Power Co. (1971), which held that facially neutral employment practices that disproportionately exclude protected groups can be unlawful unless justified by business necessity ^[5]. Title VII of the Civil Rights Act of 1964, the Equal Credit Opportunity Act of 1974, and the Fair Housing Act of 1968 form the backbone of US anti-discrimination law. In the EU, the Racial Equality Directive (2000/43/EC), the Employment Equality Directive (2000/78/EC), and Article 21 of the Charter of Fundamental Rights cover similar ground.

The COMPAS controversy began in May 2016 when ProPublica published "Machine Bias" by Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. The investigation analyzed risk scores produced by Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), sold by Northpointe (now Equivant) and used by judges in several US states, drawing on scores for more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014. ProPublica reported that Black defendants who did not reoffend were almost twice as likely as white non-reoffenders to be misclassified as high risk (a false-positive rate of 44.9 percent versus 23.5 percent), while white reoffenders were more often misclassified as low risk ^[6]. Northpointe responded that COMPAS was calibrated within groups, meaning that a given score corresponded to the same probability of reoffense for Black and white defendants, and that the tool correctly predicted recidivism about 61 percent of the time for both groups ^[7]. The dispute was later shown to reflect a mathematical impossibility rather than an error on either side.

The Amazon recruiting tool was an internal experiment to score job applicants. Reuters reported in October 2018 that Amazon scrapped the tool around 2017 after engineers discovered it penalized resumes containing the word "women's" and downgraded graduates of two all-women's colleges. The model had been trained on a decade of resumes submitted to Amazon, predominantly from men ^[8].

Gender Shades, a 2018 audit by Joy Buolamwini and Timnit Gebru presented at the FAT* conference (later renamed FAccT), evaluated commercial gender classification systems from IBM, Microsoft, and Face++ on a curated set of 1,270 face images. Error rates were 0.8 percent or lower for lighter-skinned men but as high as 34.7 percent for darker-skinned women in one system ^[9]. The study became a touchstone for intersectional analysis, and IBM exited the face recognition business in 2020.

In October 2019, Obermeyer, Powers, Vogeli, and Mullainathan published a study in Science showing that, under a widely used commercial healthcare risk algorithm, Black patients were considerably sicker than white patients assigned the same risk score. The system used past healthcare costs as a proxy for need; because Black patients historically received less care, costs were a biased proxy. Correcting the bias would more than double the proportion of Black patients identified for high-risk care management, from 17.7 percent to 46.5 percent ^[10].

The Apple Card credit limit dispute began in November 2019 when entrepreneur David Heinemeier Hansson posted that he had received a credit limit roughly 20 times higher than his wife's despite shared finances. The New York State Department of Financial Services investigated Goldman Sachs and concluded in March 2021 that no violation of New York fair lending law had been substantiated, while noting the case raised questions about explainability ^[11].

The Twitter image cropping algorithm came under scrutiny in 2020 when users observed that the saliency model appeared to favor lighter-skinned and female faces; Twitter audited the model internally and changed the product to display full images on mobile timelines in 2021 ^[12].

The NIST Face Recognition Vendor Test demographic effects study (December 2019) evaluated 189 algorithms from 99 developers and reported that many systems showed higher false match rates for African and East Asian faces than for Eastern European faces ^[13].

Where does algorithmic bias come from?

Suresh and Guttag (2021) and Mehrabi et al. (2021) provide widely cited taxonomies of the sources of bias ^[14]^[2].

Historical bias arises when training data accurately reflects past discrimination. A model predicting who has historically been promoted mirrors past managerial preferences, including unlawful patterns. Representation bias occurs when the training population underrepresents some groups; ImageNet and IJB-A were criticized for skewing toward lighter-skinned subjects, and the Pilot Parliaments Benchmark used in Gender Shades was constructed to balance representation ^[9]. Measurement bias appears when features or labels are differential proxies for the construct of interest. Using arrest rates as a proxy for crime encodes disparities in policing; using healthcare cost as a proxy for need encodes disparities in access, as in Obermeyer et al. ^[10]. Aggregation bias results from fitting one model to a heterogeneous population. Evaluation bias occurs when benchmarks do not represent the deployment population; deployment bias occurs when a model is used in a context for which it was not designed. Feedback loops arise when predictions influence future training data. Lum and Isaac (2016) showed that predictive policing tools directing patrols to historically over-policed neighborhoods can generate more arrests there, reinforcing the pattern when the model is retrained ^[15].
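The feedback-loop mechanism can be illustrated with a deliberately simplified toy model. This is not the simulation used by Lum and Isaac; it is a sketch that assumes two hypothetical districts with identical true offense rates, a greedy rule that sends every patrol to the district with the most recorded arrests, and arrests that are only recorded where patrols go. The function name and all numbers are invented for illustration.

```python
def simulate_patrols(recorded, true_rate, rounds, patrols_per_round=10):
    """Greedy toy model: all patrols go to the district with the most recorded arrests."""
    recorded = list(recorded)  # copy so the caller's list is not mutated
    for _ in range(rounds):
        target = recorded.index(max(recorded))
        # Arrests are only observed where officers are sent, so only the
        # targeted district's record grows, even though true rates are equal.
        recorded[target] += patrols_per_round * true_rate[target]
    return recorded

# Hypothetical starting point: district 0 has slightly more historical arrests.
history = simulate_patrols(recorded=[12, 10], true_rate=[0.5, 0.5], rounds=5)
print(history)  # [37.0, 10]
```

Under these assumptions a small initial gap in the records grows every round, although the underlying rates never differ. Real deployments allocate patrols less crudely, so the toy shows the direction of the effect, not its size.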

How is fairness defined mathematically?

Using notation A for a sensitive attribute, Y for the true outcome, hat-Y for a binary prediction, and S for a continuous score, researchers have proposed many statistical fairness criteria.

Criterion	Statistical statement	Intuition	Key reference
Demographic parity (statistical parity)	P(hat-Y=1 | A=a) = P(hat-Y=1 | A=b)	Each group receives positive predictions at the same rate	Dwork et al. 2012 ^[1]
Conditional demographic parity	P(hat-Y=1 | A=a, L=l) = P(hat-Y=1 | A=b, L=l)	Equal positive rates after conditioning on legitimate features L	Kamiran et al. 2013 ^[16]
Equalized odds	P(hat-Y=1 | Y=y, A=a) = P(hat-Y=1 | Y=y, A=b) for y in {0,1}	Equal true positive and false positive rates across groups	Hardt, Price, Srebro 2016 ^[17]
Equality of opportunity	P(hat-Y=1 | Y=1, A=a) = P(hat-Y=1 | Y=1, A=b)	Among individuals whose true outcome is positive, equal selection rates across groups	Hardt, Price, Srebro 2016 ^[17]
Predictive parity (calibration within groups)	P(Y=1 | S=s, A=a) = P(Y=1 | S=s, A=b)	Score s means the same probability regardless of group	Chouldechova 2017 ^[18]
Counterfactual fairness	hat-Y(A=a, U) = hat-Y(A=b, U)	Prediction is unchanged in a counterfactual world where the sensitive attribute is altered	Kusner et al. 2017 ^[19]
Individual fairness	If d(x, x') is small, then |hat-Y(x) - hat-Y(x')| is small	Similar individuals receive similar predictions, given a task-specific metric	Dwork et al. 2012 ^[1]
Treatment equality	FN_a / FP_a = FN_b / FP_b	Ratio of false negatives to false positives is equal across groups	Berk et al. 2018 ^[20]

Most of these criteria fall into three broad families. Independence criteria such as demographic parity require the prediction to be statistically independent of the protected attribute. Separation criteria such as equalized odds require independence conditional on the true outcome. Sufficiency criteria such as calibration require independence of the true outcome from the attribute given the score.

Demographic parity is among the oldest formalizations and roughly corresponds to disparate impact. It is sometimes operationalized using the four-fifths rule from US EEOC guidance, which treats a selection rate ratio below 0.8 as evidence of adverse impact. Critics note that demographic parity can require accepting weaker candidates in one group purely to equalize rates.
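As a rough illustration of how demographic parity and the four-fifths heuristic are checked in practice, the sketch below computes per-group selection rates and the ratio of the lowest to the highest rate. The function names and the toy data are hypothetical, and a ratio below 0.8 is treated here only as a flag for further review, not as a legal conclusion.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Return P(hat-Y = 1 | A = g) for each group g."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for y_hat, g in zip(predictions, groups):
        totals[g] += 1
        positives[g] += int(y_hat == 1)
    return {g: positives[g] / totals[g] for g in totals}

def impact_ratio(rates):
    """Lowest selection rate divided by the highest one."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group received a positive prediction")
    return min(rates.values()) / highest

# Hypothetical toy data: 10 applicants per group.
y_hat = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
groups = ["a"] * 10 + ["b"] * 10

rates = selection_rates(y_hat, groups)  # group "a": 6/10, group "b": 3/10
ratio = impact_ratio(rates)             # 0.3 / 0.6
print(rates, ratio, ratio < 0.8)
```

Exact demographic parity corresponds to a ratio of 1.0; the 0.8 threshold tolerates some gap between groups.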

Equalized odds and equality of opportunity, introduced by Hardt, Price, and Srebro in 2016, condition on the true outcome and so do not require equal positive rates when base rates differ ^[17]. Calibration within groups was the criterion satisfied by COMPAS as analyzed by Northpointe and by Corbett-Davies, Pierson, Feller, Goel, and Huq (2017): a given score corresponds to the same risk regardless of group ^[18]^[21]. Counterfactual fairness, proposed by Kusner, Loftus, Russell, and Silva in 2017, formalizes fairness using structural causal models ^[19]. Individual fairness, introduced by Dwork et al. in 2012, requires that similar individuals receive similar predictions; the challenge is choosing the similarity metric, which encodes much of the normative content of the judgment ^[1].
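The difference between equality of opportunity and equalized odds is easiest to see by computing per-group error rates directly. The following is a minimal sketch with hypothetical function names and toy data; it assumes binary labels and that every group contains at least one positive and one negative example.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group true positive rate (TPR) and false positive rate (FPR)."""
    counts = {}
    for y, y_hat, g in zip(y_true, y_pred, groups):
        c = counts.setdefault(g, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if y == 1:
            c["tp" if y_hat == 1 else "fn"] += 1
        else:
            c["fp" if y_hat == 1 else "tn"] += 1
    return {
        g: {"tpr": c["tp"] / (c["tp"] + c["fn"]),
            "fpr": c["fp"] / (c["fp"] + c["tn"])}
        for g, c in counts.items()
    }

def max_gap(rates, key):
    """Largest between-group difference in the chosen rate."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)

# Hypothetical toy data: 4 positives and 4 negatives per group.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y_pred = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 1, 1, 0, 1, 1, 0, 0]
groups = ["a"] * 8 + ["b"] * 8

rates = group_rates(y_true, y_pred, groups)
print(max_gap(rates, "tpr"))  # 0.0: equality of opportunity holds
print(max_gap(rates, "fpr"))  # 0.25: equalized odds does not
```

In this toy case both groups have the same true positive rate, so equality of opportunity is met, while the false positive rates differ, so the stricter equalized odds criterion is not.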

Why can fairness criteria not all be satisfied at once?

A central result is that several intuitive fairness criteria cannot be satisfied simultaneously when base rates differ across groups. This was shown independently by Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan in their 2016 paper "Inherent Trade-Offs in the Fair Determination of Risk Scores" and by Alexandra Chouldechova in her 2017 paper "Fair Prediction with Disparate Impact" ^[22]^[18].

Kleinberg, Mullainathan, and Raghavan formalized three properties: calibration within groups, balance for the positive class (equal average score among Y=1 in each group), and balance for the negative class (equal average score among Y=0 in each group). They proved these conditions cannot all hold unless the predictor is perfect or base rates are equal across groups. Chouldechova showed an analogous result for binary classifiers, stating in her abstract that "the criteria cannot all be simultaneously satisfied when recidivism prevalence differs across groups": predictive parity and equal false positive and false negative rates cannot all hold at once outside the trivial cases ^[18].
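The binary-classifier version of the result can be seen from a short identity linking the quantities involved. The derivation below is a sketch in notation introduced here for illustration: within one group of size N, p is the base rate P(Y=1), PPV is the positive predictive value, and FPR and FNR are the false positive and false negative rates.

```latex
% Within one group of size N with base rate p = P(Y = 1):
%   true positives   TP = (1 - FNR) p N
%   false positives  FP = FPR (1 - p) N
\begin{align}
\mathrm{PPV} &= \frac{TP}{TP + FP}
  \quad\Longrightarrow\quad
  FP = \frac{1 - \mathrm{PPV}}{\mathrm{PPV}}\, TP \\
\mathrm{FPR}\,(1 - p)\,N &= \frac{1 - \mathrm{PPV}}{\mathrm{PPV}}\,(1 - \mathrm{FNR})\, p\, N \\
\mathrm{FPR} &= \frac{p}{1 - p}\cdot\frac{1 - \mathrm{PPV}}{\mathrm{PPV}}\cdot(1 - \mathrm{FNR})
\end{align}
```

If two groups share the same PPV and the same FNR but have different base rates p, the right-hand side takes different values, so their false positive rates must differ unless the classifier is perfect.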

The COMPAS dispute is the canonical illustration. Black and white defendants in Broward County had different observed two-year recidivism rates, so no non-perfect classifier could satisfy both the calibration property emphasized by Northpointe and the equal error-rate property emphasized by ProPublica. Both critiques were mathematically correct; the disagreement was about which criterion mattered more, not about the data. Pleiss et al. (2017) extended the result by showing calibration is generally incompatible with even relaxed equalized-odds constraints ^[23]. Fairness cannot be reduced to a single metric: every deployed system embeds a choice about which errors are acceptable for which groups, and that choice has moral and legal content mathematics alone cannot settle.
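A small numeric sketch makes the conflict concrete. It assumes two groups that share the same positive predictive value and false negative rate but differ in base rate; the specific numbers are hypothetical and are not the Broward County figures. The helper uses the relation FPR = p/(1-p) × (1-PPV)/PPV × (1-FNR), which follows from the definitions of the three rates.

```python
def false_positive_rate(base_rate, ppv, fnr):
    """FPR implied by a group's base rate p, PPV, and FNR.

    Follows from TP = PPV * (TP + FP), TP = (1 - FNR) * p * N,
    and FP = FPR * (1 - p) * N.
    """
    odds = base_rate / (1.0 - base_rate)
    return odds * (1.0 - ppv) / ppv * (1.0 - fnr)

# Hypothetical values shared by both groups.
shared_ppv, shared_fnr = 0.6, 0.35

fpr_high = false_positive_rate(base_rate=0.5, ppv=shared_ppv, fnr=shared_fnr)
fpr_low = false_positive_rate(base_rate=0.4, ppv=shared_ppv, fnr=shared_fnr)
print(round(fpr_high, 3), round(fpr_low, 3))  # 0.433 0.289
```

With predictive parity and equal false negative rates held fixed, the group with the higher base rate necessarily ends up with the higher false positive rate, which is the pattern at the center of the dispute.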

How can unfairness be mitigated?

Methods to reduce unfairness are typically grouped by where in the pipeline they intervene.

Stage	Method	Mechanism	Reference
Pre-processing	Reweighting	Weight examples so group-conditional prevalence matches	Kamiran & Calders 2012 ^[24]
Pre-processing	Disparate impact remover	Modify features so they are not predictive of group	Feldman et al. 2015 ^[25]
Pre-processing	Fair representation learning	Learn embedding obscuring protected attribute while preserving task signal	Zemel et al. 2013 ^[26]
Pre-processing	Optimized preprocessing	Transform features and labels under fairness and utility constraints	Calmon et al. 2017 ^[27]
In-processing	Constrained optimization	Train under explicit fairness constraints	Zafar et al. 2017 ^[28]
In-processing	Adversarial debiasing	Train predictor jointly with adversary recovering sensitive attribute	Zhang et al. 2018 ^[29]
In-processing	Fair regularization	Add regularization penalizing disparity	Kamishima et al. 2012 ^[30]
In-processing	Reductions	Reduce fair classification to cost-sensitive problems	Agarwal et al. 2018 ^[31]
Post-processing	Group-specific thresholds	Choose different decision thresholds per group	Hardt et al. 2016 ^[17]
Post-processing	Reject option	Reassign predictions near boundary in favor of disadvantaged group	Kamiran et al. 2012 ^[32]
Post-processing	Calibrated equalized odds	Trade calibration for equalized odds via randomized post-processing	Pleiss et al. 2017 ^[23]

Pre-processing methods modify training data so downstream learners are less likely to encode unfair patterns; they are model-agnostic but may distort genuinely predictive signal. In-processing methods build fairness into training itself; adversarial debiasing treats fairness as a min-max problem ^[29], and the reductions approach reduces fair classification to a sequence of weighted classification problems with theoretical guarantees ^[31]. Post-processing methods adjust an already trained predictor; they can satisfy specific criteria exactly but typically require access to the sensitive attribute at decision time. Friedler et al. (2019) reported that mitigation effects depend strongly on dataset, base rate, and choice of fairness metric ^[33].
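As one concrete pre-processing example, the reweighting idea can be sketched in a few lines. The sketch assumes the commonly described weighting w(a, y) = P(A=a) P(Y=y) / P(A=a, Y=y), which makes the label statistically independent of the group in the weighted data; the function names and toy data are hypothetical, and a production implementation such as the one in a fairness toolkit may differ in detail.

```python
from collections import Counter

def reweighing_weights(labels, groups):
    """Weight w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y) for every example."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(labels, groups, weights, group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for y, g, w in zip(labels, groups, weights) if g == group and y == 1
    )
    return positive / total

# Hypothetical toy data: group "a" has 6/10 positives, group "b" has 2/10.
labels = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8
groups = ["a"] * 10 + ["b"] * 10

weights = reweighing_weights(labels, groups)
rate_a = weighted_positive_rate(labels, groups, weights, "a")
rate_b = weighted_positive_rate(labels, groups, weights, "b")
print(round(rate_a, 3), round(rate_b, 3))  # both match the overall rate 8/20
```

The weights would then be passed to any learner that accepts per-example weights, which is why the approach is model-agnostic.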

What tools and benchmarks does the field use?

Several open-source toolkits implement fairness metrics and mitigation methods. AIF360 was released by IBM in 2018 with Python and R interfaces to dozens of metrics and mitigation algorithms ^[34]. Fairlearn was released by Microsoft in 2018 and is now an independent project; it implements the reductions approach, post-processing including ThresholdOptimizer, and a trade-off dashboard ^[35]. The What-If Tool from Google PAIR (2018) integrates with TensorBoard. Aequitas, originally developed at the University of Chicago's Center for Data Science and Public Policy, targets policymakers and journalists.

Common benchmarks include the UCI Adult Income dataset (a 1994 census extract), the COMPAS dataset released by ProPublica, German Credit, MIMIC-III for clinical applications, the Pilot Parliaments Benchmark from Gender Shades, and CelebA. Critiques of Adult by Ding, Hardt, Miller, and Schmidt (2021) led to Folktables, drawn from the American Community Survey ^[36].

How is algorithmic fairness regulated?

Governments have begun translating fairness research into binding rules.

The EU AI Act was politically agreed in December 2023 and entered into force on August 1, 2024. Provisions phase in over the following years, with prohibitions applying six months after entry into force and high-risk system requirements applying twenty-four months after entry into force. The Act categorizes systems by risk and imposes obligations on high-risk systems, including those in employment, education, credit scoring, law enforcement, and the administration of justice. High-risk systems must undergo conformity assessment addressing data quality, documentation, transparency, oversight, accuracy, robustness, and bias. Article 10 requires that training, validation, and testing datasets be examined for biases that may lead to discrimination ^[3]. The EU General Data Protection Regulation (GDPR), applicable since May 2018, regulates automated decision-making in Article 22, granting data subjects a right against decisions based solely on automated processing with legal or similarly significant effects; Recital 71 mentions preventing discriminatory effects ^[37].

In the US, the Equal Employment Opportunity Commission issued guidance in May 2022 and May 2023 explaining how the Americans with Disabilities Act and Title VII apply to employer use of AI tools, including testing for adverse impact ^[38]. NYC Local Law 144 took effect on January 1, 2023, with enforcement from July 5, 2023; it requires employers using automated employment decision tools for New York City residents to commission an annual independent bias audit, publish a summary, and notify candidates ^[4]. The Federal Trade Commission issued a 2021 business guidance post warning that biased algorithms in consumer decisions can violate the Equal Credit Opportunity Act and Section 5 of the FTC Act ^[39]. The NIST AI Risk Management Framework (AI RMF 1.0), released on January 26, 2023, is voluntary but widely referenced and includes fairness among the characteristics of trustworthy AI ^[40]. The Algorithmic Accountability Act has been proposed since 2019 but as of early 2026 has not been enacted. California AB 2930 was introduced in 2024 to regulate automated decision tools, and Colorado's SB 24-205, enacted in May 2024, applies developer and deployer obligations to high-risk AI systems from February 2026.

What are the main critiques and debates?

In "Fairness and Abstraction in Sociotechnical Systems" (2019), Andrew Selbst, danah boyd, Sorelle Friedler, Suresh Venkatasubramanian, and Janet Vertesi argue that fair machine learning research often abstracts away from the social context in which a system is embedded, treating fairness as a property of an algorithm rather than of a sociotechnical assemblage including data collection, deployment, oversight, and contestation ^[41]. They identify "abstraction traps," including framing, portability, and solutionism traps.

A related debate concerns process-based versus outcome-based fairness. Process-based accounts emphasize equal treatment, transparency, and the right to explanation; outcome-based accounts measure disparities in results. The two can come apart: a process that treats individuals identically can still produce uneven outcomes if relevant features are unequally distributed. The distinction between disparate treatment and disparate impact is closely related. US anti-discrimination law generally permits disparate impact analysis but treats explicit use of protected attributes with greater suspicion. Mitigation techniques that adjust thresholds by group raise questions under the disparate treatment doctrine, especially after Students for Fair Admissions v. Harvard (2023).

Trade-offs with accuracy are contested: some papers argue fairness and accuracy can improve together when unfairness stems from poor data quality, while others find a clearer trade-off when criteria are imposed strictly across multiple groups. Fairness versus privacy is a structural tension: measuring fairness across protected attributes typically requires collecting them, which conflicts with privacy norms and laws limiting data collection. Methods using proxy sensitive attributes raise their own concerns. Andrus, Spitzer, Brown, and Xiang (2021) document the challenges firms face when they cannot or will not collect race and gender data ^[42].

Intersectionality is a sustained concern. Many analyses report metrics by single attributes and miss patterns affecting specific subgroups; Gender Shades centered intersectionality by reporting error rates separately for darker-skinned women, lighter-skinned women, darker-skinned men, and lighter-skinned men ^[9]. Long-term effects of fairness interventions are a newer area. Liu, Dean, Rolf, Simchowitz, and Hardt (2018) showed that criteria optimized at one point in time can worsen disparities for the disadvantaged group when decisions affect future qualifications ^[43]. Mouzannar, Ohannessian, and Srebro (2019) reported similar dynamics in education and lending ^[44]. Hu and Kohler-Hausmann argue that race in particular cannot be cleanly separated from other features for causal counterfactuals ^[45].
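The way single-attribute reporting can hide a subgroup disparity is easy to demonstrate on toy data. The sketch below uses invented records, not the Gender Shades measurements, and hypothetical helper names; it computes error rates by one attribute at a time and then by the combination of both.

```python
from collections import defaultdict

def error_rates(records, keys):
    """Error rate for each combination of the attributes named in `keys`."""
    errors, totals = defaultdict(int), defaultdict(int)
    for r in records:
        subgroup = tuple(r[k] for k in keys)
        totals[subgroup] += 1
        errors[subgroup] += int(r["y_true"] != r["y_pred"])
    return {s: errors[s] / totals[s] for s in totals}

def make_records(skin, gender, n, n_errors):
    """Hypothetical helper: n records for one subgroup, the first n_errors misclassified."""
    return [
        {"skin": skin, "gender": gender, "y_true": 1,
         "y_pred": 0 if i < n_errors else 1}
        for i in range(n)
    ]

# Invented data: 10 records per subgroup with 4, 1, 1, and 0 errors.
records = (
    make_records("darker", "female", 10, 4)
    + make_records("darker", "male", 10, 1)
    + make_records("lighter", "female", 10, 1)
    + make_records("lighter", "male", 10, 0)
)

by_skin = error_rates(records, ["skin"])
by_gender = error_rates(records, ["gender"])
by_both = error_rates(records, ["skin", "gender"])
print(by_skin[("darker",)], by_gender[("female",)], by_both[("darker", "female")])
```

In this example each single-attribute view reports a worst-case error rate of 0.25, while the intersectional view shows 0.4 for one subgroup, a gap that neither marginal breakdown reveals.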

Who shaped the field, and where is it published?

Researchers who have shaped the field include Cynthia Dwork, Moritz Hardt, Solon Barocas, Arvind Narayanan, Toniann Pitassi, Omer Reingold, Richard Zemel, Joy Buolamwini, Timnit Gebru, Margaret Mitchell, Inioluwa Deborah Raji, Sandra Wachter, Chris Russell, Alexandra Chouldechova, Jon Kleinberg, Sendhil Mullainathan, Manish Raghavan, Sorelle Friedler, Suresh Venkatasubramanian, danah boyd, Andrew Selbst, Alex Hanna, Hanna Wallach, Aaron Roth, and Lily Hu.

The principal venue is the ACM Conference on Fairness, Accountability, and Transparency (FAccT), founded in 2018 as FAT*. The AAAI/ACM Conference on AI, Ethics, and Society (AIES) covers similar ground, and NeurIPS, ICML, and ICLR also publish fairness research. Research and advocacy centers include the Algorithmic Justice League founded by Buolamwini, the Distributed AI Research Institute (DAIR) founded by Gebru in 2021, the AI Now Institute at NYU, Stanford HAI, and the Center for Human-Compatible AI at Berkeley. The textbook Fairness and Machine Learning by Barocas, Hardt, and Narayanan, free at fairmlbook.org since 2018 and in print from MIT Press in 2023, is a standard reference ^[46].

How does fairness apply to large language models?

From roughly 2022 onward, fairness research has expanded to cover foundation models and generative systems alongside earlier work on tabular and image classification.

Large language model fairness is now a significant subfield. The BBQ benchmark (Bias Benchmark for Question Answering), released by Parrish et al. in 2022, contains over 58,000 question-answer pairs covering age, disability, gender identity, nationality, physical appearance, race, religion, socioeconomic status, and sexual orientation ^[47]. The BOLD benchmark (Bias in Open-Ended Language Generation Dataset), released by Dhamala and colleagues at Amazon in 2021, contains roughly 23,000 prompts evaluating bias across profession, gender, race, religion, and political ideology ^[48]. StereoSet, CrowS-Pairs, RealToxicityPrompts, and HELM cover related ground.

Text-to-image models have been audited for racial and gender skew. Bianchi et al. (2023) showed that Stable Diffusion amplified demographic stereotypes for prompts about professions and adjectives ^[49]. Bloomberg Graphics published a 2023 visual analysis reaching similar conclusions across DALL-E, Midjourney, and Stable Diffusion ^[50].

Algorithmic auditing has matured as a practice: Raji et al. (2020) proposed a framework for end-to-end internal audits, and external audits are commissioned under regimes such as NYC Local Law 144 ^[51].

Algorithmic fairness is not solely a technical problem and cannot be resolved by technical means alone.

References

[1] Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). Fairness through awareness. *ITCS*. https://arxiv.org/abs/1104.3913
[2] Mehrabi, N., et al. (2021). A survey on bias and fairness in machine learning. *ACM Computing Surveys* 54(6). https://arxiv.org/abs/1908.09635
[3] EU. (2024). Regulation (EU) 2024/1689 (AI Act). https://eur-lex.europa.eu/eli/reg/2024/1689/oj
[4] NYC DCWP. (2023). Automated Employment Decision Tools (Local Law 144). https://www.nyc.gov/site/dca/about/automated-employment-decision-tools.page
[5] Griggs v. Duke Power Co., 401 U.S. 424 (1971). https://supreme.justia.com/cases/federal/us/401/424/
[6] Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016, May 23). Machine bias. *ProPublica*. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
[7] Dieterich, W., Mendoza, C., & Brennan, T. (2016). COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Northpointe. https://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf
[8] Dastin, J. (2018, October 10). Amazon scraps secret AI recruiting tool that showed bias against women. *Reuters*. https://www.reuters.com/article/world/insight-amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK0AG/
[9] Buolamwini, J., & Gebru, T. (2018). Gender shades. *FAT**, PMLR 81, 77-91. https://proceedings.mlr.press/v81/buolamwini18a.html
[10] Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. *Science* 366(6464), 447-453. https://www.science.org/doi/10.1126/science.aax2342
[11] NY DFS. (2021, March 23). Report on Apple Card investigation. https://www.dfs.ny.gov/system/files/documents/2021/03/rpt_202103_apple_card_investigation.pdf
[12] Yee, K., Tantipongpipat, U., & Mishra, S. (2021). Image cropping on Twitter. *PACM HCI* 5(CSCW2). https://arxiv.org/abs/2105.08667
[13] Grother, P., Ngan, M., & Hanaoka, K. (2019). FRVT Part 3: Demographic effects (NISTIR 8280). NIST. https://nvlpubs.nist.gov/nistpubs/ir/2019/NIST.IR.8280.pdf
[14] Suresh, H., & Guttag, J. (2021). A framework for understanding sources of harm throughout the ML life cycle. *EAAMO*. https://arxiv.org/abs/1901.10002
[15] Lum, K., & Isaac, W. (2016). To predict and serve? *Significance* 13(5), 14-19. https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2016.00960.x
[16] Kamiran, F., Zliobaite, I., & Calders, T. (2013). Quantifying explainable discrimination. *KAIS* 35(3), 613-644. https://link.springer.com/article/10.1007/s10115-012-0584-8
[17] Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. *NeurIPS*. https://arxiv.org/abs/1610.02413
[18] Chouldechova, A. (2017). Fair prediction with disparate impact. *Big Data* 5(2), 153-163. https://arxiv.org/abs/1703.00056
[19] Kusner, M. J., Loftus, J. R., Russell, C., & Silva, R. (2017). Counterfactual fairness. *NeurIPS*. https://arxiv.org/abs/1703.06856
[20] Berk, R., Heidari, H., Jabbari, S., Kearns, M., & Roth, A. (2018). Fairness in criminal justice risk assessments. *Sociological Methods and Research* 50(1), 3-44. https://arxiv.org/abs/1703.09207
[21] Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., & Huq, A. (2017). Algorithmic decision making and the cost of fairness. *KDD*. https://arxiv.org/abs/1701.08230
[22] Kleinberg, J., Mullainathan, S., & Raghavan, M. (2017). Inherent trade-offs in the fair determination of risk scores. *ITCS*. https://arxiv.org/abs/1609.05807
[23] Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., & Weinberger, K. Q. (2017). On fairness and calibration. *NeurIPS*. https://arxiv.org/abs/1709.02012
[24] Kamiran, F., & Calders, T. (2012). Data preprocessing techniques for classification without discrimination. *KAIS* 33(1), 1-33. https://link.springer.com/article/10.1007/s10115-011-0463-8
[25] Feldman, M., et al. (2015). Certifying and removing disparate impact. *KDD*. https://arxiv.org/abs/1412.3756
[26] Zemel, R., et al. (2013). Learning fair representations. *ICML*. https://proceedings.mlr.press/v28/zemel13.html
[27] Calmon, F., et al. (2017). Optimized pre-processing for discrimination prevention. *NeurIPS*. https://papers.nips.cc/paper/2017/hash/9a49a25d845a483fae4be7e341368e36-Abstract.html
[28] Zafar, M. B., et al. (2017). Fairness constraints. *AISTATS*. https://arxiv.org/abs/1507.05259
[29] Zhang, B. H., Lemoine, B., & Mitchell, M. (2018). Mitigating unwanted biases with adversarial learning. *AIES*. https://arxiv.org/abs/1801.07593
[30] Kamishima, T., et al. (2012). Fairness-aware classifier with prejudice remover regularizer. *ECML PKDD*. https://link.springer.com/chapter/10.1007/978-3-642-33486-3_3
[31] Agarwal, A., et al. (2018). A reductions approach to fair classification. *ICML*. https://arxiv.org/abs/1803.02453
[32] Kamiran, F., Karim, A., & Zhang, X. (2012). Decision theory for discrimination-aware classification. *ICDM*. https://ieeexplore.ieee.org/document/6413831
[33] Friedler, S. A., et al. (2019). A comparative study of fairness-enhancing interventions. *FAT**. https://arxiv.org/abs/1802.04422
[34] Bellamy, R. K. E., et al. (2018). AI Fairness 360. *IBM JRD* 63(4/5). https://arxiv.org/abs/1810.01943
[35] Bird, S., et al. (2020). Fairlearn: A toolkit for assessing and improving fairness in AI (MSR-TR-2020-32). https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/
[36] Ding, F., Hardt, M., Miller, J., & Schmidt, L. (2021). Retiring Adult. *NeurIPS*. https://arxiv.org/abs/2108.04884
[37] EU. (2016). Regulation 2016/679 (GDPR), Article 22. https://gdpr-info.eu/art-22-gdpr/
[38] US EEOC. (2023). Assessing adverse impact in software, algorithms, and AI under Title VII. https://www.eeoc.gov/laws/guidance/select-issues-assessing-adverse-impact-software-algorithms-and-artificial
[39] Jillson, E. (2021, April 19). Aiming for truth, fairness, and equity in your company's use of AI. US FTC. https://www.ftc.gov/business-guidance/blog/2021/04/aiming-truth-fairness-equity-your-companys-use-ai
[40] NIST. (2023, January 26). AI Risk Management Framework (AI RMF 1.0). https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
[41] Selbst, A. D., et al. (2019). Fairness and abstraction in sociotechnical systems. *FAT**. https://dl.acm.org/doi/10.1145/3287560.3287598
[42] Andrus, M., Spitzer, E., Brown, J., & Xiang, A. (2021). What we can't measure, we can't understand. *FAccT*. https://arxiv.org/abs/2011.02282
[43] Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., & Hardt, M. (2018). Delayed impact of fair machine learning. *ICML*. https://arxiv.org/abs/1803.04383
[44] Mouzannar, H., Ohannessian, M. I., & Srebro, N. (2019). From fair decision making to social equality. *FAT**. https://arxiv.org/abs/1812.02952
[45] Hu, L., & Kohler-Hausmann, I. (2020). What's sex got to do with machine learning? *FAT**. https://dl.acm.org/doi/10.1145/3351095.3375674
[46] Barocas, S., Hardt, M., & Narayanan, A. (2023). *Fairness and Machine Learning*. MIT Press. https://fairmlbook.org/
[47] Parrish, A., et al. (2022). BBQ: A hand-built bias benchmark. *Findings of ACL*. https://arxiv.org/abs/2110.08193
[48] Dhamala, J., et al. (2021). BOLD: Dataset and metrics for measuring biases in open-ended language generation. *FAccT*. https://arxiv.org/abs/2101.11718
[49] Bianchi, F., et al. (2023). Easily accessible text-to-image generation amplifies demographic stereotypes. *FAccT*. https://arxiv.org/abs/2211.03759
[50] Nicoletti, L., & Bass, D. (2023, June 9). Humans are biased. Generative AI is even worse. *Bloomberg*. https://www.bloomberg.com/graphics/2023-generative-ai-bias/
[51] Raji, I. D., et al. (2020). Closing the AI accountability gap. *FAT**. https://arxiv.org/abs/2001.00973
