AI Fairness 360 (AIF360)
Last reviewed
Sources
30 citations
Review status
Source-backed
Revision
v2 · 4,678 words
AI Fairness 360, abbreviated AIF360, is an open-source Python and R toolkit, originally created by IBM Research and released in September 2018, that detects and mitigates unwanted bias in machine learning models and the datasets they are trained on [1][5]. Its initial release shipped with more than 70 fairness metrics and nine bias-mitigation algorithms spanning three intervention stages (pre-processing, in-processing, and post-processing), all wrapped in a single uniform API [5]. In June 2020, IBM donated the project to the Linux Foundation AI & Data Foundation (LF AI & Data), and it is now community-governed under the Trusted-AI GitHub organization at Trusted-AI/AIF360, where the algorithm count has since grown to 13 [6][29].
AIF360 was the first widely adopted attempt to consolidate the academic literature on algorithmic fairness into a single library with a uniform API, and it has since become a standard reference implementation for both researchers comparing methods and practitioners auditing production systems. IBM Research described the package on launch as "a comprehensive open-source toolkit of metrics to check for unwanted bias in datasets and machine learning models, and state-of-the-art algorithms to mitigate such bias" [5]. The project sits alongside sister toolkits such as AI Explainability 360 (AIX360) and the Adversarial Robustness Toolbox (ART), which were donated to LF AI & Data at the same time [29].
The toolkit was introduced in the paper "AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias" by Rachel K. E. Bellamy and 17 co-authors (Bellamy et al., arXiv:1810.01943, submitted 3 October 2018), later expanded for the IBM Journal of Research and Development in 2019 [1][2]. The library has grown steadily since, with regular minor releases adding datasets, intersectional metrics, an R interface, and tighter scikit-learn compatibility.
AIF360 is used to measure and reduce algorithmic bias at three points in a typical machine-learning workflow: in the training data, inside the learning algorithm, and in the predictions of an already-trained model. A data scientist uses the toolkit to compute fairness metrics (for example, whether a model approves loans at a lower rate for one protected group than another), and then applies one of its mitigation algorithms to bring those metrics closer to parity while monitoring the impact on accuracy [1]. The stated goal of the project, per the Bellamy et al. paper, is "to help facilitate the transition of fairness research algorithms to use in an industrial setting and to provide a common framework for fairness researchers to share and evaluate algorithms" [1].
The toolkit is released under the Apache 2.0 license and is installable with pip install aif360 in Python or install.packages("aif360") in R [3]. Typical applications include credit scoring, hiring, recidivism prediction, healthcare resource allocation, and any other high-stakes domain where a protected attribute such as race, sex, or age could influence model decisions.
The origins of AIF360 trace to a 2018 collaboration between IBM India Research Lab and the IBM T.J. Watson Research Center. Concerns about bias in high-stakes machine-learning applications such as credit scoring, hiring, and recidivism prediction had moved from academic conferences (FAccT, then known as FAT*) into mainstream press, especially after ProPublica's May 2016 investigation into the COMPAS recidivism risk score [30]. IBM's response was to package the most influential fairness papers from the previous decade into a single Python library that any data scientist could pip install.
The public launch happened on 19 September 2018 with a blog post on the IBM Research site, the arXiv preprint by Bellamy et al., and the simultaneous release of the GitHub repository (originally IBM/AIF360) [5][1]. The launch included an interactive web demo, three end-to-end Jupyter tutorials covering credit scoring, medical-expenditure prediction, and facial gender classification, and an extensive documentation site at aif360.readthedocs.io [4]. IBM Research framed the release as a community effort, writing: "we hope that the package is not only a way to bring all of us researchers together, but also a way to translate our collective research results to data scientists, data engineers, and developers" [5].
In June 2020 IBM contributed AIF360 to the Linux Foundation AI & Data Foundation as an incubation project, alongside AIX360 (AI Explainability 360) and ART (Adversarial Robustness Toolbox) [29]. The LF AI & Data Technical Advisory Committee voted to host and incubate the three Trusted AI toolkits on 18 June 2020 [6]. At that point the GitHub repository was moved from IBM/AIF360 to Trusted-AI/AIF360, signalling that the project was no longer governed solely by IBM. Versions released after the donation continued to add fairness algorithms from the wider research community, including Agarwal et al.'s reductions approach (2018) and Celis et al.'s meta fair classifier (2019) [17][16]. The 0.5 series introduced an R interface, and later 0.6.x releases added sample distortion metrics, MDSS bias subset scanning, and deterministic re-ranking for fair recommendation.
| Year | Milestone |
|---|---|
| 2018 (Sept) | Open-sourced by IBM Research; arXiv preprint 1810.01943 published; over 70 metrics and 9 algorithms in the first release [5] |
| 2019 | Expanded paper appears in IBM Journal of Research and Development vol. 63, no. 4/5 [2] |
| 2019 | R bindings (aif360-r) released |
| 2020 (June) | Project contributed to LF AI & Data; repo moved to Trusted-AI/AIF360 [6][29] |
| 2021-2022 | scikit-learn-compatible aif360.sklearn API stabilised |
| 2023-2024 | Additional datasets, MDSS bias scanning, intersectional fairness extensions; algorithm count reaches 13 [3] |
The Bellamy et al. paper states four explicit goals for the toolkit [1]. First, to make state-of-the-art fairness algorithms accessible to data scientists who are not specialists in the fairness literature. Second, to provide a common API so that different fairness metrics and mitigation algorithms can be compared on equal footing using the same datasets. Third, to bridge the gap between research and production, in particular to give engineers a way to retrofit fairness checks onto existing scikit-learn pipelines. Fourth, to educate: the interactive web demo walks newcomers through the conceptual differences between group fairness and individual fairness using concrete examples.
A recurring design choice across the codebase is the preference for fit/transform/predict semantics borrowed from scikit-learn. Pre-processing methods expose fit(dataset) and transform(dataset), and in-processing methods expose fit(dataset) and predict(dataset). Post-processing methods are fitted on a pair of datasets, one holding the true labels and one holding a trained model's predictions, via fit(dataset_true, dataset_pred), and then adjust predictions with predict(dataset). This makes it straightforward to drop AIF360 components into existing modelling pipelines without rewriting them.
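As an illustration of the fit/transform convention, the sketch below implements the reweighing idea of Kamiran & Calders (2012) as a small pre-processing transformer: fit learns one weight per (group, label) cell, and transform returns a weight for each record. The class name ReweighingSketch and its plain-list inputs are hypothetical; AIF360's own Reweighing class operates on dataset objects rather than Python lists.

```python
from collections import Counter


class ReweighingSketch:
    """Hypothetical pre-processing transformer in the fit/transform style.

    Each (group, label) cell gets weight P(group) * P(label) / P(group, label),
    i.e. expected frequency under independence divided by observed frequency.
    """

    def fit(self, groups, labels):
        n = len(labels)
        group_counts = Counter(groups)
        label_counts = Counter(labels)
        joint_counts = Counter(zip(groups, labels))
        self.weights_ = {
            (g, y): (group_counts[g] / n) * (label_counts[y] / n) / (c / n)
            for (g, y), c in joint_counts.items()
        }
        return self

    def transform(self, groups, labels):
        # One instance weight per record; features and labels are left untouched.
        return [self.weights_[(g, y)] for g, y in zip(groups, labels)]


groups = ["priv"] * 4 + ["unpriv"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = ReweighingSketch().fit(groups, labels).transform(groups, labels)
# Over-represented cells such as (priv, 1) get weight 2/3; rare cells such as (priv, 0) get 2.0.
```

Because the weights leave features and labels unchanged, any downstream learner that accepts per-sample weights can consume them without modification.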
AIF360 is organised around a small set of abstractions that the rest of the library composes on top of.
At the bottom of the stack is StructuredDataset, which wraps a pandas DataFrame together with metadata describing protected attributes (such as race, sex, or age), favourable and unfavourable labels, and privileged and unprivileged groups. BinaryLabelDataset is the most common subclass and is used for binary-classification fairness, which covers most of the published literature. RegressionDataset exists for continuous targets but supports a smaller set of algorithms. The dataset object is the unit on which both metrics and mitigation algorithms operate, which avoids the awkward situation of metrics taking raw arrays without knowing which column is the protected attribute.
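The value of bundling metadata with the data can be shown with a minimal sketch. The class below is hypothetical and far simpler than StructuredDataset, but it carries the same kind of information (label column, protected attribute, favourable label, privileged value), so a group metric such as the difference in base rates needs no extra arguments. All class, field, and method names here are illustrative assumptions, not AIF360's API.

```python
from dataclasses import dataclass


@dataclass
class BinaryLabelDataSketch:
    """Hypothetical stand-in for a dataset that carries fairness metadata."""
    rows: list                 # each row is a dict of column -> value
    label_name: str            # which column holds the label
    protected_attribute: str   # which column holds the protected attribute
    favorable_label: int = 1
    privileged_value: int = 1

    def base_rate(self, privileged):
        # Share of favourable labels within the privileged (True) or unprivileged (False) group.
        group = [
            r for r in self.rows
            if (r[self.protected_attribute] == self.privileged_value) == privileged
        ]
        favourable = sum(r[self.label_name] == self.favorable_label for r in group)
        return favourable / len(group)

    def mean_difference(self):
        # Unprivileged base rate minus privileged base rate; 0 means parity.
        return self.base_rate(privileged=False) - self.base_rate(privileged=True)


rows = [
    {"sex": 1, "approved": 1}, {"sex": 1, "approved": 1}, {"sex": 1, "approved": 0},
    {"sex": 0, "approved": 1}, {"sex": 0, "approved": 0}, {"sex": 0, "approved": 0},
]
ds = BinaryLabelDataSketch(rows, label_name="approved", protected_attribute="sex")
```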
Three metric classes cover most use cases. BinaryLabelDatasetMetric evaluates fairness properties of a single labelled dataset, for instance whether the base rate of the positive label differs across protected groups. ClassificationMetric takes two BinaryLabelDataset objects, one with ground truth and one with model predictions, and computes metrics like equalised odds, equal opportunity, and predictive parity. SampleDistortionMetric measures how much a pre-processing transformation has altered individual records, which matters because many fairness interventions trade off group fairness against individual fairness.
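To make the ClassificationMetric quantities concrete, the following sketch computes two of them from plain lists, where y_true and y_pred stand in for the ground-truth and prediction datasets. It follows the convention of subtracting the privileged group's rate from the unprivileged group's, so negative values mean the unprivileged group has the lower rates. The function names and list-based inputs are assumptions for illustration, and the sketch does not guard against a group with no positive or no negative examples.

```python
def group_rates(y_true, y_pred, groups, group):
    """Return (TPR, FPR) for the records whose protected attribute equals `group`."""
    tp = fn = fp = tn = 0
    for t, p, g in zip(y_true, y_pred, groups):
        if g != group:
            continue
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


def equal_opportunity_difference(y_true, y_pred, groups, unpriv, priv):
    # TPR(unprivileged) - TPR(privileged)
    tpr_u, _ = group_rates(y_true, y_pred, groups, unpriv)
    tpr_p, _ = group_rates(y_true, y_pred, groups, priv)
    return tpr_u - tpr_p


def average_odds_difference(y_true, y_pred, groups, unpriv, priv):
    # Mean of the FPR difference and the TPR difference, unprivileged minus privileged.
    tpr_u, fpr_u = group_rates(y_true, y_pred, groups, unpriv)
    tpr_p, fpr_p = group_rates(y_true, y_pred, groups, priv)
    return 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["p"] * 4 + ["u"] * 4
```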
Mitigation algorithms are split into three subpackages corresponding to where they intervene in the pipeline.
- aif360.algorithms.preprocessing modifies the training data before any model sees it.
- aif360.algorithms.inprocessing modifies the learning algorithm itself, usually by adding a fairness term to the loss function or by training a classifier and adversary jointly.
- aif360.algorithms.postprocessing modifies the predictions of an already-trained classifier.
This taxonomy was popularised by Friedler et al. (2019) and is the same one used by the Fairlearn library [26].
Each metric can be wrapped in an explainer that produces human-readable text describing what the metric value means and which subgroup is disadvantaged. The toolkit also exposes JSON-formatted explanations for integration with downstream dashboards.
The aif360.sklearn namespace, added after the LF AI donation, provides scikit-learn-compatible classes that operate directly on pandas DataFrames with MultiIndex rows where the protected attribute is part of the index. This makes it easier to chain AIF360 components with sklearn.pipeline.Pipeline and GridSearchCV for hyperparameter tuning.
The initial 2018 release exposed more than 70 fairness metrics, many of which are simple counting differences computed from the confusion matrix sliced by protected group [5]. The most widely used ones are summarised below.
| Metric | What it measures | Origin |
|---|---|---|
| Disparate impact ratio | Ratio of positive-prediction rate for unprivileged group over privileged group; below 0.8 fails the four-fifths rule | US EEOC Uniform Guidelines on Employee Selection Procedures, 1978 [25] |
| Statistical parity difference | Difference in positive-prediction rates between groups | Calders & Verwer 2010; Dwork et al. 2012 [7][24] |
| Equal opportunity difference | Difference in true-positive rates between groups | Hardt, Price, Srebro 2016 [8] |
| Average odds difference | Mean of TPR and FPR differences between groups | Hardt, Price, Srebro 2016 [8] |
| Equalised odds | Joint constraint of equal TPR and equal FPR | Hardt, Price, Srebro 2016 [8] |
| Theil index | Entropy-based measure of inequality across individual benefits | Theil 1967; adapted by Speicher et al. 2018 [20] |
| Generalised entropy index | Family that includes Theil index as a special case | Speicher et al. 2018 [20] |
| Consistency | How similarly the model treats nearest-neighbour individuals | Zemel et al. 2013 [15] |
| Differential fairness | Bound on how much the probability of each outcome can differ between any two intersectional subgroups defined by the protected attributes | Foulds et al. 2020 [21] |
| Smoothed empirical differential fairness | Bayesian-smoothed version of differential fairness | Foulds et al. 2020 [21] |
| Between-group / within-group generalised entropy | Decomposition of total inequality | Speicher et al. 2018 [20] |
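Two rows of the table, the generalised entropy index and its Theil special case, are less obvious than the rate differences. The sketch below follows the individual-benefit definition b_i = ŷ_i − y_i + 1 from Speicher et al. (2018), so a correct prediction yields benefit 1, a false positive 2, and a false negative 0, and computes GE(α) = 1/(n·α·(α − 1)) · Σ[(b_i/μ)^α − 1], where μ is the mean benefit. The function name is hypothetical, and the α = 0 case is deliberately left out.

```python
import math


def generalized_entropy_index(y_true, y_pred, alpha=2.0):
    """Inequality of individual benefits b_i = y_pred_i - y_true_i + 1.

    alpha = 1 gives the Theil index; alpha = 0 is not handled in this sketch.
    """
    if alpha == 0:
        raise ValueError("alpha = 0 is not handled in this sketch")
    b = [p - t + 1 for t, p in zip(y_true, y_pred)]
    n = len(b)
    mu = sum(b) / n
    if alpha == 1:
        # Theil index; a zero benefit contributes 0 because x*ln(x) -> 0 as x -> 0.
        return sum((x / mu) * math.log(x / mu) for x in b if x > 0) / n
    return sum((x / mu) ** alpha - 1 for x in b) / (n * alpha * (alpha - 1))


# Benefits here are [1, 2, 0, 1]: one correct positive, one false positive,
# one false negative, one correct negative.
ge2 = generalized_entropy_index([1, 0, 1, 0], [1, 1, 0, 0], alpha=2)
theil = generalized_entropy_index([1, 0, 1, 0], [1, 1, 0, 0], alpha=1)
```

A perfect classifier gives every individual the same benefit of 1, so both indices are 0; any mix of false positives and false negatives pushes them above 0.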
The disparate impact metric deserves a note. The four-fifths rule is a US-specific legal heuristic adopted by the Equal Employment Opportunity Commission in 1978 to flag employment selection procedures with adverse impact on protected classes [25]. AIF360 ships it as a metric because it is widely cited, but the documentation explicitly cautions users that the rule has no legal force outside the United States, and that even within the US it is one of several signals rather than a binary pass/fail test [4].
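A minimal sketch of the disparate impact ratio and the related statistical parity difference, assuming predictions are encoded as 1 for the favourable outcome and 0 otherwise. The four-fifths check is included only as the screening heuristic described above; the function names and group labels are hypothetical.

```python
def positive_rate(y_pred, groups, group):
    # Share of favourable (1) predictions within one group.
    selected = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(selected) / len(selected)


def disparate_impact(y_pred, groups, unpriv, priv):
    # Ratio of favourable-outcome rates, unprivileged over privileged.
    return positive_rate(y_pred, groups, unpriv) / positive_rate(y_pred, groups, priv)


def statistical_parity_difference(y_pred, groups, unpriv, priv):
    # Difference of favourable-outcome rates, unprivileged minus privileged.
    return positive_rate(y_pred, groups, unpriv) - positive_rate(y_pred, groups, priv)


def flags_four_fifths(y_pred, groups, unpriv, priv, threshold=0.8):
    # True when the ratio falls below 0.8: a screening signal, not a legal verdict.
    return disparate_impact(y_pred, groups, unpriv, priv) < threshold


groups = ["priv"] * 10 + ["unpriv"] * 10
y_pred = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 6  # 60% vs 40% approval
```

Here the unprivileged group is approved at 40% against 60% for the privileged group, giving a ratio of about 0.67 and a parity difference of −0.2.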
The combination of pre-, in-, and post-processing algorithms is the part of AIF360 that has the largest effect on practical workflows. The first release contained nine such algorithms; current versions on the Trusted-AI/AIF360 repository ship 13, each contributed by the broader algorithmic-fairness research community [5][3]. The following table lists the components shipped with current versions, grouped by category. Apart from the ART classifier wrapper, which is an integration utility rather than a mitigation method, each entry implements a published method, and the original paper is the canonical reference.
| Category | Algorithm | Authors and year | One-line description |
|---|---|---|---|
| Pre-processing | Reweighing | Kamiran & Calders 2012 [12] | Assigns weights to (group, label) cells so the joint distribution becomes statistically independent of the protected attribute |
| Pre-processing | Disparate impact remover | Feldman, Friedler, Moeller, Scheidegger, Venkatasubramanian 2015 [14] | Edits feature values to remove disparate impact while preserving within-group rank ordering |
| Pre-processing | Learning fair representations (LFR) | Zemel, Wu, Swersky, Pitassi, Dwork 2013 [15] | Learns a latent representation that obfuscates the protected attribute while preserving utility |
| Pre-processing | Optimised pre-processing | Calmon, Wei, Vinzamuri, Ramamurthy, Varshney 2017 [13] | Convex optimisation that jointly bounds discrimination, individual distortion, and utility loss |
| In-processing | Adversarial debiasing | Zhang, Lemoine, Mitchell 2018 [11] | Trains a classifier and an adversary jointly so the adversary cannot recover the protected attribute from predictions |
| In-processing | Prejudice remover | Kamishima, Akaho, Asoh, Sakuma 2012 [10] | Adds a prejudice index regulariser to logistic regression |
| In-processing | Meta fair classifier | Celis, Huang, Keswani, Vishnoi 2019 [16] | Takes a fairness metric as input and returns a classifier optimal for that metric |
| In-processing | Exponentiated gradient reduction | Agarwal, Beygelzimer, Dudik, Langford, Wallach 2018 [17] | Reduces fair classification to a sequence of cost-sensitive classification problems |
| In-processing | Grid search reduction | Agarwal, Dudik, Wu 2019 | Grid-search variant of the reductions approach for binary classification and regression |
| In-processing | GerryFair classifier | Kearns, Neel, Roth, Wu 2018 [19] | Audits and mitigates fairness gerrymandering across rich subgroups |
| In-processing | ART classifier wrapper | Trusted-AI 2020 | Wraps an Adversarial Robustness Toolbox classifier so it can be measured by AIF360 |
| Post-processing | Equalised odds post-processing | Hardt, Price, Srebro 2016 [8] | Solves a linear program to randomise predictions until TPR and FPR match across groups |
| Post-processing | Calibrated equalised odds | Pleiss, Raghavan, Wu, Kleinberg, Weinberger 2017 [9] | Relaxes equalised odds to preserve calibration on classifier scores |
| Post-processing | Reject option classification | Kamiran, Karim, Zhang 2012 [18] | Assigns favourable outcomes to the unprivileged group and unfavourable outcomes to the privileged group inside a confidence band around the decision boundary |
| Post-processing | Deterministic re-ranking | Yang & Stoyanovich 2017 (variant) | Rebalances ranked candidate lists to satisfy a fairness constraint at every prefix |
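Of the post-processing methods, reject option classification is the easiest to sketch. The function below assumes a fixed decision threshold and margin: inside the critical region around the threshold it assigns the favourable label to the unprivileged group and the unfavourable label to everyone else, and outside it applies the ordinary threshold rule. AIF360's RejectOptionClassification additionally searches over candidate thresholds and margins during fitting while keeping a chosen fairness metric within bounds, which this sketch omits; the function name and default values are hypothetical.

```python
def reject_option_classify(scores, groups, unpriv, threshold=0.5, margin=0.1):
    """Sketch of reject option classification (Kamiran, Karim, Zhang 2012).

    scores are predicted probabilities of the favourable label (1).
    """
    labels = []
    for s, g in zip(scores, groups):
        if abs(s - threshold) <= margin:
            # Critical region: favour the unprivileged group, disfavour the rest.
            labels.append(1 if g == unpriv else 0)
        else:
            # Confident predictions keep the ordinary threshold rule.
            labels.append(1 if s > threshold else 0)
    return labels


decisions = reject_option_classify([0.9, 0.55, 0.45, 0.2], ["p", "p", "u", "u"], unpriv="u")
```

Only the two borderline scores (0.55 and 0.45) are flipped; confident predictions are never changed, which is why the method tends to cost relatively little accuracy.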
Three of these names appear among the example algorithms IBM Research highlighted at launch: "Optimized Pre-Processing for Discrimination Prevention," "Certifying and Removing Disparate Impact," and "Learning Fair Representations" [5]. Several algorithms have extra dependencies. Adversarial debiasing requires TensorFlow because AIF360's implementation is built on it, and optimised pre-processing requires CVXPY because it solves a convex program. The library ships these as optional extras (pip install 'aif360[all]') so users who do not need those algorithms can avoid installing heavy dependencies such as TensorFlow.
AIF360 ships several benchmark datasets that the fairness literature has converged on, mostly because they are public and have natural protected attributes. These include the following.
| Dataset | Domain | Protected attributes | Notes |
|---|---|---|---|
| Adult (Census Income) | US Census 1994 | Sex, race | Predicting whether income exceeds 50,000 USD; the most cited fairness benchmark |
| German Credit | Banking | Sex, age | UCI repository; predicting credit risk |
| COMPAS | Criminal justice | Race, sex | The dataset behind the ProPublica 2016 investigation; recidivism risk in Broward County, FL [30] |
| Bank Marketing | Marketing | Age | Portuguese banking institution telemarketing dataset |
| MEPS (19, 20, 21) | Healthcare | Race | Medical Expenditure Panel Survey; predicting high utilisation of care |
| Law School GPA | Education | Race, sex | Wightman 1998 LSAC; first-year GPA; supports regression fairness |
The datasets ship with helper code to download the raw files (because most cannot be redistributed under permissive licences) and to encode protected attributes consistently with the fairness literature.
AIF360 began as a Python package and added an R interface in the 0.5 series, and the project README states that "the AI Fairness 360 package is available in both Python and R" [3]. The Python package is installed with pip install aif360 and the R package with install.packages("aif360"). Both expose the same core abstractions (datasets, metrics, and the three families of mitigation algorithms), although the Python package receives new algorithms first and remains the more actively developed of the two.
Because AIF360's mitigation algorithms expose either scikit-learn or transform-style APIs, integration with the broader Python ML ecosystem is straightforward. A few specific integrations are worth calling out.
- The aif360.sklearn module is designed to plug into scikit-learn pipelines.
A handful of open-source fairness toolkits emerged in the same 2018-2020 window, each with slightly different priorities. The table below summarises the main options.
| Toolkit | Organisation | Language | Licence | Primary focus | Mitigation algorithms | Notable feature |
|---|---|---|---|---|---|---|
| AIF360 | IBM, now LF AI & Data | Python and R | Apache 2.0 | End-to-end metrics and mitigation | 13 across pre, in, post | Largest library of mitigation algorithms |
| Fairlearn | Microsoft | Python | MIT | Reductions approach plus dashboard | Several pre and post; reductions for in-processing | Dashboard widget for Jupyter |
| Aequitas | University of Chicago | Python and CLI | MIT | Audit reports for policymakers | None; audit only | Web-app and CLI interfaces |
| TF Fairness Indicators | Google | Python (TFX) | Apache 2.0 | Slice-based metrics for TensorFlow models | None | Tight integration with TFX pipelines |
| What-If Tool | Google PAIR | Python (TensorBoard) | Apache 2.0 | Interactive what-if analysis | None; counterfactual exploration | Visual interactive tool |
| Themis-ML | N. Bantilan 2017 | Python | MIT | Early academic toolkit | A handful of preprocessing methods | Predates AIF360 |
In practice many teams use more than one. A common workflow is to audit with Aequitas because it produces a report that lawyers and policy people can read, mitigate with AIF360 because it has the widest selection of methods, and visualise with the What-If Tool when a senior stakeholder wants to interrogate individual records.
The AIF360 documentation is unusually candid about the limits of what a software library can do for algorithmic fairness, and the academic literature has been clear that no toolkit can resolve the fundamental tensions in the field.
The impossibility theorems. Chouldechova (2017) and Kleinberg, Mullainathan, & Raghavan (2017) independently showed that several intuitive fairness criteria, including calibration, predictive parity, and equalised false-positive rates, cannot all hold simultaneously when base rates differ across groups and the predictor is imperfect [22][23]. AIF360 implements the metrics but cannot square this circle: a user has to pick which fairness criterion they care about, and that choice is normative, not technical.
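The arithmetic behind Chouldechova's result can be checked directly. Writing p for a group's base rate and N for its size, there are (1 − FNR)·p·N true positives, predictive value fixes the false positives at TP·(1 − PPV)/PPV, and dividing by the (1 − p)·N actual negatives gives FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR). If two groups share the same PPV (predictive parity) and the same FNR, their false-positive rates must therefore differ whenever their base rates differ. The sketch below evaluates the identity for two hypothetical groups; the numbers are illustrative, not drawn from any dataset.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False-positive rate forced by Chouldechova's (2017) identity.

    FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR), with p the base rate.
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Two groups with identical PPV and FNR but different base rates.
for p in (0.5, 0.3):
    print(p, round(implied_fpr(p, ppv=0.6, fnr=0.3), 4))
```

With PPV = 0.6 and FNR = 0.3, a base rate of 0.5 forces an FPR of about 0.47, while a base rate of 0.3 forces an FPR of 0.2, so equalised false-positive rates are unreachable without giving up predictive parity or equal FNRs.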
Group fairness versus individual fairness. Most AIF360 metrics are group fairness metrics. They say nothing about whether two similar individuals from different groups are treated similarly, which is the individual fairness criterion of Dwork et al. (2012) [24]. Consistency and Theil index partially address this, but the toolkit's centre of gravity is group fairness.
Binary protected attributes. The bulk of the implementations assume a single binary protected attribute. Intersectional fairness (Crenshaw 1989; Buolamwini & Gebru 2018) is supported through the differential fairness metrics added later, but in-processing algorithms generally do not natively handle multiple intersecting protected attributes [28]. The GerryFair classifier is the main exception.
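Differential fairness works over intersections rather than a single binary attribute. Following Foulds et al. (2020), each subgroup's outcome probabilities are estimated with Dirichlet smoothing, P(y | s) = (N_{y,s} + α)/(N_s + |Y|·α), and ε is the largest absolute log-ratio of these probabilities between any two subgroups. The sketch below is a plain-Python illustration under those assumptions, with binary outcomes by default; it is not AIF360's implementation, and the function name is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def smoothed_differential_fairness(outcomes, subgroups, labels=(0, 1), concentration=1.0):
    """Smoothed empirical epsilon in the spirit of Foulds et al. (2020).

    outcomes: list of labels; subgroups: hashable intersectional keys,
    e.g. (sex, race) tuples. Smaller epsilon means fairer outcomes.
    """
    n_group = Counter(subgroups)
    n_joint = Counter(zip(subgroups, outcomes))

    def prob(y, s):
        # Dirichlet-smoothed estimate of P(y | s).
        return (n_joint[(s, y)] + concentration) / (n_group[s] + len(labels) * concentration)

    epsilon = 0.0
    for s_i, s_j in combinations(n_group, 2):
        for y in labels:
            epsilon = max(epsilon, abs(math.log(prob(y, s_i)) - math.log(prob(y, s_j))))
    return epsilon


subgroups = [("f", "a")] * 4 + [("m", "b")] * 4
outcomes = [1, 1, 0, 0, 1, 1, 1, 0]
eps = smoothed_differential_fairness(outcomes, subgroups)
```

Because the maximum runs over every pair of intersections, a single badly treated subgroup (for example one sex-race combination) drives ε up even when each attribute looks balanced on its own.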
Need for labelled protected attributes. Most mitigation algorithms require the protected attribute to be available at training time. In many production settings this is legally or ethically constrained (the GDPR places restrictions on processing of "special category" personal data), and "fairness through unawareness", simply dropping the protected attribute, is known to be insufficient because other features can act as proxies for it. AIF360 cannot solve this; it assumes the data is already there.
Performance loss is not always quantified. Fairness mitigations often reduce model accuracy or revenue. The toolkit reports both the fairness metric and standard accuracy after mitigation, but it does not always make the Pareto trade-off explicit, and the optimal point on that trade-off is application-specific.
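One way to make the trade-off explicit is to keep only the candidates that are not dominated on both accuracy and a fairness gap. The sketch below does this for a short list of candidate models; the model names and scores are made up for illustration, and the gap can be any quantity where lower is better, such as the absolute statistical parity difference.

```python
def pareto_front(candidates):
    """Keep candidates not dominated on (higher accuracy, lower fairness gap).

    Each candidate is a (name, accuracy, gap) tuple.
    """
    front = []
    for name, acc, gap in candidates:
        dominated = any(
            a >= acc and g <= gap and (a > acc or g < gap)
            for _, a, g in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical results for four pipelines on the same test set.
candidates = [
    ("baseline", 0.85, 0.20),
    ("reweighed", 0.83, 0.08),
    ("adversarial", 0.80, 0.10),
    ("postproc", 0.78, 0.02),
]
front = pareto_front(candidates)
```

The "adversarial" candidate drops out because "reweighed" is both more accurate and fairer; choosing among the remaining three is the application-specific, normative decision the toolkit leaves to its users.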
Cultural and regulatory context. The four-fifths rule is from the US EEOC and has no force in the EU, the UK, India, or most other jurisdictions [25]. AIF360 ships it as a metric without enforcing this caveat in code. Users who deploy AIF360 outside the US should not assume that passing the four-fifths rule means they are compliant with local anti-discrimination law.
It does not replace bias auditing by humans. Friedler et al. (2019) and Holstein et al. (2019) found that the social and organisational context in which a model is deployed often dominates the technical fairness intervention [26][27]. AIF360 is a useful tool inside that process, not a substitute for it.
AIF360 has been cited by thousands of academic papers since 2018 and is widely used as a teaching tool in machine-learning ethics courses. The toolkit is the reference implementation for several follow-up fairness papers, which makes raw citation counts somewhat misleading: a paper that benchmarks against AIF360 cites it even if the paper proposes a competing method.
Industry usage is harder to measure because compliance pipelines are rarely public, but several documented examples exist. IBM uses AIF360 inside its watsonx Governance product as part of the Trusted AI lifecycle. Several large US banks have referenced AIF360 in regulatory filings about model risk management. The dataset and mitigation choices in some published audit reports, including for the COMPAS dataset, mirror the AIF360 pipeline.
The project is also part of IBM's broader Trusted AI suite, which together with AIF360 includes AI Explainability 360 (AIX360) and the Adversarial Robustness Toolbox (ART) [29].
Releases in 2023 and 2024 focused on tighter integration with modern ML platforms. Newer datasets, including additional MEPS years and extensions to the law school dataset, have been added for benchmarking. The MDSS bias subset scanning method, contributed by researchers at Carnegie Mellon and the University of Edinburgh, was added to support intersectional bias detection. An expanded set of post-processing methods now includes deterministic re-ranking, which is useful for fair recommender systems. Continuous-integration support has been broadened to Python 3.11 across macOS, Ubuntu, and Windows.
Community governance under LF AI & Data has settled into a pattern of regular maintainer meetings, an open Slack workspace, and a clear contribution guide. Pull requests now come from a wider set of organisations than the original IBM-only contributor list, including academic groups at Carnegie Mellon, Edinburgh, and the University of Chicago, and corporate contributors from companies that use AIF360 internally.
Imagine a computer program that decides who gets a loan. If the program quietly says "yes" more often to one group of people than another, that is unfair bias. AI Fairness 360 is a free toolbox that does two things. First, it measures how lopsided the program's decisions are, using more than 70 different fairness rulers. Second, it offers 13 different ways to fix the lopsidedness: it can clean up the data before the program learns from it, change how the program learns, or adjust the program's answers after the fact. IBM built it and then gave it away to a non-profit so anyone can use it for free.