AI Fairness 360 (AIF360)
Last reviewed
Apr 30, 2026
Sources
30 citations
Review status
Source-backed
Revision
v1 · 3,997 words
AI Fairness 360, abbreviated AIF360, is an open-source Python and R toolkit released by IBM Research in September 2018 that provides a comprehensive set of fairness metrics, bias-mitigation algorithms, and explainers for machine-learning models and the datasets they are trained on. It was the first widely adopted attempt to consolidate the academic literature on algorithmic fairness into a single library with a uniform API, and it has since become a standard reference implementation for both researchers comparing methods and practitioners auditing production systems. The project was donated by IBM to the Linux Foundation AI & Data Foundation (LF AI & Data) in June 2020 and now lives under the Trusted-AI GitHub organization at Trusted-AI/AIF360, alongside sister projects such as AI Explainability 360 (AIX360) and the Adversarial Robustness Toolbox (ART).
The toolkit was introduced in the paper "AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias" by Rachel K. E. Bellamy and 17 co-authors (Bellamy et al., arXiv:1810.01943, October 2018), later expanded for the IBM Journal of Research and Development in 2019. The first release shipped with more than 70 fairness metrics and 9 bias-mitigation algorithms, and the library has grown significantly since, with regular minor releases adding datasets, intersectional metrics, and tighter scikit-learn compatibility.
The origins of AIF360 trace to a summer 2018 collaboration between IBM India Research Lab and the IBM T.J. Watson Research Center. Concerns about bias in high-stakes machine-learning applications such as credit scoring, hiring, and recidivism prediction had moved from academic conferences (FAccT, then known as FAT*) into mainstream press, especially after ProPublica's May 2016 investigation into the COMPAS recidivism risk score. IBM's response was to package the most influential fairness papers from the previous decade into a single Python library that any data scientist could pip install.
The public launch happened on 19 September 2018 with a blog post on the IBM Research site by Aleksandra Mojsilovic, the arXiv preprint by Bellamy et al., and the simultaneous release of the GitHub repository (originally IBM/AIF360). The launch included an interactive web demo, three end-to-end Jupyter tutorials covering credit scoring, medical-expenditure prediction, and facial gender classification, and an extensive documentation site at aif360.readthedocs.io.
In June 2020 IBM contributed AIF360 to the Linux Foundation AI & Data Foundation as an incubation project, alongside AIX360 (AI Explainability 360) and ART (Adversarial Robustness Toolbox). At that point the GitHub repository was moved from IBM/AIF360 to Trusted-AI/AIF360, signalling that the project was no longer governed solely by IBM. Versions released after the donation continued to add fairness algorithms from the wider research community, including Agarwal et al.'s reductions approach (2018) and Celis et al.'s meta fair classifier (2019). The 0.5 series introduced an R interface, and later 0.6.x releases added sample distortion metrics, MDSS bias subset scanning, and deterministic re-ranking for fair recommendation.
| Year | Milestone |
|---|---|
| 2018 (Sept) | Open-sourced by IBM Research; arXiv preprint 1810.01943 published |
| 2019 | Expanded paper appears in IBM Journal of Research and Development vol. 63, no. 4/5 |
| 2019 | R bindings (aif360-r) released |
| 2020 (June) | Project contributed to LF AI & Data; repo moved to Trusted-AI/AIF360 |
| 2021-2022 | scikit-learn-compatible aif360.sklearn API stabilised |
| 2023-2024 | Additional datasets, MDSS bias scanning, intersectional fairness extensions |
The Bellamy et al. paper states four explicit goals for the toolkit. First, to make state-of-the-art fairness algorithms accessible to data scientists who are not specialists in the fairness literature. Second, to provide a common API so that different fairness metrics and mitigation algorithms can be compared on equal footing using the same datasets. Third, to bridge the gap between research and production, in particular to give engineers a way to retrofit fairness checks onto existing scikit-learn pipelines. Fourth, an educational mission, supported by the interactive web demo, which walks newcomers through the conceptual differences between group fairness and individual fairness using concrete examples.
A recurring design choice across the codebase is the preference for fit/transform/predict semantics borrowed from scikit-learn. Mitigation algorithms expose fit(dataset) and transform(dataset) for pre- and post-processing methods, and fit(dataset) and predict(dataset) for in-processing methods. This makes it straightforward to drop AIF360 components into existing modelling pipelines without rewriting them.
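The fit/transform shape can be pictured with a small sketch. The class below is a hypothetical stand-in, not AIF360's own code: the name `ToyReweigher`, the tuple-based record layout, and the sample data are all assumptions made for illustration. It follows the reweighing idea of Kamiran & Calders (2012), in which each (group, label) cell is weighted by P(group) · P(label) / P(group, label).

```python
from collections import Counter


class ToyReweigher:
    """Hypothetical pre-processor with scikit-learn-style fit/transform.

    Records are (group, label) tuples; this layout is an assumption
    made for the sketch, not AIF360's dataset format.
    """

    def fit(self, records):
        n = len(records)
        group_counts = Counter(g for g, _ in records)
        label_counts = Counter(y for _, y in records)
        cell_counts = Counter(records)
        # Weight = cell probability expected under independence,
        # divided by the observed cell probability.
        self.weights_ = {
            (g, y): (group_counts[g] / n) * (label_counts[y] / n) / (c / n)
            for (g, y), c in cell_counts.items()
        }
        return self

    def transform(self, records):
        # One instance weight per record; the records themselves are untouched.
        return [self.weights_[r] for r in records]


def weighted_positive_rate(records, weights, group):
    """Weighted share of favourable labels (label 1) within one group."""
    num = sum(w for (g, y), w in zip(records, weights) if g == group and y == 1)
    den = sum(w for (g, _), w in zip(records, weights) if g == group)
    return num / den


# Toy data: group "a" receives the favourable label more often than "b".
train = [("a", 1)] * 6 + [("a", 0)] * 2 + [("b", 1)] * 2 + [("b", 0)] * 6
reweigher = ToyReweigher().fit(train)
weights = reweigher.transform(train)
rate_a = weighted_positive_rate(train, weights, "a")
rate_b = weighted_positive_rate(train, weights, "b")
```

On this toy data the weighted favourable-label rate comes out equal for both groups, which is the property the weights are designed to produce. Separating `fit` from `transform` means the weights learned on training data can be inspected or reused without recomputing the counts.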
AIF360 is organised around a small set of abstractions that the rest of the library composes on top of.
At the bottom of the stack is StructuredDataset, which wraps a pandas DataFrame together with metadata describing protected attributes (such as race, sex, or age), favourable and unfavourable labels, and privileged and unprivileged groups. BinaryLabelDataset is the most common subclass and is used for binary-classification fairness, which covers most of the published literature. RegressionDataset exists for continuous targets but supports a smaller set of algorithms. The dataset object is the unit on which both metrics and mitigation algorithms operate, which avoids the awkward situation of metrics taking raw arrays without knowing which column is the protected attribute.
Three metric classes cover most use cases. BinaryLabelDatasetMetric evaluates fairness properties of a single labelled dataset, for instance whether the base rate of the positive label differs across protected groups. ClassificationMetric takes two BinaryLabelDataset objects, one with ground truth and one with model predictions, and computes metrics like equalised odds, equal opportunity, and predictive parity. SampleDistortionMetric measures how much a pre-processing transformation has altered individual records, which matters because many fairness interventions trade off group fairness against individual fairness.
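The idea of a dataset object that carries its own fairness metadata can be sketched in a few lines. `ToyBinaryLabelDataset`, its field names, and `base_rate_difference` are hypothetical and far simpler than AIF360's classes; the sketch only illustrates why a metric needs no extra arguments once the protected attribute travels with the data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyBinaryLabelDataset:
    """Hypothetical, simplified stand-in for a labelled fairness dataset."""
    labels: List[int]          # observed (or predicted) labels
    protected: List[str]       # protected-attribute value per record
    privileged_value: str      # which value marks the privileged group
    favourable_label: int = 1

    def base_rate(self, privileged: bool) -> float:
        """Share of favourable labels within one group."""
        rows = [
            y for y, a in zip(self.labels, self.protected)
            if (a == self.privileged_value) == privileged
        ]
        return sum(y == self.favourable_label for y in rows) / len(rows)


def base_rate_difference(ds: ToyBinaryLabelDataset) -> float:
    """Dataset-level metric: unprivileged minus privileged base rate."""
    return ds.base_rate(privileged=False) - ds.base_rate(privileged=True)


# Toy data: three of four "m" records and one of four "f" records are favourable.
data = ToyBinaryLabelDataset(
    labels=[1, 1, 1, 0, 1, 0, 0, 0],
    protected=["m", "m", "m", "m", "f", "f", "f", "f"],
    privileged_value="m",
)
gap = base_rate_difference(data)
```

A classification-level metric would follow the same pattern but take two such objects, one holding ground truth and one holding predictions.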
Mitigation algorithms are split into three subpackages corresponding to where they intervene in the pipeline.
- aif360.algorithms.preprocessing modifies the training data before any model sees it.
- aif360.algorithms.inprocessing modifies the learning algorithm itself, usually by adding a fairness term to the loss function or by training a classifier and adversary jointly.
- aif360.algorithms.postprocessing modifies the predictions of an already-trained classifier.

This taxonomy was popularised by Friedler et al. (2019) and is the same one used by the Fairlearn library.
Each metric can be wrapped in an explainer that produces human-readable text describing what the metric value means and which subgroup is disadvantaged. The toolkit also exposes JSON-formatted explanations for integration with downstream dashboards.
The aif360.sklearn namespace, added after the LF AI donation, provides scikit-learn-compatible classes that operate directly on pandas DataFrames with MultiIndex rows where the protected attribute is part of the index. This makes it easier to chain AIF360 components with sklearn.pipeline.Pipeline and GridSearchCV for hyperparameter tuning.
The initial 2018 release exposed over 70 metrics, many of which are simple counting differences computed from the confusion matrix sliced by protected group. The most widely used ones are summarised below.
| Metric | What it measures | Origin |
|---|---|---|
| Disparate impact ratio | Ratio of positive-prediction rate for unprivileged group over privileged group; below 0.8 fails the four-fifths rule | US EEOC Uniform Guidelines on Employee Selection Procedures, 1978 |
| Statistical parity difference | Difference in positive-prediction rates between groups | Calders & Verwer 2010; Dwork et al. 2012 |
| Equal opportunity difference | Difference in true-positive rates between groups | Hardt, Price, Srebro 2016 |
| Average odds difference | Mean of TPR and FPR differences between groups | Hardt, Price, Srebro 2016 |
| Equalised odds | Joint constraint of equal TPR and equal FPR | Hardt, Price, Srebro 2016 |
| Theil index | Entropy-based measure of inequality across individual benefits | Theil 1967; adapted by Speicher et al. 2018 |
| Generalised entropy index | Family that includes Theil index as a special case | Speicher et al. 2018 |
| Consistency | How similarly the model treats nearest-neighbour individuals | Zemel et al. 2013 |
| Differential fairness | Smoothed empirical bound across all subgroups defined by protected attributes | Foulds et al. 2020 |
| Smoothed empirical differential fairness | Bayesian-smoothed version of differential fairness | Foulds et al. 2020 |
| Between-group / within-group generalised entropy | Decomposition of total inequality | Speicher et al. 2018 |
The disparate impact metric deserves a note. The four-fifths rule is a US-specific legal heuristic adopted by the Equal Employment Opportunity Commission in 1978 to flag employment selection procedures with adverse impact on protected classes. AIF360 ships it as a metric because it is widely cited, but the documentation explicitly cautions users that the rule has no legal force outside the United States, and that even within the US it is one of several signals rather than a binary pass/fail test.
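The counting metrics at the top of the table, including the four-fifths check, reduce to a handful of per-group rates. The following is a minimal plain-Python sketch, not AIF360's API: the function names are hypothetical, labels are assumed to be 0/1 with 1 favourable, and differences are taken as unprivileged minus privileged.

```python
def rates(y_true, y_pred, groups, group):
    """Selection rate, TPR and FPR for one group (labels are 0/1)."""
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    tp = sum(1 for t, p in rows if t == 1 and p == 1)
    fp = sum(1 for t, p in rows if t == 0 and p == 1)
    pos = sum(1 for t, _ in rows if t == 1)
    neg = len(rows) - pos
    return {
        "selection": (tp + fp) / len(rows),
        "tpr": tp / pos,
        "fpr": fp / neg,
    }


def group_fairness_report(y_true, y_pred, groups, unprivileged, privileged):
    """Group metrics, each computed as unprivileged relative to privileged."""
    u = rates(y_true, y_pred, groups, unprivileged)
    p = rates(y_true, y_pred, groups, privileged)
    return {
        "statistical_parity_difference": u["selection"] - p["selection"],
        "disparate_impact": u["selection"] / p["selection"],
        "equal_opportunity_difference": u["tpr"] - p["tpr"],
        "average_odds_difference": 0.5
        * ((u["fpr"] - p["fpr"]) + (u["tpr"] - p["tpr"])),
    }


# Toy data: five privileged ("p") and five unprivileged ("u") records.
groups = ["p"] * 5 + ["u"] * 5
y_true = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

report = group_fairness_report(y_true, y_pred, groups, "u", "p")
# Four-fifths heuristic: flag when the ratio falls below 0.8.
passes_four_fifths = report["disparate_impact"] >= 0.8
```

Here the unprivileged selection rate is 0.4 against 0.8 for the privileged group, so the disparate impact ratio is 0.5 and the toy classifier is flagged by the four-fifths heuristic.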
The combination of pre-, in-, and post-processing algorithms is the part of AIF360 that has the largest effect on practical workflows. The following table lists the algorithms shipped with current versions, grouped by category. Each algorithm is a faithful implementation of a published method, with the original paper as the canonical reference.
| Category | Algorithm | Authors and year | One-line description |
|---|---|---|---|
| Pre-processing | Reweighing | Kamiran & Calders 2012 | Assigns weights to (group, label) cells so the joint distribution becomes statistically independent of the protected attribute |
| Pre-processing | Disparate impact remover | Feldman, Friedler, Moeller, Scheidegger, Venkatasubramanian 2015 | Edits feature values to remove disparate impact while preserving within-group rank ordering |
| Pre-processing | Learning fair representations (LFR) | Zemel, Wu, Swersky, Pitassi, Dwork 2013 | Learns a latent representation that obfuscates the protected attribute while preserving utility |
| Pre-processing | Optimised pre-processing | Calmon, Wei, Vinzamuri, Ramamurthy, Varshney 2017 | Convex optimisation that jointly bounds discrimination, individual distortion, and utility loss |
| In-processing | Adversarial debiasing | Zhang, Lemoine, Mitchell 2018 | Trains a classifier and an adversary jointly so the adversary cannot recover the protected attribute from predictions |
| In-processing | Prejudice remover | Kamishima, Akaho, Asoh, Sakuma 2012 | Adds a prejudice index regulariser to logistic regression |
| In-processing | Meta fair classifier | Celis, Huang, Keswani, Vishnoi 2019 | Takes a fairness metric as input and returns a classifier optimal for that metric |
| In-processing | Exponentiated gradient reduction | Agarwal, Beygelzimer, Dudik, Langford, Wallach 2018 | Reduces fair classification to a sequence of cost-sensitive classification problems |
| In-processing | Grid search reduction | Agarwal, Dudik, Wu 2019 | Grid-search variant of the reductions approach for binary classification and regression |
| In-processing | GerryFair classifier | Kearns, Neel, Roth, Wu 2018 | Audits and mitigates fairness gerrymandering across rich subgroups |
| In-processing | ART classifier wrapper | Trusted-AI 2020 | Wraps an Adversarial Robustness Toolbox classifier so it can be measured by AIF360 |
| Post-processing | Equalised odds post-processing | Hardt, Price, Srebro 2016 | Solves a linear program to randomise predictions until TPR and FPR match across groups |
| Post-processing | Calibrated equalised odds | Pleiss, Raghavan, Wu, Kleinberg, Weinberger 2017 | Relaxes equalised odds to preserve calibration on classifier scores |
| Post-processing | Reject option classification | Kamiran, Karim, Zhang 2012 | Assigns favourable outcomes to the unprivileged group and unfavourable outcomes to the privileged group inside a confidence band around the decision boundary |
| Post-processing | Deterministic re-ranking | Yang & Stoyanovich 2017 (variant) | Rebalances ranked candidate lists to satisfy a fairness constraint at every prefix |
Several algorithms carry heavier dependencies. Adversarial debiasing requires TensorFlow because the bundled implementation is built on it, and optimised pre-processing requires CVXPY because it solves a convex program. The library ships these as optional extras (pip install 'aif360[all]' pulls in all of them) so that users who do not need those algorithms can skip the heavier installs.
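Not every method in the table needs extra packages to understand. Reject option classification, for example, can be illustrated in plain Python. The function below is a simplified sketch of the rule described in the table, not AIF360's implementation: `reject_option_predict`, the fixed `threshold`, and the `margin` parameter are assumptions, and a real deployment would tune both values rather than hard-code them.

```python
def reject_option_predict(scores, groups, privileged, threshold=0.5, margin=0.1):
    """Toy reject-option rule on favourable-class scores in [0, 1]."""
    predictions = []
    for score, group in zip(scores, groups):
        if abs(score - threshold) <= margin:
            # Inside the uncertainty band: decide by group membership,
            # favourable (1) for unprivileged, unfavourable (0) for privileged.
            predictions.append(0 if group == privileged else 1)
        else:
            # Outside the band: ordinary thresholding.
            predictions.append(1 if score > threshold else 0)
    return predictions


# Toy scores: four privileged ("p") and four unprivileged ("u") records.
scores = [0.95, 0.55, 0.45, 0.20, 0.58, 0.42, 0.80, 0.10]
groups = ["p", "p", "p", "p", "u", "u", "u", "u"]

plain = [1 if s > 0.5 else 0 for s in scores]
adjusted = reject_option_predict(scores, groups, privileged="p")
```

Only the four records whose scores fall inside the band can change relative to plain thresholding; confident predictions at either end are left alone, which is what limits the accuracy cost of the method.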
AIF360 ships several benchmark datasets that the fairness literature has converged on, mostly because they are public and have natural protected attributes. These include the following.
| Dataset | Domain | Protected attributes | Notes |
|---|---|---|---|
| Adult (Census Income) | US Census 1994 | Sex, race | Predicting whether income exceeds 50,000 USD; the most cited fairness benchmark |
| German Credit | Banking | Sex, age | UCI repository; predicting credit risk |
| COMPAS | Criminal justice | Race, sex | The dataset behind the ProPublica 2016 investigation; recidivism risk in Broward County, FL |
| Bank Marketing | Marketing | Age | Portuguese banking institution telemarketing dataset |
| MEPS (19, 20, 21) | Healthcare | Race | Medical Expenditure Panel Survey; predicting high utilisation of care |
| Law School GPA | Education | Race, sex | Wightman 1998 LSAC; first-year GPA; supports regression fairness |
The datasets ship with helper code to download the raw files (because most cannot be redistributed under permissive licences) and to encode protected attributes consistently with the fairness literature.
Because AIF360's mitigation algorithms expose either scikit-learn or transform-style APIs, integration with the broader Python ML ecosystem is straightforward. A few specific integrations are worth calling out.
- The aif360.sklearn module is designed to plug into scikit-learn pipelines.

A handful of open-source fairness toolkits emerged in the same 2018-2020 window, each with slightly different priorities. The table below summarises the main options.
| Toolkit | Organisation | Language | Licence | Primary focus | Mitigation algorithms | Notable feature |
|---|---|---|---|---|---|---|
| AIF360 | IBM, now LF AI & Data | Python and R | Apache 2.0 | End-to-end metrics and mitigation | 13+ across pre, in, post | Largest library of mitigation algorithms |
| Fairlearn | Microsoft | Python | MIT | Reductions approach plus dashboard | Several pre and post; reductions for in-processing | Dashboard widget for Jupyter |
| Aequitas | University of Chicago | Python and CLI | MIT | Audit reports for policymakers | None; audit only | Web-app and CLI interfaces |
| TF Fairness Indicators | Google | Python (TFX) | Apache 2.0 | Slice-based metrics for TensorFlow models | None | Tight integration with TFX pipelines |
| What-If Tool | Google PAIR | Python (TensorBoard) | Apache 2.0 | Interactive what-if analysis | None; counterfactual exploration | Visual interactive tool |
| Themis-ML | N. Bantilan 2017 | Python | MIT | Early academic toolkit | A handful of preprocessing methods | Predates AIF360 |
In practice many teams use more than one. A common workflow is to audit with Aequitas because it produces a report that lawyers and policy people can read, mitigate with AIF360 because it has the widest selection of methods, and visualise with the What-If Tool when a senior stakeholder wants to interrogate individual records.
The AIF360 documentation is unusually candid about the limits of what a software library can do for algorithmic fairness, and the academic literature has been clear that no toolkit can resolve the fundamental tensions in the field.
The impossibility theorems. Chouldechova (2017) and Kleinberg, Mullainathan, & Raghavan (2017) independently showed that several intuitive fairness criteria, including calibration, predictive parity, and equalised false-positive rates, cannot all hold simultaneously when base rates differ across groups. AIF360 implements the metrics but cannot square this circle: a user has to pick which fairness criterion they care about, and that choice is normative, not technical.
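The tension can be checked with a few lines of arithmetic. The sketch below uses only the definitions of the rates involved: with prevalence p, PPV = TPR·p / (TPR·p + FPR·(1 − p)), which can be solved for FPR. The function name and the numbers are illustrative assumptions, not taken from either paper.

```python
def implied_fpr(prevalence, ppv, tpr):
    """False-positive rate forced by a group's base rate, PPV and TPR.

    Obtained by solving PPV = TPR*p / (TPR*p + FPR*(1 - p)) for FPR.
    """
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * tpr


# Two groups with identical PPV and TPR but different base rates (toy values).
fpr_a = implied_fpr(prevalence=0.5, ppv=0.8, tpr=0.6)
fpr_b = implied_fpr(prevalence=0.2, ppv=0.8, tpr=0.6)
fpr_gap = fpr_a - fpr_b
```

With these toy values the first group's false-positive rate must be 0.15 and the second's 0.0375. Holding predictive parity and equal true-positive rates fixed leaves no freedom to equalise false-positive rates unless the base rates match.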
Group fairness versus individual fairness. Most AIF360 metrics are group fairness metrics. They say nothing about whether two similar individuals from different groups are treated similarly, which is the individual fairness criterion of Dwork et al. (2012). Consistency and Theil index partially address this, but the toolkit's centre of gravity is group fairness.
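For a sense of how the entropy-based measures look at individuals, the sketch below computes the generalised entropy index over per-record benefits, using the benefit definition b_i = ŷ_i − y_i + 1 from Speicher et al. (2018). The function name and the handling of zero benefits are assumptions of this sketch, not a description of AIF360's code.

```python
import math


def generalised_entropy_index(y_true, y_pred, alpha=2):
    """Inequality of individual benefits b_i = y_pred_i - y_true_i + 1."""
    benefits = [p - t + 1 for t, p in zip(y_true, y_pred)]
    mean = sum(benefits) / len(benefits)
    ratios = [b / mean for b in benefits]
    n = len(ratios)
    if alpha == 1:
        # Theil index; the term for a zero benefit is taken as 0.
        return sum(r * math.log(r) for r in ratios if r > 0) / n
    if alpha == 0:
        # Mean log deviation; undefined when any benefit is zero.
        return -sum(math.log(r) for r in ratios) / n
    return sum(r ** alpha - 1 for r in ratios) / (n * alpha * (alpha - 1))


# Toy labels: one false negative (benefit 0) and one false positive (benefit 2).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]

theil = generalised_entropy_index(y_true, y_pred, alpha=1)
ge2 = generalised_entropy_index(y_true, y_pred, alpha=2)
perfect = generalised_entropy_index(y_true, y_true, alpha=2)
```

The index is zero when every record receives the same benefit, as with perfect predictions, and grows as errors concentrate benefit on some individuals. No protected attribute enters the calculation, which is why it complements rather than replaces the group metrics.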
Binary protected attributes. The bulk of the implementations assume a single binary protected attribute. Intersectional fairness (Crenshaw 1989; Buolamwini & Gebru 2018) is supported through the differential fairness metrics added later, but in-processing algorithms generally do not natively handle multiple intersecting protected attributes. The GerryFair classifier is the main exception.
Need for labelled protected attributes. Most mitigation algorithms require the protected attribute to be available at training time. In many production settings this is legally or ethically constrained (the GDPR places restrictions on processing of "special category" personal data), and "fairness through unawareness" is known to be insufficient. AIF360 cannot solve this; it assumes the data is already there.
Performance loss is not always quantified. Most published fairness mitigations reduce model accuracy or revenue. The toolkit reports both the fairness metric and standard accuracy after mitigation, but it does not always make the Pareto trade-off explicit, and the optimal point on that trade-off is application-specific.
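One way to make the trade-off explicit is to evaluate several candidate models and keep only those that are not dominated on both axes. The helper below is a generic sketch, not part of AIF360; the candidate names and scores are invented purely for illustration.

```python
def pareto_front(candidates):
    """Keep candidates not dominated on (higher accuracy, lower unfairness).

    Each candidate is (name, accuracy, unfairness), where unfairness is
    any non-negative gap such as |statistical parity difference|.
    """
    front = []
    for name, acc, gap in candidates:
        dominated = any(
            (a >= acc and g <= gap) and (a > acc or g < gap)
            for _, a, g in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical evaluation results, invented for this example only.
results = [
    ("unmitigated", 0.85, 0.20),
    ("option_a", 0.83, 0.05),
    ("option_b", 0.80, 0.08),
    ("option_c", 0.84, 0.10),
]
front = pareto_front(results)
```

In this invented example `option_b` drops out because `option_a` is both more accurate and fairer. Choosing among the remaining candidates is the application-specific decision that no library can make.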
Cultural and regulatory context. The four-fifths rule is from the US EEOC and has no force in the EU, the UK, India, or most other jurisdictions. AIF360 ships it as a metric without enforcing this caveat in code. Users who deploy AIF360 outside the US should not assume that passing the four-fifths rule means they are compliant with local anti-discrimination law.
It does not replace bias auditing by humans. Friedler et al. (2019) and Holstein et al. (2019) found that the social and organisational context in which a model is deployed often dominates the technical fairness intervention. AIF360 is a useful tool inside that process, not a substitute for it.
AIF360 has been cited by more than 2,500 academic papers since 2018 and is widely used as a teaching tool in machine-learning ethics courses, including Princeton's COS 597E, Stanford's CS 281, and Harvard's CS 282R. The toolkit is the reference implementation for several follow-up fairness papers, which makes citation counts somewhat misleading: a paper that benchmarks against AIF360 cites it even if the paper proposes a competing method.
Industry usage is harder to measure because compliance pipelines are rarely public, but several documented examples exist. IBM uses AIF360 inside its watsonx Governance product as part of the Trusted AI lifecycle. Several large US banks have referenced AIF360 in regulatory filings about model risk management. The dataset and mitigation choices in some published audit reports, including for the COMPAS dataset, mirror the AIF360 pipeline.
The project is also part of IBM's broader Trusted AI suite, which together with AIF360 includes AI Explainability 360 (AIX360) and the Adversarial Robustness Toolbox (ART).
Releases in 2023 and 2024 focused on tighter integration with modern ML platforms. Newer datasets, including additional MEPS years and extensions to the law school dataset, have been added for benchmarking. The MDSS bias subset scanning method, contributed by researchers at Carnegie Mellon and the University of Edinburgh, was added to support intersectional bias detection. An expanded set of post-processing methods now includes deterministic re-ranking, which is useful for fair recommender systems. Continuous-integration support has been broadened to Python 3.11 across macOS, Ubuntu, and Windows.
Community governance under LF AI & Data has settled into a pattern of regular maintainer meetings, an open Slack workspace, and a clear contribution guide. Pull requests now come from a wider set of organisations than the original IBM-only contributor list, including academic groups at Carnegie Mellon, Edinburgh, and the University of Chicago, and corporate contributors from companies that use AIF360 internally.