See also: Bias-Variance Tradeoff, Fairness (Machine Learning), Neural Network, Activation Function, Weight
Bias is a term with several distinct meanings in artificial intelligence and machine learning. It can refer to a learnable parameter in a neural network, a statistical property of an estimator, or systematic errors and unfair outcomes produced by AI systems. Understanding these different senses of the word is essential for anyone working in the field, because confusing them leads to muddled thinking about model design, evaluation, and ethics.
In a neural network, bias is an additional scalar value added to the weighted sum of inputs in each neuron before the activation function is applied. It gives the network the ability to adjust a neuron's output independently of its inputs.
In this sense, the bias is a learnable parameter, typically denoted b or w0.
More formally, each neuron receives inputs from the previous layer, which are multiplied by a weight and summed up. This weighted sum is then passed through an activation function to produce the output of the neuron. The bias term provides an additional constant input to the neuron that can shift the output of the activation function in a certain direction.
For a neuron with input vector x = (x1, x2, ..., xn) and weight vector w = (w1, w2, ..., wn), the pre-activation value z is computed as:
z = w1x1 + w2x2 + ... + wnxn + b
Or in vector notation:
z = w · x + b
The activation function f is then applied to produce the neuron's output: y = f(z).
Without a bias term, the pre-activation is zero whenever all inputs are zero, so the neuron's output is fixed at f(0). Adding a bias lets the network shift the activation function along its input axis, giving the model more flexibility. Consider a neuron that needs to produce an output other than f(0) when all inputs are zero: without a bias it simply cannot, because the weighted sum is zero. The bias term removes this limitation.
Bias values are learnable parameters, updated during training through backpropagation and gradient descent, just like weights. They are typically initialized to zero or small random values. During training, the bias is adjusted to minimize the loss function, helping the network model complex relationships between inputs and outputs.
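As a concrete illustration, the computation z = w · x + b and the effect of the bias can be sketched in a few lines of NumPy. The weights, inputs, and sigmoid activation below are illustrative choices, not values from any particular network:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative weights, bias, and input for a single neuron
w = np.array([0.4, -0.2, 0.1])   # one weight per input
b = 0.5                          # the bias term
x = np.array([0.0, 0.0, 0.0])    # all inputs zero

z = np.dot(w, x) + b             # pre-activation: w · x + b
y = sigmoid(z)                   # neuron output

# Without the bias, z would be 0 and the output fixed at sigmoid(0) = 0.5;
# with b = 0.5, the output shifts to sigmoid(0.5) ≈ 0.62.
print(z, y)
```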
| Layer Type | Bias Behavior | Notes |
|---|---|---|
| Fully connected (Dense) | One bias per neuron | Standard usage; each neuron has its own bias scalar |
| Convolutional | One bias per filter | Shared across all spatial positions of the feature map |
| Batch normalization | Bias often disabled | The batch norm layer includes its own shift parameter (beta), making a separate bias redundant |
| Recurrent (RNN/LSTM) | Bias in gate computations | Applied within each gate equation in LSTM and GRU cells |
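These conventions map directly onto how layers are declared in a framework such as PyTorch. The following sketch (layer sizes are arbitrary) shows the bias enabled by default in a dense layer and disabled in a convolution that is immediately followed by batch normalization:

```python
import torch.nn as nn

# Dense layer: one learnable bias per output neuron (bias=True is the default)
dense = nn.Linear(in_features=128, out_features=64)

# Convolution followed by batch norm: a separate bias is redundant because
# BatchNorm2d learns its own shift parameter (beta), so it is usually disabled
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, bias=False)
bn = nn.BatchNorm2d(num_features=16)

print(dense.bias.shape)   # torch.Size([64]) -- one bias value per neuron
print(conv.bias)          # None -- bias disabled
```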
In statistics and machine learning theory, bias refers to the systematic error introduced by approximating a real-world problem with a simplified model. This is a fundamentally different concept from the neural network parameter described above.
The bias of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. Formally, for an estimator θ̂ of a parameter θ:
Bias(θ̂) = E[θ̂] − θ
An estimator with zero bias is called unbiased. For example, the sample mean is an unbiased estimator of the population mean. In contrast, some estimators are intentionally biased because they offer lower variance, which can lead to better overall prediction accuracy.
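A quick simulation makes the definition concrete: averaging many estimates from repeated samples approximates E[θ̂], and comparing that average with the true parameter reveals the bias. The sketch below contrasts the unbiased sample mean with the biased plug-in variance estimator (which divides by n rather than n − 1); the distribution and sample size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_var = 5.0, 4.0      # parameters of the data-generating distribution
n, trials = 20, 100_000             # small samples, many repetitions

mean_estimates, var_estimates = [], []
for _ in range(trials):
    sample = rng.normal(true_mean, np.sqrt(true_var), size=n)
    mean_estimates.append(sample.mean())
    var_estimates.append(((sample - sample.mean()) ** 2).mean())  # divides by n

# Bias(theta_hat) = E[theta_hat] - theta, approximated by averaging over trials
print("bias of sample mean:      ", np.mean(mean_estimates) - true_mean)   # ~ 0
print("bias of plug-in variance: ", np.mean(var_estimates) - true_var)     # ~ -true_var/n = -0.2
```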
The bias-variance tradeoff is one of the central concepts in supervised learning. The total expected prediction error of a model can be decomposed into three components:
Expected Error = Bias^2 + Variance + Irreducible Noise
Simple models (like linear regression) typically have high bias and low variance. Complex models (like deep neural networks or large decision trees) tend to have low bias but high variance. The goal is to find the right level of model complexity that minimizes total error.
| Model Complexity | Bias | Variance | Consequence |
|---|---|---|---|
| Too simple (underfitting) | High | Low | Misses real patterns in the data |
| Optimal | Balanced | Balanced | Best generalization to unseen data |
| Too complex (overfitting) | Low | High | Memorizes noise in training data |
Techniques for managing the tradeoff include cross-validation, regularization (L1/L2 penalties), ensemble methods, and early stopping.
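The decomposition can also be estimated empirically by repeatedly drawing training sets, fitting a model of a given complexity, and averaging its predictions at a fixed test point. The sketch below (the true function, noise level, and polynomial degrees are arbitrary choices) compares an underfitting and an overfitting polynomial:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)          # "true" function (illustrative)
noise_sd, n, trials = 0.3, 30, 2_000
x_test = 0.25                                 # fixed test point

for degree in (1, 9):                         # too simple vs. too complex
    preds = []
    for _ in range(trials):
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, noise_sd, n)
        coeffs = np.polyfit(x, y, degree)     # least-squares polynomial fit
        preds.append(np.polyval(coeffs, x_test))
    preds = np.array(preds)
    bias_sq = (preds.mean() - f(x_test)) ** 2
    variance = preds.var()
    print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
# The degree-1 fit shows high bias^2 and low variance; the degree-9 fit the reverse.
```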
The third and most publicly discussed sense of "bias" in AI refers to systematic and unfair discrimination in the outputs of machine learning systems. This type of bias can cause real harm to individuals and communities, and it has become a major focus of AI ethics research since the mid-2010s.
Bias can enter an AI system at every stage of its lifecycle. Researchers have identified several distinct categories:
Historical bias arises when the training data reflects existing inequalities in society. Even if the data is collected perfectly, it encodes patterns of past discrimination. For example, if a hiring model is trained on a decade of resumes from a male-dominated industry, it will learn to favor male candidates, not because men are inherently better workers, but because the historical data skews that way.
Representation bias occurs when certain groups are underrepresented in the training data. A facial recognition system trained primarily on lighter-skinned faces will perform poorly on darker-skinned faces, not due to any inherent technical limitation but simply because the model has not seen enough examples.
Measurement bias results from how features are chosen and measured. Proxy variables can inadvertently encode protected attributes. For instance, using zip code as a feature in a lending model may serve as a proxy for race due to residential segregation.
Aggregation bias occurs when a single model is used for groups with different underlying distributions. A medical diagnostic model trained on data from one population may fail when applied to another population with different baseline rates of disease.
Evaluation bias happens when the benchmark datasets used to test a model do not represent the full diversity of real-world use cases. If the test set is biased, a model can appear to perform well while actually failing for underrepresented groups.
Deployment bias arises when a model is used in a context different from the one it was designed for. A system built to recommend job postings might be repurposed for screening candidates, introducing biases that were not present in the original use case.
Automation bias is the human tendency to trust automated outputs over contradictory evidence from non-automated sources. When users accept AI recommendations without critical evaluation, systematic errors in the model get amplified through the decision-making pipeline.
Several high-profile cases have drawn public attention to the real-world consequences of biased AI systems.
In 2016, ProPublica published an investigation of COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a risk assessment tool developed by Northpointe (now Equivant) and used in U.S. courts to predict recidivism. Analyzing more than 10,000 criminal defendants in Broward County, Florida, ProPublica found that Black defendants were almost twice as likely as white defendants to be incorrectly labeled as high risk for reoffending, while white defendants were more likely to be incorrectly flagged as low risk. The tool's overall accuracy was roughly 61 percent, but the error distribution was starkly unequal across racial groups. Even after controlling for criminal history, recidivism, age, and gender, Black defendants were 77 percent more likely to be pegged as higher risk for future violent crime. Northpointe disputed ProPublica's analysis, and the resulting debate helped crystallize the mathematical impossibility of satisfying multiple fairness criteria simultaneously when base rates differ across groups.
In 2018, Reuters reported that Amazon had built an AI recruiting tool trained on ten years of resumes submitted to the company. Because the tech industry is predominantly male, the training data was overwhelmingly from male applicants. The system learned to penalize resumes that contained the word "women's" (as in "women's chess club captain") and downgraded graduates of certain all-women's colleges. Amazon attempted to correct the bias but ultimately scrapped the tool after concluding it could not be made reliably gender-neutral. The case illustrated how historical bias in training data can produce discriminatory outcomes even without any intent to discriminate, and it highlighted the legal risks under Title VII of the Civil Rights Act, which prohibits disparate impact discrimination in employment.
The 2018 Gender Shades study by Joy Buolamwini and Timnit Gebru at MIT evaluated three commercial facial recognition systems for gender classification accuracy across intersections of gender and skin type. The researchers found that for lighter-skinned males, error rates were below 1 percent, while for darker-skinned females, error rates reached 34.7 percent. The study used the Fitzpatrick Skin Type classification system and a new balanced dataset called Pilot Parliaments Benchmark. In 2019, the U.S. National Institute of Standards and Technology (NIST) published a comprehensive evaluation of 189 facial recognition algorithms, finding that the majority exhibited demographic differentials: false positive rates for Asian and African American faces were 10 to 100 times higher than for Caucasian faces, depending on the algorithm. These findings accelerated regulatory action, with several U.S. cities banning or restricting government use of facial recognition technology.
Bias in AI is not limited to software systems. Pulse oximeters, medical devices that estimate blood oxygen saturation, have been shown to systematically overestimate oxygen levels in patients with darker skin tones. The underlying issue is that the devices were calibrated primarily on lighter-skinned individuals, and increased absorption of red light by melanin was not properly accounted for during development. During the COVID-19 pandemic, this bias drew renewed scrutiny because inaccurate oxygen readings could lead clinicians to undertreat patients of color. The U.S. Food and Drug Administration (FDA) has since signaled plans to update testing requirements for pulse oximeters to address racial disparities. In AI-assisted dermatology, a 2024 Northwestern University study found that while physician-AI partnerships improved overall diagnostic accuracy, accuracy gaps between patients with light and dark skin tones actually widened, demonstrating that AI can amplify existing disparities if not carefully designed.
The study of algorithmic fairness has produced formal mathematical definitions of what it means for a model's predictions to be "fair." These definitions are used to audit systems, design fairness constraints, and navigate regulatory requirements. However, as researchers have demonstrated, it is generally impossible to satisfy all fairness criteria simultaneously.
| Fairness Metric | Definition | Intuition |
|---|---|---|
| Demographic Parity | P(Y_hat = 1 \| A = 0) = P(Y_hat = 1 \| A = 1) | The proportion receiving a positive prediction should be the same across groups |
| Equalized Odds | P(Y_hat = 1 \| Y = y, A = 0) = P(Y_hat = 1 \| Y = y, A = 1) for y in {0, 1} | True positive rate and false positive rate should be equal across groups |
| Equal Opportunity | P(Y_hat = 1 \| Y = 1, A = 0) = P(Y_hat = 1 \| Y = 1, A = 1) | True positive rate should be equal across groups (relaxation of equalized odds) |
| Predictive Parity | P(Y = 1 \| Y_hat = 1, A = 0) = P(Y = 1 \| Y_hat = 1, A = 1) | Among those predicted positive, the actual positive rate should be equal across groups |
| Calibration | P(Y = 1 \| S = s, A = 0) = P(Y = 1 \| S = s, A = 1) for all scores s | Among individuals assigned the same risk score, actual outcome rates should be equal across groups |
| Individual Fairness | Similar individuals should receive similar predictions | Requires a domain-specific similarity metric, which can be difficult to define |
Here, Y_hat denotes the model's prediction, Y the true label, A the protected attribute (e.g., race or gender), and S a continuous risk score.
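Given predictions, true labels, and a protected attribute, several of these metrics reduce to simple conditional rates. The following minimal sketch uses a tiny made-up dataset purely for illustration:

```python
import numpy as np

# Tiny illustrative arrays: model predictions, true labels, protected attribute
y_hat = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y     = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])
a     = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

def rates(group):
    sel = a == group
    positive_rate = y_hat[sel].mean()              # P(Y_hat=1 | A=group)
    tpr = y_hat[sel & (y == 1)].mean()             # P(Y_hat=1 | Y=1, A=group)
    fpr = y_hat[sel & (y == 0)].mean()             # P(Y_hat=1 | Y=0, A=group)
    ppv = y[sel & (y_hat == 1)].mean()             # P(Y=1 | Y_hat=1, A=group)
    return positive_rate, tpr, fpr, ppv

for g in (0, 1):
    pr, tpr_g, fpr_g, ppv_g = rates(g)
    print(f"group {g}: positive rate={pr:.2f}  TPR={tpr_g:.2f}  FPR={fpr_g:.2f}  PPV={ppv_g:.2f}")

# Demographic parity compares the positive rates, equalized odds compares TPR and FPR,
# equal opportunity compares only TPR, and predictive parity compares PPV across groups.
```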
In 2016, Kleinberg, Mullainathan, and Raghavan proved that when base rates (the actual frequency of positive outcomes) differ between groups, it is mathematically impossible to simultaneously satisfy calibration, balance for the positive class, and balance for the negative class, except in trivial cases. Chouldechova (2017) showed a related result: when base rates differ, a classifier cannot simultaneously achieve equal false positive rates and equal false negative rates across groups while maintaining calibration. These impossibility theorems mean that practitioners must make explicit choices about which fairness criteria to prioritize, informed by the specific context and consequences of the application.
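The tension can be seen with a short calculation. For a binary classifier, prevalence p, positive predictive value (PPV), false negative rate (FNR), and false positive rate (FPR) are linked by FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR). The sketch below (with made-up numbers) fixes PPV and FNR to be equal across two groups with different base rates and shows that their FPRs are then forced apart:

```python
def implied_fpr(p, ppv, fnr):
    """FPR implied by prevalence p, positive predictive value, and false negative rate."""
    return (p / (1 - p)) * ((1 - ppv) / ppv) * (1 - fnr)

ppv, fnr = 0.7, 0.2                          # equal PPV (predictive parity) and equal FNR
for group, p in (("A", 0.5), ("B", 0.3)):    # different base rates (illustrative)
    print(f"group {group}: base rate={p:.1f}  implied FPR={implied_fpr(p, ppv, fnr):.3f}")

# group A: FPR ≈ 0.343, group B: FPR ≈ 0.147 -- with unequal base rates,
# equalizing PPV and FNR forces the false positive rates to differ.
```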
Researchers have developed a range of techniques for reducing bias in AI systems. These are typically categorized by when in the machine learning pipeline they are applied.
Pre-processing approaches modify the training data before model training begins, for example by reweighting or resampling examples so that groups are balanced, or by transforming features to reduce their correlation with protected attributes.
In-processing methods modify the training algorithm itself, for example by adding fairness constraints or penalty terms to the loss function, or by training against an adversary that tries to predict the protected attribute from the model's internal representations.
Post-processing approaches adjust the model's outputs after training, for example by recalibrating scores or choosing group-specific decision thresholds to equalize error rates; a minimal sketch of threshold adjustment follows the comparison table below.
| Approach | Timing | Advantages | Limitations |
|---|---|---|---|
| Pre-processing | Before training | Model-agnostic; can be applied with any algorithm | May lose useful information; limited control over final model behavior |
| In-processing | During training | Directly optimizes for fairness; tight control over tradeoffs | Algorithm-specific; may require custom implementations |
| Post-processing | After training | Does not require retraining; easy to apply | Cannot fix deeply embedded biases; may reduce overall accuracy |
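As an example of the post-processing row, the following sketch (scores and labels are synthetic) picks a separate decision threshold for each group so that true positive rates approximately match, in the spirit of equalized-odds post-processing:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic risk scores: group 1's positives receive systematically lower scores
n = 5_000
group = rng.integers(0, 2, n)
label = rng.binomial(1, 0.4, n)
score = np.clip(0.5 * label + 0.1 * (1 - group) + rng.normal(0, 0.2, n), 0, 1)

def tpr(threshold, g):
    """True positive rate for group g at a given decision threshold."""
    sel = (group == g) & (label == 1)
    return (score[sel] >= threshold).mean()

target = 0.8   # desired true positive rate for both groups (illustrative)
candidates = np.linspace(0, 1, 501)

for g in (0, 1):
    # choose the threshold whose TPR is closest to the target for this group
    best = candidates[np.argmin([abs(tpr(t, g) - target) for t in candidates])]
    print(f"group {g}: threshold={best:.2f}  TPR={tpr(best, g):.3f}")
```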
Several benchmarks have been developed to evaluate bias in AI systems, particularly in large language models.
| Benchmark | Year | Focus | Method |
|---|---|---|---|
| WinoBias | 2018 | Gender bias in coreference resolution | 3,160 sentences with occupation-pronoun pairs; tests whether models rely on gender stereotypes to resolve pronouns |
| StereoSet | 2020 | Stereotype detection across gender, profession, race, religion | Masked language modeling task; measures whether models prefer stereotypical, anti-stereotypical, or unrelated completions |
| CrowS-Pairs | 2020 | Stereotypical bias across nine categories | Pairs of sentences differing only in a protected attribute; tests whether models assign higher likelihood to stereotypical sentences |
| BBQ | 2022 | Bias in question answering across nine social categories | Hand-built QA pairs covering age, disability, gender, nationality, physical appearance, ethnicity, religion, socioeconomic status, and sexual orientation |
| RealToxicityPrompts | 2020 | Toxicity in text generation | 100,000 naturally occurring prompts scored for toxicity; measures how often models generate toxic completions |
| SB-Bench | 2025 | Stereotype bias in multimodal models | Extends bias benchmarking to multimodal AI systems that process both images and text |
These benchmarks have limitations. StereoSet and BBQ focus on specific types of stereotypes and may not capture more subtle or context-dependent forms of bias. Benchmarks can also become targets for optimization, where models are tuned to perform well on the benchmark without genuinely reducing bias in open-ended settings. Researchers continue to develop more comprehensive and robust evaluation methods.
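A CrowS-Pairs-style comparison can be approximated for a small causal language model by scoring two sentences that differ only in a protected attribute and checking which one the model finds more likely. The sketch below uses GPT-2 via Hugging Face Transformers; the sentence pair is an invented illustration, not an item from any benchmark:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(sentence):
    """Average per-token log-likelihood the model assigns to a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    return -outputs.loss.item()   # loss is the mean negative log-likelihood

pair = ("The doctor finished his shift.", "The doctor finished her shift.")
for sentence in pair:
    print(f"{avg_log_likelihood(sentence):8.3f}  {sentence}")
# A consistent preference for one variant across many such pairs suggests a
# stereotypical association learned from the training data.
```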
Governments around the world have begun to regulate AI bias, recognizing the potential for automated systems to perpetuate or amplify discrimination.
The EU AI Act, which entered into force in August 2024, classifies AI systems by risk level and imposes strict requirements on high-risk systems. Provisions related to AI literacy and prohibited practices have applied since February 2, 2025, with the majority of high-risk system obligations taking effect on August 2, 2026. Key bias-related requirements include data governance obligations to examine training, validation, and testing data for possible biases and to take measures to detect, prevent, and mitigate them, along with documentation, transparency, and human oversight requirements for high-risk systems.
The European Commission proposed a "Digital Omnibus" package in late 2025 that could postpone certain high-risk obligations until December 2027, though organizations have generally been advised to treat August 2026 as the binding deadline.
The United States has taken a more sector-specific approach. The Equal Employment Opportunity Commission (EEOC) has issued guidance on how existing anti-discrimination laws (Title VII, the Americans with Disabilities Act) apply to AI-powered hiring and employment tools. New York City's Local Law 144, effective since July 2023, requires annual bias audits of automated employment decision tools. Several states have proposed or enacted legislation addressing AI bias in specific domains such as insurance, housing, and criminal justice.
Canada's proposed Artificial Intelligence and Data Act (AIDA) would require organizations to assess and mitigate the risk of biased output from AI systems. Brazil, China, and several other countries have introduced or are developing AI governance frameworks that include provisions addressing algorithmic bias.
Large language models (LLMs) such as GPT-4, Claude, and Gemini present unique challenges for bias detection and mitigation. Because these models are trained on massive corpora of internet text, they absorb the biases present in that data, including stereotypes, toxic language, and skewed representations of different groups.
LLM bias can appear in several ways: stereotypical associations between demographic groups and particular occupations or traits, underrepresentation or erasure of certain groups in generated text, degraded performance for non-standard dialects and lower-resource languages, and outputs whose tone or content shifts depending on the demographic framing of a prompt.
Approaches to reducing bias in LLMs include curating and filtering pre-training data, fine-tuning with human feedback (such as reinforcement learning from human feedback), red-teaming and systematic evaluation against bias benchmarks, and applying guardrails or output filters at inference time.
The study of bias in AI raises deep philosophical questions that do not have easy answers.
What counts as bias? Different stakeholders may disagree about whether a particular outcome is biased. A hiring model whose selection rates track genuine differences in qualification rates across groups can be well calibrated while violating demographic parity. The choice between these criteria is ultimately a moral and political judgment, not a purely technical one.
Bias vs. accuracy: In some cases, reducing bias may come at a cost to overall accuracy. The impossibility theorems of Kleinberg et al. and Chouldechova formalize this tradeoff. Practitioners must decide how much accuracy to sacrifice for fairness, and who bears the cost of that tradeoff.
Transparency and accountability: When AI systems make consequential decisions (in hiring, lending, criminal justice, or healthcare), affected individuals have a reasonable expectation to understand how those decisions were made. Bias auditing and algorithmic transparency are increasingly seen as prerequisites for legitimate use of AI in high-stakes settings.
Structural vs. individual bias: AI bias often reflects structural inequalities in society. Fixing the algorithm alone is not sufficient if the underlying data, institutions, and social structures remain unchanged. Some researchers argue that focusing narrowly on algorithmic fairness can distract from the deeper work of addressing systemic inequality.