See also: Bias-Variance Tradeoff, Fairness (Machine Learning), Neural Network, Activation Function, Weight
Bias is a term with several distinct meanings in artificial intelligence and machine learning. It can refer to a learnable parameter in a neural network, a statistical property of an estimator, or systematic errors and unfair outcomes produced by AI systems. Understanding these different senses of the word is essential for anyone working in the field, because confusing them leads to muddled thinking about model design, evaluation, and ethics.
In a neural network, bias is an additional scalar value added to the weighted sum of inputs in each neuron before the activation function is applied. It gives the network the ability to adjust a neuron's output independently of its inputs.
In this sense, the bias is a learnable parameter, typically denoted b or w0.
More formally, each neuron receives inputs from the previous layer, which are multiplied by a weight and summed up. This weighted sum is then passed through an activation function to produce the output of the neuron. The bias term provides an additional constant input to the neuron that can shift the output of the activation function in a certain direction.
For a neuron with input vector x = (x1, x2, ..., xn) and weight vector w = (w1, w2, ..., wn), the pre-activation value z is computed as:
z = w1x1 + w2x2 + ... + wnxn + b
Or in vector notation:
z = w · x + b
The activation function f is then applied to produce the neuron's output: y = f(z).
Without a bias term, the pre-activation is zero whenever all inputs are zero, so the neuron's output is fixed at f(0). Adding a bias lets the network shift the activation function along its input axis, giving the model more flexibility. Consider a neuron that needs to produce an output other than f(0) when all inputs are zero: without a bias it simply cannot, because the weighted sum is zero. The bias term removes this limitation.
Bias values are learnable parameters, updated during training through backpropagation and gradient descent, just like weights. They are typically initialized to zero or small random values. During training, the bias is adjusted to minimize the loss function, helping the network model complex relationships between inputs and outputs.
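As a concrete illustration, the computation z = w · x + b and the effect of the bias can be sketched in a few lines of NumPy. The weights, inputs, and sigmoid activation below are illustrative choices, not values from any particular network:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative weights, bias, and input for a single neuron
w = np.array([0.4, -0.2, 0.1])   # one weight per input
b = 0.5                          # the bias term
x = np.array([0.0, 0.0, 0.0])    # all inputs zero

z = np.dot(w, x) + b             # pre-activation: w · x + b
y = sigmoid(z)                   # neuron output

# Without the bias, z would be 0 and the output fixed at sigmoid(0) = 0.5;
# with b = 0.5, the output shifts to sigmoid(0.5) ≈ 0.62.
print(z, y)
```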
| Layer Type | Bias Behavior | Notes |
|---|---|---|
| Fully connected (Dense) | One bias per neuron | Standard usage; each neuron has its own bias scalar |
| Convolutional | One bias per filter | Shared across all spatial positions of the feature map |
| Batch normalization | Bias often disabled | The batch norm layer includes its own shift parameter (beta), making a separate bias redundant |
| Recurrent (RNN/LSTM) | Bias in gate computations | Applied within each gate equation in LSTM and GRU cells |
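These conventions map directly onto how layers are declared in a framework such as PyTorch. The following sketch (layer sizes are arbitrary) shows the bias enabled by default in a dense layer and disabled in a convolution that is immediately followed by batch normalization:

```python
import torch.nn as nn

# Dense layer: one learnable bias per output neuron (bias=True is the default)
dense = nn.Linear(in_features=128, out_features=64)

# Convolution followed by batch norm: a separate bias is redundant because
# BatchNorm2d learns its own shift parameter (beta), so it is usually disabled
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, bias=False)
bn = nn.BatchNorm2d(num_features=16)

print(dense.bias.shape)   # torch.Size([64]) -- one bias value per neuron
print(conv.bias)          # None -- bias disabled
```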
In statistics and machine learning theory, bias refers to the systematic error introduced by approximating a real-world problem with a simplified model. This is a fundamentally different concept from the neural network parameter described above.
The bias of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. Formally, for an estimator θ̂ of a parameter θ:
Bias(θ̂) = E[θ̂] − θ
An estimator with zero bias is called unbiased. For example, the sample mean is an unbiased estimator of the population mean. In contrast, some estimators are intentionally biased because they offer lower variance, which can lead to better overall prediction accuracy.
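A quick simulation makes the definition concrete: averaging many estimates from repeated samples approximates E[θ̂], and comparing that average with the true parameter reveals the bias. The sketch below contrasts the unbiased sample mean with the biased plug-in variance estimator (which divides by n rather than n − 1); the distribution and sample size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_var = 5.0, 4.0      # parameters of the data-generating distribution
n, trials = 20, 100_000             # small samples, many repetitions

mean_estimates, var_estimates = [], []
for _ in range(trials):
    sample = rng.normal(true_mean, np.sqrt(true_var), size=n)
    mean_estimates.append(sample.mean())
    var_estimates.append(((sample - sample.mean()) ** 2).mean())  # divides by n

# Bias(theta_hat) = E[theta_hat] - theta, approximated by averaging over trials
print("bias of sample mean:      ", np.mean(mean_estimates) - true_mean)   # ~ 0
print("bias of plug-in variance: ", np.mean(var_estimates) - true_var)     # ~ -true_var/n = -0.2
```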
The bias-variance tradeoff is one of the central concepts in supervised learning. The total expected prediction error of a model can be decomposed into three components:
Expected Error = Bias^2 + Variance + Irreducible Noise
Simple models (like linear regression) typically have high bias and low variance. Complex models (like deep neural networks or large decision trees) tend to have low bias but high variance. The goal is to find the right level of model complexity that minimizes total error.
| Model Complexity | Bias | Variance | Consequence |
|---|---|---|---|
| Too simple (underfitting) | High | Low | Misses real patterns in the data |
| Optimal | Balanced | Balanced | Best generalization to unseen data |
| Too complex (overfitting) | Low | High | Memorizes noise in training data |
Techniques for managing the tradeoff include cross-validation, regularization (L1/L2 penalties), ensemble methods, and early stopping.
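The decomposition can also be estimated empirically by repeatedly drawing training sets, fitting a model of a given complexity, and averaging its predictions at a fixed test point. The sketch below (the true function, noise level, and polynomial degrees are arbitrary choices) compares an underfitting and an overfitting polynomial:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)          # "true" function (illustrative)
noise_sd, n, trials = 0.3, 30, 2_000
x_test = 0.25                                 # fixed test point

for degree in (1, 9):                         # too simple vs. too complex
    preds = []
    for _ in range(trials):
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, noise_sd, n)
        coeffs = np.polyfit(x, y, degree)     # least-squares polynomial fit
        preds.append(np.polyval(coeffs, x_test))
    preds = np.array(preds)
    bias_sq = (preds.mean() - f(x_test)) ** 2
    variance = preds.var()
    print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
# The degree-1 fit shows high bias^2 and low variance; the degree-9 fit the reverse.
```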
The third and most publicly discussed sense of "bias" in AI refers to systematic and unfair discrimination in the outputs of machine learning systems. This type of bias can cause real harm to individuals and communities, and it has become a major focus of AI ethics research since the mid-2010s.
Bias can enter an AI system at every stage of its lifecycle. Researchers have identified several distinct categories:
Historical bias arises when the training data reflects existing inequalities in society. Even if the data is collected perfectly, it encodes patterns of past discrimination. For example, if a hiring model is trained on a decade of resumes from a male-dominated industry, it will learn to favor male candidates, not because men are inherently better workers, but because the historical data skews that way.
Representation bias occurs when certain groups are underrepresented in the training data. A facial recognition system trained primarily on lighter-skinned faces will perform poorly on darker-skinned faces, not due to any inherent technical limitation but simply because the model has not seen enough examples.
Measurement bias results from how features are chosen and measured. Proxy variables can inadvertently encode protected attributes. For instance, using zip code as a feature in a lending model may serve as a proxy for race due to residential segregation.
Aggregation bias occurs when a single model is used for groups with different underlying distributions. A medical diagnostic model trained on data from one population may fail when applied to another population with different baseline rates of disease.
Evaluation bias happens when the benchmark datasets used to test a model do not represent the full diversity of real-world use cases. If the test set is biased, a model can appear to perform well while actually failing for underrepresented groups.
Deployment bias arises when a model is used in a context different from the one it was designed for. A system built to recommend job postings might be repurposed for screening candidates, introducing biases that were not present in the original use case.
Automation bias is the human tendency to trust automated outputs over contradictory evidence from non-automated sources. When users accept AI recommendations without critical evaluation, systematic errors in the model get amplified through the decision-making pipeline.
Several high-profile cases have drawn public attention to the real-world consequences of biased AI systems.
In 2016, ProPublica published an investigation of COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a risk assessment tool developed by Northpointe (now Equivant) and used in U.S. courts to predict recidivism. Analyzing more than 10,000 criminal defendants in Broward County, Florida, ProPublica found that Black defendants were almost twice as likely as white defendants to be incorrectly labeled as high risk for reoffending, while white defendants were more likely to be incorrectly flagged as low risk. The tool's overall accuracy was roughly 61 percent, but the error distribution was starkly unequal across racial groups. Even after controlling for criminal history, recidivism, age, and gender, Black defendants were 77 percent more likely to be pegged as higher risk for future violent crime. Northpointe disputed ProPublica's analysis, and the resulting debate helped crystallize the mathematical impossibility of satisfying multiple fairness criteria simultaneously when base rates differ across groups.
In 2018, Reuters reported that Amazon had built an AI recruiting tool trained on ten years of resumes submitted to the company. Because the tech industry is predominantly male, the training data was overwhelmingly from male applicants. The system learned to penalize resumes that contained the word "women's" (as in "women's chess club captain") and downgraded graduates of certain all-women's colleges. Amazon attempted to correct the bias but ultimately scrapped the tool after concluding it could not be made reliably gender-neutral. The case illustrated how historical bias in training data can produce discriminatory outcomes even without any intent to discriminate, and it highlighted the legal risks under Title VII of the Civil Rights Act, which prohibits disparate impact discrimination in employment.
The 2018 Gender Shades study by Joy Buolamwini and Timnit Gebru at MIT evaluated three commercial facial recognition systems for gender classification accuracy across intersections of gender and skin type. The researchers found that for lighter-skinned males, error rates were below 1 percent, while for darker-skinned females, error rates reached 34.7 percent. The study used the Fitzpatrick Skin Type classification system and a new balanced dataset called Pilot Parliaments Benchmark. In 2019, the U.S. National Institute of Standards and Technology (NIST) published a comprehensive evaluation of 189 facial recognition algorithms, finding that the majority exhibited demographic differentials: false positive rates for Asian and African American faces were 10 to 100 times higher than for Caucasian faces, depending on the algorithm. These findings accelerated regulatory action, with several U.S. cities banning or restricting government use of facial recognition technology.
Bias in AI is not limited to software systems. Pulse oximeters, medical devices that estimate blood oxygen saturation, have been shown to systematically overestimate oxygen levels in patients with darker skin tones. The underlying issue is that the devices were calibrated primarily on lighter-skinned individuals, and increased absorption of red light by melanin was not properly accounted for during development. During the COVID-19 pandemic, this bias drew renewed scrutiny because inaccurate oxygen readings could lead clinicians to undertreat patients of color. The U.S. Food and Drug Administration (FDA) has since signaled plans to update testing requirements for pulse oximeters to address racial disparities. In AI-assisted dermatology, a 2024 Northwestern University study found that while physician-AI partnerships improved overall diagnostic accuracy, accuracy gaps between patients with light and dark skin tones actually widened, demonstrating that AI can amplify existing disparities if not carefully designed.
The study of algorithmic fairness has produced formal mathematical definitions of what it means for a model's predictions to be "fair." These definitions are used to audit systems, design fairness constraints, and navigate regulatory requirements. However, as researchers have demonstrated, it is generally impossible to satisfy all fairness criteria simultaneously.
| Fairness Metric | Definition | Intuition |
|---|---|---|
| Demographic Parity | P(Y_hat = 1 \| A = 0) = P(Y_hat = 1 \| A = 1) | The proportion receiving a positive prediction should be the same across groups |
| Equalized Odds | P(Y_hat = 1 \| Y = y, A = 0) = P(Y_hat = 1 \| Y = y, A = 1) for y in {0, 1} | True positive rate and false positive rate should be equal across groups |
| Equal Opportunity | P(Y_hat = 1 \| Y = 1, A = 0) = P(Y_hat = 1 \| Y = 1, A = 1) | True positive rate should be equal across groups (relaxation of equalized odds) |
| Predictive Parity | P(Y = 1 \| Y_hat = 1, A = 0) = P(Y = 1 \| Y_hat = 1, A = 1) | Among those predicted positive, the actual positive rate should be equal across groups |
| Calibration | P(Y = 1 \| S = s, A = 0) = P(Y = 1 \| S = s, A = 1) for all scores s | Among individuals assigned the same risk score, actual outcome rates should be equal across groups |
| Individual Fairness | Similar individuals should receive similar predictions | Requires a domain-specific similarity metric, which can be difficult to define |
Here, Y_hat denotes the model's prediction, Y the true label, A the protected attribute (e.g., race or gender), and S a continuous risk score.
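Given predictions, true labels, and a protected attribute, several of these metrics reduce to simple conditional rates. The following minimal sketch uses a tiny made-up dataset purely for illustration:

```python
import numpy as np

# Tiny illustrative arrays: model predictions, true labels, protected attribute
y_hat = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y     = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])
a     = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

def rates(group):
    sel = a == group
    positive_rate = y_hat[sel].mean()              # P(Y_hat=1 | A=group)
    tpr = y_hat[sel & (y == 1)].mean()             # P(Y_hat=1 | Y=1, A=group)
    fpr = y_hat[sel & (y == 0)].mean()             # P(Y_hat=1 | Y=0, A=group)
    ppv = y[sel & (y_hat == 1)].mean()             # P(Y=1 | Y_hat=1, A=group)
    return positive_rate, tpr, fpr, ppv

for g in (0, 1):
    pr, tpr_g, fpr_g, ppv_g = rates(g)
    print(f"group {g}: positive rate={pr:.2f}  TPR={tpr_g:.2f}  FPR={fpr_g:.2f}  PPV={ppv_g:.2f}")

# Demographic parity compares the positive rates, equalized odds compares TPR and FPR,
# equal opportunity compares only TPR, and predictive parity compares PPV across groups.
```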
In 2016, Kleinberg, Mullainathan, and Raghavan proved that when base rates (the actual frequency of positive outcomes) differ between groups, it is mathematically impossible to simultaneously satisfy calibration, balance for the positive class, and balance for the negative class, except in trivial cases. Chouldechova (2017) showed a related result: when base rates differ, a classifier cannot simultaneously achieve equal false positive rates and equal false negative rates across groups while maintaining calibration. These impossibility theorems mean that practitioners must make explicit choices about which fairness criteria to prioritize, informed by the specific context and consequences of the application.
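The tension can be seen with a short calculation. For a binary classifier, prevalence p, positive predictive value (PPV), false negative rate (FNR), and false positive rate (FPR) are linked by FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR). The sketch below (with made-up numbers) fixes PPV and FNR to be equal across two groups with different base rates and shows that their FPRs are then forced apart:

```python
def implied_fpr(p, ppv, fnr):
    """FPR implied by prevalence p, positive predictive value, and false negative rate."""
    return (p / (1 - p)) * ((1 - ppv) / ppv) * (1 - fnr)

ppv, fnr = 0.7, 0.2                          # equal PPV (predictive parity) and equal FNR
for group, p in (("A", 0.5), ("B", 0.3)):    # different base rates (illustrative)
    print(f"group {group}: base rate={p:.1f}  implied FPR={implied_fpr(p, ppv, fnr):.3f}")

# group A: FPR ≈ 0.343, group B: FPR ≈ 0.147 -- with unequal base rates,
# equalizing PPV and FNR forces the false positive rates to differ.
```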
Researchers have developed a range of techniques for reducing bias in AI systems. These are typically categorized by when in the machine learning pipeline they are applied.
Pre-processing approaches modify the training data before model training begins, for example by reweighting or resampling examples so that groups are balanced, or by transforming features to reduce their correlation with protected attributes.
In-processing methods modify the training algorithm itself, for example by adding fairness constraints or penalty terms to the loss function, or by training against an adversary that tries to predict the protected attribute from the model's internal representations.
Post-processing approaches adjust the model's outputs after training, for example by recalibrating scores or choosing group-specific decision thresholds to equalize error rates; a minimal sketch of threshold adjustment follows the comparison table below.
| Approach | Timing | Advantages | Limitations |
|---|---|---|---|
| Pre-processing | Before training | Model-agnostic; can be applied with any algorithm | May lose useful information; limited control over final model behavior |
| In-processing | During training | Directly optimizes for fairness; tight control over tradeoffs | Algorithm-specific; may require custom implementations |
| Post-processing | After training | Does not require retraining; easy to apply | Cannot fix deeply embedded biases; may reduce overall accuracy |
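As an example of the post-processing row, the following sketch (scores and labels are synthetic) picks a separate decision threshold for each group so that true positive rates approximately match, in the spirit of equalized-odds post-processing:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic risk scores: group 1's positives receive systematically lower scores
n = 5_000
group = rng.integers(0, 2, n)
label = rng.binomial(1, 0.4, n)
score = np.clip(0.5 * label + 0.1 * (1 - group) + rng.normal(0, 0.2, n), 0, 1)

def tpr(threshold, g):
    """True positive rate for group g at a given decision threshold."""
    sel = (group == g) & (label == 1)
    return (score[sel] >= threshold).mean()

target = 0.8   # desired true positive rate for both groups (illustrative)
candidates = np.linspace(0, 1, 501)

for g in (0, 1):
    # choose the threshold whose TPR is closest to the target for this group
    best = candidates[np.argmin([abs(tpr(t, g) - target) for t in candidates])]
    print(f"group {g}: threshold={best:.2f}  TPR={tpr(best, g):.3f}")
```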
Several benchmarks have been developed to evaluate bias in AI systems, particularly in large language models.
| Benchmark | Year | Focus | Method |
|---|---|---|---|
| WinoBias | 2018 | Gender bias in coreference resolution | 3,160 sentences with occupation-pronoun pairs; tests whether models rely on gender stereotypes to resolve pronouns |
| StereoSet | 2020 | Stereotype detection across gender, profession, race, religion | Masked language modeling task; measures whether models prefer stereotypical, anti-stereotypical, or unrelated completions |
| CrowS-Pairs | 2020 | Stereotypical bias across nine categories | Pairs of sentences differing only in a protected attribute; tests whether models assign higher likelihood to stereotypical sentences |
| BBQ | 2022 | Bias in question answering across nine social categories | Hand-built QA pairs covering age, disability, gender, nationality, physical appearance, ethnicity, religion, socioeconomic status, and sexual orientation |
| RealToxicityPrompts | 2020 | Toxicity in text generation | 100,000 naturally occurring prompts scored for toxicity; measures how often models generate toxic completions |
| SB-Bench | 2025 | Stereotype bias in multimodal models | Extends bias benchmarking to multimodal AI systems that process both images and text |
These benchmarks have limitations. StereoSet and BBQ focus on specific types of stereotypes and may not capture more subtle or context-dependent forms of bias. Benchmarks can also become targets for optimization, where models are tuned to perform well on the benchmark without genuinely reducing bias in open-ended settings. Researchers continue to develop more comprehensive and robust evaluation methods.
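A CrowS-Pairs-style comparison can be approximated for a small causal language model by scoring two sentences that differ only in a protected attribute and checking which one the model finds more likely. The sketch below uses GPT-2 via Hugging Face Transformers; the sentence pair is an invented illustration, not an item from any benchmark:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(sentence):
    """Average per-token log-likelihood the model assigns to a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    return -outputs.loss.item()   # loss is the mean negative log-likelihood

pair = ("The doctor finished his shift.", "The doctor finished her shift.")
for sentence in pair:
    print(f"{avg_log_likelihood(sentence):8.3f}  {sentence}")
# A consistent preference for one variant across many such pairs suggests a
# stereotypical association learned from the training data.
```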
Governments around the world have begun to regulate AI bias, recognizing the potential for automated systems to perpetuate or amplify discrimination.
The EU AI Act, which entered into force in August 2024, classifies AI systems by risk level and imposes strict requirements on high-risk systems. Provisions related to AI literacy and prohibited practices have applied since February 2, 2025, with the majority of high-risk system obligations taking effect on August 2, 2026. Key bias-related requirements include data governance obligations to examine training, validation, and testing data for possible biases and to take measures to detect, prevent, and mitigate them, along with documentation, transparency, and human oversight requirements for high-risk systems.
The European Commission proposed a "Digital Omnibus" package in late 2025 that could postpone certain high-risk obligations until December 2027, though organizations have generally been advised to treat August 2026 as the binding deadline.
The United States has taken a more sector-specific approach. The Equal Employment Opportunity Commission (EEOC) has issued guidance on how existing anti-discrimination laws (Title VII, the Americans with Disabilities Act) apply to AI-powered hiring and employment tools. New York City's Local Law 144, effective since July 2023, requires annual bias audits of automated employment decision tools. Several states have proposed or enacted legislation addressing AI bias in specific domains such as insurance, housing, and criminal justice.
Canada's proposed Artificial Intelligence and Data Act (AIDA) would require organizations to assess and mitigate the risk of biased output from AI systems. Brazil, China, and several other countries have introduced or are developing AI governance frameworks that include provisions addressing algorithmic bias.
Large language models (LLMs) such as GPT-4, Claude, and Gemini present unique challenges for bias detection and mitigation. Because these models are trained on massive corpora of internet text, they absorb the biases present in that data, including stereotypes, toxic language, and skewed representations of different groups.
LLM bias can appear in several ways: stereotypical associations between demographic groups and particular occupations or traits, underrepresentation or erasure of certain groups in generated text, degraded performance for non-standard dialects and lower-resource languages, and outputs whose tone or content shifts depending on the demographic framing of a prompt.
Approaches to reducing bias in LLMs include curating and filtering pre-training data, fine-tuning with human feedback (such as reinforcement learning from human feedback), red-teaming and systematic evaluation against bias benchmarks, and applying guardrails or output filters at inference time.
The study of bias in AI raises deep philosophical questions that do not have easy answers.
What counts as bias? Different stakeholders may disagree about whether a particular outcome is biased. A hiring model whose selection rates track genuine differences in qualification rates across groups can be well calibrated while violating demographic parity. The choice between these criteria is ultimately a moral and political judgment, not a purely technical one.
Bias vs. accuracy: In some cases, reducing bias may come at a cost to overall accuracy. The impossibility theorems of Kleinberg et al. and Chouldechova formalize this tradeoff. Practitioners must decide how much accuracy to sacrifice for fairness, and who bears the cost of that tradeoff.
Transparency and accountability: When AI systems make consequential decisions (in hiring, lending, criminal justice, or healthcare), affected individuals have a reasonable expectation to understand how those decisions were made. Bias auditing and algorithmic transparency are increasingly seen as prerequisites for legitimate use of AI in high-stakes settings.
Structural vs. individual bias: AI bias often reflects structural inequalities in society. Fixing the algorithm alone is not sufficient if the underlying data, institutions, and social structures remain unchanged. Some researchers argue that focusing narrowly on algorithmic fairness can distract from the deeper work of addressing systemic inequality.