Automation bias is the tendency for humans to favor suggestions and outputs from automated decision-making systems over contradictory information from non-automated sources, even when the non-automated information is correct. The term describes a specific form of cognitive bias in which people assign greater authority and trust to automated recommendations than the situation warrants, often accepting them without independent verification. Automation bias affects both novice and expert users and has been documented across a wide range of domains, including aviation, healthcare, criminal justice, military operations, and autonomous driving.
The concept was formally introduced in the mid-1990s through research on cockpit automation by Kathleen Mosier, Linda Skitka, and colleagues, who defined automation bias as the use of automated cues "as a heuristic replacement for vigilant information seeking and processing." As artificial intelligence and machine learning systems become more prevalent in everyday decision-making, automation bias has become a growing concern in the fields of AI safety, fairness, and human-computer interaction.
Imagine you have a magic toy that tells you the color of things. Most of the time it gets the answer right, so you start trusting it completely. One day, the toy says a red ball is blue. Even though you can see the ball is red with your own eyes, you believe the toy instead because it has been right so many times before. That is automation bias: trusting a machine so much that you stop thinking for yourself, even when you have good reasons to disagree.
Concerns about human over-reliance on automated systems predate modern AI. In 1983, Lisanne Bainbridge published "Ironies of Automation" in the journal Automatica, a paper that has since attracted over 1,800 citations. Bainbridge identified a fundamental paradox: the more tasks are automated, the less practice human operators get at performing those tasks manually, which means they are less prepared for the rare but important moments when they need to intervene. She also observed that increased automation decreases cognitive workload under normal conditions but simultaneously increases the opportunities for monitoring errors.
The term "automation bias" was introduced by Mosier, Skitka, Heers, and Burdick in a 1996 study examining pilot behavior in simulated cockpit environments. Their research showed that pilots relied on automated aids as a shortcut for information processing and made systematic errors as a result. In a follow-up study published in 1999 in the International Journal of Human-Computer Studies, Skitka, Mosier, and Burdick demonstrated that participants working with a highly (but not perfectly) reliable automated aid performed worse on monitoring tasks than participants working without automation. The automated group committed both errors of omission and commission that the non-automated group avoided.
In 2010, Raja Parasuraman and Dietrich Manzey published a major review in Human Factors titled "Complacency and Bias in Human Use of Automation: An Attentional Integration." This paper proposed an integrated theoretical model showing that automation bias and automation-induced complacency arise from the dynamic interaction of personal, situational, and automation-related characteristics, with attention as the central mechanism. Their framework positioned automation bias and complacency as "different manifestations of overlapping automation-induced phenomena" rather than entirely separate constructs.
Research has identified several psychological and situational factors that contribute to automation bias, summarized in the two tables below.
| Psychological factor | Description |
|---|---|
| Cognitive miser tendency | Humans naturally prefer the path of least cognitive effort. Accepting an automated recommendation requires less mental work than independently evaluating all available information. |
| Perceived superiority of automation | Users tend to view automated systems as analytically superior to human judgment, leading them to defer to machine outputs even when their own assessment conflicts. |
| Effort reduction under task sharing | When sharing decision-making responsibility with an automated system, people reduce their own cognitive investment in the task. |
| Anchoring effect | An automated recommendation serves as an anchor, biasing subsequent human judgment toward that recommendation even after it has been shown to be incorrect. |
| Situational factor | Description |
|---|---|
| High workload and time pressure | When operators face multiple simultaneous tasks or time constraints, they are more likely to rely on automated suggestions rather than conducting independent analysis. |
| System reliability history | A system with a strong track record of accuracy builds user trust, which then persists even in situations where the system produces errors (a pattern called "learned carelessness"). |
| Display design and salience | Automated recommendations that are displayed prominently on a screen are more likely to be followed without question, regardless of their accuracy. |
| Lack of transparency | When users do not understand how an automated system generates its recommendations, they are less able to evaluate whether a given output is trustworthy. |
| Team dynamics | Research shows that teams do not outperform individuals in detecting automation failures. Group settings can actually reinforce automation bias through diffusion of responsibility. |
Automation bias produces two distinct categories of errors, first defined by Mosier and Skitka in 1996.
Omission errors occur when an automated system fails to alert the user to a problem and the user, relying on the system's silence as an indication that everything is normal, fails to detect the issue independently. In these cases, the human operator does not take a necessary action because the automation did not prompt them to do so. Studies have reported omission error rates as high as 55% in some experimental settings.
Commission errors occur when an automated system provides an incorrect recommendation and the user follows it despite the availability of contradictory information from other valid sources. The user actively takes an inappropriate action because they accept the automated suggestion over their own training, experience, or direct observations. Commission errors are considered especially concerning because they involve the active dismissal or ignoring of correct information.
| Error type | Definition | Example |
|---|---|---|
| Omission error | Failure to act because automation did not provide an alert | A pilot misses an engine malfunction because the automated monitoring system does not flag it |
| Commission error | Following an incorrect automated recommendation despite contradictory evidence | A physician changes a correct diagnosis to match an incorrect suggestion from a clinical decision support system |
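The taxonomy can be restated as a simple classification rule. The following Python sketch is purely illustrative: the `Trial` structure and `classify_error` function are hypothetical names introduced here, not drawn from the cited studies.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trial:
    """One monitoring trial: the true state, what the automation indicated, what the operator did."""
    fault_present: bool        # ground truth: does a problem actually exist?
    automation_alerted: bool   # did the automated aid flag the problem or recommend action?
    operator_acted: bool       # did the human take the corrective action?

def classify_error(trial: Trial) -> Optional[str]:
    """Classify a trial under the omission/commission taxonomy (illustrative sketch)."""
    if trial.fault_present and not trial.automation_alerted and not trial.operator_acted:
        # The automation stayed silent, and the operator treated silence as "all clear".
        return "omission error"
    if not trial.fault_present and trial.automation_alerted and trial.operator_acted:
        # The automation gave a spurious recommendation, and the operator followed it anyway.
        return "commission error"
    return None  # correct behavior, or an error not attributable to reliance on the automation

# Example: the aid misses an engine fault, and the pilot, trusting the silence, misses it too.
print(classify_error(Trial(fault_present=True, automation_alerted=False, operator_acted=False)))
```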
Automation bias and automation complacency are closely related but distinct concepts. Automation complacency refers to insufficient monitoring of an automated system's output, typically driven by a belief that the system is reliable enough that close monitoring is unnecessary. NASA's Aviation Safety Reporting System defines complacency as "self-satisfaction that may result in non-vigilance based on an unjustified assumption of satisfactory system state."
The key distinction is one of mechanism. Automation bias involves trusting the content of a decision-support system's output (accepting what it recommends), while complacency involves inadequate attention to and monitoring of the system's behavior (not watching it closely enough). In practice, both phenomena frequently co-occur and reinforce each other. Parasuraman and Manzey (2010) argued that the two concepts represent different aspects of the same underlying attentional dysfunction and proposed treating them within a single integrative framework.
A related concept is the "automation irony" identified by Bainbridge (1983): the more reliable and capable an automated system becomes, the less the human operator practices the skills needed to handle system failures. This creates a paradoxical situation in which the humans who are supposed to serve as a safety net for automation failures are progressively less capable of performing that role.
Aviation was the first domain in which automation bias was systematically studied, and it remains one of the most extensively documented.
Clinical decision support systems (CDSS) are designed to improve diagnostic accuracy, but research has shown that they also introduce new types of errors through automation bias.
Algorithmic risk assessment tools are increasingly used in the criminal justice system, raising concerns about automation bias in judicial decision-making.
A 2004 study by Mary Cummings documented cases in which automation bias contributed to fatal military decisions, including friendly-fire incidents during operations in Iraq. In these cases, operators relied on automated targeting systems without adequately verifying the identity of targets.
A 2021 U.S. drone strike in Kabul, Afghanistan, was later determined to have killed ten civilians, including seven children, rather than the intended target. Investigations pointed to over-reliance on automated surveillance and targeting systems as a contributing factor.
The rise of driver-assistance and autonomous driving technologies has created new contexts for automation bias.
The proliferation of large language models (LLMs) such as ChatGPT, Claude, and Gemini has introduced automation bias into new areas of everyday life. Because LLMs generate fluent, confident-sounding text, users may accept their outputs without critical evaluation, even when those outputs contain factual errors (hallucinations).
Research published in 2024 in International Data Privacy Law (Oxford Academic) examined the specific risks of automation bias with generative LLMs, noting that the conversational and authoritative tone of LLM responses makes users especially likely to accept them uncritically. A 2025 clinical trial found that physicians who had received AI training still demonstrated automation bias when using LLMs for diagnostic reasoning, over-relying on LLM-generated differential diagnoses.
In 2012, Goddard, Roudsari, and Wyatt published a systematic review in the Journal of the American Medical Informatics Association (JAMIA) that examined the frequency of automation bias across research fields, the factors that mediate its effects, and the interventions that can mitigate it.
Researchers and system designers have proposed a range of interventions to reduce automation bias. No single strategy has been shown to eliminate the problem entirely, but several approaches have demonstrated partial effectiveness. These fall broadly into design-oriented strategies and training or procedural strategies, summarized in the two tables below.
| Design strategy | Description | Evidence |
|---|---|---|
| Reduced display prominence | Making automated recommendations less visually dominant on the screen so they do not anchor the user's attention | Mixed results; some studies confirm the effect, others find display prominence has limited impact |
| Transparency and explainability | Providing users with information about how the system generated its recommendation (see interpretability) | Users who understand system reasoning adjust their reliance accordingly, but overly technical explanations can backfire by reinforcing misplaced trust |
| Confidence indicators | Showing the system's estimated confidence or reliability for each recommendation | Can help calibrate trust, but confidence indicators themselves can become a source of bias if users do not understand probability well |
| Variable reliability cues | Designing systems whose reliability indicators change over time rather than presenting a constant level of confidence | Reduces complacency by preventing users from developing a fixed expectation of system accuracy |
| Framing as support rather than directive | Presenting automated outputs as suggestions or second opinions rather than definitive answers (illustrated, together with confidence indicators, in the sketch after this table) | Encourages users to treat automated input as one factor among several rather than the final word |
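As a rough illustration of how two of these design strategies, confidence indicators and support-style framing, might be combined, consider the following sketch. The function and message format are hypothetical and are not taken from any cited system.

```python
def present_recommendation(label: str, confidence: float) -> str:
    """Render an automated suggestion with an explicit confidence cue and
    non-directive framing (hypothetical format, for illustration only)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a probability between 0 and 1")
    return (
        f"Suggested interpretation: {label} (estimated confidence {confidence:.0%}).\n"
        "This is offered as a second opinion, not a determination. "
        "Compare it against your own assessment and other sources before acting."
    )

print(present_recommendation("benign nodule", 0.82))
```

The intent is that the output reads as one input among several, so an authoritative-sounding answer does not displace the user's own judgment.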
| Training or procedural strategy | Description | Evidence |
|---|---|---|
| Error exposure training | Deliberately introducing system errors during training so users experience automation failures firsthand | More effective than simply telling users that errors can occur; reduces commission errors but may not reduce omission errors |
| Accountability structures | Making users explicitly accountable for the accuracy of their decisions, regardless of whether they used automated assistance | Mosier et al. (1996) found that pilots with an internalized sense of accountability were significantly more likely to verify automated outputs |
| Verification checklists | Requiring users to complete a structured verification process before acting on automated recommendations | Effective but increases time pressure and task complexity, which can introduce its own set of problems |
| Cross-referencing protocols | Establishing procedures that require users to compare automated recommendations against at least one independent source of information | Effective in principle, but compliance decreases under workload and time pressure |
The human-in-the-loop paradigm, in which human experts review and approve automated outputs before they are acted upon, is widely recommended as a safeguard against automation bias. However, research has shown that simply placing a human in the loop does not automatically solve the problem. If the human reviewer is subject to automation bias (as most humans are), they may rubber-stamp automated outputs without meaningful scrutiny. Effective human-in-the-loop oversight therefore depends on procedures that require genuine, independent evaluation rather than passive approval.
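One way to make review more than a rubber stamp is to require the reviewer to commit to an independent assessment before the automated output is revealed, and to force explicit resolution when the two disagree. The following Python sketch is a schematic workflow under those assumptions, not a standard interface; the names and steps are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReviewRecord:
    case_id: str
    human_assessment: str    # recorded before the automated output is shown
    automated_output: str
    final_decision: str
    agreed: bool

def human_in_the_loop_review(case_id: str, automated_output: str,
                             ask_human: Callable[[str], str]) -> ReviewRecord:
    """Schematic review gate: independent judgment first, then comparison,
    then explicit resolution of any disagreement."""
    # Step 1: the reviewer commits to their own answer before seeing the automation.
    human_assessment = ask_human(f"Case {case_id}: enter your own assessment")
    # Step 2: only now is the automated recommendation revealed and compared.
    agreed = human_assessment.strip().lower() == automated_output.strip().lower()
    if agreed:
        final_decision = automated_output
    else:
        # Step 3: disagreement must be resolved explicitly (e.g. by consulting an
        # independent source or a second reviewer), not silently overwritten.
        final_decision = ask_human(
            f"Your assessment ({human_assessment!r}) differs from the automated output "
            f"({automated_output!r}). Enter the final decision"
        )
    return ReviewRecord(case_id, human_assessment, automated_output, final_decision, agreed)

# Usage example (interactive): record = human_in_the_loop_review("case-042", "no abnormality detected", input)
```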
Research suggests that the effectiveness of automated decision aids drops sharply when system reliability falls below approximately 70%. Below this threshold, users tend to distrust and ignore the system, which can lead to underuse rather than overuse. Above this threshold, users tend to trust the system increasingly, with the risk of automation bias growing as reliability increases.
This creates a nuanced problem for system designers: a system that is reliable enough to be useful (above 70%) is also reliable enough to induce automation bias. Systems that are extremely reliable (above 95%) may produce the strongest automation bias because users have so few experiences of system failure that they stop expecting errors entirely. This pattern has been described as the "first-failure effect," in which the first experienced system failure causes a sharp decline in trust that is followed by a slow recovery back to high trust levels.
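The interaction between reliability and verification can be illustrated with a toy model, offered here as an illustrative assumption rather than a result from the cited research: suppose the automation is correct with probability r, the human verifies a given output with probability v, and an independent check catches an automation error with probability h. Team accuracy is then r + v × (1 − r) × h, so the measurable benefit of verifying shrinks as r grows, even though the remaining unverified errors may be the most consequential ones.

```python
def team_accuracy(r: float, h: float, v: float) -> float:
    """Toy model of human-automation team accuracy (illustrative assumption only).
    r: probability the automation is correct
    h: probability an independent human check catches an automation error
    v: probability the human actually performs that check
    When the human defers (probability 1 - v), the team is right only if the automation is."""
    return r + v * (1.0 - r) * h

for r in (0.70, 0.90, 0.99):
    diligent = team_accuracy(r, h=0.8, v=1.0)    # always verifies
    complacent = team_accuracy(r, h=0.8, v=0.2)  # rarely verifies
    print(f"reliability {r:.2f}: always-verify {diligent:.3f}, "
          f"rarely-verify {complacent:.3f}, gap {diligent - complacent:.3f}")
```

In this toy model the accuracy gap between a diligent and a complacent reviewer falls from roughly 0.19 at 70% reliability to under 0.01 at 99%, which is one way to see how very reliable systems can erode the habit of verification described above.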