Hallucination
Last reviewed
Sources
17 citations
Review status
Source-backed
Revision
v7 · 4,523 words
Hallucination in artificial intelligence is the generation of output that is fluent, confident, and plausible but factually incorrect, fabricated, or not grounded in the model's input or in real-world facts.[1] It is most associated with large language models (LLMs) such as GPT-4, Claude, and Gemini, but it also occurs in image generation, code synthesis, machine translation, and other modalities. A 2025 OpenAI study argued that hallucinations are not a mysterious glitch but a predictable statistical outcome: they "originate simply as errors in binary classification," and language models hallucinate because "the training and evaluation procedures reward guessing over acknowledging uncertainty."[13] Hallucination is widely considered one of the most significant unsolved challenges in deploying AI, particularly in high-stakes domains such as law, medicine, and finance.
Imagine you ask a very confident friend a question, and instead of saying "I don't know," they make up an answer that sounds completely real. They say it so convincingly that you believe them. That is what AI hallucination is like. The AI does not actually "know" things the way people do. It predicts what words are likely to come next based on patterns it learned during training. Sometimes those patterns lead to answers that sound right but are totally wrong. It might invent a book that does not exist, make up fake statistics, or describe events that never happened, all while sounding perfectly sure of itself.
In the context of AI, hallucination describes the generation of content that appears coherent and plausible on the surface but is not supported by the source material, the model's input, or verifiable real-world facts.[1] OpenAI defines hallucinations as "plausible but false statements generated by language models."[13] The term borrows from psychiatry, where hallucination refers to sensory perceptions that occur without external stimuli. However, the analogy is imperfect, and this has generated significant debate in the research community.
Several researchers and commentators have argued that "hallucination" is a misleading term when applied to AI systems. In clinical psychology, confabulation refers to the unintentional production of false memories or narratives to fill gaps in knowledge, which more closely mirrors what language models actually do. Since LLMs have no sensory experiences to misperceive, they cannot truly "hallucinate" in the psychiatric sense. Instead, they fill gaps in their learned patterns with plausible but fabricated content.
Usama Fayyad has called the term "misleading" and "vague," while Mary Shaw has argued that it inappropriately frames real errors as "idiosyncratic quirks." Computer scientist Gary N. Smith has pointed out that LLMs "do not understand what words mean" and therefore cannot be said to hallucinate in any meaningful sense. Alternative terms that have been proposed include confabulation, fabrication, bullshit (in the philosophical sense defined by Harry Frankfurt), and delusion. Despite these objections, "hallucination" has become the standard term in the field since its widespread adoption following ChatGPT's release in November 2022. Cambridge Dictionary updated its definition in 2023 to include the AI-specific meaning.
Two closely related but distinct concepts underpin the study of hallucination:
| Concept | Definition | Example |
|---|---|---|
| Faithfulness | Whether the output is consistent with the provided input or context | A summarization model adding claims not present in the source document |
| Factuality | Whether the output agrees with established real-world facts | A language model claiming that the Eiffel Tower is located in Berlin |
A model can be faithful to its input yet factually wrong (if the input itself contains errors), or factually correct yet unfaithful (if it introduces accurate information not present in the source).[2]
Researchers have developed several taxonomies for classifying hallucinations.[9] The most widely cited framework, established in early NLP research on abstractive summarization, distinguishes between intrinsic and extrinsic hallucinations.[2]
| Type | Description | Example |
|---|---|---|
| Intrinsic hallucination | The generated output directly contradicts information present in the source material or input context | A summarizer states "the patient was discharged on Monday" when the source says Tuesday |
| Extrinsic hallucination | The generated output contains information that cannot be verified or refuted from the source material alone | A summarizer adds a claim about the patient's family history that is never mentioned in the source |
Intrinsic hallucinations are generally considered more harmful because they actively distort known information. Extrinsic hallucinations may sometimes be benign (adding true background knowledge) or harmful (introducing fabricated details).[2]
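The intrinsic/extrinsic distinction lines up with the three labels used in natural language inference (NLI). The following is a minimal sketch of that mapping, assuming some upstream NLI model has already labeled a generated claim against the source. The `NLILabel` enum and `classify_claim` function are hypothetical names introduced for illustration, not part of any cited framework.

```python
from enum import Enum


class NLILabel(Enum):
    """Relation of a generated claim to the source text (assumed to come from an NLI model)."""
    ENTAILMENT = "entailment"        # the source supports the claim
    CONTRADICTION = "contradiction"  # the source says otherwise
    NEUTRAL = "neutral"              # the source neither supports nor refutes the claim


def classify_claim(label: NLILabel) -> str:
    """Map a source-vs-claim NLI label onto the intrinsic/extrinsic taxonomy."""
    if label is NLILabel.CONTRADICTION:
        # e.g. "discharged on Monday" when the source says Tuesday
        return "intrinsic hallucination"
    if label is NLILabel.NEUTRAL:
        # e.g. a family-history detail the source never mentions
        return "extrinsic hallucination"
    return "supported"
```

Note that an extrinsic verdict here says only that the source is silent. Whether the added detail is benign or fabricated still has to be checked against outside knowledge.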
Beyond the intrinsic/extrinsic framework, hallucinations manifest differently depending on the task:
Hallucinations arise from a combination of factors related to training data, model architecture, and the inference process.[10] No single cause fully explains the phenomenon, and in most cases, hallucinations result from multiple interacting factors.[1] A 2025 OpenAI analysis added a statistical-learning explanation: it formalizes generation as an "Is-It-Valid" binary classification problem and proves that "if incorrect statements cannot be distinguished from facts, then hallucinations in pretrained language models will arise through natural statistical pressures."[13] The same work argues the problem is reinforced after pretraining because most benchmarks score answers as right or wrong with no credit for abstaining, so "language models are optimized to be good test-takers, and guessing when uncertain improves test performance."[13]
In 2025, Anthropic published circuit-tracing research on the internal mechanisms of Claude that shed light on how hallucinations arise at the circuit level.[14] The researchers found that, in Claude 3.5 Haiku, refusal is the default behavior: a circuit that is "on" by default causes the model to state that it has insufficient information to answer, and a competing "known entities" feature inhibits this circuit when the model recognizes something it knows well, such as the basketball player Michael Jordan.[14] Hallucinations occur when this inhibition misfires: the model recognizes a name but lacks enough stored information to answer accurately, so instead of declining, it constructs a plausible but untrue response. By artificially activating the "known answer" features, the researchers could make the model "hallucinate (quite consistently!) that Michael Batkin plays chess."[14]
While most research focuses on text generation, hallucinations occur across all generative AI modalities.[11]
LLM hallucinations are the most widely studied form. Common manifestations include:
A 2023 study published in the Cureus Journal found that of 178 references generated by GPT-3, 69 had incorrect or nonexistent DOIs, and 28 had no locatable DOI at all. Another study analyzing 115 ChatGPT-3.5 references found that 47% were entirely fabricated, 46% cited real sources but extracted incorrect information, and only 7% were fully correct.
Hallucinations in image generative models (such as DALL-E, Stable Diffusion, and Midjourney) take different forms:
Research on diffusion models has shown that these models interpolate between nearby data modes in their training distribution, sometimes generating samples entirely outside the support of real data. This mode interpolation phenomenon is a fundamental cause of hallucinated visual content.
Multimodal large language models (MLLMs) that process both text and images face a distinct form of hallucination called object hallucination, where the model perceives or describes objects that are absent from the input image.[11] Studies have found that even state-of-the-art multimodal models frequently describe visual content inaccurately, particularly when prompted with leading questions about objects not present in the scene.
Code-generating models can hallucinate in several ways:
In neural machine translation, hallucinations manifest as translations that are fluent in the target language but bear no relation to the source text. Google researchers documented this phenomenon in 2017, noting that it was particularly common for low-resource language pairs and short or ambiguous source sentences.
Hallucinations have caused significant real-world harm across multiple domains.
The most prominent case is Mata v. Avianca, Inc. (2023), in which attorney Steven Schwartz submitted a legal brief containing six entirely fictitious case citations generated by ChatGPT, complete with fabricated docket numbers and judicial opinions. On June 22, 2023, Judge P. Kevin Castel of the Southern District of New York sanctioned Schwartz, co-counsel Peter LoDuca, and their firm Levidow, Levidow & Oberman a total of $5,000 and ordered them to send corrective letters to the judges falsely named in the fabricated opinions.[17] The case became a landmark example of AI hallucination risk.
An AI hallucination case database maintained by legal researcher Damien Charlotin, launched in April 2025, documents legal decisions worldwide that address hallucinated AI content (typically fake citations). By 2026 it had recorded more than 700 such decisions globally, roughly 90% of them issued in 2025, alongside hundreds of additional U.S. filings.[17]
A 2024 Stanford University study by Varun Magesh, Matthew Dahl, and colleagues found that specialized legal AI tools hallucinated on at least 1 in 6 benchmark queries.[8] Across 202 expert-scored queries, Lexis+ AI produced incorrect or misgrounded responses more than 17% of the time, while Westlaw's AI-Assisted Research hallucinated on approximately 33% of queries, nearly double the rate of Lexis+ AI; the study also measured a 43% hallucination rate for general-purpose GPT-4 on the same task.[8]
Hallucinated medical information poses serious risks to patient safety. Studies have found that AI chatbots can generate plausible but incorrect medical advice, fabricate drug interactions, or cite nonexistent clinical trials. The potential for harm is amplified because patients may lack the expertise to identify inaccurate medical claims.
In 2025, Deloitte faced scrutiny when an A$440,000 report was found to contain citations to nonexistent academic sources. Similarly, a CA$1.6 million Health Human Resources Plan included at least four false citations to fabricated research papers. These incidents highlight the risks of using AI-generated content in professional consulting without rigorous verification.
Hallucinated citations pose a threat to academic integrity. Northwestern University research found that plagiarism detectors rated AI-generated abstracts as 100% original, while AI detection tools achieved only 66% accuracy in identifying them. Human researchers performed only slightly better, identifying AI-generated text at a rate of 68%.
Detecting hallucinations is an active and challenging area of research. Methods can be broadly categorized into reference-based and reference-free approaches.[10]
These methods compare model outputs against a trusted knowledge source:
| Method | Approach | Strengths | Limitations |
|---|---|---|---|
| Fact verification | Decompose output into atomic claims and verify each against a knowledge base or retrieved documents | High precision for verifiable claims[5] | Requires comprehensive knowledge bases; cannot verify subjective or novel claims |
| NLI-based detection | Use natural language inference models to check whether source documents entail, contradict, or are neutral toward generated claims | Scalable; works across domains | NLI models themselves can be inaccurate |
| Retrieval-based checking | Retrieve relevant documents and compare them against the generated output for consistency | Leverages up-to-date information | Depends on retrieval quality; may miss nuanced errors |
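As a rough illustration of the fact-verification row, the sketch below checks already-extracted atomic claims against a small trusted knowledge base and reports the fraction that are supported. Everything in it is an assumption made for the example: the `Claim` structure, the toy `KNOWLEDGE_BASE`, and the exact-match lookup. Real systems need a claim-extraction step and far more tolerant matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One atomic claim, assumed to have been extracted from model output upstream."""
    subject: str
    relation: str
    value: str


# Hypothetical trusted knowledge base: (subject, relation) -> value.
KNOWLEDGE_BASE = {
    ("Eiffel Tower", "located in"): "Paris",
}


def verify(claim: Claim, kb: dict[tuple[str, str], str]) -> str:
    """Return 'supported', 'contradicted', or 'unverifiable' for a single claim."""
    key = (claim.subject, claim.relation)
    if key not in kb:
        # The knowledge base is silent, so the claim can be neither confirmed nor refuted.
        return "unverifiable"
    return "supported" if kb[key] == claim.value else "contradicted"


def supported_fraction(claims: list[Claim], kb: dict[tuple[str, str], str]) -> float:
    """Fraction of claims the knowledge base supports (0.0 for an empty list)."""
    if not claims:
        return 0.0
    verdicts = [verify(c, kb) for c in claims]
    return verdicts.count("supported") / len(verdicts)
```

The "unverifiable" verdict reflects the limitation noted in the table: a claim missing from the knowledge base is not necessarily false, so coverage gaps show up as lower scores rather than as detected errors.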
These methods assess hallucination without access to external ground truth:
| Method | Approach | Strengths | Limitations |
|---|---|---|---|
| Self-consistency checking | Generate multiple responses to the same prompt and identify claims that appear inconsistently across samples | No external knowledge needed | Inconsistency does not always indicate hallucination; consistent errors are missed |
| Semantic uncertainty estimation | Measure the model's uncertainty at the semantic level across multiple sampled outputs | Can flag low-confidence generations | Computationally expensive; overconfident models may evade detection |
| Internal probe methods | Train classifiers on the model's internal activations to predict whether a given output is hallucinated | Can detect hallucinations the model "knows" are wrong | Requires access to model internals; may not generalize across models |
| SelfCheckGPT | Sample several additional responses from the model and score each sentence of the original answer by how consistent it is with those samples, without external databases | Simple to implement | Limited by the model's own knowledge and biases |
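The sampling-based rows above share one idea: ask the same question several times and measure how much the answers disagree. The sketch below computes the entropy of the answer distribution as a simple disagreement score. It assumes, as a deliberate simplification, that two answers mean the same thing only if they match after light normalization. Grouping answers by meaning rather than by string would need a semantic-equivalence model, which is not shown. The function names are hypothetical.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Crude stand-in for semantic clustering: lowercase, trim punctuation, collapse spaces."""
    return " ".join(answer.lower().strip(" .!?").split())


def answer_entropy(samples: list[str]) -> float:
    """Shannon entropy (in bits) of the distribution over normalized sampled answers.

    0.0 means every sample agreed; higher values mean more disagreement,
    which is treated here as a warning sign of possible hallucination.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(s) for s in samples)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

As the table notes, a low score is not proof of correctness: a model that repeats the same wrong answer every time scores 0.0 here.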
Several benchmarks have been developed specifically to measure and evaluate hallucination in AI systems.
| Benchmark | Description | Scale | Key features |
|---|---|---|---|
| TruthfulQA | Tests whether models avoid generating false answers to questions designed to elicit common misconceptions | 817 questions across 38 categories[3] | Targets common human misconceptions; widely used but increasingly saturated due to inclusion in training data |
| HaluEval | Provides human-annotated examples of hallucinated and factual responses for evaluation | 10,000 to 35,000 annotated examples[4] | Covers QA and dialogue formats; balanced between factual and hallucinated samples |
| FactScore | Decomposes long-form text into atomic facts and evaluates each for factual precision[5] | Variable (depends on input) | Fine-grained evaluation; identifies specific hallucinated claims within longer passages |
| HalluLens | A comprehensive benchmark for evaluating hallucination across multiple dimensions | Multi-task evaluation | Tests multiple hallucination types simultaneously; designed to address limitations of earlier benchmarks |
| Hallucinations Leaderboard | An open community effort hosted on Hugging Face to rank models by hallucination rates | Ongoing, multi-model | Combines multiple evaluation metrics; publicly accessible and regularly updated |
The original TruthfulQA paper documented an "inverse scaling" effect in which larger models were sometimes less truthful, as they more faithfully reproduced human "imitative falsehoods"; the best model tested was truthful on 58% of questions versus 94% for humans.[3] Researchers have since noted that TruthfulQA has become increasingly saturated because its questions have been incorporated into many models' training data, reducing its effectiveness as an evaluation tool.[3] Newer benchmarks like HalluLens and FactScore address some of these limitations by using more dynamic evaluation methodologies.[5]
Vectara's Hallucination Leaderboard, which measures grounded summarization hallucination, illustrates how rates have fallen for frontier models: on its updated, more challenging dataset in 2025, Gemini 2.5 Flash-Lite led at a 3.3% hallucination rate, while GPT-5 recorded a 1.4% grounded hallucination rate, down from double-digit rates common in 2023.[16]
A wide range of techniques have been developed to reduce hallucinations, though no single approach eliminates them entirely.[12] Effective mitigation typically requires combining multiple strategies.
Retrieval-augmented generation (RAG) is one of the most widely adopted mitigation strategies. RAG systems retrieve relevant documents from an external knowledge base before generating a response, grounding the model's output in specific source material.[7] Studies have shown that RAG can reduce hallucination rates by 40% to 71% compared to standalone LLMs. However, RAG is not a complete solution: poorly retrieved or irrelevant documents can amplify hallucinations, a phenomenon researchers have termed "hallucination on hallucination." The effectiveness of RAG depends heavily on the quality and relevance of the retrieval corpus.[12]
Reinforcement learning from human feedback (RLHF) trains models to align their outputs with human preferences, including preferences for factual accuracy over plausible fabrication. By having human evaluators rate model outputs and training a reward model on these ratings, RLHF can teach models to avoid confident confabulation.[12] Most leading LLMs, including GPT-4, Claude, and Gemini, use RLHF as part of their training pipeline. However, RLHF can also introduce new biases and does not guarantee factual accuracy.
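The retrieve-then-generate flow can be sketched in a few lines. The example below is a toy under stated assumptions: retrieval is plain token overlap rather than the embedding search production systems typically use, the corpus is a hypothetical three-sentence list, and the final call to a language model is omitted. The names `retrieve` and `build_grounded_prompt` are illustrative only.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return up to k passages ranked by token overlap with the query.

    Passages with zero overlap are dropped instead of being used as padding,
    because irrelevant context can make hallucination worse.
    """
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in corpus]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:k]]


def build_grounded_prompt(query: str, passages: list[str]) -> str:
    """Build a prompt that tells the model to answer only from the retrieved passages."""
    if passages:
        context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    else:
        context = "(no relevant passages found)"
    return (
        "Answer using only the numbered passages below. "
        "If they do not contain the answer, say you do not know.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical knowledge base for illustration.
CORPUS = [
    "The Eiffel Tower is in Paris.",
    "Mata v. Avianca was decided in 2023.",
    "The Louvre is a museum in Paris.",
]
```

Dropping zero-overlap passages and instructing the model to admit when the passages are silent are two simple guards against the retrieval-quality failure described above. Neither guarantees that the model will follow the instruction.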
Chain-of-thought (CoT) prompting guides models to reason through problems step-by-step before generating a final answer, which helps reduce logical errors and hallucinations in tasks requiring multi-step reasoning. Self-consistency decoding extends this approach by sampling multiple diverse reasoning paths and selecting the answer that appears most consistently across them. Research by Wang et al. (2022) demonstrated that self-consistency improves performance on arithmetic and commonsense reasoning benchmarks by significant margins, including absolute accuracy gains of 17.9 percentage points on GSM8K and 11.0 percentage points on SVAMP.[6]
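The selection step of self-consistency decoding amounts to a majority vote over final answers. The sketch below assumes the reasoning paths have already been sampled and their final answers extracted as strings. The function name and the agreement ratio it returns are illustrative choices, not taken from Wang et al.

```python
from collections import Counter


def self_consistent_answer(final_answers: list[str]) -> tuple[str, float]:
    """Pick the most frequent final answer across sampled reasoning paths.

    Returns the winning answer and the share of samples that agreed with it.
    Ties are broken in favor of the answer that was sampled first.
    """
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answer.strip() for answer in final_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)
```

For example, if five sampled paths end in `"18"`, `"18"`, `"26"`, `"18"`, and `"20"`, the function returns `("18", 0.6)`. A low agreement share can also serve as a signal that the question deserves a second look.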
The 2025 OpenAI study proposed a socio-technical fix aimed at the root cause rather than the symptom: because most leaderboards penalize "I don't know" exactly as harshly as a wrong answer, they reward confident guessing. The authors recommend modifying mainstream benchmarks to give partial credit for appropriately expressed uncertainty and to stop penalizing abstention, so that "language models can abstain when uncertain" without being scored as if they had failed.[13]
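The incentive argument can be made concrete with a little arithmetic. The sketch below compares the expected score of answering against abstaining under two grading schemes. The scheme with a wrong-answer penalty is a hypothetical illustration of abstention-aware scoring, not necessarily the exact rule the authors propose, and the function names are assumptions.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score for answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only when doing so beats abstaining, which is assumed to score 0."""
    return expected_score(p_correct, wrong_penalty) > 0.0
```

With `wrong_penalty = 0.0` (plain right-or-wrong grading), answering has positive expected score for any nonzero chance of being right, so a model that is only 20% confident should still guess. With `wrong_penalty = 1.0`, the same 20% guess has an expected score of 0.2 - 0.8 = -0.6, so abstaining is the better policy, and answering pays off only above 50% confidence.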
Constrained decoding techniques restrict the model's output space to reduce the likelihood of hallucination:
These approaches check and correct outputs after generation:
Grounding refers to anchoring model outputs in verifiable, authoritative sources of information. Key grounding approaches include:
While hallucination is overwhelmingly viewed as a problem in information-centric applications, the same generative capacity has proven valuable in creative and scientific contexts:
In these applications, the model's ability to produce outputs beyond its training data is a feature rather than a bug, enabling the exploration of design spaces that humans might not have considered.
As of 2026, hallucination remains one of the most significant unsolved problems in AI. Key observations about the current state include: