Hallucination in artificial intelligence refers to outputs generated by AI models that are fluent, plausible, and confident-sounding but factually incorrect, fabricated, or not grounded in the provided input or real-world knowledge. The term is most commonly associated with large language models (LLMs) such as GPT-4, Claude, and Gemini, but hallucinations also occur in image generation, code synthesis, machine translation, and other modalities. Hallucination is widely considered one of the most significant challenges in deploying AI systems for real-world applications, particularly in high-stakes domains such as law, medicine, and finance.
Imagine you ask a very confident friend a question, and instead of saying "I don't know," they make up an answer that sounds completely real. They say it so convincingly that you believe them. That is what AI hallucination is like. The AI does not actually "know" things the way people do. It predicts what words are likely to come next based on patterns it learned during training. Sometimes those patterns lead to answers that sound right but are totally wrong. It might invent a book that does not exist, make up fake statistics, or describe events that never happened, all while sounding perfectly sure of itself.
In the context of AI, hallucination describes the generation of content that appears coherent and plausible on the surface but is not supported by the source material, the model's input, or verifiable real-world facts. The term borrows from psychiatry, where hallucination refers to sensory perceptions that occur without external stimuli. However, the analogy is imperfect, and this has generated significant debate in the research community.
Several researchers and commentators have argued that "hallucination" is a misleading term when applied to AI systems. In clinical psychology, confabulation refers to the unintentional production of false memories or narratives to fill gaps in knowledge, which more closely mirrors what language models actually do. Since LLMs have no sensory experiences to misperceive, they cannot truly "hallucinate" in the psychiatric sense. Instead, they fill gaps in their learned patterns with plausible but fabricated content.
Usama Fayyad has called the term "misleading" and "vague," while Mary Shaw has argued that it inappropriately frames real errors as "idiosyncratic quirks." Statistician Gary N. Smith has pointed out that LLMs "do not understand what words mean" and therefore cannot be said to hallucinate in any meaningful sense. Alternative terms that have been proposed include confabulation, fabrication, bullshit (in the philosophical sense defined by Harry Frankfurt), and delusion. Despite these objections, "hallucination" has become the standard term in the field since its widespread adoption following ChatGPT's release in November 2022. Cambridge Dictionary updated its definition in 2023 to include the AI-specific meaning.
Two closely related but distinct concepts underpin the study of hallucination:
| Concept | Definition | Example |
|---|---|---|
| Faithfulness | Whether the output is consistent with the provided input or context | A summarization model adding claims not present in the source document |
| Factuality | Whether the output agrees with established real-world facts | A language model claiming that the Eiffel Tower is located in Berlin |
A model can be faithful to its input yet factually wrong (if the input itself contains errors), or factually correct yet unfaithful (if it introduces accurate information not present in the source).
Researchers have developed several taxonomies for classifying hallucinations. The most widely cited framework, established in early NLP research on abstractive summarization, distinguishes between intrinsic and extrinsic hallucinations.
| Type | Description | Example |
|---|---|---|
| Intrinsic hallucination | The generated output directly contradicts information present in the source material or input context | A summarizer states "the patient was discharged on Monday" when the source says Tuesday |
| Extrinsic hallucination | The generated output contains information that cannot be verified or refuted from the source material alone | A summarizer adds a claim about the patient's family history that is never mentioned in the source |
Intrinsic hallucinations are generally considered more harmful because they actively distort known information. Extrinsic hallucinations may sometimes be benign (adding true background knowledge) or harmful (introducing fabricated details).
Beyond the intrinsic/extrinsic framework, hallucinations also manifest differently depending on the task.
Hallucinations arise from a combination of factors related to training data, model architecture, and the inference process. No single cause fully explains the phenomenon, and in most cases, hallucinations result from multiple interacting factors.
In 2025, Anthropic published research on the internal mechanisms of Claude that shed light on how hallucinations arise at the circuit level. The researchers identified internal circuits responsible for declining to answer questions when the model lacks sufficient information. Hallucinations were observed to occur when this inhibition mechanism fails: the model recognizes a name or concept but lacks enough stored information to generate an accurate response. Rather than declining to answer, it constructs a plausible but untrue response.
While most research focuses on text generation, hallucinations occur across all generative AI modalities.
LLM hallucinations are the most widely studied form. Fabricated citations and references are among the most commonly documented manifestations.
A 2023 study published in the Cureus Journal found that of 178 references generated by GPT-3, 69 had incorrect or nonexistent DOIs, and 28 had no locatable DOI at all. Another study analyzing 115 ChatGPT-3.5 references found that 47% were entirely fabricated, 46% cited real sources but extracted incorrect information, and only 7% were fully correct.
Hallucinations in image generative models (such as DALL-E, Stable Diffusion, and Midjourney) take different forms from those in text generation.
Research on diffusion models has shown that these models interpolate between nearby data modes in their training distribution, sometimes generating samples entirely outside the support of real data. This mode interpolation phenomenon is a fundamental cause of hallucinated visual content.
Multimodal large language models (MLLMs) that process both text and images face a distinct form of hallucination called object hallucination, where the model perceives or describes objects that are absent from the input image. Studies have found that even state-of-the-art multimodal models frequently describe visual content inaccurately, particularly when prompted with leading questions about objects not present in the scene.
Code-generating models can hallucinate in several ways, most prominently by generating calls to packages, functions, or APIs that do not exist.
In neural machine translation, hallucinations manifest as translations that are fluent in the target language but bear no relation to the source text. Google researchers documented this phenomenon in 2017, noting that it was particularly common for low-resource language pairs and short or ambiguous source sentences.
Hallucinations have caused significant real-world harm across multiple domains.
The most prominent case is Mata v. Avianca, Inc. (2023), in which attorney Steven Schwartz submitted a legal brief containing six entirely fictitious case citations generated by ChatGPT, complete with fabricated docket numbers and judicial opinions. Judge P. Kevin Castel of the Southern District of New York fined Schwartz, co-counsel Peter LoDuca, and their law firm a total of $5,000. The case became a landmark example of AI hallucination risk.
By 2025, an AI hallucination case database maintained by legal researcher Damien Charlotin documented over 700 cases worldwide involving hallucinated legal content, with 324 in U.S. federal, state, and tribal courts, implicating more than 128 lawyers and two judges.
A 2024 Stanford University study by Varun Magesh, Matthew Dahl, and colleagues found that specialized legal AI tools hallucinated on at least 1 in 6 benchmark queries. Lexis+ AI produced incorrect or misgrounded responses on more than 17% of queries, while Westlaw's AI-Assisted Research hallucinated on approximately 33% of queries.
Hallucinated medical information poses serious risks to patient safety. Studies have found that AI chatbots can generate plausible but incorrect medical advice, fabricate drug interactions, or cite nonexistent clinical trials. The potential for harm is amplified because patients may lack the expertise to identify inaccurate medical claims.
In 2025, Deloitte faced scrutiny when an A$440,000 report was found to contain citations to nonexistent academic sources. Similarly, a CA$1.6 million Health Human Resources Plan included at least four false citations to fabricated research papers. These incidents highlight the risks of using AI-generated content in professional consulting without rigorous verification.
Hallucinated citations pose a threat to academic integrity. Northwestern University research found that plagiarism detectors rated AI-generated abstracts as 100% original, while AI detection tools achieved only 66% accuracy in identifying them. Human researchers performed only slightly better, identifying AI-generated text at a rate of 68%.
Detecting hallucinations is an active and challenging area of research. Methods can be broadly categorized into reference-based and reference-free approaches.
Reference-based methods compare model outputs against a trusted knowledge source (a code sketch of the NLI-based approach follows the table):
| Method | Approach | Strengths | Limitations |
|---|---|---|---|
| Fact verification | Decompose output into atomic claims and verify each against a knowledge base or retrieved documents | High precision for verifiable claims | Requires comprehensive knowledge bases; cannot verify subjective or novel claims |
| NLI-based detection | Use natural language inference models to check whether source documents entail, contradict, or are neutral toward generated claims | Scalable; works across domains | NLI models themselves can be inaccurate |
| Retrieval-based checking | Retrieve relevant documents and compare them against the generated output for consistency | Leverages up-to-date information | Depends on retrieval quality; may miss nuanced errors |
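To make the NLI-based approach concrete, the following is a minimal sketch using the Hugging Face `transformers` library and the publicly available `roberta-large-mnli` checkpoint (any NLI model could be substituted). It scores how strongly a source passage entails a generated claim; a low entailment probability flags a possible hallucination.

```python
# Minimal sketch of NLI-based hallucination detection: check whether the source
# passage entails each generated claim. Assumes the Hugging Face `transformers`
# library and the public `roberta-large-mnli` checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def entailment_probability(source: str, claim: str) -> float:
    """Return P(source entails claim) according to the NLI model."""
    inputs = tokenizer(source, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # Look up the entailment index from the model config rather than hardcoding it.
    label2id = {label.lower(): idx for idx, label in model.config.id2label.items()}
    return probs[label2id["entailment"]].item()

source = "The patient was discharged on Tuesday after an uneventful recovery."
claims = [
    "The patient was discharged on Tuesday.",  # supported by the source
    "The patient was discharged on Monday.",   # intrinsic hallucination
]
for claim in claims:
    p = entailment_probability(source, claim)
    print(f"{p:.2f}  {'supported' if p > 0.5 else 'possible hallucination'}: {claim}")
```

In practice, long outputs are first decomposed into individual claims, and the entailment check is run claim by claim against the retrieved or provided source material.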
Reference-free methods assess hallucination without access to external ground truth (a sketch of the self-consistency approach follows the table):
| Method | Approach | Strengths | Limitations |
|---|---|---|---|
| Self-consistency checking | Generate multiple responses to the same prompt and identify claims that appear inconsistently across samples | No external knowledge needed | Inconsistency does not always indicate hallucination; consistent errors are missed |
| Semantic uncertainty estimation | Measure the model's uncertainty at the semantic level across multiple sampled outputs | Can flag low-confidence generations | Computationally expensive; overconfident models may evade detection |
| Internal probe methods | Train classifiers on the model's internal activations to predict whether a given output is hallucinated | Can detect hallucinations the model "knows" are wrong | Requires access to model internals; may not generalize across models |
| SelfCheckGPT | Prompt the model to evaluate its own outputs for factual consistency without external databases | Simple to implement | Limited by the model's own knowledge and biases |
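As an illustration of the self-consistency idea, the sketch below flags sentences of an answer that are poorly supported across several stochastic resamples of the same prompt. `sample_response` is a hypothetical placeholder for any LLM sampling call, and the token-overlap heuristic stands in for the NLI or LLM-judge scoring that systems such as SelfCheckGPT use.

```python
# Sketch of self-consistency hallucination checking. A sentence that few
# resampled answers support is flagged as a possible hallucination.
import re

def sample_response(prompt: str, temperature: float = 1.0) -> str:
    raise NotImplementedError("Replace with a call to the LLM being checked.")

def consistency_score(sentence: str, samples: list[str]) -> float:
    """Average lexical support for `sentence` across the sampled responses."""
    words = set(re.findall(r"\w+", sentence.lower()))
    if not words:
        return 1.0
    overlaps = [
        len(words & set(re.findall(r"\w+", s.lower()))) / len(words)
        for s in samples
    ]
    return sum(overlaps) / len(overlaps)

def flag_possible_hallucinations(prompt: str, answer: str, n_samples: int = 5,
                                 threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (sentence, score) pairs whose support falls below `threshold`."""
    samples = [sample_response(prompt) for _ in range(n_samples)]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        score = consistency_score(sentence, samples)
        if score < threshold:
            flagged.append((sentence, score))
    return flagged
```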
Several benchmarks have been developed specifically to measure and evaluate hallucination in AI systems.
| Benchmark | Description | Scale | Key features |
|---|---|---|---|
| TruthfulQA | Tests whether models avoid generating false answers to questions designed to elicit common misconceptions | 817 questions across 38 categories | Targets common human misconceptions; widely used but increasingly saturated due to inclusion in training data |
| HaluEval | Provides human-annotated examples of hallucinated and factual responses for evaluation | 10,000 to 35,000 annotated examples | Covers QA and dialogue formats; balanced between factual and hallucinated samples |
| FactScore | Decomposes long-form text into atomic facts and evaluates each for factual precision | Variable (depends on input) | Fine-grained evaluation; identifies specific hallucinated claims within longer passages |
| HalluLens | A comprehensive benchmark for evaluating hallucination across multiple dimensions | Multi-task evaluation | Tests multiple hallucination types simultaneously; designed to address limitations of earlier benchmarks |
| Hallucinations Leaderboard | An open community effort hosted on Hugging Face to rank models by hallucination rates | Ongoing, multi-model | Combines multiple evaluation metrics; publicly accessible and regularly updated |
Researchers have noted that TruthfulQA, while historically important, has become increasingly saturated because its questions have been incorporated into many models' training data, reducing its effectiveness as an evaluation tool. Newer benchmarks like HalluLens and FactScore address some of these limitations by using more dynamic evaluation methodologies.
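In practice, measuring a model against one of these benchmarks means iterating over its questions and scoring the generated answers against the references. The sketch below assumes the TruthfulQA data is available on the Hugging Face Hub under the `truthful_qa` identifier with a `generation` configuration; field names and splits may differ between releases, and the exact-match scoring shown is far cruder than the judge models or human annotation used in published evaluations.

```python
# Rough sketch of benchmark-style evaluation. Assumes TruthfulQA is hosted on
# the Hugging Face Hub as "truthful_qa" with a "generation" configuration;
# adjust identifiers and field names to the actual release being used.
from datasets import load_dataset

dataset = load_dataset("truthful_qa", "generation", split="validation")

def evaluate(generate_answer) -> float:
    """Crude substring match against reference answers; published evaluations
    use trained judge models or human raters instead."""
    hits = 0
    for row in dataset:
        answer = generate_answer(row["question"]).strip().lower()
        if any(ref.strip().lower() in answer for ref in row["correct_answers"]):
            hits += 1
    return hits / len(dataset)
```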
A wide range of techniques have been developed to reduce hallucinations, though no single approach eliminates them entirely. Effective mitigation typically requires combining multiple strategies.
Retrieval-augmented generation is one of the most widely adopted mitigation strategies. RAG systems retrieve relevant documents from an external knowledge base before generating a response, grounding the model's output in specific source material. Studies have shown that RAG can reduce hallucination rates by 40% to 71% compared to standalone LLMs. However, RAG is not a complete solution. Poorly retrieved or irrelevant documents can actually amplify hallucinations, a phenomenon researchers have termed "hallucination on hallucination." The effectiveness of RAG depends heavily on the quality and relevance of the retrieval corpus.
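A bare-bones illustration of the RAG pattern is sketched below: a TF-IDF retriever selects context passages, and the prompt instructs the model to answer only from that context or decline. `call_llm` is a hypothetical placeholder for whatever model API is in use; production systems typically rely on dense embeddings, a vector database, and reranking rather than an in-memory index.

```python
# Minimal retrieval-augmented generation sketch: retrieve supporting passages,
# then ground the model's answer in them (or have it decline to answer).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
    "The Brandenburg Gate is an 18th-century monument in Berlin, Germany.",
]
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a call to the LLM being used.")

def answer_with_rag(question: str, top_k: int = 1) -> str:
    # Rank documents by similarity to the question and keep the top-k as context.
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    context = "\n".join(documents[i] for i in scores.argsort()[::-1][:top_k])
    # Instruct the model to stay within the retrieved context and to refuse
    # rather than guess when the context is insufficient.
    prompt = (
        "Answer using only the context below. If the context does not contain "
        f"the answer, say you do not know.\n\nContext:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return call_llm(prompt)
```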
RLHF trains models to align their outputs with human preferences, including preferences for factual accuracy over plausible fabrication. By having human evaluators rate model outputs and training a reward model on these ratings, RLHF can teach models to avoid confident confabulation. Most leading LLMs, including GPT-4, Claude, and Gemini, use RLHF as part of their training pipeline. However, RLHF can also introduce new biases and does not guarantee factual accuracy.
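For intuition, the reward model at the heart of RLHF is commonly trained with a pairwise, Bradley-Terry style preference loss, sketched below in PyTorch. This is a generic formulation rather than the training code of any particular system; `reward_model` is a placeholder for any network that maps a prompt and response to a scalar score.

```python
# Pairwise preference loss for reward-model training: the human-preferred
# ("chosen") response should receive a higher reward than the rejected one.
import torch
import torch.nn.functional as F

def preference_loss(reward_model, prompt: str, chosen: str, rejected: str) -> torch.Tensor:
    r_chosen = reward_model(prompt, chosen)      # scalar reward tensor
    r_rejected = reward_model(prompt, rejected)  # scalar reward tensor
    # Negative log-sigmoid of the reward margin; minimized when chosen >> rejected.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```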
Chain-of-thought (CoT) prompting guides models to reason through problems step-by-step before generating a final answer, which helps reduce logical errors and hallucinations in tasks requiring multi-step reasoning. Self-consistency decoding extends this approach by sampling multiple diverse reasoning paths and selecting the answer that appears most consistently across them. Research by Wang et al. (2022) demonstrated that self-consistency improves performance on arithmetic and commonsense reasoning benchmarks by significant margins, including a 17.9% improvement on GSM8K and an 11.0% improvement on SVAMP.
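A minimal sketch of self-consistency decoding follows: sample several chain-of-thought completions at a nonzero temperature, extract each final answer, and return the majority vote. `sample_cot` is a hypothetical placeholder for an LLM sampling call, and the extraction step assumes the prompt asks the model to finish with a line of the form "Answer: <value>".

```python
# Self-consistency decoding: majority vote over the final answers of several
# independently sampled chain-of-thought completions.
from collections import Counter

def sample_cot(question: str, temperature: float = 0.7) -> str:
    raise NotImplementedError("Replace with a sampling call to the LLM in use.")

def extract_final_answer(completion: str) -> str:
    # Assumes the prompt instructs the model to end with "Answer: <value>".
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistent_answer(question: str, n_paths: int = 10) -> str:
    answers = [extract_final_answer(sample_cot(question)) for _ in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]
```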
Constrained decoding techniques restrict the model's output space to reduce the likelihood of hallucination, for example by limiting generation to a formal grammar, a fixed schema, or vocabulary drawn from the source.
A complementary family of approaches checks and corrects outputs after generation rather than constraining the generation process itself.
Grounding refers to anchoring model outputs in verifiable, authoritative sources of information, most commonly through retrieval over curated corpora or integration with external tools and databases.
While hallucination is overwhelmingly viewed as a problem in information-centric applications, the same generative capacity has proven valuable in creative and scientific contexts.
In these applications, the model's ability to produce outputs beyond its training data is a feature rather than a bug, enabling the exploration of design spaces that humans might not have considered.
As of 2026, hallucination remains one of the most significant unsolved problems in AI.