Hallucination

Hallucination in artificial intelligence refers to outputs generated by AI models that are fluent, plausible, and confident-sounding but factually incorrect, fabricated, or not grounded in the provided input or real-world knowledge. The term is most commonly associated with large language models (LLMs) such as GPT-4, Claude, and Gemini, but hallucinations also occur in image generation, code synthesis, machine translation, and other modalities. Hallucination is widely considered one of the most significant challenges in deploying AI systems for real-world applications, particularly in high-stakes domains such as law, medicine, and finance.

ELI5 (Explain like I'm five)

Imagine you ask a very confident friend a question, and instead of saying "I don't know," they make up an answer that sounds completely real. They say it so convincingly that you believe them. That is what AI hallucination is like. The AI does not actually "know" things the way people do. It predicts what words are likely to come next based on patterns it learned during training. Sometimes those patterns lead to answers that sound right but are totally wrong. It might invent a book that does not exist, make up fake statistics, or describe events that never happened, all while sounding perfectly sure of itself.

Definition and terminology

In the context of AI, hallucination describes the generation of content that appears coherent and plausible on the surface but is not supported by the source material, the model's input, or verifiable real-world facts. The term borrows from psychiatry, where hallucination refers to sensory perceptions that occur without external stimuli. However, the analogy is imperfect, and this has generated significant debate in the research community.

The confabulation debate

Several researchers and commentators have argued that "hallucination" is a misleading term when applied to AI systems. In clinical psychology, confabulation refers to the unintentional production of false memories or narratives to fill gaps in knowledge, which more closely mirrors what language models actually do. Since LLMs have no sensory experiences to misperceive, they cannot truly "hallucinate" in the psychiatric sense. Instead, they fill gaps in their learned patterns with plausible but fabricated content.

Usama Fayyad has called the term "misleading" and "vague," while Mary Shaw has argued that it inappropriately frames real errors as "idiosyncratic quirks." Computer scientist Gary N. Smith has pointed out that LLMs "do not understand what words mean" and therefore cannot be said to hallucinate in any meaningful sense. Alternative terms that have been proposed include confabulation, fabrication, bullshit (in the philosophical sense defined by Harry Frankfurt), and delusion. Despite these objections, "hallucination" has become the standard term in the field since its widespread adoption following ChatGPT's release in November 2022. Cambridge Dictionary updated its definition in 2023 to include the AI-specific meaning.

Faithfulness vs. factuality

Two closely related but distinct concepts underpin the study of hallucination:

Concept	Definition	Example
Faithfulness	Whether the output is consistent with the provided input or context	A summarization model adding claims not present in the source document
Factuality	Whether the output agrees with established real-world facts	A language model claiming that the Eiffel Tower is located in Berlin

A model can be faithful to its input yet factually wrong (if the input itself contains errors), or factually correct yet unfaithful (if it introduces accurate information not present in the source).

Types of hallucination

Researchers have developed several taxonomies for classifying hallucinations. The most widely cited framework, established in early NLP research on abstractive summarization, distinguishes between intrinsic and extrinsic hallucinations.

Intrinsic vs. extrinsic hallucinations

Type	Description	Example
Intrinsic hallucination	The generated output directly contradicts information present in the source material or input context	A summarizer states "the patient was discharged on Monday" when the source says Tuesday
Extrinsic hallucination	The generated output contains information that cannot be verified or refuted from the source material alone	A summarizer adds a claim about the patient's family history that is never mentioned in the source

Intrinsic hallucinations are generally considered more harmful because they actively distort known information. Extrinsic hallucinations may sometimes be benign (adding true background knowledge) or harmful (introducing fabricated details).

Task-specific categories

Beyond the intrinsic/extrinsic framework, hallucinations manifest differently depending on the task:

Factual hallucinations: The model generates statements that contradict established real-world facts, such as inventing historical events or attributing discoveries to the wrong person.
Faithfulness hallucinations: The output deviates from or misrepresents the content of a provided source document, common in summarization and question-answering tasks.
Ungrounded hallucinations: The model produces claims that are plausible and not obviously false but cannot be verified against any available evidence.
Closed-domain hallucinations: In tasks with a defined input (such as document summarization), the output contradicts the provided context.
Open-domain hallucinations: In free-form generation tasks (such as open-ended question answering), the output is factually incorrect with respect to world knowledge.

Causes of hallucination

Hallucinations arise from a combination of factors related to training data, model architecture, and the inference process. No single cause fully explains the phenomenon, and in most cases, hallucinations result from multiple interacting factors.

Training data issues

Noisy and inaccurate data: Large-scale training corpora scraped from the internet inevitably contain errors, contradictions, outdated information, and biases. Models that internalize these inconsistencies may reproduce them during generation.
Source-reference divergence: In supervised tasks such as summarization, training pairs sometimes contain mismatches between source documents and reference summaries. Models trained on such data learn to generate content that diverges from the source.
Knowledge cutoffs: Models trained on data up to a specific date lack information about subsequent events but may still attempt to answer questions about them, producing fabricated responses.
OCR and transcription errors: Training data derived from scanned documents may contain systematic errors that introduce factual inaccuracies into the model's learned representations.

Architectural and modeling factors

Autoregressive generation: Most modern LLMs generate text one token at a time, with each token conditioned on previously generated tokens. Errors in early tokens can cascade and compound, leading to increasingly divergent outputs. This "snowball effect" makes longer outputs more prone to hallucination.
Exposure bias: During training, models are conditioned on ground-truth sequences, but during inference, they condition on their own previously generated (and potentially erroneous) tokens. This train-test mismatch contributes to hallucination.
Attention mechanism limitations: Transformer models rely on attention mechanisms to focus on relevant parts of the input. When attention weights are poorly distributed, the model may fail to properly leverage available context, leading to outputs that ignore or contradict source information.
Compression of knowledge: Neural networks compress vast amounts of information into a fixed set of parameters. This lossy compression means that some facts are stored imprecisely, leading to plausible but incorrect recall during generation.
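The snowball effect can be illustrated with a deliberately simplified calculation. The sketch below assumes every token has the same, independent chance of being wrong, which real models do not satisfy (errors are correlated, which is the point of the snowball effect). The function name and the 1% error rate are illustrative assumptions, not measured values.

```python
def error_free_probability(per_token_error: float, length: int) -> float:
    """Chance that `length` tokens contain no error at all.

    Assumption (simplification): token errors are independent and
    identically distributed with probability `per_token_error`.
    """
    if not 0.0 <= per_token_error <= 1.0:
        raise ValueError("per_token_error must be between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - per_token_error) ** length


# Hypothetical 1% per-token error rate, evaluated at growing output lengths.
for n_tokens in (10, 100, 1000):
    print(n_tokens, error_free_probability(0.01, n_tokens))
```

Under this assumption the chance of an error-free output shrinks geometrically with length, which is one way to see why longer generations are more exposed to hallucination.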

Decoding and inference factors

Sampling strategies: Decoding methods such as top-k sampling, nucleus sampling, and temperature scaling introduce randomness to promote diverse outputs. Higher randomness settings increase the likelihood of hallucination by allowing less probable (and potentially incorrect) tokens to be selected.
Likelihood vs. truthfulness misalignment: Language models are trained to maximize the likelihood of generating text that resembles the training data, not to maximize factual accuracy. A statement can be highly probable under the model's distribution yet factually wrong.
Overconfidence: Models often assign high confidence to incorrect outputs, making hallucinations difficult to detect through confidence-based filtering alone.
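To make the effect of temperature concrete, the following sketch applies temperature scaling to a small set of hypothetical logits before the softmax. The logit values and the function name are assumptions chosen for illustration; real decoders typically combine this step with top-k or nucleus truncation.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens.
logits = [4.0, 2.0, 1.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-ranked token, while higher temperatures shift mass toward lower-ranked tokens, which is how a less probable (and possibly incorrect) token can end up being sampled.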

Insights from mechanistic interpretability

In 2025, Anthropic published research on the internal mechanisms of Claude that shed light on how hallucinations arise at the circuit level. The researchers reported that declining to answer is the model's default behavior: a circuit that is active by default leads the model to say it lacks sufficient information, and features representing known entities inhibit that circuit when the model recognizes something it knows about. Hallucinations were observed when this inhibition misfires: the model recognizes a name or concept well enough to suppress the default refusal but lacks enough stored information to generate an accurate response. Rather than declining to answer, it constructs a plausible but untrue response.

Hallucination across modalities

While most research focuses on text generation, hallucinations occur across all generative AI modalities.

Text generation (LLMs)

LLM hallucinations are the most widely studied form. Common manifestations include:

Fabricated citations: Models generate references to academic papers, court cases, or books that do not exist, complete with plausible-sounding titles, authors, and publication details.
Invented facts: Models state incorrect dates, attribute quotes to the wrong people, or describe events that never occurred.
Fictional entities: Models create people, organizations, or places that have no real-world counterparts.
False numerical data: Models generate statistics, financial figures, or measurements that are entirely fabricated.

A 2023 study published in the Cureus Journal found that of 178 references generated by GPT-3, 69 had incorrect or nonexistent DOIs, and 28 had no locatable DOI at all. Another study analyzing 115 ChatGPT-3.5 references found that 47% were entirely fabricated, 46% cited real sources but extracted incorrect information, and only 7% were fully correct.

Image generation

Hallucinations in image generative models (such as DALL-E, Stable Diffusion, and Midjourney) take different forms:

Object hallucination: Generating objects or details not specified in the text prompt.
Anatomical errors: Producing images with incorrect human anatomy, such as extra fingers or distorted limbs.
Text rendering failures: Generating garbled or nonsensical text within images.
Semantic inconsistency: Producing images that contradict the intended meaning of the prompt.

Research on diffusion models has shown that these models interpolate between nearby data modes in their training distribution, sometimes generating samples entirely outside the support of real data. This mode interpolation phenomenon is a fundamental cause of hallucinated visual content.

Multimodal AI

Multimodal large language models (MLLMs) that process both text and images face a distinct form of hallucination called object hallucination, where the model perceives or describes objects that are absent from the input image. Studies have found that even state-of-the-art multimodal models frequently describe visual content inaccurately, particularly when prompted with leading questions about objects not present in the scene.

Code generation

Code-generating models can hallucinate in several ways:

Calling APIs or functions that do not exist in the specified library.
Using incorrect function signatures or parameter names.
Generating syntactically valid code that produces incorrect results.
Referencing nonexistent packages, modules, or version-specific features.
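The first and last failure modes in this list can be caught mechanically before generated code is run. The sketch below is one possible check, using only the Python standard library, that tests whether a module can be imported and whether a dotted attribute path exists on it. The helper name and the `fast_sqrt` example are hypothetical, and the check says nothing about signatures or correctness of results.

```python
import importlib


def attribute_exists(module_name: str, dotted_attr: str) -> bool:
    """Return True if `module_name` imports and exposes `dotted_attr`."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        # The package or module itself does not exist in this environment.
        return False
    for part in dotted_attr.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


print(attribute_exists("math", "sqrt"))        # real function
print(attribute_exists("math", "fast_sqrt"))   # hypothetical, made-up name
print(attribute_exists("os", "path.join"))     # nested attribute path
print(attribute_exists("not_a_real_package_xyz", "anything"))
```

A check like this only establishes that a name exists in the installed version; incorrect parameter names or logically wrong code still require tests or review.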

Machine translation

In neural machine translation, hallucinations manifest as translations that are fluent in the target language but bear no relation to the source text. Google researchers documented this phenomenon in 2017, noting that it was particularly common for low-resource language pairs and short or ambiguous source sentences.

Real-world impact

Hallucinations have caused significant real-world harm across multiple domains.

Legal

The most prominent case is Mata v. Avianca, Inc. (2023), in which attorney Steven Schwartz submitted a legal brief containing six entirely fictitious case citations generated by ChatGPT, complete with fabricated docket numbers and judicial opinions. Judge P. Kevin Castel of the Southern District of New York imposed a $5,000 penalty jointly on Schwartz, co-counsel Peter LoDuca, and their law firm. The case became a landmark example of AI hallucination risk.

By 2025, an AI hallucination case database maintained by legal researcher Damien Charlotin documented over 700 cases worldwide involving hallucinated legal content, with 324 in U.S. federal, state, and tribal courts, implicating more than 128 lawyers and two judges.

A 2024 Stanford University study by Varun Magesh, Matthew Dahl, and colleagues found that specialized legal AI tools hallucinated on at least 1 in 6 benchmark queries. Lexis+ AI produced incorrect or misgrounded responses on more than 17% of queries, while Westlaw's AI-Assisted Research hallucinated on approximately 33% of queries.

Medical

Hallucinated medical information poses serious risks to patient safety. Studies have found that AI chatbots can generate plausible but incorrect medical advice, fabricate drug interactions, or cite nonexistent clinical trials. The potential for harm is amplified because patients may lack the expertise to identify inaccurate medical claims.

Financial and business

In 2025, Deloitte faced scrutiny when an A$440,000 report was found to contain citations to nonexistent academic sources. Similarly, a CA$1.6 million Health Human Resources Plan included at least four false citations to fabricated research papers. These incidents highlight the risks of using AI-generated content in professional consulting without rigorous verification.

Academic research

Hallucinated citations pose a threat to academic integrity. Northwestern University research found that plagiarism detectors rated AI-generated abstracts as 100% original, while AI detection tools achieved only 66% accuracy in identifying them. Human researchers performed only slightly better, identifying AI-generated text at a rate of 68%.

Detection methods

Detecting hallucinations is an active and challenging area of research. Methods can be broadly categorized into reference-based and reference-free approaches.

Reference-based detection

These methods compare model outputs against a trusted knowledge source:

Method	Approach	Strengths	Limitations
Fact verification	Decompose output into atomic claims and verify each against a knowledge base or retrieved documents	High precision for verifiable claims	Requires comprehensive knowledge bases; cannot verify subjective or novel claims
NLI-based detection	Use natural language inference models to check whether source documents entail, contradict, or are neutral toward generated claims	Scalable; works across domains	NLI models themselves can be inaccurate
Retrieval-based checking	Retrieve relevant documents and compare them against the generated output for consistency	Leverages up-to-date information	Depends on retrieval quality; may miss nuanced errors
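As a rough illustration of the fact-verification row, the sketch below checks already-extracted claims against a tiny in-memory knowledge base. It assumes claims arrive as (subject, relation, object) triples and that exact string matching is good enough; both are simplifications, and the names `verify_claim`, `verify_all`, and the relation labels are hypothetical. Real systems need claim extraction, entity linking, and a far larger knowledge source.

```python
from typing import Dict, List, Tuple

Claim = Tuple[str, str, str]                 # (subject, relation, object)
KnowledgeBase = Dict[Tuple[str, str], str]   # (subject, relation) -> object


def verify_claim(kb: KnowledgeBase, claim: Claim) -> str:
    """Label one claim as supported, contradicted, or unverifiable."""
    subject, relation, obj = (part.strip().lower() for part in claim)
    known = kb.get((subject, relation))
    if known is None:
        return "unverifiable"   # the knowledge base is silent on this claim
    return "supported" if known == obj else "contradicted"


def verify_all(kb: KnowledgeBase, claims: List[Claim]) -> List[Tuple[Claim, str]]:
    return [(claim, verify_claim(kb, claim)) for claim in claims]


# Toy knowledge base with a single trusted fact.
kb = {("eiffel tower", "located_in"): "paris"}
claims = [
    ("Eiffel Tower", "located_in", "Paris"),
    ("Eiffel Tower", "located_in", "Berlin"),
    ("Eiffel Tower", "opened_in", "1889"),   # true, but absent from the toy KB
]
for claim, verdict in verify_all(kb, claims):
    print(claim, "->", verdict)
```

The third claim shows the limitation noted in the table: a statement can be correct and still come back as unverifiable when the knowledge base does not cover it.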

Reference-free detection

These methods assess hallucination without access to external ground truth:

Method	Approach	Strengths	Limitations
Self-consistency checking	Generate multiple responses to the same prompt and identify claims that appear inconsistently across samples	No external knowledge needed	Inconsistency does not always indicate hallucination; consistent errors are missed
Semantic uncertainty estimation	Measure the model's uncertainty at the semantic level across multiple sampled outputs	Can flag low-confidence generations	Computationally expensive; overconfident models may evade detection
Internal probe methods	Train classifiers on the model's internal activations to predict whether a given output is hallucinated	Can detect hallucinations the model "knows" are wrong	Requires access to model internals; may not generalize across models
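SelfCheckGPT	Sample several additional responses to the same prompt and score each sentence of the main response by how consistently the samples support it, without external databases	Simple to implement; needs no access to model internals	Requires multiple generations per query; limited by the model's own knowledge and biases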
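The sampling-based methods in this table share one core computation: measure how much the model's sampled answers disagree. The sketch below computes the entropy of answers after a crude normalization. Treating normalized string equality as "same meaning" is an assumption made for brevity; published semantic-uncertainty methods group answers with entailment models instead. The function names and the sample answers are hypothetical.

```python
import math
from collections import Counter
from typing import List


def normalize(answer: str) -> str:
    """Crude stand-in for semantic clustering: case, spacing, final period."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def answer_entropy(samples: List[str]) -> float:
    """Shannon entropy (in nats) over groups of identical normalized answers."""
    if not samples:
        raise ValueError("at least one sampled answer is required")
    counts = Counter(normalize(s) for s in samples)
    total = sum(counts.values())
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Hypothetical samples for the same prompt.
agreeing = ["Paris", "paris.", " Paris "]
disagreeing = ["Paris", "Berlin", "Lyon", "Paris"]
print(answer_entropy(agreeing))
print(answer_entropy(disagreeing))
```

Zero entropy means every sample agreed, which is necessary but not sufficient for correctness: as the table notes, a model that is consistently wrong passes this check.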

Benchmarks and evaluation

Several benchmarks have been developed specifically to measure and evaluate hallucination in AI systems.

Benchmark	Description	Scale	Key features
TruthfulQA	Tests whether models avoid generating false answers to questions designed to elicit common misconceptions	817 questions across 38 categories	Targets common human misconceptions; widely used but increasingly saturated due to inclusion in training data
HaluEval	Provides generated and human-annotated examples of hallucinated and factual responses for evaluation	35,000 examples (5,000 general user queries and 30,000 task-specific examples)	Covers QA, dialogue, and summarization formats; balanced between factual and hallucinated samples
FActScore	Decomposes long-form text into atomic facts and evaluates each for factual precision	Variable (depends on input)	Fine-grained evaluation; identifies specific hallucinated claims within longer passages
HalluLens	A comprehensive benchmark for evaluating hallucination across multiple dimensions	Multi-task evaluation	Tests multiple hallucination types simultaneously; designed to address limitations of earlier benchmarks
Hallucinations Leaderboard	An open community effort hosted on Hugging Face to rank models by hallucination rates	Ongoing, multi-model	Combines multiple evaluation metrics; publicly accessible and regularly updated

Researchers have noted that TruthfulQA, while historically important, has lost much of its discriminative power because its questions now appear in many models' training data. Newer benchmarks such as HalluLens and FActScore address some of these limitations by using more dynamic evaluation methodologies.
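The atomic-fact scoring idea behind FActScore reduces, at its core, to a precision over per-fact judgments. The sketch below assumes the decomposition into atomic facts and the supported/unsupported judgments have already been produced by some upstream step, which is not shown and is hypothetical here. It is an illustration of the arithmetic, not the benchmark's reference implementation.

```python
from typing import Sequence


def factual_precision(supported: Sequence[bool]) -> float:
    """Fraction of atomic facts judged to be supported by the knowledge source."""
    if not supported:
        raise ValueError("at least one atomic fact is required")
    return sum(supported) / len(supported)


# Hypothetical judgments for four atomic facts extracted from one passage.
judgments = [True, True, False, True]
print(factual_precision(judgments))  # 3 of 4 facts supported -> 0.75
```

Because the score is computed per fact, the unsupported entries also pinpoint which specific claims in a long passage were hallucinated.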

Mitigation strategies

A wide range of techniques have been developed to reduce hallucinations, though no single approach eliminates them entirely. Effective mitigation typically requires combining multiple strategies.

Retrieval-augmented generation (RAG)

Retrieval-augmented generation is one of the most widely adopted mitigation strategies. RAG systems retrieve relevant documents from an external knowledge base before generating a response, grounding the model's output in specific source material. Studies have shown that RAG can reduce hallucination rates by 40% to 71% compared to standalone LLMs. However, RAG is not a complete solution. Poorly retrieved or irrelevant documents can actually amplify hallucinations, a phenomenon researchers have termed "hallucination on hallucination." The effectiveness of RAG depends heavily on the quality and relevance of the retrieval corpus.
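The retrieve-then-generate flow can be sketched without any model at all. The example below ranks a toy corpus by word overlap with the query and builds a prompt that instructs the model to answer only from the retrieved passages. The Jaccard scoring, the function names, the corpus, and the prompt wording are all assumptions for illustration; production systems typically use dense embeddings and a vector index, and the final generation call is omitted.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return up to k passages ranked by Jaccard word overlap with the query."""
    q = tokenize(query)
    scored = []
    for doc in corpus:
        d = tokenize(doc)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_grounded_prompt(query: str, passages: List[str]) -> str:
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the numbered passages. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}"
    )


# Toy corpus standing in for an external knowledge base.
CORPUS = [
    "The Eiffel Tower is located in Paris, France.",
    "The Brandenburg Gate is located in Berlin, Germany.",
    "Photosynthesis converts light into chemical energy.",
]
question = "Where is the Eiffel Tower located?"
print(build_grounded_prompt(question, retrieve(question, CORPUS)))
```

Note that the second-ranked passage here is about Berlin, a small reminder of the caveat above: loosely relevant retrievals end up in the prompt and can mislead the generator.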

Reinforcement learning from human feedback (RLHF)

RLHF trains models to align their outputs with human preferences, including preferences for factual accuracy over plausible fabrication. By having human evaluators rate model outputs and training a reward model on these ratings, RLHF can teach models to avoid confident confabulation. Most leading LLMs, including GPT-4, Claude, and Gemini, use RLHF as part of their training pipeline. However, RLHF can also introduce new biases and does not guarantee factual accuracy.

Chain-of-thought and self-consistency

Chain-of-thought (CoT) prompting guides models to reason through problems step by step before generating a final answer, which helps reduce logical errors and hallucinations in tasks requiring multi-step reasoning. Self-consistency decoding extends this approach by sampling multiple diverse reasoning paths and selecting the answer that appears most consistently across them. Research by Wang et al. (2022) demonstrated that self-consistency improves performance on arithmetic and commonsense reasoning benchmarks by significant margins, including absolute accuracy gains of 17.9 percentage points on GSM8K and 11.0 percentage points on SVAMP.
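The selection step of self-consistency decoding amounts to a majority vote over the final answers of the sampled reasoning paths. The sketch below assumes the final answers have already been extracted from each sampled chain of thought; the function name and the sample answers are hypothetical, and the sampling itself is not shown.

```python
from collections import Counter
from typing import List, Tuple


def self_consistent_answer(final_answers: List[str]) -> Tuple[str, float]:
    """Return the most frequent final answer and its share of the votes."""
    if not final_answers:
        raise ValueError("at least one sampled answer is required")
    counts = Counter(a.strip() for a in final_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)


# Hypothetical final answers from five sampled reasoning paths.
sampled = ["18", "18", "20", "18", "17"]
print(self_consistent_answer(sampled))  # ('18', 0.6)
```

The vote share doubles as a rough confidence signal: a narrow majority suggests the reasoning paths disagree and the answer deserves extra scrutiny.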

Constrained decoding and structured output

Constrained decoding techniques restrict the model's output space to reduce the likelihood of hallucination:

Temperature reduction: Lowering the sampling temperature for factual tasks reduces randomness and favors more probable (and typically more accurate) outputs.
Grounded generation: Forcing the model to generate only content that can be traced back to specific source passages.
Schema-constrained output: Requiring outputs to conform to predefined schemas or templates, limiting opportunities for fabrication.
Tool integration: Allowing models to call external tools (calculators, search engines, databases) rather than relying solely on parametric knowledge.
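Schema-constrained output is often approximated by validating the model's output after generation and rejecting anything that does not fit. The sketch below does exactly that with the standard `json` module; it is a post-hoc check, not true constrained decoding, which restricts tokens during generation. The schema, field names, and function name are hypothetical.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical schema: exact set of keys and the Python type each must have.
SCHEMA = {"title": str, "year": int, "source_passage_id": int}


def validate_output(raw: str, schema: Dict[str, type]) -> Optional[Dict[str, Any]]:
    """Return the parsed object if it matches the schema exactly, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(schema):
        return None   # missing or unexpected fields
    for key, expected in schema.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) and expected is not bool:
            return None
        if not isinstance(value, expected):
            return None
    return data


good = '{"title": "Annual report", "year": 2023, "source_passage_id": 2}'
bad = '{"title": "Annual report", "year": "2023", "note": "made up"}'
print(validate_output(good, SCHEMA))
print(validate_output(bad, SCHEMA))
```

Requiring a field such as `source_passage_id` ties this technique to grounded generation: the output must name the passage it relies on, which can then be checked separately.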

Post-generation verification

These approaches check and correct outputs after generation:

Automated fact-checking pipelines: Decompose generated text into atomic claims, retrieve evidence for each claim, and flag or correct unsupported statements.
Multi-agent debate: Multiple model instances evaluate the same question and debate until reaching consensus, filtering out individually hallucinated claims.
Human-in-the-loop review: Critical applications maintain human oversight to verify AI-generated content before it reaches end users.

Fine-tuning and data curation

Instruction tuning: Training models on high-quality instruction-following datasets that emphasize accuracy over fluency.
Data deduplication and cleaning: Removing errors, contradictions, and low-quality content from training data.
Reinforcement learning from AI feedback (RLAIF): Using AI systems to evaluate and provide feedback on model outputs, scaling the alignment process beyond what is feasible with human annotators alone.

Grounding techniques

Grounding refers to anchoring model outputs in verifiable, authoritative sources of information. Key grounding approaches include:

Knowledge graph integration: Connecting language models to structured knowledge graphs (such as Wikidata) to verify facts during generation.
Citation generation: Training models to produce inline citations for their claims, enabling users to verify outputs against original sources.
Web search augmentation: Allowing models to perform real-time web searches to check facts before or during response generation.
Database-backed generation: Connecting models to structured databases for tasks involving numerical data, ensuring that statistics and figures are retrieved rather than generated from memory.
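Database-backed generation can be illustrated with an in-memory SQLite table: the figure in the answer is looked up, and when no row exists the system says so instead of producing a number. The table, its values, and the function names below are invented purely for the example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up revenue figures."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revenue (year INTEGER PRIMARY KEY, amount_usd INTEGER)")
    conn.executemany(
        "INSERT INTO revenue (year, amount_usd) VALUES (?, ?)",
        [(2022, 1200), (2023, 1500)],   # hypothetical values
    )
    return conn


def answer_revenue_question(conn: sqlite3.Connection, year: int) -> str:
    """Answer from the database only; never invent a missing figure."""
    row = conn.execute(
        "SELECT amount_usd FROM revenue WHERE year = ?", (year,)
    ).fetchone()
    if row is None:
        return f"No revenue figure is recorded for {year}."
    return f"Recorded revenue for {year}: {row[0]} USD."


conn = build_demo_db()
print(answer_revenue_question(conn, 2023))
print(answer_revenue_question(conn, 2030))
```

The explicit "no figure is recorded" branch is the important design choice: it replaces the situation in which a model would otherwise fill the gap from parametric memory.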

Positive applications of hallucination

While hallucination is overwhelmingly viewed as a problem in information-centric applications, the same generative capacity has proven valuable in creative and scientific contexts:

David Baker's laboratory used AI's capacity to generate novel molecular structures to design millions of new proteins, contributing to his 2024 Nobel Prize in Chemistry.
Caltech researchers leveraged generative AI to design novel catheter geometries that reduced bacterial contamination.
Memorial Sloan Kettering Cancer Center used AI hallucination-like processes to enhance blurry medical images, improving diagnostic capabilities.

In these applications, the model's ability to produce outputs beyond its training data is a feature rather than a bug, enabling the exploration of design spaces that humans might not have considered.

Current state and open challenges

As of 2026, hallucination remains one of the most significant unsolved problems in AI. Key observations about the current state include:

No complete solution exists. Despite significant research investment, no technique or combination of techniques fully eliminates hallucination. Researchers at multiple institutions have provided theoretical arguments suggesting that hallucination may be an inherent property of large language models that cannot be entirely resolved through scaling or training alone.
Detection remains imperfect. Current hallucination detection tools achieve useful but far from perfect accuracy, particularly for subtle or domain-specific fabrications.
Industry response is growing. The market for hallucination detection and mitigation tools grew by 318% between 2023 and 2025, reflecting enterprise urgency around the problem.
Regulatory attention is increasing. Courts, regulators, and professional standards bodies are establishing rules and guidelines around the use of AI-generated content, particularly in legal and medical contexts.
Newer models hallucinate less but still hallucinate. Each generation of frontier models tends to produce fewer hallucinations than its predecessors, but the problem has not been eliminated.

References

Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Madotto, A., & Fung, P. (2023). "Survey of Hallucination in Natural Language Generation." *ACM Computing Surveys*, 55(12), 1-38.
Maynez, J., Narayan, S., Bohnet, B., & McDonald, R. (2020). "On Faithfulness and Factuality in Abstractive Summarization." *Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics*, 1906-1919.
Lin, S., Hilton, J., & Evans, O. (2022). "TruthfulQA: Measuring How Models Mimic Human Falsehoods." *Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics*, 3214-3252.
Li, J., Cheng, X., Zhao, W. X., Nie, J.-Y., & Wen, J.-R. (2023). "HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models." *Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing*.
Min, S., Krishna, K., Lyu, X., Lewis, M., Yih, W., Koh, P. W., Iyyer, M., Zettlemoyer, L., & Hajishirzi, H. (2023). "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation." *Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing*.
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). "Self-Consistency Improves Chain of Thought Reasoning in Language Models." *arXiv preprint arXiv:2203.11171*.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." *Advances in Neural Information Processing Systems*, 33, 9459-9474.
Magesh, V., Surani, F., Dahl, M., Suzgun, M., Manning, C. D., & Ho, D. E. (2024). "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools." *arXiv preprint arXiv:2405.20362*.
Cossio, M. (2025). "A Comprehensive Taxonomy of Hallucinations in Large Language Models." *arXiv preprint arXiv:2508.01781*.
Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2025). "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions." *arXiv preprint arXiv:2311.05232*.
Rawte, V., Sheth, A., & Das, A. (2023). "A Survey of Hallucination in Large Foundation Models." *arXiv preprint arXiv:2309.05922*.
Tonmoy, S. M., Zaman, S. M., Jain, V., Rani, A., Rawte, V., Chadha, A., & Das, A. (2024). "A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models." *arXiv preprint arXiv:2401.01313*.
