HyDE (Hypothetical Document Embeddings)

Information Retrieval Natural Language Processing

Updated Jun 24, 2026

HyDE (Hypothetical Document Embeddings) is a zero-shot dense retrieval technique that, instead of searching with the user's query, first prompts an instruction-following large language model to write a hypothetical answer document for the query, then embeds that synthetic document and uses its vector to retrieve real documents. It was introduced in December 2022 by Luyu Gao, Xueguang Ma, Jimmy Lin, and Jamie Callan in the paper Precise Zero-Shot Dense Retrieval without Relevance Labels.^[1] The synthetic document is embedded with an unsupervised contrastive encoder such as Contriever, and that embedding vector drives nearest-neighbor search in the corpus.^[1] The method requires no relevance labels and no fine-tuned dual encoder, yet on TREC Deep Learning, BEIR, and Mr.TyDi benchmarks it reported nDCG@10 of 61.3 on TREC DL19 (versus 44.5 for the unsupervised baseline Contriever and 50.6 for lexical BM25) and approached supervised, in-domain fine-tuned retrievers.^[1]^[2] As the abstract summarises, "HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers across various tasks (e.g. web search, QA, fact verification) and languages."^[1] HyDE became one of the most widely adopted query-side enhancements for retrieval-augmented generation (RAG) systems, with first-party integrations in LangChain and LlamaIndex.^[3]^[4]

Background

What problem does HyDE solve?

Dense retrievers based on dual encoders, such as Dense Passage Retrieval (DPR), encode queries and passages into a shared vector space and retrieve passages by cosine or inner-product similarity.^[1] These models typically need substantial training on labelled query-document pairs (for example MS MARCO) to learn that space well, because the two encoders must agree on which surface forms of a question map to which surface forms of an answering passage.^[1] When the same encoder is moved to a new domain without further supervision, retrieval quality drops sharply, a weakness documented in the BEIR benchmark of Thakur et al., which collected nine heterogeneous retrieval tasks spanning fact verification, scientific question answering, financial QA, and argument retrieval.^[1] On many BEIR tasks, supervised dense retrievers actually underperformed lexical BM25, a result that motivated a wave of follow-up work on better zero-shot encoders.^[1]

Earlier approaches

The two main classes of pre-HyDE responses were better self-supervised encoders and synthetic-data approaches. Contriever, introduced by Izacard et al. at Meta in 2022, trained an encoder purely with contrastive objectives over unlabelled corpora using random cropping to build positive pairs.^[1]^[14] It improved zero-shot performance but still lagged fine-tuned retrievers on out-of-domain tasks. In parallel, methods such as InPars from Bonifacio et al. and Promptagator from Dai et al. used large language models to generate synthetic queries from real corpus documents, then fine-tuned a small retriever on those synthetic pairs; this trades inference-time LLM calls for offline training cost but requires per-corpus fine-tuning.^[1]

Conceptual move

The HyDE authors framed zero-shot retrieval as fundamentally hard because nothing in the encoder's training tells it which corpus passage answers a given query intent: queries and answers occupy different distributions, and an unsupervised encoder cannot bridge them.^[1] Their proposal inverted the InPars/Promptagator direction. Rather than generating queries from documents and re-training the retriever, HyDE generates documents from queries at inference time and reuses an off-the-shelf unsupervised encoder unchanged.^[1] The hard work of mapping a question to an answer text is delegated to a generic instruction-following language model such as InstructGPT, which has been trained to follow natural language instructions on broad web data, while the dense encoder is given the easier job of finding real passages near a generated text in embedding space.^[1]

When was HyDE published?

The paper was first posted to arXiv on 20 December 2022 with arXiv identifier 2212.10496^[1] and later published in the Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023, Volume 1: Long Papers) in Toronto, Canada, pages 1762 to 1777.^[2] Lead author Luyu Gao was a PhD student at Carnegie Mellon University's Language Technologies Institute under Jamie Callan and previously studied computer science at the University of Illinois at Urbana-Champaign; he has worked on retrieval, transformer pre-training, and program-aided reasoning, and was also lead author of the PAL paper on program-aided language models.^[5] Co-authors Xueguang Ma and Jimmy Lin were at the University of Waterloo's David R. Cheriton School of Computer Science; Lin is the principal investigator behind the Pyserini toolkit that HyDE's reference implementation uses for evaluation.^[5]^[7] Jamie Callan is a long-time information retrieval professor at Carnegie Mellon University's Language Technologies Institute.^[5]

How does HyDE work?

HyDE replaces the single encoding step of a dense retriever with a two-stage pipeline. The query path becomes "query, then LLM, then encoder, then index"; the corpus path is unchanged from a normal dense retriever.^[1] In the authors' own words, "HyDE first zero-shot instructs an instruction-following language model (e.g. InstructGPT) to generate a hypothetical document" that "captures relevance patterns but is unreal and may contain false details."^[1]

Pipeline

Instruction prompt: a task-specific natural language instruction is concatenated with the user query and sent to an instruction-following LLM. The reference implementation uses InstructGPT (text-davinci-003) with a prompt such as "Please write a passage to answer the question.\n\nQuestion: {query}\n\nPassage:".^[6]
Hypothetical document generation: the LLM samples one or more candidate "documents" of a few sentences each. These passages are not retrieved from the corpus and may contain factual errors or invented details; the authors call them deliberately hypothetical.^[1] In the reference setup the model is asked to write a passage of typical length and genre for the target domain (a financial article, a scientific abstract, a news story).^[6]
Document embedding: each generated document, plus optionally the original query, is encoded by an unsupervised contrastive encoder. The reference setup uses Contriever for English and mContriever for multilingual experiments.^[1] Because the corpus index was built with the same encoder, the generated text lands in the same vector space as real passages.^[1]
Vector averaging: when multiple hypothetical documents are sampled (the paper uses N = 8), their embeddings are averaged with the query embedding to form a single search vector.^[1] The query embedding is included as a regularizer that anchors the search to the surface form of the question and dampens the influence of any single bad generation.^[1]
Nearest-neighbor search: the averaged search vector is used to retrieve real passages from a corpus-side dense index built with the same encoder, typically through FAISS or Pyserini.^[7] The retrieved passages, not the hypothetical document, are the final output of the system.^[1]
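
The five steps above can be sketched end to end in a few lines. This is a minimal illustration, not the reference implementation. `toy_embed` is a hypothetical bag-of-words stand-in for Contriever (sparse rather than dense, chosen only to keep the example deterministic), `fake_generate` is a hypothetical stub in place of the LLM call, and the three-passage corpus is invented. The search vector is the mean of the query embedding and the N hypothetical-document embeddings, v = (f(q) + f(d_1) + ... + f(d_N)) / (N + 1).

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for the encoder: a unit-length bag of words."""
    counts: Dict[str, float] = defaultdict(float)
    for token in text.lower().split():
        counts[token.strip(".,?!")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def mean_vector(vectors: List[Vector]) -> Vector:
    """Average sparse vectors; a missing entry counts as zero."""
    total: Dict[str, float] = defaultdict(float)
    for vec in vectors:
        for tok, weight in vec.items():
            total[tok] += weight
    return {tok: weight / len(vectors) for tok, weight in total.items()}


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two sparse vectors."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


def hyde_search(query: str, corpus: List[str],
                generate: Callable[[str, int], List[str]],
                n: int = 8, k: int = 1) -> List[str]:
    hypothetical_docs = generate(query, n)            # steps 1-2: prompt and sample
    vectors = [toy_embed(query)]                      # query embedding as anchor
    vectors += [toy_embed(d) for d in hypothetical_docs]  # step 3: embed generations
    search_vector = mean_vector(vectors)              # step 4: average
    ranked = sorted(corpus,                           # step 5: nearest neighbours
                    key=lambda p: dot(search_vector, toy_embed(p)),
                    reverse=True)
    return ranked[:k]  # real corpus passages, never the generated text


# Invented example data.
CORPUS = [
    "Insulin is a hormone made by the pancreas that lowers blood sugar.",
    "The pancreas also secretes digestive enzymes into the small intestine.",
    "Mount Everest is the highest mountain above sea level.",
]


def fake_generate(query: str, n: int) -> List[str]:
    """Hypothetical LLM stub: returns n copies of one canned passage."""
    return ["Insulin is a hormone that lowers blood sugar levels."] * n


print(hyde_search("what does insulin do", CORPUS, fake_generate, n=2))
```

In a real deployment the stub would be replaced by an LLM call and `toy_embed` by the same encoder that built the corpus index.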

Why it works

The hypothetical document expresses the relevance pattern the user implicitly cares about. Although the generation may hallucinate specifics, the authors argue that the encoder's dense bottleneck filters out many of the incorrect details, so the resulting embedding still lands near the neighborhood of real answer passages.^[1] HyDE thus exploits the LLM's world knowledge for query understanding while delegating final grounding to the corpus index, which prevents the system from returning fabricated content directly to the user; only real corpus passages are ever retrieved.^[1]

The technique also recasts the query, which is often a short keyword string or a question fragment, into the genre, vocabulary, and length distribution of the target corpus. This brings the search vector closer to the manifold occupied by the actual passages and avoids the well-known query-document length and style mismatch that hurts naive cosine similarity search with general-purpose encoders.^[1] The authors note that, in this sense, HyDE generalises the classical query expansion strategy of pseudo-relevance feedback: instead of expanding using terms drawn from a noisy first-pass retrieval, it expands using a passage drawn from the LLM's parametric memory.^[1]

Hyperparameters

The reference implementation exposes a small number of knobs: the prompt template, the generator model, the sampling temperature (set to 0.7 in the published experiments), the number of samples n (set to 8), and the maximum number of tokens per sample (set to 512).^[9] Lowering n reduces cost and latency at some cost in retrieval quality; raising temperature increases diversity across samples but also the risk of off-topic generations.^[9]
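
To make the trade-off concrete, the knobs can be collected in a small configuration object. The sketch below is illustrative only: `HydeGenerationConfig` and its field names are hypothetical and are not the repository's API, while the default values are the ones quoted above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HydeGenerationConfig:
    """Hypothetical container for the generation knobs described above."""
    prompt_template: str = (
        "Please write a passage to answer the question."
        "\n\nQuestion: {query}\n\nPassage:"
    )
    model: str = "text-davinci-003"
    temperature: float = 0.7
    n: int = 8             # hypothetical documents sampled per query
    max_tokens: int = 512  # cap on tokens per sampled document

    def max_generated_tokens(self) -> int:
        """Upper bound on completion tokens requested for a single query."""
        return self.n * self.max_tokens


default = HydeGenerationConfig()
cheaper = replace(default, n=2)  # fewer samples: lower cost, possibly lower quality

print(default.max_generated_tokens(), cheaper.max_generated_tokens())
```

The bound is a worst case; sampled passages are usually far shorter than the cap.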

Prompt templates

The reference repository ships eight task-specific prompts, one per benchmark domain. Each prompt instructs the LLM to write a passage in the appropriate genre.^[6]

Task key	Prompt skeleton
`web_search`	"Please write a passage to answer the question."
`scifact`	"Please write a scientific paper passage to support/refute the claim."
`arguana`	"Please write a counter argument for the passage."
`trec_covid`	"Please write a scientific paper passage to answer the question."
`fiqa`	"Please write a financial article passage to answer the question."
`dbpedia_entity`	"Please write a passage to answer the question."
`trec_news`	"Please write a news passage about the topic."
`mr_tydi`	"Please write a passage in {language} to answer the question in detail."

The mr_tydi template additionally accepts a language name, which lets HyDE generate hypothetical passages in the target language before encoding them with a multilingual encoder.^[6]
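
A prompt builder over these skeletons might look like the following sketch. `PROMPT_SKELETONS` and `build_prompt` are hypothetical names, not the repository's Promptor class. The `Question:` / `Passage:` layout is taken from the web_search example quoted earlier and is applied to every task here purely for illustration; the repository's own templates may label the fields differently for some tasks.

```python
from typing import Optional

# Instruction skeletons copied from the table above.
PROMPT_SKELETONS = {
    "web_search": "Please write a passage to answer the question.",
    "scifact": "Please write a scientific paper passage to support/refute the claim.",
    "arguana": "Please write a counter argument for the passage.",
    "trec_covid": "Please write a scientific paper passage to answer the question.",
    "fiqa": "Please write a financial article passage to answer the question.",
    "dbpedia_entity": "Please write a passage to answer the question.",
    "trec_news": "Please write a news passage about the topic.",
    "mr_tydi": "Please write a passage in {language} to answer the question in detail.",
}


def build_prompt(task: str, query: str, language: Optional[str] = None) -> str:
    """Format a query into an instruction string (assumed field layout)."""
    skeleton = PROMPT_SKELETONS[task]
    if "{language}" in skeleton:
        if language is None:
            raise ValueError(f"task {task!r} needs a language name")
        skeleton = skeleton.format(language=language)
    return f"{skeleton}\n\nQuestion: {query}\n\nPassage:"


print(build_prompt("mr_tydi", "Mlima Kilimanjaro una urefu gani?", language="Swahili"))
```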

How well does HyDE perform?

Gao and colleagues evaluated HyDE on three families of benchmarks: TREC Deep Learning web search (DL19, DL20), BEIR low-resource retrieval, and Mr.TyDi multilingual retrieval. All numbers below are taken from the paper's tables and reflect HyDE with InstructGPT plus Contriever (mContriever for Mr.TyDi).^[1]

Web search (TREC DL)

On the TREC Deep Learning passage tracks built on MS MARCO, HyDE reached nDCG@10 of 61.3 on DL19 and 57.9 on DL20, compared with Contriever (44.5 / 42.1) and lexical BM25 (50.6 / 48.0). The fine-tuned supervised baseline ContrieverFT scored 62.1 / 63.2, so without any labels HyDE came within 0.8 points of it on DL19 and trailed by 5.3 points on DL20, while still beating both unsupervised baselines on each track.^[1]

Low-resource BEIR tasks (nDCG@10)

Dataset	HyDE	Contriever	BM25
SciFact	69.1	64.9	67.9
ArguAna	46.6	37.9	39.7
TREC-COVID	59.3	27.3	59.5
FiQA	27.3	24.5	23.6
DBPedia	36.8	29.2	31.8
TREC-NEWS	44.0	34.8	39.5

Across BEIR datasets, HyDE improved on the underlying Contriever encoder in every row of the table and beat BM25 on five of the six datasets; on TREC-COVID it trailed BM25 by 0.2 points (59.3 versus 59.5).^[1]
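
The per-dataset margins can be recomputed directly from the table. The short script below uses only the numbers printed above; the variable and function names are arbitrary.

```python
# (HyDE, Contriever, BM25) nDCG@10 values copied from the table above.
BEIR_NDCG10 = {
    "SciFact": (69.1, 64.9, 67.9),
    "ArguAna": (46.6, 37.9, 39.7),
    "TREC-COVID": (59.3, 27.3, 59.5),
    "FiQA": (27.3, 24.5, 23.6),
    "DBPedia": (36.8, 29.2, 31.8),
    "TREC-NEWS": (44.0, 34.8, 39.5),
}


def margins(scores):
    """Return {dataset: (HyDE - Contriever, HyDE - BM25)}, rounded to 0.1."""
    return {
        name: (round(hyde - contriever, 1), round(hyde - bm25, 1))
        for name, (hyde, contriever, bm25) in scores.items()
    }


MARGINS = margins(BEIR_NDCG10)
behind_bm25 = [name for name, (_, vs_bm25) in MARGINS.items() if vs_bm25 < 0]

for name, (vs_contriever, vs_bm25) in MARGINS.items():
    print(f"{name:<11} vs Contriever {vs_contriever:+5.1f}   vs BM25 {vs_bm25:+5.1f}")
print("HyDE behind BM25 on:", behind_bm25)
```

The largest gain over Contriever is on TREC-COVID, which is also the one dataset where BM25 stays marginally ahead.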

Multilingual retrieval (Mr.TyDi MRR@100)

Language	HyDE	mContriever	mContrieverFT
Swahili	41.7	38.3	51.2
Korean	30.6	22.3	34.2
Japanese	30.7	19.5	32.4
Bengali	41.3	35.3	42.3

On Mr.TyDi, HyDE roughly matched or approached the fully fine-tuned mContrieverFT on Korean, Japanese, and Bengali, and beat the zero-shot mContriever on every language tested, even though the prompts to InstructGPT were issued in English with a language-name slot.^[1]

Effect of the generator model

The paper additionally reports an ablation over the generator. Replacing InstructGPT with smaller or weaker generators (an unaligned GPT-3 base, or smaller instruction-tuned variants) degraded performance in line with how well each model followed the instruction to write a passage of the requested form.^[1] This is consistent with the conceptual claim that HyDE's quality is bounded by the generator's ability to produce a passage that looks like the answer should look, regardless of factual accuracy in the surface form.^[1]

Comparison with traditional dense retrievers and query expansion

The authors position HyDE against three families of baselines: lexical BM25, unsupervised dense retrievers (Contriever, mContriever), and supervised fine-tuned dense retrievers.^[1] HyDE is the strongest unsupervised method on nearly every reported dataset (the exception is TREC-COVID, where BM25 leads by 0.2 nDCG@10) and, in the authors' results, approaches supervised quality without any relevance labels.^[1] Conceptually, it sits in the lineage of classical query expansion methods such as pseudo-relevance feedback (PRF), but instead of expanding with terms from initially retrieved documents, HyDE expands by generating a full pseudo-document with an LLM before any retrieval has occurred.^[1] A close successor, Query2doc by Wang, Yang, and Wei at Microsoft (March 2023, EMNLP 2023), uses few-shot prompting rather than zero-shot, concatenates the generated pseudo-document to the original query rather than averaging embeddings, and reports 3 to 15 percent BM25 gains on MS MARCO and TREC DL, with smaller but consistent gains layered on top of supervised dense retrievers.^[8] Other LLM-in-front-of-retrieval techniques released later in 2023 include multi-query expansion (the LLM rewrites a single query as several paraphrases), step-back questioning, and decomposed sub-query generation; LlamaIndex documents HyDE alongside StepDecomposeQueryTransform under the same Query Transformations heading.^[12]

Implementation

The official implementation is the texttron/hyde GitHub repository, maintained by the texttron organisation, the same group that maintains the Tevatron retrieval toolkit.^[7] It consists mainly of Jupyter notebooks that drive Pyserini for dense indexing and FAISS for nearest-neighbor search, plus a small Python package containing the main classes; the repository licence is Apache 2.0 and as of mid-2026 it had roughly 580 stars and 40 forks.^[7]

Repository layout

The repository ships two end-to-end notebooks. hyde-demo.ipynb walks through the full pipeline on a single example query: it loads a prebuilt Contriever FAISS index, prompts InstructGPT for eight hypothetical passages, averages the embeddings, and retrieves the top results from MS MARCO.^[7] hyde-dl19.ipynb reproduces the TREC DL19 numbers from the paper using Pyserini's evaluation scripts.^[7] Setup requires installing Pyserini, downloading the Contriever index that the repository links to, and configuring an OpenAI API key through an environment variable.^[7]

Key classes

The Promptor class (in src/hyde/promptor.py) stores the eight benchmark-specific prompt templates listed above and formats a query into the full instruction string.^[6] The Generator class (in src/hyde/generator.py) is an abstract base with two production subclasses: OpenAIGenerator, which calls the OpenAI API for InstructGPT or GPT-3 family models, and CohereGenerator, which calls Cohere.^[9] Both accept configuration for model name, API key, number of samples n (default 8), max_tokens (default 512), temperature (default 0.7), top_p, frequency and presence penalties, and a wait_till_success retry flag for handling rate-limit errors from the underlying API.^[9] Retrieval itself relies on a pre-built Contriever FAISS index that the repository links from external storage, and on a small HyDE class that wires Promptor, Generator, an encoder, and a searcher into a single e2e_search call.^[7]

Citation

The repository provides the following BibTeX entry for the arXiv version of the paper:^[7]

@article{hyde,
  title  = {Precise Zero-Shot Dense Retrieval without Relevance Labels},
  author = {Luyu Gao and Xueguang Ma and Jimmy Lin and Jamie Callan},
  journal = {arXiv preprint arXiv:2212.10496},
  year   = {2022}
}

Integrations

LangChain

LangChain exposes HyDE as HypotheticalDocumentEmbedder in the Python package and HydeRetriever in the TypeScript package. The Python class wraps any Embeddings model and any BaseLanguageModel, intercepts the query-embedding call, generates hypothetical documents with an LLMChain, and returns their averaged embedding to the caller.^[3] A convenience HypotheticalDocumentEmbedder.from_llm constructor accepts a prompt key from a built-in set that mirrors the texttron prompts (web_search, sci_fact, arguana, trec_covid, fiqa, dbpedia_entity, trec_news, mr_tydi), or a fully custom PromptTemplate.^[3] Because it conforms to the Embeddings interface, the resulting object can be passed directly into any LangChain vector store (for example a Chroma or FAISS store) in place of a normal embedder. The hypothetical-document step applies when a query is embedded; corpus documents are still embedded by the wrapped base model.^[3]
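
The wrapper pattern can be illustrated without LangChain itself. In the sketch below, `HydeEmbedder`, `CharCountEmbeddings`, and `canned_llm` are hypothetical names invented for the example and are not LangChain classes; only the two method names, `embed_documents` and `embed_query`, follow the Embeddings interface. Passing corpus documents through to the base model unchanged is an assumption of this sketch.

```python
from typing import Callable, List, Sequence


class CharCountEmbeddings:
    """Hypothetical base embedder: [length, number of spaces] per text."""

    def embed_documents(self, texts: Sequence[str]) -> List[List[float]]:
        return [[float(len(t)), float(t.count(" "))] for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]


class HydeEmbedder:
    """Wraps a base embedder so that queries go through an LLM first."""

    def __init__(self, base, generate: Callable[[str], List[str]]):
        self.base = base
        self.generate = generate  # stand-in for the LLM call

    def embed_documents(self, texts: Sequence[str]) -> List[List[float]]:
        # Corpus side: delegate to the base embedder unchanged.
        return self.base.embed_documents(texts)

    def embed_query(self, text: str) -> List[float]:
        # Query side: embed the generated passages, then average them.
        docs = self.generate(text)
        vectors = self.base.embed_documents(docs)
        return [sum(column) / len(vectors) for column in zip(*vectors)]


def canned_llm(query: str) -> List[str]:
    """Hypothetical LLM stub returning two fixed 'passages'."""
    return ["aa", "bb bb"]


embedder = HydeEmbedder(CharCountEmbeddings(), canned_llm)
print(embedder.embed_query("any question"))
```

Because both methods keep the base embedder's signatures, calling code that expects an embedder does not need to know that an LLM sits behind `embed_query`.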

The JavaScript HydeRetriever (now part of @langchain/community under the oss/javascript docs site) accepts a vectorStore, an llm, a k parameter for the number of results, and an optional custom prompt template with a single {question} variable; its defaults follow the prompts from the academic paper.^[10] In code, a minimal instantiation looks like new HydeRetriever({ vectorStore, llm, k: 1 }).^[10]

LlamaIndex

LlamaIndex ships HyDE as HyDEQueryTransform under llama_index.core.indices.query.query_transform. The transform accepts an llm, an optional hyde_prompt, and an include_original flag; it is composed with a base query engine through TransformQueryEngine so that the underlying vector index receives the hypothetical answer as its embedding text instead of the raw query.^[4] A typical usage looks like:

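from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

# query_engine is an existing query engine (for example index.as_query_engine())
# and query_str is the user's question; both are assumed to be defined already.
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, hyde)
response = hyde_query_engine.query(query_str)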

LlamaIndex documents HyDE alongside StepDecomposeQueryTransform in its Query Transformations module and explicitly notes failure modes: open-ended or ambiguous queries can lead HyDE to fabricate misleading "answers" that pull retrieval off topic, and the team recommends inspecting outputs before deploying it on subjective questions.^[4] The framework's own example notebook reports that on a Paul Graham essays corpus, HyDE improved answer quality on factual queries by generating plausible content that lined up with what the essays actually said, while it could mislead on broader interpretive questions.^[4]

Other ecosystem use

Vector database vendors and tutorial sites have published their own HyDE walkthroughs that reuse the LangChain or LlamaIndex classes against backends such as Chroma, Pinecone, Weaviate, Qdrant, or Milvus.^[11] The technique has since been combined with rerankers, ColBERT late interaction, and sparse SPLADE vectors in hybrid retrieval pipelines, where HyDE handles the dense path and BM25 or SPLADE handles the lexical path before a cross-encoder rerank.^[11]^[12] Embedding-benchmark efforts such as MTEB have helped users decide which dense encoder to combine with HyDE for a given domain, since the technique inherits the underlying encoder's strengths and weaknesses.^[11]

Applications and Significance

HyDE was an early and unusually clean demonstration that LLM-generated text can substitute for missing supervision signal in information retrieval. It generalised an old idea (query expansion with pseudo documents) into a regime where the expansion is produced by a general-purpose instruction-tuned model rather than by relevance feedback over a first-pass index.^[1] For practitioners, HyDE became a near-default option to try whenever a vector store underperforms on a new domain, because it requires no labelling and no encoder fine-tuning, and it composes naturally with existing dual encoders and vector indexes.^[11]

Where HyDE helps in production RAG

In production retrieval-augmented generation stacks, HyDE is most often used when (a) the corpus is specialised (legal, biomedical, code, financial) and a general-purpose embedding model performs poorly out of the box, (b) the user queries are very short or keyword-like and the corpus passages are long and prose-like, or (c) the team lacks the labelled relevance pairs that would be needed to fine-tune the dense retriever for the domain.^[11] In these settings HyDE often closes most of the gap to a fine-tuned retriever without any labelling or training, at the cost of at least one extra LLM call per query.^[1]^[11]

Broader influence

The paper also helped popularise the broader pattern of "LLM in front of retrieval", which now spans query rewriting, multi-query expansion, hallucination-aware reranking, and step-back questioning. Within months of the HyDE release, Query2doc adapted the approach to few-shot prompting and showed gains on top of BM25,^[8] and tutorials, vendor blogs, and follow-on retrieval surveys routinely cite HyDE as the canonical instance of the technique.^[11] HyDE is also referenced in treatments of dense retrieval as an early counterexample to the assumption that zero-shot dense retrievers cannot approach supervised ones without labelled data.^[1]^[11]

Adoption in libraries and platforms

By mid-2024, HyDE was available as a built-in component in both major Python LLM frameworks (LangChain and LlamaIndex)^[3]^[4], in several vector database tutorial paths (Pinecone, Weaviate, Qdrant, Milvus, Chroma, Zilliz)^[11], and in academic toolkits including Pyserini-based notebooks that ship with the reference repository.^[7] Independent academic groups have applied the same generate-then-embed pattern to legal QA, biomedical literature search, code search, and developer support QA, sometimes branding their variant as "Adaptive HyDE" when the system chooses dynamically whether to invoke the LLM at all.^[11]

Limitations

Generator-dependent quality

The HyDE authors themselves caution that the technique inherits the failure modes of the underlying LLM. Hypothetical passages may hallucinate plausible but corpus-absent details, and the dense bottleneck only partially filters these out; performance therefore depends on whether the LLM has been exposed to enough domain-relevant text to write a useful pseudo-document.^[1] On niche corpora that fall outside the LLM's pre-training distribution (for example a private codebase, a regulatory archive, or a non-English low-resource domain), the generated passages may be generic platitudes that fail to discriminate between candidate documents in the corpus.^[1]

Ambiguous and open-ended queries

LlamaIndex's documentation flags two practical failure cases: queries that are ambiguous without context (the generated passage drifts away from the user intent, taking the embedding with it) and open-ended subjective questions (the LLM may inject bias that warps retrieval).^[4] For example, on a query like "What are the best machine learning algorithms?" the generator may write a passage anchored on a particular family of methods (decision trees, neural networks) that excludes documents about the alternatives, narrowing rather than broadening retrieval.^[4]

How fast is HyDE, and what does it cost?

Latency and cost are also issues. Each retrieval call now requires at least one LLM generation, which is far slower and more expensive than embedding a short query directly; sampling N = 8 hypothetical documents, as in the paper, multiplies the cost further.^[1] In a production setting where a normal query embedding might take a few tens of milliseconds, a single HyDE call can take a second or more even with a fast hosted model, plus the per-token cost of generation.^[1] These overheads have led some production systems to apply HyDE only on queries that the system judges retrieval-difficult, to use cheaper generators (such as a small open-weights model) for the generation step, or to cache hypothetical documents for common queries.^[11]

Knowledge leakage critique

A more fundamental critique appears in Yoon, Jung, Yoon, and Park's 2025 paper Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion. Across three fact-verification benchmarks (FEVER, SciFact, and AVeriTeC) and seven LLMs, the authors show that HyDE and Query2doc gains correlate strongly with whether the LLM's generated text contains sentences entailed by the gold evidence: in most settings over 40 percent of generated documents matched gold evidence, peaking at 83.5 percent on FEVER with GPT-4o-mini, and the techniques fell below the no-expansion baseline on claims whose answers were not in the model's training distribution.^[13] They report that "performance improvements consistently occurred for claims whose generated documents included sentences entailed by gold evidence," and argue that some reported gains may reflect knowledge leakage from pre-training rather than improved query-document alignment.^[13] This implies that HyDE may be most valuable when the generator's parametric knowledge overlaps with the corpus and least valuable on the very out-of-distribution problems that motivate zero-shot retrieval in the first place.^[13]

Tighter coupling between generator and encoder

A practical engineering downside is that HyDE introduces a runtime coupling between the LLM and the dense encoder. Changing the generator (for example moving from InstructGPT to a newer model with a different writing style) changes the distribution of hypothetical documents and can shift which corpus passages end up in the top-k. Teams that adopt HyDE typically need to re-evaluate the pipeline whenever the generator is upgraded, in addition to the usual re-evaluation when the embedding model is upgraded.^[11]

Follow-up Work

Query2doc

Liang Wang, Nan Yang, and Furu Wei at Microsoft Research released Query2doc: Query Expansion with Large Language Models on arXiv on 14 March 2023 and published it at EMNLP 2023.^[8] Query2doc differs from HyDE on three axes. First, it uses few-shot rather than zero-shot prompting; the prompt includes several real query-document pairs as in-context examples before asking the LLM to write a pseudo-document for the new query.^[8] Second, it concatenates the generated pseudo-document to the original query string rather than averaging embeddings; the concatenated string is then fed to a normal retriever, which makes the method usable for sparse BM25 as well as dense indexes.^[8] Third, the reported gains are layered on supervised retrievers in addition to BM25, with the authors reporting a 3 to 15 percent improvement over BM25 alone on MS MARCO and TREC DL.^[8]
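
The second difference is easy to see in code. The function below is a minimal sketch of string-level expansion, not the Query2doc authors' implementation; `query2doc_expand` is a hypothetical name, and the `query_repeats` parameter is an illustrative knob for upweighting the original query terms in a lexical retriever.

```python
def query2doc_expand(query: str, pseudo_doc: str, query_repeats: int = 1) -> str:
    """Concatenate the original query with an LLM-written pseudo-document.

    The result is a plain string, so it can be handed to BM25 or to a dense
    encoder; contrast this with HyDE, which averages embedding vectors.
    """
    if query_repeats < 1:
        raise ValueError("query_repeats must be at least 1")
    return " ".join([query] * query_repeats + [pseudo_doc])


expanded = query2doc_expand(
    "what does insulin do",
    "Insulin is a hormone that lowers blood sugar levels.",
)
print(expanded)
```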

Hypothetical Documents or Knowledge Leakage?

Yoon, Jung, Yoon, and Park's 2025 critique paper, discussed above, is the most cited follow-up evaluation of HyDE-style methods. It frames HyDE and Query2doc not as orthogonal retrieval techniques but as proxies for memorisation, and recommends that future evaluations measure overlap with pre-training data when claiming gains from LLM-based query expansion.^[13]

Adaptive and decomposed variants

Subsequent work has explored adaptive HyDE pipelines that invoke the generator only on queries the system flags as retrieval-difficult, with the goal of recovering the quality wins on hard queries while avoiding the cost on easy ones.^[11] LlamaIndex has separately added query decomposition transforms that, like HyDE, manipulate the query before retrieval; the two are often combined.^[12] Application-domain papers have applied HyDE-style retrieval to developer support QA, biomedical literature search, and tutoring systems, in each case noting that the generated passage's quality dominates retrieval quality.^[11]
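
A gate of this kind might be sketched as follows. The text above does not say how a system judges a query to be retrieval-difficult, so the rule used here (fall back to HyDE when the best first-pass similarity is below a threshold) is purely an assumption, as are the names `adaptive_search`, `plain_search`, `hyde_pass`, and `min_top_score`.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (passage, similarity score), best hit first


def adaptive_search(
    query: str,
    plain_search: Callable[[str], List[Hit]],
    hyde_pass: Callable[[str], List[Hit]],
    min_top_score: float = 0.5,
) -> Tuple[List[Hit], bool]:
    """Run a cheap first pass and invoke HyDE only when it looks weak.

    Returns the hits and a flag saying whether the HyDE path was used.
    """
    hits = plain_search(query)
    if hits and hits[0][1] >= min_top_score:
        return hits, False          # easy query: skip the LLM call
    return hyde_pass(query), True   # hard query: pay for generation
```

With this rule, easy queries pay only for an ordinary embedding lookup, and the threshold becomes one more parameter to tune per corpus and encoder.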

How does HyDE compare to other retrievers?

Method	Year	Mechanism	Supervision	Typical pairing
BM25	1994 (Robertson)	Lexical scoring	None	Sparse index
DPR	2020	Dual encoder, in-domain training	Supervised	FAISS
Contriever	2022	Unsupervised contrastive dual encoder	Self-supervised	FAISS
HyDE	2022	LLM hypothetical doc + unsupervised dense encoder	Zero-shot	Contriever + FAISS
Query2doc	2023	LLM pseudo-doc + concatenation with query	Few-shot	BM25 or dense retriever
ColBERT late interaction	2020 / v2 2021	Token-level MaxSim	Supervised	ColBERT index
SPLADE	2021	Learned sparse expansion	Supervised	Inverted index

HyDE is best understood as orthogonal to, rather than competing with, most of the entries in this table: it modifies only the query path and can be layered on top of any dense retriever or even on BM25 (by using the generated pseudo-document as the search string), and it composes with reranking, hybrid sparse-plus-dense pipelines, and per-domain fine-tuning.^[1]^[11]

References

[1] Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan, "Precise Zero-Shot Dense Retrieval without Relevance Labels", arXiv:2212.10496, 2022-12-20. https://arxiv.org/abs/2212.10496. Accessed 2026-06-24.
[2] Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan, "Precise Zero-Shot Dense Retrieval without Relevance Labels", Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Long Papers), ACL Anthology 2023.acl-long.99, pp. 1762-1777, 2023-07, Toronto, Canada. https://aclanthology.org/2023.acl-long.99/. Accessed 2026-06-24.
[3] LangChain, "HypotheticalDocumentEmbedder API reference", LangChain Python documentation, 2024. https://api.python.langchain.com/en/latest/chains/langchain.chains.hyde.base.HypotheticalDocumentEmbedder.html. Accessed 2026-05-21.
[4] LlamaIndex, "HyDE Query Transform Demo", LlamaIndex developer documentation, 2024. https://developers.llamaindex.ai/python/examples/query_transformations/hydequerytransformdemo/. Accessed 2026-05-21.
[5] Luyu Gao, "About", Personal website, Language Technologies Institute, Carnegie Mellon University, 2024. https://luyug.github.io/. Accessed 2026-05-21.
[6] texttron, "promptor.py", `texttron/hyde` GitHub repository, 2022. https://github.com/texttron/hyde/blob/main/src/hyde/promptor.py. Accessed 2026-05-21.
[7] texttron, "HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels", `texttron/hyde` GitHub repository README, 2022. https://github.com/texttron/hyde. Accessed 2026-06-24.
[8] Liang Wang, Nan Yang, Furu Wei, "Query2doc: Query Expansion with Large Language Models", arXiv:2303.07678, 2023-03-14 (EMNLP 2023). https://arxiv.org/abs/2303.07678. Accessed 2026-06-24.
[9] texttron, "generator.py", `texttron/hyde` GitHub repository, 2022. https://github.com/texttron/hyde/blob/main/src/hyde/generator.py. Accessed 2026-05-21.
[10] LangChain, "Hyde integration (JavaScript)", LangChain documentation, 2024. https://docs.langchain.com/oss/javascript/integrations/retrievers/hyde. Accessed 2026-05-21.
[11] Zilliz, "Better RAG with HyDE: Hypothetical Document Embeddings", Zilliz Learn, 2024. https://zilliz.com/learn/improve-rag-and-information-retrieval-with-hyde-hypothetical-document-embeddings. Accessed 2026-05-21.
[12] LlamaIndex, "Query Transformations", LlamaIndex developer documentation, 2024. https://developers.llamaindex.ai/python/framework/optimizing/advanced_retrieval/query_transformations/. Accessed 2026-05-21.
[13] Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, Kunwoo Park, "Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion", arXiv:2504.14175, 2025-04-19. https://arxiv.org/abs/2504.14175. Accessed 2026-06-24.
[14] Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave, "Unsupervised Dense Information Retrieval with Contrastive Learning", arXiv:2112.09118, Transactions on Machine Learning Research, 2022. https://arxiv.org/abs/2112.09118. Accessed 2026-06-24.
