Hybrid search

Updated Jun 27, 2026

Hybrid search is an information retrieval technique that runs a lexical (sparse) keyword retriever, typically BM25, and a semantic (dense) vector retriever in parallel against the same corpus, then fuses their two ranked result lists into a single ranking, most often with Reciprocal Rank Fusion (RRF).^[1]^[2]^[3] It exists because the two methods fail in opposite ways: lexical search matches exact terms, rare tokens, identifiers, and out-of-vocabulary words but misses synonyms and paraphrase, whereas semantic search over dense vectors captures meaning and synonymy but can be "token-blind" to rare entities and product codes, so combining them recovers relevant documents that either method alone would miss.^[4]^[12]^[13] Hybrid retrieval has become the default first-stage architecture for production retrieval-augmented generation (RAG) systems because it consistently outperforms either single modality on heterogeneous out-of-domain benchmarks such as BEIR, and it is now a first-class feature of every major vector database and search engine, including Pinecone, Weaviate, Qdrant, Vespa, Elasticsearch, OpenSearch, and Azure AI Search.^[2]^[5]^[6]^[7]^[8]^[9]^[10]^[36]

In one sentence: hybrid search = keyword search (BM25) + vector search, with the two result lists merged (usually by RRF) so you get exact-match precision and semantic recall at the same time.

ELI5: what is hybrid search?

Imagine you ask a librarian to find a book. One librarian looks only for the exact words you said ("big red dog"); another ignores your exact words and instead understands what you mean (a story about a large friendly pet). Each one finds books the other misses. Hybrid search asks both librarians at once and then blends their two stacks of books into one best list. The keyword librarian is great at catching exact names, codes, and rare words; the meaning librarian is great at catching synonyms and ideas. Together they almost always find the right book.

What is hybrid search?

Hybrid search is a family of techniques that combine the ranked outputs of a lexical (sparse) retriever, typically BM25 or a learned sparse model such as SPLADE, with those of a dense (vector) retriever based on neural embeddings or Dense Passage Retrieval (DPR)-style bi-encoders.^[1]^[2] The two systems are run in parallel against the same corpus and their result lists are merged using a fusion algorithm, most commonly Reciprocal Rank Fusion (RRF), a weighted convex combination of normalized scores, or a learning-to-rank model trained on labeled relevance data.^[3]^[4] Hybrid retrieval has become the dominant first-stage retrieval architecture for production retrieval-augmented generation systems because it consistently outperforms either single modality alone on heterogeneous out-of-domain benchmarks such as BEIR.^[2]^[5] Major vector database vendors (Pinecone, Weaviate, Qdrant, Vespa) and search engines (Elasticsearch, OpenSearch, Azure AI Search) now expose hybrid search as a first-class primitive.^[6]^[7]^[8]^[9]^[10]^[36]

Microsoft's Azure AI Search documentation gives a compact operational definition: "Hybrid search is a single query request configured for both full-text and vector queries" that "runs full-text search and vector search in parallel" and "merges results from each query by using Reciprocal Rank Fusion (RRF)."^[36]

Where did hybrid search come from?

Classical text retrieval, formalized in the 1970s and 1980s, scored documents using sparse bag-of-words representations and term-weighting schemes such as TF-IDF and BM25 (Okapi BM25), which approximate the probability that a document is relevant to a query via term frequency, inverse document frequency, and document-length normalization.^[11] BM25 dominated open-domain retrieval for two decades because it is cheap to compute, requires no training data, and matches exact terms reliably, but it is fundamentally lexical: a query for "automobile" returns no documents that only mention "car".

The introduction of pre-trained transformer encoders such as BERT enabled a different style of retrieval: queries and documents are independently encoded into low-dimensional dense vectors, and relevance is measured by inner product or cosine similarity over those vectors. Karpukhin and colleagues at Facebook AI Research showed in 2020 that such a "dual encoder" Dense Passage Retriever (DPR), trained on weakly supervised question-passage pairs, could outperform BM25 by 9 to 19 points on top-20 retrieval accuracy for Natural Questions and several other open-domain QA benchmarks.^[12] Dense retrievers handle synonymy, paraphrase, and semantic similarity gracefully but tend to be "token-blind": they can miss rare named entities, product codes, identifiers, and out-of-vocabulary terms that are decisive for relevance.^[4]^[13]

These two failure modes are largely complementary. Luan, Eisenstein, Toutanova, and Collins formalized this observation in a 2021 Transactions of the Association for Computational Linguistics paper, demonstrating both theoretically and empirically that sparse bag-of-words models have unbounded capacity for long documents whereas fixed-dimension dual encoders are capacity-limited, and that simple sparse-dense hybrids "outperform strong alternatives in large-scale retrieval".^[14] The 2021 BEIR benchmark by Thakur and colleagues then provided large-scale evidence that dense models trained on MS MARCO frequently fail to generalize zero-shot to out-of-domain corpora, whereas BM25 remains a robust baseline; hybrid systems and reranking architectures were among the strongest configurations evaluated.^[2] Together these results catalyzed the modern view of hybrid search as a default architecture for production retrieval.

The first practical hybrid systems predate the dense-retrieval revival. Rank aggregation across multiple IR systems has been studied since at least the TREC evaluations of the 1990s, where techniques such as CombSUM, CombMNZ (Fox and Shaw), and Borda count were used to fuse runs from independent retrieval systems. The 2009 RRF paper by Cormack et al. compared these score-fusion methods with rank-fusion alternatives, including Condorcet pairwise voting, and showed that the deceptively simple reciprocal-rank score consistently won on TREC collections and on the LETOR 3 learning-to-rank dataset, while needing neither training data nor per-system score normalization.^[3] This robustness is why RRF became the default fusion operator for hybrid sparse-dense pipelines a decade later, when dense retrievers entered the picture: it works out of the box even when the two systems produce wildly incomparable raw scores.

Why is hybrid search better than vector search alone?

The core argument is that lexical and semantic retrieval have complementary blind spots, so fusing them recovers relevant documents that neither finds alone. Vector search excels at conceptual similarity (matching "affordable laptop" to "budget notebook") but degrades on exact-match needs such as product codes, error identifiers, dates, and people's names, which is exactly where keyword search is strongest.^[4]^[36] Microsoft's Azure AI Search team states the trade-off directly: "Hybrid search combines the strengths of vector search and keyword search," noting that "some scenarios, such as querying over product codes, highly specialized jargon, dates, and people's names, perform better with keyword search because it can identify exact matches."^[36]

The benchmark evidence is consistent. On a 12-task subset of BEIR, Elastic's research team reported that RRF combining a learned sparse encoder (ELSER), BM25, and a dense baseline raised average nDCG@10 by 1.4 percentage points over ELSER alone and 18 percentage points over BM25 alone, and that the hybrid result was "either better or similar to BM25 alone for all test data sets".^[5] That last point is the central practical attraction: hybrid search rarely loses to a strong sparse baseline, even with no tuning, so the downside risk of adopting it is low while the upside on out-of-domain queries is large.^[5] BEIR itself showed that many dense retrievers trained on MS MARCO underperform plain BM25 when transferred to new domains, so pairing the two is a hedge against the dense model's generalization gap.^[2]

How does hybrid search work?

A hybrid search system has three logical stages: independent first-stage retrieval, score or rank fusion, and (optionally) a second-stage reranker such as a ColBERT late-interaction model or a transformer cross-encoder.

First-stage retrieval

Sparse retrieval is normally implemented over an inverted index. For BM25 the score of a document d against query q is

BM25(q, d) = sum over t in q of IDF(t) * (f(t,d) * (k1+1)) / (f(t,d) + k1 * (1 - b + b * |d|/avgdl))

with parameters typically k1 in [1.2, 2.0] and b ~ 0.75.^[11] Learned sparse models such as SPLADE replace raw term weights with weights produced by a masked language model head over the BERT vocabulary, regularized to remain sparse, so the result can still be served by an inverted index but each document is "expanded" with semantically related terms.^[15]
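The BM25 formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production scorer: the function name `bm25_scores`, the whitespace tokenization, and the toy corpus are assumptions, and the IDF uses the always-non-negative Lucene-style variant because the formula above leaves IDF unspecified.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score every tokenized document in `docs` against `query_terms`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue  # a term absent from the document contributes nothing
            # Assumed IDF variant (Lucene-style), not taken from the article.
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            length_norm = 1 - b + b * len(d) / avgdl
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "the car is fast".split(),
    "the automobile is fast and the automobile is red".split(),
    "a slow bicycle".split(),
]
scores = bm25_scores(["automobile"], docs)
```

Only the second document receives a non-zero score: the first one talks about a "car" and is invisible to a purely lexical scorer, which is the vocabulary-mismatch weakness that the dense side of a hybrid system is meant to cover.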

Dense retrieval encodes the query into a single vector and looks up the approximate nearest neighbors of that vector in a precomputed index over document embeddings, typically using algorithms such as HNSW graphs or product quantization implemented in libraries like FAISS.^[16] Common encoders include DPR, sentence-transformer models derived from Sentence-BERT, and commercial embedding APIs from providers such as OpenAI, Cohere, and Jina.
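As a rough sketch of the dense side, the snippet below performs exact (brute-force) cosine search over a handful of toy vectors. Production systems replace the linear scan with an approximate index such as HNSW; the three-dimensional "embeddings", document ids, and function names here are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dense_top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, similarity) pairs most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional vectors; real encoders emit hundreds of dimensions.
doc_vecs = {
    "budget-notebook": [0.9, 0.1, 0.0],
    "gaming-desktop": [0.2, 0.9, 0.1],
    "garden-hose": [0.0, 0.1, 0.9],
}
hits = dense_top_k([1.0, 0.2, 0.0], doc_vecs, k=2)
```

The scan costs one similarity computation per document, which is why approximate nearest-neighbor indexes are used once the corpus grows beyond what a linear pass can serve interactively.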

How does Reciprocal Rank Fusion work?

The most widely deployed fusion method is Reciprocal Rank Fusion (RRF), introduced by Gordon V. Cormack, Charles L. A. Clarke, and Stefan Büttcher in a short paper at SIGIR 2009.^[3] Microsoft describes it as "an algorithm that evaluates search scores from multiple previously ranked results to produce a single, unified result set."^[37] Given n ranked result lists R_1, ..., R_n and a small constant k (the paper uses k = 60), each document d receives a fused score

RRFscore(d) = sum over i of 1 / (k + r_i(d))

where r_i(d) is the rank of d in list i (and the contribution is zero if d does not appear in list i).^[3]^[17] Documents are then sorted in descending RRF score. The constant k softens the influence of the very top ranks: a larger k allows mid-ranked documents to accumulate weight across multiple lists. Cormack and colleagues showed on TREC tracks that RRF, with no training data and no per-system tuning, outperformed both Condorcet-style voting and a learning-to-rank baseline that was given access to relevance judgments.^[3] Vendor implementations almost universally inherit the default k = 60: Elasticsearch's rrf retriever exposes rank_constant with default 60, Weaviate's rankedFusion uses the same constant in its 1 / (rank + 60) formula, Qdrant's Query API similarly defaults to 60, and Azure AI Search recommends "a small value, such as 60".^[7]^[17]^[18]^[37]
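A minimal sketch of the RRF formula, assuming each retriever returns an ordered list of document ids with the best hit first (the function name and the toy rankings are illustrative):

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc ids; returns (doc_id, score) sorted best-first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based; documents missing from a list simply add nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

With k = 60, doc_a scores 1/61 + 1/62 ≈ 0.0325 and doc_c scores 1/63 + 1/61 ≈ 0.0323, so both outrank doc_b (1/62 ≈ 0.0161) and doc_d (1/63 ≈ 0.0159), which each appear in only one list. Raw retriever scores never enter the computation, which is why no normalization is needed.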

Convex combination of normalized scores

The alternative is to fuse raw scores rather than ranks. Because BM25 scores are unbounded positive values while cosine similarities (and inner products of unit-normalized vectors) lie in [-1, 1], the two must first be normalized onto a common scale. Common normalizations include min-max within the result window, theoretical min-max (BM25 bounded below by zero, cosine by -1), and z-score normalization. The fused score is then

score(d) = alpha * s_dense(d) + (1 - alpha) * s_sparse(d)

with alpha in [0, 1] controlling the relative weight. Sebastian Bruch, Siyu Gai, and Amir Ingber of Pinecone analyzed this family in a 2022 paper "An Analysis of Fusion Functions for Hybrid Retrieval", finding that a learned convex combination is largely agnostic to the choice of score normalization, can be tuned with very few labeled examples, and outperforms RRF on both in-domain and out-of-domain BEIR-style evaluations; the same study reported that RRF is more parameter-sensitive than previously assumed.^[19] In Pinecone's product, alpha-weighted convex combination is implemented by scaling query and document sparse and dense components before computing a single dot product, so the underlying index can serve hybrid queries at native speed.^[6]^[20]
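The convex combination can be sketched as follows, using min-max normalization within each result window. The helper names and scores are illustrative, and treating a document that is missing from one list as having a normalized score of 0 is an assumption of this sketch rather than something the cited paper prescribes.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}  # degenerate window: all scores equal
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def convex_fusion(dense, sparse, alpha=0.5):
    """score(d) = alpha * s_dense(d) + (1 - alpha) * s_sparse(d) after normalization."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    fused = {
        doc: alpha * dense_n.get(doc, 0.0) + (1 - alpha) * sparse_n.get(doc, 0.0)
        for doc in set(dense_n) | set(sparse_n)
    }
    # Sort by descending score, breaking ties by doc id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


dense = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}  # cosine similarities
sparse = {"doc_b": 12.0, "doc_c": 7.0, "doc_d": 2.0}   # BM25 scores
fused = convex_fusion(dense, sparse, alpha=0.5)
```

Here doc_b wins because it is near the top of both lists, while doc_a, the best dense hit, drops to second place because the sparse retriever never returned it. Unlike RRF, the margin by which a document leads its list affects the outcome.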

Learning-to-rank fusion

When labeled relevance data is available, the scores or ranks from sparse and dense retrievers can be treated as features and fed into a supervised learning-to-rank model (e.g., LambdaMART or a small neural ranker). This is the most flexible approach but requires per-domain training data, so it is typically used in mature search systems with large query logs rather than in zero-shot RAG pipelines.^[4]^[19]

Reranking stage

Hybrid retrieval is often paired with a second-stage neural reranker that scores the union of the top k results from each retriever using a transformer cross-encoder or a late-interaction model such as ColBERT. The reranker sees the query and candidate document jointly and therefore produces a more precise relevance score than either first-stage retriever, at the cost of one forward pass per candidate. This cascaded design (cheap hybrid recall, expensive precise reranker) is the de facto reference architecture in toolkits such as Pyserini and Haystack.^[9]^[21]

What does the benchmark evidence show?

The BEIR (Benchmarking-IR) suite, introduced by Thakur, Reimers, Rücklé, Srivastava, and Gurevych at NeurIPS 2021, comprises 18 publicly available retrieval datasets spanning fact-checking, question answering, biomedical search, financial question answering, and other domains.^[2] BEIR's central finding is that BM25 is a remarkably robust zero-shot baseline: many dense retrievers trained on MS MARCO underperform BM25 when transferred to out-of-domain corpora, while late-interaction and reranking-based models achieve the best mean nDCG@10 at substantially higher cost.^[2] Subsequent work, including hybrid retrieval studies and the analysis by Bruch et al., used BEIR as the standard testbed and reported that hybrid sparse-dense systems improve average nDCG@10 substantially over either single modality.^[5]^[19]

Elastic's research team, evaluating Elastic's Learned Sparse Encoder (ELSER), BM25, and a dense baseline on a 12-task BEIR subset, reported that RRF with k = 20 (window 1000) increased average nDCG@10 by 1.4 percentage points over ELSER alone and 18 percentage points over BM25 alone, and that a weighted linear combination calibrated on annotated data delivered a 6-point gain over ELSER alone and 24 points over BM25 alone.^[5] Importantly, the RRF result was "either better or similar to BM25 alone for all test data sets", which is the main practical attraction: hybrid search rarely loses to a strong sparse baseline, even without tuning.^[5] Comparable conclusions have been documented by OpenSearch's team, who benchmarked normalization-and-combination pipelines on BEIR and the Amazon ESCI dataset and concluded that min-max normalization plus arithmetic-mean combination yielded the best hybrid configuration in their setup.^[8]

On the MS MARCO passage ranking task, hybrid retrieval has long been a leaderboard staple. Pyserini, the Python toolkit for reproducible IR research developed in Jimmy Lin's lab at the University of Waterloo, reports reference hybrid runs combining BM25 or uniCOIL sparse signals with TCT-ColBERT-style dense encoders, and these consistently improve over either component alone.^[21] Luan and colleagues' TACL 2021 analysis showed similar gains on MS MARCO and on the Wikipedia-based Natural Questions corpus.^[14]
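The figures quoted in this section are nDCG@10 values. As a reminder of what that metric measures, here is a small sketch that assumes graded relevance labels and the linear-gain form of DCG; the function names and the toy judgments are illustrative.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain: gain at 0-based position i is divided by log2(i + 2)."""
    return sum(gain / math.log2(i + 2) for i, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """nDCG@k of a ranking given {doc_id: graded relevance}; unjudged docs count as 0."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_dcg = dcg_at_k(sorted(relevance.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


relevance = {"doc_a": 2, "doc_b": 1}
perfect = ndcg_at_k(["doc_a", "doc_b", "doc_x"], relevance)
swapped = ndcg_at_k(["doc_x", "doc_b", "doc_a"], relevance)
```

A ranking that places the relevant documents first scores 1.0, and pushing them down the list lowers the score, so a change in average nDCG@10 reflects how many relevant documents a system moves toward the top ten positions.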

Which vector databases and search engines support hybrid search?

Hybrid search is now a feature of essentially every production vector store and search engine. The implementations differ in how they index sparse and dense signals, which fusion methods they expose, and whether they perform fusion at the index layer or as a post-processing step.

System	Sparse component	Dense component	Default fusion	Notes
Pinecone	Sparse vectors (BM25 or SPLADE)	Dense vectors	Convex combination via alpha	Single sparse-dense vector, `dotproduct` metric only.^[6]^[20]
Weaviate	BM25 with configurable `k1`, `b`, tokenization	Dense vector search	`relativeScoreFusion` (default since v1.24) and `rankedFusion` (RRF with `k = 60`)	`alpha` parameter weights vector vs keyword.^[7]
Qdrant	Sparse vectors / SPLADE	Dense vectors, multi-vector (ColBERT)	RRF in Query API (since v1.10, July 2024)	Supports prefetch-then-rerank pipelines and DBSF.^[18]
Elasticsearch	BM25 (Lucene) and ELSER learned sparse	kNN with HNSW	`rrf` retriever (default `rank_constant = 60`, weighted RRF added later)	Hybrid via the retrievers API and sub-searches.^[17]^[22]
OpenSearch	BM25 (Lucene) and learned sparse	kNN with HNSW or IVF	Search-pipeline `normalization-processor` (`min_max`, `l2`) plus combination (`arithmetic_mean`, `geometric_mean`, `harmonic_mean`); RRF processor added later	Per-clause weighting.^[8]^[23]
Azure AI Search	BM25 full-text (inverted index)	kNN with HNSW or exhaustive eKNN	RRF (`1/(rank + k)`, `k = 60`), with optional vector weighting and semantic reranker	Single request combining `search` + `vectorQueries`; default returns top 50.^[36]^[37]
Vespa	`bm25`, `nativeRank` text features, `weakAnd`	`nearestNeighbor` over HNSW tensor field	User-defined ranking expressions combining text and vector features	Single ranking framework, supports `RANK` operator for retrieve-with-features.^[10]^[24]
pgvector + tsvector	PostgreSQL `tsvector` / `tsquery` full-text search	pgvector cosine / inner product	Application-level RRF or weighted sum	Hybrid runs inside Postgres; commonly combined via RRF in a single SQL query.^[25]

Pinecone

Pinecone introduced hybrid search built on a single "sparse-dense" vector type that requires the index to use the dotproduct distance metric.^[6]^[26] The official documentation describes sparse vectors with a very large number of dimensions but only a small proportion of non-zero values, one per vocabulary token, and recommends BM25 or SPLADE as the encoder.^[26] To merge signals the system uses a convex combination governed by an alpha parameter; the helper function hybrid_score_norm multiplies dense values by alpha and sparse values by (1 - alpha) before the dot product so the index itself computes the weighted score with no extra latency.^[20] Pinecone now also supports separate sparse and dense indexes with fusion at query time for users who prefer to manage the two modalities independently.^[20]
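The reason this scaling trick works is that the dot product is linear, so weighting the vector components up front gives the same result as weighting the two similarity scores afterwards. The sketch below illustrates the idea in plain Python; `scale_hybrid_query` is a hypothetical stand-in for the helper described above, it scales only the query side, and representing sparse vectors as `{index: value}` dictionaries is an assumption made for brevity rather than Pinecone's wire format.

```python
def scale_hybrid_query(dense, sparse, alpha):
    """Hypothetical helper: weight dense values by alpha and sparse values by 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {idx: v * (1 - alpha) for idx, v in sparse.items()}
    return scaled_dense, scaled_sparse


def dot_dense(u, v):
    return sum(a * b for a, b in zip(u, v))


def dot_sparse(u, v):
    # Only indices present in both sparse vectors contribute.
    return sum(val * v.get(idx, 0.0) for idx, val in u.items())


query_dense, query_sparse = [0.5, 0.5], {10: 2.0, 42: 1.0}
doc_dense, doc_sparse = [0.8, 0.2], {42: 3.0, 77: 1.0}
alpha = 0.75

# One combined dot product over the pre-scaled query...
q_dense, q_sparse = scale_hybrid_query(query_dense, query_sparse, alpha)
single_pass = dot_dense(q_dense, doc_dense) + dot_sparse(q_sparse, doc_sparse)

# ...equals the explicit convex combination of the two separate scores.
explicit = (alpha * dot_dense(query_dense, doc_dense)
            + (1 - alpha) * dot_sparse(query_sparse, doc_sparse))
```

Because the weighting is folded into the query vector, the index can score a hybrid query with its ordinary dot-product machinery, and alpha can be changed per query without reindexing.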

Weaviate

Weaviate exposes a hybrid search operator that runs BM25 and vector search in parallel, then applies one of two fusion algorithms.^[7] Since v1.24 the default is relativeScoreFusion, which scales the highest score in each modality to 1 and the lowest to 0, preserving magnitude information; Weaviate made it the default because it "retains more information from the original searches than rankedFusion, which only retains the rankings".^[7] The alternative rankedFusion is exactly RRF with k = 60 (score = 1 / (rank + 60)).^[7] An alpha parameter in [0, 1] weights the contributions, with alpha = 0.5 as the default; alpha = 1.0 is pure vector search and alpha = 0.0 is pure BM25. Weaviate keeps the full BM25 surface area (custom tokenization, stopwords, k1, b, AND/OR operator) so hybrid mode behaves like a superset of the BM25 endpoint.^[7]

Qdrant

Qdrant added a unified Query API in version 1.10 (July 2024) that can express hybrid retrieval as a single nested query, with prefetch sub-queries for sparse and dense vectors and a fusion stage on top.^[18] The built-in fusion is RRF; Qdrant also documents Relative Score Fusion (RSF) and Distribution-Based Score Fusion (DBSF) as alternatives that practitioners can apply.^[27] The same API natively supports reranking flows ("prefetch with sparse, rerank with ColBERT-style multi-vectors") and Matryoshka-style nested embedding queries.^[18]

Elasticsearch and OpenSearch

Elasticsearch added native RRF in the 8.x line, first via a sub-searches RRF construct and then via the retrievers API; the rrf retriever takes a rank_constant (default 60) and rank_window_size (default 100).^[17] A weighted variant published by Elastic Search Labs lets each retriever contribute with its own weight: rrf_score = w1 * rrf_1 + w2 * rrf_2, useful when one signal is known to be stronger than another.^[28] OpenSearch implements hybrid search through a search-pipeline normalization-processor that intercepts query-phase scores from each clause, normalizes them (min_max or l2) and combines them (arithmetic_mean, geometric_mean, or harmonic_mean) with per-clause weights.^[23]^[8] An RRF processor was added more recently to support rank-based fusion alongside the score-based pipeline.^[8]
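The weighted variant, rrf_score = w1 * rrf_1 + w2 * rrf_2, is a one-line change to plain RRF. The following is an illustrative sketch in plain Python, not Elasticsearch's implementation; the function name and document ids are made up.

```python
def weighted_rrf(rankings, weights, k=60):
    """Weighted RRF: each list contributes weight / (k + rank) per document."""
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Sort by descending score, breaking ties by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical = ["sku-123", "guide", "faq"]
semantic = ["guide", "faq", "sku-123"]

equal = weighted_rrf([lexical, semantic], [1.0, 1.0])
lexical_heavy = weighted_rrf([lexical, semantic], [3.0, 1.0])
```

With equal weights the document ranked well by both lists ("guide") comes first; tripling the lexical weight promotes the exact-match hit ("sku-123") to the top, which is the intended effect when one signal is known to be more trustworthy for a given workload.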

Azure AI Search

Azure AI Search (formerly Azure Cognitive Search) implements hybrid search as a single query request that carries both a search (full-text) parameter and one or more vectorQueries, executed in parallel and merged with RRF.^[36]^[37] Full-text matches are scored with BM25, while vector matches use HNSW or exhaustive eKNN with the configured similarity metric; the fused @search.score is computed as 1/(rank + k) summed across queries, with Microsoft recommending "a small value, such as 60" for k.^[37] By default a hybrid query returns the top 50 results of the unified set, and individual vector queries can be weighted to increase or decrease their influence before fusion.^[37] An optional semantic ranker (a machine-reading-comprehension reranker producing a separate @search.rerankerScore in the range 0.00 to 4.00) can be layered on top, and Microsoft reports that "hybrid retrieval with semantic ranker offers significant benefits in search relevance" on its benchmark testing.^[36]^[37]

Vespa

Vespa, originally an internal Yahoo system released as open source in 2017, treats text matching and vector search as features in a single ranking pipeline rather than as separate operators.^[10] A hybrid query combines the weakAnd and nearestNeighbor retrieval operators, and the ranking profile assembles arbitrary expressions over text features (bm25(title), nativeRank(content)) and vector features (closeness(field, embedding), cosine distance).^[10]^[24] Because the ranking expression is user-defined, fusion strategies in Vespa range from simple weighted sums to multi-phase ranking with a GBDT or transformer model in the second phase.^[24]

pgvector and tsvector in PostgreSQL

PostgreSQL can run hybrid search entirely in-database by combining its built-in tsvector / tsquery full-text search with the pgvector extension, which adds a vector data type and HNSW or IVFFlat indexing.^[25] A common pattern is to run both searches in a single SQL statement using WITH clauses (common table expressions), then merge with RRF: each subquery produces a ranked list, the outer query computes 1.0 / (k + rank) for each list, sums the contributions per document, and orders by the sum.^[25] This approach is attractive because the entire pipeline stays inside a single transactional database without a separate vector store.
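A sketch of what such a statement can look like, held in a Python string so it can be handed to a database driver. Everything schema-related is hypothetical: the `chunks` table, its `tsv` (tsvector) and `embedding` (vector) columns, the `'english'` text-search configuration, and the named-parameter style are assumptions, and the query is not executed here.

```python
# Hypothetical schema: chunks(id, content, tsv tsvector, embedding vector).
# `<=>` is pgvector's cosine-distance operator; smaller means more similar.
HYBRID_RRF_SQL = """
WITH semantic AS (
    SELECT id, RANK() OVER (ORDER BY embedding <=> %(qvec)s::vector) AS rank
    FROM chunks
    ORDER BY embedding <=> %(qvec)s::vector
    LIMIT %(per_list)s
),
keyword AS (
    SELECT id, RANK() OVER (ORDER BY ts_rank_cd(tsv, query) DESC) AS rank
    FROM chunks, plainto_tsquery('english', %(qtext)s) query
    WHERE tsv @@ query
    ORDER BY ts_rank_cd(tsv, query) DESC
    LIMIT %(per_list)s
)
SELECT COALESCE(semantic.id, keyword.id) AS id,
       COALESCE(1.0 / (%(k)s + semantic.rank), 0.0)
     + COALESCE(1.0 / (%(k)s + keyword.rank), 0.0) AS rrf_score
FROM semantic
FULL OUTER JOIN keyword ON semantic.id = keyword.id
ORDER BY rrf_score DESC
LIMIT %(limit)s;
"""


def hybrid_params(query_text, query_embedding, k=60, per_list=20, limit=5):
    """Build the parameter mapping; the embedding is sent as a '[x,y,...]' literal."""
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    return {"qtext": query_text, "qvec": vector_literal,
            "k": k, "per_list": per_list, "limit": limit}


params = hybrid_params("reset password", [0.1, 0.2])
```

The FULL OUTER JOIN keeps documents that appear in only one of the two lists, and COALESCE turns their missing rank into a zero contribution, matching the RRF definition. With a driver that accepts named `%(name)s` placeholders, the statement would be run as something like `cursor.execute(HYBRID_RRF_SQL, params)`.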

How is hybrid search used in RAG frameworks?

Modern RAG orchestration frameworks treat hybrid retrieval as a standard component.

LangChain

LangChain provides an EnsembleRetriever that accepts a list of retrievers (typically a BM25Retriever and a vector-store retriever) plus optional per-retriever weights, and fuses their results using Reciprocal Rank Fusion with a constant c (default 60).^[29] An alpha-style weighting is achieved through the weights argument: a [0.6, 0.4] configuration assigns 60% influence to the first retriever and 40% to the second.^[29]

LlamaIndex

LlamaIndex composes hybrid retrieval by combining a VectorIndexRetriever with a BM25Retriever (or a vendor-native hybrid like Pinecone's sparse-dense queries) and merging results.^[30] For vendors that expose an alpha parameter (e.g., Weaviate), LlamaIndex documents an "alpha tuning" recipe in which alpha is treated as a hyperparameter to be swept against the developer's own evaluation set.^[31]
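Treating alpha as a hyperparameter amounts to a small grid search against labeled queries. The sketch below is framework-independent plain Python rather than LlamaIndex code: the fusion is a min-max convex combination, the metric is recall@1, and the two-query evaluation set (one exact-code query, one paraphrase query) is invented for illustration.

```python
def min_max(scores):
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(dense, sparse, alpha):
    """Rank doc ids by alpha * dense + (1 - alpha) * sparse on normalized scores."""
    d, s = min_max(dense), min_max(sparse)
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
             for doc in set(d) | set(s)}
    return [doc for doc, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


def recall_at_k(ranked, relevant, k):
    return len(set(ranked[:k]) & relevant) / len(relevant)


def sweep_alpha(eval_set, alphas, k=1):
    """Mean recall@k over the evaluation set for each candidate alpha."""
    results = {}
    for alpha in alphas:
        recalls = [recall_at_k(fuse(q["dense"], q["sparse"], alpha), q["relevant"], k)
                   for q in eval_set]
        results[alpha] = sum(recalls) / len(recalls)
    return results


eval_set = [
    {   # exact product-code query: the keyword side knows the answer
        "dense": {"sku": 0.70, "other": 0.80, "noise": 0.20},
        "sparse": {"sku": 9.0, "other": 1.0, "noise": 0.5},
        "relevant": {"sku"},
    },
    {   # paraphrase query: the vector side knows the answer
        "dense": {"para": 0.90, "lex": 0.50, "noise": 0.10},
        "sparse": {"para": 3.0, "lex": 4.0, "noise": 0.0},
        "relevant": {"para"},
    },
]
results = sweep_alpha(eval_set, alphas=[0.0, 0.5, 1.0])
best_alpha = max(results, key=results.get)
```

On this toy set pure keyword search (alpha = 0.0) and pure vector search (alpha = 1.0) each answer only one of the two queries, while alpha = 0.5 answers both. A real sweep would use a finer grid and a held-out set, since the best alpha depends on the corpus and the query mix.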

Haystack

deepset's Haystack framework treats retrievers as composable pipeline nodes. A canonical hybrid pipeline wires an InMemoryBM25Retriever and an InMemoryEmbeddingRetriever (or their Elasticsearch / Weaviate / OpenSearch / Azure AI Search analogues) into a DocumentJoiner that performs RRF or weighted joining, and optionally appends a transformer ranker for second-stage scoring.^[21]^[32] Dedicated wrappers such as WeaviateHybridRetriever and AzureAISearchHybridRetriever push the fusion down into the vendor where supported.^[33]^[34]

What is hybrid search used for?

Hybrid search is widely used for:

Open-domain question answering and RAG, where queries mix natural-language phrasing (favoring dense retrieval) with exact entities, codes, or jargon (favoring sparse).
Enterprise search over heterogeneous corpora containing acronyms, product SKUs, error codes, and license identifiers that pure semantic systems regularly mishandle.
E-commerce search, where brand names, model numbers, and attribute filters interact with semantic intent ("affordable noise-cancelling headphones for travel"). The Amazon ESCI shopping queries dataset is one of the standard hybrid-search benchmarks.^[8]
Biomedical and legal retrieval, where domain-specific terminology and statute identifiers must be matched exactly, but synonymy and paraphrase remain important. BEIR includes TREC-COVID, NFCorpus, and SciFact precisely to expose how badly dense models trained on MS MARCO can transfer to such domains.^[2]
Code search, where identifier tokens and function names benefit from BM25 while natural-language docstrings benefit from dense encoders.

A worked example

A typical RAG pipeline backed by hybrid search looks roughly as follows. At indexing time the corpus is chunked, each chunk is sent through both a sparse encoder (BM25 statistics or a SPLADE model from Hugging Face) and a dense encoder (a Sentence-BERT variant or a hosted embedding API from OpenAI, Cohere, or Jina), and the sparse and dense representations are written to the inverted index and the vector store respectively. At query time the same encoders convert the user's natural-language query into a sparse query vector and a dense query vector. The vector store retrieves the top k_dense dense matches; the inverted index retrieves the top k_sparse sparse matches (these cutoffs are unrelated to the BM25 parameter k1). The two lists are fused by RRF or convex combination, the resulting top n candidates are passed through a re-ranking cross-encoder, and the final top few are concatenated into the LLM prompt. Most of this orchestration is handled by LangChain, LlamaIndex, or Haystack with minimal user code, and modern vector stores increasingly perform the fusion step server-side so the client only sees a single ranked list.
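The query-time half of this flow can be sketched end to end with stand-ins for the expensive parts. In the snippet below the keyword retriever is a simple token-overlap ranker, while the dense retriever and the reranker return hard-coded results in place of real models; all names, chunk texts, and scores are invented for illustration.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over lists of chunk ids; returns ids best-first."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return [cid for cid, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def hybrid_retrieve(query, sparse_retriever, dense_retriever, reranker,
                    k_sparse=3, k_dense=3, top_n=3, final=2):
    sparse_hits = sparse_retriever(query)[:k_sparse]       # lexical candidates
    dense_hits = dense_retriever(query)[:k_dense]          # semantic candidates
    candidates = rrf([sparse_hits, dense_hits])[:top_n]    # fusion
    reranked = sorted(candidates, key=lambda cid: reranker(query, cid), reverse=True)
    return reranked[:final]                                # chunks for the prompt


chunks = {
    "c1": "Error E42 means the disk is full",
    "c2": "Free up storage when the drive runs out of space",
    "c3": "How to change your avatar",
}


def keyword_retriever(query):
    """Rank chunks by shared lowercase tokens, dropping chunks with no overlap."""
    q = set(query.lower().split())
    overlap = {cid: len(q & set(text.lower().split())) for cid, text in chunks.items()}
    return [cid for cid, n in sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0])) if n]


def toy_dense_retriever(query):
    return ["c2", "c1", "c3"]  # stand-in for an ANN lookup over embeddings


def toy_reranker(query, chunk_id):
    return {"c1": 0.9, "c2": 0.8, "c3": 0.1}[chunk_id]  # stand-in for a cross-encoder


top_chunks = hybrid_retrieve("what does error E42 mean", keyword_retriever,
                             toy_dense_retriever, toy_reranker)
context = "\n".join(chunks[cid] for cid in top_chunks)
```

The exact-match chunk c1 is found only because the keyword side matched the token "E42", while the paraphrased chunk c2 comes from the dense side; after fusion and reranking both end up in the prompt context.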

What are the limitations of hybrid search?

Hybrid search is not free. Operationally, the system must maintain two indexes (or a hybrid index that supports both), pay query-time latency for both retrievers, and tune at least one fusion parameter (alpha, k, or per-modality weights). The Pinecone fusion-analysis paper explicitly documents that RRF, contrary to its "tuning-free" reputation, can be parameter-sensitive when the two underlying systems differ in calibration, and recommends a small labeled set to tune a convex combination instead.^[19]

The benchmark literature also contains caveats. Lassance and colleagues highlighted in 2023 that MS MARCO comes in subtly different "preprocessed" variants and that head-to-head comparisons in the literature have sometimes compared systems trained or evaluated on different versions, leading to overstated improvements.^[35] BEIR's own conclusion is that dense retrievers underperform BM25 in many out-of-domain settings, so a hybrid that simply sums an under-trained dense run with BM25 can recover most, but not all, of the lost ground without addressing the underlying generalization gap.^[2]

Finally, hybrid retrieval does not solve the problem of bad documents in the index. If a corpus contains contradictory facts or stale text, returning more of them via hybrid recall amplifies the downstream burden on the LLM-side reranker or generator. The cascaded design with a precise neural reranker is therefore standard in production RAG.

Hybrid search sits at the intersection of several active research areas:

BM25 and TF-IDF remain the sparse baselines against which all dense and learned-sparse models are compared.^[11]
SPLADE and other learned sparse models (uniCOIL, DeepImpact) keep the inverted-index serving stack but learn term weights, blurring the sparse/dense distinction.^[15]
Dense Passage Retrieval (DPR) established the modern dual-encoder paradigm that powers the dense side of most hybrids.^[12]
ColBERT late-interaction retrieval is increasingly used as a second-stage reranker on top of hybrid first-stage retrieval.
Re-ranking with transformer cross-encoders is the standard precision step after hybrid recall.
Semantic search is the broader umbrella that hybrid search refines by reintroducing lexical signals.
Vector database systems and the broader vector databases ecosystem provide the infrastructure for the dense side of hybrid pipelines.

References

1. Stephen Robertson and Hugo Zaragoza, "The Probabilistic Relevance Framework: BM25 and Beyond", *Foundations and Trends in Information Retrieval*, 2009-04-09. https://www.staff.city.ac.uk/~sbrp622/papers/foundations_bm25_review.pdf. Accessed 2026-05-21.
2. Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych, "BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models", arXiv:2104.08663 (NeurIPS 2021 Datasets and Benchmarks), 2021-10-21. https://arxiv.org/abs/2104.08663. Accessed 2026-05-21.
3. Gordon V. Cormack, Charles L. A. Clarke, Stefan Büttcher, "Reciprocal Rank Fusion outperforms Condorcet and Individual Rank Learning Methods", *Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval*, 2009-07-19. https://dl.acm.org/doi/10.1145/1571941.1572114. Accessed 2026-05-21.
4. Helain Zimmermann, "Hybrid Search: Combining Dense and Sparse Retrieval", helain-zimmermann.com, 2024. https://helain-zimmermann.com/blog/hybrid-search-combining-dense-and-sparse-retrieval. Accessed 2026-05-21.
5. Quentin Herreros et al., "Improving information retrieval in the Elastic Stack: Hybrid retrieval", Elastic Search Labs, 2023. https://www.elastic.co/search-labs/blog/improving-information-retrieval-elastic-stack-hybrid. Accessed 2026-05-21.
6. Pinecone, "Understanding hybrid search", Pinecone Docs. https://docs.pinecone.io/guides/data/understanding-hybrid-search. Accessed 2026-05-21.
7. Weaviate, "Hybrid search", Weaviate Documentation. https://docs.weaviate.io/weaviate/concepts/search/hybrid-search. Accessed 2026-06-27.
8. OpenSearch Project, "Building effective hybrid search in OpenSearch: Techniques and best practices", OpenSearch Blog, 2024. https://opensearch.org/blog/building-effective-hybrid-search-in-opensearch-techniques-and-best-practices/. Accessed 2026-05-21.
9. deepset, "Creating a Hybrid Retrieval Pipeline", Haystack Tutorial 33. https://haystack.deepset.ai/tutorials/33_hybrid_retrieval. Accessed 2026-05-21.
10. Vespa, "Hybrid Text Search Tutorial", Vespa Documentation. https://docs.vespa.ai/en/learn/tutorials/hybrid-search.html. Accessed 2026-05-21.
11. Stephen Robertson, Hugo Zaragoza, Michael Taylor, "Simple BM25 extension to multiple weighted fields", Microsoft Research, in *Proceedings of CIKM 2004*, 2004. https://www.microsoft.com/en-us/research/publication/simple-bm25-extension-to-multiple-weighted-fields/. Accessed 2026-05-21.
12. Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih, "Dense Passage Retrieval for Open-Domain Question Answering", arXiv:2004.04906 (EMNLP 2020), 2020-09-30. https://arxiv.org/abs/2004.04906. Accessed 2026-05-21.
13. Qdrant, "What is a Sparse Vector? How to Achieve Vector-based Hybrid Search", Qdrant Articles. https://qdrant.tech/articles/sparse-vectors/. Accessed 2026-05-21.
14. Yi Luan, Jacob Eisenstein, Kristina Toutanova, Michael Collins, "Sparse, Dense, and Attentional Representations for Text Retrieval", *Transactions of the Association for Computational Linguistics* 9:329-345, 2021-04-12. https://aclanthology.org/2021.tacl-1.20/. Accessed 2026-05-21.
15. Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant, "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking", *Proceedings of SIGIR 2021* (also SPLADE v2, arXiv:2109.10086), 2021-07-11. https://arxiv.org/abs/2109.10086. Accessed 2026-05-21.
16. Yu. A. Malkov, D. A. Yashunin, "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs", arXiv:1603.09320, 2018-08-14. https://arxiv.org/abs/1603.09320. Accessed 2026-05-21.
17. Elastic, "Reciprocal rank fusion", Elasticsearch Reference. https://www.elastic.co/docs/reference/elasticsearch/rest-apis/reciprocal-rank-fusion. Accessed 2026-05-21.
18. Qdrant, "Hybrid Search Revamped, Building with Qdrant's Query API", Qdrant Articles, 2024-07-25. https://qdrant.tech/articles/hybrid-search/. Accessed 2026-05-21.
19. Sebastian Bruch, Siyu Gai, Amir Ingber, "An Analysis of Fusion Functions for Hybrid Retrieval", arXiv:2210.11934 (also in ACM Transactions on Information Systems), 2023-05. https://arxiv.org/abs/2210.11934. Accessed 2026-05-21.
20. Pinecone, "Hybrid search", Pinecone Docs. https://docs.pinecone.io/guides/search/hybrid-search. Accessed 2026-05-21.
21. Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira, "Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations", arXiv:2102.10073 (also SIGIR 2021 demo), 2021-02-19. https://arxiv.org/abs/2102.10073. Accessed 2026-05-21.
22. Elastic, "RRF retriever", Elasticsearch Reference. https://www.elastic.co/docs/reference/elasticsearch/rest-apis/retrievers/rrf-retriever. Accessed 2026-05-21.
OpenSearch Project, "Normalization processor", OpenSearch Documentation. https://docs.opensearch.org/latest/search-plugins/search-pipelines/normalization-processor/. Accessed 2026-05-21. ↩
Vespa, "Redefining Hybrid Search Possibilities with Vespa", Vespa Blog. https://blog.vespa.ai/redefining-hybrid-search-possibilities-with-vespa/. Accessed 2026-05-21. ↩
ParadeDB, "Hybrid Search in PostgreSQL: The Missing Manual", ParadeDB Blog. https://www.paradedb.com/blog/hybrid-search-in-postgresql-the-missing-manual. Accessed 2026-05-21. ↩
Pinecone, "Encode sparse vectors", Pinecone Docs. https://docs.pinecone.io/guides/data/encode-sparse-vectors. Accessed 2026-05-21. ↩
Qdrant, "Hybrid Search and the Universal Query API", Qdrant Course. https://qdrant.tech/course/essentials/day-3/hybrid-search/. Accessed 2026-05-21. ↩
Elastic Search Labs, "Balancing the scales: Making reciprocal rank fusion (RRF) smarter with weights", 2024. https://www.elastic.co/search-labs/blog/weighted-reciprocal-rank-fusion-rrf. Accessed 2026-05-21. ↩
LangChain, "EnsembleRetriever", LangChain Python API Reference. https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.ensemble.EnsembleRetriever.html. Accessed 2026-05-21. ↩
LlamaIndex, "BM25 Retriever", LlamaIndex Documentation. https://developers.llamaindex.ai/python/examples/retrievers/bm25_retriever/. Accessed 2026-05-21. ↩
Ravi Theja, "LlamaIndex: Enhancing Retrieval Performance with Alpha Tuning in Hybrid Search in RAG", LlamaIndex Blog. https://www.llamaindex.ai/blog/llamaindex-enhancing-retrieval-performance-with-alpha-tuning-in-hybrid-search-in-rag-135d0c9b8a00. Accessed 2026-05-21. ↩
deepset, "Hybrid Document Retrieval", Haystack Blog. https://haystack.deepset.ai/blog/hybrid-retrieval. Accessed 2026-05-21. ↩
deepset, "WeaviateHybridRetriever", Haystack Documentation. https://docs.haystack.deepset.ai/docs/weaviatehybridretriever. Accessed 2026-05-21. ↩
deepset, "AzureAISearchHybridRetriever", Haystack Documentation. https://docs.haystack.deepset.ai/docs/azureaisearchhybridretriever. Accessed 2026-05-21. ↩
Carlos Lassance, Stéphane Clinchant, "The Tale of Two MSMARCO and Their Unfair Comparisons", *Proceedings of SIGIR 2023*, arXiv:2304.12904, 2023-04-25. https://arxiv.org/abs/2304.12904. Accessed 2026-05-21. ↩
Microsoft, "Hybrid search using vectors and full text in Azure AI Search", Microsoft Learn (Azure AI Search documentation), updated 2026-02-19. https://learn.microsoft.com/en-us/azure/search/hybrid-search-overview. Accessed 2026-06-27. ↩
Microsoft, "Hybrid Search Scoring (RRF) in Azure AI Search", Microsoft Learn (Azure AI Search documentation), updated 2026-06-08. https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking. Accessed 2026-06-27. ↩


