# Hybrid search

> Source: https://aiwiki.ai/wiki/hybrid_search
> Updated: 2026-06-27
> Categories: Information Retrieval
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Hybrid search** is an [information retrieval](/wiki/information_retrieval) technique that runs a lexical (sparse) keyword retriever, typically [BM25](/wiki/bm25), and a semantic (dense) vector retriever in parallel against the same corpus, then fuses their two ranked result lists into a single ranking, most often with Reciprocal Rank Fusion (RRF).[^1][^2][^3] It exists because the two methods fail in opposite ways: lexical search matches exact terms, rare tokens, identifiers, and out-of-vocabulary words but misses synonyms and paraphrase, whereas [semantic search](/wiki/semantic_search) over dense vectors captures meaning and synonymy but can be "token-blind" to rare entities and product codes, so combining them recovers relevant documents that either method alone would miss.[^4][^12][^13] Hybrid retrieval has become the default first-stage architecture for production [retrieval-augmented generation (RAG)](/wiki/retrieval_augmented_generation_rag) systems because it consistently outperforms either single modality on heterogeneous out-of-domain benchmarks such as BEIR, and it is now a first-class feature of every major [vector database](/wiki/vector_database) and search engine, including Pinecone, Weaviate, Qdrant, Vespa, Elasticsearch, OpenSearch, and Azure AI Search.[^2][^5][^6][^7][^8][^9][^10][^36]

**In one sentence:** hybrid search = keyword search (BM25) + vector search, with the two result lists merged (usually by RRF) so you get exact-match precision and semantic recall at the same time.

## ELI5: what is hybrid search?

Imagine you ask a librarian to find a book. One librarian looks only for the exact words you said ("big red dog"); another ignores your exact words and instead understands what you *mean* (a story about a large friendly pet). Each one finds books the other misses. Hybrid search asks both librarians at once and then blends their two stacks of books into one best list. The keyword librarian is great at catching exact names, codes, and rare words; the meaning librarian is great at catching synonyms and ideas. Together they almost always find the right book.

## What is hybrid search?

Hybrid search is a family of techniques that combine the ranked outputs of a lexical (sparse) retriever, typically [BM25](/wiki/bm25) or a learned sparse model such as [SPLADE](/wiki/splade), with those of a dense (vector) retriever based on neural [embeddings](/wiki/embeddings) or [Dense Passage Retrieval (DPR)](/wiki/dense_passage_retrieval)-style bi-encoders.[^1][^2] The two systems are run in parallel against the same corpus and their result lists are merged using a fusion algorithm, most commonly Reciprocal Rank Fusion (RRF), a weighted convex combination of normalized scores, or a learning-to-rank model trained on labeled relevance data.[^3][^4] Hybrid retrieval has become the dominant first-stage retrieval architecture for production [retrieval-augmented generation](/wiki/retrieval_augmented_generation_rag) systems because it consistently outperforms either single modality alone on heterogeneous out-of-domain benchmarks such as BEIR.[^2][^5] Major [vector database](/wiki/vector_database) vendors (Pinecone, Weaviate, Qdrant, Vespa) and search engines (Elasticsearch, OpenSearch, Azure AI Search) now expose hybrid search as a first-class primitive.[^6][^7][^8][^9][^10][^36]

Microsoft's Azure AI Search documentation gives a compact operational definition: "Hybrid search is a single query request configured for both full-text and vector queries" that "runs full-text search and vector search in parallel" and "merges results from each query by using Reciprocal Rank Fusion (RRF)."[^36]

## Where did hybrid search come from?

Classical text retrieval, developed from the 1970s through the 1990s, scored documents using sparse bag-of-words representations and term-weighting schemes such as [TF-IDF](/wiki/tf_idf) and BM25 (Okapi BM25), which approximate the probability that a document is relevant to a query via term frequency, inverse document frequency, and document-length normalization.[^11] BM25 dominated open-domain retrieval for two decades because it is cheap to compute, requires no training data, and matches exact terms reliably, but it is fundamentally lexical: a query for "automobile" returns no documents that only mention "car".

The introduction of pre-trained transformer encoders such as [BERT](/wiki/bert) enabled a different style of retrieval: queries and documents are independently encoded into low-dimensional dense vectors, and relevance is measured by inner product or cosine similarity over those vectors. Karpukhin and colleagues at Facebook AI Research showed in 2020 that such a "dual encoder" Dense Passage Retriever (DPR), trained on weakly supervised question-passage pairs, could outperform BM25 by 9 to 19 points on top-20 retrieval accuracy for Natural Questions and several other open-domain QA benchmarks.[^12] Dense retrievers handle synonymy, paraphrase, and semantic similarity gracefully but tend to be "token-blind": they can miss rare named entities, product codes, identifiers, and out-of-vocabulary terms that are decisive for relevance.[^4][^13]

These two failure modes are largely complementary. Luan, Eisenstein, Toutanova, and Collins formalized this observation in a 2021 *Transactions of the Association for Computational Linguistics* paper, demonstrating both theoretically and empirically that sparse bag-of-words models have unbounded capacity for long documents whereas fixed-dimension dual encoders are capacity-limited, and that simple sparse-dense hybrids "outperform strong alternatives in large-scale retrieval".[^14] The 2021 BEIR benchmark by Thakur and colleagues then provided large-scale evidence that dense models trained on MS MARCO frequently fail to generalize zero-shot to out-of-domain corpora, whereas BM25 remains a robust baseline; hybrid systems and reranking architectures were among the strongest configurations evaluated.[^2] Together these results catalyzed the modern view of hybrid search as a default architecture for production retrieval.

The first practical hybrid systems predate the dense-retrieval revival. Rank aggregation across multiple IR systems has been studied since at least the TREC evaluations of the 1990s, where techniques such as CombSUM, CombMNZ (Fox and Shaw), and Borda count were used to fuse runs from independent retrieval systems. The 2009 RRF paper by Cormack et al. compared RRF against CombMNZ and Condorcet-style pairwise voting, and showed that the deceptively simple reciprocal-rank score consistently won when combining runs from several TREC experiments and when ranking the LETOR 3 learning-to-rank dataset, while needing neither training data nor per-system score normalization.[^3] This robustness is why RRF became the default fusion operator for hybrid sparse-dense pipelines roughly a decade later, when dense retrievers entered the picture: it works out of the box even when the two systems produce wildly incomparable raw scores.

## Why is hybrid search better than vector search alone?

The core argument is that lexical and semantic retrieval have complementary blind spots, so fusing them recovers relevant documents that neither finds alone. Vector search excels at conceptual similarity (matching "affordable laptop" to "budget notebook") but degrades on exact-match needs such as product codes, error identifiers, dates, and people's names, which is exactly where keyword search is strongest.[^4][^36] Microsoft's Azure AI Search team states the trade-off directly: "Hybrid search combines the strengths of vector search and keyword search," noting that "some scenarios, such as querying over product codes, highly specialized jargon, dates, and people's names, perform better with keyword search because it can identify exact matches."[^36]

The benchmark evidence is consistent. On a 12-task subset of BEIR, Elastic's research team reported that RRF combining a learned sparse encoder (ELSER), BM25, and a dense baseline raised average nDCG@10 by 1.4 percentage points over ELSER alone and 18 percentage points over BM25 alone, and that the hybrid result was "either better or similar to BM25 alone for all test data sets".[^5] That last point is the central practical attraction: hybrid search rarely loses to a strong sparse baseline, even with no tuning, so the downside risk of adopting it is low while the upside on out-of-domain queries is large.[^5] BEIR itself showed that many dense retrievers trained on MS MARCO underperform plain BM25 when transferred to new domains, so pairing the two is a hedge against the dense model's generalization gap.[^2]

## How does hybrid search work?

A hybrid search system has three logical stages: independent first-stage retrieval, score or rank fusion, and (optionally) a second-stage reranker such as a [ColBERT](/wiki/colbert) late-interaction model or a transformer cross-encoder.

### First-stage retrieval

Sparse retrieval is normally implemented over an inverted index. For BM25 the score of a document `d` against query `q` is

```
BM25(q, d) = sum over t in q of IDF(t) * (f(t,d) * (k1+1)) / (f(t,d) + k1 * (1 - b + b * |d|/avgdl))
```

with parameters typically `k1 in [1.2, 2.0]` and `b ~ 0.75`.[^11] Learned sparse models such as SPLADE replace raw term weights with weights produced by a masked language model head over the BERT vocabulary, regularized to remain sparse, so the result can still be served by an inverted index but each document is "expanded" with semantically related terms.[^15]
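The BM25 formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a production scorer: it assumes whitespace-tokenized documents held in memory, and because the formula leaves `IDF(t)` unspecified, it assumes the smoothed Lucene-style variant `log(1 + (N - df + 0.5) / (df + 0.5))`. The function name `bm25_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score every tokenized document in `docs` against `query_terms`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue  # terms absent from the document contribute nothing
            # Assumed IDF variant (smoothed, always positive).
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + length_norm)
        scores.append(score)
    return scores

docs = [
    "the car is parked in the garage".split(),
    "an automobile dealership sells cars".split(),
    "error code E1234 appears on startup".split(),
]
code_scores = bm25_scores(["error", "E1234"], docs)
auto_scores = bm25_scores(["automobile"], docs)
```

In this toy corpus only the third document scores above zero for the error-code query, and the query "automobile" gives the document about a "car" a score of zero, which is the lexical gap that the dense retriever is meant to cover.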

Dense retrieval encodes the query into a single vector and looks up the approximate nearest neighbors of that vector in a precomputed index over document embeddings, typically using algorithms such as [HNSW](/wiki/hnsw) graphs or product quantization implemented in libraries like [FAISS](/wiki/faiss).[^16] Common encoders include DPR, sentence-transformer models derived from [Sentence-BERT](/wiki/sentence-bert), and commercial embedding APIs from providers such as OpenAI, [Cohere](/wiki/cohere), and [Jina](/wiki/jina_embeddings_v3).
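As a point of reference for what the approximate indexes compute, the following sketch performs exact (brute-force) nearest-neighbor search by cosine similarity. HNSW and product quantization approximate this result at much lower cost on large corpora. The three-dimensional vectors and document ids are made up for illustration; real embeddings typically have hundreds or thousands of dimensions, and `exact_top_k` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def exact_top_k(query_vec, doc_vecs, k=2):
    """Score every document and keep the k most similar (no index, O(N) per query)."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings (assumed values, not produced by any real encoder).
doc_vecs = {
    "budget-notebook": [0.9, 0.1, 0.0],
    "gaming-desktop": [0.2, 0.9, 0.1],
    "garden-hose": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # stands in for an encoded "affordable laptop" query
top = exact_top_k(query_vec, doc_vecs, k=2)
```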

### How does Reciprocal Rank Fusion work?

The most widely deployed fusion method is Reciprocal Rank Fusion (RRF), introduced by Gordon V. Cormack, Charles L. A. Clarke, and Stefan Büttcher in a short paper at SIGIR 2009.[^3] Microsoft describes it as "an algorithm that evaluates search scores from multiple previously ranked results to produce a single, unified result set."[^37] Given `n` ranked result lists `R_1, ..., R_n` and a small constant `k` (the paper uses `k = 60`), each document `d` receives a fused score

```
RRFscore(d) = sum over i of 1 / (k + r_i(d))
```

where `r_i(d)` is the rank of `d` in list `i` (and the contribution is zero if `d` does not appear in list `i`).[^3][^17] Documents are then sorted in descending RRF score. The constant `k` softens the influence of the very top ranks: a larger `k` allows mid-ranked documents to accumulate weight across multiple lists. Cormack and colleagues showed on TREC tracks that RRF, with no training data and no per-system tuning, outperformed both Condorcet-style voting and a learning-to-rank baseline that was given access to relevance judgments.[^3] Vendor implementations almost universally inherit the default `k = 60`: Elasticsearch's `rrf` retriever exposes `rank_constant` with default 60, Weaviate's `rankedFusion` uses the same constant in its `1 / (rank + 60)` formula, Qdrant's Query API similarly defaults to 60, and Azure AI Search recommends "a small value, such as 60".[^7][^17][^18][^37]
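The RRF formula translates almost directly into code. The sketch below assumes each input list is already sorted best-first and contains only document ids; the function name and the toy rankings are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse best-first lists of document ids into one ranking by RRF score."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document missing from a list simply receives no contribution from it.
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Here `doc_a` (ranks 1 and 2) scores `1/61 + 1/62` and narrowly beats `doc_c` (ranks 3 and 1) at `1/63 + 1/61`, while documents found by only one retriever fall to the bottom. No raw retriever score is ever consulted, which is why no normalization is needed.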

### Convex combination of normalized scores

The alternative is to fuse raw scores rather than ranks. Because BM25 scores are unbounded positive values whereas cosine similarities lie in `[-1, 1]` (inner products are bounded only when the vectors are length-normalized), the two must first be brought onto a common scale. Common normalizations include min-max within the result window, theoretical min-max (BM25 bounded below by zero, cosine by `-1`), and z-score normalization. The fused score is then

```
score(d) = alpha * s_dense(d) + (1 - alpha) * s_sparse(d)
```

with `alpha in [0, 1]` controlling the relative weight. Sebastian Bruch, Siyu Gai, and Amir Ingber of Pinecone analyzed this family in a 2022 paper "An Analysis of Fusion Functions for Hybrid Retrieval", finding that a learned convex combination is largely agnostic to the choice of score normalization, can be tuned with very few labeled examples, and outperforms RRF on both in-domain and out-of-domain BEIR-style evaluations; the same study reported that RRF is more parameter-sensitive than previously assumed.[^19] In Pinecone's product, alpha-weighted convex combination is implemented by scaling query and document sparse and dense components before computing a single dot product, so the underlying index can serve hybrid queries at native speed.[^6][^20]
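A minimal sketch of the convex combination, assuming min-max normalization within each result window and assuming that a document missing from one list contributes a normalized score of zero for that modality (real systems differ on this choice). The function names and scores are illustrative only.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to [0, 1] within the result window."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}  # degenerate window: all tied
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def convex_fusion(dense_scores, sparse_scores, alpha=0.5):
    """score(d) = alpha * s_dense(d) + (1 - alpha) * s_sparse(d) on normalized scores."""
    dense_n, sparse_n = min_max(dense_scores), min_max(sparse_scores)
    fused = {
        doc_id: alpha * dense_n.get(doc_id, 0.0) + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

dense_scores = {"doc_a": 0.82, "doc_b": 0.75, "doc_c": 0.40}   # cosine similarities
sparse_scores = {"doc_b": 12.0, "doc_c": 9.0, "doc_d": 3.0}    # BM25 scores
ranking = convex_fusion(dense_scores, sparse_scores, alpha=0.5)
```

With `alpha = 0.5`, `doc_b` wins because it is near the top of both lists; pushing `alpha` toward 1 promotes `doc_a`, which only the dense retriever found.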

### Learning-to-rank fusion

When labeled relevance data is available, the scores or ranks from sparse and dense retrievers can be treated as features and fed into a supervised learning-to-rank model (e.g., LambdaMART or a small neural ranker). This is the most flexible approach but requires per-domain training data, so it is typically used in mature search systems with large query logs rather than in zero-shot RAG pipelines.[^4][^19]

### Reranking stage

Hybrid retrieval is often paired with a second-stage neural reranker that scores the union of the top `k` results from each retriever using a transformer cross-encoder or a late-interaction model such as ColBERT. A cross-encoder reads the query and candidate document jointly, while a late-interaction model compares their token-level embeddings; either yields a finer-grained relevance score than first-stage retrieval, at the cost of extra computation per candidate. This cascaded design (cheap hybrid recall, expensive precise reranker) is the de facto reference architecture in toolkits such as Pyserini and Haystack.[^9][^21]
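The cascade can be sketched as a function that takes the union of each retriever's top candidates and re-scores them with a pairwise scorer. The scorer here, `toy_overlap_score`, is a hypothetical stand-in based on token overlap, used only so the example runs without a model; in practice it would be replaced by a cross-encoder or late-interaction model.

```python
def rerank_candidates(query, ranked_lists, texts, score_pair, top_k=2, final_n=2):
    """Union the top_k ids of each first-stage list, then re-score each (query, text) pair."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    candidates = list(dict.fromkeys(
        doc_id for ranking in ranked_lists for doc_id in ranking[:top_k]
    ))
    scored = [(doc_id, score_pair(query, texts[doc_id])) for doc_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:final_n]

def toy_overlap_score(query, text):
    """Hypothetical scorer: fraction of query tokens that appear in the text."""
    q_tokens = set(query.lower().split())
    return len(q_tokens & set(text.lower().split())) / len(q_tokens)

texts = {
    "d1": "reset password for admin account",
    "d2": "password policy overview",
    "d3": "admin account reset steps",
    "d4": "holiday schedule",
}
sparse_ranking = ["d2", "d1", "d4"]
dense_ranking = ["d3", "d1", "d2"]
reranked = rerank_candidates(
    "reset admin password", [sparse_ranking, dense_ranking], texts, toy_overlap_score
)
```

Only the candidate union (three documents here) is passed to the expensive scorer, so its cost scales with `top_k` rather than with corpus size.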

## What does the benchmark evidence show?

The BEIR (Benchmarking-IR) suite, introduced by Thakur, Reimers, Rücklé, Srivastava, and Gurevych at NeurIPS 2021, comprises 18 publicly available retrieval datasets spanning fact-checking, question answering, biomedical search, financial filings, and other domains.[^2] BEIR's central finding is that BM25 is a remarkably robust zero-shot baseline: many dense retrievers trained on MS MARCO underperform BM25 when transferred to out-of-domain corpora, while late-interaction and reranking-based models achieve the best mean nDCG@10 at substantially higher cost.[^2] Subsequent work, including hybrid retrieval studies and the analysis by Bruch et al., used BEIR as the standard testbed and reported that hybrid sparse-dense systems improve average nDCG@10 substantially over either single modality.[^5][^19]

Elastic's research team, evaluating Elastic's Learned Sparse Encoder (ELSER), BM25, and a dense baseline on a 12-task BEIR subset, reported that RRF with `k = 20` (window 1000) increased average nDCG@10 by 1.4 percentage points over ELSER alone and 18 percentage points over BM25 alone, and that a weighted linear combination calibrated on annotated data delivered a 6-point gain over ELSER alone and 24 points over BM25 alone.[^5] Importantly, the RRF result was "either better or similar to BM25 alone for all test data sets", which is the main practical attraction: hybrid search rarely loses to a strong sparse baseline, even without tuning.[^5] Comparable conclusions have been documented by OpenSearch's team, who benchmarked normalization-and-combination pipelines on BEIR and the Amazon ESCI dataset and concluded that min-max normalization plus arithmetic-mean combination yielded the best hybrid configuration in their setup.[^8]

On the MS MARCO passage ranking task, hybrid retrieval has long been a leaderboard staple. Pyserini, the Python toolkit for reproducible IR research developed in Jimmy Lin's lab at the University of Waterloo, reports reference hybrid runs combining BM25 or uniCOIL sparse signals with TCT-ColBERT-style dense encoders, and these consistently improve over either component alone.[^21] Luan and colleagues' TACL 2021 analysis showed similar gains on MS MARCO and on the Wikipedia-based Natural Questions corpus.[^14]

## Which vector databases and search engines support hybrid search?

Hybrid search is now a feature of essentially every production vector store and search engine. The implementations differ in how they index sparse and dense signals, which fusion methods they expose, and whether they perform fusion at the index layer or as a post-processing step.

| System | Sparse component | Dense component | Default fusion | Notes |
|---|---|---|---|---|
| Pinecone | Sparse vectors (BM25 or SPLADE) | Dense vectors | Convex combination via alpha | Single sparse-dense vector, `dotproduct` metric only.[^6][^20] |
| Weaviate | BM25 with configurable `k1`, `b`, tokenization | Dense vector search | `relativeScoreFusion` (default since v1.24) and `rankedFusion` (RRF with `k = 60`) | `alpha` parameter weights vector vs keyword.[^7] |
| Qdrant | Sparse vectors / SPLADE | Dense vectors, multi-vector (ColBERT) | RRF in Query API (since v1.10, July 2024) | Supports prefetch-then-rerank pipelines and DBSF.[^18] |
| Elasticsearch | BM25 (Lucene) and ELSER learned sparse | kNN with HNSW | `rrf` retriever (default `rank_constant = 60`, weighted RRF added later) | Hybrid via the retrievers API and sub-searches.[^17][^22] |
| OpenSearch | BM25 (Lucene) and learned sparse | kNN with HNSW or IVF | Search-pipeline `normalization-processor` (`min_max`, `l2`) plus combination (`arithmetic_mean`, `geometric_mean`, `harmonic_mean`); RRF processor added later | Per-clause weighting.[^8][^23] |
| Azure AI Search | BM25 full-text (inverted index) | kNN with HNSW or exhaustive eKNN | RRF (`1/(rank + k)`, `k = 60`), with optional vector weighting and semantic reranker | Single request combining `search` + `vectorQueries`; default returns top 50.[^36][^37] |
| Vespa | `bm25`, `nativeRank` text features, `weakAnd` | `nearestNeighbor` over HNSW tensor field | User-defined ranking expressions combining text and vector features | Single ranking framework, supports `RANK` operator for retrieve-with-features.[^10][^24] |
| pgvector + tsvector | PostgreSQL `tsvector` / `tsquery` full-text search | pgvector cosine / inner product | Application-level RRF or weighted sum | Hybrid runs inside Postgres; commonly combined via RRF in a single SQL query.[^25] |

### Pinecone

Pinecone introduced hybrid search built on a single "sparse-dense" vector type that requires the index to use the `dotproduct` distance metric.[^6][^26] The official documentation describes sparse vectors with a very large number of dimensions but only a small proportion of non-zero values, one per vocabulary token, and recommends BM25 or SPLADE as the encoder.[^26] To merge signals the system uses a convex combination governed by an alpha parameter; the helper function `hybrid_score_norm` multiplies dense values by alpha and sparse values by `(1 - alpha)` before the dot product so the index itself computes the weighted score with no extra latency.[^20] Pinecone now also supports separate sparse and dense indexes with fusion at query time for users who prefer to manage the two modalities independently.[^20]
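The scaling trick can be checked with a small sketch. The name `hybrid_score_norm` comes from the text above, but the signature and the `{"indices": ..., "values": ...}` sparse layout used here are assumptions, and `sparse_dense_dot` is a hypothetical local stand-in for the index's dot product, not a Pinecone API.

```python
def hybrid_score_norm(dense, sparse, alpha):
    """Scale a query's dense part by alpha and its sparse part by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse

def sparse_dense_dot(q_dense, q_sparse, d_dense, d_sparse):
    """Hypothetical stand-in for the index: dense dot product plus sparse dot product."""
    dense_part = sum(a * b for a, b in zip(q_dense, d_dense))
    d_weights = dict(zip(d_sparse["indices"], d_sparse["values"]))
    sparse_part = sum(
        w * d_weights.get(i, 0.0)
        for i, w in zip(q_sparse["indices"], q_sparse["values"])
    )
    return dense_part + sparse_part

q_dense, q_sparse = [0.5, 0.5], {"indices": [3, 7], "values": [1.0, 2.0]}
d_dense, d_sparse = [1.0, 0.0], {"indices": [7, 9], "values": [0.5, 4.0]}

scaled_dense, scaled_sparse = hybrid_score_norm(q_dense, q_sparse, alpha=0.25)
score = sparse_dense_dot(scaled_dense, scaled_sparse, d_dense, d_sparse)
```

The unscaled dense and sparse dot products are 0.5 and 1.0, so the single dot product over the scaled query returns `0.25 * 0.5 + 0.75 * 1.0 = 0.875`, the same value as an explicit convex combination of the two scores.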

### Weaviate

Weaviate exposes a hybrid search operator that runs BM25 and vector search in parallel, then applies one of two fusion algorithms.[^7] Since v1.24 the default is `relativeScoreFusion`, which scales the highest score in each modality to 1 and the lowest to 0, preserving magnitude information; Weaviate made it the default because it "retains more information from the original searches than rankedFusion, which only retains the rankings".[^7] The alternative `rankedFusion` is exactly RRF with `k = 60` (`score = 1 / (rank + 60)`).[^7] An `alpha` parameter in `[0, 1]` weights the contributions, with `alpha = 0.5` as the default; `alpha = 1.0` is pure vector search and `alpha = 0.0` is pure BM25. Weaviate keeps the full BM25 surface area (custom tokenization, stopwords, `k1`, `b`, `AND`/`OR` operator) so hybrid mode behaves like a superset of the BM25 endpoint.[^7]

### Qdrant

Qdrant added a unified Query API in version 1.10 (July 2024) that can express hybrid retrieval as a single nested query, with prefetch sub-queries for sparse and dense vectors and a fusion stage on top.[^18] The built-in fusion is RRF; Qdrant also documents Relative Score Fusion (RSF) and Distribution-Based Score Fusion (DBSF) as alternatives that practitioners can apply.[^27] The same API natively supports reranking flows ("prefetch with sparse, rerank with ColBERT-style multi-vectors") and Matryoshka-style nested embedding queries.[^18]

### Elasticsearch and OpenSearch

Elasticsearch added native RRF in the 8.x line, first via a sub-searches RRF construct and then via the retrievers API; the `rrf` retriever takes a `rank_constant` (default 60) and `rank_window_size` (default 100).[^17] A weighted variant published by Elastic Search Labs lets each retriever contribute with its own weight: `rrf_score = w1 * rrf_1 + w2 * rrf_2`, useful when one signal is known to be stronger than another.[^28] OpenSearch implements hybrid search through a search-pipeline `normalization-processor` that intercepts query-phase scores from each clause, normalizes them (`min_max` or `l2`) and combines them (`arithmetic_mean`, `geometric_mean`, or `harmonic_mean`) with per-clause weights.[^23][^8] An RRF processor was added more recently to support rank-based fusion alongside the score-based pipeline.[^8]

### Azure AI Search

Azure AI Search (formerly Azure Cognitive Search) implements hybrid search as a single query request that carries both a `search` (full-text) parameter and one or more `vectorQueries`, executed in parallel and merged with RRF.[^36][^37] Full-text matches are scored with BM25, while vector matches use HNSW or exhaustive eKNN with the configured similarity metric; the fused `@search.score` is computed as `1/(rank + k)` summed across queries, with Microsoft recommending "a small value, such as 60" for `k`.[^37] By default a hybrid query returns the top 50 results of the unified set, and individual vector queries can be weighted to increase or decrease their influence before fusion.[^37] An optional semantic ranker (a machine-reading-comprehension reranker producing a separate `@search.rerankerScore` in the range 0.00 to 4.00) can be layered on top, and Microsoft reports that "hybrid retrieval with semantic ranker offers significant benefits in search relevance" on its benchmark testing.[^36][^37]

### Vespa

Vespa, originally an internal Yahoo system released as open source in 2017, treats text matching and vector search as features in a single ranking pipeline rather than as separate operators.[^10] A hybrid query combines the `weakAnd` and `nearestNeighbor` retrieval operators, and the ranking profile assembles arbitrary expressions over text features (`bm25(title)`, `nativeRank(content)`) and vector features (`closeness(field, embedding)`, cosine distance).[^10][^24] Because the ranking expression is user-defined, fusion strategies in Vespa range from simple weighted sums to multi-phase ranking with a GBDT or transformer model in the second phase.[^24]

### pgvector and tsvector in PostgreSQL

PostgreSQL supports hybrid search by combining its built-in `tsvector` / `tsquery` full-text search with the [pgvector](/wiki/pgvector) extension, which adds a `vector` data type and HNSW or IVFFlat indexing.[^25] A common pattern is to run both queries in a single SQL statement using `WITH` clauses (common table expressions), then merge with RRF: each subquery produces a ranked list, the outer query computes `1.0 / (k + rank)` for each, sums per document, and orders by the sum.[^25] This approach is attractive because the entire pipeline stays inside a single transactional database without a separate vector store.

## How is hybrid search used in RAG frameworks?

Modern [RAG](/wiki/retrieval_augmented_generation_rag) orchestration frameworks treat hybrid retrieval as a standard component.

### LangChain

[LangChain](/wiki/langchain) provides an `EnsembleRetriever` that accepts a list of retrievers (typically a `BM25Retriever` and a vector-store retriever) plus optional per-retriever weights, and fuses their results using Reciprocal Rank Fusion with a constant `c` (default 60).[^29] An `alpha`-style weighting is achieved through the `weights` argument: a `[0.6, 0.4]` configuration assigns 60% influence to the first retriever and 40% to the second.[^29]
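The effect of per-retriever weights on rank fusion can be illustrated with a standalone sketch of weighted RRF. This is not LangChain's implementation; it only assumes that each list's reciprocal-rank contribution is multiplied by that list's weight, with the constant `c` defaulting to 60 as described above. The function name and rankings are hypothetical.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Weighted RRF: each list contributes weight / (c + rank) per document."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (c + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

keyword_hits = ["doc_a", "doc_b"]   # best-first output of the keyword retriever
vector_hits = ["doc_b", "doc_a"]    # best-first output of the vector retriever

keyword_heavy = weighted_rrf([keyword_hits, vector_hits], weights=[0.6, 0.4])
vector_heavy = weighted_rrf([keyword_hits, vector_hits], weights=[0.4, 0.6])
```

The two retrievers disagree on the order, so the weights break the tie: a `[0.6, 0.4]` configuration ranks `doc_a` first, and swapping the weights ranks `doc_b` first.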

### LlamaIndex

[LlamaIndex](/wiki/llamaindex) composes hybrid retrieval by combining a `VectorIndexRetriever` with a `BM25Retriever` (or a vendor-native hybrid like Pinecone's sparse-dense queries) and merging results.[^30] For vendors that expose an `alpha` parameter (e.g., Weaviate), LlamaIndex documents an "alpha tuning" recipe in which alpha is treated as a hyperparameter to be swept against the developer's own evaluation set.[^31]
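An alpha sweep of this kind can be sketched without any framework. The example below is not LlamaIndex code: it assumes a small labeled evaluation set in which each query has pre-normalized dense and sparse scores plus one known relevant document, and it picks the alpha that maximizes mean reciprocal rank (MRR). All names and scores are hypothetical.

```python
def reciprocal_rank(ranking, relevant_id):
    """1 / rank of the relevant document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def sweep_alpha(eval_set, alphas):
    """Return (best_alpha, best_mrr) over a labeled evaluation set."""
    results = []
    for alpha in alphas:
        total = 0.0
        for example in eval_set:
            dense, sparse = example["dense"], example["sparse"]
            fused = {
                doc_id: alpha * dense.get(doc_id, 0.0) + (1 - alpha) * sparse.get(doc_id, 0.0)
                for doc_id in set(dense) | set(sparse)
            }
            ranking = sorted(fused, key=fused.get, reverse=True)
            total += reciprocal_rank(ranking, example["relevant"])
        results.append((alpha, total / len(eval_set)))
    return max(results, key=lambda pair: pair[1])

eval_set = [
    # Exact-match query: the sparse retriever ranks the relevant document first.
    {"dense": {"a": 0.9, "b": 0.5}, "sparse": {"a": 0.1, "b": 1.0}, "relevant": "b"},
    # Paraphrase query: the dense retriever ranks the relevant document first.
    {"dense": {"c": 1.0, "d": 0.2}, "sparse": {"c": 0.3, "d": 0.8}, "relevant": "c"},
]
best_alpha, best_mrr = sweep_alpha(eval_set, alphas=[0.0, 0.25, 0.5, 0.75, 1.0])
```

On this toy set neither extreme is optimal: pure keyword or pure vector search each gets one query wrong, while `alpha = 0.5` ranks the relevant document first for both.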

### Haystack

deepset's [Haystack](/wiki/haystack) framework treats retrievers as composable pipeline nodes. A canonical hybrid pipeline wires an `InMemoryBM25Retriever` and an `InMemoryEmbeddingRetriever` (or their Elasticsearch / Weaviate / OpenSearch / Azure AI Search analogues) into a `DocumentJoiner` that performs RRF or weighted joining, and optionally appends a transformer ranker for second-stage scoring.[^21][^32] Dedicated wrappers such as `WeaviateHybridRetriever` and `AzureAISearchHybridRetriever` push the fusion down into the vendor where supported.[^33][^34]

## What is hybrid search used for?

Hybrid search is widely used for:

- **Open-domain question answering and RAG**, where queries mix natural-language phrasing (favoring dense retrieval) with exact entities, codes, or jargon (favoring sparse).
- **Enterprise search** over heterogeneous corpora containing acronyms, product SKUs, error codes, and license identifiers that pure semantic systems regularly mishandle.
- **E-commerce search**, where brand names, model numbers, and attribute filters interact with semantic intent ("affordable noise-cancelling headphones for travel"). The Amazon ESCI shopping queries dataset is one of the standard hybrid-search benchmarks.[^8]
- **Biomedical and legal retrieval**, where domain-specific terminology and statute identifiers must be matched exactly, but synonymy and paraphrase remain important. BEIR includes TREC-COVID, NFCorpus, and SciFact precisely to expose how badly dense models trained on MS MARCO can transfer to such domains.[^2]
- **Code search**, where identifier tokens and function names benefit from BM25 while natural-language docstrings benefit from dense encoders.

### A worked example

A typical RAG pipeline backed by hybrid search looks roughly as follows. At indexing time the corpus is chunked and each chunk is sent through both a sparse encoder (BM25 statistics or a SPLADE model from [Hugging Face](/wiki/hugging_face)) and a dense encoder (a Sentence-BERT variant or a hosted embedding API from OpenAI, [Cohere](/wiki/cohere), or [Jina](/wiki/jina_embeddings_v3)); the sparse representation is written to the inverted index and the dense representation to the vector store. At query time the same encoders convert the user's natural-language query into a sparse query vector and a dense query vector. The vector store retrieves the top `k_dense` dense matches; the inverted index retrieves the top `k_sparse` sparse matches. The two lists are fused by RRF or convex combination, the resulting top `n` candidates are passed through a [re-ranking](/wiki/re-ranking) cross-encoder, and the final top few are concatenated into the LLM prompt. Most of this orchestration is handled by [LangChain](/wiki/langchain), [LlamaIndex](/wiki/llamaindex), or [Haystack](/wiki/haystack) with minimal user code, and modern vector stores increasingly perform the fusion step server-side so the client only sees a single ranked list.

## What are the limitations of hybrid search?

Hybrid search is not free. Operationally, the system must maintain two indexes (or a hybrid index that supports both), pay query-time latency for both retrievers, and tune at least one fusion parameter (`alpha`, `k`, or per-modality weights). The Pinecone fusion-analysis paper explicitly documents that RRF, contrary to its "tuning-free" reputation, can be parameter-sensitive when the two underlying systems differ in calibration, and recommends a small labeled set to tune a convex combination instead.[^19]

The benchmark literature also contains caveats. Lassance and colleagues highlighted in 2023 that MS MARCO comes in subtly different "preprocessed" variants and that head-to-head comparisons in the literature have sometimes compared systems trained or evaluated on different versions, leading to overstated improvements.[^35] BEIR's own conclusion is that dense retrievers underperform BM25 in many out-of-domain settings, so a hybrid that simply sums an under-trained dense run with BM25 can recover most, but not all, of the lost ground without addressing the underlying generalization gap.[^2]

Finally, hybrid retrieval does not solve the problem of bad documents in the index. If a corpus contains contradictory facts or stale text, returning more of them via hybrid recall amplifies the downstream burden on the LLM-side reranker or generator. The cascaded design with a precise neural reranker is therefore standard in production RAG.

## Related work

Hybrid search sits at the intersection of several active research areas:

- [BM25](/wiki/bm25) and [TF-IDF](/wiki/tf_idf) remain the sparse baselines against which all dense and learned-sparse models are compared.[^11]
- [SPLADE](/wiki/splade) and other learned sparse models (uniCOIL, DeepImpact) keep the inverted-index serving stack but learn term weights, blurring the sparse/dense distinction.[^15]
- [Dense Passage Retrieval (DPR)](/wiki/dense_passage_retrieval) established the modern dual-encoder paradigm that powers the dense side of most hybrids.[^12]
- [ColBERT](/wiki/colbert) late-interaction retrieval is increasingly used as a second-stage reranker on top of hybrid first-stage retrieval.
- [Re-ranking](/wiki/re-ranking) with transformer cross-encoders is the standard precision step after hybrid recall.
- [Semantic search](/wiki/semantic_search) is the broader umbrella that hybrid search refines by reintroducing lexical signals.
- [Vector database](/wiki/vector_database) systems and the broader [vector databases](/wiki/vector_databases) ecosystem provide the infrastructure for the dense side of hybrid pipelines.

## See also

- [BM25 (Okapi BM25)](/wiki/bm25)
- [SPLADE](/wiki/splade)
- [ColBERT](/wiki/colbert)
- [Dense Passage Retrieval (DPR)](/wiki/dense_passage_retrieval)
- [Sentence-BERT (SBERT)](/wiki/sentence-bert)
- [TF-IDF (Term Frequency-Inverse Document Frequency)](/wiki/tf_idf)
- [HNSW](/wiki/hnsw)
- [FAISS](/wiki/faiss)
- [Pinecone](/wiki/pinecone)
- [Weaviate](/wiki/weaviate)
- [Qdrant](/wiki/qdrant)
- [pgvector](/wiki/pgvector)
- [Vector database](/wiki/vector_database)
- [LangChain](/wiki/langchain)
- [LlamaIndex](/wiki/llamaindex)
- [Haystack (framework)](/wiki/haystack)
- [Retrieval-Augmented Generation](/wiki/retrieval_augmented_generation_rag)
- [Semantic search](/wiki/semantic_search)
- [Re-ranking](/wiki/re-ranking)
- [Information Retrieval](/wiki/information_retrieval)

## References

[^1]: Stephen Robertson and Hugo Zaragoza, "The Probabilistic Relevance Framework: BM25 and Beyond", *Foundations and Trends in Information Retrieval*, 2009-04-09. https://www.staff.city.ac.uk/~sbrp622/papers/foundations_bm25_review.pdf. Accessed 2026-05-21.
[^2]: Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych, "BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models", arXiv:2104.08663 (NeurIPS 2021 Datasets and Benchmarks), 2021-10-21. https://arxiv.org/abs/2104.08663. Accessed 2026-05-21.
[^3]: Gordon V. Cormack, Charles L. A. Clarke, Stefan Büttcher, "Reciprocal Rank Fusion outperforms Condorcet and Individual Rank Learning Methods", *Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval*, 2009-07-19. https://dl.acm.org/doi/10.1145/1571941.1572114. Accessed 2026-05-21.
[^4]: Helain Zimmermann, "Hybrid Search: Combining Dense and Sparse Retrieval", helain-zimmermann.com, 2024. https://helain-zimmermann.com/blog/hybrid-search-combining-dense-and-sparse-retrieval. Accessed 2026-05-21.
[^5]: Quentin Herreros et al., "Improving information retrieval in the Elastic Stack: Hybrid retrieval", Elastic Search Labs, 2023. https://www.elastic.co/search-labs/blog/improving-information-retrieval-elastic-stack-hybrid. Accessed 2026-05-21.
[^6]: Pinecone, "Understanding hybrid search", Pinecone Docs. https://docs.pinecone.io/guides/data/understanding-hybrid-search. Accessed 2026-05-21.
[^7]: Weaviate, "Hybrid search", Weaviate Documentation. https://docs.weaviate.io/weaviate/concepts/search/hybrid-search. Accessed 2026-06-27.
[^8]: OpenSearch Project, "Building effective hybrid search in OpenSearch: Techniques and best practices", OpenSearch Blog, 2024. https://opensearch.org/blog/building-effective-hybrid-search-in-opensearch-techniques-and-best-practices/. Accessed 2026-05-21.
[^9]: deepset, "Creating a Hybrid Retrieval Pipeline", Haystack Tutorial 33. https://haystack.deepset.ai/tutorials/33_hybrid_retrieval. Accessed 2026-05-21.
[^10]: Vespa, "Hybrid Text Search Tutorial", Vespa Documentation. https://docs.vespa.ai/en/learn/tutorials/hybrid-search.html. Accessed 2026-05-21.
[^11]: Stephen Robertson, Hugo Zaragoza, Michael Taylor, "Simple BM25 extension to multiple weighted fields", Microsoft Research, in *Proceedings of CIKM 2004*, 2004. https://www.microsoft.com/en-us/research/publication/simple-bm25-extension-to-multiple-weighted-fields/. Accessed 2026-05-21.
[^12]: Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih, "Dense Passage Retrieval for Open-Domain Question Answering", arXiv:2004.04906 (EMNLP 2020), 2020-09-30. https://arxiv.org/abs/2004.04906. Accessed 2026-05-21.
[^13]: Qdrant, "What is a Sparse Vector? How to Achieve Vector-based Hybrid Search", Qdrant Articles. https://qdrant.tech/articles/sparse-vectors/. Accessed 2026-05-21.
[^14]: Yi Luan, Jacob Eisenstein, Kristina Toutanova, Michael Collins, "Sparse, Dense, and Attentional Representations for Text Retrieval", *Transactions of the Association for Computational Linguistics* 9:329-345, 2021-04-12. https://aclanthology.org/2021.tacl-1.20/. Accessed 2026-05-21.
[^15]: Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant, "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking", *Proceedings of SIGIR 2021* (also SPLADE v2, arXiv:2109.10086), 2021-07-11. https://arxiv.org/abs/2109.10086. Accessed 2026-05-21.
[^16]: Yu. A. Malkov, D. A. Yashunin, "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs", arXiv:1603.09320, 2018-08-14. https://arxiv.org/abs/1603.09320. Accessed 2026-05-21.
[^17]: Elastic, "Reciprocal rank fusion", Elasticsearch Reference. https://www.elastic.co/docs/reference/elasticsearch/rest-apis/reciprocal-rank-fusion. Accessed 2026-05-21.
[^18]: Qdrant, "Hybrid Search Revamped, Building with Qdrant's Query API", Qdrant Articles, 2024-07-25. https://qdrant.tech/articles/hybrid-search/. Accessed 2026-05-21.
[^19]: Sebastian Bruch, Siyu Gai, Amir Ingber, "An Analysis of Fusion Functions for Hybrid Retrieval", arXiv:2210.11934 (also in ACM Transactions on Information Systems), 2023-05. https://arxiv.org/abs/2210.11934. Accessed 2026-05-21.
[^20]: Pinecone, "Hybrid search", Pinecone Docs. https://docs.pinecone.io/guides/search/hybrid-search. Accessed 2026-05-21.
[^21]: Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira, "Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations", arXiv:2102.10073 (also SIGIR 2021 demo), 2021-02-19. https://arxiv.org/abs/2102.10073. Accessed 2026-05-21.
[^22]: Elastic, "RRF retriever", Elasticsearch Reference. https://www.elastic.co/docs/reference/elasticsearch/rest-apis/retrievers/rrf-retriever. Accessed 2026-05-21.
[^23]: OpenSearch Project, "Normalization processor", OpenSearch Documentation. https://docs.opensearch.org/latest/search-plugins/search-pipelines/normalization-processor/. Accessed 2026-05-21.
[^24]: Vespa, "Redefining Hybrid Search Possibilities with Vespa", Vespa Blog. https://blog.vespa.ai/redefining-hybrid-search-possibilities-with-vespa/. Accessed 2026-05-21.
[^25]: ParadeDB, "Hybrid Search in PostgreSQL: The Missing Manual", ParadeDB Blog. https://www.paradedb.com/blog/hybrid-search-in-postgresql-the-missing-manual. Accessed 2026-05-21.
[^26]: Pinecone, "Encode sparse vectors", Pinecone Docs. https://docs.pinecone.io/guides/data/encode-sparse-vectors. Accessed 2026-05-21.
[^27]: Qdrant, "Hybrid Search and the Universal Query API", Qdrant Course. https://qdrant.tech/course/essentials/day-3/hybrid-search/. Accessed 2026-05-21.
[^28]: Elastic Search Labs, "Balancing the scales: Making reciprocal rank fusion (RRF) smarter with weights", 2024. https://www.elastic.co/search-labs/blog/weighted-reciprocal-rank-fusion-rrf. Accessed 2026-05-21.
[^29]: LangChain, "EnsembleRetriever", LangChain Python API Reference. https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.ensemble.EnsembleRetriever.html. Accessed 2026-05-21.
[^30]: LlamaIndex, "BM25 Retriever", LlamaIndex Documentation. https://developers.llamaindex.ai/python/examples/retrievers/bm25_retriever/. Accessed 2026-05-21.
[^31]: Ravi Theja, "LlamaIndex: Enhancing Retrieval Performance with Alpha Tuning in Hybrid Search in RAG", LlamaIndex Blog. https://www.llamaindex.ai/blog/llamaindex-enhancing-retrieval-performance-with-alpha-tuning-in-hybrid-search-in-rag-135d0c9b8a00. Accessed 2026-05-21.
[^32]: deepset, "Hybrid Document Retrieval", Haystack Blog. https://haystack.deepset.ai/blog/hybrid-retrieval. Accessed 2026-05-21.
[^33]: deepset, "WeaviateHybridRetriever", Haystack Documentation. https://docs.haystack.deepset.ai/docs/weaviatehybridretriever. Accessed 2026-05-21.
[^34]: deepset, "AzureAISearchHybridRetriever", Haystack Documentation. https://docs.haystack.deepset.ai/docs/azureaisearchhybridretriever. Accessed 2026-05-21.
[^35]: Carlos Lassance, Stéphane Clinchant, "The Tale of Two MSMARCO and Their Unfair Comparisons", *Proceedings of SIGIR 2023*, arXiv:2304.12904, 2023-04-25. https://arxiv.org/abs/2304.12904. Accessed 2026-05-21.
[^36]: Microsoft, "Hybrid search using vectors and full text in Azure AI Search", Microsoft Learn (Azure AI Search documentation), updated 2026-02-19. https://learn.microsoft.com/en-us/azure/search/hybrid-search-overview. Accessed 2026-06-27.
[^37]: Microsoft, "Hybrid Search Scoring (RRF) in Azure AI Search", Microsoft Learn (Azure AI Search documentation), updated 2026-06-08. https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking. Accessed 2026-06-27.