Semantic search is an information retrieval approach that finds results based on the meaning and intent behind a query rather than relying solely on exact keyword matches. Instead of treating a search query as a bag of words, semantic search systems encode both queries and documents into dense vector representations (embeddings) that capture semantic meaning, then find documents whose embeddings are closest to the query embedding in vector space. This allows a search for "how to fix a leaking faucet" to return results about "plumbing repair" and "dripping tap solutions" even if those exact words never appear in the query.
Semantic search has become a foundational technology for modern AI applications, powering everything from enterprise knowledge bases and e-commerce product discovery to the retrieval component of retrieval-augmented generation (RAG) systems. The approach has matured rapidly since 2020, driven by advances in transformer-based embedding models, the emergence of purpose-built vector databases, and the explosive growth of large language model applications that depend on high-quality retrieval.
A semantic search system operates through three core stages: encoding, indexing, and retrieval.
During the offline indexing phase, every document (or document chunk) in the corpus is passed through an embedding model, which converts the text into a fixed-length dense vector, typically ranging from 384 to 3072 dimensions depending on the model. These vectors are numerical representations that position semantically similar texts close together in a high-dimensional space. The sentence "The cat sat on the mat" and "A kitten rested on the rug" would produce vectors that are near each other, even though they share few words.
The resulting vectors are stored in a vector database or search index optimized for fast similarity lookups.
When a user submits a search query, the same embedding model encodes the query text into a vector of the same dimensionality. This ensures that queries and documents exist in the same vector space and can be directly compared.
The system computes the similarity between the query vector and all document vectors in the index, returning the documents with the highest similarity scores. In practice, exact comparison against every document would be prohibitively slow for large collections, so vector databases use approximate nearest neighbor (ANN) algorithms like HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index) to find the closest matches in milliseconds, even across billions of vectors [1].
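The three stages can be sketched in a few lines of Python. The "embedding model" below is a toy bag-of-words stand-in (a real system would call a transformer model such as one from Sentence-Transformers), and the brute-force loop is what an ANN index replaces at scale:

```python
import math

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy bag-of-words 'embedding': one dimension per vocabulary word.
    A real system would call a transformer embedding model here; this
    stand-in only captures literal word overlap, not meaning."""
    tokens = text.lower().split()
    vec = [float(tokens.count(w)) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]          # L2-normalize

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # unit vectors: dot == cosine

docs = [
    "plumbing repair guide for household fixtures",
    "dripping tap solutions and faucet repair",
    "chocolate chip cookie recipe",
]
vocab = sorted({w for d in docs for w in d.lower().split()})

# Offline indexing: embed every document once and store the vectors.
index = [(d, embed(d, vocab)) for d in docs]

# Online retrieval: encode the query with the SAME model, score all vectors.
# (At scale an ANN index such as HNSW replaces this brute-force loop.)
q = embed("faucet repair", vocab)
ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
best_doc = ranked[0][0]
print(best_doc)  # the tap/faucet document scores highest
```

Note that both indexing and retrieval call the same `embed` function; as discussed later, using different models for documents and queries would place them in incompatible vector spaces.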
Traditional keyword search (lexical search) and semantic search have complementary strengths and weaknesses. Understanding the trade-offs is essential for building effective search systems.
| Dimension | Keyword Search (BM25) | Semantic Search |
|---|---|---|
| Matching method | Exact term matching with term frequency weighting | Vector similarity based on learned meaning |
| Handles synonyms | No; "car" does not match "automobile" | Yes; captures synonyms and paraphrases |
| Handles exact identifiers | Excellent; matches product codes, error codes, names precisely | Poor; may miss exact terms not well-represented in training data |
| Infrastructure | CPU-based; no GPU required; runs on traditional databases | Requires embedding model and vector index; may need GPU for encoding |
| Speed at scale | Very fast (milliseconds over millions of documents) | Fast with ANN algorithms, but typically slower than BM25 for the same corpus |
| Interpretability | High; can explain why a document matched (which terms matched) | Low; difficult to explain why a particular vector is "close" to the query |
| Multilingual | Requires language-specific stemming and tokenization | Multilingual embedding models handle multiple languages natively |
| Training data dependency | None; works out of the box | Requires a pre-trained embedding model (general or fine-tuned) |
| Best for | Known-item search, exact identifiers, domain jargon | Exploratory search, natural language questions, conceptual queries |
BM25 (Best Match 25) is the most widely used keyword search algorithm. It scores documents based on term frequency (how often query terms appear in a document), inverse document frequency (how rare query terms are across the corpus), and document length normalization. BM25 has been the backbone of information retrieval for decades and remains highly competitive, particularly for queries containing specific identifiers, product codes, or technical jargon [2].
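The BM25 formula can be written out directly. The sketch below uses the common Okapi variant with the Lucene-style `+1` inside the IDF logarithm and the default parameters `k1=1.5`, `b=0.75`; production systems compute this over an inverted index rather than looping over every document:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Minimal BM25 sketch: term frequency saturation (k1), document
    length normalization (b), and rarity weighting via IDF."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter(t for d in tokenized for t in set(d))  # document frequency
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [
    "error code E4021 troubleshooting steps",
    "general troubleshooting advice for printers",
    "warranty information",
]
scores = bm25_scores("E4021 troubleshooting", docs)
print(scores.index(max(scores)))  # 0: the only doc matching both terms
```

The rare identifier "E4021" receives a much higher IDF weight than the common word "troubleshooting", which is exactly why BM25 excels at exact-identifier queries.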
The quality of semantic search depends heavily on the embedding model used to encode queries and documents. An embedding model maps text to dense vectors such that semantically similar texts produce similar vectors.
Most modern embedding models are based on the transformer architecture. A common approach is the bi-encoder (also called a dual encoder): two transformer networks (or a single shared network) independently encode the query and document into fixed-length vectors. Similarity is then computed between these vectors. This architecture is efficient because document vectors can be precomputed and cached; only the query needs to be encoded at search time.
This contrasts with cross-encoders, which process the query and document together as a single concatenated input. Cross-encoders produce more accurate relevance scores because they can attend to fine-grained interactions between query and document tokens, but they are orders of magnitude slower since every query-document pair must be processed jointly. Cross-encoders are therefore used for re-ranking rather than first-stage retrieval.
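A back-of-envelope count of encoder forward passes makes the "orders of magnitude" gap concrete. With a bi-encoder, document vectors are precomputed, so serving Q queries over N documents costs N + Q passes; a cross-encoder must process every query-document pair:

```python
# Encoder forward passes needed to score N documents for Q queries.
def bi_encoder_passes(n_docs: int, n_queries: int) -> int:
    # Documents encoded once offline; each query encoded once at search time.
    return n_docs + n_queries

def cross_encoder_passes(n_docs: int, n_queries: int) -> int:
    # Every (query, document) pair requires its own joint forward pass.
    return n_docs * n_queries

N, Q = 1_000_000, 10_000
print(bi_encoder_passes(N, Q))     # 1,010,000 passes
print(cross_encoder_passes(N, Q))  # 10,000,000,000 passes
```

This asymmetry is why cross-encoders are reserved for re-ranking a small candidate set rather than first-stage retrieval over the full corpus.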
The following table summarizes widely used embedding models as of early 2026.
| Model | Provider | Dimensions | Max Tokens | Type | Notable Features |
|---|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | 8,191 | Proprietary API | Strong all-around retrieval performance; supports dimension reduction via Matryoshka |
| text-embedding-3-small | OpenAI | 1536 | 8,191 | Proprietary API | Cost-efficient; good performance for most use cases |
| embed-v4 | Cohere | 1024 | 128,000 | Proprietary API | Multimodal (text + images); 128K context window |
| Gemini Embedding 001 | Google | 3072 | 8,192 | Proprietary API | Top of English MTEB leaderboard (68.32 average score, March 2026) [3] |
| BGE-en-ICL | BAAI | 4096 | 32,768 | Open-source | In-context learning for task-specific performance boosts |
| Nomic Embed Text V2 | Nomic AI | 768 | 8,192 | Open-source | First MoE architecture for embeddings; supports ~100 languages |
| GTE-Qwen2 | Alibaba | 1024-8192 | 32,768 | Open-source | Flexible dimensions; strong multilingual performance |
| NV-Embed-v2 | NVIDIA | 4096 | 32,768 | Open-source | 72.31 MTEB English average; leading multilingual model |
| Voyage-3-large | Voyage AI | 1024 | 32,000 | Proprietary API | Outperforms competitors by 9-20% on retrieval tasks |
| Sentence-Transformers (all-MiniLM-L6-v2) | Hugging Face community | 384 | 512 | Open-source | Lightweight; widely used for prototyping and small-scale deployments |
The Sentence-Transformers library, introduced by Reimers and Gurevych in 2019, was a pivotal development that made transformer-based embedding models accessible to practitioners [4]. It provides pre-trained models and training utilities built on top of Hugging Face Transformers, and it remains the most popular framework for working with embedding models in Python.
The best embedding model depends on the use case. Key considerations include retrieval accuracy on benchmarks close to your domain, vector dimensionality (which drives storage and query cost), maximum input length, multilingual coverage, licensing (open-source versus proprietary API), and the cost and latency of encoding at your query volume.
Importantly, the embedding model used to encode documents and the one used to encode queries must be the same (or compatible). Mixing different embedding models produces vectors in different spaces, making similarity computation meaningless.
Vector databases are specialized storage systems designed to index, store, and query high-dimensional vectors efficiently. They are the infrastructure backbone of semantic search systems.
Vector databases use approximate nearest neighbor (ANN) algorithms to enable fast similarity search over large collections of vectors. The most common algorithm is HNSW (Hierarchical Navigable Small World), which builds a multi-layered graph where each node is a vector and edges connect nearby vectors. Searching this graph has logarithmic time complexity, enabling sub-100ms queries over billions of vectors [1].
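The core routine inside HNSW is a best-first search over a proximity graph. The sketch below shows that routine on a single layer (a small NSW-style graph), omitting HNSW's hierarchy of layers and its insertion heuristics; the graph and vectors are hand-built toys:

```python
import math

def greedy_search(graph: dict, vectors: dict, entry: int,
                  query: tuple, ef: int = 3) -> list[int]:
    """Best-first search over a proximity graph: expand the closest
    unvisited frontier node, keep the `ef` best results seen so far."""
    def dist(i):
        return math.dist(vectors[i], query)
    visited = {entry}
    candidates = [entry]                  # frontier to expand
    best = [entry]                        # running result set
    while candidates:
        candidates.sort(key=dist)
        current = candidates.pop(0)
        # Stop when the closest frontier node is worse than our worst result.
        if len(best) >= ef and dist(current) > max(dist(i) for i in best):
            break
        for neighbor in graph[current]:
            if neighbor not in visited:
                visited.add(neighbor)
                candidates.append(neighbor)
                best.append(neighbor)
        best = sorted(best, key=dist)[:ef]
    return best

# Tiny hand-built graph: node 0 is the entry point, node 4 is nearest to q.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0),
           3: (2.0, 1.0), 4: (3.0, 1.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
result = greedy_search(graph, vectors, entry=0, query=(3.0, 1.2))
print(result[0])  # 4: the search walks the graph toward the query
```

HNSW adds coarser upper layers on top of such a graph so the search can take long hops first, which is what yields its logarithmic scaling.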
Other ANN algorithms include IVF (Inverted File Index), which partitions vectors into clusters and searches only the clusters nearest the query; product quantization (PQ), which compresses vectors into compact codes to reduce memory and is often combined with IVF (as in FAISS's IVF-PQ indexes); and locality-sensitive hashing (LSH), which hashes similar vectors into the same buckets so candidates can be found by bucket lookup.
The vector database landscape has expanded rapidly since 2021. The following table compares the leading platforms as of 2026.
| Database | Type | Language | Key Strengths | Typical Scale |
|---|---|---|---|---|
| Pinecone | Managed cloud service | N/A (API) | Serverless option; sub-50ms p99 latency; simple API | Billions of vectors |
| Weaviate | Open-source / managed cloud | Go | Built-in hybrid search; module ecosystem; strong community | Billions of vectors |
| Milvus / Zilliz Cloud | Open-source / managed cloud | Go, C++ | Lowest latency in benchmarks; cost-efficient at scale | Billions of vectors |
| Qdrant | Open-source / managed cloud | Rust | Rust performance; advanced filtering; payload indexing | Billions of vectors |
| Chroma | Open-source | Python | Developer-friendly; lightweight; excellent for prototyping | Millions of vectors |
| pgvector | PostgreSQL extension | C | Uses existing Postgres infrastructure; familiar SQL interface | Tens of millions of vectors |
| FAISS | Library (not a database) | C++, Python | Meta's research library; GPU-accelerated; highly optimized | Billions of vectors (in-memory) |
| Elasticsearch / OpenSearch | Search engine with vector support | Java | Combines traditional search with vector capabilities; mature ecosystem | Billions of vectors |
The choice between a purpose-built vector database and an extension to an existing system (like pgvector or Elasticsearch) involves trade-offs. Purpose-built systems typically offer better performance and more specialized features, while extensions reduce operational complexity by avoiding the need to manage a separate database [5].
Similarity metrics determine how the "closeness" of two vectors is measured. The choice of metric affects search results and should match the metric used during embedding model training.
Cosine similarity measures the cosine of the angle between two vectors, ignoring their magnitudes. It ranges from -1 (opposite directions) to 1 (identical direction), with 0 indicating orthogonality (no similarity). Because it is magnitude-invariant, cosine similarity treats a short document and a long document equally if they discuss the same topic. This makes it the most popular metric for text-based semantic search [6].
Mathematically: cos(A, B) = (A . B) / (||A|| * ||B||)
The dot product (inner product) computes the sum of element-wise products of two vectors. Unlike cosine similarity, the dot product is sensitive to vector magnitudes: longer vectors (in the geometric sense) produce higher scores. This is useful when magnitude carries meaning, such as in recommendation systems where a larger embedding magnitude indicates higher confidence. When vectors are L2-normalized (unit length), the dot product is equivalent to cosine similarity [6].
Mathematically: dot(A, B) = sum(A_i * B_i)
Euclidean distance (L2 distance) measures the straight-line distance between two points in vector space. Smaller distances indicate greater similarity. It is sensitive to both direction and magnitude, which can cause issues when irrelevant dimensions with high values dominate the distance calculation. Euclidean distance works well for spatial data and clustering tasks but is less commonly used for text semantic search than cosine similarity [6].
Mathematically: L2(A, B) = sqrt(sum((A_i - B_i)^2))
The general rule is to match the similarity metric to the one used during the embedding model's training. Most text embedding models are trained with cosine similarity or dot product loss. If your vectors are normalized (which many embedding models produce by default), cosine similarity and dot product yield identical rankings. Pinecone, Weaviate, and other vector databases allow you to specify the metric at index creation time.
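The three metrics, and the normalization equivalence, can be verified in a few lines of plain Python. The vectors below point in the same direction but differ in magnitude, which isolates exactly what each metric is sensitive to:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [3.0, 4.0], [6.0, 8.0]   # same direction, different magnitude
print(cosine(a, b))              # 1.0  (angle is zero; magnitude ignored)
print(dot(a, b))                 # 50.0 (grows with magnitude)
print(euclidean(a, b))           # 5.0  (sensitive to magnitude gap)

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# After L2 normalization, dot product and cosine similarity coincide.
na, nb = normalize(a), normalize(b)
print(abs(dot(na, nb) - cosine(a, b)) < 1e-9)  # True
```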
Hybrid search combines keyword-based retrieval (typically BM25) with semantic search (vector similarity) to leverage the strengths of both approaches. This combination has become the recommended approach for production search systems because neither method alone handles all query types well [2].
Keyword search excels at matching exact terms, identifiers, and technical jargon but fails when users express queries in different words than the documents use. Semantic search handles vocabulary mismatch and captures conceptual similarity but can miss exact matches for specific terms, product codes, or proper nouns. Hybrid search runs both retrievers in parallel and merges their results.
Consider a search for "error code E4021 troubleshooting." BM25 will precisely match documents containing "E4021," which semantic search might miss if that code was not well-represented in the embedding model's training data. Conversely, semantic search will find documents about "fixing fault code E4021" or "resolving the E4021 issue" that describe troubleshooting procedures without ever using the word "troubleshooting."
After both retrievers return their results, a fusion algorithm merges the two ranked lists into a single ranking.
Reciprocal Rank Fusion (RRF) is the most widely used fusion method due to its simplicity and robustness. For each document, RRF sums the reciprocal of its rank from each retriever: score(d) = sum(1 / (k + rank_i(d))), where k is a constant (typically 60) that prevents high-ranked documents from dominating excessively. RRF is effective because it does not require score normalization between retrievers, which operate on different scales [2].
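RRF is short enough to implement directly. This sketch fuses two toy ranked lists with the conventional k=60; note that a document ranked well by both retrievers outranks one ranked first by only one of them:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each ranked list contributes 1/(k + rank)
    per document (ranks start at 1); no score normalization needed."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["d3", "d1", "d7"]     # keyword retriever's top-3
vector_results = ["d1", "d5", "d3"]   # semantic retriever's top-3
fused = rrf_fuse([bm25_results, vector_results])
print(fused[0])  # d1: ranked 2nd and 1st, beating d3 (1st and 3rd)
```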
Convex Combination (CC) normalizes the scores from each retriever to a common range and then computes a weighted average. This approach allows fine-tuning the relative importance of keyword vs. semantic results (e.g., 40% BM25, 60% semantic), but it requires careful score normalization and weight tuning.
Learned fusion uses a trained model to optimally combine retriever scores. While more accurate than heuristic methods, it requires labeled training data and adds complexity.
Hybrid search consistently outperforms either retriever used alone. Research shows that reranked BM25 combined with semantic retrieval can achieve an NDCG@10 improvement from 43.4 (BM25 alone) to over 52.6 on the BEIR benchmark [7]. Pinecone's analysis reports a 48% improvement in retrieval quality using hybrid retrieval with re-ranking compared to single-method approaches [8].
Re-ranking is a second-stage process that takes the initial set of retrieved results (from semantic search, keyword search, or hybrid search) and reorders them using a more powerful, computationally expensive model. The goal is to improve the precision of the final ranked list.
The most common re-ranking approach uses cross-encoder models. Unlike bi-encoders (used in first-stage semantic search), cross-encoders process the query and each candidate document together as a single input, allowing full attention between all query and document tokens. This joint processing captures fine-grained relevance signals that bi-encoders miss, such as negation, subtle context dependencies, and precise answer matching.
The trade-off is speed. A bi-encoder encodes the query once and compares it against precomputed document vectors. A cross-encoder must process every query-document pair individually. For this reason, cross-encoders are applied only to the top-K results (typically 20 to 100) returned by the first-stage retriever.
Popular cross-encoder models include those trained on the MS MARCO passage ranking dataset, available through the Sentence-Transformers library (e.g., cross-encoder/ms-marco-MiniLM-L-12-v2) [4].
Cohere Rerank is the most widely used commercial re-ranking API. Cohere Rerank 4, released in 2025, features a 32K-token context window (4x larger than Rerank 3.5), supports over 100 languages, and includes a self-learning capability that allows customization without annotated training data. Rerank 4 comes in two variants: Fast (optimized for latency-sensitive applications like e-commerce and customer service) and Pro (optimized for accuracy on complex queries) [9].
Jina Reranker provides open-source and API-based cross-encoder models with competitive accuracy on MTEB benchmarks.
Elastic Rerank offers a semantic re-ranker integrated directly into the Elasticsearch ecosystem, allowing users to add re-ranking without external API calls [10].
The state-of-the-art retrieval architecture as of 2026 uses three stages: (1) broad first-stage retrieval, running keyword (BM25) and vector retrievers in parallel; (2) fusion of the resulting ranked lists, typically with Reciprocal Rank Fusion; and (3) cross-encoder re-ranking of the fused top-K candidates.
This pipeline maximizes both recall (through broad first-stage retrieval) and precision (through cross-encoder re-ranking). The three-stage approach has been validated across enterprise search, academic benchmarks, and production RAG systems [8].
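The three-stage pipeline can be expressed as a small orchestration function. Everything below is illustrative: the retrievers are toy token-overlap stand-ins, the fusion step is RRF, and the function names are not any specific library's API.

```python
def three_stage_search(query, retrievers, fuse, rerank,
                       top_k=100, final_k=10):
    """Sketch of the pipeline: (1) run each first-stage retriever,
    (2) fuse the ranked lists, (3) re-rank the fused top-K with a more
    expensive scorer. All callables are injected."""
    ranked_lists = [retriever(query) for retriever in retrievers]
    candidates = fuse(ranked_lists)[:top_k]
    scored = sorted(candidates, key=lambda d: rerank(query, d), reverse=True)
    return scored[:final_k]

corpus = ["fixing fault code E4021", "resolving the E4021 issue",
          "printer setup guide"]

# Toy stand-ins: in practice `keyword` would be BM25, `semantic` a vector
# retriever, and `rerank` a cross-encoder.
keyword = lambda q: sorted(
    corpus, key=lambda d: sum(t in d for t in q.split()), reverse=True)
semantic = keyword

def fuse(lists, k=60):
    scores = {}
    for lst in lists:
        for rank, doc in enumerate(lst, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

rerank = lambda q, d: sum(t in d for t in q.split())

top = three_stage_search("E4021 issue", [keyword, semantic], fuse, rerank)
print(top[0])  # "resolving the E4021 issue"
```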
Semantic search powers internal knowledge management systems that let employees search across wikis, documentation, Slack conversations, emails, and support tickets using natural language questions. Unlike traditional keyword search, semantic search handles the vocabulary mismatch problem: an employee searching for "vacation policy" will find the document titled "Paid Time Off Guidelines" even though the terms do not overlap. Companies like Glean, Guru, and Coveo build their products on semantic search technology.
Product search is a high-value application of semantic search. When a shopper searches for "comfortable shoes for standing all day," semantic search can surface products tagged with attributes like "cushioned insole" and "ergonomic support" even if the product descriptions never use the phrase "standing all day." Major e-commerce platforms including Amazon, Walmart, and Etsy use semantic search in combination with traditional filters and ranking signals.
Semantic search is the retrieval backbone of most RAG systems. When a user asks a question to a RAG-powered chatbot, the system uses semantic search to find the most relevant document chunks from its knowledge base, then passes those chunks to a large language model for answer generation. The quality of the semantic search component directly determines the quality of the generated answers; poor retrieval leads to irrelevant context and hallucinated responses.
Law firms and compliance teams use semantic search to query vast repositories of contracts, regulations, and case law. A search for "clauses limiting liability in vendor agreements" requires understanding intent rather than matching keywords, making semantic search significantly more effective than keyword-based systems for legal research.
Researchers use semantic search tools like Semantic Scholar, Elicit, and Consensus to find relevant papers based on research questions expressed in natural language. These tools go beyond title and abstract keyword matching to identify papers whose findings are semantically relevant to the query.
Support teams use semantic search to find relevant knowledge base articles, past tickets, and documentation when handling customer inquiries. Semantic search helps match a customer's description of a problem ("my screen keeps flickering after the update") to the correct troubleshooting article, even if the article uses different terminology.
Measuring the quality of a semantic search system requires metrics that assess both the relevance and ranking of returned results.
NDCG@K evaluates the quality of a ranked list by comparing it to an ideal ranking. It accounts for both the relevance of each result and its position in the list, applying a logarithmic discount to results further down the ranking. A score of 1.0 means the system returned results in perfect order; lower scores indicate suboptimal ranking. NDCG is the most widely used metric for evaluating search and recommendation systems because it handles graded relevance (not just binary relevant/irrelevant) [11].
MRR@K measures how quickly the system surfaces the first relevant result. For each query, the reciprocal rank is 1/position of the first relevant result. If the first relevant result appears at position 3, the reciprocal rank is 1/3. MRR is the average reciprocal rank across all queries. This metric is particularly useful for search applications where users primarily care about the top result, such as question answering systems [11].
Recall@K measures the fraction of all relevant documents that appear in the top K results. If there are 10 relevant documents in the corpus and 7 appear in the top 20, Recall@20 is 0.7. Unlike NDCG and MRR, Recall@K is not rank-aware: it does not consider the order of results within the top K, only whether they are present. It is especially useful for evaluating the first-stage retrieval step, where the goal is to cast a wide net and not miss relevant documents [11].
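The three metrics above are straightforward to compute from a ranked list and relevance judgments. The sketch below uses the standard log2(rank + 1) discount for NDCG and toy judgments for illustration:

```python
import math

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top K."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def mrr(all_retrieved: list[list[str]], all_relevant: list[set[str]]) -> float:
    """Mean reciprocal rank of the first relevant result per query."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)

def ndcg_at_k(retrieved: list[str], gains: dict[str, float], k: int) -> float:
    """NDCG with graded relevance: DCG discounted by log2(rank + 1),
    normalized by the DCG of the ideal ordering."""
    dcg = sum(gains.get(doc, 0.0) / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1))
    ideal = sum(g / math.log2(rank + 1)
                for rank, g in enumerate(
                    sorted(gains.values(), reverse=True)[:k], start=1))
    return dcg / ideal if ideal else 0.0

retrieved = ["d2", "d9", "d1"]          # system's ranking for one query
relevant = {"d1", "d2"}                 # binary judgments
print(recall_at_k(retrieved, relevant, 3))   # 1.0: both relevant docs found
print(mrr([retrieved], [relevant]))          # 1.0: first hit at rank 1
print(ndcg_at_k(retrieved, {"d1": 3.0, "d2": 1.0}, 3))  # < 1.0: d1 ranked low
```

The NDCG score is below 1.0 because the most relevant document (d1, gain 3.0) appears at rank 3 instead of rank 1, while recall and MRR cannot see that distinction.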
| Metric | What it measures | Rank-aware? | Best for |
|---|---|---|---|
| NDCG@K | Quality of ranking with graded relevance | Yes | Overall search quality with multi-level relevance |
| MRR@K | Position of first relevant result | Yes | Question answering; single-answer search |
| Recall@K | Fraction of relevant results found in top K | No | First-stage retrieval coverage |
| Precision@K | Fraction of top K results that are relevant | No | Measuring result cleanliness |
| MAP (Mean Average Precision) | Average precision across all recall levels | Yes | Binary relevance with multiple relevant documents |
The Massive Text Embedding Benchmark (MTEB) is the standard benchmark for evaluating text embedding models. Introduced by Muennighoff et al. in 2022, MTEB evaluates embeddings across multiple task categories: retrieval, classification, clustering, pair classification, reranking, semantic textual similarity (STS), and summarization [12].
MTEB uses the BEIR (Benchmarking IR) suite as its retrieval evaluation component. BEIR encompasses 18 diverse retrieval datasets spanning biomedical, financial, scientific, and general-domain corpora, providing a comprehensive assessment of how well an embedding model generalizes across domains.
The MTEB leaderboard, hosted on Hugging Face, tracks the performance of embedding models. As of March 2026 [3]:
| Rank | Model | Provider | Reported Score | Type |
|---|---|---|---|---|
| 1 | Gemini Embedding 001 | Google | 68.32 (MTEB English average) | Proprietary |
| 2 | NV-Embed-v2 | NVIDIA | 72.31 (English retrieval subset) | Open-source |
| 3 | BGE-en-ICL | BAAI | 71.24 (English retrieval subset) | Open-source |
| 4 | Qwen3-Embedding-8B | Alibaba | 70.58 (multilingual) | Open-source |
A notable trend is that open-source models have closed the gap with, and in some cases surpassed, commercial APIs on benchmark performance. However, raw MTEB averages can be misleading because they aggregate across many task types. For retrieval-specific use cases, looking at BEIR scores and the retrieval subtask is more informative than the overall average [3].
The Massive Multilingual Text Embedding Benchmark (MMTEB), introduced in 2025, extends MTEB to evaluate multilingual embedding performance across dozens of languages. This benchmark addresses the criticism that MTEB was overly English-centric and provides more reliable guidance for selecting models for non-English applications [12].
Semantic search has moved from a niche technology to standard infrastructure for AI-powered applications. Several trends define the current landscape.
Hybrid search as the default. Pure semantic search deployments have largely given way to hybrid architectures combining BM25 and vector search. Every major vector database (Weaviate, Qdrant, Milvus, Elasticsearch) now offers built-in hybrid search capabilities, and RAG frameworks like LangChain and LlamaIndex default to hybrid retrieval in their templates.
Embedding model commoditization. The performance gap between leading embedding models has narrowed. Open-source models from BAAI (BGE series), Alibaba (GTE/Qwen), and NVIDIA (NV-Embed) perform within a few percentage points of commercial offerings from OpenAI, Cohere, and Google. The competitive focus has shifted from raw accuracy to practical features: longer context windows, multilingual support, Matryoshka (variable-dimension) embeddings, and efficient inference on edge devices [3].
Re-ranking as standard practice. Adding a cross-encoder re-ranking stage after initial retrieval has moved from an optimization technique to a best practice. Cohere Rerank, Jina Reranker, and open-source cross-encoders are now routinely integrated into production search pipelines. The release of Cohere Rerank 4 with its 32K context window and self-learning capabilities reflects the maturity of this approach [9].
Multimodal semantic search. Embedding models that handle text, images, and other modalities in a shared vector space are gaining traction. Cohere's embed-v4 supports text and image inputs in a single model, enabling searches where a text query can find relevant images and vice versa. CLIP-based models from OpenAI and open-source alternatives continue to advance multimodal search capabilities.
Late interaction models. Models like ColBERT and ColPali represent a middle ground between bi-encoders and cross-encoders. Instead of compressing an entire document into a single vector, late interaction models retain per-token embeddings and compute fine-grained similarity at query time. This provides accuracy closer to cross-encoders with efficiency closer to bi-encoders, though at the cost of significantly larger index sizes.
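The late-interaction ("MaxSim") scoring rule is simple to state: for each query token embedding, take the maximum similarity over all document token embeddings, then sum over query tokens. The sketch below uses hand-made 2-dimensional token vectors; a real model like ColBERT produces one contextualized vector per token:

```python
def maxsim_score(query_tokens: list[list[float]],
                 doc_tokens: list[list[float]]) -> float:
    """ColBERT-style late interaction: sum over query tokens of the max
    dot product against any document token."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Two query tokens; doc A has a near-match for each, doc B for only one.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.9]]
doc_b = [[0.9, 0.1], [0.8, 0.2]]
print(maxsim_score(query, doc_a) > maxsim_score(query, doc_b))  # True
```

Because per-token document embeddings must be stored, the index is many times larger than a single-vector-per-document index, which is the size cost noted above.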
Adaptive and task-specific embeddings. The next generation of embedding models is moving toward adaptivity. Rather than producing a single general-purpose embedding, these models can adjust their representations based on task instructions (e.g., "Retrieve passages that answer this question" vs. "Find documents on the same topic"). BGE-en-ICL exemplifies this trend with its in-context learning capability, and instruction-tuned embedding models from multiple providers now accept task prefixes that steer their output.