# Vector database

> Source: https://aiwiki.ai/wiki/vector_database
> Updated: 2026-06-20
> Categories: AI Infrastructure, Developer Tools, Information Retrieval, Machine Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [AI terms](/wiki/ai_terms)*

A vector database is a database that stores data as high-dimensional vectors (numerical embeddings produced by a machine learning model) and retrieves records by similarity rather than exact match, using approximate nearest neighbor search to find the items closest to a query vector. It is the storage layer that gives [large language models](/wiki/large_language_model) long-term memory and powers semantic search, recommendation, and [retrieval-augmented generation](/wiki/retrieval_augmented_generation_rag) (RAG). As of 2026, over 68% of enterprise AI applications use a vector database to manage embeddings, and analyst estimates place the global market between roughly USD 2.6 billion and USD 3.2 billion in 2025, growing at a compound annual rate above 20%.[39][46] Pinecone founder Edo Liberty, who helped define the category, has said: "We created Pinecone and the vector database category as a whole to let all AI developers easily work with a scalable and cost efficient database for this workload."[9]

## Explain Vector database Like I'm 5 (ELI5)

A vector database is a special kind of computer storage that helps find things that are similar, like finding pictures that look like a cat or finding songs that sound happy. It is really good at helping computers understand what things mean, even if those things are words, pictures, or sounds.

Vector databases also help big computer brains called "large language models" remember things, so they can give better answers when you ask them questions. They can find what you are looking for really fast, even when there are lots and lots of items inside.

## Introduction

A vector database is a type of database designed specifically for storing and querying high-dimensional vector data, which is widely used in [artificial intelligence applications](/wiki/artificial_intelligence_applications) ([AI](/wiki/ai) apps). Complex data, including unstructured forms like documents, images, videos, and plain text, is growing rapidly, and traditional databases designed for structured data struggle to store and analyze it effectively, often requiring extensive keyword and metadata classification. Vector databases address this by transforming complex data into [vector embeddings](/wiki/vector_embeddings): lists of numbers, typically hundreds to thousands of dimensions long, in which semantically similar objects lie close together. These databases are gaining popularity because they can extend large language models ([LLMs](/wiki/llms)) with long-term memory and provide efficient querying for [AI-driven applications](/wiki/artificial_intelligence_applications).[34]

As of 2026, over 68% of enterprise AI applications use vector databases to manage [embeddings](/wiki/embeddings) generated by large language models, [computer vision](/wiki/computer_vision) systems, and recommendation engines. Industry estimates from research firms place the global vector database market between USD 2.6 billion and USD 3.2 billion in 2025, with most analyst forecasts pointing to a compound annual growth rate above 20% through the early 2030s. The category sits at the center of the [retrieval-augmented generation](/wiki/retrieval_augmented_generation_rag) stack and has become a standard component in modern AI infrastructure alongside model-serving platforms, [orchestration frameworks](/wiki/langchain), and feature stores.[39][46]

The first generation of dedicated vector search systems appeared between 2017 and 2022, including the [Faiss](/wiki/faiss) library (Meta, 2017) and the databases [Milvus](/wiki/milvus) ([Zilliz](/wiki/zilliz), 2019), [Weaviate](/wiki/weaviate) (SeMI Technologies, 2019), [Pinecone](/wiki/pinecone) (2019), [Qdrant](/wiki/qdrant) (2021), and [Chroma](/wiki/chroma) (2022). In parallel, a second wave added vector support to existing databases, led by [pgvector](/wiki/pgvector) for [PostgreSQL](/wiki/postgresql) (Andrew Kane, 2021), MongoDB Atlas Vector Search (general availability December 2023), the Oracle Database 23ai vector type (2024), and SQL Server vector functions. By 2025, the divide between purpose-built vector engines and traditional databases with vector add-ons had narrowed considerably, and many production teams run a mix of both.[45]

## What is a vector database?

In a relational database, data is organized in rows and columns, while in a document database, it is organized in documents and collections. In contrast, a vector database stores arrays of floating-point numbers (or quantized representations of them) along with associated metadata, indexed for similarity rather than exact match. These databases can be queried with ultra-low latency, making them suitable for AI-driven applications.

A vector database indexes and stores [vector embeddings](/wiki/vector_embedding) for efficient retrieval and similarity search. In addition to traditional CRUD (create, read, update, and delete) operations and metadata filtering, a vector database can compare any stored vector with any other, or with the vector of a search query. This makes it well suited to similarity search (also called vector search), which returns results related in meaning to the query even when they share no keywords with it, something traditional keyword-based search cannot do.[34]

The fundamental query is the k-nearest-neighbor (kNN) lookup: given a query vector q and a target k, return the k vectors in the index whose distance to q is smallest under a chosen metric. For high-dimensional data, exact kNN requires comparing the query against every stored vector, which is impractical at scale. Vector databases therefore implement approximate nearest neighbor ([ANN](/wiki/ann)) search, which trades a small amount of recall for orders-of-magnitude gains in throughput and latency.[34]
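Exact kNN is short to write down, which makes its cost easy to see. The sketch below is a minimal illustration, assuming Euclidean distance and a hypothetical helper name `knn_exact`; it compares the query with every stored vector, the O(N · d) flat search that ANN indexes are designed to avoid.

```python
import heapq
import math


def knn_exact(query, vectors, k):
    """Return the k (index, distance) pairs closest to `query`, nearest first.

    Every stored vector is compared with the query (Euclidean distance),
    so each lookup costs O(N * d): fine for small collections, too slow at scale.
    """
    scored = ((i, math.dist(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
print([i for i, _ in knn_exact([0.9, 0.1], vectors, k=2)])  # [1, 0]
```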

### The curse of dimensionality

A long-standing challenge in vector search is the curse of dimensionality. As dimensionality grows, the volume of the embedding space grows exponentially, points drawn from typical distributions become roughly equidistant from one another, and the contrast between nearest and farthest neighbors collapses. In purely random high-dimensional spaces, distance-based nearest neighbor methods can fail to discriminate at all. Real-world embeddings produced by [neural networks](/wiki/neural_network) escape the worst of this problem because the data lies on lower-dimensional manifolds inside the ambient space, but the phenomenon still motivates the design of every modern ANN algorithm. Techniques such as random projections, learned partitions, navigable small-world graphs, and quantization all aim to exploit the manifold structure of real embeddings rather than treat them as uniformly distributed points.
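The collapse of distance contrast can be observed numerically. This sketch assumes the worst case described above, points drawn uniformly at random from a unit hypercube rather than real embeddings, and measures the relative contrast (farthest - nearest) / nearest from one random query; the function name is illustrative.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from one random query to
    n_points random points, everything drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    # The ratio shrinks as dim grows: the nearest and farthest points look alike.
    print(dim, round(relative_contrast(dim), 3))
```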

### Vector databases vs. vector search libraries

Practitioners distinguish between vector libraries and vector databases. A library such as [Faiss](/wiki/faiss), [Annoy](/wiki/annoy), [hnswlib](/wiki/hnswlib), or [ScaNN](/wiki/scann) is an in-process toolkit that exposes APIs for index construction and search but leaves operational concerns (sharding, replication, persistence, multi-tenancy, security) to the application. A vector database wraps one or more such algorithms inside a managed runtime that handles those concerns, exposes a network API, and integrates with monitoring and access control. Faiss is the canonical library, while [Pinecone](/wiki/pinecone), [Weaviate](/wiki/weaviate), [Milvus](/wiki/milvus), [Qdrant](/wiki/qdrant), and [Chroma](/wiki/chroma) are canonical databases.[7]

## Indexing algorithms

The performance of a vector database depends heavily on its indexing strategy. Three indexing families dominate the landscape: HNSW, IVF, and Product [Quantization](/wiki/quantization). Production systems often combine two or more of these to balance recall, latency, memory, and throughput.

| Algorithm | Family | Year | Origin | Typical Build Complexity | Typical Query Complexity | Memory Use |
|---|---|---|---|---|---|---|
| Flat (brute force) | Exact | classical | n/a | O(N) | O(N · d) per query | 1x raw vectors |
| LSH | Hashing | 1998 | Indyk and Motwani, STOC 1998 | O(N · L · d) | O(L · d) average | Multiple hash tables |
| IVF | Inverted file | 2010 | Jegou, Douze, Schmid (with PQ) | O(N · nlist · d) per k-means iteration | O(nlist · d + (nprobe / nlist) · N · d) | 1x raw + centroids |
| Product Quantization | Quantization | 2010 | Jegou, Douze, Schmid (TPAMI 2011) | O(N · K · d) encoding after codebook training | O(d · K) table setup + O(N · M) lookups | Up to 64x compression |
| HNSW | Graph | 2016 | Malkov and Yashunin (arxiv 1603.09320) | O(N · log N) average | O(log N) average | 1.5 to 2x raw vectors |
| ScaNN | Tree + AQ | 2020 | Guo et al., ICML 2020 (arxiv 1908.10396) | O(N · d) with codebook | Sub-linear typical | Compressed |
| DiskANN (Vamana) | Graph on SSD | 2019 | Subramanya et al., NeurIPS 2019 | O(N · log N) | O(log N) with SSD reads | PQ-compressed vectors in RAM; graph and full vectors on SSD |
| SPANN | Inverted on disk | 2021 | Chen et al., NeurIPS 2021 (arxiv 2111.08566) | Hierarchical clustering + graph | Sub-linear with SSD reads | Centroids in RAM |

### HNSW (Hierarchical Navigable Small World)

[HNSW](/wiki/hnsw) is a graph-based approximate nearest neighbor algorithm published by Yu. A. Malkov and D. A. Yashunin in March 2016 (arXiv 1603.09320; later published in IEEE Transactions on Pattern Analysis and Machine Intelligence 42(4):824-836, online in 2018 and in the April 2020 issue).[1] It builds a multi-layered navigable graph over the vector space. Each layer contains a subset of the vectors: upper layers hold progressively fewer nodes linked by longer-range connections, while the bottom layer holds every node linked by short-range connections. During a search, the algorithm starts at the top layer and greedily traverses toward the query vector, descending through the layers and refining the candidate set at each level. As the authors describe it, "Starting search from the upper layer together with utilizing the scale separation boosts the performance compared to NSW and allows a logarithmic complexity scaling."[1]

HNSW offers several strengths that make it the default choice for most vector database deployments:

- **High recall at low latency.** HNSW consistently achieves above 95% recall with single-digit millisecond query times on datasets up to hundreds of millions of vectors.[36]
- **No training phase required.** Unlike IVF, HNSW does not require a separate clustering step before the index can serve queries. Vectors can be inserted incrementally.
- **Tunable accuracy.** The `ef_search` parameter controls how many candidates the algorithm considers during a query. Increasing `ef_search` improves recall at the cost of higher latency.
- **Tunable construction.** The `M` parameter controls the number of connections per node, and `ef_construction` controls the quality of the graph at build time. Higher values produce a more accurate but larger index.
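The greedy traversal governed by `ef_search` can be sketched for a single layer. The function below is a hypothetical, simplified version of the per-layer routine (called SEARCH-LAYER in the HNSW paper): it keeps a min-heap of candidates to expand and a bounded max-heap of the best `ef` results, and stops once the closest unexpanded candidate is farther than the worst kept result. It omits the layer hierarchy and the neighbor-selection heuristics used to build real HNSW graphs; the toy graph is a hand-made chain.

```python
import heapq
import math


def search_layer(graph, vectors, query, entry, ef):
    """Greedy beam search over one proximity-graph layer.

    graph:   node id -> list of neighbour node ids
    vectors: node id -> coordinates
    entry:   node id where the search starts
    ef:      size of the dynamic result list (HNSW's ef / ef_search)
    Returns up to ef (distance, node) pairs, nearest first.
    """
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]     # max-heap (negated): worst kept result on top
    while candidates:
        dist_c, c = heapq.heappop(candidates)
        if dist_c > -results[0][0]:
            break  # every remaining candidate is worse than the worst result
        for n in graph[c]:
            if n in visited:
                continue
            visited.add(n)
            dist_n = math.dist(query, vectors[n])
            if len(results) < ef or dist_n < -results[0][0]:
                heapq.heappush(candidates, (dist_n, n))
                heapq.heappush(results, (-dist_n, n))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    return sorted((-neg_d, n) for neg_d, n in results)


# Toy layer: ten points on a line, each linked to its immediate neighbours.
vectors = [[float(i)] for i in range(10)]
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(search_layer(graph, vectors, [7.2], entry=0, ef=1))  # nearest: node 7
print(search_layer(graph, vectors, [7.2], entry=0, ef=3))  # nodes 7, 8, 6
```

In the full algorithm, the search runs with ef = 1 on each upper layer to find an entry point for the layer below, and with ef = `ef_search` (at least k) on the bottom layer.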

The main drawback of HNSW is memory consumption. Because the entire graph must reside in memory for fast traversal, HNSW typically requires 1.5 to 2 times the raw vector data size in RAM. For billion-scale datasets, this can become prohibitively expensive, which is why vendors increasingly pair HNSW with quantization to compress the in-memory representation.

HNSW is the primary index type in [Qdrant](/wiki/qdrant), [Weaviate](/wiki/weaviate), [pgvector](/wiki/pgvector), [Elasticsearch](/wiki/elasticsearch), [OpenSearch](/wiki/opensearch), and Redis (RediSearch 2.4 and later), and is supported as an option in [Milvus](/wiki/milvus), [Faiss](/wiki/faiss), and [Pinecone](/wiki/pinecone).

### IVF (Inverted File Index)

IVF is a partition-based indexing method derived from the inverted-file approach popularized in image retrieval research. It works by first running [k-means clustering](/wiki/k-means) on the dataset to divide the vector space into a configurable number of clusters (called `nlist`). Each cluster is represented by a centroid, and every vector in the dataset is assigned to its nearest centroid. At query time, the algorithm computes the distance between the query vector and all centroids, selects the `nprobe` closest clusters, and then performs an exhaustive search only within those clusters.[36]
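The training, assignment, and probing steps can be sketched in pure Python. This is an illustrative toy with hypothetical class and function names, plain Lloyd's k-means, and exact distances inside each probed list; it is not how Faiss or Milvus implement IVF, but `nlist` and `nprobe` play the roles described above.

```python
import heapq
import math
import random


def kmeans(vectors, nlist, iters=10, seed=0):
    """Plain Lloyd's k-means: the IVF training step. Returns nlist centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        members = [[] for _ in range(nlist)]
        for v in vectors:
            nearest = min(range(nlist), key=lambda c: math.dist(v, centroids[c]))
            members[nearest].append(v)
        for c, group in enumerate(members):
            if group:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


class IVFIndex:
    """Toy inverted-file index: one list of vector ids per centroid."""

    def __init__(self, vectors, nlist):
        self.vectors = vectors
        self.centroids = kmeans(vectors, nlist)
        self.lists = [[] for _ in range(nlist)]
        for i, v in enumerate(vectors):
            self.lists[self._closest_centroids(v, 1)[0]].append(i)

    def _closest_centroids(self, q, n):
        return heapq.nsmallest(n, range(len(self.centroids)),
                               key=lambda c: math.dist(q, self.centroids[c]))

    def search(self, q, k, nprobe):
        """Exhaustively scan only the nprobe lists whose centroids are nearest to q."""
        ids = [i for c in self._closest_centroids(q, nprobe) for i in self.lists[c]]
        return heapq.nsmallest(k, ids, key=lambda i: math.dist(q, self.vectors[i]))


rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(1000)]
index = IVFIndex(data, nlist=32)             # roughly sqrt(1000) lists
query = [0.3, 0.7]
print(index.search(query, k=5, nprobe=1))    # fastest; may miss neighbours in adjacent cells
print(index.search(query, k=5, nprobe=32))   # scans every list, so it matches exact search
```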

IVF is well suited for scenarios where HNSW is impractical:

- **Billion-scale datasets.** Because IVF can store the index on disk and only load relevant clusters into memory during a query, it can handle datasets far larger than available RAM.
- **Filtered search.** IVF handles metadata filtering efficiently through a two-level process: first performing coarse-grained filtering at the centroid level to narrow down candidates, then conducting fine-grained distance calculations within selected clusters.
- **Batch search workloads.** When throughput matters more than individual query latency, IVF can process many queries in parallel across clusters.

The main trade-off is that IVF requires a training phase (the k-means clustering step), which can be time-consuming for large datasets. Additionally, IVF generally achieves lower recall than HNSW at equivalent latency, because queries that fall near cluster boundaries may miss relevant vectors in neighboring clusters.[36] Practitioners typically tune `nlist` near the square root of the dataset size and adjust `nprobe` to trade recall against latency.

IVF is available in [Faiss](/wiki/faiss), [Milvus](/wiki/milvus), and Oracle Database 23ai, among other systems.

### Product Quantization (PQ)

Product Quantization is a lossy compression technique introduced by Hervé Jégou, Matthijs Douze, and Cordelia Schmid in their 2011 IEEE TPAMI paper (DOI 10.1109/TPAMI.2010.57).[2] It can reduce the memory footprint of a vector index by roughly 97% or more.[37] PQ works by dividing each high-dimensional vector into a set of smaller subvectors and then quantizing each subvector independently using a separate codebook trained with k-means.

As an example, a 128-dimensional float32 vector occupies 512 bytes. PQ might split it into 8 subvectors of 16 dimensions each, train a 256-centroid codebook for each subvector, and replace each subvector with a single byte. The result compresses the vector to 8 bytes, a reduction of roughly 98%. At query time, PQ uses precomputed lookup tables to estimate distances against quantized database vectors without decompressing them.[37] PQ is lossy and is rarely used alone; it is typically combined with IVF (as IVF-PQ) or HNSW to form a two-stage retrieval pipeline where the index identifies a candidate set and a re-ranking step refines results using full-precision vectors.
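The encode-then-lookup scheme can be sketched as follows. The class and helper names are hypothetical, the codebooks come from a plain k-means, and the demo uses 16 centroids per subspace (rather than 256) only so that it trains quickly; with `ksub=256` each code would still fit in one byte, as in the example above.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=8, seed=0):
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: sq_dist(p, centroids[c]))].append(p)
        for c, g in enumerate(groups):
            if g:
                centroids[c] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


class ProductQuantizer:
    def __init__(self, m, ksub):
        self.m, self.ksub = m, ksub  # m subvectors, ksub codewords each (<= 256)

    def fit(self, vectors):
        self.sub = len(vectors[0]) // self.m  # dims per subvector; assumes d % m == 0
        self.codebooks = [
            kmeans([v[j * self.sub:(j + 1) * self.sub] for v in vectors], self.ksub, seed=j)
            for j in range(self.m)]
        return self

    def encode(self, v):
        """Replace each subvector with the id of its nearest codeword (one byte each)."""
        return bytes(
            min(range(self.ksub),
                key=lambda c: sq_dist(v[j * self.sub:(j + 1) * self.sub], self.codebooks[j][c]))
            for j in range(self.m))

    def adc_tables(self, q):
        """Per query: m x ksub table of squared distances from each query subvector to each codeword."""
        return [[sq_dist(q[j * self.sub:(j + 1) * self.sub], cw) for cw in self.codebooks[j]]
                for j in range(self.m)]

    @staticmethod
    def approx_dist(tables, code):
        """Approximate squared L2 distance from m table lookups, without decompressing."""
        return sum(tables[j][c] for j, c in enumerate(code))


rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(300)]
pq = ProductQuantizer(m=4, ksub=16).fit(data)
codes = [pq.encode(v) for v in data]   # 4 bytes each instead of 64 bytes of float32
query = data[0]
tables = pq.adc_tables(query)          # computed once per query
ranked = sorted(range(len(data)), key=lambda i: pq.approx_dist(tables, codes[i]))
print(ranked[:5])                      # approximate nearest neighbours of data[0]
```

In a real pipeline these approximate distances would only shortlist candidates for re-ranking with the full-precision vectors.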

Scalar quantization tops out at 32x compression, and only when each float32 component is reduced to a single bit (binary quantization); PQ can provide compression of up to 64x, making it especially valuable for cost-sensitive deployments at very large scale. [Faiss](/wiki/faiss), [Milvus](/wiki/milvus), [Pinecone](/wiki/pinecone), and OpenSearch all use product quantization internally.

### Scalar quantization and binary quantization

In addition to product quantization, vector databases increasingly support scalar and binary forms of compression that trade more accuracy for further memory savings.

- **Scalar quantization** maps each float32 component of a vector to an integer with fewer bits, typically int8. This produces 4x compression with a small recall penalty and is the default in Elasticsearch 9.0 (`int8_hnsw`).
- **Binary quantization** maps each float component to a single bit by sign thresholding (positive becomes 1, non-positive becomes 0). On a 1536-dimensional embedding such as OpenAI's text-embedding-3-small output, the raw float32 representation is 6,144 bytes, and the binary representation is 192 bytes, a 32x compression. Distance computations on binary codes use Hamming distance, which reduces to a bitwise XOR followed by a population count on each 64-bit word (24 words for a 1536-bit code). [Qdrant](/wiki/qdrant) and other engines report up to 40x speedups on binary-quantized HNSW with high-dimensional embeddings, often combined with full-precision re-ranking on a smaller candidate set.[15]
- **Better Binary Quantization (BBQ)** is an Elastic-developed extension that improves recall on binary-quantized vectors through learned rotations and asymmetric search.
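Scalar and binary quantization are simple enough to sketch directly. The helpers below are hypothetical: the int8 mapping assumes a known calibration range `[lo, hi]` (real engines estimate it from the data), and the binary codes are packed into a Python integer so that Hamming distance becomes an XOR plus a bit count.

```python
def scalar_quantize_int8(v, lo, hi):
    """Linearly map floats in [lo, hi] onto integers in [-128, 127] (4x smaller than float32)."""
    scale = (hi - lo) / 255
    return [max(-128, min(127, round((x - lo) / scale) - 128)) for x in v]


def binary_quantize(v):
    """Sign thresholding: bit i is 1 if v[i] > 0, else 0 (32x smaller than float32)."""
    code = 0
    for i, x in enumerate(v):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Count differing bits: XOR, then population count (int.bit_count() on Python 3.10+)."""
    return bin(a ^ b).count("1")


dim = 1536
print(dim * 4, "bytes as float32 ->", dim // 8, "bytes as binary")  # 6144 -> 192
q = binary_quantize([0.4, -0.2, 0.9, -0.7])
d = binary_quantize([0.1, 0.3, 0.8, -0.5])
print(hamming(q, d))  # 1: only component 1 differs in sign
```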

Binary quantization works best on embeddings of dimension 1024 or higher; on lower-dimensional vectors, the loss of information is too large to preserve useful similarity rankings.[15]

### ScaNN

ScaNN (Scalable Nearest Neighbors) was published by Ruiqi Guo and colleagues at Google Research in 2020 (ICML 2020, arxiv 1908.10396). It introduces an anisotropic vector quantization loss that penalizes the parallel component of a datapoint's residual relative to its orthogonal component, on the observation that database points with the largest inner products are more relevant for a given query.[4] ScaNN combines tree-based partitioning with this quantization to deliver leading results on benchmarks such as glove-100-angular.[4] It is available as an open-source library and underlies vector search in Google Cloud's Vertex AI Vector Search service.
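Writing the loss out makes the parallel/orthogonal split concrete. In simplified form (notation adapted from the ScaNN paper, with the weight functions left abstract), the residual $r = x - \tilde{x}$ between a datapoint $x$ and its quantized approximation $\tilde{x}$ is decomposed and weighted as:

$$
r_{\parallel} = \frac{\langle r, x \rangle}{\lVert x \rVert^{2}}\, x, \qquad
r_{\perp} = r - r_{\parallel}, \qquad
\ell(x, \tilde{x}) = h_{\parallel}\, \lVert r_{\parallel} \rVert^{2} + h_{\perp}\, \lVert r_{\perp} \rVert^{2}, \quad h_{\parallel} \ge h_{\perp}
$$

Because $\lVert r \rVert^{2} = \lVert r_{\parallel} \rVert^{2} + \lVert r_{\perp} \rVert^{2}$, setting $h_{\parallel} = h_{\perp}$ recovers the ordinary reconstruction error minimized by k-means and classic product quantization; choosing $h_{\parallel} > h_{\perp}$ spends the quantization budget on the error component that matters most for large inner products.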

### DiskANN

DiskANN was published by Suhas Jayaram Subramanya, Rohan Kadekodi, Ravishankar Krishnaswamy, and Harsha Vardhan Simhadri at Microsoft Research at NeurIPS 2019. It introduced Vamana, a graph-construction algorithm designed specifically for SSD storage that builds graphs with a smaller diameter than NSG and HNSW, so each search needs fewer hops and therefore fewer disk reads.[3] On the SIFT1B billion-point benchmark, DiskANN serves more than 5,000 queries per second with under 3 ms mean latency at 95% 1-recall@1 on a 16-core machine with 64 GB of RAM and an inexpensive SSD.[3] The Microsoft DiskANN code is open source on GitHub, and the algorithm is integrated into [Milvus](/wiki/milvus), pgvectorscale (the Timescale extension on top of [pgvector](/wiki/pgvector)), and other systems.

### SPANN

SPANN was published in 2021 (NeurIPS 2021, arxiv 2111.08566) by researchers at Microsoft, Peking University, Tencent, and Baidu.[5] It implements a hybrid memory-disk inverted-file system: centroids of the posting lists are kept in RAM via a Microsoft SPTAG index, while the larger posting lists themselves live on disk. A hierarchical balanced clustering scheme keeps posting lists similar in length, and a query-aware pruning scheme avoids reading unnecessary lists at search time. SPANN can reach 90% recall@1 in around one millisecond on billion-scale datasets with only 32 GB of RAM, more than twice as fast as DiskANN at the same memory budget on three billion-scale datasets.[5]

### Locality-Sensitive Hashing (LSH)

LSH is the classical hashing-based ANN technique, introduced by Piotr Indyk and Rajeev Motwani in 1998 (STOC 1998).[6] An LSH family is designed so that nearby points land in the same hash bucket with higher probability than distant points.[6] For Euclidean and Manhattan distances, random-projection LSH projects vectors onto random directions drawn from Gaussian or Cauchy (p-stable) distributions and quantizes each projection into fixed-width buckets; SimHash uses random hyperplanes to target cosine similarity, and MinHash targets Jaccard similarity. LSH is fast and memory-efficient but tends to deliver lower recall than graph-based methods at equivalent throughput, and it has largely been displaced by HNSW in modern vector databases. It remains useful for very high-dimensional sparse data, near-duplicate detection in web crawls, and theoretically grounded approximate search.
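The random-hyperplane (SimHash-style) variant for cosine similarity is the easiest to sketch. The class below is hypothetical: each of `n_tables` tables hashes a vector to an `n_bits` key recording which side of each random hyperplane it falls on, and a query's candidates are the union of its buckets, which would then be re-ranked with exact distances.

```python
import random


class RandomHyperplaneLSH:
    """Random-hyperplane (SimHash-style) LSH for cosine similarity.

    Each table hashes a vector to an n_bits key whose bit i records on which side
    of random hyperplane i the vector lies; vectors at a small angle tend to share keys.
    """

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
                       for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]

    @staticmethod
    def _key(planes, v):
        key = 0
        for i, normal in enumerate(planes):
            if sum(a * b for a, b in zip(normal, v)) > 0:
                key |= 1 << i
        return key

    def add(self, item_id, v):
        for planes, table in zip(self.planes, self.tables):
            table.setdefault(self._key(planes, v), []).append(item_id)

    def candidates(self, q):
        """Union of q's buckets across all tables; re-rank these with exact distances."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(planes, q), []))
        return found


rng = random.Random(7)
lsh = RandomHyperplaneLSH(dim=8)
points = {f"p{i}": [rng.gauss(0.0, 1.0) for _ in range(8)] for i in range(200)}
for pid, vec in points.items():
    lsh.add(pid, vec)
query = [x * 1.01 + 0.01 for x in points["p3"]]  # a slightly perturbed copy of p3
cands = lsh.candidates(query)
print(len(cands), "candidates out of", len(points))
```

More tables raise recall (more chances to collide), while more bits per key shrink each bucket and cut the candidate count.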

### Other indexing approaches

Other algorithms also appear in production:

- **Flat (brute force)** compares the query with every stored vector, giving perfect recall; it is practical below a few hundred thousand vectors.
- **Annoy**, from Spotify, builds a forest of random-projection binary trees and is popular for read-heavy workloads.
- **NSG** (Cong Fu et al., 2017) is a graph algorithm focused on small index size.
- **hnswlib** is a reference C++ implementation of HNSW that several engines embed.
- **RaBitQ** is a newer quantization scheme with theoretical error bounds on its distance estimates that has begun to appear in research benchmarks.

## Distance metrics

Vector databases rely on distance or similarity metrics to determine how close two vectors are in the embedding space. Choosing the correct metric is important because it should match the metric used during the training of the [embedding model](/wiki/embedding_model).[35] Using cosine similarity with a model trained for inner-product retrieval, for example, can degrade recall noticeably.[35]

| Metric | Formula Summary | Best For | Notes |
|---|---|---|---|
| Cosine Similarity | cos(θ) = (a · b) / (‖a‖ ‖b‖); measures angle, ignores magnitude | Text similarity, document comparison, [NLP](/wiki/natural_language_processing) tasks | Most common default; insensitive to vector length |
| Euclidean Distance (L2) | √(Σ (a_i - b_i)²); straight-line distance | Clustering, [anomaly detection](/wiki/anomaly_detection), spatial data | Sensitive to magnitude; penalizes large absolute differences |
| Dot Product (Inner Product) | a · b = Σ a_i b_i; sum of element-wise products | Recommendation systems, ranking tasks | Equivalent to cosine similarity when vectors are normalized |
| Manhattan Distance (L1) | Σ \|a_i - b_i\|; sum of absolute differences | Sparse data, grid-based environments | Less common in vector databases |
| Hamming Distance | Number of positions where two binary vectors differ | Binary embeddings, hash-based search | Used with binary quantization for speed |
| Jaccard | \|A ∩ B\| / \|A ∪ B\|; set overlap ratio | Sparse boolean or sketch data | Common with MinHash and shingled documents |
| Squared Euclidean | Σ (a_i - b_i)² | Any L2 workload where only the ranking matters | Same ranking as L2; skips the square root for speed |

When vectors are normalized to unit length, cosine similarity and dot product produce identical rankings.[35] Many embedding models, such as [OpenAI](/wiki/openai)'s text-embedding-3 family and [Sentence Transformers](/wiki/sentence_transformers) trained with cosine objectives, output normalized vectors, so either metric can be used interchangeably in those cases. For maximum-inner-product search, where vectors are not normalized and magnitude carries information, dot product is preferred and is the metric ScaNN's anisotropic quantization is designed for.[4]

```text
Quick guide to picking a distance metric:
- text embeddings (most modern models): cosine or dot product (interchangeable when normalized)
- recommendation systems with biased magnitudes: dot product
- clustering or geometric data: Euclidean (L2)
- compressed binary representations: Hamming
- sparse hash-based fingerprints: Jaccard
```
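The metrics in the table take only a few lines each. This sketch uses hypothetical helper names and toy vectors; it also shows why normalization matters: on raw vectors, dot product and cosine can rank documents differently, but after scaling to unit length they agree.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_euclidean(a, b):
    # Same ranking as euclidean(), without the square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard(a, b):
    """Set overlap ratio |A ∩ B| / |A ∪ B| for Python sets."""
    return len(a & b) / len(a | b)


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


query = [1.0, 0.0]
docs = {"long": [10.0, 10.0], "aligned": [1.0, 0.1]}
print(max(docs, key=lambda d: dot(query, docs[d])))                        # long: magnitude wins
print(max(docs, key=lambda d: cosine_similarity(query, docs[d])))          # aligned: angle wins
print(max(docs, key=lambda d: dot(normalize(query), normalize(docs[d]))))  # aligned
```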

## Core database operations

Beyond similarity search, modern vector databases support a broader set of operations that distinguish them from pure ANN libraries.

### CRUD and upserts

Vectors can be inserted, updated, deleted, and fetched by primary key. Most engines accept an upsert operation that inserts a vector if its ID is new and replaces the existing entry otherwise. Because graph indexes are sensitive to deletions, engines such as [Qdrant](/wiki/qdrant), [Weaviate](/wiki/weaviate), and [Milvus](/wiki/milvus) implement tombstoning with periodic background compaction.
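The upsert-plus-tombstone pattern can be illustrated with a toy store. The class below is hypothetical and keeps only raw rows; a real engine such as Qdrant, Weaviate, or Milvus would also have to repair or rebuild its graph or inverted index during compaction.

```python
class TombstoneStore:
    """Toy vector store: deletes only mark a tombstone; compact() rebuilds storage."""

    def __init__(self):
        self.rows = []           # append-only storage of (id, vector)
        self.position = {}       # id -> index of its live row in self.rows
        self.tombstones = set()  # row indices that are logically deleted

    def upsert(self, item_id, vector):
        if item_id in self.position:  # replace = tombstone the old row, append a new one
            self.tombstones.add(self.position[item_id])
        self.position[item_id] = len(self.rows)
        self.rows.append((item_id, vector))

    def delete(self, item_id):
        row = self.position.pop(item_id, None)
        if row is not None:
            self.tombstones.add(row)

    def live_rows(self):
        return [r for i, r in enumerate(self.rows) if i not in self.tombstones]

    def compact(self):
        """Background-style compaction: drop tombstoned rows and reindex survivors."""
        self.rows = self.live_rows()
        self.position = {item_id: i for i, (item_id, _) in enumerate(self.rows)}
        self.tombstones.clear()


store = TombstoneStore()
store.upsert("a", [1.0])
store.upsert("b", [2.0])
store.upsert("a", [3.0])   # replaces "a"; the old row becomes a tombstone
store.delete("b")
print(len(store.rows), store.live_rows())  # 3 physical rows, one live
store.compact()
print(store.rows)
```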

### Metadata filtering

Each vector typically carries a payload of structured metadata such as document IDs, timestamps, tags, languages, prices, or access-control attributes. Filtering this metadata alongside vector similarity is essential for production retrieval. Approaches include:

- **Pre-filtering** evaluates the metadata predicate first and runs ANN only on the surviving subset. It produces correct top-k results but can be slow when the predicate is expensive or returns a small fraction of the corpus.
- **Post-filtering** runs ANN first to retrieve a larger candidate set and then applies the metadata filter. It is fast but may return fewer than k results when filters are highly selective.
- **Single-stage filtering** integrates the metadata predicate directly into the ANN traversal, pruning graph edges or inverted-list buckets that violate the filter. [Pinecone](/wiki/pinecone) implements this with bitmap indexes, [Qdrant](/wiki/qdrant) with payload-aware HNSW link selection, and [Milvus](/wiki/milvus) with bitmap and bloom filters.

The choice between strategies depends on filter selectivity, ANN parameters, and dataset size, and most engines pick one automatically based on workload heuristics.
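The difference between pre- and post-filtering shows up clearly when a filter is selective. In this hypothetical sketch, exact search stands in for the ANN step, and `overfetch` controls how many extra candidates post-filtering retrieves before applying the predicate.

```python
import heapq
import math


def pre_filter_search(query, items, predicate, k):
    """Filter on metadata first, then rank only the surviving items."""
    survivors = [it for it in items if predicate(it["meta"])]
    return heapq.nsmallest(k, survivors, key=lambda it: math.dist(query, it["vec"]))


def post_filter_search(query, items, predicate, k, overfetch=2):
    """Rank first (k * overfetch candidates), then drop items that fail the filter."""
    candidates = heapq.nsmallest(k * overfetch, items,
                                 key=lambda it: math.dist(query, it["vec"]))
    return [it for it in candidates if predicate(it["meta"])][:k]


def is_french(meta):
    return meta["lang"] == "fr"


# Ten items on a line; only the two farthest from the query are in French.
items = [{"id": i, "vec": [float(i)], "meta": {"lang": "fr" if i >= 8 else "en"}}
         for i in range(10)]
print([it["id"] for it in pre_filter_search([0.0], items, is_french, k=2)])   # [8, 9]
print([it["id"] for it in post_filter_search([0.0], items, is_french, k=2)])  # []
```

Post-filtering returns nothing here because the four overfetched candidates are all English, which is exactly the "fewer than k results" failure described above.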

### Hybrid search (sparse plus dense)

Hybrid search combines lexical retrieval (typically [BM25](/wiki/bm25) or SPLADE) with dense vector search. Dense embeddings capture semantic similarity but can miss exact-match cues such as product SKUs, person names, or rare technical terms. Lexical search excels at exact match but lacks semantic generalization. Combining the two consistently outperforms either alone.

The fusion step combines two ranked lists into one. The most popular method is **Reciprocal Rank Fusion (RRF)**, which scores a document as the sum of 1 / (k + rank_i) across the two retrievers, with k typically set to 60.[31] RRF is insensitive to score-scale differences and requires no tuning.[30] Other approaches include weighted score normalization, learned-to-rank rerankers, and convex combinations. Hybrid retrieval has become the default in production [RAG](/wiki/retrieval_augmented_generation_rag) systems through 2025 and 2026; one VentureBeat survey reported hybrid retrieval intent tripling among enterprise teams in Q1 2026.[44]

```text
RRF score for a document d:
RRF(d) = sum over retrievers i of 1 / (k + rank_i(d))
default k = 60
```
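The formula translates directly into code. This is a minimal sketch under two stated assumptions: ranks start at 1, and a document missing from one retriever's list simply gets no contribution from it. The function name and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["sku-123", "doc-7", "doc-2"]   # lexical retriever
dense_hits = ["doc-7", "doc-9", "sku-123"]  # vector retriever
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# ['doc-7', 'sku-123', 'doc-9', 'doc-2']
```

Because only ranks enter the formula, BM25 scores and cosine similarities never need to be put on a common scale.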

[Weaviate](/wiki/weaviate), [Milvus](/wiki/milvus), [Qdrant](/wiki/qdrant), [Elasticsearch](/wiki/elasticsearch), and [OpenSearch](/wiki/opensearch) support hybrid search natively. With [pgvector](/wiki/pgvector), hybrid search is typically assembled in SQL by combining vector similarity with PostgreSQL full-text search or ParadeDB's `pg_search` BM25 extension.

### Multi-vector and named vector support

A single record can carry multiple embeddings: one for the title, one for the body, one from a multilingual model, one from a multimodal model. Vector databases support this through named vectors ([Qdrant](/wiki/qdrant)), multi-vector fields ([Weaviate](/wiki/weaviate), [Pinecone](/wiki/pinecone)), or multiple vector columns and separate tables joined by foreign keys ([pgvector](/wiki/pgvector)). Multi-vector queries typically combine per-vector similarities via MaxSim, averaging, or weighted aggregation. ColBERT-style late-interaction models, which produce one vector per token, are increasingly supported as a special case.
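MaxSim, the aggregation used by ColBERT-style late interaction, can be sketched as follows: for each query token vector, take its best dot-product match among the document's token vectors, then sum those maxima. The function names are hypothetical and the vectors are toy values; `weighted_score` shows the simpler fixed-weight combination across named vectors.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_tokens, doc_tokens):
    """Late interaction: best match per query token vector, summed over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def weighted_score(similarities, weights):
    """Combine per-named-vector similarities (e.g. title and body) with fixed weights."""
    return sum(weights[name] * s for name, s in similarities.items())


query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(max_sim(query_tokens, doc_tokens))  # 0.9 + 0.8, about 1.7
print(weighted_score({"title": 0.5, "body": 0.25}, {"title": 2.0, "body": 4.0}))  # 2.0
```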

## Major vector database products

Several vector databases have emerged to cater to the growing demand for AI applications. The following table compares the most widely adopted systems as of early 2026.

| Feature | [Pinecone](/wiki/pinecone) | [Weaviate](/wiki/weaviate) | [Milvus](/wiki/milvus) | [Qdrant](/wiki/qdrant) | [Chroma](/wiki/chroma) | [pgvector](/wiki/pgvector) |
|---|---|---|---|---|---|---|
| **Type** | Fully managed (serverless) | Open-source + managed cloud | Open-source + managed (Zilliz) | Open-source + managed cloud | Open-source (embedded) | PostgreSQL extension |
| **Founded** | 2019 | 2019 (project from 2016) | 2019 (Zilliz 2017) | 2021 | 2022 | 2021 |
| **Founders** | Edo Liberty | Bob van Luijt, Etienne Dilocker, Micha Verhagen | Charles Xie | Andrey Vasnetsov, Andre Zayarni | Jeff Huber, Anton Troynikov | Andrew Kane |
| **License** | Proprietary | BSD-3-Clause | Apache 2.0 | Apache 2.0 | Apache 2.0 | PostgreSQL License |
| **Written In** | Proprietary (undisclosed) | Go | Go / C++ | Rust | Rust (rewritten 2025) | C |
| **Index Types** | Proprietary (auto-tuned) | HNSW | HNSW, IVF-Flat, IVF-PQ, DiskANN, GPU indexes | HNSW (with quantization) | HNSW | HNSW, IVFFlat |
| **Max Dimensions** | 20,000 | Configurable | 32,768 | 65,536 | Configurable | 16,000 stored; 2,000 indexed (4,000 with halfvec) |
| **Distance Metrics** | Cosine, Euclidean, Dot Product | Cosine, L2, Dot, Hamming, Manhattan | Cosine, L2, IP, Jaccard, Hamming | Cosine, Euclidean, Dot Product, Manhattan | Cosine, L2, IP | Cosine, L2, Inner Product |
| **Hybrid Search** | Sparse vectors (2024+) | Native BM25 + vector (BlockMax WAND) | Sparse-BM25 (Milvus 2.5+) | Named vector hybrid (v1.9+) | Metadata + full-text | SQL + vector via standard PostgreSQL |
| **Metadata Filtering** | Single-stage with bitmap indexes | GraphQL-based filtering | Boolean expressions | Rich JSON payload filtering | Metadata filtering | Standard SQL WHERE clauses |
| **Multi-Tenancy** | Namespaces | Native tenant isolation | Partitions / collections | Payload-based or collection-based | Collections | Schema-based (standard PostgreSQL) |
| **Cloud Regions** | AWS, GCP, Azure | AWS, GCP, Azure | Zilliz: AWS, GCP, Azure | AWS, GCP, Azure | Chroma Cloud (limited) | Any PostgreSQL host |
| **GitHub Stars (2026)** | N/A (closed source) | ~13,000 | ~38,000 | ~26,000 | ~22,000 | ~16,000 |
| **Best For** | Zero-ops managed vector search | Hybrid/semantic search with GraphQL | Billion-scale deployments | Cost-efficient filtered search | Rapid prototyping | Adding vectors to existing PostgreSQL |

### Pinecone

[Pinecone](/wiki/pinecone) was founded in 2019 by Edo Liberty, who previously led the Algorithms group at AWS AI Labs and worked at Yahoo Research. The company raised a USD 10 million seed in 2021,[8] a USD 28 million Series A in 2022, and a USD 100 million Series B in April 2023 led by Andreessen Horowitz at a USD 750 million valuation, bringing total disclosed funding to roughly USD 138 million.[9] Announcing the Series B, Liberty framed the product as a new category of infrastructure: "We created Pinecone and the vector database category as a whole to let all AI developers easily work with a scalable and cost efficient database for this workload."[9] Pinecone uses proprietary, auto-tuned indexing and offers single-stage metadata filtering with bitmap indexes, namespace-based multi-tenancy, and scale-to-zero. In 2024 it replaced pod-based pricing with a serverless model, and in 2025 introduced sparse vector support for hybrid search. In September 2025, founder Edo Liberty became chief scientist, and former Google director Ash Ashutosh became CEO.[10][42] The Information reported that Pinecone, last valued at USD 750 million, had spoken to investment bankers about a potential sale and had received takeover interest, with speculation about a valuation north of USD 2 billion.[43]

### Weaviate

[Weaviate](/wiki/weaviate) is an open-source vector database written in Go. The Weaviate project was started in March 2016 by Bob van Luijt, who co-founded SeMI Technologies (short for Semantic Machine Insights) in 2019 with Etienne Dilocker and Micha Verhagen; the company later renamed itself Weaviate B.V.[11] SeMI raised a USD 16 million Series A in February 2022 led by Cortical Ventures and a USD 50 million Series B in April 2023 led by Index Ventures with Battery Ventures, NEA, Zetta Venture Partners, and ING Ventures participating.[12] Weaviate provides a GraphQL API, native BM25 plus vector hybrid search using BlockMax WAND, and built-in vectorization modules for [OpenAI](/wiki/openai), [Cohere](/wiki/cohere), and [Hugging Face](/wiki/hugging_face). Weaviate Cloud offers managed hosting across AWS, GCP, and Azure.

### Milvus

[Milvus](/wiki/milvus) is an open-source, cloud-native vector database under the Apache 2.0 license, written primarily in Go and C++. It was created by Charles Xie, who founded [Zilliz](/wiki/zilliz) in 2017 and previously worked at Oracle as a founding engineer of the Oracle 12c project. Milvus was first released in 2019 and donated to the LF AI and Data Foundation in 2020. Zilliz raised USD 43 million in its initial Series B and an additional USD 60 million Series B extension in August 2022 led by Prosperity7 Ventures, with Pavilion Capital, Hillhouse Capital, 5Y Capital, and Yunqi Capital participating, bringing total funding to roughly USD 113 million.[16] Milvus has a distributed architecture that separates compute and storage and supports the widest range of index types, including HNSW, IVF-Flat, IVF-PQ, DiskANN, and GPU-accelerated indexes. Milvus 2.5 introduced native sparse-BM25 hybrid search. Zilliz Cloud provides managed versions with serverless and dedicated deployment options.

### Qdrant

[Qdrant](/wiki/qdrant) is an open-source vector database written in Rust, founded in 2021 in Berlin by Andrey Vasnetsov (CTO) and Andre Zayarni (CEO).[14] The company raised an early USD 7.5 million seed round and a USD 28 million Series A in January 2024 led by Spark Capital with participation from Unusual Ventures and 42CAP.[13] Qdrant is known for its compact memory footprint and strong filtering performance.

Qdrant uses HNSW indexing with built-in scalar, product, and binary quantization options. Its payload-based filtering system supports rich JSON queries and can be combined with vector search in a single request. Qdrant v1.9 introduced named vector hybrid search, and 2025 releases added GPU-accelerated index construction. The Qdrant Cloud managed service offers a permanent free tier with 1 GB of vector storage.
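
Scalar quantization, one of the options listed above, maps each 32-bit float to an 8-bit integer code, which cuts vector storage by roughly 4x at the cost of a small reconstruction error. The sketch below is a simplified illustration, not Qdrant's implementation. It assumes a per-vector min/max range, whereas production engines typically calibrate the range across the whole collection (for example from quantiles). The function names are hypothetical.

```python
from typing import List, Tuple


def scalar_quantize(vec: List[float]) -> Tuple[List[int], float, float]:
    """Encode floats as int8 codes in [-128, 127] using this vector's own min/max.

    Simplified per-vector calibration (an assumption); real engines usually
    calibrate the range over the whole collection.
    """
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # guard against constant vectors (hi == lo)
    codes = [round((x - lo) / scale) - 128 for x in vec]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Approximately reconstruct the original floats from the int8 codes."""
    return [(c + 128) * scale + lo for c in codes]


vec = [0.12, -0.5, 0.33, 0.9, -0.07]
codes, lo, scale = scalar_quantize(vec)
restored = dequantize(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
print(codes, max_error)
```

Each reconstructed value is off by at most half a quantization step (`scale / 2`). Engines typically offset that loss by re-scoring the top candidates with the original float vectors.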

### Chroma

[Chroma](/wiki/chroma) is an open-source, embedded vector database designed for rapid prototyping and small-to-medium applications. It was founded on April 1, 2022 by Jeff Huber (CEO) and Anton Troynikov (CTO) in San Francisco. The company raised a USD 2.3 million pre-seed in May 2022 and a USD 18 million seed round led by Quiet Capital with participation from Naval Ravikant and others.[17] The first GitHub commit landed in October 2022, and the initial release shipped on October 22, 2022.

Originally written in Python, Chroma underwent a major rewrite in Rust during 2025, delivering approximately 4x faster writes and queries. Chroma provides a simple, NumPy-like API with built-in metadata filtering and full-text search. It is commonly used in local development environments and as an embedded store in [LangChain](/wiki/langchain) and [LlamaIndex](/wiki/llamaindex) workflows. Chroma Cloud launched in 2025 with a credit-based pricing model.

### pgvector

[pgvector](/wiki/pgvector) is an open-source PostgreSQL extension authored by Andrew Kane and first released as version 0.1.0 on April 20, 2021.[18] It adds vector similarity search capabilities to any PostgreSQL database. pgvector supports HNSW (added in version 0.5.0, August 2023) and IVFFlat indexes, with distance metrics including cosine, L2, and inner product.[19] The companion pgvectorscale extension (developed by Timescale, now Tiger Data) adds DiskANN-based indexing and Statistical Binary Quantization, enabling competitive performance on datasets of up to 50 million vectors.

pgvector supports standard vectors, half-precision vectors (halfvec, up to 4,000 indexed dimensions), sparse vectors (sparsevec), and binary vectors (bit type, up to 64,000 indexed dimensions).[18] Because pgvector runs inside PostgreSQL, developers can store vectors alongside relational data and query both with standard SQL, eliminating the need for a separate vector database in many use cases. pgvector is available as a supported extension in managed PostgreSQL offerings from AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL, [Supabase](/wiki/supabase), [Neon](/wiki/neon), and Tiger Data.
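
pgvector exposes its distance metrics as SQL operators: `<->` for L2 distance, `<#>` for negative inner product, and `<=>` for cosine distance. A nearest-neighbor query therefore reads like `SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;`. The inner product is negated because PostgreSQL index scans on operators run in ascending order, so smaller must mean more similar.

The Python sketch below only mirrors what each operator computes, to make the ordering semantics concrete. It does not connect to PostgreSQL, and the helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """What pgvector's `<->` operator computes."""
    return math.dist(a, b)


def negative_inner_product(a: Vector, b: Vector) -> float:
    """What pgvector's `<#>` operator computes (negated so smaller = more similar)."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """What pgvector's `<=>` operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))


def order_by_limit(rows: List[Vector], query: Vector,
                   distance: Callable[[Vector, Vector], float], k: int) -> List[int]:
    """Emulate `ORDER BY embedding <op> query LIMIT k`, returning row indexes."""
    return sorted(range(len(rows)), key=lambda i: distance(rows[i], query))[:k]


rows = [[1, 2, 3], [4, 5, 6], [3, 1, 2]]
query = [3, 1, 2]
print(order_by_limit(rows, query, l2_distance, k=2))  # [2, 0]
```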

### Faiss

[Faiss](/wiki/faiss) (Facebook AI Similarity Search) is an open-source library released by [Meta](/wiki/meta) AI Research on March 29, 2017.[20] Hervé Jégou initiated the project and wrote the first implementation, with Matthijs Douze implementing most of the CPU code and Jeff Johnson contributing the GPU implementation. Announcing the release, Meta wrote: "We've built nearest-neighbor search implementations for billion-scale data sets that are some 8.5x faster than the previous reported state-of-the-art."[20] Faiss is a library rather than a standalone database; it provides reference implementations of IVF, PQ, IVF-PQ, OPQ, HNSW, and many composite indexes.[7] It is the algorithmic foundation for several commercial systems, including parts of Pinecone and Milvus.

### Vespa

[Vespa](/wiki/vespa) is an open-source big-data serving engine developed at Yahoo and open-sourced in September 2017 under the Apache 2.0 license.[21] Yahoo had built Vespa from technology acquired with Overture and the AlltheWeb search engine in the mid-2000s. Jon Bratseth, the distinguished architect who led Vespa's development, became CEO when Yahoo spun Vespa out as an independent company in October 2023, and Vespa.ai raised a USD 31 million Series A from Blossom Capital in November 2023.[22] Vespa supports HNSW, learned ranking models, lexical search, and tensor evaluation in the same engine, and is used by Spotify, OkCupid, and Wix, among others.

### LanceDB

[LanceDB](/wiki/lancedb) is an open-source, serverless vector database founded in 2022 by Chang She and Lei Xu. Chang She is one of the original co-authors of the [pandas](/wiki/pandas) library, and Lei Xu was a core HDFS contributor and led ML infrastructure at Cruise. LanceDB is built on the Lance columnar file format, an alternative to Parquet optimized for ML workloads. The company raised approximately USD 41 million across pre-seed, seed, and Series A rounds, and lists Midjourney among its production customers.[23] LanceDB targets multimodal workloads where image, text, and metadata vectors live alongside source media, with object storage as the primary persistence tier.

### Turbopuffer

[Turbopuffer](/wiki/turbopuffer) is a serverless vector and full-text search engine founded in 2023 by Simon Hørup Eskildsen and Justin Li, both former Shopify engineers. Turbopuffer stores its primary copy of data in object storage such as Amazon S3 and layers SSD and memory caches on top, achieving roughly 10x lower storage cost than RAM-resident vector databases.[24] The company is backed by Lachy Groom, Thrive Capital, and others. As of 2025 and 2026, Turbopuffer reported powering search at Cursor, Notion, Linear, and other large AI-native applications, handling more than 2.5 trillion documents, 10 million writes per second, and over 10,000 queries per second in production.[24]

### MongoDB Atlas Vector Search

MongoDB Atlas Vector Search reached general availability on December 4, 2023, after a public preview launched in June 2023.[25] It allows MongoDB users to add vector indexes to existing collections and run `$vectorSearch` aggregation pipelines that combine vector similarity with MongoDB's existing query operators and aggregation framework. By late 2024, MongoDB reported Atlas Vector Search had become one of the most widely deployed vector solutions, in part because existing MongoDB users could adopt vector search without standing up a new database.

### Elasticsearch and OpenSearch

[Elasticsearch](/wiki/elasticsearch) added a `dense_vector` field type in 7.x and introduced HNSW-based ANN search in version 8.0 (February 2022).[26] The 9.0 release in 2025 made `int8_hnsw` the default, and the engine now also supports BBQ (Better Binary Quantization) for high-dimensional embeddings. [OpenSearch](/wiki/opensearch), the AWS-led fork of Elasticsearch, ships an analogous `knn_vector` type with HNSW (via [hnswlib](/wiki/hnswlib)), Faiss, and Lucene-native back ends, plus first-class support for Reciprocal Rank Fusion.[30] Both engines are popular for teams that already operate an inverted-index search cluster and want to add vectors without introducing a new system.
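
Reciprocal Rank Fusion, mentioned above for OpenSearch, merges ranked lists using only rank positions. Each document's score is the sum of `1 / (k + rank)` over every list it appears in, and `k = 60` is the constant proposed in the original RRF paper. The sketch below is a generic illustration, not the Elasticsearch or OpenSearch implementation. Breaking ties by document ID is an arbitrary choice made here for determinism.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]    # hypothetical lexical results
vector_hits = ["c", "a", "d"]  # hypothetical ANN results
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # ['a', 'c', 'b', 'd']
```

Because RRF ignores raw scores, it needs no calibration between BM25 scores and vector distances, which live on incomparable scales. This is a large part of why it is the usual default for hybrid search.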

### Redis

Redis added vector similarity search in RediSearch 2.4, announced at RedisDays NY 2022.[27] The module supports both flat (brute force) and HNSW indexes, with cosine, inner product, and Euclidean distances.[27] Redis vector search is commonly used as a low-latency cache for embeddings or as an in-memory primary store for small-to-medium corpora, with Redis Enterprise Cloud and Azure Cache for Redis Enterprise offering managed deployments.

### Other notable systems

| System | Type | Notes |
|---|---|---|
| Annoy | Library | Random-projection forest, by Erik Bernhardsson at Spotify; widely used for read-heavy use cases |
| ScaNN | Library | Google's open-source ANN library; underlies Vertex AI Vector Search |
| Snowflake | Data warehouse | VECTOR data type and VECTOR_COSINE_SIMILARITY function (2024) |
| Databricks Vector Search | Lakehouse | Unity Catalog-aware vector index over Delta tables |
| Oracle Database 23ai | Relational DB | Native VECTOR data type and AI Vector Search (2024) |
| SQL Server | Relational DB | VECTOR_DISTANCE function and ANN indexes (2025) |
| Amazon S3 Vectors | Object store | S3-native vector index and query API (2025) |

## Pricing comparison

Vector database pricing varies significantly depending on whether the service is fully managed, self-hosted, or serverless. The following table summarizes pricing as of early 2026.[45][46]

| Database | Free Tier | Paid Starting Price | Pricing Model | Estimated Cost at Scale (1B vectors, 100 QPS) |
|---|---|---|---|---|
| [Pinecone](/wiki/pinecone) | Starter plan (limited) | $0.33/GB storage + $8.25/1M read units + $2.00/1M write units | Serverless consumption-based | ~$3,500/month (fully managed) |
| [Weaviate](/wiki/weaviate) | 14-day trial | From $25/month | AIU-based or serverless consumption | ~$2,200/month (managed); ~$800/month (self-hosted + ops overhead) |
| [Milvus](/wiki/milvus) / Zilliz | Zilliz free plan (up to 5 GB) | Zilliz serverless from ~$89/month; dedicated from ~$114/month | CU-based (~$0.15/CU/hour) | Varies by configuration (self-hosted: infrastructure cost only) |
| [Qdrant](/wiki/qdrant) | 1 GB free forever (no credit card) | From $25/month; Hybrid Cloud from $99/month | Storage + compute based | Competitive; lower than Pinecone at equivalent scale |
| [Chroma](/wiki/chroma) | Open-source (free); Chroma Cloud $5 free credits | Usage-based | Credit-based | Primarily for small/medium workloads |
| [pgvector](/wiki/pgvector) | Free (open-source extension) | Infrastructure cost only | N/A (runs on your PostgreSQL instance) | Infrastructure cost only; ~25% the cost of Pinecone at equivalent recall |
| [Turbopuffer](/wiki/turbopuffer) | Limited free tier | Usage-based on object storage | Storage + query consumption | ~10x cheaper than RAM-resident managed services |
| MongoDB Atlas Vector Search | Atlas free tier | From dedicated cluster cost (~$57/month) | Atlas cluster pricing + dedicated search nodes | Varies with workload |
| Elasticsearch / OpenSearch | Self-hosted free | From ~$95/month managed | Cluster size + storage | Higher than dedicated vector DBs at equivalent recall |

Usage-based cloud pricing can become volatile at scale. A [RAG](/wiki/retrieval_augmented_generation_rag) application handling 150 million queries per month could incur USD 5,000 to USD 6,000 in monthly costs on a serverless platform, prompting some organizations to evaluate reserved capacity or self-hosted alternatives. Object-storage-backed engines such as Turbopuffer and Amazon S3 Vectors target this exact pain point by trading a small amount of latency for a much lower price per stored vector.

## Performance benchmarks

Vector database benchmarks measure queries per second (QPS), recall (the fraction of true nearest neighbors found), and latency (typically p50, p95, or p99).[32] Results depend heavily on dataset size, vector dimensionality, hardware, and precision thresholds, so benchmarks should be compared only when similar configurations are used.[33]
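
Recall is usually computed against exact (brute-force) nearest neighbors. It is the fraction of the true top-k results that the approximate index also returned. Below is a minimal sketch, assuming Euclidean distance and small in-memory data. The function names and the "ANN output" are illustrative.

```python
import math
from typing import List, Sequence


def exact_knn(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[int]:
    """Brute-force ground truth: indexes of the k closest vectors by L2 distance."""
    return sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))[:k]


def recall_at_k(approx: List[int], truth: List[int], k: int) -> float:
    """Fraction of the true top-k neighbors present in the approximate top-k."""
    true_set = set(truth[:k])
    return len(true_set.intersection(approx[:k])) / len(true_set)


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
truth = exact_knn([0.1, 0.1], vectors, k=2)  # [0, 1]
approx = [0, 3]                              # pretend ANN output that missed index 1
print(recall_at_k(approx, truth, k=2))       # 0.5
```

In practice recall is averaged over a held-out set of queries and reported alongside QPS and latency at the same index settings.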

### Benchmarking suites

| Benchmark | Maintainer | Scope | Notes |
|---|---|---|---|
| ANN-Benchmarks | Erik Bernhardsson, Martin Aumüller | Library-level | Plots recall vs QPS on standard datasets like SIFT-1M, GIST-1M, glove-100; widely cited |
| Big-ANN-Benchmarks | NeurIPS billion-scale challenge | Billion-scale | Tracks T1 (in-memory), T2 (out-of-memory), T3 (custom hardware) |
| VectorDBBench | Zilliz, open source | Database-level | End-to-end benchmark across managed and self-hosted vector DBs |
| ann-benchmarks-private | Vendor-run | Database-level | Used in marketing claims; reproducibility varies |
| BEIR | Information retrieval benchmark | Retrieval quality | Measures end-to-end retrieval, including embeddings |
| MTEB | Massive Text Embedding Benchmark | Embedding quality | Measures embedding model performance on retrieval, classification, clustering |

### Representative benchmark results

| Database | Dataset | Vectors | Dimensions | QPS | Recall | Latency (percentile as reported) | Source |
|---|---|---|---|---|---|---|---|
| Qdrant | dbpedia-openai-1M | 1M | 1,536 | 626 | 99% | ~1 ms (p99) | Qdrant Benchmarks |
| pgvectorscale | Custom 50M | 50M | 768 | 471 | 99% | Low (unspecified) | Timescale Benchmarks |
| Qdrant | Custom 50M | 50M | 768 | 41 | 99% | Higher | Timescale Benchmarks |
| Milvus / Zilliz | 768-dim text embeddings | Varies | 768 | High | 99% | <10 ms (p50) | VectorDBBench |
| Pinecone | 768-dim text embeddings | Varies | 768 | High | 99% | ~7 ms (p99) | Pinecone Documentation |
| Weaviate | 768-dim text embeddings | Varies | 768 | Moderate | 95%+ | ~50 ms | VectorDBBench |
| DiskANN | SIFT1B | 1B | 128 | >5,000 | 95%+ (1-recall@1) | <3 ms (mean) | NeurIPS 2019 paper |
| SPANN | SIFT1B | 1B | 128 | High | 90% (recall@1, recall@10) | ~1 ms | NeurIPS 2021 paper |

Key observations from recent benchmarks:

- **Milvus / Zilliz Cloud** consistently leads in low-latency scenarios, with sub-10 ms p50 latency on standard embedding workloads.
- **Qdrant** achieves the highest requests per second on smaller datasets (up to a few million vectors) and offers strong performance with filtering.[38]
- **pgvectorscale** has demonstrated surprisingly competitive throughput (471 QPS at 99% recall on 50 million vectors), outperforming some purpose-built vector databases.
- **Pinecone** maintains consistent low latency (7 ms p99) across varying loads thanks to its serverless auto-scaling architecture.
- **Weaviate** provides solid hybrid search performance but requires more memory and compute than alternatives at very large scale.
- **DiskANN and SPANN** continue to set the bar for billion-scale workloads on a single node, and both are now integrated into commercial systems.[3][5]

## Embedding models commonly paired with vector databases

A vector database is only as useful as the embeddings it stores. Most production systems pair a vector database with a managed or open-source embedding model. The most widely used models as of 2026 include:

| Model | Provider | Dimensions | Context | Notes |
|---|---|---|---|---|
| text-embedding-3-large | [OpenAI](/wiki/openai) | up to 3,072 (Matryoshka) | 8K tokens | Default OpenAI text embedding; supports dimension truncation |
| text-embedding-3-small | OpenAI | up to 1,536 (Matryoshka) | 8K tokens | Cheaper sibling of text-embedding-3-large |
| voyage-3-large | [Voyage AI](/wiki/voyage_ai) | 1,024 (Matryoshka, with int8/binary variants) | 32K tokens | Outperforms text-embedding-3-large by ~10% on Voyage's 100-dataset benchmark |
| embed-english-v3.0 | [Cohere](/wiki/cohere) | 1,024 | 512 tokens | Strong for English RAG; binary and int8 variants available |
| embed-multilingual-v3.0 | Cohere | 1,024 | 512 tokens | 100+ languages |
| all-MiniLM-L6-v2 | [Sentence Transformers](/wiki/sentence_transformers) | 384 | 256 tokens | Open weights; common baseline |
| BGE-M3 | BAAI | 1,024 | 8K tokens | Open weights; strong multilingual performance |
| nomic-embed-text-v1.5 | Nomic | 768 (Matryoshka) | 8K tokens | Open weights; popular for self-hosted RAG |
| Gemini text-embedding | [Google](/wiki/google) | up to 3,072 | 2K tokens | Tightly integrated with Vertex AI Vector Search |

Matryoshka representation learning, used by OpenAI, Voyage AI, and Nomic, encodes the most semantically important information into the first dimensions of each vector.[28] Truncating a 3,072-dimensional embedding to 256 dimensions preserves most of its quality, turning dimension count into a runtime knob that lets teams trade storage cost for retrieval quality.[29]
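
When a Matryoshka embedding is truncated by hand, the shortened vector should be re-normalized before cosine or dot-product search, because dropping dimensions changes its length. OpenAI's embeddings API also accepts a `dimensions` parameter for the text-embedding-3 models that shortens the vector server-side. Below is a minimal client-side sketch with a made-up input vector.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` coordinates and rescale the result to unit length."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


full = [0.6, 0.0, 0.8, 0.0, 0.0, 0.0]  # hypothetical 6-dim, unit-length embedding
short = truncate_embedding(full, 2)    # [1.0, 0.0]
print(short)
```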

## Vector databases in retrieval-augmented generation (RAG)

One of the most significant applications of vector databases is [retrieval-augmented generation](/wiki/retrieval_augmented_generation_rag) (RAG), an architecture that combines external knowledge retrieval with [large language models](/wiki/large_language_model) to produce more accurate and grounded responses.

### How RAG works with vector databases

A RAG pipeline operates through four stages:

1. **Ingestion.** Documents, web pages, or other knowledge sources are split into chunks and passed through an [embedding model](/wiki/embedding_model) (such as OpenAI's text-embedding-3-large or [Cohere](/wiki/cohere)'s embed-v3) to produce vector representations.[28] These vectors are stored in a vector database along with the original text as metadata.
2. **Retrieval.** When a user submits a query, the query is embedded using the same model. The vector database performs a similarity search to find the most relevant document chunks, typically returning the top-k results. Filters and hybrid scoring may be applied at this stage. A re-ranking model (such as Cohere Rerank or a cross-encoder) is often run on the retrieved candidates to improve relevance before augmentation.
3. **Augmentation.** The retrieved chunks are inserted into the LLM's prompt as context, along with the user's original question. Many systems also pass system instructions, formatting rules, and citation templates.
4. **Generation.** The LLM generates a response grounded in the retrieved context, reducing [hallucination](/wiki/hallucination) and improving factual accuracy.
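
The four stages can be sketched end to end in a few lines. This is a toy with several stand-ins:

- `embed` is a hypothetical substitute for a real embedding model (an L2-normalized bag of words stored as a sparse dict).
- The in-memory list stands in for the vector database.
- The generation step stops at the prompt that would be sent to an LLM.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Hypothetical embedder: L2-normalized bag-of-words as a sparse vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def dot(a: Dict[str, float], b: Dict[str, float]) -> float:
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class InMemoryVectorStore:
    """Stand-in for a vector database: stores (vector, text) pairs."""

    def __init__(self) -> None:
        self.rows: List[Tuple[Dict[str, float], str]] = []

    def add(self, text: str) -> None:  # 1. ingestion
        self.rows.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> List[str]:  # 2. retrieval
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: dot(q, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question: str, chunks: List[str]) -> str:  # 3. augmentation
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = InMemoryVectorStore()
for chunk in [
    "Qdrant is written in Rust.",
    "Milvus is written in Go and C++.",
    "Chroma was rewritten in Rust in 2025.",
]:
    store.add(chunk)

question = "Which database is written in Go?"
prompt = build_prompt(question, store.search(question, k=1))
print(prompt)
# 4. generation: send `prompt` to an LLM of your choice (not shown).
```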

### Why vector databases are essential for RAG

RAG applications demand sub-100 ms retrieval latency to avoid adding perceptible delay to the LLM's response time. Vector databases are built for exactly this workload, with in-memory indexes, SIMD-accelerated distance calculations, and distributed architectures that scale horizontally. Keyword matching, as performed by traditional database queries and BM25-based search engines, cannot capture the semantic meaning of a query the way vector similarity search does.

### Advanced RAG techniques (2025-2026)

Several advanced techniques have emerged to improve RAG quality:

- **Hybrid search.** Combining BM25 keyword search with vector similarity search consistently outperforms either method alone.[44] Reciprocal Rank Fusion is the default fusion method, and most major vector databases now support it natively.[31]
- **Query rewriting.** Multi-query expansion (asking the LLM to generate paraphrases of the user query) and HyDE (Hypothetical Document Embeddings, where the LLM hallucinates a likely answer that is then embedded and used as the query) both materially improve recall.
- **Re-ranking.** A cross-encoder re-ranker scores the top 50 to 200 retrieved chunks against the query and re-orders them before generation. Cohere Rerank and BGE Reranker are widely used in production.
- **Self-RAG.** This technique trains models to decide when retrieval is necessary and to critique their own outputs, improving factuality and citation accuracy.
- **Multimodal RAG.** New embedding families (such as voyage-multimodal-3) unify text and images into a shared vector space, enabling retrieval across modalities. This is useful for technical manuals with diagrams or scanned forms.
- **Graph-enhanced retrieval.** Combining vector search with [knowledge graphs](/wiki/knowledge_graph) to capture relationships between entities, not just semantic similarity. This approach is gaining traction in 2026 as a way to improve retrieval for complex, multi-hop questions.
- **Agentic memory.** In [AI agent](/wiki/ai_agents) systems, vector databases serve as long-term memory stores, allowing agents to recall previous interactions and context across sessions.
- **ColBERT and late interaction.** Token-level embedding models such as ColBERTv2 store one vector per token and aggregate similarity at query time using maxSim. Several vector databases (Vespa, Qdrant, LanceDB) support this pattern through multi-vector or multi-tensor fields.
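
Late-interaction scoring can be written compactly. For every query token vector, take its best dot-product match among the document's token vectors, then sum those maxima. Below is a minimal sketch of the MaxSim operator as described for ColBERT. It assumes token vectors are already unit-normalized, so dot product equals cosine similarity. The toy 2-dimensional vectors are invented.

```python
from typing import List, Sequence

TokenVectors = List[Sequence[float]]


def maxsim(query_tokens: TokenVectors, doc_tokens: TokenVectors) -> float:
    """ColBERT-style late interaction: sum over query tokens of the best-matching doc token."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_tokens)
        for qv in query_tokens
    )


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.8, 0.6]]
print(maxsim(query, doc_a), maxsim(query, doc_b))
```

Storing one vector per token multiplies index size by the average chunk length. Engines that support the pattern therefore often pair it with compression or use it only as a re-ranking stage.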

## Use cases

Vector databases support a broad set of applications across industries.

### Semantic search

Unlike lexical search, which relies on exact word or string matches, semantic search uses the meaning and context of a search query or question. Embeddings produced by [Natural Language Processing](/wiki/natural_language_processing) [models](/wiki/models) are stored and indexed in the vector database, allowing for more accurate and relevant search results. E-commerce, support knowledge bases, intranet search, and code search all rely on semantic retrieval at scale.

### Similarity search for unstructured data

Vector databases facilitate the search and retrieval of unstructured data like images, [audio](/wiki/audio), video, and JSON, which can be challenging to classify and store in traditional databases. Image-similarity search powers reverse-image lookup, content moderation, fashion recommendation, and visual product search.

### Recommendation systems

By finding similar items or users based on nearest matches in a learned embedding space, vector databases power recommendation engines for online retailers and streaming media services. Two-tower models, where users and items each have their own embedding tower, are typically deployed by computing user vectors at request time and looking up nearest items in a vector index.

### Deduplication and record matching

Vector similarity search can be used to find near-duplicate records for applications such as removing duplicate items from a catalog, deduplicating user-generated content, and matching customer records across systems (entity resolution).

### Anomaly detection

Vector databases can identify anomalies in applications used for threat assessment, fraud detection, and IT operations by finding objects that are distant or dissimilar from expected results. Time-series embeddings, log embeddings, and network flow embeddings are all routinely indexed to detect outliers.
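
One simple way to score outliers with nearest-neighbor search is to use the distance from each point to its k-th nearest neighbor: points far from everything else get large scores. The sketch uses brute-force distances on a handful of invented 2-D points. A vector database would answer the same k-NN lookups with an ANN index, and any alerting threshold would be application-specific.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(points: List[Sequence[float]], k: int = 2) -> List[float]:
    """Score each point by its distance to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
scores = knn_outlier_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(most_anomalous)  # 4, the point at (10, 10)
```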

### Code search and developer tools

[GitHub Copilot](/wiki/github_copilot), [Cursor](/wiki/cursor), [Sourcegraph Cody](/wiki/sourcegraph), and other AI coding tools embed code and documentation into vector indexes to support semantic code search, repository-aware completion, and chat over codebases. Cursor and Notion are both reported users of [Turbopuffer](/wiki/turbopuffer).[24]

## Using vector databases with large language models

One of the primary reasons for the increasing popularity of vector databases is their ability to extend large language models (LLMs) with long-term memory. Users can pair a general-purpose model, such as [OpenAI](/wiki/openai)'s [GPT-4](/wiki/gpt-4), [Anthropic](/wiki/anthropic)'s [Claude](/wiki/claude), [Meta](/wiki/meta)'s [LLaMA](/wiki/llama), or [Google](/wiki/google)'s [Gemini](/wiki/gemini), with a vector database that holds their own data. When the model is [prompted](/wiki/prompt), the application queries the database for relevant documents and adds them to the model's context, customizing the final response and giving the AI a form of long-term memory.

In addition, vector databases integrate with orchestration frameworks like [LangChain](/wiki/langchain), [LlamaIndex](/wiki/llamaindex), [Haystack](/wiki/haystack), and [Semantic Kernel](/wiki/semantic_kernel), which combine multiple LLMs, retrievers, and tools into multi-step pipelines. These frameworks abstract over individual vector databases and let teams swap engines without changing the application code.

## Features of a vector database

### Vector indexes for search and retrieval

Vector databases employ algorithms to index and retrieve vectors efficiently. Accuracy, latency, or memory usage may need to be prioritized depending on specific use cases. Common similarity and distance metrics used in vector indexes are Euclidean distance, cosine similarity, and dot products.
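
The three metrics are closely related. For unit-length vectors, cosine similarity equals the dot product, and squared Euclidean distance equals `2 - 2 * cosine similarity`. As a result, all three produce the same ranking once embeddings are normalized, which is why many engines normalize at ingest. Below is a small numerical check with invented vectors.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])
euclid_sq = math.dist(a, b) ** 2

# For unit vectors: cosine == dot, and ||a - b||^2 == 2 - 2 * cos(a, b)
assert math.isclose(cosine_similarity(a, b), dot(a, b))
assert math.isclose(euclid_sq, 2 - 2 * dot(a, b))
print(dot(a, b), euclid_sq)
```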

Approximate Nearest Neighbor ([ANN](/wiki/ann)) search is a popular technique to balance precision and performance. ANN algorithms, such as HNSW, IVF, or PQ, focus on improving specific performance properties like memory reduction or fast and accurate search times. Composite indexes combine several components and are often used to achieve optimal performance for a given use case.
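
To make the IVF idea concrete: vectors are clustered around a few centroids (the inverted lists), and a query scans only the `nprobe` lists whose centroids are closest, trading recall for speed. This is a teaching sketch in pure Python under simplifying assumptions (plain k-means, Euclidean distance, no compression), not how Faiss or any engine implements it. `n_lists` is a hypothetical parameter name, while `nprobe` mirrors the name Faiss uses.

```python
import math
import random
from typing import List, Sequence

Vec = Sequence[float]


def nearest_centroid(centroids: List[List[float]], v: Vec) -> int:
    return min(range(len(centroids)), key=lambda c: math.dist(centroids[c], v))


def kmeans(vectors: List[Vec], n_lists: int, iters: int = 10, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means used as the coarse quantizer."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(iters):
        buckets: List[List[Vec]] = [[] for _ in range(n_lists)]
        for v in vectors:
            buckets[nearest_centroid(centroids, v)].append(v)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the previous centroid if a list ends up empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


class ToyIVFIndex:
    def __init__(self, vectors: List[Vec], n_lists: int = 4) -> None:
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_lists)
        self.lists: List[List[int]] = [[] for _ in range(n_lists)]
        for i, v in enumerate(vectors):
            self.lists[nearest_centroid(self.centroids, v)].append(i)

    def search(self, query: Vec, k: int = 5, nprobe: int = 1) -> List[int]:
        # Rank centroids by distance and scan only the nprobe closest lists.
        order = sorted(range(len(self.centroids)),
                       key=lambda c: math.dist(self.centroids[c], query))
        candidates = [i for c in order[:nprobe] for i in self.lists[c]]
        return sorted(candidates, key=lambda i: math.dist(self.vectors[i], query))[:k]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
index = ToyIVFIndex(data, n_lists=8)
query = [0.5] * 8
fast = index.search(query, k=5, nprobe=1)   # scans roughly 1/8 of the data
exact = index.search(query, k=5, nprobe=8)  # probing every list equals brute force
print(fast, exact)
```

Raising `nprobe` moves the result toward exact search at proportionally higher cost. Probing every list, as in the last call, degenerates to a full scan.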

### Single-stage filtering

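Single-stage filtering is essential for effective vector databases, as it enables users to limit search results based on vector metadata. Pre-filtering applies the metadata filter first and then searches only the matching vectors, which is accurate but can degrade to slow brute-force search. Post-filtering runs the vector search first and discards non-matching hits, which is fast but can return fewer results than requested. Single-stage filtering combines the accuracy of pre-filtering with the speed of post-filtering by merging vector and metadata indexes into a single index.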
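
The difference between the two naive strategies shows up with a selective filter. The toy example below uses invented 1-D vectors and a hypothetical `lang` metadata field:

- Post-filtering takes the top-k by distance and then drops non-matching hits, so it returns nothing.
- Pre-filtering searches only the matching vectors and returns a full k results.

Single-stage filtering aims to return the pre-filtering result at close to post-filtering speed. That engine-internal mechanism is not shown.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence

Vectors = List[Sequence[float]]
Meta = List[Dict[str, str]]
Predicate = Callable[[Dict[str, str]], bool]


def knn(query: Sequence[float], vectors: Vectors, ids: Iterable[int], k: int) -> List[int]:
    """Exact k-nearest neighbors restricted to the given candidate ids."""
    return sorted(ids, key=lambda i: math.dist(query, vectors[i]))[:k]


def post_filter_search(query: Sequence[float], vectors: Vectors, meta: Meta,
                       predicate: Predicate, k: int) -> List[int]:
    hits = knn(query, vectors, range(len(vectors)), k)  # vector search first...
    return [i for i in hits if predicate(meta[i])]      # ...then drop non-matching hits


def pre_filter_search(query: Sequence[float], vectors: Vectors, meta: Meta,
                      predicate: Predicate, k: int) -> List[int]:
    allowed = [i for i in range(len(vectors)) if predicate(meta[i])]  # filter first...
    return knn(query, vectors, allowed, k)                            # ...then search the subset


def is_german(m: Dict[str, str]) -> bool:
    return m["lang"] == "de"


vectors = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
meta = [{"lang": "en"}] * 3 + [{"lang": "de"}] * 3

print(post_filter_search([0.0], vectors, meta, is_german, k=3))  # []
print(pre_filter_search([0.0], vectors, meta, is_german, k=3))   # [3, 4, 5]
```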

### API and observability

REST and gRPC APIs allow vector databases to be accessed from any environment capable of making network calls, with native clients in Python, Java, Go, JavaScript, and Rust. Production deployments expose Prometheus or OpenTelemetry metrics, structured logs, and audit trails. Multi-tenancy support typically includes per-collection access control, namespace-level encryption, and customer-managed keys (BYOK) on managed services. SOC 2, ISO 27001, HIPAA, and GDPR-aligned operations are common requirements for enterprise deployments.

## Comparison with traditional databases

| Aspect | Relational DB | Document DB | Search Engine | Vector Database |
|---|---|---|---|---|
| Primary data unit | Row | Document | Document with inverted index | Vector with payload |
| Primary query | SELECT with WHERE | find with predicate | match query (BM25) | k-nearest-neighbor |
| Index type | B-tree, hash | B-tree, geo | Inverted index | HNSW, IVF, PQ, DiskANN |
| Distance / scoring | Equality, ranges | Predicate match | TF-IDF, BM25 | Cosine, L2, dot product |
| Schema | Strict | Flexible | Mapped fields | Vectors plus payload |
| Scale | Vertical first, sharded | Horizontal | Horizontal | Horizontal |
| Strength | Transactions, joins | Flexible documents | Full-text search | Semantic similarity |

In practice, few production AI systems rely on only one of these. The polystore pattern (relational data in PostgreSQL, vectors in pgvector or a dedicated engine, full-text in OpenSearch or in the same engine) has become standard. PostgreSQL with pgvector blurs the line further by hosting transactional, lexical, and vector workloads in a single database.

## Operating considerations

### Capacity planning

The dominant capacity drivers for a vector database are vector count, dimensionality, index type, replication factor, and quantization. As a worked example, 100 million 1,536-dimensional float32 vectors occupy roughly 614 GB of raw data; with HNSW (about 1.7x overhead), the index needs around 1 TB of RAM. With int8 scalar quantization that drops to roughly 150 GB, and binary quantization with full-precision re-ranking can fit in around 19 GB.
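
The figures above follow from simple arithmetic: vectors × dimensions × bytes per dimension, in decimal gigabytes. The helper below reproduces them using the same assumed 1.7x HNSW overhead. That overhead factor is an assumption that varies with HNSW parameters such as graph degree. The binary figure counts only the 1-bit codes, not the full-precision copy kept elsewhere (typically on disk) for re-ranking.

```python
def memory_gb(n_vectors: int, dims: int, bytes_per_dim: float, overhead: float = 1.0) -> float:
    """Approximate memory in decimal GB (1 GB = 10**9 bytes)."""
    return n_vectors * dims * bytes_per_dim * overhead / 1e9


N, D = 100_000_000, 1_536
raw_float32 = memory_gb(N, D, 4)                 # ~614 GB of raw float32 data
hnsw_float32 = memory_gb(N, D, 4, overhead=1.7)  # ~1,044 GB, i.e. about 1 TB
int8 = memory_gb(N, D, 1)                        # ~154 GB with int8 scalar quantization
binary = memory_gb(N, D, 1 / 8)                  # ~19 GB at 1 bit per dimension
print(raw_float32, hnsw_float32, int8, binary)
```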

### Update workloads

Vector indexes vary in how cleanly they handle updates. HNSW supports incremental insertions naturally but degrades with frequent deletions; engines compensate with periodic rebuilds. IVF requires retraining when the data distribution shifts substantially, although small inserts can use the existing centroids. DiskANN's Vamana graph supports incremental updates through the FreshDiskANN extension, and SPANN-style cluster indexes gain in-place updates through SPFresh. Engines often expose a flush or build operation that the application can trigger after bulk inserts.

### Cost optimization patterns

- **Quantize aggressively.** Switching from float32 to int8 or binary on high-dimensional embeddings cuts storage by 4 to 32x with limited recall impact.
- **Right-size dimensions.** Matryoshka embeddings let teams truncate to 512 or 768 dimensions instead of using the full 1,024 to 3,072.
- **Use object storage for cold tiers.** Turbopuffer, Amazon S3 Vectors, and pgvectorscale's S3 tiering store cold vectors at object-storage prices.
- **Cap replication.** Replication factor 2 or 3 is sufficient for most read-heavy workloads.
- **Cache query embeddings.** Repeated queries (especially in chat applications) benefit from caching the embedding result.
- **Reserve capacity for steady traffic.** Reserved or committed-use pricing can reduce serverless costs by 40 to 60% on predictable workloads.

### Common pitfalls

- **Mismatched distance metrics.** Using cosine similarity with a model trained for dot product (or vice versa) silently degrades recall.
- **Inadequate chunking strategy.** Chunks that are too long dilute relevance; chunks that are too short lose context. Recursive character splitting with overlap is a typical default.
- **Ignoring filter selectivity.** Highly selective filters interact badly with naive ANN; engines that support single-stage filtering avoid this.
- **Not measuring recall.** Throughput numbers without recall measurements are meaningless; benchmark with a held-out kNN ground truth.
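
The overlap idea behind chunking can be shown with a fixed-size sliding window over characters. This is a simpler technique than recursive character splitting, which prefers to break on paragraph, sentence, and word boundaries. The sizes below are illustrative, not recommendations.

```python
from typing import List


def chunk_with_overlap(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into `size`-character windows, each sharing `overlap` chars with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


print(chunk_with_overlap("abcdefghij", size=4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```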

## 2025-2026 developments and trends

The vector database landscape has undergone significant changes in 2025 and early 2026, driven by the rapid adoption of [generative AI](/wiki/generative_ai) and [AI agents](/wiki/ai_agents).[39]

### Convergence with traditional databases

One of the most important trends is the convergence of vector search into traditional relational and document databases. PostgreSQL (via pgvector and pgvectorscale), MongoDB (via Atlas Vector Search), Oracle 23ai, Snowflake, Databricks, and Microsoft SQL Server have all integrated native vector support. PostgreSQL was the most-used database among developers in 2025 at 46.5% adoption, and in 2025 alone Snowflake and [Databricks](/wiki/databricks) spent approximately USD 1.25 billion acquiring PostgreSQL-focused companies.[41]

### Polystore architecture

Production teams have converged on a polystore architecture where the relational database handles ACID transactions and structured queries while the vector database handles embedding search and semantic retrieval. By 2026, this pattern has become the default for teams shipping AI-augmented products.

### AI agent workloads

The rise of [AI agents](/wiki/ai_agents) has created new demands on vector database infrastructure. AI agents generate roughly 10x more queries than human users, forcing architectural rethinks around throughput, connection pooling, and semantic caching.[41] Vector databases now serve as long-term memory systems for agents, storing conversation history, tool-use logs, and retrieved context across sessions. Patterns such as semantic caching (returning a cached LLM response when an incoming query matches a recent question above a similarity threshold) reduce both cost and latency for agent workloads.
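
A semantic cache can be sketched as a nearest-neighbor lookup over past query embeddings with a similarity cutoff. In this minimal in-memory version:

- The embeddings are supplied by the caller, from whatever embedding model the application uses.
- A linear scan stands in for a vector index.
- The 0.9 threshold is an arbitrary placeholder that would need tuning, since too low a value returns answers to different questions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _unit(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


class SemanticCache:
    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []  # (unit query embedding, response)

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the cached response of the most similar past query, if similar enough."""
        q = _unit(embedding)
        best_score, best_response = -1.0, None
        for vec, response in self.entries:
            score = sum(a * b for a, b in zip(vec, q))  # cosine similarity of unit vectors
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, embedding: Sequence[float], response: str) -> None:
        self.entries.append((_unit(embedding), response))


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "cached LLM answer")
print(cache.get([0.99, 0.05]))  # cached LLM answer (cosine ~0.999)
print(cache.get([0.0, 1.0]))    # None: call the LLM, then cache.put(...)
```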

### Object-storage-native architectures

A new generation of engines builds primarily on object storage rather than block-attached SSDs. [Turbopuffer](/wiki/turbopuffer), launched in 2023, demonstrated that an object-storage-first architecture could serve sub-10 ms p50 queries while costing roughly 10x less than RAM-resident managed services.[24] Amazon S3 Vectors, announced in 2025, takes the same approach inside the S3 service itself. This pattern is well suited to the long-tail, multi-tenant workloads typical of AI-native SaaS products.

### Graph-enhanced vector retrieval

One of the most notable developments in 2026 is the emergence of graph-enhanced vector retrieval, which combines vector similarity search with [knowledge graph](/wiki/knowledge_graph) traversal. This approach captures both semantic similarity (via vectors) and structural relationships (via graph edges), producing higher-quality retrieval for complex multi-hop queries.[39] Multiple vector database vendors have begun integrating graph capabilities or partnering with graph database providers, and Microsoft's open-source GraphRAG project popularized the pattern in 2024.

### Edge deployment

Comprehensive edge-native vector database solutions remain scarce despite a USD 168 billion edge computing market and 39 billion projected connected IoT devices by 2030. Lightweight engines such as [LanceDB](/wiki/lancedb), Chroma in embedded mode, and DuckDB-VSS target this segment for healthcare, autonomous systems, manufacturing, and retail.

### Performance and cost optimization

Chroma's 2025 Rust rewrite delivered approximately 4x performance improvements. [Qdrant](/wiki/qdrant) and [Milvus](/wiki/milvus) have added more aggressive quantization options (binary quantization, scalar quantization) to reduce memory costs. pgvectorscale's DiskANN-based indexing has brought competitive performance to PostgreSQL at a fraction of the cost of managed vector databases. Across the board, vendors are investing in techniques to reduce the cost per query as AI workloads scale.

### Consolidation pressure on pure-play vendors

By mid-2025, with the vector database market concentrated and well-defined, pure-play vendors faced consolidation pressure. The Information reported that [Pinecone](/wiki/pinecone) had engaged bankers about a potential sale, with speculation about a valuation north of USD 2 billion.[43] Possible suitors mentioned in press reports included Oracle, IBM, MongoDB, Snowflake, and Databricks. The September 2025 appointment of Ash Ashutosh as Pinecone's CEO, with founder Edo Liberty moving to the role of chief scientist, was widely read as positioning the company for either a strategic exit or a deeper enterprise push.[10][42]

## Frequently asked questions

### What is a vector database used for?

Vector databases are used wherever applications need to retrieve items by meaning rather than exact keywords: semantic search, recommendation systems, image and audio similarity search, deduplication and entity resolution, anomaly and fraud detection, code search in developer tools, and, most prominently, as the retrieval layer in retrieval-augmented generation and as long-term memory for AI agents.[34][41]

### How is a vector database different from a traditional database?

A relational or document database answers exact-match and range queries over rows or documents using B-tree or inverted indexes, while a vector database answers nearest-neighbor queries over high-dimensional embeddings using approximate nearest neighbor indexes such as HNSW, IVF, or DiskANN. The traditional database is optimized for transactions, joins, and full-text search; the vector database is optimized for semantic similarity. In practice the two are often combined in a polystore architecture, and PostgreSQL with [pgvector](/wiki/pgvector) can serve both roles in one engine.[34]
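The contrast can be made concrete: a key or keyword lookup only succeeds on an exact match, while a nearest-neighbor query ranks every record by similarity. The sketch below implements exact (brute-force) kNN with cosine similarity; the toy 3-dimensional vectors and document IDs are invented stand-ins for real model embeddings. ANN indexes such as HNSW, IVF, and DiskANN approximate this result without scanning every vector.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn(query, vectors, k):
    """Exact k-nearest-neighbor search: score every stored vector, O(n * d) per query."""
    return heapq.nlargest(k, vectors, key=lambda doc_id: cosine_similarity(query, vectors[doc_id]))


# Hypothetical 3-dimensional "embeddings"; real models emit hundreds of dimensions.
vectors = {
    "kitten-care": [0.9, 0.1, 0.0],
    "dog-training": [0.1, 0.9, 0.0],
    "cat-nutrition": [0.8, 0.0, 0.2],
    "tax-filing": [0.0, 0.1, 0.9],
}

exact = vectors.get("cat care")  # exact-match lookup: no record has this key
query = [1.0, 0.0, 0.0]          # stand-in embedding of "how do I look after my cat?"
nearest = knn(query, vectors, k=2)
```

The exact lookup returns nothing, while the similarity query returns the two cat-related records even though neither shares the query's wording.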

### Is a vector database the same as a vector search library?

No. A vector search library such as [Faiss](/wiki/faiss) or [hnswlib](/wiki/hnswlib) is an in-process toolkit that builds and queries an index but leaves persistence, sharding, replication, multi-tenancy, and access control to the application. A vector database wraps those algorithms in a managed runtime with a network API, durability, and security. Faiss is the canonical library; [Pinecone](/wiki/pinecone), [Weaviate](/wiki/weaviate), [Milvus](/wiki/milvus), and [Qdrant](/wiki/qdrant) are canonical databases.[7]

### Which vector database is best?

There is no single best engine; the right choice depends on scale and operational constraints. [Pinecone](/wiki/pinecone) suits teams that want zero-ops managed search, [Milvus](/wiki/milvus) targets billion-scale deployments, [Qdrant](/wiki/qdrant) emphasizes cost-efficient filtered search, [Weaviate](/wiki/weaviate) offers hybrid and GraphQL search, [Chroma](/wiki/chroma) is popular for prototyping, and [pgvector](/wiki/pgvector) lets teams add vectors to an existing PostgreSQL database without standing up a new system.[45][46]

## Glossary

| Term | Definition |
|---|---|
| Embedding | A vector representation of a piece of data, produced by a neural network |
| ANN | Approximate Nearest Neighbor; algorithms that trade some recall for large speedups |
| kNN | k-Nearest Neighbor; the exact version of nearest-neighbor search |
| HNSW | Hierarchical Navigable Small World; the dominant graph-based ANN algorithm |
| IVF | Inverted File Index; partition-based ANN method |
| PQ | Product Quantization; compresses vectors by encoding each subvector with a learned codebook |
| LSH | Locality-Sensitive Hashing; classical hashing-based ANN approach |
| Recall | Fraction of true nearest neighbors found by an approximate search |
| QPS | Queries Per Second; throughput metric |
| BM25 | A classical lexical ranking function used in full-text search |
| RRF | Reciprocal Rank Fusion; method for combining ranked lists |
| Hybrid search | Combination of lexical (BM25) and dense vector retrieval |
| Pre-filter / Post-filter | Whether metadata constraints are applied before or after ANN |
| Quantization | Lossy compression of vectors to reduce memory and compute |
| Matryoshka | Embedding training scheme that nests information from coarse to fine dimensions |
| Polystore | An architecture combining multiple databases for different workloads |
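Two of the glossary entries, RRF and recall, reduce to short formulas. RRF scores each document as the sum, over the input rankings, of 1 / (k + rank), with ranks starting at 1; the constant k = 60 used here is a commonly used default but should be treated as a tunable assumption. Recall@k is the fraction of the exact nearest neighbors that an approximate search returned. The function names and document IDs below are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d), ranks starting at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


bm25_ranking = ["d1", "d2", "d3"]   # hypothetical lexical (BM25) results
dense_ranking = ["d3", "d1", "d4"]  # hypothetical dense vector results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

`d1` comes out on top because it ranks highly in both lists, even though the dense retriever put `d3` first. Because RRF uses only ranks, BM25 scores and cosine similarities never need to be put on a common scale.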

## See also

- [Vector embedding](/wiki/vector_embedding)
- [Embedding model](/wiki/embedding_model)
- [Approximate nearest neighbor](/wiki/ann)
- [HNSW](/wiki/hnsw)
- [Retrieval-augmented generation](/wiki/retrieval_augmented_generation_rag)
- [LangChain](/wiki/langchain)
- [LlamaIndex](/wiki/llamaindex)
- [Quantization](/wiki/quantization)
- [Semantic search](/wiki/semantic_search)
- [Knowledge graph](/wiki/knowledge_graph)

## References

1. Malkov, Yu. A.; Yashunin, D. A. "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs." arXiv:1603.09320, 2016. Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4):824-836, 2018. https://arxiv.org/abs/1603.09320
2. Jégou, Hervé; Douze, Matthijs; Schmid, Cordelia. "Product Quantization for Nearest Neighbor Search." IEEE TPAMI, 33(1):117-128, 2011. DOI 10.1109/TPAMI.2010.57. https://inria.hal.science/inria-00514462v1/document
3. Subramanya, Suhas Jayaram; Kadekodi, Rohan; Krishnaswamy, Ravishankar; Simhadri, Harsha Vardhan. "DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node." NeurIPS 2019. https://www.microsoft.com/en-us/research/publication/diskann-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node/
4. Guo, Ruiqi; Sun, Philip; Lindgren, Erik; Geng, Quan; Simcha, David; Chern, Felix; Kumar, Sanjiv. "Accelerating Large-Scale Inference with Anisotropic Vector Quantization." ICML 2020. arXiv:1908.10396. https://arxiv.org/abs/1908.10396
5. Chen, Qi et al. "SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search." NeurIPS 2021. arXiv:2111.08566. https://arxiv.org/abs/2111.08566
6. Indyk, Piotr; Motwani, Rajeev. "Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality." STOC 1998.
7. Douze, Matthijs et al. "The Faiss library." arXiv:2401.08281, 2024. https://arxiv.org/abs/2401.08281
8. Pinecone. "Announcing the Pinecone Vector Database and $10M in Seed Funding." Pinecone Blog, 2021. https://www.pinecone.io/blog/announcing-vector-database/
9. Pinecone. "Announcing Our $100M Series B Funding to Build Long-Term Memory for AI." Pinecone Blog, April 2023. https://www.pinecone.io/blog/series-b/
10. Pinecone. "Edo Liberty to Spearhead Pinecone's Growing AI Ambitions; Appoints Ash Ashutosh as CEO." Pinecone Newsroom, September 2025. https://www.pinecone.io/newsroom/next-chapter/
11. Weaviate. "The History of Weaviate." Weaviate Blog. https://weaviate.io/blog/history-of-weaviate
12. PR Newswire. "Weaviate Raises $50 Million Series B Funding." April 2023. https://www.prnewswire.com/news-releases/weaviate-raises-50-million-series-b-funding-301803296.html
13. Qdrant. "Announcing Qdrant's $28M Series A Funding Round." Qdrant Blog, January 2024. https://qdrant.tech/blog/series-a-funding-round/
14. Qdrant. "About Us." https://qdrant.tech/about-us/
15. Qdrant. "Binary Quantization - Vector Search, 40x Faster." https://qdrant.tech/articles/binary-quantization/
16. TechCrunch. "Zilliz raises $60M, relocates to SF." August 2022. https://techcrunch.com/2022/08/24/zilliz-the-startup-behind-the-milvus-open-source-vector-database-for-ai-applications-raises-60m-and-relocates-to-sf/
17. Madrona. "Chroma's Jeff Huber on Vector Databases and Getting AI into Production." https://www.madrona.com/chromas-jeff-huber-on-vector-databases-and-getting-ai-into-production/
18. GitHub. "pgvector/pgvector." https://github.com/pgvector/pgvector
19. PostgreSQL News. "pgvector 0.7.0 Released!" https://www.postgresql.org/about/news/pgvector-070-released-2852/
20. Engineering at Meta. "Faiss: A library for efficient similarity search." March 2017. https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/
21. Vespa Blog. "Open Sourcing Vespa, Yahoo's Big Data Processing and Serving Engine." September 2017. https://blog.vespa.ai/open-sourcing-vespa-yahoos-big-data-processing/
22. TechCrunch. "Yahoo spins out Vespa, its search tech, into an independent company." October 2023. https://techcrunch.com/2023/10/04/yahoo-spins-out-vespa-its-search-tech-into-an-independent-company/
23. TechCrunch. "LanceDB, which counts Midjourney as a customer, is building databases for multimodal AI." May 2024. https://techcrunch.com/2024/05/15/lancedb-which-counts-midjourney-as-a-customer-is-building-databases-for-multimodal-ai/
24. Turbopuffer. "turbopuffer: fast search on object storage." https://turbopuffer.com/blog/turbopuffer
25. MongoDB. "Atlas Vector Search Generally Available." MongoDB Blog, December 2023. https://www.mongodb.com/blog/post/dedicated-search-nodes-vector-search-now-in-general-availability
26. Elastic. "Introducing approximate nearest neighbor search in Elasticsearch 8.0." Elastic Blog. https://www.elastic.co/blog/introducing-approximate-nearest-neighbor-search-in-elasticsearch-8-0
27. Redis. "Rediscover Redis for Vector Similarity Search." Redis Blog. https://redis.io/blog/rediscover-redis-for-vector-similarity-search/
28. OpenAI. "New embedding models and API updates." January 2024. https://openai.com/index/new-embedding-models-and-api-updates/
29. Voyage AI. "voyage-3-large: the new state-of-the-art general-purpose embedding model." January 2025. https://blog.voyageai.com/2025/01/07/voyage-3-large/
30. OpenSearch. "Introducing reciprocal rank fusion for hybrid search." OpenSearch Blog. https://opensearch.org/blog/introducing-reciprocal-rank-fusion-hybrid-search/
31. Microsoft Learn. "Hybrid Search Scoring (RRF) - Azure AI Search." https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking
32. ANN-Benchmarks. http://ann-benchmarks.com/
33. Big-ANN-Benchmarks. https://big-ann-benchmarks.com/neurips21.html
34. Pinecone. "What is a Vector Database?" Pinecone Documentation. https://www.pinecone.io/learn/vector-database/
35. Weaviate. "Distance Metrics in Vector Search." https://weaviate.io/blog/distance-metrics-in-vector-search
36. Milvus. "How to Choose Between IVF and HNSW." https://milvus.io/blog/understanding-ivf-vector-index-how-It-works-and-when-to-choose-it-over-hnsw.md
37. Pinecone. "Product Quantization: Compressing high-dimensional vectors by 97%." https://www.pinecone.io/learn/series/faiss/product-quantization/
38. Qdrant. "Vector Search Benchmarks." https://qdrant.tech/benchmarks/
39. DEV Community. "What's Changing in Vector Databases in 2026." https://dev.to/actiandev/whats-changing-in-vector-databases-in-2026-3pbo
41. VentureBeat. "Six data shifts that will shape enterprise AI in 2026." https://venturebeat.com/data/six-data-shifts-that-will-shape-enterprise-ai-in-2026/
42. VentureBeat. "Pinecone founder Edo Liberty appoints Googler Ash as CEO." September 2025. https://venturebeat.com/data-infrastructure/pinecone-founder-edo-liberty-appoints-googler-ash-as-ceo
43. The Information. "Pinecone weighs sale at over $2 billion valuation." 2025.
44. VentureBeat. "Enterprise RAG rebuild: hybrid retrieval adoption tripled in Q1 2026." https://venturebeat.com/data/the-retrieval-rebuild-why-hybrid-retrieval-intent-tripled-as-enterprise-rag-programs-hit-the-scale-wall
45. LiquidMetal AI. "Vector Database Comparison." https://liquidmetal.ai/casesAndBlogs/vector-comparison/
46. DataCamp. "The 7 Best Vector Databases in 2026." https://www.datacamp.com/blog/the-top-5-vector-databases

