Vector database
Last reviewed
May 9, 2026
Sources
45 citations
Review status
Source-backed
Revision
v7 · 9,869 words
See also: AI terms
A vector database is a special kind of computer storage that helps find things that are similar, like finding pictures that look like a cat or finding songs that sound happy. It is really good at helping computers understand what things mean, even if those things are words, pictures, or sounds.
Vector databases also help big computer brains called "large language models" remember things, so they can give better answers when you ask them questions. They can find what you are looking for really fast, even when there are lots and lots of items inside.
A vector database is a type of database specifically designed for storing and querying high-dimensional vector data, which is often used in artificial intelligence applications (AI apps). Complex data, including unstructured forms like documents, images, videos, and plain text, is growing rapidly. Traditional databases designed for structured data struggle to store and analyze complex data effectively, often requiring extensive keyword and metadata classification. Vector databases address this issue by transforming complex data into vector embeddings, which describe data objects in numerous dimensions. These databases are gaining popularity due to their ability to extend large language models (LLMs) with long-term memory and provide efficient querying for AI-driven applications.
As of 2026, over 68% of enterprise AI applications use vector databases to manage embeddings generated by large language models, computer vision systems, and recommendation engines. Industry estimates from research firms place the global vector database market between USD 2.6 billion and USD 3.2 billion in 2025, with most analyst forecasts pointing to a compound annual growth rate above 20% through the early 2030s. The category sits at the center of the retrieval-augmented generation stack and has become a standard component in modern AI infrastructure alongside model-serving platforms, orchestration frameworks, and feature stores.
The first generation of dedicated vector search libraries and database vendors appeared between 2017 and 2022, including Faiss (Meta, 2017), Milvus (Zilliz, 2019), Weaviate (SeMI Technologies, 2019), Pinecone (2019), Qdrant (2021), and Chroma (2022). A second wave of vector extensions to existing general-purpose databases arrived in parallel, led by pgvector for PostgreSQL (Andrew Kane, 2021), MongoDB Atlas Vector Search (general availability December 2023), the Oracle Database 23ai vector type (2024), and SQL Server vector functions (2025). By 2025, the divide between purpose-built vector engines and traditional databases with vector add-ons had narrowed considerably, and many production teams run a mix of both.
In a relational database, data is organized in rows and columns, while in a document database, it is organized in documents and collections. In contrast, a vector database stores arrays of floating-point numbers (or quantized representations of them) along with associated metadata, indexed for similarity rather than exact match. These databases can be queried with ultra-low latency, making them suitable for AI-driven applications.
A vector database indexes and stores vector embeddings for efficient retrieval and similarity search. In addition to traditional CRUD (create, read, update, and delete) operations and metadata filtering, vector databases can compare any stored vector with any other, or with the vector of a search query. This makes them well suited to similarity search (also called vector search), which can return semantically related results that keyword-based search technology would miss.
The fundamental query is the k-nearest-neighbor (kNN) lookup: given a query vector q and a target k, return the k vectors in the index whose distance to q is smallest under a chosen metric. For high-dimensional data, exact kNN requires comparing the query against every stored vector, which is impractical at scale. Vector databases therefore implement approximate nearest neighbor (ANN) search, which trades a small amount of recall for orders-of-magnitude gains in throughput and latency.
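The exact kNN baseline described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names and the toy `index` dictionary are assumptions made for the example, not part of any particular database's API.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query, vectors, k, metric=l2_distance):
    """Return the k (distance, id) pairs closest to query, nearest first.

    Every stored vector is compared against the query, so the cost is
    O(N * d) per query. This is the baseline that ANN indexes approximate.
    """
    scored = ((metric(query, vec), vec_id) for vec_id, vec in vectors.items())
    return heapq.nsmallest(k, scored)


# Hypothetical toy index: id -> 2-dimensional vector.
index = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 3.0],
    "d": [5.0, 5.0],
}
top2 = exact_knn([0.9, 0.1], index, k=2)
print([vec_id for _, vec_id in top2])  # ids of the two nearest vectors
```

Because the loop touches every vector, latency grows linearly with collection size, which is the cost that approximate indexes are designed to avoid.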
A long-standing challenge in vector search is the curse of dimensionality. As dimensionality grows, the volume of the embedding space grows exponentially, points drawn from typical distributions become roughly equidistant from one another, and the contrast between nearest and farthest neighbors collapses. In purely random high-dimensional spaces, distance-based nearest neighbor methods can fail to discriminate at all. Real-world embeddings produced by neural networks escape the worst of this problem because the data lies on lower-dimensional manifolds inside the ambient space, but the phenomenon still motivates the design of every modern ANN algorithm. Techniques such as random projections, learned partitions, navigable small-world graphs, and quantization all aim to exploit the manifold structure of real embeddings rather than treat them as uniformly distributed points.
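The loss of contrast between nearest and farthest neighbors can be observed directly on synthetic data. The sketch below draws uniformly random points (an assumption that represents the worst case, not real embeddings) and measures the relative gap between the farthest and nearest point from a random query; the helper name `relative_contrast` is hypothetical.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


low = relative_contrast(dim=2)
high = relative_contrast(dim=512)
print(f"contrast in 2 dimensions:   {low:.2f}")
print(f"contrast in 512 dimensions: {high:.2f}")
```

In low dimensions the farthest point is many times farther away than the nearest one, while in hundreds of dimensions the two distances differ by only a small fraction, which is the concentration effect the paragraph describes.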
Practitioners distinguish between vector libraries and vector databases. A library such as Faiss, Annoy, hnswlib, or ScaNN is an in-process toolkit that exposes APIs for index construction and search but leaves operational concerns (sharding, replication, persistence, multi-tenancy, security) to the application. A vector database wraps one or more such algorithms inside a managed runtime that handles those concerns, exposes a network API, and integrates with monitoring and access control. Faiss is the canonical library, while Pinecone, Weaviate, Milvus, Qdrant, and Chroma are canonical databases.
The performance of a vector database depends heavily on its indexing strategy. Three indexing families dominate the landscape: HNSW, IVF, and Product Quantization. Production systems often combine two or more of these to balance recall, latency, memory, and throughput.
| Algorithm | Family | Year | Origin | Typical Build Complexity | Typical Query Complexity | Memory Use |
|---|---|---|---|---|---|---|
| Flat (brute force) | Exact | classical | n/a | O(N) | O(N · d) per query | 1x raw vectors |
| LSH | Hashing | 1998 | Indyk and Motwani, STOC 1998 | O(N · L · d) | O(L · d) average | Multiple hash tables |
| IVF | Inverted file | 2010 | Jegou, Douze, Schmid (with PQ) | O(N · nlist · d) | O((nprobe / nlist) · N · d) | 1x raw + centroids |
| Product Quantization | Quantization | 2010 | Jegou, Douze, Schmid (TPAMI 2011) | O(N · d) after codebook training | O(M · K) lookup per query | Up to 64x compression |
| HNSW | Graph | 2016 | Malkov and Yashunin (arxiv 1603.09320) | O(N · log N) average | O(log N) average | 1.5 to 2x raw vectors |
| ScaNN | Tree + AQ | 2020 | Guo et al., ICML 2020 (arxiv 1908.10396) | O(N · d) with codebook | Sub-linear typical | Compressed |
| DiskANN (Vamana) | Graph on SSD | 2019 | Subramanya et al., NeurIPS 2019 | O(N · log N) | O(log N) with SSD reads | Compressed (PQ) vectors in RAM; graph and full vectors on SSD |
| SPANN | Inverted on disk | 2021 | Chen et al., NeurIPS 2021 (arxiv 2111.08566) | Hierarchical clustering + graph | Sub-linear with SSD reads | Centroids in RAM |
HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest neighbor algorithm published by Yu. A. Malkov and D. A. Yashunin in March 2016 (arXiv 1603.09320, later in IEEE Transactions on Pattern Analysis and Machine Intelligence). It builds a multi-layered navigable graph over the vector space. Each upper layer contains a progressively smaller subset of the vectors with longer-range connections, while the bottom layer holds every vector with short-range connections. During a search, the algorithm starts at the top layer and greedily traverses toward the query vector, descending through layers and refining the candidate set at each level.
HNSW offers several strengths that make it the default choice for most vector database deployments:
- The ef_search parameter controls how many candidates the algorithm considers during a query. Increasing ef_search improves recall at the cost of higher latency.
- The M parameter controls the number of connections per node, and ef_construction controls the quality of the graph at build time. Higher values produce a more accurate but larger index.
The main drawback of HNSW is memory consumption. Because the entire graph must reside in memory for fast traversal, HNSW typically requires 1.5 to 2 times the raw vector data size in RAM. For billion-scale datasets, this can become prohibitively expensive, which is why vendors increasingly pair HNSW with quantization to compress the in-memory representation.
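The effect of the search-time candidate list can be illustrated with a simplified, single-layer version of the graph search. The sketch below is not the full hierarchical algorithm: it builds one flat proximity graph by brute force (an assumption made to keep the example short) and then runs a best-first search in which `ef` plays the role of ef_search and `m` the role of M. All function names are hypothetical.

```python
import heapq
import math
import random


def build_graph(points, m):
    """Insert points one by one, linking each to its m nearest predecessors.

    Brute force is used to find those neighbors, which a real index avoids.
    """
    graph = {}
    for i, p in enumerate(points):
        nearest = heapq.nsmallest(m, range(i), key=lambda j: math.dist(p, points[j]))
        graph[i] = set(nearest)
        for j in nearest:
            graph[j].add(i)  # links are bidirectional
    return graph


def search_layer(points, graph, query, entry, ef):
    """Best-first search that keeps the ef closest nodes seen so far."""
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    results = [(-d0, entry)]     # max-heap (negated): farthest kept node first
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break  # no remaining candidate can improve the result set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = math.dist(query, points[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-neg, node) for neg, node in results)


rng = random.Random(42)
points = [[rng.random() for _ in range(8)] for _ in range(300)]
graph = build_graph(points, m=8)
query = [rng.random() for _ in range(8)]

fast = search_layer(points, graph, query, entry=0, ef=10)
thorough = search_layer(points, graph, query, entry=0, ef=len(points))
exact = sorted((math.dist(query, p), i) for i, p in enumerate(points))
```

With a small `ef` the search visits only a fraction of the graph and may miss some true neighbors; raising `ef` to the collection size degenerates into an exhaustive traversal that matches the brute-force result, which is the recall versus latency trade-off in miniature.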
HNSW is the primary index type in Qdrant, Weaviate, pgvector, Elasticsearch, OpenSearch, and Redis (RediSearch 2.4 and later), and is supported as an option in Milvus, Faiss, and Pinecone.
IVF is a partition-based indexing method derived from the inverted-file approach popularized in image retrieval research. It works by first running k-means clustering on the dataset to divide the vector space into a configurable number of clusters (called nlist). Each cluster is represented by a centroid, and every vector in the dataset is assigned to its nearest centroid. At query time, the algorithm computes the distance between the query vector and all centroids, selects the nprobe closest clusters, and then performs an exhaustive search only within those clusters.
IVF is well suited for scenarios where HNSW is impractical:
The main trade-off is that IVF requires a training phase (the k-means clustering step), which can be time-consuming for large datasets. Additionally, IVF generally achieves lower recall than HNSW at equivalent latency, because queries that fall near cluster boundaries may miss relevant vectors in neighboring clusters. Practitioners typically tune nlist near the square root of the dataset size and adjust nprobe to trade recall against latency.
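The train, assign, and probe steps of IVF can be sketched as follows. This is a minimal pure-Python illustration on random data, with a plain Lloyd's k-means standing in for whatever clustering a production system uses; the function names are assumptions, while `nlist` and `nprobe` follow the terminology above.

```python
import math
import random


def kmeans(points, nlist, iters=10, seed=0):
    """Plain Lloyd's k-means; returns the list of centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for p in points:
            nearest = min(range(nlist), key=lambda c: math.dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(points, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, p in enumerate(points):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_search(query, points, centroids, lists, k, nprobe):
    """Scan only the nprobe inverted lists whose centroids are closest."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: math.dist(query, points[i]))
    return candidates[:k]


rng = random.Random(1)
points = [[rng.random() for _ in range(4)] for _ in range(400)]
nlist = int(math.sqrt(len(points)))  # square-root rule of thumb
centroids = kmeans(points, nlist)
lists = build_ivf(points, centroids)

query = [rng.random() for _ in range(4)]
approx = ivf_search(query, points, centroids, lists, k=5, nprobe=3)
full = ivf_search(query, points, centroids, lists, k=5, nprobe=nlist)
```

Setting `nprobe` equal to `nlist` scans every list and therefore reproduces exact search, while small values scan only a fraction of the data and can miss neighbors that sit just across a cluster boundary.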
IVF is available in Faiss, Milvus, and Oracle Database 23ai, among other systems.
Product Quantization is a lossy compression technique introduced by Hervé Jégou, Matthijs Douze, and Cordelia Schmid in their 2011 IEEE TPAMI paper (DOI 10.1109/TPAMI.2010.57). It reduces the memory footprint of vector indexes by up to 97%. PQ works by dividing each high-dimensional vector into a set of smaller subvectors and then quantizing each subvector independently using a separate codebook trained with k-means.
As an example, a 128-dimensional float32 vector occupies 512 bytes. PQ might split it into 8 subvectors of 16 dimensions each, train a 256-centroid codebook for each subvector, and replace each subvector with a single byte. The result compresses the vector to 8 bytes, a reduction of roughly 98%. At query time, PQ uses precomputed lookup tables to estimate distances against quantized database vectors without decompressing them. PQ is lossy and is rarely used alone; it is typically combined with IVF (as IVF-PQ) or HNSW to form a two-stage retrieval pipeline where the index identifies a candidate set and a re-ranking step refines results using full-precision vectors.
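The 128-dimensional example above can be traced in code. The sketch below is a simplified illustration: the codebooks are sampled from the data rather than trained with k-means (an assumption made to keep it short), and the names `encode`, `adc_table`, and `adc_distance` are hypothetical. It shows the 8-byte code and the lookup-table distance estimate, often called asymmetric distance computation.

```python
import math
import random

DIM, M, K = 128, 8, 256   # 8 subvectors of 16 dimensions, 256 centroids each
SUB = DIM // M


def split(vec):
    """Cut a vector into M contiguous subvectors."""
    return [vec[j * SUB:(j + 1) * SUB] for j in range(M)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vec, codebooks):
    """Replace each subvector by the index of its nearest centroid (1 byte)."""
    return bytes(
        min(range(K), key=lambda c: sq_dist(sub, codebooks[j][c]))
        for j, sub in enumerate(split(vec))
    )


def adc_table(query, codebooks):
    """Per-subspace squared distances from the query to every centroid."""
    return [[sq_dist(sub, cent) for cent in codebooks[j]]
            for j, sub in enumerate(split(query))]


def adc_distance(code, table):
    """Estimated squared L2 distance: M table lookups, no decompression."""
    return sum(table[j][c] for j, c in enumerate(code))


rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(300)]
# Simplification: sample centroids from the data instead of running k-means.
codebooks = [[split(v)[j] for v in rng.sample(data, K)] for j in range(M)]

code = encode(data[0], codebooks)
query = [rng.gauss(0, 1) for _ in range(DIM)]
estimate = adc_distance(code, adc_table(query, codebooks))

raw_bytes = DIM * 4               # float32 storage: 512 bytes
ratio = raw_bytes / len(code)     # 512 / 8 = 64x
```

The estimate equals the distance between the query and the reconstructed (decoded) vector, not the original one, which is why a re-ranking pass over full-precision vectors is commonly added after the PQ stage.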
Scalar quantization ranges from 4x compression (float32 to int8) to 32x in the one-bit binary case, whereas PQ can provide compression levels of up to 64x, making it especially valuable for cost-sensitive deployments at very large scale. Faiss, Milvus, Pinecone, and OpenSearch all use product quantization internally.
In addition to product quantization, vector databases increasingly support scalar and binary forms of compression that trade more accuracy for further memory savings.
Binary quantization works best on embeddings of dimension 1024 or higher; on lower-dimensional vectors, the loss of information is too large to preserve useful similarity rankings.
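A minimal sketch of binary quantization follows, assuming the common sign-based scheme in which each dimension is reduced to one bit (a 32x reduction from float32) and codes are compared by Hamming distance. The 8-dimensional vectors and function names are illustrative only; real deployments use far higher dimensions, as noted above.

```python
def binarize(vec):
    """Pack the sign of each dimension into one bit of a Python int."""
    bits = 0
    for value in vec:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming(a, b):
    """Number of bit positions where two packed codes differ."""
    return bin(a ^ b).count("1")


doc_a = binarize([0.3, -1.2, 0.8, 0.1, -0.4, 2.0, -0.7, 0.5])
doc_b = binarize([0.2, -0.9, 0.6, -0.3, -0.1, 1.5, -0.2, 0.9])
doc_c = binarize([-0.5, 1.1, -0.9, -0.2, 0.6, -1.8, 0.4, -0.3])

print(hamming(doc_a, doc_b))  # similar directions: few differing bits
print(hamming(doc_a, doc_c))  # opposite directions: many differing bits
```

Because the comparison is an XOR followed by a bit count, it is very cheap, which is why binary codes are typically used for a fast first pass followed by rescoring with higher-precision vectors.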
ScaNN (Scalable Nearest Neighbors) was published by Ruiqi Guo and colleagues at Google Research in 2020 (ICML 2020, arxiv 1908.10396). It introduces an anisotropic vector quantization loss that penalizes the parallel component of a datapoint's residual relative to its orthogonal component, on the observation that database points with the largest inner products are more relevant for a given query. ScaNN combines tree-based partitioning with this quantization to deliver leading results on benchmarks such as glove-100-angular. It is available as an open-source library and underlies vector search in Google Cloud's Vertex AI Vector Search service.
DiskANN was published by Suhas Jayaram Subramanya, Devvrit, Rohan Kadekodi, Ravishankar Krishnaswamy, and Harsha Vardhan Simhadri at Microsoft Research at NeurIPS 2019. It introduced a graph-based algorithm called Vamana, which produces graphs with a smaller diameter than NSG and HNSW and so needs fewer sequential disk reads per query, making it well suited to SSD storage. On the SIFT1B billion-point benchmark, DiskANN serves more than 5,000 queries per second with under 3 ms mean latency at 95% 1-recall@1 on a 16-core machine with 64 GB of RAM and an inexpensive SSD. The Microsoft DiskANN code is open source on GitHub, and the algorithm is integrated into Milvus, pgvectorscale (the Timescale extension on top of pgvector), and other systems.
SPANN was published in 2021 (NeurIPS 2021, arxiv 2111.08566) by researchers at Microsoft, Peking University, Tencent, and Baidu. It implements a hybrid memory-disk inverted-file system: centroids of the posting lists are kept in RAM via a Microsoft SPTAG index, while the larger posting lists themselves live on disk. A hierarchical balanced clustering scheme keeps posting lists similar in length, and a query-aware pruning scheme avoids reading unnecessary lists at search time. SPANN can reach 90% recall@1 in around one millisecond on billion-scale datasets with only 32 GB of RAM, more than twice as fast as DiskANN at the same memory budget on three billion-scale datasets.
LSH (locality-sensitive hashing) is the classical hashing-based ANN technique, introduced by Piotr Indyk and Rajeev Motwani in 1998 (STOC 1998). LSH families are designed so that hash collisions are likely for nearby points and unlikely for distant points. LSH for Euclidean and Manhattan distances projects vectors onto random directions and quantizes the projections into buckets, while MinHash targets Jaccard similarity and SimHash, which hashes by the side of random hyperplanes a vector falls on, targets cosine similarity. LSH is fast and memory-efficient but tends to deliver lower recall than graph-based methods at equivalent throughput, and it has largely been displaced by HNSW in modern vector databases. It remains useful for very high-dimensional sparse data, near-duplicate detection in web crawls, and theoretically grounded approximate search.
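The collision property can be illustrated with a SimHash-style random-hyperplane hash, the LSH variant associated with cosine similarity. This is a minimal sketch under stated assumptions: a single hash table, 16 hyperplanes, and hypothetical helper names; practical systems use several tables to raise recall.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def simhash(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * w for v, w in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


planes = make_hyperplanes(dim=4, n_bits=16)
base = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0, 4.0, 6.0, 8.0]   # same direction, different length
flipped = [-x for x in base]     # opposite direction

# Vectors that share a hash land in the same bucket.
buckets = {}
for name, vec in [("base", base), ("scaled", scaled), ("flipped", flipped)]:
    buckets.setdefault(simhash(vec, planes), []).append(name)
```

The two vectors pointing in the same direction collide in one bucket regardless of their length, while the opposite vector falls on the other side of every hyperplane and lands elsewhere, so a query only needs to be compared against the contents of its own bucket.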
Other algorithms also appear in production. Flat (brute-force) search gives perfect recall by exhaustive comparison and is useful below a few hundred thousand vectors. Annoy, developed at Spotify, builds forests of random-projection binary trees and is popular for read-heavy workloads. NSG (Cong Fu et al., 2017) is a graph algorithm focused on small index size. hnswlib is a reference C++ implementation of HNSW that several engines embed. RaBitQ is a newer quantization scheme with theoretical recall guarantees that has begun to appear in research benchmarks.
Vector databases rely on distance or similarity metrics to determine how close two vectors are in the embedding space. Choosing the correct metric is important because it should match the metric used during the training of the embedding model. Using cosine similarity with a model trained for inner-product retrieval, for example, can degrade recall noticeably.
| Metric | Formula Summary | Best For | Notes |
|---|---|---|---|
| Cosine Similarity | cos(θ) = (a · b) / (‖a‖ ‖b‖); measures angle, ignores magnitude | Text similarity, document comparison, NLP tasks | Most common default; insensitive to vector length |
| Euclidean Distance (L2) | √(Σ (a_i - b_i)²); straight-line distance | Clustering, anomaly detection, spatial data | Sensitive to magnitude; penalizes large absolute differences |
| Dot Product (Inner Product) | a · b = Σ a_i b_i; sum of element-wise products | Recommendation systems, ranking tasks | Equivalent to cosine similarity when vectors are normalized |
| Manhattan Distance (L1) | Σ \|a_i - b_i\|; sum of absolute differences | | Less dominated by a single large coordinate difference than L2 |
| Hamming Distance | Number of positions where two binary vectors differ | Binary embeddings, hash-based search | Used with binary quantization for speed |
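| Jaccard | \|A ∩ B\| / \|A ∪ B\|; overlap between two sets | Sparse hash-based fingerprints | Set-based similarity; the target of MinHash |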
| Squared Euclidean | Σ (a_i - b_i)² | Same ranking as L2 with cheaper math | Skips the square root for speed |
When vectors are normalized to unit length, cosine similarity and dot product produce identical rankings. Many embedding models, such as OpenAI's text-embedding-3 family and Sentence Transformers trained with cosine objectives, output normalized vectors, so either metric can be used interchangeably in those cases. For maximum-inner-product search, where vectors are not normalized and magnitude carries information, dot product is preferred and is the metric ScaNN's anisotropic quantization is designed for.
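The relationship between cosine similarity and dot product can be checked on a small example. The vectors and helper names below are illustrative assumptions; the point is only that the two metrics rank differently on raw vectors and identically once everything is scaled to unit length.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


def rank(score, q, vectors):
    """Ids sorted from most to least similar under the given score."""
    return sorted(vectors, key=lambda i: score(q, vectors[i]), reverse=True)


query = [1.0, 1.0]
docs = {"short": [1.0, 0.9], "long": [10.0, 0.0], "far": [-1.0, 0.5]}

by_cosine = rank(cosine, query, docs)
by_dot_raw = rank(dot, query, docs)       # magnitude lets "long" win

unit_docs = {i: normalize(v) for i, v in docs.items()}
by_dot_unit = rank(dot, normalize(query), unit_docs)

print(by_cosine, by_dot_raw, by_dot_unit)
```

On the raw vectors the large-magnitude document wins under dot product even though it points in a less similar direction, which is the behavior maximum-inner-product search relies on when magnitude carries meaning.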
Quick guide to picking a distance metric:
- text embeddings (most modern models): cosine or dot product (interchangeable when normalized)
- recommendation systems with biased magnitudes: dot product
- clustering or geometric data: Euclidean (L2)
- compressed binary representations: Hamming
- sparse hash-based fingerprints: Jaccard
Beyond similarity search, modern vector databases support a broader set of operations that distinguish them from pure ANN libraries.
Vectors can be inserted, updated, deleted, and fetched by primary key. Most engines accept an upsert operation that inserts a vector if its ID is new and replaces the existing entry otherwise. Because graph indexes are sensitive to deletions, engines such as Qdrant, Weaviate, and Milvus implement tombstoning with periodic background compaction.
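The upsert and tombstone semantics can be sketched with a toy in-memory store. `TinyVectorStore` is a hypothetical class written for this illustration and does not mirror the API of Qdrant, Weaviate, Milvus, or any other engine; it omits the ANN index entirely and shows only the bookkeeping.

```python
class TinyVectorStore:
    """Hypothetical in-memory store illustrating upsert and tombstones."""

    def __init__(self):
        self.vectors = {}        # id -> vector
        self.tombstones = set()  # ids deleted but not yet compacted

    def upsert(self, vec_id, vector):
        """Insert a new id or replace the vector stored under an existing id."""
        self.vectors[vec_id] = list(vector)
        self.tombstones.discard(vec_id)

    def delete(self, vec_id):
        """Mark the id as deleted; the data stays until compaction."""
        if vec_id in self.vectors:
            self.tombstones.add(vec_id)

    def fetch(self, vec_id):
        """Return the live vector for an id, or None if absent or deleted."""
        if vec_id in self.tombstones:
            return None
        return self.vectors.get(vec_id)

    def compact(self):
        """Physically drop tombstoned entries (a background job in practice)."""
        for vec_id in self.tombstones:
            del self.vectors[vec_id]
        self.tombstones.clear()


store = TinyVectorStore()
store.upsert("doc-1", [0.1, 0.2])
store.upsert("doc-1", [0.3, 0.4])   # replaces the earlier vector
store.upsert("doc-2", [0.5, 0.6])
store.delete("doc-2")               # hidden from reads, still stored
```

Deferring the physical removal to `compact` lets a delete return immediately without restructuring the index, at the cost of storage that is only reclaimed when compaction runs.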
Each vector typically carries a payload of structured metadata such as document IDs, timestamps, tags, languages, prices, or access-control attributes. Filtering this metadata alongside vector similarity is essential for production retrieval. Approaches include:
The choice between strategies depends on filter selectivity, ANN parameters, and dataset size, and most engines pick one automatically based on workload heuristics.
Hybrid search combines lexical retrieval (typically BM25 or SPLADE) with dense vector search. Dense embeddings capture semantic similarity but can miss exact-match cues such as product SKUs, person names, or rare technical terms. Lexical search excels at exact match but lacks semantic generalization. Combining the two consistently outperforms either alone.
The fusion step combines two ranked lists into one. The most popular method is Reciprocal Rank Fusion (RRF), which scores a document as the sum of 1 / (k + rank_i) across the two retrievers, with k typically set to 60. RRF is insensitive to score-scale differences and requires no tuning. Other approaches include weighted score normalization, learned-to-rank rerankers, and convex combinations. Hybrid retrieval has become the default in production RAG systems through 2025 and 2026; one VentureBeat survey reported hybrid retrieval intent tripling among enterprise teams in Q1 2026.
RRF score for a document d:
RRF(d) = sum over retrievers i of 1 / (k + rank_i(d))
default k = 60
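The formula above translates directly into code. The following is a minimal sketch with hypothetical document ids and result lists; it assumes ranks start at 1 and that a document missing from a retriever's list simply contributes nothing for that retriever.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists into one list; rank positions start at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


bm25_hits = ["sku-123", "doc-7", "doc-2"]    # hypothetical lexical results
dense_hits = ["doc-7", "doc-9", "sku-123"]   # hypothetical vector results
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Only rank positions enter the score, so BM25 scores and vector similarities never need to be brought onto a common scale.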
Weaviate, Milvus, Qdrant, Elasticsearch, and OpenSearch all support hybrid search natively, and pgvector can be combined with PostgreSQL full-text search or the pg_search extension to the same effect.
A single record can carry multiple embeddings: one for the title, one for the body, one from a multilingual model, one from a multimodal model. Vector databases support this through named vectors (Qdrant), multi-vector fields (Weaviate, Pinecone), or separate collections joined by foreign keys (pgvector). Multi-vector queries typically combine per-vector similarities via maxSim, average, or weighted aggregation. ColBERT-style late-interaction models, which produce one vector per token, are increasingly supported as a special case.
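The maxSim aggregation used by late-interaction models can be sketched as follows: for each query vector, take the best-matching document vector, then sum those maxima. The two-dimensional token vectors and the function names are assumptions made for illustration, with dot product as the per-pair similarity.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_vecs, doc_vecs):
    """Sum over query vectors of the best match among the document's vectors."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical per-token embeddings for one query and two documents.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]
doc_b = [[0.7, 0.7], [0.6, 0.6]]

score_a = max_sim(query_tokens, doc_a)
score_b = max_sim(query_tokens, doc_b)
```

Here the first document scores higher because each query token finds a closely matching token of its own, whereas the second document's vectors only match both query tokens partially. Averaging or weighting the per-vector similarities instead of taking the maximum gives the other aggregation modes mentioned above.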
Several vector databases have emerged to cater to the growing demand for AI applications. The following table compares the most widely adopted systems as of early 2026.
| Feature | Pinecone | Weaviate | Milvus | Qdrant | Chroma | pgvector |
|---|---|---|---|---|---|---|
| Type | Fully managed (serverless) | Open-source + managed cloud | Open-source + managed (Zilliz) | Open-source + managed cloud | Open-source (embedded) | PostgreSQL extension |
| Founded | 2019 | 2019 (project from 2016) | 2019 (Zilliz 2017) | 2021 | 2022 | 2021 |
| Founders | Edo Liberty | Bob van Luijt, Etienne Dilocker, Micha Verhagen | Charles Xie | Andrey Vasnetsov, Andre Zayarni | Jeff Huber, Anton Troynikov | Andrew Kane |
| License | Proprietary | BSD-3-Clause | Apache 2.0 | Apache 2.0 | Apache 2.0 | PostgreSQL License |
| Written In | Proprietary (undisclosed) | Go | Go / C++ | Rust | Rust (rewritten 2025) | C |
| Index Types | Proprietary (auto-tuned) | HNSW | HNSW, IVF-Flat, IVF-PQ, DiskANN, GPU indexes | HNSW (with quantization) | HNSW | HNSW, IVFFlat |
| Max Dimensions | 20,000 | Configurable | 32,768 | 65,536 | Configurable | 16,000 (with halfvec up to 4,000 indexed) |
| Distance Metrics | Cosine, Euclidean, Dot Product | Cosine, L2, Dot, Hamming, Manhattan | Cosine, L2, IP, Jaccard, Hamming | Cosine, Euclidean, Dot Product, Manhattan | Cosine, L2, IP | Cosine, L2, Inner Product |
| Hybrid Search | Sparse vectors (2024+) | Native BM25 + vector (BlockMax WAND) | Sparse-BM25 (Milvus 2.5+) | Named vector hybrid (v1.9+) | Metadata + full-text | SQL + vector via standard PostgreSQL |
| Metadata Filtering | Single-stage with bitmap indexes | GraphQL-based filtering | Boolean expressions | Rich JSON payload filtering | Metadata filtering | Standard SQL WHERE clauses |
| Multi-Tenancy | Namespaces | Native tenant isolation | Partitions / collections | Payload-based or collection-based | Collections | Schema-based (standard PostgreSQL) |
| Cloud Regions | AWS, GCP, Azure | AWS, GCP, Azure | Zilliz: AWS, GCP, Azure | AWS, GCP, Azure | Chroma Cloud (limited) | Any PostgreSQL host |
| GitHub Stars (2026) | N/A (closed source) | ~13,000 | ~38,000 | ~26,000 | ~22,000 | ~16,000 |
| Best For | Zero-ops managed vector search | Hybrid/semantic search with GraphQL | Billion-scale deployments | Cost-efficient filtered search | Rapid prototyping | Adding vectors to existing PostgreSQL |
Pinecone was founded in 2019 by Edo Liberty, who previously led the Algorithms group at AWS AI Labs and worked at Yahoo Research. The company raised a USD 10 million seed in 2021, a USD 28 million Series A in 2022, and a USD 100 million Series B in April 2023 led by Andreessen Horowitz at a USD 750 million valuation, bringing total disclosed funding to roughly USD 138 million. Pinecone uses proprietary, auto-tuned indexing and offers single-stage metadata filtering with bitmap indexes, namespace-based multi-tenancy, and scale-to-zero. In 2024 it replaced pod-based pricing with a serverless model, and in 2025 introduced sparse vector support for hybrid search. In September 2025, founder Edo Liberty moved to chief scientist and former Google director Ash Ashutosh became CEO; The Information reported that Pinecone had engaged bankers about a potential sale at a valuation north of USD 2 billion.
Weaviate is an open-source vector database written in Go. The Weaviate project was started in March 2016 by Bob van Luijt, who co-founded SeMI Technologies (short for Semantic Machine Insights) in 2019 with Etienne Dilocker and Micha Verhagen; the company later renamed itself Weaviate B.V. SeMI raised a USD 16 million Series A in February 2022 led by Cortical Ventures and a USD 50 million Series B in April 2023 led by Index Ventures with Battery Ventures, NEA, Zetta Venture Partners, and ING Ventures participating. Weaviate provides a GraphQL API, native BM25 plus vector hybrid search using BlockMax WAND, and built-in vectorization modules for OpenAI, Cohere, and Hugging Face. Weaviate Cloud offers managed hosting across AWS, GCP, and Azure.
Milvus is an open-source, cloud-native vector database under the Apache 2.0 license, written primarily in Go and C++. It was created by Charles Xie, who founded Zilliz in 2017 and previously worked at Oracle as a founding engineer of the Oracle 12c project. Milvus was first released in 2019 and donated to the LF AI and Data Foundation in 2020. Zilliz raised USD 43 million in its initial Series B and an additional USD 60 million Series B extension in August 2022 led by Prosperity7 Ventures, with Pavilion Capital, Hillhouse Capital, 5Y Capital, and Yunqi Capital participating, bringing total funding to roughly USD 113 million. Milvus has a distributed architecture that separates compute and storage and supports the widest range of index types, including HNSW, IVF-Flat, IVF-PQ, DiskANN, and GPU-accelerated indexes. Milvus 2.5 introduced native sparse-BM25 hybrid search. Zilliz Cloud provides managed versions with serverless and dedicated deployment options.
Qdrant is an open-source vector database written in Rust, founded in 2021 in Berlin by Andrey Vasnetsov (CTO) and Andre Zayarni (CEO). The company raised an early USD 7.5 million seed round and a USD 28 million Series A in January 2024 led by Spark Capital with participation from Unusual Ventures and 42CAP. Qdrant is known for its compact memory footprint and strong filtering performance.
Qdrant uses HNSW indexing with built-in scalar, product, and binary quantization options. Its payload-based filtering system supports rich JSON queries and can be combined with vector search in a single request. Qdrant v1.9 introduced named vector hybrid search, and 2025 releases added GPU-accelerated index construction. The Qdrant Cloud managed service offers a permanent free tier with 1 GB of vector storage.
Chroma is an open-source, embedded vector database designed for rapid prototyping and small-to-medium applications. It was founded on April 1, 2022 by Jeff Huber (CEO) and Anton Troynikov (CTO) in San Francisco. The company raised a USD 2.3 million pre-seed in May 2022 and an USD 18 million seed round led by Quiet Capital with participation from Naval Ravikant and others. The first GitHub commit landed in October 2022, and the initial release shipped on October 22, 2022.
Originally written in Python, Chroma underwent a major rewrite in Rust during 2025, delivering approximately 4x faster writes and queries. Chroma provides a simple, NumPy-like API with built-in metadata filtering and full-text search. It is commonly used in local development environments and as an embedded store in LangChain and LlamaIndex workflows. Chroma Cloud launched in 2025 with a credit-based pricing model.
pgvector is an open-source PostgreSQL extension authored by Andrew Kane and first released as version 0.1.0 on April 20, 2021. It adds vector similarity search capabilities to any PostgreSQL database. pgvector supports HNSW (added in version 0.5.0, August 2023) and IVFFlat indexes, with distance metrics including cosine, L2, and inner product. The companion pgvectorscale extension (developed by Timescale, now Tiger Data) adds DiskANN-based indexing and Statistical Binary Quantization, enabling competitive performance on datasets of up to 50 million vectors.
pgvector supports standard vectors, half-precision vectors (halfvec, up to 4,000 indexed dimensions), sparse vectors (sparsevec), and binary vectors (bit type, up to 64,000 indexed dimensions). Because pgvector runs inside PostgreSQL, developers can store vectors alongside relational data and query both with standard SQL, eliminating the need for a separate vector database in many use cases. pgvector is included by default in managed PostgreSQL offerings from AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL, Supabase, Neon, and Tiger Data.
Faiss (Facebook AI Similarity Search) is an open-source library released by Meta AI Research in March 2017. Hervé Jégou initiated the project and wrote the first implementation, with Matthijs Douze implementing most of the CPU code. The original release set a new state of the art on billion-scale benchmarks (about 8.5x faster than the previous best). Faiss is a library rather than a standalone database; it provides reference implementations of IVF, PQ, IVF-PQ, OPQ, HNSW, and many composite indexes. It is the algorithmic foundation for several commercial systems, including parts of Pinecone and Milvus.
Vespa is an open-source big-data serving engine developed at Yahoo and open-sourced in September 2017 under the Apache 2.0 license. Yahoo had built Vespa from technology acquired with Overture and the AlltheWeb search engine in the mid-2000s. Jon Bratseth, the distinguished architect who led Vespa's development, became CEO when Yahoo spun Vespa out as an independent company in October 2023, and Vespa.ai raised a USD 31 million Series A from Blossom Capital in November 2023. Vespa supports HNSW, learned ranking models, lexical search, and tensor evaluation in the same engine, and is used by Spotify, OkCupid, and Wix, among others.
LanceDB is an open-source, serverless vector database founded in 2022 by Chang She and Lei Xu. Chang She is one of the original co-authors of the pandas library, and Lei Xu was a core HDFS contributor and led ML infrastructure at Cruise. LanceDB is built on the Lance columnar file format, an alternative to Parquet optimized for ML workloads. The company raised approximately USD 41 million across pre-seed, seed, and Series A rounds, and lists Midjourney among its production customers. LanceDB targets multimodal workloads where image, text, and metadata vectors live alongside source media, with object storage as the primary persistence tier.
Turbopuffer is a serverless vector and full-text search engine founded in 2023 by Simon Hørup Eskildsen and Justin Li, both former Shopify engineers. Turbopuffer stores its primary copy of data in object storage such as Amazon S3 and layers SSD and memory caches on top, achieving roughly 10x lower storage cost than RAM-resident vector databases. The company is backed by Lachy Groom, Thrive Capital, and others. As of 2025 and 2026, Turbopuffer reported powering search at Cursor, Notion, Linear, and other large AI-native applications, handling more than 2.5 trillion documents, 10 million writes per second, and over 10,000 queries per second in production.
MongoDB Atlas Vector Search reached general availability on December 4, 2023, after a public preview launched in June 2023. It allows MongoDB users to add vector indexes to existing collections and run $vectorSearch aggregation pipelines that combine vector similarity with MongoDB's existing query operators and aggregation framework. By late 2024, MongoDB reported Atlas Vector Search had become one of the most widely deployed vector solutions, in part because existing MongoDB users could adopt vector search without standing up a new database.
Elasticsearch added a dense_vector field type in 7.x and introduced HNSW-based ANN search in version 8.0 (February 2022). The 9.0 release in 2025 made int8_hnsw the default, and the engine now also supports BBQ (Better Binary Quantization) for high-dimensional embeddings. OpenSearch, the AWS-led fork of Elasticsearch, ships an analogous knn_vector type with HNSW available through NMSLIB, Faiss, and Lucene-native back ends, plus first-class support for Reciprocal Rank Fusion. Both engines are popular for teams that already operate an inverted-index search cluster and want to add vectors without introducing a new system.
Redis added vector similarity search in RediSearch 2.4, announced at RedisDays NY 2022. The module supports both flat (brute force) and HNSW indexes, with cosine, inner product, and Euclidean distances. Redis vector search is commonly used as a low-latency cache for embeddings or as an in-memory primary store for small-to-medium corpora, with Redis Enterprise Cloud and Azure Cache for Redis Enterprise offering managed deployments.
| System | Type | Notes |
|---|---|---|
| Annoy | Library | Random-projection forest, by Erik Bernhardsson at Spotify; widely used for read-heavy use cases |
| ScaNN | Library | Google's open-source ANN library; underlies Vertex AI Vector Search |
| Snowflake | Data warehouse | VECTOR data type and VECTOR_COSINE_SIMILARITY function (2024) |
| Databricks Vector Search | Lakehouse | Unity Catalog-aware vector index over Delta tables |
| Oracle Database 23ai | Relational DB | Native VECTOR data type and AI Vector Search (2024) |
| SQL Server | Relational DB | VECTOR_DISTANCE function and ANN indexes (2025) |
| Amazon S3 Vectors | Object store | S3-native vector index and query API (2025) |
Vector database pricing varies significantly depending on whether the service is fully managed, self-hosted, or serverless. The following table summarizes pricing as of early 2026.
| Database | Free Tier | Paid Starting Price | Pricing Model | Estimated Cost at Scale (1B vectors, 100 QPS) |
|---|---|---|---|---|
| Pinecone | Starter plan (limited) | $0.33/GB storage + $8.25/1M read units + $2.00/1M write units | Serverless consumption-based | ~$3,500/month (fully managed) |
| Weaviate | 14-day trial | From $25/month | AIU-based or serverless consumption | ~$2,200/month (managed); ~$800/month (self-hosted + ops overhead) |
| Milvus / Zilliz | Zilliz free plan (up to 5 GB) | Zilliz serverless from ~$89/month; dedicated from ~$114/month | CU-based (~$0.15/CU/hour) | Varies by configuration (self-hosted: infrastructure cost only) |
| Qdrant | 1 GB free forever (no credit card) | From $25/month; Hybrid Cloud from $99/month | Storage + compute based | Competitive; lower than Pinecone at equivalent scale |
| Chroma | Open-source (free); Chroma Cloud $5 free credits | Usage-based | Credit-based | Primarily for small/medium workloads |
| pgvector | Free (open-source extension) | Infrastructure cost only | N/A (runs on your PostgreSQL instance) | Infrastructure cost only; ~25% of the cost of Pinecone at equivalent recall |
| Turbopuffer | Limited free tier | Usage-based on object storage | Storage + query consumption | ~10x cheaper than RAM-resident managed services |
| MongoDB Atlas Vector Search | Atlas free tier | From dedicated cluster cost (~$57/month) | Atlas cluster pricing + dedicated search nodes | Varies with workload |
| Elasticsearch / OpenSearch | Self-hosted free | From ~$95/month managed | Cluster size + storage | Higher than dedicated vector DBs at equivalent recall |
Usage-based cloud pricing can become volatile at scale. A RAG application handling 150 million queries per month could incur USD 5,000 to USD 6,000 in monthly costs on a serverless platform, prompting some organizations to evaluate reserved capacity or self-hosted alternatives. Object-storage-backed engines such as Turbopuffer and Amazon S3 Vectors target this exact pain point by trading a small amount of latency for a much lower price per stored vector.
Vector database benchmarks measure queries per second (QPS), recall (the fraction of true nearest neighbors found), and latency (typically p50, p95, or p99). Results depend heavily on dataset size, vector dimensionality, hardware, and precision thresholds, so benchmarks should be compared only when similar configurations are used.
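The two quality metrics above can be computed in a few lines. The following is a minimal sketch in plain Python, assuming recall is measured against an exact top-k list and latency percentiles use the nearest-rank method; the function names and sample values are illustrative and do not come from any benchmark suite.

```python
import math

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k]) & truth
    return len(found) / len(truth)

def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical result IDs: the exact top-5 and what an ANN index returned.
exact = [7, 3, 9, 1, 4]
approx = [7, 9, 3, 8, 4]
print("recall@5:", recall_at_k(approx, exact, k=5))

# Hypothetical per-query latencies in milliseconds, with two slow outliers.
latencies_ms = [2.1, 2.4, 2.2, 9.8, 2.3, 2.6, 2.5, 2.2, 2.4, 30.0]
print("p50:", percentile(latencies_ms, 50), "p95:", percentile(latencies_ms, 95))
```

Tail percentiles react strongly to a handful of slow queries, which is one reason a p50 figure from one vendor and a p99 figure from another are not directly comparable.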
| Benchmark | Maintainer | Scope | Notes |
|---|---|---|---|
| ANN-Benchmarks | Erik Bernhardsson, Martin Aumüller | Library-level | Plots recall vs QPS on standard datasets like SIFT-1M, GIST-1M, glove-100; widely cited |
| Big-ANN-Benchmarks | NeurIPS billion-scale challenge | Billion-scale | Tracks T1 (in-memory), T2 (out-of-memory), T3 (custom hardware) |
| VectorDBBench | Zilliz, open source | Database-level | End-to-end benchmark across managed and self-hosted vector DBs |
| ann-benchmarks-private | Vendor-run | Database-level | Used in marketing claims; reproducibility varies |
| BEIR | Information retrieval benchmark | Retrieval quality | Measures end-to-end retrieval, including embeddings |
| MTEB | Massive Text Embedding Benchmark | Embedding quality | Measures embedding model performance on retrieval, classification, clustering |
| Database | Dataset | Vectors | Dimensions | QPS | Recall | Latency (percentile as noted) | Source |
|---|---|---|---|---|---|---|---|
| Qdrant | dbpedia-openai-1M | 1M | 1,536 | 626 | 99% | ~1 ms (p99) | Qdrant Benchmarks |
| pgvectorscale | Custom 50M | 50M | 768 | 471 | 99% | Low (unspecified) | Timescale Benchmarks |
| Qdrant | Custom 50M | 50M | 768 | 41 | 99% | Higher | Timescale Benchmarks |
| Milvus / Zilliz | 768-dim text embeddings | Varies | 768 | High | 99% | <10 ms (p50) | VectorDBBench |
| Pinecone | 768-dim text embeddings | Varies | 768 | High | 99% | ~7 ms (p99) | Pinecone Documentation |
| Weaviate | 768-dim text embeddings | Varies | 768 | Moderate | 95%+ | ~50 ms | VectorDBBench |
| DiskANN | SIFT1B | 1B | 128 | >5,000 | 95%+ (1-recall@1) | <3 ms (mean) | NeurIPS 2019 paper |
| SPANN | SIFT1B | 1B | 128 | High | 90% (recall@1, recall@10) | ~1 ms | NeurIPS 2021 paper |
Key observations from recent benchmarks:
A vector database is only as useful as the embeddings it stores. Most production systems pair a vector database with a managed or open-source embedding model. The most widely used models as of 2026 include:
| Model | Provider | Dimensions | Context | Notes |
|---|---|---|---|---|
| text-embedding-3-large | OpenAI | up to 3,072 (Matryoshka) | 8K tokens | Default OpenAI text embedding; supports dimension truncation |
| text-embedding-3-small | OpenAI | up to 1,536 (Matryoshka) | 8K tokens | Cheaper sibling of text-embedding-3-large |
| voyage-3-large | Voyage AI | 1,024 (Matryoshka, with int8/binary variants) | 32K tokens | Outperforms text-embedding-3-large by ~10% on Voyage's 100-dataset benchmark |
| embed-english-v3.0 | Cohere | 1,024 | 512 tokens | Strong for English RAG; binary and int8 variants available |
| embed-multilingual-v3.0 | Cohere | 1,024 | 512 tokens | 100+ languages |
| all-MiniLM-L6-v2 | Sentence Transformers | 384 | 256 tokens | Open weights; common baseline |
| BGE-M3 | BAAI | 1,024 | 8K tokens | Open weights; strong multilingual performance |
| nomic-embed-text-v1.5 | Nomic | 768 (Matryoshka) | 8K tokens | Open weights; popular for self-hosted RAG |
| Gemini text-embedding | Google | up to 3,072 | 2K tokens | Tightly integrated with Vertex AI Vector Search |
Matryoshka representation learning, used by OpenAI, Voyage AI, and Nomic, encodes the most semantically important information into the first dimensions of each vector. Truncating a 3,072-dimensional embedding to 256 dimensions preserves most of its quality, turning dimension count into a runtime knob that lets teams trade storage cost for retrieval quality.
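Truncation itself is a simple operation. The sketch below assumes the embedding was trained with a Matryoshka objective and that the shortened vector is re-normalized to unit length before being used with cosine or dot-product similarity, which is the usual practice; `truncate_and_normalize` is an illustrative helper, not part of any provider's SDK.

```python
import math

def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head  # nothing to rescale
    return [x / norm for x in head]

# Toy 4-dimensional "embedding" shortened to 2 dimensions.
full = [3.0, 4.0, 1.0, 2.0]
short = truncate_and_normalize(full, 2)
print(short, math.sqrt(sum(x * x for x in short)))
```

Truncating a vector that was not trained this way generally loses much more quality, because its information is not concentrated in the leading dimensions.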
One of the most significant applications of vector databases is retrieval-augmented generation (RAG), an architecture that combines external knowledge retrieval with large language models to produce more accurate and grounded responses.
A RAG pipeline operates through four stages:
RAG applications demand sub-100 ms retrieval latency to avoid adding perceptible delay to the LLM's response time. Vector databases are built for exactly this workload, with in-memory indexes, SIMD-accelerated distance calculations, and distributed architectures that scale horizontally. Traditional databases and search engines can perform keyword matching, but they cannot capture the semantic meaning of queries the way vector similarity search does.
Several advanced techniques have emerged to improve RAG quality:
Vector databases support a broad set of applications across industries.
Unlike lexical search, which relies on exact word or string matches, semantic search uses the meaning and context of a search query or question. Vector databases store and index the embeddings produced by natural language processing models, so results can be ranked by semantic closeness rather than keyword overlap. E-commerce, support knowledge bases, intranet search, and code search all rely on semantic retrieval at scale.
Vector databases facilitate the search and retrieval of unstructured data like images, audio, video, and JSON, which can be challenging to classify and store in traditional databases. Image-similarity search powers reverse-image lookup, content moderation, fashion recommendation, and visual product search.
By finding similar items or users based on nearest matches in a learned embedding space, vector databases power recommendation engines for online retailers and streaming media services. Two-tower models, where users and items each have their own embedding tower, are typically deployed by computing user vectors at request time and looking up nearest items in a vector index.
Vector similarity search can be used to find near-duplicate records for applications such as removing duplicate items from a catalog, deduplicating user-generated content, and matching customer records across systems (entity resolution).
Vector databases can identify anomalies in applications used for threat assessment, fraud detection, and IT operations by finding objects that are distant or dissimilar from expected results. Time-series embeddings, log embeddings, and network flow embeddings are all routinely indexed to detect outliers.
GitHub Copilot, Cursor, Sourcegraph Cody, and other AI coding tools embed code and documentation into vector indexes to support semantic code search, repository-aware completion, and chat over codebases. Cursor and Notion are both reported users of Turbopuffer.
One of the primary reasons for the increasing popularity of vector databases is their ability to extend large language models (LLMs) with long-term memory. Users start with a general-purpose model, such as OpenAI's GPT-4, Anthropic's Claude, Meta's LLaMA, or Google's Gemini, and store their own data in a vector database. When the model is prompted, the application queries the database for relevant documents and adds them to the context, customizing the final response and giving the AI a form of long-term memory.
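The store-then-retrieve flow can be sketched without any external service. The example below is a toy illustration under loud assumptions: `embed` is a bag-of-words stand-in for a real embedding model, `TinyVectorStore` is a hypothetical in-memory class that scans every item, and the resulting prompt would be passed to whichever LLM the application uses.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyVectorStore:
    """Hypothetical in-memory store; a real engine would use an ANN index."""
    def __init__(self):
        self.items = []  # list of (vector, original text)

    def add(self, text):
        self.items.append((embed(text), text))

    def query(self, text, k=2):
        q = embed(text)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]

def build_prompt(store, question, k=2):
    """Retrieve the k most similar documents and place them ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in store.query(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"

store = TinyVectorStore()
store.add("The refund window is 30 days from delivery")
store.add("Support is available on weekdays from 9 to 5")
store.add("Shipping to Canada takes about one week")
prompt = build_prompt(store, "how long is the refund window", k=1)
print(prompt)
```

The model never sees the whole corpus, only the retrieved snippets, which is why retrieval quality bounds answer quality.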
In addition, vector databases integrate with orchestration frameworks like LangChain, LlamaIndex, Haystack, and Semantic Kernel, which combine multiple LLMs, retrievers, and tools into multi-step pipelines. These frameworks abstract over individual vector databases and let teams swap engines without changing the application code.
Vector databases employ algorithms to index and retrieve vectors efficiently. Accuracy, latency, or memory usage may need to be prioritized depending on specific use cases. Common similarity and distance metrics used in vector indexes are Euclidean distance, cosine similarity, and dot products.
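The three metrics differ mainly in how they treat vector length. A minimal sketch in plain Python, using made-up two-dimensional vectors:

```python
import math

def dot(a, b):
    """Dot (inner) product: grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean (L2) distance: straight-line distance between the points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine similarity: the dot product of the vectors after length normalization."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [1.0, 0.0]
b = [0.0, 1.0]
c = [2.0, 0.0]  # same direction as a, twice the length

print(cosine_similarity(a, c), dot(a, c), euclidean(a, c))
print(cosine_similarity(a, b), dot(a, b), euclidean(a, b))
```

Cosine similarity treats `a` and `c` as identical because they point the same way, while the dot product and Euclidean distance both register the difference in length. When all vectors are normalized to unit length, the three metrics produce the same ranking, which is why many engines normalize at ingest time and use the dot product internally.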
Approximate Nearest Neighbor (ANN) search is a popular technique to balance precision and performance. ANN algorithms, such as HNSW, IVF, or PQ, focus on improving specific performance properties like memory reduction or fast and accurate search times. Composite indexes combine several components and are often used to achieve optimal performance for a given use case.
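To make the trade-off concrete, the sketch below implements a toy IVF-style index in plain Python. It is an illustration under simplifying assumptions, not how any particular engine is built: centroids are sampled from the data rather than trained with k-means, and all names (`build_ivf`, `ivf_search`, `nprobe`) are chosen for the example.

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector id to its nearest centroid (one inverted list per centroid)."""
    lists = [[] for _ in centroids]
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    return sorted(candidates, key=lambda vid: l2(query, vectors[vid]))[:k]

def exact_search(query, vectors, k):
    """Brute-force baseline: compare the query against every vector."""
    return sorted(range(len(vectors)), key=lambda vid: l2(query, vectors[vid]))[:k]

rng = random.Random(0)
vectors = [[rng.random(), rng.random()] for _ in range(200)]
centroids = rng.sample(vectors, 8)  # assumption: a real index would train these
lists = build_ivf(vectors, centroids)

query = [0.5, 0.5]
approx = ivf_search(query, vectors, centroids, lists, k=5, nprobe=2)
exact = exact_search(query, vectors, k=5)
print("overlap with exact top-5:", len(set(approx) & set(exact)))
```

Raising `nprobe` scans more partitions, which improves recall at the cost of more distance computations; probing every partition degenerates to the exact search.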
Single-stage filtering is essential for effective vector databases, as it enables users to limit search results based on vector metadata. It combines the accuracy of pre-filtering with the speed of post-filtering, merging vector and metadata indexes into a single index for optimal performance.
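The difference between filtering strategies shows up even in a brute-force scan. The following sketch is illustrative only: the item layout and function names are assumptions, and a production engine would apply the predicate during index traversal rather than over a full list.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def post_filter(query, items, k, predicate):
    """Take the k nearest first, then drop non-matching items (may return fewer than k)."""
    top = sorted(items, key=lambda it: l2(query, it["vector"]))[:k]
    return [it["id"] for it in top if predicate(it["meta"])]

def filtered_search(query, items, k, predicate):
    """Check the predicate while scanning, so every returned result matches."""
    matching = [it for it in items if predicate(it["meta"])]
    nearest = sorted(matching, key=lambda it: l2(query, it["vector"]))[:k]
    return [it["id"] for it in nearest]

items = [
    {"id": "a", "vector": [0.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.1, 0.0], "meta": {"lang": "de"}},
    {"id": "c", "vector": [0.2, 0.0], "meta": {"lang": "de"}},
    {"id": "d", "vector": [0.9, 0.0], "meta": {"lang": "en"}},
]

def is_english(meta):
    return meta["lang"] == "en"

query = [0.0, 0.0]
print(post_filter(query, items, k=2, predicate=is_english))
print(filtered_search(query, items, k=2, predicate=is_english))
```

Post-filtering returns a short result list whenever the nearest neighbors fail the predicate, while evaluating the predicate during the search keeps the result count at k. The engineering challenge in real systems is doing the latter without giving up the speed of the ANN index.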
REST and gRPC APIs allow vector databases to be accessed from any environment capable of making network calls, with native clients in Python, Java, Go, JavaScript, and Rust. Production deployments expose Prometheus or OpenTelemetry metrics, structured logs, and audit trails. Multi-tenancy support typically includes per-collection access control, namespace-level encryption, and customer-managed keys (BYOK) on managed services. SOC 2, ISO 27001, HIPAA, and GDPR-aligned operations are common requirements for enterprise deployments.
| Aspect | Relational DB | Document DB | Search Engine | Vector Database |
|---|---|---|---|---|
| Primary data unit | Row | Document | Document with inverted index | Vector with payload |
| Primary query | SELECT with WHERE | find with predicate | match query (BM25) | k-nearest-neighbor |
| Index type | B-tree, hash | B-tree, geo | Inverted index | HNSW, IVF, PQ, DiskANN |
| Distance / scoring | Equality, ranges | Predicate match | TF-IDF, BM25 | Cosine, L2, dot product |
| Schema | Strict | Flexible | Mapped fields | Vectors plus payload |
| Scale | Vertical first, sharded | Horizontal | Horizontal | Horizontal |
| Strength | Transactions, joins | Flexible documents | Full-text search | Semantic similarity |
In practice, few production AI systems rely on only one of these. The polystore pattern (relational data in PostgreSQL, vectors in pgvector or a dedicated engine, full-text in OpenSearch or in the same engine) has become standard. PostgreSQL with pgvector blurs the line further by hosting transactional, lexical, and vector workloads in a single database.
The dominant capacity drivers for a vector database are vector count, dimensionality, index type, replication factor, and quantization. As a worked example, 100 million 1,536-dimensional float32 vectors occupy roughly 614 GB of raw data; with HNSW (about 1.7x overhead), the index needs around 1 TB of RAM. With int8 scalar quantization that drops to roughly 150 GB, and binary quantization with full-precision re-ranking can fit in around 19 GB.
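The arithmetic behind the worked example can be reproduced directly. This sketch uses decimal gigabytes and takes the 1.7x HNSW overhead factor from the text above; it ignores replication and any per-vector metadata, so the figures are lower bounds rather than sizing guidance.

```python
def raw_bytes(num_vectors, dims, bits_per_dim):
    """Bytes needed to store the vector components alone."""
    return num_vectors * dims * bits_per_dim // 8

NUM_VECTORS = 100_000_000
DIMS = 1536
HNSW_OVERHEAD = 1.7  # graph links and bookkeeping, per the estimate above

float32_gb = raw_bytes(NUM_VECTORS, DIMS, 32) / 1e9
hnsw_gb = float32_gb * HNSW_OVERHEAD
int8_gb = raw_bytes(NUM_VECTORS, DIMS, 8) / 1e9
binary_gb = raw_bytes(NUM_VECTORS, DIMS, 1) / 1e9

print(f"float32 raw:  {float32_gb:.1f} GB")
print(f"float32 HNSW: {hnsw_gb:.1f} GB")
print(f"int8 raw:     {int8_gb:.1f} GB")
print(f"binary raw:   {binary_gb:.1f} GB")
```

Each step down in precision divides the raw footprint by a fixed factor (4x from float32 to int8, 32x from float32 to binary), which is why quantization is usually the first lever teams reach for when an index outgrows RAM.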
Vector indexes vary in how cleanly they handle updates. HNSW supports incremental insertions naturally but degrades with frequent deletions; engines compensate with periodic rebuilds. IVF requires retraining when the data distribution shifts substantially, although small inserts can use the existing centroids. DiskANN's Vamana graph supports incremental updates through the FreshDiskANN extension, while SPFresh provides in-place updates for SPANN-style indexes. Engines often expose a flush or build operation that the application can trigger after bulk inserts.
The vector database landscape has undergone significant changes in 2025 and early 2026, driven by the rapid adoption of generative AI and AI agents.
One of the most important trends is the convergence of vector search into traditional relational and document databases. PostgreSQL (via pgvector and pgvectorscale), MongoDB (via Atlas Vector Search), Oracle 23ai, Snowflake, Databricks, and Microsoft SQL Server have all integrated native vector support. PostgreSQL was the most-used database among developers in 2025 at 46.5% adoption, and in 2025 alone Snowflake and Databricks spent approximately USD 1.25 billion acquiring PostgreSQL-focused companies.
Production teams have converged on a polystore architecture where the relational database handles ACID transactions and structured queries while the vector database handles embedding search and semantic retrieval. By 2026, this pattern has become the default for teams shipping AI-augmented products.
The rise of AI agents has created new demands on vector database infrastructure. AI agents generate roughly 10x more queries than human users, forcing architectural rethinks around throughput, connection pooling, and semantic caching. Vector databases now serve as long-term memory systems for agents, storing conversation history, tool-use logs, and retrieved context across sessions. Patterns such as semantic caching (returning a cached LLM response when an incoming query matches a recent question above a similarity threshold) reduce both cost and latency for agent workloads.
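Semantic caching reduces to a nearest-neighbor lookup with a similarity cutoff. The sketch below is a minimal in-memory version under stated assumptions: query vectors are supplied by some embedding model outside the snippet, the `SemanticCache` class and its 0.9 threshold are illustrative, and a real deployment would back the lookup with a vector index and an expiry policy.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Hypothetical cache keyed by query embedding instead of exact query text."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (query_vector, cached_response)

    def put(self, query_vector, response):
        self.entries.append((query_vector, response))

    def get(self, query_vector):
        """Return the cached response of the most similar past query, or None on a miss."""
        best = max(self.entries, key=lambda e: cosine(query_vector, e[0]), default=None)
        if best is not None and cosine(query_vector, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "answer A")

hit = cache.get([0.99, 0.05])   # nearly the same direction as the stored query
miss = cache.get([0.0, 1.0])    # unrelated query
print(hit, miss)
```

The threshold is the main tuning knob: set too low, the cache returns answers to questions that only look similar; set too high, it rarely hits.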
A new generation of engines builds primarily on object storage rather than block-attached SSDs. Turbopuffer, launched in 2023, demonstrated that an object-storage-first architecture could serve sub-10 ms p50 queries while costing roughly 10x less than RAM-resident managed services. Amazon S3 Vectors, announced in 2025, takes the same approach inside the S3 service itself. This pattern is well suited to the long-tail, multi-tenant workloads typical of AI-native SaaS products.
One of the most notable developments in 2026 is the emergence of graph-enhanced vector retrieval, which combines vector similarity search with knowledge graph traversal. This approach captures both semantic similarity (via vectors) and structural relationships (via graph edges), producing higher-quality retrieval for complex multi-hop queries. Multiple vector database vendors have begun integrating graph capabilities or partnering with graph database providers, and Microsoft's open-source GraphRAG project popularized the pattern in 2024.
Comprehensive edge-native vector database solutions remain scarce despite a USD 168 billion edge computing market and 39 billion projected connected IoT devices by 2030. Lightweight engines such as LanceDB, Chroma in embedded mode, and DuckDB-VSS target this segment for healthcare, autonomous systems, manufacturing, and retail.
Chroma's 2025 Rust rewrite delivered approximately 4x performance improvements. Qdrant and Milvus have added more aggressive quantization options (binary quantization, scalar quantization) to reduce memory costs. pgvectorscale's DiskANN-based indexing has brought competitive performance to PostgreSQL at a fraction of the cost of managed vector databases. Across the board, vendors are investing in techniques to reduce the cost per query as AI workloads scale.
By mid-2025, with the vector database market concentrated and well-defined, pure-play vendors faced consolidation pressure. The Information reported that Pinecone had engaged bankers about a potential sale, with speculation about a valuation north of USD 2 billion. Possible suitors mentioned in press reports included Oracle, IBM, MongoDB, Snowflake, and Databricks. The September 2025 appointment of Ash Ashutosh as Pinecone's CEO and the founder's transition to chief scientist were widely read as positioning the company for either a strategic exit or a deeper enterprise push.
| Term | Definition |
|---|---|
| Embedding | A vector representation of a piece of data, produced by a neural network |
| ANN | Approximate Nearest Neighbor; algorithms that trade some recall for large speedups |
| kNN | k-Nearest Neighbor; the exact version of nearest-neighbor search |
| HNSW | Hierarchical Navigable Small World; the dominant graph-based ANN algorithm |
| IVF | Inverted File Index; partition-based ANN method |
| PQ | Product Quantization; subvector codebook compression method |
| LSH | Locality-Sensitive Hashing; classical hashing-based ANN approach |
| Recall | Fraction of true nearest neighbors found by an approximate search |
| QPS | Queries Per Second; throughput metric |
| BM25 | A classical lexical ranking function used in full-text search |
| RRF | Reciprocal Rank Fusion; method for combining ranked lists |
| Hybrid search | Combination of lexical (BM25) and dense vector retrieval |
| Pre-filter / Post-filter | Whether metadata constraints are applied before or after ANN |
| Quantization | Lossy compression of vectors to reduce memory and compute |
| Matryoshka | Embedding training scheme that nests information from coarse to fine dimensions |
| Polystore | An architecture combining multiple databases for different workloads |