See also: AI terms
A vector database is a special kind of computer storage that helps find things that are similar, such as pictures that look like a cat or songs that sound happy. It helps computers compare what things mean, even when those things take different forms, such as words, pictures, or sounds.
Vector databases give large language models a form of long-term memory, so they can draw on stored knowledge to give better answers. They also make it easy to find items that are similar or unusual, which is useful for shopping websites and for spotting suspicious activity.
Even when they hold enormous numbers of items, these databases can find what you are looking for almost instantly.
A vector database is a type of database specifically designed for storing and querying high-dimensional vector data, which is often used in artificial intelligence applications (AI apps). Complex data, including unstructured forms like documents, images, videos, and plain text, is growing rapidly. Traditional databases designed for structured data struggle to store and analyze complex data effectively, often requiring extensive keyword and metadata classification. Vector databases address this issue by transforming complex data into vector embeddings, which describe data objects in numerous dimensions. These databases are gaining popularity due to their ability to extend large language models (LLMs) with long-term memory and provide efficient querying for AI-driven applications.
As of 2026, over 68% of enterprise AI applications use vector databases to manage embeddings generated by large language models, computer vision systems, and recommendation engines. The global vector database market has surpassed $4.2 billion, reflecting the central role these systems play in modern AI infrastructure.
In a relational database, data is organized in rows and columns, while in a document database, it is organized in documents and collections. In contrast, a vector database stores arrays of numbers clustered based on similarity. These databases can be queried with ultra-low latency, making them ideal for AI-driven applications.
A vector database is a type of database that indexes and stores vector embeddings for efficient retrieval and similarity search. In addition to traditional CRUD (create, read, update, and delete) operations and metadata filtering, vector databases allow any stored vector to be compared against the others or against the vector of a search query. This capability allows vector databases to excel at similarity search (also called vector search), providing more comprehensive results than traditional search technology can.
The performance of a vector database depends heavily on its indexing strategy. Indexing determines how vectors are organized in memory or on disk so that nearest-neighbor queries can be answered quickly without scanning every vector in the collection. Three indexing families dominate the vector database landscape: HNSW, IVF, and Product Quantization. Many production systems combine two or more of these techniques to balance recall, latency, memory usage, and throughput.
HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest neighbor (ANN) algorithm that builds a multi-layered navigable graph over the vector space. Each layer contains a subset of the vectors: the top layers hold fewer nodes with longer-range connections, while the bottom layer holds all nodes with short-range connections. During a search, the algorithm starts at the top layer and greedily traverses toward the query vector, descending through the layers and refining the candidate set at each level.
HNSW offers several strengths that make it the default choice for most vector database deployments: it achieves high recall at low latency, it requires no separate training phase (unlike IVF), and it supports incremental inserts without rebuilding the index.
Three parameters govern its behavior:
- The ef_search parameter controls how many candidates the algorithm considers during a query. Increasing ef_search improves recall at the cost of higher latency.
- The M parameter controls the number of connections per node.
- The ef_construction parameter controls the quality of the graph at build time. Higher values of M and ef_construction produce a more accurate but larger index.
The main drawback of HNSW is memory consumption. Because the entire graph must reside in memory for fast traversal, HNSW typically requires 1.5 to 2 times the raw vector data size in RAM. For billion-scale datasets, this can become prohibitively expensive.
HNSW is the primary index type in Qdrant, Weaviate, and pgvector, and is available as an option in Milvus and Pinecone.
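To make these parameters concrete, below is a minimal sketch using the hnswlib library, one of several standalone HNSW implementations; the dataset is random and the parameter values are illustrative rather than recommendations.

```python
import numpy as np
import hnswlib

dim = 128
num_elements = 10_000
data = np.random.random((num_elements, dim)).astype(np.float32)

# Build the graph: M sets connections per node, ef_construction sets build quality
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, M=16, ef_construction=200)
index.add_items(data, np.arange(num_elements))

# ef (ef_search) trades recall for latency at query time
index.set_ef(64)

query = np.random.random((1, dim)).astype(np.float32)
labels, distances = index.knn_query(query, k=10)
print(labels[0], distances[0])
```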
IVF (Inverted File Index) is a partition-based indexing method. It works by first running k-means clustering on the dataset to divide the vector space into a configurable number of clusters (called nlist). Each cluster is represented by a centroid, and every vector in the dataset is assigned to its nearest centroid. At query time, the algorithm computes the distance between the query vector and all centroids, selects the nprobe closest clusters, and then performs an exhaustive search only within those clusters.
IVF is well suited for scenarios where HNSW is impractical:
- very large collections where HNSW's in-memory graph would exceed available RAM;
- deployments that pair the index with compression (such as IVF-PQ) to hold billions of vectors on modest hardware;
- batch-oriented workloads where the index is built once and then queried many times.
The main trade-off is that IVF requires a training phase (the k-means clustering step), which can be time-consuming for large datasets. Additionally, IVF generally achieves lower recall than HNSW at equivalent latency, because queries that fall near cluster boundaries may miss relevant vectors in neighboring clusters.
IVF is available in FAISS, Milvus, and Oracle Database 23ai, among other systems.
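As a hedged illustration of the nlist/nprobe trade-off described above, here is a minimal FAISS sketch on random data; the cluster count and probe value are illustrative.

```python
import numpy as np
import faiss

d = 128
xb = np.random.random((100_000, d)).astype(np.float32)  # database vectors
xq = np.random.random((5, d)).astype(np.float32)        # query vectors

nlist = 1024                      # number of k-means clusters
quantizer = faiss.IndexFlatL2(d)  # assigns vectors to their nearest centroid
index = faiss.IndexIVFFlat(quantizer, d, nlist)

index.train(xb)  # the k-means training phase described above
index.add(xb)

index.nprobe = 16  # search the 16 closest clusters; higher improves recall, costs latency
D, I = index.search(xq, 10)
```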
Product Quantization (PQ) is a lossy compression technique that reduces the memory footprint of vector indexes by up to roughly 98%. PQ works by dividing each high-dimensional vector into a set of smaller subvectors and then quantizing each subvector independently using a separate codebook.
For example, a 128-dimensional vector stored in float32 format occupies 512 bytes. PQ might split this vector into 8 subvectors of 16 dimensions each. For each subvector, a codebook of 256 centroids (representable with a single byte) is trained via k-means. Each subvector is then replaced by the index of its closest centroid. The result is that the original 512-byte vector is compressed to just 8 bytes, a reduction of roughly 98%.
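The compression arithmetic can be checked in a few lines (a toy calculation, not a PQ implementation):

```python
import numpy as np

dim, m = 128, 8                                  # vector dimensions, number of subvectors
raw_bytes = dim * np.dtype(np.float32).itemsize  # 128 * 4 = 512 bytes
pq_bytes = m * 1                                 # one 1-byte centroid index per subvector
print(raw_bytes, pq_bytes)                       # 512 8
print(f"{1 - pq_bytes / raw_bytes:.1%} reduction")  # 98.4% reduction
```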
During search, PQ uses precomputed lookup tables to estimate distances between the query vector and the quantized database vectors without decompressing them. This makes both storage and computation significantly cheaper.
The trade-off is reduced accuracy. Because PQ is lossy, the compressed representations do not perfectly preserve the original distance relationships. In practice, PQ is rarely used alone; it is typically combined with IVF (as IVF-PQ) or HNSW to form a two-stage retrieval pipeline where the index quickly identifies a candidate set and then a re-ranking step refines the results using full-precision vectors.
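A minimal FAISS sketch of the IVF-PQ combination, with parameters mirroring the worked example above (all values illustrative); FAISS also provides IndexRefineFlat for the full-precision re-ranking stage.

```python
import numpy as np
import faiss

d = 128
xb = np.random.random((100_000, d)).astype(np.float32)
xq = np.random.random((5, d)).astype(np.float32)

nlist = 1024  # IVF clusters
m = 8         # subvectors per vector
nbits = 8     # bits per code: 2^8 = 256 centroids per codebook

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)  # trains the IVF clustering and the PQ codebooks together
index.add(xb)    # each vector is stored as m one-byte codes

index.nprobe = 16
D, I = index.search(xq, 10)  # distances estimated from PQ lookup tables
```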
While scalar quantization typically achieves 4x compression and binary quantization up to 32x, PQ can provide compression levels of up to 64x, making it especially valuable for cost-sensitive deployments at very large scale. FAISS, Milvus, Pinecone, and OpenSearch all use product quantization internally.
Beyond HNSW, IVF, and PQ, several other indexing strategies are used in practice:
| Index Type | Description | Strengths | Weaknesses |
|---|---|---|---|
| Flat (brute-force) | Compares the query vector against every vector in the dataset | 100% recall; no training needed | Extremely slow for large datasets |
| LSH (Locality-Sensitive Hashing) | Hashes similar vectors into the same bucket | Fast; low memory overhead | Lower recall than graph-based methods |
| ScaNN (Google) | Combines tree-based partitioning with anisotropic quantization | Very fast; strong recall on benchmarks | Primarily available through Google Cloud |
| DiskANN | Graph-based index designed to run on SSDs rather than RAM | Handles billion-scale data on commodity hardware | Higher latency than in-memory HNSW |
| SPANN | Combines IVF-style partitioning with graph-based in-partition search | Balances memory use and recall | More complex to tune |
Vector databases rely on distance or similarity metrics to determine how close two vectors are in the embedding space. Choosing the right metric matters: it should match the metric that was used when the embedding model was trained.
| Metric | Formula Summary | Best For | Notes |
|---|---|---|---|
| Cosine Similarity | Measures the angle between two vectors, ignoring magnitude | Text similarity, document comparison, NLP tasks | Most common default; insensitive to vector length |
| Euclidean Distance (L2) | Measures straight-line distance between two points | Clustering, anomaly detection, spatial data | Sensitive to magnitude; penalizes large absolute differences |
| Dot Product (Inner Product) | Sum of element-wise products of two vectors | Recommendation systems, ranking tasks | Equivalent to cosine similarity when vectors are normalized |
| Manhattan Distance (L1) | Sum of absolute differences across dimensions | Sparse data, grid-based environments | Less common in vector databases |
| Hamming Distance | Counts differing bits between binary vectors | Binary embeddings, hash-based search | Used with binary quantization for speed |
When vectors are normalized to unit length, cosine similarity and dot product produce identical rankings. Many embedding models (such as OpenAI's text-embedding-3 family) output normalized vectors, so either metric can be used interchangeably in those cases.
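The equivalence is easy to verify with a few lines of numpy (the vectors are chosen arbitrarily):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.5, 1.5])

# Normalize both vectors to unit length
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot_normalized = np.dot(a_n, b_n)
print(cosine, dot_normalized)  # identical up to floating-point error

# On unit vectors, squared Euclidean distance is a monotonic function of cosine:
# ||a_n - b_n||^2 = 2 - 2 * cos(a, b), so L2 ranking matches cosine ranking too
print(np.sum((a_n - b_n) ** 2), 2 - 2 * cosine)
```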
Several vector databases have emerged to cater to the growing demand for AI applications. The following table compares the most widely adopted systems as of early 2026.
| Feature | Pinecone | Weaviate | Milvus | Qdrant | Chroma | pgvector |
|---|---|---|---|---|---|---|
| Type | Fully managed (serverless) | Open-source + managed cloud | Open-source + managed (Zilliz) | Open-source + managed cloud | Open-source (embedded) | PostgreSQL extension |
| License | Proprietary | BSD-3-Clause | Apache 2.0 | Apache 2.0 | Apache 2.0 | PostgreSQL License |
| Written In | Proprietary (undisclosed) | Go | Go / C++ | Rust | Rust (rewritten 2025) | C |
| Index Types | Proprietary (auto-tuned) | HNSW | HNSW, IVF-Flat, IVF-PQ, DiskANN, GPU indexes | HNSW (with quantization) | HNSW | HNSW, IVFFlat |
| Max Dimensions | 20,000 | Configurable | 32,768 | 65,536 | Configurable | 16,000 (with halfvec up to 4,000 indexed) |
| Distance Metrics | Cosine, Euclidean, Dot Product | Cosine, L2, Dot, Hamming, Manhattan | Cosine, L2, IP, Jaccard, Hamming | Cosine, Euclidean, Dot Product, Manhattan | Cosine, L2, IP | Cosine, L2, Inner Product |
| Hybrid Search | Sparse vectors (2024+) | Native BM25 + vector (BlockMax WAND) | Sparse-BM25 (Milvus 2.5+) | Named vector hybrid (v1.9+) | Metadata + full-text | SQL + vector via standard PostgreSQL |
| Metadata Filtering | Advanced (bitmap indexes) | GraphQL-based filtering | Boolean expressions | Rich JSON payload filtering | Metadata filtering | Standard SQL WHERE clauses |
| Multi-Tenancy | Namespaces | Native tenant isolation | Partitions / collections | Payload-based or collection-based | Collections | Schema-based (standard PostgreSQL) |
| Cloud Regions | AWS, GCP, Azure | AWS, GCP, Azure | Zilliz: AWS, GCP, Azure | AWS, GCP, Azure | Chroma Cloud (limited) | Any PostgreSQL host |
| GitHub Stars (2025) | N/A (closed source) | ~8,000 | ~35,000 | ~9,000 | ~6,000 | ~13,000 |
| Best For | Zero-ops managed vector search | Hybrid/semantic search with GraphQL | Billion-scale deployments | Cost-efficient filtered search | Rapid prototyping | Adding vectors to existing PostgreSQL |
Pinecone is a fully managed, serverless vector database that requires no infrastructure management. It uses proprietary, auto-tuned indexing and offers features such as single-stage metadata filtering with bitmap indexes, namespace-based multi-tenancy, and automatic scaling to zero. Pinecone is commonly chosen by teams building commercial AI products who want reliability and multi-region performance without managing clusters. In 2024, Pinecone eliminated pod-based pricing in favor of a serverless consumption model. In 2025, Pinecone introduced sparse vector support for hybrid search, the ability to fetch vectors by metadata filter, and namespace-level metadata schema configuration.
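A minimal sketch with the Pinecone Python SDK, combining a vector query with a metadata filter; the API key, index name, and fields are placeholder assumptions, and the index is assumed to already exist.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder key
index = pc.Index("products")           # assumes an existing serverless index

index.upsert(vectors=[
    {"id": "a", "values": [0.1, 0.2, 0.3, 0.4], "metadata": {"category": "shoes"}},
    {"id": "b", "values": [0.4, 0.3, 0.2, 0.1], "metadata": {"category": "hats"}},
])

# Single-stage metadata filtering combined with similarity search
results = index.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    top_k=5,
    filter={"category": {"$eq": "shoes"}},
    include_metadata=True,
)
print(results)
```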
Weaviate is an open-source vector database written in Go. It is designed for applications that combine vector search with complex data relationships. Weaviate provides a GraphQL API, native hybrid search using BlockMax WAND for keyword scoring combined with vector similarity, and built-in support for vectorization modules (including OpenAI, Cohere, and Hugging Face integrations). Weaviate Cloud offers managed hosting across AWS, GCP, and Azure.
Milvus is an open-source, cloud-native vector database under the Apache 2.0 license, written primarily in Go and C++. It is the most mature open-source option for billion-scale vector search, with a distributed architecture that separates compute and storage. Milvus supports the widest range of index types, including HNSW, IVF-Flat, IVF-PQ, DiskANN, and GPU-accelerated indexes. Milvus 2.5 introduced native sparse-BM25 hybrid search. Zilliz Cloud provides a managed version of Milvus with serverless and dedicated deployment options.
Qdrant is an open-source vector database written in Rust, known for its compact memory footprint and strong filtering performance. Qdrant uses HNSW indexing with built-in scalar, product, and binary quantization options. Its payload-based filtering system supports rich JSON queries and can be combined with vector search in a single request. Qdrant v1.9 introduced named vector hybrid search. The Qdrant Cloud managed service offers a permanent free tier with 1 GB of vector storage.
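A minimal sketch of Qdrant's combined vector-plus-payload query using the official Python client; the collection name, payload fields, and tiny vectors are illustrative.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams,
)

client = QdrantClient(":memory:")  # in-process mode, handy for experiments

client.create_collection(
    collection_name="products",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="products",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"category": "shoes"}),
        PointStruct(id=2, vector=[0.4, 0.3, 0.2, 0.1], payload={"category": "hats"}),
    ],
)

# Payload filter and vector search in a single request
hits = client.search(
    collection_name="products",
    query_vector=[0.1, 0.2, 0.3, 0.4],
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="shoes"))]
    ),
    limit=5,
)
print(hits)
```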
Chroma is an open-source, embedded vector database designed for rapid prototyping and small-to-medium applications. Originally written in Python, Chroma underwent a major rewrite in Rust during 2025, delivering approximately 4x faster writes and queries. Chroma provides a simple, NumPy-like API with built-in metadata filtering and full-text search. It is commonly used in local development environments and as an embedded store in LangChain and LlamaIndex workflows. Chroma Cloud launched in 2025 with a credit-based pricing model.
pgvector is an open-source PostgreSQL extension that adds vector similarity search capabilities to any PostgreSQL database. It supports HNSW and IVFFlat indexes, with distance metrics including cosine, L2, and inner product. The companion pgvectorscale extension (developed by Timescale) adds DiskANN-based indexing and Statistical Binary Quantization, enabling competitive performance on datasets of up to 50 million vectors. pgvector supports standard vectors, half-precision vectors (halfvec, up to 4,000 indexed dimensions), sparse vectors (sparsevec), and binary vectors (bit type, up to 64,000 indexed dimensions). Because pgvector runs inside PostgreSQL, developers can store vectors alongside relational data and query both with standard SQL, eliminating the need for a separate vector database in many use cases.
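A minimal sketch of pgvector usage from Python via psycopg; the connection string, table, and three-dimensional vectors are illustrative, and the server is assumed to have the extension available.

```python
import psycopg

# Assumes a reachable PostgreSQL server with the pgvector extension available
conn = psycopg.connect("dbname=example")

with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "  id bigserial PRIMARY KEY,"
        "  embedding vector(3))"
    )
    # HNSW index over cosine distance
    cur.execute(
        "CREATE INDEX IF NOT EXISTS items_embedding_idx "
        "ON items USING hnsw (embedding vector_cosine_ops)"
    )
    cur.execute(
        "INSERT INTO items (embedding) VALUES (%s), (%s)",
        ("[1,2,3]", "[4,5,6]"),
    )
    # <=> is pgvector's cosine-distance operator; ordering by it uses the HNSW index
    cur.execute(
        "SELECT id, embedding <=> %s AS distance "
        "FROM items ORDER BY embedding <=> %s LIMIT 5",
        ("[2,3,4]", "[2,3,4]"),
    )
    print(cur.fetchall())
conn.commit()
```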
Vector database pricing varies significantly depending on whether the service is fully managed, self-hosted, or serverless. The following table summarizes pricing as of early 2026.
| Database | Free Tier | Paid Starting Price | Pricing Model | Estimated Cost at Scale (1B vectors, 100 QPS) |
|---|---|---|---|---|
| Pinecone | Starter plan (limited) | $0.33/GB storage + $8.25/1M read units + $2.00/1M write units | Serverless consumption-based | ~$3,500/month (fully managed) |
| Weaviate | 14-day trial | From $25/month | AIU-based or serverless consumption | ~$2,200/month (managed); ~$800/month (self-hosted + ops overhead) |
| Milvus / Zilliz | Zilliz free plan (up to 5 GB) | Zilliz serverless from ~$89/month; dedicated from ~$114/month | CU-based (~$0.15/CU/hour) | Varies by configuration (self-hosted: infrastructure cost only) |
| Qdrant | 1 GB free forever (no credit card) | From $25/month; Hybrid Cloud from $99/month | Storage + compute based | Competitive; lower than Pinecone at equivalent scale |
| Chroma | Open-source (free); Chroma Cloud $5 free credits | Usage-based | Credit-based | Primarily for small/medium workloads |
| pgvector | Free (open-source extension) | Infrastructure cost only | N/A (runs on your PostgreSQL instance) | Infrastructure cost only; ~25% the cost of Pinecone at equivalent recall |
Usage-based cloud pricing can become volatile at scale. A RAG application handling 150 million queries per month could incur $5,000 to $6,000 in monthly costs on a serverless platform, prompting some organizations to evaluate reserved capacity or self-hosted alternatives.
Vector database benchmarks measure queries per second (QPS), recall (the fraction of true nearest neighbors found), and latency (typically p50, p95, or p99). Results depend heavily on dataset size, vector dimensionality, hardware, and precision thresholds, so benchmarks should be compared only when similar configurations are used.
| Database | Dataset | Vectors | Dimensions | QPS | Recall | Latency | Source |
|---|---|---|---|---|---|---|---|
| Qdrant | dbpedia-openai-1M | 1M | 1,536 | 626 | 99% | ~1 ms (p99) | Qdrant Benchmarks |
| pgvectorscale | Custom 50M | 50M | 768 | 471 | 99% | Low (unspecified) | Timescale Benchmarks |
| Qdrant | Custom 50M | 50M | 768 | 41 | 99% | Higher | Timescale Benchmarks |
| Milvus / Zilliz | 768-dim text embeddings | Varies | 768 | High | 99% | <10 ms (p50) | VectorDBBench |
| Pinecone | 768-dim text embeddings | Varies | 768 | High | 99% | ~7 ms (p99) | Pinecone Documentation |
| Weaviate | 768-dim text embeddings | Varies | 768 | Moderate | 95%+ | ~50 ms | VectorDBBench |
Key observations from recent benchmarks:
- On the 1M-vector dbpedia-openai dataset, Qdrant sustains hundreds of queries per second at 99% recall with single-digit-millisecond tail latency.
- At 50M vectors, Timescale's benchmark shows pgvectorscale delivering roughly 10x the QPS of Qdrant at the same 99% recall, illustrating how DiskANN-style indexes scale.
- Vendor-published figures (Pinecone, Zilliz) are measured under different configurations, so cross-system comparisons should be treated with caution.
Popular benchmarking tools include ANN-Benchmark (for evaluating index algorithms in isolation) and VectorDBBench (an open-source tool for end-to-end database comparison).
One of the most significant applications of vector databases is Retrieval-Augmented Generation (RAG), an architecture that combines external knowledge retrieval with large language models to produce more accurate and grounded responses.
A RAG pipeline operates through four stages:
1. Ingestion: source documents are split into chunks and converted into embeddings by an embedding model.
2. Indexing: the embeddings are stored and indexed in a vector database.
3. Retrieval: at query time, the user's question is embedded and the database returns the most similar chunks.
4. Generation: the retrieved chunks are inserted into the LLM's prompt as context, grounding the generated answer.
RAG applications demand sub-100 ms retrieval latency to avoid adding perceptible delay to the LLM's response time. Vector databases are built for exactly this workload, with in-memory indexes, SIMD-accelerated distance calculations, and distributed architectures that scale horizontally. Traditional databases and search engines can perform keyword matching, but they cannot capture the semantic meaning of queries the way vector similarity search does.
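The four stages can be compressed into a framework-free sketch. The embed function below is a toy stand-in for a real embedding model, and the LLM call is left as a placeholder; everything here is illustrative.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: hash character trigrams
    into a fixed-size vector and normalize. Illustrative only."""
    v = np.zeros(256)
    for i in range(len(text) - 2):
        v[hash(text[i:i + 3]) % 256] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

# Stages 1-2: ingest documents and index their embeddings
documents = [
    "Vector databases store and index embeddings.",
    "HNSW is a graph-based approximate nearest neighbor index.",
    "Product quantization compresses vectors into short codes.",
]
index = [(doc, embed(doc)) for doc in documents]

# Stage 3: retrieve the chunks most similar to the query
query = "How are embeddings indexed?"
q = embed(query)
top_k = sorted(index, key=lambda pair: -float(np.dot(q, pair[1])))[:2]

# Stage 4: augment the prompt; call_llm stands in for any LLM client
context = "\n".join(doc for doc, _ in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# answer = call_llm(prompt)
print(prompt)
```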
Several advanced techniques have emerged to improve RAG quality:
- Hybrid search, which combines keyword scoring (such as BM25) with vector similarity so that exact terms and semantic meaning both contribute to ranking.
- Reranking, in which a more expensive model re-scores the top retrieved candidates before they are passed to the LLM.
- Query rewriting, where the LLM reformulates or expands the user's question before retrieval.
- Chunking strategies that split documents along semantic boundaries rather than fixed character counts.
One of the primary reasons for the increasing popularity of vector databases is their ability to extend large language models (LLMs) with long-term memory. Starting from a general-purpose model, such as OpenAI's GPT-4, Meta's LLaMA, or Google's LaMDA, users can store their own data in a vector database. When the model is prompted, relevant documents are retrieved from the database and added to the context, customizing the final response and providing the AI with long-term memory.
In addition, vector databases can integrate with frameworks like LangChain, which chain LLM calls, retrievers, and other components together for more advanced applications.
Unlike lexical search, which relies on exact word or string matches, semantic search uses the meaning and context of a search query or question. Vector databases use Natural Language Processing models to store and index vector embeddings, allowing for more accurate and relevant search results.
Vector databases facilitate the search and retrieval of unstructured data like images, audio, video, and JSON, which can be challenging to classify and store in traditional databases.
By finding similar items based on nearest matches, vector databases are suitable for powering ranking and recommendation engines for online retailers and streaming media services.
Vector similarity search can be used to find near-duplicate records for applications such as removing duplicate items from a catalog.
Vector databases can identify anomalies in applications used for threat assessment, fraud detection, and IT operations by finding objects that are distant or dissimilar from expected results.
Vector databases employ algorithms to index and retrieve vectors efficiently. Accuracy, latency, or memory usage may need to be prioritized depending on the specific use case. Common similarity and distance metrics used in vector indexes are Euclidean distance, cosine similarity, and the dot product.
Approximate Nearest Neighbor (ANN) search is a popular technique to balance precision and performance. ANN algorithms, such as HNSW, IVF, or PQ, focus on improving specific performance properties like memory reduction or fast and accurate search times. Composite indexes combine several components and are often used to achieve optimal performance for a given use case.
Building an effective index without a vector database can be challenging and may require a team of experienced engineers with expertise in indexing and retrieval algorithms.
Single-stage filtering is essential for effective vector databases, as it enables users to limit search results based on vector metadata. It combines the accuracy of pre-filtering with the speed of post-filtering, merging vector and metadata indexes into a single index for optimal performance.
Scaling is critical for vector databases to handle large volumes of data. Data sharding allows the database to divide vectors into shards and replicas across multiple machines, providing scalable and cost-effective performance. When searching, the database queries each shard and combines the results to determine the best match. This can be achieved using Kubernetes, with each shard assigned its own pod containing CPU and RAM resources.
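The scatter-gather step can be sketched in a few lines; the shard objects are hypothetical stand-ins for per-shard index processes, each assumed to expose a search(vector, k) method returning (distance, id) pairs.

```python
import heapq

def search_sharded(shards, query_vector, k):
    """Query every shard, then merge per-shard results into a global top-k.
    Each shard is assumed to expose search(vector, k) -> list of (distance, id)."""
    candidates = []
    for shard in shards:
        # Each shard returns its own local top-k candidates
        candidates.extend(shard.search(query_vector, k))
    # Smallest distances win the global top-k
    return heapq.nsmallest(k, candidates, key=lambda pair: pair[0])
```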
Replication is necessary for vector databases to handle multiple requests simultaneously or in rapid succession. By replicating the set of pods, more requests can be processed in parallel. Replicas also improve availability, as they can be spread across different availability zones provided by cloud providers, ensuring high availability even when machines fail.
Hybrid storage configurations store a compressed vector index in memory (RAM) and the original, full-resolution vector index on disk. This approach reduces infrastructure costs while maintaining fast and accurate search results. Hybrid storage increases storage capacity without negatively impacting database performance.
APIs enable developers to use and manage vector databases from other applications, offloading the burden of building and maintaining vector search capabilities. REST APIs allow vector databases to be accessed from any environment capable of making HTTPS calls, while direct access can be provided through clients using languages like Python, Java, and Go.
The vector database landscape has undergone significant changes in 2025 and early 2026, driven by the rapid adoption of generative AI and AI agents.
One of the most important trends is the convergence of vector search capabilities into traditional relational databases. PostgreSQL, MongoDB (via Atlas Vector Search), and Oracle (via Oracle Database 23ai) have all integrated native vector support. This shift means that for many applications, a separate vector database is no longer necessary. PostgreSQL in particular has gained momentum: it was the most-used database among developers in 2025 at 46.5% adoption, and it powers parts of OpenAI's infrastructure. In 2025 alone, Snowflake and Databricks spent approximately $1.25 billion acquiring PostgreSQL-focused companies, signaling the industry's confidence in this approach.
Serious production teams have converged on a polystore architecture where vector search and relational storage coexist with clear boundaries. Rather than choosing between a relational database and a vector database, organizations run both in tandem: the relational database handles ACID transactions and structured queries, while the vector database handles embedding search and semantic retrieval. By 2026, this pattern has become the default for teams shipping AI-augmented products.
The rise of AI agents has created new demands on vector database infrastructure. AI agents generate roughly 10x more queries than human users, forcing architectural rethinks around throughput, connection pooling, and semantic caching. Vector databases now serve as long-term memory systems for agents, storing conversation history, tool-use logs, and retrieved context across sessions.
One of the most notable developments in 2026 is the emergence of graph-enhanced vector retrieval, which combines vector similarity search with knowledge graph traversal. This approach captures both semantic similarity (via vectors) and structural relationships (via graph edges), producing higher-quality retrieval for complex queries. Multiple vector database vendors have begun integrating graph capabilities or partnering with graph database providers.
Despite the $168 billion edge computing market and 39 billion projected connected IoT devices by 2030, comprehensive edge-native vector database solutions remain scarce. This gap represents a significant opportunity in healthcare, autonomous systems, manufacturing, and retail, where low-latency inference must happen on-device rather than in the cloud.
Chroma's 2025 Rust rewrite delivered 4x performance improvements. Qdrant and Milvus have added more aggressive quantization options (binary quantization, scalar quantization) to reduce memory costs. pgvectorscale's DiskANN-based indexing has brought competitive performance to PostgreSQL at a fraction of the cost of managed vector databases. Across the board, vendors are investing in techniques to reduce the cost per query as AI workloads scale.