pgvector is an open-source PostgreSQL extension that adds vector similarity search capabilities directly to PostgreSQL databases. Created by Andrew Kane and first released in April 2021, pgvector allows developers to store, index, and query high-dimensional vector embeddings alongside traditional relational data, effectively turning any PostgreSQL instance into a vector database. The project has become one of the most widely adopted approaches to vector search, largely because it eliminates the need for a separate vector database in many use cases.
pgvector was created by Andrew Kane, a prolific open-source developer known for building practical libraries across multiple programming languages. Kane released the first version (0.1.0) on April 20, 2021, recognizing early that machine learning systems would generate large volumes of embedding data that developers would need to store and query efficiently.
The extension evolved rapidly through a series of releases:
| Version | Release date | Notable changes |
|---|---|---|
| 0.1.0 | April 2021 | Initial release with vector type and distance operators |
| 0.2.0 | October 2021 | Added PostgreSQL 14 support |
| 0.3.0 | October 2022 | PostgreSQL 15 support; dropped support for versions before 11 |
| 0.4.0 | January 2023 | Increased maximum vector dimensions from 1,024 to 16,000; added vector aggregates |
| 0.5.0 | August 2023 | Introduced HNSW indexing for approximate nearest neighbor search |
| 0.6.0 | January 2024 | Parallelized HNSW index builds; significant performance improvements |
| 0.7.0 | April 2024 | Added halfvec (half-precision) and sparsevec types |
| 0.8.0 | February 2025 | Introduced iterative index scans for improved recall |
| 0.8.2 | 2026 | Bug fix release addressing CVE-2026-3172 (parallel HNSW build buffer overflow) |
The addition of HNSW indexing in version 0.5.0 was a turning point, making pgvector competitive with dedicated vector databases for many workloads. Prior to that release, users were limited to IVFFlat indexes, which offered weaker recall-speed tradeoffs.
pgvector adds a `vector` column type to PostgreSQL. Developers can create a column that stores vectors of a specified dimensionality and then run similarity queries using standard SQL. The extension provides three distance operators:
| Operator | Distance metric | Description |
|---|---|---|
| `<->` | L2 (Euclidean) distance | Measures straight-line distance between two vectors |
| `<#>` | Negative inner product | Useful when vectors are not normalized |
| `<=>` | Cosine distance | Measures the angle between vectors; popular for text embeddings |
A basic query looks like standard SQL. For example, to find the 10 most similar items to a given embedding, a developer would write something like `SELECT * FROM items ORDER BY embedding <=> '[0.1, 0.2, ...]' LIMIT 10;`. This simplicity is one of the main reasons pgvector is popular: developers do not need to learn a new query language or manage a separate service.
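The three operators correspond to standard distance formulas. A small Python sketch of what each one computes (note that `<#>` returns the *negative* inner product so that, like the other operators, smaller values mean more similar):

```python
import math

def l2_distance(a, b):
    # <-> : straight-line (Euclidean) distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neg_inner_product(a, b):
    # <#> : negated dot product, so ascending ORDER BY ranks most similar first
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # <=> : 1 - cosine similarity (angle-based, ignores vector magnitude)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

a, b = [1.0, 2.0], [2.0, 3.0]
print(l2_distance(a, b))        # sqrt(2), about 1.4142
print(neg_inner_product(a, b))  # -8.0
print(cosine_distance(a, b))    # close to 0: the vectors point the same way
```

For unit-length embeddings (many embedding models return normalized vectors), cosine distance and negative inner product produce the same ranking, and the inner product is slightly cheaper to compute.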
As of version 0.7.0, pgvector supports four vector storage types: `vector` (single-precision floats, 4 bytes per dimension), `halfvec` (half-precision floats, 2 bytes per dimension), `sparsevec` (sparse vectors that store only non-zero elements), and `bit` (binary vectors).
pgvector supports two indexing algorithms for approximate nearest neighbor search, plus exact (brute-force) search when no index is present.
IVFFlat (Inverted File with Flat compression) divides all vectors into clusters, each with a center point (centroid). When a query arrives, the algorithm first identifies the closest cluster centers and then searches within those clusters. The lists parameter controls how many clusters are created, and probes controls how many clusters are searched at query time.
IVFFlat is faster to build than HNSW and uses less memory, but it generally delivers lower recall at the same query speed. It requires training data to be present before the index can be created, since it needs to compute cluster centroids. For a dataset of 1 million 50-dimensional vectors, IVFFlat index creation takes roughly 128 seconds, compared to over 4,000 seconds for HNSW [1].
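The cluster-then-probe idea can be illustrated with a toy Python sketch. The fixed centroids below stand in for the k-means step that real IVFFlat runs at index build time; the function names are illustrative, not pgvector internals:

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivfflat(vectors, centroids):
    # Assign every vector to its nearest centroid ("list").
    # Real IVFFlat learns the centroids with k-means at index build time.
    lists = {i: [] for i in range(len(centroids))}
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        lists[nearest].append(v)
    return lists

def search_ivfflat(query, centroids, lists, probes, k):
    # Search only the `probes` closest lists instead of scanning everything.
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [v for i in order[:probes] for v in lists[i]]
    return sorted(candidates, key=lambda v: l2(query, v))[:k]

random.seed(0)
vectors = [[random.random(), random.random()] for _ in range(1000)]
centroids = [[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]]
lists = build_ivfflat(vectors, centroids)
# probes=1 scans roughly a quarter of the data; probes=4 degenerates to exact search
top = search_ivfflat([0.2, 0.2], centroids, lists, probes=1, k=5)
```

The recall risk is visible in the sketch: if the true nearest neighbor lives in a list that is not probed, it is simply never considered.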
The performance of IVFFlat indexes depends heavily on two parameters:
| Parameter | Setting | Recommendation |
|---|---|---|
| `lists` | Number of clusters | General guideline (per the pgvector README): use rows / 1000 for datasets up to 1M vectors and sqrt(rows) for larger datasets. More lists speed up queries at the cost of build time and potentially lower recall. |
| `probes` | Clusters searched per query | Set via `SET ivfflat.probes = N;`. Higher values improve recall but increase latency. Start with sqrt(lists) and adjust based on recall requirements. |
A key limitation of IVFFlat is that the index quality degrades as data drifts away from the original cluster centroids. If the data distribution changes significantly after index creation, the index should be rebuilt to recompute centroids. For datasets with frequent inserts, this makes IVFFlat less suitable than HNSW [1][8].
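These starting points can be wrapped in a small helper. The formulas follow the pgvector README (rows / 1000 up to 1M rows, sqrt(rows) beyond; sqrt(lists) for probes); the function names here are illustrative:

```python
import math

def recommended_lists(rows):
    # pgvector README guideline: rows / 1000 up to 1M rows, sqrt(rows) beyond
    return max(1, round(rows / 1000)) if rows <= 1_000_000 else round(math.sqrt(rows))

def starting_probes(lists):
    # sqrt(lists) is the suggested starting point; tune against measured recall
    return max(1, round(math.sqrt(lists)))

for rows in (100_000, 1_000_000, 10_000_000):
    lists = recommended_lists(rows)
    print(f"{rows:>10} rows -> lists={lists}, probes={starting_probes(lists)}")
```

Treat the output as a starting point only; the right values depend on measured recall and latency for the actual workload.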
Creating an IVFFlat index:
```sql
-- Create an IVFFlat index with 100 lists using cosine distance
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Set probes at query time
SET ivfflat.probes = 10;
```
HNSW (Hierarchical Navigable Small World) builds a multi-layered graph where each vector is a node. The graph has a hierarchical structure: the top layer contains broadly connected nodes, and each subsequent layer adds more nodes with finer-grained connections. During search, the algorithm starts at the top layer and navigates downward, progressively narrowing the search.
HNSW provides better recall-speed tradeoffs than IVFFlat, particularly for high-dimensional data. In benchmarks on 58,000 records, HNSW queries took approximately 1.5 milliseconds compared to 2.4 milliseconds for IVFFlat and 650 milliseconds for sequential scan [2]. The tradeoff is that HNSW indexes take longer to build and use substantially more memory. For the same 1 million vector dataset mentioned above, HNSW requires about 729 MB versus 257 MB for IVFFlat.
One practical advantage of HNSW is that the index can be created on an empty table, since there is no training step. This means the index stays up to date as new rows are inserted, while IVFFlat indexes may degrade as data drifts away from the original cluster centroids.
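HNSW's layer-by-layer navigation bottoms out in a greedy graph walk. A toy single-layer version in Python (real HNSW adds multiple layers and an `ef`-sized candidate list, which protects it from the local minima a pure greedy walk can hit):

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(graph, points, entry, query):
    # Repeatedly hop to the neighbor closest to the query; stop at a node
    # none of whose neighbors is an improvement (a local minimum).
    current = entry
    while True:
        best = min(graph[current], key=lambda n: l2(points[n], query))
        if l2(points[best], query) >= l2(points[current], query):
            return current
        current = best

# Tiny hand-built graph: five points on a line, each linked to its neighbors
points = {i: [float(i), 0.0] for i in range(5)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(graph, points, entry=0, query=[3.2, 0.0]))  # 3
```

The hierarchical layers in real HNSW exist to make the early hops long-range, so the walk reaches the right neighborhood in few steps even in a graph with millions of nodes.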
HNSW performance is controlled by three parameters:
| Parameter | Default | Range | Effect |
|---|---|---|---|
| `m` | 16 | 2-100 | Maximum connections per node per layer. Higher values improve recall and search speed but increase memory usage and build time. A value of 16 is suitable for most workloads; increase to 32 or 64 for very high recall requirements. |
| `ef_construction` | 64 | 4-1000 | Size of the candidate list during index construction. Higher values produce a better graph structure but slow down index building. Values of 100-200 are common for production workloads. |
| `hnsw.ef_search` | 40 | 1-1000 | Size of the candidate list during search. Set at query time via `SET hnsw.ef_search = N;`. Higher values improve recall at the cost of latency. Must be at least as large as the LIMIT clause in the query. |
Creating an HNSW index with custom parameters:
```sql
-- Create an HNSW index with custom parameters
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);

-- Increase search quality at query time
SET hnsw.ef_search = 100;
```
| Criteria | HNSW | IVFFlat |
|---|---|---|
| Recall at same latency | Higher | Lower |
| Index build time | Slower (32x slower in benchmarks) | Faster |
| Memory usage | Higher (729 MB for 1M vectors) | Lower (257 MB for 1M vectors) |
| Query throughput | Higher (40.5 QPS at 0.998 recall) | Lower (2.6 QPS at same recall) |
| Empty table support | Yes (no training needed) | No (requires data for centroid computation) |
| Handling inserts | Good (index stays accurate) | Degrades over time (centroid drift) |
| Best for | Production workloads, high-recall requirements | Quick prototyping, memory-constrained environments |
For the vast majority of production workloads, HNSW is the recommended index type. IVFFlat remains useful in specific scenarios: when memory is severely constrained, when rapid index rebuilds are needed (for example, in a development environment), or when the dataset is static and does not receive new inserts [1][2][8].
Starting with version 0.8.0, pgvector introduced iterative index scans. When enabled via the hnsw.iterative_scan or ivfflat.iterative_scan parameters, the database automatically scans progressively more of the index until enough results satisfying any filter conditions are found. This is particularly useful for filtered queries where the index might need to examine many candidates before finding enough that match the filter criteria.
```sql
-- Enable iterative scans for HNSW
SET hnsw.iterative_scan = relaxed_order;

-- Now filtered queries will automatically expand search scope
SELECT * FROM items
WHERE category = 'electronics'
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
```
Without iterative scans, a filtered query on an HNSW index might return fewer than the requested number of results if most of the nearest neighbors do not match the filter. Iterative scans solve this by progressively expanding the search scope until enough matching results are found.
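The expand-until-satisfied behavior can be mimicked in a few lines of Python. The sorted list below is a stand-in for the order in which an ANN index yields candidates, and the doubling window is a simplification of pgvector's actual scan strategy:

```python
def filtered_top_k(rows, query_dist, predicate, k):
    # Mimic an iterative index scan: take a growing window of nearest
    # candidates, keep those passing the filter, stop once k are found.
    ranked = sorted(rows, key=query_dist)  # stand-in for ANN index order
    window = k
    while True:
        hits = [r for r in ranked[:window] if predicate(r)]
        if len(hits) >= k or window >= len(ranked):
            return hits[:k]
        window *= 2  # expand the scan and retry

rows = [{"id": i, "x": float(i), "cat": "a" if i % 5 == 0 else "b"}
        for i in range(100)]
result = filtered_top_k(rows,
                        query_dist=lambda r: r["x"],
                        predicate=lambda r: r["cat"] == "a",
                        k=3)
print([r["id"] for r in result])  # [0, 5, 10]
```

A non-iterative scan corresponds to stopping after the first window: with a selective filter, it would return only one of the three requested rows.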
Several techniques can improve pgvector query performance in production:
Ensure indexes fit in memory. Both HNSW and IVFFlat indexes perform best when they are fully cached in RAM. Monitor the PostgreSQL `shared_buffers` and `effective_cache_size` settings; if the index is larger than available memory, queries will hit disk I/O and slow down significantly.
Use appropriate maintenance_work_mem. HNSW index builds need sufficient `maintenance_work_mem` (a general recommendation is at least 1 GB for index builds on large datasets), and `max_parallel_maintenance_workers` should be set to leverage parallelized HNSW construction (available since pgvector 0.6.0).
Warm up the index. After a PostgreSQL restart, the first queries against an HNSW index will be slow because the index is not yet cached. Use pg_prewarm to load the index into shared buffers:
```sql
-- Pre-warm the HNSW index into shared buffers
SELECT pg_prewarm('items_embedding_idx');
```
Use partial indexes for filtered queries. If queries frequently filter on a specific column, creating a partial index can improve performance:
```sql
-- Create an HNSW index only for active items
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WHERE status = 'active';
```
Consider halfvec for memory savings. If full 32-bit precision is not needed, casting vectors to halfvec halves memory usage. Many embedding models produce outputs where 16-bit precision is sufficient:
```sql
-- Create a halfvec column for memory efficiency
ALTER TABLE items ADD COLUMN embedding_half halfvec(1536);
UPDATE items SET embedding_half = embedding::halfvec(1536);
CREATE INDEX ON items USING hnsw (embedding_half halfvec_cosine_ops);
```
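The savings are straightforward to estimate: pgvector stores 4 bytes per dimension for `vector` and 2 bytes per dimension for `halfvec`, plus a small per-value header (modeled as 8 bytes in this back-of-the-envelope sketch):

```python
def vector_bytes(dim, bytes_per_dim):
    # 4 bytes/dim for vector, 2 for halfvec, plus a small per-value header
    return bytes_per_dim * dim + 8

dim = 1536  # a common text-embedding dimensionality
full = vector_bytes(dim, 4)  # vector:  6152 bytes per value
half = vector_bytes(dim, 2)  # halfvec: 3080 bytes per value
print(full, half, f"{1 - half / full:.0%} saved")
```

Across millions of rows, halving per-value storage also roughly halves the index size, which makes it easier to keep the index fully cached.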
pgvectorscale is a complementary extension developed by Timescale that adds advanced indexing and quantization capabilities to pgvector. Written in Rust using the PGRX framework, it extends pgvector without replacing it.
The primary feature of pgvectorscale is the StreamingDiskANN index type, inspired by Microsoft's DiskANN algorithm. Unlike HNSW, which requires the full index to reside in memory, StreamingDiskANN is optimized for disk-based storage. The graph structure is kept on SSD, trading some latency for dramatically lower memory requirements. This makes it particularly suitable for large datasets that would be prohibitively expensive to keep entirely in RAM.
```sql
-- Install pgvectorscale
CREATE EXTENSION vectorscale;

-- Create a StreamingDiskANN index
CREATE INDEX ON items USING diskann (embedding);
```
Since StreamingDiskANN uses the pgvector vector data type, existing pgvector users can add the index without migrating data or changing column types.
pgvectorscale includes Statistical Binary Quantization (SBQ), which compresses vectors more efficiently than standard binary quantization by using statistical properties of the data distribution. SBQ reduces storage requirements while maintaining high recall.
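Timescale's exact SBQ algorithm is more involved, but the underlying idea (derive each dimension's bit threshold from the data's statistics rather than thresholding at zero) can be sketched roughly as follows; the helper names and the mean-as-threshold choice are illustrative, not the actual implementation:

```python
def binary_quantize(vec, thresholds):
    # Map each dimension to one bit: 1 if above that dimension's threshold.
    # Plain binary quantization thresholds at 0; a statistical variant
    # derives per-dimension thresholds from the data (here: the mean).
    return tuple(1 if x > t else 0 for x, t in zip(vec, thresholds))

def hamming(a, b):
    # Compressed vectors are compared by counting differing bits
    return sum(x != y for x, y in zip(a, b))

data = [[0.9, 0.1, 0.4], [0.8, 0.2, 0.5], [0.1, 0.9, 0.6]]
dims = list(zip(*data))
means = [sum(d) / len(d) for d in dims]  # per-dimension mean as threshold
codes = [binary_quantize(v, means) for v in data]
query = binary_quantize([0.85, 0.15, 0.45], means)
nearest = min(range(len(codes)), key=lambda i: hamming(query, codes[i]))
print(nearest)  # 0
```

Each dimension collapses to a single bit, so a 768-dimensional float vector compresses to 96 bytes; the tradeoff is that candidates found via bit codes typically need re-ranking against the full vectors.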
On a benchmark dataset of 50 million Cohere embeddings (768 dimensions), pgvectorscale achieved notable results [3][9]:
| Metric | pgvector + pgvectorscale | Pinecone (s1 pod) | Qdrant |
|---|---|---|---|
| QPS at 99% recall | 471 | ~29 | 41 |
| p95 latency | 28x lower than Pinecone s1 | Baseline | N/A |
| Cost (self-hosted AWS EC2) | 75% less than Pinecone | Baseline | N/A |
These benchmarks demonstrate that PostgreSQL with pgvectorscale can be competitive with dedicated vector databases for datasets up to 50 million vectors, particularly when cost is a primary concern [9].
pgvector's performance depends on several factors, and understanding them is important for production deployments.
Memory: Both HNSW and IVFFlat indexes should ideally fit in memory for best performance. HNSW indexes in particular can be large. For very large datasets, the pgvectorscale extension (developed by Timescale) adds a streaming index build option and other optimizations.
Dimensionality: Higher-dimensional vectors increase both storage requirements and query latency. HNSW indexes support up to 2,000 dimensions for standard vectors, which covers most common embedding models.
Concurrency: PostgreSQL serves each connection with a dedicated backend process. For high-throughput workloads, connection pooling (e.g., PgBouncer) and read replicas help distribute load.
Scaling limits: With pgvectorscale, PostgreSQL with pgvector has been tested on datasets of up to 50 million high-dimensional vectors, delivering 471 QPS (queries per second) at 99% recall on 50 million vectors [3]. This is competitive with dedicated vector databases for many workloads, though billion-scale use cases may still benefit from purpose-built systems.
pgvector's main value proposition is simplicity: developers who already use PostgreSQL do not need a separate database. However, dedicated vector databases offer features and scale that pgvector may not match in every scenario.
| Feature | pgvector | Pinecone | Qdrant | Milvus |
|---|---|---|---|---|
| Deployment model | Extension on existing PostgreSQL | Fully managed cloud service | Open-source, self-hosted or cloud | Open-source, self-hosted or Zilliz Cloud |
| Max tested scale | ~50M vectors (with pgvectorscale) | Billions of vectors | Billions of vectors | Billions of vectors |
| Index types | IVFFlat, HNSW, StreamingDiskANN (pgvectorscale) | Proprietary | HNSW with quantization | HNSW, IVF, DiskANN, SCANN, and others |
| Hybrid search | Via PostgreSQL full-text search (tsvector + GIN) | Metadata filtering + sparse vectors | Sparse + dense vectors, payload filtering | Native BM25 + vector hybrid search |
| GPU acceleration | No | N/A (managed) | GPU-accelerated HNSW indexing | NVIDIA CAGRA GPU indexing |
| SQL support | Full PostgreSQL SQL | No (REST/gRPC API) | No (REST/gRPC API) | No (SDK/API) |
| ACID transactions | Yes (PostgreSQL) | No | No | No |
| Operational overhead | None (part of PostgreSQL) | None (fully managed) | Moderate (self-hosted) or none (cloud) | Moderate to high (distributed system) |
The general guidance in the community is: start with pgvector if you already use PostgreSQL and your dataset is under 10 million vectors. Consider a dedicated vector database when you need billion-scale storage, advanced features like GPU-accelerated indexing, or when vector search performance is the primary concern.
For teams considering migrating from pgvector to a dedicated vector database (or vice versa), the key factors to evaluate are expected dataset scale, required features (such as GPU-accelerated indexing or native hybrid search), and the operational overhead each option introduces.
pgvector's integration into major cloud PostgreSQL services has been a significant factor in its adoption. Since the extension is standard PostgreSQL, enabling it typically requires only running `CREATE EXTENSION vector;` on any supported instance.
| Cloud provider | Service | pgvector support |
|---|---|---|
| AWS | Amazon RDS for PostgreSQL | Available from PostgreSQL 15.2+ in all AWS regions |
| AWS | Amazon Aurora PostgreSQL | Supported with global database for multi-region replication |
| Google Cloud | Cloud SQL for PostgreSQL | Supported |
| Microsoft Azure | Azure Database for PostgreSQL | Supported with dedicated documentation |
| Supabase | Supabase Postgres | Built-in; extensively documented with AI toolkit integration |
| Neon | Neon Serverless Postgres | Built-in with optimization guides for vector search |
| Render | Render PostgreSQL | Supported |
Supabase has been particularly active in promoting pgvector, building an AI toolkit around it that includes embedding generation, retrieval-augmented generation (RAG) templates, and performance optimization guides. Neon has similarly invested in pgvector support with detailed documentation on index tuning.
pgvector is commonly used for semantic search over text embeddings, retrieval-augmented generation (RAG) pipelines, recommendation systems, and other similarity-search features in AI applications.
Several projects have grown around pgvector to extend its capabilities:
| Tool | Developer | Description |
|---|---|---|
| pgvectorscale | Timescale | Adds StreamingDiskANN index and Statistical Binary Quantization for improved performance on large datasets |
| pgai | Timescale | Provides convenience functions for calling embedding APIs directly from SQL |
| VectorChord | VectorChord | Alternative high-performance vector index for PostgreSQL |
| LangChain | LangChain | PGVector vector store integration for RAG pipelines |
| LlamaIndex | LlamaIndex | Native pgvector vector store connector |
| Supabase Vecs | Supabase | Python client for managing pgvector collections in Supabase |
pgvector is at version 0.8.2 and supports PostgreSQL 13 through 17. The project has over 13,000 GitHub stars and is one of the most popular PostgreSQL extensions. Its adoption continues to grow as more organizations build AI-powered applications on existing PostgreSQL infrastructure rather than introducing new database systems.
The project's development pace remains steady, with Andrew Kane continuing as the primary maintainer. The introduction of halfvec and sparsevec types in recent versions shows that pgvector is tracking developments in the embedding model space, where dimensionality reduction and sparse retrieval techniques are becoming more common. For teams already on PostgreSQL, pgvector remains the most practical entry point into vector search.