pgvector is an open-source PostgreSQL extension that adds vector similarity search capabilities directly to PostgreSQL databases. Created by Andrew Kane and first released in April 2021, pgvector allows developers to store, index, and query high-dimensional vector embeddings alongside traditional relational data, effectively turning any PostgreSQL instance into a vector database. The project has become one of the most widely adopted approaches to vector search, largely because it eliminates the need for a separate vector database in many use cases.
pgvector was created by Andrew Kane, a prolific open-source developer known for building practical libraries across multiple programming languages. Kane released the first version (0.1.0) on April 20, 2021, recognizing early that machine learning systems would generate large volumes of embedding data that developers would need to store and query efficiently.
The extension evolved rapidly through a series of releases:
| Version | Release date | Notable changes |
|---|---|---|
| 0.1.0 | April 2021 | Initial release with vector type and distance operators |
| 0.2.0 | October 2021 | Added PostgreSQL 14 support |
| 0.3.0 | October 2022 | PostgreSQL 15 support; dropped support for versions before 11 |
| 0.4.0 | January 2023 | Increased maximum vector dimensions from 1,024 to 16,000; added vector aggregates |
| 0.5.0 | August 2023 | Introduced HNSW indexing for approximate nearest neighbor search |
| 0.6.0 | January 2024 | Parallelized HNSW index builds; significant performance improvements |
| 0.7.0 | April 2024 | Added halfvec (half-precision) and sparsevec types |
| 0.8.0 | February 2025 | Introduced iterative index scans for improved recall |
| 0.8.2 | 2026 | Bug fix release addressing CVE-2026-3172 (parallel HNSW build buffer overflow) |
The addition of HNSW indexing in version 0.5.0 was a turning point, making pgvector competitive with dedicated vector databases for many workloads. Prior to that release, users were limited to IVFFlat indexes, which offered weaker recall-speed tradeoffs.
pgvector adds a `vector` column type to PostgreSQL. Developers can create a column that stores vectors of a specified dimensionality and then run similarity queries using standard SQL. The extension provides three distance operators:
| Operator | Distance metric | Description |
|---|---|---|
| `<->` | L2 (Euclidean) distance | Measures straight-line distance between two vectors |
| `<#>` | Negative inner product | Useful when vectors are not normalized |
| `<=>` | Cosine distance | Measures the angle between vectors; popular for text embeddings |
A basic query looks like standard SQL. For example, to find the 10 most similar items to a given embedding, a developer would write something like `SELECT * FROM items ORDER BY embedding <=> '[0.1, 0.2, ...]' LIMIT 10;`. This simplicity is one of the main reasons pgvector is popular: developers do not need to learn a new query language or manage a separate service.
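The three operators correspond to standard distance formulas. A small Python sketch of what each one computes (note that `<#>` returns the *negative* inner product so that, like the other operators, smaller values mean more similar):

```python
import math

def l2_distance(a, b):
    # <-> : straight-line (Euclidean) distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neg_inner_product(a, b):
    # <#> : negated dot product, so ascending ORDER BY ranks most similar first
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # <=> : 1 - cosine similarity (angle-based, ignores vector magnitude)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

a, b = [1.0, 2.0], [2.0, 3.0]
print(l2_distance(a, b))        # sqrt(2), about 1.4142
print(neg_inner_product(a, b))  # -8.0
print(cosine_distance(a, b))    # close to 0: the vectors point the same way
```

For unit-length embeddings (many embedding models return normalized vectors), cosine distance and negative inner product produce the same ranking, and the inner product is slightly cheaper to compute.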
As of version 0.7.0, pgvector supports four vector storage types: `vector` (single-precision floats, 4 bytes per dimension), `halfvec` (half-precision floats, 2 bytes per dimension), `sparsevec` (sparse vectors that store only non-zero elements), and `bit` (binary vectors).
pgvector supports two indexing algorithms for approximate nearest neighbor search, plus exact (brute-force) search when no index is present.
IVFFlat (Inverted File with Flat compression) divides all vectors into clusters, each with a center point (centroid). When a query arrives, the algorithm first identifies the closest cluster centers and then searches within those clusters. The lists parameter controls how many clusters are created, and probes controls how many clusters are searched at query time.
IVFFlat is faster to build than HNSW and uses less memory, but it generally delivers lower recall at the same query speed. It requires training data to be present before the index can be created, since it needs to compute cluster centroids. For a dataset of 1 million 50-dimensional vectors, IVFFlat index creation takes roughly 128 seconds, compared to over 4,000 seconds for HNSW [1].
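The cluster-then-probe idea can be illustrated with a toy Python sketch. The fixed centroids below stand in for the k-means step that real IVFFlat runs at index build time; the function names are illustrative, not pgvector internals:

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivfflat(vectors, centroids):
    # Assign every vector to its nearest centroid ("list").
    # Real IVFFlat learns the centroids with k-means at index build time.
    lists = {i: [] for i in range(len(centroids))}
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        lists[nearest].append(v)
    return lists

def search_ivfflat(query, centroids, lists, probes, k):
    # Search only the `probes` closest lists instead of scanning everything.
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [v for i in order[:probes] for v in lists[i]]
    return sorted(candidates, key=lambda v: l2(query, v))[:k]

random.seed(0)
vectors = [[random.random(), random.random()] for _ in range(1000)]
centroids = [[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]]
lists = build_ivfflat(vectors, centroids)
# probes=1 scans roughly a quarter of the data; probes=4 degenerates to exact search
top = search_ivfflat([0.2, 0.2], centroids, lists, probes=1, k=5)
```

The recall risk is visible in the sketch: if the true nearest neighbor lives in a list that is not probed, it is simply never considered.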
The performance of IVFFlat indexes depends heavily on two parameters:
| Parameter | Setting | Recommendation |
|---|---|---|
| `lists` | Number of clusters | General guideline (per the pgvector README): use rows / 1000 for datasets up to 1M vectors and sqrt(rows) for larger datasets. More lists speed up queries at the cost of build time and potentially lower recall. |
| `probes` | Clusters searched per query | Set via `SET ivfflat.probes = N;`. Higher values improve recall but increase latency. Start with sqrt(lists) and adjust based on recall requirements. |
A key limitation of IVFFlat is that the index quality degrades as data drifts away from the original cluster centroids. If the data distribution changes significantly after index creation, the index should be rebuilt to recompute centroids. For datasets with frequent inserts, this makes IVFFlat less suitable than HNSW [1][8].
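These starting points can be wrapped in a small helper. The formulas follow the pgvector README (rows / 1000 up to 1M rows, sqrt(rows) beyond; sqrt(lists) for probes); the function names here are illustrative:

```python
import math

def recommended_lists(rows):
    # pgvector README guideline: rows / 1000 up to 1M rows, sqrt(rows) beyond
    return max(1, round(rows / 1000)) if rows <= 1_000_000 else round(math.sqrt(rows))

def starting_probes(lists):
    # sqrt(lists) is the suggested starting point; tune against measured recall
    return max(1, round(math.sqrt(lists)))

for rows in (100_000, 1_000_000, 10_000_000):
    lists = recommended_lists(rows)
    print(f"{rows:>10} rows -> lists={lists}, probes={starting_probes(lists)}")
```

Treat the output as a starting point only; the right values depend on measured recall and latency for the actual workload.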
Creating an IVFFlat index:
```sql
-- Create an IVFFlat index with 100 lists using cosine distance
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Set probes at query time
SET ivfflat.probes = 10;
```
HNSW (Hierarchical Navigable Small World) builds a multi-layered graph where each vector is a node. The graph has a hierarchical structure: the top layer contains broadly connected nodes, and each subsequent layer adds more nodes with finer-grained connections. During search, the algorithm starts at the top layer and navigates downward, progressively narrowing the search.
HNSW provides better recall-speed tradeoffs than IVFFlat, particularly for high-dimensional data. In benchmarks on 58,000 records, HNSW queries took approximately 1.5 milliseconds compared to 2.4 milliseconds for IVFFlat and 650 milliseconds for sequential scan [2]. The tradeoff is that HNSW indexes take longer to build and use substantially more memory. For the same 1 million vector dataset mentioned above, HNSW requires about 729 MB versus 257 MB for IVFFlat.
One practical advantage of HNSW is that the index can be created on an empty table, since there is no training step. This means the index stays up to date as new rows are inserted, while IVFFlat indexes may degrade as data drifts away from the original cluster centroids.
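HNSW's layer-by-layer navigation bottoms out in a greedy graph walk. A toy single-layer version in Python (real HNSW adds multiple layers and an `ef`-sized candidate list, which protects it from the local minima a pure greedy walk can hit):

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(graph, points, entry, query):
    # Repeatedly hop to the neighbor closest to the query; stop at a node
    # none of whose neighbors is an improvement (a local minimum).
    current = entry
    while True:
        best = min(graph[current], key=lambda n: l2(points[n], query))
        if l2(points[best], query) >= l2(points[current], query):
            return current
        current = best

# Tiny hand-built graph: five points on a line, each linked to its neighbors
points = {i: [float(i), 0.0] for i in range(5)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(graph, points, entry=0, query=[3.2, 0.0]))  # 3
```

The hierarchical layers in real HNSW exist to make the early hops long-range, so the walk reaches the right neighborhood in few steps even in a graph with millions of nodes.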
HNSW performance is controlled by three parameters:
| Parameter | Default | Range | Effect |
|---|---|---|---|
| `m` | 16 | 2-100 | Maximum connections per node per layer. Higher values improve recall and search speed but increase memory usage and build time. A value of 16 is suitable for most workloads; increase to 32 or 64 for very high recall requirements. |
| `ef_construction` | 64 | 4-1000 | Size of the candidate list during index construction. Higher values produce a better graph structure but slow down index building. Values of 100-200 are common for production workloads. |
| `hnsw.ef_search` | 40 | 1-1000 | Size of the candidate list during search. Set at query time via `SET hnsw.ef_search = N;`. Higher values improve recall at the cost of latency. Must be at least as large as the LIMIT clause in the query. |
Creating an HNSW index with custom parameters:
```sql
-- Create an HNSW index with custom parameters
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);

-- Increase search quality at query time
SET hnsw.ef_search = 100;
```
| Criteria | HNSW | IVFFlat |
|---|---|---|
| Recall at same latency | Higher | Lower |
| Index build time | Slower (32x slower in benchmarks) | Faster |
| Memory usage | Higher (729 MB for 1M vectors) | Lower (257 MB for 1M vectors) |
| Query throughput | Higher (40.5 QPS at 0.998 recall) | Lower (2.6 QPS at same recall) |
| Empty table support | Yes (no training needed) | No (requires data for centroid computation) |
| Handling inserts | Good (index stays accurate) | Degrades over time (centroid drift) |
| Best for | Production workloads, high-recall requirements | Quick prototyping, memory-constrained environments |
For the vast majority of production workloads, HNSW is the recommended index type. IVFFlat remains useful in specific scenarios: when memory is severely constrained, when rapid index rebuilds are needed (for example, in a development environment), or when the dataset is static and does not receive new inserts [1][2][8].
Starting with version 0.8.0, pgvector introduced iterative index scans. When enabled via the hnsw.iterative_scan or ivfflat.iterative_scan parameters, the database automatically scans progressively more of the index until enough results satisfying any filter conditions are found. This is particularly useful for filtered queries where the index might need to examine many candidates before finding enough that match the filter criteria.
```sql
-- Enable iterative scans for HNSW
SET hnsw.iterative_scan = relaxed_order;

-- Now filtered queries will automatically expand search scope
SELECT * FROM items
WHERE category = 'electronics'
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
```
Without iterative scans, a filtered query on an HNSW index might return fewer than the requested number of results if most of the nearest neighbors do not match the filter. Iterative scans solve this by progressively expanding the search scope until enough matching results are found.
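The expand-until-satisfied behavior can be mimicked in a few lines of Python. The sorted list below is a stand-in for the order in which an ANN index yields candidates, and the doubling window is a simplification of pgvector's actual scan strategy:

```python
def filtered_top_k(rows, query_dist, predicate, k):
    # Mimic an iterative index scan: take a growing window of nearest
    # candidates, keep those passing the filter, stop once k are found.
    ranked = sorted(rows, key=query_dist)  # stand-in for ANN index order
    window = k
    while True:
        hits = [r for r in ranked[:window] if predicate(r)]
        if len(hits) >= k or window >= len(ranked):
            return hits[:k]
        window *= 2  # expand the scan and retry

rows = [{"id": i, "x": float(i), "cat": "a" if i % 5 == 0 else "b"}
        for i in range(100)]
result = filtered_top_k(rows,
                        query_dist=lambda r: r["x"],
                        predicate=lambda r: r["cat"] == "a",
                        k=3)
print([r["id"] for r in result])  # [0, 5, 10]
```

A non-iterative scan corresponds to stopping after the first window: with a selective filter, it would return only one of the three requested rows.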
Several techniques can improve pgvector query performance in production:
Ensure indexes fit in memory. Both HNSW and IVFFlat indexes perform best when they are fully cached in RAM. Monitor the PostgreSQL `shared_buffers` and `effective_cache_size` settings; if the index is larger than available memory, queries will hit disk I/O and slow down significantly.
Use appropriate maintenance_work_mem. HNSW index builds need sufficient `maintenance_work_mem` (a general recommendation is at least 1 GB for index builds on large datasets), and `max_parallel_maintenance_workers` should be set to leverage parallelized HNSW construction (available since pgvector 0.6.0).
Warm up the index. After a PostgreSQL restart, the first queries against an HNSW index will be slow because the index is not yet cached. Use pg_prewarm to load the index into shared buffers:
```sql
-- Pre-warm the HNSW index into shared buffers
SELECT pg_prewarm('items_embedding_idx');
```
Use partial indexes for filtered queries. If queries frequently filter on a specific column, creating a partial index can improve performance:
```sql
-- Create an HNSW index only for active items
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WHERE status = 'active';
```
Consider halfvec for memory savings. If full 32-bit precision is not needed, casting vectors to halfvec halves memory usage. Many embedding models produce outputs where 16-bit precision is sufficient:
```sql
-- Create a halfvec column for memory efficiency
ALTER TABLE items ADD COLUMN embedding_half halfvec(1536);
UPDATE items SET embedding_half = embedding::halfvec(1536);
CREATE INDEX ON items USING hnsw (embedding_half halfvec_cosine_ops);
```
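The savings are straightforward to estimate: pgvector stores 4 bytes per dimension for `vector` and 2 bytes per dimension for `halfvec`, plus a small per-value header (modeled as 8 bytes in this back-of-the-envelope sketch):

```python
def vector_bytes(dim, bytes_per_dim):
    # 4 bytes/dim for vector, 2 for halfvec, plus a small per-value header
    return bytes_per_dim * dim + 8

dim = 1536  # a common text-embedding dimensionality
full = vector_bytes(dim, 4)  # vector:  6152 bytes per value
half = vector_bytes(dim, 2)  # halfvec: 3080 bytes per value
print(full, half, f"{1 - half / full:.0%} saved")
```

Across millions of rows, halving per-value storage also roughly halves the index size, which makes it easier to keep the index fully cached.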
pgvectorscale is a complementary extension developed by Timescale that adds advanced indexing and quantization capabilities to pgvector. Written in Rust using the PGRX framework, it extends pgvector without replacing it.
The primary feature of pgvectorscale is the StreamingDiskANN index type, inspired by Microsoft's DiskANN algorithm. Unlike HNSW, which requires the full index to reside in memory, StreamingDiskANN is optimized for disk-based storage. The graph structure is kept on SSD, trading some latency for dramatically lower memory requirements. This makes it particularly suitable for large datasets that would be prohibitively expensive to keep entirely in RAM.
```sql
-- Install pgvectorscale
CREATE EXTENSION vectorscale;

-- Create a StreamingDiskANN index
CREATE INDEX ON items USING diskann (embedding);
```
Since StreamingDiskANN uses the pgvector vector data type, existing pgvector users can add the index without migrating data or changing column types.
pgvectorscale includes Statistical Binary Quantization (SBQ), which compresses vectors more efficiently than standard binary quantization by using statistical properties of the data distribution. SBQ reduces storage requirements while maintaining high recall.
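Timescale's exact SBQ algorithm is more involved, but the underlying idea (derive each dimension's bit threshold from the data's statistics rather than thresholding at zero) can be sketched roughly as follows; the helper names and the mean-as-threshold choice are illustrative, not the actual implementation:

```python
def binary_quantize(vec, thresholds):
    # Map each dimension to one bit: 1 if above that dimension's threshold.
    # Plain binary quantization thresholds at 0; a statistical variant
    # derives per-dimension thresholds from the data (here: the mean).
    return tuple(1 if x > t else 0 for x, t in zip(vec, thresholds))

def hamming(a, b):
    # Compressed vectors are compared by counting differing bits
    return sum(x != y for x, y in zip(a, b))

data = [[0.9, 0.1, 0.4], [0.8, 0.2, 0.5], [0.1, 0.9, 0.6]]
dims = list(zip(*data))
means = [sum(d) / len(d) for d in dims]  # per-dimension mean as threshold
codes = [binary_quantize(v, means) for v in data]
query = binary_quantize([0.85, 0.15, 0.45], means)
nearest = min(range(len(codes)), key=lambda i: hamming(query, codes[i]))
print(nearest)  # 0
```

Each dimension collapses to a single bit, so a 768-dimensional float vector compresses to 96 bytes; the tradeoff is that candidates found via bit codes typically need re-ranking against the full vectors.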
On a benchmark dataset of 50 million Cohere embeddings (768 dimensions), pgvectorscale achieved notable results [3][9]:
| Metric | pgvector + pgvectorscale | Pinecone (s1 pod) | Qdrant |
|---|---|---|---|
| QPS at 99% recall | 471 | ~29 | 41 |
| p95 latency | 28x lower than Pinecone s1 | Baseline | N/A |
| Cost (self-hosted AWS EC2) | 75% less than Pinecone | Baseline | N/A |
These benchmarks demonstrate that PostgreSQL with pgvectorscale can be competitive with dedicated vector databases for datasets up to 50 million vectors, particularly when cost is a primary concern [9].
pgvector's performance depends on several factors, and understanding them is important for production deployments.
Memory: Both HNSW and IVFFlat indexes should ideally fit in memory for best performance. HNSW indexes in particular can be large. For very large datasets, the pgvectorscale extension (developed by Timescale) adds a streaming index build option and other optimizations.
Dimensionality: Higher-dimensional vectors increase both storage requirements and query latency. HNSW indexes support up to 2,000 dimensions for standard vectors, which covers most common embedding models.
Concurrency: PostgreSQL serves each connection with a dedicated backend process. For high-throughput workloads, connection pooling (e.g., PgBouncer) and read replicas help distribute load.
Scaling limits: With pgvectorscale, PostgreSQL with pgvector has been tested on datasets of up to 50 million high-dimensional vectors, delivering 471 QPS (queries per second) at 99% recall on 50 million vectors [3]. This is competitive with dedicated vector databases for many workloads, though billion-scale use cases may still benefit from purpose-built systems.
pgvector's main value proposition is simplicity: developers who already use PostgreSQL do not need a separate database. However, dedicated vector databases offer features and scale that pgvector may not match in every scenario.
| Feature | pgvector | Pinecone | Qdrant | Milvus |
|---|---|---|---|---|
| Deployment model | Extension on existing PostgreSQL | Fully managed cloud service | Open-source, self-hosted or cloud | Open-source, self-hosted or Zilliz Cloud |
| Max tested scale | ~50M vectors (with pgvectorscale) | Billions of vectors | Billions of vectors | Billions of vectors |
| Index types | IVFFlat, HNSW, StreamingDiskANN (pgvectorscale) | Proprietary | HNSW with quantization | HNSW, IVF, DiskANN, SCANN, and others |
| Hybrid search | Via PostgreSQL full-text search (tsvector + GIN) | Metadata filtering + sparse vectors | Sparse + dense vectors, payload filtering | Native BM25 + vector hybrid search |
| GPU acceleration | No | N/A (managed) | GPU-accelerated HNSW indexing | NVIDIA CAGRA GPU indexing |
| SQL support | Full PostgreSQL SQL | No (REST/gRPC API) | No (REST/gRPC API) | No (SDK/API) |
| ACID transactions | Yes (PostgreSQL) | No | No | No |
| Operational overhead | None (part of PostgreSQL) | None (fully managed) | Moderate (self-hosted) or none (cloud) | Moderate to high (distributed system) |
The general guidance in the community is: start with pgvector if you already use PostgreSQL and your dataset is under 10 million vectors. Consider a dedicated vector database when you need billion-scale storage, advanced features like GPU-accelerated indexing, or when vector search performance is the primary concern.
For teams considering migrating from pgvector to a dedicated vector database (or vice versa), the key factors to evaluate are expected dataset scale, required features (such as GPU-accelerated indexing or native hybrid search), and the operational overhead each option introduces.
pgvector's integration into major cloud PostgreSQL services has been a significant factor in its adoption. Since the extension is standard PostgreSQL, enabling it typically requires only running `CREATE EXTENSION vector;` on any supported instance.
| Cloud provider | Service | pgvector support |
|---|---|---|
| AWS | Amazon RDS for PostgreSQL | Available from PostgreSQL 15.2+ in all AWS regions |
| AWS | Amazon Aurora PostgreSQL | Supported with global database for multi-region replication |
| Google Cloud | Cloud SQL for PostgreSQL | Supported |
| Microsoft Azure | Azure Database for PostgreSQL | Supported with dedicated documentation |
| Supabase | Supabase Postgres | Built-in; extensively documented with AI toolkit integration |
| Neon | Neon Serverless Postgres | Built-in with optimization guides for vector search |
| Render | Render PostgreSQL | Supported |
Supabase has been particularly active in promoting pgvector, building an AI toolkit around it that includes embedding generation, retrieval-augmented generation (RAG) templates, and performance optimization guides. Neon has similarly invested in pgvector support with detailed documentation on index tuning.
pgvector is commonly used for semantic search over text embeddings, retrieval-augmented generation (RAG) pipelines, recommendation systems, and other similarity-search features in AI applications.
Several projects have grown around pgvector to extend its capabilities:
| Tool | Developer | Description |
|---|---|---|
| pgvectorscale | Timescale | Adds StreamingDiskANN index and Statistical Binary Quantization for improved performance on large datasets |
| pgai | Timescale | Provides convenience functions for calling embedding APIs directly from SQL |
| VectorChord | VectorChord | Alternative high-performance vector index for PostgreSQL |
| LangChain | LangChain | PGVector vector store integration for RAG pipelines |
| LlamaIndex | LlamaIndex | Native pgvector vector store connector |
| Supabase Vecs | Supabase | Python client for managing pgvector collections in Supabase |
pgvector is at version 0.8.2 and supports PostgreSQL 13 through 17. The project has over 13,000 GitHub stars and is one of the most popular PostgreSQL extensions. Its adoption continues to grow as more organizations build AI-powered applications on existing PostgreSQL infrastructure rather than introducing new database systems.
The project's development pace remains steady, with Andrew Kane continuing as the primary maintainer. The introduction of halfvec and sparsevec types in recent versions shows that pgvector is tracking developments in the embedding model space, where dimensionality reduction and sparse retrieval techniques are becoming more common. For teams already on PostgreSQL, pgvector remains the most practical entry point into vector search.