HNSW

Hierarchical Navigable Small World (HNSW) is an approximate nearest-neighbor search algorithm based on multi-layer proximity graphs. It was introduced by Yury Malkov and Dmitry Yashunin in a 2016 arXiv preprint, formally published in IEEE Transactions on Pattern Analysis and Machine Intelligence in 2018 ^[1]. HNSW is one of the most widely deployed algorithms for high-dimensional vector similarity search, and is the default index in many modern vector databases, including Pinecone, Weaviate, Qdrant, Milvus, Vespa, Elasticsearch, OpenSearch, Redis, and PostgreSQL's pgvector extension ^[2]^[3]^[4]^[5].

The algorithm builds a layered graph where each vector is a node connected to a bounded number of neighbors. Higher layers contain progressively fewer points and serve as a coarse navigation structure, while the bottom layer contains every vector and provides fine-grained refinement. A nearest-neighbor query starts at the top, performs greedy descent toward the query, and finishes with a beam search at the base layer. Construction is incremental: each insertion samples a top layer from a geometric distribution, descends through the existing graph, and links the new point to its nearest neighbors at each layer where it appears. Performance scales as O(log N) in the expected case for both insertion and query; recall and throughput are tuned through three main hyperparameters: M, efConstruction, and efSearch.

HNSW has become a central component of the modern retrieval-augmented generation stack, where dense vector embeddings produced by language and multimodal models are indexed and queried at low latency to supply external context to large language models.

Background and motivation

Given a set of N points in a metric space and a query point q, the nearest-neighbor search problem asks for the point in the set closest to q under a chosen distance function. Exact methods such as k-d trees and ball trees work well in low dimensions but degrade rapidly as dimensionality grows. Beyond roughly twenty dimensions, distance-based pruning loses effectiveness and tree methods devolve to near-linear scans. This curse of dimensionality motivates approximate algorithms that trade a small amount of recall for orders-of-magnitude gains in speed and scalability ^[6]^[7].

Dense embeddings used in modern AI routinely live in 100 to 3,000 dimensions. Common embedding models such as text-embedding-3 from OpenAI, BGE, E5, and various sentence-transformers checkpoints output 384, 768, 1,024, 1,536, or 3,072 dimensional vectors. At these dimensionalities, exact search over millions or billions of points is too slow for interactive applications.

Before HNSW, several families of approximate methods were in widespread use:

Locality-sensitive hashing (LSH): hash functions designed so that nearby points collide with high probability. LSH offers strong theoretical guarantees but typically has lower empirical recall per byte of memory than graph-based methods.
Inverted file index (IVF): partition the vector space using k-means and search only the partitions closest to the query. Often combined with quantization.
Product quantization (PQ): split each vector into subvectors and quantize each independently, enabling compact codes and fast asymmetric distance computation.
Navigable Small World (NSW): the predecessor of HNSW, proposed by Malkov, Ponomarenko, Logvinov, and Krylov in 2014 ^[8]. NSW maintains a single proximity graph that grows by inserting each new point and connecting it to its nearest neighbors among already-inserted points. NSW reached state-of-the-art accuracy on several benchmarks but suffered from polylogarithmic search complexity with a meaningful constant and from poor performance on clustered data.

HNSW addresses these limitations by introducing a hierarchical structure that explicitly separates long-range and short-range edges across distinct layers. On the ANN-Benchmarks leaderboards maintained by Aumüller, Bernhardsson, and Faltings, HNSW consistently sits on the Pareto frontier of recall-versus-throughput trade-offs across diverse datasets, including SIFT-1M, GIST-1M, and the deep image embeddings benchmark ^[9].

Theoretical roots

HNSW draws on three threads of prior work:

Watts-Strogatz small-world networks ^[10]. In their 1998 Nature paper, Watts and Strogatz showed that adding a small fraction of long-range random edges to a regular lattice produces a graph with high local clustering and short average path length, with random walks reaching any target in O(log N) expected steps. HNSW exploits this property at each layer to enable fast greedy routing from any starting point. See small-world network and Watts-Strogatz.
Kleinberg's navigable small worlds ^[11]. Jon Kleinberg proved in 2000 that a small-world graph is navigable by a decentralized greedy algorithm only if its long-range edges follow a specific distribution: in a d-dimensional grid, the probability of a long-range edge to a vertex at distance r should fall off as r^(-d). HNSW's heuristic neighbor selection approximates this property in a metric space without requiring an explicit grid embedding.
Skip lists ^[12]. The skip list, introduced by William Pugh in 1990, is a probabilistic data structure that maintains multiple linked lists at increasing levels of coarseness. HNSW transposes this idea from one-dimensional ordered keys to general metric spaces: instead of a chain of skip pointers, each layer is a small-world proximity graph; instead of comparison-based jumps, each step follows the edge to the closest neighbor.

The synthesis of these ideas, plus a layer-assignment distribution and a heuristic neighbor selection rule, is the core contribution of the HNSW paper.

Algorithm

Multi-layer structure

HNSW maintains a graph G partitioned into layers 0, 1, 2, .... Layer 0 contains every point in the index. Each higher layer is a strict subset of the layer below it, and the number of points in each layer decays exponentially. Edges exist only within a layer: a point present in layers 0 through l has a separate adjacency list at each layer.

For every point, the maximum layer is sampled at insertion time from a geometric distribution. Concretely, the insertion algorithm draws a uniform random number u in (0, 1] and computes:

level = floor(-ln(u) * mL)

where mL = 1 / ln(M) is a normalization constant. With this choice, the probability of being assigned to layer l decays as 1 / M^l, so layer 1 contains roughly 1 / M of the points, layer 2 roughly 1 / M^2, and so on. The expected total number of layers is O(log_M N), which gives the algorithm its logarithmic search complexity.

A single global entry point is maintained, pointing to a node in the topmost non-empty layer. When a freshly inserted point reaches a layer above the current entry point, the entry point is updated.

Insertion

The insertion procedure for a new point q proceeds as follows:

Sample top layer L for q using the geometric formula above.
Greedy phase across upper layers. Starting from the global entry point ep in the current top layer, descend layer by layer down to layer L + 1. At each layer, perform a greedy walk: repeatedly move to the neighbor of the current node closest to q until no neighbor is closer. Pass the closest node down to the next layer as the new entry point.
Beam search across lower layers. From layer min(L, top) down to layer 0, perform an efConstruction-sized search. The search maintains two priority queues: a candidate set C and a dynamic result set W of the efConstruction closest nodes seen so far. At each step, the closest unvisited candidate is popped from C, its neighbors are visited, and any neighbor closer than the worst element of W (or while W is not full) is inserted into both C and W. The search terminates when no candidate in C is closer than the worst element of W.
Select neighbors. From the resulting candidate set, select up to M (or Mmax0 at layer 0) neighbors. The original paper proposes two heuristics: a simple top-M selection, and a diversity-aware heuristic that excludes a candidate if the new edge would be redundant given an already-selected neighbor closer to that candidate. The diversity heuristic is the default in hnswlib and is one reason HNSW outperforms simpler proximity graphs on clustered data.
Bidirectional connection and pruning. Add edges between q and the selected neighbors. For each neighbor whose degree now exceeds Mmax (or Mmax0 at layer 0), prune its adjacency list using the same heuristic.

The upper-layer descent costs O(log N) distance computations on average, while the layer-0 beam search dominates wall-clock cost and is controlled by efConstruction.

Search

The k-nearest-neighbor query uses the same machinery as insertion, with two passes:

Starting at the global entry point in the topmost layer, perform a greedy walk down to layer 1, always moving to the neighbor closest to the query.
At layer 0, run a beam search with dynamic candidate-list size efSearch, returning the k closest points found.

Increasing efSearch produces better recall by exploring more of the layer-0 graph, at the cost of more distance computations per query. Because efSearch is set per query, recall and throughput can be tuned at runtime without rebuilding the index.

A range-search variant is also supported, with the beam search continuing until no candidate is closer than a user-supplied radius.

Hyperparameters

HNSW has a small number of hyperparameters that determine its memory footprint, build time, and query performance. The table below summarizes typical recommended ranges drawn from the paper, hnswlib, and Faiss documentation ^[1]^[2]^[13].

Parameter	Meaning	Typical range	Effect
`M`	Maximum out-degree per node at layers `> 0`	8 to 64	Higher `M` increases recall, memory, and build time
`Mmax0`	Maximum out-degree at layer 0	typically `2 * M`	Layer 0 connections dominate memory; default in `hnswlib` is `2 * M`
`efConstruction`	Dynamic candidate list size during insertion	64 to 800	Higher values improve graph quality; build time grows roughly linearly
`efSearch` (`ef`)	Dynamic candidate list size during query	16 to 800	Tunable per query; trades recall against latency
`mL`	Layer-assignment normalization	`1 / ln(M)`	Default rarely changed; controls layer growth rate
`seed`	RNG seed for layer sampling	implementation-defined	Determines reproducibility

Most implementations default to M = 16 and efConstruction = 200. For high-dimensional embedding workloads, M = 32 or M = 48 with efConstruction = 400 is common. efSearch is tuned per workload to hit a target recall.

Distance metrics

The HNSW algorithm is metric-agnostic; it only requires a function returning a non-negative real number for any pair of points. The standard distance functions supported by mainstream implementations are:

Euclidean distance (L2), the squared or unsquared L2 norm of the difference vector.
Inner product, used either as a similarity (negated for use as a distance) or paired with normalization to yield cosine similarity.
Cosine similarity, equivalent to inner product on L2-normalized vectors.
Hamming distance for binary vectors.

The original paper notes that HNSW is also well suited to non-metric similarity measures, since neither construction nor search relies on the triangle inequality except as a heuristic. The companion library nmslib was designed for exactly this case ^[14]. In production vector databases the dominant distance is cosine, because most modern embedding models are trained with cosine or inner-product objectives.

Complexity

Let N be the number of indexed points, d the dimensionality of each vector, and assume distance computations cost O(d).

Insertion: O(d * efConstruction * log N) expected. The log N factor arises from the average number of layers; at each layer the beam search examines roughly efConstruction nodes.
Search: O(d * efSearch * log N) expected, with efSearch substituted for efConstruction at layer 0.
Memory: O(N * d) for vectors plus O(N * M) graph edges. Each edge in hnswlib consumes 4 bytes (a 32-bit node id). For M = 16, this is around 192 bytes of graph per node on top of raw vector storage, often 1.0 to 1.5 times the raw vector size for typical embedding dimensionalities.
Index build scales as O(N * d * efConstruction * log N). Building an index over 100 million 768-dimensional vectors with M = 32 and efConstruction = 200 typically takes hours on a single multi-core machine.

Empirically, for a fixed recall target HNSW offers some of the best query throughput per unit memory of any in-memory ANN method, especially on L2 and cosine metrics in moderate-to-high dimensions ^[9].

Implementations

HNSW has been implemented in many open-source libraries and embedded inside most commercial vector databases.

Implementation	Language	License	Notes
hnswlib	C++ with Python bindings	Apache 2.0	Reference implementation by Yury Malkov; header-only library ^[2]
nmslib	C++ with Python bindings	Apache 2.0	Broader Non-Metric Space Library that includes HNSW and many other methods ^[14]
Faiss	C++ with Python bindings	MIT	Meta AI Research; provides `IndexHNSWFlat`, `IndexHNSWPQ`, `IndexHNSWSQ` variants combining HNSW with quantization ^[13]
Lucene 9.1+	Java	Apache 2.0	Apache Lucene shipped HNSW in March 2022; powers vector search in Elasticsearch 8.x and OpenSearch ^[15]
Vespa	C++ with Java config	Apache 2.0	Native real-time HNSW with concurrent updates and metadata filtering ^[4]
Qdrant	Rust	Apache 2.0	Production-grade vector DB built around HNSW with payload filtering
Milvus	C++/Go	Apache 2.0	Default index for many workloads; also supports IVF, PQ, DiskANN
Weaviate	Go	BSD-3	Native HNSW with custom tombstone-based deletion
Pinecone	Proprietary	closed	Managed vector database; documented as graph-based with HNSW lineage
pgvector	C (PostgreSQL extension)	PostgreSQL license	Added HNSW index in version 0.5.0, August 2023, alongside its earlier IVFFlat index ^[16]
Redis Stack / RediSearch	C	RSAL	HNSW available as a vector index type in RediSearch
Elasticsearch	Java	Elastic License v2	Wraps Lucene's HNSW; introduced in 8.0, GA in 8.4
OpenSearch	Java	Apache 2.0	Wraps Lucene's HNSW and Faiss HNSW via the k-NN plugin

Most implementations follow the original paper closely, with two common deviations: lazy deletion via tombstones, and disk-friendly graph layouts.

Variants and alternatives

HNSW has inspired and competed with several follow-up approaches:

DiskANN. Subramanya, Devvrit, Simhadri, Krishnaswamy, and Kadekodi proposed DiskANN at NeurIPS 2019 ^[17]. Built on the Vamana graph construction algorithm, it is engineered to keep most of the index on SSD while serving billion-scale workloads with a small RAM footprint. Microsoft Research released a reference implementation, and the algorithm underlies Microsoft Bing and Azure Cognitive Search.
NGT. Developed at Yahoo Japan, NGT is a tree-graph hybrid that uses an ANNG (approximate k-NN graph) plus a tree for entry-point selection. NGT-ONNG is competitive with HNSW on some ANN-Benchmarks workloads.
ScaNN. Google's 2020 algorithm combines IVF-style coarse partitioning with anisotropic vector quantization. It is not graph-based but reaches strong recall-throughput points on certain benchmarks.
HNSW with quantization. IndexHNSWPQ in Faiss compresses vectors with product quantization, reducing memory by 4 to 32 times at the cost of recall. Vector databases such as Qdrant and Weaviate also offer 1-bit or 2-bit quantization on top of HNSW, often combined with rescoring on full-precision vectors.
NSG and Vamana. Alternative graph constructions that prune edges aggressively to keep degrees low while maintaining good navigability. Vamana is the construction used by DiskANN.
NSW. The single-layer predecessor ^[8], largely superseded.

In practice, HNSW remains the default in-memory choice for dense vector search at the scale of millions to a few hundred million vectors. DiskANN tends to win at billion scale when disk residency matters, while quantization-heavy methods like ScaNN are preferred when memory is the binding constraint.

Trade-offs and limitations

Despite its strong default behavior, HNSW has several practical limitations that operators must plan for:

Memory pressure. Most implementations hold both vectors and the graph in RAM. Per-vector overhead beyond the raw embedding ranges from roughly 24 bytes (small M, layer 0 only) to several hundred bytes. For 100 million 1,536-dimensional float32 vectors, raw storage alone is around 614 GB; the graph adds tens of gigabytes more.
Indexing time. While the asymptotic complexity is O(N * log N), the constant is non-trivial. Bulk-loading 100 million or more vectors can take hours even on parallelized implementations.
Mutability. HNSW supports inserts well. Deletions are typically implemented as soft tombstones that mark a node as removed but leave the graph intact; the deleted node is still traversed at query time. Hard deletes that reconnect surrounding nodes are expensive; full reindexing is often the cleanest option for large-scale rewrites.
Update churn. Heavy overwrite workloads can degrade graph quality, since stale edges accumulate. Production systems typically schedule periodic compaction or rebuilds.
Filtered search. Combining vector similarity with structured metadata predicates (timestamp ranges, tenant IDs, ACLs) is non-trivial. Pre-filtering can disconnect the graph and force expensive fallbacks; post-filtering wastes work and may miss enough candidates to satisfy the requested k. Vector databases use filterable HNSW variants, multi-graph partitioning, and hybrid query planners.
Disk residency. Native HNSW assumes random-access reads to graph and vector data. SSD-based variants exist but generally rely on different graph constructions tuned for sequential access patterns; this is the niche DiskANN was designed to fill.
Worst-case behavior. Recall is not guaranteed in the strict sense; pathological data distributions can leave a few queries with poor results. Increasing efSearch typically eliminates these tails at the cost of latency.
Patent and licensing. The hnswlib reference implementation is released under the Apache 2.0 license and the algorithm has been published openly. There is no known patent encumbrance on the core method.

Use in AI and LLM workflows

The rise of large language models and dense retrieval has pulled HNSW from a specialized algorithm into a core piece of production infrastructure. Several patterns recur:

Retrieval-augmented generation. A typical RAG stack chunks documents into passages, embeds each chunk with a model such as text-embedding-3-small, BGE, or E5, and stores the resulting vectors in a vector database with HNSW indexing. At query time, the user prompt is embedded, the top k matching passages are retrieved, and they are concatenated into the language model's context window.
Semantic search. Beyond RAG, HNSW indexes power semantic search over product catalogs, documentation portals, code repositories, and chat history. Many search platforms now ship HNSW alongside classical inverted indices for hybrid lexical-plus-semantic ranking.
Multimodal retrieval. Embeddings produced by CLIP-style models, image encoders, and audio encoders are also indexed with HNSW for cross-modal lookup.
Agentic workflows. Frameworks like LangChain and LlamaIndex abstract over vector databases, but the underlying index is almost always HNSW, either in a managed product such as Pinecone or self-hosted options such as Qdrant, Weaviate, Milvus, or pgvector.
On-device and edge. Small HNSW deployments power local-first applications such as notes apps, IDE plugins, and chat clients that embed documents locally.

Authors

Yury Malkov is the lead author of HNSW and of its predecessor NSW. After working at Mail.ru and Sense.lab on large-scale similarity search, he later joined OpenAI. He maintains hnswlib and contributed to nmslib. His body of work on graph-based ANN search includes both theoretical analysis and the reference implementations most production systems are built on.

Dmitry Yashunin co-authored the HNSW paper, with work focused on algorithm design and high-performance computing for similarity search. The Malkov and Yashunin paper has accumulated thousands of citations since its 2016 preprint and is one of the most-cited works in modern information retrieval.

The earlier NSW paper had additional co-authors Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov ^[8].

References

Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 42(4), 824-836. arXiv:1603.09320 (first version 2016). https://arxiv.org/abs/1603.09320
hnswlib: header-only C++/Python implementation by Yury Malkov. https://github.com/nmslib/hnswlib
Pinecone documentation, vector database concepts. https://docs.pinecone.io/
Vespa documentation, approximate nearest neighbor search using HNSW. https://docs.vespa.ai/en/approximate-nn-hnsw.html
Weaviate documentation, ANN index types and HNSW configuration. https://weaviate.io/developers/weaviate/concepts/vector-index
Beyer, K., Goldstein, J., Ramakrishnan, R., & Shaft, U. (1999). When Is Nearest Neighbor Meaningful? *International Conference on Database Theory*. https://link.springer.com/chapter/10.1007/3-540-49257-7_15
Indyk, P., & Motwani, R. (1998). Approximate nearest neighbors: towards removing the curse of dimensionality. *STOC 1998*. https://dl.acm.org/doi/10.1145/276698.276876
Malkov, Y., Ponomarenko, A., Logvinov, A., & Krylov, V. (2014). Approximate nearest neighbor algorithm based on navigable small world graphs. *Information Systems*, 45, 61-68. https://www.sciencedirect.com/science/article/abs/pii/S0306437913001300
Aumüller, M., Bernhardsson, E., & Faltings, A. ANN-Benchmarks: a benchmarking tool for approximate nearest neighbor algorithms. https://ann-benchmarks.com/
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of small-world networks. *Nature*, 393, 440-442. https://www.nature.com/articles/30918
Kleinberg, J. M. (2000). The small-world phenomenon: an algorithmic perspective. *STOC 2000*. https://dl.acm.org/doi/10.1145/335305.335325
Pugh, W. (1990). Skip lists: a probabilistic alternative to balanced trees. *Communications of the ACM*, 33(6), 668-676. https://dl.acm.org/doi/10.1145/78973.78977
Faiss documentation, indexing 1G vectors and HNSW index types. https://github.com/facebookresearch/faiss/wiki/Indexing-1G-vectors
Boytsov, L., & Naidan, B. nmslib: Non-Metric Space Library. https://github.com/nmslib/nmslib
Apache Lucene 9.1 release notes, KnnVectorQuery and HNSW. https://lucene.apache.org/core/9_1_0/changes/Changes.html
pgvector 0.5.0 release notes, HNSW index support. https://github.com/pgvector/pgvector/releases/tag/v0.5.0
Subramanya, S. J., Devvrit, F., Simhadri, H. V., Krishnaswamy, R., & Kadekodi, R. (2019). DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. *NeurIPS 2019*. https://proceedings.neurips.cc/paper/2019/hash/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Abstract.html
Qdrant documentation, HNSW index parameters and filtering. https://qdrant.tech/documentation/concepts/indexing/
OpenSearch k-NN plugin documentation. https://opensearch.org/docs/latest/search-plugins/knn/index/

HNSW

Background and motivation

Theoretical roots

Algorithm

Multi-layer structure

Insertion

Search

Hyperparameters

Distance metrics

Complexity

Implementations

Variants and alternatives

Trade-offs and limitations

Use in AI and LLM workflows

Authors

See also

References

Improve this article

Background and motivation

Theoretical roots

Algorithm

Multi-layer structure

Insertion

Search

Hyperparameters

Distance metrics

Complexity

Implementations

Variants and alternatives

Trade-offs and limitations

Use in AI and LLM workflows

Authors

See also

References

Background and motivation

Theoretical roots

Algorithm

Multi-layer structure

Insertion

Search

Hyperparameters

Distance metrics

Complexity

Implementations

Variants and alternatives

Trade-offs and limitations

Use in AI and LLM workflows

Authors

See also

References

Improve this article

Related Articles

Product quantization

ARC-AGI 2

MTEB (Massive Text Embedding Benchmark)

BM25 (Okapi BM25)

TF-IDF (Term Frequency-Inverse Document Frequency)

XGBoost

Background and motivation

Theoretical roots

Algorithm

Multi-layer structure

Insertion

Search

Hyperparameters

Distance metrics

Complexity

Implementations

Variants and alternatives

Trade-offs and limitations

Use in AI and LLM workflows

Authors

See also

References

Related Articles

Product quantization

ARC-AGI 2

MTEB (Massive Text Embedding Benchmark)

BM25 (Okapi BM25)

TF-IDF (Term Frequency-Inverse Document Frequency)

XGBoost