HNSW
Last reviewed
Apr 30, 2026
Sources
19 citations
Review status
Source-backed
Revision
v2 · 3,583 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Apr 30, 2026
Sources
19 citations
Review status
Source-backed
Revision
v2 · 3,583 words
Add missing citations, update stale details, or suggest a clearer explanation.
Hierarchical Navigable Small World (HNSW) is an approximate nearest-neighbor search algorithm based on multi-layer proximity graphs. It was introduced by Yury Malkov and Dmitry Yashunin in a 2016 arXiv preprint, formally published in IEEE Transactions on Pattern Analysis and Machine Intelligence in 2018 [1]. HNSW is one of the most widely deployed algorithms for high-dimensional vector similarity search, and is the default index in many modern vector databases, including Pinecone, Weaviate, Qdrant, Milvus, Vespa, Elasticsearch, OpenSearch, Redis, and PostgreSQL's pgvector extension [2][3][4][5].
The algorithm builds a layered graph where each vector is a node connected to a bounded number of neighbors. Higher layers contain progressively fewer points and serve as a coarse navigation structure, while the bottom layer contains every vector and provides fine-grained refinement. A nearest-neighbor query starts at the top, performs greedy descent toward the query, and finishes with a beam search at the base layer. Construction is incremental: each insertion samples a top layer from a geometric distribution, descends through the existing graph, and links the new point to its nearest neighbors at each layer where it appears. Performance scales as O(log N) in the expected case for both insertion and query; recall and throughput are tuned through three main hyperparameters: M, efConstruction, and efSearch.
HNSW has become a central component of the modern retrieval-augmented generation stack, where dense vector embeddings produced by language and multimodal models are indexed and queried at low latency to supply external context to large language models.
Given a set of N points in a metric space and a query point q, the nearest-neighbor search problem asks for the point in the set closest to q under a chosen distance function. Exact methods such as k-d trees and ball trees work well in low dimensions but degrade rapidly as dimensionality grows. Beyond roughly twenty dimensions, distance-based pruning loses effectiveness and tree methods devolve to near-linear scans. This curse of dimensionality motivates approximate algorithms that trade a small amount of recall for orders-of-magnitude gains in speed and scalability [6][7].
Dense embeddings used in modern AI routinely live in 100 to 3,000 dimensions. Common embedding models such as text-embedding-3 from OpenAI, BGE, E5, and various sentence-transformers checkpoints output 384, 768, 1,024, 1,536, or 3,072 dimensional vectors. At these dimensionalities, exact search over millions or billions of points is too slow for interactive applications.
Before HNSW, several families of approximate methods were in widespread use:
HNSW addresses these limitations by introducing a hierarchical structure that explicitly separates long-range and short-range edges across distinct layers. On the ANN-Benchmarks leaderboards maintained by Aumüller, Bernhardsson, and Faltings, HNSW consistently sits on the Pareto frontier of recall-versus-throughput trade-offs across diverse datasets, including SIFT-1M, GIST-1M, and the deep image embeddings benchmark [9].
HNSW draws on three threads of prior work:
Watts-Strogatz small-world networks [10]. In their 1998 Nature paper, Watts and Strogatz showed that adding a small fraction of long-range random edges to a regular lattice produces a graph with high local clustering and short average path length, with random walks reaching any target in O(log N) expected steps. HNSW exploits this property at each layer to enable fast greedy routing from any starting point. See small-world network and Watts-Strogatz.
Kleinberg's navigable small worlds [11]. Jon Kleinberg proved in 2000 that a small-world graph is navigable by a decentralized greedy algorithm only if its long-range edges follow a specific distribution: in a d-dimensional grid, the probability of a long-range edge to a vertex at distance r should fall off as r^(-d). HNSW's heuristic neighbor selection approximates this property in a metric space without requiring an explicit grid embedding.
Skip lists [12]. The skip list, introduced by William Pugh in 1990, is a probabilistic data structure that maintains multiple linked lists at increasing levels of coarseness. HNSW transposes this idea from one-dimensional ordered keys to general metric spaces: instead of a chain of skip pointers, each layer is a small-world proximity graph; instead of comparison-based jumps, each step follows the edge to the closest neighbor.
The synthesis of these ideas, plus a layer-assignment distribution and a heuristic neighbor selection rule, is the core contribution of the HNSW paper.
HNSW maintains a graph G partitioned into layers 0, 1, 2, .... Layer 0 contains every point in the index. Each higher layer is a strict subset of the layer below it, and the number of points in each layer decays exponentially. Edges exist only within a layer: a point present in layers 0 through l has a separate adjacency list at each layer.
For every point, the maximum layer is sampled at insertion time from a geometric distribution. Concretely, the insertion algorithm draws a uniform random number u in (0, 1] and computes:
level = floor(-ln(u) * mL)
where mL = 1 / ln(M) is a normalization constant. With this choice, the probability of being assigned to layer l decays as 1 / M^l, so layer 1 contains roughly 1 / M of the points, layer 2 roughly 1 / M^2, and so on. The expected total number of layers is O(log_M N), which gives the algorithm its logarithmic search complexity.
A single global entry point is maintained, pointing to a node in the topmost non-empty layer. When a freshly inserted point reaches a layer above the current entry point, the entry point is updated.
The insertion procedure for a new point q proceeds as follows:
L for q using the geometric formula above.ep in the current top layer, descend layer by layer down to layer L + 1. At each layer, perform a greedy walk: repeatedly move to the neighbor of the current node closest to q until no neighbor is closer. Pass the closest node down to the next layer as the new entry point.min(L, top) down to layer 0, perform an efConstruction-sized search. The search maintains two priority queues: a candidate set C and a dynamic result set W of the efConstruction closest nodes seen so far. At each step, the closest unvisited candidate is popped from C, its neighbors are visited, and any neighbor closer than the worst element of W (or while W is not full) is inserted into both C and W. The search terminates when no candidate in C is closer than the worst element of W.M (or Mmax0 at layer 0) neighbors. The original paper proposes two heuristics: a simple top-M selection, and a diversity-aware heuristic that excludes a candidate if the new edge would be redundant given an already-selected neighbor closer to that candidate. The diversity heuristic is the default in hnswlib and is one reason HNSW outperforms simpler proximity graphs on clustered data.q and the selected neighbors. For each neighbor whose degree now exceeds Mmax (or Mmax0 at layer 0), prune its adjacency list using the same heuristic.The upper-layer descent costs O(log N) distance computations on average, while the layer-0 beam search dominates wall-clock cost and is controlled by efConstruction.
The k-nearest-neighbor query uses the same machinery as insertion, with two passes:
efSearch, returning the k closest points found.Increasing efSearch produces better recall by exploring more of the layer-0 graph, at the cost of more distance computations per query. Because efSearch is set per query, recall and throughput can be tuned at runtime without rebuilding the index.
A range-search variant is also supported, with the beam search continuing until no candidate is closer than a user-supplied radius.
HNSW has a small number of hyperparameters that determine its memory footprint, build time, and query performance. The table below summarizes typical recommended ranges drawn from the paper, hnswlib, and Faiss documentation [1][2][13].
| Parameter | Meaning | Typical range | Effect |
|---|---|---|---|
M | Maximum out-degree per node at layers > 0 | 8 to 64 | Higher M increases recall, memory, and build time |
Mmax0 | Maximum out-degree at layer 0 | typically 2 * M | Layer 0 connections dominate memory; default in hnswlib is 2 * M |
efConstruction | Dynamic candidate list size during insertion | 64 to 800 | Higher values improve graph quality; build time grows roughly linearly |
efSearch (ef) | Dynamic candidate list size during query | 16 to 800 | Tunable per query; trades recall against latency |
mL | Layer-assignment normalization | 1 / ln(M) | Default rarely changed; controls layer growth rate |
seed | RNG seed for layer sampling | implementation-defined | Determines reproducibility |
Most implementations default to M = 16 and efConstruction = 200. For high-dimensional embedding workloads, M = 32 or M = 48 with efConstruction = 400 is common. efSearch is tuned per workload to hit a target recall.
The HNSW algorithm is metric-agnostic; it only requires a function returning a non-negative real number for any pair of points. The standard distance functions supported by mainstream implementations are:
L2), the squared or unsquared L2 norm of the difference vector.L2-normalized vectors.The original paper notes that HNSW is also well suited to non-metric similarity measures, since neither construction nor search relies on the triangle inequality except as a heuristic. The companion library nmslib was designed for exactly this case [14]. In production vector databases the dominant distance is cosine, because most modern embedding models are trained with cosine or inner-product objectives.
Let N be the number of indexed points, d the dimensionality of each vector, and assume distance computations cost O(d).
O(d * efConstruction * log N) expected. The log N factor arises from the average number of layers; at each layer the beam search examines roughly efConstruction nodes.O(d * efSearch * log N) expected, with efSearch substituted for efConstruction at layer 0.O(N * d) for vectors plus O(N * M) graph edges. Each edge in hnswlib consumes 4 bytes (a 32-bit node id). For M = 16, this is around 192 bytes of graph per node on top of raw vector storage, often 1.0 to 1.5 times the raw vector size for typical embedding dimensionalities.O(N * d * efConstruction * log N). Building an index over 100 million 768-dimensional vectors with M = 32 and efConstruction = 200 typically takes hours on a single multi-core machine.Empirically, for a fixed recall target HNSW offers some of the best query throughput per unit memory of any in-memory ANN method, especially on L2 and cosine metrics in moderate-to-high dimensions [9].
HNSW has been implemented in many open-source libraries and embedded inside most commercial vector databases.
| Implementation | Language | License | Notes |
|---|---|---|---|
| hnswlib | C++ with Python bindings | Apache 2.0 | Reference implementation by Yury Malkov; header-only library [2] |
| nmslib | C++ with Python bindings | Apache 2.0 | Broader Non-Metric Space Library that includes HNSW and many other methods [14] |
| Faiss | C++ with Python bindings | MIT | Meta AI Research; provides IndexHNSWFlat, IndexHNSWPQ, IndexHNSWSQ variants combining HNSW with quantization [13] |
| Lucene 9.1+ | Java | Apache 2.0 | Apache Lucene shipped HNSW in March 2022; powers vector search in Elasticsearch 8.x and OpenSearch [15] |
| Vespa | C++ with Java config | Apache 2.0 | Native real-time HNSW with concurrent updates and metadata filtering [4] |
| Qdrant | Rust | Apache 2.0 | Production-grade vector DB built around HNSW with payload filtering |
| Milvus | C++/Go | Apache 2.0 | Default index for many workloads; also supports IVF, PQ, DiskANN |
| Weaviate | Go | BSD-3 | Native HNSW with custom tombstone-based deletion |
| Pinecone | Proprietary | closed | Managed vector database; documented as graph-based with HNSW lineage |
| pgvector | C (PostgreSQL extension) | PostgreSQL license | Added HNSW index in version 0.5.0, August 2023, alongside its earlier IVFFlat index [16] |
| Redis Stack / RediSearch | C | RSAL | HNSW available as a vector index type in RediSearch |
| Elasticsearch | Java | Elastic License v2 | Wraps Lucene's HNSW; introduced in 8.0, GA in 8.4 |
| OpenSearch | Java | Apache 2.0 | Wraps Lucene's HNSW and Faiss HNSW via the k-NN plugin |
Most implementations follow the original paper closely, with two common deviations: lazy deletion via tombstones, and disk-friendly graph layouts.
HNSW has inspired and competed with several follow-up approaches:
k-NN graph) plus a tree for entry-point selection. NGT-ONNG is competitive with HNSW on some ANN-Benchmarks workloads.IndexHNSWPQ in Faiss compresses vectors with product quantization, reducing memory by 4 to 32 times at the cost of recall. Vector databases such as Qdrant and Weaviate also offer 1-bit or 2-bit quantization on top of HNSW, often combined with rescoring on full-precision vectors.In practice, HNSW remains the default in-memory choice for dense vector search at the scale of millions to a few hundred million vectors. DiskANN tends to win at billion scale when disk residency matters, while quantization-heavy methods like ScaNN are preferred when memory is the binding constraint.
Despite its strong default behavior, HNSW has several practical limitations that operators must plan for:
M, layer 0 only) to several hundred bytes. For 100 million 1,536-dimensional float32 vectors, raw storage alone is around 614 GB; the graph adds tens of gigabytes more.O(N * log N), the constant is non-trivial. Bulk-loading 100 million or more vectors can take hours even on parallelized implementations.k. Vector databases use filterable HNSW variants, multi-graph partitioning, and hybrid query planners.efSearch typically eliminates these tails at the cost of latency.hnswlib reference implementation is released under the Apache 2.0 license and the algorithm has been published openly. There is no known patent encumbrance on the core method.The rise of large language models and dense retrieval has pulled HNSW from a specialized algorithm into a core piece of production infrastructure. Several patterns recur:
text-embedding-3-small, BGE, or E5, and stores the resulting vectors in a vector database with HNSW indexing. At query time, the user prompt is embedded, the top k matching passages are retrieved, and they are concatenated into the language model's context window.Yury Malkov is the lead author of HNSW and of its predecessor NSW. After working at Mail.ru and Sense.lab on large-scale similarity search, he later joined OpenAI. He maintains hnswlib and contributed to nmslib. His body of work on graph-based ANN search includes both theoretical analysis and the reference implementations most production systems are built on.
Dmitry Yashunin co-authored the HNSW paper, with work focused on algorithm design and high-performance computing for similarity search. The Malkov and Yashunin paper has accumulated thousands of citations since its 2016 preprint and is one of the most-cited works in modern information retrieval.
The earlier NSW paper had additional co-authors Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov [8].