Product quantization

Product quantization (PQ) is a vector-compression technique for approximate nearest-neighbor (ANN) search in high-dimensional spaces. It splits each D-dimensional input vector into M equal sub-vectors, runs k-means independently on each sub-space to learn a small codebook of K centroids, and replaces every sub-vector with the index of its nearest centroid. The resulting code is M * log_2(K) bits long, typically M bytes when K = 256, which is one to two orders of magnitude smaller than the raw float32 vector. Distances between a full-precision query and a compressed database vector can then be estimated in time linear in M using precomputed lookup tables, an operation known as asymmetric distance computation (ADC).

PQ was introduced by Hervé Jégou, Matthijs Douze, and Cordelia Schmid at Inria in a 2011 paper in IEEE Transactions on Pattern Analysis and Machine Intelligence, which remains the foundational reference for the method ^[1]. Since then, PQ and its variants have become a core building block of modern vector databases and large-scale retrieval systems, including FAISS, Milvus, Pinecone, Weaviate, pgvector, and Google's ScaNN. PQ is also a standard component of billion-scale similarity search pipelines for image retrieval, recommender systems, and the retrieval-augmented generation stack used by large language models.

Motivation

Many machine-learning systems represent items as dense vector embeddings in R^D, with D typically between 100 and 3,000. Search over these vectors reduces to a k-nearest-neighbors problem: given a query vector, return the k database vectors with smallest distance under a chosen metric, usually Euclidean or cosine.

For billion-scale corpora, exact nearest-neighbor search is infeasible on two axes:

Memory. A single 768-dimensional float32 vector occupies 3,072 bytes. One billion such vectors require roughly 3 TB of RAM, beyond what fits in a single machine. Even after sharding, the cost is prohibitive for many production deployments.
Compute. Brute-force scan computes one distance per database vector per query, scaling as O(N * D) floating-point operations. At billion scale this is hundreds of milliseconds per query even on optimized hardware, far too slow for interactive use.

Approximate methods trade a small amount of recall for orders-of-magnitude gains in memory or speed. PQ attacks the memory axis directly: a typical configuration with D = 128, M = 8, and 8 bits per sub-quantizer compresses each vector from 512 bytes to 8 bytes, a 64x reduction. With ADC, distance computation also becomes faster, since per-query work reduces to a constant number of table lookups per database vector instead of a full dot product.

The Inria group originally developed PQ for content-based image retrieval over millions of SIFT descriptors, but the same machinery generalizes to any dense embedding space. Combined with an inverted file (IVF) coarse partition, PQ scales to billion-vector indexes on a single server, a setting demonstrated by FAISS ^[4]^[1].

Algorithm

Quantization recap

A vector quantizer is a function q: R^D -> C that maps each input vector x to a codeword c_i drawn from a finite codebook C = {c_1, ..., c_K}. The optimal quantizer in the mean-squared sense minimizes the expected reconstruction error:

E[||x - q(x)||^2]

Lloyd's algorithm (k-means) provides a local minimum. Each x requires log_2(K) bits to store as the index of its assigned centroid. The challenge is that high-quality quantization in D dimensions needs K exponential in D. For D = 128 and 8 bytes per code (K = 2^64), training and lookup with a single codebook are not tractable.

Cartesian product structure

PQ sidesteps the curse of dimensionality by factoring the codebook as a Cartesian product. The space R^D is decomposed into the product of M orthogonal sub-spaces, each of dimension D / M. A separate sub-quantizer q_m: R^(D/M) -> C_m is trained on each sub-space, with its own small codebook C_m of size K. The full quantizer is:

q(x) = (q_1(x_1), q_2(x_2), ..., q_M(x_M))

where x = [x_1; x_2; ...; x_M] is the concatenation of the M sub-vectors. The effective codebook has K^M entries but is stored implicitly as M separate codebooks of size K. Training runs M independent k-means problems, each in D / M dimensions with K clusters. With K = 256 and b = 8 bits per sub-quantizer, each sub-vector index fits in a single byte and the total code length is M bytes ^[1].

Encoding and storage

Given a database vector x:

Split x into M sub-vectors of dimension D / M.
For each sub-vector x_m, find the nearest centroid in C_m and emit its 8-bit index i_m.
Store the code (i_1, i_2, ..., i_M) as M bytes.

A database of N vectors at D = 128 floats per vector occupies 512 * N bytes uncompressed. With M = 8, the PQ-compressed database occupies 8 * N bytes, plus a small fixed overhead for the codebooks (M * K * (D / M) * 4 = 16 KB for M = 8, K = 256, D = 128). A 1-billion-vector dataset that needs 512 GB raw fits in 8 GB compressed ^[1].

Distance computation

The utility of PQ depends on being able to estimate ||q - x||^2 from q and the code of x without first reconstructing x. The Jégou paper proposes two estimators:

Symmetric distance computation (SDC). Both query and database vector are quantized: d_SDC(q, x) = sum_m ||q_m(q_m_subvec) - q_m(x_m)||^2. Inter-centroid distances are precomputed once into an M * K * K table. At query time, each compressed database vector requires M table lookups, one per sub-space.
Asymmetric distance computation (ADC). The query stays in full precision, while the database vector is reconstructed implicitly: d_ADC(q, x) = sum_m ||q_m_subvec - q_m(x_m)||^2. For each query, the M * K lookup table of distances ||q_m_subvec - c_m_k||^2 is computed once. Each database vector then takes M lookups and M - 1 additions to score.

ADC has lower bias and is the default in practice, since paying M * K distance computations per query (a few thousand operations) is negligible compared to scoring millions of database codes. SDC is occasionally preferred when query latency is dominated by small batched inner loops over compressed codes, since the lookup tables can be cached more aggressively. The original paper reports that ADC outperforms SDC by 5-10 percentage points of recall at the same code size on SIFT-1M ^[1].

Inverted file with PQ (IVFADC)

For very large N, scoring every code is still too slow. The standard production setup is IVFADC, also known as IVF-PQ:

Train a coarse quantizer q_c with K_coarse centroids (e.g., 1,024 to 65,536) using ordinary k-means on the full D-dimensional space.
For each database vector x, compute its coarse cell q_c(x) and the residual r = x - q_c(x).
Encode the residual with PQ. Store the code in the inverted list of the corresponding coarse cell.
At query time, find the nprobe coarse cells nearest to the query. For each selected cell, compute the residual q - q_c(cell) and score the codes in that cell using ADC against the residual.

The coarse quantizer prunes the search to a small fraction nprobe / K_coarse of the database. The PQ residual encoding compresses what remains. Typical values are K_coarse = sqrt(N) rounded to a power of two and nprobe between 1 and a few hundred. On SIFT-1M with K_coarse = 1024, M = 8, nprobe = 8, the original paper reports recall@100 around 0.9 with index size 8 MB instead of 512 MB ^[1].

Training cost

PQ training runs M independent k-means jobs, each on a sub-sample of the data (typically 256,000 to 1 million vectors). Wall-clock cost is dominated by the coarse quantizer when K_coarse is large, but is generally a one-time offline operation.

Variants

PQ has spawned a family of related methods that improve on its accuracy under different assumptions. The most important are summarized below.

Method	Year	Authors	Idea
Product quantization (PQ)	2011	Jégou, Douze, Schmid	Independent k-means on `M` sub-spaces ^[1]
Inverted Multi-Index (IMI)	2012	Babenko, Lempitsky	Two coarse quantizers, Cartesian product of cells gives finer partition ^[3]
Optimized PQ (OPQ)	2013-2014	Ge, He, Ke, Sun	Learn an orthogonal rotation `R` so dimensions distribute uniformly across sub-spaces ^[2]
Cartesian k-means (CKM)	2013	Norouzi, Fleet	Concurrent formulation equivalent to OPQ ^[5]
Locally Optimized PQ (LOPQ)	2014	Kalantidis, Avrithis	Learn a separate rotation and codebook per IVF cell ^[6]
Distance-Encoded PQ (DPQ)	2014	Heo, Lin, Yoon	Spend some bits encoding the quantized distance to the centroid ^[7]
Additive Quantization (AQ)	2014	Babenko, Lempitsky	Encode each vector as a sum of codewords from `M` codebooks; no orthogonality assumption ^[8]
Composite Quantization (CQ)	2014	Zhang et al.	Constrain inter-codebook inner products for fast distance computation
Residual Vector Quantization (RVQ)	classical, modern variants	Various	Cascade of quantizers; each stage encodes the residual of the previous one
Anisotropic Vector Quantization	2020	Guo et al. (Google ScaNN)	Loss function that penalizes the parallel residual component more than the orthogonal one ^[9]

Optimized product quantization

The quality of PQ depends on the choice of sub-space decomposition. If a few dimensions carry most of the variance, splitting them into the same sub-vector wastes most of the codebook budget on a different sub-space. Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun introduced OPQ to learn an orthogonal rotation R of the input space so that variance is balanced across sub-spaces and intra-sub-space dimensions are decorrelated. The objective is to jointly minimize quantization distortion over R and the M codebooks:

min_{R, C_1, ..., C_M} sum_x ||R x - q(R x)||^2

subject to R^T R = I. The CVPR 2013 conference paper proposes both a parametric solution that assumes a Gaussian distribution and a non-parametric iterative solution that alternates between updating R (an orthogonal Procrustes step) and re-running k-means in the rotated sub-spaces. The TPAMI 2014 journal version expands the analysis ^[2]. Norouzi and Fleet's Cartesian k-means at the same CVPR 2013 derives the same family of models independently ^[5].

OPQ closes a substantial fraction of the recall gap to full-precision search on benchmarks where the raw embedding has correlated dimensions, and is now standard in FAISS as IndexPQ(...) with OPQMatrix pre-rotation, or as the OPQ<m>_<d>,IVF<k>,PQ<m> index factory string.

Inverted Multi-Index

Artem Babenko and Victor Lempitsky's Inverted Multi-Index (IMI), presented at CVPR 2012, replaces the single coarse quantizer of IVFADC with two coarse quantizers, one per half of the input. The Cartesian product of their cells yields K_coarse^2 implicit sub-cells, often a thousand times more than a flat IVF can afford, while keeping training cheap. The list traversal is reordered using a multi-sequence algorithm that walks over candidate cells in increasing-distance order. On SIFT-1B, IMI achieves a denser partition and higher recall than flat IVF at similar memory and query cost ^[3]. IMI is implemented in FAISS as IndexIVFPQ with two-level coarse quantization (IndexIMI2).

Locally optimized PQ

LOPQ extends IVFADC by learning a separate orthogonal rotation and PQ codebook for each coarse cell, rather than a single global one. Cells in different parts of the embedding space tend to have different local covariance structure, so a global rotation is suboptimal. The Kalantidis and Avrithis CVPR 2014 paper shows LOPQ improves recall on SIFT-1B and Yahoo Flickr Creative Commons 100M with constant memory overhead per cell ^[6]. The original implementation was open-sourced by Yahoo.

Additive and residual quantization

Additive Quantization (AQ) drops the orthogonal sub-space assumption entirely. Each vector is approximated as a sum of M codewords drawn from M codebooks of size K, with full freedom in how the codebooks span R^D. The encoding problem becomes combinatorial and is solved with beam search; training alternates between codeword updates and re-encoding. AQ delivers lower distortion than PQ at the same code size, at the cost of more expensive encoding ^[8].

Residual Vector Quantization (RVQ) is a cascaded variant: the first stage quantizes x, the second quantizes x - c_1, the third quantizes x - c_1 - c_2, and so on. Classical RVQ predates PQ and has been used for speech and image coding for decades. In modern deep learning, RVQ is the workhorse of neural audio codecs such as Google's SoundStream and Meta's EnCodec, where it is jointly trained with the encoder and decoder networks.

Fast scan and SIMD

Fred André, Anne-Marie Kermarrec, and Nicolas Le Scouarnec showed in their VLDB 2015 paper that PQ throughput on commodity CPUs is bottlenecked not by FLOPs but by the cost of repeated cache accesses to the M * K lookup table. Their PQ Fast Scan algorithm shrinks the table to fit in SIMD registers, replaces cache lookups with in-register PSHUFB-style permutations, and processes 16 or 32 codes per instruction. The result is a 4x to 6x speed-up at identical recall, reducing scan time on 25 million vectors from about 74 ms to about 13 ms ^[10]. The same group's follow-up work, Quick ADC and Quicker ADC, extends the trick to wider SIMD and 4-bit codes. Most modern PQ implementations, including FAISS's IndexPQFastScan and IndexIVFPQFastScan, ship Fast Scan kernels.

Performance characteristics

Performance on PQ-style indexes is usually reported as recall@R: the fraction of queries for which the true nearest neighbor (or one of the true top-k) appears in the top R results returned by the approximate index. The original Jégou paper benchmarks SIFT-1M (D = 128, N = 1 million) and SIFT-1B (N = 1 billion):

On SIFT-1M with M = 8 (8-byte codes), recall@100 reaches about 0.59 with plain ADC and around 0.89 with IVFADC at K_coarse = 1024, nprobe = 8 ^[1].
With M = 16 (16-byte codes), recall@100 climbs above 0.95 on SIFT-1M IVFADC.
On SIFT-1B with M = 16 and a coarse quantizer of 8,192 cells, IVFADC retrieves the true nearest neighbor in the top 100 candidates for the majority of queries while consuming roughly 12 GB of RAM for the entire 1-billion-vector index.

OPQ, IMI, and LOPQ each push these numbers further, generally improving recall by 5 to 15 percentage points at the same code size. ScaNN's anisotropic loss is reported to outperform OPQ on glove-100-angular and other modern benchmarks by another factor of two in queries-per-second at fixed recall ^[9].

The Johnson, Douze, and Jégou 2017 paper on FAISS demonstrates GPU implementations of IVF-PQ that perform billion-scale k-NN at single-digit milliseconds per query, with peak k-selection running at 55% of theoretical hardware bandwidth. The same paper reports building a high-accuracy k-NN graph over 95 million Flickr images in 35 minutes on 8 GPUs, and a 1-billion-vector graph in under 12 hours on 4 GPUs ^[4].

Comparison with other ANN methods

PQ-based indexes occupy a different point in the recall-memory-throughput trade-off than graph-based methods like HNSW or hashing methods like LSH. The table below summarizes the main families.

Method	Index type	Memory per vector	Query speed	Recall	Training	Billion-scale
Flat (brute force)	Exhaustive	`4 * D` bytes (float32)	`O(N * D)` per query	100%	none	impractical
LSH	Random or learned hash buckets	tens of bytes	sub-linear	moderate	minimal	yes, with sharding
IVF-Flat	k-means cells, full vectors in lists	`4 * D + 8` bytes	`nprobe / K_coarse * N`	high (up to 100% with full lists)	k-means	yes, but memory-heavy
PQ (flat)	Compressed codes only	`M` bytes (often 8-32)	`O(N)` table lookups	low to moderate	k-means per sub-space	yes, fits in RAM
IVF-PQ (IVFADC)	Coarse quantizer + PQ residuals	`M + small` bytes	`(nprobe / K_coarse) * N` lookups	moderate to high	coarse k-means + sub-space k-means	yes, default at 1B scale
HNSW	Multi-layer proximity graph	`4 * D + ~64-200` bytes for graph	`O(log N)` distance evals per query	very high	incremental	up to 100M comfortably
HNSW + PQ	Graph over compressed codes	`M + graph` bytes	`O(log N)` lookups	moderate to high	k-means + graph build	yes, hybrid
ScaNN	IVF + anisotropic PQ	`M` bytes per residual	`(nprobe / K_coarse) * N`	very high at given memory	anisotropic loss training	yes
DiskANN	SSD-resident graph	small RAM, large SSD	`O(log N)` SSD reads	very high	offline graph build	designed for billion scale

In practice, HNSW wins when memory is plentiful and high recall (above 0.95) is required, while PQ-family methods win when memory is the binding constraint or when the corpus exceeds RAM. Hybrids that build HNSW or NSG graphs over PQ-compressed vectors, then re-rank the top candidates against full-precision vectors, are common in production systems that need both high recall and low memory.

Production systems

Nearly every billion-scale vector search deployment in industry uses PQ in some form. The table summarizes the main implementations.

System	First released	Owner	PQ usage
FAISS	2017	Meta AI Research	`IndexPQ`, `IndexIVFPQ`, `IndexHNSWPQ`, `IndexIVFPQFastScan`, OPQ pre-rotation; reference implementation ^[4]
Milvus	2019	Zilliz	`IVF_PQ`, `IVF_SQ8`, `HNSW_PQ` index types; uses FAISS and Knowhere internally
ScaNN	2020	Google	Anisotropic vector quantization, the state of the art on glove-100-angular ^[9]
Pinecone	2021	Pinecone	Internal indexes use IVF-PQ-like compression alongside graph methods
Weaviate	2019	Weaviate B.V.	HNSW with optional PQ compression and 1-bit binary quantization
Qdrant	2021	Qdrant Solutions	HNSW with optional PQ, scalar, and binary quantization
Vespa	open-sourced 2017	Yahoo / Verizon	HNSW with int8 scalar quantization; PQ available for recommendation workloads
Vertex AI Matching Engine	2021	Google Cloud	Powered by ScaNN under the hood
pgvector	2021	Andrew Kane	IVFFlat and HNSW; community has added IVFPQ patches and external plugins
Elasticsearch / OpenSearch	2022	Elastic / OpenSearch Project	HNSW, with PQ via Faiss engine and scalar quantization native
LOPQ	2014	Yahoo	Reference open-source implementation of locally optimized PQ

FAISS, released by Facebook AI Research in March 2017, remains the dominant open-source library. Its index factory mini-language lets users compose pipelines such as OPQ32_128,IVF65536,PQ32x8: rotate inputs with a 32-block OPQ matrix into 128 dimensions, partition with a 65,536-cell IVF, then encode residuals as 32-byte PQ codes. The FAISS GPU backend accelerates both training and search by one to two orders of magnitude over CPU ^[4]. Milvus, launched by Zilliz in October 2019, packages FAISS and a custom engine called Knowhere behind a database-style API. Google's ScaNN, open-sourced in July 2020, demonstrated that an anisotropic loss function biased toward inner-product preservation could outperform OPQ on retrieval workloads where queries follow a known distribution; it powers Vertex AI Matching Engine ^[9].

Use in modern AI workflows

Dense retrieval since 2020 has made PQ-style compression a default consideration for any vector index scaling beyond a few million entries. In retrieval-augmented generation, document chunks are embedded with models such as text-embedding-3-large, BGE, or E5, producing vectors of dimension 768 to 3,072. A typical RAG corpus of 100 million chunks at 1,024 dimensions consumes about 400 GB raw; with M = 64, 8-bit PQ codes, the index fits in roughly 6.4 GB. The recall hit is usually repaired by re-ranking the top few hundred PQ candidates against full-precision vectors held on disk or in a tiered cache.

Recommender systems at Spotify, Pinterest, and eBay have used PQ-style compression in their candidate generators for years, typically combined with two-tower neural networks. Image and video retrieval at Google, Meta, and Yandex relies on PQ for both product image search and content moderation pipelines.

PQ codes are also sometimes used as discrete tokens. The vector quantization step inside VQ-VAE and RVQ-based neural audio codecs can be viewed as a form of PQ where the codebooks are jointly learned with neural encoders. Learned quantization is an active research area, with work on differentiable PQ, joint training with retrieval objectives, and hybrid sparse-dense indexes combining BM25 with PQ-compressed dense retrieval.

Mathematical perspective

PQ can be cast as a constrained vector quantization problem. An unconstrained quantizer with K_total = K^M centroids would minimize E[||x - c_i||^2]. PQ adds the constraint that each centroid c_i factorizes as a concatenation of M sub-centroids, one per sub-space, and that each sub-centroid is shared across all K^(M-1) global centroids that agree on the other sub-spaces. The problem decomposes into M independent k-means problems precisely when the sub-spaces are decorrelated and have equal variance, the assumption that motivates OPQ. Pre-rotating the data with an orthogonal R does not change the global geometry but redistributes the per-sub-space error so the sum is minimized ^[1]^[2].

Limitations

PQ is a lossy compression. Recall@1 is rarely above 0.95 even with large M, and exact retrieval requires re-ranking with full-precision vectors.
Plain PQ's quality depends on alignment of variance with sub-space boundaries. OPQ-style rotation is typically required for embeddings produced by deep networks, where dimensions are arbitrary linear combinations of underlying features ^[2].
PQ is most natural for L2 distance. Inner product and cosine require normalization and the identity ||q - x||^2 = 2 - 2 q.x, which introduces extra distortion if normalization is not preserved by the codebook.
Codebooks trained on one distribution degrade if the data distribution shifts. Production systems periodically re-train sub-quantizers as embeddings change.
PQ codes are append-only by default. Deletes are handled with tombstones; large rewrites typically require a periodic rebuild.
PQ shines when scoring millions of codes per query. For small candidate sets, the table-construction overhead can dominate, and a full-precision distance is faster.

References

Jégou, H., Douze, M., & Schmid, C. (2011). Product Quantization for Nearest Neighbor Search. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 33(1), 117-128. https://ieeexplore.ieee.org/document/5432202
Ge, T., He, K., Ke, Q., & Sun, J. (2014). Optimized Product Quantization. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 36(4), 744-755. Conference version: Optimized Product Quantization for Approximate Nearest Neighbor Search. *CVPR 2013*. https://people.csail.mit.edu/kaiming/publications/pami13opq.pdf
Babenko, A., & Lempitsky, V. (2012). The Inverted Multi-Index. *IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*. https://www.robots.ox.ac.uk/~vilem/cvpr2012.pdf
Johnson, J., Douze, M., & Jégou, H. (2019). Billion-scale similarity search with GPUs. *IEEE Transactions on Big Data*, 7(3), 535-547. arXiv:1702.08734 (2017). https://arxiv.org/abs/1702.08734
Norouzi, M., & Fleet, D. J. (2013). Cartesian k-means. *IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*. https://norouzi.github.io/research/papers/ckmeans.pdf
Kalantidis, Y., & Avrithis, Y. (2014). Locally Optimized Product Quantization for Approximate Nearest Neighbor Search. *IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*. https://openaccess.thecvf.com/content_cvpr_2014/papers/Kalantidis_Locally_Optimized_Product_2014_CVPR_paper.pdf
Heo, J.-P., Lin, Z., & Yoon, S.-E. (2014). Distance Encoded Product Quantization. *IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*. https://openaccess.thecvf.com/content_cvpr_2014/papers/Heo_Distance_Encoded_Product_2014_CVPR_paper.pdf
Babenko, A., & Lempitsky, V. (2014). Additive Quantization for Extreme Vector Compression. *IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*. https://openaccess.thecvf.com/content_cvpr_2014/papers/Babenko_Additive_Quantization_for_2014_CVPR_paper.pdf
Guo, R., Sun, P., Lindgren, E., Geng, Q., Simcha, D., Chern, F., & Kumar, S. (2020). Accelerating Large-Scale Inference with Anisotropic Vector Quantization. *International Conference on Machine Learning (ICML)*. https://arxiv.org/abs/1908.10396
André, F., Kermarrec, A.-M., & Le Scouarnec, N. (2015). Cache locality is not enough: High-Performance Nearest Neighbor Search with Product Quantization Fast Scan. *Proceedings of the VLDB Endowment*, 9(4). https://inria.hal.science/hal-01239055
Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 42(4), 824-836. arXiv:1603.09320. https://arxiv.org/abs/1603.09320
Faiss documentation. https://faiss.ai/
Faiss wiki, Faiss indexes. https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
Pinecone, Product Quantization: Compressing high-dimensional vectors by 97%. https://www.pinecone.io/learn/series/faiss/product-quantization/
Milvus documentation, Vector index types. https://milvus.io/docs/index.md
Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazaré, P.-E., Lomeli, M., Hosseini, L., & Jégou, H. (2024). The Faiss library. arXiv:2401.08281. https://arxiv.org/abs/2401.08281

Motivation

Algorithm

Quantization recap

Cartesian product structure

Encoding and storage

Distance computation

Inverted file with PQ (IVFADC)

Training cost

Variants

Optimized product quantization

Inverted Multi-Index

Locally optimized PQ

Additive and residual quantization

Fast scan and SIMD

Performance characteristics

Comparison with other ANN methods

Production systems

Use in modern AI workflows

Mathematical perspective

Limitations

See also

References

Improve this article

Related Articles

HNSW

MTEB (Massive Text Embedding Benchmark)

Actor model

Model deployment

QLoRA

GPTQ

Motivation

Algorithm

Quantization recap

Cartesian product structure

Encoding and storage

Distance computation

Inverted file with PQ (IVFADC)

Training cost

Variants

Optimized product quantization

Inverted Multi-Index

Locally optimized PQ

Additive and residual quantization

Fast scan and SIMD

Performance characteristics

Comparison with other ANN methods

Production systems

Use in modern AI workflows

Mathematical perspective

Limitations

See also

References

Related Articles

HNSW

MTEB (Massive Text Embedding Benchmark)

Actor model

Model deployment

QLoRA

GPTQ