Qdrant

Qdrant (pronounced "quadrant") is an open-source vector database and similarity search engine written in Rust and designed for high-performance retrieval over high-dimensional data. Founded in 2021 in Berlin, Germany by Andre Zayarni (CEO) and Andrey Vasnetsov (CTO), Qdrant has become one of the most widely deployed vector engines in production artificial intelligence systems. By March 2026 the project had surpassed 250 million package downloads, accumulated more than 29,000 stars on GitHub, and was used by enterprises including Tripadvisor, HubSpot, Canva, OpenTable, Roche, Bosch, Deutsche Telekom, Flipkart, and Elon Musk's xAI for the Grok chatbot [1][2][3].

The company has raised approximately $85.5 million across three rounds: a $7.5 million seed in April 2023 led by Unusual Ventures, a $28 million Series A in January 2024 led by Spark Capital, and a $50 million Series B in March 2026 led by AVP [4][5][6]. Qdrant is distributed under the Apache License 2.0 and offered both as a self-hosted binary and as a managed service through Qdrant Cloud, Qdrant Hybrid Cloud, and Qdrant Private Cloud.

history and founding

The seeds of Qdrant were planted before the company itself existed. Andrey Vasnetsov, a machine learning engineer who had worked on search and recommendation systems at Mail.Ru Group, the social-matching startup dotin, Tinkoff Bank, and the talent platform MoBerries, repeatedly ran into the same problem: existing vector similarity tools did not behave like databases. Libraries such as FAISS and Annoy were excellent for static benchmark datasets but lacked persistence, payload storage, filtering, replication, and an operational interface suitable for production services [3][7].

Vasnetsov began writing a vector search engine from scratch in Rust and published the first public release on GitHub in 2021. The early developer response was strong enough to convince him and Andre Zayarni, who had previously held engineering and product leadership roles at Bigpoint, MoBerries, and the sports-tech company Playtomic, to incorporate the project as a company. Qdrant GmbH was registered in Berlin in 2021, with Zayarni as CEO and Vasnetsov as CTO [3][7].

The choice of Rust was deliberate. The team wanted memory safety without garbage collection, low and predictable latency under heavy load, and direct access to SIMD instructions for distance calculations. That decision later defined Qdrant's performance profile: by 2024 it routinely ranked among the lowest-memory and lowest-latency engines in community vector search benchmarks, and the codebase remained more than 87 percent Rust at the v1.17 milestone [3][8].

funding rounds

The company's funding history reflects the rapid growth of investor interest in retrieval infrastructure for generative AI.

Round	Date	Amount	Lead investor	Other participants	Source
Seed	19 April 2023	$7.5M	Unusual Ventures	42CAP, IBB Ventures, angels including Amr Awadallah	[4]
Series A	23 January 2024	$28M	Spark Capital	Unusual Ventures, 42CAP	[5]
Series B	12 March 2026	$50M	AVP	Bosch Ventures, Unusual Ventures, Spark Capital, 42CAP	[6]
Total		~$85.5M

Unusual Ventures partner John Vrionis, who led the seed round, characterized the timing as the moment vector search shifted from a research curiosity to core infrastructure for production AI applications. Spark Capital partner Yasmin Razavi joined Qdrant's board following the Series A. The Series B, led by Atlantic Bridge spin-out AVP, was announced on 12 March 2026 at an undisclosed valuation; participating investors included Bosch Ventures, which connected Qdrant to enterprise users in industrial automation and automotive software [4][5][6].

company growth

By the time of the Series B announcement, Qdrant had grown to more than 100 employees distributed across over 20 countries, with engineering concentrated in Berlin and a Discord community of more than 9,000 developers. The company holds events including the annual Vector Space Day in Berlin, which drew more than 400 attendees in 2025 [1][9].

architecture

Qdrant runs as a single statically linked Rust binary that exposes REST and gRPC APIs. The same binary handles standalone, replicated, and sharded deployments, with a custom storage engine, a custom key-value store called Gridstore, and a heavily modified HNSW graph index at its core. The system is intentionally focused on vector search rather than general-purpose data storage, and it rejects features (full transactional ACID guarantees, SQL parsing, generic document modeling) that would compromise its primary workload [10].

why Rust

The choice of Rust as the implementation language has significant architectural implications. Rust provides several advantages for a vector database:

No garbage collector. Rust uses a compile-time ownership model instead of garbage collection. This eliminates GC pauses that can cause latency spikes during search operations. In contrast, databases written in Go (such as Weaviate and the Go components of Milvus) or those with JVM components are subject to GC-related tail latency.
Memory safety with minimal overhead. Rust's borrow checker catches most memory errors at compile time, avoiding the segfaults and buffer overflows that plague C/C++ codebases. Slice and array accesses are still bounds-checked at runtime, but the compiler can often elide those checks, so the cost is small in practice.
Zero-cost abstractions. Rust's generic and trait system allows high-level API design without sacrificing performance. Iterator chains and closures compile down to the same assembly as hand-written loops.
SIMD acceleration. Qdrant leverages Rust's SIMD (Single Instruction, Multiple Data) support to accelerate distance calculations on both x86-64 (AVX2 and AVX-512) and ARM (NEON) architectures. This hardware-level optimization is critical for vector distance calculations, which sit in the innermost loop of every search.
Async I/O with io_uring. Qdrant uses io_uring for asynchronous disk I/O on Linux, maximizing disk throughput even on network-attached storage. This matters most for disk-based search modes where quantized vectors are stored on SSD [10].

The practical result is a compact memory footprint and predictable latency. In community benchmarks of the major vector databases, Qdrant consistently ranks near the lowest in memory consumption per vector and near the highest in queries per second at fixed recall [8].

segments and storage

A Qdrant collection is divided into one or more segments. Each segment is a self-contained unit that owns its own vector storage, payload storage, HNSW index, and ID mapper. Segments come in two states:

Appendable segments accept inserts, updates, and deletes. They use mutable data structures and a flat (brute-force) index, and are optimized for write throughput.
Non-appendable (sealed) segments are immutable except for tombstone-style deletions. They use compacted storage, fully built HNSW indexes, optional quantization, and are optimized for read performance [11].

A background optimizer thread merges small segments, converts appendable segments to sealed ones, and rebuilds indexes. This is conceptually similar to how an LSM-tree database compacts SSTables, but adapted for vector indexes.

Durability is provided by a write-ahead log (WAL). Every operation is recorded in the WAL before being applied to a segment. If the process crashes before changes are flushed, the WAL is replayed during startup. Each segment also stores a per-point version, so out-of-order replays cannot resurrect older state [11].
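The interaction between WAL replay and per-point versions can be illustrated with a minimal sketch. The `Operation` and `Segment` classes below are hypothetical simplifications rather than Qdrant's internal types; they only show why a version check makes replay idempotent, even when operations arrive out of order.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    version: int     # position of the operation in the write-ahead log
    point_id: int
    payload: dict


@dataclass
class Segment:
    points: dict = field(default_factory=dict)     # point_id -> payload
    versions: dict = field(default_factory=dict)   # point_id -> last applied version

    def apply(self, op: Operation) -> bool:
        """Apply op unless the segment already holds the same or a newer version."""
        if self.versions.get(op.point_id, -1) >= op.version:
            return False  # stale or duplicate replay, skip it
        self.points[op.point_id] = op.payload
        self.versions[op.point_id] = op.version
        return True


def replay(wal, segment):
    """Re-apply every logged operation and count how many changed state."""
    return sum(segment.apply(op) for op in wal)


wal = [
    Operation(1, 7, {"category": "news"}),
    Operation(2, 7, {"category": "blog"}),
    Operation(3, 8, {"category": "news"}),
]
segment = Segment()
applied_first = replay(wal, segment)

# Simulate a restart that replays the whole log again, in reverse order.
applied_again = replay(reversed(wal), segment)
print(applied_first, applied_again, segment.points[7])
```

The second replay changes nothing, and point 7 keeps the payload written by its newest operation.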

Gridstore

Until version 1.13, Qdrant used RocksDB as the underlying key-value store for payload and sparse vector data. The team identified several mismatches between RocksDB's design and Qdrant's actual workload: the LSM-tree compaction cycle introduced random latency spikes, the system offered far more configuration knobs than Qdrant needed, and the C++ interop boundary slowed iteration on storage features [12].

In January 2025, Qdrant 1.13 shipped Gridstore, a custom key-value store designed specifically for sequential integer keys (which is all Qdrant ever stores). Gridstore is structured around three layers: a Data Layer with a pointer-array tracker, a Mask Layer that tracks block usage, and a Gaps Layer that locates free space efficiently. Because there is no compaction phase, write latency is more predictable, and the storage engine can be debugged without crossing a foreign-function-interface boundary [12].

HNSW implementation

Qdrant's index is built on HNSW (Hierarchical Navigable Small World) graphs, the dominant approximate nearest neighbor algorithm in production vector search. Unlike a textbook implementation, however, Qdrant's HNSW is integrated with the payload index. Filter conditions are applied during graph traversal rather than as a pre-filter on candidates or a post-filter on results. The HNSW graph is also extended with additional edges based on indexed payload values, so a filtered traversal does not become disconnected when many neighbors are pruned [10][13].

The HNSW index supports configurable parameters:

Parameter	Default	Description
`m`	16	Number of edges per node in the HNSW graph
`ef_construct`	100	Size of the dynamic candidate list during index construction
`ef`	`ef_construct`	Search-time candidate list size, controls speed/recall trade-off
`full_scan_threshold`	10,000 (KB)	Vector data smaller than this size, in kilobytes, is searched by brute force instead of through the graph
`max_indexing_threads`	0 (auto)	Number of threads for index building
`on_disk`	false	Whether to store the HNSW graph on disk instead of in memory
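As an illustration, these parameters might be supplied when a collection is created over REST. The sketch below only builds the JSON bodies and sends nothing; the collection name and the value 128 are hypothetical, and the field names (in particular `hnsw_ef` for the search-time `ef`) reflect the REST schema as understood here and should be checked against the API reference for the version in use.

```python
import json

# Hypothetical collection name, used only for illustration.
collection_name = "documents"

# Body for PUT /collections/{collection_name}: index-time HNSW settings.
create_body = {
    "vectors": {"size": 1536, "distance": "Cosine"},
    "hnsw_config": {
        "m": 16,
        "ef_construct": 100,
        "full_scan_threshold": 10000,
        "max_indexing_threads": 0,
        "on_disk": False,
    },
}

# Body for POST /collections/{collection_name}/points/query:
# the search-time candidate list size is passed per request.
query_body = {
    "query": [0.15] * 1536,
    "params": {"hnsw_ef": 128},
    "limit": 5,
}

print(f"PUT /collections/{collection_name}")
print(json.dumps(create_body, indent=2))
```

Raising `hnsw_ef` on an individual query trades latency for recall without rebuilding the index, whereas `m` and `ef_construct` take effect only when the graph is built.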

Version 1.13 added GPU-accelerated HNSW indexing. The implementation uses the Vulkan graphics API rather than CUDA, making it portable across NVIDIA, AMD, and Intel GPUs. Qdrant reports up to 10x faster index construction relative to CPU-only builds on equivalent hardware. GPU indexing is enabled with the environment variable QDRANT__GPU__INDEXING=1 and a Docker image suffix such as qdrant/qdrant:v1.13.0-gpu-nvidia [13][14].

Version 1.13 also introduced HNSW graph compression based on delta encoding, which reduces index memory by up to 30 percent without measurable recall loss. Version 1.16 added Inline Storage, which co-locates quantized vector data inside the HNSW graph nodes so that traversal touches fewer pages of disk or memory [13][15].

For highly selective filters, version 1.16 also implemented the ACORN-1 search method (an adaptation of the ACORN approach to filtered ANN search). When direct neighbors of a node are excluded by a filter, ACORN looks one hop further to recover graph connectivity, improving recall on queries that combine vector similarity with rare metadata conditions [13].
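The one-hop expansion can be sketched on a toy adjacency list. This is a simplified illustration of the idea only, not Qdrant's implementation; the function name and the graph are hypothetical.

```python
def filtered_neighbors(graph, node, allowed):
    """Return filter-passing neighbors of `node`, looking one hop past excluded ones."""
    result = []
    for neighbor in graph[node]:
        if neighbor in allowed:
            candidates = [neighbor]
        else:
            # The direct neighbor fails the filter: consider its neighbors instead.
            candidates = [n for n in graph[neighbor] if n in allowed]
        for candidate in candidates:
            if candidate != node and candidate not in result:
                result.append(candidate)
    return result


# Nodes 1 and 2 fail the filter, so node 0 has no direct passing neighbors.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
allowed = {0, 3, 4}
reachable = filtered_neighbors(graph, 0, allowed)
print(reachable)  # [3, 4]
```

Without the extra hop, a traversal starting at node 0 would stop immediately, because both of its direct neighbors are excluded.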

quantization

Quantization is the central tool Qdrant uses to keep large indexes in RAM or to make disk-based search feasible. The system has shipped three families of vector quantization, all of which can be enabled and disabled per collection without rebuilding the underlying vectors.

Quantization	Compression	Speed-up	Typical recall	Introduced
Scalar (int8)	4x	up to 2x	~0.99	v1.1, March 2023
Product quantization	up to 64x	0.5x	~0.7 at 4x, ~0.6 at 12x	v1.2, May 2023
Binary (1-bit)	32x	up to 40x	~0.95 to 0.99*	v1.5, September 2023
Binary (1.5-bit)	24x	~30x	similar to 1-bit	2025
Binary (2-bit)	16x	~20x	similar to 1-bit	2025

*Recall figures for binary quantization assume 1024-dimensional or larger embeddings with appropriate oversampling and rescoring [16][17][18].
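The compression column follows from the number of bits stored per vector component. A short worked calculation, assuming a 1536-dimensional float32 embedding (the dimensionality is chosen only for illustration):

```python
def bytes_per_vector(dims, bits_per_component):
    """Storage needed for one vector at a given bit width per component."""
    return dims * bits_per_component / 8


dims = 1536
baseline = bytes_per_vector(dims, 32)  # float32 original

ratios = {}
for name, bits in [("scalar int8", 8), ("binary 2-bit", 2), ("binary 1-bit", 1)]:
    size = bytes_per_vector(dims, bits)
    ratios[name] = baseline / size
    print(f"{name}: {size:.0f} bytes per vector, {ratios[name]:.0f}x smaller")
```

These figures cover only the quantized copy. When rescoring is used, the full-precision vectors must still be stored somewhere, so the RAM saving depends on where those originals are kept.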

Scalar quantization, introduced in version 1.1 (March 2023), maps each float32 component to an int8 integer using a linear transformation calibrated from a configurable quantile (typically 0.99). This reduces vector memory by approximately 75 percent, allows SIMD-accelerated 8-bit dot products, and on internal benchmarks loses 0.3 percent recall or less while improving latency by 28 to 60 percent on real datasets [17].
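A minimal sketch of quantile-calibrated scalar quantization follows. It assumes symmetric clipping of both tails and stores codes as unsigned values from 0 to 255 for simplicity; Qdrant's exact calibration and integer layout may differ, and the function names are hypothetical.

```python
import random


def calibrate(vectors, quantile=0.99):
    """Pick clipping bounds that keep the central `quantile` share of all components."""
    values = sorted(x for v in vectors for x in v)
    tail = (1.0 - quantile) / 2.0
    lo = values[int(tail * (len(values) - 1))]
    hi = values[int((1.0 - tail) * (len(values) - 1))]
    return lo, hi


def quantize(vector, lo, hi):
    """Map each float linearly onto 0..255, clamping outliers to the bounds."""
    scale = (hi - lo) / 255.0
    return [min(255, max(0, round((x - lo) / scale))) for x in vector]


def dequantize(codes, lo, hi):
    """Approximate the original floats from their 8-bit codes."""
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]


random.seed(0)
vectors = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(200)]
lo, hi = calibrate(vectors)
codes = quantize(vectors[0], lo, hi)
restored = dequantize(codes, lo, hi)

# For components inside the clipping range, the error is at most half a step.
max_error = max(
    (abs(a - b) for a, b in zip(vectors[0], restored) if lo <= a <= hi),
    default=0.0,
)
print(max_error <= (hi - lo) / 255.0 / 2 + 1e-9)
```

Clipping at a quantile rather than at the true minimum and maximum spends the 256 available levels on the range where almost all values fall, at the cost of saturating rare outliers.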

Product quantization, added in version 1.2 (May 2023), splits each vector into chunks and runs k-means clustering with k=256 on each chunk independently. Compression ratios can range from 4x up to 64x. The trade-off is much steeper than scalar quantization: for a Glove-100 dataset, 4x PQ retains 0.71 mean precision (almost identical to the float baseline), but 12x PQ drops to 0.59. PQ is recommended only when memory constraints clearly outweigh recall [18].

Binary quantization, announced in September 2023 along with the partnership with OpenAI, maps each component to a single bit (1 if positive, 0 otherwise). For 1536-dimensional text-embedding-3-small vectors and 3072-dimensional text-embedding-3-large vectors from OpenAI, Qdrant reports recall of 0.98 to 0.997 with appropriate oversampling. Storage drops by a factor of 32, and search becomes up to 40 times faster because distance computations reduce to bitwise XOR and population count [16]. Binary quantization performs poorly on embeddings smaller than 1024 dimensions, which led to the 1.5-bit and 2-bit variants introduced in 2025 for medium-dimensional models [16].

All quantization modes support rescoring: the candidate list is built using the compressed representation, then the top results are re-ranked using the full-precision vectors. Asymmetric quantization, which keeps query vectors in a higher-precision representation than stored vectors, is also supported.
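The combination of binary quantization, oversampling, and rescoring can be sketched in a few lines. This is an illustrative toy under simple assumptions (sign-bit codes packed into Python integers, dot product as the full-precision score), not Qdrant's implementation.

```python
def binarize(vector):
    """Pack sign bits into one integer: bit i is 1 if component i is positive."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Distance between two codes: XOR followed by a population count."""
    return bin(a ^ b).count("1")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query, vectors, limit=1, oversampling=2.0):
    codes = [binarize(v) for v in vectors]  # precomputed at index time in practice
    q = binarize(query)
    # Stage 1: build an oversampled shortlist using only the compressed codes.
    shortlist_size = max(limit, int(limit * oversampling))
    shortlist = sorted(range(len(vectors)), key=lambda i: hamming(q, codes[i]))
    shortlist = shortlist[:shortlist_size]
    # Stage 2: rescore the shortlist with the full-precision vectors.
    rescored = sorted(shortlist, key=lambda i: dot(query, vectors[i]), reverse=True)
    return rescored[:limit]


vectors = [
    [0.9, 0.1, -0.2, 0.4],
    [0.1, 0.9, -0.1, 0.2],
    [-0.5, -0.4, 0.3, -0.1],
]
query = [0.2, 0.8, -0.1, 0.1]
top = search(query, vectors)
print(top)  # [1]
```

Vectors 0 and 1 have identical sign patterns, so the binary codes cannot tell them apart; rescoring with the original floats is what places vector 1 first.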

choosing a quantization strategy

Scenario	Recommended quantization	Rationale
General production workload	Scalar	Best balance of memory savings and recall; minimal accuracy loss
1536+ dimensional embeddings	Binary (1-bit)	Large embeddings tolerate aggressive compression with rescoring
Medium-dimensional embeddings (768 to 1024)	Binary (1.5-bit or 2-bit)	Better recall than 1-bit at lower compression
Maximum compression needed	Product quantization	Highest ratio but lowest recall; best for cold storage tiers
Mixed workload with precision-sensitive queries	Asymmetric quantization	Full-precision queries with compressed storage

Quantization can be configured per collection, and re-quantization is run by the optimizer in the background when the configuration changes [16][17].

key features

payload and filtering

Every vector stored in Qdrant can carry an arbitrary JSON payload as metadata. The filtering engine supports a comprehensive list of conditions:

Condition	Use case
`match`	Exact match on keyword, integer, boolean, or UUID fields
`match_any` (v1.1+)	IN-style matching against a list of values
`match_except` (v1.2+)	NOT-IN-style filtering
`range`	Numeric comparisons (gt, gte, lt, lte)
`datetime_range` (v1.8+)	RFC 3339 datetime comparisons
`geo_bounding_box`	Rectangular geographic region
`geo_radius`	Circular geographic region
`geo_polygon`	Arbitrary geographic polygons with holes
`text_match`	Token-level text search with configurable tokenizer
`phrase_match` (v1.15+)	Exact phrase matching with token order
`nested` (v1.2+)	Apply conditions to elements of an array of objects
`values_count`	Filter on array length
`is_empty` / `is_null`	Handle missing or null fields
`has_id`	Filter by point IDs
`has_vector` (v1.13+)	Filter by presence of a named vector

Filters are combined using three boolean operators:

Combinator	Logic
`must`	Logical AND over all conditions
`should`	Logical OR over conditions
`must_not`	Logical NOT over conditions
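The semantics of the three combinators can be expressed as a small predicate over a payload. The evaluator below is a hypothetical sketch that handles only `match` and `range` conditions, and it assumes that the clauses are combined with AND and that a non-empty `should` requires at least one matching condition.

```python
def matches(condition, payload):
    """Evaluate a single match or range condition against a payload dict."""
    value = payload.get(condition["key"])
    if "match" in condition:
        return value == condition["match"]["value"]
    if "range" in condition:
        if value is None:
            return False
        r = condition["range"]
        return (
            ("gt" not in r or value > r["gt"])
            and ("gte" not in r or value >= r["gte"])
            and ("lt" not in r or value < r["lt"])
            and ("lte" not in r or value <= r["lte"])
        )
    raise ValueError(f"unsupported condition: {condition}")


def passes(filter_, payload):
    """must = AND, should = OR (if present), must_not = NOT of each condition."""
    must = filter_.get("must", [])
    should = filter_.get("should", [])
    must_not = filter_.get("must_not", [])
    return (
        all(matches(c, payload) for c in must)
        and (not should or any(matches(c, payload) for c in should))
        and not any(matches(c, payload) for c in must_not)
    )


recent_news = {
    "must": [{"key": "category", "match": {"value": "news"}}],
    "must_not": [{"key": "year", "range": {"lt": 2020}}],
}
print(passes(recent_news, {"category": "news", "year": 2023}))  # True
print(passes(recent_news, {"category": "news", "year": 2019}))  # False
print(passes(recent_news, {"category": "blog", "year": 2023}))  # False
```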

Because filters are applied during graph traversal rather than as a separate step, filtered searches perform well even when the filter is highly selective (matching only a small fraction of the dataset). Other vector databases often struggle in this scenario, since pre-filtering can fragment the graph and post-filtering wastes computation on irrelevant candidates [13][20].

payload indexes

To accelerate filtered searches, Qdrant supports creating explicit payload indexes on frequently filtered fields:

Index type	Field type	Description
Keyword	String	Hash-based index for exact string matching
Integer	Integer	Range index for numerical filtering
Float	Float	Range index for floating-point fields
Bool	Boolean	Bitmap index for boolean fields
Geo	GeoPoint	R-tree index for geographic queries
Full-text	Text	Inverted index with configurable tokenizer
Datetime	Datetime	Range index for temporal filtering
UUID	UUID	Hash-based index for UUID matching

Full-text indexes can be configured with multiple tokenizers (word, whitespace, prefix, multilingual), Snowball stemming, ASCII folding for diacritics, and language-specific or custom stopword lists. Most index types can be stored on disk to conserve RAM. Specialized variants include the tenant index (is_tenant: true) for multi-tenant collections and the principal index for high-cardinality numeric fields used in nearly every query [13].

Creating a payload index before inserting data is more efficient than adding it afterward, because the optimizer can build the HNSW graph with the payload index already in place rather than rebuilding it later [10].

multitenancy

Qdrant provides built-in multitenancy support, allowing many tenants to share a single collection while remaining logically isolated. Tenants are typically separated by a payload field (such as tenant_id) that is marked is_tenant: true, which causes the optimizer to keep each tenant's vectors in dedicated segments. This avoids the cost of a separate collection per tenant and keeps the HNSW graph for each tenant compact and efficient.

Version 1.16 introduced tiered multitenancy, which addresses the noisy-neighbor problem in shared deployments. Tenants can be promoted between hot, warm, and cold tiers based on activity. Hot tenants keep their vectors in RAM, warm tenants use quantized vectors with rescoring, and cold tenants live entirely on disk. Promotions and demotions happen in the background without rebuilding indexes [13][19].

sparse vectors and hybrid search

Version 1.7 (December 2023) added native support for sparse vectors, enabling hybrid search that combines semantic similarity (dense vectors) with lexical relevance (sparse vectors produced by BM25, TF-IDF, or SPLADE models). Sparse vectors are indexed exactly (no approximation) and support both in-memory and on-disk storage. An optional inverse document frequency (IDF) modifier weights tokens by rarity at query time, which improves quality for natural-language queries.

A single point can carry multiple named vectors. The Query API supports a fan-out of prefetches across vector fields and combines results using fusion methods including Reciprocal Rank Fusion (RRF), Distribution-Based Score Fusion (DBSF), and Weighted RRF [21]. This capability is fundamental for retrieval-augmented generation (RAG) pipelines, where combining keyword and semantic matching often outperforms either approach alone.
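Reciprocal Rank Fusion itself is simple enough to sketch directly. The version below uses the commonly cited constant k = 60 as an assumption; the constant Qdrant uses internally may differ, and the point IDs are invented for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores = {}
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] = scores.get(point_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense = [3, 1, 2]    # hypothetical result IDs from a dense-vector prefetch
sparse = [1, 4, 3]   # hypothetical result IDs from a sparse-vector prefetch
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)  # [1, 3, 4, 2]
```

Because RRF uses only ranks, it needs no normalization between dense similarity scores and sparse lexical scores, which live on different scales.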

multivector and late interaction

Version 1.10 (July 2024) introduced multivector support, in which a single point can carry an arbitrary set of vectors of equal dimensionality. This is the natural representation for late interaction models such as ColBERT, where each token in a document is encoded into its own vector and similarity is computed using the MaxSim operator: for each query token, take the maximum similarity over document tokens, then sum over query tokens [22].
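The MaxSim operator described above can be written out directly. The two-dimensional token vectors here are made up for illustration; real ColBERT-style embeddings have many more dimensions and tokens.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_tokens, doc_tokens):
    """For each query token take its best-matching document token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_b = [[0.7, 0.7]]

score_a = max_sim(query_tokens, doc_a)  # 0.9 + 0.8
score_b = max_sim(query_tokens, doc_b)  # 0.7 + 0.7
print(score_a > score_b)  # True
```

Document A scores higher because each query token finds a closely matching token of its own, whereas document B offers a single compromise vector for both.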

A typical pipeline uses a multistage Query API call that prefetches candidates with a compressed dense vector, re-ranks with full-precision dense vectors, and finally re-ranks with ColBERT multivectors. Multivectors interoperate with all quantization modes [22].

query API and advanced retrieval

Version 1.10 also generalized the search interface into a unified Query API that supports nearest-neighbor search, recommendations (based on positive and negative example IDs or vectors), discovery search (finding points in regions defined by example pairs), counting, and faceted aggregation. The same API supports nested prefetches, allowing complex retrieval pipelines to be expressed as a single request rather than multiple round trips.

More recent additions include Maximal Marginal Relevance (MMR) for diversity-aware reranking, score-boosting reranking that combines vector similarity with business signals, and full-text filtering with multilingual stemming and stopword support [9][13].
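Maximal Marginal Relevance can be sketched as a greedy selection that penalizes similarity to results already chosen. This is the textbook formulation under assumed parameter names (`lambda_` for the relevance weight), not necessarily how Qdrant parameterizes it.

```python
import math


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm


def mmr(query, candidates, limit, lambda_=0.5):
    """Greedily pick candidates, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))

    def score(i):
        relevance = cosine(query, candidates[i])
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_ * relevance - (1.0 - lambda_) * redundancy

    while remaining and len(selected) < limit:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.1]
candidates = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]

print(mmr(query, candidates, limit=2, lambda_=1.0))  # [1, 0]: pure relevance
print(mmr(query, candidates, limit=2, lambda_=0.3))  # [1, 2]: favors diversity
```

With a low `lambda_`, the near-duplicate candidate 0 is skipped in favor of candidate 2, which is less relevant but adds new information.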

collection management

Qdrant organizes data into collections, which contain points (vectors with optional payloads). Collection management is handled through the REST or gRPC API, or any of the official client libraries:

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient(url="http://localhost:6333")

# Create a collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE,
    ),
)

# Create a collection with multiple named vectors
client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=768, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)

# List collections
collections = client.get_collections()

# Get collection info (vector count, config, status)
info = client.get_collection("documents")

# Delete a collection
client.delete_collection("documents")

Collections support aliases, which allow a stable name to point at different underlying collections. This enables blue-green deployments where a new index is built in a separate collection and the alias is switched atomically once the new index is ready.
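A blue-green switch of this kind might be expressed as a single alias-update request. The sketch below only builds the JSON body for `POST /collections/aliases`; the alias and collection names are hypothetical, and the action names follow the REST schema as understood here.

```python
import json

# Hypothetical names: "production" is the stable alias that clients query,
# "documents_v2" is the freshly built collection that should take over.
switch_alias = {
    "actions": [
        {"delete_alias": {"alias_name": "production"}},
        {
            "create_alias": {
                "collection_name": "documents_v2",
                "alias_name": "production",
            }
        },
    ]
}

print("POST /collections/aliases")
print(json.dumps(switch_alias, indent=2))
```

Sending both actions in one request is what makes the switch appear atomic to clients: there is no window in which the alias resolves to nothing.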

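A simple insertion and search flow looks like this, assuming the documents collection created in the previous example still exists (that is, the final delete_collection call has not been run):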

from qdrant_client.models import PointStruct, Filter, FieldCondition, MatchValue

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(id=1, vector=[0.1] * 1536, payload={"category": "news"}),
        PointStruct(id=2, vector=[0.2] * 1536, payload={"category": "blog"}),
    ],
)

results = client.query_points(
    collection_name="documents",
    query=[0.15] * 1536,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="news"))]
    ),
    limit=5,
).points

The equivalent REST call uses the POST /collections/{name}/points/query endpoint with a JSON body.
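As a sketch of that REST call, the request can be assembled with the Python standard library. The host and port match the earlier client example, and the body fields mirror the Python call above; nothing is sent unless the commented lines are enabled against a running server.

```python
import json
import urllib.request

body = {
    "query": [0.15] * 1536,
    "filter": {
        "must": [{"key": "category", "match": {"value": "news"}}],
    },
    "limit": 5,
}

request = urllib.request.Request(
    url="http://localhost:6333/collections/documents/points/query",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending the request requires a running Qdrant instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
print(request.get_method(), request.full_url)
```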

snapshots and backup

Qdrant provides a comprehensive snapshot system for backups and data migration.

collection snapshots

A snapshot is a tar archive containing the full data and configuration of a specific collection on a specific node at a specific point in time. Snapshots are created through the REST API:

Operation	Endpoint	Description
Create	`POST /collections/{name}/snapshots`	Creates a new snapshot of the collection
List	`GET /collections/{name}/snapshots`	Lists all available snapshots
Download	`GET /collections/{name}/snapshots/{snapshot_name}`	Downloads the snapshot file
Delete	`DELETE /collections/{name}/snapshots/{snapshot_name}`	Removes a snapshot
Recover	`PUT /collections/{name}/snapshots/recover`	Restores a collection from a snapshot file or URL

full storage snapshots

Full storage snapshots capture the entire Qdrant instance, including all collections, aliases, and cluster metadata. The snapshot name follows the format full-snapshot-<timestamp>.snapshot. This is useful for complete instance backups or for migrating an entire Qdrant deployment to new hardware.

snapshot storage options

Snapshots can be stored locally on the filesystem or in S3-compatible object storage. The storage backend is configured in the Qdrant configuration file:

Local storage: snapshots are saved to the configured snapshots_path directory.
S3 storage: snapshots are uploaded to an S3 bucket using the AWS SDK, which also supports S3-compatible services such as MinIO and Google Cloud Storage.

For disaster recovery, Qdrant Cloud provides automatic incremental backups for AWS and GCP clusters, where each backup contains only the data that changed since the previous backup [23].

distributed deployment and clustering

Qdrant supports distributed deployment for horizontal scaling and high availability.

consensus with Raft

In distributed mode, Qdrant uses the Raft consensus protocol to maintain agreement on cluster topology and collection structure. All metadata operations (creating collections, changing replication factors, transferring shards) are routed through the consensus layer. Data operations (upserts and queries) bypass consensus and go directly to the relevant shards using optimistic replication, which is what allows the system to handle high write rates without the consensus layer becoming a bottleneck [10].

Write consistency can be tuned per request. The write_ordering option supports weak, medium, and strong modes; the write_consistency_factor controls how many replicas must acknowledge a write before it is considered successful. Reads can also request a consistency level via the consistency parameter.

sharding

Collections are divided into shards, each of which is an independent store of points with its own HNSW index and payload indexes. Searches are dispatched to all shards in parallel and results are merged.
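The scatter-gather pattern can be illustrated with a toy in which each shard is a dictionary of points and scoring is an exact dot product. Real shards answer from their HNSW indexes; the data and function names here are hypothetical.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_shard(shard, query, limit):
    """Return the shard-local top results as (score, point_id) pairs."""
    scored = ((dot(query, vector), point_id) for point_id, vector in shard.items())
    return heapq.nlargest(limit, scored)


def search_collection(shards, query, limit):
    """Query every shard, then merge the partial results into a global top list."""
    partials = [search_shard(shard, query, limit) for shard in shards]
    return heapq.nlargest(limit, (hit for partial in partials for hit in partial))


shards = [
    {1: [1.0, 0.0], 2: [0.0, 1.0]},
    {3: [0.8, 0.6], 4: [0.6, 0.8]},
]
hits = search_collection(shards, query=[1.0, 0.0], limit=2)
print([point_id for _, point_id in hits])  # [1, 3]
```

Each shard must return its own top `limit` hits, because the global top results could all come from a single shard.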

Sharding concept	Description
Auto-sharding	Qdrant automatically distributes shards across available nodes
Custom shard keys	Points can be routed by a payload field, similar to a partition key
Manual shard placement	Operators can pin specific shards to specific nodes
Resharding	Available in Qdrant Cloud for rebalancing data across nodes after scaling
Shard transfer	Moves shards between nodes with foreground or streaming modes

Custom shard keys are particularly useful for multi-tenant deployments where each tenant should be physically isolated to a subset of nodes; combined with the tenant payload index, this enables strong tenant locality [10].

replication

Each shard can have multiple replicas distributed across different nodes. The replication factor determines how many copies of each shard exist. For production systems, a replication factor of at least 2 is recommended to ensure availability during node failures. Clusters with three or more nodes and replication enabled can continue to serve all operations while one node is down, and they also benefit from load balancing across replicas.

When a node fails, the remaining replicas continue serving queries. When the failed node recovers, consensus triggers a replication process to bring the recovering node up to date with any mutations it missed. In Qdrant Cloud, replication factor changes and shard rebalancing are handled automatically; in self-hosted deployments they are operator-driven [10].

deployment options

Qdrant offers several deployment models suitable for different use cases.

Option	Description	Operator
Self-hosted (binary or Docker)	Run anywhere with the official binary, Docker image, or Helm chart	User
Qdrant Cloud (Managed)	Fully managed clusters on AWS, GCP, or Azure with usage-based billing	Qdrant
Qdrant Hybrid Cloud	Control plane managed by Qdrant; data plane runs in customer infrastructure	Qdrant control, customer data
Qdrant Private Cloud	Air-gapped, fully isolated deployment without any link to Qdrant's services	Customer
Qdrant Edge	Embedded library for on-device retrieval	Customer

Qdrant Cloud

Qdrant Cloud, launched in 2023 alongside the v1.0 server release, is the company's fully managed service. It is available on AWS, Google Cloud, and Microsoft Azure. The service offers:

A free tier with a single-node cluster (0.5 vCPU, 1 GB RAM, 4 GB disk) for prototyping.
Standard tier with usage-based hourly billing for compute, memory, and storage.
Premium tier with a 99.9 percent SLA, single sign-on, and custom contracts.

Qdrant Cloud achieved SOC 2 Type II compliance on 7 April 2024, certified across the Security, Availability, and Confidentiality trust service criteria, and is also HIPAA compliant. Hardened clusters run in unprivileged containers with strict network policies and at-rest encryption [24][25]. In April 2026, Qdrant Cloud added GPU-accelerated indexing as a managed feature, multi-availability-zone clusters, and audit logging for enterprise compliance [14].

Qdrant Hybrid Cloud

Launched on 15 April 2024, Qdrant Hybrid Cloud uses a Kubernetes-native bring-your-own-cloud (BYOC) architecture. The control plane runs in Qdrant's infrastructure while the data plane and all customer vectors stay inside the customer's VPC. Supported targets include AWS, GCP, Azure, Oracle Cloud, DigitalOcean, Vultr, OVHcloud, Scaleway, Red Hat OpenShift, and on-premises VMware. Twelve launch partners shipped integrations on day one, including LangChain, LlamaIndex, Airbyte, Jina AI, and Haystack [26].

Qdrant Private Cloud

For customers with strict data sovereignty or air-gap requirements, Qdrant Private Cloud installs the same Kubernetes-native stack entirely inside the customer's environment with no link back to Qdrant's services. The customer is fully responsible for the operational lifecycle in this mode [25].

Qdrant Edge

Announced on 29 July 2025 and currently in private beta, Qdrant Edge is an embedded library that runs the same retrieval primitives in-process on resource-constrained devices. Target use cases include robotics with on-device perception, mobile devices that need offline RAG, point-of-sale systems, and IoT predictive-maintenance agents. Qdrant Edge supports filterable HNSW, hybrid search, and multivector search, while running synchronously without background threads to fit on-device requirements [27].

comparison with alternatives

Qdrant competes with both pure-play vector databases and vector extensions for general-purpose data systems. The landscape is crowded, and each system has different strengths.

Feature	Qdrant	Pinecone	Weaviate	Milvus	Chroma	pgvector
Language	Rust	Proprietary (managed)	Go	Go and C++	Python and Rust	C (PostgreSQL extension)
Open source	Yes (Apache 2.0)	No	Yes (BSD-3)	Yes (Apache 2.0)	Yes (Apache 2.0)	Yes (PostgreSQL License)
Deployment	Self-hosted, Cloud, Hybrid, Private, Edge	Managed cloud only	Self-hosted or Cloud	Self-hosted or Zilliz Cloud	Self-hosted or Cloud	PostgreSQL extension
Sparse vectors	Yes (native)	Yes	Yes (modular)	Yes	No	Yes (`sparsevec` type)
Multivector / late interaction	Yes (native)	Limited	Yes (modular)	Yes	No	No
Filtering during search	Yes (in-graph)	Yes (metadata)	Yes (inverted index plus vector)	Yes (attribute filtering)	Yes (metadata)	SQL `WHERE` clauses
GPU-accelerated indexing	Yes (vendor-agnostic, Vulkan)	N/A (managed)	Yes (NVIDIA CAGRA)	Yes (NVIDIA CAGRA)	No	No
Quantization	Scalar, Binary, PQ, asymmetric	Limited	Yes	Yes (DiskANN, IVF-PQ)	No	No
Distributed consensus	Raft	Proprietary	Raft (since 1.25)	etcd	None	PostgreSQL replication
Multitenancy	Tiered, native	Namespaces	Tenant-based	Partition key	Collection-based	Schema-based
Typical use case	Performance-sensitive filtered search at any scale	Managed simplicity	Hybrid search with knowledge-graph features	Billion-scale enterprise deployments	Quick prototypes and small RAG apps	Vector search inside an existing PostgreSQL stack

Qdrant's clearest performance advantage is in filtered search, where the integrated in-graph filtering avoids the recall cliffs that affect pre-filtering and the wasted compute of post-filtering. In Qdrant's published benchmarks (covering datasets such as dbpedia-openai-1M-angular, deep-image-96-angular, gist-960-euclidean, and glove-100-angular), Qdrant delivers the highest queries-per-second at fixed recall on most configurations and ranks among the lowest in latency [8][13]. Independent reviews generally agree that Qdrant and Milvus are the two strongest open-source contenders, with Pinecone leading on pure operational simplicity for managed-only customers [28].

Compared to Pinecone, Qdrant is open source and can be self-hosted, which lowers cost for teams willing to manage their own infrastructure. Compared to Weaviate, Qdrant tends to use less memory and CPU at the same scale, but lacks Weaviate's built-in GraphQL API and modular knowledge-graph features. Compared to Milvus, Qdrant is simpler to operate (single Rust binary versus a distributed system with multiple components such as etcd, Pulsar, and MinIO) but may not match Milvus's throughput at the very largest scale. Compared to Chroma, Qdrant is positioned higher on the production-readiness curve, with replication, sharding, and quantization that Chroma does not currently offer.

ecosystem and integrations

Qdrant provides official client libraries in Python, TypeScript and JavaScript, Rust, Go, Java, and C#, along with community libraries in many other languages. The Python client integrates with FastEmbed, Qdrant's lightweight embedding library that runs ONNX models directly on CPU without PyTorch or CUDA dependencies, making it possible to encode text into vectors as part of a single client call. FastEmbed supports embedding models from Sentence Transformers, Jina AI, Nomic, BAAI, and others, plus rerankers such as cross-encoders [29].

Qdrant integrates with most major LLM orchestration frameworks:

Framework	Integration
LangChain	Vector store integration for RAG pipelines
LlamaIndex	Retriever integration for document question-answering
Haystack by deepset	Document store component for search pipelines
Spring AI	Java-based AI application framework
Microsoft Semantic Kernel	Orchestration SDK for AI agents
Microsoft AutoGen	Multi-agent orchestration framework
CrewAI	Multi-agent orchestration platform
n8n	Official workflow automation node
Airbyte	ELT pipelines for ingesting embeddings
Vercel AI SDK	Edge-friendly AI framework for Next.js

Qdrant also integrates with embedding providers including OpenAI, Cohere, Jina AI, and many open-source models hosted on Hugging Face. Qdrant Cloud Inference, introduced in 2025, offers managed embedding generation alongside vector storage so that a single endpoint can serve both encoding and retrieval [9].

customers and case studies

By March 2026 Qdrant powered production AI systems at more than 50 named enterprises and a much larger long tail of startups [2]. Selected case studies illustrate the typical scale and use case.

Customer	Industry	Use case	Scale
Tripadvisor	Travel	AI Trip Planner and conversational search over reviews and images	Over 1 billion reviews, hundreds of millions of images, 11 million businesses
HubSpot	CRM and marketing	Breeze AI retrieval for customer data	Production retrieval across SaaS tenants
OpenTable	Restaurants	Sparse-vector restaurant filtering and discovery	Over 60,000 restaurants
Canva	Design	AI-powered features across the design platform	Hundreds of millions of users
Deutsche Telekom	Telecommunications	Multi-agent platform for customer service	Over 2 million conversations across 10 countries
Bazaarvoice	E-commerce	Product review retrieval	Billions of reviews
Roche	Pharmaceuticals	RAG over scientific literature	Internal enterprise deployment
Bosch	Industrial	Embedded retrieval in automotive and industrial AI	Internal enterprise deployment
xAI	AI labs	Real-time RAG over X (Twitter) for Grok	Real-time social-media data
Flipkart	E-commerce	Product search and recommendations	Indian retail scale
Voiceflow	Conversational AI	Conversational agent platform	Cloud SaaS deployment
Sprinklr	Customer experience	Unified CXM platform	Enterprise SaaS
Dust	AI assistants	Workspace AI assistants	Cloud SaaS deployment

The Tripadvisor deployment is one of the most documented public case studies. Tripadvisor's data and AI team, led by Rahul Todkar, used Qdrant to build a unified user graph that captures how users engage with hotels, restaurants, and attractions across more than 21 countries. The graph powers an AI Trip Planner and a conversational search interface that has replaced filter-based search for many users. Tripadvisor reports a two- to three-fold increase in revenue per user from those who engage with its generative AI features [30].

The xAI relationship is notable for its scale and visibility: Grok, the conversational AI assistant integrated into the X social network, uses Qdrant as the retrieval layer that gives it access to real-time posts from the platform [3].

current state

Following its $50 million Series B in March 2026, led by AVP with participation from Bosch Ventures, Unusual Ventures, Spark Capital, and 42CAP, Qdrant plans to expand its engineering and product teams, strengthen enterprise offerings, and scale global operations [6]. CEO Andre Zayarni summarized the company's positioning at the Series B announcement with a single line: "Models get the attention, but retrieval is what makes them useful in production." [6]

Qdrant is positioning itself as the core retrieval infrastructure layer for production AI systems, with a particular focus on agentic AI applications that require fast, reliable vector search across billions of objects and thousands of intermediate steps per workflow. The product strategy is built around composable vector search: rather than imposing a fixed retrieval pipeline, Qdrant exposes core capabilities (dense vectors, sparse vectors, metadata filters, multi-vector representations, custom scoring) as flexible primitives that can be combined at query time. This allows different workloads, from real-time agent reasoning to nightly batch RAG indexing, to compose retrieval logic suited to their latency, recall, and cost requirements [6].
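One common way to combine ranked results from several retrievers at query time is weighted reciprocal rank fusion (RRF). The sketch below is a generic, minimal version written for illustration; it is not taken from Qdrant's code, the function name `weighted_rrf` is hypothetical, and the constant `k=60` is the value conventionally used in the RRF literature rather than a documented Qdrant default.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse several ranked ID lists into one list using weighted RRF.

    Each list contributes weight / (k + rank) to every ID it contains,
    where rank starts at 1 for the best hit. IDs are returned best first;
    ties are broken by ID so the output is deterministic.
    """
    if len(rankings) != len(weights):
        raise ValueError("one weight is required per ranking")
    scores = defaultdict(float)
    for ranked_ids, weight in zip(rankings, weights):
        for rank, point_id in enumerate(ranked_ids, start=1):
            scores[point_id] += weight / (k + rank)
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical result lists from a dense and a sparse retriever.
dense_hits = ["a", "b", "c"]
sparse_hits = ["c", "a", "d"]

print(weighted_rrf([dense_hits, sparse_hits], [1.0, 1.0]))
print(weighted_rrf([dense_hits, sparse_hits], [1.0, 3.0]))
```

With equal weights the fused order favors `a`, which ranks well in both lists. Tripling the sparse weight moves `c`, the top sparse hit, to first place, which shows how a workload can tune the balance between retrievers without changing either retriever.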

The open-source project continues active development. As of March 2026, Qdrant 1.17 is the latest stable release. Recent releases include v1.13 (January 2025), which shipped GPU-accelerated indexing, Gridstore, and HNSW graph compression; v1.16 (late 2025), which added Inline Storage, ACORN-1, and tiered multitenancy; and v1.17 (February 2026), which introduced Relevance Feedback, audit access logging, weighted RRF, and configurable read fan-out delays [13]. Qdrant Edge is in private beta, and Qdrant Cloud Inference unifies embedding generation and retrieval. Together these initiatives suggest a trajectory in which Qdrant evolves from a vector database into a more general-purpose retrieval engine for AI applications.

references

[1] Qdrant. About Us. Retrieved 2026.
[2] Qdrant. Customers. Retrieved 2026.
[3] TechCrunch. Open source vector database startup Qdrant raises $28M. 23 January 2024.
[4] Qdrant. On Unstructured Data, Vector Databases, New AI Age, and Our Seed Round. 19 April 2023.
[5] Qdrant. Announcing Qdrant's $28M Series A Funding Round. 23 January 2024.
[6] Qdrant. We Raised $50M to Build Composable Vector Search as Core Infrastructure. 12 March 2026.
[7] Andrey Vasnetsov. CV. Retrieved 2026.
[8] Qdrant. Vector Search Benchmarks. Retrieved 2026.
[9] Qdrant. 2025 Recap: Powering the Agentic Era. January 2026.
[10] DeepWiki. qdrant/qdrant: System Architecture. Retrieved 2026.
[11] Qdrant Documentation. Storage. Retrieved 2026.
[12] Qdrant. Introducing Gridstore: Qdrant's Custom Key-Value Store. January 2025.
[13] Qdrant. Qdrant 1.13 Release Notes: GPU Indexing, Strict Mode and New Storage Engine. 23 January 2025.
[14] Yahoo Finance. Qdrant Cloud Ships Enterprise-Grade Features: GPU-Accelerated Indexing, Multi-AZ Clusters, and Audit Logging. 28 April 2026.
[15] Qdrant Documentation. Indexing. Retrieved 2026.
[16] Qdrant. Binary Quantization: A Quantum Leap in Vector Search. 18 September 2023.
[17] Qdrant. Scalar Quantization: Memory and Performance Optimization. 30 March 2023.
[18] Qdrant. Product Quantization for Vector Search. May 2023.
[19] Qdrant. Multitenancy and Custom Sharding in Qdrant. Retrieved 2026.
[20] Qdrant Documentation. Filtering. Retrieved 2026.
[21] Qdrant. Sparse Vectors. 9 December 2023.
[22] Qdrant. Late Interaction Models with Multivectors. 2024.
[23] Qdrant Documentation. Snapshots. Retrieved 2026.
[24] Qdrant. Qdrant Attains SOC 2 Type II Audit Report. 23 May 2024.
[25] Qdrant Documentation. Cloud Security. Retrieved 2026.
[26] Qdrant. Introducing Qdrant Hybrid Cloud: A New Era of Vector Search. 15 April 2024.
[27] Qdrant. Qdrant Edge: On-Device Retrieval for Embedded AI. 29 July 2025.
[28] Xenoss. Pinecone vs Qdrant vs Weaviate: Best Vector Database. Retrieved 2026.
[29] Qdrant. FastEmbed: Qdrant's Efficient Python Library for Embedding Generation. Retrieved 2026.
[30] Qdrant. How Tripadvisor Drives 2 to 3x More Revenue with Qdrant-Powered AI. Retrieved 2026.
