Qdrant
Last reviewed
May 9, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v7 · 6,178 words
Qdrant (pronounced "quadrant") is an open-source vector database and similarity search engine written in Rust and designed for high-performance retrieval over high-dimensional data. Founded in 2021 in Berlin, Germany by Andre Zayarni (CEO) and Andrey Vasnetsov (CTO), Qdrant has become one of the most widely deployed vector engines in production artificial intelligence systems. By March 2026 the project had surpassed 250 million package downloads, accumulated more than 29,000 stars on GitHub, and was used by enterprises including Tripadvisor, HubSpot, Canva, OpenTable, Roche, Bosch, Deutsche Telekom, Flipkart, and Elon Musk's xAI for the Grok chatbot [1][2][3].
The company has raised approximately $85.5 million across three rounds: a $7.5 million seed in April 2023 led by Unusual Ventures, a $28 million Series A in January 2024 led by Spark Capital, and a $50 million Series B in March 2026 led by AVP [4][5][6]. Qdrant is distributed under the Apache License 2.0 and offered both as a self-hosted binary and as a managed service through Qdrant Cloud, Qdrant Hybrid Cloud, and Qdrant Private Cloud.
The seeds of Qdrant were planted before the company itself existed. Andrey Vasnetsov, a machine learning engineer who had worked on search and recommendation systems at Mail.Ru Group, the social-matching startup dotin, Tinkoff Bank, and the talent platform MoBerries, repeatedly ran into the same problem: existing vector similarity tools did not behave like databases. Libraries such as FAISS and Annoy were excellent for static benchmark datasets but lacked persistence, payload storage, filtering, replication, and an operational interface suitable for production services [3][7].
Vasnetsov began writing a vector search engine from scratch in Rust and published the first public release on GitHub in 2021. The early developer response was strong enough to convince him and Andre Zayarni, who had previously held engineering and product leadership roles at Bigpoint, MoBerries, and the sports-tech company Playtomic, to incorporate the project as a company. Qdrant GmbH was registered in Berlin in 2021, with Zayarni as CEO and Vasnetsov as CTO [3][7].
The choice of Rust was deliberate. The team wanted memory safety without garbage collection, low and predictable latency under heavy load, and direct access to SIMD instructions for distance calculations. That decision later defined Qdrant's performance profile: by 2024 it routinely ranked among the lowest-memory and lowest-latency engines in community vector search benchmarks, and the codebase remained more than 87 percent Rust at the v1.17 milestone [3][8].
The company's funding history reflects the rapid growth of investor interest in retrieval infrastructure for generative AI.
| Round | Date | Amount | Lead investor | Other participants | Source |
|---|---|---|---|---|---|
| Seed | 19 April 2023 | $7.5M | Unusual Ventures | 42CAP, IBB Ventures, angels including Amr Awadallah | [4] |
| Series A | 23 January 2024 | $28M | Spark Capital | Unusual Ventures, 42CAP | [5] |
| Series B | 12 March 2026 | $50M | AVP | Bosch Ventures, Unusual Ventures, Spark Capital, 42CAP | [6] |
| Total | | ~$85.5M | | | |
Unusual Ventures partner John Vrionis, who led the seed round, characterized the timing as the moment vector search shifted from a research curiosity to core infrastructure for production AI applications. Spark Capital partner Yasmin Razavi joined Qdrant's board following the Series A. The Series B, led by Atlantic Bridge spin-out AVP, was announced on 12 March 2026 at an undisclosed valuation; participating investors included Bosch Ventures, which connected Qdrant to enterprise users in industrial automation and automotive software [4][5][6].
By the time of the Series B announcement, Qdrant had grown to more than 100 employees distributed across over 20 countries, with engineering concentrated in Berlin and a Discord community of more than 9,000 developers. The company holds events including the annual Vector Space Day in Berlin, which drew more than 400 attendees in 2025 [1][9].
Qdrant runs as a single statically linked Rust binary that exposes REST and gRPC APIs. The same binary handles standalone, replicated, and sharded deployments, with a custom storage engine, a custom key-value store called Gridstore, and a heavily modified HNSW graph index at its core. The system is intentionally focused on vector search rather than general-purpose data storage, and it rejects features (full transactional ACID guarantees, SQL parsing, generic document modeling) that would compromise its primary workload [10].
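The choice of Rust as the implementation language has significant architectural implications. For a vector database it provides memory safety without a garbage collector, low and predictable latency under heavy load, and direct access to SIMD instructions for distance calculations.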
The practical result is a compact memory footprint and predictable latency. In community benchmarks, Qdrant consistently ranks among the lowest in memory consumption per vector among the major vector databases, and among the highest in queries-per-second at fixed recall [8].
A Qdrant collection is divided into one or more segments. Each segment is a self-contained unit that owns its own vector storage, payload storage, HNSW index, and ID mapper. Segments come in two states: appendable segments, which accept new points, and sealed (non-appendable) segments, which are optimized for reads and no longer accept inserts.
A background optimizer thread merges small segments, converts appendable segments to sealed ones, and rebuilds indexes. This is conceptually similar to how an LSM-tree database compacts SSTables, but adapted for vector indexes.
Durability is provided by a write-ahead log (WAL). Every operation is recorded in the WAL before being applied to a segment. If the process crashes before changes are flushed, the WAL is replayed during startup. Each segment also stores a per-point version, so out-of-order replays cannot resurrect older state [11].
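The interaction between WAL replay and per-point versions can be illustrated with a toy model. This is a minimal sketch, not Qdrant's storage code: the `Segment` class, the `(version, point_id, payload)` log layout, and the `replay` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment: maps point id -> (version, payload)."""
    points: dict = field(default_factory=dict)

    def apply(self, version, point_id, payload):
        """Apply an operation only if it is newer than what is stored."""
        current = self.points.get(point_id)
        if current is not None and current[0] >= version:
            return False  # stale or duplicate operation: skip it
        self.points[point_id] = (version, payload)
        return True


def replay(wal, segment):
    """Re-apply every logged operation; return how many changed state."""
    return sum(segment.apply(*entry) for entry in wal)


# Hypothetical log entries: (version, point_id, payload).
wal = [
    (1, 7, {"title": "draft"}),
    (2, 7, {"title": "final"}),
    (3, 8, {"title": "other"}),
]

segment = Segment()
segment.apply(2, 7, {"title": "final"})  # this write was flushed before the crash
applied = replay(wal, segment)           # replay the whole log after restart
```

Because every operation carries a version, replaying the full log is idempotent: entries that are already reflected in the segment are skipped, and an older entry can never overwrite a newer one regardless of replay order.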
Until version 1.13, Qdrant used RocksDB as the underlying key-value store for payload and sparse vector data. The team identified several mismatches between RocksDB's design and Qdrant's actual workload: the LSM-tree compaction cycle introduced random latency spikes, the system offered far more configuration knobs than Qdrant needed, and the C++ interop boundary slowed iteration on storage features [12].
In January 2025, Qdrant 1.13 shipped Gridstore, a custom key-value store designed specifically for sequential integer keys (which is all Qdrant ever stores). Gridstore is structured around three layers: a Data Layer with a pointer-array tracker, a Mask Layer that tracks block usage, and a Gaps Layer that locates free space efficiently. Because there is no compaction phase, write latency is more predictable, and the storage engine can be debugged without crossing a foreign-function-interface boundary [12].
Qdrant's index is built on HNSW (Hierarchical Navigable Small World) graphs, the dominant approximate nearest neighbor algorithm in production vector search. Unlike a textbook implementation, however, Qdrant's HNSW is integrated with the payload index. Filter conditions are applied during graph traversal rather than as a pre-filter on candidates or a post-filter on results. The HNSW graph is also extended with additional edges based on indexed payload values, so a filtered traversal does not become disconnected when many neighbors are pruned [10][13].
The HNSW index supports configurable parameters:
| Parameter | Default | Description |
|---|---|---|
| m | 16 | Number of edges per node in the HNSW graph |
| ef_construct | 100 | Size of the dynamic candidate list during index construction |
| ef | ef_construct | Search-time candidate list size, controls speed/recall trade-off |
| full_scan_threshold | 10,000 | Collections smaller than this use brute-force search |
| max_indexing_threads | 0 (auto) | Number of threads for index building |
| on_disk | false | Whether to store the HNSW graph on disk instead of in memory |
Version 1.13 added GPU-accelerated HNSW indexing. The implementation uses the Vulkan graphics API rather than CUDA, making it portable across NVIDIA, AMD, and Intel GPUs. Qdrant reports up to 10x faster index construction relative to CPU-only builds on equivalent hardware. GPU indexing is enabled with the environment variable QDRANT__GPU__INDEXING=1 and a Docker image suffix such as qdrant/qdrant:v1.13.0-gpu-nvidia [13][14].
Version 1.13 also introduced HNSW graph compression based on delta encoding, which reduces index memory by up to 30 percent without measurable recall loss. Version 1.16 added Inline Storage, which co-locates quantized vector data inside the HNSW graph nodes so that traversal touches fewer pages of disk or memory [13][15].
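The general idea behind delta encoding a graph's neighbor lists can be shown in a few lines. The sketch below is a generic illustration, not Qdrant's on-disk format: it assumes neighbor order within a list does not matter, and it uses a 7-bits-per-byte variable-length integer as a stand-in for whatever packing the real index applies.

```python
def delta_encode(neighbor_ids):
    """Sort the ids, then store the first id followed by successive gaps."""
    ordered = sorted(neighbor_ids)
    if not ordered:
        return []
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return [ordered[0]] + gaps


def delta_decode(deltas):
    """Rebuild the sorted ids by accumulating the gaps."""
    ids, total = [], 0
    for delta in deltas:
        total += delta
        ids.append(total)
    return ids


def varint_size(value):
    """Bytes needed to store `value` with 7 payload bits per byte."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


neighbors = [100_250, 100_003, 100_120, 100_007]
encoded = delta_encode(neighbors)
raw_bytes = 4 * len(neighbors)                        # fixed 32-bit ids
packed_bytes = sum(varint_size(d) for d in encoded)   # variable-length gaps
```

Gaps between sorted ids are usually much smaller than the ids themselves, so they fit in fewer bytes; that is the source of the memory saving, and decoding recovers the ids exactly.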
For highly selective filters, version 1.16 also implemented the ACORN-1 search method (an adaptation of the ACORN approach to filtered ANN search). When direct neighbors of a node are excluded by a filter, ACORN looks one hop further to recover graph connectivity, improving recall on queries that combine vector similarity with rare metadata conditions [13].
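The one-hop expansion described above can be sketched on a plain adjacency list. This is a simplified illustration of the idea only, under the assumption that the graph is a dictionary of neighbor lists and the filter is a predicate on node ids; `filtered_neighbors` is a hypothetical helper, not a Qdrant function.

```python
def filtered_neighbors(graph, node, passes_filter):
    """Collect candidate nodes reachable from `node` that satisfy the filter.

    Direct neighbors that pass the filter are kept. For a direct neighbor
    that fails, its own neighbors are inspected (one extra hop) so that the
    traversal does not dead-end when most direct neighbors are filtered out.
    """
    seen = {node}
    candidates = []
    for neighbor in graph[node]:
        if neighbor in seen:
            continue
        seen.add(neighbor)
        if passes_filter(neighbor):
            candidates.append(neighbor)
            continue
        for second_hop in graph[neighbor]:
            if second_hop not in seen and passes_filter(second_hop):
                seen.add(second_hop)
                candidates.append(second_hop)
    return candidates


graph = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}
allowed = {2, 4}  # nodes whose payload matches a rare filter condition

expanded = filtered_neighbors(graph, 0, lambda n: n in allowed)
direct_only = [n for n in graph[0] if n in allowed]
```

In this toy graph, node 4 is only reachable through the filtered-out node 1, so a traversal that looks at direct neighbors alone would never consider it.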
Quantization is the central tool Qdrant uses to keep large indexes in RAM or to make disk-based search feasible. The system has shipped three families of vector quantization, all of which can be enabled and disabled per collection without rebuilding the underlying vectors.
| Quantization | Compression | Speed-up | Typical recall | Introduced |
|---|---|---|---|---|
| Scalar (int8) | 4x | up to 2x | ~0.99 | v1.1, March 2023 |
| Product quantization | up to 64x | 0.5x | ~0.7 at 12x | v1.2, May 2023 |
| Binary (1-bit) | 32x | up to 40x | ~0.95 to 0.99* | v1.5, September 2023 |
| Binary (1.5-bit) | 24x | ~30x | similar to 1-bit | 2025 |
| Binary (2-bit) | 16x | ~20x | similar to 1-bit | 2025 |
*Recall figures for binary quantization assume 1024-dimensional or larger embeddings with appropriate oversampling and rescoring [16][17][18].
Scalar quantization, introduced in version 1.1 (March 2023), maps each float32 component to an int8 integer using a linear transformation calibrated from a configurable quantile (typically 0.99). This reduces vector memory by approximately 75 percent, allows SIMD-accelerated 8-bit dot products, and on internal benchmarks loses 0.3 percent recall or less while improving latency by 28 to 60 percent on real datasets [17].
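A quantile-calibrated linear mapping of this kind can be sketched in pure Python. The example assumes 8-bit codes in the range 0 to 255 and a symmetric trimming of both tails; the function names and the exact calibration rule are illustrative assumptions rather than Qdrant's implementation.

```python
def fit_bounds(values, quantile=0.99):
    """Pick clipping bounds that keep the central `quantile` of the values."""
    ordered = sorted(values)
    tail = (1.0 - quantile) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


def quantize(vector, lo, hi):
    """Linearly map floats in [lo, hi] to 8-bit codes, clipping outliers."""
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((x - lo) * scale))) for x in vector]


def dequantize(codes, lo, hi):
    """Approximate the original floats from their 8-bit codes."""
    step = (hi - lo) / 255.0
    return [lo + code * step for code in codes]


# Calibrate on a sample of component values, then encode one vector.
calibration = [i / 100 for i in range(-100, 101)]
lo, hi = fit_bounds(calibration)

vector = [-1.0, 0.0, 0.5, 1.0]
codes = quantize(vector, lo, hi)
restored = dequantize(codes, lo, hi)
```

Each component shrinks from four bytes to one, which is where the roughly 75 percent memory saving comes from. Values outside the calibrated bounds are clipped, so the quantile trades a small error on outliers for finer resolution on the bulk of the distribution.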
Product quantization, added in version 1.2 (May 2023), splits each vector into chunks and runs k-means clustering with k=256 on each chunk independently. Compression ratios can range from 4x up to 64x. The trade-off is much steeper than scalar quantization: for a Glove-100 dataset, 4x PQ retains 0.71 mean precision (almost identical to the float baseline), but 12x PQ drops to 0.59. PQ is recommended only when memory constraints clearly outweigh recall [18].
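The quoted compression ratios follow from simple arithmetic: with k=256 centroids, every chunk is replaced by a one-byte code. The sketch below works that out, assuming float32 input and ignoring the memory used by the codebooks themselves; `pq_stats` and `chunk_dims` are illustrative names, not Qdrant parameters.

```python
def pq_stats(dim, chunk_dims, centroids=256, float_bytes=4):
    """Return (raw_bytes, code_bytes, ratio) for one product-quantized vector."""
    if dim % chunk_dims:
        raise ValueError("dim must be divisible by chunk_dims in this sketch")
    chunks = dim // chunk_dims
    bits_per_code = (centroids - 1).bit_length()  # 256 centroids -> 8 bits
    code_bytes = chunks * bits_per_code / 8
    raw_bytes = dim * float_bytes
    return raw_bytes, code_bytes, raw_bytes / code_bytes


# Ratios for a 1536-dimensional float32 vector at three chunk sizes.
ratios = {chunk: pq_stats(1536, chunk)[2] for chunk in (1, 3, 16)}
```

A chunk of one dimension gives 4x, three dimensions give 12x, and sixteen dimensions give 64x. Larger chunks force each one-byte code to summarize more coordinates, which is why recall falls as the ratio rises.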
Binary quantization, announced in September 2023 along with the partnership with OpenAI, maps each component to a single bit (1 if positive, 0 otherwise). For 1536-dimensional text-embedding-3-small vectors and 3072-dimensional text-embedding-3-large vectors from OpenAI, Qdrant reports recall of 0.98 to 0.997 with appropriate oversampling. Storage drops by a factor of 32, and search becomes up to 40 times faster because distance computations reduce to bitwise XOR and population count [16]. Binary quantization performs poorly on embeddings smaller than 1024 dimensions, which led to the 1.5-bit and 2-bit variants introduced in 2025 for medium-dimensional models [16].
All quantization modes support rescoring: the candidate list is built using the compressed representation, then the top results are re-ranked using the full-precision vectors. Asymmetric quantization, which keeps query vectors in a higher-precision representation than stored vectors, is also supported.
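The combination of a compressed first pass, oversampling, and full-precision rescoring can be sketched with sign-bit (binary) codes. This is a brute-force illustration under stated assumptions: there is no graph index, similarity is a plain dot product, and `search` and its `oversampling` argument are hypothetical names for this example.

```python
def to_bits(vector):
    """Pack the sign pattern into an int: bit i is 1 when component i > 0."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """XOR marks the differing bits; counting them gives the distance."""
    return bin(a ^ b).count("1")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query, vectors, limit, oversampling=2.0):
    """Shortlist by Hamming distance, then re-rank with original vectors."""
    codes = [to_bits(v) for v in vectors]  # precomputed in a real index
    query_code = to_bits(query)
    shortlist_size = max(limit, int(limit * oversampling))
    by_code = sorted(range(len(vectors)), key=lambda i: hamming(query_code, codes[i]))
    shortlist = by_code[:shortlist_size]
    rescored = sorted(shortlist, key=lambda i: dot(query, vectors[i]), reverse=True)
    return rescored[:limit]


vectors = [
    [0.1, 0.9, -0.8, 0.1],
    [-0.5, -0.4, 0.3, -0.2],
    [0.9, 0.1, -0.2, 0.4],
]
query = [1.0, 0.0, -0.1, 0.5]

with_rescoring = search(query, vectors, limit=1, oversampling=2.0)
without_oversampling = search(query, vectors, limit=1, oversampling=1.0)
```

In this example the first and third vectors share the same bit pattern, so the compressed pass cannot tell them apart. Fetching twice as many candidates and re-ranking them with the original floats recovers the true nearest neighbor.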
| Scenario | Recommended quantization | Rationale |
|---|---|---|
| General production workload | Scalar | Best balance of memory savings and recall; minimal accuracy loss |
| 1536+ dimensional embeddings | Binary (1-bit) | Large embeddings tolerate aggressive compression with rescoring |
| Medium-dimensional embeddings (768 to 1024) | Binary (1.5-bit or 2-bit) | Better recall than 1-bit at lower compression |
| Maximum compression needed | Product quantization | Highest ratio but lowest recall; best for cold storage tiers |
| Mixed workload with precision-sensitive queries | Asymmetric quantization | Full-precision queries with compressed storage |
Quantization can be configured per collection, and re-quantization is run by the optimizer in the background when the configuration changes [16][17].
Every vector stored in Qdrant can carry an arbitrary JSON payload as metadata. The filtering engine supports a comprehensive list of conditions:
| Condition | Use case |
|---|---|
| match | Exact match on keyword, integer, boolean, or UUID fields |
| match_any (v1.1+) | IN-style matching against a list of values |
| match_except (v1.2+) | NOT-IN-style filtering |
| range | Numeric comparisons (gt, gte, lt, lte) |
| datetime_range (v1.8+) | RFC 3339 datetime comparisons |
| geo_bounding_box | Rectangular geographic region |
| geo_radius | Circular geographic region |
| geo_polygon | Arbitrary geographic polygons with holes |
| text_match | Token-level text search with configurable tokenizer |
| phrase_match (v1.15+) | Exact phrase matching with token order |
| nested (v1.2+) | Apply conditions to elements of an array of objects |
| values_count | Filter on array length |
| is_empty / is_null | Handle missing or null fields |
| has_id | Filter by point IDs |
| has_vector (v1.13+) | Filter by presence of a named vector |
Filters are combined using three boolean operators:
| Combinator | Logic |
|---|---|
| must | Logical AND over all conditions |
| should | Logical OR over conditions |
| must_not | Logical NOT over conditions |
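The way the three combinators compose can be expressed as a small predicate evaluator. This is a sketch of the boolean logic only, assuming that an empty or absent clause places no constraint on the result; the `matches`, `match`, and `in_range` helpers are hypothetical and stand in for the condition types listed earlier.

```python
def matches(payload, flt):
    """Evaluate a must / should / must_not filter against one payload dict."""
    must = flt.get("must", [])
    should = flt.get("should", [])
    must_not = flt.get("must_not", [])
    return (
        all(cond(payload) for cond in must)                         # AND
        and (not should or any(cond(payload) for cond in should))   # OR
        and not any(cond(payload) for cond in must_not)             # NOT
    )


def match(key, value):
    """Condition: the payload field equals `value`."""
    return lambda payload: payload.get(key) == value


def in_range(key, gte=None, lt=None):
    """Condition: the payload field lies in the half-open range [gte, lt)."""
    def check(payload):
        value = payload.get(key)
        if value is None:
            return False
        return (gte is None or value >= gte) and (lt is None or value < lt)
    return check


flt = {
    "must": [match("category", "news")],
    "should": [match("lang", "en"), match("lang", "de")],
    "must_not": [in_range("year", lt=2020)],
}

recent_news = {"category": "news", "lang": "en", "year": 2024}
old_news = {"category": "news", "lang": "en", "year": 2015}
```

Here `matches(recent_news, flt)` is true, while `old_news` is rejected by the `must_not` clause even though it satisfies the other two.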
Because filters are applied during graph traversal rather than as a separate step, filtered searches perform well even when the filter is highly selective (matching only a small fraction of the dataset). Other vector databases often struggle in this scenario, since pre-filtering can fragment the graph and post-filtering wastes computation on irrelevant candidates [13][20].
To accelerate filtered searches, Qdrant supports creating explicit payload indexes on frequently filtered fields:
| Index type | Field type | Description |
|---|---|---|
| Keyword | String | Hash-based index for exact string matching |
| Integer | Integer | Range index for numerical filtering |
| Float | Float | Range index for floating-point fields |
| Bool | Boolean | Bitmap index for boolean fields |
| Geo | GeoPoint | R-tree index for geographic queries |
| Full-text | Text | Inverted index with configurable tokenizer |
| Datetime | Datetime | Range index for temporal filtering |
| UUID | UUID | Hash-based index for UUID matching |
Full-text indexes can be configured with multiple tokenizers (word, whitespace, prefix, multilingual), Snowball stemming, ASCII folding for diacritics, and language-specific or custom stopword lists. Most index types can be stored on disk to conserve RAM. Specialized variants include the tenant index (is_tenant: true) for multi-tenant collections and the principal index for high-cardinality numeric fields used in nearly every query [13].
Creating a payload index before inserting data is more efficient than adding it afterward, because the optimizer can build the HNSW graph with the payload index already in place rather than rebuilding it later [10].
Qdrant provides built-in multitenancy support, allowing many tenants to share a single collection while remaining logically isolated. Tenants are typically separated by a payload field (such as tenant_id) that is marked is_tenant: true, which causes the optimizer to keep each tenant's vectors in dedicated segments. This avoids the cost of a separate collection per tenant and keeps the HNSW graph for each tenant compact and efficient.
Version 1.16 introduced tiered multitenancy, which addresses the noisy-neighbor problem in shared deployments. Tenants can be promoted between hot, warm, and cold tiers based on activity. Hot tenants keep their vectors in RAM, warm tenants use quantized vectors with rescoring, and cold tenants live entirely on disk. Promotions and demotions happen in the background without rebuilding indexes [13][19].
Version 1.7 (December 2023) added native support for sparse vectors, enabling hybrid search that combines semantic similarity (dense vectors) with lexical relevance (sparse vectors produced by BM25, TF-IDF, or SPLADE models). Sparse vectors are indexed exactly (no approximation) and support both in-memory and on-disk storage. An optional inverse document frequency (IDF) modifier weights tokens by rarity at query time, which improves quality for natural-language queries.
A single point can carry multiple named vectors. The Query API supports a fan-out of prefetches across vector fields and combines results using fusion methods including Reciprocal Rank Fusion (RRF), Distribution-Based Score Fusion (DBSF), and Weighted RRF [21]. This capability is fundamental for retrieval-augmented generation (RAG) pipelines, where combining keyword and semantic matching often outperforms either approach alone.
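Reciprocal Rank Fusion itself is compact enough to show in full. The sketch below uses the textbook formulation, in which each list contributes 1 / (k + rank) to a point's score; the constant k=60 is the value commonly used in the literature and is an assumption here, not a statement about Qdrant's internal default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] = scores.get(point_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda point_id: scores[point_id], reverse=True)


dense_hits = ["a", "b", "c"]   # ranked by dense-vector similarity
sparse_hits = ["c", "a", "d"]  # ranked by sparse (lexical) relevance

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because only ranks are used, the dense and sparse scores never need to be on a comparable scale. Points that appear near the top of both lists (here `a` and `c`) rise above points that appear in only one.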
Version 1.10 (July 2024) introduced multivector support, in which a single point can carry an arbitrary set of vectors of equal dimensionality. This is the natural representation for late interaction models such as ColBERT, where each token in a document is encoded into its own vector and similarity is computed using the MaxSim operator: for each query token, take the maximum similarity over document tokens, then sum over query tokens [22].
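The MaxSim operator described above translates directly into code. This is a minimal pure-Python sketch that uses the dot product as the per-token similarity and tiny two-dimensional vectors for readability; real late-interaction embeddings have many more dimensions and tokens.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_tokens, doc_tokens):
    """For each query token take its best match in the document, then sum."""
    return sum(
        max(dot(q, d) for d in doc_tokens)
        for q in query_tokens
    )


query = [[1.0, 0.0], [0.0, 1.0]]   # two query-token vectors
doc_a = [[1.0, 0.0], [0.5, 0.5]]   # matches the first query token well
doc_b = [[0.9, 0.1], [0.1, 0.9]]   # matches both query tokens fairly well

score_a = max_sim(query, doc_a)
score_b = max_sim(query, doc_b)
```

`doc_b` scores higher because every query token finds a close counterpart, whereas `doc_a` matches one token strongly and the other only partially.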
A typical pipeline uses a multistage Query API call that prefetches candidates with a compressed dense vector, re-ranks with full-precision dense vectors, and finally re-ranks with ColBERT multivectors. Multivectors interoperate with all quantization modes [22].
Version 1.10 also generalized the search interface into a unified Query API that supports nearest-neighbor search, recommendations (based on positive and negative example IDs or vectors), discovery search (finding points in regions defined by example pairs), counting, and faceted aggregation. The same API supports nested prefetches, allowing complex retrieval pipelines to be expressed as a single request rather than multiple round trips.
More recent additions include Maximal Marginal Relevance (MMR) for diversity-aware reranking, score-boosting reranking that combines vector similarity with business signals, and full-text filtering with multilingual stemming and stopword support [9][13].
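Maximal Marginal Relevance can be sketched as a greedy selection loop. The version below follows the standard formulation, scoring each candidate by its relevance to the query minus its similarity to results already chosen; the `diversity` weight and the use of cosine similarity are assumptions made for this illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, limit, diversity=0.5):
    """Greedily pick candidates that are relevant but not redundant."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < limit:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return (1.0 - diversity) * relevance - diversity * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]

diverse = mmr(query, candidates, limit=2, diversity=0.7)
relevance_only = mmr(query, candidates, limit=2, diversity=0.0)
```

With `diversity=0.0` the two near-duplicate vectors are returned together; with a higher weight the second pick shifts to the less similar candidate.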
Qdrant organizes data into collections, which contain points (vectors with optional payloads). Collection management is handled through the REST or gRPC API, or any of the official client libraries:
```python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient(url="http://localhost:6333")

# Create a collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE,
    ),
)

# Create a collection with multiple named vectors
client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=768, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)

# List collections
collections = client.get_collections()

# Get collection info (vector count, config, status)
info = client.get_collection("documents")

# Delete a collection
client.delete_collection("documents")
```
Collections support aliases, which allow a stable name to point at different underlying collections. This enables blue-green deployments where a new index is built in a separate collection and the alias is switched atomically once the new index is ready.
A simple insertion and search flow looks like this:
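```python
from qdrant_client.models import PointStruct, Filter, FieldCondition, MatchValue

# Assumes `client` is a connected QdrantClient and that a "documents"
# collection with 1536-dimensional vectors exists.

# Insert (or overwrite) two points with a payload each
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(id=1, vector=[0.1] * 1536, payload={"category": "news"}),
        PointStruct(id=2, vector=[0.2] * 1536, payload={"category": "blog"}),
    ],
)

# Nearest-neighbor query restricted to points whose category is "news"
results = client.query_points(
    collection_name="documents",
    query=[0.15] * 1536,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="news"))]
    ),
    limit=5,
).points
```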
The equivalent REST call uses the POST /collections/{name}/points/query endpoint with a JSON body.
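The request can be assembled with the Python standard library alone. The sketch below builds the same filtered query as a JSON body but does not send it; the `build_query_request` helper is a hypothetical name, and the base URL assumes a local instance on the default port.

```python
import json
import urllib.request


def build_query_request(base_url, collection, vector, category, limit=5):
    """Build a POST request for the points/query endpoint (not sent here)."""
    body = {
        "query": vector,
        "filter": {
            "must": [{"key": "category", "match": {"value": category}}],
        },
        "limit": limit,
    }
    return urllib.request.Request(
        url=f"{base_url}/collections/{collection}/points/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "http://localhost:6333", "documents", [0.15] * 1536, "news"
)
# Passing `request` to urllib.request.urlopen would send it to a running server.
```

Note that the client's `query_filter` argument corresponds to the `filter` key in the JSON body.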
Qdrant provides a comprehensive snapshot system for backups and data migration.
A snapshot is a tar archive containing the full data and configuration of a specific collection on a specific node at a specific point in time. Snapshots are created through the REST API:
| Operation | Endpoint | Description |
|---|---|---|
| Create | POST /collections/{name}/snapshots | Creates a new snapshot of the collection |
| List | GET /collections/{name}/snapshots | Lists all available snapshots |
| Download | GET /collections/{name}/snapshots/{snapshot_name} | Downloads the snapshot file |
| Delete | DELETE /collections/{name}/snapshots/{snapshot_name} | Removes a snapshot |
| Recover | PUT /collections/{name}/snapshots/recover | Restores a collection from a snapshot file or URL |
Full storage snapshots capture the entire Qdrant instance, including all collections, aliases, and cluster metadata. The snapshot name follows the format full-snapshot-<timestamp>.snapshot. This is useful for complete instance backups or for migrating an entire Qdrant deployment to new hardware.
Snapshots can be stored locally on the filesystem or in S3-compatible object storage. The storage backend is configured in the Qdrant configuration file:
With local storage, snapshot files are written to the directory configured as snapshots_path.
For disaster recovery, Qdrant Cloud provides automatic incremental backups for AWS and GCP clusters, where each backup contains only the data that changed since the previous backup [23].
Qdrant supports distributed deployment for horizontal scaling and high availability.
In distributed mode, Qdrant uses the Raft consensus protocol to maintain agreement on cluster topology and collection structure. All metadata operations (creating collections, changing replication factors, transferring shards) are routed through the consensus layer. Data operations (upserts and queries) bypass consensus and go directly to the relevant shards using optimistic replication, which is what allows the system to handle high write rates without the consensus layer becoming a bottleneck [10].
Write consistency can be tuned per request. The write_ordering option supports weak, medium, and strong modes; the write_consistency_factor controls how many replicas must acknowledge a write before it is considered successful. Reads can also request a consistency level via the consistency parameter.
Collections are divided into shards, each of which is an independent store of points with its own HNSW index and payload indexes. Searches are dispatched to all shards in parallel and results are merged.
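The merge step of such a scatter-gather search can be sketched with the standard library. The example assumes each shard has already returned its own top hits as `(score, point_id)` pairs sorted by descending score, with higher scores being better; `merge_shard_results` is a hypothetical helper, not part of any Qdrant client.

```python
import heapq
import itertools


def merge_shard_results(shard_results, limit):
    """Merge per-shard hit lists (each sorted best-first) into a global top-k."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, limit))


shard_a = [(0.95, 11), (0.80, 14), (0.30, 19)]
shard_b = [(0.90, 27), (0.40, 22)]
shard_c = [(0.85, 35)]

top_hits = merge_shard_results([shard_a, shard_b, shard_c], limit=3)
```

Because every shard returns at least `limit` of its best candidates (or everything it has), the merged prefix is the exact global top-k over the union of shard results.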
| Sharding concept | Description |
|---|---|
| Auto-sharding | Qdrant automatically distributes shards across available nodes |
| Custom shard keys | Points can be routed by a payload field, similar to a partition key |
| Manual shard placement | Operators can pin specific shards to specific nodes |
| Resharding | Available in Qdrant Cloud for rebalancing data across nodes after scaling |
| Shard transfer | Moves shards between nodes with foreground or streaming modes |
Custom shard keys are particularly useful for multi-tenant deployments where each tenant should be physically isolated to a subset of nodes; combined with the tenant payload index, this enables strong tenant locality [10].
Each shard can have multiple replicas distributed across different nodes. The replication factor determines how many copies of each shard exist. For production systems, a replication factor of at least 2 is recommended to ensure availability during node failures. Clusters with three or more nodes and replication enabled can continue to perform all operations while one node is down, and they also benefit from load balancing across replicas.
When a node fails, the remaining replicas continue serving queries. When the failed node recovers, consensus triggers a replication process to bring the recovering node up to date with any mutations it missed. In Qdrant Cloud, replication factor changes and shard rebalancing are handled automatically; in self-hosted deployments they are operator-driven [10].
Qdrant offers several deployment models suitable for different use cases.
| Option | Description | Operator |
|---|---|---|
| Self-hosted (binary or Docker) | Run anywhere with the official binary, Docker image, or Helm chart | User |
| Qdrant Cloud (Managed) | Fully managed clusters on AWS, GCP, or Azure with usage-based billing | Qdrant |
| Qdrant Hybrid Cloud | Control plane managed by Qdrant; data plane runs in customer infrastructure | Qdrant control, customer data |
| Qdrant Private Cloud | Air-gapped, fully isolated deployment without any link to Qdrant's services | Customer |
| Qdrant Edge | Embedded library for on-device retrieval | Customer |
Qdrant Cloud, launched in 2023 alongside the v1.0 server release, is the company's fully managed service, available on AWS, Google Cloud, and Microsoft Azure.
Qdrant Cloud achieved SOC 2 Type II compliance on 7 April 2024, certified across the Security, Availability, and Confidentiality trust service criteria, and is also HIPAA compliant. Hardened clusters run in unprivileged containers with strict network policies and at-rest encryption [24][25]. In April 2026, Qdrant Cloud added GPU-accelerated indexing as a managed feature, multi-availability-zone clusters, and audit logging for enterprise compliance [14].
Launched on 15 April 2024, Qdrant Hybrid Cloud uses a Kubernetes-native bring-your-own-cloud (BYOC) architecture. The control plane runs in Qdrant's infrastructure while the data plane and all customer vectors stay inside the customer's VPC. Supported targets include AWS, GCP, Azure, Oracle Cloud, DigitalOcean, Vultr, OVHcloud, Scaleway, Red Hat OpenShift, and on-premises VMware. Twelve launch partners shipped integrations on day one, including LangChain, LlamaIndex, Airbyte, Jina AI, and Haystack [26].
For customers with strict data sovereignty or air-gap requirements, Qdrant Private Cloud installs the same Kubernetes-native stack entirely inside the customer's environment with no link back to Qdrant's services. The customer is fully responsible for the operational lifecycle in this mode [25].
Announced on 29 July 2025 and currently in private beta, Qdrant Edge is an embedded library that runs the same retrieval primitives in-process on resource-constrained devices. Target use cases include robotics with on-device perception, mobile devices that need offline RAG, point-of-sale systems, and IoT predictive-maintenance agents. Qdrant Edge supports filterable HNSW, hybrid search, and multivector search, while running synchronously without background threads to fit on-device requirements [27].
Qdrant competes with both pure-play vector databases and vector extensions for general-purpose data systems. The landscape is crowded, and each system has different strengths.
| Feature | Qdrant | Pinecone | Weaviate | Milvus | Chroma | pgvector |
|---|---|---|---|---|---|---|
| Language | Rust | Proprietary (managed) | Go | Go and C++ | Python and Rust | C (PostgreSQL extension) |
| Open source | Yes (Apache 2.0) | No | Yes (BSD-3) | Yes (Apache 2.0) | Yes (Apache 2.0) | Yes (PostgreSQL License) |
| Deployment | Self-hosted, Cloud, Hybrid, Private, Edge | Managed cloud only | Self-hosted or Cloud | Self-hosted or Zilliz Cloud | Self-hosted or Cloud | PostgreSQL extension |
| Sparse vectors | Yes (native) | Yes | Yes (modular) | Yes | No | Yes (sparsevec type) |
| Multivector / late interaction | Yes (native) | Limited | Yes (modular) | Yes | No | No |
| Filtering during search | Yes (in-graph) | Yes (metadata) | Yes (inverted index plus vector) | Yes (attribute filtering) | Yes (metadata) | SQL WHERE clauses |
| GPU-accelerated indexing | Yes (vendor-agnostic, Vulkan) | N/A (managed) | Yes (NVIDIA CAGRA) | Yes (NVIDIA CAGRA) | No | No |
| Quantization | Scalar, Binary, PQ, asymmetric | Limited | Yes | Yes (DiskANN, IVF-PQ) | No | No |
| Distributed consensus | Raft | Proprietary | Raft (since 1.25) | etcd | None | PostgreSQL replication |
| Multitenancy | Tiered, native | Namespaces | Tenant-based | Partition key | Collection-based | Schema-based |
| Typical use case | Performance-sensitive filtered search at any scale | Managed simplicity | Hybrid search with knowledge-graph features | Billion-scale enterprise deployments | Quick prototypes and small RAG apps | Vector search inside an existing PostgreSQL stack |
Qdrant's clearest performance advantage is in filtered search, where the integrated in-graph filtering avoids the recall cliffs that affect pre-filtering and the wasted compute of post-filtering. In Qdrant's published benchmarks (covering datasets such as dbpedia-openai-1M-angular, deep-image-96-angular, gist-960-euclidean, and glove-100-angular), Qdrant delivers the highest queries-per-second at fixed recall on most configurations and ranks among the lowest in latency [8][13]. Independent reviews generally agree that Qdrant and Milvus are the two strongest open-source contenders, with Pinecone leading on pure operational simplicity for managed-only customers [28].
Unlike Pinecone, Qdrant is open source and can be self-hosted, which lowers cost for teams willing to manage their own infrastructure. Compared to Weaviate, Qdrant tends to use less memory and CPU at the same scale, but lacks Weaviate's built-in GraphQL API and modular knowledge-graph features. Compared to Milvus, Qdrant is simpler to operate (a single Rust binary versus a distributed system with multiple components such as etcd, Pulsar, and MinIO) but may not match Milvus's throughput at the very largest scale. Compared to Chroma, Qdrant is positioned higher on the production-readiness curve, with replication, sharding, and quantization that Chroma does not currently offer.
Qdrant provides official client libraries in Python, TypeScript and JavaScript, Rust, Go, Java, and C#, along with community libraries in many other languages. The Python client integrates with FastEmbed, Qdrant's lightweight embedding library that runs ONNX models directly on CPU without PyTorch or CUDA dependencies, making it possible to encode text into vectors as part of a single client call. FastEmbed supports embedding models from Sentence Transformers, Jina AI, Nomic, BAAI, and others, plus rerankers such as cross-encoders [29].
Qdrant integrates with most major LLM orchestration frameworks:
| Framework | Integration |
|---|---|
| LangChain | Vector store integration for RAG pipelines |
| LlamaIndex | Retriever integration for document question-answering |
| Haystack by deepset | Document store component for search pipelines |
| Spring AI | Java-based AI application framework |
| Microsoft Semantic Kernel | Orchestration SDK for AI agents |
| Microsoft AutoGen | Multi-agent orchestration framework |
| CrewAI | Multi-agent orchestration platform |
| n8n | Official workflow automation node |
| Airbyte | ELT pipelines for ingesting embeddings |
| Vercel AI SDK | Edge-friendly AI framework for Next.js |
Qdrant also integrates with embedding providers including OpenAI, Cohere, Jina AI, and many open-source models hosted on Hugging Face. Qdrant Cloud Inference, introduced in 2025, offers managed embedding generation alongside vector storage so that a single endpoint can serve both encoding and retrieval [9].
By March 2026 Qdrant powered production AI systems at more than 50 named enterprises and a much larger long tail of startups [2]. Selected case studies illustrate the typical scale and use case.
| Customer | Industry | Use case | Scale |
|---|---|---|---|
| Tripadvisor | Travel | AI Trip Planner and conversational search over reviews and images | Over 1 billion reviews, hundreds of millions of images, 11 million businesses |
| HubSpot | CRM and marketing | Breeze AI retrieval for customer data | Production retrieval across SaaS tenants |
| OpenTable | Restaurants | Sparse-vector restaurant filtering and discovery | Over 60,000 restaurants |
| Canva | Design | AI-powered features across the design platform | Hundreds of millions of users |
| Deutsche Telekom | Telecommunications | Multi-agent platform for customer service | Over 2 million conversations across 10 countries |
| Bazaarvoice | E-commerce | Product review retrieval | Billions of reviews |
| Roche | Pharmaceuticals | RAG over scientific literature | Internal enterprise deployment |
| Bosch | Industrial | Embedded retrieval in automotive and industrial AI | Internal enterprise deployment |
| xAI | AI labs | Real-time RAG over X (Twitter) for Grok | Real-time social-media data |
| Flipkart | E-commerce | Product search and recommendations | Indian retail scale |
| Voiceflow | Conversational AI | Conversational agent platform | Cloud SaaS deployment |
| Sprinklr | Customer experience | Unified CXM platform | Enterprise SaaS |
| Dust | AI assistants | Workspace AI assistants | Cloud SaaS deployment |
The Tripadvisor deployment is one of the most thoroughly documented public case studies. Tripadvisor's data and AI team, led by Rahul Todkar, used Qdrant to build a unified user graph that captures how users engage with hotels, restaurants, and attractions across more than 21 countries. The graph powers an AI Trip Planner and a conversational search interface that has replaced filter-based search for many users. Tripadvisor reports a two- to three-fold increase in revenue per user among those who engage with its generative AI features [30].
The xAI relationship is notable for its scale and visibility: Grok, the conversational AI assistant integrated into the X social network, uses Qdrant as the retrieval layer that gives it access to real-time posts from the platform [3].
The March 2026 Series B of $50 million was led by AVP, with participation from Bosch Ventures, Unusual Ventures, Spark Capital, and 42CAP. Qdrant plans to use the funding to expand its engineering and product teams, strengthen enterprise offerings, and scale global operations [6]. CEO Andre Zayarni summarized the company's positioning at the Series B announcement in a single line: "Models get the attention, but retrieval is what makes them useful in production." [6]
Qdrant is positioning itself as the core retrieval infrastructure layer for production AI systems, with a particular focus on agentic AI applications that require fast, reliable vector search across billions of objects and thousands of intermediate steps per workflow. The product strategy is built around composable vector search: rather than imposing a fixed retrieval pipeline, Qdrant exposes core capabilities (dense vectors, sparse vectors, metadata filters, multi-vector representations, custom scoring) as flexible primitives that can be combined at query time. This allows different workloads, from real-time agent reasoning to nightly batch RAG indexing, to compose retrieval logic suited to their latency, recall, and cost requirements [6].
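The idea of combining primitives at query time can be illustrated with a small, library-free sketch. It does not use Qdrant's API or reproduce its internals; the `Point` class, the `search` function, and the `alpha`, `prefetch`, and `limit` parameters are hypothetical names chosen for illustration. The sketch applies a metadata filter, takes a dense-vector shortlist, and then re-scores the shortlist with a custom formula that blends dense and sparse similarity.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Point:
    """Toy stand-in for a stored point (hypothetical, not Qdrant's data model)."""
    id: int
    dense: list[float]          # dense embedding
    sparse: dict[int, float]    # sparse vector: token index -> weight
    payload: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_dot(a, b):
    """Dot product over the indices shared by two sparse vectors."""
    return sum(w * b[i] for i, w in a.items() if i in b)


def search(points, dense_query, sparse_query, must=None,
           alpha=0.7, prefetch=3, limit=2):
    """Compose filter -> dense shortlist -> custom re-scoring at query time."""
    must = must or {}
    # 1. Metadata filter: keep points whose payload matches every condition.
    candidates = [p for p in points
                  if all(p.payload.get(k) == v for k, v in must.items())]
    # 2. Dense shortlist: the `prefetch` nearest candidates by cosine similarity.
    candidates.sort(key=lambda p: cosine(dense_query, p.dense), reverse=True)
    candidates = candidates[:prefetch]
    # 3. Custom scoring: weighted blend of dense and sparse similarity.
    scored = [(alpha * cosine(dense_query, p.dense)
               + (1 - alpha) * sparse_dot(sparse_query, p.sparse), p.id)
              for p in candidates]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:limit]]


points = [
    Point(1, [1.0, 0.0], {7: 1.0}, {"city": "Berlin"}),
    Point(2, [0.9, 0.1], {3: 2.0}, {"city": "Berlin"}),
    Point(3, [0.0, 1.0], {3: 0.5}, {"city": "Berlin"}),
    Point(4, [1.0, 0.0], {3: 5.0}, {"city": "Paris"}),
]

# Same data, two retrieval recipes chosen per query.
dense_only = search(points, [1.0, 0.0], {3: 1.0}, must={"city": "Berlin"}, alpha=1.0)
blended = search(points, [1.0, 0.0], {3: 1.0}, must={"city": "Berlin"}, alpha=0.5)
print(dense_only, blended)
```

In this toy dataset the dense-only recipe ranks point 1 first, while the blended recipe promotes point 2 because of its sparse match. Point 4 never appears because the filter excludes it. A real deployment would express the same stages through the database's query interface rather than in application code.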
The open-source project remains under active development. As of March 2026, Qdrant 1.17 is the latest stable release. Recent milestones include v1.13 (January 2025), which shipped GPU-accelerated indexing, Gridstore, and HNSW graph compression; v1.16 (late 2025), which added Inline Storage, ACORN-1, and tiered multitenancy; and v1.17 (February 2026), which introduced Relevance Feedback, audit access logging, weighted RRF, and configurable read fan-out delays [13]. Qdrant Edge is in private beta, and Qdrant Cloud Inference unifies embedding generation and retrieval. Together these initiatives suggest a trajectory in which Qdrant evolves from a vector database into a more general-purpose retrieval engine for AI applications.
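Weighted RRF (Reciprocal Rank Fusion), named among the v1.17 features, merges several ranked result lists into one. The sketch below uses the generic textbook formulation, in which each list contributes `weight / (k + rank)` to a document's score, with 1-based ranks and the conventional constant `k = 60`. The exact formula, rank offset, and defaults used by Qdrant are not stated in this article, so the function name `weighted_rrf` and its parameters are illustrative assumptions rather than Qdrant's implementation.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of ids with textbook weighted Reciprocal Rank Fusion.

    rankings: list of lists, each ordered from best to worst.
    weights:  one non-negative weight per ranking.
    k:        smoothing constant (60 is the conventional default).
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based ranks
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


dense_hits = ["a", "b", "c"]    # e.g. results from a dense-vector query
sparse_hits = ["c", "b", "d"]   # e.g. results from a sparse-vector query

equal = weighted_rrf([dense_hits, sparse_hits], weights=[1.0, 1.0])
dense_heavy = weighted_rrf([dense_hits, sparse_hits], weights=[3.0, 1.0])
print(equal)        # ['c', 'b', 'a', 'd']
print(dense_heavy)  # ['b', 'c', 'a', 'd']
```

With equal weights, document `c` narrowly outranks `b`. Tripling the weight of the dense list swaps the two, because `b` is ranked higher than `c` in that list.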