# Qdrant

> Source: https://aiwiki.ai/wiki/qdrant
> Updated: 2026-06-21
> Categories: AI Infrastructure, Open Source AI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Qdrant** (pronounced "quadrant") is an open-source [vector database](/wiki/vector_database) and similarity search engine written in [Rust](/wiki/rust_programming_language) and designed for high-performance retrieval over high-dimensional data. Founded in 2021 in Berlin, Germany by [Andre Zayarni](/wiki/andre_zayarni) (CEO) and [Andrey Vasnetsov](/wiki/andrey_vasnetsov) (CTO), Qdrant has become one of the most widely deployed vector engines in production [artificial intelligence](/wiki/artificial_intelligence) systems. By March 2026 the project had surpassed 250 million package downloads, accumulated more than 29,000 stars on [GitHub](/wiki/github), and was used by enterprises including Tripadvisor, HubSpot, Canva, OpenTable, Roche, Bosch, Deutsche Telekom, Flipkart, and Elon Musk's [xAI](/wiki/xai) for the [Grok](/wiki/grok) chatbot [1][2][3]. As CEO Andre Zayarni framed the company's positioning at its March 2026 Series B announcement, "Models get the attention, but retrieval is what makes them useful in production." [6][31]

The company has raised approximately $85.5 million across three rounds: a $7.5 million seed in April 2023 led by Unusual Ventures, a $28 million Series A in January 2024 led by Spark Capital, and a $50 million Series B in March 2026 led by AVP [4][5][6]. Qdrant is distributed under the [Apache License 2.0](/wiki/apache_license) and offered both as a self-hosted binary and as a managed service through Qdrant Cloud, Qdrant Hybrid Cloud, and Qdrant Private Cloud.

## When was Qdrant founded and who created it?

The seeds of Qdrant were planted before the company itself existed. Andrey Vasnetsov, a machine learning engineer who had worked on search and recommendation systems at Mail.Ru Group, the social-matching startup dotin, Tinkoff Bank, and the talent platform MoBerries, repeatedly ran into the same problem: existing vector similarity tools did not behave like databases. Libraries such as [FAISS](/wiki/faiss) and Annoy were excellent for static benchmark datasets but lacked persistence, payload storage, filtering, replication, and an operational interface suitable for production services [3][7].

Vasnetsov began writing a vector search engine from scratch in [Rust](/wiki/rust_programming_language) and published the first public release on GitHub in 2021. The early developer response was strong enough to convince him and Andre Zayarni, who had previously held engineering and product leadership roles at Bigpoint, MoBerries, and the sports-tech company Playtomic, to incorporate the project as a company. Qdrant GmbH was registered in Berlin in 2021, with Zayarni as CEO and Vasnetsov as CTO [3][7].

The choice of [Rust](/wiki/rust_programming_language) was deliberate. The team wanted memory safety without garbage collection, low and predictable latency under heavy load, and direct access to [SIMD](/wiki/simd) instructions for distance calculations. That decision later defined Qdrant's performance profile: by 2024 it routinely ranked among the lowest-memory and lowest-latency engines in community vector search benchmarks, and the codebase remained more than 87 percent Rust at the v1.17 milestone [3][8].

### How much funding has Qdrant raised?

The company's funding history reflects the rapid growth of investor interest in retrieval infrastructure for [generative AI](/wiki/generative_ai).

| Round | Date | Amount | Lead investor | Other participants | Source |
|---|---|---|---|---|---|
| Seed | 19 April 2023 | $7.5M | Unusual Ventures | 42CAP, IBB Ventures, angels including Amr Awadallah | [4] |
| Series A | 23 January 2024 | $28M | Spark Capital | Unusual Ventures, 42CAP | [5] |
| Series B | 12 March 2026 | $50M | AVP | Bosch Ventures, Unusual Ventures, Spark Capital, 42CAP | [6] |
| **Total** | | **~$85.5M** | | | |

Unusual Ventures partner John Vrionis, who led the seed round, characterized the timing as the moment vector search shifted from a research curiosity to core infrastructure for production AI applications. Spark Capital partner Yasmin Razavi joined Qdrant's board following the Series A. The Series B, announced on 12 March 2026, was led by Atlantic Bridge spin-out AVP; the valuation was not disclosed. Participating investors included Bosch Ventures, which connected Qdrant to enterprise users in industrial automation and automotive software [4][5][6].

### company growth

By the time of the Series B announcement, Qdrant had grown to more than 100 employees distributed across over 20 countries, with engineering concentrated in Berlin and a Discord community of more than 9,000 developers. The company holds events including the annual Vector Space Day in Berlin, which drew more than 400 attendees in 2025 [1][9].

## How is Qdrant's architecture designed?

Qdrant runs as a single statically linked Rust binary that exposes [REST](/wiki/rest_api) and [gRPC](/wiki/grpc) APIs. The same binary handles standalone, replicated, and sharded deployments, with a custom storage engine, a custom key-value store called Gridstore, and a heavily modified [HNSW](/wiki/hnsw) graph index at its core. The system is intentionally focused on vector search rather than general-purpose data storage, and it rejects features (full transactional [ACID](/wiki/acid) guarantees, [SQL](/wiki/sql) parsing, generic document modeling) that would compromise its primary workload [10].

### Why is Qdrant written in Rust?

The choice of Rust as the implementation language has significant architectural implications. Rust provides several advantages for a vector database:

- **No garbage collector.** Rust uses a compile-time ownership model instead of garbage collection. This eliminates GC pauses that can cause latency spikes during search operations. In contrast, databases written in [Go](/wiki/go_programming_language) (such as [Weaviate](/wiki/weaviate) and [Milvus](/wiki/milvus)) or those with [JVM](/wiki/jvm) components are subject to GC-related tail latency.
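- **Memory safety without a garbage collector.** Rust's ownership rules and borrow checker reject use-after-free bugs and data races at compile time, avoiding much of the memory corruption that plagues C/C++ codebases. The runtime checks that remain, such as slice bounds checks, are cheap and are often optimized away by the compiler.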
- **Zero-cost abstractions.** Rust's generic and trait system allows high-level API design without sacrificing performance. Iterator chains and closures compile down to the same assembly as hand-written loops.
- **SIMD acceleration.** Qdrant leverages Rust's [SIMD](/wiki/simd) (Single Instruction, Multiple Data) support to accelerate distance calculations on both x86-64 (AVX2 and AVX-512) and ARM (NEON) architectures. This hardware-level optimization is critical for vector distance calculations, which sit in the innermost loop of every search.
- **Async I/O with io_uring.** Qdrant uses io_uring for asynchronous disk I/O on Linux, maximizing disk throughput even on network-attached storage. This matters most for disk-based search modes where quantized vectors are stored on SSD [10].

The practical result is a compact memory footprint and predictable latency. In community benchmarks, Qdrant consistently ranks among the lowest in memory consumption per vector among the major vector databases, and among the highest in queries-per-second at fixed recall [8].

### segments and storage

A Qdrant **collection** is divided into one or more **segments**. Each segment is a self-contained unit that owns its own vector storage, payload storage, [HNSW](/wiki/hnsw) index, and ID mapper. Segments come in two states:

- **Appendable segments** accept inserts, updates, and deletes. They use mutable data structures and a flat (brute-force) index, and are optimized for write throughput.
- **Non-appendable (sealed) segments** are immutable except for tombstone-style deletions. They use compacted storage, fully built HNSW indexes, optional [quantization](/wiki/quantization), and are optimized for read performance [11].

A background **optimizer** thread merges small segments, converts appendable segments to sealed ones, and rebuilds indexes as needed. This is conceptually similar to how an [LSM-tree](/wiki/lsm_tree) database compacts SSTables, but adapted for vector indexes.

Durability is provided by a **write-ahead log (WAL)**. Every operation is recorded in the WAL before being applied to a segment. If the process crashes before changes are flushed, the WAL is replayed during startup. Each segment also stores a per-point version, so out-of-order replays cannot resurrect older state [11].
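
The interplay between WAL replay and per-point versions can be sketched in a few lines. This is a toy model, not Qdrant's storage code: the `Segment` class, the record layout `(version, point_id, payload)`, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment: maps point id -> (version, payload)."""
    points: dict = field(default_factory=dict)

    def apply(self, op_version, point_id, payload):
        """Apply an upsert unless the point already holds an equal or newer version."""
        current = self.points.get(point_id)
        if current is not None and current[0] >= op_version:
            return False  # stale or already-applied record: skip it
        self.points[point_id] = (op_version, payload)
        return True


def replay(wal, segment):
    """Replay WAL records after a crash; return how many were actually applied."""
    return sum(segment.apply(*record) for record in wal)


wal = [
    (1, 42, {"title": "draft"}),
    (2, 42, {"title": "final"}),
    (3, 7, {"title": "other"}),
]
segment = Segment()
segment.apply(2, 42, {"title": "final"})  # this write reached disk before the crash
applied = replay(wal, segment)  # only the record for point 7 is new
```

Because the version check makes each record idempotent, replaying the whole log is safe even when part of it was already flushed.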

### Gridstore

Until version 1.13, Qdrant used [RocksDB](/wiki/rocksdb) as the underlying key-value store for payload and sparse vector data. The team identified several mismatches between RocksDB's design and Qdrant's actual workload: the LSM-tree compaction cycle introduced random latency spikes, the system offered far more configuration knobs than Qdrant needed, and the [C++](/wiki/cpp) interop boundary slowed iteration on storage features [12].

In January 2025, Qdrant 1.13 shipped **Gridstore**, a custom key-value store designed specifically for sequential integer keys (which is all Qdrant ever stores). Gridstore is structured around three layers: a Data Layer with a pointer-array tracker, a Mask Layer that tracks block usage, and a Gaps Layer that locates free space efficiently. Because there is no compaction phase, write latency is more predictable, and the storage engine can be debugged without crossing a foreign-function-interface boundary [12].

### HNSW implementation

Qdrant's index is built on [HNSW](/wiki/hnsw) (Hierarchical Navigable Small World) graphs, the dominant approximate nearest neighbor algorithm in production vector search. Unlike a textbook implementation, however, Qdrant's HNSW is integrated with the payload index. Filter conditions are applied **during** graph traversal rather than as a pre-filter on candidates or a post-filter on results. The HNSW graph is also extended with additional edges based on indexed payload values, so a filtered traversal does not become disconnected when many neighbors are pruned [10][13].

The HNSW index supports configurable parameters:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `m` | 16 | Number of edges per node in the HNSW graph |
| `ef_construct` | 100 | Size of the dynamic candidate list during index construction |
| `ef` | `ef_construct` | Search-time candidate list size, controls speed/recall trade-off |
| `full_scan_threshold` | 10,000 | Size threshold in kilobytes of vector data; below it, Qdrant prefers a full scan (brute-force search) over the HNSW index |
| `max_indexing_threads` | 0 (auto) | Number of threads for index building |
| `on_disk` | false | Whether to store the HNSW graph on disk instead of in memory |

Version 1.13 added GPU-accelerated HNSW indexing. The implementation uses the [Vulkan](/wiki/vulkan_api) graphics API rather than [CUDA](/wiki/cuda), making it portable across NVIDIA, AMD, and Intel GPUs. Qdrant reports up to 10x faster index construction relative to CPU-only builds on equivalent hardware. GPU indexing is enabled with the environment variable `QDRANT__GPU__INDEXING=1` and a Docker image suffix such as `qdrant/qdrant:v1.13.0-gpu-nvidia` [13][14].

Version 1.13 also introduced **HNSW graph compression** based on delta encoding, which reduces index memory by up to 30 percent without measurable recall loss. Version 1.16 added **Inline Storage**, which co-locates quantized vector data inside the HNSW graph nodes so that traversal touches fewer pages of disk or memory [13][15].
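
To see why delta encoding shrinks a graph index, consider one node's neighbor list. The sketch below stores sorted neighbor IDs as gaps and measures them with a variable-length integer encoding. The varint format and the assumption that neighbor order can be discarded are illustrative choices; the on-disk layout Qdrant actually uses is not described here.

```python
def delta_encode(neighbor_ids):
    """Sort neighbor ids and keep the first id followed by successive gaps."""
    ordered = sorted(neighbor_ids)
    if not ordered:
        return []
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


def delta_decode(deltas):
    """Rebuild the sorted id list by accumulating the gaps."""
    ids, total = [], 0
    for delta in deltas:
        total += delta
        ids.append(total)
    return ids


def varint_size(value):
    """Bytes needed to store an unsigned integer as a 7-bits-per-byte varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


neighbors = [100_003, 100_010, 100_001, 100_250]
encoded = delta_encode(neighbors)                  # [100001, 2, 7, 240]
compressed_bytes = sum(varint_size(v) for v in encoded)
raw_bytes = 4 * len(neighbors)                     # four uint32 ids
```

Small gaps fit in one or two bytes each, so the list above needs 7 bytes instead of 16.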

For highly selective filters, version 1.16 also implemented the **ACORN-1** search method (an adaptation of the ACORN approach to filtered ANN search). When direct neighbors of a node are excluded by a filter, ACORN looks one hop further to recover graph connectivity, improving recall on queries that combine vector similarity with rare metadata conditions [13].
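
The one-hop-further idea can be illustrated on a plain adjacency list. This is a simplified sketch of the neighbor-expansion step only, under the assumption that the graph is a dict of lists and the filter is a Python predicate; it is not Qdrant's ACORN-1 implementation.

```python
def filtered_neighbors(graph, node, passes_filter):
    """Collect neighbors of `node` that pass the filter. When a direct neighbor
    is excluded, look at that neighbor's own neighbors instead."""
    result, seen = [], {node}
    for neighbor in graph[node]:
        if neighbor in seen:
            continue
        seen.add(neighbor)
        if passes_filter(neighbor):
            result.append(neighbor)
            continue
        # Direct neighbor is filtered out: expand one hop further.
        for second_hop in graph[neighbor]:
            if second_hop not in seen and passes_filter(second_hop):
                seen.add(second_hop)
                result.append(second_hop)
    return result


# Toy graph; the filter keeps only even node ids.
graph = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}
reachable = filtered_neighbors(graph, 0, lambda n: n % 2 == 0)
```

Without the second hop, node 0 would only reach node 2; with it, node 4 is recovered through the excluded node 1.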

### What is quantization in Qdrant?

[Quantization](/wiki/quantization) is the central tool Qdrant uses to keep large indexes in RAM or to make disk-based search feasible. The system has shipped four families of vector quantization (scalar, product, binary, and TurboQuant), all of which can be enabled and disabled per collection without rebuilding the underlying vectors.

| Quantization | Compression | Speed-up | Typical recall | Introduced |
|---|---|---|---|---|
| Scalar (int8) | 4x | up to 2x | ~0.99 | v1.1, March 2023 |
| [Product quantization](/wiki/product_quantization) | up to 64x | 0.5x | ~0.71 at 4x, ~0.59 at 12x (Glove-100) | v1.2, May 2023 |
| Binary (1-bit) | 32x | up to 40x | ~0.95 to 0.99* | v1.5, September 2023 |
| Binary (1.5-bit) | 24x | ~30x | similar to 1-bit | 2025 |
| Binary (2-bit) | 16x | ~20x | similar to 1-bit | 2025 |
| TurboQuant (Hadamard) | 2x scalar at similar recall | varies | 0.9193 at 4-bit on arxiv-titles | v1.18, May 2026 |

*Recall figures for binary quantization assume 1024-dimensional or larger embeddings with appropriate oversampling and rescoring [16][17][18][32].

**Scalar quantization**, introduced in version 1.1 (March 2023), maps each `float32` component to an `int8` integer using a linear transformation calibrated from a configurable quantile (typically 0.99). This reduces vector memory by approximately 75 percent, allows SIMD-accelerated 8-bit dot products, and on internal benchmarks loses 0.3 percent recall or less while improving latency by 28 to 60 percent on real datasets [17].
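
A minimal sketch of quantile-calibrated scalar quantization follows. It shows the general technique (clip to a central quantile, then map linearly onto the `int8` range); the function names and the exact rounding and clipping rules are assumptions, not Qdrant's internal code.

```python
def calibrate(values, quantile=0.99):
    """Pick clipping bounds that keep the central `quantile` share of values."""
    ordered = sorted(values)
    cut = int(len(ordered) * (1.0 - quantile) / 2.0)
    return ordered[cut], ordered[len(ordered) - 1 - cut]


def quantize(vector, lo, hi):
    """Map floats in [lo, hi] linearly onto [-128, 127]; outliers are clipped."""
    scale = 255.0 / (hi - lo)
    return [max(-128, min(127, round((x - lo) * scale) - 128)) for x in vector]


def dequantize(codes, lo, hi):
    """Approximate inverse of `quantize`."""
    step = (hi - lo) / 255.0
    return [(code + 128) * step + lo for code in codes]


vector = [-1.0, -0.5, 0.0, 0.5, 1.0]
lo, hi = calibrate(vector)
codes = quantize(vector, lo, hi)
restored = dequantize(codes, lo, hi)
```

Each component shrinks from four bytes to one, which is where the roughly 75 percent memory reduction comes from; the reconstruction error per component stays within one quantization step.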

**Product quantization**, added in version 1.2 (May 2023), splits each vector into chunks and runs k-means clustering with k=256 on each chunk independently. Compression ratios can range from 4x up to 64x. The trade-off is much steeper than scalar quantization: for a Glove-100 dataset, 4x PQ retains 0.71 mean precision (almost identical to the float baseline), but 12x PQ drops to 0.59. PQ is recommended only when memory constraints clearly outweigh recall [18].
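
The encoding step and the compression arithmetic can be sketched as follows. The codebooks here are hand-written toy centroids with two entries per chunk; in the scheme described above they would come from k-means with k=256, so each code fits in one byte. All names are illustrative.

```python
def pq_encode(vector, codebooks):
    """Replace each chunk of the vector with the index of its nearest centroid."""
    chunk = len(vector) // len(codebooks)
    codes = []
    for i, centroids in enumerate(codebooks):
        part = vector[i * chunk:(i + 1) * chunk]
        distances = [sum((a - b) ** 2 for a, b in zip(part, c)) for c in centroids]
        codes.append(distances.index(min(distances)))
    return codes


def compression_ratio(chunk_dims, bytes_per_float=4, bytes_per_code=1):
    """A chunk of `chunk_dims` float32 values collapses into one code byte."""
    return chunk_dims * bytes_per_float / bytes_per_code


codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # centroids for dimensions 0-1
    [[0.0, 1.0], [1.0, 0.0]],  # centroids for dimensions 2-3
]
codes = pq_encode([0.9, 0.8, 0.1, 0.9], codebooks)
```

With one-byte codes, a chunk of one dimension gives 4x compression and a chunk of sixteen dimensions gives 64x, which matches the range quoted above.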

**Binary quantization**, announced in September 2023 along with the partnership with [OpenAI](/wiki/openai), maps each component to a single bit (1 if positive, 0 otherwise). For 1536-dimensional `text-embedding-3-small` vectors and 3072-dimensional `text-embedding-3-large` vectors from OpenAI, Qdrant reports recall of 0.98 to 0.997 with appropriate oversampling. Storage drops by a factor of 32, and search becomes up to 40 times faster because distance computations reduce to bitwise XOR and population count [16]. Binary quantization performs poorly on embeddings smaller than 1024 dimensions, which led to the 1.5-bit and 2-bit variants introduced in 2025 for medium-dimensional models [16].
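
The sign-bit mapping and the XOR-plus-popcount distance fit in a few lines. This is a sketch of the general technique on four-dimensional toy vectors; packing the bits into a single Python integer is an illustrative choice.

```python
def binarize(vector):
    """Pack sign bits into one integer: bit i is 1 when component i is positive."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Count differing bits: XOR the codes, then take the population count."""
    return bin(a ^ b).count("1")


query = binarize([0.3, -0.2, 0.8, -0.5])
similar = binarize([0.1, -0.9, 0.4, -0.1])
opposite = binarize([-0.3, 0.2, -0.8, 0.5])
```

A 32-bit float per component becomes a single bit, hence the factor of 32, and the distance needs no floating-point arithmetic at all.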

**TurboQuant**, shipped in version 1.18 (11 May 2026), is a quantization method developed by Google Research that applies a fast Hadamard rotation to vectors before compression, redistributing values evenly across coordinates so the technique works across embedding models without per-dataset tuning. Qdrant reports that TurboQuant delivers similar recall to scalar quantization at double the compression ratio: 4-bit TurboQuant reaches 0.9193 recall on the arxiv-titles dataset versus 0.9285 for scalar quantization, while 1-bit TurboQuant substantially improves on binary quantization (0.6763 versus 0.4683 on the same dataset) [32].
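
The effect of a Hadamard rotation can be seen on a vector whose energy sits in one coordinate. The sketch below implements only a normalized fast Walsh-Hadamard transform; it is not TurboQuant itself, and the rest of the method is not reproduced here.

```python
import math


def fwht(vector):
    """Normalized fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(vector)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [x / norm for x in out]


spiky = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
rotated = fwht(spiky)    # every coordinate now has the same magnitude
restored = fwht(rotated)  # the normalized transform is its own inverse
```

The rotation preserves the vector's length while spreading its values evenly, so a low-bit quantizer no longer has to spend its range on a single outlier coordinate.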

All quantization modes support **rescoring**: the candidate list is built using the compressed representation, then the top results are re-ranked using the full-precision vectors. Asymmetric quantization, which keeps query vectors in a higher-precision representation than stored vectors, is also supported.
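
Rescoring with oversampling can be sketched as a two-stage search. The example uses sign-bit codes as the compressed representation and a dot product for the exact stage; the function names and the `oversampling` multiplier semantics are assumptions chosen for illustration.

```python
def sign_bits(vector):
    """Compressed representation: one bit per component, set when positive."""
    return sum(1 << i for i, x in enumerate(vector) if x > 0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_with_rescoring(query, vectors, limit, oversampling=2.0):
    """Stage 1 ranks all points by Hamming distance on compressed codes.
    Stage 2 re-ranks the oversampled candidates with full-precision vectors."""
    q_bits = sign_bits(query)
    n_candidates = int(limit * oversampling)
    by_hamming = sorted(
        vectors, key=lambda pid: bin(q_bits ^ sign_bits(vectors[pid])).count("1")
    )
    candidates = by_hamming[:n_candidates]
    return sorted(candidates, key=lambda pid: dot(query, vectors[pid]), reverse=True)[:limit]


vectors = {
    "a": [0.1, 0.1, 0.1, 0.1],
    "b": [2.0, 0.5, 0.5, -0.1],
    "c": [-1.0, -1.0, -1.0, -1.0],
}
query = [1.0, 0.1, 0.1, 0.1]
with_rescoring = search_with_rescoring(query, vectors, limit=1, oversampling=2.0)
without_oversampling = search_with_rescoring(query, vectors, limit=1, oversampling=1.0)
```

Point `a` wins on compressed codes alone, but fetching two candidates and rescoring them surfaces `b`, which has the higher exact score.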

#### choosing a quantization strategy

| Scenario | Recommended quantization | Rationale |
|----------|------------------------|----------|
| General production workload | Scalar | Best balance of memory savings and recall; minimal accuracy loss |
| 1536+ dimensional embeddings | Binary (1-bit) | Large embeddings tolerate aggressive compression with rescoring |
| Medium-dimensional embeddings (768 to 1024) | Binary (1.5-bit or 2-bit) | Better recall than 1-bit at lower compression |
| Maximum compression needed | Product quantization | Highest ratio but lowest recall; best for cold storage tiers |
| Mixed workload with precision-sensitive queries | Asymmetric quantization | Full-precision queries with compressed storage |
| Model-agnostic high compression | TurboQuant | Hadamard rotation works across embedding models without tuning |

Quantization can be configured per collection, and re-quantization is run by the optimizer in the background when the configuration changes [16][17].

## What are Qdrant's key features?

### payload and filtering

Every vector stored in Qdrant can carry an arbitrary JSON **payload** as metadata. The filtering engine supports a comprehensive list of conditions:

| Condition | Use case |
|---|---|
| `match` | Exact match on keyword, integer, boolean, or UUID fields |
| `match_any` (v1.1+) | IN-style matching against a list of values |
| `match_except` (v1.2+) | NOT-IN-style filtering |
| `range` | Numeric comparisons (gt, gte, lt, lte) |
| `datetime_range` (v1.8+) | RFC 3339 datetime comparisons |
| `geo_bounding_box` | Rectangular geographic region |
| `geo_radius` | Circular geographic region |
| `geo_polygon` | Arbitrary geographic polygons with holes |
| `text_match` | Token-level text search with configurable tokenizer |
| `phrase_match` (v1.15+) | Exact phrase matching with token order |
| `nested` (v1.2+) | Apply conditions to elements of an array of objects |
| `values_count` | Filter on array length |
| `is_empty` / `is_null` | Handle missing or null fields |
| `has_id` | Filter by point IDs |
| `has_vector` (v1.13+) | Filter by presence of a named vector |

Filters are combined using three boolean operators:

| Combinator | Logic |
|-----------|-------|
| `must` | Logical AND over all conditions |
| `should` | Logical OR over conditions |
| `must_not` | Logical NOT over conditions |

Because filters are applied during graph traversal rather than as a separate step, filtered searches perform well even when the filter is highly selective (matching only a small fraction of the dataset). Other vector databases often struggle in this scenario, since pre-filtering can fragment the graph and post-filtering wastes computation on irrelevant candidates [13][20].
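
The semantics of the three combinators can be pinned down with a small evaluator over a payload dictionary. The condition shapes mirror the JSON used in the REST examples, but this is a sketch that models only `match` and `range`; the helper names are hypothetical and it says nothing about how Qdrant evaluates filters inside the index.

```python
import operator

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def condition_holds(payload, condition):
    """Evaluate one field condition against a payload dict."""
    value = payload.get(condition["key"])
    if "match" in condition:
        return value == condition["match"]["value"]
    if "range" in condition:
        if not isinstance(value, (int, float)):
            return False
        return all(RANGE_OPS[op](value, bound) for op, bound in condition["range"].items())
    raise ValueError(f"unsupported condition: {condition}")


def passes(payload, flt):
    """must = AND, should = OR (when present), must_not = NOT."""
    must = flt.get("must", [])
    should = flt.get("should", [])
    must_not = flt.get("must_not", [])
    return (
        all(condition_holds(payload, c) for c in must)
        and (not should or any(condition_holds(payload, c) for c in should))
        and not any(condition_holds(payload, c) for c in must_not)
    )


flt = {
    "must": [{"key": "category", "match": {"value": "news"}}],
    "must_not": [{"key": "price", "range": {"gte": 100}}],
}
cheap_news = passes({"category": "news", "price": 20}, flt)
pricey_news = passes({"category": "news", "price": 150}, flt)
cheap_blog = passes({"category": "blog", "price": 20}, flt)
```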

### payload indexes

To accelerate filtered searches, Qdrant supports creating explicit payload indexes on frequently filtered fields:

| Index type | Field type | Description |
|-----------|-----------|-------------|
| Keyword | String | Hash-based index for exact string matching |
| Integer | Integer | Range index for numerical filtering |
| Float | Float | Range index for floating-point fields |
| Bool | Boolean | Bitmap index for boolean fields |
| Geo | GeoPoint | R-tree index for geographic queries |
| Full-text | Text | Inverted index with configurable tokenizer |
| Datetime | Datetime | Range index for temporal filtering |
| UUID | UUID | Hash-based index for UUID matching |

Full-text indexes can be configured with multiple **tokenizers** (word, whitespace, prefix, multilingual), Snowball **stemming**, **ASCII folding** for diacritics, and language-specific or custom **stopword** lists. Most index types can be stored on disk to conserve RAM. Specialized variants include the **tenant index** (`is_tenant: true`) for multi-tenant collections and the **principal index** for high-cardinality numeric fields used in nearly every query [13].

Creating a payload index before inserting data is more efficient than adding it afterward, because the optimizer can build the HNSW graph with the payload index already in place rather than rebuilding it later [10].

### multitenancy

Qdrant provides built-in multitenancy support, allowing many tenants to share a single collection while remaining logically isolated. Tenants are typically separated by a payload field (such as `tenant_id`) that is marked `is_tenant: true`, which causes the optimizer to keep each tenant's vectors in dedicated segments. This avoids the cost of a separate collection per tenant and keeps the HNSW graph for each tenant compact and efficient.

Version 1.16 introduced **tiered multitenancy**, which addresses the noisy-neighbor problem in shared deployments. Tenants can be promoted between hot, warm, and cold tiers based on activity. Hot tenants keep their vectors in RAM, warm tenants use quantized vectors with rescoring, and cold tenants live entirely on disk. Promotions and demotions happen in the background without rebuilding indexes [13][19].

### sparse vectors and hybrid search

Version 1.7 (December 2023) added native support for [sparse vectors](/wiki/sparse_vectors), enabling [hybrid search](/wiki/hybrid_search) that combines semantic similarity (dense vectors) with lexical relevance (sparse vectors produced by [BM25](/wiki/bm25), TF-IDF, or [SPLADE](/wiki/splade) models). Sparse vectors are indexed exactly (no approximation) and support both in-memory and on-disk storage. An optional inverse document frequency (IDF) modifier weights tokens by rarity at query time, which improves quality for natural-language queries.
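
Sparse scoring is an exact dot product over the indices two vectors share, optionally weighted by term rarity. The sketch below uses a common BM25-style IDF formula as an assumption; the exact formula behind Qdrant's IDF modifier is not given here, and the document counts are invented for the example.

```python
import math


def idf(term_doc_count, total_docs):
    """BM25-style inverse document frequency (an assumed formula)."""
    return math.log(1.0 + (total_docs - term_doc_count + 0.5) / (term_doc_count + 0.5))


def sparse_score(query, document, idf_weights=None):
    """Exact dot product over shared indices of two {index: value} sparse vectors."""
    score = 0.0
    for index, q_value in query.items():
        if index in document:
            weight = idf_weights.get(index, 1.0) if idf_weights else 1.0
            score += weight * q_value * document[index]
    return score


query = {1: 1.0, 7: 1.0}
doc_a = {1: 2.0, 3: 1.0}   # matches only the common term 1
doc_b = {7: 1.0}           # matches only the rare term 7

# Hypothetical corpus statistics: term 1 occurs in 90 of 100 documents, term 7 in 2.
weights = {1: idf(90, 100), 7: idf(2, 100)}
plain = (sparse_score(query, doc_a), sparse_score(query, doc_b))
weighted = (sparse_score(query, doc_a, weights), sparse_score(query, doc_b, weights))
```

Without IDF the document matching the common term scores higher; with IDF the rare-term match overtakes it.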

A single point can carry multiple named vectors. The Query API supports a fan-out of prefetches across vector fields and combines results using fusion methods including **Reciprocal Rank Fusion (RRF)**, **Distribution-Based Score Fusion (DBSF)**, and **Weighted RRF** [21]. This capability is fundamental for [retrieval-augmented generation](/wiki/retrieval_augmented_generation) (RAG) pipelines, where combining keyword and semantic matching often outperforms either approach alone.
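
Reciprocal Rank Fusion itself is simple enough to sketch directly. The constant `k=60` is the value conventionally used in the literature and is an assumption here, as is the 1-based rank; the server-side implementation may differ in such details.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists: each appearance adds 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] = scores.get(point_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_hits = ["a", "b", "c"]    # ranking from a dense-vector prefetch
sparse_hits = ["c", "a", "d"]   # ranking from a sparse-vector prefetch
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Only ranks are used, never raw scores, which is why RRF can combine dense and sparse results whose score scales are not comparable.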

### multivector and late interaction

Version 1.10 (July 2024) introduced multivector support, in which a single point can carry an arbitrary set of vectors of equal dimensionality. This is the natural representation for **late interaction** models such as [ColBERT](/wiki/colbert), where each token in a document is encoded into its own vector and similarity is computed using the **MaxSim** operator: for each query token, take the maximum similarity over document tokens, then sum over query tokens [22].
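
The MaxSim operator described above translates directly into code. This sketch uses plain dot products on two-dimensional toy token vectors; real late-interaction models produce far larger matrices.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_tokens, doc_tokens):
    """For each query token take its best-matching document token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0], [1.0, 0.0]]
score_a = max_sim(query_tokens, doc_a)
score_b = max_sim(query_tokens, doc_b)
```

Document `b` covers both query tokens exactly and therefore scores higher than `a`, which only partially matches the second token.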

A typical pipeline uses a multistage Query API call that prefetches candidates with a compressed dense vector, re-ranks with full-precision dense vectors, and finally re-ranks with ColBERT multivectors. Multivectors interoperate with all quantization modes [22].

### query API and advanced retrieval

Version 1.10 also generalized the search interface into a unified **Query API** that supports nearest-neighbor search, recommendations (based on positive and negative example IDs or vectors), discovery search (finding points in regions defined by example pairs), counting, and faceted aggregation. The same API supports nested prefetches, allowing complex retrieval pipelines to be expressed as a single request rather than multiple round trips.

More recent additions include **Maximal Marginal Relevance (MMR)** for diversity-aware reranking, score-boosting reranking that combines vector similarity with business signals, and full-text filtering with multilingual stemming and stopword support [9][13].
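
Maximal Marginal Relevance can be sketched as a greedy selection loop. This is the textbook formulation with cosine similarity; the `diversity` weight and the function names are illustrative and are not claimed to match Qdrant's parameters or implementation.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, limit, diversity=0.5):
    """Greedy MMR over {point_id: vector}: each step picks the candidate with the
    best trade-off between relevance to the query and similarity to earlier picks."""
    selected = []
    remaining = set(candidates)

    def score(pid):
        relevance = cosine(query, candidates[pid])
        redundancy = max(
            (cosine(candidates[pid], candidates[s]) for s in selected), default=0.0
        )
        return (1.0 - diversity) * relevance - diversity * redundancy

    while remaining and len(selected) < limit:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


candidates = {
    "a": [1.0, 0.1],
    "b": [1.0, 0.12],   # near-duplicate of "a"
    "c": [0.5, -0.9],   # less relevant but very different
}
diverse = mmr([1.0, 0.0], candidates, limit=2, diversity=0.5)
relevance_only = mmr([1.0, 0.0], candidates, limit=2, diversity=0.0)
```

With diversity switched off the two near-duplicates win; with it enabled the second slot goes to the dissimilar point.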

## collection management

Qdrant organizes data into **collections**, which contain **points** (vectors with optional payloads). Collection management is handled through the REST or gRPC API, or any of the official client libraries:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient(url="http://localhost:6333")

# Create a collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE,
    ),
)

# Create a collection with multiple named vectors
client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=768, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)

# List collections
collections = client.get_collections()

# Get collection info (vector count, config, status)
info = client.get_collection("documents")

# Delete a collection (the scratch "multimodal" one, so "documents" stays available)
client.delete_collection("multimodal")
```

Collections support **aliases**, which allow a stable name to point at different underlying collections. This enables blue-green deployments where a new index is built in a separate collection and the alias is switched atomically once the new index is ready.
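
An alias switch is expressed as a list of actions sent in one request. The sketch below only builds the JSON body for `POST /collections/aliases` and does not contact a server; the alias and collection names are hypothetical.

```python
import json


def switch_alias_body(alias, new_collection):
    """Body that drops the alias and re-creates it on the new collection.
    Both actions travel in a single request."""
    return {
        "actions": [
            {"delete_alias": {"alias_name": alias}},
            {"create_alias": {"collection_name": new_collection, "alias_name": alias}},
        ]
    }


# Hypothetical names: "documents_v2" holds the freshly built index.
body = switch_alias_body("documents_live", "documents_v2")
payload = json.dumps(body)
```

Clients keep querying `documents_live` throughout, so the cut-over needs no application change.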

A simple insertion and search flow looks like this:

```python
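# Continues the previous example: reuses `client` and expects the "documents"
# collection (1536-dimensional, cosine distance) to exist.
from qdrant_client.models import PointStruct, Filter, FieldCondition, MatchValue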

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(id=1, vector=[0.1] * 1536, payload={"category": "news"}),
        PointStruct(id=2, vector=[0.2] * 1536, payload={"category": "blog"}),
    ],
)

results = client.query_points(
    collection_name="documents",
    query=[0.15] * 1536,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="news"))]
    ),
    limit=5,
).points
```

The equivalent REST call uses the `POST /collections/{name}/points/query` endpoint with a JSON body.

## snapshots and backup

Qdrant provides a comprehensive snapshot system for backups and data migration.

### collection snapshots

A snapshot is a tar archive containing the full data and configuration of a specific collection on a specific node at a specific point in time. Snapshots are created through the REST API:

| Operation | Endpoint | Description |
|-----------|----------|-------------|
| Create | `POST /collections/{name}/snapshots` | Creates a new snapshot of the collection |
| List | `GET /collections/{name}/snapshots` | Lists all available snapshots |
| Download | `GET /collections/{name}/snapshots/{snapshot_name}` | Downloads the snapshot file |
| Delete | `DELETE /collections/{name}/snapshots/{snapshot_name}` | Removes a snapshot |
| Recover | `PUT /collections/{name}/snapshots/recover` | Restores a collection from a snapshot file or URL |
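
The endpoints in the table can be exercised with nothing but the standard library. The sketch below only constructs the requests; passing one to `urllib.request.urlopen` would send it. The base URL assumes a local instance on the default REST port, and the snapshot URL is a hypothetical placeholder.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:6333"  # assumption: local instance, default REST port


def create_snapshot_request(collection):
    """POST request that asks the node to snapshot one collection."""
    return Request(f"{BASE_URL}/collections/{collection}/snapshots", method="POST")


def recover_snapshot_request(collection, location):
    """PUT request that restores a collection from a snapshot file or URL."""
    body = json.dumps({"location": location}).encode("utf-8")
    return Request(
        f"{BASE_URL}/collections/{collection}/snapshots/recover",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


create_req = create_snapshot_request("documents")
recover_req = recover_snapshot_request(
    "documents", "http://example.com/documents.snapshot"  # hypothetical location
)
```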

### full storage snapshots

Full storage snapshots capture the entire Qdrant instance, including all collections, aliases, and cluster metadata. The snapshot name follows the format `full-snapshot-<timestamp>.snapshot`. This is useful for complete instance backups or for migrating an entire Qdrant deployment to new hardware.

### snapshot storage options

Snapshots can be stored locally on the filesystem or in [S3](/wiki/amazon_s3)-compatible object storage. The storage backend is configured in the Qdrant configuration file:

- **Local storage**: snapshots are saved to the configured `snapshots_path` directory.
- **S3 storage**: snapshots are uploaded to an S3 bucket using the AWS SDK, which also supports S3-compatible services such as [MinIO](/wiki/minio) and Google Cloud Storage.

For disaster recovery, Qdrant Cloud provides automatic incremental backups for AWS and GCP clusters, where each backup contains only the data that changed since the previous backup [23].

## distributed deployment and clustering

Qdrant supports distributed deployment for horizontal scaling and high availability.

### consensus with Raft

In distributed mode, Qdrant uses the [Raft](/wiki/raft_consensus) consensus protocol to maintain agreement on cluster topology and collection structure. All metadata operations (creating collections, changing replication factors, transferring shards) are routed through the consensus layer. Data operations (upserts and queries) bypass consensus and go directly to the relevant shards using optimistic replication, which is what allows the system to handle high write rates without the consensus layer becoming a bottleneck [10].

Write behavior can be tuned along two axes. Per request, the write `ordering` parameter supports `weak`, `medium`, and `strong` modes; per collection, the `write_consistency_factor` controls how many replicas must acknowledge a write before it is considered successful. Reads can also request a consistency level via the `consistency` parameter.

### sharding

Collections are divided into **shards**, each of which is an independent store of points with its own HNSW index and payload indexes. Searches are dispatched to all shards in parallel and results are merged.
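
The merge step of this scatter-gather pattern can be sketched with a k-way merge. The sketch assumes each shard returns `(score, point_id)` pairs already sorted by descending score, with higher scores being better; the function name is illustrative.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, limit):
    """Merge per-shard hit lists (each sorted by descending score) and keep
    the global top `limit` hits without re-sorting everything."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


shard_a = [(0.95, "a1"), (0.60, "a2")]
shard_b = [(0.90, "b1"), (0.85, "b2")]
top = merge_shard_results([shard_a, shard_b], limit=3)
```

Each shard must return at least `limit` hits of its own, because the global top results may all live on a single shard.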

| Sharding concept | Description |
|-----------------|-------------|
| Auto-sharding | Qdrant automatically distributes shards across available nodes |
| Custom shard keys | Points can be routed by a payload field, similar to a partition key |
| Manual shard placement | Operators can pin specific shards to specific nodes |
| Resharding | Available in Qdrant Cloud for rebalancing data across nodes after scaling |
| Shard transfer | Moves shards between nodes with foreground or streaming modes |

Custom shard keys are particularly useful for multi-tenant deployments where each tenant should be physically isolated to a subset of nodes; combined with the tenant payload index, this enables strong tenant locality [10].

### replication

Each shard can have multiple replicas distributed across different nodes. The **replication factor** determines how many copies of each shard exist. For production systems, a replication factor of at least 2 is recommended to ensure availability during node failures. Clusters with three or more nodes and replication enabled can keep serving all operations while one node is down, and they also benefit from load balancing across replicas.

When a node fails, the remaining replicas continue serving queries. When the failed node recovers, consensus triggers a replication process to bring the recovering node up to date with any mutations it missed. In Qdrant Cloud, replication factor changes and shard rebalancing are handled automatically; in self-hosted deployments they are operator-driven [10].

## How can you deploy Qdrant?

Qdrant offers several deployment models suitable for different use cases.

| Option | Description | Operator |
|---|---|---|
| Self-hosted (binary or Docker) | Run anywhere with the official binary, Docker image, or [Helm](/wiki/helm) chart | User |
| Qdrant Cloud (Managed) | Fully managed clusters on AWS, GCP, or Azure with usage-based billing | Qdrant |
| Qdrant Hybrid Cloud | Control plane managed by Qdrant; data plane runs in customer infrastructure | Qdrant control, customer data |
| Qdrant Private Cloud | Air-gapped, fully isolated deployment without any link to Qdrant's services | Customer |
| Qdrant Edge | Embedded library for on-device retrieval | Customer |

### Qdrant Cloud

Qdrant Cloud, launched in 2023 alongside the v1.0 server release, is the company's fully managed service. It is available on [AWS](/wiki/amazon_web_services), [Google Cloud](/wiki/google_cloud_platform), and [Microsoft Azure](/wiki/microsoft_azure). The service offers:

- A free tier with a single-node cluster (0.5 vCPU, 1 GB RAM, 4 GB disk) for prototyping.
- Standard tier with usage-based hourly billing for compute, memory, and storage.
- Premium tier with a 99.9 percent SLA, [single sign-on](/wiki/single_sign_on), and custom contracts.

Qdrant Cloud achieved [SOC 2 Type II](/wiki/soc_2) compliance on 7 April 2024, certified across the Security, Availability, and Confidentiality trust service criteria, and is also [HIPAA](/wiki/hipaa) compliant. Hardened clusters run in unprivileged containers with strict network policies and at-rest encryption [24][25]. In April 2026, Qdrant Cloud added GPU-accelerated indexing as a managed feature, multi-availability-zone clusters, and audit logging for enterprise compliance [14].

### Qdrant Hybrid Cloud

Launched on 15 April 2024, Qdrant Hybrid Cloud uses a [Kubernetes](/wiki/kubernetes)-native bring-your-own-cloud (BYOC) architecture. The control plane runs in Qdrant's infrastructure while the data plane and all customer vectors stay inside the customer's [VPC](/wiki/virtual_private_cloud). Supported targets include AWS, GCP, Azure, Oracle Cloud, DigitalOcean, Vultr, OVHcloud, Scaleway, [Red Hat OpenShift](/wiki/red_hat_openshift), and on-premises VMware. Twelve launch partners shipped integrations on day one, including [LangChain](/wiki/langchain), [LlamaIndex](/wiki/llamaindex), Airbyte, Jina AI, and Haystack [26].

### Qdrant Private Cloud

For customers with strict data sovereignty or air-gap requirements, Qdrant Private Cloud installs the same Kubernetes-native stack entirely inside the customer's environment with no link back to Qdrant's services. The customer is fully responsible for the operational lifecycle in this mode [25].

### Qdrant Edge

Announced on 29 July 2025 and currently in private beta, Qdrant Edge is an embedded library that runs the same retrieval primitives in-process on resource-constrained devices. Target use cases include robotics with on-device perception, mobile devices that need offline RAG, point-of-sale systems, and IoT predictive-maintenance agents. Qdrant Edge supports filterable HNSW, hybrid search, and multivector search, while running synchronously without background threads to fit on-device requirements [27].

## How does Qdrant compare to other vector databases?

Qdrant competes with both pure-play vector databases and vector extensions for general-purpose data systems. The landscape is crowded, and each system has different strengths.

| Feature | Qdrant | [Pinecone](/wiki/pinecone) | [Weaviate](/wiki/weaviate) | [Milvus](/wiki/milvus) | [Chroma](/wiki/chroma) | [pgvector](/wiki/pgvector) |
|---|---|---|---|---|---|---|
| Language | Rust | Proprietary (managed) | Go | Go and C++ | Python and Rust | C ([PostgreSQL](/wiki/postgresql) extension) |
| Open source | Yes (Apache 2.0) | No | Yes (BSD-3) | Yes (Apache 2.0) | Yes (Apache 2.0) | Yes (PostgreSQL License) |
| Deployment | Self-hosted, Cloud, Hybrid, Private, Edge | Managed cloud only | Self-hosted or Cloud | Self-hosted or Zilliz Cloud | Self-hosted or Cloud | PostgreSQL extension |
| Sparse vectors | Yes (native) | Yes | Yes (modular) | Yes | No | Yes (`sparsevec` type) |
| Multivector / late interaction | Yes (native) | Limited | Yes (modular) | Yes | No | No |
| Filtering during search | Yes (in-graph) | Yes (metadata) | Yes (inverted index plus vector) | Yes (attribute filtering) | Yes (metadata) | SQL `WHERE` clauses |
| GPU-accelerated indexing | Yes (vendor-agnostic, Vulkan) | N/A (managed) | Yes (NVIDIA CAGRA) | Yes (NVIDIA CAGRA) | No | No |
| Quantization | Scalar, Binary, PQ, TurboQuant, asymmetric | Limited | Yes | Yes (DiskANN, IVF-PQ) | No | Limited (`halfvec`, binary) |
| Distributed consensus | Raft | Proprietary | Raft (since 1.25) | etcd | None | PostgreSQL replication |
| Multitenancy | Tiered, native | Namespaces | Tenant-based | Partition key | Collection-based | Schema-based |
| Typical use case | Performance-sensitive filtered search at any scale | Managed simplicity | Hybrid search with knowledge-graph features | Billion-scale enterprise deployments | Quick prototypes and small RAG apps | Vector search inside an existing PostgreSQL stack |

Qdrant's clearest performance advantage is in **filtered search**, where the integrated in-graph filtering avoids the recall cliffs that affect pre-filtering and the wasted compute of post-filtering. In Qdrant's published benchmarks (covering datasets such as `dbpedia-openai-1M-angular`, `deep-image-96-angular`, `gist-960-euclidean`, and `glove-100-angular`), Qdrant delivers the highest queries-per-second at fixed recall on most configurations and ranks among the lowest in latency [8][13]. Independent reviews generally agree that Qdrant and Milvus are the two strongest open-source contenders, with Pinecone leading on pure operational simplicity for managed-only customers [28].
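
The gap between post-filtering and filtering during the search can be illustrated with a small brute-force sketch. It is not Qdrant's HNSW implementation: the exhaustive scan, the toy points, and the function names are assumptions chosen only to show why post-filtering can return fewer results than requested. In an exhaustive scan, applying the condition during candidate selection is equivalent to filtering before scoring, whereas Qdrant applies it while traversing the graph.

```python
import math

# Toy collection of (id, vector, payload) tuples. All values are illustrative.
POINTS = [
    (1, (0.9, 0.1), {"city": "Berlin"}),
    (2, (0.8, 0.2), {"city": "Berlin"}),
    (3, (0.7, 0.3), {"city": "Berlin"}),
    (4, (0.6, 0.4), {"city": "London"}),
    (5, (0.1, 0.9), {"city": "London"}),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query, k, candidates):
    """Exhaustive nearest-neighbour scan (a stand-in for an ANN index)."""
    ranked = sorted(candidates, key=lambda p: cosine(query, p[1]), reverse=True)
    return ranked[:k]


def post_filter_search(query, k, predicate):
    """Search first, then drop non-matching hits; may return fewer than k."""
    return [p[0] for p in top_k(query, k, POINTS) if predicate(p[2])]


def filtered_search(query, k, predicate):
    """Only consider points that satisfy the predicate while selecting."""
    matching = (p for p in POINTS if predicate(p[2]))
    return [p[0] for p in top_k(query, k, matching)]


def in_london(payload):
    return payload["city"] == "London"


query = (1.0, 0.0)
post = post_filter_search(query, 2, in_london)
during = filtered_search(query, 2, in_london)

print(post)    # [] because the two nearest points are both in Berlin
print(during)  # [4, 5]
```

With a selective filter, the post-filtered search has to over-fetch (a larger `k`) to recover the lost results, which is the wasted compute referred to above.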

Compared to [Pinecone](/wiki/pinecone), Qdrant is open source and can be self-hosted, which lowers cost for teams willing to manage their own infrastructure. Compared to [Weaviate](/wiki/weaviate), Qdrant tends to use less memory and CPU at the same scale, but lacks Weaviate's built-in [GraphQL](/wiki/graphql) API and modular knowledge-graph features. Compared to [Milvus](/wiki/milvus), Qdrant is simpler to operate (a single Rust binary versus a distributed system with multiple components such as etcd, Pulsar, and MinIO) but may not match Milvus's throughput at the very largest scale. Compared to [Chroma](/wiki/chroma), Qdrant sits further along the production-readiness curve, with replication, sharding, and quantization that Chroma does not currently offer.

## Ecosystem and integrations

Qdrant provides official client libraries in Python, TypeScript and JavaScript, Rust, Go, Java, and C#, along with community libraries in many other languages. The Python client integrates with **FastEmbed**, Qdrant's lightweight embedding library that runs [ONNX](/wiki/onnx) models directly on CPU without [PyTorch](/wiki/pytorch) or CUDA dependencies, making it possible to encode text into vectors as part of a single client call. FastEmbed supports embedding models from [Sentence Transformers](/wiki/sentence_transformers), Jina AI, Nomic, [BAAI](/wiki/baai), and others, plus rerankers such as cross-encoders [29].

Qdrant integrates with most major [LLM](/wiki/large_language_model) orchestration frameworks:

| Framework | Integration |
|-----------|----------------|
| [LangChain](/wiki/langchain) | Vector store integration for RAG pipelines |
| [LlamaIndex](/wiki/llamaindex) | Retriever integration for document question-answering |
| Haystack by deepset | Document store component for search pipelines |
| Spring AI | Java-based AI application framework |
| Microsoft Semantic Kernel | Orchestration SDK for AI agents |
| Microsoft AutoGen | Multi-agent orchestration framework |
| CrewAI | Multi-agent orchestration platform |
| n8n | Official workflow automation node |
| Airbyte | ELT pipelines for ingesting embeddings |
| Vercel AI SDK | Edge-friendly AI framework for [Next.js](/wiki/nextjs) |

Qdrant also integrates with embedding providers including [OpenAI](/wiki/openai), [Cohere](/wiki/cohere), Jina AI, and many open-source models hosted on [Hugging Face](/wiki/hugging_face). Qdrant Cloud Inference, introduced in 2025, offers managed embedding generation alongside vector storage so that a single endpoint can serve both encoding and retrieval [9].

## Who uses Qdrant?

By March 2026 Qdrant powered production AI systems at more than 50 named enterprises and a much larger long tail of startups [2]. Selected case studies illustrate the typical scale and use case.

| Customer | Industry | Use case | Scale |
|----------|---------|----------|-------|
| Tripadvisor | Travel | AI Trip Planner and conversational search over reviews and images | Over 1 billion reviews, hundreds of millions of images, 11 million businesses |
| HubSpot | CRM and marketing | Breeze AI retrieval for customer data | Production retrieval across SaaS tenants |
| OpenTable | Restaurants | Sparse-vector restaurant filtering and discovery | Over 60,000 restaurants |
| Canva | Design | AI-powered features across the design platform | Hundreds of millions of users |
| Deutsche Telekom | Telecommunications | Multi-agent platform for customer service | Over 2 million conversations across 10 countries |
| Bazaarvoice | E-commerce | Product review retrieval | Billions of reviews |
| Roche | Pharmaceuticals | RAG over scientific literature | Internal enterprise deployment |
| Bosch | Industrial | Embedded retrieval in automotive and industrial AI | Internal enterprise deployment |
| xAI | AI labs | Real-time RAG over X (Twitter) for [Grok](/wiki/grok) | Real-time social-media data |
| Flipkart | E-commerce | Product search and recommendations | Indian retail scale |
| Voiceflow | Conversational AI | Conversational agent platform | Cloud SaaS deployment |
| Sprinklr | Customer experience | Unified CXM platform | Enterprise SaaS |
| Dust | AI assistants | Workspace AI assistants | Cloud SaaS deployment |

The Tripadvisor deployment is one of the best-documented public case studies. Tripadvisor's data and AI team, led by Rahul Todkar, used Qdrant to build a unified user graph that captures how users engage with hotels, restaurants, and attractions across more than 21 countries. The graph powers an AI Trip Planner and a conversational search interface that has replaced filter-based search for many users. Tripadvisor reports a two- to three-fold increase in revenue per user among users who engage with its generative AI features [30].

Canva, the design platform serving hundreds of millions of users, cited performance and architecture as its reasons for adopting Qdrant. "Qdrant's technical architecture and performance capabilities have proven to be exactly what we need as we scale our AI-powered features across the platform," said Colin Chauvet, Director of Engineering at Canva [6].

The xAI relationship is notable for its scale and visibility: Grok, the conversational AI assistant integrated into the X social network, uses Qdrant as the retrieval layer that gives it access to real-time posts from the platform [3].

## Current state

The $50 million Series B in March 2026 was led by AVP, with participation from Bosch Ventures, Unusual Ventures, Spark Capital, and 42CAP. Qdrant plans to use the funding to expand its engineering and product teams, strengthen enterprise offerings, and scale global operations [6]. CEO Andre Zayarni summarized the company's positioning at the Series B announcement with a single line: "Models get the attention, but retrieval is what makes them useful in production." [6][31]

Qdrant is positioning itself as the core retrieval infrastructure layer for production AI systems, with a particular focus on agentic AI applications that require fast, reliable vector search across billions of objects and thousands of intermediate steps per workflow. The product strategy is built around **composable vector search**: rather than imposing a fixed retrieval pipeline, Qdrant exposes core capabilities (dense vectors, sparse vectors, metadata filters, multi-vector representations, custom scoring) as flexible primitives that can be combined at query time. This allows different workloads, from real-time agent reasoning to nightly batch RAG indexing, to compose retrieval logic suited to their latency, recall, and cost requirements [6].
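
As an illustration of combining primitives at query time, the sketch below builds a request body in the shape documented for Qdrant's Query API (`POST /collections/{collection_name}/points/query`): a dense branch and a sparse branch, each restricted by a payload filter, fused by rank. The vector names `dense` and `sparse`, the payload key `city`, and the helper `build_hybrid_query` are hypothetical, and the field names should be checked against the documentation for the server version in use.

```python
import json


def build_hybrid_query(dense_vector, sparse_indices, sparse_values, city, limit=10):
    """Compose a hybrid query body: two prefetch branches fused with RRF.

    The vector names "dense" and "sparse" and the payload key "city" are
    hypothetical; they must match whatever the target collection defines.
    """
    payload_filter = {"must": [{"key": "city", "match": {"value": city}}]}
    return {
        "prefetch": [
            {
                # Dense branch: semantic candidates.
                "query": dense_vector,
                "using": "dense",
                "filter": payload_filter,
                "limit": limit * 2,
            },
            {
                # Sparse branch: keyword-style candidates.
                "query": {"indices": sparse_indices, "values": sparse_values},
                "using": "sparse",
                "filter": payload_filter,
                "limit": limit * 2,
            },
        ],
        # Fuse the two candidate lists by reciprocal rank.
        "query": {"fusion": "rrf"},
        "limit": limit,
    }


body = build_hybrid_query([0.01, 0.45, 0.67], [1, 42], [0.22, 0.8], city="Berlin")
print(json.dumps(body, indent=2))
```

Dropping a branch, changing the filter, or swapping the fusion step changes the retrieval behaviour without touching the collection itself, which is the sense in which the primitives are composable.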

The open-source project continues active development. As of June 2026, Qdrant 1.18 ("TurboQuant"), released on 11 May 2026, is the latest stable release. It followed v1.13, which shipped GPU-accelerated indexing, Gridstore, and HNSW graph compression in January 2025; v1.16, which added Inline Storage, ACORN-1, and tiered multitenancy in late 2025; and v1.17, released 20 February 2026, which introduced Relevance Feedback, audit access logging, weighted RRF, and configurable read fan-out delays [13][32]. Version 1.18 added the model-agnostic TurboQuant quantization method, per-component memory monitoring, named-vector management without collection recreation, and a cluster-wide audit-log query API; it also completed the removal of RocksDB in favor of Gridstore [32]. Qdrant Edge is in private beta, and Qdrant Cloud Inference unifies embedding generation and retrieval. Together these initiatives suggest a trajectory in which Qdrant evolves from a vector database into a more general-purpose retrieval engine for AI applications.
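
Weighted RRF, listed among the v1.17 additions above, generalizes reciprocal rank fusion by giving each ranked list its own weight. The sketch below uses the common formulation, score(d) = Σ wᵢ / (k + rankᵢ(d)), with `k = 60` taken from the original RRF paper; both the formula and the constant are assumptions for illustration and are not confirmed to match Qdrant's implementation. The function name and sample ids are hypothetical.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked id lists: score(d) = sum_i w_i / (k + rank_i(d)).

    k=60 follows the original RRF paper; it is an assumption here and
    not necessarily the constant Qdrant uses.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]
sparse_hits = ["c", "a", "d"]

equal = weighted_rrf([dense_hits, sparse_hits], weights=[1.0, 1.0])
sparse_heavy = weighted_rrf([dense_hits, sparse_hits], weights=[1.0, 3.0])

print(equal)         # ['a', 'c', 'b', 'd']
print(sparse_heavy)  # ['c', 'a', 'd', 'b']
```

Raising the weight of the sparse list promotes `c` and `d`, which rank well only in that list, showing how the weights shift the balance between retrieval signals.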

## See also

- [Vector database](/wiki/vector_database)
- [HNSW](/wiki/hnsw)
- [Hybrid search](/wiki/hybrid_search)
- [Quantization](/wiki/quantization)
- [Pinecone](/wiki/pinecone)
- [Weaviate](/wiki/weaviate)
- [Milvus](/wiki/milvus)
- [Chroma](/wiki/chroma)
- [pgvector](/wiki/pgvector)
- [Retrieval-augmented generation](/wiki/retrieval_augmented_generation)
- [ColBERT](/wiki/colbert)
- [Sentence Transformers](/wiki/sentence_transformers)
- [FAISS](/wiki/faiss)

## References

1. Qdrant. [About Us](https://qdrant.tech/about-us/). Retrieved 2026.
2. Qdrant. [Customers](https://qdrant.tech/customers/). Retrieved 2026.
3. TechCrunch. [Open source vector database startup Qdrant raises $28M](https://techcrunch.com/2024/01/23/qdrant-open-source-vector-database/). 23 January 2024.
4. Qdrant. [On Unstructured Data, Vector Databases, New AI Age, and Our Seed Round](https://qdrant.tech/articles/seed-round/). 19 April 2023.
5. Qdrant. [Announcing Qdrant's $28M Series A Funding Round](https://qdrant.tech/blog/series-a-funding-round/). 23 January 2024.
6. Qdrant. [We Raised $50M to Build Composable Vector Search as Core Infrastructure](https://qdrant.tech/blog/series-b-announcement/). 12 March 2026.
7. Andrey Vasnetsov. [CV](https://blog.vasnetsov.com/andrey_vasnetsov_cv.pdf). Retrieved 2026.
8. Qdrant. [Vector Search Benchmarks](https://qdrant.tech/benchmarks/). Retrieved 2026.
9. Qdrant. [2025 Recap: Powering the Agentic Era](https://qdrant.tech/blog/2025-recap/). January 2026.
10. DeepWiki. [qdrant/qdrant: System Architecture](https://deepwiki.com/qdrant/qdrant/2-system-architecture). Retrieved 2026.
11. Qdrant Documentation. [Storage](https://qdrant.tech/documentation/manage-data/storage/). Retrieved 2026.
12. Qdrant. [Introducing Gridstore: Qdrant's Custom Key-Value Store](https://qdrant.tech/articles/gridstore-key-value-storage/). January 2025.
13. Qdrant. [Qdrant 1.13 Release Notes: GPU Indexing, Strict Mode and New Storage Engine](https://qdrant.tech/blog/qdrant-1.13.x/). 23 January 2025.
14. Yahoo Finance. [Qdrant Cloud Ships Enterprise-Grade Features: GPU-Accelerated Indexing, Multi-AZ Clusters, and Audit Logging](https://finance.yahoo.com/). 28 April 2026.
15. Qdrant Documentation. [Indexing](https://qdrant.tech/documentation/concepts/indexing/). Retrieved 2026.
16. Qdrant. [Binary Quantization: A Quantum Leap in Vector Search](https://qdrant.tech/articles/binary-quantization/). 18 September 2023.
17. Qdrant. [Scalar Quantization: Memory and Performance Optimization](https://qdrant.tech/articles/scalar-quantization/). 30 March 2023.
18. Qdrant. [Product Quantization for Vector Search](https://qdrant.tech/articles/product-quantization/). May 2023.
19. Qdrant. [Multitenancy and Custom Sharding in Qdrant](https://qdrant.tech/articles/multitenancy/). Retrieved 2026.
20. Qdrant Documentation. [Filtering](https://qdrant.tech/documentation/concepts/filtering/). Retrieved 2026.
21. Qdrant. [Sparse Vectors](https://qdrant.tech/articles/sparse-vectors/). 9 December 2023.
22. Qdrant. [Late Interaction Models with Multivectors](https://qdrant.tech/articles/late-interaction-models/). 2024.
23. Qdrant Documentation. [Snapshots](https://qdrant.tech/documentation/concepts/snapshots/). Retrieved 2026.
24. Qdrant. [Qdrant Attains SOC 2 Type II Audit Report](https://qdrant.tech/blog/qdrant-soc2-type2-audit/). 23 May 2024.
25. Qdrant Documentation. [Cloud Security](https://qdrant.tech/documentation/cloud-security/). Retrieved 2026.
26. Qdrant. [Introducing Qdrant Hybrid Cloud: A New Era of Vector Search](https://qdrant.tech/blog/hybrid-cloud/). 15 April 2024.
27. Qdrant. [Qdrant Edge: On-Device Retrieval for Embedded AI](https://qdrant.tech/blog/qdrant-edge/). 29 July 2025.
28. Xenoss. [Pinecone vs Qdrant vs Weaviate: Best Vector Database](https://xenoss.io/blog/vector-database-comparison-pinecone-qdrant-weaviate). Retrieved 2026.
29. Qdrant. [FastEmbed: Qdrant's Efficient Python Library for Embedding Generation](https://qdrant.tech/articles/fastembed/). Retrieved 2026.
30. Qdrant. [How Tripadvisor Drives 2 to 3x More Revenue with Qdrant-Powered AI](https://qdrant.tech/blog/case-study-tripadvisor/). Retrieved 2026.
31. SiliconANGLE. [Qdrant raises $50M to bring flexible vector search to production AI systems](https://siliconangle.com/2026/03/12/qdrant-raises-50m-bring-flexible-vector-search-production-ai-systems/). 12 March 2026.
32. Qdrant. [Qdrant 1.18 - TurboQuant](https://qdrant.tech/blog/qdrant-1.18.x/). 11 May 2026.

