Qdrant (pronounced "quadrant") is an open-source vector database written in Rust and designed for high-performance similarity search at scale. Founded in 2021 in Berlin by Andre Zayarni (CEO) and Andrey Vasnetsov (CTO), Qdrant has grown into one of the leading vector search engines, with over 250 million downloads and 29,000 GitHub stars. The company raised $28 million in Series A funding in January 2024 and followed with a $50 million Series B round in March 2026, bringing total funding to approximately $87.5 million [1][2].
Andrey Vasnetsov, a machine learning engineer, and Andre Zayarni co-founded Qdrant in Berlin in 2021. Vasnetsov had previously worked on search and recommendation systems and saw that existing solutions for vector similarity search were either too slow, too memory-hungry, or forced developers into awkward workarounds using databases that were never designed for high-dimensional data.
The team chose Rust as the implementation language for its combination of memory safety, low-level performance, and absence of a garbage collector. This decision has proven to be one of Qdrant's defining technical choices, giving it a performance profile that competes favorably with systems written in C++ while avoiding entire classes of memory-related bugs.
The company's funding history reflects growing investor confidence in the vector database category:
| Round | Date | Amount | Lead investor |
|---|---|---|---|
| Seed | 2022 | ~$9.5M | Unusual Ventures |
| Series A | January 2024 | $28M | Spark Capital |
| Series B | March 2026 | $50M | AVP |
| Total | | ~$87.5M | |
Notable enterprise users include Tripadvisor, HubSpot, and OpenTable [2].
Qdrant is built as a standalone service that exposes REST and gRPC APIs. Internally, it uses a custom storage engine called Gridstore rather than relying on existing embedded databases. The core search algorithm is based on HNSW (Hierarchical Navigable Small World graphs), with several proprietary modifications.
The choice of Rust as the implementation language has significant architectural implications. Rust provides several advantages for a vector database:

- Memory safety without a garbage collector, so there are no GC pauses to cause latency spikes
- Fine-grained control over memory layout, enabling compact data structures and SIMD-friendly code
- Performance comparable to C and C++ while avoiding entire classes of memory-related bugs
The practical result is that Qdrant achieves a compact memory footprint and predictable latency characteristics. In community benchmarks, Qdrant consistently ranks among the lowest in memory consumption per vector among the major vector databases [5].
Qdrant's HNSW implementation goes beyond the standard algorithm in several ways. The graph is extended to incorporate payload (metadata) filtering directly into the search traversal. Instead of the common approach of pre-filtering candidates before search or post-filtering results after search, Qdrant applies filter conditions during the graph traversal itself. This single-pass approach means that filtered searches do not suffer from the recall degradation that affects pre-filtering or the wasted computation of post-filtering [3].
The HNSW index supports configurable parameters:
| Parameter | Default | Description |
|---|---|---|
| m | 16 | Number of edges per node in the HNSW graph |
| ef_construct | 100 | Number of neighbors to consider during index construction |
| full_scan_threshold | 10,000 | Threshold (in KB of vector data) below which a segment is searched by brute force instead of the index |
| max_indexing_threads | 0 (auto) | Number of threads for index building |
| on_disk | false | Whether to store the HNSW graph on disk instead of in memory |
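As a concrete illustration, these parameters map onto the `hnsw_config` section of a collection-creation request. The sketch below builds such a request body as a plain dictionary (the collection's vector size and distance are placeholder values; the HNSW settings mirror the defaults in the table above):

```python
import json

# Hypothetical request body for PUT /collections/{name}, setting
# HNSW parameters explicitly to the defaults listed above.
create_request = {
    "vectors": {"size": 768, "distance": "Cosine"},
    "hnsw_config": {
        "m": 16,                       # edges per node in the graph
        "ef_construct": 100,           # neighbors considered at build time
        "full_scan_threshold": 10000,  # small segments fall back to brute force
        "on_disk": False,              # keep the graph in memory
    },
}

print(json.dumps(create_request, indent=2))
```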
Recent versions have added GPU-accelerated HNSW indexing, incremental indexing for upsert-heavy workloads, and HNSW graph compression to reduce memory usage.
Qdrant supports multiple quantization strategies to reduce memory consumption:
| Quantization type | Memory reduction | Description |
|---|---|---|
| Scalar quantization | ~75% (float32 to uint8) | Compresses 32-bit floats into 8-bit unsigned integers |
| Binary quantization | ~97% (float32 to 1-bit) | Extreme compression for overparameterized embeddings |
| Product quantization | Variable | Divides vectors into subvectors and quantizes each separately |
| 1.5-bit / 2-bit quantization | ~94% / ~93% | Fine-grained options between binary and scalar |
| Asymmetric quantization | Variable | Uses original vectors for queries but quantized vectors in storage |
Qdrant's precision-tuned quantization maintains over 98% recall accuracy while achieving significant memory reductions. The asymmetric approach is particularly useful: query vectors remain at full precision while stored vectors are compressed, preserving search quality while reducing the memory footprint of the index [3].
| Scenario | Recommended quantization | Rationale |
|---|---|---|
| General production workload | Scalar quantization | Best balance of memory savings and recall; minimal accuracy loss |
| Overparameterized embeddings (1536+ dim) | Binary quantization | Large embeddings from models like OpenAI's text-embedding-ada-002 tolerate aggressive quantization |
| Maximum compression needed | Product quantization | Highest compression ratio, but more complex to configure and slightly lower recall |
| Mixed workload with some precision-sensitive queries | Asymmetric quantization | Full-precision queries with compressed storage; good for read-heavy workloads |
| Budget-constrained with moderate accuracy needs | 1.5-bit or 2-bit | Provides 93-94% memory reduction, better recall than binary |
Quantization can be configured per collection and can be enabled or disabled at any time without rebuilding the index. Qdrant re-quantizes vectors in the background when the configuration changes.
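For instance, enabling scalar quantization on an existing collection is a single configuration update. The sketch below shows a plausible body for a collection-update request (field names follow Qdrant's scalar-quantization schema; the quantile value is illustrative):

```python
# Hypothetical body for PATCH /collections/{name}: enable int8 scalar
# quantization; Qdrant re-quantizes existing vectors in the background.
update_request = {
    "quantization_config": {
        "scalar": {
            "type": "int8",      # compress float32 components to 8-bit integers
            "quantile": 0.99,    # clip outliers beyond the 99th percentile
            "always_ram": True,  # keep quantized vectors in RAM for speed
        }
    }
}
```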
Qdrant introduced Inline Storage, which embeds quantized vectors directly into the HNSW graph structure. This improves disk-based search performance by reducing the number of random I/O operations needed during graph traversal, since the quantized vector data is co-located with the graph edges.
Qdrant's Gridstore storage engine manages data at the segment level. Each collection is divided into segments, where each segment contains a subset of the vectors and their associated payloads. Segments can be in one of two states:

- Appendable: accepts new points and updates
- Sealed (non-appendable): immutable and optimized for search
The optimizer periodically merges small segments and converts appendable segments to sealed ones, similar to how LSM-tree databases compact SSTables.
Every vector in Qdrant can carry a JSON payload containing arbitrary metadata. The system supports a wide range of filter conditions on these payloads, including exact match, numeric range, geographic (bounding box and radius), full-text match, values count, and null/empty checks.
Payload filtering supports three logical combinators:
| Combinator | Description |
|---|---|
| must | All conditions must be satisfied (logical AND) |
| should | At least one condition must be satisfied (logical OR) |
| must_not | None of the conditions may be satisfied (logical NOT) |
The filtering engine is one of Qdrant's strongest differentiators. Because filters are applied during HNSW traversal rather than as a separate step, filtered searches perform well even when the filter is highly selective (matching only a small fraction of the dataset). Other vector databases often struggle in this scenario because pre-filtering can eliminate too many graph nodes, and post-filtering can waste computation on irrelevant results [4].
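A filtered search request combines a query vector with these combinators. The sketch below is a plausible body for POST /collections/{name}/points/search (field names, dimensionality, and values are illustrative):

```python
# Hypothetical filtered-search body: vector similarity constrained by
# payload conditions, evaluated during HNSW traversal.
search_request = {
    "vector": [0.05] * 384,  # placeholder 384-dimensional query vector
    "filter": {
        "must": [
            {"key": "category", "match": {"value": "news"}},    # AND
            {"key": "published_year", "range": {"gte": 2023}},  # AND
        ],
        "must_not": [
            {"key": "archived", "match": {"value": True}},      # NOT
        ],
    },
    "limit": 5,
}
```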
Qdrant also supports ACORN, an additional graph-traversal strategy that improves search accuracy when queries involve multiple high-cardinality filters. ACORN adapts the traversal based on filter selectivity, providing better recall for complex filter combinations.
To accelerate filtered searches, Qdrant supports creating explicit payload indexes on frequently filtered fields:
| Index type | Field type | Description |
|---|---|---|
| Keyword index | String | Hash-based index for exact string matching |
| Integer index | Integer | Range index for numerical filtering |
| Float index | Float | Range index for floating-point fields |
| Bool index | Boolean | Bitmap index for boolean fields |
| Geo index | GeoPoint | R-tree index for geographic queries |
| Full-text index | Text | Inverted index with configurable tokenizer for text search |
| Datetime index | Datetime | Range index for temporal filtering |
| UUID index | UUID | Hash-based index for UUID matching |
Creating a payload index before inserting data is more efficient than adding it after, because the optimizer can build the HNSW graph with the payload index already in place, avoiding a separate graph rebuild [3].
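Declaring a payload index is a single API call per field. The sketch below shows a plausible body for PUT /collections/{name}/index (the field name is a placeholder):

```python
# Hypothetical: declare a keyword index on a frequently filtered field,
# using one of the index types from the table above.
index_request = {
    "field_name": "category",
    "field_schema": "keyword",  # hash-based index for exact string matching
}
```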
Qdrant provides built-in multitenancy support, allowing multiple tenants to share a single collection while maintaining isolation. The system introduced tiered multitenancy to address the "noisy neighbor" problem, where one tenant's heavy workload degrades performance for others. Tenants can be separated via payload-based partitioning, and Qdrant optimizes storage and indexing per tenant segment.
For multi-tenant deployments, Qdrant recommends using a tenant ID as a payload field and creating a payload index on that field. Queries then include a filter on the tenant ID, and the in-graph filtering mechanism ensures that only vectors belonging to the target tenant are considered during search.
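Following that recommendation, a tenant-scoped search simply adds the tenant ID as a `must` condition. A sketch (the tenant field name and values are placeholders):

```python
# Hypothetical tenant-scoped search body: the tenant_id filter rides on
# the in-graph filtering mechanism, so only that tenant's vectors are
# considered during traversal.
tenant_search = {
    "vector": [0.0] * 768,  # placeholder query vector
    "filter": {
        "must": [
            {"key": "tenant_id", "match": {"value": "tenant-42"}},
        ]
    },
    "limit": 10,
}
```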
Qdrant supports sparse vectors alongside dense vectors, enabling hybrid search that combines semantic similarity (dense vectors) with keyword relevance (sparse vectors, such as those produced by BM25, TF-IDF, or SPLADE models). Users can define multiple named vector fields per point and run searches that blend results from both dense and sparse representations.
This capability is important for retrieval-augmented generation (RAG) systems, where combining keyword and semantic matching often produces better results than either approach alone.
Beyond sparse and dense vectors, Qdrant supports storing multiple named vectors per point. This allows for multimodal search scenarios where a single record might have separate embeddings for text, images, and other modalities. Searches can target specific vector fields or combine results across multiple fields.
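One way to express such a hybrid query is the Query API's prefetch-and-fuse pattern. The sketch below is a plausible body for POST /collections/{name}/points/query, assuming the collection defines named `dense` and `sparse` vector fields (all names, indices, and values are placeholders):

```python
# Hypothetical hybrid query: fetch candidates from each named vector
# field, then fuse the two ranked lists with reciprocal rank fusion.
hybrid_query = {
    "prefetch": [
        {  # semantic candidates from the dense vector field
            "query": [0.01] * 768,
            "using": "dense",
            "limit": 20,
        },
        {  # keyword candidates from the sparse vector field
            "query": {"indices": [17, 42, 108], "values": [0.6, 1.2, 0.3]},
            "using": "sparse",
            "limit": 20,
        },
    ],
    "query": {"fusion": "rrf"},  # reciprocal rank fusion of both lists
    "limit": 10,
}
```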
Qdrant organizes data into collections, where each collection contains points (vectors with payloads). Collection management includes:
```python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient(url="http://localhost:6333")

# Create a collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE,
    ),
)

# Create a collection with multiple named vectors
client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=768, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)

# List collections
collections = client.get_collections()

# Get collection info (vector count, config, status)
info = client.get_collection("documents")

# Delete a collection
client.delete_collection("documents")
```
Collections support aliasing, which allows pointing a stable name to different collections. This is useful for blue-green deployments where a new index is built in a separate collection, then the alias is switched atomically to point to the new collection.
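The atomic switch can be expressed as a single aliases update. A plausible body for POST /collections/aliases (collection names are placeholders):

```python
# Hypothetical blue-green cutover: repoint the "documents" alias from the
# old collection to a freshly built one in one atomic request.
alias_update = {
    "actions": [
        {"delete_alias": {"alias_name": "documents"}},
        {"create_alias": {
            "collection_name": "documents_v2",
            "alias_name": "documents",
        }},
    ]
}
```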
Qdrant provides a comprehensive snapshot system for backups and data migration.
A snapshot is a tar archive containing the full data and configuration of a specific collection on a specific node at a specific point in time. Snapshots can be created, listed, downloaded, and deleted via the REST API:
| Operation | API endpoint | Description |
|---|---|---|
| Create | POST /collections/{name}/snapshots | Creates a new snapshot of the collection |
| List | GET /collections/{name}/snapshots | Lists all available snapshots |
| Download | GET /collections/{name}/snapshots/{snapshot_name} | Downloads the snapshot file |
| Delete | DELETE /collections/{name}/snapshots/{snapshot_name} | Removes a snapshot |
| Recover | PUT /collections/{name}/snapshots/recover | Restores a collection from a snapshot file or URL |
Full storage snapshots capture the entire Qdrant instance, including all collections, aliases, and cluster metadata. The snapshot name follows the format full-snapshot-<timestamp>.snapshot. This is useful for complete instance backups or migrating an entire Qdrant deployment to new hardware.
Snapshots can be stored locally on the filesystem or in S3-compatible object storage. The storage backend is configured in the Qdrant configuration file; local snapshots are written to the snapshots_path directory. For disaster recovery, Qdrant Cloud provides automatic incremental backups for AWS and GCP clusters, where each backup contains only the data that changed since the previous backup [4].
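A minimal configuration sketch for S3-backed snapshots might look like the following (field names are based on Qdrant's documented configuration schema; the bucket and region values are placeholders):

```yaml
storage:
  snapshots_path: ./snapshots
  snapshots_config:
    snapshots_storage: s3        # "local" (default) or "s3"
    s3_config:
      bucket: my-qdrant-backups  # placeholder bucket name
      region: us-east-1
```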
Qdrant supports distributed deployment for horizontal scaling and high availability.
In distributed mode, Qdrant uses the Raft consensus protocol to maintain consistency regarding cluster topology and collection structure. All metadata operations (creating collections, changing configurations) are routed through the consensus layer, while data operations (upserting points, searching) go directly to the relevant shards with optimistic replication [3].
Collections are divided into shards, each representing an independent store of points. Sharding enables parallel processing of search requests across multiple nodes. By default, Qdrant creates a number of shards based on the cluster size, but the shard count can be configured at collection creation time.
| Sharding concept | Description |
|---|---|
| Auto-sharding | Qdrant automatically distributes shards across available nodes |
| Manual shard placement | Users can control which nodes hold specific shards |
| Resharding | Available in Qdrant Cloud for rebalancing data across nodes after scaling |
| Shard transfer | Moving shards between nodes for load balancing |
Each shard can have multiple replicas distributed across different nodes. The replication factor determines how many copies of each shard exist. For production systems, a replication factor of at least 2 is recommended to ensure availability during node failures.
If a node fails, the remaining replicas continue serving queries. When the failed node recovers, consensus triggers a replication process to bring the recovering node up to date with any mutations it missed.
Clusters with three or more nodes and replication enabled can perform all operations even while one node is down, gaining performance benefits from load balancing. When adding nodes, existing replicas can be rebalanced, though in self-hosted deployments this requires manual shard management. In Qdrant Cloud, replication factor changes and shard rebalancing are handled automatically [4].
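Shard count and replication factor are set at collection-creation time. A plausible request body for PUT /collections/{name} with explicit sharding and replication (the counts are illustrative):

```python
# Hypothetical: 6 shards spread across the cluster, 2 copies of each,
# so a single node failure does not interrupt query serving.
cluster_collection = {
    "vectors": {"size": 768, "distance": "Cosine"},
    "shard_number": 6,
    "replication_factor": 2,
}
```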
Qdrant Cloud is the company's managed hosting service, available on AWS, Google Cloud, and Microsoft Azure. It offers several deployment tiers:
| Tier | Description | Key features |
|---|---|---|
| Free tier | 1 GB of storage | Prototyping and small projects |
| Standard clusters | Dedicated resources | Configurable node sizes, SSD storage |
| Hybrid Cloud | Runs in customer's infrastructure | Qdrant-managed control plane; data stays in customer's VPC |
| Private Cloud | Fully isolated deployment | Enterprise security and compliance |
For self-hosted deployments, Qdrant is available as a Docker container or can be deployed on Kubernetes using the official Helm chart.
Qdrant competes with several other vector databases and search engines. The landscape is crowded, and each system has different strengths.
| Feature | Qdrant | Pinecone | Weaviate | Milvus | pgvector |
|---|---|---|---|---|---|
| Language | Rust | Proprietary (managed) | Go | Go/C++ | C (PostgreSQL extension) |
| Open source | Yes (Apache 2.0) | No | Yes (BSD-3) | Yes (Apache 2.0) | Yes (PostgreSQL License) |
| Deployment | Self-hosted or cloud | Managed cloud only | Self-hosted or cloud | Self-hosted or Zilliz Cloud | PostgreSQL extension |
| Sparse vectors | Yes | Yes | No (uses external modules) | Yes | Yes (via sparsevec type) |
| Filtering during search | Yes (in-graph filtering) | Yes (metadata filtering) | Yes (inverted index + vector) | Yes (attribute filtering) | Yes (SQL WHERE clauses) |
| GPU-accelerated indexing | Yes | N/A | Yes (CAGRA via NVIDIA) | Yes (NVIDIA CAGRA) | No |
| Multitenancy | Native support | Namespace-based | Tenant-based | Partition key | Schema-based |
| Typical use case | Performance-sensitive filtered search | Managed simplicity at scale | Hybrid search with knowledge graph features | Billion-scale enterprise deployments | Adding vector search to existing PostgreSQL |
Qdrant's performance advantage is most noticeable in filtered search workloads. When queries combine vector similarity with metadata conditions, the in-graph filtering approach avoids the performance cliffs that can occur with pre- or post-filtering strategies in other systems [4][5].
Compared to Pinecone, Qdrant offers the advantage of being open-source and self-hostable, with lower costs for teams willing to manage their own infrastructure. Compared to Weaviate, Qdrant tends to use less memory and compute at the same scale but lacks Weaviate's built-in GraphQL API and knowledge graph features. Compared to Milvus, Qdrant is simpler to operate (single binary versus a distributed system with multiple components) but may not match Milvus's throughput at truly massive scale.
Qdrant provides client libraries in Python, TypeScript/JavaScript, Rust, Go, Java, and C#. It integrates with major LLM orchestration frameworks:
| Framework | Integration type |
|---|---|
| LangChain | Vector store integration for RAG pipelines |
| LlamaIndex | Retriever integration for document question-answering |
| Haystack | Document store component for search pipelines |
| Spring AI | Java-based AI application framework |
| Semantic Kernel | Microsoft's orchestration SDK for AI agents |
| AutoGen | Microsoft's multi-agent framework |
| CrewAI | Multi-agent orchestration platform |
Qdrant also integrates with embedding providers including OpenAI, Cohere, and various open-source models via Hugging Face.
Qdrant's Rust implementation gives it a relatively compact memory footprint compared to garbage-collected alternatives. In community benchmarks, Qdrant consistently ranks among the top performers for filtered search latency. At the 1-10 million vector scale, typical query latencies fall in the 10-100 millisecond range depending on vector dimensionality, filter complexity, and quantization settings [5].
Key performance features include:
| Feature | Description | Impact |
|---|---|---|
| SIMD acceleration | Uses AVX2/AVX-512 on x86-64 and NEON on ARM | Up to 4x speedup for distance calculations |
| io_uring async I/O | Linux kernel-level async disk access | Maximizes disk throughput on SSDs |
| Graph compression | Reduces HNSW graph memory footprint | Lower memory usage without sacrificing search quality |
| Inline Storage | Co-locates quantized vectors with graph edges | Fewer random I/O operations during disk-based search |
| Incremental indexing | Updates HNSW graph without full rebuild | Better performance for upsert-heavy workloads |
| GPU-accelerated indexing | Offloads index construction to NVIDIA GPUs | Order-of-magnitude faster index builds |
The 2025 enhancements, including GPU-accelerated HNSW indexing and HNSW graph compression, have further improved both index build times and search throughput. Inline Storage has reduced disk-based search latency by co-locating quantized vectors with graph structure.
With the March 2026 Series B funding of $50 million led by AVP (with participation from Bosch Ventures, Unusual Ventures, Spark Capital, and 42CAP), Qdrant plans to expand its engineering and product teams, strengthen enterprise offerings, and scale global operations [2]. The company is positioning itself as the core retrieval infrastructure layer for production AI systems, targeting the growing market for agentic AI applications that require fast, reliable vector search.
The open-source project continues active development, with recent work focused on composable vector search primitives that allow developers to build complex retrieval pipelines by combining filtering, scoring, and re-ranking steps. Qdrant's trajectory suggests it is evolving from a pure vector database into a more general-purpose retrieval engine for AI applications.