Milvus is an open-source vector database designed for billion-scale similarity search, developed by Zilliz. First released in 2019, Milvus joined the Linux Foundation AI & Data Foundation as an incubation project in January 2020 and graduated to top-level project status in June 2021. The project has accumulated over 43,000 GitHub stars and more than 100 million downloads, making it the most popular open-source vector database by these measures [1]. Milvus is available both as self-hosted open-source software and as a fully managed service through Zilliz Cloud.
Milvus was created by the team at Zilliz, a company founded in 2017 by Charles Xie. The initial motivation came from the growing need to search through unstructured data (images, text, audio) using vector embeddings rather than traditional keyword matching.
The project has gone through two major architectural generations: Milvus 1.x, a monolithic system built around existing vector indexing libraries, and Milvus 2.x, a cloud-native rewrite released in 2021 that separates storage, compute, and coordination.
Zilliz has raised approximately $113 million across multiple funding rounds. The most recent major round was a $60 million Series B-II in August 2022 [2]. The company relocated its headquarters from Shanghai to San Francisco as part of its push into the U.S. market.
| Milestone | Date | Details |
|---|---|---|
| Initial open-source release | 2019 | First public version of Milvus |
| LF AI & Data incubation | January 2020 | Joined Linux Foundation AI & Data Foundation |
| LF AI & Data graduation | June 2021 | Achieved top-level project status |
| Milvus 2.0 release | 2021 | Cloud-native architecture rewrite |
| Zilliz $60M Series B-II | August 2022 | Led growth into U.S. market |
| Milvus 2.5 release | December 2024 | Added native full-text search and hybrid search |
| 40,000 GitHub stars | 2025 | Community milestone |
| Milvus 2.6 release | 2025 | Improved multilingual full-text search, performance optimizations |
Milvus 2.x uses a disaggregated architecture where storage, compute, and coordination are separated into independent layers. This design allows each component to scale independently based on workload demands.
The system consists of four main layers:
| Layer | Components | Role |
|---|---|---|
| Access layer | Proxy nodes | Handles client connections, validates requests, routes queries |
| Coordinator layer | Root coord, Query coord, Data coord, Index coord | Manages metadata, scheduling, and resource allocation |
| Worker layer | Query nodes, Data nodes, Index nodes | Executes search queries, ingests data, builds indexes |
| Storage layer | Object storage (S3, MinIO), message queue (Kafka, Pulsar) | Persists data and handles log streaming |
This separation means that read-heavy workloads can scale by adding more query nodes without affecting write performance, and write-heavy workloads can scale data nodes independently. The architecture targets 99.9% uptime for production deployments [3].
All persistent data in Milvus is stored in object storage (such as Amazon S3, Google Cloud Storage, or MinIO for self-hosted deployments). The compute nodes are stateless and pull data from storage as needed. A message queue (Apache Kafka or Apache Pulsar) serves as the log backbone, capturing all data mutations in an ordered stream that compute nodes can replay.
This architecture has several practical benefits: compute nodes can be restarted or replaced without data loss, storage costs scale independently of compute costs, and different regions can share the same underlying data.
All Milvus worker nodes are designed as stateless microservices deployed on Kubernetes. This design enables quick recovery from failures: if a query node crashes, Kubernetes automatically restarts it, and the node re-attaches to the shared object storage without data loss. The stateless design also enables auto-scaling, where the number of query or data nodes adjusts based on current load.
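The recovery mechanism described above can be sketched in miniature. The following is a plain-Python illustration of the design principle, not Milvus internals: an ordered log is the source of truth, and a stateless worker rebuilds its state entirely by replaying it.

```python
# Illustrative sketch (plain Python, not Milvus internals): an ordered log
# acts as the source of truth, and a stateless worker rebuilds its in-memory
# state by replaying the log after a crash or restart.

log = []  # stands in for the message queue (Kafka/Pulsar)

def append_mutation(log, op, key, value=None):
    """Producer side: every mutation is persisted to the log first."""
    log.append((op, key, value))

class StatelessWorker:
    """Holds no durable state; everything is derived from the log."""
    def __init__(self):
        self.state = {}

    def replay(self, log):
        """Rebuild state by consuming the log from the beginning."""
        self.state.clear()
        for op, key, value in log:
            if op == "insert":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)

# Writes go through the log, never directly to a worker.
append_mutation(log, "insert", "doc1", [0.1, 0.2])
append_mutation(log, "insert", "doc2", [0.3, 0.4])
append_mutation(log, "delete", "doc1")

# A freshly started (or restarted) worker recovers by replaying the log.
worker = StatelessWorker()
worker.replay(log)
```

Because the worker derives everything from the log, Kubernetes can replace it at any time; the new instance converges to the same state as the old one.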
The flow of a typical write operation illustrates how the components work together:

1. A client sends an insert request to a proxy node, which validates it and assigns timestamps.
2. The proxy writes the mutation to the message queue, which persists it as an entry in the ordered log.
3. Data nodes subscribe to the log, consume the mutations, and accumulate them into growing segments.
4. When a segment reaches a size or time threshold, the data node seals it and flushes it to object storage, where index nodes can then build indexes on it.
For read operations, the query coordinator routes requests to query nodes that have the relevant data segments loaded in memory. If a search spans multiple segments or partitions, results are gathered from multiple query nodes and merged before returning to the client.
Milvus supports a wide range of indexing algorithms, more than most competing vector databases. The choice of index involves tradeoffs between search speed, recall accuracy, memory usage, and build time.
| Index type | Algorithm | Best suited for | Key tradeoff |
|---|---|---|---|
| FLAT | Brute-force | Small datasets (<1M vectors) | Perfect recall but slow |
| IVF_FLAT | Inverted file with flat storage | Medium datasets | Faster than FLAT, lower recall |
| IVF_SQ8 | IVF with scalar quantization | Memory-constrained environments | 4x memory reduction vs IVF_FLAT |
| IVF_PQ | IVF with product quantization | Very large datasets | Highest compression, lower recall |
| HNSW | Hierarchical Navigable Small World | Low-latency requirements | Best recall-speed tradeoff, high memory |
| DiskANN | Disk-based ANN | Datasets exceeding memory | Searches from SSD, lower cost |
| SCANN | Scalable Nearest Neighbors (Google's ScaNN) | Balanced speed and recall | Good recall with fast search |
| GPU_CAGRA | GPU-accelerated graph index | High-throughput GPU environments | Very fast search, requires NVIDIA GPU |
| GPU_IVF_FLAT | GPU-accelerated IVF | GPU-accelerated batch queries | Faster build and search on GPU |
The choice of index type depends on the workload characteristics:
| Workload characteristic | Recommended index | Rationale |
|---|---|---|
| Small dataset (<1M vectors), perfect recall needed | FLAT | Brute-force guarantees exact results |
| General production, balanced performance | HNSW | Best recall-speed tradeoff for most workloads |
| Very large dataset, memory is limited | DiskANN | Keeps graph on SSD, dramatically lower RAM requirements |
| Batch processing with available GPU | GPU_CAGRA | Order-of-magnitude faster index building and search |
| Cost-sensitive, moderate recall acceptable | IVF_SQ8 | 4x memory reduction with reasonable recall |
| Maximum compression needed | IVF_PQ | Highest compression but lowest recall |
The DiskANN index is particularly significant for cost-sensitive deployments. It allows Milvus to search over datasets that do not fit in memory by keeping the graph structure on SSD, trading some latency for dramatically lower hardware costs.
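The recommendation table above can be encoded as a small helper. This is a hypothetical function for illustration: the `index_type` strings are Milvus's actual index names, but the thresholds and parameter values are illustrative starting points, not tuned recommendations.

```python
# Hypothetical helper encoding the recommendation table above. The index_type
# strings match Milvus's index names; parameter values are illustrative
# starting points, not tuned recommendations.

def recommend_index(n_vectors, memory_limited=False, has_gpu=False,
                    exact_recall=False):
    """Map coarse workload characteristics to an index_params dict."""
    if exact_recall and n_vectors < 1_000_000:
        # Brute-force guarantees exact results on small datasets.
        return {"index_type": "FLAT", "metric_type": "COSINE", "params": {}}
    if has_gpu:
        # GPU_CAGRA: order-of-magnitude faster build and search on NVIDIA GPUs.
        return {"index_type": "GPU_CAGRA", "metric_type": "COSINE",
                "params": {"intermediate_graph_degree": 64, "graph_degree": 32}}
    if memory_limited:
        # DiskANN keeps the graph on SSD; far lower RAM than HNSW.
        return {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}
    # Default: HNSW offers the best recall-speed tradeoff for most workloads.
    return {"index_type": "HNSW", "metric_type": "COSINE",
            "params": {"M": 16, "efConstruction": 200}}

print(recommend_index(500_000, exact_recall=True)["index_type"])          # FLAT
print(recommend_index(2_000_000_000, memory_limited=True)["index_type"])  # DISKANN
```

The resulting dict matches the shape pymilvus expects for index creation, though real deployments should tune parameters against their own recall and latency targets.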
Milvus provides GPU-accelerated index building and search through NVIDIA's cuVS library, which includes the CAGRA (CUDA-Accelerated Graph index for Approximate nearest neighbor) algorithm. GPU acceleration is most beneficial for:

- High-throughput workloads with large query batches
- Index building on large datasets, which is substantially faster on GPU
- Deployments where very low search latency justifies the GPU hardware cost
GPU indexes require NVIDIA GPUs with CUDA support. For deployments without GPUs, CPU-based indexes like HNSW and DiskANN remain strong options [4].
| Parameter | GPU_CAGRA | GPU_IVF_FLAT |
|---|---|---|
| intermediate_graph_degree | 64 (default) | N/A |
| graph_degree | 32 (default) | N/A |
| build_algo | IVF_PQ or NN_DESCENT | N/A |
| nlist | N/A | Number of cluster units |
| cache_dataset_on_device | Controls GPU memory caching | Controls GPU memory caching |
Milvus 2.5, released in December 2024, introduced native full-text search using BM25 scoring. The implementation converts text into sparse vectors representing BM25 scores, and users can input raw text directly with Milvus handling the sparse embedding generation automatically. This eliminates the need for a separate keyword search system [5].
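The text-to-sparse-vector idea can be illustrated with the standard BM25 formula. The following is a conceptual sketch in plain Python, not Milvus's implementation: each document becomes a sparse map of term weights, and a keyword query scores documents by summing the weights of its terms; `k1` and `b` are the usual BM25 parameters.

```python
import math
from collections import Counter

# Conceptual sketch of BM25-as-sparse-vector (not Milvus's implementation):
# each document becomes a sparse {term: weight} map, and a keyword query
# scores a document by summing the weights of the query's terms.

def bm25_sparse(doc_tokens, corpus, k1=1.2, b=0.75):
    """Encode one tokenized document as a sparse {term: bm25_weight} vector."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc_tokens)
    vec = {}
    for term, freq in tf.items():
        df = sum(1 for d in corpus if term in d)            # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)      # rarity weight
        norm = freq * (k1 + 1) / (
            freq + k1 * (1 - b + b * len(doc_tokens) / avgdl))
        vec[term] = idf * norm
    return vec

def score(query_tokens, sparse_vec):
    """Dot product between the query's terms and a document's sparse vector."""
    return sum(sparse_vec.get(t, 0.0) for t in query_tokens)

corpus = [["vector", "database", "search"],
          ["keyword", "search", "engine"],
          ["vector", "similarity"]]
vectors = [bm25_sparse(doc, corpus) for doc in corpus]
scores = [score(["vector", "search"], v) for v in vectors]
```

The document containing both query terms scores highest. In Milvus 2.5 this encoding happens server-side: users insert and query with raw text, and the sparse vectors are generated automatically.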
Hybrid search in Milvus works through its multi-vector architecture. A single collection can have up to 10 vector fields, and searches can run across multiple vector columns simultaneously. Results from different vector fields are combined using configurable fusion strategies:
| Fusion strategy | Description | Best for |
|---|---|---|
| Reciprocal Rank Fusion (RRF) | Combines results based on their rank positions across different search lists | General-purpose hybrid queries where score distributions differ |
| Weighted scoring | Assigns explicit weights to each vector field's scores | When the relative importance of each field is known |
This supports several hybrid search patterns:

- Dense semantic search combined with BM25 sparse keyword search over the same text
- Searches across multiple dense vector fields covering different modalities, such as text and image embeddings
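The two fusion strategies in the table above are simple enough to illustrate directly. These are plain-Python sketches of the underlying math; Milvus exposes both as built-in rankers rather than requiring client-side fusion.

```python
# Illustrative implementations of the two fusion strategies (plain Python;
# Milvus provides these as built-in rankers).

def rrf_fuse(result_lists, k=60):
    """Reciprocal Rank Fusion: score each item by sum of 1/(k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def weighted_fuse(score_maps, weights):
    """Weighted scoring: combine per-field scores with explicit weights."""
    combined = {}
    for scores, w in zip(score_maps, weights):
        for doc_id, s in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + w * s
    return sorted(combined, key=combined.get, reverse=True)

# "doc_b" ranks well in both result lists, so RRF places it first even
# though it tops neither list's raw scores.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([dense_hits, sparse_hits])
```

RRF needs only rank positions, which is why it works well when score distributions from different fields are not directly comparable; weighted scoring requires scores on a shared scale but gives explicit control over field importance.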
Milvus claims that its hybrid search processes queries 30 times faster than traditional solutions that require separate vector and keyword search systems [5].
Zilliz Cloud further enhances full-text search with JSON Shredding and JSON Path indexing, which accelerate metadata filtering by up to 100x. Benchmarks show that Zilliz Cloud's full-text search delivers up to 7x faster performance than Elasticsearch on selected datasets [10].
Beyond hybrid search, Milvus supports storing and searching across multiple vector fields per record. This is useful for multi-modal applications. For example, a product catalog might store separate embeddings for the product title, description, and image. A search can query across all three vector fields and combine the results, weighting each field based on its relevance to the use case.
Milvus is regularly deployed at hundreds of millions to billions of vectors in production. The distributed architecture allows horizontal scaling: adding more query nodes increases read throughput, and adding data nodes increases write throughput. Sharding distributes collections across nodes, and partitioning within collections allows users to scope searches to specific data subsets.
Milvus provides two approaches to partitioning data within collections: manual partitions and partition keys.
Users can manually create named partitions within a collection and assign data to specific partitions during insert. Searches can then be scoped to one or more partitions, reducing the amount of data scanned:
```python
from pymilvus import Collection

collection = Collection("products")

# Create partitions
collection.create_partition("electronics")
collection.create_partition("clothing")

# Insert into a specific partition
collection.insert(data, partition_name="electronics")

# Search within a specific partition
results = collection.search(
    data=query_vectors,
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=10,
    partition_names=["electronics"],
)
```
Partition keys provide an automatic approach to partitioning. Instead of manually assigning data to partitions, users designate a scalar field as the partition key. Milvus automatically routes data to the appropriate partition based on the field's value. This is particularly useful for multi-tenant applications, where a tenant ID field can serve as the partition key [11].
Milvus 2.5.5 improved partition scalability, making it feasible to run a single cluster with 10,000 collections and 100,000 partitions [11].
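The routing idea behind partition keys can be sketched in a few lines. This is an illustrative model of the concept, not Milvus's actual hash function or partition count: the value of the designated scalar field deterministically selects a physical partition, so all rows for one tenant land together.

```python
import hashlib

# Illustrative sketch of partition-key routing (not Milvus's exact hash):
# the designated scalar field's value deterministically selects the physical
# partition, so all rows sharing a key land in the same partition.

NUM_PARTITIONS = 16  # the system creates a fixed set of physical partitions

def route(partition_key_value, num_partitions=NUM_PARTITIONS):
    """Deterministically map a partition-key value to a partition id."""
    digest = hashlib.md5(str(partition_key_value).encode()).hexdigest()
    return int(digest, 16) % num_partitions

# Every row for the same tenant routes to the same partition, so a search
# filtered on tenant_id only needs to scan that partition's segments.
rows = [{"tenant_id": "acme", "vec": [0.1]},
        {"tenant_id": "acme", "vec": [0.2]},
        {"tenant_id": "globex", "vec": [0.3]}]
placements = {r["tenant_id"]: route(r["tenant_id"]) for r in rows}
```

Because routing is a pure function of the key, queries filtered on the partition key can be pruned to a single partition without any lookup table.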
Milvus supports multiple consistency levels for search operations, allowing users to trade consistency for performance:
| Level | Guarantee | Latency | Use case |
|---|---|---|---|
| Strong | Reads the latest data | Highest | Financial transactions, real-time analytics |
| Bounded staleness | Data freshness within a configurable time window | Medium | Most production workloads where slight staleness is acceptable |
| Session | Read-your-writes within a session | Medium-low | Interactive applications where users expect to see their own writes |
| Eventually | No freshness guarantee; reads may lag recent writes | Lowest | High-throughput analytics where staleness is tolerable |
Bounded staleness is the default consistency level, providing a balance between data freshness and query performance. The staleness window is configurable, typically set to a few seconds [12].
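The mechanics can be modeled with a "guarantee timestamp". This is a simplified sketch based on Milvus's documented design (Session, which tracks per-session write timestamps, is omitted for brevity): a query node may serve a request only once it has consumed the log up to the request's guarantee timestamp, and each consistency level chooses that timestamp differently.

```python
import time

# Simplified model of Milvus's documented consistency design (not its code):
# each level maps to a "guarantee timestamp" that a query node must catch up
# to, by replaying the log, before it may serve the request.

def guarantee_ts(level, now, staleness_window=2.0):
    """Pick the timestamp a query node must reach before serving."""
    if level == "Strong":
        return now                     # must see all writes up to "now"
    if level == "Bounded":
        return now - staleness_window  # may lag by a configurable window
    if level == "Eventually":
        return 0.0                     # serve immediately, no waiting
    raise ValueError(f"unknown level: {level}")

def can_serve(node_synced_ts, level, now):
    """True once the node has replayed the log far enough for this level."""
    return node_synced_ts >= guarantee_ts(level, now)

now = time.time()
node_ts = now - 1.0  # this query node is one second behind the log
```

With the node one second behind, a Strong read must wait for it to catch up, while Bounded (with its default-like two-second window) and Eventually reads are served immediately; this is the latency ordering shown in the table above.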
Zilliz Cloud is the fully managed service built on Milvus, offered in three tiers:
| Tier | Description | Best for |
|---|---|---|
| Serverless | Pay-per-use pricing based on capacity units consumed | Development, testing, and small-scale production |
| Dedicated | Reserved compute and storage resources for predictable performance | Production workloads with consistent traffic |
| BYOC (Bring Your Own Cloud) | Runs in the customer's cloud account for data sovereignty and compliance requirements | Regulated industries, data residency requirements |
Zilliz Cloud runs on Cardinal, a proprietary vector search engine that Zilliz describes as delivering up to 10x faster performance than open-source Milvus. Cardinal includes additional optimizations for cloud deployment, including more aggressive query planning and caching [6].
| Feature | Open-source Milvus | Zilliz Cloud |
|---|---|---|
| Performance engine | Milvus (open source) | Cardinal (proprietary, up to 10x faster) |
| Full-text search | BM25 sparse vectors | Enhanced with JSON Shredding (up to 100x faster metadata filtering) |
| Auto-scaling | Manual (adjust K8s resources) | Automatic horizontal scaling |
| Backup and recovery | Manual snapshot management | Automatic backups with point-in-time recovery |
| Monitoring | Self-managed (Prometheus/Grafana) | Built-in monitoring dashboard |
| Multi-tenancy | Partition key isolation | Managed multi-tenancy with resource isolation |
| Cost | Infrastructure cost only | Pay-per-use or reserved capacity |
| Feature | Milvus | Pinecone | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|
| Architecture | Distributed, cloud-native | Managed cloud | Single-binary or distributed | Modular, distributed | PostgreSQL extension |
| Language | Go, C++ | Proprietary | Rust | Go | C |
| Scale | Billions of vectors | Billions of vectors | Billions of vectors | Tens of millions typical | Tens of millions typical |
| GPU acceleration | Yes (NVIDIA CAGRA, IVF) | N/A | Yes (HNSW indexing) | Yes (CAGRA via NVIDIA) | No |
| Native full-text search | Yes (BM25, since v2.5) | No | No (uses sparse vectors) | Yes (BM25) | No (use PostgreSQL tsvector) |
| Index variety | 9+ index types | Proprietary | HNSW + quantization | HNSW + flat | IVFFlat, HNSW |
| Operational complexity | High (multiple components) | None (fully managed) | Low to moderate | Moderate | None (part of PostgreSQL) |
| Consistency options | Strong, bounded, session, eventual | Eventual | Eventual | Eventual | Strong (PostgreSQL ACID) |
Milvus's primary advantage over competitors is its combination of scale and feature breadth. It supports more index types than most competing vector databases and scales to larger datasets than most alternatives. The tradeoff is operational complexity: a full Milvus deployment involves multiple components (coordinators, workers, message queue, object storage), which is substantially more complex than running a single Qdrant binary or enabling pgvector on an existing PostgreSQL instance.
For teams without dedicated infrastructure engineers, Zilliz Cloud or a simpler alternative like Qdrant or pgvector may be more practical. For teams operating at massive scale with existing Kubernetes infrastructure, Milvus provides capabilities that simpler systems cannot match [7].
Milvus provides SDKs for Python, Java, Go, Node.js, and C#. It integrates with the major LLM orchestration frameworks:
| Framework | Integration type |
|---|---|
| LangChain | Vector store and retriever components |
| LlamaIndex | Native vector store integration |
| Haystack | Document store plugin |
| Spring AI | Java-based AI application integration |
| Semantic Kernel | Microsoft's orchestration SDK |
| AutoGen | Microsoft's multi-agent framework |
Milvus also integrates with data pipeline tools like Apache Spark, Apache Flink, and Airbyte for bulk data ingestion from existing data sources.
Attu is an open-source graphical management tool for Milvus. It provides a web-based interface for managing collections, viewing data, running queries, monitoring system health, and managing indexes. Attu is particularly useful for teams that prefer visual tools over CLI or SDK-based management.
Milvus publishes performance benchmarks across multiple real-world datasets, measuring throughput, latency, and recall rate.
Key benchmark results include [8]:
| Metric | Milvus 2.2.3 vs 2.0.0 | Notes |
|---|---|---|
| Search latency | 2.5x reduction | Measured across multiple dataset sizes |
| Index build speed | Significant improvement | Parallelized construction |
| QPS scalability | Near-linear with replicas | Adding replicas scales throughput proportionally |
Zilliz Cloud (running Cardinal) publishes its own benchmark results, with Zilliz claiming up to 10x the performance of open-source Milvus [6].
In scaled-out deployments, Milvus shows little performance degradation in search latency and QPS as the cluster grows, demonstrating near-linear scalability when using multiple replicas [8].
Milvus continues to release frequently, with version 2.6 adding improvements to multilingual full-text search and further performance optimizations. The project's position within the LF AI & Data Foundation provides governance stability and signals enterprise readiness.
Recent releases have focused on security, with Milvus 2.5.27 and 2.6.10 addressing CVE-2026-26190, an authentication bypass vulnerability on the metrics port. Version 2.6.10 also introduced automatic FP32-to-FP16/BF16 conversion for reduced storage and optimized segment loading [13].
The vector database market has become increasingly competitive, with Pinecone, Qdrant, Weaviate, and pgvector all improving rapidly. Milvus differentiates itself primarily through its scale capabilities (tested and deployed at billions of vectors), its breadth of index types (nine or more), and its GPU acceleration support. For organizations building AI applications that need to search across very large vector datasets with low latency, Milvus remains one of the strongest options available.