Milvus is an open-source vector database designed for billion-scale similarity search, developed by Zilliz. First released in 2019, Milvus joined the Linux Foundation AI & Data Foundation as an incubation project in January 2020 and graduated to top-level project status in June 2021. The project has accumulated over 43,000 GitHub stars and more than 100 million downloads, making it the most popular open-source vector database by these measures [1]. Milvus is available both as self-hosted open-source software and as a fully managed service through Zilliz Cloud.
Milvus was created by the team at Zilliz, a company founded in 2017 by Charles Xie. The initial motivation came from the growing need to search through unstructured data (images, text, audio) using vector embeddings rather than traditional keyword matching.
The project has gone through two major architectural generations: Milvus 1.x, a monolithic system built around existing vector indexing libraries, and Milvus 2.x, a cloud-native rewrite released in 2021 that separates storage, compute, and coordination.
Zilliz has raised approximately $113 million across multiple funding rounds. The most recent major round was a $60 million Series B-II in August 2022 [2]. The company relocated its headquarters from Shanghai to San Francisco as part of its push into the U.S. market.
| Milestone | Date | Details |
|---|---|---|
| Initial open-source release | 2019 | First public version of Milvus |
| LF AI & Data incubation | January 2020 | Joined Linux Foundation AI & Data Foundation |
| LF AI & Data graduation | June 2021 | Achieved top-level project status |
| Milvus 2.0 release | 2021 | Cloud-native architecture rewrite |
| Zilliz $60M Series B-II | August 2022 | Led growth into U.S. market |
| Milvus 2.5 release | December 2024 | Added native full-text search and hybrid search |
| 40,000 GitHub stars | 2025 | Community milestone |
| Milvus 2.6 release | 2025 | Improved multilingual full-text search, performance optimizations |
Milvus 2.x uses a disaggregated architecture where storage, compute, and coordination are separated into independent layers. This design allows each component to scale independently based on workload demands.
The system consists of four main layers:
| Layer | Components | Role |
|---|---|---|
| Access layer | Proxy nodes | Handles client connections, validates requests, routes queries |
| Coordinator layer | Root coord, Query coord, Data coord, Index coord | Manages metadata, scheduling, and resource allocation |
| Worker layer | Query nodes, Data nodes, Index nodes | Executes search queries, ingests data, builds indexes |
| Storage layer | Object storage (S3, MinIO), message queue (Kafka, Pulsar) | Persists data and handles log streaming |
This separation means that read-heavy workloads can scale by adding more query nodes without affecting write performance, and write-heavy workloads can scale data nodes independently. The architecture targets 99.9% uptime for production deployments [3].
All persistent data in Milvus is stored in object storage (such as Amazon S3, Google Cloud Storage, or MinIO for self-hosted deployments). The compute nodes are stateless and pull data from storage as needed. A message queue (Apache Kafka or Apache Pulsar) serves as the log backbone, capturing all data mutations in an ordered stream that compute nodes can replay.
This architecture has several practical benefits: compute nodes can be restarted or replaced without data loss, storage costs scale independently of compute costs, and different regions can share the same underlying data.
All Milvus worker nodes are designed as stateless microservices deployed on Kubernetes. This design enables quick recovery from failures: if a query node crashes, Kubernetes automatically restarts it, and the node re-attaches to the shared object storage without data loss. The stateless design also enables auto-scaling, where the number of query or data nodes adjusts based on current load.
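The recovery mechanism described above can be sketched in miniature. The following is a plain-Python illustration of the design principle, not Milvus internals: an ordered log is the source of truth, and a stateless worker rebuilds its state entirely by replaying it.

```python
# Illustrative sketch (plain Python, not Milvus internals): an ordered log
# acts as the source of truth, and a stateless worker rebuilds its in-memory
# state by replaying the log after a crash or restart.

log = []  # stands in for the message queue (Kafka/Pulsar)

def append_mutation(log, op, key, value=None):
    """Producer side: every mutation is persisted to the log first."""
    log.append((op, key, value))

class StatelessWorker:
    """Holds no durable state; everything is derived from the log."""
    def __init__(self):
        self.state = {}

    def replay(self, log):
        """Rebuild state by consuming the log from the beginning."""
        self.state.clear()
        for op, key, value in log:
            if op == "insert":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)

# Writes go through the log, never directly to a worker.
append_mutation(log, "insert", "doc1", [0.1, 0.2])
append_mutation(log, "insert", "doc2", [0.3, 0.4])
append_mutation(log, "delete", "doc1")

# A freshly started (or restarted) worker recovers by replaying the log.
worker = StatelessWorker()
worker.replay(log)
```

Because the worker derives everything from the log, Kubernetes can replace it at any time; the new instance converges to the same state as the old one.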
The flow of a typical write operation illustrates how the components work together:

1. A client sends an insert request to a proxy node, which validates it and assigns timestamps.
2. The proxy writes the mutation to the message queue, which persists it as an entry in the ordered log.
3. Data nodes subscribe to the log, consume the mutations, and accumulate them into growing segments.
4. When a segment reaches a size or time threshold, the data node seals it and flushes it to object storage, where index nodes can then build indexes on it.
For read operations, the query coordinator routes requests to query nodes that have the relevant data segments loaded in memory. If a search spans multiple segments or partitions, results are gathered from multiple query nodes and merged before returning to the client.
Milvus supports a wide range of indexing algorithms, more than most competing vector databases. The choice of index involves tradeoffs between search speed, recall accuracy, memory usage, and build time.
| Index type | Algorithm | Best suited for | Key tradeoff |
|---|---|---|---|
| FLAT | Brute-force | Small datasets (<1M vectors) | Perfect recall but slow |
| IVF_FLAT | Inverted file with flat storage | Medium datasets | Faster than FLAT, lower recall |
| IVF_SQ8 | IVF with scalar quantization | Memory-constrained environments | 4x memory reduction vs IVF_FLAT |
| IVF_PQ | IVF with product quantization | Very large datasets | Highest compression, lower recall |
| HNSW | Hierarchical Navigable Small World | Low-latency requirements | Best recall-speed tradeoff, high memory |
| DiskANN | Disk-based ANN | Datasets exceeding memory | Searches from SSD, lower cost |
| SCANN | Scalable Nearest Neighbors (Google's ScaNN) | Balanced speed and recall | Good recall with fast search |
| GPU_CAGRA | GPU-accelerated graph index | High-throughput GPU environments | Very fast search, requires NVIDIA GPU |
| GPU_IVF_FLAT | GPU-accelerated IVF | GPU-accelerated batch queries | Faster build and search on GPU |
The choice of index type depends on the workload characteristics:
| Workload characteristic | Recommended index | Rationale |
|---|---|---|
| Small dataset (<1M vectors), perfect recall needed | FLAT | Brute-force guarantees exact results |
| General production, balanced performance | HNSW | Best recall-speed tradeoff for most workloads |
| Very large dataset, memory is limited | DiskANN | Keeps graph on SSD, dramatically lower RAM requirements |
| Batch processing with available GPU | GPU_CAGRA | Order-of-magnitude faster index building and search |
| Cost-sensitive, moderate recall acceptable | IVF_SQ8 | 4x memory reduction with reasonable recall |
| Maximum compression needed | IVF_PQ | Highest compression but lowest recall |
The DiskANN index is particularly significant for cost-sensitive deployments. It allows Milvus to search over datasets that do not fit in memory by keeping the graph structure on SSD, trading some latency for dramatically lower hardware costs.
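The recommendation table above can be encoded as a small helper. This is a hypothetical function for illustration: the `index_type` strings are Milvus's actual index names, but the thresholds and parameter values are illustrative starting points, not tuned recommendations.

```python
# Hypothetical helper encoding the recommendation table above. The index_type
# strings match Milvus's index names; parameter values are illustrative
# starting points, not tuned recommendations.

def recommend_index(n_vectors, memory_limited=False, has_gpu=False,
                    exact_recall=False):
    """Map coarse workload characteristics to an index_params dict."""
    if exact_recall and n_vectors < 1_000_000:
        # Brute-force guarantees exact results on small datasets.
        return {"index_type": "FLAT", "metric_type": "COSINE", "params": {}}
    if has_gpu:
        # GPU_CAGRA: order-of-magnitude faster build and search on NVIDIA GPUs.
        return {"index_type": "GPU_CAGRA", "metric_type": "COSINE",
                "params": {"intermediate_graph_degree": 64, "graph_degree": 32}}
    if memory_limited:
        # DiskANN keeps the graph on SSD; far lower RAM than HNSW.
        return {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}
    # Default: HNSW offers the best recall-speed tradeoff for most workloads.
    return {"index_type": "HNSW", "metric_type": "COSINE",
            "params": {"M": 16, "efConstruction": 200}}

print(recommend_index(500_000, exact_recall=True)["index_type"])          # FLAT
print(recommend_index(2_000_000_000, memory_limited=True)["index_type"])  # DISKANN
```

The resulting dict matches the shape pymilvus expects for index creation, though real deployments should tune parameters against their own recall and latency targets.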
Milvus provides GPU-accelerated index building and search through NVIDIA's cuVS library, which includes the CAGRA (CUDA-Accelerated Graph index for Approximate nearest neighbor) algorithm. GPU acceleration is most beneficial for:

- High-throughput workloads with large query batches
- Index building on large datasets, which is substantially faster on GPU
- Deployments where very low search latency justifies the GPU hardware cost
GPU indexes require NVIDIA GPUs with CUDA support. For deployments without GPUs, CPU-based indexes like HNSW and DiskANN remain strong options [4].
| Parameter | GPU_CAGRA | GPU_IVF_FLAT |
|---|---|---|
| intermediate_graph_degree | 64 (default) | N/A |
| graph_degree | 32 (default) | N/A |
| build_algo | IVF_PQ or NN_DESCENT | N/A |
| nlist | N/A | Number of cluster units |
| cache_dataset_on_device | Controls GPU memory caching | Controls GPU memory caching |
Milvus 2.5, released in December 2024, introduced native full-text search using BM25 scoring. The implementation converts text into sparse vectors representing BM25 scores, and users can input raw text directly with Milvus handling the sparse embedding generation automatically. This eliminates the need for a separate keyword search system [5].
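The text-to-sparse-vector idea can be illustrated with the standard BM25 formula. The following is a conceptual sketch in plain Python, not Milvus's implementation: each document becomes a sparse map of term weights, and a keyword query scores documents by summing the weights of its terms; `k1` and `b` are the usual BM25 parameters.

```python
import math
from collections import Counter

# Conceptual sketch of BM25-as-sparse-vector (not Milvus's implementation):
# each document becomes a sparse {term: weight} map, and a keyword query
# scores a document by summing the weights of the query's terms.

def bm25_sparse(doc_tokens, corpus, k1=1.2, b=0.75):
    """Encode one tokenized document as a sparse {term: bm25_weight} vector."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc_tokens)
    vec = {}
    for term, freq in tf.items():
        df = sum(1 for d in corpus if term in d)            # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)      # rarity weight
        norm = freq * (k1 + 1) / (
            freq + k1 * (1 - b + b * len(doc_tokens) / avgdl))
        vec[term] = idf * norm
    return vec

def score(query_tokens, sparse_vec):
    """Dot product between the query's terms and a document's sparse vector."""
    return sum(sparse_vec.get(t, 0.0) for t in query_tokens)

corpus = [["vector", "database", "search"],
          ["keyword", "search", "engine"],
          ["vector", "similarity"]]
vectors = [bm25_sparse(doc, corpus) for doc in corpus]
scores = [score(["vector", "search"], v) for v in vectors]
```

The document containing both query terms scores highest. In Milvus 2.5 this encoding happens server-side: users insert and query with raw text, and the sparse vectors are generated automatically.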
Hybrid search in Milvus works through its multi-vector architecture. A single collection can have up to 10 vector fields, and searches can run across multiple vector columns simultaneously. Results from different vector fields are combined using configurable fusion strategies:
| Fusion strategy | Description | Best for |
|---|---|---|
| Reciprocal Rank Fusion (RRF) | Combines results based on their rank positions across different search lists | General-purpose hybrid queries where score distributions differ |
| Weighted scoring | Assigns explicit weights to each vector field's scores | When the relative importance of each field is known |
This supports several hybrid search patterns:

- Dense semantic search combined with BM25 sparse keyword search over the same text
- Searches across multiple dense vector fields covering different modalities, such as text and image embeddings
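The two fusion strategies in the table above are simple enough to illustrate directly. These are plain-Python sketches of the underlying math; Milvus exposes both as built-in rankers rather than requiring client-side fusion.

```python
# Illustrative implementations of the two fusion strategies (plain Python;
# Milvus provides these as built-in rankers).

def rrf_fuse(result_lists, k=60):
    """Reciprocal Rank Fusion: score each item by sum of 1/(k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def weighted_fuse(score_maps, weights):
    """Weighted scoring: combine per-field scores with explicit weights."""
    combined = {}
    for scores, w in zip(score_maps, weights):
        for doc_id, s in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + w * s
    return sorted(combined, key=combined.get, reverse=True)

# "doc_b" ranks well in both result lists, so RRF places it first even
# though it tops neither list's raw scores.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([dense_hits, sparse_hits])
```

RRF needs only rank positions, which is why it works well when score distributions from different fields are not directly comparable; weighted scoring requires scores on a shared scale but gives explicit control over field importance.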
Milvus claims that its hybrid search processes queries 30 times faster than traditional solutions that require separate vector and keyword search systems [5].
Zilliz Cloud further enhances full-text search with JSON Shredding and JSON Path indexing, which accelerate metadata filtering by up to 100x. Benchmarks show that Zilliz Cloud's full-text search delivers up to 7x faster performance than Elasticsearch on selected datasets [10].
Beyond hybrid search, Milvus supports storing and searching across multiple vector fields per record. This is useful for multi-modal applications. For example, a product catalog might store separate embeddings for the product title, description, and image. A search can query across all three vector fields and combine the results, weighting each field based on its relevance to the use case.
Milvus is regularly deployed at hundreds of millions to billions of vectors in production. The distributed architecture allows horizontal scaling: adding more query nodes increases read throughput, and adding data nodes increases write throughput. Sharding distributes collections across nodes, and partitioning within collections allows users to scope searches to specific data subsets.
Milvus provides two approaches to partitioning data within collections: manual partitions and partition keys.
Users can manually create named partitions within a collection and assign data to specific partitions during insert. Searches can then be scoped to one or more partitions, reducing the amount of data scanned:
```python
from pymilvus import Collection

collection = Collection("products")

# Create partitions
collection.create_partition("electronics")
collection.create_partition("clothing")

# Insert into a specific partition
collection.insert(data, partition_name="electronics")

# Search within a specific partition
results = collection.search(
    data=query_vectors,
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=10,
    partition_names=["electronics"],
)
```
Partition keys provide an automatic approach to partitioning. Instead of manually assigning data to partitions, users designate a scalar field as the partition key. Milvus automatically routes data to the appropriate partition based on the field's value. This is particularly useful for multi-tenant applications, where a tenant ID field can serve as the partition key [11].
Milvus 2.5.5 improved partition scalability, making it feasible to run a single cluster with 10,000 collections and 100,000 partitions [11].
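The routing idea behind partition keys can be sketched in a few lines. This is an illustrative model of the concept, not Milvus's actual hash function or partition count: the value of the designated scalar field deterministically selects a physical partition, so all rows for one tenant land together.

```python
import hashlib

# Illustrative sketch of partition-key routing (not Milvus's exact hash):
# the designated scalar field's value deterministically selects the physical
# partition, so all rows sharing a key land in the same partition.

NUM_PARTITIONS = 16  # the system creates a fixed set of physical partitions

def route(partition_key_value, num_partitions=NUM_PARTITIONS):
    """Deterministically map a partition-key value to a partition id."""
    digest = hashlib.md5(str(partition_key_value).encode()).hexdigest()
    return int(digest, 16) % num_partitions

# Every row for the same tenant routes to the same partition, so a search
# filtered on tenant_id only needs to scan that partition's segments.
rows = [{"tenant_id": "acme", "vec": [0.1]},
        {"tenant_id": "acme", "vec": [0.2]},
        {"tenant_id": "globex", "vec": [0.3]}]
placements = {r["tenant_id"]: route(r["tenant_id"]) for r in rows}
```

Because routing is a pure function of the key, queries filtered on the partition key can be pruned to a single partition without any lookup table.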
Milvus supports multiple consistency levels for search operations, allowing users to trade consistency for performance:
| Level | Guarantee | Latency | Use case |
|---|---|---|---|
| Strong | Reads the latest data | Highest | Financial transactions, real-time analytics |
| Bounded staleness | Data freshness within a configurable time window | Medium | Most production workloads where slight staleness is acceptable |
| Session | Read-your-writes within a session | Medium-low | Interactive applications where users expect to see their own writes |
| Eventually | No freshness guarantee; reads may lag recent writes | Lowest | High-throughput analytics where staleness is tolerable |
Bounded staleness is the default consistency level, providing a balance between data freshness and query performance. The staleness window is configurable, typically set to a few seconds [12].
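The mechanics can be modeled with a "guarantee timestamp". This is a simplified sketch based on Milvus's documented design (Session, which tracks per-session write timestamps, is omitted for brevity): a query node may serve a request only once it has consumed the log up to the request's guarantee timestamp, and each consistency level chooses that timestamp differently.

```python
import time

# Simplified model of Milvus's documented consistency design (not its code):
# each level maps to a "guarantee timestamp" that a query node must catch up
# to, by replaying the log, before it may serve the request.

def guarantee_ts(level, now, staleness_window=2.0):
    """Pick the timestamp a query node must reach before serving."""
    if level == "Strong":
        return now                     # must see all writes up to "now"
    if level == "Bounded":
        return now - staleness_window  # may lag by a configurable window
    if level == "Eventually":
        return 0.0                     # serve immediately, no waiting
    raise ValueError(f"unknown level: {level}")

def can_serve(node_synced_ts, level, now):
    """True once the node has replayed the log far enough for this level."""
    return node_synced_ts >= guarantee_ts(level, now)

now = time.time()
node_ts = now - 1.0  # this query node is one second behind the log
```

With the node one second behind, a Strong read must wait for it to catch up, while Bounded (with its default-like two-second window) and Eventually reads are served immediately; this is the latency ordering shown in the table above.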
Zilliz Cloud is the fully managed service built on Milvus, offered in three tiers:
| Tier | Description | Best for |
|---|---|---|
| Serverless | Pay-per-use pricing based on capacity units consumed | Development, testing, and small-scale production |
| Dedicated | Reserved compute and storage resources for predictable performance | Production workloads with consistent traffic |
| BYOC (Bring Your Own Cloud) | Runs in the customer's cloud account for data sovereignty and compliance requirements | Regulated industries, data residency requirements |
Zilliz Cloud runs on Cardinal, a proprietary vector search engine that Zilliz describes as delivering up to 10x faster performance than open-source Milvus. Cardinal includes additional optimizations for cloud deployment, including more aggressive query planning and caching [6].
| Feature | Open-source Milvus | Zilliz Cloud |
|---|---|---|
| Performance engine | Milvus (open source) | Cardinal (proprietary, up to 10x faster) |
| Full-text search | BM25 sparse vectors | Enhanced with JSON Shredding (up to 100x faster metadata filtering) |
| Auto-scaling | Manual (adjust K8s resources) | Automatic horizontal scaling |
| Backup and recovery | Manual snapshot management | Automatic backups with point-in-time recovery |
| Monitoring | Self-managed (Prometheus/Grafana) | Built-in monitoring dashboard |
| Multi-tenancy | Partition key isolation | Managed multi-tenancy with resource isolation |
| Cost | Infrastructure cost only | Pay-per-use or reserved capacity |
| Feature | Milvus | Pinecone | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|---|
| Architecture | Distributed, cloud-native | Managed cloud | Single-binary or distributed | Modular, distributed | PostgreSQL extension |
| Language | Go, C++ | Proprietary | Rust | Go | C |
| Scale | Billions of vectors | Billions of vectors | Billions of vectors | Tens of millions typical | Tens of millions typical |
| GPU acceleration | Yes (NVIDIA CAGRA, IVF) | N/A | Yes (HNSW indexing) | Yes (CAGRA via NVIDIA) | No |
| Native full-text search | Yes (BM25, since v2.5) | No | No (uses sparse vectors) | Yes (BM25) | No (use PostgreSQL tsvector) |
| Index variety | 9+ index types | Proprietary | HNSW + quantization | HNSW + flat | IVFFlat, HNSW |
| Operational complexity | High (multiple components) | None (fully managed) | Low to moderate | Moderate | None (part of PostgreSQL) |
| Consistency options | Strong, bounded, session, eventual | Eventual | Eventual | Eventual | Strong (PostgreSQL ACID) |
Milvus's primary advantage over competitors is its combination of scale and feature breadth. It supports more index types than most competing vector databases and scales to larger datasets than most alternatives. The tradeoff is operational complexity: a full Milvus deployment involves multiple components (coordinators, workers, message queue, object storage), which is substantially more complex than running a single Qdrant binary or enabling pgvector on an existing PostgreSQL instance.
For teams without dedicated infrastructure engineers, Zilliz Cloud or a simpler alternative like Qdrant or pgvector may be more practical. For teams operating at massive scale with existing Kubernetes infrastructure, Milvus provides capabilities that simpler systems cannot match [7].
Milvus provides SDKs for Python, Java, Go, Node.js, and C#. It integrates with the major LLM orchestration frameworks:
| Framework | Integration type |
|---|---|
| LangChain | Vector store and retriever components |
| LlamaIndex | Native vector store integration |
| Haystack | Document store plugin |
| Spring AI | Java-based AI application integration |
| Semantic Kernel | Microsoft's orchestration SDK |
| AutoGen | Microsoft's multi-agent framework |
Milvus also integrates with data pipeline tools like Apache Spark, Apache Flink, and Airbyte for bulk data ingestion from existing data sources.
Attu is an open-source graphical management tool for Milvus. It provides a web-based interface for managing collections, viewing data, running queries, monitoring system health, and managing indexes. Attu is particularly useful for teams that prefer visual tools over CLI or SDK-based management.
Milvus publishes performance benchmarks across multiple real-world datasets, measuring throughput, latency, and recall rate.
Key benchmark results include [8]:
| Metric | Milvus 2.2.3 vs 2.0.0 | Notes |
|---|---|---|
| Search latency | 2.5x reduction | Measured across multiple dataset sizes |
| Index build speed | Significant improvement | Parallelized construction |
| QPS scalability | Near-linear with replicas | Adding replicas scales throughput proportionally |
Zilliz Cloud (running Cardinal) publishes its own benchmark results, with Zilliz claiming up to 10x the performance of open-source Milvus [6].
In scaled-out deployments, Milvus shows little performance degradation in search latency and QPS as the cluster grows, demonstrating near-linear scalability when using multiple replicas [8].
Milvus continues to release frequently, with version 2.6 adding improvements to multilingual full-text search and further performance optimizations. The project's position within the LF AI & Data Foundation provides governance stability and signals enterprise readiness.
Recent releases have focused on security, with Milvus 2.5.27 and 2.6.10 addressing CVE-2026-26190, an authentication bypass vulnerability on the metrics port. Version 2.6.10 also introduced automatic FP32-to-FP16/BF16 conversion for reduced storage and optimized segment loading [13].
The vector database market has become increasingly competitive, with Pinecone, Qdrant, Weaviate, and pgvector all improving rapidly. Milvus differentiates itself primarily through its scale capabilities (tested and deployed at billions of vectors), its breadth of index types (nine or more), and its GPU acceleration support. For organizations building AI applications that need to search across very large vector datasets with low latency, Milvus remains one of the strongest options available.