Milvus
Milvus is a cloud-native vector database for developers building similarity search on massive unstructured data.
Reviewed by Mathijs Bronsdijk · Updated Apr 13, 2026

What is Milvus?
Milvus is an open-source, cloud-native vector database built for high-performance similarity search over massive unstructured data in AI applications. It stores and retrieves vector embeddings and supports more than 10 indexing methods, including HNSW, IVF, and GPU-based options. Its architecture includes an access layer for API handling, coordinator services for orchestration, worker nodes for execution, and storage components such as etcd, Pulsar or RocksDB, and object stores like S3. Milvus targets developers building AI applications such as retrieval-augmented generation and recommendation systems, and it is designed for very large workloads: similarity search across tens of billions of vectors, with independent scaling of compute and storage.
Key Features
- HNSW Index: Supports graph-based approximate nearest neighbor search, which helps teams balance fast query speed with high recall on large vector datasets.
- IVF Index: Partitions vectors into clusters for coarse-to-fine search and reduces search complexity from O(n) to sublinear time on billion-scale data.
- FLAT Index: Runs exact brute-force search across all vectors, which is useful for smaller datasets or cases where 100% recall matters more than speed.
- PyMilvus: The official Python SDK supports connection, collection creation, data insertion, and vector search, so Python teams can build and test retrieval workflows through an API.
- RESTful API: The HTTP API exposes database management, query, and insert operations, so applications can work with Milvus without a language-specific SDK.
- Role-Based Access Control (RBAC): Assigns read and write permissions to roles for collections, which helps control data access in multi-tenant deployments when user authentication is enabled.
- Distributed Architecture: Separates compute and storage and scales horizontally with stateless microservices on Kubernetes, which supports high throughput and fault tolerance for billions of vectors.
- Hardware Acceleration: Supports SIMD instructions such as AVX512 and Neon, plus NVIDIA GPUs, which can speed up indexing and search for high-QPS workloads on compatible hardware.
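The FLAT-versus-IVF tradeoff above can be sketched in a few lines of plain Python. This is a conceptual illustration, not Milvus internals: FLAT scans every vector for exact results, while an IVF-style search assigns vectors to cluster centroids and probes only the nearest clusters, trading a little recall for far fewer distance computations.

```python
import math
import random

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(vectors, query, k=3):
    """FLAT-style search: exact brute-force scan of every vector (100% recall)."""
    return sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))[:k]

def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid -- one 'inverted list' per cluster."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(centroids[c], v))
        lists[nearest].append(i)
    return lists

def ivf_search(vectors, centroids, lists, query, k=3, nprobe=1):
    """IVF-style search: scan only the nprobe closest clusters, not all vectors."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(centroids[c], query))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(vectors[i], query))[:k]

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(200)]
centroids = random.sample(vectors, 4)  # toy centroids; real IVF trains them with k-means
lists = build_ivf(vectors, centroids)
query = [random.random() for _ in range(8)]

exact = flat_search(vectors, query)                               # ground truth
approx = ivf_search(vectors, centroids, lists, query, nprobe=2)   # probes 2 of 4 clusters
```

With `nprobe` equal to the number of clusters, the IVF result converges to the exact FLAT result; production systems tune `nprobe` (and the cluster count, `nlist` in Milvus terminology) to balance recall against latency.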
Use Cases
- Platform Engineer at DoorDash: Uses Milvus through Zilliz Cloud for semantic search and recommendations across a large marketplace catalog. The reported outcome is better item discovery relevance and more accurate matching between users and food, grocery, and essentials items.
- Ads Engineering Lead at eBay: Uses Milvus for semantic similarity search in ad recommendations by embedding product listings and user queries as vectors. The reported outcome is more relevant sponsored product matches and stronger ad performance.
- AI Product Manager at Rexera: Uses Zilliz Cloud for hybrid retrieval in document-heavy real estate closing workflows. Rexera reports a 40% improvement in retrieval accuracy, removed its Elasticsearch dependency, and supports AI agents handling 10,000+ daily tasks across millions of document pages.
Strengths and Weaknesses
Strengths:
- Milvus holds a 4.8 rating across 132 G2 reviews, and reviewers frequently cite strong similarity search performance for high-dimensional vector data.
- G2 reviewers frequently note that Milvus handles large-scale datasets well and returns fast, accurate results on complex queries.
- G2 reviews highlight Milvus's native architecture for vector storage and search, with support for dense and sparse embeddings in AI use cases.
- G2 comparison data shows relatively high ratings for ease of use and quality of support, and reviewers mention an intuitive interface with responsive customer service.
- G2 reviewers also report active community engagement, and reliability quotes describe improved performance and stability in image search, video search, and recommender system scenarios.
Weaknesses:
- G2 reviewers consistently cite a steep learning curve, especially for users who are new to vector databases and AI applications.
- G2 reviews frequently mention deployment and operations complexity in distributed setups. Reviewers point to separate configuration and monitoring for multiple components, plus Kubernetes or similar orchestration needs for production use.
- G2 review data notes that the standalone version is intended for testing and is not suited for production deployments.
- G2 reviewers report dependence on cloud or infrastructure stability, and some note that instability can interrupt access.
Pricing
- Free forever tier: $0. Full vector database capabilities, billion-scale search, tiered storage, and quantization for cost reduction. No vendor-enforced usage limits apply to self-hosted use, and no contract minimum is listed.
Milvus is open source under the Apache 2.0 license, with no paid tiers or usage-based pricing for the core project.
Who Is It For?
Ideal for:
- ML engineer at a mid-market or enterprise company building recommendation systems: Milvus fits teams that need high-throughput vector similarity search for real-time personalization across millions of items, in use cases like catalog discovery and restaurant matching.
- Backend developer at a growth-stage SaaS company building RAG apps: Milvus works for semantic retrieval over documents, transcripts, or notes for chatbots and agents. One customer (Rexera) reports a 40% retrieval-accuracy improvement from hybrid search.
- AI platform or security engineering team at an enterprise with 10+ engineers: Milvus suits multi-tenant vector search, fraud detection, and anomaly detection at large scale. It is a better fit when teams already run stacks with Kubernetes, Docker, Kafka or data pipelines, LLMs, and embedding models.
Not ideal for:
- Solo developer prototyping a small app with fewer than 1M vectors: Milvus Lite may be enough, but if you want a simpler path for small projects, evaluate Chroma or Pinecone instead.
- Teams with no vector search need, or non-technical users who want no-code search: Milvus requires SDK or API integration and is overkill for purely relational or tabular workloads, so PostgreSQL pgvector, Elasticsearch, Weaviate, or Pinecone may fit better depending on the use case.
Use Milvus when you are building production-scale vector applications such as recommendations, RAG, fraud detection, or semantic search, and you expect high query volume or billions of vectors. It fits growth and enterprise teams in areas like e-commerce, fintech, SaaS, media, and security. Skip it if your project is small, your data is not embedding-based, or your team does not want to manage a more technical deployment.
Alternatives and Comparisons
- Redis: Milvus does vector search specialization better, with a disaggregated cloud-native architecture that separates ingestion, compaction, indexing, and query serving, plus support for HNSW, IVF, DiskANN, and GPU-accelerated CAGRA. Redis does mixed application support better, with sub-millisecond query latency, semantic caching through LangCache, and vectors, caching, streaming, and operational data in one platform with low-to-medium complexity. Choose Milvus if vector search performance and index tuning are central to the workload and a medium switching effort is acceptable; choose Redis if the app also depends on caching and streaming with less operational overhead.
- Pinecone: Milvus does deployment control better, with self-managed options that include Lite, Standalone, and Distributed on Kubernetes, along with flexible indexing and hybrid search across vectors, BM25 full-text, and scalar filters. Pinecone does managed operations better because it handles indexing, metadata filtering, and production infrastructure for users. Choose Milvus if open-source control and self-hosting costs matter for large-scale vector workloads; choose Pinecone if managed uptime is the main priority.
- Weaviate: Milvus does high-scale ANN search better, with multiple index types and GPU acceleration aimed at billion-scale vector workloads. Weaviate does broader semantic feature coverage better, with modular AI features that include hybrid search and knowledge graph capabilities. Choose Milvus if vector scale and index flexibility are the main requirements; choose Weaviate if the use case combines vector search with graph-oriented data and semantic modules.
Getting Started
Setup:
- Signup: The open-source version requires no signup; a self-hosted install starts from pip or Docker.
- Time to first result: Public quickstart materials point to about 10 to 30 minutes for a first result, with a simple Python script after starting Docker.
Learning curve:
- Milvus is approachable for users who already know Python and basic Docker; they can typically pick it up in an afternoon, and a grasp of vector embeddings concepts helps.
- Beginners can run a basic search on day one; experienced developers are productive with core features almost immediately.
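For readers new to the embeddings concept mentioned above, every Milvus query boils down to a nearest-neighbor lookup under a similarity metric. Here is a minimal cosine-similarity sketch in plain Python, with toy 3-dimensional vectors standing in for real embeddings (which typically have hundreds of dimensions); the document names and the "kitten" query vector are made up for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for three documents.
docs = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # hypothetical embedding for "kitten"

# Nearest neighbor = the document whose embedding is most similar to the query.
best = max(docs, key=lambda name: cosine_similarity(docs[name], query))
```

A vector database does exactly this lookup, but over millions or billions of vectors with an index (HNSW, IVF, etc.) instead of a linear scan.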
Where to get help:
- Official help starts with the quickstart tutorial and sample templates, and the tutorial covers local setup and simple search.
- Discord and Slack are both described as active places to get help, learn tips, and talk with engineers and community members. GitHub Discussions and Issues are also a main support path, and maintainers direct users there for questions and bugs.
- The community appears large, active, and growing. Maintainers, staff, and experienced community members answer questions, and office hours offer bookable 20-minute slots for deployment and vector search help.
Watch out for:
- The first run assumes Python programming, basic Docker, and vector embeddings knowledge, so complete beginners may need extra setup time.
- Few recurring onboarding issues are documented in user reports, so there is limited detail on common stumbling blocks.
Developer Experience
Milvus exposes SDKs for Python, Java, Go, Node.js, and C++, plus a gRPC API and Helm charts for Kubernetes deployments. Public feedback says the docs cover core topics such as collection management and indexing, but examples are scattered and some sections for newer features in Milvus 2.4+ are outdated. Time to first result ranges from 10 to 30 minutes for a simple Python setup, while production deployments often take 1 to 2 days because of configuration issues such as etcd tuning.
What developers like:
- PyMilvus gets positive notes for type hints and flexible ANN query support, and developers describe it as solid for prototyping.
- The Go SDK gets praise for low-latency production use.
- Developers report high insert speeds at scale after tuning, and the open-source core supports local testing without vendor lock-in.
- Milvus also has community integrations such as LangChain support and custom etcd wrappers on GitHub.
Common frustrations:
- Developers report opaque error messages, including cases like "index build failed" without a clear root cause.
- Docs are described as disorganized, with scattered examples and outdated sections for newer releases such as 2.4+.
- Helm chart setup for scaling is a recurring source of complexity, and etcd instability is cited as a cause of cluster failures.
- Some users report breaking changes between minor releases such as 2.3 and 2.4, plus rate limits and connection pooling issues in high-throughput apps.
Security and Privacy
Milvus provides role-based access control for collections when user authentication is enabled, and as open-source software it processes data locally or in storage the user controls; it does not send user data to third parties for training.
Product Momentum
- Release pace: Milvus ships monthly minor releases, with recent work centered on stability and performance fixes. Public release notes on the official site and GitHub list features, improvements, bug fixes, and known issues.
- Recent releases: On April 7, 2026, Milvus released v2.6.14. The release notes say it improved MixCoord recovery speed and query filters, and fixed more than 20 bugs, including crashes and out-of-memory issues.
- Growth: The project is on a growing trajectory, driven by an open-source community. Public materials also point to ecosystem expansion through integrations with OpenAI, AWS Bedrock, Google Vertex AI, and Hugging Face, plus Kubernetes Helm chart support.
- Search interest: No public search-trend data is available for this profile.
- Risks: No notable risks are documented in recent public discussions. Recent releases and milestones suggest low abandonment risk, and the cloud-native design keeps dependency risk minimal.
FAQ
What is Milvus used for?
Milvus is used for billion-scale vector search in workloads such as retrieval-augmented generation, recommendation systems, and multimodal AI. Public materials also describe it as a fit for incrementally updated data and self-managed vector database deployments.
How much does Milvus cost?
Milvus is free and open source under the Apache License 2.0, including production and redistribution use. Zilliz also offers a managed version, Zilliz Cloud, with pay-as-you-go pricing.
Is Milvus free to use?
Yes. The open-source version is free forever at $0, with no paid tiers or usage-based pricing for Milvus itself.
Does Milvus have a free tier or usage limits?
The open-source version has no usage tiers or built-in limits. Hardware and storage needs still scale with your workload, and Zilliz Cloud uses pay-as-you-go pricing with trial credits for testing.
Can Milvus be self-hosted or run on-prem?
Yes. Milvus supports self-hosting through Standalone for single-node use, Distributed for Kubernetes or Helm deployments, and Lite for embedded Python use.
What is the easiest way to get started with Milvus?
Milvus Lite is the quickest path for single-machine setups, and Docker works for quick local clusters. Setup is a pip install or a docker-compose up, and the documented time to first result is about 10 to 30 minutes.
Is there a Milvus API?
Yes. Milvus provides gRPC and RESTful APIs for operations such as insert, search, and management, and SDKs are available for languages including Python and Java.
What integrations does Milvus support?
Milvus integrates with embedding providers such as OpenAI, AWS Bedrock, Google Vertex AI, and Hugging Face. It also supports object storage backends and client libraries for Python, Java, Go, and Node.js.
Where does Milvus store data?
Milvus stores vectors, scalar fields, and schema data as incremental logs in persistent object storage. Supported backends include MinIO, AWS S3, Google Cloud Storage, Azure Blob Storage, Alibaba Cloud OSS, and Tencent Cloud COS.
Does Milvus support inserting and searching data at the same time?
Yes. Insert and query operations run through separate modules, and inserted data becomes searchable after it is loaded to query nodes.
Does Milvus support Apple Silicon and ARM?
Yes. Milvus supports both x86 and ARM architectures, including ARM64, and the FAQ specifically notes support for Apple M1 and M2 CPUs.
Is there a limit to collections and partitions in Milvus?
Yes. A Milvus instance supports up to 65,535 collections, and shards and partitions count toward that limit.
What are Milvus data privacy and training data policies?
As open-source software, Milvus processes data locally or in storage controlled by the user, and it does not send user data to third parties for training.
How does Milvus compare with Pinecone or Weaviate?
Milvus is open source and self-hostable, with no vendor software cost and support for distributed scaling to billions of vectors. It also offers advanced indexing and multimodal support, while some alternatives emphasize ease of use over control.
What common issues come up with Milvus?
Commonly reported issues include out-of-memory errors under high memory use, delays when loading segments, and authentication bypass risks on some ports. Release notes and performance documentation are the main sources for fixes and troubleshooting details.