Milvus can scale vector search performance by distributing its components across multiple machines, allowing it to handle massive datasets and high query loads.
Let’s see Milvus in action with a distributed setup. Imagine we have two proxy nodes, three query nodes, and two data nodes, all coordinated through etcd and Pulsar.
```yaml
# Example Milvus cluster configuration snippet

# etcd configuration
etcd:
  endpoints:
    - localhost:2379

# Pulsar configuration
pulsar:
  serviceUrl: pulsar://localhost:6650

# Proxy configuration (example for one node)
proxy:
  port: 19530
  # ... other proxy-specific configs

# Query node configuration (example for one node)
queryNode:
  port: 19531
  # ... other query-node-specific configs

# Data node configuration (example for one node)
dataNode:
  port: 19532
  # ... other data-node-specific configs

# Index node configuration (example for one node)
indexNode:
  port: 19533
  # ... other index-node-specific configs

# MinIO configuration for object storage
minio:
  address: localhost:9000
  accessKey: minioadmin
  secretKey: minioadmin
  bucketName: milvus
  createBucket: true
```
This configuration outlines how you’d specify the endpoints for your coordination services (etcd, Pulsar) and the ports for your various Milvus components. In a real distributed setup, these localhost addresses would be replaced with the actual IP addresses or hostnames of the machines running these services.
The core problem Milvus solves is efficiently searching through billions of high-dimensional vectors, a task that becomes computationally prohibitive on a single machine. Traditional databases struggle with the "curse of dimensionality" and the sheer volume of data. Milvus tackles this by breaking down the search problem into manageable parts and distributing them.
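The divide-and-conquer idea behind that distribution can be shown in a minimal, pure-Python sketch. The names (`search_shard`, `distributed_search`) are illustrative, not Milvus APIs; each shard stands in for the data held by one query node, and the merge step stands in for result aggregation:

```python
import heapq

def l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_shard(shard, query, k):
    """Brute-force top-k over one shard; returns (distance, vector_id) pairs."""
    return heapq.nsmallest(k, ((l2(vec, query), vid) for vid, vec in shard))

def distributed_search(shards, query, k):
    """Search each shard independently (parallelizable across nodes),
    then merge the per-shard top-k lists into a global top-k."""
    partials = [search_shard(shard, query, k) for shard in shards]
    return heapq.nsmallest(k, (hit for part in partials for hit in part))

# Toy dataset split across two "query nodes".
shard_a = [(0, [0.0, 0.0]), (1, [1.0, 1.0])]
shard_b = [(2, [0.1, 0.1]), (3, [5.0, 5.0])]

result = distributed_search([shard_a, shard_b], query=[0.0, 0.0], k=2)
```

Each shard is searched with no knowledge of the others, which is exactly what makes the work parallelizable; only the cheap final merge needs a global view.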
Here’s how it works internally:
- Data Nodes (`datanode`): These nodes are responsible for ingesting data, performing primary data storage operations, and managing data consistency. They write data to object storage (such as MinIO or S3) and also persist it to local disk for faster access.
- Query Nodes (`querynode`): These are the workhorses for vector search. They load collection data and indexes into memory and execute similarity search queries. In a distributed setup, you can run multiple `querynode` instances, each handling a subset of the collections or partitions, thereby parallelizing query execution.
- Proxy Nodes (`proxy`): These act as the entry point for client applications. They receive query requests, route them to the appropriate `querynode`(s), aggregate results, and return them to the client. They also handle DDL (Data Definition Language) operations such as creating collections and DML (Data Manipulation Language) operations such as inserting and deleting data.
- Index Nodes (`indexnode`): Responsible for building vector indexes. When you create an index on a collection, the `indexnode` handles the computationally intensive process of generating the index structure (e.g., IVF_FLAT, HNSW). These can also be scaled out.
- Coordination Services (`etcd`, Pulsar):
  - `etcd` acts as the distributed configuration store and service-discovery mechanism. All Milvus components register themselves with `etcd`, and other components can discover them through it. It ensures the cluster state is consistently known across all nodes.
  - Pulsar (or Kafka) serves as the message queue for asynchronous operations. Data insertion, deletion, and index-building tasks are published as messages to Pulsar topics, and the relevant components (such as `datanode` and `indexnode`) consume those messages to perform their work. This decouples components and enables fault tolerance and scalability.
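The decoupling described in the last bullet can be sketched with Python's standard `queue.Queue` standing in for a Pulsar topic. The names (`proxy_insert`, `datanode_worker`) are illustrative, not Milvus APIs:

```python
import queue
import threading

# A stand-in for a Pulsar topic: producers publish, consumers subscribe.
insert_topic = queue.Queue()
storage = []  # stand-in for the object storage written by the data node

def proxy_insert(rows):
    """The proxy publishes insert requests as messages; it never writes data itself."""
    for row in rows:
        insert_topic.put(row)

def datanode_worker():
    """The data node consumes messages from the topic and persists them."""
    while True:
        row = insert_topic.get()
        if row is None:  # shutdown sentinel
            break
        storage.append(row)
        insert_topic.task_done()

worker = threading.Thread(target=datanode_worker)
worker.start()

proxy_insert([{"id": 1, "vec": [0.1, 0.2]}, {"id": 2, "vec": [0.3, 0.4]}])
insert_topic.put(None)  # signal shutdown
worker.join()
```

The proxy returns as soon as the messages are enqueued; the data node drains the topic at its own pace. That gap between "accepted" and "persisted" is what lets the components scale and fail independently.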
When a search query arrives at a proxy, the proxy consults `etcd` to find the available `querynode` instances. It then dispatches the query to the `querynode`(s) that hold the relevant data or partitions. If a collection is sharded across multiple `querynode` instances, the proxy sends parts of the query to different nodes and merges the results.
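The routing step can be sketched as a lookup against a registry mapping each shard to the query node currently serving it; the dict stands in for etcd's service registry, and all names and addresses here are illustrative:

```python
# A dict standing in for etcd's service registry: shard -> query node address.
registry = {
    "shard-0": "querynode-1:19531",
    "shard-1": "querynode-2:19531",
    "shard-2": "querynode-1:19531",
}

def route_query(shards_needed):
    """Group the requested shards by the query node that serves them,
    so the proxy sends one request per node rather than one per shard."""
    plan = {}
    for shard in shards_needed:
        node = registry[shard]
        plan.setdefault(node, []).append(shard)
    return plan

plan = route_query(["shard-0", "shard-1", "shard-2"])
# querynode-1 handles shard-0 and shard-2; querynode-2 handles shard-1
```

Because the registry is shared state, a shard migrating to another node only requires updating its registry entry; proxies pick up the new routing on their next lookup.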
The most surprising aspect of Milvus's distributed setup is how aggressively it leverages internal message queues for task distribution and state management, making individual component failures far less catastrophic than one might expect. For instance, if a `datanode` goes offline during data ingestion, Pulsar retains the messages. Once the `datanode` recovers, or a new one is provisioned, it picks up where the failed node left off by consuming the backlog from Pulsar, preserving data consistency without manual intervention. This asynchronous, message-driven architecture is key to Milvus's resilience and scalability.
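That recovery behavior can be simulated in a few lines: a consumer "crashes" partway through, the un-acked backlog stays in the queue, and a replacement consumer drains it. This mirrors the idea, not Milvus internals; `run_datanode` and `fail_after` are hypothetical names:

```python
from collections import deque

# Backlog held by the message queue (a Pulsar topic stand-in).
topic = deque([{"op": "insert", "id": i} for i in range(5)])
persisted = []

def run_datanode(topic, fail_after=None):
    """Consume messages until the backlog is empty, or 'crash' mid-stream.
    Messages not yet processed remain in the topic."""
    handled = 0
    while topic:
        if fail_after is not None and handled == fail_after:
            return False  # simulated crash; remaining messages stay queued
        persisted.append(topic.popleft())  # process, then ack (popleft)
        handled += 1
    return True

run_datanode(topic, fail_after=2)  # original data node dies after 2 messages
run_datanode(topic)                # replacement node drains the backlog
print(len(persisted), len(topic))  # 5 0
```

No message is lost and none is processed twice, because a message only leaves the topic once it has been handled; the queue itself is the source of truth for outstanding work.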
The next concepts you’ll likely encounter are advanced load-balancing strategies for `querynode` instances and optimizing data distribution across partitions for predictable query latency.