Valkey Sentinel and Cluster are two distinct high-availability (HA) modes for Valkey, and choosing the right one hinges on understanding their fundamental differences in how they provide resilience and scalability.
Let’s see Sentinel in action. Imagine you have three Valkey instances: valkey-master (port 6379), valkey-replica1 (port 6380), and valkey-replica2 (port 6381). You’re running Valkey Sentinel on a separate machine (or even the same machine on a different port, say 26379).
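As a sketch, that topology can be brought up locally (assuming valkey-server is on your PATH; the hostnames in this article are illustrative, so localhost ports are used here):

```shell
# Primary
valkey-server --port 6379 --daemonize yes
# Replicas, pointed at the primary via the replicaof directive
valkey-server --port 6380 --daemonize yes --replicaof 127.0.0.1 6379
valkey-server --port 6381 --daemonize yes --replicaof 127.0.0.1 6379
```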
Here’s a simplified Sentinel configuration file (sentinel.conf):

```
port 26379
sentinel monitor mymaster valkey-master 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
sentinel parallel-syncs mymaster 1
# Password Sentinel uses to authenticate with the monitored master and its replicas
sentinel auth-pass mymaster YourMasterPassword
```
When valkey-master goes down, each Sentinel watching mymaster independently marks it as down after 5 seconds without a valid reply (down-after-milliseconds). If a quorum of Sentinels (2 here, the final argument of the sentinel monitor line) agree that the master is down, one Sentinel is elected leader. This leader promotes one of the replicas (valkey-replica1 or valkey-replica2) to become the new master, and the other replica is reconfigured to replicate from it. Your application, which was configured to connect to valkey-master:6379, would then query Sentinel to discover the new master’s address.
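The two-stage detection described above can be sketched as a toy model (the names and numbers below are illustrative; real Sentinels exchange these votes over the network):

```python
import time

DOWN_AFTER_MS = 5000  # mirrors down-after-milliseconds
QUORUM = 2            # final argument of "sentinel monitor"

def subjectively_down(last_pong_ms: float, now_ms: float) -> bool:
    """A single Sentinel's local view: no valid reply for too long."""
    return (now_ms - last_pong_ms) > DOWN_AFTER_MS

def objectively_down(sdown_votes: int) -> bool:
    """Failover is only authorized once a quorum of Sentinels agree."""
    return sdown_votes >= QUORUM

now = time.time() * 1000
last_pong = now - 6000  # master has been silent for 6 seconds
votes = 2               # two of three Sentinels report it down
if subjectively_down(last_pong, now) and objectively_down(votes):
    print("mymaster is objectively down; electing a failover leader")
```

The split matters: a single Sentinel with a flaky network link cannot trigger a failover on its own.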
Now, let’s look at Valkey Cluster. Here, you’re not just replicating; you’re sharding data across multiple Valkey nodes. Imagine you want to set up a 3-master, 6-node cluster (each master having one replica).
You’d typically use the valkey-cli --cluster create command (the redis-cli equivalent also works against Valkey). Let’s say you have nodes running on ports 7000, 7001, 7002 (masters) and 7003, 7004, 7005 (replicas).

The command might look like this:

```
valkey-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1
```
This command does several things:
- It assigns hash slots (0-16383) to the master nodes.
- It configures the replica nodes to replicate from their designated master.
- It joins the already-running nodes into a single cluster (the command doesn’t launch any processes; each node must have been started with cluster-enabled yes).
When a client connects to a Valkey Cluster, it talks to any node. If the requested key’s hash slot isn’t managed by that node, the node responds with a MOVED error, redirecting the client to the correct node. If a master node fails, its replicas are automatically promoted by the cluster itself, and the hash slot information is updated across all nodes. Clients can also be configured to automatically follow these redirects.
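The slot a key maps to is deterministic: CRC16 of the key (or of its hash tag, the substring between the first { and the following }) modulo 16384. A minimal sketch of that computation, the same calculation CLUSTER KEYSLOT performs:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem variant), the checksum Valkey Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots, honoring hash tags."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # tag must be non-empty
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

Hash tags are the practical payoff: {user1000}:profile and {user1000}:sessions hash only the "user1000" part, so they land in the same slot and remain usable together in multi-key operations.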
The core problem Valkey Sentinel solves is master failure detection and automatic failover for a single primary-replica setup. It’s designed for scenarios where you need a highly available primary Valkey instance, but don’t necessarily need to scale beyond the capacity of a single master. The mental model is a "guardian" watching over your primary and its backup.
Valkey Cluster, on the other hand, addresses scalability and high availability across multiple nodes. It partitions your dataset (shards it) across multiple master nodes. Each master can still have replicas for its own HA. The mental model here is a "distributed system" where data is spread out, and nodes coordinate to manage that distribution and recover from failures.
The most surprising thing for many is how Valkey Cluster handles client redirection. There’s no single point of failure for lookups: any node can tell you where a key lives. When a master fails, the cluster’s internal gossip protocol spreads the new topology to all nodes, and clients learn the new master for the affected slots through MOVED redirects (ASK redirects serve a related purpose while a slot is being migrated between nodes). This decentralized agreement, even about the cluster topology itself, is key to its resilience.
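A client library’s redirect handling boils down to parsing that error and retrying against the node it names. A minimal sketch (a hypothetical helper, not a full client):

```python
def parse_redirect(err: str):
    """Parse a 'MOVED <slot> <host>:<port>' or 'ASK <slot> <host>:<port>' error.

    Returns (kind, slot, host, port) so the caller can retry against the
    right node and, for MOVED, update its cached slot-to-node map.
    """
    kind, slot, addr = err.lstrip("-").split()
    host, port = addr.rsplit(":", 1)  # rsplit tolerates colons in the host part
    return kind, int(slot), host, int(port)

print(parse_redirect("MOVED 3999 127.0.0.1:7002"))
# → ('MOVED', 3999, '127.0.0.1', 7002)
```

Cluster-aware clients do exactly this under the hood, which is why the redirection is invisible to most application code.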
If you’re aiming for simple master-replica failover with minimal complexity, Sentinel is your path. If your dataset is growing too large for a single machine, or your read/write load demands distribution, Cluster is the way to go.
The next logical step after understanding Sentinel or Cluster is often exploring how to integrate them with application-level caching strategies to maximize performance and minimize latency.