Valkey replication, when paired with Sentinel, offers automatic failover, but the real magic is how it uses a distributed consensus model to decide when and how to promote a replica.

Let’s watch this happen. Imagine we have a master Valkey instance, valkey-master, and two replicas, valkey-replica-1 and valkey-replica-2. We also have three Sentinel instances, sentinel-1, sentinel-2, and sentinel-3, monitoring them.

Here’s a simplified view of our configuration:

Valkey Master (valkey-master):

port 6379
# ... other Valkey config ...

Valkey Replicas (valkey-replica-1, valkey-replica-2):

port 6379
replicaof valkey-master 6379
# ... other Valkey config ...

Sentinel Instances (sentinel-1, sentinel-2, sentinel-3):

port 26379
sentinel resolve-hostnames yes
sentinel monitor valkey-master valkey-master 6379 2
sentinel down-after-milliseconds valkey-master 5000
sentinel failover-timeout valkey-master 10000
sentinel parallel-syncs valkey-master 1
# ... other Sentinel config ...

The sentinel monitor line tells each Sentinel to watch valkey-master with a quorum of 2: at least two Sentinels must agree that the master is unreachable before it is marked objectively down and a failover can begin. (Note the resolve-hostnames directive: Sentinel expects an IP address in the monitor line unless hostname resolution is explicitly enabled.) down-after-milliseconds is how long the master must be unreachable before a single Sentinel considers it down. failover-timeout caps the duration of a failover attempt and the back-off before retrying one. parallel-syncs limits how many replicas can be reconfigured to sync with the new master simultaneously, so the deployment is never left with every replica mid-resync at once.
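To make these timing parameters concrete, here is a minimal Python sketch of the subjective-down check (a simplification for illustration, not Sentinel’s actual implementation):

```python
# Illustrative sketch only: models how down-after-milliseconds (5000 in the
# config above) drives the S_DOWN decision inside a single Sentinel.

DOWN_AFTER_MS = 5000

def is_subjectively_down(last_valid_reply_ms: int, now_ms: int) -> bool:
    """A single Sentinel flags the master S_DOWN once it has gone longer
    than down-after-milliseconds without a valid reply to its PINGs."""
    return (now_ms - last_valid_reply_ms) > DOWN_AFTER_MS

print(is_subjectively_down(last_valid_reply_ms=97_000, now_ms=100_000))  # False: silent for only 3 s
print(is_subjectively_down(last_valid_reply_ms=94_000, now_ms=100_000))  # True: silent for 6 s
```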

Now, let’s simulate a master failure. We’ll stop the valkey-master process.

Within seconds, the Sentinels notice that valkey-master is unresponsive. Suppose sentinel-1 detects it first: after 5000 milliseconds (5 seconds) without a valid reply, sentinel-1 marks the master as S_DOWN (Subjectively Down). It then asks the other Sentinels, via the SENTINEL is-master-down-by-addr command, whether they also see valkey-master as down.

Once enough Sentinels agree that valkey-master is down to reach the quorum of 2 (sentinel-1’s own view plus at least one of sentinel-2 or sentinel-3), the master is marked O_DOWN (Objectively Down). With three Sentinels and a quorum of 2, two agreeing votes are enough.
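The quorum check itself is simple arithmetic. A hedged sketch (the function name and shape are mine, not Sentinel’s code): a Sentinel that already sees the master as S_DOWN counts agreeing peer replies, and the master becomes O_DOWN once the total reaches the configured quorum:

```python
QUORUM = 2  # from: sentinel monitor valkey-master valkey-master 6379 2

def is_objectively_down(i_see_sdown: bool, peer_agreements: list[bool]) -> bool:
    """O_DOWN needs the asking Sentinel's own S_DOWN flag plus enough
    agreeing peers to reach the quorum. Each entry in peer_agreements is
    one peer's answer to SENTINEL is-master-down-by-addr."""
    if not i_see_sdown:
        return False
    return 1 + sum(peer_agreements) >= QUORUM  # own vote counts too

print(is_objectively_down(True, [True, False]))   # True: sentinel-1 plus sentinel-2
print(is_objectively_down(True, [False, False]))  # False: only one Sentinel sees it down
```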

At this point, one Sentinel is elected leader to perform the failover. The election is itself a distributed vote: each Sentinel may vote for one candidate per failover epoch, and a candidate must gather votes from a majority of all Sentinels, not merely the quorum, to win. The winner orchestrates the failover.
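The majority requirement can be sketched in one line (again, an illustrative simplification; in practice Sentinel requires the larger of the majority and the configured quorum):

```python
def can_lead_failover(votes_received: int, total_sentinels: int) -> bool:
    """A candidate leader must win votes from a strict majority of all
    known Sentinels; this is what stops two leaders from acting at once."""
    return votes_received > total_sentinels // 2

print(can_lead_failover(2, 3))  # True: 2 of 3 is a majority
print(can_lead_failover(1, 3))  # False: a lone Sentinel cannot authorize a failover
```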

The leader Sentinel examines the known replicas (valkey-replica-1, valkey-replica-2) and picks the "best" one to promote. Roughly, it:

  1. Excludes replicas that are disconnected, flagged S_DOWN, or detached from the master for too long.
  2. Prefers the lowest replica-priority (a priority of 0 means "never promote this replica").
  3. Among those, prefers the highest replication offset, i.e. the replica that has processed the most of the old master's write stream. This is crucial for minimizing data loss; Sentinel reads the offset from each replica's INFO replication output.
  4. Breaks any remaining tie by choosing the lexicographically smallest runid (a random unique identifier for each Valkey instance). The runid says nothing about data freshness; it simply makes the choice deterministic.
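This selection can be sketched as a single sort key. A simplified Python illustration (the dataclass and sample runids are hypothetical; real Sentinel ranks eligible replicas by replica-priority, then highest replication offset, then the lexicographically smallest runid):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Replica:
    runid: str         # random unique ID reported by the instance
    priority: int      # replica-priority from the replica's config; lower wins
    repl_offset: int   # bytes of the master's stream already processed
    connected: bool    # reachable and not flagged S_DOWN

def pick_promotion_candidate(replicas: list[Replica]) -> Optional[Replica]:
    """Filter out unusable replicas, then prefer lower priority, higher
    replication offset, and smaller runid as the final tie-breaker."""
    eligible = [r for r in replicas if r.connected and r.priority > 0]
    if not eligible:
        return None  # no safe candidate: the failover is aborted
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.runid))

replicas = [
    Replica("b3f0", priority=100, repl_offset=5_000, connected=True),
    Replica("a1c2", priority=100, repl_offset=7_200, connected=True),   # most data
    Replica("c9d4", priority=100, repl_offset=9_999, connected=False),  # unreachable
]
print(pick_promotion_candidate(replicas).runid)  # a1c2: highest offset among the eligible
```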

Let’s say sentinel-2 is elected leader and determines valkey-replica-1 is the best candidate. It sends REPLICAOF NO ONE to valkey-replica-1, which stops replicating and immediately begins acting as a master, accepting writes.

Once valkey-replica-1 has accepted its new role, sentinel-2 will then reconfigure the remaining replicas. It will send replicaof valkey-replica-1 6379 to valkey-replica-2. Now valkey-replica-2 starts replicating from the new master, valkey-replica-1.

Finally, the Sentinels update their internal configuration and inform any clients that might be using Sentinel to discover the master. Clients querying Sentinel will now be directed to valkey-replica-1.

The surprising part is that Sentinel doesn’t just blindly pick a replica. It actively probes and ranks the replicas by their replication state so that the promoted instance has the most up-to-date data possible. And the majority-based leader election prevents two Sentinels from driving conflicting failovers at the same time. Note, though, that a partitioned old master can still briefly accept writes during the failover unless you also set min-replicas-to-write on the master.

If you were to restart the original valkey-master instance, Sentinel would notice it reappearing, see that it still claims to be a master, and reconfigure it as a replica of the new master (valkey-replica-1). The old master can thus rejoin the deployment and catch up with the current state.

The next thing you’ll likely encounter is configuring clients to use Sentinel for service discovery, which involves commands like valkey-cli -p 26379 SENTINEL get-master-addr-by-name valkey-master.
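For a peek at what that command returns on the wire: Sentinel answers get-master-addr-by-name with a RESP array of two bulk strings, host and port. A small parsing sketch (the raw bytes below are an illustrative reply constructed by hand, not captured output):

```python
def parse_addr_reply(raw: bytes) -> tuple[str, int]:
    """Parse a RESP array of two bulk strings, as returned by
    SENTINEL get-master-addr-by-name: [host, port]."""
    lines = raw.split(b"\r\n")
    # e.g. [b"*2", b"$16", b"valkey-replica-1", b"$4", b"6379", b""]
    return lines[2].decode(), int(lines[4])

reply = b"*2\r\n$16\r\nvalkey-replica-1\r\n$4\r\n6379\r\n"
print(parse_addr_reply(reply))  # ('valkey-replica-1', 6379)
```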

Want structured learning?

Take the full Valkey course →