Vitess’s primary election and reparenting process is how it automatically recovers from a primary MySQL instance failure, ensuring your application stays online.

Here’s Vitess sharding a dataset and a primary election in action:

Imagine we have a users table sharded by user_id. Vitess splits this table into multiple shards, and each shard has its own primary MySQL instance.

CREATE TABLE users (
    user_id BIGINT,
    name VARCHAR(255),
    email VARCHAR(255),
    PRIMARY KEY (user_id)
) ENGINE=InnoDB;

-- Vitess sharding configuration (example)
-- Shard 0: user_id in [0, 1000000)
-- Shard 1: user_id in [1000000, 2000000)
-- (In practice, Vitess shard boundaries are keyspace_id ranges, written in hex like -80 and 80-.)
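In a live Vitess cluster these ranges are not written as comments; the sharding key is declared in the keyspace’s VSchema. A minimal sketch of such a VSchema (the vindex name user_idx is illustrative; numeric is Vitess’s built-in identity vindex, which maps user_id directly to the keyspace_id so contiguous user_id ranges map to contiguous shards):

```json
{
  "sharded": true,
  "vindexes": {
    "user_idx": { "type": "numeric" }
  },
  "tables": {
    "users": {
      "column_vindexes": [
        { "column": "user_id", "name": "user_idx" }
      ]
    }
  }
}
```

With a hash vindex instead of numeric, rows would be distributed by hashed user_id, and the contiguous ranges above would no longer correspond to contiguous user_id values.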

When a user with user_id = 500000 needs to be written to, Vitess directs that write to the primary MySQL instance responsible for Shard 0. If a user with user_id = 1500000 needs to be written to, it goes to the primary for Shard 1.
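The routing decision itself is a range lookup. Here is a toy sketch of the idea (the ranges mirror the example above, treated as half-open so each boundary value belongs to exactly one shard; real Vitess looks up a vindex-computed keyspace_id, not the raw user_id):

```python
# Toy model of range-based shard routing for the users example.
# Each shard owns a half-open range [lo, hi) of user_id values.
SHARDS = [
    ("shard_0", 0, 1_000_000),
    ("shard_1", 1_000_000, 2_000_000),
]

def route(user_id: int) -> str:
    """Return the name of the shard whose range contains user_id."""
    for name, lo, hi in SHARDS:
        if lo <= user_id < hi:
            return name
    raise ValueError(f"no shard covers user_id {user_id}")

print(route(500_000))    # shard_0
print(route(1_500_000))  # shard_1
```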

Now, let’s say the primary for Shard 0 suddenly crashes. Vitess detects this through the health reporting of its vttablet agents; in current Vitess, the component that watches for a dead primary and reacts is VTOrc (the integrated orchestrator). Once the primary is declared unreachable, an emergency reparent begins.

VTOrc first takes a lock on the shard in the topology service (a consistent store such as etcd, ZooKeeper, or Consul) so that only one recovery can run at a time. It then selects the most caught-up healthy replica in Shard 0, promotes it to primary, points the remaining replicas at it, and updates the shard record so that VTGate routes all Shard 0 traffic to the newly promoted primary. The whole process typically completes in seconds, minimizing downtime.
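The recovery steps above can be sketched as a small simulation (a simplified model, not Vitess code; real Vitess compares GTID sets rather than integer positions and does this work while holding a topology lock):

```python
# Simplified model of an emergency reparent: exclude unhealthy replicas,
# promote the most caught-up survivor, and repoint the rest at it.
def emergency_reparent(replicas):
    """replicas maps tablet alias -> (healthy, replication_position)."""
    candidates = {alias: pos for alias, (healthy, pos) in replicas.items() if healthy}
    if not candidates:
        raise RuntimeError("no healthy replica to promote")
    # The most advanced replica becomes the new primary.
    new_primary = max(candidates, key=candidates.get)
    # The remaining healthy replicas would now replicate from the new
    # primary, and the shard's routing record would be updated.
    demoted = sorted(a for a in candidates if a != new_primary)
    return new_primary, demoted

primary, repointed = emergency_reparent({
    "zone1-0000000001": (True, 120),
    "zone1-0000000002": (True, 150),   # most advanced: gets promoted
    "zone1-0000000003": (False, 90),   # unhealthy: excluded entirely
})
print(primary)  # zone1-0000000002
```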

The core problem Vitess solves here is database high availability for sharded systems. Without automatic failover, a primary MySQL instance failure would require manual intervention, leading to significant application downtime. Vitess automates this, making it resilient.

Internally, Vitess pairs each MySQL instance with a vttablet agent and runs central control-plane services: vtctld, which serves administrative commands, and VTOrc, which continuously monitors tablet health and repairs the cluster. Each vttablet monitors its local MySQL instance and publishes its health. Cluster state, including which tablet is the primary of each shard, lives in the topology service, a distributed consistent store such as etcd, ZooKeeper, or Consul. Every reparent operation takes a lock on the shard in the topology service, which guarantees that only one process can change leadership for a shard at any given time.

The key levers you control are, conceptually, a replication-lag limit and a failover timeout. (The names below are illustrative; the exact flags differ by component and Vitess version. For example, the reparent commands accept a wait_replicas_timeout.)

  • Replication-lag limit: how much replication lag is acceptable for a replica to be considered a promotion candidate. If a replica is too far behind the current primary, it won’t be eligible. For example, a 30-second limit means a replica must be within 30 seconds of the primary’s position to be promoted.

  • Failover timeout: how long the primary must be unresponsive before a reparenting operation is triggered. If the primary stays unreachable past this window (say, one minute), Vitess proceeds with electing a new primary.
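Conceptually the two knobs compose like this (a toy sketch; the constants and function names are illustrative, not Vitess flags):

```python
from datetime import timedelta

# Illustrative thresholds mirroring the examples above; not Vitess flags.
REPLICATION_LAG_LIMIT = timedelta(seconds=30)
FAILOVER_TIMEOUT = timedelta(minutes=1)

def should_failover(primary_unresponsive_for: timedelta) -> bool:
    """Start an election only once the primary has been down long enough."""
    return primary_unresponsive_for >= FAILOVER_TIMEOUT

def eligible(replica_lag: timedelta) -> bool:
    """A replica is a promotion candidate only if its lag is acceptable."""
    return replica_lag <= REPLICATION_LAG_LIMIT

print(should_failover(timedelta(seconds=90)))  # True: past the 1m timeout
print(eligible(timedelta(seconds=45)))         # False: lag exceeds 30s
```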

The "failover options" in Vitess refer to the different strategies that govern this recovery. The most common is automated reparenting: when VTOrc detects that a primary is down, it automatically runs an emergency reparent to promote one of the replicas.

The other option, used for planned maintenance and controlled migrations rather than crash recovery, is manual reparenting, initiated via vtctl commands such as PlannedReparentShard. You explicitly tell Vitess to demote the current primary and promote a specific replica. This is useful for planned upgrades or whenever you want full control over which replica becomes the new primary.

You can check the status of your primaries and replicas using the vtctlclient command-line tool. For example, to see the replication state of a specific shard (assuming the users table lives in a keyspace named user):

vtctlclient -server vtctld-0.vtctld-headless.vitess.svc.cluster.local:15999 ShardReplicationPositions user/0

This lists every tablet in the shard, showing which instance is the primary and the replication position of each replica.

If a primary fails and you want to force a reparenting operation yourself (use with caution, as this bypasses some safety checks and assumes you know what you’re doing), you can run an emergency reparent:

vtctlclient -server vtctld-0.vtctld-headless.vitess.svc.cluster.local:15999 EmergencyReparentShard -keyspace_shard=<keyspace/shard> -new_primary=<new-primary-tablet-alias>

For example, to reparent shard 0 of the user keyspace to the tablet zone1-0000000001:

vtctlclient -server vtctld-0.vtctld-headless.vitess.svc.cluster.local:15999 EmergencyReparentShard -keyspace_shard=user/0 -new_primary=zone1-0000000001

This command tells Vitess to promote zone1-0000000001 to primary of that shard, repoint the surviving replicas at it, and update the routing state so traffic follows the new primary.

One subtle aspect of Vitess’s reparenting is the role played by the read-only replicas. Although the new primary must end up writable, failure detection and candidate selection are driven entirely by the replicas’ reported replication state, and a replica that is slightly behind the failed primary can still be promoted as long as it is the most advanced candidate; the remaining replicas then re-replicate from it. This keeps elections fast and minimizes the window of unavailability.

Once your primary election and reparenting are working perfectly, the next thing you’ll likely encounter is how Vitess handles schema changes across your sharded database.

Want structured learning?

Take the full Vitess course →