Vitess’s buffering during a reparent is the mechanism that lets reads continue with minimal disruption while a primary MySQL instance is being replaced.

Let’s watch Vitess handle a primary failover, specifically focusing on how it keeps reads flowing. We’ll simulate a scenario where the current primary mysql-0 in zone1 fails, and a new primary mysql-1 is promoted in zone2.

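Here’s a simplified, illustrative map of the tablets in this cluster (Vitess itself keeps this information in its topology service, not in a JSON file):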

# vt_tablet_map.json (simplified)
{
  "zone1": {
    "mysql-0": {
      "tablet_hostname": "mysql-0.example.com",
      "tablet_port": 15999,
      "tablet_uid": 100,
      "replication_position": "12345",
      "mysql_hostname": "mysql-0.example.com",
      "mysql_port": 3306
    },
    "vt-0": {
      "tablet_hostname": "vt-0.example.com",
      "tablet_port": 15999,
      "tablet_uid": 101,
      "cell": "zone1",
      "mysqld_uid": 100
    }
  },
  "zone2": {
    "mysql-1": {
      "tablet_hostname": "mysql-1.example.com",
      "tablet_port": 15999,
      "tablet_uid": 102,
      "replication_position": "12346",
      "mysql_hostname": "mysql-1.example.com",
      "mysql_port": 3306
    },
    "vt-1": {
      "tablet_hostname": "vt-1.example.com",
      "tablet_port": 15999,
      "tablet_uid": 103,
      "cell": "zone2",
      "mysqld_uid": 102
    }
  }
}
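To make the relationship between the entries concrete, here is a minimal Python sketch that pairs each VTTablet with the MySQL instance it manages by matching mysqld_uid against tablet_uid. It assumes the illustrative JSON layout above; the function name pair_tablets_with_mysqld and the trimmed TABLET_MAP are hypothetical and are not part of any Vitess API.

```python
import json

# Trimmed copy of the illustrative tablet map above (hypothetical format,
# not a file that Vitess reads).
TABLET_MAP = json.loads("""
{
  "zone1": {
    "mysql-0": {"tablet_uid": 100, "mysql_hostname": "mysql-0.example.com", "mysql_port": 3306},
    "vt-0": {"tablet_uid": 101, "cell": "zone1", "mysqld_uid": 100}
  },
  "zone2": {
    "mysql-1": {"tablet_uid": 102, "mysql_hostname": "mysql-1.example.com", "mysql_port": 3306},
    "vt-1": {"tablet_uid": 103, "cell": "zone2", "mysqld_uid": 102}
  }
}
""")


def pair_tablets_with_mysqld(tablet_map):
    """Return {vttablet name: "host:port" of the MySQL instance it manages}."""
    # Index every entry that describes a MySQL instance by its uid.
    mysql_by_uid = {}
    for entries in tablet_map.values():
        for entry in entries.values():
            if "mysql_hostname" in entry:
                address = f'{entry["mysql_hostname"]}:{entry["mysql_port"]}'
                mysql_by_uid[entry["tablet_uid"]] = address

    # Resolve each vttablet's mysqld_uid against that index.
    pairs = {}
    for entries in tablet_map.values():
        for name, entry in entries.items():
            if "mysqld_uid" in entry:
                pairs[name] = mysql_by_uid[entry["mysqld_uid"]]
    return pairs
```

Calling pair_tablets_with_mysqld(TABLET_MAP) maps vt-0 to mysql-0.example.com:3306 and vt-1 to mysql-1.example.com:3306, which is the pairing the rest of this walkthrough relies on.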

Imagine a read request comes in for zone1/vt-0 targeting keyspace ks. Normally, vt-0 would forward this to the current primary MySQL instance in zone1.

Now, mysql-0 in zone1 experiences an unrecoverable failure. vtorc (or, in older deployments, orchestrator) detects this and initiates an emergency reparent. A new primary, mysql-1 in zone2, is promoted. This involves several steps:

  1. Primary Detection & Orphaned Reads: vtorc identifies that the primary in zone1 is down. vt-0 (the VTTablet for zone1) is still trying to serve reads, but it can no longer reach its primary. It enters a "primary-down" state.

  2. Resilience Buffer Activation: This is where the magic happens. When the primary becomes unreachable, requests that need it are not failed immediately. Instead, they are held in a bounded buffer for a short window while the failover completes. (In upstream Vitess this buffering is performed by VTGate, enabled with --enable_buffer, rather than by VTTablet itself.) Reads that a healthy replica can answer continue to be served from that replica throughout. This is the "buffer" phase. The goal is to minimize the window in which requests fail.

  3. New Primary Registration: vtorc promotes mysql-1 in zone2 and registers it as the new primary for ks. vt-1 (the VTTablet for zone2) is updated to know it’s now the primary.

  4. Replica Catch-up: Other VTTablets (like vt-0) that were replicas of the old primary now start replicating from the new primary (mysql-1 in zone2). This catch-up process is critical.

  5. Switchover: Once vt-0 has caught up sufficiently (i.e., its replication lag is below a configured threshold, referred to here as max_replication_lag), it can resume serving reads, and primary-bound traffic is directed to the new primary (vt-1 in zone2). Requests that were buffered during the failover are then drained and sent to the new primary.
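The steps above can be sketched as a toy model in Python. This is not Vitess code: the FailoverBuffer class, its methods, and the max_size and window_seconds parameters are hypothetical names chosen for illustration. The sketch only shows the general idea of holding primary-bound requests in a bounded queue while no primary is known, then draining them once a new primary is announced. Time is passed in explicitly so the behavior is deterministic.

```python
from collections import deque


class FailoverBuffer:
    """Toy model: hold primary-bound requests while no primary is known."""

    def __init__(self, primary, max_size, window_seconds):
        self.primary = primary                # name of the current primary
        self.max_size = max_size              # max requests held at once
        self.window_seconds = window_seconds  # how long buffering is allowed
        self.failover_started_at = None
        self.pending = deque()

    def primary_lost(self, now):
        """Record that the primary is gone and start the buffering window."""
        self.primary = None
        self.failover_started_at = now

    def set_primary(self, name, now):
        """Announce the new primary and drain everything that was waiting."""
        self.primary = name
        self.failover_started_at = None
        drained = []
        while self.pending:
            drained.append(self.send(self.pending.popleft(), now))
        return drained

    def send(self, request, now):
        """Route a request: execute it, buffer it, or fail it."""
        if self.primary is not None:
            return f"{request} -> {self.primary}"
        window_expired = now - self.failover_started_at > self.window_seconds
        if window_expired or len(self.pending) >= self.max_size:
            return f"{request} -> ERROR"
        self.pending.append(request)
        return f"{request} -> buffered"


buf = FailoverBuffer(primary="mysql-0", max_size=2, window_seconds=10)
before = buf.send("q1", now=0)             # served by the old primary
buf.primary_lost(now=1)                    # mysql-0 fails
held = [buf.send("q2", now=2), buf.send("q3", now=3)]
rejected = buf.send("q4", now=4)           # buffer is already full
drained = buf.set_primary("mysql-1", now=5)
```

In this run q2 and q3 are held and later delivered to mysql-1, while q4 is rejected because the buffer is full. Bounding both the size and the window is a deliberate choice: an unbounded buffer would trade a short burst of errors for unbounded memory use and latency if the failover stalls.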

The key to minimizing read downtime is that VTTablet instances don’t immediately stop serving reads when the primary is lost. For a short grace period, a replica can keep serving reads from its local data while replication is re-pointed at the new primary. The lag threshold (max_replication_lag in this walkthrough) is the critical lever. If it is set to 0, meaning no lag is tolerated, reads stay paused until the replica has fully caught up, which takes longer. If it is set to a few seconds, reads resume as soon as the replica’s lag falls below that value, which shortens the pause at the cost of slightly staler data.
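The effect of the threshold can be illustrated with a small Python sketch. The helper names and the lag samples below are made up for illustration, and the comparison is a simplification of whatever health check a real deployment uses; the point is only that a stricter threshold delays the moment reads resume.

```python
# Hypothetical (time_seconds, lag_seconds) samples taken while a replica
# catches up to the new primary.
LAG_SAMPLES = [(0, 8.0), (1, 5.0), (2, 2.5), (3, 0.0)]


def replica_can_serve(lag_seconds, max_lag_seconds):
    """True if the replica's lag is within the allowed threshold."""
    return lag_seconds <= max_lag_seconds


def first_serving_time(lag_samples, max_lag_seconds):
    """Return the first sample time at which reads may resume, or None."""
    for time_seconds, lag_seconds in lag_samples:
        if replica_can_serve(lag_seconds, max_lag_seconds):
            return time_seconds
    return None


strict = first_serving_time(LAG_SAMPLES, max_lag_seconds=0)   # waits for zero lag
relaxed = first_serving_time(LAG_SAMPLES, max_lag_seconds=3)  # tolerates 3s of lag
```

With these samples the strict setting resumes reads at t=3 and the relaxed one at t=2, so tolerating a few seconds of lag buys back one second of read availability in this example.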

The most surprising thing about Vitess’s replication buffering is that it can continue serving some reads from a replica even during a primary failover, provided the replica has caught up sufficiently to the new primary’s state. It’s not a hard stop and start; it’s a more nuanced transition.

The next hurdle is understanding how Vitess handles vttablet process restarts and how that interacts with the reparenting process.
