Vitess’s online resharding lets you split a shard without stopping your application, which feels like magic but is actually a carefully orchestrated migration of data and connections.
Let’s see Vitess resharding in action. Imagine we have a single shard, 0 (covering keyspace -80), containing 10 million rows. We want to split this into two shards: 0 (-40) and 1 (40-80).
First, we need to tell Vitess about the new shard boundary. This is done by updating the reshar_plan in the vtctl_client.
vtctlclient --server <vtctl_host>:<vtctl_port> ApplyVSchema \
- -- --keyspace <keyspace_name>
This command takes a JSON payload that describes the resharding operation. Here’s a simplified example:
{
"partitions": [
{
"name": "0",
"shard_ranges": [
{"start": "-80", "end": "0"}
]
},
{
"name": "1",
"shard_ranges": [
{"start": "40", "end": "80"}
]
}
]
}
After applying this, Vitess knows about the target state. The actual data movement is handled by MoveTables and MoveSides. MoveTables is used for moving entire tables to new shards. MoveSides is the core mechanism for splitting a shard. It works by copying data from the source shard to the new target shard.
Here’s how MoveSides works under the hood:
- Create VReplication Streams: Vitess sets up VReplication streams from the source shard to the target shard for each table. These streams continuously replicate changes from the source to the target, ensuring data consistency.
- Catch Up: The target shard catches up with the source shard.
- Cut Over: Once the target shard is caught up, Vitess performs a cutover. This involves:
- Pausing Writes: Briefly pausing writes to the source shard.
- Final Sync: Performing a final sync of any in-flight transactions.
- Switching Traffic: Redirecting traffic from the source shard to the target shard.
- Resuming Writes: Resuming writes, now directed to the new shard.
The vtctlclient MoveSides command initiates this process.
vtctlclient --server <vtctl_host>:<vtctl_port> MoveSides \
<source_cell>:<source_shard> <target_cell>:<target_shard> <keyspace>
You can monitor the progress using vtctlclient.sh TabletManager.GetVReplicationState <target_tablet_uid>.
The beauty of this is that the application doesn’t see any downtime. Vitess manages the connection routing. When a write comes in, Vitess checks which shard is responsible for that key range. If the key range has moved, Vitess routes the connection to the new shard seamlessly.
The reshar_plan is not just about defining partitions; it also contains information about the vindexes and how they map to the new shard structure. For example, if you have a hash vindex, Vitess uses the vindex’s hash to determine which shard a row belongs to. When you reshard, you’re essentially changing the boundaries of these vindex buckets.
A critical, and often overlooked, aspect of resharding is ensuring your vindexes are correctly configured for the new shard structure. If you have a composite vindex or a vindex that spans multiple shards, you need to carefully plan how its buckets will be distributed. Vitess will attempt to infer this from the vschema, but explicit configuration in the reshar_plan or vschema is often necessary for complex scenarios. This can involve defining how a vindex’s hash space is divided across the new shards.
The next problem you’ll likely encounter is managing the old, now empty, shards.