A Vitess backup isn’t really a backup in the traditional sense; it’s a point-in-time snapshot of your data and your schema, all bundled together and stored in a way that Vitess can efficiently reconstruct.
Let’s see this in action. Imagine you have a Vitess cluster, and you want to back up your commerce keyspace.
First, you’d typically run the vtctlclient BackupShard command, which takes a keyspace/shard argument (the plain Backup command takes a tablet alias instead). Vitess then picks a tablet in that shard to take the backup; by default this is a replica rather than the primary, unless you pass --allow_primary.
vtctlclient --server <vtctld-host>:<vtctld-port> BackupShard <keyspace>/<shard>
For example, if your vtctld is at localhost:15999 and you want to back up shard 0 of the commerce keyspace:
vtctlclient --server localhost:15999 BackupShard commerce/0
Vitess will then tell you which tablet is taking the backup. You’ll see output like this:
Starting backup for keyspace "commerce", shard "0" on tablet 100
Backup initiated, please check tablet 100 logs for progress.
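If you want to trigger this from a script rather than by hand, a thin wrapper is usually enough. The following is a minimal sketch, not part of Vitess: the function names `build_backup_command` and `run_backup` are hypothetical, and it assumes `vtctlclient` is on the PATH. It uses `BackupShard`, the vtctlclient subcommand that accepts a keyspace/shard argument.

```python
import shlex
import subprocess


def build_backup_command(server: str, keyspace: str, shard: str) -> list[str]:
    """Build the argv for a shard-level backup request (hypothetical helper)."""
    return ["vtctlclient", "--server", server, "BackupShard", f"{keyspace}/{shard}"]


def run_backup(server: str, keyspace: str, shard: str, dry_run: bool = True) -> str:
    """Return the command line in dry-run mode, otherwise run it and return stdout."""
    cmd = build_backup_command(server, keyspace, shard)
    if dry_run:
        # Nothing is executed; this only shows what would be run.
        return shlex.join(cmd)
    # check=True raises CalledProcessError if vtctlclient exits non-zero.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(run_backup("localhost:15999", "commerce", "0"))
```

Defaulting to `dry_run=True` is a deliberate choice here: it lets you inspect the exact command before pointing the script at a real cluster.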
The backup process itself involves a few key steps happening on the tablet that takes the backup:
- Snapshotting: The tablet uses the configured backup engine to create a consistent snapshot of the data and schema. Vitess’s built-in engine copies the MySQL data files rather than running mysqldump, and XtraBackup is a supported alternative. With the built-in engine, the tablet stops serving while the files are copied, which is what keeps the snapshot consistent.
- Compression: The resulting snapshot files are compressed to save space and reduce transfer time.
- Uploading: The compressed snapshot is uploaded to a configured backup storage location. This is typically Amazon S3, Google Cloud Storage, or a local filesystem path accessible by all your vtbackup processes. Vitess uses vtbackup agents to handle this upload.
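The three steps above can be illustrated with a small, self-contained sketch. This is not Vitess code: it stands in for the real pipeline by archiving a directory with gzip compression and "uploading" it by copying to a local storage directory. The function name `snapshot_and_upload` and the directory layout are assumptions made for the example.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def snapshot_and_upload(data_dir: Path, storage_dir: Path, name: str) -> Path:
    """Archive data_dir with gzip compression and copy the archive into storage_dir."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / f"{name}.tar.gz"
        # Steps 1 and 2: capture the directory contents and compress them.
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(data_dir, arcname=name)
        # Step 3: "upload" by copying into the storage location.
        return Path(shutil.copy2(archive, storage_dir))
```

A real storage backend such as S3 or GCS would replace the final copy with an upload call, but the ordering of the steps stays the same.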
Full vs. Incremental Backups
Vitess supports both full and incremental backups.
- Full Backups: These capture the entire state of the keyspace/shard at the time of the backup. They are the most straightforward for restoration but can be large and take longer to create.
- Incremental Backups: These capture only the changes that have occurred since the previous backup in the chain (the last full backup or the last incremental). This significantly reduces backup size and time, but restoration requires applying the full backup and all subsequent incremental backups in order.
You can configure the backup strategy, for instance a full backup every day and incremental backups every hour.
The vtbackup process, which runs on a separate set of worker nodes (or can be co-located), is responsible for actually performing the backup operations. It communicates with vtctld to get instructions and then interacts with the tablet being backed up.
The vtbackup configuration typically resides in a file like vtbackup.json. Here’s a snippet showing how you might configure it for S3 and specify a backup strategy:
{
  "bucket": "s3://my-vitess-backups",
  "aws_region": "us-east-1",
  "backup_strategy": {
    "full_backup_interval": "24h",
    "incremental_backup_interval": "1h"
  },
  "upload_concurrency": 4
}
This configuration tells vtbackup to:
- Use the s3://my-vitess-backups bucket in us-east-1 for storage.
- Aim to perform a full backup every 24 hours and an incremental backup every 1 hour.
- Use 4 concurrent uploads for efficiency.
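To make the interval fields concrete, here is a minimal sketch of how a script might read this configuration and turn the interval strings into durations. The field names come from the snippet above; the helper names `parse_interval` and `load_strategy`, and the assumption that intervals use only `s`, `m`, or `h` suffixes, are hypothetical.

```python
import json
import re
from datetime import timedelta

CONFIG_TEXT = """
{
  "bucket": "s3://my-vitess-backups",
  "aws_region": "us-east-1",
  "backup_strategy": {
    "full_backup_interval": "24h",
    "incremental_backup_interval": "1h"
  },
  "upload_concurrency": 4
}
"""

# Assumed suffixes; the article's snippet only shows values in hours.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_interval(text: str) -> timedelta:
    """Convert a string such as '24h' or '30m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def load_strategy(config_text: str) -> dict[str, timedelta]:
    """Extract the full and incremental intervals from the JSON config."""
    strategy = json.loads(config_text)["backup_strategy"]
    return {
        "full": parse_interval(strategy["full_backup_interval"]),
        "incremental": parse_interval(strategy["incremental_backup_interval"]),
    }
```

Rejecting unknown suffixes with an error, rather than guessing, keeps a typo in the config from silently changing the backup cadence.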
The Mental Model
Think of Vitess backups as a chain. A full backup is the start of the chain. Each incremental backup is a link added to that chain, referencing the previous state. When you restore, you grab the latest full backup and then meticulously attach each incremental link that followed it, replaying the changes until you reach your desired point in time.
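The chain model can be sketched in a few lines. This is an illustration of the idea, not Vitess's restore logic: `BackupRecord` and `restore_chain` are hypothetical names, and timestamps are plain integers to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupRecord:
    timestamp: int  # any orderable value works, e.g. seconds since epoch
    kind: str       # "full" or "incremental"


def restore_chain(backups: list[BackupRecord], target: int) -> list[BackupRecord]:
    """Return the backups to apply, in order, to reach the target time."""
    # Only backups taken at or before the target can contribute.
    candidates = sorted(
        (b for b in backups if b.timestamp <= target),
        key=lambda b: b.timestamp,
    )
    full_indexes = [i for i, b in enumerate(candidates) if b.kind == "full"]
    if not full_indexes:
        raise ValueError("no full backup at or before the target time")
    # Start at the latest full backup; everything after it is an incremental link.
    return candidates[full_indexes[-1]:]
```

For a history of full backups at times 0 and 3 with incrementals at 1, 2, 4, and 5, a target of 4 yields the full backup at 3 followed by the incremental at 4; the older chain is ignored entirely.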
The vtctlclient RestoreFromBackup command is your tool for bringing this chain back to life. You point it at a target tablet (optionally with a specific backup timestamp), and Vitess then orchestrates downloading the backup data and loading it into that tablet.
What most people don’t realize is that the backup_strategy in vtbackup.json is only a hint to the vtbackup process. It doesn’t schedule backups by itself. You still need a separate scheduler (like cron or a Kubernetes CronJob) to trigger the backup command at your desired intervals. The backup_strategy fields then inform vtbackup about the type of backup it should attempt when triggered: if a full backup is due according to the strategy, it will try to perform a full one; otherwise, it will attempt an incremental.
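The full-or-incremental decision described here is simple enough to sketch. This is an assumption about how such a check could look, not the actual vtbackup logic; `choose_backup_type` is a hypothetical name, and the scheduler (cron or a CronJob) is expected to call it on every trigger.

```python
from datetime import datetime, timedelta
from typing import Optional


def choose_backup_type(
    now: datetime,
    last_full: Optional[datetime],
    full_interval: timedelta,
) -> str:
    """Return 'full' if a full backup is due, otherwise 'incremental'."""
    # With no full backup yet there is nothing for an incremental to build on.
    if last_full is None or now - last_full >= full_interval:
        return "full"
    return "incremental"
```

With a 24-hour full interval and an hourly trigger, this yields one full backup followed by incrementals until the next full backup falls due.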
The restoration process can be complex, especially with many incremental backups. Vitess relies on the backup_storage implementation to list and retrieve backup files, and on the tablet process to restore the data and apply binlog events.
The next thing you’ll likely encounter is understanding how to manage backup retention policies and prune old backups to avoid filling up your storage.