Tuning Valkey RDB snapshots involves a delicate dance between how often you save your data and how much that saving process impacts your running Valkey instance.
Here’s what a typical RDB snapshotting configuration looks like in valkey.conf:
save 900 1
save 300 10
save 60 10000
This configuration means:
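- Save the database if at least 1 change occurred in the last 900 seconds (15 minutes).
- Save the database if at least 10 changes occurred in the last 300 seconds (5 minutes).
- Save the database if at least 10000 changes occurred in the last 60 seconds (1 minute).
The second number counts changes to the dataset (write operations), not distinct keys.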
When Valkey initiates a background RDB save, it forks the main process. The child process then writes its view of the dataset to disk while the parent keeps serving clients. The fork is the primary source of potential performance impact: the call itself blocks the main thread while the page tables are copied, which takes longer the larger your dataset is, and every memory page the parent modifies while the child is running has to be duplicated by copy-on-write.
The order of the save directives in valkey.conf does not matter: a save is triggered as soon as any one of the conditions is met, so you can have multiple conditions that lead to a save. If you want to disable RDB snapshots entirely, add a save "". Commenting out every save line is not a reliable way to do this, because Valkey can then fall back to its built-in default save points.
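The "any condition triggers a save" behaviour can be sketched in a few lines of Python. This is an illustrative model, not Valkey's source code: `SaveRule` and `should_save` are hypothetical names, and the exact boundary handling (whether the comparison is `>=` or `>`) is an assumption.

```python
from typing import NamedTuple


class SaveRule(NamedTuple):
    seconds: int  # minimum time since the last save
    changes: int  # minimum number of changes since the last save


# The three rules from the configuration above.
RULES = [SaveRule(900, 1), SaveRule(300, 10), SaveRule(60, 10000)]


def should_save(rules, seconds_since_save, changes_since_save):
    """Return True if any single rule is satisfied; rule order is irrelevant."""
    return any(
        seconds_since_save >= rule.seconds and changes_since_save >= rule.changes
        for rule in rules
    )


print(should_save(RULES, 120, 5))     # False: no rule has both conditions met
print(should_save(RULES, 301, 10))    # True: the 300/10 rule matches
print(should_save(RULES, 61, 20000))  # True: the 60/10000 rule matches
```

Reversing the list of rules gives the same answers, which is the point: each rule is checked independently.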
The core trade-off is between data durability and performance. More frequent snapshots mean less data loss in case of a crash but can lead to more frequent I/O and CPU spikes. Less frequent snapshots reduce the overhead but increase the potential data loss window.
To get a sense of how often your RDB snapshots are actually happening, you can check the Valkey INFO output:
valkey-cli INFO persistence
Look for rdb_changes_since_last_save to see how many changes have accumulated since the last RDB dump, and rdb_last_save_time for the Unix timestamp of the last successful save. The rdb_bgsave_in_progress flag is also crucial: if it is 1, a snapshot is currently being taken.
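If you want to track these fields from a script, the INFO output is simple enough to parse by hand. The following is a minimal sketch that works on the text printed by valkey-cli INFO persistence; `parse_info` and `summarize` are hypothetical helper names, and the values in `SAMPLE` are made up for illustration.

```python
def parse_info(text):
    """Parse 'key:value' lines from INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def summarize(fields, now):
    """Turn the raw RDB fields into a small status summary."""
    return {
        "pending_changes": int(fields["rdb_changes_since_last_save"]),
        "saving_now": fields["rdb_bgsave_in_progress"] == "1",
        "seconds_since_save": now - int(fields["rdb_last_save_time"]),
        "last_save_ok": fields["rdb_last_bgsave_status"] == "ok",
    }


# Made-up sample in the shape of the real output.
SAMPLE = """\
# Persistence
loading:0
rdb_changes_since_last_save:42
rdb_bgsave_in_progress:0
rdb_last_save_time:1700000000
rdb_last_bgsave_status:ok
"""

summary = summarize(parse_info(SAMPLE), now=1700000090)
print(summary)
```

Sampling this periodically and comparing seconds_since_save against your save rules shows how often snapshots really fire under your workload.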
The "cost" of an RDB snapshot isn’t just about disk I/O. Because fork() relies on Copy-on-Write (CoW), the parent and child initially share their memory pages, but every page the parent modifies while the snapshot is running gets copied. Under a write-heavy workload this can, in the worst case, approach double the memory footprint of your Valkey process. If your Valkey instance is already memory-constrained, this can lead to the operating system swapping, severely degrading performance or even causing OOM (Out Of Memory) killer actions. The rdb_last_bgsave_status field in INFO persistence will tell you if the last save succeeded or failed.
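A back-of-the-envelope check can tell you whether a snapshot is likely to fit in memory. The sketch below uses a deliberately crude model: it assumes the extra memory equals the fraction of the dataset's pages that are written to while the child is running. `touched_fraction` is a hypothetical parameter you would have to estimate yourself (the rdb_last_cow_size field in INFO persistence can help), and the function names are illustrative.

```python
GIB = 1024 ** 3


def snapshot_peak_bytes(dataset_bytes, touched_fraction):
    """Estimate peak memory during a snapshot under a simple CoW model."""
    if not 0.0 <= touched_fraction <= 1.0:
        raise ValueError("touched_fraction must be between 0 and 1")
    # Shared pages cost nothing extra; only modified pages are duplicated.
    return dataset_bytes + int(dataset_bytes * touched_fraction)


def fits_in_ram(dataset_bytes, touched_fraction, available_bytes):
    """Return True if the estimated peak stays within available memory."""
    return snapshot_peak_bytes(dataset_bytes, touched_fraction) <= available_bytes


# An 8 GiB dataset on a host with 12 GiB available to Valkey.
print(fits_in_ram(8 * GIB, 0.25, 12 * GIB))  # True: peak is about 10 GiB
print(fits_in_ram(8 * GIB, 1.0, 12 * GIB))   # False: worst case is 16 GiB
```

The model ignores page-table overhead and output buffers, so treat the result as a lower bound on what you need.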
If you’re seeing performance degradation correlated with RDB saves, the most direct approach is to adjust the save directives. For example, to make snapshots less frequent, you might change:
save 60 10000
to
save 300 10000
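This means the high-volume rule only fires once 10000 changes have accumulated and at least 300 seconds (5 minutes) have passed, rather than 60 seconds. Note that if save 300 10 from the configuration above is still present, it already fires on any 10 changes in 300 seconds, so the new line is redundant and the net effect is the same as removing save 60 10000: under heavy writes, snapshots happen roughly once every 5 minutes instead of once a minute. Either way, this reduces the frequency of the fork() operation and the associated overhead.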
Alternatively, if your workload involves bursts of writes that momentarily exceed a threshold but then subside, you might need to increase the change count for shorter intervals. For instance, if save 60 10000 is too aggressive, you could try save 60 50000. This ensures that the one-minute rule only triggers a save if a very significant number of changes accumulate within that minute; the longer-interval rules still apply.
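To compare candidate configurations before touching a live instance, you can replay a write pattern against the rules. The following is a simplified simulation, not a model of Valkey internals: it assumes a snapshot completes instantly and resets both counters, `count_saves` is a hypothetical helper, and the trace (a 2-minute burst of 300 writes per second every 10 minutes, for one hour) is made up.

```python
def count_saves(rules, writes_per_second):
    """Count snapshots triggered by a per-second trace of write counts."""
    saves = 0
    elapsed = 0  # seconds since the last save
    dirty = 0    # changes since the last save
    for writes in writes_per_second:
        elapsed += 1
        dirty += writes
        if any(elapsed >= seconds and dirty >= changes for seconds, changes in rules):
            saves += 1
            elapsed = 0
            dirty = 0
    return saves


# Hypothetical workload: bursty writes, idle in between.
trace = [300 if (t % 600) < 120 else 0 for t in range(3600)]

original = [(900, 1), (300, 10), (60, 10000)]
relaxed = [(900, 1), (300, 10), (60, 50000)]

saves_original = count_saves(original, trace)
saves_relaxed = count_saves(relaxed, trace)
print(saves_original, saves_relaxed)
```

With this trace the relaxed configuration triggers fewer snapshots, because each burst stays below 50000 changes and the saves are left to the 300-second rule.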
The stop-writes-on-bgsave-error yes configuration option is important. If a background RDB save fails (e.g., due to disk full or permissions issues), Valkey rejects write commands until a save succeeds again, so that the failure is noticed instead of data silently going unpersisted. If this is set to yes and clients start receiving write errors, check your disk space and Valkey’s permissions to write to the dir specified in valkey.conf.
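Both checks are easy to script. This is a minimal sketch using only the Python standard library; `check_snapshot_dir` is a hypothetical helper, the path and size you pass in are your own values, and the result is only meaningful if the script runs as the same user as the Valkey process.

```python
import os
import shutil


def check_snapshot_dir(path, needed_bytes):
    """Return a list of problems that could make an RDB save fail."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    # Creating the temporary RDB file needs write and search permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    free = shutil.disk_usage(path).free
    if free < needed_bytes:
        problems.append(f"only {free} bytes free, {needed_bytes} needed")
    return problems


# Example: check the current directory for 1 MiB of free space.
for problem in check_snapshot_dir(".", 1024 * 1024):
    print(problem)
```

A reasonable value for needed_bytes is the size of your current RDB file plus some margin, since the new file is written alongside the old one before replacing it.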
A common misconception is that RDB is a "real-time" backup. It’s a point-in-time snapshot. If the Valkey server crashes or the host loses power, you’ll lose all writes that occurred after the last successful RDB save. For higher durability, especially with frequent writes, consider enabling Valkey’s Append Only File (AOF) persistence or using a combination of RDB and AOF.
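The decision of how often to save RDBs is mainly driven by your recovery point objective (RPO), that is, how much data you can afford to lose. If you can tolerate losing up to an hour of data, infrequent RDB saves are fine. If you need to minimize data loss, you’ll need more frequent saves or AOF. Your recovery time objective (RTO) is a separate concern: it depends more on how long Valkey takes to load the RDB file at startup than on how often you save.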
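You can turn a set of save rules into a rough worst-case loss window. The sketch below assumes a steady write rate and ignores how long the snapshot itself takes; under that assumption a rule fires after max(seconds, changes / rate), and the earliest rule wins. `loss_window_seconds` is a hypothetical helper name.

```python
def loss_window_seconds(rules, writes_per_second):
    """Worst-case seconds of writes at risk under a steady write rate."""
    if writes_per_second <= 0:
        return 0.0  # no writes means nothing to lose
    return min(
        max(seconds, changes / writes_per_second)
        for seconds, changes in rules
    )


RULES = [(900, 1), (300, 10), (60, 10000)]

for rate in (0.01, 1, 500):
    print(rate, loss_window_seconds(RULES, rate))
```

At 500 writes per second the one-minute rule dominates (a 60-second window), at 1 write per second the five-minute rule does (300 seconds), and at one write every 100 seconds only the fifteen-minute rule is ever satisfied (900 seconds).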
The rdbcompression yes option can save disk space but adds CPU overhead during the save process. If your primary concern is CPU, you might consider disabling it by setting rdbcompression no. This makes the save operation faster but the resulting RDB file larger.
The next thing you’ll likely grapple with is how to manage the RDB file itself, especially in a clustered or replicated environment.