A full backup is often the least efficient way to back up your data, yet it’s the foundation of all other backup types.
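Let’s see how this plays out in practice. Imagine you have a server with 10TB of data. To keep the demo manageable, a single 10GB file stands in for it.
# Create a 10GB file as a scaled-down stand-in for the 10TB dataset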
dd if=/dev/zero of=/data/bigfile.img bs=1G count=10
Now, let’s perform a "full" backup. For simplicity, we’ll just copy the data. In a real-world scenario you would use a more sophisticated backup tool, but the principle is the same: copy everything.
# Simulate a full backup by copying all data
full_dir=/backup/full_$(date +%Y%m%d)
mkdir -p "$full_dir"
# -a preserves permissions and timestamps; "/data/." also picks up dotfiles
cp -a /data/. "$full_dir"/
This cp command, while basic, represents a full backup: it copies everything, which on the real server means all 10TB. If you had to do this daily, your storage would fill up rapidly, and each run would take a very long time.
The problem backups solve is data loss. If your server’s disk fails, or you accidentally delete a critical file, you need a copy to restore from. The challenge is doing this efficiently, minimizing both storage space and backup time.
Here’s where incremental and differential backups come in. They build upon the initial full backup.
An incremental backup backs up only the data that has changed since the last backup (of any type).
Let’s simulate a change:
# Append a few bytes to the big file
echo "some new data" >> /data/bigfile.img
Now, an incremental backup would only copy the changed blocks. Using our cp analogy, it’s still a bit clunky, but imagine a backup tool that only identifies and copies the new or modified data blocks.
# Simulate an incremental backup (only changed data)
inc_dir=/backup/inc_$(date +%Y%m%d)
mkdir -p "$inc_dir"
# A real tool would copy only the changed blocks, not the whole file.
# For demonstration, we'll just copy the modified file.
cp -p /data/bigfile.img "$inc_dir"/
With a block-aware tool, this incremental backup would be tiny, perhaps only a few KB or MB, compared to the 10TB full backup. To restore, you need the original full backup and every subsequent incremental backup, applied in order.
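To make "changed since the last backup" concrete, here is a minimal file-level sketch. It is not how any particular backup product works: it assumes a hypothetical marker file whose modification time records the previous backup, uses `find -newer` to pick up anything modified since then, and runs entirely in a throwaway directory. All paths and file names are made up for illustration.

```shell
set -e

# Throwaway sandbox; every path below is hypothetical.
work=$(mktemp -d)
src="$work/data"
dest="$work/backup"
marker="$work/last_backup.marker"
mkdir -p "$src" "$dest"

# Two files last modified in January 2024; the previous backup ran in February 2024.
printf 'old\n' > "$src/unchanged.txt"
printf 'old\n' > "$src/report.txt"
touch -t 202401010000 "$src/unchanged.txt" "$src/report.txt"
touch -t 202402010000 "$marker"

# A change made after the previous backup (mtime becomes "now").
printf 'new line\n' >> "$src/report.txt"

# Incremental: copy only files modified after the marker.
# Note: this flat copy ignores subdirectories; a real tool preserves the tree.
inc="$dest/inc_1"
mkdir -p "$inc"
touch "$work/this_run.marker"   # taken before scanning, so edits made during the run are caught next time
(cd "$src" && find . -type f -newer "$marker" -exec cp -p {} "$inc/" \;)

# Advance the marker so the next incremental starts from this run.
mv "$work/this_run.marker" "$marker"

ls "$inc"
```

Only `report.txt` lands in the incremental directory. Advancing the marker after every run is what makes this incremental rather than differential.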
A differential backup backs up all data that has changed since the last full backup.
Let’s make another change:
# Append more data to the big file
echo "more new data" >> /data/bigfile.img
A differential backup would copy all changes since the original full backup.
# Simulate a differential backup (all changes since last FULL)
diff_dir=/backup/diff_$(date +%Y%m%d)
mkdir -p "$diff_dir"
# A real tool would copy only the blocks changed since the full backup.
# For demonstration, we'll just copy the modified file.
cp -p /data/bigfile.img "$diff_dir"/
This differential backup would be larger than the incremental backup, as it includes all changes since the last full backup, not just since the last backup overall. To restore, you need the original full backup and the latest differential backup.
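The difference from an incremental is easiest to see over two days. The sketch below is a file-level approximation under stated assumptions: a hypothetical marker file records when the last full backup ran, and it is never advanced by the differential runs. File names and dates are invented for the demo.

```shell
set -e

# Throwaway sandbox; every path below is hypothetical.
work=$(mktemp -d)
src="$work/data"
dest="$work/backup"
full_marker="$work/last_full.marker"
mkdir -p "$src" "$dest"

# Three files that existed when the full backup ran (marker: 1 Feb 2024).
for f in a.txt b.txt c.txt; do printf 'original\n' > "$src/$f"; done
touch -t 202401010000 "$src/a.txt" "$src/b.txt" "$src/c.txt"
touch -t 202402010000 "$full_marker"

differential() {
    # $1 = destination directory. The marker is never advanced here,
    # so every run is measured against the last full backup.
    mkdir -p "$1"
    (cd "$src" && find . -type f -newer "$full_marker" -exec cp -p {} "$1/" \;)
}

# Day 1: a.txt changes.
printf 'edit\n' >> "$src/a.txt"
touch -t 202402020000 "$src/a.txt"
differential "$dest/diff_day1"

# Day 2: b.txt changes; a.txt is copied again because it is still newer than the full.
printf 'edit\n' >> "$src/b.txt"
touch -t 202402030000 "$src/b.txt"
differential "$dest/diff_day2"

ls "$dest/diff_day1"   # a.txt
ls "$dest/diff_day2"   # a.txt and b.txt
```

Each differential repeats everything the previous one held, which is why differentials grow until the next full backup resets the baseline.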
A common strategy is a weekly full backup followed by daily incremental backups. This keeps storage use low at the cost of a longer restore chain. The alternative is daily differentials after a weekly full, which simplifies restoration but uses more space than incrementals.
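A rough calculation shows the size of the trade-off. The numbers here are assumptions for illustration only: a 10TB full backup (treated as 10,000GB), 100GB of new, non-overlapping changes per day, and one week consisting of a full backup plus six daily backups.

```shell
full_gb=10000        # 10TB full backup, as in the example server
daily_change_gb=100  # assumed daily change volume (illustrative)
days_after_full=6    # daily backups between weekly fulls

# Strategy 1: a full backup every day of the week.
daily_fulls=$(( (days_after_full + 1) * full_gb ))

# Strategy 2: one full, then six incrementals of one day's changes each.
incrementals=$(( full_gb + days_after_full * daily_change_gb ))

# Strategy 3: one full, then six differentials. The differential on day n
# holds n days of changes, so the total is 1 + 2 + ... + 6 = n(n+1)/2 days' worth.
differentials=$(( full_gb + daily_change_gb * days_after_full * (days_after_full + 1) / 2 ))

echo "daily fulls:            ${daily_fulls} GB"
echo "full + incrementals:    ${incrementals} GB"
echo "full + differentials:   ${differentials} GB"
```

Under these assumptions a week costs 70,000GB with daily fulls, 10,600GB with incrementals, and 12,100GB with differentials. Real change rates overlap and vary, so treat the figures as a shape, not a forecast.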
The key to understanding restoration is this:
- Full + All Incrementals (in order) = Restore
- Full + Latest Differential = Restore
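The two restore rules can be sketched by treating each backup as a layer copied over the previous one, oldest first. This is a simplified model with hypothetical directory names and file contents; the `restore` helper is invented for the demo and is not part of any backup tool.

```shell
set -e

# Throwaway sandbox with hand-built "backups"; all names are hypothetical.
work=$(mktemp -d)
b="$work/backup"
mkdir -p "$b/full" "$b/inc_1" "$b/inc_2" "$b/diff_2"

# config.txt changed on day 1 and again on day 2; notes.txt changed on day 1 only.
printf 'v0\n' > "$b/full/config.txt"
printf 'v0\n' > "$b/full/notes.txt"
printf 'v1\n' > "$b/inc_1/config.txt"
printf 'v1\n' > "$b/inc_1/notes.txt"
printf 'v2\n' > "$b/inc_2/config.txt"
# The day-2 differential carries everything changed since the full.
printf 'v2\n' > "$b/diff_2/config.txt"
printf 'v1\n' > "$b/diff_2/notes.txt"

restore() {
    # $1 = target directory; remaining arguments = backups, oldest first,
    # so files from newer backups overwrite older copies.
    target=$1
    shift
    mkdir -p "$target"
    for layer in "$@"; do
        cp -Rp "$layer"/. "$target"/
    done
}

restore "$work/restore_inc"  "$b/full" "$b/inc_1" "$b/inc_2"   # full + all incrementals
restore "$work/restore_diff" "$b/full" "$b/diff_2"             # full + latest differential

diff -r "$work/restore_inc" "$work/restore_diff" && echo "both chains restore the same state"
```

Skipping `inc_1` in the first chain would leave `notes.txt` at its old version, which is why every incremental in the chain matters. This layering model does not handle files deleted between backups; real tools record deletions separately.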
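The one thing most people don’t fully grasp is how backup tools track changes. They don’t typically compare every file’s contents against the previous backup. Instead, they rely on cheap metadata. On Windows filesystems (FAT, NTFS), each file has an archive attribute, the "archive bit", which the operating system sets whenever the file is created or modified. Backup software looks for files with the bit set, backs them up, and then clears the bit (for a full or incremental backup) or leaves it set (for a differential). Linux and other Unix filesystems have no archive bit, so tools there typically compare each file’s modification time and size against a record of the previous run, such as a timestamp or snapshot file. Block-level tools go further and track which blocks changed, often with help from filesystem snapshots or application transaction logs. Either way, the tool avoids reading unchanged data, which is what makes incremental and differential runs fast.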
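As one concrete example of metadata-based tracking, GNU tar can keep its bookkeeping in a snapshot file via the `--listed-incremental` option. The sketch below assumes GNU tar is installed (BSD tar does not offer this option) and uses made-up file names in a throwaway directory.

```shell
set -e

# Throwaway sandbox; file names are hypothetical. Requires GNU tar.
work=$(mktemp -d)
mkdir -p "$work/data"
printf 'original\n' > "$work/data/a.txt"
printf 'original\n' > "$work/data/b.txt"

# Level 0 (full): tar records what it saw in the snapshot file state.snar.
tar --create --file="$work/full.tar" \
    --listed-incremental="$work/state.snar" -C "$work" data

# Keep a pristine copy of the level-0 snapshot for differential runs.
cp "$work/state.snar" "$work/state.full.snar"

sleep 1   # guard against coarse filesystem timestamps
printf 'edit\n' >> "$work/data/b.txt"

# Incremental: reuse the snapshot; tar archives what changed and updates it in place.
tar --create --file="$work/inc1.tar" \
    --listed-incremental="$work/state.snar" -C "$work" data

# Differential: always start from a fresh copy of the level-0 snapshot.
cp "$work/state.full.snar" "$work/scratch.snar"
tar --create --file="$work/diff1.tar" \
    --listed-incremental="$work/scratch.snar" -C "$work" data

# Show what the incremental archive actually contains.
tar --list --file="$work/inc1.tar"
```

The incremental archive holds `b.txt` but not the untouched `a.txt`. Whether a run behaves as incremental or differential depends only on which snapshot file it is given.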
The next step is understanding backup retention policies and how they interact with different backup types to manage storage space over time.