Rsync over SSH can be surprisingly slow at transferring files when network latency is high or packet loss is frequent.
Let’s see it in action. Imagine you have a directory ~/source_data on your local machine and you want to mirror it to ~/backup_target on a remote server remote.example.com.
# Local machine
rsync -avz --progress ~/source_data/ remote.example.com:~/backup_target/
The -a flag stands for "archive mode," which recurses into directories and preserves permissions, timestamps, ownership, and symbolic links. -v enables verbose output, listing each file as it is transferred. -z compresses data during transfer, which helps on slow networks but can hurt on fast ones, where the CPU cost outweighs the bandwidth saved. --progress shows per-file transfer progress. The trailing slash on ~/source_data/ is crucial: it tells rsync to copy the contents of source_data into backup_target, not the directory source_data itself.
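The trailing-slash rule is easy to check locally before pointing rsync at a real server. The following is a minimal sketch that assumes rsync is installed; the scratch directory names under $work are hypothetical and exist only for this demonstration.

```shell
# Scratch directories; the names under $work are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/source_data" "$work/with_slash" "$work/without_slash"
echo "hello" > "$work/source_data/file.txt"

# Trailing slash on the source: copy the contents of source_data.
rsync -a "$work/source_data/" "$work/with_slash/"

# No trailing slash: copy the directory source_data itself.
rsync -a "$work/source_data" "$work/without_slash/"

ls "$work/with_slash"      # file.txt
ls "$work/without_slash"   # source_data
```

The same rule applies when the destination is a remote host, so a quick local run like this can save an unwanted nested directory on the server.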
Here’s what’s happening under the hood: rsync uses a clever delta-transfer algorithm. Instead of sending entire files, it sends only the differences between the source and destination files. It does this by:
- File List Generation: The sending rsync process builds a list of files with their sizes and modification times, and the receiving side compares it against what it already has.
- Checksumming: For each file that needs updating, the receiving side divides its existing copy into fixed-size blocks and calculates a checksum for each block.
- Difference Detection: The sender searches its version of the file for data matching those checksums. If a block’s checksum matches, rsync assumes the block is identical and doesn’t transfer it.
- Transferring Deltas: For data that doesn’t match, rsync sends the new data. The remote rsync then reconstructs the updated file using the unchanged blocks and the newly transferred deltas.
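The checksum comparison can be sketched in plain shell. This is a deliberate simplification and not rsync's actual implementation: it compares blocks only at fixed positions and uses cksum, whereas real rsync pairs a rolling checksum with a stronger hash and is not limited to aligned positions. The helper block_sums, the 4-byte block size, and the file names are all assumptions made for illustration.

```shell
# Print one CRC per fixed-size block of FILE (the body runs in a subshell).
# Usage: block_sums FILE BLOCK_SIZE
block_sums() (
  file=$1
  size=$2
  total=$(wc -c < "$file")
  blocks=$(( ($total + $size - 1) / $size ))
  i=0
  while [ "$i" -lt "$blocks" ]; do
    # Read exactly one block at index i and keep only the CRC field.
    dd if="$file" bs="$size" skip="$i" count=1 2>/dev/null | cksum | cut -d' ' -f1
    i=$((i + 1))
  done
)

work=$(mktemp -d)
printf 'AAAABBBBCCCCDDDD' > "$work/old"   # copy already on the destination
printf 'AAAAXXXXCCCCDDDD' > "$work/new"   # updated copy on the source

block_sums "$work/old" 4 > "$work/old.sums"
block_sums "$work/new" 4 > "$work/new.sums"

# Count block positions whose checksums differ; only those would be resent.
changed=$(paste "$work/old.sums" "$work/new.sums" |
  awk '$1 != $2 { n++ } END { print n + 0 }')
echo "blocks to resend: $changed of 4"   # prints "blocks to resend: 1 of 4"
```

Only the second block differs, so a transfer based on these checksums would need to send 4 of the 16 bytes.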
This makes rsync incredibly efficient for subsequent transfers. If you run the same command again, only the files that have changed, or new files, will be transferred.
The "over SSH" part means that rsync uses SSH as its transport layer. This provides two massive benefits:
- Encryption: All data transferred is encrypted, keeping your files secure in transit.
- Authentication: SSH handles user authentication, ensuring you’re connecting to the correct server and that you have permission to access the files.
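The command rsync -avz -e ssh user@remote.example.com:/path/to/source /path/to/destination names the transport explicitly. Note that the remote path comes first here, so this form pulls files from the server to the local machine. Modern versions of rsync already use SSH by default for host:path arguments, which makes -e ssh optional. You can also specify a different SSH port using -e 'ssh -p 2222'.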
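If the port and user name are needed on every run, one option is to keep them in an SSH client configuration file instead of repeating them in -e. The sketch below writes a throwaway config to a temporary file; the host alias backup is hypothetical, while the host name, user, and port are the placeholder values used in this article. The rsync line is left commented out because it needs a reachable server.

```shell
# Write a scratch SSH client config; "backup" is a hypothetical alias.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host backup
    HostName remote.example.com
    User user
    Port 2222
EOF

# Point rsync's ssh at the scratch config and use the alias as the host.
# Uncomment once remote.example.com is replaced with a real server:
# rsync -avz -e "ssh -F $cfg" ~/source_data/ backup:~/backup_target/

cat "$cfg"
```

Placing the same Host block in ~/.ssh/config would remove the need for -e entirely, since ssh reads that file by default.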
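The most surprising thing about rsync’s delta algorithm is that it’s block-based, not line-based, yet it still copes with data that shifts. You might expect that inserting a single byte at the beginning of a large text file would misalign every subsequent block and force the whole file to be resent. It doesn’t: the sender evaluates a rolling checksum at every byte offset, so it still finds the unchanged blocks at their shifted positions and sends only the data around the insertion. The cost is CPU time for the checksum scan rather than bandwidth. Surprisingly large transfers for seemingly small changes do occur with compressed or encrypted files, where a small edit to the input changes most of the output bytes.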
The efficiency of rsync is heavily dependent on the network. While -z (compression) is often recommended for slow links, it adds CPU overhead. On very fast, low-latency networks, the overhead of compression can actually slow things down. In such cases, you might omit -z or even use --no-compress.
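How much -z can save also depends on the data itself. As a rough sketch, gzip is used here as a stand-in for rsync's built-in compression (an assumption, since the two are not identical), and only sizes are compared, not CPU time. The file names are hypothetical.

```shell
work=$(mktemp -d)

# Repetitive text, which compresses very well.
yes "rsync over ssh" | head -n 10000 > "$work/text.log"
# Random bytes, standing in for already-compressed data such as archives.
head -c 150000 /dev/urandom > "$work/random.bin"

# Print the size in bytes of FILE after gzip compression.
gz_size() { gzip -c "$1" | wc -c; }

text_raw=$(wc -c < "$work/text.log")
text_gz=$(gz_size "$work/text.log")
rand_raw=$(wc -c < "$work/random.bin")
rand_gz=$(gz_size "$work/random.bin")

echo "text.log:   $text_raw -> $text_gz bytes"
echo "random.bin: $rand_raw -> $rand_gz bytes"
```

The text file shrinks dramatically while the random file does not shrink at all, so for data of the second kind compression spends CPU time without saving bandwidth.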
The next logical step from here is to explore rsync’s more advanced filtering capabilities, allowing you to include or exclude specific files and directories based on patterns.