rsync - Transfer and Sync Files

Published on April 7, 2021

rsync is a C program first released in 1996, designed to transfer and synchronize data while using minimal network bandwidth. This GPL-licensed utility is optimized for use on UNIX-like systems and is still in development decades after its first release. Since rsync doesn't encrypt data itself, most deployments send all data through SSH or a similar secure channel.

Even if you've never heard of rsync, chances are that you've used it indirectly. Its popularity over the years has led to it being built into many backup, mirroring, and deployment tools. It uses delta encoding, meaning that only the differences between two versions of a file are sent. Let's dive into how to use it, why to use it, and some real-world examples.

The rsync protocol

Before we go into the details, it's important to point out that rsync is both a utility and a protocol. Many technical documents that refer to "rsync" don't specify which one they mean. The rsync protocol can be used on servers that have rsync operating in daemon mode. In other words, the server must have a passive listener (rsyncd) running.

The daemon enables remote users to send files to it and receive files from it. It doubles as an effective equivalent of an FTP server. However, it's rarely used for that purpose because it's less lightweight than simply setting up an FTP daemon. There aren't any added security or speed benefits over FTP for simple file transfer, either.

How rsync works

For transfers between machines, the rsync utility must be installed on both the client and the server before getting started. rsync has two modes: local and remote. If the source and destination are both paths on the same machine, local mode is used. No network connection or transfer security is involved, so rsync behaves like a smarter copy command that skips files which are already up to date.

If the source or destination is on another machine, remote mode is used, whether that machine is on your LAN or across the Internet. By default, rsync connects over SSH to secure the transfer, but a different remote shell can be chosen. If the server is not running rsyncd, the rsync command is started on the server upon connection. However, if the daemon is running, clients can connect to it directly on rsync's default TCP port, 873.
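As a rough illustration of how the mode follows from the way a path is written, here is a minimal Python sketch. The function name transfer_mode is hypothetical, and the rules are simplified: a single colon after a host means a remote-shell transfer, while a double colon or an rsync:// URL means a daemon connection. It ignores edge cases such as IPv6 literals.

```python
def transfer_mode(spec):
    """Classify an rsync source or destination argument (simplified sketch)."""
    if spec.startswith("rsync://"):
        return "daemon"
    colon = spec.find(":")
    slash = spec.find("/")
    # No colon, or a slash before the first colon: treat it as a local path.
    if colon == -1 or (slash != -1 and slash < colon):
        return "local"
    # host::module addresses an rsync daemon; host:path uses a remote shell.
    if spec[colon:colon + 2] == "::":
        return "daemon"
    return "remote-shell"


if __name__ == "__main__":
    for spec in ("/home/user/Desktop/copyme",
                 "192.168.46.99:/home/user/Desktop/destination",
                 "192.168.46.99::backups/destination"):
        print(spec, "->", transfer_mode(spec))
```

The module name backups in the last example is made up for the demonstration.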

Determining which files have been altered

When a client requests a synchronization with a remote rsync server, rsync will figure out which files it needs to transfer. The most basic mechanism rsync uses for this purpose is simply recursively going through directories and examining modification dates and exact file sizes. The data collected from both the remote server and the client is then compared. rsync then knows which files to scrutinize for synchronization.
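The size-and-date comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not rsync's code: the function name quick_check_differs is hypothetical, and comparing whole seconds is a simplifying assumption.

```python
import os
import tempfile


def quick_check_differs(src_path, dst_path):
    """Return True if dst is missing or differs from src in size or mtime."""
    if not os.path.exists(dst_path):
        return True
    src = os.stat(src_path)
    dst = os.stat(dst_path)
    # Assumption: timestamps are compared at whole-second resolution.
    return (src.st_size != dst.st_size
            or int(src.st_mtime) != int(dst.st_mtime))


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    source = os.path.join(folder, "source.txt")
    target = os.path.join(folder, "target.txt")
    with open(source, "w") as handle:
        handle.write("hello")
    # The target does not exist yet, so it needs a transfer.
    print(quick_check_differs(source, target))
```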

Comparing modification times and sizes works in most cases, but it can miss a file whose contents changed while its size and timestamp stayed the same. By invoking the --checksum argument, rsync will also check the hash value of each file. This is a more sound mechanism, but it takes more time and is more computationally expensive. Even with this, there's always the minuscule chance of a hash collision occurring. A hash collision is when two different files happen to have the same hash value, but the likelihood of this is astronomically low.
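To show why a content hash is the more thorough check, here is a small Python sketch. It uses SHA-256 from the standard library as a stand-in; the hash rsync actually uses depends on its version, and the function names are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Hash a file's contents in pieces so large files don't fill memory."""
    digest = hashlib.sha256()  # stand-in hash, chosen for illustration
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True only if both files hash to the same value."""
    return file_digest(path_a) == file_digest(path_b)
```

Two files with identical sizes and timestamps but different bytes would pass the basic check and still fail same_content, at the cost of reading both files in full.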

Performing a delta transfer

Most of rsync's lasting popularity and usage in the open source community is due to its differential transfer mechanism. In short, rather than sending an entire file from the remote server to the client if it's been updated more recently, only the edited portions are sent. This cuts down on the amount of network bandwidth used, but it requires a pretty crafty mathematical algorithm in order to work.

More on delta transfers

In order to perform a delta transfer, rsync works through each file for which the remote server has a newer version. The client, which holds the older copy, cuts its version of the file into fixed-length "chunks".

The client computes two hashes for each chunk: a strong hash such as MD5 and a (much faster) "rolling" hash. It sends this list of hashes to the remote server. The server then slides a chunk-sized window through its newer version one byte at a time, cheaply updating the rolling hash at each step. When a rolling hash matches an entry on the client's list, the strong hash is used to confirm the match. For chunks the client already has, the server sends only a short reference. For everything else, it sends the raw bytes. By the end of each file, the client can rebuild a version that matches the remote host's version.
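The chunk matching can be sketched in Python. This is a simplified illustration under stated assumptions, not rsync's implementation: one side signs fixed-length chunks of the old copy, the other slides a window over the new copy. The weak checksum follows the general shape of a rolling sum, SHA-256 stands in for the strong hash, trailing partial chunks of the old copy are ignored, and all function names are hypothetical.

```python
import hashlib

M = 1 << 16  # modulus for the weak checksum


def weak_checksum(block):
    """Return the (a, b) pair of a simple rolling-style checksum."""
    a = sum(block) % M
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte without rescanning the whole block."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def block_signatures(old, block_len):
    """Old-copy side: map weak checksum -> [(strong hash, chunk index)]."""
    sigs = {}
    for index in range(len(old) // block_len):
        block = old[index * block_len:(index + 1) * block_len]
        strong = hashlib.sha256(block).digest()
        sigs.setdefault(weak_checksum(block), []).append((strong, index))
    return sigs


def compute_delta(new, sigs, block_len):
    """New-copy side: emit ("copy", index) or ("data", bytes) steps."""
    delta, literal, pos, checksum = [], bytearray(), 0, None
    while pos + block_len <= len(new):
        window = new[pos:pos + block_len]
        if checksum is None:
            checksum = weak_checksum(window)
        match = None
        for strong, index in sigs.get(checksum, []):
            # Confirm a weak match with the strong hash.
            if hashlib.sha256(window).digest() == strong:
                match = index
                break
        if match is not None:
            if literal:
                delta.append(("data", bytes(literal)))
                literal.clear()
            delta.append(("copy", match))
            pos += block_len
            checksum = None
        else:
            literal.append(new[pos])
            if pos + block_len < len(new):
                checksum = roll(checksum[0], checksum[1],
                                new[pos], new[pos + block_len], block_len)
            pos += 1
    literal.extend(new[pos:])
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old, delta, block_len):
    """Rebuild the new file from the old copy plus the delta steps."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * block_len:(value + 1) * block_len]
        else:
            out += value
    return bytes(out)
```

Only the "data" steps carry file contents over the wire, which is where the bandwidth saving comes from.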

Compression is often used in conjunction with rsync, for example through its own -z argument, to cut down on network bandwidth even further. This became commonplace because when the program was released in 1996, high-speed Internet connections were relatively rare. Every bit counted, and so these efforts were warranted. Though it may seem like overkill, rsync's delta transfer remains one of the most bandwidth-efficient ways to synchronize data despite its age, or perhaps due to its age.

Usage of rsync

While rsync as a product is quite efficient, its syntax could use some work, like many UNIX-based applications. Let's take a look at some example use cases and the most common arguments used in rsync.

The general syntax of rsync commands will look familiar to Linux terminal users:

rsync [arguments] [path to source files to transfer] [path of destination to receive source files]

For a full list of rsync options, you can simply enter rsync -h on its own in your terminal.

rsync arguments

Here's a quick cheat sheet to highlight the most important arguments of the program.

-r: This makes it a recursive transfer, meaning that all folders within folders and files within them get copied. Keep in mind that, in itself, this argument doesn't retain permissions or metadata.

-a: The "a" is for "archive". This mode uses recursion as well, but it maintains permissions and all of the metadata used by UNIX systems, like symbolic links, ownership, and more.

-b: This argument keeps a backup of each file at the destination that would otherwise be overwritten or deleted, instead of discarding the old version.

-e: This specifies the remote shell used for the connection, such as ssh. Since SSH encrypts the connection, this is how transfers are secured. This one is very important to remember.

-v: The "verbose" option gives you a lot of information on your rsync transfer as it happens. Not recommended unless you're debugging.

--progress: Nobody really knows why, but there isn't a progress meter during rsync transfers by default. You'll need this argument to know what percentage is done.
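If you call rsync from a script, the arguments above can be assembled programmatically. Here is a minimal Python sketch. The helper build_rsync_command and its parameter names are hypothetical. It only builds the argument list and does not run rsync itself.

```python
import shlex


def build_rsync_command(source, destination, archive=True, recursive=False,
                        backup=False, verbose=False, progress=False,
                        remote_shell=None):
    """Return an rsync argument list using the options from the cheat sheet."""
    command = ["rsync"]
    if archive:
        command.append("-a")  # archive mode already implies recursion
    elif recursive:
        command.append("-r")
    if backup:
        command.append("-b")
    if verbose:
        command.append("-v")
    if progress:
        command.append("--progress")
    if remote_shell:
        command += ["-e", remote_shell]
    command += [source, destination]
    return command


if __name__ == "__main__":
    cmd = build_rsync_command("/home/user/Desktop/copyme/",
                              "192.168.46.99:/home/user/Desktop/destination",
                              progress=True, remote_shell="ssh")
    # Print the command as it would be typed in a shell.
    print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run, which avoids the quoting problems of building a single shell string.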

Sample rsync commands

We'll go through a few simple rsync commands that are commonly used. To start off, let's say you wish to copy a folder along with all its subfolders and files. This will require recursion, of course. Unless there's a specific reason you don't want to include metadata and permissions, the -a argument is recommended for this purpose. We'll assume you're on Linux and want to copy everything from a folder called copyme on your desktop to one called destination. Note the trailing slash on the source: it tells rsync to copy the folder's contents. Without it, rsync would create a copyme folder inside destination:

rsync -a /home/user/Desktop/copyme/ /home/user/Desktop/destination

That one was simple enough. Let's try an rsync command that utilizes a remote server. It can be on our own network, but we need to know the local IP address in that case. We'll assume that it's 192.168.46.99. Keeping the same source and destination paths as our first example, we'll be copying files to the server from our local machine:

rsync -a /home/user/Desktop/copyme/ 192.168.46.99:/home/user/Desktop/destination

Note the colon right after the IP address. That's how to indicate that the path is on a remote machine. For our final example, we have a super-secret file called passwords.txt stored on the Desktop of user. We want to securely transfer its contents or update them if the file already exists on the remote machine.

rsync -e ssh /home/user/Desktop/passwords.txt 192.168.46.99:/home/user/Desktop

In that example, note that we used the -e argument to name SSH as the remote shell, which encrypts the transfer. SSH is recommended due to its longevity and trust, and because it is actively maintained and supports current cipher suites.

Wrapping up on rsync

rsync may just be a single-threaded C application from the 1990s. However, due to research and intelligent development over the years, it's still what powers many synchronization tools on the market today. While we went over how the tool works and what it can do, you'll need to spend time practicing to truly master it.

Once you have mastered it, you'll likely notice it's immensely powerful. Especially when developing software, this can be a free alternative to paid options on the market that are essentially rsync with a fresh coat of paint.
