As the industry moves towards more distributed deployment of services, syncing files across multiple locations is a problem that often needs to be solved. In the world of file syncing, two approaches stand out. One is rsync, a very efficient tool for syncing files. It works great when you have a few remotes that need to be kept in sync, but as the number of remotes grows we hit the problem of bandwidth and load. Because rsync is a single-source-to-destination syncing mechanism, it puts a lot of load on the source, which has to transfer the data separately to each of the remotes.
This is the use case where sync based on the BitTorrent protocol shines. As BitTorrent uses peer-to-peer file distribution, there is no single source that distributes the files. Files are broken down into chunks, and each chunk is associated with its hash. To get a copy of the file, a participating peer requests the chunks from the other peers, and any peer that already has a chunk can be the source for that transfer. Additionally, the chunks are not transferred sequentially; the order is randomized, which increases the chance that a requested chunk can be found on some other peer rather than only on the original source.
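To make the chunk-and-hash idea concrete, here is a minimal Python sketch. It assumes SHA-1 piece hashes, as used by the original BitTorrent protocol, and an arbitrary 256 KiB piece size; the function names are made up for this illustration and are not part of any torrent client.

```python
import hashlib
from typing import List

PIECE_LENGTH = 256 * 1024  # assumed piece size; every torrent chooses its own


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> List[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(piece: bytes, index: int, hashes: List[bytes]) -> bool:
    """Check a piece received from any peer against the published hash."""
    return hashlib.sha1(piece).digest() == hashes[index]


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB of sample data
    hashes = piece_hashes(payload)
    print(len(hashes), "pieces")
    print(verify_piece(payload[:PIECE_LENGTH], 0, hashes))
```

Because every piece can be checked on its own, a peer does not need to trust whoever sent it; a piece that fails the check is simply requested again from someone else.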
In this blog post I will explore all the components needed to set up a BitTorrent-based file sync. A lot of these steps can be automated to build an efficient multi-site file syncing solution. All the setup is done on a single Ubuntu 16.04 VM. The following diagram shows my test setup.
How to use a BitTorrent network to share your data?
To transfer your data using torrents you will need the following things:
- The full copy of the source files that you want to transfer
- Torrent file generated for the source files
- Some way to distribute this torrent file to all your remotes (peers)
- A torrent tracking server
- A torrent client
Installing transmission client
Although you can use any torrent client, transmission is one of the popular torrent clients for Linux. It provides a CLI and an RPC API, which are handy for integration with other projects.
To install transmission on your machine, use the following command
apt-get install transmission-cli transmission-common transmission-daemon transmission-gtk transmission-remote-gtk
This should get transmission installed. To configure the torrent client edit the settings file
You might want to change the default download directory, which by default points to
Make sure that transmission has write access to the new download directory you set.
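If you want to script this change, the sketch below may help. It assumes the settings file is JSON and that the download directory is stored under the download-dir key, which is how transmission-daemon keeps its settings as far as I know. The helper name and the paths in the demo are hypothetical, and the real file location depends on your install, so it is passed in as a parameter. Stop the daemon before editing, since it may write its own copy of the settings back when it exits.

```python
import json
import tempfile
from pathlib import Path


def set_download_dir(settings_path: Path, download_dir: str) -> dict:
    """Set the "download-dir" key in a transmission settings file and save it."""
    settings = json.loads(settings_path.read_text(encoding="utf-8"))
    settings["download-dir"] = download_dir
    settings_path.write_text(
        json.dumps(settings, indent=4, sort_keys=True) + "\n", encoding="utf-8"
    )
    return settings


if __name__ == "__main__":
    # Demo against a throwaway file; pass the path of your real settings
    # file (it depends on your install) when using this for real.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "settings.json"
        demo.write_text('{"download-dir": "/old/downloads"}', encoding="utf-8")
        print(set_download_dir(demo, "/srv/sync")["download-dir"])
```

All other keys in the file are left as they were; only their order and indentation may change.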
Transmission uses a systemd unit to manage the client daemon. If you want to change how the daemon is started, follow the standard practice for customizing a systemd service. For my experiment I disabled authentication for RPC calls. To do this, follow the steps below.
cp /lib/systemd/system/transmission-daemon.service /etc/systemd/system/transmission-daemon.service
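Then edit the unit file in /etc/systemd/system/<service> with your local changes. In the diff below the edited copy is listed first, so the line marked with - is the new one: it adds the -T flag, which disables RPC authentication.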
diff -Nur /etc/systemd/system/transmission-daemon.service /lib/systemd/system/transmission-daemon.service
--- /etc/systemd/system/transmission-daemon.service 2019-03-15 22:30:59.736229363 +0530
+++ /lib/systemd/system/transmission-daemon.service 2018-02-06 23:25:40.000000000 +0530
@@ -5,7 +5,7 @@
-ExecStart=/usr/bin/transmission-daemon -f --log-error -T
+ExecStart=/usr/bin/transmission-daemon -f --log-error
ExecStop=/bin/kill -s STOP $MAINPID
ExecReload=/bin/kill -s HUP $MAINPID
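Reload systemd so that it picks up the edited unit file (systemctl daemon-reload), then start the transmission daemon as follows (use restart instead of start if the daemon is already running)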
systemctl start transmission-daemon
And check its status with the following command
systemctl status transmission-daemon
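With authentication disabled, another quick way to confirm that the daemon is answering is to talk to its RPC interface directly. The sketch below assumes the default RPC endpoint http://127.0.0.1:9091/transmission/rpc and the session-id handshake, where the daemon answers the first request with HTTP 409 and an X-Transmission-Session-Id header that has to be echoed back. The helper names are my own; adjust the URL if you changed the RPC port.

```python
import json
import urllib.error
import urllib.request

RPC_URL = "http://127.0.0.1:9091/transmission/rpc"  # assumed default endpoint


def build_request(method: str, arguments: dict, session_id: str = "") -> urllib.request.Request:
    """Build a POST request carrying one RPC call as a JSON body."""
    body = json.dumps({"method": method, "arguments": arguments}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if session_id:
        headers["X-Transmission-Session-Id"] = session_id
    return urllib.request.Request(RPC_URL, data=body, headers=headers, method="POST")


def rpc_call(method: str, arguments: dict) -> dict:
    """Send one RPC call, retrying once with the session id from a 409 reply."""
    session_id = ""
    for _ in range(2):
        request = build_request(method, arguments, session_id)
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            if err.code != 409:
                raise
            session_id = err.headers.get("X-Transmission-Session-Id", "")
    raise RuntimeError("daemon kept rejecting the session id")


# Example (needs a running daemon): rpc_call("session-get", {})
```

A successful call returns a dictionary whose result field reads success, with the daemon's current settings under arguments.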
The next step is to create a torrent file for your source files. Before you can create a torrent file you need a tracker URL. The next section shows how to set up your own tracker server; alternatively, you may choose to use a public tracker server.
What is a torrent tracker?
A torrent tracker is a server that keeps track of all the peers interested in the distribution of a torrent, and so helps those peers find each other. When a torrent client adds a new torrent file, it looks up the tracker URL embedded in the torrent file, contacts the tracker server and gets a list of other peers that are participating in the distribution of the source file. The client can then contact these peers and start requesting chunks of the source file.
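The first contact with the tracker is a plain HTTP GET request. The sketch below builds such an announce URL using the query parameters from the BitTorrent specification (info_hash, peer_id, port, uploaded, downloaded, left, compact, event). The peer id, port and info hash in the demo are placeholder values, and a real client also has to decode the bencoded reply it gets back.

```python
import hashlib
from urllib.parse import urlencode


def announce_url(tracker: str, info_hash: bytes, peer_id: bytes, port: int, left: int) -> str:
    """Build the URL a client requests to register itself with the tracker."""
    query = urlencode({
        "info_hash": info_hash,  # 20-byte SHA-1 of the torrent's info dictionary
        "peer_id": peer_id,      # 20-byte id chosen by the client
        "port": port,            # port on which this peer accepts connections
        "uploaded": 0,
        "downloaded": 0,
        "left": left,            # bytes still missing; 0 for a seeder
        "compact": 1,
        "event": "started",
    })
    return f"{tracker}?{query}"


if __name__ == "__main__":
    fake_info_hash = hashlib.sha1(b"placeholder info dictionary").digest()
    fake_peer_id = b"-XX0001-" + b"0" * 12  # hypothetical client id
    print(announce_url("http://127.0.0.1:6969/announce", fake_info_hash, fake_peer_id, 51413, 1024))
```

The info_hash is what ties the request to one particular torrent, which is why the tracker can serve many torrents from the same announce URL.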
You can use a public torrent tracker server, but if you are using torrents to share private data you will probably want to run a private tracker server. For my experiment I tried chihaya.
To set up the tracker, follow the steps at https://github.com/chihaya/chihaya/blob/master/README.md
You will need to install Go to compile it. Once compiled, you should have a binary named chihaya in the top directory of your sources. An example configuration file, example_config.yaml, is provided as part of the code. Copy it to create your own configuration:
cp example_config.yaml config.yaml
Then edit it to your liking. As I am running this experiment on my local machine, I changed the HTTP listen address to "127.0.0.1:6969".
In a real deployment your tracker should be accessible from all the peers, so it must listen on a public interface. The default setting is to listen on "0.0.0.0:6969".
Once done, start the tracker server as follows
./chihaya --config config.yaml
This will start the server, and your tracker URL will be http://127.0.0.1:6969/announce (or use the IP you configured for the HTTP listen address).
How to create a torrent?
To create the torrent file, use the following command. The first line shows the general form and the second uses the local tracker set up above.
transmission-create <source file|source dir> -t <tracker url>
transmission-create <source file|source dir> -t http://127.0.0.1:6969/announce
The above command will create the torrent file for your source file(s). The torrent file contains the definition of the chunks of the source files along with their hashes, and the tracker URL. Each time a new peer adds the torrent file, its torrent client sends a request to the tracker to add itself to the list of peers interested in the source file.
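A torrent file is a bencoded dictionary, so its contents are easy to inspect. The following is a minimal decoder sketch, assuming a well-formed torrent with the usual announce, info, piece length and pieces keys. It does no validation, the function names are my own, and it is not a replacement for a real bencode library.

```python
from typing import Any, Tuple


def bdecode(data: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    colon = data.index(b":", pos)         # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[pos:colon])
    return data[start:start + length], start + length


def summarize(torrent_bytes: bytes) -> dict:
    """Pull out the tracker URL and the chunk layout from a torrent file."""
    meta, _ = bdecode(torrent_bytes)
    info = meta[b"info"]
    return {
        "announce": meta[b"announce"].decode("utf-8"),
        "piece_length": info[b"piece length"],
        "piece_count": len(info[b"pieces"]) // 20,  # 20 bytes per SHA-1 digest
    }


# Example: summarize(open("data.torrent", "rb").read())  # hypothetical file name
```

The pieces value is simply all the chunk hashes concatenated, which is why dividing its length by 20 gives the number of chunks.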
How to publish your files with torrent?
To publish the source files you need to add their torrent (the one we created in the steps above) to your torrent client. The only difference between the publisher and all the other peers is that the publisher starts out with the complete source files. Peers with complete files are known as seeders, while peers with incomplete files are called leechers. In the beginning the publisher is the only seeder, but as more and more peers obtain a complete copy they become seeders as well.
To add a torrent file and become a seeder use the following command.
transmission-remote --add <torrent file> -w <parent dir of complete source files> [ -u <upload bandwidth limit in kb/s>]
The upload bandwidth limit is optional.
How to download files?
This should be the most familiar part to everyone, although for this blog post we will use the transmission client for the reasons mentioned above. All the peers need to add the torrent file we created above to their transmission client. The torrent file can be hosted on a web server or sent to the peers over mail or by any other means. To add the torrent file on each of the other peers, use the following command
transmission-remote --add <torrent file> -w <parent dir where the source file will be saved> [ -u <upload bandwidth limit in kb/s>]
You can list the torrents currently added to transmission with transmission-remote -l. To remove a torrent from transmission (the downloaded data is kept), use the following command
transmission-remote -t <torrent_id> -r
Here torrent_id is the first column in the output of transmission-remote -l. If you prefer, you can also manage transmission through its web interface,
or run transmission-gtk to use the GUI.
Once you have added the torrent with the transmission-remote --add <torrent file> -w <parent dir where the source file will be saved> command, transmission will verify that the available source file matches the hashes in the torrent file.
Once the verification completes, your transmission client will be marked as a seeder. For my test I used another torrent client to download the source files using the torrent file.
In production you would use transmission-remote on the peers to add the torrent file and start the download.
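One way to automate that step is to drive transmission-remote from a small script. The sketch below only builds the same --add command used earlier, optionally prefixed with the host:port of a remote daemon, which transmission-remote accepts as its first argument. The peer names, file name and directory are hypothetical, and each remote daemon must allow RPC access from the machine running the script.

```python
import subprocess
from typing import List, Optional

PEERS = ["peer1.example.com:9091", "peer2.example.com:9091"]  # hypothetical hosts


def add_torrent_command(torrent: str, download_dir: str, host: Optional[str] = None,
                        upload_limit: Optional[int] = None) -> List[str]:
    """Build the transmission-remote command line that adds one torrent."""
    cmd = ["transmission-remote"]
    if host:
        cmd.append(host)                  # talk to a remote daemon instead of the local one
    cmd += ["--add", torrent, "-w", download_dir]
    if upload_limit is not None:
        cmd += ["-u", str(upload_limit)]  # optional upload limit in kB/s
    return cmd


def push_to_peers(torrent: str, download_dir: str, peers: List[str]) -> None:
    """Run the add command once per peer; raises if any invocation fails."""
    for host in peers:
        subprocess.run(add_torrent_command(torrent, download_dir, host), check=True)


# Example (needs transmission-remote and reachable daemons):
# push_to_peers("data.torrent", "/srv/sync", PEERS)
```

Keeping the command construction separate from the call to subprocess makes it easy to log or dry-run the commands before touching any peer.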
If you need to distribute files among multiple remotes, torrents can be a very efficient mechanism. You relieve the source file server of the syncing load and also benefit from better utilization of the peers' bandwidth, not to mention the increased speed of distribution.