4.6 Transferring the file list

Before starting a multiple file transfer with the rsync algorithm it is useful to first transfer the list of files from the sending machine to the receiving machine. This allows the two computers to reference files using an index into the file list, which simplifies the implementation considerably.

Although it is quite easy to pack the file list so that it consumes a minimum of space while being transferred[17], there are some quite common situations where the network cost of transferring the file list dominates the total network cost of the file transfer. This most commonly occurs when rsync is being used to mirror a directory tree consisting of many thousands of files where the number of files that have changed between mirror runs is small[18].

To reduce this cost it is possible to employ the rsync algorithm to transfer the file list itself more efficiently. This is done by generating the file list at both the sender's and the receiver's end of the connection and storing it in a canonical sorted form. The receiver can then obtain a copy of the sender's file list by running the rsync algorithm on the file list data itself. By doing this the network cost of sending the file list is reduced dramatically when the sender and receiver have very similar directory trees. If their trees are not similar then the extra network cost of applying this technique is limited to the cost of sending the signatures for the data in the receiver's file list.
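The idea of running a block-matching transfer over the file list itself can be illustrated with a minimal Python sketch. It is not the thesis implementation: the entry format (tab-separated path, size and modification time), the block size `BLOCK`, and the function names `canonical_file_list`, `signatures`, `delta` and `patch` are all assumptions made for illustration. For brevity it also replaces the rolling weak checksum of the rsync algorithm with a strong hash computed naively at every byte offset, which gives the same matches but is far slower.

```python
import hashlib

BLOCK = 64  # hypothetical block size in bytes


def canonical_file_list(entries):
    """Serialize (path, size, mtime) tuples in a canonical sorted form."""
    lines = [f"{path}\t{size}\t{mtime}\n" for path, size, mtime in sorted(entries)]
    return "".join(lines).encode("utf-8")


def _digest(chunk):
    return hashlib.md5(chunk).digest()


def signatures(old):
    """Receiver side: map the hash of each block of its own list to the block index."""
    return {_digest(old[i:i + BLOCK]): i // BLOCK for i in range(0, len(old), BLOCK)}


def delta(new, sigs):
    """Sender side: emit block references where a block matches, literal bytes otherwise."""
    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        window = new[i:i + BLOCK]
        idx = sigs.get(_digest(window))
        if idx is None:
            literal.append(new[i])  # no match at this offset: send one literal byte
            i += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", idx))  # receiver already holds this block
            i += len(window)
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops):
    """Receiver side: rebuild the sender's list from its own list plus the delta."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * BLOCK:(arg + 1) * BLOCK] if kind == "copy" else arg
    return bytes(out)


# Two nearly identical trees: one file out of 500 differs (made-up data).
receiver = [(f"dir/file{n:04d}.txt", 100 + n, 1_000_000 + n) for n in range(500)]
sender = list(receiver)
sender[250] = ("dir/file0250.txt", 999, 2_000_000)

old, new = canonical_file_list(receiver), canonical_file_list(sender)
ops = delta(new, signatures(old))
literal_bytes = sum(len(arg) for kind, arg in ops if kind == "data")
print(len(new), literal_bytes)
```

With similar trees, only the blocks touching the changed entry travel as literal data; everything else is sent as short block references. With dissimilar trees the delta degenerates to mostly literal data, and the overhead is the signature table the receiver had to send.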

4.7 Summary

This chapter has examined a number of optimizations and enhancements to the basic rsync algorithm described in the previous chapter. The use of smaller signatures, stream compression and data transformations improved the efficiency considerably for some data sets, while the pipelining technique allows the algorithm to minimise the latency costs normally associated with the transfer of a large number of files. The next chapter will consider alternative uses for the ideas behind rsync.

[17] My rsync implementation uses approximately ten bytes per file in the file list, including metadata such as file permissions, file type and ownership information.

[18] By default my rsync implementation skips the transfer phase completely for files that are the same size and have the same timestamp on both ends of the link, thereby avoiding the network cost of sending the signatures but not avoiding the transfer of a file list entry.

Chapter 5 Further applications for rsync

This chapter describes some alternative uses of the ideas behind rsync. Since starting work on rsync and distributing a free implementation, the number of uses for the algorithm has grown enormously, well beyond the initial idea of efficient remote file update. From new types of compression systems to differencing algorithms to incremental backup systems, the basic rsync algorithm has proved to be quite versatile. This chapter describes some of the more interesting uses for the rsync algorithm.

5.1 The xdelta algorithm