The use of early completion detection results in an algorithm which uses a very small average number of IO operations per element while maintaining a simple algorithmic structure.
2.5 Conclusions
The external parallel sorting algorithm presented in this chapter has strong similarities to the internal parallel sorting algorithm from the previous chapter. Both get most of their work done with a first pass that leaves the vast majority of elements in their correct final position. In the case of the external algorithm this first pass is the natural first step in the overall algorithm, whereas for the internal algorithm the first pass is logically separated from the cleanup phase.
The external algorithm also makes very efficient use of the disk subsystem by using IO operations that are of a size equal to the internal memory size of each of the processors. This makes an enormous difference to the practical efficiency of the algorithm.
Chapter 3
The rsync algorithm
This chapter describes the rsync algorithm, an algorithm for the efficient remote update of data over a high-latency, low-bandwidth link. Chapter four describes enhancements and optimizations to the basic algorithm, and chapter five describes some of the more interesting alternative uses for the ideas behind rsync that have arisen since the algorithm was developed.
3.1 Inspiration
The rsync algorithm was developed out of my frustration with the long time it took to send changes in a source code tree across a dialup network link. I have spent a lot of time developing several large software packages during the course of my PhD and constantly found myself waiting for source files to be transferred through my modem for archiving, distribution or testing on computers at the other end of the link or on the other side of the world. The time taken to transfer the changes gave me plenty of opportunity to think about better ways of transferring files.
In most cases these transfers were actually updates, where an old version of the file already existed at the other end of the link. The common methods for file transfer such as ftp and rcp didn’t take advantage of the old file. It was possible to transfer just the changes by keeping the original files around, then using a differencing utility like diff and sending the diff files to the other end, but I found that very inconvenient in practice and very error-prone.
So I started looking for a way of updating files remotely without having prior knowledge of the relative states of the files at either end of the link. The aims of such an algorithm are: