Conclusions Inspiration efficient algorithms for sorting and synchronization 1999

The use of early completion detection results in an algorithm which uses a very small average number of IO operations per element while maintaining a simple algorithmic structure.

2.5 Conclusions

The external parallel sorting algorithm presented in this chapter has some quite strong similarities to the internal parallel sorting algorithm from the previous chapter. Both get most of their work done with a first pass that leaves the vast majority of elements in their correct final position. In the case of the external algorithm this first pass is the natural first step in the overall algorithm whereas for the internal algorithm the first pass is logically separated from the cleanup phase.

The external algorithm also makes very efficient use of the disk subsystem by using IO operations that are of a size equal to the internal memory size of each of the processors. This makes an enormous difference to the practical efficiency of the algorithm.

Chapter 3 The rsync algorithm

This chapter describes the rsync algorithm, an algorithm for the efficient remote update of data over a high latency, low bandwidth link. Chapter four describes enhancements and optimizations to the basic algorithm, and chapter five describes some of the more interesting alternative uses for the ideas behind rsync that have arisen since the algorithm was developed.

3.1 Inspiration

The rsync algorithm was developed out of my frustration with the long time it took to send changes in a source code tree across a dialup network link. I have spent a lot of time developing several large software packages during the course of my PhD and constantly found myself waiting for source files to be transferred through my modem for archiving, distribution or testing on computers at the other end of the link or on the other side of the world. The time taken to transfer the changes gave me plenty of opportunity to think about better ways of transferring files.

In most cases these transfers were actually updates, where an old version of the file already existed at the other end of the link. The common methods for file transfer such as ftp and rcp didn’t take advantage of the old file. It was possible to transfer just the changes by keeping the original files around, then using a differencing utility like diff and sending the diff files to the other end, but I found that very inconvenient in practice and very error prone.

So I started looking for a way of updating files remotely without having prior knowledge of the relative states of the files at either end of the link. The aims of such an algorithm are: