Related work Summary efficient algorithms for sorting and synchronization 1999

and the 16 bit fast signature hash. The fast signature is calculated once for each byte of literal data plus once per block, and each fast signature is compared to the fast signatures of each block from B, so the number of pairwise comparisons of fast signatures is $(d_l + n/L)\,n/L$, where $d_l$ is the number of bytes of literal data transferred. If a signature algorithm has an effective bit strength of $b$ then we would expect it to incorrectly match approximately once every $2^b$ comparisons, which means the effective bit strength is

$$b = \log_2 \frac{(d_l + n/L)\,n/L}{m}$$

where $m$ is the observed number of incorrect matches.

Table 3.6 shows the effective bit strengths for the three sets of results given in the previous table (footnote 27). In all three test sets the 16 bit effective strength is the full 16 bits, which demonstrates that the 16 bit hash is doing the job it was designed for. The performance of the 32 bit fast signature is not quite as good, but still does quite well considering the simplicity of the algorithm.

data set    16 bit strength    32 bit strength
Linux       16                 32
Samba       16                 30
Netscape    16                 29

Table 3.6: Fast signature effective bit strength
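The effective bit strength calculation can be sketched in a few lines of Python. This is only an illustration of the formula in the text, assuming (as the n/L term suggests) that n is the file size in bytes and L is the block size. The function names and the sample numbers are hypothetical and are not taken from the measurements behind Table 3.6.

```python
import math


def pairwise_comparisons(literal_bytes, file_size, block_size):
    """Number of fast-signature comparisons, (d_l + n/L) * (n/L).

    literal_bytes is d_l, file_size is n and block_size is L
    (the meanings of n and L are assumed, see the lead-in).
    """
    blocks = file_size / block_size
    return (literal_bytes + blocks) * blocks


def effective_bit_strength(literal_bytes, file_size, block_size, false_matches):
    """Effective bit strength b = log2(comparisons / m).

    With zero observed false matches the ratio is unbounded, so
    infinity is returned; a caller could cap this at the nominal
    signature width, which is how a "perfect" strength would be read.
    """
    comparisons = pairwise_comparisons(literal_bytes, file_size, block_size)
    if false_matches == 0:
        return math.inf
    return math.log2(comparisons / false_matches)


# Illustrative (made-up) numbers: a 1 MiB file, 1 KiB blocks,
# 3072 bytes of literal data and 64 observed false matches.
example = effective_bit_strength(3072, 1048576, 1024, 64)
print(example)
```

A result close to the nominal width of the signature (16 or 32 bits) would suggest the hash is behaving about as well as a uniformly random function of that width, which is the comparison the table makes.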

3.6 Related work

While I haven’t found any papers describing algorithms like the one described in this chapter, it has been pointed out to me that US patent number 5446888 describes a somewhat similar algorithm. The algorithms are not identical, but do seem to provide similar functionality [Pyne 1995].

27 With zero false alarms for the Linux data set a “perfect” effective 32 bit strength is indicated.

3.7 Summary

The rsync algorithm provides an effective way to solve the remote update problem using a randomized method with a very low probability of failure. The dual signature method used by rsync allows the algorithm to efficiently find matches at any byte boundary in the source file while still using a strong signature for block matching. The next chapter will consider various enhancements and optimisations to the basic rsync algorithm.

Chapter 4 rsync enhancements and optimizations

The previous chapter introduced the basic rsync algorithm which allows for efficient remote update of data over a high latency, low bandwidth link. This chapter will consider optimizations and extensions to the algorithm, practical issues to do with mirroring large filesystems and transformations that make rsync applicable to a wider range of data.

4.1 Smaller signatures