Multiple files and pipelining

4.5 Multiple files and pipelining

The speedup in terms of the number of bytes sent over the network link is only one aspect of an efficient file transfer algorithm. The other important consideration is the latency cost, particularly when the transfer or update consists of many files rather than a single file. Commonly used file transfer protocols such as ftp, rcp and rdist send each file as a separate operation, waiting for the receiver to acknowledge completion of the current file before starting to transfer the next one. This means that transferring N files takes a minimum of Nλ, where λ is the round-trip latency of the network link. When transferring a large number of small files, for example when mirroring a typical web site over an international Internet link, this latency cost may dominate the time taken for the transfer.

To overcome the latency cost the file transfer algorithm needs to use one logical round-trip for multiple files, rather than one per file [13]. This is commonly known as pipelining.

The pipelining method used by rsync is shown in Figure 4.1. The process is broken into three pieces: the generator, the sender and the receiver. The generator runs on B and generates the signatures for all files that are to be transferred, sending the signatures to the sender. The sender, running on A, matches the signatures from B against the new files and sends a stream of block tokens and literal bytes to the receiver. The receiver, running on B, reconstructs the files [14]. The link from the receiver to the generator, shown as a dashed line in the diagram, carries a small message indicating that a file has not been successfully received, which is detected when the strong signature of the reconstructed file is not correct. In that case the generator starts again, but with a larger per-block strong signature [15].
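The three-stage arrangement described above can be sketched in Python, with threads and in-memory queues standing in for the two machines and the network link. This is a minimal illustration under stated assumptions, not rsync's implementation: the names `generator`, `sender`, `receiver`, `run_pipeline` and `BLOCK` are hypothetical, SHA-256 over fixed block-aligned chunks replaces the rolling and strong signatures, and the dashed-line retry path is omitted. It is meant only to show that the generator emits signatures for every file without waiting for a per-file acknowledgement.

```python
import hashlib
import queue
import threading

BLOCK = 4    # tiny block size, chosen only to keep the example readable
DONE = None  # sentinel marking the end of a stream


def block_hash(chunk):
    # Stand-in for a per-block strong signature (an assumption of this sketch).
    return hashlib.sha256(chunk).digest()


def generator(old_files, names, to_sender):
    """Stage on B: emit per-block signatures for every file, never waiting for a reply."""
    for name in names:
        old = old_files.get(name, b"")
        sigs = [block_hash(old[i:i + BLOCK]) for i in range(0, len(old), BLOCK)]
        to_sender.put((name, sigs))
    to_sender.put(DONE)


def sender(new_files, from_generator, to_receiver):
    """Stage on A: match signatures against the new file, emit block tokens and literals."""
    for name, sigs in iter(from_generator.get, DONE):
        index = {sig: i for i, sig in enumerate(sigs)}
        new = new_files[name]
        ops = []
        # Simplification: only block-aligned chunks are tested for a match.
        for i in range(0, len(new), BLOCK):
            chunk = new[i:i + BLOCK]
            match = index.get(block_hash(chunk))
            ops.append(("literal", chunk) if match is None else ("block", match))
        to_receiver.put((name, ops))
    to_receiver.put(DONE)


def receiver(old_files, from_sender, rebuilt):
    """Stage on B: reconstruct each file from block tokens and literal bytes."""
    for name, ops in iter(from_sender.get, DONE):
        old = old_files.get(name, b"")
        parts = [old[arg * BLOCK:(arg + 1) * BLOCK] if kind == "block" else arg
                 for kind, arg in ops]
        rebuilt[name] = b"".join(parts)


def run_pipeline(old_files, new_files):
    """Run the three stages concurrently and return the files rebuilt on B."""
    to_sender, to_receiver, rebuilt = queue.Queue(), queue.Queue(), {}
    stages = [
        threading.Thread(target=generator, args=(old_files, list(new_files), to_sender)),
        threading.Thread(target=sender, args=(new_files, to_sender, to_receiver)),
        threading.Thread(target=receiver, args=(old_files, to_receiver, rebuilt)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return rebuilt
```

Calling `run_pipeline(old, new)` with two dictionaries mapping file names to bytes returns a dictionary equal to `new`. Because each stage only reads from one queue and writes to the next, no stage ever blocks waiting for a per-file acknowledgement from the stage downstream.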
[Figure 4.1: Pipelining in rsync. Nodes: sender, generator, receiver.]

To demonstrate the effectiveness of this scheme at reducing latency costs I transferred 1000 files of size one byte each using rsync, rcp and ftp [16]. The files were transferred over a PPP link which has a latency of approximately 120 milliseconds and a maximum bandwidth of approximately 3.6 kilobytes/sec. The results are shown in Table 4.6.

Table 4.6: Time to transfer 1000 small files over a dialup link

protocol    time (seconds)
rsync       6.9
rcp         351
ftp         853

The dramatic effect of the per-file latency in rcp and ftp is clearly shown in the results. In real applications, where files are usually larger than one byte, the differences observed here will appear as the additional overhead of transferring 1000 files. For very large files this will be unimportant, but it can have a very significant impact for moderately large files or when using a high-latency network link.

[13] I use the term logical round-trip to emphasize the fact that I am ignoring round-trips due to the transport layer, such as TCP acknowledgement packets. Although TCP guarantees that a return acknowledgement packet will be sent at least every second packet, these usually do not contribute to latency due to the way TCP windowing works.

[14] Note that I have ignored the transfer of the file list, which must be done prior to starting the 3-way process. This adds a total of one more latency to the algorithm.

[15] It will also start with the incorrectly transferred file, as it is highly likely that most of the blocks are in fact correct.

[16] For ftp the mget operation was used with prompting disabled.
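The measurements in Table 4.6 can be set against the Nλ floor with a few lines of arithmetic. The Python sketch below uses only the figures quoted in the text (1000 files, a round-trip latency of roughly 120 milliseconds, and the three measured times); the function names are hypothetical. Expressing each total as "round-trip times per file" attributes the whole elapsed time to latency and ignores bandwidth and per-file protocol bytes, so the ratios should be read as rough upper estimates rather than counts of actual round trips.

```python
RTT = 0.120      # seconds; approximate round-trip latency of the PPP link
N_FILES = 1000   # number of one-byte files transferred

# Measured wall-clock times in seconds, copied from Table 4.6.
measured = {"rsync": 6.9, "rcp": 351.0, "ftp": 853.0}


def latency_floor(n_files, rtt):
    """Minimum time N * lambda when every file costs one round trip."""
    return n_files * rtt


def equivalent_round_trips(total_seconds, n_files=N_FILES, rtt=RTT):
    """Total time expressed in round-trip times per file."""
    return total_seconds / (n_files * rtt)


if __name__ == "__main__":
    print(f"per-file floor: {latency_floor(N_FILES, RTT):.0f} s")
    for protocol, seconds in measured.items():
        ratio = equivalent_round_trips(seconds)
        print(f"{protocol}: {ratio:.2f} round-trip times per file")
```

With these inputs the floor for a one-round-trip-per-file protocol is 120 seconds. The rcp and ftp times correspond to roughly 2.9 and 7.1 round-trip times per file, while the rsync time is well under a tenth of one round-trip time per file, which is consistent with a protocol whose number of logical round trips does not grow with the number of files.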