Efficient Algorithms for Sorting and Synchronization, 1999

to compute the differences. This is important because it allows the backup to proceed very quickly, particularly when the speed of the tape device is the limiting factor in the backup process. Because the signatures occupy only a tiny fraction of the space the file consumes on tape, the storage overhead stays small even when the rsync algorithm finds few matches between the new and old files. This kind of incremental backup system would be particularly effective when a large hierarchical storage system is used as the backup medium. Such systems typically have a large front-end set of disks used to stage data to a high-latency tape robot. If the file signatures were kept on the front-end disks, no tape reading would be required before the incremental backup started.
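The key property of this backup scheme is that the differences are computed from the stored signatures alone; the old file stays on tape until a restore. The following is a minimal Python sketch of that split, assuming a fixed block size, rsync-style rolling weak checksums, and MD5 as a stand-in strong checksum (the block size and strong hash are assumptions, not values given in this section). The names `make_signatures`, `compute_delta` and `apply_delta` are hypothetical: the first would run when a file is written to tape, the second at the next incremental backup, and only the third needs the old data.

```python
import hashlib

MOD = 1 << 16  # each rolling-checksum component is kept modulo 2**16


def weak_parts(block: bytes) -> tuple[int, int]:
    """Return the two components (a, b) of the rolling checksum of one block."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def strong_hash(block: bytes) -> bytes:
    # MD5 is only a stand-in strong checksum for this sketch.
    return hashlib.md5(block).digest()


def make_signatures(old: bytes, block_size: int) -> list[tuple[int, bytes]]:
    """Signatures kept on the front-end disks when `old` is written to tape.

    Only full blocks are signed; a short trailing block is never matched.
    """
    sigs = []
    for off in range(0, len(old) - block_size + 1, block_size):
        block = old[off:off + block_size]
        a, b = weak_parts(block)
        sigs.append((a | (b << 16), strong_hash(block)))
    return sigs


def compute_delta(sigs, new: bytes, block_size: int):
    """Diff `new` against the old file using only the old file's signatures."""
    table = {}  # weak checksum -> [(block index, strong hash)]
    for idx, (weak, strong) in enumerate(sigs):
        table.setdefault(weak, []).append((idx, strong))

    delta, literal = [], bytearray()
    k, n = 0, len(new)
    a = b = None
    while k + block_size <= n:
        if a is None:  # fresh window: compute the checksum directly
            a, b = weak_parts(new[k:k + block_size])
        match = None
        candidates = table.get(a | (b << 16), [])
        if candidates:  # weak hit: confirm with the strong checksum
            strong = strong_hash(new[k:k + block_size])
            match = next((i for i, s in candidates if s == strong), None)
        if match is not None:
            if literal:
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", match))
            k += block_size
            a = None
        else:
            literal.append(new[k])
            if k + block_size < n:  # roll the window forward by one byte
                out_byte, in_byte = new[k], new[k + block_size]
                a = (a - out_byte + in_byte) % MOD
                b = (b - block_size * out_byte + a) % MOD
            k += 1
    literal += new[k:]
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta, block_size: int) -> bytes:
    """Rebuild the new file; only this restore-time step needs the old data."""
    out = bytearray()
    for op, arg in delta:
        out += old[arg * block_size:(arg + 1) * block_size] if op == "copy" else arg
    return bytes(out)
```

In the backup scenario, the list returned by `make_signatures` is what would live on the front-end disks, and `compute_delta` reads nothing but that list and the new file, so the tape robot is not touched until a restore calls `apply_delta`.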

5.4 rsync in HTTP

The Hypertext Transfer Protocol (HTTP) is the most widely used protocol for delivering formatted information on the Internet, and as such its network efficiency is very important. Increased efficiency results in better response times for users and lower network costs.

Although caches such as those employed by popular web browsers or implemented in proxy caching web servers are very effective at reducing network traffic for static documents, they do not help at all for dynamic documents. For documents that change regularly, currently deployed browsers and caching servers must fetch the whole page every time, even if only a small part of the page has changed. With the trend towards dynamically generated web sites, such documents are becoming more common.

To address this issue, a proposal has been submitted to the W3C, the standards body that coordinates HTTP, to incorporate support for differencing into HTTP. The proposal [Hoff and Payne 1997] introduces a differencing format called gdiff for sending changes in HTTP-served documents and suggests the rsync algorithm as the means of generating the differences. Under this scheme, a client requesting a page would send the server an augmented HTTP GET request containing the fast and strong signa-
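A differencing format of this kind carries a stream of commands that either copy a byte range of the client's cached page or supply literal new bytes. A minimal Python sketch of such an encoder and the matching client-side decoder is shown below. The magic number, version byte and opcode values follow our reading of gdiff [Hoff and Payne 1997] and should be treated as assumptions to check against that note; only a subset of its commands is modelled, and `encode` and `decode_and_apply` are hypothetical names.

```python
import struct

# Assumed values based on our reading of gdiff [Hoff and Payne 1997];
# verify them against the published note before relying on them.
MAGIC = b"\xd1\xff\xd1\xff"
VERSION = 4
EOF = 0                # end of the diff stream
DATA_MAX_INLINE = 246  # command bytes 1..246: that many literal bytes follow
DATA_INT = 248         # 4-byte big-endian length, then the literal bytes
COPY_INT_INT = 254     # 4-byte position and 4-byte length in the cached page


def encode(ops) -> bytes:
    """Serialize ("copy", pos, length) and ("data", bytes) operations."""
    out = bytearray(MAGIC)
    out.append(VERSION)
    for op in ops:
        if op[0] == "copy":
            _, pos, length = op
            out.append(COPY_INT_INT)
            out += struct.pack(">ii", pos, length)
        else:
            data = op[1]
            if 1 <= len(data) <= DATA_MAX_INLINE:
                out.append(len(data))  # the command byte is the length itself
            else:
                out.append(DATA_INT)
                out += struct.pack(">i", len(data))
            out += data
    out.append(EOF)
    return bytes(out)


def decode_and_apply(cached: bytes, diff: bytes) -> bytes:
    """Client side: rebuild the current page from its cached copy and the diff."""
    if diff[:4] != MAGIC or diff[4] != VERSION:
        raise ValueError("not a diff stream in the expected format")
    out, i = bytearray(), 5
    while True:
        cmd = diff[i]
        i += 1
        if cmd == EOF:
            return bytes(out)
        if 1 <= cmd <= DATA_MAX_INLINE:
            out += diff[i:i + cmd]
            i += cmd
        elif cmd == DATA_INT:
            (n,) = struct.unpack_from(">i", diff, i)
            out += diff[i + 4:i + 4 + n]
            i += 4 + n
        elif cmd == COPY_INT_INT:
            pos, n = struct.unpack_from(">ii", diff, i)
            out += cached[pos:pos + n]
            i += 8
        else:
            raise ValueError(f"unsupported command byte {cmd}")
```

With rsync generating the differences, a match against block j of size B in the client's cached page would become the command ("copy", j * B, B), and unmatched bytes would become "data" commands.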