to compute the differences. This is important because it allows the backup to proceed very quickly, particularly if the speed of the tape device is the limiting factor in the
backup process. As the extra storage space required by the signatures is only a tiny fraction of the total space consumed by the file on tape, the storage overhead is small
even if the rsync algorithm does not find a significant number of matches between the new and old files.
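The claim that the signatures cost only a tiny fraction of the file can be checked with a small calculation. The following is a minimal Python sketch, not the thesis implementation. It assumes a hypothetical fixed block size of 2048 bytes, a 32-bit rsync-style weak checksum built from two 16-bit sums, and MD5 from `hashlib` as a stand-in for the strong checksum. Under these assumptions each block costs 20 bytes of signature.

```python
import hashlib
import struct

BLOCK_SIZE = 2048  # hypothetical block size; a real tool would tune this per file


def weak_checksum(block: bytes) -> int:
    """rsync-style weak checksum: two 16-bit sums packed into one 32-bit value."""
    n = len(block)
    a = sum(block) & 0xFFFF
    b = sum((n - i) * x for i, x in enumerate(block)) & 0xFFFF
    return (b << 16) | a


def file_signature(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Concatenate a (weak, strong) record for every block of the file.

    Each record is 4 bytes of weak checksum plus 16 bytes of MD5
    (MD5 is only a stand-in for the strong checksum).
    """
    out = bytearray()
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        out += struct.pack(">I", weak_checksum(block))
        out += hashlib.md5(block).digest()
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of sample data
    sig = file_signature(data)
    # 512 blocks * 20 bytes = 10240 bytes, i.e. under 1% of the file
    print(len(sig), len(sig) / len(data))
```

With these parameters the overhead is 20/2048 of the file size, roughly 1%, and it shrinks further as the block size grows; only this signature file, not the tape copy, is needed to compute the next increment.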
This sort of incremental backup system would be particularly effective when large hierarchical storage systems are used as the backup medium. These devices typically have a large front-end set of disks that are used to stage data to a high-latency tape robot. If the file signatures were kept on the front-end disks, then no tape reading would be required before the incremental backup started.
5.4 rsync in HTTP
The HyperText Transfer Protocol (HTTP) is the most widely used protocol for presenting formatted information on the Internet, and as such the network efficiency of the protocol is very important. Increased efficiency results in better response times for users and decreased network costs.
Although cache systems such as those employed by popular web browsers or implemented in proxy caching web servers are very effective at reducing network traffic for static documents, they do not help at all for dynamic documents. For documents that change regularly, the currently deployed browsers and caching servers must fetch the whole page every time, even if only a small part of the page changes. With the trend towards dynamically generated web sites, such dynamic documents are becoming more common. To address this issue a proposal has been put to the W3C, the standards body that coordinates HTTP, to incorporate support for differencing in the HTTP protocol. The proposal [Hoff and Payne 1997] introduces a differencing format called gdiff for sending changes in HTTP-served documents and suggests the rsync algorithm as the means of generating the differences.
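The division of labour in such a scheme can be illustrated with a minimal Python sketch, under several stated assumptions. The client's signatures of its cached page are modelled as a dictionary from the 32-bit weak checksum to (block index, strong hash) pairs, with MD5 standing in for the strong checksum. The server slides a window over the new page, updating the rsync rolling checksum one byte at a time, and emits `("copy", index)` and `("data", bytes)` instructions. These instructions are a simplified stand-in for gdiff, not its actual byte encoding, and all function names and the block size are hypothetical.

```python
import hashlib

M = 1 << 16  # modulus for the two 16-bit halves of the weak checksum


def weak(block: bytes) -> tuple[int, int]:
    """Return the (a, b) halves of the rsync weak checksum of a block."""
    n = len(block)
    return sum(block) % M, sum((n - i) * x for i, x in enumerate(block)) % M


def signatures(old: bytes, bs: int) -> dict:
    """Client side: weak checksum -> list of (block index, strong hash)."""
    table = {}
    for idx, off in enumerate(range(0, len(old), bs)):
        block = old[off:off + bs]
        if len(block) < bs:
            break  # simplification: a short trailing block is never matched
        a, b = weak(block)
        table.setdefault((b << 16) | a, []).append((idx, hashlib.md5(block).digest()))
    return table


def make_delta(new: bytes, table: dict, bs: int) -> list:
    """Server side: emit copy/data instructions that rebuild `new`."""
    if len(new) < bs:
        return [("data", bytes(new))] if new else []
    ops, literal, k = [], bytearray(), 0
    a, b = weak(new[:bs])  # invariant: (a, b) is the checksum of new[k:k+bs]
    while True:
        match = None
        for idx, strong in table.get((b << 16) | a, ()):
            if hashlib.md5(new[k:k + bs]).digest() == strong:
                match = idx  # weak hit confirmed by the strong checksum
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            k += bs
            if k + bs > len(new):
                break
            a, b = weak(new[k:k + bs])
        else:
            literal.append(new[k])
            if k + bs >= len(new):
                k += 1
                break
            # roll the window forward one byte without rescanning it
            out_byte, in_byte = new[k], new[k + bs]
            a = (a - out_byte + in_byte) % M
            b = (b - bs * out_byte + a) % M
            k += 1
    literal += new[k:]  # bytes too close to the end to fill a whole window
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: list, bs: int) -> bytes:
    """Client side: rebuild the new page from the cached copy and the delta."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * bs:(arg + 1) * bs] if kind == "copy" else arg
    return bytes(out)


if __name__ == "__main__":
    bs = 64
    old = (b"<html><body>" + b"static header " * 50 + b"price: 10"
           + b" footer" * 50 + b"</body></html>")
    new = old.replace(b"price: 10", b"price: 12")
    ops = make_delta(new, signatures(old, bs), bs)
    sent = sum(len(arg) for kind, arg in ops if kind == "data")
    print(f"{len(ops)} instructions, {sent} literal bytes of {len(new)}")
    assert apply_delta(old, ops, bs) == new
```

Only the literal bytes and short copy references cross the network; the unchanged parts of the page are rebuilt from the client's cached copy.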
In this scheme, a client requesting a page would send the server an augmented HTTP GET request containing the fast and strong signa-