11.2 Tricks and tips to increase performance

Performance increases can often be made by simple changes to how data is moved between client and server. In some cases, these techniques may not be applicable; however, when used correctly, each of the following methods will help keep your data moving quickly.

11.2.1 Caching

Caching can increase network performance by storing frequently accessed static data in a location that provides faster data return than the normal access time for the static data. It is important that all three of the following criteria are met:

The data must be frequently accessed. There is no point in storing large datasets in memory or on disk when only one client will ever request it, once.

The data must not change as often as it is requested. The data should remain static for long periods, or else clients will receive outdated data.

The access time for cached data must be substantially faster than the access time to receive the data directly. It would defeat the purpose if a client were denied access to the data from its source and instead was redirected to a caching server that had to reprocess the data.

Data can be cached at any point between the client and server. Server-side caches can protect against out-of-date data, but they are slower than client-side caches. Client caches are very fast because the data is read from disk, not the network, but they are prone to out-of-date data. Proxy caches are a combination of the two. They can refresh their cache regularly when idle and can serve data faster because they will be on a local connection to the client. Old data on a proxy can be frustrating for a user because it is awkward to flush the cache of a proxy server manually.
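
As a sketch of the client-side case, a minimal time-based cache might look like the following (the lifetime, the key, and the fetch callback are illustrative assumptions, not values from the text):

```python
import time

class TtlCache:
    """Minimal cache sketch: entries are served from memory until they are
    max_age seconds old, then refetched from the origin."""

    def __init__(self, max_age):
        self.max_age = max_age
        self._store = {}                      # key -> (stored_at, value)

    def get(self, key, fetch):
        """Return the cached value for key, or call fetch() and cache it."""
        entry = self._store.get(key)
        if entry is not None:
            stored_at, value = entry
            if time.time() - stored_at < self.max_age:
                return value                  # fast path: no network round trip
        value = fetch()                       # slow path: go back to the source
        self._store[key] = (time.time(), value)
        return value

cache = TtlCache(max_age=60.0)
page = cache.get("/catalog.html", lambda: "<html>catalog</html>")
```

The max_age check enforces the second criterion above: if the data changes faster than the cache lifetime, the cache stops paying for itself.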

Server caching can be extremely useful when data on the server needs to be processed before it can be sent to clients. A prime example: when an ASP.NET page is uploaded to a server, it must be compiled before generating content that is sent to the client. It is extremely wasteful to have the server recompile the page every time it is requested, so the compiled version is held in a server-side cache.

When a site consists of mainly static content, it is possible to cache a compressed version of each of the pages to be delivered because most browsers can dynamically decompress content in the right format. Therefore, instead of sending the original version of each page, a compressed version could be sent. When the content is dynamic, it is possible to utilize on-the-fly compression from server-accelerator products such as Xcache and Pipeboost.
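
In Python, for example, the standard gzip module can produce such a pre-compressed copy; the page content and header handling below are a simplified sketch, not a full HTTP implementation:

```python
import gzip

# Pre-compress a static page once at startup rather than on every request.
page = b"<html><body>" + b"product listing " * 200 + b"</body></html>"
compressed = gzip.compress(page)

def respond(accept_encoding):
    """Serve the compressed copy only to clients that advertise gzip support
    in their Accept-Encoding header; everyone else gets the original."""
    if "gzip" in accept_encoding:
        return {"Content-Encoding": "gzip"}, compressed
    return {}, page
```

Repetitive markup compresses well, so the saved bandwidth usually dwarfs the one-off compression cost.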

Caching introduces the problem of change monitoring, so that the cached data reflects the live data as accurately as possible. Where the data is in the form of files on disk, one of the simplest mechanisms is to compare the “date modified” field against the cached data. Above that, hashing could be used to monitor changes within datasets or other content.

Within the environment of a single Web site or application, caching can be controlled and predicted quite easily, except when the content to be served could come from arbitrary sources. This situation might arise in a generic caching proxy server, where content could come from anywhere on the Internet. In this case, the proxy must make an educated assessment about whether pages should be cached locally or not.
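
For file-based content, the date-modified comparison and the hash check can be combined into one fingerprint. A sketch (SHA-256 is an assumed choice of hash, not one named in the text):

```python
import hashlib
import os

def fingerprint(path):
    """Pair the file's 'date modified' stamp with a content hash: the mtime
    is a cheap first check, and the hash catches changes it might miss."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return os.path.getmtime(path), digest

def is_stale(path, cached):
    """True when the live file no longer matches the cached fingerprint."""
    return fingerprint(path) != cached
```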

The proxy would need to hold an internal table, which could record all requests made to it from clients. The proxy would need to store the full HTTP request because many sites behave differently depending on what cookies and so forth are sent by the client. Along with the requests, the proxy would need to be able to count the number of identical requests and how recently they were made. The proxy should also keep checksums (or hashes) of the data returned from the server relative to each request. With this information, the proxy can determine if the content is too dynamic to cache. With that said, even the most static and frequently accessed sites change sometimes. The proxy could, during lull periods, check some of the currently cached Web sites against the live versions and update the cache accordingly.
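
A sketch of that bookkeeping in Python (the hit threshold and the data layout are invented for illustration):

```python
import hashlib
import time

class RequestTable:
    """Track identical requests and the checksums of their responses, so the
    proxy can judge whether content is too dynamic to cache."""

    def __init__(self):
        self.entries = {}   # request digest -> (hits, last_seen, checksums)

    def record(self, raw_request, response_body):
        """Log one request/response pair, keyed on the full raw request."""
        key = hashlib.sha256(raw_request).hexdigest()
        hits, _, checksums = self.entries.get(key, (0, 0.0, frozenset()))
        checksum = hashlib.sha256(response_body).hexdigest()
        self.entries[key] = (hits + 1, time.time(), checksums | {checksum})

    def should_cache(self, raw_request, min_hits=3):
        """Cache only content that is popular and has never varied."""
        key = hashlib.sha256(raw_request).hexdigest()
        hits, _, checksums = self.entries.get(key, (0, 0.0, frozenset()))
        return hits >= min_hits and len(checksums) == 1
```

Keying on the full request (cookies included) reflects the point above that the same URL can produce different responses for different clients.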

11.2.2 Keep-alive connections

Even though most Web pages contain many different images that all come from the same server, some older (HTTP 1.0) clients create new HTTP connections for each of the images. This is wasteful because the first HTTP connection is sufficient to send all of the images. Luckily, most browsers and servers are capable of handling HTTP 1.1 persistent connections. A client can request that a server keep a TCP connection open by specifying Connection: Keep-Alive in the HTTP header.
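
The header itself is plain text; a request for one of those images might be composed like this (a sketch — the host and path are placeholders):

```python
def build_request(host, path, keep_alive=True):
    """Compose a minimal HTTP GET; Connection: Keep-Alive asks the server to
    hold the TCP connection open so subsequent requests can reuse it."""
    connection = "Keep-Alive" if keep_alive else "close"
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Connection: {connection}\r\n"
        "\r\n"
    )
```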

Netscape pioneered a technology that could send many disparate forms of data through the same HTTP connection. This system was called “server push” and could provide for simple video streaming in the days before Windows media. Server push was never adopted by Microsoft, and unfortunately it is not supported by Internet Explorer, but it is still available in Netscape Navigator.

When a TCP connection opens and closes, several handshake packets are sent back and forth between the client and server, which can waste up to one second per connection for modem users. If you are developing a proprietary protocol that involves multiple sequential requests and responses between client and server, you should always aim to keep the TCP connection open for as long as possible, rather than repeatedly opening and closing it with every request.
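
The saving can be seen in miniature with a loopback server: the client below pays the handshake once and then exchanges three request/response pairs over the same connection (the "ack:" framing is an invented stand-in for a proprietary protocol):

```python
import socket
import threading

def serve_one_client(server_sock):
    """Accept a single connection and echo each request back with a prefix."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(b"ack:" + data)

server = socket.socket()
server.bind(("127.0.0.1", 0))             # any free loopback port
server.listen(1)
threading.Thread(target=serve_one_client, args=(server,), daemon=True).start()

replies = []
with socket.create_connection(server.getsockname()) as client:
    for i in range(3):                    # three exchanges, one TCP handshake
        client.sendall(b"req%d" % i)
        replies.append(client.recv(1024))
```

Because the client waits for each reply before sending the next request, the exchanges stay in lockstep on the single connection.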

The whole handshake latency issue can be avoided completely by using a non-connection-oriented protocol such as UDP. As mentioned in Chapter 3, however, data integrity is endangered when transmitted over UDP. Some protocols such as real-time streaming protocol (RTSP, defined in RFC 2326) use a combination of TCP and UDP to achieve a compromise between speed and reliability.

11.2.3 Progressive downloads

Once enough of a file has been downloaded, the client should be able to begin to use the data. The obvious applications are audio and video, where users can begin to see and hear the video clip before it is fully downloaded. The same technique is applicable in many scenarios. For instance, if product listings are being displayed as they are retrieved, a user could interrupt the process once the desired product is shown and proceed with the purchase.

Image formats such as JPEG and GIF come in a progressive version, which renders them as full-size images very soon after the first few hundred bytes are received. Subsequent bytes form a more distinct and higher-quality image. This technique is known as interlacing. Its equivalent in an online catalog application would be where product names and prices download first, followed by the images of the various products.
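
The catalog case reduces to consuming a stream as it arrives rather than waiting for the transfer to complete. A sketch (the product records and the stop_after interrupt are invented for illustration):

```python
def progressive_display(stream, stop_after=None):
    """Show records as they arrive; the user can interrupt the download as
    soon as the item they want has been rendered."""
    shown = []
    for record in stream:
        shown.append(record)              # render immediately, don't buffer
        if stop_after is not None and stop_after in record:
            break                         # user found their product; stop early
    return shown

catalog_stream = iter([b"widget $5", b"gadget $9", b"doohickey $2"])
seen = progressive_display(catalog_stream, stop_after=b"gadget")
```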

11.2.4 Tweaking settings

Windows is optimized by default for use on Ethernets, so where a production application is being rolled out to a client base using modems, ISDN, or DSL, some system tweaking can be done to help Windows manage the connection more efficiently and, ultimately, to increase overall network performance. Because these settings are systemwide, however, these changes should only be applied when the end-customer has given your software permission to do so.

The TCP/IP settings are held in the registry at

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

Under this location, various parameters can be seen, such as default name servers and gateways, which would otherwise be inaccessible programmatically. Not all of these parameters would already be present in the registry by default, but they could be added when required.

The first system tweak is the TCP window size, which can be set at the following registry location:

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\GlobalMaxTcpWindowSize

The TCP window specifies the number of bytes that a sending computer can transmit without receiving an ACK. The recommended value is 256,960. Other values to try are 372,300, 186,880, 93,440, 64,240, and 32,120. The valid range is from the maximum segment size (MSS) to 2^30. For best results, the size has to be a multiple of the MSS, lower than 65,535 times a scale factor that is a power of 2. The MSS is generally roughly equal to the maximum transmission unit (MTU), as described later. This tweak reduces protocol overhead by eliminating part of the safety net and trimming some of the time involved in the turnaround of an ACK.
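
The multiple-of-MSS rule is easy to check. Assuming the common Ethernet MSS of 1,460 bytes (a 1,500-byte MTU minus 40 bytes of IP and TCP headers — an assumption, since the text does not fix an MSS), every value listed above divides evenly:

```python
MSS = 1460   # typical Ethernet MSS: 1,500-byte MTU minus 40 bytes of headers

suggested_windows = (256960, 372300, 186880, 93440, 64240, 32120)

# Each recommended window size is an exact multiple of the MSS,
# so no ACK cycle ends with a partially filled segment.
multiples = {w: w // MSS for w in suggested_windows}
for window in suggested_windows:
    assert window % MSS == 0
```

For instance, 64,240 bytes is exactly 44 segments and 256,960 bytes is 176.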

TcpWindowSize can also exist under \Parameters\Interfaces\<interface>. If the setting is added at this location, it overrides the global setting. When the window size is greater than 64 KB, the Tcp1323Opts setting should be applied as detailed below:

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts

“Tcp1323” refers to RFC 1323, a proposal to add timestamps to packets to aid out-of-order deliveries. Removing timestamps shaves off 12 bytes per TCP/IP packet, but reduces reliability over bad connections. It also affects TCP window scaling, as mentioned above. Zero is the recommended option for higher performance. Set the value to one to include window-scaling features and three to apply the timestamp as well. This setting is particularly risky and should not be tampered with without great care.

The issue of packets with a time-to-live (TTL) value is discussed again in the multicast section in this chapter, where it is of particular importance. The setting can be applied on a systemwide level at this registry location:

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\DefaultTTL

The TTL of a packet is a measure of how many routers a packet will travel through before being discarded. An excessively high TTL (e.g., 255) will cause delays, especially over bad links. A low TTL will cause some packets to be discarded before they reach their destination. The recommended value is 64.

The MTU is the maximum size of any packet sent over the wire. If it is set too high, lost packets will take longer to retransmit and may get fragmented. If the MTU is set too low, data becomes swamped with overhead and takes longer to send. Ethernet connections use a default of 1,500 bytes per packet; ADSL uses 1,492 bytes per packet; and FDDI uses 8,000 bytes per packet. The MTU value can be left as the default or can be negotiated at startup. The registry key in question is

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnablePMTUDiscovery

The recommended value is one. This will make the computer negotiate with the NIC miniport driver for the best value for MTU on initial transmission. This may cause a slow startup effect, but it will ultimately be beneficial, provided there is little packet loss and the data being transferred is large.

Ideally, every datagram being sent should be the size of the MTU. If it is any larger than the MTU, the datagram will fragment, which takes computing time and increases the risk of datagram loss. This setting is highly recommended for modem users:
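
The cost of oversized datagrams can be sketched roughly (this ignores per-fragment header overhead and offset alignment, so it understates the true packet count slightly):

```python
import math

def fragment_count(datagram_size, mtu):
    """Approximate number of on-the-wire packets a datagram needs: anything
    over the MTU is split, and losing any one fragment loses the whole datagram."""
    return math.ceil(datagram_size / mtu)
```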

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnablePMTUBHDetect

The recommended setting is zero. Setting this parameter to one (True) enables “black hole” routers to be detected; however, it also increases the maximum number of retransmissions for a given TCP data segment. A black hole router is one that fails to deliver packets and does not report the failure to the sender with an ICMP message. If black hole routers are not an issue on the network, they can be ignored.

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\SackOpts

The recommended setting is one. This enables Selective Acknowledgement (SACK) to take place, which can improve performance where window sizes are low.

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpMaxDupAcks

The recommended value is two. The parameter determines the number of duplicate acknowledgments that must be received for the same sequence number of sent data before “fast retransmit” is triggered to resend the segment that has been dropped in transit. This setting is of particular importance on links where a high potential for packet loss exists.
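
For convenience, the recommended values from this section could be collected into a single .reg file (a sketch that simply restates the values above; remember these are systemwide changes that require the end-customer's permission):

```reg
Windows Registry Editor Version 5.00

; Recommended TCP/IP values from this section (apply with care)
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"GlobalMaxTcpWindowSize"=dword:0003ebc0
"Tcp1323Opts"=dword:00000000
"DefaultTTL"=dword:00000040
"EnablePMTUDiscovery"=dword:00000001
"EnablePMTUBHDetect"=dword:00000000
"SackOpts"=dword:00000001
"TcpMaxDupAcks"=dword:00000002
```

The dword values are hexadecimal: 0003ebc0 is 256,960 and 00000040 is 64.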

Moving outside the low-level TCP nuts and bolts, a pair of settings can improve the performance of outgoing HTTP connections. These settings can speed up activities such as Web browsing:

HKEY_USERS\.DEFAULT\Software\Microsoft\Windows\CurrentVersion\Internet Settings\
"MaxConnectionsPerServer"=dword:00000020
"MaxConnectionsPer1_0Server"=dword:00000020

HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\
"MaxConnectionsPerServer"=dword:00000020
"MaxConnectionsPer1_0Server"=dword:00000020

This setting actually increases the number of concurrent outgoing connections that can be made from the same client to a single server. This is a (small) violation of the HTTP standard and can put undue strain on some Web servers, but the bottom line is, if it makes your application run faster, who cares?
