Performance Optimization Methods

12.8 Performance Optimization Methods

Measures to improve the performance of Web applications are primarily targeted at shortening response times or increasing throughput. However, these goals can be achieved only if we identify and remove the bottleneck (or bottlenecks) in a system. We call the component with the highest utilization the primary bottleneck; it is the component most in need of performance-improving measures. A bottleneck can be removed either by reducing the load on the bottleneck component (e.g., reducing the transmitted file sizes to remove load from the network), or by increasing the performance of the component itself (e.g., increasing the bandwidth of the network connection).

Once we have removed the primary bottleneck, we typically observe that another component will become a bottleneck; this component is known as the secondary bottleneck. An optimal system from the performance perspective is a balanced system, in which the utilization of all system components is (approximately) equal, so that there will be no primary bottleneck. An optimal strategy to improve performance should, therefore, not only look at one single component, but consider the system as a whole.
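The notions of primary bottleneck, secondary bottleneck, and balanced system can be illustrated with a minimal Python sketch. The component names, the utilization values, and the 5% balance tolerance are assumptions made for illustration only; in practice the utilizations would come from measurements.

```python
def rank_bottlenecks(utilization):
    """Return component names sorted by utilization, highest first."""
    return sorted(utilization, key=utilization.get, reverse=True)


def is_balanced(utilization, tolerance=0.05):
    """Treat a system as balanced when all utilizations differ by at most
    `tolerance` (an assumed threshold, not a value from the text)."""
    values = list(utilization.values())
    return max(values) - min(values) <= tolerance


# Hypothetical measured utilizations (fraction of time each component is busy).
measured = {"network": 0.92, "cpu": 0.55, "disk": 0.71, "database": 0.64}

ranking = rank_bottlenecks(measured)
primary, secondary = ranking[0], ranking[1]
print(f"primary bottleneck: {primary}, secondary bottleneck: {secondary}")
print(f"balanced: {is_balanced(measured)}")
```

With these assumed numbers the network is the primary bottleneck; once it is relieved, the disk is the next candidate, which is why tuning should consider the ranking as a whole rather than only its first entry.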

In principle, any system component can represent a bottleneck, but performance problems in practice are frequently due to delays in the transmission network (Internet) and server-side delays caused by dynamically generated pages (e.g., high startup times of CGI scripts, time-intensive database queries). The following sections discuss selected performance optimization methods; see (Tracy et al. 1997) or (Killelea 2002) for a detailed description.


12.8.1 Acceleration Within a Web Application

The first class of methods applies to the Web application itself, which means that it has to be taken into account in the application development process. This class includes by definition all methods that aim at shortening the execution time of the application on the server, but also adaptations within the Web application that shorten the communication time with the client, or lead to shorter execution time on the client. The following methods are examples of this class:

Load Balancing Between Client and Server

An important question for the design of Web applications is the load distribution between client and server. More specifically, we have to decide how much of the application logic should be processed at the client and how much at the server. For example, checking data input in forms could be done at the client. These controls are relevant both for security reasons and performance reasons.

Embedding Server-side Applications

Web servers offer a CGI (Common Gateway Interface) that can be used to forward HTTP requests to other programs in the server for further processing. From the performance perspective, we could investigate alternative embedding possibilities, e.g., by using servlets (see also Chapter 6).

Pre-generating Web Content

The dynamic generation of pages based on information from other applications (typically databases) is a computing-intensive and time-consuming process. It would, therefore, be a good idea to pre-generate popular pages and then make them available as static pages, at the cost of their to-the-second accuracy (Sindoni 1998, Schrefl et al. 2003).
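The pre-generation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `render_page` stands in for the expensive dynamic generation, and the product records and file layout are hypothetical.

```python
import tempfile
from pathlib import Path


def render_page(product):
    """Stand-in for an expensive dynamic generation step (e.g., database queries)."""
    return (
        f"<html><body><h1>{product['name']}</h1>"
        f"<p>Price: {product['price']}</p></body></html>"
    )


def pregenerate(products, out_dir):
    """Render each popular page once and store it as a static HTML file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for product in products:
        target = out_dir / f"{product['id']}.html"
        target.write_text(render_page(product), encoding="utf-8")
        written.append(target)
    return written


# Hypothetical "popular" pages; a real job might select them from access logs.
popular = [
    {"id": "p1", "name": "Widget", "price": "9.90"},
    {"id": "p2", "name": "Gadget", "price": "19.90"},
]
static_dir = tempfile.mkdtemp()
files = pregenerate(popular, static_dir)
print([f.name for f in files])
```

If such a job is run periodically, the interval between runs bounds how stale a static page can become, which makes the trade-off against to-the-second accuracy explicit.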

Adapting an HTTP Response to the Client’s Capabilities

Though Web application developers normally have no influence on the options site visitors configure in their clients (e.g., disabling the automatic download of images), they have to take these options into account in the application development process. Many client settings can be polled by Web applications (e.g., by using JavaScript), so that more personalized information can be presented to the visitors. This makes it possible to increase the quality of service subjectively perceived by the users (see Chapter 1). In particular when embedding multimedia content, we should take the capabilities of clients into account to ensure, for instance, that neither the server nor the network is overloaded by transmitting large data sets that could not be displayed correctly in the client anyway. Compromises in the representation quality of multimedia content can be worthwhile in favor of improved performance.
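As a rough sketch of this adaptation, the following Python function picks the richest content variant a client can display. The variant table, the capability keys (`images`, `video`, `screen_width`), and all sizes are hypothetical; how the capabilities reach the server (for instance, reported by a client-side script) is left open.

```python
# Hypothetical media variants, ordered from richest to lightest.
VARIANTS = [
    {"name": "video_hd", "kind": "video", "min_width": 1280, "size_kb": 8000},
    {"name": "video_sd", "kind": "video", "min_width": 640, "size_kb": 2500},
    {"name": "image_large", "kind": "image", "min_width": 640, "size_kb": 300},
    {"name": "image_small", "kind": "image", "min_width": 0, "size_kb": 40},
    {"name": "text_only", "kind": "text", "min_width": 0, "size_kb": 1},
]


def choose_variant(capabilities):
    """Return the name of the richest variant the client can actually display."""
    for variant in VARIANTS:
        # Skip images if the visitor has disabled image download.
        if variant["kind"] == "image" and not capabilities.get("images", True):
            continue
        # Skip video unless the client explicitly reports video support.
        if variant["kind"] == "video" and not capabilities.get("video", False):
            continue
        # Skip variants that are wider than the client's screen.
        if capabilities.get("screen_width", 0) < variant["min_width"]:
            continue
        return variant["name"]
    return "text_only"


print(choose_variant({"images": True, "video": True, "screen_width": 800}))
print(choose_variant({"images": False, "video": False, "screen_width": 1920}))
```

Ordering the table from richest to lightest keeps the selection rule simple: the first variant that passes all checks is sent, so no large data set is transmitted to a client that cannot show it.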


12.8.2 Reducing Transmission Time

For Web applications in which the clients and servers are connected over a WAN, the network often represents the bottleneck. Future Internet protocols will support more options to negotiate the quality of service between clients and servers. The two approaches briefly described below have been used successfully with the current state of the art.

Web Caching and Content Pre-fetching

Many Web browsers store frequently requested pages on the local hard disk (or in the main memory) on the client. When the client visits such a page again, it fetches the page from its cache rather than forwarding the request to the server. Web proxy servers extend this concept: they assume the same function as caches within the Internet, i.e., they serve replies to client requests from their cache rather than from the server.

In addition to reducing the transmission delay, Web caching removes load from the Web server. Web content pre-fetching is based on the same idea, but it additionally tries to predict the pages a user will access next, and loads these pages proactively into cache memory. The strategy that determines which pages should be held in the cache, and for how long, is important both in Web caching and content pre-fetching. Recent approaches try to apply methods from AI and machine learning (Park et al. 2005). To make the right decision, we have to weigh the cache capacity, the access time, and the freshness of the content (Labrinidis and Roussopoulos 2004, Podlipnig and Böszörmenyi 2003). A poorly configured proxy or cache server can even cause response times to increase.
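One simple way to weigh capacity against freshness is a cache that evicts the least recently used entry when full and treats entries older than a fixed lifetime as stale. The sketch below is one such strategy among many, not the method of the cited works; the class name `TTLCache`, its parameters, and the `fetch_from_origin` helper are assumptions for illustration.

```python
import time
from collections import OrderedDict


class TTLCache:
    """LRU cache whose entries also expire after `ttl` seconds."""

    def __init__(self, capacity, ttl, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self.entries = OrderedDict()  # url -> (stored_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url, fetch):
        """Return the cached body for `url`, calling `fetch(url)` on a miss."""
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            self.entries.move_to_end(url)  # mark as most recently used
            self.hits += 1
            return entry[1]
        self.misses += 1
        body = fetch(url)  # cache miss or stale entry: ask the origin server
        self.entries[url] = (now, body)
        self.entries.move_to_end(url)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return body


# Usage with a fake clock so that expiry can be shown without waiting.
fake_now = [0.0]
cache = TTLCache(capacity=2, ttl=60, clock=lambda: fake_now[0])
origin_calls = []


def fetch_from_origin(url):
    """Hypothetical origin access; records each call for inspection."""
    origin_calls.append(url)
    return f"content of {url}"


cache.get("/a", fetch_from_origin)  # miss: fetched from the origin
cache.get("/a", fetch_from_origin)  # hit: served from the cache
fake_now[0] = 61.0
cache.get("/a", fetch_from_origin)  # entry is stale: fetched again
```

A small `ttl` keeps content fresh but sends more requests to the origin, while a large one does the opposite; the hit and miss counters give a first indication of whether a chosen configuration helps or hurts.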

From a protocol point of view, caching in HTTP is specified in Section 13 of RFC 2616, an IETF document that the W3C hosts at http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html . A well-known and commonly used caching tool is, for example, Squid ( http://www.squid-cache.org ).

Web Server Replication and Load Balancing

Web server replication is another technique that does not forward a client request directly to the addressed server, but redirects it to a replica of that server. In addition to shortening the transmission time, this technique also removes some load from the server. A simple way of implementing this technique is to use so-called mirror sites, which are offered explicitly to users as alternative servers. The drawback is that users normally don’t know which server offers the best performance, and the site operator has no control over load balancing. These problems can be avoided by using implicit mirroring. Externally, implicit mirroring lets users access only one logical address (URL) of a server, while the request is then internally forwarded to a replica of that server. Another benefit of implicit mirroring is that users need to know only one single address, and server replicas can easily be added or removed without having to inform users of changes in the address. This takes the aspect of spontaneity (see Chapter 1) into account.
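The following Python sketch illustrates implicit mirroring: callers only ever talk to one dispatcher (the single logical address), which internally selects a replica. Choosing the replica with the fewest open requests is just one possible policy, and the class and replica names are hypothetical.

```python
class ReplicaDispatcher:
    """Maps one logical address onto a changeable set of replicas."""

    def __init__(self, replicas):
        self.active = {name: 0 for name in replicas}  # replica -> open requests

    def add(self, name):
        """Bring a new replica into service; users notice no address change."""
        self.active.setdefault(name, 0)

    def remove(self, name):
        """Take a replica out of service."""
        self.active.pop(name, None)

    def acquire(self):
        """Choose the replica with the fewest open requests."""
        if not self.active:
            raise RuntimeError("no replica available")
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        return name

    def release(self, name):
        """Record that a request handled by `name` has finished."""
        if self.active.get(name, 0) > 0:
            self.active[name] -= 1


dispatcher = ReplicaDispatcher(["replica-1", "replica-2"])
first = dispatcher.acquire()
second = dispatcher.acquire()
dispatcher.release(first)
third = dispatcher.acquire()
dispatcher.add("replica-3")
fourth = dispatcher.acquire()
print(first, second, third, fourth)
```

Because the set of replicas lives entirely behind the dispatcher, the operator keeps control over load balancing, which is exactly what explicit mirror sites cannot offer.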

Surrogate servers represent a technique that combines caching and replication. A Web server forwards requests to a surrogate server. The surrogate server tries to reply to the request from its cache, and accesses the original Web server in the event of a cache miss. Surrogate servers can also be used jointly by several Web servers. A network composed of surrogate servers is also known as content delivery network (CDN; Gadde et al. 2000). The literature often uses the term proxy server to generally describe a server that offers services on behalf of another server (see Chapter 4). In this sense, proxy server would be the generic term for cache server, replicate server, and surrogate server (Rabinovich and Spatschek 2002).

Figure 12-8 Traditional cycle of performance evaluation and performance management. (Figure labels: application / component; observing performance indicators; performance metrics; performance analysis; performance evaluation; performance prediction; model improvement / fine-tuning; performance interpretation; setting performance actuators; system tuning; performance management.)


12.8.3 Server Tuning

The last class of methods is aimed at improving the execution platform of a Web application. This group includes primarily hardware upgrades (main memory expansion, CPU acceleration, faster disks) and optimized settings in the Web server (e.g., disabling the DNS lookup for log files). We refer our readers to the manual of their preferred Web server and relevant information and discussion forums on the Internet (e.g., http://httpd.apache.org/docs/misc/perf.html or http://www.microsoft.com/serviceproviders/whitepapers/tuningiis.asp ).