13.4.2 Scaling
The performance characteristics of an application vary with a number of different factors. These variable factors can include the number of users, the amount of data dealt with, the number of objects used by the application, and so on. During the design phase, whenever you consider performance, you should consider how the performance scales as the load on the application varies. It is usually not possible to predict or measure the performance for all possible variations of these factors, but you should select several representative sets of values for the factors, and predict and measure performance for these sets. The sets should include factors for when the application:
• Has a light load
• Has a medium load
• Has a heavy load
• Has a varying load predicted to represent normal operating conditions
• Has spiked loads where the load is mostly normal but occasionally spikes to the maximum supported
• Consistently has the maximum load the application was designed to support
You need to ensure that your scaling conditions include variations in threads, objects, and users, and variations in network conditions if appropriate. Measure response times and throughput for the various scenarios, and decide whether any particular situation needs optimizing for throughput of the system as a whole or for response times for individual users.
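As a rough sketch of such a measurement, the following harness (class, method, and workload names are hypothetical, and java.util.concurrent is assumed to be available) drives a placeholder operation with increasing numbers of concurrent simulated users and reports the average response time and overall throughput at each load level:

    // Sketch: measure average response time and throughput at several load levels.
    // performRequest() is a placeholder for whatever unit of work your application serves.
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class LoadScalingTest {
        static final int REQUESTS_PER_USER = 100;

        public static void main(String[] args) throws Exception {
            for (int users : new int[] {1, 10, 50, 200}) {   // light through heavy load
                runAtLoad(users);
            }
        }

        static void runAtLoad(int users) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(users);
            AtomicLong totalResponseNanos = new AtomicLong();
            long start = System.nanoTime();
            for (int u = 0; u < users; u++) {
                pool.submit(() -> {
                    for (int i = 0; i < REQUESTS_PER_USER; i++) {
                        long t0 = System.nanoTime();
                        performRequest();                     // the operation under test
                        totalResponseNanos.addAndGet(System.nanoTime() - t0);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            long elapsed = System.nanoTime() - start;
            long requests = (long) users * REQUESTS_PER_USER;
            System.out.printf("%d users: avg response %.1f ms, throughput %.0f req/s%n",
                users,
                totalResponseNanos.get() / 1e6 / requests,
                requests / (elapsed / 1e9));
        }

        static void performRequest() {
            // Placeholder: simulate a small amount of work.
            try { Thread.sleep(5); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

In output from a harness like this, the knee discussed below typically shows up as the load level at which average response time starts climbing much faster than throughput.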
It is clear that many extra factors need to be taken into account during scaling. The tools you have for profiling scaling behavior are fairly basic: essentially, only graphs of response times or
throughput against scaled parameters. It is typical to have a point at which the application starts to have bad scaling behavior: the knee or elbow in the response-time curve. At that point, the
application has probably reached some serious resource conflict that requires tuning so that nice scaling behavior can be extended further. Clearly, tuning for scaling behavior is likely to be a long
process, but you cannot shortcut this process if you want to be certain your application scales.
[12] By including timer-based delays in the application code, at least one multiuser application has deliberately slowed response times for low-scaled situations. The artificial delay is reduced or cut out at higher scaling values. The users perceive a system with a similar response time under most loads.
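A minimal sketch of that technique, assuming a simple count of active users as the load measure and an illustrative 400-millisecond target response time, might look like:

    // Sketch of the footnote's technique: pad responses under light load so the
    // perceived response time stays roughly constant as load rises.
    // The load threshold and target time are illustrative assumptions.
    public class PaddedResponder {
        private static final long TARGET_MILLIS = 400;   // desired perceived response time
        private final java.util.concurrent.atomic.AtomicInteger activeUsers =
            new java.util.concurrent.atomic.AtomicInteger();

        public String handle(String request) throws InterruptedException {
            activeUsers.incrementAndGet();
            long start = System.currentTimeMillis();
            try {
                String response = doWork(request);
                long elapsed = System.currentTimeMillis() - start;
                // Pad only when lightly loaded; under heavy load the real work
                // already takes at least the target time.
                if (activeUsers.get() < 10 && elapsed < TARGET_MILLIS) {
                    Thread.sleep(TARGET_MILLIS - elapsed);
                }
                return response;
            } finally {
                activeUsers.decrementAndGet();
            }
        }

        private String doWork(String request) {
            return request.toUpperCase();   // placeholder for the real processing
        }
    }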
13.4.3 Distributed Applications
The essential design points for ensuring good performance of distributed applications are:
• Supporting asynchronous communications
• Decoupling process activities from each other, in such a way that no process is forced to wait for others (using queues achieves this; a sketch follows the bottleneck discussion below)
• Supporting parallelism in the design of the workflows
Determining the bottleneck in a distributed application requires looking at the throughput of every
component:
• Client and server processes
• Network transfer rates (peak and average)
• Network interface card throughput
• Router speed and disk I/O
• Middleware/queuing transfer rates
• Database access, update, and transaction rates
• Operating-system loads
Tuning any component other than the current bottleneck gives no improvement. Peak performance of each component is rarely achieved. You need to assume average rates of performance from the underlying resource and expect performance based on those average rates.
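As a sketch of the queue-based decoupling mentioned in the design points above (the names and unit of work are illustrative), a bounded queue lets the producing activity hand work off and continue without waiting for the consuming activity:

    // Sketch: decoupling two process activities with a queue so neither waits on the other.
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class QueueDecoupling {
        private static final String STOP = "STOP";   // poison pill to end the consumer

        public static void main(String[] args) {
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

            // Producer: hands work off and continues immediately; it only waits
            // if the queue is full, which bounds memory use under heavy load.
            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < 10; i++) {
                        queue.put("message-" + i);
                    }
                    queue.put(STOP);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // Consumer: drains the queue at its own pace. Several consumers could
            // share one queue to add parallelism to the workflow.
            Thread consumer = new Thread(() -> {
                try {
                    String message;
                    while (!(message = queue.take()).equals(STOP)) {
                        System.out.println("processed " + message);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            producer.start();
            consumer.start();
        }
    }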
Distributed applications tend to exaggerate any performance characteristics. So when performance is bad, the application tends to slow significantly more than in nondistributed applications. The
distributed design aspects should emphasize asynchronous and concurrent operations. Typical items to include in the design are:
• Queues
• Asynchronous communications and activities
• Parallelizable activities
• Minimized serialization points
• Balanced workloads across multiple servers
• Redundant servers and automatic switching capabilities
• Activities that can be configured at runtime to run in different locations
• Short transactions
The key to good performance in a distributed application is to minimize the amount of communication necessary. Performance problems tend to be caused by too many messages flying back and forth between distributed components. Bell's rule of networking applies: Money can buy bandwidth, but latency is forever.[13]
[13] Thomas E. Bell, "Performance of Distributed Systems," a paper presented at the ICCM Capacity Management Forum 7, San Francisco, October 1993.
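One common way to keep latency costs down is to batch remote requests. The sketch below uses a hypothetical RemoteService interface purely to show the shape of the change; the batched call pays the round-trip latency once rather than once per item:

    // Sketch: reducing round trips by batching requests. RemoteService is hypothetical;
    // the point is that one call carrying N items pays the network latency once,
    // where N individual calls pay it N times.
    import java.util.ArrayList;
    import java.util.List;

    public class BatchingClient {
        interface RemoteService {
            List<String> lookupAll(List<Integer> ids);   // one round trip for many ids
            String lookup(int id);                       // one round trip per id
        }

        // Chatty version: latency cost is paid once per id.
        static List<String> fetchOneByOne(RemoteService service, List<Integer> ids) {
            List<String> results = new ArrayList<>();
            for (int id : ids) {
                results.add(service.lookup(id));
            }
            return results;
        }

        // Batched version: latency cost is paid once per batch.
        static List<String> fetchBatched(RemoteService service, List<Integer> ids) {
            return service.lookupAll(ids);
        }
    }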
Unfortunately, communication overhead can be incurred by many different parts of a distributed application. There are some general high-level guidelines:
• Allow the application to be partitioned according to the data and processing power. Any particular task should be able to run in several locations, and the location that provides the best performance should be chosen at runtime. Usually the best location for the task is where the data required for the task is stored, as transferring data tends to be a significant overhead.
• Avoid generating distributed garbage. Distributed garbage collection can be a severe overhead on any distributed application.
• Reduce the costs of keeping data synchronized by minimizing the duplication of data.
• Reduce data-transfer costs by duplicating data. This conflicts directly with the last point, so the two techniques must be balanced to find the optimal data duplication points.
• Cache distributed data wherever possible.
• Use compression to reduce the time taken to transfer large amounts of data.
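As a sketch of the compression point, java.util.zip can be used to shrink a payload before it is transferred; whether this wins depends on how compressible the data is and on the relative cost of CPU time versus bandwidth:

    // Sketch: compressing data before transfer and expanding it on receipt.
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class TransferCompression {
        static byte[] compress(byte[] data) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
                gzip.write(data);
            }
            return bytes.toByteArray();
        }

        static byte[] decompress(byte[] compressed) throws IOException {
            try (GZIPInputStream gzip =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int read;
                while ((read = gzip.read(buffer)) != -1) {
                    bytes.write(buffer, 0, read);
                }
                return bytes.toByteArray();
            }
        }

        public static void main(String[] args) throws IOException {
            byte[] original = "some large, repetitive payload ".repeat(1000).getBytes();
            byte[] compressed = compress(original);
            System.out.println(original.length + " bytes -> " + compressed.length + " bytes");
            System.out.println(new String(decompress(compressed)).equals(new String(original)));
        }
    }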
13.4.4 Object Design