Scaling Design and Architecture

13.4.2 Scaling

An application's performance characteristics vary with a number of different factors that the application has to deal with. These variable factors can include the number of users, the amount of data dealt with, the number of objects used by the application, etc. During the design phase, whenever you consider performance, you should also consider how the performance scales as the load on the application varies.

It is usually not possible to predict or measure the performance for all possible variations of these factors. But you should select several representative sets of values for the factors, and predict and measure performance for these sets. The sets should include values for when the application:

• Has a light load
• Has a medium load
• Has a heavy load
• Has a varying load predicted to represent normal operating conditions
• Has spiked loads where the load is mostly normal but occasionally spikes to the maximum supported
• Consistently has the maximum load the application was designed to support

You need to ensure that your scaling conditions include variations in threads, objects, and users, and variations in network conditions if appropriate. Measure response times and throughput for the various scenarios, and decide whether any particular situation needs optimizing for throughput of the system as a whole or for response times for individual users.

It is clear that many extra factors need to be taken into account during scaling. The tools you have for profiling scaling behavior are fairly basic: essentially, only graphs of response times or throughput against scaled parameters. It is typical to have a point at which the application starts to scale badly: the knee or elbow in the response-time curve. At that point, the application has probably reached some serious resource conflict that requires tuning so that good scaling behavior can be extended further.
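The measurements described above can be sketched in a few lines. The following is a minimal illustration in Python, not a full load-testing tool: the names `measure` and `find_knee`, the use of threads to stand in for users, and the rule "response time exceeds twice the light-load baseline" are all assumptions made for this example. The knee test in particular is a crude threshold heuristic; in practice you would normally inspect the graph as well.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def measure(task, users, requests_per_user=5):
    """Run `task` from `users` concurrent threads (hypothetical harness).

    Returns (mean response time in seconds, throughput in requests/second).
    """
    def one_user(_):
        timings = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            task()
            timings.append(time.perf_counter() - start)
        return timings

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    elapsed = time.perf_counter() - wall_start

    all_timings = [t for timings in per_user for t in timings]
    return mean(all_timings), len(all_timings) / elapsed


def find_knee(loads, response_times, factor=2.0):
    """Return the first load whose response time exceeds `factor` times the
    response time at the lightest load, or None if no such load exists.

    Assumes `loads` is sorted in increasing order. This threshold rule is
    an illustrative assumption, not a standard definition of the knee.
    """
    baseline = response_times[0]
    for load, rt in zip(loads, response_times):
        if rt > factor * baseline:
            return load
    return None
```

A typical use would be to call `measure` once for each representative load (light, medium, heavy, maximum), collect the mean response times into a list, and pass the loads and response times to `find_knee` to see roughly where tuning effort should start.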
Clearly, tuning for scaling behavior is likely to be a long process, but you cannot shortcut this process if you want to be certain your application scales.[12]

[12] By including timer-based delays in the application code, at least one multiuser application has deliberately slowed response times for low-scaled situations. The artificial delay is reduced or cut out at higher scaling values. The users perceive a system with a similar response time under most loads.
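The delay technique mentioned in the footnote could look something like the following Python sketch. The text does not say how the original application reduced its delay, so the linear reduction, the function names `padded_delay` and `handle_request`, and the default `max_delay` of half a second are all assumptions chosen purely for illustration.

```python
import time


def padded_delay(current_load, max_load, max_delay=0.5):
    """Artificial delay in seconds: `max_delay` at zero load, shrinking
    linearly (an assumed policy) to zero at `max_load` and above."""
    if max_load <= 0:
        raise ValueError("max_load must be positive")
    fraction = min(max(current_load / max_load, 0.0), 1.0)
    return max_delay * (1.0 - fraction)


def handle_request(work, current_load, max_load, max_delay=0.5, sleep=time.sleep):
    """Run `work`, then pad the response with the load-dependent delay.

    `sleep` is injectable so the padding can be checked without waiting.
    """
    result = work()
    delay = padded_delay(current_load, max_load, max_delay)
    if delay > 0:
        sleep(delay)
    return result
```

With these assumed settings, a request served at half the maximum load is padded by 0.25 seconds, and a request served at or beyond the maximum load is not padded at all.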

13.4.3 Distributed Applications