A Tuning Strategy Introduction

On the other hand, external data access or writing to the disk may be what is slowing your application. In this case, you need to look at exactly what you are doing to the disks that is slowing the application: first identify the operations, then determine the problems, and finally eliminate or change these to improve the situation.

For example, one program I know of went through web server logs and did reverse lookups on the IP addresses. The first version of this program was very slow. A simple analysis of the activity being performed determined that the major time component of the reverse lookup operation was a network query. These network queries do not have to be done sequentially. Consequently, the second version of the program simply multithreaded the lookups to work in parallel, making multiple network queries simultaneously, and was much, much faster.

In this book we look at the causes of bad performance. Identifying the causes of your performance problems is an essential first step to solving those problems. There is no point in extensively tuning the disk-accessing component of an application just because "we all know" that disk access is much slower than memory access when, in fact, the application is CPU-bound.

Once you have tuned the application's first bottleneck, there may be (and typically is) another problem, causing another bottleneck. This process often continues over several tuning iterations. It is not uncommon for an application to have its initial "memory hog" problems solved, only to become disk-bound, and then in turn CPU-bound when the disk-access problem is fixed. After all, the application has to be limited by something, or it would take no time at all to run.

Because this bottleneck-switching sequence is normal (once you've solved the existing bottleneck, a previously hidden or less important one appears), you should attempt to solve only the main bottlenecks in an application at any one time.
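The reverse-lookup fix described earlier in this section can be sketched roughly as follows. This is only an illustration under stated assumptions, not the program from the anecdote: `reverse_lookup` is a hypothetical stand-in that sleeps to simulate network latency (a real program might call `socket.gethostbyaddr` there), and the worker count and sample addresses are arbitrary choices.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(ip):
    """Hypothetical stand-in for a network reverse lookup.

    The sleep simulates the network wait; a real program might call
    socket.gethostbyaddr(ip) here instead.
    """
    time.sleep(0.02)
    return "host-" + ip.replace(".", "-")


def lookup_sequential(ips):
    # First version: each query waits for the previous one to finish.
    return [reverse_lookup(ip) for ip in ips]


def lookup_parallel(ips, workers=8):
    # Second version: several queries are in flight at once, so the
    # network waits overlap instead of adding up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input addresses.
        return list(pool.map(reverse_lookup, ips))


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ips = ["192.0.2.%d" % n for n in range(1, 17)]
    seq_result, seq_time = timed(lookup_sequential, ips)
    par_result, par_time = timed(lookup_parallel, ips)
    # The parallel version must produce the same answers.
    assert seq_result == par_result
    print(f"sequential: {seq_time:.2f}s  parallel: {par_time:.2f}s")
```

The change only helps because the time goes to waiting on the network rather than to computation; the same restructuring applied to a CPU-bound step would not be expected to give a comparable gain.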
Tackling only the main bottlenecks may seem obvious, but I frequently encounter teams that tackle the main identified problem and then, instead of finding the next real problem, start applying the same fix everywhere they can in the application. One application I know of had a severe disk I/O problem caused by using unbuffered streams (all disk I/O was done byte by byte, which led to awful performance). After fixing this, some members of the programming team decided to start applying buffering everywhere they could, instead of establishing where the next bottleneck was. In fact, the next bottleneck was in a data-conversion section of the application that was using inefficient conversion methods, causing too many temporary objects and hogging the CPU. Rather than addressing and solving this bottleneck, they instead created a large memory allocation problem by throwing an excessive number of buffers into the application.
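To make the unbuffered-stream problem concrete, here is a minimal sketch of byte-by-byte reading with and without a buffer. It is an assumption-laden illustration rather than the application from the anecdote: `count_newlines` and `make_sample_file` are hypothetical helpers, and the file contents are invented for the demonstration.

```python
import os
import tempfile
import time


def count_newlines(path, buffered):
    """Count newline bytes by reading the file one byte at a time."""
    # buffering=0 gives a raw file object, so every read(1) goes to the
    # operating system. The default (-1) gives a buffered reader that
    # fetches large blocks and serves read(1) from memory.
    buffering = -1 if buffered else 0
    count = 0
    with open(path, "rb", buffering=buffering) as f:
        while True:
            byte = f.read(1)
            if not byte:  # empty bytes object means end of file
                break
            if byte == b"\n":
                count += 1
    return count


def make_sample_file(lines):
    """Write a throwaway log-like file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "wb") as f:
        for n in range(lines):
            f.write(b"192.0.2.1 GET /page/%d\n" % n)
    return path


if __name__ == "__main__":
    path = make_sample_file(20_000)
    try:
        for buffered in (False, True):
            start = time.perf_counter()
            total = count_newlines(path, buffered)
            took = time.perf_counter() - start
            print(f"buffered={buffered}: {total} lines in {took:.3f}s")
    finally:
        os.remove(path)
```

The fix is a one-line change, which may be part of why it is tempting to repeat it everywhere. Each buffer costs memory, though, so it is worth measuring again before adding another one.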

1.4 A Tuning Strategy

Here's a strategy I have found works well when attacking performance problems:

1. Identify the main bottlenecks (look for about the top five bottlenecks, but go higher or lower if you prefer).
2. Choose the quickest and easiest one to fix, and address it (except for distributed applications, where the top bottleneck is usually the one to attack: see the following paragraph).
3. Repeat from Step 1.

This procedure will get your application tuned the quickest. The advantage of choosing the quickest to fix of the top few bottlenecks, rather than the absolute topmost problem, is that once a bottleneck has been eliminated, the characteristics of the application change, and the topmost bottleneck may not even need to be addressed any longer. However, in distributed applications I advise you to target the topmost bottleneck. The characteristics of distributed applications are such that the main bottleneck is almost always the best to fix and, once fixed, the next main bottleneck is usually in a completely different component of the system.

Although this strategy is simple and actually quite obvious, I nevertheless find that I have to repeat it again and again: once programmers get the bit between their teeth, they just love to apply themselves to the interesting parts of the problems. After all, who wants to unroll loop after boring loop when there's a nice juicy caching technique you're eager to apply?

You should always treat the actual identification of the cause of the performance bottleneck as a science, not an art. The general procedure is straightforward:

1. Measure the performance using profilers and benchmark suites, and by instrumenting code.
2. Identify the locations of any bottlenecks.
3. Think of a hypothesis for the cause of the bottleneck.
4. Consider any factors that may refute your hypothesis.
5. Create a test to isolate the factor identified by the hypothesis.
6. Test the hypothesis.
7. Alter the application to reduce the bottleneck.
8. Test that the alteration improves performance, and measure the improvement (include regression testing the affected code).
9. Repeat from Step 1.

Here's the procedure for a particular example:

1. Run the application through your standard profiler (measurement).
2. You find that the code spends a huge 11% of its time in one method (identification of bottleneck).
3. Looking at the code, you find a complex loop and guess this is the problem (hypothesis).
4. You see that it is not iterating that many times, so possibly the bottleneck could be outside the loop (confounding factor).
5. You could vary the loop iteration as a test to see if that identifies the loop as the bottleneck. However, you instead try to optimize the loop by reducing the number of method calls it makes: this provides a test to identify the loop as the bottleneck and at the same time provides a possible solution. In doing this, you are combining two steps, Steps 5 and 7. Although this is frequently the way tuning actually goes, be aware that this can make the tuning process longer: if there is no speedup, it may be because your optimization did not actually make things faster, in which case you have neither confirmed nor eliminated the loop as the cause of the bottleneck.
6. Rerunning the profile on the altered application finds that this method has shifted its percentage time down to just 4%. This may still be a candidate bottleneck for further optimization, but nevertheless it's confirmed as the bottleneck and your change has improved performance.
7. Already done, combined with Step 5.
8. Already done, combined with Step 6.
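The measure, alter, and re-measure steps above can be sketched with simple hand-rolled instrumentation. This is one possible approach under stated assumptions, not a substitute for a real profiler: the `instrumented` decorator, the `elapsed` table, and the `convert_slow` and `convert_fast` functions are all hypothetical names invented for this illustration.

```python
import time
from collections import defaultdict
from functools import wraps

# Seconds accumulated per instrumented function (hypothetical bookkeeping).
elapsed = defaultdict(float)


def instrumented(fn):
    """Accumulate wall-clock time spent in fn (Step 1: measure).

    Intended for functions that do not call each other; nested
    instrumented calls would be counted twice.
    """
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed[fn.__name__] += time.perf_counter() - start
    return wrapper


def scale(x):
    return x * 3 + 1


@instrumented
def convert_slow(values):
    # Suspect loop: one extra function call per element (the hypothesis).
    out = []
    for v in values:
        out.append(scale(v))
    return out


@instrumented
def convert_fast(values):
    # Candidate fix: same arithmetic with the per-element call removed.
    return [v * 3 + 1 for v in values]


def share(name):
    """Fraction of all instrumented time spent in the named function."""
    total = sum(elapsed.values())
    return elapsed[name] / total if total else 0.0


if __name__ == "__main__":
    data = list(range(200_000))
    before = convert_slow(data)
    after = convert_fast(data)
    # Step 8: regression check, the fix must not change the results.
    assert before == after
    for name in ("convert_slow", "convert_fast"):
        print(f"{name}: {elapsed[name]:.4f}s ({share(name):.0%} of measured time)")
```

As in the worked example, this sketch folds the hypothesis test and the fix into one change. If the two timings come out close, that alone does not say whether the loop was the bottleneck or the fix simply did not help.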

1.5 Perceived Performance