Your general benchmark suite should be based on real functions used in the end application, but at the same time should not rely on user input, as this can make measurements difficult. Any variability in input times or any other part of the application should either be eliminated from the benchmarks or precisely identified and specified within the performance targets. There may be variability, but it must be controlled and reproducible.

1.6.3 The Benchmark Harness

There are tools for testing applications in various ways. [2] These tools focus mostly on testing the robustness of the application, but as long as they measure and report times, they can also be used for performance testing. However, because their focus tends to be on robustness testing, many tools interfere with the application's performance, and you may not find a tool you can use adequately or cost-effectively. If you cannot find an acceptable tool, the alternative is to build your own harness.

[2] You can search the Web for java+perf+test to find performance-testing tools. In addition, some Java profilers are listed in Chapter 15.

Your benchmark harness can be as simple as a class that sets some values and then starts the main() method of your application. A slightly more sophisticated harness might turn on logging and timestamp all output for later analysis. GUI-run applications need a more complex harness and require either an alternative way to execute the graphical functionality without going through the GUI (which may depend on whether your design can support this), or a screen event capture and playback tool (several such tools exist [3]). In any case, the most important requirement is that your harness correctly reproduces user activity and data input and output. Normally, whatever regression-testing apparatus you have (and presumably are already using) can be adapted to form a benchmark harness.

[3] JDK 1.3 introduced a new java.awt.Robot class, which provides for generating native system-input events, primarily to support automated testing of Java GUIs.

The benchmark harness should not test the quality or robustness of the system. Operations should be normal: startup, shutdown, noninterrupted functionality. The harness should support the different configurations your application operates under, and any randomized inputs should be controlled; but note that the random sequence used in tests should be reproducible.
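
The simplest kind of harness described above might look something like the following sketch. It is only an illustration under stated assumptions: DemoApp is a hypothetical stand-in for your application, the seed and count arguments are invented for the example, and System.nanoTime() assumes a JDK more recent than the ones discussed in this chapter. The harness sets some values, starts the application's main() method, timestamps each result line, and uses a fixed random seed so that the "random" input sequence is the same on every run.

```java
import java.util.Random;

/** Hypothetical stand-in for the real application; for illustration only. */
class DemoApp {
    static long checksum; // result the harness can inspect after a run

    public static void main(String[] args) {
        long seed = Long.parseLong(args[0]);
        int count = Integer.parseInt(args[1]);
        Random random = new Random(seed); // fixed seed: same inputs on every run
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += random.nextInt(1000);
        }
        checksum = sum;
    }
}

/** Minimal harness: sets some values, starts main(), and timestamps the output. */
class SimpleHarness {

    /** Runs the application once and returns the elapsed time in nanoseconds. */
    static long timeRun(long seed, int count) {
        String[] args = { Long.toString(seed), Integer.toString(count) };
        long start = System.nanoTime();
        DemoApp.main(args);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int run = 1; run <= 5; run++) {
            long nanos = timeRun(42L, 1_000_000);
            // Each line starts with a wall-clock timestamp for later analysis.
            System.out.printf("%d run=%d elapsedMs=%.3f checksum=%d%n",
                    System.currentTimeMillis(), run, nanos / 1e6, DemoApp.checksum);
        }
    }
}
```

Because the seed is fixed, every run should report the same checksum; a differing checksum would suggest that the input sequence is not actually under the harness's control.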
You should use a realistic amount of randomized data and input. It is helpful if the benchmark harness includes support for logging statistics and easily allows new tests to be added. The harness should be able to reproduce and simulate all user input, including GUI input, and should test the system across all scales of intended use, up to the maximum numbers of users, objects, throughputs, etc. You should also validate your benchmarks, checking some of the values against actual clock time to ensure that no systematic or random bias has crept into the benchmark harness.

For the multiuser case, the benchmark harness must be able to simulate multiple users working, including variations in user access and execution patterns. Without this support for variations in activity, the multiuser tests inevitably miss many bottlenecks encountered in actual deployment and, conversely, do encounter artificial bottlenecks that are never encountered in deployment, wasting time and resources. It is critical in multiuser and distributed applications that the benchmark harness correctly reproduces user-activity variations, delays, and data flows.
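
As a rough sketch of the multiuser case, the following class simulates several concurrent users whose operation mix and delays vary from user to user but are repeatable from run to run. Everything named here is an assumption made for illustration: performOperation() is a hypothetical placeholder for a real call into your system, and the user count, seed, and think-time values are arbitrary. The sketch also relies on java.util.concurrent and lambdas, which postdate the JDK versions discussed in this chapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Simulates several users with varied but reproducible activity patterns. */
class MultiUserHarness {

    /** Hypothetical placeholder for one user operation on the system under test. */
    static void performOperation(int userId, int opType) {
        Math.sqrt(userId * 31.0 + opType); // replace with a real application call
    }

    /** Runs one simulated user; returns nanoseconds spent inside operations. */
    static long runUser(int userId, long baseSeed, int operations, int maxThinkMillis)
            throws InterruptedException {
        Random random = new Random(baseSeed + userId); // per-user seed: varied, repeatable
        long busyNanos = 0;
        for (int i = 0; i < operations; i++) {
            int opType = random.nextInt(3);                   // vary the access pattern
            Thread.sleep(random.nextInt(maxThinkMillis + 1)); // simulated user delay
            long start = System.nanoTime();
            performOperation(userId, opType);
            busyNanos += System.nanoTime() - start;
        }
        return busyNanos;
    }

    /** Runs all users concurrently; returns each user's busy time in nanoseconds. */
    static List<Long> runAll(int users, long baseSeed, int operations, int maxThinkMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                final int userId = u;
                Callable<Long> task =
                        () -> runUser(userId, baseSeed, operations, maxThinkMillis);
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get()); // wait for each simulated user to finish
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("simulated user failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long wallStart = System.currentTimeMillis();
        List<Long> busy = runAll(4, 7L, 20, 5);
        long wallMillis = System.currentTimeMillis() - wallStart;
        for (int u = 0; u < busy.size(); u++) {
            System.out.printf("user=%d busyMs=%.3f%n", u, busy.get(u) / 1e6);
        }
        // Logged so the harness's own figures can be checked against clock time.
        System.out.printf("wallClockMs=%d%n", wallMillis);
    }
}
```

Seeding each user from a shared base seed plus the user's ID is one simple way to get different behavior per user while keeping the whole run repeatable; a real harness would more likely replay recorded user sessions or a configured activity profile.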

1.6.4 Taking Measurements