Optimizations should also be completely documented. It is often useful to retain the previous code
in comments for maintenance purposes, especially as some kinds of optimized code can be more difficult to understand and therefore to maintain.
It is typically better and easier to tune multiuser applications in single-user mode first. Many multiuser applications can obtain 90% of their final tuned performance if you tune in single-user mode and then identify and tune just a few major multiuser bottlenecks, which are typically a trade-off between single-user performance and general system throughput. Occasionally, though, there will be serious conflicts that are revealed only during multiuser testing, such as transaction conflicts that can slow an application to a crawl. These may require a redesign or rearchitecting of the application. For this reason, some basic multiuser tests should be run as early as possible to flush out potential multiuser-specific performance problems.
Tuning distributed applications requires access to the data being transferred between the various parts of the application. At the lowest level, this can be a packet sniffer on the network or server machine. One step up from this is to wrap all the external communication points of the application so that you can record all data transfers. Relay servers are also useful: these are small applications that just reroute data between two communication points. Most useful of all is a trace or debug mode in the communications layer that allows you to examine the higher-level calls and communication between distributed parts.
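Wrapping a communication point can be as simple as a stream filter. The following is a minimal sketch, assuming the application writes its outgoing data to a java.io.OutputStream (for example, one obtained from a Socket); the class name RecordingOutputStream is hypothetical. It forwards every byte to the real destination, copies it to a second stream that serves as the record, and counts the bytes transferred.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper: forwards writes to the real stream and copies them to a record.
public class RecordingOutputStream extends FilterOutputStream {
    private final OutputStream record;
    private long bytesWritten;

    public RecordingOutputStream(OutputStream target, OutputStream record) {
        super(target);          // 'out' now refers to the real destination
        this.record = record;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        record.write(b);
        bytesWritten++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // FilterOutputStream's default writes one byte at a time; forward the whole block instead.
        out.write(b, off, len);
        record.write(b, off, len);
        bytesWritten += len;
    }

    @Override
    public void flush() throws IOException {
        out.flush();
        record.flush();
    }

    public long getBytesWritten() {
        return bytesWritten;
    }

    public static void main(String[] args) throws IOException {
        // 'wire' stands in for a real communication stream such as a socket's output stream.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        RecordingOutputStream out = new RecordingOutputStream(wire, log);
        out.write("hello".getBytes(StandardCharsets.UTF_8));
        out.flush();
        System.out.println(out.getBytesWritten() + " bytes recorded: " + log.toString("UTF-8"));
    }
}
```

In a real application the record stream would more likely be a file, and a matching InputStream wrapper would capture incoming data in the same way.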
1.7 What to Measure
The main measurement is always wall-clock time. You should use this measurement to specify almost all benchmarks, as it's the real-time interval that is most appreciated by the user. There are certain situations, however, in which system throughput might be considered more important than wall-clock time, e.g., servers, enterprise transaction systems, and batch or background systems.
The obvious way to measure wall-clock time is to get a timestamp using System.currentTimeMillis() and then subtract this from a later timestamp to determine the elapsed time. This works well for elapsed time measurements that are not short.[5]
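A minimal sketch of this timestamp-difference approach follows. The operation being timed (operationUnderTest) and the repetition count are hypothetical placeholders; the operation is repeated inside a single pair of currentTimeMillis() calls so that the measured interval is long relative to the cost of the calls themselves.

```java
public class ElapsedTime {
    // Accumulates results so the JIT compiler cannot discard the timed work as dead code.
    static double sink;

    // Hypothetical operation under test; substitute the code you want to time.
    static double operationUnderTest() {
        double sum = 0;
        for (int i = 1; i <= 1_000_000; i++) {
            sum += Math.sqrt(i);
        }
        return sum;
    }

    // Returns the wall-clock milliseconds taken to run the operation the given number of times.
    static long timeRepeated(int repetitions) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < repetitions; i++) {
            sink += operationUnderTest();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Choose the repetition count so the total interval is well above 100 ms,
        // keeping the two currentTimeMillis() calls a negligible part of it.
        int repetitions = 200;
        long elapsed = timeRepeated(repetitions);
        System.out.println("Total elapsed: " + elapsed + " ms");
        System.out.println("Average per call: " + (double) elapsed / repetitions + " ms");
    }
}
```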
Other types of measurements have to be system-specific and often application-specific. You can measure:
• CPU time (the time allocated on the CPU for a particular procedure)
• The number of runnable processes waiting for the CPU (this gives you an idea of CPU contention)
• Paging of processes
• Memory sizes
• Disk throughput
• Disk scanning times
• Network traffic, throughput, and latency
• Transaction rates
• Other system values

However, Java doesn't provide mechanisms for measuring these values directly, and measuring them requires at least some system knowledge, and usually some application-specific knowledge (e.g., what is a transaction for your application?).

[5] System.currentTimeMillis() can take up to half a millisecond to execute. Any measurement including the two calls needed to measure the time difference should be over an interval greater than 100 milliseconds to ensure that the cost of the System.currentTimeMillis() calls is less than 1% of the total measurement. I generally recommend that you do not make more than one time measurement (i.e., two calls to System.currentTimeMillis()) per second.
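From Java 5 onward, the java.lang.management package covers the first item on the list for the current thread: ThreadMXBean.getCurrentThreadCpuTime() reports the thread's CPU time in nanoseconds on JVMs that support and enable it. The sketch below compares that with wall-clock time; busyWork is a hypothetical CPU-bound workload, and the sleep adds idle time that shows up only in the wall-clock figure.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVsWallClock {
    // Hypothetical CPU-bound workload.
    static long busyWork(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Thread CPU timing is optional: check that it is both supported and enabled.
        boolean cpuAvailable = threads.isCurrentThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();

        long wallStart = System.currentTimeMillis();
        long cpuStart = cpuAvailable ? threads.getCurrentThreadCpuTime() : -1L;

        long result = busyWork(50_000_000);
        Thread.sleep(200); // idle: counts toward wall-clock time but not CPU time

        long wallMillis = System.currentTimeMillis() - wallStart;
        System.out.println("result = " + result);
        System.out.println("wall-clock ms = " + wallMillis);
        if (cpuAvailable) {
            // getCurrentThreadCpuTime() is in nanoseconds; convert to milliseconds.
            long cpuMillis = (threads.getCurrentThreadCpuTime() - cpuStart) / 1_000_000L;
            System.out.println("CPU ms (this thread) = " + cpuMillis);
        } else {
            System.out.println("Thread CPU time is not available on this JVM");
        }
    }
}
```

The remaining items on the list (paging, disk, and network figures) still generally come from operating system tools rather than from Java.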
You need to be careful when running tests that have small differences in timings. The first test is usually slightly slower than any other tests. Try doubling the test run so that each test is run twice within the VM (e.g., rename main() to maintest(), and call maintest() twice from a new main()).
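A minimal sketch of the rename-and-run-twice pattern; the class name TwiceInVm and its string-building workload are hypothetical stand-ins for whatever the original main() did.

```java
public class TwiceInVm {
    // Hypothetical workload standing in for the original test body.
    static int buildDigits(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i % 10);
        }
        return sb.length();
    }

    // The original main(), renamed so it can be called more than once.
    public static void maintest(String[] args) {
        long start = System.currentTimeMillis();
        int length = buildDigits(2_000_000);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Built " + length + " chars in " + elapsed + " ms");
    }

    // The new main(): runs the test twice within the same VM. The first run also
    // pays one-off costs (for example, class loading and JIT compilation), so the
    // second run's timing is usually the more representative one.
    public static void main(String[] args) {
        maintest(args);
        maintest(args);
    }
}
```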
There are almost always small variations between test runs, so always use averages to measure differences and consider whether those differences are relevant by calculating the variance in the results.
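As a sketch of averaging repeated runs and checking their spread, the following times a hypothetical workload (work) several times, then reports the mean, the sample variance, and the standard deviation. As a rough rule of thumb, a difference between two versions that is smaller than about one standard deviation of their runs is likely to be noise rather than a real improvement.

```java
public class RunStatistics {
    // Keeps results live so the JIT compiler cannot discard the timed work.
    static double sink;

    // Hypothetical workload to be timed.
    static double work() {
        double sum = 0;
        for (int i = 1; i <= 2_000_000; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    static double mean(long[] samples) {
        double total = 0;
        for (long s : samples) {
            total += s;
        }
        return total / samples.length;
    }

    // Sample variance (sum of squared deviations divided by n - 1); needs at least two samples.
    static double variance(long[] samples) {
        double m = mean(samples);
        double squares = 0;
        for (long s : samples) {
            double d = s - m;
            squares += d * d;
        }
        return squares / (samples.length - 1);
    }

    public static void main(String[] args) {
        long[] samples = new long[10];
        for (int i = 0; i < samples.length; i++) {
            long start = System.currentTimeMillis();
            sink += work();
            samples[i] = System.currentTimeMillis() - start;
        }
        double var = variance(samples);
        System.out.printf("mean = %.2f ms, variance = %.2f ms^2, std dev = %.2f ms%n",
                mean(samples), var, Math.sqrt(var));
    }
}
```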
For distributed applications, you need to break down measurements into times spent on each component, times spent preparing data for transfer and recovering it after transfer (e.g., marshalling and unmarshalling objects and writing to and reading from a buffer), and times spent in network transfer. Each separate machine used on the networked system needs to be monitored during the test if any system parameters are to be included in the measurements. Timestamps must be synchronized across the system; this can be done by measuring offsets from one reference machine at the beginning of tests. Taking measurements consistently from distributed systems can be challenging, and it is often easier to focus on one machine, or one communication layer, at a time. This is usually sufficient for most tuning.
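One common way to measure such an offset is the midpoint estimate used by Cristian's clock-synchronization algorithm, shown here as one possible technique rather than a prescribed one. A machine records its local time t0, asks the reference machine for its clock reading, and records its local time t1 when the reply arrives. Assuming the request and reply take roughly equal time on the network, the reference reading corresponds to the local midpoint (t0 + t1) / 2. The class and method names below are hypothetical, and the network exchange is simulated.

```java
public class ClockOffset {
    // Estimates how far the reference clock is ahead of the local clock, in ms.
    // t0: local time just before the request; refTime: reference clock reading
    // in the reply; t1: local time just after the reply arrives.
    // Assumes symmetric network delay, so refTime matches the local midpoint.
    static long estimateOffset(long t0, long refTime, long t1) {
        return refTime - (t0 + t1) / 2;
    }

    // Converts a local timestamp into the reference machine's time base.
    static long toReferenceTime(long localTime, long offset) {
        return localTime + offset;
    }

    public static void main(String[] args) {
        // Simulated exchange: the "reference machine" is modeled as a clock
        // running 250 ms ahead of this one (a hypothetical skew).
        long simulatedSkew = 250;
        long t0 = System.currentTimeMillis();
        long refTime = System.currentTimeMillis() + simulatedSkew;
        long t1 = System.currentTimeMillis();

        long offset = estimateOffset(t0, refTime, t1);
        System.out.println("Estimated offset: " + offset + " ms");
        System.out.println("Now in reference time: "
                + toReferenceTime(System.currentTimeMillis(), offset));
    }
}
```

Each machine would compute its offset once at the start of a test and then convert all of its recorded timestamps with toReferenceTime before the results are merged.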
1.8 Don't Tune What You Don't Need to Tune