The trace corresponding to this second entry in the summary example turns out to be another truncated trace, but the example shows the same method in 14th position, and the trace for that entry identifies the Double.equals call as coming from the Hashtable.put call. Unfortunately for tuning purposes, the Double.equals method itself is already quite fast and cannot be optimized further.
When methods cannot be directly optimized, the next best choice is to reduce the number of times they are called or even avoid the methods altogether. In fact, eliminating method calls is actually the better tuning choice, but it is often considerably more difficult to achieve and so is not a first-choice tactic for optimization. The object-creation profile and the method profile together point to the FloatingDecimal class as being a huge bottleneck, so avoiding this class is the obvious tuning tactic here. In Chapter 5, I employ this technique, avoiding the default call through the FloatingDecimal class for the case of converting floating-point numbers to Strings, and I obtain an order-of-magnitude improvement. Basically, the strategy is to create a more efficient routine to run the equivalent conversion functionality, and then replace the calls to the underperforming FloatingDecimal methods with calls to the more efficient optimized methods.

The best way to avoid the Double.equals method is to replace the hash table with another implementation that stores double primitive data types directly rather than requiring the doubles to be wrapped in a Double object. This allows the == operator to make the comparison in the put method, thus completely avoiding the Double.equals call. This is another standard tuning tactic, where a data structure is replaced with a more appropriate and faster one for the task.
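To make the data-structure replacement concrete, here is a minimal sketch of a hash table keyed on primitive doubles. The class name DoubleIntMap, its int values, and its linear-probing layout are assumptions chosen for illustration, not the implementation used elsewhere in this book. Note that == does not behave exactly like Double.equals: NaN never equals itself, and -0.0 == 0.0 is true, so this sketch rejects NaN keys and folds -0.0 into 0.0.

```java
/** Hypothetical sketch: an open-addressing map from primitive double keys to int values. */
class DoubleIntMap {
    private double[] keys;
    private int[] values;
    private boolean[] used;
    private int size;

    DoubleIntMap(int expectedEntries) {
        int cap = 16;
        while (cap < expectedEntries * 2) cap <<= 1; // power of two, at most half full
        keys = new double[cap];
        values = new int[cap];
        used = new boolean[cap];
    }

    private static double normalize(double key) {
        if (key != key) throw new IllegalArgumentException("NaN keys are not supported");
        return key == 0.0d ? 0.0d : key; // fold -0.0 into +0.0 so both hash alike
    }

    /** Returns the slot holding the key, or the empty slot where it belongs. */
    private int slotFor(double key) {
        long bits = Double.doubleToLongBits(key);
        int h = (int) (bits ^ (bits >>> 32));
        h ^= (h >>> 16);
        int i = h & (keys.length - 1);
        while (used[i] && keys[i] != key) {  // primitive comparison, no Double.equals
            i = (i + 1) & (keys.length - 1); // linear probing
        }
        return i;
    }

    void put(double key, int value) {
        key = normalize(key);
        if ((size + 1) * 2 > keys.length) grow();
        int i = slotFor(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int get(double key, int missing) {
        int i = slotFor(normalize(key));
        return used[i] ? values[i] : missing;
    }

    int size() {
        return size;
    }

    private void grow() {
        double[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new double[oldKeys.length * 2];
        values = new int[keys.length];
        used = new boolean[keys.length];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) put(oldKeys[i], oldValues[i]);
        }
    }

    public static void main(String[] args) {
        DoubleIntMap map = new DoubleIntMap(8);
        map.put(3.25, 1);
        map.put(3.25, 2); // overwrites; no Double wrapper is created
        System.out.println(map.get(3.25, -1) + " " + map.size());
    }
}
```

Because both keys and values are held in primitive arrays, a put neither allocates a Double wrapper nor calls Double.equals, which addresses both the object-creation cost and the method-call cost identified by the profile.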
The 1.1 profiling output is quite different and much less like a standard profiler's output. Running the 1.1 profiler with this program (details of this output are given in Section 2.3.4) gives:
count callee caller time
21 java/lang/System.gc()V java/lang/FloatingDecimal.dtoa(IJI)V 760
8 java/lang/System.gc()V java/lang/Double.equals(Ljava/lang/Object;)Z 295
2 java/lang/Double.doubleToLongBits(D)J java/lang/Double.equals(Ljava/lang/Object;)Z 0
I have shown only the top four lines from the output. This output actually identifies both the FloatingDecimal.dtoa and the Double.equals methods as taking the vast majority of the time: the reported times amount to around 70% and 25% of the total program time for the two methods, respectively. Since the callee for these methods is listed as System.gc, this also shows that the methods are significantly involved in memory allocation, and suggests that the next tuning step might be to analyze the object-creation output for this program.
2.3.2 Java 2 cpu=samples Profile Output
The default profile output gained from executing with -Xrunhprof in Java 2 is not useful for method profiling. The default output generates object-creation statistics from the heap as the dump output occurs. By default, the dump occurs when the application terminates; you can modify the dump time by typing Ctrl-\ on Solaris and other Unix systems, or Ctrl-Break on Win32. To get a useful method profile, you need to modify the profiler options to specify method profiling. A typical call to achieve this is:
java -Xrunhprof:cpu=samples,thread=y classname
Note that in a Windows command-line prompt, you need to surround the option with double quotes because the equals sign is considered a metacharacter.
Note that -Xrunhprof has an h in it. There seems to be an undocumented feature of the VM in which the option -Xrunsomething makes the VM try to load a shared library called something, e.g., using -Xrunprof results in the VM trying to load a shared library called prof. This can be quite confusing if you are not expecting it. In fact, -Xrunhprof loads the hprof shared library.
The profiling option in JDK 1.2/1.3 can be pretty flaky. Several of the options can cause the runtime to crash (core dump). The output is a large file, since huge amounts of trace data are written rather than summarized. Since the profile option is essentially a Sun engineering tool, it has had limited resources applied to it, especially as Sun has a separate (not free) profile tool that Sun engineers would normally use. Another tool that Sun provides to analyze the output of the profiler is called the heap-analysis tool (search http://www.java.sun.com for HAT). But this tool analyzes only the object-creation statistics output gained with the default profile output, and so is not that useful for method profiling (see Section 2.4 for slightly more about this tool).
Nevertheless, I expect the free profiling option to stabilize and be more useful in future versions. The output when run with the options already listed (cpu=samples, thread=y) already results in fairly usable information. This profiling mode operates by periodically sampling the stack. Each unique stack trace provides a TRACE entry in the second section of the file, describing the method calls on the stack for that trace. Multiple identical samples are not listed; instead, the number of hits for each trace is summarized in the third section of the file. The profile output file in this mode has three sections:
Section 1
A standard header section describing possible monitored entries in the file. For example:

WARNING This file format is under development, and is subject to change without notice.

This file contains the following types of records:

THREAD START
THREAD END      mark the lifetime of Java threads
TRACE           represents a Java stack trace. Each trace consists
                of a series of stack frames. Other records refer to
                TRACEs to identify (1) where object allocations have
                taken place, (2) the frames in which GC roots were
                found, and (3) frequently executed methods.
Section 2
Individual entries describing monitored events, i.e., threads starting and terminating, but mainly sampled stack traces. For example:

THREAD START (obj=8c2640, id = 6, name=Thread-0, group=main)
THREAD END (id = 6)
TRACE 1:
    <empty>
TRACE 964:
    java/io/ObjectInputStream.readObject(ObjectInputStream.java:Compiled method)
    java/io/ObjectInputStream.inputObject(ObjectInputStream.java:Compiled method)
    java/io/ObjectInputStream.readObject(ObjectInputStream.java:Compiled method)
    java/io/ObjectInputStream.inputArray(ObjectInputStream.java:Compiled method)
TRACE 1074:
    java/io/BufferedInputStream.fill(BufferedInputStream.java:Compiled method)
    java/io/BufferedInputStream.read1(BufferedInputStream.java:Compiled method)
    java/io/BufferedInputStream.read(BufferedInputStream.java:Compiled method)
    java/io/ObjectInputStream.read(ObjectInputStream.java:Compiled method)
Section 3
A summary table of methods ranked by the number of times the unique stack trace for that method appears. For example:

CPU SAMPLES BEGIN (total = 512371) Thu Aug 26 18:37:08 1999
rank   self  accum  count trace method
   1 16.09% 16.09%  82426  1121 java/io/FileInputStream.read
   2  6.62% 22.71%  33926   881 java/io/ObjectInputStream.allocateNewObject
   3  5.11% 27.82%  26185   918 java/io/ObjectInputStream.inputClassFields
   4  4.42% 32.24%  22671   887 java/io/ObjectInputStream.inputObject
   5  3.20% 35.44%  16392   922 java/lang/reflect/Field.set
Section 3 is the place to start when analyzing this profile output. It consists of a table with six fields, headed rank, self, accum, count, trace, and method, as shown. These fields are used as follows:

rank
This column simply counts the entries in the table, starting with 1 at the top, and incrementing by 1 for each entry.

self
The self field is usually interpreted as a percentage of the total running time spent in this method. More accurately, this field reports the percentage of samples that have the stack given by the trace field. Here's a one-line example:
rank   self  accum  count trace method
   1 11.55% 11.55%  18382   545 java/lang/FloatingDecimal.dtoa
This example shows that stack trace 545 occurred in 18,382 of the sampled stack traces, and this is 11.55% of the total number of stack trace samples made. It indicates that this method was probably executing for about 11.55% of the application execution time, because the samples are taken at regular intervals. You can identify the precise trace from the second section of the profile output by searching for the trace with identifier 545. For the previous example, this trace was:
TRACE 545: (thread=1)
    java/lang/FloatingDecimal.dtoa(FloatingDecimal.java:Compiled method)
    java/lang/FloatingDecimal.<init>(FloatingDecimal.java:Compiled method)
    java/lang/Double.toString(Double.java:Compiled method)
    java/lang/String.valueOf(String.java:Compiled method)
This TRACE entry clearly identifies the exact method and its caller. Note that the stack is reported to a depth of four methods. This is the default depth: the depth can be changed using the depth parameter to the -Xrunhprof option (e.g., -Xrunhprof:depth=6,cpu=samples,...).

accum
This field is a running additive total of all the self field percentages as you go down the table: for the Section 3 example shown previously, the third line lists 27.82% for the accum field, indicating that the sum total of the first three lines of the self field is 27.82%.

count
This field indicates how many times the unique stack trace that gave rise to this entry was sampled while the program ran.

trace
This field shows the unique trace identifier from the second section of profile output that generated this entry. The trace is recorded only once in the second section no matter how many times it is sampled; the number of times that this trace has been sampled is listed in the count field.

method
This field shows the method name from the top line of the stack trace referred to from the trace field, i.e., the method that was running when the stack was sampled. This summary table lists only the method name and not its argument types. Therefore, if the method is overloaded with several possible argument types, it is frequently necessary to refer to the stack itself to determine the exact method. The stack is given by the trace identifier in the trace field, which in turn references the trace from the second section of the profile output. If a method is called in different ways, it may also give rise to different stack traces. Sometimes the same method call can be listed in different stack traces due to lost information. Each of these different stack traces results in a different entry in the third section of the profiler's output, even though the method field is the same. For example, it is perfectly possible to see several lines with the same method field, as in the following table segment:
rank   self  accum  count trace method
  95   1.1% 51.55%    110   699 java/lang/StringBuffer.append
 110   1.0% 67.35%    100   711 java/lang/StringBuffer.append
 128   1.0% 85.35%     99   332 java/lang/StringBuffer.append
When traces 699, 711, and 332 are analyzed, one trace might be StringBuffer.append(boolean), while the other two traces could both be StringBuffer.append(int), but called from two different methods and so giving rise to two different stack traces and consequently two different lines in the summary example. Note that the trace does not identify actual method signatures, only method names. Line numbers are given if the class was compiled so that line numbers remain. This ambiguity can be a nuisance at times.
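The relationship between the count, self, and accum fields can be checked with a few lines of arithmetic. The following is a minimal sketch that recomputes both percentage columns from the sample counts of the Section 3 example; the class and method names are my own for illustration and are not part of any profiler API.

```java
import java.util.Locale;

/** Hypothetical helper that recomputes the self and accum columns from sample counts. */
class CpuSamplesColumns {

    /** The "self" column: percentage of all samples that hit one unique trace. */
    static double selfPercent(long count, long total) {
        return 100.0 * count / total;
    }

    /** The "accum" column: running total of the self percentages, row by row. */
    static double[] accum(long[] counts, long total) {
        double[] result = new double[counts.length];
        long running = 0;
        for (int i = 0; i < counts.length; i++) {
            running += counts[i];
            result[i] = 100.0 * running / total;
        }
        return result;
    }

    public static void main(String[] args) {
        long total = 512371;                                  // from "CPU SAMPLES BEGIN"
        long[] counts = {82426, 33926, 26185, 22671, 16392};  // count column, ranks 1 to 5
        double[] accum = accum(counts, total);
        for (int i = 0; i < counts.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%d %.2f%% %.2f%% %d",
                    i + 1, selfPercent(counts[i], total), accum[i], counts[i]));
        }
    }
}
```

Rounded to two decimal places, the recomputed values agree with the self and accum columns of the example table, which confirms that both fields are derived purely from the sample counts.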
The profiler in this mode (cpu=samples) is useful enough to suffice when you have no better alternative. It does have an effect on real measured times, slowing down operations by variable amounts even within one application run. But it normally indicates major bottlenecks, although sometimes a little extra work is necessary to sort out multiple identical method-name references.
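Part of that extra work can be automated. As a rough sketch, the following hypothetical helper sums the count field over all summary rows that share a method name, so that a method spread across several stack traces shows its combined weight. It assumes each row has the six whitespace-separated fields described above (rank, self, accum, count, trace, method); the class name is my own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that totals sample counts per method name. */
class MethodSampleTotals {

    /** Sums the count field (fourth column) for every row with the same method field. */
    static Map<String, Long> totalsByMethod(String[] rows) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.trim().split("\\s+");
            if (fields.length < 6) {
                continue; // not a summary row
            }
            long count;
            try {
                count = Long.parseLong(fields[3]);
            } catch (NumberFormatException e) {
                continue; // header line or other non-data row
            }
            totals.merge(fields[5], count, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] rows = {
            "rank self accum count trace method",
            "95 1.1% 51.55% 110 699 java/lang/StringBuffer.append",
            "110 1.0% 67.35% 100 711 java/lang/StringBuffer.append",
            "128 1.0% 85.35% 99 332 java/lang/StringBuffer.append"
        };
        System.out.println(totalsByMethod(rows));
    }
}
```

The combined total still cannot separate overloaded methods, since the summary table carries only method names; for that you need to inspect the individual traces.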
Using the alternative cpu=times mode, the profile output gives a different view of application execution. In this mode, the method times are measured from method entry to method exit, including the time spent in all other calls the method makes. This profile of an application gives a tree-like view of where the application is spending its time. Some developers are more comfortable with this mode for profiling the application, but I find that it does not directly identify bottlenecks in the code.
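The difference between the two views can be illustrated with a small call-tree model. This is only a sketch: the CallNode class is hypothetical, and the millisecond figures are invented for illustration rather than taken from any profile run. Each node holds the time spent in the method's own code, and the entry-to-exit time is obtained by adding the times of everything it calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical call-tree node contrasting self time with entry-to-exit time. */
class CallNode {
    final String method;
    final long selfMillis; // time spent in this method's own code (invented figures)
    private final List<CallNode> callees = new ArrayList<>();

    CallNode(String method, long selfMillis) {
        this.method = method;
        this.selfMillis = selfMillis;
    }

    /** Records that this method calls the given method; returns this for chaining. */
    CallNode calls(CallNode callee) {
        callees.add(callee);
        return this;
    }

    /** Entry-to-exit time: own time plus the inclusive time of every callee. */
    long inclusiveMillis() {
        long total = selfMillis;
        for (CallNode callee : callees) {
            total += callee.inclusiveMillis();
        }
        return total;
    }

    public static void main(String[] args) {
        CallNode dtoa = new CallNode("FloatingDecimal.dtoa", 40);
        CallNode init = new CallNode("FloatingDecimal.<init>", 3).calls(dtoa);
        CallNode toStr = new CallNode("Double.toString", 2).calls(init);
        CallNode valueOf = new CallNode("String.valueOf", 1).calls(toStr);
        for (CallNode n : new CallNode[] {valueOf, toStr, init, dtoa}) {
            System.out.println(n.method + " self=" + n.selfMillis
                    + " inclusive=" + n.inclusiveMillis());
        }
    }
}
```

In this model the outermost caller always reports the largest entry-to-exit time, even though nearly all of the work happens at the bottom of the chain, which is why the tree view needs more interpretation before it points at a bottleneck.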
2.3.3 HotSpot and 1.3 -Xprof Profile Output