0.7 2 + 3 java.lang.StrictMath.floor
0.5 3 + 1 java.lang.Double.longBitsToDouble
Section 5 A list of internal VM function calls. Listed in order of the total number of ticks counted
while the method was at the top of the stack. Not tuneable. For example:
Runtime stub + native Method
0.1 1 + 0 interpreter_entries
0.1 1 + 0 Total runtime stubs
Section 6 Other miscellaneous entries not included in the previous sections:
Thread-local ticks:
1.4 10 classloader
0.1 1 Interpreter
11.7 86 Unknown code
Section 7 A global summary of ticks recorded. This includes ticks from the garbage collector, thread-locking overheads, and other miscellaneous entries:
Global summary of 7.57 seconds:
100.0 754 Received ticks
1.9 14 Received GC ticks
0.3 2 Other VM operations
The entries at the top of Section 3 are the methods that probably need tuning. Any method listed near the top of Section 2 should have been targeted by the HotSpot optimizer and may be listed
lower down in Section 3. Such methods may still need to be optimized, but it is more likely that the methods at the top of Section 3 are what need optimizing. The ticks for the two sections are the
same, so you can easily compare the time taken up by the top methods in the different sections and decide which to target.
2.3.4 JDK 1.1.x -prof and Java 2 cpu=old Profile Output
The JDK 1.1.x method-profiling output, obtained by running with the -prof option, is quite different from the normal 1.2 output. This output format is also supported in Java 2, using the cpu=old variation of the -Xrunhprof option. The output file consists of four sections:
Section 1 The method profile table, showing the cumulative time spent in each method executed. The table is sorted on the first field, count; for example:
count callee caller time
29 java/lang/System.gc()V java/io/FileInputStream.read([B)I 10263
1 java/io/FileOutputStream.writeBytes([BII)V java/io/FileOutputStream.write([BII)V 0
Section 2 One line describing high-water gross memory usage. For example:
handles_used: 1174, handles_free: 339046, heap-used: 113960, heap-free: 21794720
The line reports the number of handles and the number of bytes used by the heap memory storage over the application's lifetime. A handle is an object reference. The number of handles used is the maximum number of objects that existed at any one time in the application (handles are recycled by the garbage collector, so over its lifetime the application could have used many more objects than are listed). The heap measurements are in bytes.
Section 3 Reports the number of primitive data type arrays left at the end of the process, just before
process termination. For example:
sig count bytes indx
[C 174 19060 5
[B 5 19200 8
This section has four fields. The first field gives the array dimensions and primitive data type, using letter codes listed shortly; the second field is the number of arrays; and the third is the total number of bytes used by all the arrays. This example shows 174 char arrays taking a combined space of 19,060 bytes, and 5 byte arrays taking a combined space of 19,200 bytes.
The reported data does not include any arrays that may have been garbage collected before the end of the process. For this reason, the section is of limited use. You could use the -noasyncgc option to try to eliminate garbage collection if you have enough memory; you may also need -mx with a large number to boost the maximum memory available. If you do, also use -verbosegc so that if garbage collection is forced, you at least know that garbage collection has occurred and can get the basic number of objects and bytes reclaimed.
Section 4 The fourth section of the profile output is the per-object memory dump. Again, this includes
only objects left at the end of the process just before termination, not objects that may have been garbage-collected before the end of the process. For example:
tab[267] p=4bba378 cb=1873248 cnt=219 ac=3 al=1103
Ljava/util/HashtableEntry; 219 3504
[Ljava/util/HashtableEntry; 3 4412
This dump is a snapshot of the actual object table. The fields in the first line of an entry are:
tab[index] The entry location as listed in the object table. The index is of no use for performance tuning.
p=hex value Internal memory locations for the instance and class; of no use for performance tuning.
cb=hex value Internal memory locations for the instance and class; of no use for performance tuning.
cnt=integer The number of instances of the class reported on the next line.
ac=integer The number of instances of arrays of the class reported on the next line.
al=integer The total number of array elements for all the arrays counted in the previous ac field.
This first line of the example is followed by lines consisting of three fields: first, the class name (prefixed by the array dimension if the line refers to the array data); next, the number of instances of that class or array class; and last, the total amount of space used by all the instances, in bytes. So the example reports that there are 219 HashtableEntry instances taking a total of 3504 bytes between them,[5] and three HashtableEntry arrays having 1103 array indexes between them, which amounts to 4412 bytes between them, since each entry is a 4-byte object handle.
[5] A HashtableEntry has one int and three object handle instance variables, each of which takes 4 bytes, so each HashtableEntry is 16 bytes.
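The byte totals in the example dump can be checked with a little arithmetic. The following is only a sketch: the class and method names are hypothetical, and the 4-byte sizes are the ones stated in the text for this JDK 1.1-era VM, not values that hold for every JVM.

```java
// Hypothetical helper; sizes are the 4-byte values quoted in the text.
class HashtableEntrySizes {
    static final int HANDLE_BYTES = 4; // one object handle
    static final int INT_BYTES = 4;    // one int instance variable

    // One int plus three object handles per HashtableEntry.
    static int entryBytes() {
        return INT_BYTES + 3 * HANDLE_BYTES;
    }

    // Space for cnt instances of HashtableEntry.
    static int instancesBytes(int cnt) {
        return cnt * entryBytes();
    }

    // Space for al array elements, each holding one handle.
    static int arrayBytes(int al) {
        return al * HANDLE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(instancesBytes(219)); // 3504
        System.out.println(arrayBytes(1103));    // 4412
    }
}
```

Both results match the byte counts reported in the dump, which is a quick way to confirm you are reading the cnt and al fields correctly.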
The last two sections, Sections 3 and 4, give snapshots of the object table memory and can be used in an interesting way: run a garbage collection just before termination of your application. That leaves in the object table all the objects that are rooted[6] by the system and by your application from static variables. If this snapshot shows significantly more objects than you expect, you may be referencing more objects than you realized.
[6] Objects rooted by the system are objects the JVM runtime keeps alive as part of its runtime system. Rooted objects are generally objects that cannot be garbage collected because they are referenced in some way from other objects that cannot be garbage collected. The roots of these non-garbage-collectable objects are normally objects referenced from the stack, objects referenced from static variables of classes, and special objects the runtime system ensures are kept alive.
The first section of the profile output is the most useful, consisting of multiple lines, each of which specifies a method and its caller, together with the total cumulative time spent in that method and the total number of times it was called from that caller. The first line of this section specifies the four fields in the profile table in this section: count, callee, caller, and time. They are detailed here:
count The total number of times the callee method was called from the caller method, accumulating multiple executions of the caller method. For example, if foo1 calls foo2 10 times every time foo1 is executed, and foo1 was itself called three times during the execution of the program, the count field should hold the value 30 for the callee-caller pair foo2-foo1. The line in the table should look like this:
30 x/y/Z.foo2()V x/y/Z.foo1()V 1263
assuming the foo methods are in class x.y.Z and they both have a void return. The actual reported numbers may be less than the true number of calls: the profiler can miss calls.
callee The method that was called count times in total from the caller method. The callee can be listed in other entries as the callee method for different caller methods.
caller The method that called the callee method count times in total.
time The cumulative time (in milliseconds) spent in the callee method, including time when the callee method was calling other methods (i.e., when the callee method was in the stack but not at the top, and so was not the currently executing method).
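The count example can be reproduced with a small program. This is a minimal sketch in which a hand-maintained counter stands in for the profiler's count field; the class name CallCountDemo and the counter are hypothetical and are not part of any profiler API.

```java
// Hypothetical demo: a manual counter plays the role of the count field.
class CallCountDemo {
    static int foo2Calls = 0;

    static void foo2() {
        foo2Calls++; // one more foo2-from-foo1 call
    }

    static void foo1() {
        // foo1 calls foo2 10 times on every execution.
        for (int i = 0; i < 10; i++) {
            foo2();
        }
    }

    // Runs foo1 three times and returns the accumulated call count.
    static int run() {
        foo2Calls = 0;
        for (int i = 0; i < 3; i++) {
            foo1();
        }
        return foo2Calls;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 30
    }
}
```

A real profile may report fewer than 30 for this pair, since the profiler can miss calls.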
If each of the count calls in one line took exactly the same amount of time, then one call from caller to callee took time divided by count milliseconds. This first section is normally sorted into count order. However, for this profiler, the time spent in methods tends to be more useful. Because the times in the time field include the total time that the callee method was anywhere on the stack, interpreting the output of complex programs can be difficult without processing the table to subtract subcall times. This format is different from the 1.2 output with cpu=samples specified, and is closer to a 1.2 profile with cpu=times specified. The lines in the profile output are unique for each callee-caller pair, but any one callee method and any one caller method can (and normally do) appear in multiple lines. This is because any particular method can call many other methods, and so the method registers as the caller for multiple callee-caller pairs. Any particular method can also be called by many other methods, and so the method registers as the callee for multiple callee-caller pairs.
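The time-divided-by-count calculation is easy to script. The sketch below assumes each table line holds exactly four whitespace-separated fields (count, callee, caller, time) and that method signatures contain no spaces; the ProfLine class is a hypothetical helper, not part of the JDK.

```java
// Hypothetical parser for one line of the method profile table.
class ProfLine {
    final int count;
    final String callee;
    final String caller;
    final long timeMs;

    ProfLine(int count, String callee, String caller, long timeMs) {
        this.count = count;
        this.callee = callee;
        this.caller = caller;
        this.timeMs = timeMs;
    }

    // Assumes four whitespace-separated fields: count callee caller time.
    static ProfLine parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        return new ProfLine(Integer.parseInt(f[0]), f[1], f[2], Long.parseLong(f[3]));
    }

    // Average cumulative time per call; meaningful only if calls took similar times.
    double meanMillisPerCall() {
        return count == 0 ? 0.0 : (double) timeMs / count;
    }

    public static void main(String[] args) {
        ProfLine p = parse("30 x/y/Z.foo2()V x/y/Z.foo1()V 1263");
        System.out.println(p.callee + " averaged " + p.meanMillisPerCall() + " ms per call");
    }
}
```

Remember that the result is an average of cumulative times, so it includes time spent in any methods the callee itself called.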
The methods are written out using the internal Java syntax listed in Table 2-1.
Table 2-1, Internal Java Syntax for -prof Output Format
Internal Symbol    Java Meaning
/                  Replaces the . character in package names (e.g., java/lang/String stands for java.lang.String)
B                  byte
C                  char
D                  double
I                  int
F                  float
J                  long
S                  short
V                  void
Z                  boolean
[                  One array dimension (e.g., [[B stands for a two-dimensional array of bytes, such as new byte[3][4])
Lclassname;        A class (e.g., Ljava/lang/String; stands for java.lang.String)
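The codes in Table 2-1 can be translated mechanically. The following sketch assumes profile entries of the form package/Class.method(argtypes)returntype, with / as the package separator; SignatureDecoder and its method names are hypothetical, and malformed input is only minimally checked.

```java
// Hypothetical decoder for the internal syntax in Table 2-1.
class SignatureDecoder {

    // Decodes one type starting at pos[0] and advances pos[0] past it.
    static String decodeType(String sig, int[] pos) {
        int dims = 0;
        while (sig.charAt(pos[0]) == '[') { // each [ is one array dimension
            dims++;
            pos[0]++;
        }
        char code = sig.charAt(pos[0]++);
        String base;
        switch (code) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'V': base = "void"; break;
            case 'Z': base = "boolean"; break;
            case 'L': { // Lclassname; with / in place of .
                int end = sig.indexOf(';', pos[0]);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated class name: " + sig);
                }
                base = sig.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                break;
            }
            default:
                throw new IllegalArgumentException("Unknown type code: " + code);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    // Converts an entry such as java/io/FileOutputStream.write([BII)V to source-like form.
    static String decodeMethod(String entry) {
        int open = entry.indexOf('(');
        int close = entry.indexOf(')');
        if (open < 0 || close < open || close == entry.length() - 1) {
            throw new IllegalArgumentException("Not a method entry: " + entry);
        }
        String name = entry.substring(0, open).replace('/', '.');
        String args = entry.substring(open + 1, close);
        StringBuilder params = new StringBuilder();
        int[] pos = {0};
        while (pos[0] < args.length()) {
            if (params.length() > 0) {
                params.append(',');
            }
            params.append(decodeType(args, pos));
        }
        String returnType = decodeType(entry.substring(close + 1), new int[] {0});
        return returnType + " " + name + "(" + params + ")";
    }

    public static void main(String[] args) {
        System.out.println(decodeMethod("java/io/FileOutputStream.write([BII)V"));
    }
}
```

For the sample entry, main prints void java.io.FileOutputStream.write(byte[],int,int).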
There are free viewers, including source code, for viewing this format file:
• Vladimir Bulatov's HyperProf (search for HyperProf on the Web)
• Greg White's ProfileViewer (search for ProfileViewer on the Web)
• My own viewer (see ProfileStack: A Profile Viewer for Java 1.1)
ProfileStack: A Profile Viewer for Java 1.1
I have made my own viewer available, with source code. Under the tuning.profview package, the main class is tuning.profview.ProfileStack, which takes one argument, the name of the prof file. All classes from this book are available by clicking the Examples link from this book's catalog page, http://www.oreilly.com/catalog/javapt. My viewer analyzes the profile output file, combines identical callee methods to give a list of its callers, and maps codes into readable method names. The output to System.out looks like this:
time count localtime callee
19650 2607 19354 int ObjectInputStream.read()
    Called by
    % time count caller
    98.3 19335 46 short DataInputStream.readShort()
    1.1 227 1832 int DataInputStream.readUnsignedByte()
    0.2 58 462 int DataInputStream.readInt()
    0.1 23 206 int DataInputStream.readUnsignedShort()
    0.0 4 50 byte DataInputStream.readByte()
    0.0 1 9 boolean DataInputStream.readBoolean()
19342 387 19342 int SocketInputStream.socketRead(byte[],int,i
    Called by
    % time count caller
    100.0 19342 4 int SocketInputStream.read(byte[],int,i
15116 3 15116 void ServerSocket.implAccept(Socket)
    Called by
    % time count caller
    100.0 15116 3 Socket ServerSocket.accept()
Each main (nonindented) line of this output consists of a particular method (callee), showing the cumulative time in milliseconds for all the callers of that method, the cumulative count from all the callers, and the time actually spent in the method itself (not in any of the methods that it called). This last noncumulative time is found by identifying the times listed for all the calls made from the method (the entries in which it appears as the caller) and then subtracting the total time for all those calls from the cumulative time for this method. Each main line is followed by several lines breaking down all the methods that call this callee method, giving the percentage amongst them in terms of time, the cumulative time, the count of calls, and the name of the caller method. The methods are converted into normal Java source code syntax. The main lines are sorted by the time actually spent in the method (the third field, localtime, of the nonindented lines).
The biggest drawback to the 1.1 profile output is that threads are not indicated at all. This means that it is possible to get time values for method calls that are longer than the total time spent in running the application, since all the call times from multiple threads are added together. It also means that you cannot determine from which thread a particular method call was made. Nevertheless, after re-sorting the section on the time field, rather than the count field, the profile data is useful enough to suffice as a method profiler when you have no better alternative.
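The subtraction of subcall times and the re-sort on time can be sketched in a few lines. This is not the ProfileStack source; LocalTimeReport and Call are hypothetical names, and the sketch assumes the callee-caller entries have already been read from the profile table. Because the profile adds times from all threads together, and recursion is not treated specially, the local times it yields are approximations.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: local time = cumulative time - time of calls made from the method.
class LocalTimeReport {

    // One callee-caller entry; the count field is not needed for this calculation.
    static final class Call {
        final String callee;
        final String caller;
        final long timeMs;

        Call(String callee, String caller, long timeMs) {
            this.callee = callee;
            this.caller = caller;
            this.timeMs = timeMs;
        }
    }

    // Returns (method, local time) pairs, largest local time first.
    static List<Map.Entry<String, Long>> localTimes(List<Call> calls) {
        Map<String, Long> cumulative = new LinkedHashMap<>(); // summed where method is the callee
        Map<String, Long> subcalls = new LinkedHashMap<>();   // summed where method is the caller
        for (Call c : calls) {
            cumulative.merge(c.callee, c.timeMs, Long::sum);
            subcalls.merge(c.caller, c.timeMs, Long::sum);
        }
        Map<String, Long> local = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : cumulative.entrySet()) {
            local.put(e.getKey(), e.getValue() - subcalls.getOrDefault(e.getKey(), 0L));
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(local.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Call> calls = new ArrayList<>();
        calls.add(new Call("A.read", "B.readShort", 100)); // made-up sample data
        calls.add(new Call("A.read", "B.readInt", 50));
        calls.add(new Call("S.socketRead", "A.read", 120));
        for (Map.Entry<String, Long> e : localTimes(calls)) {
            System.out.println(e.getValue() + " " + e.getKey());
        }
    }
}
```

With the made-up sample data, A.read has a cumulative time of 150 ms but a local time of only 30 ms, because 120 ms was spent in S.socketRead.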
One problem I've encountered is the limited size of the list of methods that can be held by the internal profiler. Technically, this limitation is 10,001 entries in the profile table, and there is presumably one entry per method. There are four methods that help you avoid the limitation by profiling only a small section of your code:
sun.misc.VM.suspendJavaMonitor
sun.misc.VM.resumeJavaMonitor
sun.misc.VM.resetJavaMonitor
sun.misc.VM.writeJavaMonitorReport
These methods also allow you some control over which parts of your application are profiled and when to dump the results.
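One way to use these methods is to wrap the code of interest so that only it is profiled. The sketch below is speculative: it assumes the four methods are static and take no arguments, and it guesses at their ordering and effect from their names, so check both against your JDK 1.1 runtime. It looks the methods up reflectively so that it still compiles and runs (doing nothing extra) on VMs that lack them; on a 1.1 VM you would call them directly. ProfileWindow and its method names are hypothetical.

```java
import java.lang.reflect.Method;

// Hypothetical wrapper around the sun.misc.VM profiler-control methods named in the text.
class ProfileWindow {

    // Tries to invoke a static no-arg method on sun.misc.VM; returns false if unavailable.
    static boolean callVm(String methodName) {
        try {
            Class<?> vm = Class.forName("sun.misc.VM");
            Method m = vm.getMethod(methodName); // assumption: no parameters
            m.invoke(null);                      // assumption: static method
            return true;
        } catch (ReflectiveOperationException | RuntimeException | LinkageError e) {
            return false; // class or method missing, or access refused
        }
    }

    // Runs the section with profiling limited to it, as far as the VM allows.
    static void profileOnly(Runnable section) {
        callVm("resetJavaMonitor");  // assumed effect: discard entries gathered so far
        callVm("resumeJavaMonitor"); // assumed effect: start recording
        try {
            section.run();
        } finally {
            callVm("suspendJavaMonitor");     // assumed effect: stop recording
            callVm("writeJavaMonitorReport"); // assumed effect: dump the profile table
        }
    }

    public static void main(String[] args) {
        profileOnly(() -> System.out.println("section under test"));
    }
}
```

The application would still need to be started with the -prof option for the profiler to be active at all.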
2.4 Object-Creation Profiling