  public void println() {if(TUNING) wrappedOut.println();}
  public void println(boolean x) {if(TUNING) wrappedOut.println(x);}
  public void println(char x) {if(TUNING) wrappedOut.println(x);}
  public void println(char[] x) {if(TUNING) wrappedOut.println(x);}
  public void println(double x) {if(TUNING) wrappedOut.println(x);}
  public void println(float x) {if(TUNING) wrappedOut.println(x);}
  public void println(int x) {if(TUNING) wrappedOut.println(x);}
  public void println(long x) {if(TUNING) wrappedOut.println(x);}
  public void println(Object x) {if(TUNING) wrappedOut.println(x);}
  public void println(String x) {if(TUNING) wrappedOut.println(x);}
  public void write(byte[] x, int y, int z) {if(TUNING) wrappedOut.write(x,y,z);}
  public void write(int x) {if(TUNING) wrappedOut.write(x);}
}
8.2 Logging
Logging always degrades performance. The penalty you pay depends to some extent on how logging is done. One possibility is to use a final static variable to enable logging, as in the following code:

public final static boolean LOGGING = true;
...
if (LOGGING)
  System.out.println(...);
This code allows you to remove the logging code during compilation. If the LOGGING flag is set to false before compilation, the compiler eliminates the debugging code.[2] This approach works well when you need a lot of debugging code during development but don't want to carry the code into your finished application. You can use a similar technique when you do want logging capabilities during deployment: compile with the logging features in place, but set the boolean at runtime.
[2] See Section 6.1.4 and Section 3.5.1.4.
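The runtime variant might look like the following sketch. The property name "log" and the use of a system property are my own illustration of the technique, not from the original:

```java
public class RuntimeLogging {
    // Not final: the flag is set when the application starts (here
    // from a system property), so the logging code remains in the
    // compiled classes but is skipped when the flag is false.
    public static boolean LOGGING = Boolean.getBoolean("log");

    public static void main(String[] args) {
        if (LOGGING)
            System.out.println("debug: starting up");
        System.out.println("LOGGING is " + LOGGING);
    }
}
```

Running with -Dlog=true enables the logging statements; running without it leaves them dormant but still present in the class files.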
An alternative technique is to use a logging object:
public class LogWriter { public static LogWriter TheLogger = sessionLogger ;
... }
... LogWriter.TheLogger.log...
This technique allows you to specify various LogWriter objects. Examples include a null log writer that has an empty log() method, a file log writer that logs to a file, a sysout log writer that logs to System.out, etc. Using this technique allows logging to be turned on after an application has started. You can even install a new type of log writer after deployment, which can be useful for some applications. However, be aware that any deployed logging capabilities should not do too much logging (or even decide whether to log too often), or performance will suffer.
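A minimal sketch of this pluggable log-writer idea follows. The subclass name and the default-to-null-logger choice are my own illustration; only the LogWriter.TheLogger.log() shape comes from the text above:

```java
public class LogWriter {
    // Default: a null logger whose log() does nothing, so deployed
    // code pays almost nothing when logging is off.
    public static LogWriter TheLogger = new LogWriter();

    public void log(String message) { /* null logger: discard */ }

    // A System.out-backed variant that can be installed at runtime.
    static class SysoutLogWriter extends LogWriter {
        public void log(String message) { System.out.println(message); }
    }

    public static void main(String[] args) {
        LogWriter.TheLogger.log("dropped by the null logger");
        // Turn logging on after the application has started:
        LogWriter.TheLogger = new SysoutLogWriter();
        LogWriter.TheLogger.log("now visible");
    }
}
```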
Personally, I prefer to deploy an application with a simple set of logging features still in place. But I first establish that the logging features do not slow down the application.
8.3 From Raw IO to Smokin' IO
So far we have looked only at general points about IO and logging. Now we look at an example of tuning IO performance. The example consists of reading lines from a large file. This section was inspired by an article from Sun Engineering,[3] though I go somewhat further along the tuning cycle.

[3] "Java Performance I/O Tuning," Java Developer's Journal, Volume 2, Issue 11. See http://www.JavaDevelopersJournal.com.
The initial attempt at file IO might be to use the FileInputStream to read through a file. Note that DataInputStream has a readLine() method (now deprecated because it is byte-based rather than char-based, but ignore that for the moment), so you wrap the FileInputStream with the DataInputStream, and run. The code looks like:
DataInputStream in = new DataInputStream(new FileInputStream(file));
while( (line = in.readLine()) != null)
{
  doSomethingWith(line);
}
in.close();
For these timing tests, I use two different files: a 1.8-MB file with about 20,000 lines (long lines), and a one-third of a megabyte file with about 34,000 lines (short lines). I will test using several VMs to show the variations across VMs and the challenges in improving performance across different runtime environments. To make comparisons simpler, I report the times as normalized to 100 for the JDK 1.2 VM with JIT. The long-line case and the short-line case are normalized separately. Tests are averages across at least three test runs. For the baseline test, I have the following chart (see Table 8-1 and Table 8-2 for the full results). Note that the HotSpot results are those for the second run of tests, after HotSpot has had a chance to apply its optimizations.
                            Long Lines                     Short Lines
Normalized read times on    1.2     1.3   HotSpot  1.1.6   1.2   1.3   HotSpot  1.1.6
Unbuffered input stream     100[4]  86    84       69      100   84    94       67

[4] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100. All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2.
The first test, in absolute times, is really dreadful, because you are executing IO one byte at a time. This performance is the result of using a plain FileInputStream without buffering the IO, and the process is completely IO-bound. For this reason, I expected the absolute times of the various VMs to be similar, since the CPU is not the bottleneck. But curiously, they vary. Possibly the underlying native call implementation differs between 1.1.6 and 1.2, but I am not interested enough to spend time deciding why there should be differences for the unbuffered case. After all, no one uses unbuffered IO. Everyone knows you should buffer your IO, except when memory is really at a premium, as in an embedded system.
So let's immediately move to wrap the FileInputStream with a BufferedInputStream.[5] The code has only a slight change, in the constructor:
[5] Buffering IO does not require the use of a buffered class. You can buffer IO directly from the FileInputStream class and other low-level classes by passing arrays to the read() and write() methods. This means you need to handle buffer overflows yourself.
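As the footnote suggests, you can get the effect of buffering without a buffered class by handing an array to read(). A minimal self-contained sketch (the temp-file setup is my own scaffolding, not from the original):

```java
import java.io.*;

public class ManualBuffering {
    public static void main(String[] args) throws IOException {
        // Scaffolding: create a 10,000-byte temporary file to read back.
        File f = File.createTempFile("data", ".bin");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        out.write(new byte[10000]);
        out.close();

        // Read in large chunks directly from FileInputStream: each
        // read() call fetches up to buf.length bytes in one native
        // call, instead of one native call per byte.
        FileInputStream in = new FileInputStream(f);
        byte[] buf = new byte[8192];
        int total = 0, n;
        while ((n = in.read(buf, 0, buf.length)) != -1)
            total += n;
        in.close();
        System.out.println(total);
    }
}
```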
//was: DataInputStream in = new DataInputStream(new FileInputStream(file));
DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file)));
while( (line = in.readLine()) != null)
{
  doSomethingWith(line);
}
in.close();
However, the times are already faster by an order of magnitude, as you can see in the following chart:
                            Long Lines                     Short Lines
Normalized read times on    1.2     1.3   HotSpot  1.1.6   1.2   1.3   HotSpot  1.1.6
Unbuffered input stream     100[6]  86    84       69      100   84    94       67
Buffered input stream       5       3     2        9       8     3     4        12

[6] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100. All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2.
The lesson is clear, if you haven't already had it drummed home somewhere else: buffered IO performs much better than unbuffered IO. Having established that buffered IO is better than unbuffered, you renormalize your times on the buffered IO case so that you can compare any improvements against the normal case.

So far, we have used only the default buffer, which is a 2048-byte buffer (contrary to the JDK 1.1.6 documentation, which states it is 512 bytes; always check the source on easily changeable things like this). Perhaps a larger buffer would be better. Let's try 8192 bytes:
//was: DataInputStream in = new DataInputStream(new FileInputStream(file));
//was: DataInputStream in = new DataInputStream(
//         new BufferedInputStream(new FileInputStream(file)));
DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file), 8192));
while( (line = in.readLine()) != null)
{
  doSomethingWith(line);
}
in.close();
                            Long Lines                       Short Lines
Normalized read times on    1.2     1.3    HotSpot  1.1.6    1.2    1.3    HotSpot  1.1.6
Unbuffered input stream     1951    1684   1641     1341     1308   1101   1232     871
Buffered input stream       100[7]  52     45       174      100    33     54       160
8K buffered input stream    102     50     48       225      101    31     54       231

[7] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100. All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2.
The variations are large, but there is a mostly consistent pattern. The 8K buffer doesn't seem to be significantly better than the default. I find this exception curious enough to do some further testing. One variation in testing is to repeat the test several times in the same VM process. Sometimes this can highlight obscure differences. Doing this, I find that if I repeat the test in the same JDK 1.2 VM process, the second test run is consistently faster for the 8K buffered stream. The entry for this second time is 75. I cannot identify why this happens, and I do not want to get sidetracked debugging the JDK just now, so we'll move on with the tuning process.
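If you want to check this kind of buffer-size effect on your own files, a rough harness along the following lines may help. The file contents and the two buffer sizes tried are placeholders of my own; the read loop matches the style used in this section:

```java
import java.io.*;

public class BufferSizeTest {
    // Count lines read through a DataInputStream wrapped in a
    // BufferedInputStream with the given buffer size.
    static int countLines(File f, int bufSize) throws IOException {
        DataInputStream in = new DataInputStream(
            new BufferedInputStream(new FileInputStream(f), bufSize));
        int lines = 0;
        while (in.readLine() != null)   // deprecated, as discussed
            lines++;
        in.close();
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Scaffolding: a temporary file of 5000 short lines.
        File f = File.createTempFile("lines", ".txt");
        f.deleteOnExit();
        PrintWriter w = new PrintWriter(new FileWriter(f));
        for (int i = 0; i < 5000; i++)
            w.println("line " + i);
        w.close();

        for (int bufSize : new int[] {2048, 8192}) {
            long start = System.currentTimeMillis();
            int lines = countLines(f, bufSize);
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(bufSize + ": " + lines + " lines, "
                               + elapsed + " ms");
        }
    }
}
```

Remember to repeat runs in the same VM process, as noted above, since the first run can differ from later ones.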
Let's get back to the fact that we are using a deprecated method, readLine(). You should really be using Readers instead of InputStreams, according to the Java docs, for full portability, etc. Let's move to Readers, and see what it costs us:
//was: DataInputStream in = new DataInputStream(new FileInputStream(file));
//was: DataInputStream in = new DataInputStream(
//         new BufferedInputStream(new FileInputStream(file)));
//was: DataInputStream in = new DataInputStream(
//         new BufferedInputStream(new FileInputStream(file), 8192));
BufferedReader in = new BufferedReader(new FileReader(file));
while( (line = in.readLine()) != null)
{
  doSomethingWith(line);
}
in.close();
                            Long Lines                       Short Lines
Normalized read times on    1.2     1.3    HotSpot  1.1.6    1.2    1.3    HotSpot  1.1.6
Buffered input stream       100[8]  52     45       174      100    33     54       160
8K buffered input stream    102     50     48       225      101    31     54       231
Buffered reader             47      43     41       43       111    39     45       127

[8] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100. All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2.
These results tell us that someone at Sun spent time optimizing Readers. You can reasonably use Readers in most situations where you would have used an InputStream. Some situations can show a performance decrease, but generally there is a performance increase.
Now let's get down to some real tuning. So far we have just been working from bad coding to good working practice. The final version so far uses buffered Reader classes for IO, as recommended by Sun. Can we do better? Well of course, but now let's get down and get dirty. You know from general tuning practices that creating objects is overhead you should try to avoid. Up until now, we have used the readLine() method, which returns a string. Suppose you work on that string and then discard it, as is the typical situation. You would do better to avoid the String creation altogether. Also, if you want to process the String, then for performance purposes you are better off working directly on the underlying char array. Working on char arrays is quicker, since you avoid the String method overhead or, more likely, the need to copy the String into a char array buffer to work on it. (See Chapter 5 for more details on this technique.) Basically, this means that you need to implement the readLine() functionality with your own buffer, while passing the buffer to the method that does the string processing. The following implementation uses its own char array buffer. It reads characters in to fill the buffer, then runs through the buffer looking for ends of lines. Each time the end of a line is found, the buffer, together with the start and end index of the line in that buffer, is passed to the doSomethingWith() method for processing. This implementation avoids both the String-creation overhead and the subsequent String-processing overhead (but these are not included in any timings here). The only complication comes when you reach the end of the buffer and need to fill it with the next chunk from the file, while also retaining the line fragment from the end of the last chunk. It is unlikely that your 8192-char chunk will end exactly on an end of line, so there are almost always some characters left to be carried over to the next chunk. To handle this, simply copy the characters to the beginning of the buffer and read the next chunk into the buffer starting from after those characters. The commented code looks like this:
public static void myReader(String string)
    throws IOException
{
    //Do the processing myself, directly from a FileReader
    //But don't create strings for each line, just leave it
    //as a char array
    FileReader in = new FileReader(string);
    int defaultBufferSize = 8192;
    int nextChar = 0;
    char[] buffer = new char[defaultBufferSize];
    char c;
    int leftover;
    int length_read;
    int startLineIdx = 0;

    //First fill the buffer once before we start
    int nChars = in.read(buffer, 0, defaultBufferSize);
    boolean checkFirstOfChunk = false;

    for(;;)
    {
        //Work through the buffer looking for end of line characters.
        //Note that the JDK does the eol search as follows:
        //It hardcodes both of the characters \r and \n as end
        //of line characters, and considers either to signify the
        //end of the line. In addition, if the end of line character
        //is determined to be \r, and the next character is \n,
        //it winds past the \n. This way it allows the reading of
        //lines from files written on any of the three systems
        //currently supported (Unix with \n, Windows with \r\n,
        //and Mac with \r), even if you are not running on any of these.
        for (; nextChar < nChars; nextChar++)
        {
            if (((c = buffer[nextChar]) == '\n') || (c == '\r'))
            {
                //We found a line, so pass it for processing
                doSomethingWith(buffer, startLineIdx, nextChar-1);

                //And then increment the cursors. nextChar is
                //automatically incremented by the loop,
                //so only need to worry if 'c' is \r
                if (c == '\r')
                {
                    //need to consider if we are at end of buffer
                    if (nextChar == (nChars - 1))
                        checkFirstOfChunk = true;
                    else if (buffer[nextChar+1] == '\n')
                        nextChar++;
                }
                startLineIdx = nextChar + 1;
            }
        }

        leftover = 0;
        if (startLineIdx < nChars)
        {
            //We have some characters left over at the end of the
            //chunk. So carry them over to the beginning of the
            //next chunk.
            leftover = nChars - startLineIdx;
            System.arraycopy(buffer, startLineIdx, buffer, 0, leftover);
        }
        do
        {
            length_read = in.read(buffer, leftover,
                                  buffer.length-leftover);
        } while (length_read == 0);
        if (length_read > 0)
        {
            nextChar -= nChars;
            nChars = leftover + length_read;
            startLineIdx = nextChar;
            if (checkFirstOfChunk)
            {
                checkFirstOfChunk = false;
                if (buffer[0] == '\n')
                {
                    nextChar++;
                    startLineIdx = nextChar;
                }
            }
        }
        else //EOF
        {
            in.close();
            return;
        }
    }
}
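As a rough illustration of how a custom reader like this is driven, here is a minimal self-contained sketch. The line-counting doSomethingWith(), the simplified \n-only scanning loop, and the temp-file setup are my own scaffolding, not the original implementation:

```java
import java.io.*;

public class MyReaderDemo {
    static int lineCount = 0;

    // Hypothetical stand-in for doSomethingWith(): just counts the
    // lines handed over as (buffer, start index, end index).
    static void doSomethingWith(char[] buf, int start, int end) {
        lineCount++;
    }

    // Simplified custom reader: scan a char buffer for '\n', handing
    // each line to doSomethingWith() without creating any Strings.
    // (Simplification: a final line without '\n' at EOF is dropped.)
    static void myReader(String filename) throws IOException {
        FileReader in = new FileReader(filename);
        char[] buffer = new char[8192];
        int leftover = 0;
        int nChars;
        while ((nChars = in.read(buffer, leftover,
                                 buffer.length - leftover)) > 0) {
            nChars += leftover;
            int startLineIdx = 0;
            for (int i = 0; i < nChars; i++) {
                if (buffer[i] == '\n') {
                    doSomethingWith(buffer, startLineIdx, i - 1);
                    startLineIdx = i + 1;
                }
            }
            // Carry any partial line to the start of the buffer.
            leftover = nChars - startLineIdx;
            System.arraycopy(buffer, startLineIdx, buffer, 0, leftover);
        }
        in.close();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("lines", ".txt");
        f.deleteOnExit();
        FileWriter w = new FileWriter(f);
        w.write("one\ntwo\nthree\n");
        w.close();
        myReader(f.getPath());
        System.out.println(lineCount);
    }
}
```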
The following chart shows the new times:
                            Long Lines                       Short Lines
Normalized read times on    1.2     1.3    HotSpot  1.1.6    1.2    1.3    HotSpot  1.1.6
Buffered input stream       100[9]  52     45       174      100    33     54       160
Buffered reader             47      43     41       43       111    39     45       127
Custom-built reader         26      37     36       15       19     28     26       14

[9] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100. All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2.
All the timings are the best so far, and most are significantly better than before.[10] You can try one more thing: performing the byte-to-char conversion yourself. The code comes from Chapter 7, in which we looked at this conversion in detail. The changes are straightforward. Change the FileReader to a FileInputStream and add a byte array buffer of the same size as the char array buffer:

[10] Note that the HotSpot timings are, once again, for the second run of the repeated tests. No other VMs exhibited consistent variations between the first and second run tests. See Table 8-1 and Table 8-2 for the full set of results.
//was: FileReader in = new FileReader(string);
//this last line becomes
FileInputStream in = new FileInputStream(string);
int defaultBufferSize = 8192;
//and add the byte array buffer
byte[] byte_buffer = new byte[defaultBufferSize];
You also need to change the read() calls to read into the byte buffer, adding a convert() call after each. The first read() is changed like this:

//First fill the buffer once before we start
//this next line becomes a byte read followed by a convert() call
//was: int nChars = in.read(buffer, 0, defaultBufferSize);
int nChars = in.read(byte_buffer, 0, defaultBufferSize);
convert(byte_buffer, 0, nChars, buffer, 0, nChars, MAP3);
The second read() in the main loop is also changed, but the conversion isn't done immediately here. It's done just after the number of characters, nChars, is set, a few lines later:

    //was: length_read = in.read(buffer, leftover, buffer.length-leftover);
    length_read = in.read(byte_buffer, leftover,
                          buffer.length-leftover);
} while (length_read == 0);
if (length_read > 0)
{
    nextChar -= nChars;
    nChars = leftover + length_read;
    startLineIdx = nextChar;
    //And add the conversion here
    convert(byte_buffer, leftover, nChars, buffer,
            leftover, nChars, MAP3);
Measuring the performance with these changes, the times are now significantly better in almost every case, as shown in the following chart:
                              Long Lines                        Short Lines
Normalized read times on      1.2      1.3    HotSpot  1.1.6    1.2    1.3    HotSpot  1.1.6
Buffered input stream         100[11]  52     45       174      100    33     54       160
Custom-built reader           26       37     36       15       19     28     26       14
Custom reader and converter   12       18     17       10       9      21     53       8

[11] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100. All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2.
Only the HotSpot short-line case is worse.[12] All the times are now under one second, even on a slow machine. Subsecond times are notoriously variable, although in my tests the results were fairly consistent.

[12] This shows that HotSpot is quite variable with its optimizations. HotSpot sometimes makes an unoptimized loop faster, and sometimes the manually unrolled loop comes out faster. Table 8-1 and Table 8-2 show HotSpot producing both faster and slower times for the same manually unrolled loop, depending on the data being processed (i.e., short lines or long lines).
We have, however, hardcoded in the ISO 8859_1 type of byte-to-char conversion, rather than supporting the generic case where the conversion type is specified as a property. But this conversion represents a common class of character-encoding conversions, and you could fall back on the method used in the previous test, where the conversion is specified by the System property file.encoding. Often, you will read from files you know, whose format you understand and can predict. In those cases, building in the appropriate encoding is not a problem.
Using a buffered reader is adequate for most purposes. But we have seen that it is possible to speed up IO even further if you're willing to spend the effort. Avoiding the creation of intermediate Strings gives you a good gain. This is true for both reading and writing, and allows you to work on the char arrays directly. Working directly on char arrays is usually better for performance, but is also more work. In specialized cases, you might want to consider taking control of every aspect of the IO, right down to the byte-to-char encoding, but for this you need to consider how to maintain compatibility with the JDK.
Table 8-1 and Table 8-2 summarize all the results from these experiments.
Table 8-1, Timings of the Long-Line Tests Normalized to the JDK 1.2 Buffered Input Stream Test
1.2 1.2 no JIT