
    public void println() {if (TUNING) wrappedOut.println();}
    public void println(boolean x) {if (TUNING) wrappedOut.println(x);}
    public void println(char x) {if (TUNING) wrappedOut.println(x);}
    public void println(char[] x) {if (TUNING) wrappedOut.println(x);}
    public void println(double x) {if (TUNING) wrappedOut.println(x);}
    public void println(float x) {if (TUNING) wrappedOut.println(x);}
    public void println(int x) {if (TUNING) wrappedOut.println(x);}
    public void println(long x) {if (TUNING) wrappedOut.println(x);}
    public void println(Object x) {if (TUNING) wrappedOut.println(x);}
    public void println(String x) {if (TUNING) wrappedOut.println(x);}
    public void write(byte[] x, int y, int z) {if (TUNING) wrappedOut.write(x, y, z);}
    public void write(int x) {if (TUNING) wrappedOut.write(x);}
    }
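To see the pass-through pattern above in a compilable form, here is a minimal sketch. The class name TunablePrintWriter and the polarity of the TUNING constant are assumptions; the fragment above only shows the delegating println() and write() methods, and only two of them are reproduced here.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Minimal sketch of the conditional pass-through wrapper (hypothetical
// class name; TUNING semantics assumed: output only when TUNING is true).
public class TunablePrintWriter {
    public static final boolean TUNING = true;
    private final PrintWriter wrappedOut;

    public TunablePrintWriter(PrintWriter out) { this.wrappedOut = out; }

    public void println(String x) { if (TUNING) wrappedOut.println(x); }
    public void write(int x)      { if (TUNING) wrappedOut.write(x); }

    public static void main(String[] args) {
        // Capture output in a StringWriter to show the delegation.
        StringWriter sw = new StringWriter();
        TunablePrintWriter t = new TunablePrintWriter(new PrintWriter(sw, true));
        t.println("hello");
        System.out.print(sw.toString().trim());
    }
}
```

Because TUNING is a compile-time constant, the compiler can eliminate the guarded calls entirely when it is set to false, which is the point of the pattern.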

8.2 Logging

Logging always degrades performance. The penalty you pay depends to some extent on how logging is done. One possibility is to use a final static variable to enable logging, as in the following code:

    public final static boolean LOGGING = true;
    ...
    if (LOGGING)
        System.out.println(...);

This code allows you to remove the logging code during compilation. If the LOGGING flag is set to false before compilation, the compiler eliminates the debugging code.[2] This approach works well when you need a lot of debugging code during development but don't want to carry the code into your finished application. You can use a similar technique for when you do want logging capabilities during deployment, by compiling with logging features but setting the boolean at runtime.

[2] See Section 6.1.4 and Section 3.5.1.4.

An alternative technique is to use a logging object:

    public class LogWriter {
        public static LogWriter TheLogger = sessionLogger();
        ...
    }
    ...
    LogWriter.TheLogger.log(...);

This technique allows you to specify various LogWriter objects. Examples include a null log writer that has an empty log() method, a file log writer that logs to a file, a sysout log writer that logs to System.out, etc. Using this technique allows logging to be turned on after an application has started. You can even install a new type of log writer after deployment, which can be useful for some applications. However, be aware that any deployed logging capabilities should not do too much logging (or even decide whether to log too often), or performance will suffer. Personally, I prefer to deploy an application with a simple set of logging features still in place. But I first establish that the logging features do not slow down the application.
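The pluggable-logger idea can be sketched as follows. The book names only the LogWriter class and its static TheLogger field; the null-logger default and the PrintLogWriter subclass here are illustrative stand-ins for the null, file, and sysout writers mentioned above, not the book's actual code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hedged sketch of the pluggable LogWriter pattern described above.
public class LogWriter {
    // Default is a null logger: log() does nothing, so deployed
    // logging costs only a method call.
    public static LogWriter TheLogger = new LogWriter();

    public void log(String message) {}

    // A writer that logs to any PrintWriter, e.g. one wrapping
    // System.out or a file (illustrative subclass).
    public static class PrintLogWriter extends LogWriter {
        private final PrintWriter out;
        public PrintLogWriter(PrintWriter out) { this.out = out; }
        public void log(String message) { out.println(message); }
    }

    public static void main(String[] args) {
        LogWriter.TheLogger.log("dropped");   // null logger: no output
        // Swap in a real writer at runtime, after the application started.
        StringWriter sw = new StringWriter();
        LogWriter.TheLogger = new PrintLogWriter(new PrintWriter(sw, true));
        LogWriter.TheLogger.log("kept");
        System.out.print(sw.toString().trim());
    }
}
```

The key design point is that the logging call site never changes; only the object behind TheLogger does, so logging can be enabled, redirected, or silenced without recompiling.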

8.3 From Raw IO to Smokin' IO

So far we have looked only at general points about IO and logging. Now we look at an example of tuning IO performance. The example consists of reading lines from a large file. This section was inspired by an article from Sun Engineering,[3] though I go somewhat further along the tuning cycle.

[3] "Java Performance I/O Tuning," Java Developer's Journal, Volume 2, Issue 11. See http://www.JavaDevelopersJournal.com.

The initial attempt at file IO might be to use FileInputStream to read through a file. Note that DataInputStream has a readLine() method (now deprecated because it is byte-based rather than char-based, but ignore that for the moment), so you wrap the FileInputStream with the DataInputStream, and run. The code looks like:

    DataInputStream in = new DataInputStream(new FileInputStream(file));
    while ( (line = in.readLine()) != null ) {
        doSomethingWith(line);
    }
    in.close();

For these timing tests, I use two different files: a 1.8-MB file with about 20,000 lines (long lines), and a one-third of a megabyte file with about 34,000 lines (short lines). I will test using several VMs to show the variations across VMs and the challenges in improving performance across different runtime environments. To make comparisons simpler, I report the times as normalized to 100 for the JDK 1.2 VM with JIT. The long-line case and the short-line case are normalized separately. Tests are averages across at least three test runs. For the baseline test, I have the following chart (see Table 8-1 and Table 8-2 for full results). Note that the HotSpot results are those for the second run of tests, after HotSpot has had a chance to apply its optimizations.

    Normalized read times on   Long lines                      Short lines
                               1.2     1.3   HotSpot  1.1.6    1.2   1.3   HotSpot  1.1.6
    Unbuffered input stream    100[4]  86    84       69       100   84    94       67

[4] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100.
All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2. The first test in absolute times is really dreadful, because you are executing IO one byte at a time. This performance is the result of using a plain FileInputStream without buffering the IO, because the process is completely IO-bound. For this reason, I expected the absolute times of the various VMs to be similar, since the CPU is not the bottleneck. But curiously, they vary. Possibly the underlying native call implementation is different between 1.1.6 and 1.2, but I am not interested enough to spend time finding out why there should be differences for the unbuffered case. After all, no one uses unbuffered IO. Everyone knows you should buffer your IO (except when memory is really at a premium, as in an embedded system). So let's immediately move to wrap the FileInputStream with a BufferedInputStream.[5] The code has only slight changes, in the constructor:

[5] Buffering IO does not require the use of a buffered class. You can buffer IO directly from the FileInputStream class (and other low-level classes) by passing arrays to the read() and write() methods. This means you need to handle buffer overflows yourself.

    DataInputStream in = new DataInputStream(new FileInputStream(file));

becomes

    DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file)));
    while ( (line = in.readLine()) != null ) {
        doSomethingWith(line);
    }
    in.close();

However, the times are already faster by an order of magnitude, as you can see in the following chart:

    Normalized read times on   Long lines                      Short lines
                               1.2     1.3   HotSpot  1.1.6    1.2   1.3   HotSpot  1.1.6
    Unbuffered input stream    100[6]  86    84       69       100   84    94       67
    Buffered input stream      5       3     2        9        8     3     4        12

[6] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100.
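The order-of-magnitude difference above comes from avoiding one OS-level read per byte. A small self-contained sketch makes the mechanics visible (file name and sizes are illustrative, and modern try-with-resources syntax is used for brevity; it is not the book's test harness):

```java
import java.io.*;

// Hedged sketch: reading the same file byte-at-a-time, unbuffered
// versus wrapped in a BufferedInputStream. The buffered version issues
// one OS read per buffer-full instead of one per byte.
public class BufferDemo {
    public static long countBytes(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) n++;   // one logical read per byte either way
        in.close();
        return n;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("bufdemo", ".txt");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            for (int i = 0; i < 10000; i++) out.write('x');
        }
        long unbuffered = countBytes(new FileInputStream(f));
        long buffered   = countBytes(new BufferedInputStream(new FileInputStream(f)));
        // Same bytes are seen either way; only the cost per read() differs.
        System.out.println(unbuffered == buffered);
    }
}
```

Timing the two calls on a large file (as the chapter does) shows the buffered version winning heavily, because the unbuffered one makes a native call for every single byte.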
All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2. The lesson is clear, if you haven't already had it drummed home somewhere else: buffered IO performs much better than unbuffered IO. Having established that buffered IO is better than unbuffered, you renormalize your times on the buffered IO case so that you can compare any improvements against the normal case. So far, we have used only the default buffer, which is a 2048-byte buffer (contrary to the JDK 1.1.6 documentation, which states it is 512 bytes; always check the source on easily changeable things like this). Perhaps a larger buffer would be better. Let's try 8192 bytes:

    DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file)));

becomes

    DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file), 8192));
    while ( (line = in.readLine()) != null ) {
        doSomethingWith(line);
    }
    in.close();

    Normalized read times on   Long lines                      Short lines
                               1.2     1.3   HotSpot  1.1.6    1.2   1.3   HotSpot  1.1.6
    Unbuffered input stream    1951    1684  1641     1341     1308  1101  1232     871
    Buffered input stream      100[7]  52    45       174      100   33    54       160
    8K buffered input stream   102     50    48       225      101   31    54       231

[7] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100. All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2.

The variations are large, but there is a mostly consistent pattern. The 8K buffer doesn't seem to be significantly better than the default. I find this exception curious enough to do some further testing. One variation in testing is to repeat the test several times in the same VM process. Sometimes this can highlight obscure differences.
Doing this, I find that if I repeat the test in the same JDK 1.2 VM process, the second test run is consistently faster for the 8K buffered stream. The entry for this second time is 75. I cannot identify why this happens, and I do not want to get sidetracked into debugging the JDK just now, so we'll move on with the tuning process. Let's get back to the fact that we are using a deprecated method, readLine(). You should really be using Readers instead of InputStreams, according to the Java docs, for full portability, etc. Let's move to Readers, and see what it costs us:

    DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file), 8192));

becomes

    BufferedReader in = new BufferedReader(new FileReader(file));
    while ( (line = in.readLine()) != null ) {
        doSomethingWith(line);
    }
    in.close();

    Normalized read times on   Long lines                      Short lines
                               1.2     1.3   HotSpot  1.1.6    1.2   1.3   HotSpot  1.1.6
    Buffered input stream      100[8]  52    45       174      100   33    54       160
    8K buffered input stream   102     50    48       225      101   31    54       231
    Buffered reader            47      43    41       43       111   39    45       127

[8] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100. All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2.

These results tell us that someone at Sun spent time optimizing Readers. You can reasonably use Readers in most situations where you would have used an InputStream. Some situations show a performance decrease, but generally there is a performance increase. Now let's get down to some real tuning. So far, we have just been moving from bad coding to good working practice. The final version so far uses buffered Reader classes for IO, as recommended by Sun. Can we do better? Well, of course, but now let's get down and get dirty.
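For reference, the BufferedReader loop above, made self-contained with a trivial stand-in for doSomethingWith() that just counts lines (the class name and the counting action are illustrative, not from the book):

```java
import java.io.*;

// A runnable version of the recommended BufferedReader idiom.
public class LineCounter {
    public static int countLines(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            count++;                 // doSomethingWith(line) would go here
        }
        in.close();
        return count;
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for new FileReader(file) in a real run.
        System.out.println(countLines(new StringReader("a\nb\nc")));
    }
}
```

Swapping the StringReader for a FileReader gives exactly the file-reading loop the chapter times; note that each iteration still allocates a new String, which is what the next optimization attacks.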
You know from general tuning practices that creating objects is overhead you should try to avoid. Up until now, we have used the readLine() method, which returns a String. Suppose you work on that string and then discard it, as is the typical situation. You would do better to avoid the String creation altogether. Also, if you want to process the String, then for performance purposes you are better off working directly on the underlying char array. Working on char arrays is quicker, since you can avoid the String method overhead or, more likely, the need to copy the String into a char array buffer to work on it. See Chapter 5 for more details on this technique. Basically, this means that you need to implement the readLine() functionality with your own buffer, while passing the buffer to the method that does the string processing. The following implementation uses its own char array buffer. It reads characters to fill the buffer, then runs through the buffer looking for ends of lines. Each time the end of a line is found, the buffer, together with the start and end index of the line in that buffer, is passed to the doSomething() method for processing. This implementation avoids both the String-creation overhead and the subsequent String-processing overhead, but these are not included in any timings here. The only complication comes when you reach the end of the buffer and need to fill it with the next chunk from the file, but also need to retain the line fragment from the end of the last chunk. It is unlikely your 8192-char chunk will end exactly on an end of line, so there are almost always some characters left to be carried over to the next chunk. To handle this, simply copy the characters to the beginning of the buffer and read the next chunk into the buffer starting from after those characters.
The commented code looks like this:

    public static void myReader(String string) throws IOException {
        // Do the processing myself, directly from a FileReader.
        // But don't create strings for each line; just leave it as a char array.
        FileReader in = new FileReader(string);
        int defaultBufferSize = 8192;
        int nextChar = 0;
        char[] buffer = new char[defaultBufferSize];
        char c;
        int leftover;
        int length_read;
        int startLineIdx = 0;
        // First fill the buffer once before we start.
        int nChars = in.read(buffer, 0, defaultBufferSize);
        boolean checkFirstOfChunk = false;
        for (;;) {
            // Work through the buffer looking for end-of-line characters.
            // Note that the JDK does the eol search as follows: it hardcodes
            // both of the characters \r and \n as end-of-line characters,
            // and considers either to signify the end of the line. In
            // addition, if the end-of-line character is determined to be \r,
            // and the next character is \n, it winds past the \n. This way
            // it allows the reading of lines from files written on any of
            // the three systems currently supported (Unix with \n, Windows
            // with \r\n, and Mac with \r), even if you are not running on
            // any of these.
            for (; nextChar < nChars; nextChar++) {
                if (((c = buffer[nextChar]) == '\n') || (c == '\r')) {
                    // We found a line, so pass it for processing.
                    doSomethingWith(buffer, startLineIdx, nextChar - 1);
                    // And then increment the cursors. nextChar is
                    // automatically incremented by the loop, so we only
                    // need to worry if 'c' is \r.
                    if (c == '\r') {
                        // Need to consider if we are at the end of the buffer.
                        if (nextChar == (nChars - 1))
                            checkFirstOfChunk = true;
                        else if (buffer[nextChar + 1] == '\n')
                            nextChar++;
                    }
                    startLineIdx = nextChar + 1;
                }
            }
            leftover = 0;
            if (startLineIdx < nChars) {
                // We have some characters left over at the end of the chunk,
                // so carry them over to the beginning of the next chunk.
                leftover = nChars - startLineIdx;
                System.arraycopy(buffer, startLineIdx, buffer, 0, leftover);
            }
            do {
                length_read = in.read(buffer, leftover, buffer.length - leftover);
            } while (length_read == 0);
            if (length_read > 0) {
                nextChar -= nChars;
                nChars = leftover + length_read;
                startLineIdx = nextChar;
                if (checkFirstOfChunk) {
                    checkFirstOfChunk = false;
                    if (buffer[0] == '\n') {
                        nextChar++;
                        startLineIdx = nextChar;
                    }
                }
            } else {
                // EOF
                in.close();
                return;
            }
        }
    }

The following chart shows the new times:

    Normalized read times on   Long lines                      Short lines
                               1.2     1.3   HotSpot  1.1.6    1.2   1.3   HotSpot  1.1.6
    Buffered input stream      100[9]  52    45       174      100   33    54       160
    Buffered reader            47      43    41       43       111   39    45       127
    Custom-built reader        26      37    36       15       19    28    26       14

[9] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100. All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2.

All the timings are the best so far, and most are significantly better than before.[10] You can try one more thing: performing the byte-to-char conversion yourself. The code comes from Chapter 7, in which we looked at this conversion in detail. The changes are straightforward. Change the FileReader to FileInputStream and add a byte array buffer of the same size as the char array buffer:

[10] Note that the HotSpot timings are, once again, for the second run of the repeated tests. No other VMs exhibited consistent variations between the first and second run tests. See Table 8-1 and Table 8-2 for the full set of results.

    FileReader in = new FileReader(string);

becomes

    FileInputStream in = new FileInputStream(string);
    int defaultBufferSize = 8192;
    // add the byte array buffer
    byte[] byte_buffer = new byte[defaultBufferSize];

You also need to change the read calls to read into the byte buffer, adding a convert call after each.
The first read is changed like this:

    // First fill the buffer once before we start.
    int nChars = in.read(buffer, 0, defaultBufferSize);

becomes a byte read followed by a convert call:

    int nChars = in.read(byte_buffer, 0, defaultBufferSize);
    convert(byte_buffer, 0, nChars, buffer, 0, nChars, MAP3);

The second read, in the main loop, is also changed, but the conversion isn't done immediately there. It is done just after the number of characters, nChars, is set, a few lines later:

    length_read = in.read(buffer, leftover, buffer.length - leftover);

becomes

    length_read = in.read(byte_buffer, leftover, buffer.length - leftover);
    } while (length_read == 0);
    if (length_read > 0) {
        nextChar -= nChars;
        nChars = leftover + length_read;
        startLineIdx = nextChar;
        // And add the conversion here.
        convert(byte_buffer, leftover, nChars, buffer, leftover, nChars, MAP3);

Measuring the performance with these changes, the times are now significantly better in almost every case, as shown in the following chart:

    Normalized read times on     Long lines                      Short lines
                                 1.2      1.3   HotSpot  1.1.6   1.2   1.3   HotSpot  1.1.6
    Buffered input stream        100[11]  52    45       174     100   33    54       160
    Custom-built reader          26       37    36       15      19    28    26       14
    Custom reader and converter  12       18    17       10      9     21    53       8

[11] The short-line 1.2 and long-line 1.2 cases have been separately normalized to 100. All short-line times are relative to the short-line 1.2, and all long-line times are relative to the long-line 1.2.

Only the HotSpot short-line case is worse.[12] All the times are now under one second, even on a slow machine. Subsecond times are notoriously variable, although in my tests the results were fairly consistent.

[12] This shows that HotSpot is quite variable with its optimizations. HotSpot sometimes makes an unoptimized loop faster, and sometimes the manually unrolled loop comes out faster.
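The convert() call and its MAP3 table come from Chapter 7 and are not reproduced here. As a rough illustration only: for ISO 8859_1, each byte maps directly to the char with the same value, so a table-free version can be sketched like this (the class name and the table-less signature are my assumptions, not the book's method):

```java
// Hedged sketch: a simplified byte-to-char converter for the ISO 8859_1
// case discussed in the text. Chapter 7's convert() takes a mapping-table
// argument (MAP3), omitted here since 8859_1 needs no table.
public class Converter {
    public static void convert(byte[] src, int srcStart, int srcEnd,
                               char[] dst, int dstStart, int dstEnd) {
        // For ISO 8859_1, each char value equals the unsigned byte value.
        for (int i = srcStart, j = dstStart; i < srcEnd && j < dstEnd; i++, j++)
            dst[j] = (char) (src[i] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] bytes = {72, 105, (byte) 0xE9};   // 'H', 'i', and 0xE9
        char[] chars = new char[3];
        convert(bytes, 0, 3, chars, 0, 3);
        System.out.println(new String(chars, 0, 3));
    }
}
```

This is why hand-rolling the conversion pays: the loop is a straight array copy with a widening cast, with none of the generality of the JDK's pluggable converter machinery.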
Table 8-1 and Table 8-2 show HotSpot producing both faster and slower times for the same manually unrolled loop, depending on the data being processed (i.e., short lines or long lines). We have, however, hardcoded the ISO 8859_1 type of byte-to-char conversion, rather than supporting the generic case where the conversion type is specified as a property. But this conversion represents a common class of character-encoding conversions, and you could fall back on the method used in the previous test, where the conversion is specified by the System property file.encoding. Often, you will read from files you know, whose format you understand and can predict. In those cases, building in the appropriate encoding is not a problem.

Using a buffered reader is adequate for most purposes. But we have seen that it is possible to speed up IO even further if you're willing to spend the effort. Avoiding the creation of intermediate Strings gives you a good gain. This is true for both reading and writing, and allows you to work on the char arrays directly. Working directly on char arrays is usually better for performance, but is also more work. In specialized cases, you might want to consider taking control of every aspect of the IO, right down to the byte-to-char encoding, but for this you need to consider how to maintain compatibility with the JDK. Table 8-1 and Table 8-2 summarize all the results from these experiments.

Table 8-1, Timings of the Long-Line Tests Normalized to the JDK 1.2 Buffered Input Stream Test

                               1.2    1.2 no JIT