
StreamTokenizer using JDK 1.2 with the JIT compiler (see Table 5-5). Interestingly, the test takes almost the same amount of time when I run using the StreamTokenizer without the JIT compiler running. Depending on the file I run with, sometimes the JIT VM turns out slower than the non-JIT VM with the StreamTokenizer test.

Table 5-5, Word Counter Timings Using wordcount or cwordcount Methods

VM            1.2     1.2 no JIT    1.3     HotSpot 1.0    1.1.6
wordcount     100%    104%          152%    199%           88%
cwordcount    0.7%    9%            1%      3%             0.6%

These results are already quite curious. When I run the test with the char array implementation, it takes 9% of the normalized time without the JIT running, and 0.7% of the time with the JIT turned on. I suspect the curious results and huge discrepancy have something to do with StreamTokenizer being a severely underoptimized class, as well as being too generic a tool for this particular test. Looking at object usage,[4] you find that the StreamTokenizer implementation winds through 1.2 million temporary objects, whereas the char array implementation uses only around 20 objects. Now you can understand the curious results. Object-creation differences of this order of magnitude impose a huge overhead on the StreamTokenizer implementation, explaining why the StreamTokenizer is so much slower than the char array implementation. The object-creation overhead also explains why both the JIT and non-JIT tests took similar times for the StreamTokenizer: object creation requires similar amounts of time in both types of VM, and clearly the performance of the StreamTokenizer is limited by the number of objects it uses (see Chapter 4 for further details).

[4] Object monitoring is easily done using the monitoring tools from Chapter 2: both the object-creation monitor detailed there, and also separately by using the -verbosegc option while adding an explicit System.gc() at the end of the test.
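To make the object-count comparison concrete, here is a minimal sketch of the two approaches. This is my own reconstruction, not the book's listing: the method bodies, class name, and buffer size are illustrative assumptions. The StreamTokenizer counter drives a generic parsing engine on every token, while the char array counter scans one reusable buffer in place.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;

public class WordCountSketch {
    // StreamTokenizer-based counter: each nextToken() call runs the
    // tokenizer's generic parsing machinery, which churns through many
    // temporary objects internally.
    public static int wordcount(Reader in) throws IOException {
        StreamTokenizer st = new StreamTokenizer(in);
        int count = 0;
        while (st.nextToken() != StreamTokenizer.TT_EOF)
            if (st.ttype == StreamTokenizer.TT_WORD)
                count++;
        return count;
    }

    // Char array counter: read large chunks into one preallocated buffer
    // and scan it in place, so almost no temporary objects are created.
    public static int cwordcount(Reader in) throws IOException {
        char[] buf = new char[8192];   // buffer size is an arbitrary choice
        boolean inWord = false;
        int count = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                boolean letter = Character.isLetter(buf[i]);
                if (letter && !inWord)
                    count++;           // count each transition into a word
                inWord = letter;
            }
        }
        return count;
    }
}
```

On simple alphabetic text the two methods agree, but the char array version touches only the buffer it allocated up front, which is exactly the difference the object-creation figures above reflect.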

5.4.2 Line Filter Example

For the filter to select lines of a file, I'll use the simple BufferedReader.readLine() method. This contrasts with the previous methodology of using a dedicated class (StreamTokenizer), which turned out to be extremely inefficient. The readLine() method should present more of a performance-tuning challenge, since it is a relatively much simpler method and so should be more efficient. The filter using BufferedReader and Strings is easily implemented. I include an option to print only the count of matching lines:

    public static void filter(String filter, String filename, boolean print)
        throws IOException
    {
        count = 0;
        //just open the file
        BufferedReader rdr = new BufferedReader(new FileReader(filename));
        String line;
        //and read each line
        while ( (line = rdr.readLine()) != null )
        {
            //choosing those lines that include the sought-after string
            if (line.indexOf(filter) != -1)
            {
                count++;
                if (print)
                    System.out.println(line);
            }
        }
        System.out.println(count + " lines matched.");
        rdr.close();
    }

Now let's consider how to handle this filter using char arrays. As in the previous example, you read into your char array using a FileReader. However, this example is a bit more complicated than the last word-count example. Here you need to test for a match against another char array, look for line endings, and handle re-forming lines that are broken between read() calls in a more complete manner than for the word count.

Internationalization doesn't change this example in any obvious way. Both the readLine() implementation and the char array implementation stay the same whatever language the text contains. This statement about internationalization is slightly disingenuous. In fact, searches in some languages allow words to match even if they are spelled differently. For example, when searching for a French word that contains an accented letter, the user might expect a nonaccented spelling to match. This is similar to searching for the word "color" and expecting to also match the British spelling "colour".
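This kind of loose matching is what java.text.Collator's strength settings control. The following sketch is my own illustration (neither filter implementation in this section uses Collator, and the class name is invented): at PRIMARY strength only base letters are significant, so case and accent differences are ignored, while at TERTIARY strength both distinguish the strings.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorSketch {
    // Compare two strings at a given strength (Collator.PRIMARY,
    // SECONDARY, TERTIARY, or IDENTICAL) and report whether they match.
    public static boolean matches(String a, String b, int strength) {
        Collator c = Collator.getInstance(Locale.US);
        c.setStrength(strength);
        return c.compare(a, b) == 0;
    }
}
```

Note that Collator handles case and accent variation, not spelling variation: "color" and "colour" differ in their base letters, so no strength setting alone will equate them.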
Such sophistication depends on how extensively the application supports this variation in spelling. The java.text.Collator class has four strength levels that support variations in the precision of word comparisons. Both implementations for the example in this section correspond to matches using the Collator.IDENTICAL strength together with the Collator.NO_DECOMPOSITION mode.

The full commented listing for the char array implementation is shown shortly. Looking at the code, it is clearly more complicated than using BufferedReader.readLine(): obviously you have to work a lot harder to get the performance you want. The result, though, is that some tests run as much as five times faster using the char array implementation (see Table 5-6 and Table 5-7). The line lengths of the test files make a big difference, hence the variation in results.[5] In addition, the char array implementation uses only 1% of the number of objects used by the BufferedReader.readLine() implementation.

[5] The HotSpot VMs seem better able to optimize the BufferedReader.readLine() implementation. Consequently, there are a few long-line measurements where the BufferedReader.readLine() implementation actually ran faster than the char array implementation. But while the HotSpot BufferedReader.readLine() implementation times are faster than the JIT times, the char array implementation times are significantly slower than the JIT VM times, indicating that HotSpot technology still has a little way to go to achieve its full potential.

Table 5-6, Filter Timings Using filter or cfilter Methods on a Short-Line File VM 1.2

1.3 HotSpot 1.0