Performance Checklist IO, Logging, and Console Output

There are several advantages to this technique of searching directly against compressed data:

• There is no need to decompress a large amount of data.
• Searches are actually quicker because the search is against a smaller volume of data.
• More data can be held in memory simultaneously since it is compressed, which can be especially important for searching through large volumes of disk-stored data.

It is rarely possible to search for compressed substrings directly in compressed data because of the way most compression algorithms use tables covering the whole dataset. However, this scheme has been used to selectively query for data locations. For this usage, unique data keys are compressed separately from the rest of the data. A pointer is stored next to the compressed key. This produces a compressed index table that can be searched without decompressing the keys. The compression algorithm is separately applicable for each key. This scheme allows compressed keys to be searched directly to identify the location of the corresponding data.
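The compressed index table described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme's original implementation: the class name CompressedKeyIndex and its methods are hypothetical, and java.util.zip.Deflater stands in for whatever per-key compression a real system would use. Deflater tends to enlarge very short keys, so a production scheme would likely pick an algorithm suited to short strings. The point shown is only that each key is compressed on its own, and a lookup compresses the query key the same way and compares compressed bytes, so stored keys are never decompressed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.Deflater;

// Hypothetical illustration: maps separately compressed keys to data locations.
final class CompressedKeyIndex {
    // ByteBuffer compares by content, so wrapped compressed bytes work as map keys.
    private final Map<ByteBuffer, Long> index = new HashMap<>();

    // Compress one key on its own, independently of every other key.
    // Assumption: the same input and settings always yield the same bytes.
    static byte[] compressKey(String key) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(key.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Store the pointer (here a file offset) next to the compressed key.
    void put(String key, long dataOffset) {
        index.put(ByteBuffer.wrap(compressKey(key)), dataOffset);
    }

    // Compress the query and match it against the compressed keys directly.
    // Returns -1 when the key is not present.
    long locate(String key) {
        Long offset = index.get(ByteBuffer.wrap(compressKey(key)));
        return offset == null ? -1L : offset;
    }
}
```

A caller would register each key with the offset of its record, for example put("customer-0042", 8192L), and later call locate("customer-0042") to find where to read. Only the record at the returned offset then needs to be decompressed.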

8.7 Performance Checklist

Most of these suggestions apply only after a bottleneck has been identified:

• Ensure that performance tests are run with the same amount of IO as the expected finished application. Specifically, turn off any extra logging, tracing, and debugging IO.
• Use Runtime.traceMethodCalls, when supported, to count IO calls.
  o Redefine the IO classes to count IO calls if necessary.
  o Include logging statements next to all basic IO calls in the application.
• Parallelize IO by splitting data into multiple files.
• Execute IO in a background thread.
• Avoid the filesystem file-growing overhead by preallocating files.
• Try to minimize the number of IO calls.
  o Buffer to reduce the number of IO operations by increasing the amount of data transfer each IO operation executes.
  o Cache to replace repeated IO operations with much faster memory or local disk access.
  o Avoid or reduce IO calls in loops.
  o Replace System.out and System.err with customized PrintStream classes to control console output.
  o Use logger objects for tight control in specifying logging destinations.
  o Try to eliminate duplicate and unproductive IO statements.
  o Keep files open and navigate around them rather than repeatedly opening and closing the files.
• Consider optimizing the Java byte-to-char and char-to-byte conversion.
• Handle serializing explicitly, rather than using default serialization mechanisms.
  o Use transient fields to avoid serialization.
  o Use the java.io.Externalizable interface if overriding the default serialization routines.
  o Use change-logs for small changes, rather than reserializing the whole object.
  o Minimize the work done in the no-arg constructor.
  o Consider partitioning objects into multiple sets and serializing each set concurrently in different threads.
  o Use lazy initialization to move or spread the deserialization overhead to other times.
  o Consider indexing an object table for selective access to stored serialized objects.
  o Optimize network transfers by transferring only the data and objects needed, and no more.
  o Cluster serialized objects that are used together by putting them into the same file.
  o Put objects next to each other if they are required together.
  o Consider using an object-storage system such as an object database if your object-storage requirements are at all sophisticated.
• Use compression when the overhead of compression is outweighed by the benefit of reducing IO.
  o Avoid compression when the system has a heavily loaded CPU.
  o Consider using intelligent IO classes that can decide to use compression on the fly.
  o Consider searching directly against compressed data without decompressing.
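Two of the checklist items, counting IO calls with a redefined IO class and buffering to reduce the number of IO operations, can be illustrated together. The following is a minimal sketch, not code from the original text: CountingOutputStream and IoCallCountDemo are hypothetical names, and an in-memory ByteArrayOutputStream stands in for a real file or socket so the example stays self-contained. The wrapper counts every write call that reaches the underlying stream, which makes the effect of placing a java.io.BufferedOutputStream in front of it directly measurable.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical wrapper that counts write calls reaching the wrapped stream.
final class CountingOutputStream extends FilterOutputStream {
    private long calls;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        calls++;
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        calls++;
        out.write(b, off, len); // pass the whole block through in one call
    }

    long calls() {
        return calls;
    }
}

final class IoCallCountDemo {
    // Writes n bytes one at a time straight to the counted stream.
    static long unbufferedCalls(int n) {
        try {
            CountingOutputStream counter =
                new CountingOutputStream(new ByteArrayOutputStream());
            for (int i = 0; i < n; i++) {
                counter.write(i & 0x7f);
            }
            counter.flush();
            return counter.calls();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the same n bytes through a buffer placed in front of the counter.
    static long bufferedCalls(int n, int bufferSize) {
        try {
            CountingOutputStream counter =
                new CountingOutputStream(new ByteArrayOutputStream());
            OutputStream out = new BufferedOutputStream(counter, bufferSize);
            for (int i = 0; i < n; i++) {
                out.write(i & 0x7f);
            }
            out.flush(); // push any remaining buffered bytes to the counter
            return counter.calls();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling unbufferedCalls(10000) reports one underlying write per byte, while bufferedCalls(10000, 1024) reports only about as many writes as there are buffer-sized blocks. The same wrapper could be placed around a real file stream during profiling to locate code that issues many small writes.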

9.1 Avoiding Unnecessary Sorting Overhead