[2] The classic reference is The Art of Computer Programming by Donald Knuth (Addison-Wesley). A more Java-specific book is Data Structures and Algorithm Analysis in Java by Mark Weiss (Peachpit Press).
When tuning, you often need to switch one implementation of a class for another, more optimal implementation. Switching data structures is easier because you are in an object-oriented
environment, so you can usually replace one or a few classes with different implementations while keeping all the interfaces and signatures the same.
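As a minimal sketch of this kind of swap (the class and method names here are hypothetical, and the code uses generics from later Java versions), the code below depends only on the java.util.List interface, so the concrete class can be changed at the point of construction without touching anything else:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class SwapImplementationSketch {
    // Depends only on the List interface, so callers can switch implementations freely.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Fills whichever List implementation it is given with 0..n-1.
    static List<Integer> fill(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
        return target;
    }

    public static void main(String[] args) {
        // Changing the implementation is a one-line edit at the construction site.
        List<Integer> arrayBacked = fill(new ArrayList<>(), 5);
        List<Integer> linkBacked = fill(new LinkedList<>(), 5);
        System.out.println(sum(arrayBacked) + " " + sum(linkBacked));
    }
}
```

Which implementation is actually faster depends on the access pattern, so a swap like this is best followed by a measurement.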
When tuning algorithms, one factor that should pop to the front of your mind concerns the scaling characteristics of the algorithms you use. For example, bubblesort is an O(n²) algorithm, while
quicksort is O(n log n) on average. (The concept of order-of-magnitude statistics is described in Section 9.3.) This tells you nothing about absolute times for using either of these algorithms for sorting elements,
but it does clearly tell you that quicksort has the better scaling characteristics, and so is likely to be the better candidate as your collections increase in size. Similarly, hash tables have an O(1)
searching algorithm, where an array requires O(n) searching.
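To make the scaling difference concrete, the following sketch counts the comparisons made by a plain bubblesort and prints them next to an n log n yardstick. The class and method names are hypothetical, and the yardstick is only a rough figure for an O(n log n) sort, not a measurement of any particular quicksort:

```java
class ScalingSketch {
    // Sorts in place with a plain bubblesort (no early exit) and returns the comparison count.
    static long bubbleSortComparisons(int[] data) {
        long comparisons = 0;
        for (int end = data.length - 1; end > 0; end--) {
            for (int i = 0; i < end; i++) {
                comparisons++;
                if (data[i] > data[i + 1]) {
                    int tmp = data[i];
                    data[i] = data[i + 1];
                    data[i + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    // Rough n*log2(n) figure, used only as a yardstick for an O(n log n) sort.
    static long nLogN(int n) {
        return Math.round(n * (Math.log(n) / Math.log(2)));
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 8_000; n *= 2) {
            int[] data = new int[n];
            for (int i = 0; i < n; i++) {
                data[i] = n - i; // reverse order
            }
            System.out.println(n + ": bubblesort=" + bubbleSortComparisons(data) + ", n log n ~ " + nLogN(n));
        }
    }
}
```

Each doubling of n roughly quadruples the bubblesort count while the n log n figure only slightly more than doubles, which is the point about scaling rather than absolute time.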
11.1 Collections
Collections are the data structures that are most easily altered for performance-tuning purposes. Using the correct or most appropriate collection class can improve performance with little change to
code. For example, if a large ordered collection has elements frequently deleted or inserted throughout it, it usually can provide better performance if based on a linked list rather than an array.
On the other hand, a static unchanging collection that needs to be accessed by index performs better with an underlying implementation that is an array.
If the data is large and insertions are allowed (for example, a text buffer), then a common halfway measure is to use a linked list of arrays. This structure copies data within a single array when data is
inserted or deleted. When an array gets filled, the collection inserts a new empty array immediately after the full array, and moves some data from the full to the empty array so that both old and new
arrays have space. A converse structure provides optimized indexed access to a linked-list structure by holding an array of a subset of the link nodes (e.g., every 20th node). This structure allows for
quick navigation to the indexed nodes, and then slower nodal access to nodes in between.[3] The result is a linked-list implementation that is much faster at index access, though it occupies more
space.
[3] Skip lists are an implementation of this concept. See The Elegant and Fast Skip List by T. Wenger, Java Pro, April-May 1998.
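The converse structure (a linked list with an array of every Nth node) can be sketched as follows. This is an illustration under simplifying assumptions, not a skip list and not a complete collection: the class name and the stride parameter are hypothetical, and only appending is supported, since inserting in the middle would also require shifting the indexed positions.

```java
import java.util.ArrayList;
import java.util.List;

class IndexedLinkedListSketch {
    private static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    private final int stride;                            // hypothetical spacing between indexed nodes
    private final List<Node> index = new ArrayList<>();  // holds nodes 0, stride, 2*stride, ...
    private Node tail;
    private int size;

    IndexedLinkedListSketch(int stride) {
        if (stride < 1) throw new IllegalArgumentException("stride must be >= 1");
        this.stride = stride;
    }

    // Appends to the end of the list, recording every stride-th node in the index.
    void add(int value) {
        Node node = new Node(value);
        if (tail != null) {
            tail.next = node;
        }
        tail = node;
        if (size % stride == 0) {
            index.add(node);
        }
        size++;
    }

    // Jumps to the nearest indexed node at or before position, then walks at most stride-1 links.
    int get(int position) {
        if (position < 0 || position >= size) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        Node node = index.get(position / stride);
        for (int i = position % stride; i > 0; i--) {
            node = node.next;
        }
        return node.value;
    }

    int size() { return size; }
}
```

With a stride of 20, an indexed access walks at most 19 links instead of up to the whole list, at the cost of one extra reference per 20 nodes.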
It is sometimes useful to provide two collections holding the same data, so that the data can be accessed using the most appropriate and fastest procedure. This is common for indexed data
(database-type indexes, as opposed to array indexes), but entails extra overhead at the build stage. In a similar way, it may be that a particular data set is best held in two or more different collections
over its lifetime, but with only one collection being used at any one time. For example, you may use a linked-list implementation of a vector-type collection while building the collection because your
collection requires many insertions while it is being built. However, this provides suboptimal random access. After the build is completed, the collection can be converted into one based on an
array, thus speeding up access.
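A minimal sketch of this build-then-convert pattern, using the standard java.util.LinkedList and java.util.ArrayList classes (the class and method names are hypothetical, and the sorted-insert build phase is just one example of an insertion-heavy workload):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class BuildThenConvertSketch {
    // Build phase: insert each value at its sorted position through a ListIterator,
    // which on a LinkedList splices in a node without shifting other elements.
    static List<Integer> buildSorted(int[] values) {
        LinkedList<Integer> building = new LinkedList<>();
        for (int v : values) {
            ListIterator<Integer> it = building.listIterator();
            while (it.hasNext()) {
                if (it.next() > v) {
                    it.previous(); // step back so the insert lands before the larger element
                    break;
                }
            }
            it.add(v);
        }
        // Access phase: copy once into an array-backed list for fast get(int).
        return new ArrayList<>(building);
    }
}
```

The single copy at the end is paid once, which is quite different from converting back and forth repeatedly.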
It can be difficult to identify optimal algorithms for particular data structures. For example, in the Java 2 java.util.Collections.sort method, a linked list is first converted to an array in order to sort it. This is detrimental to performance, and it would be significantly faster to sort a
linked list directly using a merge sort.[4] In any case, frequently converting between collections and arrays is likely to cause performance problems.
[4] See Sorting and Searching Linked Lists in Java by John Boyer, Dr. Dobb's Journal, May 1998.
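For illustration, a merge sort that works directly on linked nodes might look like the sketch below. It uses a hypothetical hand-rolled singly linked Node class rather than java.util.LinkedList (whose nodes are not accessible), and it is not taken from the article cited above:

```java
class LinkedListMergeSortSketch {
    static final class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // Sorts by relinking nodes; no intermediate array is created.
    static Node sort(Node head) {
        if (head == null || head.next == null) {
            return head;
        }
        // Find the midpoint with slow/fast pointers and cut the list in two.
        Node slow = head;
        Node fast = head.next;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        Node second = slow.next;
        slow.next = null;
        return merge(sort(head), sort(second));
    }

    // Merges two sorted lists; taking from a on ties keeps the sort stable.
    private static Node merge(Node a, Node b) {
        Node dummy = new Node(0, null);
        Node tail = dummy;
        while (a != null && b != null) {
            if (a.value <= b.value) {
                tail.next = a;
                a = a.next;
            } else {
                tail.next = b;
                b = b.next;
            }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;
        return dummy.next;
    }
}
```

Whether this beats the copy-to-array approach on a given VM is something to measure rather than assume.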
The fastest ordered collections available in Java are plain arrays (e.g., int[], Object[], etc.). The drawback to using these directly is the lack of object-oriented methodology you can apply. Arrays
are not proper classes that can be extended. However, I occasionally find that there are situations when I want to pass these raw arrays directly between several classes, rather than wrap the arrays in
a class with the behavior required. This is unfortunate in design terms, but does provide speed. An example would be in some communications layers. Here, there are several layers of protocols you
need to pass your message through before it is transmitted, for example, a compression layer and an encryption layer. If you use an object as a message being passed through these layers, each layer
has to request the message contents (copying it), change the contents, and then assign back the new contents (copying again). An alternative is to implement the content-manipulation methods in the
message object itself, which is not a very extensible architecture. Assuming that you use an array to hold the contents, you can allow the message-contents array itself to be passed directly to the other
compression and encryption layer objects. This provides a big speedup, avoiding several copies.
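The idea of handing one array through several layers can be sketched as below. Everything here is hypothetical: the Layer interface, the send method, and especially XorLayer, which is only a stand-in transformation and not real encryption. A real compression layer would also change the content length, which this sketch does not attempt to model.

```java
class InPlaceLayersSketch {
    // A hypothetical layer contract: transform the shared buffer in place, no copies.
    interface Layer {
        void apply(byte[] contents, int length);
    }

    // Stand-in for an encryption layer: XOR with a fixed key (illustration only).
    static final class XorLayer implements Layer {
        private final byte key;
        XorLayer(byte key) { this.key = key; }
        public void apply(byte[] contents, int length) {
            for (int i = 0; i < length; i++) {
                contents[i] ^= key;
            }
        }
    }

    // Passes the same array through every layer in order.
    static void send(byte[] contents, int length, Layer... layers) {
        for (Layer layer : layers) {
            layer.apply(contents, length);
        }
    }
}
```

The design cost is the one noted above: every layer can see and modify the raw contents, so encapsulation is traded for speed.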
String objects also illustrate the point. If you want to iterate over the characters in a String, you must either repeatedly call String.charAt or copy the characters into your own array using String.getChars, and then iterate over them. Depending on the size of the String and how many times you iterate through the characters, one or the other of these methods is quicker, but if
you could iterate directly on the underlying char array, you would avoid the repeated method calls and the copy (see Chapter 5).
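The two options available for iterating over a String look like this in practice. The class and method names are hypothetical; String.charAt and String.getChars are the standard methods named above:

```java
class StringIterationSketch {
    // Option 1: one method call per character, no copy.
    static int countSpacesWithCharAt(String s) {
        int count = 0;
        for (int i = 0, n = s.length(); i < n; i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        return count;
    }

    // Option 2: one bulk copy with getChars, then plain array access.
    static int countSpacesWithGetChars(String s) {
        char[] chars = new char[s.length()];
        s.getChars(0, s.length(), chars, 0);
        int count = 0;
        for (char c : chars) {
            if (c == ' ') {
                count++;
            }
        }
        return count;
    }
}
```

Both return the same result; which one is quicker for a given String length and number of passes is worth timing on the target VM.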
A final point is that the collections that come with Java and other packages are usually not type-specific. This generality comes at the cost of performance. For example, if you are using java.util.Vector to hold only String objects, then you have to keep casting to String each time you access elements. If you reimplement the Vector class yourself using an underlying String[] array, and then change signature parameters and return types of methods from Object to String, the reimplemented class is faster. It is also clearer to use: you get rid of all those casts from your code. The cost is that you lose the general collection interface (see Section 3.2 for an example).
It is straightforward to test the performance costs of generalized collections compared to specialized collections. Access that does not involve a cast takes place at essentially the same speed, i.e., all the following accesses take the same time:
    int i = integerArrayList.get(someIndex);
    String s = stringArrayList.get(someIndex);
    Object o = objectArrayList.get(someIndex);
But the cost of a cast can make the access take 50% longer:
    // It can take 50% longer to access the string because of the cast
    String s = (String) objectArrayList.get(someIndex);
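A String-specific reimplementation of the kind described above might start out like the following sketch. The class name is hypothetical and only a few methods are shown; the method names addElement and elementAt mirror those of java.util.Vector, but the class deliberately does not implement any general collection interface:

```java
import java.util.Arrays;

class StringVectorSketch {
    private String[] elements = new String[10]; // typed backing array, so no casts on access
    private int size;

    // Takes a String rather than an Object.
    void addElement(String s) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = s;
    }

    // Returns a String directly; callers need no cast.
    String elementAt(int index) {
        if (index >= size) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        return elements[index];
    }

    int size() { return size; }
}
```

A caller can then write String s = names.elementAt(i); with no cast, for a StringVectorSketch variable named names.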
Update time can also be significantly faster. Updates to underlying arrays of primitive data types can be 40% faster than to object arrays.[5] The biggest difference is when a primitive data type needs to be wrapped and unwrapped in order to store into an array:
    // Simpler and much faster using a specialized IntArrayList
    integerArrayList.set(someIndex, someNum);
    int num = integerArrayList.get(someIndex);

    // Using a generalized ArrayList requires wrapping, casting, and unwrapping
    integerArrayList.set(someIndex, new Integer(someNum));
    int num = ((Integer) integerArrayList.get(someIndex)).intValue();
For this example, the cost of creating a new Integer object to wrap the int makes setting values take more than ten times longer when using the generalized array. Accessing is not as bad, taking
only twice as long after including the extra cast and method access to get to the int.
[5] Even updating a typed object array with objects of the given type (e.g., Strings into an underlying String[] array of an array list) seems to be faster by about 10%. The only reason I can think of for this is that the JIT compiler manages to optimize the update to the specialized array.
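The specialized IntArrayList used in the comparison above is not listed in this section. A minimal sketch of what such a class could look like, with a hypothetical class name and only the methods needed for the comparison, is:

```java
import java.util.Arrays;

class IntArrayListSketch {
    private int[] data; // primitive backing array: no Integer wrappers are created
    private int size;

    IntArrayListSketch(int initialCapacity) {
        data = new int[Math.max(1, initialCapacity)];
    }

    // Appends a value, doubling the backing array when it is full.
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        checkIndex(index);
        return data[index];
    }

    void set(int index, int value) {
        checkIndex(index);
        data[index] = value;
    }

    int size() { return size; }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

Because get and set work on int directly, neither wrapping nor casting is involved; the trade-off, as with the String case, is that the class no longer fits the general collection interfaces.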
11.2 Java 2 Collections