Hashtables and HashMaps Appropriate Data Structures and Algorithms

[6] Attributes simply wraps a HashMap, and restricts the keys to be ASCII-character alphanumeric Strings, and values to be Strings. WeakHashMap can maintain a cache of elements that are automatically garbage-collected when memory gets low. RenderingHints is specialized for use within the AWT packages. Properties is a Hashtable subclass specialized for maintaining key/value string pairs in files. UIDefaults is specialized for use within the Swing packages.

Hashtable, HashMap, and HashSet are all O(1) for access and update, so they should scale nicely if you have the available memory space.

List has four general-purpose implementations: Vector, Stack, ArrayList, and LinkedList. Vector, Stack, and ArrayList have underlying implementations based on arrays. LinkedList has an underlying implementation consisting of a doubly linked list. As such, LinkedList's performance is worse than that of any of the other three Lists for most operations. For very large collections that you cannot presize to be large enough, LinkedList provides better performance when adding or deleting elements towards the middle of the list, if the array-copying overhead of the other Lists is higher than the linear access time of the LinkedList. Otherwise, LinkedList's only likely performance advantage is as a first-in-first-out queue or double-ended queue. A circular array-list implementation provides better performance for a FIFO queue.

Vector is a synchronized List, and ArrayList is an unsynchronized List. Vector is present for backward compatibility with earlier versions of the JDK. Nevertheless, if you need to use a synchronized List, a Vector is faster than using an ArrayList in a synchronized wrapper. See the comparison test at the end of Section 10.4.1. Stack is a subclass of Vector with the same performance characteristics, but with additional functionality as a last-in-first-out queue.
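The circular array-list idea mentioned above for FIFO queues can be sketched roughly as follows. This is a minimal illustration written in present-day Java, not code from the text; the class name CircularQueue and its methods are hypothetical. Elements are never shifted: the head index simply advances and wraps around, and the array is copied only when it is full.

```java
import java.util.NoSuchElementException;

/** Illustrative circular-array FIFO queue (hypothetical class, not from the text). */
final class CircularQueue<T> {
    private Object[] items;
    private int head;   // index of the oldest element
    private int size;   // number of stored elements

    CircularQueue(int initialCapacity) {
        items = new Object[Math.max(1, initialCapacity)];
    }

    /** Adds to the tail; the array grows only when it is completely full. */
    void enqueue(T item) {
        if (size == items.length) {
            grow();
        }
        items[(head + size) % items.length] = item;
        size++;
    }

    /** Removes the oldest element in constant time, without shifting anything. */
    @SuppressWarnings("unchecked")
    T dequeue() {
        if (size == 0) {
            throw new NoSuchElementException("queue is empty");
        }
        T item = (T) items[head];
        items[head] = null;                 // drop the reference so it can be collected
        head = (head + 1) % items.length;   // wrap around at the end of the array
        size--;
        return item;
    }

    int size() {
        return size;
    }

    /** Doubles the capacity, unwrapping the elements so the head moves to index 0. */
    private void grow() {
        Object[] bigger = new Object[items.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = items[(head + i) % items.length];
        }
        items = bigger;
        head = 0;
    }
}
```

Both enqueue and dequeue touch a single array slot in the common case, which is where the advantage over a linked list (one node allocation per element) would be expected to come from.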

11.3 Hashtables and HashMaps

Because Hashtables and HashMaps are the most commonly used nonlist structures, I will spend a little extra time discussing them. Hashtables and HashMaps are pretty fast and provide adequate performance for most purposes. I rarely find that I have a performance problem using Hashtables or HashMaps, but here are some points that will help you tune them, or, if necessary, replace them:

• Hashtable is synchronized. That's fine if you are using it to share data across threads, but if you are using it single-threaded, you can replace it with an unsynchronized version to get a small boost in performance. HashMap is an unsynchronized version available from JDK 1.2.

• Hashtables and HashMaps are resized whenever the number of elements reaches (capacity * loadFactor). This requires reassigning every element to a new array using the rehashed values. This is not simply an array copy; every element needs to have its internal table position recalculated using the new table size for the hash function. You are usually better off setting an initial capacity that handles all the elements you want to add. This initial capacity should be the number of elements divided by the loadFactor (the default load factor is 0.75).

• Hashtables and HashMaps are faster with a smaller loadFactor, but take up more space. You have to decide how this tradeoff works best for you.

• The hashing function should work better with a capacity that is a prime number. Failing this, always use an odd number, never an even number (add one if you have an even number). The rehashing mechanism creates a new capacity of twice the old capacity, plus one. A useful prime number to remember is 89. The sequence of numbers generated by successively multiplying by two and adding one includes several primes when the sequence starts with 89. The default size for Hashtables is 101. However, in my tests using a size of 89, I gained a statistically insignificant speedup of only 2% or 3%. The variation in test runs was actually larger than this speedup.

• Access to the Map requires asking the key for its hashCode and also testing that the key equals the key you are retrieving. You can create a specialized Map class that bypasses these calls if appropriate. Alternatively, you can use specialized key classes that have very fast method calls for these two methods. Note, for example, that Java String objects have hashCode methods that iterate and execute arithmetic over a number of characters to determine the value, and the String.equals method checks that every character is identical for the two strings being compared. Considering that strings are used as the most common keys in Hashtables, I'm often surprised to find that I don't have a performance problem with them, even for largish tables. From JDK 1.3, Strings cache their hash code in an instance variable, making them faster and more suited as Map keys.

• If you are building a specialized Hashtable, you can map objects to array elements to preallocate HashtableEntry objects and speed up access as well. The technique is illustrated in Section 11.8 later in this chapter.

Here is a specialized class to use for keys in a Hashtable. This example assumes that I am using String keys, but all my String objects are nonequal, and I can reference keys by identity. I use a utility class, tuning.dict.Dict, which holds a large array of nonequal words taken from an English dictionary. I compare the access times against all the keys using two different Hashtables, one using the plain String objects as keys, the other using my own StringWrapper objects as keys. The StringWrapper objects cache the hash value of the string and assume that equality comparison is the same as identity comparison. These are the fastest possible equals and hashCode methods.
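As an aside, the sizing advice in the bullet points above can be sketched in a few lines. This is only an illustration under stated assumptions: the class HashSizing and its helper methods are hypothetical names, the capacity rule is the "number of elements divided by the loadFactor, made odd" guideline from the text, and the growth rule is the "twice the old capacity, plus one" behavior the text describes.

```java
import java.util.Hashtable;

/** Hypothetical helper illustrating the presizing and capacity-sequence advice. */
final class HashSizing {

    /** Expected element count divided by the load factor, bumped to an odd number. */
    static int capacityFor(int expectedElements, float loadFactor) {
        int capacity = (int) Math.ceil(expectedElements / (double) loadFactor);
        return (capacity % 2 == 0) ? capacity + 1 : capacity;
    }

    /** Growth rule described in the text: twice the old capacity, plus one. */
    static int nextCapacity(int capacity) {
        return capacity * 2 + 1;
    }

    /** Simple trial-division primality check, adequate for table sizes. */
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int expected = 1000;
        float loadFactor = 0.75f;

        // Presize so that adding the expected elements should not trigger a rehash.
        int capacity = capacityFor(expected, loadFactor);
        Hashtable<Integer, Boolean> table = new Hashtable<>(capacity, loadFactor);
        for (int i = 0; i < expected; i++) {
            table.put(i, Boolean.TRUE);
        }
        System.out.println("presized capacity = " + capacity);

        // Walk the capacity sequence that starts at 89 and report which values are prime.
        int c = 89;
        for (int step = 0; step < 7; step++) {
            System.out.println(c + (isPrime(c) ? " is prime" : " is not prime"));
            c = nextCapacity(c);
        }
    }
}
```

Given the small speedup reported in the text for a starting size of 89, the presizing step is likely to matter more in practice than the choice of prime.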
The access speedups are illustrated in the following table of measurements (times normalized to the JDK 1.2 case):

                         JDK 1.2   JDK 1.3 [7]   JDK 1.1.6   JDK 1.2 no JIT   HotSpot 1.0
    String keys            100        45            91           478             165
    String-wrapped keys     53        40            64           282              65

[7] The limited speedup in JDK 1.3 reflects the improved performance of Strings having their hash code cached in the String instance.

If you create a hash-table implementation specialized for the StringWrapper class, you avoid calling the hashCode and equals methods completely. Instead, the specialized hash table can access the hash-instance variable directly and use identity comparison of the elements. The speedup is considerably larger, and for specialized purposes, this is the route to follow. The test class and the StringWrapper key class are as follows:

    package tuning.hash;

    import java.util.Hashtable;
    import tuning.dict.Dict;

    public class SpecialKeyClass
    {
        public static void main(String[] args)
        {
            //Initialize the dictionary
            try{Dict.initialize(true);}catch(Exception e){}
            System.out.println("Started Test");

            //Build the two hashtables. Keep references to the
            //StringWrapper objects for later use as accessors.
            Hashtable h1 = new Hashtable();
            Hashtable h2 = new Hashtable();
            StringWrapper[] dict = new StringWrapper[Dict.DICT.length];
            for (int i = 0; i < Dict.DICT.length; i++)
            {
                h1.put(Dict.DICT[i], Boolean.TRUE);
                h2.put(dict[i] = new StringWrapper(Dict.DICT[i]), Boolean.TRUE);
            }
            System.out.println("Finished building");

            Object o;

            //Time the access for normal String keys
            long time1 = System.currentTimeMillis();
            for (int i = 0; i < Dict.DICT.length; i++)
                o = h1.get(Dict.DICT[i]);
            time1 = System.currentTimeMillis() - time1;
            System.out.println("Time1 = " + time1);

            //Time the access for StringWrapper keys
            long time2 = System.currentTimeMillis();
            for (int i = 0; i < Dict.DICT.length; i++)
                o = h2.get(dict[i]);
            time2 = System.currentTimeMillis() - time2;
            System.out.println("Time2 = " + time2);
        }
    }

    final class StringWrapper
    {
        //cached hash code
        private int hash;
        private String string;

        public StringWrapper(String str)
        {
            string = str;
            hash = str.hashCode();
        }

        public final int hashCode()
        {
            return hash;
        }

        public final boolean equals(Object o)
        {
            //The fastest possible equality check
            return o == this;

            /*
            //This would be the more generic equality check if we allowed
            //access of the same String value from different StringWrapper
            //objects. This is still faster than the plain Strings as keys.
            if (o instanceof StringWrapper)
            {
                StringWrapper s = (StringWrapper) o;
                return s.hash == hash && string.equals(s.string);
            }
            else
                return false;
            */
        }
    }
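The specialized hash table described before the listing, one that reads the cached hash field directly and compares keys by identity, might look something like the following sketch. It is an assumption-laden illustration rather than the implementation measured in the text: WrappedKey and WrappedKeyTable are hypothetical names, the key class exposes its hash field to the table (unlike the private field in StringWrapper), and the table has a fixed number of buckets with no resizing.

```java
/** Hypothetical key class in the spirit of StringWrapper: the hash is computed once. */
final class WrappedKey {
    final String string;
    final int hash;   // read directly by the table, so hashCode() is never called

    WrappedKey(String string) {
        this.string = string;
        this.hash = string.hashCode();
    }
}

/** Illustrative fixed-size chained hash table specialized for WrappedKey keys. */
final class WrappedKeyTable {

    private static final class Entry {
        final WrappedKey key;
        Object value;
        final Entry next;

        Entry(WrappedKey key, Object value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] buckets;

    WrappedKeyTable(int capacity) {
        buckets = new Entry[Math.max(1, capacity)];
    }

    /** Maps the cached hash to a bucket; masking the sign bit keeps the index non-negative. */
    private int indexFor(WrappedKey key) {
        return (key.hash & 0x7FFFFFFF) % buckets.length;
    }

    /** Stores a value and returns the previous value for this exact key object, or null. */
    Object put(WrappedKey key, Object value) {
        int index = indexFor(key);
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (e.key == key) {            // identity comparison, no equals() call
                Object old = e.value;
                e.value = value;
                return old;
            }
        }
        buckets[index] = new Entry(key, value, buckets[index]);
        return null;
    }

    /** Returns the value stored for this exact key object, or null if absent. */
    Object get(WrappedKey key) {
        for (Entry e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (e.key == key) {
                return e.value;
            }
        }
        return null;
    }
}
```

Because lookup is by identity, a second WrappedKey built from an equal string does not find the entry; that restriction is the price of skipping hashCode and equals, and it matches the assumption in the text that all keys are nonequal and referenced by identity.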

11.4 Cached Access