
11.4 Cached Access

Caches use local data when present, and thus don't need to access nonlocal data. If the data is not present locally, the nonlocal data must be accessed or calculated, and the result is then stored locally as well as being returned. So after the first access, the data is available locally, and access is quicker. How much quicker depends on the type of cache.

Most caches have to maintain the consistency of the data held in the cache: it is usually important for the data in the cache to be up to date. When considering the use of a cache, bear in mind the expected lifetime of the data and any refresh rate or time-to-live values associated with the data. Similarly, for output data, consider how long to keep data in the cache before it must be written out. You may have differing levels of priority for writing out different types of data. For example, some filesystems keep general written data in a write cache, but immediately write critical system data that ensures system consistency in case of crashes.

Also, as caches cannot usually hold all the data you would like, a strategy for swapping data out of the cache to overcome cache space limitations is usually necessary. The memory used by the cache is often significant, and it is always better to release the resources it uses explicitly when it is no longer needed, or to reduce the resources being used by the cache when possible, even if the cache itself is still required.

Caching can apply to data held in single objects or groups of objects. For single objects, it is usual to maintain a structure or instance variable that holds cached values. For groups of objects, a structure is usually maintained at the point of access to the elements of the group.

In addition, caching applies generally to two types of locality of access, usually referred to as spatial and temporal. Spatial locality refers to the idea that if something is accessed, it is likely that something else nearby will be accessed soon.
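The check-locally, fetch-on-miss, store-and-return pattern just described can be sketched as a small class. This is only a minimal illustration: the SimpleCache class, the loadRemote( ) method, and the String types are all assumed names for this sketch, not from any standard library.

```java
import java.util.HashMap;
import java.util.Map;

//A minimal read-through cache: use the local copy when present;
//otherwise fetch the nonlocal data, store it locally, and return it.
//The class and method names here are illustrative.
public class SimpleCache {
    private final Map<String, String> local = new HashMap<String, String>();

    //Stands in for an expensive nonlocal access (disk, network, calculation)
    protected String loadRemote(String key) {
        return "value-for-" + key;
    }

    public String get(String key) {
        String value = local.get(key);
        if (value == null) {          //not present locally
            value = loadRemote(key);  //access or calculate the nonlocal data
            local.put(key, value);    //store locally as well as returning it
        }
        return value;                 //subsequent accesses hit the local copy
    }

    //Release the cache's resources explicitly when no longer needed
    public void clear() {
        local.clear();
    }
}
```

Only the first get( ) for a given key pays the cost of loadRemote( ); every later get( ) for that key is a local Map lookup.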
This is one of the reasons buffering I/O streams works so well. If every subsequent byte read from disk were in a completely different part of the disk, I/O buffering would provide no help at all. Temporal locality refers to the idea that if you access something, you are likely to access it again in the near future. This is the principle behind browsers holding files locally once downloaded.

There is a lot of research into the use of caches, but most of it relates to CPU or disk hardware caches. Nevertheless, any good article or book chapter on caches should cover the basics and the pitfalls, and these are normally applicable, with some extra thinking effort, to caches in applications.

One thing you should do is monitor cache-hit rates, i.e., the number of times that accessing data retrieves it from the cache, compared to the total number of data accesses. This is important because if the cache-hit rate is too low, the overhead of having a cache may be more than any actual gain in performance. In this case, you want to tune or disable the cache. It is frequently useful to build in the option of disabling and emptying the cache. This can be very helpful for two reasons: first, you can make direct comparisons of operations with and without the cache, and second, there are times when you want to measure the overhead in filling an empty cache. In this case, you may need to repeatedly fill an empty cache to get a good measurement.
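Hit-rate monitoring and a disable-and-empty option can both be built into a cache with very little code. The following sketch assumes a simple key-to-value cache; the MonitoredCache class, its compute( ) stand-in (which mirrors the i to i+5 mapping used later in this chapter's example), and the method names are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;

//A cache that counts hits and misses so the cache-hit rate can be
//monitored, and that can be disabled and emptied for comparison runs.
//All names here are illustrative, not from any standard library.
public class MonitoredCache {
    private final Map<Integer, Integer> local = new HashMap<Integer, Integer>();
    private long hits = 0;
    private long misses = 0;
    private boolean enabled = true;

    //Stands in for the expensive lookup or calculation being cached
    private Integer compute(Integer key) {
        return Integer.valueOf(key.intValue() + 5);
    }

    public Integer get(Integer key) {
        if (!enabled)
            return compute(key);      //cache disabled: bypass it entirely
        Integer value = local.get(key);
        if (value != null) {
            hits++;                   //found locally
            return value;
        }
        misses++;                     //not present: compute and store
        value = compute(key);
        local.put(key, value);
        return value;
    }

    //hits / (hits + misses); if too low, tune or disable the cache
    public double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    //Disable and empty the cache, e.g. to time operations without it
    public void disableAndEmpty() {
        enabled = false;
        local.clear();
    }
}
```

Comparing timed runs before and after calling disableAndEmpty( ) gives the direct with-and-without comparison described above.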

11.5 Caching Example I

When accessing elements from sets of data, it is often the case that some elements are accessed much more frequently than others. In these cases, it is possible to apply caching techniques to speed up access to these frequently accessed elements. This is best demonstrated with the following example.

Consider a CacheTest class that consists mainly of a Map populated with Integer objects. I use Integer objects for convenience to populate the Map with many elements, but the actual object type is of no significance, since you use only the hashCode( ) and equals( ) methods, just as the Map does. Basically, you provide two ways to access the elements of the Map. The first, plain_access( ), just calls the Map.get( ) method as normal. The second method, cached_access( ), uses the lower bits of the hash code of the object to obtain an index value into an array. This index is then checked to see whether the object is there. If it is, the corresponding value in a parallel value array is returned. If it's not, the object is placed there along with the value in the corresponding value array.

This is about the simplest example of general cached access, and it demonstrates both the advantages and the pitfalls of cached access. I have selected 10 integers that do not map to the same indexes for the example. Running the class gives a straightforward comparison between the two access methods, and I get the result that the cached access varies significantly depending on the VM used. The access speedups are illustrated in the following table of measurements. Times have been normalized to the JDK 1.2 case using a HashMap. The first time of each entry is the measurement using a HashMap (not available in JDK 1.1.6), and the second is the measurement using a Hashtable. For any one VM, the cached access is significantly faster.
                 JDK 1.2   JDK 1.3   JDK 1.1.6   JDK 1.2 no JIT   HotSpot 1.0
  Plain access
    (HashMap)    100       198       -           1646             329
    (Hashtable)  317       203       444         2730             238
  Cached access
    (HashMap)    35        73        -           1188             120
    (Hashtable)  32        73        -           1120             101

This test is artificial in that I chose integers where no two map to the same index. If more than one integer maps to the same cache-array index, this is called a collision. Clearly, with collisions, performance is not as good, because you are constantly entering the code that puts the objects into the cache. Collisions are a general problem with cached data, and you need to minimize them for optimal performance. This can be done by choosing an appropriate mapping function to generate indexes that minimize your collisions:

package tuning.cache;

import java.util.HashMap;
import java.util.Hashtable;
import java.lang.Math;

public class CacheTest
{
  //The cache array for the keys
  static Object[] cache_keys = new Object[128];
  //The array for the values corresponding to the cached keys
  static Object[] cache_values = new Object[128];
  //Use one or the other of the following
  static Hashtable hash = new Hashtable();
  //static HashMap hash = new HashMap();

  public static void main(String[] args)
  {
    try
    {
      System.out.println("started populating");
      populate();
      System.out.println("started accessing");
      access_test();
    }
    catch(Exception e){e.printStackTrace();}
  }

  public static void populate()
  {
    for (int i = 0; i < 100000; i++)
      hash.put(new Integer(i), new Integer(i+5));
  }

  public static Object plain_access(Integer i)
  {
    //simple get() call to the hash table
    return hash.get(i);
  }

  public static Object cached_access(Integer i)
  {
    //First get the access index
    int access = Math.abs(i.hashCode() % 127);
    Object o;
    //If the access index has an object, and that object is equal to the
    //key, then return the corresponding value in the parallel values array.
    if ( (o = cache_keys[access]) == null || !o.equals(i) )
    {
      //Otherwise we got a collision. We need to replace the object at
      //that access index with the new one that we get from the hashtable
      //using the normal Hashtable.get(), and then return the value
      //retrieved this way
      if (o != null)
        System.out.println("Collision between " + o + " and " + i);
      o = hash.get(i);
      cache_keys[access] = i;
      cache_values[access] = o;
      return o;
    }
    else
    {
      return cache_values[access];
    }
  }

  public static void access_test()
  {
    //Ten integers that do not collide under the mapping scheme.
    //This gives best performance behavior for illustration purposes
    Integer a0 = new Integer(6767676);
    Integer a1 = new Integer(33);
    Integer a2 = new Integer(998);
    Integer a3 = new Integer(3333);
    Integer a4 = new Integer(12348765);
    Integer a5 = new Integer(9999);
    Integer a6 = new Integer(66665);
    Integer a7 = new Integer(1234);
    Integer a8 = new Integer(987654);
    Integer a9 = new Integer(3121219);
    Object o1,o2,o3,o4,o5,o6,o7,o8,o9,o0;
    long time = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++)
    {
      o1 = plain_access(a0);
      o2 = plain_access(a1);
      o3 = plain_access(a2);
      o4 = plain_access(a3);
      o5 = plain_access(a4);
      o6 = plain_access(a5);
      o7 = plain_access(a6);
      o8 = plain_access(a7);
      o9 = plain_access(a8);
      o0 = plain_access(a9);
    }
    System.out.println("plain access took " +
      (System.currentTimeMillis()-time));
    time = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++)
    {
      o1 = cached_access(a0);
      o2 = cached_access(a1);
      o3 = cached_access(a2);
      o4 = cached_access(a3);
      o5 = cached_access(a4);
      o6 = cached_access(a5);
      o7 = cached_access(a6);
      o8 = cached_access(a7);
      o9 = cached_access(a8);
      o0 = cached_access(a9);
    }
    System.out.println("cached access took " +
      (System.currentTimeMillis()-time));
  }
}
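As an aside on choosing the mapping function: when the cache array length is a power of two (as the 128-element arrays above are), a bitwise mask can replace the Math.abs( ) and % combination, and folding the high bits of the hash code into the low bits before masking spreads out keys whose hash codes differ only in their high bits. This sketch is one possible alternative, not the chapter's example; the bit-spreading step is borrowed from later JDK HashMap implementations, and the class and method names are illustrative.

```java
//One possible index-mapping function for a power-of-two-sized cache array.
//Spreading the high bits reduces collisions among keys whose hash codes
//differ only in the high bits; the mask avoids Math.abs() and %.
public class CacheIndex {
    //tableSize must be a power of two for the mask to be valid
    static int index(Object key, int tableSize) {
        int h = key.hashCode();
        h ^= (h >>> 16);              //fold high bits into the low bits
        return h & (tableSize - 1);   //cheap replacement for abs(h) % size
    }
}
```

Whether this wins in practice depends on the key distribution, so measure collision counts with your own data before settling on a mapping function.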

11.6 Caching Example II