
Since this function is easily converted to a tail-recursive version, it is natural to test the tail-recursive version to see if it performs any better. For this particular function, the tail-recursive version does not perform any better, which is not typical. Here, the factorial function consists of a very simple, fast calculation, and the extra function-call overhead in the tail-recursive version is enough to negate the benefit that is normally gained. Note that the HotSpot 1.0 VM does manage to optimize the tail-recursive version to be faster than the original, after the compiler optimizations have had a chance to be applied. See Table 7-3.

Let's look at other ways this function can be optimized. Start with the classic conversion from recursive to iterative, and note that the factorial method contains just one value that is successively operated on to give a new value (the result), along with a parameter specifying how to operate on the partial result (the current input to the factorial). A standard way to convert this type of recursive method is to replace the parameters passed to the method with temporary variables in a loop. In this case, you need two variables, one of which is passed into the method and can be reused. The converted method looks like:

    public static long factorial2(int n)
    {
      long result = 1;
      while (n > 1)
      {
        result *= n--;
      }
      return result;
    }

Measuring the performance, you see that this method calculates the result in 88% of the time taken by the original recursive factorial1 method (using the JDK 1.2 results). [8] See Table 7-3.

[8] HotSpot optimized the recursive version sufficiently to make it faster than the iterative version.

Table 7-3, Timings of the Various Factorial Implementations

                                                      1.2   1.2 no JIT   1.3   HotSpot 1.0 2nd Run   1.1.6
    factorial1 (original recursive)                   100   572          152   137                   101
    factorial1a (tail recursive)                      110   609          173    91                   111
    factorial2 (iterative)                             88   344          129   177                    88
    factorial3 (dynamically cached)                    46   278           71    74                    46
    factorial4 (statically cached)                     41   231           67    57                    40
    factorial3 (dynamically cached, 21-element cache)   4    56           11     8                     4

The recursion-to-iteration technique as illustrated here is general, and another example in a different domain may help make this generality clear. Consider a linked list with singly linked nodes, each consisting of a next pointer to the next node and a value instance variable holding (in this case) just an integer. A simple linear search method to find the first node holding a particular integer looks like:

    Node find_recursive(int i)
    {
      // Check this node first, then delegate to the rest of the list
      if (value == i)
        return this;
      else if (next != null)
        return next.find_recursive(i);
      else
        return null;
    }

To convert this to an iterative method, use a temporary variable to hold the current node, and reassign that variable with the next node in the list at each iteration. The method is clear, and its only drawback compared to the recursive method is that it violates encapsulation (this one method directly accesses the instance variables of each node object):

    Node find_iterative(int i)
    {
      Node node = this;
      while (node != null)
      {
        if (node.value == i)
          return node;
        else
          node = node.next;
      }
      return null;
    }

Before looking at general techniques for converting other types of recursive methods to iterative ones, I will revisit the original factorial method to illustrate some other techniques for improving the performance of recursive methods. To test the timing of the factorial method, I put it into a loop to recalculate factorial(20) many times. Otherwise, the time taken is too short to be reliably measured. When this situation is close to the actual problem, a good tuning technique is to cache the intermediate results.
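The timing loop described above might be sketched as follows. This is only an illustration under stated assumptions: the class name FactorialTiming, the helper timeNanos, and the repeat count are hypothetical, factorial1 is assumed to be the plain recursive form, and the method-reference syntax needs a much newer Java than the VMs measured in Table 7-3.

```java
import java.util.function.IntToLongFunction;

// Hypothetical harness; names and repeat count are assumptions, not from the text.
final class FactorialTiming {
    // Assumed shape of the original recursive method.
    static long factorial1(int n) {
        return n < 2 ? 1L : n * factorial1(n - 1);
    }

    // Iterative conversion.
    static long factorial2(int n) {
        long result = 1;
        while (n > 1) {
            result *= n--;
        }
        return result;
    }

    // Written after each run so the computed values are not trivially unused.
    static long sink;

    // Calls f(n) 'repeats' times and returns the elapsed time in nanoseconds.
    static long timeNanos(IntToLongFunction f, int n, int repeats) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            checksum += f.applyAsLong(n);
        }
        long elapsed = System.nanoTime() - start;
        sink = checksum;
        return elapsed;
    }

    public static void main(String[] args) {
        int repeats = 1_000_000;
        long base = Math.max(1, timeNanos(FactorialTiming::factorial1, 20, repeats));
        long iter = timeNanos(FactorialTiming::factorial2, 20, repeats);
        // Report the iterative time as a percentage of the recursive time.
        System.out.println("factorial2 as % of factorial1: " + (100 * iter / base));
    }
}
```

Numbers from a harness like this vary with the VM, the repeat count, and whether the compiler has had time to optimize the methods, which is why the table reports a separate second-run column for HotSpot.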
Caching can be applied when some recursive function is repeatedly being called and some of the intermediate results are repeatedly being recalculated. The technique is simple to illustrate for the factorial method:

    public static final int CACHE_SIZE = 15;
    public static final long[] factorial3Cache = new long[CACHE_SIZE];

    public static long factorial3(int n)
    {
      if (n < 2)
        return 1L;
      else if (n < CACHE_SIZE)
      {
        // A zero entry means the value has not been calculated yet
        if (factorial3Cache[n] == 0)
          factorial3Cache[n] = n*factorial3(n-1);
        return factorial3Cache[n];
      }
      else
        return n*factorial3(n-1);
    }

With the choice of 15 elements for the cache, the factorial3 method takes 46% of the time taken by factorial1. If you choose a cache with 21 elements, so that every call to factorial3(20) except the first simply returns from the cache with no calculations at all, the time taken is just 4% of the time taken by factorial1 (using the JDK 1.2 results; see Table 7-3).

In this particular situation, you can make one further improvement, which is to compute the values at implementation time and hardcode them in:

    public static final long[] factorial4Cache = {
      1L, 1L, 2L, 6L, 24L, 120L, 720L, 5040L, 40320L, 362880L, 3628800L,
      39916800L, 479001600L, 6227020800L, 87178291200L};
    public static final int CACHE_SIZE = factorial4Cache.length;

    public static long factorial4(int n)
    {
      if (n < CACHE_SIZE)
        return factorial4Cache[n];
      else
        return n*factorial4(n-1);
    }

This is a valid technique that applies when you can identify and calculate partial solutions that can be included with the class at compilation time. [9]

[9] My editor points out that a variation on hardcoded values, used by state-of-the-art high-performance mathematical functions, is a partial table of values together with an interpolation method to calculate intermediate values.
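One reason the 21-element cache removes all calculation is that 20! is the largest factorial that fits in a signed 64-bit long, so entries for 0! through 20! cover every representable result. The following is a minimal sketch of a complete lookup table built on that fact. It is a variation rather than the method shown above: the table is filled once in a static initializer instead of being hardcoded, and the class name FactorialTable and the choice to throw on out-of-range input are assumptions made for illustration.

```java
// Hypothetical class; the name and the error handling are assumptions, not from the text.
final class FactorialTable {
    // 0! through 20!: 21! would overflow a signed 64-bit long.
    private static final long[] TABLE = new long[21];

    static {
        // Fill the table once, when the class is loaded.
        TABLE[0] = 1L;
        for (int i = 1; i < TABLE.length; i++) {
            TABLE[i] = i * TABLE[i - 1];
        }
    }

    // Pure lookup: no recursion and no arithmetic at call time.
    static long factorial(int n) {
        if (n < 0 || n >= TABLE.length) {
            throw new ArithmeticException("factorial(" + n + ") does not fit in a long");
        }
        return TABLE[n];
    }

    public static void main(String[] args) {
        System.out.println(factorial(20)); // prints 2432902008176640000
    }
}
```

Rejecting out-of-range input is a design choice of this sketch; the methods in the text instead fall back to recursion above the cache size, which silently overflows for n greater than 20.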

7.5 Recursion and Stacks