Since this function is easily converted to a tail-recursive version, it is natural to test that version to see if it performs any better. For this particular function, the tail-recursive version does not perform any better, which is atypical. Here, the factorial function consists of a very simple, fast calculation, and the extra function-call overhead in the tail-recursive version is enough to negate the benefit that is normally gained. Note that the HotSpot 1.0 VM does manage to optimize the tail-recursive version to be faster than the original, after its compiler optimizations have had a chance to be applied. See Table 7-3.
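The tail-recursive variant discussed above (factorial1a in Table 7-3) is not reproduced in this section; a minimal sketch of what such a version looks like follows. The accumulator-parameter structure is the standard tail-recursive form; the exact names here are illustrative, not the book's code.

```java
public class TailFactorial {
    // Public entry point seeds the accumulator with 1.
    public static long factorial1a(int n) {
        return factorial1a(1L, n);
    }

    // The recursive call is in tail position: the partial product is carried
    // forward in the accumulator instead of being applied on the way back up.
    private static long factorial1a(long result, int n) {
        if (n < 2)
            return result;
        return factorial1a(result * n, n - 1);
    }

    public static void main(String[] args) {
        System.out.println(factorial1a(20));  // 2432902008176640000
    }
}
```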
Let's look at other ways this function can be optimized. Start with the classic conversion from recursive to iterative, and note that the factorial method contains just one value that is successively operated on to give a new value (the result), along with a parameter specifying how to operate on the partial result (the current input to the factorial). A standard way to convert this type of recursive method is to replace the parameters passed to the method with temporary variables in a loop. In this case, you need two variables, one of which is passed into the method and can be reused. The converted method looks like:
public static long factorial2(int n) {
    long result = 1;
    while (n > 1) {
        result *= n--;
    }
    return result;
}
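As a quick sanity check on the converted method, the following minimal harness (an illustrative addition, not from the book) exercises it at the largest argument whose factorial still fits in a long:

```java
public class Factorial2Demo {
    // Iterative factorial: the loop variables replace the recursive parameters.
    public static long factorial2(int n) {
        long result = 1;
        while (n > 1) {
            result *= n--;
        }
        return result;
    }

    public static void main(String[] args) {
        // 20! is the largest factorial that fits in a 64-bit long.
        System.out.println(factorial2(20));  // 2432902008176640000
    }
}
```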
Measuring the performance, you see that this method calculates the result in 88% of the time taken by the original recursive factorial1 method, using the JDK 1.2 results.[8] See Table 7-3.

[8] HotSpot optimized the recursive version sufficiently to make it faster than the iterative version.
Table 7-3. Timings of the Various Factorial Implementations
(each time is normalized as a percentage of the JDK 1.2 time for factorial1)

                                               1.2   1.2 no JIT   1.3   HotSpot 1.0 2nd Run   1.1.6
factorial1 (original recursive)                100      572       152           137            101
factorial1a (tail recursive)                   110      609       173            91            111
factorial2 (iterative)                          88      344       129           177             88
factorial3 (dynamically cached)                 46      278        71            74             46
factorial4 (statically cached)                  41      231        67            57             40
factorial3 (dynamically cached, 21 elements)     4       56        11             8              4
The recursion-to-iteration technique as illustrated here is general, and another example in a different domain may help make this generality clear. Consider a linked list with singly linked nodes, each consisting of a next pointer to the next node and a value instance variable holding, in this case, just an integer. A simple linear search method to find the first node holding a particular integer looks like:
Node find_recursive(int i) {
    if (value == i)
        return this;
    else if (next != null)
        return next.find_recursive(i);
    else
        return null;
}
To convert this to an iterative method, use a temporary variable to hold the current node, and reassign that variable to the next node in the list at each iteration. The method is clear, and its only drawback compared to the recursive method is that it violates encapsulation: this one method directly accesses the instance variables of each node object:
Node find_iterative(int i) {
    Node node = this;
    while (node != null) {
        if (node.value == i)
            return node;
        else
            node = node.next;
    }
    return null;
}
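The two search variants can be seen side by side in this self-contained sketch of the Node class described above; the constructor and the small demo in main are illustrative additions, not part of the book's code:

```java
public class Node {
    int value;  // the integer held by this node
    Node next;  // the following node, or null at the end of the list

    Node(int value, Node next) {
        this.value = value;
        this.next = next;
    }

    // Recursive search: each node checks itself, then delegates to its successor.
    Node find_recursive(int i) {
        if (value == i)
            return this;
        else if (next != null)
            return next.find_recursive(i);
        else
            return null;
    }

    // Iterative search: a temporary variable walks the list node by node.
    Node find_iterative(int i) {
        Node node = this;
        while (node != null) {
            if (node.value == i)
                return node;
            else
                node = node.next;
        }
        return null;
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, null)));
        // Both searches find the same node object.
        System.out.println(list.find_recursive(3) == list.find_iterative(3));  // true
        System.out.println(list.find_iterative(9));  // null
    }
}
```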
Before looking at general techniques for converting other types of recursive methods to iterative ones, I will revisit the original factorial method to illustrate some other techniques for improving the performance of recursive methods.
To test the timing of the factorial method, I put it into a loop that recalculates factorial(20) many times; otherwise, the time taken is too short to be measured reliably. When this situation is close to the actual problem, a good tuning technique is to cache the intermediate results. This technique can be applied when some recursive function is repeatedly called and some of the intermediate results are repeatedly identified. The technique is simple to illustrate for the factorial method:
public static final int CACHE_SIZE = 15;
public static final long[] factorial3Cache = new long[CACHE_SIZE];

public static long factorial3(int n) {
    if (n < 2)
        return 1L;
    else if (n < CACHE_SIZE) {
        if (factorial3Cache[n] == 0)
            factorial3Cache[n] = n * factorial3(n - 1);
        return factorial3Cache[n];
    }
    else
        return n * factorial3(n - 1);
}
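The caching behavior can be observed directly with a small harness (an illustrative addition, not from the book): after the first call, the cache holds every factorial up to 14!, so subsequent calls for arguments below the cache size do no multiplications at all.

```java
public class Factorial3Demo {
    public static final int CACHE_SIZE = 15;
    public static final long[] factorial3Cache = new long[CACHE_SIZE];

    public static long factorial3(int n) {
        if (n < 2)
            return 1L;
        else if (n < CACHE_SIZE) {
            // Fill the cache entry on first use, then reuse it forever.
            if (factorial3Cache[n] == 0)
                factorial3Cache[n] = n * factorial3(n - 1);
            return factorial3Cache[n];
        }
        else
            return n * factorial3(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(factorial3(20));       // 2432902008176640000
        // The recursion bottomed out through the cache, filling it up to 14!.
        System.out.println(factorial3Cache[14]);  // 87178291200
    }
}
```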
With the choice of 15 elements for the cache, the factorial3 method takes 46% of the time taken by factorial1. If you choose a cache with 21 elements, so that every call except the first to factorial3(20) simply returns a value from the cache with no calculations at all, the time taken is just 4% of the time taken by factorial1, using the JDK 1.2 results: see Table 7-3. In this particular situation, you can make one further improvement, which is to compute the values at implementation time and hardcode them:
public static final long[] factorial4Cache = {
    1L, 1L, 2L, 6L, 24L, 120L, 720L, 5040L, 40320L, 362880L, 3628800L,
    39916800L, 479001600L, 6227020800L, 87178291200L};
public static final int CACHE_SIZE = factorial4Cache.length;

public static long factorial4(int n) {
    if (n < CACHE_SIZE)
        return factorial4Cache[n];
    else
        return n * factorial4(n - 1);
}
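In use, arguments inside the table return instantly, and larger arguments recurse only until they reach the table (a minimal harness, illustrative rather than from the book):

```java
public class Factorial4Demo {
    public static final long[] factorial4Cache = {
        1L, 1L, 2L, 6L, 24L, 120L, 720L, 5040L, 40320L, 362880L, 3628800L,
        39916800L, 479001600L, 6227020800L, 87178291200L};
    public static final int CACHE_SIZE = factorial4Cache.length;

    public static long factorial4(int n) {
        if (n < CACHE_SIZE)
            return factorial4Cache[n];  // direct table lookup, no arithmetic
        else
            return n * factorial4(n - 1);  // recurse down to the table
    }

    public static void main(String[] args) {
        System.out.println(factorial4(12));  // 479001600, straight from the table
        System.out.println(factorial4(20));  // 2432902008176640000
    }
}
```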
This is a valid technique that applies when you can identify and calculate partial solutions that can be included with the class at compilation time.
[9]
My editor points out that a variation on hardcoded values, used by state-of-the-art high-performance mathematical functions, is a partial table of values together with an interpolation method to calculate intermediate values.
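The table-plus-interpolation variation the footnote mentions can be sketched as follows. This is an illustrative example of the general technique, not code from the book; the choice of sine, the table size, and the restricted input range are all arbitrary assumptions for the demonstration.

```java
public class SinTable {
    // Precompute sine at N+1 evenly spaced points on [0, pi/2].
    static final int N = 1024;
    static final double STEP = (Math.PI / 2) / N;
    static final double[] TABLE = new double[N + 1];
    static {
        for (int i = 0; i <= N; i++)
            TABLE[i] = Math.sin(i * STEP);
    }

    // Linear interpolation between adjacent table entries.
    // Valid only for x in [0, pi/2]; real implementations add range reduction.
    static double sin(double x) {
        int i = (int) (x / STEP);
        if (i >= N)
            return TABLE[N];
        double frac = (x - i * STEP) / STEP;  // position between entries i and i+1
        return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i]);
    }

    public static void main(String[] args) {
        // With a 1024-entry table, linear interpolation of sine is
        // accurate to well under 1e-6 on this range.
        System.out.println(Math.abs(sin(1.0) - Math.sin(1.0)) < 1e-6);  // true
    }
}
```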
7.5 Recursion and Stacks