      for (int j = 0; j < CAPACITY; j++)
        l.set(j, Boolean.TRUE);
    System.out.println(ltype + " took " + (System.currentTimeMillis() - time));
  }
}
The normalized results from running this test are shown in Table 10-3.

Table 10-3, Timings of the Various Array-Manipulation Tests, Normalized to the JDK 1.2 Vector Test

                    1.2   1.2 no JIT   1.3      HotSpot 1.0   HotSpot 2nd Run
Vector              100   179          25 [2]   64            32
ArrayList           23    382 [3]      22       17            24
Wrapped ArrayList   170   797          36       72            39
[2] The 1.3 VM manages to execute the initial Vector test slightly faster than the ArrayList. But unfortunately, the VM then appears to unoptimize the Vector test, making all subsequent test runs slower.

[3] I have no idea why the non-JIT VM runs the ArrayList slower. The ArrayList methods are defined with slightly more testing, but I wouldn't have thought there was enough to make such a difference.
There are some reports that the latest VMs have negligible overheads for synchronized methods; however, my own tests show that synchronized methods continue to incur significant overheads in VMs up to and including JDK 1.2. HotSpot has at times shown different behavior. My tests using HotSpot show that synchronized methods can sometimes be optimized to run faster than unsynchronized versions. However, by varying the order and type of tests, it becomes clear that HotSpot is very inconsistent in its optimizations. This variation can exist for a number of different reasons: profiler overheads, the aggressive compiler cutting in, occasionally necessary deoptimizations, etc. The variability in this particular test probably comes from speculatively inlining methods and sometimes having to undo the speculative inline. This can result in tests where a synchronized method apparently gets optimized more effectively than a nonsynchronized method.
The results from running the ListTesting class (just defined) in a HotSpot VM show how difficult it can be to get consistent results from HotSpot. For my test results, I take the first three and next three results, but I also find that altering the order of the tests can make a big difference to the times:
Vector took 5548
sync ArrayList took 6239
ArrayList took 1472
Vector took 2734
ArrayList took 2103
sync ArrayList took 3385
Vector took 7811
ArrayList took 6469
sync ArrayList took 3696
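The kind of timing loop that produces output like this can be sketched as follows. This is only a rough illustration, not the ListTesting class itself: the class name ListTimingSketch, the CAPACITY and REPEAT values, and the use of Collections.synchronizedList for the "sync ArrayList" case are all assumptions, and the sketch uses generics, which were not available in the JDK versions measured here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;

public class ListTimingSketch {
    // Hypothetical sizes, chosen only so the sketch runs quickly.
    public static final int CAPACITY = 100_000;
    public static final int REPEAT = 50;

    // Pre-fill the list so that set() has valid indexes to overwrite.
    public static List<Boolean> fill(List<Boolean> l) {
        for (int j = 0; j < CAPACITY; j++)
            l.add(Boolean.FALSE);
        return l;
    }

    // Time REPEAT passes of set() over every index; returns elapsed milliseconds.
    public static long time(List<Boolean> l, String ltype) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < REPEAT; i++)
            for (int j = 0; j < CAPACITY; j++)
                l.set(j, Boolean.TRUE);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(ltype + " took " + elapsed);
        return elapsed;
    }

    public static void main(String[] args) {
        // Run each test more than once: first-run and later-run times can differ.
        for (int run = 0; run < 2; run++) {
            time(fill(new Vector<Boolean>(CAPACITY)), "Vector");
            time(fill(new ArrayList<Boolean>(CAPACITY)), "ArrayList");
            time(fill(Collections.synchronizedList(new ArrayList<Boolean>(CAPACITY))),
                 "sync ArrayList");
        }
    }
}
```

Reordering the three calls inside the loop is an easy way to check whether the ordering effect described above shows up on a given VM.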
10.4.2 Avoiding Serialized Execution
One way of completely avoiding the requirement to synchronize methods is to use separate objects and storage structures for different threads. Care must be taken to avoid calling synchronized methods from your own methods, or you will lose all your carefully built benefits. For example, Hashtable access and update methods are synchronized, so using one in your storage structure can eliminate any desired benefit. Prior to JDK 1.2, there is no unsynchronized hash table in the JDK, and you have to build or buy your own unsynchronized version. From JDK 1.2, unsynchronized collection classes are available, including Map classes.

As an example of implementing this framework, I look at a simple set of global counters, keyed on a numeric identifier. Basically, the concept is a global counter to which any thread can add a number. This concept is extended slightly to allow for multiple counters, each counter having a different key. String keys are more useful, but for simplicity I use integer keys in this example. To use String keys, an unsynchronized Map replaces the arrays. The simple, straightforward version of the class looks like this:
package tuning.threads;

public class Counter1
{
  //For simplicity make just 10 counters
  static long[] vec = new long[10];
  public static void initialize(int key)
  {
    vec[key] = 0;
  }
  //And also just make key the index into the array
  public static void addAmount(int key, long amount)
  {
    //This is not atomically synchronized since we do an array access
    //together with an update, which are two operations.
    vec[key] += amount;
  }
  public static long getAmount(int key)
  {
    return vec[key];
  }
}
This class is basic and easy to understand. Unfortunately, it is not thread-safe, and leads to corrupt counter values when used. A test run on a particular single-CPU configuration with four threads running simultaneously, each adding the number 1 to the same key 10 million times, gives a final counter value of around 26 million instead of the correct 40 million.[4] On the positive side, the test is blazingly fast, taking very little time to complete and get the wrong answer.

[4] The results discussed are for one particular test run. On other test runs, the final value is different, but it is almost never the correct 40 million value. If I use a faster CPU or a lower total count, each thread can finish quickly enough that the operating system effectively runs the threads one after another, leading to consistently correct results for the total count. But those correct results are an artifact of the environment, and are not guaranteed to be produced. Other system loads and environments generate corrupt values.
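The lost-update problem can be reproduced with a small driver along the following lines. This is a minimal sketch, not the test harness used for the measurements: the class name LostUpdateSketch and the run method are hypothetical, and the counter logic is copied inline so that the example is self-contained.

```java
public class LostUpdateSketch {
    // Same storage scheme as the unsynchronized counter: one slot per key.
    static long[] vec = new long[10];

    // Unsynchronized update: read, add, and write back are separate steps,
    // so two threads can read the same old value and one update is lost.
    static void addAmount(int key, long amount) {
        vec[key] += amount;
    }

    // Start the given number of threads, each adding 1 to key 0
    // 'iterations' times, and return the final counter value.
    public static long run(int threads, final int iterations) {
        vec[0] = 0;
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(new Runnable() {
                public void run() {
                    for (int n = 0; n < iterations; n++)
                        addAmount(0, 1);
                }
            });
        }
        for (Thread t : ts)
            t.start();
        try {
            for (Thread t : ts)
                t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return vec[0];
    }

    public static void main(String[] args) {
        long total = run(4, 10_000_000);
        System.out.println("final value " + total
            + ", expected " + (4L * 10_000_000));
    }
}
```

The final value varies from run to run and from machine to machine. A run that happens to print the expected total does not show that the code is safe.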
To get the correct behavior, you need to synchronize the update methods in the class. Here is Counter2, which is just Counter1 with the methods synchronized:
package tuning.threads;

public class Counter2
{
  //For simplicity make just 10 counters
  static long[] vec = new long[10];
  public static synchronized void initialize(int key)
  {
    vec[key] = 0;
  }
  //And also just make key the index into the array
  public static synchronized void addAmount(int key, long amount)
  {
    //Now the method is synchronized, so we will always
    //complete any particular update
    vec[key] += amount;
  }
  public static synchronized long getAmount(int key)
  {
    return vec[key];
  }
}
Now you get the correct answer of 40 million for the same test as before. Unfortunately, the test takes 20 times longer to execute (see Table 10-4). Avoiding the synchronization is going to be more work. To do this, create a set of counters, one for each thread, and update each thread's counter separately.[5] When you want to see the global total, you need to sum the counters across the threads. The class definition follows:

[5] Although ThreadLocal variables might seem ideal to ensure the allocation of different counters for different threads, they are of no use here. The underlying implementation for ThreadLocal objects uses a synchronized map to allocate per-thread objects, and that defeats the intention to avoid synchronization completely.
package tuning.threads;

public class Counter3
{
  //support up to 10 threads of 10 counters
  static long vec[][] = new long[10][];
  public static synchronized void initialize(CounterTest t)
  {
    //For simplicity make just 10 counters per thread
    vec[t.num] = new long[10];
  }
  public static void addAmount(int key, long amount)
  {
    //Use our own threads to make the mapping easier,
    //and to illustrate the technique of customizing threads.
    //For generic Thread objects, could use an unsynchronized
    //HashMap or other Map,
    //Or use ThreadLocal if JDK 1.2 is available
    //We use the num instance variable of the CounterTest
    //object to determine which array we are going to increment.
    //Since each thread is different, there is no conflict.
    //Each thread updates its own counter.
    long[] arr = vec[((CounterTest) Thread.currentThread()).num];
    arr[key] += amount;
  }
  public static synchronized long getAmount(int key)
  {
    //The current amount must be aggregated across the thread
    //storage arrays. This needs to be synchronized, but
    //does not matter here as I just call it at the end.
    long amount = 0;
    for (int threadnum = vec.length-1; threadnum >= 0 ; threadnum--)
    {
      long[] arr = vec[threadnum];
      if (arr != null)
        amount += arr[key];
    }
    return amount;
  }
}
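The per-thread technique depends on a thread class that carries its own index, which is the role the num field of CounterTest plays here. As a rough, self-contained illustration of how the pieces fit together, the following sketch uses a hypothetical CounterThread class in place of CounterTest and inlines the counter methods; the class names and the run method are assumptions made for this example only.

```java
public class PerThreadCounterSketch {
    // Hypothetical stand-in for the CounterTest thread class: each thread
    // carries the index of its own storage array in 'num'.
    static class CounterThread extends Thread {
        final int num;
        final int iterations;

        CounterThread(int num, int iterations) {
            this.num = num;
            this.iterations = iterations;
        }

        public void run() {
            for (int n = 0; n < iterations; n++)
                addAmount(0, 1);
        }
    }

    // Up to 10 threads, each with its own array of 10 counters.
    static long[][] vec = new long[10][];

    // Unsynchronized: each thread writes only to its own array.
    static void addAmount(int key, long amount) {
        long[] arr = vec[((CounterThread) Thread.currentThread()).num];
        arr[key] += amount;
    }

    // Sum one key across all per-thread arrays.
    static synchronized long getAmount(int key) {
        long amount = 0;
        for (long[] arr : vec)
            if (arr != null)
                amount += arr[key];
        return amount;
    }

    // Run 'threads' (at most 10) counter threads and return the global total.
    public static long run(int threads, int iterations) {
        vec = new long[10][];
        CounterThread[] ts = new CounterThread[threads];
        for (int i = 0; i < threads; i++) {
            vec[i] = new long[10];
            ts[i] = new CounterThread(i, iterations);
        }
        for (Thread t : ts)
            t.start();
        try {
            // join() ensures each thread's updates are visible before summing.
            for (Thread t : ts)
                t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return getAmount(0);
    }

    public static void main(String[] args) {
        System.out.println("global total " + run(4, 10_000_000));
    }
}
```

Because the total is read only after every thread has been joined, the sum is reliable even though addAmount is unsynchronized. Reading the total while the threads are still running would give only an approximate snapshot.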
Using Counter3, you get the correct answer for the global counter, and the test is quicker than Counter2. The relative timings for a range of VMs are listed in Table 10-4.

Table 10-4, Timings of the Various Counter Tests, Normalized to the JDK 1.2 Counter2 Test
1.2 1.2 no JIT