282 D.J. Butler J. of Economic Behavior Org. 41 2000 277–297
the outcome for contiguous events, rather than one large event, even though the probability of the consequence being realised is unchanged (see Fig. 3). Such effects are inconsistent
with all choice theories derived from a compensatory choice process, though they are consistent with the non-compensatory ‘majority of confirming decisions’ (MCD) rule in the
choice problems of interest to us. They are in themselves highly suggestive of incomplete preferences. A test for ESEs is designed into the experiment.
3. An exploratory experiment and results
3.1. The experiment
I designed a computerised experiment to shed some light on the hypotheses. The experiment was written in Borland C++ version 3.0. An important innovation in the design was the attempt to observe the strength of an individual’s preference in each of the pair-wise choices, using a graphical ‘strength of preference’ indicator. This indicator has strong similarities to the ‘confidence’ indicator used in Butler and Loomes (1988). I assume a clear preference to exist when a high number (≥7) is selected on the preference indicator. Two pairs of lotteries were employed, which are termed the low and high gap lotteries. The set of lotteries used in the 16 questions is detailed in Fig. 1.
The testing of H1 and H2 used a low gap lottery pair that offered a 30 percent chance of 44 and 70 percent chance of 0, or a 70 percent chance of 16 and 30 percent chance of 0.
The GEUs of these are assumed to be close because the EV of the high-variance risky lottery exceeds the EV of the risk-averse lottery. If risk aversion is a typical characteristic
of clear utility functions, this combination should for many people lead to reasonably equal utilities of the lotteries presented. The high gap lottery pair used to test H1 and H2 offered a
40 percent chance of 33 and 60 percent chance of 0, or a 60 percent chance of 26 and a 40 percent chance of 0. This time the EV of the risk-averse lottery exceeds that of the risky
lottery, which should reinforce any natural tendency toward risk-aversion. The information and practice question screens used in the experiment are illustrated in Fig. 2.
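The expected values behind the two lottery pairs can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the dictionary layout and function names are illustrative assumptions, not part of the original Borland C++ program. It computes only expected values, not the GEU gap itself.

```python
# Lottery pairs as described in the text. Each lottery is a list of
# (probability, payoff) branches. Names are hypothetical.
LOW_GAP = {
    "risky": [(0.30, 44), (0.70, 0)],
    "risk_averse": [(0.70, 16), (0.30, 0)],
}
HIGH_GAP = {
    "risky": [(0.40, 33), (0.60, 0)],
    "risk_averse": [(0.60, 26), (0.40, 0)],
}


def expected_value(lottery):
    """Probability-weighted mean payoff of a lottery."""
    return sum(p * x for p, x in lottery)


def ev_difference(pair):
    """EV of the risk-averse lottery minus EV of the risky lottery."""
    return expected_value(pair["risk_averse"]) - expected_value(pair["risky"])


for name, pair in (("low gap", LOW_GAP), ("high gap", HIGH_GAP)):
    print(
        name,
        round(expected_value(pair["risky"]), 2),
        round(expected_value(pair["risk_averse"]), 2),
        round(ev_difference(pair), 2),
    )
```

In the low gap pair the risky lottery has the higher expected value (13.2 against 11.2), while in the high gap pair the risk-averse lottery does (15.6 against 13.2), which matches the description above.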
Ninety-four students from the University of Western Australia were recruited by cross-campus advertising. A total of 16 experimental sessions were held. The experimenter assigned subjects to a terminal and gave a brief description of the experiment and the payment mechanism. As part of this explanation, subjects completed three practice questions, one from each display that they would encounter. The experimenter ensured that all subjects understood how to use the preference indicator before the experiment proper began.
There were three very different displays in which the gamble pairs were presented. Examples of these are illustrated in Fig. 2, and will be referred to here as the pie, strip, and bar displays. The pie and strip displays have often been used in choice experiments; the bar display has not. Subjects had to select either the risky or risk-averse option. Subjects may have had more difficulty forming a preference using the bar display in this experiment, as the value of the outcomes was not explicitly stated. As the lottery outcomes were stated clearly in the strip and pie displays, this suggests the bar display used here was fundamentally different. If so, then we cannot be certain whether to attribute differences in choice behaviour to some inherent property of the bar display or to the ease with which the outcomes were discerned. A slightly different design could address this issue.[7]
Fig. 1. Details of the choice problems. A total of 18 sub-groups used the payoff values for u, v and probabilities for A, B, C listed above. The question numbers, types and displays indicated are those for one sub-group. For examples of question type, see Fig. 3a, b and c. Questions 7–10 were used for the monotonicity test. The hypotheses were explored using the high and low gap lottery pairs in questions 1–6 and 11–16.
The three display groups were then split so that half had the labels ‘A’ and ‘B’ on the lotteries reversed, and finally the resulting six groups were each broken into three different
question orderings. In total this gave 18 subgroups; subjects were randomly assigned to these subgroups over the various sessions. The purpose of the labelling and question-order
subgroups was to minimise other possible sources of bias. Camerer (1989) found that choice reversal rates can depend directly on the gap between the first encounter and the second,
although the reversal rate stabilises if the repetition is separated by a gap of more than five other choice problems. Consequently, all pure repeats in this experiment were kept six
choices apart to lessen the memory factor in lowering reversal rates.
Each subject’s response to a question was converted into a score from 1 to 18. On this scale, a choice of gamble ‘A’ coupled with a maximum preference score of 9 was assigned
an extreme value of 1. A choice of gamble ‘B’ coupled with a maximum preference of 9 was assigned the opposite extreme value of 18, thereby creating a continuum stretching from 1
to 18 denoting decreasing preference for ‘A’. Hence, a subject choosing ‘A’ combined with a preference strength of just 3 would have a score of 7 for that question; a subject choosing
‘B’ with a preference strength of 7 would have a score of 16 assigned. This method allows both the subject’s choice and strength of preference for a question to be combined into a single indicator.
[7] Simply print the dollar value on each bar.
Fig. 2. Experimental displays used.
Fig. 3. The juxtapositioning of lottery consequences: strip display. Other displays used equivalent transformations. Ref Figs. 1 and 2.
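The conversion to the 18-point scale can be written as a small function. This is a sketch only: the formula is inferred from the four worked examples in the text (‘A’ at 9 gives 1, ‘A’ at 3 gives 7, ‘B’ at 7 gives 16, ‘B’ at 9 gives 18), and the function name is hypothetical.

```python
def preference_score(choice, strength):
    """Map a choice ('A' or 'B') and a 1-9 preference strength to the
    18-point scale, where 1 is the strongest preference for 'A' and 18
    the strongest preference for 'B'.

    Assumed formula, inferred from the examples in the text:
    'A' -> 10 - strength, 'B' -> 9 + strength.
    """
    if choice not in ("A", "B"):
        raise ValueError("choice must be 'A' or 'B'")
    if not 1 <= strength <= 9:
        raise ValueError("strength must be between 1 and 9")
    return 10 - strength if choice == "A" else 9 + strength


print(preference_score("A", 3))  # weak preference for 'A'
print(preference_score("B", 7))  # clear preference for 'B'
```

Under this mapping the weakest ‘A’ response (score 9) and the weakest ‘B’ response (score 10) sit next to each other in the middle of the scale.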
There are two possible ways of recording ‘errors’ in this design. First, like other researchers, I focus on those cases where the choice between ‘A’ and ‘B’ reverses. I will refer to this as the ‘standard’ measure. Assuming subjects who used a 7, 8 or 9 on the 9-point preference scale had a clear preference, the dependence of choice reversals (errors) on preference clarity could be investigated. The indicator was designed with this definition in mind (see Fig. 2). Additionally, we can look to see the proportion of subjects that made a change of 9 or more points on the 18-point preference scale. We will call this a ‘9+ reversal rate’. Where relevant, both sets of results will be reported; the latter definition is clearly the more stringent, though not directly comparable to results of previous research.
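The two reversal measures and the clarity threshold can be expressed as predicates on a pair of responses. The sketch below assumes each response is a (choice, strength) tuple and reuses the inferred 18-point mapping; all names are hypothetical.

```python
CLEAR_THRESHOLD = 7  # strengths 7, 8 or 9 count as a clear preference


def score(choice, strength):
    """Assumed 18-point mapping: 1 = strongest 'A', 18 = strongest 'B'."""
    return 10 - strength if choice == "A" else 9 + strength


def is_standard_reversal(first, second):
    """Standard measure: the chosen lottery differs between the two responses."""
    return first[0] != second[0]


def is_nine_plus_reversal(first, second):
    """9+ measure: the score moves by 9 or more points on the 18-point scale."""
    return abs(score(*first) - score(*second)) >= 9


def both_clear(first, second):
    """True when a clear preference was reported on both occasions."""
    return first[1] >= CLEAR_THRESHOLD and second[1] >= CLEAR_THRESHOLD


# A weak 'A' followed by a weak 'B' is a standard reversal but not a 9+ one.
print(is_standard_reversal(("A", 2), ("B", 2)), is_nine_plus_reversal(("A", 2), ("B", 2)))
```

Because each side of the scale spans only nine values, a move of 9 or more points always crosses from one lottery to the other, so under this mapping every 9+ reversal is also a standard reversal. This is one way to see why the second definition is the more stringent.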
Each subject also faced pairs of choices that:
1. controlled for displays and event-splitting effects but changed the juxtaposition of consequences;
2. controlled for displays and juxtapositions but changed the number of event splits;
3. controlled for displays but allowed the juxtaposition and the number of event splits to change simultaneously.
This design (see Fig. 3) permits the display-dependence of regret and event-splitting effects to be investigated.
Table 1. Proportions of risky choices, by display and clarity
Display    Risky (%)   Risk averse (%)   Total   % of risky on unclear   % of risk averse on unclear   Overall average preference strength
High gap lottery pair [a]
Strip         4.3          95.6           184            75.0                       18.2                            7.4
Pie           8.8          91.2           192            94.1                       30.3                            6.76
Bar          15.5          84.5           180            78.6                       48.7                            6.27
Average       9.5          90.5           556            83.0                       31.6                            6.81
Low gap lottery pair [b]
Strip        42.4          57.6           186            72.1                       57.0                            5.83
Pie          31.7          68.3           180            75.4                       55.3                            6.12
Bar          42.7          57.3           178            68.4                       57.8                            5.98
Average      39.0          61.0           544            71.7                       56.6                            5.97
[a] $\chi^2 = 13.46 > 5.99$, therefore, the proportion of risky choices is display dependent; $\chi^2 = 2.23 < 5.99$, therefore, the proportion of risky choices on unclear preferences is not display dependent; $\chi^2 = 35.3 > 5.99$, therefore, the proportion of risk averse choices on unclear preferences is display dependent.
[b] $\chi^2 = 6.03 > 5.99$, therefore, the proportion of risky choices is display dependent; $\chi^2 = 0.80 < 5.99$, therefore, the proportion of risky choices on unclear preferences is not display dependent; $\chi^2 = 0.16 < 5.99$, therefore, the proportion of risk averse choices on unclear preferences is not display dependent.
Finally, subjects were told that payment would depend on one of their chosen gambles being played out for real, and that payments could range from zero to A$50. At the end of the experiment a random device selected one of the 16 questions the subject faced. The lottery was then played for real by allowing the subject to draw a lottery ticket numbered from 1 to 100 from a box, and payment occurred in accordance with their lottery choice for that ticket. Overall, around half of the subjects won money, the average sum being A$26 for less than an hour of their time.
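The payment mechanism described above can be simulated in a few lines. This is a hedged sketch, not the procedure's actual implementation: the text does not say which ticket numbers correspond to which outcome, so the assumption here is that the first branch of a lottery covers the lowest ticket numbers. The function name and data layout are hypothetical.

```python
import random


def play_for_real(chosen_lotteries, rng):
    """Select one question at random and play the subject's chosen lottery.

    chosen_lotteries: one lottery per question, each a list of
    (percent, payoff) branches whose percentages sum to 100.
    Assumption: branches are mapped to tickets 1..100 in listed order.
    Returns (question_index, ticket, payoff).
    """
    question = rng.randrange(len(chosen_lotteries))  # the random device
    ticket = rng.randint(1, 100)                     # ticket drawn from the box
    upper = 0
    for percent, payoff in chosen_lotteries[question]:
        upper += percent
        if ticket <= upper:
            return question, ticket, payoff
    raise ValueError("branch percentages must sum to 100")


# Example: a subject who chose the low gap risky lottery on all 16 questions.
choices = [[(30, 44), (70, 0)] for _ in range(16)]
print(play_for_real(choices, random.Random(0)))
```

Because only one of the 16 questions is paid out, each individual choice can be treated as if it were the only one, provided subjects do not reduce the compound lottery.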
The experimental design follows the incentive-compatible method developed by Becker et al. (1964), but it is not completely uncontroversial (Machina, 1989). This is because a subject’s answers to any one choice problem need to be independent of their responses to any other choice problems in the experiment. If not, they may reduce what then becomes a two-stage lottery to a one-stage lottery, thereby undermining the incentive structure. However, Starmer and Sugden (1991) provide strong evidence against this reduction process in experiments of this type. I shall follow the bulk of the literature in assuming that the Becker, DeGroot, Marschak method is basically sound.
3.2. Experimental results
Hypotheses 1a and b, and Hypothesis 2b can be investigated with reference to Table 1. As six otherwise identical questions were asked in each of the three displays for both the high and low gap pair, we can compare the average confidence scores for each display for these questions. It is clear that preferences in the low gap lottery pair (Table 1, low gap panel) are much hazier than for the high gap pair (Table 1, high gap panel), as H1a proposed. Table 1 also shows a near perfect negative rank correlation between the average preference strength for the question and the degree of pro-risk behaviour, as H1b implies. The great majority of risky choices occurred on unclear preferences, far higher than the proportion for risk-averse choices, especially for the high gap pair. Note that in general there is no additional display effect on the proportion of risky choices on unclear preferences. This is to be expected, as the display effect operates by changing the ease of preference formation, and as a consequence the proportions of risky choices.[8] If GEU models are not core preference theories, this suggests that use of a low gap lottery pair has the potential to be more conducive to any choice switches, be they errors, GEU effects or Event-Splitting Effects, than would a high gap pair. After all, if no risky choices are made at all, then all choice-switching rates must be zero.
Table 2. Strength of preference by presentational display [a]
Percentage of subjects indicating clear choices both times the same lottery pair was presented [b]
Display    Clear (%)   Total
Pie          46.1       102
Bar          31.1        90
Strip        63.7       102
Average      47.3       294
[a] Note: The 294 identical repeats comprise three repeats for each of the 94 subjects plus an additional unplanned repeat for 12 subjects. The latter was due to a typographical error in a question shown to some sub-groups, which also explains why some of the other totals reported may not be divisible by 94.
[b] $\chi^2 = 19.3 > \chi^2_{.05,2} = 5.99$, therefore, the display affects the proportion of clear preferences.
Hypothesis 2 focuses directly on the core preferences issue. Table 2 shows the clarity of preferences by display, for the questions repeated identically, all using the high gap lottery
pair. These results suggest that the proportion of subjects with clear preferences did indeed vary by display, as Table 1 also suggested. There is a clear ranking of the displays used,
with the strip display promoting the clearest preferences. These findings are consistent with H2a.
Given the success of H2a, it is not surprising that Hypothesis 2b on the frequency of risky choices is also supported, as can be seen in Table 1. These results suggest we may
find support for our explanation of choice errors and GEU effects, as hypothesised in H2c. Choice reversals should occur predominantly for those choices where subjects are least
sure of their preference, rather than being spread across preference levels as Harless and Camerer’s (1994) ‘white noise’ theory would imply. The breakdown of the 294 questions that were repeated identically is as follows (not shown in Table 2). Of the 139 times that a confident choice is made on both the original and repeat question, there are only three choice-reversals, giving a reversal rate of 2.1 percent. Of the remaining 155 times where a subject is less confident of their choice on one or both occasions, there are 32
choice reversals, giving a reversal rate of 20.6 percent. This difference is easily significant: $\chi^2 = 23.9 > \chi^2_{.05,1} = 3.84$. Overall, there are 35 reversals, representing 11.9 percent of the total 294 questions.
[8] The puzzling exception (see Table 1) involved risk-averse choices in the high gap pair.
Table 3. Choice reversals by display [a]
            Standard measure           9+ measure
Display    Reversals (%)   Total     Reversals (%)   Total
Pie            14.7         102           8.8         102
Bar            15.5          90          14.4          90
Strip           5.9         102           4.9         102
Total          11.9         294           9.2         294
[a] $\chi^2 = 5.42 < \chi^2_{.05,2} = 5.99$, therefore, choice reversals narrowly fail to be display dependent at the 5 percent level.
For the same 294 repetitions, the 35 reversals were spread across displays as shown in Table 3. Given the relatively small number of reversals, these display differences narrowly fail to be significant at the 5 percent level, although they are easily significant at the 10 percent level. However, the ranking of choice reversal rates by display is consistent with Hypothesis 2: the strip display has the largest proportion of clear preferences and the fewest reversals, the bar display the least clear preferences and the most reversals. These results show a lower reversal rate than those found by others, for example, Hey and Orme (1994). One plausible explanation is that we used only the high gap lottery pair, whereas Hey and Orme’s choice pairs generally involved lottery pairs with more similar expected values.
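The chi-square statistics reported for the reversal counts can be reproduced with a plain Pearson test on a table of counts. The sketch below is an illustration under stated assumptions: the clarity counts (3 of 139 and 32 of 155) come directly from the text, whereas the per-display counts (15, 14 and 6 reversals) are inferred here from the percentages and totals in Table 3, so the second statistic may differ slightly from the published 5.42. No continuity correction is applied, which is an assumption about how the paper's figures were computed.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and degrees
    of freedom for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Rows: clear on both occasions, unclear on at least one.
# Columns: reversals, non-reversals (counts taken from the text).
by_clarity = [[3, 139 - 3], [32, 155 - 32]]

# Rows: pie, bar, strip. Reversal counts inferred from Table 3 percentages.
by_display = [[15, 102 - 15], [14, 90 - 14], [6, 102 - 6]]

for label, table in (("clarity", by_clarity), ("display", by_display)):
    stat, dof = pearson_chi_square(table)
    print(label, round(stat, 2), dof)
```

The clarity statistic comes out at roughly 23.9 on one degree of freedom, in line with the value quoted in the text; the display statistic comes out close to the value in the note to Table 3, below the 5.99 critical value.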
The experiment also offered precisely the same gamble to a subject twice, but in different displays each time. The findings for the three display groups, all of which used the low gap lottery pair, are shown in Table 4. The overall rate of 33.8 percent is well in excess of the 11.9 percent reported previously for the high gap pair when the display was unchanged. Using the 18-point scale, exactly the same score was chosen by just 25.4 percent (65/256) for choices repeated in different displays. This compares with 42.5 percent (114/268) when the choice was repeated within a display on the high gap pair.
The dramatic difference in choice reversal rates between Tables 3 and 4 is explained by both the use of the low gap rather than high gap lottery pair in Table 4, and the trans-display comparison. It is not possible to separate the impacts of the display-change and GEU-gap change in this particular study; however, that now seems a worthwhile question for future work.
Continuing with Hypothesis 2, we now look at the choice-rule effects. There are three plausible ways of measuring such effects in this experiment. First, there is the traditional count of choice switches when the lotteries’ consequences are juxtaposed, and/or the number of events split. Secondly, the number of 9+ switches could be counted. Thirdly, the number of shifts in the direction predicted by these theories on the 18-point scale can be calculated. Each of these scores can then be contrasted with the number of moves in the opposite direction.
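One way to make the three measures concrete is to label each before/after pair of 18-point scores as predicted (P), unpredicted (U) or neither (N) under each measure. This is a sketch under assumptions: the scale runs from 1 (strongest ‘A’) to 18 (strongest ‘B’), the lottery favoured by the theory after the manipulation is passed in explicitly, and the function name is hypothetical.

```python
def classify_shift(score_before, score_after, predicted="B"):
    """Label a change on the 18-point scale as 'P', 'U' or 'N' under the
    standard, 9+ and 18-point measures.

    predicted: the lottery ('A' or 'B') that the theory says should become
    more attractive after the manipulation (an input assumed here).
    """
    sign = 1 if predicted == "B" else -1
    move = sign * (score_after - score_before)  # > 0 means predicted direction
    switched = (score_before <= 9) != (score_after <= 9)  # choice changed

    def label(counts_as_change):
        if not counts_as_change or move == 0:
            return "N"
        return "P" if move > 0 else "U"

    return {
        "standard": label(switched),
        "nine_plus": label(abs(move) >= 9),
        "eighteen_point": label(move != 0),
    }


# A moderate 'A' (score 7) becoming a weak 'B' (score 12):
print(classify_shift(7, 12, predicted="B"))
```

Counting the P and U labels across subjects then gives the numbers that each measure contrasts.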
Table 4. Choice reversals as display changes [a]
                  Standard measure           9+ measure
Display pairs    Reversals (%)   Total     Reversals (%)   Total
Strip and bar        38.4          86          24.4          86
Bar and pie          31.7          80          15.0          80
Strip and pie        31.2          90          14.4          90
Total                33.8         256          18.0         256
[a] $\chi^2 = 1.3 < \chi^2_{.05,2} = 5.99$, therefore, no display-pair differences in reversal rates.
Table 5. Combined juxtaposition/event-splitting effects (i) [a]
Number of subjects changing choices following the above change, by clarity
           Predicted (%)   Unpredicted (%)   Neither (%)   Total
Clear           5.1              1.7             93.2        117
Unclear        19.4              6.1             74.5        247
Total          14.8              4.7             80.5        364
[a] $\chi^2 = 17.63 > \chi^2_{.05,2} = 5.99$, therefore, combined juxtaposition effects are concentrated on unclear preferences.
The first general result is that, overall, the frequency of combined juxtaposition/event-splitting, pure regret and pure event-splitting effects is rather low in comparison with other studies. Again, the difference in expected values of the lotteries was somewhat higher than in other experiments; this should not have prevented movements in the predicted direction on the 18-point scale though. Half of the comparisons used the high gap pair and half the low gap pair, for each test and each display. Tables 5 and 6 give the results for combined juxtaposition effects (i.e. a regret test without standardising for event-splitting effects; compare problems a and c in Fig. 3). If the preference construction explanation of GEU effects is correct, we would need to see the great majority occur on choices where preferences were unclear. This is what we find. However, it is important to state that although the evidence is consistent with that view, by themselves the results in Table 5 do not constitute evidence against the core theory alternative. This is because a person may have GEU preferences without producing a GEU ‘effect’ if the utilities of the lotteries are sufficiently different. If so, then such a person would report clear preferences and their choice would appear to be consistent with EU, despite their true preferences being GEU. Of course, the greater the evidence consistent with the preference constructing view, the more the onus shifts to advocates of GEU models to demonstrate why their interpretation of this is the more plausible.
Table 6 investigates the display dependence of these effects. The best display to observe them is the pie display. Although the display differences are not significant at the 5 percent level, they are so at the 10 percent level. Using the 18-point scale, of the 61.5 percent of movements, 67 percent favour the predicted direction overall, comprising a 60 percent share in the bar display, 63 percent in the strip display, and an overwhelming 79 percent in the pie display. Taken together these results for combined effects suggest that the juxtaposition phenomenon may be a milder, but more widespread phenomenon than has previously been realised.
Tables 7 and 8 look at the pure-regret effects (i.e. juxtaposition effects after standardising for event splits). Again, Table 7 shows that observed effects are several times more likely to occur on unclear preferences. By display, the presentations this time do have a statistically significant impact on these effects. The predicted effects are concentrated in the bar display. Of the 58 percent who move on the 18-point scale under a pure regret test, 59 percent move in the predicted direction. These results suggest that the regret effect is less pronounced than the juxtaposition effect, and is more concentrated in the bar display. The regret effect does not appear to be strong, though the asymmetry on the 18-point scale suggests it may be a genuine, though very mild, factor in some people’s decisions.
Tables 9 and 10 show the results for pure event-splitting effects. Using the conventional choice count measure, Table 9 shows event-splitting effects are also concentrated where preferences were hazy. The displays, however, are not the cause of a statistically significant difference in these effects, though a larger experiment may have shown them to be. The pie display is most suited to these effects, and the bar display the least suited. Using the more general test of the number of shifts of any magnitude on the 18-point scale, we find that for event-splitting effects, predicted shifts comprise 61 percent of the 63 percent of movements; this conceals a 53 percent share in the bar display, and a 66 percent share for the other displays.
Table 6. Combined juxtaposition/event-splitting effects (ii) [a]
Numbers of subjects changing choices following a juxtaposition of consequences by display [b]
           Standard measure (%)     9+ measure (%)          18-point moves (%)
Display     P      U      N         P      U      N         P      U      N        T
Pie        18.5    0.8   80.7      11.3    0.8   87.9      45.2   12.1   42.7     124
Bar        12.1    7.2   80.7       8.9    4.0   87.1      37.1   25.0   37.9     124
Strip      13.8    6.0   80.2       8.6    6.0   85.4      41.4   24.1   34.5     116
Total      14.8    4.7   80.5       9.6    3.6   86.8      41.2   20.3   38.5     364
[a] $\chi^2 = 8.52 < \chi^2_{.05,4} = 9.48$, therefore, combined juxtaposition effects narrowly fail to be display dependent.
[b] P = predicted, U = unpredicted and N = neither.
Table 7. Regret effects (i) [a]
Number of subjects changing choices following a pure juxtaposition of consequences, by clarity
           Predicted (%)   Unpredicted (%)   Neither (%)   Total
Clear           3.4              1.3             95.2        146
Unclear        11.9              8.1             80.0        210
Total           8.4              5.3             86.2        356
[a] $\chi^2 = 16.96 > \chi^2_{.05,2} = 5.99$, therefore, regret effects are concentrated on hazy preferences.
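The predicted-direction shares quoted for the 18-point scale follow from the P and U percentages in the ‘18-point moves’ columns. As a minimal sketch, the combined-effects figures can be recomputed from the Table 6 values; the dictionary and function names are hypothetical.

```python
def predicted_share(predicted_pct, unpredicted_pct):
    """Return (percent of comparisons that moved at all,
    percent of those moves that went in the predicted direction)."""
    movers = predicted_pct + unpredicted_pct
    return movers, 100.0 * predicted_pct / movers


# (P, U) percentages from the 18-point columns of Table 6.
TABLE_6_18_POINT = {
    "Pie": (45.2, 12.1),
    "Bar": (37.1, 25.0),
    "Strip": (41.4, 24.1),
    "Total": (41.2, 20.3),
}

for display, (p, u) in TABLE_6_18_POINT.items():
    movers, share = predicted_share(p, u)
    print(display, round(movers, 1), round(share))
```

The same calculation applied to the totals of Tables 8 and 10 gives the corresponding shares for pure regret and pure event-splitting effects.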
Table 8. Regret effects (ii) [a]
Number of subjects changing choices following a pure juxtaposition of consequences, by display [b]
           Standard measure (%)     9+ measure (%)          18-point moves (%)
Display     P      U      N         P      U      N         P      U      N        T
Pie         5.8    1.7   92.5       3.3    1.7   95.0      38.3   19.2   42.5     120
Bar        13.1    7.9   78.9      10.5    5.3   84.2      36.8   22.8   40.3     114
Strip       6.5    6.5   86.9       4.9    4.1   91.0      27.9   29.5   42.6     122
Total       8.4    5.3   86.2       6.2    3.6   90.2      34.2   23.9   41.8     356
[a] $\chi^2 = 10.5 > \chi^2_{.05,4} = 9.48$, therefore, regret effects are display dependent.
[b] P = predicted, U = unpredicted and N = neither.
Table 9. Event-splitting effects (i) [a]
Number of subjects changing choices following a splitting of events, by preference strength
           Predicted (%)   Unpredicted (%)   Neither (%)   Total
Clear           3.5              2.6             93.8        113
Unclear        19.6              9.4             71.0        235
Total          14.4              7.2             78.4        348
[a] $\chi^2 = 23.4 > \chi^2_{.05,2} = 5.99$, therefore, event splits are concentrated on hazy preferences.
An unexpected result was also found. Four questions (7–10 in Fig. 1) unconnected to testing for choice reversals or for juxtaposition/event-splitting effects were also faced by each subject. Each was designed to be identical to one of the other questions, except for an improvement to either the probability or dollar value of a positive consequence. Assuming event-wise monotonicity, the idea was to build into the experimental design a test of the preference indicator. For instance, if a subject chose gamble ‘A’ over gamble ‘B’ in one question, then faced another question that had improved either the probability of a favourable outcome or the consequence concerned, then ceteris paribus their confidence in having chosen ‘A’ should improve. Similarly, if they had chosen ‘B’ on the first question, their confidence in that choice should now fall, and for some their choice would switch to ‘A’.
The results, however, confounded this plan. Table 11 shows that out of 542 valid comparisons, 402 do not change their choice and 117 others switch their choice in the expected direction. But no fewer than 23, or 4.2 percent, switch their choices in the wrong direction. Of these 23, 16 also switch the wrong way using the 9+ measure. Using the 18-point scale to check whether the preference strength moved in the expected direction (regardless of whether the choices switch or not), there are 315 moves in the predicted direction, but 84 in the opposite direction. Thus only 79 percent of the 399 movements are in favour of monotonicity.
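The monotonicity tallies in this paragraph are internally consistent, as a short check shows. The counts are taken from the text; the variable names are hypothetical.

```python
# Choice-based tally over the valid comparisons (counts from the text).
unchanged, expected_switch, wrong_switch = 402, 117, 23
valid_comparisons = unchanged + expected_switch + wrong_switch

# Share of comparisons that switch against event-wise monotonicity.
wrong_rate = 100.0 * wrong_switch / valid_comparisons

# 18-point scale: moves in the predicted and in the opposite direction.
moves_predicted, moves_opposite = 315, 84
total_moves = moves_predicted + moves_opposite
monotone_share = 100.0 * moves_predicted / total_moves

print(valid_comparisons, round(wrong_rate, 1), total_moves, round(monotone_share))
```

The three choice categories sum to the 542 valid comparisons, and the 18-point moves sum to the 399 movements cited above.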
Table 10. Event-splitting effects (ii) [a]
Number of subjects changing choices following a splitting of events, by display [b]
           Standard measure (%)     9+ measure (%)          18-point moves (%)
Display     P      U      N         P      U      N         P      U      N        T
Pie        18.7    4.5   76.7      12.5    2.7   84.8      42.0   21.4   36.6     112
Bar        11.4   11.4   77.2       8.8    8.8   82.4      33.3   29.8   36.8     114
Strip      13.1    5.7   81.1       8.2    4.9   86.9      41.0   22.1   36.9     122
Total      14.4    7.2   78.4       9.8    5.5   84.7      38.8   24.4   36.8     348
[a] $\chi^2 = 6.8 < \chi^2_{.05,4} = 9.48$, therefore, event splits are not display dependent.
[b] P = predicted, U = unpredicted and N = neither.
Table 11. Monotonicity violations [a]
Percentage of subjects changing choices for the questions listed [b]
                Standard measure (%)     9+ measure (%)
Problem pair     P      U      N         P      U      N       Total
7 and 3/13       5.7    4.0   90.3       5.7    4.0   90.3      176
9 and 3/13      35.7    4.4   59.9      33.0    3.3   63.7      182
8 and 2         25.0    2.2   72.8      23.9    0.0   76.1       92
10 and 2        20.6    6.5   72.9      13.0    3.3   83.7       92
Total           21.6    4.2   74.1      19.2    3.0   77.8      542
[a] Note: Questions 3 and 13 were identical, so have been combined here. Each subject potentially provided six observations, and 542 of the 564 resulting comparisons were valid.
[b] P = predicted, U = unpredicted and N = neither.
No existing normative theory of choice under risk could defend violations of the event-wise monotonicity axiom, but such violations are not inconsistent with the general thrust of this paper. Consider for example the within-display choice-reversal rate of 11.9 percent on the high gap lottery pair reported above. If an individual has unusually poor knowledge of their preferences, it should be possible for a fraction of those people to violate event-wise monotonicity if an unsuitable choice rule is prompted. But even so, it should make economists uneasy that ceteris paribus improvements to the expected value of one gamble on the order of A$2–A$6 should result in a choice switch away from that gamble some 4 percent of the time.[9] It suggests that the haziness and uncertainty surrounding our preferences may be deeper than has hitherto been supposed.
4. Discussion and conclusion