the outcome for contiguous events, rather than one large event, even though the probability of the consequence being realised is unchanged (see Fig. 3). Such effects are inconsistent with all choice theories derived from a compensatory choice process, though they are consistent with the non-compensatory ‘majority of confirming decisions’ (MCD) rule in the choice problems of interest to us. They are in themselves highly suggestive of incomplete preferences. A test for ESEs is designed into the experiment.

D.J. Butler / J. of Economic Behavior & Org. 41 (2000) 277–297

3. An exploratory experiment and results

3.1. The experiment

I designed a computerised experiment to shed some light on the hypotheses. The experiment was written in Borland C++ version 3.0. An important innovation in the design was the attempt to observe the strength of an individual’s preference in each of the pairwise choices, using a graphical ‘strength of preference’ indicator. This indicator has strong similarities to the ‘confidence’ indicator used in Butler and Loomes (1988). I assume a clear preference to exist when a high number (≥7) is selected on the preference indicator.

Two pairs of lotteries were employed, which are termed the low and high gap lotteries. The set of lotteries used in the 16 questions is detailed in Fig. 1. The testing of H1 and H2 used a low gap lottery pair that offered either a 30 percent chance of 44 and a 70 percent chance of 0, or a 70 percent chance of 16 and a 30 percent chance of 0. The GEUs of these two lotteries are assumed to be close because the EV of the high-variance risky lottery exceeds the EV of the risk-averse lottery. If risk aversion is a typical characteristic of clear utility functions, this combination should for many people lead to reasonably equal utilities of the lotteries presented. The high gap lottery pair used to test H1 and H2 offered either a 40 percent chance of 33 and a 60 percent chance of 0, or a 60 percent chance of 26 and a 40 percent chance of 0. This time the EV of the risk-averse lottery exceeds that of the risky lottery, which should reinforce any natural tendency towards risk aversion. The information and practice question screens used in the experiment are illustrated in Fig. 2.

Ninety-four students from the University of Western Australia were recruited by cross-campus advertising. A total of 16 experimental sessions were held. The experimenter assigned subjects to a terminal and gave a brief description of the experiment and the payment mechanism.
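To make the ‘gap’ between the two test pairs concrete, the following minimal sketch computes the expected value and variance of each lottery from the probabilities and payoffs quoted above. The dictionary layout and the function names are illustrative only; payoffs are in the units given in the text.

```python
# Each lottery is a list of (probability, payoff) pairs, as described for the test pairs.
LOTTERY_PAIRS = {
    "low gap": {"risky": [(0.3, 44), (0.7, 0)], "risk-averse": [(0.7, 16), (0.3, 0)]},
    "high gap": {"risky": [(0.4, 33), (0.6, 0)], "risk-averse": [(0.6, 26), (0.4, 0)]},
}


def expected_value(lottery):
    """Probability-weighted mean payoff."""
    return sum(p * x for p, x in lottery)


def variance(lottery):
    """Probability-weighted squared deviation from the expected value."""
    ev = expected_value(lottery)
    return sum(p * (x - ev) ** 2 for p, x in lottery)


for pair, options in LOTTERY_PAIRS.items():
    for name, lottery in options.items():
        print(f"{pair:8s} {name:11s} EV = {expected_value(lottery):5.2f}  "
              f"variance = {variance(lottery):7.2f}")
```

This gives EVs of 13.2 against 11.2 for the low gap pair and 13.2 against 15.6 for the high gap pair, with the risky lottery the higher-variance option in both pairs.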
As part of this explanation, subjects completed three practice questions, one from each display that they would encounter. The experimenter ensured that all subjects understood how to use the preference indicator before the experiment proper began. There were three very different displays in which the gamble pairs were presented. Examples of these are illustrated in Fig. 2, and will be referred to here as the pie, strip and bar displays. The pie and strip displays have often been used in choice experiments; the bar display has not. Subjects had to select either the risky or the risk-averse option.

Subjects may have had more difficulty forming a preference using the bar display in this experiment, as the value of the outcomes was not explicitly stated. As the lottery outcomes were stated clearly in the strip and pie displays, this suggests that the bar display used here was fundamentally different from the other two. If so, we cannot be certain whether to attribute differences in choice behaviour to some inherent property of the bar display or to the ease with which the outcomes were discerned. A slightly different design could address this issue.⁷

Fig. 1. Details of the choice problems. A total of 18 sub-groups used the payoff values for u, v and probabilities for A, B, C listed in the figure. The question numbers, types and displays indicated are those for one sub-group. For examples of question type, see Fig. 3(a), (b) and (c). Questions 7–10 were used for the monotonicity test. The hypotheses were explored using the high and low gap lottery pairs in questions 1–6 and 11–16.

The three display groups were then split so that half had the labels ‘A’ and ‘B’ on the lotteries reversed, and the resulting six groups were each broken into three different question orderings. In total this gave 18 subgroups; subjects were randomly assigned to these subgroups over the various sessions.
The purpose of the labelling and question-order subgroups was to minimise other possible sources of bias. Camerer (1989) found that choice reversal rates can depend directly on the gap between the first encounter and the second, although the reversal rate stabilises if the repetition is separated by more than five other choice problems. Consequently, all pure repeats in this experiment were kept six choices apart, to lessen the effect of memory in lowering reversal rates.

Each subject’s response to a question was converted into a score from 1 to 18. On this scale, a choice of gamble ‘A’ coupled with the maximum preference strength of 9 was assigned the extreme value of 1. A choice of gamble ‘B’ coupled with the maximum preference strength of 9 was assigned the opposite extreme value of 18, creating a continuum from 1 to 18 that denotes decreasing preference for ‘A’. In general, choosing ‘A’ with strength s gives a score of 10 − s, and choosing ‘B’ with strength s gives 9 + s. Hence, a subject choosing ‘A’ with a preference strength of just 3 would have a score of 7 for that question, and a subject choosing ‘B’ with a preference strength of 7 would be assigned a score of 16. This method allows both the subject’s choice and strength of preference for a question to be combined into a single indicator.

⁷ Simply print the dollar value on each bar.

Fig. 2. Experimental displays used.

Fig. 3. The juxtapositioning of lottery consequences: strip display. Other displays used equivalent transformations (cf. Figs. 1 and 2).

There are two possible ways of recording ‘errors’ in this design. First, like other researchers, I focus on those cases where the choice between ‘A’ and ‘B’ reverses. I will refer to this as the ‘standard’ measure.
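The 1–18 scoring rule can be sketched as a small function. This is an illustration only: it assumes the indicator returns integer strengths from 1 to 9, and the names `score` and `is_clear` are hypothetical, not taken from the original Borland C++ software.

```python
CLEAR_THRESHOLD = 7  # strengths of 7, 8 or 9 are treated as a clear preference


def score(choice: str, strength: int) -> int:
    """Map a choice ('A' or 'B') and a 1-9 strength onto the 1-18 scale.

    1 is the strongest preference for 'A'; 18 is the strongest preference for 'B'.
    """
    if choice not in ("A", "B"):
        raise ValueError("choice must be 'A' or 'B'")
    if not 1 <= strength <= 9:
        raise ValueError("strength must be an integer from 1 to 9")
    return 10 - strength if choice == "A" else 9 + strength


def is_clear(strength: int) -> bool:
    """True when the strength indicates a clear preference."""
    return strength >= CLEAR_THRESHOLD


# The worked examples from the text.
print(score("A", 9), score("B", 9), score("A", 3), score("B", 7))  # 1 18 7 16
```

Weak preferences sit next to each other in the middle of the scale (‘A’ at strength 1 scores 9, ‘B’ at strength 1 scores 10), so a choice reversal between two weak preferences moves the score by only one point.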
Assuming that subjects who used a 7, 8 or 9 on the 9-point preference scale had a clear preference, the dependence of choice reversals (errors) on preference clarity can be investigated; the indicator was designed with this definition in mind (see Fig. 2). Second, we can look at the proportion of subjects who made a change of 9 or more points on the 18-point preference scale. We will call this the ‘9+ reversal rate’. Where relevant, both sets of results will be reported; the latter definition is clearly the more stringent, though it is not directly comparable with the results of previous research.

Each subject also faced pairs of choices that:
1. controlled for displays and event-splitting effects but changed the juxtaposition of consequences;
2. controlled for displays and juxtapositions but changed the number of event splits;
3. controlled for displays but allowed the juxtaposition and the number of event splits to change simultaneously.
This design (see Fig. 3) permits the display-dependence of regret and event-splitting effects to be investigated.

Table 1
Proportions of risky choices, by display and clarity
Display   Risky (%)   Risk averse (%)   Total (n)   Risky on unclear (%)   Risk averse on unclear (%)   Average preference strength
High gap lottery pair (a)
Strip     4.3         95.6              184         75.0                   18.2                         7.4
Pie       8.8         91.2              192         94.1                   30.3                         6.76
Bar       15.5        84.5              180         78.6                   48.7                         6.27
Average   9.5         90.5              556         83.0                   31.6                         6.81
Low gap lottery pair (b)
Strip     42.4        57.6              186         72.1                   57.0                         5.83
Pie       31.7        68.3              180         75.4                   55.3                         6.12
Bar       42.7        57.3              178         68.4                   57.8                         5.98
Average   39.0        61.0              544         71.7                   56.6                         5.97
(a) χ² = 13.46 > 5.99; therefore, the proportion of risky choices is display dependent. χ² = 2.23 < 5.99; therefore, the proportion of risky choices on unclear preferences is not display dependent. χ² = 35.3 > 5.99; therefore, the proportion of risk-averse choices on unclear preferences is display dependent.
(b) χ² = 6.03 > 5.99; therefore, the proportion of risky choices is display dependent. χ² = 0.80 < 5.99; therefore, the proportion of risky choices on unclear preferences is not display dependent. χ² = 0.16 < 5.99; therefore, the proportion of risk-averse choices on unclear preferences is not display dependent.

Finally, subjects were told that payment would depend on one of their chosen gambles being played out for real, and that payments could range from zero to A$50. At the end of the experiment a random device selected one of the 16 questions the subject faced. The lottery was then played for real: the subject drew a lottery ticket numbered from 1 to 100 from a box, and was paid according to their lottery choice for that ticket. Overall, around half of the subjects won money, the average sum being A$26 for less than an hour of their time. The experimental design follows the incentive-compatible method developed by Becker et al. (1964), but it is not completely uncontroversial (Machina, 1989). This is because a subject’s answers to any one choice problem need to be independent of their responses to any other choice problem in the experiment. If not, the subject may reduce what then becomes a two-stage lottery to a one-stage lottery, thereby undermining the incentive structure.
However, Starmer and Sugden (1991) provide strong evidence against this reduction process in experiments of this type. I shall follow the bulk of the literature in assuming that the Becker–DeGroot–Marschak method is basically sound.

3.2. Experimental results

Hypotheses 1a and 1b, and Hypothesis 2b, can be investigated with reference to Table 1. As six otherwise identical questions were asked in each of the three displays for both the high and low gap pairs, we can compare the average confidence scores for each display for these questions. It is clear from Table 1 that preferences for the low gap lottery pair are much hazier than for the high gap pair (average preference strength 5.97 against 6.81), as H1a proposed. Table 1 also shows a near-perfect negative rank correlation between the average preference strength for a question and the degree of pro-risk behaviour, as H1b implies. The great majority of risky choices were made on unclear preferences (83.0 percent for the high gap pair and 71.7 percent for the low gap pair), a far higher proportion than for risk-averse choices (31.6 and 56.6 percent), especially for the high gap pair. Note that in general there is no additional display effect on the proportion of risky choices made on unclear preferences. This is to be expected, as the display effect operates by changing the ease of preference formation and, as a consequence, the proportions of risky choices.⁸

Table 2
Strength of preference by presentational display (a)
Percentage of subjects indicating clear choices both times the same lottery pair was presented (b)
Display   Clear (%)   Total
Pie       46.1        102
Bar       31.1        90
Strip     63.7        102
Average   47.3        294
(a) Note: The 294 identical repeats comprise three repeats for each of the 94 subjects plus an additional unplanned repeat for 12 subjects. The latter was due to a typographical error in a question shown to some sub-groups, which also explains why some of the other totals reported may not be divisible by 94.
(b) χ² = 19.3 > χ²(0.05, 2) = 5.99; therefore, the display affects the proportion of clear preferences.

If GEU models are not core preference theories, this suggests that a low gap lottery pair has the potential to be more conducive than a high gap pair to choice switches of any kind, be they errors, GEU effects or event-splitting effects. After all, if no risky choices are made at all, then all choice-switching rates must be zero.

Hypothesis 2 focuses directly on the core preferences issue. Table 2 shows the clarity of preferences by display for the questions repeated identically, all of which used the high gap lottery pair. These results suggest that the proportion of subjects with clear preferences did indeed vary by display, as Table 1 also suggested. There is a clear ranking of the displays: the strip display promoted the clearest preferences (63.7 percent), followed by the pie (46.1 percent) and bar (31.1 percent) displays. These findings are consistent with H2a. Given the success of H2a, it is not surprising that Hypothesis 2b on the frequency of risky choices is also supported, as can be seen in Table 1.

These results suggest we may find support for our explanation of choice errors and GEU effects, as hypothesised in H2c. Choice reversals should occur predominantly for those choices where subjects are least sure of their preference, rather than being spread across preference levels as Harless and Camerer’s (1994) ‘white noise’ theory would imply. The breakdown of the 294 questions that were repeated identically is as follows (not shown in Table 2). Of the 139 times that a confident choice is made on both the original and the repeat question, there are only three choice reversals, giving a reversal rate of 2.1 percent. Of the remaining 155 times, where a subject is less confident of their choice on one or both occasions, there are 32 choice reversals, giving a reversal rate of 20.6 percent. This difference is easily significant: χ² = 23.9 > χ²(0.05, 1) = 3.84. Overall, there are 35 reversals, representing 11.9 percent of the total 294 questions.
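Two statistics quoted in this subsection can be recomputed from the published figures: the near-perfect negative rank correlation in Table 1 and the χ² test comparing reversal rates on clear and less-confident repeats. The sketch below is a hedged check rather than the author’s procedure; it assumes the rank correlation is Spearman’s ρ over the six display-by-pair rows of Table 1, that the χ² is the Pearson statistic without continuity correction, and the helper names are illustrative.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties, as in the columns used here."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def pearson_chi2(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Table 1, high gap then low gap (strip, pie, bar): % risky choices and average strength.
risky_pct = [4.3, 8.8, 15.5, 42.4, 31.7, 42.7]
avg_strength = [7.4, 6.76, 6.27, 5.83, 6.12, 5.98]
rho = spearman_rho(risky_pct, avg_strength)

# Identical repeats: [reversals, no reversal] for clear-both-times vs. all other repeats.
repeats = [[3, 139 - 3], [32, 155 - 32]]
chi2 = pearson_chi2(repeats)
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 degree of freedom

print(f"Spearman rho = {rho:.3f}")
print(f"chi2 = {chi2:.1f}, p = {p_value:.1e}")
```

Under these assumptions the sketch gives ρ ≈ −0.943 and χ² ≈ 23.9, in line with the text.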
For the same 294 repetitions, the 35 reversals were spread across displays as shown in Table 3. Given the relatively small number of reversals, these display differences narrowly fail to be significant at the 5 percent level, although they are significant at the 10 percent level. However, the ranking of choice reversal rates by display is consistent with Hypothesis 2: the strip display has the largest proportion of clear preferences and the fewest reversals, and the bar display the least clear preferences and the most reversals.

Table 3
Choice reversals by display (a)
           Standard measure          9+ measure
Display    Reversals (%)   Total     Reversals (%)   Total
Pie        14.7            102       8.8             102
Bar        15.5            90        14.4            90
Strip      5.9             102       4.9             102
Total      11.9            294       9.2             294
(a) χ² = 5.42 < χ²(0.05, 2) = 5.99; therefore, choice reversals narrowly fail to be display dependent at the 5% level.

These results show a lower reversal rate than that found by others, for example Hey and Orme (1994). One plausible explanation is that we used only the high gap lottery pair, whereas Hey and Orme’s choice pairs generally involved lottery pairs with more similar expected values.

The experiment also offered precisely the same gamble to a subject twice, but in a different display each time. The findings for the three pairs of displays, all of which used the low gap lottery pair, are shown in Table 4. The overall reversal rate of 33.8 percent is well in excess of the 11.9 percent reported above for the high gap pair when the display was unchanged. Using the 18-point scale, exactly the same score was chosen just 25.4 percent of the time (65/256) for choices repeated in different displays. This compares with 42.5 percent (114/268) when the choice was repeated within a display on the high gap pair.

⁸ The puzzling exception (see Table 1) involved risk-averse choices in the high gap pair.
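Because the display test under Table 3 has (3 − 1)(2 − 1) = 2 degrees of freedom, its p-value has a closed form. A short derivation, assuming the usual large-sample χ² approximation holds for these counts:

```latex
\begin{aligned}
P\left(\chi^2_{2} > x\right) &= e^{-x/2},\\
\chi^2_{0.05,\,2} &= -2\ln 0.05 \approx 5.99, \qquad
\chi^2_{0.10,\,2} = -2\ln 0.10 \approx 4.61,\\
p &= P\left(\chi^2_{2} > 5.42\right) = e^{-2.71} \approx 0.067 .
\end{aligned}
```

Since 4.61 < 5.42 < 5.99, the statistic is significant at the 10 percent level but not at the 5 percent level.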
The dramatic difference in choice reversal rates between Tables 3 and 4 reflects two changes at once: the use of the low gap rather than the high gap lottery pair in Table 4, and the comparison across displays. It is not possible to separate the impacts of the display change and the GEU-gap change in this study; that now seems a worthwhile question for future work.

Continuing with Hypothesis 2, we now look at the choice-rule effects. There are three plausible ways of measuring such effects in this experiment. First, there is the traditional count of choice switches when the lotteries’ consequences are juxtaposed and/or the number of events is split. Secondly, the number of 9+ switches could be counted. Thirdly, the number of shifts on the 18-point scale in the direction predicted by these theories can be calculated. Each of these scores can then be contrasted with the number of moves in the opposite direction.

Table 4
Choice reversals as display changes (a)
                 Standard measure          9+ measure
Display pair     Reversals (%)   Total     Reversals (%)   Total
Strip and bar    38.4            86        24.4            86
Bar and pie      31.7            80        15.0            80
Strip and pie    31.2            90        14.4            90
Total            33.8            256       18.0            256
(a) χ² = 1.3 < χ²(0.05, 2) = 5.99; therefore, there are no display-pair differences in reversal rates.

Table 5
Combined juxtaposition/event-splitting effects (i) (a)
Subjects changing choices following a combined juxtaposition/event-splitting change, by clarity (% of row total)
Clarity   Predicted   Unpredicted   Neither   Total
Clear     5.1         1.7           93.2      117
Unclear   19.4        6.1           74.5      247
Total     14.8        4.7           80.5      364
(a) χ² = 17.63 > χ²(0.05, 2) = 5.99; therefore, combined juxtaposition effects are concentrated on unclear preferences.

The first general result is that, overall, the frequency of combined juxtaposition/event-splitting, pure regret and pure event-splitting effects is rather low in comparison with other studies.
Again, the difference in expected values of the lotteries was somewhat larger than in other experiments, although this should not have prevented movements in the predicted direction on the 18-point scale. Half of the comparisons used the high gap pair and half the low gap pair, for each test and each display. Tables 5 and 6 give the results for combined juxtaposition effects (i.e. a regret test without standardising for event-splitting effects; compare problems (a) and (c) in Fig. 3). If the preference construction explanation of GEU effects is correct, we would expect the great majority of these effects to occur on choices where preferences were unclear. This is what we find: predicted switches occur on 19.4 percent of unclear choices against 5.1 percent of clear ones. However, although the evidence is consistent with that view, the results in Table 5 do not by themselves constitute evidence against the core theory alternative. This is because a person may have GEU preferences without producing a GEU ‘effect’ if the utilities of the lotteries are sufficiently different. Such a person would report clear preferences, and their choice would appear to be consistent with EU despite their true preferences being GEU. Of course, the more evidence accumulates that is consistent with the preference construction view, the more the onus shifts to advocates of GEU models to demonstrate why their interpretation is the more plausible. Table 6 investigates the display dependence of these effects. The pie display is the best display in which to observe them. Although the display differences are not significant at the 5 percent level, they are at the 10 percent level.
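The display test under Table 6 compares three displays across three outcome categories, giving (3 − 1)(3 − 1) = 4 degrees of freedom. For an even number of degrees of freedom the χ² tail probability again has a closed form; a sketch under the same large-sample assumption:

```latex
\begin{aligned}
P\left(\chi^2_{4} > x\right) &= e^{-x/2}\left(1 + \frac{x}{2}\right),\\
p &= P\left(\chi^2_{4} > 8.52\right) = e^{-4.26}\,(1 + 4.26) \approx 0.074 .
\end{aligned}
```

A p-value between 0.05 and 0.10 matches the statement that these display differences are significant at the 10 percent level but not at the 5 percent level.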
Using the 18-point scale, 61.5 percent of responses moved in one direction or the other, and 67 percent of these moves were in the predicted direction overall; this comprises a 60 percent share in the bar display, 63 percent in the strip display and an overwhelming 79 percent in the pie display. Taken together, these results for combined effects suggest that the juxtaposition phenomenon may be milder, but more widespread, than has previously been realised.

Table 6
Combined juxtaposition/event-splitting effects (ii) (a)
Subjects changing choices following a juxtaposition of consequences, by display (% of row total) (b)
           Standard measure      9+ measure           18-point moves
Display    P     U     N         P     U     N        P     U     N        Total
Pie        18.5  0.8   80.7      11.3  0.8   87.9     45.2  12.1  42.7     124
Bar        12.1  7.2   80.7      8.9   4.0   87.1     37.1  25.0  37.9     124
Strip      13.8  6.0   80.2      8.6   6.0   85.4     41.4  24.1  34.5     116
Total      14.8  4.7   80.5      9.6   3.6   86.8     41.2  20.3  38.5     364
(a) χ² = 8.52 < χ²(0.05, 4) = 9.48; therefore, combined juxtaposition effects narrowly fail to be display dependent.
(b) P = predicted, U = unpredicted and N = neither.

Table 7
Regret effects (i) (a)
Subjects changing choices following a pure juxtaposition of consequences, by clarity (% of row total)
Clarity   Predicted   Unpredicted   Neither   Total
Clear     3.4         1.3           95.2      146
Unclear   11.9        8.1           80.0      210
Total     8.4         5.3           86.2      356
(a) χ² = 16.96 > χ²(0.05, 2) = 5.99; therefore, regret effects are concentrated on hazy preferences.

Tables 7 and 8 look at the pure regret effects (i.e. juxtaposition effects after standardising for event splits). Again, Table 7 shows that observed effects are several times more likely to occur on unclear preferences. By display, the presentations this time do have a statistically significant impact on these effects (Table 8), with the predicted effects concentrated in the bar display. Of the 58 percent who move on the 18-point scale under a pure regret test, 59 percent move in the predicted direction.
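The 18-point shares quoted above follow from the P and U columns of Tables 6 and 8: the proportion that moved is P + U, and the predicted share is P divided by that sum. The sketch below reproduces them from the published (already rounded) percentages; the function and constant names are illustrative only.

```python
def predicted_share(predicted_pct, unpredicted_pct):
    """Return (percent of responses that moved, percent of those moves in the predicted direction)."""
    moved = predicted_pct + unpredicted_pct  # equals 100 minus the 'neither' column
    return moved, 100 * predicted_pct / moved


# 18-point (P, U) columns, in percent of each row total.
TABLE_6 = {"Pie": (45.2, 12.1), "Bar": (37.1, 25.0), "Strip": (41.4, 24.1), "Total": (41.2, 20.3)}
TABLE_8 = {"Pie": (38.3, 19.2), "Bar": (36.8, 22.8), "Strip": (27.9, 29.5), "Total": (34.2, 23.9)}

for name, table in (("Table 6 (combined)", TABLE_6), ("Table 8 (pure regret)", TABLE_8)):
    for display, (p, u) in table.items():
        moved, share = predicted_share(p, u)
        print(f"{name:22s} {display:5s} moved {moved:5.1f}%  predicted share {share:4.1f}%")
```

Rounded to whole percentages, this recovers the 67, 60, 63 and 79 percent shares for the combined effects and the 58 and 59 percent figures for pure regret.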
These results suggest that the regret effect is less pronounced than the juxtaposition effect, and is more concentrated in the bar display. The regret effect does not appear to be strong, though the asymmetry on the 18-point scale suggests it may be a genuine, if very mild, factor in some people’s decisions.

Tables 9 and 10 show the results for pure event-splitting effects. Using the conventional choice-count measure, Table 9 shows that event-splitting effects are also concentrated where preferences were hazy. The display differences in these effects, however, are not statistically significant, though a larger experiment might have shown them to be. The pie display is the most suited to these effects, and the bar display the least suited. Using the more general test of the number of shifts of any magnitude on the 18-point scale, we find that for event-splitting effects predicted shifts account for 61 percent of the 63 percent of responses that move; this conceals a 53 percent share in the bar display and a 66 percent share for the other displays.

Table 8
Regret effects (ii) (a)
Subjects changing choices following a pure juxtaposition of consequences, by display (% of row total) (b)
           Standard measure      9+ measure           18-point moves
Display    P     U     N         P     U     N        P     U     N        Total
Pie        5.8   1.7   92.5      3.3   1.7   95.0     38.3  19.2  42.5     120
Bar        13.1  7.9   78.9      10.5  5.3   84.2     36.8  22.8  40.3     114
Strip      6.5   6.5   86.9      4.9   4.1   91.0     27.9  29.5  42.6     122
Total      8.4   5.3   86.2      6.2   3.6   90.2     34.2  23.9  41.8     356
(a) χ² = 10.5 > χ²(0.05, 4) = 9.48; therefore, regret effects are display dependent.
(b) P = predicted, U = unpredicted and N = neither.

Table 9
Event-splitting effects (i) (a)
Subjects changing choices following a splitting of events, by preference strength (% of row total)
Clarity   Predicted   Unpredicted   Neither   Total
Clear     3.5         2.6           93.8      113
Unclear   19.6        9.4           71.0      235
Total     14.4        7.2           78.4      348
(a) χ² = 23.4 > χ²(0.05, 2) = 5.99; therefore, event splits are concentrated on hazy preferences.
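The three clarity tests (Tables 5, 7 and 9) can be approximately reproduced. Only percentages and row totals are published, so this sketch assumes that cell counts can be recovered by rounding percentage × total; `counts_from_percent` and `pearson_chi2` are illustrative helpers, and the statistic is taken to be Pearson’s χ² without continuity correction.

```python
import math


def counts_from_percent(percentages, total):
    """Approximately recover cell counts from rounded row percentages (an assumption)."""
    return [round(p * total / 100) for p in percentages]


def pearson_chi2(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Clear and unclear rows: (predicted, unpredicted, neither) in percent, then the row total.
CLARITY_TABLES = {
    "Table 5 (combined)": [([5.1, 1.7, 93.2], 117), ([19.4, 6.1, 74.5], 247)],
    "Table 7 (regret)": [([3.4, 1.3, 95.2], 146), ([11.9, 8.1, 80.0], 210)],
    "Table 9 (event splitting)": [([3.5, 2.6, 93.8], 113), ([19.6, 9.4, 71.0], 235)],
}

results = {}
for name, rows in CLARITY_TABLES.items():
    table = [counts_from_percent(pcts, total) for pcts, total in rows]
    stat = pearson_chi2(table)
    p_value = math.exp(-stat / 2)  # closed-form upper tail, valid only for 2 degrees of freedom
    results[name] = stat
    print(f"{name}: counts {table}, chi2 = {stat:.2f}, p = {p_value:.1e}")
```

The reconstructed statistics land within about 0.1 of the reported 17.63, 16.96 and 23.4, which suggests the rounding assumption is adequate here.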
An unexpected result was also found. Each subject also faced four questions (7–10 in Fig. 1) unconnected to testing for choice reversals or for juxtaposition/event-splitting effects. Each was identical to one of the other questions except for an improvement to either the probability or the dollar value of a positive consequence. Assuming event-wise monotonicity, the idea was to build a test of the preference indicator into the experimental design. For instance, if a subject chose gamble ‘A’ over gamble ‘B’ in one question, and then faced another question that improved either the probability of a favourable outcome of ‘A’ or the consequence concerned, then, ceteris paribus, their confidence in having chosen ‘A’ should increase. Similarly, if they had chosen ‘B’ on the first question, their confidence in that choice should now fall, and for some their choice would switch to ‘A’.

The results, however, confounded this plan. Table 11 shows that out of 542 valid comparisons, 402 do not change their choice and 117 others switch their choice in the expected direction. But no fewer than 23, or 4.2 percent, switch their choices in the wrong direction. Of these 23, 16 also switch the wrong way using the 9+ measure. Using the 18-point scale to check whether the preference strength moved in the expected direction (regardless of whether the choices switch or not), there are 315 moves in the predicted direction but 84 in the opposite direction. Thus only 79 percent of the 399 movements are in favour of monotonicity. No existing normative theory of choice under risk could defend violations of the event-wise monotonicity axiom, but such violations are not inconsistent with the general thrust of this paper. Consider, for example, the within-display choice-reversal rate of 11.9 percent on the high gap lottery pair reported above.
If some individuals have unusually poor knowledge of their preferences, it should be possible for a fraction of them to violate event-wise monotonicity if an unsuitable choice rule is prompted. But even so, it should make economists uneasy that ceteris paribus improvements of the order of A$2–A$6 in the expected value of one gamble should result in a choice switch away from that gamble some 4 percent of the time.⁹ This suggests that the haziness and uncertainty surrounding our preferences may be deeper than has hitherto been supposed.

Table 10
Event-splitting effects (ii) (a)
Subjects changing choices following a splitting of events, by display (% of row total) (b)
           Standard measure      9+ measure           18-point moves
Display    P     U     N         P     U     N        P     U     N        Total
Pie        18.7  4.5   76.7      12.5  2.7   84.8     42.0  21.4  36.6     112
Bar        11.4  11.4  77.2      8.8   8.8   82.4     33.3  29.8  36.8     114
Strip      13.1  5.7   81.1      8.2   4.9   86.9     41.0  22.1  36.9     122
Total      14.4  7.2   78.4      9.8   5.5   84.7     38.8  24.4  36.8     348
(a) χ² = 6.8 < χ²(0.05, 4) = 9.48; therefore, event splits are not display dependent.
(b) P = predicted, U = unpredicted and N = neither.

Table 11
Monotonicity violations (a)
Percentage of subjects changing choices for the questions listed (b)
                 Standard measure      9+ measure
Problem pair     P     U     N         P     U     N        Total
7 and 3/13       5.7   4.0   90.3      5.7   4.0   90.3     176
9 and 3/13       35.7  4.4   59.9      33.0  3.3   63.7     182
8 and 2          25.0  2.2   72.8      23.9  0.0   76.1     92
10 and 2         20.6  6.5   72.9      13.0  3.3   83.7     92
Total            21.6  4.2   74.1      19.2  3.0   77.8     542
(a) Note: Questions 3 and 13 were identical, so have been combined here. Each subject potentially provided six observations, and 542 of the 564 resulting comparisons were valid.
(b) P = predicted, U = unpredicted and N = neither.
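The paper reports the monotonicity comparisons only in aggregate. The sketch below shows one way to express the predicted/unpredicted/neither classification under the three measures, assuming that scores 1–9 mean ‘A’ was chosen and that moving toward the improved lottery is the predicted direction; `classify` is a hypothetical helper. It then recovers approximate Table 11 counts by rounding the published percentages and checks them against the totals quoted in the text.

```python
def classify(before: int, after: int, improved: str) -> dict:
    """Label a pair of 1-18 scores P (predicted), U (unpredicted) or N (neither).

    `improved` names the lottery ('A' or 'B') whose probability or payoff was improved.
    Scores 1-9 mean 'A' was chosen; 10-18 mean 'B' was chosen.
    """
    sign = -1 if improved == "A" else 1  # lower scores favour 'A'
    toward = (after - before) * sign     # > 0 means a move toward the improved lottery
    switched = (before <= 9) != (after <= 9)

    def tag(counted: bool) -> str:
        if not counted or toward == 0:
            return "N"
        return "P" if toward > 0 else "U"

    return {
        "standard": tag(switched),                              # any choice switch
        "nine_plus": tag(switched and abs(after - before) >= 9),  # switch of 9+ points
        "eighteen_point": tag(True),                             # any move on the scale
    }


# Table 11, standard measure: (P, U, N) percentages and row totals.
TABLE_11 = [([5.7, 4.0, 90.3], 176), ([35.7, 4.4, 59.9], 182),
            ([25.0, 2.2, 72.8], 92), ([20.6, 6.5, 72.9], 92)]
counts = [[round(p * total / 100) for p in pcts] for pcts, total in TABLE_11]
predicted, unpredicted, neither = (sum(col) for col in zip(*counts))

print(predicted, unpredicted, neither)  # 117 23 402
print(f"wrong-direction switches: {100 * unpredicted / 542:.1f}%")
print(f"18-point moves favouring monotonicity: {100 * 315 / (315 + 84):.0f}%")
```

For example, a subject who moves from 12 (‘B’, strength 3) to 6 (‘A’, strength 4) after ‘A’ is improved counts as predicted under the standard and 18-point measures but not under the 9+ measure, since the score changes by only six points.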

4. Discussion and conclusion