192 D.A. Seale et al. J. of Economic Behavior Org. 44 2001 177–200
Table 4
Payoffs by actual and hypothetical strategies in condition BAC

Trader                      Strategy pairs
                            A–A     LE–LE   TT–TT   LE–A    A–LE
Buyer 1                     1483    1549    1582    1549    1483
Buyer 2                     1231    1412    1546    1412    1231
Buyer 3                     1099    1463    1543    1463    1099
Buyer 4                      698    1491    1624    1491     698
Buyer 5                     1052    1409    1569    1410    1052
Buyer 6                     1392    1542    1604    1542    1392
Buyer 7                     1118    1416    1543    1416    1118
Buyer 8                     1148    1426    1553    1426    1148
Buyer 9                     1176    1474    1535    1474    1175
Buyer 10                    1223    1454    1564    1454    1223
Mean buyers                 1162    1464    1566    1464    1162
Seller 1                    1486    1481    1582    1481    1485
Seller 2                    1147    1234    1546    1234    1146
Seller 3                    1405    1319    1543    1319    1405
Seller 4                     415    1389    1624    1389     415
Seller 5                    1744    1309    1569    1308    1745
Seller 6                    1382    1545    1604    1545    1381
Seller 7                    1337    1270    1543    1270    1338
Seller 8                    1000    1345    1553    1345     999
Seller 9                    1550    1388    1535    1388    1551
Seller 10                    982    1395    1564    1395     982
Mean sellers                1245    1368    1566    1368    1245
Overall mean                1203    1416    1566    1416    1203
Percentage of deals made    42.6    52.4    76.4    52.4    42.6

Experiment 1 of DSR
Mean buyers                 1457    1439    1536    1577    1182
Mean sellers                1028    1315    1536    1125    1112
Overall mean                1242    1377    1536    1125    1147
Percentage of deals made    51.6    52.6    77.0    57.8    40.8
improve those of the sellers and greatly reduce those of the buyers. This analysis, of course, ignores any changes the buyers might make during the experiment if sellers were to bid in this fashion. Table 4 shows that actual buyers, in the face of such aggressive bidding by the sellers, would adaptively retreat during the experiment, and the sellers would do even better. In fact, the mean ask of the nine sellers (excluding seller 4, who faced the inflexible buyer 4) is nearly as high as the expected LES values. Clearly there is a lesson to be learned here; this topic is explored further in the concluding section of the paper.
5. A reinforcement-based adaptive learning model
The learning model proposed by DSR is invoked to explain the process by which the strategies of traders evolve over the 50 rounds of play. In contrast to other learning models, the focus of the model is on individual, not aggregate, behavior, and the goal is to account for the round-to-round changes in the decisions of both buyer and seller. The model makes minimal demands on the rationality and reasoning ability of the traders. It assumes that the trader (seller or buyer) remembers what worked well (poorly) for him in the last round of bargaining, and then does it more (less) frequently in the future. The model strives for parsimony. We subscribe to the approach (e.g. Cooper and Feltovitch, 1996) that more cognitively demanding models should be employed only after the simpler ones are proven incapable of accounting for the data.

The learning model in DSR maintains consistency with basic principles of learning behavior, particularly with the effects of reinforcement, as observed and documented in the vast psychological literature on animal and human learning. It embodies the 'Law of Effect' of Thorndike (1898), the 'Power Law of Practice' due to Blackburn (1936), the evidence about the generalization of stimuli reported by many psychologists, and the role of reference points in determining whether outcomes are perceived as positive or negative (Kahneman and Tversky, 1979). Predecessors include the learning direction theory (Selten and Buchta, 1994) and the reinforcement-based learning model proposed and tested by Roth and Erev (1995). As with the Roth–Erev model, the DSR model makes no cognitively demanding assumptions involving probability distributions over the opponent's actions or Bayesian updating of beliefs. It differs from the Roth–Erev model by replacing their probabilistic response mechanism with a deterministic, and consequently more easily refutable, mechanism; by considering continuous strategy sets rather than finite strategy sets with a small number of elements; by using a different approach to modeling stimulus generalization; and by postulating a smaller number of free parameters.
5.1. The buyer's model

The buyer's strategy is assumed to be a function specifying how much below her reservation value she should bid for the item being transferred. On any trial $t$, the amount that the buyer bids, $v_t$, relative to her reservation value on that trial, $V_t$, is assumed to be described by the following function:

$$v_t = \min\left\{V_t,\; b_{t-1}\left[1 - \exp\left(-\frac{V_t}{b_{t-1}}\right)\right]\right\}, \quad t = 1, 2, \ldots \quad (4)$$

Eq. (4) defines an exponential function with the constraint that the bid cannot exceed the buyer's reservation price. The free parameter $b_t$ determines the shape of the exponential function at round $t$ and hence the degree to which the buyer shades her bid below her reservation value on that trial. Smaller values of $b_t$ represent more aggressive bidding. Thus, the buyer's strategy space is represented by a one-parameter family of exponential functions. Although this family of functions does not include the piecewise linear LES function, a close approximation can be achieved with the proper choice of parameters.
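As an illustration, the bid function in Eq. (4) can be sketched in Python (the function name and the numerical values below are ours, chosen for illustration rather than taken from the paper):

```python
import math

def buyer_bid(V_t: float, b: float) -> float:
    """Bid implied by Eq. (4): shade the bid below the reservation value V_t.

    Here b plays the role of b_{t-1}; smaller values of b produce more
    aggressive (lower) bids, and min() enforces the constraint that the
    bid never exceeds the reservation value.
    """
    return min(V_t, b * (1.0 - math.exp(-V_t / b)))

# Smaller b shades the bid further below a reservation value of 150:
print(buyer_bid(150.0, 50.0))    # aggressive: far below 150
print(buyer_bid(150.0, 1000.0))  # nearly truthful: close to 150
```

As $b \to \infty$ the bid approaches $V_t$ itself, which is how a suitable parameter choice can approximate near-truthful segments of the LES function.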
The adaptive learning process by which buyers alter their strategy over successive bargains is modeled as follows. Past experience with successful or unsuccessful bids is assumed to change the value of $b_t$, and consequently the shape of the entire bid function. If a bid results in a trade being made on trial $t$ ($v_t \geq c_t$), then $b_{t+1}$ is adjusted downwards (i.e. the tendency to bid below the reservation price is reinforced and the buyer shades her bid more next time) in proportion to the profit realized, namely, $V_t - p_t$, where $p_t = (v_t + c_t)/2$ is the price of the transaction on round $t$. Thus, following a successful trade on trial $t-1$,

$$b_t = b_{t-1}\left[1 - w^{+}_{b,t}(V_t - p_t)\right], \quad (5)$$
where $w^{+}_{b,t} = (1 - d_b)\,w^{+}_{b,t-1}$ is an impact factor that incorporates the relative effect of a positive outcome, and $d_b$ ($0 < d_b \leq 1$) is a discount factor that depreciates the impact of the outcome as time evolves.

If no trade takes place because the buyer bids too low on trial $t$ ($v_t < c_t$), then $b_{t+1}$ is adjusted upwards in proportion to the profit that the buyer could have made had she correctly forecast the seller's asking price. However, if no trade occurs because the seller's asking price exceeds the buyer's reservation price, and hence no rational bid by the buyer would have resulted in a trade, then the buyer has no reason to change her bidding policy and $b_{t+1}$ remains unchanged. The following equation captures both of these effects:

$$b_t = b_{t-1} \cdot \max\left\{1,\; 1 + w^{-}_{b,t}(V_t - c_t)\right\}, \quad (6)$$
where $w^{-}_{b,t} = (1 - d_b)\,w^{-}_{b,t-1}$ is an impact parameter for profits lost due to greedy bidding, and $d_b$ is as defined above.

5.2. The seller's model
The seller's model is identical to the buyer's model, with the only change being in the labeling of the parameters. Corresponding to a reservation value on round $t$ of $C_t$, the seller is assumed to place an ask, $c_t$, determined by

$$c_t = \max\left\{C_t,\; S^{+} - s_{t-1}\left[1 - \exp\left(-\frac{S^{+} - C_t}{s_{t-1}}\right)\right]\right\}, \quad t = 1, 2, \ldots \quad (7)$$

where $S^{+}$ is the upper limit of the interval of the seller's reservation values ($S^{+} = 200$ in both conditions SA and SLA). Successful or unsuccessful transactions on round $t-1$ are assumed to change the value of the parameter $s_t$ and, consequently, the entire offer function, as follows:
$$s_t = s_{t-1}\left[1 - w^{+}_{s,t}(p_t - C_t)\right], \quad \text{if } v_t \geq c_t, \quad (8)$$

and

$$s_t = s_{t-1} \cdot \max\left\{1,\; 1 + w^{-}_{s,t}(v_t - C_t)\right\}, \quad \text{if } v_t < c_t, \quad (9)$$

where $w^{+}_{s,t}$ and $w^{-}_{s,t}$ are impact parameters for the seller with the same interpretation as the corresponding impact parameters for the buyer, $d_s$ ($0 < d_s \leq 1$) is the discount factor for the seller, $w^{+}_{s,t} = (1 - d_s)\,w^{+}_{s,t-1}$, and $w^{-}_{s,t} = (1 - d_s)\,w^{-}_{s,t-1}$.

When $v_t < c_t$ in Eq. (9), either $v_t - C_t \geq 0$ or $v_t - C_t < 0$. In the former case, the seller ascribes the opportunity loss to his greediness; by lowering his offer to $c_t = v_t$ he could have made a positive profit. As a result, the seller lowers his ask function on the next round of play. In the latter case, he is supposed to place the responsibility for the loss of the transaction on the buyer, and consequently leaves his ask function unaltered.
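The seller's ask function and the two update branches can be sketched as follows (a minimal illustration; the numerical values in the assertions and comments are hypothetical, not estimates from the experimental data):

```python
import math

S_PLUS = 200.0  # upper limit of the seller's reservation-value interval

def seller_ask(C_t: float, s: float) -> float:
    """Ask implied by Eq. (7): mark the ask up above the reservation value C_t.

    Here s plays the role of s_{t-1}; smaller values of s produce more
    aggressive (higher) asks, and max() keeps the ask at or above C_t.
    """
    return max(C_t, S_PLUS - s * (1.0 - math.exp(-(S_PLUS - C_t) / s)))

def update_s(s: float, w_pos: float, w_neg: float,
             v_t: float, c_t: float, C_t: float) -> float:
    """One reinforcement step for s, following Eqs. (8) and (9)."""
    if v_t >= c_t:                        # trade made: reinforce the markup
        p_t = (v_t + c_t) / 2.0           # transaction price
        return s * (1.0 - w_pos * (p_t - C_t))       # Eq. (8)
    # No trade: raise s (soften the ask) only when an ask of c_t = v_t
    # would have been profitable; otherwise max() leaves s unchanged.
    return s * max(1.0, 1.0 + w_neg * (v_t - C_t))   # Eq. (9)
```

A profitable trade shrinks $s$, pushing later asks toward $S^{+}$; a missed but profitable opportunity ($C_t \leq v_t < c_t$) raises $s$, lowering later asks; and when $v_t < C_t$ the update leaves $s$, and hence the ask function, unchanged.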
In summary, the buyer's bids are accounted for by four parameters: $b_t$, the single parameter of the exponential bid function; $w^{+}_{b,t}$ and $w^{-}_{b,t}$, the two impact parameters for successful and unsuccessful trades; and $d_b$, a parameter discounting the effects of $w^{+}_{b,t}$ and $w^{-}_{b,t}$. The seller's asks are described by the same reinforcement-based learning model.
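Putting Eqs. (4)–(6) together, one round of the buyer's adaptive dynamics, including the discounting of the impact factors, might be sketched as follows (all parameter values are illustrative, not fitted values from the paper):

```python
import math

def buyer_round(V_t, c_t, b_prev, w_pos, w_neg, d_b):
    """One round of the buyer's dynamics (Eqs. (4)-(6)).

    Returns the bid, the updated shape parameter, and the discounted
    impact factors carried into the next round.
    """
    v_t = min(V_t, b_prev * (1.0 - math.exp(-V_t / b_prev)))       # Eq. (4)
    if v_t >= c_t:                       # trade made: shade more next time
        p_t = (v_t + c_t) / 2.0          # transaction price
        b_next = b_prev * (1.0 - w_pos * (V_t - p_t))              # Eq. (5)
    else:                                # bid too low, or no rational bid
        b_next = b_prev * max(1.0, 1.0 + w_neg * (V_t - c_t))      # Eq. (6)
    # Discount both impact factors: w_{b,t} = (1 - d_b) * w_{b,t-1}
    return v_t, b_next, (1.0 - d_b) * w_pos, (1.0 - d_b) * w_neg

# A bid that falls short of a profitable ask raises b (less shading next
# round); a successful trade lowers b (more shading next round).
print(buyer_round(150.0, 100.0, 100.0, 0.002, 0.002, 0.05))
print(buyer_round(150.0, 60.0, 100.0, 0.002, 0.002, 0.05))
```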
6. Dynamic analysis