
192 D.A. Seale et al. / J. of Economic Behavior & Org. 44 (2001) 177–200

Table 4
Payoffs by actual and hypothetical strategies in condition BAC

                      Strategy pairs (buyer–seller)
Trader              A–A    LE–LE   TT–TT    LE–A    A–LE
Buyer 1            1483     1549    1582    1549    1483
Buyer 2            1231     1412    1546    1412    1231
Buyer 3            1099     1463    1543    1463    1099
Buyer 4             698     1491    1624    1491     698
Buyer 5            1052     1409    1569    1410    1052
Buyer 6            1392     1542    1604    1542    1392
Buyer 7            1118     1416    1543    1416    1118
Buyer 8            1148     1426    1553    1426    1148
Buyer 9            1176     1474    1535    1474    1175
Buyer 10           1223     1454    1564    1454    1223
Mean, buyers       1162     1464    1566    1464    1162
Seller 1           1486     1481    1582    1481    1485
Seller 2           1147     1234    1546    1234    1146
Seller 3           1405     1319    1543    1319    1405
Seller 4            415     1389    1624    1389     415
Seller 5           1744     1309    1569    1308    1745
Seller 6           1382     1545    1604    1545    1381
Seller 7           1337     1270    1543    1270    1338
Seller 8           1000     1345    1553    1345     999
Seller 9           1550     1388    1535    1388    1551
Seller 10           982     1395    1564    1395     982
Mean, sellers      1245     1368    1566    1368    1245
Overall mean       1203     1416    1566    1416    1203
Deals made (%)     42.6     52.4    76.4    52.4    42.6

Experiment 1 of DSR
Mean, buyers       1457     1439    1536    1577    1182
Mean, sellers      1028     1315    1536    1125    1112
Overall mean       1242     1377    1536    1125    1147
Deals made (%)     51.6     52.6    77.0    57.8    40.8

improve those of the sellers and greatly reduce those of the buyers. This analysis, of course, ignores any changes the buyers might make during the experiment if sellers were to bid in this fashion. Table 4 shows that actual buyers, in the face of such aggressive bidding by the sellers, would adaptively retreat during the experiment, and the sellers would do even better. In fact, the mean asks of the nine sellers (excluding seller 4, who faced the inflexible buyer 4) are nearly as high as the expected LES values. Clearly there is a lesson to be learned here; this topic is explored further in the concluding section of the paper.
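As a quick arithmetic check on the reconstructed table, the reported column means of the A–A strategy pair follow directly from the individual payoffs (a short sketch; the lists below are transcribed from the A–A column above):

```python
# Individual A-A payoffs in condition BAC, transcribed from Table 4.
aa_buyers = [1483, 1231, 1099, 698, 1052, 1392, 1118, 1148, 1176, 1223]
aa_sellers = [1486, 1147, 1405, 415, 1744, 1382, 1337, 1000, 1550, 982]

mean_buyers = sum(aa_buyers) / len(aa_buyers)      # 1162.0, as reported
mean_sellers = sum(aa_sellers) / len(aa_sellers)   # 1244.8, reported as 1245
overall = (sum(aa_buyers) + sum(aa_sellers)) / 20  # 1203.4, reported as 1203
```

The small discrepancies (1244.8 vs. 1245, 1203.4 vs. 1203) reflect rounding in the published table.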

5. A reinforcement-based adaptive learning model

The learning model proposed by DSR is invoked to explain the process by which the strategies of traders evolve over the 50 rounds of play. In contrast to other learning models, the focus of the model is on individual, not aggregate, behavior, and the goal is to account for the round-to-round changes in the decisions of both buyer and seller. The model makes minimal demands on the rationality and reasoning ability of the traders. It assumes that the trader (seller or buyer) remembers what worked well (poorly) for him in the last round of bargaining, and then does it more (less) frequently in the future. The model strives for parsimony. We subscribe to the approach (e.g. Cooper and Feltovich, 1996) that more cognitively demanding models should be employed only after the simpler ones are proven incapable of accounting for the data.

The learning model in DSR maintains consistency with basic principles of learning behavior, particularly with the effects of reinforcement, as observed and documented in the vast psychological literature on animal and human learning. It embodies the 'Law of Effect' of Thorndike (1898), the 'Power Law of Practice' due to Blackburn (1936), the evidence about the generalization of stimuli reported by many psychologists, and the role of reference points in determining whether outcomes are perceived as positive or negative gains (Kahneman and Tversky, 1979). Predecessors include the learning direction theory (Selten and Buchta, 1994) and the reinforcement-based learning model proposed and tested by Roth and Erev (1995). As with the Roth–Erev model, the DSR model makes no cognitively demanding assumptions involving probability distributions over the opponent's actions or Bayesian updating of beliefs.
It differs from the Roth–Erev model by replacing their probabilistic response mechanism with a deterministic, and consequently more easily refutable, mechanism; by considering continuous rather than finite strategy sets with a small number of elements; by using a different approach to modeling stimulus generalization; and by postulating a smaller number of free parameters.

5.1. The buyer's model

The buyer's strategy is assumed to be a function specifying how much below her reservation value she should bid for the item being transferred. On any trial t, the amount that the buyer bids, v_t, relative to her reservation value on that trial, V_t, is assumed to be described by the following function:

    v_t = min{V_t, b_{t-1}[1 - exp(-V_t / b_{t-1})]},  t = 1, 2, ...    (4)

Eq. (4) defines an exponential function with the constraint that the bid cannot exceed the buyer's reservation price. The free parameter b_t determines the shape of the exponential function at round t and hence the degree to which the buyer shades her bid below her reservation value on that trial. Smaller values of b_t represent more aggressive bidding. Thus, the buyer's strategy space is represented by a one-parameter family of exponential functions. Although this family of functions does not include the piecewise linear LES function, a close approximation can be achieved with the proper choice of parameters.

The adaptive learning process by which buyers alter their strategy over successive bargains is modeled as follows. Past experience with successful or unsuccessful bids is assumed to change the value of b_t, and consequently the shape of the entire bid function. If a bid results in a trade being made on trial t (v_t >= c_t), then b_{t+1} is adjusted downwards (i.e. the tendency to bid below the reservation price is reinforced and the buyer shades her bid more next time) in proportion to the profit realized, namely, V_t - p_t, where p_t = (v_t + c_t)/2 is the price of the transaction on round t. Thus, following a successful trade on trial t - 1,

    b_t = b_{t-1}[1 - w^+_{b,t}(V_{t-1} - p_{t-1})],    (5)

where w^+_{b,t} = (1 - d_b) w^+_{b,t-1} is an impact factor that incorporates the relative effect of a positive outcome, and d_b (0 < d_b <= 1) is a discount factor that depreciates the impact of the outcome as time evolves.

If no trade takes place because the buyer bids too low on trial t (v_t < c_t), then b_{t+1} is adjusted upwards in proportion to the profit that the buyer could have made had she correctly forecast the seller's asking price. However, if no trade occurs because the seller's asking price exceeds the buyer's reservation price (and hence no rational bid by the buyer would have resulted in a trade), then the buyer has no reason to change her bidding policy and b_{t+1} remains unchanged. The following equation captures both of these effects:

    b_t = b_{t-1} * max{1, 1 + w^-_{b,t}(V_{t-1} - c_{t-1})},    (6)

where w^-_{b,t} = (1 - d_b) w^-_{b,t-1} is an impact parameter for profits lost due to greedy bidding, and d_b is as defined above.

5.2. The seller's model

The seller's model is identical to the buyer's model, with the only change being in the labeling of the parameters. Corresponding to a reservation value on round t of C_t, the seller is assumed to place an ask, c_t, determined by

    c_t = max{C_t, S^+ - s_{t-1}[1 - exp(-(S^+ - C_t) / s_{t-1})]},  t = 1, 2, ...    (7)

where S^+ is the upper limit of the interval of the seller's reservation values (S^+ = 200 in both conditions SA and SLA).
Successful or unsuccessful transactions on round t - 1 are assumed to change the value of the parameter s_t and, consequently, the entire offer function, as follows:

    s_t = s_{t-1}[1 - w^+_{s,t}(p_{t-1} - C_{t-1})],  if v_{t-1} >= c_{t-1}    (8)

and

    s_t = s_{t-1} * max{1, 1 + w^-_{s,t}(v_{t-1} - C_{t-1})},  if v_{t-1} < c_{t-1},

where w^+_{s,t} and w^-_{s,t} are impact parameters for the seller with the same interpretation as the corresponding impact parameters for the buyer, d_s (0 < d_s <= 1) is the discount factor for the seller, w^+_{s,t} = (1 - d_s) w^+_{s,t-1}, and w^-_{s,t} = (1 - d_s) w^-_{s,t-1}. When v < c in Eq. (8), either v - C >= 0 or v - C < 0. In the former case, the seller ascribes the opportunity loss to his greediness; by lowering his ask to c = v he could have made a positive profit. As a result, the seller lowers his ask function on the next round of play. In the latter case, he is supposed to place the responsibility for the loss of the transaction on the buyer, and consequently leaves his ask function unaltered.

In summary, the buyer's bids are accounted for by four parameters: b_t, the single parameter of the exponential bid function; w^+_{b,t} and w^-_{b,t}, the two impact parameters for successful and unsuccessful trades; and d_b, a parameter discounting the effects of w^+_b and w^-_b. The seller's asks are described by the same reinforcement-based learning model.
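The bid, ask, and update rules of Eqs. (4)–(8) can be sketched in code. The following is a minimal illustration, not the authors' estimation procedure: the initial values of b and s, the initial impact weights, the discount factors, and the uniform draws of reservation values are all illustrative placeholders (in the paper these quantities are fitted to, or drawn from, the experimental design).

```python
import math
import random

S_PLUS = 200.0  # upper limit of the seller's reservation-value interval

def buyer_bid(V, b):
    """Eq. (4): bid at most V, shaded below it by the shape parameter b."""
    return min(V, b * (1.0 - math.exp(-V / b)))

def seller_ask(C, s):
    """Eq. (7): ask at least C, marked up toward S_PLUS by parameter s."""
    return max(C, S_PLUS - s * (1.0 - math.exp(-(S_PLUS - C) / s)))

def update_buyer(b, w_pos, w_neg, d, V, v, c):
    """Eqs. (5)-(6): adjust b after one round; returns (b, w_pos, w_neg)."""
    w_pos *= (1.0 - d)  # w+_{b,t} = (1 - d_b) w+_{b,t-1}
    w_neg *= (1.0 - d)  # w-_{b,t} = (1 - d_b) w-_{b,t-1}
    if v >= c:                 # trade made: reinforce shading (Eq. 5)
        p = (v + c) / 2.0      # transaction price
        b *= 1.0 - w_pos * (V - p)
    else:                      # no trade (Eq. 6); when c > V the max(...)
        b *= max(1.0, 1.0 + w_neg * (V - c))  # leaves b unchanged
    return b, w_pos, w_neg

def update_seller(s, w_pos, w_neg, d, C, v, c):
    """Eq. (8): the mirror-image update for the seller."""
    w_pos *= (1.0 - d)
    w_neg *= (1.0 - d)
    if v >= c:
        p = (v + c) / 2.0
        s *= 1.0 - w_pos * (p - C)
    else:
        s *= max(1.0, 1.0 + w_neg * (v - C))
    return s, w_pos, w_neg

# Illustrative 50-round run with placeholder parameter values.
rng = random.Random(0)
b, s = 100.0, 100.0                # illustrative initial shape parameters
wb_p = wb_n = ws_p = ws_n = 0.002  # illustrative initial impact weights
d_b = d_s = 0.05                   # illustrative discount factors
for t in range(50):
    V = rng.uniform(0.0, 200.0)    # buyer's reservation value this round
    C = rng.uniform(0.0, 200.0)    # seller's reservation value this round
    v, c = buyer_bid(V, b), seller_ask(C, s)
    b, wb_p, wb_n = update_buyer(b, wb_p, wb_n, d_b, V, v, c)
    s, ws_p, ws_n = update_seller(s, ws_p, ws_n, d_s, C, v, c)
```

Note how the deterministic response mechanism shows up here: given the parameters, each round's bid and ask follow exactly from Eqs. (4) and (7), with no probabilistic choice over strategies as in the Roth–Erev model.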

6. Dynamic analysis