H. Dawid J. of Economic Behavior Org. 41 2000 27–53 29
increasing returns to scale. In this section we also provide some considerations about what outcomes to expect for different parameter setups. Simulation results for this model are
shown in Section 3, where the influence of parameter variations on the simulation results is also explored. We finish with some concluding remarks in Section 4.
2. The model
In this section we describe the basic model we will use in our simulations. All agents are equal here and the attraction of specialization and trade entirely stems from increasing
returns to scale in production. Within the basic model we consider two different setups. In the first setup the agents cannot hold stocks and thus only engage in direct trade whereas
the second setup gives the agents the possibility of becoming a mediator, building up stocks and exclusively concentrating on trading.
2.1. The agents

We model the evolution of a system of n interacting economic agents who may split their available time between production and trading. All agents have identical technology and preferences but may differ in their actions. Each agent has a fixed time budget in each period which is normalized to one. There are two goods in the economy and he may spend this time on producing good 1, producing good 2 or trading. The behavior of agent i is described by two variables governing his production, $sp_i$ and $pr_i$, and one variable $s_i$ determining the amount of time he invests in trading. The variables may vary with time but we omit the time argument in our notation. The variable $sp_i \in [0, 1]$ denotes the degree of specialization in the production decision of the agent, whereas $pr_i \in \{1, 2\}$ determines which of the two products the agent specializes in. If we denote by $x_{i,g}$ the fraction of time the agent invests in producing good g we have

\[
x_{i,1} = \begin{cases} (1 - s_i)\, sp_i & pr_i = 1 \\ (1 - s_i)(1 - sp_i) & pr_i = 2 \end{cases}
\qquad
x_{i,2} = \begin{cases} (1 - s_i)(1 - sp_i) & pr_i = 1 \\ (1 - s_i)\, sp_i & pr_i = 2. \end{cases}
\]

Of course, this implies

\[
x_{i,1} + x_{i,2} + s_i = 1 \quad \forall i = 1, \dots, n.
\]

Why we use this special representation of the production decision will become clear when we describe the learning process. This representation allows the agents to change directly from specialization in good 1 to specialization in good 2 without going through states of lesser specialization in-between.
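The mapping from the three decision variables to the time shares can be sketched as follows. This is a minimal illustration rather than the author's code; the function name `time_allocation` and the argument checks are assumptions made for the example.

```python
from typing import Tuple


def time_allocation(s: float, sp: float, pr: int) -> Tuple[float, float]:
    """Return (x1, x2): the time shares spent producing good 1 and good 2.

    s  -- share of time invested in trading, in [0, 1]
    sp -- degree of specialization, in [0, 1]
    pr -- index of the good the agent specializes in, 1 or 2
    """
    if not (0.0 <= s <= 1.0 and 0.0 <= sp <= 1.0):
        raise ValueError("s and sp must lie in [0, 1]")
    if pr not in (1, 2):
        raise ValueError("pr must be 1 or 2")
    main = (1.0 - s) * sp            # time on the good the agent specializes in
    other = (1.0 - s) * (1.0 - sp)   # time on the other good
    return (main, other) if pr == 1 else (other, main)
```

Flipping `pr` while keeping `sp` fixed simply swaps the two shares, which is the direct jump between specializations mentioned in the text.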
Production functions of the goods are identical for all agents and read

\[
f_g(x_{i,g}) = a_g\, x_{i,g}^{\alpha_g}, \qquad a_g > 0,\ g = 1, 2,
\]

where increasing returns to scale are assumed ($\alpha_g > 1$). The preferences of the agents are represented by the concave utility function

\[
U(c_{i,1}, c_{i,2}) = b_1 c_{i,1}^{\beta_1} + b_2 c_{i,2}^{\beta_2}, \qquad b_g > 0,\ \beta_g < 1.
\]

In what follows we always assume $a_1 = a_2$, $b_1 = b_2$, $\alpha_1 = \alpha_2$ and, for computational convenience, $\beta_1 = \beta_2 = 1/2$.

The trading behavior of the agents is determined as follows: given a price p of good 1
expressed in units of good 2 the agent has to determine whether he would like to buy or sell good 1. Assuming that an agent currently holds $\gamma_i$ units of good 1 and $\delta_i$ units of good 2, and maximizes the utility gained by consuming all the goods he is holding after this trade, he wants to sell

\[
g(p; \gamma_i, \delta_i) = \max\left[0,\ \frac{\gamma_i p^2 - \delta_i}{p(1 + p)}\right]
\]

units of good 1, or buy

\[
h(p; \gamma_i, \delta_i) = \max\left[0,\ -\frac{\gamma_i p^2 - \delta_i}{p(1 + p)}\right]
\]

units, respectively. Whenever two agents are matched for trading (the mechanism governing this matching is described below) they exchange goods, where the price p and the quantity y of good 1 exchanged are determined such that excess demand for good 1 equals the excess supply. It is easy to see that with the supply and demand functions given above there is always a unique pair (p, y) with this property.
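The supply and demand functions, and the price at which they clear between two agents, can be sketched as below. The function names are assumptions made for this example. The clearing price is obtained here by equating one agent's supply with the other's demand, which gives $p^2 = (\delta_i + \delta_j)/(\gamma_i + \gamma_j)$; returning `None` when no positive price exists is also an assumption of the sketch.

```python
import math


def sell(p, gamma, delta):
    """g(p; gamma, delta): units of good 1 the agent wants to sell at price p > 0."""
    return max(0.0, (gamma * p**2 - delta) / (p * (1.0 + p)))


def buy(p, gamma, delta):
    """h(p; gamma, delta): units of good 1 the agent wants to buy at price p > 0."""
    return max(0.0, -(gamma * p**2 - delta) / (p * (1.0 + p)))


def clearing_price(gamma_i, delta_i, gamma_j, delta_j):
    """Price at which one agent's supply of good 1 equals the other's demand.

    Returns None when one of the goods is absent from both holdings,
    because no positive price exists in that case (assumption of this sketch).
    """
    total_1 = gamma_i + gamma_j
    total_2 = delta_i + delta_j
    if total_1 <= 0.0 or total_2 <= 0.0:
        return None
    return math.sqrt(total_2 / total_1)
```

For instance, if agent i holds 4 units of good 1 and agent j holds 1 unit of good 2, the clearing price is 0.5 and both `sell(0.5, 4, 0)` and `buy(0.5, 0, 1)` equal 4/3, so supply and demand coincide.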
2.2. Transactions within a period

In each period t three different stages occur:

1. Production
2. Trade
3. Consumption

In the first stage all agents produce according to their production variables $x_{i,g}$. After production they might trade the goods. We denote by $\gamma_i$ and $\delta_i$ the amounts of goods 1 and 2 agent i is holding during the trading period. The holdings vary during the trading period and these variables always denote the current value. Initially, we have $\gamma_i = f_1(x_{i,1})$, $\delta_i = f_2(x_{i,2})$.

The trading procedure explicitly introduces search costs into the model. The basic idea is
that every agent spends some time looking around and searching for a trading partner. There are some randomly chosen agents in the population he can meet if he invests enough time
in trading, but some of these possible matches are never realized because of a lack of time available for search. Think of a producer who has a number of potential trading partners
reachable within 1 day. If he decides to visit one of these agents he might either meet him in the middle, which means that both have to invest time, or, if the other agent does not sacrifice time for trading, go all the way and lose twice as much time. Of course he can also keep producing the whole time and wait for some of his neighbors to come all the
way and trade with him. In our simulations we use the following procedure to determine the trading partners. Two
agents i and j can only trade with one another if the sum of the time invested in trading exceeds some given threshold $\chi > 0$. The initial trading time budget for each agent is in each period given by $S_i = s_i$ and is reduced step by step during the trading period by the amount of time which has already been used for trading. Interpreting the effort invested in
trading as search costs this restriction means that the agents have to invest time for searching for partners if they like to trade. Note, however, that all effort might be invested by one party
allowing ‘professional’ traders who can be reached by others without any costs. We use the following matching algorithm for trading: Defining the ‘trading pool’ as the set of all
agents who might trade in the rest of period t the algorithm can be described as follows:
1. Choose randomly an agent i from the trading pool where each agent in the pool is chosen with the same probability.

2. If $S_i = 0$ return agent i to the trading pool and goto 1.

3. Choose randomly (again uniformly) an agent j from the rest of the trading pool.

4. If $S_i + S_j < \chi$ goto 5, else determine amount and price for the trade between i and j. If $\delta_i \gamma_j < \delta_j \gamma_i$ good 1 is traded from agent i to agent j; if the inequality holds the other way round good 1 is traded from agent j to agent i. In the case of equality no trade takes place. Let us assume that good 1 is traded from i to j. The amount traded is determined by the intersection of the trade functions of the two agents

\[
y_{ij} = \frac{\gamma_i p_{ij}^2 - \delta_i}{p_{ij}(1 + p_{ij})}, \tag{1}
\]

where

\[
p_{ij} = \sqrt{\frac{\delta_i + \delta_j}{\gamma_i + \gamma_j}} \tag{2}
\]

is the corresponding price. In other words, we assume that if two producing agents meet they are able to determine the price which allows them both to buy, respectively sell, the optimal amount given the trading price. Afterwards, the holdings of both agents are updated accordingly:

\[
\gamma_i = \gamma_i - y_{ij}, \quad \delta_i = \delta_i + p_{ij}\, y_{ij}, \quad \gamma_j = \gamma_j + y_{ij}, \quad \delta_j = \delta_j - p_{ij}\, y_{ij}.
\]

5. Update the time budgets

\[
S_k = S_k - \min\left[S_k,\ \max\left[\frac{\chi}{2},\ \chi - S_l\right]\right], \qquad k = i, j,\ l = i, j,\ k \neq l.
\]

If $S_j = 0$ eliminate j from the trading pool. If $S_i = 0$ eliminate i from the trading pool and check whether there are still agents with $S_k > 0$ in the trading pool. If this holds true goto 1, else stop trading. If $S_i > 0$ goto 2.

The time budget updating can be interpreted as follows: if two agents meet where both are looking for a partner each of
them has to invest trading time in the amount of $\chi/2$ to find the partner; however, if an agent does not actively search for a partner or does so only for a short time ($S_i < \chi/2$) he is harder to find and his partner has to invest more time. Note also that this scheme implies that a trader leaves the market as soon as he has invested all his trading time. Note that the trading scheme described above assumes that passive traders (agents with $s_i = 0$) only trade once and leave the market afterwards. After the trading period all agents consume their current holding and receive a utility of
\[
U_i = b\left(\sqrt{\gamma_i} + \sqrt{\delta_i}\right)
\]
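The matching procedure of this subsection, including the time budget update, can be sketched as follows. This is one possible reading of steps 1 to 5, not the author's implementation. The dictionary keys `s`, `gamma` and `delta`, the function names, the signed treatment of the traded amount, and the rule of stopping when fewer than two agents remain in the pool are all assumptions made for the example.

```python
import math
import random


def update_budgets(S_i, S_j, chi):
    """Step 5: both budgets are reduced simultaneously, using the old values."""
    pay_i = min(S_i, max(chi / 2.0, chi - S_j))
    pay_j = min(S_j, max(chi / 2.0, chi - S_i))
    return S_i - pay_i, S_j - pay_j


def exchange(a, b):
    """Step 4: trade good 1 between holdings a and b at the clearing price."""
    total_1 = a["gamma"] + b["gamma"]
    total_2 = a["delta"] + b["delta"]
    if total_1 <= 0.0 or total_2 <= 0.0:
        return  # assumption: no trade when no positive price exists
    p = math.sqrt(total_2 / total_1)
    # Signed flow of good 1 from a to b; a negative value means b sells to a.
    y = (a["gamma"] * p**2 - a["delta"]) / (p * (1.0 + p))
    a["gamma"] -= y
    a["delta"] += p * y
    b["gamma"] += y
    b["delta"] -= p * y


def trading_period(agents, chi, rng):
    """Run steps 1-5 until no agent in the pool has trading time left."""
    S = [agent["s"] for agent in agents]   # remaining trading time budgets
    pool = set(range(len(agents)))         # the 'trading pool'
    # Assumption: trading also stops once fewer than two agents remain.
    while len(pool) > 1 and any(S[k] > 0.0 for k in pool):
        i = rng.choice(sorted(pool))                       # step 1
        if S[i] <= 0.0:
            continue                                       # step 2: i stays in the pool
        while S[i] > 0.0 and len(pool) > 1:
            j = rng.choice(sorted(pool - {i}))             # step 3
            if S[i] + S[j] >= chi:                         # step 4
                exchange(agents[i], agents[j])
            S[i], S[j] = update_budgets(S[i], S[j], chi)   # step 5
            if S[j] <= 0.0:
                pool.discard(j)
        if S[i] <= 0.0:
            pool.discard(i)
```

In this sketch a passive agent is never removed merely for being drawn in step 1; he leaves the pool only after being met as partner j, which matches the remark that passive traders trade once. Each exchange preserves the total amount of both goods.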
2.3. Model with mediators
The fact that we did not allow the agents to build up stocks in the previous model rules out the possibility of the emergence of agents who exclusively concentrate on trading without
producing any goods themselves. We call these agents mediators and will now extend the model by allowing the agents to decide to mediate in the market rather than to produce.
This is done by adding three more decision variables to the existing three decision variables of each agent. The first of the three additional variables, $id_i$, describes the identity of agent i. Whenever this variable has value 0 the agent is a producer and behaves in exactly the same way as the agents described above. If $id_i = 1$ the agent is a mediator. This implies that $s_i = 1$ and no time is invested in producing. Furthermore, a mediator has a different kind of trading behavior than the producing agents. Whereas the producing agents trade in a way to maximize their utility from consumption, the mediator sells and buys good 1 at fixed prices. He sells one unit of good 1 for $p_{i,s}$ units of good 2 and buys it for $p_{i,b}$ units of the second good. These two prices are decision variables of the agent and again might change over time. Of course, mediators always mark up selling prices over buying prices and we have $p_{i,s} > p_{i,b}$. To be able to mediate, an agent has to possess some stocks of both goods. Thus, we assume that whenever an agent changes (due to imitation or innovation) from production to mediation he initially produces without trading for four periods (two periods for each good) and afterwards completely stops production and starts trading. The stock of good g agent i holds is denoted by $l_{i,g}$. If an agent switches from mediation to production he leaves his stock untouched and is able to use this stock again if he decides to switch back to mediation at some time.

Mediators trade only with producers but never with other mediators. When a mediator i
is matched for trading with a producer j an amount of

\[
y_{ij} = \min\left[g(p_{i,s}; \gamma_j, \delta_j),\ l_{i,1}\right]
\]

of good 1 is traded from i to j at price $p_{i,s}$ and an amount of

\[
y_{ji} = \min\left[h(p_{i,b}; \gamma_j, \delta_j),\ \frac{l_{i,2}}{p_{i,b}}\right]
\]

of good 1 is traded from j to i at price $p_{i,b}$. Note that $p_{i,s} > p_{i,b}$ implies that at most one of these two amounts is positive. On the other hand, it is possible that both amounts equal
zero and no trade occurs. The minimum operator used in these expressions ensures that no mediator sells a higher amount of a good than he has on stock. After the trade the current
holdings of the producer and the stock of the mediator are updated. The mediators try to keep their overall size of stock after consumption constant over
the periods. However, decreasing marginal utilities of both goods make it profitable for the mediators to smooth their consumption and consume equal amounts of both goods.
Thus, they consume the same aggregate amount of goods they have gained by trading in the current period but split consumption equally between the two goods in order to increase
utility. Denoting by $l_{i,g,t-1}$ the stock held at the end of period t − 1 and by $\tilde{l}_{i,g,t}$ the stock held after trading in period t we define

\[
c_i = \tilde{l}_{i,1,t} + \tilde{l}_{i,2,t} - l_{i,1,t-1} - l_{i,2,t-1}.
\]

In every period the consumption of mediator i in period t,

\[
\gamma_i = \min\left[\frac{c_i}{2},\ l_{i,1}\right], \qquad \delta_i = \min\left[\frac{c_i}{2},\ l_{i,2}\right],
\]

is subtracted from the stock.

2.4. Learning
The decisions of an agent whether he should concentrate entirely on production or invest some time in the search for trading partners and the determination of the production plan are
rather complex problems. How much time should be invested in trading and which goods should be offered depends crucially on the decisions of the other individuals. The payoff of a
certain strategy is also influenced by which potential trading partners are randomly matched with an agent. Analytically determining the optimal strategy requires the exact knowledge of all other agents’ actions in the next period and, even with this knowledge, quite complex calculations of the expected payoff of all available strategies. As pointed out above we do not assume that the agents have the information and capability to carry out these calculations but rather use a simple learning rule to determine their strategy. The proposed learning algorithm describes imitation-based adaptation of the agents’ strategies. The individuals do not build any expectations in order to optimize anticipated payoffs but just consider the past success of other individuals and try to imitate the ones with above average utility. Imitational learning rules have been analyzed in several, mainly game theoretic, contexts (e.g. Schlag 1998, Vega-Redondo 1995 or Björnerstedt and Weibull 1996) and it has been shown that in certain environments proportional imitation constitutes an optimal learning rule (Schlag, 1998). Although such a kind of adaptation underestimates the complexity of the actual decision making process of economic agents in most contexts, it can nevertheless provide interesting insights into the evolution of a population of boundedly rational agents who do not have enough information about their environment to be able to predict future developments in a sensible way or to determine optimal responses to expected future developments. In such situations the reliance on strategies which worked well in the past may indeed be rational behavior (see also Pingle 1995).
We consider a learning process where all agents review their current strategy every τ periods. Upon reviewing his strategies an agent i may decide to adopt the strategy of
another agent based on the payoff in the previous τ periods. The probability to adopt the
strategy of an agent j increases with the past payoff of j. Let $\bar{U}_j$ denote the average payoff of agent j in the previous τ periods. With

\[
\pi_j^i = \begin{cases} \bar{U}_j & j \neq i \\ w\,\bar{U}_j & j = i \end{cases}
\]

the probability that agent i adopts the strategy of agent j is given by

\[
\Pi_j^i = \frac{\pi_j^i}{\sum_{k=1}^{n} \pi_k^i}.
\]
The parameter w ≥ 1 governs the inertia of the agent by increasing the probability that
he uses his own strategy again in the next τ periods. After all agents have adopted their new strategies these strategies are disrupted by stochastic shocks. These shocks might
incorporate implementation errors of the agents but also intended innovations. With some small probability $\mu_v > 0$ an amount $\xi_v$ generated by a normal distribution $N(0, \sigma_v^2)$ is added to the variable $v \in \{sp, s\}$ of agent i. If the shock drives a variable out of [0, 1] the variable is set to 0 or 1, respectively. The variable determining the good to specialize in, $pr$, is changed from 1 to 2 or vice versa with some small probability $\mu_{pr}$. This completes one learning step and all agents use their new strategies for the next τ periods. Afterwards
another learning step takes place and so on.

In the model with mediators innovation also affects the three additional decision variables $id_i$, $p_{i,s}$ and $p_{i,b}$. The variable $id_i$ is inverted with some small probability $\mu_{id}$ and there is some probability that normally distributed noise is added to the prices. In case that innovations would lead to a violation of $p_{i,s} > p_{i,b}$ these innovations are neglected.

The learning dynamics we use may be interpreted as a stochastic version of the well known replicator dynamics (Taylor and Jonker, 1978), which was thoroughly analyzed in the biological and economic literature (e.g. Cressman, 1992; Hofbauer and Sigmund, 1998). Of
course several variations of this rule could be considered. It would be especially interesting to consider a model where agents can only observe the strategies of the individuals they
meet for trading. However, here we assume that information spreads faster than the goods and that agents can also get information about individuals they do not meet themselves.
Further, we could include memory in our model and assume that agents remember also past payoffs or payoffs of trading with certain partners in the past. These extensions might make
the model slightly more realistic. However, we stick with the simple model presented above because it seems that this model captures the general main properties of imitation learning
and thus allows qualitative insights into the dynamics.
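One learning step of the basic model (imitation followed by stochastic shocks) can be sketched as below. This is an illustrative reading of the rule, not the author's code. The representation of a strategy as a dictionary with keys `sp`, `s` and `pr`, the function names, and the fallback used when all average payoffs are zero are assumptions made for the example.

```python
import random


def adoption_probabilities(avg_payoffs, i, w):
    """Probability that agent i adopts each agent's strategy (own weight scaled by w)."""
    weights = [w * u if j == i else u for j, u in enumerate(avg_payoffs)]
    total = sum(weights)
    if total <= 0.0:
        # Assumption: with no positive payoff anywhere, agent i keeps its strategy.
        return [1.0 if j == i else 0.0 for j in range(len(avg_payoffs))]
    return [x / total for x in weights]


def learning_step(strategies, avg_payoffs, w, mu, sigma, mu_pr, rng):
    """Imitation for every agent, then independent shocks on sp, s and pr."""
    n = len(strategies)
    adopted = []
    for i in range(n):
        probs = adoption_probabilities(avg_payoffs, i, w)
        j = rng.choices(range(n), weights=probs)[0]
        adopted.append(dict(strategies[j]))   # copy, so shocks stay individual
    for strategy in adopted:
        for v in ("sp", "s"):
            if rng.random() < mu[v]:
                shocked = strategy[v] + rng.gauss(0.0, sigma[v])
                strategy[v] = min(1.0, max(0.0, shocked))   # clip to [0, 1]
        if rng.random() < mu_pr:
            strategy["pr"] = 3 - strategy["pr"]             # 1 -> 2, 2 -> 1
    return adopted
```

With average payoffs `[1.0, 2.0, 1.0]` and `w = 2`, agent 0 attaches weights 2, 2 and 1 to the three strategies, so he keeps his own strategy with probability 0.4 even though agent 1 earned twice as much.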
From a mathematical point of view the evolution of the population state can be described by a time homogeneous Markov process on a continuous state space. Due to the extremely complicated structure of the transition functions a rigorous mathematical analysis seems to be impossible and even approximation results for decreasing mutation probabilities are out of reach [2]. Thus we have to rely on simulations in order to study the dynamic behavior of the system. In the next subsection we will carry out a very loose comparative static analysis pointing out some properties of the static model with completely rational agents. The dynamic properties will be studied in the next section.

[2] Kandori and Rob (1995) derive a general theory characterizing the long run outcome of learning dynamics with noise for low levels of noise. However, this theory needs a discrete state space and also a much simpler mode of interaction than we have in our model.
2.5. Analytical considerations

Having presented the model we will now briefly consider the case that the agents in the model are completely rational. A rigorous derivation of all equilibria is quite complicated even in this static framework and we abstain from doing this here. However, these rather loose considerations give some clues about which effects a variation of the parameters might have on the simulation results. In particular we are interested in the parameter α governing the increasing returns to scale in production. First of all we would like to note that an increase of α has two basic effects in our model. The first effect is that a high α facilitates the specialization of the agents on producing only one good. This is quite obvious since specialization allows the agent to produce in a region where the marginal productivity is larger than if he produced both goods. If we consider an economy without trading this effect is of course at least partly neutralized by the fact that the marginal utility of a good also decreases with increasing consumption. Which of these effects is stronger depends on whether $\alpha > 2$ or not. The second effect of an increase of α stems from the fact that the opportunity costs of trading increase with increasing α. The producer always has to invest his most efficient production time in trading and the faster the efficiency of production increases the more costly this is. Thus, we roughly might expect the following picture. For values of α only slightly larger than one specialization of production does not pay off and accordingly there is also no trade in the economy. In such a case the average population payoff is

\[
U_{pr} = 2b\sqrt{\frac{a}{2^{\alpha}}}.
\]

For larger values of α specialization with direct trade becomes the most efficient way of
organizing the production process, but only as long as direct trade does not become too expensive for the producers. In a population where half of the agents produce good 1 and
the others produce good 2, but all agents invest a fraction of χ in trading which implies that every agent expects to meet one agent producing the other good per period the average
population payoff is [3]

\[
U_{tr} = b\sqrt{2a(1 - \chi)^{\alpha}}.
\]

[3] We make the simplifying assumption that each agent indeed trades once per period.

If α is large, it is rather expensive for the producers to invest time in trading since they have to sacrifice their most efficient production time. If no mediation exists the optimal choice of the agents in such a situation would be to specialize in the production of one good without trading. This means that agents can only consume one good, and accordingly have a utility of

\[
U_{sp} = b\sqrt{a}.
\]

It is easy to see that this expression is larger than $U_{pr}$ whenever $\alpha > 2$ and larger than $U_{tr}$ if $\alpha > \ln 0.5 / \ln(1 - \chi)$. This shows that in the model without mediators trade is only
attractive for intermediate values of α. At this stage we would like to stress that these considerations only imply that trading
would pay off for certain values of α after it has been established. Of course this does by no means imply that trade will indeed emerge in a population with boundedly rational agents.
In a state where almost all agents spend all their time for production the probability to meet a trading partner is very small for an agent who starts trading and thus his expected payoff
might be much smaller than U
tr
. However, if several agents who start trading are matched by accident and their payoff is significantly larger than that of the rest of the population they
might eventually take over the whole population and establish trade in the economy. For which values of α trade does indeed occur in the long run cannot be answered analytically
but we have to rely on simulations here.
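The three benchmark payoffs of the model without mediators can be compared numerically with a short sketch. It only evaluates the closed-form expressions given above under the paper's symmetry assumptions; the function names and the regime labels are assumptions made for the example, and the comparison says nothing about whether a regime is actually reached by the learning dynamics.

```python
import math


def payoff_no_specialization(a, b, alpha):
    """U_pr = 2 b sqrt(a / 2**alpha): time split equally, no trade."""
    return 2.0 * b * math.sqrt(a / 2.0**alpha)


def payoff_direct_trade(a, b, alpha, chi):
    """U_tr = b sqrt(2 a (1 - chi)**alpha): full specialization, one trade per period."""
    return b * math.sqrt(2.0 * a * (1.0 - chi)**alpha)


def payoff_specialization_without_trade(a, b):
    """U_sp = b sqrt(a): only one good is produced and consumed."""
    return b * math.sqrt(a)


def alpha_upper_bound(chi):
    """U_sp exceeds U_tr when alpha > ln 0.5 / ln(1 - chi), for 0 < chi < 1."""
    return math.log(0.5) / math.log(1.0 - chi)


def best_regime(a, b, alpha, chi):
    """Label of the benchmark state with the highest payoff."""
    payoffs = {
        "no specialization": payoff_no_specialization(a, b, alpha),
        "direct trade": payoff_direct_trade(a, b, alpha, chi),
        "specialization without trade": payoff_specialization_without_trade(a, b),
    }
    return max(payoffs, key=payoffs.get)
```

For χ = 0.1, for example, `alpha_upper_bound` gives roughly 6.58, so with these formulas direct trade can only be the best benchmark for values of α below that bound.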
If we take into account the possibility of mediated trade, the effect, that trading becomes more expensive the higher α is, disappears. In such a case we might expect that professional
traders emerge and do all the trading whereas the producers do not spend any time on trading at all. In order to obtain exact analytical expressions characterizing the range of α favoring
direct trade or mediated trade, respectively, we have to take into account the rather complicated matching algorithm. Accordingly these calculations become quite involved and we refrain
from carrying them out here. Again, we will gain some insights from the simulations.
Let us now consider a state where there are mediators in the market and determine which number of mediators we should expect. For the sake of simplicity we assume that a fraction of $r/2$ of the agents only produce good 1 ($x_1 = 1$, $x_2 = s = 0$), a fraction of $r/2$ produces good 2 ($x_2 = 1$, $x_1 = s = 0$) and a fraction of $1 - r$ of the agents does not produce at all but trades ($x_1 = x_2 = 0$, $s = 1$). The number of producers a trader can deal with is constrained by the fact that each producer exchanges goods only with one trader per period and by the time constraint. Denote by

\[
\mu = \min\left[\frac{r}{1 - r},\ \frac{2r}{\chi(1 + r)}\right]
\]

the expected number of trading partners per trader. For the sake of simplicity let us assume that exactly half of these partners produce good 1 and half of them produce good 2. Let us further assume that the trader buys good 1 at a price of p and sells it at a price of $1/p$ (such a behavior is optimal due to symmetry) and that he can always deliver the quantity of goods he likes to trade. Given these simplifying assumptions it is easy to see that the income per period of a trader is given by
\[
U_{med} = 2b\sqrt{\frac{\mu a\, p\,(1 - p)}{2(1 + p)}}.
\]

In order to optimize this expression the trader should choose $p = \sqrt{2} - 1$, which gives him a utility of

\[
U_{med} = \sqrt{2}\,(\sqrt{2} - 1)\, b\sqrt{\mu a}.
\]

Accordingly, an amount of $y = a(\sqrt{2} - 1)/\sqrt{2}$ of good 1 or good 2, respectively, is exchanged in any encounter between a producer and a trader. Since the probability that a producer meets a trader is given by $\mu(1 - r)/r$, the utility of the expected income of a producer per period reads
\[
U_{pr} = b\sqrt{a}\left[1 - \mu\frac{1 - r}{r} + \mu\frac{1 - r}{r}\sqrt{\frac{1}{1 + p}} + \mu\frac{1 - r}{r}\sqrt{\frac{p^2}{1 + p}}\right]
= b\sqrt{a}\left[1 - \mu\frac{1 - r}{r} + \mu\sqrt[4]{2}\,\frac{1 - r}{r}\right].
\]
In an equilibrium the utility of traders and producers must be equal. Under the assumption that the time constraint is not binding for the traders (i.e. $\mu = r/(1 - r)$) this yields the equation

\[
\sqrt{2}\,(\sqrt{2} - 1)\sqrt{\frac{r}{1 - r}} = \sqrt[4]{2}.
\]

The unique solution is $r = 1/(3(\sqrt{2} - 1)) = 0.8047$. For $\chi < 0.2164$ we have $\mu = r/(1 - r)$. The utility in this state is given by

\[
U_{med} = U_{pr} = \sqrt[4]{2}\, b\sqrt{a}.
\]

These considerations indicate that if all agents were completely rational, completely coordinated and χ is sufficiently small there would be about 20% mediators in a heterogeneous state consisting of producers and mediators. A question to be answered in the next section is whether a similar degree of organization can actually be reached by a population of boundedly rational agents and, if it can, what kind of transient behavior can be observed. Also, the effect of the parameters in the learning rule and of χ (which is neglected here) on the emergence of mediation will be studied by means of simulations.
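The equilibrium share of producers and the bound on χ quoted above can be checked numerically with the following sketch. It evaluates the trader and producer utilities under the simplifying assumptions of this subsection. The function names and the default values a = b = 1 are assumptions made for the example, and the expression for the largest admissible χ is obtained here by setting the two arguments of the minimum in µ equal.

```python
import math


def partners_per_trader(r, chi):
    """mu: expected number of trading partners per trader."""
    return min(r / (1.0 - r), 2.0 * r / (chi * (1.0 + r)))


def trader_utility(p, mu, a=1.0, b=1.0):
    """U_med for buying price p and selling price 1/p."""
    return 2.0 * b * math.sqrt(mu * a * p * (1.0 - p) / (2.0 * (1.0 + p)))


def producer_utility(p, mu, r, a=1.0, b=1.0):
    """U_pr: expected utility of a fully specialized producer."""
    meet = mu * (1.0 - r) / r   # probability of meeting a trader
    gain = math.sqrt(1.0 / (1.0 + p)) + math.sqrt(p**2 / (1.0 + p))
    return b * math.sqrt(a) * ((1.0 - meet) + meet * gain)


p_star = math.sqrt(2.0) - 1.0                      # optimal buying price
r_star = 1.0 / (3.0 * (math.sqrt(2.0) - 1.0))      # equilibrium share of producers
chi_max = 2.0 * (1.0 - r_star) / (1.0 + r_star)    # time constraint binds above this

mu_star = partners_per_trader(r_star, 0.1)
print(round(r_star, 4), round(chi_max, 4))
print(trader_utility(p_star, mu_star), producer_utility(p_star, mu_star, r_star))
```

Both utilities coincide at $\sqrt[4]{2} \approx 1.189$ for a = b = 1, and the share of mediators is $1 - r \approx 0.195$.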
3. Simulation results