
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Search With Dirichlet Priors: Estimation and
Implications for Consumer Demand
Sergei Koulayev
To cite this article: Sergei Koulayev (2013) Search With Dirichlet Priors: Estimation and
Implications for Consumer Demand, Journal of Business & Economic Statistics, 31:2, 226-239,
DOI: 10.1080/07350015.2013.764696
To link to this article: http://dx.doi.org/10.1080/07350015.2013.764696

Accepted author version posted online: 15
Jan 2013.


Search With Dirichlet Priors: Estimation
and Implications for Consumer Demand
Sergei KOULAYEV

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:06 11 January 2016

Keystone Strategy, LLC, Cambridge, MA 02140 ([email protected])
This article is an empirical application of the search model with an unknown distribution, as introduced by
Rothschild in 1974. For searchers who hold Dirichlet priors, we develop a novel characterization of optimal
search behavior. Our solution delivers easily computable formulas for the ex-ante purchase probabilities
as outcomes of search, as required by discrete-choice-based estimation. Using our method, we investigate
the consequences of consumer learning on the properties of search-generated demand. Holding search
costs constant, the search model from a known distribution predicts larger price elasticities, mainly for the
lower-priced products. We estimate a search model with Dirichlet priors, on a dataset of prices and market

shares of S&P 500 mutual funds. We find that the assumption of no uncertainty in consumer priors leads
to substantial biases in search cost estimates.
KEY WORDS: Consumer learning; Consumer search; Discrete choice models of demand.

1. INTRODUCTION

One of the insights of the theoretical literature on consumer
search is the distinction between two types of uncertainty that a
searcher may face. One is the uncertainty about current prices, or
locations, which makes search necessary. Another is the uncertainty about the underlying process that generates these prices.
According to Stigler (1961) and McCall (1970), when a consumer knows the price distribution, she stops searching as soon
as she finds a price below her reservation price. As pointed out
by various authors, this stopping rule leads to a search behavior with some unrealistic properties. For one, consumers who
know the price distribution are perpetually optimistic: they keep
searching even though they are finding prices that are higher
than the reservation price. Another is the no-return property: the
consumer always buys from the seller visited last. As shown
by de los Santos, Hortacsu, and Wildenbeest (2012a), there are

plenty of return patterns in the real-world search data. Finally,
one cannot expect consumers to know factors that determine the
distribution of prices. Rothschild (1974) was the first to study
optimal search under both types of uncertainty. In his model,
a searcher holds Dirichlet priors about unknown price distribution, and updates them in a Bayesian way as new price quotes
arrive. Instead of a single reservation price, a search is governed
by a sequence of increasing reservation prices, as if consumers
become less demanding with time.
This article is the first to bring the search model with Dirichlet
priors to empirical work. Existing studies of consumer search
have relied almost exclusively on the search model from a known
distribution: for example, Hong and Shum (2006) and Moraga-Gonzalez, Sandor, and Wildenbeest (2009). The underlying assumption has been that a search model where beliefs have zero
prior uncertainty delivers a reasonably good approximation of
search behavior. We are interested in the implications of deviations from this assumption for estimates of search costs and for
price sensitivity of the search-generated demand.
Within the framework of search with Dirichlet priors, we
develop a novel characterization of optimal search that leads
to closed form, easily computable probabilities of purchase of

individual products. This allows us to recover search costs using

either individual purchase data or aggregate market shares. As
such, it is a useful input for any market-level analysis where
consumers search before they buy: studying a firm’s pricing
decisions, investment in new products, advertising, and so on.
In the search context, the main difficulty in computing purchase probabilities is integrating out unobserved search histories
or sequences of price quotes received by the searcher. We show
that a direct approach to integration, based on the reservation
price characterization, suffers from the curse of dimensionality: the cost of numerical integration quickly increases with the
number of products. Our approach is based on an observation
that a full search history contains more information than what is
necessary to compute the probability of purchase. In fact, with
Dirichlet priors, only two pieces of information characterize a
search: the identity of the second-best product among the discovered set, and the length of search preceding the discovery of
the best product. All potential search histories are classified by
these two variables, breaking down the curse of dimensionality.
Using our theoretical results, we show that besides being
more realistic the learning process changes consumers’ price
responses in a qualitative way. When consumers search, they
buy a product in one of two ways: either right away (“fresh
demand”), or by returning at a later stage (“returning demand”).

In a search with a known distribution, all demand is “fresh”:
consumers buy the product either right away, or never. Higher
price reduces demand by motivating some of these consumers to
continue searching, to never return. In the model with learning, a
substantial part of demand is “returning.” Such buyers are necessarily more pessimistic than the “fresh” ones: they buy because
they failed to find a better deal. Since these searchers find potential improvements unlikely, their expected benefit of search
is also less sensitive to price increases. As a result, the learning mechanism leads to a less price-sensitive demand overall.


© 2013 American Statistical Association
Journal of Business & Economic Statistics
April 2013, Vol. 31, No. 2
DOI: 10.1080/07350015.2013.764696


Although it is difficult to generalize this argument for an arbitrary search cost distribution, this intuition does help explain

some of our empirical findings.
We then investigate the empirical relevance of consumer
learning. First, we conduct a number of Monte Carlo exercises.
In these experiments, we recover parameters of search cost distribution by fitting a search model from a known distribution
on a simulated dataset of prices and market shares generated by
consumers who learn while searching. We find that the model
without learning consistently overestimates the median search
cost. A possible explanation is that, since the model without
learning delivers more elastic demand, the optimization routine attempts to compensate for it by choosing higher values of
search costs to explain the observed price–quantity relationship.
The bias increases quickly with variance of the consumer prior:
that is, even a small deviation from the assumption of known
distribution is consequential.
Next, we apply our model to a real-world dataset of prices
(management fees) and market shares of S&P 500 mutual funds,
from 1995 to 2000. These data were previously used by Hortacsu
and Syverson (2004) to recover search costs from a search model
with a known distribution. The results are generally similar to
our findings in Monte Carlo experiments: holding the mean
prior constant, a moderate degree of prior uncertainty results

in substantial biases in estimates of search cost parameters and,
consequently, the price elasticity of demand for funds. Similar to
Monte Carlo results, the search model with a known distribution
overestimates the median search cost, but underestimates the
variance. The bias in price elasticity can be either positive or
negative, depending on product rank.
Although we believe that the model with learning is more
realistic, using data on prices and market shares only is not
sufficient to distinguish between the two approaches to search.
Specifically, we find that the level of fit is essentially the same
for different values of prior variance. The explanation is that
a sufficiently flexible specification of search costs is capable
of explaining the market shares data, without the need for additional degrees of freedom coming from unknown priors. To
detect learning requires more detailed search data, as we discuss
in the concluding section of this article.
In the existing literature, the first derivation of ex-ante distributions of outcomes of search, when consumers are learning, is
due to Morgan (1985). Building on Kohn and Shavell (1974),
Morgan proposed a method to compute reservation values prior
to search and used them to find distributions of the length of
search and of the final purchase. Unfortunately, his method is

computationally intensive, as it involves solving a number of dynamic problems. Also, the applicable class of beliefs does not
include the Dirichlet distribution. Bikhchandani and Sharma
(1996) demonstrated that for Dirichlet priors the reservation
prices can also be computed prior to search and they conjectured that it should be possible to compute ex-ante distributions
of search outcomes but did not derive probabilities of purchase.
In a related article, de los Santos, Hortacsu, and Wildenbeest
(2012b) also considered search with Dirichlet priors and developed a method of recovering search costs through inequality conditions implied by the observed search decisions. Their
method requires data on price histories observed by searchers,
while ours relies only on market share data.


The literature has also explored other forms of consumer
learning, often within the search context. Hendricks, Sorensen,
and Wiseman (2012) developed a model of herd behavior, where
consumers learn from experience of others, and examined inefficiencies resulting from the herd behavior. In contrast, in
our model, consumers learn only from their own experience.
Ackerberg (2003) examined consumer learning in the context
of search for experience goods, when consumers are aware of
the set of brands (varieties), but have incomplete information

about their quality. Crawford and Shum (2005) modeled doctors’ learning about the effectiveness of a new drug through
experimentation. A more recent article in the healthcare context
is Chernew, Gowrisankaran, and Scanlon (2008). Similar to our
study, they employed Beta priors (a special case of Dirichlet
distribution for two points of support) to model the evolution of
consumer beliefs. In their model, consumers received additional
signals about the quality of a health plan, which were shown to
have substantial impact on health plan choices.
2. THE SEARCH MODEL

2.1 Basic Setup

The following model was originally formulated by Rothschild
(1974). A consumer is looking to buy a certain product. Suppose
at the time of search N varieties are available on the market:
\[
S_N = \{u_1, u_2, \ldots, u_N\},
\]
where utilities are sorted in decreasing order, that is, u1 is the

first-best, u2 is the second-best, and so on. Therefore, the utility
index r = 1 . . . N also stands for the rank of the product. For
clarity, we assume that all varieties have distinct utilities (a solution for the case of duplicate utilities is available upon request).
As the analysis will proceed from the point of view of a single
consumer, we suppress any consumer-specific indices until they
become relevant.
A consumer who wishes to make a purchase in this market
faces two types of uncertainty. First, she does not know the
locations of varieties, which necessitates search. She collects
information through a sequence of costly search attempts. The
search technology returns a particular ur ∈ S with probability
ρr , independently from previous draws. The second type of
uncertainty is that the consumer does not know the actual sampling probabilities. Instead, she believes that any search attempt
would reveal a utility that belongs to a set of potential utility
levels:
\[
S_N \subseteq S_G = \{\tilde{u}_1, \tilde{u}_2, \ldots, \tilde{u}_G\}.
\]
In the original article, Rothschild (1974) assumed SN = SG ,
which corresponds to the case when the consumer knows the
set of actual utilities but does not know their locations. Our
formulation is more general as it allows for uncertainty about

the support of the distribution. In other words, a consumer does
not know the probability of sampling a particular price level,
including whether this probability is positive.
Because prices are discrete, the assumption SN ⊆ SG fits naturally the context of search for best price; however, it is restrictive for the general case of continuously distributed utilities. In


that case, we essentially assume that consumers are indifferent
between products that have sufficiently similar utilities.
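The actual probability $p_g$ of sampling $\tilde{u}_g \in S_G$ is determined by the search technology and the availability of products: $p_g = \rho_{r(g)}$ for $u_{r(g)} \in S_N$, and $p_g = 0$ for $u_{r(g)} \notin S_N$. The consumer perceives the unknown vector $(p_1, \ldots, p_G)$ as a random variable, $(\tilde{p}_1, \ldots, \tilde{p}_G)$, with a Dirichlet distribution. Its density is
\[
f(\tilde{p}_1, \ldots, \tilde{p}_G) = \frac{\Gamma\left(\sum_g \alpha_g\right)}{\prod_g \Gamma(\alpha_g)} \prod_g \tilde{p}_g^{\,\alpha_g - 1}, \qquad \sum_g \tilde{p}_g = 1, \quad \tilde{p}_g > 0.
\]
The mean prior belief is
\[
E(\tilde{p}_1, \ldots, \tilde{p}_G) = \left(\frac{\alpha_1}{N_0}, \ldots, \frac{\alpha_G}{N_0}\right), \qquad N_0 = \sum_g \alpha_g, \quad \alpha_g > 0. \tag{1}
\]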

The parameter N0 is inversely related to the diffusion of the
prior: the larger it is, the lower is the informational content
of new observations. The ratios $\alpha_g / N_0$ represent relative likelihoods of elements of $S_G$. As more utilities are sampled, the consumer updates her beliefs in a Bayesian fashion. Let $N^t = (n_1^t, \ldots, n_G^t)$ represent the number of times each utility level has been sampled during search. Standard results imply (see, e.g., Rothschild 1974) that the posterior distribution is also Dirichlet, with posterior mean:
\[
E(\tilde{p}_1, \ldots, \tilde{p}_G \mid N^t) = \left(\frac{\alpha_1 + n_1^t}{N_0 + \sum_g n_g^t}, \ldots, \frac{\alpha_G + n_G^t}{N_0 + \sum_g n_g^t}\right). \tag{2}
\]
According to this updating rule, the relative likelihood of observing a utility level $u_g \in S_G$ increases every time it is sampled. As the searcher's information set grows, the posterior mean belief converges to the actual sampling technology: $E(\tilde{p}_g) \to \rho_g$ for $u_g \in S_N$ and $E(\tilde{p}_g) \to 0$ for $u_g \notin S_N$.
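As an illustration of updating rule (2), the following minimal Python sketch computes the posterior mean for a small prior. The function name `posterior_mean` and the numerical values are assumptions made for this example only; they do not come from the article.

```python
from fractions import Fraction


def posterior_mean(alpha, counts):
    """Posterior mean of a Dirichlet prior after `counts[g]` draws of level g."""
    n0 = sum(alpha)   # N_0, the prior "sample size"
    t = sum(counts)   # number of searches made so far
    return [Fraction(a + n, n0 + t) for a, n in zip(alpha, counts)]


alpha = [1, 1, 1]                          # uniform prior over three utility levels
print(posterior_mean(alpha, [0, 0, 0]))    # prior mean: 1/3 each
print(posterior_mean(alpha, [0, 0, 2]))    # third level sampled twice: 1/5, 1/5, 3/5
```

In the second call the weight on the sampled level rises, while the two unsampled levels keep equal weights relative to each other.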
For all their advantages—flexibility, simplicity, and
conjugacy—Dirichlet priors also have a well-known drawback:
the updating is local. Sampling a certain utility level affects beliefs about higher utilities in a uniform fashion, without changing
their relative likelihoods. In a more realistic setup, one would
expect that sampling some utility level will also raise the relative weight of its neighbors. For example, a prior based on a
mixture of normal distributions has this property. However, the
normal prior has strong restrictions on the shape of its tails,
while the Dirichlet distribution has no such restrictions.
2.2 The Optimal Search Decision
A rational consumer continues searching as long as the expected benefit of a search attempt remains above the search cost. After $t$ searches are made, the search history is a sequence of utility levels drawn at every step: $h^t = \{u^1, u^2, \ldots, u^t\}$. Alternatively, it can be written as a sequence of indices of elements of $S_N$ that have been sampled: $h^t = \{r^1, r^2, \ldots, r^t\}$. We adopt the latter representation. Since $S_N$ is ordered, we can define $r^*(h) = \min\{r^1, r^2, \ldots, r^t\}$, the index of the best product found during search.
As shown by Rothschild (1974), a consumer with Dirichlet priors will continue searching after $t$ attempts if and only if the following inequality holds (where $c$ stands for search cost):
\[
E\max(\tilde{u}, u_{r^*}) - u_{r^*} = \sum_{\tilde{u}_g > u_{r^*}} (\tilde{u}_g - u_{r^*})\, E[\tilde{p}_g \mid N^t] > c. \tag{3}
\]

With updating rule (2), the left side of the inequality decreases
with t, so the search process is finite. Once the inequality is
reversed, the consumer stops and buys the best product out of
those discovered during search. As search results are random,
the identity of the purchased product is also random. Our goal is
to derive the ex-ante (prior to search) probabilities of purchase
for a given product, conditional on preferences and search cost.
Notably, not every sequence of utility draws can
arise as a search history. This is because condition (3) must be
satisfied recursively, that is, the consumer must optimally decide
to continue searching at every step except the last one. To emphasize this property, we introduce the definition of “feasible”
search histories:
Definition 1. A search history $h^t = \{r^1, r^2, \ldots, r^t\}$ is feasible if it is optimal to continue searching after every subhistory, $h^l = \{r^1, \ldots, r^l\}$, $l < t$.
Let $H_r$ denote the set of feasible search histories after which the consumer optimally decides to stop and buy $u_r \in S_N$. Theoretically, the ex-ante purchase probability of product $r$ can be obtained by summing up probabilities of all histories in $H_r$:
\[
P_r = \sum_{h \in H_r} P(h). \tag{4}
\]

2.3 Calculation of Purchase Probabilities by Direct Integration
When the number of products is small, formula (4) can be computed directly. However, as the number of products increases, calculations quickly become infeasible. Indeed, the summation in Equation (4) is subject to a curse of dimensionality. For a given search cost $c$, let $T(c)$ be the maximal search length (as shown by Rothschild 1974, it is finite, but unbounded as search cost reduces to zero). Because the search is with replacement, the number of potential search histories is on the order of $N^T$, where $N$ is the number of products. As a result, the computational time involved in summation (4) grows exponentially.
We attempted to compute Equation (4) for $T = 10$ and $N = 2, 3, \ldots, 10$. The set of product utilities and their sampling probabilities was taken from the actual data used in our empirical application. We constructed an $N^T \times T$ matrix of all possible search histories, and for every one we recursively applied formula (3) to determine the optimal stopping decision and the purchased product. Computations were conducted in Matlab, on a powerful computer cluster, vectorized for speed whenever possible. With $T = 10$ and $N = 8$, it takes 10 min to evaluate the summation, 30 min for $N = 9$, and 10 hr for $N = 12$. In our empirical application, we have 24 products per market: by extrapolation, we find it would take more than a year to compute a purchase probability once.
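To make the direct approach concrete, the sketch below enumerates feasible histories recursively for a tiny market and sums their probabilities as in Equation (4). It is not the Matlab implementation described above: it is a depth-first Python illustration that assumes $S_G = S_N$ (so `alpha` has one entry per actual product), assumes the consumer always makes at least one search, and uses hypothetical names (`expected_benefit`, `purchase_probs_bruteforce`) and illustrative numbers.

```python
def expected_benefit(u, alpha, best, t):
    """Left side of inequality (3) after t draws with best index `best` (0 = top).

    Levels better than the current best were never sampled, so their
    posterior mean is alpha_g / (N_0 + t), as in updating rule (2).
    """
    n0 = sum(alpha)
    return sum((u[g] - u[best]) * alpha[g] for g in range(best)) / (n0 + t)


def purchase_probs_bruteforce(u, rho, alpha, c):
    """Sum P(h) over all feasible histories, as in Equation (4)."""
    n = len(u)
    probs = [0.0] * n

    def visit(best, t, p):
        # Stop once the expected benefit no longer exceeds the search cost.
        if t > 0 and expected_benefit(u, alpha, best, t) <= c:
            probs[best] += p
            return
        for r in range(n):
            new_best = r if t == 0 else min(best, r)
            visit(new_best, t + 1, p * rho[r])

    visit(0, 0, 1.0)
    return probs


u = [0.0, -1.0, -2.0]          # utilities, best first
rho = [1 / 3, 1 / 3, 1 / 3]    # true sampling probabilities
alpha = [1, 1, 1]              # prior weights, so N_0 = 3
probs = purchase_probs_bruteforce(u, rho, alpha, c=0.7)
print(probs)                   # approximately [4/9, 4/9, 1/9]
```

The recursion branches over every product at every step, so its running time grows with the number of histories, which is the curse of dimensionality discussed above.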
Another issue with direct application of Equation (4) is that it
requires numerical integration over the unobserved search cost.


With 100 draws from the underlying distribution, it will take
10 × 100 = 1,000 min for a single evaluation of the objective
function, in the case where T = 10 and N = 8. In contrast, the
method we develop next allows for analytical integration over
search cost, which is both faster and more precise.


2.4 New Characterization of Search
Our approach to this problem is to partition the set Hr into
a number of categories, whose probabilities are still simple to
compute. We achieve this by removing information about search
histories that is not necessary for computing the probability of
purchase. Indeed, consider the expected benefit of search after a history with length $t$ and with the best product $r^*$. Using (2), it is obtained as
\[
E\max(\tilde{u}, u_{r^*}) - u_{r^*} = \sum_{\tilde{u}_g > u_{r^*}} (\tilde{u}_g - u_{r^*})\, E[\tilde{p}_g \mid N^t] = \sum_{\tilde{u}_g > u_{r^*}} (\tilde{u}_g - u_{r^*}) \frac{\alpha_g}{N_0 + t}. \tag{5}
\]
In the previous formula, we used the fact that since utilities $\tilde{u}_g > u_{r^*}$ are not sampled, the corresponding $n_g^t$ are all zeros. Equating the expected benefit to search cost, we obtain the indifference condition
\[
\sum_{\tilde{u}_g > u_{r^*}} (\tilde{u}_g - u_{r^*}) \frac{\alpha_g}{N_0 + t} = c. \tag{6}
\]
Solving for $t$ and taking into account integer constraints,
\[
\bar{k}_{r^*} = \max\left(1, \left[\frac{1}{c} \sum_{\tilde{u}_g > u_{r^*}} (\tilde{u}_g - u_{r^*})\, \alpha_g - N_0\right]_+\right), \tag{7}
\]
where $[x]_+$ is an upward rounding of $x$. This leads to an alternative optimal stopping rule:
Lemma 1. The continuation decision of a consumer with Dirichlet priors after a search history $h^t$ depends on two parameters: $t$ is the length of the search history, and $r^*$ is the index of the best utility sampled during search. Given these parameters, the consumer continues searching if and only if
\[
t < \bar{k}_{r^*}. \tag{8}
\]
Proof. Consider Equation (6). The left side, the expected benefit of search after $t$ attempts, is decreasing in $t$. If $t > \bar{k}_{r^*}$, the benefit of search falls below search cost, and the search stops. The same is true for $t = \bar{k}_{r^*}$, because $\bar{k}_{r^*}$ is obtained by upward rounding of the solution to Equation (6).

Therefore, the length of the search history t and the index
of the best product r ∗ are sufficient to determine the optimal
continuation decision. According to Lemma 1, when a searcher
with Dirichlet priors finds a better product, in her continuation decisions she “forgets” the previously found products and
“remembers” only the length of prior search.
Computing Equation (7) for $r = 1, \ldots, N$, we obtain the vector of sufficient statistics that completely summarizes search behavior:
\[
\bar{k} = \{\bar{k}_1, \bar{k}_2, \ldots, \bar{k}_N\}. \tag{9}
\]
The advantage of this characterization is that the length of the $\bar{k}$ vector is fixed, as opposed to the sequence of reservation utilities, whose length is unbounded. A further observation from Equation (7) follows:
Lemma 2. A consumer with Dirichlet priors will search longer if inferior products are sampled. Formally, let $h_1$ and $h_2$ be two histories of the same length $t$, with best-product indices $r^*$ and $r^{**}$, respectively, such that $h_1$ has an inferior best product: $r^* > r^{**}$. Then, if the consumer decides to continue searching after $h_2$, she necessarily continues after $h_1$, too.
Proof. According to Equation (8), the consumer continues after $h_2$ iff $t < \bar{k}_{r^{**}}$. From Equation (7), $\bar{k}_r$ is decreasing in $u_r$, or, alternatively, increasing in $r$. Therefore, $\bar{k}_{r^*} \geq \bar{k}_{r^{**}} > t$ and the search continues after $h_1$ as well.

During a search with an unknown distribution, sampling a
low-utility product has two opposite effects on the incentives to
search: it makes sampling higher utilities less probable, discouraging search; and it decreases the status quo, promising larger
potential improvements. As Lemma 2 finds, a consumer will
search longer if inferior products are sampled; therefore, the
second effect dominates.
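The sufficient statistics of Equation (7) and the stopping rule of Lemma 1 can be sketched in a few lines of Python. This is an illustration under the assumption $S_G = S_N$, with utilities sorted from best to worst; the function names `k_bar` and `continues` and the numerical values are hypothetical.

```python
import math


def k_bar(u, alpha, c):
    """Vector (9) of sufficient statistics from Equation (7); u is sorted best first."""
    n0 = sum(alpha)
    out = []
    for r in range(len(u)):
        # Prior-weighted gain over all levels strictly better than u[r].
        gain = sum((u[g] - u[r]) * alpha[g] for g in range(r))
        out.append(max(1, math.ceil(gain / c - n0)))
    return out


def continues(u, alpha, c, best, t):
    """Direct check of inequality (3) after t >= 1 draws with best index `best`."""
    gain = sum((u[g] - u[best]) * alpha[g] for g in range(best))
    return gain / (sum(alpha) + t) > c


u, alpha, c = [0.0, -1.0, -2.0], [1, 1, 1], 0.125
k = k_bar(u, alpha, c)
print(k)  # [1, 5, 21]
# Lemma 1: the rule t < k_bar reproduces inequality (3) for every t >= 1.
agree = all(continues(u, alpha, c, r, t) == (t < k[r])
            for r in range(len(u)) for t in range(1, 30))
print(agree)  # True
```

The vector is nondecreasing in rank, which is the content of Lemma 2: the worse the best product found so far, the longer the consumer is willing to keep searching.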
2.5 Probability of Purchase

Reversing condition (8), we can see that all histories leading to a purchase of product $r^*$ must satisfy $t \geq \bar{k}_{r^*}$. However, not all sequences of utility draws that correspond to the same pair $(t, r^*)$ are feasible. Additional information about a search history is necessary to determine its feasibility.
Lemma 3. A search history $h^t = \{r^1, r^2, \ldots, r^t\}$ is feasible if and only if it is optimal to continue searching after the penultimate subhistory, $h^{t-1} = \{r^1, r^2, \ldots, r^{t-1}\}$.
Proof. Denote by $r^*_{t-1}$ the best product found during $h^{t-1}$. Assume first that it is optimal to continue searching after $h^{t-1}$. Lemma 1 implies that $t - 1 < \bar{k}_{r^*_{t-1}}$. Consider any subhistory $h^l = \{r^1, \ldots, r^l\}$, $l < t - 1$. Since longer search can only improve on the best product found, we have $r^*_{t-1} \leq r^*_l$. This implies that $\bar{k}_{r^*_{t-1}} \leq \bar{k}_{r^*_l}$ (see Lemma 2). Therefore, it is optimal to continue searching after $h^l$: $l < t - 1 < \bar{k}_{r^*_{t-1}} \leq \bar{k}_{r^*_l}$. The converse is true, because if a history $h^t$ is feasible, then by definition it is optimal to continue searching after every subhistory, including $h^{t-1}$.

It follows that, in addition to the search length and the index of the best product, we need to know $r^*_{t-1}$, which is the index of the product that prevailed on the penultimate subhistory, $h^{t-1}$. Together, inequalities $t \geq \bar{k}_{r^*}$ and $t - 1 < \bar{k}_{r^*_{t-1}}$ ensure that a search history is feasible and leads to a purchase of product $r^*$.
Further, observe that any search history leading to product $r^*$ can be divided into three stages. First, there is an initial stage where inferior products are sampled. Denote by $t_0$ the length of that stage, and by $r^{**}$ the index of the best product found during that stage, which is also the second-best for the entire search history. Then, the product $r^*$ is sampled at the period $t_0 + 1$. Finally, during the third stage, products indexed $r \geq r^*$ are sampled until the search is stopped. If the first stage is empty, that is, the search started by sampling the best product, we have $r^*_{t-1} = r^*$; if not, then $r^*_{t-1} = r^{**}$. Therefore, the feasibility of a search history can alternatively be stated as a condition on the second-best product. This intuition is formalized in the following proposition.

Proposition 1. For a consumer with Dirichlet priors, a search history $h^t = \{r^1, r^2, \ldots, r^t\}$ is feasible and results in a purchase of product $r^*$ if and only if it satisfies the following conditions:
\[
\begin{aligned}
r^* &= \min\{r^1, r^2, \ldots, r^t\}, \\
t_0 &< \bar{k}_{r^{**}}, \\
t &= \max\{t_0 + 1, \bar{k}_{r^*}\},
\end{aligned}
\tag{10}
\]
where $t_0$ is the length of the search history before $r^*$ is discovered; $r^{**}$ is the index of the second-best product; and $t$ is the length of the search history.
Proof. Together with $r = r^*$, the condition $t = \max\{t_0 + 1, \bar{k}_r\}$ implies $t \geq \bar{k}_{r^*}$, which is a necessary and sufficient condition for optimal stopping, by Lemma 1. Therefore, we only need to establish feasibility. Assume first that for a history $h^t$ the system (10) is satisfied. Suppose then $t = t_0 + 1$, that is, the search stopped as soon as the product $r$ was sampled (so the third stage is empty). Inequality $t_0 < \bar{k}_{r^{**}}$ implies that, according to Lemma 1, it is optimal to continue after the penultimate subhistory, $h^{t_0} = h^{t-1}$. By Lemma 3, $h^t$ is feasible. Suppose alternatively that $t = \bar{k}_r > t_0 + 1$. Since $r = r^*$ and $t - 1 < \bar{k}_r$, such a history is feasible too. Conversely, assume that a history is feasible. The condition $t_0 < \bar{k}_{r^{**}}$ holds because by definition it is optimal to continue searching under every subhistory, including $h^{t_0}$. If $t_0 + 1 < \bar{k}_r$, the search continues until $t = \bar{k}_r$; if $t_0 + 1 \geq \bar{k}_r$, the search optimally stops as soon as product $r$ is sampled; generalizing both cases, we obtain $t = \max\{t_0 + 1, \bar{k}_r\}$.

Following this result, all histories that belong to $H_r$ can be divided into exactly $\bar{k}_N$ classes, indexed by length of the first stage, $t_0 = 0, 1, \ldots, \bar{k}_N - 1$. The identities of goods that are sampled during that stage do not matter, as long as only products inferior to $r^*$ are sampled and the search optimally continues after that. The probability measure of $H_r$ becomes straightforward to compute.
Proposition 2. In a search process with Dirichlet priors, the ex-ante probability of purchasing a good ranked $r = r^*$ is found as
\[
P(r^* \mid \bar{k}_1, \ldots, \bar{k}_N) =
\begin{cases}
\displaystyle\sum_{t_0 = 0}^{\bar{k}_N - 1} (\rho_{\bar{r}(r^*, t_0)} + \cdots + \rho_N)^{t_0} \times \rho_{r^*} (\rho_{r^*} + \cdots + \rho_N)^{\max[0,\, \bar{k}_{r^*} - t_0 - 1]}, & r^* < N, \\[2ex]
\rho_N^{\bar{k}_N}, & r^* = N,
\end{cases}
\tag{11}
\]
where $\bar{r}(r^*, t_0) = \min\{r : r > r^*,\ t_0 < \bar{k}_r\}$.
Proof. It follows from Proposition 1 that the union of all $t_0$ classes is equal to $H_r$. Given $t_0$, the set of products sampled during $h^{t_0}$ must satisfy two criteria: they must be inferior to product $r^*$; the best among them, $r^{**}$, must satisfy $t_0 < \bar{k}_{r^{**}}$. Since $\bar{k}_r$ is nondecreasing in $r$, the latter inequality implies $t_0 < \bar{k}_r$ for any $r \geq r^{**}$. Combining these conditions, we obtain that products $r \geq \bar{r}(r^*, t_0)$ can be sampled during the first stage. The probability of this happening exactly $t_0$ times is $(\rho_{\bar{r}(r^*, t_0)} + \cdots + \rho_N)^{t_0}$. On the second stage, good $r^*$ is sampled with probability $\rho_{r^*}$. After that, the searcher must continue sampling goods $r^*, \ldots, N$ exactly $\max(0, \bar{k}_{r^*} - t_0 - 1)$ times. A special provision is needed for the case of $r^* = N$, because the first stage is empty.

So far, we have assumed that all utilities in $S_N$ are distinct. A result analogous to Equation (11) for the case of duplicate utilities is available from the author upon request.

2.6 Rationality of Beliefs

In the search model from known distribution, rationality of beliefs typically amounts to an assumption that consumers know the empirical distribution of utilities. In the model with learning, the implications of the rationality requirement are less clear. Here, we will assume that while consumers are uncertain about the actual sampling probabilities, their mean prior is correct:
\[
E[\tilde{p}_g] =
\begin{cases}
\rho_g, & \text{if } u_g \in S_N, \\
0, & \text{if } u_g \notin S_N.
\end{cases}
\tag{12}
\]
In particular, the rationality requirement implies that the set of potential utilities, $S_G$, coincides with the set of actual ones, $S_N$, which is a restrictive assumption, as we argued previously. In our empirical application, we will additionally compute a specification where this assumption is relaxed. Conceptually, the case $S_G \neq S_N$ only affects the values of sufficient statistics, $\bar{k}_1, \bar{k}_2, \ldots, \bar{k}_N$ (see Equation (7)). Conditional on this vector, the optimal stopping decisions and probabilities depend only on the set of actual utilities, $S_N$. By introducing nonexistent utilities into the consumer prior, we effectively increase the variety of potential search results, thus inducing consumers to search more than they would otherwise.

3. MARKET SHARES

3.1 Model With Learning

The market share of an individual product will be an aggregate result of searches performed by many consumers. These consumers are endowed with a search cost as a draw from the population level CDF of search cost distribution: $c_i \sim F(c)$. In addition, utilities of available products, $S_N = \{u_{1i}, u_{2i}, \ldots, u_{Ni}\}$, may differ across consumers. The market share of a product $j$, that is, the purchase probability of a random consumer, is computed by integrating out these unobservables:
\[
Q_j = \sum_{r=1}^{N} \int P_{r(j)}(u_{1i}, \ldots, u_{Ni}; c_i)\, dF(c_i)\, dG(u_{1i}, \ldots, u_{Ni}). \tag{13}
\]

In this expression, $P_{r(j)}(u_{1i}, \ldots, u_{Ni}; c_i)$ is the probability of purchasing product $j$ by consumer $i$ where the product's rank is $r$. This expression was derived in Equation (11).
In applications, we assume search for the best price, that is, $u_j = -p_j$. In this case, the ranking of utilities is observable (e.g., $j = r$), and consumers differ only by search cost. The market share function becomes
\[
Q_r(p_1, \ldots, p_N) = \int P_r(p_1, \ldots, p_N; c_i)\, dF(c_i) = \int P_r(\bar{k}_1(c_i, p_1), \ldots, \bar{k}_N(c_i, p_N))\, dF(c_i). \tag{14}
\]

In the second equation, the functions $\bar{k}_j(c, p_j)$ are integer-valued and decreasing in the search cost. Therefore, there is a set of search cost intervals such that the entire vector $\bar{k}_2, \ldots, \bar{k}_N$ is constant within each interval. Since the purchase probability $P_r$ depends only on this vector, it is also constant within such intervals. We use this insight to integrate out unobserved search costs analytically.
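Because the purchase probability depends on the search cost only through the vector $\bar{k}$, it can be evaluated as a plain function of that vector. The following Python sketch transcribes Equation (11) under the convention that an empty first stage ($t_0 = 0$) contributes a factor of one and that a class with no admissible inferior product contributes zero. The function name `purchase_prob` and the numerical inputs are assumptions for illustration, not part of the article.

```python
def purchase_prob(rank, k, rho):
    """Equation (11): ex-ante probability of buying the product ranked `rank` (1 = best).

    k   -- nondecreasing list of sufficient statistics (k_bar_1, ..., k_bar_N)
    rho -- true sampling probabilities (rho_1, ..., rho_N)
    """
    n = len(rho)
    if rank == n:
        # The worst product is bought only if it is drawn k_bar_N times in a row.
        return rho[n - 1] ** k[n - 1]

    def tail(r):
        return sum(rho[r - 1:])  # rho_r + ... + rho_N

    total = 0.0
    for t0 in range(k[n - 1]):   # t0 = 0, ..., k_bar_N - 1
        if t0 == 0:
            first_stage = 1.0    # empty first stage
        else:
            # Inferior products after which search would still continue at t0.
            admissible = [r for r in range(rank + 1, n + 1) if t0 < k[r - 1]]
            if not admissible:
                continue
            first_stage = tail(min(admissible)) ** t0
        third_stage = tail(rank) ** max(0, k[rank - 1] - t0 - 1)
        total += first_stage * rho[rank - 1] * third_stage
    return total


rho = [1 / 3, 1 / 3, 1 / 3]
k = [1, 2, 3]
vals = [purchase_prob(r, k, rho) for r in (1, 2, 3)]
print(vals)  # approximately [16/27, 10/27, 1/27]
```

The loop runs over at most $\bar{k}_N$ classes per product, in contrast to the enumeration of all search histories required by a direct evaluation of Equation (4).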
Define a matrix of search cost cutoffs:
1 N
αg (pr − pg )
c¯kr =
g=r
k+N
r = 2, . . . , N, k = 1, . . . , K, (15)
which are solutions to equations of type $\bar{k}_r(c, p_r) = k$. Full
support for search costs, $c \in (0, +\infty)$, guarantees that these solutions exist.
Although $\bar{k}_i$ can be arbitrarily large for small enough values
of search cost, in practice an upper bound $\bar{k}_i < K$ gives an
adequate approximation. (Implicitly, this defines a small interval
of search costs, $[0, c_{\min}(K)]$, over which we assume a constant
probability of purchase.)
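As a rough illustration of how the cutoff matrix can be tabulated, the following Python sketch uses the three-price example of Section 3.3. It assumes that the cutoff solving $\bar{k}_r(c, p_r) = k$ equals the prior-weighted sum of price differences with cheaper products, divided by $k + N_0$, which is what inverting formula (7) gives. The function name `cutoff_matrix` and the truncation value passed as `K` are illustrative choices, not part of the article.

```python
from fractions import Fraction

def cutoff_matrix(prices, alphas, n0, K):
    """Return {(k, r): cutoff} for ranks r = 2..N and k = 1..K.

    Assumes prices are sorted in increasing order, so rank 1 is the cheapest.
    """
    cutoffs = {}
    for r in range(2, len(prices) + 1):
        # Prior-weighted gain from finding any cheaper product g < r.
        gain = sum(alphas[g] * (prices[r - 1] - prices[g]) for g in range(r - 1))
        for k in range(1, K + 1):
            cutoffs[(k, r)] = Fraction(gain) / (k + n0)
    return cutoffs

# Three prices 0, 1, 2 with a uniform prior (alpha_g = 1, N0 = 3).
cuts = cutoff_matrix(prices=[0, 1, 2], alphas=[1, 1, 1], n0=3, K=4)

# The sorted distinct cutoffs are the endpoints of the intervals on which
# the whole vector of sufficient statistics is constant.
grid = sorted(set(cuts.values()))
print(grid)
```

Exact rational arithmetic is used here only so that coinciding cutoffs are detected reliably; floating-point values would serve equally well in an estimation routine.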
Sorting the set of cutoffs {c¯kr } in an increasing order, we obtain a set of intervals [c˜s , c˜s+1 ), s = 1, . . . , Ns , such that the
probability of purchase is constant within each. Specifically, for
any c ∈ [c˜s , c˜s+1 ), we have k¯r (c, pr ) = k¯rs , r = 2, . . . , N , which
implies a purchase probability Prs = Pr (1, k¯2s , . . . , k¯Ns ). Returning to Equation (14), we obtain the market share of product r
as:
$$
\begin{aligned}
Q_r(p_1,\dots,p_N) &= \int P_r\big(\bar{k}_1(c_i,p_1),\dots,\bar{k}_N(c_i,p_N)\big)\,dF(c_i) \\
&= \sum_s \int_{\tilde{c}_s}^{\tilde{c}_{s+1}} P_r\big(\bar{k}_1(c_i,p_1),\dots,\bar{k}_N(c_i,p_N)\big)\,dF(c_i) \\
&= \sum_s \big(F(\tilde{c}_{s+1}) - F(\tilde{c}_s)\big)\,P_{rs}, \qquad r = 1,\dots,N. \qquad (16)
\end{aligned}
$$

3.2 Model Without Learning

In the search model with a known distribution, the predicted
market shares are straightforward to compute. Let $\rho_g$ be the
probability of finding product $g$ on the next search attempt. In
terms of our learning model, $\rho_g = \alpha_g / N_0$. Given a vector of
prices, define search cost cutoffs as

$$\bar{c}_r = \sum_{g<r} \rho_g\,(p_r - p_g), \qquad r = 2,\dots,N. \qquad (17)$$

Consumers with $c > \bar{c}_r$ will stop searching and buy good $r$ upon
sampling it, while others will continue searching. The set of potential buyers for product $r$ consists of a sequence of intervals:
$[\bar{c}_r; \bar{c}_{r+1}), [\bar{c}_{r+1}; \bar{c}_{r+2}), \dots, [\bar{c}_N, +\infty)$. In the first interval, consumers will stop searching as soon as they find any price lower
or equal to $p_r$, which implies the probability of buying good
$r$ as $\rho_r/(\rho_1 + \cdots + \rho_r)$. For search costs from $[\bar{c}_{r+1}; \bar{c}_{r+2})$, the
potential set of purchases includes $p_1, \dots, p_{r+1}$, so the probability of buying $r$ is $\rho_r/(\rho_1 + \cdots + \rho_{r+1})$. The predicted market
share of good $r$ is the sum of demand by consumers from each
interval, weighted by their population shares:

$$Q_r(p_1,\dots,p_N) = \sum_{g=r}^{N} \big(F(\bar{c}_{g+1}) - F(\bar{c}_g)\big)\,\frac{\rho_r}{\rho_1 + \cdots + \rho_g}, \qquad r = 1,\dots,N, \qquad (18)$$

where $\bar{c}_1 = 0$ and $\bar{c}_{N+1} = +\infty$ by convention.

3.3 Example

A simple example helps illustrate the application of formulas
for market shares. Suppose only three prices are available: $p_1 = 0$, $p_2 = 1$, $p_3 = 2$. Beliefs are represented by a uniform prior:
$\alpha_1 = \alpha_2 = \alpha_3 = 1$, $N_0 = 3$. Accordingly, the actual sampling
technology is uniform, with replacement.
Applying formula (7),

$$\bar{k}_{r^*} = \max\left(1,\ \left[\frac{1}{c}\sum_{\tilde{u}_g > u_{r^*}} (\tilde{u}_g - u_{r^*})\,\alpha_g - N_0\right]_+\right), \qquad (19)$$

we obtain sufficient statistics for search with learning: $\bar{k}_2(c) = \max(1, \mathrm{ceil}(\tfrac{1}{c}\,p_2 - 3)) = \max(1, \mathrm{ceil}(\tfrac{1}{c} - 3))$, $\bar{k}_3(c) = \max(1, \mathrm{ceil}(\tfrac{1}{c}(2p_3 - p_2) - 3)) = \max(1, \mathrm{ceil}(\tfrac{3}{c} - 3))$. For the highest
ranked product 1, $\bar{k}_1(c) \equiv 1$. Solving equations $\bar{k}_2 = 1, 2, \dots, 5$
and $\bar{k}_3 = 1, 2, \dots, 6$, we obtain a set of search cost cutoffs.
These cutoffs divide the search cost line into a number of intervals, as shown on the x-axis of Figure 1, upper panel. For
search cost values within an interval, both statistics $\bar{k}_2(c)$, $\bar{k}_3(c)$
are constant (see lines “kbar2” and “kbar3” in the figure). Because the probabilities of purchase of products 2 and 3 are
completely determined by these statistics, they are also constant
within an interval. In this case, formula (11) reduces to:

$$P_2(\bar{k}_2, \bar{k}_3) = \sum_{t_0=0}^{\bar{k}_3 - 1} \rho_3^{t_0}\,\rho_2\,(\rho_3 + \rho_2)^{\max\{0,\ \bar{k}_2 - t_0 - 1\}}$$

$$P_3(\bar{k}_2, \bar{k}_3) = \rho_3^{\bar{k}_3}.$$

In the model without learning, there is only one cutoff per
product: $\bar{c}_2 = \tfrac{1}{3} p_2$ and $\bar{c}_3 = \tfrac{1}{3}(2p_3 - p_2)$. Observe that these
cutoffs are close to $\bar{c}_{12}$ and $\bar{c}_{13}$ in the learning model, respectively.
For c ∈ [c¯2 , c¯3 ), product 2 is bought with probability 1/2, and
for c ≥ c¯3 with probability 1/3. For low search costs, c < c¯3 ,
product 3 is not bought: consumers who find it continue searching. In contrast, in the learning model, any product has a positive
chance of being bought: even a consumer with low search cost
may have a sufficiently unlucky search experience that would
prompt her to stop and return to the product.
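The example can be checked numerically with a short Python sketch. It hard-codes the uniform case of this section ($\rho_1 = \rho_2 = \rho_3 = 1/3$, prices 0, 1, 2) and implements the expressions for $\bar{k}_2$, $\bar{k}_3$, $P_2$, and $P_3$ as stated in the text, together with the two cutoffs of the model without learning. All function names are illustrative and are not taken from the article.

```python
from fractions import Fraction
from math import ceil

RHO = Fraction(1, 3)  # uniform sampling probability of each product

def kbar2(c):
    """Sufficient statistic for product 2: max(1, ceil(1/c - 3))."""
    return max(1, ceil(Fraction(1) / c - 3))

def kbar3(c):
    """Sufficient statistic for product 3: max(1, ceil(3/c - 3))."""
    return max(1, ceil(Fraction(3) / c - 3))

def p2_learning(k2, k3):
    """Purchase probability of product 2 in the model with learning."""
    return sum(RHO**t0 * RHO * (2 * RHO) ** max(0, k2 - t0 - 1)
               for t0 in range(k3))

def p3_learning(k3):
    """Purchase probability of product 3 in the model with learning."""
    return RHO**k3

def p2_no_learning(c):
    """Product 2 without learning: cutoffs are c2 = 1/3 and c3 = 1."""
    if c < Fraction(1, 3):
        return Fraction(0)
    return Fraction(1, 2) if c < 1 else Fraction(1, 3)

c = Fraction(1, 2)
print(kbar2(c), kbar3(c))                      # sufficient statistics at c = 1/2
print(p2_learning(kbar2(c), kbar3(c)))         # product 2, with learning
print(p3_learning(kbar3(c)))                   # product 3, with learning
print(p2_no_learning(c))                       # product 2, without learning
```

At $c = 1/2$ the statistics are $\bar{k}_2 = 1$ and $\bar{k}_3 = 3$, so product 3 is bought with probability $1/27$ under learning, whereas it is never bought at this search cost in the model without learning.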

Journal of Business & Economic Statistics, April 2013
[Figure 1: two panels. Upper panel, “Search with learning”: probability of purchase for Product 2 and Product 3, together with the values of k_bar2 and k_bar3, plotted against search cost. Lower panel, “Search without learning”: probability of purchase for Product 2 and Product 3 plotted against search cost.]

Figure 1. Probability of purchase as a function of search cost, for models with and without learning. The upper panel also displays values of $\bar{k}_2$
and $\bar{k}_3$ as a function of search cost. Notice the correspondence between changes in the values of $\bar{k}_2$ and $\bar{k}_3$ and changes in purchase probabilities.

4. LEARNING AND PRICE ELASTICITY

When consumers learn while searching, the optimal stopping
rule of Proposition 1 determines whether a consumer who just
found a product will buy it now or will return to it in the future.
How does the likelihood of these events change if the seller
increases the price by a certain amount? How does the learning
process shape the consumer’s price response?
For both types of search models, define the price elasticity of
demand for product r as the normalized derivative of its market
share:
$$\epsilon_r = \frac{\partial Q_r(p_1,\dots,p_r,\dots,p_N)}{\partial p_r}\,\frac{p_r}{Q_r}. \qquad (20)$$

We compare this quantity between models with and without
learning, using results derived in the previous section.
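For the model without learning, this elasticity can be approximated numerically. The Python sketch below evaluates Equation (18) with the conventions $\bar{c}_1 = 0$ and $\bar{c}_{N+1} = +\infty$ and applies a central finite difference. The log-normal search cost distribution (with the parameter values used in Section 5), the example prices, and the function names are assumptions made for illustration only. Every cutoff is recomputed after the price change, so rivals' stopping rules respond to the new price as well.

```python
from math import erf, inf, log, sqrt

def lognormal_cdf(c, mu=-2.5, sigma=0.5):
    """CDF of a log-normal search cost (illustrative parameter values)."""
    if c <= 0:
        return 0.0
    if c == inf:
        return 1.0
    return 0.5 * (1.0 + erf((log(c) - mu) / (sigma * sqrt(2.0))))

def shares_no_learning(prices, rho):
    """Market shares from Equation (18); prices must be increasing."""
    n = len(prices)
    # cut[r] is the cutoff of the product with 0-based index r; cut[0] = 0.
    cut = [sum(rho[g] * (prices[r] - prices[g]) for g in range(r))
           for r in range(n)]
    cut.append(inf)
    shares = []
    for r in range(n):
        q = 0.0
        for g in range(r, n):
            mass = lognormal_cdf(cut[g + 1]) - lognormal_cdf(cut[g])
            q += mass * rho[r] / sum(rho[: g + 1])
        shares.append(q)
    return shares

def elasticity(prices, rho, r, h=1e-6):
    """Central-difference approximation of Equation (20) for product index r."""
    up = list(prices)
    up[r] += h
    down = list(prices)
    down[r] -= h
    dq = (shares_no_learning(up, rho)[r] - shares_no_learning(down, rho)[r]) / (2 * h)
    return dq * prices[r] / shares_no_learning(prices, rho)[r]

prices = [0.0, 0.2, 0.4]
rho = [1 / 3, 1 / 3, 1 / 3]
print(shares_no_learning(prices, rho))
print(elasticity(prices, rho, r=1))
```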
When applied to search-generated demand, the notion of price
elasticity can take two different meanings. A “short-term” price
elasticity refers to price changes that are sufficiently recent
that they are not yet incorporated in prior beliefs. Consequently,
the short-term change in demand comes only from search decisions made by consumers who discover the new price. A
“long-term” price elasticity refers to a demand response where
the new price is already incorporated in prior beliefs. In this case,
a price change affects the search behavior of all consumers. This
implies stronger competition from rival products, as more consumers who discover a rival product now stop and buy instead
of continuing to search.

[Figure 2: two panels, “Search with learning” and “Search without learning”. Each plots demand for good 2 against search cost, with separate horizontal axes marking the cutoffs for good 2 and the cutoffs for good 3. Demand levels marked in the upper panel: P(1,2,9), P(1,2,2), P(1,2,12), 1/3, 0. Demand levels marked in the lower panel: 1/2, 1/3, 0.]

Figure 2. Comparing price responses in search models with and without learning. Dotted areas are short-term losses in demand for product
2, while long-term losses are a sum of both dotted and dashed areas.

To see the implications of the learning process for both short- and long-term price elasticities, it is useful to distinguish between two components of demand for a search product. The
“fresh” demand comes from consumers who bought the product immediately upon discovering it. The “returning” demand
is from those who found the product and continued searching,
but unsuccessfully: failing to find a better deal, they returned
and made the purchase.
In search without learning, all demand is “fresh”: searchers
either buy the product right away, or never. In the three-product
example, only consumers with $c > \bar{c}_2$ buy product 2, immediately upon discovery. From Figure 1, lower panel, we can see
that demand for product 2 also depends on the cutoff for product
3, $\bar{c}_3$. Following a short-term price increase for product 2, its
cutoff increases, $\bar{c}'_2 > \bar{c}_2$, while $\bar{c}_3$ is not affected. In the long
run, the same price change also decreases product 3’s cutoff:
$\bar{c}'_3 < \bar{c}_3$. Therefore, in the short run the new cutoffs are $\bar{c}'_2, \bar{c}_3$, and
in the long run, $\bar{c}'_2, \bar{c}'_3$. Figure 2, lower panel, depicts the corresponding changes in demand for product 2. The dotted area is a
short-run loss in demand, from consumers who sampled product 2 but continued searching. The long-run change additionally
includes the dashed area, from consumers who sampled product
3 before discovering product 2.
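The two losses can be computed directly for a three-product market without learning. The Python sketch below is only an illustration: the prices, the size of the price increase, and the log-normal search cost distribution are assumed values, and the names `demand_2`, `short_run_loss`, and `long_run_loss` are ours. Demand for product 2 is one half of the mass between the two cutoffs plus one third of the mass above product 3's cutoff, as in Equation (18) with uniform sampling.

```python
from math import erf, log, sqrt

def F(c, mu=-2.5, sigma=0.5):
    """Log-normal CDF of search costs (assumed for illustration)."""
    return 0.5 * (1.0 + erf((log(c) - mu) / (sigma * sqrt(2.0))))

def demand_2(c2, c3):
    """Demand for product 2 given the cutoffs of products 2 and 3."""
    return (F(c3) - F(c2)) / 2 + (1 - F(c3)) / 3

rho = 1 / 3                    # uniform sampling probabilities
p1, p2, p3 = 0.0, 0.2, 0.4     # assumed prices
dp = 0.02                      # assumed price increase for product 2

# Cutoffs before the price change, from Equation (17).
c2 = rho * (p2 - p1)
c3 = rho * (p3 - p1) + rho * (p3 - p2)

# After the change: product 2's own cutoff rises; in the long run
# product 3's cutoff also falls because product 2 is now a weaker rival.
c2_new = rho * (p2 + dp - p1)
c3_new = rho * (p3 - p1) + rho * (p3 - p2 - dp)

base = demand_2(c2, c3)
short_run_loss = base - demand_2(c2_new, c3)
long_run_loss = base - demand_2(c2_new, c3_new)
print(short_run_loss, long_run_loss)
```

The difference between the two numbers corresponds to the dashed area of the lower panel, and the short-run loss to the dotted area.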
With learning, the behavior of consumers with a search cost
c is described by a set of sufficient statistics, k¯1 (c), . . . , k¯N (c),
where k¯1 ≡ 1 as a normalization. Define a set of search cost
cutoffs for products 2 and 3:
$$\bar{c}_{k2} = \frac{1}{k+3}\,\alpha_1 (p_2 - p_1), \qquad k = 1, 2, \qquad (21)$$

$$\bar{c}_{k3} = \frac{1}{k+3}\,\big(\alpha_2 (p_3 - p_2) + \alpha_1 (p_3 - p_1)\big), \qquad k = 1,\dots,12. \qquad (22)$$


Figure 2, upper panel, depicts demand for product 2. There
are two horizontal axes, one for product 2’s cutoffs and another
for product 3’s. Previously, Figure 1, upper panel, combined
both sets of cutoffs on a single line. “Fresh” demand for product
2 comes from consumers with $c > \bar{c}_{12}$, while “returning” consumers are in the interval $(\bar{c}_{22}, \bar{c}_{12})$ (to save space, we omit the part of
the demand curve for smaller costs). Similar to the model without
learning, a short-term price change only increases product 2’s
own cutoffs, $\bar{c}'_{k2} > \bar{c}_{k2}$; a long-run change will also decrease all of its
rival’s cutoffs: $\bar{c}'_{k3} < \bar{c}_{k3}$.
Comparing upper and lower panels, we observe that the sum
of short-term losses in the model with learning is smaller than in
the model without learning. That is, the short-term price elasticity is smaller if consumers learn while searching. The reason is
that consumers from various intervals of search costs will react
differently depending on whether or not they are learning.
Consider first consumers who belong to the interval $(\bar{c}_{12}, \bar{c}'_{12})$.
Before the price change, they were part of “fresh” demand,
buying the product with probability $P(1, 2, 9)$. After the price
change, they become “returning” consumers, who are less likely to buy
the product: $P(1, 2, 12) < P(1, 2, 9)$. Further, consumers from
$(\bar{c}_{22}, \bar{c}'_{22})$ were initially part of the “returning” group, but stopped
buying the product altogether after the price change. In sum,
a short-term price increase leads to a redistribution of demand
from “fresh” to “returning,” and from “returning” to zero. The
key observation is that the interval $(\bar{c}_{22}, \bar{c}'_{22})$ is smaller than
$(\bar{c}_{12}, \bar{c}'_{12})$, as more consumers enter “returning” demand than drop
out. As a result, the relative change in total demand is smaller
than in the model without learning, where all consumers who
quit the “fresh” category stop buying the product. The underlying reason is given in Remark 1.
Remark 1. In a search with Dirichlet priors, more pessimistic
consumers are less sensitive to price changes.
Suppose a consumer has made $n$ search attempts, where product 2 was sampled at least once, while product 1 was never sampled. From Equation (5), the expected benefit of future search is
$EU_2 = \frac{1}{n+3}(p_2 - p_1)$. It increases with $p_2$, but at a smaller rate
with larger $n$. Indeed, if improvements are unlikely, the expected
benefit is less sensitive to changes in the status quo. Since consumers
from the interval $(\bar{c}_{22}, \bar{c}_{12})$ only buy product 2 after $n = 2$ search
attempts, they are more pessimistic than consumers with $c > \bar{c}_{12}$,
who buy product 2 immediately.
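To make the comparison concrete, one can differentiate the expression for $EU_2$ given above. The following is only a restatement for the uniform prior of the example ($\alpha_1 = 1$, $N_0 = 3$), taking $n = 1$ for a consumer who has just found product 2 on the first attempt:

$$\frac{\partial EU_2}{\partial p_2} = \frac{1}{n+3}, \qquad \left.\frac{\partial EU_2}{\partial p_2}\right|_{n=1} = \frac{1}{4} \;>\; \left.\frac{\partial EU_2}{\partial p_2}\right|_{n=2} = \frac{1}{5}.$$

The same factors appear in Equation (21): a price increase of $\Delta p_2$ shifts $\bar{c}_{12}$ by $\Delta p_2 / 4$, but shifts $\bar{c}_{22}$ by only $\Delta p_2 / 5$.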
To summarize, the learning process makes demand less elastic
for two related reasons. First, a portion of demand in the model
with learning comes from returning consumers, who are necessarily more pessimistic than the “fresh” ones, and hence less
price-sensitive. In contrast, consumers who do not learn are perpetually optimistic. Second, the loss in demand from the “fresh”
component is partially compensated by an increase in the “returning” one, as a result of redistribution of these consumers
between two groups. Higher-ranked products have a higher share
of returning consumers, and for them we expect the largest
discrepancy between the predictions of the two models.
While this simple example is helpful to build intuition, developing a general result is beyond the scope of this article. The
gap between price elasticities depends on the shape of search
cost distribution, allocation of products, and so on. In the empirical application, all these parameters are known (including
the search cost distribution, which is estimated), so the price
elasticity comparisons are possible.
5. MONTE CARLO EXPERIMENTS

We conduct a number of Monte Carlo experiments that aim at
the following question: What are the consequences of applying
a search model from a known distribution to choice data that are
generated by searchers who learn? We are particularly interested
in how biases in search cost estimates and price elasticities
depend on market conditions and belief parameters.
A Monte Carlo experiment is organized as follows. A vector
of prices is simulated, one for each market, and there are multiple
markets. Under the “true” search cost distribution, log-normal
with parameters E(log(c)) = −2.5 and SD(log(c)) = 0.5, we
compute market shares under the assumption that consumers
learn while searching (Equation (16)), separately for each
market. On the simulated data, we estimate a search cost density by fitting the model without learning, with predicted market
shares given by Equation (18). Taking the difference between
price elasticities obtained under the true model and under the estimated model, we obtain a measure of the bias. Simulating market
share data many times, we obtain a distribution of biases, which
is described by the average bias and a 95% confidence interval
around it.

[Figure 3: three panels, “Bias in E(log(c))”, “Bias in SD(log(c))”, and “Bias in price elasticity” (series: product 1, product 2, product 3), each plotted against the variance of the prior in decreasing order, from 1 to 0.01.]

Figure 3. Monte Carlo results. Distribution of biases ($\hat{\theta} - \theta$), by
the variance of the prior. For search cost estimates, 95% confidence
intervals are shown.
We repeat the experiment for different values of the variance
of prior beliefs, which is a key parameter that differentiates the
two models. A smaller variance of the prior brings the model
closer to the search without learning, which is a special case
with zero variance. The difference between the two models is
computed for three parameters: median search cost, variance of
search cost, and price elasticity. For reference, the “true” search
cost parameters are E(log(c)) = −2.5 and SD(log(c)) = 0.5.
Figure 3 presents average biases in these parameters and simulated 95% confidence intervals, by the order of decreasing
variance. For example, for variance =1, the bias in E(log(c))
is 1.4, meaning that the model without learning overestimated
this parameter by 1.4/2.5 = 56%. The lower panel represents
biases in price elasticities for the top three products. Price elasticities are overestimated in all cases, to a varying degree. It is
notable that the bias for the third-best product is often much
larger, which points to our earlier conclusions about the role
of product rank in price elasticity. The size of the bias is not
strictly monotonic with the variance of the prior, but generally
gets smaller for small values of that parameter. For example,
doubling the variance of the prior from 0.1 to 0.2 increases the
bias by 40%.
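The structure of one replication of such an experiment can be sketched in Python. This is a heavily simplified illustration, not the procedure used to produce Figure 3: it assumes three products per market, the uniform prior of Section 3.3 ($\alpha_g = 1$, $N_0 = 3$) rather than a range of prior variances, noise-free market shares, and a crude grid search in place of a proper estimator. The truncation `K`, the price ranges, the grid, and all function names are assumptions.

```python
import random
from math import ceil, erf, inf, log, sqrt

RHO = 1.0 / 3.0   # uniform sampling probabilities
K = 30            # truncation of the sufficient statistics (approximation)

def cdf(c, mu, sigma):
    """Log-normal CDF of the search cost."""
    if c <= 0.0:
        return 0.0
    if c == inf:
        return 1.0
    return 0.5 * (1.0 + erf((log(c) - mu) / (sigma * sqrt(2.0))))

def gains(p):
    """Sums of price differences with cheaper products, for ranks 2 and 3."""
    return p[1] - p[0], (p[2] - p[0]) + (p[2] - p[1])

def shares_learning(p, mu, sigma):
    """Shares of products 2 and 3 with learning, following Equation (16)."""
    s2, s3 = gains(p)
    edges = sorted({s / (k + 3) for s in (s2, s3) for k in range(1, K + 1)})
    edges = [0.0] + edges + [inf]
    q2 = q3 = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        c = 2.0 * lo if hi == inf else 0.5 * (lo + hi)   # interior point
        k2 = min(K, max(1, ceil(s2 / c - 3)))
        k3 = min(K, max(1, ceil(s3 / c - 3)))
        mass = cdf(hi, mu, sigma) - cdf(lo, mu, sigma)
        q2 += mass * sum(RHO ** (t + 1) * (2 * RHO) ** max(0, k2 - t - 1)
                         for t in range(k3))
        q3 += mass * RHO ** k3
    return q2, q3

def shares_no_learning(p, mu, sigma):
    """Shares of products 2 and 3 without learning, following Equation (18)."""
    s2, s3 = gains(p)
    f2, f3 = cdf(RHO * s2, mu, sigma), cdf(RHO * s3, mu, sigma)
    return (f3 - f2) / 2 + (1 - f3) / 3, (1 - f3) / 3

def fit_no_learning(markets, data):
    """Grid search for (mu, sigma) minimising squared share errors."""
    grid = [(-4.0 + 0.05 * i, 0.1 + 0.05 * j)
            for i in range(81) for j in range(39)]
    def loss(theta):
        return sum((a - b) ** 2
                   for p, obs in zip(markets, data)
                   for a, b in zip(shares_no_learning(p, *theta), obs))
    return min(grid, key=loss)

rng = random.Random(0)
markets = []
for _ in range(20):
    p2 = rng.uniform(0.1, 0.5)
    markets.append((0.0, p2, p2 + rng.uniform(0.1, 0.5)))

TRUE = (-2.5, 0.5)   # E(log(c)) and SD(log(c)) of the true distribution
data = [shares_learning(p, *TRUE) for p in markets]
mu_hat, sd_hat = fit_no_learning(markets, data)
print("bias in E(log(c)):", mu_hat - TRUE[0])
print("bias in SD(log(c)):", sd_hat - TRUE[1])
```

Repeating the last block over many simulated price vectors would give a distribution of biases; a bias in price elasticity could be obtained in the same way by differentiating the two share functions at the true and the fitted parameters.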
6. APPLICATION: SEARCHING FOR MUTUAL FUNDS

6.1 Model

Suppose a consumer would like to invest her money into an
S&P 500 mutual fund. Since the goal of such funds is to mimic
the behavior of the S&P 500 index, their expected returns should
be similar. Hortacsu and Syverson (2004) (hereafter, HS) confirmed this prediction with data on actual returns from mutual
funds. They also emphasized a wide variation in management
fees across what seem to be similar investment options. One
explanation for the observed price dispersion is that investors
have nonnegligible search costs and make their investment decisions using incomplete information.
Similar to HS, we explore this hypothesis by fitting a
search model to the observed relationship between prices
and market shares of mutual funds. Our data consist of years
from 1995 to 2000, and each year is assumed to represent a
separate market. The number of operating funds every year
is Nt = 24, 22, 45, 57, 68, 82, which is also the number of
products in the search model. Market shares are formed by
search decisions of investors looking for a fund with the lowest
management fees. Observed prices are measured in basis points,
as a fixed fee per 10,000 dollars invested. Figure 4 plots data for
2000.

[Figure 4: scatter plot titled “log price vs log market share, 2000”. Horizontal axis: log market share, from −14 to 0. Vertical axis: log price, from −7 to −3.5.]

Figure 4. Log-price versus log-market share for funds available in year 2000.

The main point of depar