Manajemen | Fakultas Ekonomi Universitas Maritim Raja Ali Haji jbes%2E2009%2E07219

(1)

Full Terms & Conditions of access and use can be found at

http://www.tandfonline.com/action/journalInformation?journalCode=ubes20

Download by: [Universitas Maritim Raja Ali Haji] Date: 12 January 2016, At: 00:22

Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Nonparametric Discrete Choice Models With

Unobserved Heterogeneity

Richard A. Briesch, Pradeep K. Chintagunta & Rosa L. Matzkin

To cite this article: Richard A. Briesch, Pradeep K. Chintagunta & Rosa L. Matzkin (2010)

Nonparametric Discrete Choice Models With Unobserved Heterogeneity, Journal of Business & Economic Statistics, 28:2, 291-307, DOI: 10.1198/jbes.2009.07219

To link to this article: http://dx.doi.org/10.1198/jbes.2009.07219

Published online: 01 Jan 2012.

Submit your article to this journal

Article views: 411

View related articles


(2)

Nonparametric Discrete Choice Models With

Unobserved Heterogeneity

Richard A. BRIESCH

Cox School of Business, Southern Methodist University, Dallas, TX 75275-0333 (rbriesch@mail.cox.smu.edu)

Pradeep K. CHINTAGUNTA

Booth School of Business, University of Chicago, Chicago, IL 60637 (pradeep.chintagunta@ChicagoBooth.edu)

Rosa L. MATZKIN

Department of Economics, University of California–Los Angeles, Los Angeles, CA 90095 (matzkin@econ.ucla.edu)

In this research, we provide a new method to estimate discrete choice models with unobserved hetero-geneity that can be used with either cross-sectional or panel data. The method imposes nonparametric assumptions on the systematic subutility functions and on the distributions of the unobservable random vectors and the heterogeneity parameter. The estimators are computationally feasible and strongly con-sistent. We provide an empirical application of the estimator to a model of store format choice. The key insights from the empirical application are: (1) consumer response to cost and distance contains interac-tions and nonlinear effects, which implies that a model without these effects tends to bias the estimated elasticities and heterogeneity distribution, and (2) the increase in likelihood for adding nonlinearities is similar to the increase in likelihood for adding heterogeneity, and this increase persists as heterogeneity is included in the model.

KEY WORDS: Discrete choice; Heterogeneity; Nonparametric; Random effects.

1. INTRODUCTION

Since the early work of McFadden (1974) on the develop-ment of the conditional logit model for the econometric analy-sis of choices among a finite number of alternatives, a large number of extensions of the model have been developed. These extensions have spawned streams of literature of their own. One such stream has focused on relaxing the strict parametric struc-ture imposed in the original model. Another stream has con-centrated on relaxing the parameter homogeneity assumption across individuals. This article contributes to both these areas of research. We introduce methods to estimate discrete choice models where all functions and distributions are nonparametric, individuals are allowed to be heterogeneous in their preferences over observable attributes, and the distribution of these prefer-ences is also nonparametric.

As is well known in discrete choice models, each individual possesses a utility for each available alternative, and chooses the one that provides the highest utility. The utility of each al-ternative is the sum of a subutility of observed attributes (the systematic subutility) and an unobservable random term (the random subutility). Manski (1975) developed an econometric model of discrete choice that did not require specification of a parametric structure for the distribution of the unobservable random subutilities. This method was followed by other sim-ilarly semiparametric distribution-free methods developed by Cosslett (1983), Manski (1985), Han (1987), Powell, Stock, and Stoker (1989), Horowitz (1992), Ichimura (1993), Klein and Spady (1993), and Moon (2004), among others. More re-cently, Geweke and Keane (1997) and Hirano (2002), have ap-plied mixing techniques that allow nonparametric estimation of

the error term in Bayesian models. Similarly, Klein and Sher-man (2002) propose a method that allows nonparametric es-timation of the density as well as the parameters for ordered response models. These methods are termed semiparametric because they require a parametric structure for the systematic subutility of the observable characteristics. A second stream of literature has focused on relaxing the parametric assump-tion about the systematic subutility. Matzkin’s (1991) semipara-metric method accomplished this while maintaining a paramet-ric structure for the distribution of the unobservable random subutilities. Matzkin (1992,1993) proposed fully nonparamet-ric methods where neither the systematic subutility nor the dis-tribution of the unobservable random subutility are required to possess parametric structures.

Finally, a third stream of literature has focused on incorpo-rating consumer heterogeneity into choice models.Wansbeek, Wedel, and Meijer (2001) noted the importance of including heterogeneity in choice models to avoid finding weak relation-ships between explanatory variables and choice. However, they also note the difficulty of incorporating heterogeneity into non-parametric and seminon-parametric models. Further, Allenby and Rossi (1999) noted the importance of allowing heterogeneity in choice models to extend into the slope coefficients. Specifica-tions that have allowed for heterogeneous systematic subutili-ties include those of Heckman and Willis (1977), Albright, Ler-man, and Manski (1977), and Hausman and Wise (1978). These

© 2010American Statistical Association Journal of Business & Economic Statistics

April 2010, Vol. 28, No. 2 DOI:10.1198/jbes.2009.07219

291


(3)

articles use a particular parametric specification, that is, a spe-cific continuous distribution, to account for the distribution of systematic subutilities across consumers. Heckman and Singer (1984) propose estimating the parameters of the model without imposing a specific continuous distribution for this heterogene-ity distribution. Ichimura and Thompson (1994) have developed an econometric model of discrete choice where the coefficients of the linear subutility have a distribution of unknown form, which can be estimated.

Recent empirical work has relaxed assumptions on the het-erogeneity distributions. Lancaster (1997) allows for nonpara-metric identification of the distribution of the heterogeneity in Bayesian models. Taber (2000) and Park, Sickles, and Simar (2007) apply semiparametric techniques to dynamic models of choice. Briesch, Chintagunta, and Matzkin (2002) allow con-sumer heterogeneity in the parametric part of the choice model while restricting the nonparametric function to be homoge-neous. Dahl (2002) applies nonparametric techniques to tran-sition probabilities and dynamic models. Pinkse, Slade, and Brett(2002) allow for heterogeneity in semiparametric models of aggregate-level choice.

The method that we develop here combines the fully non-parametric methods for estimating discrete choice models (Matzkin1992,1993) with a method that allows us to estimate the distribution of unobserved heterogeneity nonparametrically as well. The unobserved heterogeneity variable is included in the systematic subutility in a nonadditive way (Matzkin1999a, 2003). We provide conditions under which the systematic subu-tility, the distribution of the nonadditive unobserved hetero-geneity variable, and the distribution of the additive unobserved random subutility can all be nonparametrically identified and consistently estimated from individual choice data. The method can be used with either cross-sectional or panel data. These re-sults update Briesch, Chintagunta, and Matzkin (2002).

We apply the proposed methodology to study the drivers of grocery store format choice for a panel of households. There are two main formats that for classifying supermarkets—everyday low price (EDLP) stores or high–low price (Hi–Lo) stores. The former offer fewer promotions of lower “depth” (i.e., magni-tude of discounts) than the latter. The main tradeoff facing con-sumers is that EDLP stores are typically located farther away (longer driving distances) than Hi–Lo stores although their prices, on average, are lower than those at Hi–Lo stores lead-ing to a lower total cost of shopplead-ing “basket” for the consumer. Since the value of time (associated with the driving distance) is heterogeneous across households and little is known about the shape of the utility for driving distance and expenditure, we think that the proposed method is ideally suited to understand-ing the nature of the tradeoff between distance and expenditure facing the consumer. To decrease the well-known dimensional-ity problems associated with relaxing parametric structures, we use a semiparametric version of our model. In particular, only the subutility of distance to the store and cost of refilling in-ventory at the store are nonparametric. We allow this subutility to be heterogeneous across consumers and provide an estima-tor for the subutilities of the different types, the distribution of types, and the additional parameters of the model. Further, we assume that the unobserved component of utility in this appli-cation is distributed according to a type-I extreme value distri-bution.

In the next section we describe the model. Section3states conditions under which the model is identified. In Section 4 we present strongly consistent estimators for the functions and distributions in the model. Section 5 provides computational details. Section6presents the empirical application. Section7 concludes.

2. THE MODEL

As is usual in discrete choice models, we assume that a typ-ical consumer must choose one of a finite number,J, of alter-natives, and he/she chooses the one that maximizes the value of a utility function, which depends on the characteristics of the alternatives and the consumer. Each alternativej is character-ized by a vector,zj, of the observable attributes of the alterna-tives. We will assume thatzj≡(xj,rj), whererjRandxjRK (K≥1). Each consumer is characterized by a vector,sRL, of observable socioeconomic characteristics for the consumer. The utility of a consumer with observable socioeconomic char-acteristics s, for an alternative,j, is given byV∗(j,s,zj, ω)+εj, whereεjandωdenote the values of unobservable random vari-ables. For any given value ofω, and anyj,V∗(j,·, ω)is a real-valued but otherwise unknown function. The dependence ofV∗ onω allows this systematic subutility to be different for dif-ferent consumers, even if the observable exogenous variables are the same for these consumers. We denote the distribution of

ωbyG∗and we denote the distribution of the random vector

ε=(ε1, . . . , εJ)byF∗.

The probability that a consumer with socioeconomic char-acteristicss will choose alternative j when the vector of ob-servable attributes of the alternatives is z ≡(z1, . . . ,zJ) ≡

(x1,r1, . . . ,xJ,rJ)is denoted byp(j|s,z;V∗,F∗,G∗). Hence,

p(j|s,z;V,F,G)=

Pr(j|s,z;ω,V,F)dG(ω),

where Pr(j|s,z;ω,V,F) denotes the probability that a con-sumer with systematic subutility V(·, ω)will choose alternative j, when the distribution ofε is F. By the utility maximization hypothesis,

Pr(j|s,z;ω,V,F)

=Pr{V(j,s,xj,rj, ω)+εj>V(k,s,xk,rk, ω)+εk,∀k=j}

=Pr{εk−εj<V(j,s,xj,rj, ω)−V(k,s,xk,rk, ω),∀k=j}, which depends on the distributionF. In particular, if we letF1∗ denote the distribution of the vector(ε2−ε1, . . . , εJ−ε1), then

Pr(1|s,z;ω,V,F∗)

=F1

V(1|s,x1,r1, ω)−V(2|s,x2,r2, ω), . . . ,

V(1|s,x1,r1, ω)−V(J|s,xJ,rJ, ω)

and the probability that the consumer will choose alternative one is then

p(1|s,z;V,F∗,G)

=

Pr(1|s,z;V,F∗,G)dG(ω)

=

F1

V(1|s,x1,r1, ω)−V(2|s,x2,r2, ω), . . . , V(1|s,x1,r1, ω)−V(J|s,xJ,rJ, ω)dG(ω).


(4)

For anyj, Pr(j|s,z;ω,V,F∗)can be obtained in an analogous way, lettingFj∗denote the distribution of (ε1−εj, . . . , εJ−εj). A particular case of the polychotomous choice model is the bi-nary threshold Crossing model, which has been used in a wide range of applications. This model can be obtained from the polychotomous choice model by lettingJ=2, η=(ε2−ε1) and ∀(s,x2,r2, ω),V(2,s,x2,r2, ω)≡0. In other words, the model can be described by:y∗=V(x,r, ω)−η, with

y=

1, ify∗≥0 0, otherwise,

wherey∗ is unobservable. In this model, F1∗ denotes the dis-tribution of η. Hence, for all x, r, ω,Pr(1|x,r;ω,V∗,F∗)=

F1(V∗(x,r, ω))and for allx,rthe probability that the consumer will choose alternative one is

p(1|s,z;V∗,F∗,G∗)=

Pr(1|s,z;ω,V∗,F∗)dG∗(ω)

=

F1∗(V∗(x,r, ω))dG∗(ω).

3. NONPARAMETRIC IDENTIFICATION Our objective is to develop estimators for the functionV∗and the distributionsF∗andG∗without requiring that these func-tions and distribufunc-tions belong to parametric families. It follows from the definition of the model that we can only hope to iden-tify the distributions of the vectorsηj≡(ε1−εj, . . . , εJ−εj)for

j=1, . . . ,J. LetF1∗denote the distribution ofη1. Since fromF1∗ we can obtain the distribution ofηj(j=2, . . . ,J), we will deal only with the identification ofF1∗. We letF∗=F1.

Definition. The functionV∗and the distributionsF∗andG∗ are identified in a set (W×ŴF×ŴG) such that (V∗,F∗,G∗)∈

(W×ŴF×ŴG), if∀(V,F,G)∈(W×ŴF×ŴG),

Pr(j|s,z;V(·;ω),F)dG(ω)

=

Pr(j|s,z;V(·;ω),F∗)dG∗(ω) forj=1, . . . ,J, a.s. implies thatV=V∗,F=F∗, andG=G∗.

That is, (V∗,F∗,G∗) is identified in a set (W×ŴF×ŴG) to which(V∗,F∗,G∗)belongs if any triple(V,F,G)that belongs to (W ×ŴF×ŴG) and is different from (V∗,F∗,G∗) gener-ates choice probabilitiesp(j|s,z;V,F,G)that are different from p(j|s,z;V∗,F∗,G)for at least one alternativejand a set of(s,z)

that possesses positive probability. We next present a set of con-ditions that, when satisfied, guarantee that(V∗,F∗,G∗)is iden-tified in (W×ŴF×ŴG).

Assumption 1. The support of (s,x1,r1, . . . ,xJ,rJ, ω) is a set (S×J

j=1(Xj×R+)×Y), whereS andXj (j=1, . . . ,J) are subsets of Euclidean spaces andY is a subset ofR.

Assumption 2. The random vectors (ε1, . . . , εJ), (s,z1, . . . , xJ) andωare mutually independent.

Assumption 3. For allVW andj, there exists a real val-ued, continuous functionv(j,·,·, ω): int(S×Xj)→Rsuch that

∀(s,xj,rj, ω)∈(S×Xj×RY),V(j,s,xj,rj, ω)=v(j,s,xj,

ω)−rj.

Assumption 4. ∃(s¯,x¯1, . . . ,x¯j)∈(S×Jj=1Xj) and (α1, . . . ,

αJ)∈RJ such that∀j∀ω∀VW,v(js,x¯j, ω)=αj.

Assumption 5. ∃˜j,β˜jR, and˜xj˜∈X˜jsuch that∀sS,∀ω∈

Y,∀VW,vj,s,x˜˜j, ω)=β˜j.

Assumption 6.j∗= ˜jand(ˆs,xˆj)(S×Xj)such that∀V

W ∀ωv(j∗,sˆ,xˆj∗, ω)=ω.

Assumption 6.j∗= ˜j,(sˆ,xˆj∗,ω)ˆ ∈(S×Xj∗×Y)andγ∈

R such that ∀VW v(j∗,ˆs,xˆj,ω)ˆ =γ, and∀VWλR

v(j∗, λsˆ, λxˆj∗, λω)ˆ =λγ.

Assumption 6′′.j∗ = ˜j, (sˆ,xˆj∗)∈ (S ×Xj∗), γ ∈ R, and

there exists a real-valued, continuous function m(s,x1,x2ω) such that ∀VW, ∀ω∈Y, v(j∗,s,x, ω)=m(s,x1,x2ω), and ms,xˆ1,xˆ2ω)=γ, where (x1,x2)=xj∗, (xˆ1,xˆ2)= ˆxj∗, and

x2,xˆ2∈R.

Assumption 7. ∀(s,xj)(S×Xj)(i) either it is known that

v(j∗,s,xj∗,·)is strictly increasing inωfor allω∈YandVW,

or (ii) it is known thatv(j∗,s,xj,·)is strictly decreasing inω,

for allω∈YandVW.

Assumption 8.k=j∗, ˜j either v(k,s,xk, ω) is strictly in-creasing inω, for all(s,xk), orv(k,s,xk, ω)is strictly decreas-ing inω, for all(s,xk).

Assumption 9.j= ˜j such that ∀(s,xj,x˜j)∈(S×Xj×X˜j) either (i) it is known thatv(j,s,xj,·)−vj,s,x˜j,·)is strictly in-creasing inωfor allω∈Y andVW, or (ii) it is known that v(j,s,xj,·)−v(j˜,s,x˜j,·)is strictly decreasing inωfor allω∈Y andVW.

Assumption 10. G∗is a strictly increasing, continuous distri-bution onY.

Assumption 11. The characteristic functions corresponding to the marginal distribution functionsFε

˜

j−εj(j=1, . . . ,J;j= ˜j)

are everywhere different from 0.

An example of a model where Assumptions4–9are satisfied is a binary choice where each functionv(1,·)is characterized by a functionm(·)and each functionv(2,·)is characterized by a functionh(·)such that for alls,x1,x2, ω: (i)v(1,s,x1, ω)= m(x1, ω), (ii) v(2,s,x2, ω)=h(s,x2, ω), (iii) m(x¯1, ω)=0, (iv)h(s,x¯2, ω)=0, (v)h(sˆ,xˆ2, ω)=ω, (vi)h(s,x2,·)is strictly increasing whenx2= ¯x2, and (vii)m(x1,·)is strictly decreasing whenx1= ¯x1, where¯x1,x¯2,sˆ, andxˆ2are given.

In this example, Assumption 4 is satisfied when ¯s is any value andα1=α2=0. Assumption5is satisfied with˜j=1,

β˜j=0 and x˜1= ¯x1. By (v), Assumption 6 is satisfied with j∗ =2. Finally, Assumptions 7–9 are satisfied by (vi) and (vii). If in the above example (v) is replaced by assumption (v′)h(sˆ,xˆ2,ω)ˆ =αandh(·,·,·)is homogenous of degree one, where, in addition to x¯1,x¯2, ˆs, andxˆ2, ωˆ and α∈R are also given, then the model satisfies Assumptions4,5,6′, and7–9.

Assumption 1 specifies the support of the observable ex-planatory variables and of ω. The critical requirement in this assumption is the large support condition on(r1, . . . ,rJ). This requirement is used, together with the other requirements, to identify the distribution of (ε1−ε2, . . . , ε1−εJ). Assump-tion2 requires that the vectors(ε1, ε2, . . . , εJ),(s,z1, . . . ,zJ),

ω be mutually independent. That is, for all (ε1, ε2, . . . , εJ),


(5)

(s,z1, . . . ,zJ),ω,

f(ε1,...,εJ),(s,z1,...,zJ),ω(ε1, . . . , εJ,s,z1, . . . ,zJ, ω)

=f(ε1,...,εJ)(ε1, . . . , εJf(s,z1,...,zJ)(s,z1, . . . ,zJfω(ω),

wheref(ε1,...,εJ),(s,z1,...,zJ),ωdenotes the joint density of(ε1, . . . ,

εJ,s,z1, . . . ,zJ, ω), and f(ε1,...,εJ)(ε1, . . . , εJ), f(s,z1,...,zJ)(s,z1,

. . . ,zJ), andfω(ω)denotes the corresponding marginals. A crit-ical implication of this is that for all (ε1, . . . , εJ), (s,z1, . . . , zJ),ω, the vectors(ε1, . . . , εJ),ω, and(r1, . . . ,rJ)are mutually independent conditional on(s,z1, . . . ,zJ), that is,

f(ε1,...,εJ),(r1,...,rJ),ω|(s,x1,...,xJ)(ε1, . . . , εJ,r1, . . . ,rJ, ω)

=f(ε1,...,εJ)|(s,z1,...,zJ)(ε1, . . . , εJ)

×f(r1,...,rJ)|(s,z1,...,zJ)(r1, . . . ,rJ)fw|(s,z1,...,zJ)(w)

Assumption3restricts the form of each utility functionV(j,s,

xj,rj, ω) to be additive in rj, and the coefficient of rj to be known. A requirement like this is necessary to compensate for the fact that the distribution of(ε1, . . . , εJ)is not specified, even when the function V(j,s,xj,rj, ω)is linear in variables. Since the work of Lewbel (2000), regressors such as(r1, . . . ,rJ)have been usually called “special regressors.” Assumptions4and5 specify the value of the functionsv(j,·), defined in Assump-tion3, at some points of the observable variables. Assumption4 specifies the values of thesev(j,·)functions at one point of the observable variables, for all values ofω. This guarantees that at those points of the observable variables, the choice probabil-ities are completely determined by the value of the distribution of(ε2−ε1, . . . , εJ−ε1). This, together with the support con-dition in Assumption 1, allows us to identify this distribution. Assumption5specifies the value of one of thev(j,·)functions at one value ofxj, for alls,ω. As in the linear specification, only differences of the utility function are identified. Assumption5 allows us to recover the values of each of thev(j,·)functions, for j= ˜j, from the difference betweenv(j,·)andvj,·). Once thev(j,·)functions are identified, one can use them to identify vj,·).

Assumptions 5–10 guarantee that the difference function m(s,xj,x˜

j, ω)=v(j∗,s,xj∗, ω)−vj,s,x˜j, ω)and the distrib-ution ofωare identified from the joint distribution of(γ ,s,x), whereγ=m(s,xj,x˜

j, ω). Using the results in Matzkin (2003), identification can be achieved if (i) m(s,xj,x˜

j, ω) is either strictly increasing or strictly decreasing inωfor all(s,xj,x˜

j), (ii)ωis distributed independently of(s,xj∗)conditional on all

the other coordinates of the explanatory variables, and (iii) one of the following holds: (iii.a) for some(ˆsxj∗,x˜˜

j), the value ofm is known atm(sˆ,xˆj∗,x˜˜

j)for allω; (iii.b) for some(ˆsxj∗,x˜˜

j,ω)ˆ , the value ofmis known at(sˆ,xˆj∗,x˜˜

j,ω)ˆ andm(s,xj∗,x˜˜

j, ω)is homogeneous of degree 1 along the ray that connects(sˆ,xˆj,ω)ˆ

to the origin; (iii.c) for some (sˆ,xˆj,x˜˜

j,ω)ˆ , the value of m at

sxj,˜x˜

j,ω)ˆ is known andmis strictly separable into a known function ofω and at least one coordinate of(s,xj).

Assump-tions6,6′, and6′′guarantee, together with Assumption5, that (iii.a), (iii.b), and (iii.c) , respectively, are satisfied. Assump-tions5and7guarantee the strict monotonicity ofminω. As-sumption2 guarantees the conditional independence between

ωand(s,xj∗). Hence, under our conditions, the functionmand

the distribution ofω are identified nonparametrically, as long

as the joint distribution of(γ ,s,x), whereγ =m(s,xj∗,x˜

j, ω), is identified.

Assumptions2,10, and11guarantee that the distribution of

(γ ,s,x), whereγ =m(s,xj∗,x˜

j, ω), is identified by using re-sults in Teicher (1961). In particular, Assumption11is needed to show that one can recover the characteristic function of v(j∗,s,xj, ω)from the characteristic functions of εj−εj∗ and

of v(j∗,s,xj, ω)+εj−εj∗. The normal and the double

expo-nential distributions satisfy Assumption 11. The distribution whose density isf(x)=π−1x−2(1−cos(x)) does not satisfy it.

Assumptions8and9guarantee that all the functionsv(j,·)

can be identified when the distribution ofω, the distribution of

1−ε2, . . . , ε1−εJ), and the functionv(j∗,·)are identified. Using the assumptions specified above, we can prove the fol-lowing theorem:

Theorem 1. If Assumptions 1–11, Assumptions 1–5, 6′, 7–11, or Assumptions1–5,6′′,7–11are satisfied, then(V,F,

G∗)is identified in(W×ŴF×ŴG).

This theorem establishes that one can identify the distribu-tions and funcdistribu-tions in a discrete choice model with unobserved heterogeneity, making no assumptions about either the para-metric structure of the systematic subutilities or the paramet-ric structure of the distributions in the model. The proof of this theorem is presented in theAppendix.

Our identification result requires only one observation per in-dividual. If multiple observations per individual were available, one could relax some of our assumptions. One such possibility would be to use the additional observations for each individ-ual to allow the unobserved heterogeneity variable to depend on some of the explanatory variables, as in Matzkin (2004). Another, more obvious, possibility would be to allow the non-parametric functions and/or distributions to be different across periods.

4. NONPARAMETRIC ESTIMATION GivenN independent observations{yi,si,zi}N

i=1 we can de-fine the log-likelihood function:

L(V,F,G)=

N

i=1 log

J

j=1

Pr(j|si,zi;V(·;ω),F)yij

dG(ω),

whereF=(F1, . . . ,FJ). We then define our estimators,Vˆ,Fˆ, andGˆ, forV∗,F∗, andG∗, to be the functions and distributions that maximizeL(V,F,G)over triples(V,F,G)that belong to a set(W×ŴF×ŴG). LetdW,dF, anddGdenote, respectively, metric functions over the setsWF, andŴG. Letd:(W×ŴF×

ŴG)×(W×ŴF×ŴG)→R+denote the metric defined by

d[(V,F,G), (V′,F′,G′)]

=dW(V,V′)+dF(F,F′)+dG(G,G′). Then the consistency of the estimators can be established un-der the following assumptions:

Assumption 12. The metricsdW anddF are such that con-vergence of a sequence with respect todW or dF implies, re-spectively, pointwise convergence. The metricdG is such that convergence of a sequence with respect todGimplies weak con-vergence.


(6)

Assumption 13. The set(W×ŴF×ŴG)is compact with re-spect to the metricd.

Assumption 14. The functions inW andŴFare continuous. An example of a set(W×ŴF×ŴG)and a metricdthat satisfy Assumptions12and13is where the metricsdW,dF, anddGare defined for allV,V′,F,F′, andG,G′by

dw(V,V′)=

|V(s,z, ω)−V′(s,z, ω)|e−(s,z,ω)d(s,z, ω),

dF(F,F′)=

|F(t1, . . . ,tJ−1)−F′(t1, . . . ,tJ−1)|

×e−(t1,...,tJ−1)dt, and

dG(G,G′)=

|G(t)−G′(t)|etdt.

The setW is the closure with respect todW of a set of point-wise bounded andαw-Lipschitsian functions, the setŴF is the closure with respect todF of a set ofαŴ-Lipschitsian distribu-tions, andŴGis the set of monotone increasing functions onR with values in [0,1]. A functionm:RKRisα-Lipschitsian if for allx,yinRK,|m(x)−m(y)| ≤αxy. In particular, sets of concave functions with uniformly bounded subgradients are

α-Lipschitsian for someα(see Matzkin1992). The following theorem is proved in theAppendix:

Theorem 2. Under Assumptions1–14(Vˆ,Fˆ,Gˆ)is a strongly consistent estimator of (V∗,F∗,G∗) with respect to the met-ricd. The same conclusion holds if Assumption6′ or 6′′ re-places Assumption6.

In practice, one may want to maximize the log-likelihood function over some set of parametric functions that increases with the number of observations in such a way that it becomes dense in the set(W×ŴF×ŴG)(Elbadwai, Gallant, and Souza 1983; Gallant and Nychka 1987; Gallant and Nychka 1989). Let WN, ŴNF, and ŴGN denote, respectively, such sets of para-metric functions when the number of observations is N. Let

(V˜N,F˜N,G˜N) denote a maximizer of the log-likelihood func-tionL(V,F,G)over(WN×ŴFN×ŴGN). Then, we can establish the following theorem:

Theorem 3. Suppose that Assumptions 1–14 are satisfied, with the additional assumption that the sequence{(WN×ŴFN×

ŴGN)}∞

N=1 is increasing and becomes dense (with respect tod) in(W×ŴF×ŴG)asN→ ∞. Then(V˜N,F˜N,G˜N)is a strongly consistent estimator of (V∗,F∗,G∗) with respect to the met-ricd. The same conclusion holds if Assumption 6is replaced by Assumption6′or6′′.

Note that Theorem3holds when only some of the functions are maximized over a parametric set, which increases and be-comes dense as the number of observations increases, and the other functions are maximized over the original set.

5. COMPUTATION

In this section, we introduce a change of notation from the previous sections. Let H be the number of households in the data (HN) andNh be the number of observed choices for householdh=1, . . . ,H(Nh>0, andN=h=1,HNh). There-fore, the key difference in this and the following sections from the previous section is that we have repeated observations for the households. The above theorems still apply as long as the assumptions are maintained; including independence of choices (see, e.g., Lindsay 1983; Heckman and Singer 1984; Day-ton and McReady 1988). Note that this assumption excludes endogenous (including lagged endogenous variables), and we leave it for future research to determine the identification and consistency conditions for models with endogenous predictors. Let(Vˆ,Fˆ,Gˆ)denote a solution to the optimization problem:

max (V,F,G)∈(W×ŴF×ŴG)

L(V,F,G)

=

H

h=1 log

Nh

i=1

J

j=1

Pr(j|sih,zji;V(·, ω),F)y

i

h,jdG(ω).

It is well known that, when the setŴGincludes discrete dis-tributions, Gˆ is a discrete distribution with at most H points of support (Lindsay1983; Heckman and Singer1984). Hence, the above optimization problem can be solved by finding a so-lution over the set of discrete distributions, G, that possess at mostHpoints of support. We will denote the points of support of any suchGbyω1, . . . , ωH, and the corresponding probabil-ities byπ1, . . . , πH. Note that the value of the objective func-tion depends on any funcfunc-tionV(·, ω)only through the values that V(·, ω) attains at the finite number of observed vectors

{(s11,x1j,rj)j=1,...,J, . . . , (sNHH,xNjH,rj)j=1,...,J}. Hence, since at a solutionGˆ will possess at mostHpoints of support, we will be considering the values of at mostHdifferent functions,V(·, ωc)

c=1, . . . ,H, that is, we can consider for eachj(j=1, . . . ,J)

at most H different subutilities,V(j,·;ωc) (c=1, . . . ,H). For each iandc, we will denote the value ofV(j,shi,xij,rj, ωc)by

Vj,h,ci . Also, at a solution the value of the objective function will depend on anyFonly through the values thatF attains at the finite number of values(Vj,i1,cV1i,1,c, . . . ,Vj,H,ciV1i,H,c), i=1, . . . ,Nh,j=1, . . . ,J,h=1, . . . ,H,c=1, . . . ,H. We then letFj,h,ci denote the value of a distribution functionFjat the vec-tor(Vj,i1,cV1i,1,c, . . . ,Vj,H,ciV1i,H,c)for eachj(j=1, . . . ,J). It follows that a solution(Vˆ,Fˆ,Gˆ)for the above maximization problem can be obtained by first solving the following finite dimensional optimization problem and then interpolating be-tween its solution:

max

{Vji,h,c},{πc},{Fij,h,c}

H

h=1 log

H

c=1

πc Nh

i=1

J

j=1

[Fj,h,ci ]yih,j

subject to ({Vj,h,ci },{πc},{Fj,h,ci })∈K,

whereK is a set of a finite number of restrictions on{Vj,h,ci },

c},{Fij,h,c}. The restrictions characterize the behavior of se-quences {Vj,h,ci }, {πc}, {Fj,h,ci } whose values correspond to


(1)

302 Journal of Business & Economic Statistics, April 2010

Figure 1. Segment distance response surface at mean cost.


(2)

Figure 2. Segment cost response surface at mean distance.


(3)

304 Journal of Business & Economic Statistics, April 2010

Table 7. Demographic profiles of segments

Parametric

Semiparametric

Segment 1

Segment 2

Segment 3

Segment 1

Segment 2

Segment 3

Size (# families)

36

100

25

62

85

14

% Trips at EDLP

82%

a

55%

b

43%

c

63%

a

58%

a,b

44%

b

(

0

.

28

)

(

0

.

31

)

(

0

.

30

)

(

0

.

31

)

(

0

.

35

)

(

0

.

25

)

Av. days between trips

5

.

57

a

5

.

20

a

4

.

28

b

4

.

86

b

5

.

46

a

4

.

41

b

(

2

.

20

)

(

1

.

75

)

(

1

.

79

)

(

2

.

01

)

(

1

.

73

)

(

1

.

99

)

Cost-average cost at

0

.

07

0

.

18

0

.

10

0

.

14

0

.

11

0

.

00

selected format ($/10)

(

0

.

80

)

(

0

.

93

)

(

0

.

71

)

(

0

.

83

)

(

0

.

85

)

(

1

.

17

)

Distance to selected

3.28a

1.82

b

1

.

38

c

1.98

2.19

1

.

85

format/10

(

2

.

32

)

(

1

.

07

)

(

0

.

91

)

(

1

.

44

)

(

1

.

70

)

(

1

.

20

)

Income/10,000

48.96

b

59.22

a

54.44

a,b

50.23

b

58.51

a

68.39

a

(

19

.

89

)

(

26

.

10

)

(

25

.

37

)

(

26

.

83

)

(

21

.

46

)

(

30

.

72

)

Family size

2

.

67

2

.

85

3

.

20

2

.

82

2

.

93

2

.

64

(

1

.

17

)

(

1

.

29

)

(

1

.

63

)

(

1

.

34

)

(

1

.

33

)

(

1

.

28

)

Elderly

14%

13%

16%

18%

11%

14%

(

0

.

35

)

(

0

.

34

)

(

0

.

37

)

(

0

.

39

)

(

0

.

31

)

(

0

.

36

)

College educated

19%

b

48%

a

44%

a

47%

38%

36%

(

0

.

40

)

(

0

.

50

)

(

0

.

51

)

(

0

.

50

)

(

0

.

49

)

(

0

.

50

)

Av. trips per family

79

.

03

b

82

.

36

b

111

.

04

a

91

.

77

a

77

.

86

b

110

.

64

a

(

34

.

91

)

(

35

.

57

)

(

53

.

86

)

(

43

.

46

)

(

30

.

63

)

(

59

.

55

)

NOTE: 1. Bold implies comparisons (parametric versus semiparametric) are significant atp<0.05. 2. Bold and italic comparisons (parametric versus semiparametric) are significant atp<0.10. 3.a>b>cin paired comparisons atp<0.10 (within parametric or semiparametric).

(

t

2

, . . . ,

t

J

) be given. Let (

r

1

, . . . ,

r

J

) be such that (

t

2

, . . . ,

t

J

)

=

(

r

1

+

r

2

+

α

1

α

2

, . . . ,

r

1

+

r

J

+

α

1

α

J

)

. Then,

V

W

,

G

Ŵ

G

F

1

(

t

2

, . . . ,

t

J)

=

F

1

(

t

2

, . . . ,

t

J)

dG

(ω)

=

F

1

(

r

1

+

r

2

+

α

1

α

2

, . . . ,

r

1

+

r

J

+

α

1

α

J

)

dG

(ω)

=

p

(

1|¯

s

,

x

¯

1

,

r

1

,

¯

x

2

,

r

2

, . . . ,

x

¯

J

,

r

J

),

where the last equality follows from Assumption

4

. It follows

that

F

1

is identified, since if for some

F

1

and

(

t

2

, . . . ,

t

J

)

,

F

1

(

t

2

, . . . ,

t

J

)

=

F

1

(

t

2

, . . . ,

t

J

)

, then

V

,

V

,

G

,

G

p

(

1|¯

s

,

x

¯

1

,

r

1

,

x

¯

2

,

r

2

, . . . ,

x

¯

J

,

r

J

;

V

,

F

,

G

)

=

p

(

1|¯

s

,

x

¯

1

,

r

1

,

x

¯

2

,

r

2

, . . . ,

x

¯

J

,

r

J

;

V

,

F

,

G

).

Since both

p

(

1|·;

V

,

F

,

G

)

and

p

(

1|·

J

;

V

,

F

,

G

)

are

contin-uous, the inequality holds on a set of positive probability. Next,

assume without loss of generality (wlg) that the alternative,

˜

j

,

that satisfies Assumption

5

is

˜

j

=

2

, β

2

=

0, and the

alterna-tive,

j

, that satisfies either Assumption

6

,

6

, or

6

′′

is

j

=

1.

To show that

v

(

1

,

·

)

and

G

are identified, we transform the

polychotomous choice model into a binary choice model by

letting

r

j

→ ∞

for

j

3. Let

η

ε

2

ε

1

, and denote the

marginal distribution of

ε

2

ε

1

by

F

∗η

. Since

F

is identified,

we can assume that

F

η

is known.

(

s

,

x

1

,

x

3

, . . . ,

x

J,

r

1

,

r

2

)

(

S

×

X

1

×

(

Jj=3

X

j)

×

R

2+

)

,

p

(

1|

s

,

x

1

,

x

˜

2

, . . . ,

x

J

,

r

1

,

r

2

)

=

F

η

(

v

(

1

,

s

,

x

1

, ω)

r

1

+

r

2

)

dG

(ω).

Let

γ

=

v

(

1

,

s

,

x

1

, ω)

and let

S

denote the distribution of

γ

conditional on (

s

,

x

1

). Then,

(

r

1

,

r

2

)

R

2+

p

(

1|

s

,

x

1

,

r

1

,

r

2

)

=

F

η

(

v

(

1

,

s

,

x

1

, ω)

r

1

+

r

2

)

dG

(ω)

=

F

η

+

r

1

r

2

)

dS

(γ ).

It then follows by Assumption

11

and Teicher (

1961

) that

S

(

·

)

is identified. Let

f

(

s

,

x

1

)

be the marginal pdf of (

s

,

x

1

);

f

(

·

)

is identified since (

s

,

x

1

) is a vector of observable variables.

Hence, since

S

(

·

)

is the distribution of

γ

conditional on (

s

,

x

1

)

,

we can identify the joint pdf,

f

(γ ,

s

,

x

1

)

, of

(γ ,

s

,

x

1

)

. By

As-sumptions

5

7

and

10

,

γ

=

v

(

1

,

s

,

x

1

, ω)

is a nonparametric,

continuous function that satisfies the requirements in Matzkin

(

2003

) for identification of

γ

=

v

(

1

,

s

,

x

1

, ω)

and the

distribu-tion of

ω

. Hence,

v

(

1

,

·

,

·

,

·

)

and

G

are identified. Substituting

j

in the above argument by k, as in Assumption

8

, it follows

that the distribution of

γ

k

=

v

(

k

,

s

,

x

k

, ω)

conditional on

(

s

,

x

k

)

is identified. Since by Assumption

8

the nonparametric

func-tion

v

(

k

,

s

,

x

k

, ω)

is known to be either strictly increasing or

strictly decreasing in

ω

, and the distribution of

ω

has already

been shown to be identified, it follows by Matzkin (

2003

) that

v

(

k

,

·

,

·

,

·

)

is identified. Finally, since for all

j

= ˜

j

,

v

(

j

,

·

,

·

,

·

)

is

identified, it follows by Assumption

9

, using similar arguments

to those above that

v

(

˜

j

,

·

,

·

,

·

)

is identified. This completes the

proof of Theorem

1

.

Proof of Theorem

2

We show the theorem by showing that the assumptions

neces-sary to apply the result in Kiefer and Wolfowitz (

1956

) are

satis-fied (see also Wald

1949

). For any

(

V

,

G

,

F

)

(

W

×

Ŵ

G

×

Ŵ

F

)

,


(4)

define

f

(

y

,

s

,

z

;

V

,

G

,

F

)

=

J

j=1

p

(

j

|

s

,

s

,

z

;

V

(

·

, ω),

F

)

yij

dG

(ω)

and for any

ρ >

0, define the function

f

(

y

,

s

,

z

;

V

,

G

,

F

, ρ)

by

f

(

y

,

s

,

z

;

V

,

G

,

F

, ρ)

=

sup

d[(V,G,F),(V,G,F)]<ρ

f

(

y

,

s

,

z

;

V

,

G

,

F

).

We need to show continuity, measurability, integrability, and

identification.

Continuity.

∀{

(

V

k

,

F

k

,

G

k

)

}

k=1

,

(

V

,

F

,

G

)

such that

{

(

V

k

,

F

k

,

G

k

)

}

k=1

(

W

×

Ŵ

F

×

Ŵ

G

), (

V

,

F

,

G

)

(

W

×

Ŵ

F

×

Ŵ

G

)

and

d

[

(

V

k,

F

k,

G

k), (

V

,

F

,

G

)

] →

0, one has that

(

y

,

s

,

z

)

,

ex-cept perhaps on a set of probability 0,

f

(

y

,

s

,

z

;

V

k

,

F

k

,

G

k

)

f

(

y

,

s

,

z

;

V

,

F

,

G

)

.

Measurability.

(

V

,

F

,

G

)

(

W

×

Ŵ

F

×

Ŵ

G

)

and

ρ >

0,

f

(

y

,

s

,

z

;

V

,

G

,

F

, ρ)

is a measurable function of

(

y

,

z

)

.

Integrability.

(

V

,

F

,

G

)

(

W

×

Ŵ

F

×

Ŵ

G

)

,

lim

ρ→0

E

log

f

(

y

,

s

,

z

;

V

,

G

,

F

, ρ)

f

(

y

,

s

,

z

;

V

,

G

,

F

)

+

<

.

Identification.

(

V

,

F

,

G

)

(

W

×

Ŵ

F

×

Ŵ

G

)

such that

(

V

,

F

,

G

)

=

(

V

,

F

,

G

)

. There exists a set

such that

f

(

y

,

s

,

z

;

V

,

G

,

F

)

d

(

y

,

z

)

=

f

(

y

,

s

,

z

;

V

,

G

,

F

)

d

(

y

,

z

)

.

To show continuity, we note that

d

[

(

V

k,

F

k,

G

k), (

V

,

F

,

G

)

] →

0 implies that for all

(

s

,

z

,

w

)

(

S

×

J

j−1

(

X

j

×

R

+

)

×

R

)

, and all j,

F

j,k

(

V

j,k

(

z

,

s

, ω))

converges to

F

j

(

V

j

(

z

,

s

, ω))

,

where

V

j

(

s

,

z

, ω)

=

(

V

(

j

,

s

,

z

j

, ω)

V

(

1

,

s

,

z

1

, ω), . . . ,

V

(

j

,

s

,

z

j

,

ω)

V

(

J

,

s

,

z

J

, ω))

, and that

G

k

converges weakly to

G

over

Y

.

Let

h

k

:

Y

R

and

h

:

Y

R

denote, respectively, the

func-tions

F

j,k

(

V

j,k

(

s

,

z

,

·

))

and

F

j

(

V

j

(

s

,

z

,

·

))

. Since h is continuous,

it follows from theorem 5.5 in Billingsley (

1968

) that

G

k

h

k1

converges weakly to

Gh

−1

. Hence, using a change of variables,

it follows that

p

(

j

|

s

,

z

;

V

k,

F

k,

G

k)

=

F

j,k(

V

j,k(

s

,

z

, ω))

dG

k(ω)

=

h

k(ω)

dG

k(ω)

=

t d

(

G

k

h

k1

)(

t

)

t d

(

Gh

−1

)(

t

)

=

h

(ω)

dG

(ω)

=

F

j

(

V

j

(

s

,

z

, ω))

dG

(ω)

=

p

(

j

|

s

,

z

;

V

,

F

,

G

),

where the convergence follows because

t

is a continuous and

bounded function on the supports of

G

k

h

k1

and

Gh

−1

. Hence, it

follows that

f

(

y

,

s

,

z

;

V

k

,

F

k

,

G

k

)

f

(

y

,

s

,

z

;

V

,

F

,

G

)

for all

z

.

To show measurability, we first note that it suffices to show

that for all

j

, sup

d[(V,G,F),(V,G,F)]<ρ

p

(

j

|

s

,

z

;

V

,

F

,

G

)

is

mea-surable in

(

s

,

z

)

. Now, since

(

W

×

Ŵ

F

×

Ŵ

G

)

is a compact space,

there exists a countable, dense subset of

(

W

×

Ŵ

F

×

Ŵ

G

)

. Denote

this subset by

(

W

˜

× ˜

ŴF

× ˜

ŴG)

. Then,

sup

p

(

j

|

s

,

z

;

V

,

F

,

G

)

|

d

[

(

V

,

G

,

F

), (

V

,

G

,

F

)

]

< ρ,

(

V

,

G

,

F

)

(

W

×

ŴF

×

ŴG)

=

sup

p

(

j

|

s

,

z

;

V

,

F

,

G

)

|

d

[

(

V

,

G

,

F

), (

V

,

G

,

F

)

]

< ρ,

(

V

,

G

,

F

)

(

W

˜

× ˜

Ŵ

F

× ˜

Ŵ

G

)

.

Suppose that the left-hand side is bigger than the right-hand

side; then there must exist

δ >

0 and

(

V

,

G

,

F

)

(

W

×

ŴF

×

Ŵ

G

)

such that

(

V

′′

,

G

′′

,

F

′′

)

(

W

˜

× ˜

Ŵ

F

× ˜

Ŵ

G

)

,

(i)

p

(

j

|

s

,

z

;

V

,

F

,

G

) > δ >

p

(

j

|

s

,

z

;

V

′′

,

F

′′

,

G

′′

)

.

But

(

W

˜

× ˜

ŴF

× ˜

ŴG)

is dense in

(

W

×

ŴF

×

ŴG)

. Hence,

there exists a sequence

{

(

V

k

,

F

k

,

G

k

)

} ⊂

(

W

˜

× ˜

Ŵ

F

× ˜

Ŵ

G

)

}

such that

d

[

(

V

k,

F

k,

G

k), (

V

,

F

,

G

)

] →

0. As shown in the

proof of continuity, this implies that

p

(

j

|

s

,

z

;

V

k

,

F

k

,

G

k

)

p

(

j

|

s

,

z

;

V

,

F

,

G

)

, which contradicts (i). Hence,

sup

p

(

j

|

s

,

z

;

V

,

F

,

G

)

|

d

[

(

V

,

G

,

F

), (

V

,

G

,

F

)

]

< ρ,

(

V

,

G

,

F

)

(

W

×

Ŵ

F

×

Ŵ

G

)

=

sup

p

(

j

|

s

,

z

;

V

,

F

,

G

)

|

d

[

(

V

,

G

,

F

), (

V

,

G

,

F

)

]

< ρ,

(

V

,

G

,

F

)

(

W

˜

× ˜

Ŵ

F

× ˜

Ŵ

G

)

.

Since

(

V

,

G

,

F

)

(

W

˜

× ˜

Ŵ

F

× ˜

Ŵ

G

)

,

p

(

j

|

s

,

z

;

V

,

F

,

G

)

is

measurable in

(

s

,

z

)

, it follows that

sup

p

(

j

|

s

,

z

;

V

,

F

,

G

)

|

d

[

(

V

,

G

,

F

), (

V

,

G

,

F

)

]

< ρ,

(

V

,

G

,

F

)

(

W

×

ŴF

×

ŴG)

is measurable in

(

s

,

z

)

.

To show integrability, we note that

j

(

V

,

G

,

F

)

,

p

(

j

|

s

,

z

;

V

,

F

,

G

)

1. Hence,

j

, sup{

p

(

j

|

s

,

z

;

V

,

F

,

G

)

|

d

[

(

V

,

G

,

F

),

(

V

,

G

,

F

)

]

< ρ, (

V

,

G

,

F

)

(

W

×

Ŵ

F

×

Ŵ

G

)

} ≤

1. It follows

that

ρ >

0,

E

log

f

(

y

,

s

,

z

,

V

,

G

,

F

, ρ)

f

(

y

,

s

,

z

,

V

,

G

,

F

)

+

E

log

1

f

(

y

,

s

,

z

,

V

,

G

,

F

)

+

= −

(s,z)

J

j=1

p

(

j

|

s

,

z

;

V

,

F

,

G

)

×

log

(

p

(

j

|

s

,

z

;

V

,

F

,

G

))

dF

(

s

,

z

).

Since the term in brackets is bounded, it follows that

E

[log

(

ff(y(,ys,s,z,z,V,V∗,,GG,∗F,F,ρ)∗)

)

]

+

<

∞. Finally, identification follows

from Theorem

1

. Hence, it follows by Kiefer and Wolfowitz

(

1956

) that the estimators are consistent.

Proof of Theorem

3

The arguments used in the proof of Theorem

2

can be used to

show that the log-likelihood function converges a.s. uniformly,

over the compact set

(

W

×

ŴF

×

ŴG)

, to a continuous function

that is uniquely maximized over

(

W

×

Ŵ

F

×

Ŵ

G

)

at

(

V

,

F

,

G

)

(see Newey and McFadden

1994

). Since

{

(

W

N

×

Ŵ

N

F

×

Ŵ

N G

)

}

N=1

becomes dense in

(

W

×

Ŵ

F

×

Ŵ

G

)

as the number of observations

increases, it follows by Gallant and Nychka (

1987

, theorem 0)

that the estimators obtained by maximizing the log-likelihood

over

(

W

N

×

Ŵ

FN

×

Ŵ

GN

)

are strongly consistent.


(5)

306 Journal of Business & Economic Statistics, April 2010

ACKNOWLEDGMENTS

Rosa Matzkin gratefully acknowledges support from NSF for

this research through grants SES 0551272, BCS 0433990, SES

0241858, and SES 9410182. The authors gratefully thank

Infor-mation Resources, Inc.; ESRI; and Edward Fox, Director of the

Retailing Center at Southern Methodist University for

provid-ing the data used in this research. Pradeep Chintagunta

grate-fully acknowledges support from the Kilts Center in Marketing

at the University of Chicago.

[Received August 2007. Revised September 2008.]

REFERENCES

Albright, R. L., Lerman, S. R., and Manski, C. F. (1977), “Report on the De-velopment of an Estimation Program for the Multinomial Probit Model,” report for the Federal Highway Administration, Cambridge Systematics, Inc., Cambridge, MA. [291]

Allenby, G. M., and Rossi, P. E. (1999), “Marketing Models of Consumer Het-erogeneity,”Journal of Econometrics, 89, 57–78. [291]

Bell, D. R., and Lattin, J. M. (1998), “Shopping Behavior and Consumer Prefer-ences for Store Price Format: Why ‘Large Basket’ Shoppers Prefer EDLP,” Marketing Science, 17 (1), 66–88. [296,297,301]

Bell, D. R., Ho, T., and Tang, C. S. (1998), “Determining Where to Shop: Fixed and Variable Costs of Shopping,”Journal of Marketing Research, 35 (3), 352–369. [297,298,301]

Billingsley, P. (1968),Convergence of Probability Measures.Wiley Series in Probability and Mathematical Statistics, New York: Wiley. [305] Briesch, R. A., Chintagunta, P. K., and Matzkin, R. L. (2002), “Semiparametric

Estimation of Brand Choice Behavior,”Journal of the American Statistical Association, 97 (460), 973–982. [292,296,300]

Bucklin, L. P. (1971), “Retail Gravity Models and Consumer Choice: A The-oretical and Empirical Critique,”Economic Geography, 47 (4), 489–497. [296,297]

Cosslett, S. R. (1983), “Distribution-Free Maximum Likelihood Estimator of the Binary Choice Model,”Econometrica, 51 (3), 765–782. [291] Dahl, G. B., (2002), “Mobility and the Return to Education: Testing a Roy

Model With Multiple Markets,”Econometricia, 70 (6), 2367–2420. [292] Dayton, C. M., and McReady, G. B. (1988), “Concomitant-Variable

Latent-Class Models,”Journal of the American Statistical Association, 83 (401), 173–178. [295,297]

Elbadawi, I., Gallant, A. R., and Souza, G. (1983), “An Elasticity Can Be Estimated Consistently Without a priori Knowledge of Functional Form,” Econometricia, 51 (6), 1731–1751. [295]

Gallant, A. R., and Nychka, D. W. (1987), “Seminonparametric Maximum Likelihood Estimation,”Econometrica, 55, 363–390. [295,305]

(1989), “Seminonparametric Estimation of Conditionally Constrained Heterogeneous Processes: Asset Pricing Applications,”Econometrica, 57 (5), 1091–1120. [295]

Geweke, J., and Keane, M. (1997), “An Empirical Analysis of Male Income Dy-namics in the PSID: 1968:1989,” Research Dept. Staff Report 233, Federal Reserve Bank of Minneapolis. [291]

Han, A. K. (1987), “Non-Parametric Analysis of a Generalized Regression Model: The Maximum Rank Correlation Estimator,”Journal of Economet-rics, 35 (2–3), 303–316. [291]

Hausman, J. A., and Wise, D. A. (1978), “A Conditional Probit Model for Qual-itative Choice Discrete Decisions Recognizing Interdependence and Het-erogeneous Preferences,”Econometrica, 46, 403–426. [291]

Heckman, J. J., and Singer, B. (1984), “A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data,” Econometrica, 52 (2), 271–320. [292,295]

Heckman, J. J., and Willis, R. (1977), “A Beta-Logistic Model for the Analysis of Sequential Labor Force Participation by Married Women,”Journal of Political Economy, 81 (1), 27–58. [291]

Hirano, K. (2002), “Semiparametric Bayesian Inference in Autoregressive Panel Data Models,”Econometricia, 70 (2), 781–799. [291]

Horowitz, J. L. (1992), “A Smooth Maximum Score Estimator for the Binary Choice Model,”Econometrica, 60, 505–531. [291]

Huff, D. L. (1962), “A Probability Analysis of Consumer Spatial Behavior,” inEmerging Concepts in Marketing, ed. W. S. Decker, Chicago: American Marketing Association, pp. 443–461. [296]

Ichimura, H. (1993), “Semiparametric Least Squares (SLS) and Weighted SLS Estimation of Single-Index Models,”Journal of Econometrics, 58 (1–2), 71–120. [291]

Ichimura, H., and Thompson, T. S. (1994), “Maximum Likelihood Estimation of a Binary Choice Model With Random Coefficients of Unknown Distrib-ution,” mimeo, University of Minnesota. [292]

Kamakura, W. A., and Russell, G. J. (1989), “A Probabilistic Choice Model for Market Segmentation and Elasticity Structure,”Journal of Marketing Research, 26 (4), 379–390. [297,299]

Kiefer, J., and Wolfowitz, J. (1956), “Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters,”Annals of Mathematical Statistics, 27, 887–906. [304,305]

Klein, R. W., and Spady, R. H. (1993), “An Efficient Semiparametric Estimator for Discrete Choice Models,”Econometrica, 61, 387–422. [291]

Klein, R. W., and Sherman, R. P. (2002), “Shift Restrictions and Semipara-metric Estimation in Ordered Response Models,”Econometricia, 70 (2), 663–691. [291]

Lancaster, T. (1997), “Orthogonal Parameters and Panel Data,” Working Paper 97–32, Brown University, Dept. of Economics. [292]

Leszczyc, P. T., Sinah, A., and Timmermans, H. J. P. (2000), “Consumer Store Choice Dynamics: An Analysis of Competitive Market Structure for Gro-cery Stores,”Journal of Retailing, 76 (3), 323–345. [297]

Lewbel, A. (2000), “Semiparametric Qualitative Response Model Estimation With Unknown Heteroskedasticity and Instrumental Variables,”Journal of Econometrics, 97, 145–177. [294]

Lindsay, B. G. (1983), “The Geometry of Mixture Likelihoods: A General The-ory,”The Annals of Statistics, 11 (1), 86–94. [295]

Manski, C. (1975), “Maximum Score Estimation of the Stochastic Utility Model of Choice,”Journal of Econometrics, 3, 205–228. [291]

(1985), “Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator,”Journal of Econometrics, 27, 313–334. [291]

Matzkin, R. L. (1991), “Semiparametric Estimation of Monotone and Concave Utility Functions for Polychotomous Choice Models,”Econometrica, 59, 1315–1327. [291]

(1992), “Nonparametric and Distribution-Free Estimation of the Bi-nary Choice and the Threshold Crossing Models,”Econometrica, 60, 239– 270. [291,292,295,296,300]

(1993), “Nonparametric Identification and Estimation of Polychoto-mous Choice Models,”Journal of Econometrics, 58, 137–168. [291,292, 296]

(1994), “Restrictions of Economic Theory in Nonparametric Meth-ods,” inHandbook of Econometrics, Vol. 4, eds. D. McFadden and R. Engel, Amsterdam: Elsevier. [296]

(1999a), “Nonparametric Estimation of Nonadditive Random Func-tions,” mimeo, Northwestern University, presented at the 1999 Latin Amer-ican Meeting of the Econometric Society. [292]

(1999b), “Computation and Operational Properties of Nonparametric Shape Restricted Estimators,” mimeo, Northwestern University. [296]

(2003), “Nonparametric Estimation of Nonadditive Random Func-tions,”Econometrica, 71 (5), 1339–1375. [292,294,304]

(2004), “Unobservable Instruments,” mimeo, Northwestern Univer-sity. [294]

(2005), “Identification in Nonparametric Simultaneous Equations,” mimeo, Northwestern University. [301]

McFadden, D. (1974), “Conditional Logit Analysis of Qualitative Choice Be-havior,” inFrontiers in Econometrics, ed., P. Zarembka, New York, NY: Academic Press, pp. 105–142. [291]

Moon, H. R. (2004), “Maximum Score Estimation of a Nonstationary Binary Choice Model,”Journal of Econometrics, 122, 385–403. [291]

Nevo, A., and Hendel, I. (2002), “Measuring the Implications of Sales and Con-sumer Stockpiling Behavior,” mimeo, University of California, Berkeley, CA. [299]

Newey, W., and McFadden, D. (1994), “Large Sample Estimation and Hypoth-esis Testing,” inHandbook of Econometrics, Vol 4, eds. R. F. Engle and D. L. McFadden, Amsterdam: Elsevier. [305]

Powell, J. L, Stock, J. H., and Stoker, T. M. (1989), “Semiparametric Estimation of Index Coefficients,” Econometrica, 57 (6), 1403–1430. [291]

Park, B. U., Sickles, R. C., and Simar, L. (2007), “Semiparametric Efficient Estimation of Dynamic Panel Data Models,”Journal of Econometrics, 136, 281–301. [292]

Pinkse, J., Slade, M. E., and Brett, C. (2002), “Spatial Price Competition: A Semiparametric Approach,”Econometricia, 70 (3), 1111–1153. [292] Rhee, H., and Bell, D. R. (2002), “The Inter-Store Mobility of Supermarket

Shoppers,”Journal of Retailing, 78, 225–237. [297]

Smith, H. (2004), “Supermarket Choice and Supermarket Competition in Mar-ket Equilibrium,”Review of Economic Studies, 71, 235–263. [297] Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002),

“Bayesian Measures of Model Complexity and Fit,”Journal of the Royal Statistical Society, Ser. B, 64 (4), 583–639. [299,300]


(6)

Taber, C. R. (2000), “Semiparametric Identification and Heterogeneity in Dis-crete Choice Dynamic Programming Models,”Journal of Econometrics, 96, 201–229. [292]

Tang, C. S., Bell, D. R., and Ho, T. (2001), “Store Choice and Shopping Be-havior: How Price Format Works,”California Management Review, 43 (2), 56–74. [296,297]

Teicher, H. (1961), “Identifiability of Mixtures,”The Annals of Mathematical Statistics, 32 (1), 244–248. [294,304]

Thompson, T. S. (1989), “Identification of Semiparametric Discrete Choice Models,” Discussion Paper 249, Center for Economic Research, University of Minnesota. [301]

Wald, A. (1949), “A Note on the Consistency of the Maximum Likelihood Es-timator,”Annals of Mathematical Statistics, 20, 595–601. [304]

Wansbeek, T., Wedel, M., and Meijer, E. (2001), Comment on “Microecono-metrics,” by J. A. Hausman,Journal of Econometrics, 100, 89–91. [291] Yachew, A. (1998), “Nonparametric Regression Techniques in Economics,”

Journal of Economic Literature, 36 (2), 699–721.