
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

A Comparison of Sales Response Predictions From
Demand Models Applied to Store-Level versus
Panel Data
Rick L. Andrews, Imran S. Currim & Peter S. H. Leeflang
To cite this article: Rick L. Andrews, Imran S. Currim & Peter S. H. Leeflang (2011) A
Comparison of Sales Response Predictions From Demand Models Applied to Store-Level
versus Panel Data, Journal of Business & Economic Statistics, 29:2, 319-326, DOI: 10.1198/
jbes.2010.07225
To link to this article: http://dx.doi.org/10.1198/jbes.2010.07225


Published online: 01 Jan 2012.


Supplementary materials for this article are available online. Please click the JBES link at http://pubs.amstat.org.

A Comparison of Sales Response Predictions
From Demand Models Applied to Store-Level
versus Panel Data
Rick L. ANDREWS
Lerner College of Business and Economics, University of Delaware, Newark, DE 19716 (andrewsr@udel.edu)

Imran S. CURRIM

Paul Merage School of Business, University of California, Irvine, CA 92697-3125 (iscurrim@uci.edu)

Downloaded by [Universitas Maritim Raja Ali Haji] at 23:08 11 January 2016

Peter S. H. LEEFLANG
Faculty of Economics, University of Groningen, P.O. Box 800, 9700 AV Groningen, The Netherlands
(P.S.H.Leeflang@rug.nl)
In order to generate sales promotion response predictions, marketing analysts estimate demand models using either disaggregated (consumer-level) or aggregated (store-level) scanner data. Comparison of
predictions from these demand models is complicated by the fact that models may accommodate different forms of consumer heterogeneity depending on the level of data aggregation. This study shows via
simulation that demand models with various heterogeneity specifications do not produce more accurate
sales response predictions than a homogeneous demand model applied to store-level data, with one major
exception: a random coefficients model designed to capture within-store heterogeneity using store-level
data produced significantly more accurate sales response predictions (as well as better fit) compared to
other model specifications. An empirical application to the paper towel product category provides additional
insights. This article has supplementary material online.
KEY WORDS: Finite mixture model; Heterogeneity; Nested logit; Random coefficients model

1. INTRODUCTION
Household-level scanner panel data and store-level scanner
data often have complementary uses for manufacturers of consumer packaged goods (Bodapati and Gupta 2004). Panel data,
which track purchases of a sample of households on an ongoing basis, allow managers to explore differences in purchase
behaviors and preferences that lead to segmentation and targeting, to determine how these segments differ in terms of demographic characteristics, to examine brand switching and loyalty
patterns, to track new product trial and repeat rates, to understand the impact of marketing variables on purchase timing and
stockpiling, and to test theories of consumer behavior (Gupta et
al. 1996).
Though the advantages of panel data are well known, store-level data are widely available to marketing managers, are used
as a key resource for managerial decision making, are less expensive for firms to acquire, and require fewer computational
resources than household-level data (Chintagunta, Dubé, and
Singh 2002). In addition, Gupta et al. (1996) showed that inferences from panel data are not statistically representative of
those obtained from store-level data, though the differences in
inferences made from panel and store data may not be substantively significant when certain procedures are used for household purchase selection. Traditionally, store-level data were
used primarily to monitor category and brand performance over
time. However, partly due to these advantages of store-level
data, recent work (e.g., Chintagunta 2001; Sudhir 2001; Besanko, Dubé, and Gupta 2003; Bodapati and Gupta 2004) used
them to recover heterogeneity and segmentation structure, a task
traditionally in the domain of panel data.

While panel and store-level data have several complementary
uses, analysts in academic and industry settings use either type
of data to predict sales response to price reductions and promotions. However, it is unclear whether predictions from
demand models using panel data are more or less biased than those from models using store-level scanner data, and under
what conditions. Complicating the comparison of promotional response predictions from models applied to panel and store-level
data is the fact that different forms of consumer heterogeneity
can be captured using the two types of data. With panel data, the
focus is on heterogeneity in preferences and responses to marketing activity across households. Since store-level data lack
household identifiers (Bodapati and Gupta 2004), heterogeneity recovered by typical store-level applications is actually heterogeneity across store visits (Besanko, Dubé, and Gupta 2003)
and is often referred to as within-store heterogeneity. Bodapati
and Gupta (2004) demonstrated that parameters from a store-level model explaining within-store heterogeneity can approximate those of panel data models explaining household heterogeneity, especially when sample sizes are very large.
Though less common in the marketing literature, it is also
feasible and potentially beneficial to capture across-store and
across-week (temporal) heterogeneity with store-level data
(e.g., Hoch et al. 1995; Montgomery 1997; Van Heerde,
Leeflang, and Wittink 2000). Demographic and psychographic


© 2011 American Statistical Association
Journal of Business & Economic Statistics
April 2011, Vol. 29, No. 2

DOI: 10.1198/jbes.2010.07225

profiles of consumers typically vary across stores in different
market areas. Consumer responses to promotions may also
change over time in response to changes in the frequency
and/or depth of promotions (Raju 1992), causing consumers to
change their expectations of future promotional activities and
hence their responses to current promotional activities. Thus,
the preferences and response sensitivities of consumers may
vary across stores and over time.
Based on research showing the importance of the number
of households and the number of purchases per household on
the recovery of household-level parameter estimates (e.g., Andrews, Ainslie, and Currim 2002), data characteristics such as
the number of stores, the number of observations per store (i.e.,
the number of weeks), and the number of households per store
can also affect bias in sales response predictions from store-level and panel data models. However, the nature and extent of
the effects of such data characteristics in typical analysis settings remain unknown.
Endogeneity in prices (Shugan 2004) can affect models’
sales response predictions in different ways depending on the
heterogeneity specification and the level of data aggregation.
Endogeneity arises when there are variables for which data are
not available (such as shelf space allocation, reputation, or other
factors that vary over time but are constant across households)
that can influence a brand’s sales, and these variables are correlated with included marketing variables such as price (Chintagunta 2001). Whether endogeneity in prices affects the bias in
sales response predictions for store-level and panel data models, and if so to what extent, is not known.
This study designs a simulation experiment to determine the
effects of data aggregation (panel versus store level), heterogeneity, endogeneity, and the number of households, stores,
and weeks on bias in sales response predictions. With the ever-increasing variety of model specifications for panel and store
data available to research analysts, it is useful to know the
conditions under which prediction bias occurs and, when it occurs, to understand why. Knowing that complex models produce
nearly the same sales response predictions as simpler homogeneous models under certain conditions will allow analysts the
freedom to use simpler models in those conditions (though the
analyst should never knowingly utilize misspecified models).
Likewise, knowing the conditions under which panel data and
store-level data provide similar promotional response predictions can allow managers to use less costly, more widely available, and more computationally efficient store data for predicting promotional responses in those conditions.
In the next section, we describe the simulation study in detail. Following an examination of the simulation results, we apply various store-level and panel data models to actual Information Resources, Inc. (IRI) scanner data to investigate correspondence with the simulation results. Finally, we summarize
the study and describe directions for future research.
2. RESEARCH DESIGN
2.1 Data Generation Process
In this study, both the data generation process and the models
fitted to the simulated data are based on a nested logit formulation (Bucklin, Gupta, and Siddarth 1998; Bodapati and Gupta
2004). Given that the household makes a purchase (known as
incidence) in store s and week t, the choice probability for brand
b takes the form
$$P(b \mid \text{incidence}, X_{st}, \beta) = \frac{\exp(X_{bst}\beta)}{\sum_{k=1}^{B} \exp(X_{kst}\beta)}, \qquad (1)$$
where Xst contains brand-specific dummy variables, price, and
promotion variables for store s and week t. The purchase incidence probabilities take the form
$$P(\text{incidence} \mid X_{st}, \gamma) = \frac{\exp(\gamma_0 + \gamma_1 CV_{st})}{1 + \exp(\gamma_0 + \gamma_1 CV_{st})}, \qquad (2)$$

where $CV_{st}$ is the category value for store s and week
t, defined as the log of the denominator of Equation (1). The
category value represents the maximum utility available to the
household from making a purchase.
If the “no purchase” option is represented by b = 0, and if
the brands are represented by b = 1, . . . , B, then the probability
that the consumer will purchase brand b in store s in week t,
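$p_{bst}$, is
$$p_{bst} = \begin{cases} 1 - P(\text{incidence} \mid X_{st}, \gamma) & \text{for } b = 0 \\ P(\text{incidence} \mid X_{st}, \gamma) \times P(b \mid \text{incidence}, X_{st}, \beta) & \text{for } b \neq 0. \end{cases} \qquad (3)$$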

The incidence probabilities are determined by a constant and
the coefficient for the inclusive value (two parameters, γ0 and
γ1 ), while the brand choice probabilities are determined by
price, promotion, and four brand-specific constants for the five
brands assumed to be available to households (six parameters
in β).
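As an illustration only, the nested logit probabilities in Equations (1) to (3) can be sketched in Python as follows. The function name, the attribute layout of `X` (four brand constants with brand 5 as the base, then price and promotion), and all numerical values are assumptions made for this sketch; they are not the parameter values used in the study.

```python
import numpy as np

def nested_logit_probs(X, beta, gamma0, gamma1):
    """Return probabilities [no purchase, brand 1, ..., brand B] for one store-week.

    X      : (B, K) array of brand attributes
    beta   : (K,) brand choice coefficients
    gamma0 : incidence intercept
    gamma1 : coefficient on the category value
    """
    v = X @ beta                          # deterministic utilities X_bst * beta
    v_max = v.max()                       # shift utilities for numerical stability
    exp_v = np.exp(v - v_max)
    p_brand = exp_v / exp_v.sum()         # Equation (1)
    cv = v_max + np.log(exp_v.sum())      # category value: log of the denominator of (1)
    p_inc = 1.0 / (1.0 + np.exp(-(gamma0 + gamma1 * cv)))    # Equation (2)
    return np.concatenate(([1.0 - p_inc], p_inc * p_brand))  # Equation (3)

# Hypothetical store-week with five brands and six brand choice parameters.
dummies = np.vstack([np.eye(4), np.zeros((1, 4))])   # four constants, brand 5 as base
price = np.array([1.9, 2.1, 2.0, 1.8, 2.2])
promo = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
X = np.column_stack([dummies, price, promo])
beta = np.array([0.5, 0.3, 0.1, -0.2, -1.5, 0.8])
p = nested_logit_probs(X, beta, gamma0=-2.0, gamma1=0.6)
```

The returned vector has one entry for the "no purchase" option followed by one entry per brand, and its entries sum to one by construction.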
The data characteristics that are manipulated and the levels
of those characteristics are as follows (see justification in Appendix A):
1. The degree of heterogeneity between consumer segments:
Mean difference = 0.5 or 1.0;
2. The degree of heterogeneity within consumer segments:
Variance = 0.05 or 0.25;
3. Number of weeks T: 50 or 100;
4. Number of stores S: 10 or 50;
5. Number of households per store: 400 or 800;
6. Correlation between error terms of price and demand
equations (strength of endogeneity): 0 or 0.50.
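The six two-level factors listed above define a full factorial design, which can be enumerated with a short Python sketch. The dictionary keys are hypothetical labels chosen for readability; only the factor levels come from the list.

```python
from itertools import product

# Factor levels taken from the list above; the key names are illustrative.
factors = {
    "between_segment_mean_diff": [0.5, 1.0],
    "within_segment_variance": [0.05, 0.25],
    "weeks": [50, 100],
    "stores": [10, 50],
    "households_per_store": [400, 800],
    "endogeneity_corr": [0.0, 0.5],
}

# One dictionary per design cell (every combination of factor levels).
cells = [dict(zip(factors, levels)) for levels in product(*factors.values())]
n_cells = len(cells)

# Largest number of household-level store visits generated in any cell.
max_visits = max(
    c["weeks"] * c["stores"] * c["households_per_store"] for c in cells
)
```

The design has 2^6 = 64 cells, and the largest cell produces 100 × 50 × 800 = 4,000,000 household-level store visits. If the study's 320 datasets are spread evenly over the cells, that would imply five datasets per cell; this is an inference, not something stated in this section.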
We use the following data generation process to create the
choices comprising the 320 panel and store-level datasets.
First, the response parameters for the incidence model (γ)
and the brand choice model (β) are generated. We generate
across-household heterogeneity from a mixture of normal distributions, but with different stores drawing from the mixture
components (i.e., segments) with different weights to reflect
across-store differences in clientele composition. We assume
two segments of consumers for all datasets. The means of the
segment-specific coefficient vectors differ on average by either 0.5 or 1.0, depending on Factor 1. In addition, we assume within-segment variation in each segment (i.e., consumers
within a segment are not exactly alike), having a normal distribution with variance 0.05 or 0.25, depending on the level of
Factor 2. Factors 1 and 2 taken together result in the coefficients having a mixture of normal distributions, with the modes

being nearer or farther apart, depending on Factor 1, and being
more or less peaked, depending on Factor 2. The composition
of store clientele is determined from a uniform distribution for
each dataset, with as little as 10% and as much as 90% of a
store’s clientele being drawn from a given segment.
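A minimal sketch of this heterogeneity structure in Python is given below. It assumes that every coefficient in segment 2 is shifted from segment 1 by the Factor 1 mean difference and uses zeros as placeholder segment means; the actual segment means are given in the paper's Appendix A and are not reproduced here. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_stores, n_households, n_params = 10, 400, 8   # 2 incidence + 6 brand choice parameters
mean_diff, within_var = 0.5, 0.05               # one level each of Factors 1 and 2

# Placeholder segment means (assumption): segment 2 = segment 1 + mean_diff.
base_mean = np.zeros(n_params)
segment_means = np.vstack([base_mean, base_mean + mean_diff])

# Share of each store's clientele drawn from segment 1: uniform on [0.1, 0.9].
store_share_seg1 = rng.uniform(0.1, 0.9, size=n_stores)

# Segment label per household (0 = segment 1, 1 = segment 2).
u = rng.uniform(size=(n_stores, n_households))
segment = (u > store_share_seg1[:, None]).astype(int)

# Household coefficients: segment mean plus a normal within-segment deviation.
coefs = segment_means[segment] + rng.normal(
    0.0, np.sqrt(within_var), size=(n_stores, n_households, n_params)
)
```

Pooling `coefs` across households gives a two-component mixture of normals whose modes move apart as `mean_diff` grows and flatten as `within_var` grows.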
Marketing mix data Xst , consisting of price and promotion,
are then constructed. Prices must be generated in such a manner that an endogeneity condition can be created. A linear price
equation is used in which prices are determined by (i) a cost factor (drawn from a standard normal distribution), (ii) an unobserved factor (normally distributed) that
will also affect demand, thus giving rise to the endogeneity
problem, and (iii) a normally distributed error term. Demand
is a function of (i) brand-specific constants, price, and promotion, (ii) an extreme value error term, and in datasets with an
endogeneity condition, (iii) the unobserved factor affecting the
price setting of firms, such as brand reputation or style. The
unobserved factor affects demand only in datasets with an endogeneity condition. (See Appendix A for the parameters assumed for the price and demand equations.) The unobserved
factor is assumed to vary across brands and weeks, but is constant across stores, consistent with the literature on endogeneity
(e.g., Villas-Boas and Winer 1999). The extreme value distribution for the demand equation error term gives rise to the logit
functional form.
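The price generation step described above can be sketched in Python as follows. All coefficients are hypothetical (the study's values are in its Appendix A), and the choice to let the cost factor and the error term vary by store is an assumption of this sketch. The unobserved factor varies across brands and weeks but is constant across stores, as in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stores, n_brands, n_weeks = 50, 5, 100
endogenous = True   # set to False for datasets without an endogeneity condition

# Hypothetical coefficients, for illustration only.
base_price, cost_coef, xi_price_coef, xi_demand_coef = 2.0, 0.2, 0.2, 0.5

cost = rng.standard_normal((n_stores, n_brands, n_weeks))          # (i) cost factor
xi = rng.standard_normal((n_brands, n_weeks))                      # (ii) unobserved factor
noise = 0.1 * rng.standard_normal((n_stores, n_brands, n_weeks))   # (iii) error term

# Linear price equation; xi is broadcast across stores.
price = base_price + cost_coef * cost + xi_price_coef * xi[None, :, :] + noise

# The unobserved factor enters demand utility only in the endogeneity condition.
demand_shock = xi_demand_coef * xi if endogenous else np.zeros_like(xi)

# Correlation between store-averaged prices and the unobserved factor.
corr_price_xi = np.corrcoef(price.mean(axis=0).ravel(), xi.ravel())[0, 1]
```

Because `xi` appears in both the price equation and, when `endogenous` is true, the demand utility, an analyst who omits it from the demand model faces a price regressor that is correlated with the error term.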
After generating demand utilities in this manner, we use the
nested logit formulation [Equations (1) to (3)] to calculate the
probabilities for the “no purchase” (brand 0) option and for the
purchase of each of the five brands b, b = 1, ..., 5. Household-level purchase behavior on a store visit is simulated by using a uniform random draw to pick from the vector of choice
probabilities. All purchases are assumed to be single-unit purchases (see also Bodapati and Gupta 2004). Household-level
store visits are aggregated across households to form a store-level dataset with S × T observations. The observation for store
s and week t records the number of purchases for each brand b.
Panel datasets are formed by randomly sampling 500 panelists
across 50 weeks from the household-level store visits generated
previously; it is not computationally feasible to retain all
store visits given the massive numbers (four million) generated
for some datasets.
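The simulation and aggregation steps can be illustrated with the following Python sketch for a single store-week. The probabilities are arbitrary placeholders drawn from a Dirichlet distribution rather than from the nested logit model, and the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_choices(probs, rng):
    """Draw one choice per row of probs using a uniform random draw.

    probs : (N, B + 1) array; column 0 is "no purchase", columns 1..B are brands.
    Returns an (N,) integer array of chosen alternatives.
    """
    u = rng.uniform(size=(probs.shape[0], 1))
    cdf = np.cumsum(probs, axis=1)
    # Count cumulative probabilities below u: the index of the chosen alternative.
    choice = (u > cdf).sum(axis=1)
    # Guard against rounding error in the last cumulative probability.
    return np.minimum(choice, probs.shape[1] - 1)

n_households, n_brands = 400, 5
# Placeholder household-level choice probabilities (each row sums to 1).
probs = rng.dirichlet(np.ones(n_brands + 1), size=n_households)
choices = draw_choices(probs, rng)

# Store-level observation: number of visits ending in each alternative.
counts = np.bincount(choices, minlength=n_brands + 1)
```

Entries 1 to 5 of `counts` correspond to the brand purchase counts recorded for one store and week; repeating this over all stores and weeks gives the S × T store-level observations.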
2.2 Models Fitted to Simulated Datasets
We estimate 10 different model specifications for the simulated datasets depending on the specification of consumer heterogeneity (across store visits, households, weeks, or stores).
The models also assume that heterogeneity is described by either continuous distributions (random coefficients models) or
nonparametric, discrete distributions (finite mixture models).
All model specifications are based on the same nested logit
formulation described earlier [Equations (1) to (3)]. For each
store-level dataset we estimate:
1. A homogeneous nested logit model;
2. A finite mixture nested logit model with within-store heterogeneity;
3. A random coefficients nested logit model with within-store heterogeneity;
4. A finite mixture nested logit model with across-store heterogeneity;
5. A random coefficients nested logit model with across-store heterogeneity;
6. A finite mixture nested logit model with temporal heterogeneity; and
7. A random coefficients nested logit model with temporal heterogeneity.
For each panel dataset, we estimate:
1. A homogeneous nested logit model;
2. A random coefficients nested logit model with across-household heterogeneity; and
3. A finite mixture nested logit model with across-household heterogeneity.
The mathematical formulations of these models, including the
expressions for the log-likelihood functions, are described in
Appendix B.
For datasets in which prices are endogenous, we use a control function approach (Petrin and Train 2010) to account for
the endogeneity. The control function approach involves first
estimating a price equation by regressing observed prices on
suitable instruments. Then the fitted residuals from the price
equation are inserted as an additional regressor into the demand
equation. We estimated model specifications with and without an additional
random demand shock in the demand equation
(analogous to random brand-specific intercepts that vary across
weeks but not across households or stores). Without exception, models
without the additional random demand shock produced more
accurate sales response predictions. Petrin and Train (2010)
also found that the additional random demand shock did not
result in a better model performance. Thus, we report results in
which model specifications do not include an additional random
demand shock in the demand equation.
The study by Andrews and Ebbes (2009) compared the control function approach with other approaches for controlling endogeneity in logit-based demand models and found that it performs almost identically to a simultaneous equations procedure
(e.g., Villas-Boas and Winer 1999) and better than other widely
used procedures for controlling endogeneity (e.g., Berry, Levinsohn, and Pakes 1995). We assume a worst-case scenario in
which the analyst does not have access to the cost variables
used to generate prices, which would serve as useful instruments
if available. Instead, we utilize the readily available instruments
$\tilde{Z}_{jts} = (P_{sjt} - \bar{P}_{jt})$, where $P_{sjt}$ is the price of brand j in store s during week t and $\bar{P}_{jt}$ is the average price of brand j at time t across
stores. It is possible to show that these store mean-centered
price instruments are uncorrelated with the unobserved factor
affecting prices and demand and that the instruments are also
correlated with actual prices (Andrews and Ebbes 2009), satisfying all properties of desirable instruments.
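The first stage of a control function approach with store mean-centered instruments might look like the Python sketch below for a single brand. The function name and the simulated prices are hypothetical, and the first-stage regression here uses only a constant and the instrument, which may be simpler than the specification used in the study.

```python
import numpy as np

def control_function_residuals(price):
    """First-stage residuals for one brand.

    price : (S, T) array of prices across stores and weeks.
    Returns (instrument, residuals), both of shape (S, T).
    """
    # Instrument: deviation of each store's price from the across-store weekly mean.
    z = price - price.mean(axis=0, keepdims=True)
    # First stage: OLS of price on a constant and the instrument.
    design = np.column_stack([np.ones(z.size), z.ravel()])
    coef, *_ = np.linalg.lstsq(design, price.ravel(), rcond=None)
    resid = price.ravel() - design @ coef
    return z, resid.reshape(price.shape)

# Simulated prices with a week-level unobserved factor common to all stores.
rng = np.random.default_rng(3)
xi = rng.standard_normal(100)
price = 2.0 + 0.2 * xi[None, :] + 0.1 * rng.standard_normal((50, 100))
z, resid = control_function_residuals(price)
```

Because the instrument sums to zero across stores in every week, it is orthogonal to any factor that is constant across stores, while the residuals retain the week-level variation in prices. In the second stage, the residuals would be added to the utility function as an extra regressor.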
In our study, the linear pricing equation assumed for the main
simulation is consistent with marginal cost pricing (pure competition) and cost-plus (fixed markup) pricing since the unobserved demand shock enters the price equation linearly and is
normally distributed. Studies showed that such pricing models are very widely used in the industry (see the discussion in
Park and Gupta 2009). In addition to a linear pricing model,

the study by Andrews and Ebbes (2009) also generated prices
using Nash pricing, as would occur in a differentiated products
oligopoly, for which the unobserved demand shock enters the
price equation in a highly nonlinear fashion. They found that,
even under a Nash pricing scenario, the store mean-centered
price instruments used in conjunction with the control function approach very effectively correct endogeneity problems in
logit-based demand models.
Calculation of sales response predictions, described in the
following, requires the estimation of unit-level parameters for
each model. For the finite mixture models, this is done by calculating the posterior probabilities of segment membership using Bayes’ theorem and then using the posterior probabilities
to form a weighted average of the segment-level parameters.
For the random coefficients models, a simulation-based procedure, also based on Bayes’ theorem, is required for estimating
unit-level parameters (see Train 2003, chap. 11). This procedure, which is not computationally intensive, allows random coefficients models estimated with simulated maximum
likelihood to share the same advantages as those estimated with
hierarchical Bayes techniques.
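For the finite mixture case, the posterior weighting described above can be sketched in Python as follows. The function name, argument layout, and example numbers are assumptions for illustration; the simulation-based procedure for random coefficients models is not covered by this sketch.

```python
import numpy as np

def posterior_unit_params(loglik, seg_weights, seg_params):
    """Unit-level parameters for a finite mixture model via Bayes' theorem.

    loglik      : (N, G) log-likelihood of each unit's data under each segment
    seg_weights : (G,) estimated segment sizes (prior probabilities)
    seg_params  : (G, K) segment-level parameter vectors
    Returns (posterior, unit_params) with shapes (N, G) and (N, K).
    """
    log_post = loglik + np.log(seg_weights)           # log prior + log likelihood
    log_post -= log_post.max(axis=1, keepdims=True)   # stabilize before exponentiating
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # normalize to posterior probabilities
    return post, post @ seg_params                    # posterior-weighted average

# Two units, two segments, two parameters (hypothetical values).
loglik = np.array([[-10.0, -12.0], [-8.0, -8.0]])
seg_weights = np.array([0.5, 0.5])
seg_params = np.array([[0.0, 1.0], [2.0, 3.0]])
post, unit_params = posterior_unit_params(loglik, seg_weights, seg_params)
```

In this example the second unit is equally likely under both segments, so its unit-level parameters are the simple average of the two segment vectors, while the first unit is pulled toward segment 1.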

2.3 Calculating Sales Response Predictions
Our goal is to compare the promotional response predictions from panel and store-level models accommodating different types of heterogeneity. To create a measure of promotional
response, we randomly select one brand for each dataset and
institute changes in the promotional depth and frequency over
the entire 50- or 100-week period. To change the depth of promotion, for every existing promotional event, we increase the
baseline depth for the focal brand by a randomly determined
amount, uniformly distributed between $0.10 and $0.25 (the
baseline mean promotional price reductions were $0.25–$0.50).
To increase the frequency, we count the number of promotional
events for the focal brand over the entire period, create 10%–25%
more events, uniformly distributed, and lower the price
by the average price reduction (including the additional depth
described earlier) for the additional promotional events. Using
each of the 10 estimated models, we then calculate the predicted
percentage change in market share for the focal brand compared
to the predicted baseline market share before the changes in the
promotional environment were made. That is, for each of the 10
models, we calculate the baseline predicted market share for the
focal brand before making any promotional changes, P0, make
changes in the promotional environment, and then calculate the
average predicted choice probability of the focal brand, P1. The
estimated promotional response for a model is calculated as the
percentage change R_est = (P1 − P0)/P0.
To assess the accuracy of these sales response predictions, we
calculate the true promotional responses (R_true) using the household-level store visit data, the true household-level
γ and β parameters, the newly generated price and promotion
data, and the unobserved variables affecting demand. We calculate the percentage change in market share for the focal brand
due to the promotion increase, as we did for each of the 10 estimated models. Once the true percentage changes are calculated,
we calculate for each model the squared errors between the estimated and true percentage sales responses, (R_est − R_true)^2, and
use them as dependent variables in our analysis. Since mean
squared errors capture both bias and variance, we also calculate the bias
alone, (R_est − R_true), and use it as a dependent variable as well.

3. SIMULATION RESULTS
3.1 Analysis of Fit
We begin by examining the posterior fit of the various models. The posterior fit is obtained by first calculating estimates
of unit-level parameters for models capturing some type of heterogeneity (store visit, household, store, or week), as discussed
in Section 2.2. Using the unit-level parameters, we calculate the
likelihood of the data.
To assess the effects of experimental factors on model fit, we
conducted a repeated measures analysis of variance (ANOVA)
in which the posterior model likelihood is the dependent variable, model type is a within-datasets factor, and the data characteristics are between-datasets factors; interactions of model
type with each data characteristic are also included. According
to the ANOVA results (not shown for brevity), there are significant differences in fit among model types. Simple contrasts
show that all model specifications fit better than the baseline
homogeneous nested logit applied to store-level data (M1), with
the exception of the homogeneous nested logit applied to panel
data (M8, p = 0.127). Thus, accommodating any type of consumer heterogeneity improves model fit, but the level of data
aggregation (panel versus store-level) does not affect the results
for fit. All manipulated factors except the number of households
per store have either significant main effects or significant interactions with model type.
Table 1 [part (a)] shows the mean log-likelihood values by
model type and experimental factor for all factors having significant interactions with model type (for other factors, the pattern
of results does not depend on model specification). Model M3
(random coefficients nested logit explaining within-store heterogeneity using store data) has by far the best fit to the data.
In general, the best-fitting models are the ones that allow the
most flexible heterogeneity structures. For the random coefficients models, for example, the best-fitting models are the ones
that allow heterogeneity across store visits (within-store heterogeneity), across households, across weeks, and across stores, in
that order, coinciding with the number of units allowed to have
different sets of parameters. (Similar results are obtained for the
finite mixture specifications.) The worst-fitting models are M1
(homogeneous model for store data), M8 (homogeneous model
for panel data), and M5 (random coefficients model, across-store heterogeneity), which allow little (in the case of M5) or
no (in the case of M1 and M8) heterogeneity in parameters.
The factor-level means in the right-most column of Table 1,
part (a), generally have the expected patterns. Only within-segment heterogeneity and the strength of endogeneity have
statistically significant main effects, the largest of which is for
the endogeneity factor. Endogeneity in prices results in significantly worse fit because random demand shocks correlated with
prices are added to the utility function to create the endogeneity
condition, and additional randomness always produces worse
fit. The end result is an increase in the variance of the error
term.
3.2 Analysis of Bias in Sales Response Predictions
We also conducted an ANOVA on the bias in sales response
predictions. Simple contrasts show that model M3 (random
coefficients nested logit, within-store heterogeneity) produces
significantly lower bias than the baseline homogeneous nested
logit model applied to store-level data (M1), whereas all other
model specifications produce significantly higher bias than M1.
The number of stores and the number of households per store
have significant main effects on bias (increases in both factors
produce lower bias, as expected), while between-segment heterogeneity and strength of endogeneity have significant interactions with model type (but no main effects). Other manipulated
factors have no effects on bias.

Table 1. Simulation results for fit, bias, and MSE

Data factors                    M1      M2      M3      M4      M5      M6      M7      M8      M9     M10   Means*

(a) Fit: M3 fits significantly better than all other models according to log-likelihood values
Between-segment (BS) heterogeneity
  Mean separation = 0.50    −1.104  −1.104  −0.626  −1.103  −1.104  −1.082  −1.061  −1.103  −1.041  −1.077  −1.041
  Mean separation = 1.00    −1.127  −1.054  −0.578  −1.122  −1.126  −1.105  −1.087  −1.127  −1.038  −1.073  −1.044
Within-segment (WS) heterogeneity
  Variance = 0.05           −1.088  −1.055  −0.608  −1.085  −1.087  −1.065  −1.047  −1.087  −1.027  −1.052  −1.020*
  Variance = 0.25           −1.144  −1.103  −0.597  −1.140  −1.143  −1.121  −1.100  −1.143  −1.053  −1.098  −1.064
Weeks
  50                        −1.098  −1.061  −0.581  −1.095  −1.098  −1.074  −1.054  −1.098  −1.023  −1.059  −1.024
  100                       −1.134  −1.097  −0.624  −1.130  −1.132  −1.112  −1.093  −1.131  −1.057  −1.092  −1.060
Stores
  10                        −1.118  −1.089  −0.570  −1.114  −1.117  −1.093  −1.095  −1.116  −1.044  −1.077  −1.043
  50                        −1.114  −1.069  −0.635  −1.111  −1.113  −1.093  −1.053  −1.114  −1.035  −1.074  −1.041
Strength of endogeneity
  ρ = 0.00**                −1.063  −1.007  −0.647  −1.058  −1.062  −1.062  −1.063  −1.061  −0.981  −1.018  −1.002*
  ρ = 0.50                  −1.169  −1.151  −0.558  −1.167  −1.169  −1.124  −1.085  −1.168  −1.098  −1.132  −1.082
Means                       −1.116  −1.079  −0.602  −1.112  −1.115  −1.093  −1.074  −1.115  −1.040  −1.075  −1.042
95% CI, lower bound         −1.138  −1.116  −0.617  −1.134  −1.137  −1.114  −1.094  −1.136  −1.060  −1.096
95% CI, upper bound         −1.094  −1.042  −0.588  −1.091  −1.093  −1.072  −1.054  −1.093  −1.019  −1.054

(b) Bias: M3 has significantly lower bias than all other models
Between-segment heterogeneity
  Mean separation = 0.50     0.218   0.246   0.183   0.247   0.248   0.251   0.250   0.240   0.275   0.253   0.241
  Mean separation = 1.00     0.204   0.274   0.162   0.239   0.239   0.243   0.242   0.226   0.262   0.239   0.233
Strength of endogeneity
  ρ = 0.00**                 0.203   0.263   0.164   0.248   0.247   0.248   0.247   0.248   0.266   0.249   0.238
  ρ = 0.50                   0.219   0.257   0.181   0.239   0.239   0.246   0.245   0.218   0.271   0.243   0.236
Means                        0.211   0.260   0.172   0.243   0.243   0.247   0.246   0.233   0.269   0.246   0.237
95% CI, lower bound          0.197   0.242   0.158   0.227   0.227   0.231   0.230   0.217   0.250   0.229
95% CI, upper bound          0.225   0.278   0.186   0.259   0.259   0.263   0.262   0.249   0.287   0.263

(c) MSE: M3 has significantly lower MSE than all other models
Between-segment heterogeneity
  Mean separation = 0.50     0.065   0.088   0.051   0.084   0.084   0.087   0.086   0.082   0.106   0.090   0.082
  Mean separation = 1.00     0.058   0.101   0.043   0.078   0.077   0.079   0.079   0.068   0.093   0.078   0.075
Strength of endogeneity
  ρ = 0.00**                 0.059   0.094   0.042   0.085   0.085   0.086   0.084   0.087   0.100   0.088   0.081
  ρ = 0.50                   0.064   0.095   0.051   0.077   0.077   0.081   0.081   0.063   0.099   0.080   0.077
Means                        0.061   0.095   0.047   0.081   0.081   0.083   0.083   0.075   0.100   0.084   0.079
95% CI, lower bound          0.053   0.081   0.039   0.070   0.070   0.072   0.071   0.065   0.086   0.072
95% CI, upper bound          0.070   0.109   0.054   0.092   0.092   0.094   0.094   0.085   0.113   0.096

NOTE: Only factors with statistically significant main effects or interactions with model type are shown. * indicates statistically significant main effects for factor; ** is correlation
between price and demand equation error terms. Models fitted to store-level data: M1, Homogeneous nested logit; M2, Finite mixture nested logit, within-store heterogeneity; M3, Random
coefficients nested logit, within-store heterogeneity; M4, Finite mixture nested logit, across-store heterogeneity; M5, Random coefficients nested logit, across-store heterogeneity; M6,
Finite mixture nested logit, temporal heterogeneity; M7, Random coefficients nested logit, temporal heterogeneity. Models fitted to panel data: M8, Homogeneous nested logit; M9,
Random coefficients nested logit, across-household heterogeneity; M10, Finite mixture nested logit, across-household heterogeneity.

Table 1, part (b) shows the mean bias values by model type
and experimental factor for all factors having significant interactions with model type (for other factors, the pattern of results
does not depend on the model specification). Consistent with
the results for fit we find that M3 (random coefficients nested
logit, within-store heterogeneity) has significantly lower bias

in sales response predictions than all other models. Though all
other specifications produce significantly higher bias than the
homogeneous nested logit applied to store-level data, these differences across model types are small from a practical standpoint, regardless of the heterogeneity specification and regardless of the level of data aggregation. M9 (random coefficients
nested logit, across-household heterogeneity) and M2 (finite
mixture nested logit, within-store heterogeneity) have higher
bias values than all other models, which is surprising given the
good fit of these models [Table 1, part (a)].
Though endogeneity affects model biases in slightly different ways, resulting in a significant model by endogeneity interaction, the strength of endogeneity has no significant main effect [note the mean values in the right-most column of Table 1,
part (b)]. The lack of a significant main effect for endogeneity
indicates that the instrumental variables estimation is effective
for parameter estimation and generation of sales response predictions, despite the detrimental effect of endogeneity on fit.
3.3 Analysis of Mean Squared Error in Sales Response Predictions
Finally, we conducted a repeated measures ANOVA on the
squared sales response prediction errors. The mean squared error (MSE) captures bias and variance in sales response predictions and is thus a more comprehensive measure of error. As
with the bias measure, M3 produces significantly lower MSE
values than the baseline homogeneous logit model applied to
store-level data, while all other specifications produce significantly higher MSE values than the homogeneous logit. All
MSE results in Table 1, part (c) are completely consistent with
the bias results in part (b), so we do not discuss these results in
detail.
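The statement that MSE captures both bias and variance rests on the standard decomposition MSE = bias^2 + variance for predictions of a fixed true value. The following minimal sketch checks that identity numerically. The true value, the size of the bias, and the noise level are hypothetical, and bias is taken here as the mean signed error, which may differ from the exact bias measure used in the simulation study.

```python
import random
import statistics

def decompose_mse(predictions, truth):
    """Return (bias, variance, mse) of predictions around a known true value."""
    errors = [p - truth for p in predictions]
    bias = statistics.fmean(errors)
    # Population variance (divide by n) so that the identity holds exactly.
    variance = statistics.pvariance(predictions)
    mse = statistics.fmean([e * e for e in errors])
    return bias, variance, mse

rng = random.Random(0)
truth = 100.0  # hypothetical true sales response
# Hypothetical predictions: biased upward by 5 units, with noise of SD 10.
predictions = [truth + 5.0 + rng.gauss(0.0, 10.0) for _ in range(1000)]

bias, variance, mse = decompose_mse(predictions, truth)
print(f"bias={bias:.3f} variance={variance:.3f} mse={mse:.3f}")
print(f"bias^2 + variance = {bias ** 2 + variance:.3f}")
```

Because the two printed quantities agree, a model can rank well on bias yet poorly on MSE only if its predictions are more variable, which is why MSE is treated as the more comprehensive criterion.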
To see how the results of the simulation translate to an actual
scanner panel dataset, we estimate all 10 model specifications
using scanner panel data from IRI.
4. APPLICATION TO ACTUAL DATA
The panelists for this application, who shopped in nine stores
located in a Chicago suburban area, are tracked over the 112-week period from September 1995 to November 1997. A random sample of 300 households tracked over 52 weeks is used
for analysis. The total number of store trips made by the sample
panelists during the calibration period was 25,105, with 2022 of
the trips resulting in the purchase of paper towels; panelists purchased 3499 rolls of paper towels. We focus on single-roll packs
in the paper towel category. Brand names with 3% or greater
market share were retained for analysis. Price, store feature advertising, and aisle display data are available. The price variable is shelf price, inclusive of promotions. We found that, of
the two promotional variables, only aisle display was important
as a predictor.
To create a measure of promotional response, we use the
same general procedure used in the simulation for instituting
random price reductions through increases in the promotional
depth (0.10–0.25) and frequency (10%–25%), with three differences. First, since we have only one dataset, we simulate new
promotion environments for each brand separately, whereas in
the simulation study we randomly selected one brand for analysis in each dataset. Second, to control simulation error associated with the generation of the new promotion environments,
we generate 300 new promotional environments (replications)
for each brand for the one dataset, whereas in the simulation,
we generated one new promotional environment for each of
320 datasets. Finally, for the simulation, since the true sales response predictions were known, we computed bias and MSE
criteria to assess the accuracy of the models’ sales response
predictions. Since the true sales response predictions are not
known with the actual data, we assess the convergence of predictions from different heterogeneity specifications by presenting the correlations of the models’ sales response predictions
across simulated promotional scenarios.
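The procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Appendix A: the way depth and frequency are drawn, the constant base price, and the `predicted_sales` function (a simple binary-logit stand-in for a fitted demand model) are all hypothetical. Only the ranges for depth and frequency, the 52-week horizon, and the 300 replications come from the text; the two price coefficients are borrowed from the estimates reported for M1 and M3 purely as example inputs.

```python
import math
import random

def new_promo_prices(base_prices, rng):
    """Apply random price cuts to one brand's weekly prices (assumed scheme)."""
    depth = rng.uniform(0.10, 0.25)      # fractional price reduction
    frequency = rng.uniform(0.10, 0.25)  # share of weeks on promotion
    return [p * (1.0 - depth) if rng.random() < frequency else p
            for p in base_prices]

def predicted_sales(prices, price_coef, intercept=2.0):
    """Hypothetical stand-in for a fitted model: logit share summed over weeks."""
    return sum(1.0 / (1.0 + math.exp(-(intercept + price_coef * p)))
               for p in prices)

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
base_prices = [1.0] * 52  # hypothetical: 52 weeks at a regular price of 1.00
response_a, response_b = [], []
for _ in range(300):  # 300 replications, as in the application
    promo = new_promo_prices(base_prices, rng)
    # Sales response = predicted sales in the new environment minus baseline.
    response_a.append(predicted_sales(promo, -1.09)
                      - predicted_sales(base_prices, -1.09))
    response_b.append(predicted_sales(promo, -2.24)
                      - predicted_sales(base_prices, -2.24))

r = pearson(response_a, response_b)
print(f"correlation of predictions across scenarios: {r:.3f}")
```

A correlation near 1 means the two models rank the promotional scenarios in nearly the same way even when their predicted levels differ, which is the sense of convergence used in this section.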
In Table 2 we show (a) the posterior fit of the various models and (b) the correlations of the models’ sales response predictions across simulated promotional scenarios. Looking first
at model fit, we see that, as was the case with the simulation,
M3 fits far better than any other model, with M9 and M2 also
fitting better than most other models. Consistent with the simulation results, models specified with within-store heterogeneity
or across-household heterogeneity fit the data best due to their
less restrictive assumptions on the nature of heterogeneity.
The correlations of sales response predictions across simulated scenarios are shown in part (b) of Table 2. The convergence of predictions from the homogeneous models (M1 and
M8) and various store-level specifications (M4, M5, M6, and
M7) is generally high and consistent with the findings of the
simulation. The predictions of M9 (random coefficients nested
logit, across-household heterogeneity) and M2 (finite mixture
nested logit, within-store heterogeneity) are not convergent
with those of other models, despite the good fit of these models. For M9 (random coefficients nested logit, across-household
heterogeneity), the distribution of price coefficients is not credible, with a mean of −6.01 and a standard deviation of 4.19.
As a point of reference, the homogeneous models M1 and M8
produce statistically significant price coefficient estimates of
−1.09. The posterior estimates of household-level price coefficients from M9 range from −10.53 to 3.95, an implausibly
wide range that includes positive price coefficients. These
estimates should serve as warning signals to an analyst that perhaps the model should not be used as a basis for managerial
decision making.
Closer inspection of the M2 model estimates shows that the
model is poorly identified and unstable. The standard error of
one of the brand constants is 502, and in some runs, the covariance matrix of the parameters even fails to invert. These are
clear signals to the analyst that the model should not be used
for managerial decision making.
M3 (random coefficients nested logit, within-store heterogeneity) produces predictions that are reasonably consistent
with those of other models. Inspection of the model estimates
reveals believable parameters for the distribution of the price
coefficients (mean = −2.24, SD = 0.32) and promotion coefficients (mean = 1.89, SD = 0.73). Thus, we have no reason to
believe that M3 performs any differently in the empirical application than it did in the simulation study, given its excellent fit
and reasonable parameter estimates.
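One way to make the face-validity check concrete is to ask what share of price coefficients a reported distribution implies to be positive. The sketch below assumes the random coefficients are normally distributed, which is an assumption made here for illustration and is not confirmed by the text; the means and standard deviations are those reported above for M9 and M3.

```python
import math

def share_positive(mean, sd):
    """P(coefficient > 0) when the coefficient is Normal(mean, sd**2)."""
    return 0.5 * math.erfc(-mean / (sd * math.sqrt(2.0)))

# Means and SDs of the price-coefficient distributions reported in the text.
reported = {"M9": (-6.01, 4.19), "M3": (-2.24, 0.32)}
shares = {name: share_positive(m, s) for name, (m, s) in reported.items()}

for name, share in shares.items():
    print(f"{name}: implied share of positive price coefficients = {share:.4g}")
```

Under the normality assumption, the M9 estimates imply that roughly 7 to 8 percent of households respond positively to price increases, whereas the M3 estimates imply essentially none, in line with the judgment that only the latter distribution is credible.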
In conclusion, we observe a general consistency between the
analysis of the paper towel category and the simulation results.


Table 2. Empirical application to data from paper towel product category

(a) Posterior fit

Model   Posterior fit, LL
M3      −0.1610
M9      −0.3054
M10     −0.3614
M2      −0.3655
M5      −0.3929
M4      −0.3996
M7      −0.4002
M6      −0.4071
M1      −0.4100
M8      −0.4100

(b) Correlations of sales response predictions across simulated scenarios

Model     M1     M2     M3     M4     M5     M6     M7     M8     M9    M10
M1      1.00
M2     −0.42   1.00
M3      0.91  −0.55   1.00
M4      1.00  −0.37   0.88   1.00
M5      0.95  −0.47   0.81   0.96   1.00
M6      0.99  −0.39   0.91   0.99   0.92   1.00
M7      0.90  −0.66   0.91   0.87   0.87   0.90   1.00
M8      0.96  −0.25   0.83   0.96   0.89   0.97   0.82   1.00
M9     −0.17   0.86  −0.30  −0.11  −0.19  −0.17  −0.49  −0.03   1.00
M10     0.54   0.47   0.27   0.58   0.46   0.55   0.19   0.69   0.60   1.00

NOTE: Models fitted to store-level data: M1, Homogeneous nested logit; M2, Finite mixture nested logit, within-store heterogeneity; M3, Random coefficients nested logit, within-store heterogeneity; M4, Finite mixture nested logit, across-store heterogeneity; M5, Random coefficients nested logit, across-store heterogeneity; M6, Finite mixture nested logit, temporal heterogeneity; M7, Random coefficients nested logit, temporal heterogeneity. Models fitted to panel data: M8, Homogeneous nested logit; M9, Random coefficients nested logit, across-household heterogeneity; M10, Finite mixture nested logit, across-household heterogeneity.

The ordering of the models according to fit is generally consistent, with M3 having vastly superior fit and M2 and M9 fitting
better than most other models. We observe in the empirical application that M9 and M2 produce sales response predictions
that are inconsistent with those of other models; in the simulation, we observe that M9 and M2 have the largest biases in
sales response predictions as well as the largest MSE. In the
empirical application, inspection of the model estimates produces strong signals as to the validity of the predictions.
We do not mean to suggest that M2 and M9 produce such
poor results across all empirical applications. The analysis of a
single dataset from the paper towel category shows that the idiosyncrasies of any particular dataset can result in very poor outcomes for some models. In contrast, simulation results produce
insights into the patterns of model convergence and divergence
over a large number of conditions and replications, pointing to
most likely outcomes.
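The divergence of M2 and M9 can be summarized directly from the correlations in Table 2, part (b). The sketch below averages each model's correlation with the other nine models. The averaging is a summary device introduced here for illustration, not a statistic reported by the authors; the correlation values are transcribed from the table.

```python
MODELS = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8", "M9", "M10"]

# Lower triangle of Table 2, part (b); row i holds correlations with M1..Mi.
LOWER = [
    [1.00],
    [-0.42, 1.00],
    [0.91, -0.55, 1.00],
    [1.00, -0.37, 0.88, 1.00],
    [0.95, -0.47, 0.81, 0.96, 1.00],
    [0.99, -0.39, 0.91, 0.99, 0.92, 1.00],
    [0.90, -0.66, 0.91, 0.87, 0.87, 0.90, 1.00],
    [0.96, -0.25, 0.83, 0.96, 0.89, 0.97, 0.82, 1.00],
    [-0.17, 0.86, -0.30, -0.11, -0.19, -0.17, -0.49, -0.03, 1.00],
    [0.54, 0.47, 0.27, 0.58, 0.46, 0.55, 0.19, 0.69, 0.60, 1.00],
]

def corr(i, j):
    """Look up the symmetric correlation between models i and j."""
    return LOWER[i][j] if j <= i else LOWER[j][i]

n = len(MODELS)
mean_corr = {
    MODELS[i]: sum(corr(i, j) for j in range(n) if j != i) / (n - 1)
    for i in range(n)
}

# List models from least to most convergent with the rest.
for name in sorted(mean_corr, key=mean_corr.get):
    print(f"{name}: mean correlation with other models = {mean_corr[name]:.2f}")
```

M2 and M9 have the lowest average correlations with the remaining models, while their correlation with each other is high (0.86), so the two models diverge from the rest in a similar direction.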
5. CONCLUSION
The goal of this study is to explore via simulation whether
the sales response predictions from demand models with various heterogeneity specifications converge or diverge under different levels of data aggregation and various heterogeneity conditions, endogeneity conditions, and sample size conditions. No
prior study has explored the convergence of sales response predictions from a wide variety of models fitted to store and panel
data across a variety of conditions commonly faced by marketing analysts.
With regard to fit, the study shows that, on average, the flexibility of the heterogeneity specification for a demand model has
much impact on fit, whereas the level of data aggregation does
not. The model with the least restrictive heterogeneity specification (within-store heterogeneity, which refers to heterogeneity
across store visits) fits better than all other models, while homogeneous models assuming parameter invariance across store
visits, households, weeks, and stores produced the worst fit.
Models with the best fit did not necessarily produce the most accurate predictions of sales response to promotions. This is reminiscent of findings of other panel data-based simulation studies
using different measures of model performance, as well as other
empirical studies (e.g., Foekens, Leeflang, and Wittink 1994).
With regard to the accuracy of sales response predictions, the
simulation shows compelling evidence that sales response predictions are similarly accurate across models, including models
applied to panel and store-level data, models explaining heterogeneity across households, stores, and weeks, and even models explaining no heterogeneity, with one important exception:
models explaining within-store heterogeneity (i.e., heterogeneity across store visits) using random distributions for the coefficients produced predictions that were significantly more accurate than those of all other models. One implication of this
finding is that if the sole objective of the analysis is to predict
market response to a new promotional environment, store-level
data should suffice for such a task. Since store-level data are
generally cheaper to obtain, more widely available, and less
computationally demanding to analyze than panel data, this is an important
finding.
We note that, technically, none of the model specifications
has a completely correct specification of consumer heterogeneity since we adopted a mixture of normal distributions for
within-market across-consumer heterogeneity, with the mixture
components weighted differently for different markets. The random coefficients within-store specification produces the most
accurate sales response predictions because it is capable of capturing heterogeneity across store visits, which is a more flexible specification than those used to capture heterogeneity across
households, stores, or weeks. The finite mixture version of this
specification is not as flexible as the random coefficients version
and is, therefore, not as effective. In addition, as was demonstrated in the empirical application, the finite mixture version
sometimes produces unstable results.
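The heterogeneity structure described above, a mixture of normal distributions whose component weights differ by market, can be sketched as follows. The component means and standard deviations, the market labels, and the weights are hypothetical values chosen for illustration; the actual simulation design is given in Appendix A and is not reproduced here.

```python
import random
import statistics

# Hypothetical mixture components for the price coefficient: (mean, SD).
COMPONENTS = [(-1.0, 0.3), (-3.0, 0.3)]
# Hypothetical market-specific weights on the two components.
MARKET_WEIGHTS = {"market_A": [0.8, 0.2], "market_B": [0.2, 0.8]}

def draw_price_coefficients(n_consumers, weights, rng):
    """Draw consumer-level price coefficients from the weighted normal mixture."""
    draws = []
    for _ in range(n_consumers):
        # Pick a mixture component, then draw from that component's normal.
        mean, sd = rng.choices(COMPONENTS, weights=weights, k=1)[0]
        draws.append(rng.gauss(mean, sd))
    return draws

rng = random.Random(42)
market_means = {}
for market, weights in MARKET_WEIGHTS.items():
    coefs = draw_price_coefficients(2000, weights, rng)
    market_means[market] = statistics.fmean(coefs)
    print(f"{market}: mean price coefficient = {market_means[market]:.2f}")
```

Because the same components are reweighted in each market, no single normal distribution and no single set of segment sizes describes every market, which is the sense in which none of the fitted specifications is exactly correct.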
An empirical application to actual scanner data shows that
models producing the least accurate sales response predictions
in the simulation also produce divergent sales forecasts in the
empirical application, despite good fit statistics. Our analysis
indicates that the face validity of model results (e.g., evidence
of instability of estimates or lack of identification or unrealistic
parameters for coefficient distributions, particularly the distributions of price and promotion coefficients) provides a good
indication as to whether a model might perform poorly. The
idiosyncrasies of any particular dataset can produce very different outcomes for some models, underscoring the value of the
simulation results, which point to most-likely, big-picture outcomes.
Much work remains to be done in the area of logit-based demand models. In this study, the specifications of the panel and
store-level models are equivalent apart from the heterogeneity
specification. One question that arises is how the convergence of predictions between store-level and panel data models would be affected if more household-specific constructs, such as
purchase-event feedback or choice sets, were included in the
model specification. Such constructs are invariably important
predictors of household-level choices, yet they cannot be used
for store-level data. We hope that our research will stimulate
ongoing research on modeling promotional response with store-level data versus panel data.
SUPPLEMENTAL MATERIALS
Appendices: Appendix A contains the simulation design and
procedures. Appendix B contains the models fitted to simulated datasets. (Supplemental appendices.pdf)

ACKNOWLEDGMENTS
The authors thank the co-editors, the area editor, and the reviewers for their very helpful comments on this manuscript.
[Received September 2007. Revised October 2009.]

REFERENCES
Andrews, R. L., and Ebbes, P. (2009), “Properties of Instrumental Variables
Estimation in Logit-Based Demand Models: Finite Sample Results,” unpublished manuscript, University of Delaware. [321,322]
Andrews, R. L., Ainslie, A., and Currim, I. S. (2002), “An Empirical Comparison of Logit Choice Models With Discrete versus Continuous Representations of Heterogeneity,” Journal of Marketing Research, 39, 479–487. [320]
Berry, S., Levinsohn, J., and Pakes, A. (1995), “Automobile Prices in Market
Equilibrium,” Econometrica, 63, 841–890. [321]
Besanko, D., Dubé, J. P., and Gupta, S. (2003), “Competitive Price Discrimination Strategies in a Vertical Channel Using Aggregate Retail Data,” Management Science, 49, 1121–1138. [319]
Bodapati, A. V., and Gupta, S. (2004), “The Recoverability of Segmentation
Structure From Store-Level Aggregate Data,” Journal of Marketing Research, 41, 351–364. [319-321]
Bucklin, R. E., Gupta, S., and Siddarth, S. (1998), “Determining Segmentation in Sales Response Across Consumer Purchase Behaviors,” Journal of
Marketing Research, 35, 189–197. [320]
Chintagunta, P. K. (2001), “Endogeneity and Heterogeneity in a Probit Demand
Model: Estimation Using Aggregate Data,” Marketing Science, 20, 442–
456. [319,320]
Chintagunta, P. K., Dubé, J. P., and Singh, V. (2002), “Market Structure Across
Stores: An Application of a Random Coefficients Logit Model With Store
Level Data,” in Advances in Econometrics, eds. P. H. Franses and A. Montgomery, Amsterdam, NY: JAI Press. [319]
Foekens, E. W., Leeflang, P. S. H., and Wittink, D. R. (1994), “A Comparison
and an Exploration of the Forecasting Accuracy of a Loglinear Model at
Different Levels of Aggregation,” International Journal of Forecasting, 10,
245–261. [325]
Gupta, S., Chintagunta, P. K., Kaul, A., and Wittink, D. R. (1996), “Do
Household Scanner Panels Provide Representative Inferences From Brand
Choices? A Comparison With Store Data,” Journal of Marketing Research,
33, 383–398. [319]
Hoch, S. J., Kim, B. D., Montgomery, A. L., and Rossi, P. E. (1995), “Determinants of Store-Level Price Elasticity,” Journal of Marketing Research, 32,
17–29. [319]
Montgomery, A. L. (1997), “Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data,” Marketing Science, 16, 315–337. [319]
Park, S., and Gupta, S. (2009), “Simulated Maximum Likelihood Estimator for
the Random Coefficient Logit Model Using Aggregate Data,” Journal of
Marketing Research, 46 (August), 531–542. [321]
Petrin, A., and Train, K. (2010), “A Control Function Approach to Endogeneity
in Consumer Choice Models,” Journal of Marketing Research, 47, 3–13.
[321]
Raju, J. S. (1992), “The Effect of Price Promotions on Variability in Product
Category Sales,” Marketing Science, 11, 207–220. [320]
Shugan, S. M. (2004), “Endogeneity in Marketing Decision Models,” Marketing Science, 23, 1–3. [320]
Sudhir, K. (2001), “Competitive Pricing Behavior in the Auto Market: A Structural Analysis,” Marketing Science, 20, 42–60. [319]
Train, K. E. (2003), Discrete Choice Methods With Simulation, Cambridge:
Cambridge University Press. [322]
Van Heerde, H. J., Leeflang, P. S. H., and Wittink, D. R. (2000), “The Estimation of Pre- and Postpromotion Dips With Store-Level Scanner Data,”
Journal of Marketing Research, 37, 383–395. [319]
Villas-Boas, J. M., and Winer, R. S. (1999), “Endogeneity in Brand Choice
Models,” Management Science, 45, 1324–1338. [321]