Directory UMM :Data Elmu:jurnal:T:Transportation Research Part B Methodological:Vol33.Issue1.Feb1999:

TRANSPORTATION
RESEARCH
PART B

Transportation Research Part B 33 (1999) 63±79

A practical technique to estimate multinomial probit models
in transportation
Denis Bolduc *
DeÂpartment d'eÂconomique, Universite Laval, Sainte-Foy, QueÂbec, Canada G1K 7P4
Received 14 September 1994; received in revised form 1 June 1998

Abstract
The Multinomial Probit (MNP) formulation provides a very general framework to allow for interdependent alternatives in discrete choice analysis. Up until recently, its use was rather limited, mainly
because of the computational diculties associated with the evaluation of the choice probabilities which
are multidimensional normal integrals. In recent years, the econometric estimation of Multinomial Probit
models has greatly been focused on. Alternative simulation based approaches have been suggested and
compared. Most approaches exploit a conventional estimation technique where easy to compute simulators replace the choice probabilities. For situations such as in transportation demand modelling where
samples and choice sets are large, the existing literature clearly suggests the use of a maximum simulated
likelihood (MSL) framework combined with a Geweke±Hajivassiliou±Keane (GHK) choice probability
simulator. The present paper gives the computational details regarding the implementation of this practical

estimation approach where the scores are computed analytically. This represents a contribution of the
paper, because usually, numerical derivatives are used. The approach is tested on a 9-mode transportation
choice model estimated with disaggregate data from Santiago, Chile.
ReÂsumeÂ
La formulation probit polytomique (MNP) permet d'analyser et de deÂcrire de fac,on treÁs ¯exible, le choix
d'un individu parmi un ensemble de modaliteÂs inter-deÂpendantes. Les nombreux progreÁs e€ectueÂs au cours
des dernieÁres anneÂes concernant l'estimation eÂconomeÂtrique des modeÁles MNP, permet maintenant de
contourner la probleÂmatique lieÂe aÁ l'eÂvaluation d'inteÂgrales normales multiples qui de®nissent les probabiliteÂs de seÂlection des modaliteÂs. Les diverses approches consideÂreÂes exploitent geÂneÂralement des simulateurs ecaces agissant comme substituts aux probabiliteÂs exactes de choix. Le simulateur ayant la faveur
geÂneÂrale est le GHK, suggeÂre de fac,on indeÂpendante par Geweke, Hajivassiliou et Keane. Pour les situations telles que geÂneÂralement rencontreÂes dans le domaine des transports ouÁ les eÂchantillons ainsi que les
ensembles de choix sont de grande taille, la litteÂrature suggeÁre treÁs clairement l'emploi d'une approche du
* Tel: +1-418-656-5427; fax: +1-418-656-7412; e-mail: [email protected]
0191-2615/98/$Ðsee front matter # 1998 Elsevier Science Ltd. All rights reserved.
PII : S 0191-2615(9 8 ) 0 0 0 2 8 - 9

64

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

maximum de vraisemblance utilisant le simulateur GHK pour approcher les probabiliteÂs de choix. Le
preÂsent article fournit les deÂtails relatifs aÁ l'utilisation de cette meÂthodologie dans un cadre du maximum

de vraisemblance avec deÂriveÂes analytiques. L'approche est ensuite testeÂe sur un ensemble de donneÂes
deÂcrivant le choix entre neuf modes servant aÁ relier le centre-ville de Santiago aÁ des reÂgions en peÂripheÂrie.
# 1998 Elsevier Science Ltd. All rights reserved.
Keywords: Multinomial probit; Simulation based estimate; Discrete choice; Transportation demand modeling

1. Introduction
Since the introduction of discrete choice techniques to analyze transportation related problems,
hundreds of studies have focused on the behavioral related aspects associated with the decision
process of individuals making a choice among a ®nite set of alternatives. The operational model
mostly used has the Multinomial Logit (MNL) form. To have choice probabilities with a
closed-form that can be calculated easily is its major advantage over more general strategies.
However, the assumption made by this model that the alternatives are mutually independent is
often limitative.
An attractive solution to this problem is to use the MultiNomial Probit (MNP) framework.
The inter-dependencies are then accounted for through the correlation structure of an error term
assumed to be normally distributed. Any error correlation structure can be postulated. Up until
recently, its use was rather limited, mainly because of the computational diculties associated
with the evaluation of the choice probabilities which are multidimensional normal integrals.
Recently, alternative simulation based approaches have been suggested. They are described and
compared in Hajivassiliou et al. (1996). Most approaches exploit a conventional estimation

technique where easy to compute simulators replace the choice probabilities. For situations such
as in transportation demand modelling where samples and choice sets are large, the maximum
simulated likelihood framework combined with a Geweke±Hajivassiliou±Keane (GHK) choice
probability simulator approach should be favoured. This paper gives the computational details
regarding the implementation of this particular estimation strategy where to speed computation,
the score vector associated with the likelihood function is computed using analytic expressions.
The approach is tested on a 9-mode transportation choice model using disaggregate data from
Santiago, Chile.

2. The multinomial probit formulation
A typical transportation mode choice model concerns the choice by individual n; n ˆ 1; . . . ; N
of the alternative i in the set Cn ˆ f1; . . . ; Jn g which produces the highest Vin utility level, i.e. so
that Vin 4Vjn , 8j 2 Cn . In this notation, the choice set is allowed to di€er across individuals, to
account for their own speci®c travel mode availabilities. In estimation, it is very important to take
this choice set variability into account. To present the proposed estimation approach intelligibility, it is easier to ®rst assume that each individual faces a same choice set C and then, in a
second stage, adapt the notation to the individual speci®c choice set case.

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

65


2.1. MNP with universal choice set
Assuming that each individual n faces the same J alternatives, an MNP model formulation
based on linear-in-parameters utilities may be written as follows:
Vin ˆ Zin ‡ in ;
with
yin ˆ



1
0

if Vin 4Vin for j ˆ 1; . . . ; J; and
otherwise

The variable yin designates the choice made by individual n, Vin is the unobservable utility of
alternative i as perceived by individual n, Zin is a …1  K† vector of explanatory variables characterizing both the alternative i and the individual n. is a …K  1† vector of ®xed parameters and
®nally in is a normally distributed random error term of mean zero assumed to be correlated with
the errors associated with the other alternatives j; j ˆ 1; . . . ; J; j 6ˆ i. In vector form, one can write

this relationship as:
Vn ˆ Zn ‡ n ; n  N…0; †;

…1†

where Vn and n are …J  1† vectors and Zn is a …J  K† matrix.
As is well known, the only identi®able parameters in the original model (1) are those that can
be retrieved uniquely from the parameters of a scaled model di€erenced with respect to the utility
of an arbitrary alternative. Below, we use the last alternative as the base and the scaling is performed by ®xing to one, the variance of the ®rst error term in the di€erenced model. This version
which is referred here as the estimable model, can be written as:
Un ˆ Xn ‡ n ; n  N…0;
 †;

…2†

which is the model in Eq. (1) written in deviation with respect to the utility of the last alternative
VJn . More speci®cally, let m ˆ J ÿ 1, then Un is a …m  1† vector with components
Uin ˆ Vin ÿ VJn , i ˆ 1; . . . ; m. The matrix Xn and the vector n are de®ned similarly. The scaling
is such that var…n † ˆ var…1n ÿ Jn † ˆ 1.1 To impose a positive de®nite error covariance matrix


 , it is preferable to work with a formulation based on a Cholesky decomposition of
 . Such
an equivalent model is written as:
Un ˆ Xn ‡ Swn ; wn  N…0; Im †;

…3†


1
To set var…
1n † ˆ 1 is equivalent to dividing each row of the di€erenced model by the quantity model by the
p
quantity s ˆ var…1n ÿ Jn †. In that case, one would get Uin ˆ …Vin ÿ VJn †=s and the vector should be replaced with
ˆ =s. To assume that s ˆ 1 simpli®es the notation.

66

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

where S is a lower triangular matrix such that SS0 ˆ

 , with s11 ˆ 1 to be consistent with the
scaling used. The parameters one is interested in are the K parameters k in the vector and the
p ˆ m  …m ‡ 1†=2 ÿ 1 parameters s21 ; s31 ; . . . ; sm1 ; s22 ; . . . ; smm that we incorporate in a …p  1†
vector denoted as s. To complete the notation, we call  ˆ … 0 ; s0 †0 , the joint vector of parameters
formed by the vertical concatenation of and s.
2.1.1. The utilities in deviation with respect to the chosen alternative
To evaluate the log-likelihood function requires calculating for each individual in the sample
the probability associated with the choice made. To compute the GHK choice probability simulator for a given individual n, it is convenient to map the utilities into di€erences with respect to
the utility of the choice made so that the probability Pn …i† then becomes an integral over a nonpositive orthant. This particular representation can be obtained by premultiplying the estimable
model in Eq. (2) by a …m  m† linear operator that we call Mi . For any i other than J, the following …m  1† vector is then obtained:
3
V
ÿ
V
1n
in
2
2
3
3

..
7
6
V1n ÿ VJn
U1n
7
6
.
..
7
6
6 . 7 6
7
Un ˆ 4 .. 5 ˆ 6 Viÿ1;n ÿ Vin 7 ˆ Mi 4
5 ˆ Mi Un :
.
7
6
4 Vi‡1;n ÿ Vin 5
Umn

VJÿ1;n ÿ VJn
VJn ÿ Vin
2

Premultiplying Eq. (2) by matrix Mi gives:
Un ˆ Mi Un ˆ Mi Xn ‡ Mi n ; ˆ Xn ‡ n ;

…4†

where n  N…0;
i †, with
i ˆ Mi
 M0i . Note that the Un vector thus obtained contains only
negative components: …U1n < 0; U2n < 0; . . . ; Umn < 0†, As seen below, the GHK simulator
exploits this particular feature. The vector n is de®ned similarly to Un and the matrix Xn is of
dimension …m  K† with rows …Z1n ÿ Zin †; . . . ; …ZJn ÿ Zin † . Finally, note that because of Eq. (3),
one may also write:
V…n † ˆ
i ˆ Mi
 M0i ˆ Mi SS0 M0i :


…5†

Eqs. (4) and (5) are particularly useful because they are expressed in terms of the estimable
parameters in and the vector s which, one can recall, is composed of the elements of the cholesky matrix S in Eq. (3).
2.2. MNP with individual speci®c choice set
To account for individual speci®c choice sets is relatively straightforward. Let n denote an
individual …n ˆ 1; . . . ; N† which chooses that alternative i in choice set Cn ˆ f1; . . . ; Jn g for which
Vin 4Vjn , 8j 2 Cn . The corresponding MNP model is obtained by excluding from Eq. (1) those
utilities associated with the alternatives that are not available to individual n. In terms of the

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

67

estimable model, is amounts to removing the appropriate rows from Eq. (2). Such a formulation
can be obtained with the use of an …mn  m† mapping matrix En , where mn ˆ Jn ÿ 1. It is de®ned
as an identity matrix with rows associated with the alternatives not available, deleted. In the case
of an universal choice set, En would simply be an identity matrix of size m. With this in mind, one
can therefore use a unique and general notation to refer to both cases of the MNP model formulation.

2.3. A general notation for the MNP model
Consider the previously introduced …m  m† mapping matrix Mi that maps the m ˆ J ÿ 1 utility di€erences Un of the estimable model in Eq. (2) into a deviation with respect to the alternative
i chosen by individual n. Consider also the …mn  m† mapping matrix En which removes the utilities associated with the alternatives unavailable to n. Those two operations can be combined
into the following operator:
Min ˆ En Mi :

…6†

This …mn  m† mapping matrix, when premultiplying the utility vector Un gives:
3 2
3
3
2
V1n ÿ Vin
U1n
V1n ÿ VJn
..
..
7
7
6 . 7 6
6
Un ˆ 4 .. 5 ˆ 4
5 ˆ Min 4
5 ˆ Min Un :
.
.
Umn ;n
VJn ;n ÿ Vin
VJÿ1;n ÿ VJn
2

This is the …mn  1† vector of the utilities associated with the alternatives available to individual n
expressed in deviation with respect to the chosen alternative. Note again that in the case of a
universal choice set, Min ˆ Mi , 8n since En would be an identity matrix. Using the so called Min
matrix and the estimable model in Eq. (2), the ®nal notation that we exploit to compute the choice
probabilities is:
Un ˆ Min Un ˆ Min Xn ‡ Min n ˆ Xn ‡ n ;

…7†

where n  N…0;
in †, with
in ˆ Min SS0 M0in . This notation should make clear that depending on
the mode availability and the choice made, the error covariance matrix vary between observations. However, it should be noted that a common S Cholesky matrix appears in the covariance
structure. The matrix Xn represents a …mn  K† matrix of the deviation of the explanatory variables of each available alternative j other than alternative i with respect to the explanatory variables of the chosen alternative i. Again, note that by de®nition, the Un is a vector with negative
components, i.e.: U1n 40; . . . ; Umn n 40. To compute a given choice probability, we will also need
to consider a Cholesky transformed version of this formulation. This is written as:
Un ˆ Xn ‡ Sn wn ; wn  N…0; Imn †;
where Sn is a …mn  mn † lower triangular matrix such that:

…8†

68

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

Sn S0n ˆ
in ˆ Min SS0 M0in :

…9†

For estimation, we will exploit this relationship between the individual speci®c Cholesky matrix
Sn and the unique Cholesky matrix S containing the parameters to estimate. To clarify the distinction between those two matrices, note that Sn refers to the formulation expressed in deviation
with respect to the chosen alternative with only the available alternatives being considered while S
refers to the estimable model expressed in deviation with respect to the last alternative with all the
alternatives included, whether it is available or not. Obviously, a given observation n contributes
to estimate only those elements of S referring to the available alternatives.

3. Model estimation
3.1. The choice probabilities
Denote Pn …i† as the choice probability associated with the alternative i; i 2 Cn chosen by individual n. Given the formulations in Eqs. (7) or (8), this is also the probability of drawing Un with
each component U1n ; . . . ; Umn ;n being non positive. It can be computed as an mn - dimensional
integral of the form:

Pn …i† ˆ

ÿX…1n ÿX…2n
ÿ1

ÿ1

...

ÿX…
mn ;n

n…n ;
in †dn ;

…10†

ÿ1

where n…n ;
in † is a multivariate normal density with mean zero and covariance matrix
in .
Unless mn is small, the choice probability in Eq. (10) cannot be computed using a numerical
integration technique. One solution is to simulate Pn …i†. Many simulators with good properties
have been suggested. The most useful ones are described and compared in Hajivassiliou et al.
(1996). The GHK simulator, that is used here, was clearly found to be the one with the best theoretical and empirical properties. Theoretical and analytical details on this simulator may be
found in BoÈrsch-Supan and Hajivassiliou (1993), Hajivassoliou (1993) and Geweke et al. (1992).
Still, we provide computational details regarding its implementation because we need speci®c
expressions to derive analytical relationships to compute the scores. The GHK simulator exploits
the recursive structure imposed by the Cholesky transformation present in Eq. (8). For a given
observation n, using Eq. (8), one can write:
3 2X 3 2 S
U1n
11;n
1n
7 6 s21;n

X
6 U2n 7 6
2n
7 6
6 . 7ˆ6
. 7‡6 .
4 .. 5 6
4 .. 5 4 ..
Umn ;n
Xmn ;n
smn ;1;n
2

which imply:

0
s22;n
..
.

...
...
..
.

smn ;2;n

...

32
3
w1n
76 w2n 7
76
40;
74 .. 7
5
. 5
wmn ;n
smn ;mn ;n
0
0
..
.

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

U1n 40 ! w1n 4

U2n 40 ! w2n 4
..
.

69

ÿX1n
s11;n
ÿ‰s21;n w1n ‡ X2n Š
s22;n
..
.

Umn ;n 40 ! wmn ;n 4

ÿ‰smn ;1;n w1n ‡ smn ;2;n w2n ‡ . . . ‡ smn ;mn ÿ1;n wmn ÿ1;n ‡ Xmn ;n Š
:
smn ;mn ;n

To write it in a more compact way, we use the notation:
w1n
w2n
..
.

4
4

a1n
a2n …w1n †
..
.

…11†

4 amn ;n …w1n ; . . . ; wmn ÿ1;n †

wmn ;n
where:
aln …† ˆ

8 

lÿ1
P slh;n
>

>

>
: ÿX


1n s11;n

l>1
…12†
l ˆ 1:

Therefore, the choice probability Pn …i† can be written as:
Pn …i† ˆ pr…Un 40† ˆ pr…w1n 4a1n ; w2n 4a2n …w1n †; . . . ; wmn ;n 4amn ;n …w1n ; . . . ; wmn ÿ1;n ††:
By conditioning, one can also write:
Pn …i† ˆ pr…U1n 40†pr…U2n 40jU1n 40†. . .pr…Umn ;n 40jU1n 40; U2n 40; . . . ; Umn ÿ1;n 40†
ˆ pr…w1n 4a1n †pr…w2n 4a2n …w1n †jw1n 4a1n †. . . :

3.2. The GHK simulator
Let r denote a draw. Call wnr a given realization r of the vector wn such that Eq. (11) is satis®ed.
Based on R such draws, the following expression:

fn …i† ˆ

R
1X
fnr …i†
R rˆ1

…13†

70

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

where
fnr …i† ˆ pr…w1n;r 4a1n;r †pr…w2n;r 4a2n;r …w1n;r †jw1n;r 4a1n;r †. . . ;
is a choice probability conditional on wnr which can be used to provide an unbiased simulator for
Pn …i†. The GHK simulator de®ned in Eq. (13) is smooth with respect to the parameters k ,
k ˆ 1; . . . ; K and the s11;n ; . . . ; smn ;mn ;n elements forming the lower part of Cholesky matrix Sn . It
is also known to have good asymptotic properties. For proofs, refer to Hajivassiliou and
McFadden (1990). For notational compactness and because of the normality assumption, we can
write fnr …i† as follows:
fnr …i† ˆ …a1n;r †…a2n;r † . . .…amn ;n;r † ˆ 1nr 2nr . . .mn ;nr ;

…14†

where  denotes a standard normal cumulative distribution function. Using Eq. (12), the aln;r are
computed as:
8 

lÿ1
>
< ÿ P slh;n w ‡ X ; l > 1
ln sll;n
sll;n hn;r
aln;r ˆ
…15†
hˆ1
>

:
l ˆ 1;
ÿX1n s11;n

where according to Eq. (11), the whn;r are drawn using ÿ1 …u…ahn;r ††, with the term u being a
random uniform number taken from the (0,1) interval.
3.3. The simulated likelihood function
The estimation method considered is based on the maximisation of the natural logarithm of the
simulated likelihood function. Denoting  as the joint vector of parameters to estimate (i.e. it
contains and s), the simulated log-likelihood function which is written as:
!
mn
R Y
N
N
X
X
1X
lnr ;
ln
ln fn …i† ˆ
…16†

R rˆ1 lˆ1
nˆ1
nˆ1
is maximized with respect to . Technical details on the computation of the analytical ®rst- order
derivatives are provided in Appendix A. To our knowledge, this is the ®rst implementation of the
simulated likelihood framework based on the GHK probability simulator which uses analytical instead of numerical derivatives.2 The computation of @L=@ is quite straightforward when
2

One of the referees claimed that numerical derivatives should be more reliable and as fast as suggested approach
with analytical expressions for the derivatives. The numerous Monte Carlo experiments based on four alternatives that
we performed clearly indicate that this is not true. On the issue of computation time, the approach based on the analytical derivatives was found to be 50% faster than the alternative technique. One reason for this is the fact that
numerical derivatives require recalculating, for each perturbation of the parameter vector, the error covariance matrix,
the Cholesky matrix and the choice probabilities. With analytical expressions, all those quantities are ®xed within a
given iteration step. Reliability should not be questioned because in general, approaches with analytical derivatives are
more reliable than those based on numerical derivatives.

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

71

compared to the computation of @L=@s. The latter is computed using a chain rule which exploits a
jacobian transformation between s and sn . This transformation arises from the relationships in
Eq. (9). Appendix A provides the details on the computation of @L=@sn . The derivative @L=@s of
the log-likelihood function with respect to the vector s of interest is evaluated as @s0n =@s@L=@sn ,
where @s0n =@s is a jacobian matrix whose calculation is detailed in Appendix B. In the implementation of the estimation algorithm, we use a BHHH approach which avoids the need to
compute the second-order derivatives which are in this case, rather involved. The computer program written in Fortran 77 has gone through several stages of testing. Di€erent Monte Carlo
based tests were made on a SUN workstations. We now use it on some real data describing a
choice situation among nine inter-related travel mode alternatives.

4. An application
To test the methodology, we use a data bank about the choice of modes for the morning peak
journey to work to the central business district (CBD) of Santiago. This data bank has been
described and employed several times in the past. Gaudry et al. (1989) focused on the estimation
of the valuation of time saving by the transportation mode users. They found that the values
obtained with linear Logit or linear nested Logit speci®cations were particularly high. By using
nonlinear speci®cations of Box±Cox Logit type, they were able obtain more satisfactory results.3
In our application, we use the same data and the same model speci®cation in order to ®nd out
whether to allow for a ¯exible correlation structure of the utilities can lead to reasonable VOT
values. Details on the sample size, the di€erent mode shares and their availability are displayed in
Table 1.
We will ®nd that to do so certainly helps to improve the value of time (VOT) estimates. Still,
their seems to remain some room for improvement. To implement a Box±Cox technique within a
MNP setting represents a too formidable task. We suspect that an MNP extension of the MNL
formulation with lognormally distributed VOT implemented in Ben-Akiva et al. (1993) would be
a reasonable approach to address this problem. Our exploratory analyses using the random VOT
based MNL speci®cation clearly point into this direction. Such a more general MNP model framework with randomly distributed VOT coecients will be considered in a later research.
The important variables entering the model speci®cation are travel time tin and cin =wn , the cost
for individual n of travelling by mode i as a proportion of his/her net personal income per min of
work [Chilean pesos/min ($)]. To use cost as a proportion of income allows the VOT to vary
deterministically with income. The other VOT related variables used in the speci®cation are
walking and in vehicle time for all modes and a waiting time variable for all modes other than
car. Finally, in addition to eight alternative speci®c dummies, appear two other explanatory
variables of socio-economic type. The ®rst one, speci®c to car driver alternatives 1 and 6 indicates
the number of cars in the household as a proportion of number of driving permit holders. The
last variable is a sex dummy included for modes 2, 3 and 7 listed in Table 1. The model estimates
3
In order to implement their Box-Cox methodology, they removed observations that contained zero values for some
of the explanatory variables. Unless there is a good reason for removing those observations, we favor the use of an
approach that retains those observations for model estimation.

72

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

Table 1
Statistics on the sample used
Alternative

Chosen

Percent

Availability

168
66
58
295
430
101
41
65
75

0.12933
0.05081
0.04465
0.22710
0.33102
0.07775
0.03156
0.05004
0.05774

681
730
833
407
1287
530
594
828
841

1299

1.0000

6731

1. Car-driver
2. Car-passenger
3. Shared-taxi
4. Metro
5. Bus
6. Car-driver±metro
7. Car-passenger±metro
8. Shared-taxi±metro
9. Bus-metro
Totals

obtained using a basic MNP i.i.d. speci®cation are displayed in column 1 of Table 2. An MNP
speci®cation with i.i.d. errors can easily be estimated using Gauss±Hermite quadrature to compute the choice probabilities entering the log-likelihood function. Only 12 quadrature points were
used to perform the integrals. The solution was obtained in 14 iterations using 0 as the starting
value for all the parameters. The implied VOT estimates as a percentage of net income are produced in the same column of Table 3. Those values are in the same range as the ones obtained
using a linear logit speci®cation of Gaudry et al. (1989). The value of time sensitivity in vehicle, as
a percentage of net personal income is computed as:
@…cin =wn † @Vin
@Vin
tm
=


:
@tin
@tin @…cin =wn † …cin =wn †
As well known, in linear speci®cation, the VOT measures are evaluated as ratios of parameters.
The two other VOT estimates were produced similarly. Standard errors were computed using the
delta method.
4.1. The model formulation with correlated utilities
To capture the similarities between the transportation modes and to keep a rather parsimonious parametric speci®cation of the error covariance structure, we use the ®rst-order Generalized
Autoregressive [GAR(1)] process approach suggested in Bolduc (1992). In order to obtain such a
formulation, the original model in Eq. (1) is replaced with:
Vn ˆ Zn ‡ TPÿ1 n ; n  N…0; IJ †;

…17†

where T is a J-diagonal matrix which contains standard deviation i in the ith position, and P is a
matrix for capturing the covariance e€ects using functions based on few underlying unknown

73

D. Bolduc/Transportation Research Part B 33 (1999) 63±79
Table 2
Estimation resultsa
Variables

MNP i.i.d.

Alternative speci®c dummies
1. Car-driver
3. Shared-taxi
4. Metro
5. Bus
6. Car-driver±metro
7. Car-passenger±metro
8. Shared-taxi±metro
9. Bus±metro

0.19
0.61
3.11
1.49
ÿ0.02
0.36
0.66
0.90

(0.86)
(4.74)
(16.56)
(12.30)
(ÿ0.01)
(2.68)
(3.65)
(5.02)

0.12
0.82
3.10
1.65
0.18
0.58
0.90
1.12

(0.55)
(5.25)
(15.97)
(11.11)
(0.80)
(3.60)
(4.54)
(5.78)

ÿ0.44
0.61
2.81
1.47
ÿ0.59
ÿ1.46
ÿ0.09
1.23

(ÿ0.94)
(2.31)
(5.78)
(8.27)
(ÿ1.15)
(ÿ0.96)
(ÿ0.16)
(6.60)

ÿ0.37
0.51
2.73
1.45
ÿ0.49
ÿ1.94
ÿ0.14
1.20

(ÿ0.81)
(1.66)
(5.84)
(8.38)
(ÿ1.01)
(ÿ1.07)
(ÿ0.22)
(6.74)

ÿ0.39
0.27
2.88
1.49
ÿ0.59
ÿ2.33
ÿ0.20
1.25

(ÿ0.85)
(0.89)
(5.88)
(7.96)
(ÿ1.20)
(ÿ1.11)
(ÿ0.62)
(7.72)

ÿ0.02
ÿ0.08
ÿ0.05
ÿ0.25
1.29

(ÿ8.61)
(ÿ9.37)
(ÿ5.50)
(ÿ5.28)
(5.61)

ÿ0.02
ÿ0.08
ÿ0.05
ÿ0.18
1.24

(ÿ7.59)
(ÿ9.24)
(ÿ5.62)
(ÿ6.15)
(5.44)

ÿ0.02
ÿ0.07
ÿ0.04
ÿ0.13
1.54

(ÿ3.89)
(ÿ4.04)
(ÿ3.51)
(ÿ3.56)
(3.16)

ÿ0.02
ÿ0.06
ÿ0.04
ÿ0.13
1.45

(ÿ3.81)
(ÿ3.98)
(ÿ3.48)
(ÿ3.54)
(3.13)

ÿ0.02
ÿ0.07
ÿ0.05
ÿ0.13
1.52

(ÿ3.83)
(ÿ4.01)
(3.55)
(ÿ3.64)
(3.14)

Other variables
Cost/income ($)
Walk time (min)
In vehicle time (min)
Waiting time (min)
No. cars/no. permit holders
(alternatives 1 and 6)
Sex dummies (male=1)
(alternatives 2,3 and 7)

ÿ0.42 (ÿ3.64)

SML MNP
R ˆ 50
homoscedastic

ÿ0.39 (ÿ3.69)

SML MNP
R ˆ 50
unconstrained

ÿ0.41 (ÿ3.21)

Error correlation structure
1
2
3
4
5
6
7
8
9

1
1
1
1
1
1
1
1
1

1
1
1
1
1
1
1
1
1

1.52
1
0.95
1.22
0.31
1.32
2.26
1.51
0.46



0

0.46 (5.89)

0.29 (2.29)

(Simulated) log. Likelihood
ÿ1482.39
at convergence.
Number of iterations
14
Run time (min and s)/iteration
0.06
on SUN UltraSparc 1
a

(3.85)
(3.46)
(3.27)
(1.31)
(3.34)
(2.04)
(2.78)
(2.84)

SML MNP
R ˆ 250
unconstrained

ÿ0.43 (ÿ3.27)

1.43
1
1.01
1.15
0.27
1.22
2.57
1.50
0.46

(3.76)
(3.33)
(3.25)
(1.08)
(3.40)
(2.02)
(2.72)
(3.00)

0.28 (2.19)

SML MNP
R ˆ 250
constrained

ÿ0.49 (ÿ3.53)

1.35
1
1.27
1.27
0.40
1.35
2.87
1.27
0.40

(3.97)
(3.90)
(3.90)
(3.66)
(3.97)
(2.00)
(3.90)
(3.66)

0.35 (3.15)

ÿ1473.60

ÿ1447.40

ÿ1442.83

ÿ1443.86

7
1.44

20
1.50

22
7.26

18
6.43

Asymptotic t-statistics are in parentheses.

parameters. It can be viewed as a restricted version of a saturated factor analytic formulation.
This model can be obtained from an initial model Vn ˆ Zn ‡ Tn , with n ˆ Wn ‡ n , being
assumed. This last autoregressive process is assumed to be based on a …J  J† Boolean (0±1)
contiguity matrix W which, in this particular application, relates the ®rst two alternatives together and does the same with the last seven alternatives. The W matrix used is as follows:

74

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

Table 3
Value of time as a percentage of net personal incomea
Variable

MNP i.i.d.

SML MNP
R ˆ 50
homoscedastic

SML MNP
R ˆ 50
unconstrained

SML MNP
R ˆ 250
unconstrained

SML MNP
R ˆ 250
constrained

In vehicle time
Walking time
Waiting time

201.7 (4.72)
345.7 (6.46)
773.5 (4.74)

211.3 (4.64)
353.9 (6.05)
854.5 (4.83)

184.0 (4.43)
288.2 (5.16)
558.8 (4.03)

198.6 (4.42)
297.7 (5.04)
590.1 (3.99)

214.5 (4.50)
315.8 (5.01)
606.3 (4.02)

a

Asymptotic t-statistics are in parentheses.

0
1
0
0
Wˆ 0
0
0
0
0

1
0
0
0
0
0
0
0
0

0
0
0
1
1
1
1
1
1

0
0
1
0
1
1
1
1
1

0
0
1
1
0
1
1
1
1

0
0
1
1
1
0
1
1
1

0
0
1
1
1
1
0
1
1

0
0
1
1
1
1
1
0
1

0
0
1
1
1
1
1
1
0

To insure the invertibility of the P ˆ IJ ÿ W matrix, so that n could be replaced with Pÿ1 n
for a value of  de®ned on the (ÿ1,1) interval, the W matrix is normalized so that each row sum
to one. In the postulated structure, the correlation coecient  and a maximum of J ÿ 1 standard
error terms can be estimated. [For more details on the use of GAR(1) processes to approximate the error covariance structure in discrete choice modelling and on parameter estimability
issues, refer to Bolduc, (1992)]. In our application, the second standard deviation term 2 is set
to 1.
4.2. Estimation results
Estimation results of the di€erent MNP versions considered are displayed in Table 2. Column 2
results refers to the SML MNP solution based on 50 draws, assuming cross-correlation between
the alternative speci®c errors and homoscedasticity. In other words, all sigmas are ®xed to 1 and
only  is estimated. The results obtained clearly show that correlation is present between the utilities. The GAR(1) setting makes it possible to summarize the full correlation structure using a
single correlation coecient. The ®t is better but the estimated parameters are not so di€erent
from the MNP i.i.d. solution. Columns 3 and 4 of Table 2 refer to the model speci®cation where
seven standard deviation terms are estimated. Recall that for identi®cation, the second standard
deviation term is ®xed to 1. Therefore, in this speci®cation referred to as unconstrained in the
tables each utility has its own heteroscedasticity e€ect. According to the estimation results
obtained, heteroscedasticity is signi®cantly present. The model ®t is much better than with the
homoscedastic structure. Values of time estimates, especially when 250 draws are used, are much

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

75

lower than in the MNP i.i.d. case. One known problem associated with maximum simulated
likelihood is the bias introduced by simulating. This is because the GHK is a technique to simulate the choice probability, not its natural log. The bias can be proved to be present using Jensen's
inequality. It is also known that it becomes unsigni®cant with large number of simulation draws.
This justi®es our use of 250 draws for estimation. Note that with 50 draws, the results are very
close to those obtained with large number of draws. The improvement in the ®t using larger
number of draws can be attributed to the presence of the bias. All this indicates that the GHK
simulator performs very well.
The last column of Table 2 refers to a constrained version of the model where the heteroscedastic structure is postulated to be consistent with the de®nition of the alternatives provided in
Table 1. In this version, we are postulating homoscedasticity among groups of alternatives. The
groups formed using the restrictions: 1 ˆ 6 ; 2 ˆ 1; 3 ˆ 4 ˆ 8 ; 5 ˆ 9 with 7 remaining
free are car-driver, car passenger, taxi-metro, bus and car-passenger±metro. This is done to
demonstrate the ¯exibility of the approach. In this case, the simulated log-likelihood value is
ÿ1443.86 which is marginally higher in absolute value than the one corresponding to the unrestricted version of the model, which indicates that the restrictions make sense. This is con®rmed
with a log-likelihood ratio test. All parameter estimates closely resemble the values obtained with
the unconstrained model. The run time per iteration was approximately 6.4 min.
The constrained estimation gains in terms of number of iterations to reach optimum but the
computation time per iteration is almost the same, since only mappings are applied in computing
the derivatives, so a comparable number of manipulations are required at a given iteration. The
implied VOT estimates for in vehicle time, walking time and waiting time are not as good as those
obtained with the more general error structure. The unconstrained version performs best in producing smaller VOT estimates. However, there appears to be some room for improvement. An
MNP formulation with lognormally distributed VOT coecients is a potentially good alternative
for solving the problem of high VOT estimates. This is left for a future research.

5. Conclusion
Our application demonstrated the feasibility of MNP estimation when applied to choice situation based on large choice sets. In the most general case that we considered in the application,
each utility was characterized by a speci®c standard error and the correlation among two natural
blocks of utilities was modelled using a generalized autoregressive error structure. Based on the
observed improvements in the log-likelihood values, to take into account these inter-relationships
between the model errors was important. The technique was applied to the 9-mode transportation
choice model considered in Gaudry et al. (1989).
Estimating di€erent models on this reliable disaggregate data set from Santiago, Chile, these
authors found that the time valuation coecients obtained with linear speci®cation were particularly high. Our MNP estimations gave some more evidence to their ®nding. We suspect that an
MNP extension of the MNL formulation with lognormally distributed VOT implemented in BenAkiva et al. (1993) would be a promising technique for solving this problem. Our exploratory
analyses using the random VOT based MNL speci®cation clearly point into this direction. This
approach will be considered in a later research.

76

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

Acknowledgements
The author would like to thank Professor Marc Gaudry for kindly providing the Santiago data
bank used for this study.

Appendix A. The ®rst-order derivatives
A.1. The simulated likelihood function
The estimation method considered here maximises the natural logarithm of the simulated
likelihood function. Noting that  denotes the joint vector of parameters to estimate (i.e. it contains and s), then the simulated log-likelihood function is written as:
mn
R Y
1X
lnr
ln
lnfn …i† ˆ

R rˆ1 lˆ1
nˆ1
nˆ1
N
X

N
X

where fnr …i† ˆ

mn
Q

!

…A1†

lnr is the conditional probability of choosing alternative i given a particular

lˆ1

draw r of the vector wn in Eq. (8) of the text. Note that we use directly the conventions established
in Eqs. (11)±(15). We now provide the details regarding the computation of the ®rst-order derivatives.
A.2. First-order derivatives
Recall that the joint vector of parameters is denoted as . Using Eq. (A1), we get:
mn
R
N
X
@L X
1 1X
@ ln lnr
ˆ
fnr …i†
;
@
f …i† R rˆ1
@
nˆ1 n
lˆ1

…A2†

where:
@ ln lnr lnr @aln;r
@aln;r
ˆ

ˆ llnr
;
@
lnr @
@

…A3†

with llnr ˆ lnr =lnr and with lnr denoting a standard normal density function evaluated at aln;r .
This particular value is computed from Eq. (15). Because aln;r is a function of all whn;r ,
h ˆ 1; . . . ; l ÿ 1 and since we have whn;r ˆ ÿ1 …uhn;r …ahn;r †† for h ˆ 1; . . . ; l ÿ 1, then in the
computation of @aln;r =@, the following recursion:
@whnr
…ahn;r † @ahn;r

ˆ uhn;r 
;
@
…whn;r † @

…A4†

D. Bolduc/Transportation Research Part B 33 (1999) 63±79

77

needs to be taken into account. Recall that uhn;r denotes a particular draw from the random uniform distribution de®ned over the unit interval. We now give the explicit relationships for and s.
A.3. Derivatives with respect to b
mn
N
R
X
@L X
1 1X
@ ln lnr
fnr …i†
ˆ
;
@ nˆ1 fn …i† R rˆ1
@
lˆ1

…A5†

where:
@ ln lnr
@aln;r
ˆ llnr
;
@
@
and where using Eq. (15) and the recursion in Eq. (A4), one has:
@aln;r
@

ˆÿ

@aln;r
@

ˆ

 lÿ1 
P slh;n
hˆ1

sll;n

…ahn;r † @ahn;r
 @
uhn;r …w
hn;r †

1n
ÿ sX11;n



‡

Xln
sll;n



l>1
l ˆ 1:

A.4. Derivatives with respect to s
As previously mentioned, this derivative is computed using a chain rule linking sn to s. Below
we provide @L=@sn . Then, @L=@s is computed as @s0n =@s@L=@sn where the jacobian matrix @s0n =@s is
detailed in Appendix B. Now recall that the sn vector is formed by concatenating the elements
sij;n , i5j in a column vector. Then the elements of the @L=@sn vector are:
mn
N
R
X
X
@L
1 1X
@ ln lnr
fnr …i†
ˆ
; i ˆ 1; . . . ; mn ; j ˆ 1; . . . ; i
@sij;n nˆ1 fn …i† R rˆ1
@sij;n
lˆ1

…A6†

where using Eq. (15), one can note that:
@ ln lnr
@sij;n
@ ln lnr
@sij;n

since aln;r

lnr @aln;r
lnr  @sij;n

ˆ

j4i4l

ˆ 0
i > l;
 lÿ1 


P slh;n
@aln;r

ˆÿ
sll;n whn;r ‡ Xln sll;n , implies that @sij;n ˆ 0, whenever i > l.

…A7†

hˆ1

…ahn;r †
hn;r
Also, the recursion in Eq. (A4) implies that @w
@sij;n ˆ uhn;r whn;r
lÿ1
P

…ahn;r † @ahn;r
;
uhn;r sslh;n
ll;n …whn;r † @sij;n

@aln;r
@sij;n

ˆ ÿ

@aln;r
@sij;n

jn;r
ˆ ÿ sll;n

j