Journal of Econometrics 95 (2000) 117–129
A numerically stable quadrature procedure for
the one-factor random-component discrete
choice model
Lung-fei Lee*
Department of Economics, Hong Kong University of Science and Technology, Clear Water Bay,
Kowloon, Hong Kong
Received 1 January 1997; received in revised form 1 December 1998; accepted 1 April 1999
Abstract
The Gaussian quadrature formula was popularized by Butler and Moffitt (1982, Econometrica 50, 761–764) for the estimation of the error component probit panel model. Borjas and Sueyoshi (1994, Journal of Econometrics 64, 165–182) pointed out some numerical and statistical difficulties in applying it to models with group effects. With a moderate or large number of individuals in a group, the likelihood function of the model evaluated by the Gaussian quadrature formula can be numerically unstable and, at worst, impossible to evaluate. Statistical inference may also be inaccurate. We point out that some of these difficulties can be overcome with a carefully designed algorithm and the proper selection of the number of quadrature points. However, with a very large number of individuals in a group, the Gaussian quadrature formulation of the integral may have large numerical approximation errors. © 2000 Elsevier Science S.A. All rights reserved.
Keywords: Discrete choice; Random component; Quadrature
1. Introduction
Consider the following random-component, binary choice model:
y*_{ij} = x_{ij}β + ε_{ij},   (1)

* Tel.: 852-2358-7600; fax: 852-2358-2084.
E-mail address: [email protected] (L.-f. Lee)
for groups j = 1, …, J and individuals i = 1, …, N_j in the jth group, where x_{ij} is a vector of exogenous variables. The disturbance in (1) is generated by an error component structure:

ε_{ij} = u_j + v_{ij},   (2)
where u_j and v_{ij} are mutually independent, the u_j are i.i.d. across j, and the v_{ij} are i.i.d. across i and j. The sign of the latent y*_{ij} determines the observed dichotomous dependent variable I_{ij}: I_{ij} = 1 if y*_{ij} > 0, and I_{ij} = 0 otherwise. Let F be the distribution function of ε conditional on u, and f be the density of u. The log-likelihood function for the model is

L = Σ_{j=1}^{J} ln[Prob(I_{1j}, …, I_{N_j,j})],

where

Prob(I_{1j}, …, I_{N_j,j}) = ∫_{−∞}^{∞} f(u_j) Π_{i=1}^{N_j} [F(b_{ij} | u_j) − F(a_{ij} | u_j)] du_j,

with a_{ij} = −x_{ij}β and b_{ij} = ∞ if I_{ij} = 1, and a_{ij} = −∞ and b_{ij} = −x_{ij}β if I_{ij} = 0.
For a probit model, both u and v are assumed to be normally distributed. A typical normalization for the probit model specifies a zero mean and a unit variance for the total disturbance ε. Let ρ be the correlation between individuals within a group. For this normalized random-component model, ρ is also the variance of u. Therefore, u_j is N(0, ρ) and v_{ij} is N(0, 1 − ρ). The log-likelihood function for this probit error component model can be compactly written, using the symmetry of a normal density, as
L = Σ_{j=1}^{J} ln { ∫_{−∞}^{∞} (2π)^{−1/2} e^{−u²/2} Π_{i=1}^{N_j} Φ[D_{ij}(x_{ij}β + ρ^{1/2}u)/(1−ρ)^{1/2}] du },   (3)

where Φ is the standard normal distribution function, and D_{ij} is a sign indicator such that D_{ij} = 1 if I_{ij} = 1 and D_{ij} = −1 otherwise.
The joint probability of individual responses within a group in (3) involves a single integral whose integrand is a product of univariate probability functions. Butler and Moffitt (1982) pointed out that these joint probabilities could be effectively evaluated by Gauss–Hermite quadrature. The Gaussian quadrature evaluates the integral ∫_{−∞}^{∞} e^{−z²} g(z) dz by the integration formula

∫_{−∞}^{∞} e^{−z²} g(z) dz = Σ_{m=1}^{M} w_m g(z_m),   (4)

where M is the designated number of points, and the z_m and w_m are the M-point Gauss–Hermite abscissas and weights. If g in (4) is a polynomial of degree 2M − 1 or less, the Gauss–Hermite evaluation is exact. The theory
is based on orthogonal polynomials (see, e.g., Press et al., 1992, Chapter 4). The abscissas and weights are available from Stroud and Secrest (1966) and Abramowitz and Stegun (1964). Press et al. (1992) provide computing codes for generating the abscissas and weights for any specific number of points. Given the Gauss–Hermite points z_m and weights w_m, the likelihood in (3) can be evaluated by the Gauss–Hermite formula as
L = Σ_{j=1}^{J} ln { Σ_{m=1}^{M} (w_m/π^{1/2}) Π_{i=1}^{N_j} Φ[D_{ij}(x_{ij}β + (2ρ)^{1/2}z_m)/(1−ρ)^{1/2}] }.   (5)
Butler and Moffitt (1982) illustrated the usefulness of the Gaussian quadrature with panel data consisting of a sample of 1550 cross-sectional units with a maximum of 11 periods each. Based on the stability of estimates, they suggested that two- or four-point quadratures would be sufficient. The Gaussian quadrature approach is computationally efficient relative to other quadrature techniques such as trapezoidal integration (Heckman and Willis, 1975). Subsequently, the Gaussian quadrature approach has often been used in the empirical econometrics literature.
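To make formula (5) concrete, the following sketch evaluates it directly. This is our own illustration, not the paper's code: the data layout (one design-matrix and sign-vector pair per group) and the function name are our assumptions.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss

# Standard normal CDF, vectorized over arrays.
Phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))

def loglik_conventional(beta, rho, X, D, M=8):
    """Evaluate (5): L = sum_j ln{ sum_m (w_m/pi^(1/2)) prod_i Phi[...] }.

    X: list of (N_j, K) arrays; D: list of length-N_j arrays of +1/-1 signs.
    """
    z, w = hermgauss(M)  # M-point Gauss-Hermite abscissas and weights
    total = 0.0
    for Xj, Dj in zip(X, D):
        lin = (Xj @ beta)[:, None] + math.sqrt(2.0 * rho) * z[None, :]  # (N_j, M)
        P = Phi(Dj[:, None] * lin / math.sqrt(1.0 - rho))
        # Product over group members, then the quadrature sum -- this product
        # is the step that underflows when N_j is large:
        total += math.log(np.sum(w / math.sqrt(math.pi) * np.prod(P, axis=0)))
    return total
```

For a group with a single member, the integral collapses to Φ(D_{ij} x_{ij}β), since ∫φ(u)Φ[(a + ρ^{1/2}u)/(1−ρ)^{1/2}] du = Φ(a); that identity gives a quick correctness check on the quadrature.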
In a recent publication, Borjas and Sueyoshi (1994) pointed out some numerical and statistical difficulties that occurred when they tried to apply the technique to study probit models with structural group effects where the number of individuals in a group was large. In their model, individuals belonging to a given group share a common component in the specification of a conditional mean, and there are many groups. The group effect specification is an error component probit model as in (1) and (2). They argued that a likelihood formulation with the Gaussian quadrature could be numerically unstable and, at worst, impossible to evaluate with computers if the number of individuals in some groups, i.e., N_j, was large. This is so because the integrand of the numerical integration in (3) involves the product of cumulative probabilities for all members in a group. With a hypothetical sample of 500 observations per group and assuming a likelihood contribution of 0.5 for each member in a group, the value of the integrand can be as small as e^{500 ln(0.5)} ≈ e^{−346.6}, which is below the smallest positive number representable on many computers. The numerical problem occurs when one tries to evaluate a product consisting of many small numbers. Based on Monte Carlo results, Borjas and Sueyoshi (1994) also pointed out that statistical inference based on the Gaussian quadrature likelihood function can be quite inaccurate. Nominal levels of significance can be much smaller than actual levels of significance in hypothesis testing.
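The scale of the problem is easy to reproduce. Whether e^{−346.6} itself underflows depends on the floating-point format: it is representable in IEEE double precision (whose floor is near 10^{−308}) but not in single precision, and an only slightly longer product is flushed to exactly zero even in doubles. A minimal illustration:

```python
import math

# 500 members, each contributing 0.5: the integrand is e^{500 ln 0.5}.
log_product = 500 * math.log(0.5)
assert abs(log_product - (-346.57)) < 0.01  # matches e^{-346.6} in the text

# Forming the product term by term in IEEE doubles:
p = 1.0
for _ in range(1000):
    p *= 0.5
assert p > 0.0   # 0.5**1000 ~ 9.3e-302 is still (barely) representable

for _ in range(100):
    p *= 0.5
assert p == 0.0  # past ~1e-323 the running product underflows to exact zero
```

Once the product is exactly zero, the log likelihood is −∞ and the optimizer fails, which is the failure mode Borjas and Sueyoshi describe.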
In this paper, we reconsider the numerical and statistical problems. We point out that, with a carefully designed algorithm and proper selection of the number of Gaussian quadrature points, the difficulties discussed by Borjas and Sueyoshi (1994) can be overcome to a certain degree. A by-product of our research is the discovery that the Gaussian quadrature formulation may contain relatively large approximation errors when the number of individuals in a group is large. The consequence of the latter may be larger standard errors for maximum likelihood estimates (MLE) when the number of individuals in a group increases.1
2. Gaussian quadrature and a numerically stable algorithm
We suggest an algorithm that can overcome the numerical problem. The numerical instability can be resolved if the summation and product operators behind the logarithmic transformation in (5) can somehow be interchanged. The possibility of interchanging summation and product was first discussed in Lee (1996) for some related problems in simulation estimation. It can be summarized in the following proposition.
Proposition. For any constants a_{tr} and b_r, t = 1, …, T and r = 1, …, R, the following identity holds:

Σ_{r=1}^{R} b_r Π_{t=1}^{T} a_{tr} = Π_{t=1}^{T} Σ_{r=1}^{R} a_{tr} ω_{t−1,r},

where the ω_{tr} are weights for t ≥ 1, which can be computed recursively as

ω_{tr} = a_{tr} ω_{t−1,r} / Σ_{s=1}^{R} a_{ts} ω_{t−1,s},

starting with ω_{0r} = b_r for r = 1, …, R, assuming that Σ_{s=1}^{R} a_{ts} ω_{t−1,s} ≠ 0 for all t = 1, …, T − 1.
Proof. Let c_t = Σ_{s=1}^{R} a_{ts} ω_{t−1,s}. Then

Π_{t=1}^{T} c_t = ( Σ_{r=1}^{R} a_{Tr} ω_{T−1,r} ) Π_{t=1}^{T−1} c_t = ( Σ_{r=1}^{R} a_{Tr} a_{T−1,r} ω_{T−2,r} / c_{T−1} ) Π_{t=1}^{T−1} c_t
= Σ_{r=1}^{R} a_{Tr} a_{T−1,r} ω_{T−2,r} Π_{t=1}^{T−2} c_t.

The result follows by induction. □
1 This raises the challenging question of whether there are other integration methods (numerical or stochastic) superior to the Gaussian quadrature formulation. For any such formulation, our recommended algorithm will also be useful.
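The identity is straightforward to verify numerically. The sketch below (our own illustration) draws random positive a_{tr} and b_r and confirms that the two sides agree:

```python
import numpy as np

rng = np.random.default_rng(0)
T, R = 5, 4
a = rng.uniform(0.1, 1.0, size=(T, R))  # a_tr, positive like probabilities
b = rng.uniform(0.1, 1.0, size=R)       # b_r, the initial weights omega_{0r}

# Left-hand side: sum_r b_r prod_t a_tr
lhs = float(np.sum(b * np.prod(a, axis=0)))

# Right-hand side: prod_t c_t, with c_t = sum_s a_ts omega_{t-1,s}
# and the recursive update omega_tr = a_tr omega_{t-1,r} / c_t.
omega = b.copy()
rhs = 1.0
for t in range(T):
    c = float(np.sum(a[t] * omega))
    omega = a[t] * omega / c  # normalized, so the weights sum to one
    rhs *= c

assert abs(lhs - rhs) < 1e-12 * abs(lhs)
```

Because each ω vector is normalized to sum to one, none of the running quantities is ever as small as the raw product Π_t a_tr; this is exactly what stabilizes the likelihood evaluation.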
The summation over the quadrature points and the product of individual probabilities in L of (5) can be interchanged with a weight adjustment. The logarithmic transformation is then applied to the product of adjusted terms. Consequently, the log likelihood function L in (5) can be evaluated by the following iterative algorithm:
Algorithm. The log likelihood L can be evaluated as

L = Σ_{j=1}^{J} Σ_{i=1}^{N_j} ln { Σ_{m=1}^{M} Φ[D_{ij}(x_{ij}β + (2ρ)^{1/2}z_m)/(1−ρ)^{1/2}] ω_{i−1,jm} },   (6)

where the weights ω_{ijm} can be computed recursively by

ω_{ijm} = Φ[D_{ij}(x_{ij}β + (2ρ)^{1/2}z_m)/(1−ρ)^{1/2}] ω_{i−1,jm} / Σ_{s=1}^{M} Φ[D_{ij}(x_{ij}β + (2ρ)^{1/2}z_s)/(1−ρ)^{1/2}] ω_{i−1,js},
for i = 1, …, N_j − 1; j = 1, …, J,   (7)

starting with ω_{0jm} = w_m/π^{1/2} for m = 1, …, M and for all j.
In a group effect model, individuals in a group correspond to periods ('time') and the number of groups corresponds to the number of cross-sectional units in a panel data model with cross-sectional time series. The recursive evaluation of the weights ω_{ijm} in (7) is over individuals i within a group, where the ordering can be arbitrary. The ω_{ijm} for m = 1, …, M are weights; they are positive numbers and Σ_{m=1}^{M} ω_{ijm} = 1. The product over i in L of (5) has effectively been taken outside by the logarithmic transformation in (6). This formulation avoids the evaluation of the product of probabilities and can be numerically stable. Except for the weighting adjustment, the expression of L in (6) resembles the log likelihood function of a pooled probit model. The weighting has effectively corrected for the correlation of individuals within a group. The evaluation of the log likelihood function in (6) may be slightly more complicated than the evaluation of the conventional one in (5), as the former involves updating the weights in (7) in a recursive fashion. However, the formulation of the log likelihood in (6) is a by-product of the weighting scheme, as the term in the braces of (6) is exactly the denominator in (7). In this regard, the updating of the weighting scheme in (7) does not impose much additional computational burden. In a subsequent Monte Carlo experiment, numerical evidence will be provided to demonstrate the effectiveness of this iterative formulation and to compare it with the conventional algorithm where possible.
Another possible modification of the conventional formulation in (5) has been suggested by an associate editor of this journal, as follows. The expression in (5)
can be rewritten as L = Σ_{j=1}^{J} ln[Σ_{m=1}^{M} exp(h_{jm})], where

h_{jm} = ln(w_m/π^{1/2}) + Σ_{i=1}^{N_j} ln Φ[D_{ij}(x_{ij}β + (2ρ)^{1/2}z_m)/(1−ρ)^{1/2}].

Denote p_j = max{h_{jm}: m = 1, …, M}. Then L can be evaluated as

L = Σ_{j=1}^{J} [ p_j + ln Σ_{m=1}^{M} exp(h_{jm} − p_j) ].   (8)

This modification might be valuable if the h_{jm}, for all m = 1, …, M, are not much less than p_j for each j. This formulation may be more expensive than both the conventional and the recommended algorithms, as one has to find the maximum among the h_{jm} for m = 1, …, M and compute the differences h_{jm} − p_j for each j.2 In any case, we will compare this approach with our recommended approach in the subsequent Monte Carlo experiments.
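Equation (8) is the familiar log-sum-exp safeguard. A sketch (ours, with the same assumed data layout as before):

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss

def log_phi(t):
    """ln Phi(t). Adequate for moderate t; a production version would switch to
    a tail approximation, since Phi(t) itself underflows for t below about -37."""
    return math.log(0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))

def loglik_lse(beta, rho, X, D, M=8):
    """Evaluate (8): accumulate h_jm in logs, then
    L = sum_j [ p_j + ln sum_m exp(h_jm - p_j) ]."""
    z, w = hermgauss(M)
    total = 0.0
    for Xj, Dj in zip(X, D):
        h = np.log(w / math.sqrt(math.pi))  # ln(w_m / pi^{1/2})
        arg = Dj[:, None] * ((Xj @ beta)[:, None]
                             + math.sqrt(2.0 * rho) * z[None, :]) / math.sqrt(1.0 - rho)
        for i in range(arg.shape[0]):
            h = h + np.array([log_phi(t) for t in arg[i]])
        p = float(h.max())  # p_j = max_m h_jm
        total += p + math.log(float(np.sum(np.exp(h - p))))
    return total
```

The extra work relative to the recursive algorithm is visible here: one pass to accumulate all the h_{jm}, a maximum, and a second exponentiation pass per group.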
3. Monte Carlo results
Monte Carlo experiments are designed to investigate the numerical stability
of the proposed algorithm and compare it with others in terms of computing
time. In addition, estimation results may reveal the relevance of the number of
Gaussian quadrature points and the performance of the MLE and related test
statistics.
In the main design of our experiments, sample data are generated from the model

y*_{ij} = β₁ + β₂x_{ij} + u_j + v_{ij},   (9)

where u_j is N(0, ρ) and v_{ij} is N(0, 1 − ρ). x_{ij} is generated from an N(0, 1) random generator with a 0.5 correlation coefficient among individuals in a group j. The true parameters are set to β₁ = 0, β₂ = 1, and ρ = 0.3. The underlying R² for y* is therefore 0.5. Since ρ is the variance of u, its value is restricted between 0 and 1 in the estimation. We have experimented with samples with various numbers of groups (J), various numbers of individuals (N) in a group, and various numbers of Gaussian quadrature points (M). There are either 50 or 100 groups in a sample. The sample in the main design is balanced because the number of individuals in a group is the same for all groups, i.e., N_j = N for all j = 1, …, J. We consider cases with small and large N. For each case, the number of replications is 400. We report summary statistics on the empirical mean (Mean), empirical standard deviation (Em.SD), the average maximized log likelihood value (lnlk), and the average CPU time in seconds per replication. In addition to the main design, we have also experimented with designs with a larger proportion of variance due to u in the overall ε, a larger number of regressors in the model, and panels with unbalanced observations. The MLE will also be compared with the fixed effect probit estimates (FEPE). The optimization subroutine is the DFP algorithm from the GQOPT package. All computations are performed on a cluster of SUN SPARCstation 20 workstations. The Gauss–Hermite points and weights are generated by the subroutine 'gauher' in Numerical Recipes by Press et al. (1992).

2 The computation cost differences will be entirely due to these additional calculations when a common optimization subroutine is used.
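The main design can be simulated as follows. This is a sketch under our own assumptions: the paper does not state how the 0.5 within-group correlation of x is induced, so we use an equicorrelated common-factor construction, and the function name is ours.

```python
import numpy as np

def simulate_main_design(J=50, N=10, beta1=0.0, beta2=1.0, rho=0.3, seed=0):
    """Generate a balanced panel from (9): y*_ij = beta1 + beta2 x_ij + u_j + v_ij,
    observing I_ij = 1{y*_ij > 0}."""
    rng = np.random.default_rng(seed)
    # x ~ N(0, 1) with corr(x_ij, x_kj) = 0.5 via a shared group factor.
    eta = rng.normal(size=(J, 1))
    xi = rng.normal(size=(J, N))
    x = (eta + xi) / np.sqrt(2.0)
    u = rng.normal(scale=np.sqrt(rho), size=(J, 1))        # u_j ~ N(0, rho)
    v = rng.normal(scale=np.sqrt(1.0 - rho), size=(J, N))  # v_ij ~ N(0, 1 - rho)
    ystar = beta1 + beta2 * x + u + v
    return x, (ystar > 0).astype(int)
```

With β₂ = 1, var(β₂x) = 1 and var(u + v) = 1, which reproduces the stated R² of 0.5.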
Table 1 reports results for MLEs when N is either 10 or 100. Various M = 2–20 are tried. The numerical algorithm is stable, as all replications converge. For cases with N = 10, this is expected because there was no report on instability of the conventional algorithm in Butler and Moffitt (1982) with a small 'time' dimension in discrete panel data. All the estimates of β₁ and β₂ are unbiased. This is true for various numbers of Gaussian quadrature points and groups. There is a moderate downward bias in the estimate of ρ when only a two-point quadrature is used. The biases become small when four- or eight-point quadratures are used. The lnlk improves as M increases from two to four. The improvement in lnlk from M = 4 to 8 is small. It is interesting to note that, with M greater than 8, no improvements are observed, as the likelihood function becomes stable. The CPU times are approximately linear in M and J. For the cases with N = 10, the four- or eight-point quadratures are sufficient. These results confirm the suggestion in Butler and Moffitt (1982) for the use of a four-point quadrature. Their suggestion was derived from a sample with a large number of cross-sectional units but similar 'time' dimensions. It is, however, by no means a universal rule. To compare the time costs of the recommended algorithm with the conventional formulation of Butler and Moffitt (1982) and the one in (8), we re-estimate a case (N = 10, J = 50 and M = 4) with the two alternative formulations. All three algorithms provide identical estimates, but there are some differences in time cost. The conventional algorithm took 1.103 CPU seconds on average per replication to converge, and the algorithm in (8) took 1.333 CPU seconds. Our iterative algorithm's time cost is 1.152 CPU seconds on average. Thus, our iterative algorithm is slightly more time consuming than the conventional one but less so than that in (8). Subsequently, we have done more experiments (reported in Table 3). For a case with larger N = 50, J = 50 and M = 16, the conventional algorithm took 13.486 CPU seconds; our iterative algorithm took 13.623 CPU seconds; and the algorithm in (8) took 19.483 CPU seconds.
The second part of Table 1 reports results when the number of individuals per group N is 100. Borjas and Sueyoshi (1994) reported numerical difficulties for
Table 1
Error component group effect model: main design
True parameters: β₁ = 0, β₂ = 1, and ρ = 0.3; balanced panels. Cells report the empirical mean with the empirical standard deviation in parentheses, over 400 replications.

N = 10, J = 50
M     β₁                β₂               ρ                lnlk        CPU
2     −0.0003 (0.1125)  1.0174 (0.1037)  0.2420 (0.0604)  −234.937    0.63
4     −0.0046 (0.1067)  1.0165 (0.1012)  0.2840 (0.0745)  −233.937    1.15
8     −0.0052 (0.1007)  1.0143 (0.1006)  0.2907 (0.0776)  −233.863    2.06
16    −0.0052 (0.0999)  1.0144 (0.1006)  0.2905 (0.0773)  −233.865    3.95
20    −0.0051 (0.0999)  1.0144 (0.1006)  0.2905 (0.0772)  −233.865    5.86
30    −0.0051 (0.0999)  1.0144 (0.1006)  0.2905 (0.0772)  −233.866    7.26

N = 10, J = 100
2     −0.0027 (0.0768)  1.0110 (0.0692)  0.2418 (0.0412)  −471.00     1.23
4     −0.0027 (0.0711)  1.0132 (0.0675)  0.2873 (0.0514)  −468.56     2.30
8     −0.0028 (0.0692)  1.0101 (0.0670)  0.2960 (0.0553)  −468.38     4.68
16    −0.0026 (0.0692)  1.0100 (0.0671)  0.2961 (0.0553)  −468.38     7.85

N = 100, J = 50
2     −0.0043 (0.1208)  1.0043 (0.0387)  0.2069 (0.0342)  −2256.81    9.03
4     −0.0087 (0.1497)  1.0560 (0.0392)  0.1970 (0.0370)  −2192.21    18.00
8     −0.0087 (0.1389)  1.0274 (0.0446)  0.2519 (0.0493)  −2180.86    28.84
16    −0.0149 (0.1209)  0.9967 (0.0494)  0.3009 (0.0590)  −2177.24    55.53
20    −0.0077 (0.1050)  0.9901 (0.0495)  0.3111 (0.0608)  −2176.87    69.09

N = 100, J = 100
2     −0.0034 (0.0847)  1.0052 (0.0270)  0.2038 (0.0213)  −4516.42    14.34
4     −0.0057 (0.1144)  1.0611 (0.0273)  0.1901 (0.0251)  −4382.59    30.26
8     −0.0101 (0.1115)  1.0402 (0.0297)  0.2354 (0.0307)  −4357.31    66.23
16    −0.0031 (0.0960)  1.0111 (0.0318)  0.2832 (0.0366)  −4348.72    112.42
20    −0.0021 (0.0833)  1.0043 (0.0320)  0.2938 (0.0375)  −4347.81    139.20
the conventional Gaussian quadrature for the case with N = 100, as underflow problems occurred when Ms greater than 4 were used (Borjas and Sueyoshi, 1994, Appendix A.2.2). In contrast, our algorithm is stable for all replications with M = 2–20. The estimates of the βs with various M and J are all unbiased. There are downward biases in the estimate of ρ. The biases decrease as M increases from 2 to 16 or 20. Except for β₁, the Em.SDs of the estimates of β₂ and ρ tend to increase with M. For a larger number of groups J, the proper M tends to be slightly larger so as to achieve a small bias. The lnlk values show better goodness of fit when M = 16 or 20 is used. It is evident from these Monte Carlo results that M = 4 is too small for N = 100, as the biases in ρ are substantial.
Borjas and Sueyoshi (1994) reported statistical inaccuracy in the level of significance in hypothesis testing based on random effect probit estimates with the conventional Gaussian quadrature algorithm. With a conventional five percent nominal level of significance, the actual level of significance can be more than 40 percent. This problem occurred because a four-point quadrature was used for their reported Monte Carlo results (Borjas and Sueyoshi, 1994, Tables 1 and 2). Table 2 reports results on the likelihood ratio test for the null hypothesis β₂ = 1 based on various M for our study. The inaccurate results in Borjas and Sueyoshi are reconfirmed when M = 4 is used for N = 100. But when M increases, the degree of inaccuracy in the level of significance decreases. While there are still some discrepancies when M = 20 is used, the differences are reasonably small. The discrepancies are in general smaller for the case with N = 10 than for the case with N = 100. These results indicate the importance of the proper selection of M.
Table 3 reports results on some additional designs (mainly with N = 100). For the sample generated with ρ = 0.6, the estimates of ρ tend to be biased downward
Table 2
Likelihood ratio test – level of significance
H₀: β₂ = 1. Entries are actual levels of significance (%) at each nominal level.

N     J     M     10%     5%      2%      1%
10    50    2     15.75   9.00    4.75    2.25
10    50    4     14.25   7.25    3.50    2.50
10    50    8     13.00   6.00    2.00    1.25
10    50    16    13.00   5.50    2.00    1.25
100   50    4     63.00   52.75   40.25   34.50
100   50    8     34.25   26.25   18.75   14.25
100   50    16    16.75   12.00   6.75    3.75
100   50    20    13.50   7.50    4.00    2.75
Table 3
Error component group effect model: additional designs
Cells report the empirical mean with Em.SD in parentheses, plus lnlk and CPU seconds per replication.

Design with ρ = 0.6 (true β₁ = 0, β₂ = 1); N = 100, J = 50:
M = 20: β₁ −0.0011 (0.1261), β₂ 1.1011 (0.0553), ρ 0.5088 (0.0433); lnlk −1698.98; CPU 53.62
M = 30: β₁ 0.0049 (0.1237), β₂ 1.0709 (0.0576), ρ 0.5381 (0.0444); lnlk −1696.54; CPU 81.63

Design with six regressors (true β = (0.0, 1.0, 0.5, 0.0, −0.5, −1.0), ρ = 0.3); N = 100:
J = 50, M = 16: β₁ −0.0486 (0.2188), β₂ 0.9786 (0.0590), β₃ 0.4853 (0.0737), β₄ 0.0002 (0.0152), β₅ −0.4870 (0.0502), β₆ −0.7052 (0.4808), ρ 0.3288 (0.0754); lnlk −1985.04; CPU 161.72
J = 100, M = 20: β₁ −0.0246 (0.1696), β₂ 0.9982 (0.0350), β₃ 0.4989 (0.0500), β₄ −0.0002 (0.0112), β₅ −0.4964 (0.0334), β₆ −0.9167 (0.3108), ρ 0.3034 (0.0433); lnlk −3966.58; CPU 361.02

Unbalanced panels (true β₁ = 0, β₂ = 1, ρ = 0.3); (N = 10, J = 25) and (N = 100, J = 25):
M = 4: β₁ −0.0068 (0.1648), β₂ 1.0385 (0.0632), ρ 0.2226 (0.0654); lnlk −1215.31; CPU 8.43
M = 8: β₁ 0.0038 (0.1489), β₂ 1.0115 (0.0622), ρ 0.2753 (0.0703); lnlk −1209.09; CPU 17.94
M = 16: β₁ 0.0054 (0.1158), β₂ 0.9905 (0.0654), ρ 0.3110 (0.0793); lnlk −1207.22; CPU 34.79
M = (4; 16): β₁ 0.0038 (0.1163), β₂ 0.9945 (0.0620), ρ 0.3053 (0.0734); lnlk −1207.25; CPU 29.74

Unbalanced panels; (N = 10, J = 45) and (N = 100, J = 5):
M = (4; 16): β₁ −0.0058 (0.1130), β₂ 0.9977 (0.0828), ρ 0.3030 (0.0827); lnlk −428.65; CPU 6.32

Balanced panel; N = 50, J = 50:
M = 8: β₁ −0.0036 (0.1178), β₂ 1.0128 (0.0502), ρ 0.2809 (0.0512); lnlk −1107.56; CPU 13.62

N = 500, J = 50:
M = 4: β₁ 0.0056 (0.1678), β₂ 1.0723 (0.0349), ρ 0.1647 (0.0413); lnlk −10775.65; CPU 92.99
M = 10: β₁ 0.0066 (0.2123), β₂ 1.0738 (0.0547), ρ 0.1832 (0.0710); lnlk −10667.62; CPU 201.39
M = 20: β₁ −0.0030 (0.1681), β₂ 1.0112 (0.0716), ρ 0.2754 (0.0950); lnlk −10648.84; CPU 370.80
M = 30: β₁ 0.0103 (0.1340), β₂ 0.9791 (0.0749), ρ 0.3219 (0.0988); lnlk −10639.28; CPU 607.10

N = 500, J = 100:
M = 36: β₁ −0.0078 (0.1237), β₂ 1.0115 (0.0577), ρ 0.2798 (0.0769); lnlk −21263.68; CPU 1373.98
M = 42: β₁ −0.0083 (0.1108), β₂ 1.0018 (0.0601), ρ 0.2937 (0.0800); lnlk −21259.62; CPU 1572.32
but have smaller variances than those with ρ = 0.3. The estimates of the βs have slightly larger upward biases. The recursive algorithm is numerically stable for models with large ρ and for those with more regressors. For the latter, four more regressors, x₃ to x₆, are introduced in addition to the constant term and the regressor x in (9). x₃ is a uniform random variable. x₄ is an ordered discrete variable taking values from 1 to 5, whose occurrence probabilities are, respectively, 0.1, 0.2, 0.2, 0.3, and 0.2. x₅ is a dichotomous indicator with equal probabilities for its two categories 0 and 1. All three of these additional regressors are i.i.d. for all i and j. The fourth additional regressor x₆ is a uniform random variable which is independent across groups but is invariant for members within a group. The true coefficients of these additional regressors are set to (β₃, β₄, β₅, β₆) = (0.5, 0.0, −0.5, −1.0). Except for the estimates of β₆, all the estimates have small biases. With J = 50, the estimate of β₆ has a 30% downward bias. When J increases to 100, the downward bias is reduced to only 10%. For x₆, because it is a 'time'-invariant variable, the group dimension J plays a crucial role.
All the preceding results are for balanced panels. It remains of interest to investigate the estimation of unbalanced panels. The preceding results in Table 1 indicate that, for N = 10, M = 4 is sufficient, but for N = 100, M = 16 is appropriate. In an unbalanced sample, some panels might have small N while others have large N. An issue for unbalanced panels is the selection of M. The third part of Table 3 reports Monte Carlo results on the estimation of models with unbalanced panels. The heading with (N = 10, J = 25) and (N = 100, J = 25) refers to unbalanced panels with a total of 50 groups; among them, half have N = 10 and the remaining half have N = 100. With these unbalanced panels, M = 4 is insufficient, as much downward bias appears in the estimate of ρ; M = 16 is needed. These results are expected, as the larger M is needed for the long panels. With long panels in a sample, the presence of short panels will not ease the demand for the larger M, but it will not pose an additional burden either. However, the strategy of selecting a single, sufficiently large M to accommodate panels of various lengths is conservative but expensive. A better strategy may be to select a varying M for each group. In Table 3, M = (4; 16) refers to the selection of M = 4 for groups with N = 10 but M = 16 for groups with N = 100. The results indicate that the latter strategy is desirable. Its estimates are slightly more accurate, the time cost is less, and the lnlk value is similar to that of a constant M = 16. The unbalanced panels design with (N = 10, J = 45) and (N = 100, J = 10) provides additional evidence.
The remaining part of Table 3 reports estimates for samples with N = 500. These estimates provide more evidence on the need for a larger number of Gaussian points when the 'time' dimension N becomes larger. The case with N = 500 was considered impossible in Borjas and Sueyoshi (1994) with the conventional Gaussian quadrature. Our algorithm shows no numerical problem, as all replications converge. The large lnlk values confirm the numerical
Table 4
Fixed effect model
True parameter: β₂ = 1. Entries are for the estimate of β₂.

N                    J      Mean     Em.SD    J*       Iter   CPU
10                   50     1.1655   0.1395   45.37    5.84   0.05
10                   100    1.1637   0.0997   90.97    5.98   0.12
50                   50     1.0282   0.0409   49.99    5.51   0.27
100                  50     1.0118   0.0256   50.00    5.25   0.62
100                  100    1.0132   0.0190   100.00   5.44   1.29
500                  50     1.0031   0.0127   50.00    5.06   2.58
500                  100    1.0030   0.0087   100.00   5.15   5.38
(10, 45) (100, 5)           1.0663   0.0799   46.02    5.67   0.11
(10, 25) (100, 25)          1.0217   0.0374   47.80    5.44   0.32
impossibility with the conventional quadrature.3 The estimates of the βs are again unbiased for all Ms from 4 to 30. With a small M, the magnitude of the bias in ρ is larger than in Table 1 with small and moderate Ns. With M being 20 or 30 for J = 50, and M = 36 or larger for J = 100, the biases of ρ become reasonably small. However, their Em.SDs are larger than those for N = 100 in Table 1. The latter's poorer statistical property must be due to the Gaussian quadrature approximation becoming poorer as N becomes larger.
For comparison, some results on the estimates of the fixed effect probit panel model are provided in Table 4. The FEPE can be effectively derived from the Newton–Raphson algorithm as described in Hall (1978). Borjas and Sueyoshi (1994) compared the performance of the FEPE with the MLE of a random effect model with N = 100 and M = 4. Here we supplement their comparisons with a few of our Monte Carlo designs. The FEPE provides estimates of β₂ and the u_j's. In a group j, if all its members have the same discrete response, it is known that the FEPE of u_j will be infinite. In Table 4, J* refers to the (average) number of groups in a sample where not all members of a group have the same response. The estimates are consistent only if N goes to infinity (Chamberlain, 1980).4 The

3 We experimented with the algorithm in (8) for a case with N = 500, J = 50 and M = 20. That algorithm did not encounter any numerical underflow problem in that case. It provided similar coefficient and likelihood estimates, but its CPU time cost was 582.77 s per replication. The latter is much more than the 370.82 s for the recommended iterative algorithm.
4 It has been shown in Chamberlain (1980) that if N remains finite when J goes to infinity, the FEPE of β may not be consistent. The FEPE of β is consistent and distribution-free with respect to the distribution of u if N goes to infinity.
FEPE is computationally simpler and inexpensive even with large N. The number of iterations (Iter) for convergence of the Newton–Raphson algorithm is almost invariant with respect to N and J. The FEPEs of β₂ have larger biases and variances than the random effect estimates for models with small N = 10. But as N increases to 50, the bias is reduced and is only slightly larger than that of the random effect estimate (in Table 3). For N = 100 or larger, the estimates of β₂ are unbiased. For N = 50 or larger, the Em.SDs of the FEPEs of β₂ can even be smaller than those of the random effect estimates. In conclusion, the FEPEs can be preferred to the Gaussian quadrature random effect MLEs when N is large. But for small or moderate N, the FEPE would not be the better procedure. This comparison confirms once more the conclusion in Borjas and Sueyoshi (1994).
Acknowledgements
I appreciate the valuable comments and suggestions of two anonymous referees and an associate editor. Financial support from the RGC of Hong Kong under grant no. HKUST595/96H for my research is gratefully acknowledged.
References
Abramowitz, M., Stegun, I., 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series No. 55, US Government Printing Office, Washington, D.C.
Borjas, G.J., Sueyoshi, G.T., 1994. A two-stage estimator for probit models with structural group effects. Journal of Econometrics 64, 165–182.
Butler, J.S., Moffitt, R., 1982. A computationally efficient quadrature procedure for the one-factor multinomial probit model. Econometrica 50, 761–764.
Chamberlain, G., 1980. Analysis of covariance with qualitative data. Review of Economic Studies 47, 225–238.
Hall, B.H., 1978. A general framework for time series-cross section estimation. Annales de l'INSEE 30–31, 177–202.
Heckman, J.J., Willis, R.J., 1975. Estimation of a stochastic model of reproduction: an econometric approach. In: Terleckyj, N. (Ed.), Household Production and Consumption. Cambridge University Press, New York, NY.
Lee, L.F., 1996. Estimation of dynamic and ARCH Tobit models. Department of Economics, HKUST Working Paper No. 96/97-2.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992. Numerical Recipes, 2nd Edition. Cambridge University Press, New York.
Stroud, A., Secrest, D., 1966. Gaussian Quadrature Formulas. Prentice-Hall, Englewood Cliffs, NJ.
ε_{ij} = u_j + v_{ij},    (2)

where u_j and v_{ij} are mutually independent, the u_j are i.i.d. for all j, and the v_{ij} are i.i.d. for all i and j. The sign of the latent y*_{ij} determines the observed dichotomous dependent variable I_{ij}: I_{ij} = 1 if y*_{ij} > 0; it is 0 otherwise. Let F be the conditional distribution function of ε conditional on u, and f be the density of u. The log-likelihood function for the model is

    L = Σ_{j=1}^{J} ln[Prob(I_{1j}, …, I_{N_j,j})],

where

    Prob(I_{1j}, …, I_{N_j,j}) = ∫_{-∞}^{∞} f(u_j) Π_{i=1}^{N_j} [F(b_{ij}|u_j) - F(a_{ij}|u_j)] du_j,

with a_{ij} = -x_{ij}β and b_{ij} = ∞ if I_{ij} = 1, and a_{ij} = -∞ and b_{ij} = -x_{ij}β if I_{ij} = 0.
For a probit model, both u and v are assumed to be normally distributed. A typical normalization for the probit model specifies a zero mean and a unit variance for the total disturbance ε. Let ρ be the correlation between individuals within a group. For this normalized random-component model, ρ is also the variance of u. Therefore, u_j is N(0, ρ) and v_{ij} is N(0, 1-ρ). The log-likelihood function for this probit error component model can be compactly written, by using the symmetry of the normal density, as

    L = Σ_{j=1}^{J} ln{ ∫_{-∞}^{∞} (2π)^{-1/2} e^{-u²/2} Π_{i=1}^{N_j} Φ[D_{ij}(x_{ij}β + ρ^{1/2}u)/(1-ρ)^{1/2}] du },    (3)

where Φ is the standard normal distribution function and D_{ij} is a sign indicator such that D_{ij} = 1 if I_{ij} = 1; D_{ij} = -1 otherwise.
The joint probability of individual responses within a group in (3) involves a single integral whose integrand is a product of univariate probability functions. Butler and Moffitt (1982) pointed out that these joint probabilities can be effectively evaluated by Gauss–Hermite quadrature. The Gaussian quadrature evaluates the integral ∫_{-∞}^{∞} e^{-z²} g(z) dz by the integration formula

    ∫_{-∞}^{∞} e^{-z²} g(z) dz = Σ_{m=1}^{M} w_m g(z_m),    (4)

where M is the designated number of points, and the z_m and w_m are the M-point Gauss–Hermite abscissas and weights. If g in (4) is a polynomial of degree 2M-1 or less, the Gauss–Hermite evaluation is exact. The theory is based on orthogonal polynomials (see, e.g., Press et al., 1992, Chapter 4). The abscissas and weights are available from Stroud and Secrest (1966) and Abramowitz and Stegun (1964). Press et al. (1992) provide computing codes for generating the abscissas and weights for any specific number of points. Given the Gauss–Hermite points z_m and weights w_m, the likelihood in (3) can be evaluated by the Gauss–Hermite formula as

    L = Σ_{j=1}^{J} ln{ Σ_{m=1}^{M} (w_m/π^{1/2}) Π_{i=1}^{N_j} Φ[D_{ij}(x_{ij}β + (2ρ)^{1/2}z_m)/(1-ρ)^{1/2}] }.    (5)
Butler and Moffitt (1982) illustrated the usefulness of the Gaussian quadrature with panel data consisting of a sample of 1550 cross-sectional units with a maximum of 11 periods each. Based on the stability of the estimates, they suggested that two- or four-point quadratures would be sufficient. The Gaussian quadrature approach is computationally efficient relative to other quadrature techniques such as trapezoidal integration (Heckman and Willis, 1975). Subsequently, the Gaussian quadrature approach has often been used in the empirical econometrics literature.
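As a quick concrete check of (4), the abscissas and weights can be generated with NumPy's Gauss–Hermite routine (a standard implementation of the tables in Stroud and Secrest, 1966; the snippet below is an illustration of ours, not code from the paper):

```python
import numpy as np

# M-point Gauss-Hermite rule for integrals of the form
#   integral of e^{-z^2} g(z) dz = sum_m w_m g(z_m)   -- formula (4)
M = 8
z, w = np.polynomial.hermite.hermgauss(M)

# g(z) = z^2 is a polynomial of degree 2 <= 2M - 1, so (4) is exact here:
# the true value of the integral of e^{-z^2} z^2 dz is sqrt(pi)/2.
approx = np.sum(w * z**2)
exact = np.sqrt(np.pi) / 2.0
print(approx - exact)  # zero up to machine rounding
```

The weights themselves sum to √π, the value of ∫ e^{-z²} dz, which is the normalization used in (5).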
In a recent publication, Borjas and Sueyoshi (1994) pointed out some numerical and statistical difficulties that occurred when they tried to apply the technique to study probit models with structural group effects where the number of individuals in a group was large. In their model, individuals belonging to a given group share a common component in the specification of a conditional mean, and there are many groups. The group effect specification is an error component probit model as in (1) and (2). They argued that a likelihood formulation with the Gaussian quadrature could be numerically unstable and, at worst, impossible to evaluate with computers if the number of individuals in some groups, i.e., N_j, was large. This is so because the integrand of the numerical integration in (3) involves the product of cumulative probabilities for all members in a group. With a hypothetical sample of 500 observations per group and assuming a likelihood contribution of 0.5 for each member of a group, the value of the integrand can be as small as e^{500 ln(0.5)} ≈ e^{-346.6}, which is below the absolute minimum representable on a computer. The numerical problem occurs when one tries to evaluate a product consisting of many small numbers. Based on Monte Carlo results, Borjas and Sueyoshi (1994) also pointed out that statistical inference based on the Gaussian quadrature likelihood function can be quite inaccurate. Nominal levels of significance can be much smaller than actual levels of significance in hypothesis testing.
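The underflow is easy to reproduce (a small illustration of ours, not the authors' code; note that in modern double precision the quoted e^{-346.6} is in fact still representable, but single precision, or a somewhat larger group, underflows exactly as described):

```python
import numpy as np

# 500 members, each contributing probability 0.5:
# the product is e^{500 ln(0.5)} ~ e^{-346.6} ~ 3.1e-151.
p64 = np.prod(np.full(500, 0.5, dtype=np.float64))
p32 = np.prod(np.full(500, 0.5, dtype=np.float32))
print(p64)   # ~3.05e-151: tiny but still representable in double precision
print(p32)   # 0.0: underflows in single precision

# With 1100 members even double precision underflows ...
print(np.prod(np.full(1100, 0.5)))          # 0.0
# ... while the sum of logs stays perfectly well scaled:
print(np.sum(np.log(np.full(1100, 0.5))))   # 1100 ln(0.5), about -762.5
```

This is exactly why the algorithm of the next section works on sums of logarithms rather than products of probabilities.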
In this paper, we reconsider these numerical and statistical problems. We point out that with a carefully designed algorithm and a proper selection of the number of Gaussian quadrature points, the difficulties discussed by Borjas and Sueyoshi (1994) can be overcome to a certain degree. A by-product of our research is the discovery that the Gaussian quadrature formulation may contain relatively large approximation errors when the number of individuals in a group is large. A consequence of the latter may be larger standard errors for maximum likelihood estimates (MLE) when the number of individuals in a group increases.¹
2. Gaussian quadrature and a numerically stable algorithm
We suggest an algorithm that can overcome the numerical problem. The numerical instability can be resolved if the summation and product operators behind the logarithmic transformation in (5) can somehow be interchanged. The possibility of interchanging summation and product was first discussed in Lee (1996) for some related problems in simulation estimation. It can be summarized in the following proposition.
Proposition. For any constants a_{tr}, t = 1, …, T and r = 1, …, R, the following identity holds:

    Σ_{r=1}^{R} b_r Π_{t=1}^{T} a_{tr} = Π_{t=1}^{T} Σ_{r=1}^{R} a_{tr} ω_{t-1,r},

where the ω_{tr} are weights for t ≥ 1, which can be computed recursively as

    ω_{tr} = a_{tr} ω_{t-1,r} / Σ_{s=1}^{R} a_{ts} ω_{t-1,s},

starting with ω_{0r} = b_r for r = 1, …, R, assuming that Σ_{s=1}^{R} a_{ts} ω_{t-1,s} ≠ 0 for all t = 1, …, T-1.
Proof. Let c_t = Σ_{s=1}^{R} a_{ts} ω_{t-1,s}. Then

    Π_{t=1}^{T} c_t = (Σ_{r=1}^{R} a_{Tr} ω_{T-1,r}) Π_{t=1}^{T-1} c_t
                    = (Σ_{r=1}^{R} a_{Tr} a_{T-1,r} ω_{T-2,r} / c_{T-1}) Π_{t=1}^{T-1} c_t
                    = (Σ_{r=1}^{R} a_{Tr} a_{T-1,r} ω_{T-2,r}) Π_{t=1}^{T-2} c_t.

The result follows by induction. □
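The identity is also easy to verify numerically; the following check (variable names ours) mirrors the recursion with c_t = Σ_s a_{ts} ω_{t-1,s}:

```python
import numpy as np

rng = np.random.default_rng(0)
T, R = 6, 4
a = rng.uniform(0.1, 1.0, size=(T, R))   # constants a_{tr} > 0
b = rng.uniform(0.1, 1.0, size=R)        # constants b_r (initial weights omega_{0r})

# Left-hand side: sum_r b_r prod_t a_{tr}
lhs = float(np.sum(b * np.prod(a, axis=0)))

# Right-hand side: prod_t c_t, with c_t = sum_s a_{ts} omega_{t-1,s}
# and the recursive update omega_{tr} = a_{tr} omega_{t-1,r} / c_t.
omega = b.copy()
rhs = 1.0
for t in range(T):
    c_t = float(np.dot(a[t], omega))
    omega = a[t] * omega / c_t     # updated weights are positive and sum to one
    rhs *= c_t

print(abs(lhs - rhs))  # agreement up to rounding error
```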
¹ This raises the challenging question of whether there are other integration methods (numerical or stochastic) that can be better than the Gaussian quadrature formulation. For those possible formulations, our recommended algorithm will also be useful.
The summation over the quadrature points and the product of individual probabilities in L of (5) can be interchanged with a weight adjustment. The logarithmic transformation is then applied to the product of adjusted terms. In consequence, the log likelihood function L in (5) can be evaluated by the following iterative algorithm.

Algorithm. The log likelihood L can be evaluated as

    L = Σ_{j=1}^{J} Σ_{i=1}^{N_j} ln{ Σ_{m=1}^{M} Φ[D_{ij}(x_{ij}β + (2ρ)^{1/2}z_m)/(1-ρ)^{1/2}] ω_{i-1,jm} },    (6)

where the weights ω_{ijm} can be computed recursively by

    ω_{ijm} = Φ[D_{ij}(x_{ij}β + (2ρ)^{1/2}z_m)/(1-ρ)^{1/2}] ω_{i-1,jm}
              / Σ_{s=1}^{M} Φ[D_{ij}(x_{ij}β + (2ρ)^{1/2}z_s)/(1-ρ)^{1/2}] ω_{i-1,js},
        i = 1, …, N_j - 1;  j = 1, …, J,    (7)

starting with ω_{0jm} = w_m/π^{1/2} for m = 1, …, M and for all j.
In a group effect model, individuals in a group correspond to periods ('time') and the number of groups is the number of cross-sectional units in a panel data model with cross-sectional time series. The recursive evaluation of the weights ω_{ijm} in (7) is over individuals i within a group, where the ordering can be arbitrary. The ω_{ijm} for m = 1, …, M are weights; they are positive numbers and Σ_{m=1}^{M} ω_{ijm} = 1. The product over i in L of (5) has effectively been taken out by the logarithmic transformation in (6). This formulation avoids the evaluation of the product of probabilities and can be numerically stable. Except for the weighting adjustment, the expression of L in (6) resembles the log likelihood function of a pooled probit model. The weighting has effectively corrected for the correlation of individuals within a group. The evaluation of the log likelihood function in (6) may be slightly more complicated than the evaluation of the conventional one in (5), as the former involves updating the weights in (7) in a recursive fashion. However, the formulation of the log likelihood in (6) is a by-product of the weighting scheme, as the term in the bracket of (6) is exactly the denominator term in (7). In this regard, the updating of the weighting scheme in (7) does not impose much additional computational burden. In a subsequent Monte Carlo experiment, numerical evidence will be provided to demonstrate the effectiveness of this iterative formulation and to compare it with the conventional algorithm where possible.
Another possible modification of the conventional formulation in (5) has been suggested by an associate editor of this journal, as follows. The expression in (5) can be rewritten as L = Σ_{j=1}^{J} ln[Σ_{m=1}^{M} exp(h_{jm})], where

    h_{jm} = ln(w_m/π^{1/2}) + Σ_{i=1}^{N_j} ln Φ[D_{ij}(x_{ij}β + (2ρ)^{1/2}z_m)/(1-ρ)^{1/2}].

Denote p_j = max{h_{jm} : m = 1, …, M}. Then L can be evaluated as

    L = Σ_{j=1}^{J} [ p_j + ln( Σ_{m=1}^{M} exp(h_{jm} - p_j) ) ].    (8)
This modification might be valuable if the h_{jm} for all m = 1, …, M are not much less than p_j for each j. This formulation may be more expensive than both the conventional and the recommended algorithms, as one has to sort out the maximum among the h_{jm} for m = 1, …, M and compute the differences h_{jm} - p_j for each j.² In any case, we will compare this approach with our recommended approach in the subsequent Monte Carlo experiments.
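The shift by p_j in (8) is the familiar log-sum-exp device; a minimal sketch (ours, with an illustrative h vector):

```python
import math

import numpy as np

def logsumexp_group(h):
    """ln(sum_m exp(h_m)) computed as in (8): subtracting p_j = max_m h_jm
    makes the largest exponent 0, so the sum cannot underflow to 0."""
    h = np.asarray(h, dtype=float)
    p = float(h.max())
    return p + math.log(float(np.sum(np.exp(h - p))))

# h_jm is a sum of N_j log-probabilities, hence hugely negative for a large
# group; the naive evaluation underflows while (8) does not:
h = np.array([-750.0, -751.0, -760.0])
naive = float(np.sum(np.exp(h)))
print(naive)                 # 0.0: exp(-750) underflows in double precision
print(logsumexp_group(h))    # about -749.687, finite and accurate
```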
3. Monte Carlo results
Monte Carlo experiments are designed to investigate the numerical stability
of the proposed algorithm and compare it with others in terms of computing
time. In addition, estimation results may reveal the relevance of the number of
Gaussian quadrature points and the performance of the MLE and related test
statistics.
In the main design of our experiments, sample data are generated from the model

    y*_{ij} = β₁ + β₂ x_{ij} + u_j + v_{ij},    (9)

where u_j is N(0, ρ) and v_{ij} is N(0, 1-ρ). x_{ij} is generated from an N(0, 1) random generator with a 0.5 correlation coefficient between individuals in a group j. The true parameters are set to β₁ = 0, β₂ = 1, and ρ = 0.3. The underlying R² for y* is therefore 0.5. Since ρ is the variance of u, its value is restricted between 0 and 1 in the estimation. We have experimented with samples with various numbers of groups (J), various numbers of individuals (N) in a group, and various numbers of Gaussian quadrature points (M). There are either 50 or 100 groups in a sample. The sample in the main design is balanced because the number of individuals in a group is the same for all groups, i.e., N_j = N for all j = 1, …, J. We consider cases with small and large N. For each case, the number of replications is 400. We report summary statistics on the empirical mean (Mean), the empirical standard deviation (Em.SD), the average maximized log likelihood value (lnlk), and the average CPU time in seconds per replication. In addition to the main design, we have also experimented with designs with a larger proportion of variance due to u in the overall ε, a larger number of regressors in the model, and panels with unbalanced observations. The MLE will also be compared with the fixed effect probit estimates (FEPE). The optimization subroutine is the DFP algorithm from the GQOPT package. All computations are performed on a cluster of SUN SPARCstation 20 workstations. The Gauss–Hermite points and weights are generated by the subroutine 'gauher' in Numerical Recipes by Press et al. (1992).

² The computation cost differences will be entirely due to these additional calculations when a common optimization subroutine is used.
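The main design can be simulated in a few lines (our own sketch; the equicorrelated x is produced with a shared group factor, one of several equivalent constructions):

```python
import numpy as np

def make_sample(J=50, N=10, beta=(0.0, 1.0), rho=0.3, seed=0):
    """One sample from the main design: y*_ij = b1 + b2 x_ij + u_j + v_ij,
    with x_ij ~ N(0, 1) equicorrelated at 0.5 within a group,
    u_j ~ N(0, rho) and v_ij ~ N(0, 1 - rho)."""
    rng = np.random.default_rng(seed)
    b1, b2 = beta
    y, x = [], []
    for j in range(J):
        # corr(x_ij, x_kj) = 0.5: common group factor plus idiosyncratic part
        xj = np.sqrt(0.5) * rng.normal() + np.sqrt(0.5) * rng.normal(size=N)
        u = rng.normal(scale=np.sqrt(rho))
        v = rng.normal(scale=np.sqrt(1.0 - rho), size=N)
        y.append((b1 + b2 * xj + u + v > 0).astype(int))
        x.append(xj)
    return y, x

y, x = make_sample()
print(len(y), len(y[0]))   # 50 groups of 10 individuals
```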
Table 1 reports results for MLEs when N is either 10 or 100. Various M = 2–20 are tried. The numerical algorithm is stable, as all replications converge. For cases with N = 10, this is expected because there was no report of instability of the conventional algorithm in Butler and Moffitt (1982) with a small 'time' dimension in discrete panel data. All the estimates of β₁ and β₂ are unbiased. This is true for various numbers of Gaussian quadrature points and groups. There is a moderate amount of downward bias in the estimate of ρ when only a two-point quadrature is used. The biases become small when four- or eight-point quadratures are used. The lnlk improves as M increases from two to four. The improvement in lnlk from M = 4 to 8 is small. It is interesting to note that, with M greater than 8, no improvements are observed as the likelihood function becomes stable. The CPU times are approximately linear in M and J. For the cases with N = 10, the four- or eight-point quadratures are sufficient. These results confirm the suggestion in Butler and Moffitt (1982) for the use of a four-point quadrature. Their suggestion was derived from a sample with a large number of cross-sectional units but similar 'time' dimensions. It is, however, by no means a universal rule. To compare time costs of the recommended algorithm with the conventional formulation of Butler and Moffitt (1982) and the one in (8), we re-estimate a case (N = 10, J = 50 and M = 4) with the two alternative formulations. All three algorithms provide identical estimates, but there are some differences in time cost. The conventional algorithm took 1.103 CPU seconds on average per replication to converge and the algorithm in (8) took 1.333 CPU seconds. Our iterative algorithm's time cost is 1.152 CPU seconds on average. Thus, our iterative algorithm is slightly more time consuming than the conventional one but less so than that in (8). Subsequently, we have done more experiments (reported in Table 3). For a case with larger N = 50, J = 50 and M = 16, the conventional algorithm took 13.486 CPU seconds; our iterative algorithm took 13.623 CPU seconds; and the algorithm in (8) took 19.483 CPU seconds.
Table 1
Error component group effect model: main design
True parameters: β₁ = 0, β₂ = 1, and ρ = 0.3; balanced panels. Entries are means with empirical standard deviations (Em.SD) in parentheses.

N     J     M     β₁                  β₂                 ρ                  lnlk        CPU
10    50    2     -0.0003 (0.1125)    1.0174 (0.1037)    0.2420 (0.0604)    -234.937    0.63
10    50    4     -0.0046 (0.1067)    1.0165 (0.1012)    0.2840 (0.0745)    -233.937    1.15
10    50    8     -0.0052 (0.1007)    1.0143 (0.1006)    0.2907 (0.0776)    -233.863    2.06
10    50    16    -0.0052 (0.0999)    1.0144 (0.1006)    0.2905 (0.0773)    -233.865    3.95
10    50    20    -0.0051 (0.0999)    1.0144 (0.1006)    0.2905 (0.0772)    -233.865    5.86
10    50    30    -0.0051 (0.0999)    1.0144 (0.1006)    0.2905 (0.0772)    -233.866    7.26
10    100   2     -0.0027 (0.0768)    1.0110 (0.0692)    0.2418 (0.0412)    -471.00     1.23
10    100   4     -0.0027 (0.0711)    1.0132 (0.0675)    0.2873 (0.0514)    -468.56     2.30
10    100   8     -0.0028 (0.0692)    1.0101 (0.0670)    0.2960 (0.0553)    -468.38     4.68
10    100   16    -0.0026 (0.0692)    1.0100 (0.0671)    0.2961 (0.0553)    -468.38     7.85
100   50    2     -0.0043 (0.1208)    1.0043 (0.0387)    0.2069 (0.0342)    -2256.81    9.03
100   50    4     -0.0087 (0.1497)    1.0560 (0.0392)    0.1970 (0.0370)    -2192.21    18.00
100   50    8     -0.0087 (0.1389)    1.0274 (0.0446)    0.2519 (0.0493)    -2180.86    28.84
100   50    16    -0.0149 (0.1209)    0.9967 (0.0494)    0.3009 (0.0590)    -2177.24    55.53
100   50    20    -0.0077 (0.1050)    0.9901 (0.0495)    0.3111 (0.0608)    -2176.87    69.09
100   100   2     -0.0034 (0.0847)    1.0052 (0.0270)    0.2038 (0.0213)    -4516.42    14.34
100   100   4     -0.0057 (0.1144)    1.0611 (0.0273)    0.1901 (0.0251)    -4382.59    30.26
100   100   8     -0.0101 (0.1115)    1.0402 (0.0297)    0.2354 (0.0307)    -4357.31    66.23
100   100   16    -0.0031 (0.0960)    1.0111 (0.0318)    0.2832 (0.0366)    -4348.72    112.42
100   100   20    -0.0021 (0.0833)    1.0043 (0.0320)    0.2938 (0.0375)    -4347.81    139.20

The second part of Table 1 reports results when the number of individuals per group N is 100. Borjas and Sueyoshi (1994) reported numerical difficulties for
the conventional Gaussian quadrature for the case with N = 100, as underflow problems occurred when M greater than 4 was used (Borjas and Sueyoshi, 1994, Appendix A.2.2). On the contrary, our algorithm is stable for all replications with M = 2–20. The estimates of the βs with various M and J are all unbiased. There are downward biases in the estimate of ρ. The biases decrease as M increases from 2 to 16 or 20. Except for β₁, the Em.SDs of the estimates of β₂ and ρ tend to increase with M. For a larger number of groups J, the proper M tends to be slightly larger so as to achieve a small bias. The lnlk values show better goodness of fit when M = 16 or 20 is used. It is evident from these Monte Carlo results that M = 4 is too small for N = 100, as the biases in ρ are substantial.
Borjas and Sueyoshi (1994) reported statistical inaccuracy in the level of significance in hypothesis testing based on random effect probit estimates with the conventional Gaussian quadrature algorithm. With a conventional 5 percent nominal level of significance, the actual level of significance can be more than 40 percent. This problem occurred because a four-point quadrature was used for their reported Monte Carlo results (Borjas and Sueyoshi, 1994, Tables 1 and 2). Table 2 reports results on the likelihood ratio test for the null hypothesis β₂ = 1 based on various M for our study. The inaccurate results in Borjas and Sueyoshi are reconfirmed when M = 4 is used for N = 100. But when M increases, the degree of inaccuracy in the level of significance decreases. While there are still some discrepancies when M = 20 is used, the differences are reasonably small. The discrepancies are in general smaller for the case with N = 10 than for the case with N = 100. These results indicate the importance of the proper selection of M.
Table 2
Likelihood ratio test — level of significance
H₀: β₂ = 1

N     J    M     10%     5%      2%      1%
10    50   2     15.75   9.00    4.75    2.25
10    50   4     14.25   7.25    3.50    2.50
10    50   8     13.00   6.00    2.00    1.25
10    50   16    13.00   5.50    2.00    1.25
100   50   4     63.00   52.75   40.25   34.50
100   50   8     34.25   26.25   18.75   14.25
100   50   16    16.75   12.00   6.75    3.75
100   50   20    13.50   7.50    4.00    2.75

Table 3
Error component group effect model: additional designs
Entries are means with empirical standard deviations (Em.SD) in parentheses. Unless noted otherwise, the true parameters are β₁ = 0, β₂ = 1, and ρ = 0.3.

Design                          M        β₁                  β₂                 ρ                  lnlk         CPU
N=100, J=50 (true ρ = 0.6)      20       -0.0011 (0.1261)    1.1011 (0.0553)    0.5088 (0.0433)    -1698.98     53.62
                                30        0.0049 (0.1237)    1.0709 (0.0576)    0.5381 (0.0444)    -1696.54     81.63
(N=10, J=25) + (N=100, J=25)    4        -0.0068 (0.1648)    1.0385 (0.0632)    0.2226 (0.0654)    -1215.31     8.43
                                8         0.0038 (0.1489)    1.0115 (0.0622)    0.2753 (0.0703)    -1209.09     17.94
                                16        0.0054 (0.1158)    0.9905 (0.0654)    0.3110 (0.0793)    -1207.22     34.79
                                (4;16)    0.0038 (0.1163)    0.9945 (0.0620)    0.3053 (0.0734)    -1207.25     29.74
(N=10, J=45) + (N=100, J=5)     (4;16)   -0.0058 (0.1130)    0.9977 (0.0828)    0.3030 (0.0827)    -428.65      6.32
N=50, J=50                      8        -0.0036 (0.1178)    1.0128 (0.0502)    0.2809 (0.0512)    -1107.56     13.62
N=500, J=50                     4         0.0056 (0.1678)    1.0723 (0.0349)    0.1647 (0.0413)    -10775.65    92.99
                                10        0.0066 (0.2123)    1.0738 (0.0547)    0.1832 (0.0710)    -10667.62    201.39
                                20       -0.0030 (0.1681)    1.0112 (0.0716)    0.2754 (0.0950)    -10648.84    370.80
                                30        0.0103 (0.1340)    0.9791 (0.0749)    0.3219 (0.0988)    -10639.28    607.10
N=500, J=100                    36       -0.0078 (0.1237)    1.0115 (0.0577)    0.2798 (0.0769)    -21263.68    1373.98
                                42       -0.0083 (0.1108)    1.0018 (0.0601)    0.2937 (0.0800)    -21259.62    1572.32

Six-regressor design, N = 100 (true parameters: β₁ = 0, β₂ = 1, β₃ = 0.5, β₄ = 0, β₅ = -0.5, β₆ = -1, ρ = 0.3):

            J=50, M=16           J=100, M=20
β₁          -0.0486 (0.2188)     -0.0246 (0.1696)
β₂           0.9786 (0.0590)      0.9982 (0.0350)
β₃           0.4853 (0.0737)      0.4989 (0.0500)
β₄           0.0002 (0.0152)     -0.0002 (0.0112)
β₅          -0.4870 (0.0502)     -0.4964 (0.0334)
β₆          -0.7052 (0.4808)     -0.9167 (0.3108)
ρ            0.3288 (0.0754)      0.3034 (0.0433)
lnlk        -1985.04             -3966.58
CPU          161.72               361.02

Table 3 reports results on some additional designs (mainly with N = 100). For samples generated with ρ = 0.6, the estimates of ρ tend to be biased downward
but have smaller variances than those with ρ = 0.3. The estimates of the βs have slightly larger upward biases. The recursive algorithm is numerically stable for models with large ρ and for those with more regressors. For the latter, four more regressors, say x₃ to x₆, are introduced in addition to the constant term and the regressor x in (9). x₃ is a uniform random variable. x₄ is an ordered discrete variable taking values from 1 to 5, whose occurrence probabilities are, respectively, 0.1, 0.2, 0.2, 0.3, and 0.2. x₅ is a dichotomous indicator with equal probabilities for its two categories 0 and 1. All these three additional regressors are i.i.d. for all i and j. The fourth additional regressor x₆ is a uniform random variable which is independent across groups but invariant for members within a group. The true coefficients of these additional regressors are set to (β₃, β₄, β₅, β₆) = (0.5, 0.0, -0.5, -1.0). Except for the estimates of β₆, all the estimates have small biases. With J = 50, the estimate of β₆ has a 30% downward bias. When J increases to 100, the downward bias is reduced to only 10%. For x₆, because it is a 'time'-invariant variable, the group dimension J plays a crucial role.
All the preceding results are for balanced panels. It remains of interest to investigate the estimation of unbalanced panels. The preceding results in Table 1 indicate that, for N = 10, M = 4 is sufficient, but for N = 100, M = 16 will be appropriate. In an unbalanced sample, some panels might have small N but others have large N. An issue for unbalanced panels is the selection of M. The third part of Table 3 reports Monte Carlo results on the estimation of models with unbalanced panels. The heading with (N = 10, J = 25) and (N = 100, J = 25) refers to unbalanced panels with a total of 50 groups; among them half have N = 10 and the remaining half have N = 100. With these unbalanced panels, M = 4 is insufficient, as much downward bias appears in the estimate of ρ. M = 16 is needed. These results are expected, as a larger M is needed for long panels. With long panels in a sample, the presence of short panels will not ease the demand for the larger M, but will not pose an additional burden either. However, the strategy of selecting a single, sufficiently large M to accommodate panels of various lengths is conservative but expensive. A better strategy may be to select a varying M for each group. In Table 3, M = (4;16) refers to the selection of M = 4 for groups with N = 10 but M = 16 for N = 100. The results indicate the latter strategy is desirable. Its estimates are slightly more accurate, the time cost is less, and the lnlk value is similar to the
one with a constant M = 16. The unbalanced panels design with (N = 10, J = 45) and (N = 100, J = 5) provides additional evidence.
The remaining part of Table 3 reports estimates for samples with N = 500. These estimates provide more evidence on the need for a larger number of Gaussian points when the 'time' dimension N becomes larger. The case with N = 500 was considered impossible in Borjas and Sueyoshi (1994) with the conventional Gaussian quadrature. Our algorithm shows no numerical problem, as all replications converge. The large lnlk values confirm the numerical
impossibility with the conventional quadrature.³ The estimates of the βs are again unbiased for all M from 4 to 30. With a small M, the magnitude of the bias in ρ is larger than in Table 1 with small and moderate N. With M being 20 or 30 for J = 50 and M = 36 or larger for J = 100, the biases in ρ become reasonably small. However, their Em.SDs are larger than those for N = 100 in Table 1. The latter's poor statistical property must be due to the possibility that the Gaussian quadrature approximation becomes poorer as N becomes larger.

Table 4
Fixed effect model
True parameter: β₂ = 1

N, J                           β₂ Mean   Em.SD    J*       Iter   CPU
N=10,  J=50                    1.1655    0.1395   45.37    5.84   0.05
N=10,  J=100                   1.1637    0.0997   90.97    5.98   0.12
N=50,  J=50                    1.0282    0.0409   49.99    5.51   0.27
N=100, J=50                    1.0118    0.0256   50.00    5.25   0.62
N=100, J=100                   1.0132    0.0190   100.00   5.44   1.29
N=500, J=50                    1.0031    0.0127   50.00    5.06   2.58
N=500, J=100                   1.0030    0.0087   100.00   5.15   5.38
(N=10, J=45) + (N=100, J=5)    1.0663    0.0799   46.02    5.67   0.11
(N=10, J=25) + (N=100, J=25)   1.0217    0.0374   47.80    5.44   0.32
For comparison, some results on the estimates of the fixed effect probit panel model are provided in Table 4. The FEPE can be effectively derived from the Newton–Raphson algorithm as described in Hall (1978). Borjas and Sueyoshi (1994) compared the performance of the FEPE with the MLE of a random effect model with N = 100 and M = 4. Here we supplement their comparisons with a few of our Monte Carlo designs. The FEPE provides estimates of β₂ and the u_j's. In a group j, if all its members have the same discrete response, it is known that the FEPE of u_j will be infinite. In Table 4, J* refers to the (average) number of groups in a sample in which not all members of a group have the same response. The estimates are consistent only if N goes to infinity (Chamberlain, 1980).⁴ The FEPE is computationally simpler and inexpensive even with large N. The number of iterations (Iter) for convergence of the Newton–Raphson algorithm is almost invariant with respect to N and J. The FEPEs of β₂ have larger biases and variances than those of the random effect estimates for models with small N = 10. But as N increases to 50, the bias is reduced and is only slightly larger than that of the random effect estimate (in Table 3). For N = 100 or larger, the estimates of β₂ are unbiased. For N = 50 or larger, the Em.SDs of the FEPEs of β₂ can even be smaller than those of the random effect estimates. In conclusion, the FEPEs can be preferred to the Gaussian quadrature random effect MLEs when N is large. But for small or moderate N, the FEPE would not be the better procedure. This comparison confirms once more the conclusion in Borjas and Sueyoshi (1994).

³ We experimented with the algorithm in (8) for a case with N = 500, J = 50 and M = 20. That algorithm did not encounter any numerical underflow problem in that case. It provided similar coefficient and likelihood estimates, but its CPU time cost was 582.77 s per replication. The latter is much more than the 370.82 s for the recommended iterative algorithm.

⁴ It has been shown in Chamberlain (1980) that if N remains finite when J goes to infinity, the FEPE of β may not be consistent. The FEPE of β is consistent and distribution-free with respect to the distribution of u if N goes to infinity.
Acknowledgements
I appreciate the valuable comments and suggestions from two anonymous referees and an associate editor. Financial support from the RGC of Hong Kong under grant no. HKUST595/96H for my research is gratefully acknowledged.
References
Abramowitz, M., Stegun, I., 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series No. 55, US Government Printing Office, Washington, DC.
Borjas, G.J., Sueyoshi, G.T., 1994. A two-stage estimator for probit models with structural group effects. Journal of Econometrics 64, 165-182.
Butler, J.S., Moffitt, R., 1982. A computationally efficient quadrature procedure for the one-factor multinomial probit model. Econometrica 50, 761-764.
Chamberlain, G., 1980. Analysis of covariance with qualitative data. Review of Economic Studies 47, 225-238.
Hall, B.H., 1978. A general framework for time series-cross section estimation. Annales de l'INSEE 30-31, 177-202.
Heckman, J.J., Willis, R.J., 1975. Estimation of a stochastic model of reproduction: an econometric approach. In: Terleckyj, N. (Ed.), Household Production and Consumption. Cambridge University Press, New York, NY.
Lee, L.F., 1996. Estimation of dynamic and ARCH Tobit models. Department of Economics, HKUST Working Paper No. 96/97-2.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992. Numerical Recipes, 2nd Edition. Cambridge University Press, New York.
Stroud, A., Secrest, D., 1966. Gaussian Quadrature Formulas. Prentice-Hall, Englewood Cliffs, NJ.