Improving parameter tests in covariance

COMPUTATIONAL
STATISTICS
& DATAANALYSIS
ELSEVIER

Computational Statistics & Data Analysis 26 (1997) 177-198

Improving parameter tests
in covariance structure analysis
Ke-Hai Yuan, Peter M. Bentler*

Department of Psychology, University of California, Los Angeles, CA 90024-1563, USA
Received 1 August 1996; received in revised form 1 March 1997

Abstract

In many areas, covariance structure analysis plays an important role in understanding how the relationship among observed variables might be generated by hypothesized latent variables. Once a model
is established as relevant to a given data set, it is important to evaluate the significance of specific
parameters, such as coefficients of regressions among latent variables, within the model. The popular
z-test of a parameter is the estimator of the parameter divided by its standard error estimator. A valid
z-statistic must be based on a high-quality standard error estimator. We focus on the quality of the

standard error estimator from both MLE and ADF methods, which are the two most frequently used
methods in covariance structure practice. For these two estimation methods, empirical evidence shows
that classical formulae give "too optimistic" standard error estimators, with the result that the z-tests
regularly give false conclusions. We review one and introduce another simple corrected standard error
estimator. These substantially improve on the classical ones, depending on distribution and sample size.
Two implications of this study are that significant parameters as printed in most statistical software may
not be really significant, and that corrected standard errors should be direct output for the two most
widely used methods. A comparison of the accuracy of the estimators based on these two methods is
also made. @ 1997 Elsevier Science B.V.

Keywords: Z-scores; Corrected standard error; Bias; Mean square error; Maximum likelihood; Asymptotically distribution free

I. Introduction
In the social and behavioral sciences, variables such as attitudes, personality traits,
mental health status, political liberalism-conservatism, etc., are often of interest to

*

Corresponding author.


0167-9473/97/$17.00 @ 1997 Elsevier Science B.V. All rights reserved
PH S0 1 6 7 - 9 4 7 3 ( 9 7 ) 0 0 0 2 5 - X

178

K.-H. Yuan, P.M. Bentler/ Computational Statistics & Data Analysis 26 (1997) 17~198

researchers. Attributes such as intellectual abilities of students or teaching styles of
instructors are important concepts in education. The relationship between demand
and supply is very important to a market economy. Abstract concepts such as these
typically cannot be observed or measured directly, or can only be measured with
errors; hence, an appropriate data analysis must take into account not only the statistical relations involved among such hypothesized latent constructs but also the errors of measurement in the variables. Structural equation models such as confirmatory
factor analysis and general linear relations have been developed to provide practical
and accurate ways of dealing with multivariate analysis with latent variables (e.g.,
Bentler and Wu, 1995; J6reskog and S6rbom, 1993). Recent reviews of some of the
vast literature in this field have been given by Austin and Calder6n (1996), Bentler
and Dudgeon (1996), Browne and Arminger ( 1995 ), Hoyle (1995), Marcoulides and
Schumacker (1996), and Tremblay and Gardner (1996).
Data analysis with structural models requires parameter estimation, ideally with an
efficient estimator such as the maximum likelihood or minimum 7~2 estimator. Associated with an estimator are two key statistical tests. The first is a goodness of fit

test to evaluate whether a proposed model is relevant to a particular set of data.
One of several asymptotic )/2 tests is usually used for this purpose; we review
two such tests below. If the ~2 is small compared to its degrees of freedom, the
model provides a plausible representation of the data; on the other hand, a large 7~2
implies an inadequate model. While serious questions have been raised about the
adequacy of some of these )~2 tests (e.g., Bentler and Dudgeon, 1996; Chou et al.,
1991; Curran et al., 1996; Hu et al., 1992), for purposes of this paper we assume
that an adequate model test procedure is available. The second key statistical test is
invoked primarily when a model hypothesis has been accepted, i.e., the model can be
considered appropriate for the data at hand. In that case, it is important to evaluate
particular parameters in the model to see whether they are statistically necessary.
A more parsimonious model will result when pruning nonsignificant estimates. Typically, an asymptotic z-test, based on the ratio of an estimate to its asymptotic standard
error estimate, is used to evaluate the necessity of a given parameter. In this paper
we study the quality of the standard error estimates in covariance structure analysis
and use the typical factor analysis model as an example. We find that the routinely
used standard error estimates are much more biased than has been suspected, and in
particular, tend to be substantial underestimates of actual sampling variability so that
parameters in these models that should truly be considered as nonexistent or zero,
are taken as statistically significant instead. As a result, models may contain many
superfluous parameters that not only make the models much more complex than

necessary, but also the models may not replicate when evaluated on new samples.
We study this problem both theoretically and empirically, and make some specific
recommendations for use in data analysis.
The confirmatory factor analytic model is the most widely used example of this
class of models. This model expresses observed variables as functions of hypothesized latent factors and errors as

X = # + A f +e,

(1.1)

K.-H. Yuan, P.M. Bentlerl Computational Statistics & Data Analysis 26 (1997) 177-198

179

where X represents the vector of observed indicator variables, f is a vector of latent
variables, e is a vector of unique factors or random errors, and A is a matrix of
regression coefficients called factor loadings. Assuming the means of f and e are
zero, then /~ = E(X). The usual estimator for /~ is X, the sample mean of the observed data. However, the parameter matrix A cannot be estimated from the first
moment of X. Let 2; = var(X), • = v a r ( f ) , T = var(e). Assuming that the factors
f and errors e are uncorrelated, the covariance structure of (1.1) is


2;(O)=ACbA' + T,

(1.2)

where 0 is the vector of unknown elements in matrices A, ~b, and 5u. Eq. (1.2)
describes the covariance structure of X through the factor analysis model (1.1). It
can also be written as X = (A, I ) ( f ' , e')'. More generally, we have X = A~ where the
matrix A----A(7) is a function of a basic vector of parameters, and the underlying
generating vector ~ contains measured, latent, or residual random or fixed variables
(e.g., Anderson, 1989; Bentler, 1983; Satorra and Neudecker, 1994), with a corresponding covariance structure 2;(0). These types of models can be estimated and
tested based on the sample covariance matrix. Let XI,... ,XN be a sample of size
N = n + 1 from population X, and S be the usual sample covariance. If X is of dimension p, then the number of nonduplicated elements in S is p* = p(p + 1)/2. Let
q be the number of unknown elements in 0, then the model will not be of interest
unless q < p*. For example, in model (1.2), we can restrict 7j to be a diagonal
matrix, 4~ to be a correlation matrix, and each row of A to have only one or a few
nonzero elements. Estimation of the parameters is performed under the assumption
of multivariate normality by maximum likelihood (ML), or without any distributional assumptions, using minimum chi-square, which is known in this field as
asymptotically distribution free (ADF).
We can illustrate the parameter testing problem using data from Harlow et al.

(1995), who studied a variety of scales and measures regarding psychosocial functioning, sexual experience, and substance use in an attempt to understand the
antecedents of HIV-risky sexual behaviors. Thirteen of their variables, a small subset of their study variables, were made available to us. The authors had strong reasons
to believe that four correlated factors based on 32 parameters could explain these
91 covariances, i.e., a confirmatory factor model with 59 degrees of freedom was
hypothesized. The authors posited that their 13 variables X could be generated by
(1.1) with A(13 x 4 ) , f ( 4 x 1), and e(13 x 1). For our purposes, we added an extra
free factor loading 2ij to be estimated in A, permitting one variable to be influenced
by more than one factor. Based on a sample of n = 2 1 3 women, this model was
estimated using both ML and ADF since the data do not have a multivariate normal
distribution. The resulting model fit the data acceptably. We then tested our extra
factor loading and obtained 2ij=0.410 with S.E.=0.140 for a z-test of 2.92 (for
ML) and 2ij = 0.296 with S.E~.= 0.115 for a z-test of 2.58 (for ADF). Evidently, this
extra factor loading is necessary to the model. However, as we shall show below,
at this sample size ML and ADF produce standard error estimates that are markedly
below the actual standard errors, being too small in size by a factor of 1.4-2.6.

180

K.-H. Yuan, P.M. Bentler/ Computational Statistics & Data Analysis" 26 (1997) 17~198


Using a more accurate standard error estimate would show that our conclusion using
existing methods would be clearly in error.

2. Two major methods
When the sample )(1 .... ,XN is obtained from a normal population X ~ N ( p , X ) ,
then nS follows a Wishart distribution. The ML estimator (MLE) On can be obtained
by minimizing

F~(0) = tr[SZ-'(O)] - log IS2:-l(0)l - p,

(2.1)

and T1 =nFl(0n) is used to judge the adequacy of the structure X(O). Under some
standard regularity conditions,

x/~(On - 0o) ~ N(O, (2,),
where (2~ = 2 { a ' ( X - l ® X - l ) ~ } -1 is the inverse of the information matrix with ~ =
8 vec(X(O))/80'. Thus, an estimate of the standard error of 0n can be calculated
based on F2j/n with estimator
8, = 2[~'(0,){X-'(0n)®X-'(0n)}6(0,)]-~.


(2.2)

Since T1 is just the likelihood ratio statistic associated with the Wishart distribution, T~~ )~p.2_ q' and a critical value from )~p2._ q is often used to judge the significance o f T1. Parameter tests can be based on z-statistics calculated as z = 0i/S~E.,
where S.E. is obtained as the square root from the appropriate diagonal element
of (2~/n. When data are normal and sample sizes are large enough, these z-statistics
are approximately normally distributed (e.g., Gerbing and Anderson, 1985), and so
the univariate normal distribution can be used to test the significance of the given
parameters. Unfortunately, this optimality property can break down in small samples,
and generally will break down in samples which violate the assumed multivariate normality condition, as reviewed by Bentler and Dudgeon (1996). For example, Curran
(1994) performed an extensive Monte Carlo study of three 9-variable models evaluated under a normal and two nonnormal conditions, at sample sizes 100, 200, and
1000. He concluded that "the ML standard errors were significantly negatively biased
as a function of increasing nonnormality. Under severely non-normal distributions,
the ML standard errors for the factor loadings were underestimated by more than
50% across all three sample sizes. This underestimation was even more pronounced
for the standard errors of the uniquenesses. These results provide further evidence
that great care must be taken when interpreting the significance of z-ratios for parameter estimates in models based on non-normally distributed data regardless of sample
size" (p. 197). We study these problems in further detail below.
Since real data are seldom normal and are usually characterized by skewness
and positive kurtosis, it seems appropriate to pay more attention to methods which

are valid under nonnormal data. Anderson and Amemiya (1988) and Amemiya and
Anderson (1990) gave conditions under which some normal theory standard errors

K.-H. Yuan, P.M. Bentler/ Computational Statistics & Data Analysis 26 (1997) 177-198

181

and the test T~ are asymptotically valid even when X is not normally distributed; see
also Satorra and Neudecker (1994). For the factor analysis model (1.1), one of their
main conditions is that f and e are independent and the elements of e are also
independent. Their results makes the normal theory method valid for a wide class
of distributions. However, there is no effective way to verify their assumptions in
practice, and the structure (1.2), for example, only requires that f and e are uncorrelated. Empirical results of Hu et al. (1992) indicate that inference based on T~ can
be misleading if Anderson and Amemiya's independence condition is not met. We
shall evaluate whether this may also occur for standard error estimates.
A theoretical breakthrough in the development of covariance structure analysis
for data which may be arbitrarily distributed was made by Browne (1984) and
Chamberlain (1982). Let vech(.) be an operator which transforms a symmetric matrix
into a vector by picking the nonduplicated elements of the matrix, Yi = vech{(Xi )?)(X/ - )?)'}, and I7 and Sy be the usual sample mean and sample covariance of Y/,
then 17= nvech(S)/N. Modeling s = vech(S) by o-(0)= vech(X(0)) is asymptotically

equivalent to modeling Yi by a(0). Let V~=var(s), then V~~ V =var[vech{(X,.#0)(X~- #0)'}]. Since Sr is a consistent estimator of V, Browne (1984) proposed to
estimate 00 by minimizing
F2(0) = (s - a(O))'S~'(s - a(0)).

(2.3)

Let 0n be the corresponding estimator, then
x/~(0, - 00) -~ N(0, (22),
where ~c~2 = ( o - t V - l o - ) - I

for which

(22 = {d'(O, ) S ~ d ( O , )}-'

(2.4)

is a consistent estimator. The corresponding statistic T2 = n&(0,) asymptotically follows a chi-square distribution with degrees of freedom p * - q. This method is asymptotically valid for any distribution with finite fourth-order moments (Browne assumed
eighth moments). When the model size is large, the ADF method often does not
converge to a solution, and T2 rejects a correct model too often for the converged
solutions (see Hu et al., 1992 for p = 15, and Curran et al., 1996 for p = 9 ) . The

minor modification T* = 7"2/(1 + T2/n) substantially improves the performance of T2
(Yuan and Bentler, 1997). With regard to standard errors, the standard error estimator based on (2.4) can match the empirical variability across samples in a Monte
Carlo study for p = 6 (Chou et al., 1991), but it can also break down except at the
largest sample sizes (Henly, 1993). Curran (1994), in the study referred to previously, found that ADF standard errors were significantly negatively biased in normal
samples with Ns of 100 and 200, but also "cannot be trusted even under conditions of multivariate non-normality at samples of N = 1000 or less" (p. 198). It is
our expectation that for p = 15 there will be a larger discrepancy between empirical
standard error of 0, and that based on (2.4), leading to incorrect parameter tests at
all but the largest sample sizes. The dimension of a data set can really decide the
quality of ADF methods for a practical sample size, as will be demonstrated below.

K.-H. Yuan, P.M. Bentler/ Computational Statistics & Data Analysis 26 (1997) 17~198

182

Clearly, there is a need to correct the standard error estimates for both MLE and
ADF. We consider two corrections. One is well known but almost never used. The
other is not known in this context.
When using a method based on normal theory such as ML, the standard error given
by t)~ is asymptotically correct when data actually are normal. We will call this the
standard error "predicted" under normality. However, when data are not normal, the
resulting standard error based on ~)~ will not be correct in general. For example, when
data are sampled from an elliptical distribution with a coefficient of kurtosis larger
than 1, the true asymptotic standard error of 0, will be underestimated when based
on 81. Let Dp be the duplication matrix as defined by Magnus and Neudecker (1988,
p. 49), then it has been known for over a decade (e.g., Bentler, 1983; Browne, 1984;
Bentler and Dijkstra, 1985) but almost completely ignored (Bentler and Dudgeon,
1996), that the MLE has the general covariance matrix

v (O.

-

00) ~ N ( 0 ,

where ~21c is consistently estimated by
~r)l c

=

^

{G (o.) wsy
.t

^



^

^

(2.5)

and W = 5D'p{Z
1 , - 1 (O,)®Z-l(O,)}Dp. This covariance matrix is sometimes known as
the sandwich-type matrix, and is computed as the "robust" covariance matrix in EQS
(Bentler, 1995). Another version of (2.5) is given by Arminger and Schoenberg
(1989) in the framework of pseudo-likelihood estimator. If the pseudo-likelihood
is based on a normal density function, it can be shown that the objective function (18) in Arminger and Schoenberg (1989) is equivalent to FI(0) in (2.1) and
their covariance matrix (12) is equivalent to t21c after taking expectations. Arminger
and Schoenberg state that their sandwich-type covariance is computationally simple. Since their equation (25) includes several terms (involving second derivatives)
whose expectations are zero, implementation of their Eq. (12) actually needs more
computations than given for (2.5). Our empirical experience is that including terms
whose expectations are zero in calculating standard errors has a nonsignificant difference from those given by (2.5) even in small samples. Since, besides St, all the
quantities in (2.5) are computed automatically when using the ML method, ~)~c can
be easily incorporated into existing software. Hence, we will concentrate on study
(2.5) in the rest of this paper.
Regardless of the distribution conditions, the standard error of O, based on Qlc
is always asymptotically correct, so we will call it the "corrected" standard error.
However, not much is known about its finite sample performance in this context.
For p = 6 , Chou et al. (1991) gave an extensive study of different estimators of
standard errors, and the sandwich-type matched the empirical variability very well.
Similar results were obtained by Chou and Bentler (1995). For p = 9, Curran (1994)
concluded that the robust standard errors were negatively biased at his smallest sample size but were unbiased at his moderate and large sample sizes. It is not known
whether this estimator will perform well with larger models, e.g., for p = 15. In
line with Curran, we suspect that the standard errors based on the sandwich-type

I(2-H. Yuan, P.M. Bentler/ Computational Statistics & Data Analysis 26 (1997) 177-198

183

covariance matrix will still underestimate the empirical variability in small samples
and under various conditions of nonnormality.
Next, we propose a new correction to the estimator of the asymptotic covariance of the ADF estimator 0n. As noted above, when based on the ADF estimation
process, the asymptotic covariance of v~(0n - 00) is given by 02. If we knew the
population fourth-order moment matrix V, we could use V -j instead of S~-1 in ~2.
It is easy to understand that a more accurate estimator of V-~ would lead to a more
accurate estimator of f22, and consequently, to more accurate standard errors for 0n.
Let Z1,...,ZN be a sample from Nm(#, V) and Sz be the sample covariance matrix.
Then it is known that the unbiased estimator of V -1 is given by ( n - m - 1)/nSz l,
with n = N - 1. Obviously, V -1 is overestimated by Sz 1 itself. Even though the Y,.
are not normal vectors in the ADF method, it is hard to imagine that there is an
advantage to S{-1 over [ ( n - p * - 1 ) / n ] S ~ 1 in estimating V -l in f22. The estimation
of V -I by Sr-1 as suggested by Browne (1984) and Chamberlain (1982) was motivated by the consistency property. However, this property is maintained when using
[ ( n - p * - 1 )/n]S~ 1 to estimate V 1. Motivated by this approach, and by the discrepancy between the empirical standard errors and those based on ~2 (e.g., Henly,
1993; Chou and Bentler, 1995), we propose to estimate the asymptotic covariance
of v/-n(0n - 00 ) by
~)2c -

n

n-p*-I

~2.

(2.6)

Since the standard error based on ~2 is too optimistic for an ADF estimator 0n,
standard errors based o n ~2c should match the empirical ones more closely. In parallel
to the standard error of 0n, we call the standard error based on 02 and ~e'~2c the ADF
"predicted" standard error and the ADF "corrected" standard error, respectively. It
should be noted that, unlike the relation between f)l and f)lc, both ~'22 and f22c are
asymptotically correct for estimating the covariance of v ~ ( 0 n - 00). Any difference
will be seen on!y at finite sample sizes.
Beside the two major methods discussed above, several other methods exist that
can be used for both normal and nonnormal data. When enough instrumental variables
are available for a proper structural model, Bollen's (1996) two-stage least-squares
method can be used for estimating the parameters and their standard errors in the
structural model. When facing a data set that is significantly nonnormal, a bootstrap
method is always an alternative to ML and ADF (e.g., Bollen and Stine, 1990;
Boomsma, 1986; Chatterjee, 1984; Ichikawa and Konishi, 1995). Especially when
sample size is small, the distributions of MLE or ADF estimator may not be well
approximated by normal distributions as is assumed in asymptotic theory. In such a
case, a well-chosen bootstrap method might still be able to give reliable confidence
intervals for the parameter estimators, but this is not guaranteed as reviewed by
Yung and Bentler (1996). Further, since each bootstrap sample may contain many
repeated observations, convergence can be a practical problem facing the bootstrap
method. As commented by Ichikawa and Konishi (1995), "This implies that bootstrap
methods should be applied with care when the improper solutions are frequent for the
bootstrap samples". The problems of nonnormality of the MLE and ADF estimators

K.-H. Yuan, P.h/L Bentler/ Computational Statistics & Data Analysis 26 (1997) 17~198

184

and reliable implementations of bootstrap methods in this situation are important and
challenging problems for which further research is needed.

3. Empirical standard errors versus asymptotic standard errors

In this section, we compare the empirical standard errors obtained from Monte
Carlo sampling study to the corresponding asymptotic standard errors given above
for MLE and ADF. We also study the sandwich estimator for MLE and our corrected
standard errors for the ADF estimators.
The first model we used is a factor analysis model as in (1.1) with p = 15. The
population covariance is generated by
A=

(i0i)
2
0

,

q~=

/10

0.30
\0.40

1.0
0.50

04o,
0.501,
1.0 /

(3.1)

where 2' = (70, 0.70, 0.75, 0.80, 0.80) and 0 is a vector of 5 zeros. The 7j is a diagonal
matrix which makes all the diagonal elements of 22 equal to 1. In the estimation
process, we restrict the last factor loading corresponding to each factor at its true
value in order to fix the scale of the factors. This restriction is only for model
identification, and has no effect on the quality of the model fitting process nor on
the resulting estimated models. All the other nonzero parameters are set as unknown
free parameters. So q = 33, of which 12 are factor loading parameters.
Three conditions of distributions of variables are used. In the first condition, both f
and e are normal, so that the observed X is a normal vector. In the second condition,
we use condition 5 of Hu et al. (1992), in which f and e are independent normal
vectors multiplied by a common random variable R = v~/v/~52, which is independent
to both f and e. This makes f and e uncorrelated but not independent, and X
is symmetrically distributed. Since ER 2 = 1, and ER 4 = 3, the observed X has the
same covariance structure as that in condition 1, with Mardia's multivariate kurtosis
t / = {(X -/t)'22-1(X - # ) } 2 / p ( p + 1 ) = E R 4 = 3. In the third condition, f is normal
and e is lognormal and multiplied by R = x/3/v~52. The resulting X has the same
covariance structure as in conditions 1 and 2, but the variables are not symmetrically
distributed anymore. For each condition, repeated samples with sizes N = 150, 200,
300, 500, and 1000 are drawn. Both normal theory ML and ADF methods were used
in each sample to estimate the parameters and compute standard error estimates. For
each distribution condition and sample size, 500 replications were performed. The
empirical standard deviation based on 0n across the 500 replications is calculated.
The actual empirical variability of the estimates across the replications is compared
to the average values given by the relevant formulae (2.2), (2.4)-(2.6). The results
based on ML estimation for normal, elliptical, and lognormal error data are presented
in Tables 1-3, while the corresponding results based on ADF estimation are given
in Tables 4 - 6 . In each table, for each sample size, we give the Empirical (Emp),
Predicted (Pred), and Corrected (Corr) standard error estimates. Each entry in these
tables needs to be multiplied by 0.1 to yield the relevant estimate. The quality of

K_-H. Yuan, P.M. Bentler / Computational Statistics & Data Analysis 26 (1997) 177-198

185

Table 1
Normal data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)
N

Standard errors (x 10)

Emp
150 Pred
Corr

0.823
0.803
0.799

0.789
0.806
0.805

Emp
200 Pred
Corr

0.689
0.697
0.696

Emp
300 Pred
Corr
Emp
500 Pred
Con.
Emp
1000 Pred
Corr

0.802
0.793
0.789

0.838 0.838
0.791 0.805
0.793 0.800

0.838
0.805
0.799

0.787 0.806
0.794 0.793
0.791 0.785

0.789
0.802
0.796

0.818
0.807
0.796

0.706 0.711 0.721 0.697
0.700 0.690 0.689 0.695
0.701 0.685 0.689 0.689

0.699 0.676 0.662
0.695 0.686 0.682
0.691 0.681 0.677

0.670
0.695
0.688

0.675 0.661 0.629
0.698 0.690 0.682
0.691 0.686 0.679

0.554
0.567
0.567

0.536 0.558
0.570 0.563
0.571 0.562

0.579
0.560
0.560

0.583
0.568
0.566

0.565
0.569
0.568

0.561 0.571 0.571 0.519
0.567 0.568 0.560 0.556
0.563 0.563 0.559 0.552

0.436
0.439
0.439

0.436 0.428
0.441 0.435
0.440 0.434

0.439
0.433
0.434

0.432
0.440
0.440

0.431 0.417
0.441 0.435
0.441 0.435

0.415 0.456
0.433 0.440
0.431 0.438

0.457
0.440
0.437

0.453
0.434
0.434

0.308 0.317 0.296
0.311 0.311 0.308
0.311 0.311 0.307

0.303
0.306
0.306

0.308
0.310
0.310

0.308
0.310
0.310

0.301 0.318
0.304 0.310
0.304 0.310

0.327
0.310
0.309

0.321 0.306
0.306 0.304
0.306 0.303

0.565 0.560
0.561 0.558
0.559 0.555

0.308
0.306
0.306

0.777
0.799
0.793

0.770
0.790
0.783

0.427
0.430
0.430

Table 2
Elliptical data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)
N

Standard errors (x 10)

Emp
150 Pred
Corr

1.19
0.825
1.09

1.40
0.825
1.12

1.35
0.823
1.11

1.21
0.805
1.08

1.37
0.826
1.15

1.35
0.830
1.14

1.27
0.825
1.14

1.31
0.816
1.12

1.39
0.828
1.13

1.26
0.826
1.10

Emp 1.11
200 Pred 0.706
Corr 0.959

1.09
0.706
0.990

1.07
0.702
0.942

1.08
1.22
0.691 0.713
0.960 0.977

1.15
0.713
0.983

1.13
0.710
0.967

1.12
0.704
0.967

1.04
0.712
0.965

1.11
1.00
1.03
0.711 0.705 0.697
0.957 0.951 0.949

Emp
300 Pred
Corr

0.856
0.574
0.800

0.865
0.573
0.812

0.906
0.569
0.793

0.900 0.947 0.894
0.561 0.578 0.579
0.798 0.811 0.816

0.894
0.574
0.797

0.918
0.569
0.794

0.886
0.575
0.819

0.895
0.574
0.810

0.896
0.566
0.803

0.868
0.562
0.800

Emp 0.641
500 Pred 0.443
Con" 0.636

0.679
0.443
0.642

0.697
0.439
0.630

0.670
0.433
0.632

0.712
0.444
0.644

0.705
0.443
0.652

0.688
0.440
0.635

0.674
0.435
0.636

0.727
0.442
0.648

0.700
0.442
0.638

0.717
0.435
0.64l

0.694
0.432
0.628

0.514
0.312
0.470

0.468
0.309
0.462

0.486
0.306
0.462

0.503
0.313
0.470

0.491 0.527
0.312 0.309
0.469 0.464

0.461 0.535 0.529
0.307 0.311 0.312
0.459 0.478 0.477

0.530
0.308
0.475

0.539
0.305
0.470

Emp
1000 Pred
Corr

0.493
0.312
0.467

1.39
0.820
1.11

1.14
0.808
1.06

the f o r m u l a s is j u d g e d a g a i n s t the e m p i r i c a l s t a n d a r d error. T o m i n i m i z e u se o f
j o u r n a l space, w e o n l y r e p o r t the s t a n d a r d errors c o r r e s p o n d i n g to f a c t o r l o a d i n g s .
T h e s t a n d a r d errors c o r r e s p o n d i n g to f a c t o r v a r i a n c e s an d er r o r v a r i a n c e s f o l l o w a
s i m i l a r pattern.

186

K.-H. Yuan, P.M. Bentler/ Computational Statistics & Data Analysis 26 (1997) 177-198

Table 3
Lognormal error data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)
N

Standard errors (× 10)

Emp
150 Pred
Con.

1.26
0.779
0.994

1.21
0.774
0.988

1.19
0.766
0.955

1.17
0.778
0.988

1.32
0.796
1.04

1.22
0.788
1.02

1.28
0.779
1.00

1.40
0.793
1.03

1.16
0.792
0.989

1.30
0.795
1.01

1.17
0.775
0.961

Emp
200 Pred
Corr

1.04
0.666
0.855

0.983 0.996 0.957
0.661 0.661 0.652
0.854 0.851 0.818

1.04
0.675
0.887

1.20
0.687
0.932

1.09
0.683
0.903

1.14
0.672
0.892

1.12
0.685
0.898

1.04
1.14
0.688 0.683
0.881 0.893

1.05
0.672
0.857

Emp
300 Pred
Corr

0.891 0.864
0.553 0.547
0.741 0.745

0.967 0.908
0.551 0.543
0.758 0.716

0.819
0.548
0.726

0.894
0.557
0.755

0.849
0.550
0.735

0.839
0.543
0.726

0.834
0.558
0.734

0.778
0.559
0.729

0.895
0.547
0.732

0.771
0.542
0.701

0.726
0.430
0.603

0.670
0.433
0.596

0.688
0.436
0.604

0.668
0.432
0.589

0.648
0.425
0.589

0.628 0.636
0.431 0.429
0.572 0.575

0.656
0.420
0.570

0.565
0.416
0.549

0.464 0.457
0.306 0.303
0.431 0.422

0.502
0.299
0.425

0.471 0.489
0.307 0.304
0.429 0.445

0.484 0.470
0.301 0.296
0.435 0.428

Emp 0.666
500 Pred 0.432
Corr 0.594
Emp 0.542
1000 Pred 0.308
Con" 0.455

0.669
0.429
0.592

1.23
0.780
1.01

0.495 0.434
0.305 0.304
0.441 0.422

0.663
0.423
0.578

0.497 0.467
0.301 0.305
0.438 0.432

Table 4
Normal data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Con')
N

Standard errors (× 10)

Emp 1 . 7 0
150a Pred 0.522
Corr 1 . 1 9

1.80
0.525
1.19

1.66
0.516
1.17

1.67
0.516
1.17

1.73
0.516
1.17

1.80
0.517
1.17

1.66
0.504
1.15

1.63
0.494
1.12

Emp 1 . 1 0
200 b Pred 0.512
Con" 0.814

1.10
0.516
0.820

1.10
0.508
0.809

1.08
0.504
0.802

1.17 1 . 0 9
0.511 0.515
0.813 0.819

1.16
0.504
0.802

1.15 1 . 1 0
0.503 0.504
0.801 0.802

300

Emp 0.762
Pred 0.470
Con" 0.609

0.700
0.474
0.613

0.736
0.465
0.602

0.790
0.465
0.602

0.776
0.470
0.608

500

Emp 0.515
Pred 0.395
Corr 0.454

0.507
0.396
0.455

0.506
0.390
0.448

0.508
0.390
0.448

0.511 0.521
0.396 0.397
0.455 0.456

Emp 0.336
1000 Pred 0.296
Corr 0.315

0.338
0.296
0.315

0.328 0.318 0.328
0.292 0.291 0.295
0.311 0.310 0.314

aBased on 449 converged samples.
bBased on 493 converged samples.

0.760 0.760 0.766
0.472 0.464 0.459
0.611 0.601 0.594

0.334
0.295
0.315

2.02
0.540
1.23

2.16
2.20
0.541 0.538
1.23
1.22

2.20
0.540
1.23

1.09
0.507
0.806

1.11
0.499
0.793

0.737 0.745
0.464 0.463
0.601 0.599

1.09
0.504
0.802

0.722 0.668
0.461 0.457
0.597 0.592

0.487 0.488
0.391 0.387
0.450 0.444

0.512 0.525 0.506
0.393 0.391 0.389
0.451 0.449 0.447

0.330 0.326
0.291 0.289
0.311 0.308

0.339
0.295
0.314

0.352
0.293
0.313

0.493
0.386
0.443

0.341 0.333
0.291 0.288
0.310 0.307

K.-H. Yuan, P.M. Bentler/ Computational Statistics & Data Analysis 26 (1997) 177-198

187

Table 5
Elliptical data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)
N

Standard errors ( × 1O)

Emp 2.02
150a Pred 0.590
Corr 1.34

2.12
0.594
1.35

2.22
0.590
1.34

2.21
0.584
1.33

1.99
0.582
1.32

2.54
0.613
1.39

2.13
0.593
1.35

2.15
0.580
1.32

2.28
0.613
1.39

2.14
0.598
1.36

2.15
0.595
1.35

2.29
0.603
1.37

Emp 1.42
200 b Pred 0.607
Corr 0.966

1.51
0.614
0.977

1.45
0.603
0.960

1.42
0.599
0.954

1.62
0.616
0.980

1.48
0.606
0.964

1.53
0.618
0.983

1.48
0.602
0.957

1.75
0.620
0.986

1.51
0.613
0.976

1.41
0.609
0.969

1.46
0.603
0.959

300

Emp 0.896
Pred 0.566
Corr 0.732

0.893
0.568
0.736

0.894
0.555
0.719

0.822
0.547
0.708

0.922
0.567
0.734

0.933
0.567
0.734

0.961 0.962 0.907
0.566 0.557 0.563
0.733 0.721 0.729

0.895 0.912 0.826
0.561 0.551 0.544
0.726 0.714 0.704

500

Emp 0.645
Pred 0.490
Corr 0.563

0.654
0.487
0.559

0.615
0.480
0.552

0.607
0.472
0.543

0.624
0.486
0.558

0.665
0.485
0.557

0.630
0.482
0.553

0.610
0.475
0.546

0.622
0.485
0.557

0.630
0.482
0.554

0.607
0.476
0.546

0.599
0.470
0.540

0.421 0.430
0.373 0.367
0.397 0.392

0.416
0.365
0.389

0.417
0.375
0.400

0.422
0.374
0.399

0.426 0.415
0.371 0.368
0.396 0.393

0.409
0.372
0.397

0.415
0.373
0.398

0.430
0.368
0.393

0.420
0.363
0.387

Emp 0.436
1000 Pred 0.376
Corr 0.402

Based on 421 converged samples.
b Based on 481 converged samples.

Table 6
Lognormal error data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)
N

Standard errors ( × 1O)

Emp 1.19 1.15
150a Pred 0.370 0.369
Corr 0.841 0.839

1.32
0.369
0.840

1.25
0.356
0.809

1.28
0.377
0.856

1.40
0.384
0.873

1.22
0.369
0.839

1.22
0.367
0.834

1.66
0.396
0.900

1.53
0.396
0.900

1.58
0.395
0.898

1.46
0.390
0.888

Emp 0.999
200 b Pred 0.413
Con" 0.657

0.908
0.403
0.642

0.989
0.398
0.633

0.959
0.394
0.627

0.859
0.400
0.636

0.864 0.884
0.401 0.395
0.638 0.628

0.820
0.393
0.626

0.966
0.399
0.634

0.941 0.943
0.399 0.396
0.634 0.630

0.891
0.389
0.619

300

Emp
Pred
Corr

0.628
0.396
0.513

0.627
0.388
0.503

0.575
0.380
0.492

0.628
0.396
0.512

0.645 0.642
0.395 0.392
0.511 0.507

0.652
0.386
0.500

0.632
0.396
0.513

0.673
0.400
0.518

0.652
0.393
0.509

0.585
0.390
0.505

500

Emp 0.461
Pred 0.363
Corr 0.417

0.473 0.455
0.358 0.355
0.411 0.407

0.466
0.359
0.412

0.471 0.459
0.359 0.354
0.413 0.407

0.454
0.352
0.404

0.360
0.293
0.313

0.340
0.298
0.318

0.338
0.298
0.318

0.333
0.292
0.312

0.612
0.397
0.514

Emp 0.322
1000 Pred 0.302
Corr 0.322

0.485 0.467
0.361 0.358
0.414 0.411

0.442 0.478
0.349 0.363
0.401 0.416

0.340
0.299
0.319

0.330
0.293
0.313

0.347
0.296
0.315

a Based on 449 converged samples.
b Based on 490 converged samples.

0.485
0.360
0.414

0.321 0.335
0.297 0.297
0.317 0.317

0.333
0.293
0.313

0.347
0.295
0.315

188

K.-H. Yuan, P.M. Bentler/ Computational Statistics & Data Analysis 26 (1997) 17~198

When data are distributed normally, both the normal theory predicted standard error (based on the inverse of the information matrix) and the normal theory corrected
standard error match the empirical standard error very well for all the sample sizes
studied (Table 1 ). Even though we estimate the large matrix V by St in (2~c, this
does not decrease the accuracy of the estimated standard errors. In Tables 2 and 3,
when data are not normal and Anderson and Amemiya's condition for asymptotic
robustness is not met, there is an obvious pattern among the three kind of standard errors. For all the sample sizes studied, the empirical standard errors are always bigger
than the other two type of standard errors based on formulas, and the normal theory
corrected standard errors outperform the normal theory predicted standard errors in
describing the empirical variability of normal estimators. While there exists difference between normal and nonnormal data when using the corrected standard errors to
describe the empirical standard errors, when data are not normal, the corrected standard errors are insensitive to the symmetry of the data (comparing Tables 2 and 3)
and are a little bit off in estimating the empirical standard errors. Theoretically, the
empirical and the corrected standard errors will approach the population standard
error when sample size N goes to infinity. For the nonnormal data investigated here,
N = 1000 is not large enough. We did not include a larger N (e.g., N = 5000 in Hu
et al., 1992), because we have no doubt about the consistency of f2jc for f2~c, and for
practical data sets with p = 15, people seldom have sample sizes larger than 1000.
These results based on the MLE yield the following conclusion. The corrected
standard error works as well as the predicted standard error when data are normal. When data are not normally distributed, both the corrected standard error and
the predicted standard error underestimate the empirical one, but the corrected standard error is substantially better than the predicted standard error. This is perhaps
not surprising since the inverse of the normal theory information matrix yields the
correct standard errors asymptotically only when the normality assumption is true.
These results also suggest that until a better way to estimate the standard errors of
normal theory estimators can be found, we should use the standard errors based
o n ~'~IcResults for the ADF simulation compare the empirical variability with variability
given by ~2 and ~2c. Unlike the relation between ~1 and ~lc,~both O2 and f22~
are asymptotically correct for estimating the covariance of v/~(0n- 00). However,
since p * = 120 for the factor analysis model used in our simulation, the constants
n/(n-p*-1) that differentiate the two estimators approximately are 5, 2.5, 1.67,
1.32, 1.14 for sample size 150, 200, 300, 500, 1000. As in the normal theory method,
for each condition and sample size, 500 replications were performed, and 0n was obtained for each converged replication. Also obtained were the standard errors based
on f22 and O2~. The empirical standard errors based on all converged samples in each
case were calculated, as were the averaged ADF predicted and corrected standard errors. These results are given in Tables 4 - 6 , where only standard errors corresponding
to factor loadings are presented due to space limitations.
An immediate conclusion from Tables 4 - 6 is that the ADF predicted standard
errors, based on asymptotic theory, are much too small when compared with the

K.-H. Yuan, P.M. Bentlerl Computational Statistics & Data Analysis 26 (1997) 177-198

189

empirical standard errors when sample sizes are small to medium. Even though our
new corrected standard error still underestimates the empirical variability of 0n, it is
more accurate than the classical formula which can be quite misleading. As sample
sizes get larger, the ADF predicted standard errors match the empirical ones better,
and the effect of our correction becomes smaller. This indicates the consistency of
the three types of standard errors. Since for all sample sizes and all distribution
conditions studied here, the corrected standard errors are always better than the
predicted ones in estimating the empirical variability of 0n, we recommend using
the corrected standard errors in practice.
In order to gain some generality to the simulation, another study was done. This
study also used a factor model as in (1.1). We took p = 8, with the factor loading
matrix given by

A=

0.2
0.1

0.4
0

0.6
0

0.8
0

0
0.3

0
0.5

0
0.7

0 \
0.8 ")

(3.2)

The two factors are correlated with cov(fl, f2) -- 0.5 and var(fl ) = var(f2) = 1. The
error variance matrix 7j is diagonal so that the population covariance matrix 2; is
actually a correlation matrix. The three conditions for generating observed data as
used for model (3.1) are also used for model (3.2). For model identification, we fix
•~41 and 282 in the estimation process. With sample sizes 100, 150, 200, 300, 500,
10000 and 500 simulation replications, the standard errors are given, respectively,
in Tables 7-9 for the MLE and Tables 10-12 for the ADF estimators. Even though
the structure in (3.2) is different from that in (3.1) and the loadings are also more
dispersed, the standard errors in Tables 7-12 follow a similar pattern as those in
Tables 1-6. The only noticeable difference is that the corrected standard errors in
Tables 7-12 are not as dramatically improved over the uncorrected ones as in the
previous study, though there is still always an improvement. These results match our
anticipations. Since our proposal for the corrected standard errors is based on considering the statistical theories behind ML and ADF methods for all structural models
as discussed in Section 2, rather than just based on a set of limited simulations,
a similar pattern of standard errors should carry over to other model structures as
well.
When data are elliptically symmetric, Browne (1984) proved that the normal theory
estimator 0n and the ADF estimator 0, have the same asymptotic efficiency. We can
observe this fact by comparing the numbers in Tables 1 and 2 and 7 and 8 to those
in Tables 4 and 5 and 10 and 11, respectively. Even though when N gets larger,
the efficiency of 0n and 0n become more similar for normal and elliptical data, and
for small to medium N, 0, is much more efficient than 0n. This observation will be
further discussed in Section 4. When data are symmetric, Browne (1984) also gave
a formula for the asymptotic covariance ~2~ of the normal theory estimator 0n based
on an estimator of Mardia's multivariate kurtosis. We did not compute the standard
error computed from ~ here, since real data sets are seldom symmetric.

190

K.-H. Yuan, P.M. Bentler I Computational Statistics & Data Analysis 26 (1997) 177-198

Table 7
Normal data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)
N

Standard errors ( x 10)

100

Emp
Pred
Corr

1.59
1.55
1.53

1.41
1.33
1.30

1.78
1.62
1.58

1.53
1.38
1.35

1.10
1.11
1.08

1.18
1.13
1.12

1.32
1.23
1.21

150

Emp
Pred
Corr

1.22
1.25
1.24

1.15
1.08
1.07

1.45
1.30
1.28

1.16
1.12
1.10

0.919
0.901
0.899

0.948
0.919
0.907

1.07
1.00
0.989

200

Emp
Pred
Corr

1.09
1.07
1.07

0.930
0.934
0.922

1.17
1.12
1.10

1.03
0.959
0.952

0.767
0.781
0.774

0.804
0.797
0.788

0.825
0.871
0.866

300

Emp
Pred
Corr

0.844
0.869
0.867

0.773
0.757
0.750

0.919
0.902
0.893

0.830
0.784
0.783

0.631
0.637
0.634

0.657
0.652
0.648

0.709
0.710
0.705

500

Emp
Pred
Con"

0.678
0.671
0.670

0.572
0.585
0.582

0.697
0.697
0.692

0.671
0.604
0.603

0.490
0.491
0.489

0.506
0.504
0.502

0.518
0.547
0.544

1000

Emp
Pred
Corr

0.470
0.471
0.471

0.422
0.413
0.412

0.501
0.492
0.490

0.463
0.423
0.423

0.345
0.345
0.345

0.365
0.355
0.354

0.364
0.384
0.383

Table 8
Elliptical data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)
N

Standard errors ( x 10)

100

Emp
Pred
Corr

2.62
1.68
2.10

2.49
1.39
1.82

3.01
1.94
2.58

2.09
1.43
1.72

1.71
1.11
1.43

1.82
1.13
1.45

2.25
1.26
1.63

150

Emp
Pred
Corr

1.91
1.31
1.70

1.88
1.12
1.51

2.28
1.37
1.82

1.66
1.14
1.44

1.42
0.901
1.21

1.47
0.918
1.21

1.71
1.01
1.33

200

Emp
Pred
Corr

1.70
1.12
1.47

1.53
0.958
1.31

1.98
1.17
1.60

1.46
0.979
1.28

1.22
0.781
1.07

1.23
0.798
1.08

1.38
0.884
1.20

300

Emp
Pred
Corr

1.30
0.899
1.23

1.22
0.771
1.09

1.49
0.920
1.28

1.22
0.804
1.09

1.00
0.641
0.906

1.03
0.654
0.906

1.14
0.718
1.01

500

Emp
Pred
Corr

1.08
0.681
0.971

0.938
0.593
0.852

1.16
0.709
1.03

0.979
0.608
0.864

0.728
0.492
0.707

0.771
0.506
0.712

0.841
0.551
0.789

1000

Emp
Pred
Corr

0.782
0.474
0.718

0.725
0.417
0.632

0.888
0.497
0.758

0.699
0.424
0.642

0.549
0.346
0.522

0.606
0.356
0.538

0.615
0.386
0.593

K.-H. Yuan, P.M. Bentler/Computational Statistics & Data Analysis 26 (1997) 17~198

191

Table 9
Lognormal error data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)
N

Standard errors (x 10)

100

Emp
Pred
Corr

2.05
1.49
1.68

1.72
1.19
1.39

2.15
1.49
1.74

1.77
1.30
1.47

1.55
1.03
1.23

1.86
1.04
1.25

1.75
1.09
1.31

150

Emp
Pred
Corr

1.96
1.26
1.61

1.39
0.990
1.19

2.39
1.32
1.66

1.52
1.10
1.35

1.33
0.864
1.09

1.64
0.864
1.12

1.64
0.942
1.24

200

Emp
Pred
Corr

1.84
1.01
1.27

1.28
0.866
1.05

1.67
1.02
1.29

1.53
0.929
1.15

1.31
0.757
0.971

1.73
0.751
0.981

1.34
0.820
1.07

300

Emp
Pred
Corr

1.53
0.854
1.11

1.07
0.720
0.908

1.47
0.864
1.11

1.28
0.764
0.991

0.942
0.622
0.816

1.22
0.619
0.827

1.05
0.675
0.901

500

Emp
Pred
Corr

1.12
0.648
0.865

0.920
0.563
0.748

0.984
0.661
0.864

0.996
0.584
0.786

0.652
0.480
0.629

0.769
0.485
0.642

0.762
0.524
0.708

1000

Emp
Pred
Corr

0.724
0.457
0.636

0.632
0.405
0.564

0.743
0.475
0.652

0.665
0.415
0.583

0.528
0.340
0.473

0.593
0.347
0.494

0.633
0.376
0.545

Table 10
Normal data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)
N

Standard errors (x 10)

100

Emp
Pred
Corr

2.06
1.41
1.78

1.84
1.15
1.45

2.30
1.38
1.75

1.81
1.29
1.63

1.36
0.961
1.21

1.82
1.02
1.29

1.85
1.09
1.39

150

Emp
Pred
Corr

1.45
1.15
1.33

1.23
0.971
1.12

1.62
1.17
1.35

1.28
1.02
1.18

1.09
0.804
0.928

1.07
0.822
0.948

1.21
0.896
1.03

200

Emp
Pred
Corr

1.25
1.01
1.12

1.09
0.859
0.952

1.25
1.02
1.13

1.07
0.907
1.00

0.823
0.715
0.793

0.889
0.729
0.808

0.995
0.795
0.882

300

Emp
Pred
Corr

0.924
0.826
0.882

0.789
0.713
0.762

0.970
0.853
0.911

0.833
0.742
0.793

0.667
0.601
0.642

0.674
0.613
0.654

0.727
0.667
0.712

500

Emp
Pred
Corr

0.699
0.651
0.677

0.582
0.564
0.586

0.712
0.673
0.699

0.625
0.584
0.607

0.525
0.473
0.492

0.505
0.484
0.504

0.574
0.527
0.548

1000

Emp
Pred
Corr

0.472
0.463
0.472

0.385
0.405
0.413

0.470
0.484
0.493

0.438
0.417
0.425

0.350
0.341
0.348

0.357
0.349
0.355

0.378
0.381
0.388

192

K.-H. Yuan, P.M. Bentler/ Computational Statistics & Data Analysis 26 (1997) 177-198

Table 11
Elliptical data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)
N

Standard errors ( x 10)

100

Emp
Pred
Corr

3.10
1.91
2.42

3.15
1.46
1.84

3.82
1.85
2.34

2.43
1.67
2.11

1.89
1.12
1.41

2.22
1.16
1.47

2.78
1.29
1.63

150

Emp
Pred
Corr

1.90
1.50
1.73

1.71
1.19
1.37

2.11
1.44
1.66

1.71
1.30
1.50

1.35
0.960
1.11

1.37
0.982
1.13

1.63
1.07
1.23

200

Emp
Pred
Corr

1.68
1.28
1.42

1.50
1.06
1.17

1.72
1.26
1.39

1.41
1.15
1.28

1.17
0.869
0.963

1.14
0.898
0.996

1.34
0.976
1.08

300

Emp
Pred
Corr

1.16
1.06
1.13

1.07
0.886
0.947

1.27
1.07
1.14

1.03
0.951
1.02

0.874
0.753
0.804

0.852
0.769
0.821

1.00
0.826
0.882

500

Emp
Pred
Corr

0.952
0.856
0.889

0.815
0.725
0.753

0.991
0.867
0.902

0.831
0.764
0.794

0.660
0.613
0.637

0.660
0.628
0.653

0.763
0.679
0.706

1000

Emp
Pred
Corr

0.672
0.622
0.634

0.584
0.542
0.552

0.681
0.651
0.664

0.631
0.560
0.571

0.471
0.458
0.467

0.497
0.472
0.481

0.535
0.512
0.521

Table 12
Lognormal error data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected

(Corr)
N

Standard errors (× 10)

100

Emp
Pred
Corr

1.70
1.12
1.42

1.56
0.903
1.14

1.84
1.07
1.35

1.59
1.04
1.32

1.18
0.752
0.951

1.35
0.773
0.976

1.33
0.806
1.02

150

Emp
Pred
Corr

1.20
0.954
1.10

1.14
0.824
0.950

1.46
0.989
1.14

1.14
0.882
1.02

0.971
0.682
0.786

0.915
0.686
0.791

1.02
0.721
0.832

200

Emp
Pred
Corr

1.12
0.893
0.990

1.04
0.760
0.843

1.20
0.916
1.01

1.01
0.822
0.911

0.762
0.624
0.691

0.783
0.648
0.718

0.932
0.694
0.769

300

Emp
Pred
Corr

0.854
0.760
0.812

0.829
0.668
0.713

0.941
0.794
0.848

0.818
0.700
0.748

0.630
0.566
0.605

0.659
0.576
0.615

0.701
0.600
0.641

500

Emp
Pred
Corr

0.704
0.654
0.680

0.626
0.564
0.587

0.728
0.669
0.695

0.656
0.596
0.619

0.553
0.481
0.499

0.554
0.487
0.506

0.586
0.513
0.533

1000

Emp
Pred
Corr

0.532
0.508
0.517

0.456
0.447
0.456

0.546
0.532
0.543

0.506
0.464
0.473

0.392
0.379
0.386

0.400
0.382
0.390

0.442
0.406
0.413

K.-H. Yuan, P.M. Bentler/ Computational Statistics & Data Analysis 26 (1997) 177-198

193

4. Finite sample bias and accuracy
Since both ML and ADF methods can be used for a given data set, an interesting
question is which method should be recommended in practice. Judging by asymptotic
efficiency, the ADF method is as good as the ML method when data are symmetric,
and is generally better than ML for arbitrarily distributed data sets. The finite sample
efficiency implied in Section 3, however, tells a somewhat different story. Thus, we
will compare the advantages of each method based on our empirical results. Even
though 0,, is consistent for 00, Browne (1984) reported a finite sample bias in 0n.
There may also exist finite sample bias in 0n as well. In a small model, Henly
(1993) found that a minimum sample size of 600 was needed before the ML and
ADF methods yielded reliable parameter estimates, though this minimum could go
to 1200 for nonnormal data and even higher for the ADF method.
We shall report the bias as well as the MSE associated with the empirical studies
in the last section. They are defined by
Bias = (0 - 0 0 ) ' ( 0 - 00),

(4.1)

MSE = - -

(4.2)

and

nc

( 0 . ) - 0 o ) ' ( 0 . ) - 0o),
i=1

respectively, where nc is the number of converged samples among the 500 replications; 0~i) is the ith of the nc converged solutions; 0 is the sample mean of the
^
estimated 0,). Each of the nc for normal theory estimator is 500, the n'cS for ADF estimators in the factor model (3.1) are indicated in Tables 4-6. The relative bias and
MSE can be obtained if (4.1) and (4.2) are divided by 0~00. We did not compute
these values because the interest here is restricted to compare the relative advantage of ML and ADF methods.
As for the standard errors, the bias and MSE corresponding to model (3.1) and
(3.2) convey basically the same information, so we only report those corresponding to
model (3.1) here. With respect to bias, there exist differences between the estimators
for all the 33 parameters and those for only the 12 factor loadings. Hence, we present
these results separately in Tables 13 and 14, respectively. For estimators of the
33 parameters, MLE is much less biased than ADF estimators for all distributions and
sample sizes. This confirms Chou and Bentler (1995). With respect to the estimators
of the 12 factor loadings only, the normal theory estimator has an advantage only
when the sample size is small. This is especially true for asymmetric data. Even
when data are normal, the ADF estimator can compete with the MLE when the
sample size is very large. The comparison of Tables 13 and 14 indicates that the
ADF