British Journal of Mathematical and Statistical Psychology (2001), 54, 161–175
© 2001 The British Psychological Society
Printed in Great Britain
Effect of outliers on estimators and tests in
covariance structure analysis
Ke-Hai Yuan*
Department of Psychology, University of North Texas, USA
Peter M. Bentler
Departments of Psychology and Statistics, University of California, Los Angeles, USA
A small proportion of outliers can distort the results based on classical procedures
in covariance structure analysis. We look at the quantitative effect of outliers on
estimators and test statistics based on normal theory maximum likelihood and the
asymptotically distribution-free procedures. Even if a proposed structure is correct
for the majority of the data in a sample, a small proportion of outliers leads to biased
estimators and significant test statistics. An especially unfortunate consequence is
that the power to reject a model can be made arbitrarily, but misleadingly, large
by inclusion of outliers in an analysis.
1. Introduction
Covariance structure analysis (CSA) plays an important role in understanding the relationships
among multivariate data (e.g. Bollen, 1989; Mueller, 1996). Let x1, . . . , xN be a sample of
dimension p with E(xi) = μ, Cov(xi) = Σ, and let x̄ and S be the sample mean vector and
sample covariance matrix. In CSA, S is fitted by a structural model Σ = Σ(θ) which
hypothesizes the elements of the covariance matrix Σ to be functions of a more basic set
of parameters θ. The best-known example is the confirmatory factor analysis model
Σ = ΛΦΛ′ + Ψ, where the unknown elements of the factor loading matrix Λ, the factor
covariance matrix Φ and the unique variance matrix Ψ are the elements of θ. In practice,
covariance structure models Σ = Σ(θ) often are rejected empirically. There are obviously
many possible explanations for such empirical failures, all of which in some way must
involve violation of regularity conditions associated with the theory on which the statistical
tests are based. Of course, any particular model Σ(θ) may be inadequate; for example, it may
omit crucial parameters or reflect an entirely incorrect functional form. Earlier technical
development in this respect was given by Satorra & Saris (1985) and Steiger, Shapiro &
Browne (1985), who proposed using the non-central chi-square to describe the distribution of
the test statistics. Recent discussions of structural modelling imply that such theoretical
* Requests for reprints should be addressed to Dr Ke-Hai Yuan, Department of Psychology, University of North
Texas, PO Box 311280, Denton, TX 76203-1280, USA.
inadequacy is almost inevitable in the social and behavioural sciences (de Leeuw, 1988;
Browne & Cudeck, 1993).
In applications of structural equation models, the perspective that any model is only an
approximation to reality dominates the literature. When a model does not fit a data set, as
implied by a significant chi-square test statistic, many empirical researchers start using
various fit indices to justify their models (see, for example, Hu & Bentler, 1998). Some of
them may go one step further and modify their models as facilitated by the modification index
in LISREL (Jöreskog & Sörbom, 1993) or the Lagrange multiplier test in EQS (Bentler,
1995). This step was classified as a model generating situation in Jöreskog (1993). However,
few or none of them question the quality of their data. This phenomenon was noted by
Breckler (1990). Jöreskog (1993) discussed various possible causes when a model does not
match a data set in practice and emphasized the importance of proper data for proper
statistical results. Even though there exist various approaches for outlier detection (Cook,
1986; Berkane & Bentler, 1988; Bollen, 1989; Bollen & Arminger, 1991; Tanaka, Watadani
& Moon, 1991; Barnett & Lewis, 1994; Bentler, 1995; Cadigan, 1995; Fung & Kwan, 1995;
Lee & Wang, 1996; Gnanadesikan, 1997; Kwan & Fung, 1998) as well as robust procedures
(Huber, 1981; Hampel, Ronchetti, Rousseeuw & Stahel, 1986; Rousseeuw & van Zomeren,
1990; Wilcox, 1997; Yuan & Bentler, 1998a, 1998b; Yuan, Chan & Bentler, 2000), in
applications of CSA these techniques are seldom used, as exemplified in numerous
publications using CSA in the applied journals in the social and behavioural sciences (see, for
example, MacCallum & Austin, 2000). This lack of attention to data quality probably occurs
because there is no analytical development that directly connects the significance of a chi-square
test statistic to bad data. Such a development is one of the major foci of the current
paper. Specifically, we will show that even if a proposed structure is correct for the majority
of the data in a sample, a small proportion of outliers can lead to biased estimators and
highly significant test statistics.
Several approaches to CSA exist (Bollen, 1989). The two most widely used are the normal
theory based maximum likelihood (ML) procedure and the asymptotically distribution-free
(ADF) procedure (Browne, 1982, 1984). Of these two, the ML method is more popular
among empirical researchers (Breckler, 1990). One reason for this is that the ML procedure is
the default option in almost all the standard software (e.g., LISREL, EQS, AMOS). Another
reason is that there exist various results about the robustness of the normal theory based
ML procedure when applied to non-normal data (Browne, 1987; Shapiro, 1987; Anderson &
Amemiya, 1988; Browne & Shapiro, 1988; Amemiya & Anderson, 1990; Satorra & Bentler,
1990, 1991; Mooijaart & Bentler, 1991; Satorra, 1992; Yuan & Bentler, 1999). The basic
conclusion of this literature is that, under some special conditions, the parameter estimates
based on ML are consistent, some standard errors remain consistent, and the likelihood
ratio statistic can still follow a chi-square distribution asymptotically even when the observed
data are non-normally distributed. The conditions under which normal theory methods can be
applied to non-normal data are commonly referred to as asymptotic robustness conditions.
Perhaps spurred on by a false sense of generalizability of the ML method, many researchers
seldom bother to check the quality of their data. Once again, this phenomenon is probably
due to the fact that there are no analytical results pointing out that asymptotic robustness is
not enough when a data set contains outliers. As our results show, even if the observations in
a sample come from normal distributions, the normal theory based ML still cannot verify the
right covariance structure if there are a few outliers. Of course, the implication of the ADF
procedure is that whatever the distribution of a data set may be, this procedure will give a
fair model evaluation when sample size is large enough. Unfortunately, the ADF method also
can be distorted by a few outliers.
In the next section we will give the analytical results for the effect of outliers on the two
commonly used procedures. In Section 3 we will use some real data sets as well as simulation
to illustrate the effect of outliers in practice. Relatively technical proofs will be given in an
appendix.
2. Effect of outliers on model evaluations
We will quantify the effect of outliers on the normal theory based ML procedure and on the
ADF procedure. The following preliminaries will set up the framework for our study.
Let x1, . . . , xN0 come from a distribution G0, and xN0+1, . . . , xN come from a distribution G1,
where N1 = N − N0 and N0 ≫ N1. Let the mean vectors and covariance matrices of G0 and G1 be
μ0 and μ1, and Σ0 and Σ1, respectively. We will consider the structure Σ(θ) such that
Σ0 = Σ(θ0) for some unknown θ0, but Σ1 will not equal Σ(θ) for any θ. This means that we
have a correct covariance structure if there are no data from G1. Consequently, we will
consider the N1 cases as outliers in the CSA context. The above formulation is very general
because we do not put any specific distributional form on G1. For example, the commonly
used slippage model (see Ferguson, 1961) for outlier identification purposes is only a special
case of the above model. When a variety of data contaminations exist in a sample, G1 can be
regarded as a pooled distribution based on many sources of influence (Beckman & Cook,
1983). We have no specific interest in identifying cases from G1. The approach of influence
analysis, as illustrated in Lee & Wang (1996) and Kwan & Fung (1998), should identify
observations from G1; otherwise, a kind of Type I error is made in identifying false
outliers.
In the above formulation, we assume that N1 is much smaller than N0, and that the memberships
of the observations are unknown. We may have knowledge about the impurity of the data,
but we do not need to treat G1 as another group. In practice, with observations from multiple
groups, the multiple-group approach (Sörbom, 1974) can be used to analyse the data when
group membership is known, and a mixture distribution approach to structural modelling can
be used when the group membership is unknown (Yung, 1997). Here, we have no special
interest in the structure of Σ1. The formulation is simply to study the quantitative effect of the
N1 outliers on parameter estimators and test statistics. Since the finite-sample properties of
the estimators and test statistics are hard to quantify, we will concentrate on the large-sample
properties.
We assume that n0/n → π0 and n1/n → π1 as n grows large, where n = N − 1, n0 = N0 − 1
and n1 = N1 − 1. Let x̄ and S be the sample mean vector and the sample covariance matrix
based on the whole sample. It is easily seen that

    x̄ →(a.s.) μ*   and   S →(a.s.) Σ*,                                    (1)

where μ* = π0 μ0 + π1 μ1 and

    Σ* = π0 Σ0 + π1 Σ1 + π0 π1 (μ0 − μ1)(μ0 − μ1)′.                        (2)
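Equation (2) can be verified numerically against the moment identity Cov(x) = E[xx′] − E[x]E[x]′ for a two-component mixture. The following Python sketch does so; all numerical values (mixing proportions, means, covariances) are hypothetical illustrations, not quantities from the paper:

```python
import numpy as np

# Hypothetical two-group population (illustrative values only)
p0, p1 = 0.95, 0.05                       # mixing proportions pi0, pi1
mu0 = np.array([0.0, 0.0])
mu1 = np.array([3.0, -1.0])
Sigma0 = np.array([[1.0, 0.3],
                   [0.3, 1.0]])
Sigma1 = np.array([[2.0, 0.0],
                   [0.0, 2.0]])

# Equation (2): pooled covariance of the contaminated population
d = mu0 - mu1
Sigma_star = p0 * Sigma0 + p1 * Sigma1 + p0 * p1 * np.outer(d, d)

# Independent check via Cov(x) = E[xx'] - E[x]E[x]'
Exx = p0 * (Sigma0 + np.outer(mu0, mu0)) + p1 * (Sigma1 + np.outer(mu1, mu1))
mu_star = p0 * mu0 + p1 * mu1             # pooled mean, as in (1)
assert np.allclose(Sigma_star, Exx - np.outer(mu_star, mu_star))
```

The between-means term π0 π1 (μ0 − μ1)(μ0 − μ1)′ is exactly what a second-moment expansion of the mixture produces, so the assertion holds identically, not just approximately.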
If we assume μ0 = μ1, then Σ* = π0 Σ0 + π1 Σ1, but it may be unrealistic to assume Σ0 ≠ Σ1
while μ0 = μ1. If Σ1 + π0 (μ0 − μ1)(μ0 − μ1)′ = Σ0, then the effect of the N1 outliers will
not be observed. If Σ* = Σ(θ*) for some θ*, we can still recover the covariance structure, but
the parameter is shifted with a bias of θ* − θ0. However, because we usually do not observe
θ0, and chi-square statistics will not discriminate θ* from θ0, the effect of the N1 outliers
will not be observed either. In the following, we will deal with the general case that
Σ0 = Σ(θ0) but Σ1 ≠ Σ(θ) for any θ. For the purpose of distribution characterization, we
will assume that the proportion of outliers is small, so that estimates are still near θ0. Let
vech(·) be the operator which transforms a symmetric matrix into a vector by stacking
the columns of the matrix, leaving out the elements above the diagonal. Let Γ =
Cov[vech{(x − μ0)(x − μ0)′}], where x ~ G0. For the distribution of s = vech(S), writing
σ0 = vech(Σ0), we have the following lemma.
Lemma 2.1. If the first four moments of both G0 and G1 exist and are finite, then we
have:

(a) √n(s − σ0) →(L) Np*(0, Γ) when n1/√n → 0;

(b) √n(s − σ0) →(L) Np*(cγ, Γ) when n1/√n → c, where γ = vech[Σ1 + (μ0 − μ1)(μ0 − μ1)′ − Σ0];

(c) √n(s − σ0) →(P) ∞ when γ ≠ 0 and n1/√n → ∞ while n1/n → 0.
p
Lemma 2.1 tells us that the distribution of n(s 2 s0 ) varies aspthe
proportion of outn and n, the quantity
liers
changes.
When
the
number
of
outliers
is
somewhere
between
p
n(s 2 s0 ) does not converge at all and hence standard statistical theory of CSA cannot be
relied upon. For our purpose of quantifying the distribution
of parameter estimates in CSA,
p
we will be mainly interested in case (b), i.e., n1 / n N c.
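The shift vector γ in Lemma 2.1(b) is easy to compute directly once vech is available. A small Python sketch, with hypothetical population quantities of our own choosing (the paper specifies none at this point):

```python
import numpy as np

def vech(M):
    """Stack the columns of a symmetric matrix, dropping elements above the diagonal."""
    p = M.shape[0]
    return np.concatenate([M[j:, j] for j in range(p)])

# Hypothetical population quantities (illustrative only)
mu0 = np.array([0.0, 0.0, 0.0])
mu1 = np.array([1.0, 1.0, 0.0])
Sigma0 = np.eye(3)
Sigma1 = 2.0 * np.eye(3)

# gamma = vech[Sigma1 + (mu0 - mu1)(mu0 - mu1)' - Sigma0], of length p* = p(p+1)/2
d = mu0 - mu1
gamma = vech(Sigma1 + np.outer(d, d) - Sigma0)
assert len(gamma) == 6

# Under case (b), sqrt(n)(s - sigma0) is asymptotically centred at c * gamma
assert np.allclose(gamma, [2.0, 1.0, 0.0, 2.0, 0.0, 1.0])
```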
In the rest of this paper, a dot on top of a function will denote its derivative (e.g.,
σ̇(θ) = ∂σ(θ)/∂θ′). When a function is evaluated at θ0, we sometimes omit its argument
(e.g., σ̇ = σ̇(θ0)). We also need the following standard conditions for the results to be
rigorous:

(C1) θ0 is an interior point of some compact set Θ ⊂ R^q.
(C2) Σ0 is positive definite and Σ(θ) = Σ(θ0) only when θ = θ0.
(C3) Σ(θ) is twice continuously differentiable.
(C4) σ̇(θ) is of full rank.
(C5) Γ is positive definite.
Under the above conditions we will first concentrate on the ML procedure and then turn to
the ADF procedure.
2.1. Maximum likelihood procedure
Let S be the sample covariance based on a p-variate sample with size N 5 n 1 1. The
approach based on the normal theory ML procedure is to minimize
FML (S, S(u)) 5 tr[SS21 (u)] 2 log |SS21 (u)| 2 p
(3)
à Let Dp be the duplication matrix de®ned in Magnus & Neudecker (1988, p. 49) and
for u.
W(u) 5 221 D9p [S21 (u) Ä S21 (u)]Dp .
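The discrepancy function (3) and the weight matrix W(θ) translate into code directly. In the Python sketch below, the duplication matrix is built from scratch so the example is self-contained; the test matrix S is a hypothetical value of ours, not data from the paper:

```python
import numpy as np

def duplication_matrix(p):
    """D_p such that vec(M) = D_p vech(M) for a symmetric p x p matrix M."""
    ps = p * (p + 1) // 2
    D = np.zeros((p * p, ps))
    k = 0
    for j in range(p):
        for i in range(j, p):
            D[j * p + i, k] = 1.0   # position of entry (i, j) in column-major vec
            D[i * p + j, k] = 1.0   # symmetric counterpart (j, i); same cell if i == j
            k += 1
    return D

def f_ml(S, Sigma):
    """Normal theory ML discrepancy, equation (3)."""
    p = S.shape[0]
    A = S @ np.linalg.inv(Sigma)
    return np.trace(A) - np.log(np.linalg.det(A)) - p

def weight_W(Sigma):
    """W(theta) = (1/2) D_p' [Sigma^{-1} kron Sigma^{-1}] D_p."""
    p = Sigma.shape[0]
    Dp = duplication_matrix(p)
    Si = np.linalg.inv(Sigma)
    return 0.5 * Dp.T @ np.kron(Si, Si) @ Dp

S = np.array([[2.0, 0.5],
              [0.5, 1.0]])
assert abs(f_ml(S, S)) < 1e-10   # the discrepancy vanishes when Sigma(theta) = S
W = weight_W(S)                  # p* x p* = 3 x 3 and symmetric
assert np.allclose(W, W.T)
```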
Under the assumption that N1 = 0, G0 = Np(μ0, Σ0), and the null hypothesis of a correct
model structure Σ0 = Σ(θ0),

    √n(θ̂ − θ0) →(L) Nq(0, Ω)                                               (4a)

and

    T_ML = nF_ML(S, Σ(θ̂)) →(L) χ²_{p*−q},                                  (4b)

where Ω = (σ̇′Wσ̇)⁻¹, p* = p(p + 1)/2 and q is the number of unknown parameters in θ.
The assumption G0 = Np(μ0, Σ0) can be relaxed to some degree and the results in (4) still
hold. Readers who have interests in this direction are referred to Yuan & Bentler (1999) for a
very general characterization.
Still assuming G0 = Np(μ0, Σ0) and Σ0 = Σ(θ0) for some unknown θ0, our interest is in
the properties of θ̂ and T_ML when N1 does not equal zero. The following lemma is on the
consistency of θ̂ when the proportion of contaminated data is small.

Lemma 2.2. Under conditions (C1) and (C2), and if n1/n → 0, then θ̂ →(a.s.) θ0.

When the number of outliers is comparable with the sample size, the estimator θ̂ will no
longer converge to θ0; instead, θ̂ →(a.s.) θ*, which minimizes F_ML(Σ*, Σ(θ)); we will not deal
with this situation here. Since θ0 is an interior point of Θ, when n is large enough θ̂ will
satisfy

    σ̇′(θ̂)W(θ̂)(s − σ(θ̂)) = 0.                                              (5)

For this θ̂, we have the following result.
Theorem 2.1. Under conditions (C3), (C4) and (C5), if θ̂ →(P) θ0, then:

(a) √n(θ̂ − θ0) →(L) Nq(0, Ω) when n1/√n → 0;

(b) √n(θ̂ − θ0) →(L) Nq(ξ, Ω) when n1/√n → c, where

    ξ = c(σ̇′Wσ̇)⁻¹σ̇′Wγ;

(c) √n(θ̂ − θ0) →(P) ∞ when n1/√n → ∞ while n1/n → 0.
Comparing with Lemma 2.1, Theorem 2.1 tells us that the asymptotic distribution of
√n(θ̂ − θ0) is decided by the proportion of outliers in the data in a similar way to that
of √n(s − σ0). The bias in the asymptotic distribution of √n(θ̂ − θ0) will increase as the
proportion of outliers increases. The non-centrality parameter (NCP) in the chi-square
distribution which characterizes the test statistic T_ML, as described in the following theorem,
will increase in a similar way.
Theorem 2.2. Under conditions (C3), (C4) and (C5), if θ̂ →(P) θ0, then:

(a) T_ML →(L) χ²_{p*−q} when n1/√n → 0;

(b) T_ML →(L) χ²_{p*−q}(τ) when n1/√n → c, where

    τ = c²γ′[W − Wσ̇(σ̇′Wσ̇)⁻¹σ̇′W]γ.                                         (6)
The NCP in (6) can be compared with the NCPs discussed in Satorra (1989) and Satorra,
Saris & de Pijper (1991). Actually, τ is just the W3 in Satorra et al. (1991) when N1 = N,
c² = n and γ = O(1/√n). Notice that the assumption of G0 = N(μ0, Σ0) is used to obtain
simple expressions in Theorems 2.1 and 2.2. When G0 is not normal, the bias in θ̂ as an
estimator of θ0 still exists. Similarly, even though T_ML cannot then be described by a chi-square
distribution, when N1 > 0 it will be stochastically larger than when N1 = 0.
2.2. The asymptotically distribution-free procedure
Since data sets in the social and behavioural sciences may not follow multivariate normal
distributions, Browne (1982, 1984) proposed the ADF procedure. Let yi = vech[(xi − x̄)
(xi − x̄)′] and Sy be the sample covariance matrix of the yi. The ADF procedure is to minimize

    F_ADF(S, Σ(θ)) = (s − σ(θ))′Sy⁻¹(s − σ(θ))                              (7)

for θ̃. We do not need to assume a particular distribution to apply this procedure. Under the
conditions Σ0 = Σ(θ0) and N1 = 0,

    √n(θ̃ − θ0) →(L) N(0, Ω),                                               (8a)

where Ω = (σ̇′Γ⁻¹σ̇)⁻¹, and

    T_ADF = nF_ADF(S, Σ(θ̃)) →(L) χ²_{p*−q}.                                (8b)
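The ADF discrepancy (7) is equally direct to transcribe. The Python sketch below builds the yi = vech[(xi − x̄)(xi − x̄)′], their sample covariance Sy, and evaluates F_ADF; the data are randomly generated for illustration only:

```python
import numpy as np

def vech(M):
    """Stack the columns of a symmetric matrix, dropping elements above the diagonal."""
    p = M.shape[0]
    return np.concatenate([M[j:, j] for j in range(p)])

def f_adf(X, sigma_theta):
    """ADF discrepancy (7) for an N x p data matrix X at the model vector sigma(theta)."""
    xbar = X.mean(axis=0)
    Xc = X - xbar
    Y = np.array([vech(np.outer(row, row)) for row in Xc])  # y_i, one row per case
    Sy = np.cov(Y, rowvar=False)                            # sample covariance of the y_i
    s = vech(np.cov(X, rowvar=False))                       # s = vech(S)
    d = s - sigma_theta
    return d @ np.linalg.solve(Sy, d)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
S = np.cov(X, rowvar=False)
assert abs(f_adf(X, vech(S))) < 1e-12   # zero at the saturated value sigma(theta) = s
```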
Like the normal theory based ML procedure, the ADF procedure is also susceptible to
poor data quality. Now, assuming that Σ0 = Σ(θ0) but N1 does not equal zero, we have the
following lemma.

Lemma 2.3. Under conditions (C1), (C2) and (C5), and if n1/n → 0, then θ̃ →(a.s.) θ0.

As in the previous subsection, we are interested in quantifying the effect of outliers on
the distribution of θ̃ and on the test statistic T_ADF. To obtain θ̃, we usually need to solve the
equation

    σ̇′(θ̃)Sy⁻¹(s − σ(θ̃)) = 0.

The following theorem gives the asymptotic distribution of θ̃.

Theorem 2.3. Under conditions (C3), (C4) and (C5), if θ̃ →(P) θ0, then:

(a) √n(θ̃ − θ0) →(L) N(0, Ω) when n1/√n → 0;

(b) √n(θ̃ − θ0) →(L) N(ω, Ω) when n1/√n → c, where

    ω = c(σ̇′Γ⁻¹σ̇)⁻¹σ̇′Γ⁻¹γ;                                                (9)

(c) √n(θ̃ − θ0) →(P) ∞ when n1/√n → ∞ while n1/n → 0.
Theorem 2.3 tells us that in the presence of outliers, the mean vector in the asymptotic
distribution of √n(θ̃ − θ0) will not be zero. It changes in a similar way to that for the ML
estimator. The bias in θ̃ in case (b) is mainly due to the effect of the outliers on S. Even though
the outliers also affect the weight matrix Sy⁻¹, such an influence on θ̃ is minimal. Actually, as
long as n1/n approaches zero, Sy will be consistent for Γ, which is sufficient for the result in
the theorem to hold. The following theorem is on the corresponding test statistic.
Theorem 2.4. Under conditions (C3), (C4) and (C5), if θ̃ →(P) θ0, then:

(a) T_ADF →(L) χ²_{p*−q} when n1/√n → 0;

(b) T_ADF →(L) χ²_{p*−q}(η) when n1/√n → c, where

    η = c²γ′[Γ⁻¹ − Γ⁻¹σ̇(σ̇′Γ⁻¹σ̇)⁻¹σ̇′Γ⁻¹]γ.
It is obvious that ω = ξ and η = τ when G0 follows a multivariate normal distribution.
However, as we shall see in the next section, there may exist a big difference between the
estimates by the two methods even when N1 = 0.
From the results in this section, we can see the effect of outliers on the commonly used
inference procedures. Even if a proposed model would fit the population covariance matrix
of G0, because of outliers, parameter estimates will be biased and test statistics will be
significant. For example, in a factor analysis model, biased parameter estimates may lead
to non-significant factor loadings being significant and vice versa (Yuan & Bentler, 1998a,
1998b), or there may even be an inability to minimize (3) or (7) using standard minimization
methods. A significant test statistic may discredit a theoretically correct model
structure Σ0 = Σ(θ0) that a researcher wants to recover (Yuan & Bentler, 1998b). As will be
illustrated in the next section, the NCP associated with a model test can be made larger
for any actual degree of model misspecification by inclusion of a few outliers. These
outliers will seriously distort any power analysis that should be an integral part of structural
modelling.
3. Illustration
We will first use two data sets to demonstrate the relevance of our results to data analysis. In
particular, we will show that a few outliers can create a problematic solution such as a
Heywood case (negative error variance). The NCP estimate, on which most fit indices are
based, can also be strongly influenced by outliers. Then we will conduct a simulation to see
how the analytical results in Section 2 are matched with empirical data.
The first example is based on a data set from Bollen (1989). It consists of three estimates
of percentage cloud cover for 60 slides. This data set was introduced for outlier
identification purposes. Bollen & Arminger (1991) further used a one-factor model to fit this data
set to study observational residuals in factor analysis. Fixing the factor variance at 1.0, the
maximum likelihood solution θ̂ for factor loadings and error variances is given in the first
row of Table 1. With the negative variance estimate ψ̂33 = −51.439, this solution is
definitely not acceptable. After removing the three cases 52, 40 and 51, which correspond
Table 1. Parameter estimates and biases for cloud cover data from Bollen (1989)

θ            λ11       λ21       λ31       ψ11        ψ22        ψ33
θ̂            32.432    31.457    38.145    248.770    473.781    −51.439
θ̂(1)         31.986    36.567    35.907    105.799    157.294    58.151
θ̂ − θ̂(1)     0.446     −5.110    2.238     142.971    316.487    −109.590
to the three largest residuals in Bollen & Arminger's (1991) analysis, the solution θ̂(1) is given
in the second row of Table 1. In this example, the outliers have a big effect on the estimates
of error variances, but relatively less effect on the factor loadings.
For this example, the ADF procedure will give the same result as the ML procedure
because the model is saturated. Notice that one has no way to evaluate the population
parameter ξ or ω because the true parameter θ0 is unknown. We may approximate the bias by
the parameter differences θ̂ − θ̂(1), which are given in the third row of Table 1. Since the
degrees of freedom are zero, a test statistic is not relevant here.
The second data set consists of NBA descriptive statistics for 105 guards from the 1992–1993
basketball season (Chatterjee, Handcock & Simonoff, 1995). We select a subset of four
variables from the original nine. These four variables are total minutes played, points scored
per game, assists per game, and rebounds per game. Yuan & Bentler (1998a) proposed a
one-factor model for these variables. Since the variable 'total minutes played' has a very
large variance, as in Yuan & Bentler (1998a) it is multiplied by 0.01 before the model is
fitted. With factor variance Φ = 1.0, the ML estimates based on the entire data set are given
in the first row of Table 2. The likelihood ratio statistic T_ML = 15.281, which is highly
significant when referred to χ²2. This rejection of the model may be due to a few outlying
observations. Yuan & Bentler (1998a) identified cases 2, 4, 6, 24, 32 as the five most
influential cases. The model was re-estimated without these five cases. The likelihood ratio
statistic is now T_ML = 4.918, which is not significant at the 0.05 level. For this example, the
NCP estimate is 13.281 if based on the entire data set. This is more than four times its
corresponding estimate, 2.918, when the five cases are removed. As is well known, almost
all popular fit indices are based on the NCP estimate (see, for example, McDonald, 1989;
Bentler, 1990). Hence, model evaluation using fit indices without considering the influence
of outliers would be misleading. Similarly, a power analysis without the five influential cases
would lead to quite different conclusions about the one-factor model as compared to an
analysis that would include these five cases (e.g., Satorra & Saris, 1985). The new estimates
θ̂(1) as well as the approximate biases are given in the upper panel of Table 2. Even though the
effect of the five outliers on parameter estimates is not as deleterious as in the last example,
some of the biases are not trivial.
The corresponding parameter estimates by the ADF approach are given in the lower panel
of Table 2. The effect of the outliers on θ̃ is not as obvious as that on θ̂. However, their effect
on the model evaluation is also dramatic. The test statistic based on the whole data set is
T_ADF = 7.380, significant at the 0.05 level, while the corresponding statistic without the five
outlying cases is only 4.292. The NCP estimate based on the whole data set is more than
twice that based on the reduced data set.
While the effect of outliers can be seen in the two practical data sets, not knowing the
true population parameters makes it impossible to evaluate how well the results in Section 2
match empirical research. To obtain better insight into this, we will conduct a simulation
Table 2. Parameter estimates and biases for NBA statistics data from Chatterjee et al. (1995)

θ            λ11      λ21      λ31      λ41      ψ11      ψ22      ψ33      ψ44
θ̂            7.894    5.372    1.790    1.086    16.109   7.843    2.994    0.478
θ̂(1)         8.161    4.800    1.812    1.008    9.580    7.353    2.510    0.328
θ̂ − θ̂(1)     −0.267   0.572    −0.022   0.078    6.529    0.490    0.484    0.150
θ̃            8.289    4.649    1.918    0.929    7.253    6.666    2.727    0.375
θ̃(1)         8.243    4.936    1.834    1.004    7.464    6.557    2.616    0.338
θ̃ − θ̃(1)     0.046    −0.287   0.084    −0.075   −0.211   0.109    0.111    0.037
study. Σ0 is generated by a one-factor model with five indicators, that is,

    Σ0 = Λ0Λ0′ + Ψ0,

where Λ0 = (1.0, 1.0, 1.0, 1.0, 1.0)′ and Ψ0 = diag(0.5, 0.5, 0.5, 0.5, 0.5). Σ1 is generated
by a two-factor model with

    Σ1 = Λ1Φ1Λ1′ + Ψ1,

where

    Λ1 = ( 1.0  1.0  0    0    0   )′
         ( 0    0    1.0  1.0  1.0 ),

    Φ1 = ( 1.0  0.5 )
         ( 0.5  1.0 ),

and Ψ1 = diag(0.5, 0.5, 0.5, 0.5, 0.5). For the normal theory based method one needs to
choose G0 = N5(μ0, Σ0) in order for the chi-square approximation in Theorem 2.2 to hold.
Of course, other choices, such as a multivariate t distribution, are legitimate for G1 in the ML
method and for G0 in the ADF method. A specific way of generating non-normal multivariate
distributions is as follows. Let A be a p × m (m ≥ p) matrix such that AA′ = Σ1, and let
z = (z1, z2, . . . , zm)′ with the zi being independent with E(zi) = 0 and Var(zi) = 1. Then x = Az + μ1 will
have mean vector μ1 and covariance matrix Σ1. For example, if one generates p independent
chi-square variables (r1, r2, . . . , rp), each with two degrees of freedom, and sets zi = (ri − 2)/2,
then x = Σ1^{1/2} z + μ1 will serve the purpose for x ~ G1. Actually, it is not necessary to have
the distributional form of x specified in such a way. More details on creating different
multivariate non-normal distributions can be found in Yuan & Bentler (1999).
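The chi-square based recipe above, x = Σ1^{1/2} z + μ1 with zi = (ri − 2)/2, can be sketched as follows; the particular Σ1, μ1 and sample size are hypothetical choices for illustration, not the simulation design of this paper:

```python
import numpy as np

rng = np.random.default_rng(2023)
p = 3
mu1 = np.zeros(p)
Sigma1 = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.5],
                   [0.2, 0.5, 1.0]])

# Symmetric square root Sigma1^{1/2} via the spectral decomposition
vals, vecs = np.linalg.eigh(Sigma1)
root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

# z_i = (r_i - 2)/2 with r_i ~ chi-square(2): mean 0, variance 1, skewed
n_obs = 100_000
r = rng.chisquare(df=2, size=(n_obs, p))
Z = (r - 2.0) / 2.0

X = Z @ root + mu1    # rows are draws of x = Sigma1^{1/2} z + mu1

# The draws are non-normal, yet the covariance matches the target Sigma1
S = np.cov(X, rowvar=False)
assert np.allclose(S, Sigma1, atol=0.05)
```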
We choose G0 = N5(0, Σ0) and G1 = N5(0, Σ1) in both the ML and the ADF procedures
for convenience. With such a design, ξ = ω is easy to calculate and is given in the first
column of Table 3 for c = 1. With model degrees of freedom p* − q = 5 and τ = η = 2.0 for
c = 1, it is also easy for us to evaluate the NCP approximations given in Theorems 2.2
and 2.4. We choose sample size N = 401, with N1 = 11 and 21, which lead to c ≈ n1/√n = 0.5
and 1, respectively. For the purpose of comparison we also include the condition N1 = 0.
With 1000 replications, the bias and NCP are calculated as

    Bias = √n(θ̄ − θ0)   and   NCP = T̄ − df,

respectively, where θ̄ and T̄ are the averages of the parameter estimates and test statistics across
the 1000 replications.
Table 3. Simulation results on bias √n(θ̄ − θ0) and non-centrality parameter T̄ − df

             ξ = ω      ML                              ADF
θ            (c = 1)    N1 = 0    11        21          N1 = 0    11        21
λ11          −0.250     0.039     −0.102    −0.235      0.053     −0.069    −0.167
λ21          −0.250     −0.025    −0.166    −0.297      −0.010    −0.132    −0.230
λ32          −0.083     0.003     −0.041    −0.080      0.018     −0.031    −0.073
λ42          −0.083     −0.038    −0.083    −0.122      −0.022    −0.068    −0.110
λ52          −0.083     −0.015    −0.060    −0.099      −0.009    −0.057    −0.098
ψ11          0.500      −0.043    0.243     0.502       −0.164    0.049     0.185
ψ22          0.500      −0.046    0.237     0.490       −0.179    0.032     0.168
ψ33          0.167      −0.008    0.081     0.154       −0.128    −0.058    −0.013
ψ44          0.167      −0.023    0.066     0.141       −0.150    −0.081    −0.033
ψ55          0.167      −0.006    0.086     0.160       −0.123    −0.050    −0.006
NCP          2.000      −0.027    0.673     2.155       0.050     0.479     1.409
When N1 = 11 and c ≈ 1/2, the biases in θ̂ in column 3 should be approximately half of the
theoretical ones under ξ = ω in column 1. Considering the finite-sample effect in column 2,
where there is no theoretical bias, this approximation is pretty good. For example, instead of
half of −0.083, the bias for λ42 in column 3 is −0.083. This may seem a bad approximation.
However, considering the random error −0.038 in column 2, the amount by which −0.083 is off
target just reflects the finite-sample effect. With N1 = 21, the fourth column of numbers
are estimates of the population numbers in column 1. Considering the finite-sample effect
in column 2, the approximation is also quite good. With c ≈ 1/2 and 1, the theoretical NCPs are
approximately 0.5 and 2.0, respectively. There exists some discrepancy between the NCP
estimates and these theoretical NCPs. Similar discrepancies have been reported by Satorra
et al. (1991), who studied various approximations to the NCP when N1 = N and γ = O(1/√n).
In contrast to the bias approximations in the ML method, the bias approximations in the
ADF procedure are substantially more off-target. This is related to the finite-sample effect, as
shown in column 5, where there is no theoretical bias. This phenomenon was reported in
Yuan & Bentler (1997a) under correct models, where the finite-sample bias in θ̃ is about 50
times that for θ̂. Also, θ̃ is much less efficient than θ̂ unless data are extremely non-normal.
Considering these factors and comparing the last two columns of numbers under ADF with
those for N1 = 0, we can see that the parameter estimates are shifted from those corresponding
to N1 = 0 by amounts roughly equal to ω/2 and ω, respectively. Similarly, the NCP
estimate based on the ADF procedure is also some way off from the one described
in Theorem 2.4. This may be related to the unstable nature of T_ADF, as has been reported
previously (e.g., Yuan & Bentler, 1997b).
4. Discussion
We studied the effect of outliers on the two most commonly used CSA procedures. Even
though a model structure may fit the majority of the data, a few outliers can discredit the
value of the model. The analytical development in Section 2 establishes a direct relationship
between model statistical significance and the presence of outliers. When a significant
chi-square statistic occurs in practice, the researcher should check the model as well as the
data, since either could be the source of the lack of fit. When data contain possible outliers,
there are two general approaches to minimizing their effect. One approach is to identify the
outliers first, and then to apply classical procedures after outlier removal (see Lee & Wang,
1996). The other is to use a robust approach to downweight the influence of outliers. With the
first approach, no method can guarantee that one can identify all the outliers. With the robust
approach, the influence of outliers is not necessarily completely removed. So the results in
Section 2 may also be relevant to data analysis even when care has been taken to minimize
the effect of outlying cases.
Our model formulation in Section 2 is for covariance structure analysis. We regard
observations from G1 as outliers as long as θ0 satisfies Σ0 = Σ(θ0) while no θ satisfies
Σ1 = Σ(θ). Thus, it is not necessary for outliers to be very extreme to break down the regular
analysis. A difference that characterizes our approach, as compared to that of the robust
statistics literature (e.g., regression), is that in robust statistics it is usually assumed that
the model is correct, but that error distributions have different tails. We assume here that the
model is incorrect for the outliers. If a model is correct in both the means and covariances,
the ADF procedure works well when the sample size is large enough, even though errors may
have different third- and fourth-order moments (Yuan & Bentler, 1997b). Although outliers
can have the effect of generating a sample that violates distributional assumptions in the ML
procedure, this is not possible with the ADF procedure since it allows any distributional
shape.
In the technical development in Section 2, we assumed n1/n → 0 for convenience. In
practice, sample sizes are always finite. When N ≫ N1, we can use c ≈ n1/√n and the other
corresponding sample quantities to approximate the biases in the normal distributions
and the non-centrality parameters in the chi-square distributions, as illustrated in the previous
section. The proportion of outliers in a sample cannot be large in order for the asymptotic
results in Section 2 to be a good approximation. This is parallel to the assumption
Σ0 = Σ(θ0) + O(1/√n) in Satorra & Saris (1985) and Steiger et al. (1985). These authors
considered the behaviour of T_ML when N1 = 0.
One may not want to know details about outliers unless they are of special scientific
interest. But outliers may in many instances represent important scientific opportunities to
discover a new phenomenon. Potentially, a new physical or theoretical aspect of a research
area can be enriched by understanding the conditions and meaning associated with the
occurrence of atypical observations. In such a case, focusing attention on only the majority of
the data may mean ignoring a fundamental and important phenomenon, and special scientific
attention should be devoted to the analysis of atypical cases. Ultimately it may be possible to
develop quite different models for various subsets of the data. In the case of covariance
structure analysis, this is well known, and indeed multiple population models were developed
primarily to deal with the case where a single model would distort a phenomenon. Our
attention here is focused on the situation where prior research and theory dictate that a single
multivariate process is at work and is of primary interest, and where there is recognition that
subject carelessness, response errors, coding errors, transcribing problems or other unknown
sources of distortion to a primary model may be operating, but are not of special interest. In
this situation, we desire to quantify the effects that atypical observations may or may not have
on a standard analysis.
Finally, we need to note that there is no conflict between the results developed here and those in the literature on asymptotic robustness. This is because there is no concept of outliers in any of the asymptotic robustness research. The set-up in the asymptotic robustness literature assumes that a sample comes from a single multivariate distribution whose covariance structure is of interest. Our set-up is that the majority of the data comes from a distribution with $\Sigma_0 = \Sigma(\theta_0)$ while a small proportion of outliers comes from a distribution that does not satisfy the proposed structural model. The implication of our result in this regard is that one should not misuse asymptotic robustness theory by blindly applying a model-fitting procedure in a CSA program to data with possible outliers.
Appendix

Proof of Lemma 2.1. Let $\bar{x}_0$, $\bar{x}_1$ and $S_0$, $S_1$ represent the sample means and sample covariance matrices of $x_1, \ldots, x_{N_0}$ and $x_{N_0+1}, \ldots, x_N$. Then we have

$$ S = \frac{n_0}{n}S_0 + \frac{n_1}{n}S_1 + \frac{N_0 N_1}{nN}(\bar{x}_0 - \bar{x}_1)(\bar{x}_0 - \bar{x}_1)'. \qquad (A.1) $$

It follows from (A.1) that

$$ \sqrt{n}(S - \Sigma_0) = \sqrt{\frac{n_0}{n}}\,\sqrt{n_0}(S_0 - \Sigma_0) + \frac{n_1}{\sqrt{n}}(S_1 - \Sigma_0) + \frac{N_0 N_1}{\sqrt{n}\,N}(\bar{x}_0 - \bar{x}_1)(\bar{x}_0 - \bar{x}_1)' - \frac{1}{\sqrt{n}}\Sigma_0. \qquad (A.2) $$

Since $\sqrt{n_0}(s_0 - \sigma(\theta_0)) \xrightarrow{L} N_{p^*}(0, \Gamma)$, the lemma follows from (A.2). □
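The decomposition (A.1) is an exact finite-sample identity, so it can be checked numerically. The sketch below (with made-up subsample sizes and simulated data, chosen only for illustration) verifies it with NumPy, using the paper's conventions $n = N - 1$, $n_0 = N_0 - 1$, $n_1 = N_1 - 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, N0, N1 = 3, 40, 8
x0 = rng.normal(size=(N0, p))           # "good" observations, playing the role of G0
x1 = rng.normal(loc=2.0, size=(N1, p))  # contaminating observations, playing the role of G1
x = np.vstack([x0, x1])
N, n, n0, n1 = N0 + N1, N0 + N1 - 1, N0 - 1, N1 - 1

S = np.cov(x, rowvar=False)    # pooled sample covariance (divisor n = N - 1)
S0 = np.cov(x0, rowvar=False)  # subsample covariances (divisors n0, n1)
S1 = np.cov(x1, rowvar=False)
d = x0.mean(axis=0) - x1.mean(axis=0)

# Right-hand side of (A.1)
rhs = (n0 / n) * S0 + (n1 / n) * S1 + (N0 * N1 / (n * N)) * np.outer(d, d)
assert np.allclose(S, rhs)  # (A.1) holds exactly, not just asymptotically
```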
Proof of Lemma 2.2. Since $n_1/n \to 0$, it follows from (3) and (A.1) that

$$ F_{ML}(S, \Sigma(\theta)) \xrightarrow{a.s.} F_{ML}(\Sigma_0, \Sigma(\theta)) $$

uniformly on $\Theta$. Since $\theta_0$ minimizes $F_{ML}(\Sigma_0, \Sigma(\theta))$ and the model is identified, the lemma follows from a standard argument (Yuan, 1997). □
Proof of Theorem 2.1. Applying the first-order Taylor expansion to the left-hand side of (5) at $\theta_0$ gives

$$ \sqrt{n}(\hat{\theta} - \theta_0) = (\dot{\sigma}'W\dot{\sigma})^{-1}\dot{\sigma}'W\,\sqrt{n}(s - \sigma_0) + o_p(1). \qquad (A.3) $$

The theorem follows from Lemma 2.1 by noticing that $\Gamma = W^{-1}$. □

Proof of Theorem 2.2. Using the Taylor expansion of $\hat{F}_{ML}(s) = F_{ML}(S, \Sigma(\hat{\theta}))$ at $\hat{s} = \mathrm{vech}[\Sigma(\hat{\theta})]$, we have

$$ T_{ML} = nF_{ML}(\hat{s}) + n\dot{F}'_{ML}(\hat{s})(s - \hat{s}) + \frac{n}{2}(s - \hat{s})'\ddot{F}_{ML}(\bar{s})(s - \hat{s}), \qquad (A.4) $$

where $\bar{s}$ is a vector lying between $s$ and $\hat{s}$. Notice that $F_{ML}(\hat{s}) = 0$, $\dot{F}_{ML}(\hat{s}) = 0$ and $\ddot{F}_{ML}(\bar{s}) = 2W + o_p(1)$. We have from (A.4)

$$ T_{ML} = n(s - \hat{s})'W(s - \hat{s}) + o_p(1). \qquad (A.5) $$

It follows from (A.3) that

$$ s - \sigma(\hat{\theta}) = (s - \sigma_0) - (\sigma(\hat{\theta}) - \sigma_0) = (s - \sigma_0) - \dot{\sigma}(\hat{\theta} - \theta_0) + o_p(1/\sqrt{n}) = [I - \dot{\sigma}(\dot{\sigma}'W\dot{\sigma})^{-1}\dot{\sigma}'W](s - \sigma_0) + o_p(1/\sqrt{n}). \qquad (A.6) $$

Putting (A.6) into (A.5), one obtains

$$ T_{ML} = e'Qe + o_p(1), \qquad (A.7) $$

where

$$ e = W^{1/2}\sqrt{n}(s - \sigma_0) \quad \text{and} \quad Q = I - W^{1/2}\dot{\sigma}(\dot{\sigma}'W\dot{\sigma})^{-1}\dot{\sigma}'W^{1/2}. $$

When $G_0 = N_p(\mu_0, \Sigma_0)$, $W = \Gamma^{-1}$. Noticing that $Q$ is a projection matrix of rank $p^* - q$, the theorem follows from Lemma 2.1 and (A.7). □
Proof of Lemma 2.3. This is similar to the proof of Lemma 2.2. □

Proof of Theorem 2.3. Using the Taylor expansion of $\sigma(\tilde{\theta})$ in (9) at $\theta_0$, it follows that

$$ \sqrt{n}(\tilde{\theta} - \theta_0) = [\dot{\sigma}'(\tilde{\theta})S_y^{-1}\dot{\sigma}(\bar{\theta})]^{-1}\dot{\sigma}'(\tilde{\theta})S_y^{-1}\,\sqrt{n}(s - \sigma_0), \qquad (A.8) $$

where $\dot{\sigma}(\bar{\theta})$ denotes that each row of $\dot{\sigma}(\theta)$ is evaluated at a $\bar{\theta}$ which lies between $\tilde{\theta}$ and $\theta_0$. Since $\tilde{\theta} \xrightarrow{P} \theta_0$, so does $\bar{\theta}$. Since $S_y \xrightarrow{P} \Gamma$ when $n_1/n \to 0$, the theorem follows from (A.8) and Lemma 2.1. □

Proof of Theorem 2.4. We have from (A.8)

$$ \sqrt{n}(s - \sigma(\tilde{\theta})) = [I - \dot{\sigma}(\dot{\sigma}'\Gamma^{-1}\dot{\sigma})^{-1}\dot{\sigma}'\Gamma^{-1}]\sqrt{n}(s - \sigma_0) + o_p(1). $$

So $T_{ADF}$ can be written as

$$ T_{ADF} = e'[I - \Gamma^{-1/2}\dot{\sigma}(\dot{\sigma}'\Gamma^{-1}\dot{\sigma})^{-1}\dot{\sigma}'\Gamma^{-1/2}]e + o_p(1), \qquad (A.9) $$

where $e = \Gamma^{-1/2}\sqrt{n}(s - \sigma_0)$. The proof follows from (A.9) and Lemma 2.1. □
Acknowledgements
The authors gratefully acknowledge the constructive comments of three referees and the
editor that led to an improved version of the paper. This work was supported by National
Institute on Drug Abuse grants DA01070 and DA00017 at the US National Institutes of
Health.
References
Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.
Anderson, T. W., & Amemiya, Y. (1988). The asymptotic normal distribution of estimators in factor analysis under general conditions. Annals of Statistics, 16, 759–771.
Barnett, V., & Lewis, T. (1994). Outliers in statistical data (3rd ed.). Chichester: Wiley.
Beckman, R. J., & Cook, R. D. (1983). Outlier..........s (with discussion). Technometrics, 25, 119–163.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Berkane, M., & Bentler, P. M. (1988). Estimation of contamination parameters and identification of outliers in multivariate data. Sociological Methods and Research, 17, 55–64.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A., & Arminger, G. (1991). Observational residuals in factor analysis and structural equation models. Sociological Methodology, 21, 235–262.
Breckler, S. J. (1990). Application of covariance structure modeling in psychology: Cause for concern? Psychological Bulletin, 107, 260–273.
Browne, M. W. (1982). Covariance structure analysis. In D. M. Hawkins (Ed.), Topics in applied multivariate analysis (pp. 72–141). Cambridge: Cambridge University Press.
Browne, M. W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Browne, M. W. (1987). Robustness of statistical inference in factor analysis and related models. Biometrika, 74, 375–384.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.
Cadigan, N. G. (1995). Local influence in structural equation models. Structural Equation Modeling, 2, 13–30.
Chatterjee, S., Handcock, M. S., & Simonoff, J. S. (1995). A casebook for a first course in statistics and data analysis. New York: Wiley.
Cook, R. D. (1986). Assessment of local influence (with discussion). Journal of the Royal Statistical Society, Series B, 48, 133–169.
de Leeuw, J. (1988). Multivariate analysis with linearizable regressions. Psychometrika, 53, 437–455.
Ferguson, T. S. (1961). On the rejection of outliers. In J. Neyman (Ed.), Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, Vol. 1 (pp. 253–297). Berkeley: University of California Press.
Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607–614.
Gnanadesikan, R. (1997). Methods for statistical data analysis of multivariate observations (2nd ed.). New York: Wiley.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (1986). Robust statistics: The approach based on influence functions. New York: Wiley.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structural equation modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424–453.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 294–316). Newbury Park, CA: Sage.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8 user's reference guide. Chicago: Scientific Software International.
Kwan, C. W., & Fung, W. K. (1998). Assessing local influence for specific restricted likelihood: Application to factor analysis. Psychometrika, 63, 35–46.
Lee, S.-Y., & Wang, S.-J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93–108.
MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201–226.
Magnus, J. R., & Neudecker, H. (1988). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley.
McDonald, R. P. (1989). An index of goodness-of-fit based on noncentrality. Journal of Classification, 6, 97–103.
Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.
Mueller, R. O. (1996). Basic principles of structural equation modeling. New York: Springer-Verlag.
Rousseeuw, P. J., & van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points (with discussion). Journal of the American Statistical Association, 85, 633–651.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified approach. Psychometrika, 54, 131–151.
Satorra, A. (1992). Asymptotic robust inferences in the analysis of mean and covariance structures. Sociological Methodology, 22, 249–278.
Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A., & Bentler, P. M. (1991). Goodness-of-fit test under IV estimation: Asymptotic robustness of a NT test statistic. In R. Gutiérrez & M. J. Valderrama (Eds.), Applied stochastic models and data analysis (pp. 555–567). Singapore: World Scientific.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39–62.
Sörbom, D. (1974). A general method for studying differences in factor means and factor structure between groups. British Journal of Mathematical and Statistical Psychology, 27, 229–239.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.
Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis: With an application to confirmatory factor analysis. Communications in Statistics – Theory and Methods, 20, 3805–3821.
Wilcox, R. R. (1997). Introduction to robust estimation and hypothesis testing. San Diego: Academic Press.
Yuan, K.-H. (1997). A theorem on uniform convergence of stochastic functions with applications. Journal of Multivariate Analysis, 62, 100–109.
Yuan, K.-H., & Bentler, P. M. (1997a). Improving parameter tests in covariance structure analysis. Computational Statistics and Data Analysis, 26, 177–198.
Yuan, K.-H., & Bentler, P. M. (1997b). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767–774.
Yuan, K.-H., & Bentler, P. M. (1998a). Robust mean and covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 51, 63–88.
Yuan, K.-H., & Bentler, P. M. (1998b). Structural equation modeling with robust covariances. Sociological Methodology, 28, 363–396.
Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.
Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31–50.
Yung, Y.-F. (1997). Finite mixtures in confirmatory factor analysis models. Psychometrika, 62, 297–330.
Received 23 April 1998; revised version received 25 August 2000
tests are based. Of course, any particular model $\Sigma(\theta)$ may be inadequate; for example, it may omit crucial parameters or reflect an entirely incorrect functional form. Earlier technical development in this respect was given by Satorra & Saris (1985) and Steiger, Shapiro & Browne (1985), who proposed using the non-central chi-square distribution to describe the distribution of the test statistics. Recent discussions of structural modelling imply that such theoretical inadequacy is almost inevitable in the social and behavioural sciences (de Leeuw, 1988; Browne & Cudeck, 1993).

* Requests for reprints should be addressed to Dr Ke-Hai Yuan, Department of Psychology, University of North Texas, PO Box 311280, Denton, TX 76203-1280, USA.
In applications of structural equation models, the perspective that any model is only an approximation to reality dominates the literature. When a model does not fit a data set, as implied by a significant chi-square test statistic, many empirical researchers start using various fit indices to justify their models (see, for example, Hu & Bentler, 1998). Some of them may go one step further and modify their models as facilitated by the modification index in LISREL (Jöreskog & Sörbom, 1993) or the Lagrange multiplier test in EQS (Bentler, 1995). This step was classified as a model-generating situation in Jöreskog (1993). However, few or none of them question the quality of their data. This phenomenon was noted by Breckler (1990). Jöreskog (1993) discussed various possible causes when a model does not match a data set in practice and emphasized the importance of proper data for proper statistical results. Even though there exist various approaches for outlier detection (Cook, 1986; Berkane & Bentler, 1988; Bollen, 1989; Bollen & Arminger, 1991; Tanaka, Watadani & Moon, 1991; Barnett & Lewis, 1994; Bentler, 1995; Cadigan, 1995; Fung & Kwan, 1995; Lee & Wang, 1996; Gnanadesikan, 1997; Kwan & Fung, 1998) as well as robust procedures (Huber, 1981; Hampel, Ronchetti, Rousseeuw & Stahel, 1986; Rousseeuw & van Zomeren, 1990; Wilcox, 1997; Yuan & Bentler, 1998a, 1998b; Yuan, Chan & Bentler, 2000), in applications of CSA these techniques are seldom used, as exemplified in numerous publications using CSA in the applied journals in the social and behavioural sciences (see, for example, MacCallum & Austin, 2000). This lack of attention to data quality probably occurs because there is no analytical development that directly connects the significance of a chi-square test statistic to bad data. Such a development is one of the major foci of the current paper. Specifically, we will show that even if a proposed structure is correct for the majority of the data in a sample, a small proportion of outliers can lead to biased estimators and highly significant test statistics.
Several approaches to CSA exist (Bollen, 1989). The two most widely used are the normal theory based maximum likelihood (ML) procedure and the asymptotically distribution-free (ADF) procedure (Browne, 1982, 1984). Of these two, the ML method is more popular among empirical researchers (Breckler, 1990). One reason for this is that the ML procedure is the default option in almost all the standard software (e.g., LISREL, EQS, AMOS). Another reason is that there exist various results about the robustness of the normal theory based ML procedure when applied to non-normal data (Browne, 1987; Shapiro, 1987; Anderson & Amemiya, 1988; Browne & Shapiro, 1988; Amemiya & Anderson, 1990; Satorra & Bentler, 1990, 1991; Mooijaart & Bentler, 1991; Satorra, 1992; Yuan & Bentler, 1999). The basic conclusion of this literature is that, under some special conditions, the parameter estimates based on ML are consistent, some standard errors remain consistent, and the likelihood ratio statistic can still follow a chi-square distribution asymptotically even when the observed data are non-normally distributed. The conditions under which normal theory methods can be applied to non-normal data are commonly referred to as asymptotic robustness conditions. Perhaps spurred on by a false sense of generalizability of the ML method, many researchers seldom bother to check the quality of their data. Once again, this phenomenon is probably due to the fact that there are no analytical results pointing out that asymptotic robustness is not enough when a data set contains outliers. As our results show, even if the observations in a sample come from normal distributions, the normal theory based ML procedure still cannot verify the right covariance structure if there are a few outliers. Of course, the implication of the ADF procedure is that whatever the distribution of a data set may be, this procedure will give a fair model evaluation when the sample size is large enough. Unfortunately, the ADF method also can be distorted by a few outliers.

In the next section we will give the analytical results for the effect of outliers on the two commonly used procedures. In Section 3 we will use some real data sets as well as simulation to illustrate the effect of outliers in practice. Relatively technical proofs will be given in an appendix.
2. Effect of outliers on model evaluations

We will quantify the effect of outliers on the normal theory based ML procedure and on the ADF procedure. The following preliminaries will set up the framework for our study.

Let $x_1, \ldots, x_{N_0}$ come from a distribution $G_0$ and $x_{N_0+1}, \ldots, x_N$ come from a distribution $G_1$, with $N_1 = N - N_0$ and $N_0 \gg N_1$. Let the mean vectors and covariance matrices of $G_0$ and $G_1$ be $\mu_0$ and $\mu_1$, and $\Sigma_0$ and $\Sigma_1$, respectively. We will consider the structure $\Sigma(\theta)$ such that $\Sigma_0 = \Sigma(\theta_0)$ for some unknown $\theta_0$, but $\Sigma_1$ will not equal $\Sigma(\theta)$ for any $\theta$. This means that we would have a correct covariance structure if there were no data from $G_1$. Consequently, we will consider the $N_1$ cases as outliers in the CSA context. The above formulation is very general because we do not put any specific distributional form on $G_1$. For example, the commonly used slippage model (see Ferguson, 1961) for outlier identification purposes is only a special case of the above model. When a variety of data contaminations exist in a sample, $G_1$ can be regarded as a pooled distribution based on many sources of influence (Beckman & Cook, 1983). We have no specific interest in identifying cases from $G_1$. The approach of influence analysis, as illustrated in Lee & Wang (1996) and Kwan & Fung (1998), should identify observations from $G_1$; otherwise, a kind of Type I error is made by identifying false outliers.

In the above formulation, we assume that $N_1$ is much smaller than $N_0$ and that the memberships of the observations are unknown. We may have knowledge about the impurity of the data, but we do not need to treat $G_1$ as another group. In practice, with observations from multiple groups, the multiple-group approach (Sörbom, 1974) can be used to analyse the data when group membership is known, and a mixture distribution approach to structural modelling can be used when the group membership is unknown (Yung, 1997). Here, we have no special interest in the structure of $\Sigma_1$. The formulation is simply to study the quantitative effect of the $N_1$ outliers on parameter estimators and test statistics. Since the finite-sample properties of the estimators and test statistics are hard to quantify, we will concentrate on the large-sample properties.
We assume that $n_0/n \to \pi_0$ and $n_1/n \to \pi_1$ as $n$ grows large, where $n = N - 1$, $n_0 = N_0 - 1$ and $n_1 = N_1 - 1$. Let $\bar{x}$ and $S$ be the sample mean vector and the sample covariance matrix based on the whole sample. It is easily seen that

$$ \bar{x} \xrightarrow{a.s.} \mu^* \quad \text{and} \quad S \xrightarrow{a.s.} \Sigma^*, \qquad (1) $$

where $\mu^* = \pi_0\mu_0 + \pi_1\mu_1$ and

$$ \Sigma^* = \pi_0\Sigma_0 + \pi_1\Sigma_1 + \pi_0\pi_1(\mu_0 - \mu_1)(\mu_0 - \mu_1)'. \qquad (2) $$

If we assume $\mu_0 = \mu_1$, then $\Sigma^* = \pi_0\Sigma_0 + \pi_1\Sigma_1$, but it may be unrealistic to assume $\Sigma_0 \neq \Sigma_1$ while $\mu_0 = \mu_1$. If $\Sigma_1 + \pi_0(\mu_0 - \mu_1)(\mu_0 - \mu_1)' = \Sigma_0$, then the effect of the $N_1$ outliers will not be observed. If $\Sigma^* = \Sigma(\theta^*)$ for some $\theta^*$, we can still recover the covariance structure, but the parameter is shifted with a bias of $\theta^* - \theta_0$. However, because we usually do not know $\theta_0$, and chi-square statistics will not discriminate $\theta^*$ from $\theta_0$, the effect of the $N_1$ outliers will not be observed either. In the following, we will deal with the general case in which $\Sigma_0 = \Sigma(\theta_0)$ but $\Sigma_1 \neq \Sigma(\theta)$ for any $\theta$. For the purpose of distribution characterization, we will assume that the proportion of outliers is small, so that estimates are still near $\theta_0$. Let $\mathrm{vech}(\cdot)$ be the operator which transforms a symmetric matrix into a vector by stacking the columns of the matrix, leaving out the elements above the diagonal. Let $\Gamma = \mathrm{Cov}[\mathrm{vech}\{(x - \mu_0)(x - \mu_0)'\}]$, where $x \sim G_0$. For the distribution of $s = \mathrm{vech}(S)$, writing $\sigma_0 = \mathrm{vech}(\Sigma_0)$, we have the following lemma.
Lemma 2.1. If the first four moments of both $G_0$ and $G_1$ exist and are finite, then we have:

(a) $\sqrt{n}(s - \sigma_0) \xrightarrow{L} N_{p^*}(0, \Gamma)$ when $n_1/\sqrt{n} \to 0$;
(b) $\sqrt{n}(s - \sigma_0) \xrightarrow{L} N_{p^*}(c\gamma, \Gamma)$ when $n_1/\sqrt{n} \to c$, where $\gamma = \mathrm{vech}[\Sigma_1 + (\mu_0 - \mu_1)(\mu_0 - \mu_1)' - \Sigma_0]$;
(c) $\sqrt{n}(s - \sigma_0) \xrightarrow{P} \infty$ when $\gamma \neq 0$ and $n_1/\sqrt{n} \to \infty$ while $n_1/n \to 0$.

Lemma 2.1 tells us that the distribution of $\sqrt{n}(s - \sigma_0)$ varies as the proportion of outliers changes. When the number of outliers is somewhere between $\sqrt{n}$ and $n$, the quantity $\sqrt{n}(s - \sigma_0)$ does not converge at all, and hence the standard statistical theory of CSA cannot be relied upon. For our purpose of quantifying the distribution of parameter estimates in CSA, we will be mainly interested in case (b), i.e., $n_1/\sqrt{n} \to c$.
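To make the limit in (2) and the bias vector $\gamma$ in Lemma 2.1(b) concrete, the following sketch computes both for a hypothetical pair of populations $G_0$ and $G_1$; all numerical values here are illustrative stand-ins, not quantities from the paper:

```python
import numpy as np

# Hypothetical two-group population: G0 satisfies the model, G1 supplies outliers.
mu0, mu1 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
Sigma0 = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma1 = np.array([[2.0, 0.0], [0.0, 2.0]])
pi1 = 0.02                # small outlier proportion
pi0 = 1.0 - pi1
d = mu0 - mu1

# Limit of the pooled sample covariance, equation (2)
Sigma_star = pi0 * Sigma0 + pi1 * Sigma1 + pi0 * pi1 * np.outer(d, d)

def vech(A):
    """Stack the columns of a symmetric matrix, on and below the diagonal."""
    p = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(p)])

# Asymptotic bias direction in Lemma 2.1(b)
gamma = vech(Sigma1 + np.outer(d, d) - Sigma0)
print(gamma)  # nonzero: even a tiny fraction of such outliers biases s away from sigma0
```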
In the rest of this paper, a dot on top of a function will denote its derivative (e.g., $\dot{\sigma}(\theta) = \partial\sigma(\theta)/\partial\theta'$). When a function is evaluated at $\theta_0$, we sometimes omit its argument (e.g., $\dot{\sigma} = \dot{\sigma}(\theta_0)$). We also need the following standard conditions for the results to be rigorous:

(C1) $\theta_0$ is an interior point of some compact set $\Theta \subset \mathbb{R}^q$.
(C2) $\Sigma_0$ is positive definite and $\Sigma(\theta) = \Sigma(\theta_0)$ only when $\theta = \theta_0$.
(C3) $\Sigma(\theta)$ is twice continuously differentiable.
(C4) $\dot{\sigma}(\theta)$ is of full rank.
(C5) $\Gamma$ is positive definite.

Under the above conditions we will first concentrate on the ML procedure and then turn to the ADF procedure.
2.1. Maximum likelihood procedure

Let $S$ be the sample covariance matrix based on a $p$-variate sample of size $N = n + 1$. The approach based on the normal theory ML procedure is to minimize

$$ F_{ML}(S, \Sigma(\theta)) = \mathrm{tr}[S\Sigma^{-1}(\theta)] - \log|S\Sigma^{-1}(\theta)| - p \qquad (3) $$

for $\hat{\theta}$. Let $D_p$ be the duplication matrix defined in Magnus & Neudecker (1988, p. 49) and

$$ W(\theta) = 2^{-1}D_p'[\Sigma^{-1}(\theta) \otimes \Sigma^{-1}(\theta)]D_p. $$

Under the assumption that $N_1 = 0$, $G_0 = N_p(\mu_0, \Sigma_0)$, and the null hypothesis of a correct model structure $\Sigma_0 = \Sigma(\theta_0)$,

$$ \sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{L} N_q(0, \Omega) \qquad (4a) $$

and

$$ T_{ML} = nF_{ML}(S, \Sigma(\hat{\theta})) \xrightarrow{L} \chi^2_{p^*-q}, \qquad (4b) $$

where $\Omega = (\dot{\sigma}'W\dot{\sigma})^{-1}$, $p^* = p(p+1)/2$ and $q$ is the number of unknown parameters in $\theta$. The assumption $G_0 = N_p(\mu_0, \Sigma_0)$ can be relaxed to some degree and the results in (4) still hold. Readers who are interested in this direction are referred to Yuan & Bentler (1999) for a very general characterization.
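As a concrete illustration of minimizing (3), the sketch below fits a one-factor model $\Sigma(\theta) = \lambda\lambda' + \Psi$ (factor variance fixed at 1, $\Psi$ diagonal) to a hypothetical $4 \times 4$ sample covariance matrix with SciPy. The covariance values and the nominal sample size are made up for illustration; here $S$ is constructed to satisfy the model exactly, so the minimized discrepancy is essentially zero:

```python
import numpy as np
from scipy.optimize import minimize

def f_ml(theta, S):
    """Normal-theory ML discrepancy (3) for Sigma(theta) = lam lam' + diag(psi)."""
    p = S.shape[0]
    lam, psi = theta[:p], theta[p:]
    Sigma = np.outer(lam, lam) + np.diag(psi)
    A = S @ np.linalg.inv(Sigma)
    return np.trace(A) - np.log(np.linalg.det(A)) - p

# Hypothetical sample covariance of four indicators of a single factor
# (generated from loadings (.8, .7, .6, .5) with unit indicator variances)
S = np.array([[1.00, 0.56, 0.48, 0.40],
              [0.56, 1.00, 0.42, 0.35],
              [0.48, 0.42, 1.00, 0.30],
              [0.40, 0.35, 0.30, 1.00]])
p = S.shape[0]
start = np.concatenate([0.5 * np.ones(p), 0.5 * np.ones(p)])
res = minimize(f_ml, start, args=(S,), method="L-BFGS-B",
               bounds=[(None, None)] * p + [(1e-6, None)] * p)
lam_hat, psi_hat = res.x[:p], res.x[p:]
T_ml = (100 - 1) * res.fun  # T_ML = n * F_ML for a nominal N = 100
```

With $p = 4$ indicators, $p^* = 10$ and $q = 8$, so the reference distribution of $T_{ML}$ would be $\chi^2_2$.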
Still assuming $G_0 = N_p(\mu_0, \Sigma_0)$ and $\Sigma_0 = \Sigma(\theta_0)$ for some unknown $\theta_0$, our interest is in the properties of $\hat{\theta}$ and $T_{ML}$ when $N_1$ does not equal zero. The following lemma is on the consistency of $\hat{\theta}$ when the proportion of contaminated data is small.

Lemma 2.2. Under conditions (C1) and (C2), and if $n_1/n \to 0$, then $\hat{\theta} \xrightarrow{a.s.} \theta_0$.

When the number of outliers is comparable with the sample size, the estimator $\hat{\theta}$ will no longer converge to $\theta_0$; instead, $\hat{\theta} \xrightarrow{a.s.} \theta^*$, which minimizes $F_{ML}(\Sigma^*, \Sigma(\theta))$; we will not deal with this situation here. Since $\theta_0$ is an interior point of $\Theta$, when $n$ is large enough $\hat{\theta}$ will satisfy

$$ \dot{\sigma}'(\hat{\theta})W(\hat{\theta})(s - \sigma(\hat{\theta})) = 0. \qquad (5) $$

For this $\hat{\theta}$, we have the following result.

Theorem 2.1. Under conditions (C3), (C4) and (C5), if $\hat{\theta} \xrightarrow{P} \theta_0$, then:

(a) $\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{L} N_q(0, \Omega)$ when $n_1/\sqrt{n} \to 0$;
(b) $\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{L} N_q(\xi, \Omega)$ when $n_1/\sqrt{n} \to c$, where $\xi = c(\dot{\sigma}'W\dot{\sigma})^{-1}\dot{\sigma}'W\gamma$;
(c) $\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{P} \infty$ when $n_1/\sqrt{n} \to \infty$ while $n_1/n \to 0$.
Comparing with Lemma 2.1, Theorem 2.1 tells us that the asymptotic distribution of $\sqrt{n}(\hat{\theta} - \theta_0)$ is decided by the proportion of outliers in the data in a similar way to that of $\sqrt{n}(s - \sigma_0)$. The bias in the asymptotic distribution of $\sqrt{n}(\hat{\theta} - \theta_0)$ will increase as the proportion of outliers increases. The non-centrality parameter (NCP) in the chi-square distribution, which characterizes the test statistic $T_{ML}$ as described in the following theorem, will increase in a similar way.
Theorem 2.2. Under conditions (C3), (C4) and (C5), if $\hat{\theta} \xrightarrow{P} \theta_0$, then:

(a) $T_{ML} \xrightarrow{L} \chi^2_{p^*-q}$ when $n_1/\sqrt{n} \to 0$;
(b) $T_{ML} \xrightarrow{L} \chi^2_{p^*-q}(\tau)$ when $n_1/\sqrt{n} \to c$, where

$$ \tau = c^2\gamma'[W - W\dot{\sigma}(\dot{\sigma}'W\dot{\sigma})^{-1}\dot{\sigma}'W]\gamma. \qquad (6) $$

The NCP in (6) can be compared with the NCPs discussed in Satorra (1989) and Satorra, Saris & de Pijper (1991). Actually, $\tau$ is just the $W_3$ in Satorra et al. (1991) when $N_1 = N$, $c^2 = n$ and $\gamma = O(1/\sqrt{n})$. Notice that the assumption $G_0 = N(\mu_0, \Sigma_0)$ is used to obtain simple expressions in Theorems 2.1 and 2.2. When $G_0$ is not normal the bias in $\hat{\theta}$ as an estimator of $\theta_0$ still exists. Similarly, even though $T_{ML}$ cannot then be described by a chi-square distribution, when $N_1 > 0$ it will be stochastically larger than when $N_1 = 0$.
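The practical import of Theorem 2.2(b) is that, with $c^2 = n_1^2/n$, a handful of outliers can inflate the NCP and hence the rejection rate of an otherwise correct model. The sketch below traces this with SciPy; the degrees of freedom and the value of the quadratic form $\gamma'[\cdot]\gamma$ are made-up stand-ins, not quantities derived from a real model:

```python
from scipy.stats import chi2, ncx2

df = 8                     # p* - q for some hypothetical model
crit = chi2.ppf(0.95, df)  # 5% critical value of chi^2_df
quad = 5.0                 # hypothetical value of gamma'[W - ...]gamma in (6)
n = 400                    # hypothetical sample size

for n1 in (0, 10, 20, 40):              # number of outliers
    tau = (n1 ** 2 / n) * quad          # NCP from Theorem 2.2(b), with c^2 = n1^2/n
    power = ncx2.sf(crit, df, tau) if tau > 0 else chi2.sf(crit, df)
    print(f"n1 = {n1:2d}  tau = {tau:5.1f}  rejection rate = {power:.3f}")
```

The rejection rate climbs toward 1 as $n_1$ grows, even though $\Sigma_0 = \Sigma(\theta_0)$ holds exactly for the uncontaminated part of the data; this is the "misleadingly large" power noted in the abstract.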
2.2. The asymptotically distribution-free procedure

Since data sets in the social and behavioural sciences may not follow multivariate normal distributions, Browne (1982, 1984) proposed the ADF procedure. Let $y_i = \mathrm{vech}[(x_i - \bar{x})(x_i - \bar{x})']$ and $S_y$ be the sample covariance matrix of the $y_i$. The ADF procedure is to minimize

$$ F_{ADF}(S, \Sigma(\theta)) = (s - \sigma(\theta))'S_y^{-1}(s - \sigma(\theta)) \qquad (7) $$

for $\tilde{\theta}$. We do not need to assume a particular distribution to apply this procedure. Under the conditions $\Sigma_0 = \Sigma(\theta_0)$ and $N_1 = 0$,

$$ \sqrt{n}(\tilde{\theta} - \theta_0) \xrightarrow{L} N(0, \Omega), \qquad (8a) $$

where $\Omega = (\dot{\sigma}'\Gamma^{-1}\dot{\sigma})^{-1}$, and

$$ T_{ADF} = nF_{ADF}(S, \Sigma(\tilde{\theta})) \xrightarrow{L} \chi^2_{p^*-q}. \qquad (8b) $$
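The quantities entering (7) are all sample moments, so $F_{ADF}$ can be evaluated at a trial $\theta$ without any distributional assumption. A minimal sketch follows; the data are simulated standard normal draws and the evaluated structure $\sigma(\theta) = \mathrm{vech}(I_3)$ is chosen only for illustration:

```python
import numpy as np

def adf_ingredients(x):
    """Sample quantities in (7): s = vech(S) and S_y, the sample covariance
    of y_i = vech[(x_i - xbar)(x_i - xbar)']."""
    N, p = x.shape
    xbar = x.mean(axis=0)
    idx = [(i, j) for j in range(p) for i in range(j, p)]  # vech ordering
    y = np.array([[(xi - xbar)[i] * (xi - xbar)[j] for (i, j) in idx] for xi in x])
    s = np.cov(x, rowvar=False)[tuple(zip(*idx))]
    S_y = np.cov(y, rowvar=False)
    return s, S_y

def f_adf(s, sigma_theta, S_y):
    """The ADF discrepancy (7) at a candidate sigma(theta)."""
    r = s - sigma_theta
    return r @ np.linalg.solve(S_y, r)

rng = np.random.default_rng(1)
x = rng.normal(size=(300, 3))
s, S_y = adf_ingredients(x)
sigma0 = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0])  # vech(I_3), the true structure here
print(f_adf(s, sigma0, S_y))
```

In an actual fit one would minimize `f_adf` over $\theta$ to obtain $\tilde{\theta}$ and then form $T_{ADF} = nF_{ADF}$.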
Like the normal theory based ML procedure, the ADF procedure is also susceptible to poor data quality. Now, assuming that $\Sigma_0 = \Sigma(\theta_0)$ but $N_1$ does not equal zero, we have the following lemma.

Lemma 2.3. Under conditions (C1), (C2) and (C5), and if $n_1/n \to 0$, then $\tilde{\theta} \xrightarrow{a.s.} \theta_0$.

As in the previous subsection, we are interested in quantifying the effect of outliers on the distribution of $\tilde{\theta}$ and on the test statistic $T_{ADF}$. To obtain $\tilde{\theta}$, we usually need to solve the equation

$$ \dot{\sigma}'(\tilde{\theta})S_y^{-1}(s - \sigma(\tilde{\theta})) = 0. \qquad (9) $$

The following theorem gives the asymptotic distribution of $\tilde{\theta}$.

Theorem 2.3. Under conditions (C3), (C4) and (C5), if $\tilde{\theta} \xrightarrow{P} \theta_0$, then:

(a) $\sqrt{n}(\tilde{\theta} - \theta_0) \xrightarrow{L} N(0, \Omega)$ when $n_1/\sqrt{n} \to 0$;
(b) $\sqrt{n}(\tilde{\theta} - \theta_0) \xrightarrow{L} N(\omega, \Omega)$ when $n_1/\sqrt{n} \to c$, where $\omega = c(\dot{\sigma}'\Gamma^{-1}\dot{\sigma})^{-1}\dot{\sigma}'\Gamma^{-1}\gamma$;
(c) $\sqrt{n}(\tilde{\theta} - \theta_0) \xrightarrow{P} \infty$ when $n_1/\sqrt{n} \to \infty$ while $n_1/n \to 0$.
Theorem 2.3 tells us that in the presence of outliers, the mean vector in the asymptotic distribution of $\sqrt{n}(\tilde{\theta} - \theta_0)$ will not be zero. It changes in a similar way to that for the ML estimator. The bias in $\tilde{\theta}$ in case (b) is mainly due to the effect of the outliers on $S$. Even though the outliers also affect the weight matrix $S_y^{-1}$, such an influence on $\tilde{\theta}$ is minimal. Actually, as long as $n_1/n$ approaches zero, $S_y$ will be consistent for $\Gamma$, which is sufficient for the result in the theorem to hold. The following theorem is on the corresponding test statistic.

Theorem 2.4. Under conditions (C3), (C4) and (C5), if $\tilde{\theta} \xrightarrow{P} \theta_0$, then:

(a) $T_{ADF} \xrightarrow{L} \chi^2_{p^*-q}$ when $n_1/\sqrt{n} \to 0$;
(b) $T_{ADF} \xrightarrow{L} \chi^2_{p^*-q}(\eta)$ when $n_1/\sqrt{n} \to c$, where

$$ \eta = c^2\gamma'[\Gamma^{-1} - \Gamma^{-1}\dot{\sigma}(\dot{\sigma}'\Gamma^{-1}\dot{\sigma})^{-1}\dot{\sigma}'\Gamma^{-1}]\gamma. $$

It is obvious that $\omega = \xi$ and $\eta = \tau$ when $G_0$ follows a multivariate normal distribution. However, as we shall see in the next section, there may exist a big difference between the estimates obtained by the two methods even when $N_1 = 0$.
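The four bias and NCP expressions share one algebraic template, differing only in the weight matrix: $W$ for ML, $\Gamma^{-1}$ for ADF. The following sketch, in which all matrices are random stand-ins rather than real model quantities, confirms that the pairs coincide when $\Gamma^{-1} = W$ (the normal case) and generally differ otherwise:

```python
import numpy as np

rng = np.random.default_rng(3)
pstar, q, c = 6, 3, 1.5
sd = rng.normal(size=(pstar, q))  # stand-in for the Jacobian sigma-dot
g = rng.normal(size=pstar)        # stand-in for the bias vector gamma
A = rng.normal(size=(pstar, pstar))
W = np.linalg.inv(A @ A.T)        # stand-in normal-theory weight matrix

def bias_and_ncp(V):
    """Common template: ML plugs in V = W, ADF plugs in V = Gamma^{-1}."""
    M = np.linalg.inv(sd.T @ V @ sd)
    bias = c * M @ sd.T @ V @ g                        # xi or omega
    ncp = c**2 * g @ (V - V @ sd @ M @ sd.T @ V) @ g   # tau or eta
    return bias, ncp

xi, tau = bias_and_ncp(W)
om, eta = bias_and_ncp(W)  # normal G0: Gamma^{-1} = W, so ADF reproduces ML
assert np.allclose(xi, om) and np.isclose(tau, eta)

B = rng.normal(size=(pstar, pstar))
om2, eta2 = bias_and_ncp(np.linalg.inv(B @ B.T))  # non-normal fourth moments
assert not np.allclose(xi, om2)  # the two procedures no longer agree
```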
From the results in this section, we can see the effect of outliers on the commonly used inference procedures. Even if a proposed model would fit the population covariance matrix of $G_0$, because of outliers, parameter estimates will be biased and test statistics will be significant. For example, in a factor analysis model, biased parameter estimates may lead to non-significant factor loadings being significant and vice versa (Yuan & Bentler, 1998a, 1998b), or there may even be an inability to minimize (3) or (7) using standard minimization methods. A significant test statistic may discredit a theoretically correct model structure $\Sigma_0 = \Sigma(\theta_0)$ that a researcher wants to recover (Yuan & Bentler, 1998b). As will be illustrated in the next section, the NCP associated with a model test can be made larger for any actual degree of model misspecification by inclusion of a few outliers. These outliers will seriously distort any power analysis that should be an integral part of structural modelling.
3. Illustration
We will first use two data sets to demonstrate the relevance of our results to data analysis. In particular, we will show that a few outliers can create a problematic solution such as a Heywood case (negative error variance). The NCP estimate, on which most fit indices are based, can also be strongly influenced by outliers. Then we will conduct a simulation to see how well the analytical results in Section 2 match empirical data.

The first example is based on a data set from Bollen (1989). It consists of three estimates of percentage cloud cover for 60 slides. This data set was introduced for outlier identification purposes. Bollen & Arminger (1991) further used a one-factor model to fit this data set to study observational residuals in factor analysis. Fixing the factor variance at 1.0, the maximum likelihood solution $\hat\theta$ for factor loadings and error variances is given in the first row of Table 1. With the negative variance estimate $\hat\psi_{33} = -51.439$, this solution is definitely not acceptable. After removing the three cases 52, 40 and 51, which correspond
Table 1. Parameter estimates and biases for cloud cover data from Bollen (1989)

                                 $\lambda_{11}$  $\lambda_{21}$  $\lambda_{31}$  $\psi_{11}$  $\psi_{22}$  $\psi_{33}$
$\hat\theta$                         32.432        31.457        38.145       248.770      473.781      -51.439
$\hat\theta^{(1)}$                   31.986        36.567        35.907       105.799      157.294       58.151
$\hat\theta - \hat\theta^{(1)}$       0.446        -5.110         2.238       142.971      316.487     -109.590
to the three largest residuals in Bollen & Arminger's (1991) analysis, the solution $\hat\theta^{(1)}$ is given in the second row of Table 1. In this example, the outliers have a big effect on the estimates of error variances, but relatively less effect on the factor loadings.

For this example, the ADF procedure will give the same result as the ML procedure because the model is saturated. Notice that one has no way to evaluate the population parameter $\xi$ or $\omega$ because the true parameter $\theta_0$ is unknown. We may approximate the bias by the parameter differences $\hat\theta - \hat\theta^{(1)}$, which are given in the third row of Table 1. Since the degrees of freedom are zero, a test statistic is not relevant here.
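Because the one-factor model for three indicators is saturated, its ML solution can be written in closed form from the sample covariances. The following sketch (our own illustration, not the authors' code; the hypothetical covariance matrix is constructed for the example) shows how a negative unique variance such as $\hat\psi_{33}$ can fall directly out of these formulas when the covariances are distorted:

```python
import numpy as np

def one_factor_saturated(S):
    """Closed-form one-factor solution for p = 3 (factor variance fixed at 1).

    Solving s_jk = l_j * l_k for j != k gives the loadings; the unique
    variances are psi_jj = s_jj - l_j**2, which can come out negative
    (a Heywood case) when outliers distort the sample covariances.
    """
    l1 = np.sqrt(S[0, 1] * S[0, 2] / S[1, 2])
    l2 = S[0, 1] / l1
    l3 = S[0, 2] / l1
    lam = np.array([l1, l2, l3])
    psi = np.diag(S) - lam ** 2
    return lam, psi

# Hypothetical covariance matrix generated by lambda = (1, 2, 3), psi = 1.
lam0 = np.array([1.0, 2.0, 3.0])
S = np.outer(lam0, lam0) + np.eye(3)
lam, psi = one_factor_saturated(S)
print(lam)   # recovers (1, 2, 3)
print(psi)   # recovers (1, 1, 1)
```

Since the model exactly reproduces $S$, the recovered parameters match the generating values; with real contaminated data nothing prevents `psi` from going negative.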
The second data set is from NBA descriptive statistics for 105 guards for the 1992–1993 basketball season (Chatterjee, Handcock & Simonoff, 1995). We select a subset of four variables from the original nine. These four variables are total minutes played, points scored per game, assists per game, and rebounds per game. Yuan & Bentler (1998a) proposed a one-factor model for these variables. Since the variable 'total minutes played' has a very large variance, as in Yuan & Bentler (1998a) it is multiplied by 0.01 before the model is fitted. With factor variance $\Phi = 1.0$, the ML estimates based on the entire data set are given in the first row of Table 2. The likelihood ratio statistic $T_{\mathrm{ML}} = 15.281$, which is highly significant when referred to $\chi^2_2$. This rejection of the model may be due to a few outlying observations. Yuan & Bentler (1998a) identified cases 2, 4, 6, 24 and 32 as the five most influential cases. The model was re-estimated without these five cases. The likelihood ratio statistic is now $T_{\mathrm{ML}} = 4.918$, which is not significant at the 0.05 level. For this example, the NCP estimate is 13.281 if based on the entire data set. This is more than four times its corresponding estimate, 2.918, when the five cases are removed. As is well known, almost all popular fit indices are based on the NCP estimate (see, for example, McDonald, 1989; Bentler, 1990). Hence, model evaluation using fit indices without considering the influence of outliers would be misleading. Similarly, a power analysis without the five influential cases would lead to quite different conclusions about the one-factor model as compared to an analysis that includes these five cases (e.g., Satorra & Saris, 1985). The new estimates $\hat\theta^{(1)}$ as well as the approximate biases are given in the upper panel of Table 2. Even though the effect of the five outliers on parameter estimates is not as deleterious as in the last example, some of the biases are not trivial.
The corresponding parameter estimates by the ADF approach are given in the lower panel of Table 2. The effect of the outliers on $\tilde\theta$ is not as obvious as that on $\hat\theta$. However, their effect on the model evaluation is also dramatic. The test statistic based on the whole data set is $T_{\mathrm{ADF}} = 7.380$, significant at the 0.05 level, while the corresponding statistic without the five outlying cases is only 4.292. The NCP estimate based on the whole data set is more than twice that based on the reduced data set.
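As a quick check of the arithmetic above, the NCP estimate is simply the test statistic minus the model degrees of freedom (here df = 2); a minimal sketch using the statistics reported in this section:

```python
# NCP estimate = test statistic - degrees of freedom (df = 2 for this model).
# The statistics are the T_ML and T_ADF values reported above, based on the
# whole data set and on the data with the five influential cases removed.
df = 2
t_full = {"ML": 15.281, "ADF": 7.380}    # whole data set
t_clean = {"ML": 4.918, "ADF": 4.292}    # five cases removed

for method in ("ML", "ADF"):
    ncp_full = t_full[method] - df
    ncp_clean = t_clean[method] - df
    print(method, round(ncp_full, 3), round(ncp_clean, 3),
          round(ncp_full / ncp_clean, 2))
```

This reproduces the ML NCPs of 13.281 versus 2.918 (a ratio above 4) and the ADF NCPs of 5.380 versus 2.292 (a ratio above 2).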
While the effect of outliers can be seen in the two practical data sets, not knowing the
true population parameters makes it impossible to evaluate how well the results in Section 2
match empirical research. To obtain better insight into this, we will conduct a simulation
Table 2. Parameter estimates and biases for NBA statistics data from Chatterjee et al. (1995)

                                      $\lambda_{11}$  $\lambda_{21}$  $\lambda_{31}$  $\lambda_{41}$  $\psi_{11}$  $\psi_{22}$  $\psi_{33}$  $\psi_{44}$
$\hat\theta$                              7.894         5.372         1.790         1.086       16.109       7.843       2.994       0.478
$\hat\theta^{(1)}$                        8.161         4.800         1.812         1.008        9.580       7.353       2.510       0.328
$\hat\theta - \hat\theta^{(1)}$          -0.267         0.572        -0.022         0.078        6.529       0.490       0.484       0.150
$\tilde\theta$                            8.289         4.649         1.918         0.929        7.253       6.666       2.727       0.375
$\tilde\theta^{(1)}$                      8.243         4.936         1.834         1.004        7.464       6.557       2.616       0.338
$\tilde\theta - \tilde\theta^{(1)}$       0.046        -0.287         0.084        -0.075       -0.211       0.109       0.111       0.037
study. The $\Sigma_0$ is generated by a one-factor model with five indicators, that is,
$$\Sigma_0 = \Lambda_0\Lambda_0' + \Psi_0,$$
where $\Lambda_0 = (1.0, 1.0, 1.0, 1.0, 1.0)'$ and $\Psi_0 = \mathrm{diag}(0.5, 0.5, 0.5, 0.5, 0.5)$. The $\Sigma_1$ is generated by a two-factor model with
$$\Sigma_1 = \Lambda_1\Phi_1\Lambda_1' + \Psi_1,$$
where
$$\Lambda_1 = \begin{pmatrix} 1.0 & 1.0 & 0 & 0 & 0 \\ 0 & 0 & 1.0 & 1.0 & 1.0 \end{pmatrix}', \qquad \Phi_1 = \begin{pmatrix} 1.0 & 0.5 \\ 0.5 & 1.0 \end{pmatrix},$$
and $\Psi_1 = \mathrm{diag}(0.5, 0.5, 0.5, 0.5, 0.5)$. For the normal theory based method one needs to choose $G_0 = N_5(\mu_0, \Sigma_0)$ in order for the chi-square approximation in Theorem 2.2 to hold. Of course, other choices, such as a multivariate $t$ distribution, are legitimate for $G_1$ in the ML method and for $G_0$ in the ADF method. A specific way of generating non-normal multivariate distributions is as follows. Let $A$ be a $p \times m$ ($m \ge p$) matrix such that $AA' = \Sigma_1$ and $z = (z_1, z_2, \ldots, z_m)'$, with the $z_i$ being independent and $\mathrm{Var}(z_i) = 1$. Then $x = Az + \mu_1$ will have mean vector $\mu_1$ and covariance matrix $\Sigma_1$. For example, if one generates $p$ independent chi-square variables $(r_1, r_2, \ldots, r_p)$, each with two degrees of freedom, and sets $z_i = (r_i - 2)/2$, then $x = \Sigma_1^{1/2} z + \mu_1$ will serve the purpose for $x \sim G_1$. Actually, it is not necessary to have the distributional form of $x$ specified in such a way. More details for creating different multivariate non-normal distributions can be found in Yuan & Bentler (1999).
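A minimal sketch of this construction (names and NumPy usage are our own, not the authors' code) draws standardized chi-square(2) components and maps them through a square root of $\Sigma_1$:

```python
import numpy as np

def nonnormal_sample(n, sigma, rng):
    """Draw n vectors with covariance `sigma` from a non-normal distribution.

    Components z_i = (r_i - 2)/2, with r_i ~ chi-square(2), are independent
    with mean 0 and variance 1, so x = sigma^{1/2} z has covariance sigma.
    """
    p = sigma.shape[0]
    # Symmetric square root of sigma via its eigendecomposition.
    vals, vecs = np.linalg.eigh(sigma)
    root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    r = rng.chisquare(df=2, size=(n, p))
    z = (r - 2.0) / 2.0          # standardized: mean 0, variance 1
    return z @ root.T            # mean 0, covariance sigma, skewed marginals

# Sigma_1 from the two-factor model of the simulation design.
lam1 = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0, 1.0]]).T
phi1 = np.array([[1.0, 0.5], [0.5, 1.0]])
psi1 = np.diag([0.5] * 5)
sigma1 = lam1 @ phi1 @ lam1.T + psi1

x = nonnormal_sample(200_000, sigma1, np.random.default_rng(0))
print(np.round(np.cov(x, rowvar=False), 2))  # close to sigma1
```

With a large sample the empirical covariance matches $\Sigma_1$, while the marginals retain the skewness of the chi-square components.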
We choose $G_0 = N_5(0, \Sigma_0)$ and $G_1 = N_5(0, \Sigma_1)$ in both the ML and the ADF procedures for convenience. With such a design, $\xi = \omega$ is easy to calculate and is given in the first column of Table 3 for $c = 1$. With model degrees of freedom $p^* - q = 5$ and $\tau = \eta = 2.0$ for $c = 1$, it is also easy for us to evaluate the NCP approximations given in Theorems 2.2 and 2.4. We choose sample size $N = 401$, with $N_1 = 11$ and 21, which lead to $c \approx n_1/\sqrt{n} = 0.5$ and 1, respectively. For the purpose of comparison we also include the condition $N_1 = 0$. With 1000 replications the bias and NCP are calculated as
$$\mathrm{Bias} = \sqrt{n}(\bar\theta - \theta_0) \qquad \text{and} \qquad \mathrm{NCP} = \bar T - \mathrm{df},$$
respectively, where $\bar\theta$ and $\bar T$ are the averages of the parameter estimates and test statistics across the 1000 replications.
Table 3. Simulation results on bias $\sqrt{n}(\bar\theta - \theta_0)$ and non-centrality parameter ($\bar T$ - df)

                 $\xi=\omega$              ML                                ADF
$\theta$          ($c=1$)      $N_1=0$  $N_1=11$  $N_1=21$     $N_1=0$  $N_1=11$  $N_1=21$
$\lambda_{11}$     -0.250       0.039    -0.102    -0.235       0.053    -0.069    -0.167
$\lambda_{21}$     -0.250      -0.025    -0.166    -0.297      -0.010    -0.132    -0.230
$\lambda_{32}$     -0.083       0.003    -0.041    -0.080       0.018    -0.031    -0.073
$\lambda_{42}$     -0.083      -0.038    -0.083    -0.122      -0.022    -0.068    -0.110
$\lambda_{52}$     -0.083      -0.015    -0.060    -0.099      -0.009    -0.057    -0.098
$\psi_{11}$         0.500      -0.043     0.243     0.502      -0.164     0.049     0.185
$\psi_{22}$         0.500      -0.046     0.237     0.490      -0.179     0.032     0.168
$\psi_{33}$         0.167      -0.008     0.081     0.154      -0.128    -0.058    -0.013
$\psi_{44}$         0.167      -0.023     0.066     0.141      -0.150    -0.081    -0.033
$\psi_{55}$         0.167      -0.006     0.086     0.160      -0.123    -0.050    -0.006
NCP                 2.000      -0.027     0.673     2.155       0.050     0.479     1.409
When $N_1 = 11$ and $c \approx 1/2$, the bias in $\hat\theta$ in column 3 should be approximately half of the theoretical values under $\xi = \omega$ in column 1. Considering the finite-sample effect in column 2, where there is no theoretical bias, this approximation is pretty good. For example, instead of half of $-0.083$, the bias for $\lambda_{42}$ in column 3 is $-0.083$. This may seem a bad approximation. However, considering the random error $-0.038$ in column 2, the amount by which $-0.083$ is off target just reflects the finite-sample effect. With $N_1 = 21$, the fourth column of numbers gives estimates of the population numbers in column 1. Considering the finite-sample effect in column 2, the approximation is also quite good. With $c \approx 1/2$ and 1, the theoretical NCPs are approximately 0.5 and 2.0, respectively. There exists some discrepancy between the NCP estimates and these theoretical NCPs. Similar discrepancies have been reported by Satorra et al. (1991), who studied various approximations to the NCP when $N_1 = N$ and $\gamma = O(1/\sqrt{n})$.

In contrast to the bias approximations in the ML method, the bias approximations in the ADF procedure are substantially more off-target. This is related to the finite-sample effect, as shown in column 5, where there is no theoretical bias. This phenomenon was reported in Yuan & Bentler (1997a) under correct models, where the finite-sample bias in $\tilde\theta$ is about 50 times that for $\hat\theta$. Also, $\tilde\theta$ is much less efficient than $\hat\theta$ unless data are extremely non-normal. Considering these factors and comparing the last two columns of numbers under ADF with those for $N_1 = 0$, we can see that the parameter estimates are shifted from those corresponding to $N_1 = 0$ by amounts roughly equal to $\omega/2$ and $\omega$, respectively. Similarly, the NCP estimate based on the ADF procedure is also some way off from the one described in Theorem 2.4. This may be related to the unstable nature of $T_{\mathrm{ADF}}$, as has been reported previously (e.g., Yuan & Bentler, 1997b).
4. Discussion
We studied the effect of outliers on the two most commonly used CSA procedures. Even though a model structure may fit the majority of the data, a few outliers can discredit the value of the model. The analytical development in Section 2 establishes a direct relationship between model statistical significance and the presence of outliers. When a significant chi-square statistic occurs in practice, the researcher should check the model as well as the
data since either could be the source of the lack of fit. When data contain possible outliers, there are two general approaches to minimizing their effect. One approach is to identify the outliers first, and then to apply classical procedures after outlier removal (see Lee & Wang, 1996). The other is to use a robust approach to downweight the influence of outliers. With the first approach, no method can guarantee that one can identify all the outliers. With the robust approach, the influence of outliers is not necessarily completely removed. So the results in Section 2 may also be relevant to data analysis even when care has been taken to minimize the effect of outlying cases.

Our model formulation in Section 2 is for covariance structure analysis. We regard observations from $G_1$ as outliers as long as $\theta_0$ satisfies $\Sigma_0 = \Sigma(\theta_0)$ while no $\theta$ satisfies $\Sigma_1 = \Sigma(\theta)$. Thus, it is not necessary for outliers to be very extreme to break down the regular analysis. A difference that characterizes our approach, as compared to that of the robust statistics literature (e.g., regression), is that in robust statistics it is usually assumed that the model is correct, but that error distributions have different tails. We assume here that the model is incorrect for the outliers. If a model is correct in both the means and covariances, the ADF procedure works well when the sample size is large enough even though errors may have different third- and fourth-order moments (Yuan & Bentler, 1997b). Although outliers can have the effect of generating a sample that violates distributional assumptions in the ML procedure, this is not possible with the ADF procedure since it allows any distributional shape.
In the technical development in Section 2, we assumed $n_1/n \to 0$ for convenience. In practice, sample sizes are always finite. When $N \gg N_1$, we can use $c \approx n_1/\sqrt{n}$ and the other corresponding sample quantities to approximate the biases in the normal distributions and the non-centrality parameters in the chi-square distributions, as illustrated in the previous section. The proportion of outliers in a sample cannot be large in order for the asymptotic results in Section 2 to be a good approximation. This is parallel to the assumption $\Sigma_0 = \Sigma(\theta) + O(1/\sqrt{n})$ in Satorra & Saris (1985) and Steiger et al. (1985). These authors considered the behaviour of $T_{\mathrm{ML}}$ when $N_1 = 0$.
One may not want to know details about outliers unless they are of special scientific interest. But outliers may in many instances represent important scientific opportunities to discover a new phenomenon. Potentially, a new physical or theoretical aspect of a research area can be enriched by understanding the conditions and meaning associated with the occurrence of atypical observations. In such a case, focusing attention on only the majority of the data may ignore a fundamental and important phenomenon, and special scientific attention should be devoted to the analysis of atypical cases. Ultimately it may be possible to develop quite different models for various subsets of the data. In the case of covariance structure analysis, this is well known, and indeed multiple population models were developed primarily to deal with the case where a single model would distort a phenomenon. Our attention here is focused on the situation where prior research and theory dictate that a single multivariate process is at work and is of primary interest, and where there is recognition that subject carelessness, response errors, coding errors, transcribing problems or other unknown sources of distortion to a primary model may be operating, but are not of special interest. In this situation, we desire to quantify the effects that atypical observations may or may not have on a standard analysis.
Finally, we need to note that there is no conflict between the results developed here and those in the literature on asymptotic robustness. This is because there is no concept of outliers in the asymptotic robustness research. The set-up in the asymptotic robustness literature assumes that a sample comes from a multivariate distribution whose covariance structure is of interest. Our set-up is that the majority of the data comes from a distribution with $\Sigma_0 = \Sigma(\theta_0)$ while a small proportion of outliers comes from a distribution that does not satisfy the proposed structural model. The implication of our result in this regard is that one should not misuse asymptotic robustness theory by blindly applying a model fitting procedure in a CSA program to data with possible outliers.
Appendix

Proof of Lemma 2.1. Let $\bar x_0$, $\bar x_1$ and $S_0$, $S_1$ represent the sample means and sample covariance matrices of $x_1, \ldots, x_{N_0}$ and $x_{N_0+1}, \ldots, x_N$. Then we have
$$S = \frac{n_0}{n}S_0 + \frac{n_1}{n}S_1 + \frac{N_0 N_1}{nN}(\bar x_0 - \bar x_1)(\bar x_0 - \bar x_1)'. \tag{A.1}$$
It follows from (A.1) that
$$\sqrt{n}(S - \Sigma_0) = \sqrt{\frac{n_0}{n}}\,\sqrt{n_0}(S_0 - \Sigma_0) + \frac{n_1}{\sqrt{n}}(S_1 - \Sigma_0) + \frac{N_0 N_1}{\sqrt{n}\,N}(\bar x_0 - \bar x_1)(\bar x_0 - \bar x_1)' - \frac{1}{\sqrt{n}}\Sigma_0. \tag{A.2}$$
Since $\sqrt{n_0}(s_0 - \sigma(\theta_0)) \xrightarrow{L} N_{p^*}(0, \Gamma)$, the lemma follows from (A.2). □
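Identity (A.1) is exact for any split of the sample; a minimal NumPy check (our own illustration) pools two subsamples and compares both sides:

```python
import numpy as np

rng = np.random.default_rng(1)
p, N0, N1 = 3, 40, 10
N = N0 + N1
x = rng.normal(size=(N, p))
x0, x1 = x[:N0], x[N0:]

# Sample covariances with divisors n = N-1, n0 = N0-1, n1 = N1-1
# (np.cov uses ddof=1 by default).
n, n0, n1 = N - 1, N0 - 1, N1 - 1
S = np.cov(x, rowvar=False)
S0 = np.cov(x0, rowvar=False)
S1 = np.cov(x1, rowvar=False)
d = x0.mean(axis=0) - x1.mean(axis=0)

# Right-hand side of (A.1).
rhs = (n0 / n) * S0 + (n1 / n) * S1 + (N0 * N1 / (n * N)) * np.outer(d, d)
print(np.allclose(S, rhs))  # True: the decomposition holds exactly
```

The agreement is exact up to floating-point error, regardless of the distributions the two subsamples come from.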
Proof of Lemma 2.2. Since $n_1/n \to 0$, it follows from (2.3) and (A.1) that
$$F_{\mathrm{ML}}(S, \Sigma(\theta)) \xrightarrow{a.s.} F_{\mathrm{ML}}(\Sigma_0, \Sigma(\theta))$$
uniformly on $\Theta$. Since $\theta_0$ minimizes $F_{\mathrm{ML}}(\Sigma_0, \Sigma(\theta))$ and the model is identified, the lemma follows from a standard argument (Yuan, 1997). □
Proof of Theorem 2.1. Applying the first-order Taylor expansion to the left-hand side of (2.5) at $\theta_0$ gives
$$\sqrt{n}(\hat\theta - \theta_0) = (\dot\sigma' W \dot\sigma)^{-1}\dot\sigma' W \sqrt{n}(s - \sigma_0) + o_p(1). \tag{A.3}$$
The theorem follows from Lemma 2.1 by noticing that $\Gamma = W^{-1}$. □
Proof of Theorem 2.2. Using the Taylor expansion of $\hat F_{\mathrm{ML}}(s) = F_{\mathrm{ML}}(S, \Sigma(\hat\theta))$ at $\hat\sigma = \mathrm{vech}[\Sigma(\hat\theta)]$, we have
$$T_{\mathrm{ML}} = nF_{\mathrm{ML}}(\hat\sigma) + n\dot F'_{\mathrm{ML}}(\hat\sigma)(s - \hat\sigma) + \frac{n}{2}(s - \hat\sigma)'\ddot F_{\mathrm{ML}}(\bar\sigma)(s - \hat\sigma), \tag{A.4}$$
where $\bar\sigma$ is a vector lying between $s$ and $\hat\sigma$. Notice that $F_{\mathrm{ML}}(\hat\sigma) = 0$, $\dot F_{\mathrm{ML}}(\hat\sigma) = 0$ and $\ddot F_{\mathrm{ML}}(\bar\sigma) = 2W + o_p(1)$. We have from (A.4),
$$T_{\mathrm{ML}} = n(s - \hat\sigma)'W(s - \hat\sigma) + o_p(1). \tag{A.5}$$
It follows from (A.3) that
$$\begin{aligned} s - \sigma(\hat\theta) &= (s - \sigma_0) - (\sigma(\hat\theta) - \sigma_0) \\ &= (s - \sigma_0) - \dot\sigma(\hat\theta - \theta_0) + o_p(1/\sqrt{n}) \\ &= [I - \dot\sigma(\dot\sigma' W \dot\sigma)^{-1}\dot\sigma' W](s - \sigma_0) + o_p(1/\sqrt{n}). \end{aligned} \tag{A.6}$$
Putting (A.6) into (A.5), one obtains
$$T_{\mathrm{ML}} = e'Qe + o_p(1), \tag{A.7}$$
where
$$e = W^{1/2}\sqrt{n}(s - \sigma_0) \qquad \text{and} \qquad Q = I - W^{1/2}\dot\sigma(\dot\sigma' W \dot\sigma)^{-1}\dot\sigma' W^{1/2}.$$
When $G_0 = N_p(\mu_0, \Sigma_0)$, $W = \Gamma^{-1}$. Noticing that $Q$ is a projection matrix of rank $p^* - q$, the theorem follows from Lemma 2.1 and (A.7). □
Proof of Lemma 2.3. This is similar to the proof of Lemma 2.2. □

Proof of Theorem 2.3. Using the Taylor expansion of $\sigma(\theta)$ in (2.9) at $\theta_0$, it follows that
$$\sqrt{n}(\tilde\theta - \theta_0) = [\dot\sigma'(\tilde\theta)S_y^{-1}\dot\sigma(\bar\theta)]^{-1}\dot\sigma'(\tilde\theta)S_y^{-1}\sqrt{n}(s - \sigma_0), \tag{A.8}$$
where $\dot\sigma(\bar\theta)$ denotes that each row of $\dot\sigma(\theta)$ is evaluated at a $\bar\theta$ which lies between $\tilde\theta$ and $\theta_0$. Since $\tilde\theta \xrightarrow{P} \theta_0$, so does $\bar\theta$. Since $S_y \xrightarrow{P} \Gamma$ when $n_1/n \to 0$, the theorem follows from (A.8) and Lemma 2.1. □
Proof of Theorem 2.4. We have from (A.8),
$$\sqrt{n}(s - \sigma(\tilde\theta)) = [I - \dot\sigma(\dot\sigma'\Gamma^{-1}\dot\sigma)^{-1}\dot\sigma'\Gamma^{-1}]\sqrt{n}(s - \sigma_0) + o_p(1).$$
So $T_{\mathrm{ADF}}$ can be written as
$$T_{\mathrm{ADF}} = e'[I - \Gamma^{-1/2}\dot\sigma(\dot\sigma'\Gamma^{-1}\dot\sigma)^{-1}\dot\sigma'\Gamma^{-1/2}]e + o_p(1), \tag{A.9}$$
where $e = \Gamma^{-1/2}\sqrt{n}(s - \sigma_0)$. The proof follows from (A.9) and Lemma 2.1. □
Acknowledgements
The authors gratefully acknowledge the constructive comments of three referees and the
editor that led to an improved version of the paper. This work was supported by National
Institute on Drug Abuse grants DA01070 and DA00017 at the US National Institutes of
Health.
References
Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.
Anderson, T. W., & Amemiya, Y. (1988). The asymptotic normal distribution of estimators in factor analysis under general conditions. Annals of Statistics, 16, 759–771.
Barnett, V., & Lewis, T. (1994). Outliers in statistical data (3rd ed.). Chichester: Wiley.
Beckman, R. J., & Cook, R. D. (1983). Outlier..........s (with discussion). Technometrics, 25, 119–163.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Berkane, M., & Bentler, P. M. (1988). Estimation of contamination parameters and identification of outliers in multivariate data. Sociological Methods and Research, 17, 55–64.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A., & Arminger, G. (1991). Observational residuals in factor analysis and structural equation models. Sociological Methodology, 21, 235–262.
Breckler, S. J. (1990). Application of covariance structure modeling in psychology: Cause for concern? Psychological Bulletin, 107, 260–273.
Browne, M. W. (1982). Covariance structure analysis. In D. M. Hawkins (Ed.), Topics in applied multivariate analysis (pp. 72–141). Cambridge: Cambridge University Press.
Browne, M. W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Browne, M. W. (1987). Robustness of statistical inference in factor analysis and related models. Biometrika, 74, 375–384.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.
Cadigan, N. G. (1995). Local influence in structural equation models. Structural Equation Modeling, 2, 13–30.
Chatterjee, S., Handcock, M. S., & Simonoff, J. S. (1995). A casebook for a first course in statistics and data analysis. New York: Wiley.
Cook, R. D. (1986). Assessment of local influence (with discussion). Journal of the Royal Statistical Society, Series B, 48, 133–169.
de Leeuw, J. (1988). Multivariate analysis with linearizable regressions. Psychometrika, 53, 437–455.
Ferguson, T. S. (1961). On the rejection of outliers. In J. Neyman (Ed.), Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, Vol. 1 (pp. 253–297). Berkeley: University of California Press.
Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607–614.
Gnanadesikan, R. (1997). Methods for statistical data analysis of multivariate observations (2nd ed.). New York: Wiley.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (1986). Robust statistics: The approach based on influence functions. New York: Wiley.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structural equation modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424–453.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 294–316). Newbury Park, CA: Sage.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8 user's reference guide. Chicago: Scientific Software International.
Kwan, C. W., & Fung, W. K. (1998). Assessing local influence for specific restricted likelihood: Application to factor analysis. Psychometrika, 63, 35–46.
Lee, S.-Y., & Wang, S.-J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93–108.
MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201–226.
Magnus, J. R., & Neudecker, H. (1988). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley.
McDonald, R. P. (1989). An index of goodness-of-fit based on noncentrality. Journal of Classification, 6, 97–103.
Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.
Mueller, R. O. (1996). Basic principles of structural equation modeling. New York: Springer-Verlag.
Rousseeuw, P. J., & van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points (with discussion). Journal of the American Statistical Association, 85, 633–651.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified approach. Psychometrika, 54, 131–151.
Satorra, A. (1992). Asymptotic robust inferences in the analysis of mean and covariance structures. Sociological Methodology, 22, 249–278.
Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A., & Bentler, P. M. (1991). Goodness-of-fit test under IV estimation: Asymptotic robustness of a NT test statistic. In R. Gutiérrez & M. J. Valderrama (Eds.), Applied stochastic models and data analysis (pp. 555–567). Singapore: World Scientific.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39–62.
Sörbom, D. (1974). A general method for studying differences in factor means and factor structure between groups. British Journal of Mathematical and Statistical Psychology, 27, 229–239.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.
Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis: With an application to confirmatory factor analysis. Communications in Statistics - Theory and Methods, 20, 3805–3821.
Wilcox, R. R. (1997). Introduction to robust estimation and hypothesis testing. San Diego: Academic Press.
Yuan, K.-H. (1997). A theorem on uniform convergence of stochastic functions with applications. Journal of Multivariate Analysis, 62, 100–109.
Yuan, K.-H., & Bentler, P. M. (1997a). Improving parameter tests in covariance structure analysis. Computational Statistics and Data Analysis, 26, 177–198.
Yuan, K.-H., & Bentler, P. M. (1997b). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767–774.
Yuan, K.-H., & Bentler, P. M. (1998a). Robust mean and covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 51, 63–88.
Yuan, K.-H., & Bentler, P. M. (1998b). Structural equation modeling with robust covariances. Sociological Methodology, 28, 363–396.
Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.
Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31–50.
Yung, Y.-F. (1997). Finite mixtures in confirmatory factor analysis models. Psychometrika, 62, 297–330.
Received 23 April 1998; revised version received 25 August 2000