
Sociological Methods & Research
http://smr.sagepub.com

The Effect of Skewness and Kurtosis on Mean and Covariance Structure
Analysis: The Univariate Case and Its Multivariate Implication
Ke-Hai Yuan, Peter M. Bentler and Wei Zhang
Sociological Methods Research 2005; 34; 240
DOI: 10.1177/0049124105280200
The online version of this article can be found at:
http://smr.sagepub.com/cgi/content/abstract/34/2/240

Published by:
http://www.sagepublications.com


Downloaded from http://smr.sagepub.com at PENNSYLVANIA STATE UNIV on February 6, 2008
© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution.

The Effect of Skewness and Kurtosis on Mean
and Covariance Structure Analysis
The Univariate Case and Its Multivariate Implication
KE-HAI YUAN
University of Notre Dame

PETER M. BENTLER
University of California, Los Angeles

WEI ZHANG
University of Notre Dame

The maximum likelihood (ML) method, based on the normal distribution assumption, is
widely used in mean and covariance structure analysis. With typical nonnormal data,
the ML method will lead to biased statistics and inappropriate scientific conclusions.

This article develops a simple but informative case to show how ML results are influenced by skewness and kurtosis. Specifically, the authors discuss how skewness and
kurtosis in a univariate distribution affect the standard errors of the ML estimators, the
covariances between the estimators, and the likelihood ratio test of hypotheses on
mean and variance parameters. They also describe corrections that have been developed to allow appropriate inference. Enough details are provided so that this material
can be used in graduate instruction. For each result, the corresponding results in the
higher dimensional case are pointed out, and references are provided.
Keywords: likelihood ratio statistic; nonnormal data; sandwich-type covariance matrix;
Wald statistics

AUTHORS' NOTE: The research was supported by Grants DA00017 and DA01070 from the
National Institute on Drug Abuse and NSF Grant DMS04-37167. We thank three referees for their
constructive comments, which led to an improved version of the article.
SOCIOLOGICAL METHODS & RESEARCH, Vol. 34, No. 2, November 2005 240-258

1. INTRODUCTION

Mean and covariance structure analysis is becoming increasingly
popular in social and behavioral sciences (Bollen 2002; Boomsma
2000; MacCallum and Austin 2000). The most widely used method
for estimation and testing is normal theory-based maximum likelihood (ML). In this method, parameter estimates are obtained by
maximizing the likelihood function derived from the multivariate
normal distribution. Standard errors of the maximum likelihood
estimators (MLE) are based on the covariance matrix that is
obtained by inverting the associated information matrix. Overall
model evaluation is accomplished by referring the likelihood ratio
(LR) statistic to a chi-square distribution. Fit indices are also related
to, or derived from, the LR statistic. Although data in practice are
seldom normally distributed (Micceri 1989), researchers commonly
use the ML method without checking the distribution assumption.

One possible reason is that ML is the default method in almost all
the structural equation modeling (SEM) software. Another reason
may be that the effects of nonnormally distributed data on standard
errors of the MLEs and on the LR statistic are not well understood
by applied researchers. Actually, even more technically oriented
publications do not emphasize limitations of the normal theory ML
approach with nonnormal data (see, e.g., reviews by Breckler 1990;
MacCallum and Austin 2000).
Although SEM is taught in most graduate programs, current
textbooks do not rigorously introduce material on the effect of nonnormal data on model inference, so it is likely that few instructors cover
this material in classrooms. The mathematics/statistics involved is
more complicated than that of regression, ANOVA, or basic SEM,
and even courses in univariate and multivariate statistics do not provide enough technical background for digesting the literature on
SEM with nonnormal data. The aim of this article is to provide a rigorous introduction to the effect of nonnormality on statistical inference in mean and covariance structure analysis using the simplest
one-dimensional case. Although the one-dimensional case is oversimplified, all the effects of nonnormality on standard errors and test
statistics in the higher dimensional case are reflected in the one-dimensional case. The concepts needed to develop this case are quite
minimal and build on material already in the armamentarium of
many graduate students, namely, basic calculus, linear algebra, and
an introductory course in statistics/probability. Thus, we expect that
this case can be used as a teaching tool in SEM courses for graduate
students in the social and behavioral sciences.
In the one-dimensional case, the interesting parameters are the
population mean and variance. The effect of nonnormal data on statistical inference for these two parameters can be totally characterized by skewness and kurtosis. The concepts of skewness and
kurtosis in the one-dimensional case are well known to graduate
students in social sciences (see, e.g., Tabachnick and Fidell
2001:73-5). The other concepts involved in this article are partial
derivatives, the law of large numbers, and the central limit theorem.
We will provide the necessary steps for each result so that a quantitative graduate student will be able to check or derive it. For each
result, we will also point out the parallel higher dimensional result
in the SEM literature.
In section 2, we study the effect of nonnormal data on the variances and covariance of the MLEs of the population mean and variance. In section 3, we study the effect of nonnormal data on the LR
and related statistics. In section 4, we present an example illustrating the effect of nonnormal data on the distributions of the MLEs
and the LR statistics. A discussion with a further guide to the literature is provided in section 5.

2. THE NORMAL THEORY BASED MAXIMUM
LIKELIHOOD ESTIMATOR

Let $y_1, y_2, \ldots, y_n$ be a random sample from a population $y$ with
$E(y) = \mu$, $\mathrm{Var}(y) = \sigma^2$, $E(y - \mu)^3 = \sigma^{3/2}\gamma$, and $E(y - \mu)^4 = \sigma^4\beta$.
Then $\gamma$ and $\beta - 3$ are the population skewness and kurtosis of $y$.
When $y \sim N(\mu, \sigma^2)$, $\gamma = 0$ and $\beta = 3$. This section deals with the
effect of $\gamma$ and $\beta$ on the distribution of the normal theory-based
MLEs of $\mu$ and $\sigma^2$. Notice that even when $\gamma = 0$ and $\beta = 3$, $y$ may
still be nonnormally distributed. However, the violation of normality
in higher order moments will have only a minimal effect. Actually, the
asymptotic distributions of the MLEs of $\mu$ and $\sigma^2$ depend on the distribution of $y$ only up to the fourth-order moment (see, e.g., Ferguson
1996:44-9; Magnus and Neudecker 1999:313-20).


Given $y_i$, the likelihood function based on $y_i \sim N(\mu, \sigma^2)$ is

$$L_i(\mu, \sigma^2) = L(\mu, \sigma^2 \mid y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\{-(y_i - \mu)^2/(2\sigma^2)\},$$

which is just the normal density function with $y_i$ known. The corresponding log-likelihood function $l_i = \log(L_i)$ is

$$l_i(\mu, \sigma^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(y_i - \mu)^2.$$

The MLEs of $\mu$ and $\sigma^2$, based on $y_1, y_2, \ldots, y_n$, are

$$\hat\mu = \bar y = \frac{1}{n}\sum_{i=1}^n y_i \quad \text{and} \quad \hat\sigma^2 = s^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar y)^2,$$

which maximize the log-likelihood function

$$l(\mu, \sigma^2) = \sum_{i=1}^n l_i(\mu, \sigma^2). \tag{1}$$
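As a quick numerical check that the sample mean and the divisor-n sample variance maximize the log-likelihood in (1), the following Python sketch evaluates (1) at the MLEs and at nearby parameter values. The data values and the function names are illustrative assumptions, not part of the article.

```python
import math

def log_likelihood(y, mu, sigma2):
    """Normal-theory log-likelihood l(mu, sigma2), summed over the sample as in (1)."""
    n = len(y)
    ss = sum((yi - mu) ** 2 for yi in y)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(sigma2) - ss / (2 * sigma2)

def normal_mle(y):
    """Return (ybar, s2). Note that s2 uses the divisor n, not n - 1."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / n
    return ybar, s2

# Hypothetical data, chosen only for illustration.
y = [0.0, 1.0, 1.0, 2.0, 6.0]
mu_hat, s2_hat = normal_mle(y)
l_max = log_likelihood(y, mu_hat, s2_hat)

# Moving either parameter away from its MLE lowers the log-likelihood.
for mu, sigma2 in [(mu_hat + 0.1, s2_hat), (mu_hat, 1.1 * s2_hat), (mu_hat, 0.9 * s2_hat)]:
    print(mu, sigma2, log_likelihood(y, mu, sigma2) < l_max)
```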

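Closely related to the log-likelihood function is the so-called information matrix

$$I = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix},$$

where

$$I_{11} = -E\left[\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\mu\,\partial\mu}\right] = 1/\sigma^2, \qquad I_{12} = -E\left[\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\mu\,\partial\sigma^2}\right] = 0,$$

$$I_{21} = -E\left[\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\sigma^2\,\partial\mu}\right] = 0, \qquad I_{22} = -E\left[\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\sigma^2\,\partial\sigma^2}\right] = 1/(2\sigma^4).$$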
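The four entries of the information matrix can be checked by differentiating the log-likelihood of a single observation. The following worked derivation is a sketch that uses only the definitions already given, namely that the first central moment of y is zero and the second is the variance.

```latex
\begin{align*}
\frac{\partial l_i}{\partial \mu} &= \frac{y_i - \mu}{\sigma^2}, &
\frac{\partial l_i}{\partial \sigma^2} &= -\frac{1}{2\sigma^2} + \frac{(y_i - \mu)^2}{2\sigma^4}, \\
\frac{\partial^2 l_i}{\partial \mu\,\partial \mu} &= -\frac{1}{\sigma^2}, &
\frac{\partial^2 l_i}{\partial \mu\,\partial \sigma^2} &= -\frac{y_i - \mu}{\sigma^4}, \\
\frac{\partial^2 l_i}{\partial \sigma^2\,\partial \sigma^2} &= \frac{1}{2\sigma^4} - \frac{(y_i - \mu)^2}{\sigma^6}.
\end{align*}
% Taking expectations with E(y_i - \mu) = 0 and E(y_i - \mu)^2 = \sigma^2:
\begin{align*}
I_{11} &= \frac{1}{\sigma^2}, &
I_{12} = I_{21} &= \frac{E(y_i - \mu)}{\sigma^4} = 0, &
I_{22} &= -\frac{1}{2\sigma^4} + \frac{\sigma^2}{\sigma^6} = \frac{1}{2\sigma^4}.
\end{align*}
```

Only the first two moments of y enter these expectations, which is why skewness and kurtosis do not appear in the information matrix itself.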



In mean and covariance structure analysis in higher dimensions,
both the mean vector and the covariance matrix are parameterized
as functions of a more basic set of parameters. Then elements
of $-I$ will be just the expectation of the second derivative of the
log-likelihood function with respect to a pair of the parameters.
Let $\theta = (\mu, \sigma^2)'$ and $\hat\theta = (\bar y, s^2)'$. When data are normally distributed, standard asymptotic statistical theory (see, e.g., Ferguson
1996:121) tells us that, as $n \to \infty$,

$$\sqrt{n}(\hat\theta - \theta) \xrightarrow{L} N(0, \Omega), \tag{2}$$

where $\xrightarrow{L}$ means "converging in distribution." This means that,
with a large $n$, the distribution of the left side of (2) can be approximately described by a normal random vector with mean zero and
covariance matrix $\Omega$. Furthermore, this covariance matrix is the
inverse of the information matrix

$$\Omega = \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{pmatrix} = I^{-1} = \begin{pmatrix} \sigma^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix}. \tag{3}$$

Because $\Omega$ is a diagonal matrix, $\hat\mu$ and $\hat\sigma^2$ are asymptotically independent. Of course, they are also independent with any finite sample
sizes due to $y \sim N(\mu, \sigma^2)$ (see, e.g., Casella and Berger 2002:218;
Hays 1994:250). Such a result holds also in higher dimensional
normal data. That is, when the mean and covariance structures do not
have overlapping parameters, parameter estimates in the mean structure are asymptotically independent of parameter estimates in the
variance-covariance structure (see Yuan and Bentler forthcoming).
We next study the distribution of $\hat\theta = (\hat\mu, \hat\sigma^2)'$ when data are
nonnormally distributed. Notice that

$$s^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar y)^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \mu)^2 - (\bar y - \mu)^2.$$

Denote $\tilde\sigma^2 = \sum_{i=1}^n (y_i - \mu)^2/n$. We have

$$\sqrt{n}\begin{pmatrix} \hat\mu - \mu \\ \hat\sigma^2 - \sigma^2 \end{pmatrix} = \sqrt{n}\begin{pmatrix} \hat\mu - \mu \\ \tilde\sigma^2 - \sigma^2 \end{pmatrix} - \begin{pmatrix} 0 \\ \sqrt{n}(\bar y - \mu)^2 \end{pmatrix}. \tag{4}$$

Because $(\bar y - \mu)$ approaches zero in probability and $\sqrt{n}(\bar y - \mu)$ is
bounded in probability (see, e.g., Bishop, Fienberg, and Holland
1975:476), $\sqrt{n}(\bar y - \mu)^2$ also approaches zero in probability. Denote
$\tilde\theta = (\bar y, \tilde\sigma^2)'$. It follows from (4) and the well-known Slutsky's (1925)
theorem (see note 1) that $\sqrt{n}(\hat\theta - \theta)$ and $\sqrt{n}(\tilde\theta - \theta)$ have the same asymptotic
distribution. Notice that

$$\sqrt{n}(\tilde\theta - \theta) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \begin{pmatrix} y_i - \mu \\ (y_i - \mu)^2 - \sigma^2 \end{pmatrix}. \tag{5}$$

Applying the central limit theorem to the right side of (5) leads to

$$\sqrt{n}(\tilde\theta - \theta) \xrightarrow{L} N(0, \Pi),$$

where

$$\Pi = \begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}$$

with

$$\pi_{11} = E(y_i - \mu)^2 = \sigma^2,$$

$$\pi_{12} = \pi_{21} = E\{(y_i - \mu)[(y_i - \mu)^2 - \sigma^2]\} = E(y_i - \mu)^3 = \sigma^{3/2}\gamma,$$

$$\pi_{22} = E\{[(y_i - \mu)^2 - \sigma^2][(y_i - \mu)^2 - \sigma^2]\} = E(y_i - \mu)^4 - \sigma^4 = \sigma^4(\beta - 1).$$

Because $\sqrt{n}(\hat\theta - \theta)$ and $\sqrt{n}(\tilde\theta - \theta)$ have the same asymptotic
distribution,

$$\sqrt{n}(\hat\theta - \theta) \xrightarrow{L} N(0, \Pi). \tag{6}$$
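Result (6) can be illustrated by simulation. The sketch below, which is not from the article, draws repeated samples from an exponential distribution with rate 1 (an assumed example whose variance is 1, third central moment is 2, and fourth central moment is 9) and estimates n times the variances and covariance of the two MLEs. Under (6) these should land near 1, 2, and 8, whereas the normal theory matrix in (3) would give 1, 0, and 2. The function name, sample size, number of replications, and seed are illustrative choices.

```python
import random

def scaled_moments_of_mles(n, reps, draw, seed=0):
    """Monte Carlo estimates of n*Var(ybar), n*Cov(ybar, s2), and n*Var(s2)."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        y = [draw(rng) for _ in range(n)]
        ybar = sum(y) / n
        s2 = sum((yi - ybar) ** 2 for yi in y) / n  # MLE of the variance (divisor n)
        means.append(ybar)
        variances.append(s2)
    m_bar = sum(means) / reps
    v_bar = sum(variances) / reps
    var_m = sum((m - m_bar) ** 2 for m in means) / reps
    var_v = sum((v - v_bar) ** 2 for v in variances) / reps
    cov_mv = sum((m - m_bar) * (v - v_bar) for m, v in zip(means, variances)) / reps
    return n * var_m, n * cov_mv, n * var_v

# Exponential(1) population: strongly skewed and heavy tailed relative to the normal.
pi11, pi12, pi22 = scaled_moments_of_mles(200, 2000, lambda rng: rng.expovariate(1.0))
print(pi11, pi12, pi22)
```

Because the estimates are Monte Carlo quantities, they vary with the seed and only approximate the limiting values.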

Comparing (6) with (2) and (3), $\omega_{22} = \pi_{22}$ only when $\beta = 3$.
A standard error for $\hat\sigma^2$ based on the $\Omega$ in (3) will be negatively biased
when $\beta > 3$ and positively biased when $\beta < 3$. With sample estimates
of skewness and kurtosis, a consistent estimate of $\Pi$ can be obtained
when replacing its unknown elements by the sample estimates. Thus,
a consistent standard error of $\hat\sigma^2$ will be obtained. This result is a
special case of the so-called sandwich-type covariance matrix in
mean and covariance structure analysis, discussed in Dijkstra
(1981); Bentler (1983); Shapiro (1983); Browne (1984); Bentler
and Dijkstra (1985); Satorra and Bentler (1988, 1994); Arminger and
Schoenberg (1989); Arminger and Sobel (1990); Kano, Berkane, and
Bentler (1993); Browne and Arminger (1995); and Yuan and Bentler
(1997a, 1998a, 1998b, 2000b).
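The two competing standard errors of the variance estimate can be computed side by side. The following Python sketch assumes raw data are available; the function name and the five data points are hypothetical and serve only to show the arithmetic implied by (3) and (6).

```python
import math

def variance_standard_errors(y):
    """Return (normal-theory SE, sandwich-type SE) for the MLE s2 of the variance."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / n
    m4 = sum((yi - ybar) ** 4 for yi in y) / n
    beta_hat = m4 / s2 ** 2              # sample analogue of beta
    omega22 = 2 * s2 ** 2                # from (3): 2 * sigma^4
    pi22 = s2 ** 2 * (beta_hat - 1)      # from (6): sigma^4 * (beta - 1)
    return math.sqrt(omega22 / n), math.sqrt(pi22 / n)

# Hypothetical, light-tailed data (beta_hat = 1.7 < 3).
se_normal, se_sandwich = variance_standard_errors([-2.0, -1.0, 0.0, 1.0, 2.0])
print(se_normal, se_sandwich)
```

In this toy sample the sample analogue of beta is below 3, so the normal theory standard error is the larger of the two, in line with the direction of bias described above.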
It follows from (6) that the asymptotic distribution of $\hat\sigma^2$ depends
on $\sigma^2$ and $\beta$ but not $\gamma$. In contrast, the asymptotic distribution of $\hat\mu$
does not depend on either $\gamma$ or $\beta$. This is also true in the higher
dimensional case. Results in Yuan and Bentler (1999a, 2000a,
2002a) imply that the asymptotic distributions of the covariance
parameter estimates, the commonly used sample correlation coefficients, and sample reliability coefficients depend on only the joint
fourth-order moments or kurtoses of the variables. Equation (6) also
tells us that $\hat\mu$ and $\hat\sigma^2$ are no longer asymptotically independent when
$\gamma \neq 0$. This is also true in the higher dimensional case when not all
the third-order moments are zero, where mean and covariance parameter estimates are not asymptotically independent even when they
do not have overlapping parameters (Yuan and Bentler forthcoming).

3. THE NORMAL THEORY-BASED LIKELIHOOD RATIO TEST

We first consider the distribution of the LR statistic when $\mu$ is a free
parameter. The null hypothesis is $H_0: \sigma^2 = \sigma_0^2$. Notice that when $H_0$
is true, the $\sigma_0^2$ will equal the $\sigma^2$ in section 2, which will also be the
scenario we consider in this section. The behavior of the LR statistic
with misspecified models was studied in Shapiro (1983); Satorra and
Saris (1985); Steiger, Shapiro, and Browne (1985); Satorra (1989);
Yuan and Hayashi (2003); Yuan (2005); Yuan and Bentler
(forthcoming); and Yuan, Hayashi, and Bentler (2005).
Using the log-likelihood function in (1), we obtain the LR
statistic as

$$T_{ML} = 2[l(\hat\mu, \hat\sigma^2) - l(\hat\mu, \sigma_0^2)] = n\left[\frac{s^2}{\sigma_0^2} - \log\left(\frac{s^2}{\sigma_0^2}\right) - 1\right]. \tag{7}$$

It is obvious that (7) is just the univariate version of the normal theory-based discrepancy function in covariance structure analysis (see equation 4.67 of Bollen 1989:107). Notice that

$$\log\left(\frac{s^2}{\sigma_0^2}\right) = \log\left[1 + \left(\frac{s^2}{\sigma_0^2} - 1\right)\right],$$

and $(s^2/\sigma_0^2 - 1)$ will be small when $n$ is large. Using the Taylor expansion, we have

$$\log\left(\frac{s^2}{\sigma_0^2}\right) = \left(\frac{s^2}{\sigma_0^2} - 1\right) - \frac{1}{2}\left(\frac{s^2}{\sigma_0^2} - 1\right)^2 + r_n, \tag{8}$$

where $nr_n$ approaches zero in probability when $n \to \infty$. Putting (8)
into (7), we get

$$T_{ML} = n\frac{(s^2 - \sigma_0^2)^2}{2\sigma_0^4} + nr_n.$$
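The quality of the quadratic approximation obtained from (8) can be checked numerically. This Python sketch compares the exact LR statistic in (7) with its leading quadratic term for several variance ratios; the function names and the chosen ratios are illustrative assumptions.

```python
import math

def t_ml_exact(n, s2, sigma0_sq):
    """LR statistic (7): n * [r - log(r) - 1] with r = s2 / sigma0_sq."""
    r = s2 / sigma0_sq
    return n * (r - math.log(r) - 1.0)

def t_ml_quadratic(n, s2, sigma0_sq):
    """Leading term after substituting (8): n * (s2 - sigma0_sq)^2 / (2 * sigma0_sq^2)."""
    return n * (s2 - sigma0_sq) ** 2 / (2.0 * sigma0_sq ** 2)

def relative_error(n, s2, sigma0_sq):
    exact = t_ml_exact(n, s2, sigma0_sq)
    return abs(t_ml_quadratic(n, s2, sigma0_sq) - exact) / exact

# The approximation tightens as the ratio s2 / sigma0_sq approaches 1.
for ratio in (1.5, 1.1, 1.01):
    print(ratio, t_ml_exact(100, ratio, 1.0), t_ml_quadratic(100, ratio, 1.0),
          relative_error(100, ratio, 1.0))
```

The remainder term matters for a fixed ratio far from 1; the asymptotic argument relies on the ratio converging to 1 under the null hypothesis.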

It follows from (6) that

$$\sqrt{n}(s^2 - \sigma_0^2) \xrightarrow{L} N(0, \pi_{22}).$$

Thus,

$$x_n = \sqrt{n}(s^2 - \sigma_0^2)/\sqrt{\pi_{22}} \xrightarrow{L} N(0, 1)$$

and

$$T_{ML} = \frac{\pi_{22}}{2\sigma_0^4}x_n^2 + nr_n = \frac{(\beta - 1)}{2}x_n^2 + nr_n \xrightarrow{L} \frac{(\beta - 1)}{2}\chi_1^2.$$

So the asymptotic distribution of the LR statistic is a chi-square distribution scaled by $(\beta - 1)/2$, a factor that increases linearly with kurtosis.
When data are normally distributed, $\beta = 3$ and $T_{ML} \xrightarrow{L} \chi_1^2$. A correct
hypothesis $\sigma_0^2$ can be easily rejected when we refer the $T_{ML}$ in (7) to
$\chi_1^2$ while $\beta > 3$. Similarly, a wrong hypothesis might not be rejected
when $\beta < 3$, even when $n$ is large. In the higher dimensional case, the
LR statistic is also proportional to the common kurtosis when data are
elliptically symmetric (Browne 1984; Shapiro and Browne 1987), and
$T_{ML}$ may still not depend on skewness when the marginal kurtosis is
heterogeneous (Kano, Berkane, and Bentler 1990) or even when data
are skewed (Yuan and Bentler 1999b).
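To see what the scaling by (β − 1)/2 means for a nominal 5 percent test, the following Python sketch computes the large-sample probability that the LR statistic exceeds the usual chi-square critical value with 1 degree of freedom when the null hypothesis is true. It is an illustration under the limiting result above, not a procedure from the article; the function names and the list of β values are assumptions, and the critical value is the standard 0.95 quantile.

```python
import math

CHI2_1_CRIT_05 = 3.841459  # 0.95 quantile of the chi-square distribution with 1 df

def chi2_1_upper_tail(x):
    """P(chi-square_1 > x), using chi-square_1 = Z^2 for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

def asymptotic_rejection_rate(beta, crit=CHI2_1_CRIT_05):
    """Large-sample probability that T_ML exceeds crit under H0 when
    T_ML is distributed as ((beta - 1) / 2) times a chi-square_1 variable."""
    scale = (beta - 1.0) / 2.0
    return chi2_1_upper_tail(crit / scale)

# beta = 3 is the normal case; larger beta inflates the rejection rate.
for beta in (1.8, 3.0, 6.0, 9.0):
    print(beta, round(asymptotic_rejection_rate(beta), 3))
```

With β = 3 the rate equals the nominal 0.05; values of β above 3 push it well above 0.05, and values below 3 push it below.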
With a consistent estimator of $\pi_{22}$, we can rescale $T_{ML}$ to

$$T_R = \frac{2\hat\sigma^4}{\hat\pi_{22}}T_{ML}. \tag{9}$$

It is obvious that $\hat\sigma^4/\hat\pi_{22}$ converges in probability to $1/(\beta - 1)$ and

$$T_R \xrightarrow{L} \chi_1^2.$$

In the multivariate case, the statistic $T_R$ is just the Satorra and Bentler
(1988, 1994) rescaled statistic.
Notice that

$$z_n = \sqrt{n}(s^2 - \sigma_0^2)/\sqrt{\hat\pi_{22}} \xrightarrow{L} N(0, 1).$$

The Wald-type statistic for testing $\sigma^2 = \sigma_0^2$ is

$$T_W = n\frac{(s^2 - \sigma_0^2)^2}{\hat\pi_{22}}. \tag{10}$$

As long as $\hat\pi_{22}$ is consistent for $\pi_{22}$, the asymptotic distribution of $T_W$
is $\chi_1^2$, which does not depend on the underlying distribution of $y$.
Such a property is commonly called asymptotically distribution free
(ADF) in the SEM literature. Two estimates of $\pi_{22}$ are available.
One is

$$\hat\pi_{22}^{(1)} = \frac{1}{n}\sum_{i=1}^n [(y_i - \bar y)^2 - s^2]^2, \tag{11}$$

which is equivalent to $s^4(\hat\beta - 1)$, where

$$\hat\beta = \frac{1}{n}\sum_{i=1}^n (y_i - \bar y)^4/s^4.$$

The other one is

$$\hat\pi_{22}^{(2)} = \frac{1}{n}\sum_{i=1}^n [(y_i - \bar y)^2 - \sigma_0^2]^2, \tag{12}$$

and the two estimates satisfy

$$\hat\pi_{22}^{(2)} = \hat\pi_{22}^{(1)} + (s^2 - \sigma_0^2)^2.$$

Notice that, under $H_0: \sigma^2 = \sigma_0^2$, $(s^2 - \sigma_0^2)$ approaches zero in probability according to the law of large numbers, and $\hat\pi_{22}^{(2)}$ and $\hat\pi_{22}^{(1)}$ are
asymptotically equivalent. The well-known ADF statistic (Browne
1984) in covariance structure analysis corresponds to a multivariate
version of $T_W$ with $\hat\pi_{22} = \hat\pi_{22}^{(1)}$. The corrected ADF statistic developed
in Yuan and Bentler (1997b) corresponds to $T_W$ with $\hat\pi_{22} = \hat\pi_{22}^{(2)}$.
Yuan and Bentler (1998c) provided a corrected residual-based ADF
statistic, which is also a multivariate version of $T_W$ with $\hat\pi_{22} = \hat\pi_{22}^{(2)}$.
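The two Wald-type statistics and the relation between the estimates in (11) and (12) can be verified on raw data. The Python sketch below is a minimal illustration; the function name, the return layout, the five data points, and the hypothesized variance 1.5 are all assumptions made for the example.

```python
def wald_statistics(y, sigma0_sq):
    """Return (T_W using (11), T_W using (12), pi22 from (11), pi22 from (12))
    for testing H0: sigma^2 = sigma0_sq."""
    n = len(y)
    ybar = sum(y) / n
    sq_dev = [(yi - ybar) ** 2 for yi in y]
    s2 = sum(sq_dev) / n
    pi22_1 = sum((d - s2) ** 2 for d in sq_dev) / n          # estimate (11)
    pi22_2 = sum((d - sigma0_sq) ** 2 for d in sq_dev) / n   # estimate (12)
    numerator = n * (s2 - sigma0_sq) ** 2
    return numerator / pi22_1, numerator / pi22_2, pi22_1, pi22_2

# Hypothetical data with s2 = 2 and hypothesized variance 1.5.
t_w1, t_w2, pi22_1, pi22_2 = wald_statistics([-2.0, -1.0, 0.0, 1.0, 2.0], 1.5)

# The estimate in (12) exceeds the one in (11) by (s2 - sigma0_sq)^2 = 0.25 here.
print(pi22_1, pi22_2, pi22_2 - pi22_1)
print(t_w1, t_w2)
```

Because the estimate in (12) is never smaller than the one in (11), the Wald statistic built on (12) is never larger than the one built on (11).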
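We next consider testing $H_0: (\mu, \sigma^2)' = (\mu_0, \sigma_0^2)'$ or $\theta = \theta_0$. It is
easy to obtain

$$T_{ML} = 2[l(\hat\mu, \hat\sigma^2) - l(\mu_0, \sigma_0^2)] = n\left[\frac{s^2}{\sigma_0^2} - \log\left(\frac{s^2}{\sigma_0^2}\right) - 1\right] + n\frac{(\bar y - \mu_0)^2}{\sigma_0^2}. \tag{13}$$

Let $\Pi^{-1/2}$ be a symmetric matrix such that $\Pi^{-1/2}\Pi^{-1/2} = \Pi^{-1}$.
It follows from (6) that

$$x_n = \Pi^{-1/2}\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{L} N(0, I_2),$$

where $I_2$ is the 2-by-2 identity matrix. Let

$$W = \begin{pmatrix} 1/\sigma_0^2 & 0 \\ 0 & 1/(2\sigma_0^4) \end{pmatrix}.$$

It follows from (8) and (13) that

$$T_{ML} = n\frac{(s^2 - \sigma_0^2)^2}{2\sigma_0^4} + n\frac{(\bar y - \mu_0)^2}{\sigma_0^2} + nr_n = n(\hat\theta - \theta_0)'W(\hat\theta - \theta_0) + nr_n = x_n'\Pi^{1/2}W\Pi^{1/2}x_n + nr_n.$$

Let $\lambda_1$ and $\lambda_2$ be the two eigenvalues of $\Pi^{1/2}W\Pi^{1/2}$. Then there
exist eigenvectors $v_1$ and $v_2$ such that

$$\Pi^{1/2}W\Pi^{1/2} = (v_1, v_2)\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}(v_1, v_2)'.$$

Let $V = (v_1, v_2)$ and $z_n = (z_{n1}, z_{n2})' = V'x_n$. Because $V'V = I_2$,

$$z_n \xrightarrow{L} z = (z_1, z_2)' \sim N(0, I_2)$$

and

$$T_{ML} = \lambda_1 z_{n1}^2 + \lambda_2 z_{n2}^2 + nr_n.$$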


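Notice that the eigenvalues of $\Pi^{1/2}W\Pi^{1/2}$ equal the eigenvalues of

$$W^{1/2}\Pi W^{1/2} = \begin{pmatrix} 1 & \gamma/(2\sigma^3)^{1/2} \\ \gamma/(2\sigma^3)^{1/2} & (\beta - 1)/2 \end{pmatrix}.$$

The determinant equation determining the eigenvalues is

$$|W^{1/2}\Pi W^{1/2} - \lambda I_2| = 0,$$

which is just

$$\lambda^2 - \frac{\beta + 1}{2}\lambda + \frac{\beta - 1 - \gamma^2/\sigma^3}{2} = 0.$$

Solving this equation, we have

$$\lambda_1 = \frac{1}{4}\{\beta + 1 + [(\beta - 3)^2 + 8\gamma^2/\sigma^3]^{1/2}\},$$

$$\lambda_2 = \frac{1}{4}\{\beta + 1 - [(\beta - 3)^2 + 8\gamma^2/\sigma^3]^{1/2}\}.$$

When data are normally distributed ($\beta = 3$, $\gamma = 0$), $\lambda_1 = \lambda_2 = 1$,

$$T_{ML} \xrightarrow{L} \chi_2^2.$$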
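The two weights can also be obtained numerically from the entries of Π and the hypothesized variance, without going through the closed-form solution. The Python sketch below forms the symmetric 2-by-2 matrix obtained by pre- and post-multiplying Π by the square root of W and returns its eigenvalues. The function name and argument layout are illustrative assumptions, and the inputs are central moments supplied directly: π11 is the variance, π12 is the third central moment, and π22 is the fourth central moment minus the squared variance.

```python
import math

def lr_weights(pi11, pi12, pi22, sigma0_sq):
    """Eigenvalues (largest first) of W^{1/2} Pi W^{1/2},
    with W = diag(1 / sigma0_sq, 1 / (2 * sigma0_sq**2))."""
    w1 = 1.0 / sigma0_sq
    w2 = 1.0 / (2.0 * sigma0_sq ** 2)
    a11 = w1 * pi11
    a22 = w2 * pi22
    a12 = math.sqrt(w1 * w2) * pi12
    half_trace = (a11 + a22) / 2.0
    radius = math.sqrt(((a11 - a22) / 2.0) ** 2 + a12 ** 2)
    return half_trace + radius, half_trace - radius

# Normal data with variance 2: Pi = diag(2, 2 * 2^2), so both weights equal 1.
print(lr_weights(2.0, 0.0, 8.0, 2.0))
# Symmetric data with variance 2 and beta = 5: weights (beta - 1) / 2 = 2 and 1.
print(lr_weights(2.0, 0.0, 16.0, 2.0))
# Exponential(1) data (variance 1, third central moment 2, beta = 9): unequal weights.
print(lr_weights(1.0, 2.0, 8.0, 1.0))
```

In each case the two weights sum to (β + 1)/2, the trace of the matrix, regardless of skewness.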
When data are symmetric ($\gamma = 0$), $\lambda_1 = (\beta - 1)/2$ and $\lambda_2 = 1$. In
such a case, $\hat\mu$ and $\hat\sigma^2$ are asymptotically independent. A rescaled
statistic that removes the effect of $\beta$ is given by

$$T_R = n\frac{2\hat\sigma^4}{\hat\pi_{22}}\left[\frac{s^2}{\sigma_0^2} - \log\left(\frac{s^2}{\sigma_0^2}\right) - 1\right] + n\frac{(\bar y - \mu_0)^2}{\sigma_0^2} \tag{14}$$

and $T_R \xrightarrow{L} \chi_2^2$. A parallel statistic to (14) can also be constructed
in the higher dimensional case, although we are not aware of the
existence of such a development.
When data are skewed or $\gamma \neq 0$, the two eigenvalues are not
equal. We might consider simultaneously removing the effect of
skewness and kurtosis by constructing the rescaled statistic

$$T_R = \frac{2}{\mathrm{tr}(\hat W\hat\Pi)}T_{ML}. \tag{15}$$

However, $T_R$ will not approach $\chi_2^2$ but $2(\lambda_1 z_1^2 + \lambda_2 z_2^2)/(\lambda_1 + \lambda_2)$,
whose mean is $2 = E(\chi_2^2)$. In the higher dimensional case, the
rescaled statistic parallel to the $T_R$ in (15) generally does not follow
a chi-square distribution (Satorra and Bentler 1994; Yuan and Bentler
2000b). Even with only a covariance structure model, the rescaled
statistic parallel to the $T_R$ in (9) may not follow a chi-square distribution either due to the heterogeneity of the eigenvalues, although its
distribution is still chi-square under various conditions (see Yuan and
Bentler 1999b). Similarly, ADF-type statistics can be constructed in
testing $H_0: (\mu, \sigma^2)' = (\mu_0, \sigma_0^2)'$, and we leave it to readers to work
out the details. See Browne and Arminger (1995) and Yuan and
Bentler (1997b, 1999c) for the higher dimensional case.

4. A NUMERICAL EXAMPLE

Neumann (1994) studied the relationship of alcohol and psychological
symptoms. His data set consists of $p = 10$ variables and $n = 335$ cases.
We will use the family history of psychopathology variable to illustrate
the effect of skewness and kurtosis. For this variable, the MLEs of $\mu$
and $\sigma^2$ are, respectively, $\hat\mu = \bar y = 1.361$ and $\hat\sigma^2 = s^2 = 2.302$; the sample skewness and kurtosis are, respectively,

$$\hat\gamma = \frac{1}{n}\sum_{i=1}^n (y_i - \bar y)^3/s^3 = 2.001 \quad \text{and} \quad \hat\beta = 8.766.$$

Note that both $\hat\gamma$ and $\hat\beta - 3$ are significantly different from zero
(see, e.g., Snedecor and Cochran 1989, Table A19), indicating that the
sample most likely comes from a nonnormal distribution. Our purpose
here is to illustrate the effect of $\gamma$ and $\beta$ on the asymptotic distributions
of $\hat\mu$ and $\hat\sigma^2$ and statistics for testing $\mu = \mu_0$ and $\sigma^2 = \sigma_0^2$, not to elaborate on the substantive side of the data.
Assuming the sample is from $N(\mu, \sigma^2)$, the asymptotic distribution of $(\hat\mu, \hat\sigma^2)'$ is given by (2) with

$$\hat\Omega = \begin{pmatrix} 2.302 & 0 \\ 0 & 10.598 \end{pmatrix}.$$

Admitting that the data may not be normally distributed, the asymptotic distribution of $(\hat\mu, \hat\sigma^2)'$ is given by (6) with

$$\hat\Pi = \begin{pmatrix} 2.302 & 6.989 \\ 6.989 & 41.154 \end{pmatrix}.$$

If based on (2), the estimated standard error of $\hat\sigma^2$ is $(10.598/335)^{1/2} = 0.178$. If based on (6), the estimated standard error of $\hat\sigma^2$ is
$(41.154/335)^{1/2} = 0.350$, almost double that based on (2). Of course,
$\hat\mu$ and $\hat\sigma^2$ are no longer asymptotically independent in this example
when using (6). Although the confidence interval for $\sigma^2$ based on (2) is
much shorter than that based on (6), the shorter interval is a misleading
result due to the nonnormality of the data.
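The entries of the two estimated matrices and the two standard errors can be recomputed from the reported summary statistics alone. The Python sketch below does so; it takes the off-diagonal entry to be the estimated third central moment, formed as s³ times the sample skewness defined in this section, which is an assumption that reproduces the reported 6.989. The variable names are illustrative, and the implied correlation at the end is an extra quantity not reported in the article.

```python
import math

n = 335
s2 = 2.302          # MLE of the variance reported in the text
gamma_hat = 2.001   # sample skewness reported in the text
beta_hat = 8.766    # sample analogue of beta reported in the text

omega22 = 2 * s2 ** 2                 # normal-theory asymptotic variance of s2, from (3)
pi12 = s2 ** 1.5 * gamma_hat          # estimated third central moment, s^3 * gamma_hat
pi22 = s2 ** 2 * (beta_hat - 1)       # sandwich-type asymptotic variance of s2, from (6)

se_normal = math.sqrt(omega22 / n)
se_sandwich = math.sqrt(pi22 / n)
corr = pi12 / math.sqrt(s2 * pi22)    # implied asymptotic correlation of ybar and s2

print(round(omega22, 3), round(pi12, 3), round(pi22, 3))
print(round(se_normal, 3), round(se_sandwich, 3), round(corr, 3))
```

Up to rounding of the inputs, the computed entries and standard errors should agree with the values quoted above.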
Turning to hypothesis testing, suppose the null hypothesis is
$H_0: (\mu, \sigma^2) = (1.2, 1.8)$. When testing $\sigma^2 = 1.8$ alone, the LR statistic in (7) is $T_{ML} = 11.021$, which is highly significant when referred
to $\chi_1^2$. The rescaled statistic in (9) is $T_R = 2.838$; the Wald statistic in
(10), using the $\hat\pi_{22}^{(1)}$ in (11), is $T_W^{(1)} = 2.051$; and the Wald statistic in
(10), using the $\hat\pi_{22}^{(2)}$ in (12), is $T_W^{(2)} = 2.039$. None is statistically significant at the $\alpha = 0.05$ level when referred to $\chi_1^2$. Note that $T_W^{(1)}$ and
$T_W^{(2)}$ have a tiny difference in this example because of $p = 1$ and a
relatively large sample size. In the higher dimensional case, their difference can be huge (Yuan and Bentler 1997b, 1998c), and $T_W^{(2)}$ is
recommended for more reliable inference with smaller samples.
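The four statistics for testing the variance alone can be recomputed from the reported summary values. In the Python sketch below, the estimate in (11) is formed as s⁴ times (β̂ − 1) and the estimate in (12) is obtained by adding the squared difference between s² and the hypothesized variance; using the estimate in (11) inside (9) is an assumption that reproduces the reported rescaled statistic. The variable names are illustrative.

```python
import math

n, s2, beta_hat = 335, 2.302, 8.766   # values reported in the text
sigma0_sq = 1.8                       # hypothesized variance
CHI2_1_CRIT_05 = 3.841                # 0.95 quantile of chi-square with 1 df

ratio = s2 / sigma0_sq
t_ml = n * (ratio - math.log(ratio) - 1)        # LR statistic (7)
pi22_1 = s2 ** 2 * (beta_hat - 1)               # estimate (11), as s^4 * (beta_hat - 1)
pi22_2 = pi22_1 + (s2 - sigma0_sq) ** 2         # estimate (12)
t_r = 2 * s2 ** 2 / pi22_1 * t_ml               # rescaled statistic (9)
t_w1 = n * (s2 - sigma0_sq) ** 2 / pi22_1       # Wald statistic (10) with (11)
t_w2 = n * (s2 - sigma0_sq) ** 2 / pi22_2       # Wald statistic (10) with (12)

for name, value in [("T_ML", t_ml), ("T_R", t_r), ("T_W(1)", t_w1), ("T_W(2)", t_w2)]:
    print(name, round(value, 3), value > CHI2_1_CRIT_05)
```

Up to rounding of the inputs, only the unscaled LR statistic exceeds the critical value, matching the pattern described above.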
When testing $(\mu, \sigma^2) = (1.2, 1.8)$ simultaneously, the LR statistic in (13) is $T_{ML} = 15.845$, which is highly significant when
referred to $\chi_2^2$. The rescaled statistic in (15) is given by
$T_R = 4.153$, which is no longer statistically significant at the
$\alpha = 0.05$ level when referred to $\chi_2^2$. Note that the $T_R$ in (15) is unlikely to follow $\chi_2^2$ because $\hat\gamma$ is significant. However, referring $T_R$
to a chi-square distribution does make the inference more reliable.
More empirical results about these statistics in higher dimensional
cases can be found in Hu, Bentler, and Kano (1992) and Yuan and
Bentler (1998c).
The rescaled statistic in (14) is $T_R = 7.662$, which is statistically
significant at the $\alpha = 0.05$ level when referred to $\chi_2^2$. However, this
$T_R$ is not justified when $\hat\gamma$ is statistically significant.
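The joint-test statistics can likewise be recomputed from the reported summary values. The Python sketch below evaluates (13), (14), and (15); evaluating W at the hypothesized variance when forming the trace in (15) is an assumption that reproduces the reported value, and the variable names are illustrative.

```python
import math

n, ybar, s2, beta_hat = 335, 1.361, 2.302, 8.766  # values reported in the text
mu0, sigma0_sq = 1.2, 1.8                         # hypothesized mean and variance
CHI2_2_CRIT_05 = 5.991                            # 0.95 quantile of chi-square with 2 df

ratio = s2 / sigma0_sq
var_part = n * (ratio - math.log(ratio) - 1)      # variance component of (13)
mean_part = n * (ybar - mu0) ** 2 / sigma0_sq     # mean component of (13)
t_ml = var_part + mean_part                       # LR statistic (13)

pi11 = s2
pi22 = s2 ** 2 * (beta_hat - 1)
trace_w_pi = pi11 / sigma0_sq + pi22 / (2 * sigma0_sq ** 2)  # tr(W Pi), W at sigma0_sq

t_r14 = 2 * s2 ** 2 / pi22 * var_part + mean_part  # rescaled statistic (14)
t_r15 = 2 / trace_w_pi * t_ml                      # rescaled statistic (15)

for name, value in [("T_ML (13)", t_ml), ("T_R (14)", t_r14), ("T_R (15)", t_r15)]:
    print(name, round(value, 3), value > CHI2_2_CRIT_05)
```

Up to rounding of the inputs, the statistics in (13) and (14) exceed the critical value while the one in (15) does not, as reported above.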
In summary, none of the evidence in this example is against the
hypothesis $H_0: (\mu, \sigma^2) = (1.2, 1.8)$ when using proper statistics.
However, if one starts with the normal theory-based ML procedure without checking the distribution of the sample, $H_0$ will be
rejected!


5. DISCUSSION

Motivated by the gap in teaching resources and the technical literature of SEM, this article provides a simplified version of SEM with
nonnormal data. Some of the material in this article may be just
a trivial exercise to quantitative graduate students. For students or
researchers who do not have much quantitative training, the material
should better facilitate an understanding of the effect of nonnormal
data on standard errors and test statistics in mean and covariance
structure analysis. Of course, fit indices defined through TML will be
equally affected by skewness or kurtosis (see Yuan 2005 and Yuan
and Marshall 2004). Readers with a solid quantitative background
may further read the literature in the higher dimensional case cited in
sections 2 and 3. We hope the article will help to solidify an understanding of the effect of nonnormal data on SEM inference, especially in graduate education.
Proper procedures have to be used to get reliable inferences with
nonnormal data. Although we do not have space to discuss these,
we would like to note that truly robust methods, not depending on
ML, do exist. These methods minimize the effect of bad data not
only on standard errors and test statistics but also on parameter
estimates and power evaluations (Yuan and Bentler 1998a, 1998b,
2000c; Yuan, Bentler, and Chan 2004; Yuan, Chan, and Bentler 2000;
Yuan and Hayashi 2003; Yuan, Marshall, and Weston 2002). It is well
known that Mardia’s (1970, 1974) measure of multivariate kurtosis is
a generalization of the univariate kurtosis β − 3. When the sample
multivariate kurtosis is significantly greater than that of the multivariate normal distribution, a robust procedure might be necessary. In
small samples, the significance of Mardia’s coefficient can be evaluated using the simulation approach of Bonett, Woodward, and Randall
(2002). In addition to nonnormal data, a small sample size also tends
to make the statistic $T_{ML}$ significant even with correctly specified
models. Remedies in this direction are addressed in Bentler and Yuan
(1999) and Yuan and Bentler (1999c).
In this article, we have emphasized the ML function for the simplest case in which data are complete and obtained by simple random
sampling. As can easily be imagined, the problems arising from
skewness and kurtosis do not vanish when data are missing or when
data are obtained under hierarchical sampling schemes. The same
principles apply—that is, normal theory-based ML standard errors
and test statistics will be biased under nonnormality. Solutions to this
problem for the missing data case were developed by Arminger and
Sobel (1990) and Yuan and Bentler (2000b). Solutions for multilevel
data were provided by Poon and Lee (1994) and Yuan and Bentler
(2002b, 2003).
There is a related literature called asymptotic robustness theory.
This is concerned with the validity of normal theory-based methods
with large-sample nonnormal data (Amemiya and Anderson 1990;
Anderson and Amemiya 1988; Browne and Shapiro 1988; Mooijaart
and Bentler 1991; Satorra and Bentler 1990; Shapiro 1987; Yuan and
Bentler 1999a, 1999b). Unfortunately, the conditions for asymptotic
robustness depend on both the data and the model, and there is no
effective way to verify these conditions at present. It is not appropriate
to blindly trust that a researcher’s given data and model satisfy these
conditions.
In practice, TML with empirical data is often statistically significant when referred to a chi-square distribution. However, a small p
value associated with TML may not be due to a bad model and/or
too much power (i.e., a huge sample size). It may be due to violation of assumptions or bad data (Yuan and Bentler 2001). The
newer statistics described in this article (see note 2) do not require making the
stringent multivariate normality assumption. These statistics not
only are liable to make a good model more acceptable statistically
but also should lead to more accurate scientific conclusions.

NOTES
1. The theorem states that, if $x_n \xrightarrow{L} x$, $a_n$ converges in probability to $a$, and $b_n$ converges
in probability to $b$, then $a_n x_n + b_n \xrightarrow{L} ax + b$.
2. Most of the procedures discussed in this article, such as standard errors based on the
sandwich-type covariance matrices, rescaled and improved asymptotically distribution free
statistics, robust methods, and statistics that perform well with small samples, are currently
available in EQS 6.0 (Bentler forthcoming).


REFERENCES
Amemiya, Yasuo and Theodore W. Anderson. 1990. ‘‘Asymptotic Chi-Square Tests for a
Large Class of Factor Analysis Models.’’ Annals of Statistics 18:1453-63.
Anderson, Theodore W. and Yasuo Amemiya. 1988. ‘‘The Asymptotic Normal Distribution of
Estimators in Factor Analysis Under General Conditions.’’ Annals of Statistics 16:759-71.
Arminger, Gerhard and Ronald Schoenberg. 1989. ‘‘Pseudo Maximum Likelihood Estimation
and a Test for Misspecification in Mean and Covariance Structure Models.’’ Psychometrika
54:409-26.
Arminger, Gerhard and Michael E. Sobel. 1990. ‘‘Pseudo-Maximum Likelihood Estimation of
Mean and Covariance Structures With Missing Data.’’ Journal of the American Statistical
Association 85:195-203.
Bentler, Peter M. 1983. ‘‘Some Contributions to Efficient Statistics in Structural Models:
Specification and Estimation of Moment Structures.’’ Psychometrika 48:493-517.
———. Forthcoming. EQS 6 Structural Equations Program Manual. Encino, CA: Multivariate
Software.
Bentler, Peter M. and Theo K. Dijkstra. 1985. ‘‘Efficient Estimation Via Linearization
in Structural Models.’’ Pp. 9-42 in Multivariate Analysis VI, edited by P. R. Krishnaiah.
Amsterdam: North-Holland.
Bentler, Peter M. and Ke-Hai Yuan. 1999. ‘‘Structural Equation Modeling With Small Samples: Test Statistics.’’ Multivariate Behavioral Research 34:181-97.
Bishop, Yvonne M. M., Stephen E. Fienberg, and Paul W. Holland. 1975. Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.
Bollen, Kenneth A. 1989. Structural Equations With Latent Variables. New York: John
Wiley.
———. 2002. ‘‘Latent Variables in Psychology and the Social Sciences.’’ Annual Review of
Psychology 53:605-34.
Bonett, Douglas G., J. Arthur Woodward, and Robert L. Randall. 2002. ‘‘Estimating p-Values
for Mardia’s Coefficients of Multivariate Skewness and Kurtosis.’’ Computational Statistics 17:117-22.
Boomsma, Anne. 2000. ‘‘Reporting on Structural Equation Analyses.’’ Structural Equation
Modeling 7:461-83.
Breckler, Steven J. 1990. ‘‘Application of Covariance Structure Modeling in Psychology:
Cause for Concern?’’ Psychological Bulletin 107:260-73.
Browne, Michael W. 1984. ‘‘Asymptotic Distribution-Free Methods for the Analysis of Covariance Structures.’’ British Journal of Mathematical and Statistical Psychology 37:62-83.
Browne, Michael W. and Gerhard Arminger. 1995. ‘‘Specification and Estimation of Mean and
Covariance Structure Models.’’ Pp. 185-249 in Handbook of Statistical Modeling for the
Social and Behavioral Sciences, edited by G. Arminger, C. C. Clogg, and M. E. Sobel.
New York: Plenum.
Browne, Michael W. and Alexander Shapiro. 1988. ‘‘Robustness of Normal Theory Methods
in the Analysis of Linear Latent Variate Models.’’ British Journal of Mathematical and
Statistical Psychology 41:193-208.
Casella, George and Roger L. Berger. 2002. Statistical Inference. Pacific Grove, CA: Duxbury.
Dijkstra, Theo K. 1981. ‘‘Latent Variables in Linear Stochastic Models: Reflections on ‘Maximum Likelihood’ and ‘Partial Least Squares’ Methods.’’ Ph.D. dissertation, University
of Groningen.
Ferguson, Thomas S. 1996. A Course in Large Sample Theory. London: Chapman & Hall.

Hays, William L. 1994. Statistics. 5th ed. Fort Worth, TX: Harcourt Brace.
Hu, Li-tze, Peter M. Bentler, and Yutaka Kano. 1992. ‘‘Can Test Statistics in Covariance
Structure Analysis Be Trusted?’’ Psychological Bulletin 112:351-62.
Kano, Yutaka, Maria Berkane, and Peter M. Bentler. 1990. ‘‘Covariance Structure Analysis
With Heterogeneous Kurtosis Parameters.’’ Biometrika 77:575-85.
———. 1993. ‘‘Statistical Inference Based on Pseudo-Maximum Likelihood Estimators in
Elliptical Populations.’’ Journal of the American Statistical Association 88:135-43.
MacCallum, Robert C. and James T. Austin. 2000. ‘‘Applications of Structural Equation
Modeling in Psychological Research.’’ Annual Review of Psychology 51:201-26.
Magnus, Jan R. and Heinz Neudecker. 1999. Matrix Differential Calculus With Applications
in Statistics and Econometrics. Rev. ed. New York: John Wiley.
Mardia, Kanti V. 1970. ‘‘Measures of Multivariate Skewness and Kurtosis With Applications.’’ Biometrika 57:519-30.
———. 1974. ‘‘Applications of Some Measures of Multivariate Skewness and Kurtosis in
Testing Normality and Robustness Studies.’’ Sankhyā B 35:115-28.
Micceri, Theodore. 1989. ‘‘The Unicorn, the Normal Curve, and Other Improbable
Creatures.’’ Psychological Bulletin 105:156-66.
Mooijaart, Ab and Peter M. Bentler. 1991. ‘‘Robustness of Normal Theory Statistics in Structural Equation Models.’’ Statistica Neerlandica 45:159-71.
Neumann, Craig S. 1994. ‘‘Structural Equation Modeling of Symptoms of Alcoholism and
Psychopathology.’’ Ph.D. dissertation, University of Kansas.
Poon, Wai-Yin and Sik-Yum Lee. 1994. ‘‘A Distribution Free Approach for Analysis of
Two-Level Structural Equation Model.’’ Computational Statistics and Data Analysis
17:265-75.
Satorra, Albert. 1989. ‘‘Alternative Test Criteria in Covariance Structure Analysis: A Unified
Approach.’’ Psychometrika 54:131-51.
Satorra, Albert and Peter M. Bentler. 1988. ‘‘Scaling Corrections for Chi-Square Statistics in
Covariance Structure Analysis.’’ Pp. 308-13 in American Statistical Association 1988
Proceedings of Business and Economics Sections. Alexandria, VA: American Statistical
Association.
———. 1990. ‘‘Model Conditions for Asymptotic Robustness in the Analysis of Linear Relations.’’ Computational Statistics and Data Analysis 10:235-49.
———. 1994. ‘‘Corrections to Test Statistics and Standard Errors in Covariance Structure
Analysis.’’ Pp. 399-419 in Latent Variables Analysis: Applications for Developmental
Research, edited by A. von Eye and C. C. Clogg. Newbury Park, CA: Sage.
Satorra, Albert and William Saris. 1985. ‘‘Power of the Likelihood Ratio Test in Covariance
Structure Analysis.’’ Psychometrika 50:83-90.
Shapiro, Alexander. 1983. ‘‘Asymptotic Distribution Theory in the Analysis of Covariance
Structures (A Unified Approach).’’ South African Statistical Journal 17:33-81.
———. 1987. ‘‘Robustness Properties of the MDF Analysis of Moment Structures.’’ South
African Statistical Journal 21:39-62.
Shapiro, Alexander and Michael W. Browne. 1987. ‘‘Analysis of Covariance Structures
Under Elliptical Distributions.’’ Journal of the American Statistical Association 82:
1092-97.
Slutsky, Eugene. 1925. ‘‘Über Stochastische Asymptoten und Grenzwerte.’’ Metron 5:1-90.
Snedecor, George W. and William G. Cochran. 1989. Statistical Methods. 8th ed. Ames:
Iowa State University Press.
Steiger, James H., Alexander Shapiro, and Michael W. Browne. 1985. ‘‘On the Multivariate
Asymptotic Distribution of Sequential Chi-Square Statistics.’’ Psychometrika 50:253-64.

Tabachnick, Barbara G. and Linda S. Fidell. 2001. Using Multivariate Statistics. 4th ed.
New York: HarperCollins.
Yuan, Ke-Hai. 2005. ‘‘Fit Indices Versus Test Statistics.’’ Multivariate Behavioral Research
40:115-48.
Yuan, Ke-Hai and Peter M. Bentler. 1997a. ‘‘Improving Parameter Tests in Covariance Structure Analysis.’’ Computational Statistics and Data Analysis 26:177-98.
———. 1997b. ‘‘Mean and Covariance Structure Analysis: Theoretical and Practical
Improvements.’’ Journal of the American Statistical Association 92:767-74.
———. 1998a. ‘‘Robust Mean and Covariance Structure Analysis.’’ British Journal of
Mathematical and Statistical Psychology 51:63-88.
———. 1998b. ‘‘Structural Equation Modeling With Robust Covariances.’’ Sociological
Methodology 28:363-96.
———. 1998c. ‘‘Normal Theory Based Test Statistics in Structural Equation Modelling.’’
British Journal of Mathematical and Statistical Psychology 51:289-309.
———. 1999a. ‘‘On Asymptotic Distributions of Normal Theory MLE in Covariance Structure Analysis Under Some Nonnormal Distributions.’’ Statistics and Probability Letters
42:107-13.
———. 1999b. ‘‘On Normal Theory and Associated Test Statistics in Covariance Structure
Analysis Under Two Classes of Nonnormal Distributions.’’ Statistica Sinica 9:831-53.
———. 1999c. ‘‘F-tests for Mean and Covariance Structure Analysis.’’ Journal of Educational and Behavioral Statistics 24:225-43.
———. 2000a. ‘‘Inferences on Correlation Coefficients in Some Classes of Nonnormal
Distributions.’’ Journal of Multivariate Analysis 72:230-48.
———. 2000b. ‘‘Three Likelihood-Based Methods for Mean and Covariance Structure
Analysis With Nonnormal Missing Data.’’ Sociological Methodology 30:167-202.
———. 2000c. ‘‘Robust Mean and Covariance Structure Analysis Through Iteratively
Reweighted Least Squares.’’ Psychometrika 65:43-58.
———. 2001. ‘‘Effect of Outliers on Estimators and Tests in Covariance Structure Analysis.’’
British Journal of Mathematical and Statistical Psychology 54:161-75.
———. 2002a. ‘‘On Robustness of the Normal-Theory Based Asymptotic Distributions of
Three Reliability Coefficient Estimates.’’ Psychometrika 67:251-59.
———. 2002b. ‘‘On Normal Theory Based Inference for Multilevel Models With Distributional Violations.’’ Psychometrika 67:539-61.
———. 2003. ‘‘Eight Test Statistics for Multilevel Structural Equation Models.’’ Computational Statistics and Data Analysis 44:89-107.
———. Forthcoming. ‘‘Mean Comparison: Manifest Variable Versus Latent Variable.’’
Psychometrika.
Yuan, Ke-Hai, Peter M. Bentler, and Wai Chan. 2004. ‘‘Structural Equation Modeling With
Heavy Tailed Distributions.’’ Psychometrika 69:421-36.
Yuan, Ke-Hai, Wai Chan, and Peter M. Bentler. 2000. ‘‘Robust Transformation With Applications to Structural Equation Modeling.’’ British Journal of Mathematical and Statistical
Psychology 53:31-50.
Yuan, Ke-Hai and Kentaro Hayashi. 2003. ‘‘Bootstrap Approach to Inference and Power
Analysis Based on Three Statistics for Covariance Structure Models.’’ British Journal of
Mathematical and Statistical Psychology 56:93-110.
Yuan, Ke-Hai, Kentaro Hayashi, and Peter M. Bentler. N.d. ‘‘Normal Theory Likelihood Ratio
Statistic for Mean and Covariance Structure Analysis Under Alternative Hypotheses.’’
Yuan, Ke-Hai and Linda L. Marshall. 2004. ‘‘A New Measure of Misfit for Covariance Structure Models.’’ Behaviormetrika 31:1-24.

Yuan, Ke-Hai, Linda L. Marshall, and Rebecca Weston. 2002. ‘‘Cross-Validation Through
Downweighting Influential Cases in Structural Equation Modeling.’’ British Journal of
Mathematical and Statistical Psychology 55:125-43.

Ke-Hai Yuan is an associate professor in quantitative psychology at the University of
Notre Dame. His research interests are in the areas of psychometric theory and applied
multivariate statistics. He received the Cattell award for early career outstanding multivariate research from the Society of Multivariate Experimental Psychology in 2002.
Peter M. Bentler is a Distinguished Professor of Psychology and Statistics at UCLA
who has been an elected president of the Psychometric Society, the Society of Multivariate Experimental Psychology, and the Division of Evaluation, Measurement, and Statistics of the American Psychological Association. He directs the Center for Collaborative
Research on Drug Abuse.
Wei Zhang is a graduate student in quantitative psychology at the University of Notre
Dame. His primary research interest is structural equation modeling.
