The asymptotic variance of $\hat{\theta}$ is equal to (see e.g. Yuan and Bentler, 1998)
$$
\Omega = A^{-1} \Pi A^{-1}, \tag{15}
$$
with
$$
A = \left(\frac{\partial \sigma(\theta)}{\partial \theta^T}\right)^T W \, \frac{\partial \sigma(\theta)}{\partial \theta^T}, \qquad
\Pi = \left(\frac{\partial \sigma(\theta)}{\partial \theta^T}\right)^T W V_\Sigma W \, \frac{\partial \sigma(\theta)}{\partial \theta^T},
$$
and $V_\Sigma$ being the asymptotic variance of $\sqrt{n}\,\hat{\sigma} = \sqrt{n}\,\operatorname{vec}\hat{\Sigma}$. Equation (14) actually defines a class of GLS estimators within which the most efficient one, equivalent to the MLE, is obtained when $W = V_\Sigma^{-1}$, which is replaced in practice by a sample estimate of $V_\Sigma$. Hence, for the MLE, (15) simplifies to
$$
\Omega = \left( \left(\frac{\partial \sigma(\theta)}{\partial \theta^T}\right)^T V_\Sigma^{-1} \, \frac{\partial \sigma(\theta)}{\partial \theta^T} \right)^{-1}. \tag{16}
$$
When $\theta$ contains only the loadings, (16) can be written as
$$
V_G = \left( D_\Gamma^T V_\Sigma^{-1} D_\Gamma \right)^{-1}, \tag{17}
$$
with $V_\Sigma$ given in (8). One can see that (11) and (17) are equal if $D_\Gamma$ is a square matrix, but not in the other cases. It is, however, not clear which of the asymptotic covariance matrices is smaller, and hence which of the two-stage and direct estimators is the more efficient.
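As a numerical illustration of (15)-(17), the sandwich form can be checked to collapse to the efficient form when $W = V_\Sigma^{-1}$, and to be (weakly) larger for any other weight. This is only a sketch with arbitrary made-up dimensions and matrices, not the paper's factor model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: k parameters, m moment conditions (placeholders).
k, m = 3, 6
D = rng.normal(size=(m, k))      # stands in for the Jacobian d sigma(theta) / d theta^T
S = rng.normal(size=(m, m))
V = S @ S.T + m * np.eye(m)      # a positive definite "asymptotic variance" V_Sigma

def sandwich(W):
    """Omega = A^{-1} Pi A^{-1}, with A = D' W D and Pi = D' W V W D (eq. 15)."""
    A = D.T @ W @ D
    Pi = D.T @ W @ V @ W @ D
    Ainv = np.linalg.inv(A)
    return Ainv @ Pi @ Ainv

# With the efficient weight W = V^{-1}, (15) reduces to (D' V^{-1} D)^{-1} (eqs. 16/17).
W_opt = np.linalg.inv(V)
Omega_sandwich = sandwich(W_opt)
Omega_efficient = np.linalg.inv(D.T @ W_opt @ D)
print(np.allclose(Omega_sandwich, Omega_efficient))  # True

# Any other weight is no more efficient: Omega(W) - Omega(V^{-1}) is positive semidefinite.
gap = sandwich(np.eye(m)) - Omega_efficient
print(np.linalg.eigvalsh(gap).min() >= -1e-8)  # True
```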
4. Simulation study
The aim of the simulation study is to compare the direct CBS estimator with, respectively, the classical two-stage estimator (i.e. based on the Pearson correlations) and the two-stage estimator using the robust BS-estimator as covariance matrix estimator, under different kinds of models and data contamination. For both robust estimators, the breakdown point ε* is set to 50%. For the classical and robust two-stage estimators we use the software Mplus, whereas we use the statistical package R (R Development Core Team, 2008) for the direct CBS, with the OGK estimator as a starting point, as well as for computing the BS-estimator. We try both the ML and the GLS objective functions in Mplus for the classical and robust two-stage estimators, but the latter does not seem to be reliable in relatively small samples (n = 40, p = 20), as will be shown below with Figure 1. Therefore, we finally compare three estimators: the classical and robust two-stage estimators with, respectively, the Pearson or the BS covariance estimates and the ML objective function (MLP and MLBS), and the direct CBS.
The data are generated from two different models: a model with a relatively small number of manifest variables (p = 6), for which we simulate relatively large samples (n = 100), and a model with a relatively large number of manifest variables (p = 20), for which we simulate relatively small samples (n = 40). In both models, we set the number of latent variables to 2 and, for each of them, two sets of values for Γ were chosen in order to create a strong and a weak signal-to-noise ratio. The values of the parameters for all models can be found in Appendix B. For the model with p = 6, the number of loadings different from 0, denoted q, is equal to p; for the other model we choose two different values for q, namely q = p = 20 and q = 26.
The contamination consists in generating a proportion 1 − ε of the data from a multivariate normal distribution with parameters given by the model, and a proportion ε from a multivariate normal distribution in which one third of the loadings different from 0 are multiplied by 10. Different amounts of contamination are chosen, but we present here only the ε = 5% case.
For each model, we compute the asymptotic efficiency, measured by the ratio of the traces of the covariance matrices of the robust two-stage estimator relative to the direct CBS estimator, using (11) and (17) with the true values of the parameters. With our chosen set of parameters, when q = p the efficiency is approximately one, but when q > p the two-stage estimator is asymptotically more efficient (see values in Appendix B).
Figure 1 presents the bias distributions of the loadings' and uniquenesses' estimators for the model with n = 40, q = p = 20 and a high signal-to-noise ratio, without data contamination. The estimators are the two-stage estimators using respectively the ML or the GLS objective function, with the Pearson correlations or the BS estimator, as well as the direct CBS estimation. As mentioned previously, using the GLS objective function produces unreliable results, while the other estimators (MLP, MLBS and CBS) behave well (they are unbiased) and in the same manner (same variances).
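The contamination scheme described above can be sketched as follows. This is a minimal illustration with a hypothetical two-factor model; the dimensions, seed, loading values and the choice of which nonzero loadings are inflated are placeholders, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(1)

def factor_cov(Gamma, Psi):
    """Covariance implied by the factor model: Sigma = Gamma Gamma' + Psi."""
    return Gamma @ Gamma.T + np.diag(Psi)

def contaminated_sample(n, Gamma, Psi, eps=0.05):
    """Draw (1 - eps) of the rows from the clean model and eps from a model
    in which one third of the nonzero loadings are multiplied by 10."""
    p, _ = Gamma.shape
    Gamma_bad = Gamma.copy()
    rows, cols = np.nonzero(Gamma)           # positions of the nonzero loadings
    third = len(rows) // 3
    Gamma_bad[rows[:third], cols[:third]] *= 10
    n_bad = int(round(eps * n))
    clean = rng.multivariate_normal(np.zeros(p), factor_cov(Gamma, Psi), n - n_bad)
    bad = rng.multivariate_normal(np.zeros(p), factor_cov(Gamma_bad, Psi), n_bad)
    return np.vstack([clean, bad])

# Hypothetical p = 6 model with two factors and q = p nonzero loadings.
Gamma = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                  [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
Psi = np.full(6, 0.4)
X = contaminated_sample(100, Gamma, Psi, eps=0.05)
print(X.shape)  # (100, 6)
```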
[Boxplot panels for γ3, γ16, ψ3 and ψ16, comparing the MLP, GLSP, MLBS, GLSBS and CBS estimators.]

Figure 1: Bias distribution of the loadings' and uniquenesses' estimators for the model with n = 40, no contamination and a high signal-to-noise ratio
When there is data contamination, not surprisingly the simulations show that the two-stage estimator based on the Pearson covariance matrix is biased, whereas this is not the case for the CBS and the two-stage estimator based on the BS (see Figure 2). Overall, the two robust estimators, MLBS and CBS, appear quite similar in terms of bias and variance. However, in one case (n = 100, p = 6 and a low signal-to-noise ratio) the two-stage estimator produces extreme estimates for the loadings and negative ones for the uniquenesses, as can be seen in Figures 3 and 4. The Mean Square Error of the two-stage estimator is much larger than that of the CBS in that case (see Table 1). This could be due to the fact that the two-stage estimator requires the estimation of a full covariance matrix.
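The Mean Square Error reported in Table 1 (scaled by 100) is, per parameter, the average squared deviation of the estimates from the true value over the simulation replicates; a sketch with made-up numbers:

```python
import numpy as np

def mse_times_100(estimates, true_value):
    """MSE x 100 of one parameter's estimates over simulation replicates."""
    estimates = np.asarray(estimates, dtype=float)
    return 100.0 * np.mean((estimates - true_value) ** 2)

# Four hypothetical replicate estimates of a loading whose true value is 1.0.
print(mse_times_100([0.9, 1.1, 1.0, 0.8], 1.0))
```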
In the case where q > p, although the asymptotic variance of the CBS is larger than that
[Boxplot panels for γ3, γ5, ψ3 and ψ5, comparing the MLP, MLBS and CBS estimators.]

Figure 2: Bias distribution of the loadings' and uniquenesses' estimators for the model with p = 6, data contamination and a high signal-to-noise ratio
of the two-stage estimator, in finite samples the CBS does not seem to be more variable than the two-stage estimator, with or without data contamination, as illustrated in Figure 5. This might be explained by the fact that, since p is large relative to n, the BS estimator can be computationally very unstable, so that its finite-sample variance is artificially increased. With the CBS, the number of parameters to estimate is smaller relative to the sample size, which yields computationally more stable estimators.
To conclude, the simulation study shows that, in finite samples, the CBS is not less efficient than the indirect estimator and it is more stable in some cases where the latter produces extreme
estimates.
[Boxplot panels for γ2, γ6, ψ1 and ψ6, comparing the MLP, MLBS and CBS estimators.]

Figure 3: Bias distribution of the loadings' and uniquenesses' estimators for the model with p = 6, without data contamination and with a low signal-to-noise ratio
Table 1: MSE × 100 for the model with p = 6, low signal-to-noise ratio

             ε^a = 0                     ε = 5%
        MLP      MLBS     CBS       MLP       MLBS     CBS
γ1      5.45     4.82     9.24      7.72      6.57     10.91
γ2      17.14    31.74    16.19     63.09     9.74     18.32
γ3      9.35     23.39    15.67     22.86     15.99    16.53
γ4      10.63    54.25    13.02     31.96     35.76    13.88
γ5      1.24     1.31     3.60      1.35      1.72     4.16
γ6      6.86     8.78     9.82      44.43     28.16    12.09
ψ1      13.45    5.53     9.79      16.52     10.93    10.97
ψ2      174.50   692.83   26.61     1536.86   30.74    28.27
ψ3      56.35    275.53   25.17     244.39    110.70   26.43
ψ4      62.63    4658.36  30.62     1338.24   1013.10  34.91
ψ5      1.51     1.76     5.40      1.43      2.27     5.95
ψ6      30.47    69.43    23.58     1348.62   445.21   26.20

^a Amount of contamination
[Boxplot panels for γ2, γ6, ψ1 and ψ6, comparing the MLP, MLBS and CBS estimators.]

Figure 4: Bias distribution of the loadings' and uniquenesses' estimators for the model with p = 6, with data contamination and with a low signal-to-noise ratio
[Boxplot panels for γ4, γ23, ψ4 and ψ17, comparing the MLP, MLBS and CBS estimators.]

Figure 5: Bias distribution of the loadings' and uniquenesses' estimators for the model with n = 40, q > p, data contamination and with a low signal-to-noise ratio
5. Conclusion