Exploratory Factor Analysis

11.3 Exploratory Factor Analysis

11.3.1 The Concept of Exploratory Factor Analysis

The two primary types of factor analysis are confirmatory and exploratory, the latter the subject of this section. The models that underlie both types of factor analysis are similar. Both models express each measured variable as a function of the underlying factors and the corresponding uniqueness term. The distinction is that for exploratory factor analysis, introduced by Thurstone (1931), only the number of factors is specified to begin the analysis. Unlike confirmatory analysis, exploratory factor analysis expresses each measured variable as a function of all the factors specified in the analysis.

For example, suppose two specified factors account for the correlational structure of six items. Then the measurement model estimated by the corresponding exploratory factor analysis specifies six equations, each with two factors. Each measured variable has a pattern coefficient, a λ coefficient, on each of the two specified factors.
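The six equations of this two-factor example can be written out as a sketch. Only the λ coefficients are named in the text; the symbols X_i for the measured variables, F_1 and F_2 for the factors, and u_i for the uniqueness terms are notation assumed here for illustration.

```latex
\begin{aligned}
X_1 &= \lambda_{11} F_1 + \lambda_{12} F_2 + u_1 \\
X_2 &= \lambda_{21} F_1 + \lambda_{22} F_2 + u_2 \\
X_3 &= \lambda_{31} F_1 + \lambda_{32} F_2 + u_3 \\
X_4 &= \lambda_{41} F_1 + \lambda_{42} F_2 + u_4 \\
X_5 &= \lambda_{51} F_1 + \lambda_{52} F_2 + u_5 \\
X_6 &= \lambda_{61} F_1 + \lambda_{62} F_2 + u_6
\end{aligned}
```

Every equation contains both factors, so the analysis estimates 12 pattern coefficients, two for each of the six items.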

Conducting the exploratory factor analysis provides an estimated value of λ for each factor for each measured variable. Often the goal of such an analysis is to uncover a structure in which the response for each measured variable is causally dependent on a single factor and is not directly dependent on any other factor in the system. For the exploratory analysis, every measured variable is to some extent directly related to every factor. The desired solution, however, would be that for each item there is one relatively large value of λ and the rest of the λ's are reasonably close to zero in value. This style of solution is a simple structure solution.

simple structure: A factor solution in which each item primarily relates to a single factor.

The measurement model for the exploratory factor analysis is underidentified, which means that there is no unique solution for each estimated value in the model. The situation here is the same as the examples from high school algebra for systems of equations in which there are more unknown values to solve for than there are equations to provide information on which to base the solution. With underidentification there are many possible equally valid solutions.

underidentified model: The model has no unique solution from the available data.

Accordingly, an exploratory factor analysis is actually two different analyses. First obtain the initial solution, the factor extraction. The criterion for which to extract the factors is to define the first factor such that it is related as strongly as possible to all of the existing variables. The problem with the initial solution is that there is usually a strong general factor such that most of the items tend to highly relate to this general factor and less so with the other factors. Then the second extracted factor is the factor most related to all of the items after the first factor has been extracted. The result is that the initial extraction of the specified number of factors results in a solution that is roughly opposite of the desired goal of simple structure.

factor extraction: First set of factors in an exploratory analysis.

factor rotation: Second set of factors in an exploratory analysis, which are interpreted.

The factors from the initial factor extraction solution are uncorrelated. For the usual attitude survey that measures multiple attitudes, however, the attitudes are usually correlated with each other. When the second, rotated solution is obtained, the rotated factors can remain uncorrelated, what are called orthogonal factors, or they can be rotated into a correlated solution, resulting in what are called oblique factors.

orthogonal rotation: Uncorrelated factors from the rotation.

oblique rotation: Correlated factors from the rotation.

For the study of sets of multi-item attitude scales, the scales present on the attitude survey are typically compared with scales defined by the factor analysis. Define a derived scale for each extracted factor, and then place each item on the scale for which it is most related to the corresponding factor. For the purpose of deriving scales of related items, the choice of orthogonal or oblique factor is not critical as both do reasonably well at this task (Gerbing & Hamilton, 1996).
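The idea that many equally valid solutions exist, and that rotation chooses among them, can be illustrated with a small base R sketch. The loading matrix below is hypothetical, invented only for illustration: six items on two factors, with a strong general first factor as described for an initial extraction. An orthogonal rotation changes the individual loadings but leaves the correlations implied by the common factors unchanged.

```r
# Hypothetical unrotated loadings: six items, two factors.
# The first factor is a general factor on which every item loads highly.
L <- matrix(c(0.7, 0.7, 0.7,  0.6,  0.6,  0.6,
              0.4, 0.4, 0.4, -0.5, -0.5, -0.5), ncol = 2)

# An orthogonal rotation of the two axes by 45 degrees
theta <- pi / 4
Tmat <- matrix(c(cos(theta), sin(theta),
                 -sin(theta), cos(theta)), ncol = 2)
L_rot <- L %*% Tmat

# Common-factor part of the item correlations before and after rotation
common     <- L %*% t(L)
common_rot <- L_rot %*% t(L_rot)
print(all.equal(common, common_rot))   # the two solutions fit identically

# The rotated loadings are closer to simple structure
print(round(L_rot, 2))

# varimax() from the stats package searches for such a rotation itself
print(varimax(L))
```

Because both loading matrices reproduce the same correlations, the data alone cannot choose between them. The rotation criterion, not the fit, selects the more interpretable solution.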

11.3.2 Exploratory Analysis of the Mach IV Scale

The data to which we apply factor analysis are the responses to the Mach IV scale for the measurement of Machiavellianism. The first task is to read the data. The data are included in the lessR package, and so can be read with the format="lessR" option.

Mach IV scale, Section 1.2, p. 26

> mydata <- Read("Mach4", format="lessR")

Before conducting a factor analysis, the items on a scale should be coded in the same direction. For the Mach IV scale, 11 of the 20 items are written so that agreement with the item indicates low Machiavellianism. These 11 items are reverse scored before computing the item correlations.

reverse score, Section 3.4.1, p. 63

> mydata <- Recode(c(m03,m04,m06,m07,m09,m10,m11,m14,m16,m17,m19), old=0:5, new=5:0)

To do a factor analysis we need the item correlations, the 20 × 20 correlation matrix. The Correlation function by default accesses the data in the mydata data frame.

correlation matrix, Section 8.3, p. 194

> mycor <- Correlation(m01:m20)
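For readers who want to see what the reverse scoring and the correlation step amount to, here is a rough base R sketch. It does not use lessR or the actual Mach IV data: the data frame d is simulated with random responses on the 0 to 5 scale, and the object names are illustrative assumptions. Mapping 0:5 onto 5:0 is the same as subtracting each response from 5.

```r
set.seed(42)

# Simulated stand-in for the Mach IV responses: 100 respondents, 20 items, scored 0 to 5
items <- sprintf("m%02d", 1:20)
d <- as.data.frame(matrix(sample(0:5, 100 * 20, replace = TRUE), ncol = 20,
                          dimnames = list(NULL, items)))

# The 11 items that are reverse scored in the text
reversed <- c("m03", "m04", "m06", "m07", "m09", "m10",
              "m11", "m14", "m16", "m17", "m19")

before <- d$m03                  # keep a copy to check the recoding
d[reversed] <- 5 - d[reversed]   # 0 -> 5, 1 -> 4, ..., 5 -> 0

# The 20 x 20 item correlation matrix
R <- cor(d)
print(dim(R))
```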

256 Factor/Item Analysis

In contrast to confirmatory factor analysis, exploratory analysis does not require the specification of a complete measurement model. Exploratory factor analysis only requires that the number of factors that account for item correlations first be specified. Probably the most useful test for the optimal number of factors is the scree plot, based on what are called eigenvalues from the initial factor extraction of the item correlation matrix. The eigenvalues are just a rescaling of the extracted factors, one eigenvalue per factor. The first extracted factor has the largest eigenvalue, the second extracted factor the second largest, and so forth.

The word “scree” refers to a geological description of a cliff in which rock and dirt have slid down the face of the cliff, gathering at the bottom. The scree plot plots the eigenvalues sequentially in descending order. There are as many eigenvalues as items in the input correlation matrix, but the size of these eigenvalues, when plotted, tends to resemble a scree. Examination of the scree plot attempts to separate the important factors, which correspond to the cliff, from the scree, or rubble, at the bottom of the cliff.

scree plot: Plot of successive eigenvalues from the correlation matrix to identify the number of factors.

Scenario: Estimate the number of factors of a correlation matrix
Analyze a correlation matrix to suggest the number of factors that should be specified for an exploratory factor analysis.

The lessR function corScree, abbreviated scree, provides a scree plot of the eigenvalues of the input correlation matrix. To generate the scree plot for the default correlation matrix mycor, no argument to the function is needed.

scree function: Obtain a scree plot and plot of eigenvalue differences.

lessR Input: Scree and scree difference plot

> scree()

The scree plot for the 20-item Mach IV correlation matrix appears in Figure 11.1 . Where does the “cliff” end and the “scree” begin? The answer based on this scree plot is around the 4th eigenvalue, after which the “scree” begins to accumulate. At this point the slope becomes much less steep.

To view this change in slope between the “cliff” and the “scree” more directly, your author also prefers to evaluate the plot of the difference of the successive eigenvalues, to plot the successive changes directly. Accordingly, the scree function also provides this plot, shown in Figure 11.2 . Here only the first four eigenvalue differences lie above the flat line, which is automatically provided as part of the plot. After the first four eigenvalue differences, the rate of change is essentially constant, which plots as a flat line. This flat line through the plotted eigenvalue differences represents the scree.
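The two plots can be approximated in base R, which may help clarify what is being plotted. This is a sketch and not the lessR implementation: the correlation matrix comes from simulated data with four underlying factors, and the flat reference line that scree adds is omitted because the text does not say how it is computed.

```r
set.seed(7)

# Simulate 20 items driven by four factors, five items per factor
n <- 400
f <- matrix(rnorm(n * 4), ncol = 4)
x <- 0.7 * f[, rep(1:4, each = 5)] + matrix(rnorm(n * 20, sd = 0.7), ncol = 20)
R <- cor(x)

# Eigenvalues of the correlation matrix, returned in descending order
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

# Drop from each eigenvalue to the next
ev_diff <- -diff(ev)

# Scree plot and the plot of successive differences
plot(ev, type = "b", xlab = "Index", ylab = "Eigenvalues")
plot(ev_diff, type = "b", xlab = "Index",
     ylab = "Difference of Successive Eigenvalues")
```

The eigenvalues of a correlation matrix sum to the number of items, here 20, which is why that number serves as the total variance to account for.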

Any purely statistical criterion for the number of factors is at best a guideline for the most useful number. The final decision regarding the number of factors also depends on the interpretability of the resulting exploratory factor analysis. For example, if a five-factor solution is more interpretable than a four-factor solution, then that former solution would be preferred. Perhaps an important concept that corresponds to a factor is only represented by a small number of items, and hence the corresponding factor has a relatively small eigenvalue, but because of its substantive importance the factor should still be included in the results.


Figure 11.1 Scree plot for the successive eigenvalues of the 20-item Mach IV correlation matrix.

[Plot: Difference of Successive Eigenvalues (vertical axis) against Index (horizontal axis)]

Figure 11.2 Scree plot for the difference of successive eigenvalues of the 20-item Mach IV correlation matrix.


The scree plot, and its associated plot of successive differences, provide a useful starting point for specifying the number of factors that underlie the correlation matrix, and a required starting point for the exploratory factor analysis. Next proceed with the exploratory factor analysis.

Scenario: Exploratory factor analysis
Given a correlation matrix of the measured variables, such as the items, extract the specified number of factors and then rotate to a correlated factor solution.

The lessR exploratory factor analysis routine is corEFA , abbreviated efa . The specified number of factors is given by the required n.factors argument. There are many factor extraction methods. The standard R system factor analysis function factanal , upon which efa relies, does a maximum likelihood extraction.

Here we proceed with the four-factor maximum likelihood exploratory analysis on the default correlation matrix mycor. The initial, unrotated solution is not displayed by default. To view the initial factors, specify show.initial=TRUE.

show.initial option: If TRUE, show the initial extracted factor solution.

lessR Input: Exploratory factor analysis for a four-factor model

> efa(n.factors=4)

The default rotational method provided by efa is promax, a rotational method that yields correlated, that is, oblique, factors. The underlying latent variables of interest, such as the attitudes of interest, correspond to the extracted factors. The alternative is to rotate the initially extracted factors to an uncorrelated or orthogonal solution. The orthogonal rotation method R provides is varimax. To specify, include rotate="varimax" in the call to efa.

promax rotation: Oblique factor solution.

varimax rotation: Orthogonal factor solution.

rotate option: Can specify a varimax rotation.

The primary output of the exploratory four-factor solution appears in Listing 11.1. R labels the output Loadings:, a somewhat ambiguous term in the factor analysis literature. Here the term refers to the pattern coefficients, the λ's of the underlying measurement model. Loadings with a value close to zero, by default values between −0.2 and 0.2, are replaced by blanks to facilitate interpretation of the more important coefficients.

min.loading option: Specify the minimum loading to display on the output.

The minimum value to display can be changed by specifying a value other than 0.2 for the min.loading option. By default the items are sorted by their highest loading across all the factors, though just the items with loadings of 0.5 or greater are listed first. To suppress sorting, specify sort=FALSE in the call to efa.

sort option: If FALSE then the items are not sorted by highest factor loading.

The four factors in this solution define a four-dimensional space, that is, a coordinate system with four axes, and each item can be plotted in that space. In most social science research, however, the interpretation of this loading matrix is usually not based on the factors as abstract dimensions, but rather on the multi-item scales defined from this loading matrix.
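Because efa relies on the standard R function factanal, a rough base R counterpart of the analysis can be sketched directly. The data here are simulated, not the Mach IV responses. The sketch also assumes that the min.loading and sort options of efa correspond to the cutoff and sort arguments of R's print method for loadings; that correspondence is not stated in the text.

```r
set.seed(1)

# Simulate 20 items, five per factor, from four uncorrelated factors
n <- 500
f <- matrix(rnorm(n * 4), ncol = 4)
lambda <- matrix(0, nrow = 20, ncol = 4)
for (j in 1:4) lambda[((j - 1) * 5 + 1):(j * 5), j] <- 0.7
x <- f %*% t(lambda) + matrix(rnorm(n * 20, sd = sqrt(1 - 0.7^2)), ncol = 20)
colnames(x) <- sprintf("m%02d", 1:20)
R <- cor(x)

# Maximum likelihood extraction of four factors, then an oblique promax rotation
fit <- factanal(covmat = R, factors = 4, rotation = "promax")

# Blank out loadings below 0.2 in absolute value and sort the items by factor
print(fit$loadings, cutoff = 0.2, sort = TRUE)

# The orthogonal alternative
fit_vm <- factanal(covmat = R, factors = 4, rotation = "varimax")
```

With data simulated to have simple structure, each item should show one large loading and blanks elsewhere, which is the pattern sought in real data.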

In Listing 11.1 , we see, for example, that Items m06 , m07 , m10 , m03 and m09 all load primarily on the first rotated factor. The interpretation of the underlying factor, and the associated scale of items, is based on the content of these items. This interpretation is explored in the next section.

Loadings:
Factor1 Factor2 Factor3 Factor4
m06 0.828
m07 0.712
m10 0.539
m05 0.649
m13 0.543 -0.226
m18 0.555 -0.253
m14
m01 0.490
m02
m03 0.422 -0.318
m04
m09 0.323
m12

Listing 11.1 Factor loadings for the rotated four-factor extraction of the correlations of Mach IV items.

The relations of the four factors across all the measured variables are shown in Listing 11.2. Here SS loadings refers to the sum of the squared factor loadings across all of the items for each factor. There can be as many factors as there are items, so the total amount of variance to account for is 20. The proportion of variance accounted for by the sum of squared loadings, Proportion Var, for the first factor follows.

Proportion of Variance for Factor #1: 1.933/20 = 0.097

As can be seen, the sum of squared loadings for the first three factors all approximate 2, whereas there is a drop-off for the fourth factor, a value that only approximates 1. As mentioned previously, however, these indices should not be interpreted too literally, because perhaps the items that tap into the fourth factor are simply fewer in number than the items that tap into the other factors. The issue, again, cannot be reduced solely to statistics, but instead focuses on the interpretability and usefulness of the factors for explaining, in this example, Machiavellianism.

               Factor1 Factor2 Factor3 Factor4
SS loadings      1.933
Proportion Var   0.097
Cumulative Var   0.097

Listing 11.2 The sum of the squared loadings of each factor across the measured variables and the proportion of the total variance accounted for by each factor.
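The three rows of this kind of table follow from the loading matrix by simple arithmetic, sketched below in base R. The small loading matrix is hypothetical and serves only to show the computation; the final line reproduces the one calculation given in the text for the first Mach IV factor.

```r
# Hypothetical loadings: six items (rows) on two factors (columns)
L <- matrix(c(0.8, 0.7, 0.6, 0.1, 0.0, 0.2,
              0.1, 0.0, 0.2, 0.7, 0.6, 0.5), ncol = 2,
            dimnames = list(NULL, c("Factor1", "Factor2")))

ss_loadings    <- colSums(L^2)            # sum of squared loadings per factor
proportion_var <- ss_loadings / nrow(L)   # total variance equals the number of items
cumulative_var <- cumsum(proportion_var)

print(round(rbind(ss_loadings, proportion_var, cumulative_var), 3))

# The calculation from the text for the first Mach IV factor
print(round(1.933 / 20, 3))
```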

Listings 11.1 and 11.2 are the primary output of the exploratory factor analysis. Now the task is to interpret the meaning of the extracted factors, usually in the form of the corresponding multi-item scales that are defined on the basis of the pattern of factor loadings.
