
8.3 Principal Components of Correlation Matrices

Sometimes, instead of computing the principal components from the original data, they are computed from the standardised data, i.e., from the z-scores of the data. This is the procedure followed by SPSS and STATISTICA, and it is related to the factor analysis approach described in the following section. Using the standardised data means that the eigenvalues and eigenvectors are computed from the correlation matrix instead of the covariance matrix (see formula 8.2). The R function princomp has a logical argument, cor, whose value controls whether the correlation or the covariance matrix of the data is used. The results obtained are, in general, different.
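A minimal sketch of the two alternatives in R (using the built-in USArrests data frame purely for illustration; any numeric data frame would do):

    # Principal components from the covariance matrix (the default)
    pc.cov <- princomp(USArrests, cor = FALSE)

    # Principal components from the correlation matrix (standardised data)
    pc.cor <- princomp(USArrests, cor = TRUE)

    summary(pc.cov)            # variances dominated by large-variance variables
    summary(pc.cor)            # more balanced contributions
    unclass(loadings(pc.cor))  # eigenvectors, one per column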

Note that since all diagonal elements of a correlation matrix are 1, we have tr(Λ) = d, the number of variables. Thus, the Guttman-Kaiser criterion amounts, in this case, to selecting the eigenvalues greater than 1.
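In R, the criterion then reduces to a one-line test on the eigenvalues of the correlation matrix (a sketch, again with a generic numeric data frame):

    ev <- eigen(cor(USArrests))$values
    sum(ev)        # equals d, the number of variables
    which(ev > 1)  # components retained by the Guttman-Kaiser criterion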

Using standardised data has the benefit of imposing an equal contribution of the original variables when they have different units or heterogeneous variances.
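The equivalence between the two views (correlation matrix of the raw data, covariance matrix of the z-scores) can be checked directly in R, where scale computes the z-scores:

    z <- scale(USArrests)  # z-scores: zero mean, unit variance
    all.equal(cov(z), cor(USArrests), check.attributes = FALSE)  # TRUE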

Example 8.4

Q: Compare the bivariate principal component analysis of the Rocks dataset (134 cases, 18 variables), using covariance and correlation matrices.

A: Table 8.4 shows the eigenvectors and correlations (called factor loadings in STATISTICA) computed with the original data and the standardised data. The first ones, u1 and u2, are computed with MATLAB or R using the covariance matrix; the second ones, f1 and f2, are computed with STATISTICA using the correlation matrix. Figure 8.6 shows the corresponding pc scores (called factor scores in STATISTICA), that is, the data projections onto the principal components.


We see that when using the covariance matrix, only one eigenvector has dominant correlations with the original variables, namely with the "compression breaking load" variables RMCS and RCSG. These variables are precisely the ones with the highest variance. Note also the dominant values of the first two elements of the u eigenvectors. When using the correlation matrix, the elements of f1 and f2 are more balanced and express the contribution of several original features: f1 is highly correlated with the chemical features, and f2 is highly correlated with density (MVAP), porosity (PAOA) and water absorption (AAPN).

The scatter plot of Figure 8.6a shows that the pc scores obtained with the covariance matrix are unable to discriminate the several groups of rocks; u1 only separates the rock classes into high and low "compression breaking load" groups. On the other hand, the scatter plot of Figure 8.6b shows that the pc scores obtained with the correlation matrix discriminate the rock classes, both in terms of chemical composition (f1 basically discriminates Ca-rich vs. SiO2-rich rocks) and of density-porosity-water absorption features (f2).
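A sketch of how the quantities in Table 8.4 can be obtained in R, assuming the 18 features are in a numeric data frame named rocks (an illustrative name):

    pc.u <- princomp(rocks, cor = FALSE)  # eigenvectors u1, u2, ...
    pc.f <- princomp(rocks, cor = TRUE)   # eigenvectors f1, f2, ...

    # Correlations of each original variable with the first two components
    # ("factor loadings" in STATISTICA terminology)
    r.u <- cor(rocks, pc.u$scores[, 1:2])
    r.f <- cor(rocks, pc.f$scores[, 1:2])
    cbind(unclass(loadings(pc.u))[, 1:2], r.u,
          unclass(loadings(pc.f))[, 1:2], r.f)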

Table 8.4. Eigenvectors of the rock dataset computed from the covariance matrix (u1 and u2) and from the correlation matrix (f1 and f2), with the respective correlations. Correlations above 0.7 are shown in bold.

             u1       u2       r1       r2       f1       f2       r1       r2
    RMCS   -0.695    0.487   -0.983    0.136   -0.079    0.018   -0.569    0.057
    RCSG   -0.714   -0.459   -0.984   -0.126   -0.069    0.034   -0.499    0.105
    RMFX   -0.013   -0.489   -0.078   -0.606   -0.033    0.053   -0.237    0.163
    MVAP   -0.015   -0.556   -0.089   -0.664   -0.034    0.271   -0.247    0.839
    AAPN    0.000    0.003    0.251    0.399    0.046   -0.293    0.331   -0.905
    PAOA    0.001    0.008    0.241    0.400    0.044   -0.294    0.318   -0.909
    CDLT    0.001   -0.005    0.240   -0.192    0.001    0.177    0.005    0.547
    RDES    0.002   -0.002    0.523   -0.116    0.070   -0.101    0.503   -0.313
    RCHQ   -0.002   -0.028   -0.060   -0.200   -0.095    0.042   -0.689    0.131
    SiO2   -0.025    0.046   -0.455    0.169   -0.129   -0.074   -0.933
    Al2O3  -0.004    0.001   -0.329    0.016   -0.129   -0.069   -0.932
    Fe2O3  -0.001   -0.006   -0.296   -0.282   -0.111   -0.028   -0.798
    MnO    -0.000   -0.000   -0.252   -0.039   -0.090   -0.011   -0.647   -0.034
    CaO     0.020   -0.025    0.464   -0.113    0.132    0.073    0.955
    MgO    -0.003   -0.007   -0.393   -0.226   -0.024    0.025   -0.175    0.078
    Na2O   -0.001    0.004   -0.428    0.236   -0.119   -0.071   -0.856
    K2O    -0.001    0.005   -0.320    0.267   -0.117   -0.084   -0.845
    TiO2   -0.000   -0.000   -0.152   -0.097   -0.088   -0.026   -0.633   -0.079


Figure 8.6. The rock dataset analysed with principal components computed from the covariance matrix (a) and from the correlation matrix (b).

Example 8.5

Q: Consider the three classes of the Cork Stoppers’ dataset (150 cases). Evaluate the training set error for linear discriminant classifiers using the 10 original features and one or two principal components of the data correlation matrix.

A: The classification matrices, using the linear discriminant procedure described in Chapter 6, are shown in Table 8.5. We see that the dimensionality reduction didn't significantly degrade the training set error. The first principal component, F1, alone accounts for more than 86% of the total variance; adding the second principal component, F2, 94.5% of the total data variance is explained. Principal component F1 has a distribution that is well approximated by the normal distribution (Shapiro-Wilk p = 0.69, 0.67 and 0.33 for classes 1, 2 and 3, respectively). For principal component F2, the approximation is worse for the first class (Shapiro-Wilk p = 0.09, 0.95 and 0.40 for classes 1, 2 and 3, respectively).

A classifier with only one or two features has, of course, a better dimensionality ratio and is capable of better generalisation. It is left as an exercise to compare the cross-validation results for the three feature sets.
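A sketch of this comparison in R with MASS::lda, assuming the 10 features are in a data frame cork and the class labels in a factor grp (both names are illustrative):

    library(MASS)

    # Scores of the principal components of the correlation matrix
    scores <- princomp(cork, cor = TRUE)$scores

    # Training set (resubstitution) error of a linear discriminant classifier
    train.error <- function(x, grp) {
      pred <- predict(lda(x, grouping = grp))$class
      mean(pred != grp)
    }

    train.error(cork, grp)                       # all 10 original features
    train.error(scores[, 1:2], grp)              # F1 and F2
    train.error(scores[, 1, drop = FALSE], grp)  # F1 alone

    # Normality of F1 within class 1 (cf. the Shapiro-Wilk p-values above)
    shapiro.test(scores[grp == 1, 1])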

Table 8.5. Classification matrices for the cork stoppers dataset. Correct classifications are along the rows (50 cases per class).

              10 Features            F1 and F2                F1
           ω1    ω2    ω3        ω1    ω2    ω3        ω1    ω2    ω3
    ω1     45     5     0        46     4     0        47     3     0
    ω2      7    42     1        11    39     0        10    40     0
    ω3      0     4    46         0     5    45         0     5    45
    Pe    10%   16%    8%        8%   22%   10%        6%   20%   10%

Example 8.6

Q: Compute the main principal components for the two first classes of the Cork Stoppers’ dataset, using standardised data. Select the principal components using the Guttman-Kaiser criterion. Determine the respective correlations with each original variable and interpret the results.

A: Figure 8.7a shows the eigenvalues computed with STATISTICA. The first two eigenvalues comply with the Guttman-Kaiser criterion (note that the sum of all the eigenvalues is 10, the number of variables).

The factor loadings of the two main principal components are shown in Figure 8.8a, with significant values in bold. A plot of these factor loadings is shown in Figure 8.8b. It is clearly visible that the first principal component, F1, is highly correlated with all cork-stopper features except N, and the opposite happens with F2. These observations suggest, therefore, that the description (or classification) of the two cork-stopper classes can be achieved either with F1 and F2, or with feature N and one of the other features, namely the most highly correlated feature, PRTG (total perimeter of the big defects).

Furthermore, we see that the only significant correlation with F2 is smaller than any of the significant correlations with F1. Thus, F1 or PRTG alone describes most of the data, as suggested by the scatter plot of pc scores in Figure 8.7b.

When analysing grouped data with principal components, as we did in the previous Examples 8.4 and 8.6, one often wants to determine the most important variables as well as the data groups that best reflect the behaviour of those variables.

Figure 8.7. Dimensionality reduction of the first two classes of cork-stoppers: a) Eigenvalues; b) Principal component scatter plot (compare with Figure 6.5). (Both graphs obtained with STATISTICA.)

Consider the means of variable F1 in Example 8.6: 0.71 for class 1 and −0.71 for class 2 (see Figure 8.7b). As expected, given the translation y = x − x̄, the means are symmetrically located around F1 = 0. Moreover, by visual inspection, we see that the class 1 cases cluster in a high-F1 region and the class 2 cases cluster in a low-F1 region. Notice that since the scatter plot of Figure 8.7b uses the projections of the standardised data onto the F1-F2 plane, the cases tend to cluster around the (1, 1) and (−1, −1) points in this plane.

Figure 8.8. Factor loadings table (a) with significant correlations in bold and graph (b) for the first two classes of cork-stoppers, obtained with STATISTICA.


In order to analyse this issue in further detail, let us consider the simple dataset shown in Figure 8.9a, consisting of normally distributed bivariate data generated with (true) mean µ0 = [3 3]' and the following (true) covariance matrix:

    Σ0 = | 5  3 |
         | 3  2 |

Figure 8.9b shows this dataset after standardisation (subtraction of the mean and division by the standard deviation). The standardised data has unit variance along both variables, and the new covariance is σ12 = σ21 = 3/(√5 √2) = 0.9487; hence the new covariance matrix is:

    Σ = |   1       0.9487 |
        | 0.9487      1    |

The eigenvalues and eigenvectors of Σ (computed with the MATLAB function eig) are:

    Λ = | 1.9487    0    |,    U = | 0.7071    0.7071 |
        |   0     0.0513 |         | 0.7071   -0.7071 |

Note that tr(Λ) = 2, the total variance, and that the first principal component explains 97% of the total variance. Figure 8.9c shows the standardised data projected onto the new system of variables, F1 and F2.

Let us now consider a group of data with mean m0 = [4 4]' and a one-standard-deviation boundary corresponding to the ellipse shown in Figure 8.9a, with sx = √5/2 and sy = √2/2, respectively. The mean vector maps onto m = m0 − µ0 = [1 1]'; given the values of the standard deviation, the ellipse maps onto a circle of radius 0.5 (Figure 8.9b). This same group of data is shown in the F1-F2 plane (Figure 8.9c) with mean:

    w = U' [1/√5  1/√2]' = [0.82  −0.18]'

Figure 8.9d shows the correlations of the principal components with the original variables, computed with formula 8.9:

    r_F1,X = r_F1,Y = 0.987;    r_F2,X = −r_F2,Y = 0.16.

These correlations always lie inside a unit-radius circle. Equal-magnitude correlations occur when the original variables are perfectly uncorrelated; then λ1 = λ2 = 1 and the correlations are r_F1,X = r_F1,Y = 1/√2 (apply formula 8.9).
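The whole walkthrough can be reproduced numerically in R (a sketch under the stated assumptions; MASS::mvrnorm draws the bivariate normal sample and the seed is arbitrary):

    library(MASS)

    set.seed(42)
    S0 <- matrix(c(5, 3, 3, 2), 2, 2)    # true covariance matrix
    x  <- mvrnorm(200, mu = c(3, 3), Sigma = S0)

    z <- scale(x)                        # standardised data (Figure 8.9b)
    e <- eigen(cor(x))                   # decomposition of the correlation matrix
    e$values                             # approx. 1.95 and 0.05; tr = 2
    e$values[1] / sum(e$values)          # approx. 0.97 of the total variance

    f <- z %*% e$vectors                 # projections onto F1-F2 (Figure 8.9c)

    # Correlations with F1 and F2 (formula 8.9): for standardised data,
    # each correlation is an eigenvector element times sqrt(eigenvalue)
    e$vectors * rep(sqrt(e$values), each = 2)  # approx. +/-0.987 and +/-0.16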


In the case of Figure 8.9d, we see that F1 is highly correlated with the original variables, whereas F2 is weakly correlated. At the same time, a data group lying in the “high region” of X and Y tends to cluster around the F1 = 1 value after projection of the standardised data. We may superimpose these two different graphs – the pc scores graph and the correlation graph – in order to facilitate the interpretation of the data groups that exhibit some degree of correspondence with high values of the variables involved.

Figure 8.9. Principal component transformation of a bivariate dataset: a) Original data with a group delimited by an ellipse; b) Standardised data with the same group (delimited by a circle); c) Standardised data projected onto the F1-F2 plane; d) Plot of the correlations (circles) of the original variables with F1 and F2.

Example 8.7

Q: Consider the Rocks’ dataset, a sample of 134 rocks classified into five classes (1=“granite”, 2=“diorite”, 3=“marble”, 4=“slate”, 5=“limestone”) and characterised by 18 features (see Appendix E). Use the two main principal components of the data in order to interpret it.


A: Only the first four eigenvalues satisfy the Guttman-Kaiser criterion. The first two eigenvalues account for about 58% of the total variance; therefore, when discarding the remaining eigenvalues, we are discarding a substantial amount of the information in the dataset (see Exercise 8.12).

We can conveniently interpret the data by using a graphic display of the standardised data projected onto the plane of the first two principal components, say F1 and F2, superimposed over the correlation plot. In STATISTICA, this overlaid graphic display can be obtained by first creating a datasheet with the projections ("factor scores") and the correlations ("factor loadings"). For this purpose, we first extract the scrollsheet of the "factor scores" (right-click over the corresponding "factor scores" sheet in the workbook and select Extract as stand alone window). We then append the factor loadings to the same F1 and F2 columns and create a grouping variable that labels the data classes and the original variables. Finally, a scatter plot with all the information, as shown in Figure 8.10, is obtained.

By visual inspection of Figure 8.10, we see that F1 has high correlations with chemical features, i.e., reflects the chemical composition of the rocks. We see, namely, that F1 discriminates between the silica-rich rocks such as granites and diorites from the lime-rich rocks such as marbles and limestones. On the other hand, F2 reflects physical properties of the rocks, such as density (MVAP), porosity (PAOA) and water absorption (AAPN). F2 discriminates dense and compact rocks (e.g. marbles) from less dense and more porous counterparts (e.g. some limestones).
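In R, a similar overlay of pc scores and variable directions can be obtained with the standard biplot function (a sketch; rocks again denotes the assumed data frame of the 18 features):

    pc <- princomp(rocks, cor = TRUE)
    biplot(pc, choices = 1:2)  # scores overlaid with the variable loadings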


Figure 8.10. Partial view of the standardised rock dataset projected onto the F1-F2 principal component plane, overlaid with the correlation plot.

8.4 Factor Analysis