8.2 Dimensional Reduction

When using principal component analysis for dimensional reduction, one must decide how many components (and corresponding variances) to retain. There are several criteria published in the literature to consider. The following are commonly used:

1. Select the principal components that explain a certain percentage (say, 95%) of tr(Λ). This is a very simplistic criterion that is not recommended.

2. The Guttman-Kaiser criterion discards eigenvalues below the average tr(Λ)/d (below 1 for standardised data), which amounts to retaining only the components that account for more variance than one variable would contribute if the total variance were equally distributed.

3. The so-called scree test uses a plot of the eigenvalues (scree plot), discarding those starting where the plot levels off.

4. A more elaborate criterion is based on the so-called broken stick model. This criterion discards the eigenvalues whose proportion of explained variance is smaller than the expected length l_k of the kth longest segment of a unit-length stick randomly broken into d segments:

l_k = (1/d) Σ_{i=k}^{d} (1/i).

A table of l_k values is given in Tools.xls.

5. Bartlett’s test is based on assessing whether the null hypothesis that the last d − q eigenvalues are equal, λ_{q+1} = λ_{q+2} = … = λ_d, can be accepted. The mathematics of this test is intricate (see Jolliffe IT, 2002, for a detailed discussion) and its results are often unreliable. We pay no further attention to this procedure.

6. The Velicer partial correlation procedure uses the partial correlations among the original variables when one or more principal components are removed. Let S_k represent the remaining covariance matrix when the covariance of the first k principal components is removed:

S_k = S − Σ_{i=1}^{k} λ_i u_i u_i';  k = 0, 1, …, d.   8.13

Using the diagonal matrix D_k of S_k, containing the variances, we compute the correlation matrix:

R_k = D_k^{−1/2} S_k D_k^{−1/2}.

Finally, with the elements r_{ij(k)} of R_k we compute the following quantity:

f_k = Σ_{i≠j} r_{ij(k)}^2 / [d(d − 1)].

The f_k values are the sums of the squared partial correlations when the first k principal components are removed. As long as f_k decreases, the partial covariances decline faster than the residual variances. Usually, after an initial decrease, f_k starts to increase, reflecting the fact that with the removal of the main principal components we are obtaining increasingly correlated “noise”. The k value corresponding to the first minimum of f_k is then used as the stopping rule. The Velicer procedure can be applied using the velcorr function implemented in MATLAB and R and available in Tools (see Appendix F).
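As a sketch, the Velicer procedure described above can be implemented with NumPy. This is not the velcorr function mentioned in the text; the function name is illustrative, and the global minimum of f_k is used here as a stand-in for its first minimum.

```python
import numpy as np

def velicer_map(X):
    """Sketch of Velicer's partial correlation procedure (not the book's velcorr).

    Returns the f_k values for k = 0, ..., d-1 and the k of the minimum f_k,
    used here as a stand-in for the first minimum.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    S = np.cov(X, rowvar=False)              # sample covariance matrix
    lam, U = np.linalg.eigh(S)               # eigenvalues and eigenvectors
    order = np.argsort(lam)[::-1]            # sort in descending order
    lam, U = lam[order], U[:, order]

    f = np.empty(d)
    for k in range(d):
        # Equation 8.13: S_k = S - sum_{i<=k} lam_i u_i u_i'
        Sk = S - (U[:, :k] * lam[:k]) @ U[:, :k].T
        dk = np.sqrt(np.diag(Sk))            # residual standard deviations
        Rk = Sk / np.outer(dk, dk)           # partial correlation matrix
        off = Rk - np.diag(np.diag(Rk))      # keep off-diagonal elements only
        f[k] = np.sum(off**2) / (d * (d - 1))
    return f, int(np.argmin(f))
```

On data dominated by a few strong components, f_k typically falls while structured components are being removed and then rises once only increasingly correlated noise remains.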

Example 8.3

Q: Using all the previously described criteria, determine the number of principal components for the Cork Stoppers’ dataset (150 cases, 10 variables) that should be retained and assess their contribution.

A: Table 8.2 shows the computed eigenvalues of the cork-stopper dataset. Figure 8.5a shows the scree plot and Figure 8.5b shows the evolution of Velicer’s f_k. Finally, Table 8.3 compares the number of retained principal components for the several criteria and the respective percentages of explained variance. The Velicer procedure, which is highly recommended, indicates 3 as the appropriate number of principal components to retain.

Table 8.2. Eigenvalues of the cork-stopper dataset computed with MATLAB (a scale factor of 10^4 has been removed).

Table 8.3. Comparison of dimensional reduction criteria (Example 8.3).

Criterion            k    Explained variance
Percentage (95%)     3    96.5%
Guttman-Kaiser       1    83.7%
Scree test           3    96.5%
Broken stick         1    83.7%
Velicer              3    96.5%

Figure 8.5. Assessing the dimensional reduction to be performed in the cork-stopper dataset with: a) Scree plot; b) Velicer partial correlation plot. Both plots obtained with MATLAB.
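For completeness, here is a minimal sketch of how the simpler stopping rules above (percentage of explained variance, Guttman-Kaiser, broken stick) can be applied directly to a list of eigenvalues. The function names are illustrative assumptions, not part of the Tools package.

```python
import numpy as np

def broken_stick(d):
    """Expected lengths l_k of a unit stick randomly broken into d segments."""
    return np.array([np.sum(1.0 / np.arange(k, d + 1)) / d
                     for k in range(1, d + 1)])

def retained_components(eigvals, pct=0.95):
    """Number of components retained by three simple criteria (a sketch)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    d = lam.size
    prop = lam / lam.sum()                       # proportions of tr(Lambda)
    # Percentage rule: smallest k whose cumulative proportion reaches pct.
    k_pct = min(int(np.searchsorted(np.cumsum(prop), pct)) + 1, d)
    # Guttman-Kaiser: discard eigenvalues below the average tr(Lambda)/d.
    k_gk = int(np.sum(lam > lam.mean()))
    # Broken stick: discard from the first component whose proportion
    # falls below the expected segment length l_k.
    below = prop < broken_stick(d)
    k_bs = int(np.argmax(below)) if below.any() else d
    return {"percentage": k_pct, "guttman_kaiser": k_gk, "broken_stick": k_bs}
```

For example, retained_components([10, 5, 2, 1, 0.5, 0.3, 0.2]) retains 5 components by the 95% rule but only 2 by the Guttman-Kaiser and broken stick rules, illustrating how much these criteria can disagree.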