
11.4 The Scale Score

Items on a survey may be intended to measure attitudes such as Machiavellianism or Self-Esteem. Measurement of these attitudes is usually accomplished with groups of items that define scales, even though the survey itself may present the items from all the scales in a randomized order.

The sum or average of the responses for the items on a scale for each person is referred to as the total score, composite score, or scale score.

total score: The sum or average of all the items on a multi-item scale.

Item analysis is the analysis of the relations between the items and the scale scores to guide the construction of the multi-item scales. Both the item responses and the scale scores can be directly observed, for example, by computing a scale score for each person who took an attitude survey. Contrast these scale scores with the factors from the factor analysis (Section 11.2), which are inferred abstractions, latent variables instead of observed variables.
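For example, a minimal base-R sketch of this computation, assuming the Mach IV item responses reside in the data frame d; the F1 item names are taken from the scales call later in this section:

> # observed scale score: sum of each person's responses to the F1 items
> d$F1 <- rowSums(d[, c("m03","m06","m07","m09","m10")])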

Accomplish a more sophisticated item analysis with confirmatory factor analysis, as discussed in Section 11.5. The more traditional item analysis described in this section, however, can be considered optional, somewhat tangential to the more sophisticated confirmatory factor analyses. Conceptually, this analysis of the observed responses and scores complements the focus on abstract factors in an exploratory, and also a confirmatory, factor analysis.

A primary role of exploratory factor analysis in item analysis is the construction of scales based on the factor loadings of the items. There is one scale for each factor. Each item is generally put into the scale on which it has its highest factor loading, unless the absolute value of that loading is less than some minimum value such as 0.2. To facilitate this scale construction process, the last part of the efa output defines the corresponding scales and then constructs the code, to copy and paste into R, to run the confirmatory factor analysis of the resulting scales.
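As a sketch only, such an efa call might look as follows. The name mycor for the item correlation matrix and the n_factors parameter name are assumptions here, not taken from this section; min.loading is shown at its default value:

> # hypothetical call; argument names other than min.loading are assumptions
> # the last part of the output defines the scales and generates the
> # matching confirmatory factor analysis code
> efa(mycor, n_factors=4, min.loading=0.2)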

Scenario  Analyze the reliability of a total score and the item–total correlations

Item analysis procedures evaluate the relation of an item to the overall scale of which the item is a part. The scale score is computed as the sum of the corresponding item scores. Obtain these observed item–scale correlations, the scale–scale correlations, and the reliability of each scale score.

The corresponding scales are shown in Listing 11.4. Here, however, we do not pursue the confirmatory analysis, but only analyze the scale scores. Replace the unmodified call to the lessR confirmatory factor analysis function, cfa, with the lessR function scales. The scales function is an abbreviation for a call to cfa with parameter settings to provide for the analysis of the observed scale scores in place of their unobserved, latent counterparts.

scales function: A call to cfa for the analysis of the observed scale scores.

lessR Input  Analysis of four scales from the exploratory factor analysis

> scales(F1=c(m03,m06,m07,m09,m10), F2=c(m01,m05,m08,m12,m13,m18),
+        F3=c(m04,m14,m17,m20), F4=c(m02,m11,m15,m16))


In this example, only m19 was not included in any of the derived multi-item scales, because its highest factor loading from the exploratory factor analysis was below the minimum absolute value of min.loading=0.2, the default value of efa. Of course, increasing the value of min.loading will omit more items from the resulting scales.

11.4.1 Item and Scale Correlations

A person's score on a multi-item scale is the sum of his or her responses to the items that define the scale. The correlation of an item with its own scale score is an item–total correlation. The call to scales provides the correlations of the items with all of the scale scores, shown in Figure 11.3. The item–total correlation of each item with its own scale is highlighted in Figure 11.3.

item–total correlation: Correlation of an item with its own scale score.
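In terms of the observed data, an item–total correlation is an ordinary Pearson correlation. A minimal sketch, assuming the F1 scale score was computed as the item sum shown earlier:

> # uncorrected item-total correlation of m07 with its own scale;
> # compare with the .75 entry for m07 in Figure 11.3
> cor(d$m07, d$F1)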

      m07  m06  m10  m09  m03  m01  m05  m12  m13  m18  m08  m17  m20  m14  m04  m16  m11  m15  m02
F1    .75  .73  .67  .60  .54  .11  .08  .18  .16  .16  .08  .17  .15  .17  .17  .25  .28  .21  .16
F2    .23  .23  .20  .21 -.11  .60  .58  .57  .56  .53  .49  .20  .27  .01  .20  .17  .26  .22  .22
F3    .20  .13  .22  .21  .12  .17  .08  .29  .13  .07  .18  .61  .61  .60  .60  .17  .25  .02  .12
F4    .30  .32  .29  .30  .09  .15  .23  .27  .19  .26  .16  .07  .13  .22  .19  .58  .57  .56  .56

Figure 11.3 Item–total correlations from the function scales. The uncorrected correlation of each item with its own scale appears in the row of the scale that contains the item.

One property of these own-scale item–total correlations, the correlation of an item with its own total score, is an upward bias. The reason for this bias is that the item response is by definition part of the total score and so must necessarily correlate at least to some extent with the total score even if the item correlates zero with all the other items on the multi-item scale. These item–total correlations are usually more useful when the item correlates not with the entire total score, but with the total score calculated without the contribution of the item of interest. This modified correlation is a part–whole correlation or the corrected correlation. A more sophisticated version of these corrected correlations is presented later, in the form of the correlation of an item with the underlying factor instead of the observed total score.

part–whole correlation: Correlation of an item with the scale without the item's influence.
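The corrected version follows directly from the definition: remove the item's contribution from its own total before correlating. A minimal sketch under the same assumptions as above:

> # part-whole (corrected) correlation: m07 versus its own
> # scale score computed without m07
> cor(d$m07, d$F1 - d$m07)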

The correlations of the scales with each other are also part of the scales output, here shown in Listing 11.3 as a correlation matrix (Section 8.3).

     F1   F2   F3   F4
F1 1.00 0.23 0.27 0.40
F2 0.23 1.00 0.28 0.38
F3 0.27 0.28 1.00 0.25
F4 0.40 0.38 0.25 1.00

Listing 11.3 Correlation matrix of the scale scores.

Here the correlation of Scale F1 with Scale F2, 0.23, is the observed correlation between the two scale scores. Later we obtain the underlying factor correlation in place of this observed, that is, directly calculated, correlation. The true factor correlation has measurement error purged, the value corrected for attenuation due to measurement error, and so is larger than the value observed here. With the item uniquenesses remaining in the analysis, the correlations are between the scale scores and not between abstract factors, though scales still uses the F notation as an abbreviation for Factor instead of something like S for Scale.
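For reference, the classical correction for attenuation, which is not part of the scales output, estimates this error-free correlation from the observed correlation and the two scale reliabilities. With the Alpha values reported in Listing 11.4 below:

$$ r_{F_1F_2}^{\text{corrected}} = \frac{r_{F_1F_2}}{\sqrt{\alpha_{F_1}\,\alpha_{F_2}}} = \frac{0.23}{\sqrt{(0.676)(0.550)}} \approx 0.38 $$

The factor correlation from the confirmatory analysis need not equal this value exactly, but the direction of the adjustment, upward from 0.23, is the same.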


11.4.2 Scale Reliability

A fundamental concept of measurement is that of reliability, the consistency of the measured values from measurement to measurement. For example, suppose one morning you wish to measure your weight. You step on the bathroom scale and it registers 148 lbs. Then suppose you step off the scale, wait a few moments for the scale to re-set to 0, and then step back on the same scale. Now it registers 157 lbs. Your weight has not changed, which means that the measuring device, in this case the bathroom scale for measuring weight, is unreliable.

reliability: Consistency of repeated measures.

What is the purpose of measuring attitudes with multi-item scales instead of a single item? One reason is that the multi-item scale score, the composite score, is generally more reliable than any of the individual items that define the scale. A primary reason for the use of scale scores in psychological measurement is the relatively large amount of error in individual responses. That is, each item by itself may be relatively unreliable. These composite scores provide a level of analysis intermediate between the responses to the individual items and the factors from a factor analysis.

Measurements at two different points in time provide the data to assess the reliability of the bathroom scale. The analogy for social measurement would be to administer an attitude survey, wait for enough time to pass so that the responses are not memorized but the underlying attitudes have not changed, and then re-administer the survey. Administering a measuring instrument at two different times to assess the consistency of responses forms the basis for test–retest reliability.

test–retest reliability: Correlation of repeated measures.

Usually the opportunity to administer a survey multiple times does not exist. How, then, to assess consistency over multiple measures at one time period? Assess consistency over multiple items that are all on the same scale to obtain the internal-consistency reliability. The most well-known index of internal-consistency reliability is Coefficient Alpha, introduced by Cronbach (1951) and sometimes called Cronbach's Alpha. These reliability estimates appear in Listing 11.4.

internal-consistency reliability: Based on the correlation of items on the same scale administered at the same time.
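For a scale of k items, Coefficient Alpha is computed from the item variances and the variance of the total score:

$$ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_{\text{total}}^2}\right) $$

where $\sigma_i^2$ is the variance of item $i$ and $\sigma_{\text{total}}^2$ is the variance of the scale score, so Alpha increases as the items correlate more highly with each other and as the number of items grows.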

Reliability Analysis
--------------------

Scale  Alpha
------------
   F1  0.676
   F2  0.550
   F3  0.423
   F4  0.300

Listing 11.4 Scale reliabilities for the four scales that correspond to the grouping of the items that define the underlying multiple-indicator measurement model.

All of these four scales have low reliabilities. A rough heuristic is to have a scale reliability of at least 0.7 or 0.8, so even the scale with the highest reliability, F1 at 0.676, falls short of this goal. To increase the reliability of the scale score, increase the number of items that assess more or less the same underlying attitude. These low scale reliabilities are not surprising because the Mach IV scale was not written as four separate scales, but as a single scale. Future work, then, would improve the reliability of these subscales by increasing the number of items on each scale. First, however, we wish to refine the scales further with confirmatory factor analysis.
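The Spearman-Brown prophecy formula, not part of the scales output but standard in this context, quantifies the gain from adding comparable items. For example, doubling the length of scale F1 (a lengthening factor of k = 2) projects a reliability of

$$ \alpha_{\text{new}} = \frac{k\,\alpha}{1 + (k-1)\,\alpha} = \frac{2(0.676)}{1 + 0.676} \approx 0.81 $$

which would satisfy the 0.8 heuristic, assuming the added items perform as well as the originals.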
