Rank Correlation Coefficient
16.7 Rank Correlation Coefficient
In Chapter 11, we used the sample correlation coefficient r to estimate the population correlation coefficient ρ, which measures the linear relationship between two continuous variables X and Y. If ranks 1, 2, ..., n are assigned to the x observations in order of magnitude and similarly to the y observations, and if these ranks are then substituted for the actual numerical values in the formula for the correlation coefficient in Chapter 11, we obtain the nonparametric counterpart of the conventional correlation coefficient. A correlation coefficient calculated in this manner is known as the Spearman rank correlation coefficient and is denoted by r_s. When there are no ties among either set of measurements, the formula for r_s reduces to a much simpler expression involving the differences d_i between the ranks assigned to the n pairs of x's and y's, which we now state.
Rank Correlation Coefficient

A nonparametric measure of association between two variables X and Y is given by the rank correlation coefficient

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},$$

where d_i is the difference between the ranks assigned to x_i and y_i and n is the number of pairs of data.
In practice, the preceding formula is also used when there are ties among either the x or y observations. The ranks for tied observations are assigned as in the signed-rank test by averaging the ranks that would have been assigned if the observations were distinguishable.
The value of r_s will usually be close to the value obtained by finding r based on numerical measurements and is interpreted in much the same way. As before, the value of r_s will range from −1 to +1. A value of +1 or −1 indicates perfect association between X and Y, the plus sign occurring for identical rankings and the minus sign occurring for reverse rankings. When r_s is close to zero, we conclude that the variables are uncorrelated.
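The computation described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names `average_ranks` and `spearman_rs` are hypothetical, tied values receive the average of the ranks they would otherwise occupy, and the result uses the formula r_s = 1 − 6 Σ d_i² / (n(n² − 1)).

```python
def average_ranks(values):
    """Rank values from 1 (smallest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Rank correlation from the rank differences d_i (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Made-up data: identical orderings give +1, reversed orderings give -1.
print(spearman_rs([1, 2, 3, 4], [10, 20, 30, 40]))
print(spearman_rs([1, 2, 3, 4], [40, 30, 20, 10]))
```

Averaging tied ranks keeps the sum of the ranks equal to n(n + 1)/2, which is why the same formula is commonly applied with or without ties.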
Example 16.8: The figures listed in Table 16.7, released by the Federal Trade Commission, show the milligrams of tar and nicotine found in 10 brands of cigarettes. Calculate the rank correlation coefficient to measure the degree of relationship between tar and nicotine content in cigarettes.
Table 16.7: Tar and Nicotine Contents
Cigarette Brand    Tar Content    Nicotine Content
Chesterfield
Old Gold
Philip Morris
Players            31             2.0

Solution: Let X and Y represent the tar and nicotine contents, respectively. First we assign ranks to each set of measurements, with the rank of 1 assigned to the lowest number in each set, the rank of 2 to the second lowest number in each set, and so forth, until the rank of 10 is assigned to the largest number. Table 16.8 shows the individual rankings of the measurements and the differences in ranks for the 10 pairs of observations.
Table 16.8: Rankings for Tar and Nicotine Content
Cigarette Brand
Chesterfield
Old Gold
Philip Morris
Substituting into the formula for r_s, we find that

$$r_s = 1 - \frac{(6)(5.50)}{(10)(100 - 1)} = 0.967,$$

indicating a high positive correlation between the amounts of tar and nicotine found in cigarettes.
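The arithmetic of this example can be checked in a few lines of Python. The sketch takes the sum of squared rank differences, 5.50, and n = 10 from the example; the variable names are illustrative only.

```python
n = 10          # number of cigarette brands in Example 16.8
sum_d2 = 5.50   # sum of squared rank differences from the example

r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(round(r_s, 3))  # 0.967
```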
Some advantages to using r_s rather than r do exist. For instance, we no longer assume the underlying relationship between X and Y to be linear and therefore, when the data possess a distinct curvilinear relationship, the rank correlation coefficient will likely be more reliable than the conventional measure. A second advantage to using the rank correlation coefficient is the fact that no assumptions of normality are made concerning the distributions of X and Y. Perhaps the greatest advantage occurs when we are unable to make meaningful numerical measurements but nevertheless can establish rankings. Such is the case, for example, when different judges rank a group of individuals according to some attribute. The rank correlation coefficient can be used in this situation as a measure of the consistency of the two judges.
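The first advantage can be illustrated with a small Python sketch. The data are made up for illustration (y = x³ for x = 1, ..., 10), and the helper names `pearson_r` and `simple_ranks` are hypothetical. Because the relationship is increasing but not linear, the ranks of x and y agree exactly while the conventional coefficient r falls below 1. Here r_s is obtained by substituting the ranks into the ordinary correlation formula, which is valid for this data because there are no ties.

```python
import math


def pearson_r(x, y):
    """Conventional sample correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ranks(values):
    """Ranks 1..n by magnitude; this helper assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


x = list(range(1, 11))      # illustrative data, not from the text
y = [v ** 3 for v in x]     # increasing but curvilinear in x

r = pearson_r(x, y)
r_s = pearson_r(simple_ranks(x), simple_ranks(y))
print(f"r = {r:.3f}, r_s = {r_s:.3f}")
```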
To test the hypothesis that ρ = 0 by using a rank correlation coefficient, one needs to consider the sampling distribution of the r_s-values under the assumption of no correlation. Critical values for α = 0.05, 0.025, 0.01, and 0.005 have been calculated and appear in Table A.21. The setup of this table is similar to that of the table of critical values for the t-distribution except for the left column, which now gives the number of pairs of observations rather than the degrees of freedom. Since the distribution of the r_s-values is symmetric about zero when ρ = 0, the r_s-value that leaves an area of α to the left is equal to the negative of the r_s-value that leaves an area of α to the right. For a two-sided alternative hypothesis, the critical region of size α falls equally in the two tails of the distribution. For the one-sided alternative ρ < 0, the critical region lies entirely in the left tail of the distribution, and for the one-sided alternative ρ > 0, it lies entirely in the right tail.
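For small n, the sampling distribution of r_s under no correlation can be enumerated directly. The sketch below assumes no ties and assumes that, under the null hypothesis, every ordering of the y ranks is equally likely. The function names are hypothetical, and the code is not claimed to reproduce Table A.21. Because it loops over all n! orderings, it is practical only for small n.

```python
from itertools import permutations


def rs_from_ranks(rank_x, rank_y):
    """r_s from two rank lists without ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def exact_upper_tail(rs_observed, n):
    """P(r_s >= rs_observed) when all n! orderings of the y ranks are equally likely."""
    base = tuple(range(1, n + 1))
    count = 0
    total = 0
    for perm in permutations(base):
        total += 1
        # Small tolerance guards against floating-point rounding at the boundary.
        if rs_from_ranks(base, perm) >= rs_observed - 1e-12:
            count += 1
    return count / total


# With n = 5, only the identical ranking and the four adjacent swaps give r_s >= 0.9.
print(exact_upper_tail(0.9, 5))
```

By the symmetry of the distribution about zero, the corresponding left-tail probability for −rs_observed is the same.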
Example 16.9: Refer to Example 16.8 and test the hypothesis that the correlation between the amounts of tar and nicotine found in cigarettes is zero against the alternative that it is greater than zero. Use a 0.01 level of significance.
Solution:
1. H0: ρ = 0.
2. H1: ρ > 0.
3. α = 0.01.
4. Critical region: r_s > 0.745 from Table A.21.
5. Computations: From Example 16.8, r_s = 0.967.
6. Decision: Reject H0 and conclude that there is a significant correlation between the amounts of tar and nicotine found in cigarettes.

Under the assumption of no correlation, it can be shown that the distribution of the r_s-values approaches a normal distribution with a mean of 0 and a standard deviation of 1/√(n − 1) as n increases. Consequently, when n exceeds the values given in Table A.21, one can test for a significant correlation by computing

$$z = \frac{r_s - 0}{1/\sqrt{n - 1}} = r_s \sqrt{n - 1}$$

and comparing with critical values of the standard normal distribution shown in Table A.3.
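The large-sample procedure amounts to standardizing r_s by its null standard deviation 1/√(n − 1), that is, z = r_s √(n − 1). A minimal Python sketch follows. The function names are hypothetical, the values n = 40 and r_s = 0.35 are made up for illustration, and the upper-tail probability is computed from the standard normal distribution with `math.erfc` rather than read from a table.

```python
import math


def spearman_z(r_s, n):
    """Standardize r_s using mean 0 and standard deviation 1/sqrt(n - 1)."""
    return r_s * math.sqrt(n - 1)


def upper_tail_p(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Illustrative values only: 40 pairs with r_s = 0.35, testing against rho > 0.
z = spearman_z(0.35, 40)
p = upper_tail_p(z)
print(f"z = {z:.3f}, one-sided P-value = {p:.4f}")
```

For a two-sided alternative, the tail probability would be doubled, in line with the critical region being split equally between the two tails.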
Exercises
16.23 A random sample of 15 adults living in a small town were selected to estimate the proportion of voters favoring a certain candidate for mayor. Each individual was also asked if he or she was a college graduate. By letting Y and N designate the responses of "yes" and "no" to the education question, the following sequence was obtained:

NNNNYYNYYNYNNNN

Use the runs test at the 0.1 level of significance to determine if the sequence supports the contention that the sample was selected at random.

16.24 A silver-plating process is used to coat a certain type of serving tray. When the process is in control, the thickness of the silver on the trays will vary randomly following a normal distribution with a mean of 0.02 millimeter and a standard deviation of 0.005 millimeter. Suppose that the next 12 trays examined show the following thicknesses of silver: 0.019, 0.021, 0.020, 0.019, 0.020, 0.018, 0.023, 0.021, 0.024, 0.022, 0.023, 0.022. Use the runs test to determine if the fluctuations in thickness from one tray to another are random. Let α = 0.05.

16.25 Use the runs test to test, at level 0.01, whether there is a difference in the average operating time for the two calculators of Exercise 16.17 on page 670.

16.26 In an industrial production line, items are inspected periodically for defectives. The following is a sequence of defective items, D, and nondefective items, N, produced by this production line:

DDNNNDNNDDNNNN
NDDDNNDNNNNDND

Use the large-sample theory for the runs test, with a significance level of 0.05, to determine whether the defectives are occurring at random.

16.27 Assuming that the measurements of Exercise 1.14 on page 30 were recorded successively from left to right as they were collected, use the runs test, with α = 0.05, to test the hypothesis that the data represent a random sequence.

16.28 How large a sample is required to be 95% confident that at least 85% of the distribution of measurements is included between the sample extremes?

16.29 What is the probability that the range of a random sample of size 24 includes at least 90% of the population?

16.30 How large a sample is required to be 99% confident that at least 80% of the population will be less than the largest observation in the sample?

16.31 What is the probability that at least 95% of a population will exceed the smallest value in a random sample of size n = 135?

16.32 The following table gives the recorded grades for 10 students on a midterm test and the final examination in a calculus course:

Student    Midterm Test    Final Examination
L.S.A.     84              73
W.P.B.     98              63
R.W.K.
J.R.L.     72              66
J.K.L.     86              78
D.L.P.     93              78
B.L.P.
D.W.M.     0               0
M.N.M.     92              88
R.H.S.     87              77

(a) Calculate the rank correlation coefficient.
(b) Test the null hypothesis that ρ = 0 against the alternative that ρ > 0. Use α = 0.025.

16.33 With reference to the data of Exercise 11.1 on page 398,
(a) calculate the rank correlation coefficient;
(b) test the null hypothesis, at the 0.05 level of significance, that ρ = 0 against the alternative that [...] in Exercise 11.44 on page 435.

16.34 Calculate the rank correlation coefficient for the daily rainfall and amount of particulate removed in Exercise 11.13 on page 400.

16.35 With reference to the weights and chest sizes of infants in Exercise 11.47 on page 436,
(a) calculate the rank correlation coefficient;
(b) test the hypothesis, at the 0.025 level of significance, that ρ = 0 against the alternative that [...]

16.36 A consumer panel tests nine brands of microwave ovens for overall quality. The ranks assigned by the panel and the suggested retail prices are as follows:

Manufacturer    Panel Rating    Suggested Price
A               6               $480
B               9               395
C               2               575
D               8               550
E               5               510
F               1               545
G               7               400
H               4               465
I               3               420

Is there a significant relationship between the quality and the price of a microwave oven? Use a 0.05 level of significance.

16.37 Two judges at a college homecoming parade rank eight floats in the following order:

Float      1  2  3  4  5  6  7  8
Judge A    5  8  4  3  6  2  7  1
Judge B    7  5  4  2  8  1  6  3

(a) Calculate the rank correlation coefficient.
(b) Test the null hypothesis that ρ = 0 against the alternative that ρ > 0. Use α = 0.05.

16.38 In the article called "Risky Assumptions" by Paul Slovic, Baruch Fischoff, and Sarah Lichtenstein, published in Psychology Today (June 1980), the risk of dying in the United States from 30 activities and technologies is ranked by members of the League of Women Voters and also by experts who are professionally involved in assessing risks. The rankings are as shown in Table 16.9.
(a) Calculate the rank correlation coefficient.
(b) Test the null hypothesis of zero correlation between the rankings of the League of Women Voters and the experts against the alternative that the correlation is not zero. Use a 0.05 level of significance.
Table 16.9: The Ranking Data for Exercise 16.38
Activity or Technology Risk    Voters    Experts
Nuclear power                  1         20
Motor vehicles                 2         1
Handguns                       3         4
Smoking                        4         2
Motorcycles                    5         6
Alcoholic beverages            6         3
Private aviation               7         12
Police work                    8         17
Pesticides                     9         8
Surgery                        10        5
Fire fighting                  11        18
Large construction             12        13
Hunting                        13        23
Spray cans                     14        26
Mountain climbing              15        29
Bicycles                       16        15
Commercial aviation            17        16
Electric power                 18        9
Swimming
                               23        27
Railroads                      24        19
Food preservatives             25        14
Food coloring                  26        21
Power mowers
                               28        24
Home appliances
Review Exercises
16.39 A study by a chemical company compared the drainage properties of two different polymers. Ten different sludges were used, and both polymers were allowed to drain in each sludge. The free drainage was measured in mL/min.

Sludge Type    Polymer A    Polymer B
1              12.7         12.0
2              14.6         15.0
3              18.6         19.2
4              17.5         17.3
6              16.9         16.6
7              19.9         20.1
8              17.6         17.6

(a) Use the sign test at the 0.05 level to test the null hypothesis that polymer A has the same median drainage as polymer B.
(b) Use the signed-rank test to test the hypotheses of part (a).

16.40 In Review Exercise 13.45 on page 555, use the Kruskal-Wallis test, at the 0.05 level of significance, to determine if the chemical analyses performed by the four laboratories give, on average, the same results.

16.41 Use the data from Exercise 13.14 on page 530 to see if the median amount of nitrogen lost in perspiration is different for the three levels of dietary protein.
Chapter 17
Statistical Quality Control