Multiple Comparisons

13.6 Multiple Comparisons

The analysis of variance is a powerful procedure for testing the homogeneity of a set of means. However, if we reject the null hypothesis and accept the stated alternative (that the means are not all equal), we still do not know which of the population means are equal and which are different.

524 Chapter 13 One-Factor Experiments: General

Figure 13.4: A set of orthogonal contrasts (SAS printout from the GLM procedure; dependent variable: moisture; contrast: (1,2,3,5) vs. 4).

Often it is of interest to make several (perhaps all possible) paired comparisons among the treatments. Actually, a paired comparison may be viewed as a simple contrast, namely, a test of

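$$H_0\colon \mu_i - \mu_j = 0,$$
$$H_1\colon \mu_i - \mu_j \neq 0,$$

for all $i \neq j$. Making all possible paired comparisons among the means can be beneficial when particular complex contrasts are not known a priori. For example, in the aggregate data of Table 13.1, suppose that we wish to test

$$H_0\colon \mu_1 - \mu_5 = 0,$$
$$H_1\colon \mu_1 - \mu_5 \neq 0.$$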

The test is developed through use of an F, t, or confidence interval approach. Using t, we have

$$t = \frac{\bar{y}_{1.} - \bar{y}_{5.}}{s\sqrt{2/n}},$$

where $s$ is the square root of the mean square error and $n = 6$ is the sample size per treatment. In this case,

$$t = \frac{553.33 - 610.67}{\sqrt{4961}\,\sqrt{1/3}} = -1.41.$$

The P-value for the t-test with 25 degrees of freedom is 0.17. Thus, there is not sufficient evidence to reject $H_0$.

Relationship between T and F

In the foregoing, we displayed the use of a pooled t-test along the lines of that discussed in Chapter 10. The pooled estimate was taken from the mean squared error in order to enjoy the degrees of freedom that are pooled across all five samples. In addition, we have tested a contrast. The reader should note that if the t-value is squared, the result is exactly of the same form as the value of f for a test on a contrast, discussed in the preceding section. In fact,

$$f = \frac{(\bar{y}_{1.} - \bar{y}_{5.})^2}{s^2(1/6 + 1/6)} = \frac{(553.33 - 610.67)^2}{4961/3} = 1.988,$$

which, of course, is $t^2$.
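The relationship between the pooled t statistic and the contrast F statistic can be checked numerically. The following is a minimal sketch, assuming the mean square error $s^2 = 4961$ and $n = 6$ used in this section; the function names `pooled_t` and `contrast_f` are illustrative and not part of any library.

```python
from math import sqrt

ybar_1, ybar_5 = 553.33, 610.67  # sample means of aggregates 1 and 5
s2 = 4961.0                      # mean square error (25 degrees of freedom)
n = 6                            # observations per treatment


def pooled_t(mean_i, mean_j, mse, n_per_group):
    """t statistic for H0: mu_i - mu_j = 0 using the pooled ANOVA error."""
    return (mean_i - mean_j) / sqrt(mse * 2.0 / n_per_group)


def contrast_f(mean_i, mean_j, mse, n_per_group):
    """F statistic for the same comparison viewed as a contrast (1, -1)."""
    denom = mse * (1.0 / n_per_group + 1.0 / n_per_group)
    return (mean_i - mean_j) ** 2 / denom


t = pooled_t(ybar_1, ybar_5, s2, n)
f = contrast_f(ybar_1, ybar_5, s2, n)
print(round(t, 2), round(f, 3), round(t * t, 3))
```

The last two printed values agree because the contrast coefficients are 1 and −1, so the F denominator $s^2(1/n + 1/n)$ is the square of the t denominator.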

Confidence Interval Approach to a Paired Comparison

It is straightforward to solve the same problem of a paired comparison (or a contrast) using a confidence interval approach. Clearly, if we compute a $100(1 - \alpha)\%$ confidence interval on $\mu_1 - \mu_5$, we have

$$\bar{y}_{1.} - \bar{y}_{5.} \pm t_{\alpha/2}\, s\sqrt{\frac{2}{6}},$$

where $t_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile (the upper $\alpha/2$ point) of a t-distribution with 25 degrees of freedom (degrees of freedom coming from $s^2$). This straightforward connection between hypothesis testing and confidence intervals should be obvious from discussions in Chapters 9 and 10. The test of the simple contrast $\mu_1 - \mu_5$ involves no more than observing whether or not the confidence interval above covers zero. Substituting the numbers, we have as the 95% confidence interval

$$(553.33 - 610.67) \pm 2.060\sqrt{4961}\sqrt{\frac{1}{3}} = -57.34 \pm 83.77.$$

Thus, since the interval covers zero, the contrast is not significant. In other words, we do not find a significant difference between the means of aggregates 1 and 5.
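The interval above can be reproduced in a few lines. This is a sketch under the section's numbers ($s^2 = 4961$, $n = 6$, and the tabled value $t_{0.025} = 2.060$ for 25 degrees of freedom); `paired_ci` is a hypothetical helper name, and the critical value is passed in rather than computed.

```python
from math import sqrt


def paired_ci(mean_i, mean_j, mse, n_per_group, t_crit):
    """Confidence interval on mu_i - mu_j using the pooled ANOVA error."""
    half_width = t_crit * sqrt(mse * 2.0 / n_per_group)
    diff = mean_i - mean_j
    return diff - half_width, diff + half_width


lower, upper = paired_ci(553.33, 610.67, 4961.0, 6, t_crit=2.060)
covers_zero = lower < 0.0 < upper  # True means the contrast is not significant
print(round(lower, 2), round(upper, 2), covers_zero)
```

Checking whether the interval covers zero is equivalent to the two-sided t-test at the same $\alpha$.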

Experiment-wise Error Rate

Serious difficulties occur when the analyst attempts to make many or all possible paired comparisons. For the case of $k$ means, there will be, of course, $r = k(k - 1)/2$ possible paired comparisons. Assuming independent comparisons, the experiment-wise error rate or family error rate (i.e., the probability of false rejection of at least one of the hypotheses) is given by $1 - (1 - \alpha)^r$, where $\alpha$ is the selected probability of a type I error for a specific comparison. Clearly, this measure of experiment-wise type I error can be quite large. For example, even if there are only 6 comparisons, say, in the case of 4 means, and $\alpha = 0.05$, the experiment-wise rate is

$$1 - (0.95)^6 \approx 0.26.$$

When many paired comparisons are being tested, there is usually a need to make the effective contrast on a single comparison more conservative. That is, with the confidence interval approach, the confidence intervals would be much wider than the $\pm t_{\alpha/2}\, s\sqrt{2/n}$ used for the case where only a single comparison is being made.
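To see how quickly the experiment-wise rate grows with the number of means, the formula $1 - (1 - \alpha)^r$ can be tabulated. The sketch below assumes independent comparisons, as the text does; the function name is illustrative.

```python
from math import comb


def experimentwise_error_rate(k, alpha):
    """Probability of at least one false rejection among all k(k-1)/2
    paired comparisons, assuming the comparisons are independent."""
    r = comb(k, 2)  # number of paired comparisons
    return 1.0 - (1.0 - alpha) ** r


rate = experimentwise_error_rate(4, 0.05)  # 4 means give 6 comparisons
print(round(rate, 2))

for k in range(2, 8):
    print(k, comb(k, 2), round(experimentwise_error_rate(k, 0.05), 3))
```

In practice the comparisons share the same means and the same $s$, so they are not independent; the formula is best read as a rough indication of the inflation.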

Tukey’s Test

There are several standard methods for making paired comparisons that sustain the credibility of the type I error rate. We shall discuss and illustrate two of them here. The first one, called Tukey’s procedure, allows formation of simultaneous $100(1 - \alpha)\%$ confidence intervals for all paired comparisons. The method is based on the studentized range distribution. The appropriate percentile point is a function of $\alpha$, $k$, and $v$ = degrees of freedom for $s^2$. A list of upper percentage points for $\alpha = 0.05$ is shown in Table A.12. The method of paired comparisons by Tukey involves finding a significant difference between means $i$ and $j$ ($i \neq j$) if $|\bar{y}_{i.} - \bar{y}_{j.}|$ exceeds $q(\alpha, k, v)\sqrt{s^2/n}$.

Tukey’s procedure is easily illustrated. Consider a hypothetical example where we have 6 treatments in a one-factor completely randomized design, with 5 observations taken per treatment. Suppose that the mean square error taken from the analysis-of-variance table is $s^2 = 2.45$ (24 degrees of freedom). The sample means are in ascending order:

ȳ2.     ȳ5.     ȳ1.     ȳ3.     ȳ6.     ȳ4.
14.50   16.75   19.84   21.12   22.90   23.20

With $\alpha = 0.05$, the value of $q(0.05, 6, 24)$ is 4.37. Thus, all absolute differences are to be compared to

$$4.37\sqrt{\frac{2.45}{5}} = 3.059.$$

As a result, the following represent means found to be significantly different using Tukey’s procedure:

4 and 1, 4 and 5, 4 and 2, 6 and 1, 6 and 5, 6 and 2, 3 and 5, 3 and 2, 1 and 5, 1 and 2.
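The pairwise screening in this example is mechanical and can be sketched in code. The snippet below takes the six sample means, $s^2 = 2.45$, $n = 5$, and the tabled value $q(0.05, 6, 24) = 4.37$ from the text; the critical value is hard-coded from the table rather than computed, and `tukey_pairs` is a hypothetical helper name.

```python
from itertools import combinations
from math import sqrt

# Sample means keyed by treatment label, as given in the example.
means = {2: 14.50, 5: 16.75, 1: 19.84, 3: 21.12, 6: 22.90, 4: 23.20}
s2 = 2.45   # mean square error, 24 degrees of freedom
n = 5       # observations per treatment
q = 4.37    # q(0.05, 6, 24) as read from Table A.12


def tukey_pairs(means, mse, n_per_group, q_crit):
    """Return the Tukey threshold and the pairs whose absolute mean
    difference exceeds it."""
    threshold = q_crit * sqrt(mse / n_per_group)
    significant = []
    for i, j in combinations(means, 2):
        if abs(means[i] - means[j]) > threshold:
            significant.append(tuple(sorted((i, j))))
    return threshold, sorted(significant)


threshold, significant = tukey_pairs(means, s2, n, q)
print(round(threshold, 3))
print(significant)
```

Of the 15 possible pairs, the five that are not flagged are (4, 6), (3, 4), (3, 6), (1, 3), and (2, 5).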

Where Does the α-Level Come From in Tukey’s Test?

We briefly alluded to the concept of simultaneous confidence intervals being employed for Tukey’s procedure. The reader will gain a useful insight into the notion of multiple comparisons if he or she gains an understanding of what is meant by simultaneous confidence intervals.

In Chapter 9, we saw that if we compute a 95% confidence interval on, say, a mean $\mu$, then the probability that the interval covers the true mean $\mu$ is 0.95. However, as we have discussed, for the case of multiple comparisons, the effective probability of interest is tied to the experiment-wise error rate, and it should be emphasized that the confidence intervals of the type $\bar{y}_{i.} - \bar{y}_{j.} \pm q(\alpha, k, v)\, s/\sqrt{n}$ are not independent since they all involve $s$ and many involve the use of the same averages, the $\bar{y}_{i.}$. Despite the difficulties, if we use $q(0.05, k, v)$, the simultaneous confidence level is controlled at 95%. The same holds for $q(0.01, k, v)$; namely, the confidence level is controlled at 99%. In the case of $\alpha = 0.05$, there is a probability of 0.05 that at least one pair of measures will be falsely found to be different (false rejection of at least one null hypothesis). In the $\alpha = 0.01$ case, the corresponding probability will be 0.01.

Duncan’s Test

The second procedure we shall discuss is called Duncan’s procedure or Duncan’s multiple-range test. This procedure is also based on the general notion of studentized range. The range of any subset of $p$ sample means must exceed a certain value before any of the $p$ means are found to be different. This value is called the least significant range for the $p$ means and is denoted by $R_p$, where

$$R_p = r_p\sqrt{\frac{s^2}{n}}.$$

The values of the quantity $r_p$, called the least significant studentized range, depend on the desired level of significance and the number of degrees of freedom of the mean square error. These values may be obtained from Table A.13 for $p = 2, 3, \ldots, 10$ means.

To illustrate the multiple-range test procedure, let us consider the hypothetical example where 6 treatments are compared, with 5 observations per treatment. This is the same example used to illustrate Tukey’s test. We obtain $R_p$ by multiplying each $r_p$ by $\sqrt{s^2/n} = \sqrt{2.45/5} = 0.70$. The results of these computations are summarized as follows:

p      2       3       4       5       6
r_p    2.919   3.066   3.160   3.226   3.276
R_p    2.043   2.146   2.212   2.258   2.293

Comparing these least significant ranges with the differences in ordered means, we arrive at the following conclusions:

1. Since $\bar{y}_{4.} - \bar{y}_{2.} = 8.70 > R_6 = 2.293$, we conclude that $\mu_4$ and $\mu_2$ are significantly different.

2. Comparing $\bar{y}_{4.} - \bar{y}_{5.}$ and $\bar{y}_{6.} - \bar{y}_{2.}$ with $R_5$, we conclude that $\mu_4$ is significantly greater than $\mu_5$ and $\mu_6$ is significantly greater than $\mu_2$.

3. Comparing $\bar{y}_{4.} - \bar{y}_{1.}$, $\bar{y}_{6.} - \bar{y}_{5.}$, and $\bar{y}_{3.} - \bar{y}_{2.}$ with $R_4$, we conclude that each difference is significant.

4. Comparing $\bar{y}_{4.} - \bar{y}_{3.}$, $\bar{y}_{6.} - \bar{y}_{1.}$, $\bar{y}_{3.} - \bar{y}_{5.}$, and $\bar{y}_{1.} - \bar{y}_{2.}$ with $R_3$, we find all differences significant except for $\mu_4 - \mu_3$. Therefore, $\mu_3$, $\mu_4$, and $\mu_6$ constitute a subset of homogeneous means.

5. Comparing $\bar{y}_{3.} - \bar{y}_{1.}$, $\bar{y}_{1.} - \bar{y}_{5.}$, and $\bar{y}_{5.} - \bar{y}_{2.}$ with $R_2$, we conclude that only $\mu_3$ and $\mu_1$ are not significantly different.
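The five steps above follow a fixed pattern: work from the widest range of ordered means down to adjacent pairs, and skip any range that lies inside a subset already judged homogeneous. A minimal sketch of that pattern, assuming the $r_p$ values quoted from Table A.13 and using variable names of my own choosing, is:

```python
from math import sqrt

means = {2: 14.50, 5: 16.75, 1: 19.84, 3: 21.12, 6: 22.90, 4: 23.20}
s2, n = 2.45, 5
# Least significant studentized ranges r_p quoted in the text (Table A.13).
r = {2: 2.919, 3: 3.066, 4: 3.160, 5: 3.226, 6: 3.276}

scale = sqrt(s2 / n)                               # approximately 0.70
R = {p: r_p * scale for p, r_p in r.items()}       # least significant ranges

ordered = sorted(means, key=means.get)             # labels by ascending mean
k = len(ordered)
homogeneous = []   # index ranges (lo, hi) judged not significantly different
different = []     # (larger-mean label, smaller-mean label) pairs

for p in range(k, 1, -1):                          # widest range first
    for lo in range(0, k - p + 1):
        hi = lo + p - 1
        # A range inside a homogeneous subset is not tested again.
        if any(a <= lo and hi <= b for a, b in homogeneous):
            continue
        diff = means[ordered[hi]] - means[ordered[lo]]
        if diff > R[p]:
            different.append((ordered[hi], ordered[lo]))
        else:
            homogeneous.append((lo, hi))

groups = [ordered[a:b + 1] for a, b in homogeneous]
print({p: round(v, 3) for p, v in R.items()})
print(groups)
```

For these data the homogeneous subsets come out as treatments 3, 6, 4 and treatments 1, 3, which matches steps 4 and 5.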

It is customary to summarize the conclusions above by drawing a line under any subsets of adjacent means that are not significantly different. Thus, we have

ȳ2.     ȳ5.     ȳ1.     ȳ3.     ȳ6.     ȳ4.
14.50   16.75   19.84   21.12   22.90   23.20
                -------------
                        ---------------------

It is clear that in this case the results from Tukey’s and Duncan’s procedures are very similar. Tukey’s procedure did not detect a difference between 2 and 5, whereas Duncan’s did.

Dunnett’s Test: Comparing Treatment with a Control

In many scientific and engineering problems, one is not interested in drawing inferences regarding all possible comparisons among the treatment means of the type $\mu_i - \mu_j$. Rather, the experiment often dictates the need to simultaneously compare each treatment with a control. A test procedure developed by C. W. Dunnett determines significant differences between each treatment mean and the control, at a single joint significance level $\alpha$. To illustrate Dunnett’s procedure, let us consider the experimental data of Table 13.6 for a one-way classification where the effect of three catalysts on the yield of a reaction is being studied. A fourth treatment, no catalyst, is used as a control.

Table 13.6: Yield of Reaction

Control   Catalyst 1   Catalyst 2   Catalyst 3

In general, we wish to test the $k$ hypotheses

$$H_0\colon \mu_0 = \mu_i, \qquad H_1\colon \mu_0 \neq \mu_i, \qquad i = 1, 2, \ldots, k,$$

where $\mu_0$ represents the mean yield for the population of measurements in which the control is used. The usual analysis-of-variance assumptions, as outlined in Section 13.3, are expected to remain valid. To test the null hypotheses specified by $H_0$ against two-sided alternatives for an experimental situation in which there are $k$ treatments, excluding the control, and $n$ observations per treatment, we first calculate the values

$$d_i = \frac{\bar{y}_{i.} - \bar{y}_{0.}}{\sqrt{2s^2/n}}, \qquad i = 1, 2, \ldots, k.$$

The sample variance $s^2$ is obtained, as before, from the mean square error in the analysis of variance. Now, the critical region for rejecting $H_0$, at the $\alpha$-level of significance, is established by the inequality

$$|d_i| > d_{\alpha/2}(k, v),$$

where $v$ is the number of degrees of freedom for the mean square error. The values of the quantity $d_{\alpha/2}(k, v)$ for a two-tailed test are given in Table A.14 for $\alpha = 0.05$ and $\alpha = 0.01$ for various values of $k$ and $v$.
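The computation of the $d_i$ values can be sketched as follows. The data in this snippet are invented purely for illustration (they are not the yields of Table 13.6), the function name `dunnett_statistics` is hypothetical, and the critical value $d_{\alpha/2}(k, v)$ is assumed to be looked up in Table A.14 rather than computed.

```python
from math import sqrt
from statistics import mean


def dunnett_statistics(control, treatments):
    """Return (s2, error_df, [d_1, ..., d_k]) for equal-sized groups.

    s2 is the mean square error pooled over all k + 1 groups, and
    d_i = (ybar_i - ybar_0) / sqrt(2 * s2 / n).
    """
    groups = [control] + list(treatments)
    n = len(control)
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes n observations in every group")
    # Error sum of squares pooled over the control and all treatments.
    sse = sum(sum((y - mean(g)) ** 2 for y in g) for g in groups)
    error_df = len(groups) * (n - 1)
    s2 = sse / error_df
    ybar_0 = mean(control)
    d = [(mean(t) - ybar_0) / sqrt(2.0 * s2 / n) for t in treatments]
    return s2, error_df, d


# Hypothetical data, invented for illustration only.
control = [10.0, 12.0, 11.0]
treatments = [[14.0, 15.0, 16.0], [11.0, 10.0, 12.0]]
s2, error_df, d = dunnett_statistics(control, treatments)
print(s2, error_df, [round(x, 3) for x in d])
```

Each $|d_i|$ would then be compared with the tabled $d_{\alpha/2}(k, v)$ for the resulting error degrees of freedom; treatments whose $|d_i|$ exceeds it are declared different from the control.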

Example 13.5: For the data of Table 13.6, test hypotheses comparing each catalyst with the control, using two-sided alternatives. Choose $\alpha = 0.05$ as the joint significance level.

Solution: The mean square error with 16 degrees of freedom is obtained from the analysis-of-variance table, using all $k + 1$ treatments. The mean square error is given by

From Table A.14 the critical value for $\alpha = 0.05$ is found to be $d_{0.025}(3, 16) = 2.59$. Since $|d_1| < 2.59$ and $|d_3| < 2.59$, we conclude that only the mean yield for catalyst 2 is significantly different from the mean yield of the reaction using the control.

Many practical applications dictate the need for a one-tailed test for comparing treatments with a control. Certainly, when a pharmacologist is concerned with the effect of various dosages of a drug on cholesterol level and the control is zero dosage, it is of interest to determine if each dosage produces a significantly larger reduction than the control. Table A.15 shows the critical values of $d_{\alpha}(k, v)$ for one-sided alternatives.

Exercises

13.12 Consider the data of Review Exercise 13.45 on page 555. Make significance tests on the following contrasts:
(a) B versus A, C, and D;
(b) C versus A and D;
(c) A versus D.

13.13 The purpose of the study The Incorporation of a Chelating Agent into a Flame Retardant Finish of a Cotton Flannelette and the Evaluation of Selected Fabric Properties conducted at Virginia Tech was to evaluate the use of a chelating agent as part of the flame-retardant finish of cotton flannelette by determining its effects upon flammability after the fabric is laundered under specific conditions. Two baths were prepared, one with carboxymethyl cellulose and one without. Twelve pieces of fabric were laundered 5 times in bath I, and 12 other pieces of fabric were laundered 10 times in bath I. This was repeated using 24 additional pieces of cloth in bath II. After the washings the lengths of fabric that burned and the burn times were measured. For convenience, let us define the following treatments:

Treatment 1: 5 launderings in bath I,
Treatment 2: 5 launderings in bath II,
Treatment 3: 10 launderings in bath I,
Treatment 4: 10 launderings in bath II.

Burn times, in seconds, were recorded as follows:

Treatment
1      2      3      4
13.7   6.2    27.2   18.2
23.0   5.4    16.8   8.8
15.7   5.0    12.9   14.5
25.5   4.4    14.9   14.7
15.8   5.0    17.1   17.1
14.0   16.0   10.8   10.6
29.4   2.5    13.5   5.8
9.7    1.6    25.5   7.3
14.0   3.9    14.2   17.7
12.3   2.5    27.4   18.3
12.3   7.1    11.5   9.9

(a) Perform an analysis of variance, using a 0.01 level of significance, and determine whether there are any significant differences among the treatment means.
(b) Use single-degree-of-freedom contrasts with α = 0.01 to compare the mean burn time of treatment 1 versus treatment 2 and also treatment 3 versus treatment 4.

13.14 The study Loss of Nitrogen Through Sweat by Preadolescent Boys Consuming Three Levels of Dietary Protein was conducted by the Department of Human Nutrition and Foods at Virginia Tech to determine perspiration nitrogen loss at various dietary protein levels. Twelve preadolescent boys ranging in age from 7 years, 8 months to 9 years, 8 months, all judged to be clinically healthy, were used in the experiment. Each boy was subjected to one of three controlled diets in which 29, 54, or 84 grams of protein were consumed per day. The following data represent the body perspiration nitrogen loss, in milligrams, during the last two days of the experimental period:

Protein Level
29 Grams   54 Grams   84 Grams
190
266
270

(a) Perform an analysis of variance at the 0.05 level of significance to show that the mean perspiration nitrogen losses at the three protein levels are different.
(b) Use Tukey’s test to determine which protein levels are significantly different from each other in mean nitrogen loss.

13.15 Use Tukey’s test, with a 0.05 level of significance, to analyze the means of the five different brands of headache tablets in Exercise 13.2 on page 518.

13.16 An investigation was conducted to determine the source of reduction in yield of a certain chemical product. It was known that the loss in yield occurred in the mother liquor, that is, the material removed at the filtration stage. It was thought that different blends of the original material might result in different yield reductions at the mother liquor stage. The following are the percent reductions for 3 batches at each of 4 preselected blends:

Blend
1      2      3      4
25.6   25.2   20.8   31.6
24.3   28.6   26.7   29.8
27.9   24.7   22.2   34.3

(a) Perform the analysis of variance at the α = 0.05 level of significance.
(b) Use Duncan’s multiple-range test to determine which blends differ.
(c) Do part (b) using Tukey’s test.

13.17 In the study An Evaluation of the Removal Method for Estimating Benthic Populations and Diversity conducted by Virginia Tech on the Jackson River, 5 different sampling procedures were used to determine the species counts. Twenty samples were selected at random, and each of the 5 sampling procedures was repeated 4 times. The species counts were recorded as follows:

Sampling Procedure
Depletion   Modified Hess   Surber   Substrate Removal Kicknet   Kicknet
85          75              31       43                          17
55          45              20       21                          10
40          35              9        15                          8
77          67              37       27                          15

(a) Is there a significant difference in the average species counts for the different sampling procedures? Use a P-value in your conclusion.
(b) Use Tukey’s test with α = 0.05 to find which sampling procedures differ.

13.18 The following data are values of pressure (psi) in a torsion spring for several settings of the angle between the legs of the spring in a free position:

Angle (°)
67 71 75 79 83
83 84 86 87 89 90
85 85 87 87 90 92
85 88 88 90
86 88 88 91
86 88 89

Compute a one-way analysis of variance for this experiment and state your conclusion concerning the effect of angle on the pressure in the spring. (From C. R. Hicks, Fundamental Concepts in the Design of Experiments, Holt, Rinehart and Winston, New York, 1973.)

13.19 It is suspected that the environmental temperature at which batteries are activated affects their life. Thirty homogeneous batteries were tested, six at each of five temperatures, and the data are shown below (activated life in seconds). Analyze and interpret the data. (From C. R. Hicks, Fundamental Concepts in the Design of Experiments, Holt, Rinehart and Winston, New York, 1973.)

Temperature (°C)
55   60   70   72   65
55   61   72   72   66
57   60   72   72   60
54   60   77   68   65
56   60   77   69   65

13.20 The following table (from A. Hald, Statistical Theory with Engineering Applications, John Wiley & Sons, New York, 1952) gives tensile strengths (in deviations from 340) for wires taken from nine cables to be used for a high-voltage network. Each cable is made from 12 wires. We want to know whether the mean strengths of the wires in the nine cables are the same. If the cables are different, which ones differ? Use a P-value in your analysis of variance.

Cable Tensile Strength
2 −11 −13 −8
7 10 7 8 1

13.21 The printout in Figure 13.5 on page 532 gives information on Duncan’s test, using PROC GLM in SAS, for the aggregate data in Example 13.1. Give conclusions regarding paired comparisons using Duncan’s test results.

13.22 Do Duncan’s test for paired comparisons for the data of Exercise 13.6 on page 519. Discuss the results.

13.23 In a biological experiment, four concentrations of a certain chemical are used to enhance the growth of a certain type of plant over time. Five plants are used at each concentration, and the growth in each plant is measured in centimeters. The following growth data are taken. A control (no chemical) is also applied.

          Concentration
Control   1     2     3     4
6.8       8.2   7.7   6.9   5.9
7.3       8.7   8.4   5.8   6.1
6.9       9.2   8.1   6.8   5.7

Use Dunnett’s two-sided test at the 0.05 level of significance to simultaneously compare the concentrations with the control.

13.24 The financial structure of a firm refers to the way the firm’s assets are divided into equity and debt, and the financial leverage refers to the percentage of assets financed by debt. In the paper The Effect of Financial Leverage on Return, Tai Ma of Virginia Tech claims that financial leverage can be used to increase the rate of return on equity. To say it another way, stockholders can receive higher returns on equity with the same amount of investment through the use of financial leverage. The following data show the rates of return on equity using 3 different levels of financial leverage and a control level (zero debt) for 24 randomly selected firms:

Financial Leverage
Control   Low   Medium   High

Source: Standard & Poor’s Machinery Industry Survey, 1975.

(a) Perform the analysis of variance at the 0.05 level of significance.
(b) Use Dunnett’s test at the 0.01 level of significance to determine whether the mean rates of return on equity are higher at the low, medium, and high levels of financial leverage than at the control level.

Figure 13.5: SAS printout for Exercise 13.21 (the GLM Procedure, Duncan’s Multiple Range Test for moisture). NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate. Means with the same letter are not significantly different. Critical ranges: 83.75, 87.97, 90.69, 92.61. Duncan grouping row: B 465.17 6 4.
