Multiple Comparisons

13.6 Multiple Comparisons

The analysis of variance is a powerful procedure for testing the homogeneity of a set of means. However, if we reject the null hypothesis and accept the stated alternative (that the means are not all equal), we still do not know which of the population means are equal and which are different.

524 Chapter 13 One-Factor Experiments: General

Figure 13.4: A set of orthogonal contrasts (SAS PROC GLM output for the aggregate data; dependent variable: moisture).

Often it is of interest to make several (perhaps all possible) paired comparisons among the treatments. Actually, a paired comparison may be viewed as a simple contrast, namely, a test of

$$H_0\colon \mu_i - \mu_j = 0,$$
$$H_1\colon \mu_i - \mu_j \neq 0,$$

for all $i \neq j$. Making all possible paired comparisons among the means can be very beneficial when particular complex contrasts are not known a priori. For example, in the aggregate data of Table 13.1, suppose that we wish to test

$$H_0\colon \mu_1 - \mu_5 = 0,$$
$$H_1\colon \mu_1 - \mu_5 \neq 0.$$

The test is developed through use of an F, t, or confidence interval approach. Using t, we have

$$t = \frac{\bar{y}_{1.} - \bar{y}_{5.}}{s\sqrt{2/n}},$$

where $s$ is the square root of the mean square error and $n = 6$ is the sample size per treatment. In this case,

$$t = \frac{553.33 - 610.67}{\sqrt{4961}\,\sqrt{1/3}} = -1.41.$$

The P-value for the t-test with 25 degrees of freedom is 0.17. Thus, there is not sufficient evidence to reject $H_0$.

Relationship between T and F

In the foregoing, we displayed the use of a pooled t-test along the lines of that discussed in Chapter 10. The pooled estimate was taken from the mean squared error in order to enjoy the degrees of freedom that are pooled across all five samples. In addition, we have tested a contrast. The reader should note that if the t-value is squared, the result is exactly of the same form as the value of f for a test on a contrast, discussed in the preceding section. In fact,

$$f = \frac{(\bar{y}_{1.} - \bar{y}_{5.})^2}{s^2(1/6 + 1/6)} = \frac{(553.33 - 610.67)^2}{4961(1/3)} = 1.988,$$

which, of course, is $t^2$.
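The relationship between the two statistics can be checked numerically. The following is a minimal sketch, assuming the values used in this section (sample means 553.33 and 610.67, mean square error $s^2 = 4961$, and $n = 6$ observations per treatment); the function names `pooled_t` and `contrast_f` are illustrative and do not come from the text.

```python
import math

def pooled_t(mean_i, mean_j, mse, n):
    """t statistic for mu_i - mu_j using the pooled mean square error (equal n)."""
    return (mean_i - mean_j) / math.sqrt(mse * 2.0 / n)

def contrast_f(mean_i, mean_j, mse, n):
    """f statistic for the same simple contrast: diff^2 / (s^2 (1/n + 1/n))."""
    return (mean_i - mean_j) ** 2 / (mse * (1.0 / n + 1.0 / n))

# Values taken from the aggregate example in this section.
t = pooled_t(553.33, 610.67, 4961.0, 6)
f = contrast_f(553.33, 610.67, 4961.0, 6)

print(round(t, 2), round(f, 3))   # t is about -1.41, f is about 1.988
print(math.isclose(t ** 2, f))    # squaring t reproduces f
```

The P-value is not computed here because the Python standard library has no t-distribution; a statistics package would be needed for that step.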

Confidence Interval Approach to a Paired Comparison

It is straightforward to solve the same problem of a paired comparison (or a contrast) using a confidence interval approach. Clearly, if we compute a 100(1 − α)% confidence interval on $\mu_1 - \mu_5$, we have

$$\bar{y}_{1.} - \bar{y}_{5.} \pm t_{\alpha/2}\, s\sqrt{\frac{2}{6}},$$

where $t_{\alpha/2}$ is the upper 100(1 − α/2)% point of a t-distribution with 25 degrees of freedom (degrees of freedom coming from $s^2$). This straightforward connection between hypothesis testing and confidence intervals should be obvious from discussions in Chapters 9 and 10. The test of the simple contrast $\mu_1 - \mu_5$ involves no more than observing whether or not the confidence interval above covers zero. Substituting the numbers, we have as the 95% confidence interval

$$(553.33 - 610.67) \pm 2.060\sqrt{4961}\sqrt{\frac{1}{3}} = -57.34 \pm 83.77.$$

Thus, since the interval covers zero, the contrast is not significant. In other words, we do not find a significant difference between the means of aggregates 1 and 5.
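The interval computation can be reproduced in a few lines. This is a sketch under the assumptions of the example above: the critical value 2.060 is the tabulated $t_{0.025}$ for 25 degrees of freedom quoted in the text and is passed in by hand, and `paired_ci` is a hypothetical helper name.

```python
import math

def paired_ci(mean_i, mean_j, mse, n, t_crit):
    """Confidence interval for mu_i - mu_j with pooled variance and equal n."""
    diff = mean_i - mean_j
    half_width = t_crit * math.sqrt(mse) * math.sqrt(2.0 / n)
    return diff - half_width, diff + half_width

# Aggregates 1 and 5, s^2 = 4961, n = 6, t critical value 2.060 (25 df).
lower, upper = paired_ci(553.33, 610.67, 4961.0, 6, 2.060)
covers_zero = lower < 0.0 < upper

print(round(lower, 2), round(upper, 2), covers_zero)
```

An interval that covers zero corresponds to failing to reject $H_0$ at the matching significance level.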

Experiment-wise Error Rate

Serious difficulties occur when the analyst attempts to make many or all possible paired comparisons. For the case of k means, there will be, of course, r = k(k − 1)/2 possible paired comparisons. Assuming independent comparisons, the experiment-wise error rate or family error rate (i.e., the probability of false rejection of at least one of the hypotheses) is given by $1 - (1 - \alpha)^r$, where α is the selected probability of a type I error for a specific comparison. Clearly, this measure of experiment-wise type I error can be quite large. For example, even if there are only 6 comparisons, say, in the case of 4 means, and α = 0.05, the experiment-wise rate is

$$1 - (0.95)^6 \approx 0.26.$$

When many paired comparisons are being tested, there is usually a need to make the effective contrast on a single comparison more conservative. That is, with the confidence interval approach, the confidence intervals would be much wider than the $\pm t_{\alpha/2}\, s\sqrt{2/n}$ used for the case where only a single comparison is being made.
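To see how quickly the family error rate grows with the number of means, the formula $1 - (1 - \alpha)^r$ with $r = k(k-1)/2$ can be tabulated. The sketch below assumes independent comparisons, as the text does; the function name is illustrative.

```python
def experimentwise_error_rate(alpha, k):
    """Family error rate for all k(k-1)/2 paired comparisons, assuming independence."""
    r = k * (k - 1) // 2              # number of paired comparisons
    return 1.0 - (1.0 - alpha) ** r

# The case discussed in the text: 4 means, 6 comparisons, alpha = 0.05.
rate_4 = experimentwise_error_rate(0.05, 4)
print(round(rate_4, 2))               # about 0.26

# Growth of the rate as more means are compared.
for k in (3, 4, 5, 6, 10):
    print(k, round(experimentwise_error_rate(0.05, k), 3))
```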

Tukey’s Test

There are several standard methods for making paired comparisons that sustain the credibility of the type I error rate. We shall discuss and illustrate two of them here. The first one, called Tukey’s procedure, allows formation of simultaneous 100(1 − α)% confidence intervals for all paired comparisons. The method is based on the studentized range distribution. The appropriate percentile point is a function of α, k, and v = degrees of freedom for $s^2$. A list of upper percentage points for α = 0.05 is shown in Table A.12. The method of paired comparisons by Tukey involves finding a significant difference between means i and j (i ≠ j) if $|\bar{y}_{i.} - \bar{y}_{j.}|$ exceeds $q(\alpha, k, v)\sqrt{s^2/n}$.

Tukey’s procedure is easily illustrated. Consider a hypothetical example where we have 6 treatments in a one-factor completely randomized design, with 5 observations taken per treatment. Suppose that the mean square error taken from the analysis-of-variance table is $s^2 = 2.45$ (24 degrees of freedom). The sample means are in ascending order:

$$\begin{array}{cccccc}
\bar{y}_{2.} & \bar{y}_{5.} & \bar{y}_{1.} & \bar{y}_{3.} & \bar{y}_{6.} & \bar{y}_{4.} \\
14.50 & 16.75 & 19.84 & 21.12 & 22.90 & 23.20
\end{array}$$

With α = 0.05, the value of q(0.05, 6, 24) is 4.37. Thus, all absolute differences are to be compared to

$$4.37\sqrt{\frac{2.45}{5}} = 3.059.$$

As a result, the following represent means found to be significantly different using Tukey’s procedure: 4 and 1, 4 and 5, 4 and 2, 6 and 1, 6 and 5, 6 and 2, 3 and 5, 3 and 2, 1 and 5, 1 and 2.
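The comparison rule can be applied mechanically to the six sample means of this example. The following sketch assumes the tabulated value q(0.05, 6, 24) = 4.37 quoted above is supplied by hand rather than computed; `tukey_pairs` is an illustrative name, not a library function.

```python
import math
from itertools import combinations

def tukey_pairs(means, mse, n, q):
    """Return the Tukey threshold and the pairs whose absolute difference exceeds it."""
    threshold = q * math.sqrt(mse / n)
    significant = []
    for (i, mean_i), (j, mean_j) in combinations(means.items(), 2):
        if abs(mean_i - mean_j) > threshold:
            significant.append(tuple(sorted((i, j))))
    return threshold, sorted(significant)

# Treatment label -> sample mean, from the hypothetical example in the text.
means = {2: 14.50, 5: 16.75, 1: 19.84, 3: 21.12, 6: 22.90, 4: 23.20}
threshold, significant = tukey_pairs(means, mse=2.45, n=5, q=4.37)

print(round(threshold, 3))   # 3.059
print(significant)           # pairs of treatment labels declared different
```

Note that the difference between treatments 6 and 1 (3.06) exceeds the threshold only narrowly, so rounding of the inputs matters for that pair.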

Where Does the α-Level Come From in Tukey’s Test?

We briefly alluded to the concept of simultaneous confidence intervals being employed for Tukey’s procedure. The reader will gain a useful insight into the notion of multiple comparisons if he or she gains an understanding of what is meant by simultaneous confidence intervals.

In Chapter 9, we saw that if we compute a 95% confidence interval on, say, a mean μ, then the probability that the interval covers the true mean μ is 0.95. However, as we have discussed, for the case of multiple comparisons, the effective probability of interest is tied to the experiment-wise error rate, and it should be emphasized that the confidence intervals of the type $\bar{y}_{i.} - \bar{y}_{j.} \pm q(\alpha, k, v)\, s\sqrt{1/n}$ are not independent since they all involve $s$ and many involve the use of the same averages, the $\bar{y}_{i.}$. Despite the difficulties, if we use q(0.05, k, v), the simultaneous confidence level is controlled at 95%. The same holds for q(0.01, k, v); namely, the confidence level is controlled at 99%. In the case of α = 0.05, there is a probability of 0.05 that at least one pair of measures will be falsely found to be different (false rejection of at least one null hypothesis). In the α = 0.01 case, the corresponding probability will be 0.01.

Duncan’s Test

The second procedure we shall discuss is called Duncan’s procedure or Duncan’s multiple-range test. This procedure is also based on the general notion of studentized range. The range of any subset of p sample means must exceed a certain value before any of the p means are found to be different. This value is called the least significant range for the p means and is denoted by $R_p$, where

$$R_p = r_p\sqrt{\frac{s^2}{n}}.$$

The values of the quantity $r_p$, called the least significant studentized range, depend on the desired level of significance and the number of degrees of freedom of the mean square error. These values may be obtained from Table A.13 for p = 2, 3, . . . , 10 means.

To illustrate the multiple-range test procedure, let us consider the hypothetical example where 6 treatments are compared, with 5 observations per treatment. This is the same example used to illustrate Tukey’s test. We obtain $R_p$ by multiplying each $r_p$ by $\sqrt{2.45/5} = 0.70$. The results of these computations are summarized as follows:

$$\begin{array}{c|ccccc}
p & 2 & 3 & 4 & 5 & 6 \\ \hline
r_p & 2.919 & 3.066 & 3.160 & 3.226 & 3.276 \\
R_p & 2.043 & 2.146 & 2.212 & 2.258 & 2.293
\end{array}$$

Comparing these least significant ranges with the differences in ordered means, we arrive at the following conclusions:

1. Since $\bar{y}_{4.} - \bar{y}_{2.} = 8.70 > R_6 = 2.293$, we conclude that $\mu_4$ and $\mu_2$ are significantly different.

2. Comparing $\bar{y}_{4.} - \bar{y}_{5.}$ and $\bar{y}_{6.} - \bar{y}_{2.}$ with $R_5$, we conclude that $\mu_4$ is significantly greater than $\mu_5$ and $\mu_6$ is significantly greater than $\mu_2$.

3. Comparing $\bar{y}_{4.} - \bar{y}_{1.}$, $\bar{y}_{6.} - \bar{y}_{5.}$, and $\bar{y}_{3.} - \bar{y}_{2.}$ with $R_4$, we conclude that each difference is significant.

4. Comparing $\bar{y}_{4.} - \bar{y}_{3.}$, $\bar{y}_{6.} - \bar{y}_{1.}$, $\bar{y}_{3.} - \bar{y}_{5.}$, and $\bar{y}_{1.} - \bar{y}_{2.}$ with $R_3$, we find all differences significant except for $\mu_4 - \mu_3$. Therefore, $\mu_3$, $\mu_4$, and $\mu_6$ constitute a subset of homogeneous means.

5. Comparing $\bar{y}_{3.} - \bar{y}_{1.}$, $\bar{y}_{1.} - \bar{y}_{5.}$, and $\bar{y}_{5.} - \bar{y}_{2.}$ with $R_2$, we conclude that only $\mu_3$ and $\mu_1$ are not significantly different.

It is customary to summarize the conclusions above by drawing a line under any subsets of adjacent means that are not significantly different. Thus, we have

$$\begin{array}{cccccc}
\bar{y}_{2.} & \bar{y}_{5.} & \bar{y}_{1.} & \bar{y}_{3.} & \bar{y}_{6.} & \bar{y}_{4.} \\
14.50 & 16.75 & 19.84 & 21.12 & 22.90 & 23.20
\end{array}$$

with lines drawn under the subsets ($\bar{y}_{1.}$, $\bar{y}_{3.}$) and ($\bar{y}_{3.}$, $\bar{y}_{6.}$, $\bar{y}_{4.}$).

It is clear that in this case the results from Tukey’s and Duncan’s procedures are very similar. Tukey’s procedure did not detect a difference between 2 and 5, whereas Duncan’s did.
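The stepwise range comparisons in Duncan’s procedure can be sketched as a short routine. This is a simplified illustration under stated assumptions: the $r_p$ values are the tabulated ones quoted in this section and are supplied by hand, a range is skipped once it lies inside a subset already found homogeneous, and `duncan_groups` is a hypothetical name rather than a library function.

```python
import math

def duncan_groups(means, mse, n, r):
    """Return homogeneous subsets of adjacent ordered means.

    means: dict of treatment label -> sample mean
    r:     dict of p -> least significant studentized range r_p
    """
    ordered = sorted(means.items(), key=lambda item: item[1])
    k = len(ordered)
    scale = math.sqrt(mse / n)            # R_p = r_p * sqrt(s^2 / n)
    groups = []                           # (start, end) index pairs
    for p in range(k, 1, -1):             # widest range first
        for start in range(0, k - p + 1):
            end = start + p - 1
            if any(a <= start and end <= b for a, b in groups):
                continue                  # already inside a homogeneous subset
            if ordered[end][1] - ordered[start][1] <= r[p] * scale:
                groups.append((start, end))
    return [[label for label, _ in ordered[a:b + 1]] for a, b in groups]

means = {2: 14.50, 5: 16.75, 1: 19.84, 3: 21.12, 6: 22.90, 4: 23.20}
r_p = {2: 2.919, 3: 3.066, 4: 3.160, 5: 3.226, 6: 3.276}
groups = duncan_groups(means, mse=2.45, n=5, r=r_p)

print(groups)   # each inner list is a subset that would be underlined
```

Means that appear in no subset (here treatments 2 and 5) differ significantly from every other mean.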

Dunnett’s Test: Comparing Treatment with a Control

In many scientific and engineering problems, one is not interested in drawing inferences regarding all possible comparisons among the treatment means of the type $\mu_i - \mu_j$. Rather, the experiment often dictates the need to simultaneously compare each treatment with a control. A test procedure developed by C. W. Dunnett determines significant differences between each treatment mean and the control, at a single joint significance level α. To illustrate Dunnett’s procedure, let us consider the experimental data of Table 13.6 for a one-way classification where the effect of three catalysts on the yield of a reaction is being studied. A fourth treatment, no catalyst, is used as a control.

Table 13.6: Yield of Reaction

Control    Catalyst 1    Catalyst 2    Catalyst 3

In general, we wish to test the k hypotheses

$$H_0\colon \mu_0 = \mu_i, \qquad H_1\colon \mu_0 \neq \mu_i, \qquad i = 1, 2, \ldots, k,$$

where $\mu_0$ represents the mean yield for the population of measurements in which the control is used. The usual analysis-of-variance assumptions, as outlined in Section 13.3, are expected to remain valid. To test the null hypotheses specified by $H_0$ against two-sided alternatives for an experimental situation in which there are k treatments, excluding the control, and n observations per treatment, we first calculate the values

$$d_i = \frac{\bar{y}_{i.} - \bar{y}_{0.}}{\sqrt{2s^2/n}}, \qquad i = 1, 2, \ldots, k.$$

The sample variance $s^2$ is obtained, as before, from the mean square error in the analysis of variance. Now, the critical region for rejecting $H_0$, at the α-level of significance, is established by the inequality

$$|d_i| > d_{\alpha/2}(k, v),$$

where v is the number of degrees of freedom for the mean square error. The values of the quantity $d_{\alpha/2}(k, v)$ for a two-tailed test are given in Table A.14 for α = 0.05 and α = 0.01 for various values of k and v.

Example 13.5: For the data of Table 13.6, test hypotheses comparing each catalyst with the control, using two-sided alternatives. Choose α = 0.05 as the joint significance level.

Solution: The mean square error with 16 degrees of freedom is obtained from the analysis-of-variance table, using all k + 1 treatments. The mean square error is given by $s^2 = SSE/16$, and the values $d_1$, $d_2$, and $d_3$ then follow from the formula for $d_i$ with $n = 5$.

From Table A.14 the critical value for α = 0.05 is found to be $d_{0.025}(3, 16) = 2.59$. Since $|d_1| < 2.59$ and $|d_3| < 2.59$, we conclude that only the mean yield for catalyst 2 is significantly different from the mean yield of the reaction using the control.

Many practical applications dictate the need for a one-tailed test for comparing treatments with a control. Certainly, when a pharmacologist is concerned with the effect of various dosages of a drug on cholesterol level and his control is zero dosage, it is of interest to determine if each dosage produces a significantly larger reduction than the control. Table A.15 shows the critical values of $d_\alpha(k, v)$ for one-sided alternatives.
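The mechanics of Dunnett’s two-sided comparison can be sketched as follows. The means and mean square error below are hypothetical placeholders chosen only for illustration; they are not the values of Table 13.6. The critical value 2.59 is the tabulated $d_{0.025}(3, 16)$ quoted in the example, and the function names are illustrative.

```python
import math

def dunnett_statistics(control_mean, treatment_means, mse, n):
    """d_i = (treatment mean - control mean) / sqrt(2 s^2 / n) for each treatment."""
    denom = math.sqrt(2.0 * mse / n)
    return [(mean - control_mean) / denom for mean in treatment_means]

def dunnett_two_sided(d_values, critical):
    """True where |d_i| exceeds the tabulated two-sided critical value."""
    return [abs(d) > critical for d in d_values]

# HYPOTHETICAL summary statistics (not from Table 13.6): one control and
# three treatments with n = 5 each, so the mean square error has 16 df.
control_mean = 50.0
treatment_means = [51.0, 53.0, 49.5]
d_values = dunnett_statistics(control_mean, treatment_means, mse=2.0, n=5)
rejected = dunnett_two_sided(d_values, critical=2.59)

print([round(d, 3) for d in d_values])
print(rejected)
```

For a one-sided alternative, the absolute value would be dropped and the critical value taken from the one-sided table instead.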

Exercises

13.12 Consider the data of Review Exercise 13.45 on page 555. Make significance tests on the following contrasts:
(a) B versus A, C, and D;
(b) C versus A and D;
(c) A versus D.

13.13 The purpose of the study The Incorporation of a Chelating Agent into a Flame Retardant Finish of a Cotton Flannelette and the Evaluation of Selected Fabric Properties conducted at Virginia Tech was to evaluate the use of a chelating agent as part of the flame-retardant finish of cotton flannelette by determining its effects upon flammability after the fabric is laundered under specific conditions. Two baths were prepared, one with carboxymethyl cellulose and one without. Twelve pieces of fabric were laundered 5 times in bath I, and 12 other pieces of fabric were laundered 10 times in bath I. This was repeated using 24 additional pieces of cloth in bath II. After the washings the lengths of fabric that burned and the burn times were measured. For convenience, let us define the following treatments:

Treatment 1: 5 launderings in bath I,
Treatment 2: 5 launderings in bath II,
Treatment 3: 10 launderings in bath I,
Treatment 4: 10 launderings in bath II.

Burn times, in seconds, were recorded as follows:

Treatment
1      2      3      4
13.7   6.2    27.2   18.2
23.0   5.4    16.8   8.8
15.7   5.0    12.9   14.5
25.5   4.4    14.9   14.7
15.8   5.0    17.1   17.1
14.0   16.0   10.8   10.6
29.4   2.5    13.5   5.8
9.7    1.6    25.5   7.3
14.0   3.9    14.2   17.7
12.3   2.5    27.4   18.3
12.3   7.1    11.5   9.9

(a) Perform an analysis of variance, using a 0.01 level of significance, and determine whether there are any significant differences among the treatment means.
(b) Use single-degree-of-freedom contrasts with α = 0.01 to compare the mean burn time of treatment 1 versus treatment 2 and also treatment 3 versus treatment 4.

13.14 The study Loss of Nitrogen Through Sweat by Preadolescent Boys Consuming Three Levels of Dietary Protein was conducted by the Department of Human Nutrition and Foods at Virginia Tech to determine perspiration nitrogen loss at various dietary protein levels. Twelve preadolescent boys ranging in age from 7 years, 8 months to 9 years, 8 months, all judged to be clinically healthy, were used in the experiment. Each boy was subjected to one of three controlled diets in which 29, 54, or 84 grams of protein were consumed per day. The following data represent the body perspiration nitrogen loss, in milligrams, during the last two days of the experimental period:

Protein Level
29 Grams   54 Grams   84 Grams
190
266
270

(a) Perform an analysis of variance at the 0.05 level of significance to show that the mean perspiration nitrogen losses at the three protein levels are different.
(b) Use Tukey’s test to determine which protein levels are significantly different from each other in mean nitrogen loss.

13.15 Use Tukey’s test, with a 0.05 level of significance, to analyze the means of the five different brands of headache tablets in Exercise 13.2 on page 518.

13.16 An investigation was conducted to determine the source of reduction in yield of a certain chemical product. It was known that the loss in yield occurred in the mother liquor, that is, the material removed at the filtration stage. It was thought that different blends of the original material might result in different yield reductions at the mother liquor stage. The following are the percent reductions for 3 batches at each of 4 preselected blends:

Blend
1      2      3      4
25.6   25.2   20.8   31.6
24.3   28.6   26.7   29.8
27.9   24.7   22.2   34.3

(a) Perform the analysis of variance at the α = 0.05 level of significance.
(b) Use Duncan’s multiple-range test to determine which blends differ.
(c) Do part (b) using Tukey’s test.

13.17 In the study An Evaluation of the Removal Method for Estimating Benthic Populations and Diversity conducted by Virginia Tech on the Jackson River, 5 different sampling procedures were used to determine the species counts. Twenty samples were selected at random, and each of the 5 sampling procedures was repeated 4 times. The species counts were recorded as follows:

Sampling Procedure
Depletion   Modified Hess   Surber   Substrate Removal Kicknet   Kicknet
85          75              31       43                          17
55          45              20       21                          10
40          35              9        15                          8
77          67              37       27                          15

(a) Is there a significant difference in the average species counts for the different sampling procedures? Use a P-value in your conclusion.
(b) Use Tukey’s test with α = 0.05 to find which sampling procedures differ.

13.18 The following data are values of pressure (psi) in a torsion spring for several settings of the angle between the legs of the spring in a free position:

Angle (°)
67 71 75 79 83
83 84 86 87 89 90
85 85 87 87 90 92
85 88 88 90
86 88 88 91
86 88 89

Compute a one-way analysis of variance for this experiment and state your conclusion concerning the effect of angle on the pressure in the spring. (From C. R. Hicks, Fundamental Concepts in the Design of Experiments, Holt, Rinehart and Winston, New York, 1973.)

13.19 It is suspected that the environmental temperature at which batteries are activated affects their life. Thirty homogeneous batteries were tested, six at each of five temperatures, and the data are shown below (activated life in seconds). Analyze and interpret the data. (From C. R. Hicks, Fundamental Concepts in Design of Experiments, Holt, Rinehart and Winston, New York, 1973.)

Temperature (°C)
55   60   70   72   65
55   61   72   72   66
57   60   72   72   60
54   60   77   68   65
56   60   77   69   65

13.20 The following table (from A. Hald, Statistical Theory with Engineering Applications, John Wiley & Sons, New York, 1952) gives tensile strengths (in deviations from 340) for wires taken from nine cables to be used for a high-voltage network. Each cable is made from 12 wires. We want to know whether the mean strengths of the wires in the nine cables are the same. If the cables are different, which ones differ? Use a P-value in your analysis of variance.

Cable   Tensile Strength
2       −11 −13 −8
7       10 7 8 1

13.21 The printout in Figure 13.5 on page 532 gives information on Duncan’s test, using PROC GLM in SAS, for the aggregate data in Example 13.1. Give conclusions regarding paired comparisons using Duncan’s test results.

13.22 Do Duncan’s test for paired comparisons for the data of Exercise 13.6 on page 519. Discuss the results.

13.23 In a biological experiment, four concentrations of a certain chemical are used to enhance the growth of a certain type of plant over time. Five plants are used at each concentration, and the growth in each plant is measured in centimeters. The following growth data are taken. A control (no chemical) is also applied.

           Concentration
Control    1      2      3      4
6.8        8.2    7.7    6.9    5.9
7.3        8.7    8.4    5.8    6.1
6.9        9.2    8.1    6.8    5.7

Use Dunnett’s two-sided test at the 0.05 level of significance to simultaneously compare the concentrations with the control.

13.24 The financial structure of a firm refers to the way the firm’s assets are divided into equity and debt, and the financial leverage refers to the percentage of assets financed by debt. In the paper The Effect of Financial Leverage on Return, Tai Ma of Virginia Tech claims that financial leverage can be used to increase the rate of return on equity. To say it another way, stockholders can receive higher returns on equity with the same amount of investment through the use of financial leverage. The following data show the rates of return on equity using 3 different levels of financial leverage and a control level (zero debt) for 24 randomly selected firms:

Financial Leverage
Control    Low    Medium    High

Source: Standard & Poor’s Machinery Industry Survey, 1975.

(a) Perform the analysis of variance at the 0.05 level of significance.
(b) Use Dunnett’s test at the 0.01 level of significance to determine whether the mean rates of return on equity are higher at the low, medium, and high levels of financial leverage than at the control level.

Figure 13.5: SAS printout for Exercise 13.21 (The GLM Procedure, Duncan’s Multiple Range Test for moisture; the printout notes that this test controls the Type I comparisonwise error rate, not the experimentwise error rate).
