Multiple Comparisons

13.6 Multiple Comparisons

The analysis of variance is a powerful procedure for testing the homogeneity of

a set of means. However, if we reject the null hypothesis and accept the stated alternative—that the means are not all equal—we still do not know which of the population means are equal and which are diﬀerent.

Chapter 13 One-Factor Experiments: General

The GLM Procedure

Dependent Variable: moisture

Sum of

Source

DF Squares

Mean Square

F Value Pr > F

Corrected Total

R-Square

Coeff Var

Root MSE

moisture Mean

DF Type I SS

Mean Square

F Value Pr > F

DF Type III SS

Mean Square

F Value Pr > F

DF Contrast SS

Mean Square

F Value Pr > F

Figure 13.4: A set of orthogonal contrasts

Often it is of interest to make several (perhaps all possible) paired comparisons among the treatments. Actually, a paired comparison may be viewed as a simple contrast, namely, a test of

H 0 :μ i −μ j = 0,

H 1 :μ i −μ j = 0,

for all i = j. Making all possible paired comparisons among the means can be very beneﬁcial when particular complex contrasts are not known a priori. For example, in the aggregate data of Table 13.1, suppose that we wish to test

H 0 :μ 1 −μ 5 = 0,

H 1 :μ 1 −μ 5 = 0.

The test is developed through use of an F, t, or conﬁdence interval approach. Using t, we have

where s is the square root of the mean square error and n = 6 is the sample size per treatment. In this case,

− 610.67

√

= −1.41.

13.6 Multiple Comparisons

The P-value for the t-test with 25 degrees of freedom is 0.17. Thus, there is not

suﬃcient evidence to reject H 0 .

Relationship between T and F

In the foregoing, we displayed the use of a pooled t-test along the lines of that discussed in Chapter 10. The pooled estimate was taken from the mean squared error in order to enjoy the degrees of freedom that are pooled across all ﬁve samples. In addition, we have tested a contrast. The reader should note that if the t-value is squared, the result is exactly of the same form as the value of f for a test on a contrast, discussed in the preceding section. In fact,

(¯ y 1.

− ¯y 5. ) 2 (553.33

which, of course, is t 2 .

Conﬁdence Interval Approach to a Paired Comparison

It is straightforward to solve the same problem of a paired comparison (or a contrast) using a conﬁdence interval approach. Clearly, if we compute a 100(1 − α)

conﬁdence interval on μ 1 −μ 5 , we have "

where t α2 is the upper 100(1 − α2) point of a t-distribution with 25 degrees

of freedom (degrees of freedom coming from s 2 ). This straightforward connection

between hypothesis testing and conﬁdence intervals should be obvious from dis-

cussions in Chapters 9 and 10. The test of the simple contrast μ 1 −μ 5 involves

no more than observing whether or not the conﬁdence interval above covers zero. Substituting the numbers, we have as the 95 conﬁdence interval

(553.33 − 610.67) ± 2.060 4961

3 −57.34 ± 83.77.

Thus, since the interval covers zero, the contrast is not significant. In other words, we do not find a significant difference between the means of aggregates 1 and 5.

Experiment-wise Error Rate

Serious diﬃculties occur when the analyst attempts to make many or all possible paired comparisons. For the case of k means, there will be, of course, r = k(k − 1)2 possible paired comparisons. Assuming independent comparisons, the experiment-wise error rate or family error rate (i.e., the probability of false rejection of at least one of the hypotheses) is given by 1

− (1 − α) r , where

α is the selected probability of a type I error for a speciﬁc comparison. Clearly, this measure of experiment-wise type I error can be quite large. For example, even

Chapter 13 One-Factor Experiments: General

if there are only 6 comparisons, say, in the case of 4 means, and α = 0.05, the experiment-wise rate is

1 − (0.95) 6 ≈ 0.26.

When many paired comparisons are being tested, there is usually a need to make the effective contrast on a single comparison more conservative. That is, with the confidence interval approach, the confidence intervals would be much wider than the ±t α2 s 2n used for the case where only a single comparison is being made.

Tukey’s Test

There are several standard methods for making paired comparisons that sustain the credibility of the type I error rate. We shall discuss and illustrate two of them here. The ﬁrst one, called Tukey’s procedure, allows formation of simultaneous 100(1 − α) conﬁdence intervals for all paired comparisons. The method is based on the studentized range distribution. The appropriate percentile point is a function

of α, k, and v = degrees of freedom for s 2 . A list of upper percentage points for

α = 0.05 is shown in Table A.12. The method of paired comparisons by Tukey involves finding a significant difference between means i and j (i

5 j. = j) if |¯y − ¯y |

exceeds q(α, k, v) s 2 n .

Tukey’s procedure is easily illustrated. Consider a hypothetical example where we have 6 treatments in a one-factor completely randomized design, with 5 observations taken per treatment. Suppose that the mean square error taken from the

analysis-of-variance table is s 2 = 2.45 (24 degrees of freedom). The sample means

are in ascending order:

y ¯ 2. y ¯ 5. y ¯ 1. ¯ y 3. y ¯ 6. y ¯ 4.

With α = 0.05, the value of q(0.05, 6, 24) is 4.37. Thus, all absolute diﬀerences are to be compared to

As a result, the following represent means found to be signiﬁcantly diﬀerent using Tukey’s procedure:

Where Does the α-Level Come From in Tukey’s Test?

We briefly alluded to the concept of simultaneous confidence intervals being employed for Tukey’s procedure. The reader will gain a useful insight into the notion of multiple comparisons if he or she gains an understanding of what is meant by simultaneous confidence intervals.

In Chapter 9, we saw that if we compute a 95 conﬁdence interval on, say,

a mean μ, then the probability that the interval covers the true mean μ is 0.95.

13.6 Multiple Comparisons

However, as we have discussed, for the case of multiple comparisons, the effective probability of interest is tied to the experiment-wise error rate, and it should be emphasized that the confidence intervals of the type ¯ y i. − ¯y j. ± q(α, k, v)s 1n are not independent since they all involve s and many involve the use of the same averages, the ¯ y i. . Despite the difficulties, if we use q(0.05, k, v), the simultaneous confidence level is controlled at 95. The same holds for q(0.01, k, v); namely, the confidence level is controlled at 99. In the case of α = 0.05, there is a probability of 0.05 that at least one pair of measures will be falsely found to be different (false rejection of at least one null hypothesis). In the α = 0.01 case, the corresponding probability will be 0.01.

Duncan’s Test

The second procedure we shall discuss is called Duncan’s procedure or Dun- can’s multiple-range test . This procedure is also based on the general notion of studentized range. The range of any subset of p sample means must exceed a certain value before any of the p means are found to be diﬀerent. This value is called the least signiﬁcant range for the p means and is denoted by R p , where

The values of the quantity r p , called the least signiﬁcant studentized range, depend on the desired level of signiﬁcance and the number of degrees of freedom of the mean square error. These values may be obtained from Table A.13 for p = 2, 3, . . . , 10 means.

To illustrate the multiple-range test procedure, let us consider the hypothetical example where 6 treatments are compared, with 5 observations per treatment. This is the same example used to illustrate Tukey’s test. We obtain R p by multiplying each r p by 0.70. The results of these computations are summarized as follows:

2.919 3.066 3.160 3.226 3.276 R p 2.043 2.146 2.212 2.258 2.293

r p

Comparing these least signiﬁcant ranges with the diﬀerences in ordered means, we arrive at the following conclusions:

1. Since ¯ y 4. − ¯y 2. = 8.70 > R 6 = 2.293, we conclude that μ 4 and μ 2 are signiﬁ-

cantly diﬀerent.

2. Comparing ¯ y 4. − ¯y 5. and ¯ y 6. − ¯y 2. with R 5 , we conclude that μ 4 is signiﬁcantly greater than μ 5 and μ 6 is signiﬁcantly greater than μ 2 .

3. Comparing ¯ y 4. − ¯y 1. ,¯ y 6. − ¯y 5. , and ¯ y 3. − ¯y 2. with R 4 , we conclude that each

diﬀerence is signiﬁcant.

4. Comparing ¯ y 4. − ¯y 3. ,¯ y 6. − ¯y 1. ,¯ y 3. − ¯y 5. , and ¯ y 1. − ¯y 2. with R 3 , we find all differences significant except for μ 4 −μ 3 . Therefore, μ 3 ,μ 4 , and μ 6 constitute

a subset of homogeneous means.

5. Comparing ¯ y 3. − ¯y 1. ,¯ y 1. − ¯y 5. , and ¯ y 5. − ¯y 2. with R 2 , we conclude that only

μ 3 and μ 1 are not signiﬁcantly diﬀerent.

Chapter 13 One-Factor Experiments: General

It is customary to summarize the conclusions above by drawing a line under any subsets of adjacent means that are not signiﬁcantly diﬀerent. Thus, we have

y ¯ 2. ¯ y 5. y ¯ 1. y ¯ 3. ¯ y 6. y ¯ 4.

It is clear that in this case the results from Tukey’s and Duncan’s procedures are very similar. Tukey’s procedure did not detect a diﬀerence between 2 and 5, whereas Duncan’s did.

Dunnett’s Test: Comparing Treatment with a Control

In many scientific and engineering problems, one is not interested in drawing infer- ences regarding all possible comparisons among the treatment means of the type μ i −μ j . Rather, the experiment often dictates the need to simultaneously compare each treatment with a control. A test procedure developed by C. W. Dunnett de- termines significant differences between each treatment mean and the control, at a single joint significance level α. To illustrate Dunnett’s procedure, let us consider the experimental data of Table 13.6 for a one-way classification where the effect of three catalysts on the yield of a reaction is being studied. A fourth treatment, no catalyst, is used as a control.

Table 13.6: Yield of Reaction

Control

Catalyst 1 Catalyst 2 Catalyst 3

y ¯ 0. = 51.44

y ¯ 1. = 53.50

¯ y 2. = 54.04

y ¯ 3. = 49.38

In general, we wish to test the k hypotheses

where μ 0 represents the mean yield for the population of measurements in which the control is used. The usual analysis-of-variance assumptions, as outlined in Section 13.3, are expected to remain valid. To test the null hypotheses speciﬁed

by H 0 against two-sided alternatives for an experimental situation in which there are k treatments, excluding the control, and n observations per treatment, we ﬁrst calculate the values

n The sample variance s 2 is obtained, as before, from the mean square error in the

2s 2

analysis of variance. Now, the critical region for rejecting H 0 , at the α-level of

Exercises

signiﬁcance, is established by the inequality

|d i |>d α2 (k, v),

where v is the number of degrees of freedom for the mean square error. The values of the quantity d α2 (k, v) for a two-tailed test are given in Table A.14 for α = 0.05 and α = 0.01 for various values of k and v.

Example 13.5: For the data of Table 13.6, test hypotheses comparing each catalyst with the con-

trol, using two-sided alternatives. Choose α = 0.05 as the joint signiﬁcance level. Solution : The mean square error with 16 degrees of freedom is obtained from the analysis-

of-variance table, using all k + 1 treatments. The mean square error is given by

From Table A.14 the critical value for α = 0.05 is found to be d 0.025 (3, 16) = 2.59.

Since |d 1 | < 2.59 and |d 3 | < 2.59, we conclude that only the mean yield for catalyst

2 is signiﬁcantly diﬀerent from the mean yield of the reaction using the control.

Many practical applications dictate the need for a one-tailed test for comparing treatments with a control. Certainly, when a pharmacologist is concerned with the eﬀect of various dosages of a drug on cholesterol level and his control is zero dosage, it is of interest to determine if each dosage produces a signiﬁcantly larger reduction than the control. Table A.15 shows the critical values of d α (k, v) for one-sided alternatives.

Exercises

13.12 Consider the data of Review Exercise 13.45 on laundered under speciﬁc conditions. Two baths were page 555. Make signiﬁcance tests on the following con- prepared, one with carboxymethyl cellulose and one trasts:

without. Twelve pieces of fabric were laundered 5 times

(a) B versus A, C, and D;

in bath I, and 12 other pieces of fabric were laundered

(b) C versus A and D;

10 times in bath I. This was repeated using 24 addi- tional pieces of cloth in bath II. After the washings the

lengths of fabric that burned and the burn times were measured. For convenience, let us deﬁne the following

13.13 The purpose of the study The Incorporation treatments: of a Chelating Agent into a Flame Retardant Finish of a Cotton Flannelette and the Evaluation of Selected

Treatment 1: 5 launderings in bath I,

Fabric Properties conducted at Virginia Tech was to

Treatment 2: 5 launderings in bath II,

evaluate the use of a chelating agent as part of the flame-retardant finish of cotton flannelette by deter-

Treatment 3: 10 launderings in bath I,

mining its eﬀects upon ﬂammability after the fabric is

Treatment 4: 10 launderings in bath II.

Chapter 13 One-Factor Experiments: General

Burn times, in seconds, were recorded as follows:

13.16 An investigation was conducted to determine

Treatment

the source of reduction in yield of a certain chemical 1 2 3 4 product. It was known that the loss in yield occurred in

13.7 6.2 27.2 18.2 the mother liquor, that is, the material removed at the 23.0 5.4 16.8 8.8 filtration stage. It was thought that different blends 15.7 5.0 12.9 14.5 of the original material might result in different yield 25.5 4.4 14.9 14.7 reductions at the mother liquor stage. The following 15.8 5.0 17.1 17.1 are the percent reductions for 3 batches at each of 4 preselected blends:

7.1 11.5 9.9 (a) Perform the analysis of variance at the α = 0.05

(a) Perform an analysis of variance, using a 0.01 level of

level of signiﬁcance.

significance, and determine whether there are any (b) Use Duncan’s multiple-range test to determine significant differences among the treatment means.

which blends diﬀer.

(b) Use single-degree-of-freedom contrasts with α = (c) Do part (b) using Tukey’s test.

0.01 to compare the mean burn time of treatment

1 versus treatment 2 and also treatment 3 versus 13.17 In the study An Evaluation of the Removal

treatment 4.

Method for Estimating Benthic Populations and Diver- sity conducted by Virginia Tech on the Jackson River, 5

13.14 The study Loss of Nitrogen Through Sweat by diﬀerent sampling procedures were used to determine Preadolescent Boys Consuming Three Levels of Dietary the species counts. Twenty samples were selected at Protein was conducted by the Department of Human random, and each of the 5 sampling procedures was Nutrition and Foods at Virginia Tech to determine per- repeated 4 times. The species counts were recorded as spiration nitrogen loss at various dietary protein levels. follows: Twelve preadolescent boys ranging in age from 7 years,

Sampling Procedure

8 months to 9 years, 8 months, all judged to be clini-

Substrate

cally healthy, were used in the experiment. Each boy

Deple- Modiﬁed

Removal Kick-

was subjected to one of three controlled diets in which

tion

Hess

Surber Kicknet net

29, 54, or 84 grams of protein were consumed per day.

The following data represent the body perspiration ni-

trogen loss, in milligrams, during the last two days of

the experimental period:

Protein Level

(a) Is there a signiﬁcant diﬀerence in the average

species counts for the diﬀerent sampling proce-

dures? Use a P-value in your conclusion.

(b) Use Tukey’s test with α = 0.05 to ﬁnd which sam-

pling procedures diﬀer.

(a) Perform an analysis of variance at the 0.05 level 13.18 The following data are values of pressure (psi)

of signiﬁcance to show that the mean perspiration in a torsion spring for several settings of the angle be- nitrogen losses at the three protein levels are dif- tween the legs of the spring in a free position: ferent.

Angle ( ◦ )

(b) Use Tukey’s test to determine which protein levels

are signiﬁcantly diﬀerent from each other in mean

nitrogen loss.

13.15 Use Tukey’s test, with a 0.05 level of signiﬁ-

cance, to analyze the means of the ﬁve diﬀerent brands

of headache tablets in Exercise 13.2 on page 518.

Exercises

Compute a one-way analysis of variance for this experi- the data of Exercise 13.6 on page 519. Discuss the ment and state your conclusion concerning the eﬀect of results. angle on the pressure in the spring. (From C. R. Hicks,

Fundamental Concepts in the Design of Experiments, 13.23 In a biological experiment, four concentrations

Holt, Rinehart and Winston, New York, 1973.)

of a certain chemical are used to enhance the growth of a certain type of plant over time. Five plants are used at each concentration, and the growth in each plant is

13.19 It is suspected that the environmental temper- measured in centimeters. The following growth data ature at which batteries are activated aﬀects their life. are taken. A control (no chemical) is also applied. Thirty homogeneous batteries were tested, six at each of ﬁve temperatures, and the data are shown below

Concentration

(activated life in seconds). Analyze and interpret the

Control

data. (From C. R. Hicks, Fundamental Concepts in

Design of Experiments, Holt, Rinehart and Winston,

New York, 1973.)

Temperature ( ◦

55 60 70 72 65 Use Dunnett’s two-sided test at the 0.05 level of signif- 55 61 72 66 icance to simultaneously compare the concentrations 57 60 72 60 with the control.

54 60 77 68 65 13.24 The financial structure of a firm refers to the 56 60 77 69 65 way the firm’s assets are divided into equity and debt, and the financial leverage refers to the percentage of assets financed by debt. In the paper The Effect of Fi-

13.20 The following table (from A. Hald, Statistical nancial Leverage on Return, Tai Ma of Virginia Tech Theory with Engineering Applications, John Wiley claims that financial leverage can be used to increase Sons, New York, 1952) gives tensile strengths (in devi- the rate of return on equity. To say it another way, ations from 340) for wires taken from nine cables to be stockholders can receive higher returns on equity with used for a high-voltage network. Each cable is made the same amount of investment through the use of fi- from 12 wires. We want to know whether the mean nancial leverage. The following data show the rates strengths of the wires in the nine cables are the same. of return on equity using 3 different levels of financial If the cables are different, which ones differ? Use a leverage and a control level (zero debt) for 24 randomly P-value in your analysis of variance.

selected ﬁrms:

Cable

Tensile Strength

Financial Leverage

Medium High

7 10 7 8 1 Source: Standard Poor’s Machinery Indus- try Survey, 1975.

13.21 The printout in Figure 13.5 on page 532 gives (a) Perform the analysis of variance at the 0.05 level of information on Duncan’s test, using PROC GLM in

signiﬁcance.

SAS, for the aggregate data in Example 13.1. Give (b) Use Dunnett’s test at the 0.01 level of signiﬁcance conclusions regarding paired comparisons using Dun-

to determine whether the mean rates of return on

can’s test results.

equity are higher at the low, medium, and high levels of ﬁnancial leverage than at the control level.

13.22 Do Duncan’s test for paired comparisons for

Chapter 13 One-Factor Experiments: General

The GLM Procedure Duncan’s Multiple Range Test for moisture

NOTE: This test controls the Type I comparisonwise error rate,

not the experimentwise error rate.

Error Degrees of Freedom

Error Mean Square

Number of Means

Critical Range

Means with the same letter are not significantly different.

Duncan Grouping

Figure 13.5: SAS printout for Exercise 13.21.

Multiple Comparisons

13.6 Multiple Comparisons

Parts

Dokumen yang terkait

Analisis Komparasi Internet Financial Local Government Reporting Pada Website Resmi Kabupaten dan Kota di Jawa Timur The Comparison Analysis of Internet Financial Local Government Reporting on Official Website of Regency and City in East Java

ANTARA IDEALISME DAN KENYATAAN: KEBIJAKAN PENDIDIKAN TIONGHOA PERANAKAN DI SURABAYA PADA MASA PENDUDUKAN JEPANG TAHUN 1942-1945 Between Idealism and Reality: Education Policy of Chinese in Surabaya in the Japanese Era at 1942-1945)

Improving the Eighth Year Students' Tense Achievement and Active Participation by Giving Positive Reinforcement at SMPN 1 Silo in the 2013/2014 Academic Year

Improving the VIII-B Students' listening comprehension ability through note taking and partial dictation techniques at SMPN 3 Jember in the 2006/2007 Academic Year -

The Correlation between students vocabulary master and reading comprehension

Improping student's reading comprehension of descriptive text through textual teaching and learning (CTL)

The correlation between listening skill and pronunciation accuracy : a case study in the firt year of smk vocation higt school pupita bangsa ciputat school year 2005-2006

Antiremed Kelas 12 Matematika (4)

Transmission of Greek and Arabic Veteri

Services for adults with an autism spect

Dukungan

Links

Multiple Comparisons

13.6 Multiple Comparisons

Parts

Dokumen yang terkait

Analisis Komparasi Internet Financial Local Government Reporting Pada Website Resmi Kabupaten dan Kota di Jawa Timur The Comparison Analysis of Internet Financial Local Government Reporting on Official Website of Regency and City in East Java

ANTARA IDEALISME DAN KENYATAAN: KEBIJAKAN PENDIDIKAN TIONGHOA PERANAKAN DI SURABAYA PADA MASA PENDUDUKAN JEPANG TAHUN 1942-1945 Between Idealism and Reality: Education Policy of Chinese in Surabaya in the Japanese Era at 1942-1945)

Improving the Eighth Year Students' Tense Achievement and Active Participation by Giving Positive Reinforcement at SMPN 1 Silo in the 2013/2014 Academic Year

Improving the VIII-B Students' listening comprehension ability through note taking and partial dictation techniques at SMPN 3 Jember in the 2006/2007 Academic Year -

The Correlation between students vocabulary master and reading comprehension

Improping student's reading comprehension of descriptive text through textual teaching and learning (CTL)

The correlation between listening skill and pronunciation accuracy : a case study in the firt year of smk vocation higt school pupita bangsa ciputat school year 2005-2006

Antiremed Kelas 12 Matematika (4)

Transmission of Greek and Arabic Veteri

Services for adults with an autism spect

Dokumen yang Anda mencari sudah siap untuk unduhkan