Multiple comparisons in ANOVA

10.2 Multiple comparisons in ANOVA

When the computed value of the F statistic in single-factor ANOVA is not significant, the analysis is terminated because no differences among the $\mu_i$'s have been identified. But when $H_0$ is rejected, the investigator will usually want to know which of the $\mu_i$'s are different from one another. A method for carrying out this further analysis is called a multiple comparisons procedure.

Several of the most frequently used procedures are based on the following central idea. First calculate a confidence interval for each pairwise difference $\mu_i - \mu_j$ with $i < j$. Thus if $I = 4$, the six required CIs would be for $\mu_1 - \mu_2$ (but not also for $\mu_2 - \mu_1$), $\mu_1 - \mu_3$, $\mu_1 - \mu_4$, $\mu_2 - \mu_3$, $\mu_2 - \mu_4$, and $\mu_3 - \mu_4$. Then if the interval for $\mu_1 - \mu_2$ does not include 0, conclude that $\mu_1$ and $\mu_2$ differ significantly from one another; if the interval does include 0, the two $\mu$'s are judged not significantly different. Following the same line of reasoning for each of the other intervals, we end up being able to judge for each pair of $\mu$'s whether or not they differ significantly from one another.

The procedures based on this idea differ in how the various CIs are calculated. Here we present a popular method that controls the simultaneous confidence level for all $I(I-1)/2$ intervals.
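To show how the number of intervals grows with $I$, the following small Python sketch lists the index pairs $i < j$ for which an interval on $\mu_i - \mu_j$ is formed. It then checks the count against $I(I-1)/2$. The function name `pairwise_differences` is illustrative only.

```python
from itertools import combinations

def pairwise_differences(I):
    """Index pairs (i, j) with i < j; one CI is formed for each mu_i - mu_j."""
    return list(combinations(range(1, I + 1), 2))

pairs = pairwise_differences(4)
print(pairs)                          # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(len(pairs), 4 * (4 - 1) // 2)   # 6 6
```

`combinations` never produces both $(1, 2)$ and $(2, 1)$. This matches the convention of forming an interval for $\mu_1 - \mu_2$ but not also for $\mu_2 - \mu_1$.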

Tukey's Procedure (the T Method)

Tukey's procedure involves the use of another probability distribution called the Studentized range distribution. The distribution depends on two parameters: a numerator df $m$ and a denominator df $\nu$. Let $Q_{\alpha,m,\nu}$ denote the upper-tail $\alpha$ critical value of the Studentized range distribution with $m$ numerator df and $\nu$ denominator df (analogous to $F_{\alpha,\nu_1,\nu_2}$). Values of $Q_{\alpha,m,\nu}$ are given in Appendix Table A.10.


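PROPOSITION With probability $1 - \alpha$,

$$\bar X_{i\cdot} - \bar X_{j\cdot} - Q_{\alpha,I,I(J-1)}\sqrt{\mathrm{MSE}/J} \;\le\; \mu_i - \mu_j \;\le\; \bar X_{i\cdot} - \bar X_{j\cdot} + Q_{\alpha,I,I(J-1)}\sqrt{\mathrm{MSE}/J} \qquad (10.4)$$

for every $i$ and $j$ ($i = 1, \ldots, I$ and $j = 1, \ldots, I$) with $i < j$.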

Notice that the numerator df for the appropriate $Q_\alpha$ critical value is $I$, the number of population or treatment means being compared, and not $I - 1$ as in the F test. When the computed $\bar x_{i\cdot}$, $\bar x_{j\cdot}$, and MSE are substituted into (10.4), the result is a collection of confidence intervals with simultaneous confidence level $100(1 - \alpha)\%$ for all pairwise differences of the form $\mu_i - \mu_j$ with $i < j$. Each interval that does not include 0 yields the conclusion that the corresponding values of $\mu_i$ and $\mu_j$ differ significantly from one another.

Since we are not really interested in the lower and upper limits of the various intervals but only in which include 0 and which do not, much of the arithmetic associated with (10.4) can be avoided. The following box gives details and describes how differences can be identified visually using an "underscoring pattern."

The T Method for Identifying Significantly Different $\mu_i$'s

Select $\alpha$, extract $Q_{\alpha,I,I(J-1)}$ from Appendix Table A.10, and calculate $w = Q_{\alpha,I,I(J-1)} \cdot \sqrt{\mathrm{MSE}/J}$. Then list the sample means in increasing order and underline those pairs that differ by less than $w$. Any pair of sample means not underscored by the same line corresponds to a pair of population or treatment means that are judged significantly different.

Suppose, for example, that $I = 5$ and that $\bar x_{2\cdot} < \bar x_{5\cdot} < \bar x_{4\cdot} < \bar x_{1\cdot} < \bar x_{3\cdot}$. Then

1. Consider first the smallest mean $\bar x_{2\cdot}$. If $\bar x_{5\cdot} - \bar x_{2\cdot} \ge w$, proceed to Step 2. However, if $\bar x_{5\cdot} - \bar x_{2\cdot} < w$, connect these first two means with a line segment. Then if possible extend this line segment even further to the right to the largest $\bar x_{i\cdot}$ that differs from $\bar x_{2\cdot}$ by less than $w$ (so the line may connect two, three, or even more means).

2. Now move to $\bar x_{5\cdot}$ and again extend a line segment to the largest $\bar x_{i\cdot}$ to its right that differs from $\bar x_{5\cdot}$ by less than $w$ (it may not be possible to draw this line, or alternatively it may underscore just two means, or three, or even all four remaining means).

3. Continue by moving to $\bar x_{4\cdot}$ and repeating, and then finally move to $\bar x_{1\cdot}$.

  To summarize, starting from each mean in the ordered list, a line segment is extended as far to the right as possible as long as the difference between the means is smaller than w. It is easily verified that a particular interval of the form (10.4) will contain 0 if and only if the corresponding pair of sample means is underscored by the same line segment.
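The underscoring steps can be sketched in Python as follows. The implementation rests on two assumptions of ours. First, the sample means are passed as a dictionary keyed by treatment label. Second, a new segment is reported only if it reaches further right than every earlier segment, because a shorter one would lie entirely under an existing line.

The function name is hypothetical. The usage lines apply the function to the means of Example 10.5 with $w = 4.04\sqrt{.088/9}$.

```python
import math

def underscore(means, w):
    """Underscoring for Tukey's T method (illustrative sketch).

    means : dict mapping a treatment label to its sample mean
    w     : Q_{alpha,I,I(J-1)} * sqrt(MSE / J)
    Returns (order, segments): the labels sorted by increasing mean, and the
    line segments, each given as the list of labels it underscores.
    """
    order = sorted(means, key=means.get)
    segments = []
    last_end = -1  # rightmost position covered by a segment so far
    for start in range(len(order)):
        end = start
        # extend to the right while the difference from the starting mean is < w
        while end + 1 < len(order) and means[order[end + 1]] - means[order[start]] < w:
            end += 1
        # keep the segment only if it joins at least two means and is not
        # already contained in an earlier segment
        if end > start and end > last_end:
            segments.append(order[start:end + 1])
            last_end = end
    return order, segments

w = 4.04 * math.sqrt(0.088 / 9)                  # about .3995
means = {1: 14.5, 2: 13.8, 3: 13.3, 4: 14.3, 5: 13.1}
order, segments = underscore(means, w)
print(order)     # [5, 3, 2, 4, 1]
print(segments)  # [[5, 3], [4, 1]]
print(underscore({**means, 2: 14.15}, w)[1])     # [[5, 3], [2, 4, 1]]
```

Because the means are sorted, the difference from the starting mean grows as the segment moves right. The `while` loop can therefore stop at the first mean that is at least $w$ away.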

Example 10.5

An experiment was carried out to compare five different brands of automobile oil filters with respect to their ability to capture foreign material. Let $\mu_i$ denote the true average amount of material captured by brand $i$ filters ($i = 1, \ldots, 5$) under controlled

  422 Chapter 10 the analysis of Variance

conditions. A sample of nine filters of each brand was used, resulting in the following sample mean amounts: $\bar x_{1\cdot} = 14.5$, $\bar x_{2\cdot} = 13.8$, $\bar x_{3\cdot} = 13.3$, $\bar x_{4\cdot} = 14.3$, and $\bar x_{5\cdot} = 13.1$. Table 10.3 is the ANOVA table summarizing the first part of the analysis.

Table 10.3 ANOVA Table for Example 10.5

Source of Variation     df    Sum of Squares    Mean Square    f
Treatments (brands)      4
Error                   40                          .088
Total                   44

Since the computed $f$ exceeds $F_{.001,4,40} = 5.70$, the P-value is smaller than .001. Therefore, $H_0$ is rejected (decisively) at level .05. We now use Tukey's procedure to look for significant differences among the $\mu_i$'s. From Appendix Table A.10, $Q_{.05,5,40} = 4.04$ (the second subscript on $Q$ is $I$ and not $I - 1$ as in F), so $w = 4.04\sqrt{.088/9} \approx .4$. After arranging the five sample means in increasing order, the two smallest can be connected by a line segment because they differ by less than .4. However, this segment cannot be extended further to the right since $13.8 - 13.1 = .7 > .4$. Moving one mean to the right, the pair $\bar x_{3\cdot}$ and $\bar x_{2\cdot}$ cannot be underscored because these means differ by more than .4. Again moving to the right, the next mean, 13.8, cannot be connected to any mean further to the right. The last two means can be underscored with the same line segment.

13.1   13.3   13.8   14.3   14.5
___________          ___________

Thus brands 1 and 4 are not significantly different from one another, but are significantly higher than the other three brands in their true average amounts of material captured. Brand 2 is significantly better than 3 and 5 but worse than 1 and 4, and brands 3 and 5 do not differ significantly.

If $\bar x_{2\cdot} = 14.15$ rather than 13.8 with the same computed $w$, then the configuration of underscored means would be

13.1   13.3   14.15   14.3   14.5
___________   ___________________
■

Example 10.6

  A biologist wished to study the effects of ethanol on sleep time. A sample of 20 rats, matched for age and other characteristics, was selected, and each rat was given an oral injection having a particular concentration of ethanol per body weight. The rapid eye movement (REM) sleep time for each rat was then recorded for a 24-hour period, with the following results:

Treatment (concentration of ethanol)                                   $x_{i\cdot}$     $\bar x_{i\cdot}$
0 (control)          88.6   73.2   91.4   68.0   75.2      396.4    79.28
1 g/kg

$x_{\cdot\cdot} = 1107.5$    $\bar x_{\cdot\cdot} = 55.375$

Does the data indicate that the true average REM sleep time depends on the concentration of ethanol? (This example is based on an experiment reported in "Relationship of Ethanol Blood Level to REM and Non-REM Sleep Time and Distribution in the Rat," Life Sciences, 1978: 839–846.)


The $\bar x_{i\cdot}$'s differ rather substantially from one another, but there is also a great deal of variability within each sample. To answer the question precisely we must carry out the ANOVA. The smallest and largest of the four sample standard deviations are 9.34 and 10.18, respectively, which supports the assumption of equal variances. A normal probability plot of the 20 residuals shows a reasonably linear pattern, justifying the assumption that the four REM sleep time distributions are normal. Thus it is legitimate to employ the F test.

Table 10.4 is a SAS ANOVA table. The last column gives the P-value as .0001. Using a significance level of .05, we reject the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$, since P-value $= .0001 < .05 = \alpha$. True average REM sleep time does appear to depend on concentration level.

Table 10.4 SAS ANOVA Table

Analysis of Variance Procedure
Dependent Variable: TIME

                         Sum of          Mean
Source           DF      Squares         Square        F Value    Pr > F

Corrected Total

There are $I = 4$ treatments and 16 df for error, from which $Q_{.05,4,16} = 4.05$ and $w = 4.05\sqrt{93.0/5} = 17.47$. Ordering the means and underscoring yields

x̄4·   x̄3·   x̄2·   x̄1·
_________
      _________

The interpretation of this underscoring must be done with care, since we seem to have concluded that treatments 2 and 3 do not differ, 3 and 4 do not differ, yet 2 and 4 do differ. The suggested way of expressing this is to say that although the evidence allows us to conclude that treatments 2 and 4 differ from one another, neither has been shown to be significantly different from 3. Treatment 1 has a significantly higher true average REM sleep time than any of the other treatments.

  Figure 10.5 shows SAS output from the application of Tukey’s procedure.

Alpha = 0.05   df = 16   MSE = 92.9625
Critical Value of Studentized Range = 4.046
Minimum Significant Difference = 17.446
Means with the same letter are not significantly different.

Tukey Grouping      Mean      N    TREATMENT
      C B          47.920

Figure 10.5 Tukey's method using SAS ■
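The small gap between the text's $w = 17.47$ and the SAS minimum significant difference of 17.446 comes from rounding. The text uses $Q = 4.05$ and MSE $\approx 93.0$, while SAS uses 4.046 and 92.9625. The following two-line check makes this concrete; the helper name `tukey_w` is hypothetical.

```python
import math

def tukey_w(q, mse, J):
    """w = Q * sqrt(MSE / J): the smallest difference in means judged significant."""
    return q * math.sqrt(mse / J)

print(round(tukey_w(4.05, 93.0, 5), 2))      # 17.47  (rounded values used in the text)
print(round(tukey_w(4.046, 92.9625, 5), 3))  # 17.446 (values printed by SAS)
```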

The Interpretation of $\alpha$ in Tukey's Method

We stated previously that the simultaneous confidence level is controlled by Tukey's method. So what does "simultaneous" mean here? Consider calculating a 95% CI for a population mean $\mu$ based on a sample from that population and then a 95% CI


for a population proportion $p$ based on another sample selected independently of the first one. Prior to obtaining data, the probability that the first interval will include $\mu$ is .95, and this is also the probability that the second interval will include $p$. Because the two samples are selected independently of one another, the probability that both intervals will include the values of the respective parameters is $(.95)(.95) = (.95)^2 \approx .90$. Thus the simultaneous or joint confidence level for the two intervals is roughly 90%: if pairs of intervals are calculated over and over again from independent samples, in the long run roughly 90% of the time the first interval will capture $\mu$ and the second will include $p$. Similarly, if three CIs are calculated based on independent samples, the simultaneous confidence level will be $100(.95)^3\% \approx 86\%$.

  Clearly, as the number of intervals increases, the simultaneous confidence level that all intervals capture their respective parameters will decrease.

Now suppose that we want to maintain the simultaneous confidence level at 95%. Then for two independent samples, the individual confidence level for each would have to be $100\sqrt{.95}\% \approx 97.5\%$. The larger the number of intervals, the higher the individual confidence level would have to be to maintain the 95% simultaneous level.
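The arithmetic in the last two paragraphs can be reproduced with two one-line functions. This reasoning applies only to intervals computed from independent samples. The function names are illustrative.

```python
def joint_level(individual, k):
    """Simultaneous level of k CIs from independent samples, each at level `individual`."""
    return individual ** k

def required_individual(joint, k):
    """Individual level each of k independent CIs needs to reach the given joint level."""
    return joint ** (1.0 / k)

print(round(joint_level(0.95, 2), 4))          # 0.9025
print(round(joint_level(0.95, 3), 4))          # 0.8574
print(round(required_individual(0.95, 2), 4))  # 0.9747
```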

The tricky thing about the Tukey intervals is that they are not based on independent samples: MSE appears in every one, and various intervals share the same $\bar x_{i\cdot}$'s (e.g., in the case $I = 4$, three different intervals all use $\bar x_{1\cdot}$). This implies that there is no straightforward probability argument for ascertaining the simultaneous confidence level from the individual confidence levels. Nevertheless, it can be shown that if $Q_{.05}$ is used, the simultaneous confidence level is controlled at 95%, whereas using $Q_{.01}$ gives a simultaneous 99% level. To obtain a 95% simultaneous level, the individual level for each interval must be considerably larger than 95%. Said in a slightly different way, to obtain a 5% experimentwise or family error rate, the individual or per-comparison error rate for each interval must be considerably smaller than .05. Minitab asks the user to specify the family error rate (e.g., 5%) and then includes on output the individual error rate (see Exercise 16).
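The claim that the individual error rate is far below the family rate can be checked by simulation. The sketch below is an illustration under stated assumptions, not a derivation. It takes $I = 5$ and $J = 9$ (so $I(J-1) = 40$, as in Example 10.5), normal data with all $\mu_i$ equal and $\sigma = 1$, and the tabled $Q_{.05,5,40} = 4.04$. The function name is hypothetical.

Each interval (10.4) is centered at $\bar x_{i\cdot} - \bar x_{j\cdot}$. So every interval covers its true difference (here 0) exactly when the range of the sample means is at most $w$. The family error rate is therefore estimated from that range. The per-comparison rate is estimated from the single interval for $\mu_1 - \mu_2$.

```python
import math
import random

def tukey_error_rates(I=5, J=9, q=4.04, n_trials=10_000, seed=2):
    """Monte Carlo estimate of Tukey's family and per-comparison error rates.

    Assumes normal data with all mu_i equal (taken as 0) and sigma = 1; q is the
    tabled critical value Q_{.05,I,I(J-1)} (4.04 for I = 5, J = 9).
    Returns (family_error_rate, error_rate_of_the_mu1_minus_mu2_interval).
    """
    rng = random.Random(seed)
    family_misses = 0
    pair_misses = 0
    for _ in range(n_trials):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(J)] for _ in range(I)]
        means = [sum(s) / J for s in samples]
        sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
        mse = sse / (I * (J - 1))
        w = q * math.sqrt(mse / J)
        # all intervals (10.4) cover the true differences (all 0 here)
        # exactly when the largest gap between sample means is at most w
        if max(means) - min(means) > w:
            family_misses += 1
        # the single interval for mu_1 - mu_2 misses when |xbar_1 - xbar_2| > w
        if abs(means[0] - means[1]) > w:
            pair_misses += 1
    return family_misses / n_trials, pair_misses / n_trials

family, single = tukey_error_rates()
print(f"family error rate ~ {family:.3f}, single-interval error rate ~ {single:.4f}")
```

The family rate should come out near .05. The single-interval rate should be well under .01, of the same order as the individual error rate that Minitab reports in Exercise 16.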

Confidence Intervals for Other Parametric Functions

In some situations, a CI is desired for a function of the $\mu_i$'s more complicated than a difference $\mu_i - \mu_j$. Let $\theta = \sum c_i \mu_i$, where the $c_i$'s are constants. One such function is $\frac{1}{2}(\mu_1 + \mu_2) - \frac{1}{3}(\mu_3 + \mu_4 + \mu_5)$, which in the context of Example 10.5 measures the difference between the group consisting of the first two brands and that of the last three brands. Because the $X_{ij}$'s are normally distributed with $E(X_{ij}) = \mu_i$ and $V(X_{ij}) = \sigma^2$, and the $\bar X_{i\cdot}$'s come from independent samples, $\hat\theta = \sum c_i \bar X_{i\cdot}$ is normally distributed, unbiased for $\theta$, and

$$V(\hat\theta) = V\Big(\sum_i c_i \bar X_{i\cdot}\Big) = \sum_i c_i^2 V(\bar X_{i\cdot}) = \frac{\sigma^2}{J}\sum_i c_i^2$$

Estimating $\sigma^2$ by MSE and forming $\hat\sigma_{\hat\theta}$ results in a t variable $(\hat\theta - \theta)/\hat\sigma_{\hat\theta}$, which can be manipulated to obtain the following $100(1 - \alpha)\%$ confidence interval for $\sum c_i \mu_i$:

$$\sum c_i \bar x_{i\cdot} \pm t_{\alpha/2,\,I(J-1)}\sqrt{\frac{\mathrm{MSE}\sum c_i^2}{J}} \qquad (10.5)$$

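Example 10.7

The parametric function for comparing the first two (store) brands of oil filter with the last three (national) brands is $\theta = \frac{1}{2}(\mu_1 + \mu_2) - \frac{1}{3}(\mu_3 + \mu_4 + \mu_5)$, from which

$$\sum c_i^2 = \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(-\tfrac{1}{3}\right)^2 + \left(-\tfrac{1}{3}\right)^2 + \left(-\tfrac{1}{3}\right)^2 = \frac{5}{6}$$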


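With $\hat\theta = \frac{1}{2}(\bar x_{1\cdot} + \bar x_{2\cdot}) - \frac{1}{3}(\bar x_{3\cdot} + \bar x_{4\cdot} + \bar x_{5\cdot}) = .583$ and MSE $= .088$, a 95% interval is

$$.583 \pm t_{.025,40}\sqrt{\frac{(.088)(5)}{(6)(9)}} = .583 \pm 2.021\sqrt{\frac{(.088)(5)}{(6)(9)}} = .583 \pm .182 = (.401, .765)$$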
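The following is a minimal sketch of interval (10.5) that reproduces Example 10.7. The function name is illustrative. The critical value $t_{.025,40} = 2.021$ is supplied by hand from a t table, because Python's standard library has no t quantile function.

```python
import math

def contrast_ci(coeffs, means, mse, J, t_crit):
    """CI (10.5) for theta = sum c_i mu_i, equal sample sizes J (illustrative helper).

    t_crit must be t_{alpha/2, I(J-1)}, looked up in a t table.
    """
    theta_hat = sum(c * x for c, x in zip(coeffs, means))
    half_width = t_crit * math.sqrt(mse * sum(c * c for c in coeffs) / J)
    return theta_hat - half_width, theta_hat + half_width

# Example 10.7: store brands (1, 2) versus national brands (3, 4, 5)
coeffs = [1/2, 1/2, -1/3, -1/3, -1/3]
means = [14.5, 13.8, 13.3, 14.3, 13.1]
lo, hi = contrast_ci(coeffs, means, mse=0.088, J=9, t_crit=2.021)  # t_{.025,40} = 2.021
print(round(lo, 3), round(hi, 3))  # 0.401 0.766
```

The upper limit prints as .766 rather than .765. This is because the code does not round $\hat\theta$ and the half-width to three decimals before adding them.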

Sometimes an experiment is carried out to compare each of several "new" treatments to a control treatment. In such situations, a multiple comparisons technique called Dunnett's method is appropriate.

EXERCISES Section 10.2 (11–21)

11. An experiment to compare the spreading rates of five different brands of yellow interior latex paint available in a particular area used 4 gallons ($J = 4$) of each paint. The sample average spreading rates (ft²/gal) for the five brands were $\bar x_{1\cdot} = 462.0$, $\bar x_{2\cdot} = 512.8$, $\bar x_{3\cdot} = 437.5$, $\bar x_{4\cdot} = 469.3$, and $\bar x_{5\cdot} = 532.1$. The computed value of F was found to be significant at level $\alpha = .05$. With MSE $= 272.8$, use Tukey's procedure to investigate significant differences in the true average spreading rates between brands.

12. In Exercise 11, suppose $\bar x_{3\cdot} = 427.5$. Now which true average spreading rates differ significantly from one another? Be sure to use the method of underscoring to illustrate your conclusions, and write a paragraph summarizing your results.

13. Repeat Exercise 12 supposing that $\bar x_{2\cdot} = 502.8$ in addition to $\bar x_{3\cdot} = 427.5$.

14. Use Tukey's procedure on the data in Example 10.3 to identify differences in true average bond strengths among the five protocols.

15. Exercise 10.7 described an experiment in which 26 resistivity observations were made on each of six different concrete mixtures. The article cited there gave the following sample means: 14.18, 17.94, 18.00, 18.00, 25.74, 27.67. Apply Tukey's method with a simultaneous confidence level of 95% to identify significant differences, and describe your findings (use MSE $= 13.929$).

16. Reconsider the axial stiffness data given in Exercise 8. ANOVA output from Minitab follows:

Analysis of Variance for Stiffness
Source    DF    SS       MS    F    P
Total     34    75468

Level    N    Mean    StDev

Pooled StDev = 32.39

Tukey's pairwise comparisons
Family error rate = 0.0500
Individual error rate = 0.00693
Critical value = 4.10
Intervals for (column level mean) − (row level mean)

a. Is it plausible that the variances of the five axial stiffness index distributions are identical? Explain.
b. Use the output (without reference to our F table) to test the relevant hypotheses.
c. Use the Tukey intervals given in the output to determine which means differ, and construct the corresponding underscoring pattern.

17. Refer to Exercise 5. Compute a 95% t CI for $\theta = \frac{1}{2}(\mu_1 + \mu_2) - \mu_3$.

18. Consider the accompanying data on plant growth after the application of five different types of growth hormone.
a. Perform an F test at level $\alpha = .05$.
b. What happens when Tukey's procedure is applied?

19. Consider a single-factor ANOVA experiment in which $I = 3$, $J = 5$, $\bar x_{1\cdot} = 10$, $\bar x_{2\cdot} = 12$, and $\bar x_{3\cdot} = 20$. Find a value of SSE for which $f > F_{.05,2,12}$, so that $H_0: \mu_1 = \mu_2 = \mu_3$ is rejected, yet when Tukey's procedure is applied none of the $\mu_i$'s can be said to differ significantly from one another.

20. Refer to Exercise 19 and suppose $\bar x_{1\cdot} = 10$, $\bar x_{2\cdot} = 15$, and $\bar x_{3\cdot} = 20$. Can you now find a value of SSE that produces such a contradiction between the F test and Tukey's procedure?

21. The article "The Effect of Enzyme Inducing Agents on the Survival Times of Rats Exposed to Lethal Levels of Nitrogen Dioxide" (Toxicology and Applied Pharmacology, 1978: 169–174) reports the following data on survival times for rats exposed to nitrogen dioxide (70 ppm) via different injection regimens. There were $J = 14$ rats in each group.

x (min)
6. p-Aminobenzoic Acid

a. Test the null hypothesis that true average survival time does not depend on an injection regimen against the alternative that there is some dependence on an injection regimen using $\alpha = .01$.
b. Suppose that $100(1 - \alpha)\%$ CIs for $k$ different parametric functions are computed from the same ANOVA data set. Then it is easily verified that the simultaneous confidence level is at least $100(1 - k\alpha)\%$. Compute CIs with a simultaneous confidence level of at least 98% for
$\mu_2 - \frac{1}{5}(\mu$
