Chapter 11 - Analysis of Variance

  Statistics for Managers Using Microsoft® Excel 5th Edition

Learning Objectives

  In this chapter, you learn:

   The basic concepts of experimental design

  

   How to use the one-way analysis of variance to test for differences among the means of several groups

   How to use the two-way analysis of variance and interpret the interaction

Chapter Overview

  Analysis of Variance (ANOVA) F-test

  Tukey-Kramer test

  One-Way ANOVA

  Two-Way ANOVA

  Interaction Effects

General ANOVA Setting

  

Investigator controls one or more factors of interest

   Each factor contains two or more levels

   Levels can be numerical or categorical

   Different levels produce different groups

   Think of the groups as populations

   Observe effects on the dependent variable

   Are the groups the same?

   Experimental design: the plan used to collect the data

  Completely Randomized Design

   Experimental units (subjects) are assigned randomly to the different levels (groups)

   Subjects are assumed homogeneous

   Only one factor or independent variable

   With two or more levels (groups)

  

   Analyzed by one-factor analysis of variance (one-way ANOVA)

One-Way Analysis of Variance

  Evaluate the difference among the means of three or more groups

   Examples:

      Accident rates for 1st, 2nd, and 3rd shift

      Expected mileage for five brands of tires

Assumptions

   Populations are normally distributed

   Populations have equal variances

  

Samples are randomly and independently drawn

Hypotheses: One-Way ANOVA

   $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$

      All population means are equal

      i.e., no treatment effect (no variation in means among groups)

   $H_1$: Not all of the population means are the same

      At least one population mean is different

      i.e., there is a treatment (groups) effect

      Does not mean that all population means are different (at least one of the means is different from the others)

Hypotheses: One-Way ANOVA

  $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$

  $H_1$: Not all $\mu_j$ are the same

  All Means are the same: The Null Hypothesis is True (No Group Effect)

  [Figure: $\mu_1 = \mu_2 = \mu_3$]

Hypotheses: One-Way ANOVA

  $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$

  $H_1$: Not all $\mu_j$ are the same

  At least one mean is different: The Null Hypothesis is NOT true (Treatment Effect is present)

  [Figure: $\mu_1 = \mu_2 \neq \mu_3$ or $\mu_1 \neq \mu_2 \neq \mu_3$]

Partitioning the Variation

   Total variation can be split into two parts:

  SST = Total Sum of Squares (Total variation)

  SSA = Sum of Squares Among Groups (Among-group variation)

  SSW = Sum of Squares Within Groups (Within-group variation)

  SST = SSA + SSW

Partitioning the Variation

  SST = SSA + SSW

  Total Variation = the aggregate dispersion of the individual data values around the overall (grand) mean of all factor levels (SST)

  Among-Group Variation = dispersion between the factor sample means (SSA)

  Within-Group Variation = dispersion that exists among the data values within the particular factor levels (SSW)

Partitioning the Variation

  [Figure: Total Variation (SST) = Among-Group Variation (SSA) + Within-Group Variation (SSW)]
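  The identity SST = SSA + SSW can be checked algebraically. The derivation below is a sketch, not part of the original slides. It writes $X_{ij}$ for the i-th value in group j, $\bar{X}_j$ for the sample mean of group j, $\bar{\bar{X}}$ for the grand mean, $n_j$ for the size of group j, and c for the number of groups. The cross term vanishes because the deviations within each group sum to zero.

```latex
\begin{aligned}
X_{ij} - \bar{\bar{X}}
  &= (X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{\bar{X}}) \\[4pt]
\sum_{j=1}^{c}\sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2
  &= \sum_{j=1}^{c}\sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2
   + \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2
   + 2\sum_{j=1}^{c} (\bar{X}_j - \bar{\bar{X}})
     \underbrace{\sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)}_{=\,0} \\[4pt]
SST &= SSW + SSA
\end{aligned}
```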

The Total Sum of Squares

  SST = SSA + SSW

  $$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$$

  Where:

    SST = Total sum of squares

    c = number of groups

    $n_j$ = number of values in group j

    $X_{ij}$ = i-th value from group j

    $\bar{\bar{X}}$ = grand mean (mean of all data values)

The Total Sum of Squares

  $$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$$

  $$SST = (X_{11} - \bar{\bar{X}})^2 + (X_{12} - \bar{\bar{X}})^2 + \cdots + (X_{n_c c} - \bar{\bar{X}})^2$$

Among-Group Variation

  SST = SSA + SSW

  $$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$$

  Where:

    SSA = Sum of squares among groups

    c = number of groups

    $n_j$ = sample size from group j

    $\bar{X}_j$ = sample mean from group j

    $\bar{\bar{X}}$ = grand mean (mean of all data values)

Among-Group Variation

  $$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$$

  $$SSA = n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + n_c (\bar{X}_c - \bar{\bar{X}})^2$$

  [Figure: populations with means $\mu_1$, $\mu_2$, $\mu_c$]

Within-Group Variation

  SST = SSA + SSW

  $$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$

  Where:

    SSW = Sum of squares within groups

    c = number of groups

    $n_j$ = sample size from group j

    $\bar{X}_j$ = sample mean from group j

    $X_{ij}$ = i-th value in group j

Within-Group Variation

  $$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$

  $$SSW = (X_{11} - \bar{X}_1)^2 + (X_{21} - \bar{X}_1)^2 + \cdots + (X_{n_c c} - \bar{X}_c)^2$$

Obtaining the Mean Squares

  Mean Squares Among:   $MSA = \dfrac{SSA}{c - 1}$

  Mean Squares Within:  $MSW = \dfrac{SSW}{n - c}$

  Mean Squares Total:   $MST = \dfrac{SST}{n - 1}$

One-Way ANOVA Table

  Source of Variation   df      SS                MS (Variance)   F-Ratio
  Among Groups          c - 1   SSA               MSA             F = MSA / MSW
  Within Groups         n - c   SSW               MSW
  Total                 n - 1   SST = SSA + SSW

  c = number of groups

  n = sum of the sample sizes from all groups

  df = degrees of freedom

One-Way ANOVA Test Statistic

  $H_0: \mu_1 = \mu_2 = \cdots = \mu_c$

  $H_1$: At least two population means are different

   Test statistic

     $$F = \frac{MSA}{MSW}$$

     MSA is the mean square among groups (the among-group variance)

     MSW is the mean square within groups (the within-group variance)

   Degrees of freedom

     df1 = c – 1 (c = number of groups)

     df2 = n – c (n = sum of all sample sizes)

One-Way ANOVA Test Statistic

  The F statistic is the ratio of the among variance to the within variance

     The ratio cannot be negative

     df1 = c - 1 will typically be small

     df2 = n - c will typically be large

  Decision Rule: Reject $H_0$ if $F > F_U$, otherwise do not reject $H_0$

  $\alpha = .05$

  [Figure: F distribution with critical value $F_U$; do not reject $H_0$ below $F_U$, reject $H_0$ above $F_U$]

One-Way ANOVA Example

  You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?

  Club 1   Club 2   Club 3
  254      234      200
  263      218      222
  241      235      197
  237      227      206
  251      216      204

One-Way ANOVA Example

  [Figure: scatter plot of Distance (190 to 270) by Club (1, 2, 3), marking the sample means $\bar{x}_1 = 249.2$, $\bar{x}_2 = 226.0$, $\bar{x}_3 = 205.8$ and the grand mean $\bar{\bar{x}} = 227.0$]

One-Way ANOVA Example

  $\bar{X}_1 = 249.2$    $n_1 = 5$

  $\bar{X}_2 = 226.0$    $n_2 = 5$

  $\bar{X}_3 = 205.8$    $n_3 = 5$

  $\bar{\bar{X}} = 227.0$    n = 15    c = 3

  $SSA = 5(249.2 - 227)^2 + 5(226 - 227)^2 + 5(205.8 - 227)^2 = 4716.4$

  $SSW = (254 - 249.2)^2 + (263 - 249.2)^2 + \cdots + (204 - 205.8)^2 = 1119.6$
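  The sums of squares in this example can be reproduced from the raw distances. The following is a minimal Python sketch using only the standard library. The function name `sums_of_squares` is an illustrative choice, not something defined in the chapter.

```python
# Golf-club distances from the example (five trials per club).
groups = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}


def sums_of_squares(samples):
    """Return (SSA, SSW, SST) for a list of groups of numeric values."""
    all_values = [x for g in samples for x in g]
    grand_mean = sum(all_values) / len(all_values)

    ssa = 0.0  # among-group variation
    ssw = 0.0  # within-group variation
    for g in samples:
        mean_j = sum(g) / len(g)
        ssa += len(g) * (mean_j - grand_mean) ** 2
        ssw += sum((x - mean_j) ** 2 for x in g)

    # Total variation, computed directly rather than as SSA + SSW.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    return ssa, ssw, sst


ssa, ssw, sst = sums_of_squares(list(groups.values()))
print(f"SSA = {ssa:.1f}, SSW = {ssw:.1f}, SST = {sst:.1f}")
```

  Computing SST directly from the grand mean, rather than as SSA + SSW, gives an independent check on the partition.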

  MSA = 4716.4 / (3 - 1) = 2358.2

  MSW = 1119.6 / (15 - 3) = 93.3

  $H_0: \mu_1 = \mu_2 = \mu_3$

  $H_1$: $\mu_j$ not all equal

  $\alpha = .05$

  df1 = 2    df2 = 12

  Test Statistic:

    $$F = \frac{MSA}{MSW} = \frac{2358.2}{93.3} = 25.275$$

  Critical Value: $F_U = 3.89$

  [Figure: F distribution with $\alpha = .05$ and critical value $F_U = 3.89$; the test statistic F = 25.275 falls in the rejection region]

  Decision: Reject $H_0$ at $\alpha = 0.05$

  Conclusion: There is evidence that at least one $\mu_j$ differs from the rest
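  The decision above follows mechanically from the sums of squares. This is a short Python sketch of those steps. The critical value 3.89 is taken from the example rather than computed, and the helper name `f_test` is hypothetical.

```python
# Values quoted in the worked example.
SSA, SSW = 4716.4, 1119.6
c, n = 3, 15          # number of groups, total sample size
F_U = 3.89            # critical value for alpha = .05 with df = (2, 12)


def f_test(ssa, ssw, c, n, f_crit):
    """Return (MSA, MSW, F, reject_H0) for a one-way ANOVA."""
    msa = ssa / (c - 1)      # mean square among groups
    msw = ssw / (n - c)      # mean square within groups
    f_stat = msa / msw
    return msa, msw, f_stat, f_stat > f_crit


msa, msw, f_stat, reject = f_test(SSA, SSW, c, n, F_U)
print(f"MSA = {msa:.1f}, MSW = {msw:.1f}, F = {f_stat:.3f}")
print("Reject H0" if reject else "Do not reject H0")
```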

      

One-Way ANOVA in Excel

  EXCEL: Select Tools > Data Analysis > ANOVA: single factor

  SUMMARY

  Groups   Count   Sum    Average   Variance
  Club 1   5       1246   249.2     108.2
  Club 2   5       1130   226       77.5
  Club 3   5       1029   205.8     94.2

  ANOVA

  Source of Variation   SS       df   MS       F        P-value    F crit
  Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.89
  Within Groups         1119.6   12   93.3
  Total                 5836.0   14
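  The P-value and F crit columns can be checked without statistical tables in this particular case. When the numerator degrees of freedom equal 2, the upper tail of the F distribution has the closed form $P(F > f) = \left(\frac{df_2}{df_2 + 2f}\right)^{df_2/2}$. The sketch below relies on that special case only. It is not a general F distribution routine, and the function names are illustrative.

```python
def f_upper_tail_df1_2(f_stat, df2):
    """P(F > f_stat) for an F distribution with (2, df2) degrees of freedom.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (df2 / (df2 + 2.0 * f_stat)) ** (df2 / 2.0)


def f_critical_df1_2(alpha, df2):
    """Critical value F_U with P(F > F_U) = alpha, for (2, df2) degrees of freedom.

    Obtained by solving the closed form above for f.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


p_value = f_upper_tail_df1_2(25.275, 12)
f_crit = f_critical_df1_2(0.05, 12)
print(f"P-value = {p_value:.2e}, F crit = {f_crit:.2f}")
```

  For designs with more than three groups the numerator degrees of freedom exceed 2 and this shortcut no longer applies.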

      

The Tukey-Kramer Procedure

   Tells which population means are significantly different

      e.g.: $\mu_1 = \mu_2 \neq \mu_3$

   Done after rejection of equal means in ANOVA

   Allows pair-wise comparisons

      Compare absolute mean differences with critical range

  [Figure: distributions along the x axis, with $\mu_1 = \mu_2$ at one location and $\mu_3$ at another]

Tukey-Kramer Critical Range

  $$\text{Critical Range} = Q_U \sqrt{\frac{MSW}{2} \left( \frac{1}{n_j} + \frac{1}{n_{j'}} \right)}$$

  where:

    $Q_U$ = Value from Studentized Range Distribution with c and n - c degrees of freedom for the desired level of $\alpha$ (see appendix E.9 table)

    MSW = Mean Square Within

    $n_j$ and $n_{j'}$ = Sample sizes from groups j and j'

The Tukey-Kramer Procedure

  Club 1   Club 2   Club 3
  254      234      200
  263      218      222
  241      235      197
  237      227      206
  251      216      204

  1. Compute absolute mean differences:

     $|\bar{x}_1 - \bar{x}_2| = |249.2 - 226.0| = 23.2$

     $|\bar{x}_1 - \bar{x}_3| = |249.2 - 205.8| = 43.4$

     $|\bar{x}_2 - \bar{x}_3| = |226.0 - 205.8| = 20.2$

  2. Find the $Q_U$ value from the table in appendix E.9 with c = 3 and (n – c) = (15 – 3) = 12 degrees of freedom for the desired level of $\alpha$ ($\alpha$ = .05 used here):

     $Q_U = 3.77$

The Tukey-Kramer Procedure

  3. Compute Critical Range:

     $$\text{Critical Range} = Q_U \sqrt{\frac{MSW}{2} \left( \frac{1}{n_j} + \frac{1}{n_{j'}} \right)} = 3.77 \sqrt{\frac{93.3}{2} \left( \frac{1}{5} + \frac{1}{5} \right)} = 16.285$$

  4. Compare:

     $|\bar{x}_1 - \bar{x}_2| = 23.2$

     $|\bar{x}_1 - \bar{x}_3| = 43.4$

     $|\bar{x}_2 - \bar{x}_3| = 20.2$

  5. All of the absolute mean differences are greater than critical range. Therefore there is a significant difference between each pair of means at the 5% level of significance.
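  The five steps above can be sketched in Python as follows. $Q_U = 3.77$ is the table value quoted in step 2, not something the code derives. The helper name `critical_range` is an illustrative choice.

```python
from itertools import combinations
from math import sqrt

# Sample means and sizes from the golf-club example.
means = {"Club 1": 249.2, "Club 2": 226.0, "Club 3": 205.8}
sizes = {"Club 1": 5, "Club 2": 5, "Club 3": 5}
MSW = 93.3   # mean square within, from the ANOVA table
Q_U = 3.77   # studentized range value for c = 3, n - c = 12, alpha = .05


def critical_range(q_u, msw, n_j, n_k):
    """Tukey-Kramer critical range for a pair of groups of sizes n_j and n_k."""
    return q_u * sqrt((msw / 2.0) * (1.0 / n_j + 1.0 / n_k))


results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    cr = critical_range(Q_U, MSW, sizes[a], sizes[b])
    results[(a, b)] = (diff, cr, diff > cr)
    verdict = "significant" if diff > cr else "not significant"
    print(f"{a} vs {b}: |diff| = {diff:.1f}, critical range = {cr:.3f} -> {verdict}")
```

  Because the critical range is computed per pair, the same loop also handles unequal group sizes.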

ANOVA Assumptions

   Randomness and Independence

      Select random samples from the c groups (or randomly assign the levels)

   Normality

      The sample values from each group are from a normal population

   Homogeneity of Variance

      Can be tested with Levene’s Test

ANOVA Assumptions

Levene’s Test

   Tests the assumption that the variances of each group are equal.

   First, define the null and alternative hypotheses:

      $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_c^2$

      $H_1$: Not all $\sigma_j^2$ are equal

   Second, compute the absolute value of the difference between each value and the median of its group.

   Third, perform a one-way ANOVA on these absolute differences.
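  As an illustration, the three steps can be applied to the golf-club data from the one-way example. This is a minimal Python sketch under the assumption that the resulting F statistic is compared with the same critical value used earlier ($F_U = 3.89$, with 2 and 12 degrees of freedom). The function names are hypothetical.

```python
from statistics import mean, median

groups = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]


def one_way_f(samples):
    """F statistic (MSA / MSW) of a one-way ANOVA on the given groups."""
    all_values = [x for g in samples for x in g]
    grand = mean(all_values)
    c, n = len(samples), len(all_values)
    group_means = [mean(g) for g in samples]
    ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(samples, group_means))
    ssw = sum((x - m) ** 2 for g, m in zip(samples, group_means) for x in g)
    return (ssa / (c - 1)) / (ssw / (n - c))


def levene_f(samples):
    """Levene statistic: one-way ANOVA on absolute deviations from group medians."""
    abs_dev = [[abs(x - median(g)) for x in g] for g in samples]
    return one_way_f(abs_dev)


f_levene = levene_f(groups)
F_U = 3.89  # critical value for alpha = .05 with df = (2, 12), as in the example
print(f"Levene F = {f_levene:.4f}")
print("Reject equal variances" if f_levene > F_U else "Do not reject equal variances")
```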

      Two-Way ANOVA

       Examines the effect of

       Two factors of interest on the dependent variable

       e.g., Percent carbonation and line speed on soft drink bottling process

       Interaction between the different levels of these two factors

       e.g., Does the effect of one particular carbonation level depend on the level at which the line speed is set?

    Two-Way ANOVA

       Assumptions

       Populations are normally distributed

       Populations have equal variances

       Independent random samples are selected

    Two-Way ANOVA Sources of Variation

      Two Factors of interest: A and B

        r = number of levels of factor A

        c = number of levels of factor B

        n' = number of replications for each cell

        n = total number of observations in all cells (n = rcn')

        $X_{ijk}$ = value of the kth observation of level i of factor A and level j of factor B

    Two-Way ANOVA Sources of Variation

      SST = SSA + SSB + SSAB + SSE

      Source                                                 Degrees of Freedom
      SSA    Factor A Variation                              r – 1
      SSB    Factor B Variation                              c – 1
      SSAB   Variation due to interaction between A and B    (r – 1)(c – 1)
      SSE    Random variation (Error)                        rc(n' – 1)
      SST    Total Variation                                 n – 1

    Two-Way ANOVA Equations

         

      Total Variation:

        $$SST = \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n'} (X_{ijk} - \bar{\bar{X}})^2$$

      Factor A Variation:

        $$SSA = c\,n' \sum_{i=1}^{r} (\bar{X}_{i..} - \bar{\bar{X}})^2$$

      Factor B Variation:

        $$SSB = r\,n' \sum_{j=1}^{c} (\bar{X}_{.j.} - \bar{\bar{X}})^2$$

      Two-Way ANOVA Equations

      Interaction Variation:

        $$SSAB = n' \sum_{i=1}^{r} \sum_{j=1}^{c} (\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{\bar{X}})^2$$

      Sum of Squares Error:

        $$SSE = \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n'} (X_{ijk} - \bar{X}_{ij.})^2$$

      Two-Way ANOVA Equations

        $$\bar{\bar{X}} = \frac{\sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n'} X_{ijk}}{r\,c\,n'} = \text{Grand Mean}$$

        $$\bar{X}_{i..} = \frac{\sum_{j=1}^{c} \sum_{k=1}^{n'} X_{ijk}}{c\,n'} = \text{Mean of the } i\text{th level of factor A } (i = 1, 2, \ldots, r)$$

        $$\bar{X}_{.j.} = \frac{\sum_{i=1}^{r} \sum_{k=1}^{n'} X_{ijk}}{r\,n'} = \text{Mean of the } j\text{th level of factor B } (j = 1, 2, \ldots, c)$$

        $$\bar{X}_{ij.} = \frac{\sum_{k=1}^{n'} X_{ijk}}{n'} = \text{Mean of cell } ij$$

        r = number of levels of factor A

        c = number of levels of factor B

        n' = number of replications in each cell
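      To make the two-way equations concrete, here is a small Python sketch that evaluates each sum of squares for a balanced design and checks that they add up to SST. The data values are made up purely for illustration (r = 2, c = 3, n' = 2). They do not come from the chapter, and the function name is hypothetical.

```python
# Hypothetical balanced data: data[i][j] holds the n' replications for
# level i of factor A and level j of factor B. Values are invented.
data = [
    [[4.0, 6.0], [7.0, 9.0], [10.0, 12.0]],
    [[5.0, 7.0], [6.0, 8.0], [15.0, 17.0]],
]


def two_way_sums_of_squares(data):
    """Return SST, SSA, SSB, SSAB and SSE for a balanced two-factor design."""
    r, c, n_rep = len(data), len(data[0]), len(data[0][0])
    values = [x for row in data for cell in row for x in cell]
    grand = sum(values) / (r * c * n_rep)

    # Factor-level means and cell means.
    mean_a = [sum(x for cell in data[i] for x in cell) / (c * n_rep) for i in range(r)]
    mean_b = [sum(x for i in range(r) for x in data[i][j]) / (r * n_rep) for j in range(c)]
    mean_cell = [[sum(data[i][j]) / n_rep for j in range(c)] for i in range(r)]

    sst = sum((x - grand) ** 2 for x in values)
    ssa = c * n_rep * sum((m - grand) ** 2 for m in mean_a)
    ssb = r * n_rep * sum((m - grand) ** 2 for m in mean_b)
    ssab = n_rep * sum(
        (mean_cell[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
        for i in range(r) for j in range(c)
    )
    sse = sum(
        (x - mean_cell[i][j]) ** 2
        for i in range(r) for j in range(c) for x in data[i][j]
    )
    return {"SST": sst, "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse}


ss = two_way_sums_of_squares(data)
for name, value in ss.items():
    print(f"{name} = {value:.4f}")
```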

      Two-Way ANOVA Equations

        $$MSA = \text{Mean square factor A} = \frac{SSA}{r - 1}$$

        $$MSB = \text{Mean square factor B} = \frac{SSB}{c - 1}$$

        $$MSAB = \text{Mean square interaction} = \frac{SSAB}{(r - 1)(c - 1)}$$

        $$MSE = \text{Mean square error} = \frac{SSE}{rc(n' - 1)}$$

      Two-Way ANOVA: The F Test Statistic

      F Test for Factor A Effect

        $H_0: \mu_{1..} = \mu_{2..} = \cdots = \mu_{r..}$

        $H_1$: Not all $\mu_{i..}$ are equal

        $F = \dfrac{MSA}{MSE}$    Reject $H_0$ if $F > F_U$

      F Test for Factor B Effect

        $H_0: \mu_{.1.} = \mu_{.2.} = \cdots = \mu_{.c.}$

        $H_1$: Not all $\mu_{.j.}$ are equal

        $F = \dfrac{MSB}{MSE}$    Reject $H_0$ if $F > F_U$

      F Test for Interaction Effect

        $H_0$: the interaction of A and B is equal to zero

        $H_1$: interaction of A and B isn’t zero

        $F = \dfrac{MSAB}{MSE}$    Reject $H_0$ if $F > F_U$

      Two-Way ANOVA: Summary Table

      Source of Variation   Degrees of Freedom   Sum of Squares   Mean Squares                      F Statistic
      Factor A              r – 1                SSA              MSA = SSA / (r – 1)               MSA / MSE
      Factor B              c – 1                SSB              MSB = SSB / (c – 1)               MSB / MSE
      AB (Interaction)      (r – 1)(c – 1)       SSAB             MSAB = SSAB / [(r – 1)(c – 1)]    MSAB / MSE
      Error                 rc(n' – 1)           SSE              MSE = SSE / [rc(n' – 1)]
      Total                 n – 1                SST

      Two-Way ANOVA: Features

       Degrees of freedom always add up

       /

       Total = error + factor A + factor B + interaction

      n-1 = rc(n -1) + (r-1) + (c-1) + (r-1)(c-1)

       The denominator of the F Test is always the same

       but the numerator is different The sums of squares always add up

       SST = SSE + SSA + SSB + SSAB

       Total = error + factor A + factor B + interaction
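      The degrees-of-freedom identity can be spot-checked numerically. The Python sketch below uses a hypothetical helper `two_way_df`, and the design sizes in the loop are arbitrary examples rather than values from the chapter.

```python
def two_way_df(r, c, n_rep):
    """Degrees of freedom for a balanced two-way ANOVA with n_rep replications per cell."""
    n = r * c * n_rep  # total number of observations
    return {
        "Factor A": r - 1,
        "Factor B": c - 1,
        "Interaction": (r - 1) * (c - 1),
        "Error": r * c * (n_rep - 1),
        "Total": n - 1,
    }


# Arbitrary example designs (r levels of A, c levels of B, n' replications).
checks = []
for r, c, n_rep in [(2, 3, 4), (3, 3, 2), (4, 2, 5)]:
    df = two_way_df(r, c, n_rep)
    parts = df["Factor A"] + df["Factor B"] + df["Interaction"] + df["Error"]
    checks.append(parts == df["Total"])
    print(f"r={r}, c={c}, n'={n_rep}: {parts} = {df['Total']}")
```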

       Two-Way ANOVA: Interaction

       No Significant Interaction:

          [Figure: Mean Response plotted against Factor A Levels, with one line each for Factor B Level 1, Level 2, and Level 3]

       Interaction is present:

          [Figure: Mean Response plotted against Factor A Levels, with one line each for Factor B Level 1, Level 2, and Level 3]

    Chapter Summary

      In this chapter, we have

       Described one-way analysis of variance

          The logic of ANOVA

          ANOVA assumptions

          F test for difference in c means

          The Tukey-Kramer procedure for multiple comparisons

       Described two-way analysis of variance

          Examined effects of multiple factors

          Examined interaction between factors