Distribution-Free ANOVA
15.4 Distribution-Free ANOVA
The single-factor ANOVA model of Chapter 10 for comparing $I$ population or treatment means assumed that for $i = 1, 2, \ldots, I$, a random sample of size $J_i$ was drawn from a normal population with mean $\mu_i$ and variance $\sigma^2$. This can be written as

$$X_{ij} = \mu_i + \epsilon_{ij} \qquad j = 1, \ldots, J_i;\ i = 1, \ldots, I \tag{15.10}$$

where the $\epsilon_{ij}$’s are independent and normally distributed with mean zero and variance $\sigma^2$. Although the normality assumption was required for the validity of the F test described in Chapter 10, the next procedure for testing equality of the $\mu_i$’s requires only that the $\epsilon_{ij}$’s have the same continuous distribution.
The Kruskal-Wallis Test
Let $N = \sum J_i$, the total number of observations in the data set, and suppose we rank all $N$ observations from 1 (the smallest $X_{ij}$) to $N$ (the largest $X_{ij}$). When $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_I$ is true, the $N$ observations all come from the same distribution, in which case all possible assignments of the ranks $1, 2, \ldots, N$ to the $I$ samples are equally likely and we expect ranks to be intermingled in these samples. If, however, $H_0$ is false, then some samples will consist mostly of observations having small ranks in the combined sample, whereas others will consist mostly of observations having large ranks. More specifically, if $R_{ij}$ denotes the rank of $X_{ij}$ among the $N$ observations, and $R_{i\cdot}$ and $\bar{R}_{i\cdot}$ denote, respectively, the total and average of the ranks in the $i$th sample, then when $H_0$ is true,
$$E(R_{ij}) = \frac{N+1}{2} \qquad E(\bar{R}_{i\cdot}) = \frac{1}{J_i}\sum_{j} E(R_{ij}) = \frac{N+1}{2}$$

The Kruskal-Wallis test statistic is a measure of the extent to which the $\bar{R}_{i\cdot}$’s deviate from their common expected value $(N+1)/2$.
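The common expected value $(N+1)/2$ can be checked directly from the equal-likelihood argument. As a brief sketch, assuming no ties (so the ranks are exactly the integers 1 through $N$): under $H_0$ each $R_{ij}$ is equally likely to be any of those integers, which gives

```latex
\[
E(R_{ij}) = \sum_{r=1}^{N} r \cdot \frac{1}{N}
          = \frac{1}{N}\cdot\frac{N(N+1)}{2}
          = \frac{N+1}{2},
\qquad
E(\bar{R}_{i\cdot}) = \frac{1}{J_i}\sum_{j=1}^{J_i} E(R_{ij})
                    = \frac{1}{J_i}\cdot J_i\cdot\frac{N+1}{2}
                    = \frac{N+1}{2}.
\]
```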
Test Statistic

$$K = \frac{12}{N(N+1)} \sum_{i=1}^{I} J_i\left(\bar{R}_{i\cdot} - \frac{N+1}{2}\right)^2 = \frac{12}{N(N+1)} \sum_{i=1}^{I} \frac{R_{i\cdot}^2}{J_i} - 3(N+1)$$

The second expression for $K$ is the computational formula; it involves the rank totals ($R_{i\cdot}$’s) rather than the averages and requires only one subtraction.
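To see that the two expressions for $K$ agree, it can help to evaluate both on a small set of ranks. The following is a minimal sketch, not a library routine: the function name `kruskal_wallis_k` and the nine toy ranks are illustrative assumptions, and the ranks are taken to be a tie-free assignment of 1 through $N$.

```python
def kruskal_wallis_k(rank_groups):
    """Return K computed from the defining and the computational formulas.

    rank_groups: one list of ranks per sample (hypothetical input format).
    """
    n = sum(len(g) for g in rank_groups)          # total sample size N
    scale = 12 / (n * (n + 1))
    expected = (n + 1) / 2                        # E(R-bar_i.) under H0

    # Defining formula: weighted squared deviations of the rank averages.
    defining = scale * sum(
        len(g) * (sum(g) / len(g) - expected) ** 2 for g in rank_groups
    )
    # Computational formula: rank totals and a single subtraction.
    computational = scale * sum(sum(g) ** 2 / len(g) for g in rank_groups) - 3 * (n + 1)
    return defining, computational


# Toy ranks for I = 3 samples of size 3 (illustrative only).
defining, computational = kruskal_wallis_k([[1, 3, 4], [2, 6, 8], [5, 7, 9]])
print(defining, computational)
```

Both values equal 172/45 here, up to floating-point rounding; the agreement relies on the rank totals summing to $N(N+1)/2$.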
Values of $K$ at least as contradictory to $H_0$ as the calculated $k$ are those that equal or exceed $k$. That is, the test is upper-tailed: P-value $= P_0(K \ge k)$. Under $H_0$, each possible assignment of the ranks to the $I$ samples is equally likely, so in theory all such assignments can be enumerated, the value of $K$ determined for each one, and the null distribution obtained by counting the number of times each value of $K$ occurs. Clearly, this computation is tedious, so even though there are tables of the exact null distribution and critical values for small values of the $J_i$’s, we will use the following “large-sample” approximation.
Proposition

When $H_0$ is true and either

$I = 3$, $J_i \ge 6$ $(i = 1, 2, 3)$

or

$I > 3$, $J_i \ge 5$ $(i = 1, \ldots, I)$

then $K$ has approximately a chi-squared distribution with $I - 1$ df. This implies that the approximate P-value is the area under the $\chi^2_{I-1}$ curve to the right of $k$. Appendix Table A.11 gives a tabulation of chi-squared upper-tail curve areas.
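For sample sizes too small for the proposition to apply, the enumeration described earlier is feasible by computer. The sketch below is one way to carry it out under the assumption of no ties; the helper names (`approximation_applies`, `assignments`, `exact_null_distribution`, `exact_p_value`) are hypothetical, and exact fractions are used so that equal values of $K$ are counted together.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def approximation_applies(sizes):
    """Sample-size condition stated in the proposition."""
    if len(sizes) == 3:
        return all(j >= 6 for j in sizes)
    return len(sizes) > 3 and all(j >= 5 for j in sizes)


def kw_statistic(rank_groups):
    """Computational formula for K, kept exact with Fractions."""
    n = sum(len(g) for g in rank_groups)
    total = sum(Fraction(sum(g) ** 2, len(g)) for g in rank_groups)
    return Fraction(12, n * (n + 1)) * total - 3 * (n + 1)


def assignments(ranks, sizes):
    """Yield every split of `ranks` into labeled samples of the given sizes."""
    if not sizes:
        yield []
        return
    for first in combinations(ranks, sizes[0]):
        rest = [r for r in ranks if r not in first]
        for tail in assignments(rest, sizes[1:]):
            yield [list(first)] + tail


def exact_null_distribution(sizes):
    """Map each attainable value of K to its probability under H0."""
    n = sum(sizes)
    counts = Counter(kw_statistic(a) for a in assignments(list(range(1, n + 1)), sizes))
    total = sum(counts.values())
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


def exact_p_value(k_obs, sizes):
    """Upper-tailed exact P-value P0(K >= k_obs)."""
    return sum(p for k, p in exact_null_distribution(sizes).items() if k >= k_obs)


sizes = [2, 2, 2]                       # I = 3 samples of size 2: 90 assignments
dist = exact_null_distribution(sizes)
print(approximation_applies(sizes))     # False: chi-squared approximation not advised
print(max(dist), exact_p_value(max(dist), sizes))
```

With three samples of size 2, the largest attainable value is $k = 32/7$, reached only when the samples hold the rank pairs {1, 2}, {3, 4}, {5, 6} in some order, so its exact P-value is 6/90 = 1/15. The number of assignments grows very quickly with $N$, which is why tables or the chi-squared approximation are used in practice.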
Example 15.9
The accompanying observations (Table 15.6) on axial stiffness index resulted from a study of metal-plate connected trusses in which five different plate lengths (4 in., 6 in., 8 in., 10 in., and 12 in.) were used (“Modeling Joints Made with Light-Gauge Metal Connector Plates,” Forest Products J., 1979: 39–44).
Table 15.6 Data and Ranks for Example 15.9

Data
i = 1 (4 in.):    309.2  309.7  311.0  316.8  326.5  349.8  409.5
i = 2 (6 in.):    331.0  347.2  348.9  361.0  381.7  402.1  404.5
i = 3 (8 in.):    351.0  357.1  366.2  367.3  382.0  392.4  409.9
i = 4 (10 in.):   346.7  362.6  384.2  410.6  433.1  452.9  461.4
i = 5 (12 in.):   407.4  410.7  419.9  441.2  441.8  465.8  473.4

Ranks                                      $r_{i\cdot}$   $\bar{r}_{i\cdot}$
i = 1:     1   2   3   4   5  10  24        49     7.00
i = 2:     6   8   9  13  17  21  22        96    13.71
i = 3:    11  12  15  16  18  20  25       117    16.71
i = 4:     7  14  19  26  29  32  33       160    22.86
i = 5:    23  27  28  30  31  34  35       208    29.71
The computed value of $K$ is

$$k = \frac{12}{35(36)}\left[\frac{(49)^2}{7} + \frac{(96)^2}{7} + \frac{(117)^2}{7} + \frac{(160)^2}{7} + \frac{(208)^2}{7}\right] - 3(36) = 20.12$$

Appendix Table A.11 gives upper-tail areas for the 4 df chi-squared curve: the area to the right of 14.86 is .005 and the area to the right of 18.47 is .001. Because $k = 20.12$ exceeds 18.47, the P-value for the test is smaller than .001, and thus smaller than .01. Therefore $H_0$ is rejected at significance level .01, and we conclude that expected axial stiffness does depend on plate length. ■
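The calculation in Example 15.9 can be reproduced from the raw data in Table 15.6. The sketch below is only an illustration: the dictionary keyed by plate length and the variable names are assumptions, the ranking step relies on the data having no ties, and the P-value uses the closed form for the upper-tail area of a chi-squared curve with 4 df, $e^{-k/2}(1 + k/2)$.

```python
import math

# Axial stiffness index by plate length in inches (Table 15.6).
stiffness = {
    4:  [309.2, 309.7, 311.0, 316.8, 326.5, 349.8, 409.5],
    6:  [331.0, 347.2, 348.9, 361.0, 381.7, 402.1, 404.5],
    8:  [351.0, 357.1, 366.2, 367.3, 382.0, 392.4, 409.9],
    10: [346.7, 362.6, 384.2, 410.6, 433.1, 452.9, 461.4],
    12: [407.4, 410.7, 419.9, 441.2, 441.8, 465.8, 473.4],
}

# Rank all N observations together; with no ties the ranks are 1..N.
pooled = sorted(x for sample in stiffness.values() for x in sample)
rank_of = {x: r for r, x in enumerate(pooled, start=1)}
rank_sums = {g: sum(rank_of[x] for x in sample) for g, sample in stiffness.items()}

# Computational formula for K.
n = len(pooled)
k = 12 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / len(stiffness[g]) for g in stiffness) - 3 * (n + 1)

# Chi-squared upper-tail area with I - 1 = 4 df (closed form valid for 4 df only).
p_value = math.exp(-k / 2) * (1 + k / 2)

print(rank_sums)
print(round(k, 2), p_value)
```

The rank totals come out as 49, 96, 117, 160, and 208, and $k$ rounds to 20.12; the approximate P-value is a little under .0005.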
Friedman’s Test for a Randomized Block Experiment
Suppose $X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$, where $\alpha_i$ is the $i$th treatment effect, $\beta_j$ is the $j$th block effect, and the $\epsilon_{ij}$’s are drawn independently from the same continuous (but not necessarily normal) distribution. Then to test $H_0\colon \alpha_1 = \alpha_2 = \cdots = \alpha_I = 0$, the null hypothesis of no treatment effects, the observations are first ranked separately from 1 to $I$ within each block, and then the rank average $\bar{r}_{i\cdot}$ is computed for each of the $I$ treatments. When $H_0$ is true, the $\bar{r}_{i\cdot}$’s should be close to one another, since within each block all $I!$ assignments of ranks to treatments are equally likely. Friedman’s test statistic measures the discrepancy between the expected value $(I+1)/2$ of each rank average and the $\bar{r}_{i\cdot}$’s.
Test Statistic

$$F_r = \frac{12J}{I(I+1)} \sum_{i=1}^{I}\left(\bar{R}_{i\cdot} - \frac{I+1}{2}\right)^2 = \frac{12}{IJ(I+1)} \sum_{i=1}^{I} R_{i\cdot}^2 - 3J(I+1)$$
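The within-block ranking and both forms of $F_r$ can be sketched in a few lines. This is an illustrative implementation under the assumption of no ties within any block; the function name `friedman_statistic` and the small data set (4 blocks, 3 treatments) are made up for the demonstration.

```python
def friedman_statistic(blocks):
    """blocks: J lists, each holding the I treatment responses for one block."""
    j = len(blocks)                    # number of blocks
    i = len(blocks[0])                 # number of treatments
    rank_sums = [0] * i
    for block in blocks:
        # Rank the I observations within this block from 1 (smallest) to I.
        order = sorted(range(i), key=lambda t: block[t])
        for rank, treatment in enumerate(order, start=1):
            rank_sums[treatment] += rank

    expected = (i + 1) / 2             # expected rank average under H0
    defining = 12 * j / (i * (i + 1)) * sum((r / j - expected) ** 2 for r in rank_sums)
    computational = 12 / (i * j * (i + 1)) * sum(r ** 2 for r in rank_sums) - 3 * j * (i + 1)
    return rank_sums, defining, computational


# Hypothetical responses: 4 blocks (rows) by 3 treatments (columns).
toy_blocks = [
    [1.2, 2.3, 3.1],
    [0.9, 1.8, 1.5],
    [2.0, 2.9, 3.5],
    [1.1, 1.0, 2.2],
]
rank_sums, defining, computational = friedman_statistic(toy_blocks)
print(rank_sums, defining, computational)
```

For these toy data the rank totals are 5, 8, and 11, and both expressions give $f_r = 4.5$.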
The test is again upper-tailed, because any value exceeding the calculated $f_r$ is even more contradictory to $H_0$ than is $f_r$ itself. For the cases $I = 3$, $J = 2, \ldots, 15$ and $I = 4$, $J = 2, \ldots, 8$, Lehmann’s book (see the chapter bibliography) gives the upper-tail critical values from which P-value information can be obtained. Alternatively, for even moderate values of $J$, the test statistic $F_r$ has approximately a chi-squared distribution with $I - 1$ df when $H_0$ is true, so the approximate P-value is the area under the $\chi^2_{I-1}$ curve to the right of $f_r$.
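When a table of chi-squared tail areas is not at hand, the upper-tail area for integer df can be computed with the standard library alone. The sketch below uses well-known closed forms (a finite sum for even df, the complementary error function plus a finite sum for odd df); the function name `chi2_upper_tail` is an assumption, and a statistics package would normally be used instead.

```python
import math


def chi2_upper_tail(x, df):
    """Area under the chi-squared curve with integer df to the right of x."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{m} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Sanity check against familiar .05 critical values (7.815 for 3 df, 9.488 for 4 df).
print(chi2_upper_tail(7.815, 3), chi2_upper_tail(9.488, 4))
```

Both printed areas should be very close to .05, which is a quick way to confirm the function before using it for a P-value with $I - 1$ df.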
Example 15.10
The article “Physiological Effects During Hypnotically Requested Emotions” (Psychosomatic Med., 1963: 334–343) reports the following data (Table 15.7) on skin potential (mV) when the emotions of fear, happiness, depression, and calmness were requested from each of eight subjects.

Table 15.7 Data and Ranks for Example 15.10

                          Blocks (Subjects)
$x_{ij}$         1     2     3     4     5     6     7     8
Fear
Happiness     22.7  53.2   9.7  19.6  13.8  47.1  13.6  23.6
Depression    22.5  53.7  10.8  21.1  13.7  39.2  13.7  16.3
Calmness      22.6  53.1   8.3  21.6  13.3  37.0  14.8  14.8

Ranks          1   2   3   4   5   6   7   8    $r_{i\cdot}$   $r_{i\cdot}^2$
Fear           4   4   3   4   1   4   4   3     27    729
Happiness      3   2   2   1   4   3   1   4     20    400
Depression     1   3   4   2   3   2   2   2     19    361
Calmness       2   1   1   3   2   1   3   1     14    196
                                                      1686
$$f_r = \frac{12}{4(8)(5)}(1686) - 3(8)(5) = 6.45$$
The $\nu = 3$ column of Appendix Table A.11 shows that P-value ≈ .09. Since this exceeds .05, $H_0$ cannot be rejected at that significance level. There is no evidence that average skin potential depends on which emotion is requested. ■
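The value of $f_r$ in Example 15.10 can be checked from the rank rows alone. In the sketch below the Happiness, Depression, and Calmness ranks are copied from Table 15.7, while the Fear ranks are an inference rather than a transcription: with no ties, the four ranks within each subject must add to 1 + 2 + 3 + 4 = 10. The P-value uses the closed form for 3 df, $\operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$.

```python
import math

# Within-subject ranks from Table 15.7 (one entry per subject).
happiness  = [3, 2, 2, 1, 4, 3, 1, 4]
depression = [1, 3, 4, 2, 3, 2, 2, 2]
calmness   = [2, 1, 1, 3, 2, 1, 3, 1]

# Assumption: Fear ranks are inferred, since each block's ranks sum to 10.
fear = [10 - (h + d + c) for h, d, c in zip(happiness, depression, calmness)]

rank_sums = [sum(r) for r in (fear, happiness, depression, calmness)]
i, j = 4, 8  # I treatments (emotions), J blocks (subjects)

# Computational formula for Friedman's statistic.
f_r = 12 / (i * j * (i + 1)) * sum(r ** 2 for r in rank_sums) - 3 * j * (i + 1)

# Chi-squared upper-tail area with I - 1 = 3 df (closed form valid for 3 df only).
p_value = math.erfc(math.sqrt(f_r / 2)) + math.sqrt(2 * f_r / math.pi) * math.exp(-f_r / 2)

print(rank_sums, round(f_r, 2), round(p_value, 3))
```

The rank totals are 27, 20, 19, and 14, whose squares sum to 1686, so $f_r = 6.45$ and the approximate P-value is about .09, in line with the example.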
The book by Myles Hollander et al. (see the chapter bibliography) discusses multiple comparisons procedures associated with the Kruskal-Wallis and Friedman tests, as well as other aspects of distribution-free ANOVA.
Exercises Section 15.4 (23–27)

23. The accompanying data refers to concentration of the radioactive isotope strontium-90 in milk samples obtained from five randomly selected dairies in each of four different regions.

Region

Test at level .10 to see whether true average strontium-90 concentration differs for at least two of the regions.

24. The article “Production of Gaseous Nitrogen in Human Steady-State Conditions” (J. of Applied Physiology, 1972: 155–159) reports the following observations on the amount of nitrogen expired (in liters) under four dietary regimens: (1) fasting, (2) 23% protein, (3) 32% protein, and (4) 67% protein. Use the Kruskal-Wallis test at level .05 to test equality of the corresponding $\mu_i$’s.

25. The accompanying data on cortisol level was reported in the article “Cortisol, Cortisone, and 11-Deoxycortisol Levels in Human Umbilical and Maternal Plasma in Relation to the Onset of Labor” (J. of Obstetric Gynaecology of the British Commonwealth, 1974: 737–745). Experimental subjects were pregnant women whose babies were delivered between 38 and 42 weeks gestation. Group 1 individuals elected to deliver by Caesarean section before labor onset, group 2 delivered by emergency Caesarean during induced labor, and group 3 individuals experienced spontaneous labor. Use the Kruskal-Wallis test at level .05 to test for equality of the three population means.

Group 1    262  307  211  323  454  339

26. In a test to determine whether soil pretreated with small amounts of Basic-H makes the soil more permeable to water, soil samples were divided into blocks, and each block received each of the four treatments under study. The treatments were (A) water with .001% Basic-H flooded on control soil, (B) water without Basic-H on control soil, (C) water with Basic-H flooded on soil pretreated with Basic-H, and (D) water without Basic-H on soil pretreated with Basic-H. Test at level .01 to see whether there are any effects due to the different treatments.

A    25.3  23.7  24.4  21.7  26.2
B    19.3  17.3  17.0  16.7  18.3
C    48.8  47.8  40.2  44.0  46.4
D    37.1  37.5  39.6  35.1  36.5

27. In an experiment to study the way in which different anesthetics affect plasma epinephrine concentration, ten dogs were selected and concentration was measured while they were under the influence of the anesthetics isoflurane, halothane, and cyclopropane (“Sympathoadrenal and Hemodynamic Effects of Isoflurane, Halothane, and Cyclopropane in Dogs,” Anesthesiology, 1974: 465–470). Test at level .05 to see whether there is an anesthetic effect on concentration.

Isoflurane
Halothane
Cyclopropane    1.07  1.35

Isoflurane
Halothane
Cyclopropane    1.53  .49  .56  1.02  .30
Supplementary Exercises (28–36)

28. The article “Effects of a Rice-Rich Versus Potato-Rich Diet on Glucose, Lipoprotein, and Cholesterol Metabolism in Noninsulin-Dependent Diabetics” (Amer. J. of Clinical Nutr., 1984: 598–606) gives the accompanying data on cholesterol-synthesis rate for eight diabetic subjects. Subjects were fed a standardized diet with potato or rice as the major carbohydrate source. Participants received both diets for specified periods of time, with cholesterol-synthesis rate (mmol/day) measured at the end of each dietary period. The analysis presented in this article used a distribution-free test. Use such a test with significance level .05 to determine whether the true mean cholesterol-synthesis rate differs significantly for the two sources of carbohydrates.

Cholesterol-Synthesis Rate
Subject     1     2     3     4     5     6     7     8
Potato    1.88  2.60  1.38  4.41  1.87  2.89  3.96  2.31
Rice

29. High-pressure sales tactics of door-to-door salespeople can be quite offensive. Many people succumb to such tactics, sign a purchase agreement, and later regret their actions. In the mid-1970s, the Federal Trade Commission implemented regulations clarifying and extending the rights of purchasers to cancel such agreements. The accompanying data is a subset of that given in the article “Evaluating the FTC Cooling-Off Rule” (J. of Consumer Affairs, 1977: 101–106). Individual observations are cancellation rates for each of nine salespeople during each of 4 years. Use an appropriate test at level .05 to see whether true average cancellation rate depends on the year.

Salesperson

30. The given data on phosphorus concentration in topsoil for four different soil treatments appeared in the article “Fertilisers for Lotus and Clover Establishment on a Sequence of Acid Soils on the East Otago Uplands” (N. Zeal. J. of Exptl. Ag., 1984: 119–129). Use a distribution-free procedure to test the null hypothesis of no difference in true mean phosphorus concentration (mg/g) for the four soil treatments.

Treatment
I      8.1   5.9   7.0   8.0   9.0
II    11.5  10.9  12.1  10.3  11.9
III   15.3  17.4  16.4  15.8  16.0
IV    23.0  33.0  28.4  24.6  27.7

31. Refer to the data of Exercise 30 and compute a 95% CI for the difference between true average concentrations for treatments II and III.

32. The study reported in “Gait Patterns During Free Choice Ladder Ascents” (Human Movement Sci., 1983: 187–195) was motivated by publicity concerning the increased accident rate for individuals climbing ladders. A number of different gait patterns were used by subjects climbing a portable straight ladder according to specified instructions. The ascent times for seven subjects who used a lateral gait and six subjects who used a four-beat diagonal gait are given.

Lateral
Diagonal    1.27  1.82  1.66  .85  1.45  1.24

a. Carry out a test using $\alpha = .05$ to see whether the data suggests any difference in the true average ascent times for the two gaits.
b. Compute a 95% CI for the difference between the true average gait times.

33. The sign test is a very simple procedure for testing hypotheses about a population median assuming only that the underlying distribution is continuous. To illustrate, consider the following sample of 20 observations on component lifetime (hr):

We wish to test $H_0\colon \tilde{\mu} = 25.0$ versus $H_a\colon \tilde{\mu} > 25.0$. The test statistic is $Y$ = the number of observations that exceed 25.

a. Determine the P-value of the test when $Y = 15$. [Hint: Think of a “success” as a lifetime that exceeds 25.0. Then $Y$ is the number of successes in the sample. What kind of a distribution does $Y$ have when $\tilde{\mu} = 25.0$?]
b. For the given data, should $H_0$ be rejected at significance level .05?
[Note: The test statistic is the number of differences $X_i - 25$ that have positive signs, hence the name sign test.]

34. Refer to Exercise 33, and consider a confidence interval associated with the sign test: the sign interval. The relevant hypotheses are now $H_0\colon \tilde{\mu} = \tilde{\mu}_0$ versus $H_a\colon \tilde{\mu} \ne \tilde{\mu}_0$.

a. Suppose we decide to reject $H_0$ if either $Y \ge 15$ or $Y \le 5$. What is the smallest $\alpha$ for which this is equivalent to rejecting $H_0$ if P-value $\le \alpha$?
b. The confidence interval will consist of all values $\tilde{\mu}_0$ for which $H_0$ is not rejected. Determine the CI for the given data, and state the confidence level.

35. Suppose we wish to test

$H_0$: the X and Y distributions are identical
versus
$H_a$: the X distribution is less spread out than the Y distribution

The accompanying figure pictures X and Y distributions for which $H_a$ is true. The Wilcoxon rank-sum test is not appropriate in this situation because when $H_a$ is true as pictured, the Y’s will tend to be at the extreme ends of the combined sample (resulting in small and large Y ranks), so the sum of X ranks will result in a $W$ value that is neither large nor small.

[Figure: X distribution and Y distribution, with the modified “Ranks” marked along the axis]

Consider modifying the procedure for assigning ranks as follows: After the combined sample of $m + n$ observations is ordered, the smallest observation is given rank 1, the largest observation is given rank 2, the second smallest is given rank 3, the second largest is given rank 4, and so on. Then if $H_a$ is true as pictured, the X values will tend to be in the middle of the sample and thus receive large ranks. Let $W'$ denote the sum of the X ranks and consider an upper-tailed test based on this test statistic. When $H_0$ is true, every possible set of X ranks has the same probability, so $W'$ has the same distribution as does $W$ when $H_0$ is true. The accompanying data refers to medial muscle thickness for arterioles from the lungs of children who died from sudden infant death syndrome (x’s) and a control group of children (y’s). Carry out the test of $H_0$ versus $H_a$ at level .05.

SIDS       4.0  4.4  4.8  4.9
Control    3.7  4.1  4.3  5.1  5.6

Consult the Lehmann book (in the chapter bibliography) for more information on this test, called the Siegel-Tukey test.

36. The ranking procedure described in Exercise 35 is somewhat asymmetric, because the smallest observation receives rank 1, whereas the largest receives rank 2, and so on. Suppose both the smallest and the largest receive rank 1, the second smallest and second largest receive rank 2, and so on, and let $W''$ be the sum of the X ranks. The null distribution of $W''$ is not identical to the null distribution of $W$, so different tables are needed. Consider the case $m = 3$, $n = 4$. List all 35 possible orderings of the three X values among the seven observations (e.g., 1, 3, 7 or 4, 5, 6), assign ranks in the manner described, compute the value of $W''$ for each possibility, and then tabulate the null distribution of $W''$. What is the P-value if $w'' = 9$? This is the Ansari-Bradley test; for additional information, see the book by Hollander and Wolfe in the chapter bibliography.
Bibliography

Hollander, Myles, Douglas Wolfe, and Eric Chicken, Nonparametric Statistical Methods (3rd ed.), Wiley, New York, 2013. A very good reference on distribution-free methods with an excellent collection of tables.

Lehmann, Erich, Nonparametrics: Statistical Methods Based on Ranks, Springer, New York, 2006. An excellent discussion of the most important distribution-free methods, presented with a great deal of insightful commentary.
Quality Control Methods