Wilcoxon Rank-Sum Test

16.3 Wilcoxon Rank-Sum Test

  As we indicated earlier, the nonparametric procedure is generally an appropriate alternative to the normal theory test when the normality assumption does not hold. When we are interested in testing equality of means of two continuous distributions that are obviously nonnormal, and samples are independent (i.e., there is no pairing of observations), the Wilcoxon rank-sum test or Wilcoxon two-sample test is an appropriate alternative to the two-sample t-test described in Chapter 10.

  We shall test the null hypothesis $H_0$ that $\tilde{\mu}_1 = \tilde{\mu}_2$ against some suitable alternative. First we select a random sample from each of the populations. Let $n_1$ be the number of observations in the smaller sample, and $n_2$ the number of observations in the larger sample. When the samples are of equal size, $n_1$ and $n_2$ may be randomly assigned. Arrange the $n_1 + n_2$ observations of the combined samples in ascending order and substitute a rank of $1, 2, \ldots, n_1 + n_2$ for each observation. In the case of ties (identical observations), we replace the observations by the mean of the ranks that the observations would have if they were distinguishable. For example, if the seventh and eighth observations were identical, we would assign a rank of 7.5 to each of the two observations.
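  The ranking rule just described, including the treatment of ties, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `midranks` and the sample data are assumptions made for this example and do not come from the text.

```python
def midranks(values):
    """Return the rank of each value; tied values share the mean of their ranks."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend [start, end) over every observation tied with order[start].
        end = start + 1
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        # The 1-based positions start+1, ..., end share their average rank.
        shared = (start + 1 + end) / 2
        for k in range(start, end):
            ranks[order[k]] = shared
        start = end
    return ranks


# Hypothetical data in which the 7th and 8th ordered observations are identical.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.2, 7.2, 9.0]
print(midranks(sample))  # the two tied observations each receive rank 7.5
```

  Ranks are returned in the order of the input, so the ranks belonging to each sample can be summed directly once the two samples have been pooled.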

  The sum of the ranks corresponding to the $n_1$ observations in the smaller sample is denoted by $w_1$. Similarly, the value $w_2$ represents the sum of the $n_2$ ranks corresponding to the larger sample. The total $w_1 + w_2$ depends only on the number of observations in the two samples and is in no way affected by the results of the experiment. Hence, if $n_1 = 3$ and $n_2 = 4$, then $w_1 + w_2 = 1 + 2 + \cdots + 7 = 28$, regardless of the numerical values of the observations. In general,

  $$w_1 + w_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2},$$

  the arithmetic sum of the integers $1, 2, \ldots, n_1 + n_2$. Once we have determined $w_1$, it may be easier to find $w_2$ by the formula

  $$w_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} - w_1.$$

  In choosing repeated samples of sizes $n_1$ and $n_2$, we would expect $w_1$, and therefore $w_2$, to vary. Thus, we may think of $w_1$ and $w_2$ as values of the random variables $W_1$ and $W_2$, respectively. The null hypothesis $\tilde{\mu}_1 = \tilde{\mu}_2$ will be rejected in favor of the alternative $\tilde{\mu}_1 < \tilde{\mu}_2$ only if $w_1$ is small and $w_2$ is large. Likewise, the alternative $\tilde{\mu}_1 > \tilde{\mu}_2$ can be accepted only if $w_1$ is large and $w_2$ is small. For a two-tailed test, we may reject $H_0$ in favor of $H_1$ if $w_1$ is small and $w_2$ is large or if $w_1$ is large and $w_2$ is small. In other words, the alternative $\tilde{\mu}_1 < \tilde{\mu}_2$ is accepted if $w_1$ is sufficiently small; the alternative $\tilde{\mu}_1 > \tilde{\mu}_2$ is accepted if $w_2$ is sufficiently small; and the alternative $\tilde{\mu}_1 \neq \tilde{\mu}_2$ is accepted if the minimum of $w_1$ and $w_2$ is sufficiently small. In actual practice, we usually base our decision on the value

  $$u_1 = w_1 - \frac{n_1(n_1 + 1)}{2} \quad \text{or} \quad u_2 = w_2 - \frac{n_2(n_2 + 1)}{2}$$

  of the related statistic $U_1$ or $U_2$ or on the value $u$ of the statistic $U$, the minimum of $U_1$ and $U_2$. These statistics simplify the construction of tables of critical values, since both $U_1$ and $U_2$ have symmetric sampling distributions and assume values in the interval from 0 to $n_1 n_2$ such that $u_1 + u_2 = n_1 n_2$.

  Chapter 16 Nonparametric Statistics

  From the formulas for $u_1$ and $u_2$ we see that $u_1$ will be small when $w_1$ is small and $u_2$ will be small when $w_2$ is small. Consequently, the null hypothesis will be rejected whenever the appropriate statistic $U_1$, $U_2$, or $U$ assumes a value less than or equal to the desired critical value given in Table A.17. The various test procedures are summarized in Table 16.4.

  Table 16.4: Rank-Sum Test

  $H_0$                                $H_1$                                   Compute
  $\tilde{\mu}_1 = \tilde{\mu}_2$      $\tilde{\mu}_1 < \tilde{\mu}_2$         $u_1$
                                       $\tilde{\mu}_1 > \tilde{\mu}_2$         $u_2$
                                       $\tilde{\mu}_1 \neq \tilde{\mu}_2$      $u$
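  As a rough sketch of how Table 16.4 might be applied in practice, the following Python fragment computes $u_1$ and $u_2$ from a rank sum $w_1$ and returns the statistic that matches the chosen alternative. The function names and the string labels "less", "greater", and "two-sided" are assumptions introduced for this illustration only; it uses $u_1 = w_1 - n_1(n_1+1)/2$ and $u_2 = w_2 - n_2(n_2+1)/2$.

```python
def u_statistics(w1, n1, n2):
    """Return (u1, u2) given the rank sum w1 of the sample of size n1."""
    total = (n1 + n2) * (n1 + n2 + 1) / 2   # sum of the ranks 1, ..., n1 + n2
    w2 = total - w1                         # rank sum of the other sample
    u1 = w1 - n1 * (n1 + 1) / 2
    u2 = w2 - n2 * (n2 + 1) / 2
    return u1, u2


def select_statistic(w1, n1, n2, alternative):
    """Pick the statistic listed in Table 16.4 for the given alternative."""
    u1, u2 = u_statistics(w1, n1, n2)
    if alternative == "less":        # H1: median 1 < median 2
        return u1
    if alternative == "greater":     # H1: median 1 > median 2
        return u2
    if alternative == "two-sided":   # H1: median 1 != median 2
        return min(u1, u2)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Illustration with n1 = 3, n2 = 5 and w1 = 8.
u1, u2 = u_statistics(8, 3, 5)
print(u1, u2, u1 + u2)  # 2.0 13.0 15.0
```

  The last printed value equals $n_1 n_2 = 15$, in agreement with the identity $u_1 + u_2 = n_1 n_2$. The selected statistic is then compared with the critical value from Table A.17, which is not reproduced here.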

  Table A.17 gives critical values of $U_1$ and $U_2$ for levels of significance equal to 0.001, 0.01, 0.025, and 0.05 for a one-tailed test, and critical values of $U$ for levels of significance equal to 0.002, 0.02, 0.05, and 0.10 for a two-tailed test. If the observed value of $u_1$, $u_2$, or $u$ is less than or equal to the tabled critical value, the null hypothesis is rejected at the level of significance indicated by the table. Suppose, for example, that we wish to test the null hypothesis that $\tilde{\mu}_1 = \tilde{\mu}_2$ against the one-sided alternative that $\tilde{\mu}_1 < \tilde{\mu}_2$ at the 0.05 level of significance for random samples of sizes $n_1 = 3$ and $n_2 = 5$ that yield the value $w_1 = 8$. It follows that

  $$u_1 = 8 - \frac{(3)(4)}{2} = 2.$$

  Our one-tailed test is based on the statistic $U_1$. Using Table A.17, we reject the null hypothesis of equal means when $u_1 \leq 1$. Since $u_1 = 2$ does not fall in the rejection region, the null hypothesis cannot be rejected.
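  The critical value $u_1 \leq 1$ quoted above can be checked by brute force. The sketch below assumes that, under the null hypothesis and with no ties, every choice of $n_1$ ranks out of $1, 2, \ldots, n_1 + n_2$ is equally likely for the first sample; it enumerates all such choices for $n_1 = 3$ and $n_2 = 5$. The function names are illustrative, and the code checks only this one table entry, not Table A.17 as a whole.

```python
from itertools import combinations


def u1_null_distribution(n1, n2):
    """Count how many rank assignments give each value of U1 (no ties)."""
    counts = {}
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        u1 = sum(ranks) - n1 * (n1 + 1) // 2
        counts[u1] = counts.get(u1, 0) + 1
    return counts


def lower_tail_probability(u, n1, n2):
    """P(U1 <= u) when all rank assignments are equally likely."""
    counts = u1_null_distribution(n1, n2)
    total = sum(counts.values())
    return sum(c for value, c in counts.items() if value <= u) / total


print(lower_tail_probability(1, 3, 5))  # 2/56, about 0.036
print(lower_tail_probability(2, 3, 5))  # 4/56, about 0.071
```

  Since $P(U_1 \leq 1)$ is below 0.05 while $P(U_1 \leq 2)$ is above it, the largest value that keeps the one-tailed level at or under 0.05 is 1, which agrees with the critical region used in the text.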

  Example 16.5: The nicotine content of two brands of cigarettes, measured in milligrams, was found to be as follows:

  Brand A   2.1   4.0   6.3   5.4   4.8   3.7   6.1   3.3
  Brand B   4.1   0.6   3.1   2.5   4.0   6.2   1.6   2.2   1.9   5.4

  Test the hypothesis, at the 0.05 level of significance, that the median nicotine contents of the two brands are equal against the alternative that they are unequal.

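  1. $H_0$: $\tilde{\mu}_1 = \tilde{\mu}_2$.

  2. $H_1$: $\tilde{\mu}_1 \neq \tilde{\mu}_2$.

  3. $\alpha = 0.05$.

  4. Critical region: $u \leq 17$ (from Table A.17).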

  5. Computations: The observations are arranged in ascending order and ranks from 1 to 18 assigned.


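  Original Data   Ranks     Original Data   Ranks
  0.6             1         4.0             10.5*
  1.6             2         4.0             10.5
  1.9             3         4.1             12
  2.1             4*        4.8             13*
  2.2             5         5.4             14.5*
  2.5             6         5.4             14.5
  3.1             7         6.1             16*
  3.3             8*        6.2             17
  3.7             9*        6.3             18*

  The ranks marked with an asterisk belong to sample A. Now

  $$w_1 = 4 + 8 + 9 + 10.5 + 13 + 14.5 + 16 + 18 = 93$$

  and

  $$w_2 = \frac{(18)(19)}{2} - 93 = 78.$$

  Therefore,

  $$u_1 = 93 - \frac{(8)(9)}{2} = 57, \qquad u_2 = 78 - \frac{(10)(11)}{2} = 23,$$

  so that $u = 23$.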

  6. Decision: Do not reject the null hypothesis $H_0$ and conclude that there is no significant difference in the median nicotine contents of the two brands of cigarettes.
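  The arithmetic of Example 16.5 can be reproduced with a short script. This is a minimal sketch rather than a library routine: the helper `midrank` is a name invented here, and it assigns to each observation the number of pooled values below it plus half of (the number of values equal to it, plus one), which gives tied observations the mean of their ranks. The critical value 17 is taken from step 4 of the example.

```python
brand_a = [2.1, 4.0, 6.3, 5.4, 4.8, 3.7, 6.1, 3.3]
brand_b = [4.1, 0.6, 3.1, 2.5, 4.0, 6.2, 1.6, 2.2, 1.9, 5.4]


def midrank(x, pooled):
    """Rank of x in the pooled sample, averaging over ties."""
    less = sum(1 for v in pooled if v < x)
    equal = sum(1 for v in pooled if v == x)
    return less + (equal + 1) / 2


pooled = brand_a + brand_b
n1, n2 = len(brand_a), len(brand_b)   # 8 and 10

w1 = sum(midrank(x, pooled) for x in brand_a)   # rank sum for Brand A
w2 = sum(midrank(x, pooled) for x in brand_b)   # rank sum for Brand B
u1 = w1 - n1 * (n1 + 1) / 2
u2 = w2 - n2 * (n2 + 1) / 2
u = min(u1, u2)

critical_value = 17   # two-tailed, 0.05 level, as quoted from Table A.17
print(w1, w2, u1, u2, u)  # 93.0 78.0 57.0 23.0 23.0
print("reject H0" if u <= critical_value else "do not reject H0")
```

  Because $u = 23$ exceeds 17, the script reaches the same decision as step 6.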

  Normal Theory Approximation for Two Samples

  When both $n_1$ and $n_2$ exceed 8, the sampling distribution of $U_1$ (or $U_2$) approaches the normal distribution with mean and variance given by

  $$\mu_{U_1} = \frac{n_1 n_2}{2} \quad \text{and} \quad \sigma^2_{U_1} = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}.$$

  Consequently, when $n_2$ is greater than 20, the maximum value in Table A.17, and $n_1$ is at least 9, we can use the statistic

  $$Z = \frac{U_1 - \mu_{U_1}}{\sigma_{U_1}}$$

  for our test, with the critical region falling in either or both tails of the standard normal distribution, depending on the form of $H_1$.
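  A hedged sketch of the normal approximation follows. It assumes the usual null mean $n_1 n_2 / 2$ and variance $n_1 n_2 (n_1 + n_2 + 1)/12$ for $U_1$, applies no correction for ties or continuity, and uses made-up values of $n_1$, $n_2$, and $u_1$ purely for illustration. The function names are likewise hypothetical.

```python
import math


def normal_approx_z(u1, n1, n2):
    """Standardize u1 using the null mean and variance of U1."""
    mean = n1 * n2 / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    return (u1 - mean) / math.sqrt(variance)


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical sample sizes and statistic (not taken from the text).
n1, n2, u1 = 10, 25, 80
z = normal_approx_z(u1, n1, n2)
p_lower = standard_normal_cdf(z)   # lower-tail P-value for H1: median 1 < median 2
print(round(z, 3), round(p_lower, 3))
```

  For the alternative $\tilde{\mu}_1 > \tilde{\mu}_2$ the same calculation would be applied to $u_2$, and for a two-tailed test the smaller tail probability would be doubled.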

  The use of the Wilcoxon rank-sum test is not restricted to nonnormal populations. It can be used in place of the two-sample t-test when the populations are normal, although the power will be smaller. The Wilcoxon rank-sum test is always superior to the t-test for decidedly nonnormal populations.
