The Wilcoxon Signed-Rank Test
15.1 The Wilcoxon Signed-Rank Test
A research chemist performed a particular chemical experiment a total of ten times under identical conditions, obtaining the following ordered values of reaction temperature:
The distribution of reaction temperature is of course continuous. Suppose the investigator is willing to assume that the reaction temperature distribution is symmetric; that is, there is a point of symmetry such that the density curve to the left of that point is the mirror image of the density curve to its right. This point of symmetry is the median $\tilde{\mu}$ of the distribution (and is also the mean value $\mu$, provided that the mean is finite). The assumption of symmetry may at first seem quite bold, but remember that any normal distribution is symmetric, so symmetry is actually a weaker assumption than normality.
Let’s now consider testing $H_0: \tilde{\mu} = 0$ versus $H_a: \tilde{\mu} > 0$. The null hypothesis can be interpreted as saying that a temperature of any particular magnitude, for example, 1.50, is no more likely to be positive (+1.50) than it is to be negative (−1.50). A glance at the data suggests that this hypothesis is not very tenable; for example, the sample median is 1.66, which is far larger than the magnitude of any of the three negative observations.
Figure 15.1 shows two different symmetric pdf’s, one for which $H_0$ is true and one for which $H_a$ is true. When $H_0$ is true, we expect the magnitudes of the negative observations in the sample to be comparable to the magnitudes of the positive observations. If, however, $H_0$ is “grossly” untrue as in Figure 15.1(b), then observations of large absolute magnitude will tend to be positive rather than negative.
Figure 15.1 Distributions for which (a) $\tilde{\mu} = 0$; (b) $\tilde{\mu} \gg 0$
For the sample of ten reaction temperatures, let’s for the moment disregard the signs of the observations and rank the absolute magnitudes from 1 to 10, with the smallest getting rank 1, the second smallest rank 2, and so on. Then apply the sign of each observation to the corresponding rank to obtain signed ranks. Typically some signed ranks will be negative (e.g., −3), whereas others will be positive (e.g., 8). The test statistic will be $S_+$ = the sum of the positively signed ranks.
Signed Rank: −1, −2, −3, 4, 5, 6, 7, 8, 9, 10
$s_+ = 4 + 5 + 6 + 7 + 8 + 9 + 10 = 49$
Chapter 15 Distribution-Free Procedures
When the median of the distribution is much greater than 0, most of the observations with large absolute magnitudes should be positive, resulting in positively signed ranks and a large value of $s_+$. On the other hand, if the median is 0, magnitudes of positively signed observations should be intermingled with those of negatively signed observations, in which case $s_+$ will not be very large. So intuitively, a larger $s_+$ value provides more evidence against $H_0$ than does a smaller value. This implies that the test is upper-tailed: The P-value will be $P_0(S_+ \ge s_+)$, where $P_0$ represents the probability calculated assuming that $H_0$ is true. Thus we must determine the distribution of $S_+$ when the null hypothesis is true, that is, its null distribution.
Consider $n = 5$, in which case there are $2^5 = 32$ ways of applying signs to the five ranks 1, 2, 3, 4, and 5 (each rank could have a − sign or a + sign). The key point is that when $H_0$ is true, any collection of five signed ranks has the same chance as does any other collection. That is, the smallest observation in absolute magnitude is equally likely to be positive or negative, the same is true of the second smallest observation in absolute magnitude, and so on. Thus the collection −1, 2, 3, −4, 5 of signed ranks is just as likely as the collection 1, 2, 3, 4, −5, and just as likely as any one of the other 30 possibilities.
Table 15.1 lists the 32 possible signed-rank sequences when $n = 5$, along with the value $s_+$ for each sequence. This immediately gives the null distribution of $S_+$ displayed in Table 15.2. For example, Table 15.1 shows that three of the 32 possible sequences have $s_+ = 8$, so $P_0(S_+ = 8) = 1/32 + 1/32 + 1/32 = 3/32$. Notice that the null distribution is symmetric about 7.5 [more generally, symmetrically distributed over the possible values $0, 1, 2, \ldots, n(n+1)/2$]. This symmetry is important in relating the P-value for lower-tailed and two-tailed tests to that of an upper-tailed test.
Table 15.1 Possible Signed-Rank Sequences for $n = 5$
Sequence    $s_+$    Sequence    $s_+$
Table 15.2 Null Distribution of $S_+$ When $n = 5$
$s_+$       0     1     2     3     4     5     6     7
$p(s_+)$    1/32  1/32  1/32  2/32  2/32  3/32  3/32  3/32
$s_+$       8     9     10    11    12    13    14    15
$p(s_+)$    3/32  3/32  3/32  2/32  2/32  1/32  1/32  1/32
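The counting argument for $n = 5$ is small enough to check by brute force. The following Python sketch lists every way of attaching a − or + sign to the ranks 1 through $n$, treats each pattern as equally likely (as it is under $H_0$), and tallies how many patterns give each value of $s_+$. The function name `sign_pattern_counts` is our own and is not taken from the text.

```python
from itertools import product
from collections import Counter

def sign_pattern_counts(n):
    """Count, for each s, the sign patterns on ranks 1..n whose positive ranks sum to s."""
    counts = Counter()
    for signs in product("-+", repeat=n):
        # Only positively signed ranks contribute to s+.
        s_plus = sum(rank for rank, sign in zip(range(1, n + 1), signs) if sign == "+")
        counts[s_plus] += 1
    return counts

n = 5
counts = sign_pattern_counts(n)
for s in range(n * (n + 1) // 2 + 1):
    # Each of the 2**n patterns has probability 1/2**n under H0.
    print(f"s+ = {s:2d}   P0(S+ = s) = {counts[s]}/{2 ** n}")
```

The same function would work for $n = 10$, where it visits all 1024 patterns, although plain enumeration becomes slow as $n$ grows.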
For $n = 10$ there are $2^{10} = 1024$ possible signed-rank sequences, so a listing would involve much effort. Each sequence, though, would have probability $1/1024$ when $H_0$ is true, from which the distribution of $S_+$ when $H_0$ is true can be easily obtained.
We are now in a position to calculate a P-value for testing $H_0: \tilde{\mu} = 0$ versus $H_a: \tilde{\mu} > 0$ when $n = 5$. Suppose that $s_+ = 13$. Then
P-value $= P(S_+ \ge 13$ when $H_0$ is true$) = 1/32 + 1/32 + 1/32 = 3/32 = .094$
If $s_+ = 14$, then P-value $= 2/32 = .063$. For the sample $x_1 = .58$, $x_2 = 2.50$, $x_3 = -.21$, $x_4 = 1.23$, $x_5 = .97$, the signed rank sequence is −1, +2, +3, +4, +5, so $s_+ = 14$. Thus $H_0$ would be rejected at significance level .10 because P-value $= .063 \le .10 = \alpha$. However, at significance level .05 or .01, there would not be enough evidence to justify rejecting the null hypothesis.
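As a check on the five-observation calculation, here is a minimal Python sketch that computes the signed ranks, $s_+$, and the upper-tailed P-value directly from the sample. The helper names `signed_ranks` and `upper_tail_p` are illustrative choices of ours, and the sketch assumes no tied magnitudes and no observation exactly equal to the hypothesized value.

```python
from itertools import product

def signed_ranks(xs, mu0=0.0):
    """Signed ranks of the differences xs[i] - mu0 (assumes no ties, no zero differences)."""
    diffs = [x - mu0 for x in xs]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank if diffs[i] > 0 else -rank
    return ranks

def upper_tail_p(n, s_plus):
    """P0(S+ >= s_plus), found by enumerating all 2**n equally likely sign patterns."""
    hits = 0
    for positive in product((False, True), repeat=n):
        total = sum(rank for rank, pos in zip(range(1, n + 1), positive) if pos)
        hits += total >= s_plus
    return hits / 2 ** n

sample = [0.58, 2.50, -0.21, 1.23, 0.97]    # the five observations given in the text
sr = signed_ranks(sample)                   # listed in the original data order
s_plus = sum(r for r in sr if r > 0)
print(sr, s_plus, upper_tail_p(len(sample), s_plus))
# prints [2, 5, -1, 4, 3] 14 0.0625
```

The signed ranks come back in data order rather than sorted by magnitude, but they are the same five values −1, +2, +3, +4, +5, and 0.0625 is the 2/32 quoted in the text before rounding.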
General Description of the Test
Because the underlying distribution is assumed symmetric, $\mu = \tilde{\mu}$, so we will state the hypotheses of interest in terms of $\mu$ rather than $\tilde{\mu}$.
Assumption: $X_1, X_2, \ldots, X_n$ is a random sample from a continuous and symmetric probability distribution with mean (and median) $\mu$.
When the hypothesized value of $\mu$ is $\mu_0$, the absolute differences $|x_1 - \mu_0|, \ldots, |x_n - \mu_0|$ must be ranked from smallest to largest.
Null hypothesis: $H_0: \mu = \mu_0$
Test statistic value: $s_+$ = the sum of the ranks associated with positive $(x_i - \mu_0)$’s
Alternative Hypothesis      P-Value Determination
$H_a: \mu > \mu_0$          $P_0(S_+ \ge s_+)$
$H_a: \mu < \mu_0$          $P_0(S_+ \le s_+) = P_0(S_+ \ge n(n+1)/2 - s_+)$
$H_a: \mu \ne \mu_0$        $2P_0(S_+ \ge \max\{s_+,\ n(n+1)/2 - s_+\})$
Appendix Table A.13 gives $P_0(S_+ \ge c) = P(S_+ \ge c$ when $H_0$ is true$)$ for values of $c$ for which this probability is closest to .1, .05, .025, .01, and .005. This allows conclusions to be reached at significance levels that are at least approximately .10, .05, and .01.
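When a table such as A.13 is not at hand, the three P-value rules can be evaluated exactly for moderate $n$. The Python sketch below counts, for each possible total, the subsets of the ranks 1 through $n$ that add up to it, which avoids listing all $2^n$ sign patterns. The function names and the `alternative` labels are our own conventions, the sketch assumes an integer $s_+$ (no tied magnitudes), and capping the two-tailed value at 1 is an added convention not stated in the text.

```python
def null_counts(n):
    """counts[s] = number of subsets of {1, ..., n} whose elements sum to s."""
    max_s = n * (n + 1) // 2
    counts = [0] * (max_s + 1)
    counts[0] = 1                      # the empty subset: every sign negative
    for rank in range(1, n + 1):
        # Work downward so that each rank is used at most once per subset.
        for s in range(max_s, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts

def upper_tail(n, c):
    """P0(S+ >= c) when every sign pattern has probability 1/2**n."""
    counts = null_counts(n)
    return sum(counts[max(c, 0):]) / 2 ** n

def signed_rank_p_value(n, s_plus, alternative):
    """Exact P-value; alternative is 'greater', 'less', or 'two-sided' (our labels)."""
    m = n * (n + 1) // 2
    if alternative == "greater":
        return upper_tail(n, s_plus)
    if alternative == "less":
        return upper_tail(n, m - s_plus)           # uses symmetry of the null distribution
    if alternative == "two-sided":
        # Capped at 1 (an added convention, not from the text).
        return min(1.0, 2 * upper_tail(n, max(s_plus, m - s_plus)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(round(signed_rank_p_value(10, 41, "greater"), 3))   # 0.097
print(round(signed_rank_p_value(10, 44, "greater"), 3))   # 0.053
```

The two printed values agree with the $n = 10$ entries of Table A.13 quoted in the text.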
Suppose, for example, that the test is upper-tailed and based on $n = 10$. Table A.13 shows that $P_0(S_+ \ge 41) = .097$ and $P_0(S_+ \ge 44) = .053$. So if $s_+ = 40$, then the P-value exceeds .10. The value $s_+ = 42$ implies that .05 < P-value < .10, allowing for rejection of the null hypothesis at significance level .10 but not at significance level .05. If $s_+ = 44$, it is a really close call at significance level .05.
If the tails of the distribution are “too heavy,” as was the case with the Cauchy distribution mentioned in Chapter 6, then $\mu$ will not exist. In such cases, the Wilcoxon test will still be valid for tests concerning $\tilde{\mu}$.
In the case of a lower-tailed test based on $n = 10$, the value $s_+ = 13$ results in P-value $= P_0(S_+ \le 13)$. By symmetry of the null distribution, this is identical to $P_0(S_+ \ge 10(11)/2 - 13) = P_0(S_+ \ge 42)$. The P-value is then between .05 and .10. If a two-tailed test results in $s_+ = 44$ when $n = 10$, then $\max\{44, 55 - 44\} = 44$. Thus the P-value is $2P_0(S_+ \ge 44) = 2(.053) = .106$. This would also be the P-value if $s_+ = 11$, since $\max\{11, 55 - 11\} = 44$; the value 11 is just as far out in the lower tail of the null distribution as 44 is in the upper tail.
Example 15.1 A manufacturer of electric irons, wishing to test the accuracy of the thermostat control at the 500°F setting, instructs a test engineer to obtain actual temperatures at that setting for 15 irons using a thermocouple. The resulting measurements are as follows:
The engineer believes it is reasonable to assume that a temperature deviation from 500° of any particular magnitude is just as likely to be positive as negative (the assumption of symmetry) but wants to protect against possible nonnormality of the actual temperature distribution, so she decides to use the Wilcoxon signed-rank test to see whether the data strongly suggests incorrect calibration of the iron.
The hypotheses are $H_0: \mu = 500$ versus $H_a: \mu \ne 500$, where $\mu$ = the true average actual temperature at the 500°F setting. Subtracting 500 from each $x_i$ gives
The ranks are obtained by ordering these from smallest to largest without regard to sign.
Thus $s_+ = 2 + 4 + 7 + 9 + 13 = 35$. With $n(n+1)/2 = 120$, the P-value for a two-tailed test is $2P_0(S_+ \le 35) = 2P_0(S_+ \ge 85)$. Appendix Table A.13 shows that $P_0(S_+ \ge 89) = .053$; since 85 < 89, $P_0(S_+ \ge 85)$ is larger than this, so P-value $> 2(.053) = .106$. Even at significance level .10, the null hypothesis cannot be rejected, so it certainly cannot be rejected at level .05. Software gives .164 as the P-value. There is no reason to question the plausibility of 500 as the value of the population mean and median.
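The bracketing argument for this example can be checked with an exact calculation from the summary values $n = 15$ and $s_+ = 35$ alone. The sketch below is one way to do it in Python; `upper_tail` is a name of our own choosing. An exact figure need not match the software's .164 to the last digit, since the text does not say which computational method the software used.

```python
def upper_tail(n, c):
    """Exact P0(S+ >= c): subsets of {1, ..., n} with sum >= c, divided by 2**n."""
    max_s = n * (n + 1) // 2
    counts = [1] + [0] * max_s         # counts[s] = number of subsets summing to s
    for rank in range(1, n + 1):
        for s in range(max_s, rank - 1, -1):
            counts[s] += counts[s - rank]
    return sum(counts[max(c, 0):]) / 2 ** n

n, s_plus = 15, 35
m = n * (n + 1) // 2                   # largest possible value of S+, here 120
p_two_tailed = 2 * upper_tail(n, max(s_plus, m - s_plus))

print("tabled tail P0(S+ >= 89):", round(upper_tail(n, 89), 3))
print("two-tailed P-value:", round(p_two_tailed, 3))
```

The first printed value can be compared with the Table A.13 entry quoted in the text, and the second with the lower bound of .106.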
Although a theoretical implication of the continuity of the underlying distribution is that ties will not occur, in practice they often do because of the discreteness of measuring instruments. If there are several data values with the same absolute magnitude, then they would be assigned the average of the ranks they would receive if they differed very slightly from one another. For example, if in Example 15.1 $x_8 = 498.2$ is changed to 498.4, then two different values of $(x_i - 500)$ would have absolute magnitude 1.6. The ranks to be averaged would be 2 and 3, so each would be assigned rank 2.5.
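The averaging rule for ties is easy to state in code. The following Python sketch assigns each block of equal values the mean of the ranks the block occupies. The function name `average_ranks` is ours, and the list of absolute deviations is hypothetical, chosen only so that two values tie at 1.6 in positions 2 and 3; it is not the data of Example 15.1.

```python
def average_ranks(values):
    """Ranks 1..n for values, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2     # mean of ranks start+1, ..., end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

# Hypothetical absolute deviations |x_i - 500| with a tie at 1.6
abs_devs = [1.6, 0.8, 2.3, 1.6, 3.0]
print(average_ranks(abs_devs))
# prints [2.5, 1.0, 4.0, 2.5, 5.0]
```

Each averaged rank then takes the sign of its own difference, exactly as in the untied case.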