15.3 Distribution-Free Confidence Intervals
The method we have used so far to construct a confidence interval (CI) can be described as follows: Start with a random variable ($Z$, $T$, $\chi^2$, $F$, or the like) that depends on the parameter of interest and a probability statement involving the variable, manipulate the inequalities of the statement to isolate the parameter between random endpoints, and, finally, substitute computed values for random variables. Another general method for obtaining CIs takes advantage of the relationship between test procedures and CIs discussed in Section 8.5. A $100(1-\alpha)\%$ CI for a parameter $\theta$ can be obtained from a level $\alpha$ test for $H_0\colon \theta = \theta_0$ versus $H_a\colon \theta \neq \theta_0$. This method will be used to derive intervals associated with the Wilcoxon signed-rank test and the Wilcoxon rank-sum test.
PROPOSITION Suppose we have a level $\alpha$ test procedure for testing $H_0\colon \theta = \theta_0$ versus $H_a\colon \theta \neq \theta_0$. For fixed sample values, let $A$ denote the set of all values $\theta_0$ for which $H_0$ is not rejected. Then $A$ is a $100(1-\alpha)\%$ CI for $\theta$.
This makes intuitive sense because the CI consists of all values of the parameter that are plausible at the selected confidence level, and we do not want to reject $H_0$ in favor of $H_a$ if $\theta_0$ is a plausible value.
There are actually pathological examples in which the set $A$ defined in the proposition is not an interval of $\theta$ values, but instead the complement of an interval or something even stranger. To be more precise, we should really replace the notion of a CI with that of a confidence set. In the cases of interest here, the set $A$ does turn out to be an interval.
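The proposition can be illustrated numerically. The following is a minimal sketch, not taken from the text: it assumes a two-sided $z$ test with known $\sigma$ (chosen only because its CI has a familiar closed form), hypothetical data, and a hypothetical grid of candidate $\theta_0$ values. The set of grid values that are not rejected should agree, up to the grid spacing, with the usual interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_rejects(xs, theta0, sigma, alpha):
    """Two-sided level-alpha z test of H0: theta = theta0 (sigma assumed known)."""
    z = (mean(xs) - theta0) / (sigma / sqrt(len(xs)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) >= z_crit


def confidence_set_by_inversion(xs, sigma, alpha, grid):
    """Return the grid values theta0 for which H0 is NOT rejected (the set A)."""
    return [t for t in grid if not z_test_rejects(xs, t, sigma, alpha)]


# Hypothetical data and settings, for illustration only.
xs = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0]
sigma, alpha = 0.4, 0.05
grid = [4.0 + 0.001 * k for k in range(2001)]  # theta0 from 4.0 to 6.0

accepted = confidence_set_by_inversion(xs, sigma, alpha, grid)
half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(len(xs))

print(min(accepted), max(accepted))                   # endpoints found by inversion
print(mean(xs) - half_width, mean(xs) + half_width)   # closed-form z interval
```

A grid search is used here only to make the inversion idea visible; for the Wilcoxon procedures discussed next, the accepted set can be identified exactly without a grid.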
The Wilcoxon Signed-Rank Interval
To test $H_0\colon \mu = \mu_0$ versus $H_a\colon \mu \neq \mu_0$ using the Wilcoxon signed-rank test, where $\mu$ is the mean of a continuous symmetric distribution, the absolute values $|x_1 - \mu_0|, \ldots, |x_n - \mu_0|$ are ordered from smallest to largest, with the smallest receiving rank 1 and the largest rank $n$. Each rank is then given the sign of its associated $x_i - \mu_0$, and the test statistic $S_+$ is the sum of the positively signed ranks. The two-tailed test rejects $H_0$ if $s_+$ is either $\geq c$ or $\leq n(n+1)/2 - c$, where $c$ is obtained from Appendix Table A.13 once the desired level of significance $\alpha$ is specified. For fixed $x_1, \ldots, x_n$, the $100(1-\alpha)\%$ signed-rank interval will consist of all $\mu_0$ for which $H_0\colon \mu = \mu_0$ is not rejected at level $\alpha$. To identify this interval, it is convenient to express the test statistic $S_+$ in another form.
$$S_+ = \text{the number of pairwise averages } (X_i + X_j)/2 \text{ with } i \leq j \text{ that are } \geq \mu_0$$
That is, if we average each $x_j$ in the list with each $x_i$ to its left, including $(x_j + x_j)/2$ (which is just $x_j$), and count the number of these averages that are $\geq \mu_0$, $s_+$ results. In moving from left to right in the list of sample values, we are simply averaging every pair of observations in the sample [again including $(x_j + x_j)/2$] exactly once, so the order in which the observations are listed before averaging is not important. The equivalence of the two methods for computing $s_+$ is not difficult to verify. The number of pairwise averages is $\binom{n}{2} + n$ (the first term due to averaging of different observations and the second due to averaging each $x_i$ with itself), which equals $n(n+1)/2$. It can be shown that the P-value is at most $\alpha$ if and only if either too many or too few of these pairwise averages are $\geq \mu_0$, in which case $H_0$ is rejected.
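The equivalence of the two ways of computing $s_+$ can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the data are the seven observations that appear in Example 15.6, and the null value $\mu_0 = 5.0$ is an arbitrary choice that produces no ties and no zero differences (the rank-based function assumes this).

```python
def signed_rank_statistic(xs, mu0):
    """S+ from the definition: sum of the ranks of |x_i - mu0| that belong
    to positive differences. Assumes no tied |differences| and no zeros."""
    diffs = [x - mu0 for x in xs]
    ordered = sorted(diffs, key=abs)  # rank 1 = smallest |difference|
    return sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)


def count_pairwise_averages(xs, mu0):
    """S+ as the number of averages (x_i + x_j)/2 with i <= j that are >= mu0."""
    n = len(xs)
    return sum(
        1
        for i in range(n)
        for j in range(i, n)
        if (xs[i] + xs[j]) / 2 >= mu0
    )


xs = [4.51, 4.59, 4.90, 4.93, 6.80, 5.08, 5.67]  # observations of Example 15.6
mu0 = 5.0                                         # hypothetical null value

print(signed_rank_statistic(xs, mu0), count_pairwise_averages(xs, mu0))
```

For this $\mu_0$ the positive differences carry ranks 2, 6, and 7, so both computations give $s_+ = 15$.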
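EXAMPLE 15.6 The following observations are values of cerebral metabolic rate for rhesus monkeys:
$x_1 = 4.51$, $x_2 = 4.59$, $x_3 = 4.90$, $x_4 = 4.93$, $x_5 = 6.80$, $x_6 = 5.08$, $x_7 = 5.67$.
The 28 pairwise averages are, in increasing order,
4.51  4.55  4.59  4.705  4.72  4.745  4.76
4.795  4.835  4.90  4.915  4.93  4.99  5.005
5.08  5.09  5.13  5.285  5.30  5.375  5.655
5.67  5.695  5.85  5.865  5.94  6.235  6.80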
The first few and the last few of these are pictured in Figure 15.2.
[Figure 15.2 Plot of the data for Example 15.6. The plot marks the range of $\mu_0$ values for which $H_0$ is accepted at level .046.]
Because $S_+$ is a discrete rv, $\alpha = .05$ cannot be obtained exactly. Appendix Table A.13 shows that the P-value for a two-tailed test is $2(.023) = .046$ if either $s_+ = 26$ or $s_+ = 2$. Thus $H_0$ will not be rejected at significance level .046 if $3 \leq s_+ \leq 25$. That is, if the number of pairwise averages $\geq \mu_0$ is between 3 and 25, inclusive, $H_0$ is not rejected. From Figure 15.2 the CI for $\mu$ with confidence level 95.4% (approximately 95%) is (4.59, 5.94).
■
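The tail probabilities read from Appendix Table A.13 can also be computed directly. The sketch below is an illustration rather than the table's own construction: it assumes only that, when $H_0$ is true, each of the ranks $1, \ldots, n$ is positively signed with probability 1/2 independently of the others, and it counts subsets of the ranks by their sum. The function names are hypothetical.

```python
def signed_rank_null_pmf(n):
    """Exact null pmf of S+ for sample size n, as a list indexed by s.

    Under H0 every subset of the ranks {1, ..., n} is equally likely to be
    the positively signed set, so P(S+ = s) = (# subsets with sum s) / 2**n.
    """
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1  # the empty subset
    for rank in range(1, n + 1):
        # Descending loop so that each rank is used at most once per subset.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    total = 2 ** n
    return [c / total for c in counts]


def upper_tail(n, c):
    """P(S+ >= c) when H0 is true."""
    return sum(signed_rank_null_pmf(n)[c:])


n = 7
for c in (26, 24):
    two_sided = 2 * upper_tail(n, c)
    print(c, round(two_sided, 3), round(100 * (1 - two_sided), 1))
```

For $n = 7$ and $c = 26$ the exact upper-tail probability is $3/128 \approx .0234$, which rounds to the tabled .023. Doubling the unrounded value gives .047 rather than .046, so small discrepancies in the last digit of the stated confidence level are due to rounding in the table. The same function shows which confidence levels are achievable for any small $n$.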
In general, once the pairwise averages are ordered from smallest to largest, the endpoints of the Wilcoxon interval are two of the "extreme" averages. To express this precisely, let the smallest pairwise average be denoted by $\bar{x}_{(1)}$, the next smallest by $\bar{x}_{(2)}, \ldots$, and the largest by $\bar{x}_{(n(n+1)/2)}$.
PROPOSITION If the level $\alpha$ Wilcoxon signed-rank test for $H_0\colon \mu = \mu_0$ versus $H_a\colon \mu \neq \mu_0$ is to reject $H_0$ if either $s_+ \geq c$ or $s_+ \leq n(n+1)/2 - c$, then a $100(1-\alpha)\%$ CI for $\mu$ is
$$\left(\bar{x}_{(n(n+1)/2 - c + 1)},\ \bar{x}_{(c)}\right) \qquad (15.7)$$
In words, the interval extends from the $d$th smallest pairwise average to the $d$th largest average, where $d = n(n+1)/2 - c + 1$. Appendix Table A.15 gives the values of $c$ that correspond to approximately the usual confidence levels for $n = 5, 6, \ldots, 25$.
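Interval (15.7) translates directly into a few lines of code. The following is a minimal sketch with hypothetical function names; it assumes the critical value $c$ has already been obtained (for example, from Appendix Table A.15) and simply picks out the $d$th smallest and $d$th largest pairwise averages. It is applied to the data of Example 15.6 with $c = 26$.

```python
def pairwise_averages(xs):
    """All n(n+1)/2 averages (x_i + x_j)/2 with i <= j, sorted increasingly."""
    n = len(xs)
    return sorted((xs[i] + xs[j]) / 2 for i in range(n) for j in range(i, n))


def signed_rank_interval(xs, c):
    """Interval (15.7): from the d-th smallest to the d-th largest pairwise
    average, where d = n(n+1)/2 - c + 1. The value c is supplied by the user."""
    avgs = pairwise_averages(xs)
    d = len(avgs) - c + 1
    if not 1 <= d <= c <= len(avgs):
        raise ValueError("c is out of range for this sample size")
    return avgs[d - 1], avgs[c - 1]  # convert 1-based positions to 0-based


xs = [4.51, 4.59, 4.90, 4.93, 6.80, 5.08, 5.67]  # data of Example 15.6
lower, upper = signed_rank_interval(xs, c=26)
print(round(lower, 3), round(upper, 3))
```

With $c = 26$ this reproduces the interval (4.59, 5.94) of Example 15.6. Rounding is applied only for display, since the averages are computed in floating point.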
EXAMPLE 15.7 (Example 15.6 continued) For $n = 7$, the P-value for a two-tailed test is $2(.055) = .11$ if $s_+ = 24$ or $s_+ = 4$. Therefore the null hypothesis will be rejected at significance level .11 if $s_+ = 0, 1, 2, 3, 4, 24, 25, 26, 27$, or 28. Thus an 89.0% interval (approximately 90%) is obtained by using $c = 24$. The interval is $(\bar{x}_{(28-24+1)}, \bar{x}_{(24)}) = (\bar{x}_{(5)}, \bar{x}_{(24)}) = (4.72, 5.85)$, which extends from the fifth smallest to the fifth largest pairwise average.
■
The derivation of the interval depended on having a single sample from a continuous symmetric distribution with mean (median) $\mu$. When the data are paired, the interval constructed from the differences $d_1, d_2, \ldots, d_n$ is a CI for the mean (median) difference $\mu_D$. In this case, the symmetry of the $X$ and $Y$ distributions need not be assumed; as long as the $X$ and $Y$ distributions have the same shape, the $X - Y$ distribution will be symmetric, so only continuity is required.
For $n > 20$, the large-sample approximation to the Wilcoxon test based on standardizing $S_+$ gives an approximation to $c$ in (15.7). The result [for a $100(1-\alpha)\%$ interval] is
$$c \approx \frac{n(n+1)}{4} + z_{\alpha/2}\sqrt{\frac{n(n+1)(2n+1)}{24}}$$
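As a quick illustration of the large-sample formula, the sketch below evaluates it for a hypothetical sample size. The function name is an assumption, and the text does not say how a non-integer value should be converted to a position among the ordered averages, so the raw value is returned and any rounding rule is left to the reader.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_c_approx(n, alpha):
    """Large-sample approximation to c in (15.7):
    n(n+1)/4 + z_{alpha/2} * sqrt(n(n+1)(2n+1)/24)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    return n * (n + 1) / 4 + z * sqrt(n * (n + 1) * (2 * n + 1) / 24)


# Hypothetical sample size, chosen only to show the arithmetic.
n, alpha = 25, 0.05
print(signed_rank_c_approx(n, alpha))  # unrounded approximation to c
```

Here $n(n+1)/4 = 162.5$ is the null mean of $S_+$ and $n(n+1)(2n+1)/24 = 1381.25$ is its null variance, so the approximation places $c$ about 1.96 standard deviations above the mean.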
The efficiency of the Wilcoxon interval relative to the t interval is roughly the same as that for the Wilcoxon test relative to the t test. In particular, for large samples when the underlying population is normal, the Wilcoxon interval will tend to be slightly wider than the t interval, but if the population is quite nonnormal (symmetric but with heavy tails), then the Wilcoxon interval will tend to be much narrower than the t interval.
The Wilcoxon Rank-Sum Interval
The Wilcoxon rank-sum test for testing $H_0\colon \mu_1 - \mu_2 = \Delta_0$ is carried out by first combining the $(X_i - \Delta_0)$'s and $Y_j$'s into one sample of size $m + n$ and ranking them from smallest (rank 1) to largest (rank $m + n$). The test statistic $W$ is then the sum of the ranks of the $(X_i - \Delta_0)$'s. For the two-sided alternative, $H_0$ is rejected if $w$ is either too small or too large.
To obtain the associated CI for fixed $x_i$'s and $y_j$'s, we must determine the set of all $\Delta_0$ values for which $H_0$ is not rejected. This is easiest to do if the test statistic is expressed in a slightly different form. The smallest possible value of $W$ is $m(m+1)/2$, corresponding to every $(X_i - \Delta_0)$ less than every $Y_j$, and there are $mn$ differences of the form $(X_i - \Delta_0) - Y_j$. A bit of manipulation gives
$$W = [\text{number of } (X_i - Y_j - \Delta_0)\text{'s} \geq 0] + \frac{m(m+1)}{2} = [\text{number of } (X_i - Y_j)\text{'s} \geq \Delta_0] + \frac{m(m+1)}{2} \qquad (15.8)$$
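Expression (15.8) can be checked numerically. The sketch below is illustrative: the function names are hypothetical, the data are the crushing-strength values from Example 15.8 later in this section, and the null value $\Delta_0 = 6000$ is an arbitrary choice that produces no ties (the rank-based function assumes none).

```python
def rank_sum_statistic(xs, ys, delta0):
    """W from the definition: sum of the ranks of the (x_i - delta0)'s in the
    combined sample. Assumes no ties among the combined values."""
    labeled = [(x - delta0, "x") for x in xs] + [(y, "y") for y in ys]
    labeled.sort()
    return sum(rank for rank, (_, src) in enumerate(labeled, start=1) if src == "x")


def rank_sum_via_differences(xs, ys, delta0):
    """W from (15.8): number of (x_i - y_j)'s >= delta0, plus m(m+1)/2."""
    m = len(xs)
    count = sum(1 for x in xs for y in ys if x - y >= delta0)
    return count + m * (m + 1) // 2


xs = [10860, 11120, 11340, 12130, 14380, 13070]  # epoxy, Example 15.8
ys = [4590, 4850, 6510, 5640, 6390]              # other polymer, Example 15.8
delta0 = 6000                                    # hypothetical null value

print(rank_sum_statistic(xs, ys, delta0), rank_sum_via_differences(xs, ys, delta0))
```

For this $\Delta_0$, 19 of the 30 differences are at least 6000 and $m(m+1)/2 = 21$, so both computations give $w = 40$.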
The P-value will be at most $\alpha$, leading to rejection of the null hypothesis, if $w$ is relatively small (close to its minimum, $m(m+1)/2$) or large (close to its maximum, $m(m+2n+1)/2$). This is equivalent to rejecting $H_0$ if the number of $(x_i - y_j)$'s $\geq \Delta_0$ is either too small or too large. Expression (15.8) suggests that we compute $x_i - y_j$ for each $i$ and $j$ and order these $mn$ differences from smallest to largest. Then if the null value $\Delta_0$ is neither smaller than most of the differences nor larger than most, $H_0\colon \mu_1 - \mu_2 = \Delta_0$ is not rejected. Varying $\Delta_0$ now shows that a CI for $\mu_1 - \mu_2$ will have as its lower endpoint one of the ordered $(x_i - y_j)$'s, and similarly for the upper endpoint.
PROPOSITION Let $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ be the observed values in two independent samples from continuous distributions that differ only in location (and not in shape). With $d_{ij} = x_i - y_j$ and the ordered differences denoted by $d_{ij(1)}, d_{ij(2)}, \ldots, d_{ij(mn)}$, the general form of a $100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ is
$$\left(d_{ij(mn - c + 1)},\ d_{ij(c)}\right) \qquad (15.9)$$
where $c$ is the critical constant for the two-tailed level $\alpha$ Wilcoxon rank-sum test.
Notice that the form of the Wilcoxon rank-sum interval (15.9) is very similar to the Wilcoxon signed-rank interval (15.7); (15.7) uses pairwise averages from a single sample, whereas (15.9) uses pairwise differences from two samples. Appendix Table A.16 gives values of $c$ for selected values of $m$ and $n$.
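Interval (15.9) is equally short to code. The sketch below uses a hypothetical function name and assumes that $c$ has been read from Appendix Table A.16. It is applied to the crushing-strength data of Example 15.8, for which that example uses $c = 26$.

```python
def rank_sum_interval(xs, ys, c):
    """Interval (15.9): from the (mn - c + 1)-th to the c-th ordered
    difference x_i - y_j. The critical constant c is supplied by the user."""
    diffs = sorted(x - y for x in xs for y in ys)
    lower_pos = len(diffs) - c + 1
    if not 1 <= lower_pos <= c <= len(diffs):
        raise ValueError("c is out of range for these sample sizes")
    return diffs[lower_pos - 1], diffs[c - 1]  # 1-based positions to 0-based


xs = [10860, 11120, 11340, 12130, 14380, 13070]  # epoxy, Example 15.8
ys = [4590, 4850, 6510, 5640, 6390]              # other polymer, Example 15.8

print(rank_sum_interval(xs, ys, c=26))
```

With $mn = 30$ and $c = 26$, the endpoints are the 5th and 26th ordered differences, which gives the interval (4830, 8220) reported in Example 15.8.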
EXAMPLE 15.8 The article "Some Mechanical Properties of Impregnated Bark Board" (Forest Products J., 1977: 31–38) reports the following data on maximum crushing strength (psi) for a sample of epoxy-impregnated bark board and for a sample of bark board impregnated with another polymer:
Epoxy (x's)   10,860   11,120   11,340   12,130   14,380   13,070
Other (y's)     4590     4850     6510     5640     6390
Let's obtain a 95% CI for the true average difference in crushing strength between the epoxy-impregnated board and the other type of board.
From Appendix Table A.16, since the smaller sample size is 5 and the larger sample size is 6, $c = 26$ for a confidence level of approximately 95%. The $d_{ij}$'s appear in Table 15.5. The five smallest $d_{ij}$'s $[d_{ij(1)}, \ldots, d_{ij(5)}]$ are 4350, 4470, 4610, 4730, and 4830; and the five largest $d_{ij}$'s are (in descending order) 9790, 9530, 8740, 8480, and 8220. Thus the CI is $(d_{ij(5)}, d_{ij(26)}) = (4830, 8220)$.
Table 15.5 Differences for the Rank-Sum Interval in Example 15.8
                               y_j
       d_ij       4590    4850    5640    6390    6510
       10,860     6270    6010    5220    4470    4350
       11,120     6530    6270    5480    4730    4610
x_i    11,340     6750    6490    5700    4950    4830
       12,130     7540    7280    6490    5740    5620
       13,070     8480    8220    7430    6680    6560
       14,380     9790    9530    8740    7990    7870
■
When $m$ and $n$ are both large, the Wilcoxon test statistic has approximately a normal distribution. This can be used to derive a large-sample approximation for the value $c$ in interval (15.9). The result is
$$c \approx \frac{mn}{2} + z_{\alpha/2}\sqrt{\frac{mn(m+n+1)}{12}}$$
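The rank-sum approximation can be evaluated in the same way as the one for the signed-rank interval. In the sketch below the function name and the sample sizes are hypothetical, and the unrounded value is returned because the text does not specify a rounding rule.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_c_approx(m, n, alpha):
    """Large-sample approximation to c in (15.9):
    mn/2 + z_{alpha/2} * sqrt(mn(m+n+1)/12)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    return m * n / 2 + z * sqrt(m * n * (m + n + 1) / 12)


# Hypothetical sample sizes, chosen only to show the arithmetic.
m, n, alpha = 10, 10, 0.05
print(rank_sum_c_approx(m, n, alpha))  # unrounded approximation to c
```

Here $mn/2 = 50$ and $mn(m+n+1)/12 = 175$ are the null mean and variance of the number of $(x_i - y_j)$'s that are at least $\Delta_0$, which is the count that appears in (15.8).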
As with the signed-rank interval, the rank-sum interval (15.9) is quite efficient with respect to the t interval; in large samples, it will tend to be only a bit wider than the t interval when the underlying populations are normal and may be considerably narrower than the t interval if the underlying populations have heavier tails than do normal populations.
EXERCISES Section 15.3 (17–22)
17. The article "The Lead Content and Acidity of Christchurch Precipitation" (N. Zeal. J. of Science, 1980: 311–312) reports the accompanying data on lead concentration (μg/L) in samples gathered during eight different summer rainfalls: 17.0, 21.4, 30.6, 5.0, 12.2, 11.8, 17.3, and 18.8. Assuming that the lead-content distribution is symmetric, use the Wilcoxon signed-rank interval to obtain a 95% CI for $\mu$.
18. Compute the 99% signed-rank interval for true average pH $\mu$ (assuming symmetry) using the data in Exercise 15.3. [Hint: Try to compute only those pairwise averages having relatively small or large values (rather than all 105 averages).]
19. An experiment was carried out to compare the abilities of two different solvents to extract creosote impregnated in test logs. Each of eight logs was divided into two segments, and then one segment was randomly selected for application of the first solvent, with the other segment receiving the second solvent.
Solvent 1   3.92   3.79   3.70   4.08   3.87   3.95   3.55   3.76
Solvent 2   4.25   4.20   4.41   3.89   4.39   3.75   4.20   3.90
Calculate a CI using a confidence level of roughly 95% for the difference between the true average amount extracted using the first solvent and the true average amount extracted using the second solvent.
20. The following observations are amounts of hydrocarbon emissions resulting from road wear of bias-belted tires under a 522 kg load inflated at 228 kPa and driven at 64 km/hr for 6 hours ("Characterization of Tire Emissions Using an Indoor Test Facility," Rubber Chemistry and Technology, 1978: 7–25): .045, .117, .062, and .072. What confidence levels are achievable for this sample size using the signed-rank interval? Select an appropriate confidence level and compute the interval.
21. Compute the 90% rank-sum CI for $\mu_1 - \mu_2$ using the data in Exercise 11.
22. Compute a 99% CI for $\mu_1 - \mu_2$ using the data in