5.3.1 Tests for Two Independent Samples
Commands 5.8. SPSS, STATISTICA, MATLAB and R commands used to perform non-parametric tests on two independent samples.

SPSS        Analyze; Nonparametric Tests; 2 Independent Samples
STATISTICA  Statistics; Nonparametrics; Comparing two independent samples (groups)
MATLAB      [p,h,stats]=ranksum(x,y,alpha)
R           ks.test(x,y) ; wilcox.test(x,y) | wilcox.test(x~y)
5.3.1.1 The Kolmogorov-Smirnov Two-Sample Test
The Kolmogorov-Smirnov test is used to assess whether two independent samples were drawn from the same population, or from populations with the same distribution, for the variable X being tested, which is assumed to be continuous. Let F(x) and G(x) represent the unknown distributions for the two independent samples. The null hypothesis is formalised as:

H0: Data variable X has equal cumulative probability distributions for the two samples: F(x) = G(x).
The test is conducted similarly to the way described in section 5.1.4. Let Sm(x) and Sn(x) represent the empirical distributions of the two samples, with sizes m and n, respectively. We then use as test statistic the maximum deviation of these empirical distributions:

    Dm,n = max | Sn(x) − Sm(x) | .
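The maximum deviation Dm,n can be computed directly from the two empirical distributions. A minimal Python sketch, with made-up sample values (the book itself performs this test in SPSS or R):

```python
def ks_statistic(x, y):
    """D = max | S_n(v) - S_m(v) |, taken over all observed points v."""
    xs, ys = sorted(x), sorted(y)

    def ecdf(sample, v):
        # Empirical distribution: fraction of sample values <= v
        return sum(s <= v for s in sample) / len(sample)

    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in xs + ys)

# Made-up sample values, just to illustrate the computation:
print(ks_statistic([8, 12, 15, 21], [9, 13, 19]))  # 0.25
```

Since both empirical distributions are step functions, it suffices to evaluate the deviation at the observed points, as the sketch does.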
For large samples (say, m and n above 25) and two-tailed tests (the most usual), the significance of Dm,n can be evaluated using the critical values obtained with the expression:

    Dm,n,α = c √( (m + n)/(mn) ) ,    5.30

where c is a coefficient that depends on the significance level, namely c = 1.36 for α = 0.05 (for details, see e.g. Siegel S, Castellan Jr NJ, 1998).
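The critical-value expression is straightforward to evaluate; a short Python sketch, with c = 1.36 hard-coded as the α = 0.05 coefficient quoted in the text:

```python
import math

def ks_critical(m, n, c=1.36):
    """Large-sample two-tailed critical value for D_{m,n}; c = 1.36 at alpha = 0.05."""
    return c * math.sqrt((m + n) / (m * n))

# Two samples of 50, as in the cork-stopper example:
print(round(ks_critical(50, 50), 3))  # 0.272
```

The observed Dm,n is significant at the chosen level whenever it exceeds this critical value.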
When compared with its parametric counterpart, the t test, the Kolmogorov-Smirnov test has a high power-efficiency of about 95%, even for small samples.
202 5 Non-Parametric Tests of Hypotheses
Example 5.13
Q: Consider the variable ART, the total area of defects, of the cork-stopper dataset. Can one assume that the distributions of ART for the first two classes of cork-stoppers are the same?
A: Variable ART can be considered a continuous variable, and the samples are independent. Table 5.17 shows the Kolmogorov-Smirnov test results, from which we conclude that the null hypothesis is rejected, i.e., for variable ART, the first two classes have different distributions. The test is performed in R with ks.test(ART[1:50], ART[51:100]).
Table 5.17. Two-sample Kolmogorov-Smirnov test results obtained with SPSS for variable ART of the cork-stopper dataset, reporting the most extreme differences, the Kolmogorov-Smirnov Z and the two-tailed asymptotic significance.
5.3.1.2 The Mann-Whitney Test
The Mann-Whitney test, also known as the Wilcoxon-Mann-Whitney or rank-sum test, is used, like the previous test, to assess whether two independent samples were drawn from the same population, or from populations with the same distribution, for the variable being tested, which is assumed to be at least ordinal.
Let FX(x) and GY(x) represent the unknown distributions of the two independent populations, where we explicitly denote by X and Y the corresponding random variables. The null hypothesis can be formalised as in the previous section (FX(x) = GY(x)). However, when the distributions are different, it often happens that the probability associated with the event “X > Y ” is not ½, as it would be for equal distributions. Following this approach, the hypotheses for the Mann-Whitney test are formalised as:
H0: P(X > Y ) = ½ ;
H1: P(X > Y ) ≠ ½ ,

for the two-sided test, and

H0: P(X > Y ) ≤ ½ ;
H1: P(X > Y ) > ½ (or with both inequalities reversed),

for the one-sided test.
5.3 Inference on Two Populations
In order to assess these hypotheses, the Mann-Whitney test starts by assigning ranks to the samples. Let the samples be denoted x1, x2, …, xn and y1, y2, …, ym. The ranking of the xi and yi assigns ranks 1, 2, …, n + m. As an example, let us consider the following situation:

xi: 12 21 15 8
yi: 9 13 19
The ranking of the xi and yi would then yield the result:

Variable:  X   Y   X    Y    X    Y    X
Data:      8   9   12   13   15   19   21
Rank:      1   2   3    4    5    6    7
The test statistic is the sum of the ranks for one of the variables, say X:

    WX = Σ R(xi) ,

where R(xi) are the ranks assigned to the xi. For the example above, WX = 16. Similarly, WY = 12, with:

    WX + WY = N(N + 1)/2 ,

the total sum of the ranks from 1 through N = n + m.
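The ranking and the two rank sums can be reproduced in a few lines of Python (this sketch assumes no ties; with ties, the statistical packages assign average ranks):

```python
def rank_sums(x, y):
    """Return (W_X, W_Y): the rank sums of each sample over the pooled, sorted data."""
    pooled = sorted([(v, "x") for v in x] + [(v, "y") for v in y])
    w_x = sum(rank for rank, (_, label) in enumerate(pooled, 1) if label == "x")
    w_y = sum(rank for rank, (_, label) in enumerate(pooled, 1) if label == "y")
    return w_x, w_y

w_x, w_y = rank_sums([12, 21, 15, 8], [9, 13, 19])
print(w_x, w_y)   # 16 12
print(w_x + w_y)  # 28, i.e. N(N+1)/2 with N = 7
```

The final print confirms the identity WX + WY = N(N + 1)/2 on the worked example.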
The rationale for using WX as a test statistic is that under the null hypothesis, P(X > Y ) = ½, one expects the ranks to be randomly distributed between the xi and yi, therefore resulting in approximately equal average ranks in each of the two samples. For small samples, there are tables with the exact probabilities of WX. For large samples (say, m or n above 10), the sampling distribution of WX rapidly approaches the normal distribution with the following parameters:

    μWX = n(N + 1)/2 ;    σWX = √( nm(N + 1)/12 ) .

Therefore, for large samples, the following test statistic with standard normal distribution is used:

    z = (WX ± 0.5 − μWX) / σWX .    5.33

The 0.5 continuity correction factor is added when one wants to determine critical points in the left tail of the distribution, and subtracted to determine critical points in the right tail of the distribution.
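A Python sketch of this large-sample approximation; the tail argument implements the ±0.5 continuity-correction rule just stated, and the toy check reuses the earlier ranking example (n = 4, m = 3, WX = 16):

```python
import math

def mann_whitney_z(w_x, n, m, tail="left"):
    """Normal approximation for W_X: add 0.5 for left-tail critical points,
    subtract 0.5 for right-tail ones."""
    N = n + m
    mu = n * (N + 1) / 2                      # mean of W_X under H0
    sigma = math.sqrt(n * m * (N + 1) / 12)   # standard deviation of W_X under H0
    correction = 0.5 if tail == "left" else -0.5
    return (w_x + correction - mu) / sigma

# Toy check: n = 4, m = 3, W_X = 16, so mu = 16 and z is driven by the correction
print(round(mann_whitney_z(16, 4, 3), 3))  # 0.177
```

Note that the example's samples are far too small for the normal approximation to be trusted; the call is only meant to exercise the formula.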
When compared with its parametric counterpart, the t test, the Mann-Whitney test has a high power-efficiency of about 95.5% for moderate to large n. In some cases, the Mann-Whitney test was even shown to be more powerful than the t test! There is also evidence that it should be preferred over the previous Kolmogorov-Smirnov test for large samples.
Example 5.14
Q: Consider the Programming dataset. Does this data support the hypothesis that freshmen and non-freshmen have different distributions of their scores?
A: The Mann-Whitney test results are summarised in Table 5.18. From this table one concludes that the null hypothesis (equal distributions) cannot be rejected at the 5% level. In R this test would be solved with wilcox.test(Score~F), yielding the same results for the “Mann-Whitney U” and “Asymp. Sig.” as in Table 5.18.
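Outside SPSS and R, the same test is available in Python as scipy.stats.mannwhitneyu. A sketch on synthetic scores (the Programming dataset itself is not reproduced here, so the group sizes and values below are made up):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
# Synthetic stand-ins for freshmen / non-freshmen scores (illustrative only)
freshmen = rng.normal(10.0, 2.0, size=30)
others = rng.normal(10.0, 2.0, size=40)

# Two-sided Mann-Whitney test; scipy reports the U statistic rather than W_X
u, p = mannwhitneyu(freshmen, others, alternative="two-sided")
print(u, p)  # reject equal distributions at the 5% level only if p < 0.05
```

SciPy reports the Mann-Whitney U statistic, which is related to the rank sum by U = WX − n(n + 1)/2, so both lead to the same significance.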
Table 5.18. Mann-Whitney test results obtained with SPSS for Example 5.14: a) Ranks (total N = 271); b) Test statistic and significance (Mann-Whitney U = 3916; Asymp. Sig. (2-tailed) = 0.791). F = 1 for freshmen; 0, otherwise.
Table 5.19. Ranks for variables ASP and PHE (Example 5.15), obtained with SPSS, listing N, mean rank and sum of ranks per wine TYPE (for ASP, TYPE 2: N = 37, mean rank 29.04, sum of ranks 1074.5, with a total N of 67).
Example 5.15

Q: Consider the t test performed in Example 4.9, for variables ASP and PHE of the wine dataset. Apply the Mann-Whitney test to these continuous variables and compare the results with those previously obtained.
A: Tables 5.19 and 5.20 show the results with identical conclusions (and p values!) to those presented in Example 4.9.
Note that at a 1% level, we do not reject the null hypothesis for the ASP variable. This example constitutes a good illustration of the power-efficiency of the Mann-Whitney test when compared with its parametric counterpart, the t test.
Table 5.20. Mann-Whitney test results for variables ASP and PHE (Example 5.15) with grouping variable TYPE, obtained with SPSS, reporting for each variable the Mann-Whitney U, Wilcoxon W, Z and two-tailed asymptotic significance (among the reported values: a Mann-Whitney U of 314 and a Z of −3.039).