6.4 Compare Dependent Samples

The independent groups t-test discussed in the previous section compares the means of samples from two different groups. The groups are independent: the samples are drawn independently of each other, and no specific data value in one sample is linked to a specific data value in the other sample. The analysis depends only on the two sample means, the two sample sizes, and the two sample standard deviations.

6.4.1 Dependent-samples t-test

The alternative to independent groups is to collect the data in blocks of matched data values. The samples, then, are necessarily the same size. A dependent-samples design organizes data into these blocks. A block could be the data values of the same person before training and after training, or a block could consist of happiness measures from two different people, a husband and a wife, in a study of marital happiness.

block of matched data values: Match a data value to a data value in another sample.

dependent-samples: Two or more samples are organized in blocks of matched data values.

A classic example of a dependent-samples design applies to treatments such as a therapy for depression or a program for weight loss. For weight loss, measure weight before the treatment program begins. Administer the program, and then re-assess the weight of each participant. The results are samples of two sets of measurements with each measurement from the first sample linked to the same person’s measurement in the second sample.

As with the independent groups design, there are still two sets of measurements with two sample means and two sample standard deviations. That is, the independent groups t-test could still be employed to analyze the mean difference for these two samples of data. The t-test for the dependent groups design, however, directly compares the matched scores, usually by subtracting one from the other. The analysis is not of the two means, but the mean of the differences of the two scores. For the pre- and post-measurements for a treatment program, the change in each person’s score is analyzed directly, person-by-person.

dependent groups t-test: A t-test of the difference scores of the matched data values.
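The equivalence between the dependent-groups test and a one-sample test of the difference scores can be checked with a minimal sketch in base R. The weights below are made up for illustration and are not the WeightLoss.csv data; the base R function t.test stands in here for the lessR ttest function.

```r
# Hypothetical weights in pounds, made up for illustration only
before <- c(210, 185, 165, 240, 198, 176, 225, 190, 204, 172)
after  <- c(198, 187, 160, 231, 199, 161, 218, 187, 184, 161)

# Difference score for each block (person)
diffs <- before - after

# Dependent-groups t-test on the two matched samples
paired_test <- t.test(before, after, paired = TRUE)

# One-sample t-test of the difference scores against a mean of 0
one_sample <- t.test(diffs, mu = 0)

# The two analyses yield the same t-statistic and p-value
print(c(paired = unname(paired_test$statistic),
        one_sample = unname(one_sample$statistic)))
print(c(paired = paired_test$p.value, one_sample = one_sample$p.value))
```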

The analysis of the direct differences can yield a more powerful test than the comparison of the overall means. A test with more power is more likely to detect a population difference that actually exists than a test with less power. The dependent-samples test is more powerful when the variable of interest, such as weight, exhibits much variability among the participants. Some people do initially weigh more than others. In an independent-groups analysis these variations in weight contribute to the variability within each sample of measured weights, which then masks the ability of the test to differentiate between the means. The dependent-groups analysis removes this variability of the participant’s initial weights from the analysis. The focus shifts from the weights to a direct assessment of weight loss.

power of a test: The ability to detect a real population difference.
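To illustrate the gain in power, the following sketch runs both analyses on the same made-up weights (hypothetical values, not the book's data), in which initial weights vary widely from person to person. It uses base R's t.test, which by default applies the Welch version of the independent-groups test.

```r
# Hypothetical weights in pounds: large variability among participants
before <- c(210, 185, 165, 240, 198, 176, 225, 190, 204, 172)
after  <- c(198, 187, 160, 231, 199, 161, 218, 187, 184, 161)

# Independent-groups analysis: ignores the matching, so the spread of
# body weights among people inflates the standard error
indep <- t.test(before, after)

# Dependent-groups analysis: works with the person-by-person changes
paired <- t.test(before, after, paired = TRUE)

print(c(sd_before = sd(before), sd_after = sd(after),
        sd_diff = sd(before - after)))
print(c(t_indep = unname(indep$statistic),
        t_paired = unname(paired$statistic)))
print(c(p_indep = indep$p.value, p_paired = paired$p.value))
```

For these made-up values the standard deviation of the differences is much smaller than the standard deviation of the weights themselves, so the same mean change produces a much larger t-statistic in the dependent-groups analysis.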

Scenario: Evaluate the effectiveness of a weight loss program

Ten people who desired to lose weight participated in a weight loss program with measurements of their weight before and after the program. Was the program effective, and, if so, how effective?

The first task is to read the data. The name of each participant is in the first column, so these are identified as the row names for the mydata data frame.

row.names option, Section 2.2.6, p. 37

Means, Compare Two Samples 143

> mydata <- Read("http://lessRstats.com/data/WeightLoss.csv",
                 row.names=1)

View all the data by entering the name of the data frame, mydata, illustrated in Listing 6.18.

> mydata
Before After
Saechao, M.
Smith, D. 187
...
Jones, S.
Langston, M. 174

Listing 6.18 Weight before and after a weight loss program, in pounds, for the first few and last few blocks of data.

The analysis is of the difference between the Before and After weight for each participant. The null hypothesis is that, on average, there is no change from before to after the program.

One-sample t-test, Section 6.2.2, p. 126

One way to accomplish the analysis is first to calculate these differences with the Transform function and do the usual one-sample t-test on the difference scores. The analysis can also be done directly by listing the two variables, in this case Before and After, and then specifying the option paired set to TRUE.

paired=TRUE option: Specify a dependent-groups t-test.

lessR Input  Dependent-groups t-test

> ttest(Before, After, paired=TRUE)

Specify the test as a two-tailed test even though the advocates of the weight loss program prefer to see a rejection of the null hypothesis in the direction of weight loss. Given this desire for a specific outcome, some authors recommend a one-tailed test, here with the value of alternative set to "greater". However, setting a one-tailed test rules out any interpretation of results in the opposite direction. If the null hypothesis is not rejected, then the only conclusion is that the population mean difference has not been shown to be greater than 0, even if the program had the opposite result from that intended and actually led to weight gain.

Yes, it would be somewhat strange for a weight loss program to result in weight gain, but sometimes our best intentions lead to results different from what we expect. Accordingly, two-tailed tests are generally more appropriate unless there genuinely is no interest in a result that lies in the tail opposite the tail of a one-tailed rejection region. So the following test is run with the rejection region in both tails.

one-tailed test example, Listing 6.23, p. 147

The descriptive statistics in Listing 6.19 indicate that the mean weight loss for these 10 participants is 8.20 pounds.

------ Description ------

Difference: n.miss = 0, n = 10, mean = 8.20, sd = 7.67

Listing 6.19 Descriptive statistics of the differences.


The normality assumption is assessed in Listing 6.20. The null hypothesis is that the population of difference scores from which these 10 scores were sampled is normal. This null hypothesis was not rejected as the resulting p-value of 0.541 is larger than α = 0.05. As previously indicated, this test may not perform well in small samples, but at least the result is consistent with the assumption of normality.

------ Normality Assumption ------

Null hypothesis is a normal distribution of Difference.
Shapiro-Wilk normality test: W = 0.939, p-value = 0.541

Listing 6.20 Evaluation of the normality of the differences.
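The same kind of normality check can be run directly in base R with shapiro.test applied to the difference scores. The sketch below uses made-up weights (hypothetical, not the WeightLoss.csv data), so its W and p-value will not match Listing 6.20.

```r
# Hypothetical weights in pounds, made up for illustration only
before <- c(210, 185, 165, 240, 198, 176, 225, 190, 204, 172)
after  <- c(198, 187, 160, 231, 199, 161, 218, 187, 184, 161)

# The normality assumption applies to the difference scores,
# not to the Before and After weights separately
diffs <- before - after

sw <- shapiro.test(diffs)
print(sw)

# Null hypothesis of normality is retained when the p-value exceeds alpha
alpha <- 0.05
retain_normality <- sw$p.value > alpha
print(retain_normality)
```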

Listing 6.21 presents the inferential analysis, the hypothesis test and the confidence interval of the difference scores. The hypothesis test indicates that the population mean value of 0 is rejected.

Weight Loss Effect: p-value = 0.008 < α = 0.05, so reject H0

The mean value of the differences is likely not zero, so what is it? There is no precise answer, but the 95% confidence interval provides the plausible range of values of the population mean: the average weight loss for all participants is likely from 2.71 to 13.69 pounds.

------ Inference ------

t-cutoff: tcut = 2.262
Standard Error of Mean: SE = 2.43

Hypothesized Value H0: mu = 0
Hypothesis Test of Mean: t-value = 3.380, df = 9, p-value = 0.008

Margin of Error for 95% Confidence Level: 5.49
95% Confidence Interval for Mean: 2.71 to 13.69

Listing 6.21 Inference of the mean population difference.
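The values in Listing 6.21 follow from the descriptive statistics in Listing 6.19. The sketch below redoes the arithmetic in base R from the rounded summary values (n = 10, mean = 8.20, sd = 7.67), so the results agree with the listing only up to rounding of those inputs. The variable names are chosen here for illustration.

```r
# Summary statistics of the difference scores, copied from Listing 6.19
n      <- 10
mean_d <- 8.20
sd_d   <- 7.67
mu0    <- 0                               # hypothesized mean difference

se   <- sd_d / sqrt(n)                    # standard error of the mean
tcut <- qt(0.975, df = n - 1)             # two-tailed cutoff, 95% level
tval <- (mean_d - mu0) / se               # t-statistic
pval <- 2 * pt(-abs(tval), df = n - 1)    # two-tailed p-value
moe  <- tcut * se                         # margin of error
ci   <- c(mean_d - moe, mean_d + moe)     # 95% confidence interval

print(round(c(SE = se, tcut = tcut, t = tval, p = pval, margin = moe), 3))
print(round(ci, 2))
```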

The plot of the data and the displacement of the sample mean of the differences of 8.20 are illustrated in Figure 6.5. Also illustrated is the displacement in terms of Cohen’s d, which indicates that the sample mean of the differences is more than 1 standard deviation from zero. The weight loss program appears to be effective. Most participants lose weight, and the average weight loss is somewhere between 2.7 and 13.7 pounds.
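As a quick check of the effect size, Cohen's d for a one-sample analysis of differences is the mean difference divided by the standard deviation of the differences. The sketch below applies that formula to the rounded values reported in Listing 6.19.

```r
# Rounded summary statistics of the differences from Listing 6.19
mean_d <- 8.20
sd_d   <- 7.67
mu0    <- 0     # null value of the mean difference

# Cohen's d: distance of the sample mean from the null value,
# expressed in standard deviation units
cohen_d <- (mean_d - mu0) / sd_d
print(round(cohen_d, 2))
```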

6.4.2 The Non-parametric Alternative for Dependent Samples

Just as the dependent-groups t-test directly analyzed the difference scores of matched data values with a one-sample test, so does the non-parametric alternative. This non-parametric alternative to the one-sample t-test is the Wilcoxon signed rank test. This test is first discussed in this section instead of the section on the one-sample t-test because the Wilcoxon test is most commonly applied to the analysis of difference scores.

Figure 6.5 Density plot of the differences of weight loss, in pounds.

To illustrate, return to the weight loss data in Listing 6.18, and the corresponding difference scores. The Wilcoxon test compares the size of the positive differences, in this case, weight loss, with the size of the negative differences, here, weight gain. Order the absolute values of the differences from smallest to largest. If weight loss predominates, then the weight loss values should be larger than the weight gain values. The test assesses the position of these weight losses in the overall distribution of the ordered absolute values of the difference scores. The basis of the test is the Wilcoxon signed rank statistic, the sum of the ranks of the positive differences.
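The ranking procedure just described can be sketched by hand in base R and compared with the statistic that wilcox.test reports. The difference scores below are hypothetical, chosen with no ties and no zeros so that the ranks are unambiguous.

```r
# Hypothetical difference scores (Before - After), in pounds
diffs <- c(12, -2, 5, 9, -1, 15, 7, 3, 20, 11)

# Rank the absolute values of the differences, smallest to largest
ranks <- rank(abs(diffs))

# Signed rank statistic: sum of the ranks of the positive differences
V_manual <- sum(ranks[diffs > 0])

# The same statistic as computed by wilcox.test
V_r <- unname(wilcox.test(diffs)$statistic)

print(data.frame(diffs, abs_diff = abs(diffs), rank = ranks))
print(c(manual = V_manual, wilcox.test = V_r))
```

Here the two weight gains receive the two lowest ranks, so nearly all of the total rank sum of 55 belongs to the positive differences.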

Specify the Wilcoxon signed rank test for the analysis of a single variable with the same wilcox.test function that compares two independent samples. In this situation the difference of weight before and after the treatment program is the basis of the test. Again, since this is a standard R function, use the with function to inform R as to the location of the variable for analysis, here in the mydata data frame.

Wilcoxon rank sum test, Section 6.3.3, p. 135

The default value for the confidence level is 95%. The interval must be explicitly requested by setting conf.int=TRUE. To change the confidence level, explicitly change the default value of conf.level=0.95.

conf.level option, Section 6.3.2, p. 134

Also, by default the test is two-tailed to evaluate differences in either direction, regardless of the desired direction. To change to a one-tailed test, change the default value of alternative="two.sided" to either "less" or "greater".

alternative option for a one-tailed test, Section 6.2.2, p. 126
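The effect of the confidence interval options can be seen in a short sketch, again with hypothetical difference scores rather than the book's data. The 90% level is an arbitrary choice for illustration.

```r
# Hypothetical difference scores (Before - After), in pounds
diffs <- c(12, -2, 5, 9, -1, 15, 7, 3, 20, 11)

# Default call: no confidence interval is computed
default_test <- wilcox.test(diffs)
print(is.null(default_test$conf.int))

# Request the interval explicitly; the default level is 95%
ci95 <- wilcox.test(diffs, conf.int = TRUE)$conf.int

# Change the confidence level with conf.level
ci90 <- wilcox.test(diffs, conf.int = TRUE, conf.level = 0.90)$conf.int

print(ci95)
print(ci90)
```

The interval at the lower confidence level is no wider than the 95% interval, as expected.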

The null hypothesis is that there is no systematic difference within the matched data values for the two samples. The alternative hypothesis is that there is a systematic difference. For the default two-tailed test, the difference could be in either direction.

The test is run on the differences between the matched data values, which are calculated implicitly. Just specify the two variables that define the two matched samples, and include the paired=TRUE option, which informs R to analyze the difference between the corresponding data values.

paired option: Set to TRUE to inform R to calculate and analyze a difference.
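That the paired=TRUE option amounts to analyzing the difference scores can be verified with a small sketch. The weights are made up for illustration (not the WeightLoss.csv data), and the variables are plain vectors, so the with function is not needed here.

```r
# Hypothetical weights in pounds, made up for illustration only
before <- c(210, 185, 165, 240, 198, 176, 225, 190, 204, 172)
after  <- c(198, 187, 160, 231, 199, 161, 218, 187, 184, 161)

# Two matched samples with the differences calculated implicitly
implicit <- wilcox.test(before, after, paired = TRUE)

# A single variable of explicitly calculated difference scores
explicit <- wilcox.test(before - after)

print(c(V_implicit = unname(implicit$statistic),
        V_explicit = unname(explicit$statistic)))
print(c(p_implicit = implicit$p.value, p_explicit = explicit$p.value))
```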


R Input  Wilcoxon signed rank test with implied Difference score

> with(mydata, wilcox.test(Before, After, paired=TRUE,
                           conf.int=TRUE))

The output is given in Listing 6.22 .

Wilcoxon signed rank test

data: Difference
V = 53, p-value = 0.005859
alternative hypothesis: true location is not equal to 0
95 percent confidence interval:
 3 14
sample estimates:
(pseudo)median

Listing 6.22 Output from the R function wilcox.test for analysis of a single variable.

From the output, reject the null hypothesis of no difference of Weight in the Before and After samples.

Weight Loss Effect: p-value < α = 0.05

The sample estimate is of the (pseudo) median, a concept similar to, but not exactly equal to, the actual sample median. The 95% confidence interval of the (pseudo) median ranges from 3 to 14 pounds. These values are close to the lower and upper bounds of the confidence interval of the true mean of the difference scores, which are 2.71 pounds and 13.69 pounds.
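One common definition of the (pseudo) median, assumed in this sketch, is the median of all pairwise averages of the data values, each value also paired with itself (the Walsh averages). The code below computes it by hand for hypothetical difference scores and compares it with the estimate from wilcox.test and with the ordinary sample median.

```r
# Hypothetical difference scores (Before - After), in pounds
diffs <- c(12, -2, 5, 9, -1, 15, 7, 3, 20, 11)

# All pairwise averages (x_i + x_j) / 2 for i <= j
walsh <- outer(diffs, diffs, "+") / 2
walsh <- walsh[upper.tri(walsh, diag = TRUE)]

# (Pseudo) median computed by hand
pseudo_manual <- median(walsh)

# (Pseudo) median reported by wilcox.test when the interval is requested
wt <- wilcox.test(diffs, conf.int = TRUE)
pseudo_r <- unname(wt$estimate)

print(c(manual = pseudo_manual, wilcox.test = pseudo_r,
        sample_median = median(diffs)))
```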

These results follow from the two-tailed test, in which the mean of the differences in either the positive or the negative direction is interpreted. Some researchers prefer a one-tailed test when an outcome is predicted in a specified direction. For this example, a positive mean of the differences indicates a successful weight loss program. Carry out the one-tailed test to predict a positive difference by adding alternative="greater" to the function call to wilcox.test.

one-tailed alternative option, Section 6.2.2, p. 126

R Input  Wilcoxon signed rank test for a one-tailed test

> with(mydata, wilcox.test(Before, After, paired=TRUE,
                           conf.int=TRUE, alternative="greater"))

The result of this one-tailed analysis is given in Listing 6.23 .

The p-value of the one-tailed test is exactly half the size compared to that of the two-tailed test, and the confidence interval is one-sided. The smaller p-value makes the test more powerful in finding a difference from the null value that is real, because the one-tailed test might achieve a p-value less than 0.05 when the two-tailed version might not. That gain in efficiency is offset by not being able to conclude the opposite if the result is the opposite of what was intended.

one-tailed test results: The p-value is half the size of the two-tailed test, and the confidence interval is one-sided.


Wilcoxon signed rank test

data: Difference
V = 53, p-value = 0.00293
alternative hypothesis: true location is greater than 0
95 percent confidence interval:
 3.5 Inf
sample estimates:
(pseudo)median

Listing 6.23 Directional, one-tailed Wilcoxon signed rank test.

Moreover, the sample estimate, the (pseudo) median, could be an underestimate or an overestimate of the true value, a result more consistent with the concept of the standard confidence interval with both an upper and a lower bound.
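The relation between the one-tailed and two-tailed results can be checked with a brief sketch on hypothetical difference scores. The halving of the p-value shown here holds when the sample result lies in the predicted direction, as it does for these made-up values.

```r
# Hypothetical difference scores (Before - After), in pounds
diffs <- c(12, -2, 5, 9, -1, 15, 7, 3, 20, 11)

# Two-tailed test, the default
two_tailed <- wilcox.test(diffs, conf.int = TRUE)

# One-tailed test that predicts a positive difference
one_tailed <- wilcox.test(diffs, conf.int = TRUE, alternative = "greater")

# The one-tailed p-value is half of the two-tailed p-value
print(c(two_tailed = two_tailed$p.value, one_tailed = one_tailed$p.value))

# The two-tailed interval has two finite bounds; the one-tailed
# interval has only a lower bound, with Inf as its upper bound
print(two_tailed$conf.int)
print(one_tailed$conf.int)
```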

The conclusion from both the parametric t-test and the non-parametric wilcox.test dependent-samples analyses is that the weight loss program does appear to facilitate weight loss, likely somewhere between about 3 and 14 pounds.

Worked Problems

Answer the following questions in terms of the hypothesis test, confidence interval, and effect size.

1 Consider the Employee data set, available from within the lessR package.

> mydata <- Read("Employee", format="lessR")

?Employee for more information.

Two of the variables in this data set are Salary and Gender for employees at a specific company.

(a) Is it reasonable that the true mean Salary is $75,000?

(b) Are mean Salaries the same for men and women at this company? Answer with a parametric and a non-parametric procedure.

2 Consider the Cars93 data set, available from within the lessR package.

> mydata <- Read("Cars93", format="lessR")

?Cars93 for more information.

Some of the variables in this data set are MPGcity, MPGhiway, and Manual, a binary variable with a value of 1 for a manual transmission and a 0 for an automatic.

(a) Is it reasonable that the true mean city MPG is 22?

(b) Is city fuel mileage the same, on average, for cars with manual transmissions and cars with automatic transmissions? Answer with a parametric and a non-parametric procedure.

(c) Is there a difference in fuel mileage for city and highway driving? Answer with a parametric and a non-parametric procedure.
