Analysis of Paired Data

9.3 Analysis of Paired Data

In Sections 9.1 and 9.2, we considered making an inference about a difference between two means $\mu_1$ and $\mu_2$. This was done by utilizing the results of a random sample $X_1, X_2, \ldots, X_m$ from the distribution with mean $\mu_1$ and a completely independent (of the X's) sample $Y_1, \ldots, Y_n$ from the distribution with mean $\mu_2$. That is, either m individuals were selected from population 1 and n different individuals from population 2, or m individuals (or experimental objects) were given one treatment and another set of n individuals were given the other treatment. In contrast, there are a number of experimental situations in which there is only one set of n individuals or experimental objects; making two observations on each one results in a natural pairing of values.

Example 9.8 Trace metals in drinking water affect the flavor, and unusually high concentrations can pose a health hazard. The article “Trace Metals of South Indian River” (Envir. Studies, 1982: 62–66) reported on a study in which six river locations were selected (six experimental objects) and the zinc concentration (mg/L) determined for both surface water and bottom water at each location. The six pairs of observations are displayed in the accompanying table. Does the data suggest that true average concentration in bottom water exceeds that of surface water?

Location                                    1      2      3      4      5      6

Zinc concentration in bottom water (x)    .430   .266   .567   .531   .707   .716

Zinc concentration in surface water (y)   .415   .238   .390   .410   .605   .609

Difference (x − y)                        .015   .028   .177   .121   .102   .107


Figure 9.4(a) displays a plot of this data. At first glance, there appears to be little difference between the x and y samples. From location to location, there is a great deal of variability in each sample, and it looks as though any differences between the samples can be attributed to this variability. However, when the observations are identified by location, as in Figure 9.4(b), a different view emerges. At each location, bottom concentration exceeds surface concentration. This is confirmed by the fact that all x − y differences displayed in the bottom row of the data table are positive. A correct analysis of this data focuses on these differences.

Figure 9.4 Plot of paired data from Example 9.8: (a) observations not identified by location; (b) observations identified by location
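The within-pair differences for Example 9.8 can be checked with a few lines of Python. This is only a sketch: the variable names are chosen here for illustration, and the data values are copied from the table in the example.

```python
# Zinc concentrations (mg/L) at the six locations in Example 9.8.
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]   # x
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]  # y

# Within-pair differences d_i = x_i - y_i, rounded to the table's 3 decimals.
differences = [round(x - y, 3) for x, y in zip(bottom, surface)]

# True only if bottom exceeds surface at every location.
all_positive = all(d > 0 for d in differences)

print(differences)
print("every difference positive:", all_positive)
```

Pairing the values with `zip` keeps each bottom reading matched to the surface reading from the same location, which is exactly the structure the unpaired plot ignores.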

Assumptions: The data consists of n independently selected pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$. Let $D_1 = X_1 - Y_1$, $D_2 = X_2 - Y_2$, …, $D_n = X_n - Y_n$, so the $D_i$'s are the differences within pairs. The $D_i$'s are assumed to be normally distributed with mean value $\mu_D$ and variance $\sigma_D^2$ (this is usually a consequence of the $X_i$'s and $Y_i$'s themselves being normally distributed).

We are again interested in making an inference about the difference $\mu_1 - \mu_2$.

The two-sample t confidence interval and test statistic were obtained by assuming independent samples and applying the rule $V(\bar{X} - \bar{Y}) = V(\bar{X}) + V(\bar{Y})$. However, with paired data, the X and Y observations within each pair are often not independent. Then $\bar{X}$ and $\bar{Y}$ are not independent of one another. We must therefore abandon the two-sample t procedures and look for an alternative method of analysis.
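A short derivation may help show why dependence within pairs matters. The symbols $\sigma_1^2$, $\sigma_2^2$, and $\rho$ (the variances of the first and second observations and the correlation within a pair) are notation introduced here for illustration, and the sketch assumes every pair shares the same variances and correlation.

```latex
% General rule for the variance of a difference of sample means:
V(\bar{X} - \bar{Y}) = V(\bar{X}) + V(\bar{Y}) - 2\,\mathrm{Cov}(\bar{X}, \bar{Y})

% With n independent pairs, Cov(\bar{X}, \bar{Y}) = \rho\,\sigma_1\sigma_2 / n, so
V(\bar{X} - \bar{Y}) = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{n}
```

The covariance term vanishes only when $\rho = 0$. When the two observations in a pair are positively correlated, the independent-samples rule overstates the variance of the difference.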

The Paired t Test

Because different pairs are independent, the $D_i$'s are independent of one another. Let $D = X - Y$, where X and Y are the first and second observations, respectively, within an arbitrary pair. Then the expected difference is

$\mu_D = E(X - Y) = E(X) - E(Y) = \mu_1 - \mu_2$

(the rule of expected values used here is valid even when X and Y are dependent).

Thus any hypothesis about $\mu_1 - \mu_2$ can be phrased as a hypothesis about the mean difference $\mu_D$. But since the $D_i$'s constitute a normal random sample (of differences) with mean $\mu_D$, hypotheses about $\mu_D$ can be tested using a one-sample t test. That is, to test hypotheses about $\mu_1 - \mu_2$ when data is paired, form the differences $D_1, D_2, \ldots, D_n$ and carry out a one-sample t test (based on n − 1 df) on these differences.

384 Chapter 9 Inferences Based on two Samples

The Paired t Test

Null hypothesis: $H_0: \mu_D = \Delta_0$ (where $D = X - Y$ is the difference between the first and second observations within a pair, and $\mu_D = \mu_1 - \mu_2$)

Test statistic value: $t = \dfrac{\bar{d} - \Delta_0}{s_D/\sqrt{n}}$ (where $\bar{d}$ and $s_D$ are the sample mean and standard deviation, respectively, of the $d_i$'s)

Alternative Hypothesis          P-Value Determination

$H_a: \mu_D > \Delta_0$         Area under the $t_{n-1}$ curve to the right of t

$H_a: \mu_D < \Delta_0$         Area under the $t_{n-1}$ curve to the left of t

$H_a: \mu_D \neq \Delta_0$      2 · (Area under the $t_{n-1}$ curve to the right of |t|)

Assumptions: The $D_i$'s constitute a random sample from a normal “difference” population.
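The test statistic in the box can be sketched as a small Python function. The function name `paired_t_statistic` and its parameter names are hypothetical, chosen here for illustration; only the standard library is used. As a usage example, it is applied to the zinc data of Example 9.8 with $\Delta_0 = 0$.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y, delta0=0.0):
    """Return (t, df) for the paired t test of H0: mu_D = delta0.

    x and y hold the first and second observations of each pair,
    in matching order.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    d = [xi - yi for xi, yi in zip(x, y)]   # within-pair differences
    n = len(d)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    d_bar = mean(d)                          # sample mean of the differences
    s_d = stdev(d)                           # sample standard deviation (n - 1 divisor)
    t = (d_bar - delta0) / (s_d / math.sqrt(n))
    return t, n - 1


# Usage: zinc concentrations from Example 9.8 (bottom = x, surface = y).
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]
t_zinc, df_zinc = paired_t_statistic(bottom, surface)
print(f"t = {t_zinc:.2f} on {df_zinc} df")
```

The function returns only the statistic and its degrees of freedom; the P-value is then read from the $t_{n-1}$ curve according to the alternative hypothesis, as in the box.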

Example 9.9 Musculoskeletal neck-and-shoulder disorders are all too common among office staff who perform repetitive tasks using visual display units. The article “Upper-Arm Elevation During Office Work” (Ergonomics, 1996: 1221–1230) reported on a study to determine whether more varied work conditions would have any impact on arm movement. The accompanying data was obtained from a sample of n = 16 subjects. Each observation is the amount of time, expressed as a proportion of total time observed, during which arm elevation was below 30°. The two measurements from each subject were obtained 18 months apart. During this period, work conditions were changed, and subjects were allowed to engage in a wider variety of work tasks. Does the data suggest that true average time during which elevation is below 30° differs after the change from what it was before the change?

Subject        1    2    3    4    5    6    7    8

Before        81   87   86   82   90   86   96   73

After         78   91   78   78   84   67   92   70

Difference     3   −4    8    4    6   19    4    3

Subject        9   10   11   12   13   14   15   16

Before        74   75   72   80   66   72   56   82

After         58   62   70   58   66   60   65   73

Difference    16   13    2   22    0   12   −9    9

Figure 9.5 shows a normal probability plot of the 16 differences; the pattern in the plot is quite straight, supporting the normality assumption. A boxplot of these differences appears in Figure 9.6; the boxplot is located considerably to the right of zero, suggesting that perhaps $\mu_D > 0$ (note also that 13 of the 16 differences are positive and only two are negative). Let’s now test the appropriate hypotheses.

1. Let $\mu_D$ denote the true average difference between elevation time before the change in work conditions and time after the change.

2. $H_0: \mu_D = 0$ (there is no difference between true average time before the change and true average time after the change)

3. $H_a: \mu_D \neq 0$



Figure 9.5 A normal probability plot from Minitab of the differences in Example 9.9

Figure 9.6 A boxplot of the differences in Example 9.9

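4. $t = \dfrac{\bar{d} - 0}{s_D/\sqrt{n}} = \dfrac{\bar{d}}{s_D/\sqrt{n}}$

5. $n = 16$, $\sum d_i = 108$, and $\sum d_i^2 = 1746$, from which $\bar{d} = 6.75$, $s_D = 8.234$, and

$t = \dfrac{6.75}{8.234/\sqrt{16}} = 3.28 \approx 3.3$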

6. Appendix Table A.8 shows that the area to the right of 3.3 under the t curve with 15 df is .002. The inequality in $H_a$ implies that a two-tailed test is appropriate, so the P-value is approximately 2(.002) = .004 (Minitab gives .0051).

7. Since .004 < .01, the null hypothesis can be rejected at either significance level .05 or .01. It does appear that the true average difference between times is something other than zero; that is, true average time after the change is different from that before the change.
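The arithmetic in Example 9.9 can be reproduced with the standard library alone. The sketch below is illustrative only: the helper names `t_pdf` and `upper_tail_area` are hypothetical, and the tail area is approximated by Simpson's rule applied to the t density rather than looked up in Appendix Table A.8, so it is a stand-in for the table and for Minitab, not a description of either.

```python
import math

# Data from Example 9.9 (subjects 1 through 16).
before = [81, 87, 86, 82, 90, 86, 96, 73, 74, 75, 72, 80, 66, 72, 56, 82]
after = [78, 91, 78, 78, 84, 67, 92, 70, 58, 62, 70, 58, 66, 60, 65, 73]

d = [b - a for b, a in zip(before, after)]   # before minus after
n = len(d)
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)

d_bar = sum_d / n
# Computing formula for the sample standard deviation of the differences.
s_d = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))
t_stat = d_bar / (s_d / math.sqrt(n))


def t_pdf(u, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)


def upper_tail_area(t, df, steps=2000):
    """Approximate area to the right of t >= 0 under the t curve.

    Uses symmetry: 0.5 minus the Simpson's-rule integral from 0 to t.
    steps must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3


# Two-tailed P-value, matching the alternative H_a: mu_D != 0.
p_value = 2 * upper_tail_area(abs(t_stat), n - 1)

print(f"sum d = {sum_d}, sum d^2 = {sum_d2}")
print(f"d_bar = {d_bar:.2f}, s_D = {s_d:.3f}, t = {t_stat:.2f}")
print(f"two-tailed P-value = {p_value:.4f}")
```

Because the code uses the unrounded statistic instead of 3.3, its P-value should fall near the Minitab figure quoted in step 6 and not at the table-based approximation.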

When the number of pairs is large, the assumption of a normal difference distribution is not necessary. The CLT validates the resulting z test.

The Paired t Confidence Interval

In the same way that the t CI for a single population mean $\mu$ is based on the t variable $T = (\bar{X} - \mu)/(S/\sqrt{n})$, a t confidence interval for $\mu_D$ ($= \mu_1 - \mu_2$) is based on the fact that

$T = \dfrac{\bar{D} - \mu_D}{S_D/\sqrt{n}}$

has a t distribution with n − 1 df. Manipulation of this t variable, as in previous derivations of CI’s, yields the following 100(1 − α)% CI:


The paired t CI for $\mu_D$ is

$\bar{d} \pm t_{\alpha/2,\,n-1} \cdot s_D/\sqrt{n}$

A one-sided confidence bound results from retaining the relevant sign and replacing $t_{\alpha/2}$ by $t_{\alpha}$.

When n is small, the validity of this interval requires that the distribution of differences be at least approximately normal. For large n, the CLT ensures that the resulting z interval is valid without any restrictions on the distribution of differences.
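As an illustration of the interval formula, the sketch below wraps it in a small function and applies it to the summary statistics of Example 9.9 ($\bar{d} = 6.75$, $s_D = 8.234$, n = 16). The function name `paired_t_interval` is hypothetical, and the critical value $t_{.025,15} = 2.131$ is taken from a standard t table; it is an input supplied here, not something the text computes.

```python
import math


def paired_t_interval(d_bar, s_d, n, t_crit):
    """Two-sided CI for mu_D: d_bar +/- t_crit * s_d / sqrt(n).

    t_crit is the critical value t_{alpha/2, n-1}, supplied by the caller.
    """
    margin = t_crit * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# 95% interval for the arm-elevation differences of Example 9.9.
lower, upper = paired_t_interval(d_bar=6.75, s_d=8.234, n=16, t_crit=2.131)
print(f"95% CI for mu_D: ({lower:.2f}, {upper:.2f})")
```

Passing the critical value as an argument keeps the function free of any table or library dependency; a one-sided bound would use $t_{\alpha}$ in place of $t_{\alpha/2}$ and keep only one endpoint.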

Example 9.10 Magnetic resonance imaging is a commonly used noninvasive technique for assessing the extent of cartilage damage. However, there is concern that the MRI sizing of articular cartilage defects may not be accurate. The article “Preoperative MRI Underestimates Articular Cartilage Defect Size Compared with Findings at Arthroscopic Knee Surgery” (Amer. J. of Sports Med., 2013: 590–595) reported on a study involving a sample of 92 cartilage defects. For each one, the size of the lesion area was determined by an MRI analysis and also during arthroscopic surgery. Each MRI value was then subtracted from the corresponding arthroscopic value to obtain a difference value. The sample mean difference was calculated to be 1.04 cm², with a sample standard deviation of 1.67. Let’s now calculate a confidence interval using a confidence level of (at least approximately) 95% for $\mu_D$, the mean difference for the population of all such defects (as did the authors of the cited article). Because n is quite large here, we use the z critical value $z_{.025} = 1.96$ (an entry at the very bottom of our t table). The resulting CI is

$1.04 \pm (1.96)\dfrac{1.67}{\sqrt{92}} = 1.04 \pm .34 = (.70, 1.38)$

At the 95% confidence level, we believe that $.70 < \mu_D < 1.38$. Perhaps the most interesting aspect of this interval is that 0 is not included; only certain positive values of $\mu_D$ are plausible. It is this fact that led the investigators to conclude that MRIs tend to underestimate defect size.
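The large-sample interval of Example 9.10 can be checked numerically. In this sketch the z critical value is obtained from `statistics.NormalDist` in the Python standard library instead of from the bottom row of the t table; the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

# Summary statistics reported in Example 9.10.
n = 92
d_bar = 1.04      # sample mean difference (arthroscopic minus MRI), cm^2
s_d = 1.67        # sample standard deviation of the differences

confidence = 0.95
alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96

margin = z_crit * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

# The interval excludes 0 exactly when both endpoints share a sign.
excludes_zero = lower > 0 or upper < 0

print(f"z critical value = {z_crit:.2f}")
print(f"95% CI for mu_D: ({lower:.2f}, {upper:.2f})")
print("zero excluded:", excludes_zero)
```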
