9.2 The Two-Sample t Test and Confidence Interval
Values of the population variances will usually not be known to an investigator. In the previous section, we illustrated for large sample sizes the use of a z test and CI in which the sample variances were used in place of the population variances. In fact, for large samples, the CLT allows us to use these methods even when the two populations of interest are not normal.
In practice, though, it will often happen that at least one sample size is small and the population variances have unknown values. Without the CLT at our disposal, we proceed by making specific assumptions about the underlying population distributions. The use of inferential procedures that follow from these assumptions is then restricted to situations in which the assumptions are at least approximately satisfied. We could, for example, assume that both population distributions are members of the Weibull family or that they are both Poisson distributions. It shouldn’t surprise you to learn that normality is often the most reasonable assumption.
Assumptions
Both population distributions are normal, so that $X_1, X_2, \ldots, X_m$ is a random sample from a normal distribution and so is $Y_1, \ldots, Y_n$ (with the X’s and Y’s independent of one another). The plausibility of these assumptions can be judged by constructing a normal probability plot of the $x_i$’s and another of the $y_i$’s.
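As a rough illustration of how the coordinates of a normal probability plot might be computed, the sketch below pairs each ordered observation with a standard normal percentile. The plotting positions (i − .5)/n are one common convention and are an assumption here, as are the function name `normal_plot_points` and the demonstration data, none of which come from this section.

```python
from statistics import NormalDist

def normal_plot_points(sample):
    """Pair each ordered observation with a standard normal percentile.

    Assumes the (i - 0.5)/n plotting positions; other conventions exist.
    """
    n = len(sample)
    ordered = sorted(sample)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, ordered))

# Hypothetical measurements used only to show the output format.
demo = [4.1, 5.3, 3.8, 4.9, 5.0, 4.4]
for z_score, value in normal_plot_points(demo):
    print(f"{z_score:7.3f}  {value}")
```

Plotting the second column against the first gives the normal probability plot; a roughly linear pattern supports the normality assumption.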
The test statistic and confidence interval formula are based on the same standardized variable developed in Section 9.1, but the relevant distribution is now t rather than z.
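Theorem
When the population distributions are both normal, the standardized variable
$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$
has approximately a t distribution with df $\nu$ estimated from the data by
$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}} = \frac{[(se_1)^2 + (se_2)^2]^2}{\dfrac{(se_1)^4}{m-1} + \dfrac{(se_2)^4}{n-1}}$$
where
$$se_1 = \frac{s_1}{\sqrt{m}}, \qquad se_2 = \frac{s_2}{\sqrt{n}}$$
(round $\nu$ down to the nearest integer).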
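The df formula is easy to get wrong by hand, so a minimal sketch of the calculation may help. The function name `welch_df` and the summary statistics in the usage lines are hypothetical and chosen only for illustration.

```python
import math

def welch_df(s1, m, s2, n):
    """Return (nu, nu rounded down) for the two-sample t procedures."""
    se1_sq = s1**2 / m          # (se_1)^2 = s_1^2 / m
    se2_sq = s2**2 / n          # (se_2)^2 = s_2^2 / n
    nu = (se1_sq + se2_sq) ** 2 / (
        se1_sq**2 / (m - 1) + se2_sq**2 / (n - 1)
    )
    return nu, math.floor(nu)

# Hypothetical summary statistics, not taken from the text.
nu, df = welch_df(s1=2.0, m=12, s2=5.0, n=9)
print(f"estimated nu = {nu:.2f}, df used = {df}")
```

Returning both the unrounded value and the rounded-down integer makes it easy to see how much the rounding step changes the df.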
Manipulating T in a probability statement to isolate $\mu_1 - \mu_2$ gives a CI, whereas a test statistic results from replacing $\mu_1 - \mu_2$ by the null value $\Delta_0$.
The two-sample t confidence interval for $\mu_1 - \mu_2$ with confidence level $100(1 - \alpha)\%$ is then
$$\bar{x} - \bar{y} \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
A one-sided confidence bound can be calculated as described earlier.
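The interval formula can be sketched in a few lines. In this sketch the t critical value is passed in as an argument, on the assumption that it has been read from a t table for the appropriate df; the function name `two_sample_t_interval` and the sample statistics in the usage lines are hypothetical.

```python
import math

def two_sample_t_interval(xbar, s1, m, ybar, s2, n, t_crit):
    """Two-sided CI for mu_1 - mu_2.

    t_crit is the tabled value t_{alpha/2, nu}, supplied by the caller.
    """
    se = math.sqrt(s1**2 / m + s2**2 / n)   # estimated standard error
    center = xbar - ybar
    return center - t_crit * se, center + t_crit * se

# Hypothetical summary statistics; 2.262 is t_{.025,9}.
lo, hi = two_sample_t_interval(20.0, 2.0, 12, 17.5, 5.0, 9, t_crit=2.262)
print(f"({lo:.2f}, {hi:.2f})")
```

For a one-sided bound, the same function body applies with the tabled value for the full tail area in place of half of it, keeping only the relevant endpoint.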
The two-sample t test for testing $H_0: \mu_1 - \mu_2 = \Delta_0$ is as follows:

Test statistic value:
$$t = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$

Alternative Hypothesis               P-Value Determination
$H_a: \mu_1 - \mu_2 > \Delta_0$      Area under the $t_\nu$ curve to the right of t
$H_a: \mu_1 - \mu_2 < \Delta_0$      Area under the $t_\nu$ curve to the left of t
$H_a: \mu_1 - \mu_2 \ne \Delta_0$    2 ∙ (Area under the $t_\nu$ curve to the right of |t|)

Assumptions: Both population distributions are normal, and the two random samples are selected independently of one another.
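The three P-value rules can be sketched without a statistics library by integrating the t density numerically. This is an illustrative approximation only (composite Simpson's rule with an arbitrarily chosen number of steps), and the function names `t_pdf`, `t_upper_tail`, and `p_value` are hypothetical; a t table or statistical software would normally be used instead.

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_upper_tail(t, nu, steps=2000):
    """Approximate area under the t_nu curve to the right of t."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    upper = 0.5 - total * h / 3        # area to the right of |t|
    return upper if t >= 0 else 1.0 - upper

def p_value(t, nu, alternative):
    """P-value for 'greater', 'less', or 'two-sided' alternatives."""
    if alternative == "greater":
        return t_upper_tail(t, nu)
    if alternative == "less":
        return 1.0 - t_upper_tail(t, nu)
    if alternative == "two-sided":
        return 2.0 * t_upper_tail(abs(t), nu)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(f"{p_value(-1.8, 15, 'less'):.3f}")
```

The symmetry of the t curve is what lets the lower-tailed and two-tailed cases reuse the upper-tail area.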
Example 9.6
The void volume within a textile fabric affects comfort, flammability, and insulation properties. Permeability of a fabric refers to the accessibility of void space to the flow of a gas or liquid. The article “The Relationship Between Porosity and Air Permeability of Woven Textile Fabrics” (J. of Testing and Eval., 1997: 108–114) gave summary information on air permeability (cm³/cm²/sec) for a number of different fabric types. Consider the following data on two different types of plain-weave fabric:
Fabric Type    Sample Size    Sample Mean    Sample Standard Deviation
Cotton         10             51.71          .79
Triacetate     10             136.14         3.59
Assuming that the porosity distributions for both types of fabric are normal, let’s calculate a confidence interval for the difference between true average porosity for the cotton fabric and that for the acetate fabric, using a 95% confidence level. Before the appropriate t critical value can be selected, df must be determined:
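$$\text{df} = \frac{\left(\dfrac{.6241}{10} + \dfrac{12.8881}{10}\right)^2}{\dfrac{(.6241/10)^2}{9} + \dfrac{(12.8881/10)^2}{9}} = \frac{1.8258}{.1850} = 9.87$$
Thus we use $\nu = 9$; Appendix Table A.5 gives $t_{.025,9} = 2.262$. The resulting interval is
$$\bar{x} - \bar{y} \pm (2.262)\sqrt{\frac{.6241}{10} + \frac{12.8881}{10}} = -84.43 \pm 2.63 = (-87.06, -81.80)$$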
With a high degree of confidence, we can say that true average porosity for triacetate fabric specimens exceeds that for cotton specimens by between 81.80 and 87.06 cm³/cm²/sec.
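The arithmetic of this example can be checked with a short script. The sample variances, sample sizes, and critical value are those quoted in the example; the difference of sample means (cotton minus triacetate) is taken here as −84.43, the midpoint of the reported interval. The variable names are illustrative only.

```python
import math

# Quantities quoted in Example 9.6.
var1, var2 = 0.6241, 12.8881      # sample variances s_1^2 and s_2^2
m, n = 10, 10                     # sample sizes
t_crit = 2.262                    # t_{.025,9} from Appendix Table A.5
diff = -84.43                     # xbar - ybar, midpoint of the reported interval

se_sq = var1 / m + var2 / n
nu = se_sq**2 / ((var1 / m) ** 2 / (m - 1) + (var2 / n) ** 2 / (n - 1))
margin = t_crit * math.sqrt(se_sq)
lower, upper = diff - margin, diff + margin

print(f"nu = {nu:.2f}, interval = ({lower:.2f}, {upper:.2f})")
# prints: nu = 9.87, interval = (-87.06, -81.80)
```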
Example 9.7
The deterioration of many municipal pipeline networks across the country is a growing concern. One technology proposed for pipeline rehabilitation uses a flexible liner threaded through existing pipe. The article “Effect of Welding on a High-Density Polyethylene Liner” (J. of Materials in Civil Engr., 1996: 94–100) reported the following data on tensile strength (psi) of liner specimens both when a certain fusion process was used and when this process was not used.
No fusion: 2748 2700 2655 2822 2511 3149 3257 3213 3220 2753
           $m = 10$, $\bar{x} = 2902.8$, $s_1 = 277.3$
Fused:     $n = 8$, $\bar{y} = 3108.1$, $s_2 = 205.9$
Figure 9.2 shows normal probability plots from Minitab. The linear pattern in each plot supports the assumption that the tensile strength distributions under the two conditions are both normal.
Figure 9.2 Normal probability plots from Minitab for the tensile strength data (panels: Not fused, Fused)
The authors of the article stated that the fusion process increased the average tensile strength. The message from the comparative boxplot of Figure 9.3 is not all that clear. Let’s carry out a test of hypotheses to see whether the data supports this conclusion.
Figure 9.3 A comparative boxplot of the tensile-strength data
1. Let $\mu_1$ be the true average tensile strength of specimens when the no-fusion treatment is used and $\mu_2$ denote the true average tensile strength when the fusion treatment is used.
2. $H_0: \mu_1 - \mu_2 = 0$ (no difference in the true average tensile strengths for the two treatments)
3. $H_a: \mu_1 - \mu_2 < 0$ (true average tensile strength for the no-fusion treatment is less than that for the fusion treatment, so that the investigators’ conclusion is correct)
4. The null value is $\Delta_0 = 0$, so the test statistic value is
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$
5. We now compute both the test statistic value and df for the test:
$$t = \frac{2902.8 - 3108.1}{\sqrt{\dfrac{(277.3)^2}{10} + \dfrac{(205.9)^2}{8}}} = \frac{-205.3}{113.97} = -1.8$$
Using $s_1^2/m = 7689.529$ and $s_2^2/n = 5299.351$,
$$\nu = \frac{(7689.529 + 5299.351)^2}{(7689.529)^2/9 + (5299.351)^2/7} = \frac{168{,}711{,}003.7}{10{,}581{,}747.35} = 15.94$$
so the test will be based on 15 df.
6. Appendix Table A.8 shows that the area under the 15 df t curve to the right of
1.8 is .046, so the P-value for a lower-tailed test is also .046. The following Minitab output summarizes all the computations:
Two-sample T for nofusion vs fused
              N    Mean   StDev   SE Mean
not fused    10    2903     277        88
fused         8    3108     206        73
95% C.I. for mu nofusion-mu fused: (-488, 38)
t-Test mu not fused = mu fused (vs <): T = -1.80  P = 0.046  DF = 15
7. Using a significance level of .05, we can barely reject the null hypothesis in favor of the alternative hypothesis, confirming the conclusion stated in the article. However, someone demanding more compelling evidence might select $\alpha = .01$, a level for which $H_0$ cannot be rejected.
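The computations in steps 4 and 5 can be reproduced as a check. The sketch below computes the no-fusion mean and standard deviation from the listed observations and uses the fused-sample summary values quoted in the example; the variable names are illustrative. Because the no-fusion standard deviation is not rounded before squaring, intermediate values differ slightly from the hand calculation, but the rounded results agree.

```python
import math
from statistics import mean, stdev

# No-fusion observations as listed in the example.
no_fusion = [2748, 2700, 2655, 2822, 2511, 3149, 3257, 3213, 3220, 2753]
m, xbar, s1 = len(no_fusion), mean(no_fusion), stdev(no_fusion)

# Fused-sample summary values quoted in the example.
n, ybar, s2 = 8, 3108.1, 205.9

v1, v2 = s1**2 / m, s2**2 / n
delta0 = 0                                   # null value
t = (xbar - ybar - delta0) / math.sqrt(v1 + v2)
nu = (v1 + v2) ** 2 / (v1**2 / (m - 1) + v2**2 / (n - 1))

print(f"xbar = {xbar:.1f}, s1 = {s1:.1f}, t = {t:.2f}, df = {math.floor(nu)}")
# prints: xbar = 2902.8, s1 = 277.3, t = -1.80, df = 15
```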
If the question posed had been whether fusing increased true average strength by more than 100 psi, then the relevant hypotheses would have been $H_0: \mu_1 - \mu_2 = -100$ versus $H_a: \mu_1 - \mu_2 < -100$; that is, the null value would have been $\Delta_0 = -100$.
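To see how the null value enters the calculation, the following sketch evaluates the test statistic for both $\Delta_0 = 0$ and $\Delta_0 = -100$ using the summary values of Example 9.7 (the no-fusion mean and standard deviation are those computed from the listed observations). The helper name `t_statistic` is hypothetical, and the second result is offered only as an illustration, not as a conclusion stated in the text.

```python
import math

# Summary values from Example 9.7 (no fusion first, fused second).
xbar, s1, m = 2902.8, 277.3, 10
ybar, s2, n = 3108.1, 205.9, 8

def t_statistic(delta0):
    """Two-sample t statistic for H0: mu_1 - mu_2 = delta0."""
    return (xbar - ybar - delta0) / math.sqrt(s1**2 / m + s2**2 / n)

print(f"Delta_0 = 0:    t = {t_statistic(0):.2f}")
print(f"Delta_0 = -100: t = {t_statistic(-100):.2f}")
```

Only the numerator changes with the null value; the estimated standard error and the df depend on the sample sizes and sample variances alone, so the test would still be based on 15 df.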