Example 12.19 The article “A Study of a Partial Nutrient Removal System for Wastewater Treatment
Plants” (Water Research, 1972: 1389–1397) reports on a method of nitrogen removal that involves the treatment of the supernatant from an aerobic digester. Both the influent total nitrogen x (mg/L) and the percentage y of nitrogen removed were recorded for 20 days, with resulting summary statistics $\sum x_i = 285.90$, $\sum x_i^2 = 4409.55$, $\sum y_i = 690.30$, $\sum y_i^2 = 29,040.29$, and $\sum x_i y_i = 10,818.56$. The sample correlation coefficient between influent nitrogen and percentage nitrogen removed is $r = .733$, giving $v = .935$. With $n = 20$, a 95% confidence interval for $\mu_V$ is
$$\left(.935 - \frac{1.96}{\sqrt{17}},\ .935 + \frac{1.96}{\sqrt{17}}\right) = (.460,\ 1.410) = (c_1, c_2).$$
The 95% interval for $\rho$ is
$$\left(\frac{e^{2(.46)} - 1}{e^{2(.46)} + 1},\ \frac{e^{2(1.41)} - 1}{e^{2(1.41)} + 1}\right) = (.43,\ .89).$$
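The arithmetic of Example 12.19 can be checked numerically. The following Python sketch assumes the Fisher transformation $v = \tfrac{1}{2}\ln[(1+r)/(1-r)]$ with standard deviation $1/\sqrt{n-3}$, as used in the example. The function names `correlation_from_sums` and `fisher_interval` are illustrative and do not come from the text.

```python
import math


def correlation_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Sample correlation r computed from raw sums (illustrative helper)."""
    s_xx = sum_x2 - sum_x ** 2 / n        # corrected sum of squares for x
    s_yy = sum_y2 - sum_y ** 2 / n        # corrected sum of squares for y
    s_xy = sum_xy - sum_x * sum_y / n     # corrected sum of cross products
    return s_xy / math.sqrt(s_xx * s_yy)


def fisher_interval(r, n, z_crit=1.96):
    """Approximate CI for rho via the Fisher transformation (illustrative helper)."""
    v = math.atanh(r)                     # same as 0.5 * ln((1 + r) / (1 - r))
    half_width = z_crit / math.sqrt(n - 3)
    c1, c2 = v - half_width, v + half_width
    # tanh(c) equals (e^{2c} - 1) / (e^{2c} + 1), the back-transformation
    return math.tanh(c1), math.tanh(c2)


# Summary statistics quoted in Example 12.19
r = correlation_from_sums(20, 285.90, 4409.55, 690.30, 29040.29, 10818.56)
lower, upper = fisher_interval(r, 20)
print(round(r, 3), round(lower, 2), round(upper, 2))
```

Up to rounding, the printed values should agree with the example: r ≈ .733 and an interval of roughly (.43, .89).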
In Chapter 5, we cautioned that a large value of the correlation coefficient (near 1 or $-1$) implies only association and not causation. This applies to both $\rho$ and $r$.
EXERCISES Section 12.5 (57–67)
57. The article “Behavioural Effects of Mobile Telephone Use During Simulated Driving” (Ergonomics, 1995: 2536–2562) reported that for a sample of 20 experimental subjects, the sample correlation coefficient for x = age and y = time since the subject had acquired a driving license (yr) was .97. Why do you think the value of r is so close to 1? (The article’s authors give an explanation.)
58. The Turbine Oil Oxidation Test (TOST) and the Rotating Bomb Oxidation Test (RBOT) are two different procedures for evaluating the oxidation stability of steam turbine oils. The article “Dependence of Oxidation Stability of Steam Turbine Oil on Base Oil Composition” (J. of the Society of Tribologists and Lubrication Engrs., Oct. 1997: 19–24) reported the accompanying observations on x = TOST time (hr) and y = RBOT time (min) for 12 oil specimens.
a. Calculate and interpret the value of the sample correlation coefficient (as do the article’s authors).
b. How would the value of r be affected if we had let x = RBOT time and y = TOST time?
c. How would the value of r be affected if RBOT time were expressed in hours?
d. Construct normal probability plots and comment.
e. Carry out a test of hypotheses to decide whether RBOT time and TOST time are linearly related.
59. Toughness and fibrousness of asparagus are major determinants of quality. This was the focus of a study reported in “Post-Harvest Glyphosphate Application Reduces Toughening, Fiber Content, and Lignification of Stored Asparagus Spears” (J. of the Amer. Soc. of Hort. Science, 1988: 569–572). The article reported the accompanying data (read from a graph) on x = shear force (kg) and y = percent fiber dry weight.
[Table of x and y values]
$n = 18$, $\sum x_i = 1950$, $\sum x_i^2 = 251,970$, $\sum y_i = 47.92$, $\sum y_i^2 = 130.6074$, $\sum x_i y_i = 5530.92$
a. Calculate the value of the sample correlation coefficient. Based on this value, how would you describe the nature of the relationship between the two variables?
b. If a first specimen has a larger value of shear force than does a second specimen, what tends to be true of percent dry fiber weight for the two specimens?
c. If shear force is expressed in pounds, what happens to the value of r? Why?
d. If the simple linear regression model were fit to this data, what proportion of observed variation in percent fiber dry weight could be explained by the model relationship?
e. Carry out a test at significance level .01 to decide whether there is a positive linear association between the two variables.
60. Head movement evaluations are important because individuals, especially those who are disabled, may be able to operate communications aids in this manner. The article “Constancy
of Head Turning Recorded in Healthy Young Humans” (J. of Biomed. Engr., 2008: 428–436) reported data on ranges in maximum inclination angles of the head in the clockwise anterior, posterior, right, and left directions for 14 randomly selected subjects. Consider the accompanying data on average anterior maximum inclination angle (AMIA) both in the clockwise direction and in the counterclockwise direction.
Co: 44.2 52.1 60.2 52.7 47.2 65.6 71.4
Co: 48.8 53.1 66.3 59.8 47.5 64.5 34.5
a. Calculate a point estimate of the population correlation coefficient between Cl AMIA and Co AMIA ($\sum \text{Cl} = 786.7$, $\sum \text{Co} = 767.9$, $\sum \text{Cl}^2 = 45,727.31$, $\sum \text{Co}^2 = 43,478.07$, $\sum \text{Cl}\,\text{Co} = 44,187.87$).
b. Assuming bivariate normality (normal probability plots of the Cl and Co samples are reasonably straight), carry out a test at significance level .01 to decide whether there is a linear association between the two variables in the population (as do the authors of the cited paper). Would the conclusion have been the same if a significance level of .001 had been used?
61. The authors of the paper “Objective Effects of a Six Months’ Endurance and Strength Training Program in Outpatients with Congestive Heart Failure” (Medicine and Science in Sports and Exercise, 1999: 1102–1107) presented a correlation analysis to investigate the relationship between maximal lactate level x and muscular endurance y. The accompanying data was read from a plot in the paper.
y: 3.80 4.00 4.90 5.20 4.00 3.50 6.30
y: 6.88 7.55 4.95 7.80 4.45 6.60 8.90
$S_{xx} = 36.9839$, $S_{yy} = 2,628,930.357$, $S_{xy} = 7377.704$. A scatter plot shows a linear pattern.
a. Test to see whether there is a positive correlation between maximal lactate level and muscular endurance in the population from which this data was selected.
b. If a regression analysis were to be carried out to predict endurance from lactate level, what proportion of observed variation in endurance could be attributed to the approximate linear relationship? Answer the analogous question if regression is used to predict lactate level from endurance, and answer both questions without doing any regression calculations.
62. Hydrogen content is conjectured to be an important factor in porosity of aluminum alloy castings. The article “The Reduced Pressure Test as a Measuring Tool in the Evaluation of Porosity/Hydrogen Content in Al–7 Wt Pct Si-10 Vol Pct SiC(p) Metal Matrix Composite” (Metallurgical Trans., 1993: 1857–1868) gives the accompanying data on x = hydrogen content and y = gas porosity for one particular measurement technique.
[Table of x and y values]
Minitab gives the following output in response to a Correlation command:
Correlation of Hydrcon and Porosity = 0.449
a. Test at level .05 to see whether the population correlation coefficient differs from 0.
b. If a simple linear regression analysis had been carried out, what percentage of observed variation in porosity could be attributed to the model relationship?
63. Physical properties of six flame-retardant fabric samples were investigated in the article “Sensory and Physical Properties of Inherently Flame-Retardant Fabrics” (Textile Research, 1984: 61–68). Use the accompanying data and a .05 significance level to determine whether a linear relationship exists between stiffness x (mg-cm) and thickness y (mm). Is the result of the test surprising in light of the value of r?
[Table of x and y values]
64. The article “Increases in Steroid Binding Globulins Induced by Tamoxifen in Patients with Carcinoma of the Breast” (J. of Endocrinology, 1978: 219–226) reports data on the effects of the drug tamoxifen on change in the level of cortisol-binding globulin (CBG) of patients during treatment. With age = x and CBG = y, summary values are $n = 26$, $\sum x_i = 1613$, $\sum (x_i - \bar{x})^2 = 3756.96$, $\sum y_i = 281.9$, $\sum (y_i - \bar{y})^2 = 465.34$, and $\sum x_i y_i = 16,731$.
a. Compute a 90% CI for the true correlation coefficient $\rho$.
b. Test $H_0\colon \rho = -.5$ versus $H_a\colon \rho < -.5$ at level .05.
c. In a regression analysis of y on x, what proportion of variation in change of cortisol-binding globulin level could be explained by variation in patient age within the sample?
d. If you decide to perform a regression analysis with age as the dependent variable, what proportion of variation in age is explainable by variation in CBG?
65. Torsion during hip external rotation and extension may explain why acetabular labral tears occur in professional athletes. The article “Hip Rotational Velocities During the Full Golf Swing” (J. of Sports Science and Med., 2009: 296–299)
reported on an investigation in which lead hip internal peak rotational velocity (x) and trailing hip peak external rotational velocity (y) were determined for a sample of 15 golfers. Data provided by the article’s authors was used to calculate the following summary quantities:
$$\sum (x_i - \bar{x})^2 = 64,732.83, \quad \sum (y_i - \bar{y})^2 = 130,566.96, \quad \sum (x_i - \bar{x})(y_i - \bar{y}) = 44,185.87$$
Separate normal probability plots showed very substantial linear patterns.
a. Calculate a point estimate for the population correlation coefficient.
b. Carry out a test at significance level .01 to decide whether there is a linear relationship between the two velocities in the sampled population; your conclusion should be based on a P-value.
c. Would the conclusion of (b) have changed if you had tested appropriate hypotheses to decide whether there is a positive linear association in the population? What if a significance level of .05 rather than .01 had been used?
66. Consider a time series (that is, a sequence of observations $X_1, X_2, \ldots$ obtained over time) with observed values $x_1, x_2, \ldots, x_n$. Suppose that the series shows no upward or downward trend over time. An investigator will frequently want to know just how strongly values in the series separated by a specified number of time units are related. The lag-one sample autocorrelation coefficient $r_1$ is just the value of the sample correlation coefficient r for the pairs $(x_1, x_2), (x_2, x_3), \ldots, (x_{n-1}, x_n)$, that is, pairs of values separated by one time unit. Similarly, the lag-two sample autocorrelation coefficient $r_2$ is r for the $n - 2$ pairs $(x_1, x_3), (x_2, x_4), \ldots, (x_{n-2}, x_n)$.
a. Calculate the values of $r_1$, $r_2$, and $r_3$ for the temperature data from Exercise 82 of Chapter 1, and comment.
b. Analogous to the population correlation coefficient $\rho$, let $\rho_1, \rho_2, \ldots$ denote the theoretical or long-run autocorrelation coefficients at the various lags. If all these $\rho$’s are 0, there is no (linear) relationship at any lag. In this case, if n is large, each $R_i$ has approximately a normal distribution with mean 0 and standard deviation $1/\sqrt{n}$, and different $R_i$’s are almost independent. Thus $H_0\colon \rho_i = 0$ can be rejected at a significance level of approximately .05 if either $r_i \ge 2/\sqrt{n}$ or $r_i \le -2/\sqrt{n}$. If $n = 100$ and $r_1 = .16$, $r_2 = -.09$, and $r_3 = -.15$, is there any evidence of theoretical autocorrelation at the first three lags?
c. If you are simultaneously testing the null hypothesis in part (b) for more than one lag, why might you want to increase the cutoff constant 2 in the rejection region?
67. A sample of $n = 500$ (x, y) pairs was collected and a test of $H_0\colon \rho = 0$ versus $H_a\colon \rho \ne 0$ was carried out. The resulting P-value was computed to be .00032.
a. What conclusion would be appropriate at level of significance .001?
b. Does this small P-value indicate that there is a very strong linear relationship between x and y (a value of $\rho$ that differs considerably from 0)? Explain.
c. Now suppose a sample of $n = 10,000$ (x, y) pairs resulted in $r = .022$. Test $H_0\colon \rho = 0$ versus $H_a\colon \rho \ne 0$ at level .05. Is the result statistically significant? Comment on the practical significance of your analysis.
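The lag-k sample autocorrelation defined in Exercise 66 can be sketched in a few lines of Python. The series below is made up for illustration (it is not the temperature data from Chapter 1), and the helper names `pearson_r`, `lag_autocorrelation`, and `exceeds_cutoff` are hypothetical. The cutoff follows the approximate rule from part (b), rejecting when $r_k \ge 2/\sqrt{n}$ or $r_k \le -2/\sqrt{n}$.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def lag_autocorrelation(series, k):
    """r_k: correlation of the n - k pairs (x_i, x_{i+k}), with k >= 1."""
    return pearson_r(series[:-k], series[k:])


def exceeds_cutoff(r_k, n, c=2.0):
    """Approximate level-.05 rule: reject H0 when |r_k| >= c / sqrt(n)."""
    return abs(r_k) >= c / math.sqrt(n)


# Hypothetical series, used only to show how the functions are called
series = [12.0, 13.5, 12.8, 14.1, 13.0, 12.2, 13.9, 14.4, 12.9, 13.3]
for k in (1, 2, 3):
    r_k = lag_autocorrelation(series, k)
    print(k, round(r_k, 3), exceeds_cutoff(r_k, len(series)))
```

The large-n normal approximation behind the cutoff is unlikely to be reliable for a series as short as this one, so the printed flags only illustrate the mechanics of the rule.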