Example 12.19 The article “A Study of a Partial Nutrient Removal System for Wastewater Treatment Plants” (Water Research, 1972: 1389–1397) reports on a method of nitrogen removal that involves the treatment of the supernatant from an aerobic digester. Both the influent total nitrogen x (mg/L) and the percentage y of nitrogen removed were recorded for 20 days, with resulting summary statistics $\sum x_i = 285.90$, $\sum x_i^2 = 4409.55$, $\sum y_i = 690.30$, $\sum y_i^2 = 29{,}040.29$, and $\sum x_i y_i = 10{,}818.56$. The sample correlation coefficient between influent nitrogen and percentage nitrogen removed is $r = .733$, giving $v = .935$. With $n = 20$, a 95% confidence interval for $\mu_V$ is $(.935 - 1.96/\sqrt{17},\ .935 + 1.96/\sqrt{17}) = (.460, 1.410) = (c_1, c_2)$. The 95% interval for $\rho$ is

$$\left( \frac{e^{2(.46)} - 1}{e^{2(.46)} + 1},\ \frac{e^{2(1.41)} - 1}{e^{2(1.41)} + 1} \right) = (.43, .89)$$

In Chapter 5, we cautioned that a large value of the correlation coefficient (near 1 or $-1$) implies only association and not causation. This applies to both $\rho$ and r.
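The arithmetic of Example 12.19 can be checked with a short script. The sketch below is not part of the original example: the function names `corr_from_sums` and `fisher_ci` are illustrative choices, and it assumes the usual computing formulas for $S_{xx}$, $S_{yy}$, and $S_{xy}$ together with the Fisher transformation $v = \tfrac{1}{2}\ln[(1+r)/(1-r)]$, which equals `atanh(r)`.

```python
import math

def corr_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Sample correlation r computed from the five summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate CI for rho via the Fisher transformation.

    v = atanh(r) is treated as approximately normal with
    standard deviation 1 / sqrt(n - 3); the endpoints are
    mapped back to the correlation scale with tanh.
    """
    v = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(v - half_width), math.tanh(v + half_width)

# Summary statistics quoted in Example 12.19 (n = 20 days).
r = corr_from_sums(20, 285.90, 4409.55, 690.30, 29040.29, 10818.56)
lower, upper = fisher_ci(r, 20)
print(f"r = {r:.3f}, 95% CI for rho = ({lower:.2f}, {upper:.2f})")
```

Because `tanh` is the inverse of `atanh`, the back-transformation here is the same expression $(e^{2c} - 1)/(e^{2c} + 1)$ used in the example, so the script should agree with the hand calculation up to rounding.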

  EXERCISES Section 12.5 (57–67)

57. The article “Behavioural Effects of Mobile Telephone Use During Simulated Driving” (Ergonomics, 1995: 2536–2562) reported that for a sample of 20 experimental subjects, the sample correlation coefficient for $x =$ age and $y =$ time since the subject had acquired a driving license (yr) was .97. Why do you think the value of r is so close to 1? (The article’s authors give an explanation.)

58. The Turbine Oil Oxidation Test (TOST) and the Rotating Bomb Oxidation Test (RBOT) are two different procedures for evaluating the oxidation stability of steam turbine oils. The article “Dependence of Oxidation Stability of Steam Turbine Oil on Base Oil Composition” (J. of the Society of Tribologists and Lubrication Engrs., Oct. 1997: 19–24) reported the accompanying observations on $x =$ TOST time (hr) and $y =$ RBOT time (min) for 12 oil specimens.

  a. Calculate and interpret the value of the sample correlation coefficient (as do the article’s authors).
  b. How would the value of r be affected if we had let $x =$ RBOT time and $y =$ TOST time?
  c. How would the value of r be affected if RBOT time were expressed in hours?
  d. Construct normal probability plots and comment.
  e. Carry out a test of hypotheses to decide whether RBOT time and TOST time are linearly related.

59. Toughness and fibrousness of asparagus are major determinants of quality. This was the focus of a study reported in “Post-Harvest Glyphosphate Application Reduces Toughening, Fiber Content, and Lignification of Stored Asparagus Spears” (J. of the Amer. Soc. of Hort. Science, 1988: 569–572). The article reported the accompanying data (read from a graph) on $x =$ shear force (kg) and $y =$ percent fiber dry weight.

  $n = 18$, $\sum x_i = 1950$, $\sum x_i^2 = 251{,}970$, $\sum y_i = 47.92$, $\sum y_i^2 = 130.6074$, $\sum x_i y_i = 5530.92$

  a. Calculate the value of the sample correlation coefficient. Based on this value, how would you describe the nature of the relationship between the two variables?
  b. If a first specimen has a larger value of shear force than does a second specimen, what tends to be true of percent dry fiber weight for the two specimens?
  c. If shear force is expressed in pounds, what happens to the value of r? Why?
  d. If the simple linear regression model were fit to this data, what proportion of observed variation in percent fiber dry weight could be explained by the model relationship?
  e. Carry out a test at significance level .01 to decide whether there is a positive linear association between the two variables.

60. Head movement evaluations are important because individuals, especially those who are disabled, may be able to operate communications aids in this manner. The article “Constancy of Head Turning Recorded in Healthy Young Humans” (J. of Biomed. Engr., 2008: 428–436) reported data on ranges in maximum inclination angles of the head in the clockwise anterior, posterior, right, and left directions for 14 randomly selected subjects. Consider the accompanying data on average anterior maximum inclination angle (AMIA) both in the clockwise direction and in the counterclockwise direction.

  Co: 44.2 52.1 60.2 52.7 47.2 65.6 71.4
  Co: 48.8 53.1 66.3 59.8 47.5 64.5 34.5

  a. Calculate a point estimate of the population correlation coefficient between Cl AMIA and Co AMIA ($\sum \mathrm{Cl} = 786.7$, $\sum \mathrm{Co} = 767.9$, $\sum \mathrm{Cl}^2 = 45{,}727.31$, $\sum \mathrm{Co}^2 = 43{,}478.07$, $\sum \mathrm{Cl}\,\mathrm{Co} = 44{,}187.87$).
  b. Assuming bivariate normality (normal probability plots of the Cl and Co samples are reasonably straight), carry out a test at significance level .01 to decide whether there is a linear association between the two variables in the population (as do the authors of the cited paper). Would the conclusion have been the same if a significance level of .001 had been used?

61. The authors of the paper “Objective Effects of a Six Months’ Endurance and Strength Training Program in Outpatients with Congestive Heart Failure” (Medicine and Science in Sports and Exercise, 1999: 1102–1107) presented a correlation analysis to investigate the relationship between maximal lactate level x and muscular endurance y. The accompanying data was read from a plot in the paper.

  y: 3.80 4.00 4.90 5.20 4.00 3.50 6.30
  y: 6.88 7.55 4.95 7.80 4.45 6.60 8.90

  $S_{xx} = 36.9839$, $S_{yy} = 2{,}628{,}930.357$, $S_{xy} = 7377.704$. A scatter plot shows a linear pattern.

  a. Test to see whether there is a positive correlation between maximal lactate level and muscular endurance in the population from which this data was selected.
  b. If a regression analysis were to be carried out to predict endurance from lactate level, what proportion of observed variation in endurance could be attributed to the approximate linear relationship? Answer the analogous question if regression is used to predict lactate level from endurance, and answer both questions without doing any regression calculations.

62. Hydrogen content is conjectured to be an important factor in porosity of aluminum alloy castings. The article “The Reduced Pressure Test as a Measuring Tool in the Evaluation of Porosity/Hydrogen Content in A1–7 Wt Pct Si-10 Vol Pct SiC(p) Metal Matrix Composite” (Metallurgical Trans., 1993: 1857–1868) gives the accompanying data on $x =$ content and $y =$ gas porosity for one particular measurement technique.

  Minitab gives the following output in response to a Correlation command:

  Correlation of Hydrcon and Porosity = 0.449

  a. Test at level .05 to see whether the population correlation coefficient differs from 0.
  b. If a simple linear regression analysis had been carried out, what percentage of observed variation in porosity could be attributed to the model relationship?

63. Physical properties of six flame-retardant fabric samples were investigated in the article “Sensory and Physical Properties of Inherently Flame-Retardant Fabrics” (Textile Research, 1984: 61–68). Use the accompanying data and a .05 significance level to determine whether a linear relationship exists between stiffness x (mg-cm) and thickness y (mm). Is the result of the test surprising in light of the value of r?

64. The article “Increases in Steroid Binding Globulins Induced by Tamoxifen in Patients with Carcinoma of the Breast” (J. of Endocrinology, 1978: 219–226) reports data on the effects of the drug tamoxifen on change in the level of cortisol-binding globulin (CBG) of patients during treatment. With age $= x$ and CBG $= y$, summary values are $n = 26$, $\sum x_i = 1613$, $\sum (x_i - \bar{x})^2 = 3756.96$, $\sum y_i = 281.9$, $\sum (y_i - \bar{y})^2 = 465.34$, and $\sum x_i y_i = 16{,}731$.

  a. Compute a 90% CI for the true correlation coefficient $\rho$.
  b. Test $H_0\colon \rho = -.5$ versus $H_a\colon \rho < -.5$ at level .05.
  c. In a regression analysis of y on x, what proportion of variation in change of cortisol-binding globulin level could be explained by variation in patient age within the sample?
  d. If you decide to perform a regression analysis with age as the dependent variable, what proportion of variation in age is explainable by variation in CBG?

65. Torsion during hip external rotation and extension may explain why acetabular labral tears occur in professional athletes. The article “Hip Rotational Velocities During the Full Golf Swing” (J. of Sports Science and Med., 2009: 296–299) reported on an investigation in which lead hip internal peak rotational velocity (x) and trailing hip peak external rotational velocity (y) were determined for a sample of 15 golfers. Data provided by the article’s authors was used to calculate the following summary quantities:

  $\sum (x_i - \bar{x})^2 = 64{,}732.83$, $\sum (y_i - \bar{y})^2 = 130{,}566.96$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 44{,}185.87$

  Separate normal probability plots showed very substantial linear patterns.

  a. Calculate a point estimate for the population correlation coefficient.
  b. Carry out a test at significance level .01 to decide whether there is a linear relationship between the two velocities in the sampled population; your conclusion should be based on a P-value.
  c. Would the conclusion of (b) have changed if you had tested appropriate hypotheses to decide whether there is a positive linear association in the population? What if a significance level of .05 rather than .01 had been used?

66. Consider a time series, that is, a sequence of observations $X_1, X_2, \ldots$ obtained over time, with observed values $x_1, x_2, \ldots, x_n$. Suppose that the series shows no upward or downward trend over time. An investigator will frequently want to know just how strongly values in the series separated by a specified number of time units are related. The lag-one sample autocorrelation coefficient $r_1$ is just the value of the sample correlation coefficient r for the pairs $(x_1, x_2), (x_2, x_3), \ldots, (x_{n-1}, x_n)$, that is, pairs of values separated by one time unit. Similarly, the lag-two sample autocorrelation coefficient $r_2$ is r for the $n - 2$ pairs $(x_1, x_3), (x_2, x_4), \ldots, (x_{n-2}, x_n)$.

  a. Calculate the values of $r_1$, $r_2$, and $r_3$ for the temperature data from Exercise 82 of Chapter 1, and comment.
  b. Analogous to the population correlation coefficient $\rho$, let $\rho_1, \rho_2, \ldots$ denote the theoretical or long-run autocorrelation coefficients at the various lags. If all these $\rho$’s are 0, there is no (linear) relationship at any lag. In this case, if n is large, each $R_i$ has approximately a normal distribution with mean 0 and standard deviation $1/\sqrt{n}$, and different $R_i$’s are almost independent. Thus $H_0\colon \rho_i = 0$ can be rejected at a significance level of approximately .05 if either $r_i \ge 2/\sqrt{n}$ or $r_i \le -2/\sqrt{n}$. If $n = 100$ and $r_1 = .16$, $r_2 = -.09$, and $r_3 = -.15$, is there any evidence of theoretical autocorrelation at the first three lags?
  c. If you are simultaneously testing the null hypothesis in part (b) for more than one lag, why might you want to increase the cutoff constant 2 in the rejection region?

67. A sample of $n = 500$ (x, y) pairs was collected and a test of $H_0\colon \rho = 0$ versus $H_a\colon \rho \ne 0$ was carried out. The resulting P-value was computed to be .00032.

  a. What conclusion would be appropriate at level of significance .001?
  b. Does this small P-value indicate that there is a very strong linear relationship between x and y (a value of $\rho$ that differs considerably from 0)? Explain.
  c. Now suppose a sample of $n = 10{,}000$ (x, y) pairs resulted in $r = .022$. Test $H_0\colon \rho = 0$ versus $H_a\colon \rho \ne 0$ at level .05. Is the result statistically significant? Comment on the practical significance of your analysis.
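The lag-k sample autocorrelation defined in Exercise 66 can be sketched in a few lines. The code below is only an illustration under stated assumptions: the helper names `sample_corr`, `lag_autocorr`, and `exceeds_cutoff` are hypothetical, the short alternating series is made-up data (not the temperature data the exercise refers to), and the cutoff is the approximate $2/\sqrt{n}$ rule quoted in part (b).

```python
import math

def sample_corr(xs, ys):
    """Ordinary sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def lag_autocorr(series, k):
    """r for the n - k pairs (x_1, x_{1+k}), ..., (x_{n-k}, x_n)."""
    if k < 1 or k >= len(series) - 1:
        raise ValueError("lag k must satisfy 1 <= k < n - 1")
    return sample_corr(series[:-k], series[k:])

def exceeds_cutoff(r_k, n, cutoff=2.0):
    """Approximate level-.05 rule: reject rho_k = 0 if |r_k| >= cutoff / sqrt(n)."""
    return abs(r_k) >= cutoff / math.sqrt(n)

# Made-up series for illustration only: it alternates, so neighbouring
# values move in opposite directions and values two steps apart coincide.
demo = [1, 2, 1, 2, 1, 2, 1, 2]
r1 = lag_autocorr(demo, 1)
r2 = lag_autocorr(demo, 2)
print(r1, r2)
```

Keeping `cutoff` as a parameter makes it easy to experiment with a stricter constant when several lags are examined at once, which is the issue raised in part (c).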