Research Study: Effects of Oil Spill on Plant Growth

growth 1 year after the burning. In an unpublished Texas A&M University dissertation, Newman (1997) describes the researchers’ plan for evaluating the effect of the oil spill on Distichlis spicata, a flora of particular importance to the area of the spill. We will now describe a hypothetical set of steps that the researchers may have implemented in order to successfully design their research study.

Defining the Problem

The researchers needed to determine the important characteristics of the flora that may be affected by the spill. Some of the questions that needed to be answered prior to starting the study included the following:

1. What are the factors that determine the viability of the flora?
2. How did the oil spill affect these factors?
3. Are there data on the important flora factors prior to the spill?
4. How should the researchers measure the flora factors in the oil-spill region?
5. How many observations are necessary to confirm that the flora has undergone a change after the oil spill?
6. What type of experimental design or study is needed?
7. What statistical procedures are valid for making inferences about the change in flora parameters after the oil spill?
8. What types of information should be included in a final report to document the changes observed (if any) in the flora parameters?

Collecting the Data

The researchers determined that there was no specific information on the flora in this region prior to the oil spill. Because no prespill data on flora density were available for the spill region, it was necessary to evaluate the flora density in unaffected areas of the marsh to determine whether the plant density had changed after the oil spill. The researchers located several regions that had not been contaminated by the oil spill.

The researchers needed to determine how many tracts would be required in order for their study to yield viable conclusions. To determine how many tracts must be sampled, we would have to determine how accurately the researchers want to estimate the difference in the mean flora density in the spilled and unaffected regions. The researchers specified that they wanted the estimator of the difference in the two means to be within 8 units of the true difference in the means. That is, the researchers wanted to estimate the difference in mean flora density with a 95% confidence interval having the form $(\bar{y}_{\text{Con}} - \bar{y}_{\text{Spill}}) \pm 8$. In previous studies on similar sites, the flora density ranged from 0 to 73 plants per tract. The number of tracts the researchers needed to sample in order to achieve their specifications would involve the following calculations. We want a 95% confidence interval on $\mu_{\text{Con}} - \mu_{\text{Spill}}$ with $E = 8$ and $z_{\alpha/2} = z_{.025} = 1.96$.
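As a rough sketch of where this setup leads, the snippet below evaluates the two-sample sample-size formula $n = 2 z_{\alpha/2}^2 \hat{\sigma}^2 / E^2$ with $\hat{\sigma}$ taken as range/4, using the range of 0 to 73 plants per tract quoted above. The function name `tracts_per_group` and its parameter names are illustrative choices, not part of the study.

```python
import math
from statistics import NormalDist


def tracts_per_group(half_width, sigma_hat, confidence=0.95):
    """Per-group sample size so that a CI on mu1 - mu2 has the given half-width.

    Assumes equal sample sizes and a common standard deviation sigma_hat.
    Returns the unrounded value and the value rounded up to a whole tract.
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}, about 1.96 for 95%
    n_exact = 2.0 * z**2 * sigma_hat**2 / half_width**2
    return n_exact, math.ceil(n_exact)


# Range-based guess at sigma: (largest - smallest) / 4
sigma_hat = (73 - 0) / 4

n_exact, n_tracts = tracts_per_group(half_width=8, sigma_hat=sigma_hat)
print(f"sigma_hat = {sigma_hat}")
print(f"n (unrounded) = {n_exact:.2f}, tracts per region = {n_tracts}")
```

Rounding up rather than to the nearest integer is a conservative choice made here; it guarantees the half-width target is met if the guess at $\sigma$ is accurate.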
Our estimate of $\sigma$ is $\hat{\sigma} = \text{range}/4 = (73 - 0)/4 = 18.25$. Substituting into the sample size formula, we have

$$n = \frac{2 (z_{\alpha/2})^2 \hat{\sigma}^2}{E^2} = \frac{2(1.96)^2(18.25)^2}{(8)^2} = 39.98 \approx 40$$

Thus, a random sample of 40 tracts should give a 95% confidence interval for $\mu_{\text{Con}} - \mu_{\text{Spill}}$ with the desired tolerance of 8 plants, provided 18.25 is a reasonable estimate of $\sigma$.

The spill region and the unaffected regions were divided into tracts of nearly the same size. From the above calculations, it was decided that 40 tracts from both the spill and unaffected areas would be used in the study. Forty tracts of exactly the same size were randomly selected in the unaffected locations, and the Distichlis spicata density was recorded. Similar measurements were taken within the spill area of the marsh. The data consist of 40 measurements of flora density in the uncontaminated (control) sites and 40 density measurements in the contaminated (spill) sites. The data are on the book’s companion website, www.cengage.com/statistics/ott. The researchers would next carefully examine the data from the field work to determine if the measurements were recorded correctly. The data would then be transferred to computer files and prepared for analysis.

Summarizing Data

The next step in the study would be to summarize the data through plots and summary statistics. The data are displayed in Figure 6.8 with summary statistics given in Table 6.19. A boxplot of the data displayed in Figure 6.9 indicates that the control sites have a somewhat greater plant density than the oil-spill sites. From the summary statistics, we have that the average flora density in the control sites is $\bar{y}_{\text{Con}} = 38.48$ with a standard deviation of $s_{\text{Con}} = 16.37$. The sites within the spill region have an average density of $\bar{y}_{\text{Spill}} = 26.93$ with a standard deviation of $s_{\text{Spill}} = 9.88$. Thus, the control sites have a larger average flora density and a greater variability in flora density than do the sites within the spill region. Whether these observed differences in flora density reflect similar differences in all the sites and not just the ones included in the study will require a statistical analysis of the data.

FIGURE 6.8 Number of plants observed in tracts at oil spill and control sites. The data are displayed in stem-and-leaf plots. [Back-to-back stem-and-leaf plot. Control tracts: mean 38.48, median 41.50, st. dev. 16.37, n = 40. Oil spill tracts: mean 26.93, median 26.00, st. dev. 9.88, n = 40.]

TABLE 6.19 Summary statistics for oil spill data

Descriptive Statistics
Variable: No. plants

Site Type    N    Mean    Median   Tr. Mean   St. Dev.   SE Mean   Minimum   Maximum   Q1      Q3
Control      40   38.48   41.50    39.50      16.37      2.59      0.00      59.00     35.00   51.00
Oil spill    40   26.93   26.00    26.69      9.88       1.56      5.00      52.00     22.00   33.75

FIGURE 6.9 Number of plants observed in tracts at control sites (1) and oil spill sites (2). [Boxplots of plant density for control sites and spill sites.]

Analyzing Data

The researchers hypothesized that the oil-spill sites would have a lower plant density than the control sites. Thus, we will construct confidence intervals on the mean plant density in the control plots, $\mu_{\text{Con}}$, and in the oil spill plots, $\mu_{\text{Spill}}$, to assess their average plant density. Also, we can construct confidence intervals on the difference $\mu_{\text{Con}} - \mu_{\text{Spill}}$ and test the research hypothesis that $\mu_{\text{Con}}$ is greater than $\mu_{\text{Spill}}$. From Figure 6.9, the data from the oil spill area appear to have a normal distribution, whereas the data from the control area appear to be skewed to the left. The normal probability plots are given in Figure 6.10 to further assess whether the population distributions are in fact normal in shape. We observe that the data from the spill tracts appear to follow a normal distribution but the data from the control tracts do not, since their plotted points do not fall close to the straight line. Also, the variability in plant density is higher in the control sites than in the spill sites. Thus, the approximate t procedures will be the most appropriate inference procedures.

The sample data yielded the summary values shown in Table 6.20. The research hypothesis is that the mean plant density for the control plots exceeds that for the oil spill plots.
Thus, our statistical test is set up as follows:

$H_0: \mu_{\text{Con}} \le \mu_{\text{Spill}}$ versus $H_a: \mu_{\text{Con}} > \mu_{\text{Spill}}$

That is,

$H_0: \mu_{\text{Con}} - \mu_{\text{Spill}} \le 0$
$H_a: \mu_{\text{Con}} - \mu_{\text{Spill}} > 0$

T.S.:

$$t' = \frac{(\bar{y}_{\text{Con}} - \bar{y}_{\text{Spill}}) - D_0}{\sqrt{\dfrac{s^2_{\text{Con}}}{n_{\text{Con}}} + \dfrac{s^2_{\text{Spill}}}{n_{\text{Spill}}}}} = \frac{(38.48 - 26.93) - 0}{\sqrt{\dfrac{(16.37)^2}{40} + \dfrac{(9.88)^2}{40}}} = 3.82$$

In order to compute the rejection region and p-value, we need to compute the approximate df for $t'$:

$$c = \frac{s^2_{\text{Con}}/n_{\text{Con}}}{\dfrac{s^2_{\text{Con}}}{n_{\text{Con}}} + \dfrac{s^2_{\text{Spill}}}{n_{\text{Spill}}}} = \frac{(16.37)^2/40}{(16.37)^2/40 + (9.88)^2/40} = .73$$

$$\text{df} = \frac{(n_{\text{Con}} - 1)(n_{\text{Spill}} - 1)}{(1 - c)^2 (n_{\text{Con}} - 1) + c^2 (n_{\text{Spill}} - 1)} = \frac{(39)(39)}{(1 - .73)^2 (39) + (.73)^2 (39)} = 64.38$$

which is rounded to 64. Since Table 2 in the Appendix does not have df = 64, we will use df = 60. In fact, the difference is very small when df becomes large: $t_{.05} = 1.671$ and 1.669 for df = 60 and 64, respectively.

R.R.: For $\alpha = .05$ and df = 60, reject $H_0$ if $t' > 1.671$.

Since $t' = 3.82$ is greater than 1.671, we reject $H_0$. We can bound the p-value using Table 2 in the Appendix with df = 60. With $t' = 3.82$, the level of significance is p-value < .001. Thus, we can conclude that there is significant (p-value < .001) evidence that $\mu_{\text{Con}}$ is greater than $\mu_{\text{Spill}}$. Although we have determined that there is a statistically significant amount of evidence that the mean plant density at the control sites is greater than the mean plant density at the spill sites, the question remains whether these differences have practical significance. We can estimate the size of the difference in the means by placing a 95% confidence interval on $\mu_{\text{Con}} - \mu_{\text{Spill}}$.

FIGURE 6.10 (a) Normal probability plot for oil-spill sites. (b) Normal probability plot for control sites. [Panel (a): Mean 26.93, StDev 9.882, N 40, RJ .990, P-value .100. Panel (b): Mean 38.48, StDev 16.37, N 40, RJ .937, P-value .010.]

TABLE 6.20

Control Plots                          Oil Spill Plots
$n_{\text{Con}} = 40$                  $n_{\text{Spill}} = 40$
$\bar{y}_{\text{Con}} = 38.48$         $\bar{y}_{\text{Spill}} = 26.93$
$s_{\text{Con}} = 16.37$               $s_{\text{Spill}} = 9.88$

The appropriate 95% confidence interval for $\mu_{\text{Con}} - \mu_{\text{Spill}}$ is computed by using the following formula with df = 60, the same as was used for the R.R., so that $t_{.025} = 2.0$:

$$(\bar{y}_{\text{Con}} - \bar{y}_{\text{Spill}}) \pm t_{\alpha/2} \sqrt{\frac{s^2_{\text{Con}}}{n_{\text{Con}}} + \frac{s^2_{\text{Spill}}}{n_{\text{Spill}}}}$$

or

$$(38.48 - 26.93) \pm 2.0 \sqrt{\frac{(16.37)^2}{40} + \frac{(9.88)^2}{40}} \quad \text{or} \quad 11.55 \pm 6.05$$

Thus, we are 95% confident that the mean plant densities differ by an amount between 5.5 and 17.6. The plant scientists would then evaluate whether a difference in this range is of practical importance. This would then determine whether the sites in which the oil spill occurred have been returned to their prespill condition, at least in terms of this particular type of flora.

Reporting Conclusions

We would need to write a report summarizing our findings from the study. The following items should be included in the report:

1. Statement of objective for study
2. Description of study design and data collection procedures

3. Numerical and graphical summaries of data sets
   ● table of means, medians, standard deviations, quartiles, range
   ● boxplots
   ● stem-and-leaf plots
4. Description of all inference methodologies:
   ● approximate t tests of differences in means
   ● approximate t-based confidence interval on population means
   ● verification that all necessary conditions for using inference techniques were satisfied using boxplots, normal probability plots
5. Discussion of results and conclusions
6. Interpretation of findings relative to previous studies
7. Recommendations for future studies
8. Listing of data set
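The approximate t test and confidence interval from this case study can be retraced from the summary values alone. The sketch below is a minimal illustration using only the sample sizes, means, and standard deviations reported for the control and spill tracts. The function name `approx_t_from_summary` is hypothetical, and the critical values 1.671 and 2.000 are the df = 60 table values quoted in the text, entered by hand rather than computed.

```python
import math


def approx_t_from_summary(n1, mean1, sd1, n2, mean2, sd2, d0=0.0):
    """Unequal-variance (approximate) t statistic and df from summary values."""
    v1 = sd1**2 / n1  # estimated variance of the first sample mean
    v2 = sd2**2 / n2  # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)  # standard error of the difference in means
    t_prime = (mean1 - mean2 - d0) / se
    c = v1 / (v1 + v2)
    df = (n1 - 1) * (n2 - 1) / ((1 - c) ** 2 * (n1 - 1) + c**2 * (n2 - 1))
    return t_prime, c, df, se


# Summary values for control and oil spill tracts (n, mean, st. dev.)
t_prime, c, df, se = approx_t_from_summary(40, 38.48, 16.37, 40, 26.93, 9.88)

# Table values for df = 60 as quoted in the text (assumed, not computed here)
T_05_DF60 = 1.671   # one-sided test, alpha = .05
T_025_DF60 = 2.000  # 95% two-sided confidence interval

reject_h0 = t_prime > T_05_DF60
diff = 38.48 - 26.93
lower = diff - T_025_DF60 * se
upper = diff + T_025_DF60 * se

print(f"t' = {t_prime:.2f}, c = {c:.2f}, df = {df:.2f}")
print(f"reject H0: {reject_h0}")
print(f"95% CI for the difference: ({lower:.1f}, {upper:.1f})")
```

The df printed here is computed from the unrounded value of c, so it comes out slightly below the 64.38 obtained in the text with c rounded to .73. Both round down to 64.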

6.8 Summary and Key Formulas

In this chapter, we have considered inferences about $\mu_1 - \mu_2$. The first set of methods was based on independent random samples being selected from the populations of interest. We learned how to sample data to run a statistical test or to construct a confidence interval for $\mu_1 - \mu_2$ using t methods. Wilcoxon’s rank sum test, which does not require normality of the underlying populations, was presented as an alternative to the t test.

The second major set of procedures can be used to make comparisons between two populations when the sample measurements are paired. In this situation, we no longer have independent random samples, and hence the procedures of Sections 6.2 and 6.3 (t methods and Wilcoxon’s rank sum) are inappropriate. The test and estimation methods for paired data are based on the sample differences for the paired measurements or the ranks of the differences. The paired t test and corresponding confidence interval based on the difference measurements were introduced and found to be identical to the single-sample t methods of Chapter 5. The nonparametric alternative to the paired t test is Wilcoxon’s signed-rank test.

The material presented in Chapters 5 and 6 lays the foundation of statistical inference (estimation and testing) for the remainder of the text. Review the material in this chapter periodically as new topics are introduced so that you retain the basic elements of statistical inference.

Key Formulas

1. $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$, independent samples; $y_1$ and $y_2$ approximately normal; $\sigma_1^2 = \sigma_2^2$

$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \quad \text{and} \quad \text{df} = n_1 + n_2 - 2$$

2. t test for $\mu_1 - \mu_2$, independent samples; $y_1$ and $y_2$ approximately normal; $\sigma_1^2 = \sigma_2^2$

T.S.:

$$t = \frac{\bar{y}_1 - \bar{y}_2 - D_0}{s_p \sqrt{1/n_1 + 1/n_2}} \qquad \text{df} = n_1 + n_2 - 2$$

3. $t'$ test for $\mu_1 - \mu_2$, unequal variances; independent samples; $y_1$ and $y_2$ approximately normal

T.S.:

$$t' = \frac{\bar{y}_1 - \bar{y}_2 - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad \text{df} = \frac{(n_1 - 1)(n_2 - 1)}{(1 - c)^2 (n_1 - 1) + c^2 (n_2 - 1)}$$

where

$$c = \frac{s_1^2/n_1}{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$$

4. $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$, unequal variances; independent samples; $y_1$ and $y_2$ approximately normal

$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where the t-percentile has

$$\text{df} = \frac{(n_1 - 1)(n_2 - 1)}{(1 - c)^2 (n_1 - 1) + c^2 (n_2 - 1)}, \quad \text{with} \quad c = \frac{s_1^2/n_1}{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$$
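The pooled-variance formulas (items 1 and 2) can be exercised in a few lines. The sketch below is illustrative only: the function names `pooled_sd`, `pooled_t`, and `pooled_ci` are made up for this example, and the critical value passed to `pooled_ci` must be looked up by the reader. The demonstration reuses the oil-spill summary values purely to have numbers to plug in; the case study itself relied on the unequal-variance procedures because the sample standard deviations differ noticeably.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation s_p under the equal-variance assumption."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def pooled_t(n1, mean1, sd1, n2, mean2, sd2, d0=0.0):
    """Pooled t statistic and its degrees of freedom (key formula 2)."""
    sp = pooled_sd(n1, sd1, n2, sd2)
    t = (mean1 - mean2 - d0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def pooled_ci(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Confidence interval for mu1 - mu2 (key formula 1).

    t_crit is the t percentile t_{alpha/2} for df = n1 + n2 - 2,
    supplied by the caller from a table or software.
    """
    sp = pooled_sd(n1, sd1, n2, sd2)
    half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - half_width, diff + half_width


# Illustration with the oil-spill summary values (n, mean, st. dev.)
sp = pooled_sd(40, 16.37, 40, 9.88)
t, df = pooled_t(40, 38.48, 16.37, 40, 26.93, 9.88)
print(f"s_p = {sp:.2f}, t = {t:.2f}, df = {df}")

# 1.99 is an approximate t_{.025} value for df = 78 (an assumption; check a table)
print(pooled_ci(40, 38.48, 16.37, 40, 26.93, 9.88, t_crit=1.99))
```

With equal sample sizes the pooled standard error equals the unequal-variance standard error, so t matches $t'$ here and only the degrees of freedom differ (78 versus roughly 64).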