Research Study: Percent Calories from Fat

3. What characteristics of the nurses other than dietary intake may be important in studying the nurses’ health condition?
4. How should the nurses be selected to participate in the study?
5. What hypotheses are of interest to the researchers?

The researchers decided that the main variable of interest was the percentage of calories from fat (PCF) in the diet of nurses. The parameters of interest were the average PCF value $\mu$ for the population of nurses, the standard deviation $\sigma$ of PCF for the population of nurses, and the proportion $\pi$ of nurses having PCF greater than 50%. They also wanted to determine whether the average PCF for the population of nurses exceeded the recommended value of 30%.

To estimate these parameters and test hypotheses about them, it was first necessary to determine the sample size required to meet certain specifications imposed by the researchers. The researchers wanted to estimate the mean PCF with a 95% confidence interval of width 3. From previous studies, the values of PCF ranged from 10 to 50. Because we want a 95% confidence interval with width 3, $E = 3/2 = 1.5$ and $z_{\alpha/2} = z_{.025} = 1.96$. Our estimate of $\sigma$ is $\hat{\sigma} = \text{range}/4 = (50 - 10)/4 = 10$. Substituting into the formula for $n$, we have

$$n = \frac{z_{\alpha/2}^2\,\hat{\sigma}^2}{E^2} = \frac{(1.96)^2(10)^2}{(1.5)^2} = 170.7$$

Rounding up, a random sample of 171 nurses should give a 95% confidence interval for $\mu$ with the desired width of 3, provided 10 is a reasonable estimate of $\sigma$. Three nurses originally selected for the study did not provide information on PCF; therefore, the sample size was only 168.

Collecting Data

The researchers would need to carefully examine the data from the food frequency questionnaires to determine whether the responses were recorded correctly. The data would then be transferred to computer files and prepared for analysis following the steps outlined in Chapter 2. The next step in the study would be to summarize the data through plots and summary statistics.
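The sample-size calculation in the design step above can be checked with a short Python sketch. The function name `sample_size_for_mean` is hypothetical. It computes $z_{\alpha/2}$ with `statistics.NormalDist` instead of reading 1.96 from a table, and it rounds up because $n$ must be a whole number at least as large as the formula value.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(conf_level: float, width: float, sigma_hat: float) -> int:
    """Smallest n so a 100(conf_level)% z interval for the mean has total width `width`.

    Uses n = z_{alpha/2}^2 * sigma_hat^2 / E^2 with E = width / 2, rounded up.
    """
    alpha = 1.0 - conf_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}, about 1.96 for 95%
    margin = width / 2.0                          # E, half the interval width
    n_exact = (z ** 2) * (sigma_hat ** 2) / (margin ** 2)
    return math.ceil(n_exact)


# Planning value for sigma from the range rule: (largest - smallest) / 4
pcf_low, pcf_high = 10.0, 50.0
sigma_hat = (pcf_high - pcf_low) / 4.0  # 10.0

n = sample_size_for_mean(conf_level=0.95, width=3.0, sigma_hat=sigma_hat)
print(n)  # 171
```

The range/4 rule only supplies a rough planning value for $\sigma$. A smaller or larger guess changes $n$ in proportion to $\hat{\sigma}^2$.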

Summarizing Data

The PCF values for the 168 women are displayed in Figure 5.23 in a stem-and-leaf diagram along with a table of summary statistics. A normal probability plot is provided in Figure 5.24 to assess the normality of the distribution of PCF values. From the stem-and-leaf plot and normal probability plot, it appears that the data are nearly normally distributed, with PCF values ranging from 15 to 57. The proportion of the women who have PCF greater than 50% is $\hat{\pi} = 4/168 = 2.4\%$. From the table of summary statistics in the output, the sample mean is $\bar{y} = 36.919$ and the sample standard deviation is $s = 6.728$. The researchers want to draw inferences from the random sample of 168 women to the population from which they were selected. Thus, we would need to place bounds on our point estimates in order to reflect our degree of confidence in their estimation of the population values. Also, they may be interested in testing hypotheses about the size of the population mean PCF $\mu$ or variance $\sigma^2$. For example, many nutritional experts recommend that one’s daily diet have no more than 30% of total calories a day from fat. Thus, we would want to test the research hypothesis that $\mu$ is greater than 30 to determine whether the average value of PCF for the population of nurses exceeds the recommended value.

Analyzing Data and Interpreting the Analyses

One of the objectives of the study was to estimate the mean percentage of calories from fat in the diet of nurses. Also, the researchers wanted to test whether the mean was greater than the recommended value of 30. Before constructing confidence intervals or testing hypotheses, we must first check whether the data represent a random sample from a normally distributed population. From the normal probability plot in Figure 5.24, the data values fall nearly on a straight line. Hence, we can conclude that the data appear to follow a normal distribution.

The mean and standard deviation of the PCF data were given by $\bar{y} = 36.92$ and $s = 6.73$. We can next construct a 95% confidence interval for the mean PCF for the population of nurses as follows:

$$\bar{y} \pm t_{.025,167}\,\frac{s}{\sqrt{168}} \quad\text{or}\quad 36.92 \pm 1.974\,\frac{6.73}{\sqrt{168}} \quad\text{or}\quad 36.92 \pm 1.02$$

Thus, we are 95% confident that the mean PCF in the population of nurses is between 35.90 and 37.94. Because this entire interval lies above 30, we would be inclined to conclude that the mean PCF for the population of nurses exceeds the recommended value of 30. We will next formally test the following hypotheses:

$$H_0: \mu \le 30 \quad\text{versus}\quad H_a: \mu > 30$$

FIGURE 5.23 The percentage of calories from fat (PCF) for 168 women in a dietary study

1 | 5
2 | 0 0 4 4
2 | 5 5 6 6 6 6 7 7 8 8 8 9 9 9 9 9 9 9 9
3 | 0 0 0 0 0 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4
3 | 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9
4 | 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 4 4 4 4 4
4 | 5 5 5 5 5 6 6 6 7 7 8 9 9
5 | 0 3 4
5 | 5 7

Descriptive Statistics for Percentage Calories from Fat Data

Variable   N     Mean     Median   TrMean   StDev   SE Mean
PCF        168   36.919   36.473   36.847   6.728   0.519

Variable   Minimum   Maximum   Q1       Q3
PCF        15.925    57.847    32.766   41.295

FIGURE 5.24 Normal probability plot for percentage of calories from fat (PCF)
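The confidence interval above and the test statistic for these hypotheses need only the summary statistics. The sketch below uses the table critical values $t_{.025,167} = 1.974$ and $t_{.05,167} = 1.654$ quoted in this section rather than computing them, because the Python standard library has no t quantile function. The helper names `t_interval` and `one_sample_t` are hypothetical.

```python
import math


def t_interval(ybar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Two-sided interval ybar ± t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width


def one_sample_t(ybar: float, mu0: float, s: float, n: int) -> float:
    """Test statistic t = (ybar - mu0) / (s / sqrt(n)), with df = n - 1."""
    return (ybar - mu0) / (s / math.sqrt(n))


# Summary statistics and table critical values from the text (df = 167).
ybar, s, n = 36.92, 6.73, 168
t_025_167 = 1.974  # two-sided 95% interval
t_05_167 = 1.654   # one-tailed test at alpha = .05

low, high = t_interval(ybar, s, n, t_025_167)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (35.90, 37.94)

t = one_sample_t(ybar, 30.0, s, n)
print(f"t = {t:.2f}; reject H0: {t >= t_05_167}")  # t = 13.33; reject H0: True
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 167)` and `scipy.stats.t.ppf(0.95, 167)` would replace the hard-coded critical values.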


Since the data appear to be normally distributed and, in any case, the sample size is reasonably large, we can use the t test with the following rejection region:

R.R. For a one-tail t test with $\alpha = .05$, we reject $H_0$ if

$$t = \frac{\bar{y} - 30}{s/\sqrt{168}} \ge t_{.05,167} = 1.654$$

Since $t = \dfrac{36.92 - 30}{6.73/\sqrt{168}} = 13.33$, we reject $H_0$. The p-value of the test is essentially 0, so we can conclude that the mean PCF value is very significantly greater than 30. Thus, there is strong evidence that the population of nurses has an average PCF larger than the recommended value of 30. The experts in this field would have to determine the practical consequences of having a mean PCF between 5.90 and 7.94 percentage points higher than the recommended value.

Reporting Conclusions

A report summarizing our findings from the study would include the following items:
1. Statement of objective for study
2. Description of study design and data collection procedures
3. Numerical and graphical summaries of data sets
4. Description of all inference methodologies:
   ● t tests
   ● t-based confidence interval on population mean
   ● Verification that all necessary conditions for using inference techniques were satisfied
5. Discussion of results and conclusions
6. Interpretation of findings relative to previous studies
7. Recommendations for future studies
8. Listing of data set

5.11 Summary and Key Formulas

A population mean or median can be estimated using point or interval estimation. The selection of the median in place of the mean as a representation of the center of a population depends on the shape of the population distribution. The performance of an interval estimate is determined by the width of the interval and the confidence coefficient. The formulas for a $100(1-\alpha)\%$ confidence interval for the mean $\mu$ and median $M$ were given. A formula was provided for determining the necessary sample size in a study so that a confidence interval for $\mu$ would have a predetermined width and level of confidence.

Following the traditional approach to hypothesis testing, a statistical test consists of five parts: research hypothesis, null hypothesis, test statistic, rejection region, and checking assumptions and drawing conclusions. A statistical test employs the technique of proof by contradiction. We conduct experiments and studies to gather data to verify the research hypothesis through the contradiction of the null hypothesis $H_0$. As with any two-decision process based on variable data, there are two types of errors that can be committed. A Type I error is the rejection of $H_0$ when $H_0$ is true, and a Type II error is the acceptance of $H_0$ when the alternative hypothesis $H_a$ is true. The probability of a Type I error is denoted by $\alpha$. For a given value of the mean $\mu_a$ in $H_a$, the probability of a Type II error is denoted by $\beta(\mu_a)$. The value of $\beta(\mu_a)$ decreases as the distance from $\mu_a$ to $\mu_0$ increases. The power of a test of hypothesis is the probability that the test will reject $H_0$ when the value of $\mu$ resides in $H_a$. Thus, the power at $\mu_a$ equals $1 - \beta(\mu_a)$. We also demonstrated that for a given sample size and value of the mean $\mu_a$, $\alpha$ and $\beta(\mu_a)$ are inversely related: as $\alpha$ is increased, $\beta(\mu_a)$ decreases, and vice versa.
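The two relationships summarized above can be seen numerically for the one-tailed z test of $H_0: \mu \le \mu_0$ versus $H_a: \mu > \mu_0$ with $\sigma$ known. First, $\beta(\mu_a)$ shrinks as $\mu_a$ moves away from $\mu_0$. Second, raising $\alpha$ lowers $\beta(\mu_a)$. For this test, $\beta(\mu_a) = P\big(Z < z_\alpha - (\mu_a - \mu_0)/(\sigma/\sqrt{n})\big)$. This is a sketch under that setting only; the function name and the planning values are hypothetical.

```python
import math
from statistics import NormalDist


def beta_upper_tail(mu0: float, mu_a: float, sigma: float, n: int, alpha: float) -> float:
    """Type II error probability of the level-alpha z test of H0: mu <= mu0 vs Ha: mu > mu0.

    Evaluated at mu_a: beta(mu_a) = P(Z < z_alpha - (mu_a - mu0) / (sigma / sqrt(n))).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    shift = (mu_a - mu0) / (sigma / math.sqrt(n))
    return std_normal.cdf(z_alpha - shift)


# Hypothetical planning values, for illustration only.
mu0, sigma, n = 30.0, 10.0, 25
for mu_a in (32.0, 34.0, 36.0):
    b05 = beta_upper_tail(mu0, mu_a, sigma, n, alpha=0.05)
    b10 = beta_upper_tail(mu0, mu_a, sigma, n, alpha=0.10)
    # Power at mu_a is 1 - beta(mu_a).
    print(f"mu_a={mu_a}: beta={b05:.3f} (alpha=.05), {b10:.3f} (alpha=.10), power={1 - b05:.3f}")
```

At $\mu_a = \mu_0$ the function returns $1 - \alpha$. This is the boundary case where rejecting $H_0$ is a Type I error rather than a correct decision.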
If we specify the sample size $n$ and $\alpha$ for a given test procedure, we can compute $\beta(\mu_a)$ for values of the mean $\mu_a$ in the alternative hypothesis. In many studies, we need to determine the sample size $n$ required to achieve a testing procedure with a specified value of $\alpha$ and a bound on $\beta(\mu_a)$. A formula is provided to determine $n$ such that a level $\alpha$ test has $\beta(\mu_a) \le \beta$ whenever $\mu_a$ is a specified distance beyond $\mu_0$.

We developed an alternative to the traditional decision-based approach for a statistical test of hypotheses. Rather than relying on a preset level of $\alpha$, we compute the weight of evidence in the data for rejecting the null hypothesis. This weight, expressed in terms of a probability, is called the level of significance for the test. Most professional journals summarize the results of a statistical test using the level of significance. We discussed how the level of significance can be used to obtain the same results as the traditional approach.

We also considered inferences about $\mu$ when $\sigma$ is unknown, which is the usual situation. Through the use of the t distribution, we can construct both confidence intervals and a statistical test for $\mu$. The t-based tests and confidence intervals do not have the stated levels or power when the population distribution is highly skewed or very heavy tailed and the sample size is small. In these situations, we may use the median in place of the mean to represent the center of the population. Procedures were provided to construct confidence intervals and tests of hypotheses for the population median. Alternatively, we can use bootstrap methods to approximate confidence intervals and tests when the population distribution is nonnormal and $n$ is small.

Key Formulas

Estimation and tests for $\mu$ and the median:

1. $100(1-\alpha)\%$ confidence interval for $\mu$ ($\sigma$ known) when sampling from a normal population or $n$ large:
   $\bar{y} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
2. $100(1-\alpha)\%$ confidence interval for $\mu$ ($\sigma$ unknown) when sampling from a normal population or $n$ large:
   $\bar{y} \pm t_{\alpha/2}\,s/\sqrt{n}$, $\text{df} = n - 1$
3. Sample size for estimating $\mu$ with a $100(1-\alpha)\%$ confidence interval, $\bar{y} \pm E$:
   $n = \dfrac{z_{\alpha/2}^2\,\sigma^2}{E^2}$
4. Statistical test for $\mu$ ($\sigma$ known) when sampling from a normal population or $n$ large.
   Test statistic: $z = \dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$
5. Statistical test for $\mu$ ($\sigma$ unknown) when sampling from a normal population or $n$ large.
   Test statistic: $t = \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}}$, $\text{df} = n - 1$
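The summary mentions a formula for choosing $n$ so that a level $\alpha$ test keeps $\beta(\mu_a) \le \beta$ once $\mu_a$ is at least a distance $\Delta$ beyond $\mu_0$. For a one-tailed test with $\sigma$ known, the standard form of that formula is $n = \sigma^2 (z_\alpha + z_\beta)^2 / \Delta^2$. For a two-tailed test, $z_\alpha$ is replaced by $z_{\alpha/2}$. The sketch below assumes the one-tailed form. Its function names and planning values are hypothetical, and it checks the result by recomputing $\beta$ at the chosen $n$.

```python
import math
from statistics import NormalDist


def n_for_one_tailed_test(sigma: float, delta: float, alpha: float, beta: float) -> int:
    """Smallest n for a one-tailed level-alpha z test with beta(mu_a) <= beta
    whenever |mu_a - mu0| >= delta: n = sigma^2 (z_alpha + z_beta)^2 / delta^2, rounded up."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    z_beta = std_normal.inv_cdf(1.0 - beta)
    return math.ceil((sigma * (z_alpha + z_beta) / delta) ** 2)


def beta_at_distance(n: int, sigma: float, delta: float, alpha: float) -> float:
    """Type II error of the one-tailed z test at mu_a = mu0 + delta."""
    std_normal = NormalDist()
    return std_normal.cdf(std_normal.inv_cdf(1.0 - alpha) - delta / (sigma / math.sqrt(n)))


# Hypothetical planning values: sigma = 10, detect a shift of 5 with alpha = .05, beta = .10.
n = n_for_one_tailed_test(sigma=10.0, delta=5.0, alpha=0.05, beta=0.10)
print(n)  # 35
print(round(beta_at_distance(n, 10.0, 5.0, 0.05), 3))  # at or below the 0.10 target
```

Rounding up guarantees the bound on $\beta$, so one fewer observation would leave $\beta(\mu_a)$ slightly above the target.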