Inferences about $\mu$ When Population Is Nonnormal
3. Compute the mean $\bar{y}^*$ and standard deviation $s^*$ of the bootstrap sample $y_1^*, y_2^*, \ldots, y_n^*$.
4. Compute the value of the statistic
$$\hat{t} = \frac{\bar{y}^* - \bar{y}}{s^*/\sqrt{n}}$$
5. Repeat Steps 2–4 a large number of times B to obtain $\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_B$. Use these values to obtain an approximation to the sampling distribution of $(\bar{y} - \mu)/(s/\sqrt{n})$.

Suppose we have n = 20 and we select B = 1,000 bootstrap samples. The steps in obtaining the bootstrap approximation to the sampling distribution of $(\bar{y} - \mu)/(s/\sqrt{n})$ are depicted here.

Obtain random sample $y_1, y_2, \ldots, y_{20}$ from the population, and compute $\bar{y}$ and $s$.
First bootstrap sample: $y_1^*, y_2^*, \ldots, y_{20}^*$ yields $\bar{y}^*$, $s^*$, and $\hat{t}_1 = (\bar{y}^* - \bar{y})/(s^*/\sqrt{20})$
Second bootstrap sample: $y_1^*, y_2^*, \ldots, y_{20}^*$ yields $\bar{y}^*$, $s^*$, and $\hat{t}_2 = (\bar{y}^* - \bar{y})/(s^*/\sqrt{20})$
. . .
Bth bootstrap sample: $y_1^*, y_2^*, \ldots, y_{20}^*$ yields $\bar{y}^*$, $s^*$, and $\hat{t}_B = (\bar{y}^* - \bar{y})/(s^*/\sqrt{20})$

We then use the B values of $\hat{t}$: $\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_B$ to obtain the approximate percentiles. For example, suppose we want to construct a 95% confidence interval for $\mu$ and B = 1,000. We need the lower and upper .025 percentiles, $\hat{t}_{.025}$ and $\hat{t}_{.975}$. Thus, we would take the 1,000(.025) = 25th smallest value of $\hat{t}$ as $\hat{t}_{.025}$ and the 1,000(1 − .025) = 975th smallest value of $\hat{t}$ as $\hat{t}_{.975}$. The approximate 95% confidence interval for $\mu$ would be
$$\left(\bar{y} + \hat{t}_{.025}\,\frac{s}{\sqrt{n}},\ \ \bar{y} + \hat{t}_{.975}\,\frac{s}{\sqrt{n}}\right)$$
where $\hat{t}_{.025}$ is typically negative, so the first limit lies below $\bar{y}$.

EXAMPLE 5.18
Secondhand smoke is of great concern, especially when it involves young children. Breathing secondhand smoke can be harmful to children's health, contributing to health problems such as asthma, Sudden Infant Death Syndrome (SIDS), bronchitis and pneumonia, and ear infections. The developing lungs of young children are severely affected by exposure to secondhand smoke. The Child Protective Services (CPS) in a city is concerned about the level of exposure to secondhand smoke for children placed by their agency in foster parents' care. A method of determining level of exposure is to determine the urinary concentration of cotinine, a metabolite of nicotine. Unexposed children will typically have mean cotinine levels of 75 or less. A random sample of 20 children suspected of being exposed to secondhand smoke yielded the following urinary concentrations of cotinine:

29, 30, 53, 75, 89, 34, 21, 12, 58, 84, 92, 117, 115, 119, 109, 115, 134, 253, 289, 287

CPS wants an estimate of the mean cotinine level in the children under their care. From the sample of 20 children, they compute $\bar{y}$ = 105.75 and s = 82.429. Construct a 95% confidence interval for the mean cotinine level for children under the supervision of CPS.

Solution Because the sample size is relatively small, an assessment of whether the population has a normal distribution is crucial prior to using a confidence interval procedure based on the t distribution. Figure 5.20 displays a normal probability plot for the 20 data values. From the plot, we observe that the data do not fall near the straight line, and the p-value for the test of normality is less than .01. Thus, we would conclude that the data do not appear to follow a normal distribution. The confidence interval based on the t distribution would not be appropriate; hence, we will use a bootstrap confidence interval.

[FIGURE 5.20 Normal probability plot for cotinine data. Plot annotations: Mean 105.8, StDev 82.43, N 20, RJ .917, p-value < .010]

One thousand (B = 1,000) samples of size 20 are selected with replacement from the original sample. Table 5.7 displays 5 of the 1,000 samples to illustrate the nature of the bootstrap samples.

TABLE 5.7 Bootstrap samples

Original sample       29   30   53   75   89   34   21   12   58   84
                      92  117  115  119  109  115  134  253  289  287
Bootstrap sample 1    29   21   12  115   21   89   29   30   21   89
                      30   84   84  134   58   30   34   89   29  134
Bootstrap sample 2    30   92   75  109  115  117   84   89  119  289
                     115   75   21   92  109   12  289   58   92   30
Bootstrap sample 3    53  289   30   92   30  253   89   89   75  119
                     115  117  253   53   84   34   58  289   92  134
Bootstrap sample 4    75   21  115  287  119   75   75   53   34   29
                     117  115   29  115  115  253  289  134   53   75
Bootstrap sample 5    89  119  109  109  115  119   12   29   84   21
                      34  134  115  134   75   58   30   75  109  134

Upon examination of Table 5.7, it can be observed that in each of the bootstrap samples there are repetitions of some of the original data values. This arises due to the sampling with replacement. The following histogram of the 1,000 values of $\hat{t} = (\bar{y}^* - \bar{y})/(s^*/\sqrt{n})$ illustrates the effect of the nonnormal nature of the population distribution on the sampling distribution of the t statistic. If the sample had been randomly selected from a normal distribution, the histogram would be symmetric, as was depicted in Figure 5.14. The histogram in Figure 5.21 is somewhat left-skewed.

[FIGURE 5.21 Histogram of bootstrapped t-statistic. Horizontal axis: values of bootstrap t; vertical axis: frequency]

After sorting the 1,000 values of $\hat{t}$ from smallest to largest, we obtain the 25th smallest and 25th largest values, −3.288 and 1.776, respectively.
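The resampling, computing, and sorting steps just described can be sketched in Python. This is only a minimal illustration using the standard library, not the procedure that produced the values reported in the example; the function names `bootstrap_t_values` and `bootstrap_t_interval` and the random seed are assumptions made for the sketch. Because the resamples are random, the percentiles will be near, but not equal to, −3.288 and 1.776. The interval is formed as in this section, by adding each percentile of the bootstrap statistic (times s/√n) to the original sample mean.

```python
import math
import random
import statistics

# The 20 urinary concentrations from Example 5.18.
data = [29, 30, 53, 75, 89, 34, 21, 12, 58, 84,
        92, 117, 115, 119, 109, 115, 134, 253, 289, 287]


def bootstrap_t_values(sample, B=1000, seed=0):
    """Return B sorted values of t-hat = (ybar* - ybar) / (s* / sqrt(n))."""
    rng = random.Random(seed)      # the seed is an arbitrary assumption
    n = len(sample)
    ybar = statistics.mean(sample)
    t_hats = []
    while len(t_hats) < B:
        resample = rng.choices(sample, k=n)   # sampling with replacement
        s_star = statistics.stdev(resample)
        if s_star == 0:            # all values equal: t-hat undefined, redraw
            continue
        ybar_star = statistics.mean(resample)
        t_hats.append((ybar_star - ybar) / (s_star / math.sqrt(n)))
    return sorted(t_hats)


def bootstrap_t_interval(sample, B=1000, alpha=0.05, seed=0):
    """Return (t_lo, t_hi, (lower, upper)) using the interval form in the text."""
    n = len(sample)
    ybar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_hats = bootstrap_t_values(sample, B, seed)
    lo_rank = max(1, round(B * alpha / 2))    # 25 when B = 1,000
    hi_rank = round(B * (1 - alpha / 2))      # 975 when B = 1,000
    t_lo = t_hats[lo_rank - 1]                # 25th smallest value
    t_hi = t_hats[hi_rank - 1]                # 975th smallest value
    # Interval as defined in this section; note that some other references
    # pair the two percentiles with the limits in the reverse order.
    return t_lo, t_hi, (ybar + t_lo * se, ybar + t_hi * se)


t_lo, t_hi, (lower, upper) = bootstrap_t_interval(data)
print(f"bootstrap percentiles: {t_lo:.3f}, {t_hi:.3f}")
print(f"approximate 95% interval: ({lower:.2f}, {upper:.2f})")
```

Changing `seed` gives a different set of 1,000 resamples, which is a simple way to see how much the percentiles vary from run to run.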
We thus have the following percentiles:
$$\hat{t}_{.025} = -3.288 \quad \text{and} \quad \hat{t}_{.975} = 1.776$$
The 95% confidence interval for the mean cotinine concentration is given here using the original sample mean of $\bar{y}$ = 105.75 and sample standard deviation s = 82.429:
$$\left(\bar{y} + \hat{t}_{.025}\,\frac{s}{\sqrt{n}},\ \ \bar{y} + \hat{t}_{.975}\,\frac{s}{\sqrt{n}}\right) = \left(105.75 - 3.288\,\frac{82.429}{\sqrt{20}},\ \ 105.75 + 1.776\,\frac{82.429}{\sqrt{20}}\right) = (45.15,\ 138.48)$$
A comparison of these two percentiles to the percentiles from the t distribution (Table 2 in the Appendix) reveals how much in error our confidence intervals would have been if we had directly applied the formulas from Section 5.7. From Table 2 in the Appendix, with df = 19, we have $t_{.025} = -2.093$ and $t_{.975} = 2.093$. This would yield a 95% confidence interval on $\mu$ of
$$105.75 \pm 2.093\,\frac{82.429}{\sqrt{20}} \ \Rightarrow\ (67.17,\ 144.33)$$
Note that the confidence interval using the t distribution is centered about the sample mean, whereas the bootstrap confidence interval has its lower limit further from the mean than its upper limit. This is due to the fact that the random sample from the population indicated that the population distribution was not symmetric. Thus, we would expect that the sampling distribution of our statistic would not be symmetric due to the relatively small sample size, n = 20.

We will next apply the bootstrap approximation of the test statistic $t = (\bar{y} - \mu_0)/(s/\sqrt{n})$ to obtain a test of hypotheses for the situation where n is relatively small and the population distribution is nonnormal. The method for obtaining the p-value for the bootstrap approximation to the sampling distribution of the test statistic under the null value of $\mu$, $\mu_0$, involves the following steps. Suppose we want to test the following hypotheses:
$$H_0: \mu \le \mu_0 \quad \text{versus} \quad H_a: \mu > \mu_0$$
1. Select a random sample $y_1, y_2, \ldots, y_n$ of size n from the population and compute the value of $t = (\bar{y} - \mu_0)/(s/\sqrt{n})$.
2. Select a random sample of size n, with replacement, from $y_1, y_2, \ldots, y_n$ and compute the mean $\bar{y}^*$ and standard deviation $s^*$ of $y_1^*, y_2^*, \ldots, y_n^*$.
3. Compute the value of the statistic $\hat{t} = (\bar{y}^* - \bar{y})/(s^*/\sqrt{n})$.
4. Repeat Steps 2 and 3 a large number of times B to form the approximate sampling distribution of $(\bar{y} - \mu_0)/(s/\sqrt{n})$.
5. Let m be the number of values of the statistic $\hat{t}$ that are greater than or equal to the value t computed from the original sample.
6. The bootstrap p-value is $m/B$.

When the hypotheses are $H_0: \mu \ge \mu_0$ versus $H_a: \mu < \mu_0$, the only change would be to let m be the number of values of the statistic $\hat{t}$ that are less than or equal to the value t computed from the original sample. Finally, when the hypotheses are $H_0: \mu = \mu_0$ versus $H_a: \mu \ne \mu_0$, let $m_L$ be the number of values of the statistic $\hat{t}$ that are less than or equal to the value t computed from the original sample and $m_U$ be the number of values of the statistic $\hat{t}$ that are greater than or equal to the value t computed from the original sample. Compute $p_L = m_L/B$ and $p_U = m_U/B$. Take the p-value to be the minimum of $2p_L$ and $2p_U$.

A point of clarification concerning the procedure described above: The bootstrap test statistic replaces $\mu_0$ with the sample mean from the original sample. Recall that when we calculate the p-value of a test statistic, the calculation is always done under the assumption that the null hypothesis is true. In our bootstrap procedure, this requirement results in the bootstrap test statistic having $\mu_0$ replaced with the sample mean from the original sample. This ensures that our bootstrap approximation of the sampling distribution of the test statistic is under the null value of $\mu$, $\mu_0$.

EXAMPLE 5.19
Refer to Example 5.18. The CPS personnel wanted to determine if the mean cotinine level was greater than 75 for children under their supervision. Based on the sample of 20 children and using $\alpha$ = .05, do the data support the contention that the mean exceeds 75?
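The p-value steps listed above can be sketched in Python, here applied to the 20 urinary concentrations from Example 5.18 with the null value 75. This is a minimal illustration using only the standard library; the function name `bootstrap_p_value`, its `alternative` argument, and the seed are assumptions made for the sketch. Since the resamples are random, the p-value it prints depends on the seed.

```python
import math
import random
import statistics

# The 20 urinary concentrations from Example 5.18.
data = [29, 30, 53, 75, 89, 34, 21, 12, 58, 84,
        92, 117, 115, 119, 109, 115, 134, 253, 289, 287]
mu0 = 75  # null value of the mean


def bootstrap_p_value(sample, mu0, B=1000, seed=0, alternative="greater"):
    """Return (t, p): the observed statistic and its bootstrap p-value."""
    rng = random.Random(seed)      # the seed is an arbitrary assumption
    n = len(sample)
    ybar = statistics.mean(sample)
    s = statistics.stdev(sample)
    t_obs = (ybar - mu0) / (s / math.sqrt(n))        # Step 1

    m_upper = 0   # number of t-hat values >= t_obs
    m_lower = 0   # number of t-hat values <= t_obs
    draws = 0
    while draws < B:                                 # Step 4
        resample = rng.choices(sample, k=n)          # Step 2
        s_star = statistics.stdev(resample)
        if s_star == 0:            # t-hat undefined for a constant resample
            continue
        # Step 3: mu0 is replaced by the original sample mean.
        t_hat = (statistics.mean(resample) - ybar) / (s_star / math.sqrt(n))
        if t_hat >= t_obs:
            m_upper += 1
        if t_hat <= t_obs:
            m_lower += 1
        draws += 1

    p_upper = m_upper / B                            # Steps 5 and 6
    p_lower = m_lower / B
    if alternative == "greater":
        return t_obs, p_upper
    if alternative == "less":
        return t_obs, p_lower
    return t_obs, min(2 * p_lower, 2 * p_upper)      # two-sided case


t_obs, p = bootstrap_p_value(data, mu0)
print(f"t = {t_obs:.3f}, bootstrap p-value = {p:.3f}")
```

Passing `alternative="less"` or any other string selects the lower-tail or two-sided rule described in the text.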
Solution The set of hypotheses that we want to test is
$$H_0: \mu \le 75 \quad \text{versus} \quad H_a: \mu > 75$$
Because there was a strong indication that the distribution of cotinine levels in the population of children under CPS supervision was not normal and because the sample size n was relatively small, the use of the t distribution to compute the p-value may result in a very erroneous decision based on the observed data. Therefore, we will use the bootstrap procedure. First, we calculate the value of the test statistic in the original data:
$$t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{105.75 - 75}{82.429/\sqrt{20}} = 1.668$$
Next, we use the 1,000 bootstrap samples generated in Example 5.18 to determine the number of samples, m, with $\hat{t} = (\bar{y}^* - 105.75)/(s^*/\sqrt{20})$ greater than 1.668. From the 1,000 values of $\hat{t}$, we find that m = 33 of the B = 1,000 values of $\hat{t}$ exceeded 1.668. Thus, our p-value = m/B = 33/1000 = .033 < .05 = $\alpha$. Therefore, we conclude that there is sufficient evidence that the mean cotinine level exceeds 75 in the population of children under CPS supervision.

It is interesting to note that if we had used the t distribution with 19 degrees of freedom to compute the p-value, the result would have produced a different conclusion. From Table 2 in the Appendix,
$$p\text{-value} = \Pr[t \ge 1.668] = .056 > .05 = \alpha$$
Using the t-tables, we would conclude there is insufficient evidence in the data to support the contention that the mean cotinine level exceeds 75. The small sample size, n = 20, and the possibility of nonnormal data would make this conclusion suspect.

Minitab Steps for Obtaining Bootstrap Sample
The steps needed to generate the bootstrap samples are relatively straightforward in most software programs. We will illustrate these steps using the Minitab software. Suppose we have a random sample of 25 observations from a population.
We want to generate 1,000 bootstrap samples, each consisting of 25 values randomly selected with replacement from the original 25 data values.

1. Insert the original 25 data values in column C1.
2. Choose Calc → Calculator.
   a. Select the expression Mean(C1).
   b. Place K1 in the "Store result in variable:" box.
   c. Select the expression STDEV(C1).
   d. Place K2 in the "Store result in variable:" box.
   e. The constants K1 and K2 now contain the mean and standard deviation of the original data.
3. Choose Calc → Random Data → Sample From Columns.
4. Fill in the menu with the following:
   a. Check the box Sample with Replacement.
   b. Store 1,000 rows from Columns C1.
   c. Store samples in: Columns C2.
5. Repeat the above steps by replacing C2 with C3.
6. Continue repeating the above step until 1,000 data values have been placed in each of columns C2–C26.
   a. The first row of columns C2–C26 represents Bootstrap Sample 1, the second row of columns C2–C26 represents Bootstrap Sample 2, . . . , row 1,000 represents Bootstrap Sample 1,000.
7. To obtain the mean and standard deviation of each of the 1,000 samples and store them in columns C27 and C28, respectively, follow these steps:
   a. Choose Calc → Row Statistics, then fill in the menu with
   b. Click on Mean.
   c. Input variables: C2–C26.
   d. Store result in: C27.
   e. Choose Calc → Row Statistics, then fill in the menu with
   f. Click on Standard Deviation.
   g. Input variables: C2–C26.
   h. Store result in: C28.

The 1,000 bootstrap sample means and standard deviations are now stored in C27 and C28. The sampling distribution of the sample mean and the t statistics can now be obtained from C27 and C28 by graphing the data in C27 using a histogram and calculating the 1,000 values of the t statistic $\hat{t} = (\bar{y}^* - \bar{y})/(s^*/\sqrt{n})$ using the following steps:

1. Choose Calc → Calculator.
2. Store results in C29.
3. In the Expression Box: (C27-K1)/(C28/sqrt(25)).

The 1,000 values of the t statistics are now stored in C29. Next, sort the data in C29 by the following steps:

1. Select Data → Sort.
2. Column C29.
3. By C29.
4. Click on Original Columns.

The percentiles and p-values can now be obtained from these sorted values.

5.9 Inferences about the Median
When the population distribution is highly skewed or very heavily tailed, the median is more appropriate than the mean as a representation of the center of the population. Furthermore, as was demonstrated in Section 5.7, the t procedures for constructing confidence intervals and for tests of hypotheses for the mean are not appropriate when applied to random samples from such populations with small sample sizes. In this section, we will develop a test of hypotheses and a confidence interval for the population median that will be appropriate for all types of population distributions.

The estimator of the population median M is based on the order statistics that were discussed in Chapter 3. Recall that if the measurements from a random sample of size n are given by $y_1, y_2, \ldots, y_n$, then the order statistics are these values ordered from smallest to largest. Let $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ represent the data in ordered fashion. Thus, $y_{(1)}$ is the smallest data value and $y_{(n)}$ is the largest data value. The estimator of the population median is the sample median $\hat{M}$. Recall that $\hat{M}$ is computed as follows:

If n is an odd number, then $\hat{M} = y_{(m)}$, where $m = (n + 1)/2$.
If n is an even number, then $\hat{M} = (y_{(m)} + y_{(m+1)})/2$, where $m = n/2$.

To take into account the variability of $\hat{M}$ as an estimator of M, we next construct a confidence interval for M. A confidence interval for the population median M may be obtained by using the binomial distribution with p = 0.5.

100(1 − α)% Confidence Interval for the Median
A confidence interval for M with level of confidence at least 100(1 − α)% is given by
$$(M_L, M_U) = \left(y_{(L_{\alpha/2})},\ y_{(U_{\alpha/2})}\right)$$
where
$$L_{\alpha/2} = C_{\alpha(2),n} + 1$$
$$U_{\alpha/2} = n - C_{\alpha(2),n}$$
Table 4 in the Appendix contains values for $C_{\alpha(2),n}$, which are percentiles from a binomial distribution with p = .5.
Because the confidence limits are computed using the binomial distribution, which is a discrete distribution, the level of confidence of $(M_L, M_U)$ will generally be somewhat larger than the specified 100(1 − α)%. The exact level of confidence is given by
$$\text{Level} = 1 - 2\Pr[\text{Bin}(n, .5) \le C_{\alpha(2),n}]$$
The following example will demonstrate the construction of the interval.

EXAMPLE 5.20
The sanitation department of a large city wants to investigate ways to reduce the amount of recyclable materials that are placed in the city's landfill. By separating the recyclable material from the remaining garbage, the city could prolong the life of the landfill site. More important, the number of trees that need to be harvested for paper products and the aluminum needed for cans could be greatly reduced. From an analysis of recycling records from other cities, it is determined that if the average weekly amount of recyclable material is more than 5 pounds per household, a commercial recycling firm could make a profit collecting the material. To determine the feasibility of the recycling plan, a random sample of 25 households is selected. The weekly weight of recyclable material (in pounds/week) for each household is given here.

14.2  5.3  2.9  4.2  1.2  4.3  1.1  2.6  6.7  7.8  25.9  43.8  2.7
 5.6  7.8  3.9  4.7  6.5  29.5  2.1  34.8  3.6  5.8  4.5  6.7

Determine an appropriate measure of the amount of recyclable waste from a typical household in the city.

[FIGURE 5.22(a) Boxplot for waste data. Axis: recyclable wastes (pounds per week)]
[FIGURE 5.22(b) Normal probability plot for waste data. Axes: recyclable waste (pounds per week) versus probability]
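As a hedged illustration of the boxed formula, the sketch below computes the sample median, the positions $L_{\alpha/2}$ and $U_{\alpha/2}$, the interval, and the exact level of confidence for the 25 household weights. It does not read Table 4; instead it assumes that $C_{\alpha(2),n}$ is the largest integer C with Pr[Bin(n, .5) ≤ C] ≤ α/2 and computes it directly. The function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math

# Weekly recyclable waste (pounds/week) for the 25 households in Example 5.20.
waste = [14.2, 5.3, 2.9, 4.2, 1.2, 4.3, 1.1, 2.6, 6.7, 7.8, 25.9, 43.8, 2.7,
         5.6, 7.8, 3.9, 4.7, 6.5, 29.5, 2.1, 34.8, 3.6, 5.8, 4.5, 6.7]


def binom_cdf_half(c, n):
    """Pr[Bin(n, .5) <= c], computed exactly from binomial coefficients."""
    return sum(math.comb(n, k) for k in range(c + 1)) / 2 ** n


def sample_median(sample):
    """Sample median: middle order statistic, or mean of the two middle ones."""
    y = sorted(sample)
    n = len(y)
    m = n // 2
    return y[m] if n % 2 else (y[m - 1] + y[m]) / 2


def median_ci(sample, alpha=0.05):
    """Return (L, U, (M_L, M_U), exact_level) for the population median."""
    y = sorted(sample)
    n = len(y)
    # Assumed definition of C: largest C with Pr[Bin(n, .5) <= C] <= alpha/2.
    C = -1
    while binom_cdf_half(C + 1, n) <= alpha / 2:
        C += 1
    if C < 0:
        raise ValueError("n is too small for the requested confidence level")
    L, U = C + 1, n - C                      # 1-based order-statistic positions
    level = 1 - 2 * binom_cdf_half(C, n)     # exact level of confidence
    return L, U, (y[L - 1], y[U - 1]), level


L, U, (m_lower, m_upper), level = median_ci(waste)
print(f"sample median = {sample_median(waste)}")        # 5.3
print(f"positions L = {L}, U = {U}")                    # 8 and 18
print(f"interval = ({m_lower}, {m_upper})")             # (3.9, 6.7)
print(f"exact level = {level:.4f}")                     # 0.9567
```

With n = 25 and α = .05 the computed value is C = 7, so the interval runs from the 8th to the 18th ordered observation, and its exact level is slightly above the nominal 95%, as the text anticipates.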
» Introduction 2 Why Study Statistics? 6
» Some Current Applications of Statistics 8 A Note to the Student 12
» Summary 13 Exercises 13 Lyman Ott Michael Longnecker
» Introduction and Abstract of Research Study 16 Observational Studies 18
» Sampling Designs for Surveys 24 Experimental Studies 30
» Designs for Experimental Studies 35 Research Study: Exit Polls versus Election Results 46
» Summary 47 Exercises 48 Lyman Ott Michael Longnecker
» Introduction and Abstract of Research Study 56 Calculators, Computers, and Software Systems 61
» Describing Data on a Single Variable: Measures of Variability 85 The Boxplot 97
» Summary and Key Formulas 116 Introduction and Abstract of Research Study 140
» Finding the Probability of an Event 144 Basic Event Relations and Probability Laws 146
» Conditional Probability and Independence 149 Bayes’ Formula 152
» Variables: Discrete and Continuous 155 Probability Distributions for Discrete Random Variables 157
» A Continuous Probability Distribution: The Normal Distribution 171 Random Sampling 178
» Sampling Distributions 181 Normal Approximation to the Binomial 191
» Minitab Instructions 201 Summary and Key Formulas 203
» Exercises 203 Introduction and Abstract of Research Study 222
» Estimation of m 225 Choosing the Sample Size for Estimating m 230
» A Statistical Test for m 232 Research Study: Effects of Oil Spill on Plant Growth 325
» Summary and Key Formulas 330 Introduction and Abstract of Research Study 360
» Summary and Key Formulas 386 Introduction and Abstract of Research Study 402
» Checking on the AOV Conditions 416 An Alternative Analysis: Transformations of the Data 421
» Summary and Key Formulas 436 Introduction and Abstract of Research Study 451
» Measuring Strength of Relation 528 Odds and Odds Ratios 530
» Summary and Key Formulas 545 Introduction and Abstract of Research Study 572
» Estimating Model Parameters 581 Inferences about Regression Parameters 590
» Predicting New y Values Using Regression 594 Examining Lack of Fit in Linear Regression 598
» The Inverse Regression Problem Calibration 605 Correlation 608
» Research Study: Two Methods for Detecting E. coli 616 Summary and Key Formulas 621
» Introduction and Abstract of Research Study 664 The General Linear Model 674
» Estimating Multiple Regression Coefficients 675 Inferences in Multiple Regression 683
» Testing a Subset of Regression Coefficients 691 Introduction and Abstract of Research Study 878
» The Extrapolation Problem 1023 Introduction and Abstract of Research Study 1091
» Exercises 1160 Lyman Ott Michael Longnecker
» Introduction Lyman Ott Michael Longnecker
» Why Study Statistics? Lyman Ott Michael Longnecker
» Some Current Applications of Statistics
» Introduction and Abstract of Research Study
» Observational Studies Lyman Ott Michael Longnecker
» Sampling Designs for Surveys
» Experimental Studies Lyman Ott Michael Longnecker
» Designs for Experimental Studies
» Research Study: Exit Polls versus Election Results
» Summary Lyman Ott Michael Longnecker
» Exercises Lyman Ott Michael Longnecker
» A prospective study is conducted to study the relationship between incidence of
» A study was conducted to examine the possible relationship between coronary disease
» A hospital introduces a new screening procedure to identify patients suffering from
» A high school mathematics teacher is convinced that a new software program will
» Do you think the two types of surveys will yield similar results on the percentage of
» What types of biases may be introduced into each of the surveys? Edu.
» Each name is randomly assigned a number. The names with numbers 1 through 1,000
» The Environmental Protection Agency EPA is required to inspect landfills in the
» factors b. factor levels blocks d. experimental unit measurement unit f. replications treatments
» A horticulturalist is measuring the vitamin C concentration in oranges in an orchard
» A medical specialist wants to compare two different treatments T
» In the design described in b make the following change. Within each hospital, the
» An experiment is planned to compare three types of schools—public, private-
» measurement unit f. replications
» treatments Lyman Ott Michael Longnecker
» The 48 treatments comprised 3, 4, and 4 levels of fertilizers N, P, and K, respectively,
» Ten different software packages were randomly assigned to 30 graduate students. The
» Four different glazes are applied to clay pots at two different thicknesses. The kiln
» There are two possible experimental designs. Design A would use a random sample
» When asked how the experiment is going, the researcher replies that one recipe smelled
» What information should be collected from the workers? Bus.
» How would you obtain a list of private doctors and medical facilities so that a sample
» What are some possible sources for determining the population growth and health
» How could you sample the population of health care facilities and types of private
» Calculators, Computers, and Software Systems
» Describing Data on a Single Variable: Graphical Methods
» Describing Data on a Single Variable:
» The Boxplot Lyman Ott Michael Longnecker
» Summarizing Data from More Than One Variable:
» Research Study: Controlling for Student Background
» Construct a pie chart for these dat b. Construct a bar chart for these dat
» If one of these 25 days were selected at random, what would be the chance probability
» Would you describe per capita income as being fairly homogenous across the
» Construct separate relative frequency histograms for the survival times of both the
» Compare the two histograms. Does the new therapy appear to generate a longer
» Plot the defense expenditures time-series data and describe any trends across the time
» Plot the four separate time series and describe any trends in the separate time
» Do the trends appear to imply a narrowing in the differences between male and
» Construct a relative frequency histogram plot for the homeownership data given in the
» How could Congress use the information in these plots for writing tax laws that allow
» Describing Data on a Single Variable: Measures of Central Tendency
» What does the relationship among the three measures of center indicate about the
» Which of the three measures would you recommend as the most appropriate repre-
» Using your results from parts a and b, comment on the relative sensitivity of the
» Which measure, mean or median, would you recommend as the most appropriate
» What measure or measures best summarize the center of these distributions?
» Describing Data on a Single Variable: Measures of Variability
» Verify that the mean years’ experience is 5 years. Does this value appear to ade-
» Verify that c. Calculate the sample variance and standard deviation for the experience data.
» Calculate the coefficient of variation CV for both the racer’s age and their years of
» Estimate the standard deviations for both the racer’s age and their years of experience
» Might another measure of variability be better to compare luxury and budget hotel
» Generate a time-series plot of the mercury concentrations and place lines for both
» Select the most appropriate measure of center for the mercury concentrations. Compare
» Compare the variability in mercury concentrations at the two sites. Use the CV in
» When comparing the center and variability of the two sites, should the years
» From these graphs, determine the median, lower quartile, and upper quartile for the
» Comment on the similarities and differences in the distributions of daily costs for the
» Compare the mean and median for the 3 years of dat Which value, mean or median,
» Compare the degree of variability in homeownership rates over the 3 years. Soc.
» Use this side-by-side boxplot to discuss changes in the median homeownership rate
» Use this side-by-side boxplot to discuss changes in the variation in these rates over the
» Construct the intervals Lyman Ott Michael Longnecker
» Why do you think the Empirical Rule and your percentages do not match well? Edu.
» Data analysts often find it easier to work with mound-shaped relative frequency his-
» Refer to the data of Exercise 3.49. a. Compute the sample median and the mode.
» Refer to the data of Exercise 3.49. a. Compute the interquartile range.
» Find the 20th percentile for the homeownership percentage and interpret this value
» Congress wants to designate those states that have the highest homeownership percent-
» Similarly identify those states that fall into the upper 10th percentile of homeownership
» Can the combined median be calculated from the medians for each number of members? Gov.
» The DJIA is a summary of data. Does the DJIA provide information about a popula-
» Interpret the values 62, 32.8, 41.3, and 7.8 in the upper left cell of the cross tabulation.
» Does there appear to be a difference in the relationships between the seizure counts
» Describe the type of apparent differences, if any, that you found in a.
» Predict the effect of removing the patient with ID 207 from the data set on the size of
» Using a computer program, compute the correlations with patient ID 207 removed
» Is there support for the same conclusions for the math scores as obtained for the read-
» If the conclusions are different, why do you suppose this has happened? Med.
» Why is it not possible to conclude that large relative values for minority and
» List several variables related to the teachers and students in the schools which may be
» Construct a scatterplot of the number of AIDS cases versus the number of tuberculo-
» Compute the correlation between the number of AIDS cases and the number of
» Why do you think there may be a correlation between these two diseases? Med.
» Compute the correlation between the number of syphilis cases and the number of
» Identify the states having number of tuberculosis cases that are above the 90th
» Compute the correlation coefficient for this data set. Is there a strong or weak rela-
» Finding the Probability of an Event
» Basic Event Relations and Probability Laws
» b. Lyman Ott Michael Longnecker
» Conditional Probability and Independence
» Bayes’ Formula Lyman Ott Michael Longnecker
» Variables: Discrete and Continuous
» Probability Distributions for Discrete Random Variables
» Two Discrete Random Variables:
» Probability Distributions for Continuous
» A Continuous Probability Distribution:
» Random Sampling Lyman Ott Michael Longnecker
» Sampling Distributions Lyman Ott Michael Longnecker
» Normal Approximation to the Binomial
» Evaluating Whether or Not a Population
» Research Study: Inferences about Performance-
» Minitab Instructions Lyman Ott Michael Longnecker
» The National Angus Association has stated that there is a 60
» The quality control section of a large chemical manufacturing company has under-
» A new blend of coffee is being contemplated for release by the marketing division of a
» The probability that a customer will receive a package the day after it was sent by a
» The sportscaster in College Station, Texas, states that the probability that the Aggies
» Let a two-digit number represent an individual running of the screening test. Which
» If we generate 2,000 sets of 20 two-digit numbers, how can the outcomes of this simula-
» The state consumers affairs office provided the following information on the frequency of
» What is the probability that a randomly selected car will need some repairs?
» Suppose you purchase a lottery ticket. What is the probability that your 3-digit number
» Which of the probability approaches subjective, classical, or relative frequency did
» In Exercise 4.10, assume that each one of the outcomes has probability 1
» A: Observing exactly 1 head b. B: Observing 1 or more heads
» C: Observing no heads 4.12 For Exercise 4.11:
» A: Observe a 6 b. B: Observe an odd number
» C: Observe a number greater than 3 d. D: Observe an even number and a number greater than 2
» Complement of A b. Either A or B
» A volunteer blood donor walks into a Red Cross Blood office. What is the probability
» It is brown. b. It is red or green.
» It is not blue. d. It is both red and brown. b.
» Refer to Exercise 4.11. a. Are the events A and B independent? Why or why not?
» Which pairs of the events A B, B C, and A C are mutually exclusive? Justify
» Describe in words the event T
» Compute the probability of the occurrence of the event T
» What is the probability that a professional selected at random would accept the
» What is the probability that a professional selected at random is part of a two-
» Refer to Exercise 4.23 a. Are events A and B independent?
» Find Find PA, PB, and PC. b. Find , , and .
» Find , , Lyman Ott Michael Longnecker
» Suppose two customers are chosen at random from the list of all customers. What is
» Find the probability that a customer selected at random will pay two consecutive
» Find the probability that a customer selected at random will pay neither of two
» Find the probability that a customer chosen at random will pay exactly one month in full.
» In Example 4.4, compute the probability that the test incorrectly identifies the defects D
» Find the probability that a patient truly did not have appendicitis given that the radio-
» The thickness of ice 20 feet from the shoreline in Lake Superior during a random day
» Is the number of cars running a red light during a given light cycle a discrete or
» Is the time between the light turning red and the last car passing through the inter-
» Are the number of students in class responding Strongly agree a continuous or discrete
» Are the percent of students in class responding Strongly agree a continuous or discrete
» Construct a graph of Py. b. Find .
» Find . d. Find . Lyman Ott Michael Longnecker
» Suppose the fire station must call for additional equipment from a neighboring city
» A biologist randomly selects 10 portions of water, each equal to .1 cm
» All 10 automobiles failed the inspection. b. Exactly 6 of the 10 failed the inspection.
» Six or more failed the inspection. d. All 10 passed the inspection.
» Two are rated as outstanding. b. Two or more are rated as outstanding.
» Py ⫽ 1 given m ⫽ 3.0 b. Py ⬎ 1 given m ⫽ 2.5
» No cars arrive. b. More than one car arrives.
» Write an expression for the probability that there are less than six sales, do not com-
» What assumptions are needed to write the expression in part a?
» Use a computer program if available to compute the exact probability that less than
» z ⫽ 0 and z ⫽ 1.6 b. z ⫽ 0 and z ⫽ 2.3
» Repeat Exercise 4.53 for these values: a. z ⫽ .7 and z ⫽ 1.7
» z ⫽ ⫺1.2 and z ⫽ 0 4.55 Repeat Exercise 4.53 for these values:
» z ⫽ ⫺1.29 and z ⫽ 0 b. z ⫽ ⫺.77 and z ⫽ 1.2
» Repeat Exercise 4.53 for these values: a. z ⫽ ⫺1.35 and z ⫽ ⫺.21
» z ⫽ ⫺.37 and z ⫽ ⫺1.20 4.57 Find the probability that z is greater than 1.75.
» Find the probability that z is less than 1.14. 4.59 Find a value for z, say z
» Find a value for z, say z , such that Pz ⬎ z
» Find a value for z, say z , such that P⫺z
» Py ⬎ 100 b. Py ⬎ 105 Lyman Ott Michael Longnecker
» P100 ⬍ y ⬍ 108 Lyman Ott Michael Longnecker
» P304 ⬍ y ⬍ 665 d. k such that P500 ⫺ k ⬍ y ⬍ 500 ⫹ k ⫽ .60
» Convert y ⬎ 85 to the z-score equivalent. c. Find Py ⬍ 115 and Py ⬎ 85.
» Find the value of z for these areas. a. an area .025 to the right of z
» an area .05 to the left of z
» Find the probability of observing a value of z greater than these values. a. 1.96
» 2.21 c. ⫺2.86 Lyman Ott Michael Longnecker
» ⫺0.73 Lyman Ott Michael Longnecker
» What is the probability that the elapsed time between submission and reimbursement
» If you had a travel voucher submitted more than 55 days ago, what might you
» Greater than 600 b. Greater than 700
» Less than 450 d. Between 450 and 600
» P(y < 200) b. P(y > 100)
» Using either a random number table or a computer program, generate a second ran-
» Give several reasons why you need to generate a different set of random numbers for
» Refer to Exercise 4.77. Describe the sampling distribution for the sample sum . Is it un-
» What fraction of the patients scored between 800 and 1,100? b. Less than 800?
» Greater than 1,200? So
» Use the Empirical Rule to describe the distribution of y, the number of patients
» If the facility was built with a 160-patient capacity, what fraction of the weeks might
» What proportion of the population spends more than 7 hours per day watching
» In a 1998 study of television viewing, a random sample of 500 adults reported that the
» If the EPA mandates that a nitrogen oxide level of 2.7 gm cannot be exceeded, what
» At most, 25% of Polluters exceed what nitrogen oxide level value (that is, find the
» The company producing the Polluter must reduce the nitrogen oxide level so that
» If a patient’s systolic readings during a given day have a normal distribution with
» If five measurements are taken at various times during the day, what is the probability
» How many measurements would be required so that the probability is at most 1 of
» The expected number of errors b. The probability of observing fewer than four errors
» The probability of observing more than two errors
» Compute the exact probabilities and corresponding normal approximations for y
» The normal approximation can be improved slightly by taking P(y
» Compute the exact probabilities and corresponding normal approximations with the
» Let y be a binomial random variable with n
» Calculate P(4
» Use a normal approximation without the continuity correction to calculate the
» Refer to Exercise 4.89. Use the continuity correction to compute the probability P(4
» Do the 45 data values appear to be a random sample from a normal
» Compute the correlation coefficient and p-value to assess whether the data appear to
» Use the 45 sample means to determine whether the sampling distribution of the
» Compute the correlation coefficient and p-value to assess whether the 45 means
» Use a normal quantile plot to assess whether the data appear to fit a normal
» Compute the correlation coefficient and p-value for the normal quantile plot.
» Find the probability of selecting a 1-foot-square sample of material at random that on
» Describe the sampling distribution for based on random samples of 15 1-foot sections.
» What percentage of the males in this age bracket could be expected to have a serum
» Estimation of μ
» Choosing the Sample Size for Estimating μ
» Choosing the Sample Size for Testing μ
» The Level of Significance of a Statistical Test
» Inferences about μ for a Normal Population,
» Inferences about μ When Population Is Nonnormal
» Research Study: Percent Calories from Fat
» What characteristics of the nurses other than dietary intake might be important in
» Calculate a 95% confidence interval for the mean caffeine content μ of the coffee pro-
» Explain to the CEO of the company in nonstatistical language, the interpretation of
» What would happen to the width of the confidence intervals if the level of confidence
» What would happen to the width of the confidence intervals if the number of samples
» If the level of confidence remains at 95% for the 720 confidence intervals in a given
» If the number of samples is increased from 50 to 100 each hour, how many of the
» If the number of samples remains at 50 each hour but the level of confidence is in-
» Construct a 99% confidence interval for the mean gross profit margin μ of all small
» The city manager reads the report and states that the confidence interval for μ con-
» If the level of confidence remains at 99% but the tolerable width of the interval is .4,
» If the level of confidence decreases to 95% but the specified width of the interval
» If the level of confidence increases to 99.5% but the specified width of the interval
» If the level of confidence is increased to 99% with the average rent estimated to
» Suppose the budget for the project will not support both increasing the level of confi-
» Using α = .05, what conclusions can you make about the hypotheses based on the
» Refer to Exercise 5.18. Sketch the power curve for rejecting H : μ ≥ 28 by determining
» Suppose we keep α = .05 but change to n = 20. Without actually recalculating the
» How many of the 100 tests of hypotheses resulted in your reaching the decision to
» Suppose you were to conduct 100 tests of hypotheses and in each of these tests the
» What type of error are you making if you incorrectly reject H ?
» What proportion of the 100 tests of hypotheses resulted in the correct decision, that
» In part a, you were estimating the power of the test when μ
» Based on your calculation in b how many of the 100 tests of hypotheses would you
» Did decreasing α from .05 to .01 increase or decrease the power of the test? Explain
» Refer to Exercise 5.23. Compute the power of the test PWR(μ
» Refer to Exercise 5.26. Suppose a random sample of 100 students is selected yielding
» Is there sufficient evidence (α = .05) in the data that the mean lead concentration
» Based on your answer in c, is the sample size large enough for the test procedures to
» Place a 95% confidence interval on the mean reading time for all incoming freshmen
» Plot the reading time using a normal probability plot or boxplot. Do the data appear
» What are some weak points in this study relative to evaluating the potential of the
» Place a 99% confidence interval on the average number of miles driven, μ, prior to
» Is there significant evidence (α = .01) that the manufacturer’s claim is false? What is
» Is there a contradiction between the interval estimate of μ and the conclusion
» Test the research hypothesis that the mean oxygen level is less than 5 ppm. What is
» Assuming the eighteen 2-week periods are fairly typical of the volumes throughout
» Construct a 90% confidence interval for the mean change in mileage. On the basis of
» Refer to Exercise 5.45. a. Calculate the probability of a Type II error for several values of μ
» Suggest some changes in the way in which this study in Exercise 5.45 was conducted.
» Use a computer program to obtain 1,000 bootstrap samples from the 20 comprehen-
» Use a computer program to obtain 1,000 bootstrap samples from the 15 tire wear
» Use a computer program to obtain 1,000 bootstrap samples from the 8 oxygen levels.
» Use a computer program to obtain 1,000 bootstrap samples from the 18 recycle vol-
» Compare the p-value from part a to the p-value obtained in Exercise 5.44. x
» Use Table 4 in the Appendix to obtain L
» Use the large-sample approximation to determine L
» Graph the data using a boxplot or normal probability plot and determine whether the
» Based on your answer to part a, is the mean or the median cost per household a
» Place a 95% confidence interval on the amount spent on health care by the typical
» Does the typical worker spend more than 400 per year on health care needs? Use
» Is there sufficient evidence that a blood-alcohol level of .1 causes any increase in
» Which summary of reaction time differences seems more appropriate, the mean or
» Is there sufficient evidence that the median difference in reaction time is greater than
» What other factors about the drivers are important in attempting to decide whether
» For both fund A and fund B, estimate the mean and median annual rate of return and
» Which of the parameters, the mean or median, do you think best represents the
» Is there sufficient evidence that the mean annual rate of return for the two mutual
» Suppose the population has a distribution that is highly skewed to the right. The
» When testing hypotheses about the mean or median of a highly skewed population, the
» When testing hypotheses about the mean or median of a lightly skewed population,
» Construct a 95% confidence interval on the mean time to handle a complaint after imple-
» Is there sufficient evidence that the incentive plan has reduced the mean time to handle
» Is there sufficient evidence that the mean mercury concentration has increased since
» Assuming that the standard deviation of the mercury concentration is .32 mgm
» If the mean number of days to birth beyond the due date was 13 days prior to the in-
» What factors may be important in explaining why the doctors’ projected due dates
» Using a graphical display, determine whether the data appear to be a random sample
» Estimate the mean dissolution rate for the batch of tablets, for both a point estimate
» Is there significant evidence that the batch of pills has a mean dissolution rate less
» Calculate the probability of a Type II error if the true dissolution rate is 19.6 mg. Bus.
» Estimate the typical amount of ore produced by the mine using both a point estimate
» Is there significant evidence that on a typical day the mine produces more than
» Inferences about
» A Nonparametric Alternative:
» Inferences about μ
» Choosing Sample Sizes for Inferences about
» Research Study: Effects of Oil Spill on Plant Growth
» Describe a method for randomly selecting the tracts where flora density measurements
» State several hypotheses that may be of interest to the researchers. H
» H : μ
» Refer to the data of Exercise 6.3. a. Give the level of significance for your test.
» Place a 95% confidence interval on μ
» Do the data provide sufficient evidence that rats exposed to a 5°C environment have a
» Do the data provide sufficient evidence that successful companies have a lower percent-
» How large is the difference between the percentage of returns for successful and unsuc-
» Identify the value of the pooled-variance t statistic the usual t test based on the equal
» Is there significant evidence that there is a difference in the distribution of SB2M for
» Discuss the implications of your findings in part c on the evaluation of the influence
» What has a greater effect, if any, on the level of significance of the t test, skewness or
» What has a greater effect, if any, on the level of significance of the Wilcoxon rank sum
» What has a greater effect, if any, on the power of the Wilcoxon rank sum test, skewness
» For what type of population distributions would you recommend using the Wilcoxon H H H
» Consider the data given here. Pair
» Conduct a paired t test of H
» Using a testing procedure related to the binomial distribution, test the
» Give the level of significance for your test. b. Place a 95% confidence interval on μ
» Plot the pairs of observations in a scatterplot with the 1982 values on the horizontal
» Compute the correlation coefficient between the pair of observations.
» Answer the questions posed in Exercise 6.11 parts a and b using a paired data
» Consider the data given in Exercise 6.23. a. Conduct a Wilcoxon signed-rank test of H
» Compare your conclusions here to those given in Exercise 6.23. Does it matter which
» Refer to the data of Exercise 6.31. a. Give the level of significance for your test.
» Place a 95% confidence interval on the median difference, M.
» Use the level and power values for the paired t test and Wilcoxon signed-rank test given in
» Which type of deviations from a normal distribution, skewness or heavy-tailedness,
» For small sample sizes, n ≤ 20, does the actual level of the Wilcoxon signed-rank test
» Suppose a boxplot of the differences in the pairs from a paired data set has many out-
» Suppose the sample sizes are the same for both groups. What sample size is needed
» Suppose the user group will have twice as many patients as the placebo group. What
» How many chemical plant cooling towers need to be measured if we want a probabil-
» What assumptions did you make in part a in order to compute the sample size? Env.
» Is there sufficient evidence to support the conjecture that ozone exposure increases
» Estimate the size of the increase in lung capacity after exposure to ozone using a 95%
» After completion of the study, the researcher claimed that ozone causes increased
» Estimate the size of the difference in mean noise level between the two types of jets
» How would you select the jets for inclusion in this study? Ag.
» An entomologist is investigating which of two fumigants, F
» Estimate the size of the difference in the mean number of parasites between the two
» After combining the data from the two depths, does there appear to be a difference in
» Estimate the size of the difference in the mean population abundance at the two
» Refer to Exercise 6.46. Answer the following questions using the combined data for both depths.
» Use the Wilcoxon rank sum test to assess whether there is a difference in population
» Discuss any differences in the conclusions obtained using the t-procedures and the
» Plot the four data sets using side-by-side boxplots to demonstrate the effect of depth
» Separately for each depth, evaluate differences between the sites within and outside
» Discuss the veracity of the following statement: “The oil spill did not adversely affect the
» A possible criticism of the study is that the six sites outside the oil trajectory were not
» What are some possible problems with using the before and after oil spill data in
» Compare the mean drop in blood pressure for the high-dose group and the control
» Estimate the size of the difference in the mean drop for the high-dose and control
» Do the conditions required for the statistical techniques used in a and b appear to
» Estimate the size of the difference in the mean drop for the low-dose and control
» Estimate the size of the difference in the mean drop for the low-dose and high-dose
» If we tested each of the three sets of hypotheses at the .05 level, estimate the experiment-
» Suggest a procedure by which we could be ensured that the experiment-wide Type I
» Can the licensing board conclude that the mean score of nurses who receive a BS in
» The mean test scores are considered to have a meaningful difference only if they differ
» State the null and alternative hypotheses in
» Estimate the size of the difference in campaign expenditures for female and male
» What is the level of significance of the test for a change in mean pH after reclamation
» The land office assessed a fine on the mining company because the t test indicated a
» A Type I error? b. A Type II error?
» Both a Type I and a Type II error? d. Neither a Type I nor a Type II error?
» Refer to Exercise 6.60. Suppose we wish to test the research hypothesis that μ
» Do the data support the conjecture that progabide reduces the mean number of
» Determine the sample size so that we are 95% confident that the estimate of the
» Estimation and Tests for a Population Variance
» Estimation and Tests for Comparing
» For the E. coli research study, answer the following. a. What are the populations of interest?
» What are some factors other than the type of detection method (HEC versus HGMF)
» Describe a method for randomly assigning the E. coli samples to the two devices for
» State several hypotheses that may be of interest to the researchers.
» Find P(y > 52.62). d. Find P(y < 10.52).
» Find P(y > 34.38). e. Find P(10.52 < y < 34.38).
» For a chi-square distribution with df = 80, compare the actual values given in Table 7
» Suppose that y has a chi-square distribution with df = 277. Find approximate values
» If the process yields jars having a normal distribution with a mean of 32.30 ounces and
» Does the plot suggest any violation of the conditions necessary to use the chi-square
» Place bounds on the p-value of the test. Engin.
» Does the boxplot suggest any violation of the conditions necessary to use the chi-square
» Estimate the standard deviation in the speeds of the vehicles on the interstate using a
» Do the data indicate at the 5% level that the standard deviation in vehicle speeds