Estimation and Tests for a Population Variance
[Plot: chi-square density against the value of chi-square]

FIGURE 7.3 Critical values of the chi-square distribution with df = 14 [plot of $f(\chi^2)$ against the value of chi-square: central area .95, area .025 in each tail, critical values 5.629 and 26.12]

FIGURE 7.4 Upper-tail and lower-tail values of chi-square [plot of $f(\chi^2)$ marking $\chi^2_L$ and $\chi^2_U$]

General Confidence Interval for $\sigma^2$ (or $\sigma$) with Confidence Coefficient $(1 - \alpha)$

$$\frac{(n-1)s^2}{\chi^2_U} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}$$

where $\chi^2_U$ is the upper-tail value of chi-square for $df = n - 1$ with area $\alpha/2$ to its right, and $\chi^2_L$ is the lower-tail value with area $\alpha/2$ to its left (see Figure 7.4). We can determine $\chi^2_U$ and $\chi^2_L$ for a specific value of df by obtaining the critical values in Table 7 of the Appendix corresponding to $\alpha/2$ and $1 - \alpha/2$, respectively. (Note: The confidence interval for $\sigma$ is found by taking square roots throughout.)

EXAMPLE 7.1
The machine that fills 500-gram coffee containers for a large food processor is monitored by the quality control department. Ideally, the amount of coffee in a container should vary only slightly about the nominal 500-gram value. If the variation was large, then a large proportion of the containers would be either underfilled, thus cheating the customer, or overfilled, thus resulting in economic loss to the company. The machine was designed so that the weights of the 500-gram containers would have a normal distribution with mean value of 506.6 grams and a standard deviation of 4 grams. This would produce a population of containers in which at most 5% of the containers weighed less than 500 grams. To maintain a population in which at most 5% of the containers are underweight, a random sample of 30 containers is selected every hour. These data are then used to determine whether the mean and standard deviation are maintained at their nominal values.
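The 5% design figure can be checked with a short calculation. The sketch below assumes the container weights follow the stated normal model exactly; the variable names are illustrative and do not come from the text.

```python
from statistics import NormalDist

# Design values from Example 7.1: mean 506.6 grams, standard deviation 4 grams.
design_weights = NormalDist(mu=506.6, sigma=4.0)

# Proportion of containers expected to weigh less than the nominal 500 grams.
prop_underweight = design_weights.cdf(500.0)

# z-score of the 500-gram limit, for comparison with a standard normal table.
z_limit = (500.0 - 506.6) / 4.0

print(f"z = {z_limit:.2f}, P(weight < 500) = {prop_underweight:.4f}")
```

The limit sits 1.65 standard deviations below the design mean, so the proportion underweight is just under .05, which is where the "at most 5%" statement comes from.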
The weights from one of the hourly samples are given here:

501.4  498.0  498.6  499.2  495.2  501.4  509.5  494.9  498.6  497.6
505.5  505.1  499.8  502.4  497.0  504.3  499.7  497.9  496.5  498.9
504.9  503.2  503.0  502.6  496.8  498.2  500.1  497.9  502.2  503.2

Estimate the mean and standard deviation in the weights of coffee containers filled during the hour in which the random sample of 30 containers was selected, using 99% confidence intervals.

Solution For these data, we find

$$\bar{y} = 500.453 \quad \text{and} \quad s = 3.433$$

To use our method for constructing a confidence interval for $\mu$ and $\sigma$, we must first check whether the weights are a random sample from a normal population. Figure 7.5 is a normal probability plot of the 30 weights. The 30 values fall near the straight line. Thus, the normality condition appears to be satisfied. The confidence coefficient for this example is $1 - \alpha = .99$. The upper-tail chi-square value can be obtained from Table 7 in the Appendix, for $df = n - 1 = 29$ and $\alpha/2 = .005$. Similarly, the lower-tail chi-square value is obtained from Table 7, with $1 - \alpha/2 = .995$. Thus,

$$\chi^2_L = 13.12 \quad \text{and} \quad \chi^2_U = 52.34$$

FIGURE 7.5 Normal probability plot of container weights [probability, .001 to .999, plotted against weight, 495 to 510 grams]

The 99% confidence interval for $\sigma$ is then

$$\sqrt{\frac{29(3.433)^2}{52.34}} < \sigma < \sqrt{\frac{29(3.433)^2}{13.12}}$$

or

$$2.56 < \sigma < 5.10$$

Thus, we are 99% confident that the standard deviation in the weights of coffee cans lies between 2.56 and 5.10 grams. The designed value for $\sigma$, 4 grams, falls within our confidence interval. Using our results from Chapter 5, a 99% confidence interval for $\mu$ is

$$500.453 \pm 2.756\,\frac{3.433}{\sqrt{30}} \quad \text{or} \quad 500.453 \pm 1.73$$

or

$$498.7 < \mu < 502.2$$

Thus, it appears the machine is underfilling the containers, because 506.6 grams does not fall within the confidence limits.

In addition to estimating a population variance, we can construct a statistical test of the null hypothesis that $\sigma^2$ equals a specified value, $\sigma_0^2$. This test procedure is summarized next.

Statistical Test for $\sigma^2$ (or $\sigma$)

$H_0$: 1. $\sigma^2 \le \sigma_0^2$    $H_a$: 1. $\sigma^2 > \sigma_0^2$
       2. $\sigma^2 \ge \sigma_0^2$           2. $\sigma^2 < \sigma_0^2$
       3. $\sigma^2 = \sigma_0^2$             3. $\sigma^2 \ne \sigma_0^2$

T.S.: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$

R.R.: For a specified value of $\alpha$,
1. Reject $H_0$ if $\chi^2$ is greater than $\chi^2_U$, the upper-tail value for $\alpha$ and $df = n - 1$.
2. Reject $H_0$ if $\chi^2$ is less than $\chi^2_L$, the lower-tail value for $1 - \alpha$ and $df = n - 1$.
3. Reject $H_0$ if $\chi^2$ is greater than $\chi^2_U$, based on $\alpha/2$ and $df = n - 1$, or less than $\chi^2_L$, based on $1 - \alpha/2$ and $df = n - 1$.

Check assumptions and draw conclusions.

EXAMPLE 7.2
New guidelines define persons as diabetic if results from their fasting plasma glucose test on two different days are 126 milligrams per deciliter (mg/dL) or higher. People who have a reading of between 110 and 125 are considered in danger of becoming diabetic as their ability to process glucose is impaired. These people should be tested more frequently and counseled about ways to lower their blood sugar level and reduce the risk of heart disease.

Amid sweeping changes in U.S. health care, the trend toward cost-effective self-care products used in the home emphasizes prevention and early intervention. The home test kit market is offering faster and easier products that lend themselves to being used in less-sophisticated environments to meet consumers' needs. A home blood sugar (glucose) test measures the level of glucose in your blood at the time of testing. The test can be done at home, or anywhere, using a small portable machine called a blood glucose meter. People who take insulin to control their diabetes may need to check their blood glucose level several times a day. Testing blood sugar at home is often called home blood sugar monitoring or self-testing.

Home glucose meters are not usually as accurate as laboratory measurement. Problems arise from the machines not being properly maintained and, more importantly, when the persons conducting the tests are the patients themselves, who may be quite elderly and in poor health. In order to evaluate the variability in readings from such devices, blood samples with a glucose level of 200 mg/dL are given to 20 diabetic patients to perform a self-test for glucose level. Trained technicians using the same self-test equipment obtain readings that have a standard deviation of 5 mg/dL. The manufacturer of the equipment claims that, with minimal instruction, anyone can obtain the same level of consistency in their measurements. The readings from the 20 diabetic patients are given here:

203.1  184.5  206.8  211.0  218.3  174.2  193.2  201.9  199.9  194.3
199.4  193.6  194.6  187.2  197.8  184.3  196.1  196.4  197.5  187.9

Use these data to determine whether there is sufficient evidence that the variability in readings from the diabetic patients is higher than the manufacturer's claim. Use $\alpha = .05$.

Solution The manufacturer claims that the diabetic patients should have a standard deviation of 5 mg/dL. The appropriate hypotheses are

$H_0$: $\sigma^2 \le 25$, that is, $\sigma \le 5$ (manufacturer's claim is correct)
$H_a$: $\sigma^2 > 25$, that is, $\sigma > 5$ (manufacturer's claim is false)

In order to apply our test statistic to these hypotheses, it is necessary to check whether the data appear to have been generated from a normally distributed population. From Figure 7.6, we observe that the plotted points fall relatively close to the straight line and that the p-value for testing normality is greater than .10. Thus, the normality condition appears to be satisfied. From the 20 data values, we compute the sample standard deviation $s = 9.908$. The test statistic and rejection regions are as follows:

T.S.: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{19(9.908)^2}{(5)^2} = 74.61$

R.R.: For $\alpha = .05$, the null hypothesis, $H_0$, is rejected if the value of the T.S. is greater than 30.14, obtained from Table 7 in the Appendix for $\alpha = .05$ and $df = n - 1 = 19$.

Conclusion: Since the computed value of the T.S., 74.61, is greater than the critical value 30.14, there is sufficient evidence to reject $H_0$, the manufacturer's claim, at the .05 level. In fact, the p-value of the T.S. is

$$p\text{-value} = P(\chi^2_{19} > 74.61) < P(\chi^2_{19} > 43.82) = .001$$

using Table 7 from the Appendix. Thus, there is very strong evidence that patients using the self-test for glucose may have larger variability in their readings than what the manufacturer claimed.
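The arithmetic of this test can be retraced in a few lines. The sketch below works from the summary values quoted in the example (n = 20, s = 9.908, a claimed standard deviation of 5) and the table critical value 30.14. The helper functions `chi2_pdf` and `chi2_upper_tail` are illustrative names, not part of any library; they approximate chi-square tail areas by numerical integration using only the Python standard library, which is an assumption made for portability rather than the method used in the text.

```python
import math

def chi2_pdf(x, df):
    """Chi-square density at x > 0 with df degrees of freedom."""
    half = df / 2.0
    log_density = ((half - 1.0) * math.log(x) - x / 2.0
                   - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_density)

def chi2_upper_tail(x, df, width=400.0, steps=40000):
    """Approximate P(chi-square > x) by Simpson's rule on [x, x + width].

    The area beyond x + width is ignored, which is negligible for the
    moderate df used here. steps must be even.
    """
    h = width / steps
    total = chi2_pdf(x, df) + chi2_pdf(x + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(x + i * h, df)
    return total * h / 3.0

# Summary values quoted in the example.
n, s, sigma0, alpha = 20, 9.908, 5.0, 0.05
df = n - 1

test_stat = df * s**2 / sigma0**2   # (n - 1) s^2 / sigma0^2
table_critical = 30.14              # table value for alpha = .05, df = 19
reject_h0 = test_stat > table_critical

# The first tail area should be close to alpha; the second is the p-value.
area_beyond_critical = chi2_upper_tail(table_critical, df)
p_value = chi2_upper_tail(test_stat, df)

print(f"chi-square = {test_stat:.2f}, reject H0: {reject_h0}")
print(f"area beyond 30.14 = {area_beyond_critical:.4f}, p-value = {p_value:.2e}")
```

The statistic reproduces 74.61, the area to the right of 30.14 comes out close to .05, and the tail area beyond 74.61 is far below .001, consistent with the table-based bound.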
In fact, to further assess the size of this standard deviation, a 95% confidence interval for $\sigma$ is given by

$$\left(\sqrt{\frac{(20-1)(9.908)^2}{32.85}},\ \sqrt{\frac{(20-1)(9.908)^2}{8.907}}\right) = (7.53,\ 14.47)$$

Therefore, the standard deviation in glucose measurements for the diabetic patients is potentially considerably higher than the standard deviation for the trained technicians.

FIGURE 7.6 Normal probability plot for glucose readings [percent, 1 to 99, plotted against reading, 170 to 220; Mean 196.2, StDev 9.908, N 20, RJ .983, P-value > .100]

The inference methods about $\sigma$ are based on the condition that the random sample is selected from a population having a normal distribution, similar to the requirements for using t distribution–based inference procedures. However, when sample sizes are moderate to large ($n \ge 30$), the t distribution–based procedures can be used to make inferences about $\mu$ even when the normality condition does not hold, because for moderate to large sample sizes the Central Limit Theorem provides that the sampling distribution of the sample mean is approximately normal. Unfortunately, the same type of result does not hold for the chi-square–based procedures for making inferences about $\sigma$; that is, if the population distribution is distinctly nonnormal, then these procedures for $\sigma$ are not appropriate even if the sample size is large. Population nonnormality, in the form of skewness or heavy tails, can have serious effects on the nominal significance and confidence probabilities for $\sigma$. If a boxplot or normal probability plot of the sample data shows substantial skewness or a substantial number of outliers, the chi-square–based inference procedures should not be applied. There are some alternative approaches that involve computationally elaborate inference procedures. One such procedure is the bootstrap. Bootstrapping is a technique that provides a simple and practical way to estimate the uncertainty in sample statistics like the sample variance. We can use bootstrap techniques to estimate the sampling distribution of the sample variance. The estimated sampling distribution is then manipulated to produce confidence intervals for $\sigma$ and rejection regions for tests of hypotheses about $\sigma$. Information about bootstrapping can be found in the books by Efron and Tibshirani (An Introduction to the Bootstrap, Chapman and Hall, New York, 1993) and by Manly (Randomization, Bootstrap and Monte Carlo Methods in Biology, Chapman and Hall, New York, 1998).

EXAMPLE 7.3
A simulation study was conducted to investigate the effect on the level of the chi-square test of sampling from heavy-tailed and skewed distributions rather than the required normal distribution. The five distributions were normal, uniform (short-tailed), t distribution with df = 5 (heavy-tailed), and two gamma distributions, one slightly skewed and the other heavily skewed. Some summary statistics about the distributions are given in Table 7.1.

TABLE 7.1 Summary statistics for distributions in simulation

                                  Distribution
Summary                                         Gamma          Gamma
Statistic    Normal   Uniform   t (df = 5)   (shape = 1)   (shape = .1)
Mean            0      17.32        0            10           3.162
Variance      100     100         100           100         100
Skewness        0       0           0             2           6.32
Kurtosis        3       1.8         9             9          63

Note that each of the distributions has the same variance, $\sigma^2 = 100$, but the skewness and kurtosis of the distributions vary. Skewness is a measure of lack of symmetry, and kurtosis is a measure of the peakedness or flatness of a distribution. From each of the distributions, 2,500 random samples of sizes 10, 20, and 50 were selected and a test of $H_0$: $\sigma^2 \le 100$ versus $H_a$: $\sigma^2 > 100$ and a test of $H_0$: $\sigma^2 \ge 100$ versus $H_a$: $\sigma^2 < 100$ were conducted using $\alpha = .05$ for both sets of hypotheses. A chi-square test of variance was performed for each of the 2,500 samples of the various sample sizes from each of the five distributions. The results are given in Table 7.2.
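The design of this study can be mimicked on a small scale. The sketch below is not the authors' simulation code: it covers only n = 10, the upper-tail test, and two of the five populations (normal, and gamma with shape 1, which is an exponential distribution with mean 10). It relies on Python's built-in random generators with an arbitrary seed, and the function names are hypothetical. The critical value 16.92 is the standard chi-square table value for an upper-tail area of .05 with df = 9.

```python
import random

def sample_variance(values):
    """Unbiased sample variance s^2 with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def rejection_rate(draw, n, upper_critical, sigma0_sq=100.0, reps=2500):
    """Fraction of samples for which H0: sigma^2 <= sigma0_sq is rejected."""
    rejections = 0
    for _ in range(reps):
        sample = [draw() for _ in range(n)]
        chi_sq = (n - 1) * sample_variance(sample) / sigma0_sq
        if chi_sq > upper_critical:
            rejections += 1
    return rejections / reps

random.seed(7)  # arbitrary seed so the run is repeatable

n = 10
upper_critical = 16.92  # chi-square table value for area .05 and df = 9

# Both populations have variance 100, so H0 is true in each case and every
# rejection is a Type I error.
rate_normal = rejection_rate(lambda: random.gauss(0.0, 10.0), n, upper_critical)
rate_gamma1 = rejection_rate(lambda: random.expovariate(0.1), n, upper_critical)

print(f"normal: {rate_normal:.3f}, gamma (shape = 1): {rate_gamma1:.3f}")
```

Because each rate is an estimate from 2,500 samples, repeated runs with different seeds will vary by roughly a percentage point, so agreement with published simulation results should be judged with that sampling error in mind.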
What do the results indicate about the sensitivity of the test to sampling from a nonnormal population?

TABLE 7.2 Proportion of times $H_0$ was rejected ($\alpha = .05$)

$H_a$: $\sigma^2 > 100$
Sample                    Distribution
Size       Normal   Uniform     t     Gamma (1)   Gamma (.1)
n = 10      .047     .004     .083      .134        .139
n = 20      .052     .006     .103      .139        .175
n = 50      .049     .004     .122      .156        .226

$H_a$: $\sigma^2 < 100$
Sample                    Distribution
Size       Normal   Uniform     t     Gamma (1)   Gamma (.1)
n = 10      .046     .018     .119      .202        .213
n = 20      .050     .011     .140      .213        .578
n = 50      .051     .018     .157      .220        .528

Solution The values in Table 7.2 are estimates of the probability of a Type I error, $\alpha$, for the chi-square test about variances. When the samples are taken from a normal population, the actual probabilities of a Type I error are very nearly equal to the nominal $\alpha = .05$ value. When the population distribution is symmetric with shorter tails than a normal distribution, the actual probabilities are smaller than .05, whereas for a symmetric distribution with heavy tails, the Type I error probabilities are much greater than .05. Also, for the two skewed distributions, the actual $\alpha$ values are much larger than the nominal .05 value. Furthermore, as the population distribution becomes more skewed, the deviation from .05 increases. From these results, there is strong evidence that the claimed $\alpha$ value of the chi-square test of a population variance is very sensitive to nonnormality. This strongly reinforces our recommendation to evaluate the normality of the data prior to conducting the chi-square test of a population variance.

7.3 Estimation and Tests for Comparing Two Population Variances

In the research study about E. coli detection methods, we are concerned about comparing the standard deviations of the two procedures. In many situations in which we are comparing two processes or two suppliers of a product, we need to compare the standard deviations of the populations associated with process measurements. Another major application of a test for the equality of two population variances is for evaluating the validity of the equal variance condition (that is, $\sigma_1^2 = \sigma_2^2$) for a two-sample t test. The test developed in this section requires that the two population distributions both have normal distributions. We are interested in comparing the variance of population 1, $\sigma_1^2$, to the variance of population 2, $\sigma_2^2$.

When random samples of sizes $n_1$ and $n_2$ have been independently drawn from two normally distributed populations, the ratio

$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{s_1^2/s_2^2}{\sigma_1^2/\sigma_2^2}$$

possesses a probability distribution in repeated sampling referred to as an F distribution. The formula for the probability distribution is omitted here, but we will specify its properties.

Properties of the F Distribution
1. Unlike t or z but like $\chi^2$, F can assume only positive values.
2. The F distribution, unlike the normal distribution or the t distribution but like the $\chi^2$ distribution, is nonsymmetrical. (See Figure 7.7.)
3. There are many F distributions, and each one has a different shape. We specify a particular one by designating the degrees of freedom associated with $s_1^2$ and $s_2^2$. We denote these quantities by $df_1$ and $df_2$, respectively. (See Figure 7.7.)
4. Tail values for the F distribution are tabulated and appear in Table 8 in the Appendix.

Table 8 in the Appendix records upper-tail values of F corresponding to areas $\alpha = .25$, .10, .05, .025, .01, .005, and .001. The degrees of freedom for $s_1^2$, designated by $df_1$, are indicated across the top of the table; $df_2$, the degrees of freedom for $s_2^2$, appear in the first column to the left. Values of $\alpha$ are given in the next column. Thus, for $df_1 = 5$ and $df_2 = 10$, the critical values of F corresponding to $\alpha = .25$, .10, .05, .025, .01, .005, and .001 are, respectively, 1.59, 2.52, 3.33, 4.24, 5.64, 6.78, and 10.48. It follows that only 5% of the measurements from an F distribution with $df_1 = 5$ and $df_2 = 10$ would exceed 3.33 in repeated sampling. (See Figure 7.8.) Similarly, for $df_1 = 24$ and $df_2 = 10$, the critical values of F corresponding to tail areas of $\alpha = .01$ and .001 are, respectively, 4.33 and 7.64.

FIGURE 7.7 Densities of two F distributions [F density plotted against the value of F, 0 to 10, for $df_1 = 5$, $df_2 = 10$ and for $df_1 = 10$, $df_2 = 20$]
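The "repeated sampling" reading of a Table 8 entry can be illustrated by simulation. The sketch below draws pairs of independent normal samples with sizes 6 and 11 (so that $df_1 = 5$ and $df_2 = 10$), forms the variance ratio defined earlier in this section, and counts how often it exceeds 3.33. The population standard deviations 3 and 7, the seed, the number of repetitions, and the function names are all arbitrary choices made for this illustration.

```python
import random

def sample_variance(values):
    """Unbiased sample variance s^2 with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def variance_ratio(n1, n2, sigma1=1.0, sigma2=1.0):
    """(s1^2 / sigma1^2) / (s2^2 / sigma2^2) for two independent normal samples."""
    sample1 = [random.gauss(0.0, sigma1) for _ in range(n1)]
    sample2 = [random.gauss(0.0, sigma2) for _ in range(n2)]
    scaled1 = sample_variance(sample1) / sigma1**2
    scaled2 = sample_variance(sample2) / sigma2**2
    return scaled1 / scaled2

random.seed(11)  # arbitrary seed so the run is repeatable

# df1 = n1 - 1 = 5 and df2 = n2 - 1 = 10.
n1, n2, reps = 6, 11, 20000
ratios = [variance_ratio(n1, n2, sigma1=3.0, sigma2=7.0) for _ in range(reps)]

# Tabulated upper-tail value for alpha = .05, df1 = 5, df2 = 10.
critical_05 = 3.33
prop_exceeding = sum(r > critical_05 for r in ratios) / reps

print(f"proportion of ratios above {critical_05}: {prop_exceeding:.3f}")
```

Every simulated ratio is positive, in line with property 1, and the proportion above 3.33 should land near .05 up to simulation error. Dividing each sample variance by its own population variance is what makes the result independent of the particular values chosen for the two standard deviations.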
» Introduction 2 Why Study Statistics? 6
» Some Current Applications of Statistics 8 A Note to the Student 12
» Summary 13 Exercises 13 Lyman Ott Michael Longnecker
» Introduction and Abstract of Research Study 16 Observational Studies 18
» Sampling Designs for Surveys 24 Experimental Studies 30
» Designs for Experimental Studies 35 Research Study: Exit Polls versus Election Results 46
» Summary 47 Exercises 48 Lyman Ott Michael Longnecker
» Introduction and Abstract of Research Study 56 Calculators, Computers, and Software Systems 61
» Describing Data on a Single Variable: Measures of Variability 85 The Boxplot 97
» Summary and Key Formulas 116 Introduction and Abstract of Research Study 140
» Finding the Probability of an Event 144 Basic Event Relations and Probability Laws 146
» Conditional Probability and Independence 149 Bayes’ Formula 152
» Variables: Discrete and Continuous 155 Probability Distributions for Discrete Random Variables 157
» A Continuous Probability Distribution: The Normal Distribution 171 Random Sampling 178
» Sampling Distributions 181 Normal Approximation to the Binomial 191
» Minitab Instructions 201 Summary and Key Formulas 203
» Exercises 203 Introduction and Abstract of Research Study 222
» Estimation of m 225 Choosing the Sample Size for Estimating m 230
» A Statistical Test for m 232 Research Study: Effects of Oil Spill on Plant Growth 325
» Summary and Key Formulas 330 Introduction and Abstract of Research Study 360
» Summary and Key Formulas 386 Introduction and Abstract of Research Study 402
» Checking on the AOV Conditions 416 An Alternative Analysis: Transformations of the Data 421
» Summary and Key Formulas 436 Introduction and Abstract of Research Study 451
» Measuring Strength of Relation 528 Odds and Odds Ratios 530
» Summary and Key Formulas 545 Introduction and Abstract of Research Study 572
» Estimating Model Parameters 581 Inferences about Regression Parameters 590
» Predicting New y Values Using Regression 594 Examining Lack of Fit in Linear Regression 598
» The Inverse Regression Problem Calibration 605 Correlation 608
» Research Study: Two Methods for Detecting E. coli 616 Summary and Key Formulas 621
» Introduction and Abstract of Research Study 664 The General Linear Model 674
» Estimating Multiple Regression Coefficients 675 Inferences in Multiple Regression 683
» Testing a Subset of Regression Coefficients 691 Introduction and Abstract of Research Study 878
» The Extrapolation Problem 1023 Introduction and Abstract of Research Study 1091
» Exercises 1160 Lyman Ott Michael Longnecker
» Introduction Lyman Ott Michael Longnecker
» Why Study Statistics? Lyman Ott Michael Longnecker
» Some Current Applications of Statistics
» Introduction and Abstract of Research Study
» Observational Studies Lyman Ott Michael Longnecker
» Sampling Designs for Surveys
» Experimental Studies Lyman Ott Michael Longnecker
» Designs for Experimental Studies
» Research Study: Exit Polls versus Election Results
» Summary Lyman Ott Michael Longnecker
» Exercises Lyman Ott Michael Longnecker
» A prospective study is conducted to study the relationship between incidence of
» A study was conducted to examine the possible relationship between coronary disease
» A hospital introduces a new screening procedure to identify patients suffering from
» A high school mathematics teacher is convinced that a new software program will
» Do you think the two types of surveys will yield similar results on the percentage of
» What types of biases may be introduced into each of the surveys? Edu.
» Each name is randomly assigned a number. The names with numbers 1 through 1,000
» The Environmental Protection Agency EPA is required to inspect landfills in the
» factors b. factor levels blocks d. experimental unit measurement unit f. replications treatments
» A horticulturalist is measuring the vitamin C concentration in oranges in an orchard
» A medical specialist wants to compare two different treatments T
» In the design described in b make the following change. Within each hospital, the
» An experiment is planned to compare three types of schools—public, private-
» measurement unit f. replications
» treatments Lyman Ott Michael Longnecker
» The 48 treatments comprised 3, 4, and 4 levels of fertilizers N, P, and K, respectively,
» Ten different software packages were randomly assigned to 30 graduate students. The
» Four different glazes are applied to clay pots at two different thicknesses. The kiln
» There are two possible experimental designs. Design A would use a random sample
» When asked how the experiment is going, the researcher replies that one recipe smelled
» What information should be collected from the workers? Bus.
» How would you obtain a list of private doctors and medical facilities so that a sample
» What are some possible sources for determining the population growth and health
» How could you sample the population of health care facilities and types of private
» Calculators, Computers, and Software Systems
» Describing Data on a Single Variable: Graphical Methods
» Describing Data on a Single Variable:
» The Boxplot Lyman Ott Michael Longnecker
» Summarizing Data from More Than One Variable:
» Research Study: Controlling for Student Background
» Construct a pie chart for these dat b. Construct a bar chart for these dat
» If one of these 25 days were selected at random, what would be the chance probability
» Would you describe per capita income as being fairly homogenous across the
» Construct separate relative frequency histograms for the survival times of both the
» Compare the two histograms. Does the new therapy appear to generate a longer
» Plot the defense expenditures time-series data and describe any trends across the time
» Plot the four separate time series and describe any trends in the separate time
» Do the trends appear to imply a narrowing in the differences between male and
» Construct a relative frequency histogram plot for the homeownership data given in the
» How could Congress use the information in these plots for writing tax laws that allow
» Describing Data on a Single Variable: Measures of Central Tendency
» What does the relationship among the three measures of center indicate about the
» Which of the three measures would you recommend as the most appropriate repre-
» Using your results from parts a and b, comment on the relative sensitivity of the
» Which measure, mean or median, would you recommend as the most appropriate
» What measure or measures best summarize the center of these distributions?
» Describing Data on a Single Variable: Measures of Variability
» Verify that the mean years’ experience is 5 years. Does this value appear to ade-
» Verify that c. Calculate the sample variance and standard deviation for the experience data.
» Calculate the coefficient of variation CV for both the racer’s age and their years of
» Estimate the standard deviations for both the racer’s age and their years of experience
» Might another measure of variability be better to compare luxury and budget hotel
» Generate a time-series plot of the mercury concentrations and place lines for both
» Select the most appropriate measure of center for the mercury concentrations. Compare
» Compare the variability in mercury concentrations at the two sites. Use the CV in
» When comparing the center and variability of the two sites, should the years
» From these graphs, determine the median, lower quartile, and upper quartile for the
» Comment on the similarities and differences in the distributions of daily costs for the
» Compare the mean and median for the 3 years of dat Which value, mean or median,
» Compare the degree of variability in homeownership rates over the 3 years. Soc.
» Use this side-by-side boxplot to discuss changes in the median homeownership rate
» Use this side-by-side boxplot to discuss changes in the variation in these rates over the
» Construct the intervals Lyman Ott Michael Longnecker
» Why do you think the Empirical Rule and your percentages do not match well? Edu.
» Data analysts often find it easier to work with mound-shaped relative frequency his-
» Refer to the data of Exercise 3.49. a. Compute the sample median and the mode.
» Refer to the data of Exercise 3.49. a. Compute the interquartile range.
» Find the 20th percentile for the homeownership percentage and interpret this value
» Congress wants to designate those states that have the highest homeownership percent-
» Similarly identify those states that fall into the upper 10th percentile of homeownership
» Can the combined median be calculated from the medians for each number of members? Gov.
» The DJIA is a summary of data. Does the DJIA provide information about a popula-
» Interpret the values 62, 32.8, 41.3, and 7.8 in the upper left cell of the cross tabulation.
» Does there appear to be a difference in the relationships between the seizure counts
» Describe the type of apparent differences, if any, that you found in a.
» Predict the effect of removing the patient with ID 207 from the data set on the size of
» Using a computer program, compute the correlations with patient ID 207 removed
» Is there support for the same conclusions for the math scores as obtained for the read-
» If the conclusions are different, why do you suppose this has happened? Med.
» Why is it not possible to conclude that large relative values for minority and
» List several variables related to the teachers and students in the schools which may be
» Construct a scatterplot of the number of AIDS cases versus the number of tuberculo-
» Compute the correlation between the number of AIDS cases and the number of
» Why do you think there may be a correlation between these two diseases? Med.
» Compute the correlation between the number of syphilis cases and the number of
» Identify the states having number of tuberculosis cases that are above the 90th
» Compute the correlation coefficient for this data set. Is there a strong or weak rela-
» Finding the Probability of an Event
» Basic Event Relations and Probability Laws
» b. Lyman Ott Michael Longnecker
» Conditional Probability and Independence
» Bayes’ Formula Lyman Ott Michael Longnecker
» Variables: Discrete and Continuous
» Probability Distributions for Discrete Random Variables
» Two Discrete Random Variables:
» Probability Distributions for Continuous
» A Continuous Probability Distribution:
» Random Sampling Lyman Ott Michael Longnecker
» Sampling Distributions Lyman Ott Michael Longnecker
» Normal Approximation to the Binomial
» Evaluating Whether or Not a Population
» Research Study: Inferences about Performance-
» Minitab Instructions Lyman Ott Michael Longnecker
» The National Angus Association has stated that there is a 60
» The quality control section of a large chemical manufacturing company has under-
» A new blend of coffee is being contemplated for release by the marketing division of a
» The probability that a customer will receive a package the day after it was sent by a
» The sportscaster in College Station, Texas, states that the probability that the Aggies
» Let a two-digit number represent an individual running of the screening test. Which
» If we generate 2,000 sets of 20 two-digit numbers, how can the outcomes of this simula-
» The state consumers affairs office provided the following information on the frequency of
» What is the probability that a randomly selected car will need some repairs?
» Suppose you purchase a lottery ticket. What is the probability that your 3-digit number
» Which of the probability approaches subjective, classical, or relative frequency did
» In Exercise 4.10, assume that each one of the outcomes has probability 1
» A: Observing exactly 1 head b. B: Observing 1 or more heads
» C: Observing no heads 4.12 For Exercise 4.11:
» A: Observe a 6 b. B: Observe an odd number
» C: Observe a number greater than 3 d. D: Observe an even number and a number greater than 2
» Complement of A b. Either A or B
» A volunteer blood donor walks into a Red Cross Blood office. What is the probability
» It is brown. b. It is red or green.
» It is not blue. d. It is both red and brown. b.
» Refer to Exercise 4.11. a. Are the events A and B independent? Why or why not?
» Which pairs of the events A B, B C, and A C are mutually exclusive? Justify
» Describe in words the event T
» Compute the probability of the occurrence of the event T
» What is the probability that a professional selected at random would accept the
» What is the probability that a professional selected at random is part of a two-
» Refer to Exercise 4.23 a. Are events A and B independent?
» Find Find PA, PB, and PC. b. Find , , and .
» Find , , Lyman Ott Michael Longnecker
» Suppose two customers are chosen at random from the list of all customers. What is
» Find the probability that a customer selected at random will pay two consecutive
» Find the probability that a customer selected at random will pay neither of two
» Find the probability that a customer chosen at random will pay exactly one month in full.
» In Example 4.4, compute the probability that the test incorrectly identifies the defects D
» Find the probability that a patient truly did not have appendicitis given that the radio-
» The thickness of ice 20 feet from the shoreline in Lake Superior during a random day
» Is the number of cars running a red light during a given light cycle a discrete or
» Is the time between the light turning red and the last car passing through the inter-
» Are the number of students in class responding Strongly agree a continuous or discrete
» Are the percent of students in class responding Strongly agree a continuous or discrete
» Construct a graph of Py. b. Find .
» Find . d. Find . Lyman Ott Michael Longnecker
» Suppose the fire station must call for additional equipment from a neighboring city
» A biologist randomly selects 10 portions of water, each equal to .1 cm
» All 10 automobiles failed the inspection. b. Exactly 6 of the 10 failed the inspection.
» Six or more failed the inspection. d. All 10 passed the inspection.
» Two are rated as outstanding. b. Two or more are rated as outstanding.
» Py ⫽ 1 given m ⫽ 3.0 b. Py ⬎ 1 given m ⫽ 2.5
» No cars arrive. b. More than one car arrives.
» Write an expression for the probability that there are less than six sales, do not com-
» What assumptions are needed to write the expression in part a?
» Use a computer program if available to compute the exact probability that less than
» z ⫽ 0 and z ⫽ 1.6 b. z ⫽ 0 and z ⫽ 2.3
» Repeat Exercise 4.53 for these values: a. z ⫽ .7 and z ⫽ 1.7
» z ⫽ ⫺1.2 and z ⫽ 0 4.55 Repeat Exercise 4.53 for these values:
» z ⫽ ⫺1.29 and z ⫽ 0 b. z ⫽ ⫺.77 and z ⫽ 1.2
» Repeat Exercise 4.53 for these values: a. z ⫽ ⫺1.35 and z ⫽ ⫺.21
» z ⫽ ⫺.37 and z ⫽ ⫺1.20 4.57 Find the probability that z is greater than 1.75.
» Find the probability that z is less than 1.14. 4.59 Find a value for z, say z
» Find a value for z, say z , such that Pz ⬎ z
» Find a value for z, say z , such that P⫺z
» Py ⬎ 100 b. Py ⬎ 105 Lyman Ott Michael Longnecker
» P100 ⬍ y ⬍ 108 Lyman Ott Michael Longnecker
» P304 ⬍ y ⬍ 665 d. k such that P500 ⫺ k ⬍ y ⬍ 500 ⫹ k ⫽ .60
» Convert y ⬎ 85 to the z-score equivalent. c. Find Py ⬍ 115 and Py ⬎ 85.
» Find the value of z for these areas. a. an area .025 to the right of z
» an area .05 to the left of z
» Find the probability of observing a value of z greater than these values. a. 1.96
» 2.21 c. ⫺2.86 Lyman Ott Michael Longnecker
» ⫺0.73 Lyman Ott Michael Longnecker
» What is the probability that the elapsed time between submission and reimbursement
» If you had a travel voucher submitted more than 55 days ago, what might you
» Greater than 600 b. Greater than 700
» Less than 450 d. Between 450 and 600
» P(y < 200) b. P(y > 100)
» Using either a random number table or a computer program, generate a second ran-
» Give several reasons why you need to generate a different set of random numbers for
» Refer to Exercise 4.77. Describe the sampling distribution for the sample sum . Is it un-
» What fraction of the patients scored between 800 and 1,100? b. Less than 800?
» Greater than 1,200? So
» Use the Empirical Rule to describe the distribution of y, the number of patients
» If the facility was built with a 160-patient capacity, what fraction of the weeks might
» What proportion of the population spends more than 7 hours per day watching
» In a 1998 study of television viewing, a random sample of 500 adults reported that the
» If the EPA mandates that a nitrogen oxide level of 2.7 gm cannot be exceeded, what
» At most, 25% of Polluters exceed what nitrogen oxide level value that is, find the
» The company producing the Polluter must reduce the nitrogen oxide level so that
» If a patient’s systolic readings during a given day have a normal distribution with
» If five measurements are taken at various times during the day, what is the probability
» How many measurements would be required so that the probability is at most 1% of
» The expected number of errors b. The probability of observing fewer than four errors
» The probability of observing more than two errors
» Compute the exact probabilities and corresponding normal approximations for y
» The normal approximation can be improved slightly by taking Py
» Compute the exact probabilities and corresponding normal approximations with the
» Let y be a binomial random variable with n
» Calculate P4
» Use a normal approximation without the continuity correction to calculate the
» Refer to Exercise 4.89. Use the continuity correction to compute the probability P4
» Do the 45 data values appear to be a random sample from a normal
» Compute the correlation coefficient and p-value to assess whether the data appear to
» Use the 45 sample means to determine whether the sampling distribution of the
» Compute the correlation coefficient and p-value to assess whether the 45 means
» Use a normal quantile plot to assess whether the data appear to fit a normal
» Compute the correlation coefficient and p-value for the normal quantile plot.
» Find the probability of selecting a 1-foot-square sample of material at random that on
» Describe the sampling distribution for based on random samples of 15 1-foot sections.
» What percentage of the males in this age bracket could be expected to have a serum
» Estimation of μ
» Choosing the Sample Size for Estimating μ
» Choosing the Sample Size for Testing μ
» The Level of Significance of a Statistical Test
» Inferences about μ for a Normal Population,
» Inferences about μ When Population Is Nonnormal
» Research Study: Percent Calories from Fat
» What characteristics of the nurses other than dietary intake might be important in
» Calculate a 95% confidence interval for the mean caffeine content μ of the coffee pro-
» Explain to the CEO of the company in nonstatistical language, the interpretation of
» What would happen to the width of the confidence intervals if the level of confidence
» What would happen to the width of the confidence intervals if the number of samples
» If the level of confidence remains at 95% for the 720 confidence intervals in a given
» If the number of samples is increased from 50 to 100 each hour, how many of the
» If the number of samples remains at 50 each hour but the level of confidence is in-
» Construct a 99% confidence interval for the mean gross profit margin μ of all small
» The city manager reads the report and states that the confidence interval for μ con-
» If the level of confidence remains at 99% but the tolerable width of the interval is .4,
» If the level of confidence decreases to 95% but the specified width of the interval
» If the level of confidence increases to 99.5% but the specified width of the interval
» If the level of confidence is increased to 99% with the average rent estimated to
» Suppose the budget for the project will not support both increasing the level of confi-
» Using α = .05, what conclusions can you make about the hypotheses based on the
» Refer to Exercise 5.18. Sketch the power curve for rejecting H0: μ ≥ 28 by determining
» Suppose we keep α = .05 but change to n = 20. Without actually recalculating the
» How many of the 100 tests of hypotheses resulted in your reaching the decision to
» Suppose you were to conduct 100 tests of hypotheses and in each of these tests the
» What type of error are you making if you incorrectly reject H0?
» What proportion of the 100 tests of hypotheses resulted in the correct decision, that
» In part a, you were estimating the power of the test when μ
» Based on your calculation in b how many of the 100 tests of hypotheses would you
» Did decreasing α from .05 to .01 increase or decrease the power of the test? Explain
» Refer to Exercise 5.23. Compute the power of the test PWRm
» Refer to Exercise 5.26. Suppose a random sample of 100 students is selected yielding
» Is there sufficient evidence (α = .05) in the data that the mean lead concentration
» Based on your answer in c, is the sample size large enough for the test procedures to
» Place a 95% confidence interval on the mean reading time for all incoming freshmen
» Plot the reading time using a normal probability plot or boxplot. Do the data appear
» What are some weak points in this study relative to evaluating the potential of the
» Place a 99% confidence interval on the average number of miles driven, μ, prior to
» Is there significant evidence (α = .01) that the manufacturer’s claim is false? What is
» Is there a contradiction between the interval estimate of μ and the conclusion
» Test the research hypothesis that the mean oxygen level is less than 5 ppm. What is
» Assuming the eighteen 2-week periods are fairly typical of the volumes throughout
» Construct a 90% confidence interval for the mean change in mileage. On the basis of
» Refer to Exercise 5.45. a. Calculate the probability of a Type II error for several values of μ
» Suggest some changes in the way in which this study in Exercise 5.45 was conducted.
» Use a computer program to obtain 1,000 bootstrap samples from the 20 comprehen-
» Use a computer program to obtain 1,000 bootstrap samples from the 15 tire wear
» Use a computer program to obtain 1,000 bootstrap samples from the 8 oxygen levels.
» Use a computer program to obtain 1,000 bootstrap samples from the 18 recycle vol-
» Compare the p-value from part a to the p-value obtained in Exercise 5.44.
» Use Table 4 in the Appendix to obtain L
» Use the large-sample approximation to determine L
» Graph the data using a boxplot or normal probability plot and determine whether the
» Based on your answer to part a, is the mean or the median cost per household a
» Place a 95% confidence interval on the amount spent on health care by the typical
» Does the typical worker spend more than $400 per year on health care needs? Use
» Is there sufficient evidence that a blood-alcohol level of .1 causes any increase in
» Which summary of reaction time differences seems more appropriate, the mean or
» Is there sufficient evidence that the median difference in reaction time is greater than
» What other factors about the drivers are important in attempting to decide whether
» For both fund A and fund B, estimate the mean and median annual rate of return and
» Which of the parameters, the mean or median, do you think best represents the
» Is there sufficient evidence that the mean annual rate of return for the two mutual
» Suppose the population has a distribution that is highly skewed to the right. The
» When testing hypotheses about the mean or median of a highly skewed population, the
» When testing hypotheses about the mean or median of a lightly skewed population,
» Construct a 95% confidence interval on the mean time to handle a complaint after imple-
» Is there sufficient evidence that the incentive plan has reduced the mean time to handle
» Is there sufficient evidence that the mean mercury concentration has increased since
» Assuming that the standard deviation of the mercury concentration is .32 mgm
» If the mean number of days to birth beyond the due date was 13 days prior to the in-
» What factors may be important in explaining why the doctors’ projected due dates
» Using a graphical display, determine whether the data appear to be a random sample
» Estimate the mean dissolution rate for the batch of tablets, for both a point estimate
» Is there significant evidence that the batch of pills has a mean dissolution rate less
» Calculate the probability of a Type II error if the true dissolution rate is 19.6 mg. Bus.
» Estimate the typical amount of ore produced by the mine using both a point estimate
» Is there significant evidence that on a typical day the mine produces more than
» Inferences about
» A Nonparametric Alternative:
» Inferences about μ
» Choosing Sample Sizes for Inferences about
» Research Study: Effects of Oil Spill on Plant Growth
» Describe a method for randomly selecting the tracts where flora density measurements
» State several hypotheses that may be of interest to the researchers.
» H : μ
» Refer to the data of Exercise 6.3. a. Give the level of significance for your test.
» Place a 95% confidence interval on μ
» Do the data provide sufficient evidence that rats exposed to a 5°C environment have a
» Do the data provide sufficient evidence that successful companies have a lower percent-
» How large is the difference between the percentage of returns for successful and unsuc-
» Identify the value of the pooled-variance t statistic the usual t test based on the equal
» Is there significant evidence that there is a difference in the distribution of SB2M for
» Discuss the implications of your findings in part c on the evaluation of the influence
» What has a greater effect, if any, on the level of significance of the t test, skewness or
» What has a greater effect, if any, on the level of significance of the Wilcoxon rank sum
» What has a greater effect, if any, on the power of the Wilcoxon rank sum test, skewness
» For what type of population distributions would you recommend using the Wilcoxon
» Consider the data given here. Pair
» Conduct a paired t test of H
» Using a testing procedure related to the binomial distribution, test the
» Give the level of significance for your test. b. Place a 95% confidence interval on μ
» Plot the pairs of observations in a scatterplot with the 1982 values on the horizontal
» Compute the correlation coefficient between the pair of observations.
» Answer the questions posed in Exercise 6.11 parts a and b using a paired data
» Consider the data given in Exercise 6.23. a. Conduct a Wilcoxon signed-rank test of H
» Compare your conclusions here to those given in Exercise 6.23. Does it matter which
» Refer to the data of Exercise 6.31. a. Give the level of significance for your test.
» Place a 95% confidence interval on the median difference, M.
» Use the level and power values for the paired t test and Wilcoxon signed-rank test given in
» Which type of deviations from a normal distribution, skewness or heavy-tailedness,
» For small sample sizes, n ≤ 20, does the actual level of the Wilcoxon signed-rank test
» Suppose a boxplot of the differences in the pairs from a paired data set has many out-
» Suppose the sample sizes are the same for both groups. What sample size is needed
» Suppose the user group will have twice as many patients as the placebo group. What
» How many chemical plant cooling towers need to be measured if we want a probabil-
» What assumptions did you make in part a in order to compute the sample size? Env.
» Is there sufficient evidence to support the conjecture that ozone exposure increases
» Estimate the size of the increase in lung capacity after exposure to ozone using a 95%
» After completion of the study, the researcher claimed that ozone causes increased
» Estimate the size of the difference in mean noise level between the two types of jets
» How would you select the jets for inclusion in this study? Ag.
» An entomologist is investigating which of two fumigants, F
» Estimate the size of the difference in the mean number of parasites between the two
» After combining the data from the two depths, does there appear to be a difference in
» Estimate the size of the difference in the mean population abundance at the two
» Refer to Exercise 6.46. Answer the following questions using the combined data for both depths.
» Use the Wilcoxon rank sum test to assess whether there is a difference in population
» Discuss any differences in the conclusions obtained using the t-procedures and the
» Plot the four data sets using side-by-side boxplots to demonstrate the effect of depth
» Separately for each depth, evaluate differences between the sites within and outside
» Discuss the veracity of the following statement: “The oil spill did not adversely affect the
» A possible criticism of the study is that the six sites outside the oil trajectory were not
» What are some possible problems with using the before and after oil spill data in
» Compare the mean drop in blood pressure for the high-dose group and the control
» Estimate the size of the difference in the mean drop for the high-dose and control
» Do the conditions required for the statistical techniques used in a and b appear to
» Estimate the size of the difference in the mean drop for the low-dose and control
» Estimate the size of the difference in the mean drop for the low-dose and high-dose
» If we tested each of the three sets of hypotheses at the .05 level, estimate the experiment-
» Suggest a procedure by which we could be ensured that the experiment-wide Type I
» Can the licensing board conclude that the mean score of nurses who receive a BS in
» The mean test scores are considered to have a meaningful difference only if they differ
» State the null and alternative hypotheses in
» Estimate the size of the difference in campaign expenditures for female and male
» What is the level of significance of the test for a change in mean pH after reclamation
» The land office assessed a fine on the mining company because the t test indicated a
» A Type I error? b. A Type II error?
» Both a Type I and a Type II error? d. Neither a Type I nor a Type II error?
» Refer to Exercise 6.60. Suppose we wish to test the research hypothesis that μ
» Do the data support the conjecture that progabide reduces the mean number of
» Determine the sample size so that we are 95% confident that the estimate of the
» Estimation and Tests for a Population Variance
» Estimation and Tests for Comparing
» For the E. coli research study, answer the following. a. What are the populations of interest?
» What are some factors other than the type of detection method HEC versus HGMF
» Describe a method for randomly assigning the E. coli samples to the two devices for
» State several hypotheses that may be of interest to the researchers.
» Find P(y > 52.62). d. Find P(y < 10.52).
» Find P(y > 34.38). e. Find P(10.52 < y < 34.38).
» For a chi-square distribution with df = 80, compare the actual values given in Table 7
» Suppose that y has a chi-square distribution with df = 277. Find approximate values
» If the process yields jars having a normal distribution with a mean of 32.30 ounces and
» Does the plot suggest any violation of the conditions necessary to use the chi-square
» Place bounds on the p-value of the test. Engin.
» Does the boxplot suggest any violation of the conditions necessary to use the chi-square
» Estimate the standard deviation in the speeds of the vehicles on the interstate using a
» Do the data indicate at the 5% level that the standard deviation in vehicle speeds
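The exercises on estimating a standard deviation use the chi-square interval from Example 7.1, and the arithmetic there can be checked numerically. What follows is a minimal Python sketch and not part of the text: the function name `sigma_confidence_interval` is hypothetical, and the critical values are typed in from Table 7 as quoted in the example (13.12 and 52.34 for df = 29) rather than computed. The summary values n = 30 and s = 3.433 also come from Example 7.1.

```python
import math


def sigma_confidence_interval(n, s, chi2_lower, chi2_upper):
    """Confidence interval for a population standard deviation.

    Assumes a random sample from a normal population.
    chi2_lower and chi2_upper are the lower-tail and upper-tail
    chi-square critical values for df = n - 1, read from a table.
    """
    df = n - 1
    sum_sq = df * s ** 2              # (n - 1) * s^2
    var_low = sum_sq / chi2_upper     # lower limit for sigma^2
    var_high = sum_sq / chi2_lower    # upper limit for sigma^2
    # Take square roots throughout to get limits for sigma.
    return math.sqrt(var_low), math.sqrt(var_high)


# Values quoted in Example 7.1 (99% confidence, df = 29).
low, high = sigma_confidence_interval(30, 3.433, 13.12, 52.34)
print(f"99% CI for sigma: ({low:.2f}, {high:.2f})")
# prints: 99% CI for sigma: (2.56, 5.10)
```

The limits agree with the interval of 2.56 to 5.10 grams reported in the example, and the designed value of 4 grams lies inside it.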