Compute the correlation coefficient for this data set. Is there a strong or weak rela-

CHAPTER 4 Probability and Probability Distributions

4.1 Introduction and Abstract of Research Study
4.2 Finding the Probability of an Event
4.3 Basic Event Relations and Probability Laws
4.4 Conditional Probability and Independence
4.5 Bayes’ Formula
4.6 Variables: Discrete and Continuous
4.7 Probability Distributions for Discrete Random Variables
4.8 Two Discrete Random Variables: The Binomial and the Poisson
4.9 Probability Distributions for Continuous Random Variables
4.10 A Continuous Probability Distribution: The Normal Distribution
4.11 Random Sampling
4.12 Sampling Distributions
4.13 Normal Approximation to the Binomial
4.14 Evaluating Whether or Not a Population Distribution Is Normal
4.15 Research Study: Inferences about Performance-Enhancing Drugs among Athletes
4.16 Minitab Instructions
4.17 Summary and Key Formulas
4.18 Exercises

4.1 Introduction and Abstract of Research Study

We stated in Chapter 1 that a scientist uses inferential statistics to make statements about a population based on information contained in a sample of units selected from that population. Graphical and numerical descriptive techniques were presented in Chapter 3 as a means to summarize and describe a sample. However, a sample is not identical to the population from which it was selected. We need to assess the degree of accuracy to which the sample mean, sample standard deviation, or sample proportion represents the corresponding population value.

Most management decisions must be made in the presence of uncertainty. Prices and designs for new automobiles must be selected on the basis of shaky forecasts of consumer preference, national economic trends, and competitive actions. The size and allocation of a hospital staff must be decided with limited information on patient load. The inventory of a product must be set in the face of uncertainty about demand. Probability is the language of uncertainty. Now let us examine probability, the mechanism for making inferences. This idea is probably best illustrated by an example.

Newsweek, in its June 20, 1998, issue, asks the question, “Who Needs Doctors? The Boom in Home Testing.” The article discusses the dramatic increase in medical screening tests for home use. The home-testing market has expanded beyond the two most frequently used tests, pregnancy and diabetes glucose monitoring, to a variety of diagnostic tests that were previously used only by doctors and certified laboratories. There is a DNA test to determine whether twins are fraternal or identical, a test to check cholesterol level, a screening test for colon cancer, and tests to determine whether your teenager is a drug user. However, the major question that needs to be addressed is, How reliable are the testing kits? When a test indicates that a woman is not pregnant, what is the chance that the test is incorrect and the woman is truly pregnant?
This type of incorrect result from a home test could translate into a woman not seeking the proper prenatal care in the early stages of her pregnancy.

Suppose a company states in its promotional materials that its pregnancy test provides correct results in 75% of its applications by pregnant women. We want to evaluate the claim, and so we select 20 women who have been determined by their physicians, using the best possible testing procedures, to be pregnant. The test is taken by each of the 20 women, and for all 20 women the test result is negative, indicating that none of the 20 is pregnant. What do you conclude about the company’s claim on the reliability of its test? Suppose you are further assured that each of the 20 women was in fact pregnant, as was determined several months after the test was taken.

If the company’s claim of 75% reliability was correct, we would have expected somewhere near 75% of the tests in the sample to be positive. However, none of the test results was positive. Thus, we would conclude that the company’s claim is probably false. Why did we fail to state with certainty that the company’s claim was false? Consider the possible setting. Suppose we have a large population consisting of millions of units, and 75% of the units are Ps for positives and 25% of the units are Ns for negatives. We randomly select 20 units from the population and count the number of units in the sample that are Ps. Is it possible to obtain a sample consisting of 0 Ps and 20 Ns? Yes, it is possible, but it is highly improbable. Later in this chapter we will compute the probability of such a sample occurrence.

To obtain a better view of the role that probability plays in making inferences from sample results to conclusions about populations, suppose the 20 tests result in 14 tests being positive, that is, a 70% correct response rate. Would you consider this result highly improbable and reject the company’s claim of a 75% correct response rate?
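As a rough preview of the kind of calculation involved, the sketch below assumes that the 20 test results are independent and that each is positive with probability 0.75, so that the number of positives follows a binomial distribution. This model and the helper names `binomial_pmf` and `binomial_cdf` are illustrative assumptions, not the method the text develops later in the chapter.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) when X counts successes in n independent trials,
    each with success probability p (assumed binomial model)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(X <= k): probability of k or fewer successes."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

n = 20    # women tested
p = 0.75  # company's claimed rate of correct (positive) results

# Probability that all 20 tests are negative if the claim is true.
p_zero = binomial_pmf(0, n, p)
print(f"P(0 positives) = {p_zero:.3e}")

# Probability of seeing a given number of positives or fewer.
for k in (12, 14, 16):
    print(f"P({k} or fewer positives) = {binomial_cdf(k, n, p):.4f}")
```

Under these assumptions the chance of 0 positives is 0.25 raised to the 20th power, which is less than one in a trillion, whereas 14 or fewer positives is not an unusual outcome. Comparing such probabilities is one way to judge how far a sample result can depart from the claim before the claim becomes doubtful.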
How about 12 positives and 8 negatives, or 16 positives and 4 negatives? At what point do we decide that the result of the observed sample is