Some Current Applications of Statistics

oxide into the atmosphere when burned. Here are some of the many effects of acid rain:
● Acid rain, when present in spring snow melts, invades breeding areas for many fish, which prevents successful reproduction. Forms of life that depend on ponds and lakes contaminated by acid rain begin to disappear.
● In forests, acid rain is blamed for weakening some varieties of trees, making them more susceptible to insect damage and disease.
● In areas surrounded by affected bodies of water, vital nutrients are leached from the soil.
● Man-made structures are also affected by acid rain. Experts from the United States estimate that acid rain has caused nearly $15 billion of damage to buildings and other structures thus far.

Solutions to the problems associated with acid rain will not be easy. The National Science Foundation (NSF) has recommended that we strive for a 50% reduction in sulfur-oxide emissions. Perhaps that is easier said than done. High-sulfur coal is a major source of these emissions, but in states dependent on coal for energy, a shift to lower-sulfur coal is not always possible. Instead, better scrubbers must be developed to remove these contaminating oxides from the burning process before they are released into the atmosphere. Fuels for internal combustion engines are also major sources of the nitric and sulfur oxides of acid rain. Clearly, better emission control is needed for automobiles and trucks.

Reducing the oxide emissions from coal-burning furnaces and motor vehicles will require greater use of existing scrubbers and emission control devices as well as the development of new technology to allow us to use available energy sources. Developing alternative, cleaner energy sources is also important if we are to meet the NSF’s goal. Statistics and statisticians will play a key role in monitoring atmospheric conditions, testing the effectiveness of proposed emission control devices, and developing new control technology and alternative energy sources.
Defining the Problem: Determining the Effectiveness of a New Drug Product

The development and testing of the Salk vaccine for protection against poliomyelitis (polio) provide an excellent example of how statistics can be used in solving practical problems. Most parents and children growing up before 1954 can recall the panic brought on by the outbreak of polio cases during the summer months. Although relatively few children fell victim to the disease each year, the pattern of outbreak of polio was unpredictable and caused great concern because of the possibility of paralysis or death. The fact that very few of today’s youth have even heard of polio demonstrates the great success of the vaccine and the testing program that preceded its release on the market.

It is standard practice in establishing the effectiveness of a particular drug product to conduct an experiment (often called a clinical trial) with human participants. For some clinical trials, assignments of participants are made at random, with half receiving the drug product and the other half receiving a solution or tablet that does not contain the medication (called a placebo). One statistical problem concerns the determination of the total number of participants to be included in the clinical trial. This problem was particularly important in the testing of the Salk vaccine because data from previous years suggested that the incidence rate for polio might be less than 50 cases for every 100,000 children. Hence, a large number of participants had to be included in the clinical trial in order to detect a difference in the incidence rates for those treated with the vaccine and those receiving the placebo.

With the assistance of statisticians, it was decided that a total of 400,000 children should be included in the Salk clinical trial begun in 1954, with half of them randomly assigned the vaccine and the remaining children assigned the placebo.
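To see why such a rare disease forces a very large trial, the sample-size reasoning above can be sketched with a standard normal-approximation formula for comparing two proportions. This is only an illustration and not the calculation the Salk trial statisticians actually performed. The baseline rate of 50 cases per 100,000 comes from the text; the assumed halving of incidence by the vaccine, the 5% significance level, the 90% power target, and the function names `n_per_group` and `approx_power` are all assumptions made for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def _standard_deviations(p_control, p_treated):
    """Per-observation SD of the rate difference under the null and alternative."""
    p_bar = (p_control + p_treated) / 2
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    return null_sd, alt_sd


def n_per_group(p_control, p_treated, alpha=0.05, power=0.90):
    """Approximate participants needed per group for a two-sided z-test
    comparing two incidence rates (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    null_sd, alt_sd = _standard_deviations(p_control, p_treated)
    n = ((z_alpha * null_sd + z_power * alt_sd) / (p_control - p_treated)) ** 2
    return ceil(n)


def approx_power(n, p_control, p_treated, alpha=0.05):
    """Approximate power of the same test with n participants per group."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    null_sd, alt_sd = _standard_deviations(p_control, p_treated)
    z = (abs(p_control - p_treated) * sqrt(n) - z_alpha * null_sd) / alt_sd
    return _STD_NORMAL.cdf(z)


if __name__ == "__main__":
    baseline = 50 / 100_000        # incidence rate suggested by earlier years (from the text)
    with_vaccine = baseline / 2    # ASSUMPTION: the vaccine halves the incidence rate

    print("Needed per group:", n_per_group(baseline, with_vaccine))
    print("Power with 200,000 per group:",
          round(approx_power(200_000, baseline, with_vaccine), 3))
```

Under these assumed inputs the formula calls for well over 100,000 children in each group, which is consistent with the scale of the trial described here. A smaller assumed vaccine effect or a lower baseline rate would push the required number higher still.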
No other clinical trial had ever been attempted on such a large group of participants. Through a public school inoculation program, the 400,000 participants were treated and then observed over the summer to determine the number of children contracting polio. Although fewer than 200 cases of polio were reported for the 400,000 participants in the clinical trial, more than three times as many cases appeared in the group receiving the placebo as in the group receiving the vaccine. These results, together with some statistical calculations, were sufficient to indicate the effectiveness of the Salk polio vaccine. However, these conclusions would not have been possible if the statisticians and scientists had not planned for and conducted such a large clinical trial.

The development of the Salk vaccine is not an isolated example of the use of statistics in the testing and developing of drug products. In recent years, the Food and Drug Administration (FDA) has placed stringent requirements on pharmaceutical firms to establish the effectiveness of proposed new drug products. Thus, statistics has played an important role in the development and testing of birth control pills, rubella vaccines, chemotherapeutic agents in the treatment of cancer, and many other preparations.

Defining the Problem: Use and Interpretation of Scientific Data in Our Courts

Liability suits related to consumer products have touched each one of us; you may have been involved as a plaintiff or defendant in a suit or you may know of someone who was involved in such litigation. Certainly we all help to fund the costs of this litigation indirectly through increased insurance premiums and increased costs of goods. The testimony in liability suits concerning a particular product (automobile, drug product, and so on) frequently leans heavily on the interpretation of data from one or more scientific studies involving the product. This is how and why statistics and statisticians have been pulled into the courtroom.
For example, epidemiologists have used statistical concepts applied to data to determine whether there is a statistical “association” between a specific characteristic, such as the leakage in silicone breast implants, and a disease condition, such as an autoimmune disease. An epidemiologist who finds an association should try to determine whether the observed statistical association from the study is due to random variation or whether it reflects an actual association between the characteristic and the disease. Courtroom arguments about the interpretations of these types of associations involve data analyses using statistical concepts as well as a clinical interpretation of the data.

Many other examples exist in which statistical models are used in court cases. In salary discrimination cases, a lawsuit is filed claiming that an employer underpays employees on the basis of age, ethnicity, or sex. Statistical models are developed to explain salary differences based on many factors, such as work experience, years of education, and work performance. The adjusted salaries are then compared across age groups or ethnic groups to determine whether significant salary differences exist after adjusting for the relevant work performance factors.

Defining the Problem: Estimating Bowhead Whale Population Size

Raftery and Zeh (1998) discuss the estimation of the population size and rate of increase in bowhead whales (Balaena mysticetus). The importance of such a study derives from the fact that bowheads were the first species of great whale for which commercial whaling was stopped; thus, their status indicates the recovery prospects of other great whales. Also, the International Whaling Commission uses these estimates to determine the aboriginal subsistence whaling quota for Alaskan Eskimos. To obtain the necessary data, researchers conducted a visual and acoustic census off Point Barrow, Alaska.
The researchers then applied statistical models and estimation techniques to the data obtained in the census to determine whether the bowhead population had increased or decreased since commercial whaling was stopped. The statistical estimates showed that the bowhead population was increasing at a healthy rate, indicating that stocks of great whales that have been decimated by commercial hunting can recover after hunting is discontinued.

Defining the Problem: Ozone Exposure and Population Density

Ambient ozone pollution in urban areas is one of the nation’s most pervasive environmental problems. Whereas the decreasing stratospheric ozone layer may lead to increased instances of skin cancer, high ambient ozone intensity has been shown to cause damage to the human respiratory system as well as to agricultural crops and trees. The Houston, Texas, area has ozone concentrations, rated second only to those of Los Angeles, that exceed the National Ambient Air Quality Standard. Carroll et al. (1997) describe how to analyze the hourly ozone measurements collected in Houston from 1980 to 1993 by 9 to 12 monitoring stations. Besides the ozone level, each station also recorded three meteorological variables: temperature, wind speed, and wind direction.

The statistical aspect of the project had three major goals:
1. Provide information (and/or tools to obtain such information) about the amount and pattern of missing data, as well as about the quality of the ozone and the meteorological measurements.
2. Build a model of ozone intensity to predict the ozone concentration at any given location within Houston at any given time between 1980 and 1993.

3. Apply this model to estimate exposure indices that account for either a long-term exposure or a short-term high-concentration exposure; also, relate census information to different exposure indices to achieve population exposure indices.

The spatial–temporal model the researchers built provided estimates demonstrating that the highest ozone levels occurred at locations with relatively small populations of young children. Also, the model estimated that the exposure of young children to ozone decreased by approximately 20% from 1980 to 1993. An examination of the distribution of population exposure had several policy implications. In particular, it was concluded that the current placement of monitors is not ideal if one is concerned with assessing population exposure. This project involved all four components of Learning from Data: planning where the monitoring stations should be placed within the city, how often data should be collected, and what variables should be recorded; conducting spatial–temporal graphing of the data; creating spatial–temporal models of the ozone data, meteorological data, and demographic data; and finally, writing a report that could assist local and federal officials in formulating policy with respect to decreasing ozone levels.

Defining the Problem: Assessing Public Opinion

Public opinion, consumer preference, and election polls are commonly used to assess the opinions or preferences of a segment of the public for issues, products, or candidates of interest. We, the American public, are exposed to the results of these polls daily in newspapers, in magazines, on the radio, and on television.
For example, the results of polls related to the following subjects were printed in local newspapers over a 2-day period:
● Consumer confidence related to future expectations about the economy
● Preferences for candidates in upcoming elections and caucuses
● Attitudes toward cheating on federal income tax returns
● Preference polls related to specific products (for example, foreign vs. American cars, Coke vs. Pepsi, McDonald’s vs. Wendy’s)
● Reactions of North Carolina residents toward arguments about the morality of tobacco
● Opinions of voters toward proposed tax increases and proposed changes in the Defense Department budget

A number of questions can be raised about polls. Suppose we consider a poll on the public’s opinion toward a proposed income tax increase in the state of Michigan. What was the population of interest to the pollster? Was the pollster interested in all residents of Michigan or just those citizens who currently pay income taxes? Was the sample in fact selected from this population? If the population of interest was all persons currently paying income taxes, did the pollster make sure that all the individuals sampled were current taxpayers? What questions were asked and how were the questions phrased? Was each person asked the same question? Were the questions phrased in such a manner as to bias the responses? Can we believe the results of these polls? Do these results “represent” how the general public currently feels about the issues raised in the polls?

Opinion and preference polls are an important, visible application of statistics for the consumer. We will discuss this topic in more detail in Chapter 10. We hope that after studying this material you will have a better understanding of how to interpret the results of these polls.

1.4 A Note to the Student

We think with words and concepts. A study of the discipline of statistics requires us to memorize new terms and concepts, as does the study of a foreign language. Commit these definitions, theorems, and concepts to memory. Also, focus on the broader concept of making sense of data. Do not let details obscure these broader characteristics of the subject. The teaching objective of this text is to identify and amplify these broader concepts of statistics.

1.5 Summary

The discipline of statistics and those who apply the tools of that discipline deal with Learning from Data. Medical researchers, social scientists, accountants, agronomists, consumers, government leaders, and professional statisticians are all involved with data collection, data summarization, data analysis, and the effective communication of the results of data analysis.

1.6 Exercises

1.1 Introduction

Bio. 1.1 Selecting the proper diet for shrimp or other sea animals is an important aspect of sea farming. A researcher wishes to estimate the mean weight of shrimp maintained on a specific diet for a period of 6 months. One hundred shrimp are randomly selected from an artificial pond and each is weighed.
a. Identify the population of measurements that is of interest to the researcher.
b. Identify the sample.
c. What characteristics of the population are of interest to the researcher?
d. If the sample measurements are used to make inferences about certain characteristics of the population, why is a measure of the reliability of the inferences important?

Env. 1.2 Radioactive waste disposal as well as the production of radioactive material in some mining operations are creating a serious pollution problem in some areas of the United States. State health officials have decided to investigate the radioactivity levels in one suspect area. Two hundred points in the area are randomly selected and the level of radioactivity is measured at each point. Answer questions a, b, c, and d in Exercise 1.1 for this sampling situation.

Soc. 1.3 A social researcher in a particular city wishes to obtain information on the number of children in households that receive welfare support. A random sample of 400 households is selected from the city welfare rolls. A check on welfare recipient data provides the number of children in each household. Answer questions a, b, c, and d in Exercise 1.1 for this sample survey.

Gov. 1.4 Because of a recent increase in the number of neck injuries incurred by high school football players, the Department of Commerce designed a study to evaluate the strength of football helmets worn by high school players in the United States. A total of 540 helmets were collected from the five companies that currently produce helmets.
The agency then sent the helmets to an independent testing agency to evaluate the impact cushioning of the helmet and the amount of shock transmitted to the neck when the face mask was twisted.
a. What is the population of interest?
b. What is the sample?
c. What variables should be measured?
d. What are some of the major limitations of this study in regard to the safety of helmets worn by high school players? For example, is the neck strength of the player related to the amount of shock transmitted to the neck and whether the player will be injured?

Pol. Sci. 1.5 During the 2004 senatorial campaign in a large southwestern state, illegal immigration was a major issue. One of the candidates argued that illegal immigrants made use of educational and social services without having to pay property taxes. The other candidate pointed out that the cost of new homes in their state was 20%–30% less than the national average due to the low wages received by the large number of illegal immigrants working on new home construction. A random sample of 5,000 registered voters was asked the question, “Are illegal immigrants generally a benefit or a liability to the state’s economy?” The results were as follows: 3,500 people responded “liability,” 1,500 people responded “benefit,” and 500 people responded “uncertain.”
a. What is the population of interest?
b. What is the population from which the sample was selected?