What is the population of interest?
b. What is the population from which the sample was selected?

c. Does the sample adequately represent the population?
d. If a second random sample of 5,000 registered voters was selected, would the results be nearly the same as the results obtained from the initial sample of 5,000 voters? Explain your answer.

Edu. 1.6 An American History professor at a major university is interested in knowing the history literacy of college freshmen. In particular, he wants to find what proportion of college freshmen at the university knew which country controlled the original 13 states prior to the American Revolution. The professor sent a questionnaire to all freshman students enrolled in HIST 101 and received responses from 318 students out of the 7,500 students who were sent the questionnaire. One of the questions was, “What country controlled the original 13 states prior to the American Revolution?”
a. What is the population of interest to the professor?
b. What is the sampled population?
c. Is there a major difference in the two populations? Explain your answer.
d. Suppose that several lectures on the American Revolution had been given in HIST 101 prior to the students receiving the questionnaire. What possible source of bias has the professor introduced into the study relative to the population of interest?

PART 2 Collecting Data

CHAPTER 2 Using Surveys and Experimental Studies to Gather Data

2.1 Introduction and Abstract of Research Study

2.2 Observational Studies

2.3 Sampling Designs for Surveys

2.4 Experimental Studies

2.5 Designs for Experimental Studies

2.6 Research Study: Exit Polls versus Election Results

2.7 Summary

2.8 Exercises

2.1 Introduction and Abstract of Research Study

As mentioned in Chapter 1, the first step in Learning from Data is to define the problem. The design of the data collection process is the crucial step in intelligent data gathering. The process takes a conscious, concerted effort focused on the following steps:
● Specifying the objective of the study, survey, or experiment
● Identifying the variables of interest
● Choosing an appropriate design for the survey or experimental study
● Collecting the data

To specify the objective of the study, you must understand the problem being addressed. For example, the transportation department in a large city wants to assess the public’s perception of the city’s bus system in order to increase the use of buses within the city. Thus, the department needs to determine what aspects of the bus system affect whether or not a person will ride the bus. The objective of the study is to identify factors that the transportation department can alter to increase the number of people using the bus system.

To identify the variables of interest, you must examine the objective of the study. For the bus system, some major factors can be identified by reviewing studies conducted in other cities and by brainstorming with the bus system employees. Some of the factors may be safety, cost, cleanliness of the buses, whether or not there is a bus stop close to the person’s home or place of employment, and how often the bus fails to be on time. The measurements to be obtained in the study would consist of importance ratings (very important, important, no opinion, somewhat unimportant, very unimportant) of the identified factors. Demographic information, such as age, sex, income, and place of residence, would also be measured. Finally, it would be important to measure variables related to how frequently a person currently rides the buses.

Once the objectives are determined and the variables of interest are specified, you must select the most appropriate method to collect the data.
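The importance ratings described above form an ordered (ordinal) five-level scale. As a minimal sketch, assuming each respondent's answers are stored as a mapping from factor to rating, the Python below codes the labels 1 to 5 in order of importance and summarizes one factor at a time. The factor names come from the bus-system example; the responses and the helper names `tally`, `share_very_important`, and `median_rating` are hypothetical, for illustration only.

```python
from collections import Counter
from statistics import median_low

# Five-level importance scale from the bus-system study, listed from least to
# most important so that the integer codes 1..5 preserve the ordering.
SCALE = [
    "very unimportant",
    "somewhat unimportant",
    "no opinion",
    "important",
    "very important",
]
CODE = {label: i + 1 for i, label in enumerate(SCALE)}

# Hypothetical responses (illustrative only, not survey data): one dict per
# respondent, mapping each factor to the rating that respondent chose.
responses = [
    {"safety": "very important", "cost": "important", "cleanliness": "no opinion"},
    {"safety": "important", "cost": "very important", "cleanliness": "somewhat unimportant"},
    {"safety": "very important", "cost": "very important", "cleanliness": "important"},
]


def tally(responses, factor):
    """Count respondents choosing each rating for one factor, in scale order."""
    counts = Counter(r[factor] for r in responses)
    return [(label, counts[label]) for label in SCALE]


def share_very_important(responses, factor):
    """Proportion of respondents who rated the factor 'very important'."""
    return sum(r[factor] == "very important" for r in responses) / len(responses)


def median_rating(responses, factor):
    """Median rating via the ordinal codes (lower middle value if the count is even)."""
    codes = [CODE[r[factor]] for r in responses]
    return SCALE[median_low(codes) - 1]


for factor in ("safety", "cost", "cleanliness"):
    print(factor, share_very_important(responses, factor), median_rating(responses, factor))
```

The ordering of `SCALE` is what lets `median_rating` compare ratings. Demographic information and riding frequency could be stored as further fields in each respondent's record.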
Data collection processes include surveys, experiments, and the examination of existing data from business records, censuses, government records, and previous studies. The theory of sample surveys and the theory of experimental designs provide excellent methodology for data collection. Usually surveys are passive. The goal of a survey is to gather data on existing conditions, attitudes, or behaviors. Thus, the transportation department would need to construct a questionnaire and then sample current riders of the buses and persons who use other forms of transportation within the city.

Experimental studies, on the other hand, tend to be more active: The person conducting the study varies the experimental conditions to study the effect of the conditions on the outcome of the experiment. For example, the transportation department could decrease the bus fares on a few selected routes and assess whether the use of its buses increased. However, in this example, other factors not under the bus system’s control may also have changed during this time period. Thus, an increase in bus use may have taken place because of a strike of subway workers or an increase in gasoline prices. The decrease in fares was only one of several factors that may have “caused” the increase in the number of persons riding the buses.

In most experimental studies, as many as possible of the factors that affect the measurements are under the control of the experimenter. For example, a floriculturist wants to determine the effect of a new plant stimulator on the growth of a commercially produced flower. The floriculturist would run the experiments in a greenhouse, where temperature, humidity, moisture levels, and sunlight are controlled. An equal number of plants would be treated with each of the selected quantities of the growth stimulator, including a control (that is, no stimulator applied). At the conclusion of the experiment, the size and health of the plants would be measured.
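The text does not say how the plants are matched to the stimulator quantities. Assuming they are assigned at random, a common way to keep the experimenter's choices from influencing which plants receive which treatment, the equal allocation in the floriculture example can be sketched in Python as follows. The function name `assign_treatments`, the plant labels, and the four stimulator quantities are hypothetical.

```python
import random


def assign_treatments(plant_ids, levels, seed=None):
    """Randomly assign plants to stimulator levels, the same number per level.

    `levels` should include the control (no stimulator). The number of plants
    must be a multiple of the number of levels so each level gets equal plants.
    """
    if len(plant_ids) % len(levels) != 0:
        raise ValueError("number of plants must be a multiple of the number of levels")
    reps = len(plant_ids) // len(levels)
    # Repeat each level `reps` times, then shuffle, so chance (not the
    # experimenter) decides which plant receives which level.
    labels = [level for level in levels for _ in range(reps)]
    rng = random.Random(seed)
    rng.shuffle(labels)
    return dict(zip(plant_ids, labels))


# Hypothetical stimulator quantities (units unspecified); 0 denotes the control.
levels = [0, 5, 10, 20]
plants = [f"plant-{i:02d}" for i in range(1, 21)]  # 20 plants -> 5 per level
assignment = assign_treatments(plants, levels, seed=1)
for plant, level in sorted(assignment.items()):
    print(plant, level)
```

Fixing `seed` makes the assignment reproducible; leaving it as `None` draws a fresh randomization on each run.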
The optimal level of the plant stimulator could then be determined, because ideally all other factors affecting the size and health of the plants would be the same for all plants in the experiment.

In this chapter, we will consider some sampling designs for surveys and some designs for experimental studies. We will also make a distinction between an experimental study and an observational study.

Abstract of Research Study: Exit Poll versus Election Results

As the 2004 presidential campaign approached Election Day, the Democratic Party was very optimistic that its candidate, John Kerry, would defeat the incumbent, George Bush. Many Americans arrived home the evening of Election Day to watch or listen to the network coverage of the election with the expectation that John Kerry would be declared the winner of the presidential race, because throughout Election Day, radio and television reporters had provided exit poll results showing John Kerry ahead in nearly every crucial state, and in many of these states leading by substantial margins. The Democratic Party, being better organized with a greater commitment and focus than in many previous presidential elections, had produced an enormous number of Democratic loyalists for this election. But, as the evening wore on, in one crucial state after another the election returns showed results that differed greatly from what the exit polls had predicted. The data shown in Table 2.1 are from a University of Pennsylvania technical report by Steven F. Freeman entitled “The Unexplained Exit Poll Discrepancy.”