2.1 Introduction and Abstract of Research Study

of interest are specified, you must select the most appropriate method to collect the data. Data collection processes include surveys, experiments, and the examination of existing data from business records, censuses, government records, and previous studies. The theory of sample surveys and the theory of experimental designs provide excellent methodology for data collection. Surveys are usually passive: the goal is to gather data on existing conditions, attitudes, or behaviors. Thus, the transportation department would need to construct a questionnaire and then sample current riders of the buses and persons who use other forms of transportation within the city.

Experimental studies, on the other hand, tend to be more active: the person conducting the study varies the experimental conditions to study their effect on the outcome of the experiment. For example, the transportation department could decrease the bus fares on a few selected routes and assess whether the use of its buses increased. However, in this example, other factors not under the bus system’s control may also have changed during this time period. Thus, an increase in bus use may have taken place because of a strike of subway workers or an increase in gasoline prices. The decrease in fares was only one of several factors that may have “caused” the increase in the number of persons riding the buses.

In most experimental studies, as many as possible of the factors that affect the measurements are under the control of the experimenter. For example, suppose a floriculturist wants to determine the effect of a new plant stimulator on the growth of a commercially produced flower. The floriculturist would run the experiments in a greenhouse, where temperature, humidity, moisture levels, and sunlight are controlled. An equal number of plants would be treated with each of the selected quantities of the growth stimulator, including a control (that is, no stimulator applied).
At the conclusion of the experiment, the size and health of the plants would be measured. The optimal level of the plant stimulator could then be determined, because ideally all other factors affecting the size and health of the plants would be the same for all plants in the experiment.

In this chapter, we will consider some sampling designs for surveys and some designs for experimental studies. We will also make a distinction between an experimental study and an observational study.

Abstract of Research Study: Exit Poll versus Election Results

As the 2004 presidential campaign approached election day, the Democratic Party was very optimistic that their candidate, John Kerry, would defeat the incumbent, George Bush. Many Americans arrived home the evening of Election Day to watch or listen to the network coverage of the election with the expectation that John Kerry would be declared the winner of the presidential race, because throughout Election Day, radio and television reporters had provided exit poll results showing John Kerry ahead in nearly every crucial state, and in many of these states leading by substantial margins. The Democratic Party, being better organized with a greater commitment and focus than in many previous presidential elections, had produced an enormous number of Democratic loyalists for this election. But, as the evening wore on, in one crucial state after another the election returns showed results that differed greatly from what the exit polls had predicted.

The data shown in Table 2.1 are from a University of Pennsylvania technical report by Steven F. Freeman entitled “The Unexplained Exit Poll Discrepancy.” Freeman obtained exit poll data and the actual election results for 11 states that were considered by many to be the crucial states for the 2004 presidential election.
The exit poll results show, for each state, the number of voters polled as they left the voting booth, the corresponding percentages favoring Bush and Kerry, and the predicted winner. The election results give the actual outcomes and winner for each state as reported by the state’s election commission. The final column of the table shows the difference between the winning margin predicted by the exit polls and the actual winning margin in the election.

This table shows that the exit polls predicted George Bush to win in only 2 of the 11 crucial states, and this is why the media were predicting that John Kerry would win the election even before the polls were closed. In fact, Bush won 6 of the 11 crucial states, and, perhaps more importantly, we see in the final column that in 10 of these 11 states the difference between the actual election margin and the margin predicted by the exit polls favored Bush. At the end of this chapter, we will discuss some of the cautions one must take in using exit poll data to predict actual election outcomes.
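
The counts quoted above can be checked directly from the percentages in Table 2.1. The following Python sketch is not part of Freeman's report or of the original text. It re-enters the table's percentages by hand, and the names TABLE_2_1 and summarize are hypothetical. It also assumes that "favored Bush" means that Bush's margin (Bush percentage minus Kerry percentage) was larger in the election than in the exit poll.

```python
# Percentages from Table 2.1, re-entered by hand for illustration.
# Each row: (state, exit_bush, exit_kerry, election_bush, election_kerry)
TABLE_2_1 = [
    ("Colorado",      49.9, 48.1, 52.0, 46.8),
    ("Florida",       48.8, 49.2, 49.4, 49.8),
    ("Iowa",          49.8, 49.7, 52.1, 47.1),
    ("Michigan",      48.4, 49.7, 50.1, 49.2),
    ("Minnesota",     46.5, 51.1, 47.8, 51.2),
    ("Nevada",        44.5, 53.5, 47.6, 51.1),
    ("New Hampshire", 47.9, 49.2, 50.5, 47.9),
    ("New Mexico",    44.1, 54.9, 49.0, 50.3),
    ("Ohio",          47.5, 50.1, 50.0, 48.9),
    ("Pennsylvania",  47.9, 52.1, 51.0, 48.5),
    ("Wisconsin",     45.4, 54.1, 48.6, 50.8),
]


def summarize(rows):
    """Count predicted Bush wins, actual Bush wins, and margin shifts toward Bush."""
    exit_bush_wins = 0
    election_bush_wins = 0
    shifts = {}
    for state, exit_bush, exit_kerry, elec_bush, elec_kerry in rows:
        exit_margin = exit_bush - exit_kerry   # Bush margin in the exit poll
        elec_margin = elec_bush - elec_kerry   # Bush margin in the election
        if exit_margin > 0:
            exit_bush_wins += 1
        if elec_margin > 0:
            election_bush_wins += 1
        # Round to one decimal place to suppress floating-point noise.
        shifts[state] = round(elec_margin - exit_margin, 1)
    shifted_to_bush = sum(1 for shift in shifts.values() if shift > 0)
    return exit_bush_wins, election_bush_wins, shifted_to_bush, shifts


if __name__ == "__main__":
    predicted, actual, shifted, shifts = summarize(TABLE_2_1)
    print("States the exit polls predicted for Bush:", predicted)
    print("States Bush won in the election:", actual)
    print("States whose margin shifted toward Bush:", shifted)
    for state, shift in shifts.items():
        print(f"{state:15s} {shift:+.1f}")
```

Under these assumptions the three counts come out as 2, 6, and 10, in agreement with the text. Florida is the one state whose margin did not shift.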

2.2 Observational Studies

A study may be either observational or experimental. In an observational study, the researcher records information concerning the subjects under study without any interference with the process that is generating the information. The researcher is a passive observer of the transpiring events. In an experimental study, which will be discussed in detail in Sections 2.4 and 2.5, the researcher actively manipulates certain variables associated with the study, called the explanatory variables, and then records their effects on the response variables associated with the experimental subjects. A severe limitation of observational studies is that the recorded values of the response variables may be affected by variables other than the explanatory variables. These variables, which are not under the control of the researcher, are called confounding variables. The effects of the confounding variables and the explanatory variables on the response variable cannot be separated due to the lack of control the researcher has over the physical setting in which the observations are made. In an experimental study, the researcher attempts to maintain control over all variables that may have an effect on the response variables.

TABLE 2.1

                        Exit Poll Results (%)        Election Results (%)         Election
Crucial State  Sample   Bush   Kerry   Difference    Bush   Kerry   Difference    vs. Exit
Colorado       2515     49.9   48.1    Bush 1.8      52.0   46.8    Bush 5.2      Bush 3.4
Florida        2223     48.8   49.2    Kerry 0.4     49.4   49.8    Kerry 0.4     No Diff.
Iowa           2846     49.8   49.7    Bush 0.1      52.1   47.1    Bush 5.0      Bush 4.9
Michigan       2502     48.4   49.7    Kerry 1.3     50.1   49.2    Bush 0.9      Bush 2.2
Minnesota      2452     46.5   51.1    Kerry 4.6     47.8   51.2    Kerry 3.4     Kerry 1.2
Nevada         2178     44.5   53.5    Kerry 9.0     47.6   51.1    Kerry 3.5     Kerry 5.5
New Hampshire  2116     47.9   49.2    Kerry 1.3     50.5   47.9    Bush 2.6      Bush 3.9
New Mexico     1849     44.1   54.9    Kerry 10.8    49.0   50.3    Kerry 1.3     Kerry 9.5
Ohio           1951     47.5   50.1    Kerry 2.6     50.0   48.9    Bush 1.1      Bush 3.7
Pennsylvania   1963     47.9   52.1    Kerry 4.2     51.0   48.5    Bush 2.5      Bush 6.7
Wisconsin      1930     45.4   54.1    Kerry 8.7     48.6   50.8    Kerry 2.2     Kerry 6.5

Observational studies may be classified as either comparative or descriptive. In a comparative study, two or more methods of achieving a result are compared for effectiveness. For example, three types of healthcare delivery methods are compared based on cost effectiveness. Alternatively, several groups are compared based on some common attribute. For example, the starting incomes of engineers are contrasted using samples of new graduates from private and public universities. In a descriptive study, the major purpose is to characterize a population or process based on certain attributes in that population or process. Examples include studying the health status of children under 5 years of age in families without health insurance, or assessing the number of overcharges by companies hired under federal military contracts.

Observational studies in the form of polls, surveys, and epidemiological studies, for example, are used in many different settings to address questions posed by researchers.
Surveys are used to measure the changing opinion of the nation with respect to issues such as gun control, interest rates, taxes, the minimum wage, Medicare, and the national debt. Similarly, we are informed on a daily basis through newspapers, magazines, television, radio, and the Internet of the results of public opinion polls concerning other relevant and sometimes irrelevant political, social, educational, financial, and health issues.

In an observational study, the factors (treatments) of interest are not manipulated while making measurements or observations. The researcher in an environmental impact study is attempting to establish the current state of a natural setting from which subsequent changes may be compared. Surveys are often used by natural scientists as well. In order to determine the proper catch limits of commercial and recreational fishermen in the Gulf of Mexico, the states along the Gulf of Mexico must sample the Gulf to determine the current fish density. There are many biases and sampling problems that must be addressed in order for the survey to be a reliable indicator of the current state of the sampled population.

A problem that may occur in observational studies is assigning cause-and-effect relationships to spurious associations between factors. For example, in many epidemiological studies we study various environmental, social, and ethnic factors and their relationship with the incidence of certain diseases. A public health question of considerable interest is the relationship between heart disease and the amount of fat in one’s diet. It would be unethical to randomly assign volunteers to one of several high-fat diets and then monitor the people over time to observe whether or not heart disease develops. Without being able to manipulate the factor of interest (fat content of the diet), the scientist must use an observational study to address the issue.
This could be done by comparing the diets of a sample of people with heart disease with the diets of a sample of people without heart disease. Great care would have to be taken to record other relevant factors such as family history of heart disease, smoking habits, exercise routine, age, and gender for each person, along with other physical characteristics. Models could then be developed so that differences between the two groups could be adjusted to eliminate all factors except fat content of the diet. Even with these adjustments, it would be difficult to assign a cause-and-effect relationship between high fat content of a diet and the development of heart disease. In fact, if the dietary fat content for the heart disease group tended to be higher than that for the group free of heart disease after adjusting for relevant