Designs for Experimental Studies

This section gives only a brief overview of the subject, because much of the data requiring summarization and analysis arises from experimental studies involving one of a number of designs. We will work by way of examples.

A consumer testing agency decides to evaluate the wear characteristics of four major brands of tires. For this study, the agency selects four cars of a standard car model and four tires of each brand. The tires will be placed on the cars and then driven 30,000 miles on a 2-mile racetrack. The decrease in tread thickness over the 30,000 miles is the variable of interest in this study. Four different drivers will drive the cars, but the drivers are professional drivers with comparable training and experience. The weather conditions, the smoothness of the track, and the maintenance of the four cars will be essentially the same for all four brands over the study period. All extraneous factors that may affect the tires are nearly the same for all four brands. Thus, the testing agency feels confident that if there is a difference in wear characteristics among the brands at the end of the study, then this is truly a difference among the four brands and not a difference due to the manner in which the study was conducted.

The testing agency is also interested in recording other factors, such as the cost of the tires, the length of warranty offered by the manufacturer, whether the tires go out of balance during the study, and the evenness of wear across the width of the tires. In this example, we will consider only tread wear. There should be a recorded tread wear for each of the sixteen tires, four tires for each brand. The methods presented in Chapters 8 and 15 could be used to summarize and analyze the sample tread wear data in order to make comparisons (inferences) among the four tire brands. One possible inference of interest could be the selection of the brand having minimum tread wear. Can the best-performing tire brand in the sample data be expected to provide the best tread wear if the same study is repeated? Are the results of the study applicable to the driving habits of the typical motorist?

Experimental Designs

There are many ways in which the tires can be assigned to the four cars. We will consider one running of the experiment in which we have four tires of each of the four brands. First, we need to decide how to assign the tires to the cars. We could randomly assign a single brand to each car, but this would result in a design in which the unit of measurement is the total loss of tread for all four tires on a car rather than the loss for an individual tire. Thus, we must randomly assign the sixteen tires to the four cars. In Chapter 15, we will demonstrate how this randomization is conducted. One possible arrangement of the tires on the cars is shown in Table 2.2.

TABLE 2.2  Completely randomized design of tire wear

Car 1      Car 2      Car 3      Car 4
Brand B    Brand A    Brand A    Brand D
Brand B    Brand A    Brand B    Brand D
Brand B    Brand C    Brand C    Brand D
Brand C    Brand C    Brand A    Brand D

In general, a completely randomized design is used when we are interested in comparing t "treatments" (in our case, t = 4, and the treatments are the brands of tires). For each of the treatments, we obtain a sample of observations. The sample sizes could be different for the individual treatments. For example, we could test 20 tires from each of Brands A, B, and C but only 12 tires from Brand D.
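The random assignment of the sixteen tires to the four cars can be generated with any source of random numbers. The following minimal sketch is our own illustration (it is not part of the original study, and the procedure given in Chapter 15 may differ); it uses Python's random module to produce one completely randomized arrangement like the one in Table 2.2.

```python
import random

brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
cars = ["Car 1", "Car 2", "Car 3", "Car 4"]

# Sixteen tires: four tires of each of the four brands.
tires = [brand for brand in brands for _ in range(4)]

random.seed(2)          # fixed seed so the illustration is reproducible
random.shuffle(tires)   # random permutation of the sixteen tires

# Assign the shuffled tires to the cars, four tires per car.
assignment = {car: tires[4 * i:4 * i + 4] for i, car in enumerate(cars)}

for car, tires_on_car in assignment.items():
    print(car, tires_on_car)
```

Each run (with a different seed) produces one of the many equally likely arrangements of the sixteen tires on the four cars.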
The sample of observations from a treatment is assumed to be the result of a simple random sample of observations from the hypothetical population of possible values that could have resulted from that treatment. In our example, the sample of four tire-wear thicknesses from Brand A is considered to be the outcome of a simple random sample of four observations selected from the hypothetical population of possible tire-wear thicknesses for standard model cars traveling 30,000 miles using Brand A.

The experimental design could be altered to accommodate the effect of a variable related to how the experiment is conducted. In our example, we assumed that the effect of the different cars, weather, drivers, and various other factors was the same for all four brands. Now, if the wear imposed on tires by Car 4 was less severe than that of the other three cars, would our design take this effect into account? Because Car 4 had all four tires of Brand D placed on it, the wear observed for Brand D may be less than the wear observed for the other three brands simply because all four tires of Brand D were on the "best" car. In some situations, the objects being observed have existing differences prior to their assignment to the treatments. For example, in an experiment evaluating the effectiveness of several drugs for reducing blood pressure, the age or physical condition of the participants in the study may mask the effectiveness of a drug. To avoid masking the effectiveness of the drugs, we would want to take these factors into account. Also, the environmental conditions encountered during the experiment may reduce the effectiveness of the treatment.

In our example, we would want to avoid having the comparison of the tire brands distorted by the differences in the four cars. The experimental design used to accomplish this goal is called a randomized block design, because we want to "block out" any differences in the four cars in order to obtain a precise comparison of the four brands of tires. In a randomized block design, each treatment appears in every block. In the blood pressure example, we would group the patients according to the severity of their blood pressure problem and then randomly assign the drugs to the patients within each group. Thus, the randomized block design is similar to a stratified random sample used in surveys. In the tire wear example, we would use the four cars as the blocks and randomly assign one tire of each brand to each of the four cars, as shown in Table 2.3. Now, if there are any differences in the cars that may affect tire wear, that effect will be applied equally to all four brands.

TABLE 2.3  Randomized block design of tire wear

Car 1      Car 2      Car 3      Car 4
Brand A    Brand A    Brand A    Brand A
Brand B    Brand B    Brand B    Brand B
Brand C    Brand C    Brand C    Brand C
Brand D    Brand D    Brand D    Brand D
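In the randomized block design, the randomization is carried out separately within each block. A minimal sketch of this within-car randomization is shown below; it is our own illustration (not a procedure given in the text) and simply orders the four brands at random within each car, so that every brand appears exactly once per car, as in Table 2.3.

```python
import random

brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
cars = ["Car 1", "Car 2", "Car 3", "Car 4"]

random.seed(7)  # reproducible illustration

# Within each block (car), independently randomize the order in which
# the four brands are assigned to the four tire positions.
block_assignment = {car: random.sample(brands, k=4) for car in cars}

for car, order in block_assignment.items():
    print(car, order)
```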
What happens if the position of the tires on the car affects the wear on the tire? The positions on the car are right front (RF), left front (LF), right rear (RR), and left rear (LR). In Table 2.3, suppose that all four tires from Brand A are placed in the RF position, Brand B in RR, Brand C in LF, and Brand D in LR. Now, if the greatest wear occurs for tires placed in the RF position, then Brand A would be at a great disadvantage when compared with the other three brands. In this type of situation, we would say that the effect of brand and the effect of position on the car are confounded; that is, using the data in the study, the effects of two or more factors cannot be unambiguously attributed to a single factor. If we observed a large difference in the average wear among the four brands, is this difference due to differences among the brands or to differences in the positions of the tires on the cars? Using the design given in Table 2.3, this question cannot be answered. Thus, we now need two blocking variables: the "car" the tire is placed on and the "position" on the car. A design having two blocking variables is called a Latin square design. A Latin square design for our example is shown in Table 2.4.

TABLE 2.4  Latin square design of tire wear

Position    Car 1      Car 2      Car 3      Car 4
RF          Brand A    Brand B    Brand C    Brand D
RR          Brand B    Brand C    Brand D    Brand A
LF          Brand C    Brand D    Brand A    Brand B
LR          Brand D    Brand A    Brand B    Brand C

Note that with this design, each brand appears in each of the four positions and on each of the four cars. Thus, if position or car has an effect on the wear of the tires, the position effect and/or car effect will be equalized across the four brands, and the observed differences in wear can be attributed to differences among the brands of tires.
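A 4 × 4 Latin square such as the one in Table 2.4 can be constructed by cycling the brand labels across the rows; in practice one would also randomly permute the rows, columns, and labels. The sketch below is our own illustration of the cyclic construction only, using the position, car, and brand labels of this example.

```python
brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
positions = ["RF", "RR", "LF", "LR"]
cars = ["Car 1", "Car 2", "Car 3", "Car 4"]

# Cyclic construction: row i (position) and column j (car) receive brand (i + j) mod 4,
# so every brand appears exactly once in each row and exactly once in each column.
square = [[brands[(i + j) % 4] for j in range(4)] for i in range(4)]

print("Position  " + "  ".join(cars))
for pos, row in zip(positions, square):
    print(f"{pos:8s}  " + "  ".join(row))
```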
The randomized block and Latin square designs are both extensions of the completely randomized design in which the objective is to compare t treatments. The analysis of data for a completely randomized design and for block designs, and the inferences made from such analyses, are discussed further in Chapters 14, 15, and 17. A special case of the randomized block design, in which the number of treatments is t = 2, is presented in Chapter 6 along with the analysis of the data and the inferences drawn from such analyses.

Factorial Treatment Structure in a Completely Randomized Design

In this section, we discuss how treatments are constructed from several factors rather than just being t levels of a single factor. These types of experiments examine the effect of two or more independent variables on a response variable y. For example, suppose a company has developed a new adhesive for use in the home and wants to examine the effects of temperature and humidity on the bonding strength of the adhesive. Several treatment design questions arise in any such study. First, we must consider which factors (independent variables) are of greatest interest. Second, the number of levels and the actual settings of these levels must be determined for each factor. Third, having separately selected the levels for each factor, we must choose the factor-level combinations (treatments) that will be applied to the experimental units.

The ability to choose the factors and the appropriate settings for each of them depends on budget, time to complete the study, and, most important, the experimenter's knowledge of the physical situation under study. In many cases, this will involve conducting a detailed literature review to determine the current state of knowledge in the area of interest. Then, assuming that the experimenter has chosen the levels of each independent variable, he or she must decide which factor-level combinations are of greatest interest and are viable. In some situations, certain factor-level combinations will not produce an experimental setting that can elicit a reasonable response from the experimental unit, and certain combinations may not be feasible because of toxicity or practicality issues.

One approach for examining the effects of two or more factors on a response is called the one-at-a-time approach. To examine the effect of a single variable, an experimenter varies the levels of this variable while holding the levels of the other independent variables fixed. This process is continued until the effect of each variable on the response has been examined.

For example, suppose we want to determine the combination of nitrogen and phosphorus that produces the maximum amount of corn per plot. We would select a level of phosphorus, say, 20 pounds, vary the levels of nitrogen, and observe which combination gives the maximum yield in terms of bushels of corn per acre. Next, we would use the level of nitrogen producing the maximum yield, vary the amount of phosphorus, and observe the combination of nitrogen and phosphorus that produces the maximum yield. This combination would be declared the "best" treatment. The problem with this approach will be illustrated using the hypothetical yield values given in Table 2.5. These values would be unknown to the experimenter. We will assume that many replications of the treatments are used in the experiment, so that the experimental results are nearly the same as the true yields.

TABLE 2.5  Hypothetical population yields (bushels per acre)

                     Phosphorus
Nitrogen       10       20       30
40            125      145      190
50            155      150      140
60            175      160      125

Initially, we run experiments with 20 pounds of phosphorus and nitrogen levels of 40, 50, and 60 pounds. We would determine that using 60 pounds of nitrogen with 20 pounds of phosphorus produces the maximum yield, 160 bushels per acre. Next, we set the nitrogen level at 60 pounds and vary the phosphorus levels. This would result in the 10-pound level of phosphorus producing the highest yield, 175 bushels, when combined with 60 pounds of nitrogen. Thus, we would determine that 10 pounds of phosphorus with 60 pounds of nitrogen produces the maximum yield. The results of these experiments are summarized in Table 2.6.

TABLE 2.6  Yields for the experimental results

Phosphorus     20     20     20     10     30
Nitrogen       40     50     60     60     60
Yield         145    150    160    175    125

Based on the experimental results using the one-at-a-time methodology, we would conclude that 60 pounds of nitrogen and 10 pounds of phosphorus is the optimal combination. An examination of the yields in Table 2.5 reveals, however, that the true optimal combination is 40 pounds of nitrogen with 30 pounds of phosphorus, producing a yield of 190 bushels per acre. Thus, this type of experimentation may produce incorrect results whenever the effect of one factor on the response does not remain the same at all levels of the second factor. In this situation, the factors are said to interact. Figure 2.2 depicts the interaction between nitrogen and phosphorus in the production of corn. Note that as the amount of nitrogen is increased from 40 to 60, there is an increase in the yield when using the 10-pound level of phosphorus. At the 20-pound level of phosphorus, increasing the amount of nitrogen also produces an increase in yield, but with smaller increments: at the 20-pound level of phosphorus, the yield increases 15 bushels when the nitrogen level is changed from 40 to 60, whereas at the 10-pound level of phosphorus, the yield increases 50 bushels. Furthermore, at the 30-pound level of phosphorus, increasing the level of nitrogen actually causes the yield to decrease.

FIGURE 2.2  Yields from nitrogen–phosphorus treatments, interaction is present (profile plot of corn yield versus phosphorus level for nitrogen levels N-40, N-50, and N-60)
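The failure of the one-at-a-time approach can be verified directly from the hypothetical yields of Table 2.5. The sketch below, our own illustration, encodes the table as a Python dictionary, repeats the two passes described above, and compares the result with an exhaustive search over all nine factor-level combinations.

```python
# True (hypothetical) yields from Table 2.5, indexed by (nitrogen, phosphorus).
yields = {
    (40, 10): 125, (40, 20): 145, (40, 30): 190,
    (50, 10): 155, (50, 20): 150, (50, 30): 140,
    (60, 10): 175, (60, 20): 160, (60, 30): 125,
}
nitrogen_levels = [40, 50, 60]
phosphorus_levels = [10, 20, 30]

# Pass 1: hold phosphorus at 20 pounds and vary nitrogen.
best_n = max(nitrogen_levels, key=lambda n: yields[(n, 20)])
# Pass 2: hold nitrogen at the level found above and vary phosphorus.
best_p = max(phosphorus_levels, key=lambda p: yields[(best_n, p)])
print("One-at-a-time choice:", (best_n, best_p), "yield", yields[(best_n, best_p)])

# Exhaustive (factorial) search over all nine treatments.
true_best = max(yields, key=yields.get)
print("True optimum:        ", true_best, "yield", yields[true_best])
```

The one-at-a-time passes select 60 pounds of nitrogen with 10 pounds of phosphorus (175 bushels), while the exhaustive search finds the true optimum of 40 pounds of nitrogen with 30 pounds of phosphorus (190 bushels), in agreement with the discussion above.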
When there is no interaction between the factors, increasing the nitrogen level would have produced identical changes in the yield at all levels of phosphorus. Table 2.7 and Figure 2.3 depict a situation in which the two factors do not interact. In this situation, the effect of phosphorus on the corn yield is the same for all three levels of nitrogen; that is, as we increase the amount of phosphorus, the change in corn yield is exactly the same for all three levels of nitrogen. Note that the change in yield is the same at all levels of nitrogen for a given change in phosphorus. However, the yields are larger at the higher levels of nitrogen. Thus, in the profile plots we have three different lines, but the lines are parallel. When interaction exists among the factors, the lines will either cross or diverge.

TABLE 2.7  Hypothetical population yields (no interaction)

                     Phosphorus
Nitrogen       10       20       30
40            125      145      150
50            145      165      170
60            165      185      190

FIGURE 2.3  Yields from nitrogen–phosphorus treatments, no interaction (profile plot of corn yield versus phosphorus level for nitrogen levels N-40, N-50, and N-60)
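Interaction can also be checked numerically: compute the change in yield as nitrogen increases from 40 to 60 pounds at each phosphorus level. If the factors do not interact, this change is the same at every phosphorus level. The sketch below, our own illustration, applies the check to the yields of Tables 2.5 and 2.7.

```python
# Yields indexed by (nitrogen, phosphorus), taken from Tables 2.5 and 2.7.
table_2_5 = {  # interaction present
    (40, 10): 125, (40, 20): 145, (40, 30): 190,
    (50, 10): 155, (50, 20): 150, (50, 30): 140,
    (60, 10): 175, (60, 20): 160, (60, 30): 125,
}
table_2_7 = {  # no interaction
    (40, 10): 125, (40, 20): 145, (40, 30): 150,
    (50, 10): 145, (50, 20): 165, (50, 30): 170,
    (60, 10): 165, (60, 20): 185, (60, 30): 190,
}

def nitrogen_effect(table):
    # Change in yield when nitrogen goes from 40 to 60 pounds, at each phosphorus level.
    return {p: table[(60, p)] - table[(40, p)] for p in (10, 20, 30)}

print("Table 2.5:", nitrogen_effect(table_2_5))  # {10: 50, 20: 15, 30: -65}: effects differ, so interaction
print("Table 2.7:", nitrogen_effect(table_2_7))  # {10: 40, 20: 40, 30: 40}: constant effect, no interaction
```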

From Figure 2.3 we can observe that the one-at-a-time approach is appropriate for a situation in which the two factors do not interact: no matter what level is selected as the initial level of phosphorus, the one-at-a-time approach will produce the optimal yield. However, in most situations it is not known prior to running the experiments whether the two factors will interact. If it is assumed that the factors do not interact and the one-at-a-time approach is implemented when in fact the factors do interact, the experiment will produce results that will often fail to identify the best treatment.

Factorial treatment structures are useful for examining the effects of two or more factors on a response, whether or not interaction exists. As before, the choice of the number of levels of each variable and the actual settings of these levels is important. When the factor-level combinations are assigned to experimental units at random, we have a completely randomized design with the treatments being the factor-level combinations. Using our previous example, we are interested in examining the effect of nitrogen and phosphorus levels on the yield of a corn crop. The nitrogen levels are 40, 50, and 60 pounds per plot, and the phosphorus levels are 10, 20, and 30 pounds per plot. We could use a completely randomized design in which the nine factor-level combinations (treatments) of Table 2.8 are assigned at random to the experimental units (the plots of land planted with corn).

TABLE 2.8  Factor-level combinations for the 3 × 3 factorial treatment structure

Treatment      1    2    3    4    5    6    7    8    9
Phosphorus    10   10   10   20   20   20   30   30   30
Nitrogen      40   50   60   40   50   60   40   50   60

It is not necessary to have the same number of levels for both factors. For example, we could run an experiment with two levels of phosphorus and three levels of nitrogen, a 2 × 3 factorial structure. Also, the number of factors can be more than two. The corn yield experiment could have involved treatments consisting of four levels of potassium along with the three levels of phosphorus and nitrogen, a 4 × 3 × 3 factorial structure; we would then have 4 · 3 · 3 = 36 factor-level combinations, or treatments. The methodology of randomization, analysis, and inference for data obtained from factorial treatment structures in various experimental designs is discussed in Chapters 14, 15, 17, and 18.

More Complicated Designs

Sometimes the objectives of a study are such that we wish to investigate the effects of certain factors on a response while blocking out certain other extraneous sources of variability. Such situations require a block design with treatments from a factorial treatment structure and can be illustrated with the following example. An investigator wants to examine the effectiveness of two drugs, A and B, for controlling heartworms in puppies. Veterinarians have conjectured that the effectiveness of the drugs may depend on a puppy's diet. Three different diets (Factor 1) are combined with the two drugs (Factor 2), giving a 3 × 2 factorial treatment structure consisting of six treatments. Also, the effectiveness of the drugs may depend on an inherited protection against heartworm transmitted from the puppy's mother. Thus, four litters of puppies consisting of six puppies each were selected to serve as a blocking factor in the experiment, because all puppies within a given litter have the same mother. The six factor-level combinations (treatments) were randomly assigned to the six puppies within each of the four litters. The design is shown in Table 2.9.

TABLE 2.9  Block design for heartworm experiment

                       Litter
Puppy       1       2       3       4
1         A-D1    A-D3    B-D3    B-D2
2         A-D3    B-D1    A-D2    A-D2
3         B-D1    A-D1    B-D2    A-D1
4         A-D2    B-D2    B-D1    B-D3
5         B-D3    B-D3    A-D1    A-D3
6         B-D2    A-D2    A-D2    B-D1
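A design of this type can be generated in two steps: form the six factor-level combinations from the 3 × 2 factorial structure, and then randomize those six treatments separately within each litter (block). The following sketch is our own illustration of these two steps using the drug and diet labels of Table 2.9; any particular run will generally differ from the arrangement shown in the table.

```python
import itertools
import random

drugs = ["A", "B"]
diets = ["D1", "D2", "D3"]

# Step 1: the 2 x 3 = 6 factor-level combinations (treatments).
treatments = [f"{drug}-{diet}" for drug, diet in itertools.product(drugs, diets)]

# Step 2: randomize the six treatments among the six puppies within each litter.
random.seed(11)  # reproducible illustration
design = {litter: random.sample(treatments, k=len(treatments)) for litter in (1, 2, 3, 4)}

for litter, assignment in design.items():
    print(f"Litter {litter}:", assignment)
```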

Note that this design is really a randomized block design in which the blocks are litters and the treatments are the six factor-level combinations of the 3 × 2 factorial treatment structure.

Other, more complicated combinations of block designs and factorial treatment structures are possible. As with sample surveys, however, we will deal only with the simplest experimental designs in this text. The point we want to make is that there are many different experimental designs that can be used in scientific studies for designating the collection of sample data. Each has certain advantages and disadvantages. We expand our discussion of experimental designs in Chapters 14–18, where we concentrate on the analysis of data generated from these designs. In those situations that require more complex designs, a professional statistician needs to be consulted to obtain the most appropriate design for the survey or experimental setting.

Controlling Experimental Error

As we observed in Examples 2.4 and 2.5, there are many potential sources of experimental error in an experiment. When the variance of the experimental errors is large, the precision of our inferences will be greatly compromised. Thus, any techniques that can be implemented to reduce experimental error will lead to a much improved experiment and more precise inferences.

The researcher may be able to control many of the potential sources of experimental error. Some of these sources are (1) the procedures under which the experiment is conducted, (2) the choice of experimental units and measurement units, (3) the procedure by which measurements are taken and recorded, (4) the blocking of the experimental units, (5) the type of experimental design, and (6) the use of ancillary variables (called covariates). We will now address how each of these sources may affect experimental error and how the researcher may minimize their effect on the size of the variance of the experimental error.

Experimental Procedures

When the individual procedures required to conduct an experiment are not performed in a careful, precise manner, the result is an increase in the variance of the response variable. This involves not only the personnel used to conduct the experiments and to measure the response variable but also the equipment used in their procedures. Personnel must be trained properly in constructing the treatments and carrying out the experiments, and the consequences of their performance for the success of the experiment should be emphasized. The researcher needs to provide the technicians with equipment that will produce the most precise measurements within budget constraints. It is crucial that equipment be maintained and calibrated at frequent intervals throughout the experiment. The conditions under which the experiments are run must be kept as nearly constant as possible during the study period; otherwise, differences in the responses may be due to changes in the experimental conditions and not to treatment differences.

When experimental procedures are not of high quality, the variance of the response variable may be inflated.
Improper techniques used when taking measurements, improper calibration of instruments, or uncontrolled conditions within a laboratory may result in extreme observations that are not truly reflective of the effect of the treatment on the response variable. Extreme observations may also occur because of recording errors by the laboratory technician or the data manager. In either case, the researcher must investigate the circumstances surrounding extreme observations and then decide whether to delete them from the analysis. If an observation is deleted, an explanation of why the data value was not included should be given in the appendix of the final report.

When experimental procedures are not conducted uniformly throughout the study period, two possible outcomes are an inflation in the variance of the response variable and a bias in the estimation of a treatment mean. For example, suppose we are measuring the amount of drug in the blood of rats injected with one of four possible doses of a drug, and the equipment used to measure the precise amount of drug to be injected is not working properly. For a given dosage of the drug, the first rats injected were given a dose less than the prescribed amount, whereas the last rats injected were given more than the prescribed amount. When the amount of drug in the blood is measured, there will be an increase in the variance of these measurements, but the treatment mean may still be estimated without bias, because the overdoses and underdoses may cancel each other. On the other hand, if all the rats receiving the lowest dose level are given too much of the drug and all the rats receiving the highest dose level are not given enough of the drug, then the estimates of the treatment means will be biased: the treatment mean for the low dose will be overestimated, whereas the treatment mean for the high dose will be underestimated. Thus, it is crucial to the success of the study that experimental procedures are conducted uniformly across all experimental units. The same is true of the environmental conditions within a laboratory or in a field study. Extraneous factors such as temperature, humidity, amount of sunlight, and exposure to pollutants in the air, when not uniformly applied to the experimental units, may result in a study with both an inflated variance and biased estimates of the treatment means.
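The distinction between an inflated variance and a biased treatment mean can be made concrete with a small simulation. The numbers below are entirely hypothetical and the sketch is our own illustration: a random dosing error widens the spread of the blood-drug measurements while leaving their mean roughly on target, whereas a systematic dosing error shifts the mean itself.

```python
import random
import statistics

random.seed(4)
prescribed_dose = 10.0     # hypothetical prescribed dose for one treatment group
n_rats = 200               # hypothetical number of rats in the group

# Blood-drug measurement is modeled as the dose actually delivered plus
# some biological noise (a purely illustrative model).
def measurement(delivered_dose):
    return delivered_dose + random.gauss(0, 0.5)

# Random dosing error: under- and overdoses tend to cancel out.
random_error = [measurement(prescribed_dose + random.uniform(-2, 2)) for _ in range(n_rats)]

# Systematic dosing error: every rat in the group is overdosed by 2 units.
systematic_error = [measurement(prescribed_dose + 2.0) for _ in range(n_rats)]

for label, data in [("random error", random_error), ("systematic error", systematic_error)]:
    print(f"{label:17s} mean = {statistics.mean(data):5.2f}  st. dev. = {statistics.stdev(data):4.2f}")
```

Under random error the mean stays near the prescribed dose but the standard deviation is inflated; under systematic error the standard deviation stays small but the mean is biased upward.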
Selecting Experimental and Measurement Units

When the experimental units used in an experiment are not similar with respect to those characteristics that may affect the response variable, the experimental error variance will be inflated. One of the goals of a study is to determine whether there is a difference in the mean responses of experimental units receiving different treatments. The researcher must determine the population of experimental units that are of interest. Ideally, the experimental units are randomly selected from that population and then randomly assigned to the treatments. In practice, however, the researcher is somewhat limited in the selection of experimental units by cost, availability, and ethical considerations, so the inferences that can be drawn from the experimental data may be somewhat restricted. When examining the pool of potential experimental units, sets of units that are more similar in characteristics will yield more precise comparisons of the treatment means. However, if the experimental units are overly uniform, then the population to which inferences may properly be made will be greatly restricted. Consider the following example.

EXAMPLE 2.8

A sales campaign to market children's products will use television commercials as its central marketing technique. A marketing firm hired to determine whether the attention span of children differs depending on the type of product being advertised decided to examine four types of products: sporting equipment, healthy snacks, shoes, and video games. The firm selected 100 fourth-grade students from a New York City public school to participate in the study. Twenty-five students were randomly assigned to view a commercial for each of the four types of products, and the attention spans of the 100 children were then recorded. The marketing firm thought that by selecting participants of the same grade level and from the same school system it would achieve a homogeneous group of subjects. What problems exist with this selection procedure?

Solution  The marketing firm was probably correct in assuming that selecting students from the same grade level and school system would yield a more homogeneous set of experimental units than a more general selection procedure would. However, this procedure has severely limited the inferences that can be made from the study: the results may be relevant only to students in the fourth grade residing in a very large city. A selection procedure involving other grade levels and children from smaller cities would provide a more broadly applicable study.

Reducing Experimental Error through Blocking

When we are concerned that the pool of available experimental units has large differences with respect to important characteristics, the use of blocking may prove highly effective in reducing the experimental error variance. The experimental units are placed into groups based on their similarity with respect to characteristics that may affect the response variable. This results in sets, or blocks, of experimental units that are homogeneous within a block, while the entire collection of units still provides broad coverage of the important characteristics. The treatments are randomly assigned separately within each block. The comparison of the treatments is then made within these groups of homogeneous units and hence is not masked by the large differences among the original set of experimental units. The blocking design enables us to separate the variability associated with the characteristics used to block the units from the experimental error.

There are many criteria used to group experimental units into blocks; they include the following:

1. Physical characteristics such as age, weight, sex, health, and education of the subjects
2. Units that are related, such as twins or animals from the same litter
3. Spatial location of experimental units, such as neighboring plots of land or position of plants on a laboratory table
4. Time at which the experiment is conducted, such as the day of the week, because environmental conditions may change from day to day
5. Person conducting the experiment, because if several operators or technicians are involved they may differ in how they make measurements or manipulate the experimental units

In all of these examples, we are attempting to observe all the treatments at each level of the blocking criterion. Thus, if we were studying the number of cars with a major defect coming off each of three assembly lines, we might want to use day of the week as a blocking variable and be certain to compare each of the assembly lines on all 5 days of the work week.

Using Covariates to Reduce Variability

A covariate is a variable that is related to the response variable. Physical characteristics of the experimental units are used to create blocks of homogeneous units. For example, in a study to compare the effectiveness of a new diet with a control diet in reducing the weight of dogs, suppose the pool of dogs available for the study varies in age from 1 year to 12 years. We could group the dogs into three blocks: B1, under 3 years; B2, 3 years to 8 years; B3, over 8 years. A more exacting methodology records the age of each dog and then incorporates the age directly into the model when assessing the effectiveness of the diet. The response variable would be adjusted for the age of the dog prior to comparing the new diet with the control diet, yielding a more exact comparison of the diets. Instead of using a range of ages, as is done in blocking, we use the exact age of the dog, which further reduces the variance of the experimental error.

The candidates for covariates in a given experiment depend on the particular experiment. The covariate needs to be related to the response variable, it must be measurable, and it cannot be affected by the treatment. In most cases, the covariate is measured on the experimental unit before the treatment is given to the unit. Examples of covariates are soil fertility, amount of impurity in a raw material, weight of an experimental unit, SAT score of a student, cholesterol level of a subject, and insect density in a field. The following example illustrates the use of a covariate.

EXAMPLE 2.9

In this study, the effects of two treatments, supplemental lighting (SL) and partial shading (PS), on the yield of soybean plants were compared with normal lighting (NL). Normal lighting serves as a control. Each type of lighting was randomly assigned to 15 soybean plants, and the plants were grown in a greenhouse study. When setting up the experiment, the researcher recognized that the plants were of differing size and maturity. Consequently, the height of the plant, a measurable characteristic of plant vigor, was determined at the start of the experiment and will serve as a covariate. This allows the researcher to adjust the yields of the individual soybean plants for the initial size of the plant. On each plant we record two variables, (x, y), where x is the height of the plant at the beginning of the study and y is the yield of soybeans at the conclusion of the study. To determine whether the covariate has an effect on the response variable, we plot the two variables to assess any possible relationship.
If no relationship exists, then the covariate need not be used in the analysis. If the two variables are related, then we must use the techniques of analysis of covariance to properly adjust the response variable prior to comparing the mean yields of the three treatments. An initial assessment of the viability of the relationship is simply to plot the response variable versus the covariate, with a separate plotting symbol for each treatment. Figure 2.4 contains this plot for the soybean data.

FIGURE 2.4  Plot of plant height versus yield (S = supplemental lighting, C = normal lighting, P = partial shading)

From Figure 2.4, we observe that there appears to be an increasing relationship between the covariate (initial plant height) and the response variable (yield). Also, the three treatments appear to have differing yields; some of the variation in the response variable is related to the initial height as well as to the difference in the amount of lighting the plant received. Thus, we must identify the amount of variation associated with initial height prior to testing for differences in the average yields of the three treatments. We can accomplish this using the techniques of analysis of covariance, which will be discussed in detail in Chapter 16.
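The preliminary check described in Example 2.9, plotting the response against the covariate with a separate plotting symbol for each treatment, can be carried out with any statistical software. The sketch below uses Python's matplotlib with invented (x, y) values purely for illustration; the soybean measurements plotted in Figure 2.4 are not reproduced here.

```python
import matplotlib.pyplot as plt

# Invented illustrative data: (height at start, yield at end) for each treatment.
# These are NOT the soybean measurements shown in Figure 2.4.
data = {
    "S (supplemental lighting)": [(9, 42), (11, 47), (13, 55), (15, 60), (16, 64)],
    "C (normal lighting)":       [(8, 35), (10, 40), (12, 46), (14, 52), (17, 58)],
    "P (partial shading)":       [(9, 31), (11, 34), (13, 39), (15, 44), (16, 47)],
}

# Scatter plot of the response (yield) versus the covariate (initial height),
# with a separate plotting symbol for each treatment.
markers = ["s", "o", "^"]
for (label, points), marker in zip(data.items(), markers):
    heights = [x for x, _ in points]
    yields_ = [y for _, y in points]
    plt.scatter(heights, yields_, marker=marker, label=label)

plt.xlabel("Initial plant height")
plt.ylabel("Soybean yield")
plt.legend()
plt.show()
```

A roughly increasing point cloud within each treatment, as in Figure 2.4, suggests that the covariate is related to the response and that an analysis of covariance adjustment is worthwhile.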

2.6 Research Study: Exit Polls versus Election Results

In the beginning of this chapter, we discussed the apparent "discrepancy" between exit polls and the actual vote count during the 2004 presidential election. We will now attempt to answer the following question: Why were there discrepancies between the exit polls and the election results obtained for the 11 "crucial" states? We will not be able to answer this question definitively, but we can look at some of the issues that pollsters must address when relying on exit polls to accurately predict election results.

First, we need to understand how an exit poll is conducted. We will examine the process as implemented by one such polling company, Edison Media Research and Mitofsky International, as reported on its website. The company conducted exit polls in each state. Each state exit poll was conducted at a random sample of polling places among Election Day voters; the polling places are a stratified probability sample of the state. Within each polling place, an interviewer approached every nth voter as he or she exited the polling place. Approximately 100 voters completed a questionnaire at each polling place; the exact number depends on voter turnout and the willingness of selected voters to cooperate. In addition, absentee and/or early voters were interviewed in pre-election telephone polls in a number of states. All samples were random-digit dialing (RDD) selections except for Oregon, which used both RDD and some follow-up calling. Absentee or early voters were asked the same questions as voters at the polling place on Election Day. Results from the phone poll were combined with results from voters interviewed at the polling places; the combination reflects approximately the correct proportion of absentee/early voters and Election Day voters.

The first step in addressing the discrepancies between the exit poll results and the actual election tabulations would be to examine the results for all states, not just those thought to be crucial in determining the outcome of the election. These data are not readily available. Next, we would have to make certain that voter fraud was not the cause of the discrepancies; that is the job of the state voter commissions. What can go wrong with exit polls? A number of possibilities exist, including the following:

1. Nonresponse: How are the results adjusted for sampled voters refusing to complete the survey? How are the RDD results adjusted for those screening their calls and refusing to participate?
2. Wording of the questions on the survey: How were the questions asked? Were they worded in an unbiased, neutral way, without leading questions?

3. Timing of the exit poll: Were the polls conducted throughout the day at each polling station or just during one time frame?
4. Interviewer bias: Were the interviewers unbiased in the way they approached sampled voters?
5. Influence of election officials: Did the election officials at each location evenly enforce election laws at the polling booths? Did the officials have an impact on the exit pollsters?
6. Voter validity: Did those voters who agreed to be polled give accurate answers to the questions asked?
7. Agreement with similar pre-election surveys: Finally, when the exit polls were obtained, did they agree with the most recent pre-election surveys? If not, why not?

Raising these issues is not meant to say that exit polls cannot be of use in predicting actual election results, but they should be used with discretion and with safeguards to mitigate the issues we have addressed, as well as other potential problems. In the end, however, it is absolutely essential that no exit poll results be made public until the polls across the country are closed. Otherwise, there is a significant chance that potential voters may be influenced by the results, affecting their vote or, worse, causing them to decide not to vote at all based on conclusions drawn from the exit polls.
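As described above, the phone-poll results for absentee and early voters are combined with the Election Day interviews in approximately the correct proportions. The sketch below shows how such a weighted combination might be computed; the candidate shares and the proportion of absentee/early voters are entirely hypothetical and are not taken from any actual exit poll.

```python
# Hypothetical example: combine Election Day exit-poll results with a phone poll
# of absentee/early voters, weighting each by its assumed share of the total vote.
election_day_share = {"Candidate X": 0.52, "Candidate Y": 0.48}  # in-person interviews
absentee_share     = {"Candidate X": 0.46, "Candidate Y": 0.54}  # phone poll

prop_absentee = 0.20                 # assumed proportion of absentee/early voters
prop_election_day = 1.0 - prop_absentee

combined = {
    candidate: prop_election_day * election_day_share[candidate]
               + prop_absentee * absentee_share[candidate]
    for candidate in election_day_share
}
print(combined)   # weighted estimate for the whole electorate
```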

2.7 Summary

The first step in Learning from Data involves defining the problem. This was discussed in Chapter 1. Next, we discussed intelligent data gathering, which involves specifying the objectives of the data-gathering exercise, identifying the variables of interest, and choosing an appropriate design for the survey or experimental study. In this chapter, we discussed various survey designs and experimental designs for scientific studies. Armed with a basic understanding of some design considerations