
As they take tests and observe their scores, applicants update their posterior probability distributions. Because changes in an applicant's beliefs about her true ability influence the probability she attaches to receiving any particular test score, her "reservation test scores" may change over time. Two applicants who receive identical scores on their first test may be differentially likely to retake the test for three basic reasons. First, they may face different costs of retaking the test. Those with part-time jobs, for instance, will tend to face higher opportunity costs of taking a test than other applicants. Applicants may also have differential psychic costs of undergoing a testing procedure, and even testing fees themselves, which are generally constant, may impose differential utility costs. Second, the value they attach to being admitted to a college may differ. These first two factors can be consolidated into one: applicants may differ in the ratio of their test-taking costs to the benefits they attach to admission, the ratio c/V. Third, their prior beliefs, based on their practice draws, may lead them to expect different scores on their next test.
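Written out, the consolidated cost-benefit logic amounts to a stylized decision rule (our sketch, not a formula from the paper; c denotes the cost of one more administration, V the value attached to admission, and P_admit the admission probability conditional on scores):

\[
\text{retake} \quad\Longleftrightarrow\quad
V \Bigl( \mathbb{E}\bigl[ P_{\mathrm{admit}}(\text{new scores}) \bigr]
       - P_{\mathrm{admit}}(\text{current scores}) \Bigr) \;>\; c
\quad\Longleftrightarrow\quad
\mathbb{E}\bigl[ \Delta P_{\mathrm{admit}} \bigr] \;>\; \frac{c}{V}.
\]

In this framing, an applicant's reservation test score is the current score at which the expected gain just equals c/V; realized scores above it make stopping optimal, which is why reservation scores move as beliefs are updated.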

VI. Simulating Test-Taking Behavior

There are two basic reasons to simulate test-taking behavior. First, simulations provide further evidence on the relationship between an applicant's retaking behavior, the test-taking costs she faces, the benefits she attaches to being admitted, and her prior beliefs under the current policy. Second, simulations allow us to predict the impact of changes in SAT score ranking policy without actually "living through" the alternative policies.

A. Calibration and Results under the Current Policy

The simulation exercise we undertake here is calibrated in the sense that we choose parameter values that result in simulated behavior under the current SAT score ranking policy that closely resembles actually observed behavior under that same policy. To the extent that our simulation provides a reasonable facsimile of reality as observed in actual data, we have confidence that the procedure can suggest what changes in behavior might reasonably be associated with policy changes. The simulation procedure involves the following steps:

1. For each of 1,000 simulated applicants, draw a value for ρ_m and ρ_v, the applicant's true math and verbal ability. We derive the population distribution of values for ρ_m and ρ_v from our data on applicants to three selective universities. Specifically, these values are based on the distribution of first-time SAT scores in our data.[22] Scores are translated into ability parameters first by subtracting 200 (the minimum score) from each, then dividing the result by 600.[23] By deriving ability parameters directly from applicant SAT scores in our data, we are assuming that ρ_m and ρ_v each take on one of 61 discrete values, like the scores themselves.

2. Randomly draw an initial value of c/V, the ratio of costs of test-taking to benefits of admission, for each applicant. For simplicity, the cost-to-benefit ratio takes on two values, corresponding to "high cost" and "low cost" applicants. The values of c/V are calibrated to yield a pattern of retaking similar to that found in our data. In this simulation, the ratio of test-taking costs to admission benefits increases linearly in the number of times the test has previously been taken.

3. Administer 120 "practice" Bernoulli trials, 60 each with probability of success ρ_m and ρ_v. Using the results of these practice trials, each applicant forms a prior probability distribution that represents her perception of her own ability prior to any actual test-taking. Applicants never learn their true ability parameters; they receive information about them only through their performance on tests.

4. Administer a simulated SAT, which consists of 60 independent Bernoulli trials each for the math and verbal scores, with probability of success equal to ρ_m and ρ_v, respectively.[24] Calculate the applicant's SAT scores by multiplying the number of successes by 10, then adding 200. Applicants then update their beliefs regarding the true values of their ability parameters ρ_m and ρ_v.

5. Applicants use their newly calculated posterior distribution on ρ_m and ρ_v, their value of c/V, and probabilities of admission conditional on test scores to decide whether to retake the SAT. Applicants are aware that they can expect their scores to drift upward if they retake. If an applicant decides to refrain from retaking the test, the simulation stops.

6. For applicants who decide to retake the SAT, we administer an additional simulated SAT. Because our evidence presented in Section IV above indicates that individuals' scores increase upon retaking, we increase the probabilities of success on the math and verbal exams. These increases in ρ_m and ρ_v reflect the presumption that each time an applicant retakes the SAT, she can expect both her math and verbal scores to increase by about 10 points. Applicants use the information in their newest set of SAT scores to update their beliefs regarding ρ_m and ρ_v, then return to Step 5.

22. To the extent that first-time SAT scores are not representative of applicants' true ability in our data, we will be unable to fully match the behavior of our simulated applicants to patterns in the data. Since our data consist exclusively of applicants to selective institutions, it is quite likely that individuals with low observed first test scores are not representative of the entire population with low initial test scores. The implications of this selection issue are discussed below.

23. There are two exceptions to this translation. SAT scores of 800 are translated into ability parameters of 0.99 rather than 1, and SAT scores of 200 are translated into ability parameters of 0.01 rather than 0. Setting ability parameters equal to zero or one in our simulation exercise would eliminate all uncertainty in an applicant's test scores.

24. The use of 120 Bernoulli trials (60 each for math and verbal) to simulate an SAT administration can be justified on three grounds. First, the number of successes in 60 trials translates easily to the SAT scale. Second, the standard deviation of an applicant's score distribution closely matches that observed in actual SAT scores (30 to 40 points). Third, the number of questions actually used to compute SAT math and verbal scores is roughly 60.
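To make the procedure concrete, the sketch below implements Steps 1 through 6 in Python. It is not the calibrated model: the uniform Beta(1, 1) prior, the logistic admit_prob function, the clipped-normal ability draws, and the c/V levels are illustrative assumptions standing in for the choices fixed in the Appendix; only the 60-trial test structure, the roughly 10-point drift per retake, and the stop-after-four rule come directly from the text.

```python
# A minimal sketch of Steps 1-6, not the calibrated implementation: the
# Beta(1, 1) prior, the admit_prob function, the ability distribution, and
# the c/V levels are illustrative assumptions, not the paper's parameters.
import numpy as np

rng = np.random.default_rng(0)

N_ITEMS = 60           # Bernoulli trials per section (math and verbal)
MAX_TESTS = 4          # very few applicants test five times, so stop at four
DRIFT = 1.0 / N_ITEMS  # +1 expected success per retake = about +10 SAT points


def admit_prob(total_score):
    """Hypothetical admission probability, increasing in the combined score."""
    return 1.0 / (1.0 + np.exp(-(total_score - 1250.0) / 60.0))


def simulate_applicant(rho_m, rho_v, cost_benefit):
    """Steps 3-6 for one applicant; returns the list of (math, verbal) scores."""
    # Step 3: 60 practice trials per section form the prior. Starting from a
    # uniform Beta(1, 1), s successes in n trials give a Beta(1+s, 1+n-s).
    post = {}
    for sec, rho in (("m", rho_m), ("v", rho_v)):
        s = int(rng.binomial(N_ITEMS, rho))
        post[sec] = [1 + s, 1 + N_ITEMS - s]

    scores = []
    for t in range(MAX_TESTS):
        # Steps 4 and 6: administer a test; retakes draw from drifted-up
        # abilities, reflecting the roughly 10-point gain per readministration.
        test = {}
        for sec, rho in (("m", rho_m), ("v", rho_v)):
            s = int(rng.binomial(N_ITEMS, min(rho + t * DRIFT, 0.99)))
            post[sec][0] += s               # Bayesian update on the ability
            post[sec][1] += N_ITEMS - s
            test[sec] = 200 + 10 * s
        scores.append((test["m"], test["v"]))
        if t + 1 == MAX_TESTS:
            break

        # Step 5: retake iff the expected admission-probability gain exceeds
        # c/V. For brevity the gain is evaluated at the posterior-mean score
        # rather than integrated over the full posterior.
        mean = {sec: ab[0] / (ab[0] + ab[1]) for sec, ab in post.items()}
        expected_next = sum(
            200 + 10 * N_ITEMS * min(mean[sec] + (t + 1) * DRIFT, 0.99)
            for sec in ("m", "v"))
        best_so_far = max(m + v for m, v in scores)
        if admit_prob(expected_next) - admit_prob(best_so_far) <= cost_benefit(t + 1):
            break
    return scores


# Steps 1-2: draw true abilities and cost types. The paper draws rho_m and
# rho_v from the empirical distribution of first-time scores; a clipped
# normal stands in here, and the c/V levels are illustrative.
counts = {1: 0, 2: 0, 3: 0, 4: 0}
for _ in range(1000):
    rho_m, rho_v = np.clip(rng.normal(0.63, 0.12, size=2), 0.01, 0.99)
    base = 0.04 if rng.random() < 0.3 else 0.01           # high- vs low-cost
    history = simulate_applicant(
        rho_m, rho_v, lambda n, b=base: b * n)            # c/V linear in retakes
    counts[len(history)] += 1
print(counts)  # distribution of one- through four-time takers, as in Table 6
```

Evaluating the retake gain at the posterior-mean score, rather than integrating over the full posterior, keeps the sketch short at the cost of ignoring the option value of an unusually lucky draw.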
Table 6
Simulation Results under the Current Score Ranking Policy

                                         One-time    Two-time    Three-time   Four-time
                                          takers      takers       takers      takers
First scores (math/verbal)               585/600     565/581      601/611     588/613
Second scores (math/verbal)                 —        580/601      610/617     593/620
Third scores (math/verbal)                  —           —         626/637     595/623
Fourth scores (math/verbal)                 —           —            —        623/644
Percent of sample                         13.5        48.2         28.7         9.6
Mean true ability parameters
  (math/verbal)                          573/582     570/587      604/614     586/610
Percent "high-cost" type                   35          19            8           0

Details regarding the specific assumptions and parameter values used in the simulation can be found in the Appendix. The simulation was calibrated to match the probability of a simulated applicant taking the test a second, third, or fourth time to the observed probability of an applicant's taking the test a second, third, or fourth time. As Table 6 shows, the calibration exercise performed relatively well in matching the observed probability of retaking. Among our simulated applicants, the probability of taking the SAT two or more times under the current score ranking policy is 86.5 percent; the probability of taking the SAT three or more times is 38.3 percent; and the probability of taking the SAT four times is 9.6 percent. Since very few actual applicants take the SAT five or more times, our simulation stops after the fourth test administration.
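The retaking probabilities just quoted follow directly from the percent-of-sample row of Table 6; a quick check:

```python
# Percent of simulated applicants by total number of administrations,
# transcribed from the "Percent of sample" row of Table 6.
share = {1: 13.5, 2: 48.2, 3: 28.7, 4: 9.6}

two_or_more = share[2] + share[3] + share[4]   # took the SAT at least twice
three_or_more = share[3] + share[4]            # took it at least three times
print(round(two_or_more, 1), round(three_or_more, 1), share[4])  # 86.5 38.3 9.6
```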
A comparison of Tables 4 and 6 suggests that our simulation fails to capture the exact nature of selection into the pool of retakers in two ways. First, in our data on actual applicants, the set of individuals stopping after one test administration obtains significantly higher scores, on average, than any other group. In our simulation, that is not the case. Second, our simulation suggests that applicants with exceptionally high test score gains are more likely to refrain from taking the test an additional time, while individuals who experience moderate increases are more likely to take the test again. In our actual data, test score gains are spread more evenly through the population: the set of individuals who stop after the third administration, for example, experience roughly the same gain between the second and third administrations as do those who choose to take the test a fourth time.

The most plausible explanation for this divergence is our inability to model the selection of SAT takers into the pool of applicants to one of our three sample universities. Our simulation procedure explicitly equates an applicant's true ability with her scores on the first SAT administration. In reality, the applicants with low initial SAT scores in our sample are probably not representative of the overall population with low initial SAT scores, since our sample consists of applicants to selective institutions. In our actual data, individuals with low initial scores are more likely to retake, presumably because they believe that their initial scores underestimate their true ability. To compensate for this underestimate of retaking in a subset of the sample, we overestimate the extent of retaking in the general population, implying that our simulated applicants must score higher, relative to expectations, than actual applicants before deciding to stop taking the test. This caveat should be considered carefully for two reasons. First, it implies that our simulation may not perfectly capture the degree of applicant response to changes in SAT score ranking policies. Second, it suggests that we are omitting one important source of applicant response to a change in test score ranking policies: the decision to apply in the first place. Bearing these concerns in mind, we proceed with our analysis of the simulation results.

Table 7 examines the determinants of retaking by presenting probit regressions analogous to the ones performed with actual data in Table 3 above. The first regression, which predicts the probability of an applicant deciding to take the test a second time, indicates that cost-to-benefit ratios, prior beliefs, and first test scores each enter significantly into the equation. Comparing a high-cost and a low-cost applicant with all other variables alike and equal to their mean values, the high-cost applicant is 20 percent less likely to retake the test. When all other variables are set equal to their respective means, an increase of 50 points in both the SAT math score and verbal score reduces the probability of retaking by about ten percentage points, a magnitude quite similar to that derived from our actual data. Prior beliefs display a quadratic relationship with retaking behavior. For most applicants, the probability of retaking increases as the number of practice trial successes increases; this tendency diminishes as the number of practice successes approaches the maximum value of 60. Holding other things constant, then, applicants with more "pessimistic" prior beliefs are less likely to retake the test.

These basic results persist when analyzing the decision to take the test a third time or a fourth time. With each retaking, a greater fraction of high-cost types drops out. A high-cost applicant with mean values of all other variables is about 36 percent less likely to take the test a third time than an identical low-cost applicant. As shown in Table 6, no high-cost applicants choose to take the test a fourth time. These results bear a distinct resemblance to those discussed in Section III above, which indicated that the greatest degree of selection occurred in the decision to take the test a third time, conditional on having taken it twice. Interestingly, the probability of retaking the test appears to depend only on the most recently obtained set of SAT scores.
Controlling for the most recent scores, previously received scores do not significantly affect the probability of retaking. Prior beliefs, however, continue to significantly affect retaking.

The results from this analysis point to the same conclusions that we derived from our analysis of actual applicant data. In light of the caveats discussed above, this is encouraging. Here, we show that applicants with pessimistic prior beliefs are significantly less likely to retake the test. In Table 3, we showed that individuals with lower self-reported ability and lower class rank were less likely to retake the test, conditional on initial scores. The role of prior beliefs may also explain why many groups with lower average test scores, including African-Americans and those from low-income families, are less likely to retake the test conditional on initial scores. These groups might also face higher test-taking costs, another factor shown to be important in the simulation.

Table 7
Explaining Retaking Behavior in Simulated Data

Dependent variable: indicator for whether the applicant chooses to take the nth test, conditional on having taken n − 1.

Independent Variable                  n = 2       n = 3       n = 4
High-cost type indicator             −1.345      −1.265        — a
                                     (0.166)     (0.161)
Math prior successes                  0.390       0.352       0.420
                                     (0.038)     (0.046)     (0.127)
Math prior successes squared         −0.003      −0.003      −0.004
                                     (0.0004)    (0.0006)    (0.002)
Verbal prior successes                0.395       0.545       0.319
                                     (0.041)     (0.057)     (0.140)
Verbal prior successes squared       −0.003      −0.005      −0.003
                                     (0.0005)    (0.0006)    (0.002)
First math score                     −0.158      −0.011       0.020
                                     (0.020)     (0.012)     (0.021)
First verbal score                   −0.150       0.008       0.011
                                     (0.178)     (0.012)     (0.024)
Second math score                       —        −0.046       0.002
                                                 (0.012)     (0.022)
Second verbal score                     —        −0.107      −0.001
                                                 (0.013)     (0.025)
Third math score                        —           —        −0.092
                                                             (0.020)
Third verbal score                      —           —        −0.075
                                                             (0.021)
Log likelihood                      −219.99     −434.72     −175.00
N                                     1,000        865         360

Note: Standard errors in parentheses. Coefficients are derived from probit estimation of each equation. Test scores and numbers of prior successes each take on integer values between 0 and 60. ** denotes a coefficient significant at the 5 percent level, * the 10 percent level.
a. Exactly zero high-cost types choose to take the test a fourth time; thus, the high-cost indicator is dropped from this probit equation.
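The regressions themselves are straightforward to reproduce. The sketch below estimates a Table 7-style probit with statsmodels; the input arrays are a synthetic stand-in so the snippet runs on its own (in practice they would come from the simulated applicant records), and the at-the-means marginal effects at the end are the basis for comparisons such as the "20 percent less likely" figure.

```python
# A sketch of the Table 7-style probits. The data below are placeholders so
# the snippet runs standalone; substitute the actual simulation output.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
high_cost = rng.integers(0, 2, n)       # cost-type indicator
math_prior = rng.integers(0, 61, n)     # practice successes, 0-60
verb_prior = rng.integers(0, 61, n)
math1 = rng.integers(0, 61, n)          # first-test successes, 0-60
verb1 = rng.integers(0, 61, n)

# Placeholder outcome loosely mimicking the signs in the n = 2 column.
latent = (-1.3 * high_cost
          + 0.39 * math_prior - 0.003 * math_prior ** 2
          + 0.40 * verb_prior - 0.003 * verb_prior ** 2
          - 0.16 * math1 - 0.15 * verb1)
took_second = (latent + rng.normal(0, 4, n) > np.median(latent)).astype(int)

X = sm.add_constant(np.column_stack([
    high_cost,
    math_prior, math_prior ** 2,
    verb_prior, verb_prior ** 2,
    math1, verb1,
]))
fit = sm.Probit(took_second, X).fit(disp=0)
print(fit.summary())

# Marginal effects evaluated at the means, the basis for statements like
# "a high-cost applicant is 20 percent less likely to retake."
print(fit.get_margeff(at="mean").summary())
```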

B. Evaluating the Current Score Ranking Policy