Vigdor and Clotfelter 17
update their posterior probability distributions. As changes in an applicant’s beliefs about her true ability influence the probability she attaches to receiving any particular
test score, her ‘‘reservation test scores’’ may change over time. Two applicants who receive identical scores on their first test may be differentially
likely to retake the test for three basic reasons. First, they may face different costs of retaking the test. Those with part-time jobs, for instance, will tend to face higher
opportunity costs of taking a test than other applicants. Applicants may have differential psychic costs of undergoing a testing procedure. Even testing fees themselves, which are generally constant, may impose differential utility costs. Second, the value they attach to being admitted to a college may differ. These first two factors can be consolidated into one: applicants may differ in the ratio of their test-taking costs to the benefits they attach to admission—the ratio c/V. Third, their prior beliefs, based
on their practice draws, may lead them to expect different scores on their next test.
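The posterior-updating logic described in this paragraph can be sketched as a conjugate Beta-Bernoulli model. The uniform Beta(1, 1) prior and the `Belief` class below are illustrative assumptions for exposition, not the authors' exact specification:

```python
from dataclasses import dataclass

@dataclass
class Belief:
    """Beta(a, b) posterior over a single ability parameter rho."""
    a: float = 1.0  # prior successes + 1 (uniform Beta(1, 1) prior assumed)
    b: float = 1.0  # prior failures + 1

    def update(self, successes: int, trials: int) -> None:
        # Bernoulli trials => conjugate Beta posterior
        self.a += successes
        self.b += trials - successes

    def mean(self) -> float:
        return self.a / (self.a + self.b)

belief = Belief()
belief.update(successes=40, trials=60)  # e.g., 40 of 60 practice items correct
print(round(belief.mean(), 3))  # posterior mean of rho: 0.661
```

Each new set of test results shifts the posterior, and with it the score threshold at which stopping becomes optimal.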
VI. Simulating Test-Taking Behavior
There are two basic reasons to simulate test-taking behavior. First, simulations provide further evidence on the relationship between an applicant’s retaking behavior, the test-taking costs faced, the benefits attached to being admitted, and prior beliefs under the current policy. Second, simulations allow us to predict the impact of changes in SAT score ranking policy without actually ‘‘living through’’ the alternative policies.
A. Calibration and Results under the Current Policy
The simulation exercise we undertake here is calibrated in the sense that we choose parameter values that result in simulated behavior under the current SAT score ranking policy that closely resembles actually observed behavior under that same policy. To the extent that our simulation provides a reasonable facsimile of reality as observed in actual data, we have confidence that the procedure can suggest what changes in behavior might reasonably be associated with policy changes.
The simulation procedure involves the following steps:

1. For each of 1,000 simulated applicants, draw a value for ρ_m and ρ_v, the applicant’s true math and verbal ability. We derive the population distribution of values for ρ_m and ρ_v from our data on applicants to three selective universities. Specifically, these values are based on the distribution of first-time SAT scores in our data.[22] Scores are translated into ability parameters first by subtracting 200 (the minimum score) from each, then dividing the result by 600.[23] By deriving ability parameters directly from applicant SAT scores in our data, we are assuming that ρ_m and ρ_v each take on one of 61 discrete values, like the scores themselves.

22. To the extent that first-time SAT scores are not representative of applicants’ true ability in our data, we will be unable to fully match the behavior of our simulated applicants to patterns in the data. Since our data consist exclusively of applicants to selective institutions, it is quite likely that individuals with low observed first test scores are not representative of the entire population with low initial test scores. The implications of this selection issue are discussed below.
2. Randomly draw an initial value of c/V, the ratio of costs of test-taking to benefits of admission, for each applicant. For simplicity, the cost-to-benefit ratio will take on two values corresponding to ‘‘high cost’’ and ‘‘low cost’’ applicants. The values of c/V are calibrated to yield a pattern of retaking similar to that found in our data. In this simulation, the ratio of test-taking costs to admission benefits increases linearly in the number of times previously taken.
3. Administer 120 ‘‘practice’’ Bernoulli trials, 60 each with probability of success ρ_m and ρ_v. Using the results of these practice trials, each applicant forms a prior probability distribution that indicates her perception of her own ability prior to any actual test-taking. Applicants will never learn their true ability parameters; they will only receive information about them based on how they perform on tests.
4. Administer a simulated SAT, which consists of 60 independent Bernoulli trials each for the math and verbal scores, with probability of success equal to ρ_m and ρ_v, respectively.[24] Calculate the applicant’s SAT scores by multiplying the number of successes by 10, then adding 200. Applicants then update their beliefs regarding the true values of their ability parameters ρ_m and ρ_v.

5. Applicants use their newly calculated posterior distribution on ρ_m and ρ_v, their value of c/V, and probabilities of admission conditional on test scores to decide whether to retake the SAT. Applicants are aware that they can expect their scores to drift upwards if they decide to retake. If applicants decide to refrain from retaking the test, the simulation stops.
6. For applicants who decide to retake the SAT, we administer an additional simulated SAT. Because our evidence presented in Section IV above indicates that individuals’ scores increase upon retaking, we increase the probabilities of success on the math and verbal exams. These increases in ρ_m and ρ_v reflect the presumption that each time an applicant retakes the SAT, she can expect both her math and verbal scores to increase by about 10 points. Applicants use the information in their newest set of SAT scores to update their beliefs regarding ρ_m and ρ_v, then return to Step 5.
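The six steps above can be sketched as a simulation loop. The text pins down the test structure (60 Bernoulli trials per section, scores of 200 + 10 × successes, an expected 10-point gain per retake, a cost-to-benefit ratio rising linearly in tests taken, and a stop after four administrations); the specific c/V values and the retake decision rule below are placeholder assumptions, not the calibrated ones:

```python
import random

random.seed(0)
N_ITEMS = 60        # Bernoulli trials per section, as in the text
MAX_TESTS = 4       # the simulation stops after the fourth administration
DRIFT = 10 / 600    # ~10-point expected gain per retake, on the rho scale

def score(successes: int) -> int:
    return 200 + 10 * successes  # SAT-scale translation from the text

def simulate_applicant(rho_m: float, rho_v: float,
                       cost_benefit: float) -> list[tuple[int, int]]:
    """Return the (math, verbal) score history for one applicant."""
    history = []
    for t in range(MAX_TESTS):
        m = sum(random.random() < rho_m for _ in range(N_ITEMS))
        v = sum(random.random() < rho_v for _ in range(N_ITEMS))
        history.append((score(m), score(v)))
        # Placeholder decision rule: retake while the expected score gain
        # (about 10 points per section from drift) exceeds a cost that
        # rises linearly with the number of tests already taken.
        expected_gain = 20.0
        cost = cost_benefit * (t + 1) * 1000
        if expected_gain <= cost:
            break
        rho_m = min(rho_m + DRIFT, 0.99)
        rho_v = min(rho_v + DRIFT, 0.99)
    return history

hist = simulate_applicant(rho_m=0.65, rho_v=0.67, cost_benefit=0.005)
print(len(hist))  # number of administrations for this low-cost applicant
```

A higher `cost_benefit` value produces earlier stopping, which is the mechanism behind the attrition of high-cost types reported below.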
23. There are two exceptions to this translation. SAT scores of 800 are translated into ability parameters of 0.99 rather than 1, and SAT scores of 200 are translated into ability parameters of 0.01 rather than 0. Setting ability parameters equal to zero or one in our simulation exercise would eliminate all uncertainty in an applicant’s test scores.

24. The use of 120 Bernoulli trials (60 each for math and verbal) to simulate an SAT administration can be justified on three grounds. First, the number of successes in 60 trials translates easily to the SAT scale. Second, the standard deviation of an applicant’s score distribution closely matches that observed in actual SAT scores (30 to 40 points). Third, the number of questions actually used to compute SAT math and verbal scores is roughly sixty.
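The standard-deviation claim in footnote 24 can be verified directly: a score of 200 + 10X with X ~ Binomial(60, ρ) has standard deviation 10·sqrt(60·ρ(1 − ρ)), which falls in the 30-to-40-point range for mid-range abilities:

```python
import math

def score_sd(rho: float) -> float:
    # score = 200 + 10 * Binomial(60, rho), so SD = 10 * sqrt(60 * rho * (1 - rho))
    return 10 * math.sqrt(60 * rho * (1 - rho))

for rho in (0.5, 0.7, 0.8):
    print(f"rho = {rho}: SD = {score_sd(rho):.1f} points")
```

For ρ = 0.5 the SD is about 38.7 points, and even at ρ = 0.8 it is about 31 points, consistent with the footnote.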
Table 6
Simulation Results under the Current Score Ranking Policy

                                    One-time   Two-time   Three-time   Four-time
                                    takers     takers     takers       takers
First scores (math/verbal)          585/600    565/581    601/611      588/613
Second scores (math/verbal)         —          580/601    610/617      593/620
Third scores (math/verbal)          —          —          626/637      595/623
Fourth scores (math/verbal)         —          —          —            623/644
Percent of sample                   13.5       48.2       28.7         9.6
Mean true ability parameters
  (math/verbal)                     573/582    570/587    604/614      586/610
Percent ‘‘high-cost’’ type          35         19         8            0
Details regarding the specific assumptions and parameter values used in the simulation can be found in the Appendix. The simulation was calibrated to match the probability of a simulated applicant taking the test a second, third, or fourth time to the observed probability of an applicant’s taking the test a second, third, or fourth time. As Table 6 shows, the calibration exercise performed relatively well in matching the observed probability of retaking. Among our simulated applicants, the probability of taking the SAT two or more times under the current score ranking policy is 86.5 percent; the probability of taking the SAT three or more times is 38.3 percent, and the probability of taking the SAT four times is 9.6 percent. Since very few actual applicants take the SAT five or more times, our simulation stops after the fourth test administration.
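These cumulative probabilities follow by arithmetic from the ‘‘percent of sample’’ row of Table 6:

```python
# ''Percent of sample'' row of Table 6
shares = {"one": 13.5, "two": 48.2, "three": 28.7, "four": 9.6}

p_two_or_more = shares["two"] + shares["three"] + shares["four"]
p_three_or_more = shares["three"] + shares["four"]

print(round(p_two_or_more, 1))    # 86.5 percent take the SAT at least twice
print(round(p_three_or_more, 1))  # 38.3 percent take it at least three times
print(shares["four"])             # 9.6 percent take it four times
```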
A comparison of Tables 4 and 6 suggests that our simulation fails to capture the exact nature of selection into the pool of retakers in two ways. First, in our data on
actual applicants, the set of individuals stopping after one test administration obtains significantly higher scores, on average, than any other group. In our simulation, that
is not the case. Second, our simulation suggests that applicants with exceptionally high test score gains are more likely to refrain from taking the test an additional
time. Individuals who experience moderate increases, conversely, are more likely to take the test again. In our actual data, test score gains are spread more evenly
through the population: the set of individuals who stop after the third administration, for example, experience roughly the same gain between the second and third administrations as do those who choose to take the test a fourth time.
The most plausible explanation for this divergence is our inability to model the selection of SAT takers into the pool of applicants to one of our three sample universities. Our simulation procedure explicitly equates an applicant’s true ability with her scores on the first SAT administration. In reality, the applicants with low initial
SAT scores in our sample are probably not representative of the overall population with low initial SAT scores, since our sample consists of selective institutions. In
our actual data, individuals with low initial scores are more likely to retake, presumably because they believe that their initial scores underestimate their true ability. To compensate for this underestimate of retaking in a subset of the sample, we overestimate the extent of the retaking in the general population—implying that our simulated applicants must score higher, relative to expectations, than actual applicants
before deciding to stop taking the test. This caveat should be considered carefully for two reasons. First, it implies that
our simulation may not perfectly capture the degree of applicant response to changes in SAT score ranking policies. Second, it suggests that we are omitting one important
source of applicant response to a change in test score ranking policies: the decision to apply in the first place. Bearing these concerns in mind, we will proceed with
our analysis of simulation results.
Table 7 examines the determinants of retaking by presenting probit regressions analogous to the ones performed with actual data in Table 3 above. The first result,
which predicts the probability of an applicant deciding to take the test a second time, indicates that cost-to-benefit ratios, prior beliefs, and first test scores each enter
significantly into the equation. Comparing a high-cost and a low-cost applicant with all other variables equal to their mean values, the high-cost applicant is 20 percent less likely to retake the test. When all other variables are set equal to their respective means, an increase of 50 points in both the SAT math score and verbal
score reduces the probability of retaking by about ten percentage points—a magnitude quite similar to that derived from our actual data. Prior beliefs display a quadratic relationship with retaking behavior. For most applicants, the probability of retaking increases as the number of practice trial successes increases. This tendency
decreases as the number of practice successes approaches the maximum value of 60. Holding other things constant then, applicants with more ‘‘pessimistic’’ prior
beliefs are less likely to retake the test.
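The 20 percent figure quoted above is a difference in probit predicted probabilities. As a sketch of that computation, only the −1.345 high-cost coefficient below comes from Table 7; the baseline index for a low-cost applicant at mean values is an assumed placeholder:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF, the probit link

beta_high_cost = -1.345   # n = 2 coefficient from Table 7
baseline_index = 2.0      # ASSUMED index x'b for a low-cost applicant at means

p_low = phi(baseline_index)                    # retake probability, low-cost type
p_high = phi(baseline_index + beta_high_cost)  # retake probability, high-cost type

# The size of the gap depends on the assumed baseline index.
print(f"low-cost {p_low:.3f}, high-cost {p_high:.3f}, gap {p_low - p_high:.3f}")
```

Because the probit link is nonlinear, the high-cost effect shrinks for applicants whose baseline retake probability is already near zero or one.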
These basic results persist when analyzing the decision to take the test a third time or a fourth time. With each retaking, a greater fraction of high-cost types drop
out. A high-cost applicant with mean values of all other variables is about 36 percent less likely to take the test a third time when compared to an identical low-cost applicant. As shown in Table 6, no high-cost applicants choose to take the test a fourth time. These results bear a distinct resemblance to those discussed in Section III
above, which indicated that the greatest degree of selection occurred in the decision to take the test a third time conditional on taking twice.
Interestingly, the probability of retaking the test appears to depend only on the most recently obtained set of SAT scores. Controlling for the most recent scores,
previously received scores do not significantly affect the probability of retaking. Prior beliefs continue to significantly affect retaking, however.
The results from this analysis point to the same conclusions that we derived from our analysis of actual applicant data. In light of the caveats discussed above, this is
encouraging. Here, we show that applicants with pessimistic prior beliefs are significantly less likely to retake the test. In Table 3, we showed that individuals with lower self-reported ability and lower class rank were less likely to retake the test, conditional on initial scores. The role of prior beliefs may also explain why many
groups with lower average test scores, including African-Americans and those from low-income families, are less likely to retake the test conditional on initial scores.
These groups might also face higher test-taking costs, another factor shown to be important in the simulation.
Table 7
Explaining Retaking Behavior in Simulated Data

Dependent variable: indicates whether applicant chooses to take the nth test, conditional on having taken n − 1

Independent Variable                n = 2       n = 3       n = 4
High cost type indicator            −1.345      −1.265      —a
                                    (0.166)     (0.161)
Math prior successes                0.390       0.352       0.420
                                    (0.038)     (0.046)     (0.127)
Math prior successes squared        −0.003      −0.003      −0.004
                                    (0.0004)    (0.0006)    (0.002)
Verbal prior successes              0.395       0.545       0.319
                                    (0.041)     (0.057)     (0.140)
Verbal prior successes squared      −0.003      −0.005      −0.003
                                    (0.0005)    (0.0006)    (0.002)
First math score                    −0.158      −0.011      0.020
                                    (0.020)     (0.012)     (0.021)
First verbal score                  −0.150      0.008       0.011
                                    (0.178)     (0.012)     (0.024)
Second math score                   —           −0.046      0.002
                                                (0.012)     (0.022)
Second verbal score                 —           −0.107      −0.001
                                                (0.013)     (0.025)
Third math score                    —           —           −0.092
                                                            (0.020)
Third verbal score                  —           —           −0.075
                                                            (0.021)
Log likelihood                      −219.99     −434.72     −175.00
N                                   1,000       865         360
Note: Standard errors in parentheses. Coefficients are derived from probit estimation of each equation. Test scores and number of prior successes each take on integer values between 0 and 60. ** denotes a coefficient significant at the 5 percent level, * at the 10 percent level.
a. Exactly zero high-cost types choose to take the test a fourth time; thus, the high-cost indicator is dropped from this probit equation.
22 The Journal of Human Resources
B. Evaluating the Current Score Ranking Policy