Vigdor and Clotfelter 11
occurs in the decision to take the test a third time, where an applicant from a completely urban ZIP code was 8 percentage points more likely to retake the test than an applicant from a completely rural ZIP code. Other ZIP code characteristics, such as racial composition and income, do not display a consistent relationship with retaking.
Finally, those who took their first SAT early were generally more likely than others to retake the test. This result is not surprising, since those applicants who
initially took the SAT on a late date would simply not have had many chances to retake the exam.
In conclusion, the empirical analysis of who retakes the SAT indicates significant differences by race, income, parental education, self-reported class rank and ability, and type of community. Most of these relationships are obscured in the raw data, presumably by the extremely strong tendency for students who score well on the test to refrain from taking it again. These explanatory variables might measure differential expectations regarding future test scores, variation in test-taking costs, or variation in the benefits associated with admission. Many of the applicant characteristics associated with a lower propensity to retake are also correlated with lower overall SAT scores, suggesting that applicants may form expectations in a manner that resembles statistical discrimination.¹² The greatest amount of selection into the pool of retakers appears to occur in the decision to take the test a third time.
IV. Explaining the Increase in Scores
The tendency for SAT scores to increase is evident in the averages presented in Table 4. Using both nationwide data and data from our three-institution sample, the table shows that students taking the test on average improve their scores with each successive administration.¹³ This tendency applies to both math and verbal tests. Consider, for example, those who took the SAT three times. Among all those
in the 1997 national cohort, the average score among this group on the verbal test increased from 493 in the first taking to 515 on the third and from 510 to 537 on
the math. Within our sample of applicants to three institutions, the comparable increases were 573 to 602 for the verbal and 555 to 583 for the math. For all those
taking the test at least twice, the average increase on the second try in the national sample was 13 points for the verbal and 16 points for math. The comparable increases
for the three-university sample were both about 16 points. Both samples show the same thing: retaking the test is associated with higher scores.
What explains these score increases? At least three possible reasons for this tendency suggest themselves. First, improvement might arise because of students' increased familiarity with the SAT test, its format, and the kinds of questions it includes. Second, rising scores may reflect the general increase in knowledge that one
12. A simple regression of first-time SAT scores on the covariates other than SAT scores in Table 3 reveals that income, parental education, class rank, self-reported math and writing ability, residence in an urban or wealthy ZIP code, and Asian racial background are all significantly positively correlated with test scores. Black, American Indian, Hispanic, and female applicants receive significantly lower scores on their first test. The R² for this regression, with 22,678 observations, is 0.52.
13. For an analysis of various aspects associated with changes in scores, see Nathan and Camara (1998).
12 The Journal of Human Resources
Table 4
Average SAT Scores for Students, by Number of Times Taking the Test

Panel A: 1997 Graduating Cohort

                                      Number of times taken                 Average score
                               1        2        3        4       5         increase over
                                                                            previous testᶜ
Number of applicantsᵃ     567,495  426,569  107,870   15,633   2,417
Percentageᵇ                  50.6     38.0      9.6      1.4     0.2
Average score—Verbal
  First test                  492      507      493      468     442
  Second test                          520      504      480     453             13
  Third test                                    515      488     460             11
  Fourth test                                            499     469             11
  Fifth test                                                     480             11
Average score—Math
  First test                  492      512      510      495     481
  Second test                          528      525      511     496             16
  Third test                                    537      522     507             12
  Fourth test                                            532     518             10
  Fifth test                                                     526              8

Panel B: Three University Sample

                                  Number of times taken          Average score
                               1        2        3       4+      increase over
                                                                 previous testᶜ
Number of applicantsᵈ       4,040   11,007    6,000   1,631
Percentageᵇ                  17.8     48.7     27.0     6.5
Average score—Verbal
  First test                  649      602      573     538
  Second test                          617      591     562          15.9
  Third test                                    602     575          11.1
  Fourth test                                           582           7.7
Average score—Math
  First test                  641      589      555     515
  Second test                          606      572     536          16.2
  Third test                                    583     548          10.8
  Fourth test                                           557           9.7

a. Data are based on 1,119,984 students who took the SAT I one to five times in their junior or senior years.
b. Rows sum to 100.0.
c. Calculated for all those who took the test at least the indicated number of times.
d. Data are based on 22,678 students who graduated in 1998 and took the SAT I one to four or more times from the spring of their sophomore year to their senior year.
expects to correspond to aging and time in school. Third, the increase could arise out of a selection effect, whereby those who had performed badly relative to their
expectations constituted the bulk of retakers, in which case the improvement in their scores might arise from ordinary regression to the mean. The first two possible
causes could be thought of as ‘‘real,’’ as opposed to mere selection.
Determining whether scores truly tend to ‘‘drift upward’’ upon retaking is an important precursor to modeling retaking behavior. To test whether the observed increases in test scores are consistent with selection effects, we employ the two-stage Heckman sample-selection procedure (Heckman 1979).¹⁴ The first stage of the procedure consists of probit regressions identical to those reported in Table 3 above, which predict the probability that any particular applicant will enter the sample of retakers. In the second stage, we estimate the following equations for both math and verbal test scores:
(1) Predicted Test Score Gainᵢⱼ = β̂₀ⱼ + β̂₁ⱼλ̂ᵢ,

where i indexes students, j represents either the verbal or math score, and λ̂ᵢ is an applicant's inverse Mills ratio as estimated by the first-stage probit regression.¹⁵ This procedure allows us to predict selection-corrected test score gains for applicants both in and out of sample.¹⁶ Table 5 presents estimates of β̂₀ⱼ and β̂₁ⱼ.

We estimate separate second-stage equations for test score changes from the first to the second, second to third, and third to fourth test administrations; one set each is estimated for the verbal and math parts of the test. For all six equations, the selection coefficient β̂₁ⱼ is negative, indicating that, as theory would predict, those individuals most likely to retake the test are those with the highest expected score gains. Estimates of selection into the pool of two- and three-time takers are statistically significant; selection into the pool of four-time takers is not.
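The two-step procedure can be sketched on simulated data. Everything below — the covariates, coefficients, sample size, and error correlation — is invented for illustration and is not the authors' actual specification; only the structure (probit first stage, inverse Mills ratio, second-stage regression of gains on an intercept and λ̂) mirrors the text.

```python
# Illustrative two-step Heckman (1979) correction on simulated data.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + covariates

rho = -0.6                                   # corr(selection error, outcome error)
u = rng.normal(size=n)                       # selection-equation error
e = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)

gamma = np.array([0.2, 0.8, -0.5])           # invented probit coefficients
retake = X @ gamma + u > 0                   # who enters the retaker sample
gain = 20.0 + 10.0 * e                       # score gain, observed only for retakers

# Stage 1: probit for the retake decision, by maximum likelihood
def nll(g):
    z = X @ g
    return -(retake * norm.logcdf(z) + ~retake * norm.logcdf(-z)).sum()

gamma_hat = minimize(nll, np.zeros(3), method="BFGS").x

# Inverse Mills ratio for every applicant, retaker or not
lam = norm.pdf(X @ gamma_hat) / norm.cdf(X @ gamma_hat)

# Stage 2 (Equation 1): gains regressed on an intercept and lambda, retakers only
A = np.column_stack([np.ones(int(retake.sum())), lam[retake]])
(b0, b1), *_ = np.linalg.lstsq(A, gain[retake], rcond=None)

observed = gain[retake].mean()               # raw mean gain among retakers
corrected = (b0 + b1 * lam).mean()           # selection-corrected mean, everyone
print(b1 < 0, observed > corrected)
```

With a negative error correlation, the estimated β̂₁ comes out negative and the observed gain among retakers exceeds the selection-corrected gain, reproducing the qualitative pattern in Table 5.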
To give an idea of the importance of this selection effect on the amount of gains, we use the second-stage Heckman results to calculate simple predicted score changes between the nth and (n+1)th administrations for all individuals who took the nth test, regardless of whether they actually took the (n+1)th. We compare these selection-corrected average test score gains with the observed average score gain, equating the difference in these values with a selection effect.
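In terms of the math estimates for n = 2 reported in Table 5, this decomposition is simple arithmetic:

```python
# Selection decomposition for the math test, n = 2 (values from Table 5)
observed_gain = 16.2        # mean gain among actual second-time takers
corrected_gain = 14.0       # mean predicted gain over all first-time takers
selection_effect = observed_gain - corrected_gain
real_share = corrected_gain / observed_gain
print(round(selection_effect, 1), round(real_share, 2))  # 2.2 0.86
```

The "real" share of roughly 86 percent for this cell is consistent with the 70-to-90-percent range reported below.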
14. The Heckman selection correction procedure presumes that the error terms in the sample selection equation (the probit equations reported in Table 3) and the outcome equations (where outcomes here are increases in test scores) follow a bivariate normal distribution, with some correlation between the error terms. In other words, individuals with exceptionally low latent increases in test scores might also be exceptionally unlikely to retake the test.
15. The inverse Mills ratio for each observation is computed as follows:

λ̂ᵢ = φ(xᵢγ̂) / Φ(xᵢγ̂),

where xᵢ is the vector of characteristics included on the right-hand side of the probit equation, γ̂ is the vector of estimated coefficients, and φ and Φ are the density and cumulative density of the standard normal distribution, respectively. Relatively low values of the inverse Mills ratio in this case correspond to individuals with a relatively high probability of retaking the test.
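The formula in footnote 15 is a one-liner, and a quick numerical check confirms the monotonicity claimed there — λ̂ falls as the probit index (and hence the retake probability) rises:

```python
# Inverse Mills ratio: lambda(z) = phi(z) / Phi(z) for probit index z = x'gamma
from scipy.stats import norm

def inverse_mills(z):
    return norm.pdf(z) / norm.cdf(z)

# Decreasing in z: likely retakers (high z) get the smallest lambda values
vals = [inverse_mills(z) for z in (-2.0, 0.0, 2.0)]
print([round(v, 3) for v in vals])  # [2.373, 0.798, 0.055]
```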
16. We restrict the right-hand side of Equation 1 to include only an intercept term and the inverse Mills ratio because we are interested only in obtaining a mean predicted value from this regression, rather than estimating consistent values of other parameters. Adding additional explanatory variables would change individual predicted values, but would not influence the mean predicted value.
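The in-sample version of footnote 16's claim follows from the OLS normal equations: whenever an intercept is included, residuals sum to zero, so fitted values average to the mean of the outcome no matter which regressors are added. A small demonstration on made-up data (all variable names here are invented):

```python
# With an intercept, OLS fitted values average to the mean outcome for
# any set of regressors (illustrative data only).
import numpy as np

rng = np.random.default_rng(1)
n = 200
lam = rng.uniform(0.1, 2.0, n)        # stand-in for inverse Mills ratios
extra = rng.normal(size=(n, 3))       # hypothetical additional covariates
y = 23.0 - 15.0 * lam + rng.normal(size=n)

def mean_fitted(X):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (X @ coef).mean()

m_small = mean_fitted(np.column_stack([np.ones(n), lam]))
m_big = mean_fitted(np.column_stack([np.ones(n), lam, extra]))
print(np.isclose(m_small, y.mean()), np.isclose(m_big, y.mean()))  # True True
```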
Table 5
Explaining Improvement in SAT Scores and the Role of Selection for Those Taking the SAT at Least Two, Three, or Four Times

Dependent variable: change in score from (n−1)th to nth test

                                              n = 2      n = 3      n = 4
Math
  Estimates of Heckman second-stage parameters
    β̂₀ⱼ                                       23.2**     21.8**     20.0
                                             (0.652)    (1.73)     (8.60)
    β̂₁ⱼ                                      −29.4**    −14.1**    −8.09
                                             (2.63)     (2.52)     (8.39)
  Decomposition of test score changes
    Observed score gain                        16.2       10.8       10.5
    Expected score gain, corrected
      for selection                            14.0        7.6        8.0
    Difference due to selection                 2.2        3.2        2.5
Verbal
  Estimates of Heckman second-stage parameters
    β̂₀ⱼ                                       23.6**     19.7**     13.8
                                             (0.657)    (1.69)     (8.15)
    β̂₁ⱼ                                      −32.7**    −11.1**    −5.19
                                             (2.66)     (2.45)     (7.94)
  Decomposition of test score changes
    Observed score gain                        15.9       11.1        7.7
    Expected score gain, corrected
      for selection                            13.4        8.6        6.1
    Difference due to selection                 2.5        2.5        1.6
N (first stage)                              22,678     18,638      7,631

Note: Standard errors in parentheses. ** denotes coefficients significant at the 1 percent level.
Consistent with the negative point estimates of β̂₁ⱼ, at least part of observed test score gains can be attributed to selection into the pool of applicants in all six cases. This procedure also shows, however, that most of the observed gain in test scores associated with retaking cannot be attributed to selection. Between 70 and 90 percent of the observed test score gain in each instance is robust to selection correction.¹⁷
17. Simple ‘‘back-of-the-envelope’’ calculations confirm the notion that the observed test score increases are unlikely to result from selection effects. Consider the test score changes from the first to the second administration. Among the 82 percent of applicants who retake the test, the average score increase is roughly 16 points on both the math and verbal scales. If the average test score increase in the entire population were zero, the remaining 18 percent of applicants would have to expect average test score decreases of 73 points on both the math and verbal scales. The College Board reports that the test-retest standard deviation of SAT scores is roughly 30 points. The selection-only hypothesis therefore implies expected declines of more than two standard deviations for nonretakers, which is implausible.
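The arithmetic in footnote 17 can be checked directly:

```python
# If the population-average gain were zero, the 18 percent who do not retake
# would need expected declines offsetting the retakers' roughly +16 average.
p_retake = 0.82
mean_gain_retakers = 16.0
implied_drop = -p_retake * mean_gain_retakers / (1 - p_retake)
sd_test_retest = 30.0       # College Board test-retest SD, per the text
print(round(implied_drop), round(abs(implied_drop) / sd_test_retest, 1))  # -73 2.4
```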
These residual test score gains can be considered real and attributable either to a gain from familiarity with the test or to gains due to learning more over time.¹⁸
V. Model and Implications