
The greatest amount of selection into the pool of retakers appears to occur in the decision to take the test a third time, where an applicant from a completely urban ZIP code was 8 percentage points more likely to retake the test than an applicant from a completely rural ZIP code. Other ZIP code characteristics, such as racial composition and income, do not display a consistent relationship with retaking. Finally, those who took their first SAT early were generally more likely than others to retake the test. This result is not surprising, since those applicants who initially took the SAT on a late date would simply not have had many chances to retake the exam. In conclusion, the empirical analysis of who retakes the SAT indicates significant differences by race, income, parental education, self-reported class rank and ability, and type of community. Most of these relationships are obscured in the raw data, presumably by the extremely strong tendency for students who score well on the test to refrain from taking it again. These explanatory variables might measure differential expectations regarding future test scores, variation in test-taking costs, or variation in the benefits associated with admission. Many of the applicant characteristics associated with a lower propensity to retake are also correlated with lower overall SAT scores, suggesting that applicants may form expectations in a manner that resembles statistical discrimination.12

IV. Explaining the Increase in Scores

The tendency for SAT scores to increase is evident in the averages presented in Table 4. Using both nationwide data and data from our three-institution sample, the table shows that students taking the test on average improve their scores with each successive administration.13 This tendency applies to both the math and verbal tests. Consider, for example, those who took the SAT three times. Among those in the 1997 national cohort, the average score for this group on the verbal test increased from 493 on the first taking to 515 on the third, and from 510 to 537 on the math. Within our sample of applicants to three institutions, the comparable increases were 573 to 602 for the verbal and 555 to 583 for the math. For all those taking the test at least twice, the average increase on the second try in the national sample was 13 points for the verbal and 16 points for the math. The comparable increases for the three-university sample were both about 16 points. Both samples show the same thing: retaking the test is associated with higher scores.

12. A simple regression of first-time SAT scores on the covariates other than SAT scores in Table 3 reveals that income, parental education, class rank, self-reported math and writing ability, residence in an urban or wealthy ZIP code, and Asian racial background are all significantly positively correlated with test scores. Black, American Indian, Hispanic, and female applicants receive significantly lower scores on their first test. The R² for this regression, with 22,678 observations, is 0.52.

13. For an analysis of various aspects associated with changes in scores, see Nathan and Camara (1998).

Table 4
Average SAT Scores for Students, by Number of Times Taking the Test

Panel A: 1997 Graduating Cohort

                                        Number of times taken                   Average score increase
                              1          2          3         4        5        over previous test (c)
Number of applicants (a)      567,495    426,569    107,870   15,633   2,417
Percentage (b)                50.6       38.0       9.6       1.4      0.2

Average score: Verbal
  First test                  492        507        493       468      442
  Second test                            520        504       480      453               13
  Third test                                        515       488      460               11
  Fourth test                                                 499      469               11
  Fifth test                                                           480               11

Average score: Math
  First test                  492        512        510       495      481
  Second test                            528        525       511      496               16
  Third test                                        537       522      507               12
  Fourth test                                                 532      518               10
  Fifth test                                                           526                8

Panel B: Three-University Sample

                                   Number of times taken            Average score increase
                              1         2         3        4+       over previous test (c)
Number of applicants (d)      4,040     11,007    6,000    1,631
Percentage (b)                17.8      48.7      27.0     6.5

Average score: Verbal
  First test                  649       602       573      538
  Second test                           617       591      562               15.9
  Third test                                      602      575               11.1
  Fourth test                                              582                7.7

Average score: Math
  First test                  641       589       555      515
  Second test                           606       572      536               16.2
  Third test                                      583      548               10.8
  Fourth test                                              557                9.7

a. Data are based on 1,119,984 students who took the SAT I one to five times in their junior or senior years.
b. Rows sum to 100.0.
c. Calculated for all those who took the test at least the indicated number of times.
d. Data are based on 22,678 students who graduated in 1998 and took the SAT I one to four or more times from the spring of their sophomore year to their senior year.
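To illustrate how averages of the kind reported in Table 4 are constructed, the following is a minimal sketch in Python. It assumes a hypothetical long-format file with one row per applicant per administration and columns named applicant_id, attempt, verbal, and math; these names and the file are placeholders for illustration, not the authors' actual data.

```python
import pandas as pd

# Hypothetical long-format data: one row per applicant per SAT administration,
# with columns applicant_id, attempt (1, 2, ...), verbal, and math.
scores = pd.read_csv("sat_administrations.csv")

# Number of times each applicant took the test (the column dimension of Table 4).
scores["times_taken"] = scores.groupby("applicant_id")["attempt"].transform("max")

# Cell means: average verbal and math score on each attempt, by times taken.
table4 = (scores
          .groupby(["times_taken", "attempt"])[["verbal", "math"]]
          .mean()
          .unstack("times_taken"))

# Average gain over the previous test, pooled over everyone who took the test
# at least `attempt` times (the right-hand column of Table 4), verbal scale.
verbal_wide = scores.pivot(index="applicant_id", columns="attempt", values="verbal")
verbal_gains = verbal_wide.diff(axis=1).mean()

print(table4.round(0))
print(verbal_gains.round(1))
```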
What explains these score increases? At least three possible reasons for this tendency suggest themselves. First, improvement might arise because of students' increased familiarity with the SAT, its format, and the kinds of questions it includes. Second, rising scores may reflect the general increase in knowledge that one expects to correspond to aging and time in school. Third, the increase could arise out of a selection effect, whereby those who had performed badly relative to their expectations constituted the bulk of retakers, in which case the improvement in their scores might arise from ordinary regression to the mean. The first two possible causes could be thought of as "real," as opposed to mere selection.

Determining whether scores truly tend to "drift upward" upon retaking is an important precursor to modeling retaking behavior. To test whether the observed increases in test scores are consistent with selection effects, we employ the two-stage Heckman sample-selection procedure (Heckman 1979).14 The first stage of the procedure consists of probit regressions identical to those reported in Table 3 above, which predict the probability that any particular applicant will enter the sample of retakers. In the second stage, we estimate the following equations for both math and verbal test scores:

(1)   \text{Predicted Test Score Gain}_{ij} = \hat{\beta}_{0j} + \hat{\beta}_{1j}\,\hat{\lambda}_i ,

where i indexes students, j represents either the verbal or math score, and λ̂_i is an applicant's inverse Mills ratio as estimated by the first-stage probit regression.15 This procedure allows us to predict selection-corrected test score gains for applicants both in and out of sample.16

Table 5 presents estimates of β̂_0j and β̂_1j. We estimate separate second-stage equations for test score changes from the first to the second, second to third, and third to fourth test administrations; one set each is estimated for the verbal and math parts of the test. For all six equations, the selection coefficient β̂_1j is negative, indicating that, as theory would predict, those individuals most likely to retake the test are those with the highest expected score gains. Estimates of selection into the pool of two- and three-time takers are statistically significant; selection into the pool of four-time takers is not.

To give an idea of the importance of this selection effect on the amount of gains, we use the second-stage Heckman results to calculate simple predicted score changes between the nth and (n+1)th administrations for all individuals who took the nth test, regardless of whether they actually took the (n+1)th. We compare these selection-corrected average test score gains with the observed average score gain, equating the difference in these values with a selection effect.

14. The Heckman selection-correction procedure presumes that the error terms in the sample selection equation (the probit equations reported in Table 3) and the outcome equations (where the outcomes here are increases in test scores) follow a bivariate normal distribution, with some correlation between the error terms. In other words, individuals with exceptionally low latent increases in test scores might also be exceptionally unlikely to retake the test.

15. The inverse Mills ratio for each observation is computed as \hat{\lambda}_i = \phi(x_i\hat{\gamma}) / \Phi(x_i\hat{\gamma}), where x_i is the vector of characteristics included on the right-hand side of the probit equation, γ̂ is the vector of estimated coefficients, and φ and Φ are the density and cumulative distribution function of the standard normal distribution, respectively. Relatively low values of the inverse Mills ratio in this case correspond to individuals with a relatively high probability of retaking the test.

16. We restrict the right-hand side of Equation 1 to include only an intercept term and the inverse Mills ratio because we are interested only in obtaining a mean predicted value from this regression, rather than in estimating consistent values of other parameters. Adding additional explanatory variables would change individual predicted values but would not influence the mean predicted value.
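The two-stage procedure of Equation 1 and footnotes 15 and 16 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a hypothetical applicant-level DataFrame with a binary retook indicator, a score_gain column observed only for retakers, and the Table 3 covariates, and it uses standard probit and OLS routines from statsmodels.

```python
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm

def heckman_two_stage(df, covariates):
    """Two-stage Heckman correction for the score-gain equation (a sketch).

    Assumes df has a binary 'retook' column, a 'score_gain' column observed
    only for retakers, and the hypothetical first-stage covariates.
    """
    X1 = sm.add_constant(df[covariates])

    # Stage 1: probit for the probability of retaking (the Table 3 regressions).
    probit = sm.Probit(df["retook"], X1).fit(disp=False)

    # Inverse Mills ratio (footnote 15): lambda_i = phi(x_i'gamma) / Phi(x_i'gamma).
    index = X1 @ probit.params
    imr = pd.Series(norm.pdf(index) / norm.cdf(index), index=df.index, name="imr")

    # Stage 2 (Equation 1): regress the observed gain on an intercept and
    # lambda_i only, using retakers, as footnote 16 describes.
    retook = df["retook"] == 1
    ols = sm.OLS(df.loc[retook, "score_gain"], sm.add_constant(imr[retook])).fit()

    # beta_0 + beta_1 * lambda_i yields a selection-corrected predicted gain for
    # every applicant, whether or not they actually retook the test.
    predicted_gain = ols.params["const"] + ols.params["imr"] * imr
    return probit, ols, predicted_gain
```

Under these assumptions, averaging predicted_gain over everyone who took the nth test gives a selection-corrected expected gain of the kind reported in Table 5 below.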
Table 5
Explaining Improvement in SAT Scores and the Role of Selection for Those Taking the SAT at Least Two, Three, or Four Times

Dependent variable: change in score from the (n-1)th to the nth test

                                                      n = 2      n = 3      n = 4
Math
  Estimates of Heckman second-stage parameters
    β̂_0j                                              23.2*      21.8*      20.0
                                                      (0.652)    (1.73)     (8.60)
    β̂_1j                                             -29.4*     -14.1*      -8.09
                                                      (2.63)     (2.52)     (8.39)
  Decomposition of test score changes
    Observed score gain                               16.2       10.8       10.5
    Expected score gain, corrected for selection      14.0        7.6        8.0
    Difference due to selection                        2.2        3.2        2.5

Verbal
  Estimates of Heckman second-stage parameters
    β̂_0j                                              23.6*      19.7*      13.8
                                                      (0.657)    (1.69)     (8.15)
    β̂_1j                                             -32.7*     -11.1*      -5.19
                                                      (2.66)     (2.45)     (7.94)
  Decomposition of test score changes
    Observed score gain                               15.9       11.1        7.7
    Expected score gain, corrected for selection      13.4        8.6        6.1
    Difference due to selection                        2.5        2.5        1.6

N (first stage)                                       22,678     18,638     7,631

Note: Standard errors are in parentheses. * denotes coefficients significant at the 1 percent level.

Consistent with the negative point estimates of β̂_1j, at least part of the observed test score gains can be attributed to selection into the pool of applicants in all six cases. This procedure also shows, however, that most of the observed gain in test scores associated with retaking cannot be attributed to selection. Between 70 and 90 percent of the observed test score gain in each instance is robust to selection correction.17 These residual test score gains can be considered real and attributable either to a gain from familiarity with the test or to gains due to learning more over time.18

17. Simple "back-of-the-envelope" calculations confirm the notion that the observed test score increases are unlikely to result from selection effects. Consider the test score changes from the first to the second administration. Among the 82 percent of applicants who retake the test, the average score increase is roughly 16 points on both the math and verbal scales. If the average test score increase in the entire population were zero, the remaining 18 percent of applicants would have to expect average test score decreases of 73 points on both the math and verbal scales. The College Board reports that the test-retest standard deviation of SAT scores is roughly 30 points. The selection-only hypothesis therefore implies that non-retakers expect score declines of more than two test-retest standard deviations, which seems implausible.
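For concreteness, the decomposition reported in Table 5 and the back-of-the-envelope argument of footnote 17 can be written out as a short sketch, continuing the hypothetical variable names used above; the figures plugged in at the end are the published numbers for the first-to-second administration, not new estimates.

```python
def selection_decomposition(df, predicted_gain):
    """Split the observed mean gain among retakers into a selection-corrected
    component and a residual attributed to selection (the Table 5 rows)."""
    retook = df["retook"] == 1
    observed = df.loc[retook, "score_gain"].mean()
    # Selection-corrected gain: mean predicted gain over *all* nth-test takers,
    # not just those who chose to retake.
    corrected = predicted_gain.mean()
    return observed, corrected, observed - corrected

# Back-of-the-envelope check from footnote 17, using the published figures:
# 82 percent of applicants retake and gain roughly 16 points on average.
retake_share, observed_gain = 0.82, 16.0
# If the population-wide expected gain were zero, non-retakers would need an
# offsetting expected change of roughly -73 points.
implied_nonretaker_change = -retake_share * observed_gain / (1 - retake_share)
test_retest_sd = 30.0  # College Board test-retest standard deviation
print(round(implied_nonretaker_change, 1))                   # -72.9
print(round(implied_nonretaker_change / test_retest_sd, 1))  # about -2.4 SDs
```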

V. Model and Implications