

B. Evaluating the Current Score Ranking Policy

Our data on simulated applicants have one central advantage over data on actual applicants: we are able to observe the ability parameter that SAT scores are intended to estimate. We can therefore examine the effectiveness of current and alternative college test score ranking policies in providing a high-quality point estimate of an applicant's true ability. We use four different criteria to determine the quality of a ranking policy; a brief computational sketch follows the list.

1. Accuracy. This is simply the average difference between the estimate of an ability parameter derived from a policy and the true value of that parameter (the "ranking error"). Both positive and negative values are theoretically possible.

2. Precision. This measure equals the standard deviation of ranking errors associated with a particular policy. A policy can be inaccurate yet precise if ranking errors are more or less the same for all applicants; an imprecise policy is one where ranking errors vary considerably from applicant to applicant. Precision can never be negative, and values closer to zero are preferable, other things equal.

3. Bias. This measure should not be confused with accuracy, which in a statistical sense could be referred to as bias. Here, we refer to bias as the degree to which a test score ranking policy places high test-taking-cost types at a disadvantage. It equals the difference between the average ranking error for low-cost types and the average ranking error for high-cost types. We presume that zero is the most preferred bias value.25

4. Cost. The cost of a ranking policy is simply the average number of test administrations per applicant observed under that policy. Other things equal, a ranking policy that induces a lower frequency of test-taking is considered superior. In using this criterion, we presume that the value of resources consumed in retaking the test exceeds the value of any benefits, such as learning, that accrue to the applicant in the process.

25. It is conceivable that colleges might wish to implement a biased score ranking policy. Recall that it is not possible to separate individuals with a high test-taking cost from those who place a low value on admission. If colleges determined that variation in benefits were much more important than variation in costs, and they wished to provide an advantage to students attaching the highest value to admission, then a biased policy would appear attractive.
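To make these criteria concrete, the following sketch shows how each could be computed from simulated applicant records. It is a minimal illustration: the array names, and the assumption that each applicant record carries a policy point estimate, a true ability value, a test count, and a cost-type flag, are ours rather than taken from the simulation code described in Section IV.

```python
import numpy as np

def evaluate_policy(estimate, true_ability, n_tests, high_cost):
    """Compute accuracy, precision, bias, and cost for one ranking policy.

    estimate     : policy's point estimate of each simulated applicant's ability
    true_ability : the ability parameter the test score is meant to estimate
    n_tests      : number of test administrations observed for each applicant
    high_cost    : boolean array marking high test-taking-cost types
    """
    errors = estimate - true_ability                 # ranking errors
    accuracy = errors.mean()                         # average ranking error
    precision = errors.std()                         # spread of ranking errors
    bias = errors[~high_cost].mean() - errors[high_cost].mean()  # low-cost minus high-cost
    cost = n_tests.mean()                            # average administrations per applicant
    return accuracy, precision, bias, cost
```

Under this definition, a positive bias value means that low-cost types are, on average, ranked above high-cost types of equal true ability, which is the pattern reported for the current policy in row 1 of Table 8.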
Table 8 presents our calculations of the accuracy, precision, bias, and cost of the most common current SAT score ranking policy, along with those of several alternative policies to be discussed in the following section.

Table 8
Comparing Student Test Score Ranking Policies

Policy Alternative                                                  Accuracy    Precision   Bias        Cost
1. Current: use highest math score and highest verbal
   score, no correction for upward drift.                           +31m +31v   29m 30v     +15m +12v   2.3
2. Use highest math score and highest verbal score,
   correct for upward drift.                                        +20m +17v   28m 27v     +2m +5v     1.9
3. Use first submitted score only.                                  -1m +1v     35m 35v     +4m 0v      1.0
4. Average of all scores submitted, no correction for
   upward drift.                                                    +3m +5v     32m 32v     +1m +2v     1.4
5. Average of all scores submitted, correct for upward
   drift.                                                           +2m +3v     34m 32v     +3m -2v     1.2
6. Use last submitted score only, no correction for
   upward drift.                                                    +16m +17v   33m 33v     +5m -1v     1.7
7. Use last submitted score only, correct for upward
   drift.                                                           +8m +7v     33m 33v     +6m -4v     1.4
8. Mandatory retake, use average of first two scores
   only, no correction for upward drift.                            +4m +5v     26m 24v     -1m -1v     2.0
9. Mandatory retake, use average of first two scores
   only, correct for upward drift.                                  -1m 0v      26m 24v     -1m -1v     2.0

Note: Results are based on simulations described in Section IV of the text. The simulation assumes that applicants receive prior information equivalent to one test administration before taking their first real test. Upward drift in test scores is equal to 10 points each on the math and verbal segments. Applicant cost-to-benefit ratios equal 0.015 for "low cost" types and 0.025 for "high cost" types. Roughly 85 percent of simulated applicants are assigned the "low cost" designation. The marginal cost of retaking the test is assumed to increase linearly with the number of administrations. "Accuracy" is equal to the average difference between an applicant's test scores and true ability under the ranking policy indicated. "Precision" is equal to the standard deviation of differences between an applicant's test scores and true ability. "Bias" is the difference in accuracy measures between low- and high-cost types. "Cost" is equal to the average number of test administrations per applicant under the indicated policy.

Under the current SAT score ranking policy, our simulation suggests that admissions officers' point estimate of the typical applicant's ability is significantly higher than the true value.26 The average deviation between the test score used to estimate an applicant's ability and that applicant's true ability, which we define as the "ranking error," is approximately 30 points on the verbal scale and 30 points on the math scale. Because current policy effectively picks the most positive outlier among competing point estimates, it is not surprising that applicants are consistently rated above their true ability. The tendency for scores to drift upward upon retaking exacerbates ranking errors.

26. Because we allow an applicant's expected SAT score to drift upward upon retaking, there is some ambiguity as to what exactly should be considered "true" ability. In this analysis, we equate true ability with the applicant's expected score on the first test administration.

Ranking errors also differ appreciably among applicants under the current policy. The precision values convey this basic fact, and the bias values show one component of the variance in ranking errors across applicants. In our simulation, high-cost applicants, who were approximately 20 percent less likely to retake the test and 36 percent less likely to take it a third time conditional on taking it twice, are consistently ranked lower than low-cost applicants of equal true ability. By taking the test more frequently, low-cost applicants receive more chances to draw a positive outlier from their distribution of possible test scores. These applicants are also more likely to benefit from upward drift in test scores. If applicants are ranked according to the sum of their math and verbal scores, the average high-cost applicant in this simulation is placed at a 27-point disadvantage relative to an equivalent low-cost rival.

Finally, the current policy leads to a situation where the average applicant takes the test 2.3 times. By construction, this value is observed both in our simulated data and in our actual data on applicants to selective colleges.
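The mechanism behind these errors can be illustrated with a stylized calculation, separate from the simulation in Section IV: treat each sitting as a fresh draw centered on true ability, shifted upward by the drift on each retake, and let the policy keep the highest draw. The within-applicant score spread and the retake frequencies below are illustrative assumptions rather than the paper's calibrated values; the 10-point drift and the 85 percent low-cost share follow the note to Table 8.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
SPREAD = 30   # assumed spread of a single sitting around true ability (illustrative)
DRIFT = 10    # assumed upward drift per retake, per the note to Table 8

def highest_score(true_ability, sittings):
    """Best score observed over a given number of sittings, with upward drift."""
    best = np.full(true_ability.shape, -np.inf)
    for k in range(sittings.max()):
        draw = true_ability + DRIFT * k + rng.normal(0.0, SPREAD, true_ability.size)
        best = np.where(k < sittings, np.maximum(best, draw), best)
    return best

ability = rng.normal(500.0, 100.0, N)
low_cost = rng.random(N) < 0.85          # share of low-cost types, per the Table 8 note
# Illustrative retake behavior: low-cost types sit the test more often.
sittings = np.where(low_cost,
                    rng.choice([1, 2, 3], N, p=[0.2, 0.4, 0.4]),
                    rng.choice([1, 2, 3], N, p=[0.5, 0.4, 0.1]))

errors = highest_score(ability, sittings) - ability
print("accuracy (mean ranking error):", round(errors.mean(), 1))
print("bias (low-cost minus high-cost):", round(errors[low_cost].mean() - errors[~low_cost].mean(), 1))
```

Even this stripped-down version reproduces the qualitative pattern in row 1 of Table 8: the highest-score rule rates the average applicant above true ability, and the types who sit the test more often gain the most from it.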

C. Evaluating Alternative Score Ranking Policies