Data and Construction of Samples

Hubbard 571

Figure 1. Fraction of All Wage Observations Subject to Censoring, by Sex

and larger numbers of censored wage observations. Figure 1 shows the dramatic rise in the share of observations in my sample (described in Section IV) subject to topcoding or recensoring at $100,000. Crucially, for estimates of the college wage premium, the increase over time in the censoring of wage observations is not benign. First, the overwhelming majority of topcoded or recensored observations in any year are college-educated individuals. Because of this, using topcoded data without accounting for censoring will bias college wage premium estimates downward. Second, the great majority of topcoded or recensored observations are male. Thus, we would expect topcoding bias to disproportionately depress the college wage premium for males. Together, these facts suggest that topcoding and recensoring have increasingly biased estimates of the relative college wage premiums of women and men.
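
A stylized numerical illustration may help show the direction of this bias. The sketch below is only an illustration: the wage values and the function names are hypothetical and are not drawn from the CPS sample, and the premium is measured simply as a difference in mean log wages rather than by the estimation procedure used in this paper. Only the $100,000 censoring point comes from the text.

```python
import math

TOPCODE = 100_000  # censoring point mentioned in the text


def mean_log_wage(wages, topcode=None):
    """Mean log wage, optionally censoring each wage at `topcode`."""
    if topcode is not None:
        wages = [min(w, topcode) for w in wages]
    return sum(math.log(w) for w in wages) / len(wages)


def log_premium(college, high_school, topcode=None):
    """Difference in mean log wages between the two groups."""
    return mean_log_wage(college, topcode) - mean_log_wage(high_school, topcode)


# Hypothetical annual wages (illustrative values only).
college = [60_000, 90_000, 150_000, 250_000]
high_school = [30_000, 45_000, 60_000, 80_000]

true_premium = log_premium(college, high_school)
censored_premium = log_premium(college, high_school, TOPCODE)
```

Because only the college group has wages above the censoring point in this example, censoring lowers the college mean while leaving the high school mean unchanged, so `censored_premium` falls below `true_premium`.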

IV. Data and Construction of Samples

To estimate college wage premiums after accounting for topcodes, I use the IPUMS CPS data series for 1970–2008.[4] I construct a sample that is fairly representative of the samples used in the literature: I include white, non-Hispanic, adult civilians who were age 18 to 65 at the time of survey and who worked the previous year as private or government employees for a wage or salary. I exclude observations with negative CPS sample weights.[5] I further restrict the sample to workers with 1–40 years of potential experience, as defined below. I limit the sample to full-time, full-year (FTFY) workers, where FTFY is defined as 35 or more hours per week and 50 or more weeks per year.

4. See King, et al. 2009. Integrated Public Use Microdata Series, Current Population Survey: Version 2.0. Minneapolis, Minn.: Minnesota Population Center.

572 The Journal of Human Resources

I focus on FTFY workers for two reasons. First, the 1994 CPS redesign was intended to increase the measured labor force participation of workers (believed to be mostly women) who in previous survey formats were being recorded as not in the labor force. Because FTFY workers are the subset of workers least likely to be affected by recategorizing workers on the margin of labor force participation, restricting the sample to them reduces any potential spurious trend generated by the 1994 survey redesign. Second, studies that have attempted to independently verify the accuracy of CPS wage data find that FTFY wage data appear to be very accurately reported, while wages for part-time, part-year workers appear to be substantially underreported (see Roemer 2000, 2002).

The coding of educational variables in the CPS data changed between 1990 and 1991. For 1970–90, the education variable is the number of whole and partial years of education completed (topcoded at 18).
For 1991–2008, the education variable is coded as intervalled years of schooling for observations with less than a high school degree, and as the highest degree obtained for those with at least a high school degree, with a separate category for some college with no college degree. To ensure maximum consistency across time, I generate the following educational category recodes: I define Dropout as anyone with fewer than 12 years of schooling completed (1970–90) or less than a high school diploma or General Education Development (GED) certificate (1991–2008); High School Graduate as anyone with exactly 12 years of schooling completed (1970–90) or exactly a high school diploma or GED certificate (1991–2008); Some College as anyone with more than 12 and fewer than 16 years of schooling completed (1970–90) or categorized as “Associate Degree” or “Some College, No Degree” by the CPS (1991–2008); Bachelor’s Degree as 16 or 17 years of schooling completed (1970–90)[6] or categorized as “Bachelor’s Degree” by the CPS (1991–2008); and Advanced Degree as 18 or more years of schooling completed (1970–2008) or categorized as “Masters,” “Professional,” or “Doctorate” degree holder by the CPS (1991–2008). I define College Graduate as any observation that I have defined as either Bachelor’s Degree or Advanced Degree above.

I also generate a “Years of School” variable. As noted above, the CPS reports years of schooling only until 1990. For 1991–2008, I impute years of completed schooling as follows: I divide the sample into demographic cells based on sex and educational category. For each demographic cell, I compute the mean years of schooling during the period 1988–90. These values, rounded down to the nearest integer, are the years of schooling used for observations in the same demographic categories for 1991–2008. With this measure of years of school, I generate potential experience as AGE – YEARS OF SCHOOL – 7.

5. Fewer than 230 out of nearly 2.9 million observations in the sample have negative weights.

6. The results are not sensitive to how observations with 17 years of schooling are categorized.

I deflate all wage values using the Personal Consumption Expenditures (PCE) price index to 1982 dollars, and drop all observations with annual wages less than $3,484, or one-half minimum wage in 1982. For a 52-week year, this is equivalent to the $67/week threshold used in Katz and Murphy (1992) and Mulligan and Rubinstein (2008).[7] Finally, I also exclude observations flagged as containing “allocated” (imputed) values for education or the amount or source of wage and salary income. No results are sensitive to the inclusion or exclusion of either type of imputed data. Table 1 provides descriptive statistics for the sample.
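
The main sample restrictions and recodes described in this section can be summarized in a short sketch. This is an illustration under stated assumptions, not the code used to build the sample: the function names and dictionary keys are hypothetical placeholders rather than IPUMS variable names, years of schooling are assumed to be whole numbers under the 1970–90 coding, the PCE index values are supplied by the caller, and the race, class-of-worker, and allocation-flag restrictions are omitted for brevity.

```python
import math

MIN_REAL_WAGE = 3_484  # annual wage floor in 1982 dollars, from the text


def education_category(years_school):
    """Recode years of schooling under the 1970-90 coding (assumes whole years)."""
    if years_school < 12:
        return "Dropout"
    if years_school == 12:
        return "High School Graduate"
    if years_school < 16:
        return "Some College"
    if years_school < 18:
        return "Bachelor's Degree"
    return "Advanced Degree"


def imputed_years(years_1988_90):
    """Mean years of schooling in one sex-by-education cell, rounded down."""
    return math.floor(sum(years_1988_90) / len(years_1988_90))


def potential_experience(age, years_school):
    """Potential experience as defined in the text: AGE - YEARS OF SCHOOL - 7."""
    return age - years_school - 7


def deflate(nominal_wage, pce_year, pce_1982):
    """Express a nominal wage in 1982 dollars, given two PCE index values."""
    return nominal_wage * pce_1982 / pce_year


def in_sample(obs):
    """Apply the age, weight, experience, FTFY, and wage-floor restrictions."""
    exp = potential_experience(obs["age"], obs["years_school"])
    return (
        18 <= obs["age"] <= 65
        and obs["weight"] >= 0            # drop negative sample weights
        and 1 <= exp <= 40                # 1-40 years of potential experience
        and obs["hours_per_week"] >= 35   # full-time
        and obs["weeks_per_year"] >= 50   # full-year
        and obs["real_wage"] >= MIN_REAL_WAGE
    )


# Hypothetical observation (values are illustrative only).
example = {
    "age": 30,
    "years_school": 16,
    "weight": 1500.0,
    "hours_per_week": 40,
    "weeks_per_year": 52,
    "real_wage": 25_000,
}
```

For instance, `in_sample(example)` evaluates the restrictions for a 30-year-old with 16 years of schooling, whose potential experience under the definition above is 7 years. The wage floor is also consistent with the weekly threshold cited in the text, since 67 × 52 = 3,484.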

V. Results