Topcoding and Topcode Bias

570 The Journal of Human Resources

for women is consistently about 0.2 log points higher than for men. Card and DiNardo (2002) use CPS data for the years 1975–99 and report college wage premiums for women that are greater than or equal to those for men in every year. Some studies use data sources other than the CPS. Dougherty (2005) runs wage regressions on years of schooling using National Longitudinal Survey of Youth 1979 (NLSY79) data and finds higher wage premiums for women throughout the period 1988–2000. He also cites more than 20 other studies that use data sources other than the CPS and find higher “returns to schooling” for women than for men. None of these other studies, however, looks at data from 1990 or later. Peña (2007) disagrees with these findings but offers only evidence from outside the United States to support the claim that the college wage premium is higher for men.

III. Topcoding and Topcode Bias

Wage and salary earnings data (“wage income” or “wages”) in CPS public-use files have been topcoded since 1967. From 1967–80, the topcode was $50,000; from 1981–83, it was $75,000; from 1984–94, it was $99,999; from 1995–2001, it was $150,000; and since 2002, it has been $200,000.³ These are nominal values; all income data reported in the CPS, including topcodes, are in current-year dollars. Topcoding is a widely recognized issue for CPS data (see, for example, Katz and Murphy 1992; Card and DiNardo 2002; Autor, Katz, and Kearney 2005; DiPrete and Buchmann 2006; Mulligan and Rubinstein 2008; Hirsh and McPherson 2008; Larrimore et al. 2008). I have found no prior study, however, that identifies and examines the biasing effect of topcodes on the college wage premium. Although they do not discuss bias due to topcoding, Katz and Murphy (1992) do adjust topcoded wages before calculating college wage premiums for the years 1963–87. Card and DiNardo (2002) note the presence of topcodes and suggest adjusting topcoded values by a factor of 1.4, although the results described above do not make this adjustment. Instead, some prior work has addressed topcodes by recensoring wage data at maximum values that are more consistent over time. For example, DiPrete and Buchmann (2006) recensor the 1963–2001 CPS wage data at topcodes that are linearly smoothed over time and always below $124,000, and Card and DiNardo (2002) recensor all observations after 1994 at $100,000. This type of correction should avoid spurious jumps in estimated wages in years when topcodes change. Recensoring, however, greatly increases the number of observations subject to censoring in recent years: as nominal wages rise against a fixed recensoring value, larger and larger numbers of wage observations are censored. Figure 1 shows the dramatic rise in the share of observations in my sample (described in Section IV) subject to topcoding or recensoring at $100,000.

Crucially for estimates of the college wage premium, this increase in censoring over time is not benign. First, the overwhelming majority of topcoded or recensored observations in any year are college-educated individuals. Because censoring lowers the recorded wages of college graduates far more than those of workers without a college degree, using topcoded data without accounting for censoring will bias college wage premium estimates downward. Second, the great majority of topcoded or recensored observations are male. Thus, we would expect topcoding bias to disproportionately depress the college wage premium for men. Together, these facts suggest that topcoding and recensoring have increasingly biased estimates of the relative college wage premiums of women and men.

3. Until 1988, all wage income was reported as a single total subject to these topcodes. Since 1988, the CPS has reported wage income from separate jobs separately, applying a separate topcode to each job; for these years, the topcodes listed are for wage income from the longest job held that year. The Integrated Public Use Microdata Series (IPUMS) CPS aggregates these job-level wage income variables to report total wage income consistently, subject to the topcodes listed above, for all years. The results herein use this IPUMS CPS total wage income variable for all years. When wages are disaggregated and topcodes are adjusted separately, my results are not discernibly affected.

Hubbard 571

Figure 1. Fraction of All Wage Observations Subject to Censoring, by Sex
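The topcode schedule listed in this section, and the two common responses to it (rescaling topcoded values and recensoring at a common value), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this paper: all function names are hypothetical, "year" is simply the year as listed in the schedule above, a topcoded wage is assumed to be recorded at or above the topcode, and Card and DiNardo's suggested 1.4 adjustment is read as replacing a topcoded value with 1.4 times the topcode.

```python
from collections import defaultdict
from typing import Optional

# Nominal CPS wage-income topcodes as listed in the text (current-year dollars).
# The last band is open-ended ("since 2002").
TOPCODE_SCHEDULE = [
    (1967, 1980, 50_000),
    (1981, 1983, 75_000),
    (1984, 1994, 99_999),
    (1995, 2001, 150_000),
    (2002, None, 200_000),
]


def topcode_for_year(year: int) -> int:
    """Return the nominal wage topcode that applies in a given year."""
    for start, end, cap in TOPCODE_SCHEDULE:
        if year >= start and (end is None or year <= end):
            return cap
    raise ValueError(f"no wage topcode listed for {year}")


def censoring_threshold(year: int, recensor_at: Optional[int] = None) -> int:
    """Topcode for the year, lowered to a common recensoring value if one is given."""
    cap = topcode_for_year(year)
    return cap if recensor_at is None else min(cap, recensor_at)


def is_censored(wage: float, year: int, recensor_at: Optional[int] = None) -> bool:
    # Assumption: a topcoded wage is recorded at (or above) the threshold.
    return wage >= censoring_threshold(year, recensor_at)


def scale_topcoded(wage: float, year: int, factor: float = 1.4) -> float:
    """Replace a topcoded wage with factor * topcode (hypothetical reading of the 1.4 rule)."""
    cap = topcode_for_year(year)
    return factor * cap if wage >= cap else wage


def fraction_censored_by_sex(records, recensor_at: Optional[int] = None):
    """Share of (sex, year, wage) records at or above the censoring threshold, by sex."""
    totals, censored = defaultdict(int), defaultdict(int)
    for sex, year, wage in records:
        totals[sex] += 1
        censored[sex] += is_censored(wage, year, recensor_at)
    return {sex: censored[sex] / totals[sex] for sex in totals}
```

With `recensor_at=100_000`, the threshold after 1994 never exceeds $100,000 even though the official topcode rises to $150,000 and then $200,000, so `fraction_censored_by_sex` climbs over time as nominal wages grow. That is the mechanism behind the rising shares plotted in Figure 1.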

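The argument that topcoding biases the college wage premium downward, and more so for men, can be made concrete with a worked example. Assume (purely for illustration) that log annual wages X in a group are normal with mean μ and standard deviation σ, and that every wage above a topcode T is recorded at T. Then the mean recorded log wage is E[min(X, ln T)] = μ − σ[φ(z) − z(1 − Φ(z))], with z = (ln T − μ)/σ, where φ and Φ are the standard normal density and distribution function. The sketch below evaluates this with Python's `statistics.NormalDist`; the parameter values and function names are hypothetical and are not estimates from the CPS. Both sexes are given the same true college premium of 0.5 log points, but men's wage distributions are shifted up, so more college-educated men sit above the topcode.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: phi = .pdf, Phi = .cdf


def censored_mean_log(mu: float, sigma: float, topcode: float) -> float:
    """Mean of min(X, ln topcode) for X ~ N(mu, sigma^2) (normality is an assumption)."""
    z = (math.log(topcode) - mu) / sigma
    # Expected amount by which X exceeds ln(topcode): sigma * E[max(Z - z, 0)]
    excess = sigma * (STD_NORMAL.pdf(z) - z * (1 - STD_NORMAL.cdf(z)))
    return mu - excess


def share_censored(mu: float, sigma: float, topcode: float) -> float:
    """Fraction of the group whose wage lies above the topcode."""
    z = (math.log(topcode) - mu) / sigma
    return 1 - STD_NORMAL.cdf(z)


def college_premium(mu_college, mu_noncollege, sigma, topcode=None):
    """Difference in mean log wages, with or without censoring at the topcode."""
    if topcode is None:
        return mu_college - mu_noncollege
    return (censored_mean_log(mu_college, sigma, topcode)
            - censored_mean_log(mu_noncollege, sigma, topcode))


# Hypothetical parameters: identical true premiums (0.5), men's wages shifted up.
SIGMA = 0.6
TOPCODE = 100_000
GROUPS = {"men": (10.9, 10.4), "women": (10.6, 10.1)}  # (college mu, non-college mu)

for sex, (mu_c, mu_n) in GROUPS.items():
    true_p = college_premium(mu_c, mu_n, SIGMA)
    cens_p = college_premium(mu_c, mu_n, SIGMA, TOPCODE)
    print(f"{sex}: true premium {true_p:.3f}, measured with topcode {cens_p:.3f}, "
          f"college share censored {share_censored(mu_c, SIGMA, TOPCODE):.1%}")
```

Because the college group has more mass above the topcode, censoring lowers its mean log wage more than the non-college mean, so the measured premium falls for both sexes. Because men's distributions lie closer to the topcode, the fall is larger for men, and the censored data show a higher premium for women even though the true premiums are identical by construction.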
IV. Data and Construction of Samples