
blacks acquiring more years of education than whites, conditional on AFQT score.[7] Consistent with the model, blacks in the NLSY79 obtain higher levels of education than whites, conditional on AFQT scores: Black men with AFQT scores in the middle of the distribution attain 1.2 more years of education than white men. The disparity in educational attainment is even starker for women: At all but the lowest levels of AFQT scores, black women attain 1.3 more years of education than white women. Therefore, since wage returns to education are positive, omitting a control for years of education in estimates of racial wage differences is expected to bias the coefficient on “black” upward.[8,9] Lang and Manove (2011) show that including both the AFQT score and educational attainment causes black males to perform between six and eight percentage points more poorly than white males, but they do not examine the impact of controlling for educational attainment on the wage gap for women. Since, conditional on AFQT score, black women acquire even more education than black men do relative to their white counterparts, we expect the impact of including years of education to be even larger for women.

III. Data and Empirical Strategy

A. Estimating Strategy

We begin by replicating prior estimates of the black wage premium for women using the 2006 NLSY79. This longitudinal survey of 12,686 individuals was designed to be nationally representative of individuals between the ages of 14 and 22 in 1979. Respondents were interviewed annually through 1994, after which the survey became biennial. The NLSY79 data include detailed information about hourly wages, labor force participation, educational attainment, and AFQT score. We acquired the restricted-use files so that we can identify respondents’ counties of residence within the United States to measure cost of living. To replicate estimates in Fryer (2011), we estimate the following equation:

$$\ln(\mathit{wage}_i) = \alpha + \beta\,\mathit{BLACK}_i + \gamma_1\,\mathit{AFQT}_i + \gamma_2\,\mathit{AFQT}_i^2 + \delta_1\,\mathit{age}_i + \delta_2\,\mathit{age}_i^2 + \varepsilon_i \tag{1}$$

where the estimate for β represents the conditional racial wage difference. If β is positive, our estimates are consistent with black women receiving a wage premium.

We then examine the impact of accounting for selection. In recent work, the most common approach to address selection out of work is to impute a potential wage for the nonworkers in the sample and estimate median regressions of wage differentials (for example, Johnson, Kitamura, and Neal 2000; Chandra 2003; Neal 2004). Researchers include nonworkers in the estimation sample based on the assumption that the imputed wage and the wage an individual could potentially earn (the potential wage) fall on the same side of the conditional median. Under this assumption, estimates are consistent for the population median without being sensitive to the chosen imputed value.

7. Under the assumption that it is harder for employers to estimate the productivity of a black worker than a white worker, employers place more weight on educational attainment when evaluating black workers. Therefore, black workers invest more in education than white workers do, in order to signal their productivity.

8. One concern is that the relevant omitted variable is school quality and that educational attainment proxies for school quality. If blacks attend schools of lower quality, on average, then blacks may acquire more schooling to acquire the same skills. This is unlikely to be the case. Lang and Manove (2011) show that including a host of school characteristics associated with school quality does not substantially change relative educational attainment by race, conditional on AFQT.

9. NLSY79 respondents obtained a substantial amount of schooling after taking the AFQT, so in this data set, years of education probably includes additional information about respondents’ labor market skills.

We account for differential selection by imputing low and high potential wages. We impute a low potential wage of $1 for women who: (1) received any benefits from the Temporary Assistance for Needy Families (TANF), Supplemental Security Income (SSI), or Food Stamp programs between 2002 and 2006; (2) have a high school degree or less education; and (3) report no spousal income in the previous five years. We adopt these strict criteria to reduce the chance of errors, because systematically imputing erroneous low potential wages for women of one race would affect our estimate of racial wage differences. For example, improperly imputing low potential wages for white women would result in overstated black relative wages.

We impute a high potential wage for women who meet the following two criteria: (1) they are married to a high-earning spouse and (2) they have at least some college education. We define “high-earning spouse” in two ways. In our more conservative estimate, a high-earning spouse has average annual earnings over the past five years that place him at or above the 90th percentile for men of his race in the 2006 NLSY79. We then loosen this restriction somewhat to include women whose spouse earns above the 75th percentile for men of his race.
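
The claim that the median is insensitive to the chosen imputed value can be illustrated with a minimal sketch. It uses an unconditional sample median rather than a median regression, and every wage, count, and imputed value below is hypothetical and chosen only for illustration; none comes from the NLSY79.

```python
from statistics import median


def median_with_imputation(observed_wages, n_low, n_high, low_value, high_value):
    """Median of observed wages plus imputed values for nonworkers.

    n_low nonworkers are assigned low_value and n_high are assigned high_value.
    """
    sample = list(observed_wages) + [low_value] * n_low + [high_value] * n_high
    return median(sample)


# Hypothetical hourly wages for nine workers (illustrative numbers only).
observed = [9.5, 11.0, 12.5, 14.0, 16.0, 18.5, 22.0, 27.0, 35.0]

# Two nonworkers assumed to have potential wages below the median and one
# assumed above it. Only the side of the median matters, not the value.
m1 = median_with_imputation(observed, n_low=2, n_high=1, low_value=1.0, high_value=100.0)
m2 = median_with_imputation(observed, n_low=2, n_high=1, low_value=5.0, high_value=60.0)

# Placing the two low-wage nonworkers on the wrong side of the median
# (treating all three nonworkers as high-wage) does move the estimate.
m3 = median_with_imputation(observed, n_low=0, n_high=3, low_value=1.0, high_value=100.0)

print(m1, m2, m3)
```

In this sketch the first two medians coincide even though the imputed values differ, while the third differs because two imputed wages sit on the wrong side of the median. This is why the imputation criteria are kept strict.
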
Improperly imputing high potential wages for black but not white women would result in overstated black relative wages. These criteria for imputation help ensure that the imputed wages are on the same side of the median as the respondent’s potential wage; however, adhering to these criteria leaves several groups of nonworking women without imputed wages, such as highly educated, unemployed, single women.[10] If our decision rule leaves more highly skilled white women without an imputed wage than similar black women, then we would overstate relative black wages.[11]

We next add controls for local cost of living to our OLS and median regression estimates. To justify these controls, we demonstrate that black and white women face systematically different costs of living. We measure locations as commuting zones (CZs), which are collections of counties defined by the U.S. Department of Agriculture to have significant economic integration, measured by journey-to-work links (Tolbert and Sizer 1996). In metropolitan areas, CZs and MSAs overlap significantly. The real advantage of using CZs as a unit of geography is in rural areas. With CZs, we do not need to drop all non-MSA areas or pool them together within each state or Census region (two common methods). Pooling is costly, as rural areas within a state can vary considerably. Consider Colorado, where rural areas include tourist towns like Breckenridge and also the San Luis Valley. Average monthly rent for two- and three-bedroom dwellings is $1,110 in the counties around Breckenridge but $540 in the San Luis Valley (2005–2009 ACS).

10. We include women who are temporarily unemployed in our main OLS sample if they were working and had an observed wage in 2004.

11. We do not impute wages for 338 nonworking women in the 2006 NLSY79, 190 of whom acquired at least some college education. Imputing wages for these highly skilled women would increase the sample size by 7 percent and is unlikely to move estimates of the racial wage gap much in either direction.

Housing is the most important local price in consumers’ budgets, and we use it as one proxy for local costs of living. Banzhaf and Farooque (2012) compare alternative methods for measuring local housing costs and find that average rental prices perform well: They are closely associated with housing transaction price data (which are more costly to collect), and rental prices are closely associated with measured local amenities and average incomes. Similar to Moretti (2013), we calculate average gross monthly rent (including utility costs) for two- and three-bedroom dwellings in each CZ with the pooled 2005 to 2007 ACS.[12]

In Table 1, Panel 1, we present the housing costs in CZs where the white and black women in the 2006 NLSY79 live. Column 1 shows that blacks face higher costs of living on average: The black women in our sample face a mean monthly rent of $852 versus $816 for whites. The difference is statistically and economically significant. The remaining columns of the table show that blacks face higher rent at several quantiles of the cost-of-living distribution.[13]

For the wage regressions, we construct a measure of relative housing costs for each CZ. We define relative housing costs as the mean rent in a CZ divided by the average rent over all CZs. We use these relative housing costs to construct a cost-of-living index that reflects the fact that housing costs comprise only 42 percent of household expenditures in the 2007 consumer price index (CPI-U) calculation.[14] Of course, there are several concerns with rental costs as a measure of cost of living. For example, higher rental prices might be offset by lower costs of transportation or childcare for some individuals. If this is the case, rental prices do not accurately reflect differences in the total cost of living.
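
The construction of the cost-of-living index can be sketched in a few lines. The function name, the CZ labels, and the rent values are hypothetical and serve only to illustrate the arithmetic; the 0.42 housing share and the division by the unweighted average rent over CZs follow the description in the text.

```python
def cost_of_living_index(mean_rent_by_cz, housing_share=0.42):
    """Return {cz: (relative_housing_cost, cost_of_living)} for each CZ.

    Relative housing cost is the CZ mean rent divided by the unweighted
    average of mean rents over all CZs. The cost-of-living index weights
    housing by housing_share and holds all other prices at 1.
    """
    n = len(mean_rent_by_cz)
    average_rent = sum(mean_rent_by_cz.values()) / n
    index = {}
    for cz, rent in mean_rent_by_cz.items():
        housing_cost = rent / average_rent
        cost_of_living = housing_share * housing_cost + (1.0 - housing_share) * 1.0
        index[cz] = (housing_cost, cost_of_living)
    return index


# Hypothetical mean gross monthly rents in dollars for three made-up CZs.
rents = {"cz_low": 540.0, "cz_mid": 816.0, "cz_high": 1110.0}
col_index = cost_of_living_index(rents)

for cz, (housing_cost, cost_of_living) in col_index.items():
    print(f"{cz}: relative housing cost = {housing_cost:.3f}, "
          f"cost of living = {cost_of_living:.3f}")
```

Because non-housing prices are held at 1, the index compresses rent differences: in this example a CZ whose rent is about 35 percent above average has a cost-of-living index only about 15 percent above average.
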
Rental prices may also be bid up by high-income households without a corresponding increase in the prices of other goods. Therefore, we also consider a second proxy for the cost of living that captures differences in wages across geographic areas. One concern with a wage-based measure of cost of living is that the higher wages paid in higher cost-of-living areas might reflect productivity differences among workers. Geographically mobile workers in particular will live in areas with high productivity and wages. For this group of workers, the location-specific component of wages might reflect unobserved individual productivity. To mitigate such concerns, our wage-based measure of cost of living focuses on the wages of workers employed in occupations whose workers are relatively immobile geographically, such as farmers and funeral directors (see Appendix Table A1). A CZ’s average wage in these “low-mobility” occupations is our second control for local costs of living.[15] As shown in Table 1, Panel 2, we find that the average black woman lives in an area with higher wages among low-mobility occupations. The gap between local wages where black and white women live is highest in CZs at the lower end of the local wage distribution (that is, the 10th and 25th percentiles of respondents’ locations).[16,17]

12. The smallest identifiable area in the ACS is the public use microdata area (PUMA), a Census-defined place with population over 100,000. Some PUMA boundaries do not perfectly align with counties. When this is the case, we assign PUMA characteristics to a CZ based on the PUMA’s population share in the CZ. See McHenry (2014).

13. Columns 2–6 describe the distribution of housing costs faced by NLSY79 black and white respondents. For example, Column 2 implies that 10 percent of black respondents to the NLSY79 live in CZs with average rental costs below $586 as measured in the ACS, while 10 percent of white respondents live in CZs with average rental costs below $518.

14. That is, the CZ housing cost measure is computed as follows:

$$\mathit{HousingCost}_{CZ} = \frac{\mathit{MeanRent}_{CZ}}{\frac{1}{N}\sum_{CZ=1}^{N} \mathit{MeanRent}_{CZ}}$$

and the cost of living is computed as

$$\mathit{CostofLiving}_{CZ} = 0.42 \times \mathit{HousingCost}_{CZ} + 0.58 \times 1.$$

The 42 percent housing expenditure share is from Appendix 4 in the U.S. Bureau of Labor Statistics Handbook of Methods chapter about the Consumer Price Index.

Table 1
Local Cost of Living by Race, Characteristics of Locations Where NLSY79 Respondents Live

                                      Percentile in the Distribution of
                                      NLSY79 Respondents’ Locations
                    Mean (SD)         10th     25th     50th     75th     90th

Panel 1: Average Rent for 2 to 3 Bedroom Property
Black               852.2 (246.0)     586.4    655.2    805.6    976.4    1,267
White               815.7 (245.4)     517.8    639.3    773.3    978.9    1,188
Ratio black/white   1.045             1.132    1.025    1.042    0.997    1.066

Panel 2: Mean Hourly Wage for Workers in “Low-Mobility” Occupations
Black               16.37 (1.86)      14.11    15.20    16.11    17.62    19.34
White               16.31 (1.97)      13.65    14.82    16.13    17.52    19.34
Ratio black/white   1.004             1.034    1.026    0.999    1.006    1.000

Panel 3: Mean Hourly Wage for Workers in “High-Mobility” Occupations
Black               26.58 (4.22)      20.93    24.58    26.35    29.62    30.38
White               25.96 (4.49)      19.79    22.95    26.08    29.31    31.43
Ratio black/white   1.024             1.058    1.071    1.010    1.011    0.967

Notes: Panel 1 contains summary statistics about the average monthly rent for 2- and 3-bedroom single-family dwellings in the NLSY79 respondent’s commuting zone (CZ) in the year 2006. CZ-average monthly rent data are calculated using the pooled 2005–2007 ACS samples from IPUMS (Ruggles et al. 2010). We calculate average “gross monthly rent” over households in each PUMA and aggregate to CZs with averages weighted by population overlaps between PUMAs and CZs. The left-most column shows for each respondent category the mean and (in parentheses) the standard deviation; the remaining columns show percentiles of the residence CZ rental price distribution. Panels 2 and 3 contain corresponding summary statistics about average hourly wages in NLSY79 respondents’ CZs. Panel 2 shows average wages for workers in “low-mobility occupations,” that is, those occupations for which workers are most likely to live in their birth states in the 2005–2007 ACS. Panel 3 shows average wages for workers in “high-mobility occupations,” that is, those occupations for which workers are least likely to live in their birth states in the 2005–2007 ACS. There are 1,167 black women and 1,853 white women in the NLSY79 sample. Asterisks indicate statistical significance of differences between cost of living experienced by blacks and whites: *** p < 0.01, ** p < 0.05, * p < 0.1.

In addition to cost of living, our preferred specifications control for years of education. If, conditional on AFQT score, black women acquire more years of education than white women, then omitting years of education would result in an upward bias on the coefficient estimate for the black indicator variable. Lang and Manove (2011) show that black women in the NLSY79 had acquired more years of education by 2000 than white women with the same AFQT score. We confirm that this is true in 2006 as well.

Incorporating these methods, our preferred estimate of the black-white wage gap among women is the estimate for β in:

$$\ln(\mathit{wage}_i) = \alpha + \beta\,\mathit{BLACK}_i + \gamma_1\,\mathit{AFQT}_i + \gamma_2\,\mathit{AFQT}_i^2 + \delta_1\,\mathit{age}_i + \delta_2\,\mathit{age}_i^2 + \phi\,\mathit{COL}_i + \lambda\,\mathit{EDUC}_i + \varepsilon_i \tag{2}$$

This equation includes local cost of living (COL) and years of education (EDUC). The model of employer discrimination in the Appendix implies that local costs of living and individual productivity traits like education are important controls to include; otherwise, the regression is unlikely to identify wage differences due to racial discrimination. We estimate Equation 2 using OLS and median regression. Observed log hourly wage among workers is the dependent variable for our OLS estimates. Median regression estimates also include nonworkers with imputed potential wages, described above.
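
The direction of the bias from omitting years of education can be made explicit with the standard OLS omitted-variable-bias formula, sketched here under the assumption that Equation 2 is correctly specified. The auxiliary coefficients (the π terms), the error term u, and the tilde notation are introduced only for this illustration and are not quantities reported in the text.

```latex
% Linear projection of EDUC on the regressors that remain when EDUC is dropped
% from Equation 2 (the \pi coefficients and u_i are illustrative notation).
\[
\mathit{EDUC}_i = \pi_0 + \pi_B\,\mathit{BLACK}_i + \pi_1\,\mathit{AFQT}_i
  + \pi_2\,\mathit{AFQT}_i^2 + \pi_3\,\mathit{age}_i + \pi_4\,\mathit{age}_i^2
  + \pi_5\,\mathit{COL}_i + u_i
\]
% Probability limit of the OLS coefficient on BLACK when EDUC is omitted.
\[
\operatorname{plim}\,\tilde{\beta} = \beta + \lambda\,\pi_B
\]
% With positive returns to education (\lambda > 0) and more schooling among
% black women conditional on the other controls (\pi_B > 0), the bias term
% \lambda \pi_B is positive.
```

Under these assumptions the coefficient on BLACK from the regression without EDUC overstates β by the product of the return to education and the conditional black-white gap in years of schooling, which is the upward bias described above.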

B. NLSY79 Data