blacks acquiring more years of education than whites, conditional on AFQT score.⁷
Consistent with the model, blacks in the NLSY79 obtain higher levels of education than whites, conditional on AFQT scores: Black men with AFQT scores in the middle
of the distribution attain 1.2 more years of education than white men. The disparity in educational attainment is even starker for women: At all but the lowest levels of AFQT
scores, black women attain 1.3 more years of education than white women.
Therefore, since wage returns to education are positive, omitting a control for years of education in estimates of racial wage differences is expected to bias the coefficient
on “black” upward.⁸,⁹
Lang and Manove (2011) show that including both the AFQT score and educational attainment in the wage regression lowers black males’ estimated wages relative to white males’ by between six and eight
percentage points, but they do not examine the impact of controlling for educational attainment on the wage gap for women. Since, conditional
on AFQT score, the education advantage of black women is even larger than that of black men, we expect the impact of including years of education to be even larger for women.
III. Data and Empirical Strategy
A. Estimating Strategy
We begin by replicating prior estimates of the black wage premium for women using the 2006 NLSY79. This longitudinal survey of 12,686 individuals was designed
to be nationally representative of individuals between the ages of 14 and 22 in the year 1979. Respondents were interviewed annually through 1994, at which point the
survey became biennial. The NLSY79 data include detailed information about hourly wages, labor force participation, educational attainment, and AFQT score. We
acquired the restricted-use files so that we can identify respondents’ counties of residence within the United States to measure cost of living. To replicate estimates in
Fryer (2011), we estimate the following equation:
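(1)  $\ln(\mathit{wage}_i) = \alpha + \beta\,\mathit{BLACK}_i + \gamma_1\,\mathit{AFQT}_i + \gamma_2\,\mathit{AFQT}_i^2 + \delta_1\,\mathit{age}_i + \delta_2\,\mathit{age}_i^2 + \varepsilon_i$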
where the estimate for β represents the conditional racial wage difference. If β is
positive, our estimates are consistent with black women receiving a wage premium.
7. Under the assumption that it is harder for employers to estimate the productivity of a black worker than a white worker, employers place more weight on educational attainment when evaluating black workers. Therefore, black workers invest more in education than white workers do, in order to signal their productivity.
8. One concern is that the relevant omitted variable is school quality and that educational attainment proxies for school quality. If blacks attend schools of lower quality, on average, then blacks may acquire more schooling to acquire the same skills. This is unlikely to be the case. Lang and Manove (2011) show that including a host of school characteristics associated with school quality does not substantially change relative educational attainment by race, conditional on AFQT.
9. NLSY79 respondents obtained a substantial amount of schooling after taking the AFQT, so in this data set, years of education probably includes additional information about respondents’ labor market skills.
We then examine the impact of accounting for selection. In recent work, the most common approach to address selection out of work is to impute a potential wage for the nonworkers in the sample and estimate median regressions of wage differentials
(for example, Johnson, Kitamura, and Neal 2000; Chandra 2003; Neal 2004). Researchers include nonworkers in the estimation sample based on the assumption that
the imputed wage and the wage an individual could potentially earn (the potential wage) fall on the same side of the conditional median. Under this assumption, estimates are
consistent for the population median without being sensitive to the chosen imputed value.
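The invariance described above can be checked numerically. The following is a minimal sketch with made-up wages (every number is hypothetical, not NLSY79 data), and it uses an unconditional sample median rather than a median regression to keep the point visible: as long as each imputed value falls on the same side of the median as the unobserved potential wage, the median does not depend on the value chosen.

```python
import statistics

# Hypothetical hourly wages for workers (illustrative values, not NLSY79 data).
observed = [9.0, 12.0, 15.0, 18.0, 24.0, 30.0]
# Hypothetical potential wages of three nonworkers; unobserved in practice.
true_potential = [6.5, 7.0, 8.0]


def median_with_imputation(imputed_value):
    """Median of observed wages plus nonworkers who all receive one imputed value."""
    return statistics.median(observed + [imputed_value] * len(true_potential))


# Benchmark: the median if potential wages were actually observed.
median_true = statistics.median(observed + true_potential)

# Two different low imputed values, both below the median.
median_low_1 = median_with_imputation(1.0)
median_low_5 = median_with_imputation(5.0)

# An imputed value on the wrong side of the median changes the answer.
median_wrong_side = median_with_imputation(40.0)
```

In this toy sample the two low imputations reproduce the benchmark median, while the wrong-side imputation does not, which is why the criteria for assigning low and high values matter more than the values themselves.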
We account for differential selection by imputing low and high potential wages. We impute a low potential wage of $1 for women who (1) received any benefits from the
Temporary Assistance for Needy Families (TANF), Supplemental Security Income (SSI), or Food Stamp programs between 2002 and 2006; (2) have a high school degree
or less education; and (3) report no spousal income in the previous five years. We adopt these strict criteria to reduce the chance of errors because systematically
imputing erroneous low potential wages for women of one race would bias our estimate of racial wage differences. For example, improperly imputing low potential wages for
white women would result in overstated black relative wages.
We impute a high potential wage for women who meet the following two criteria: (1) they are married to a high-earning spouse and (2) they have at least some college education.
We define “high-earning spouse” in two ways. In our more conservative estimate, a high-earning spouse has average annual earnings over the past five years that place
him at or above the 90th percentile for men of his race in the 2006 NLSY79. We then loosen this restriction somewhat to include women whose spouse earns above the 75th
percentile for men of his race. Improperly imputing high potential wages for black but not white women would result in overstated black relative wages. These criteria
for imputation help ensure that the imputed wages are on the same side of the median as the respondent’s potential wage; however, adhering to these criteria leaves several
groups of nonworking women without imputed wages, such as highly educated, unemployed, single women.¹⁰ If our decision rule leaves more highly skilled white women without an imputed wage than similar black women, then we would overstate relative
black wages.¹¹
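The two imputation rules can be summarized as a small decision function. This is only a sketch of the criteria as stated in the text: the field names, the coding of education in years, and the high imputed value are assumptions introduced for illustration (the text gives the low value but not the high one), and the sketch uses "at or above" for both spouse cutoffs.

```python
from dataclasses import dataclass
from typing import Optional

LOW_WAGE = 1.0     # low imputed value given in the text
HIGH_WAGE = 100.0  # hypothetical placeholder: the text does not state the high value


@dataclass
class Nonworker:
    """Hypothetical record for one nonworking woman (field names are illustrative)."""
    received_benefits: bool                    # any TANF, SSI, or Food Stamp receipt, 2002-2006
    years_of_education: int
    spousal_income_5yr: float                  # spousal income over the previous five years
    spouse_percentile: Optional[float] = None  # None if unmarried; else the spouse's earnings
                                               # percentile among men of his race


def impute_wage(w: Nonworker, spouse_cutoff: float = 90.0) -> Optional[float]:
    """Return the imputed potential wage, or None if neither rule applies."""
    low = (w.received_benefits
           and w.years_of_education <= 12      # assumed coding of "high school degree or less"
           and w.spousal_income_5yr == 0)
    high = (w.spouse_percentile is not None
            and w.spouse_percentile >= spouse_cutoff
            and w.years_of_education > 12)     # assumed coding of "at least some college"
    if low:
        return LOW_WAGE
    if high:
        return HIGH_WAGE
    return None  # e.g., highly educated, unemployed, single women
```

Calling `impute_wage(w, spouse_cutoff=75.0)` corresponds to the looser definition of a high-earning spouse. Because the education conditions are mutually exclusive, no woman can satisfy both rules.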
We next add controls for local cost of living to our OLS and median regression estimates. To justify these controls, we demonstrate that black and white women face
systematically different costs of living. We measure locations as commuting zones (CZs), which are collections of counties defined by the U.S. Department of Agriculture
to have significant economic integration, measured by journey-to-work links (Tolbert and Sizer 1996). In metropolitan areas, CZs and MSAs overlap significantly. The real
advantage of using CZs as a unit of geography is in rural areas. With CZs, we do not need to drop all non-MSA areas or pool them together within each state or Census
region (two common methods). Pooling is costly, as rural areas within a state can vary considerably. Consider Colorado, where rural areas include tourist towns like
Breckenridge and also the San Luis Valley. Average monthly rent for two- and three-bedroom dwellings is $1,110 in the counties around Breckenridge but $540 in the San
Luis Valley (2005–2009 ACS).
10. We include women who are temporarily unemployed in our main OLS sample if they were working and had an observed wage in 2004.
11. We do not impute wages for 338 nonworking women in the 2006 NLSY79, 190 of whom acquired at least some college education. Imputing wages for these highly skilled women would increase the sample size by 7 percent and is unlikely to move estimates of the racial wage gap much in either direction.
Housing is the most important local price in consumers’ budgets, and we use it as one proxy for local costs of living. Banzhaf and Farooque (2012) compare alternative
methods for measuring local housing costs and find that average rental prices perform well: They are closely associated with housing transaction price data (which
are more costly to collect), and rental prices are closely associated with measured local amenities and average incomes. Similar to Moretti (2013), we calculate average gross
monthly rent (including utility costs) for two- and three-bedroom dwellings in each CZ with the pooled 2005 to 2007 ACS.¹²
In Table 1, Panel 1, we present the housing costs in CZs where the white and black women in the 2006 NLSY79 live. Column 1 shows that
blacks face higher costs of living on average: The black women in our sample face a mean monthly rent of $852 versus $816 for whites. The difference is statistically and
economically significant. The remaining columns of the table show that blacks face higher rent at several quantiles of the cost-of-living distribution.¹³
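The aggregation from PUMAs to CZs mentioned in footnote 12 and in the notes to Table 1 can be illustrated with a population-weighted average. This is one plausible reading of that description, not the documented procedure (which is in McHenry 2014); the function name, identifiers, rents, and populations below are all hypothetical.

```python
from collections import defaultdict


def cz_mean_rent(puma_rent, overlaps):
    """Population-weighted mean rent per CZ.

    puma_rent: dict mapping PUMA id -> mean gross monthly rent in that PUMA
    overlaps:  list of (puma_id, cz_id, population of the PUMA living in the CZ)
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for puma, cz, population in overlaps:
        weighted_sum[cz] += puma_rent[puma] * population
        total_weight[cz] += population
    return {cz: weighted_sum[cz] / total_weight[cz] for cz in total_weight}


# Hypothetical example: PUMA "P2" straddles two CZs.
puma_rent = {"P1": 600.0, "P2": 900.0}
overlaps = [
    ("P1", "CZ_A", 100_000),
    ("P2", "CZ_A", 50_000),
    ("P2", "CZ_B", 70_000),
]
rents = cz_mean_rent(puma_rent, overlaps)
```

In this example CZ_A draws two thirds of its weight from the cheaper PUMA, so its mean rent sits closer to 600 than to 900, while CZ_B simply inherits the rent of the one PUMA it overlaps.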
For the wage regressions, we construct a measure of relative housing costs for each CZ. We define relative housing costs as the mean rent in a CZ divided by the average
rent over all CZs. We use these relative housing costs to construct a cost-of-living index that reflects the fact that housing costs comprise only 42 percent of household
expenditures in the 2007 consumer price index (CPI-U) calculation.¹⁴
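A short worked example of this index follows, using the formula in footnote 14: relative housing cost receives the 42 percent housing share, and the remaining 58 percent of expenditures is held at 1 everywhere. The CZ labels and rents are hypothetical, and the average is taken over CZs without weights, as the footnote's formula is written.

```python
HOUSING_SHARE = 0.42  # housing expenditure share cited in the text


def cost_of_living_index(mean_rent_by_cz):
    """Map each CZ to 0.42 * (rent / average rent over CZs) + 0.58 * 1."""
    average_rent = sum(mean_rent_by_cz.values()) / len(mean_rent_by_cz)
    return {
        cz: HOUSING_SHARE * (rent / average_rent) + (1.0 - HOUSING_SHARE) * 1.0
        for cz, rent in mean_rent_by_cz.items()
    }


# Hypothetical mean rents for three CZs (average rent is 800).
rents = {"CZ_A": 600.0, "CZ_B": 800.0, "CZ_C": 1000.0}
index = cost_of_living_index(rents)
```

A CZ whose rent is 25 percent above average therefore gets an index of about 1.105 rather than 1.25, because only the housing component of the budget is allowed to vary across locations.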
Of course, there are several concerns with rental costs as a measure of cost of living. For example, higher rental prices might be offset by lower costs of transportation or
childcare for some individuals. If this is the case, rental prices do not accurately reflect differences in the total cost of living. Relatedly, rental prices may be bid up by
high-income households, but this does not correspondingly bid up prices for other goods.
Therefore, we also consider a second proxy for the cost of living that captures differences in wages across geographic areas. One concern with a wage-based measure
of cost of living is that the higher wages paid in higher cost-of-living areas might reflect productivity differences among workers. Geographically mobile workers in
particular will live in areas with high productivity and wages. For this group of workers, the location-specific component of wages might reflect unobserved individual
productivity. To mitigate such concerns, our wage-based measure of cost of living focuses on the wages of workers employed in occupations whose workers are relatively
immobile geographically, such as farmers and funeral directors (see Appendix Table A1). A CZ’s average wage in these “low-mobility” occupations is our second control for
12. The smallest identifiable area in the ACS is the public use microdata area (PUMA), a Census-defined place with population over 100,000. Some PUMA boundaries do not perfectly align with counties. When this is the case, we assign PUMA characteristics to a CZ based on the PUMA’s population share in the CZ. See McHenry (2014).
13. Columns 2–6 describe the distribution of housing costs faced by NLSY79 black and white respondents. For example, Column 2 implies that 10 percent of black respondents to the NLSY79 live in CZs with average rental costs below $586 as measured in the ACS, while 10 percent of white respondents live in CZs with average rental costs below $518.
14. That is, the CZ housing cost measure is computed as follows:
$$\mathit{HousingCost}_{CZ} = \frac{\mathit{MeanRent}_{CZ}}{\left(\sum_{CZ=1}^{N} \mathit{MeanRent}_{CZ}\right)/N}$$
and the cost of living is computed as $\mathit{CostofLiving}_{CZ} = 0.42\,(\mathit{HousingCost}_{CZ}) + 0.58\,(1)$. The 42 percent housing expenditure share is from Appendix 4 in the U.S. Bureau of Labor Statistics Handbook of Methods chapter about the Consumer Price Index.
Table 1
Local Cost of Living by Race, Characteristics of Locations Where NLSY79 Respondents Live

                                      Percentile in the Distribution of NLSY79 Respondents’ Locations
                      Mean            10th     25th     50th     75th     90th

Panel 1: Average Rent for 2 to 3 Bedroom Property
Black                 852.2 (246.0)   586.4    655.2    805.6    976.4    1,267
White                 815.7 (245.4)   517.8    639.3    773.3    978.9    1,188
Ratio black/white     1.045           1.132    1.025    1.042    0.997    1.066

Panel 2: Mean Hourly Wage for Workers in “Low-Mobility” Occupations
Black                 16.37 (1.86)    14.11    15.20    16.11    17.62    19.34
White                 16.31 (1.97)    13.65    14.82    16.13    17.52    19.34
Ratio black/white     1.004           1.034    1.026    0.999    1.006    1.000

Panel 3: Mean Hourly Wage for Workers in “High-Mobility” Occupations
Black                 26.58 (4.22)    20.93    24.58    26.35    29.62    30.38
White                 25.96 (4.49)    19.79    22.95    26.08    29.31    31.43
Ratio black/white     1.024           1.058    1.071    1.010    1.011    0.967

Notes: Panel 1 contains summary statistics about the average monthly rent for 2- and 3-bedroom single-family dwellings in the NLSY79 respondent’s commuting zone (CZ) in the year 2006. CZ-average monthly rent
data are calculated using the pooled 2005–2007 ACS samples from IPUMS (Ruggles et al. 2010). We calculate average “gross monthly rent” over households in each PUMA and aggregate to CZs with averages weighted
by population overlaps between PUMAs and CZs. The left-most column shows, for each respondent category, the mean and standard deviation (in parentheses); the remaining columns show percentiles of the residence CZ
rental price distribution. Panels 2 and 3 contain corresponding summary statistics about average hourly wages in NLSY79 respondents’ CZs. Panel 2 shows average wages for workers in “low-mobility occupations,”
those occupations for which workers are most likely to live in their birth states in the 2005–2007 ACS. Panel 3 shows average wages for workers in “high-mobility occupations,” those occupations for which workers
are least likely to live in their birth states in the 2005–2007 ACS. There are 1,167 black women and 1,853 white women in the NLSY79 sample. Asterisks indicate statistical significance of differences between cost
of living experienced by blacks and whites: *** p < 0.01, ** p < 0.05, * p < 0.1.
local costs of living.¹⁵ As shown in Table 1, Panel 2, we find that the average black woman lives in an area with higher wages among low-mobility occupations. The gap
between local wages where black and white women live is highest in CZs at the lower end of the local wage distribution (that is, the 10th and 25th percentiles of respondents’ locations).¹⁶,¹⁷
In addition to cost of living, our preferred specifications control for years of education. If, conditional on AFQT score, black women acquire more years of education
than white women, then omitting years of education would result in an upward bias on the coefficient estimate for the black indicator variable. Lang and Manove (2011) show
that black women in the NLSY79 had acquired more years of education by 2000 than white women with the same AFQT score. We confirm that this is true in 2006 as well.
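The direction of this bias follows from the standard omitted-variable formula. The sketch below uses notation that is assumed here for illustration and is not taken from the text: $\lambda$ denotes the wage return to a year of education, and $\pi$ denotes the coefficient on BLACK in an auxiliary regression of EDUC on BLACK and the other controls (AFQT, age, and their squares).

```latex
\operatorname{plim}\,\hat{\beta}_{\text{EDUC omitted}} \;=\; \beta + \lambda\,\pi ,
\qquad
\lambda > 0,\ \pi > 0
\;\Longrightarrow\;
\operatorname{plim}\,\hat{\beta}_{\text{EDUC omitted}} > \beta .
```

With a positive return to education and a positive conditional education gap in favor of black women, the regression that omits education attributes part of the education premium to the black indicator and so overstates $\beta$.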
Incorporating these methods, our preferred estimate of the black-white wage gap among women is the estimate for β in:
(2)  $\ln(\mathit{wage}_i) = \alpha + \beta\,\mathit{BLACK}_i + \gamma_1\,\mathit{AFQT}_i + \gamma_2\,\mathit{AFQT}_i^2 + \delta_1\,\mathit{age}_i + \delta_2\,\mathit{age}_i^2 + \theta\,\mathit{COL}_i + \lambda\,\mathit{EDUC}_i + \varepsilon_i$
This equation includes local cost of living (COL) and years of education (EDUC).
The model of employer discrimination in the Appendix implies that local costs of living and individual productivity traits like education are important controls to
include; otherwise, the regression is unlikely to identify wage differences due to racial discrimination. We estimate Equation 2 using OLS and median regression. Observed
log hourly wage among workers is the dependent variable for our OLS estimates. Median regression estimates also include nonworkers with imputed potential wages,
described above.
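The difference between the two estimation samples can be made concrete with a small sketch. It only assembles the dependent variable and the regressors of Equation 2 for each sample and does not fit either model; the record layout, field names, and values are hypothetical, and the sketch ignores refinements such as the treatment of temporarily unemployed women in footnote 10.

```python
import math


def design_row(r):
    """Regressors of Equation 2 for one respondent, with the constant first."""
    return [1.0, float(r["black"]), r["afqt"], r["afqt"] ** 2,
            r["age"], r["age"] ** 2, r["col"], r["educ"]]


def build_samples(records):
    """Return (ols_sample, median_sample) as lists of (log wage, regressors)."""
    ols_sample, median_sample = [], []
    for r in records:
        if r["wage"] is not None:
            # Workers with an observed wage enter both samples.
            obs = (math.log(r["wage"]), design_row(r))
            ols_sample.append(obs)
            median_sample.append(obs)
        elif r["imputed_wage"] is not None:
            # Nonworkers with an imputed potential wage enter the median sample only.
            median_sample.append((math.log(r["imputed_wage"]), design_row(r)))
        # Nonworkers without an imputed wage enter neither sample.
    return ols_sample, median_sample


# Hypothetical records: a worker, an imputed nonworker, and an unimputed nonworker.
records = [
    {"black": 1, "afqt": 0.5, "age": 45, "col": 1.05, "educ": 14,
     "wage": 20.0, "imputed_wage": None},
    {"black": 0, "afqt": -0.2, "age": 44, "col": 0.95, "educ": 12,
     "wage": None, "imputed_wage": 1.0},
    {"black": 0, "afqt": 1.0, "age": 46, "col": 1.10, "educ": 16,
     "wage": None, "imputed_wage": None},
]
ols_sample, median_sample = build_samples(records)
```

Each list could then be passed to any OLS or median (0.5-quantile) regression routine; the coefficient on the second regressor is the estimate of β.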
B. NLSY79 Data