Empirical Framework Conceptual Framework

processes, there are still a number of reasons why it may be correlated with labor market success in the western world. First, the development of the muscular system may be related to the same genetic predispositions and susceptible to the same environmental conditions as height growth, thereby yielding a positive correlation between the two (Silventoinen et al. 2008).8 To the extent that the same genetic and environmental factors also influence a person’s labor market success, muscle strength would be an important control variable when estimating the height premium. A second possibility is that tall people participate to a larger extent in activities, such as sports, that build not only noncognitive skills, as suggested by Persico, Postlewaite, and Silverman (2004), but also muscle strength, and that these same characteristics have a labor market return (see Rooth 2011). In fact, to the extent that being tall constitutes an advantage in certain sports activities, one would expect tall people to participate to a greater extent. From this perspective, muscle strength would instead be a mediating factor in the height-earnings relationship, because it may partly be caused by being tall. Finally, it is also possible that male muscle strength signals certain personality traits. Some studies have, for instance, found that handgrip strength predicts not only physical fitness but also aggressiveness and dominance (Gallup, White, and Gallup 2007). These traits can be recognized as dimensions of noncognitive skills, which would mean that muscle strength would partly pick up the same factors as direct measures of noncognitive skills. Even though there are good reasons to believe that muscle strength may play a role in the height-earnings relationship, our discussion also shows there is great uncertainty about the mechanisms at work.
In our empirical analysis, we will therefore be somewhat agnostic about whether muscle strength mainly plays the role of a control variable or a mediating variable and leave that for future research to investigate.

D. Empirical Framework

Our baseline empirical model relates adult earnings to height at age 18 for the full population and controls only for age:

(3)  $y_i = \alpha + \beta h_i + \delta a_i + \varepsilon_i$

where $i$ indexes individuals, $h$ is individual height in centimeters, and $a$ is age. In this model, $\beta$ will pick up both the causal return to height, which runs through preferential treatment of tall people or through mediating factors, and the influence of any omitted control variables that are correlated with height and earnings. We will then alter this baseline model by including different control variables and mediating variables in the regression. In particular, the previous section introduced three sets of variables that are important for understanding the relationship between height and earnings: cognitive skills, noncognitive skills, and muscular strength. We will also introduce variables indicating family background, that is, parental education and parental earnings, which will be thought of as control variables, as they may relate to both an individual’s height and earnings. By entering the explanatory variables one by one, starting with the controls, and then together, we can analyze to what extent the relationship between height and earnings in Equation 3 is affected when accounting for control variables and mediating factors. Our full model can thus be written:

(4)  $y_i = \alpha + \beta h_i + X_i \gamma + \varepsilon_i$

where $X$ is now the vector of variables measuring age, parental background, cognitive skills, noncognitive skills, and muscle strength, and $\gamma$ is the associated vector of regression coefficients.

8. Related to this are findings indicating an important role of early life conditions for the development of both muscle strength and height. For the former outcome, studies have shown a positive association between birthweight and adult muscle strength (for example, Gale et al. 2001).
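The step from Equation 3 to Equation 4 can be illustrated with a minimal sketch on simulated data. Nothing below comes from the data or estimates in this paper: the sample size, the variable `cognitive`, the assumed direct return `true_beta`, and all other coefficients are hypothetical values chosen only to show how an omitted skill measure that is correlated with height inflates the height coefficient, and how adding it to $X$ reduces that coefficient.

```python
import numpy as np


def ols(y, *regressors):
    """Least-squares coefficients of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Simulated data only; every number below is an illustrative assumption.
rng = np.random.default_rng(0)
n = 50_000
age = rng.integers(28, 39, size=n).astype(float)
cognitive = rng.normal(size=n)  # standardized skill score (hypothetical)
height = 180 + 2.0 * cognitive + rng.normal(scale=6.0, size=n)  # centimeters

true_beta = 0.002  # assumed direct return to one centimeter of height
log_earnings = (10 + true_beta * height + 0.10 * cognitive
                + 0.02 * age + rng.normal(scale=0.4, size=n))

# Equation 3: height and age only, so beta also absorbs the omitted skill variable.
beta_baseline = ols(log_earnings, height, age)[1]

# Equation 4: the skill measure enters the vector X alongside age.
beta_full = ols(log_earnings, height, age, cognitive)[1]

print(f"baseline beta:   {beta_baseline:.4f}")
print(f"full-model beta: {beta_full:.4f}")
```

In this simulated setting the baseline coefficient exceeds the assumed direct return, while the full-model coefficient lies close to it. The sketch cannot say whether an added variable is a control or a mediator; that distinction rests on the assumed causal structure, not on the regression output.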
In Equation 4, the estimate of $\beta$ can be interpreted as the return to height that remains after accounting for the effect of height that runs through noncognitive skills and after controlling for factors associated with, but not caused by, height (that is, cognitive skills and parental background).9 This remaining height premium can then be interpreted as the part of the height premium that runs through preferential treatment of tall people on the labor market, given that all mediating factors and confounding factors have been accounted for.

Note that in our main empirical analysis, we only include variables that are determined before labor market entry. As argued by Case and Paxson (2008a), controlling for variables such as occupation and postsecondary education would hide part of the height premium if taller individuals sort themselves into certain types of education or jobs. However, at the end of the analysis found in Tables 4.1 we will examine the extent to which the remaining height premium works through sorting into occupations and education.

Our second main specification exploits our data on siblings and can be written as:

(5)  $y_{ij} = \alpha + \beta h_{ij} + X_{ij} \gamma + \mu_j + \varepsilon_{ij}$

where $ij$ indexes individual $i$ in family $j$ and $\mu_j$ represents a family fixed effect. The fixed effect captures family characteristics common to all siblings within the same family. Here, identification of the height coefficient relies upon sibling variation in height at age 18. In this specification, our estimate of $\beta$ will not be biased by any confounding influence from unobserved family-level factors that are also associated with earnings.

The availability of large-scale sibling data constitutes an important advantage in our analyses. First, height is known to be highly heritable, with genes explaining up to 80 percent of its variation (Visscher, Hill, and Wray 2008).10 If some of the genes that affect height also affect earnings through separate channels, the height coefficient will be biased, because genes are not observed. However, since biological siblings share on average 50 percent of their genes, our fixed effects approach “washes out” some of the genetic influence, which may reduce the bias in the height coefficient. A similar reasoning applies to the coefficients for cognitive skills, noncognitive skills, and muscle strength. Even though the sibling fixed effects approach does not fully cancel out the influence of genetics, the sibling data give us another advantage: Mendelian randomization, or the “genetic lottery,” implies that within a family, it is random which child inherits a particular gene.11 Thus, if the variation in height across siblings that remains after accounting for sibling fixed effects is mainly genetically determined, we can be confident that height variation across siblings is exogenous.12

9. This also assumes that all relevant control variables have been included in the regression.

10. It should be noted, however, that the genetic loci identified by the largest study to date only explain about 10 percent of the phenotypic variation in height (Allen et al. 2010).

Second, the sibling approach allows us to difference out confounding environmental factors operating at the family level. Sibling teenagers are likely to share important unobserved factors such as food and nutrition supply in the home, parental practices, and preferences. They are also likely to attend the same school and thus face the same school environment and neighborhood characteristics.13 However, parents may also reinforce or compensate for differences in endowments between siblings by directing different amounts of skill-building resources to each child.
Empirical evidence from Western countries seems to mainly support the hypothesis that parents try to compensate for ability differences between children. See Almond and Currie (2011) for a recent literature overview. To the extent that skills and strength are associated with labor market success as well as height, such compensatory practices would dampen the association between height and earnings.
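The logic of the sibling specification in Equation 5 can be sketched with the within-family transformation: subtracting family means from every variable removes $\mu_j$, so that only sibling differences identify $\beta$. The example uses simulated sibling pairs, not the data in this paper. The number of families, the assumed return `true_beta`, and the strength of the link between the family effect and height are hypothetical, and the helper names `slope` and `demean_within` are introduced here for illustration only.

```python
import numpy as np


def slope(y, x):
    """OLS slope of y on x with an intercept (single regressor)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def demean_within(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return values - (sums / counts)[groups]


# Simulated sibling pairs; all parameter values are illustrative assumptions.
rng = np.random.default_rng(1)
n_families = 100_000
family = np.repeat(np.arange(n_families), 2)  # two siblings per family
mu = rng.normal(size=n_families)              # unobserved family effect

height = 180 + 3.0 * mu[family] + rng.normal(scale=5.0, size=family.size)
true_beta = 0.002  # assumed return to one centimeter of height
log_earnings = (10 + true_beta * height + 0.05 * mu[family]
                + rng.normal(scale=0.3, size=family.size))

# Pooled OLS ignores mu_j, so the height coefficient absorbs part of it.
beta_pooled = slope(log_earnings, height)

# Equation 5: demeaning within families removes mu_j, which is constant per family.
beta_within = slope(demean_within(log_earnings, family),
                    demean_within(height, family))

print(f"pooled beta: {beta_pooled:.4f}")
print(f"within beta: {beta_within:.4f}")
```

The within estimator removes only what siblings share. Any sibling-specific factor that is correlated with both height and earnings, such as the parental compensation discussed above, stays in the error term and is not addressed by this transformation.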

III. Data and Descriptive Statistics