Selection Bias and Sibling Fixed-Effects Models

five MAOA repeats, and some previous studies have included such individuals in the positive MAOA status group. To test the sensitivity of my results to this alternative classification, Column 8 of Table 4 shows results where individuals with five repeats are classified as having positive MAOA status, and they are not materially different from those using the classification scheme of the baseline specifications.

A final issue is population stratification by MAOA status. It has been established elsewhere that positive MAOA status is not evenly distributed across the population but is instead more common at the higher end of the income distribution, among the college educated, and within the white population, among other socioeconomic and demographic traits (Sabol, Hu, and Hamer 1998). To the extent that the control variables in my baseline specifications are correlated with household income, this stratification could lead to bias in the estimated interaction between income and MAOA status due to missing interactions between income and other right-hand-side variables. To address this possibility, Table A1 of the web appendix [22] shows results from models where the two MAOA groups are pooled, and both MAOA status and household income are interacted with all of the other right-hand-side variables. For purposes of comparison, baseline pooled specifications in which only income and MAOA are interacted are shown as well. The results indicate that while the addition of so many interaction terms increases the size of the standard errors, the point estimate for the income-MAOA interaction term actually increases when these highly flexible interactive models are implemented, which reduces concerns about the basic results being driven by missing interaction terms. Of course, these models do not remove bias that is due to unobserved traits which are correlated with MAOA status or family income. This issue is discussed further in the next two subsections.
To summarize, the differences in the association between family income and educational attainment across MAOA groups are robust to a variety of reasonable alternative specification, modeling, and classification choices.
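As a rough illustration of why the fully interacted specifications are so demanding, the sketch below builds the regressor set for a single pooled observation. It is a minimal sketch and not the code behind Table A1: the function name and the control names (`age`, `birth_order`) are hypothetical placeholders. With k controls, interacting both MAOA status and log income with each control yields 3 + 3k regressors, which is consistent with the larger standard errors reported for those models.

```python
def fully_interacted_row(maoa, log_income, controls):
    """Build the regressors for one pooled observation (hypothetical helper).

    Both MAOA status and log income are interacted with every control,
    in addition to the MAOA x log-income interaction of interest.
    """
    row = {
        "maoa": maoa,
        "log_income": log_income,
        "maoa_x_log_income": maoa * log_income,
    }
    for name, value in controls.items():
        row[name] = value
        row[f"maoa_x_{name}"] = maoa * value
        row[f"log_income_x_{name}"] = log_income * value
    return row


# Illustrative values only; these are not taken from the Add Health data.
example = fully_interacted_row(
    maoa=1,
    log_income=10.0,
    controls={"age": 16.0, "birth_order": 2.0},
)
print(sorted(example))
```

With two controls the row already holds nine regressors, so a specification with a long control list quickly consumes degrees of freedom.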

C. Selection Bias and Sibling Fixed-Effects Models

The models presented above found large interactions between genetic status and environmental conditions, but alone they are not sufficient for assigning those interactions causal interpretations. Because the effect of interest is an interaction, a fully causal model would need to rely on conditionally exogenous variation in both MAOA status and childhood income. Despite the relatively large set of controls included in the models above, it is difficult to maintain that the residuals of the educational attainment measures are conditionally independent of MAOA status and childhood income. With respect to MAOA status, the primary problem is that for a given student in my sample to inherit positive MAOA status, their biological mother must have at least one copy of the implicated MAOA alleles as well. Because maternal MAOA status may impact relevant but unobserved childhood conditions that affect educational attainment, this could lead to a violation of the assumption that MAOA status is conditionally orthogonal to the unexplained portion of educational attainment. With respect to childhood income, there may be factors that are related to both income and academic achievement but that are not directly controlled for in the regression models. Examples of such omitted variables may include personality traits or ability, and their omission would also lead to inconsistent estimation of the gene-environment interaction under study.

22. Web appendix tables are available at http://jhr.uwpress.org and at the author’s university homepage, http://www4.uwm.edu/letsci/economics/faculty/thompson.cfm.

If a controlled experiment were possible, causal gene-environment interactions could be established by randomly assigning students into groups that differ in terms of both MAOA status and childhood income, and then estimating models similar to those in Table 3.
While such an experiment is clearly implausible, the Add Health data does include a substantial number of sibling pairs, and this allows for the estimation of sibling fixed-effects models that come closer to the experimental ideal of random assignment, at least with respect to MAOA status. The ability of such models to identify defensibly causal genetic effects comes from an elementary principle of genetics, sometimes called the principle of random fertilization. This principle states that parental genes combine at random during fertilization so that the probability of a given zygote (fertilized egg) having a particular gene combination is directly proportional to the frequency of those genes among the biological parents (or the biological mother in the case of an X-linked gene like MAOA). Because full biological siblings definitionally share the same parents, any genetic differences between such siblings are due entirely to chance. As an example, consider a mother with a three-repeat allele at one copy of her MAOA locus and a four-repeat allele at the other copy (recall that since females have two X chromosomes, they also have two copies of the MAOA gene). Random fertilization asserts that each male biological child of this mother will have an independent 50-50 chance of inheriting a three-repeat versus a four-repeat allele as their sole copy of MAOA. [23] In the context of the current study, this means that if a particular male student has positive MAOA status, but their full biological brother does not, this difference is as good as randomly assigned and can therefore be used to estimate an arguably causal MAOA effect. [24]

While powerful, the sibling fixed-effects approach has the disadvantage of substantially reducing the sample size and precision of parameter estimation. This is because while the Add Health data reports both MAOA and household income data for 931 male respondents, only 468 of these observations are part of a male-male sibling pair.
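The independent 50-50 inheritance described in the example above can be checked with a small simulation. The sketch below is illustrative only: it assumes a heterozygous mother, non-identical brothers, and exactly equal transmission probabilities, and the function name and allele labels are hypothetical. Under those assumptions, roughly half of all brother pairs should carry different alleles.

```python
import random


def simulate_discordant_share(n_pairs, seed=0):
    """Share of brother pairs whose inherited MAOA alleles differ.

    Assumes every mother is heterozygous (one three-repeat and one
    four-repeat allele) and that brothers are not identical twins.
    """
    rng = random.Random(seed)
    mother_alleles = ("3-repeat", "4-repeat")
    discordant = 0
    for _ in range(n_pairs):
        # Each son receives one of the mother's two X-linked alleles,
        # independently and with equal probability.
        son_a = rng.choice(mother_alleles)
        son_b = rng.choice(mother_alleles)
        if son_a != son_b:
            discordant += 1
    return discordant / n_pairs


share = simulate_discordant_share(100_000)
print(f"share of discordant brother pairs: {share:.3f}")
```

The simulated share is only informative about pairs born to heterozygous mothers; brothers of a homozygous mother, and identical twins, cannot differ in MAOA status at all.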
Furthermore, these pairs cannot be used to identify causal effects unless the two siblings differ in terms of MAOA status, which further reduces the sample to 70. [25] Finally, even among male-male sibling pairs with differing MAOA status, there is no guarantee that the two siblings will have differing educational attainment; indeed, it is quite common for siblings to complete identical amounts of education.

23. While random fertilization ensures that the MAOA statuses of siblings are independent of each other, the fact that each allelic variant possessed by the mother has an equal probability of being transmitted to the child is due to the related genetic principle of segregation, which holds that paired genes separate (or segregate) in such a way that each reproductive cell is equally likely to contain either member of the pair. See Hartl and Jones (1999).

24. The use of within-family genetic variation has some precedent in the economics literature. For example, Fletcher and Lehrer (2009, 2011) term such variation “the genetic lottery” and use it as an instrument to identify the effect of mental health on academic performance.

25. Two factors cause this reduction from 468 to 70 to be so large. First is the fact that 122 of the 468 respondents were part of monozygotic (identical) twin pairs, and therefore share all of their genes, including MAOA. Second is that in order for there to be the potential for within-sibling MAOA heterogeneity, the biological mother of the male siblings must have two different MAOA alleles (a condition known as heterozygosity, in contrast to homozygosity). While this is by no means rare, it is not universal either. For example, among female Add Health respondents with valid MAOA data, only around 40 percent are heterozygous with respect to MAOA. Combining both factors, we can estimate that only (468 – 122) × 0.4 ≈ 138 observations had the potential to differ from their sibling in MAOA status. If half of these observations actually did differ from their sibling in terms of MAOA status, the identifying sample in the sibling fixed-effects models would be 69, which is very close to the number who actually do differ (70). Of these 70 observations, 22 are full siblings and the remaining 48 are DZ twins.

Figure 1
Within Sibship Differences in Educational Attainment by Household Income
Notes: Each point of the scatter plot represents a sibling pair in which one sibling has MAOA = 1 status and the other sibling has MAOA = 0 status. The vertical axis measures the difference in years of education between the MAOA = 1 sibling and the MAOA = 0 sibling, so that points where the y-coordinate is zero represent cases in which both siblings completed the same number of years of education, points with a positive y-coordinate represent cases in which the MAOA = 1 sibling completed more years of education than the MAOA = 0 sibling, and points with a negative y-coordinate represent cases in which the MAOA = 1 sibling completed fewer years of education than the MAOA = 0 sibling.
[Scatter plot. Horizontal axis: Log Household Income During Childhood (8 to 11). Vertical axis: Sibling Difference in Years (–4 to 4).]

While a larger sample of MAOA-varying male-male sibling pairs would of course be preferable, the relatively small number of such cases in the Add Health data turns out to be sufficient for generating reasonably precise estimates of the MAOA-income interaction term of interest. As a first approximation of the interactive effect, Figure 1 presents a scatter plot of childhood income against sibling-differenced years of education. Each point in the scatter plot represents one sibling pair in which the “first” sibling has positive MAOA status while the “second” sibling does not, and the vertical axis records the difference between the first and second siblings in terms of educational attainment.
Therefore, points on the zero-line of the vertical axis represent pairs in which both siblings had the same number of years of education, while points above the zero-line represent pairs in which the MAOA-positive sibling completed more years of education, and vice versa for points below the zero-line. If the effect of having positive MAOA status falls as income increases, then we would expect the locus of points in Figure 1 to be downward-sloping, and the included fit line shows that we do in fact observe a declining relationship between the effect of positive MAOA status and household income. However, the included confidence bands make it clear that this estimated relationship is not particularly precise, in large part due to the small size of the subsample used to identify it. [26]

Regression-based estimates of the sibling fixed-effects models are reported for all three educational outcomes in Table 5. Specifically, I estimate models of the following form, where the subscript $if$ denotes student $i$ from family $f$:

$$\text{Educational Attainment}_{if} = \beta_0 + \beta_1\,\text{MAOA}_{if} + \beta_2\,\text{MAOA}_{if} \times \text{Log Income}_{f} + X_{if}'\beta + \gamma_f + \varepsilon_{if}. \tag{2}$$

Here $X_{if}$ is a vector that now contains only control variables that vary within families (child age and birth order), and $\gamma_f$ is a family fixed effect. Results of these models are shown in the first, third, and fifth columns of Table 5. Because household income is invariant within families, its main effect is unidentified in the sibling fixed-effects models. But MAOA’s main effect and its interaction with household income are identified, and our primary interest lies in the interaction term. In each of the three cases, this interaction term has the expected negative sign and is nontrivial in magnitude, although the small sample sizes typically prevent the interactions from achieving statistical significance at conventional levels.
For example, the interaction term in the model estimating total years of schooling (Column 5) has a coefficient of –1.487, which indicates that the increase in total years of education associated with a one-unit increase in log household income is estimated to be 1.487 years greater for students without positive MAOA status than it is for students with positive MAOA status. The interaction terms are similarly negative and substantively large in magnitude for the models predicting college enrollment and college completion.

To better understand how selection on MAOA status impacts the estimation of the MAOA-income interactions, the second, fourth, and sixth columns of Table 5 show results from models that are estimated using the same sample as the sibling fixed-effects models but use simple cross-sectional MAOA variation to identify the interaction term (that is, they exclude the sibling fixed effect). Perhaps surprisingly, the sizes of the interactions are broadly similar in the cross-sectional and sibling fixed-effects specifications. For example, the interaction term in the cross-sectional model estimating college attendance is –0.304, as opposed to –0.247 in the sibling fixed-effects specification. Similarly, the cross-sectional and sibling fixed-effects estimates of the interaction term in the models predicting college completion are –0.165 and –0.118, respectively, while the analogous estimates in the models predicting total years of education are –1.052 and –1.487, respectively. The broad similarity of these two sets of estimates suggests that, while selection bias from nonrandom MAOA variation is important in principle, it may be relatively small in practice, at least in the context of the particular sample and environmental measures used here.

26. A visual inspection of Figure 1 naturally leads one to question whether the slope of the fit line is largely driven by the observation with a log income of approximately 8 and a differenced education value of +2 years. Figure A1 of the web appendix reproduces Figure 1 with this observation excluded, and shows that the fit line for the modified scatter plot is still clearly downward sloping.

Thompson 281

Table 5
Sibling Fixed-Effects Models

                    College Attendance                College Graduation                Years of Education
                    Sibling FE       Cross-Sectional  Sibling FE       Cross-Sectional  Sibling FE       Cross-Sectional
                    (1)              (2)              (3)              (4)              (5)              (6)
Log Income          —                0.128            —                –0.904           —                11.158
                                     (0.648)                           (0.494)                           (4.929)
MAOA                2.505            2.989            0.977            1.419            14.907           9.517
                    (1.727)          (4.016)          (1.307)          (2.110)          (3.350)          (11.954)
MAOA × Log Income   –0.247           –0.304           –0.118           –0.165           –1.487           –1.052
                    (0.171)          (0.395)          (0.133)          (0.231)          (0.347)          (1.154)
Observations        70               70               70               70               60               60

Notes: All regressions are estimated for males only. Sibling FE models include controls for birth order and child age, while cross-sectional models include controls for birth order, number of siblings, race, language spoken in the home, parent and child ages, and school dummies. Standard errors, shown in parentheses, are clustered at the family level. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent levels, respectively.

The robustness of the basic result to sibling fixed-effects specifications that use plausibly exogenous genetic variation to identify the interaction terms is important suggestive evidence that those interactions are causal in nature. However, a number of considerations make it inappropriate to assign the interaction a fully causal interpretation. One outstanding issue is whether the interactive effects are specific to family income, as opposed to socioeconomic background more generally, and this is discussed in the next subsection. Additional reasons for caution are discussed in Sections V and VI below.
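To make the identification argument concrete, the following sketch simulates two-sibling families and applies the within-family estimator implied by Equation 2. It is a simplified illustration and not the estimation code behind Table 5: the data are synthetic, the parameter values (2.0 and –0.2) are made up, the controls in X are omitted, and all function names are hypothetical. With two siblings per family and no controls, the fixed-effects estimator reduces to a regression of the within-pair difference in education on log income among MAOA-discordant pairs, the same kind of fit line shown in Figure 1: the intercept estimates the MAOA main effect and the slope estimates the MAOA-income interaction.

```python
import random


def simulate_pairs(n_families, b1, b2, seed=0):
    """Simulate sibling pairs with an unobserved family effect (synthetic data)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_families):
        log_income = rng.uniform(8.0, 11.0)
        # Unobserved family effect, deliberately correlated with income.
        family_effect = 0.5 * log_income + rng.gauss(0.0, 2.0)
        maoa = [rng.randint(0, 1), rng.randint(0, 1)]
        educ = [
            b1 * m + b2 * m * log_income + family_effect + rng.gauss(0.0, 1.0)
            for m in maoa
        ]
        pairs.append((log_income, maoa, educ))
    return pairs


def sibling_fe(pairs):
    """Within-pair estimates of the MAOA main effect and its income interaction."""
    xs, ys = [], []
    for log_income, maoa, educ in pairs:
        d_maoa = maoa[0] - maoa[1]
        if d_maoa == 0:
            continue  # concordant pairs carry no identifying variation
        # Orient the difference as (MAOA = 1 sibling) minus (MAOA = 0 sibling).
        ys.append(d_maoa * (educ[0] - educ[1]))
        xs.append(log_income)
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b2_hat = sxy / sxx             # slope: MAOA x log-income interaction
    b1_hat = y_bar - b2_hat * x_bar  # intercept: MAOA main effect
    return b1_hat, b2_hat, n


pairs = simulate_pairs(20_000, b1=2.0, b2=-0.2)
b1_hat, b2_hat, n_used = sibling_fe(pairs)
print(f"b1_hat={b1_hat:.2f}, b2_hat={b2_hat:.2f}, discordant pairs={n_used}")
```

Because the family effect and the main effect of income are common to both siblings, they cancel in the within-pair difference, which is why the income main effect is not identified in the sibling fixed-effects columns.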

D. Parental Education as an Alternative Measure of Economic Background