Tests Related to Inference

This section conducts tests to help rule out the possibility that the statistical significance observed in the baseline regression is merely an artifact of underestimated standard errors. First, following Bertrand, Duflo, and Mullainathan’s (2004)

20 Specifically, early before consists of months 1 to 31 of the pretreatment period, while late before consists of months 32 to 63. Early after consists of the first 21 months of the after period and late after the last 21.

Does Universal Coverage Improve Health? / 53

Table 5. Testing for differential pretreatment trends and delayed effects.

Dependent variable: overall health

[Table body not recovered; surviving labels: One-year splits; Split before and after periods; MA × Late Before; MA × During; MA × Early After; MA × Late After. Surviving value: 1,979,383.]

Notes: Coefficient estimates are shown; treatment effects are available upon request. Standard errors, heteroskedasticity robust and clustered by state, are in parentheses. ***Statistically significant at the 0.1 percent level. **Statistically significant at the 1 percent level. *Statistically significant at the 5 percent level. All regressions include the individual-level control variables, state fixed effects, and fixed effects for each month in each year. The full control group is used. Observations are weighted using the BRFSS sampling weights.

suggestion, we compress all the available data into a state-level panel with three time periods (before, during, and after) and regress state average health index on MA × During, MA × After, and state and time period fixed effects. Next, we compress the data into only two cross-sectional units (Massachusetts and other states) and 10 years, defining 2006 and 2007 as the during period and 2008 to 2010 as the after period. We then regress average health index on MA × During, MA × After, a Massachusetts dummy, and year fixed effects. 21 MA × After remains statistically significant in both regressions despite the small sample, and the effect sizes in standard deviations (of the individual-level health index) are 0.039 and 0.053, which are similar to those from Table 4. More detailed results are available in Table A4 of the online Appendix. 22

21 The small sample sizes preclude the inclusion of the full set of control variables. In lieu of these, we include in both regressions a single control variable that summarizes the influence of all the controls at once. This variable, which we call “predicted health status,” is computed by regressing the health index
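As a rough illustration of the two-unit regression described above, the following Python sketch computes the MA × During and MA × After coefficients directly from yearly averages. It relies on an algebraic fact: in a panel with exactly two units, a unit dummy, and year fixed effects, the least squares coefficients on the two interactions equal the change in the mean Massachusetts-minus-other-states gap relative to the before period. The sketch omits the predicted health status control from footnote 21 and any weighting, so it is not the authors' exact specification. The function name, the treatment of 2001 to 2005 as the before period, and all numbers are assumptions made for illustration.

```python
from statistics import mean


def collapsed_did(ma_by_year, other_by_year, during_years, after_years):
    """Difference-in-differences on a two-unit yearly panel.

    ma_by_year and other_by_year map year -> average health index.
    Years in neither during_years nor after_years form the before period.
    Returns (MA x During, MA x After), each relative to the before period.
    """
    # Yearly gap between Massachusetts and the other states.
    gap = {year: ma_by_year[year] - other_by_year[year] for year in ma_by_year}
    before = [g for year, g in gap.items()
              if year not in during_years and year not in after_years]
    during = [gap[year] for year in during_years]
    after = [gap[year] for year in after_years]
    baseline = mean(before)
    return mean(during) - baseline, mean(after) - baseline


# Hypothetical yearly averages of the health index (not the paper's data).
# Assumption: the 10 years are 2001 to 2010, so 2001 to 2005 is "before".
other_states = {year: 0.00 for year in range(2001, 2011)}
massachusetts = {year: 0.10 for year in range(2001, 2006)}
massachusetts.update({2006: 0.12, 2007: 0.12, 2008: 0.15, 2009: 0.15, 2010: 0.15})

ma_during, ma_after = collapsed_did(
    massachusetts, other_states,
    during_years={2006, 2007}, after_years={2008, 2009, 2010},
)
print(round(ma_during, 3), round(ma_after, 3))
```

With only 20 observations, inference in such a regression rests on very few degrees of freedom, which is why the text treats the surviving significance of MA × After as a conservative check.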

In the spirit of Abadie, Diamond, and Hainmueller (2010), we also consider a different approach to inference and ask how likely it would be to estimate similarly large health improvements simply by picking any state at random. We reestimate the baseline ordered probit regression with each of the 45 control states as the treated unit and take the P-value for Massachusetts to be the proportion of these states that exhibited as large a health improvement as that estimated for Massachusetts (i.e., the probability that a health improvement as large as the one seen in Massachusetts would be observed in a random draw). Since Abadie, Diamond, and Hainmueller (2010) specifically recommended this method for use with the synthetic control approach, we also repeat the analysis using a synthetic control group for each state rather than the full control group. In the full sample regressions, four states (Florida, New Jersey, New York, and Tennessee) had larger health improvements in the after period than Massachusetts, for a P-value of 0.089 (4 of 45). Using synthetic controls, only two states (Florida and Tennessee) had larger gains than Massachusetts, for a P-value of 0.044 (2 of 45). 23
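The placebo P-value described above is simply a proportion, which the following Python sketch makes explicit. It assumes the placebo treatment effects have already been estimated, one per control state; the function name and the effect values are hypothetical and chosen only so that four of 45 placebo effects exceed the treated one, mirroring the full sample count reported in the text.

```python
def placebo_p_value(treated_effect, placebo_effects):
    """Share of placebo units whose estimated effect is at least as large
    as the effect estimated for the actual treated unit."""
    if not placebo_effects:
        raise ValueError("need at least one placebo estimate")
    n_as_large = sum(1 for effect in placebo_effects if effect >= treated_effect)
    return n_as_large / len(placebo_effects)


# Hypothetical estimates (not the paper's): one effect per control state.
treated = 0.05
placebos = [0.06, 0.07, 0.08, 0.09] + [-0.02 + 0.001 * i for i in range(41)]

p_value = placebo_p_value(treated, placebos)
print(len(placebos), round(p_value, 3))
```

Because the P-value can only take values that are multiples of 1/45, its resolution is limited by the number of control states available as placebos.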
