Estimation strategy: measuring the impact of Medicaid using firm data

tions, they find that among children who began the period uninsured, those in the target group were slightly (and insignificantly) more likely to have private insurance 28 months later, suggesting that expanded Medicaid eligibility did not impede transitions from uninsurance to private coverage. Finally, they find that among children who were initially uninsured, those in the target group were 7.9 percentage points more likely to gain Medicaid coverage, and 8.6 percentage points less likely to remain uninsured, than those in the comparison group.

While the data sources and analytical techniques are quite different, the studies using longitudinal data share an important shortcoming with the analysis we present below. Compared to a repeated cross-section of CPS data, the NLSY, the SIPP and our employer survey data provide much smaller samples. For example, the main analysis of children by Cutler and Gruber (1996) is based on a sample that is over 100 times as large as the NLSY sample used by Yazici and Kaestner (1998): 266,421 observations vs. 2244 observations. Blumberg et al. (1999) use samples ranging from 902 to 2587 observations; as discussed in the next section, the largest sample available from our employer survey data is 3062 observations. Thus, in addition to any differences in the empirical specifications used, there are important differences across existing studies in terms of the precision with which policy effects can be estimated. As a result, in many cases what appear to be widely divergent results are not statistically distinguishable from one another, because the confidence intervals are large.

3. Estimation strategy: measuring the impact of Medicaid using firm data

In this paper we use firm-level data to investigate how the availability of public insurance coverage for low income workers affects the health insurance decisions of employers and the workers themselves. The basic econometric model that we estimate can be written as

\[ Y_f = \alpha M^*_f + X_f'\beta + \gamma_m \mathrm{MARKET}_f + \gamma_y \mathrm{YEAR} + \varepsilon_f \qquad (1) \]

where subscript f is for firms and Y is one of several insurance-related outcomes: the decision by the firm to offer coverage at all, several firm decisions regarding plan generosity, and the take-up decision of workers who are offered employer-sponsored coverage. The regressor measuring the availability of public insurance coverage is \(M^*_f\), the fraction of the firm's employees who are eligible for Medicaid (or who have Medicaid-eligible dependents). Additional explanatory variables in the model include firm characteristics (X), variables capturing conditions in local labor and health care markets (MARKET) and a set of year dummies (YEAR) to account for secular trends in the various insurance-related outcomes.

The Medicaid eligibility status of a firm's employees is not directly observed in any employer survey data, but as a starting point for our analysis, it is useful to consider issues in the estimation of Eq. (1) were such a variable available. The most important issue would be the likely endogeneity of \(M^*\). To the extent that workers do sort among employers according to their demand for health insurance and other benefits (as predicted by the local public good model), individuals with Medicaid coverage may seek out higher wage jobs at non-insuring firms, raising a problem of reverse causality. More generally, a spurious correlation between \(M^*\) and Y may result from unobserved firm characteristics correlated with both the Medicaid eligibility of a firm's workers and its policy on health benefits.
Since Medicaid rules vary significantly across states, an additional source of potential bias comes from a possible correlation between the percentage of persons eligible for Medicaid in a state and the condition of the state's economy. Of course these issues are not unique to firm level data; in previous work on crowd-out, researchers have attempted to account for the endogeneity of Medicaid eligibility by using instrumental variables and controlling for state fixed effects (Cutler and Gruber, 1996; Shore-Sheppard, 1997), by using a quasi-experimental design (Dubay and Kenney, 1997), or by using panel data (Blumberg et al., 1999; Yazici and Kaestner, 1998).

As \(M^*\) is not observed in our data, we proceed by using a two-sample estimation technique that combines data on Medicaid eligibility from the March CPS with the firm-level data. This technique is closely analogous to conventional two-stage least squares, and has the advantage of addressing two problems simultaneously: the lack of information on eligibility in the firm data set and the potential endogeneity of Medicaid eligibility as discussed above.⁴

Note that since the variable we are interested in is Medicaid eligibility, rather than actual Medicaid coverage, there are actually two steps within our "first stage". We begin by imputing eligibility for everyone in the CPS sample according to the rules applicable to each state in each year.⁵ Briefly, a child (defined as someone who is not a family head and is under age 19, or between 19 and 23 and a full-time student) is imputed to be eligible if his or her age, family income, and family structure meet state AFDC standards, state optional standards such as the Ribicoff or Medically Needy programs, or federally mandated or state optional Medicaid expansion criteria.
A woman is considered to be eligible if she is a single parent and qualifies for AFDC, or under the expansions if she meets the federally mandated or state optional income criteria and she is of child-bearing age (between the ages of 15 and 45). A man is considered to be eligible if he is a single parent and he qualifies for AFDC. In practice very few men are imputed to be eligible, though men may have eligible family members.

⁴ Two-sample estimation methods have been used in previous work by Angrist and Krueger (1992) and Card and McCall (1995; 1996), among others.

⁵ One implication of using imputed eligibility rather than actual Medicaid coverage is that our results are not affected by ambiguities in how respondents interpret CPS questions on insurance or by changes over time in the framing of those questions.

As noted, one potential source of bias in estimating Eq. (1) comes from the fact that the number of persons eligible for Medicaid in a state varies over the business cycle. Therefore, the coefficients from our first-stage regressions should reflect differences in program generosity across states and over time within a state, but should not pick up macroeconomic shocks potentially affecting both eligibility and coverage. To ensure this, we estimate our first stage regression on three pooled CPS samples from the beginning of the period, imputing eligibility to this constant population according to the rules in effect in each year of the firm data.⁶ This method is similar to that used by Shore-Sheppard (1997) in previous work on Medicaid.⁷

Imputed eligibility, M, is then the dependent variable in the following regression using individual data from the CPS:

\[ \mathrm{Prob}(M_i = 1 \mid Z_i) = Z_i'\theta, \qquad (2) \]

where Z is a vector of variables (some of which may be included as controls in Eq. (1)) that are correlated with eligibility.
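The first-stage logic described above can be illustrated with a minimal Python sketch. It is not the authors' code: the data are simulated, the eligibility rule is reduced to a single income cutoff, and the cutoff values, the function names `impute_eligibility` and `fit_lpm`, and the one-regressor design are all assumptions made for illustration. The point it shows is that when the same fixed sample is re-scored under each year's rules, any change in the estimated coefficients reflects the rules and not changes in the population.

```python
import numpy as np

def impute_eligibility(income, cutoff):
    """Hypothetical rule: eligible when income is below the cutoff.
    Real rules also depend on age, family structure, state and year."""
    return (income < cutoff).astype(float)

def fit_lpm(Z, m):
    """OLS estimate of theta in the linear probability model Prob(M=1|Z) = Z'theta."""
    theta, *_ = np.linalg.lstsq(Z, m, rcond=None)
    return theta

# Simulated "constant population": one fixed sample of workers.
rng = np.random.default_rng(0)
n = 5000
low_wage = rng.integers(0, 2, size=n)  # 1 if the worker earns under US$10,000
income = np.where(
    low_wage == 1,
    rng.uniform(2_000, 10_000, n),
    rng.uniform(10_000, 60_000, n),
)
Z = np.column_stack([np.ones(n), low_wage])  # intercept and low-wage dummy

# Illustrative cutoffs only; they are NOT taken from the paper or from actual rules.
cutoffs = {1989: 6_000, 1995: 8_000}

theta_by_year = {}
for year, cutoff in cutoffs.items():
    m = impute_eligibility(income, cutoff)   # same people, that year's rules
    theta_by_year[year] = fit_lpm(Z, m)      # separate regression for each year
```

Because the sample is held fixed, the coefficient on the low-wage dummy rises between the two years only because the assumed cutoff is more generous in the later year.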
The estimated coefficients from this regression are then applied to the firm-level analogs to these variables to construct fitted values, \(\hat{M}_f = Z_f'\hat{\theta}\), that are used in place of \(M^*_f\) in (1). Card and McCall (1995) show that this type of two-sample procedure yields consistent estimates.⁸

One requirement for consistency of the two-sample procedure is that the two samples must be drawn from the same universe. Hence, our CPS sample consists of workers in firms with fewer than 100 employees, the population which corresponds to employees of the firms in the firm sample. The independent variables in Eq. (2) are ones that are available in both the CPS and the firm level data: year dummies (which are fully interacted with the other variables), state dummies, 8 industry dummies, 2 firm size dummies (and interactions between firm size and industry), and a dummy for whether the worker earned less than US$10,000 per year from her main job, which is interacted with state, industry, and firm size. The last variable is in the firm data (described below) as the percentage of a firm's employees who are paid less than US$10,000 per year.

⁶ Three surveys were used to ensure there were enough observations in all state/year/firm size/wage category cells.

⁷ We also explored replicating a national sample of data from a single CPS, the method used by Cutler and Gruber (1996). Estimates using this method were less precise, perhaps because state-specific identifying information could not be used.

⁸ One difference they note between this approach and a conventional two-stage least squares estimator is that since Z is limited to variables appearing in both data sets, it does not include all the controls from Eq. (1).

For computational feasibility, we estimate separate eligibility regressions for each year. Even then, interactions among the other variables produce 125 coefficients for each of the ten regressions (5 years times two outcomes). For reasons of space, we cannot report all of these coefficients. Because of the large number of interactions, reporting selected coefficients is also problematic: the "main effects" of industry, firm size, and income are different for each state and each year, as are interactions among these variables. Therefore, we present our first-stage results in the following way: we report F-statistics for the various groups of variables and their interactions in Table 1A, and selected mean predicted values in Table 1B. Not all of the state or industry fixed effects and the interactions are significantly different from zero, though F-statistics show that for each year each group of variables or interactions is statistically significant at conventional levels. The R² statistics, which are 0.08 on average for the individual eligibility equations and 0.09 for the family eligibility equations, are reasonable considering that the first stage equations are estimated as linear probability models.⁹

Our first stage results are summarized in a more meaningful way in Table 1B, which reports mean predicted values by year and earnings category for the two measures of Medicaid eligibility. The fitted values capture three important sources of variation in Medicaid eligibility (two of which are evident in the table). First, since Medicaid is a joint state–federal program, in any year there are differences across states in the fraction of workers eligible for Medicaid. Second, as shown in the table, there is variation over time coming from the fact that federal and state legislation caused Medicaid eligibility to increase over the time period covered by our data (1989 to 1995). However, even in the later years the percentage of workers eligible themselves is quite low: 13.77 in 1995 as compared to 11.15 in 1989.¹⁰ The family-based measure of Medicaid eligibility is higher in all years and has a slightly larger percentage point increase over the period.
Third, within any state in any year, \(M^*\) will vary across firms according to the degree to which they rely on low wage workers. This is seen by a comparison of the second and third columns of Table 1B.

While the effect of the Medicaid expansions on employer decisions obviously depends on how eligible workers are distributed across firms (something that we cannot observe in the CPS, and can only partially observe in our employer data), the low percentage of eligible workers overall gives a reason to suspect that any such effect was small. This is because a firm in which a significant majority of workers are ineligible for Medicaid is unlikely to drop or otherwise alter health benefits in response to the expansions, and low wage firms with a high fraction of workers who gained Medicaid eligibility via the expansions have always been less likely to offer insurance.

⁹ Morrison (1972) shows that with a binary dependent variable, the R² is bounded below 1. We investigated using a logit for the first-stage equation; however, the results were extremely similar to those of the linear probability model, and the logit complicated the estimation of the correct standard errors in the second stage.

¹⁰ Note that actual eligibility increased by more than the eligibility of this constant population due to the effects of the recession. See Shore-Sheppard (1997) for a discussion of the effects on eligibility of changing population characteristics.

Table 1
(A) Fit statistics for first-stage regressions predicting Medicaid eligibility

Independent variables        Degrees of   F-statistics
                             freedom      1989     1990     1991     1993     1995

Dependent variable: own eligibility
State dummies                    50       3.68     2.99     2.70     7.04     4.30
State × 1(< US$10,000)           50       9.06     5.53     5.40     8.27     5.74
Industry dummies                  7       6.05     6.04     5.71     4.28     3.97
Industry × 1(< US$10,000)         7      11.53    17.22    16.94    20.82    19.12
Firm size × industry              7       7.90    10.06     9.80    10.71     9.63
Overall model                   124      60.19    62.20    62.47    73.41    67.4
Adjusted R²                              0.081    0.083    0.085    0.097    0.090

Dependent variable: family eligibility
State dummies                    50      11.26     8.51     7.89    18.48    11.51
State × 1(< US$10,000)           50       6.64     3.68     3.66     4.02     3.94
Industry dummies                  7       9.00    12.15    12.31    14.35    12.86
Industry × 1(< US$10,000)         7      10.67    12.85    12.26    14.17    12.14
Firm size × industry              7       7.35     7.93     8.01     8.55     7.63
Overall model                   124      69.11    66.99    66.78    74.44    68.42
Adjusted R²                              0.092    0.089    0.089    0.098    0.091

Regressions estimated using workers in small firms from 3 years of the March CPS as described in the text. The number of observations for all regressions is 83,566. All F-statistics are significant at the 1% level or better.

(B) First-stage results: trends in Medicaid eligibility among workers, by income category

Year     All workers    Workers earning         Workers earning
                        < US$10,000/year        ≥ US$10,000/year

Percentage eligible for Medicaid
1989        11.15            19.31                    3.85
1990        12.22            21.28                    4.12
1991        12.24            21.64                    4.23
1993        13.85            23.63                    5.11
1995        13.77            23.58                    5.01

Percentage in families with eligible members
1989        16.81            26.72                    7.96
1990        18.66            29.59                    8.90
1991        19.07            30.11                    9.20
1993        21.24            32.47                   11.22
1995        21.22            32.57                   11.08

Entries are for workers in firms with fewer than 100 employees in the 1988–1990 March CPS. Predicted eligibility comes from regressions on year, state, industry, firm size, an income < US$10,000 dummy, and various interactions as described in the text.

A final econometric issue with our two-sample technique pertains to the standard errors. In estimating the standard errors we treat the unobservability of \(M^*_f\) as a missing data problem, and use a multiple imputations approach. A single imputation is done by replacing \(\hat{M}\) with \(M'\), where \(M'\) is a draw from the normal distribution \(N(\hat{M}, \sigma_{\hat{M}})\). A consistent estimate of the covariance matrix is \(T_J\), where

\[ T_J = \tilde{V} + B_J. \qquad (3) \]

In this equation, \(\tilde{V}\) is the original (uncorrected) estimate of the covariance matrix and \(B_J\) is

\[ B_J = \frac{1}{J-1} \sum_{j=1}^{J} (b_j - \bar{b})(b_j - \bar{b})' \qquad (4) \]

where \(b_j\) is the coefficient vector from one of our J (J = 100) regressions using \(M'\), a simulated value from the above distribution, and \(\bar{b} = J^{-1} \sum_j b_j\).¹¹
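The multiple-imputation correction in Eqs. (3) and (4) can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the authors' implementation: the firm data are simulated, the second stage is plain OLS, the standard deviation of the fitted values (`sd_m`) is taken as given, and the function names `ols` and `mi_covariance` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and the conventional covariance matrix."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return b, sigma2 * XtX_inv

def mi_covariance(y, m_hat, sd_m, controls, J=100, seed=0):
    """Compute T_J = V + B_J as in Eqs. (3)-(4).
    V is the uncorrected covariance from the regression on m_hat;
    B_J is the covariance of the coefficients across J imputations."""
    rng = np.random.default_rng(seed)
    _, V = ols(np.column_stack([m_hat, controls]), y)
    coefs = []
    for _ in range(J):
        m_sim = rng.normal(m_hat, sd_m)  # one imputation M' drawn around M-hat
        b_j, _ = ols(np.column_stack([m_sim, controls]), y)
        coefs.append(b_j)
    coefs = np.array(coefs)
    dev = coefs - coefs.mean(axis=0)     # b_j minus b-bar
    B = dev.T @ dev / (J - 1)            # Eq. (4)
    return V + B, V, B                   # Eq. (3)

# Simulated firm-level data (assumed values, for illustration only).
rng = np.random.default_rng(1)
n = 400
m_hat = rng.uniform(0.05, 0.40, n)       # fitted eligibility fractions
x = rng.normal(size=n)                   # one firm-level control
controls = np.column_stack([np.ones(n), x])
y = 0.8 - 0.5 * m_hat + 0.1 * x + rng.normal(scale=0.2, size=n)

T, V, B = mi_covariance(y, m_hat, sd_m=0.03, controls=controls)
se_corrected = np.sqrt(np.diag(T))       # standard errors after the correction
```

Since B_J is a sample covariance matrix, its diagonal is non-negative, so the corrected standard errors are never smaller than the uncorrected ones.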

4. The firm-level data