
4. Data and empirical specification

I follow a number of earlier self-employment transition analyses in developing an empirical strategy to estimate these effects.[3] Specifically, I estimate equations of the following type:

$$D_{i,t+1} = \beta' X_{i,t} + \gamma T_{i,t+1} + \mu_i + \nu_{i,t+1}, \tag{1}$$

where $D_{i,t+1}$ is a dummy variable that equals one if individual $i$ moves from wage-and-salary at time $t$ to self-employment at time $t+1$, and zero if he remains wage-and-salary in both periods. The $X_{i,t}$ vector includes a constant term and a set of time $t$ exogenous variables, and the $T_{i,t+1}$ term represents the individual-specific difference in wage-and-salary and self-employment tax rates at time $t+1$. The error term in this equation includes an individual-specific time-invariant random effect ($\mu_i$) to capture unobserved individual heterogeneity, and an independently and identically distributed residual component ($\nu_{i,t+1}$) with zero mean and finite variance. A convenient empirical specification for Eq. (1) is a random effects probit.

It should be noted that the use of tax rates as opposed to tax levels, or liabilities, is somewhat arbitrary. The two are certainly correlated, but equal tax rates (where the difference in the rates equals zero) will have the same effect in the multivariate analysis regardless of the levels of tax liabilities. At the heart of this situation is the issue of tax evasion (or avoidance). Joulfaian and Rider (1998) have found a positive relationship between marginal tax rates and tax evasion among the self-employed, and Blumenthal et al. (1998) have shown that Schedule C filers (for self-employment income) are significantly more likely to evade taxes. Consequently, it would be preferable (but highly difficult) to control for this type of behavior. By assuming full compliance on the part of the self-employed for the purposes of this study, I am measuring the benefit from noncompliance by the tax rate rather than the actual dollar value, or tax liability.
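To make the structure of Eq. (1) concrete, the sketch below computes one individual's likelihood contribution under a random effects probit. It is an illustration under stated assumptions, not the estimation routine used in this study: the residual $\nu_{i,t+1}$ is taken to be standard normal (the usual probit normalization), the random effect $\mu_i$ is taken to be normal with standard deviation `sigma_mu`, and the function name, its arguments, and the simple midpoint integration over $\mu_i$ are all hypothetical choices.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumption: nu is standard normal (probit normalization)


def person_likelihood(indices, outcomes, sigma_mu, grid_points=2001, width=8.0):
    """Likelihood of one person's transition history under a random effects probit.

    indices  : values of beta'X + gamma*T for each person-year (hypothetical inputs)
    outcomes : matching 0/1 transition indicators D
    sigma_mu : standard deviation of the random effect mu (assumed normal, > 0)
    """
    # Midpoint rule over mu on [-width * sigma_mu, +width * sigma_mu].
    step = 2.0 * width * sigma_mu / grid_points
    total = 0.0
    for k in range(grid_points):
        mu = -width * sigma_mu + (k + 0.5) * step
        density = exp(-0.5 * (mu / sigma_mu) ** 2) / (sigma_mu * sqrt(2.0 * pi))
        # Conditional on mu, person-years are independent probit outcomes.
        prob = 1.0
        for index, d in zip(indices, outcomes):
            p_one = STD_NORMAL.cdf(index + mu)
            prob *= p_one if d == 1 else 1.0 - p_one
        total += prob * density * step
    return total


# One person observed for three years who enters self-employment in the last one.
example = person_likelihood([-1.5, -1.5, -1.0], [0, 0, 1], sigma_mu=0.8)
```

With a single person-year the integral collapses to the standard normal CDF evaluated at the index divided by sqrt(1 + sigma_mu^2), which gives a convenient check on the numerical integration.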
As it is expected that rates and levels are highly correlated (and that either, when implemented as differences from wage-and-salary rates or liabilities, would capture the payoff to evasion and avoidance), I use rates for purposes of comparison to all but the three earliest empirical studies in this literature.[4]

[3] Transition analysis offers the important advantage over cross-sectional analysis of being able to date the explanatory variables in the pre-transition period, thus avoiding problems of potential endogeneity. Further, previous research has found that empirical results are very similar in both types of studies (see, for example, Meyer, 1990).

[4] Including a tax level effect in the probits would be inappropriate due to the high degree of correlation between tax rates and levels. However, rate differences could be replaced with differences in liabilities within the current framework. It is anticipated that empirical results would be similar in both cases.

Of course, I only observe one actual tax rate, either in wage-and-salary or in self-employment, for each individual in each year, depending on which sector is actually chosen. Consequently, I estimate each individual’s labor earnings in the alternative sector for each time period. Earnings regressions of the following form are estimated for each sector in each year:

$$Y^{WS}_{i,t} = \alpha Z_{i,t} + \varepsilon_{i,t}, \tag{2}$$

$$Y^{SE}_{i,t} = \phi Z_{i,t} + \varphi_{i,t}, \tag{3}$$

where $Y$ is gross labor income, $i$ indexes individuals, and $t$ indexes time. $Z_{i,t}$ consists of measures for age and educational attainment, and the error terms ($\varepsilon_{i,t}$ and $\varphi_{i,t}$) are normally distributed with zero mean and finite variance.[5] The coefficients from these regressions are then used to predict wage-and-salary earnings for self-employed individuals, and vice versa (Tables 1 and 2). I use these predicted income figures together with the National Bureau of Economic Research (NBER) TAXSIM model to calculate predicted alternative-sector tax rates, and their difference from the actual tax rates, in each year. I also calculate payroll tax liability and include it in all tax rate calculations.[6]

[5] The earnings equations include age in quadratic form and a series of dummies for educational attainment that are identical to those used in later stages of this analysis. Results from these regressions can be found in Table 1.

Table 1. Earnings regressions results. Panel A: wage-and-salary

| Variable | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 |
|---|---|---|---|---|---|---|
| Age | 889.27 | 1592.01 | 1290.02 | 1705.97 | 1524.42 | 2394.51 |
| Age squared | -6.91 | -15.40 | -11.41 | -16.60 | -13.20 | -24.45 |
| Dropout | -3559.66 | -5020.86 | -5125.29 | -4122.11 | -4749.21 | -5615.76 |
| Some college | 3226.96 | 3911.67 | 3784.66 | 2648.05 | 2958.97 | 3560.44 |
| College graduate | 7973.10 | 8916.98 | 10,706.29 | 8937.25 | 11,595.03 | 10,387.91 |
| Post-college | 8856.38 | 9150.73 | 12,946.27 | 11,825.43 | 12,756.87 | 13,571.03 |
| N | 1341 | 1381 | 1368 | 1370 | 1403 | 1448 |
| R² | 0.19 | 0.20 | 0.21 | 0.18 | 0.20 | 0.19 |

| Variable | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 |
|---|---|---|---|---|---|---|
| Age | 1933.74 | 2376.97 | 1764.93 | 1267.09 | 1449.30 | 2298.25 |
| Age squared | -17.97 | -23.19 | -15.18 | -8.48 | -10.43 | -22.52 |
| Dropout | -6036.67 | -5060.52 | -5665.47 | -6765.12 | -5636.81 | -5186.47 |
| Some college | 5093.06 | 5488.82 | 4929.59 | 4695.14 | 4622.51 | 5364.25 |
| College graduate | 12,226.74 | 14,761.83 | 13,460.71 | 13,754.43 | 14,976.37 | 16,836.97 |
| Post-college | 15,980.23 | 19,233.98 | 21,367.32 | 24,603.75 | 25,878.29 | 29,834 |
| N | 1493 | 1548 | 1553 | 1593 | 1601 | 1600 |
| R² | 0.22 | 0.22 | 0.20 | 0.20 | 0.19 | 0.13 |

Entries are OLS regression coefficients. Regressions also include a constant term. The dependent variable is the head’s labor earnings in dollars. Statistically significant at the 5% level.

Table 2. Earnings regressions results. Panel B: self-employed

| Variable | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 |
|---|---|---|---|---|---|---|
| Age | 951.28 | -265.09 | 3581.27 | 3489.7 | 5564.08 | 6284.93 |
| Age squared | 0.88 | 14.16 | -38.87 | -36.75 | -64.28 | -70.77 |
| Dropout | -9440.33 | -4725.58 | -8314.44 | -3076.38 | -2595.45 | -4367.88 |
| Some college | 3782.40 | 4534.92 | 1768.52 | 4982.5 | 3449.65 | 965.74 |
| College | 24,980.55 | 17,415.52 | 20,264.44 | 16,037.15 | 17,660.95 | 20,980.68 |
| Post-college | 11,753.99 | 20,830.66 | 10,544.74 | 20,952.87 | 36,264.58 | 33,268.97 |
| N | 301 | 297 | 304 | 299 | 318 | 312 |
| R² | 0.10 | 0.15 | 0.12 | 0.13 | 0.13 | 0.13 |

| Variable | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 |
|---|---|---|---|---|---|---|
| Age | 1654.50 | -2341.10 | -9530.83 | -1503.45 | 8337.04 | 6137.39 |
| Age squared | -10.78 | 45.64 | 137.97 | 30.59 | -99.46 | -69.94 |
| Dropout | -10,159.8 | -9889.13 | -8158.59 | -3684.45 | -6017.81 | -7499.61 |
| Some college | 556.09 | 5559.45 | 10,011.71 | 3745.37 | 9567.65 | 6991.68 |
| College | 20,221.81 | 39,409.85 | 44,846.52 | 32,118.31 | 18,350.6 | 18,357.24 |
| Post-college | 36,609.22 | 31,747.14 | 33,295.13 | 47,134.93 | 55,825.86 | 67,923.81 |
| N | 305 | 313 | 322 | 329 | 320 | 327 |
| R² | 0.16 | 0.06 | 0.07 | 0.09 | 0.18 | 0.15 |

Entries are OLS regression coefficients. Regressions also include a constant term. The dependent variable is the head’s labor earnings in dollars. Statistically significant at the 5% level.

Consider, for example, the case of a wage-and-salary individual who becomes self-employed in the next survey year. His post-transition earnings (in self-employment) are observed, and his actual taxes and rates can be predicted using the TAXSIM program. In order to investigate differential taxation, however, his hypothetical earnings and taxes in wage-and-salary must be estimated. After this prediction procedure is complete, I have two sets of earnings and tax data for this and all other individuals that are then used to create necessary tax differentials for the empirical analysis.
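The counterfactual earnings step in Eqs. (2) and (3) can be sketched as follows. This is a minimal illustration, not the code used in the study: it applies the 1980 wage-and-salary coefficients from Table 1 to a hypothetical self-employed individual. The function name, the `"high_school"` label for the omitted education category (inferred from the dummy definitions), and the `constant` argument are assumptions. The intercept is not reported in the tables, so predicted levels are illustrative only, while differences between groups do not depend on it.

```python
# 1980 wage-and-salary coefficients, copied from Table 1 (Panel A).
WS_1980 = {
    "age": 889.27,
    "age_squared": -6.91,
    "dropout": -3559.66,
    "some_college": 3226.96,
    "college_graduate": 7973.10,
    "post_college": 8856.38,
}


def predict_earnings(coefs, age, education, constant=0.0):
    """Predicted gross labor earnings for one person-year.

    `education` is one of the dummy names in `coefs`, or "high_school" for
    the omitted category (an inferred label). `constant` is a placeholder:
    the estimated intercept is not reported, so levels are illustrative.
    """
    value = constant + coefs["age"] * age + coefs["age_squared"] * age ** 2
    if education != "high_school":
        value += coefs[education]
    return value


# Hypothetical self-employed college graduate, age 40: predicted
# wage-and-salary earnings, up to the unreported constant.
counterfactual = predict_earnings(WS_1980, age=40, education="college_graduate")
baseline = predict_earnings(WS_1980, age=40, education="high_school")
premium = counterfactual - baseline  # the constant cancels in this difference
```

The same function applied to the Table 2 coefficients would give predicted self-employment earnings for a wage-and-salary worker. In the study, both predictions are then passed to TAXSIM, which is not reproduced here.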
An in-depth discussion of the tax rate calculation process is provided in Appendix A.

[6] Payroll taxes might be expected to have smaller effects on transition probabilities, primarily because the payment of Social Security and Medicare taxes is associated with clearly defined benefits. Higher payroll tax rates will typically carry higher benefits, albeit on a less than one-for-one basis. It should be noted, however, that the time period in this analysis is characterized by rate increases for the self-employed relative to wage-and-salary workers without equivalent relative benefit increases. Indeed, experimentation with the tax rate differentials separated into parts (Federal, state, and payroll) did not reveal a consistent pattern of reduced magnitude or significance from the payroll tax component.

An important issue to address is the potential endogeneity of the tax rate differential in the above transition probit. Whether or not an individual moves from wage-and-salary to self-employment will certainly have some effect on his calculated tax rate differential. To control for this endogeneity, I use the instrumental variables approach suggested in the study of tax reforms and investment by Cummins et al. (1994) and later used by Carroll et al. (1995, 1997). Specifically, I compute two separate tax rate differentials for each $t$ to $t+1$ transition. The first, discussed above, uses $t+1$ incomes and tax rules and represents the closest approximation to the actual differential. Each individual’s post-transition tax rate differential is a function of all observable and unobservable individual behavior, however, which makes it potentially endogenous. The second, hypothetical differential uses time $t+1$ income under the time $t$ tax rules. The TAXSIM model allows this approach to be implemented quite easily. The hypothetical differential captures the difference in tax rates that would have existed had the tax rules remained constant.
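The two differentials just described, and the difference between them that serves as the instrument, can be sketched as follows. This is an illustration only: `toy_tax_rate` is a made-up two-bracket schedule standing in for a tax calculator (it is not TAXSIM), the rule dictionaries hold invented numbers, and the sign convention (actual minus hypothetical) is an assumption.

```python
def toy_tax_rate(income, sector, rules):
    """Stand-in for a tax calculator (this is NOT TAXSIM).

    Returns a total marginal rate in percentage points: a two-bracket
    income tax rate plus a sector-specific payroll rate.
    """
    income_rate = rules["high"] if income > rules["threshold"] else rules["low"]
    return income_rate + rules["payroll"][sector]


def differential(income_ws, income_se, rules):
    """Wage-and-salary rate minus self-employment rate under one set of rules."""
    return toy_tax_rate(income_ws, "WS", rules) - toy_tax_rate(income_se, "SE", rules)


def instrument(income_ws_next, income_se_next, rules_t, rules_next):
    """Actual differential minus hypothetical differential (sign is an assumption)."""
    actual = differential(income_ws_next, income_se_next, rules_next)
    hypothetical = differential(income_ws_next, income_se_next, rules_t)
    return actual - hypothetical


# Hypothetical tax rules for years t and t+1 (made-up numbers).
RULES_T = {"threshold": 30000, "low": 20.0, "high": 30.0,
           "payroll": {"WS": 7.0, "SE": 9.0}}
RULES_NEXT = {"threshold": 30000, "low": 18.0, "high": 28.0,
              "payroll": {"WS": 7.0, "SE": 12.0}}

# Time t+1 incomes in each sector (one observed, one predicted).
iv = instrument(35000, 25000, RULES_T, RULES_NEXT)
```

Because both differentials use the same time $t+1$ incomes, the instrument is zero whenever the tax rules do not change between the two years, so any variation in it comes from the tax code alone.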
The instrumental variable is then equal to the difference in these two tax rate differentials, and represents the part of the actual differential that is caused by the change in the tax code only; the part that is a result of individual behavior is subtracted out of the actual differential.

Because of the two-period design of the statistical analysis, panel data are required. My initial sample consists of data from the 1970 through 1991 waves of the Panel Study of Income Dynamics (PSID). TAXSIM can only be used to estimate tax rates beginning in 1979, however, so I focus on transitions beginning in the years 1979 through 1990. This rich panel data set provides ample information on individual, household, and occupational characteristics. Further, sample sizes for self-employed individuals are large enough to permit longitudinal analysis of the start-up decision. Yearly self-employed sample sizes are often small enough, however, that some form of pooled data analysis is preferred.

Following much of the existing empirical literature, I confine the analysis to male heads of household who are between the ages of 25 and 54. The dynamics of self-employment are likely to be quite different among younger and older individuals and among females.[7] The head-of-household restriction is a direct result of the fact that the PSID provides self-employment status for household heads and their spouses only. These restrictions leave a sample of 2638 individuals with at least 1 year of data on self-employment status.

[7] A number of other studies have analyzed self-employment for these groups. For example, Fuchs (1982) examines self-employment among older individuals. Dunn and Holtz-Eakin (forthcoming) and Blanchflower and Meyer (1994) consider younger individuals. Devine (1994), MacPherson (1988), and Bruce (forthcoming) look at female self-employment.

For the purposes of this study, an individual is considered to be self-employed if he reports working for himself or for himself and someone else at the time of the survey. This latter category is minimal, usually less than 1% of the working sample in each year. These individuals are kept in order to increase sample sizes and to capture all experiences in self-employment.[8]

Fig. 2 shows self-employment rates for the sample of 2638 men from 1979 to 1991. The self-employment rate is fairly constant over time for this group, usually between 17 and 18%. Fig. 2 also presents transition rates from wage-and-salary to self-employment over time. Despite fairly constant self-employment rates, the transition rate from wage employment to self-employment shows a noticeable decline over time for this sample. The rate of exit from self-employment to wage employment partially explains these seemingly incompatible trends. However, Fig. 2 shows an exit rate that seems to trend upward during the early 1980s (perhaps in response to eroding tax advantages), but then declines, albeit in a rather volatile manner, contributing to the fairly constant self-employment rate. Nonetheless, the reduction in the tax "payoff" to becoming self-employed may be one of the many factors at work in the downturn in entry rates during the 1980s.

For an individual to be in the sample used to estimate the transition probit, he must be wage-and-salary in the first period and either wage-and-salary or self-employed in the next period, and he must be out of school. Consequently, I have a select sample; those who are already in self-employment cannot enter the analysis. To be more precise, the individuals in the transition probit have not yet made an observable transition into self-employment, or, if they entered before they were first observed in the data, they returned to wage-and-salary work at some point before the period of analysis. The data-screening process removes those who start in self-employment and never leave.
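The screening rule for the transition probit sample can be sketched as follows. This is a hypothetical illustration rather than the code used in the study: the status labels, the function name, and the `in_school` flag are assumptions, and the out-of-school condition is applied to both years of each pair because the text does not say which year it refers to.

```python
def transition_records(statuses, in_school=None):
    """Build (period, D) pairs for one person from yearly employment statuses.

    statuses  : list of "WS", "SE", or None (missing or other) by survey year
    in_school : optional list of booleans, one per year (hypothetical flag)
    """
    if in_school is None:
        in_school = [False] * len(statuses)
    records = []
    for t in range(len(statuses) - 1):
        now, nxt = statuses[t], statuses[t + 1]
        if now != "WS" or nxt not in ("WS", "SE"):
            continue  # must start in wage-and-salary and be observed next year
        if in_school[t] or in_school[t + 1]:
            continue  # assumption: out of school in both years of the pair
        records.append((t, 1 if nxt == "SE" else 0))
    return records


# A man who enters self-employment, later returns, and is observed again.
mixed = transition_records(["WS", "WS", "SE", "SE", "WS", "WS"])
# A man who starts in self-employment and never leaves contributes nothing.
always_se = transition_records(["SE", "SE", "SE"])
```

Only person-years that begin in wage-and-salary survive this screen, so the estimation sample is selected on initial sector. Later restrictions described in the text, such as following each person only until a first transition, are not applied in this sketch.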
Therefore, the individual random effect, which is intended to capture unobserved entrepreneurial ability, is potentially correlated with the transition indicator. This is a case of the so-called initial conditions problem.

[8] Concerns have been raised in many studies about the appropriateness of screening a self-employed sample on the basis of earnings or hours worked. Such a procedure would supposedly eliminate the "partially self-employed" or those claiming to be fully self-employed but who in fact are not really working or in the labor force on a full-time basis. Holtz-Eakin et al. (1994b) note that such screening has virtually no effect on empirical results, however. Also, it should be noted that the PSID does not distinguish between incorporated and unincorporated self-employment. While important tax differences apply to these two categories, my focus on the newly self-employed is likely to minimize the amount of incorporated self-employment that enters the analysis.

D. Bruce / Labour Economics 7 (2000) 545–574

Fig. 2. Self-employment rates, entry rates, and exit rates: male household heads, ages 25–54.

To correct for this type of potential bias, I follow the method suggested by Orme (1997). This procedure essentially allows the initial conditions problem to be converted into a more tractable sample selection problem. The first stage of this procedure is a probit regression of a dummy variable that takes the value of one if the man is first observed in a wage-and-salary job and zero if he is first observed in self-employment. Regressors consist of a set of individual, household, and regional characteristics in the year he is first observed. Note that each individual’s initial observation may come from any year during the panel period (1970 through 1990) if the individual happens to be in school or out of the labor force in the first panel year.
Identification of this stage is accomplished by including veteran status as a regressor, which does not appear later in the transition equations.

In order for Orme’s (1997) procedure to be appropriate in this case, however, individuals who are either self-employed in their initially observed occupation or have entered self-employment at some previous point in the panel period must be omitted from the transition probits. This leaves a sample of initially wage-and-salary workers who will be followed until they make a single transition into self-employment, drop out of the survey, or reach the end of the panel period. In a simplification of Orme’s (1997) procedure, an inverse Mills ratio is calculated using the estimates of the first-stage probit.[9] This Mills ratio is included as a regressor in the random effects transition probit, along with a similar set of individual, household, and regional characteristics.

Approximately 11% of the individuals in the initial sample are self-employed in their first observed jobs. Only about 26% of all first jobs actually occur in 1970, the first year in my sample. The remaining 74% are distributed nearly evenly over the years from 1971 to 1990, indicating that I observe actual initial conditions for the majority of the sample. When I look only at those who make at least one transition from wage-and-salary to self-employment, slightly less than 10% are self-employed in their first job.

Next, looking at the remaining sample of those who were initially wage-and-salary and eliminating multiple transitions leaves a total sample of 206 first observable transitions to analyze. However, a few of these individuals do not report information for one or more of the various control variables, which restricts the actual regression sample size to 1193 individuals, 184 of whom eventually make a transition into self-employment.
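The inverse Mills ratio mentioned above can be sketched as follows. The form used here, the standard normal density divided by the standard normal CDF at the fitted first-stage index, is the usual correction term for observations whose first-stage outcome equals one (here, first observed in wage-and-salary). That it matches the exact term used in this study is an assumption, and the function name and example index values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def inverse_mills_ratio(index):
    """phi(index) / Phi(index) for a fitted first-stage probit index."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


# Hypothetical fitted indices for three men first observed in wage-and-salary.
ratios = [inverse_mills_ratio(z) for z in (-1.0, 0.0, 1.0)]
```

The ratio falls as the index rises: men whose characteristics make an initial wage-and-salary job very likely receive a small correction term, while those who were unlikely to start in wage-and-salary receive a large one.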
Pooling the observations from these 1193 individuals over the period from 1979 to 1990 yields a final sample of 5622 person-years of usable data for the transition analysis.[10]

[9] Orme’s (1997) procedure is designed to allow observations coming from more than one initial condition to enter the final stage of the estimation process.

[10] Fitzgerald et al. (1998) examine the impact of sample attrition on the representativeness of the PSID, concluding that attrition is not generally a serious problem. While I have not repeated their procedure to gauge the impact of attrition as it relates to self-employment, it is not clear whether my reduced sample should be representative of any particular population. For this reason, none of the results in this study use PSID (or any other) weights.

Again, I follow previous studies of transitions into self-employment in selecting a set of control variables to include in $X_{i,t}$. Individual characteristics include age (in quadratic form), a set of indicators for educational attainment of less than high school (11 or fewer years), some college (between 13 and 15 years), college (16 years), and post-college (more than 16 years), and an indicator for nonwhite race. Household-level controls include marital status, entered as a dummy variable for married, and a series of continuous variables for the number of children in the household in various age groups.

As a transition into self-employment carries certain opportunity costs, I also include a set of job-specific controls. These consist of tenure (in months) on the current wage-and-salary job, entered in quadratic form, and dummies for part-time employment and union membership in the pre-transition year. Regional and macroeconomic effects are controlled for via a set of indicators for residence in the north-central, south, and west regions, a dummy for whether or not the individual lives in a metropolitan statistical area (MSA), and the local area (county) unemployment rate. Dummy variables for the year of the observation are included to control for other potential time-related effects.

A number of studies have found that greater wealth holdings increase the likelihood of a transition into self-employment.[11] This is consistent with the notion that liquidity constraints are present. While the PSID does not include a yearly wealth variable, it does have the household’s yearly income from capital. This will presumably be positively correlated with the household’s (unobserved) total wealth holdings, so it is used here as a proxy.

[11] Evans and Leighton (1989), Evans and Jovanovic (1989), and Meyer (1990) are among the pioneering studies of liquidity constraints and self-employment. Blanchflower and Oswald (1998) and Holtz-Eakin et al. (1994a,b) also reveal the importance of available financial capital to self-employment entry and duration. While I do not specifically include spouse’s income as a control, it is included in the calculation of all household tax rates in this analysis.

Table 3 provides definitions of these control variables, and Table 4 presents descriptive statistics for the regression sample. Individuals making a transition into self-employment are, before the transition, slightly younger than those who do not enter self-employment. They have also worked fewer months on their wage-and-salary job, are more likely to be white, are more likely to be working part time, and are much less likely to belong to a union.

Table 3. Variable definitions

| Variable | Definition |
|---|---|
| Self-employment transition^a | =1 if wage-and-salary in year t and self-employed in year t + 1 |
| Age | Age in years |
| Age squared | Age × Age |
| Tenure | Tenure on current job in months |
| Tenure squared | (Tenure × Tenure)/100 |
| Dropout^a | =1 if less than 12 years of education |
| Some college^a | =1 if 13 to 15 years of education |
| College graduate^a | =1 if 16 years of education |
| Post-college^a | =1 if more than 16 years of education |
| Non-white^a | =1 if black or other non-white race |
| North Central^a | =1 if living in North Central region |
| South^a | =1 if living in South region |
| West^a | =1 if living in West region |
| Married^a | =1 if married, with spouse present |
| Income from capital | Household’s income from capital (US$1000) |
| Part-time^a | =1 if worked between 52 and 1820 annual hours |
| Kids 1 to 2 | Number of children in the household between the ages of 1 and 2 |
| Kids 3 to 5 | Number of children in the household between the ages of 3 and 5 |
| Kids 6 to 13 | Number of children in the household between the ages of 6 and 13 |
| Kids 14 to 17 | Number of children in the household between the ages of 14 and 17 |
| Union^a | =1 for membership in a labor union |
| Unemployment rate | County unemployment rate |
| MSA^a | =1 if living in a metropolitan statistical area |

When not otherwise indicated, all variables represent information at year t.
^a Dummy variable.

Table 4. Summary statistics for regression variables: pooled data

| Variable | All | Those remaining in a wage-and-salary job | Those making a transition into self-employment |
|---|---|---|---|
| Self-emp. transition | 0.033 (0.178) | 0 | 1 |
| Age | 30.346 (5.160) | 30.379 (5.162) | 29.348 (5.012) |
| Age squared | 947.478 (360.813) | 949.549 (360.990) | 886.283 (351.004) |
| Tenure | 65.966 (59.844) | 66.893 (60.144) | 38.587 (41.864) |
| Tenure squared | 79.322 (144.058) | 80.912 (145.745) | 32.320 (63.588) |
| Dropout | 0.082 (0.274) | 0.081 (0.273) | 0.109 (0.312) |
| Some college | 0.250 (0.433) | 0.251 (0.433) | 0.223 (0.417) |
| College graduate | 0.186 (0.389) | 0.187 (0.390) | 0.174 (0.380) |
| Post-college education | 0.123 (0.329) | 0.122 (0.327) | 0.168 (0.375) |
| Non-white | 0.076 (0.266) | 0.078 (0.268) | 0.043 (0.204) |
| North Central | 0.260 (0.439) | 0.260 (0.439) | 0.255 (0.437) |
| South | 0.304 (0.460) | 0.305 (0.460) | 0.266 (0.443) |
| West | 0.200 (0.400) | 0.198 (0.399) | 0.250 (0.434) |
| Married | 0.835 (0.371) | 0.834 (0.372) | 0.859 (0.349) |
| Income from capital | 0.729 (2.675) | 0.735 (2.704) | 0.531 (1.603) |
| Part-time | 0.143 (0.350) | 0.141 (0.348) | 0.212 (0.410) |
| Kids 1 to 2 | 0.338 (0.550) | 0.338 (0.550) | 0.337 (0.559) |
| Kids 3 to 5 | 0.290 (0.530) | 0.286 (0.525) | 0.418 (0.639) |
| Kids 6 to 13 | 0.398 (0.752) | 0.399 (0.751) | 0.359 (0.769) |
| Kids 14 to 17 | 0.071 (0.318) | 0.072 (0.321) | 0.027 (0.194) |
| Union | 0.206 (0.405) | 0.211 (0.408) | 0.076 (0.266) |
| Unemployment rate | 6.283 (2.753) | 6.298 (6.298) | 5.832 (2.431) |
| MSA | 0.573 (0.495) | 0.574 (0.574) | 0.543 (0.499) |
| N | 5622 | 5438 | 184 |

Entries are means, with standard deviations in parentheses. Observations are person-years. A two-tailed t test rejects the null hypothesis of equal means for columns 2 and 3 at the 5% significance level.

Table 5 provides a preliminary look at the actual and hypothetical post-transition tax situations for the regression sample. Looking first at those who do not enter self-employment: their predicted total taxes would have been nearly US$1500 lower if they had entered self-employment. Their average and marginal tax rates would all have been lower in self-employment. A similar pattern emerges for those who make a transition into self-employment. Their actual total tax payments are, on average, nearly US$3400 lower than they would have been had they remained in their wage-and-salary job. Further, all tax rates except the state-level average tax rate are lower in self-employment than they would have been in wage-and-salary.

Table 5. Summary statistics for tax variables: pooled data

| Variable | Those remaining in a wage-and-salary job | Those making a transition into self-employment |
|---|---|---|
| Actual total taxes | 11,903.64 (9225.58) | 8262.28 (10,529.33) |
| Predicted (alternate) total taxes | 10,580.41 (9691.89) | 11,641.88 (8825.72) |
| Actual Federal ATR | 12.10 (7.69) | 10.98 (13.18) |
| Predicted (alternate) Federal ATR | 11.09 (9.02) | 13.22 (7.80) |
| Actual Federal MTR | 23.68 (8.56) | 21.47 (9.78) |
| Predicted (alternate) Federal MTR | 22.37 (8.99) | 25.75 (8.96) |
| Actual State ATR | 2.80 (2.70) | 2.81 (7.82) |
| Predicted (alternate) State ATR | 2.65 (3.98) | 2.72 (2.10) |
| Actual State MTR | 5.07 (3.36) | 4.61 (3.38) |
| Predicted (alternate) State MTR | 4.83 (3.30) | 5.23 (3.63) |
| Actual payroll ATR | 12.60 (5.04) | 9.76 (8.62) |
| Predicted (alternate) payroll ATR | 11.22 (6.87) | 11.55 (4.43) |
| Actual payroll MTR | 12.73 (4.77) | 11.02 (3.60) |
| Predicted (alternate) payroll MTR | 10.09 (5.30) | 13.68 (2.91) |
| Total ATR differential | 2.91 (9.68) | 4.96 (10.12) |
| Total MTR differential | 4.11 (8.91) | 7.26 (9.43) |
| Net-of-tax income differential | 971.11 (10,887.76) | 4214.60 (8018.64) |

Entries are means, with standard deviations in parentheses. Differential variables are calculated as the (actual or predicted) value under wage-and-salary minus the (actual or predicted) value under self-employment. See text for additional details. ATR = Average Tax Rate. MTR = Marginal Tax Rate.

The last three rows of Table 5 provide summary statistics for the differential variables used in the probits. If taxes affect the probability of becoming self-employed, we might expect to find that those who stand to gain the most from self-employment in terms of lower tax rates or liabilities are also the most likely to enter self-employment. A second possibility is that those whose tax rates would be higher in self-employment would be most likely to enter in order to capture the increased benefits from a given level of business-related deductions. It is the first of these possibilities that is observed in Table 5. The total (federal and state income plus payroll) average and marginal tax rates in wage-and-salary exceed the corresponding total rates in self-employment for both categories of individuals, but by a greater amount for those who make a transition into self-employment.

Finally, those entering self-employment experience a much larger drop in after-tax household income. While those who do not enter self-employment would have earned US$971 less on average had they entered, those who make a transition would have earned an average of US$4215 more if they had remained in a wage-and-salary job. This larger difference could be interpreted as the implicit value of the nonpecuniary benefits in self-employment.
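As a quick arithmetic check on the total tax comparison, the wage-and-salary minus self-employment convention from the Table 5 notes can be applied directly to the reported means. The sketch below uses only the two total-tax rows of Table 5; the variable and function names are hypothetical.

```python
# Mean total taxes from Table 5 (US dollars).
STAYERS = {"actual_ws": 11903.64, "predicted_se": 10580.41}
ENTRANTS = {"actual_se": 8262.28, "predicted_ws": 11641.88}


def ws_minus_se(ws_value, se_value):
    """Differential convention from the Table 5 notes: wage-and-salary minus self-employment."""
    return ws_value - se_value


# Stayers: actual taxes are wage-and-salary, the predicted alternate is self-employment.
stayer_gap = ws_minus_se(STAYERS["actual_ws"], STAYERS["predicted_se"])
# Entrants: actual taxes are self-employment, the predicted alternate is wage-and-salary.
entrant_gap = ws_minus_se(ENTRANTS["predicted_ws"], ENTRANTS["actual_se"])

print(f"stayers:  {stayer_gap:.2f}")
print(f"entrants: {entrant_gap:.2f}")
```

The entrant gap of about US$3380 matches the "nearly US$3400" in the text. The stayer gap computed from the table means is about US$1323, somewhat below the "nearly US$1500" quoted in the text; the table values are used here as given.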

5. Results and discussion