
when heteroscedasticity is ignored, and the bias is large enough that the estimate without controlling for heteroscedasticity falls outside of the 95% confidence interval. This example shows the importance of controlling for heteroscedasticity both in the point estimate and in the standard error.

The three estimands are also calculated for the nonparametric model (see Table 4). The pattern of results is similar. Controlling for heteroscedasticity is clearly important for both the point estimates and the standard errors. The point estimates for the slope and average slope are much larger in absolute value after controlling for heteroscedasticity. The standard errors for all three statistics are quite different after controlling for heteroscedasticity, and in two of the cases are larger. In this example, however, the difference between the results for the normal and nonparametric models is small. Comparisons between Tables 3 and 4 show qualitatively similar results. In our example, the residuals are approximately normal, so the nonparametric model differs little from the normal model. In other data sets, however, the nonparametric model may be more appropriate and could give quite different results.
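The size of the retransformation bias discussed above can be seen in a small simulation. The sketch below uses made-up parameter values (not the paper's data): it draws $\ln y = \beta_0 + \beta_1 x + \varepsilon$ with $\mathrm{Var}(\varepsilon \mid x) = \alpha_0 + \alpha_1 x$, then compares the naive retransformation $e^{x'\hat\beta}$, a homoscedastic retransformation using a single average variance, and the correct $E[y \mid x] = e^{x'\beta + 0.5\,s(x)}$.

```python
import numpy as np

# Hypothetical simulation (parameter values are illustrative, not the paper's):
# ln y = b0 + b1*x + eps with Var(eps | x) = a0 + a1*x, so the correct
# retransformation is E[y | x] = exp(b0 + b1*x + 0.5*(a0 + a1*x)).
rng = np.random.default_rng(0)
n = 500_000
b0, b1 = 0.5, 1.0          # mean parameters (treated as known here)
a0, a1 = 0.2, 0.4          # variance parameters (heteroscedastic)
x = rng.uniform(0.0, 2.0, n)
eps = rng.normal(0.0, np.sqrt(a0 + a1 * x))
y = np.exp(b0 + b1 * x + eps)

x0 = 1.5                                                    # evaluation point
naive = np.exp(b0 + b1 * x0)                                # ignores the error term
homosk = np.exp(b0 + b1 * x0 + 0.5 * (a0 + a1 * x).mean())  # one variance for all
correct = np.exp(b0 + b1 * x0 + 0.5 * (a0 + a1 * x0))       # uses s(x0)

# The local sample mean of y near x0 tracks `correct`, not the other two.
local = y[np.abs(x - x0) < 0.05].mean()
print(naive, homosk, correct, local)
```

Because the variance at $x_0 = 1.5$ exceeds the average variance, both the naive and the homoscedastic retransformations understate $E[y \mid x_0]$ in this design.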

4. Conclusion

The use of a log-transformed dependent variable in applied economics creates a potential bias when computing estimates of $E[y \mid x]$ on the original scale, if the error term either does not have a normal distribution or is heteroscedastic. Estimates on the original scale should be reported with standard errors, like all estimated values. One reason that the calculation of standard errors is not common is that the formulas are not widely known. This paper provides formulas for the general case of error terms that have any distribution and are heteroscedastic, as well as for simpler cases. Another reason is that the standard errors are not computed automatically by statistical software. The authors will provide sample programs in Stata to compute the standard errors upon request.

Acknowledgements

Chunrong Ai received financial support through a summer grant from the Warrington College of Business Administration. We would like to thank Willard Manning, John Mullahy, and two anonymous referees for their comments on an earlier draft. An earlier version of this paper was presented at the Second World Conference of the International Health Economics Association, Rotterdam, the Netherlands, June 6–9, 1999.

Appendix A

A.1. Formulas for the normal linear model

For the normal linear model the formulas for Step III are:

$$\hat{m}(x) = e^{x'\hat{\beta} + 0.5\,x'\hat{\alpha}} \quad \text{(scalar)},$$

where $\hat{\beta}_k$ and $\hat{\alpha}_k$ are the scalar estimated parameters for $x^k$, and $e_k$ is a vector of 0s with a 1 in the $k$th position.

$$\frac{\partial \hat{m}(x)}{\partial x^k} = \hat{m}(x)\left(\hat{\beta}_k + 0.5\,\hat{\alpha}_k\right) \quad \text{(scalar)}$$

$$\frac{\partial \hat{m}(x)}{\partial \beta} = \hat{m}(x)\,x \quad \text{(vector)}$$

$$\frac{\partial \hat{m}(x)}{\partial \alpha} = 0.5\,\hat{m}(x)\,x \quad \text{(vector)}$$

$$\frac{\partial^2 \hat{m}(x)}{\partial \beta\,\partial x^k} = \frac{\partial \hat{m}(x)}{\partial x^k}\,x + \hat{m}(x)\,e_k \quad \text{(vector)}$$

$$\frac{\partial^2 \hat{m}(x)}{\partial \alpha\,\partial x^k} = 0.5\left(\frac{\partial \hat{m}(x)}{\partial x^k}\,x + \hat{m}(x)\,e_k\right) \quad \text{(vector)}$$

$$E\left[\frac{\partial^2 \hat{m}(x)}{\partial \beta\,\partial x^k}\right] = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial^2 \hat{m}(x_i)}{\partial \beta\,\partial x^k} \quad \text{(vector, simple average)}$$

$$E\left[\frac{\partial^2 \hat{m}(x)}{\partial \alpha\,\partial x^k}\right] = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial^2 \hat{m}(x_i)}{\partial \alpha\,\partial x^k} \quad \text{(vector, simple average)}$$
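As a numerical sketch of the A.1 formulas, the code below evaluates $\hat{m}(x)$ and its gradients with respect to $\beta$, $\alpha$, and $x^k$, and combines the stacked gradient with a covariance matrix in the usual delta-method way. All parameter values and the covariance matrix $V$ are made-up placeholders, not the paper's estimates; the authors' own programs are in Stata.

```python
import numpy as np

# Sketch of the Step III formulas for the normal linear model (Appendix A.1):
# m(x) = exp(x'b + 0.5 x'a), where x'b is the conditional mean of ln y and
# x'a its conditional variance.  Parameters below are purely illustrative.
def m_hat(x, b, a):
    return np.exp(x @ b + 0.5 * (x @ a))

def gradients(x, b, a, k):
    m = m_hat(x, b, a)
    d_dxk = m * (b[k] + 0.5 * a[k])    # dm/dx^k, scalar
    d_db = m * x                       # dm/db, vector
    d_da = 0.5 * m * x                 # dm/da, vector
    return m, d_dxk, d_db, d_da

x = np.array([1.0, 0.5])               # includes a constant term
b = np.array([0.2, 1.0])               # hypothetical mean parameters
a = np.array([0.1, 0.3])               # hypothetical variance parameters
m, d_dxk, d_db, d_da = gradients(x, b, a, k=1)

# Delta-method standard error for m(x): sqrt(g' V g) with the stacked
# gradient, using an assumed joint covariance matrix V of (b, a).
g = np.concatenate([d_db, d_da])
V = 0.01 * np.eye(4)                   # placeholder covariance estimate
se = np.sqrt(g @ V @ g)
print(m, d_dxk, se)
```

The same pattern extends to the nonlinear and nonparametric cases by swapping in the appropriate derivative formulas from Appendices A.2–A.4.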
A.2. Formulas for the normal nonlinear model

For the normal nonlinear model the formulas for Step III are:

$$\hat{m}(x) = e^{h(x,\hat{\beta}) + 0.5\,s(x,\hat{\alpha})} \quad \text{(scalar)}$$

$$\frac{\partial \hat{m}(x)}{\partial \beta} = \hat{m}(x)\,\frac{\partial h(x,\hat{\beta})}{\partial \beta} \quad \text{(vector)}$$

$$\frac{\partial \hat{m}(x)}{\partial \alpha} = 0.5\,\hat{m}(x)\,\frac{\partial s(x,\hat{\alpha})}{\partial \alpha} \quad \text{(vector)}$$

$$\frac{\partial \hat{m}(x)}{\partial x^k} = \hat{m}(x)\left(\frac{\partial h(x,\hat{\beta})}{\partial x^k} + 0.5\,\frac{\partial s(x,\hat{\alpha})}{\partial x^k}\right) \quad \text{(scalar)}$$

$$\frac{\partial^2 \hat{m}(x)}{\partial \beta\,\partial x^k} = \frac{\partial \hat{m}(x)}{\partial \beta}\left(\frac{\partial h(x,\hat{\beta})}{\partial x^k} + 0.5\,\frac{\partial s(x,\hat{\alpha})}{\partial x^k}\right) + \hat{m}(x)\,\frac{\partial^2 h(x,\hat{\beta})}{\partial x^k\,\partial \beta'} \quad \text{(vector)}$$

$$\frac{\partial^2 \hat{m}(x)}{\partial \alpha\,\partial x^k} = \frac{\partial \hat{m}(x)}{\partial \alpha}\left(\frac{\partial h(x,\hat{\beta})}{\partial x^k} + 0.5\,\frac{\partial s(x,\hat{\alpha})}{\partial x^k}\right) + 0.5\,\hat{m}(x)\,\frac{\partial^2 s(x,\hat{\alpha})}{\partial x^k\,\partial \alpha'} \quad \text{(vector)}$$

A.3. Formulas for the nonparametric model with homoscedasticity

For the nonparametric model with homoscedasticity the formulas for Step III are:

$$\frac{\partial \hat{m}(x)}{\partial \beta} = \hat{m}(x)\,\frac{\partial h(x,\hat{\beta})}{\partial \beta} - e^{h(x,\hat{\beta})}\,\frac{1}{N}\sum_{i=1}^{N} \frac{\partial h(x_i,\hat{\beta})}{\partial \beta}\,e^{\hat{\varepsilon}_i} \quad \text{(vector)}$$

The second term in the first derivative above arises from the fact that the residuals $\hat{\varepsilon}_i$ are estimated and hence depend on the parameter estimate $\hat{\beta}$.

$$\frac{\partial^2 \hat{m}(x)}{\partial \beta\,\partial x^k} = \hat{m}(x)\,\frac{\partial^2 h(x,\hat{\beta})}{\partial x^k\,\partial \beta} + \frac{\partial \hat{m}(x)}{\partial \beta}\,\frac{\partial h(x,\hat{\beta})}{\partial x^k} \quad \text{(vector)}$$

$$E\left[\frac{\partial^2 \hat{m}(x)}{\partial \beta\,\partial x^k}\right] = -\frac{1}{N}\sum_{i=1}^{N} \frac{\partial^2 h(x_i,\hat{\beta})}{\partial x^k\,\partial \beta} \quad \text{(vector)}$$

Four of the $S$'s are estimated by running a simple regression, saving the vector of regression coefficients, and dividing them by $N$. The part of the dependent variable that is not $\hat{\varepsilon}_i$ or $\hat{\eta}_i$ must have its mean subtracted before multiplying by the estimated error.

Estimated coefficients | Dependent variable | Independent variable
$\hat{S}_{1\Delta\beta}$ | $\exp(h(x,\hat{\beta}) + \hat{\varepsilon}_i) \times \hat{\varepsilon}_i$ | $\partial h(x_i,\hat{\beta})/\partial \beta$
$\hat{S}_{2\Delta\beta}$ | $\exp(h(x,\hat{\beta}) + \hat{\varepsilon}_i)\,(\partial h(x,\hat{\beta})/\partial x^k) \times \hat{\varepsilon}_i$ | $\partial h(x_i,\hat{\beta})/\partial \beta$
$\hat{S}_{3\Delta\beta}$ | $\left[N^{-1}\sum_j \exp(h(x_j,\hat{\beta}) + \hat{\varepsilon}_j)\,(\partial h(x_j,\hat{\beta})/\partial x^k)\right] \times \hat{\varepsilon}_i$ | $\partial h(x_i,\hat{\beta})/\partial \beta$
$\hat{S}_{m\beta}$ | $y_i\,(\partial h(x_i,\hat{\beta})/\partial x) \times \hat{\varepsilon}_i$ | $\partial h(x_i,\hat{\beta})/\partial \beta$

Four of the $S$'s are estimated by computing the sample variance divided by $N$ (scalar).
Estimated variance | Sample variance of
$\hat{S}_{1\Delta\Delta}$ | $\exp(h(x,\hat{\beta}) + \hat{\varepsilon}_i)$
$\hat{S}_{2\Delta\Delta}$ | $\exp(h(x,\hat{\beta}) + \hat{\varepsilon}_i)\,(\partial h(x,\hat{\beta})/\partial x^k)$
$\hat{S}_{3\Delta\Delta}$ | $N^{-1}\sum_j \exp(h(x_j,\hat{\beta}) + \hat{\varepsilon}_i)\,(\partial h(x_j,\hat{\beta})/\partial x^k)$
$\hat{S}_{m}$ | $y_i\,(\partial h(x_i,\hat{\beta})/\partial x)$

A.4. Formulas for the nonparametric model with heteroscedasticity

For the nonparametric model with heteroscedasticity the formulas for Step III are:

$$\frac{\partial \hat{m}(x)}{\partial \beta} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial g(x_i,\hat{\beta},\hat{\alpha})}{\partial \beta} \quad \text{(vector)}$$

$$\frac{\partial \hat{m}(x)}{\partial \alpha} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial g(x_i,\hat{\beta},\hat{\alpha})}{\partial \alpha} \quad \text{(vector)}$$

$$\frac{\partial^2 \hat{m}(x)}{\partial \beta\,\partial x^k} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial^2 g(x_i,\hat{\beta},\hat{\alpha})}{\partial x^k\,\partial \beta} \quad \text{(vector)}$$

$$\frac{\partial^2 \hat{m}(x)}{\partial \alpha\,\partial x^k} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial^2 g(x_i,\hat{\beta},\hat{\alpha})}{\partial x^k\,\partial \alpha} \quad \text{(vector)}$$

$$E\left[\frac{\partial^2 \hat{m}(x)}{\partial \beta\,\partial x^k}\right] = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial^2 q_i(\hat{\beta},\hat{\alpha})}{\partial x^k\,\partial \beta} \quad \text{(vector)}$$

$$E\left[\frac{\partial^2 \hat{m}(x)}{\partial \alpha\,\partial x^k}\right] = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial^2 q_i(\hat{\beta},\hat{\alpha})}{\partial x^k\,\partial \alpha} \quad \text{(vector)}$$

$$\hat{S}_{\beta\alpha} = \left(\sum_{i=1}^{N} \frac{\partial h(x_i,\hat{\beta})}{\partial \beta}\,\frac{\partial h(x_i,\hat{\beta})}{\partial \beta'}\right)^{-1} \left(\sum_{i=1}^{N} \hat{\varepsilon}_i\,\hat{\eta}_i\,\frac{\partial h(x_i,\hat{\beta})}{\partial \beta}\,\frac{\partial s(x_i,\hat{\alpha})}{\partial \alpha'}\right) \left(\sum_{i=1}^{N} \frac{\partial s(x_i,\hat{\alpha})}{\partial \alpha}\,\frac{\partial s(x_i,\hat{\alpha})}{\partial \alpha'}\right)^{-1} \quad \text{(matrix)}$$

Eight of the $S$'s are estimated by running a simple regression, saving the vector of regression coefficients, and dividing them by $N$. The part of the dependent variable that is not $\hat{\varepsilon}_i$ or $\hat{\eta}_i$ must have its mean subtracted before multiplying by the estimated error.

Estimated coefficient | Dependent variable | Independent variable
$\hat{S}_{1\Delta\beta}$ | $g_i(x,\hat{\beta},\hat{\alpha}) \times \hat{\varepsilon}_i$ | $\partial h(x_i,\hat{\beta})/\partial \beta$
$\hat{S}_{1\Delta\alpha}$ | $g_i(x,\hat{\beta},\hat{\alpha}) \times \hat{\eta}_i$ | $\partial s(x_i,\hat{\alpha})/\partial \alpha$
$\hat{S}_{2\Delta\beta}$ | $(\partial g_i(x,\hat{\beta},\hat{\alpha})/\partial x^k) \times \hat{\varepsilon}_i$ | $\partial h(x_i,\hat{\beta})/\partial \beta$
$\hat{S}_{2\Delta\alpha}$ | $(\partial g_i(x,\hat{\beta},\hat{\alpha})/\partial x^k) \times \hat{\eta}_i$ | $\partial s(x_i,\hat{\alpha})/\partial \alpha$
$\hat{S}_{3\Delta\beta}$ | $\left[N^{-1}\sum_j \partial g_i(x_j,\hat{\beta},\hat{\alpha})/\partial x^k\right] \times \hat{\varepsilon}_i$ | $\partial h(x_i,\hat{\beta})/\partial \beta$
$\hat{S}_{3\Delta\alpha}$ | $\left[N^{-1}\sum_j \partial g_i(x_j,\hat{\beta},\hat{\alpha})/\partial x^k\right] \times \hat{\eta}_i$ | $\partial s(x_i,\hat{\alpha})/\partial \alpha$
$\hat{S}_{q\beta}$ | $q_i(\hat{\beta},\hat{\alpha}) \times \hat{\varepsilon}_i$ | $\partial h(x_i,\hat{\beta})/\partial \beta$
$\hat{S}_{q\alpha}$ | $q_i(\hat{\beta},\hat{\alpha}) \times \hat{\eta}_i$ | $\partial s(x_i,\hat{\alpha})/\partial \alpha$

Four of the $S$'s are estimated by computing the sample variance divided by $N$ (scalar).
Estimated variance | Sample variance of
$\hat{S}_{1\Delta\Delta}$ | $g_i(x,\hat{\beta},\hat{\alpha})$
$\hat{S}_{2\Delta\Delta}$ | $\partial g_i(x,\hat{\beta},\hat{\alpha})/\partial x^k$
$\hat{S}_{3\Delta\Delta}$ | $N^{-1}\sum_j \partial g_i(x_j,\hat{\beta},\hat{\alpha})/\partial x^k$
$\hat{S}_{m}$ | $q_i(\hat{\beta},\hat{\alpha})$

A.5. Proof of asymptotic independence

Let $\hat{\beta}$ denote the least squares estimator of the regression

$$\ln y_i = h(x_i,\beta) + \varepsilon_i, \quad i = 1,2,\ldots,N.$$

Then, applying a Taylor expansion, we obtain

$$\hat{\beta} - \beta = \left(\sum_{i=1}^{N} \frac{\partial h(x_i,\beta)}{\partial \beta}\,\frac{\partial h(x_i,\beta)}{\partial \beta'}\right)^{-1} \sum_{i=1}^{N} \frac{\partial h(x_i,\beta)}{\partial \beta}\,\varepsilon_i + o_p(N^{-0.5}).$$

See Amemiya (1985) for a derivation. Let $\hat{\varepsilon}_i$ denote the least squares residuals. Note that

$$E[\varepsilon_i^2 \mid x_i] = s(x_i,\alpha).$$

We can write

$$\varepsilon_i^2 = s(x_i,\alpha) + \eta_i,$$

where $\eta_i$ is the error term. Let $\hat{\alpha}$ denote the least squares estimator from

$$\hat{\varepsilon}_i^2 = s(x_i,\alpha) + \eta_i.$$

Applying the first-order Taylor series approximation, we obtain

$$\hat{\alpha} - \alpha = \left(\sum_{i=1}^{N} \frac{\partial s(x_i,\alpha)}{\partial \alpha}\,\frac{\partial s(x_i,\alpha)}{\partial \alpha'}\right)^{-1} \sum_{i=1}^{N} \frac{\partial s(x_i,\alpha)}{\partial \alpha}\,\eta_i + o_p(N^{-0.5}).$$

Note that the covariance between $\hat{\beta}$ and $\hat{\alpha}$ depends on $E[\varepsilon_i \eta_i \mid x_i] = E[\varepsilon_i^3 \mid x_i]$. For normal $\varepsilon_i$, $E[\varepsilon_i^3 \mid x_i] = 0$, implying that $\hat{\beta}$ and $\hat{\alpha}$ are asymptotically independent.

References

Amemiya, T., 1985. Advanced Econometrics. Harvard Univ. Press, Cambridge, MA.
Duan, N., 1983. Smearing estimate: a nonparametric retransformation method. J. Am. Stat. Assoc. 78, 605–610.
Duan, N., et al., 1983. A comparison of alternative models for the demand for medical care. J. Bus. Econ. Stat. 1, 115–126.
Duan, N., et al., 1984. Choosing between the sample selection model and the multi-part model. J. Bus. Econ. Stat. 2, 283–289.
Greene, W.H., 2000. Econometric Analysis. Prentice-Hall, Upper Saddle River, NJ.
Jones, A.M., Yen, S.T., 1999. A Box–Cox double-hurdle model. Manchester School, in press.
Manning, W.G., 1998. The logged dependent variable, heteroscedasticity, and the retransformation problem. J. Health Econ. 17, 283–296.
Manning, W.G., Mullahy, J., 1999. Estimating log models: to transform or not to transform.
University of Chicago working paper.
Mullahy, J., 1998. Much ado about two: reconsidering retransformation and the two-part model in health econometrics. J. Health Econ. 17, 247–282.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–830.