when heteroscedasticity is ignored, and the bias is large enough that the estimate without controlling for heteroscedasticity falls outside of the 95% confidence
interval. This example shows the importance of controlling for heteroscedasticity both in the point estimate and in the standard error.
The three estimands are also calculated for the nonparametric model (see Table 4). The pattern of results is similar. Controlling for heteroscedasticity is clearly important for both the point estimates and the standard errors. The point estimates for the slope and average slope are much larger in absolute value after controlling for heteroscedasticity. The standard errors for all three statistics are quite different after controlling for heteroscedasticity, and in two of the cases are larger.
In this example, however, the difference between the results for the normal and nonparametric models is small. Comparisons between Tables 3 and 4 show
qualitatively similar results. In our example, the residuals are approximately normal, so the nonparametric model differs little from the normal model. In
other data sets, however, the nonparametric model may be more appropriate and could give quite different results.
4. Conclusion
The use of a log-transformed dependent variable in applied economics creates a potential bias when computing estimates of $E[y|x]$ on the original scale if the error term either does not have a normal distribution or is heteroscedastic.
Estimates on the original scale should be reported with standard errors, like all estimated values. One reason that the calculation of standard errors is not common
is that the equations are not commonly known. This paper provides equations for the general case of error terms that have any distribution and are heteroscedastic,
as well as for simpler cases. Another reason is that standard software packages do not compute these standard errors automatically. The authors will provide sample Stata programs to compute the standard errors upon request.
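The calculation itself is short in any matrix language. As an illustration only (a Python sketch, not the authors' Stata code; the simulated data, the linear forms $h(x,\beta)=x'\beta$ and $\sigma^2(x,\alpha)=x'\alpha$, and the White (1980) sandwich variances are assumptions for the example), the full pipeline for the normal linear model is:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative heteroscedastic log-normal data: ln y = x'beta + eps with
# var(eps | x) = x'alpha, both linear in x as in Appendix A.1.  All
# parameter values here are assumptions for the example.
N = 5000
X = np.column_stack([np.ones(N), rng.uniform(0.5, 1.5, N)])
beta_true = np.array([1.0, 0.5])
alpha_true = np.array([0.2, 0.3])
eps = rng.normal(0.0, np.sqrt(X @ alpha_true))
ln_y = X @ beta_true + eps

# Step I: OLS of ln y on x gives beta-hat.
b = np.linalg.lstsq(X, ln_y, rcond=None)[0]
eps_hat = ln_y - X @ b

# Step II: OLS of the squared residuals on x gives alpha-hat.
a = np.linalg.lstsq(X, eps_hat**2, rcond=None)[0]
eta_hat = eps_hat**2 - X @ a

# Step III: retransformed prediction and its gradients at a point x0.
x0 = np.array([1.0, 1.0])
m_hat = np.exp(x0 @ b + 0.5 * (x0 @ a))    # estimate of E[y | x0]
grad_b = m_hat * x0                         # d m / d beta
grad_a = 0.5 * m_hat * x0                   # d m / d alpha

# Step IV: delta-method standard error.  White-style sandwich variances
# for b and a; the cross term is dropped because b and a are
# asymptotically independent under normal errors (Appendix A.5).
XtX_inv = np.linalg.inv(X.T @ X)
V_b = XtX_inv @ (X.T * eps_hat**2) @ X @ XtX_inv
V_a = XtX_inv @ (X.T * eta_hat**2) @ X @ XtX_inv
se = np.sqrt(grad_b @ V_b @ grad_b + grad_a @ V_a @ grad_a)
print(m_hat, se)
```

Reporting `m_hat` together with `se` is exactly the practice the paragraph above recommends: the retransformed point estimate with its standard error.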
Acknowledgements
Chunrong Ai received financial support through a summer grant from the Warrington College of Business Administration. We would like to thank Willard
Manning, John Mullahy, and two anonymous referees for their comments on an earlier draft. An earlier version of this paper was presented at the Second World
Conference of the International Health Economics Association, Rotterdam, the Netherlands, June 6–9, 1999.
Appendix A
A.1. Formulas for the normal linear model

For the normal linear model the formulas for Step III are:

$\hat m(x) = e^{x'\hat\beta + 0.5\,x'\hat\alpha}$  (scalar; $\hat\beta$, $\hat\alpha$ estimated parameters)

$\dfrac{\partial \hat m(x)}{\partial x^k} = \hat m(x)\,(\hat\beta_k + 0.5\,\hat\alpha_k)$  (scalar; $\hat\beta_k$, $\hat\alpha_k$ are the coefficients on $x^k$, and $e_k$ below is the vector of 0s with 1 in the $k$th position)

$\dfrac{\partial \hat m(x)}{\partial \beta} = \hat m(x)\,x$  (vector)

$\dfrac{\partial \hat m(x)}{\partial \alpha} = 0.5\,\hat m(x)\,x$  (vector)

$\dfrac{\partial^2 \hat m(x)}{\partial \beta\,\partial x^k} = \dfrac{\partial \hat m(x)}{\partial x^k}\,x + \hat m(x)\,e_k$  (vector)

$\dfrac{\partial^2 \hat m(x)}{\partial \alpha\,\partial x^k} = 0.5\left(\dfrac{\partial \hat m(x)}{\partial x^k}\,x + \hat m(x)\,e_k\right)$  (vector)

$\hat E\!\left[\dfrac{\partial^2 \hat m(x)}{\partial \beta\,\partial x^k}\right] = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\partial^2 \hat m(x_i)}{\partial \beta\,\partial x^k}$  (vector, simple average)

$\hat E\!\left[\dfrac{\partial^2 \hat m(x)}{\partial \alpha\,\partial x^k}\right] = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\partial^2 \hat m(x_i)}{\partial \alpha\,\partial x^k}$  (vector, simple average)
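These derivative formulas are easy to verify numerically. The sketch below, with arbitrary illustrative values for $x$, $\hat\beta$, and $\hat\alpha$ (three regressors; the values are assumptions for the example), compares the analytic Step III derivatives with central finite differences:

```python
import numpy as np

# Arbitrary illustrative evaluation point and parameter estimates.
x = np.array([1.0, 0.8, -0.3])
b = np.array([0.4, 0.2, 0.1])    # beta-hat
a = np.array([0.3, 0.1, 0.05])   # alpha-hat

def m(x, b, a):
    # Normal linear model prediction: m(x) = exp(x'b + 0.5 x'a).
    return np.exp(x @ b + 0.5 * (x @ a))

m0 = m(x, b, a)
k = 1                       # direction x^k for the slope formulas
e_k = np.eye(3)[k]          # vector of 0s with 1 in the kth position

# Analytic Step III derivatives.
dm_dxk = m0 * (b[k] + 0.5 * a[k])              # scalar
dm_db = m0 * x                                 # vector
dm_da = 0.5 * m0 * x                           # vector
d2m_db_dxk = dm_dxk * x + m0 * e_k             # vector
d2m_da_dxk = 0.5 * (dm_dxk * x + m0 * e_k)     # vector

# Central finite-difference checks.
step = 1e-6
num_dxk = (m(x + step * e_k, b, a) - m(x - step * e_k, b, a)) / (2 * step)
num_db = np.array([(m(x, b + step * np.eye(3)[j], a)
                    - m(x, b - step * np.eye(3)[j], a)) / (2 * step)
                   for j in range(3)])
print(abs(dm_dxk - num_dxk), np.abs(dm_db - num_db).max())
```

The discrepancies should be at finite-difference rounding level, confirming the analytic expressions.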
A.2. Formulas for the normal nonlinear model
For the normal nonlinear model the formulas for Step III are:
$\hat m(x) = e^{h(x,\hat\beta) + 0.5\,\sigma^2(x,\hat\alpha)}$  (scalar)

$\dfrac{\partial \hat m(x)}{\partial \beta} = \hat m(x)\,\dfrac{\partial h(x,\hat\beta)}{\partial \beta}$  (vector)

$\dfrac{\partial \hat m(x)}{\partial \alpha} = 0.5\,\hat m(x)\,\dfrac{\partial \sigma^2(x,\hat\alpha)}{\partial \alpha}$  (vector)

$\dfrac{\partial \hat m(x)}{\partial x^k} = \hat m(x)\left[\dfrac{\partial h(x,\hat\beta)}{\partial x^k} + 0.5\,\dfrac{\partial \sigma^2(x,\hat\alpha)}{\partial x^k}\right]$  (scalar)

$\dfrac{\partial^2 \hat m(x)}{\partial x^k\,\partial \beta} = \dfrac{\partial \hat m(x)}{\partial \beta}\left[\dfrac{\partial h(x,\hat\beta)}{\partial x^k} + 0.5\,\dfrac{\partial \sigma^2(x,\hat\alpha)}{\partial x^k}\right] + \hat m(x)\,\dfrac{\partial^2 h(x,\hat\beta)}{\partial x^k\,\partial \beta'}$  (vector)

$\dfrac{\partial^2 \hat m(x)}{\partial x^k\,\partial \alpha} = \dfrac{\partial \hat m(x)}{\partial \alpha}\left[\dfrac{\partial h(x,\hat\beta)}{\partial x^k} + 0.5\,\dfrac{\partial \sigma^2(x,\hat\alpha)}{\partial x^k}\right] + 0.5\,\hat m(x)\,\dfrac{\partial^2 \sigma^2(x,\hat\alpha)}{\partial x^k\,\partial \alpha'}$  (vector)

A.3. Formulas for the nonparametric model with homoscedasticity
For the nonparametric model with homoscedasticity the formulas for Step III are:
$\dfrac{\partial \hat m(x)}{\partial \beta} = \hat m(x)\,\dfrac{\partial h(x,\hat\beta)}{\partial \beta} - e^{h(x,\hat\beta)}\,\dfrac{1}{N}\sum_{i=1}^{N} e^{\hat\varepsilon_i}\,\dfrac{\partial h(x_i,\hat\beta)}{\partial \beta}$  (vector)
The second term in the first derivative above arises from the fact that the residuals $\hat\varepsilon_i$ are estimated and hence depend on the parameter estimate $\hat\beta$.
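That dependence can be made concrete in a small numerical check. The sketch below assumes a linear index $h(x,\beta)=x'\beta$ and Duan's (1983) smearing form $\hat m(x)=e^{h(x,\hat\beta)}\,N^{-1}\sum_i e^{\hat\varepsilon_i}$; the simulated data are illustrative. Differentiating through the estimated residuals reproduces both terms of the derivative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data with a linear index h(x, b) = x'b (an assumption
# for this example; the paper allows a general h).
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
ln_y = X @ np.array([0.5, 1.0]) + rng.normal(scale=0.4, size=N)
b = np.linalg.lstsq(X, ln_y, rcond=None)[0]

def m(x0, b):
    # Smearing estimator; the residuals are recomputed from b, so the
    # smearing factor itself depends on b.
    resid = ln_y - X @ b
    return np.exp(x0 @ b) * np.mean(np.exp(resid))

x0 = np.array([1.0, 0.5])
m0 = m(x0, b)
resid = ln_y - X @ b

# Analytic gradient: m(x) dh/db minus the correction term coming from
# the estimated residuals (dh/db = x0 for the linear index).
correction = np.exp(x0 @ b) * (np.exp(resid)[:, None] * X).mean(axis=0)
grad = m0 * x0 - correction

# Central finite-difference comparison.
step = 1e-6
num = np.array([(m(x0, b + step * np.eye(2)[j])
                 - m(x0, b - step * np.eye(2)[j])) / (2 * step)
                for j in range(2)])
print(np.abs(grad - num).max())
```

Dropping `correction` would misstate the gradient, and hence the delta-method standard error.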
$\dfrac{\partial^2 \hat m(x)}{\partial x^k\,\partial \beta} = \hat m(x)\,\dfrac{\partial^2 h(x,\hat\beta)}{\partial x^k\,\partial \beta} + \dfrac{\partial \hat m(x)}{\partial \beta}\,\dfrac{\partial h(x,\hat\beta)}{\partial x^k}$  (vector)
$\hat E\!\left[\dfrac{\partial^2 \hat m(x)}{\partial x^k\,\partial \beta}\right] = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\partial^2 \hat m(x_i)}{\partial x^k\,\partial \beta}$  (vector, simple average)
Four of the S's are estimated by running a simple regression, saving the vector of regression coefficients, and dividing them by N. The part of the dependent variable that is not $\hat\varepsilon_i$ or $\hat\eta_i$ must have its mean subtracted before multiplying by the estimated error.

$\hat S_{1D\beta}$: regress $\exp\{h(x,\hat\beta) + \hat\varepsilon_i\} \times \hat\varepsilon_i$ on $\partial h(x_i,\hat\beta)/\partial\beta$

$\hat S_{2D\beta}$: regress $\exp\{h(x,\hat\beta) + \hat\varepsilon_i\}\,\big[\partial h(x,\hat\beta)/\partial x^k\big] \times \hat\varepsilon_i$ on $\partial h(x_i,\hat\beta)/\partial\beta$

$\hat S_{3D\beta}$: regress $\big[N^{-1}\sum_j \exp\{h(x_j,\hat\beta) + \hat\varepsilon_j\}\,\partial h(x_j,\hat\beta)/\partial x^k\big] \times \hat\varepsilon_i$ on $\partial h(x_i,\hat\beta)/\partial\beta$

$\hat S_{m\beta}$: regress $\big[y_i\,\partial h(x_i,\hat\beta)/\partial x^k\big] \times \hat\varepsilon_i$ on $\partial h(x_i,\hat\beta)/\partial\beta$
Four of the S's are estimated by computing the sample variance divided by N (scalar).

$\hat S_{1DD}$: sample variance of $\exp\{h(x,\hat\beta) + \hat\varepsilon_i\}$

$\hat S_{2DD}$: sample variance of $\exp\{h(x,\hat\beta) + \hat\varepsilon_i\}\,\big(\partial h(x,\hat\beta)/\partial x^k\big)$

$\hat S_{3DD}$: sample variance of $N^{-1}\sum_j \exp\{h(x_j,\hat\beta) + \hat\varepsilon_i\} \times \partial h(x_j,\hat\beta)/\partial x^k$

$\hat S_{m}$: sample variance of $y_i\,\partial h(x_i,\hat\beta)/\partial x^k$
A.4. Formulas for the nonparametric model with heteroscedasticity

For the nonparametric model with heteroscedasticity the formulas for Step III are:
$\dfrac{\partial \hat m(x)}{\partial \beta} = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\partial g_i(x,\hat\beta,\hat\alpha)}{\partial \beta}$  (vector)

$\dfrac{\partial \hat m(x)}{\partial \alpha} = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\partial g_i(x,\hat\beta,\hat\alpha)}{\partial \alpha}$  (vector)

$\dfrac{\partial^2 \hat m(x)}{\partial x^k\,\partial \beta} = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\partial^2 g_i(x,\hat\beta,\hat\alpha)}{\partial x^k\,\partial \beta}$  (vector)

$\dfrac{\partial^2 \hat m(x)}{\partial x^k\,\partial \alpha} = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\partial^2 g_i(x,\hat\beta,\hat\alpha)}{\partial x^k\,\partial \alpha}$  (vector)

$\hat E\!\left[\dfrac{\partial^2 \hat m(x)}{\partial x^k\,\partial \beta}\right] = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\partial q_i(\hat\beta,\hat\alpha)}{\partial \beta}$  (vector)

$\hat E\!\left[\dfrac{\partial^2 \hat m(x)}{\partial x^k\,\partial \alpha}\right] = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\partial q_i(\hat\beta,\hat\alpha)}{\partial \alpha}$  (vector)
$\hat S_{\beta\alpha} = \left(\sum_{i=1}^{N}\dfrac{\partial h(x_i,\hat\beta)}{\partial \beta}\,\dfrac{\partial h(x_i,\hat\beta)}{\partial \beta'}\right)^{-1}\left(\sum_{i=1}^{N}\hat\varepsilon_i\,\hat\eta_i\,\dfrac{\partial h(x_i,\hat\beta)}{\partial \beta}\,\dfrac{\partial \sigma^2(x_i,\hat\alpha)}{\partial \alpha'}\right)\left(\sum_{i=1}^{N}\dfrac{\partial \sigma^2(x_i,\hat\alpha)}{\partial \alpha}\,\dfrac{\partial \sigma^2(x_i,\hat\alpha)}{\partial \alpha'}\right)^{-1}$  (matrix)
Eight of the S's are estimated by running a simple regression, saving the vector of regression coefficients, and dividing them by N. The part of the dependent variable that is not $\hat\varepsilon_i$ or $\hat\eta_i$ must have its mean subtracted before multiplying by the estimated error.

$\hat S_{1D\beta}$: regress $g_i(x,\hat\beta,\hat\alpha) \times \hat\varepsilon_i$ on $\partial h(x_i,\hat\beta)/\partial\beta$

$\hat S_{1D\alpha}$: regress $g_i(x,\hat\beta,\hat\alpha) \times \hat\eta_i$ on $\partial \sigma^2(x_i,\hat\alpha)/\partial\alpha$

$\hat S_{2D\beta}$: regress $\big[\partial g_i(x,\hat\beta,\hat\alpha)/\partial x^k\big] \times \hat\varepsilon_i$ on $\partial h(x_i,\hat\beta)/\partial\beta$

$\hat S_{2D\alpha}$: regress $\big[\partial g_i(x,\hat\beta,\hat\alpha)/\partial x^k\big] \times \hat\eta_i$ on $\partial \sigma^2(x_i,\hat\alpha)/\partial\alpha$

$\hat S_{3D\beta}$: regress $\big[N^{-1}\sum_j \partial g_i(x_j,\hat\beta,\hat\alpha)/\partial x^k\big] \times \hat\varepsilon_i$ on $\partial h(x_i,\hat\beta)/\partial\beta$

$\hat S_{3D\alpha}$: regress $\big[N^{-1}\sum_j \partial g_i(x_j,\hat\beta,\hat\alpha)/\partial x^k\big] \times \hat\eta_i$ on $\partial \sigma^2(x_i,\hat\alpha)/\partial\alpha$

$\hat S_{q\beta}$: regress $q_i(\hat\beta,\hat\alpha) \times \hat\varepsilon_i$ on $\partial h(x_i,\hat\beta)/\partial\beta$

$\hat S_{q\alpha}$: regress $q_i(\hat\beta,\hat\alpha) \times \hat\eta_i$ on $\partial \sigma^2(x_i,\hat\alpha)/\partial\alpha$
Four of the S's are estimated by computing the sample variance divided by N (scalar).

$\hat S_{1DD}$: sample variance of $g_i(x,\hat\beta,\hat\alpha)$

$\hat S_{2DD}$: sample variance of $\partial g_i(x,\hat\beta,\hat\alpha)/\partial x^k$

$\hat S_{3DD}$: sample variance of $N^{-1}\sum_j \partial g_i(x_j,\hat\beta,\hat\alpha)/\partial x^k$

$\hat S_{m}$: sample variance of $q_i(\hat\beta,\hat\alpha)$
A.5. Proof of asymptotic independence
Let $\hat\beta$ denote the least squares estimator of the regression $\ln y_i = h(x_i,\beta) + \varepsilon_i$, $i = 1,2,\ldots,N$. Then applying a Taylor expansion, we obtain

$$\hat\beta - \beta = \left(\sum_{i=1}^{N}\frac{\partial h(x_i,\beta)}{\partial\beta}\frac{\partial h(x_i,\beta)}{\partial\beta'}\right)^{-1}\sum_{i=1}^{N}\frac{\partial h(x_i,\beta)}{\partial\beta}\,\varepsilon_i + o_p(N^{-0.5}).$$

See Amemiya (1985) for a derivation. Let $\hat\varepsilon_i$ denote the least squares residuals. Note that

$$E[\varepsilon_i^2\,|\,x_i] = \sigma^2(x_i,\alpha).$$

We can write

$$\varepsilon_i^2 = \sigma^2(x_i,\alpha) + \eta_i$$

where $\eta_i$ is the error term. Let $\hat\alpha$ denote the least squares estimator from

$$\hat\varepsilon_i^2 = \sigma^2(x_i,\alpha) + \eta_i.$$

Applying the first-order Taylor series approximation, we obtain

$$\hat\alpha - \alpha = \left(\sum_{i=1}^{N}\frac{\partial \sigma^2(x_i,\alpha)}{\partial\alpha}\frac{\partial \sigma^2(x_i,\alpha)}{\partial\alpha'}\right)^{-1}\sum_{i=1}^{N}\frac{\partial \sigma^2(x_i,\alpha)}{\partial\alpha}\,\eta_i + o_p(N^{-0.5}).$$

Note that the covariance between $\hat\beta$ and $\hat\alpha$ depends on $E[\varepsilon_i\eta_i\,|\,x_i] = E[\varepsilon_i^3\,|\,x_i]$. For normal $\varepsilon_i$, $E[\varepsilon_i^3\,|\,x_i] = 0$, implying that $\hat\beta$ and $\hat\alpha$ are asymptotically independent.
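This independence is easy to see in a simulation. The design below (linear $h$ and $\sigma^2$, normal errors; all parameter values are illustrative assumptions, not from the paper) estimates $\hat\beta$ and $\hat\alpha$ by the two least squares steps and checks that their slope components are essentially uncorrelated across replications:

```python
import numpy as np

rng = np.random.default_rng(7)

def one_rep(N=400):
    # One replication: Step I regresses ln y on x for beta-hat; Step II
    # regresses the squared residuals on x for alpha-hat.
    x = rng.uniform(0.5, 1.5, N)
    X = np.column_stack([np.ones(N), x])
    var = X @ np.array([0.2, 0.3])         # sigma^2(x, alpha), linear in x
    eps = rng.normal(0.0, np.sqrt(var))    # normal errors: E[eps^3 | x] = 0
    ln_y = X @ np.array([1.0, 0.5]) + eps
    b = np.linalg.lstsq(X, ln_y, rcond=None)[0]
    resid = ln_y - X @ b
    a = np.linalg.lstsq(X, resid**2, rcond=None)[0]
    return b[1], a[1]

draws = np.array([one_rep() for _ in range(500)])
corr = np.corrcoef(draws[:, 0], draws[:, 1])[0, 1]
print(corr)   # near zero: beta-hat and alpha-hat asymptotically independent
```

With skewed errors, by contrast, $E[\varepsilon_i^3\,|\,x_i] \neq 0$ and the same simulation would show a nonzero correlation, which is why the cross term cannot be dropped outside the normal case.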
References
Amemiya, T., 1985. Advanced Econometrics. Harvard Univ. Press, Cambridge, MA.
Duan, N., 1983. Smearing estimate: a nonparametric retransformation method. J. Am. Stat. Assoc. 78, 605–610.
Duan, N. et al., 1983. A comparison of alternative models for the demand for medical care. J. Bus. Econ. Stat. 1, 115–126.
Duan, N. et al., 1984. Choosing between the sample selection model and the multi-part model. J. Bus. Econ. Stat. 2, 283–289.
Greene, W.H., 2000. Econometric Analysis. Prentice-Hall, Upper Saddle River, NJ.
Jones, A.M., Yen, S.T., 1999. A Box–Cox double-hurdle model. Manchester School, in press.
Manning, W.G., 1998. The logged dependent variable, heteroscedasticity, and the retransformation problem. J. Health Econ. 17, 283–296.
Manning, W.G., Mullahy, J., 1999. Estimating Log Models: To Transform or not to Transform. University of Chicago working paper.
Mullahy, J., 1998. Much ado about two: reconsidering retransformation and the two-part model in health econometrics. J. Health Econ. 17, 247–282.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–830.