Sample Selection Model, Asian Economic Journal, 2009

ROLE OF EDUCATION IN CIGARETTE SMOKING 3 © 2009 The Authors Journal compilation © 2009 East Asian Economic Association and Blackwell Publishing Ltd.

cigarettes, cigars and other tobacco-related items consumption amongst households in Malaysia. Understanding how socio-demographic factors, such as education, influence the likelihood of purchasing and the amount spent on tobacco products is important to policy-makers interested in identifying household characteristics that determine the demand for tobacco products in Malaysia. This may, in turn, be relevant in designing effective smoking prevention and cessation programs aimed at specific target groups in Malaysia.

II. Sample Selection Model

To accommodate the zero observations typically encountered in household tobacco expenditure studies, we use the sample selection model (Heckman, 1979), also known as the type 2 Tobit model (Amemiya, 1985). The model is characterized as:

$$
\begin{aligned}
\log y &= x'\beta + v && \text{if } z'\alpha + u > 0 \\
y &= 0 && \text{if } z'\alpha + u \le 0,
\end{aligned}
\tag{1}
$$

where $y$ is the dependent variable, $x$ and $z$ are vectors of explanatory variables, $\beta$ and $\alpha$ are conformable vectors of parameters, and the error terms $u$ and $v$ are distributed as bivariate normal with zero means and a finite covariance matrix:

$$
\begin{bmatrix} u \\ v \end{bmatrix} \sim N\left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{bmatrix} \right\},
\tag{2}
$$

where $\sigma$ is the standard deviation of $v$, and $\rho$ is the correlation between $u$ and $v$. The standard deviation of $u$ is not identified and, therefore, is set at unity because the selection outcomes are observed as binary. The sample likelihood function is:

$$
L = \prod_{y=0} \left[ 1 - \Phi(z'\alpha) \right] \prod_{y>0} y^{-1}\, \Phi\!\left[ \frac{z'\alpha + \rho(\log y - x'\beta)/\sigma}{(1-\rho^2)^{1/2}} \right] \sigma^{-1} \phi\!\left( \frac{\log y - x'\beta}{\sigma} \right),
\tag{3}
$$

where $y^{-1}$ is the Jacobian of the transformation from $\log y$ to $y$, and $\phi(\cdot)$ and $\Phi(\cdot)$ are the standard normal probability density function and cumulative distribution function (cdf), respectively. The likelihood function (Equation 3) reduces to that of the two-part model when the errors are independent ($\rho = 0$), in which case the log-likelihood function is separable in parameters $\alpha$ and $[\beta', \sigma]'$ and, therefore, estimation can be broken down to a probit model to estimate $\alpha$ using the whole sample and a linear regression of $\log y$ on $x$ to estimate $\beta$ and $\sigma$ using only the non-limit observations.

There is continued interest in the marginal effects calculation in the sample selection model. In applications of log-transformed sample selection models, marginal effects are often calculated by differentiating the conditional mean of the logarithm of the dependent variable (e.g. Cheng and Capps, 1988; Hoffman and Kassouf, 2005). Yen and Rosinski (2008) show that such an approximation can lead to substantial errors. They also present the conditional mean of the dependent variable and marginal effect formulas for a log-transformed sample selection model. In essence, the conditional mean of the dependent variable $y$ is (Yen and Rosinski, 2008):

$$
E(y \mid y > 0) = \exp(x'\beta + \sigma^2/2)\, \frac{\Phi(z'\alpha + \rho\sigma)}{\Phi(z'\alpha)}.
\tag{4}
$$

Because the marginal probability of a positive observation is:

$$
\Pr(y > 0) = \Phi(z'\alpha),
\tag{5}
$$

the unconditional mean of $y$ is:

$$
E(y) = \exp(x'\beta + \sigma^2/2)\, \Phi(z'\alpha + \rho\sigma).
\tag{6}
$$

Differentiating Equations 4, 5 and 6 gives the marginal effects on probability, conditional mean and unconditional mean of a common element of $x$ and $z$ (say, $x_j = z_j$):

$$
\frac{\partial \Pr(y > 0)}{\partial x_j} = \phi(z'\alpha)\,\alpha_j
\tag{7}
$$

$$
\frac{\partial E(y \mid y > 0)}{\partial x_j} = [\Phi(z'\alpha)]^{-2} \exp(x'\beta + \sigma^2/2) \left\{ \left[ \Phi(z'\alpha)\,\phi(z'\alpha + \rho\sigma) - \phi(z'\alpha)\,\Phi(z'\alpha + \rho\sigma) \right] \alpha_j + \Phi(z'\alpha)\,\Phi(z'\alpha + \rho\sigma)\,\beta_j \right\}
\tag{8}
$$

$$
\frac{\partial E(y)}{\partial x_j} = \exp(x'\beta + \sigma^2/2) \left[ \phi(z'\alpha + \rho\sigma)\,\alpha_j + \Phi(z'\alpha + \rho\sigma)\,\beta_j \right].
\tag{9}
$$

These marginal effects can be evaluated at data points of interest, such as the sample means of explanatory variables.
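The expressions in this section can be checked numerically. The following is a minimal Python sketch, not the authors' code, that evaluates the logarithm of the likelihood in Equation 3, the probability and means in Equations 4 to 6, and the marginal effects in Equations 7 to 9 using only the standard library. The function names (`means`, `marginal_effects`, `log_likelihood`) and every parameter value are hypothetical and chosen purely for illustration. The sketch assumes the variable of interest sits at the same position `j` in both `x` and `z`, and it compares the analytic marginal effects with central finite differences as a sanity check.

```python
import math

def phi(t):
    """Standard normal probability density function."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def means(x, z, beta, alpha, sigma, rho):
    """Return (Pr(y>0), E(y|y>0), E(y)) as in Equations 5, 4 and 6."""
    xb, za = dot(x, beta), dot(z, alpha)
    scale = math.exp(xb + 0.5 * sigma ** 2)
    prob = Phi(za)
    uncond = scale * Phi(za + rho * sigma)
    return prob, uncond / prob, uncond

def marginal_effects(x, z, beta, alpha, sigma, rho, j):
    """Equations 7, 8 and 9 for a variable at index j of both x and z."""
    xb, za = dot(x, beta), dot(z, alpha)
    s = za + rho * sigma
    scale = math.exp(xb + 0.5 * sigma ** 2)
    d_prob = phi(za) * alpha[j]
    d_cond = scale / Phi(za) ** 2 * (
        (Phi(za) * phi(s) - phi(za) * Phi(s)) * alpha[j]
        + Phi(za) * Phi(s) * beta[j]
    )
    d_uncond = scale * (phi(s) * alpha[j] + Phi(s) * beta[j])
    return d_prob, d_cond, d_uncond

def log_likelihood(data, beta, alpha, sigma, rho):
    """Log of Equation 3; data is a list of (y, x, z) tuples, y = 0 if censored."""
    total = 0.0
    for y, x, z in data:
        za = dot(z, alpha)
        if y <= 0:
            total += math.log(1.0 - Phi(za))
        else:
            e = (math.log(y) - dot(x, beta)) / sigma
            arg = (za + rho * e) / math.sqrt(1.0 - rho ** 2)
            total += (-math.log(y) + math.log(Phi(arg))
                      - math.log(sigma) + math.log(phi(e)))
    return total

# Hypothetical evaluation point: x = z = (1, x_j), variable of interest at index 1.
x = z = [1.0, 0.5]
beta, alpha = [0.2, 0.3], [0.1, -0.4]
sigma, rho = 0.8, 0.5
analytic = marginal_effects(x, z, beta, alpha, sigma, rho, j=1)

# Central finite differences of Equations 5, 4 and 6 with respect to x_j = z_j.
h = 1e-6
up = means([1.0, 0.5 + h], [1.0, 0.5 + h], beta, alpha, sigma, rho)
down = means([1.0, 0.5 - h], [1.0, 0.5 - h], beta, alpha, sigma, rho)
numeric = tuple((u - d) / (2 * h) for u, d in zip(up, down))
```

In this sketch `analytic` and `numeric` should agree closely, which is one way to confirm that Equations 7 to 9 are the derivatives of Equations 5, 4 and 6. To evaluate the effects at the sample means, as the text suggests, `x` and `z` would be replaced by vectors of sample means and the parameters by their estimates. The `log_likelihood` function takes plain logarithms of cdf values, so a production implementation would likely need a numerically safer log-cdf for extreme arguments.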

III. Data and Variable Definitions