We choose $\alpha = \alpha(u)$ to minimize the following local linear GMM criterion function:

\[
\frac{1}{n}\,\big[Q_h(u)' K_{h\lambda}(u)\,(Y - \xi(u)\alpha)\big]'\,\Omega_n(u)^{-1}\,\big[Q_h(u)' K_{h\lambda}(u)\,(Y - \xi(u)\alpha)\big], \tag{2.8}
\]

where $\Omega_n(u)$ is a symmetric $k(p_c+1) \times k(p_c+1)$ weight matrix that is positive definite for large $n$. Clearly, the solution to the above minimization problem is given by

\[
\hat\alpha_n(u; h, \lambda) = \big[\xi(u)' K_{h\lambda}(u) Q_h(u)\,\Omega_n(u)^{-1}\,Q_h(u)' K_{h\lambda}(u)\,\xi(u)\big]^{-1}\,\xi(u)' K_{h\lambda}(u) Q_h(u)\,\Omega_n(u)^{-1}\,Q_h(u)' K_{h\lambda}(u)\,Y. \tag{2.9}
\]

Let $e_{j,d(1+p_c)}$ denote the $d(1+p_c) \times 1$ unit vector with 1 at the $j$th position and 0 elsewhere, and let $e_{j,p_c,d(1+p_c)}$ denote the $p_c \times d(1+p_c)$ selection matrix such that $e_{j,p_c,d(1+p_c)}\,\alpha = \dot g_j(u)$. Then the local linear GMM estimators of $g_j(u)$ and $\dot g_j(u)$ are, respectively, given by

\[
\hat g_j(u; h, \lambda) = e_{j,d(1+p_c)}'\,\hat\alpha_n(u; h, \lambda) \quad\text{and}\quad \widehat{\dot g}_j(u; h, \lambda) = e_{j,p_c,d(1+p_c)}\,\hat\alpha_n(u; h, \lambda), \quad j = 1, \ldots, d. \tag{2.10}
\]

We will study the asymptotic properties of $\hat\alpha_n(u; h, \lambda)$ in the next section.
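To make the closed-form solution in (2.9)–(2.10) concrete, the following is a minimal numerical sketch for the special case of one continuous smoothing variable ($p_c = 1$) and no discrete component. The kernel choice, all function and variable names, and the toy data-generating process are illustrative assumptions, not constructs from the paper.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel w(t) = 0.75 (1 - t^2) on [-1, 1]."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def local_linear_gmm(y, X, Q, Uc, u, h, Omega_inv=None):
    """Local linear GMM estimate of alpha(u) = (g(u)', g_dot(u)')' at a point u.

    y  : (n,)   responses
    X  : (n, d) regressors with functional coefficients
    Q  : (n, k) instruments Q(V_i); k >= d is needed for identification
    Uc : (n,)   continuous smoothing variable (p_c = 1 here)
    u  : float  evaluation point
    h  : float  bandwidth
    """
    d = X.shape[1]
    k = Q.shape[1]
    t = (Uc - u) / h
    Kh = epanechnikov(t) / h                      # kernel weights K_h(U_i - u)
    xi = np.hstack([X, X * (Uc - u)[:, None]])    # xi_i(u): local linear regressors
    Qh = np.hstack([Q, Q * t[:, None]])           # local instruments Q_{h,iu}
    if Omega_inv is None:
        Omega_inv = np.eye(2 * k)                 # identity weight (cf. Remark 2)
    A = Qh.T @ (Kh[:, None] * xi)                 # Q_h(u)' K(u) xi(u),  (2k x 2d)
    b = Qh.T @ (Kh * y)                           # Q_h(u)' K(u) Y,      (2k,)
    M = A.T @ Omega_inv                           # (2d x 2k)
    alpha = np.linalg.solve(M @ A, M @ b)         # closed form, cf. (2.9)
    return alpha[:d], alpha[d:]                   # (g_hat, g_dot_hat), cf. (2.10)

# Toy usage: Y_i = g(U_i) X_i + eps_i, with X_i endogenous and Z_i a valid IV.
rng = np.random.default_rng(0)
n = 2000
U = rng.uniform(-1.0, 1.0, n)
Z = rng.normal(size=n)
eps = rng.normal(size=n)
X = (Z + 0.5 * eps + rng.normal(size=n))[:, None]   # endogenous regressor, d = 1
y = np.sin(np.pi * U) * X[:, 0] + eps
Q = np.column_stack([Z, Z * U])                     # k = 2 >= d = 1 instruments
g_hat, g_dot_hat = local_linear_gmm(y, X, Q, U, u=0.0, h=0.25)
print(g_hat, g_dot_hat)   # roughly g(0) = 0 and g'(0) = pi
```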

Remark 1 (Choice of IVs). The choice of $Q(V_i)$ is important in applications. One can choose it from the union of $Z_i$ and $U_i$ (e.g., $Q(V_i) = V_i$) such that a certain identification condition is satisfied. A necessary identification condition is $k \ge d$, which ensures that the dimension of $Q_{h,iu}$ is not smaller than the dimension of $\alpha(u)$. In the following, we will consider the optimal choice of $Q(V_i)$, where optimality is in the sense of minimizing the AVC matrix within the class of local linear GMM estimators given the orthogonality condition in (2.1). We do so by extending the work of Newey (1990, 1993), Baltagi and Li (2002), and Ai and Chen (2003) to our framework; these authors, however, only consider OIVs for GMM estimates of finite-dimensional parameters based on conditional moment conditions.

Remark 2 (Local linear vs. local constant GMM estimators). An alternative to the local linear GMM estimator is the local constant GMM estimator; see, for example, Lewbel (2007) and Tran and Tsionas (2010). In this case, the parameter of interest $\alpha$ contains only the set of functional coefficients $g_j$, $j = 1, \ldots, d$, evaluated at $u = (u^{c\prime}, u^{d\prime})'$, but not their first-order derivatives with respect to the continuous arguments. As a result, one can set $Q_{h,iu} = Q(V_i)$, so that there is no distinction between local and global instruments. In addition, our local linear GMM estimator in (2.9) reduces to that of Cai and Li (2008) by setting $\Omega_n(u)$ to be the identity matrix and choosing $k = d$ global instruments. The latter condition is necessary for the model to be locally just identified.
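The reduction in Remark 2 is easy to see in code: with $k = d$ global instruments and an identity weight matrix, the local moment conditions are just identified and the estimator collapses to a weighted linear solve. The sketch below continues the hypothetical setup of the previous snippet (same imports, kernel, and simulated data) and is illustrative only.

```python
def local_constant_gmm(y, X, Q, Uc, u, h):
    """Just-identified local constant GMM (k = d, identity weight):
    alpha(u) reduces to g(u), solving Q' K (y - X g) = 0 exactly."""
    t = (Uc - u) / h
    Kh = epanechnikov(t) / h
    A = Q.T @ (Kh[:, None] * X)     # Q' K X, a (d x d) matrix since k = d
    b = Q.T @ (Kh * y)              # Q' K Y
    return np.linalg.solve(A, b)    # g_hat(u); no derivative estimates here

g_hat_lc = local_constant_gmm(y, X, Z[:, None], U, u=0.0, h=0.25)  # k = d = 1
```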
3. ASYMPTOTIC PROPERTIES OF THE LOCAL LINEAR GMM ESTIMATOR

In this section, we first give a set of assumptions and then study the asymptotic properties of the local linear GMM estimator.

3.1 Assumptions

To facilitate the presentation, define $\Omega_1(u) = E[Q(V_i) X_i' \mid U_i = u]$ and $\Omega_2(u) = E[Q(V_i) Q(V_i)' \sigma^2(V_i) \mid U_i = u]$, where $\sigma^2(v) \equiv E[\varepsilon_i^2 \mid V_i = v]$. Let $f_U(u) \equiv f_U(u^c, u^d)$ denote the joint density of $(U_i^c, U_i^d)$ and $p(u^d)$ the marginal probability mass of $U_i^d$ at $u^d$. We use $\mathcal{U}^c$ and $\mathcal{U}^d = \prod_{t=1}^{p_d} \{0, 1, \ldots, c_t - 1\}$ to denote the supports of $U_i^c$ and $U_i^d$, respectively. We now list the assumptions that will be used to establish the asymptotic distribution of our estimator.

Assumption A1. $(Y_i, X_i, Z_i, U_i)$, $i = 1, \ldots, n$, are independent and identically distributed (IID).

Assumption A2. $E|\varepsilon_i|^{2+\delta} < \infty$ for some $\delta > 0$, and $E\|Q(V_i) X_i'\|^2 < \infty$.

Assumption A3. (i) $\mathcal{U}^c$ is compact. (ii) The functions $f_U(\cdot, u^d)$, $\Omega_1(\cdot, u^d)$, and $\Omega_2(\cdot, u^d)$ are continuously differentiable on $\mathcal{U}^c$ for all $u^d \in \mathcal{U}^d$, and $0 < f_U(u^c, u^d) \le C$ for some $C < \infty$. (iii) The functions $g_j(\cdot, u^d)$, $j = 1, \ldots, d$, are second-order continuously differentiable on $\mathcal{U}^c$ for all $u^d \in \mathcal{U}^d$.

Assumption A4. (i) $\mathrm{rank}(\Omega_1(u)) = d$, and the $k \times k$ matrix $\Omega_2(u)$ is positive definite. (ii) $\Omega_n(u) = \Omega(u) + o_P(1)$, where $\Omega(u)$ is symmetric and positive definite.

Assumption A5. The kernel function $w(\cdot)$ is a probability density function (PDF) that is symmetric, bounded, and has compact support $[-c_w, c_w]$. It satisfies the Lipschitz condition $|w(v_1) - w(v_2)| \le C_w |v_1 - v_2|$ for all $v_1, v_2 \in [-c_w, c_w]$.

Assumption A6. As $n \to \infty$, the bandwidth sequences $h = (h_1, \ldots, h_{p_c})'$ and $\lambda = (\lambda_1, \ldots, \lambda_{p_d})'$ satisfy (i) $n h! \to \infty$, and (ii) $(n h!)^{1/2}\,(\|h\|^2 + \|\lambda\|) = O(1)$, where $h! \equiv h_1 \times \cdots \times h_{p_c}$.

A1 requires IID observations. Following Cai and Li (2008) and SCU (2009), this assumption can be relaxed to allow for time series observations. A2 and A3 impose some moment and smoothness conditions, respectively. A4(i) imposes rank conditions for the identification of the functional coefficients and their first-order derivatives, and A4(ii) is weak in that it allows the random weight matrix $\Omega_n$ to be consistently estimated from the data. As Hall, Wolf, and Yao (1999) remarked, the requirement in A5 that $w(\cdot)$ be compactly supported can be removed at the cost of lengthier arguments in the proofs; in particular, the Gaussian kernel is allowed. A6 is standard for nonparametric regression with mixed data; see, for example, Li and Racine (2008).

3.2 Asymptotic Theory for the Local Linear Estimator

Let $\mu_{s,t} = \int_{\mathbb{R}} v^s w(v)^t \, dv$ for $s, t = 0, 1, 2$. Define

\[
S(u) = f_U(u) \begin{pmatrix} \Omega_1(u) & 0_{k \times d p_c} \\ 0_{k p_c \times d} & \mu_{2,1}\,\Omega_1(u) \otimes I_{p_c} \end{pmatrix}, \tag{3.1}
\]

and

\[
\Upsilon(u) = f_U(u) \begin{pmatrix} \mu_{0,2}^{p_c}\,\Omega_2(u) & 0_{k \times k p_c} \\ 0_{k p_c \times k} & \mu_{2,2}\,\Omega_2(u) \otimes I_{p_c} \end{pmatrix}. \tag{3.2}
\]

Clearly, $S(u)$ is a $k(1+p_c) \times d(1+p_c)$ matrix and $\Upsilon(u)$ is a $k(1+p_c) \times k(1+p_c)$ matrix.
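The constants $\mu_{s,t}$ entering (3.1) and (3.2) are simple kernel moments that can be checked numerically. The standalone sketch below evaluates them for the Epanechnikov kernel $w(v) = 0.75(1 - v^2)$ on $[-1, 1]$; the kernel choice is an assumption made for illustration only.

```python
from scipy.integrate import quad

def w(v):
    """Epanechnikov kernel density on [-1, 1]."""
    return 0.75 * (1.0 - v**2)

def mu(s, t):
    """Kernel moment mu_{s,t} = int v^s w(v)^t dv over the kernel support."""
    val, _ = quad(lambda v: v**s * w(v)**t, -1.0, 1.0)
    return val

print(mu(2, 1))  # mu_{2,1} = 0.2    : scales the derivative block in (3.1)
print(mu(0, 2))  # mu_{0,2} = 0.6    : enters the upper block of (3.2)
print(mu(2, 2))  # mu_{2,2} ~ 0.0857 : enters the lower block of (3.2)
```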
To describe the leading bias term associated with the discrete random variables, we define an indicator function $I_s(\cdot, \cdot)$ by $I_s(u^d, \tilde u^d) = \mathbf{1}\{u_s^d \ne \tilde u_s^d\} \prod_{t \ne s}^{p_d} \mathbf{1}\{u_t^d = \tilde u_t^d\}$. That is, $I_s(u^d, \tilde u^d)$ is one if and only if $u^d$ and $\tilde u^d$ differ only in the $s$th component, and is zero otherwise. Let

\[
B(u; h, \lambda) = \frac{1}{2}\,\mu_{2,1}\,f_U(u) \begin{pmatrix} \Omega_1(u)\,A(u; h) \\ 0_{k p_c \times 1} \end{pmatrix} + \sum_{\tilde u^d \in \mathcal{U}^d} \sum_{s=1}^{p_d} \lambda_s\,I_s(u^d, \tilde u^d)\,f_U(u^c, \tilde u^d) \begin{pmatrix} \Omega_1(u^c, \tilde u^d)\,\big[g(u^c, \tilde u^d) - g(u^c, u^d)\big] \\ -\mu_{2,1}\,\big[\Omega_1(u^c, \tilde u^d) \otimes I_{p_c}\big]\,\dot g(u^c, u^d) \end{pmatrix}, \tag{3.3}
\]

where $A(u; h) = \big(\sum_{s=1}^{p_c} h_s^2\,g_{1,ss}(u), \ldots, \sum_{s=1}^{p_c} h_s^2\,g_{d,ss}(u)\big)'$ and $\dot g(u) = (\dot g_1(u)', \ldots, \dot g_d(u)')'$, with $g_{j,ss}(u)$ denoting the second-order partial derivative of $g_j(u)$ with respect to $u_s^c$.
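As a small illustration of the combinatorial structure of the discrete part of (3.3), the standalone sketch below implements the indicator $I_s(u^d, \tilde u^d)$, which selects the cells $\tilde u^d$ differing from $u^d$ only in their $s$th component; all names are hypothetical.

```python
import numpy as np

def indicator_Is(ud, ud_tilde, s):
    """I_s(u^d, ud_tilde): 1 if the two discrete vectors differ exactly in
    component s (0-indexed here), and 0 otherwise."""
    differ = np.asarray(ud) != np.asarray(ud_tilde)
    return float(differ[s] and not np.any(np.delete(differ, s)))

print(indicator_Is([0, 1, 2], [0, 0, 2], s=1))  # 1.0: differs only in component 1
print(indicator_Is([0, 1, 2], [1, 0, 2], s=1))  # 0.0: also differs in component 0
```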