to minimize the following local linear GMM criterion function
$$
\left[\frac{1}{n}\,Q_h(u)'K_{h\lambda}(u)\{Y-\xi(u)\alpha\}\right]'\,\Xi_n(u)^{-1}\left[\frac{1}{n}\,Q_h(u)'K_{h\lambda}(u)\{Y-\xi(u)\alpha\}\right], \qquad (2.8)
$$
where $\Xi_n(u)$ is a symmetric $k(p_c+1)\times k(p_c+1)$ weight matrix that is positive definite for large $n$. Clearly, the solution to
the above minimization problem is given by
$$
\hat\alpha(u;h,\lambda)=\left[\xi(u)'K_{h\lambda}(u)Q_h(u)\,\Xi_n(u)^{-1}\,Q_h(u)'K_{h\lambda}(u)\xi(u)\right]^{-1}\xi(u)'K_{h\lambda}(u)Q_h(u)\,\Xi_n(u)^{-1}\,Q_h(u)'K_{h\lambda}(u)Y. \qquad (2.9)
$$
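Computationally, (2.9) is two symmetric linear solves once the localized cross-products are formed. The following NumPy sketch is illustrative only (the function name and argument layout are our assumptions, not the paper's); `xi`, `Qh`, `K`, and `Xi_n` play the roles of $\xi(u)$, $Q_h(u)$, the diagonal of $K_{h\lambda}(u)$, and $\Xi_n(u)$ at a fixed evaluation point $u$:

```python
import numpy as np

def local_linear_gmm(xi, Qh, K, Xi_n, Y):
    """Closed-form local linear GMM estimate, cf. (2.9), at one point u.

    xi   : (n, d(1+p_c)) stacked regressor matrix xi(u)
    Qh   : (n, k(1+p_c)) local instrument matrix Q_h(u)
    K    : (n,) kernel weights, the diagonal of K_{h,lambda}(u)
    Xi_n : (k(1+p_c), k(1+p_c)) positive definite weight matrix Xi_n(u)
    Y    : (n,) responses
    """
    G = Qh.T @ (K[:, None] * xi)   # Q_h(u)' K_{h,lambda}(u) xi(u)
    g = Qh.T @ (K * Y)             # Q_h(u)' K_{h,lambda}(u) Y
    # alpha_hat = (G' Xi_n^{-1} G)^{-1} G' Xi_n^{-1} g
    XiG = np.linalg.solve(Xi_n, G)
    return np.linalg.solve(G.T @ XiG, G.T @ np.linalg.solve(Xi_n, g))
```

The first $d$ entries of the returned vector estimate the levels $g_j(u)$; the remaining entries stack the derivative estimates, which the selection matrices in (2.10) extract.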
Let $e_{j,d(1+p_c)}$ denote the $d(1+p_c)\times 1$ unit vector with 1 at the $j$th position and 0 elsewhere. Let $e_{j,p_c,d(1+p_c)}$ denote the $p_c\times d(1+p_c)$ selection matrix such that $e_{j,p_c,d(1+p_c)}\,\alpha=\dot g_j(u)$. Then the local linear GMM estimators of $g_j(u)$ and $\dot g_j(u)$ are, respectively, given by
$$
\hat g_j(u;h,\lambda)=e_{j,d(1+p_c)}'\,\hat\alpha(u;h,\lambda)\quad\text{and}\quad \hat{\dot g}_j(u;h,\lambda)=e_{j,p_c,d(1+p_c)}\,\hat\alpha(u;h,\lambda)\quad\text{for } j=1,\ldots,d. \qquad (2.10)
$$
We will study the asymptotic properties of $\hat\alpha(u;h,\lambda)$ in the next section.

Remark 1 (Choice of IVs).
The choice of $Q(V_i)$ is important in applications. One can choose it from the union of $Z_i$ and $U_i$ (e.g., $Q(V_i)=V_i$) such that a certain identification condition is satisfied. A necessary identification condition is $k\ge d$, which ensures that the dimension of $Q_{h,i}(u)$ is not smaller than the dimension of $\alpha(u)$. In the following, we will consider the optimal choice of $Q(V_i)$, where optimality is in the sense of minimizing the asymptotic variance-covariance (AVC) matrix within the class of local linear GMM estimators given the orthogonality condition in (2.1). We do so by extending the work of Newey (1990, 1993), Baltagi and Li (2002), and Ai and Chen (2003) to our framework; note, however, that these authors consider OIVs only for GMM estimates of finite-dimensional parameters based on conditional moment conditions.

Remark 2 (Local linear vs. local constant GMM estimators). An alternative to the local linear GMM estimator is the local constant GMM estimator; see, for example, Lewbel (2007) and Tran and Tsionas (2010). In this case, the parameter of interest $\alpha$ contains only the set of functional coefficients $g_j$, $j=1,\ldots,d$, evaluated at $u=(u^{c\prime},u^{d\prime})'$, but not their first-order derivatives with respect to the continuous arguments. As a result, one can set $Q_{h,i}(u)=Q(V_i)$ so that there is no distinction between local and global instruments. In addition, our local linear GMM estimator in (2.9) reduces to that of Cai and Li (2008) by setting $\Xi_n(u)$ to be the identity matrix and choosing $k=d$ global instruments. The latter condition is necessary for the model to be locally just identified.
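In the just-identified case $k=d$ noted above, the matrix $G=Q_h(u)'K_{h\lambda}(u)\xi(u)$ is square, and (2.9) collapses to the exact solve $G\hat\alpha=Q_h(u)'K_{h\lambda}(u)Y$, with the weight matrix dropping out. A minimal sketch (names and layout are our illustrative assumptions):

```python
import numpy as np

def local_linear_iv(xi, Qh, K, Y):
    """Just-identified (k = d) case of (2.9): alpha_hat solves G alpha = g
    exactly, so the weight matrix Xi_n(u) cancels out of the formula."""
    G = Qh.T @ (K[:, None] * xi)   # square when k = d
    g = Qh.T @ (K * Y)
    return np.linalg.solve(G, g)
```

Algebraically, $(G'\Xi^{-1}G)^{-1}G'\Xi^{-1}g=G^{-1}g$ whenever $G$ is invertible, which is why the choice of $\Xi_n(u)$ is irrelevant here.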
3. ASYMPTOTIC PROPERTIES OF THE LOCAL LINEAR GMM ESTIMATOR

In this section, we first give a set of assumptions and then study the asymptotic properties of the local linear GMM estimator.

3.1 Assumptions
To facilitate the presentation, define
$$
\Gamma_1(u)=E\!\left[Q(V_i)X_i'\mid U_i=u\right] \quad\text{and}\quad \Gamma_2(u)=E\!\left[Q(V_i)Q(V_i)'\,\sigma^2(V_i)\mid U_i=u\right],
$$
where $\sigma^2(v)\equiv E[\varepsilon_i^2\mid V_i=v]$. Let $f_U(u)\equiv f_U(u^c,u^d)$ denote the joint density of $U_i^c$ and $U_i^d$, and let $p(u^d)$ be the marginal probability mass of $U_i^d$ at $u^d$. We use $\mathcal U^c$ and $\mathcal U^d=\prod_{t=1}^{p_d}\{0,1,\ldots,c_t-1\}$ to denote the supports of $U_i^c$ and $U_i^d$, respectively. We now list the assumptions that will be used to establish the asymptotic distribution of our estimator.

Assumption A1.
$(Y_i, X_i, Z_i, U_i)$, $i=1,\ldots,n$, are independent and identically distributed (IID).

Assumption A2. $E|\varepsilon_i|^{2+\delta}<\infty$ for some $\delta>0$. $E\|Q(V_i)X_i'\|^2<\infty$.

Assumption A3.
(i) $\mathcal U^c$ is compact. (ii) The functions $f_U(\cdot,u^d)$, $\Gamma_1(\cdot,u^d)$, and $\Gamma_2(\cdot,u^d)$ are continuously differentiable on $\mathcal U^c$ for all $u^d\in\mathcal U^d$; $0<f_U(u^c,u^d)\le C$ for some $C<\infty$. (iii) The functions $g_j(\cdot,u^d)$, $j=1,\ldots,d$, are second-order continuously differentiable on $\mathcal U^c$ for all $u^d\in\mathcal U^d$.

Assumption A4.
(i) $\operatorname{rank}(\Gamma_1(u))=d$, and the $k\times k$ matrix $\Gamma_2(u)$ is positive definite. (ii) $\Xi_n(u)=\Xi(u)+o_P(1)$, where $\Xi(u)$ is symmetric and positive definite.
Assumption A5. The kernel function $w(\cdot)$ is a probability density function (PDF) that is symmetric, bounded, and has compact support $[-c_w,c_w]$. It satisfies the Lipschitz condition $|w(v_1)-w(v_2)|\le C_w|v_1-v_2|$ for all $v_1,v_2\in[-c_w,c_w]$.

Assumption A6.
As $n\to\infty$, the bandwidth sequences $h=(h_1,\ldots,h_{p_c})'$ and $\lambda=(\lambda_1,\ldots,\lambda_{p_d})'$ satisfy (i) $nh_!\to\infty$, and (ii) $(nh_!)^{1/2}(\|h\|^2+\|\lambda\|)=O(1)$, where $h_!\equiv h_1\cdots h_{p_c}$.
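For concreteness, the Epanechnikov kernel $w(v)=0.75(1-v^2)\,1\{|v|\le 1\}$ satisfies A5: it is a symmetric, bounded PDF with compact support $[-1,1]$ (so $c_w=1$) and is Lipschitz on its support. Moments of the form $\int v^s w(v)^t\,dv$, which enter the asymptotic expressions of Section 3.2, are easy to verify numerically; for this kernel they equal $1/5$, $3/5$, and $3/35$ for $(s,t)=(2,1)$, $(0,2)$, and $(2,2)$, respectively. An illustrative sketch:

```python
import numpy as np

def w(v):
    """Epanechnikov kernel: a symmetric, bounded PDF supported on [-1, 1]."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v ** 2), 0.0)

def kernel_moment(s, t, m=200001):
    """Trapezoid-rule approximation of int v^s w(v)^t dv over [-1, 1]."""
    v = np.linspace(-1.0, 1.0, m)
    y = v ** s * w(v) ** t
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(v))

# kernel_moment(0, 1) ~ 1 confirms that w integrates to one (a PDF)
```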
A1 requires IID observations. Following Cai and Li (2008) and SCU (2009), this assumption can be relaxed to allow for time series observations. A2 and A3 impose some moment and smoothness conditions, respectively. A4(i) imposes rank conditions for the identification of the functional coefficients and their first-order derivatives, and A4(ii) is weak in that it allows the random weight matrix $\Xi_n(u)$ to be consistently estimated from the data. As Hall, Wolf, and Yao (1999) remarked, the requirement in A5 that $w(\cdot)$ be compactly supported can be removed at the cost of lengthier arguments in the proofs; in particular, the Gaussian kernel is allowed. A6 is standard for nonparametric regression with mixed data; see, for example, Li and Racine (2008).
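The mixed-data kernel weight $K_{h\lambda,i}(u)$ is, in the usual Li-Racine fashion, a product of univariate continuous kernels $w((U_{it}^c-u_t^c)/h_t)/h_t$ over $t=1,\ldots,p_c$ and discrete kernels that downweight observations with $U_{is}^d\ne u_s^d$ by the factor $\lambda_s$. A hedged sketch of such product weights (function and variable names are ours, not the paper's):

```python
import numpy as np

def epanechnikov(v):
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v ** 2), 0.0)

def mixed_kernel_weights(Uc, Ud, uc, ud, h, lam):
    """Product kernel weights K_{h,lambda,i}(u) for mixed data.

    Uc: (n, p_c) continuous covariates; Ud: (n, p_d) discrete covariates;
    uc, ud: evaluation point; h: continuous bandwidths; lam: discrete bandwidths.
    """
    Uc, Ud = np.atleast_2d(Uc), np.atleast_2d(Ud)
    cont = np.prod(epanechnikov((Uc - uc) / h) / h, axis=1)
    disc = np.prod(np.where(Ud == ud, 1.0, lam), axis=1)
    return cont * disc
```

An observation matching $u^d$ exactly receives discrete weight 1; each mismatched component multiplies its weight by the corresponding $\lambda_s\in[0,1]$.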
3.2 Asymptotic Theory for the Local Linear Estimator
Let $\mu_{s,t}=\int_{\mathbb R} v^s w(v)^t\,dv$, $s,t=0,1,2$. Define
$$
\Omega(u)=f_U(u)\begin{pmatrix}\Gamma_1(u) & 0_{k\times dp_c}\\ 0_{kp_c\times d} & \mu_{2,1}\,\Gamma_1(u)\otimes I_{p_c}\end{pmatrix}, \qquad (3.1)
$$
and
$$
\Upsilon(u)=f_U(u)\begin{pmatrix}\mu_{0,2}^{p_c}\,\Gamma_2(u) & 0_{k\times kp_c}\\ 0_{kp_c\times k} & \mu_{2,2}\,\Gamma_2(u)\otimes I_{p_c}\end{pmatrix}. \qquad (3.2)
$$
Clearly, $\Omega(u)$ is a $k(1+p_c)\times d(1+p_c)$ matrix and $\Upsilon(u)$ is a $k(1+p_c)\times k(1+p_c)$ matrix. To describe the leading bias term associated with the discrete
random variables, we define an indicator function $I_s(\cdot,\cdot)$ by $I_s(\bar u^d,u^d)=1\{\bar u_s^d\ne u_s^d\}\prod_{t\ne s}^{p_d}1\{\bar u_t^d=u_t^d\}$. That is, $I_s(\bar u^d,u^d)$ is one if and only if $\bar u^d$ and $u^d$ differ only in the $s$th component, and is zero otherwise. Let
$$
B(u;h,\lambda)=\frac{1}{2}\,\mu_{2,1}\,f_U(u)\begin{pmatrix}\Gamma_1(u)\,A(u;h)\\ 0_{kp_c\times 1}\end{pmatrix}+\sum_{\bar u^d\in\mathcal U^d}\sum_{s=1}^{p_d}\lambda_s\,I_s(\bar u^d,u^d)\,f_U(u^c,\bar u^d)\begin{pmatrix}\Gamma_1(u^c,\bar u^d)\{g(u^c,\bar u^d)-g(u^c,u^d)\}\\ -\mu_{2,1}\{\Gamma_1(u^c,\bar u^d)\otimes I_{p_c}\}\,\dot g(u^c,u^d)\end{pmatrix}, \qquad (3.3)
$$
where $A(u;h)=\bigl(\sum_{s=1}^{p_c}h_s^2\,\ddot g_{1,ss}(u),\ldots,\sum_{s=1}^{p_c}h_s^2\,\ddot g_{d,ss}(u)\bigr)'$ and $g(u)=(g_1(u),\ldots,g_d(u))'$.