continuous and discrete exogenous regressors and the endogenous regressors enter the model linearly. Then we propose local linear GMM estimates for the functional coefficients.
2.1 Functional Coefficient Representation
We consider the following functional coefficient IV model
$$
Y_i = g(U_i^c, U_i^d)' X_i + \varepsilon_i = \sum_{j=1}^{d} g_j(U_i^c, U_i^d)\, X_{i,j} + \varepsilon_i, \quad E(\varepsilon_i \mid Z_i, U_i) = 0 \ \text{a.s.}, \tag{2.1}
$$
where $Y_i$ is a scalar random variable, $g = (g_1, \ldots, g_d)'$, $\{g_j\}_{j=1}^{d}$ are the unknown structural functions of interest, $X_{i,1} = 1$, $X_i = (X_{i,1}, \ldots, X_{i,d})'$ is a $d \times 1$ vector consisting of $d - 1$ endogenous regressors, $U_i = (U_i^{c\prime}, U_i^{d\prime})'$, $U_i^c$ and $U_i^d$ denote a $p_c \times 1$ vector of continuous exogenous regressors and a $p_d \times 1$ vector of discrete exogenous regressors, respectively, $Z_i$ is a $q_z \times 1$ vector of IVs, and a.s. abbreviates almost surely. We assume that a random sample $\{Y_i, X_i, Z_i, U_i\}_{i=1}^{n}$ is observed.
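As a concrete illustration of the data structure in model (2.1), the following sketch simulates one such sample with $p_c = p_d = q_z = 1$ and a single endogenous regressor. The specific coefficient functions and error structure are hypothetical choices for illustration only, not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Exogenous regressors and instrument (illustrative dimensions)
U_c = rng.uniform(size=n)            # continuous exogenous regressor (p_c = 1)
U_d = rng.integers(0, 2, size=n)     # discrete exogenous regressor  (p_d = 1)
Z = rng.normal(size=n)               # instrument (q_z = 1)

# Endogeneity: X2 and eps share the common shock v, but E[eps | Z, U] = 0
v = rng.normal(size=n)
X2 = 0.8 * Z + v                     # endogenous regressor (d - 1 = 1)
eps = 0.5 * v + rng.normal(size=n)

# Hypothetical coefficient functions g_1 and g_2
g1 = np.sin(np.pi * U_c)
g2 = 1.0 + 0.5 * U_d

# Model (2.1) with X_{i,1} = 1
Y = g1 * 1.0 + g2 * X2 + eps
```

Here ordinary local linear fitting of $Y_i$ on $X_i$ would be inconsistent because $X_2$ is correlated with $\varepsilon$, which is what motivates the instrument $Z$.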
In the absence of $U_i^d$, (2.1) reduces to the model of CDXW (2006). If none of the variables in $X_i$ are endogenous, the model becomes that of SCU (2009). As the latter authors demonstrated through the estimation of an earnings function, it is important to allow the variables in the functional coefficients to include both continuous and discrete variables, where the discrete variables may represent race, profession, region, etc.
2.2 Local Linear GMM Estimation
The orthogonality condition in (2.1) suggests that we can estimate the unknown functional coefficients via the principle of nonparametric generalized method of moments (NPGMM), which is similar to the GMM of Hansen (1982) for parametric models. Let $V_i = (Z_i', U_i')'$. It indicates that for any $k \times 1$ vector function $Q(V_i)$, we have
$$
E\left[Q(V_i)\,\varepsilon_i \mid V_i\right] = E\left[Q(V_i)\Big\{Y_i - \sum_{j=1}^{d} g_j(U_i^c, U_i^d)\, X_{i,j}\Big\} \,\Big|\, V_i\right] = 0. \tag{2.2}
$$
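In finite samples, the condition in (2.2) motivates the sample moment vector $n^{-1}\sum_{i=1}^{n} Q(V_i)\{Y_i - \sum_{j=1}^{d} g_j(U_i^c, U_i^d) X_{i,j}\}$. A minimal sketch, where the instrument matrix `Q` and the matrix `G` of coefficient functions evaluated at each observation are hypothetical inputs:

```python
import numpy as np

def sample_moments(Q, Y, X, G):
    """Sample analogue of the moment condition (2.2):
    (1/n) * sum_i Q(V_i) * (Y_i - g(U_i^c, U_i^d)' X_i).

    Q: (n, k) matrix whose ith row is Q(V_i)    (hypothetical choice of Q)
    Y: (n,) outcomes
    X: (n, d) regressors with first column equal to 1
    G: (n, d) coefficient functions g_j evaluated at each observation
    """
    eps = Y - np.sum(G * X, axis=1)   # residuals epsilon_i
    return Q.T @ eps / len(Y)         # k x 1 vector of sample moments
```

At the true coefficient functions the residuals are the model errors, so the sample moments converge to zero; an estimator chooses coefficients that set (a weighted quadratic form of) these moments as close to zero as possible.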
Following Cai and Li (2008), we propose an estimation procedure that combines the orthogonality condition in (2.2) with the idea of local linear fitting in the nonparametrics literature to estimate the unknown functional coefficients.
Like Racine and Li (2004), we use $U_{i,t}^d$ to denote the $t$th component of $U_i^d$; $U_{i,t}^c$ is similarly defined. Analogously, we let $u_t^d$ and $u_t^c$ denote the $t$th components of $u^d$ and $u^c$, respectively, that is, $u^d = (u_1^d, \ldots, u_{p_d}^d)'$ and $u^c = (u_1^c, \ldots, u_{p_c}^c)'$.
We assume that $U_{i,t}^d$ can take $c_t \ge 2$ different values, that is, $U_{i,t}^d \in \{0, 1, \ldots, c_t - 1\}$ for $t = 1, \ldots, p_d$. Let $u = (u^c, u^d) \in \mathbb{R}^{p_c} \times \mathbb{R}^{p_d}$. To define the kernel weight function, we focus on the case for which there is no natural ordering in $U_i^d$. Define
$$
l(U_{i,t}^d, u_t^d, \lambda_t) =
\begin{cases}
1 & \text{if } U_{i,t}^d = u_t^d, \\
\lambda_t & \text{if } U_{i,t}^d \ne u_t^d,
\end{cases} \tag{2.3}
$$
where $\lambda_t$ is a bandwidth that lies in the interval $[0, 1]$. Clearly, when $\lambda_t = 0$, $l(U_{i,t}^d, u_t^d, 0)$ becomes an indicator function, and when $\lambda_t = 1$, $l(U_{i,t}^d, u_t^d, 1)$ becomes a uniform weight function. We define the product kernel for the discrete random variables by
$$
L(U_i^d, u^d, \lambda) = L_\lambda(U_i^d - u^d) = \prod_{t=1}^{p_d} l(U_{i,t}^d, u_t^d, \lambda_t). \tag{2.4}
$$
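The kernels in (2.3) and (2.4) translate directly into code. A minimal sketch, vectorized over observations (the array shapes are illustrative assumptions):

```python
import numpy as np

def l_discrete(U_it, u_t, lam_t):
    """Univariate discrete kernel l(U_{i,t}^d, u_t^d, lambda_t) from (2.3):
    weight 1 when the values match, lambda_t otherwise."""
    return np.where(np.asarray(U_it) == u_t, 1.0, lam_t)

def L_discrete(U_d, u_d, lam):
    """Product kernel L(U_i^d, u^d, lambda) from (2.4).

    U_d: (n, p_d) array of discrete regressors
    u_d: (p_d,) evaluation point
    lam: (p_d,) bandwidths, each in [0, 1]
    Returns an (n,) vector of kernel weights.
    """
    U_d = np.asarray(U_d)
    # Each mismatched component contributes a factor lambda_t;
    # lambda_t = 0 recovers an indicator, lambda_t = 1 a uniform weight.
    return np.prod(np.where(U_d == u_d, 1.0, lam), axis=1)

# Example with p_d = 2 and bandwidths lambda = (0.3, 0.5)
U_d = np.array([[0, 1], [0, 0], [1, 1]])
w = L_discrete(U_d, np.array([0, 1]), np.array([0.3, 0.5]))
# → array([1. , 0.5, 0.3])
```

The first row matches the evaluation point in both components and receives weight 1; each mismatch multiplies in the corresponding $\lambda_t$.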
For the continuous random variables, we use $w(\cdot)$ to denote a univariate kernel function and define the product kernel function by $W_{h,iu}$
h,iu