4. DIRECTIONAL FINITE DIFFERENCES AND DIRECTIONAL DERIVATIVE PROCESSES

We review the notions of directional finite differences and directional derivative processes for a univariate process over locations $s \in \mathbb{R}^d$. We assume that the process has mean 0 and is stationary with covariance function $C(s, s') = K(s - s')$, where $K$ is a valid covariance function in $\mathbb{R}^d$. Two frequently used isotropic forms are the power exponential class, where $K(s - s') = \alpha \exp(-\phi \|s - s'\|^{\nu})$, $0 < \nu \leq 2$, and the Matérn class, where $K(s - s') = \alpha (\phi \|s - s'\|)^{\nu} K_{\nu}(\phi \|s - s'\|)$. $K_{\nu}$ is the modified Bessel function of order $\nu$ (Abramowitz and Stegun 1965). We use the more flexible Matérn class here.
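As a concrete numerical illustration (ours, not part of the original article; function names and default parameter values are illustrative), both isotropic covariance classes can be evaluated with SciPy's modified Bessel function $K_\nu$:

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind, K_nu

def power_exponential(d, alpha=1.0, phi=1.0, nu=1.0):
    """Power exponential covariance: K(d) = alpha * exp(-phi * d**nu), 0 < nu <= 2."""
    d = np.asarray(d, dtype=float)
    return alpha * np.exp(-phi * d**nu)

def matern(d, alpha=1.0, phi=1.0, nu=1.5):
    """Matern-type covariance as written in the text, K(d) = alpha * (phi d)^nu * K_nu(phi d),
    for d > 0 (this form omits the usual normalizing constant)."""
    x = phi * np.asarray(d, dtype=float)
    return alpha * x**nu * kv(nu, x)
```

For $\nu = 3/2$, $K_{3/2}(x) = \sqrt{\pi/(2x)}\,e^{-x}(1 + 1/x)$, so this Matérn form is proportional to the closed-form correlation $(1 + \phi d)e^{-\phi d}$ that the article uses later.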
The univariate process $\{\beta(s) : s \in \mathbb{R}^d\}$ is $L_2$ (mean square) continuous at $s_0$ if $E[\beta(s) - \beta(s_0)]^2 \to 0$ as $s \to s_0$. Under stationarity, because $E[\beta(s) - \beta(s')]^2 = 2(K(0) - K(s - s'))$, the process $\beta(s)$ is mean square continuous at all $s$ if $K$ is continuous at 0. Turning to mean square differentiability, $\beta(s)$ is mean square differentiable at $s_0$ if there exists a vector $\nabla\beta(s_0)$ such that, for any scalar $h$ and any unit vector $u$,

$$\beta(s_0 + hu) = \beta(s_0) + h u^T \nabla\beta(s_0) + r(s_0, hu), \qquad (3)$$

where $r(s_0, hu)/h \to 0$ in the $L_2$ sense as $h \to 0$.
With $u$ a unit vector, let

$$\beta_{u,h}(s) = \frac{\beta(s + hu) - \beta(s)}{h} \qquad (4)$$

be the finite difference at $s$ in direction $u$ at scale $h$. Next, let $D_u\beta(s) = \lim_{h \to 0} \beta_{u,h}(s)$ if the limit exists. $D_u\beta(s)$ is a well-defined process in $\mathbb{R}^d$, which we refer to as the directional derivative process in the direction $u$. In particular, this process exists if the covariance function $K$ is twice differentiable. If the unit vectors $e_1, e_2, \ldots, e_d$ form an orthonormal basis set for $\mathbb{R}^d$, then any unit vector $u$ in $\mathbb{R}^d$ can be written as $u = \sum_{i=1}^{d} w_i e_i$ with $w_i = u^T e_i$ and $\sum_{i=1}^{d} w_i^2 = 1$. But then

$$D_u\beta(s) = u^T \nabla\beta(s) = \sum_{i=1}^{d} w_i e_i^T \nabla\beta(s) = \sum_{i=1}^{d} w_i D_{e_i}\beta(s). \qquad (5)$$

Hence, to study directional derivative processes in arbitrary directions, we need only work with a basis set of directional derivative processes. Also from (5), it is clear that $D_{-u}\beta(s) = -D_u\beta(s)$. Applying the Cauchy–Schwarz inequality to (5), for every unit vector $u$, $D_u^2\beta(s) \leq \sum_{i=1}^{d} D_{e_i}^2\beta(s)$. Hence $\sum_{i=1}^{d} D_{e_i}^2\beta(s)$ is the maximum over all directions of $D_u^2\beta(s)$. At location $s$, it is achieved in the direction $u = \nabla\beta(s)/\|\nabla\beta(s)\|$. In the sequel we work with $d = 2$, setting $e_1 = (1, 0)$ and $e_2 = (0, 1)$. On this basis, $\nabla\beta(s) = (D_{e_1}\beta(s), D_{e_2}\beta(s))^T$. From (4), to study finite difference processes at scale $h$ in arbitrary directions, we have no reduction to a basis set.
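The decomposition in (5) is easy to check numerically on any smooth surface. In the sketch below (ours; the surface $\beta(s) = \sin(x)\cos(2y)$ and all names are illustrative stand-ins, not from the article), the finite difference (4) in an arbitrary direction is compared with the basis combination in (5):

```python
import numpy as np

def beta(s):
    """A deterministic smooth surface standing in for a realization of beta(s)."""
    x, y = s
    return np.sin(x) * np.cos(2.0 * y)

def finite_diff(f, s, u, h):
    """beta_{u,h}(s) = (f(s + h u) - f(s)) / h, the finite difference at scale h, as in (4)."""
    return (f(s + h * u) - f(s)) / h

s = np.array([0.4, -0.3])
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
u = np.array([np.cos(1.1), np.sin(1.1)])   # an arbitrary unit vector
h = 1e-6

# (5): D_u beta(s) = w1 * D_{e1} beta(s) + w2 * D_{e2} beta(s), with w_i = u^T e_i
d_u = finite_diff(beta, s, u, h)
combo = (u @ e1) * finite_diff(beta, s, e1, h) + (u @ e2) * finite_diff(beta, s, e2, h)
# d_u and combo differ only at the finite-difference error level
```

As $h \to 0$ both quantities converge to the same directional derivative, which is the content of (5).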
Distribution theory for $\beta_{u,h}(s)$ and $D_u\beta(s)$ is developed briefly in Appendix A. In particular, the cross-covariance functions are provided for the bivariate spatial processes $(\beta(s), \beta_{u,h}(s))$ and $(\beta(s), D_u\beta(s))$, as well as for the trivariate process $(\beta(s), D_{u_1}\beta(s), D_{u_2}\beta(s))$. It is convenient to work with the Matérn covariance functions because the parameter $\nu$ controls the smoothness of process realizations. A closed-form choice sets $\nu = 3/2$, which yields once but not twice mean square differentiable realizations and takes the form

$$\rho(s_i - s_{i'}; \phi) = (1 + \phi\|s_i - s_{i'}\|)\exp(-\phi\|s_i - s_{i'}\|).$$

5. THE MODEL AND ITS DISTRIBUTION THEORY

We now return to the model in (2). Following the discussion in Section 2, we model land values on the log scale, suggesting that distances on the right side should go in untransformed. Therefore, apart from a first component of 1, we envision the entries in $X(s)$ as distances from $s$ to various fixed externalities located at, say, $s^*_l$, $l = 2, \ldots, p$. This ensures that the $X_l(s)$ are twice differentiable, which we need later. By allowing the coefficients to change with location, the mean in (2) is essentially as flexible as we could seek; there is no need to consider introducing functional forms in the distances.
As noted in Section 1, McMillen (1996) also considered modeling land values with spatially varying coefficients using locally weighted regression. A brief comparison between his approach and ours might be useful. In building the model for $\beta(s)$, we envision a collection of $p$ random but dependent spatial surfaces, captured as a realization of a multivariate spatial process. McMillen had $X(s)$ as the argument of his $\beta$ [in fact, the argument is a function of the $\{X(s_i)\}$, as well]. Furthermore, locally weighted regression requires an inverse distance function calculated between $X(s)$ and each of the $\{X(s_i)\}$'s, yielding the distance between vectors of distances. It also requires an associated window width. Considering the distance between $s$ and the $s_i$'s, as we do, may be more attractive. We offer full inference at each location on, for example, directional gradients, the maximum gradient, and the direction of the maximum gradient. Gradients will be very difficult to study under McMillen's model. In fact, through the posterior distribution, we offer an exact assessment of the uncertainty associated with the coefficient vector at any location, as well as for any location on the mean surface and also for gradients of interest. Again, such estimation of variability will not be possible through McMillen's model.
To provide the model specification in (2), we need to propose a multivariate spatial process model for $\beta(s)$. Following Gelfand, Schmidt, Banerjee, and Sirmans (2004b), we use a linear model of coregionalization (LMC; see also Wackernagel 2003) in this regard. Specifically, we set $\beta(s) = A w(s)$, where the components of $w$, the $w_l$, $l = 1, 2, \ldots, p$, are independent spatial processes defined on $D$ with mean 0 and variance 1. Each $w_l$ has an isotropic correlation function $\rho(\cdot, \phi_l)$, $l = 1, 2, \ldots, p$. Without loss of generality, we may work with a lower triangular Cholesky form for $A$. The resultant cross-covariance function, $C(\beta(s), \beta(s'))$, has $(l, l')$ entry

$$C(\beta(s), \beta(s'))_{l,l'} = \sum_{j=1}^{p} a_{lj} a_{l'j}\, \rho(\cdot, \phi_j).$$

Parameters in the model include the global $\beta$ vector, the lower triangular $A$, the $\phi_l$'s, and $\tau^2$, which we collect into a vector $\theta$.
Now, with observations $Y(s_i)$ at locations $s_i$, $i = 1, \ldots, n$, let $\beta^* = (\beta(s_1), \ldots, \beta(s_n))^T$ be the $np \times 1$ column vector of a coefficient process realization. The resulting covariance matrix for $\beta^*$ is of the form

$$\Sigma_{\beta} = \tilde{A}\,\tilde{D}(\phi)\,\tilde{A}^T, \qquad (6)$$

where $\tilde{A} = A \otimes I_{n \times n}$ and $\tilde{D}(\phi)$ is block diagonal with blocks $R(\phi_l)$, where $(R(\phi_l))_{i,i'} = (1 + \phi_l\|s_i - s_{i'}\|)\exp(-\phi_l\|s_i - s_{i'}\|)$.
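A small numerical sketch of the construction in (6) follows (ours; variable names, dimensions, and parameter values are illustrative, with the Matérn $\nu = 3/2$ correlation from Section 4 playing the role of $\rho(\cdot, \phi_l)$):

```python
import numpy as np
from scipy.linalg import block_diag

def matern32_corr(coords, phi):
    """R(phi)_{i,i'} = (1 + phi * d) * exp(-phi * d), with d = ||s_i - s_{i'}||."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return (1.0 + phi * d) * np.exp(-phi * d)

def lmc_covariance(A, phis, coords):
    """Sigma_beta = A_tilde D_tilde(phi) A_tilde^T, with A_tilde = A kron I_n and
    D_tilde block diagonal with blocks R(phi_l), as in (6)."""
    n = coords.shape[0]
    A_tilde = np.kron(A, np.eye(n))
    D_tilde = block_diag(*(matern32_corr(coords, phi) for phi in phis))
    return A_tilde @ D_tilde @ A_tilde.T

rng = np.random.default_rng(0)
p, n = 3, 5
A = np.tril(rng.normal(size=(p, p))) + 2.0 * np.eye(p)   # lower-triangular A
phis = [0.5, 1.0, 2.0]                                   # one decay parameter per process
coords = rng.uniform(size=(n, 2))
Sigma_beta = lmc_covariance(A, phis, coords)             # np x np
```

With this ordering ($\tilde{A} = A \otimes I_n$), entry $((l-1)n + i,\ (l'-1)n + i')$ of $\Sigma_\beta$ equals $\sum_j a_{lj} a_{l'j}\,\rho(\|s_i - s_{i'}\|; \phi_j)$, matching the cross-covariance displayed above.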
We note that marginalizing over the random effects $\beta^*$ is helpful, leaving us to run the much lower-dimensional Markov chain Monte Carlo (MCMC) algorithm for $\theta$. Let the $p \times n$ matrix $X^T = (X(s_1), \ldots, X(s_n))$, and let $\tilde{X}$ be an $n \times np$ block-diagonal matrix with $i$th block entry, $i = 1, 2, \ldots, n$, given by $X^T(s_i)$. Then the marginal likelihood becomes

$$L(Y; \theta) \propto |\Sigma|^{-1/2} \exp\bigl(-\{Y - X\beta\}^T \Sigma^{-1} \{Y - X\beta\}/2\bigr), \qquad (7)$$

where $\Sigma$ is the $n \times n$ matrix defined by $\Sigma = \tilde{X} \Sigma_{\beta} \tilde{X}^T + \tau^2 I$.
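The marginalized likelihood (7) is conveniently evaluated on the log scale through a Cholesky factor of $\Sigma$. The sketch below is ours and purely illustrative: a generic positive semidefinite matrix stands in for $\Sigma_\beta$ from (6), and all names and values are assumptions, not the article's:

```python
import numpy as np
from scipy.linalg import block_diag

def marginal_loglik(Y, X, X_tilde, beta, Sigma_beta, tau2):
    """log L(Y; theta) with Y ~ N(X beta, Sigma), Sigma = X_tilde Sigma_beta X_tilde^T + tau2 I,
    the marginalized likelihood in (7), evaluated via a Cholesky factor of Sigma."""
    n = Y.shape[0]
    Sigma = X_tilde @ Sigma_beta @ X_tilde.T + tau2 * np.eye(n)
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, Y - X @ beta)
    return -0.5 * (n * np.log(2.0 * np.pi) + 2.0 * np.log(np.diag(L)).sum() + z @ z)

rng = np.random.default_rng(1)
n, p = 6, 2
X = np.column_stack([np.ones(n), rng.uniform(size=n)])    # intercept plus one distance covariate
X_tilde = block_diag(*(X[i:i + 1, :] for i in range(n)))  # n x np, i-th block X^T(s_i)
B = rng.normal(size=(n * p, n * p))
Sigma_beta = B @ B.T / (n * p)                            # generic PSD stand-in for (6)
beta_vec = np.array([1.0, -0.5])
tau2 = 0.3
Y = rng.normal(size=n)
ll = marginal_loglik(Y, X, X_tilde, beta_vec, Sigma_beta, tau2)
```

The Cholesky route avoids forming $\Sigma^{-1}$ explicitly and gives $\log|\Sigma|$ for free from the factor's diagonal.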
We assume flat normal priors for the $\beta_l$'s, an inverse gamma IG$(a_{\tau^2}, b_{\tau^2})$ prior for $\tau^2$, and gamma$(a_l, b_l)$ priors for each of the decay parameters $\phi_l$, whereas the priors for the entries in the lower-triangular matrix $A$ were induced from an inverted Wishart prior on $AA^T$ (see also Gelfand et al. 2004a, b). We note that the joint full conditional distribution for $\beta$ is multivariate normal. For the rest of the parameters in $\theta$, the full conditional distributions are nonstandard, and so we use a Metropolis–Hastings algorithm to sample them.
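For concreteness, a generic random-walk Metropolis step of the kind used for such nonstandard full conditionals can be sketched as follows (this is our illustrative sketch, not the article's implementation; the toy target and all tuning choices are assumptions):

```python
import numpy as np

def random_walk_metropolis(log_post, theta0, n_iter=20000, scale=1.0, seed=0):
    """Generic random-walk Metropolis sampler for an unnormalized log posterior.
    A step of this kind would be applied to each parameter block in theta whose
    full conditional is nonstandard (e.g., the decay parameters phi_l)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    draws = np.empty((n_iter,) + theta.shape)
    for t in range(n_iter):
        prop = theta + scale * rng.normal(size=theta.shape)  # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:             # accept w.p. min(1, ratio)
            theta, lp = prop, lp_prop
        draws[t] = theta
    return draws

# Illustration on a toy target: a standard normal "posterior" for a scalar theta.
draws = random_walk_metropolis(lambda th: -0.5 * float(th @ th), np.zeros(1))
```

Because the proposal is symmetric, the Hastings ratio reduces to the ratio of (unnormalized) posterior densities.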
We next turn to directional derivatives associated with the model in (1). In particular, for various locations and various directions, we might seek gradients for the $\beta_l(s)$'s but have primary interest in gradients to the $E[Y(s)]$ surface. Note that directional derivatives do not exist for the $Y(s)$ surface, because the $\epsilon(s)$ surface is not even continuous, let alone differentiable.
6. INFERENCE WITH DIRECTIONAL DERIVATIVE PROCESSES

Returning to the model in (2), the mean surface $E[Y(s)] = X^T(s)\tilde{\beta}(s)$ will reveal "first-order" behavior. We can think of gradients to the mean surface as revealing "second-order" behavior, that is, more subtle changes in the $E[Y(s)]$ surface. In fact, we clarify several gradients of possible interest in Section 6.1. In Section 6.2 we develop the distribution theory required to study $D_u E[Y(s)]$, the gradient to the mean surface at location $s$ in direction $u$. Section 6.3 provides a technical result that can be useful.
If, for example, $E[Y(s)] = \beta_0 + \sum_{l=2}^{p} \beta_l X_l(s)$, then straightforward calculation shows that when $X_l(s) = \|s - s^*_l\|$,

$$D_u E[Y(s)] = \sum_{l=2}^{p} \beta_l \frac{u^T(s - s^*_l)}{\|s - s^*_l\|}. \qquad (8)$$

Thus, even with constant coefficients, there will be differing directional gradients at $s$ arising strictly from the definition of $X(s)$. If, say, $\beta_l > 0$, then the component of the gradient associated with $X_l(s)$ will be most positive in the direction away from the externality at $s^*_l$ ($u = (s - s^*_l)/\|s - s^*_l\|$) and will be 0 in the direction orthogonal to this. For a fixed $u$, it will be constant on an arc of fixed length about $s^*_l$. In the direction away from $s^*_l$, the $l$th term in the sum for $D_u E[Y(s)]$ will be constant. Hence local examination near an externality of gradients both on arcs and along rays should facilitate "seeing" second-order effects.
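This geometry is easy to verify numerically. In the sketch below (ours; the location, coefficient value, and names are illustrative), the gradient in (8) equals $\beta_l$ along the ray away from a single externality and vanishes orthogonally to it:

```python
import numpy as np

def d_u_EY(s, u, betas, externalities):
    """D_u E[Y(s)] = sum_l beta_l * u^T (s - s*_l) / ||s - s*_l||, as in (8)."""
    total = 0.0
    for beta_l, s_star in zip(betas, externalities):
        diff = s - s_star
        total += beta_l * (u @ diff) / np.linalg.norm(diff)
    return total

s = np.array([2.0, 1.0])
betas = [0.8]                          # a single positive coefficient
externalities = [np.array([0.0, 0.0])]

away = (s - externalities[0]) / np.linalg.norm(s - externalities[0])
ortho = np.array([-away[1], away[0]])  # rotate the ray direction by 90 degrees

g_away = d_u_EY(s, away, betas, externalities)    # most positive: equals beta_l
g_ortho = d_u_EY(s, ortho, betas, externalities)  # zero in the orthogonal direction
```

Moving $s$ along a circle about the externality leaves `g_away` unchanged, which is the constancy on arcs described above.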
Finally, the argument for envisioning differing gradients is that in (2) there are surely omitted unobservable explanatory variables with spatial content. These effects will be reflected in the spatial error, and hence in $E[Y(s)]$ and thus also in $D_u E[Y(s)]$.
6.1 Possible Gradients of Interest

Here we briefly remind the reader that several notions of gradients can be considered. For instance, suppose that we simplify to the $p = 2$ case and consider $Y(s) = \beta_0 + \beta_1 X(s) + \epsilon(s)$, where $X(s) = \mathrm{dist}(s) = \|s - s^*\|$. Then $dE[Y(s)]/d\,\mathrm{dist}(s) = \beta_1$, the most basic gradient idea. If in this regression we transform distance, setting $X(s) = g(\mathrm{dist}(s))$, then $dE[Y(s)]/d\,\mathrm{dist}(s) = \beta_1 g'(\mathrm{dist}(s))$.

Note that $D_u E[Y(s)]$ is entirely different. $D_u E[Y(s)] = \beta_1 D_u X(s)$, but $D_u X(s)$ has nothing to do with $g'(\mathrm{dist}(s))$. It is not a function of distance alone, as (8) reveals. Now, if we replace $\beta_1$ with $\tilde{\beta}_1(s)$, then we have $dE[Y(s)]/dX(s) = \tilde{\beta}_1(s)$, which again has nothing to do with $D_u E[Y(s)]$. However, the mean is a function of location, not just of distance from $s^*$. Viewing $E[Y(s)]$ as a surface, there is a gradient at each $s$ in each direction $u$. We learn about rates of change in the expected land surface through $D_u E[Y(s)]$, not through $dE[Y(s)]/dX(s) = \tilde{\beta}_1(s)$, the slope surface.
6.2 Further Distribution Theory

When $p = 2$, consider the finite differences for the process $E[Y(s)]$ in terms of the processes $\beta_0(s)$ and $\beta_1(s)$. For a scale of $h$, after some manipulation, the finite difference at $s$ in direction $u$ is given by

$$\frac{E[Y(s + hu)] - E[Y(s)]}{h} = \frac{\beta_0(s + hu) - \beta_0(s)}{h} + \beta_1\frac{X(s + hu) - X(s)}{h}$$
$$\qquad + \beta_1(s + hu)\frac{X(s + hu) - X(s)}{h} + X(s)\frac{\beta_1(s + hu) - \beta_1(s)}{h},$$

and, letting $h \to 0$,

$$D_u E[Y(s)] = D_u\beta_0(s) + \beta_1 D_u X(s) + \beta_1(s) D_u X(s) + D_u\beta_1(s)\, X(s). \qquad (9)$$

Again, with $X(s) = \|s - s^*\|$, we have $D_u X(s) = u^T(s - s^*)/\|s - s^*\|$.
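Equation (9) can be checked by comparing the finite difference of $E[Y(s)]$ at a small scale $h$ against its four-term limit. In this sketch (ours; the smooth surfaces below are deterministic stand-ins for the process realizations, and all names and values are illustrative):

```python
import numpy as np

s_star = np.array([1.0, 0.0])
beta1_global = 0.7

def beta0(s):   # smooth stand-in for the intercept process beta_0(s)
    return 0.3 * np.sin(s[0]) + 0.2 * s[1]

def beta1(s):   # smooth mean-zero stand-in for the slope process beta_1(s)
    return 0.1 * np.cos(s[0] + 2.0 * s[1])

def X(s):       # distance covariate X(s) = ||s - s*||
    return np.linalg.norm(s - s_star)

def EY(s):      # E[Y(s)] = beta_0(s) + (beta_1 + beta_1(s)) X(s)
    return beta0(s) + (beta1_global + beta1(s)) * X(s)

def D(f, s, u, h=1e-7):   # numeric directional derivative
    return (f(s + h * u) - f(s)) / h

s = np.array([2.5, 1.5])
u = np.array([np.cos(0.4), np.sin(0.4)])
h = 1e-7

fd = (EY(s + h * u) - EY(s)) / h   # left side of (9) at finite h

# right side of (9): D_u beta_0 + beta_1 D_u X + beta_1(s) D_u X + D_u beta_1 * X(s)
rhs = D(beta0, s, u) + beta1_global * D(X, s, u) + beta1(s) * D(X, s, u) + D(beta1, s, u) * X(s)
```

The two quantities agree up to discretization error, as the product-rule manipulation above predicts.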
For the general form in (1), $E[Y(s)] = X^T(s)\tilde{\beta}(s)$ and

$$D_u E[Y(s)] = \sum_{l=1}^{p} \bigl[X_l(s)\, D_u\tilde{\beta}_l(s) + \tilde{\beta}_l(s)\, D_u X_l(s)\bigr], \qquad (10)$$

provided that $D_u X_l(s)$ exists. In the direction away from, say, $s^*_l$ [i.e., $u^*_l = (s - s^*_l)/\|s - s^*_l\|$], the contribution of the $l$th term to $D_{u^*_l} E[Y(s)]$ is $\tilde{\beta}_l(s) + X_l(s)\, D_{u^*_l}\tilde{\beta}_l(s)$. In a direction $u^{\perp}_l$ orthogonal to the direction away from $s^*_l$, $(u^{\perp}_l)^T(s - s^*_l) = 0$, and the contribution simplifies to $X_l(s)\, D_{u^{\perp}_l}\tilde{\beta}_l(s)$.
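The two special directions just described can be illustrated with the same kind of stand-in surface (ours; the coefficient surface, location, and names below are illustrative assumptions):

```python
import numpy as np

s_star = np.array([0.0, 0.0])

def beta_l(s):   # smooth stand-in for a coefficient surface beta_l(s)
    return 1.2 + 0.3 * np.sin(s[0] - s[1])

def X_l(s):      # distance covariate X_l(s) = ||s - s*_l||
    return np.linalg.norm(s - s_star)

def D(f, s, u, h=1e-7):   # numeric directional derivative
    return (f(s + h * u) - f(s)) / h

def term_l(s, u):
    """The l-th contribution in (10): X_l(s) D_u beta_l(s) + beta_l(s) D_u X_l(s)."""
    return X_l(s) * D(beta_l, s, u) + beta_l(s) * D(X_l, s, u)

s = np.array([1.5, -2.0])
u_away = (s - s_star) / np.linalg.norm(s - s_star)
u_perp = np.array([-u_away[1], u_away[0]])

along = term_l(s, u_away)   # equals beta_l(s) + X_l(s) D beta_l(s): D_u X_l = 1 on the ray
perp = term_l(s, u_perp)    # reduces to X_l(s) D beta_l(s): D_u X_l = 0 orthogonally
```

Numerically, $D_u X_l(s)$ is 1 along the outward ray and 0 orthogonal to it, reproducing the two simplified contributions stated above.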
We return to the discussion in Section 2 regarding amenities that do not change continuously. Recall that for, say, crime rate, $X(s)$ would be a tiled surface. But then it is differentiable almost everywhere (a.e.) over the region; that is, $D_u X(s)$ exists and, in fact, equals 0 a.e. In (10), the second term on the right side for such a covariate will be 0. In the case of school quality, the contribution to (2) would take the form $\sum_r \beta_r I(s \in B_r)$, where
$B_r$ is the $r$th school district and $I$ is an indicator function. So for such a regressor, the contribution to the gradient in (10) is 0. To clarify, the foregoing regressors provide an explanation for the mean surface but will appear only partially or not at all in the gradient analysis.
From Section 4.1, the direction in which the directional derivative is maximized is $\nabla E[Y(s)]/\|\nabla E[Y(s)]\|$, where $\nabla E[Y(s)] = (D_{e_1}E[Y(s)], D_{e_2}E[Y(s)])^T$