
4. DIRECTIONAL FINITE DIFFERENCES AND DIRECTIONAL DERIVATIVE PROCESSES

We review the notions of directional finite differences and directional derivative processes for a univariate process over locations $s \in \mathbb{R}^d$. We assume that the process has mean 0 and is stationary with covariance function $C(s, s') = K(s - s')$, where $K$ is a valid covariance function in $\mathbb{R}^d$. Two frequently used isotropic forms are the power exponential class, where $K(s - s') = \alpha \exp(-\phi \|s - s'\|^{\nu})$ with $0 < \nu \le 2$, and the Matérn class, where $K(s - s') = \alpha (\phi \|s - s'\|)^{\nu} K_{\nu}(\phi \|s - s'\|)$; here $K_{\nu}$ is the modified Bessel function of order $\nu$ (Abramowitz and Stegun 1965). We use the more flexible Matérn class here.

The univariate process $\{\beta(s) : s \in \mathbb{R}^d\}$ is $L_2$ (mean square) continuous at $s_0$ if $E[\beta(s) - \beta(s_0)]^2 \to 0$ as $s \to s_0$. Under stationarity, because $E[\beta(s) - \beta(s')]^2 = 2(K(0) - K(s - s'))$, the process $\beta(s)$ is mean square continuous at all $s$ if $K$ is continuous at 0. Turning to mean square differentiability, $\beta(s)$ is mean square differentiable at $s_0$ if there exists a vector $\nabla\beta(s_0)$ such that, for any scalar $h$ and any unit vector $u$,

\[ \beta(s_0 + hu) = \beta(s_0) + h u^T \nabla\beta(s_0) + r(s_0, hu), \qquad (3) \]

where $r(s_0, hu)/h \to 0$ in the $L_2$ sense as $h \to 0$. With $u$ a unit vector, let

\[ \beta_{u,h}(s) = \frac{\beta(s + hu) - \beta(s)}{h} \qquad (4) \]

be the finite difference at $s$ in direction $u$ at scale $h$.
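As a concrete illustration (ours, not the paper's), the following sketch evaluates the two covariance classes in Python. It assumes the standard Matérn normalizing constant $2^{1-\nu}/\Gamma(\nu)$, which the display above suppresses, so that $K(0) = \alpha$; the final assertion confirms the $\nu = 3/2$ closed form quoted later in this section, and the middle assertion confirms that $\nu = 1/2$ recovers the exponential covariance.

```python
import numpy as np
from scipy.special import kv, gamma

def power_exponential(d, alpha, phi, nu):
    """Power exponential covariance: K(d) = alpha * exp(-phi * d**nu), 0 < nu <= 2."""
    return alpha * np.exp(-phi * d**nu)

def matern(d, alpha, phi, nu):
    """Matern covariance with the standard normalization 2^(1-nu)/Gamma(nu)
    (an assumption here, suppressed in the text), so that K(0) = alpha.
    kv is the modified Bessel function K_nu."""
    d = np.asarray(d, dtype=float)
    out = np.full_like(d, alpha)          # K(0) = alpha, by continuity
    pos = d > 0
    x = phi * d[pos]
    out[pos] = alpha * (2**(1 - nu) / gamma(nu)) * x**nu * kv(nu, x)
    return out

d = np.linspace(0.01, 5.0, 200)
# nu = 1/2 Matern is the exponential covariance (power exponential with nu = 1):
assert np.allclose(matern(d, 1.0, 2.0, 0.5), power_exponential(d, 1.0, 2.0, 1.0))
# nu = 3/2 Matern has the closed form rho(d; phi) = (1 + phi*d) exp(-phi*d):
assert np.allclose(matern(d, 1.0, 2.0, 1.5), (1 + 2.0 * d) * np.exp(-2.0 * d))
```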

Next, let

\[ D_u \beta(s) = \lim_{h \to 0} \beta_{u,h}(s) \]

if the limit exists. $D_u \beta(s)$ is a well-defined process in $\mathbb{R}^d$, which we refer to as the directional derivative process in the direction $u$. In particular, this process exists if the covariance function $K$ is twice differentiable. If the unit vectors $e_1, e_2, \ldots, e_d$ form an orthonormal basis set for $\mathbb{R}^d$, then any unit vector $u$ in $\mathbb{R}^d$ can be written as $u = \sum_{i=1}^d w_i e_i$ with $w_i = u^T e_i$ and $\sum_{i=1}^d w_i^2 = 1$. But then

\[ D_u \beta(s) = u^T \nabla\beta(s) = \sum_{i=1}^d w_i e_i^T \nabla\beta(s) = \sum_{i=1}^d w_i D_{e_i} \beta(s). \qquad (5) \]

Hence, to study directional derivative processes in arbitrary directions, we need only work with a basis set of directional derivative processes. Also from (5), it is clear that $D_{-u} \beta(s) = -D_u \beta(s)$. Applying the Cauchy–Schwarz inequality to (5), for every unit vector $u$, $D_u^2 \beta(s) \le \sum_{i=1}^d D_{e_i}^2 \beta(s)$. Hence $\sum_{i=1}^d D_{e_i}^2 \beta(s)$ is the maximum over all directions of $D_u^2 \beta(s)$; at location $s$, it is achieved in the direction $u = \nabla\beta(s)/\|\nabla\beta(s)\|$. In the sequel we work with $d = 2$, setting $e_1 = (1, 0)$ and $e_2 = (0, 1)$. On this basis, $\nabla\beta(s) = (D_{e_1} \beta(s), D_{e_2} \beta(s))^T$. From (4), to study finite difference processes at scale $h$ in arbitrary directions, we have no reduction to a basis set.
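A brief numerical sketch of (5), ours rather than the paper's: with $d = 2$ and hypothetical values standing in for the basis derivatives $D_{e_1}\beta(s)$ and $D_{e_2}\beta(s)$, any directional derivative follows from the gradient, and $D_u^2\beta(s)$ is maximized at $u = \nabla\beta(s)/\|\nabla\beta(s)\|$.

```python
import numpy as np

# Hypothetical basis directional derivatives at a location s (d = 2);
# in the model these are realizations of D_e1 beta(s) and D_e2 beta(s).
D_e1, D_e2 = 1.2, -0.5
grad = np.array([D_e1, D_e2])        # gradient vector at s

def D_u(u):
    """Directional derivative in direction u via eq. (5): u^T grad."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)        # ensure u is a unit vector
    return u @ grad

u = np.array([0.6, 0.8])
assert np.isclose(D_u(-u), -D_u(u))  # D_{-u} beta(s) = -D_u beta(s)

# Max of D_u^2 over all directions is ||grad||^2 = D_e1^2 + D_e2^2,
# attained at u = grad / ||grad|| (Cauchy-Schwarz):
angles = np.linspace(0, 2 * np.pi, 1000)
vals = np.array([D_u([np.cos(a), np.sin(a)])**2 for a in angles])
assert vals.max() <= grad @ grad + 1e-9
assert np.isclose(D_u(grad)**2, grad @ grad)
```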

Distribution theory for $\beta_{u,h}(s)$ and $D_u \beta(s)$ is developed briefly in Appendix A. In particular, the cross-covariance functions are provided for the bivariate spatial processes $(\beta(s), \beta_{u,h}(s))$ and $(\beta(s), D_u \beta(s))$, as well as for the trivariate process $(\beta(s), D_{u_1} \beta(s), D_{u_2} \beta(s))$. It is convenient to work with the Matérn covariance functions because the parameter $\nu$ controls the smoothness of process realizations. A closed-form choice sets $\nu = 3/2$, which yields once but not twice mean square differentiable realizations and takes the form

\[ \rho(s_i - s_{i'}; \phi) = (1 + \phi \|s_i - s_{i'}\|) \exp(-\phi \|s_i - s_{i'}\|). \]

5. THE MODEL AND ITS DISTRIBUTION THEORY

We now return to the model in (2). Following the discussion in Section 2, we model land values on the log scale, suggesting that distances on the right side should go in untransformed. Therefore, apart from a first component of 1, we envision the entries in $X(s)$ as distances from $s$ to various fixed externalities located at, say, $s^*_l$, $l = 2, \ldots, p$. This ensures that the $X_l(s)$ are twice differentiable, which we need later. By allowing the coefficients to change with location, the mean in (2) is essentially as flexible as we could seek; there is no need to consider introducing functional forms in the distances.

As noted in Section 1, McMillen (1996) also considered modeling land values with spatially varying coefficients, using locally weighted regression. A brief comparison between his approach and ours might be useful. In building the model for $\beta(s)$, we envision a collection of $p$ random but dependent spatial surfaces, captured as a realization of a multivariate spatial process. McMillen had $X(s)$ as the argument of his $\beta$ [in fact, the argument is a function of $\{X(s_i)\}$ as well]. Furthermore, locally weighted regression requires an inverse distance function calculated between $X(s)$ and each of the $\{X(s_i)\}$'s, yielding the distance between vectors of distances. It also requires an associated window width. Considering the distance between $s$ and the $s_i$'s, as we do, may be more attractive. We offer full inference at each location on, for example, directional gradients, the maximum gradient, and the direction of the maximum gradient. Gradients will be very difficult to study under McMillen's model. In fact, through the posterior distribution, we offer an exact assessment of the uncertainty associated with the coefficient vector at any location, as well as with the mean surface at any location and with any gradients of interest. Again, such estimation of variability will not be possible through McMillen's model.

To provide the model specification in (2), we need to propose a multivariate spatial process model for $\beta(s)$. Following Gelfand, Schmidt, Banerjee, and Sirmans (2004b), we use a linear model of coregionalization (LMC; see also Wackernagel 2003) in this regard. Specifically, we set $\beta(s) = A w(s)$, where the components of $w(s)$, the $w_l(s)$, $l = 1, 2, \ldots, p$, are independent spatial processes defined on $D$ with mean 0 and variance 1. Each $w_l(s)$ has an isotropic correlation function $\rho(\cdot; \phi_l)$, $l = 1, 2, \ldots, p$. Without loss of generality, we may work with a lower triangular (Cholesky) form for $A$. The resultant cross-covariance function $C(\beta(s), \beta(s'))$ has $(l, l')$ entry

\[ \left(C(\beta(s), \beta(s'))\right)_{l,l'} = \sum_{j=1}^{p} a_{lj} a_{l'j}\, \rho(\cdot; \phi_j). \]

Parameters in the model include the global $\beta$ vector, the lower triangular $A$, the $\phi_l$'s, and $\tau^2$, which we collect into a vector $\theta$.
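To make the LMC construction concrete, here is a minimal sketch (the function names and the illustrative values of $A$ and the $\phi_j$'s are ours) that assembles the cross-covariance matrix $C(\beta(s), \beta(s')) = A\,\mathrm{diag}\{\rho(\|s - s'\|; \phi_j)\}\,A^T$ under the $\nu = 3/2$ Matérn correlation. Note that at $s = s'$ it reduces to $AA^T$, the within-site coefficient covariance.

```python
import numpy as np

def matern32(d, phi):
    """Matern correlation with nu = 3/2: rho(d; phi) = (1 + phi*d) exp(-phi*d)."""
    return (1.0 + phi * d) * np.exp(-phi * d)

def lmc_cross_cov(s, s_prime, A, phis):
    """Cross-covariance C(beta(s), beta(s')) of beta(s) = A w(s):
    entry (l, l') is sum_j a_lj a_l'j rho(||s - s'||; phi_j)."""
    d = np.linalg.norm(np.asarray(s, float) - np.asarray(s_prime, float))
    rho = np.array([matern32(d, phi) for phi in phis])
    return A @ np.diag(rho) @ A.T

# Illustrative values (p = 2); A lower triangular as in the text.
A = np.array([[1.0, 0.0],
              [0.4, 0.8]])
phis = [1.0, 2.5]
print(lmc_cross_cov([0.0, 0.0], [0.5, 0.5], A, phis))
# At s = s', every rho is 1 and the matrix reduces to A A^T:
assert np.allclose(lmc_cross_cov([0, 0], [0, 0], A, phis), A @ A.T)
```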
Now, with observations $Y(s_i)$ at locations $s_i$, $i = 1, \ldots, n$, let $\beta^* = (\beta(s_1), \ldots, \beta(s_n))^T$ be the $np \times 1$ column vector of a coefficient process realization. The resulting covariance matrix for $\beta^*$ is of the form

\[ \Sigma_{\beta} = \tilde{A} \tilde{D}(\phi) \tilde{A}^T, \qquad (6) \]

where $\tilde{A} = A \otimes I_{n \times n}$ and $\tilde{D}(\phi)$ is block diagonal with blocks $R(\phi_l)$, where $(R(\phi_l))_{i,i'} = (1 + \phi_l \|s_i - s_{i'}\|) \exp(-\phi_l \|s_i - s_{i'}\|)$.

We note that marginalizing over the random effects $\beta^*$ is helpful, leaving us to run a much lower-dimensional Markov chain Monte Carlo (MCMC) algorithm for $\theta$. Let the $p \times n$ matrix $X^T = (X(s_1), \ldots, X(s_n))$, and let $\tilde{X}$ be an $n \times np$ block-diagonal matrix with $i$th block entry, $i = 1, 2, \ldots, n$, given by $X^T(s_i)$. Then the marginal likelihood becomes

\[ L(Y; \theta) \propto |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}\{Y - X\beta\}^T \Sigma^{-1} \{Y - X\beta\}\right), \qquad (7) \]

where $\Sigma$ is the $n \times n$ matrix defined by $\Sigma = \tilde{X} \Sigma_{\beta} \tilde{X}^T + \tau^2 I$. We assume flat priors for the $\beta_l$'s, an inverse gamma $IG(a_{\tau^2}, b_{\tau^2})$ prior for $\tau^2$, and gamma $G(a_l, b_l)$ priors for each of the decay parameters $\phi_l$, whereas the priors for the entries in the lower triangular matrix $A$ are induced from an inverted Wishart prior on $AA^T$ (see also Gelfand et al. 2004a, b). We note that the joint full conditional distribution for $\beta$ is multivariate normal. For the rest of the parameters in $\theta$, the full conditional distributions are nonstandard, so we use a Metropolis–Hastings algorithm to sample them.

We next turn to directional derivatives associated with the model in (1). In particular, for various locations and various directions, we might seek gradients for the $\beta_l(s)$'s but have primary interest in gradients of the $E(Y(s))$ surface. Note that directional derivatives do not exist for the $Y(s)$ surface, because the $\epsilon(s)$ surface is not even continuous, let alone differentiable.

6. INFERENCE WITH DIRECTIONAL DERIVATIVE PROCESSES

Returning to the model in (2), the mean surface $E(Y(s)) = X^T(s) \tilde{\beta}(s)$ will reveal "first-order" behavior. We can think of gradients to the mean surface as revealing "second-order" behavior, that is, more subtle changes in the $E(Y(s))$ surface. In fact, we clarify several gradients of possible interest in Section 6.1. In Section 6.2 we develop the distribution theory required to study $D_u E(Y(s))$, the gradient of the mean surface at location $s$ in direction $u$. Section 6.3 provides a technical result that can be useful.

If, for example, $E(Y(s)) = \beta_0 + \sum_{l=2}^{p} \beta_l X_l(s)$, then straightforward calculation shows that when $X_l(s) = \|s - s^*_l\|$,

\[ D_u E(Y(s)) = \sum_{l=2}^{p} \beta_l \frac{u^T (s - s^*_l)}{\|s - s^*_l\|}. \qquad (8) \]

Thus, even with constant coefficients, there will be differing directional gradients at $s$ arising strictly from the definition of $X(s)$. If, say, $\beta_l > 0$, then the component of the gradient associated with $X_l(s)$ will be most positive in the direction away from the externality at $s^*_l$ [$u = (s - s^*_l)/\|s - s^*_l\|$] and will be 0 in the direction orthogonal to this. For a fixed $u$, it will be constant on an arc of fixed length about $s^*_l$. In the direction away from $s^*_l$, the $l$th term in the sum for $D_u E(Y(s))$ will be constant. Hence local examination, near an externality, of gradients both on arcs and along rays should facilitate "seeing" second-order effects. Finally, the argument for envisioning differing gradients is that in (2) there are surely omitted unobservable explanatory variables with spatial content. These effects will be reflected in the spatial error, and hence in $E(Y(s))$, and thus also in $D_u E(Y(s))$.
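The directional behavior described after (8) is easy to verify numerically. In this sketch (ours, with hypothetical locations), the $l$th term, taken without its factor $\beta_l$, equals 1 in the direction away from $s^*_l$, vanishes in the orthogonal direction, and is constant along the ray away from $s^*_l$:

```python
import numpy as np

def grad_component(s, s_star, u):
    """lth term of eq. (8), up to the factor beta_l: u^T (s - s*_l) / ||s - s*_l||."""
    diff = np.asarray(s, float) - np.asarray(s_star, float)
    return (np.asarray(u, float) @ diff) / np.linalg.norm(diff)

s, s_star = np.array([2.0, 1.0]), np.array([0.0, 0.0])
away = (s - s_star) / np.linalg.norm(s - s_star)   # direction away from s*_l
orth = np.array([-away[1], away[0]])               # orthogonal direction

assert np.isclose(grad_component(s, s_star, away), 1.0)  # most positive
assert np.isclose(grad_component(s, s_star, orth), 0.0)  # zero orthogonally
# Along the ray away from s*_l the term is constant:
assert np.isclose(grad_component(s + 3.0 * away, s_star, away), 1.0)
```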
6.1 Possible Gradients of Interest

Here we briefly remind the reader that several notions of gradients can be considered. For instance, suppose that we simplify to the $p = 2$ case and consider $Y(s) = \beta_0 + \beta_1 X(s) + \epsilon(s)$, where $X(s) = \text{dist}(s) = \|s - s^*\|$. Then $dE(Y(s))/d\,\text{dist}(s) = \beta_1$, the most basic gradient idea. If in this regression we transform distance, setting $X(s) = g(\text{dist}(s))$, then $dE(Y(s))/d\,\text{dist}(s) = \beta_1 g'(\text{dist}(s))$. Note that $D_u E(Y(s))$ is entirely different: $D_u E(Y(s)) = \beta_1 D_u X(s)$, but $D_u X(s)$ has nothing to do with $g'(\text{dist}(s))$; it is not a function of distance alone, as (8) reveals. Now, if we replace $\beta_1$ with $\tilde{\beta}_1(s)$, then we have $dE(Y(s))/dX(s) = \tilde{\beta}_1(s)$, which again has nothing to do with $D_u E(Y(s))$. However, the mean is a function of location, not just of distance from $s^*$. Viewing $E(Y(s))$ as a surface, there is a gradient at each $s$ in each direction $u$. We learn about rates of change in the expected land value surface through $D_u E(Y(s))$, not through $dE(Y(s))/dX(s) = \tilde{\beta}_1(s)$, the slope surface.

6.2 Further Distribution Theory

When $p = 2$, consider the finite differences for the process $E(Y(s))$ in terms of the processes $\beta_0(s)$ and $\beta_1(s)$. For a scale $h$, after some manipulation, the finite difference at $s$ in the direction of unit vector $u$ is given by

\[
\frac{E(Y(s + hu)) - E(Y(s))}{h} = \frac{\beta_0(s + hu) - \beta_0(s)}{h} + \beta_1 \frac{X(s + hu) - X(s)}{h} + \beta_1(s + hu) \frac{X(s + hu) - X(s)}{h} + X(s) \frac{\beta_1(s + hu) - \beta_1(s)}{h},
\]

and, letting $h \to 0$,

\[ D_u E(Y(s)) = D_u \beta_0(s) + \beta_1 D_u X(s) + \beta_1(s) D_u X(s) + D_u \beta_1(s) X(s). \qquad (9) \]

Again, with $X(s) = \|s - s^*\|$, we have $D_u X(s) = u^T (s - s^*)/\|s - s^*\|$. For the general form in (1), $E(Y(s)) = X^T(s) \tilde{\beta}(s)$ and

\[ D_u E(Y(s)) = \sum_{l=1}^{p} \left( X_l(s) D_u \tilde{\beta}_l(s) + \tilde{\beta}_l(s) D_u X_l(s) \right), \qquad (10) \]

provided that $D_u X_l(s)$ exists. In the direction away from, say, $s^*_l$ [i.e., $u^*_l = (s - s^*_l)/\|s - s^*_l\|$], the contribution of the $l$th term to $D_{u^*_l} E(Y(s))$ is $\tilde{\beta}_l(s) + X_l(s) D_{u^*_l} \beta_l(s)$. In a direction $u^{\perp}_l$ orthogonal to the direction away from $s^*_l$, $(u^{\perp}_l)^T (s - s^*_l) = 0$, and the contribution simplifies to $X_l(s) D_{u^{\perp}_l} \beta_l(s)$.

We return to the discussion in Section 2 regarding amenities that do not change continuously. Recall that for, say, crime rate, $X(s)$ would be a tiled surface. But then it is differentiable almost everywhere (a.e.) over the region; that is, $D_u X(s)$ exists and, in fact, equals 0 a.e. In (10), the second term on the right side for such a covariate will be 0. In the case of school quality, the contribution to (2) would take the form $\sum_r \beta_r I(s \in B_r)$, where $B_r$ is the $r$th school district and $I$ is an indicator function. So for such a regressor, the contribution to the gradient in (10) is 0. To clarify, the foregoing regressors provide an explanation for the mean surface but will appear only partially or not at all in the gradient analysis.

From Section 4, the direction in which the directional derivative is maximized is $\nabla E(Y(s))/\|\nabla E(Y(s))\|$, where $\nabla^T E(Y(s)) = (D_{e_1} E(Y(s)), D_{e_2} E(Y(s)))$.
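As a numerical check on (9), the following sketch (ours) substitutes hypothetical smooth surfaces for $\beta_0(s)$ and $\beta_1(s)$, so that all gradients are available in closed form, and compares the finite difference (4) at small $h$ with the analytic expression:

```python
import numpy as np

# Hypothetical smooth surfaces standing in for beta_0(s) and beta_1(s);
# their gradients are known exactly, so eq. (9) can be checked against
# the finite difference of eq. (4).
beta1_global = 0.7
s_star = np.array([0.0, 0.0])

b0 = lambda s: np.sin(s[0]) + 0.5 * s[1]
grad_b0 = lambda s: np.array([np.cos(s[0]), 0.5])
b1 = lambda s: 0.3 * s[0] * s[1]
grad_b1 = lambda s: np.array([0.3 * s[1], 0.3 * s[0]])
X = lambda s: np.linalg.norm(s - s_star)
grad_X = lambda s: (s - s_star) / np.linalg.norm(s - s_star)

def mean_surface(s):
    return b0(s) + (beta1_global + b1(s)) * X(s)

def D_u_mean(s, u):
    """Eq. (9): D_u beta_0 + beta_1 D_u X + beta_1(s) D_u X + D_u beta_1 X."""
    return (u @ grad_b0(s) + beta1_global * (u @ grad_X(s))
            + b1(s) * (u @ grad_X(s)) + (u @ grad_b1(s)) * X(s))

s = np.array([2.0, 1.0])
u = np.array([0.6, 0.8])                               # unit vector
h = 1e-6
fd = (mean_surface(s + h * u) - mean_surface(s)) / h   # eq. (4) at small h
assert np.isclose(fd, D_u_mean(s, u), atol=1e-4)
```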