MODEL-CHECKS FOR HOMOSCEDASTIC SPATIAL LINEAR REGRESSION MODEL BASED ON BOOTSTRAP METHOD
W. Somayasa
Mathematics Department of Haluoleo University, Kendari
email: [email protected]
Abstract
In this paper we propose an Efron residual-based bootstrap approximation method for asymptotic model-checks of homoscedastic spatial linear regression models. It is shown that, under some regularity conditions on the known regression functions, the bootstrap version of the sequence of least squares residual partial sums processes converges in distribution to a centred Gaussian process having sample paths in the space of continuous functions on $I := [0,1] \times [0,1]$. Thus, the Efron residual-based bootstrap is a consistent approximation in the usual sense. The finite sample performance of the bootstrap level-$\alpha$ Kolmogorov-Smirnov (KS) type test is also investigated by means of Monte Carlo simulation.
Key words: residual-based bootstrap, asymptotic model-check, homoscedastic spatial linear regression models, partial sums, Gaussian process.
I. Introduction
In practice, the correctness of an assumed linear model is usually evaluated by analysing the cumulative sums (CUSUM) of the least squares residuals. To this end a huge amount of literature is available; see among others MacNeill (1978) or Bischoff and Miller (2000) for the one-dimensional context. In the spatial context we refer the reader to MacNeill and Jandhyala (1993) and Xie and MacNeill (2006).
To see the above-mentioned problem in more detail, let us consider an experiment performed on an experimental region $I := [0,1] \times [0,1]$ under $n^2$ experimental conditions taken from a regular lattice, given by

$$\Xi_n := \{ (\ell/n, k/n) : 1 \le \ell, k \le n \} \subset I, \quad n \ge 1. \tag{1}$$
Accordingly, we put together the observations carried out in $\Xi_n$ in an $n \times n$ matrix $Y_{n\times n} := (Y_{\ell k})_{k=1,\ell=1}^{n,n} \in \mathbb{R}^{n\times n}$, where for $1 \le \ell, k \le n$, $Y_{\ell k}$ is the observation at the point $(\ell/n, k/n)$, and $\mathbb{R}^{n\times n}$ is the space of $n \times n$ real matrices furnished with the Euclidean inner product $\langle A, B\rangle_{n\times n} := \mathrm{trace}(A^t B)$ and the corresponding norm $\|A\|_{n\times n} := \sqrt{\mathrm{trace}(A^t A)}$, for $A, B \in \mathbb{R}^{n\times n}$. It is worth noting that for our model we take $I$ as the experimental region without loss of generality, instead of any compact subset of $\mathbb{R}^2$. For any $f : I \to \mathbb{R}$, let $f(\Xi_n) := (f(\ell/n, k/n))_{k=1,\ell=1}^{n,n} \in \mathbb{R}^{n\times n}$; then we have

$$Y_{n\times n} = g(\Xi_n) + E_{n\times n},$$

where $g$ is a true but unknown real-valued regression function having bounded variation on $I$, and $E_{n\times n} := (\varepsilon_{\ell k})_{k=1,\ell=1}^{n,n}$ is the corresponding matrix of independent and identically distributed random errors having mean zero and variance $\sigma^2 \in (0,\infty)$, for $1 \le \ell, k \le n$. We are interested in testing the hypotheses
$$H_0 : g(\Xi_n) \in W_n \ \text{ vs. } \ K : g(\Xi_n) \notin W_n, \quad n \ge 1, \tag{2}$$

where $W_n := [f_1(\Xi_n), \ldots, f_p(\Xi_n)]$ is the subspace of $\mathbb{R}^{n\times n}$ generated by $f_1(\Xi_n), \ldots, f_p(\Xi_n)$.
Under $H_0$, the matrix of least squares residuals is given by $R_{n\times n} := \mathrm{pr}_{W_n^\perp} Y_{n\times n} = \mathrm{pr}_{W_n^\perp} E_{n\times n}$ (Stapleton, 1995), where $\mathrm{pr}_{W_n^\perp}$ is the orthogonal projector onto the orthogonal complement of $W_n$. Without loss of generality we assume throughout this work that $f_1(\Xi_n), \ldots, f_p(\Xi_n)$ is an orthogonal basis of $W_n$. Hence, by elementary linear algebra we get, under $H_0$,
$$R_{n\times n} = E_{n\times n} - \sum_{i=1}^{p} \frac{\langle f_i(\Xi_n), E_{n\times n}\rangle_{n\times n}}{\langle f_i(\Xi_n), f_i(\Xi_n)\rangle_{n\times n}} \, f_i(\Xi_n). \tag{3}$$

Equivalently, by the $\mathrm{vec} : \mathbb{R}^{n\times n} \to \mathbb{R}^{n^2}$ operator, we have

$$\mathrm{vec}(R_{n\times n}) = \mathrm{vec}(E_{n\times n}) - X_n (X_n^t X_n)^{-1} X_n^t \, \mathrm{vec}(E_{n\times n}),$$
where $X_n$ is the design matrix of the model under $H_0$, i.e., an $n^2 \times p$ matrix whose $k$-th column is given by the $n^2$-dimensional vector $\mathrm{vec}(f_k(\Xi_n))$, $k = 1, \ldots, p$. A consistent estimator for $\sigma^2$ is given by $\hat\sigma_n^2 := \|R_{n\times n}\|^2_{n\times n} / (n^2 - p)$, in the sense that $\hat\sigma_n^2$ converges in probability to $\sigma^2$ as $n \to \infty$.

Suppose that the regression functions $f_1, \ldots, f_p$ are linearly independent and square integrable on $I$ with respect to the Lebesgue measure $\lambda_I$. Then under $H_0$, MacNeill and Jandhyala (1993) and Xie and MacNeill (2006) showed that
$$\frac{1}{n\sigma} T_n(\mathrm{vec}(R_{n\times n}))(\cdot) \xrightarrow{D} B_f(\cdot) \ \text{in } C(I), \ \text{as } n \to \infty, \tag{4}$$

where for $(t,s) \in I$,

$$B_f(t,s) := B_2(t,s) - \int_{[0,t]\times[0,s]} \mathbf{f}^t \, d\lambda_I \; G^{-1} \, (R)\!\int_I \mathbf{f} \, dB_2,$$

and $T_n : \mathbb{R}^{n^2} \to C(I)$ is the partial sums operator defined in Park (1971). Here $\mathbf{f} := (f_1, \ldots, f_p)^t$, $G := \big( \int_I f_k f_\ell \, d\lambda_I \big)_{k=1,\ell=1}^{p,p} \in \mathbb{R}^{p\times p}$, $B_2$ is the standard Brownian sheet having sample paths in $C(I)$, and $(R)\!\int$ stands for the Riemann-Stieltjes integral.
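As an illustration of the projection formula above, the following sketch computes the least squares residuals and the variance estimator $\hat\sigma_n^2$ for a hypothetical first-order polynomial design on a $10 \times 10$ lattice; the basis functions, coefficients, and error scale are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical first-order polynomial design on the regular n x n lattice.
n = 10
grid = [(l / n, k / n) for k in range(1, n + 1) for l in range(1, n + 1)]

# Design matrix X_n: columns vec(f_1(Xi_n)), vec(f_2(Xi_n)), vec(f_3(Xi_n))
# for f_1 = 1, f_2 = t, f_3 = s.
X = np.array([[1.0, t, s] for (t, s) in grid])

rng = np.random.default_rng(0)
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0.0, 0.5, n * n)

# vec(R) = vec(Y) - X (X^t X)^{-1} X^t vec(Y): projection of the observations
# onto the orthogonal complement of the column space of X (equal to the
# projection of vec(E) under H_0).
r = y - X @ np.linalg.solve(X.T @ X, X.T @ y)

p = X.shape[1]
sigma2_hat = (r @ r) / (n * n - p)   # consistent estimator of sigma^2
```

The residual vector is, by construction, orthogonal to every column of the design matrix, which is a quick numerical sanity check on the projection.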
JIMT, Vol. 7, No.2, Nopember 2010 : 51 – 59
If $t_n$ is a realisation of the Kolmogorov-Smirnov (KS) statistic

$$KS_{n,f} := \max_{0 \le \ell, k \le n} \left| \frac{1}{n\hat\sigma_n} T_n(\mathrm{vec}(R_{n\times n}))(\ell/n, k/n) \right|,$$

then a level-$\alpha$ test of KS type based on the residual partial sums process for testing (2) will reject $H_0$ if and only if $t_n \ge c_{1-\alpha}$, where $c_{1-\alpha}$ is a constant that satisfies the equation

$$P_{H_0}\left\{ \max_{0 \le \ell, k \le n} \left| \frac{1}{n\hat\sigma_n} T_n(\mathrm{vec}(R_{n\times n}))(\ell/n, k/n) \right| \ge c_{1-\alpha} \right\} = \alpha.$$

Hence, by the continuous mapping theorem (Billingsley, 1968, pp. 29-30), $c_{1-\alpha}$ can be approximated by means of the distribution of $\sup_{0 \le t,s \le 1} |B_f(t,s)|$.
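To make the statistic concrete: evaluated at the lattice point $(\ell/n, k/n)$, the partial sums operator returns the sum of all residuals with indices at most $(\ell, k)$, so on the lattice it is a two-dimensional cumulative sum. The sketch below uses a simulated matrix as a stand-in for $R_{n\times n}$; the dimensions and scale are assumptions.

```python
import numpy as np

# Stand-in residual matrix; in the paper this would come from the projection.
rng = np.random.default_rng(1)
n = 20
R = rng.normal(0.0, 0.5, (n, n))
sigma_hat = 0.5

# Two-dimensional cumulative sums give T_n(vec(R)) at the lattice points.
partial_sums = R.cumsum(axis=0).cumsum(axis=1)

# KS_{n,f}: maximum absolute partial sum, scaled by n * sigma_hat.
ks = np.abs(partial_sums).max() / (n * sigma_hat)
```

The last entry of the cumulative-sum array equals the sum of all residuals, which is a convenient check that the partial sums are accumulated in the intended order.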
As is apparent from (4), the limiting distribution of the KS type statistic under $H_0$ is not mathematically tractable, because it depends on the structure of the design and of the regression functions. Therefore the application of the preceding test procedure is limited. Although the application of the bootstrap appears natural in the present context, to the knowledge of the author this topic has not found much attention in the literature. The purpose of the present paper is to demonstrate that the Efron residual-based bootstrap (Shao and Tu, 1995) provides a simple and reliable alternative for constructing the asymptotic critical region of such a KS type test. This will be discussed in Section 2. In Section 3 we develop a Monte Carlo simulation to investigate the finite sample characteristics of the bootstrap test. An application of the developed method to real data will be presented in Section 4. Finally we close the paper in Section 5, presenting conclusions and some remarks for future research.
II. Bootstrap Approximation
Let $r_{\ell k}$ be the component of the residual matrix in the $k$-th row and $\ell$-th column, computed based on the original observations $Y_{n\times n}$ under $H_0$, $1 \le \ell, k \le n$. We define an $n \times n$ matrix of centred residuals

$$\hat R_{n\times n} := (r_{\ell k} - \bar r)_{k=1,\ell=1}^{n,n}, \quad \text{where } \bar r := \frac{1}{n^2} \sum_{k=1}^{n} \sum_{\ell=1}^{n} r_{\ell k}.$$

Further, let $F_{\hat R}$ be the empirical distribution function of the components of $\hat R_{n\times n}$, putting mass $1/n^2$ on $r_{\ell k} - \bar r$, $1 \le \ell, k \le n$, i.e., for $x \in \mathbb{R}$,

$$F_{\hat R}(x) := \frac{1}{n^2} \sum_{k=1}^{n} \sum_{\ell=1}^{n} 1_{(-\infty, x]}(r_{\ell k} - \bar r),$$

where $1_A$ is the indicator function of $A$.
The Efron residual-based bootstrap defines the bootstrap observations by

$$Y^*_{n\times n} := (Y^*_{\ell k})_{k=1,\ell=1}^{n,n} := \hat g(\Xi_n) + E^*_{n\times n}, \tag{5}$$

where $E^*_{n\times n} := (\varepsilon^*_{\ell k})_{k=1,\ell=1}^{n,n}$ is a random matrix whose components are a random sample from $F_{\hat R}$, and

$$\hat g(\Xi_n) := \sum_{i=1}^{p} \frac{\langle f_i(\Xi_n), Y_{n\times n}\rangle_{n\times n}}{\langle f_i(\Xi_n), f_i(\Xi_n)\rangle_{n\times n}} \, f_i(\Xi_n)$$

is the ordinary least squares estimator of $g(\Xi_n)$ under $H_0$. Let $E^*$ and $\mathrm{var}^*$ be the expectation and variance operators conditioned on $F_{\hat R}$. Then by construction it is clear that $E^*(\varepsilon^*_{\ell k}) = 0$ and

$$\hat\sigma_n^{*2} := \mathrm{var}^*(\varepsilon^*_{\ell k}) = \frac{n^2 - p}{n^2} \hat\sigma_n^2 - (\bar r)^2,$$

which converges in probability to $\sigma^2$ by the weak law of large numbers and Slutsky's theorem.
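The conditional variance identity above can be checked numerically: the variance of the empirical distribution of the centred residuals is the mean of their squares, which equals $\frac{n^2-p}{n^2}\hat\sigma_n^2 - (\bar r)^2$. The sketch below uses a simulated residual matrix and $p = 3$ as illustrative assumptions.

```python
import numpy as np

# Stand-in residual matrix; in the paper this is the least squares residual
# matrix R_{nxn} computed under H_0.
rng = np.random.default_rng(2)
n, p = 15, 3
r = rng.normal(0.0, 0.7, (n, n))
rbar = r.mean()
centred = r - rbar                        # components of R_hat_{nxn}

# One bootstrap error matrix: an i.i.d. sample from the EDF F_{R_hat}.
E_star = rng.choice(centred.ravel(), size=(n, n), replace=True)

# var*(eps*) is the variance of the EDF, and it satisfies
# var* = ((n^2 - p)/n^2) * sigma_hat^2 - rbar^2.
sigma2_hat = (r ** 2).sum() / (n * n - p)
var_star = (centred ** 2).mean()
identity = (n * n - p) / (n * n) * sigma2_hat - rbar ** 2
```

The two quantities agree to floating-point precision, confirming the algebra behind $\hat\sigma_n^{*2}$.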
The bootstrap analogue of the residual partial sums process is given by

$$\frac{1}{n\hat\sigma_n^*} T_n(\mathrm{vec}(R^*_{n\times n}))(\cdot) = \frac{1}{n\hat\sigma_n^*} T_n\big( \mathrm{vec}(E^*_{n\times n}) - X_n (X_n^t X_n)^{-1} X_n^t \, \mathrm{vec}(E^*_{n\times n}) \big)(\cdot). \tag{6}$$

Hence, the bootstrap version of the KS type statistic can be represented by

$$KS^*_{n,f} := \max_{0 \le \ell, k \le n} \left| \frac{1}{n\hat\sigma_n^*} T_n(\mathrm{vec}(R^*_{n\times n}))(\ell/n, k/n) \right|. \tag{7}$$
Theorem 1
Suppose that the regression functions $f_1, \ldots, f_p$ are linearly independent in the space of square integrable functions with respect to $\lambda_I$, denoted by $L_2(\lambda_I)$, continuous, and of bounded variation on $I$. Then under $H_0$ it holds that

$$\frac{1}{n\hat\sigma_n^*} T_n(\mathrm{vec}(R^*_{n\times n}))(\cdot) \xrightarrow{D} B_f(\cdot) \ \text{in } C(I), \ \text{as } n \to \infty.$$
Proof
For any rectangle $A := [t_1, t_2] \times [s_1, s_2] \subset I$, $t_1 \le t_2$, $s_1 \le s_2$, and any real-valued function $f$ on $I$, the increment of $f$ over $A$ is defined by

$$\Delta_A f := f(t_2, s_2) - f(t_2, s_1) - f(t_1, s_2) + f(t_1, s_1).$$

Further, for a fixed natural number $n$, let us define operators $U_n : C(I) \to \mathbb{R}^{n^2}$ and $O_n : C(I) \to C(I)$, such that for any $u \in C(I)$ and $(t,s) \in I$,

$$U_n(u) := \mathrm{vec}\big( (\Delta_{I_{\ell k}} u)_{k=1,\ell=1}^{n,n} \big), \tag{8}$$

$$O_n(u)(t,s) := T_n\big( U_n(u) - X_n (X_n^t X_n)^{-1} X_n^t U_n(u) \big)(t,s), \tag{9}$$

where $I_{\ell k} := [(\ell-1)/n, \ell/n] \times [(k-1)/n, k/n]$, $1 \le \ell, k \le n$. It is obvious that $O_n$ is linear, by the linearity of $T_n$. Moreover, since for every $a \in \mathbb{R}^{n^2}$, $U_n(T_n(a)) = a$, it follows that $O_n$ is idempotent. It can also be shown, by the expression

$$O_n(u)(t,s) = T_n(U_n(u))(t,s) - \sum_{i=1}^{p} \frac{\langle \mathrm{vec}(f_i(\Xi_n)), U_n(u)\rangle}{\langle \mathrm{vec}(f_i(\Xi_n)), \mathrm{vec}(f_i(\Xi_n))\rangle} \, T_n(\mathrm{vec}(f_i(\Xi_n)))(t,s)$$

and the definition of the norm of an operator on a Banach space (Conway, 1985, p. 70), that

$$\|O_n\| \le 1 + 4 \sum_{i=1}^{p} \frac{\|f_i\|_\infty^2}{\inf_{0 \le t,s \le 1} f_i(t,s)^2}.$$
Thus, $O_n$ is uniformly bounded on $C(I)$, which directly implies the continuity of $O_n$ on $C(I)$; here $\|\cdot\|_\infty$ is the supremum norm. Let us define an operator $O : C(I) \to C(I)$, such that for every $u \in C(I)$ and $(t,s) \in I$,

$$O(u)(t,s) := \Delta_{[0,t]\times[0,s]} u - \int_{[0,t]\times[0,s]} \mathbf{f}^t \, d\lambda_I \; G^{-1} \, (R)\!\int_I \mathbf{f} \, du.$$
The right-hand side of (9) can also be written as

$$O_n(u)(t,s) = T_n(U_n(u))(t,s) - \frac{1}{n^2} \mathbf{1}^t_{\lfloor nt\rfloor \lfloor ns\rfloor} X_n \Big( \frac{1}{n^2} X_n^t X_n \Big)^{-1} X_n^t U_n(u),$$

where $\mathbf{1}_{\lfloor nt\rfloor \lfloor ns\rfloor} \in \mathbb{R}^{n^2}$ is the vector whose first $\lfloor nt\rfloor \lfloor ns\rfloor$ components are one and the remainder are zero. By the definition and the continuity of $u$, we have

$$T_n(U_n(u))(t,s) \to \Delta_{[0,t]\times[0,s]} u, \quad (t,s) \in I. \tag{10}$$

Furthermore, since for $i = 1, \ldots, p$, $f_i$ is bounded on $I$, it holds component-wise that

$$\frac{1}{n^2} X_n^t \mathbf{1}_{\lfloor nt\rfloor \lfloor ns\rfloor} \to \Big( \int_{[0,t]\times[0,s]} f_1 \, d\lambda_I, \ldots, \int_{[0,t]\times[0,s]} f_p \, d\lambda_I \Big)^t. \tag{11}$$
The component-wise convergence

$$\Big( \frac{1}{n^2} X_n^t X_n \Big)^{-1} \to G^{-1} = \Big( \big( \textstyle\int_I f_i f_j \, d\lambda_I \big)_{i=1,j=1}^{p,p} \Big)^{-1} \tag{12}$$

follows from the fact that the mapping $B \mapsto B^{-1}$ is continuous on the space of invertible matrices. Further, since for $i = 1, \ldots, p$, $f_i$ has bounded variation and $u$ is continuous on $I$, the Riemann-Stieltjes integral of $f_i$ with respect to $u$ exists for all $i$. Consequently, we have the component-wise convergence

$$X_n^t U_n(u) \to \Big( (R)\!\int_I f_1 \, du, \ldots, (R)\!\int_I f_p \, du \Big)^t. \tag{13}$$

Thus, from (10), (11), (12) and (13), for every $u \in C(I)$ it holds that $\|O_n(u) - O(u)\|_\infty \to 0$, as $n \to \infty$.

Next let $(u_n)_{n \ge 1}$ be any sequence in $C(I)$ such that $\|u_n - u\|_\infty \to 0$. Then, since $O_n$ is bounded on $C(I)$, we finally have

$$\|O_n(u_n) - O(u)\|_\infty \le \|O_n\| \, \|u_n - u\|_\infty + \|O_n(u) - O(u)\|_\infty \to 0.$$

The last result shows that the set of $u \in C(I)$ for which there exists a sequence $(u_n)_{n \ge 1}$ converging to $u$ in $C(I)$ while $(O_n(u_n))_{n \ge 1}$ does not converge to $O(u)$ is empty. Hence, by Theorem 5.5 in Billingsley (1968) and Theorem 4 in Park (1971), we get

$$\frac{1}{n\hat\sigma_n^*} T_n(\mathrm{vec}(R^*_{n\times n})) = O_n\Big( \frac{1}{n\hat\sigma_n^*} T_n(\mathrm{vec}(E^*_{n\times n})) \Big) \xrightarrow{D} O(B_2), \ \text{as } n \to \infty.$$

The proof of the theorem is complete, since $B_2(t,s) = 0$ almost surely if $t = 0$ or $s = 0$, so that $O(B_2) = B_f$. $\blacksquare$
Corollary 1
By the continuous mapping theorem, under $H_0$ we have

$$KS^*_{n,f} \xrightarrow{D} \sup_{0 \le t,s \le 1} |B_f(t,s)|, \ \text{as } n \to \infty.$$

Hence, for $\alpha \in (0,1)$, the constant $c_{1-\alpha}$ defined in Section 1 can now be approximated by the constant $c^*_{1-\alpha}$ that satisfies the equation $P^*\{ KS^*_{n,f} \ge c^*_{1-\alpha} \} = \alpha$. If $t_n$ is a realisation of $KS_{n,f}$, then an asymptotic level-$\alpha$ KS type test will reject $H_0$ if $t_n \ge c^*_{1-\alpha}$. We will illustrate in Section 3 the finite sample properties of this approach by means of a simulation study.
Remark 3
The extension of Theorem 1 to the case of a $p \ge 3$ dimensional unit cube and an experimental design given by an $n_1 \times n_2 \times \cdots \times n_p$ regular lattice with $n_i \ne n_j$ for $i \ne j$ is straightforward.

III. Simulation
We develop a Monte Carlo simulation for constructing the finite sample critical region of the bootstrap level-$\alpha$ test of the hypotheses

$$H_0 : g \in W \ \text{ vs. } \ K : g \in W', \tag{15}$$

where $W := [f_1, f_2, f_3]$ and $W' := [f_1, f_2, f_3, f_4, f_5, f_6]$, with $f_1, \ldots, f_6 : I \to \mathbb{R}$ defined by $f_1(t,s) := 1$, $f_2(t,s) := t$, $f_3(t,s) := s$, $f_4(t,s) := t^2$, $f_5(t,s) := ts$, and $f_6(t,s) := s^2$. The simulation is developed according to the following algorithm.
Begin Algorithm
Step 1: Generate an $n \times n$ matrix of pseudo-random errors $E_{n\times n}$ from a distribution having mean zero and variance $\sigma^2$.
Step 2: Compute the least squares residuals by the equation $\mathrm{vec}(R_{n\times n}) = \mathrm{vec}(E_{n\times n}) - X_n (X_n^t X_n)^{-1} X_n^t \, \mathrm{vec}(E_{n\times n})$.
Step 3: Generate an $n \times n$ matrix of bootstrap errors $E^*_{n\times n}$ by sampling with replacement from the centred residuals.
Step 4: Compute the bootstrap residuals by the equation $\mathrm{vec}(R^*_{n\times n}) = \mathrm{vec}(E^*_{n\times n}) - X_n (X_n^t X_n)^{-1} X_n^t \, \mathrm{vec}(E^*_{n\times n})$.
Step 5: Compute $\hat\sigma_n^*$ and $KS^*_{n,f}$.
Step 6: Repeat Step 1 to Step 5 $M$ times.
Step 7: Compute $c^*_{1-\alpha}$: sort the $M$ values of $KS^*_{n,f}$ obtained in Step 5 in ascending order, i.e., $KS^{*(1)}_{n,f} \le KS^{*(2)}_{n,f} \le \cdots \le KS^{*(M)}_{n,f}$; then

$$c^*_{1-\alpha} = \begin{cases} KS^{*(M(1-\alpha))}_{n,f}, & \text{if } M(1-\alpha) \in \mathbb{N}, \\[2pt] KS^{*(\lfloor M(1-\alpha)\rfloor + 1)}_{n,f}, & \text{if } M(1-\alpha) \notin \mathbb{N}, \end{cases}$$

where $\mathbb{N}$ is the set of natural numbers.
End Algorithm
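The algorithm above can be sketched as follows, for a hypothetical configuration (n = 10, M = 200, a first-order polynomial design under $H_0$, and normal errors); all of these settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, M, alpha = 10, 200, 0.05

# Design matrix under H_0 for f_1 = 1, f_2 = t, f_3 = s on the lattice.
grid = [(l / n, k / n) for k in range(1, n + 1) for l in range(1, n + 1)]
X = np.array([[1.0, t, s] for (t, s) in grid])
H = X @ np.linalg.solve(X.T @ X, X.T)                # hat matrix

def ks_statistic(res_vec, scale):
    # Max over lattice points of |T_n(vec(R))| / scale, where T_n acts as a
    # two-dimensional cumulative sum on the lattice.
    R = res_vec.reshape(n, n)
    return np.abs(R.cumsum(axis=0).cumsum(axis=1)).max() / scale

ks_star = np.empty(M)
for m in range(M):
    e = rng.normal(0.0, np.sqrt(0.5), n * n)         # Step 1
    r = e - H @ e                                    # Step 2
    centred = r - r.mean()
    e_star = rng.choice(centred, size=n * n)         # Step 3
    r_star = e_star - H @ e_star                     # Step 4
    sigma_star = centred.std()                       # sqrt of var*(eps*)
    ks_star[m] = ks_statistic(r_star, n * sigma_star)  # Step 5

ks_star.sort()                                       # Steps 6-7
idx = int(np.ceil(round(M * (1 - alpha), 10))) - 1   # order-statistic index
c_star = ks_star[idx]                                # bootstrap critical value
```

The `round(..., 10)` guards the case distinction of Step 7 against floating-point noise: when $M(1-\alpha)$ is a natural number the ceiling leaves it unchanged, and otherwise it rounds up to $\lfloor M(1-\alpha)\rfloor + 1$.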
Table 1. Simulated critical values $c^*_{1-\alpha}$ of the bootstrap KS test ($M = 500$).

  n   errors   alpha = 0.010   0.025   0.050   0.100   0.150   0.200
 30   df0              0.965   0.893   0.844   0.776   0.746   0.717
      df1              1.183   1.119   1.057   0.994   0.954   0.928
      df2              1.381   1.287   1.168   1.094   1.047   1.013
      df3              1.327   1.240   1.150   1.092   1.026   0.970
 40   df0              0.934   0.868   0.816   0.757   0.730   0.697
      df1              1.490   1.331   1.214   1.129   1.056   1.017
      df2              1.304   1.232   1.184   1.093   1.030   0.989
      df3              1.133   1.085   1.052   0.995   0.945   0.919
 50   df0              0.918   0.893   0.853   0.776   0.736   0.717
      df1              1.316   1.234   1.165   1.081   1.023   0.993
      df2              1.291   1.175   1.147   1.086   1.045   1.000
      df3              1.336   1.220   1.173   1.098   1.047   1.014
For the simulation we generate the error variables as i.i.d. random variables having mean zero and variance 0.5. Four cases are considered, namely $\varepsilon_{\ell k} \sim N(0, 0.5)$ and $\varepsilon_{\ell k}$ obtained from $\chi^2_\nu$ variables, $\nu = 1, 2, 3$, for $1 \le \ell, k \le n$. The simulation results are depicted in Table 1 for the sample sizes $n = 30, 40$ and $50$ and levels $\alpha$ = 1%, 2.5%, 5%, 10%, 15%, and 20%, with 500 replications. The entries in the rows named df0 are simulation results for which the errors are generated from $N(0, 0.5)$, whereas those in the rows named df$\nu$ are generated from the chi-square distribution having $\nu = 1, 2, 3$ degrees of freedom, centred and scaled such that $E(\varepsilon_{\ell k}) = 0$ and $\mathrm{var}(\varepsilon_{\ell k}) = 0.5$, for $1 \le \ell, k \le n$. We note that the generation of the random errors is restricted neither to a specific family of distributions nor to distributions having variance 0.5 only; the variance as well as the family of the distribution may vary, as long as they satisfy the conditions specified in Theorem 1.
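The chi-square error cases of Table 1 can be generated as in the following sketch; the standardisation (shift by the mean $\nu$, divide by the standard deviation $\sqrt{2\nu}$, rescale to variance 0.5) is the one implied by the text, while the seed and sample size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def chi2_errors(nu, size):
    # A chi^2_nu variable has mean nu and variance 2*nu; standardise it and
    # rescale so that the resulting errors have mean 0 and variance 0.5.
    z = rng.chisquare(nu, size)
    return (z - nu) / np.sqrt(2 * nu) * np.sqrt(0.5)

eps = chi2_errors(3, (50, 50))   # the df3 case on a 50 x 50 lattice
```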
IV. Example
As an example we consider the wheat-yield data (Mercer and Hall's data) presented in Cressie (1993), pp. 454-455. The data are yields of grain (in pounds) observed over a 25 x 20 lattice of plots, having 20 rows running from east to west and 25 columns running from north to south. The experiment consisted of giving the 500 plots the same treatment (presumably fertilizer, water, etc.), from which it can be inferred that the data are a realization of 500 independent random variables. As reported in Cressie (1993), the plots are assumed to be equally spaced, with each plot measuring 10.82 ft by 8.05 ft. Figure 2 presents the perspective plot of the data.

Observing Figure 2, we postulate under $H_0$ a first-order polynomial model as in Hypothesis (15). Since the variance is unknown, we use the consistent estimator $\hat\sigma^2_{nm}$. Calculated under $H_0$, the data give $\hat\sigma^2_{nm} = 0.1898$, where

$$\hat\sigma^2_{nm} := \frac{1}{nm - 3} \Big\| Y_{n\times m} - \sum_{i=1}^{3} \frac{\langle f_i(\Xi_{nm}), Y_{n\times m}\rangle_{n\times m}}{\langle f_i(\Xi_{nm}), f_i(\Xi_{nm})\rangle_{n\times m}} \, f_i(\Xi_{nm}) \Big\|^2_{n\times m}$$

and $\Xi_{nm} := \{ (\ell/25, k/20) : 1 \le \ell \le 25, \ 1 \le k \le 20 \}$. The functions $f_1$, $f_2$, and $f_3$ are defined as in Section 3. After computation, we get the value of the KS type statistic 1.5919, and the bootstrap approximation of the p-value of the test computed by simulation is $\hat p = 0.0001$. Thus it can be concluded that $H_0$ is rejected for all usual values of $\alpha$.
Figure 2. The perspective plot of Mercer and Hall’s data
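The bootstrap p-value used in the example is simply the proportion of bootstrap KS values at least as large as the observed statistic, as the following sketch shows; the bootstrap sample below is a simulated stand-in, not the actual Mercer-Hall computation, and only $t_n = 1.5919$ is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(5)
t_n = 1.5919                          # observed KS statistic from the text
ks_star = rng.rayleigh(0.45, 10000)   # stand-in for M bootstrap KS values

# p_hat: fraction of bootstrap KS values at least as large as t_n.
p_hat = (ks_star >= t_n).mean()
reject = p_hat < 0.05                 # decision at level alpha = 0.05
```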
V. Concluding Remarks
The Efron residual-based bootstrap approximation to the KS type statistic based on the least squares residual partial sums process of a homoscedastic spatial linear regression model is consistent. In a forthcoming paper we shall investigate the application of the Efron as well as the wild bootstrap in the case of heteroscedastic spatial linear regression models.

The experimental design considered so far is restricted to the regular lattice on the unit square $I$, since there the prerequisite conditions of the theory are well satisfied. In practice this situation is sometimes not reasonable; therefore we need to develop the theory in a more general setting.
VI. References
1. Billingsley, P., 1968, Convergence of Probability Measures, John Wiley & Sons, Inc., New York.
2. Bischoff, W. and Miller, F., 2000, Asymptotically Optimal Tests and Optimal Designs for Testing the Mean in Regression Models with Application to Change-Point Problems, Ann. Inst. Statist. Math., Vol. 52, 658-679.
3. Conway, J.B., 1985, A Course in Functional Analysis, Springer-Verlag, New York.
4. Cressie, N.A.C., 1993, Statistics for Spatial Data, Revised Edition, John Wiley & Sons, Inc., New York.
5. MacNeill, I.B., 1978, Properties of Sequences of Partial Sums of Polynomial Regression Residuals with Applications to Tests for Change of Regression at Unknown Times, Ann. Statist., Vol. 6, 422-433.
6. MacNeill, I.B. and Jandhyala, V.K., 1993, Change Point Methods for Spatial Data, in: Multivariate Environmental Statistics, eds. G.P. Patil and C.R. Rao, Elsevier Science Publishers B.V., 298-306.
7. Park, W.J., 1971, Weak Convergence of Probability Measures on the Function Space $C([0,1]^2)$, J. Multivariate Anal., Vol. 1, 433-444.
8. Shao, J. and Tu, D., 1995, The Jackknife and Bootstrap, Springer-Verlag, New York.
9. Stapleton, J.H., 1995, Linear Statistical Models, John Wiley & Sons, Inc., New York.
10. Xie, L. and MacNeill, I.B., 2006, Spatial Residual Processes and Boundary Detection, South African Statist. J., Vol. 40, No. 1, 33-53.
59
ON BOOTSTRAP METHOD
W. Somayasa
Mathematics Department of Haluoleo University, Kendari
email: [email protected]
Abstract
In this paper we propose Efron residual based-bootstrap approximation methods in asymptotic modelchecks for homoschedastic spatial linear regression models. It is shown that under some regularity
conditions given to the known regression functions the bootstrap version of the sequence of least squares
residual partial sums processes converges in distribution to a centred Gaussian process having sample
paths in the space of continuous functions on
I : 0,1
0,1
. Thus, Efron residual based-bootstrap is
a consistent approximation in the usual sense. The finite sample performance of the bootstrap level
Kolmogorov-Smirnov (KS) type test is also investigated by means of Monte Carlo simulation.
Key words: residual based-bootstrap, asymptotic model-check, homoschedastic spatial linear regression
models, partial sums, Gaussian process.
I.
Introduction
Practically the correctness of the assumed linear models is usually evaluated by analysing the
cumulative sums (CUSUM) of the least squares residuals. To this end a huge amount of literature is
available, see among others MacNeill (1978) or Bischoff and Miller (2000) for one dimensional
context. In the spatial context we refer the reader to MacNeill and Jandhyala (1993) and Xie and
MacNeill (2006).
To see the above mentioned problem in more detail let us consider an experiment performed
on an experimental region I :
0,1
0,1 under n 2 experimental conditions taken from a regular
lattice, given by
n
:
( / n, k / n ) : 1 , k
I , n 1.
n
(1)
Accordingly, we put together the observations carried out in
Yn
n
:
n ,n
Yk k 1, 1
( / n, k / n ) , and
product
A
n n
A, B
n n
n n
n n
, where for
is the space of
: trace(A t B)
trace(A t A) , for A, B
1 , k n , Yk
in an
n
is the observation at the point
n n real matrices furnished with the Euclidean inner
and
n n
the
corresponding
norm
given
. It is worth noting that for our model we take
experimental region without loss of generality instead of any compact subset of
f :I
, let
f(
n ):
f ( / n, k / n)
n n matrix
n ,n
k 1, 1
n n
, then we have Yn
n
g(
2
by
I as the
. For any
n)
where g is a true but unknown real valued regression function having bounded variation on
En n ,
I and
Model-Checks For Homoschedastic Spatial Linear Regression Model Based On Bootstrap Method
En
n
n ,n
k k 1, 1 is the corresponding matrix of independent and identically distributed random
:
errors having mean zero and variance
2
(0, ) , for 1 , k n . We are interested in testing
the hypotheses
H 0 : g(
where
fp (
n ),, f p (
Wn : f1 (
n)
n)
Wn , n 1 ,
n n
is a sub space of
(2)
n n
generated by f1 (
,
n),
n).
H0 ,
Under
Rn
Wn vs. K : g(
n)
n
: pr
Yn
Wn
the
pr
n
Wn
matrix
En
of
least
squares
(Stapleton, 1995), where
n
pr
Wn
residuals
is
given
by
is the orthogonal projector onto
the orthogonal complement of W n . Without loss of the generality we assume throughout this work
that f1 (
n),
, fp (
n ) is an orthogonal basis of
Wn . Hence, by the elementary linear algebra
we get under H 0 ,
p
Rn
n
: En
n
n ), E n n
fi (
i 1
vec
Equivalently by the
fi (
n n
n ), f i (
fi (
n)
n)
.
(3)
n n
operator, we have
X n (X nt X n ) 1 X nt vec(E n n ) ,
vec(R n n ) vec(E n n )
where X n is the design matrix of the model under H 0 , i.e., an n
is given by the n
2
is given by ˆ n
2
dimensional vector vec(f k (
n
p matrix whose k th column
k 1, , p . A consistent estimator for
2
2
Rn
:
n )) ,
2
n
2
n n
p
2
in the sense ˆ n converges in probability to
Suppose that the regression functions f 1 , ,
integrable on I with respect to the Lebesgue measure
2
, as
n
.
f p are linearly independent and squared
I , then under
H 0 , MacNeill and Jandhyala
(1993) and Xie and MacNeill (2006) showed that
1
n
Tn (vec(R n n ))( )
where for
D
Bf ( ) in C I , as n
( t , s) I ,
f td
B f ( t , s) : B 2 ( t , s)
0, t
and
,
Tn :
n2
C(I)
is
the
I
G
1
(R )
partial
fdB 2 ,
(4)
I
0 ,s
sums
operator
defined
in
Park
(1971).
Here
p, p
G:
fkf d
I
p p
I
, B 2 is the standard Brownian sheet having sample pats in C I
k 1, 1
(R )
and
stands for the Riemann-Stieltjes integral.
52
JIMT, Vol. 7, No.2, Nopember 2010 : 51 – 59
tn
If
is
a
realisation
of
the
Kolmogorov-Smirnov
(KS)
statistics
1
Tn (vec(R n n ))( / n, k / n ) , then a level test of KS type based on residual
n
partial sums processes for testing (2) will reject H 0 if and only if t n c1 , where c1
is a
KS n ,f : max
0 ,k n
constant that satisfies the equation
PH 0
max
0 ,k n
1
Tn ( vec(R n n ))( / n, k / n )
n
c1
.
Hence, by the continuous mapping theorem (Billingsley, 1968, p. 29-30), c1
by means of the distribution of
can be approximated
sup B f ( t , s) .
0 t ,s 1
As it is presented in (4), the limiting distribution of the KS type statistics under H 0 is not
mathematically tractable, because it depends on the structure of the designs and the regression
functions. Therefore the application of the preceding tests procedure is limited. It is the purpose of
the preceding paper to investigate the performance of bootstrap test of (2). Although the application
of the bootstrap appears naturally in the present context, to the knowledge of the author this topic
has not found much attention in the literature. One purpose of the present paper is to demonstrate
that Efron residual based-bootstrap (Shao and Tu, 1995) provides simple and reliable alternative for
constructing asymptotic critical region of such a KS type test. This will be discussed in the Section 2.
In Section 3 we develop Monte Carlo simulation in investigating the finite sample characteristics of
the bootstrap test. An application of the developed method to a real data will be presented in Section
4. Finally we close the paper in Section 5 presenting conclusion and some remarks for future
researches.
II.
Bootstrap Approximation
k th row and th column computed
under H 0 , 1 , k n . We define an n n matrix of
Let rk be the component of the residual matrix in the
based on the original observations Yn
centred residuals
R̂ n
n
:
rk
n
n ,n
k 1, 1 ,
r
where
r:
empirical distribution function of the components of
1 , k n , i.e., for x
of
,
FR̂ (x) :
1
n2
n n
1
k 1 1
,x
1
n2
R̂ n
(rk
n
n
k 1 1
n
rk . Further, let FR̂ be the
putting mass
1
n2
to rk
r ) , where 1 A is the indicator function
A . Efron residual based-bootstrap defines the bootstrap observations by
Yn* n : Y*k
where
E *n
n
n ,n
k 1, 1
* n ,n
k k 1, 1
:
g(
n)
E *n n ,
is a random matrix whose components are a random sample from
and
g(
53
r , for
n)
p
i 1
fi (
n ), Yn n
fi (
n ), f i (
n n
n)
fi (
n n
n)
(5)
FR̂
Model-Checks For Homoschedastic Spatial Linear Regression Model Based On Bootstrap Method
is the ordinary least squares estimator of g (
and variance operators conditioned on
ˆ n*2 : var * (
and
n2 p
n2
*
k )
ˆ n2
n
) under H 0 . Let E * and var * be the expectation
FR̂ . Then by the construction it is clear that E * (
(r ) 2 which clearly converges in probability to
2
*
k )
0
by the
famous weak law of large number and Slutsky’s theorem.
The bootstrap analogous of the residual partial sums process is given by
1
Tn ( vec(R *n n ))( )
n ˆ n*
1
Tn ( vec(E *n n ) X n (X nt X n ) 1 X nt vec(E *n n ))( ) .
n ˆ n*
(6)
Hence, the bootstrap version of the KS type statistics can be represented by
KS *n ,f :
1
Tn ( vec(R *n n ))( / n, k / n ) .
*
n ˆn
max
0 ,k n
(7)
Theorem 1
Suppose that the regression functions f 1 , ,
squared integrable functions with respect to
f p are linearly independent in the space of
I , denoted by
L 2 ( I ) , continuous and have bounded
variation on I. Then under H 0 it holds,
1
Tn ( vec(R *n n ))( )
n ˆ n*
D
Bf ( ) in C I , as n
.
Proof
A : t1, t 2 s1, s 2 I , t1 t 2 , s1
f on I , the increment of f over A is defined by
I f : f (t 2 , s 2 ) - f (t 2 , s1 ) f (t1, s 2 ) f (t1, s1 ) .
s 2 and any real-valued function
For any rectangle
Further, for a fixed natural number n, let us define operators
O n : C( I )
U n : C(I)
n2
and
C(I) , such that for any u C(I) and ( t , s) I ,
n ,n
Ik u k 1, 1
U n (u ) : vec
,
(8)
(9)
O n (u)(t, s) : Tn U n (u) X n (X nt X n ) 1 X nt U n (u) (t, s) ,
(k 1) / n, k / n , 1 , k n . It is obvious that O n is linear by
where I k : ( 1) / n , / n
n2
the linearity of Tn . Moreover, since for every a
, U n (Tn (a ))
a , it follows that O n is
idempotent. It can also be shown by the expression
O n (u )(t , s)
Tn ( U n (u ))(t , s)
p
vec(f i (
n )), U n ( u )
vec(f i (
i 1
Tn ( vec(f i (
n )), vec(f i (
n )))(t , s)
n ))
,
by the definition of norm of a operator on Banach space (Conway, 1985, p. 70), that
p
On
13 1 4
i 1
fi
2
2
inf f i ( t, s)
.
0 t ,s 1
54
JIMT, Vol. 7, No.2, Nopember 2010 : 51 – 59
Thus, O n is uniformly bounded on
where .
C(I) . This will directly imply to the continuity of O n on C(I) ,
is the supremum norm. Let us define an operator
O : C(I)
C(I) , such that for every
u C(I) and ( t, s) I ,
O(u )(t , s) :
0, t
0 ,s
f td
u
0, t
G
I
1
(R )
fdu .
0 ,s
I
The right hand side of (9) also can be written by
O n (u )(t , s) : Tn U n (u ) ( t , s)
1
n2
where
n2
nt ns
t
nt ns
1 t
X n X n X nt U n (u) ,
2
n
Xn
is the vector whose first nt ns
zero. By the definition and by the continuity of
n
Tn U n (u ) ( t , s)
0, t
Furthermore, since for
1
n2
t
nt ns
u,
u , we have
( t , s) I .
(10)
i 1, , p , f i is bounded on I, then it holds component wise
n
Xn
0 ,s
components are one and the reminder is
f 1d I , ,
0, t
0 ,s
fpd
0,t
I
.
(11)
0 ,s
The convergence component wise
1
p,p
1
1 t
XnXn
n2
n
f i f jd
G 1,
I
I
(12)
i 1, j 1
follows from the fact that the mapping B
B
1
is continuous on the space of invertible matrices.
i 1, , p , f i has bounded variation and u is continuous on I, the RiemannStieltjes integral of f i with respect to u exists for all i. Consequently, we have the following
Further, since for
convergence component wise
(R )
(R )
n
X nt U n (u)
f1du, , f p du .
Thus, from (10), (11), (12) and (13) , for every
Next let (u n ) n
bounded on
0
be any sequence in
1
(13)
I
I
u C(I) , it holds O n (u ) O(u )
C(I) such that u n
u
n
n
0.
0 , then since O n is
C(I) , we finally have
O n ( u n ) O( u )
On
un
u
O n (u ) O(u )
n
0.
C(I) that satisfies the condition there exists a
that converges to u in C(I) , but the sequence (O n (u n )) n 1 does not converge
The last result shows that the only subset of
sequence (u n ) n
to
(1971), we get
55
1
O(u ) is an empty set. Hence, by Theorem 5.5 in Billingsley (1968) and Theorem 4 in Park
Model-Checks For Homoschedastic Spatial Linear Regression Model Based On Bootstrap Method
1
Tn vec(R *n n )
n ˆ n*
On
1
Tn vec(E *n n )
n ˆ n*
D
.
O(B 2 ) , as n
The proof of the theorem is complete since B 2 ( t , s) 0 , almost surely if t 0 or s 0 .
Corollary 1
By the continuous mapping theorem, under H 0 we have
D
KS *n ,f
Sup B f ( t , s) , as n
.
0 t ,s 1
(0,1) , the constant c 1
Hence, for
constant
c1*
that satisfies the equation P
defined in page 4 can now be approximated by the
*
KS *n ,f
c1*
KS type test will reject H 0 if
then an asymptotic level
. If t n is a realisation of
tn
KS n,f ,
c1* . We will illustrate in Section 4
the finite sample properties of this approach by means of simulation study.
Remark 3.
The extension of Theorem 1 to the case of
design
III.
n 1 n 2 n p regular lattice with n i
p 3 dimensional unit cube and experimental
n j , for i
j is straightforward.
Simulation
We develop Monte Carlo simulation for constructing the finite sample sizes critical region of
the bootstrap level
W vs. K : g
H0 : g
where W :
test of the hypotheses
f1 , f 2 , f 3
W' ,
(15)
and W ' : f 1 , f 2 , f 3 , f 4 , f 5 , f 6 , f 1 , f 2 , f 3 , f 4 , f 5 , f 6 : I
, defined
by f 1 ( t , s) : 1 , f 2 ( t , s) : t , f 3 ( t , s) : s ,
f 4 ( t , s) : t 2 , f 5 ( t , s) : ts , and f 6 ( t, s) : s 2 . The
simulation is developed according to the following algorithm.
Begin Algorithm
n n matrix of pseudo random errors E n n from a distribution having mean zero
Step 1: Generate
and variance
2
.
Step 2: Compute the least squares residuals by the equation
vec(R n n ) vec(E n n ) X n (X nt X n ) 1 X nt vec(E n n ) .
Step 3: Generate
n n matrix of bootstrap errors E *n n by sampling with replacement from the
centred residuals.
Step 4: Compute the bootstrap residuals by the equation
vec(R *n n ) vec(E *n n ) X n (X nt X n ) 1 X nt vec(E *n n ) .
Step 5: Compute
ˆ n* and KS *n ,f .
Step 6: Repeat Step 1 to Step 5 M times.
Step 7: Compute
c1* : sort M values of KS *n ,f obtained in Step 5 in ascending order, i.e.,
1)
KS *(
n ,f
2)
KS *(
n ,f
M)
KS *(
n ,f ,
56
JIMT, Vol. 7, No.2, Nopember 2010 : 51 – 59
(1
KS *(M
n,f
then
α))
; if M (1 - α) Ν
c1*
,
(1 α) 1)
KS *(M
n,f
; if M (1 - α) Ν
where N is the set of natural numbers.
End Algorithm
n
30
40
50
M=500
0.010
0.025
0.050
df 0
0.965
0.893
0.844
df1
1.183
1.119
df 2
1.381
df 3
0.100
0.150
0.200
0.776
0.746
0.717
1.057
0.994
0.954
0.928
1.287
1.168
1.094
1.047
1.013
1.327
1.240
1.150
1.092
1.026
0.970
df 0
0.934
0.868
0.816
0.757
0.730
0.697
df1
1.490
1.331
1.214
1.129
1.056
1.017
df 2
1.304
1.232
1.184
1.093
1.030
0.989
df 3
1.133
1.085
1.052
0.995
0.945
0.919
df 0
0.918
0.893
0.853
0.776
0.736
0.717
df1
1.316
1.234
1.165
1.081
1.023
0.993
df 2
1.291
1.175
1.147
1.086
1.045
1.000
df 3
1.336
1.220
1.173
1.098
1.047
1.014
Table 1. Simulated rejection probabilities of the Kolmogorov-Smirnov bootstrap test.
For the simulation, we generate the error variables as i.i.d. random variables having mean zero and variance 0.5. Four cases are considered: $\varepsilon_{\ell k} \sim N(0, 0.5)$, and $\varepsilon_{\ell k}$ following a chi-square distribution with df = 1, 2, 3 degrees of freedom, centred and rescaled so that $E(\varepsilon_{\ell k}) = 0$ and $\operatorname{Var}(\varepsilon_{\ell k}) = 0.5$, for $1 \le \ell, k \le n$. The simulation results are depicted in Table 1 for the sample sizes n = 30, 40, 50 and 60 and levels $\alpha$ = 1%, 2.5%, 5%, 10%, 15% and 20%, with 500 replications. The entries in the row named df0 are the results for errors generated from $N(0, 0.5)$, whereas those in the rows named df1, df2 and df3 are for errors generated from the centred and rescaled chi-square distributions having 1, 2 and 3 degrees of freedom, respectively. We note that for the simulation the generation of the random errors is restricted neither to a specific family of distributions nor to distributions having variance 0.5 only. The variance as well as the family of the distribution may vary, as long as they satisfy the conditions specified in Theorem 1.
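The four error-generating cases can be sketched as follows; the helper name and the exact centring/rescaling of the chi-square draws are our reading of the text, not the paper's code.

```python
import numpy as np

def simulate_errors(case, size, rng=None):
    """Generate i.i.d. errors with mean 0 and variance 0.5.

    case 'df0' is N(0, 0.5); 'df1', 'df2', 'df3' draw chi-square variables
    with 1, 2, 3 degrees of freedom and centre/rescale them so that the
    mean is 0 and the variance is 0.5.
    """
    rng = np.random.default_rng(rng)
    if case == "df0":
        return rng.normal(0.0, np.sqrt(0.5), size=size)
    df = int(case[2:])                         # 'df1' -> 1, etc.
    x = rng.chisquare(df, size=size)           # mean df, variance 2*df
    return (x - df) * np.sqrt(0.5 / (2 * df))  # centre and scale to var 0.5
```

The chi-square cases keep the skewness of the original distribution while matching the first two moments of the normal case, which is what makes the comparison across the rows df0–df3 informative.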
IV. Example
As an example we consider the wheat-yield data (Mercer and Hall's data) presented in Cressie (1993), pp. 454-455. The data are yields of grain (in pounds) observed over a 25 x 20 lattice of plots, with 20 rows running from east to west and 25 columns running from north to south.
The experiment consists of giving the 500 plots the same treatment (presumably fertilizer, water, etc.), so that the data can be regarded as a realization of 500 independent random variables. As noted in Cressie (1993), the plots are assumed to be equally spaced, each plot measuring 10.82 ft by 8.05 ft. Figure 2 presents the perspective plot of the data.
Observing Figure 2, we postulate under $H_0$ a first order polynomial model as in Hypothesis (16). Since the variance is unknown, we use a consistent estimator $\hat{\sigma}^2_{nm}$. Calculated under $H_0$, the data give $\hat{\sigma}^2_{nm} = 0.1898$, where
$$\hat{\sigma}^2_{nm} = \frac{1}{nm-3}\,\Bigl\| Y_{n\times m} - \sum_{i=1}^{3} \frac{\langle f_i(\Xi_{nm}),\, Y_{n\times m}\rangle_{n\times m}}{\langle f_i(\Xi_{nm}),\, f_i(\Xi_{nm})\rangle_{n\times m}}\, f_i(\Xi_{nm}) \Bigr\|^2_{n\times m},$$
with $\Xi_{nm} := \{(\ell/25, k/20) : 1 \le \ell \le 25,\ 1 \le k \le 20\}$. The functions $f_1$, $f_2$ and $f_3$ are defined as in Section 4. After computation we get the value 1.5919 for the KS type test, and the bootstrap approximation of the p-value of the test, computed by simulation, is $\hat{p} = 0.0001$. Thus it can be concluded that $H_0$ is rejected for all suitable values of $\alpha$.
Figure 2. The perspective plot of Mercer and Hall’s data
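The estimator $\hat{\sigma}^2_{nm}$ can be sketched as follows; `sigma2_hat` is a hypothetical helper name, and the plain Euclidean inner product over the lattice is our assumption, since the paper's $\langle\cdot,\cdot\rangle_{n\times m}$ is defined in an earlier section.

```python
import numpy as np

def sigma2_hat(Y, fs):
    """Variance estimator: remove from Y its projection onto each f_i and
    divide the residual sum of squares by nm - 3 (nm - len(fs) in general).

    The one-at-a-time projections coincide with the full least squares fit
    only when the f_i are mutually orthogonal on the lattice, so for
    non-orthogonal regressors one would orthogonalize them first.
    """
    proj = sum((np.sum(f * Y) / np.sum(f * f)) * f for f in fs)
    resid = Y - proj
    return np.sum(resid ** 2) / (Y.size - len(fs))
```

For the Mercer and Hall grid, Y would be the 25 x 20 array of yields and the f_i the first order polynomial regressors evaluated on $\Xi_{nm}$ (orthogonalized, e.g. by Gram-Schmidt, before applying the formula).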
V. Concluding Remark

The Efron residual based-bootstrap approximation to the KS type statistic based on the least squares residual partial sums process of the homoschedastic spatial linear regression model is consistent. In a forthcoming paper we shall investigate the application of the Efron as well as the wild bootstrap in the case of heteroschedastic spatial linear regression models.

The experimental design considered so far is restricted to the regular lattice on the unit square I, for which the prerequisite conditions of the theory are well satisfied. In practice this situation is sometimes not reasonable; therefore the theory needs to be developed in a more general setting.
VI. References
1.
Billingsley, P., 1968, Convergence of Probability Measures, John Wiley & Sons, Inc, New York,
Singapore, Toronto.
2.
Bischoff, W. and Miller, F., 2000, Asymptotically Optimal Tests and Optimal Designs for Testing
the Mean in Regression Models with Application to Change-Point Problems, Ann.
Inst. Statist. Math. Vol. 52, 658-679.
3.
Conway, J.B., 1985, A Course in Functional Analysis, Springer-Verlag New York Inc., New York.
4.
Cressie, Noel A. C., 1993, Statistics for Spatial Data, Revised Edition, John Wiley & Sons, Inc,
New York, Singapore, Toronto.
5.
MacNeill, I.B., 1978, Properties of Sequences of Partial Sums of Polynomial Regression
Residuals with Applications to Tests for Change of Regression at Unknown Times ,
Ann. Statist. Vol. 6, 422-433.
6.
MacNeill, I.B. and Jandhyala, V.K., 1993, Change Point Methods for Spatial Data, Multivariate
Environmental Statistics, eds. by G.P. Patil and C.R. Rao, Elsevier Science
Publishers B.V., 298-306.
7. Park, W.J., 1971, Weak Convergence of Probability Measures on the Function Space C([0,1]²), J. of Multivariate Analysis, Vol. 1, 433-444.
8.
Shao, J. and Tu, D., 1995, The Jackknife and Bootstrap, Springer-Verlag, New York.
9. Stapleton, J.H., 1995, Linear Statistical Models, John Wiley & Sons Inc, New York, Toronto, Singapore.
10. Xie, L. and MacNeill, I.B., 2006, Spatial Residual Processes and Boundary Detection, South
African Statist. J. Vol. 40, no. 1, 33-53.