Journal of Econometrics 98 (2000) 1–25
The strength of evidence for unit
autoregressive roots and structural breaks:
A Bayesian perspective
John Marriott^a, Paul Newbold^b,*
^a Department of Mathematics, Statistics and Operational Research, Nottingham Trent University,
Nottingham NG1 4BU, UK
^b Department of Economics, University of Nottingham, Nottingham NG7 2RD, UK
Received 1 January 1998; received in revised form 1 October 1999
Abstract
Economic time series may be generated by a process with a unit autoregressive root,
and the generating process may exhibit an abrupt break in trend. It is well known that the
outcomes of classical tests for either one of these phenomena can be seriously influenced
when the presence of the other is ignored. Therefore, care is required in disentangling
evidence in the data supporting the two phenomena, and there is some question as to the
extent to which such disentanglement is feasible. We approach this question from
a Bayesian perspective, assessing the impact on the strength of evidence for each of the
phenomena in the presence of the other. © 2000 Elsevier Science S.A. All rights reserved.
JEL classification: C12; C15
Keywords: Bayesian analysis; Posterior odds; Structural breaks; Unit autoregressive
roots
* Corresponding author. Tel.: +44-115-951-5392; fax: +44-115-951-4159.
E-mail address: [email protected] (P. Newbold).
0304-4076/00/$ - see front matter © 2000 Elsevier Science S.A. All rights reserved.
PII: S0304-4076(99)00077-9
1. Introduction
In a seminal paper, Perron (1989) drew attention to the fact that Dickey–Fuller tests of the null hypothesis of a unit autoregressive root in the generating process of a time series could have very low power when the true process was stationary around a broken trend. Moreover, he demonstrated that incorporating the possibility of a trend break at a given point in time could have a dramatic impact on the outcome of unit root tests. Subsequently, many authors, including Christiano (1992) and Zivot and Andrews (1992), stressed the importance of endogenous rather than exogenous selection of a break date. Ideally, as indicated for example by Perron (1989, 1994), Banerjee et al. (1992) and Nunes et al. (1997), a complete analysis should permit the possibility of a break under the unit root specification as well as under trend stationarity. However, as noted by Vogelsang and Perron (1998), Dickey–Fuller-type tests that allow for a break at an unknown point in time can have unreliable size properties when there is a break under the null. Alternatively, as for example in Andrews (1993), one might want to test the null hypothesis of no trend break against the alternative of a break at an unknown point in time. Chu and White (1992) show that such tests can reject the null hypothesis far too often when the true generating process has a unit autoregressive root and no break. Further evidence on this phenomenon is provided by Nunes et al. (1995) and Bai (1998). The foregoing discussion suggests that it may be considerably difficult to disentangle evidence in the data on the unit root/stationarity dichotomy from evidence on the break/no-break dichotomy. Hendry and Neale (1991) provide an informal graphical illustration of this point.
In this paper we begin an exploration of the possibility of separating evidence on these two issues. Specifically, we adopt a Bayesian perspective and take the posterior odds of a particular specification as a measure of the strength of support in the data for that specification. We concentrate on the comparison of an ARIMA(p, 1, q) model with a stationary ARMA(p+1, q) model with unknown mean but no trend. In the latter case, we allow the possibility of a single abrupt change in mean at an unknown point in time, corresponding to an outlier in the series of first differences. Our model then takes the form of the additive outlier model analysed by Perron and Vogelsang (1992), though we treat (p, q) as fixed. A Bayesian analysis of this general model is provided in Section 2 of the paper. In Section 3 we specialise to the simplest possible case where, apart from the possibility of a break, the generating model is either a random walk with no drift or a stationary first-order autoregression with unknown mean. This allows us to conduct a simulation experiment to estimate what, on average, might be found for posterior probabilities of a unit autoregressive root and of a structural break under particular true model specifications. In this way it is possible to assess the impact of a break on the strength of evidence for unit root/stationarity, and the effect of the stochastic structure of the model on the strength of evidence for a break.
There is a substantial literature on the Bayesian analysis of the possibility of a unit autoregressive root, though practically all of it neglects the possibility of a structural break. Most of the issues involved are most easily discussed in terms of the first-order autoregressive model for the series $Y_t$,

$$(1-\phi L)(Y_t-\mu)=\varepsilon_t, \qquad (1)$$

where $L$ is the lag operator and $\varepsilon_t$ is zero-mean white noise, generally assumed to be Gaussian. The process is stationary for $|\phi|<1$, while $\phi=1$ corresponds to a random walk. The great majority of the classical theoretical and applied econometric literature seeks to distinguish between these two alternatives, and that is the problem approached here within the Bayesian paradigm. Interest in the Bayesian approach to this problem dates from Sims (1988) and Sims and Uhlig (1991). While several authors, including Phillips and Ploberger (1996) and Kim (1998), have addressed asymptotic properties of posterior distributions and associated decision rules, our interest here is in practical implementation where sample sizes need not be large. Bayesian approaches to this problem have been considered by many authors, including De Jong and Whiteman (1991), Phillips (1991), Poirier (1991), Schotman and Van Dijk (1991a,b), Koop (1992), Uhlig (1994a,b), Schotman (1994), Zivot (1994), Lubrano (1995) and Marriott and Newbold (1998).
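The Python sketch below simulates model (1) for a stationary value of $\phi$ and for the random walk $\phi=1$. It is illustrative only. The function `simulate_ar1`, the Gaussian shocks with standard deviation `sigma`, the stationary start for $|\phi|<1$, and the start $Y_0=\mu$ for the random walk are all assumptions made here. None of them are details taken from the paper.

```python
import random


def simulate_ar1(phi, mu, n, sigma=1.0, seed=0):
    """Simulate Y_0, ..., Y_n from (1 - phi L)(Y_t - mu) = e_t, e_t ~ N(0, sigma^2).

    For |phi| < 1 the deviation Y_0 - mu is drawn from the stationary
    distribution. For phi == 1 (random walk) we start at Y_0 = mu, an
    arbitrary choice, since mu is not a mean in that case.
    """
    rng = random.Random(seed)
    if abs(phi) < 1:
        # Stationary variance of an AR(1) deviation: sigma^2 / (1 - phi^2)
        dev = rng.gauss(0.0, sigma / (1.0 - phi ** 2) ** 0.5)
    elif phi == 1:
        dev = 0.0
    else:
        raise ValueError("only |phi| < 1 or phi == 1 are considered here")
    ys = [mu + dev]
    for _ in range(n):
        dev = phi * dev + rng.gauss(0.0, sigma)
        ys.append(mu + dev)
    return ys


stationary = simulate_ar1(phi=0.5, mu=10.0, n=200)
random_walk = simulate_ar1(phi=1.0, mu=10.0, n=200)
print(len(stationary), len(random_walk))  # 201 201
```

Plotting the two paths side by side shows the contrast at issue. The stationary path keeps returning towards $\mu$, while the random walk has no such tendency.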
Ignoring for the present the choice of prior, there are at least two distinct approaches to posterior odds calculations. In one of these, values of the autoregressive parameter $\phi$ of (1) are permitted to be greater than one, thus allowing the possibility of explosive models. Then, given the posterior density for $\phi$, the posterior probability that $\phi<1$ is taken as a measure of the strength of evidence for stationarity, though De Jong and Whiteman (1991) modify this somewhat by computing the probability that $\phi<\phi_0$ for some $\phi_0$ a little less than one. One objection to this approach is the doubt that economists truly hold much, if any, prior belief in explosivity. Certainly, in the classical non-Bayesian Dickey–Fuller-test-based approach to the problem, $\phi=1$ is assumed when the null hypothesis is not rejected in favour of the alternative of stationarity, and it is rare indeed to find explosive models fitted in practical applications. In line with the classical approach, Marriott and Newbold (1998) attach point prior probability mass to $\phi=1$ and permit only stationary alternatives. Much of the Bayesian unit root literature has been devoted to the choice of prior for the parameters of (1). The first issue concerns the parameter $\mu$, which is the mean of the process under stationarity but is undefined under $\phi=1$, leading to difficulties noted by Schotman and Van Dijk (1991b). Moreover, Schotman (1994), in a paper reviewing much of the literature on the choice of prior for this problem, notes that prior assumptions on the dependence of $\phi$ on $\mu$ can strongly influence conclusions about the unit root question. However, as argued in Marriott and Newbold (1998), we find it inconsistent to hold a proper prior for $\mu$ under stationarity while simultaneously attaching non-zero prior probability to the random walk ($\phi=1$) model, where $\mu$ is undefined. It is therefore tempting under stationarity to adopt an uninformative uniform improper prior for $\mu$. However, as discussed by O'Hagan (1995) in a general framework and by Schotman and Van Dijk (1991b) in the present context, when improper priors are used for parameters occurring in one model but not the other, posterior odds ratios are undefined. Marriott and Newbold (1998) circumvent this difficulty by analysing the series of first differences. They show that this involves no loss of information about the autoregressive parameter under stationarity compared with the analysis of levels based on an improper prior on the mean. We shall follow the same strategy in this paper. Of course, in differencing, all information about the mean is lost, though in the context of the present paper we employ a proper prior on the size of any mean shift, so that a posterior density for that parameter could be derived.
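The Python sketch below illustrates why first differences sidestep the prior on $\mu$. Two stationary AR(1) paths driven by the same shocks but with very different means produce the same differenced series. The helper names and the AR(1) set-up are assumptions chosen for illustration. The sketch only shows that $\mu$ drops out; it does not reproduce the argument about the information retained on $\phi$.

```python
import random


def simulate_levels(phi, mu, n, seed=0):
    """Stationary AR(1) levels Y_0..Y_n around mean mu with unit-variance shocks."""
    rng = random.Random(seed)
    dev = rng.gauss(0.0, 1.0 / (1.0 - phi ** 2) ** 0.5)
    ys = [mu + dev]
    for _ in range(n):
        dev = phi * dev + rng.gauss(0.0, 1.0)
        ys.append(mu + dev)
    return ys


def first_differences(ys):
    """W_t = (1 - L) Y_t for t = 1, ..., n."""
    return [ys[t] - ys[t - 1] for t in range(1, len(ys))]


# Same seed, so the same shocks; only the mean differs.
w_low = first_differences(simulate_levels(0.7, mu=0.0, n=50))
w_high = first_differences(simulate_levels(0.7, mu=1000.0, n=50))
print(max(abs(a - b) for a, b in zip(w_low, w_high)) < 1e-9)  # True
```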
It remains to consider the specification of a prior for the autoregressive parameter $\phi$ of (1). Our approach is flexible in the sense that we feel it is useful to allow this to represent an analyst's genuine prior beliefs, recognising that the choice of prior can then exert strong influence on posterior inference. We shall adopt the 'purist' view that such prior belief cannot logically depend on the data, or even on the number of observations. This would eliminate from consideration the approach of Schotman and Van Dijk (1991a). These authors first demonstrated that, in the random walk versus stationary autoregression case, posterior odds inevitably depend on the prior, so that no prior on $\phi$ can be truly uninformative on this matter. They then produced an ingenious Bayesian analysis of a simple variant of the problem that coincided with the classical Dickey–Fuller analysis. However, achieving this requires a heavily data-dependent prior. A great deal of attention, for example by Berger and Yang (1994) and Uhlig (1994a), has been given to the derivation of the 'exact' Jeffreys prior. Two difficulties are that this prior depends on the number of observations and is improper. It would seem more reasonable to follow Jeffreys (1967), Box and Jenkins (1970) and others in adopting the limiting case as the number of observations becomes infinite, giving a prior proportional to $(1-\phi^2)^{-1/2}$. However, as Zellner (1997) points out, Jeffreys himself viewed as 'not tolerable' the implication of a singularity at $\phi=-1$. Zellner further shows that application of the maximal data information prior (MDIP) approach in this case leads to a prior proportional to $(1-\phi^2)^{1/2}$. This would be inconvenient for our purposes, given that it approaches zero as $\phi$ approaches one, while we wish to attach non-zero prior probability mass to $\phi=1$. Note, however, that if the domain of $\phi$ is extended so that both stationary and explosive models are allowed, but restricted to a finite range of parameter values, application of the MDIP approach leads to a uniform prior.
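To give a numerical feel for the two limiting priors just mentioned, the sketch below evaluates normalised versions of $(1-\phi^2)^{-1/2}$ and $(1-\phi^2)^{1/2}$ on $(-1,1)$. The normalising constants $1/\pi$ and $2/\pi$ follow from $\int_{-1}^{1}(1-\phi^2)^{-1/2}\,d\phi=\pi$ and $\int_{-1}^{1}(1-\phi^2)^{1/2}\,d\phi=\pi/2$. The function names are hypothetical. The printed values show the Jeffreys-type density growing without bound as $\phi\to 1$, while the MDIP density shrinks towards zero there.

```python
import math


def jeffreys_limit_density(phi):
    """Limiting Jeffreys prior (1 - phi^2)^(-1/2) on (-1, 1), normalised by 1/pi."""
    return 1.0 / (math.pi * math.sqrt(1.0 - phi * phi))


def mdip_density(phi):
    """MDIP prior (1 - phi^2)^(1/2) on (-1, 1), normalised by 2/pi."""
    return 2.0 / math.pi * math.sqrt(1.0 - phi * phi)


for phi in (0.0, 0.9, 0.99, 0.999):
    print(f"phi={phi:5.3f}  Jeffreys={jeffreys_limit_density(phi):7.3f}  "
          f"MDIP={mdip_density(phi):5.3f}")
```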
In this paper we adopt two approaches. First, as a benchmark, we consider a prior for $\phi$ that is uniform on $(-1, 1)$. This at least has the virtue of ease of interpretation. Moreover, as illustrated in Fig. 2 and the associated discussion of Uhlig (1994b), in the region $(-1, 1)$ it does not differ radically from alternatives that have been proposed. That is not the case outside that region, which has led to a critique of the uniform prior by Phillips (1991) when explosive models are permitted. Phillips argues strongly for the Jeffreys prior, demonstrating that its adoption can lead to large, and desirable, differences in posterior inference. It is ironic that this outcome is achieved by placing additional prior probability on $\phi>1$, a region in which there is little evidence that economists have genuine prior belief. This is, of course, an inevitable consequence of taking the posterior probability of $\phi<1$ as a measure of the strength of evidence for stationarity. Second, we follow Marriott and Newbold (1998) in arguing that imposing point prior probability mass at $\phi=1$ would logically be associated with the prior belief that $\phi$ is much more likely to lie in a region close to one than in a comparable range far from one. These authors propose the beta prior with density

$$p(\phi)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)\,2^{a+b-1}}\,(1+\phi)^{a-1}(1-\phi)^{b-1}, \qquad |\phi|<1, \qquad (2)$$

with $b=0.5$, giving, as seems desirable in this application, a singularity at $\phi=1$. The larger the parameter $a$ in (2), the more tightly the prior density is concentrated towards $\phi=1$, and the analyst can select this parameter according to belief. For example, as emphasised by Sims (1988) and Geweke (1994), all else being equal, the higher the frequency of observation, the larger one would expect the autoregressive parameter to be. Marriott and Newbold (1998) obtained simulation results with attractive properties using $a=5$, and for illustration we shall use this value here for comparison with the uniform prior.
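Prior (2) is a rescaled beta distribution: $(1+\phi)/2$ follows a Beta$(a,b)$ law. With $a=b=1$ it therefore reduces to the uniform benchmark on $(-1,1)$. The Python sketch below evaluates (2) and computes prior probabilities $P(\phi>\phi_0)$ with a midpoint rule. It first substitutes $\phi=1-t^2$, which removes the integrable singularity at $\phi=1$ when $b=0.5$. The function names and quadrature settings are illustrative choices. The defaults $a=5$, $b=0.5$ are the values used in the text.

```python
import math


def beta_prior_density(phi, a=5.0, b=0.5):
    """Density (2) on (-1, 1); equivalently (1 + phi)/2 ~ Beta(a, b)."""
    if not -1.0 < phi < 1.0:
        return 0.0
    log_const = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 - (a + b - 1.0) * math.log(2.0))
    return math.exp(log_const + (a - 1.0) * math.log1p(phi)
                    + (b - 1.0) * math.log1p(-phi))


def prior_prob_above(phi0, a=5.0, b=0.5, steps=20000):
    """P(phi > phi0) by the midpoint rule after substituting phi = 1 - t^2.

    With b = 0.5 the factor (1 - phi)^(-1/2) = 1/t cancels against the
    Jacobian 2t, so the transformed integrand is smooth on (0, sqrt(1 - phi0)).
    """
    upper = math.sqrt(1.0 - phi0)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += beta_prior_density(1.0 - t * t, a, b) * 2.0 * t
    return total * h


for phi0 in (0.5, 0.9, 0.99):
    # Under the uniform prior on (-1, 1), P(phi > phi0) = (1 - phi0) / 2.
    print(phi0, round(prior_prob_above(phi0), 3), "uniform:", (1.0 - phi0) / 2.0)
```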
In the remainder of this paper we allow for the possibilities of both a unit autoregressive root and a structural break. Although this seems a natural problem for a Bayesian analysis based on posterior odds, relatively little work along these lines has been reported. An exception is De Jong (1996), whose analysis is in the same vein as De Jong and Whiteman (1991), taking the posterior probability of $\phi<0.98$ as a measure of the evidence for stationarity. Further, by contrast with our approach, point probability mass is not put on the possibility of no break.
2. Bayesian model selection for an ARMA generating process
Assume that the underlying generating process is ARMA, with the orders of the autoregressive and moving average polynomial operators taken as given. Attention will be restricted to the case where, under stationarity with no breaks, the generating model has unknown mean but no trend, though extension to stationarity around a linear trend is quite straightforward. There is uncertainty about the presence of a unit autoregressive root in the model and also about whether there is a break in mean under stationarity. Moreover, if such a break exists, its location is uncertain. Within this framework we seek posterior probabilities for each of four possible structures, covering stationarity/unit autoregressive root and break/no break.
Let $Y_t$ $(t=0, 1, \ldots, n)$ denote an observed time series. As one possible generating process, we consider the ARIMA(p, 1, q) model

$$\alpha(L)(1-L)Y_t=\theta(L)\varepsilon_t, \qquad (3)$$

where $\varepsilon_t$ is zero-mean white noise with variance $\sigma^2$,

$$\alpha(L)=1-\alpha_1L-\cdots-\alpha_pL^p, \qquad \theta(L)=1-\theta_1L-\cdots-\theta_qL^q,$$

and $L$ is the lag operator. It is assumed that $(p, q)$ are given, that the conditions for stationarity and invertibility are satisfied, and that $\alpha(L)$ and $\theta(L)$ have no common factors. The stationary alternative to (3) is the ARMA(p+1, q) model

$$\Big[1-(\phi+\alpha_1)L-\sum_{j=2}^{p}(\alpha_j-\alpha_{j-1})L^j+\alpha_pL^{p+1}\Big](Y_t-\mu)=\theta(L)\varepsilon_t, \qquad (4)$$

which reduces to (3) when $\phi=1$. The two structural break alternatives we consider are

$$Y_t=\begin{cases}\mu_1+u_t, & t\le n_1,\\ \mu_2+u_t, & t\ge n_1+1,\end{cases} \qquad (5)$$

with $\alpha(L)(1-L)u_t=\theta(L)\varepsilon_t$, and (5) in conjunction with

$$\Big[1-(\phi+\alpha_1)L-\sum_{j=2}^{p}(\alpha_j-\alpha_{j-1})L^j+\alpha_pL^{p+1}\Big]u_t=\theta(L)\varepsilon_t.$$
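Read as printed, the autoregressive operator in (4) differs from $\alpha(L)(1-L)$ only in the coefficient of $L$. It equals $\alpha(L)(1-L)+(1-\phi)L$. Evaluated at $L=1$ it is therefore $1-\phi$, which is non-zero unless $\phi=1$. The Python sketch below checks this coefficient bookkeeping numerically. It assumes polynomials in $L$ are stored as coefficient lists $[c_0, c_1, \ldots]$. The helper names and the values of $\alpha_1, \alpha_2$ are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials in L given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def alpha_poly(alpha):
    """alpha(L) = 1 - alpha_1 L - ... - alpha_p L^p."""
    return [1.0] + [-a for a in alpha]


def ar_operator_eq4(phi, alpha):
    """AR operator of (4), for alpha = [alpha_1, ..., alpha_p]."""
    p = len(alpha)
    if p == 0:
        return [1.0, -phi]                       # the AR(1) operator of (1)
    coeffs = [1.0, -(phi + alpha[0])]
    for j in range(2, p + 1):                    # L^j terms, j = 2, ..., p
        coeffs.append(-(alpha[j - 1] - alpha[j - 2]))
    coeffs.append(alpha[p - 1])                  # + alpha_p L^{p+1}
    return coeffs


alpha = [0.5, -0.25]                             # hypothetical alpha_1, alpha_2
unit_root = poly_mul(alpha_poly(alpha), [1.0, -1.0])   # alpha(L)(1 - L)
print(ar_operator_eq4(1.0, alpha) == unit_root)  # True: (4) reduces to (3)
print(sum(ar_operator_eq4(0.8, alpha)))          # value at L = 1, i.e. 1 - phi up to rounding
```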
In what follows we adopt the approach of Marriott and Newbold (1998) and formulate the four models in terms of the first differences, $W_t=(1-L)Y_t$. In the case of the structural break models we will be working with

$$W_t=\delta X_t+(1-L)u_t,$$

where

$$X_{n_1+1}=1, \qquad X_t=0, \quad t\ne n_1+1, \qquad (6)$$

that is, $W_t$ has a single 'outlier', of magnitude $\delta=\mu_2-\mu_1$, at time $n_1+1$. The models that are to be analysed are therefore
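$$\begin{aligned}
M_1&:\ \alpha(L)W_t=\theta(L)\varepsilon_t,\\
M_2&:\ \Big[1-(\phi+\alpha_1)L-\sum_{j=2}^{p}(\alpha_j-\alpha_{j-1})L^j+\alpha_pL^{p+1}\Big]W_t=\theta(L)(1-L)\varepsilon_t,\\
M_3&:\ \alpha(L)(W_t-\delta X_t)=\theta(L)\varepsilon_t,\\
M_4&:\ \Big[1-(\phi+\alpha_1)L-\sum_{j=2}^{p}(\alpha_j-\alpha_{j-1})L^j+\alpha_pL^{p+1}\Big](W_t-\delta X_t)=\theta(L)(1-L)\varepsilon_t,
\end{aligned}$$

with $X_t$ defined in (6). The two unit root models are $M_1$ and $M_3$, and the structural break models are $M_3$ and $M_4$.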
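The Python sketch below checks the step from (5) to (6) numerically. Differencing a series whose level shifts from $\mu_1$ to $\mu_2$ after time $n_1$ leaves $(1-L)u_t$ plus a single spike of size $\delta=\mu_2-\mu_1$ at $t=n_1+1$. The disturbance path and parameter values are arbitrary hypothetical numbers, chosen as exact binary fractions so the printed comparison is exact. The helper `difference_with_level_shift` is likewise hypothetical.

```python
def difference_with_level_shift(u, n1, mu1, mu2):
    """W_t = Y_t - Y_{t-1}, t = 1..n, for Y_t built from u_0..u_n as in (5)."""
    y = [(mu1 if t <= n1 else mu2) + u_t for t, u_t in enumerate(u)]
    return [y[t] - y[t - 1] for t in range(1, len(y))]


u = [0.0, 0.5, -0.25, 0.75, 0.25, 1.0]           # hypothetical u_0, ..., u_5
n1, mu1, mu2 = 2, 1.0, 4.0
delta = mu2 - mu1

w = difference_with_level_shift(u, n1, mu1, mu2)
du = [u[t] - u[t - 1] for t in range(1, len(u))]              # (1 - L) u_t
x = [1.0 if t == n1 + 1 else 0.0 for t in range(1, len(u))]   # X_t from (6)

residual = [a - b for a, b in zip(w, du)]        # should equal delta * X_t
print(residual)                                  # [0.0, 0.0, 3.0, 0.0, 0.0]
print(residual == [delta * xt for xt in x])      # True
```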
Given a sample $W=(W_1, \ldots, W_n)$, the Bayesian comparison of the four models proceeds by computing the posterior model probabilities. These are given by Bayes' theorem:

$$P(M_i\mid W)=\frac{P(M_i)\,P(W\mid M_i)}{\sum_{j=1}^{4}P(M_j)\,P(W\mid M_j)}, \qquad (7)$$

where $P(M_i)$ is the prior probability assigned to model $M_i$. If we write $\gamma_i$ for the vector of ARMA parameters of model $M_i$, then

$$P(W\mid M_i)=\int_{\gamma_i}\int_0^{\infty}p(\gamma_i,\sigma\mid M_i)\,p(W\mid\gamma_i,\sigma,M_i)\,d\sigma\,d\gamma_i$$

gives the integrated joint densities of $(\gamma_i, \sigma, W)$ for $M_1$ and $M_2$, and

$$P(W\mid M_i)=\sum_{n_1}\int_{\gamma_i}\int_0^{\infty}\int_{-\infty}^{\infty}p(\gamma_i,\delta,\sigma,n_1\mid M_i)\,p(W\mid\gamma_i,\delta,\sigma,n_1,M_i)\,d\delta\,d\sigma\,d\gamma_i$$

gives the integrated joint densities of $(\gamma_i, \delta, \sigma, n_1, W)$ for $M_3$ and $M_4$. Here, $p(\gamma_i,\sigma\mid M_i)$ and $p(\gamma_i,\delta,\sigma,n_1\mid M_i)$ are the joint prior densities for the parameters, and $p(W\mid\gamma_i,\sigma,M_i)$ and $p(W\mid\gamma_i,\delta,\sigma,n_1,M_i)$ are the likelihoods. For the approach we are adopting here, we assume the four models are equally likely a priori, so that $P(M_i)=0.25$. This implies that the marginal prior probability for a unit root model is $P(M_1)+P(M_3)=0.5$.
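Given log marginal likelihoods $\log P(W\mid M_i)$, however they are obtained, (7) is a normalisation step. The Python sketch below carries it out on the log scale, because marginal likelihoods for long series can underflow ordinary floating point. It then forms the marginal posterior probability of a unit root, $P(M_1\mid W)+P(M_3\mid W)$, and of a break, $P(M_3\mid W)+P(M_4\mid W)$. The four log marginal likelihoods are made-up placeholders, not results from the paper, and `posterior_model_probs` is a hypothetical helper.

```python
import math


def posterior_model_probs(log_marglik, prior=None):
    """Bayes' theorem (7) from log marginal likelihoods log P(W | M_i).

    Subtracting the largest log term before exponentiating (log-sum-exp)
    keeps the computation stable when all values are very negative.
    """
    k = len(log_marglik)
    if prior is None:
        prior = [1.0 / k] * k                    # equal prior model probabilities
    logs = [math.log(p) + ll for p, ll in zip(prior, log_marglik)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log P(W | M_i) for M_1..M_4 (illustrative numbers only).
probs = posterior_model_probs([-512.3, -513.1, -514.0, -511.8])
p_unit_root = probs[0] + probs[2]                # P(M_1 | W) + P(M_3 | W)
p_break = probs[2] + probs[3]                    # P(M_3 | W) + P(M_4 | W)
print([round(p, 3) for p in probs], round(p_unit_root, 3), round(p_break, 3))
```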
The likelihoods can be determined through the Kalman filter or the algorithms of Newbold (1974) or Ansley (1979) and, for given $n_1$, take the form

$$p(W\mid\gamma_i,\delta,\sigma,M_i)=(2\pi\sigma^2)^{-n/2}\,|A_i|^{-1/2}\exp\Big\{-\frac{S(\gamma_i)}{2\sigma^2}\Big\}, \qquad (8)$$

where the elements of the matrix $A_i$ involve only $\phi$, $(\alpha_i)$ and $(\theta_i)$, and $S(\gamma_i)$ is quadratic in the $W_t$ or $(W_t-\delta X_t)$. We propose the following prior densities for the different models:
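$$\begin{aligned}
M_1&:\ p(\gamma_1,\sigma\mid M_1)=p(\gamma_1)\,p(\sigma),\\
M_2&:\ p(\gamma_2,\sigma\mid M_2)=p(\gamma_2)\,p(\sigma),\\
M_3&:\ p(\gamma_3,\delta,\sigma,n_1\mid M_3)=p(\gamma_3)\,p(\delta\mid\sigma)\,p(\sigma)\,p(n_1),\\
M_4&:\ p(\gamma_4,\delta,\sigma,n_1\mid M_4)=p(\gamma_4)\,p(\delta\mid\sigma)\,p(\sigma)\,p(n_1),
\end{aligned}$$

where $p(\gamma_i)$ is uniform, $p(\delta\mid\sigma)$ is $N(0, k^2\sigma^2)$, and $p(\sigma)$ is the non-informative density $p(\sigma)=\sigma^{-1}$. The prior density for $n_1$ is taken as the discrete uniform density $p(n_1)=n^{-1}$ for $n_1=0, \ldots, n-1$.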
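For $M_1$ and $M_2$ the joint densities of $(\gamma_i, \sigma, W)$ are given by

$$p(\gamma_i,\sigma,W\mid M_i)=(2\pi)^{-n/2}\,\frac{1}{\sigma^{n+1}}\,|A_i|^{-1/2}\exp\Big\{-\frac{S(\gamma_i)}{2\sigma^2}\Big\}\,p(\gamma_i).$$

Integrating this with respect to $\sigma$ gives

$$p(\gamma_i,W\mid M_i)=\frac{\Gamma(n/2)}{2\pi^{n/2}}\,|A_i|^{-1/2}\,[S(\gamma_i)]^{-n/2}\,p(\gamma_i), \qquad (9)$$

with $p(W\mid M_i)$ found by integrating out $\gamma_i$. For $M_3$ and $M_4$ the joint densities of $(\gamma_i, \delta, \sigma, n_1, W)$ are

$$p(\gamma_i,\delta,\sigma,n_1,W\mid M_i)=(2\pi)^{-(n+1)/2}\,\frac{1}{nk\sigma^{n+2}}\,|A_i|^{-1/2}\exp\Big\{-\frac{S(\gamma_i,n_1)}{2\sigma^2}-\frac{\delta^2}{2k^2\sigma^2}\Big\}\,p(\gamma_i). \qquad (10)$$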
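The $\sigma$-integration that leads to (9) rests on the identity $\int_0^\infty \sigma^{-(n+1)}e^{-S/(2\sigma^2)}\,d\sigma=\Gamma(n/2)\,2^{n/2-1}S^{-n/2}$. Multiplying by $(2\pi)^{-n/2}$ gives the constant $\Gamma(n/2)/(2\pi^{n/2})$. The Python sketch below first checks this identity by quadrature. It then evaluates $\log p(W\mid M_1)$ from (9) in the simplest case $p=q=0$. There $M_1$ is a driftless random walk in levels, $W_t=\varepsilon_t$, no ARMA parameters need integrating out, $A_1$ is the identity and $S=\sum_t W_t^2$. Function names are hypothetical. General ARMA cases would need $|A_i|$ and $S(\gamma_i)$ from a Kalman filter or an exact-likelihood algorithm, which is not attempted here.

```python
import math


def sigma_integral_closed_form(n, s):
    """Closed form of int_0^inf sigma^-(n+1) exp(-s / (2 sigma^2)) d sigma."""
    return math.gamma(n / 2) * 2 ** (n / 2 - 1) * s ** (-n / 2)


def sigma_integral_numeric(n, s, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoid rule in x = log(sigma), using d sigma = sigma dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        sig = math.exp(lo + i * h)
        f = sig ** (-n) * math.exp(-s / (2 * sig * sig))
        total += f if 0 < i < steps else 0.5 * f
    return total * h


def log_marglik_random_walk(w):
    """log p(W | M_1) from (9) with p = q = 0: A_1 = I, S = sum of W_t^2."""
    n = len(w)
    s = sum(x * x for x in w)
    return (math.lgamma(n / 2) - math.log(2.0)
            - (n / 2) * math.log(math.pi) - (n / 2) * math.log(s))


print(sigma_integral_numeric(5, 3.0), sigma_integral_closed_form(5, 3.0))
print(log_marglik_random_walk([0.4, -1.1, 0.3, 0.9, -0.2]))
```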
Using Newbold (1974) it is straightforward to show that for these models $S(\gamma_i, n_1)=S(\gamma_i)-2\delta$
The strength of evidence for unit
autoregressive roots and structural breaks:
A Bayesian perspective
John Marriott!, Paul Newbold",*
!Department of Mathematics, Statistics and Operational Research, Nottingham Trent University,
Nottingham NG1 4BU, UK
"Department of Economics, University of Nottingham, Nottingham NG7 2RD, UK
Received 1 January 1998; received in revised form 1 October 1999
Abstract
Economic time series may be generated by a process with a unit autoregressive root,
and the generating process may exhibit an abrupt break in trend. It is well known that the
outcomes of classical tests for either one of these phenomena can be seriously in#uenced
when the presence of the other is ignored. Therefore, care is required in disentangling
evidence in the data supporting the two phenomena, and there is some question as to the
extent to which such disentanglement is feasible. We approach this question from
a Bayesian perspective, assessing the impact on the strength of evidence for each of the
phenomena in the presence of the other. ( 2000 Elsevier Science S.A. All rights
reserved.
JEL classixcation: C12; C15
Keywords: Bayesian analysis; Posterior odds; Structural breaks; Unit autoregressive
roots
* Corresponding author. Tel.: #44-115-951-5392; fax: #44-115-951-4159.
E-mail address: [email protected] (P. Newbold).
0304-4076/00/$ - see front matter ( 2000 Elsevier Science S.A. All rights reserved.
PII: S 0 3 0 4 - 4 0 7 6 ( 9 9 ) 0 0 0 7 7 - 9
2
J. Marriott, P. Newbold / Journal of Econometrics 98 (2000) 1}25
1. Introduction
In a seminal paper, Perron (1989) drew attention to the fact that Dickey}
Fuller tests of the null hypothesis of a unit autoregressive root in the generating
process of a time series could have very low power when the true process
was stationary around a broken trend. Moreover, it was demonstrated
that incorporating the possibility of a trend break at a given point in time
could have a dramatic impact on the outcome of unit root tests. Subsequently,
many authors, including Christiano (1992) and Zivot and Andrews (1992),
stressed the importance of endogenous rather than exogenous selection of
a break date. Ideally, as indicated for example by Perron (1989,1994),
Banerjee et al. (1992) and Nunes et al. (1997), a complete analysis should
permit the possibility of a break under the unit root speci"cation as well as
under trend stationarity. Then, as noted by Vogelsang and Perron (1998),
Dickey}Fuller-type tests, allowing for a break at an unknown point in time, can
have unreliable size properties when there is a break under the null. Alternatively, as for example in Andrews (1993), one might want to test the null
hypothesis of no trend break against the alternative of a break at an unknown
point in time. Chu and White (1992) show that such tests can reject the null
hypothesis far too often when the true generating process has a unit autoregressive root and no break. Further evidence on this phenomenon is provided by
Nunes et al. (1995) and Bai (1998). The foregoing discussion suggests that there
may be considerable di$culty in disentangling from data evidence on the
unit root/stationarity dichotomy from evidence on the break/no break dichotomy. Hendry and Neale (1991) provide informal graphical illustration of this
point.
In this paper we begin an exploration of the possibility of separating evidence
on these two issues. Speci"cally, we adopt a Bayesian perspective, and take the
posterior odds of a particular speci"cation as a measure of the strength of
support in the data for that speci"cation. We concentrate on the comparison of
an ARIMA(p, 1, q) model with a stationary ARMA(p#1, q) model with unknown mean, but no trend. In the latter case, we allow the possibility of a single
abrupt change in mean at an unknown point in time, corresponding to an
outlier in the series of "rst di!erences. Our model is then in the form of the
additive outlier model analysed by Perron and Vogelsang (1992), though we
treat (p, q) as "xed. A Bayesian analysis of this general model is provided in
Section 2 of the paper. In Section 3 we specialise to the simplest possible case
where, apart from the possibility a break, the generating model is either a random walk with no drift or a stationary "rst-order autoregression with unknown
mean. This allows us to conduct a simulation experiment to estimate what, on
average, might be found for posterior probabilities of a unit autoregressive root
and of a structural break under particular true model speci"cations. In this way
it is possible to assess the impact of a break on the strength of evidence for unit
J. Marriott, P. Newbold / Journal of Econometrics 98 (2000) 1}25
3
root/stationarity, and the e!ect of the stochastic structure of the model on the
strength of evidence for a break.
There is a substantial literature on the Bayesian analysis of the possibility of
a unit autoregressive root, though practically all of it neglects the possibility of
a structural break. Most of the issues involved are most easily discussed in terms
of the "rst-order autoregressive model for the series >
t
(1!/¸)(> !k)"e
(1)
t
t
where ¸ is the lag operator, and e is a zero-mean white noise, generally assumed
t
to be Gaussian. The process is stationary for D/D(1, while /"1 corresponds to
a random walk. The great majority of the classical theoretical and applied
econometric literature seeks to distinguish between these two alternatives, and
that is the problem approached here within the Bayesian paradigm. Interest in
the Bayesian approach to this problem dates from Sims (1988) and Sims and
Uhlig (1991). While several authors, including Phillips and Ploberger (1996) and
Kim (1998) have addressed asymptotic properties of posterior distributions, and
associated decision rules, our interest here is in practical implementation where
sample sizes need not be large. Bayesian approaches to this problem have been
considered by many authors, including De Jong and Whiteman (1991), Phillips
(1991), Poirier (1991), Schotman and Van Dijk (1991a,b), Koop (1992), Uhlig
(1994a,b), Schotman (1994), Zivot (1994), Lubrano (1995) and Marriott and
Newbold (1998).
Ignoring for the present the choice of prior, there are at least two distinct
approaches to posterior odds calculations. In one of these, values for the
autoregressive parameter / of (1) are permitted to be greater than one, thus
allowing the possibility of explosive models. Then, given the posterior density
for /, posterior probabilities for /(1 are taken as measures of the strength of
evidence for stationarity, though De Jong and Whiteman (1991) modify this
somewhat by computing the probability of /(/ for some / a little less than
0
0
one. One objection to this approach is doubt that economists truly have much,
if any, prior belief in explosivity. Certainly in the classical non-Bayesian
Dickey}Fuller-test-based approach to the problem /"1 is assumed when the
null hypothesis is not rejected in favour of the alternative of stationarity, and it is
rare indeed to "nd explosive models "tted in practical applications. In line with
the classical approach, Marriott and Newbold (1998) attach point prior probability mass to /"1 and permit only stationary alternatives. Much of the
Bayesian unit root literature has been devoted to the choice of prior for the
parameters of (1). The "rst issue concerns the parameter k, which is the mean of
the process under stationarity, but is unde"ned under /"1, leading to di$culties noted by Schotman and Van Dijk (1991b). Moreover Schotman (1994), in
a paper reviewing much of the literature on the choice of prior for this problem,
notes that prior assumptions on the dependence of / on k can strongly in#uence
conclusions about the unit root question. However, as argued in Marriott and
4
J. Marriott, P. Newbold / Journal of Econometrics 98 (2000) 1}25
Newbold (1998), we "nd it inconsistent to simultaneously hold a proper prior for
k under stationarity while attaching non-zero prior probability to the random
walk (/"1) model, where k is unde"ned. It is therefore tempting under
stationarity to adopt an uninformative uniform improper prior for k. However,
as discussed by O'Hagan (1995) in a general framework and by Schotman and
Van Dijk (1991b) in the present context, when improper priors are used for
parameters occurring in one model but not the other, posterior odds ratios are
unde"ned. Marriott and Newbold (1998) circumvent this di$culty by analysing
the series of "rst di!erences, showing that this involves no information loss
about the autoregressive parameter under stationarity compared with the analysis of levels based on an improper prior on the mean. We shall follow the same
strategy in this paper. Of course, in di!erencing, all information about the mean
is lost, though in the context of the present paper we employ a proper prior on
the amount of any mean shift amount, so that a posterior density for that
parameter could be derived.
It remains to consider the speci"cation of a prior for the autoregressive
parameter / of (1). Our approach is #exible in the sense that we feel it is useful to
allow this to represent an analyst's genuine prior beliefs, recognising that the
choice of prior can then exert strong in#uence on posterior inference. We shall
adopt the &purist' view that such prior belief cannot logically depend on the data,
or even on the number of observations. This would eliminate from consideration
the approach of Schotman and Van Dijk (1991a). Following a demonstration of
the inevitability of the dependence of posterior odds on the prior in the random
walk versus stationary autoregression case } so that no prior on / can be truly
uninformative on this matter } these authors produced an ingenious Bayesian
analysis of a simple variant of the problem that coincided with the classical
Dickey}Fuller analysis. However, to achieve this requires a heavily datadependent prior. A great deal of consideration, for example Berger and Yang
(1994) and Uhlig (1994a), has been given to the derivation of the &exact'
Je!reys prior. Two di$culties are that this prior depends on the number of
observations and is improper. It would seem more reasonable to follow
Je!reys (1967), Box and Jenkins (1970), and others in adopting the limiting
case as the number of observations becomes in"nite, giving a prior proportional to (1!/2)~1@2. However, as Zellner (1997) points out, Je!reys
himself viewed as ¬ tolerable' the implication of a singularity at /"!1.
Zellner further shows that application of the maximal data information prior
approach (MDIP) in this case leads to a prior proportional to (1!/2)1@2.
This would be inconvenient for our purposes given that it approaches
zero as / approaches one, while we wish to attach non-zero prior probability mass to /"1. Note however that if the domain of / is extended so that
both stationary and explosive models are allowed, but restricting to a "nite
range of parameter values, application of the MDIP approach leads to a
uniform prior.
J. Marriott, P. Newbold / Journal of Econometrics 98 (2000) 1}25
5
In this paper we adopt two approaches. First, as a benchmark we consider
a prior for / that is uniform on (!1, 1). This at least has the virtue of ease of
interpretation, while as illustrated in Fig. 2 and the associated discussion of
Uhlig (1994b), in the region (!1, 1) it does not di!er radically from alternatives
that have been proposed. That is not the case outside of that region, leading to
a critique of the uniform prior by Phillips (1991) when explosive models are
permitted. Then Phillips argues strongly for the Je!reys prior, demonstrating
that its adoption can lead to large, and desirable, di!erences in posterior
inference. It is ironic that this outcome is achieved by placing additional prior
probability on /'1, a region in which there is little evidence that economists
have genuine prior belief. This is, of course, an inevitable consequence of taking
the posterior probability of /(1 as a measure of the strength of evidence for
stationarity. Second, we follow Marriott and Newbold (1998) in arguing that
imposition of point prior probability mass at /"1 would logically be associated with prior belief that the probability of / in a region close to one would be
much greater than the probability of / in a comparable range far from one.
These authors propose the beta prior with density
C(a#b)
(1#/)a~1(1!/)b~1, D/D(1
p(/)"
C(a)C(b)2a`b~1
(2)
with b"0.5, giving, as seems desirable in this application, a singularity at /"1.
The larger is the parameter a in (2), the more tightly concentrated towards /"1
is the prior density, and the analyst can select this parameter according to belief.
For example, as emphasised by Sims (1988) and Geweke (1994), all else equal the
higher the frequency of observation, the larger would one expect the autoregressive parameter to be. Marriott and Newbold (1998) obtained simulation results
with attractive properties using a"5, and for illustration we shall use this value
here for comparison with the uniform prior.
In the remainder of this paper we allow for the possibilities of both a unit
autoregressive root and a structural break. Although this seems like a natural
problem for Bayesian analysis, generating posterior odds, relatively little work
along these lines has been reported. An exception is De Jong (1996), whose
analysis is in the same vein as De Jong and Whiteman (1991), taking the
posterior probability of $\phi < 0.98$ as a measure of the evidence for stationarity.
Further, by contrast with our approach, point probability mass is not put on the
possibility of no break.
2. Bayesian model selection for an ARMA generating process
Assume that the underlying generating process is ARMA, with the orders of
the autoregressive and moving average polynomial operators taken as given.
Attention will be restricted to the case where, under stationarity with no breaks,
the generating model has unknown mean but no trend, though extension to
stationarity around a linear trend is quite straightforward. There is uncertainty
about the presence of a unit autoregressive root in the model and also about
whether there is a break in mean under stationarity. Moreover, if such a break
exists, its location is uncertain. Within this framework we seek posterior probabilities for each of four possible structures allowing for stationarity/unit
autoregressive root and break/no break.
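Let $Y_t$ $(t = 0, 1, \ldots, n)$ denote an observed time series. As one possible
generating process, we consider the ARIMA(p, 1, q) model

$$\alpha(L)(1-L)Y_t = \theta(L)\varepsilon_t, \tag{3}$$

where $\varepsilon_t$ is zero-mean white noise with variance $\sigma^2$,

$$\alpha(L) = 1 - \alpha_1 L - \cdots - \alpha_p L^p, \qquad \theta(L) = 1 - \theta_1 L - \cdots - \theta_q L^q,$$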
and $L$ is the lag operator. It is assumed that $(p, q)$ are given, that the conditions
for stationarity and invertibility are satisfied, and that $\alpha(L)$ and $\theta(L)$ have no
common factors. The stationary alternative to (3) is the ARMA(p + 1, q) model

$$\Big[1 - (\phi + \alpha_1)L - \sum_{j=2}^{p} (\alpha_j - \phi\alpha_{j-1})L^j + \phi\alpha_p L^{p+1}\Big](Y_t - \mu) = \theta(L)\varepsilon_t, \tag{4}$$
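which reduces to (3) when $\phi = 1$. The two structural break alternatives we
consider are

$$Y_t = \begin{cases} \mu_1 + u_t, & t \le n_1, \\ \mu_2 + u_t, & t \ge n_1 + 1, \end{cases} \tag{5}$$

$$\alpha(L)(1-L)u_t = \theta(L)\varepsilon_t,$$

and (5) in conjunction with

$$\Big[1 - (\phi + \alpha_1)L - \sum_{j=2}^{p} (\alpha_j - \phi\alpha_{j-1})L^j + \phi\alpha_p L^{p+1}\Big]u_t = \theta(L)\varepsilon_t.$$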
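The autoregressive operator in (4) is the product $(1 - \phi L)\alpha(L)$ written out term by term, which is why setting $\phi = 1$ recovers the operator $(1 - L)\alpha(L)$ of (3). The Python sketch below checks this by multiplying the lag polynomials directly and comparing with the expanded coefficients. The helper names and the coefficient values are hypothetical, and the expanded form is coded for $p \ge 1$ only.

```python
import math


def poly_mul(p, q):
    """Multiply lag polynomials stored as coefficient lists [c0, c1, ...] in powers of L."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def alpha_poly(alphas):
    """alpha(L) = 1 - alpha_1 L - ... - alpha_p L^p."""
    return [1.0] + [-a for a in alphas]


def stationary_ar_poly(phi, alphas):
    """AR operator of the ARMA(p + 1, q) model (4), formed as (1 - phi L) alpha(L)."""
    return poly_mul([1.0, -phi], alpha_poly(alphas))


def expanded_form(phi, alphas):
    """Coefficients read term by term off the expanded operator in (4); assumes p >= 1."""
    p = len(alphas)
    if p < 1:
        raise ValueError("the expanded form in (4) is written for p >= 1")
    coeffs = [1.0, -(phi + alphas[0])]
    for j in range(2, p + 1):
        # alphas[j - 1] is alpha_j and alphas[j - 2] is alpha_{j-1}
        coeffs.append(-(alphas[j - 1] - phi * alphas[j - 2]))
    coeffs.append(phi * alphas[p - 1])
    return coeffs


if __name__ == "__main__":
    alphas = [0.4, -0.2]  # illustrative alpha_1, alpha_2
    for phi in (0.9, 1.0):
        a, b = stationary_ar_poly(phi, alphas), expanded_form(phi, alphas)
        print(phi, all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(a, b)))
    # phi = 1 gives (1 - L) alpha(L), the AR side of the ARIMA(p, 1, q) model (3)
    print(stationary_ar_poly(1.0, alphas) == poly_mul([1.0, -1.0], alpha_poly(alphas)))
```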
In what follows we adopt the approach of Marriott and Newbold (1998) and
formulate the four models in terms of the first differences, $W_t = (1-L)Y_t$. In the
case of the structural break models we will be working with

$$W_t = \delta X_t + (1-L)u_t,$$

where

$$X_t = \begin{cases} 1, & t = n_1 + 1, \\ 0, & t \ne n_1 + 1, \end{cases} \tag{6}$$
that is, $W_t$ has a single ‘outlier’, of magnitude $\delta = \mu_2 - \mu_1$, at time $n_1 + 1$. The
models that are to be analysed are therefore
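$$
\begin{aligned}
M_1&:\ \alpha(L)W_t = \theta(L)\varepsilon_t,\\
M_2&:\ \Big[1 - (\phi + \alpha_1)L - \sum_{j=2}^{p} (\alpha_j - \phi\alpha_{j-1})L^j + \phi\alpha_p L^{p+1}\Big]W_t = \theta(L)(1-L)\varepsilon_t,\\
M_3&:\ \alpha(L)(W_t - \delta X_t) = \theta(L)\varepsilon_t,\\
M_4&:\ \Big[1 - (\phi + \alpha_1)L - \sum_{j=2}^{p} (\alpha_j - \phi\alpha_{j-1})L^j + \phi\alpha_p L^{p+1}\Big](W_t - \delta X_t) = \theta(L)(1-L)\varepsilon_t,
\end{aligned}
$$

with $X_t$ defined in (6). The two unit root models are $M_1$ and $M_3$, and the
structural break models are $M_3$ and $M_4$.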
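The link between the break in mean (5) and the single outlier (6) in the differences can be checked numerically: differencing a level shift of size $\mu_2 - \mu_1$ leaves $W_t - (1-L)u_t$ equal to $\delta$ at $t = n_1 + 1$ and zero elsewhere. In the Python sketch below the disturbance values $u_t$ are arbitrary illustrative numbers, not data from the paper, and the helper names are hypothetical.

```python
import math


def break_in_mean(u, mu1, mu2, n1):
    """Model (5): Y_t = mu1 + u_t for t <= n1 and mu2 + u_t for t >= n1 + 1, t = 0, ..., n."""
    return [(mu1 if t <= n1 else mu2) + u_t for t, u_t in enumerate(u)]


def first_difference(y):
    """(1 - L) applied to y_0, ..., y_n; element k of the result is the value at t = k + 1."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]


def outlier_indicator(n, n1):
    """X_t of (6) for t = 1, ..., n: one at t = n1 + 1 and zero elsewhere."""
    return [1.0 if t == n1 + 1 else 0.0 for t in range(1, n + 1)]


if __name__ == "__main__":
    u = [0.3, -0.1, 0.4, 0.2, -0.5, 0.1]  # illustrative u_0, ..., u_5, so n = 5
    mu1, mu2, n1 = 1.0, 3.0, 2
    w = first_difference(break_in_mean(u, mu1, mu2, n1))
    du = first_difference(u)
    x = outlier_indicator(len(u) - 1, n1)
    # W_t - (1 - L)u_t should equal delta * X_t with delta = mu2 - mu1
    print(all(math.isclose(wt - dut, (mu2 - mu1) * xt, abs_tol=1e-12)
              for wt, dut, xt in zip(w, du, x)))
```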
Given a sample $W = (W_1, \ldots, W_n)$, the Bayesian comparison of the four
models proceeds by computing the posterior model probabilities, which are
given by Bayes' theorem

$$P(M_i \mid W) = \frac{P(M_i)\,P(W \mid M_i)}{\sum_{j=1}^{4} P(M_j)\,P(W \mid M_j)}, \tag{7}$$
where $P(M_i)$ is the prior probability assigned to model $M_i$. If we write $\gamma_i$ for the
vector of ARMA parameters of model $M_i$, then

$$P(W \mid M_i) = \int_{\gamma_i}\int_0^{\infty} p(\gamma_i, \sigma \mid M_i)\,p(W \mid \gamma_i, \sigma, M_i)\,d\sigma\,d\gamma_i$$
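gives the integrated joint densities of $(\gamma_i, \sigma, W)$ for $M_1$ and $M_2$, and

$$P(W \mid M_i) = \sum_{n_1}\int_{\gamma_i}\int_0^{\infty}\int_{-\infty}^{\infty} p(\gamma_i, \delta, \sigma, n_1 \mid M_i)\,p(W \mid \gamma_i, \delta, \sigma, n_1, M_i)\,d\delta\,d\sigma\,d\gamma_i$$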
gives the integrated joint densities of $(\gamma_i, \delta, \sigma, n_1, W)$ for $M_3$ and $M_4$. Here,
$p(\gamma_i, \sigma \mid M_i)$ and $p(\gamma_i, \delta, \sigma, n_1 \mid M_i)$ are the joint prior densities for the parameters,
and $p(W \mid \gamma_i, \sigma, M_i)$ and $p(W \mid \gamma_i, \delta, \sigma, n_1, M_i)$ are the likelihoods. For the
approach we are adopting here we assume the four models are equally likely
a priori, so that $P(M_i) = 0.25$, implying that the marginal prior probability for
a unit root model is $P(M_1) + P(M_3) = 0.5$.
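Once the marginal likelihoods $P(W \mid M_i)$ are available, (7) is only a normalisation, but in practice they are far too small to handle directly, so the computation is best done on the log scale with the log-sum-exp device. The Python sketch below assumes the log marginal likelihoods have already been computed; the function names are hypothetical and the numbers in the usage line are made up for illustration. It also forms the marginal posterior probabilities of a unit root, $P(M_1 \mid W) + P(M_3 \mid W)$, and of a break, $P(M_3 \mid W) + P(M_4 \mid W)$.

```python
import math


def posterior_model_probs(log_marglik, prior=None):
    """Posterior model probabilities (7) from log P(W | M_i), via log-sum-exp."""
    m = len(log_marglik)
    if prior is None:
        prior = [1.0 / m] * m  # equal prior probabilities: 0.25 each when m = 4
    log_num = [math.log(p) + lm for p, lm in zip(prior, log_marglik)]
    top = max(log_num)  # subtract the largest term so exp() cannot underflow to all zeros
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [wt / total for wt in weights]


def unit_root_and_break_probs(post):
    """Marginal posterior probabilities: unit root = M1 + M3, break = M3 + M4."""
    return post[0] + post[2], post[2] + post[3]


if __name__ == "__main__":
    # Hypothetical log marginal likelihoods for M1, ..., M4 (not results from the paper).
    post = posterior_model_probs([-312.4, -313.1, -311.9, -314.6])
    print([round(p, 3) for p in post], unit_root_and_break_probs(post))
```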
1
3
The likelihoods can be determined through the Kalman filter or the algorithms of
Newbold (1974) or Ansley (1979) and, for given $n_1$, take the form

$$p(W \mid \gamma_i, \delta, \sigma, M_i) = (2\pi\sigma^2)^{-n/2}\,|A_i|^{-1/2}\exp\left\{-\frac{S(\gamma_i)}{2\sigma^2}\right\}, \tag{8}$$
where the elements of the matrix $A_i$ will involve only $\phi$ and the coefficients of $\alpha(L)$ and $\theta(L)$, and $S(\gamma_i)$ is
quadratic in the $W_t$ or $(W_t - \delta X_t)$. We propose the following prior densities for
the different models:
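$$
\begin{aligned}
M_1&:\ p(\gamma_1, \sigma \mid M_1) = p(\gamma_1)\,p(\sigma),\\
M_2&:\ p(\gamma_2, \sigma \mid M_2) = p(\gamma_2)\,p(\sigma),\\
M_3&:\ p(\gamma_3, \delta, \sigma, n_1 \mid M_3) = p(\gamma_3)\,p(\delta \mid \sigma)\,p(\sigma)\,p(n_1),\\
M_4&:\ p(\gamma_4, \delta, \sigma, n_1 \mid M_4) = p(\gamma_4)\,p(\delta \mid \sigma)\,p(\sigma)\,p(n_1),
\end{aligned}
$$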
where $p(\gamma_i)$ is uniform, $p(\delta \mid \sigma)$ is $N(0, k^2\sigma^2)$ and $p(\sigma)$ is the non-informative density
$p(\sigma) = \sigma^{-1}$. The prior density for $n_1$ is taken as the discrete uniform density
$p(n_1) = n^{-1}$ for $n_1 = 0, \ldots, n-1$.
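Multiplying the likelihood (8) by these priors, with $p(\gamma_i)$ held fixed, produces the constant $(2\pi)^{-(n+1)/2}/(n k \sigma^{n+2})$ that appears in the joint density (10) for $M_3$ and $M_4$: one factor $(2\pi)^{-1/2}(k\sigma)^{-1}$ from $p(\delta \mid \sigma)$, one $\sigma^{-1}$ from $p(\sigma)$ and one $n^{-1}$ from $p(n_1)$. The Python sketch below checks this bookkeeping on the log scale. It treats $S$ and $\log|A_i|$ as given numbers and sets $p(\gamma_i) = 1$; both are assumptions made only for the check, and the function names are hypothetical.

```python
import math


def log_likelihood(n, sigma, S, log_det_A):
    """log of (8): (2 pi sigma^2)^(-n/2) |A_i|^(-1/2) exp{-S / (2 sigma^2)}."""
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - 0.5 * log_det_A - S / (2.0 * sigma ** 2))


def log_prior_break(delta, sigma, n, k):
    """log p(delta | sigma) + log p(sigma) + log p(n1) for M3 and M4.

    p(delta | sigma) is N(0, k^2 sigma^2), p(sigma) = 1 / sigma, p(n1) = 1 / n.
    The uniform p(gamma_i) is set to 1 here, an assumption made only for this check.
    """
    log_p_delta = (-0.5 * math.log(2.0 * math.pi) - math.log(k * sigma)
                   - delta ** 2 / (2.0 * k ** 2 * sigma ** 2))
    return log_p_delta - math.log(sigma) - math.log(n)


def log_joint_eq10(delta, sigma, n, k, S, log_det_A):
    """log of the joint density (10), again with p(gamma_i) = 1."""
    return (-0.5 * (n + 1) * math.log(2.0 * math.pi) - math.log(n * k)
            - (n + 2) * math.log(sigma) - 0.5 * log_det_A
            - S / (2.0 * sigma ** 2) - delta ** 2 / (2.0 * k ** 2 * sigma ** 2))


if __name__ == "__main__":
    # Illustrative values only; S stands for S(gamma_i, n1) at some gamma_i and delta.
    args = dict(delta=0.7, sigma=1.3, n=50, k=2.0)
    lhs = log_likelihood(50, 1.3, S=41.2, log_det_A=0.8) + log_prior_break(**args)
    rhs = log_joint_eq10(S=41.2, log_det_A=0.8, **args)
    print(math.isclose(lhs, rhs))
```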
For $M_1$ and $M_2$ the joint densities of $(\gamma_i, \sigma, W)$ are given by

$$p(\gamma_i, \sigma, W \mid M_i) = (2\pi)^{-n/2}\,\frac{1}{\sigma^{n+1}}\,|A_i|^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}S(\gamma_i)\right\}p(\gamma_i).$$
Integrating this with respect to $\sigma$ gives

$$p(\gamma_i, W \mid M_i) = \frac{\Gamma(n/2)}{2\pi^{n/2}}\,|A_i|^{-1/2}\,[S(\gamma_i)]^{-n/2}\,p(\gamma_i), \tag{9}$$
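with $p(W \mid M_i)$ found by integrating out $\gamma_i$. For $M_3$ and $M_4$ the joint densities for
$(\gamma_i, \delta, \sigma, n_1, W)$ are

$$
\begin{aligned}
p(\gamma_i, \delta, \sigma, n_1, W \mid M_i) ={}& (2\pi)^{-(n+1)/2}\,\frac{1}{n k \sigma^{n+2}}\,|A_i|^{-1/2}\\
&\times\exp\left\{-\frac{1}{2\sigma^2}S(\gamma_i, n_1) - \frac{1}{2k^2\sigma^2}\delta^2\right\}p(\gamma_i).
\end{aligned}
\tag{10}
$$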
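The step from the joint density for $M_1$ and $M_2$ to (9) rests on the integral $\int_0^\infty \sigma^{-(n+1)}\exp\{-S/(2\sigma^2)\}\,d\sigma = \tfrac12\Gamma(n/2)\,2^{n/2}S^{-n/2}$, obtained by substituting $v = S/(2\sigma^2)$. The Python sketch below compares the closed form (9) with a trapezoidal-rule integral over $s = \log\sigma$, taking $p(\gamma_i) = 1$ and illustrative values of $n$ and $S$; the function names and integration settings are assumptions for the check, not part of the paper.

```python
import math


def log_eq9(n, S, log_det_A=0.0):
    """log of (9) with p(gamma_i) = 1: Gamma(n/2) / (2 pi^(n/2)) |A_i|^(-1/2) S^(-n/2)."""
    return (math.lgamma(n / 2) - math.log(2.0) - (n / 2) * math.log(math.pi)
            - 0.5 * log_det_A - (n / 2) * math.log(S))


def log_eq9_numeric(n, S, log_det_A=0.0, lo=-10.0, hi=10.0, steps=20_000):
    """Integrate the M1/M2 joint density over sigma by the trapezoidal rule in s = log(sigma)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        s = lo + i * h
        # sigma^-(n+1) * exp{-S / (2 sigma^2)} times the Jacobian d sigma / ds = sigma
        f = math.exp(-n * s - 0.5 * S * math.exp(-2.0 * s))
        total += f if 0 < i < steps else 0.5 * f
    # multiply by the sigma-free factors (2 pi)^(-n/2) |A_i|^(-1/2)
    return -(n / 2) * math.log(2.0 * math.pi) - 0.5 * log_det_A + math.log(total * h)


if __name__ == "__main__":
    for n, S in [(20, 12.5), (60, 45.0)]:  # illustrative sample sizes and sums of squares
        print(n, S, log_eq9(n, S), log_eq9_numeric(n, S))
```

The numerical version is only a check: for very large $n$ or very small $S$ the integrand can overflow or underflow and the range $[lo, hi]$ would need adjusting, whereas the closed form (9) has no such difficulty.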
Using Newbold (1974) it is straightforward to show that for these models
$S(\gamma_i, n_1) = S(\gamma_i) - 2\delta$