Directory UMM :Data Elmu:jurnal:J-a:Journal of Econometrics:Vol97.Issue1.July2000:
Journal of Econometrics 97 (2000) 51}91
Exact small-sample inference in stationary,
fully regular, dynamic demand models
Philippe J. Deschamps*
Se& minaire d+E! conome& trie, Universite& de Fribourg Suisse, Avenue de Beauregard 13,
CH-1700 Fribourg, Switzerland
Received 1 June 1998; received in revised form 1 March 1999
Abstract
Asymptotics are known to be unreliable in multivariate models with cross-equation or
non-linear restrictions, and the dimension of the problem makes bootstrapping impractical. In this paper, "nite sample results are obtained by Markov chain Monte Carlo
methods for a nearly non-stationary VAR, and for a di!erential dynamic demand model
with homogeneity, Slutsky symmetry, and negativity. The full likelihood function is used in
each case. Slutsky symmetry and negativity are tested using simulation estimates of partial
Bayes factors. We argue that a di!use prior on the long-run error covariance matrix helps
to identify the equilibrium coe$cients. ( 2000 Elsevier Science S.A. All rights reserved.
JEL classixcation: C11; C15; C32; D12
Keywords: Vector autoregressions; Markov chain Monte Carlo; Exact likelihood; Truncated normal; Allais intensities
1. Introduction
Non-linear functions involving ratios of regression coe$cients arise naturally
in the context of stationary autoregressions, in the form of unconditional
expectations. Similar functions arise in the context of dynamic demand models,
* Tel.: #41-26-300-8252; fax: #41-26-300-9781.
E-mail address: [email protected] (P.J. Deschamps).
0304-4076/00/$ - see front matter ( 2000 Elsevier Science S.A. All rights reserved.
PII: S 0 3 0 4 - 4 0 7 6 ( 9 9 ) 0 0 0 6 6 - 4
52
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
where the parameters of interest are the equilibrium multipliers. Cross-equation
restrictions (such as Slutsky symmetry) and further non-linear restrictions (such
as negativity) are of importance when demand systems are investigated.
However, even in simple cases, the maximum likelihood estimators of ratios
of regression coe$cients are known not to always possess "nite moments
in small samples; see, e.g., Zellner (1978). Furthermore, it is now well recognized that asymptotic approximations are usually misleading in the
context of multivariate regression models, especially when cross-equation restrictions are present or when non-linear functions of the parameters are
estimated.
The main purpose of this paper is to show that the Bayesian methodology,
combined with the use of Markov chain Monte Carlo methods, provides
a practical alternative to the usual (point and interval) frequentist estimation of
demand models and may avoid these di$culties. The fact that these methods are
relatively new may explain the relative rarity of their use in applied demand
analysis; an exception is Terrell (1996), where the emphasis is on #exibility rather
than global regularity. However, this rarity is unfortunate, since Percy (1992)
shows that they are eminently suited to the estimation of multivariate regression
models.
This paper discusses the estimation of two related, but somewhat di!erent,
dynamic demand models. The "rst one, in Sections 2}6, is a vector autoregression based on an assumption that the expenditure shares are logistic normal.
Whereas this model is quite restrictive from a demand-theoretic standpoint, it is
nevertheless useful for the dynamic prediction of budget shares on the unit
simplex.
It will also serve as a convenient means of introducing, in a relatively simple
context, most of the statistical methodology that will be used later on to
investigate a much more sophisticated demand system. Estimation results based
on two di!erent di!use priors will be presented, and the comparison used to
justify the choice of prior in the more sophisticated model.
In addition, the pure vector autoregression is an interesting model in its own
right, due to its popularity in the empirical literature. The dependent variables in
our example appear to be stationary (implying the existence of unconditional
expectations), but close to non-stationarity. This makes the estimation problem
non-trivial; indeed, the risk exists that such a system is improperly classi"ed as
cointegrated, leading to improper reduced rank restrictions and dynamic misspeci"cation.
The second model, discussed in Sections 7}9 and in Appendices C and D, is
a di!erential dynamic demand system with all the regularity restrictions of
economic theory (homogeneity, Slutsky symmetry, and negativity) imposed.
Dynamics are allowed for by reparameterizing a VARX model in terms of the
equilibrium coe$cients; the resulting model is one of partial adjustment on the
growth rates, and can be justi"ed by minimizing the sum of an adjustment
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
53
cost and a disequilibrium cost (Bewley, 1986). The full and conditional
maximum likelihood estimates of the di!erential system with Slutsky symmetry
will be contrasted with the Bayesian small-sample estimates. In the fully
regular model, the substitution relations between pairs of commodities will
be analyzed, using the very di!erent concepts of Hicks and Allais substitutability.
For reasons of practicality and convenience, the tests of dynamic speci"cation
in this paper will all be based on classical (non-Bayesian) methods. Indeed, all
our models involve a large number of parameters, so that the Monte Carlo
estimation of Bayes factors is very time-consuming; on the other hand, easily
implemented small-sample versions of most classical dynamic speci"cation tests
are available. The situation is markedly di!erent in the case of regularity tests.
No small-sample corrections are available for the classical tests of Slutsky
symmetry when dynamics are present. The same can be said of the Wald test of
negativity proposed by Kodde and Palm (1987), even in the static case; moreover, its exact asymptotic critical values are di$cult to determine, leading the
authors to propose lower and upper bounds. For these reasons, Slutsky symmetry and negativity will be tested by means of simulation estimates of partial
Bayes factors, using a methodology recently proposed by Chib and Greenberg
(1998).
The practicality of the proposed method in arbitrary samples requires specialized simulation tools, which are presented in Appendices A and B. Appendix
A discusses the generation of autoregressive coe$cient matrices with truncated
normal distributions on the stationarity region. Appendix B discusses the
generation of truncated normal negative-de"nite matrices.
In this paper, D denotes the a2]a(a#1)/2 duplication matrix, and D` its
a
a
Moore}Penrose inverse (Magnus and Neudecker, 1988, p. 49). I denotes an
a
identity matrix of order a, and O
a null matrix with a rows and b columns.
aCb
ι is an a]1 vector of ones. o(A) is the spectral radius of a square matrix A, i.e.
a
the largest modulus of the eigenvalues of A. For a matrix U and a set S, IS (U) is
an indicator function which is unity if U3S, and 0 otherwise. f (m, and X according to (2.13) and(2.14) with n"n(i).
Step 4. Using naive rejection sampling (when practical) or the Gibbs sampling
algorithm of Appendix A, draw a candidate U from:
f (vec U)Jf [vec >X@(XX@)~1, (XX@)~1?X(i~1)]IS (U),
NO
set U(i)"U with probability:
C
D
L (n(i), U, X(i~1))
H
,1
L (n(i), U(i~1), X(i~1))
H
and set U(i)"U(i~1) with probability 1!a.
Step 5. Draw a candidate X from
a"min
f [l, (>!U(i)X)(>!U(i)X)@],
IW
where l"¹#n#1 or l"¹ according to the choice of prior. Set X(i)"X with
probability:
C
D
L (n(i), U(i), X)
H
,1
L (n(i), U(i), X(i~1))
H
and set X(i)"X(i~1) with probability 1!b.
Step 6. Set i"i#1 and return to Step 2.
b"min
It will be seen in Appendix A that S is arc-connected. Conditions similar to
Proposition 2 of Chib and Greenberg (1994) will then be veri"ed if L is
H
bounded, or, equivalently, if R in (2.6) is positive de"nite. This will be the case
when e(1 in (3.4), since X is positive de"nite and since U3S. Indeed, it is
well-known that R is non-negative de"nite for stationary VARs (LuK tkepohl,
1991, p. 25), and a singular R would imply an exact linear relation between some
elements of (y@ , y@
, , y@ ), contradicting the non-degeneracy of the
t~p t~p`1 2 t
disturbances in (2.1). The results in Tierney (1994) and Roberts and Smith (1994)
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
59
then imply that the Markov chains converge to the target distribution as
¸ PR.
1
5. Choice of the starting values
The algorithm of Section 4 must be initialized at appropriate starting values.
In this section, we propose to choose these starting values randomly, from
a thick-tailed approximation to the joint posterior density of (n, U, X). This
approximation is constructed from the asymptotic distribution of the maximum
likelihood estimates. A similar proposal was made by Koop et al. (1994).
Let UK (¸) and k( be the OLS estimates of U(¸) and k in (2.1), and let
n( "UK ~1(1)k( ,
(5.1)
e( "UK (¸)y !k( ,
(5.2)
t
t
p`T
1
(5.3)
+ e( e( @ .
XK "
t t
¹!np!1
t/p`1
If UK is the OLS estimate of U in (2.8), standard results on maximum likelihood
estimation give the following asymptotic approximation to the sampling distribution of (n( , UK ):
f (n( , vec UK )+f
n
NO
CA B D
, IK ~1 ,
vec U
(5.4)
where
p`T UK (1)@
IK " +
XK ~1(UK (1) Z@ )
t
Z
t/p`1
t
A B
and
A B
y !n(
t~p
?I .
Z" F
n
t
y !n(
t~1
Furthermore, XK is asymptotically independent of (n( , UK ), and, from Zellner (1971,
p. 227), an approximate marginal posterior distribution of X is Wishart inverse
with ¹!np#n degrees of freedom. The required starting values n(0), U(0), and
X(0) are then random drawings from the following densities:
p(n, vec U)Jf
p(X)"f
IW
A
nL
ST
CA B
vec UK
D
B
, IK ~1, 3 IS (U),
p`T
¹!np#n, + e( e( @ .
t t
t/p`1
(5.5)
(5.6)
60
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
The truncated Student density in (5.5) is most conveniently sampled by
a naive rejection procedure, since it is usually the case that UK 3S.
6. A 5rst comparative illustration
In this section, we will compare the Bayesian stationarity constrained and the
OLS estimates in a four-equation VAR of order two, "tted to British data.
Speci"cally, we consider Eq. (2.1) with p"2, and the following endogenous
variables:
A B
w
for i"1,2,4 and t"1,2,44,
(6.1)
y "ln it
it
w
5t
where w is the expenditure share for food (i"1), alcoholic drink and tobacco
it
(i"2), clothing and footwear (i"3), energy products (i"4), and other goods
excluding durables, housing services, and other services (i"5), out of the total
expenditure for these "ve categories. The data are annual, cover the period
1952}1995, and are taken from the Economic Trends Annual Supplement
(1996/1997 edition), published by the British Central Statistical O$ce.
Eq. (6.1) is of course the additive logistic transformation, which has been
extensively studied by Aitchison (1986). The inverse transformation is given by
exp(y )
it
for i"1,2,4,
w "
it
4
1# + exp(y )
jt
(6.2)
j/1
1
w "
5t
4
1# + exp(y )
jt
j/1
and it is straightforward to check that the polynomial U(¸) in (2.1), as well as the
joint distribution of the shares in (6.2), are invariant with respect to the choice of
normalization in (6.1).
It is also of some interest to note that the model resulting from (2.2) and (6.1) is
consistent with utility theory, and in fact embodies a naive version of
the Random Utility Hypothesis (see, e.g., Brown and Walker, 1989). Indeed, if
the inverse logistic transformation (6.2) is applied to the y "n #u in
it
i
it
Eq. (2.2), we may view the transforms as random, time-varying, and autocorrelated elasticities in a Cobb}Douglas utility function.
An estimation of the model by classical methods suggested that the data
generating process is close to non-stationarity, but did not provide very strong
evidence for trends in the joint process generating y . Deterministic trends were
t
not signi"cant; an estimation of (2.1) by ordinary least squares gave a spectral
radius of o[F(UK )]"0.956. Dickey}Fuller tests on the individual variables in
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
61
(6.1) did not reject unit roots; however, the relevant issue here is one of joint
stationarity, and it may be argued that cointegration tests have more power in
this context. The Johansen trace test did not reject the hypothesis that the
cointegrating rank r is strictly less than 4; however, both the Akaike information
criterion and the Hannan}Quinn criterion favor r"4, i.e. the absence of unit
roots. Since little is known about the properties of the Johansen test in small
samples, it seems therefore safe to consider the evidence as inconclusive and to
investigate whether the unconditional expectation vector n can be reliably
estimated by the Bayesian methodology of the previous sections. If r(4, n does
not exist, and the sampler cannot be expected to converge unless the prior
parameter e in (3.4) is strictly less than one (in this case, most of the posterior
mass will be concentrated near e).
Using the methodology of Section 4, 10,000 Markov chains of length 400 were
generated with e"1 and the prior in (3.5). The length of 400 was chosen after
convergence diagnostics with ¸ "300 and ¸ "400 passes gave conclusive
1
2
results, the critical probabilities being all in excess of 0.05 and the robust Wald
test yielding a critical probability of 0.85. Table 1 presents summaries of some
estimated posterior distributions; for comparison purposes, the OLS estimates,
some characteristics of their estimated asymptotic distributions (based on (5.4)
in the case of the n( ), and some characteristics of their estimated small-sample
i
distributions (based on a parametric bootstrap) are also presented. In Table 1,
m (h) denotes the ith sample moment of h and h denotes the ath percentage
i
a
point of its distribution; NSE denotes the numerical standard error of the
posterior expectation estimate m (h), computed by the method of Andrews and
1
Monahan (1992) with a Parzen kernel and VAR(1) prewhitening; and RNE
denotes its relative numerical e$ciency, as de"ned by Geweke (1992).
The estimated Bayesian posterior distributions of the k appear to be closer to
i
normality than the bootstrap distributions of the k( , as indicated by the skewi
ness and excess kurtosis statistics. On the other hand, the Bayesian, asymptotic,
and bootstrap estimated standard errors of the k and k( are rather close.
i
i
By contrast, the last four columns of Table 1 reveal large di!erences between
the three methods in the case of the unconditional expectation estimates. The
posterior distributions of the n are very far from normality; kernel estimation
i
reveals densities that are close to zero over a wide interval, except for a sharp
central peak. The situation is similar, but much more extreme, when the
bootstrap densities of the OLS estimates n( are considered: the bootstrap
i
con"dence intervals are too wide to be useful, and the magnitude of the
higher-order sample moments (reported for comparison purposes only) indeed
casts doubts on the existence of the corresponding population moments. The
asymptotic approximations turn out to be very poor. The Bayesian point
estimates m (n ) are quite di!erent from the OLS estimates n( ; they are, however,
1 i
i
closer than the OLS estimates to the sample means y6 "0.587, y6 "0.098,
1
2
y6 "!0.262, and y6 "!0.427.
3
4
Table 1
Bayesian (prior 3.5) and OLS estimation of logistic normal VAR
k
4
n
1
n
2
n
3
n
4
0.065
0.091
0.072
0.107
!0.213
0.084
!0.052
0.184
0.39
0.67
!0.01
0.50
!0.36
0.53
!0.72
0.36
0.016
!0.132
!0.049
0.017
!9.56
!9.38
!9.27
!3.07
0.209
0.291
0.084
0.239
368.11
384.11
355.09
217.04
!0.154
0.064
0.280
!0.368
0.446
0.001
1.005
!0.190
0.074
0.320
!0.424
0.536
0.001
0.993
!0.411
!0.213
!0.019
!0.590
0.104
0.001
1.013
!0.484
!0.053
0.383
!0.932
0.658
0.002
0.963
!1.07
0.43
1.51
!23.42
17.02
0.01
1.01
!1.04
0.01
0.79
!18.62
11.94
0.01
1.02
!1.52
!0.33
0.54
!18.78
12.56
0.01
1.01
!1.62
!0.68
!0.19
!10.71
9.01
0.00
0.98
!0.093
0.171
!0.491
0.305
!0.05
0.43
!1.05
0.95
!0.37
0.29
!1.05
0.30
!0.71
0.33
!1.47
0.04
!0.83
0.24
!1.38
!0.28
OLS estimates and estimated asymptotic distributions
hK
"UX#E, with the coe$cient matrix U de"ned in
(2.8). The counterparts of Eqs. (2.9) and (2.10) are
A B
A B
y
1
F
y
p
,
y "
H
U(¸)y
p`1
F
(7.11)
U(¸)y
p`T
x@ ?I
1
n
F
x@ ?I
p
n
X "
D ,
H
x@ ?U(1) S
p`1
F
(7.12)
x@ ?U(1)
p`T
where D is a matrix of zeros and ones such that vec P"D c. This conveniently
S
S
imposes the symmetry constraint. Also, upon noting that (7.3) implies
p
y !z " + U (y !z )#e ,
t
t
j t~j
t
t
j/1
the counterparts of Eqs. (2.13) and (2.14) are
>"(y
!z
y
!z
2y
!z
),
p`1
p`1 p`2
p`2
p`T
p`T
A
y !z
1
p`1
X"
F
y !z
p
p`1
y !z
2
p`2
F
y
p`1
!z
p`2
2
2
2 y
(7.13)
y !z
T
p`T
F
p`T~1
!z
p`T
B
.
(7.14)
As before, we de"ne H(U, X) as the n]n matrix given by the diagonal blocks
of R, and extend our second prior in Section 3 to impose full regularity on the
equilibrium demand relation. Upon recalling that P"(P C) and denoting
0
by N the set of all negative de"nite n]n matrices, our joint prior becomes
p(P , C, U, X)JIS (U)IN (C)[det H(U, X)]~(n`1)@2
0
(7.15)
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
67
implying as joint posterior
p(P , C, U, X D Y, X)JIS (U)IN (C)[det H(U, X)]~(n`1)@2
0
L (P, U, X)L (P, U, X).
(7.16)
1
2
The arguments of Section 4 are now readily extended to the present context.
We let < be as in (2.12), X
consist of the "rst 2n columns of X , and
HH
H
XC consist of the remaining columns (indexed from 2n#1 to 2n#n(n#1)/2) of
X . We also de"ne v(C) as the column vector containing the lower triangle of C,
H
and
yC "y !X vec P ,
H
HH
0
y "y !XC v(C).
HH
H
The full conditional posteriors are
p(vec P D U, C, X, Y, X)"f [(X@
Exact small-sample inference in stationary,
fully regular, dynamic demand models
Philippe J. Deschamps*
Se& minaire d+E! conome& trie, Universite& de Fribourg Suisse, Avenue de Beauregard 13,
CH-1700 Fribourg, Switzerland
Received 1 June 1998; received in revised form 1 March 1999
Abstract
Asymptotics are known to be unreliable in multivariate models with cross-equation or
non-linear restrictions, and the dimension of the problem makes bootstrapping impractical. In this paper, "nite sample results are obtained by Markov chain Monte Carlo
methods for a nearly non-stationary VAR, and for a di!erential dynamic demand model
with homogeneity, Slutsky symmetry, and negativity. The full likelihood function is used in
each case. Slutsky symmetry and negativity are tested using simulation estimates of partial
Bayes factors. We argue that a di!use prior on the long-run error covariance matrix helps
to identify the equilibrium coe$cients. ( 2000 Elsevier Science S.A. All rights reserved.
JEL classixcation: C11; C15; C32; D12
Keywords: Vector autoregressions; Markov chain Monte Carlo; Exact likelihood; Truncated normal; Allais intensities
1. Introduction
Non-linear functions involving ratios of regression coe$cients arise naturally
in the context of stationary autoregressions, in the form of unconditional
expectations. Similar functions arise in the context of dynamic demand models,
* Tel.: #41-26-300-8252; fax: #41-26-300-9781.
E-mail address: [email protected] (P.J. Deschamps).
0304-4076/00/$ - see front matter ( 2000 Elsevier Science S.A. All rights reserved.
PII: S 0 3 0 4 - 4 0 7 6 ( 9 9 ) 0 0 0 6 6 - 4
52
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
where the parameters of interest are the equilibrium multipliers. Cross-equation
restrictions (such as Slutsky symmetry) and further non-linear restrictions (such
as negativity) are of importance when demand systems are investigated.
However, even in simple cases, the maximum likelihood estimators of ratios
of regression coe$cients are known not to always possess "nite moments
in small samples; see, e.g., Zellner (1978). Furthermore, it is now well recognized that asymptotic approximations are usually misleading in the
context of multivariate regression models, especially when cross-equation restrictions are present or when non-linear functions of the parameters are
estimated.
The main purpose of this paper is to show that the Bayesian methodology,
combined with the use of Markov chain Monte Carlo methods, provides
a practical alternative to the usual (point and interval) frequentist estimation of
demand models and may avoid these di$culties. The fact that these methods are
relatively new may explain the relative rarity of their use in applied demand
analysis; an exception is Terrell (1996), where the emphasis is on #exibility rather
than global regularity. However, this rarity is unfortunate, since Percy (1992)
shows that they are eminently suited to the estimation of multivariate regression
models.
This paper discusses the estimation of two related, but somewhat di!erent,
dynamic demand models. The "rst one, in Sections 2}6, is a vector autoregression based on an assumption that the expenditure shares are logistic normal.
Whereas this model is quite restrictive from a demand-theoretic standpoint, it is
nevertheless useful for the dynamic prediction of budget shares on the unit
simplex.
It will also serve as a convenient means of introducing, in a relatively simple
context, most of the statistical methodology that will be used later on to
investigate a much more sophisticated demand system. Estimation results based
on two di!erent di!use priors will be presented, and the comparison used to
justify the choice of prior in the more sophisticated model.
In addition, the pure vector autoregression is an interesting model in its own
right, due to its popularity in the empirical literature. The dependent variables in
our example appear to be stationary (implying the existence of unconditional
expectations), but close to non-stationarity. This makes the estimation problem
non-trivial; indeed, the risk exists that such a system is improperly classi"ed as
cointegrated, leading to improper reduced rank restrictions and dynamic misspeci"cation.
The second model, discussed in Sections 7}9 and in Appendices C and D, is
a di!erential dynamic demand system with all the regularity restrictions of
economic theory (homogeneity, Slutsky symmetry, and negativity) imposed.
Dynamics are allowed for by reparameterizing a VARX model in terms of the
equilibrium coe$cients; the resulting model is one of partial adjustment on the
growth rates, and can be justi"ed by minimizing the sum of an adjustment
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
53
cost and a disequilibrium cost (Bewley, 1986). The full and conditional
maximum likelihood estimates of the di!erential system with Slutsky symmetry
will be contrasted with the Bayesian small-sample estimates. In the fully
regular model, the substitution relations between pairs of commodities will
be analyzed, using the very di!erent concepts of Hicks and Allais substitutability.
For reasons of practicality and convenience, the tests of dynamic speci"cation
in this paper will all be based on classical (non-Bayesian) methods. Indeed, all
our models involve a large number of parameters, so that the Monte Carlo
estimation of Bayes factors is very time-consuming; on the other hand, easily
implemented small-sample versions of most classical dynamic speci"cation tests
are available. The situation is markedly di!erent in the case of regularity tests.
No small-sample corrections are available for the classical tests of Slutsky
symmetry when dynamics are present. The same can be said of the Wald test of
negativity proposed by Kodde and Palm (1987), even in the static case; moreover, its exact asymptotic critical values are di$cult to determine, leading the
authors to propose lower and upper bounds. For these reasons, Slutsky symmetry and negativity will be tested by means of simulation estimates of partial
Bayes factors, using a methodology recently proposed by Chib and Greenberg
(1998).
The practicality of the proposed method in arbitrary samples requires specialized simulation tools, which are presented in Appendices A and B. Appendix
A discusses the generation of autoregressive coe$cient matrices with truncated
normal distributions on the stationarity region. Appendix B discusses the
generation of truncated normal negative-de"nite matrices.
In this paper, D denotes the a2]a(a#1)/2 duplication matrix, and D` its
a
a
Moore}Penrose inverse (Magnus and Neudecker, 1988, p. 49). I denotes an
a
identity matrix of order a, and O
a null matrix with a rows and b columns.
aCb
ι is an a]1 vector of ones. o(A) is the spectral radius of a square matrix A, i.e.
a
the largest modulus of the eigenvalues of A. For a matrix U and a set S, IS (U) is
an indicator function which is unity if U3S, and 0 otherwise. f (m, and X according to (2.13) and(2.14) with n"n(i).
Step 4. Using naive rejection sampling (when practical) or the Gibbs sampling
algorithm of Appendix A, draw a candidate U from:
f (vec U)Jf [vec >X@(XX@)~1, (XX@)~1?X(i~1)]IS (U),
NO
set U(i)"U with probability:
C
D
L (n(i), U, X(i~1))
H
,1
L (n(i), U(i~1), X(i~1))
H
and set U(i)"U(i~1) with probability 1!a.
Step 5. Draw a candidate X from
a"min
f [l, (>!U(i)X)(>!U(i)X)@],
IW
where l"¹#n#1 or l"¹ according to the choice of prior. Set X(i)"X with
probability:
C
D
L (n(i), U(i), X)
H
,1
L (n(i), U(i), X(i~1))
H
and set X(i)"X(i~1) with probability 1!b.
Step 6. Set i"i#1 and return to Step 2.
b"min
It will be seen in Appendix A that S is arc-connected. Conditions similar to
Proposition 2 of Chib and Greenberg (1994) will then be veri"ed if L is
H
bounded, or, equivalently, if R in (2.6) is positive de"nite. This will be the case
when e(1 in (3.4), since X is positive de"nite and since U3S. Indeed, it is
well-known that R is non-negative de"nite for stationary VARs (LuK tkepohl,
1991, p. 25), and a singular R would imply an exact linear relation between some
elements of (y@ , y@
, , y@ ), contradicting the non-degeneracy of the
t~p t~p`1 2 t
disturbances in (2.1). The results in Tierney (1994) and Roberts and Smith (1994)
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
59
then imply that the Markov chains converge to the target distribution as
¸ PR.
1
5. Choice of the starting values
The algorithm of Section 4 must be initialized at appropriate starting values.
In this section, we propose to choose these starting values randomly, from
a thick-tailed approximation to the joint posterior density of (n, U, X). This
approximation is constructed from the asymptotic distribution of the maximum
likelihood estimates. A similar proposal was made by Koop et al. (1994).
Let UK (¸) and k( be the OLS estimates of U(¸) and k in (2.1), and let
n( "UK ~1(1)k( ,
(5.1)
e( "UK (¸)y !k( ,
(5.2)
t
t
p`T
1
(5.3)
+ e( e( @ .
XK "
t t
¹!np!1
t/p`1
If UK is the OLS estimate of U in (2.8), standard results on maximum likelihood
estimation give the following asymptotic approximation to the sampling distribution of (n( , UK ):
f (n( , vec UK )+f
n
NO
CA B D
, IK ~1 ,
vec U
(5.4)
where
p`T UK (1)@
IK " +
XK ~1(UK (1) Z@ )
t
Z
t/p`1
t
A B
and
A B
y !n(
t~p
?I .
Z" F
n
t
y !n(
t~1
Furthermore, XK is asymptotically independent of (n( , UK ), and, from Zellner (1971,
p. 227), an approximate marginal posterior distribution of X is Wishart inverse
with ¹!np#n degrees of freedom. The required starting values n(0), U(0), and
X(0) are then random drawings from the following densities:
p(n, vec U)Jf
p(X)"f
IW
A
nL
ST
CA B
vec UK
D
B
, IK ~1, 3 IS (U),
p`T
¹!np#n, + e( e( @ .
t t
t/p`1
(5.5)
(5.6)
60
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
The truncated Student density in (5.5) is most conveniently sampled by
a naive rejection procedure, since it is usually the case that UK 3S.
6. A 5rst comparative illustration
In this section, we will compare the Bayesian stationarity constrained and the
OLS estimates in a four-equation VAR of order two, "tted to British data.
Speci"cally, we consider Eq. (2.1) with p"2, and the following endogenous
variables:
A B
w
for i"1,2,4 and t"1,2,44,
(6.1)
y "ln it
it
w
5t
where w is the expenditure share for food (i"1), alcoholic drink and tobacco
it
(i"2), clothing and footwear (i"3), energy products (i"4), and other goods
excluding durables, housing services, and other services (i"5), out of the total
expenditure for these "ve categories. The data are annual, cover the period
1952}1995, and are taken from the Economic Trends Annual Supplement
(1996/1997 edition), published by the British Central Statistical O$ce.
Eq. (6.1) is of course the additive logistic transformation, which has been
extensively studied by Aitchison (1986). The inverse transformation is given by
exp(y )
it
for i"1,2,4,
w "
it
4
1# + exp(y )
jt
(6.2)
j/1
1
w "
5t
4
1# + exp(y )
jt
j/1
and it is straightforward to check that the polynomial U(¸) in (2.1), as well as the
joint distribution of the shares in (6.2), are invariant with respect to the choice of
normalization in (6.1).
It is also of some interest to note that the model resulting from (2.2) and (6.1) is
consistent with utility theory, and in fact embodies a naive version of
the Random Utility Hypothesis (see, e.g., Brown and Walker, 1989). Indeed, if
the inverse logistic transformation (6.2) is applied to the y "n #u in
it
i
it
Eq. (2.2), we may view the transforms as random, time-varying, and autocorrelated elasticities in a Cobb}Douglas utility function.
An estimation of the model by classical methods suggested that the data
generating process is close to non-stationarity, but did not provide very strong
evidence for trends in the joint process generating y . Deterministic trends were
t
not signi"cant; an estimation of (2.1) by ordinary least squares gave a spectral
radius of o[F(UK )]"0.956. Dickey}Fuller tests on the individual variables in
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
61
(6.1) did not reject unit roots; however, the relevant issue here is one of joint
stationarity, and it may be argued that cointegration tests have more power in
this context. The Johansen trace test did not reject the hypothesis that the
cointegrating rank r is strictly less than 4; however, both the Akaike information
criterion and the Hannan}Quinn criterion favor r"4, i.e. the absence of unit
roots. Since little is known about the properties of the Johansen test in small
samples, it seems therefore safe to consider the evidence as inconclusive and to
investigate whether the unconditional expectation vector n can be reliably
estimated by the Bayesian methodology of the previous sections. If r(4, n does
not exist, and the sampler cannot be expected to converge unless the prior
parameter e in (3.4) is strictly less than one (in this case, most of the posterior
mass will be concentrated near e).
Using the methodology of Section 4, 10,000 Markov chains of length 400 were
generated with e"1 and the prior in (3.5). The length of 400 was chosen after
convergence diagnostics with ¸ "300 and ¸ "400 passes gave conclusive
1
2
results, the critical probabilities being all in excess of 0.05 and the robust Wald
test yielding a critical probability of 0.85. Table 1 presents summaries of some
estimated posterior distributions; for comparison purposes, the OLS estimates,
some characteristics of their estimated asymptotic distributions (based on (5.4)
in the case of the n( ), and some characteristics of their estimated small-sample
i
distributions (based on a parametric bootstrap) are also presented. In Table 1,
m (h) denotes the ith sample moment of h and h denotes the ath percentage
i
a
point of its distribution; NSE denotes the numerical standard error of the
posterior expectation estimate m (h), computed by the method of Andrews and
1
Monahan (1992) with a Parzen kernel and VAR(1) prewhitening; and RNE
denotes its relative numerical e$ciency, as de"ned by Geweke (1992).
The estimated Bayesian posterior distributions of the k appear to be closer to
i
normality than the bootstrap distributions of the k( , as indicated by the skewi
ness and excess kurtosis statistics. On the other hand, the Bayesian, asymptotic,
and bootstrap estimated standard errors of the k and k( are rather close.
i
i
By contrast, the last four columns of Table 1 reveal large di!erences between
the three methods in the case of the unconditional expectation estimates. The
posterior distributions of the n are very far from normality; kernel estimation
i
reveals densities that are close to zero over a wide interval, except for a sharp
central peak. The situation is similar, but much more extreme, when the
bootstrap densities of the OLS estimates n( are considered: the bootstrap
i
con"dence intervals are too wide to be useful, and the magnitude of the
higher-order sample moments (reported for comparison purposes only) indeed
casts doubts on the existence of the corresponding population moments. The
asymptotic approximations turn out to be very poor. The Bayesian point
estimates m (n ) are quite di!erent from the OLS estimates n( ; they are, however,
1 i
i
closer than the OLS estimates to the sample means y6 "0.587, y6 "0.098,
1
2
y6 "!0.262, and y6 "!0.427.
3
4
Table 1
Bayesian (prior 3.5) and OLS estimation of logistic normal VAR
k
4
n
1
n
2
n
3
n
4
0.065
0.091
0.072
0.107
!0.213
0.084
!0.052
0.184
0.39
0.67
!0.01
0.50
!0.36
0.53
!0.72
0.36
0.016
!0.132
!0.049
0.017
!9.56
!9.38
!9.27
!3.07
0.209
0.291
0.084
0.239
368.11
384.11
355.09
217.04
!0.154
0.064
0.280
!0.368
0.446
0.001
1.005
!0.190
0.074
0.320
!0.424
0.536
0.001
0.993
!0.411
!0.213
!0.019
!0.590
0.104
0.001
1.013
!0.484
!0.053
0.383
!0.932
0.658
0.002
0.963
!1.07
0.43
1.51
!23.42
17.02
0.01
1.01
!1.04
0.01
0.79
!18.62
11.94
0.01
1.02
!1.52
!0.33
0.54
!18.78
12.56
0.01
1.01
!1.62
!0.68
!0.19
!10.71
9.01
0.00
0.98
!0.093
0.171
!0.491
0.305
!0.05
0.43
!1.05
0.95
!0.37
0.29
!1.05
0.30
!0.71
0.33
!1.47
0.04
!0.83
0.24
!1.38
!0.28
OLS estimates and estimated asymptotic distributions
hK
"UX#E, with the coe$cient matrix U de"ned in
(2.8). The counterparts of Eqs. (2.9) and (2.10) are
A B
A B
y
1
F
y
p
,
y "
H
U(¸)y
p`1
F
(7.11)
U(¸)y
p`T
x@ ?I
1
n
F
x@ ?I
p
n
X "
D ,
H
x@ ?U(1) S
p`1
F
(7.12)
x@ ?U(1)
p`T
where D is a matrix of zeros and ones such that vec P"D c. This conveniently
S
S
imposes the symmetry constraint. Also, upon noting that (7.3) implies
p
y !z " + U (y !z )#e ,
t
t
j t~j
t
t
j/1
the counterparts of Eqs. (2.13) and (2.14) are
>"(y
!z
y
!z
2y
!z
),
p`1
p`1 p`2
p`2
p`T
p`T
A
y !z
1
p`1
X"
F
y !z
p
p`1
y !z
2
p`2
F
y
p`1
!z
p`2
2
2
2 y
(7.13)
y !z
T
p`T
F
p`T~1
!z
p`T
B
.
(7.14)
As before, we de"ne H(U, X) as the n]n matrix given by the diagonal blocks
of R, and extend our second prior in Section 3 to impose full regularity on the
equilibrium demand relation. Upon recalling that P"(P C) and denoting
0
by N the set of all negative de"nite n]n matrices, our joint prior becomes
p(P , C, U, X)JIS (U)IN (C)[det H(U, X)]~(n`1)@2
0
(7.15)
P.J. Deschamps / Journal of Econometrics 97 (2000) 51}91
67
implying as joint posterior
p(P , C, U, X D Y, X)JIS (U)IN (C)[det H(U, X)]~(n`1)@2
0
L (P, U, X)L (P, U, X).
(7.16)
1
2
The arguments of Section 4 are now readily extended to the present context.
We let < be as in (2.12), X
consist of the "rst 2n columns of X , and
HH
H
XC consist of the remaining columns (indexed from 2n#1 to 2n#n(n#1)/2) of
X . We also de"ne v(C) as the column vector containing the lower triangle of C,
H
and
yC "y !X vec P ,
H
HH
0
y "y !XC v(C).
HH
H
The full conditional posteriors are
p(vec P D U, C, X, Y, X)"f [(X@