
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Structural Dynamic Factor Analysis Using Prior
Information From Macroeconomic Theory
Gregor Bäurle
To cite this article: Gregor Bäurle (2013) Structural Dynamic Factor Analysis Using Prior
Information From Macroeconomic Theory, Journal of Business & Economic Statistics, 31:2,
136-150, DOI: 10.1080/07350015.2012.747839
To link to this article: http://dx.doi.org/10.1080/07350015.2012.747839


Accepted author version posted online: 28 Nov 2012.

Download by: [Universitas Maritim Raja Ali Haji]

Date: 11 January 2016, At: 22:04

Supplementary materials for this article are available online. Please go to http://tandfonline.com/r/JBES

Structural Dynamic Factor Analysis Using Prior
Information From Macroeconomic Theory
Gregor BÄURLE


Swiss National Bank, Zurich,
Switzerland (gregor.baeurle@snb.ch)

Dynamic factor models are becoming increasingly popular in empirical macroeconomics due to their
ability to cope with large datasets. Dynamic stochastic general equilibrium (DSGE) models, on the other
hand, are methodologically suitable for the analysis of policy interventions. In this article, we
provide a Bayesian method to combine the statistically rich specification of the former with the conceptual
advantages of the latter by using information from a DSGE model to form a prior belief about parameters
in the dynamic factor model. Because the method establishes a connection between observed data and
economic theory and at the same time incorporates information from a large dataset, our setting is useful
for studying the effects of policy interventions on a large number of observed variables. An application of
the method to U.S. data shows that a moderate weight of the DSGE prior is optimal and that the model
performs well in terms of forecasting. We then analyze the impact of monetary shocks on both the factors
and selected series using a DSGE-based identification of these shocks. Supplementary materials for this
article are available online.
KEY WORDS: Bayesian analysis; DSGE model; Dynamic factor model; Forecasting; Transmission of
shocks.

1. INTRODUCTION

Dynamic factor models are becoming increasingly popular in empirical macroeconomics due to their ability to cope with large datasets. The idea, introduced into the economic literature by Sargent and Sims (1977) and Geweke (1977), is to condense the informational content of many data series into a low-dimensional vector of common factors. Each series is decomposed into a combination of these common factors and an idiosyncratic term. Compared with a low-dimensional vector autoregression (VAR), the analysis is more robust to noise in the idiosyncratic components, such as measurement errors. Owing to these properties, dynamic factor models have been used successfully for forecasting. However, these factor models are usually implemented as purely statistical models without a direct economic interpretation. As such, they are not immediately useful for analyzing the effects of policy interventions.
For policy analysis, a second product of macroeconometric research has recently become popular among policy makers: dynamic stochastic general equilibrium (DSGE) models are based solely on microfoundations and are therefore conceptually coherent tools for the analysis of policy interventions. However, to keep this type of model tractable, a number of simplifying assumptions about the behavior of economic agents have to be made. As a consequence, it is challenging to derive from these models sensible statistical implications for the large set of variables that policy makers usually consider relevant, at least without augmenting the models with rather ad hoc frictions and shocks.

In this article, we propose a way to combine the statistically rich specification of dynamic factor analysis with the conceptual advantages of DSGE models. The idea is as follows: we assume that the data-generating process is a general dynamic factor model. Hence, a small number of unobserved factors is responsible for the comovement in a large number of data series. At the same time, we formulate a limited number of economic concepts that are, from an economist’s point of view, the main driving forces in the economy. We then interpret the unobserved factors as empirical counterparts of these economic concepts. A prior belief about the relationship between observed series and unobserved factors establishes this interpretation of the factors as economic concepts. Finally, we use economic theory to inform our beliefs about the parameters governing the dynamics of the factors. This is done by formulating a DSGE model for the economic concepts corresponding to the factors. Taken together, we have at our disposal fully specified prior beliefs about the stochastic properties of the observed variables. This prior belief is then used in a Bayesian estimation of the factor model. By estimating the model with a version of the Gibbs sampler, we are able to embed well-known methods for linear regressions and VARs into the estimation. In particular, a method developed by DelNegro and Schorfheide (2004) can be invoked to incorporate information from DSGE models into the dynamic equation for the factors.
Importantly, by varying the tightness of the prior distribution, the researcher can decide how closely the empirical model should adhere to the prior belief. In this article, the tightness of the prior on the coefficients relating observed data to the factors is chosen so that the factors are closely tied to standard measures of these economic concepts. For series with a more ambiguous interpretation, the connection is assumed to be rather loose a priori. For the coefficients governing the factor dynamics, we suggest using pseudo-out-of-sample forecasts and posterior marginal data densities, as a measure of in-sample fit, to determine the tightness of the prior.
Because the setting establishes a connection between observed data and economic theory and at the same time


© 2013 American Statistical Association

Journal of Business & Economic Statistics
April 2013, Vol. 31, No. 2
DOI: 10.1080/07350015.2012.747839


Bäurle: Structural Dynamic Factor Analysis Using Prior Information From Macroeconomic Theory

incorporates empirical evidence contained in a large dataset,
it is useful to study the empirical effects of policy changes on a
large set of variables. As an example, the transmission of monetary policy shocks to many economic variables can be studied
by identifying the shocks based on the underlying DSGE model.
Moreover, the setting potentially improves the forecasting performance as it shrinks the parameter space in a way that is
compatible with beliefs derived from economic theory.
We implement our method by estimating a model for quarterly U.S. data. The underlying DSGE model is a standard New Keynesian model relating output, inflation, and interest rates. We therefore compile a dataset of series related to these three variables.

The main empirical results are twofold. First, we establish that our model performs well in terms of forecasting and in-sample fit: compared with a factor model estimated with a nonstructural but informative prior, the forecast performance based on the DSGE prior is clearly better for interest rate series and roughly equal for inflation and output growth. However, for large weights of the DSGE prior, the performance deteriorates in most cases. Thus, along certain dimensions, the data are not fully compatible with prior expectations. This finding is supported by the posterior marginal data density estimates, which point to a moderately positive optimal prior weight.
Second, we use the structural interpretation to analyze the impact of an identified monetary shock on the factors and the observed series. We find that the response of the factors is largely in line with the predictions of the New Keynesian DSGE model: a contractionary monetary shock decreases inflation, increases interest rates, and has a negative impact on output growth. Hence, we do not observe a “price puzzle,” as prices decrease after a contractionary monetary shock. For comparison, we also identify shocks based on a recursive ordering and a sign-restriction approach (Uhlig 2005). The results indicate that the recursive ordering, with the interest rates ordered last, produces rising prices in response to a contractionary monetary policy shock. Results derived with the sign-restriction approach are largely consistent with the theory-based identification. However, the posterior distribution of the response of output turns out to be rather uninformative under standard sign restrictions. We conclude, first, that the specification of the identification scheme influences the economic conclusions and, second, that a theory-based scheme allows more precise answers to the question at hand. Regarding the heterogeneity of the reaction among observed series to a contractionary monetary policy shock, we find that the reaction of the output-related series varies across series: gross domestic product (GDP) and industrial production react negatively to a surprise tightening of monetary policy, while some other income-related series, for example, disposable income and expenditures, react positively. This suggests that channels not taken into account in the DSGE model are important for explaining the reaction of these latter variables to monetary shocks. In line with economic theory, we find that the response of interest rates to a monetary policy shock depends mainly on the maturity of the contract. The response of different measures of inflation is rather uniform.
The article is structured as follows. Section 2 relates our approach to the existing literature. Section 3 describes the empirical model and its estimation in detail. In Section 4, the empirical application of the method is specified: we describe the data and the DSGE model from which we infer the prior distribution for the coefficients governing the dynamics of the factors. We then provide measures of in-sample fit and discuss the forecast performance. Finally, we analyze the impact of identified monetary shocks on the factors and the observed variables. Section 5 concludes.
2. RELATION TO LITERATURE

To our knowledge, no contribution in the economic literature builds prior knowledge from DSGE models into dynamic factor analysis. However, our method is related to a number of contributions in the factor model and VAR literature. The closest precursor to this article is Boivin and Giannoni (2006). Extending the ideas of Sargent (1989) and Altug (1989), they estimate a DSGE model with a large dataset, interpreting variables in the DSGE model as factors and the observed data as their (imperfect) measures. Our model continuously bridges the gap between a nonstructural factor model and the model of Boivin and Giannoni (2006) in the following sense: by strictly imposing the restrictions of the DSGE model and identifying the factors using degenerate priors for a subset of factor loadings, one estimates a DSGE model akin to Boivin and Giannoni (2006). By relaxing the restrictions implied by the DSGE model, it is possible to move toward a nonstructural factor model. A more recent contribution related to Boivin and Giannoni (2006) is Schorfheide, Sill, and Kryshko (2010). These authors used a DSGE model to estimate unobserved states of the economy in a first step and then embedded these estimates in the information set to forecast a large set of economic time series. This stepwise estimation method is computationally less demanding than our approach, which is an advantage in environments where models have to be estimated very frequently. A disadvantage is that only a limited set of variables, the “core” variables in the DSGE model, can be informative about the unobserved state of the economy. A further difference is that Schorfheide, Sill, and Kryshko (2010) focused on improving forecasts. In contrast, both our approach and that of Boivin and Giannoni (2006) explicitly aim at providing a full-information structural analysis of many economic data series.
Our approach is also related to the analysis in Giannone, Reichlin, and Sala (2006). They showed that the state variables of a DSGE model can be interpreted as common factors driving the observed variables. However, their focus was different: Giannone, Reichlin, and Sala (2006) modeled the dynamics of one observed series per variable in the DSGE model, where the number of variables can be larger than the number of shocks. In contrast, we assume that there are as many shocks as there are variables that would be treated as observables in an estimation of the DSGE model, and we interpret these variables as common factors driving a large number of observed variables.
In addition, our method builds on many important contributions related to different substeps of the estimation. Most importantly, DelNegro and Schorfheide (2004) showed how to use a prior based on a DSGE model to estimate vector autoregressive models. Because our estimation method relies on a Gibbs sampler, we are able to embed their method in our setting to incorporate prior information from the DSGE model into the estimation of the factor dynamics. Regarding the interpretation of the factors as economic concepts, we build on the literature on rotating factors toward an interpretable structure, which has a long tradition in classical factor analysis (see Lawley and Maxwell 1971). These methods have rarely been used in dynamic models of economic time series. An exception is Eickmeier (2005), who estimated a factor model for the euro area in a classical framework.
Our article is also related to the literature studying the transmission of structural shocks to economic variables in factor models. In this context, Forni, Lippi, and Reichlin (2003) and Giannone and Reichlin (2006) argued that factor models are more suitable than VARs, as the large information set potentially helps to overcome nonfundamentalness problems. In previous studies, however, identification has been achieved using merely ad hoc contemporaneous and long-run restrictions (see, e.g., Bernanke, Boivin, and Eliasz 2005). In contrast, our method incorporates restrictions derived from a DSGE model. An identification scheme based on a DSGE model, although widely used in the context of VARs, is novel in the factor model literature. The idea of using sign restrictions to identify shocks, which we use for comparison, goes back to Faust (1998) and has been elaborated by Uhlig (2005) and Canova (2002) in the context of structural VARs (SVARs). For dynamic factor models, this approach has been recognized as a potential strategy by Stock and Watson (2005), who, however, did not apply it.
3. EMPIRICAL MODEL, ESTIMATION, AND IDENTIFICATION

The estimation methodology is described in three steps. First,
we describe the empirical model determining the likelihood of
the model parameters. We then present the prior distribution,
reflecting information from economic theory, and show how the
posterior distribution is calculated. Finally, we propose different
methods to identify the structural shocks, building on the relation
of the empirical model to economic theory.
3.1 The Empirical Model
We assume that the data evolve according to the following dynamic factor model:

Observation equation:
$$X_t = \Lambda F_t + v_t \tag{1}$$

State equation:
$$\Phi(L) F_t = e_t \tag{2}$$

where $X_t$ is a potentially high-dimensional vector of $n = 1, \ldots, N$ data series observed over $t = 1, \ldots, T$ time periods. The idiosyncratic component is allowed to be serially correlated:
$$v_t = \Psi v_{t-1} + u_t. \tag{3}$$

$F_t$ is a vector of unobserved dynamic factors, the states, whose dimension $M$ is typically much smaller than $N$. Each variable in $X_t$ loads on at least one factor. $\Lambda$ is the $N \times M$ matrix of factor loadings. The factors $F_t$ are related to their lagged values by $\Phi(L) = I - \Phi_1 L - \cdots - \Phi_p L^p$. The error processes are assumed to be Gaussian white noise:
$$\begin{pmatrix} u_t \\ e_t \end{pmatrix} \sim iid\,N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} R & 0 \\ 0 & \Sigma \end{pmatrix}\right). \tag{4}$$

While we do not restrict the structure of $\Sigma$, we assume that $R$ and $\Psi$ are diagonal. Hence, the idiosyncratic components are cross-sectionally uncorrelated. These assumptions fully determine the likelihood $\phi(X \mid \vartheta)$, that is, the distribution of the data given a set of parameters $\vartheta = (\Lambda, R, \Psi, \Phi, \Sigma)$. Here and in what follows, $\phi(\cdot)$ denotes a generic density function.
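To make the data-generating process concrete, the following minimal Python sketch simulates Equations (1) to (4) for the special case $p = 1$, so that $\Phi(L) = I - \Phi_1 L$. The function name `simulate_dfm`, the zero starting values (no burn-in), and all parameter values in the example are hypothetical choices for illustration, not the specification used in the application. Because $R$ and $\Psi$ are diagonal, the sketch passes only their diagonals as vectors.

```python
import numpy as np


def simulate_dfm(Lam, Phi1, Sigma, psi, r, T, rng):
    """Simulate equations (1)-(4) with p = 1 (hypothetical helper).

    Lam   : (N, M) factor loadings Lambda
    Phi1  : (M, M) VAR(1) coefficient, Phi(L) = I - Phi1 L
    Sigma : (M, M) covariance of the state innovations e_t
    psi   : (N,) diagonal of Psi (AR coefficients of the idiosyncratic terms)
    r     : (N,) diagonal of R (variances of u_t)
    """
    N, M = Lam.shape
    chol_sigma = np.linalg.cholesky(Sigma)
    F = np.zeros((T, M))
    v = np.zeros((T, N))
    f_prev, v_prev = np.zeros(M), np.zeros(N)      # start at zero, no burn-in
    for t in range(T):
        e_t = chol_sigma @ rng.standard_normal(M)   # e_t ~ N(0, Sigma)
        u_t = np.sqrt(r) * rng.standard_normal(N)   # u_t ~ N(0, R), R diagonal
        f_prev = Phi1 @ f_prev + e_t                # state equation (2)
        v_prev = psi * v_prev + u_t                 # idiosyncratic AR(1), equation (3)
        F[t], v[t] = f_prev, v_prev
    X = F @ Lam.T + v                               # observation equation (1)
    return X, F
```

For example, `simulate_dfm(Lam, 0.5 * np.eye(2), np.eye(2), np.full(10, 0.3), np.ones(10), 200, np.random.default_rng(0))` returns a 200 × 10 panel driven by two factors.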
3.2 Specifying a Prior Distribution Motivated by Economic Theory
The specification of the prior distribution is the key element of this article. As described in the introduction, we first build up a belief about which common factors drive the economy. We implement this belief by forming a prior on how the observed variables are related to these common factors. This prior identifies the factors in an economically meaningful way. In a second step, we incorporate prior information from economic theory that shapes the dynamics of the factors. We implement these ideas by assuming that the parameters in the observation equation are a priori independent of the parameters in the state equation. This allows us to proceed in two steps, described in turn in the following sections.
3.2.1 Prior for Observation Equation: Identification of the Factors. A major difference between our setting and standard factor models (e.g., Stock and Watson 2002) is that the loading matrix $\Lambda$ is identified using a prior centered at an economically interpretable structure instead of an arbitrary statistical normalization. This exploits the fact that the factors are only identified up to an invertible transformation. To see this, plug an invertible matrix $Q$ into Equation (1):
$$\begin{aligned} X_t &= \Lambda Q Q^{-1} F_t + v_t && (5)\\ &= \tilde\Lambda \tilde F_t + v_t && (6) \end{aligned}$$
with $\tilde F_t = Q^{-1} F_t$ and $\tilde\Lambda = \Lambda Q$. Adapting Equation (2) accordingly, the representation (5) is observationally equivalent to (6). The fact that the factors can be rotated with any invertible transformation $Q$ can be used to make them interpretable. More specifically, we can rotate the factors so that $\tilde\Lambda = \Lambda Q$ comes as close as possible to the desired factor structure. In our Bayesian setting, a natural way to rotate the factors is to use an informative prior distribution for $\Lambda$. This “identifies” the factors in the sense that it puts curvature into the posterior density function in regions where the likelihood function is flat. In general, Bayesian analysis is possible for nonidentified models as long as proper priors are specified on all coefficients (see, e.g., Poirier 1998; Aldrich 2001 for a discussion of identification in a Bayesian context). In our implementation, the prior is centered so that the relationship between the factors and the observed series is economically sensible and interpretable a priori. In particular, the prior mean $\Lambda_0$ is chosen so that the a priori interpretation of the factors corresponds to the economic concepts contained in the DSGE model. However, the posterior distribution of $\Lambda$ does not necessarily satisfy the restrictions contained in $\Lambda_0$ exactly, as it is not always possible to find a transformation $Q$ such that $\Lambda_0 = \Lambda Q$. The deviation of the posterior distribution of $\Lambda$ from $\Lambda_0$ depends on the tightness of the prior. Note that the DSGE prior for the factors (see Section 3.2.2) also puts “curvature” into the likelihood function through $\Phi$ and $\Sigma$ and therefore possibly identifies the factors without an informative prior on the factor loadings. The success of this mechanism depends on the specifics of the DSGE model and the prior distribution of the deep parameters. In our application, we find that identification through this channel plays a minor role.
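The observational equivalence of (5) and (6) can be checked numerically. The following sketch uses arbitrary illustrative dimensions and a randomly drawn (almost surely invertible) matrix $Q$; it confirms that the rotated loadings $\Lambda Q$ and rotated factors $Q^{-1}F_t$ produce exactly the same common component, so the likelihood alone cannot distinguish the two representations.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, T = 8, 3, 50
Lam = rng.standard_normal((N, M))                 # loadings Lambda
F = rng.standard_normal((T, M))                   # factors, row t is F_t'
Q = rng.standard_normal((M, M)) + 3 * np.eye(M)   # an invertible transformation

Lam_tilde = Lam @ Q                               # Lambda Q
F_tilde = F @ np.linalg.inv(Q).T                  # row t is (Q^{-1} F_t)'

common = F @ Lam.T                                # row t is (Lambda F_t)'
common_tilde = F_tilde @ Lam_tilde.T              # row t is (Lambda Q Q^{-1} F_t)'
print(np.allclose(common, common_tilde))          # True
```

An informative prior on $\Lambda$ breaks this tie by making some rotations, those close to $\Lambda_0$, more probable than others.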
Concretely, the prior for the parameters in the observation equation is
$$\phi(R, \Lambda, \Psi) = \prod_{n=1}^{N} \phi(R_n, \Lambda_n \mid \Psi_n)\,\phi(\Psi_n), \tag{7}$$
with $\phi(R_n, \Lambda_n \mid \Psi_n)$ and $\phi(\Psi_n)$ such that
$$R_n \sim IG_2(s, \nu), \tag{8}$$
$$\Lambda_n \mid R_n \sim N\big(\Lambda_{0,n},\ R_n M_{0,n}^{-1}\big), \tag{9}$$
$$\Psi_n \sim N(0, 1). \tag{10}$$
$IG_2$ denotes an inverse gamma distribution as defined in the appendix of Bauwens, Lubrano, and Richard (2000). $R_n$ and $\Psi_n$ are the $(n, n)$th elements of $R$ and $\Psi$, respectively. $\Lambda_n$ is the $n$th row of $\Lambda$, that is, the marginal effect of $F_t$ on $X_{t,n}$. $\Lambda_{0,n}$ is the prior mean of this vector, chosen according to the economic interpretation of $X_{t,n}$. Thus, the factors are rotated so that they have a clear economic interpretation a priori. $M_{0,n}$ is an $M \times M$ matrix of parameters that governs the tightness of the prior in the observation equation. The subscript $n$ reflects the fact that the prior tightness for the factor loadings can vary across observed series. Our empirical implementation in Section 4 illustrates how the tightness depends on the researcher’s knowledge about the relation of the observed data to the economic concepts driving the factors.
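A single draw from the observation-equation prior (8) to (10) can be sketched as follows. We assume the common inverted gamma-2 convention under which $R_n \sim IG_2(s, \nu)$ means $s/R_n \sim \chi^2(\nu)$; the function name `draw_obs_prior` and the example values for $\Lambda_{0,n}$ and $M_{0,n}$ are hypothetical.

```python
import numpy as np


def draw_obs_prior(lam0_n, M0_n, s, nu, rng):
    """One draw of (R_n, Lambda_n, Psi_n) from the priors (8)-(10) (hypothetical helper).

    Assumes the IG2(s, nu) convention s / R_n ~ chi-squared(nu).
    lam0_n : (M,) prior mean Lambda_{0,n} of the n-th row of Lambda
    M0_n   : (M, M) prior precision scale M_{0,n} (larger = tighter)
    """
    R_n = s / rng.chisquare(nu)                      # (8)
    cov = R_n * np.linalg.inv(M0_n)                  # R_n M_{0,n}^{-1}
    lam_n = rng.multivariate_normal(lam0_n, cov)     # (9)
    psi_n = rng.standard_normal()                    # (10): N(0, 1)
    return R_n, lam_n, psi_n
```

A large $M_{0,n}$ (for example, $10^4 \cdot I$) keeps draws of $\Lambda_n$ close to $\Lambda_{0,n}$, which corresponds to tying a factor tightly to a standard measure of an economic concept; a small $M_{0,n}$ leaves more room for the data.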
3.2.2 Prior for State Equation: Inducing Information From Economic Theory. The prior distribution of the parameters in the state equation, $\phi(\Phi, \Sigma)$, reflects information from economic theory. Following DelNegro and Schorfheide (2004), we introduce an additional parameter vector $\theta$ that collects the deep structural parameters of a linearized DSGE model and define a hierarchical prior
$$\phi(\Phi, \Sigma, \theta) = \phi(\Phi, \Sigma \mid \theta)\,\phi(\theta), \tag{11}$$
where $\phi(\Phi, \Sigma \mid \theta)$ is of the Inverted-Wishart-Normal form:
$$\Sigma \mid \theta \sim IW\big(\lambda T\,\Sigma^{DSGE}(\theta),\ \lambda T - Mp\big), \tag{12}$$
$$\Phi \mid \theta, \Sigma \sim N\Big(\Phi^{DSGE}(\theta),\ \Sigma \otimes \big[\lambda T\,\Gamma^{DSGE}_{XX}(\theta)\big]^{-1}\Big). \tag{13}$$

$\Sigma^{DSGE}(\theta)$ and $\Phi^{DSGE}(\theta)$ can be interpreted as the coefficients of a VAR estimated on an infinite sample of observations simulated with the DSGE model. $\Gamma^{DSGE}_{XX}(\theta) = E(X^{DSGE}_t X^{DSGE\prime}_t \mid \theta)$ is the second-moment matrix of the lagged factors $X^{DSGE}_t = (F^{DSGE\prime}_{t-1}, \ldots, F^{DSGE\prime}_{t-p})'$ as implied by the DSGE model. These matrices are calculated as follows. Given $\theta$, a solution to the DSGE model is derived using standard methods (see, e.g., Sims 2002):
$$S_t = G(\theta) S_{t-1} + H(\theta)\,\varepsilon_t. \tag{14}$$

$S_t$ are the fundamental states, and $\varepsilon_t$ gathers the structural shocks in the DSGE model. $G(\theta)$ and $H(\theta)$ determine the dynamics of the states in relation to the shocks. The DSGE-model-implied VAR coefficients are
$$\Phi^{DSGE}(\theta) = \Gamma^{DSGE}_{XX}(\theta)^{-1}\,\Gamma^{DSGE}_{XF}(\theta), \tag{15}$$
$$\Sigma^{DSGE}(\theta) = \Gamma^{DSGE}_{FF}(\theta) - \Gamma^{DSGE}_{FX}(\theta)\,\Phi^{DSGE}(\theta), \tag{16}$$
where the moments $\Gamma^{DSGE}_{FX}(\theta) = \Gamma^{DSGE}_{XF}(\theta)'$ and $\Gamma^{DSGE}_{FF}(\theta)$ are defined analogously to $\Gamma^{DSGE}_{XX}(\theta)$. We define a selection matrix $Z$ mapping $S_t$ to the counterpart of the empirical factors $F_t$ implied by the DSGE model:
$$F^{DSGE}_t = Z S_t. \tag{17}$$

If the states in the DSGE model correspond directly to the factors, $Z$ is an identity matrix. In many cases, however, the factors are linear combinations of a subset of the states, as exemplified in Section 4. With this definition, the moment matrices in (15) and (16) can be calculated using $E(F_t F_{t-h}' \mid \theta) = Z G(\theta)^h E(S_t S_t' \mid \theta) Z'$, where $E(S_t S_t' \mid \theta)$ solves the Lyapunov equation
$$E(S_t S_t' \mid \theta) = G(\theta)\,E(S_t S_t' \mid \theta)\,G(\theta)' + H(\theta)\,E(\varepsilon_t \varepsilon_t')\,H(\theta)'. \tag{18}$$
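Equations (15) to (18) translate directly into a few lines of linear algebra. The sketch below assumes a stable $G(\theta)$ and a VAR of order $p$ in the factors; it solves the Lyapunov equation (18) with SciPy's `solve_discrete_lyapunov`, builds the autocovariances $E(F_t F_{t-h}') = Z G(\theta)^h E(S_t S_t') Z'$, and stacks them into the moment matrices. $\Phi^{DSGE}$ is returned in stacked $(Mp \times M)$ form, so that $F_t' = X_t' \Phi + e_t'$. The function name `dsge_implied_var` is hypothetical.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def dsge_implied_var(G, H, Z, p, shock_cov=None):
    """Return (Phi_dsge, Sigma_dsge, Gamma_XX) implied by S_t = G S_{t-1} + H eps_t
    and F_t = Z S_t, for a VAR(p) in F_t written as F_t' = X_t' Phi + e_t'."""
    if shock_cov is None:
        shock_cov = np.eye(H.shape[1])          # E(eps_t eps_t')
    # Lyapunov equation (18): P = G P G' + H E(eps eps') H'
    P = solve_discrete_lyapunov(G, H @ shock_cov @ H.T)
    # Autocovariances Gamma(h) = E(F_t F_{t-h}') = Z G^h P Z', h = 0, ..., p
    gam = [Z @ np.linalg.matrix_power(G, h) @ P @ Z.T for h in range(p + 1)]
    M = Z.shape[0]
    # Gamma_XX for X_t = (F_{t-1}', ..., F_{t-p}')': block (i, j) = E(F_{t-1-i} F_{t-1-j}')
    G_XX = np.empty((M * p, M * p))
    for i in range(p):
        for j in range(p):
            block = gam[j - i] if j >= i else gam[i - j].T
            G_XX[i * M:(i + 1) * M, j * M:(j + 1) * M] = block
    # Gamma_XF = E(X_t F_t'): block i = E(F_{t-1-i} F_t') = Gamma(i + 1)'
    G_XF = np.vstack([gam[i + 1].T for i in range(p)])
    Phi = np.linalg.solve(G_XX, G_XF)           # (15)
    Sigma = gam[0] - G_XF.T @ Phi               # (16), using Gamma_FX = Gamma_XF'
    return Phi, Sigma, G_XX
```

A convenient sanity check: if the factors coincide with the states and follow a VAR(1), that is, $Z = I$ and $p = 1$, the function recovers $\Phi^{DSGE} = G(\theta)'$ and $\Sigma^{DSGE} = H(\theta)H(\theta)'$.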
The hyperparameter λ in (12) and (13) determines the tightness of the DSGE prior in the state equation. Following DelNegro and Schorfheide (2004), we select λ by estimating the model over a grid of values for λ and comparing the resulting models by means of marginal data densities and out-of-sample forecast performance (see Section 4). The specification of the prior is completed with a distribution of the DSGE model parameters, ϕ(θ). This distribution is specific to the DSGE model and is therefore discussed in Section 4.
The interpretation of the DSGE model prior is best described for the extreme cases of λ close to zero and λ → ∞. For a very small λ, the posterior distribution of the factor model parameters is mostly a function of the empirical moments of the data. That is, the posterior mode of the parameters governing the factor dynamics converges to the maximum likelihood estimate given the factors. Note, however, that the DSGE model parameters θ are updated in this case as well. Specifically, the resulting posterior estimate can be interpreted as a minimum distance estimate (see, e.g., Chamberlain 1984; Moon and Schorfheide 2002) obtained by minimizing the weighted discrepancy between the unrestricted VAR estimates and the restriction function implied by the DSGE model. For λ → ∞, on the other hand, the estimation strictly imposes the DSGE model restrictions on the factor dynamics. A difference from a maximum likelihood estimation of the DSGE model arises if, according to the DSGE model, the stochastic process of the factors is a VAR of infinite order. In this case, the likelihood function of the finite-order VAR is only approximately equal to the likelihood function under the assumption that the DSGE model is the true data-generating process.
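The two limiting cases can be illustrated with the conditional posterior mean of $\Phi$ given θ and the factors, which, following DelNegro and Schorfheide (2004), is a λ-weighted combination of DSGE-implied and sample moments, $(\lambda\Gamma^{DSGE}_{XX} + \Gamma_{XX})^{-1}(\lambda\Gamma^{DSGE}_{XF} + \Gamma_{XF})$. The sketch below, with the hypothetical function name `dsge_prior_posterior_mean`, shows that this quantity collapses to the OLS estimate as λ → 0 and to $\Phi^{DSGE}(\theta)$ as λ → ∞.

```python
import numpy as np


def dsge_prior_posterior_mean(lam, G_XX_dsge, G_XF_dsge, F, p):
    """Posterior mean of Phi given theta and F under the DSGE prior with weight lam.

    G_XX_dsge, G_XF_dsge : DSGE-implied moments (e.g., from equations (15)-(18))
    F                    : (T, M) array of factors
    """
    T_full = F.shape[0]
    Y = F[p:]                                                       # F_t, t = p, ..., T-1
    X = np.hstack([F[p - k:T_full - k] for k in range(1, p + 1)])   # (F_{t-1}', ..., F_{t-p}')
    T = Y.shape[0]
    G_XX = X.T @ X / T          # sample counterpart of Gamma_XX
    G_XF = X.T @ Y / T          # sample counterpart of Gamma_XF
    # Weighted moments: lam -> 0 gives OLS, lam -> infinity gives Phi^DSGE
    return np.linalg.solve(lam * G_XX_dsge + G_XX, lam * G_XF_dsge + G_XF)
```

For intermediate λ, the estimate is shrunk from the OLS estimate toward $\Phi^{DSGE}(\theta)$, with λ measuring the weight of the DSGE-simulated observations relative to the actual sample.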


3.3 Deriving the Posterior Distribution
As an analytical derivation of the joint posterior distribution of the parameters is not tractable, we use a Gibbs sampling approach to simulate from the posterior distribution. For a recent treatment of Markov chain Monte Carlo (MCMC) methods and a general discussion of Gibbs sampling, see Geweke (2005). In our implementation, we exploit the fact that, given the factors $F = (F_1, \ldots, F_T)$, the parameters in the state equation are independent of $X = (X_1, \ldots, X_T)$ and of the parameters in the observation equation. Furthermore, conditional on $\Phi$ and $\Sigma$, $F$ is independent of $\theta$. Thus, starting with initial draws $\Lambda^0$, $R^0$, $\Psi^0$, $\Phi^0$, $\Sigma^0$, and $\theta^0$, a Gibbs sampler can be implemented by iterating over the following steps for $j = 1, \ldots, J$:

Step 1. Draw $F^j$ from $\phi(F \mid \Lambda^{j-1}, R^{j-1}, \Psi^{j-1}, \Phi^{j-1}, \Sigma^{j-1}, X)$. It has become standard to use a multimove sampler (Frühwirth-Schnatter 1994; Carter and Kohn 1994). In our setting, the algorithm has to be adapted for autoregressive errors and potentially collinear states; see, for example, Anderson and Moore (1979) and Kim and Nelson (1999).
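Step 2. Draw $\Lambda^j$, $R^j$, and $\Psi^j$ from $\phi(\Lambda, R, \Psi \mid F^j, X)$. The derivation of the posterior distribution of the parameters in the observation equation is standard; see, for example, Chib (1993) and Bauwens, Lubrano, and Richard (2000). Omitting the superscript $j-1$ for notational convenience, this results in
$$R_n \mid X, F \sim IG_2\big(\bar R_n,\ T - 1 + \nu\big),$$
$$\Lambda_n \mid R_n, X, F \sim N\big(\bar\Lambda_n,\ R_n \bar M_n^{-1}\big),$$
$$\Psi_n \mid X, F, \Lambda_n, R_n \sim N\big(\bar\Psi_n,\ \bar N_n^{-1}\big),$$
where
$$\bar R_n = s + u_n' u_n + (\hat\Lambda_n - \Lambda_{0,n})' \big[M_{0,n}^{-1} + (\tilde F_n' \tilde F_n)^{-1}\big]^{-1} (\hat\Lambda_n - \Lambda_{0,n}),$$
$$\bar\Lambda_n = \bar M_n^{-1}\big(M_{0,n}\Lambda_{0,n} + \tilde F_n' \tilde F_n \hat\Lambda_n\big),$$
$$\bar M_n = M_{0,n} + \tilde F_n' \tilde F_n,$$
$$\bar\Psi_n = \bar N_n^{-1} R_n^{-1} v_n' v_n \hat\Psi_n,$$
$$\bar N_n = 1 + R_n^{-1} v_n' v_n.$$
$\tilde F_n = (\tilde F_{1,n}, \ldots, \tilde F_{T,n})'$ consists of the filtered states $\tilde F_{t,n} = F_t - \Psi_n F_{t-1}$, $\tilde X_n = (\tilde X_{1,n}, \ldots, \tilde X_{T,n})'$ collects the filtered observations $\tilde X_{t,n} = X_{t,n} - \Psi_n X_{t-1,n}$, $\hat\Lambda_n$ is the ordinary least-squares (OLS) estimate of a regression of $\tilde X_n$ on $\tilde F_n$, and $u_n$ is the vector of estimated residuals in this regression. $\hat\Psi_n$ is the OLS estimate of a regression of $v_{nt} = X_{nt} - \Lambda_n F_t$ on its lagged value, and $v_n$ is a vector collecting $v_{nt}$ for $t = 2, \ldots, T$.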
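Step 3. Draw $\Phi^j$, $\Sigma^j$, and $\theta^j$ from $\phi(\Phi, \Sigma, \theta \mid F^j, X)$. We invoke the method of DelNegro and Schorfheide (2004) in this step. First, we draw $\theta^j$ from $\phi(\theta \mid F^j)$ using the Metropolis algorithm in Schorfheide (2000). Specifically, we select a candidate $\theta^*$ from a proposal distribution $\theta^* = \theta^{j-1} + \xi^j$, with $\xi^j$ drawn from a scaled Student’s t-distribution; see Section 4.4 for the detailed specification of the scale in our application. The proposal draw is accepted with probability
$$r = \min\left\{\frac{\phi(F^j \mid \theta^*)\,\phi(\theta^*)}{\phi(F^j \mid \theta^{j-1})\,\phi(\theta^{j-1})},\ 1\right\}.$$
A prerequisite is that $\phi(F^j \mid \theta^*)/\phi(F^j \mid \theta^{j-1})$ can be evaluated. As shown by DelNegro and Schorfheide (2004), this is the case here, with
$$\phi(F \mid \theta) \propto \frac{\big|\lambda T\,\Gamma^{DSGE}_{XX}(\theta) + T\,\Gamma_{XX}\big|^{-\frac{M}{2}}\,\big|(\lambda+1)T\,\tilde\Sigma(\theta)\big|^{-\frac{(\lambda+1)T - Mp}{2}}}{\big|\lambda T\,\Gamma^{DSGE}_{XX}(\theta)\big|^{-\frac{M}{2}}\,\big|\lambda T\,\Sigma^{DSGE}(\theta)\big|^{-\frac{\lambda T - Mp}{2}}},$$
where
$$\tilde\Phi(\theta) = \big(\lambda\Gamma^{DSGE}_{XX}(\theta) + \Gamma_{XX}\big)^{-1}\big(\lambda\Gamma^{DSGE}_{XF}(\theta) + \Gamma_{XF}\big),$$
$$\tilde\Sigma(\theta) = \frac{1}{\lambda+1}\Big[\big(\lambda\Gamma^{DSGE}_{FF}(\theta) + \Gamma_{FF}\big) - \big(\lambda\Gamma^{DSGE}_{FX}(\theta) + \Gamma_{FX}\big)\tilde\Phi(\theta)\Big].$$
$\Gamma_{XX}$, for instance, is defined in analogy to $\Gamma^{DSGE}_{XX}$, with the model-implied factors replaced by the factors $F^j$. Finally, we draw $\Phi^j$ and $\Sigma^j$ from $\phi(\Phi, \Sigma \mid \theta^j, F^j)$, using the fact that their distribution is of the Inverted-Wishart-Normal form:
$$\Sigma \mid F, \theta \sim IW\big((\lambda+1)T\,\tilde\Sigma(\theta),\ (1+\lambda)T - Mp\big),$$
$$\Phi \mid \Sigma, F, \theta \sim N\Big(\tilde\Phi(\theta),\ \Sigma \otimes \big(\lambda T\,\Gamma^{DSGE}_{XX}(\theta) + T\,\Gamma_{XX}\big)^{-1}\Big).$$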
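The θ-block of Step 3 is a random-walk Metropolis step. The following sketch implements one such step with a multivariate Student-t increment, generated as a scaled standard normal vector divided by the square root of a $\chi^2(\text{df})/\text{df}$ variable. The callable `log_post`, standing in for $\log\phi(F^j \mid \theta) + \log\phi(\theta)$, the scale matrix, and the degrees of freedom are placeholders, and the function name is hypothetical; the scale actually used in the application is described in Section 4.4.

```python
import numpy as np


def rw_metropolis_step(theta, log_post, scale_chol, df, rng):
    """One random-walk Metropolis step theta* = theta + xi, xi ~ scaled multivariate t.

    log_post   : callable returning log phi(F | theta) + log phi(theta) (placeholder)
    scale_chol : Cholesky factor of the proposal scale matrix
    df         : degrees of freedom of the Student-t proposal
    """
    k = theta.shape[0]
    z = rng.standard_normal(k)
    w = rng.chisquare(df) / df
    xi = scale_chol @ z / np.sqrt(w)            # multivariate t increment
    theta_star = theta + xi
    # Symmetric proposal: r = min{ posterior(theta*) / posterior(theta), 1 }
    log_r = min(log_post(theta_star) - log_post(theta), 0.0)
    if np.log(rng.uniform()) < log_r:
        return theta_star, True
    return theta, False
```

Because the proposal is symmetric, the proposal densities cancel and only the ratio of target densities enters $r$. Working on the log scale avoids overflow in the determinant terms of $\phi(F \mid \theta)$, and a `log_post` that returns `-np.inf` outside the prior support leads to automatic rejection.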
Standard methods can be used to draw from the inverted gamma, Inverted-Wishart, and Normal distributions in Steps 1–3. To initialize the algorithm, any value $\vartheta^0$ with $\phi(\vartheta^0) > 0$ is valid in principle. In practice, however, it is advisable to run the algorithm from different initial values and to verify that the results do not differ. See Section 4.4 for details regarding our implementation of convergence diagnostics.
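As one example of these standard methods, the Inverted-Wishart-Normal draw at the end of Step 3 can be implemented with SciPy's `invwishart` and a matrix-normal construction. The sketch assumes that $\tilde\Phi(\theta)$, $\tilde\Sigma(\theta)$, and the precision matrix $\lambda T\,\Gamma^{DSGE}_{XX}(\theta) + T\,\Gamma_{XX}$ have already been computed; the function name `draw_phi_sigma` is hypothetical.

```python
import numpy as np
from scipy.stats import invwishart


def draw_phi_sigma(Phi_tilde, Sigma_tilde, prec, lam, T, rng):
    """Draw Sigma ~ IW((lam+1) T Sigma_tilde, (1+lam) T - Mp) and
    Phi | Sigma ~ N(Phi_tilde, Sigma kron prec^{-1}) (hypothetical helper).

    Phi_tilde : (Mp, M) conditional posterior mean
    prec      : (Mp, Mp) = lam T Gamma_XX^DSGE + T Gamma_XX
    """
    Mp, M = Phi_tilde.shape
    Sigma = invwishart.rvs(df=(1 + lam) * T - Mp,
                           scale=(lam + 1) * T * Sigma_tilde, random_state=rng)
    Sigma = np.atleast_2d(Sigma)
    A = np.linalg.cholesky(np.linalg.inv(prec))   # A A' = prec^{-1} (row covariance)
    B = np.linalg.cholesky(Sigma)                 # B B' = Sigma (column covariance)
    Z = rng.standard_normal((Mp, M))
    # vec(A Z B') ~ N(0, (B B') kron (A A')) = N(0, Sigma kron prec^{-1})
    Phi = Phi_tilde + A @ Z @ B.T
    return Phi, Sigma
```

The construction $\Phi = \tilde\Phi + AZB'$ with $AA' = V$ and $BB' = \Sigma$ gives $\mathrm{vec}(\Phi)$ the covariance $\Sigma \otimes V$, which is the Kronecker structure in the conditional distribution of $\Phi$ and avoids forming the full $(M^2 p \times M^2 p)$ covariance matrix.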
3.4 Identification of Shocks
So far, we have described how the posterior distribution of the parameters in Equations (1) and (2) is derived. We now describe how structural, economically interpretable shocks can be identified. Generally, the residuals in the state equation relate to the structural shocks $\varepsilon_t$ as
$$e_t = H_{VAR}\,\varepsilon_t \tag{19}$$
with $E(\varepsilon_t \varepsilon_t') = I_M$. We assume that $H_{VAR}$ is invertible, that is, that there are as many shocks as factors. The identification problem arises because $H_{VAR}$ cannot be uniquely determined using only information from the reduced-form estimation of the factor model. $H_{VAR}$ is only restricted by its relationship to the covariance matrix of the reduced-form residuals:
$$\Sigma = H_{VAR}\,E(\varepsilon_t \varepsilon_t')\,H_{VAR}' = H_{VAR} H_{VAR}'. \tag{20}$$
However, any transformation $\tilde H = H_{VAR}\,\Omega$ with an orthonormal matrix $\Omega$ also satisfies these restrictions but implies potentially very different reactions of $F_t$ to the shocks. In other words, each identification scheme corresponds to a specific choice of $\Omega$. We now present the three schemes that are implemented in the application.
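The identification problem in (20) is easy to reproduce numerically. In the sketch below, the covariance matrix $\Sigma$ is an arbitrary illustrative example, and $\Omega$ is a random orthonormal matrix obtained from a QR decomposition of a Gaussian matrix. Both the Cholesky factor and its rotation reproduce $\Sigma$ exactly, yet they imply different impact responses.

```python
import numpy as np

rng = np.random.default_rng(6)
# Illustrative reduced-form covariance of the state residuals e_t
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.4],
                  [0.1, 0.4, 1.5]])
H_chol = np.linalg.cholesky(Sigma)                     # one admissible H_VAR
Omega, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a random orthonormal matrix
H_alt = H_chol @ Omega                                 # another admissible H_VAR

print(np.allclose(H_alt @ H_alt.T, Sigma))   # True: both satisfy equation (20)
print(np.round(H_chol, 3))                   # impact responses under one choice
print(np.round(H_alt, 3))                    # generally different impact responses
```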


3.4.1 DSGE Model Rotation. DelNegro and Schorfheide (2004) proposed an approach that relies on the fact that the shocks are exactly identified in the DSGE model. Hence, $\partial F^*_t / \partial \varepsilon_t' = H(\theta)$ is uniquely determined. Recall that $H(\theta)$ can be calculated using standard methods for solving linear(ized) DSGE models. Furthermore, there is a unique decomposition of this matrix into the product of a lower triangular matrix $H_{tr,DSGE}(\theta)$ with positive diagonal elements and an orthonormal matrix $\Omega(\theta)$:
$$H(\theta) = H_{tr,DSGE}(\theta)\,\Omega(\theta). \tag{21}$$
The idea is to set $\tilde H$ to $H_{tr,VAR}$, the Cholesky factor of $\Sigma$, and then to use $\Omega(\theta)$ as a rotation:
$$H_{VAR}(\theta) = H_{tr,VAR}\,\Omega(\theta). \tag{22}$$
On impact, the responses differ to the extent that $H_{tr,DSGE}(\theta)$ and $H_{tr,VAR}$ differ. Thus, if the covariance matrix of the residuals is similar to its counterpart in the DSGE model, the responses on impact will be close. At horizons greater than zero, the influence of $\Phi$ allows for further deviations of the factor model responses from the DSGE model implications.
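The decomposition (21) is an LQ factorization, which can be obtained from a QR factorization of $H(\theta)'$. The sketch below, with hypothetical function names, normalizes signs so that the triangular factor has a positive diagonal (which makes the decomposition unique for invertible $H(\theta)$) and then forms $H_{VAR}(\theta)$ as in (22).

```python
import numpy as np


def lq_decomposition(H):
    """Factor H = L @ Omega, L lower triangular with positive diagonal, Omega orthonormal."""
    Q, R = np.linalg.qr(H.T)                 # H' = Q R with R upper triangular
    D = np.diag(np.sign(np.diag(R)))         # sign normalization (H assumed invertible)
    L = (D @ R).T                            # lower triangular, positive diagonal
    Omega = (Q @ D).T                        # orthonormal
    return L, Omega


def dsge_rotation_impact(Sigma, H_dsge):
    """Impact matrix H_VAR(theta) = H_tr,VAR Omega(theta), as in (22)."""
    _, Omega = lq_decomposition(H_dsge)
    return np.linalg.cholesky(Sigma) @ Omega
```

As described above, if $\Sigma$ equals its DSGE counterpart $H(\theta)H(\theta)'$, the Cholesky factor of $\Sigma$ coincides with $H_{tr,DSGE}(\theta)$, and the rotated impact matrix reproduces $H(\theta)$ exactly.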
3.4.2 Sign Restrictions. The idea of the second approach is to be “agnostic”: one tries to find restrictions on the sign of the responses that are consistent with commonly accepted theories. Depending on the nature of these restrictions, it is possible to narrow the range of possible rotations Ω. This idea goes back to Faust (1998) and has been elaborated by Uhlig (2005) and Canova (2002). We implement the “pure” sign-restriction approach, as opposed to the “penalty-function” approach. Hence, we do not use an additional criterion to select the “best” of all impulse response vectors: all impulse responses satisfying the sign restrictions are considered equally likely. In the “pure” sign-restriction approach, one estimates the impulse responses and the reduced form coefficients jointly. The impulse responses are parameterized as $H_{sign} = H_{chol}\,\Omega(\alpha)$, where $\Omega(\alpha)$ is an orthonormal rotation matrix with one column given by a vector α of unit length, and $H_{chol}$ is the Cholesky decomposition of the covariance matrix, $\Sigma = H_{chol} H_{chol}'$. Uhlig (2005) showed that the set of impulse response functions can be characterized by a suitable choice of α. The prior for the coefficients in the state equation is formulated as

$$\phi(\Phi, \Sigma, \alpha) \propto \phi(\Phi, \Sigma)\, I(\alpha), \qquad (23)$$

where $I(\alpha)$ is one if the sign restrictions are satisfied and zero otherwise. Note that the posterior distribution of the reduced form coefficients therefore differs from the one obtained in a pure reduced form estimation: draws for which the sign restrictions are more likely to be satisfied receive more weight. For details regarding the implementation, see Uhlig (2005). Note that in our setting, it is possible to restrict the responses of the factors, the responses of a set of observed series, or both.
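The rotation step of the pure sign-restriction approach can be sketched as follows for a single draw of the reduced form coefficients. This is a simplified illustration, not the implementation used in this article: it assumes a VAR(p) state equation with hypothetical coefficient matrices, restricts only responses of the factors, and draws α by normalizing a standard normal vector, which gives a uniform draw on the unit sphere. The function names are hypothetical.

```python
import numpy as np

def impulse_responses(Phis, impact, horizons):
    """Responses Psi_h @ impact for h = 0..horizons of a VAR(p) with lag matrices Phis."""
    M, p = impact.shape[0], len(Phis)
    Psi = [np.eye(M)]                                  # MA coefficients, Psi_0 = I
    for h in range(1, horizons + 1):
        Psi.append(sum(Phis[i - 1] @ Psi[h - i] for i in range(1, min(h, p) + 1)))
    return np.array([P @ impact for P in Psi])         # shape (horizons + 1, M)

def sign_restriction_draws(Phis, Sigma, signs, horizons, n_rot, rng):
    """Draw n_rot unit vectors alpha and keep each response path of the impulse
    vector H_chol @ alpha that satisfies all sign restrictions (I(alpha) = 1).
    `signs` maps a variable index to +1 or -1, imposed for h = 0..horizons."""
    H_chol = np.linalg.cholesky(Sigma)
    accepted = []
    for _ in range(n_rot):
        alpha = rng.standard_normal(Sigma.shape[0])
        alpha /= np.linalg.norm(alpha)                 # uniform on the unit sphere
        irf = impulse_responses(Phis, H_chol @ alpha, horizons)
        if all(np.all(s * irf[:, j] >= 0) for j, s in signs.items()):
            accepted.append(irf)                       # every accepted draw counts equally
    return accepted

Phis = [np.array([[0.5, 0.1, 0.0],
                  [0.1, 0.6, -0.1],
                  [0.0, 0.2, 0.7]])]                   # hypothetical VAR(1) coefficients
Sigma = np.array([[0.40, 0.05, 0.10],
                  [0.05, 0.20, 0.02],
                  [0.10, 0.02, 0.50]])
# Contractionary monetary shock: inflation (index 1) <= 0, interest rate (index 2) >= 0.
draws = sign_restriction_draws(Phis, Sigma, {1: -1, 2: +1}, horizons=4,
                               n_rot=1000, rng=np.random.default_rng(1))
```

Setting `n_rot=10` would mirror the ten rotations per posterior draw reported in Section 4.4; the larger value here only makes accepted draws likely in this toy example.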
3.4.3 Recursive Identification. A prominent assumption in the SVAR literature is that inflation and output react to monetary policy shocks only with a lag. Based on this assumption, one can exactly identify monetary policy shocks, technically by using a Cholesky decomposition (see, e.g., Christiano, Eichenbaum, and Evans 1999 for a discussion). We compare the previous two approaches with this scheme based on timing restrictions, as it has been widely applied to identify monetary policy shocks.
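For comparison, recursive identification needs no rotation at all. A minimal NumPy sketch, assuming the factors are ordered as (output, inflation, interest rate) and a hypothetical Σ:

```python
import numpy as np

# Hypothetical reduced-form covariance of the factors, ordered (output, inflation, rate).
Sigma = np.array([[0.40, 0.05, 0.10],
                  [0.05, 0.20, 0.02],
                  [0.10, 0.02, 0.50]])

# Lower-triangular Cholesky factor: variable i reacts on impact only to shocks 0..i.
H_rec = np.linalg.cholesky(Sigma)

# With the interest rate ordered last, the last column is the monetary policy shock:
# output and inflation do not move on impact and respond only with a lag.
mp_impact = H_rec[:, -1]
print(mp_impact)
```

The ordering is the identifying assumption here; placing the interest rate first instead would let output and inflation respond to the policy shock on impact.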
4. EMPIRICAL ANALYSIS OF U.S. DATA

By implementing our method, we show that it performs well in terms of forecasting and, most importantly, that its structural interpretation allows us to derive economic insights from the estimated model. The model is estimated on quarterly U.S. data, as described in Section 4.1. The DSGE prior and the prior for the observation equation are presented in Sections 4.2 and 4.3, respectively. Section 4.4 addresses some issues concerning the implementation of the MCMC algorithm. Section 4.5 evaluates the forecast performance. In Section 4.6, we discuss the optimal weights based on measures of in-sample fit. Finally, in Section 4.7 we investigate how identified monetary policy shocks influence the common factors and the observed series.
4.1 Data

The variables that are of primary interest for the analysis of monetary policy are inflation, interest rates, and output. We therefore compile a dataset with measures related to these variables. We use quarterly data from 1985:1 to 2007:3, discarding observations from periods before 1985 because there is evidence of a structural break around 1984 (see, e.g., Stock and Watson 2003). The output series include data on real personal income, consumption expenditures, domestic product, industrial production, and capacity utilization. Price indicators are deflators of GDP and consumption expenditures, and consumer price indexes for several subgroups of goods. Interest rates include bonds with different ratings, treasury bonds, and the federal (FED) funds rate. Where only monthly data were available, we took averages to obtain a quarterly series. A complete list of the 63 series, including references to the data sources, is contained in the online supplementary material. We demean the data and scale each series so that its standard deviation equals that of a reference series in the sample: in particular, we standardize all “output series” to have the same standard deviation as GDP. For the “price series,” we use the GDP deflator, and for the “interest rate series,” we use the FED funds rate as the normalizing series. This makes the estimation more robust against the influence of data series with large variance.
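The data preparation steps described above can be sketched as follows. The function names, group labels, and reference column indices are hypothetical, and the quarterly averaging assumes that the monthly sample starts in the first month of a quarter.

```python
import numpy as np

def monthly_to_quarterly(x):
    """Average consecutive blocks of three months; drops an incomplete final quarter."""
    x = np.asarray(x, dtype=float)
    n = len(x) // 3 * 3
    return x[:n].reshape(-1, 3).mean(axis=1)

def standardize_to_reference(X, groups, reference):
    """Demean each column of X (T x N) and rescale it to the standard deviation
    of the reference series of its group.

    groups:    group label per column, e.g. "output", "price", "rate"
    reference: maps a group label to the column index of its reference series
    """
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0, ddof=1)
    scale = np.array([sd[reference[g]] / sd[n] for n, g in enumerate(groups)])
    return Xc * scale                      # column n is multiplied by sd_ref / sd_n

# Hypothetical panel: columns 0-1 are output series (reference: column 0, "GDP"),
# column 2 is a price series that serves as its own reference ("GDP deflator").
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3)) * np.array([1.0, 5.0, 2.0]) + 3.0
Xs = standardize_to_reference(X, ["output", "output", "price"], {"output": 0, "price": 2})
```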
A further issue is the choice of the number of factors; particularly for the classical analysis of factor models, there is a large and still developing literature on statistical tests to determine it. In a Bayesian framework, posterior data densities can be used to determine the optimal model among several models even if they are not nested. Hence, in principle, a set of models for a grid of numbers of factors could be specified and evaluated accordingly. However, these alternative models have to be well specified, including prior distributions for their coefficients, and our DSGE prior cannot immediately be adapted to a number of factors other than three. Instead, we test for the number of factors using generic information criteria. We report the results based on the dataset with the series scaled to unit variance. The IP1 and IP2 criteria proposed by Bai and Ng (2002) always select the maximum number of factors allowed for. However, these criteria are rather sensitive to the choice of the penalty


Journal of Business & Economic Statistics, April 2013

function; see, for example, Alessi, Barigozzi, and Capasso (2009) and Forni, Lippi, and Reichlin (2003). In our case, applying the log version of the BIC3 criterion from Bai and Ng (2002), which incorporates a moderately stronger penalty for an increasing number of factors, points to three factors. Moreover, when the penalty terms of IP1 and IP2 are multiplied by two, the adjusted criteria also point to three factors. Multiplying the penalty term by a constant does not affect the asymptotic results in Bai and Ng (2002), but it influences the finite sample properties, as shown in Alessi, Barigozzi, and Capasso (2009). The factor of two can be rationalized by the proposal in Alessi, Barigozzi, and Capasso (2009) to select the constant in such a way that the result is robust across subsamples. A further method, proposed by Ahn and Horenstein (2009), is to choose the number of factors that maximizes the ratio of an eigenvalue to the adjoining (next smaller) eigenvalue. This method also points to M = 3. Detailed results for these tests are documented in the online supplementary material. Overall, we conclude that the evidence from information criteria supports our assumption of three factors.
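The criteria discussed above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the code behind the reported results: we assume that IP1 and IP2 correspond to the $IC_{p1}$ and $IC_{p2}$ criteria of Bai and Ng (2002), with $V(k)$ computed from the eigenvalues of $X'X/(NT)$, and the parameter `c` multiplies the penalty as in the adjustment described above. The simulated three-factor panel is purely hypothetical.

```python
import numpy as np

def eigenvalues_desc(X):
    """Eigenvalues of X'X / (N T) in descending order for a T x N panel X."""
    T, N = X.shape
    return np.sort(np.linalg.eigvalsh(X.T @ X / (N * T)))[::-1]

def bai_ng_ic(X, kmax, variant=1, c=1.0):
    """Number of factors selected by IC_p1 (variant=1) or IC_p2 (variant=2) of
    Bai and Ng (2002); c multiplies the penalty term (c=1 is the original)."""
    T, N = X.shape
    mu = eigenvalues_desc(X)
    if variant == 1:
        penalty = (N + T) / (N * T) * np.log(N * T / (N + T))
    else:
        penalty = (N + T) / (N * T) * np.log(min(N, T))
    # V(k) = mean squared residual after removing k principal components,
    #      = sum of the eigenvalues beyond the k-th.
    ic = [np.log(mu[k:].sum()) + c * k * penalty for k in range(kmax + 1)]
    return int(np.argmin(ic))

def ahn_horenstein_er(X, kmax):
    """Eigenvalue-ratio estimator: the k in 1..kmax maximizing mu_k / mu_{k+1}."""
    mu = eigenvalues_desc(X)
    ratios = mu[:kmax] / mu[1:kmax + 1]
    return int(np.argmax(ratios)) + 1

# Hypothetical three-factor panel, roughly the size of the dataset (T = 91, N = 63).
rng = np.random.default_rng(42)
T, N = 91, 63
X = rng.standard_normal((T, 3)) @ rng.standard_normal((N, 3)).T
X += 0.5 * rng.standard_normal((T, N))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # scale series to unit variance
print(bai_ng_ic(X, kmax=8, variant=1, c=2.0), ahn_horenstein_er(X, kmax=8))
```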
4.2 Prior for Factor Dynamics: A New-Keynesian DSGE Model
A central building block in our prior distribution is the DSGE
model, providing a prior belief on the common factor dynamics.
Presumably the most popular model in contemporary monetary
macroeconomics is the standard “New-Keynesian” model (see,
e.g., Clarida, Gali, and Gertler 2000 for an overview). It describes the joint dynamics of output, inflation, and the interest
rate based on optimizing behavior of a representative consumer
and firms subject to sticky prices. In this article, we closely follow the specification of DelNegro and Schorfheide (2004). A
detailed description of the model is given in the online supplementary material. The resulting log-linearized equations are
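$$y_t = E_t y_{t+1} - \frac{1}{\tau}\left(r_t - E_t \pi_{t+1}\right) + (1 - \rho_g) g_t + \frac{\rho_z}{\tau} z_t \qquad (24)$$

$$\pi_t = \beta E_t \pi_{t+1} + \kappa (y_t - g_t) \qquad (25)$$

$$r_t = \rho_r r_{t-1} + (1 - \rho_r)\left[\psi_1 \pi_t + \psi_2 y_t\right] + \varepsilon_t^r. \qquad (26)$$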

All the variables in the model are written in deviations from trend. The first equation is a standard Euler equation in a model without capital, linking output $y_t$ to the expected real interest rate $r_t - E_t\pi_{t+1}$, expected output $E_t y_{t+1}$, exogenous technology $z_t$, and $g_t$, which can be interpreted as a government spending shock. The Phillips curve can be derived by assuming price adjustment costs, perfectly competitive labor markets, and a linear production function. It relates current inflation $\pi_t$ to expected inflation $E_t\pi_{t+1}$, the output gap $y_t$, and $g_t$. The third equation is a Taylor rule describing the behavior of the central bank: the nominal interest rate $r_t$ depends on the lagged nominal interest rate, a monetary shock $\varepsilon_t^r$, and the reaction of the central bank to current inflation and to the output gap. The exogenous components $g_t$ and $z_t$ evolve according to

$$z_t = \rho_z z_{t-1} + \varepsilon_t^z, \qquad g_t = \rho_g g_{t-1} + \varepsilon_t^g.$$

Table 1. Prior distribution(a)

Parameter   Distribution          Mean    Std. deviation
ψ1          Gamma                 1.5     0.5
ψ2          Gamma                 0.125   0.1
ρr          Beta                  0.5     0.2
κ           Gamma                 0.3     0.15
τ           Gamma                 2       0.5
ρg          Beta                  0.8     0.1
ρz          Beta                  0.3     0.1
σr          Inverse gamma-1(b)    0.251   0.139
σg          Inverse gamma-1(b)    0.630   0.323
σz          Inverse gamma-1(b)    0.875   0.430

NOTES: (a) Following DelNegro and Schorfheide (2004), we truncate the prior density such that the parameter space is restricted to the determinacy region (corresponding to approximately 98.5% of the prior mass as defined above).
(b) The inverse gamma-1 density is parameterized as in Bauwens, Lubrano, and Richard (2000): $\phi(\sigma \mid \tilde\nu, \tilde s) = \frac{2}{\Gamma(\tilde\nu/2)} \left(\frac{\tilde s}{2}\right)^{\tilde\nu/2} \sigma^{-\tilde\nu-1} e^{-\tilde s/(2\sigma^2)}$, where $\tilde\nu = 4$ and $\tilde s$ equals 0.16, 1, and 1.96 for σr, σg, and σz, respectively.

The shocks $\varepsilon_t^z$, $\varepsilon_t^g$, and the monetary policy shock $\varepsilon_t^r$ are assumed to be uncorrelated with each other and across time. Their standard deviations are $\sigma_z$, $\sigma_g$, and $\sigma_r$, respectively. The rational expectations solution to the model can be calculated using various methods, for example, that of Sims (2002). The model provides a complete description of the comovement between output, inflation, and interest rates as a function of its deep parameters (denoted by θ in what follows).
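Solving the full model requires a rational expectations solver such as that of Sims (2002), which is beyond a short example. The sketch below only evaluates the policy rule (26) and the exogenous AR(1) processes for given inputs; the default parameter values are the prior means from Table 1, chosen by us for illustration, and the function names are hypothetical.

```python
def taylor_rule_path(pi, y, eps_r, rho_r=0.5, psi1=1.5, psi2=0.125, r_init=0.0):
    """Interest rate path implied by Eq. (26) for given paths of inflation,
    output, and monetary shocks (defaults: prior means from Table 1)."""
    r, path = r_init, []
    for pi_t, y_t, e_t in zip(pi, y, eps_r):
        r = rho_r * r + (1 - rho_r) * (psi1 * pi_t + psi2 * y_t) + e_t
        path.append(r)
    return path

def ar1_impulse_response(rho, horizons, shock=1.0):
    """Response of an exogenous process such as z_t or g_t to a one-time shock."""
    return [shock * rho ** h for h in range(horizons + 1)]

# Constant inflation of one, zero output gap, no monetary shocks.
rates = taylor_rule_path([1.0] * 50, [0.0] * 50, [0.0] * 50)
g_irf = ar1_impulse_response(rho=0.8, horizons=3)   # rho_g at its prior mean
```

With constant inflation of one and a zero output gap, the rate adjusts only gradually and converges to ψ1 = 1.5, at a speed governed by the smoothing parameter ρr.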
The prior distribution for θ is taken from DelNegro and Schorfheide (2004). Parameters are assumed to be independently distributed according to Table 1. We do not attempt to estimate the steady-state value of the interest rate and therefore calibrate $\gamma / r^{*} = \beta = 0.99$.
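To relate the moments in Table 1 to the parameters of the distributions, the following stdlib-only Python sketch converts means and standard deviations into gamma (shape, scale) and beta (a, b) parameters, and computes the mean and standard deviation of the inverse gamma-1 density from the table notes. It ignores the truncation to the determinacy region, and the function names are our own. For ν̃ = 4 and s̃ = 0.16, the mean comes out at about 0.25, in line with the σr entry.

```python
import math

def gamma_params(mean, sd):
    """Shape and scale of a gamma distribution with the given mean and std. deviation."""
    return (mean / sd) ** 2, sd ** 2 / mean

def beta_params(mean, sd):
    """Parameters (a, b) of a beta distribution with the given mean and std. deviation."""
    common = mean * (1 - mean) / sd ** 2 - 1
    return mean * common, (1 - mean) * common

def inv_gamma1_moments(nu, s):
    """Mean and std. deviation of the inverse gamma-1 density in the Table 1 notes."""
    mean = math.sqrt(s / 2) * math.gamma((nu - 1) / 2) / math.gamma(nu / 2)
    var = s / (nu - 2) - mean ** 2          # E[sigma^2] = s / (nu - 2) for nu > 2
    return mean, math.sqrt(var)

print(gamma_params(1.5, 0.5))         # psi_1
print(beta_params(0.5, 0.2))          # rho_r
print(inv_gamma1_moments(4, 0.16))    # sigma_r
```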
A central issue is to define the mapping Z, that is, to determine how the economic concepts contained in the factors relate to the variables in the DSGE model. For our estimation, we include the measures of inflation, the growth rates of the output series, and the (nonannualized) interest rate series in our dataset. Therefore, we define $F_t^{DSGE} = (\tilde y_t, \tilde\pi_t, \tilde r_t)'$, where $\tilde y_t$, $\tilde\pi_t$, and $\tilde r_t$ are the model-implied equivalents of output growth, inflation, and interest rates. With $S_t = (y_t, y_{t-1}, \pi_t, r_t, z_t, g_t)'$, the mapping from the DSGE model variables to the factors is

$$Z = \begin{pmatrix} 1 & -1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}, \qquad (27)$$

see DelNegro and Schorfheide (2004). The DSGE model equations paired with the mapping Z and the prior distribution for θ
allow the derivation of a prior distribution for the parameters in
the state Equation (2) as described in Section 3.2.
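As a minimal illustration of Eq. (27), the following NumPy snippet maps a hypothetical vector of model variables into the model-implied output growth, inflation, and interest rate. Because the mapping is linear, the same matrix also maps impulse responses of $S_t$ into responses of $F_t^{DSGE}$.

```python
import numpy as np

# State vector ordering S_t = (y_t, y_{t-1}, pi_t, r_t, z_t, g_t)'.
Z = np.array([[1, -1, 0, 0, 1, 0],    # output growth: y_t - y_{t-1} + z_t
              [0,  0, 1, 0, 0, 0],    # inflation
              [0,  0, 0, 1, 0, 0]])   # interest rate

# Hypothetical values of the model variables (deviations from trend).
S_t = np.array([0.4, 0.1, 0.2, 0.3, 0.05, -0.1])
F_dsge = Z @ S_t                      # model-implied (output growth, inflation, rate)
print(F_dsge)
```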
4.3 Prior for Observation Equation: Identification of Factors

The prior distribution on the coefficients in the observation
equation induces the economic interpretation of the factors. The
DSGE model contains the economic concepts “output growth,”
“inflation,” and “short-term interest rates.” In the dataset, we
have measures of these concepts as well as long-term interest rates. Hence, we assume that a priori the output measures load exclusively on the first factor and the inflation series only on the second one. The short-term interest rates (with a maturity of at most one quarter) load solely on the third factor a priori, while the long-term interest rates load on all the factors. We set the corresponding element of $\Lambda_0$ to one if a series loads on the factor a priori and to zero otherwise. As we do not have a clear prior on the size of the loading for most of the series, we impose only a rather flat prior by setting $M_{0,n} = \frac{1}{4} I_M$ in the prior for the loadings.

However, for three series, we do have a stronger prior: GDP and its deflator are standard measures of output and prices, and the FED funds rate is the most prominent measure of the short-term interest rate (see, e.g., Uhlig 2005). For these series, we let the elements of $M_{0,n}$ converge to infinity, strictly imposing that they load exclusively on their respective concept, with the corresponding element of $\Lambda_{0,n}$ equal to one and the other elements of $\Lambda_{0,n}$ equal to zero. In this case, Step 2 in Section 3.3 reduces to setting $\Lambda_n = \Lambda_{0,n}$ and drawing $R_n$ and $\Psi_n$ according to the standard formulas with $\bar R_n = s + u_n' u_n$. This identifies the factors in the desired way, making it possible to interpret them as economic concepts. Note that these restrictions can be interpreted as exactly identifying restrictions for the factors in a classical framework. Hence, we rotate the factors in such a way that the standard measures for the variables in the DSGE model are directly and exclusively driven by the corresponding factor. With respect to the remaining hyperparameters in the prior distribution, we set s = 3 and ν = 4.001 to obtain a rather diffuse prior distribution with a finite variance (see Bauwens, Lubrano, and Richard 2000, p. 303).
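The prior on the loadings described above can be sketched as follows. This assumes the factor order (output, inflation, short-term rate) and our reading $M_{0,n} = I_M/4$ for the flat prior; strictly restricted series are represented by `None` in place of an infinite precision. The series in the example and the function `loading_prior` are hypothetical.

```python
import numpy as np

FACTORS = ("output", "inflation", "short_rate")    # assumed factor order in F_t

def loading_prior(concepts, strict):
    """Prior mean Lambda_0 (N x 3) and prior precisions M_{0,n} for the loadings.

    concepts: per series, "output", "inflation", "short_rate", or "long_rate"
    strict:   per series, True if the loadings are fixed exactly at Lambda_{0,n}
              (GDP, GDP deflator, FED funds rate); M_{0,n} is then None.
    """
    N, M = len(concepts), len(FACTORS)
    L0 = np.zeros((N, M))
    M0 = []
    for n, (c, s) in enumerate(zip(concepts, strict)):
        if c == "long_rate":
            L0[n, :] = 1.0                          # long-term rates load on all factors
        else:
            L0[n, FACTORS.index(c)] = 1.0           # exclusive loading on its concept
        M0.append(None if s else 0.25 * np.eye(M))  # flat prior: M_{0,n} = I_M / 4
    return L0, M0

# Hypothetical panel: GDP, industrial production, GDP deflator, FED funds, 10-year bond.
concepts = ["output", "output", "inflation", "short_rate", "long_rate"]
strict = [True, False, True, True, False]
Lambda0, M0 = loading_prior(concepts, strict)
```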

4.4 Implementation of the MCMC Algorithm

We iterate J = 500,000 times over Steps 1–3 described before for λ ∈ {0.25, 0.5, 1, 2, 5, 100}. The lowest value for λ is chosen such that the DSGE prior remains proper (see DelNegro and Schorfheide 2004 for a condition defining this threshold). The largest value, λ = 100, corresponds to an almost strict implementation of the DSGE model restrictions. To mitigate the effect of the initial values, which are set close to the prior mean, we discard the first 20% of the draws. For computational reasons, we keep only every 40th draw, so that 10,000 draws remain to calculate the posterior distribution of the parameters.

We assess the convergence properties of the algorithm by running 10 distinct chains with different initial values drawn from the prior distribution. Specifically, we draw $\theta^j$ from $\phi(\theta)$, which allows us to calculate $\Phi(\theta^j)$ and $\Sigma(\theta^j)$, and then draw R, Λ, and Ψ from their prior distributions. Based on these chains, we compare between-chain and within-chain moments as proposed by Gelman and Rubin (1992), using the so-called scale reduction factor calculated as suggested by Brooks and Gelman (1998). Additionally, we rely on a number of graphical diagnostics, including plots of recursively calculated moments and cusum path plots (see Yu and Mykland 1998). Gelman and Rubin (1992) suggested running the sampler until the squared scale reduction factor is below 1.2 or 1.1. For the structural hyperparameters θ, the squared scale reduction factor fell below 1.1 after approximately 300,000 draws. This somewhat slow convergence is due to chains with rather extreme initial draws. For the parameters that are relevant for forecasting, convergence is achieved much earlier, after approximately 50,000 draws. Therefore, we keep J = 500,000 for the structural analysis but set J = 250,000 in the forecasting exercise to reduce the computational burden.

On a standard 1.2 GHz PC, around 160 min are needed to produce 100,000 draws. The exact time depends on the sample size and is also slightly influenced by the prior weight λ. The DSGE-based identification is calculated in less than a minute for 10,000 draws. For the sign-restriction approach, we draw 10 rotations for each draw from the posterior distribution, resulting in approximately 15,000 accepted draws. This procedure takes approximately 2 min on our 1.2 GHz PC.

The covariance matrix of the proposal draw is chosen proportionally to the dispersion of the prior distribution of each element of θ, appropriately scaled to obtain an acceptance rate between 0.2 and 0.3. The degrees of freedom of the t-distribution of the proposal innovation are set to 40. The number of lags p in the state equation is 4. In the benchmark model, we replace the DSGE prior with a Minnesota-style prior. This is a natural benchmark, as it has been used by DelNegro and Schorfheide (2004) in the context of a Bayesian vector autoregression (BVAR) and has also been applied in the factor model literature (see, e.g., Kryshko 2011). Further, as the tightness of the prior is loosened, the model approaches a classical likelihood-based model.

The Minnesota prior is implemented with dummy observations as described in the appendix of Lubik and Schorfheide (2006). However, we set the parameter governing the first autoregressive coefficient to zero. The overall tightness τ of the Minnesota-style prior is chosen by maximizing the marginal data density, resulting in τ = 0.5 (see Table 3).

4.5

The analysis of the forecast performance suggests, first, that a positive but moderate weight on the DSGE prior is optimal and, second, that in some dimensions the DSGE-factor model improves the forecasts compared with a factor model with a nonstructural, Minnesota-style prior. These results are derived by comparing the forecasting performance dur