Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Density-Tempered Marginalized Sequential Monte Carlo Samplers
Jin-Chuan Duan & Andras Fulop

To cite this article: Jin-Chuan Duan & Andras Fulop (2015) Density-Tempered Marginalized Sequential Monte Carlo Samplers, Journal of Business & Economic Statistics, 33:2, 192-202, DOI: 10.1080/07350015.2014.940081

To link to this article: http://dx.doi.org/10.1080/07350015.2014.940081


Density-Tempered Marginalized Sequential Monte Carlo Samplers

Jin-Chuan DUAN
Business School, Risk Management Institute, National University of Singapore, Singapore (bizdjc@nus.edu.sg)

Andras FULOP
ESSEC Business School, Avenue Bernard Hirsch B.P. 50105, 95021 Cergy-Pontoise Cedex, France (fulop@essec.fr)

We propose a density-tempered marginalized sequential Monte Carlo (SMC) sampler, a new class of
samplers for full Bayesian inference of general state-space models. The dynamic states are approximately
marginalized out using a particle filter, and the parameters are sampled via a sequential Monte Carlo
sampler over a density-tempered bridge between the prior and the posterior. Our approach delivers exact
draws from the joint posterior of the parameters and the latent states for any given number of state
particles and is thus easily parallelizable in implementation. We also build into the proposed method a
device that can automatically select a suitable number of state particles. Since the method incorporates
sample information in a smooth fashion, it delivers good performance in the presence of outliers. We
check the performance of the density-tempered SMC algorithm using simulated data based on a linear
Gaussian state-space model with and without misspecification. We also apply it to real stock prices using
a GARCH-type model with microstructure noise.
KEY WORDS: Bayesian methods; MCMC; Particle filter.

1. INTRODUCTION

In the last decade, particle filtering, based on sequential importance sampling, has become a state-of-the-art technique for handling general dynamic state-space models (see Doucet, De Freitas, and Gordon 2001 for a review). While such algorithms are well adapted for filtering dynamic latent states at some fixed model parameter value, full Bayesian inference over parameters remains a hard problem. In a recent contribution, Andrieu, Doucet, and Holenstein (2010) proposed the particle MCMC (PMCMC) method as a generic solution: it runs an MCMC chain over parameters and uses a particle filter to marginalize out the latent states and to determine the acceptance probability in the Metropolis-Hastings (MH) step. They showed that for any finite number of particles, the equilibrium distribution of the Markov chain is the joint posterior of the parameters and the latent states. Compared with traditional MCMC schemes that augment the parameter space with the latent states, the PMCMC method is easy to use and is applicable to a wide range of problems.1

However, PMCMC can be computationally costly as it needs to run the particle filtering algorithm for tens of thousands of parameter sets. A recent stream of papers proposed ways to enlarge the applicability of PMCMC by using adaptive MH proposals and by relying on algorithms that are parallelizable in terms of model parameters. The former makes the algorithm more efficient, whereas the latter takes advantage of modern general-purpose graphical processing units (GPUs) and massively parallel computing architectures. Pitt et al. (2010) stayed within the PMCMC framework and advocated the use of adaptive MCMC kernels. An alternative approach is to define a sequential Monte Carlo (SMC) sampling scheme over the parameters where the latent states are approximately marginalized out using a particle filter. Chopin, Jacob, and Papaspiliopoulos (2013) and Fulop and Li (2013) were primarily concerned with sequential inference and defined the sequence of targets over gradually expanding data samples, which allows them to conduct joint sequential inference over the states and parameters.
This article makes two contributions. First, we propose an alternative sequence of targets for the marginalized SMC routine. We set up a density-tempered bridge between the prior and the posterior that allows a smooth transition between the two distributions. We show that with a proper choice of the tempering scheme, the algorithm is easy to implement and delivers exact draws from the joint posterior of the dynamic states and the model parameters. Our proposed density-tempered marginalized SMC sampler has two advantages over the sequentially expanding data approach, which are particularly important in analyzing real data using models that are likely misspecified. First, it provides a direct link between the prior and posterior, reflected in fewer resample-move steps in running the algorithm. Second, through a judicious choice of the tempering scheme as in Del Moral, Doucet, and Jasra (2012), one can better control particle diversity, reflected in lower Monte Carlo errors. This article's second contribution rests in proposing a new method for tuning the number of particles for the latent state: a reinitialized density tempering that is more stable than the exchange step advocated by Chopin, Jacob, and Papaspiliopoulos (2013).

1 See Fernandez-Villaverde and Rubio (2007) for an intuitive application of the particle filter within MCMC, and Flury and Shephard (2008) for a demonstration of the PMCMC approach for an array of financial and economic models.

© 2015 American Statistical Association
Journal of Business & Economic Statistics
April 2015, Vol. 33, No. 2
DOI: 10.1080/07350015.2014.940081
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/r/jbes.
Our contributions are illustrated by two applications. First, we examine performance using a simple linear Gaussian state-space model where the exact likelihood is available from the Kalman filter. We investigate the performance of different algorithms when data are simulated based on a correctly specified model as well as when outliers of different magnitudes are added to the data sample. Our first finding is that while the expanding-data and density-tempered SMC procedures perform similarly for noncontaminated data, the density-tempered approach is much more robust to data contamination. These results hold true regardless of whether the exact likelihood (from the Kalman filter) or the estimated likelihood (from the particle filter) is used. Our second finding suggests that our new method for tuning the number of state particles leads to substantially lower Monte Carlo errors than the exchange step method of Chopin, Jacob, and Papaspiliopoulos (2013). The key to this improvement is the better control of particle diversity achieved by tempering.

Our second application is a study using a nonlinear asymmetric GARCH model with microstructure noise. The model is estimated on the daily stock price data of 100 randomly selected firms in the CRSP database during the 2002-2008 period. Our results suggest that the density-tempered approach outperforms the expanding-data approach, and our new state-particle tuning method also yields more reliable results.

Our article joins a growing literature on using SMC techniques in the Bayesian estimation of financial and economic models that often work with the joint density of the latent states and model parameters. For example, Jasra et al. (2011) estimated a stochastic volatility model using adaptive SMC samplers. In an online setting, Carvalho et al. (2010) provided methods to jointly estimate the model parameters and states for models that admit sufficient statistics for the posterior. Johannes, Korteweg, and Polson (2014) presented an application to predictability and portfolio choice. Herbst and Schorfheide (2014) developed an SMC procedure over parameters for macroeconomic models where the likelihood function at fixed parameters is exactly known. Durham and Geweke (2011) investigated a GPU implementation of expanding-data SMC samplers on a range of commonly used models in economics. With our density-tempered SMC sampler, estimation and inference for a large class of complex state-space models can be reliably conducted.
2. ESTIMATION METHOD

We are concerned with the estimation of state-space models, where some latent variables x_t completely determine the future evolution of the system. Denote the model parameters by θ. The dynamics of the latent states is determined by a Markov process with the initial density µ_θ(·) (i.e., for x_1) and the transition density f_θ(x_{t+1} | x_t). The observations {y_t, t = 1, ..., T} are linked to the state of the system through the measurement equation g_θ(y_t | x_t). In what follows, for a given vector (z_1, ..., z_t), we use the notation z_{1:t}.

Our objective is to perform Bayesian inference over the latent states and model parameters conditional on the observations y_{1:T}. If p(θ) is the prior distribution over θ, the joint posterior is

p(θ, x_{1:T} | y_{1:T}) ∝ p_θ(x_{1:T}, y_{1:T}) p(θ),   (1)

where

p_θ(x_{1:T}, y_{1:T}) = µ_θ(x_1) ∏_{n=2}^{T} f_θ(x_n | x_{n−1}) ∏_{n=1}^{T} g_θ(y_n | x_n).   (2)

In general, simulation-based methods are needed to sample from p(θ, x_{1:T} | y_{1:T}).
2.1 An Overview of Pseudomarginal Approaches to Inference Using Particle Filter

Over the last 15 years, a considerable literature has been developed on the use of sequential Monte Carlo methods (particle filters) that provide sequential approximations to the densities p_θ(x_{1:t} | y_{1:t}) and likelihoods p_θ(y_{1:t}) for t = 1, ..., T at some fixed θ. Particle filters are ways to propagate a set of particles (representing the filtered distribution) over t by importance sampling and resampling steps.2 In particular, these methods sequentially produce sets of weighted particles {(x_{1:t}^{(k)}, w_t^{(k)}), k = 1, ..., M} whose empirical distribution approximates p_θ(x_{1:t} | y_{1:t}):

p̂_θ(x_{1:t} | y_{1:t}) = ∑_{k=1}^{M} w_t^{(k)} δ_{x_{1:t}^{(k)}}(x_{1:t}), with ∑_{k=1}^{M} w_t^{(k)} = 1.   (3)

Note that δ_{x_{1:t}^{(k)}}(x_{1:t}) is an indicator function giving a value of 1 when x_{1:t} = x_{1:t}^{(k)} and 0 otherwise.
Furthermore, the likelihood estimate is a byproduct:

p̂_θ(y_{1:t}) = p̂_θ(y_1) ∏_{l=2}^{t} p̂_θ(y_l | y_{1:l−1}),   (4)

where p̂_θ(y_l | y_{1:l−1}) can be easily obtained by combining p̂_θ(x_{l−1} | y_{1:l−1}), f_θ(x_l | x_{l−1}), and g_θ(y_l | x_l), and has in fact been obtained before one can compute w_l^{(k)} for k = 1, ..., M.
Crucially for our purpose, these likelihood estimates have surprisingly good properties: (i) they are unbiased, that is, E[p̂_θ(y_{1:T})] = p_θ(y_{1:T}), where the expectation is taken with respect to all random quantities used in the particle filter (see Proposition 7.4.1 of Del Moral 2004); and (ii) the variance of the estimates only increases linearly with the sample size T (see Cerou, Del Moral, and Guyader 2011).
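To fix ideas, the sketch below implements a bootstrap particle filter for the linear Gaussian model used later in Section 3.1 and returns the log of the likelihood estimate ln p̂_θ(y_{1:T}). This is our own minimal Python illustration (the article's code is MATLAB/CUDA), and the function and argument names are ours; it is not the adapted filter with the locally optimal proposal used in the applications.

```python
import numpy as np

def bootstrap_pf_loglik(y, mu, phi, sig_eps, sig_eta, M=512, rng=None):
    """Bootstrap particle filter for y_t = mu + x_t + sig_eps*eps_t,
    x_t = phi*x_{t-1} + sig_eta*eta_t; returns ln p_hat_theta(y_{1:T})."""
    rng = np.random.default_rng() if rng is None else rng
    # Draw the initial particles from mu_theta, the stationary state law.
    x = rng.normal(0.0, sig_eta / np.sqrt(1.0 - phi**2), size=M)
    loglik = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            # Propagate through the transition density f_theta(x_t | x_{t-1}).
            x = phi * x + sig_eta * rng.normal(size=M)
        # Weight by the measurement density g_theta(y_t | x_t), in logs.
        logw = -0.5 * np.log(2.0 * np.pi * sig_eps**2) \
               - 0.5 * ((yt - mu - x) / sig_eps) ** 2
        c = logw.max()
        w = np.exp(logw - c)
        loglik += c + np.log(w.mean())  # ln p_hat(y_t | y_{1:t-1}), cf. Eq. (4)
        # Resample proportional to the weights (multinomial, for brevity).
        x = x[rng.choice(M, size=M, p=w / w.sum())]
    return loglik
```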
The good behavior of the likelihood estimates suggests that it may be useful to take a hierarchical approach to the full inference problem by targeting the posterior density of the model parameters p(θ | y_{1:T}) and separately using a particle filter to estimate the necessary likelihoods p(y_{1:T} | θ). Samples for the latent states at different time points will be obtained as a byproduct of the algorithm. This is exactly the pseudomarginal approach of Andrieu and Roberts (2008) that has been specialized to particle filters by Andrieu, Doucet, and Holenstein (2010). The main point is that even though one cannot directly tackle p(θ | y_{1:T}) due to the lack of a closed-form likelihood, the unbiasedness result opens a way of defining auxiliary variables whose joint distribution with θ admits the target as a marginal. Use u_t to denote all random variables produced by a particle filter in step t. The corresponding joint density of these ensembles given θ for the whole sample is ψ(u_{1:T} | θ, y_{1:T}). Now, extend the state space to include u_{1:T} and define the auxiliary posterior as

p̃(θ, u_{1:T} | y_{1:T}) ∝ p̂_θ(y_{1:T}) ψ(u_{1:T} | θ, y_{1:T}) p(θ).   (5)

2 For a general introduction to particle filters, readers are referred to Doucet and Gordon (2001), whereas for the theoretical results, please see Del Moral (2004).

Unbiasedness of the likelihood estimate means that the original target is a marginal distribution of the extended target as follows:

p(θ | y_{1:T}) = ∫_{u_{1:T}} p̃(θ, u_{1:T} | y_{1:T}) du_{1:T}.   (6)

Furthermore, the original posterior shares the same normalizing constant as the extended posterior, which is exactly the marginal likelihood of the model:

M_T = ∫_θ p_θ(y_{1:T}) p(θ) dθ = ∫_{u_{1:T},θ} p̂_θ(y_{1:T}) ψ(u_{1:T} | θ, y_{1:T}) p(θ) d(u_{1:T}, θ).   (7)

There is a quickly growing family of methods that purport to draw from the extended target in (5). Particle MCMC methods, initiated by Andrieu, Doucet, and Holenstein (2010), propose to run a long Markov chain whose equilibrium distribution is p̃(θ, u_{1:T} | y_{1:T}).3 A more recent set of algorithms, described in Chopin, Jacob, and Papaspiliopoulos (2013) and Fulop and Li (2013), proposes to set up a sequence of densities on expanding data samples and to simulate a set of particles through this sequence using the iterated batch importance sampling (IBIS) routine of Chopin (2002). In particular, they work with the following sequence of expanding targets: for t = 1, ..., T,

γ_t(θ, u_{1:t}) = p̂_θ(y_{1:t}) ψ(u_{1:t} | θ, y_{1:t}) p(θ),
π_t(θ, u_{1:t}) = γ_t(θ, u_{1:t}) / Z_t, where Z_t = ∫_{θ,u_{1:t}} γ_t(θ, u_{1:t}) d(θ, u_{1:t}).

The main rationale for this choice of targets is that each π_t(θ, u_{1:t}) is an extended posterior over the data sample y_{1:t}; hence the algorithm provides full sequential posterior inference, which is the main objective of these methods. Note further that the algorithm also provides estimates of Z_t, and hence sequential marginal likelihoods can be computed.

In contrast to Chopin, Jacob, and Papaspiliopoulos (2013) and Fulop and Li (2013), this article is only concerned with batch inference and thus proposes an alternative sequence of densities based on a tempering bridge that can provide a more direct route to p̃(θ, u_{1:T} | y_{1:T}) as compared to those expanding-data approaches.

3 For recent extensions, see Pitt et al. (2010), Jasra, Beskos, and Thiery (2013), and Jasra, Lee, Zhang, and Yau (2013).

2.2 Density-Tempered Marginalized SMC Sampler

The main idea of the SMC methodology is to begin with an easy-to-sample distribution and traverse through a sequence of densities to the ultimate target, which is much harder to sample from. We construct a sequence of P densities between π_1(θ, u_{1:T}) = p(θ) ψ(u_{1:T} | θ, y_{1:T}) and the extended posterior, π_P(θ, u_{1:T}) = p̃(θ, u_{1:T} | y_{1:T}), using a tempering sequence {ξ_l; l = 1, ..., P} where ξ_l is increasing with ξ_1 = 0 and ξ_P = 1. Note that π_1(θ, u_{1:T}) is easy to sample from and admits the prior over the fixed parameters as a marginal distribution. The tempered sequence of targets is as follows: for l = 1, ..., P,

γ_l(θ, u_{1:T}) = [p̂_θ(y_{1:T})]^{ξ_l} ψ(u_{1:T} | θ, y_{1:T}) p(θ),
π_l(θ, u_{1:T}) = γ_l(θ, u_{1:T}) / Z_l, where Z_l = ∫_{θ,u_{1:T}} γ_l(θ, u_{1:T}) d(θ, u_{1:T}).

The idea of tempering comes from Del Moral, Doucet, and Jasra (2006), but note that we only put the tempering term on the marginal likelihood estimate, p̂_θ(y_{1:T}), as opposed to the entire quantity. This holds the key to the feasibility of our algorithm, as will be shown later. Furthermore, π_l(θ, u_{1:T}), l < P, does not admit the corresponding "ideal" bridging density [p_θ(y_{1:T})]^{ξ_l} p(θ) as a marginal, but this is of no consequence as the latter is of no independent interest. Next, as in Del Moral, Doucet, and Jasra (2006), we propagate N points through this sequence using sequential Monte Carlo sampling. In the following, the superscript (i) always means for each i = 1, ..., N.

2.2.1 Initialization. To begin at l = 1, we need to obtain N samples from π_1(θ, u_{1:T}) = ψ(u_{1:T} | θ, y_{1:T}) p(θ):

• Sample θ^{(i)}(1) ∼ p(θ) from the prior distribution over the model parameters.
• To attach u_{1:T}^{(i)}(1) according to ψ(u_{1:T} | θ^{(i)}(1), y_{1:T}), run a particle filter for each θ^{(i)}(1) with M state particles. To save for later use, store the estimate of the normalizing constant p̂^{(i)}(1) = p̂_{θ^{(i)}(1)}(y_{1:T}). Note that the random numbers used in the particle filter are independent across samples and iterations.
• Attribute an equal weight, S^{(i)}(1) = 1/N, to each particle.

The weighted sample at the (l − 1)-stage of the iteration, that is, (S^{(i)}(l − 1), θ^{(i)}(l − 1), u_{1:T}^{(i)}(l − 1)), i = 1, ..., N, represents π_{l−1}(θ, u_{1:T}). With it, the algorithm goes through the following steps to advance to the next-stage representation of π_l(θ, u_{1:T}).

2.2.2 Reweighting. Moving from π_{l−1}(θ, u_{1:T}) to π_l(θ, u_{1:T}) can be implemented by reweighting the particles by the ratio of the two densities. This yields the following unnormalized incremental weights:

s̃^{(i)}(l) = γ_l(θ^{(i)}(l − 1), u_{1:T}^{(i)}(l − 1)) / γ_{l−1}(θ^{(i)}(l − 1), u_{1:T}^{(i)}(l − 1))
          = {[p̂^{(i)}(l − 1)]^{ξ_l} ψ(u_{1:T}^{(i)}(l − 1) | θ^{(i)}(l − 1), y_{1:T}) p(θ^{(i)}(l − 1))} / {[p̂^{(i)}(l − 1)]^{ξ_{l−1}} ψ(u_{1:T}^{(i)}(l − 1) | θ^{(i)}(l − 1), y_{1:T}) p(θ^{(i)}(l − 1))}
          = [p̂^{(i)}(l − 1)]^{ξ_l − ξ_{l−1}}.   (8)


Importantly, the hard-to-evaluate density of the auxiliary random variables, ψ(u_{1:T}^{(i)}(l − 1) | θ^{(i)}(l − 1), y_{1:T}), does not show up in the final expression.

Del Moral, Doucet, and Jasra (2012) noted that the tempering sequence can be adaptively chosen to ensure sufficient particle diversity. We follow their procedure and always set the next value of the tempering sequence ξ_l automatically to ensure that the effective sample size (ESS)4 stays close to some constant B. This is achieved by a simple grid search, where the ESS is evaluated at the grid points of ξ_l, and the one with the ESS closest to B is chosen.
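As an illustration of this adaptive step (a minimal sketch under our notation; the grid resolution and the names are assumptions, not the authors' code), one can reweight by Equation (8) at each candidate ξ_l on a grid and keep the candidate whose ESS is closest to B:

```python
import numpy as np

def choose_next_xi(S_prev, logp_hat, xi_prev, B, n_grid=1000):
    """Grid-search the next tempering level xi_l in (xi_prev, 1] so that
    the ESS of the reweighted particle cloud stays closest to B."""
    best_xi, best_gap = 1.0, np.inf
    for xi in np.linspace(xi_prev, 1.0, n_grid + 1)[1:]:
        # Incremental weights from Eq. (8): [p_hat^(i)]^(xi - xi_prev).
        logw = np.log(S_prev) + (xi - xi_prev) * logp_hat
        w = np.exp(logw - logw.max())
        w /= w.sum()
        ess = 1.0 / np.sum(w**2)  # effective sample size, cf. footnote 4
        if abs(ess - B) < best_gap:
            best_gap, best_xi = abs(ess - B), xi
    return best_xi
```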

Furthermore, the identity

Z_l / Z_{l−1} = ∫_{θ,u_{1:T}} [γ_l(θ, u_{1:T}) / γ_{l−1}(θ, u_{1:T})] π_{l−1}(θ, u_{1:T}) d(θ, u_{1:T})

suggests that the ratio of the normalizing constants can be estimated as

[Ẑ_l / Ẑ_{l−1}] = ∑_{i=1}^{N} S^{(i)}(l − 1) s̃^{(i)}(l).

Finally, the weights are normalized to sum up to one and become

S^{(i)}(l) = S^{(i)}(l − 1) s̃^{(i)}(l) / ∑_{j=1}^{N} S^{(j)}(l − 1) s̃^{(j)}(l).

2.2.3 Resampling. As one proceeds with the algorithm, the variability of the importance weights will tend to increase, leading to sample impoverishment. To focus the computational effort on areas of high probability, it is common in the sequential Monte Carlo literature to resample particles proportional to their weights. If ESS < B, one resamples the particles (θ^{(i)}(l − 1), u_{1:T}^{(i)}(l − 1)), i = 1, ..., N, proportional to S^{(i)}(l − 1) and, after resampling, sets S^{(i)}(l − 1) = 1/N.
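Any unbiased resampling scheme can be used here; the article later notes that stratified resampling is used within the particle filters. The helper below is a standard stratified-resampling sketch (our illustration), returning ancestor indices drawn proportional to the weights:

```python
import numpy as np

def stratified_resample(weights, rng=None):
    """Return len(weights) ancestor indices, drawn proportional to
    `weights` with one uniform draw per stratum [k/N, (k+1)/N)."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(weights)
    u = (np.arange(N) + rng.uniform(size=N)) / N
    return np.searchsorted(np.cumsum(weights), u)
```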
2.2.4 Moving the Particles. With repeated reweighting and resampling, the support of the sample of the parameters would gradually deteriorate, leading to the well-known particle-depletion situation. Periodically boosting the support is thus a must. For this, we resort to particle marginal Metropolis-Hastings moves as in Andrieu, Doucet, and Holenstein (2010), and keep the target, γ_l(θ, u_{1:T}), unchanged. A new particle (θ*, u*_{1:T}) is proposed using the proposal density h_l(θ* | ·) ψ(u*_{1:T} | θ*, y_{1:T}). The importance weight needed for the MH move is

γ_l(θ*, u*_{1:T}) / [h_l(θ* | ·) ψ(u*_{1:T} | θ*, y_{1:T})]
  = [p̂_{θ*}(y_{1:T})]^{ξ_l} ψ(u*_{1:T} | θ*, y_{1:T}) p(θ*) / [h_l(θ* | ·) ψ(u*_{1:T} | θ*, y_{1:T})]
  = [p̂_{θ*}(y_{1:T})]^{ξ_l} p(θ*) / h_l(θ* | ·),

where p̂_{θ*}(y_{1:T}) is the marginalized likelihood estimate evaluated at (θ*, u*_{1:T}). Note again that the density of the auxiliary variables, ψ(u*_{1:T} | θ*, y_{1:T}), falls out of the final expression and hence does not need to be evaluated. The resulting MH move step looks as follows:


• Draw parameters from θ* ∼ h_l(· | θ^{(i)}(l − 1)) and run a particle filter for each θ* with M state particles. Compute the likelihood estimate p̂_{θ*}(y_{1:T}).
• The MH acceptance probability is

α = 1 ∧ { [p̂_{θ*}(y_{1:T})]^{ξ_l} p(θ*) / ([p̂_{θ^{(i)}(l−1)}(y_{1:T})]^{ξ_l} p(θ^{(i)}(l − 1))) × h_l(θ^{(i)}(l − 1) | θ*) / h_l(θ* | θ^{(i)}(l − 1)) }.

That is, (θ^{(i)}(l), u_{1:T}^{(i)}(l)) = (θ*, u*_{1:T}) with probability α, and (θ^{(i)}(l), u_{1:T}^{(i)}(l)) = (θ^{(i)}(l − 1), u_{1:T}^{(i)}(l − 1)) with probability 1 − α; a schematic implementation follows this list.
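The sketch below shows one such move for a single parameter particle (our Python illustration; `pf_loglik`, `prop_sample`, and `prop_logpdf` are hypothetical stand-ins for the particle filter and the proposal h_l):

```python
import numpy as np

def pmmh_move(theta, logp_hat, xi, log_prior, pf_loglik,
              prop_sample, prop_logpdf, rng=None):
    """One particle marginal MH move at tempering level xi.
    theta, logp_hat: current particle theta^(i)(l-1) and its stored
    ln p_hat; pf_loglik runs a fresh particle filter (new u*_{1:T})."""
    rng = np.random.default_rng() if rng is None else rng
    theta_star = prop_sample(theta, rng)
    logp_star = pf_loglik(theta_star)
    log_alpha = (xi * logp_star + log_prior(theta_star)
                 + prop_logpdf(theta, theta_star)    # h_l(theta | theta*)
                 - xi * logp_hat - log_prior(theta)
                 - prop_logpdf(theta_star, theta))   # h_l(theta* | theta)
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return theta_star, logp_star   # accept
    return theta, logp_hat             # reject
```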

The marginal likelihood of the model, which is exactly Z_P/Z_1, can be estimated as

[Ẑ_P / Ẑ_1] = ∏_{l=2}^{P} [Ẑ_l / Ẑ_{l−1}].
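In implementation, it is numerically safer to accumulate this product in logs; a trivial helper (our sketch) combining the per-stage ratio estimates from Section 2.2.2:

```python
import numpy as np

def log_marginal_likelihood(stage_ratios):
    """Combine the stagewise estimates of Z_l/Z_{l-1}, l = 2, ..., P,
    into an estimate of ln(Z_P/Z_1), the log marginal likelihood."""
    return float(np.sum(np.log(stage_ratios)))
```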

Our algorithm basically entails running N particle filters in parallel, each with M particles. By the results of Del Moral, Doucet, and Jasra (2006), this algorithm provides consistent inference for the extended target p̃(θ, u_{1:T} | y_{1:T}) as N goes to infinity.5 Given that the extended target admits our original target p(θ | y_{1:T}) as a marginal, the algorithm is an "exact approximation" in the spirit of Andrieu and Roberts (2008); that is, it provides consistent inference on the marginal, p(θ | y_{1:T}), for any given M state particles, as the number of parameter particles, N, goes to infinity. When inference over the model parameters θ is of interest, only the marginal likelihood p̂_{θ^{(i)}}(y_{1:T}) and parameter θ^{(i)} need to be stored. If one stores the whole state-particle path, the algorithm also provides full inference over the joint distribution p(θ, x_{1:T} | y_{1:T}), a direct consequence of the results in Andrieu, Doucet, and Holenstein (2010). In particular, they showed that the resulting extended density, (1/N) p̂_θ(y_{1:T}) ψ(u_{1:T} | θ, y_{1:T}) p(θ), admits the joint posterior p(θ, x_{1:T} | y_{1:T}) as a marginal. A recent article by Jacob, Murray, and Rubenthaler (2014) shows a theoretical result that bounds the expected memory cost of the path storage and proposes an efficient algorithm to realize this.

Our algorithm can be trivially parallelized in the parameter dimension, a property that it shares with Chopin, Jacob, and Papaspiliopoulos (2013), Fulop and Li (2013), and particle MCMC with independent MH proposals as in Pitt et al. (2010). This is an important feature as it allows users to fully use the computational power of modern graphical processing units (GPUs) equipped with thousands of parallel cores. Lee et al. (2010) demonstrated how to speed up particle filters under a fixed parameter using GPUs when the number of particles becomes really large. They documented significant speedup when the number of particles, M, goes to 10,000. In financial applications, however, one can often obtain a decent estimate of the likelihood with a couple of hundred state particles, which is not enough to keep all threads of the GPU occupied. Parallelizing the algorithm in the parameter dimension is thus likely to better use the computing power. For this article, the SMC algorithm was coded in MATLAB whereas the particle filter was written in CUDA MEX files to run on GPUs, where essentially each thread runs a particle filter corresponding to a fixed parameter value. Readers are referred to the Appendix of Fulop and Li (2013) for details of the CUDA implementation.

4 ESS stands for effective sample size and is a commonly used measure of weight variability. It is defined as ESS = 1 / ∑_{i=1}^{N} [S^{(i)}(l − 1)]², where the weights S^{(i)}(l − 1) (i = 1, ..., N) sum up to 1. ESS varies between 1 and N, and a low value signals a sample where weights are concentrated on a few particles.

5 For recent consistency results for the adaptive algorithm we use here and the associated central limit theorems, see Jasra, Beskos, and Thiery (2013) and Jasra, Lee, Zhang, and Yau (2013).


2.3 Tuning the Number of Particles

The number of state particles, M, is a key parameter determining the efficiency of move steps in particle MCMC. Pitt et al. (2012) showed the optimality of setting M such that the Monte Carlo standard error of the likelihood estimator is around one. We now use this result to tune M in our proposed density-tempered SMC method.

In the first stage, we run our density-tempered SMC algorithm with a moderate number of state particles M_I, and continue until the average acceptance rate drops below some prespecified number, C.6 Our main insight is that by var(ξ ln p̂_θ(y_{1:T})) = ξ² var(ln p̂_θ(y_{1:T})), the Monte Carlo noise in the objective function of the MH move is much reduced in the earlier move steps when the tempering parameter is small. Hence, the algorithm moves freely in the sample space even with a smaller M_I. As the system is being heated up, this tempering-parameter effect weakens, and increasing estimation errors in the likelihood target tend to decrease the acceptance rates. Denote the tempering parameter at the stop time by ξ_I and the mean of the parameter particles by θ̂_I.

In the second stage, we continue the algorithm with a new but larger value of M. Following Pitt et al. (2012), we aim to set the new M such that the standard error of the untempered likelihood estimator is around one. Operationally, we need to estimate the Monte Carlo error of the likelihood estimate at a reasonable value of θ. Our initial parameter estimate, θ̂_I, is a natural candidate, because it can be interpreted as an approximate weighted maximum likelihood estimator which provides a consistent estimate of θ_0 as T goes to infinity.7 Thus, we proceed to estimate the variance of the likelihood by running independent particle filters using M_I particles at θ̂_I.8 Denote this variance estimator by σ²_{M_I}(p̂_θ(y_{1:T})), and set the new number of state particles to M_F ≈ M_I × σ²_{M_I}(p̂_θ(y_{1:T})). Next we need a valid way to transition from M_I to M_F state particles in the SMC algorithm. Here there are at least two alternatives:
First, one could keep the existing θ population, run a new independent particle filter for each θ^{(i)} with M_F state particles, and modify the weights to account for the change in M. This is exactly the exchange step proposed in Chopin, Jacob, and Papaspiliopoulos (2013). The argument implies that by attaching the incremental weights

IW^{(i)} = [ p̂^{M_F}_{θ_i}(y_{1:T}) / p̂^{M_I}_{θ_i}(y_{1:T}) ]^{ξ_I}

to each existing parameter particle, it is valid to continue the tempered SMC routine from ξ_I using the new M_F. Note that it is also important to adjust the estimate of the marginal likelihood with the weighted average of these incremental weights, ∑_{i=1}^{N} IW^{(i)} s̃^{(i)}(l). Unfortunately, we find that this exchange step in our applications can lead to large weight variability because the variance of IW^{(i)} is hard to control, leading to increased Monte Carlo noise.
Instead, we propose to use a reinitialized tempering procedure to change M. First, fit some distribution Q(θ) to the existing parameter-particle population to encode the information built up so far on the parameters. In the applications later, we use a flexible mixture of normals for this purpose. In general, the more flexible Q(θ) is, the more θ particles are needed to reliably estimate it. Next, new parameter particles are drawn from Q(θ), and for each sampled θ, an independent particle filter with M_F state particles is run, resulting in a joint density of Q(θ) ψ_{M_F}(u_{1:T} | θ, y_{1:T}). Evidently this proposal does not admit the ultimate posterior as a marginal, hence we link it to the extended target p̂_θ(y_{1:T}) p(θ) ψ_{M_F}(u_{1:T} | θ, y_{1:T}) through a sequence of tempered densities as follows:

γ_l(θ, u_{1:T}) = Q(θ)^{1−ξ_l} [p̂_θ(y_{1:T}) p(θ)]^{ξ_l} ψ_{M_F}(u_{1:T} | θ, y_{1:T}),
π_l(θ, u_{1:T}) = γ_l(θ, u_{1:T}) / Z_l, where Z_l = ∫_{θ,u_{1:T}} γ_l(θ, u_{1:T}) d(θ, u_{1:T}),

where ξ_l is increasing with ξ_1 = 0 and ξ_P = 1. Obviously, γ_P(θ, u_{1:T}) is exactly our eventual target, the full-sample posterior, and Z_P/Z_1 is the marginal likelihood if both the proposal Q(θ) and the prior p(θ) integrate to 1, as densities should. We can then run an adaptive density-tempered SMC algorithm on this sequence. Note that Q(θ) is kept fixed throughout the tempering steps, that is, it is not recalibrated.

6 Such a stopping criterion is taken from Chopin, Jacob, and Papaspiliopoulos (2013).
7 See, for example, Hu and Zidek (2002).
8 In contrast, Pitt et al. (2012) suggested running an initial short Markov chain with a large M, computing the posterior mean θ̂_0, and estimating the variance of the likelihood by running independent particle filters at θ̂_0.
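The two pieces of the second stage can be summarized in a few lines. The sketch below is ours; the names, the flooring rule in `tune_MF`, and the use of log-likelihood draws in the variance rule (the usual reading of the Pitt et al. 2012 criterion) are assumptions. In the reinitialized bridge, the ψ_{M_F} factor again cancels from the incremental weights:

```python
import numpy as np

def tune_MF(M_I, logliks_at_theta_I):
    """Second-stage rule M_F ~ M_I * sigma^2_{M_I}, with the variance
    estimated from independent particle-filter runs at theta_hat_I."""
    return max(M_I, int(np.ceil(M_I * np.var(logliks_at_theta_I))))

def reinit_increment(log_Q, log_prior, logp_hat_MF, xi_new, xi_old):
    """Unnormalized incremental weights for the bridge
    gamma_l = Q(theta)^(1-xi) [p_hat p(theta)]^xi psi_MF: the psi_MF
    factor cancels, leaving (xi_new - xi_old)(ln p_hat + ln p - ln Q)."""
    return np.exp((xi_new - xi_old) * (logp_hat_MF + log_prior - log_Q))
```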
3. APPLICATIONS

3.1 Linear Gaussian State-Space Model
To study the performance of the density-tempered SMC sampler, we start with a simple linear Gaussian state-space model where the likelihood is available in closed form from the Kalman filter. The model is

y_t = µ + x_t + σ_ε ε_t
x_t = φ x_{t−1} + σ_η η_t,

where (ε_t, η_t)′ ∼ N(0, I) and x_0 ∼ N(0, σ_η²/(1 − φ²)). This simulation study follows Flury and Shephard (2011), who used the same model to benchmark the performance of the PMCMC algorithm. To ensure positive variances, we parameterize with the log of the variances; that is, we take θ = (µ, ln σ_ε², φ, ln σ_η²) and constrain φ ∈ (−1, 1) using a truncated prior for this parameter. We investigate the performance of different SMC algorithms for (1) data in accordance with the model and (2) data contaminated by outliers. In each scenario, we generate T = 1000 observations using parameter values θ* = (0.5, ln(1), 0.825, ln(0.75)). Then, a given observation is contaminated by noise equal to B_t × ǫ_t, where B_t is a Bernoulli indicator with parameter 0.01 and ǫ_t is normal with mean zero and standard deviation α. Note that α controls the degree of model misspecification, and we consider the values in {0, 5, 10, 15, 20}; that is, we examine cases from no outliers to severe outliers. To focus on the role of simulation error by different algorithms, we simulate a single dataset in each scenario and compute statistics from 50 independent runs of each algorithm.
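A data-generating sketch for this design (our illustration; the parameter defaults follow θ* above, and the function name is ours):

```python
import numpy as np

def simulate_contaminated(T=1000, mu=0.5, phi=0.825, sig_eps=1.0,
                          sig_eta=np.sqrt(0.75), p_out=0.01, alpha=10.0,
                          rng=None):
    """Simulate the linear Gaussian model and add the Bernoulli-normal
    outliers B_t * eps'_t (alpha is the outlier standard deviation)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(T)
    x[0] = rng.normal(0.0, sig_eta / np.sqrt(1.0 - phi**2))
    for t in range(1, T):
        x[t] = phi * x[t - 1] + sig_eta * rng.normal()
    y = mu + x + sig_eps * rng.normal(size=T)
    hit = rng.random(T) < p_out              # Bernoulli(0.01) indicators
    y[hit] += alpha * rng.normal(size=hit.sum())
    return y
```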

Table 1. Simulation results for the linear Gaussian state-space model with an exact likelihood

                         Expanding-data SMC                      Density-tempered SMC
α                0       5       10      15      20         0       5       10      15      20
µ std          0.007   0.008   0.009   0.008   0.677      0.008   0.007   0.007   0.008   0.005
ln σε² std     0.007   0.003   0.003   0.004   0.758      0.006   0.003   0.003   0.004   0.005
φ std          0.001   0.001   0.002   0.002   0.204      0.001   0.001   0.002   0.001   0.005
ln ση² std     0.006   0.009   0.011   0.011   1.101      0.005   0.007   0.011   0.010   0.025
ln MLL mean    −1842   −1859   −2031   −2042   −2593      −1842   −1859   −2030   −2041   −2481
ln MLL std     0.178   0.136   0.767   1.135   58.577     0.134   0.099   0.102   0.098   0.142
Time mean      5.1     5.5     9.4     9.1     10.3       5.2     5.2     5.2     5.2     5.2
#RM mean       20.2    19.0    29.6    27.7    31.2       7.0     7.0     7.0     7.0     7.0

NOTE: This table presents the results of a small Monte Carlo study of the linear Gaussian state-space model with different degrees of data contamination. For each scenario, a dataset of T = 1000 observations is generated using parameter value θ* = (0.5, ln(1), 0.825, ln(0.75)), and random normal contaminations are added to the observations with probability 0.01; when contamination happens, the noise has a standard deviation α. The statistics in the table are computed from 50 independent runs of the expanding-data and density-tempered SMC algorithms where the exact likelihood from the Kalman filter is used. The number of parameter particles is N = 1024, and each time a move takes place, it takes 10 joint random walk MH moves. The first four rows report the standard deviation of posterior mean estimates of the parameters over the 50 independent runs. The fifth and sixth rows report the mean and standard deviation of the log of the marginal likelihood estimates over the 50 runs. The seventh and eighth rows report the average runtime in seconds and the average number of resample-move steps taken. The code was run on a Fermi M2050 GPU.

We assume a Gaussian prior given by θ ∼ N(θ_0; I_4), where θ_0 = (0.25, ln(1.5), 0.475, ln(0.475)), and always use N = 1024 parameter particles. The resample-move step is triggered whenever the ESS drops below B = N/2. A move step always takes 10 joint normal random-walk moves, where the variance of the innovations is set to (2.38²/dim(θ)) V̂_l, with V̂_l being set to the estimated covariance matrix reflective of the current particle population.9
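For concreteness, the adaptive random-walk scale can be computed from the weighted particle cloud as follows (our sketch; the array names are assumptions):

```python
import numpy as np

def rw_proposal_cov(thetas, S):
    """(2.38^2 / dim(theta)) * V_hat_l, with V_hat_l the weighted
    covariance of the current parameter particles (rows of `thetas`)."""
    d = thetas.shape[1]
    mean = S @ thetas
    centered = thetas - mean
    V_hat = (S[:, None] * centered).T @ centered
    return (2.38**2 / d) * V_hat
```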
To zero in on the difference between the expanding-data and density-tempered SMC approaches, we first compare the perfectly marginalized algorithms that use the exact likelihood available from the Kalman filter. The first four rows of Table 1 report the standard deviation of the posterior mean parameter estimates over the 50 runs, whereas the fifth and sixth rows report the mean and standard deviation of the log of the normalizing constant (i.e., marginal likelihood) estimate, the seventh row the mean computing times (in seconds), and the eighth row the average number of resample-move steps taken.10

The two columns corresponding to a correctly specified data-generating process (i.e., α = 0) show that the standard deviations of the parameter estimates and the normalizing constants are similar for the two algorithms, and hence both provide reliable inference in this case. Their computing times are also comparable, which is a somewhat surprising result given that the initial resample-move steps for the expanding-data algorithm should take much less time because it processes a much smaller dataset. Our results show that this advantage is neutralized by a desirable feature of the density-tempered SMC sampler, which needs significantly fewer resample-move steps to reach from the prior to the posterior (7 vs. 20, as in Table 1). The tempering sequence provides a more direct link between the prior and the posterior as it is not forced through all the intermediate posteriors. Looking through the columns with increasingly prominent presence of outliers (growing α), the two algorithms starkly differ. While the density-tempered approach provides stable results for all scenarios (though with some increased simulation noise), the Monte Carlo error of the expanding-data approach blows up as outliers become larger in magnitude. In essence, outliers lead to highly variable incremental weights in the reweighting step of the SMC algorithm, resulting in a poorly represented posterior. Moreover, the Monte Carlo errors of the marginal likelihood estimates seem to deteriorate faster than those of the full-sample parameter estimates. This is understandable because the former reflects the approximation errors over the whole sequence of densities between the prior and posterior, whereas the latter only measures the simulation error at the end, by which stage the MCMC moves have corrected some of the intermediate errors. Finally, one observes that the number of resample-move steps and the computing time are quite stable for the density-tempered algorithm across the different scenarios, but the expanding-data approach takes substantially more move steps and needs more computing time when outliers become more severe.

Next, we investigate the effect of using pseudomarginalized SMC algorithms by running the expanding-data and density-tempered SMC algorithms where approximate likelihoods are generated by a particle filter. For the linear Gaussian state-space model, the locally optimal importance proposal f(x_{t+1} | x_t, y_{t+1}) is available analytically, and it is what we use in the adapted particle filter. In all applications in this article we use stratified resampling within the particle filters. We fix the number of state particles at M = 512 for both algorithms. Otherwise, the simulation setting is identical to the previous case. Overall, the results in Table 2 suggest that the two algorithms perform similarly when the model has no misspecification. However, the density-tempered SMC algorithm is much more stable for data with severe contamination. In general, we observe larger Monte Carlo noise due to the use of an approximate likelihood, as is evident by comparing Table 2 with Table 1.

9 This is a standard approach in the adaptive MCMC literature; see, for example, Andrieu and Thoms (2008) for a survey.
10 The averages of the posterior means are basically identical for the two algorithms, apart from the most serious case of misspecification, for which the expanding-data algorithm is completely unreliable.
Table 2. Simulation results for the linear Gaussian state-space model with an approximate likelihood using a fixed number of state particles (M = 512)

                         Expanding-data SMC                      Density-tempered SMC
α                0       5       10      15      20         0       5       10      15      20
µ std          0.011   0.006   0.014   0.025   1.408      0.009   0.007   0.025   0.030   0.024
ln σε² std     0.008   0.004   0.007   0.031   0.821      0.007   0.004   0.010   0.014   0.016
φ std          0.001   0.001   0.002   0.005   0.173      0.001   0.001   0.004   0.005   0.018
ln ση² std     0.007   0.008   0.014   0.046   0.960      0.007   0.008   0.026   0.031   0.070
ln MLL mean    −1842   −1859   −2031   −2044   −2623      −1842   −1859   −2030   −2041   −2481
ln MLL std     0.174   0.151   1.219   1.148   75.964     0.114   0.100   0.242   0.312   0.290
Time mean      239.6   262.8   491.8   473.6   518.2      300.0   299.8   315.9   343.4   317.9
#RM mean       20.3    19.3    30.0    27.9    30.6       7.0     7.0     7.2     7.9     7.6

NOTE: This table presents the results of a small Monte Carlo study of the linear Gaussian state-space model with different degrees of data contamination. For each scenario, a dataset of T = 1000 observations is generated using parameter value θ* = (0.5, ln(1), 0.825, ln(0.75)), and random normal contaminations are added to the observations with probability 0.01; when contamination happens, the noise has a standard deviation α. The statistics in the table are computed from 50 independent runs of the expanding-data and density-tempered SMC algorithms where the approximate likelihood from the adapted particle filter is used. The number of parameter particles is N = 1024, and each time a move takes place, it takes 10 joint random walk MH moves. The first four rows report the standard deviation of posterior mean estimates of the parameters over the 50 independent runs. The fifth and sixth rows report the mean and standard deviation of the log of the marginal likelihood estimates over the 50 runs. The seventh and eighth rows report the average runtime in seconds and the average number of resample-move steps taken. The code was run on a Fermi M2050 GPU.

Finally, we investigate how the two methods of setting the number of state particles perform on simulated data and report the results in Table 3. We simulate 50 datasets, each with T = 1000 observations. The data contamination is conducted with probability 0.01 by adding a normally distributed error with a standard deviation of 10. For each simulated dataset, 20 independent runs of three variants of the density-tempered SMC algorithm are carried out.

Two of these variants adjust M as described in Section 2.3. For each run, these algorithms are initialized with M_I = 32, and the adjustment to M is triggered when the acceptance rate drops below C = 0.2.11 The remaining algorithm uses a fixed number of state particles, M = 512. All other settings for the SMC algorithms are the same as in the previous case. The first column of Table 3 corresponds to using the exchange step of Chopin, Jacob, and Papaspiliopoulos (2013) to modify M, whereas the second column is for the reinitialized tempering procedure proposed in this article. The function Q(·) used in reinitialization is a six-component mixture of normals. The third column reports the results of the density-tempered SMC sampler with a fixed M = 512. The first four rows report the average across the 50 datasets of the standard deviation of the posterior mean estimates of the model parameters over the 20 independent runs of the algorithms. The fifth row reports the average standard deviation of the log marginal likelihood estimates, where again the average is taken across the 50 datasets of the standard deviation over the 20 density-tempered SMC runs. The sixth row reports the average final number of state particles M, whereas the seventh row gives the average runtime in seconds.

Comparing the second and third columns, it is clear that the algorithm with M automatically adjusted by reinitialized tempering produces results at least as reliable as those of the algorithm with a fixed M. Looking at the first and second columns, one can conclude that the exchange step of Chopin, Jacob, and Papaspiliopoulos (2013) introduces considerable extra Monte Carlo noise into the estimates, and this is particularly pronounced for the normalizing constant (i.e., marginal likelihood) estimates. The reason underlying this phenomenon is the occasional particle impoverishment due to the incremental weights introduced by the exchange step. For further illustration, we plot in Figure 1 a measure of particle impoverishment versus the extra Monte Carlo noise in the normalizing constant estimate due to the exchange step for each simulated dataset. To measure particle impoverishment, we compute the minimum effective sample size (ESS) of the parameter particles during each SMC run with the exchange step. The horizontal axis in this plot is the average of the minimum ESS over the 20 independent runs of the SMC algorithm. The vertical axis is a measure of the extra Monte Carlo noise from the algorithm with the exchange step as compared to the algorithm with reinitialized tempering. Specifically, we compute the standard deviation of the log marginal likelihood estimates over the 20 runs for both algorithms, and take the ratio of the standard deviations (exchange/reinitialized tempering).

11 We take this value from Chopin, Jacob, and Papaspiliopoulos (2013).
density-tempered SMC algorithm with and without tuning M
Table 3. Simulation results for the linear Gaussian model by a density-tempered SMC algorithm with and without tuning M

                           Tuned M
                 Exchange     Reinitialized      Fixed
                   step         tempering       M = 512
Avg µ std         0.027           0.022           0.019
Avg ln σε² std    0.014           0.011           0.011
Avg φ std         0.005           0.004           0.004
Avg ln ση² std    0.034           0.027           0.026
Avg ln MLL std    0.368           0.189           0.242
Avg final M       336             336             512
Time mean         108.7           107.1           317.2

NOTE: This table presents the results of a small Monte Carlo study of the linear Gaussian state-space model with and without tuning the number of state particles M. 50 datasets of T = 1000 observations are generated using parameter value θ* = (0.5, ln(1), 0.825, ln(0.75)), and random normal contaminations are added to the observations with probability 0.01; when contamination happens, the noise has a standard deviation of 10. For each simulated dataset, we carry out 20 independent runs of three variants of the density-tempered SMC algorithm. The likelihood is approximated by an adapted particle filter throughout, with M being adjusted according to the algorithm. The number of parameter particles is N = 1024, and each time a move takes place, it takes 10 joint random walk MH moves. The first two columns report the results where M is automatically adjusted using the approach described in Section 2.3. For each run, the algorithms are initialized with M = 32, and an adjustment to M is triggered when the acceptance rate drops below 0.2. The first column corresponds to using the exchange step of Chopin et al. (2013) to modify M, whereas the second column is for the reinitialized tempering procedure proposed in this article. The function Q(·) used in reinitialization is a six-component mixture of normals. The third column reports the results of the density-tempered SMC sampler with a fixed M = 512. The first four rows report the average across the 50 datasets of the standard deviation of posterior mean estimates of the model parameters over the 20 independent runs of the algorithms. The fifth row reports the average standard deviation of the log marginal likelihood estimates, again taken across the 50 datasets. The sixth row reports the average final number of state particles M, whereas the seventh row gives the average runtime in seconds. The code was run on a Fermi M2050 GPU.

Figure 1. Particle impoverishment versus Monte Carlo noise from the exchange step.
Note: Each point in this scatterplot represents the results for one of the 50 simulated datasets in the Monte Carlo study presented in Table 3. For each simulated dataset and each run of the SMC algorithm with the exchange step (described in Section 2.3), we compute the minimum effective sample size of the parameter particles during the run. The horizontal axis is the average of the minimum ESS over the 20 independent runs of the SMC algorithm. The vertical axis is a measure of the extra Monte Carlo noise from the algorithm with the exchange step as compared to the algorithm with reinitialized tempering. Specifically, we compute the standard deviation of the log marginal likelihood estimates over the 20 runs for both algorithms, and take the ratio of the standard deviations (exchange/reinitialized tempering).

It is evident from this plot that for datasets where the SMC algorithm with the exchange step encounters phases with small ESSs, the normalizing factor estimate has a much higher variance with the algorithm containing the exchange step as compared to the one with reinitialized tempering. Similar results hold for the parameter estimates. In unreported experiments, we have found that the SMC algorithm with the exchange step can be quite sensitive to the choice of the adjustment trigger, C, with lower values of the trigger leading to substantially higher Monte Carlo noise. For such cases, the tempering parameter at the adjustment tends to be larger, leading to more variable incremental weights when moving to a new M. In contrast, reinitialized tempering seemed quite robust to this choice and to the choice of M_I. The intuition is that the first stage of the algorithm only needs to provide a reasonable set of parameter particles, and no information on the state particles is carried forward.

3.2 GARCH Model With Microstructure Noise

The purpose of this section is to demonstrate the applicability of the density-tempered SMC sampler on a typical nonlinear model in finance and to compare its performance with the expanding-data SMC sampler on real data. Here we consider the nonlinear asymmetric GARCH(1,1) (NGARCH) model of Engle and Ng (1993) with observation noise in a state-space formulation. A similar model has been investigated in Pitt et al. (2010).

The logarithm of the observed stock price, denoted by Y_t, often contains a transient component Z_t, due to microstructure noise, particularly for smaller and less liquid firms. Denoting the log "efficient" price by E_t, the log observed price can be written as

Y_t = E_t + Z_t.   (9)

The efficient price is assumed to have the following NGARCH dynamic, and its innovations follow a generalized error distribution:

E_t − E_{t−1} = µ + σ_t ε_t   (10)
σ_t² = α_0(1 − α_1) + α_1 σ_{t−1}² + β_1 σ_{t−1}² [(ε_{t−1} − γ)² − (1 + γ²)]   (11)
ε_t | F_{t−1} ∼ GED(v),   (12)

where F_t stands for the information set available at time t and GED(v) is a generalized error distribution with mean 0, unit variance, and tail-thickness parameter v > 0.12 The density of the GED(v) distribution is

f(ε) = v exp[−(1/2)|ε/λ|^v] / (λ 2^{1+1/v} Γ(1/v)), where λ = [2^{−2/v} Γ(1/v)/Γ(3/v)]^{1/2}.

This family includes quite a few well-known distributions; for example, v = 2 yields the normal distribution and v = 1 the double exponential distribution. We use the results of Duan (1997) to impose the following sufficient conditions for positivity and stationarity of the variance process: α_0, α_1, β_1 > 0, α_1 − β_1(1 + γ²) > 0, and α_1 < 1.
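For reference, the GED(v) density above and a standard sampler for it can be coded as follows (our sketch; the sampler uses the fact that (1/2)|ε/λ|^v is Gamma(1/v, 1) distributed):

```python
import numpy as np
from scipy.special import gammaln

def ged_lambda(v):
    """The scale lambda = [2^(-2/v) Gamma(1/v) / Gamma(3/v)]^(1/2)."""
    return np.sqrt(2.0 ** (-2.0 / v)
                   * np.exp(gammaln(1.0 / v) - gammaln(3.0 / v)))

def ged_logpdf(eps, v):
    """Log-density of GED(v) with mean 0 and unit variance."""
    lam = ged_lambda(v)
    return (np.log(v) - 0.5 * np.abs(eps / lam) ** v - np.log(lam)
            - (1.0 + 1.0 / v) * np.log(2.0) - gammaln(1.0 / v))

def ged_sample(v, size, rng=None):
    """Draw eps = sign * lam * (2W)^(1/v) with W ~ Gamma(1/v, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.gamma(1.0 / v, 1.0, size)
    sign = np.where(rng.random(size) < 0.5, -1.0, 1.0)
    return sign * ged_lambda(v) * (2.0 * w) ** (1.0 / v)
```

Setting v = 2 in `ged_logpdf` recovers the standard normal log-density, a quick sanity check on the implementation.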
We assume a constant signal-to-noise ratio, so the conditional volatility of the measurement noise, Z_t, is δσ_t. The constant δ is the ratio of the noise volatility over the signal volatility, that is, the inverse signal-to-noise ratio. To allow for the possibility of fat-tailed measurement noise, we assume that the standardized measurement noise has a generalized error distribution with tail-thickness parameter v_2. Hence, we have

Z_t = δ σ_t η_t, where η_t ∼ GED(v_2) iid.   (13)
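Putting Equations (9)-(13) together, a simulator for the log observed price can be sketched as below (ours, reusing the hypothetical `ged_sample` helper above; initializing the variance at its unconditional level α_0 is our assumption):

```python
import numpy as np

def simulate_ngarch_prices(T, mu, a0, a1, b1, gam, v, v2, delta,
                           E0=0.0, rng=None):
    """Simulate Y_t = E_t + Z_t with NGARCH efficient-price dynamics
    (Eqs. 10-11) and GED microstructure noise Z_t = delta*sigma_t*eta_t."""
    rng = np.random.default_rng() if rng is None else rng
    eps, eta = ged_sample(v, T, rng), ged_sample(v2, T, rng)
    E, sig2 = E0, a0                    # sigma_1^2 set to E[sigma^2] = a0
    Y = np.empty(T)
    for t in range(T):
        E += mu + np.sqrt(sig2) * eps[t]            # Eq. (10)
        Y[t] = E + delta * np.sqrt(sig2) * eta[t]   # Eqs. (9) and (13)
        sig2 = a0 * (1.0 - a1) + a1 * sig2 \
               + b1 * sig2 * ((eps[t] - gam) ** 2 - (1.0 + gam ** 2))  # Eq. (11)
    return Y
```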

We estimate the model on daily stock prices between 2002-2008 of 100 randomly selected firms in the CRSP database.