getdocc8a6. 332KB Jun 04 2011 12:04:58 AM

(1)

El e c t ro n ic

Jo ur

n a l o

f P

r o

b a b i_{l i t y} Vol. 16 (2011), Paper no. 2, pages 45–75.

Journal URL

http://www.math.washington.edu/~ejpecp/

Can the adaptive Metropolis algorithm collapse without

the covariance lower bound?

∗

Matti Vihola

Department of Mathematics and Statistics University of Jyväskylä

P.O.Box 35

FI-40014 University of Jyväskyla Finland

[email protected]

http://iki.fi/mvihola/

Abstract

The Adaptive Metropolis (AM) algorithm is based on the symmetric random-walk Metropolis algorithm. The proposal distribution has the following time-dependent covariance matrix at step n+1

S_n=Cov(X₁, . . . ,X_n) +εI,

that is, the sample covariance matrix of the history of the chain plus a (small) constantε >0 multiple of the identity matrixI. The lower bound on the eigenvalues ofS_ninduced by the factor

εI is theoretically convenient, but practically cumbersome, as a good value for the parameter

εmay not always be easy to choose. This article considers variants of the AM algorithm that do not explicitly bound the eigenvalues ofSn away from zero. The behaviour ofSn is studied

in detail, indicating that the eigenvalues ofSn do not tend to collapse to zero in general. In

dimension one, it is shown thatSnis bounded away from zero if the logarithmic target density

is uniformly continuous. For a modification of the AM algorithm including an additional fixed ∗_{The author was supported by the Academy of Finland, projects no. 110599 and 201392, by the Finnish Academy}

of Science and Letters, Vilho, Yrjö and Kalle Väisälä Foundation, by the Finnish Centre of Excellence in Analysis and Dynamics Research, and by the Finnish Graduate School in Stochastics and Statistics

(2)

component in the proposal distribution, the eigenvalues ofS_nare shown to stay away from zero with a practically non-restrictive condition. This result implies a strong law of large numbers for super-exponentially decaying target distributions with regular contours.

Key words: Adaptive Markov chain Monte Carlo, Metropolis algorithm, stability, stochastic ap-proximation.

AMS 2000 Subject Classification:Primary 65C40.

(3)

1 Introduction

Adaptive Markov chain Monte Carlo (MCMC) methods have attracted increasing interest in the last few years, after the original work of Haario, Saksman, and Tamminen Haario et al. (2001) and the subsequent advances in the field Andrieu and Moulines (2006); Andrieu and Robert (2001); Atchadé and Rosenthal (2005); Roberts and Rosenthal (2007); see also the recent review Andrieu and Thoms (2008). Several adaptive MCMC algorithms have been proposed up to date, but the seminal Adaptive Metropolis (AM) algorithm Haario et al. (2001) is still one of the most applied methods, perhaps due to its simplicity and generality.

The AM algorithm is a symmetric random-walk Metropolis algorithm, with an adaptive proposal distribution. The algorithm starts1 at some point X₁ _≡ x₁ _∈ Rd with an initial positive definite covariance matrixS₁_≡s₁_∈_Rd×d and follows the recursion

(S1) LetY_n+1=Xn+θS1/2n Wn+1, whereWn+1 is an independent standard Gaussian random vector

andθ >0 is a constant.

(S2) Accept Y_n+1 with probability min

1,π(Yn+1)

π(Xn) and let Xn+1 =Yn+1; otherwise rejectYn+1 and

letXn+1=Xn.

(S3) SetSn+1= Γ(X1, . . . ,Xn+1).

In the original work Haario et al. (2001) the covariance parameter is computed by Γ(X1, . . . ,Xn+1) =

1 n

n+1

k=1

(Xk−Xn+1)(Xk−Xn+1)T +εI, (1)

where Xn := n−1

k=1Xk stands for the mean. That is, Sn+1 is a covariance estimate of the

history of the ‘Metropolis chain’ X1, . . . ,Xn+1 plus a small ε > 0 multiple of the identity matrix

I _∈ _Rd×d. The authors prove a strong law of large numbers (SLLN) for the algorithm, that is, n−1Pn_k₌₁f(X_k)_→R

Rdf(x)π(x)dx almost surely asn→ ∞for any bounded functional f when the

target distributionπis bounded and compactly supported. Recently, SLLN was shown to hold also forπwith unbounded support, having super-exponentially decaying tails with regular contours and

f growing at most exponentially in the tails Saksman and Vihola (2010).

This article considers the original AM algorithm (S1)–(S3), without the lower bound induced by the factorεI. The proposal covariance function Γ, defined precisely in Section 2, is a consistent covariance estimator first proposed in Andrieu and Robert (2001). A special case of this estimator behaves asymptotically like the sample covariance in (1). Previous results indicate that if this algo-rithm is modified by truncating the eigenvalues ofS_n within explicit lower and upper bounds, the algorithm can be verified in a fairly general setting Atchadé and Fort (2010); Roberts and Rosenthal (2007). It is also possible to determine an increasing sequence of truncation sets forS_n, and mod-ify the algorithm to include a re-projection scheme in order to vermod-ify the validity of the algorithm Andrieu and Moulines (2006).

While technically convenient, such pre-defined bounds on the adapted covariance matrixS_n can be inconvenient in practice. Ill-defined values can affect the efficiency of the adaptive scheme dramat-ically, rendering the algorithm useless in the worst case. In particular, if the factorε >0 in the AM

(4)

algorithm is selected too large, the smallest eigenvalue of the true covariance matrix ofπ may be well smaller than ε > 0, and the chain Xn is likely to mix poorly. Even though the re-projection

scheme of Andrieu and Moulines (2006) avoids such behaviour by increasing truncation sets, which eventually contain the desirable values of the adaptation parameter, the practical efficiency of the algorithm is still strongly affected by the choice of these sets Andrieu and Thoms (2008).

Without a lower bound on the eigenvalues ofSn (or a re-projection scheme), there is a potential

danger of the covariance parameter S_n collapsing to singularity. In such a case, the increments X_n₋X_n₋₁would be smaller and smaller, and theX_nchain could eventually get ‘stuck’. The empirical evidence suggests that this does not tend to happen in practice. The present results validate the empirical findings by excluding such a behaviour in different settings.

After defining precisely the algorithms in Section 2, the above mentioned unconstrained AM algo-rithm is analysed in Section 3. First, the AM algoalgo-rithm run on an improper uniform targetπ≡c>0 is studied. In such a case, the asymptotic expected growth rate ofS_nis characterised quite precisely, beinge2θpnfor the original AM algorithm Haario et al. (2001). The behaviour of the AM algorithm in the uniform target setting is believed to be similar as in a situation whereS_n is small and the target πis smooth whence locally constant. The results support the strategy of choosing a ‘small’ initial covariances1 in practice, and letting the adaptation take care of expanding it to the proper

size.

In Section 3, it is also shown that in a one-dimensional setting and with a uniformly continuous logπ, the variance parameterS_n is bounded away from zero. This fact is shown to imply, with the results in Saksman and Vihola (2010), a SLLN in the particular case of a Laplace target distribution. While this result has little practical value in its own right, it is the first case where the unconstrained AM algorithm is shown to preserve the correct ergodic properties. It shows that the algorithm possesses self-stabilising properties and further strengthens the belief that the algorithm would be stable and ergodic under a more general setting.

Section 4 considers a slightly different variant of the AM algorithm, due to Roberts and Rosenthal Roberts and Rosenthal (2009), replacing (S1) with

(S1’) With probabilityβ, letYn+1 =Xn+Vn+1 where Vn+1 is an independent sample ofqfix;

other-wise, letY_n+1=Xn+θSn1/2Wn+1 as in (S1).

While omitting the parameterε >0, the proposal strategy (S1’) includes two additional parameters: the mixing probability β ∈ (0, 1) and the fixed symmetric proposal distribution qfix. It has the

advantage that the ‘worst case scenario’ having ill-definedq_fixonly ‘wastes’ the fixed proportionβof samples, whileS_n can take any positive definite value on adaptation. This approach is analysed also in the recent preprint Bai et al. (2008), relying on a technical assumption that ultimately implies thatX_nis bounded in probability. In particular, the authors show that ifq_fix is a uniform density on a ball having a large enough radius, then the algorithm is ergodic. Section 4 uses a perhaps more transparent argument to show that the proposal strategy (S1’) with a mild additional condition implies a sequenceS_nwith eigenvalues bounded away from zero. This fact implies a SLLN using the technique of Saksman and Vihola (2010), as shown in the end of Section 4.

(5)

2 The general algorithm

Let us define a Markov chain(X_n,M_n,S_n)_n_≥₁ evolving in spaceRd×Rd× Cd _{with the state space}

Rd and_Cd_⊂_Rd×dstanding for the positive definite matrices. The chain starts at an initial position X1 ≡ x1∈Rd, with an initial mean2 M1 ≡m1∈Rd and an initial covariance matrixS1≡s1∈ Cd.

Forn≥1, the chain is defined through the recursion

X_n+1 ∼ Pq_Sn(Xn,·) (2)

Mn+1 := (1−ηn+1)Mn+ηn+1Xn+1 (3)

S_n₊₁ := (1₋ηn+1)Sn+ηn+1(Xn+1−Mn)(Xn+1−Mn)T. (4)

Denoting the natural filtration of the chain as_Fn :=σ(X_k,M_k,S_k : 1_≤k_≤n), the notation in (2) reads that PXn+1∈A

= Pq_Sn(Xn,A)for any measurableA⊂Rd. The Metropolis transition

kernelP_qis defined for any symmetric probability densityq(x,y) =q(x₋y)through

Pq(x,A):=1A(x)

1−

min

1,π(y) π(x)

q(y−x)dy

min

1,π(y) π(x)

q(y−x)dy where 1A stands for the characteristic function of the set A. The proposal densities {qs}s

∈Cd are

defined as a mixture

q_s(z):= (1₋β)q˜_s(z) +βq_fix(z) (5) where the mixing constant β ∈ [0, 1) determines the portion how often a fixed proposal density q_fix is used instead of the adaptive proposal ˜q_s(z) := det(θs)−1/2˜q(θ−1/2s−1/2z) with ˜q being a ‘template’ probability density. Finally, the adaptation weights(ηn)n≥2⊂(0, 1)appearing in (3) and

(4) is assumed to decay to zero.

One can verify that forβ=0 this setting corresponds to the algorithm (S1)–(S3) of Section 1 with W_n+1 having distribution ˜q, and forβ ∈(0, 1), (S1’) applies instead of (S1). Notice also that the

original AM algorithm essentially fits this setting, withηn:=n−1,β:=0 and if ˜qsis defined slightly

differently, being a Gaussian density with mean zero and covariances+εI. Moreover, if one sets β =1, the above setting reduces to a non-adaptive symmetric random walk Metropolis algorithm with the increment proposal distributionq_fix.

3 The unconstrained AM algorithm

3.1 Overview of the results

This section deals with the unconstrained AM algorithm, that is, the algorithm described in Section 2 with the mixing constantβ=0 in (5). Sections 3.2 and 3.3 consider the case of an improper uniform target distributionπ≡cfor some constantc>0. This implies that (almost) every proposed sample is accepted and the recursion (2) reduces to

Xn+1=Xn+θSn1/2Wn+1 (6)

2_{A customary choice is to set}_m 1=x1.

(6)

100 102 104 106 −10

−5 0 5

Figure 1: An example of the exact development of_E

, whens1=1 andθ=0.01. The sequence

(_E

S_n

)_n_≥₁decreases untilnis over 27, 000 and exceeds the initial value only withnover 750, 000.

where(W_n)_n_≥₂are independent realisations of the distribution ˜q.

Throughout this subsection, let us assume that the template proposal distribution ˜q is spherically symmetric and the weight sequence is defined as ηn := cn−γ for some constants c ∈ (0, 1] and

γ∈(1/2, 1]. The first result characterises the expected behaviour ofSn when(Xn)n≥2follows (6).

Theorem 1. Suppose(X_n)_n_≥₂ follows the ‘adaptive random walk’ recursion (6), with _EW_nW_nT = I .

Then, for allλ >1there is n0≥m such that for all n≥n0and k≥1, the following bounds hold

1 λ



 θ

n+k

j=n+1

η_j



 ≤log

ES_n₊_k

ESn

≤λ



 θ

n+k

j=n+1

η_j



 .

Proof. Theorem 1 is a special case of Theorem 12 in Section 3.2.

Remark2. Theorem 1 implies that with the choice ηn:=cn−γ for somec∈(0, 1)andγ∈(1/2, 1],

the expectation grows with the speed

ES_n

≃exp θ

c 1₋γ₂n

1−γ₂

Remark 3. In the original setting (Haario et al. 2001) the weights are defined as ηn := n−1 and

Theorem 1 implies that the asymptotic growth rate of _E[S_n] is e2θpn when (X_n)_n_≥₂ follows (6). Suppose the value ofSnis very small compared to the scale of a smooth target distributionπ. Then,

it is expected that most of the proposal are accepted,X_n behaves almost as (6), andS_n is expected to grow approximately at the ratee2θpn until it reaches the correct magnitude. On the other hand, simple deterministic bound implies thatSn can decay slowly, only with the polynomial speed n−1.

Therefore, it may be safer to choose the initials1 small.

Remark4. The selection of the scaling parameterθ >0 in the AM algorithm does not seem to affect the expected asymptotic behaviourS_ndramatically. However, the choice 0< θ ≪1 can result in an significant initial ‘dip’ of the adapted covariance values, as exemplified in Figure 1. Therefore, the valuesθ ≪1 are to be used with care. In this case, the significance of a successful burn-in is also emphasised.

(7)

It may seem that Theorem 1 would automatically also ensure thatS_n _{→ ∞}also path-wise. This is not, however, the case. For example, consider the probability space[0, 1]with the Borelσ-algebra and the Lebesgue measure. Then(M_n,Fn)_n_≥₁ defined as M_n :=22n1[0,2−n) andFn :=σ(X

k : 1≤

k_≤n)is, in fact, a submartingale. Moreover,_EM_n=2n_{→ ∞}, but M_n_→0 almost surely. The AM process, however, does produce an unbounded sequenceSn.

Theorem 5. Assume that(X_n)_n_≥₂follows the ‘adaptive random walk’ recursion(6). Then, for any unit

vector u_∈_Rd, the process uTS_nu_{→ ∞}almost surely.

Proof. Theorem 5 is a special case of Theorem 18 in Section 3.3.

In a one-dimensional setting, and when logπ is uniformly continuous, the AM process can be ap-proximated with the ‘adaptive random walk’ above, wheneverS_n is small enough. This yields

Theorem 6. Assume d =1and logπ is uniformly continuous. Then, there is a constant b>0such

thatlim inf_n_→∞S_n≥b.

Proof. Theorem 6 is a special case of Theorem 18 in Section 3.4. Finally, having Theorem 6, it is possible to establish

Theorem 7. Assumeq is Gaussian, the one-dimensional target distribution is standard Laplace˜ π(x):=

1 2e−|

x| _{and the functional f} _: _R _→ _R _satisfies _sup

xe−γ|x||f(x)| < ∞ for some γ∈ (0, 1/2). Then,

n−1Pn_k₌₁f(X_k)_→R f(x)π(x)dx almost surely as n_{→ ∞}. Proof. Theorem 7 is a special case of Theorem 21 in Section 3.4.

Remark8. In the caseηn := n−1, Theorem 7 implies that the parameters Mn andSn of the

adap-tive chain converge to 0 and 2, that is, the true mean and variance of the target distribution π, respectively.

Remark 9. Theorem 6 (and Theorem 7) could probably be extended to cover also targets π with compact supports. Such an extension would, however, require specific handling of the boundary effects, which can lead to technicalities.

3.2 Uniform target: expected growth rate

Define the following matrix quantities an := E

(Xn−Mn−1)(Xn−Mn−1)T

(7) bn := E

(8) forn_≥1, with the convention thata₁_≡0_∈_Rd×d. One may write using (3) and (6)

X_n₊₁₋M_n=X_n₋M_n+θS1/2_n W_n₊₁= (1₋ηn)(Xn−Mn−1) +θS1/2n Wn+1.

IfEW_nW_nT =I, one may easily compute E

(X_n+1−Mn)(Xn+1−Mn)T

= 1₋ηn

(8)

sinceW_n₊₁ is independent of_Fn and zero-mean due to the symmetry of ˜q. The values of (a_n)_n_≥₂ and(bn)n≥2are therefore determined by the joint recursion

a_n₊₁ = (1₋ηn)2an+θ2bn (9)

b_n+1 = (1−ηn+1)bn+ηn+1an+1. (10)

Observe that for any constant unit vectoru∈Rd, the recursions (9) and (10) hold also for a(_nu₊)₁ := _EuT(X_n+1−Mn)(Xn+1−Mn)Tu

b(_nu₊)₁ := _EuTSn+1u

The rest of this section therefore dedicates to the analysis if the one-dimensional recursions (9) and (10), that is,a_n,b_n_∈_R₊for alln_≥1. The first result shows that the tail of(b_n)_n_≥₁ is increasing.

Lemma 10. Let n₀_≥1and suppose a_n₀_≥0, b_n₀>0and for n_≥n₀the sequences a_nand b_nfollow the

recursions(9)and(10), respectively. Then, there is a m₀_≥n₀such that(b_n)_n_≥_m

0is strictly increasing.

Proof. Ifθ ≥1, we may estimate a_n₊₁ _≥(1₋ηn)2an+bn implying bn+1 ≥ bn+ηn+1(1−ηn)2an

for all n≥ n0. Since bn > 0 by construction, and therefore also an+1 ≥ θ2bn > 0, we have that

b_n+1> bnfor alln≥n0+1.

Suppose thenθ <1. Solvinga_n+1 from (10) yields

a_n₊₁=η−_n₊1₁ b_n₊₁₋b_n

+b_n Substituting this into (9), we obtain forn_≥n₀+1

η−_n₊1₁ bn+1−bn

+bn= (1−ηn)2

η−_n1 bn−bn−1

+bn−1

+θ2bn

After some algebraic manipulation, this is equivalent to bn+1−bn=

ηn+1

ηn

(1−ηn)3(bn−bn−1) +ηn+1

(1−ηn)2−1+θ2

bn. (11)

Now, sinceηn→0, we have that(1−ηn)2−1+θ2>0 whenevernis greater than somen1. So, if

we have for somen′>n₁ that b_n′−b_n′₋₁≥0, the sequence(b_n)_n_≥_n′ is strictly increasing aftern′. Suppose conversely that b_n+1−bn < 0 for all n≥ n1. From (10), bn+1− bn = ηn+1(an+1− bn)

and hence b_n > a_n₊₁ forn_≥n₁. Consequently, from (9), a_n₊₁ > (1₋ηn)2an+θ2an+1, which is

equivalent to

a_n₊₁> (1−ηn)

1₋θ2 an.

Sinceηn→0, there is aµ >1 andn2such thatan+1≥µanfor alln≥n2. That is,(an)n≥n2 grows at least geometrically, implying that after some timea_n₊₁>b_n, which is a contradiction. To conclude, there is anm0≥n0 such that(bn)n≥m0 is strictly increasing.

Lemma 10 shows that the expectation_EuTS_nuis ultimately bounded from below, assuming only thatηn →0. By additional assumptions on the sequenceηn, the growth rate can be characterised

(9)

Assumption 11. Suppose(ηn)n≥1⊂(0, 1)and there ism′≥2 such that

(i) (η_n)_n_≥_m′ is decreasing withη_n→0, (ii) (η−_n₊1/2₁ −η−_n1/2)_n_≥_m′ is decreasing and (iii) P∞_n₌₂ηn=∞.

The canonical example of a sequence satisfying Assumption 11 is the one assumed in Section 3.1, η_n:=cn−γforc∈(0, 1)andγ∈(1/2, 1].

Theorem 12. Suppose a_m _≥0and b_m> 0for some m_≥1, and for n>m the a_n and b_n are given

recursively by(9)and (10), respectively. Suppose also that the sequence(ηn)n≥2 satisfies Assumption

11 with some m′ _≥m. Then, for allλ >1there is m₂_≥ m′ such that for all n_≥ m₂ and k_≥1, the following bounds hold

1 λ



 θ

n+k

j=n+1

ηj



 ≤log

n+k

b_n

≤λ



 θ

n+k

j=n+1

ηj



 .

Proof. Let m₀ be the index from Lemma 10 after which the sequence b_n is increasing. Let m₁ > max{m0,m′}and define the sequence(zn)n≥m1−1 by settingzm1−1 =bm1−1 andzm1 = bm1, and for

n_≥m₁through the recursion z_n+1=zn+

ηn+1

ηn

(1−η_n)3(z_n−z_n₋₁) +η_n+1θ˜2zn (12)

where ˜θ > 0 is a constant. Consider such a sequence (z_n)_n_≥_m₁₋₁ and define another sequence (gn)n≥m1+1through

gn+1:=η− 1/2 n+1

z_n+1−zn

z_n =η

−1/2 n+1

ηn+1

ηn

(1₋ηn)3

z_n₋z_n₋₁ z_n₋₁

z_n₋₁

z_n +ηn+1 ˜ θ2

=η1/2_n₊₁ (1−ηn)

ηn

g_n g_n+η−n1/2

+θ˜2

Lemma 33 in Appendix A shows thatg_n_→θ˜. Let us consider next two sequences(z(_n1))_n_≥_m

1−1 and(z (2)

n )n≥m1−1 defined as(zn)n≥m1−1 above but using two different values ˜θ(1) and ˜θ(2), respectively. It is clear from (11) that for the choice

θ(1)_:₌_θ _{one has} _b

n≤z(n1)for alln≥m1−1. Moreover, sincebm1+1/bm1 ≤z (1)

m1+1/z (1)

m1, it holds by induction that

b_n+1

b_n ≤1+ ηn+1

ηn

(1₋ηn)3

1₋ bn−1 b_n

+ηn+1θ˜2

≤1+ηn+1

η_n (1−ηn)

3 ₁

−z

(1)

n−1

zn(1)

+ηn+1θ˜2=

z_n(1₊)₁ zn(1)

also for alln_≥ m₁+1. By a similar argument one shows that if ˜θ(2):= [(1₋ηm1)

−1+θ2]1/2 thenbn≥zn(2)andbn+1/bn≥z

(2)

n+1/z

(2)

(10)

Letλ′>1. Since g_n(1)_→θ˜(1) and g_n(2)_→θ˜(2) there is a m₂ _≥ m₁ such that the following bounds apply

1+ ˜ θ(2)

λ′

p_η n≤

z(_n2) z(_n2₋)₁

and z (1)

z(_n1₋)₁ ≤

1+λ′θ˜(1)pηn

for alln≥m2. Consequently, for alln≥m2, we have that

log

n+k

b_n

≤log





z_n(1₊)_k z_n(1)



≤

n+k

j=n+1

log

1+λ′θ˜(1)p

ηj

≤λ′θ

n+k

j=n+1 p_η

Similarly, by the mean value theorem

log

bn+k

≥ n+k

j=n+1

log

1+ ˜ θ(2)

λ′

ηj

≥

˜ θ(2) λ′₍₁₊_λ′−1_θ_˜(2)p_η

n) n+k

j=n+1

ηj

sinceηnis decreasing. By letting the constantm1above be sufficiently large, the difference|θ˜(2)−θ|

can be made arbitrarily small, and by increasing m2, the constantλ′>1 can be chosen arbitrarily

close to one.

3.3 Uniform target: path-wise behaviour

Section 3.2 characterised the behaviour of the sequence_E

when the chain(Xn)n≥2 follows the

‘adaptive random walk’ recursion (6). In this section, we shall verify that almost every sample path (S_n)_n_≥₁of the same process are increasing.

Let us start by expressing the processS_nin terms of an auxiliary process(Z_n)_n_≥₁.

Lemma 13. Let u_∈_Rd be a unit vector and suppose the process(X_n,M_n,S_n)_n_≥₁is defined through(3),

(4)and (6), where(Wn)n≥1 are i.i.d. following a spherically symmetric, non-degenerate distribution.

Define the scalar process(Z_n)_n_≥₂through

Z_n+1:=uT

X_n₊₁₋M_n

kS1/2n uk

(13)

wherekxk:=pxTx stands for the Euclidean norm. Then, the process(Z_n,S_n)_n_≥₂ follows

uTSn+1u = [1+ηn+1(Zn2+1−1)]u T_S

nu (14)

Z_n₊₁ = θW˜_n₊₁+U_nZ_n (15) where(W˜_n)_n_≥₂are non-degenerate i.i.d. random variables and U_n:= (1₋ηn) 1+ηn(Zn2−1)

−1/2

. The proof of Lemma 13 is given in Appendix B.

It is immediate from (14) that only values|Zn|<1 can decreaseuTS_nu. On the other hand, if both ηnandηnZn2 are small, then the variableUnis clearly close to unity. This suggests a nearly random

walk behaviour of Zn. Let us consider an auxiliary result quantifying the behaviour of this random

(11)

Lemma 14. Let n₀ _≥ 2, suppose Z˜_n

0−1 isFn0−1-measurable random variable and suppose(W˜n)n≥n0

are respectively (_Fn)_n_≥_n

0-measurable and non-degenerate i.i.d. random variables. Define for Z˜n for

n≥2through

Z_n₊₁=Z˜_n+θW˜_n₊₁. Then, for any N,δ1,δ2>0, there is a k0≥1such that



 

1 k

j=1

{|_Zn˜₊_j_|≤_N_}≥δ1



 ≤δ2

a.s. for all n≥1and k≥k0.

Proof. From the Kolmogorov-Rogozin inequality, Theorem 36 in Appendix C, P(_Z˜_n₊_j₋_Z˜_n_∈[x,x+2N]_{| Fn})_≤c1j−1/2

for any x ∈ R, where the constant c1 > 0 depends on N, θ and on the distribution of Wj. In

particular, since ˜Z_n+j − Z˜n is independent of ˜Zn, one may set x = −Zn − N above, and thus

P|Z˜_n+j| ≤N

≤c₁j−1/2. The estimate



 

1 k

j=1

{|_Zn˜₊_j_|≤_N_}



 ≤

j=1

j−1/2_≤c₂k−1/2

impliesP k−1Pk_j₌₁1

{|Z˜n+j|≤N}≥δ1

Fn≤δ−1

1 c2k−1/2, concluding the proof.

The technical estimate in the next Lemma 16 makes use of the above mentioned random walk approximation and guarantees ultimately a positive ‘drift’ for the eigenvalues of S_n. The result requires that the adaptation sequence (ηn)n≥2 is ‘smooth’ in the sense that the quotients converge

to one.

Assumption 15. The adaptation weight sequence(ηn)n≥2⊂(0, 1)satisfies

lim

n→∞

ηn+1

η_n =1.

Lemma 16. Let n₀ _≥2, suppose Z_n

0−1 isFn0−1-measurable, and assume(Zn)n≥n0 follows(15)with

non-degenerate i.i.d. variables (_W˜_n)_n_≥_n

0 measurable with respect to (Fn)n≥n0, respectively, and the

adaptation weights(ηn)n≥n0 satisfy Assumption 15. Then, for any C≥1andε >0, there are indices

k≥1and n1≥n0 such thatP

Ln,k

≤εa.s. for all n≥n1, where

L_n,k:=







j=1

logh1+ηn+j

Z_n2₊_j₋1i<kCηn







(12)

Proof. Fix aγ∈(0, 2/3). Define the setsA_n:j:=_∩_ij₌_n₊₁_{Z_i2_≤η−_i γ}andA′_i:=_{Z_i2> η−_i γ}. Write the conditional expectation in parts as follows,

PLn,k

=_PLn,k,An:n+k

+_PLn,k,A′n

n+k

i=n+1

PL_n,k,A_n:i₋₁,A′_i Fn

. (16)

Letω∈A′_i for anyn<i≤n+kand compute log1+ηi(Zi2−1)

≥log1+ηi(η− γ i −1)

≥log

1+2ηikC

≥ ₁₊2η₂ikC

ηikC ≥

kCηn

whenevern_≥n₀is sufficiently large, sinceηn→0, and by Assumption 15. That is, ifnis sufficiently

large, all but the first term in the right hand side of (16) are a.s. zero. It remains to show the inequality for the first.

Suppose now thatZ_n2≤η−nγ. One may estimate

Un = (1−ηn)1/2

1− ηnZ 2 n

1−ηn+ηnZn2

1/2

≥ (1₋ηn)1/2 1−

η1n−γ

1−ηn

!1/2

≥ (1₋η1_n−γ)1/2 1−2η

1−γ n

1₋ηn

!1/2

≥1₋c₁η1_n−γ

wherec1:=2 supn≥n0(1−ηn)

−1/2_<

∞. Observe also thatUn≤1.

Letk₀_≥1 be from Lemma 14 applied withN=p8C+1+1,δ1=1/8 andδ2=ε, and fixk≥k0+1.

Letn≥n0 and define an auxiliary process(Z˜

(n)

j )j≥n0−1 as ˜Z (n)

j ≡ Zj forn0−1≤ j≤n+1, and for

j>n+1 through

Z(_jn)=Z_n+1+θ j

i=n+2

˜ W_i. For anyn+2≤ j≤n+kandω∈An:j, the difference of ˜Z

(n)

j andZj can be bounded by |_Z˜(n)

j+1−Zj+1| ≤ |Zj||1−Uj|+|Z˜

(n)

j −Zj| ≤c1η 1−3₂γ j +|Z˜

(n)

j −Zj| ≤ · · · ≤c1

i=n+1

η1− 3 2γ

i ≤c1η 1−3₂γ n

i=n+1

ηi

ηn

1−3

2γ

≤c2(j−n)η 1−3₂γ n

by Assumption 15. Therefore, for sufficiently large n_≥n₀, the inequality_|Z˜(_jn)₋Z_{j| ≤}1 holds for alln_≤ j_≤n+kandω∈A_n:n₊_k. Now, ifω∈A_n:n₊_k, the following bound holds

log

1+ηj(Z2j −1)

≥log1+ηj(min{N,|Zj|}2−1)

≥1

{|Z˜(_jn)|>N}log

1+ηj((N−1)2−1)

{|Z˜(_jn)|≤N}log

1₋ηj

≥1

{|Z˜(_jn)|>N}(1−βj)ηj8C−1

(13)

by the mean value theorem, where the constantβj =βj(C,ηj)∈(0, 1) can be selected arbitrarily

small whenever jis sufficiently large. Using this estimate, one can write forω∈An:n+k k

j=1

log

1+ηn+j

Z_n2₊_j−1

≥(1−βn)

j∈I+_n₊_1:_k

ηn+j8C−(1+βn) k

j=1

ηn+j

whereI_n+₊_1:k:=_{j_∈[1,k]: ˜Z_n(n₊)_j>N_}. Define the sets

B_n,k:=







1 k−1

k−1

j=1

{|_Zn˜₊_j₊₁_|≤_N_}≤δ1







Within B_n,k, it clearly holds that #I_n+₊_1:k _≥ k₋1₋(k₋1)δ1 = 7(k−1)/8. Thereby, for all ω ∈

Bn,k∩An:n+k k

j=1

log

1+ηn+j

Z_n2₊_j−1

≥ηnk

(1₋βn)

7 2

inf

1≤j≤k

ηn+j

ηn

C₋(1+βn)

sup

1≤j≤k

ηn+j

ηn

≥kCηn

for sufficiently largen≥1, as then the constantβn can be chosen small enough, and by Assumption

15. In other words, ifn≥1 is sufficiently large, thenBn,k∩An:n+k∩Ln,k=;. Now, Lemma 14 yields

PLn,k,An:n+k

=_PLn,k,An:n+k,Bn,k

+_PL_n,k,A_n:n₊_k,B_n,k∁ Fn

≤P B∁_n,k_{| Fn}

≤ε.

Using the estimate of Lemma 16, it is relatively easy to show that uTS_nutends to infinity, if the adaptation weights satisfy an additional assumption.

Assumption 17. The adaptation weight sequence(ηn)n≥2⊂(0, 1)is inℓ2 but not inℓ1, that is,

∞

n=2

ηn=∞ and

∞

n=2

η2_n<∞.

Theorem 18. Assume that (Xn)n≥2 follows the ‘adaptive random walk’ recursion (6)and the

adap-tation weights(ηn)n≥2 satisfy Assumptions 15 and 17. Then, for any unit vector u∈Rd, the process

uTS_nu_{→ ∞}almost surely.

Proof. The proof is based on the estimate of Lemma 16 applied with a similar martingale argument as in Vihola (2009).

Letk _≥2 be from Lemma 16 applied withC =4 andε=1/2. Denoteℓi :=ki+1 fori≥0 and,

inspired by (14), define the random variables(Ti)i≥1 by

Ti:=min

kMηℓi−1,

ℓi

j=ℓi−1+1

logh1+ηj

Z2_j −1i

(14)

with the convention thatη0 =1. Form a martingale(Yi,Gi)i≥1 withY1 ≡0 and having differences

dYi :=Ti−E

Gi₋₁

and where_G1_{≡ {;},Ω_}and_Gi:=_Fℓ

i fori≥1. By Assumption 17,

∞

i=2

EdY_i2≤c

∞

i=1

η2_ℓ

i <∞

with a constantc=c(k,C)>0, soY_i is aL2-martingale and converges a.s. to a finite limitM_∞(e.g. Hall and Heyde 1980, Theorem 2.15).

By Lemma 16, the conditional expectation satisfies

ETi+1

≥kCηℓi(1−ε) + ℓi+1

j=ℓi+1

log(1₋ηj)ε≥kηℓi

when i is large enough, and where the second inequality is due to Assumption 15. This implies, with Assumption 17, thatP_i_ET_i Gi₋₁

=_∞a.s., and sinceY_i converges a.s. to a finite limit, it holds thatP_iTi =∞a.s.

By (14), one may estimate for anyn=ℓm withm≥1 that

log(uTSnu)≥log(uTS1u) + m

i=1

Ti→ ∞

asm→ ∞. Simple deterministic estimates conclude the proof for the intermediate values ofn.

3.4 Stability with one-dimensional uniformly continuous log-density

In this section, the above analysis of the ‘adaptive random walk’ is extended to imply that lim inf_n_→∞S_n > 0 for the one-dimensional AM algorithm, assuming logπ uniformly continuous. The result follows similarly as in Theorem 18, by coupling the AM process with the ‘adaptive ran-dom walk’ wheneverS_nis small enough to ensure that the acceptance probability is sufficiently close to one.

Theorem 19. Assume d = 1 and logπ is uniformly continuous, and that the adaptation weights

(ηn)n≥2satisfy Assumptions 15 and 17. Then, there is a constant b>0such thatlim infn→∞Sn≥ b.

Proof. Fix aδ∈(0, 1). Due to the uniform continuity of logπ, there is a ˜δ >0 such that logπ(y)₋logπ(x)_≥ 1

2log

1₋δ 2

for all|x− y| ≤δ˜1. Choose ˜M >0 sufficiently large so that

{|z|≤M˜}q˜(z)dz≥

1−δ/2. Denote by

Q_q(x,A):=

(15)

the random walk transition kernel with increment distribution q, and observe that the ‘adaptive random walk’ recursion (6) can be written as “Xn+1∼Qq˜_Sn(Xn,·).” For anyx ∈Rd and measurable

A⊂Rd

|Q˜qs(x,A)−P˜qs(x,A)| ≤2

1−

min

1,π(y) π(x)

qs(y−x)dy

≤2



1− Z

{|z|≤_M˜_}

min

1,π(x+

θsz) π(x)

˜ q(z)dz



.

Now,_|Q_˜_q

s(x,A)−Pq˜s(x,A)| ≤δwhenever p

θsz_≤δ˜1for all|z| ≤M˜. In other words, there exists a

µ=µ(δ)>0 such that whenevers< µ, the total variation normkQ˜qs(x,·)−P˜qs(x,·)k ≤δ.

Next we shall consider a ‘adaptive random walk’ process to be coupled with(Xn,Mn,Sn)n≥1. Let

n,k ≥ 1 and define the random variables (_X˜(n)

j , ˜M

(n)

j , ˜S

(n)

j )j∈[n,n+k] by setting (X˜

(n)

n , ˜M

(n)

n , ˜S

(n)

n ) ≡

(X_n,M_n,S_n)and

X(_jn₊)₁ ∼ Q˜q

S(_jn)

(_X˜(n)

j ,·),

M(_j₊n)₁ := (1₋ηj+1)M˜

(n)

j +ηj+1X˜

(n)

j+1 and

S(_jn₊)₁ := (1₋ηj+1)S˜

(n)

j +ηj+1(X˜

(n)

j+1−M˜

(n)

j ) 2

for j+1 _∈ [n+1,n+k]. The variable ˜X_n(n₊)₁ can be selected so that _P(X˜_n(n₊)₁ = X_n₊₁ _{| Fn}) = 1−kPq˜_Sn(Xn,·)−Q˜q

S(_nn)

(_X˜(n)

n ,·)k; see Theorem 37 in Appendix D. Consequently,P(X˜

(n)

n+16=Xn+1,Sn<

µ| Fn)_≤δ. By the same argument, ˜X(_nn₊)₂ can be chosen so that P X˜(_nn₊)₂6=X_n+2, ˜X

(n)

n+1=Xn+1,Sn+1< µ

σ(Fn₊₁, ˜X(n)

n+1)

≤δ

since if ˜X_n(n₊)₁=X_n+1, then also ˜S

(n)

n+1=Sn+1. This implies

P {_X˜(n)

n+26=Xn+2} ∪ {X˜ (n)

n+16=Xn+1} ∩Bn:n+2

Fn≤2δ

where Bn:j := ∩ j−1

i=n{Si < µ} for j > n. The same argument can be repeated to construct

(X˜(_jn))_j_∈[n,n+k] so that

PD_n:n₊_k Fn

≥1₋kδ (17)

whereDn:n+k:=

Tn+k j=n{X˜

(n)

j =Xj} ∪B

∁

n:n+k.

Apply Lemma 16 withC=18 andε=1/6 to obtaink≥1, and fixδ=ε/k. Denoteℓi:=ik+1 for

anyi≥0, and define the random variables(T_i)_i_≥₁by

Ti:=1

{S_ℓ

i−1<µ/2}min

kMηℓi−1,

ℓi

j=ℓi−1+1 log

1+ηj

Z2_j −1

i «

(18)

(16)

Define also ˜Ti similarly as Ti, but having ˜Z

(ℓi−1)

j with j∈[ℓi−1+1,ℓi]in the right hand side of (18),

defined as ˜Z(ℓi−1)

ℓi−1 ≡Zℓi−1 and by ˜ Z(ℓi−1)

j := X˜

(ℓi−1)

j −M˜

(ℓi−1)

j−1

. q

˜ S(ℓi−1)

j−1 .

for j ∈[ℓ_i₋₁+1,ℓ_i]. Notice that T_i coincides with ˜T_i in Bℓi−1:ℓi ∩Dℓi−1:ℓi. Observe also that ˜X

(ℓi−1)

follows the ‘adaptive random walk’ equation (6) forj_∈[ℓi−1+1,ℓi], and hence ˜Z

(ℓi−1)

j follows (15).

Consequently, denotingGi:=Fℓi, Lemma 16 guarantees that

Lℓi−1,k

≤ε (19)

where L_ℓ

i−1,k:={T˜i<kMηℓi−1}.

Let us show next that wheneverSℓi−1 is small, the variable Ti is expected to have a positive value proportional to the adaptation weight,

ETi

Gi₋₁

{S_ℓ

i−1<µ/2}≥kηℓi−1

{S_ℓ

i−1<µ/2} (20)

almost surely for any sufficiently largei_≥1. Write first

ET_i Gi₋₁

{Sℓi−1<µ/2}=E

B∁_ℓ i−1:ℓi

Bℓi−1:ℓi)Ti

Gi−1

{Sℓi−1<µ/2}

≥E

B_ℓ∁ i−1:ℓi

min

kCηℓi−1, µ 2+ξi

+1B

ℓi−1:ℓiξi

Gi−1

{S_ℓ i−1<µ/2} where the lower boundξi ofTi is given as

ξi := ℓi

j=ℓi−1+1

log(1₋ηj).

By Assumption 15, ξi ≥ −2kηℓi−1 ≥ −µ/4 for any sufficiently large i. Therefore, whenever P

B_ℓ∁

i−1:ℓi

Gi−1

≥ε=3/C, it holds that ET_i Gi₋₁

1{S

ℓi−1<µ/2}≥kηℓi−1

1{S

ℓi−1<µ/2} for any sufficiently largei. On the other hand, ifP

B_ℓ∁

i−1:ℓi

Gi−1

≤ε, then by defining E_i:=B_ℓ∁

i−1:ℓi ∪D

∁

ℓi−1:ℓi∪Lℓi−1,k one has by (17) and (19) thatP(E_i)_≤3ε, and consequently

ET_i Gi₋₁

≥P

E∁_i

Gi−1

ξi+E

iT˜i

Gi₋₁

(17)

This establishes (20).

Define the stopping timesτ1≡1 and forn≥2 throughτn:=inf{i> τn−1:Sℓi−1≥µ/2,Sℓi < µ/2}

with the convention that inf_;= _∞. That is,τi record the times when Sℓi enters (0,µ/2]. Using

τ_i, define the latest such time up to n by σ_n := sup{τ_i : i ≥ 1,τ_i ≤ n}. As in Theorem 18, define the almost surely converging martingale(Y_i,_Gi)_i_≥₁ with Y₁ _≡0 and having the differences dYi := (Ti−E

Gi₋₁

)fori≥2. It is sufficient to show that lim inf_i_→∞S_ℓ

i ≥ b:= µ/4>0 almost surely. If there is a finite i0 ≥1

such thatSℓi ≥µ/2 for alli≥i0, the claim is trivial. Let us consider for the rest of the proof the case

that_{Sℓ_i < µ/2}happens for infinitely many indicesi≥1.

For anym_≥2 such thatSℓm< µ/2, one can write

logSℓm≥logSℓσm+ m

i=σm+1

≥logSℓ_σ_m+ (Ym−Yσm) + m

i=σm+1

kηℓi−1

(21)

since thenS_ℓ

i < µ/2 for alli∈[σm,m−1]and hence alsoE

T_i Gi₋₁

≥kηℓi−1.

Suppose for a moment that there is a positive probability thatSℓm stays within(0,µ/2)indefinitely,

starting from some indexm₁_≥1. Then, there is an infiniteτi and consequentlyσm≤σ <∞for

allm≥1. But asYmconverges, |Ym−Yσm|is a.s. finite, and since

mηℓm=∞by Assumptions 15

and 17, the inequality (21) implies thatSℓm≥µ/2 for sufficiently largem, which is a contradiction.

That is, the stopping timesτi for all i≥ 1 must be a.s. finite, wheneverSℓm < µ/2 for infinitely

many indicesm≥1.

For the rest of the proof, supposeSℓm < µ/2 for infinitely many indicesm≥1. Observe that since

Y_m _→ Y_∞, there exists an a.s. finite index m₂ so that Y_m₋Y_∞ _{≥ −}1/2 log 2 for all m _≥ m₂. As ηn→0 andσm→ ∞, there is an a.s. finitem3 such thatξσm−1 ≥ −1/2 log 2 for allm≥m3. For all

m≥max{m2,m3}and wheneverSℓm< µ/2, it thereby holds that

logSℓm≥logSℓσm −(Ym−Yσm)≥logSℓσm−1+ξσm− 1 2log 2

≥logµ

2−log 2=logb. The caseS_ℓ

m≥µ/2 trivially satisfies the above estimate, concluding the proof.

As a consequence of Theorem 19, one can establish a strong law of large numbers for the uncon-strained AM algorithm running with a Laplace target distribution. Essentially, the only ingredient that needs to be checked is that the simultaneous geometric ergodicity condition holds. This is verified in the next lemma, whose proof is given in Appendix E.

Lemma 20. Suppose that the template proposal distribution ˜q is everywhere positive and

non-increasing away from the origin: ˜q(z) _≥ q˜(w) for all |z| ≤ |w|. Suppose also that π(x) :=

2bexp

(18)

constants M,b such that the following drift and minorisation condition are satisfied for all s_≥ L and measurable A⊂R

P_sV(x)_≤λsV(x) +b1C(x), ∀x∈R (22)

P_s(x,A)_≥δsν(A), ∀x ∈C (23)

where V :_R_→[1,_∞)is defined as V(x):= (sup_zπ(z))1/2π−1/2(x), the set C := [m₋M,m+M], the probability measureµis concentrated on C and PsV(x):=

V(y)Ps(x, dy). Moreover,λs,δs∈(0, 1)

satisfy for all s≥L

max_{(1₋λs)−1,δ−s1} ≤cs

γ ₍₂₄₎

for some constants c,γ >0that may depend on L.

Theorem 21. Assume the adaptation weights(ηn)n≥2satisfy Assumptions 15 and 17, and the template

proposal density˜q and the target distributionπsatisfy the assumptions in Lemma 20. If the functional f satisfiessup_x_∈_Rπ−γ(x)_|f(x)_|<∞for someγ∈(0, 1/2). Then, n−1Pn_k₌₁f(Xk)→

f(x)π(x)dx almost surely as n_{→ ∞}.

Proof. The conditions of 19 are clearly satisfied implying that for anyε >0 there is aκ=κ(ε)>0 such that the event

Bκ:=

inf

n Sn≥κ

has a probabilityP(Bκ)≥1−ε.

The inequalities (22) and (23) of Lemma 20 with the bound (24) imply, using (Saksman and Vihola 2010, Proposition 7 and Lemma 12), that for anyβ >0 there is a constantA=A(κ,ε,β)<∞such that_P(B_κ_{∩ {}max_{|S_n|,_|M_n|}>Anβ_})_≤ε. Let us define the sequence of truncation sets

K_n:=_{(m,s)_∈_R_×_R₊:λmin(s)≥κ, max{|s|,|m|} ≤Anβ}

for n_≥ 1. Construct an auxiliary truncated process (X˜_n, ˜M_n, ˜S_n)_n_≥₁, starting from (X˜₁, ˜M₁, ˜S₁) _≡ (X₁,M₁,S₁)and forn_≥2 through

X_n+1 ∼ Pq˜_Sn˜ (X˜n,·) (_M˜_n₊₁_{, ˜}_S_n₊₁) = σn+1

(_M˜_n_{, ˜}_S_n),ηn+1 Xñ+1−Mñ,(Xñ+1−Mñ)2−Sñ

where the truncation functionσn+1:(Kn)×(R×R)→Knis defined as

σn+1(z,z′) =

(

z+z′, ifz+z′_∈K_n z, otherwise.

Observe that this constrained process coincides with the AM process with probability _P _∀n _≥ 1 : (_X˜_n_{, ˜}_M_n_{, ˜}_S_n) = (Xn,Mn,Sn)

≥ 1− 2ε. Moreover, (Saksman and Vihola 2010, Theorem 2) implies that a strong law of large numbers holds for the truncated process (X˜_n)_n_≥₁, since sup_x_|f(x)_|V−α(x)<∞for someα∈(0, 1₋β), by selectingβ >0 above sufficiently small. Since ε >0 was arbitrary, the strong law of large numbers holds for(Xn)n≥1.

(1)

(a) (ηn+1/ηn)n≥m′is increasing withη_n₊₁/η_n→1and

(b) η−1_n₊/₁2−η−1/2 n →0.

Proof. Define a_n := η−_n1/2 for all n _≥ m′. By Assumption 11 (i) (a_n)_n_≥_m′ is increasing and by Assumption 11 (ii),(∆a_n)_n_≥_m′₊₁ is decreasing, where∆a_n:=a_n−a_n₋₁. One can write

a_n a_n+1

= 1

1+∆an+1

≥ 1

1+ ∆an an₋1

= an−1 a_n

implying that(ηn+1/ηn)n≥m′ is increasing. Denotec=lim_n_→∞η_n₊₁/η_n ≤1. It holds thatη_m′₊_k≤

cη_m′₊_k−₁ ≤ · · · ≤ckη_m′. If c<1, then

nηn<∞contradicting Assumption 11 (iii), socmust be

one, establishing (a). From (a), one obtains

η−1_n₊/₁2−η−1/2 n

η−n1/2 =

ηn

ηn+1

1/2

−1→0 implying (b).

Lemma 33. Suppose m₁ _≥1, g_m₁_≥0, the sequence(ηn)n≥m1 satisfies Assumption 11 andθ >˜ 0is a

constant. The sequence(g_n)_n_>_m

1defined through

g_n+1:=η 1/2 n+1

(1−ηn)3

ηn

gn g_n+η−1n /2

+θ˜2

satisfieslimn→∞gn=θ˜.

Proof. Define the functions f_n:R+→R+forn≥m1+1 by fn+1(x):=η1

/2 n+1

(1₋ηn)3

ηn

x x+η−n1/2

+θ˜2

The functions f_nare contractions on[0,_∞)with contraction coefficientq_n:= (1₋ηn)3since for all x,y ≥0

f_n₊₁(x)− f_n₊₁(y) =η1/2

n+1

(1−ηn)3

ηn

x x+η−1n /2

− y

y+η−1n /2

_η

n+1

ηn

1/2₍₁

−η_n)3

ηn

x−y

(x+η−n1/2)(y+η−n1/2)

≤

_η

n+1

η_n

1/2

(1₋ηn)3

x−y

≤q_n₊₁

x−y

where the second inequality holds sinceηn+1≤ηn.

The fixed point of f_n₊₁can be written as

x_n∗₊₁:=1

−ξn+1+

ξ2_n₊₁+µn+1

(2)

where

ξn+1 := η−n1/2−η 1/2 n+1η−

n (1−ηn)3−η 1/2 n+1θ˜

µn+1 := 4η−n1/2η 1/2 n+1θ˜

2_.

Lemma 32 (a) impliesµn+1→4 ˜θ2. Moreover,

ξn+1=η−1n /2−η 1/2

n+1η−1n +η 1/2

n+1(3−3ηn+η2n−θ˜ 2₎ =

ηn+1

ηn

1/2

η−_n₊1/₁2−η−_n1/2

+η1_n/₊2₁(3₋3ηn+η2n−θ˜ 2₎_.

Therefore, by Assumption 11 (i) and Lemma 32,ξ_n+1→0 and consequently the fixed points satisfy x∗_n_→θ˜.

Consider next the consecutive differences of the fixed points. Using the mean value theorem and the triangle inequality, write

2x∗

n+1−xn∗

≤

ξ_n₊₁−ξ_n

1 2pτn

ξ2

n+1−ξ 2

n+µn+1−µn

≤ξ_n₊₁−ξ_n

τ′_n p_τ

ξ_n₊₁−ξ_n

1 2pτn

µ_n₊₁−µ_n

≤c1

ξ_n₊₁−ξ_n +c₁

µ_n₊₁−µ_n

where the value ofτ_n is betweenξ2

n+1+µn+1 andξ2n+µn converging to 4 ˜θ2 >0, the value ofτ′n

is between_|ξn+1|and|ξn|converging to zero, andc1>0 is a constant.

The differences of the latter terms satisfy for allm≥m′ m

n=m′

µ_n₊₁−µ_n =4 ˜θ2

n=m′

ηn+1

ηn

1/2

−

ηn

ηn−1

1/2

≤4 ˜θ2

1₋

ηm′ ηm′₋₁

1/2

≤4 ˜θ2. by Assumption 11 (ii) and Lemma 32 (a). For the first term, let us estimate

ξ_n₊₁−ξ_n ≤ _η

n+1

η_n

1/2

η−_n₊1/₁2−η−_n1/2

−

_η

η_n−1

1/2

η−_n1/2−η−_n₋1/₁2

+3−θ˜2 η

1/2 n −η

1/2 n+1

+ η

1/2

n+1(3ηn−η2n)−η 1/2

n (3ηn−1−η2n−1)

Assumption 11 (i) implies thatη1_n/2−η1_n/₊2₁ ≥0 for n_≥ m′ and hencePm_n₌_m′

1/2 n −η

1/2 n+1

≤ η

1/2 m′ for anym≥m′. Since the function(x,y)_7→x(3y−y2)is Lipschitz on[0, 1]2, there is a constantc2

independent ofnsuch thatη1/2

n+1(3ηn−η2n)−η 1/2

n (3ηn−1−η2n−1)

≤c₂ |η1/2

n+1−η 1/2

n |+|ηn−ηn−1|

, and a similar argument shows that

n=m′

1/2

n+1(3ηn−η2n)−η 1/2

n (3ηn−1−η2n−1)

(3)

One can also estimate

_η

n+1

η_n

1/2

η−_n₊1/₁2−η−1_n /2

−

_η

η_n−1

1/2

η−1_n /2−η−_n₋1/₁2

≤c4

ηn+1

ηn

1/2

−

ηn

η_n−1

1/2

+c4

η−_n₊1/₁2−η−_n1/2−η_n−1/2−η_n−₋1/₁2

yielding by Assumption 11 (ii) and Lemma 32 that Pm_n₌_m′|ξn+1−ξn| ≤c5 for allm≥ m′, with a

constantc₅<∞. Combining the above estimates, the fixed point differences satisfy

n=m′

|x∗_n₊₁−x_n∗|<∞. Fix aδ >0 and letnδ>m1be sufficiently large so that

P_∞

k=n_δ+1|xn∗+1−x∗n| ≤δimplying also that

|x∗_n₋θ˜| ≤δfor alln_≥n_δ. Then, forn_≥n_δ one may write

g_n−θ˜

≤

g_n−x∗

x∗

n−θ˜

≤

f_n(g_n₋₁)−f_n(x∗

+δ

≤q_ng_n₋₁−x∗

+δ≤q_n

g_n₋₁−x∗

n−1

x∗

n−1−xn∗

+δ

≤q_nq_n−1g_n−2−x∗

n−2

x∗

n−2−x∗n−1

x∗

n−1−xn∗

+δ

≤ · · · ≤



 

k=nδ+1



  gn_δ−x

∗ n_δ

+2δ.

Since logQn_k₌_n

δ+1qk =3

k=n_δ+1log(1−ηk−1)≤ −3

Pn−1

k=n_δηk→ −∞as n→ ∞by Assumption

11 (iii), it holds that(Qn_k₌_n

δ+1qk)|gnδ−x

∗

nδ| →0. That is,|gn−

θ| ≤3δfor any sufficiently large

n, and sinceδ >0 was arbitrary,gn→θ.˜

B

Lemmas for Section 3.3

Proof of Lemma 13. Equation (14) follows directly by writing

uTS_n+1u= (1−ηn+1)uTSnu+ηn+1uT(Xn+1−Mn)(Xn+1−Mn)Tu = [1+ηn+1(Zn2+1−1)]u

T_S nu.

Forn≥2, write using the above equation

Z_n₊₁=θuTS 1/2 n Wn+1

kSn1/2uk

+ (1₋ηn)uT

Xn−Mn−1

kSn1/2uk =θ u

T_S1/2 n

kS1n/2uk

W_n+1+ (1−ηn)

uTSn−1u uT_S

1/2

Z_n

=θ_W˜_n₊₁+ (1−η_n)

1 1+η_n(Z_n2−1)

1/2

Z_n

where (W˜n+1 := kSn1/2uk−

1_uT_S1/2

n Wn+1 are non-degenerate i.i.d. random variables by Lemma 34

(4)

Lemma 34. Assume u _∈ _Rd is a non-zero vector, (W_n)_n_≥₁ are independent random variables fol-lowing a common spherically symmetric non-degenerate distribution in Rd. Assume also that Sn are symmetric and positive definite random matrices taking values inRd×d measurable with respect

Fn:=σ(W1, . . . ,Wn). Then the random variables(W˜n)n≥2defined through

Wn+1:=

uTS1_n/2

kS_n1/2u_kWn+1 are i.i.d. non-degenerate real-valued random variables.

Proof. Choose a measurableA⊂R, denoteT_n:=_kS_n1/2uk−1S_n1/2uand defineA_n:=_{x ∈Rd:T_nTx ∈

A_}. LetR_nbe a rotation matrix such thatRT_nT_n=e₁:= (1, 0, . . . , 0)_∈_Rd. SinceW_n₊₁is independent of_F_n, we have

P(W˜_n₊₁_∈A_{| F}_n) =_P(W_n₊₁_∈A_n_{| F}_n) =_P(R_nW_n₊₁_∈A_n_{| F}_n) =_P(e₁TWn+1∈A| Fn) =P(e1TW1∈A)

by the rotational invariance of the distribution of (W_n)_n_≥₁. Since the common distribution of

(W_n)_n≥₁is non-degenerate, so is the distribution ofe₁TW1.

Remark 35. Notice particularly that if(W_n)_n_≥₂ in Lemma 34 are standard Gaussian vectors in_Rd then(_W˜_n)_n_≥₂are standard Gaussian random variables.

C

The Kolmogorov-Rogozin inequality

Define the concentration functionQ(X;λ)of a random variableX by

Q(X;λ):=sup

x∈RP

(X _∈[x,x+λ])

for allλ≥0.

Theorem 36. Let X₁,X₂, . . .be mutually independent random variables. There is a universal constant c>0such that

Q n

k=1 Xk;L

≤ c L λ

k=1

1−Q(X_k;λ) !−1/2

for all L_≥λ >0.

Proof. Rogozin’s original work Rogozin (1961) uses combinatorial results, and Esseen’s alternative proof Esseen (1966) is based on characteristic functions.

D

A coupling construction

Theorem 37. Suppose µ and ν are probability measures and the random variable X ∼ µ. Then, possibly by augmenting the probability space, there is another random variable Y such that Y _∼ν and

(5)

Proof (adopted from Theorem 3 in (Roberts and Rosenthal 2004)). Define the measure ρ := µ+ν, and the densities g := dµ/dρ andh:= dν/dρ, existing by the Radon-Nikodym theorem. Let us introduce two auxiliary variables U and Z independent of each other and X, whose existence is ensured by possible augmentation of the probability space. Then,Y is defined through

Y =1{U≤r(X)}X+1{U>r(X)}Z

where the ‘coupling probability’r is defined as r(y):=min{1,h(y)/g(y)_}wheneverg(y)>0 and

r(y) := 1 otherwise. The variable U is uniformly distributed on [0, 1]. If r(y) = 1 forρ-almost every y, then the choice ofZis irrelevant,µ=ν, and the claim is trivial. Otherwise, the variableZ

is distributed following the ‘residual measure’ξgiven as

ξ(A):=

Amax{0,h−g}dρ

max_{0,h₋g_}dρ.

Observe that Rmax{0,h− g}dρ = Rmax{0,g−h}dρ > 0 in this case, so ξ is a well defined probability measure.

Let us check thatY ∼ν,

P(Y _∈A) =

rdµ+ξ(A)

(1₋r)dµ

min_{g,h_}dρ+ξ(A)

h<g

(g₋h)dρ

min{g,h}+max{0,h−g}ρ(dx) =ν(A). Moreover, by observing thatr(y) =1 in the support ofξ, one has

P(X =Y) =

rdµ=

min{g,h}dρ=1−

g<h

(h₋g)dρ=1− kν−µk

sinceR_g_<_h(h₋g)dρ=R

h<g(g−h)dρ=supf

f(h₋g)dρ

=kµ−νkwhere the supremum taken

over all measurable functions f taking values in[0, 1].

E

Proof of Lemma 20

Observe that without loss of generality it is sufficient to check the case m= 0 and b= 1, that is, consider the standard Laplace distributionπ(x):= 1

2e −|x|_.

Let x>0 and start by writing 1− PsV(x)

V(x) =

Z x

−x

a(x,y)˜q_s(y−x)dy−

|y|>x

(6)

where

a(x,y) :=



1−

π(x)

π(y)



=1−e−

x−|y|

2 and

b(x,y) :=

π(y)

π(x) 1−

π(y)

π(x)

=e−|y|−2x

1−e−|y|−2x

. Compute then that

Z x

a(x,y)˜q_s(y₋x)dy₋

Z 2x

b(x,y)q˜_s(y₋x)dy=

Z x

1₋e−z22q˜

s(z)dz.

The estimates

Z 0

−x

a(x,y)˜qs(y−x)dy ≥˜qs(2x)

Z x

a(x,y)dy =˜qs(2x)

Z x

(1−e−z2)dz Z −x

−∞

b(x,y)˜q_s(y₋x)dy _≤˜q_s(2x)

Z ∞

b(x,y)dy=˜q_s(2x)

Z ∞

e−z2(1−e

2)dz

due to the non-increasing property of ˜qs yield

Z 0

−x

a(x,y)˜qs(y−x)dy−

Z −x

−∞

b(x,y)˜qs(y−x)dy

≥q˜_s(2x)

Z x

(1₋e−z2)2dz−

Z ∞

e−2zdz

>0 for any sufficiently large x>0. Similarly, one obtains

1 2

Z x

1₋e−z22˜q

s(z)dz−

Z ∞

b(x,y)q_s(y₋x)dy >0 for large enoughx >0.

Summing up, lettingM>0 be sufficiently large, then forx _≥M ands_≥L>0 1₋ PsV(x)

V(x) ≥

1 2

Z x

1₋e−z22q˜_s(z)dz≥

1 2q˜s(M)

Z M

1₋e−z22dz

≥c1s−1/2q(˜ θ−1/2s−1/2M)≥c2s−1/2

for some constants c₁,c₂ >0. The same inequality holds also for₋x _{≤ −}M due to symmetry. The simple bound PsV(x)≤ 2V(x)observed from (32) with the above estimate establishes (22). The

minorisation inequality (23) holds since for all x∈C one may write

P_s(x,A)_≥

A∩C

max

1,π(y) π(x)

q_s(y₋x)dy

≥ infz∈Cπ(z)

sup_zπ(z) s≥L,infz,y∈C˜qs(z−y)

A∩C

dy≥c3s−1/2ν(A).

getdocc8a6. 332KB Jun 04 2011 12:04:58 AM

Can the adaptive Metropolis algorithm collapse without

the covariance lower bound?

∗

Matti Vihola

1

Introduction

2

The general algorithm

3

The unconstrained AM algorithm

3.1

Overview of the results

3.2

Uniform target: expected growth rate

3.3

Uniform target: path-wise behaviour

3.4

Stability with one-dimensional uniformly continuous log-density

B

Lemmas for Section 3.3

C

The Kolmogorov-Rogozin inequality

D

A coupling construction

E

Proof of Lemma 20

Parts

Dokumen yang terkait

AN ALIS IS YU RID IS PUT USAN BE B AS DAL AM P E RKAR A TIND AK P IDA NA P E NY E RTA AN M E L AK U K A N P R AK T IK K E DO K T E RA N YA NG M E N G A K IB ATK AN M ATINYA P AS IE N ( PUT USA N N O MOR: 9 0/PID.B /2011/ PN.MD O)

ANALISIS FAKTOR YANGMEMPENGARUHI FERTILITAS PASANGAN USIA SUBUR DI DESA SEMBORO KECAMATAN SEMBORO KABUPATEN JEMBER TAHUN 2011

EFEKTIVITAS PENDIDIKAN KESEHATAN TENTANG PERTOLONGAN PERTAMA PADA KECELAKAAN (P3K) TERHADAP SIKAP MASYARAKAT DALAM PENANGANAN KORBAN KECELAKAAN LALU LINTAS (Studi Di Wilayah RT 05 RW 04 Kelurahan Sukun Kota Malang)

FAKTOR Â– FAKTOR YANG MEMPENGARUHI PENYERAPAN TENAGA KERJA INDUSTRI PENGOLAHAN BESAR DAN MENENGAH PADA TINGKAT KABUPATEN / KOTA DI JAWA TIMUR TAHUN 2006 - 2011

A DISCOURSE ANALYSIS ON “SPA: REGAIN BALANCE OF YOUR INNER AND OUTER BEAUTY” IN THE JAKARTA POST ON 4 MARCH 2011

Pengaruh kualitas aktiva produktif dan non performing financing terhadap return on asset perbankan syariah (Studi Pada 3 Bank Umum Syariah Tahun 2011 – 2014)

Pengaruh pemahaman fiqh muamalat mahasiswa terhadap keputusan membeli produk fashion palsu (study pada mahasiswa angkatan 2011 & 2012 prodi muamalat fakultas syariah dan hukum UIN Syarif Hidayatullah Jakarta)

Pendidikan Agama Islam Untuk Kelas 3 SD Kelas 3 Suyanto Suyoto 2011

ANALISIS NOTA KESEPAHAMAN ANTARA BANK INDONESIA, POLRI, DAN KEJAKSAAN REPUBLIK INDONESIA TAHUN 2011 SEBAGAI MEKANISME PERCEPATAN PENANGANAN TINDAK PIDANA PERBANKAN KHUSUSNYA BANK INDONESIA SEBAGAI PIHAK PELAPOR

KOORDINASI OTORITAS JASA KEUANGAN (OJK) DENGAN LEMBAGA PENJAMIN SIMPANAN (LPS) DAN BANK INDONESIA (BI) DALAM UPAYA PENANGANAN BANK BERMASALAH BERDASARKAN UNDANG-UNDANG RI NOMOR 21 TAHUN 2011 TENTANG OTORITAS JASA KEUANGAN

Dokumen yang Anda mencari sudah siap untuk unduhkan