$\pi_2(x) = c_{p,q,N}\,(q/p)^x$. It is an interesting problem to study the merging property of the set $\mathcal{Q} = \{Q_1, Q_2\}$. Here, we only make a simple observation concerning the behavior of the sequence $K_i = Q_{i \bmod 2}$. Let $Q = K_{0,2} = Q_1 Q_2$. The graph structure of this kernel is a circle. As an example, below we give the graph structure for $N = 10$.
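[Figure: graph structure of $Q$ for $N = 10$, drawn as a circle; the vertex labels that survive extraction are 2, 4, 6, 8, 10, 1, 3, 5, 7, 9.]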
Edges are drawn between points $x$ and $y$ if $Q(x, y) > 0$. Note that $Q(x, y) > 0$ if and only if $Q(y, x) > 0$, so that all edges can be traversed in both directions (possibly with different probabilities). For the Markov chain driven by $Q$, there is equal probability of going from a point $x$ to any of its neighbors as long as $x \neq 0, N$. Using this fact, one can compute the invariant measure $\pi$ of $Q$ and conclude that $\max_{V_N}\{\pi\} \le (p/q)^2 \min_{V_N}\{\pi\}$. It follows that $(q/p)^2 \le (N+1)\pi(x) \le (p/q)^2$. This and the comparison techniques of [8] show that the sequence $Q_1, Q_2, Q_1, Q_2, \dots$ is merging in relative sup in time of order $N^2$. Compare with the fact that each kernel $K_i$ in the sequence has a mixing time of order $N$.
3 Singular value analysis
3.1 Preliminaries
We say that a measure $\mu$ is positive if $\mu(x) > 0$ for all $x$. Given a positive probability measure $\mu$ on $V$ and a Markov kernel $K$, set $\mu' = \mu K$. If $K$ satisfies
$$\forall\, y \in V, \quad \sum_{x \in V} K(x, y) > 0 \tag{1}$$
then $\mu'$ is also positive. Obviously, any irreducible kernel $K$ satisfies (1). Fix $p \in [1, \infty]$ and consider $K$ as a linear operator
$$K : \ell^p(\mu') \to \ell^p(\mu), \quad Kf(x) = \sum_y K(x, y) f(y). \tag{2}$$
It is important to note, and easy to check, that for any measure $\mu$, the operator $K : \ell^p(\mu') \to \ell^p(\mu)$ is a contraction.
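The contraction property can be checked on a small example. The sketch below is illustrative only: the 3-state kernel `K`, the measure `mu`, the test function `f` and all helper names are assumptions chosen for the demonstration, not objects from the paper. It uses exact rational arithmetic and compares $\|Kf\|^p_{\ell^p(\mu)}$ with $\|f\|^p_{\ell^p(\mu')}$ for a few integer values of $p$.

```python
from fractions import Fraction as F

# Hypothetical 3-state Markov kernel (each row sums to 1) and positive measure.
K = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]
mu = [F(1, 6), F(1, 3), F(1, 2)]


def push_forward(mu, K):
    """Return mu' = mu K, i.e. mu'(y) = sum_x mu(x) K(x, y)."""
    n = len(mu)
    return [sum(mu[x] * K[x][y] for x in range(n)) for y in range(n)]


def apply_kernel(K, f):
    """Return Kf, i.e. Kf(x) = sum_y K(x, y) f(y)."""
    return [sum(kxy * fy for kxy, fy in zip(row, f)) for row in K]


def norm_p_pow(f, nu, p):
    """Return ||f||^p in l^p(nu) for an integer p >= 1 (exact arithmetic)."""
    return sum(abs(fx) ** p * w for fx, w in zip(f, nu))


mu_prime = push_forward(mu, K)
f = [F(3), F(-1), F(2)]          # arbitrary test function on V
Kf = apply_kernel(K, f)

for p in (1, 2, 3, 4):
    lhs = norm_p_pow(Kf, mu, p)        # ||Kf||^p in l^p(mu)
    rhs = norm_p_pow(f, mu_prime, p)   # ||f||^p  in l^p(mu')
    print(f"p={p}: {lhs} <= {rhs} is {lhs <= rhs}")
```

The inequality reflects Jensen's inequality, $|Kf(x)|^p \le K|f|^p(x)$, followed by the identity $\sum_x \mu(x) K|f|^p(x) = \mu'(|f|^p)$.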
Consider a sequence $(K_i)_1^\infty$ of Markov kernels satisfying (1). Fix a positive probability measure $\mu_0$ and set $\mu_n = \mu_0 K_{0,n}$. Observe that $\mu_n > 0$ and set
$$d_p(K_{0,n}(x, \cdot), \mu_n) = \left( \sum_y \left| \frac{K_{0,n}(x, y)}{\mu_n(y)} - 1 \right|^p \mu_n(y) \right)^{1/p}.$$
Note that $2\|K_{0,n}(x, \cdot) - \mu_n\|_{\mathrm{TV}} = d_1(K_{0,n}(x, \cdot), \mu_n)$ and, if $1 \le p \le r \le \infty$, $d_p(K_{0,n}(x, \cdot), \mu_n) \le d_r(K_{0,n}(x, \cdot), \mu_n)$. Further, one easily checks the important fact that $n \mapsto d_p(K_{0,n}(x, \cdot), \mu_n)$ is non-increasing. It follows that we may control the total variation merging of a sequence $(K_i)_1^\infty$ with $K_i$ satisfying (1) by
$$\|K_{0,n}(x, \cdot) - K_{0,n}(y, \cdot)\|_{\mathrm{TV}} \le \max_{x \in V} \{ d_2(K_{0,n}(x, \cdot), \mu_n) \}. \tag{3}$$
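The relations among $d_1$, $d_2$ and total variation stated above can be verified on a toy sequence. In the sketch below the two 3-state kernels `A` and `B`, the alternating schedule and the starting measure `mu0` are assumptions made for illustration. Everything is computed with exact fractions, so the checks involve $d_2^2$ rather than $d_2$.

```python
from fractions import Fraction as F

# Two hypothetical 3-state kernels, both satisfying condition (1);
# the sequence alternates K_1 = A, K_2 = B, K_3 = A, ...
A = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]
B = [[F(0), F(1), F(0)],
     [F(1, 2), F(0), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)]]
mu0 = [F(1, 6), F(1, 3), F(1, 2)]
V = range(3)


def mat_mul(X, Y):
    """Product of two 3x3 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in V) for j in V] for i in V]


def d1(row, nu):
    """d_1(row, nu) = sum_y |row(y)/nu(y) - 1| nu(y)."""
    return sum(abs(row[y] / nu[y] - 1) * nu[y] for y in V)


def d2_sq(row, nu):
    """d_2(row, nu)^2 = sum_y (row(y)/nu(y) - 1)^2 nu(y)."""
    return sum((row[y] / nu[y] - 1) ** 2 * nu[y] for y in V)


def tv(a, b):
    """Total variation distance between two probability vectors."""
    return sum(abs(a[y] - b[y]) for y in V) / 2


K0n = [[F(int(i == j)) for j in V] for i in V]   # K_{0,0} = identity
history = []                                     # one record per n = 1, 2, ...
for n in range(1, 7):
    K0n = mat_mul(K0n, A if n % 2 == 1 else B)   # K_{0,n} = K_1 K_2 ... K_n
    mu_n = [sum(mu0[x] * K0n[x][y] for x in V) for y in V]
    history.append({
        "n": n,
        "tv_to_mu": [tv(K0n[x], mu_n) for x in V],
        "d1": [d1(K0n[x], mu_n) for x in V],
        "d2_sq": [d2_sq(K0n[x], mu_n) for x in V],
        "tv_pairs": max(tv(K0n[x], K0n[y]) for x in V for y in V),
    })

for rec in history:
    # Right-hand side and left-hand side of (3), as floats for readability.
    print(rec["n"], float(max(rec["d2_sq"])) ** 0.5, float(rec["tv_pairs"]))
```

Each printed line shows $n$, the right-hand side of (3) and the largest total variation distance between two rows of $K_{0,n}$.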
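To control relative-sup merging we note that if
$$\max_{x, y, z} \left\{ \left| \frac{K_{0,n}(x, z)}{\mu_n(z)} - 1 \right| \right\} \le \varepsilon \le 1/2 \quad \text{then} \quad \max_{x, y, z} \left\{ \left| \frac{K_{0,n}(x, z)}{K_{0,n}(y, z)} - 1 \right| \right\} \le 4\varepsilon.$$
The last inequality follows from the fact that if $1 - \varepsilon \le a/b,\ c/b \le 1 + \varepsilon$ with $\varepsilon \in (0, 1/2)$ then
$$1 - 2\varepsilon \le \frac{1 - \varepsilon}{1 + \varepsilon} \le \frac{a}{c} \le \frac{1 + \varepsilon}{1 - \varepsilon} \le 1 + 4\varepsilon. \tag{4}$$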
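The two outer bounds in (4) are elementary. The short derivation below is a sketch that spells them out, using only $0 < \varepsilon \le 1/2$ and the identity $a/c = (a/b)/(c/b)$. In the application to the preceding display one would take $a = K_{0,n}(x, z)$, $b = \mu_n(z)$ and $c = K_{0,n}(y, z)$.

```latex
\begin{align*}
\frac{a}{c} = \frac{a/b}{c/b} &\le \frac{1+\varepsilon}{1-\varepsilon}
  = 1 + \frac{2\varepsilon}{1-\varepsilon} \le 1 + 4\varepsilon
  && \text{since } 1-\varepsilon \ge \tfrac12, \\
\frac{a}{c} = \frac{a/b}{c/b} &\ge \frac{1-\varepsilon}{1+\varepsilon}
  = 1 - \frac{2\varepsilon}{1+\varepsilon} \ge 1 - 2\varepsilon
  && \text{since } 1+\varepsilon \ge 1.
\end{align*}
```

Together these give $|a/c - 1| \le 4\varepsilon$.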
3.2 Singular value decomposition
The following material can be developed over the real or complex numbers with little change. Since our operators are Markov operators, we work over the reals. Let H and G be real Hilbert spaces
equipped with inner products $\langle \cdot, \cdot \rangle_H$ and $\langle \cdot, \cdot \rangle_G$ respectively. If $u : H \times G \to \mathbb{R}$ is a bounded bilinear form, by the Riesz representation theorem, there are unique operators $A : H \to G$ and $B : G \to H$ such that
$$u(h, k) = \langle Ah, k \rangle_G = \langle h, Bk \rangle_H. \tag{5}$$
If $A : H \to G$ is given and we set $u(h, k) = \langle Ah, k \rangle_G$ then the unique operator $B : G \to H$ satisfying (5) is called the adjoint of $A$ and is denoted as $B = A^*$. The following classical result can be derived from [20, Theorem 1.9.3].
Theorem 3.1 (Singular value decomposition). Let $H$ and $G$ be two Hilbert spaces of the same dimension, finite or countable. Let $A : H \to G$ be a compact operator. There exist orthonormal bases $(\phi_i)$ of $H$ and $(\psi_i)$ of $G$ and non-negative reals $\sigma_i = \sigma_i(H, G, A)$ such that $A\phi_i = \sigma_i \psi_i$ and $A^*\psi_i = \sigma_i \phi_i$. The non-negative numbers $\sigma_i$ are called the singular values of $A$ and are equal to the square roots of the eigenvalues of the self-adjoint compact operator $A^*A : H \to H$ and also of $AA^* : G \to G$.

One important difference between eigenvalues and singular values is that the singular values depend very much on the Hilbert structures carried by $H$, $G$. For instance, a Markov operator $K$ on a finite set $V$ may have singular values larger than 1 when viewed as an operator from $\ell^2(\nu)$ to $\ell^2(\mu)$ for arbitrary positive probability measures $\nu$, $\mu$ (even with $\nu = \mu$).

We now apply the singular value decomposition above to obtain an expression of the $\ell^2$ distance between $\mu' = \mu K$ and $K(x, \cdot)$ when $K$ is a Markov kernel satisfying (1) and $\mu$ a positive probability
measure on $V$. Consider the operator $K = K_\mu : \ell^2(\mu') \to \ell^2(\mu)$ defined by (2). Then the adjoint $K_\mu^* : \ell^2(\mu) \to \ell^2(\mu')$ has kernel $K_\mu^*(x, y)$ given by
$$K_\mu^*(y, x) = \frac{K(x, y)\, \mu(x)}{\mu'(y)}.$$
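The adjoint formula can be sanity-checked numerically. In the sketch below the kernel `K`, the measure `mu` and the test functions `f` and `g` are illustrative assumptions. It verifies the defining identity $\langle Kf, g \rangle_\mu = \langle f, K_\mu^* g \rangle_{\mu'}$ in exact arithmetic and confirms that $K_\mu^*$ is again a Markov kernel.

```python
from fractions import Fraction as F

# Hypothetical 3-state Markov kernel and positive measure mu.
K = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]
mu = [F(1, 6), F(1, 3), F(1, 2)]
V = range(3)

mu_prime = [sum(mu[x] * K[x][y] for x in V) for y in V]   # mu' = mu K

# Adjoint kernel: K*(y, x) = K(x, y) mu(x) / mu'(y); row index is y.
K_star = [[K[x][y] * mu[x] / mu_prime[y] for x in V] for y in V]


def apply_kernel(M, h):
    """Return Mh, i.e. Mh(a) = sum_b M(a, b) h(b)."""
    return [sum(M[a][b] * h[b] for b in V) for a in V]


def inner(f, g, nu):
    """Inner product <f, g>_nu = sum_x f(x) g(x) nu(x)."""
    return sum(f[x] * g[x] * nu[x] for x in V)


f = [F(3), F(-1), F(2)]    # viewed as an element of l^2(mu')
g = [F(1), F(4), F(-2)]    # viewed as an element of l^2(mu)

lhs = inner(apply_kernel(K, f), g, mu)               # <Kf, g>_mu
rhs = inner(f, apply_kernel(K_star, g), mu_prime)    # <f, K* g>_mu'
print(lhs == rhs)
print([sum(row) for row in K_star])                  # row sums of K*
```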
By Theorem 3.1, there are eigenbases $(\varphi_i)_0^{|V|-1}$ and $(\psi_i)_0^{|V|-1}$ of $\ell^2(\mu')$ and $\ell^2(\mu)$ respectively such that
$$K_\mu \varphi_i = \sigma_i \psi_i \quad \text{and} \quad K_\mu^* \psi_i = \sigma_i \varphi_i,$$
where $\sigma_i = \sigma_i(K, \mu)$, $i = 0, \dots, |V| - 1$, are the singular values of $K$, i.e., the square roots of the eigenvalues of $K_\mu^* K_\mu$ (and $K_\mu K_\mu^*$) given in non-increasing order, i.e. $1 = \sigma_0 \ge \sigma_1 \ge \dots \ge \sigma_{|V|-1}$, and $\psi_0 = \varphi_0 \equiv 1$. From this it follows that, for any $x \in V$,
$$d_2(K(x, \cdot), \mu')^2 = \sum_{i=1}^{|V|-1} |\psi_i(x)|^2 \sigma_i^2. \tag{6}$$
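To see this, write
$$d_2(K(x, \cdot), \mu')^2 = \left\langle \frac{K(x, \cdot)}{\mu'} - 1, \frac{K(x, \cdot)}{\mu'} - 1 \right\rangle_{\mu'} = \left\langle \frac{K(x, \cdot)}{\mu'}, \frac{K(x, \cdot)}{\mu'} \right\rangle_{\mu'} - 1.$$
With $\tilde{\delta}_y = \delta_y / \mu'(y)$, we have $K(x, y)/\mu'(y) = K\tilde{\delta}_y(x)$. Write $\tilde{\delta}_y = \sum_{i=0}^{|V|-1} a_i \varphi_i$ where $a_i = \langle \tilde{\delta}_y, \varphi_i \rangle_{\mu'} = \varphi_i(y)$, so we get that
$$\frac{K(x, y)}{\mu'(y)} = \sum_{i=0}^{|V|-1} \sigma_i \psi_i(x) \varphi_i(y).$$
Using this equality, the orthonormality of the $\varphi_i$ in $\ell^2(\mu')$ and $\sigma_0 \psi_0 \equiv 1$ yields the desired result.

This leads to the main result of this section. In what follows we often write $K$ for $K_\mu$ when the context makes it clear that we are considering $K$ as an operator from $\ell^2(\mu')$ to $\ell^2(\mu)$ for some fixed $\mu$.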
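Formula (6) can be checked by hand when $|V| = 2$, where no eigenvector computation is needed. In that case $\sigma_1^2 = \operatorname{trace}(K_\mu^* K_\mu) - 1$, and since $\psi_0 \equiv 1$ and $\sum_i |\psi_i(x)|^2 = 1/\mu(x)$ for an orthonormal basis of $\ell^2(\mu)$, one has $|\psi_1(x)|^2 = 1/\mu(x) - 1$. The sketch below uses a hypothetical two-state kernel and measure (assumptions for illustration only) and compares both sides of (6) exactly.

```python
from fractions import Fraction as F

# Hypothetical two-state kernel and positive measure mu.
K = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(3, 4)]]
mu = [F(1, 3), F(2, 3)]
V = range(2)

mu_prime = [sum(mu[x] * K[x][y] for x in V) for y in V]           # mu' = mu K
K_star = [[K[x][y] * mu[x] / mu_prime[y] for x in V] for y in V]  # adjoint kernel

# P = K* K acts on l^2(mu'); its eigenvalues are sigma_0^2 = 1 and sigma_1^2.
P = [[sum(K_star[y][x] * K[x][z] for x in V) for z in V] for y in V]
sigma1_sq = P[0][0] + P[1][1] - 1          # trace(P) - sigma_0^2


def d2_sq(x):
    """d_2(K(x, .), mu')^2 computed directly from the definition."""
    return sum((K[x][y] / mu_prime[y] - 1) ** 2 * mu_prime[y] for y in V)


for x in V:
    psi1_sq = 1 / mu[x] - 1                # |psi_1(x)|^2 when |V| = 2
    print(x, d2_sq(x), psi1_sq * sigma1_sq)
```

For larger state spaces the same comparison would require computing the singular vectors, for instance with a numerical linear algebra library.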
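Theorem 3.2. Let $(K_i)_1^\infty$ be a sequence of Markov kernels on a finite set $V$, all satisfying (1). Fix a positive starting measure $\mu_0$ and set $\mu_i = \mu_0 K_{0,i}$. For each $i = 1, 2, \dots$, let $\sigma_j(K_i, \mu_{i-1})$, $j = 0, 1, \dots, |V| - 1$, be the singular values of $K_i : \ell^2(\mu_i) \to \ell^2(\mu_{i-1})$ in non-increasing order. Then
$$\sum_{x \in V} d_2(K_{0,n}(x, \cdot), \mu_n)^2 \mu_0(x) \le \sum_{j=1}^{|V|-1} \prod_{i=1}^{n} \sigma_j(K_i, \mu_{i-1})^2$$
and, for all $x \in V$,
$$d_2(K_{0,n}(x, \cdot), \mu_n)^2 \le \left( \frac{1}{\mu_0(x)} - 1 \right) \prod_{i=1}^{n} \sigma_1(K_i, \mu_{i-1})^2.$$
Moreover, for all $x, y \in V$,
$$\left| \frac{K_{0,n}(x, y)}{\mu_n(y)} - 1 \right| \le \left( \frac{1}{\mu_0(x)} - 1 \right)^{1/2} \left( \frac{1}{\mu_n(y)} - 1 \right)^{1/2} \prod_{i=1}^{n} \sigma_1(K_i, \mu_{i-1}).$$

Proof. Apply the discussion preceding the theorem, in particular (6), with $\mu = \mu_0$, $K = K_{0,n}$ and $\mu' = \mu_n$. Let $(\psi_i)_0^{|V|-1}$ be the orthonormal basis of $\ell^2(\mu_0)$ given by Theorem 3.1 and $\tilde{\delta}_x = \delta_x / \mu_0(x)$. Then $\tilde{\delta}_x = \sum_{i=0}^{|V|-1} \psi_i(x) \psi_i$. This yields
$$\sum_{i=0}^{|V|-1} |\psi_i(x)|^2 = \|\tilde{\delta}_x\|^2_{\ell^2(\mu_0)} = \mu_0(x)^{-1}.$$
Furthermore, Theorem 3.3.4 and Corollary 3.3.10 in [15] give the inequality
$$\forall\, k = 1, \dots, |V| - 1, \quad \sum_{j=1}^{k} \sigma_j(K_{0,n}, \mu_0)^2 \le \sum_{j=1}^{k} \prod_{i=1}^{n} \sigma_j(K_i, \mu_{i-1})^2.$$
Using this with $k = |V| - 1$ in (6) yields the first claimed inequality. The second inequality then follows from the case $k = 1$ and the fact that $\sigma_1(K_{0,n}, \mu_0) \ge \sigma_j(K_{0,n}, \mu_0)$ for all $j = 1, \dots, |V| - 1$. The last inequality follows from writing
$$\left| \frac{K_{0,n}(x, y)}{\mu_n(y)} - 1 \right| \le \sigma_1(K_{0,n}, \mu_0) \sum_{i=1}^{|V|-1} |\psi_i(x) \varphi_i(y)|$$
and bounding $\sum_{i=1}^{|V|-1} |\psi_i(x) \varphi_i(y)|$, by the Cauchy-Schwarz inequality, by
$$\left( \mu_0(x)^{-1} - 1 \right)^{1/2} \left( \mu_n(y)^{-1} - 1 \right)^{1/2}$$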
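The first two inequalities of Theorem 3.2 can be illustrated numerically on a three-state example. The sketch below is not the authors' computation: the kernels `A` and `B`, the alternating schedule and `mu0` are assumptions. The singular values are obtained by a shortcut valid only for $|V| = 3$: the operator $K_i^* K_i$ has eigenvalues $1 \ge \sigma_1^2 \ge \sigma_2^2 \ge 0$, so $\sigma_1^2 + \sigma_2^2$ is its trace minus 1 and $\sigma_1^2 \sigma_2^2$ is its determinant.

```python
import math

# Hypothetical 3-state kernels satisfying (1); K_1 = A, K_2 = B, K_3 = A, ...
A = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 1 / 3, 2 / 3]]
B = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [1 / 3, 1 / 3, 1 / 3]]
mu0 = [1 / 6, 1 / 3, 1 / 2]
V = range(3)


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in V) for j in V] for i in V]


def push_forward(mu, K):
    return [sum(mu[x] * K[x][y] for x in V) for y in V]


def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def singular_values_sq(K, mu_prev):
    """Return (s1^2, s2^2) for K: l^2(mu_prev K) -> l^2(mu_prev), 3 states only."""
    mu_next = push_forward(mu_prev, K)
    # Kernel of K*K: P(x, y) = (1/mu_next(x)) sum_z mu_prev(z) K(z, x) K(z, y).
    P = [[sum(mu_prev[z] * K[z][x] * K[z][y] for z in V) / mu_next[x]
          for y in V] for x in V]
    s = P[0][0] + P[1][1] + P[2][2] - 1.0     # s1^2 + s2^2
    d = det3(P)                               # s1^2 * s2^2 (top eigenvalue is 1)
    root = math.sqrt(max(s * s - 4.0 * d, 0.0))
    return (s + root) / 2.0, (s - root) / 2.0


def d2_sq(row, nu):
    return sum((row[y] / nu[y] - 1.0) ** 2 * nu[y] for y in V)


K0n = [[float(i == j) for j in V] for i in V]
mu_n = list(mu0)
prod1, prod2 = 1.0, 1.0      # running products of s1^2 and s2^2
results = []
for n in range(1, 7):
    K_n = A if n % 2 == 1 else B
    s1_sq, s2_sq = singular_values_sq(K_n, mu_n)   # mu_n still equals mu_{n-1}
    prod1, prod2 = prod1 * s1_sq, prod2 * s2_sq
    K0n = mat_mul(K0n, K_n)
    mu_n = push_forward(mu_n, K_n)
    dist = [d2_sq(K0n[x], mu_n) for x in V]
    averaged = sum(dist[x] * mu0[x] for x in V)
    bounds = [(1.0 / mu0[x] - 1.0) * prod1 for x in V]
    results.append((n, dist, bounds, averaged, prod1 + prod2))
    print(n, max(dist), max(bounds), averaged, prod1 + prod2)
```

Each printed row shows, for a given $n$, the largest $d_2^2$ next to the largest pointwise bound, and the $\mu_0$-averaged $d_2^2$ next to the sum of products of squared singular values.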
Remark 3.3. The singular value $\sigma_1(K_i, \mu_{i-1}) = \sqrt{\beta_1(i)}$ is the square root of the second largest eigenvalue $\beta_1(i)$ of $K_i^* K_i : \ell^2(\mu_i) \to \ell^2(\mu_i)$. The operator $P_i = K_i^* K_i$ has Markov kernel
$$P_i(x, y) = \frac{1}{\mu_i(x)} \sum_{z \in V} \mu_{i-1}(z) K_i(z, x) K_i(z, y) \tag{7}$$
with reversible measure $\mu_i$. Hence
$$1 - \beta_1(i) = \min_{f \not\equiv \mu_i(f)} \left\{ \frac{\mathcal{E}_{P_i, \mu_i}(f, f)}{\mathrm{Var}_{\mu_i}(f)} \right\}$$
with
$$\mathcal{E}_{P_i, \mu_i}(f, f) = \frac{1}{2} \sum_{x, y \in V} |f(x) - f(y)|^2 P_i(x, y) \mu_i(x).$$

The difficulty in applying Theorem 3.2 is that it usually requires some control on the sequence of measures $\mu_i$. Indeed, assume that each $K_i$ is aperiodic irreducible with invariant probability measure $\pi_i$. One natural way to put quantitative hypotheses on the ergodic behavior of the individual steps $(K_i, \pi_i)$ is to consider the Markov kernel
$$\widetilde{P}_i(x, y) = \frac{1}{\pi_i(x)} \sum_{z \in V} \pi_i(z) K_i(z, x) K_i(z, y)$$
which is the kernel of the operator $K_i^* K_i$ when $K_i$ is understood as an operator acting on $\ell^2(\pi_i)$ (note the difficulty of notation coming from the fact that we are using the same notation $K_i$ to denote two operators acting on different Hilbert spaces). For instance, let $\tilde{\beta}_i$ be the second largest eigenvalue of $(\widetilde{P}_i, \pi_i)$. Given the extreme similarity between the definitions of $P_i$ and $\widetilde{P}_i$, one may hope to bound $\beta_i = \beta_1(i)$ using $\tilde{\beta}_i$. This however requires some control of
$$M_i = \max_z \left\{ \frac{\pi_i(z)}{\mu_{i-1}(z)}, \frac{\mu_i(z)}{\pi_i(z)} \right\}.$$
Indeed, by a simple comparison argument (see, e.g., [7; 9; 21]), we have
$$\beta_i \le 1 - M_i^{-2} (1 - \tilde{\beta}_i).$$
One concludes that
$$d_2(K_{0,n}(x, \cdot), \mu_n)^2 \le \left( \frac{1}{\mu_0(x)} - 1 \right) \prod_{i=1}^{n} \left( 1 - M_i^{-2} (1 - \tilde{\beta}_i) \right)$$
and
$$\left| \frac{K_{0,n}(x, y)}{\mu_n(y)} - 1 \right| \le \left( \frac{1}{\mu_0(x)} - 1 \right)^{1/2} \left( \frac{1}{\mu_n(y)} - 1 \right)^{1/2} \prod_{i=1}^{n} \left( 1 - M_i^{-2} (1 - \tilde{\beta}_i) \right)^{1/2}.$$

Remark 3.4. The paper [4] studies certain contraction properties of Markov operators. It contains, in
a more general context, the observation made above that a Markov operator is always a contraction from $\ell^p(\mu K)$ to $\ell^p(\mu)$ and that, in the case of $\ell^2$ spaces, the operator norm $\|K - \mu'\|_{\ell^2(\mu K) \to \ell^2(\mu)}$ is given by the second largest singular value of $K_\mu : \ell^2(\mu K) \to \ell^2(\mu)$, which is also the square root of the second largest eigenvalue of the Markov operator $P$ acting on $\ell^2(\mu K)$, where $P = K_\mu^* K_\mu$, $K_\mu^* : \ell^2(\mu) \to \ell^2(\mu K)$. This yields a slightly less precise version of the last inequality in Theorem 3.2. Namely, writing
$$(K_{0,n} - \mu_n) = (K_1 - \mu_1)(K_2 - \mu_2) \cdots (K_n - \mu_n)$$
and using the contraction property above one gets
$$\|K_{0,n} - \mu_n\|_{\ell^2(\mu_n) \to \ell^2(\mu_0)} \le \prod_{i=1}^{n} \sigma_1(K_i, \mu_{i-1}).$$
As $\|I - \mu_0\|_{\ell^2(\mu_0) \to \ell^\infty(\mu_0)} = \max_x \left( \mu_0(x)^{-1} - 1 \right)^{1/2}$, it follows that
$$\max_{x \in V} d_2(K_{0,n}(x, \cdot), \mu_n) \le \left( \frac{1}{\min_x \{\mu_0(x)\}} - 1 \right)^{1/2} \prod_{i=1}^{n} \sigma_1(K_i, \mu_{i-1}).$$
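The kernel $P_i$ of (7), which is also the operator $P = K_\mu^* K_\mu$ appearing in Remark 3.4, can be built explicitly on a small example. In the sketch below the kernel `K` (playing the role of $K_i$), the measure `mu_prev` (playing the role of $\mu_{i-1}$) and the test function `f` are assumptions for illustration. The code checks that $P_i$ is Markov and reversible with respect to $\mu_i$, and that the Dirichlet form satisfies $\mathcal{E}_{P_i, \mu_i}(f, f) = \|f\|^2_{\ell^2(\mu_i)} - \|K_i f\|^2_{\ell^2(\mu_{i-1})}$, an identity that follows from $P_i = K_i^* K_i$ and reversibility.

```python
from fractions import Fraction as F

# Hypothetical kernel K (role of K_i) and measure mu_prev (role of mu_{i-1}).
K = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]
mu_prev = [F(1, 6), F(1, 3), F(1, 2)]
V = range(3)

mu_next = [sum(mu_prev[x] * K[x][y] for x in V) for y in V]   # mu_i = mu_{i-1} K_i

# Kernel (7): P(x, y) = (1/mu_i(x)) sum_z mu_{i-1}(z) K(z, x) K(z, y).
P = [[sum(mu_prev[z] * K[z][x] * K[z][y] for z in V) / mu_next[x]
      for y in V] for x in V]


def dirichlet(f):
    """E(f, f) = (1/2) sum_{x,y} |f(x) - f(y)|^2 P(x, y) mu_i(x)."""
    return sum((f[x] - f[y]) ** 2 * P[x][y] * mu_next[x]
               for x in V for y in V) / 2


def norm_sq(f, nu):
    """Squared l^2(nu) norm of f."""
    return sum(f[x] ** 2 * nu[x] for x in V)


f = [F(3), F(-1), F(2)]
Kf = [sum(K[x][y] * f[y] for y in V) for x in V]

row_sums = [sum(row) for row in P]
reversible = all(mu_next[x] * P[x][y] == mu_next[y] * P[y][x]
                 for x in V for y in V)
energy = dirichlet(f)
norm_drop = norm_sq(f, mu_next) - norm_sq(Kf, mu_prev)
print(row_sums, reversible, energy, norm_drop)
```

Non-negativity of the Dirichlet form is then another way to see that $K_i : \ell^2(\mu_i) \to \ell^2(\mu_{i-1})$ is a contraction.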
Example 3.5 (Doeblin’s condition). Assume that, for each $i$, there exist $\alpha_i \in (0, 1)$ and a probability measure $\pi_i$ (which does not have to have full support) such that
$$\forall\, i,\ x, y \in V, \quad K_i(x, y) \ge \alpha_i \pi_i(y).$$
This is known as a Doeblin type condition. For any positive probability measure $\mu_0$, the kernel $P_i$ defined at (7) is then bounded below by
$$P_i(x, y) \ge \frac{\alpha_i \pi_i(y)}{\mu_i(x)} \sum_z \mu_{i-1}(z) K_i(z, x) = \alpha_i \pi_i(y).$$
This implies that $\beta_1(i)$, the second largest eigenvalue of $P_i$, is bounded by $\beta_1(i) \le 1 - \alpha_i/2$. Theorem 3.2 then yields
$$d_2(K_{0,n}(x, \cdot), \mu_n) \le \mu_0(x)^{-1/2} \prod_{i=1}^{n} (1 - \alpha_i/2)^{1/2}.$$
Let us observe that the very classical coupling argument usually employed in relation to Doeblin’s condition applies without change in the present context and yields
$$\max_{x, y} \{ \|K_{0,n}(x, \cdot) - K_{0,n}(y, \cdot)\|_{\mathrm{TV}} \} \le \prod_{i=1}^{n} (1 - \alpha_i).$$
See [11] for interesting developments in this direction.
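Both bounds of this example can be tried out on a toy sequence. In the sketch below the kernels `A` and `B`, the alternating schedule and `mu0` are illustrative assumptions. The helper `doeblin_constant` is also an assumption of this sketch: it picks the largest admissible constant, $\alpha = \sum_y \min_x K(x, y)$, attained with $\pi(y) = \min_x K(x, y)/\alpha$. Exact fractions are used, so the $d_2$ bound is checked in squared form.

```python
from fractions import Fraction as F

# Hypothetical 3-state kernels; K_1 = A, K_2 = B, K_3 = A, ...
A = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]
B = [[F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 3), F(1, 3), F(1, 3)]]
mu0 = [F(1, 6), F(1, 3), F(1, 2)]
V = range(3)


def doeblin_constant(K):
    """Largest alpha with K(x, y) >= alpha * pi(y) for some probability pi."""
    return sum(min(K[x][y] for x in V) for y in V)


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in V) for j in V] for i in V]


def tv(a, b):
    """Total variation distance between two probability vectors."""
    return sum(abs(a[y] - b[y]) for y in V) / 2


def d2_sq(row, nu):
    """d_2(row, nu)^2 = sum_y (row(y)/nu(y) - 1)^2 nu(y)."""
    return sum((row[y] / nu[y] - 1) ** 2 * nu[y] for y in V)


K0n = [[F(int(i == j)) for j in V] for i in V]
tv_bound, d2_factor = F(1), F(1)   # prod (1 - alpha_i), prod (1 - alpha_i / 2)
records = []
for n in range(1, 7):
    K_n = A if n % 2 == 1 else B
    alpha = doeblin_constant(K_n)
    tv_bound *= 1 - alpha
    d2_factor *= 1 - alpha / 2
    K0n = mat_mul(K0n, K_n)
    mu_n = [sum(mu0[x] * K0n[x][y] for x in V) for y in V]
    tv_max = max(tv(K0n[x], K0n[y]) for x in V for y in V)
    dist = [d2_sq(K0n[x], mu_n) for x in V]
    records.append((n, tv_max, tv_bound, dist, d2_factor))
    print(n, float(tv_max), float(tv_bound))
```

With these kernels the constants are $\alpha = 1/3$ for `A`, whose associated $\pi$ is a point mass and so does not have full support, and $\alpha = 3/4$ for `B`.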
Example 3.6. On a finite state space $V$, consider a sequence of edge sets $E_i \subset V \times V$. For each $i$, assume that
1. For all $x \in V$, $(x, x) \in E_i$.
2. For all $x, y \in V$, there exist $k = k(i, x, y)$ and a sequence $(x_j)_0^k$ of elements of $V$ such that $x_0 = x$, $x_k = y$ and $(x_j, x_{j+1}) \in E_i$, $j \in \{0, \dots, k - 1\}$.
Consider a sequence $(K_i)_1^\infty$ of Markov kernels on $V$ such that $\forall\, i$, $\forall\, x, y \in V$, K_i(x, y) ≥ ε1