Directory UMM :Journals:Journal_of_mathematics:OTHER:

M
M PP EE JJ
Mathematical Physics Electronic Journal
ISSN 1086-6655
Volume 10, 2004
Paper 3
Received: Feb 17, 2004, Revised: Feb 26, 2004, Accepted: Mar 1, 2004
Editor: R. de la Llave

The Kolmogorov–Arnold–Moser theorem
Dietmar A. Salamon
ETH-Z¨
urich
26 February 2004
Abstract
This paper gives a self contained proof of the perturbation theorem
for invariant tori in Hamiltonian systems by Kolmogorov, Arnold, and
Moser with sharp differentiability hypotheses. The proof follows an idea
outlined by Moser in [16] and, as byproducts, gives rise to uniqueness and
regularity theorems for invariant tori. 1


1

Introduction

KAM theory is concerned with the existence of invariant tori for Hamiltonian
differential equations. More precisely, we consider the following situation. Let
H(x, y) be a smooth real valued function of 2n variables x1 , . . . , xn , y1 , . . . , yn .
We assume that H is of period 1 in each of the variables x1 , . . . , xn and y is
restricted to an open domain G ⊂ Rn so that H is defined on Tn ×G where Tn :=
1 The present paper was written in 1986 while I was a postdoc at ETH Z¨
urich. I didn’t
publish it at the time because the results are well known and the paper is of expository nature. The paper is reproduced here with the following changes: there are a few updates in
the introduction, a mistake in Lemma 3 and the proof of Theorem 2 has been corrected, the
hypotheses of Theorem 2 have been weakened, and Lemma 5 has been moved to Section 3.
The original manuscript can be found on my webpage http://www.math.ethz.ch/ salamon/publications.html.

1

Rn /Zn denotes the n-torus. Consider the Hamiltonian differential equation
y˙ = −Hx ,


x˙ = Hy ,

(1.1)

where Hx ∈ Rn and Hy ∈ Rn denote the vectors of partial derivatives of H with
respect to xν and yν . As a matter of fact, any Hamiltonian vector field on a
symplectic manifold can, in a neighborhood of a Lagrangian invariant torus, be
represented in the form (1.1).
By an invariant torus for (1.1) in parametrized form with prescribed frequencies ω1 , . . . , ωn we mean an embedding x = u(ξ), y = v(ξ) of Tn into
Tn × G such that u(ξ) − ξ and v(ξ) are of period 1 in all variables, u represents
a diffeomorphism of the torus, and the solutions of the differential equation
ξ˙ = ω
are mapped to solutions of (1.1). Equivalently, u and v satisfy the degenerate,
nonlinear partial differential equations
Dv = −Hx (u, v),

Du = Hy (u, v),

(1.2)


where D denotes the first order differential operator
D :=

n
X

ν=1

ων


.
∂ξν

Observe that any such invariant torus consists of quasiperiodic solutions of the
Hamiltonian differential equation (1.1). In fact, the solutions are periodic if all
the frequencies ω1 , . . . , ωn are integer multiples of a fixed number and they lie
dense on the torus if the frequencies are rationally independent.
The simplest case is where the function H = H(y) is independent of x so that

each of the sets y = constant defines an invariant torus which can be represented
by the functions u(ξ) = ξ and v(ξ) = y. The frequency vector is then given
by ω = Hy (y) and will in general depend on the value of y. This situation
corresponds to an integrable Hamiltonian system and must be considered as the
exceptional case. In fact, it has already been shown by Poincar´e that in general
one cannot expect equation (1.2) to have a solution for all values of ω in an
open region, even if the Hamiltonian function H(x, y) is close to an integrable
one. The remarkable discovery of KAM theory was that even though a solution
of (1.2) cannot be expected for rational frequency vectors it does indeed exist
if the frequencies ω1 , . . . , ωn are rationally independent and, moreover, satisfy
the Diophantine inequalities
j ∈ Zn \ {0}

=⇒

|hj, ωi| ≥

1
c0 |j|τ


(1.3)

for some constants c0 > 0 and τ > n − 1. Here |j| denotes the Euclidean norm.
Actually, only the perturbation problem for equation (1.2) is treated in KAM
theory. More precisely, in addition to (1.3) we will have to assume that the
2

Hamiltonian function H(x, y) is sufficiently close to a function F (x, y) for which
a solution is known to exist. After a suitable coordinate transformation this
amounts to the condition
Fx (x, 0) = 0,

Fy (x, 0) = ω,

(1.4)

so that the invariant torus for F can be represented by the functions u(ξ) = ξ
and v(ξ) = 0. Moreover, we assume that the unperturbed Hamiltonian function
satisfies the Legendre type condition
µZ


det
Fyy (x, 0) dx 6= 0.
(1.5)
Tn

Under these assumptions the theorem asserts that the invariant torus with prescribed frequencies ω1 , . . . , ωn survives under sufficiently small perturbations of
the Hamiltonian function and its derivatives.
The perturbation theory described above goes back over forty years to the
work of Kolmogorov [10, 11], Arnold [3, 4], and Moser [13, 14], Actually, the
history of the problem of finding quasiperiodic solutions of Hamiltonian differential equations is much longer. In particular, it was of great interest to Poincar´e
and Weierstrass especially because of its relevance for the stability theory in
celestial mechanics. For a more detailed account of the historical development
the interested reader is referred to [17]. Closely related developments in the
theory of Hamiltonian systems like the theory of Mather and Aubry as well as
connections to various other fields in mathematics are described in [19].
The present paper builds on the ideas developed by Kolmogorov, Arnold,
Moser and also by R¨
ussmann, Zehnder, P¨
oschel and many others. Its purpose

is to summarize some of their work and to present a complete proof of the
above mentioned perturbation theorem for invariant tori. We are particularly
interested in the minimal number of derivatives required for F and H. The
main ideas of this proof were outlined by Moser [16] and the details were later
provided by P¨
oschel [22, 23] in a somewhat different way than it is done here.
Moser’s proof may be roughly sketched as follows. The first step is to prove a
theorem for analytic Hamiltonians which involves quantitative estimates (Theorem 1). For differentiable Hamiltonians the result will then be obtained by
approximating H(x, y) with a sequence of real analytic functions - again including quantitative estimates - and then applying Theorem 1 to each element of
the sequence (Theorem 2).
A beautiful presentation of similar ideas can be found in [15] in connection
with Arnold’s theorem about vector fields on a torus [2]. Moser has also used
these methods for proving his perturbation theorem for minimal foliations for
variational problems on a torus. This result was first stated in [18] and its
proof appeared in [20]. In [9] Jacobowitz has applied the same approach to
Nash’s embedding problem for compact Riemannian manifolds [21] in order to
reduce the number of derivatives to ℓ > 2. Later Zehnder [28] has developed
a general abstract setup for small divisor problems along these lines. Other
versions of such abstract implicit function theorems can be found in the work
of Hamilton [6] and H¨

ormander [8].
3

In connection with his paper [16] Moser produced a set of handwritten notes
in which he sketched the necessary estimates for the proofs of the theorems
presented here. The present paper is to a large extent based on those notes.
Incidentally, also P¨
oschel’s paper [22] was based on Moser’s notes. Both in [16]
and [22] it is essential that the unperturbed Hamiltonian function F is analytic
whereas, changing the proof slightly, one can dispense with the unperturbed
function F and instead assume that H is a C ℓ Hamiltonian function that satisfies (1.5) and is an approximate solution of (1.4) in a suitable sense. Here
ℓ > 2τ + 2 > 2n
and τ is the number appearing in (1.3). The resulting solutions u and v of (1.2)
are of class C m , where m < ℓ − 2τ − 2, and the function v ◦ u −1 , whose graph
is the invariant torus, is of class C m+τ +1 (Theorem 2). These are precisely the
regularity requirements which we need for the solutions of (1.2) to be locally
unique (Theorem 3). Combining these two results we obtain that every solution of (1.2) of class C ℓ+1 for a Hamiltonian function of class C ∞ must itself
be of class C ∞ (Theorem 4). We point out that the result presented here is
close to the optimal one as in [7] Herman gave a counterexample concerning
the nonexistence of an invariant curve for an annulus mapping of class C 3−ε .

Translated to our situation this corresponds to the case n = 2 with ℓ = 4 − ε.
Other counterexamples with less smoothness were given by Takens [27] and
Mather [12].
Before entering into the details of the proof we shall discuss a few fundamental properties of invariant tori. If the frequencies ω1 , . . . , ωn are rationally
independent then it follows from (1.2) that the torus is a Lagrangian submanifold of Tn × G or, equivalently,
uTξ vξ = vξT uξ .

(1.6)

This implies that the embedding of the torus extends to a symplectic embedding
z = φ(ζ) where z = (x, y), ζ = (ξ, η), and
x = u(ξ),

y = v(ξ) + uTξ (ξ)−1 η.

(1.7)

In this notation u and v satisfy equations (1.2) if and only if the transformed
Hamiltonian function K := H ◦ φ satisfies
Kξ (ξ, 0) = 0,


Kη (ξ, 0) = ω

(1.8)

Note that the transformations of the form (1.7) with u and v satisfying (1.6)
form a subgroup of the group of symplectic transformations of Tn × Rn .
It follows from (1.6) that the transformation z = φ(ζ) defined by (1.7) can
be represented in terms of its generating function
S(x, η) := U (x) + hV (x), ηi
where the scalar function U (x) and the vector function V (x) are chosen as to
satisfy
V ◦ u = id,
Ux ◦ u = v,
(1.9)
4

so that z = φ(ζ) if and only if y = Sx and ξ = Sη . Observe that the functions
V (x) − x and Ux (x) are of period 1 in all variables. The invariant torus can now
be represented as the graph of

y = Ux (x)

(1.10)

and the flow on it can be described by the differential equation
x˙ = Hy (x, Ux ).

(1.11)

The requirement that equation (1.10) defines an invariant torus is equivalent to
the Hamilton-Jacobi equation
H(x, Ux ) = constant.

(1.12)

In particular, the Hamiltonian system (1.1) can be understood in terms of the
characteristics of the partial differential equation (1.12). Thus we are trying
to find a solution of the partial differential equation (1.12) such that Ux is of
period 1 and such that there is a diffeomorphism ξ = V (x) of the torus which
transforms the differential equation (1.11) into ξ˙ = ω. The latter condition can
be expressed by the formula
Vx Hy (x, Ux ) = ω.

(1.13)

Equations (1.12) and (1.13) together are equivalent to (1.8) and hence to (1.2).

2

The analytic case

The existence proof of invariant tori for analytic Hamiltonians goes back to
Kolmogorov [10]. It was his idea to solve in each step of the iteration a linearized
version of equations (1.12) and (1.13). The complete argument of the proof was
given by Arnold [3] for the first time and in [16] Moser provided the quantitative
estimates. The latter play an essential role in this paper.
In the course of the iteration we shall need an estimate for the solutions of the
degenerate, linear partial differential equation Df = g. Such an estimate was
given by Arnold [3] and Moser [15] and was later improved by R¨
ussmann [24, 25].
A slightly modified proof of R¨
ussmann’s result can be found in P¨
oschel [22]. For
the convenience of the reader we include a proof here (Lemma 2).
This requires some preparation. We introduce the space Wr of all bounded
real analytic functions w(ξ) in the strip |Im ξ| ≤ r, ξ ∈ Cn , which are of period
1 in all variables. Here we denote by |Im ξ| the Euclidean norm of the vector
Im ξ ∈ Rn . Moreover, we introduce the norms
|w|r :=

sup |w(ξ)|,

|Im ξ|≤r

kwkr := sup

|v|≤r

5

µZ

Tn

|w(u + iv)|2

¶1/2

,

for w ∈ Wr and we denote by Wr0 the subspace of those functions w ∈ Wr
which have mean value zero on Im ξ = 0. Observe that every w ∈ Wr can be
represented by its Fourier series
Z
X
w(ξ) =
wj e2πihj,ξi ,
wj :=
w(u + iv)e−2πihj,ui du e2πhj,vi .
Tn

j∈Zn

Here the formula for wj holds for every v ∈ Rn with |v| ≤ r and w−j = w
¯j .
Note also that w ∈ Wr0 if and only if w0 = 0.
Lemma 1. There exists a constant c = c(n) > 0 such that the following inequalities hold for every w ∈ Wr and every ξ ∈ Cn with |Imξ| ≤ ρ < r ≤ 1.
(i) kwkr ≤ |w|r and |wj | ≤ kwkr e−2π|j|r .
P
(ii) |w(ξ)| ≤ j∈Zn |wj |e−2πhj,Im ξi ≤ c(r − ρ)−n/2 kwkr .
(iii) |wξ |ρ ≤ (r − ρ)−1 |w|r .

Proof. The first inequality in (i) is obvious and the second is a consequence of
the following estimate for v := −r|j|−1 j:
Z
|w(u + iv)| du e2πhj,vi ≤ kwkr e−2π|j|r .
|wj | ≤
Tn

The first inequality in (ii) is again obvious. To establish the second inequality
in (ii) fix a vector ξ ∈ Cn with |Im ξ| ≤ ρ and consider the set J0 ⊂ Zn of those
integer vectors j ∈ Zn that satisfy
2hj, Im ξi < −|j|ρ.
We will use the identity
Z
X
|w(u + iv)|2 du =
|wj |2 e−4πhj,vi
Tn

j∈Zn

and obtain with µ := r/ρ > 1 that
X
X
|wj |e−2πhj,Im ξi ≤
|wj |e−2πhj,µIm ξi e−π|j|(r−ρ)
J0

J0



Ã

X
J0

2 −4πhj,µIm ξi

|wj | e

1/2



!1/2 Ã

X

e

J0

c1
kwkr ,
(r − ρ)n/2

where
c1 = c1 (n) := sup

X

0 n − 1, and c0 > 0 be given.
Then there exists a constant c > 0 such that the following holds for every vector
ω ∈ Rn that satisfies (1.3). If g ∈ Wr0 with 0 < r ≤ 1 then the equation
Df = g
7

has a unique solution f ∈ Wρ0 , ρ < r, and this solution satisfies the inequality
|f |ρ ≤

c
kgkr .
(r − ρ)τ

(2.1)

Proof. Representing the functions f, g ∈ Wr0 by their Fourier series we obtain
that the equation Df = g is equivalent to
fj =

gj
,
2πihj, ωi

0 6= j ∈ Zn .

This proves uniqueness. To establish existence and the inequality (2.1), we
first single out the subset J0 ⊂ Zn of all those vectors j 6= 0 that satisfy
|hj, ωi|−1 ≤ 2c0 and define
X
f 0 (ξ) :=
fj e2πihj,ξi .
j∈J0

Then the following inequality holds for |Im ξ| ≤ ρ < r:
|f 0 (ξ)| ≤



1 X |gj | −2πhj,Im ξi
e

|hj, ωi|
j∈J0
c0 X
|gj |e−2πhj,Im ξi
π
j∈J0

c0 c1
kgkr .
(r − ρ)τ

Here we have used the inequality τ > n − 1 ≥ n/2 and chosen c 1 = c1 (n) > 0
so that c := c1 π is the constant of Lemma 1 (ii).
The more delicate part of the estimate concerns the integer vectors in Zn \ J0
and is based on the observation that only a few of the divisors hj, ωi are actually small. This fact was used e.g. by Siegel [26], Arnold [3], Moser [15],

ussmann [25], and P¨
oschel [22].
Fix a number K ≥ 1 and, for ν = 1, 2, 3, . . . , denote by J(ν, K) the set of
all integer vectors j ∈ Zn that satisfy the inequality
1
1
≤ |hj, ωi| < ν ,
2ν+1 c0
2 c0

0 < |j| ≤ K.

In order to estimate the number of points in J(ν, K) we assume without loss of
generality that |ων | ≤ |ωn | for all ν and define ¯j := (j1 , . . . , jn−1 ) ∈ Zn−1 for
j ∈ Zn . Fixing ¯j 6= 0 and choosing
√ jn so as to minimize |hj, ωi|√we then obtain
|jn | ≤ |j1 | + · · · + |jn−1 | + 1 ≤ 2 n − 1|¯j| which implies |j| ≤ 2 n|¯j|. Therefore
it follows from (1.3) that
|hj, ωi| ≥

1
√ ¯ τ
c0 (2 n|j|)

8

for every j ∈ Zn with ¯j 6= 0. It follows also from (1.3) that |ωn | ≥ 1/c0 and
therefore any two integer vectors j, j ′ ∈ J(ν, K) with j 6= j ′ must satisfy ¯j 6= ¯j ′ .
Hence, by what we have just observed,



|¯j − ¯j ′ |−τ ≤ c0 (2 n)τ |hj − j ′ , ωi| ≤ 2(2 n)τ 2−ν ≤ (4 n)τ 2−ν .
Thus the distance

2ν/τ
|¯j − ¯j ′ | ≥ √
4 n

gets very large for large ν. This shows that the number of points in J(ν, K) can
be estimated by
#J(ν, K) ≤ c2 K n−1 2−ν(n−1)/τ

for some constant c2 = c2 (n) > 0. Moreover, we obtain from (1.3) that
J(ν, K) = ∅ for 2ν/τ ≥ K. Denote by J(K) the set of all integer vectors j ∈ Zn
that satisfy
1
> 2c0 ,
0 < |j| ≤ K.
|hj, ωi|
Then J(K) is the union of the sets J(ν, K) for ν = 1, 2, 3, . . . and we conclude
that
X
X
1
≤ 2c0 c2 K n−1
2ν(τ +1−n)/τ ≤ c0 c3 K τ
|hj, ωi|
ν/τ
j∈J(K)

2

≤K

for some constant c3 = c3 (n, τ ) > 0. It is interesting to note that the size of
this sum is of the same order as the size of the largest term in it.
Using Lemma 1 (i) we can now continue estimating |f |ρ :
|f − f 0 |ρ



1 X |gj | 2π|j|ρ
e

|hj, ωi|
j6=J0



1
kgkr X
e−2π|j|(r−ρ)

|hj, ωi|

=

kgkr


=

j6=J0

X

X

k=1

j6=J0
|j|2 =k


kgkr X


X


1
e−2π k(r−ρ)
|hj, ωi|


k=1 j∈J( k)




´

1 ³ −2π√k(r−ρ)
− e−2π k+1(r−ρ)
e
|hj, ωi|


c0 c3 kgkr X τ /2 r − ρ −2π√k(r−ρ)
√ e
k
2
k
k=1


X
√ τ λ −2πλ√k
c0 c3 kgkr
k) √ e
.
sup

(r − ρ)τ 0 0, we have
³√




√ ´
λ
e−2λ k − e−2λ k+1 ≤ 2λ
k + 1 − k e−2λ k ≤ √ e−2λ k .
k
9

This proves the lemma.
Theorem 1 (Kolmogorov, Arnold, Moser). Let n ≥ 2, τ > n − 1, c0 > 0,
0 < θ < 1, and M ≥ 1 be given. Then there are positive constants δ ∗ and c such
that cδ ∗ ≤ 1/2 and the following holds for every 0 < r ≤ 1 and every ω ∈ R n
that satisfies (1.3).
Suppose H(x, y) is a real analytic Hamiltonian function defined in the strip
|Im x| ≤ r, |y| ≤ r, which is of period 1 in the variables x1 , . . . , xn and satisfies
¯
¯
Z
¯
¯
¯H(x, 0) −
H(ξ, 0) dξ ¯¯ ≤ δr2τ +2 ,
¯
Tn

|Hy (x, 0) − ω| ≤ δr τ +1 ,

,
|Hyy (x, y) − Q(x, y)| ≤
2M

(2.2)

for |Im x| ≤ r and |y| ≤ r, where 0 < δ ≤ δ ∗ , and Q(x, y) ∈ Cn×n is a symmetric
(not necessarily analytic) matrix valued function in the strip |Im x| ≤ r, |y| ≤ r
and satisfies in this domain
¯µZ
¶−1 ¯¯
¯
¯
¯
|Q(z)| ≤ M,
Q(x, 0) dx
(2.3)
¯
¯ ≤ M.
¯ Tn
¯
Then there exists a real analytic symplectic transformation z = φ(ζ) of the
form (1.7) mapping the strip |Im ξ| ≤ θr, |η| ≤ θr into |Im x| ≤ r, |y| ≤ r,
such that u(ξ) − ξ and v(ξ) are of period 1 in all variables and the Hamiltonian
function K := H ◦ φ satisfies (1.8). Moreover, φ and K satisfy the estimates
|φ(ζ) − ζ| ≤ cδ(1 − θ)r,

|Kηη (ζ) − Q(ζ)| ≤
,
M
¯
¯
¯v ◦ u−1 (x)¯ ≤ cδrτ +1 ,

|φζ (ζ) − 1l| ≤ cδ,
(2.4)

for |Im ξ| ≤ θr, |η| ≤ θr, and |Im x| ≤ θr.

Proof. We will construct inductively a sequence of real analytic Hamiltonian
functions H ν (x, y) in the strips |Im x| ≤ rν , |y| ≤ rν , where

µ
1+θ 1−θ
r0 = r,
+ ν+1 r,
rν :=
2
2
such that H 0 := H and H ν+1 := H ν ◦ ψ ν . The symplectic transformation
z = ψ ν (ζ) will be real analytic, will map the strip |Im ξ| ≤ θrν+1 , |η| ≤ θrν+1
into |Im x| ≤ rν , |y| ≤ rν , and it will be represented in terms of its generating
function
S ν (x, η) = U ν (x) + hV ν (x), ηi.
Following Kolmogorov [10] we will choose the real analytic functions
U ν (x) = hα, xi + a(x),
10

V ν (x) = x + b(x)

in the strip |Im x| ≤ rν such that a(x) and b(x) are of period 1 in all variables,
have mean value zero over the n-torus, and satisfy the following equations for
|Im x| ≤ rν :
Z
H ν (ξ, 0) dξ − H ν (x, 0),
Da(x) =
Tn

Z µ
(2.5)
ν
ν
Hy (ξ, 0) + Hyy (ξ, 0)(α + ax (ξ)) dξ = ω,
Tn

ν
Db(x) = ω − Hyν (ξ, 0) − Hyy
(ξ, 0)(α + ax (ξ)).

Note that the second equation in (2.5) determines α and that it is necessary in
order for the right hand side of the third equation to have mean value zero so
that Lemma 2 can be applied. Moreover, observe that (2.5) can be obtained
from (1.12) and (1.13) by linearizing these equations around V (x) = x and
U (x) = 0 if we replace Da by hax , Hyν (x, 0)i and Db by bx Hyν (x, 0).
Let us now define the error at the νth step of the iteration to be the smallest
number εν > 0 that satisfies the inequalities
¯
¯
Z
¯ ν
¯
¯ ν
¯
ν
¯H (x, 0) −
¯ ≤ εν ,
¯Hy (x, 0) − ω ¯ (rν − rν+1 )τ +1 ≤ εν
H
(ξ,
0)

¯
¯
n
T

for |Im x| ≤ rν . We will then show that εν converges to zero according to the
quadratic estimate
c3
εν+1 ≤
ε2 .
(2.6)
(rν − rν+1 )2τ +2 ν
Here c3 > 0 is a suitable constant independent of r. It is important to notice
that the geometrically growing factor in front of ε2ν in (2.6) is dominated by the
quadratic convergence of εν . In fact, one checks easily that (2.6) implies
εν ≤ δν r2τ +2
where the sequence δν is defined by the recursive law
δν+1 := c2ν(2τ +3) δν2 ,
γν := c2(ν+1)(2τ +3) δν ,

δ0 := δ ≤ δ ∗ ,

γ0 := 22τ +3 cδ,

(2.7)

and the constant c > 0 is related to c3 as in equation (2.10) below.2 The
sequence γν will then satisfy
γν+1 = γν2
(2.8)
and therefore γν converges to zero whenever γ0 < 1. For the sequence H ν this
will lead to the estimates
¯
¯
Z
¯
¯ ν
ν
¯H (x, 0) −
H (ξ, 0) dξ ¯¯ ≤ δν r2τ +2 ,
¯
Tn
¯ ν
¯
¯Hy (x, 0) − ω ¯ (rν − rν+1 )τ +1 ≤ δν r2τ +2 ,
(2.9)
¯ ν
¯
¯Hyy (x, y) − Qν (x, y)¯ ≤ 2−ν cδ
2M

2 A small flaw in the formula (2.7) is the factor 2τ + 3 where one would like to have 2τ + 2.
The number 2τ + 3 is only needed in Step 3 for estimating the y-component of ψ ν − id.

11

for |Im x| ≤ rν and |y| ≤ rν , where
Q0 := Q,

ν−1
Qν := Hyy

for ν ≥ 1. In this discussion the constants are chosen explicitly as follows. Let
c1 = c1 (n, τ, c0 ) be the constant of Lemma 2 and define c2 , c3 , and c by
¢2
¡
c2 := 12M 1 + c1 8τ +1 ,
3

c3 :=

4M c22

+ c2 ,

c :=

µ

4
1−θ

¶2τ +3

c3 . (2.10)

Next we choose δ ∗ > 0 so small that
cδ ∗ ≤ 2−2τ −4 .

(2.11)

This implies γ0 ≤ 1/2 and therefore γν+1 ≤ γν /2 for ν ≥ 1. Finally, define
Mν :=
and note that

Mν−1
,
1 − 2−ν cδ ∗
1−ν

Mν ≤ Mν−1 e2

cδ ∗

M−1 := M,

(2.12)



≤ M e4cδ ≤ 2M

(2.13)

for ν ≥ 0. Here the last inequality follows from (2.11).

With the constants in place, we are now ready to prove by induction that
H ν satisfies (2.9). First observe that, by assumption, the inequalities (2.9) are
satisfied for ν = 0 provided that δ ≤ δ ∗ . Fix an integer ν ≥ 0 and assume, by
induction, that the real analytic functions H µ (x, y) on the domains |Im x| ≤ rµ ,
|y| ≤ rµ , have been constructed for µ = 1, . . . , ν such that (2.9) is satisfied
with ν replaced by µ. Then we obtain from (2.3), (2.9), (2.12), and (2.13) by
induction that
¯µZ
¶−1 ¯¯
¯
¯
¯
ν
|Hyy
(x, y)| ≤ Mν ,
H ν (x, 0) dx
(2.14)
¯
¯ ≤ Mν
¯ Tn yy
¯
for |Im x| ≤ rν and |y| ≤ rν . We shall now construct, in four steps, a real
analytic Hamiltonian function H ν+1 = H ν ◦ ψ ν and show that it satisfies (2.9)
with ν replaced by ν + 1.

Step 1. If H ν (x, y) satisfies (2.14) then
2

|H ν (x, y) − H ν (x, 0) − hH ν (x, 0), yi| ≤ M |y| ,
¯ ν
¯
¯Hy (x, y) − Hyν (x, 0)¯ ≤ 2M |y| ,
¯ ν
¯ 2M |y|2
ν
¯Hy (x, y) − Hyν (x, 0) − Hyy
(x, 0)y ¯ ≤
rν − |y|

for |Im x| ≤ rν and |y| < rν .

12

These estimates follow from the fact that Mν ≤ 2M and from the identities
Z 1Z t
ν
ν
ν
ν
H (x, y) − H (x, 0) − hHy (x, 0), yi =
hy, Hyy
(x, sy)yi dsdt,
0

H ν (x, y) − H ν (x, 0) =
ν
Hyν (x, y) − Hyν (x, 0) − Hyy
(x, 0)y =

=

Z

0

1

ν
Hyy
(x, ty)y dt,

0

Z

1

0

Z

0

1

¡ ν
¢
ν
Hyy (x, ty)y − Hyy
(x, 0)y dsdt
1
2πi

Z

Γ

1
H ν (x, λty)y dλdt,
λ(λ − 1) yy

with Γ := {λ ∈ C | |λ| = rν /|y| > 1} . This proves Step 1.
Now it follows from Lemma 2 that there exist unique solutions a(x), α, b(x)
of equation (2.5) in the strip |Im x| < rν such that a(x) and b(x) are of period
1 in all variables and have mean value zero over the torus.
Step 2. The solutions a(x), α, b(x) of (2.5) satisfy the estimates
|a(x)| ≤

c2 ε ν
,
(rν − rν+1 )τ

|b(x)| ≤

c2 ε ν
,
(rν − rν+1 )2τ +1

c2 ε ν
,
(rν − rν+1 )τ +1

(2.15)

c2 ε ν
,
(rν − rν+1 )2τ +2

(2.16)

|α + ax (x)| ≤
|bx (x)| ≤

for |Im x| ≤ (rν + rν+1 )/2.

Define ρj := ((8 − j)rν + jrν+1 ) /8 for 0 ≤ j ≤ 4 so that ρ0 = rν and ρ4 =
(rν + rν+1 )/2. Then it follows from Lemma 2 that
¯
¶τ ¯
µ
Z
¯ ν
¯
8
c1 8 τ ε ν
ν
¯
¯ ≤
H
(·,
0)

H
(ξ,
0)

|a|ρ1 ≤ c1
¯
¯
rν − rν+1
(rν − rν+1 )τ
Tn
ρ0
and hence, by Lemma 1 (iii),
|ax |ρ2 ≤

8
c1 8τ +1 εν
|a|ρ1 ≤
.
rν − rν+1
(rν − rν+1 )τ +1

This estimate, together with (2.5) and (2.14), implies
³¯
¯ ´
¯ ν
¯
(·, 0)ax ¯ρ
|α| ≤ Mν ¯ω − Hyν (·, 0)¯ρ + ¯Hyy
2
2
¢
¡
εν
τ +1
≤ M ν 1 + M ν c1 8
(rν − rν+1 )τ +1
¢
¡
εν
.
≤ 4M 2 1 + c1 8τ +1
(rν − rν+1 )τ +1

Here the second inequality follows from the definition of εν and the last from
equation (2.13). Thus we have proved that
¢
¡
εν
|α + ax |ρ2 ≤ 5M 2 1 + c1 8τ +1
.
(rν − rν+1 )τ +1
13

Combining this with (2.10) gives (2.15). Furthermore, we obtain from (2.5) and
Lemma 2 that
¶τ
µ
¯
¯
8
ν
¯ω − Hyν (·, 0) − Hyy
(·, 0)(α + ax )¯ρ
|b|ρ3 ≤ c1
2
rν − rν+1

¶τ µ
µ
¢
¡
εν
8
1 + 10M 3 1 + c1 8τ +1
≤ c1
rν − rν+1
(rν − rν+1 )τ +1
¢2
12M 3 ¡
εν

1 + c1 8τ +1
.
8
(rν − rν+1 )2τ +1

Hence, by Lemma 1 (iii), we have
|bx |ρ4 ≤

¢2
¡
8
εν
|b|ρ3 ≤ 12M 3 1 + c1 8τ +1
.
rν − rν+1
(rν − rν+1 )2τ +2

Combining this with (2.10) gives (2.16). This proves Step 2.

Having constructed the functions U (x) := U ν (x) := hα, xi + a(x) and
V (x) := V ν (x) := x + b(x) we can define the symplectic transformation z =
ψ ν (ζ) by (1.7) and (1.9). Thus
z = ψ ν (x)

⇐⇒

ξ = x + b(x),

y = α + ax (x) + η + bx (x)T η.

That this map is well defined will be established in the proof of Step 3.
Step 3. The transformation z = ψ ν (ζ) maps the strip |Im ξ| ≤ rν+1 , |η| ≤ rν+1
into |Im x| ≤ (rν + rν+1 )/2, |y| ≤ (rν + rν+1 )/2 and satisfies the estimates
|ψ ν (ζ) − ζ| ≤ 2−ν cδ
and

rν − rν+1
,
4

¯ ν
¯
¯ψζ (ζ) − 1l¯ ≤ 2−ν cδ,

(2.17)

(2.18)

for |Im ξ| ≤ rν+1 and |η| ≤ rν+1 .

First note that the induction hypothesis (2.9) implies εν ≤ δν r2τ +2 and hence
c2 ε ν
≤ c2
(rν − rν+1 )2τ +2

µ

2ν+2
1−θ

¶2τ +2

δν

µ
¶2τ +2
c3
4
=
2ν(2τ +2) δν
4M c2 + 1 1 − θ
1−θ
γν
= 2τ +3
2
(4M c2 + 1) 2ν+2
2−ν cδ 1 − θ

.
16M 2 2ν+2

(2.19)

Here the second equality follows from the definition of c3 , the third equality
from the definition of c and γν , and the last inequality follows from the fact
14

that γν ≤ 2−ν γ0 = 2−ν 22τ +3 cδ and c2 ≥ 4M . Combining (2.19) with Step 2 we
obtain the following estimates for |Im x| ≤ (rν + rν+1 )/2 and ξ := V ν (x):
2−ν cδ
c2 ε ν

(rν − rν+1 ) ,

+1
(rν − rν+1 )
16M 2
c2 ε ν
2−ν cδ 1 − θ
|bx (x)| ≤

.

+2
(rν − rν+1 )
16M 2 2ν+2

|x − ξ| = |b(x)| ≤

(2.20)

This implies that V ν = id + b has an inverse u := (V ν )−1 which maps the
strip |Im ξ| ≤ (rν + 3rν+1 )/4 into |Im x| ≤ (rν + rν+1 )/2. To see this fix a
vector ξ ∈ Cn with |Im ξ| ≤ (rν + 3rν+1 )/4 and apply the contraction mapping
principle to the map x 7→ ξ − b(x) on the domain |Im x| ≤ (rν + rν+1 )/2.
Now let ξ, η ∈ Cn such that |Im ξ| ≤ (rν + 3rν+1 )/4 and |η| ≤ (rν + 3rν+1 )/4,
let x ∈ Cn be the unique vector that satisfies |Im x| ≤ (rν + rν+1 )/2 and
x + b(x) = ξ, and define y ∈ Cn by
y := α + ax (x) + η + bx (x)T η.
Then z = ψ ν (ζ) and it follows again from Step 2 that
¯
¯
|y − η| ≤ |α + ax | + ¯bTx η ¯
rν + 3rν+1
c2 ε ν
c2 ε ν
+

τ
+1

+2
(rν − rν+1 )
(rν − rν+1 )
4
c2 ε ν r

(rν − rν+1 )2τ +2
2−ν cδ

(rν − rν+1 )
16M 2

(2.21)

τ

Here the third inequality uses the fact that (rν − rν+1 ) ≤ 1/4. The last inequality follows from (2.19) and the fact that rν − rν+1 = 2−ν−2 (1 − θ)r. It
follows from (2.21) that |y| ≤ (rν + rν+1 )/2. The inequalities (2.20) and (2.21)
together show that ψ ν satisfies (2.17) in the domain |Im ξ| ≤ (rν + 3rν+1 )/4,
|η| ≤ (rν + 3rν+1 )/4. The estimate (2.18) follows from (2.17) and Cauchy’s
integral formula (Lemma 1 (iii)). This proves Step 3.
In view of Step 3 we can define
H ν+1 := H ν ◦ ψ ν .
The next step establishes (2.6) along with the required estimates for H ν+1 .
Step 4. The inequalities in (2.9) are satisfied with ν replaced by ν + 1.
We will first establish (2.6). For this define the real number h by
Z
h :=
H ν (ξ, 0) dξ + hω, αi.
Tn

15

and denote z := (x, α + ax ) := ψ ν (ξ, 0), where |Im ξ| ≤ rν+1 . Then it follows
from Step 3 that |Im x| ≤ (rν + rν+1 ) /2 and |y| = |α + ax | ≤ (rν + rν+1 ) /2.
Moreover, it follows from (2.5) that
H ν+1 (ξ, 0) − h

=
=
=

H ν (x, α + ax ) − h

H ν (x, α + ax ) − H ν (x, 0) − Da − hω, αi
H ν (x, α + ax ) − H ν (x, 0) − hHyν (x, 0), α + ax i
+ hHyν (x, 0) − ω, α + ax i.

By Step 1 and Step 2, this implies
¯ ν+1
¯
¯
¯
2
¯H
(ξ, 0) − h¯ ≤ M |α + ax | + ¯Hyν (x, 0) − ω ¯ |α + ax |




M c22 + c2
ε2
(rν − rν+1 )2τ +2 ν
c3 /2
ε2 ,
(rν − rν+1 )2τ +2 ν

and hence
¯
Z
¯ ν+1
¯H
(ξ,
0)

¯

T2

¯
¯
H ν+1 (ξ, 0) dξ ¯¯ ≤

c3
ε2 .
(rν − rν+1 )2τ +2 ν

(2.22)

Secondly, it follows from (2.5) that
Hyν+1 (ξ, 0) − ω

=
=

(1l + bx ) Hyν (x, α + ax ) − ω

ν
Hyν (x, α + ax ) − Hyν (x, 0) − Hyy
(x, 0)(α + ax )
¢
¡ ν
ν
+ bx Hy (x, α + ax ) − Hy (x, 0)
¡
¢
+ bx Hyν (x, 0) − ω .

By Step 1 and Step 2, this implies
¯
¯ ν+1
¯H
(ξ, 0) − ω ¯ ≤
y



=

2

¯
¯
2M |α + ax |
+ 2M |bx | |α + ax | + |bx | ¯Hyν (x, 0) − ω ¯
rν − |α + ax |
4M c22 + c2
ε2
(rν − rν+1 )3τ +3 ν
c3
1
ε2 .
(rν+1 − rν+2 )τ +1 (rν − rν+1 )2τ +2 ν

In the second inequality we have used the fact that |α + ax | ≤ (rν − rν+1 )/2,
by (2.21), and hence (rν − rν+1 )τ +1 ≤ (rν − rν+1 )/2 ≤ rν − |α + ax |. The
estimate (2.6) now follows by combining the last inequality with (2.22).
Combining (2.6) with (2.7) and the induction hypothesis εν ≤ δν r2τ +2 we
obtain
µ ν+2 ¶2τ +2
c3 δν2 r4τ +4
2
εν+1 ≤
= c3
δν2 r2τ +2 ≤ δν+1 r2τ +2 .
(rν − rν+1 )2τ +2
1−θ
16

This implies the first two inequalities in (2.9) with ν replaced by ν + 1. To
prove the last inequality, suppose that |Im ξ| ≤ rν+1 , |η| ≤ rν+1 and denote
z := ψ ν (ζ). Then Step 3 shows that |Im x| ≤ (rν + rν+1 ) /2, |y| ≤ (rν + rν+1 ) /2
and, moreover, it follows from the definition of H ν+1 that
¢
¡
ν+1
ν
Hyy
(ξ, η) = (1l + bx )Hyy
(x, y) 1l + bTx .
Hence, by Step 2, we obtain
¯ ν+1
¯
ν
¯Hyy (ζ) − Hyy
(z)¯

¯ ν
¯
¯
2¯ ν
≤ 2 |bx | ¯Hyy
(z)¯ + |bx | ¯Hyy
(z)¯
¯ ν
¯
≤ 3 |bx | ¯Hyy
(z)¯


2−ν cδ
6M c2 εν

.

+1
(rν − rν+1 )
4M

The last inequality follows from (2.19). Moreover,
Z
1
1
ν
ν
Hyy
(z) − Hyy
(ζ) =
H ν (ζ + λ(z − ζ)) dλ,
2πi Γ λ(λ − 1) yy
where

½
½
¾
¾
rν − |Im ξ| rν − |η|
Γ := λ ∈ C | |λ| = min
,
>1 .
|x − ξ|
|y − η|

This implies

¯ ν
¯
¯H (z) − H ν (ζ)¯
yy
yy

½

|y − η|
|x − ξ|
,
rν − |Im ξ| − |x − ξ| rν − |η| − |y − η|
2−ν cδ
2−ν cδ rν − rν+1
=
.
≤ 2M
2
16M (rν − rν+1 )/2
4M
≤ 2M max

¾

Here the second inequality follows from (2.20) and (2.21) and the fact that
|x − ξ| ≤ (rν − rν+1 )/2 and |y − η| ≤ (rν − rν+1 )/2. This proves Step 4.

Step 4 completes the induction step and it remains to establish the uniform
convergence of the sequence
φν := ψ 0 ◦ ψ 1 ◦ · · · ◦ ψ ν

in the domain |Im ξ| ≤ θr, |η| ≤ θr along with the estimates in (2.4). First it
follows from Step 3 that, if |Im ξ| ≤ rν and |η| ≤ rν and z := ψ µ+1 ◦· · ·◦ψ ν−1 (ζ)
then |Im x| ≤ rµ+1 and |y| ≤ rµ+1 and therefore, by (2.18),
¯
¯
¯ µ µ+1
¯
◦ · · · ◦ ψ ν−1 (ζ))¯ ≤ 1 + 2−µ cδ.
¯ψζ (ψ
This implies

¯
¯
¯ ν−1 ¯
¯φζ (ζ)¯ ≤ (1 + cδ) · · · (1 + 21−ν cδ) ≤ e2cδ ≤ 2
17

for |Im ξ| ≤ rν and |η| ≤ rν and hence
¯ ν
¯ ¯
¯
¯φ (ζ) − φν−1 (ζ)¯ = ¯φν−1 (ψ ν (ζ)) − φν−1 (ζ)¯ ≤ 2 |ψ ν (ζ) − ζ| ≤ cδ(rν − rν+1 )

for |Im ξ| ≤ rν+1 and |η| ≤ rν+1 and ν ≥ 1. Here the last inequality follows
from (2.17). The same inequality holds for ν = 0 if we define φ−1 := id. We
conclude that the limit function
φ := lim φν
ν→∞

satisfies the estimate
|φ(ζ) − ζ| ≤ cδ


X

(rν − rν+1 ) = cδ

ν=0

1−θ
r
2

for |Im ξ| ≤ r(1 + θ)/2 and |η| ≤ r(1 + θ)/2. This proves the first inequality
in (2.4) and the second follows from Lemma 1 (iii). Since cδ ≤ 1/2 this second
estimate also shows that φ is a diffeomorphism and, as a limit of symplectomorphisms of the form (1.7), it is itself a symplectomorphism of this form.
The transformed Hamiltonian function can be expressed as the limit
K(ζ) := H ◦ φ(ζ) = lim H ◦ ψ 0 ◦ · · · ◦ ψ ν (ζ) = lim H ν (ζ)
ν→∞

ν→∞

for |Im ξ| ≤ θr and |η| ≤ θr. Since the sequence
¶τ +1
µ
δν
δν
4
=
(rν − rν−1 )τ +1
(1 − θ)r
2ν(τ +1)
converges to zero, by (2.7) and (2.8), it follows from (2.9) that the limit K
satisfies (1.8). Moreover,
¯ ν
¯
|Kηη (ζ) − Q(ζ)| = lim ¯Hyy
(ζ) − Q0 (ζ)¯
ν→∞



ν→∞




X

lim

µ=0

ν
X
¯ µ
¯
¯Hyy (ζ) − Qµ (ζ)¯

µ=0

2−µ



=
2M
M

for |Im ξ| ≤ θr and |η| ≤ θr. Here the third inequality follows from (2.9). Thus
we have proved the third inequality in (2.4). To prove the estimate for v ◦ u −1
we observe that
v ◦ u−1 (x) =


X
¢T
¡
¡ 0 ¢T ¡ 1 ¢T
· Vx · · · Vxν−1 · Uxν
Vx
ν=1

for |Im x| ≤ θr. Here we abbreviate
Vxµ := Vxµ (V µ−1 ◦ · · · ◦ V 0 (x)).
18

This expression is well defined for |Im x| ≤ θr since, by Step 2 and (2.19), we
have
¯ ν
¯
c2 ε ν
¯V ◦ · · · ◦ V 0 (x) − V ν−1 ◦ · · · ◦ V 0 (x)¯ ≤
≤ rν − rν+1 ,
(rν − rν+1 )2τ +1
¯ ν
¯
¯V ◦ · · · ◦ V 0 (x) − x¯ ≤ r − rν+1 ≤ 1 − θ r ≤ rν+1 + rν+2 − θr.
2
2

This last expression allows us to use Step 2 and (2.19) again to estimate the
terms Vxν+1 := Vxν+1 (V ν ◦ · · · ◦ V 0 (x)) by 1 + 2−ν−1 cδ for |Im x| ≤ θr. It also
follows from Step 2 and (2.19) that |Ux (x)| ≤ 2−ν cδrτ +1 /4. Hence we obtain
¯
¯
¯v ◦ u−1 (x)¯

=


X
¯ 0 ¯ ¯ ν−1 ¯ ν
¯Vx ¯ · · · ¯Vx ¯ |Ux |
ν=0





X

(1 + cδ) · · · (1 + 21−ν cδ)2−ν cδ

ν=1

rτ +1
4


e2cδ X −ν τ +1
2 cδr
4 ν=0

≤ cδrτ +1

for |Im x| ≤ θr. This completes the proof of Theorem 1.

3

The differentiable case

The perturbation theorem for invariant tori for differentiable Hamiltonian functions is due to Moser [13, 14]. He first proved this result in the context of
invariant curves for area preserving annulus mappings which satisfy the monotone twist property [14]. This corresponds to the case of two degrees of freedom.
One of the main ideas in [13, 14] was to use a smoothing operator in order to
compensate for the loss of smoothness that arises in solving the linearized equation. Moreover, it was essential to observe that the error introduced by the
smoothing operator would not destroy the rapid convergence of the iteration. A
similar approach was also used by Nash [21] for the embedding problem of compact Riemannian manifolds. But this method required an excessive number of
derivatives like for example ℓ ≥ 333 for the annulus mapping [14]. Incidentally,
this number was later reduced by R¨
ussmann [24] to ℓ ≥ 5.
In [16] Moser proposed a different approach which is based on approximating
differentiable functions by analytic ones. In this section we provide the complete
arguments for this second approach as it has also been done by P¨
oschel in [22] in
a somewhat different way. The fundamental observation is that the qualitative
property of differentiability of a function can be characterized in terms of quantitative estimates for an approximating sequence of analytic functions. Moser’s
first proof of this result in [15] was based on a classical approximation theorem
due to Jackson. A direct proof was later provided by Zehnder [28]. For the sake
19

of completeness we will include proofs of these results that were given by Moser
in a set of unpublished notes entitled Ein Approximationssatz (see Lemmata 3
and 4 below).
Let us first recall that C µ (Rn ) for 0 < µ < 1 denotes the space of of bounded

older continuous functions f : Rn → R with the norm
|f |C µ :=

|f (x) − f (y)|
+ sup |f (x)| .
µ
|x − y|
x∈Rn
0 0 and every p > 0, there exists a constant c1 = c1 (m, p) > 0 such
that
¯ β
¯
¯∂ K(u + iv)¯ ≤ c1 (1 + |u|)−p ea|v|
|β| ≤ m
=⇒
(3.3)

for all β ∈ Nn and u, v ∈ Rn . Moreover, it follows from Cauchy’s integral formula
that the expression on the left hand side of (3.2) is independent of v. This
proves (3.2) in the case β = 0 and in general it follows from partial integration.
We prove that any such function K satisfies the requirements of the lemma.
The proof is based on the observation that, by (3.2), the convolution operator (3.1) acts as the identity on polynomials, i.e.
Sr p = p
for every polynomial p : Rn → R. We will make use of this fact in case of the
Taylor polynomial
X 1
∂ α f (x)y α
pk (x; y) := Pf,k (x; y) =
α!
|α|≤k

of f with k := [ℓ]. The difference f − pk can be expressed in the form
f (x + y) − pk (x; y)
µ

Z sk−1 X
Z 1 Z s1
k! α
∂ f (x + sk y) − ∂ α f (x) y α dsk dsk−1 · · · ds1
···
=
α!
0
0
0
|α|=k

Z 1
X 1µ
∂ α f (x + ty) − ∂ α f (x) y α dt.
=
k(1 − t)k−1
α!
0
|α|=k

for x, y ∈ Rn . This gives rise to the well known estimate



|f (x + y) − pk (x; y)| ≤ c2 |f |C ℓ |y|

for x, y ∈ Rn , k := [ℓ], and a suitable constant c2 = c2 (n, ℓ) > 0.
Now let x = u + iv with u, v ∈ Rn . Then
Z
¡
¢
Sr f (x) = r−n
K r−1 (u − y) + ir−1 v f (y) dy
Rn
Z
¡
¢
K ir−1 v − η f (u + rη) dη.
=
Rn

21

(3.4)

Moreover, it follows from (3.2) that
Z
−n
pk (u; iv) = r
K(r−1 (iv − y))pk (u; y) dy
Rn
Z
¡
¢
K ir−1 v − η pk (u; rη) dη.
=
Rn

Hence, by (3.4), we have

Z

¯ ¡ −1
¢¯
¯K ir v − η ¯ |f (u + rη) − pk (u; rη)| dη
Rn
Z
¯ ¡ −1
¢¯
¯K ir v − η ¯ |η|ℓ dη
≤ c2 |f |C ℓ rℓ
Rn
Z
a

≤ c1 c2 e |f |C ℓ r
(1 + |η|)ℓ−p dη.

|Sr f (u + iv) − pk (u; iv)| ≤

Rn

¯
¯
The last inequality follows from (3.3) with β = 0, ¯r−1 v ¯ ≤ 1, and p > ℓ + n so
that the integral on the right is finite. This proves the lemma for α = 0. For
α 6= 0 the result follows from the fact that Sr commutes with ∂ α .
The converse statement of Lemma 3 holds only if ℓ is not an integer. A
classical version of this converse result is due to Bernstein and relates the differentiability properties of a periodic function to quantitative estimates for an
approximating sequence of trigonometric polynomials [1].
Lemma 4 (Bernstein, Moser). Let ℓ ≥ 0 and n be a positive integer. Then
there exists a constant c = c(ℓ, n) > 0 with the following significance. If f :
Rn → R is the limit of a sequence of real analytic functions fν (x) in the strips
|Im x| ≤ rν := 2−ν r0 such that 0 < r0 ≤ 1 and
f0 = 0,

|fν (x) − fν−1 (x)| ≤ Arνℓ

for ν ≥ 1 and |Im x| ≤ rν , then f ∈ C s (Rn ) for every s ≤ ℓ which is not an
integer and, moreover,
|f |C s ≤

cA
rℓ−s ,
µ(1 − µ) 0

0 < µ := s − [s] < 1.

Proof. It is enough to consider the case ℓ = s. Moreover, once the result has
been established for 0 < ℓ < 1 it follows for ℓ > 1 by Cauchy’s estimate,
or else by repeatedly applying Lemma 1 (iii). Therefore we may assume that
0 < µ = s = ℓ < 1.
Let us now define
gν := fν − fν−1 .
P∞
Then f = ν=1 gν satisfies the estimate
|f |C 0 ≤ A


X

ν=1

rνµ =

2A µ
2−µ Ar0µ

r .
1 − 2−µ
µ 0

22

Here we have used the inequality 1 − 2−µ ≥ µ/2 for 0 < µ < 1. For x, y ∈ Rn
with r0 < |x − y| ≤ 1 this implies
|f (x) − f (y)| ≤

4A
4A µ
µ
r0 ≤
|x − y| .
µ
µ(1 − µ)

In the case 0 < |x − y| < r0 there is an integer N ≥ 0 such that
2−N −1 r0 ≤ |x − y| < 2−N r0 .

Moreover, it follows from Lemma 1 (iii) that

for every u ∈ Rn . and hence

|gνx (u)| ≤ Arνµ−1

|gν (x) − gν (y)| ≤ Arνµ−1 |x − y| .

We shall use this estimate for ν = 1, . . . , N . For ν > N we use the trivial
estimate
|gν (x) − gν (y)| ≤ 2Arνµ .

Taking into account the inequalities 1 − 2−µ ≥ µ/2 and 21−µ − 1 ≥ (1 − µ)/2
for 0 < µ < 1 we conclude that
|f (x) − f (y)| ≤


X

ν=1

|gν (x) − gν (y)|

≤ A |x − y|


=


N
X
X
¡ ν −1 ¢1−µ
¡ −ν ¢µ
+ 2A
2 r0
2 r0

ν=1

ν=N +1

¡
¢1−µ
2A
2A ¡ −N −1 ¢µ
2
r0
|x − y| 2N r0−1
+
1−µ
2
−1
1 − 2−µ
4A
4A
|x − y|µ +
|x − y|µ
1−µ
µ
4A
µ
|x − y| .
µ(1 − µ)

This proves the lemma.

The approximation result of Lemma 3 can be used to prove the following
interpolation and product estimates. Denote by C s (Tn , Rk ) the space of all
functions w ∈ C s (Rn , Rk ) that are of period 1 in all variables, and by C0s (Tn , Rk )
the space of all functions w ∈ C s (Tn , Rk ) with mean value zero. For k = 1 we
abbreviate C s (Tn ) := C s (Tn , R) and C0s (Tn ) := C0s (Tn , R).
Lemma 5. For every n ∈ N and every ℓ > 0 there is a constant c = c(ℓ, n) > 0
such that the following inequalities hold for all f, g ∈ C ℓ (Tn ):
ℓ−k

ℓ−m

m−k

|f |C m ≤ c |f |C k |f |C ℓ ,
k ≤ m ≤ ℓ,

µ
0 ≤ s ≤ ℓ.
|f g|C s ≤ c |f |C 0 |g|C s + |f |C s |g|C 0 ,
23

Proof. If Sr : C 0 (Tn ) → C ∞ (Tn ) denotes the smoothing operator of Lemma 3
then one checks easily that
|f − Sr f |C m ≤ c1 rℓ−m |f |C ℓ ,

|Sr f |C m ≤ c1 rk−m |f |C k

for all f ∈ C ℓ (Tn ), 0 < r ≤ 1, k ≤ m ≤ ℓ, and a suitable constant c 1 =
c1 (ℓ, n) > 0. Choosing r > 0 so as to satisfy
rℓ−k =

|f |C k
|f |C ℓ

we obtain
|f |C m


µ
≤ c1 rℓ−m |f |C ℓ + rk−m |f |C k
=

(ℓ−m)/(ℓ−k)

2c1 |f |C k

(m−k)/(ℓ−k)

|f |C ℓ

.

This proves the first estimate. Moreover, the second follows from the first since
X µβ ¶
¡
¢
β
∂ (f g) =
(∂ α f ) ∂ β−α g
α
α≤β

where α ≤ β if and only if αν ≤ βν for all ν and
µ ¶
µ ¶ µ ¶
β
βn
β1
.
···
:=
αn
α1
α
This proves the lemma.
The next theorem is the main result of this paper. It is Moser’s perturbation theorem for invariant tori of differentiable Hamiltonian systems. We have
phrased it in the form that the existence of an approximate invariant torus implies the existence of a true invariant torus nearby. It then follows that every
invariant torus of class C ℓ+1 satisfying (1.5) with a frequency vector satisfying (1.3) persists under C ℓ small perturbations of the Hamiltonian function. In
this formulation it is also an obvious consequence that every invariant torus of
class C ℓ+1 satisfying (1.5) gives rise to nearby invariant tori for nearby frequency
vectors that satisfy the Diophantine inequalities (1.3).
Theorem 2 (Moser). Let n ≥ 2, τ > n − 1, c0 > 0, m > 0, ℓ > 2τ + 2 + m,
M ≥ 1, and ρ > 0 be given. Then there are positive constant ε ∗ and c such that
the following holds for every vector ω ∈ Rn that satisfies (1.3) and every open
set G ⊂ Rn that contains the ball Bρ (0).
Suppose H ∈ C ℓ (Tn × G) satisfies
¯µZ
¶−1 ¯¯
¯
¯
¯
|H|C ℓ ≤ M,
Hyy (ξ, 0) dξ
(3.5)
¯
¯ ≤ M,
¯ Tn
¯
24

and

¯
Z
¯
¯H(x, 0) −
¯

T

¯
¯
H(ξ, 0) dξ ¯¯ + |Hy (x, 0) − ω| ετ +1 < M εℓ
n

(3.6)

for every x ∈ Rn and some constant 0 < ε ≤ ε∗ . Then there is a solution
x = u(ξ),

y = v(ξ)

of (1.2) such that u(ξ) − ξ and v(ξ) are of period 1 in all variables. Moreover,
u ∈ C s (Rn , Rn ) and v ◦ u−1 ∈ C s+τ (Rn , G) for every s ≤ m + 1 such that s ∈
/N
and s + τ ∈
/ N and
c
εm+1−s , 0 < s ≤ m + 1,
µ(1 − µ)
c
εm+τ +1−s , 0 < s ≤ m + τ + 1,

µ(1 − µ)

|u − id|C s ≤

¯
¯
¯v ◦ u−1 ¯ s
C

(3.7)

where 0 < µ := s − [s] < 1.
Proof. By Lemma 3, we can approximate H(x, y) by a sequence of real analytic
functions H ν (x, y) for ν = 0, 1, 2, . . . in the strips
|Im x| ≤ rν−1 ,

|Im y| ≤ rν−1 ,

rν := 2−ν ε,

around Re x ∈ Tn , |Re y| ≤ ρ, such that
¯
¯
¯
¯
α¯
X
¯ ν
(iIm
z)
α
¯ ≤ c1 |H| ℓ rℓ ,
¯H (z) −
∂ H(Re z)
ν
C
¯
α! ¯¯
¯
|α|≤ℓ
¯
¯
¯
¯
α¯
X
¯ ν
(iIm
z)
α
¯H (z) −
¯ ≤ c1 |H| ℓ rℓ−1 ,
∂ Hy (Re z)
ν
C
¯ y
α! ¯¯
¯
|α|≤ℓ
¯
¯
¯
¯
α¯
X
¯ ν
(iIm
z)
α
¯ ≤ c1 |H| ℓ rνℓ−2 ,
¯Hyy (z) −
∂ Hyy (Re z)
C
¯
α! ¯¯
¯
|α|≤ℓ

(3.8)

for |Im x| ≤ rν−1 , |Im y| ≤ rν−1 , and |Re y| ≤ ρ. Here the constant c1 =
c1 (ℓ, n) > 0 is chosen appropriately (Lemma 3) and ε > 0 is the number appearing in (3.6).

Fix the constant θ := 1/ 2. By induction, we shall construct a sequence of
real analytic symplectic transformations z = φν (ζ) of the form
x = uν (ξ),

¡ ¢T
y = v ν (ξ) + uνξ (ξ)−1 η,

(3.9)

such that uν (ξ) − ξ and v ν (ξ) are of period 1 in all variables, φν maps the strip
|Im ξ| ≤ θrν+1 , |η| ≤ θrν+1 int