their orange brand and $d^2_i$ for their green brand. To meet unexpected demand it is necessary that the inventory stock of either brand should not fall below $s^0$. If we let $s_i = (s^1_i, s^2_i)'$ denote the stock at the beginning of the $i$th day, and $m_i = (m^1_i, m^2_i)'$ denote the amounts manufactured on the $i$th day, then clearly $s_{i+1} = s_i + m_i - d_i$, where $d_i = (d^1_i, d^2_i)'$. Suppose that the initial stock is $\hat{s}$, and the cost of storing inventory $s$ for one day is $cs$ whereas the cost of manufacturing amount $m$ is $bm$. Then the cost-minimization decision problem can be formalized as (6.4):
Maximize $-\sum_{i=0}^{30} (cs_i + bm_i)$

subject to

$s_{i+1} = s_i + m_i - d_i, \quad 0 \le i \le 29,$
$s_0 = \hat{s},$
$s_i \ge (s^0, s^0)', \quad m_i \ge 0, \quad 0 \le i \le 30.$    (6.4)
Before we formulate the general problem, let us note that (6.3) and (6.4) are in the form of nonlinear programming problems. The reason for treating these problems separately is their practical importance and the special form taken by the conditions of optimality.
6.2 Main Result
The general problem we consider is of the form (6.5):

Maximize $\sum_{i=0}^{N-1} f^0(i, x_i, u_i)$

subject to

dynamics: $x_{i+1} - x_i = f(i, x_i, u_i), \quad i = 0, \ldots, N-1,$
initial condition: $q^0(x_0) \le 0, \quad g^0(x_0) = 0,$
final condition: $q^N(x_N) \le 0, \quad g^N(x_N) = 0,$
state-space constraint: $q^i(x_i) \le 0, \quad i = 1, \ldots, N-1,$
control constraint: $h^i(u_i) \le 0, \quad i = 0, \ldots, N-1.$    (6.5)
Here $x_i \in \mathbb{R}^n$, $u_i \in \mathbb{R}^p$, and $f^0(i, \cdot, \cdot): \mathbb{R}^{n+p} \to \mathbb{R}$, $f(i, \cdot, \cdot): \mathbb{R}^{n+p} \to \mathbb{R}^n$, $q^i: \mathbb{R}^n \to \mathbb{R}^{m_i}$, $g^i: \mathbb{R}^n \to \mathbb{R}^{\ell_i}$, $h^i: \mathbb{R}^p \to \mathbb{R}^{s_i}$ are given differentiable functions. We follow control-theory terminology and refer to $x_i$ as the state of the system at time $i$, and to $u_i$ as the control or input at time $i$.
We use the formulation mentioned in the Remark following Theorem 3 of 5.1.2, and construct the Lagrangian function $L$ by

$L(x_0, \ldots, x_N;\ u_0, \ldots, u_{N-1};\ p_1, \ldots, p_N;\ \lambda^0, \ldots, \lambda^N;\ \alpha^0, \alpha^N;\ \gamma^0, \ldots, \gamma^{N-1})$

$= \sum_{i=0}^{N-1} f^0(i, x_i, u_i) - \sum_{i=0}^{N-1} (p_{i+1})'\{x_{i+1} - x_i - f(i, x_i, u_i)\}$
$\quad - \sum_{i=0}^{N} (\lambda^i)' q^i(x_i) - (\alpha^0)' g^0(x_0) - (\alpha^N)' g^N(x_N) - \sum_{i=0}^{N-1} (\gamma^i)' h^i(u_i).$
Suppose that CQ is satisfied for (6.5), and $(x^*_0, \ldots, x^*_N;\ u^*_0, \ldots, u^*_{N-1})$ is an optimal solution. Then by Theorem 2 of 5.1.2 there exist $p^*_i \in \mathbb{R}^n$ for $1 \le i \le N$, $\lambda^{i*} \ge 0$ in $\mathbb{R}^{m_i}$ for $0 \le i \le N$, $\alpha^{i*}$ in $\mathbb{R}^{\ell_i}$ for $i = 0, N$, and $\gamma^{i*} \ge 0$ in $\mathbb{R}^{s_i}$ for $0 \le i \le N-1$, such that

(A) the derivative of $L$ evaluated at these points vanishes, and

(B) $(\lambda^{i*})' q^i(x^*_i) = 0$ for $0 \le i \le N$, and $(\gamma^{i*})' h^i(u^*_i) = 0$ for $0 \le i \le N-1$.

We explore condition (A) by taking various partial derivatives.
Differentiating $L$ with respect to $x_0$ gives

$f^0_x(0, x^*_0, u^*_0) - \{-p^{*\prime}_1 - p^{*\prime}_1[f_x(0, x^*_0, u^*_0)] + \lambda^{0*\prime}[q^0_x(x^*_0)] + \alpha^{0*\prime}[g^0_x(x^*_0)]\} = 0,$

or

$p^*_0 - p^*_1 = [f_x(0, x^*_0, u^*_0)]' p^*_1 + [f^0_x(0, x^*_0, u^*_0)]' - [q^0_x(x^*_0)]' \lambda^{0*},$    (6.6)

where we have defined

$p^*_0 = [g^0_x(x^*_0)]' \alpha^{0*}.$    (6.7)
Differentiating $L$ with respect to $x_i$, $1 \le i \le N-1$, and rearranging terms gives

$p^*_i - p^*_{i+1} = [f_x(i, x^*_i, u^*_i)]' p^*_{i+1} + [f^0_x(i, x^*_i, u^*_i)]' - [q^i_x(x^*_i)]' \lambda^{i*}.$    (6.8)
Differentiating $L$ with respect to $x_N$ gives

$p^*_N = -[g^N_x(x^*_N)]' \alpha^{N*} - [q^N_x(x^*_N)]' \lambda^{N*}.$

It is convenient to replace $\alpha^{N*}$ by $-\alpha^{N*}$, so that the equation above becomes

$p^*_N = [g^N_x(x^*_N)]' \alpha^{N*} - [q^N_x(x^*_N)]' \lambda^{N*}.$    (6.9)
Differentiating L with respect to ui, 0 ≤ i ≤ N − 1 gives
[f
0u
i, x
∗
i, u
∗
i]
′
+ [f
u
i, x
∗
i, u
∗
i]
′
p
∗
i + l − [h
iu
u
∗
i]
′
γ
i
∗
= 0 . 6.10
We summarize our results in a convenient form in Table 6.1.
Remark 1: Considerable elegance and mnemonic simplification is achieved if we define the
Hamiltonian function H by
Table 6.1: Suppose $(x^*_0, \ldots, x^*_N;\ u^*_0, \ldots, u^*_{N-1})$ maximizes $\sum_{i=0}^{N-1} f^0(i, x_i, u_i)$ subject to the constraints below. Then there exist $p^*_1, \ldots, p^*_N$; $\lambda^{0*}, \ldots, \lambda^{N*}$; $\alpha^{0*}, \alpha^{N*}$; $\gamma^{0*}, \ldots, \gamma^{(N-1)*}$, such that

dynamics ($i = 0, \ldots, N-1$): $\quad x^*_{i+1} - x^*_i = f(i, x^*_i, u^*_i)$

initial condition: $\quad q^0(x^*_0) \le 0, \quad g^0(x^*_0) = 0$

final condition: $\quad q^N(x^*_N) \le 0, \quad g^N(x^*_N) = 0$

state-space constraint ($i = 1, \ldots, N-1$): $\quad q^i(x^*_i) \le 0$

control constraint ($i = 0, \ldots, N-1$): $\quad h^i(u^*_i) \le 0$

adjoint equations ($i = 0, \ldots, N-1$): $\quad p^*_i - p^*_{i+1} = [f_x(i, x^*_i, u^*_i)]' p^*_{i+1} + [f^0_x(i, x^*_i, u^*_i)]' - [q^i_x(x^*_i)]' \lambda^{i*}$

transversality conditions: $\quad p^*_0 = [g^0_x(x^*_0)]' \alpha^{0*}, \quad p^*_N = [g^N_x(x^*_N)]' \alpha^{N*} - [q^N_x(x^*_N)]' \lambda^{N*}$

($i = 0, \ldots, N-1$): $\quad [f^0_u(i, x^*_i, u^*_i)]' + [f_u(i, x^*_i, u^*_i)]' p^*_{i+1} = [h^i_u(u^*_i)]' \gamma^{i*}$

$\lambda^{0*} \ge 0, \ (\lambda^{0*})' q^0(x^*_0) = 0; \quad \lambda^{N*} \ge 0, \ (\lambda^{N*})' q^N(x^*_N) = 0; \quad \lambda^{i*} \ge 0, \ (\lambda^{i*})' q^i(x^*_i) = 0; \quad \gamma^{i*} \ge 0, \ (\gamma^{i*})' h^i(u^*_i) = 0$
$H(i, x, u, p) = f^0(i, x, u) + p' f(i, x, u).$

The dynamic equations then become

$x^*_{i+1} - x^*_i = [H_p(i, x^*_i, u^*_i, p^*_{i+1})]', \quad 0 \le i \le N-1,$    (6.11)

and the adjoint equations (6.6) and (6.8) become

$p^*_i - p^*_{i+1} = [H_x(i, x^*_i, u^*_i, p^*_{i+1})]' - [q^i_x(x^*_i)]' \lambda^{i*}, \quad 0 \le i \le N-1,$

whereas (6.10) becomes

$[h^i_u(u^*_i)]' \gamma^{i*} = [H_u(i, x^*_i, u^*_i, p^*_{i+1})]', \quad 0 \le i \le N-1.$    (6.12)
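As a sanity check on this bookkeeping, the sketch below (a hypothetical scalar $f^0$ and $f$ with toy numbers) verifies by finite differences that $H_p$ recovers the right-hand side of the dynamics, and that $H_x$ and $H_u$ match the hand-computed partial derivatives appearing in the adjoint equations and in (6.12):

```python
import numpy as np

# Hypothetical scalar example: f0 is the objective term, f the dynamics term.
def f0(i, x, u):
    return -x**2 - 0.5 * u**2

def f(i, x, u):
    return np.sin(x) + u

def H(i, x, u, p):
    # Hamiltonian H = f0 + p' f (scalar case)
    return f0(i, x, u) + p * f(i, x, u)

def deriv(g, z, h=1e-6):
    # central finite difference
    return (g(z + h) - g(z - h)) / (2 * h)

i, x, u, p = 0, 0.7, -0.3, 1.2
# H_p = f: the Hamiltonian form (6.11) is just the dynamic equation
assert abs(deriv(lambda pp: H(i, x, u, pp), p) - f(i, x, u)) < 1e-6
# H_x = f0_x + p*f_x: the right-hand side of the adjoint equations (6.6), (6.8)
assert abs(deriv(lambda xx: H(i, xx, u, p), x) - (-2 * x + p * np.cos(x))) < 1e-6
# H_u = f0_u + p*f_u: the quantity appearing in (6.12)
assert abs(deriv(lambda uu: H(i, x, uu, p), u) - (-u + p)) < 1e-6
```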
Remark 2: If we linearize the dynamic equations about the optimal solution, we obtain

$\delta x_{i+1} - \delta x_i = [f_x(i, x^*_i, u^*_i)]\delta x_i + [f_u(i, x^*_i, u^*_i)]\delta u_i,$

whose homogeneous part is

$z_{i+1} - z_i = [f_x(i, x^*_i, u^*_i)]z_i,$

which has for its adjoint the system

$r_i - r_{i+1} = [f_x(i, x^*_i, u^*_i)]' r_{i+1}.$    (6.13)
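The sense in which (6.13) is adjoint to the homogeneous system can be checked directly: if $z_{i+1} - z_i = F_i z_i$ and $r_i - r_{i+1} = F_i' r_{i+1}$, then $r_i' z_i = r_{i+1}'(I + F_i)z_i = r_{i+1}' z_{i+1}$, so the inner product $r_i' z_i$ is constant in $i$. A short numerical confirmation with toy data:

```python
import numpy as np

# Hypothetical time-varying Jacobians F_i = f_x(i, x*_i, u*_i) (toy data).
rng = np.random.default_rng(0)
n, N = 3, 6
F = [0.3 * rng.standard_normal((n, n)) for _ in range(N)]

# Homogeneous part: z_{i+1} - z_i = F_i z_i  (run forward in time)
z = [rng.standard_normal(n)]
for i in range(N):
    z.append(z[-1] + F[i] @ z[-1])

# Adjoint system: r_i - r_{i+1} = F_i' r_{i+1}  (run backward in time)
r = [None] * (N + 1)
r[N] = rng.standard_normal(n)
for i in reversed(range(N)):
    r[i] = r[i + 1] + F[i].T @ r[i + 1]

# Defining property of the adjoint: r_i' z_i is invariant along trajectories.
vals = [float(r[i] @ z[i]) for i in range(N + 1)]
assert np.allclose(vals, vals[0])
```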
Since the homogeneous part of the linear difference equations (6.6), (6.8) is (6.13), we call (6.6), (6.8) the adjoint equations, and the $p^*_i$ are called adjoint variables.

Remark 3: If the $f^0(i, \cdot, \cdot)$ are concave and the remaining functions in (6.5) are linear, then CQ is satisfied, and the necessary conditions of Table 6.1 are also sufficient. Furthermore, in this case we
see from (6.12) that $u^*_i$ is an optimal solution of

Maximize $H(i, x^*_i, u, p^*_{i+1})$
subject to $h^i(u) \le 0.$

For this reason the result is sometimes called the maximum principle.
Remark 4: The conditions (6.7), (6.9) are called transversality conditions for the following reason. Suppose $q^0 \equiv 0$ and $q^N \equiv 0$, so that the initial and final conditions read $g^0(x_0) = 0$, $g^N(x_N) = 0$, which describe surfaces in $\mathbb{R}^n$. Conditions (6.7), (6.9) become respectively $p^*_0 = [g^0_x(x^*_0)]' \alpha^{0*}$ and $p^*_N = [g^N_x(x^*_N)]' \alpha^{N*}$, which means that $p^*_0$ and $p^*_N$ are respectively orthogonal, or transversal, to the initial and final surfaces. Furthermore, we note that in this case the initial and final conditions specify $\ell_0 + \ell_N$ conditions, whereas the transversality conditions specify $(n - \ell_0) + (n - \ell_N)$ conditions. Thus we have a total of $2n$ boundary conditions for the $2n$-dimensional system of difference equations (6.5), (6.12); but note that these $2n$ boundary conditions are mixed, i.e., some of them refer to the initial time $0$ and the rest refer to the final time $N$.
Exercise 1: For the regulator problem,

Maximize $-\frac{1}{2}\sum_{i=0}^{N-1} x_i' Q x_i - \frac{1}{2}\sum_{i=0}^{N-1} u_i' P u_i$

subject to

$x_{i+1} - x_i = A x_i + B u_i, \quad 0 \le i \le N-1,$
$x_0 = \hat{x}_0, \quad u_i \in \mathbb{R}^p, \quad 0 \le i \le N-1,$

where $x_i \in \mathbb{R}^n$, $A$ and $B$ are constant matrices, $\hat{x}_0$ is fixed, $Q = Q'$ is positive semi-definite, and $P = P'$ is positive definite, show that the optimal solution is unique and can be obtained by solving a $2n$-dimensional linear difference equation with mixed boundary conditions.
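A minimal numerical sketch of the suggested approach (toy matrices; all data hypothetical): since there are no control or state constraints, condition (6.12) reduces to $H_u = 0$, giving $u_i = P^{-1}B'p^*_{i+1}$; substituting this into the state equation and stacking it with the adjoint equations and the mixed boundary conditions $x_0 = \hat{x}_0$, $p^*_N = 0$ yields one linear system that can be solved directly.

```python
import numpy as np

# Toy instance of the discrete regulator (hypothetical matrices).
n, N = 2, 20
A = np.array([[0.0, 1.0], [-0.5, -0.2]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.1])              # Q = Q' positive semi-definite
P = np.array([[0.5]])                # P = P' positive definite
xhat0 = np.array([1.0, 0.0])

I = np.eye(n)
S = B @ np.linalg.solve(P, B.T)      # B P^{-1} B', from u_i = P^{-1} B' p_{i+1}

# Unknowns z = (x_1, ..., x_N, p_1, ..., p_N); block-index helpers.
def xb(i): return slice((i - 1) * n, i * n)
def pb(i): return slice((N + i - 1) * n, (N + i) * n)

M = np.zeros((2 * N * n, 2 * N * n))
rhs = np.zeros(2 * N * n)
row = 0
# state equations: x_{i+1} = (I + A) x_i + S p_{i+1},  i = 0, ..., N-1
for i in range(N):
    M[row:row + n, xb(i + 1)] = I
    M[row:row + n, pb(i + 1)] -= S
    if i == 0:
        rhs[row:row + n] = (I + A) @ xhat0
    else:
        M[row:row + n, xb(i)] -= I + A
    row += n
# adjoint equations: p_i = (I + A') p_{i+1} - Q x_i,  i = 1, ..., N-1
for i in range(1, N):
    M[row:row + n, pb(i)] = I
    M[row:row + n, pb(i + 1)] -= I + A.T
    M[row:row + n, xb(i)] = Q
    row += n
# transversality: p_N = 0 (no final condition is imposed on x_N)
M[row:row + n, pb(N)] = I

z = np.linalg.solve(M, rhs)
u = [np.linalg.solve(P, B.T @ z[pb(i + 1)]) for i in range(N)]

def payoff(us):
    """-1/2 sum x'Qx - 1/2 sum u'Pu along the trajectory driven by us."""
    x, J = xhat0, 0.0
    for ui in us:
        J -= 0.5 * (x @ Q @ x + ui @ P @ ui)
        x = x + A @ x + B @ ui
    return J

assert payoff(u) >= payoff([np.zeros(1)] * N)   # beats the zero control
```

The direct solve is only a sketch; in practice the same mixed boundary-value system is usually handled by a backward Riccati-type sweep.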
Exercise 2: Show that the minimal fuel problem,

Minimize $\sum_{i=0}^{N-1} \sum_{j=1}^{p} |u_i^j|$

subject to

$x_{i+1} - x_i = A x_i + B u_i, \quad 0 \le i \le N-1,$
$x_0 = \hat{x}_0, \quad x_N = \hat{x}_N, \quad u_i \in \mathbb{R}^p,$
$|u_i^j| \le 1, \quad 1 \le j \le p, \quad 0 \le i \le N-1,$

can be transformed into a linear programming problem. Here $\hat{x}_0$, $\hat{x}_N$ are fixed, and $A$ and $B$ are as in Exercise 1.
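One standard way to carry out the transformation, sketched below on a hypothetical double-integrator instance: split each control component as $u_i^j = u^+ - u^-$ with $u^+, u^- \in [0, 1]$. At an optimum one of the pair is zero, so the linear objective $\sum (u^+ + u^-)$ equals $\sum |u_i^j|$, and the terminal constraint $x_N = \hat{x}_N$ is linear in the new variables.

```python
import numpy as np
from scipy.optimize import linprog

# Toy minimal-fuel instance (hypothetical matrices: a double integrator).
n, p, N = 2, 1, 8
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
xhat0 = np.array([1.0, 0.0])
xhatN = np.array([0.0, 0.0])

Phi = np.eye(n) + A                  # x_{i+1} = Phi x_i + B u_i
# x_N = Phi^N x_0 + sum_i Phi^{N-1-i} B u_i, so the terminal condition is
# a linear equality constraint on the controls.
G = np.hstack([np.linalg.matrix_power(Phi, N - 1 - i) @ B for i in range(N)])

# LP variables: (up_0..up_{N-1}, um_0..um_{N-1}), each of size p, in [0, 1].
A_eq = np.hstack([G, -G])            # G (up - um) = xhatN - Phi^N xhat0
b_eq = xhatN - np.linalg.matrix_power(Phi, N) @ xhat0
cost = np.ones(2 * N * p)            # sum (up + um) = sum |u| at the optimum
res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * (2 * N * p))

u = (res.x[:N * p] - res.x[N * p:]).reshape(N, p)
assert res.success and np.all(np.abs(u) <= 1 + 1e-9)
```

Bounding $u^+$ and $u^-$ each by 1 enforces $|u_i^j| \le 1$, since at an optimum the two are never simultaneously positive (reducing both by their minimum lowers the cost without changing $u$).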
Chapter 7

SEQUENTIAL DECISION PROBLEMS: CONTINUOUS-TIME OPTIMAL CONTROL OF LINEAR SYSTEMS
We will investigate decision problems similar to those studied in the last chapter, with one mathematically crucial difference: a choice of control has to be made at each instant of time $t$, where $t$ varies continuously over a finite interval. The evolution in time of the state of the system to be controlled is governed by a differential equation of the form

$\dot{x}(t) = f(t, x(t), u(t)),$

where $x(t) \in \mathbb{R}^n$ and $u(t) \in \mathbb{R}^p$ are respectively the state and control of the system at time $t$.
To understand the main ideas and techniques of analysis it will prove profitable to study the linear case first. The general nonlinear case is deferred to the next chapter. In Section 1 we present the
general linear problem and study the case where the initial and final conditions are particularly simple. In Section 2 we study more general boundary conditions.
7.1 The Linear Optimal Control Problem