Chapter 6
SEQUENTIAL DECISION PROBLEMS: DISCRETE-TIME OPTIMAL CONTROL
In this chapter we apply the results of the last two chapters to situations where decisions have to be made sequentially over time. An important class of problems in which such situations arise is the control of dynamical systems. In Section 6.1 we give two examples, and in Section 6.2 we derive the main result.
6.1 Examples
The trajectory of a vertical sounding rocket is controlled by adjusting the rate of fuel ejection, which generates the thrust force. Specifically, suppose that the equations of motion are given by (6.1):
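$$
\begin{aligned}
\dot{x}_1(t) &= x_2(t),\\
\dot{x}_2(t) &= -\frac{C_D}{x_3(t)}\,\rho\big(x_1(t)\big)\,x_2^2(t) - g + \frac{C_T}{x_3(t)}\,u(t),\\
\dot{x}_3(t) &= -u(t),
\end{aligned}
\tag{6.1}
$$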
where $x_1(t)$ is the height of the rocket above the ground at time $t$, $x_2(t)$ is its vertical speed at time $t$, and $x_3(t)$ is the weight of the rocket (= weight of the remaining fuel) at time $t$. The dot denotes differentiation with respect to $t$. These equations can be derived from the force equations under the assumption that four forces act on the rocket, namely: inertia $= x_3\ddot{x}_1 = x_3\dot{x}_2$; drag force $= C_D\,\rho(x_1)\,x_2^2$, where $C_D$ is a constant and $\rho(x_1)$ is a friction coefficient depending on the atmospheric density, which is a function of $x_1$; gravitational force $= g x_3$, with $g$ assumed constant; and thrust force $= C_T u = -C_T\dot{x}_3$, assumed proportional to the rate of fuel ejection. See Figure 6.1. The decision variable at time $t$ is $u(t)$, the rate of fuel ejection. At time 0 we assume that $(x_1(0), x_2(0), x_3(0)) = (0, 0, M)$; that is, the rocket is on the ground, at rest, with initial fuel of weight $M$. At a prescribed final time $t_f$, it is desired that the rocket be as high above the ground as possible. Thus, the decision problem can be formalized as (6.2):

$$
\begin{aligned}
\text{Maximize}\quad & x_1(t_f)\\
\text{subject to}\quad & \dot{x}(t) = f\big(x(t), u(t)\big), \quad 0 \le t \le t_f,\\
& x(0) = (0, 0, M)',\\
& u(t) \ge 0, \quad x_3(t) \ge 0, \quad 0 \le t \le t_f,
\end{aligned}
\tag{6.2}
$$
where $x = (x_1, x_2, x_3)'$ and $f : \mathbb{R}^{3+1} \to \mathbb{R}^3$ is the right-hand side of (6.1). The constraint inequalities $u(t) \ge 0$ and $x_3(t) \ge 0$ are obvious physical constraints.
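For readers who want to experiment numerically, the right-hand side $f$ of (6.1) can be written as a short Python function. This is a minimal sketch only. The values of $C_D$, $C_T$, $g$ and the exponential height profile used for $\rho(\cdot)$ are hypothetical choices, since the text leaves them unspecified.

```python
import math

# Hypothetical parameter values: the text leaves C_D, C_T and g symbolic.
C_D = 0.1    # drag constant
C_T = 300.0  # thrust per unit rate of fuel ejection
G = 9.81     # gravitational constant g


def rho(height):
    """Assumed friction coefficient as a function of height x1.

    An exponential atmosphere is used purely for illustration; the text only
    says that rho depends on atmospheric density, which is a function of x1.
    """
    return math.exp(-height / 8000.0)


def f(x, u):
    """Right-hand side of (6.1), returning (dx1/dt, dx2/dt, dx3/dt).

    x = (x1, x2, x3): height, vertical speed, weight of remaining fuel.
    u: rate of fuel ejection. The model divides by x3, so x3 must be > 0.
    """
    x1, x2, x3 = x
    dx1 = x2
    dx2 = -(C_D / x3) * rho(x1) * x2 ** 2 - G + (C_T / x3) * u
    dx3 = -u
    return (dx1, dx2, dx3)
```

For example, `f((0.0, 0.0, 100.0), 10.0)` gives a vertical acceleration of about $30 - 9.81 = 20.19$. At rest there is no drag, so only thrust and gravity contribute.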
[Figure 6.1: Forces acting on the rocket (inertia $x_3\ddot{x}_1$, drag $C_D\rho(x_1)x_2^2$, gravitational force $gx_3$, and thrust).]
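The text states that (6.1) follows from the force equations. The following is a short sketch of that step. Figure 6.1 is read as a force balance in which thrust acts upward and drag and gravity act downward, and inertia is set equal to the sum of these forces (with thrust $C_T u = -C_T\dot{x}_3$):

$$
\begin{aligned}
\dot{x}_1 &= x_2 && \text{(definition of vertical speed)}\\
x_3\ddot{x}_1 &= C_T u - C_D\,\rho(x_1)\,x_2^2 - g x_3 && \text{(force balance)}\\
\dot{x}_2 &= -\frac{C_D}{x_3}\,\rho(x_1)\,x_2^2 - g + \frac{C_T}{x_3}\,u && \text{(use } \ddot{x}_1 = \dot{x}_2 \text{ and divide by } x_3 > 0\text{)}\\
\dot{x}_3 &= -u && \text{(fuel is ejected at rate } u\text{)}
\end{aligned}
$$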
The decision problem (6.2) differs from those considered so far in that the decision variables, which are functions $u : [0, t_f] \to \mathbb{R}$, cannot be represented as vectors in a finite-dimensional space. We shall treat such problems in great generality in the succeeding chapters. For the moment we assume that, for computational or practical reasons, it is necessary to approximate or restrict the permissible function $u(\cdot)$ to be constant over the intervals $[0, t_1), [t_1, t_2), \dots, [t_{N-1}, t_f)$, where $t_1, t_2, \dots, t_{N-1}$ are fixed a priori. If we then let $u(i)$ be the constant value of $u(\cdot)$ over $[t_i, t_{i+1})$, with $t_0 = 0$ and $t_N = t_f$, we can reformulate (6.2) as (6.3):
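$$
\begin{aligned}
\text{Maximize}\quad & x_1(t_N), \qquad t_N = t_f,\\
\text{subject to}\quad & x(t_{i+1}) = g\big(i, x(t_i), u(i)\big), \quad i = 0, 1, \dots, N-1,\\
& x(t_0) = x(0) = (0, 0, M)',\\
& u(i) \ge 0, \quad i = 0, 1, \dots, N-1,\\
& x_3(t_i) \ge 0, \quad i = 0, 1, \dots, N.
\end{aligned}
\tag{6.3}
$$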
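The text does not give the map $g$ in (6.3) explicitly. It can, however, be evaluated numerically by integrating (6.1) over $[t_i, t_{i+1})$ with $u$ held at $u(i)$. The sketch below does this with the classical fourth-order Runge-Kutta method. That scheme is our choice and is not prescribed by the text. The sketch reuses hypothetical values for $C_D$, $C_T$, $g$ and $\rho$, and the helper names `g` and `rollout` are illustrative.

```python
import math

# Hypothetical constants; the text does not give numerical values.
C_D, C_T, G = 0.1, 300.0, 9.81  # G is the gravitational constant g of (6.1)


def rho(height):
    # Assumed exponential-atmosphere friction coefficient (illustrative only).
    return math.exp(-height / 8000.0)


def f(x, u):
    # Right-hand side of (6.1); requires x3 > 0.
    x1, x2, x3 = x
    return (x2, -(C_D / x3) * rho(x1) * x2 ** 2 - G + (C_T / x3) * u, -u)


def g(i, x, u_i, times, substeps=50):
    """Transition map of (6.3): the state at times[i+1], starting from x at
    times[i] with u(t) held at u_i. Integrates (6.1) with classical RK4 using
    `substeps` equal steps (our choice of scheme, not the text's)."""
    h = (times[i + 1] - times[i]) / substeps
    for _ in range(substeps):
        k1 = f(x, u_i)
        k2 = f(tuple(a + 0.5 * h * k for a, k in zip(x, k1)), u_i)
        k3 = f(tuple(a + 0.5 * h * k for a, k in zip(x, k2)), u_i)
        k4 = f(tuple(a + h * k for a, k in zip(x, k3)), u_i)
        x = tuple(a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
                  for a, p, q, r, s in zip(x, k1, k2, k3, k4))
    return x


def rollout(controls, times, M):
    """States x(t_0), ..., x(t_N) generated by x(t_{i+1}) = g(i, x(t_i), u(i))
    from x(t_0) = (0, 0, M), where controls = [u(0), ..., u(N-1)] and
    times = [t_0, ..., t_N]."""
    states = [(0.0, 0.0, M)]
    for i, u_i in enumerate(controls):
        states.append(g(i, states[-1], u_i, times))
    return states
```

For instance, `rollout([10.0, 0.0], [0.0, 5.0, 10.0], 100.0)` burns half the fuel on the first interval and coasts on the second. The first component of the last state is the objective $x_1(t_N)$ of (6.3). Note that (6.1) contains no ground reaction force, so a plan whose initial thrust does not exceed the rocket's weight produces negative heights in this model.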
In (6.3), $g(i, x(t_i), u(i))$ is the state of the rocket at time $t_{i+1}$ when it is in state $x(t_i)$ at time $t_i$ and $u(t) \equiv u(i)$ for $t_i \le t < t_{i+1}$.

As another example, consider a simple inventory problem in which time enters discretely in a natural fashion. The Squeezme Toothpaste Company wants to plan its production and inventory schedule for the coming month. It is assumed that the demand on the $i$th day, $0 \le i \le 30$, is $d_1(i)$ for
their orange brand and $d_2(i)$ for their green brand. To meet unexpected demand, the inventory stock of either brand must not fall below $\bar{s} > 0$. If we let $s(i) = (s_1(i), s_2(i))'$ denote the stock at the beginning of the $i$th day and $m(i) = (m_1(i), m_2(i))'$ denote the amounts manufactured on the $i$th day, then clearly

$$
s(i+1) = s(i) + m(i) - d(i),
$$

where $d(i) = (d_1(i), d_2(i))'$. Suppose that the initial stock is $\hat{s}$, that the cost of storing inventory $s$ for one day is $c(s)$, and that the cost of manufacturing amount $m$ is $b(m)$. Then the cost-minimization decision problem can be formalized as (6.4):
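$$
\begin{aligned}
\text{Minimize}\quad & \sum_{i=0}^{30} \big[c(s(i)) + b(m(i))\big]\\
\text{subject to}\quad & s(i+1) = s(i) + m(i) - d(i), \quad 0 \le i \le 29,\\
& s(0) = \hat{s},\\
& s(i) \ge (\bar{s}, \bar{s})', \quad m(i) \ge 0, \quad 0 \le i \le 30.
\end{aligned}
\tag{6.4}
$$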
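The state equation $s(i+1) = s(i) + m(i) - d(i)$ and the objective of (6.4) are easy to evaluate for a candidate production plan. The following minimal sketch checks a plan for feasibility and computes its cost. The function names, the representation of each two-brand vector as an (orange, green) pair, and the cost functions passed in are illustrative assumptions.

```python
def simulate_stock(s_hat, m, d):
    """Stocks s(0), ..., s(T) from s(0) = s_hat and s(i+1) = s(i) + m(i) - d(i),
    0 <= i <= T - 1, where len(d) = T + 1 (T = 30 in (6.4)).

    Each of s_hat, m[i], d[i] is an (orange, green) pair."""
    s = [tuple(s_hat)]
    for i in range(len(d) - 1):
        s.append(tuple(s[i][k] + m[i][k] - d[i][k] for k in range(2)))
    return s


def total_cost(s, m, c, b):
    """Objective of (6.4): sum over days of c(s(i)) + b(m(i))."""
    return sum(c(s[i]) + b(m[i]) for i in range(len(s)))


def is_feasible(s, m, s_bar):
    """Inequality constraints of (6.4): every stock >= s_bar, every amount >= 0."""
    return (all(x >= s_bar for pair in s for x in pair)
            and all(x >= 0 for pair in m for x in pair))
```

As an example, take a constant daily demand of $(3, 2)$, initial stock $\hat{s} = (10, 10)$ and $\bar{s} = 5$. If the company produces exactly the demand each day, the stock stays at $(10, 10)$. With the hypothetical linear costs $c(s) = 0.1(s_1 + s_2)$ and $b(m) = m_1 + m_2$, each day costs $2 + 5 = 7$, so the total over 31 days is $217$.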
Before we formulate the general problem, let us note that (6.3) and (6.4) are in the form of nonlinear programming problems. We treat these problems separately because of their practical importance and because the conditions of optimality take on a special form.
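One way to see concretely that (6.4) is a finite-dimensional programming problem is to stack all of its decision variables into one vector $z$. The minimal sketch below assumes the ordering $z = (s(0), \dots, s(30), m(0), \dots, m(30))$, with two entries per day. That ordering is our choice, not the text's. The sketch then writes the constraints of (6.4) as equalities $h(z) = 0$ and inequalities that must all be nonnegative. The function names are hypothetical.

```python
def unpack(z, days):
    """Split z = (s(0), ..., s(days-1), m(0), ..., m(days-1)), flattened with
    two entries (orange, green) per day, into lists of (orange, green) pairs."""
    pairs = [(z[2 * k], z[2 * k + 1]) for k in range(2 * days)]
    return pairs[:days], pairs[days:]


def equality_residuals(z, s_hat, d):
    """h(z): all entries are zero exactly when z satisfies s(0) = s_hat and
    s(i+1) = s(i) + m(i) - d(i) for 0 <= i <= days - 2."""
    days = len(d)
    s, m = unpack(z, days)
    h = [s[0][k] - s_hat[k] for k in range(2)]
    for i in range(days - 1):
        h.extend(s[i + 1][k] - s[i][k] - m[i][k] + d[i][k] for k in range(2))
    return h


def inequality_values(z, s_bar, days):
    """Values that must all be >= 0: s_k(i) - s_bar and m_k(i) for every day i
    and brand k."""
    s, m = unpack(z, days)
    return [x - s_bar for pair in s for x in pair] + [x for pair in m for x in pair]
```

With 31 days and two brands, $z$ has 124 components. There are $2 + 2 \cdot 30 = 62$ equality constraints and 124 inequality constraints. In this form, the nonlinear programming results of the preceding chapters apply in principle.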
6.2 Main Result