
Operations Research Letters 26 (2000) 155–158
www.elsevier.com/locate/dsw

A fast algorithm for the transient reward distribution in continuous-time Markov chains

H.C. Tijms^a, R. Veldman^b

^a Department of Econometrics, Vrije Universiteit, De Boelelaan, 1081 HV Amsterdam, The Netherlands
^b ORTEC Consultants, P.O. Box 490, 2800 AL Gouda, The Netherlands

Received 1 July 1998; received in revised form 1 November 1999

Abstract

This note presents a generally applicable discretization method for computing the transient distribution of the cumulative reward in a continuous-time Markov chain. A key feature of the algorithm is an error estimate for speeding up the calculations. The algorithm is easy to program and is numerically stable. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Transient reward distribution; Discretization algorithm

1. Introduction

A fundamental and practically important problem is the calculation of the transient probability distribution of the cumulative reward in a continuous-time Markov chain. Practical applications include computer systems and oil-production platforms with guaranteed levels of performability over finite time periods. In such systems high penalties must be paid when the guaranteed levels of performability are not met over the specified time interval, and thus the computation of the probability of not meeting the performability requirement becomes necessary. More specifically, consider a continuous-time Markov chain $\{X(t),\ t \ge 0\}$ with finite state space $I$ and infinitesimal transition rates $q_{ij}$, $j \neq i$. A reward at rate $r_i$ is earned for each




unit of time the system is in state $i$. Defining the random variable $O(t)$ by

O(t) = the cumulative reward earned up to time t,

the problem is to calculate the transient reward distribution $P\{O(t) > x\}$ for a given value of $t$. An important special case of this general problem is the case in which the state space $I$ splits up into two disjoint sets $I_o$ and $I_f$ of operational and failed states and the reward function $r_i = 1$ for $i \in I_o$ and $r_i = 0$ for $i \in I_f$. In this case, the transient distribution of the cumulative reward reduces to the transient distribution of the cumulative operational time. A transparent and elegant algorithm for the 0–1 reward case was given in De Souza e Silva and Gail [1] using the well-known idea of uniformization in continuous-time Markov chains. Recently, De Souza e Silva and Gail [2] extended their algorithm for the 0–1 case to the case of general reward rates. Unfortunately, this generalized



algorithm is quite complicated and, worse, it is not always numerically stable so its numerical answers may be unreliable. In this note, we present an alternative algorithm that is both easy to program and numerically stable. This alternative algorithm is based upon discretization. The discretization algorithm itself is a direct generalization of an earlier algorithm proposed by Goyal and Tantawi [3] for the 0–1 case. However, we could considerably speed up this naive discretization algorithm by using a remarkably simple estimate for the error made in the discretization. The discretization method has the additional advantage of being directly extendable to continuous-time Markov chains with time-dependent transition rates.

2. The discretization algorithm

Let us first define $f_i(t,x)$ as the joint probability density of the cumulative reward $O(t)$ up to time $t$ and the state $X(t)$ of the process at time $t$. In other words,
$$f_i(t,x)\,\Delta x \approx P\{O(t) \in (x, x+\Delta x) \mbox{ and } X(t) = i\}$$
for $\Delta x \to 0$. Then
$$P\{O(t) > x\} = 1 - \sum_{i \in I} \int_0^x f_i(t,y)\,dy.$$
In the sequel, it is assumed that the reward rates $r_i$ are nonnegative integers. It is no restriction to make this assumption. Next, we discretize $x$ and $t$ in multiples of $\Delta$, where $\Delta$ is chosen sufficiently small (i.e. the probability of more than one state transition during a time period $\Delta$ should be small). For fixed $\Delta > 0$, the probability density $f_i(t,x)$ is approximated by a discretized function $f_i^\Delta(t,x)$. In view of the probabilistic interpretation of $f_i(t,x)\,\Delta x$, this discretized function is defined by the recursion scheme
$$f_i^\Delta(u, y) = f_i^\Delta(u-\Delta,\, y - r_i\Delta)(1 - \nu_i\Delta) + \sum_{k \neq i} f_k^\Delta(u-\Delta,\, y - r_k\Delta)\, q_{ki}\Delta,$$
where $\nu_i = \sum_{j \neq i} q_{ij}$ denotes the rate at which the system leaves state $i$. Here $u$ runs through $\Delta, 2\Delta, \ldots, (t/\Delta)\Delta$ and $y$ runs through $0, \Delta, \ldots, (x/\Delta)\Delta$. The recursion scheme is initialized with
$$f_i^\Delta(\Delta, y) = \begin{cases} \pi_i/\Delta & \mbox{for } y = r_i\Delta, \\ 0 & \mbox{for } y \neq r_i\Delta, \end{cases}$$
where $\{\pi_i,\ i \in I\}$ denotes the probability distribution of the starting state at time 0. Using the simple-minded approximation
$$\int_0^x f_i(t,y)\,dy \approx \Delta \sum_{j=1}^{x/\Delta} f_i^\Delta(t, j\Delta),$$
we approximate the desired probability $P\{O(t) > x\}$ by
$$P\{O(t) > x\} \approx 1 - \Delta \sum_{i \in I} \sum_{j=1}^{x/\Delta} f_i^\Delta(t, j\Delta). \qquad (1)$$
The computational complexity of this algorithm is $O(Nt(t-x)/\Delta^2)$, where $N$ denotes the number of states. In practical applications, one is usually interested in $P\{O(t) > x\}$ for $x$ sufficiently close to $r_{\max} t$ with $r_{\max} = \max_i r_i$. Then it is more convenient to write $P\{O(t) > x\}$ as $\sum_{i \in I} \int_x^{r_{\max} t} f_i(t,y)\,dy$ and to approximate $P\{O(t) > x\}$ by
$$P\{O(t) > x\} \approx \Delta \sum_{i \in I} \sum_{j=1}^{(r_{\max} t - x)/\Delta} f_i^\Delta(t,\, x + j\Delta). \qquad (2)$$
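To make the recursion scheme concrete, the following minimal sketch implements it for a chain with nonnegative integer reward rates and evaluates approximation (2). This is an illustration, not the authors' code: the function name `transient_reward_tail`, the use of NumPy, the dense array representation of $f_i^\Delta$, and the choice to carry the $y$-grid up to $r_{\max} t$ (so that (2) can be evaluated) are our own assumptions.

```python
import numpy as np

def transient_reward_tail(Q, r, pi0, t, x, delta):
    """Approximate P{O(t) > x} by the discretization recursion and formula (2).

    Q: (N, N) matrix of transition rates q_ij (the diagonal does not matter);
    r: nonnegative integer reward rates r_i; pi0: initial distribution pi_i;
    delta: discretization step Delta.
    """
    Q = np.asarray(Q, dtype=float)
    r = np.asarray(r, dtype=int)
    pi0 = np.asarray(pi0, dtype=float)
    N = len(r)
    nu = Q.sum(axis=1) - np.diag(Q)          # leaving rates nu_i = sum_{j != i} q_ij
    n_t = int(round(t / delta))              # u runs through Delta, 2*Delta, ..., t
    n_y = int(round(r.max() * t / delta))    # y grid: 0, Delta, ..., r_max * t

    # f[i, m] approximates f_i^Delta(u, m*Delta); initialization at u = Delta
    f = np.zeros((N, n_y + 1))
    for i in range(N):
        f[i, r[i]] = pi0[i] / delta          # mass pi_i placed at y = r_i * Delta

    for _ in range(n_t - 1):                 # advance the recursion from u - Delta to u
        g = np.zeros_like(f)
        for i in range(N):
            si = r[i]                        # shifting by r_i cells realizes y - r_i*Delta
            g[i, si:] = f[i, :n_y + 1 - si] * (1.0 - nu[i] * delta)
            for k in range(N):
                if k != i:
                    sk = r[k]
                    g[i, sk:] += f[k, :n_y + 1 - sk] * Q[k, i] * delta
        f = g

    # Formula (2): Delta * sum_i sum_{j=1}^{(r_max*t - x)/Delta} f_i^Delta(t, x + j*Delta)
    j0 = int(round(x / delta))
    return delta * f[:, j0 + 1:].sum()
```

Formula (1) would be obtained from the same array by the complementary sum, `1 - delta * f[:, 1:j0 + 1].sum()`; only the summation range differs.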

The advantage of representation (2) is that it requires fewer function evaluations of $f_i^\Delta(u, y)$. For fixed $x$ and $t$, the computational effort of the algorithm is proportional to $1/\Delta^2$ and so it quadruples when $\Delta$ is halved. Thus, the computation time of the algorithm gets very large when the probability $P\{O(t) > x\}$ is desired to a high accuracy. Another drawback is that no estimate is given for the discretization error. Fortunately, both difficulties can be overcome. Denote by $P(\Delta)$ the right-hand side of (1) (or (2)), and let $e(\Delta)$ be the difference between the exact value of $P\{O(t) > x\}$ and the approximate value $P(\Delta)$. The following remarkable result was empirically found:
$$e(\Delta) \approx P(\Delta) - P(2\Delta) \qquad (3)$$
when $\Delta$ is not too large. In other words, the estimate $P(\Delta)$ of the true value of $P\{O(t) > x\}$ is much improved when it is replaced by
$$\tilde{P}(\Delta) = P(\Delta) + (P(\Delta) - P(2\Delta)).$$
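In code, the error estimate (3) amounts to one extra run of the recursion with step $2\Delta$. The helper below builds on the hypothetical `transient_reward_tail` sketch given after formula (2).

```python
def extrapolated_tail(Q, r, pi0, t, x, delta):
    """Improve P(Delta) using the empirical estimate e(Delta) ~= P(Delta) - P(2*Delta)."""
    p_d = transient_reward_tail(Q, r, pi0, t, x, delta)        # P(Delta)
    p_2d = transient_reward_tail(Q, r, pi0, t, x, 2 * delta)   # P(2*Delta)
    return p_d + (p_d - p_2d)                                   # P~(Delta) = 2*P(Delta) - P(2*Delta)
```

Since the extra run with step $2\Delta$ costs only about a quarter of the run with step $\Delta$, the correction is nearly free.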

Table 1
Numerical results for the first example

        t = 75, x = 178                                     t = 75, x = 191
        P{O(t) > x} = 0.99105                               P{O(t) > x} = 0.89416

Δ       P(Δ)      e(Δ)       P(Δ)−P(2Δ)   P̃(Δ)            P(Δ)      e(Δ)       P(Δ)−P(2Δ)   P̃(Δ)
1/4     0.99344   −0.00239                                  0.90651   −0.01234
1/8     0.99229   −0.00124   −0.00116     0.99113          0.90029   −0.00613   −0.00621     0.89408
1/16    0.99168   −0.00063   −0.00061     0.99107          0.89722   −0.00305   −0.00308     0.89414
1/32    0.99137   −0.00032   −0.00031     0.99106          0.89569   −0.00152   −0.00153     0.89416

Table 2
Numerical results for the second example

        t = 25, x = 110                                     t = 25, x = 115
        P{O(t) > x} = 0.89773                               P{O(t) > x} = 0.54730

Δ       P(Δ)      e(Δ)       P(Δ)−P(2Δ)   P̃(Δ)            P(Δ)      e(Δ)       P(Δ)−P(2Δ)   P̃(Δ)
1/4     0.93035   −0.03262                                  0.56712   −0.01982
1/8     0.91353   −0.01580   −0.01681     0.89672          0.55684   −0.00954   −0.01028     0.54656
1/16    0.90551   −0.00778   −0.00802     0.89749          0.55198   −0.00468   −0.00486     0.54712
1/32    0.90159   −0.00386   −0.00392     0.89767          0.54962   −0.00232   −0.00236     0.54725
1/64    0.89965   −0.00193   −0.00194     0.89772          0.54845   −0.00115   −0.00117     0.54729

We could not find a result in the literature on partial differential equations that covers (3), nor could we prove (3) directly. However, on the basis of numerous examples tested, it is our conjecture that $e(\Delta) \sim P(\Delta) - P(2\Delta)$ as $\Delta \to 0$.
The numerical results in Tables 1 and 2 demonstrate convincingly that replacing $P(\Delta)$ by $\tilde{P}(\Delta) = P(\Delta) + (P(\Delta) - P(2\Delta))$ leads to great improvements in accuracy and thus to considerable reductions in computation times. Table 1 refers to a production-reliability system with three operating units and a single repairman. The operation time of each unit is exponentially distributed with mean $1/\lambda = 10$ and the repair time of any failed unit is exponentially distributed with mean $1/\mu = 1$. The repairman can repair only one unit at a time. The system earns a reward at rate $r_i = i$ for each unit of time that $i$ units are in operation. The system starts with all units in good condition. This example can be formulated as a continuous-time Markov chain with state space $I = \{0, 1, 2, 3\}$, where state $i$ means that $i$ units are in operation. The infinitesimal transition rates are given by $q_{i,i-1} = i\lambda$ and $q_{i,i+1} = \mu$. In Table 2 we give some results for a second example dealing with a continuous-time Markov chain with state space $I = \{1, 2, 3, 4, 5\}$, initial state $i = 1$, reward vector $(r_i) = (5, 2, 4, 5, 4)$ and infinitesimal transition rates



$$
(q_{ij}) = \begin{pmatrix}
\cdot & 0.1  & 0.5   & 0.25 & 0    \\
1     & \cdot & 0.5  & 0    & 0.5  \\
0     & 0.25 & \cdot & 1    & 0.5  \\
1     & 0    & 0     & \cdot & 0.5 \\
1     & 0.25 & 0     & 1    & \cdot
\end{pmatrix}.
$$
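As an illustration, the snippet below sets up the first example and evaluates the first entry of Table 1 with the hypothetical helpers sketched in Section 2; the names `lam` and `mu` stand for the failure and repair rates whose Greek symbols are reconstructed above, and the step $\Delta = 1/16$ is one of the values used in Table 1.

```python
import numpy as np

# First example: state i = number of units in operation; each operating unit fails
# at rate lam (mean operation time 10), the single repairman repairs at rate mu
# (mean repair time 1).
lam, mu = 0.1, 1.0
Q = np.zeros((4, 4))
for i in range(1, 4):
    Q[i, i - 1] = i * lam              # q_{i,i-1} = i*lam: one of the i working units fails
for i in range(3):
    Q[i, i + 1] = mu                   # q_{i,i+1} = mu: the repairman completes a repair
r = np.arange(4)                       # reward rate r_i = i
pi0 = np.array([0.0, 0.0, 0.0, 1.0])   # all three units initially in good condition

# Table 1 considers t = 75 and x = 178 (exact value 0.99105).
print(extrapolated_tail(Q, r, pi0, t=75.0, x=178.0, delta=1 / 16))
```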

Remark. The discretization algorithm needs only a minor modification when, in addition to the reward rates $r_i$, a fixed jump reward $F_{ki}$ is earned each time the Markov chain jumps from state $k$ to state $i\ (\neq k)$. The recursion then becomes
$$f_i^\Delta(u, y) = f_i^\Delta(u-\Delta,\, y - r_i\Delta)(1 - \nu_i\Delta) + \sum_{k \neq i} f_k^\Delta(u-\Delta,\, y - r_k\Delta - F_{ki})\, q_{ki}\Delta.$$


We found empirically that the error estimate for speeding up the convergence also applies in the case with fixed jump rewards.
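A corresponding code sketch of this modified recursion is given below; it reuses the structure of the hypothetical `transient_reward_tail` above and assumes the jump rewards `F[k, i]` are nonnegative and rounded to the $\Delta$-grid.

```python
import numpy as np

def transient_reward_tail_with_jumps(Q, r, F, pi0, t, x, delta):
    """Variant of transient_reward_tail with a fixed jump reward F[k, i] earned on
    every k -> i transition; F is rounded to the Delta-grid."""
    Q, F = np.asarray(Q, dtype=float), np.asarray(F, dtype=float)
    r, pi0 = np.asarray(r, dtype=int), np.asarray(pi0, dtype=float)
    N = len(r)
    nu = Q.sum(axis=1) - np.diag(Q)
    n_t = int(round(t / delta))
    # At most one transition per Delta-step, so r_max*t + F_max*(n_t - 1) bounds the reward grid.
    n_y = int(round(r.max() * t / delta)) + int(round(F.max() / delta)) * (n_t - 1)
    f = np.zeros((N, n_y + 1))
    for i in range(N):
        f[i, r[i]] = pi0[i] / delta
    for _ in range(n_t - 1):
        g = np.zeros_like(f)
        for i in range(N):
            g[i, r[i]:] = f[i, :n_y + 1 - r[i]] * (1.0 - nu[i] * delta)
            for k in range(N):
                if k == i:
                    continue
                s = r[k] + int(round(F[k, i] / delta))   # shift y by r_k*Delta + F_ki
                if s <= n_y:
                    g[i, s:] += f[k, :n_y + 1 - s] * Q[k, i] * delta
        f = g
    j0 = int(round(x / delta))
    return delta * f[:, j0 + 1:].sum()
```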
References

[1] E. de Souza e Silva, H.R. Gail, Calculating cumulative operational time distributions of repairable computer systems, IEEE Trans. Comput. 35 (1986) 322–332.
[2] E. de Souza e Silva, H.R. Gail, An algorithm to calculate transient distributions of cumulative rate and impulse rate based rewards, Stochastic Models 14 (1998) 509–536.
[3] A. Goyal, A.N. Tantawi, A measure of guaranteed availability and its numerical evaluation, IEEE Trans. Comput. 37 (1988) 25–32.