
Int. J. Production Economics 67 (2000) 53–61

Calculating the expected failure rate of complex equipment
subject to hazardous repair
Rose D. Baker*
Centre for Operational Research and Applied Statistics, University of Salford, Salford, M5 4WT, UK

Abstract
In hazardous repair of a repairable system, there is a strong probability of further failure during a period immediately
following the repair. To model failures of such systems under failure-based maintenance, a class of point process models is
proposed in which the failure intensity is a function only of equipment age and of time since most recent repair. Cost
modelling requires the computation of the expected number of failures by age t as a function of equipment age. This
expected number is shown to satisfy a Volterra (integral) equation of the second kind that generalises the renewal
equation. A solution algorithm is given. A simple (four-parameter) model from this class is introduced, and the
methodology is exemplified using a database of failures of non-maintained medical equipment. © 2000 Elsevier Science
B.V. All rights reserved.
Keywords: Renewal equation; Failure rate; General repair; Poisson process; Repairable systems

1. Introduction
Mathematically, the failures of a repairable system such as an ECG monitor are most naturally
described using the failure intensity o. This is the instantaneous rate of failure of a system at age
t given its previous history of failures and maintenance interventions. In this paper the equipment
considered was not subject to preventive maintenance, so that the failure intensity was a function of
equipment age and failure history only.
Failure intensity usually increases with age for mechanical equipment, but may decrease with age
for electronic equipment or software, where defects are gradually weeded out (e.g. [1]). Models of
failure intensity such as the power-law and loglinear Poisson processes are often used. The
distinguishing feature of Poisson processes is that the previous history of failure times
$t_1, \ldots, t_n$ does not affect failure intensity.
However, in practice failure repair does sometimes affect failure rate. In the extreme case of
a system comprising a single component that is replaced on failure, such as a lightbulb, the
interfailure periods constitute a renewal process, and one can imagine that for more complex systems
each repair might replace and so renew a proportion of the components. This would usually (but
not inevitably) lead to a decrease of failure intensity following repair. Brown and Proschan [2]
developed an imperfect repair model, in which a repair renewed the system with probability p, while
with probability $1 - p$ it had no effect on failure intensity.

* Fax: +44-0161-280-6783.
E-mail address: r.d.baker@dial.pipex.com (R.D. Baker).

0925-5273/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII: S0925-5273(00)00009-8


A common experience with systems encountered in daily life, such as photocopiers, is that the system
has an increased intensity of failure just after failure repair. Either as a result of the repair
fresh problems were created, or the repair was incomplete and did not really solve the original
problem.
With the limited data that are typically available, it is not possible to fit complex models that
include the whole history of repair. In general, the most recent repair should have a larger effect
on current failure rate than earlier repairs. A model in which failure intensity is a function only
of system age and of time since last repair is therefore attractive. These variables can equivalently
be taken as system age and age at last repair.
Such models generalise both Poisson processes and renewal processes, and so can be used to
approximate the effect of repair on failure rate, whether it is due to component replacement or to
hazardous or imperfect repair. These models can be more pessimistic than the Brown and Proschan
model in that 'imperfect repair' is here allowed to mean that the repair might even increase failure
rate over that arising from a minimal repair.
This paper shows how to carry out computations for calculating optimum age-based replacement
policies under such models. For completeness, the other steps needed in the fitting of such models to
data are also (very) briefly summarised. Without an exploratory analysis of data, and model fitting
and validation, the functional forms and parameter values of an appropriate model would be unknown.
However, those steps are not the intended focus of this paper. A much fuller account of them is given
in a companion paper [3].
Before going further, it is necessary to give a more precise definition of $o(t \mid H_t)$, the
intensity of failure at age t, where $H_t$ denotes the history of the failure process (number and
timing of failures) to age t. $N(t \mid H_t)$ is the number of failures occurring by age t, and the
intensity is defined as

$$o(t \mid H_t) = \lim_{\delta \to 0^+} \delta^{-1}\,\mathrm{Prob}\{N(t + \delta \mid H_t) - N(t \mid H_t) = 1\}.$$

In the limit as $\delta \to 0$, given an 'orderly' process in which two or more failures never occur
in the same infinitesimal interval dt, $o(t \mid H_t)\,dt$ tends to the probability of a failure
occurring in the infinitesimal interval (t, t + dt], given the previous history of the process up to
age t. Then $o(t \mid H_t)\,dt$ is the probability that a single failure occurs in the infinitesimal
interval dt, and so is also the expected number of failures occurring in dt. Hence, $o(t \mid H_t)$ is
the expected failure rate at time t given the previous history of failures.
For a Poisson process, the intensity $o(t \mid H_t)$ is numerically equal to the expected rate of
occurrence of failures (ROCOF) l(t). Although o(t) is the quantity that is estimated from data, it is
l(t) that is required for replacement calculations. Note that l(t), being an expected rate, is not
conditional on the failure history. Ascher and Feingold [1] or Thompson [4] discuss these concepts in
more detail.
The class of failure intensity models proposed here is $o = o(u, t)$, where u is the system age at
the most recent failure. A simple heuristic model of this form has

$$o(u, t) = ab(at)^{b-1}\{1 + c\exp(-g(t - u))\}, \tag{1}$$

where a, b, c and g are positive constants. Immediately after repair, at age $u^+$, the failure
intensity is scaled up above the power-law process level at age $u^-$ by a factor of $1 + c$, which
then decays back to unity. If c is negative, the model gives a decreased failure intensity after
repair. Note that we require $c \ge -1$ so that $o \ge 0$. It is also possible to generate
many other plausible models from this class.
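As a concrete reading of Eq. (1), the following minimal Python sketch evaluates the intensity just after a repair and long after it. The function name and the parameter values are assumptions made for illustration; they are not taken from the paper.

```python
import math

def intensity(u, t, a, b, c, g):
    """Failure intensity o(u, t) of Eq. (1) for current age t and last-repair age u <= t."""
    if c < -1.0:
        raise ValueError("c must be >= -1 so that the intensity stays non-negative")
    baseline = a * b * (a * t) ** (b - 1.0)          # power-law process level at age t
    return baseline * (1.0 + c * math.exp(-g * (t - u)))

# Hypothetical parameter values, for illustration only.
a, b, c, g = 0.5, 1.5, 2.0, 4.0
just_after = intensity(3.0, 3.0, a, b, c, g)   # t = u: baseline scaled by 1 + c
long_after = intensity(3.0, 30.0, a, b, c, g)  # repair effect has decayed away
```

With these values the intensity at t = u is three times the power-law level, and by age 30 it is indistinguishable from the baseline.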
Lawless and Thiagarajah [5] discuss models of type $o = \exp(h^T z(t))$, where h is a vector of
parameter values and z(t) is a function of t and H(t). Models of this type do not represent repair
effects well. By taking one element of z as a Heaviside step function one could obtain models in which
the failure intensity would jump either up or down after repair, but the modification to the
'baseline' intensity would not then attenuate with time.
Their formalism is more general than that presented here in that it allows the full history of
repairs to affect intensity, but is less general in that their intensity is a product of factors, so
that for example the model of Eq. (1) is not of their type.
Calabria and Pulcini [6] discuss the 3-parameter modulated power-law process (MPLP), in which
a system experiences a power-law process of shocks, and $k > 0$ shocks cause a failure, where
k need not be an integer. This model is of the class proposed here, and can model either hazardous
($k < 1$) or partially renewing repairs.
With only three parameters, however, the duration and size of any repair effect cannot both be
arbitrary. A realistic model needs two parameters to specify the 'baseline' Poisson failure process,
then a further two to parameterise the repair effect: one to specify its size, and one to specify its
duration.
Under an age-based replacement policy we need to compute the optimum replacement age. With
discounting of costs at rate c, the sum of money D(T, c) needed to finance initial system purchase at
cost C and expected repairs to replacement at time T is

$$D(T, c) = C + b\int_0^T l(u)\exp(-cu)\,du, \tag{2}$$

where l(u) is the ROCOF introduced earlier and b is the mean repair cost. Costs for the replacement
system are all downweighted by a factor $\exp(-cT)$, costs for the second replacement by a factor
$\exp(-2cT)$ and so on, so that the sum needed to cover all future costs, given an infinite series of
replacements at age T, is

$$S(T) = \frac{D(T, c)}{1 - \exp(-cT)} = \frac{C + b\int_0^T l(u)\exp(-cu)\,du}{1 - \exp(-cT)}$$

(see e.g. [7,8] for a discussion of discounting).
For small values of the discounting rate c,

$$S(T) \simeq \frac{C + b\int_0^T l(u)\,du}{cT} = \frac{D(T, 0)}{cT},$$

so that as $c \to 0$, $cS(T) \to D(T, 0)/T$, and the optimum age-based replacement policy that
minimises the discounted sum S(T) becomes equivalent to the policy that minimises cost per unit time
D(T, 0)/T.
To calculate either S(T) or D(T, c)/T one must compute the ROCOF l(t) first.
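The cost quantities above can be evaluated numerically once a ROCOF is available. The sketch below is illustrative only: the function names, the trapezoidal quadrature and the constant ROCOF of 0.2 failures per unit time are assumptions of the example, chosen so that the result can be checked against the closed form.

```python
import math

def discounted_cost(rocof, T, rate, purchase_cost, repair_cost, n=2000):
    """D(T, c) of Eq. (2), with the integral taken by the trapezoidal rule on n steps."""
    h = T / n
    total = 0.0
    for k in range(n + 1):
        u = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * rocof(u) * math.exp(-rate * u)
    return purchase_cost + repair_cost * h * total

def perpetual_cost(rocof, T, rate, purchase_cost, repair_cost, n=2000):
    """S(T) = D(T, c) / (1 - exp(-c T)); requires rate > 0."""
    d = discounted_cost(rocof, T, rate, purchase_cost, repair_cost, n)
    return d / (1.0 - math.exp(-rate * T))

def constant_rocof(u):
    return 0.2   # hypothetical: 0.2 failures per unit time at every age

T, C, b = 5.0, 100.0, 10.0                                       # hypothetical costs and age
d_disc = discounted_cost(constant_rocof, T, 0.05, C, b)          # D(T, c) with c = 0.05
cost_rate = discounted_cost(constant_rocof, T, 0.0, C, b) / T    # D(T, 0) / T
scaled = 1e-6 * perpetual_cost(constant_rocof, T, 1e-6, C, b)    # c S(T) for small c
```

For a very small discount rate, `scaled` agrees closely with `cost_rate`, which illustrates the limit $cS(T) \to D(T, 0)/T$.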


A direct way to compute l(t) and its integral

$$M(t) = \int_0^t l(u)\,du \tag{3}$$

for an assumed o(u, t) is by simulation. A large number of realizations of the point process are
generated, and M(t), the expected cumulative number of failures by age t, read off as the sample
average number of failures per system by time t. To simulate a realization of the failure process,
after each failure, one generates a random time to the next failure.
The pdf of failure at time t after a repair at time u is

$$f(u, t) = o(u, t)\exp\left(-\int_u^t o(u, v)\,dv\right), \tag{4}$$

and for convenience define $f(u, t) = 0$ when $u > t$. This follows because o(u, v), besides being
a failure intensity, is also numerically equal to the hazard function for the first failure at time
$v > u$. Eq. (4) is a standard equation relating the hazard function to the pdf, but where here the
time origin is the last repair time u rather than zero.
The all-purpose 'inverse method' of random variate generation is to generate a random number
X from the uniform distribution to correspond to the distribution function
$F(u, t) = \int_u^t f(u, v)\,dv$, and to then solve the equation $F(u, Y) = X$ to obtain the required
random variate Y. For the model in Eq. (1), this nonlinear equation involves the incomplete gamma
function, and can be solved by Newton-Raphson iteration.
Simulation, which seems at first to be a simple and direct method of calculation, is in fact more
difficult to program than the method developed below. In addition, accuracy increases only slowly
with the number of realizations.
Much previous work on 'general repair' has used virtual age models, in which repair changes the
virtual or effective age of a system, and integral equations for M(t) have been derived [9,10]. Baker
and Wang [11] fitted such an age-reduction model. Other models also lead to renewal-type
equations [12].
The next section derives an integral equation for the ROCOF l(t) and its integral M(t) under the
model proposed here.
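For readers who wish to try the simulation route described earlier in this section, the inverse method can be sketched as follows. This is an illustration under stated assumptions, not the author's program: the cumulative hazard is integrated by the trapezoidal rule instead of through the incomplete gamma function, bisection is used in place of Newton-Raphson iteration, the parameter values are hypothetical, and b >= 1 is assumed so that the intensity is finite at age zero. It relies on the identity $F(u, t) = 1 - \exp(-\int_u^t o(u, v)\,dv)$, which follows from the pdf of failure after a repair at age u.

```python
import math
import random

def intensity(u, t, a, b, c, g):
    """Eq. (1); u is the age at the last repair, t >= u the current age."""
    return a * b * (a * t) ** (b - 1.0) * (1.0 + c * math.exp(-g * (t - u)))

def cum_hazard(u, t, params, n=400):
    """Integral of o(u, v) for v from u to t, by the trapezoidal rule."""
    h = (t - u) / n
    total = 0.5 * (intensity(u, u, *params) + intensity(u, t, *params))
    for k in range(1, n):
        total += intensity(u, u + k * h, *params)
    return total * h

def next_failure_age(u, x, params, tol=1e-9):
    """Solve F(u, Y) = x for Y, using F = 1 - exp(-cum_hazard)."""
    target = -math.log(1.0 - x)
    lo, hi = u, u + 1.0
    while cum_hazard(u, hi, params) < target:    # widen until the root is bracketed
        hi = u + 2.0 * (hi - u)
    while hi - lo > tol:                         # bisection on the cumulative hazard
        mid = 0.5 * (lo + hi)
        if cum_hazard(u, mid, params) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(T, params, rng):
    """One realization: ages at failure up to age T, starting from age 0."""
    ages, u = [], 0.0
    while True:
        u = next_failure_age(u, rng.random(), params)
        if u > T:
            return ages
        ages.append(u)

params = (0.5, 1.5, 2.0, 4.0)                     # hypothetical (a, b, c, g)
ages = simulate(10.0, params, random.Random(1))
```

Averaging the number of failures by age t over many such realizations gives the simulation estimate of M(t). Setting c = 0 reduces the model to a power-law process, for which the solved failure age can be compared with the closed form.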


2. A generalised renewal equation

Since the quantity l(t) dt is also the unconditional probability of a failure occurring in
(t, t + dt), it follows that l(t) is the density function of the first, second, etc., failure
occurring at t. Hence,

$$l(t) = f(0, t) + \int_0^t f(0, u) f(u, t)\,du + \int_0^t\!\!\int_0^t f(0, u) f(u, v) f(v, t)\,du\,dv + \cdots, \tag{5}$$

i.e., an infinite sum.
This equation looks simpler on discretizing the interval (0, t) into n steps of $\Delta = t/n$, when
f(u, v) becomes $f_{ij}$, where $u = it/n$, $v = jt/n$, and l(t) becomes

$$l_n = \sum_{i=1}^{\infty} \Delta^{i-1} (f^i)_{0n}, \tag{6}$$

where $f^i$ is the ith power of the matrix f.
On rewriting Eq. (5) as an equation for l(u), forming l(u) f(u, t) and integrating over all possible
last failure times u, we have

$$\int_0^t l(u) f(u, t)\,du = \int_0^t f(0, u) f(u, t)\,du + \int_0^t\!\!\int_0^t f(0, u) f(u, v) f(v, t)\,du\,dv + \cdots = l(t) - f(0, t),$$

giving

$$l(t) = f(0, t) + \int_0^t l(u) f(u, t)\,du. \tag{7}$$

This is an integral equation for l(t), very similar to the renewal equation, except that, for a
renewal process, the general function of u and t appearing here reduces to $f(t - u)$.
The effect of repair on intensity may well be absent before the first failure, because the system
starts from 'new' rather than from a repaired state. For example, a simple power-law intensity
$o(t) = ab(at)^{b-1}$ could be used instead of Eq. (1), and Eq. (7) must then be modified. With pdf
of first failure $f_0(0, t)$, Eq. (5) becomes

$$l(t) = f_0(0, t) + \int_0^t f_0(0, u) f(u, t)\,du + \int_0^t\!\!\int_0^t f_0(0, u) f(u, v) f(v, t)\,du\,dv + \cdots, \tag{8}$$

and on proceeding as before this yields the modified equation

$$l(t) = f_0(0, t) + \int_0^t l(u) f(u, t)\,du. \tag{9}$$

Note that it is possible to derive a second integral equation, which corresponds to premultiplying
Eq. (5) by f rather than postmultiplying, which is

$$l(0, t) = f(0, t) + \int_0^t f(0, u)\,l(u, t)\,du$$

in an obvious notation, but this equation does not lend itself to recursive solution as does
Eq. (7). It cannot be derived when the pdf of first failure is anomalous, $f_0 \ne f$.
On integrating Eq. (7) and changing the order of double integration,

$$M(t) = \int_0^t l(v)\,dv = F(0, t) + \int_0^t F(u, t)\,l(u)\,du,$$

or

$$M(t) = F(0, t) - \int_0^t M(u)\,dF(u, t).$$

The starting point for numerical calculation of l(t) and M(t) is however Eq. (7), in conjunction with
Eq. (3). It follows from the theory of Poisson processes that when $o(u, t) = o(t)$, so that we have
a Poisson process, $l(t) = o(t)$ is a solution of Eq. (7). As the solution of Volterra equations of
the second kind is unique (e.g. from Eq. (5)), $l(t) = o(t)$ must be the solution. The survival
function corresponding to the hazard function o(t) with last repair at time u is
$(1 - F(0, t))/(1 - F(0, u))$, and so the pdf is $f(u, t) = o(t)(1 - F(0, t))/(1 - F(0, u))$. That
$l(t) = o(t)$ solves Eq. (7) can be demonstrated by substituting it, together with this pdf, into
the right-hand side.
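The Poisson special case also gives a convenient numerical check. The sketch below discretizes Eq. (7) with the trapezoidal rule and compares the result with the power-law intensity. The function names, the grid, and the parameter values a = 0.5, b = 2 are assumptions of this example, and the pdf is evaluated in closed form using $1 - F(0, t) = \exp(-(at)^b)$ for the power-law process.

```python
import math

def solve_rocof(pdf, T, n):
    """Trapezoidal solution of l(t) = pdf(0, t) + integral_0^t l(u) pdf(u, t) du."""
    d = T / n
    t = [i * d for i in range(n + 1)]
    l = [pdf(0.0, 0.0)]                           # l(0) = f(0, 0)
    for i in range(1, n + 1):
        f0i = pdf(0.0, t[i])
        acc = f0i + 0.5 * d * l[0] * f0i          # j = 0 end point, weight 1/2
        for j in range(1, i):
            acc += d * l[j] * pdf(t[j], t[i])     # interior points, weight 1
        # the j = i end point contains the unknown l[i]; move it to the left-hand side
        l.append(acc / (1.0 - 0.5 * d * pdf(t[i], t[i])))
    return t, l

A, B = 0.5, 2.0                                   # hypothetical power-law parameters

def power_law_intensity(t):
    """Poisson intensity o(t) = ab(at)^(b-1)."""
    return A * B * (A * t) ** (B - 1.0)

def poisson_pdf(u, t):
    """f(u, t) = o(t) (1 - F(0, t)) / (1 - F(0, u)) for the power-law process."""
    return power_law_intensity(t) * math.exp(-((A * t) ** B - (A * u) ** B))

t, l = solve_rocof(poisson_pdf, T=3.0, n=300)
max_err = max(abs(li - power_law_intensity(ti)) for ti, li in zip(t, l))

d = t[1] - t[0]
M_T = d * (0.5 * l[0] + sum(l[1:-1]) + 0.5 * l[-1])   # trapezoidal M(T), as in Eq. (3)
```

On this grid the computed ROCOF should agree with o(t) to within the discretization error, and `M_T` should be close to the exact value $(aT)^b = 2.25$.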


3. Computation of cost functions

The general solution developed here is an adaptation of the method described in [13]. Setting
t = 0 in Eq. (7), we have $l(0) = f(0, 0)$, and on discretizing, $l_0 = f_{00}$.
Carrying out integrals by the trapezoidal rule incurs an error of $O(\Delta^2)$, and Eq. (7) becomes

$$l_i = f_{0i} + \Delta l_0 f_{0i}/2 + \Delta l_i f_{ii}/2 + \Delta \sum_{j=1}^{i-1} l_j f_{ji},$$

so that $l_i$ is found recursively as

$$l_i = \frac{f_{0i} + \Delta l_0 f_{0i}/2 + \Delta \sum_{j=1}^{i-1} f_{ji} l_j}{1 - \Delta f_{ii}/2}.$$

The pdf f is similarly discretized, with $f_{jj} = o_{jj}$ and

$$f_{ji} = o_{ji} \exp\left(-\Delta\left(o_{jj}/2 + o_{ji}/2 + \sum_{k=j+1}^{i-1} o_{jk}\right)\right). \tag{10}$$

It is not necessary to compute and store the full matrix f before use, as elements $f_{ji}$,
$j \le i$, can be computed for i from 1 to n, by storing the exponent of Eq. (10) and the hazards
$o_{j,i-1}$, and updating the lower triangle from i - 1 to i.
Again, M is found as the integral of l by discretizing Eq. (3) as

$$M_i = \Delta\left(l_0/2 + \sum_{j=1}^{i-1} l_j + l_i/2\right).$$

Finally, accuracy can be greatly improved by Richardson extrapolation, in which

$$M \to (4M_{\Delta} - M_{2\Delta})/3.$$

The quantity of interest l or M must then also be calculated at twice the step length, which requires
little extra computation if done in parallel with the original calculation. Use of this
extrapolation is equivalent to use of Simpson's rule, and produces an error of order $O(\Delta^4)$.
The algorithm with n time steps requires time of $O(n^2)$ to execute, and storage arrays of size
O(n).

4. Example

An inventory and maintenance database of medical equipment serviced by a large teaching
hospital was studied. Besides purchase cost, the main data were ages of equipment at failure.
Several items of equipment subject to failure maintenance only, such as ECG monitors, showed an
increased failure rate for some time after repair.
In general, before a cost function can be plotted as a function of replacement age T, there is a need
for at least the following steps:

1. an exploratory analysis of the data, that will indicate a suitable failure rate model;
2. fitting of a model or models to data;
3. some assurance that the model fit is acceptable.

It is not the purpose of this paper to discuss these issues, so the methodology used is presented
only briefly.
Aalen plots [14] provide a non-parametric estimate of the cumulative failure intensity of repairable
systems as a function of machine age. This can be written as

$$A(t) = \sum_{i=1}^{m(t)} 1/Y(s_i), \tag{11}$$

where Y(s) is the number of systems operational at age s, $s_i$ is the system age at the ith failure
and m(t) the total number of failures by age t. Fig. 1 shows the Aalen plot for ECG monitors.
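A minimal sketch of Eq. (11) is given below. The function name and the three-system data set are hypothetical, and the sketch assumes that every system is operational from age 0 until the age at which its observation ends, so that Y(s) is simply the number of systems still under observation at age s.

```python
def aalen_estimate(failure_ages, end_ages):
    """Return [(s_i, A(s_i))] following Eq. (11).

    failure_ages: one list of ages at failure per system
    end_ages: age at which observation of each system stopped
    """
    events = sorted(s for ages in failure_ages for s in ages)
    points, total = [], 0.0
    for s in events:
        at_risk = sum(1 for e in end_ages if e >= s)   # Y(s): systems observed at age s
        total += 1.0 / at_risk
        points.append((s, total))
    return points

# Hypothetical data: three systems, observed to ages 5.0, 3.0 and 2.5.
failure_ages = [[1.0, 4.0], [2.0], []]
end_ages = [5.0, 3.0, 2.5]
estimate = aalen_estimate(failure_ages, end_ages)
```

Plotting the second element of each pair against the first gives the step function A(t).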
Besides a plot of failure intensity as a function of age, it was also necessary to plot the hazard of
the next failure as a function of time from repair. Eq. (11) still applies, except that the
'lifetime' t now becomes the failure interarrival period. Fig. 2 shows this plot for ECG monitors.
Models were fitted by the method of maximum likelihood, where the likelihood function is
L =