3.8.2 Expectation-Maximization (EM) Algorithm
The Expectation-Maximization (EM) algorithm was originally developed by Dempster et al. (1977) for finding the maximum-likelihood estimate of the parameters of an underlying distribution from a given data set when the data is incomplete or has missing values (Bilmes, 1998). To demonstrate this algorithm using our data, we start with the general notation of the EM algorithm given by Bilmes (1998). Assume data $y$ is observed and called the incomplete data, and let $x$ denote the hidden or missing values. A complete dataset exists, $z = (x, y)$, with the joint density function

$$p(z \mid \theta) = p(x, y \mid \theta) = p(y \mid x, \theta)\, p(x \mid \theta) \qquad (3\text{-}30)$$

Here we assume both $x$ and $y$ are continuous.
This joint density function describes the joint relationship between the missing and observed values. With this density function, we can define a likelihood function of the dataset $z = (x, y)$ as

$$L(\theta \mid z) = L(\theta \mid x, y) = p(x, y \mid \theta) \qquad (3\text{-}31)$$
which is called the complete-data likelihood. The EM algorithm consists of two steps, the first being to find the expected value of the complete-data log-likelihood $\log p(x, y \mid \theta)$ with respect to the unknown state $x$, given the observed data $y$ and the current parameter estimate $\theta^{(j)}$. We define this expected log-likelihood as

$$Q(\theta, \theta^{(j)}) = E\big[\log p(x, y \mid \theta) \,\big|\, y, \theta^{(j)}\big] \qquad (3\text{-}32)$$
where

$$E\big[\log p(x, y \mid \theta) \,\big|\, y, \theta^{(j)}\big] = \int_{X} \log p(x, y \mid \theta)\, p(x \mid y, \theta^{(j)})\, dx \qquad (3\text{-}33)$$

and where $X$ is the space of values of $x$. The second step is to find the maximum of the expectation computed in the first step, that is, to obtain the next estimate $\theta^{(j+1)} = \arg\max_{\theta} Q(\theta, \theta^{(j)})$. These two steps are repeated until $|Q^{(j+1)} - Q^{(j)}| < tol$, where $tol$ is a pre-set tolerance level.
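As a concrete illustration of the two steps and the stopping rule, the following sketch runs EM on a simple two-component Gaussian mixture. This is a hypothetical model, not the case-study model: the component weights and means are the unknown parameters, and unit variances are assumed for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic incomplete data: y is observed, the component label x is hidden.
y = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])

# Initial estimate theta^(0): mixture weights and means (unit variances assumed).
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])

tol, Q_prev = 1e-8, -np.inf
for _ in range(500):
    # E-step: responsibilities r[i, k] = P(x_i = k | y_i, theta^(j)).
    log_joint = (np.log(w)[None, :]
                 - 0.5 * (y[:, None] - mu[None, :]) ** 2
                 - 0.5 * np.log(2 * np.pi))
    r = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # Q evaluated at the current estimate: E[log p(x, y | theta) | y, theta^(j)].
    Q = float(np.sum(r * log_joint))
    # M-step: closed-form maximizers of Q for this model.
    w = r.mean(axis=0)
    mu = (r * y[:, None]).sum(axis=0) / r.sum(axis=0)
    # Stop when |Q^(j+1) - Q^(j)| < tol.
    if abs(Q - Q_prev) < tol:
        break
    Q_prev = Q

print(sorted(mu))  # one mean near 0, the other near 5
```

Because the M-step has a closed form here, each iteration is cheap; in the case-study model the maximization must be done numerically.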
Applying this to our case study, given the observed values, the complete-data likelihood has the following form, in which $p(y_i \mid x_i, \theta)$ describes the observation values and $P(x_i \mid x_{i-1}, \theta)$ describes the hidden states. Note that $X_i$ is a discrete random variable in our case. To calculate the parameter estimate from one life cycle of data using the EM approach, the algorithm is as follows:
Start: choose an initial estimate of $\theta$.

E-step: calculate the conditional expectation

$$Q(\theta, \theta^{(j)}) = \sum_{i=1}^{T} E\big[\log\{p(y_i \mid x_i, \theta)\, P(x_i \mid x_{i-1}, \theta)\} \,\big|\, \tilde{y}_i, \theta^{(j)}\big]$$

$$= \sum_{i=1}^{T} \sum_{x_i=1}^{N} \sum_{x_{i-1}=1}^{N} \log\{p(y_i \mid x_i, \theta_1)\, P(x_i \mid x_{i-1}, \theta_2)\}\, P(x_i \mid \tilde{y}_i, \theta^{(j)})\, P(x_{i-1} \mid \tilde{y}_{i-1}, \theta^{(j)})$$

where $\theta = (\theta_1, \theta_2)$ is the set of unknown parameters, $T$ is the number of monitoring checks during the cycle, $N$ is the number of states, $y_i$ is the condition monitoring reading at time $t_i$, and $\tilde{y}_{i-1} = \{y_1, y_2, \ldots, y_{i-1}\}$.

M-step: maximize $Q(\theta, \theta^{(j)})$ to obtain the next estimate, $\theta^{(j+1)} = \arg\max_{\theta} Q(\theta, \theta^{(j)})$.
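The E-step sum above can be evaluated numerically once the filtered state probabilities are available. The sketch below does this for a hypothetical three-state chain with unit-variance Gaussian readings; the transition matrix, state means, and readings are all made up for illustration, and the filtered probabilities are obtained by a simple forward recursion.

```python
import numpy as np

# Hypothetical 3-state model (states 1..N shown as array indices 0..N-1).
A = np.array([[0.90, 0.08, 0.02],      # P(x_i | x_{i-1}, theta_2), row = from
              [0.00, 0.92, 0.08],
              [0.00, 0.00, 1.00]])
means = np.array([1.0, 3.0, 6.0])      # p(y_i | x_i, theta_1): N(means[x], 1)

def emis(y):
    """p(y | x) for each state: unit-variance Gaussian (an assumption)."""
    return np.exp(-0.5 * (y - means) ** 2) / np.sqrt(2 * np.pi)

def q_value(y_seq, pi0):
    """Q(theta, theta^(j)) evaluated at theta = theta^(j) for one life cycle."""
    Q = 0.0
    alpha = pi0                         # P(x_0 | theta^(j)), initial state dist.
    for y in y_seq:
        b = emis(y)
        # joint[k, l] = P(x_{i-1}=k, x_i=l | y_1..y_i, theta^(j)) after normalizing
        joint = alpha[:, None] * A * b[None, :]
        joint /= joint.sum()
        with np.errstate(divide='ignore'):
            logterm = np.log(b[None, :] * A)   # log{p(y_i|x_i) P(x_i|x_{i-1})}
        mask = joint > 0                # zero-probability pairs contribute nothing
        Q += float(np.sum(joint[mask] * logterm[mask]))
        alpha = joint.sum(axis=0)      # filtered P(x_i | y_1..y_i, theta^(j))
    return Q

y_seq = [1.1, 0.8, 2.9, 3.2, 5.8]       # made-up monitoring readings
print(q_value(y_seq, pi0=np.array([1.0, 0.0, 0.0])))
```

Repeating this evaluation inside a numerical optimizer over $\theta$ would give the M-step; here only the E-step computation is shown.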
Similarly, if we have $m$ life cycles, the algorithm is as follows.

Start: choose an initial estimate of $\theta$.

E-step: calculate the conditional expectation

$$Q(\theta, \theta^{(l)}) = \sum_{j=1}^{m} \sum_{i=1}^{T_j} E\big[\log\{p(y_{ji} \mid x_{ji}, \theta)\, P(x_{ji} \mid x_{j,i-1}, \theta)\} \,\big|\, \tilde{y}_{ji}, \theta^{(l)}\big]$$

$$= \sum_{j=1}^{m} \sum_{i=1}^{T_j} \sum_{x_{ji}=1}^{N} \sum_{x_{j,i-1}=1}^{N} \log\{p(y_{ji} \mid x_{ji}, \theta_1)\, P(x_{ji} \mid x_{j,i-1}, \theta_2)\}\, P(x_{ji} \mid \tilde{y}_{ji}, \theta^{(l)})\, P(x_{j,i-1} \mid \tilde{y}_{j,i-1}, \theta^{(l)})$$

where $m$ is the number of complete life cycles, $\theta = (\theta_1, \theta_2)$ is the set of unknown parameters, $T_j$ is the number of monitoring checks of the $j$th cycle, $y_{ji}$ is the condition monitoring reading at $t_i$ for the $j$th cycle and $i$th monitoring, $x_{ji}$ and $x_{j,i-1}$ are the states of the system at $t_i$ and $t_{i-1}$ for the $j$th cycle and $i$th monitoring respectively, and $\tilde{y}_{j,i-1} = \{y_{j1}, y_{j2}, \ldots, y_{j,i-1}\}$. (The current estimate is written $\theta^{(l)}$ here to avoid a clash with the cycle index $j$.)

M-step: maximize $Q(\theta, \theta^{(l)})$ to obtain the next estimate, $\theta^{(l+1)} = \arg\max_{\theta} Q(\theta, \theta^{(l)})$.
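Moving from one cycle to $m$ cycles only changes the outer summation: the per-cycle contributions add, since cycles are treated as independent given $\theta$. A minimal sketch, with a toy stand-in for the single-cycle E-step computation:

```python
# Pooling m life cycles: because cycles are assumed independent given theta,
# the multi-cycle Q is just the sum of single-cycle Q values.
def q_total(cycles, q_single):
    # cycles: one observation sequence per life cycle; lengths T_j may differ.
    return sum(q_single(y_seq) for y_seq in cycles)

# Toy stand-in for the single-cycle computation (hypothetical, for shape only).
def toy_q(y_seq):
    return -sum(y * y for y in y_seq)

print(q_total([[1.0, 2.0], [3.0]], toy_q))  # -> -14.0
```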
Applying this to our simulated data without using failure information, we obtained the
results in Table 3-4 below.
  Estimated value        True value
  0.2033                 0.2176
  3.1886                 4.0
  $\hat{b}$ = 0.8013     $b$ = 0.8
  0.0239                 0.05
  not available          0.025

Table 3-4: The estimated parameters and true values using $Q(\theta, \theta^{(j)})$
We expected the EM algorithm to perform better since, in theory, we cannot observe the hidden state and the algorithm is guaranteed to converge to a local maximum of the likelihood function (Dempster et al., 1977). Surprisingly, the EM procedure explained above produces almost the same results as the ordinary likelihood function without the failure information; see Table 3-2. It appears that the complete likelihood function in the E-step is not sufficient to describe the data. Therefore, failure information was added to the E-step function, as follows:
$$Q(\theta, \theta^{(j)}) = \sum_{i=1}^{T} E\big[\log\{p(y_i \mid x_i, \theta)\, P(x_i \mid x_{i-1}, \theta)\} \,\big|\, \tilde{y}_i, \theta^{(j)}\big] + \log P(x_{T+1} = 3 \mid \theta) \qquad (3\text{-}34)$$

where $t_{T+1}$ is the failure point. When the same procedure was reapplied for $m$ sets of data, it generated a set of new estimates, shown in Table 3-5 below.
  Estimated value        True value
  0.2032                 0.2176
  3.1886                 4.0
  $\hat{b}$ = 0.8013     $b$ = 0.8
  0.0239                 0.05
  0.016                  0.025

Table 3-5: The estimated parameters and true values with modified $Q(\theta, \theta^{(j)})$
Again, the EM procedure produces almost the same results as the ordinary likelihood function with failure information. This is because the approach taken to formulate the EM algorithm is almost the same as for the ordinary likelihood function, except that we multiplied it by $P(x_i \mid x_{i-1})$ in equation (3-34) to boost the optimisation calculation. We were able to estimate the parameter of $l_1$ simply because we had the information about when the time $l_1$ began and ended. In contrast, we were unable to estimate the parameter of $l_2$ because we did not have enough information concerning when the time $l_2$ ended. However, the failure information provides new data on the ending of the time $l_2$, allowing that parameter to be estimated. In practice, failure information is difficult to obtain because of preventive replacements, but expert judgments (Zhang, 2004) can be used to overcome this problem.
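The failure-information term added in equation (3-34) is cheap to compute: it needs only the filtered state distribution at the last check and the transition probabilities into the failed state. A sketch, with all numbers hypothetical:

```python
import numpy as np

# Failure term in (3-34): after the last check T the item is known to have
# failed by t_{T+1}, i.e. to have entered the failed state (state 3 here).
A = np.array([[0.90, 0.08, 0.02],      # P(x_i | x_{i-1}); row = from, col = to
              [0.00, 0.92, 0.08],
              [0.00, 0.00, 1.00]])     # state 3 (index 2) is absorbing failure

alpha_T = np.array([0.10, 0.85, 0.05])  # filtered P(x_T | y_1..y_T, theta^(j))
p_fail = float(alpha_T @ A[:, 2])       # P(x_{T+1} = 3 | data, theta) = 0.12 here
failure_term = np.log(p_fail)           # the extra term added to Q(theta, theta^(j))
print(p_fail, failure_term)
```

Because this term depends on the transition parameters, it supplies the information about the end of $l_2$ that the unmodified $Q$ lacks.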
3.9 Goodness-of