67 are in a defective state. This suggests that our model performed well, reaching nearly
the same conclusion. The full comparison of results using the SPC chart method and the model developed is shown in Section 4.4.
4.3 Testing the Model
One way to check the model‘s fit is by simulating and generating a new dataset y~ and comparing it with the observed data
y
, as a graphical comparison, which is usually a good initial indicator. The simulation of y
~ is carried out by substituting the estimated parameters from the observed data into the model
|
i i
x P
, which is used to identify the initialisation of stage 2 and the duration of time
1
l . Since the failure is also observed from the data, we can calculate the time
2
l . Knowing
1
l ,
2
l and estimated parameters in the model, we could run the simulation that generates the simulated y
~ . See Figure 3.5
for the algorithm. If the matching between the observed
y
and simulated y
~ is good, then we may say that our model is reasonable to explain the hidden process. Thus, the
Pearson product moment correlation coefficient was calculated, to assess whether there was a relationship between these two variables and to see how well the model explained
the observed data. Here, we show two plots from our test datasets, which attempt to compare the actual vibration from observation and simulated vibration based on
estimated parameters.
Figure 4-1: Case 1: Simulated and observed vibration levels for Gu-b3
68
Figure 4-2: Case 2: Simulated and observed vibration levels for Gu-b4
From these graphical representations, Figure 4-1 and Figure 4-2 it is sufficient to say
that the model fits well with the observed data. To see the significance level of this relationship, we set a value for the confidence level, normally 95 or 99 Albright et
al., 1999. The result of this comparison is shown in Table 4-5 below.
Gu-b3 Gu-b4
df n-2 7
20
2
R 0.958601 0.823308
95 Critical Values – Pearson
Product Moment
Correlation Coefficient
0.666 0.423
Table 4-5: Values for Pearson product moment correlation coefficient
The null hypothesis for the Pearson product moment correlation coefficient is
r
, indicating no correlation within variables. In the above cases, the critical value at 0.05
confidence level and
2
n
degrees of freedom is greater than those from the table; therefore the hull hypothesis is rejected, which implies that there is a correlation
between the observed and fitted data. Furthermore, we could see from Table 4-5 that
the observed data for Gu-b3 and Gu-b4 represents 95 and 82 close to the model developed. With this high percentage of correlations, we could say that the model fits
the data well.
69 The other test to determine how well the model actually reflects the data is by running a
goodness-of-fit test. One statistical test that addresses this issue is the chi-square,
2
, goodness-of-fit test, which determines whether there is a significant difference between
the sample data and the population distribution. We first formulated the conservative hypothesis, called the null hypothesis
H
, which states that there is no difference between the sample and the population distribution. The test procedure begins by
arranging the n observations into a set of
k
class intervals or cells. The test statistic is given by
2
k i
i i
i
np np
N
1 2
4-1 where
i
N is the number of observations included in category
i
,
i
p is the probability that the random variable X of the population falls into category i and
i
np
is the expected value of category
i
. The
2
statistic is then compared with the critical value of
2
distribution with the degree of freedom
1
k
. If the critical value of
2
distribution is greater than the
2
statistic, we can say that the observed data follows the theoretical distribution.
However, we cannot do this with this type of data as at each monitoring point we had only one observation. As an alternative, to divide the categories of data we could choose
intervals with equal probability Law and Kelton, 2000. To carry out the chi-square test, we could transform
|
1
i i
y p
into an equal probability,
M 1
, which will follow a
uniform distribution; see Figure 4-3 below.
70
Figure 4-3: Partition the pdf of
|
1
i i
y p
into equal probability and transform it to uniform distribution
Thus, we partition
|
1
i i
y p
into
M
cells so that each cell has an equal probability, and then we examine which cell each observation fell in and count the number of
observations in each cell. Hence, we can test whether the newly transformed data follows the uniform distribution. As an example, Zhang 2004 specified that the
intervals should have equal probability measures, when testing the conditional residual time CRT model developed for oil analysis. The difficulty with carrying out a chi-
square test is that there is no ideal recommendation for choosing the number and size of the intervals,
k
, which is guaranteed to produce good result in terms of validity Law and Kelton, 2000. However, a few guidelines can be followed to ensure the validity of
the test, such as the one given by Law and Kelton 2000 where for equal probability intervals, the validity condition can only be satisfied if
3
k
and
5
i
N
for all
i
. Applying all the considerations to satisfy the validity of the test, we choose the number
of intervals = 4 with the size of equal probability 0.25. The computations of this test are
given in Table 4-6 below, using two datasets Gu-b3 and Gub4, which have 31
observation points.
71 Number of
Interval
i
N
i
p
i
Np
i i
Np N
i i
i
Np Np
N
2
1 13
0.25 7.75
5.25 3.556452
2 5
0.25 7.75
-2.75 0.975806
3 5
0.25 7.75
-2.75 0.975806
4 8
0.25 7.75
0.25 0.008065
Total 31
1 5.516129
Table 4-6: Calculating the goodness-of-fit test
The value of the test statistic is
5161 .
5
2
. Referring to the chi-square table, we see
that
815 .
7
2 95
. ,
3
, which is not exceeded by the critical value of
2
, so we would not reject the null hypothesis at
05 .
. Thus, this test gives us a reason to conclude that
the sample is from the population distribution.
The next stage was to use the data to calculate the expected time to stage 2 and calculate the expected time to failure, based upon the model developed in Chapter 3. We first
model the problem with the expected time to the initial point of the second stage that is
equation 3-36. The numerical results are shown in Table 4-7 below.
time vibr
| 1
i i
X P
i i
t t
i
dl l
f dl
l f
t l
1 1
1 1
1 1
1
, |
,
1 1
i i
i i
t L
x t
L E
9 1.6012
0.953525 112.359
107.14 25.5
1.7306 0.944808
112.359 106.16
35 1.8917
0.968074 112.359
108.77 50
2.4231 0.958727
112.360 107.72
57 2.5086
0.974952 112.360
109.55 74
3.4853 0.664153
112.360 74.62
85 5.3865
5.14E-08 112.360
0.00
102 13.0421 9.04E-19
112.360 0.00
105 23.7722 9.07E-19
112.360 0.00
Table 4-7:
, |
,
1 1
i i
i i
t L
x t
L E
with exponential distribution for Gu-b3
From the table, the conditional expected value of
1
L given
i
t L
1
is constant at every monitoring check. This is because
1 1
l f
is assumed to be an exponential distribution, which has the memoryless property indicates that the remaining lifetime of a component
is independent of its current age. This situation can be resolved if a different distribution
72 is used to replace the exponential distribution
1 1
l f
. As an example, we chose the Weibull distribution with known
and , and re-calculated the conditional expected value of
1
l given
i
t l
1
. This result is given in Table 4-8 below
time vibr
| 1
i i
X P
i i
t t
i
dl l
f dl
l f
t l
1 1
1 1
1 1
1
, |
,
1 1
i i
i i
t L
x t
L E
9 1.6012
0.953525 72.665
69.29
25.5 1.7306
0.944808 56.552
53.43 35
1.8917 0.968074
47.895 46.37
50 2.4231
0.958727 35.825
34.35 57
2.5086 0.974952
30.996 30.22
74 3.4853
0.664153 21.481
14.27
85 5.3865
5.14E-08 16.861
0.00
102 13.0421 9.04E-19
11.638 0.00
105 23.7722 9.07E-19
10.915 0.00
Table 4-8:
, |
,
1 1
i i
i i
t L
x t
L E
with Weibull distribution for Gu-b3
Next, we calculated the expected time to failure as stated in equation 3-37. A similar problem also occurs if we use an exponential distribution to calculate the conditional
expected value for time
1
L and
2
L , given
i
t L
L
2 1
, as shown in Table 4-9
i i
t i
t
dl l
f dl
dl l
f t
l l
l f
1 1
1 1
2 2
2 1
2 1
1
i i
i i
t l
t l
t i
t
dl dl
l f
l f
dl dl
l f
t l
l l
f
1 2
2 2
1 1
1 2
2 2
1 2
1 1
1 1
, |
,
1 2
1 2
i i
i i
t L
L x
t L
L E
219.891 107.527
214.67 219.891
107.527 213.69
219.891 107.527
216.30 219.893
107.527 215.26
219.893 107.527
217.08 219.893
107.527 182.16
219.893 107.527
107.53 219.893
107.527 107.53
219.893 107.527
107.53
Table 4-9:
, |
,
1 2
1 2
i i
i i
t L
L x
t L
L E
with exponential distribution for Gu-b3
Thus, if we choose others distributions, e.g. Weibull, with known and , we could
obtain different results such as shown in Table 4-10 below:
73
i i
t i
t
dl l
f dl
dl l
f t
l l
l f
1 1
1 1
2 2
2 1
2 1
1
i i
i i
t l
t l
t i
t
dl dl
l f
l f
dl dl
l f
t l
l l
f
1 2
2 2
1 1
1 2
2 2
1 2
1 1
1 1
, |
,
1 2
1 2
i i
i i
t L
L x
t L
L E
150.80 76.34
147.34
134.69 73.04
131.29 126.03
71.13 124.28
113.96 68.05
112.07 109.13
66.54 108.07
99.62 62.46
87.14
94.99 59.28