Testing the Model HUSSIN

67 are in a defective state. This suggests that our model performed well, reaching nearly the same conclusion. The full comparison of results using the SPC chart method and the model developed is shown in Section 4.4.

4.3 Testing the Model

One way to check the model‘s fit is by simulating and generating a new dataset y~ and comparing it with the observed data y , as a graphical comparison, which is usually a good initial indicator. The simulation of y ~ is carried out by substituting the estimated parameters from the observed data into the model | i i x P  , which is used to identify the initialisation of stage 2 and the duration of time 1 l . Since the failure is also observed from the data, we can calculate the time 2 l . Knowing 1 l , 2 l and estimated parameters in the model, we could run the simulation that generates the simulated y ~ . See Figure 3.5 for the algorithm. If the matching between the observed y and simulated y ~ is good, then we may say that our model is reasonable to explain the hidden process. Thus, the Pearson product moment correlation coefficient was calculated, to assess whether there was a relationship between these two variables and to see how well the model explained the observed data. Here, we show two plots from our test datasets, which attempt to compare the actual vibration from observation and simulated vibration based on estimated parameters. Figure 4-1: Case 1: Simulated and observed vibration levels for Gu-b3 68 Figure 4-2: Case 2: Simulated and observed vibration levels for Gu-b4 From these graphical representations, Figure 4-1 and Figure 4-2 it is sufficient to say that the model fits well with the observed data. To see the significance level of this relationship, we set a value for the confidence level, normally 95 or 99 Albright et al., 1999. The result of this comparison is shown in Table 4-5 below. Gu-b3 Gu-b4 df n-2 7 20 2 R 0.958601 0.823308 95 Critical Values – Pearson Product Moment Correlation Coefficient 0.666 0.423 Table 4-5: Values for Pearson product moment correlation coefficient The null hypothesis for the Pearson product moment correlation coefficient is  r , indicating no correlation within variables. In the above cases, the critical value at 0.05 confidence level and 2  n degrees of freedom is greater than those from the table; therefore the hull hypothesis is rejected, which implies that there is a correlation between the observed and fitted data. Furthermore, we could see from Table 4-5 that the observed data for Gu-b3 and Gu-b4 represents 95 and 82 close to the model developed. With this high percentage of correlations, we could say that the model fits the data well. 69 The other test to determine how well the model actually reflects the data is by running a goodness-of-fit test. One statistical test that addresses this issue is the chi-square, 2  , goodness-of-fit test, which determines whether there is a significant difference between the sample data and the population distribution. We first formulated the conservative hypothesis, called the null hypothesis H , which states that there is no difference between the sample and the population distribution. The test procedure begins by arranging the n observations into a set of k class intervals or cells. The test statistic is given by 2        k i i i i np np N 1 2 4-1 where i N is the number of observations included in category i , i p is the probability that the random variable X of the population falls into category i and i np is the expected value of category i . The 2  statistic is then compared with the critical value of 2  distribution with the degree of freedom 1  k . If the critical value of 2  distribution is greater than the 2  statistic, we can say that the observed data follows the theoretical distribution. However, we cannot do this with this type of data as at each monitoring point we had only one observation. As an alternative, to divide the categories of data we could choose intervals with equal probability Law and Kelton, 2000. To carry out the chi-square test, we could transform | 1   i i y p into an equal probability, M 1 , which will follow a uniform distribution; see Figure 4-3 below. 70 Figure 4-3: Partition the pdf of | 1   i i y p into equal probability and transform it to uniform distribution Thus, we partition | 1   i i y p into M cells so that each cell has an equal probability, and then we examine which cell each observation fell in and count the number of observations in each cell. Hence, we can test whether the newly transformed data follows the uniform distribution. As an example, Zhang 2004 specified that the intervals should have equal probability measures, when testing the conditional residual time CRT model developed for oil analysis. The difficulty with carrying out a chi- square test is that there is no ideal recommendation for choosing the number and size of the intervals, k , which is guaranteed to produce good result in terms of validity Law and Kelton, 2000. However, a few guidelines can be followed to ensure the validity of the test, such as the one given by Law and Kelton 2000 where for equal probability intervals, the validity condition can only be satisfied if 3  k and 5  i N for all i . Applying all the considerations to satisfy the validity of the test, we choose the number of intervals = 4 with the size of equal probability 0.25. The computations of this test are given in Table 4-6 below, using two datasets Gu-b3 and Gub4, which have 31 observation points. 71 Number of Interval i N i p i Np i i Np N  i i i Np Np N 2  1 13 0.25 7.75 5.25 3.556452 2 5 0.25 7.75 -2.75 0.975806 3 5 0.25 7.75 -2.75 0.975806 4 8 0.25 7.75 0.25 0.008065 Total 31 1 5.516129 Table 4-6: Calculating the goodness-of-fit test The value of the test statistic is 5161 . 5 2   . Referring to the chi-square table, we see that 815 . 7 2 95 . , 3   , which is not exceeded by the critical value of 2  , so we would not reject the null hypothesis at 05 .   . Thus, this test gives us a reason to conclude that the sample is from the population distribution. The next stage was to use the data to calculate the expected time to stage 2 and calculate the expected time to failure, based upon the model developed in Chapter 3. We first model the problem with the expected time to the initial point of the second stage that is equation 3-36. The numerical results are shown in Table 4-7 below. time vibr | 1 i i X P        i i t t i dl l f dl l f t l 1 1 1 1 1 1 1 , | , 1 1 i i i i t L x t L E    9 1.6012 0.953525 112.359 107.14 25.5 1.7306 0.944808 112.359 106.16 35 1.8917 0.968074 112.359 108.77 50 2.4231 0.958727 112.360 107.72 57 2.5086 0.974952 112.360 109.55 74 3.4853 0.664153 112.360 74.62 85 5.3865 5.14E-08 112.360 0.00 102 13.0421 9.04E-19 112.360 0.00 105 23.7722 9.07E-19 112.360 0.00 Table 4-7: , | , 1 1 i i i i t L x t L E    with exponential distribution for Gu-b3 From the table, the conditional expected value of 1 L given i t L  1 is constant at every monitoring check. This is because 1 1 l f is assumed to be an exponential distribution, which has the memoryless property indicates that the remaining lifetime of a component is independent of its current age. This situation can be resolved if a different distribution 72 is used to replace the exponential distribution 1 1 l f . As an example, we chose the Weibull distribution with known  and  , and re-calculated the conditional expected value of 1 l given i t l  1 . This result is given in Table 4-8 below time vibr | 1 i i X P        i i t t i dl l f dl l f t l 1 1 1 1 1 1 1 , | , 1 1 i i i i t L x t L E    9 1.6012 0.953525 72.665 69.29 25.5 1.7306 0.944808 56.552 53.43 35 1.8917 0.968074 47.895 46.37 50 2.4231 0.958727 35.825 34.35 57 2.5086 0.974952 30.996 30.22 74 3.4853 0.664153 21.481 14.27 85 5.3865 5.14E-08 16.861 0.00 102 13.0421 9.04E-19 11.638 0.00 105 23.7722 9.07E-19 10.915 0.00 Table 4-8: , | , 1 1 i i i i t L x t L E    with Weibull distribution for Gu-b3 Next, we calculated the expected time to failure as stated in equation 3-37. A similar problem also occurs if we use an exponential distribution to calculate the conditional expected value for time 1 L and 2 L , given i t L L  2 1 , as shown in Table 4-9         i i t i t dl l f dl dl l f t l l l f 1 1 1 1 2 2 2 1 2 1 1           i i i i t l t l t i t dl dl l f l f dl dl l f t l l l f 1 2 2 2 1 1 1 2 2 2 1 2 1 1 1 1 , | , 1 2 1 2 i i i i t L L x t L L E      219.891 107.527 214.67 219.891 107.527 213.69 219.891 107.527 216.30 219.893 107.527 215.26 219.893 107.527 217.08 219.893 107.527 182.16 219.893 107.527 107.53 219.893 107.527 107.53 219.893 107.527 107.53 Table 4-9: , | , 1 2 1 2 i i i i t L L x t L L E      with exponential distribution for Gu-b3 Thus, if we choose others distributions, e.g. Weibull, with known  and  , we could obtain different results such as shown in Table 4-10 below: 73         i i t i t dl l f dl dl l f t l l l f 1 1 1 1 2 2 2 1 2 1 1           i i i i t l t l t i t dl dl l f l f dl dl l f t l l l f 1 2 2 2 1 1 1 2 2 2 1 2 1 1 1 1 , | , 1 2 1 2 i i i i t L L x t L L E      150.80 76.34 147.34 134.69 73.04 131.29 126.03 71.13 124.28 113.96 68.05 112.07 109.13 66.54 108.07 99.62 62.46 87.14

94.99 59.28