Testing Normality and Bandwidth Estimation Using the Kernel Method for Small Sample Sizes

Netti Herawati & Khoirin Nisa
Department of Mathematics, FMIPA, University of Lampung

ABSTRACT
This article studies the kernel method for testing normality and for determining the density function based on a curve fitting technique (density plot) for small sample sizes. To obtain the optimal bandwidth we used the Kullback-Leibler cross validation method. We compared the results using the Kolmogorov-Smirnov goodness of fit test statistic. The results showed that the kernel method gave the same performance as the Kolmogorov-Smirnov test for testing normality, while being easier and more convenient to apply.
Keywords: Normality, kernel method, bandwidth, Kolmogorov-Smirnov test

INTRODUCTION
In many statistical inference problems the test statistic is based on the assumption of a normal distribution. A check of the normality assumption can be made by plotting a histogram of the residuals. If the NID(0, σ²) assumption on the errors is satisfied, this plot should look like a sample from a normal distribution centered at zero. Unfortunately, with small samples considerable fluctuation in the shape of the histogram often occurs, so the appearance of a moderate departure from normality does not necessarily imply a serious violation of the assumption. However, gross deviations from normality are potentially serious and require further analysis. Another way to check the normality assumption in the parametric setting is a normal probability plot of the residuals. There are also nonparametric procedures for testing normality, such as the Shapiro-Wilk test and the Locke-Spurrier test; however, such tests require special tables.

The kernel method is a nonparametric method. Its idea is based on a density estimator that spreads the probability mass of each observation not arbitrarily over a fixed interval, but smoothly around the observation. To smooth around an observation it is important to choose the so-called smoothing parameter h, or bandwidth, which is analogous to the bin width of a histogram. The problem of choosing the bandwidth amounts to choosing the estimate most in accordance with one's prior ideas about the density; according to Jones (1984), the bandwidth is a scale factor that controls how the probability mass of each observation is spread out over the curve. In this article, we test the normality assumption using the kernel method and estimate the optimal bandwidth using the Kullback-Leibler cross validation method for small sample sizes. To assess the results we also use the Kolmogorov-Smirnov goodness of fit test and compare its conclusions with those obtained from the kernel method.

KERNEL METHOD
Let X_1, X_2, …, X_n be a sample. We write the density estimator as

  f̂_h(x) = (1/(nh)) Σ_{i=1}^{n} K((x − X_i)/h),   (2.1)

for X_i = x_i, i = 1, 2, …, n. The kernel function K determines how the probability mass is assigned; for a histogram it is simply constant over a fixed interval, and K satisfies ∫ K(u) du = 1. The smoothing parameter h_n is also called the bandwidth.

Let X_1, X_2, …, X_n be a random sample from a distribution F with density f, and let u_ii* = X_i + X_i* with density h_1 and v_ii* = X_i − X_i* with density h_2; furthermore, let h(u, v) be the joint density of u_ii* and v_ii*, for all i ≠ i* = 1, 2, …, n. The kernel estimate of h_1 is

  ĥ_1(u) = (1/(n(n−1) b_n)) Σ_{i≠i*} w((u − u_ii*)/b_n).   (2.2)
Jurnal ILMU DASAR Vol. 10 No. 1, 2009: 62 – 67
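As an illustration of the density estimator in (2.1) — a minimal Python sketch rather than the S-Plus code the paper used, with a Gaussian kernel and an illustrative sample of our own choosing:

```python
import math

def kde(x_points, data, h):
    """Kernel density estimator (2.1) with a Gaussian kernel:
    f_hat_h(x) = (1/(n h)) * sum_i K((x - X_i)/h)."""
    n = len(data)
    k = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return [sum(k((x - xi) / h) for xi in data) / (n * h) for x in x_points]

data = [-1.3, -0.4, 0.1, 0.2, 0.8, 1.5]   # illustrative sample, n = 6
xs = [i / 10 for i in range(-40, 41)]     # evaluation grid on [-4, 4]
fh = kde(xs, data, h=0.5)

# the estimate is itself a density: nonnegative, integrating to about 1
area = sum(f * 0.1 for f in fh)
print(round(area, 2))
```

Plotting `fh` against `xs` for several values of `h` gives the density plots (estimated curves) discussed in the results section; small `h` undersmooths and large `h` oversmooths.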
The kernel estimates of h_2 and of the joint density h are, similarly,

  ĥ_2(v) = (1/(n(n−1) b_n)) Σ_{i≠i*} w((v − v_ii*)/b_n),   (2.3)

  ĥ(u, v) = (1/(n(n−1) b_n²)) Σ_{i≠i*} w((u − u_ii*)/b_n) w((v − v_ii*)/b_n),   (2.4)

where b = b_n is a positive constant called the bandwidth and w(·) is a known symmetric bounded density called the kernel. We can therefore assume that w has mean 0 and a finite variance μ_2(w), and that b_n → 0 as n → ∞.

Suppose we want to test the normality assumption, that is,

  H_0: f is N(μ, σ²) against H_1: f is not N(μ, σ²), for μ ∈ R and σ² > 0.

A measure of departure from H_0 is

  δ² = ∫∫ [h(u, v) − h_1(u) h_2(v)]² du dv.   (2.5)

This is so because H_0 is equivalent to H: h(u, v) = h_1(u) h_2(v), that is, u and v are independent. Using the random sample X_1, X_2, …, X_n and the estimates ĥ_1(u_ii*), ĥ_2(v_ii*) and ĥ(u_ii*, v_ii*) above, we can perform the normality test based on the estimate

  δ̂² = ∫∫ [ĥ(u, v) − ĥ_1(u) ĥ_2(v)]² du dv.
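The departure estimate just described can be sketched in a few lines of Python (not the authors' code; the Gaussian kernel, the grid size, and the Riemann-sum approximation of the double integral are our illustrative assumptions):

```python
import math

def kernel(t):
    # Gaussian kernel w: a symmetric bounded density with mean 0
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def delta2_hat(x, b, grid=40):
    """Sketch of the departure estimate: the double integral of
    [h_hat(u,v) - h1_hat(u)*h2_hat(v)]^2 approximated by a Riemann sum."""
    n = len(x)
    # all ordered pairs (i, i*) with i != i*, as in (2.2)-(2.4)
    uv = [(xi + xj, xi - xj)
          for i, xi in enumerate(x) for j, xj in enumerate(x) if i != j]
    m = n * (n - 1)
    us = [u for u, _ in uv]
    vs = [v for _, v in uv]
    h1 = lambda s: sum(kernel((s - u) / b) for u in us) / (m * b)
    h2 = lambda t: sum(kernel((t - v) / b) for v in vs) / (m * b)
    joint = lambda s, t: sum(kernel((s - u) / b) * kernel((t - v) / b)
                             for u, v in uv) / (m * b * b)
    # integrate over a box extending 3 bandwidths past the data
    du = (max(us) - min(us) + 6 * b) / grid
    dv = (max(vs) - min(vs) + 6 * b) / grid
    s0, t0 = min(us) - 3 * b, min(vs) - 3 * b
    total = 0.0
    for a in range(grid):
        s = s0 + (a + 0.5) * du
        h1s = h1(s)
        for c in range(grid):
            t = t0 + (c + 0.5) * dv
            total += (joint(s, t) - h1s * h2(t)) ** 2 * du * dv
    return total

sample = [0.2, -1.1, 0.8, 1.7, -0.3, 0.5, -0.9, 1.2, 0.0, -1.6]  # n = 10
print(delta2_hat(sample, b=0.65))
```

Values of the estimate near zero are consistent with u and v being independent, and hence with normality of f; the paper itself judges normality from the estimated density curves rather than from a formal cutoff for this statistic.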
Kullback-Leibler cross validation method
Suppose that independent observations X_1, X_2, …, X_n from f were available. The likelihood of f as the density underlying an observation X is f(X), so log f̂_h(X), regarded as a function of h, is the log likelihood of the smoothing parameter h. Likelihood cross validation (LCV) averages this over each choice of omitted X_i, to give the score

  LCV(h) = (1/n) Σ_{i=1}^{n} log f̂_{h,−i}(X_i).   (2.6)

From (2.6) it can be seen that the chosen value of h is the one maximizing LCV(h). The maximum of LCV(h) can be related to the Kullback-Leibler information distance, defined by

  d_KL(f, f̂_h) = ∫ f(x) log( f(x) / f̂_h(x) ) dx.   (2.7)

The leave-one-out estimate, called the cross validation estimate, is defined by

  f̂_{h,−i}(x) = (1/((n−1) h)) Σ_{j≠i} K((x − X_j)/h),   (2.8)

so that computing f̂_{h,−i}(X_i) from the remaining observations is the same as obtaining the likelihood contribution of X_i (Hardle, 1991). For independent random variables X_i, the likelihood of the sample is

  Π_{i=1}^{n} f̂_{h,−i}(X_i) = Π_{i=1}^{n} (1/((n−1) h)) Σ_{j≠i} K((X_i − X_j)/h).   (2.9)

Taking the logarithm of (2.9) times 1/n, we get the Kullback-Leibler cross validation score

  CV_KL(h) = (1/n) Σ_{i=1}^{n} log f̂_{h,−i}(X_i).   (2.10)

According to Hardle (1991), the optimal bandwidth is the value of h which maximizes CV_KL(h):

  ĥ_opt = argmax_h CV_KL(h).   (2.11)

The value of this score for different h guides us toward a better h, because the statistic is approximately close to the Kullback-Leibler distance between f̂_h and f. In the results below, h_opt, the value of h which maximizes the score, is reported together with the oversmoothing bandwidth h_os.

Kolmogorov-Smirnov
The Kolmogorov and Smirnov (1948) goodness of fit test is used to test

  H_0: F(x) = F_0(x) for all x, against H_1: F(x) ≠ F_0(x).

We reject H_0 if D_n = max_x |F_n(x) − F_0(x)| > D_α, where F_n is the empirical distribution function.

To demonstrate the methods introduced, we simulated data from the N(0,1) and N(5,10) distributions, the exponential distribution, and the Gamma distribution, with sample sizes 10 and 25. The optimal bandwidth was estimated by the Kullback-Leibler cross validation method using S-Plus. We compare the results with the Kolmogorov-Smirnov goodness of fit test statistic.
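The bandwidth selection in (2.8)–(2.11) can be sketched as follows — Python rather than the S-Plus used by the authors, with an illustrative sample and search grid:

```python
import math, random

def kernel(t):
    # Gaussian kernel K
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cv_kl(x, h):
    """Kullback-Leibler cross validation score (2.10):
    the average leave-one-out log likelihood (1/n) sum_i log f_hat_{h,-i}(X_i)."""
    n = len(x)
    score = 0.0
    for i, xi in enumerate(x):
        # leave-one-out density estimate (2.8), evaluated at X_i
        f_loo = sum(kernel((xi - xj) / h)
                    for j, xj in enumerate(x) if j != i) / ((n - 1) * h)
        score += math.log(f_loo)
    return score / n

def h_opt(x, h_grid):
    # (2.11): the optimal bandwidth maximizes CV_KL(h) over the grid
    return max(h_grid, key=lambda h: cv_kl(x, h))

random.seed(2)
sample = [random.gauss(0, 1) for _ in range(10)]      # n = 10, N(0,1)
grid = [0.3 + 0.05 * k for k in range(15)]            # h from 0.30 to 1.00
best = h_opt(sample, grid)
print(best, cv_kl(sample, best))
```

A grid search is used here for transparency; any one-dimensional maximizer over h would do. The score is finite as long as no observation is isolated far beyond the reach of the kernel at the smallest h tried.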
RESULTS AND DISCUSSION
Normal distribution (N(0,1)) with n = 10
We obtained the oversmoothing bandwidth h_os = 0.53, and CV_KL(h) attained its maximum at h = 0.6. Using the estimated curves with bandwidth h from 0.3 to 0.6, the optimal bandwidth h_opt coincided with the maximum of CV_KL(h) at h = 0.6, and the estimated curve is shown in Figure 1.

Normal distribution (N(0,1)) with n = 25
We obtained the oversmoothing bandwidth h_os = 0.65, and CV_KL(h) attained its maximum at h = 0.7. Using the estimated curves with bandwidth h from 0.45 to 0.7, the optimal bandwidth h_opt coincided with the maximum of CV_KL(h) at h = 0.7, and the estimated curve is shown in Figure 2.
Figure 1. Normal density curve (N(0,1)) with n = 10.

Figure 2. Normal density curve (N(0,1)) with n = 25.

Normal distribution (N(5,10)) with n = 10
We obtained the oversmoothing bandwidth h_os = 4.4, and CV_KL(h) attained its maximum at h = 5. Using the estimated curves with bandwidth h from 3 to 5, the optimal bandwidth h_opt coincided with the maximum of CV_KL(h) at h = 5, and the estimated curve is shown in Figure 3.

Normal distribution (N(5,10)) with n = 25
We obtained the oversmoothing bandwidth h_os = 5.3, and CV_KL(h) attained its maximum at h = 6. Using the estimated curves with bandwidth h from 3 to 6, the optimal bandwidth h_opt coincided with the maximum of CV_KL(h) at h = 6, and the estimated curve is shown in Figure 4.
Gamma distribution (G(1,1)) with n = 10
We obtained the oversmoothing bandwidth h_os = 1.52, and CV_KL(h) attained its maximum at h = 2. Using the estimated curves with bandwidth h from 1 to 2, the optimal bandwidth h_opt coincided with the maximum of CV_KL(h) at h = 2, and the estimated curve is shown in Figure 7.

Gamma distribution (G(1,1)) with n = 25
We obtained the oversmoothing bandwidth h_os = 0.63, and CV_KL(h) attained its maximum at h = 0.7. Using the estimated curves with bandwidth h from 0.3 to 0.7, the optimal bandwidth h_opt coincided with the maximum of CV_KL(h) at h = 0.7, and the estimated curve is shown in Figure 8.
Figure 3. Normal density curve (N(5,10)) with n = 10.
Figure 4. Normal density curve (N(5,10)) with n=25.
Exponential distribution (E(1)) with n = 10
We obtained the oversmoothing bandwidth h_os = 0.97, and CV_KL(h) attained its maximum at h = 1. Using the estimated curves with bandwidth h from 0.5 to 1, the optimal bandwidth h_opt coincided with the maximum of CV_KL(h) at h = 1, and the estimated curve is shown in Figure 5.

Exponential distribution (E(1)) with n = 25
We obtained the oversmoothing bandwidth h_os = 0.49, and CV_KL(h) attained its maximum at h = 0.5. Using the estimated curves with bandwidth h from 0.2 to 0.5, the optimal bandwidth h_opt coincided with the maximum of CV_KL(h) at h = 0.5, and the estimated curve is shown in Figure 6.
Figure 5. Exponential density curve (E(1)) with n = 10.
Figure 6. Exponential density curve (E(1)) with n = 25.
Table 1. Kolmogorov-Smirnov goodness of fit test.

Sample size   N(0,1)            N(5,10)           Exponential(1)    Gamma(1,1)
              D_n     D_0.05    D_n     D_0.05    D_n     D_0.05    D_n     D_0.05
n = 10        0.1443  0.369 ns  0.1081  0.369 ns  0.3920  0.369 *   0.4438  0.369 *
n = 25        0.0879  0.283 ns  0.0776  0.283 ns  0.2843  0.283 *   0.2772  0.283 ns

Note: ns = nonsignificant at α = 0.05; * = significant at α = 0.05.
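The D_n entries in Table 1 follow the usual one-sample construction, with the supremum attained at an order statistic. A small Python sketch against the standard normal (the sample is illustrative, not the paper's simulated data):

```python
import math

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic
    D_n = sup_x |F_n(x) - F0(x)|, evaluated at the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f0 = cdf(x)
        # the empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, abs((i + 1) / n - f0), abs(i / n - f0))
    return d

sample = [0.2, -1.1, 0.8, 1.7, -0.3, 0.5, -0.9, 1.2, 0.0, -1.6]  # n = 10
d_n = ks_statistic(sample, phi)
print(d_n <= 0.369)  # compare with the n = 10, alpha = 0.05 critical value above
```

Checking both sides of each jump of the empirical CDF is what makes the maximum over the order statistics equal to the supremum over all x.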
Kolmogorov-Smirnov goodness of fit test
The results of the Kolmogorov-Smirnov goodness of fit test can be seen in Table 1. We reject the null hypothesis when D_n exceeds the critical value D_0.05. By comparing the figures obtained by the kernel method with the Kolmogorov-Smirnov test, we can say that both methods gave the same performance. Figures 1, 2, 3, and 4 show that the data were normally distributed, which agrees with the Kolmogorov-Smirnov results (Table 1). The same agreement is given by Figures 5 and 6 (data from the exponential distribution) as well as Figures 7 and 8 (data from the Gamma distribution) when compared with the Kolmogorov-Smirnov results (Table 1). These results also agree with those of Locke & Spurrier (1976) for data simulated from distributions other than the normal, such as the Chi-square, Cauchy, and Beta distributions.

CONCLUSION
The simulation study illustrates that the kernel method is useful for testing normality for n = 10 and n = 25. This study also reveals that severe departure from normality can be detected easily using the kernel method. By comparing the results from the kernel method and the Kolmogorov-Smirnov test, we conclude that the two tests gave the same performance.

REFERENCES