
NONPARAMETRIC TESTS

  The majority of hypothesis tests discussed so far have made inferences about population parameters, such as the mean and the proportion. These parametric tests have used the parametric statistics of samples that came from the population being tested.

  To formulate these tests, we made restrictive assumptions about the populations from which we drew our samples. For example, we assumed that our samples either were large or came from normally distributed populations. But populations are not always normal.

  And even if a goodness-of-fit test indicates that a population is approximately normal, we cannot always be sure we are right, because the test is not 100 percent reliable. Fortunately, in recent times statisticians have developed useful techniques that do not make restrictive assumptions about the shape of the population distribution. These are known as distribution-free or, more commonly, nonparametric tests.

  Nonparametric statistical procedures are then used in preference to their parametric counterparts.

  The hypotheses of a nonparametric test are concerned with something other than the value of a population parameter.

  A large number of these tests exist, but this section will examine only a few of the better known and more widely used ones:

  SIGN TEST
  WILCOXON SIGNED-RANK TEST
  MANN-WHITNEY TEST (WILCOXON RANK-SUM TEST)
  RUNS TEST
  KRUSKAL-WALLIS TEST
  KOLMOGOROV-SMIRNOV TEST
  LILLIEFORS TEST


THE SIGN TEST

  The sign test is used to test hypotheses about the median $\tilde{\mu}$ of a continuous distribution. The median of a distribution is a value of the random variable X such that the probability is 0.5 that an observed value of X is less than or equal to the median, and the probability is 0.5 that an observed value of X is greater than or equal to the median. That is,

  $P(X \le \tilde{\mu}) = P(X \ge \tilde{\mu}) = 0.5$

  Since the normal distribution is symmetric, the mean of a normal distribution equals the median. Therefore, the sign test can be used to test hypotheses about the mean of a normal distribution.

  Let X denote a continuous random variable with median $\tilde{\mu}$ and let $X_1, X_2, \dots, X_n$ denote a random sample of size n from the population of interest. If $\tilde{\mu}_0$ denotes the hypothesized value of the population median, then the usual forms of the hypothesis to be tested can be stated as follows:

  $H_0: \tilde{\mu} = \tilde{\mu}_0$ VERSUS
  $H_1: \tilde{\mu} > \tilde{\mu}_0$ (right-tailed test)
  $H_1: \tilde{\mu} < \tilde{\mu}_0$ (left-tailed test)
  $H_1: \tilde{\mu} \ne \tilde{\mu}_0$ (two-tailed test)

  Form the differences $X_i - \tilde{\mu}_0$, $i = 1, 2, \dots, n$. Now if the null hypothesis is true, any difference is equally likely to be positive or negative. An appropriate test statistic is the number of these differences that are positive, say $R^+$. Therefore, to test the null hypothesis we are really testing that the number of plus signs is a value of a Binomial random variable that has the parameter p = 0.5.

  

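  A p-value for the observed number of plus signs $r^+$ can be calculated directly from the Binomial distribution. To test $H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} < \tilde{\mu}_0$, we reject $H_0$ when the number of plus signs is too small. Thus, if the computed p-value

  $P = P(R^+ \le r^+ \mid p = 0.5)$

  is less than or equal to some preselected significance level α, we will reject $H_0$ and conclude $H_1$ is true.

  To test the other one-sided hypothesis, $H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} > \tilde{\mu}_0$, if the p-value $P = P(R^+ \ge r^+ \mid p = 0.5)$ is less than or equal to α, we will reject $H_0$.

  The two-sided alternative may also be tested. If the hypotheses are $H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} \ne \tilde{\mu}_0$, the p-value is:

  $P = 2P(R^+ \le r^+ \mid p = 0.5)$ if $r^+ < n/2$
  $P = 2P(R^+ \ge r^+ \mid p = 0.5)$ if $r^+ > n/2$

  It is also possible to construct a table of critical values for the sign test.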

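  As before, let $R^+$ denote the number of the differences $X_i - \tilde{\mu}_0$ that are positive and let $R^-$ denote the number of the differences that are negative.

  Let $R = \min(R^+, R^-)$. A table of critical values $r^*_\alpha$ for the sign test ensures that P(type I error) = α for the two-sided test. If the observed value of the test statistic satisfies $r \le r^*_\alpha$, then the null hypothesis $H_0: \tilde{\mu} = \tilde{\mu}_0$ should be rejected and $H_1: \tilde{\mu} \ne \tilde{\mu}_0$ accepted.

  If the alternative is $H_1: \tilde{\mu} > \tilde{\mu}_0$, then reject $H_0$ if $r^- \le r^*_\alpha$.

  If the alternative is $H_1: \tilde{\mu} < \tilde{\mu}_0$, then reject $H_0$ if $r^+ \le r^*_\alpha$.

  The level of significance of a one-sided test is one-half the value for a two-sided test.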

  TIES in the SIGN TEST
  Since the underlying population is assumed to be continuous, there is zero probability that we will find a "tie", that is, a value of $X_i$ exactly equal to $\tilde{\mu}_0$. In recorded data, however, ties can still appear.

  When ties occur, they should be set aside and the sign test applied to the remaining data.

THE NORMAL APPROXIMATION

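  When p = 0.5, the Binomial distribution is well approximated by a normal distribution when n is at least 10. Thus, since the mean of the Binomial is np and the variance is np(1 - p), the distribution of $R^+$ is approximately normal with mean 0.5n and variance 0.25n whenever n is moderately large.

  Therefore, in these cases the null hypothesis can be tested using the statistic:

  $Z_0 = \frac{R^+ - 0.5n}{0.5\sqrt{n}}$

  Critical regions (rejection regions) for α-level tests of $H_0: \tilde{\mu} = \tilde{\mu}_0$ versus each alternative are given in this table:

  CRITICAL/REJECTION REGIONS FOR $H_0: \tilde{\mu} = \tilde{\mu}_0$

  Alternative                              CR/RR
  $H_1: \tilde{\mu} \ne \tilde{\mu}_0$     $|z_0| > z_{\alpha/2}$
  $H_1: \tilde{\mu} > \tilde{\mu}_0$       $z_0 > z_{\alpha}$
  $H_1: \tilde{\mu} < \tilde{\mu}_0$       $z_0 < -z_{\alpha}$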

THE WILCOXON SIGNED-RANK TEST

  The sign test makes use only of the plus and minus signs of the differences between the observations and the median (or the plus and minus signs of the differences between the observations in the paired case). Frank Wilcoxon devised a test procedure that uses both direction (sign) and magnitude. This procedure is now called the Wilcoxon signed-rank test.

  The Wilcoxon signed-rank test applies to the case of symmetric continuous distributions. Under these assumptions, the mean equals the median.

  Description of the test:
  We are interested in testing $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$.

  Assume that $X_1, X_2, \dots, X_n$ is a random sample from a continuous and symmetric distribution with mean/median μ. Compute the differences $X_i - \mu_0$, i = 1, 2, …, n.

  Rank the absolute differences $|X_i - \mu_0|$ in ascending order, and then give the ranks the signs of their corresponding differences.

  Let $W^+$ be the sum of the positive ranks, let $W^-$ be the absolute value of the sum of the negative ranks, and let $W = \min(W^+, W^-)$.

  Critical values of W, say $w^*_\alpha$, are read from a table.

  1. If $H_1: \mu \ne \mu_0$, then reject $H_0$ if the observed value of the statistic satisfies $w \le w^*_\alpha$.
  2. If $H_1: \mu > \mu_0$, reject $H_0$ if $w^- \le w^*_\alpha$.
  3. If $H_1: \mu < \mu_0$, reject $H_0$ if $w^+ \le w^*_\alpha$.

LARGE SAMPLE APPROXIMATION

  

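  If the sample size is moderately large (n > 20), then it can be shown that $W^+$ (or $W^-$) has approximately a normal distribution with mean

  $\mu_{W^+} = \frac{n(n+1)}{4}$

  and variance

  $\sigma^2_{W^+} = \frac{n(n+1)(2n+1)}{24}$

  Therefore, a test of $H_0: \mu = \mu_0$ can be based on the statistic

  $Z_0 = \frac{W^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$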

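  Wilcoxon Signed-Rank Test
  Test statistic: $W^+$

  Theorem: The probability distribution of $W^+$ when $H_0$ is true, which is based on a random sample of size n, satisfies:

  $E(W^+) = \frac{n(n+1)}{4}$, $\mathrm{Var}(W^+) = \frac{n(n+1)(2n+1)}{24}$

  Proof: Let $U_i = 1$ if the difference with rank i is positive and $U_i = 0$ otherwise; then $W^+ = \sum_{i=1}^{n} i\,U_i$. For a given rank i, the discrepancy has a 50 : 50 chance of being "+" or "-", so $E(U_i) = 1/2$ and $\mathrm{Var}(U_i) = 1/4$. Hence, with the $U_i$ independent,

  $E(W^+) = \frac{1}{2}\sum_{i=1}^{n} i = \frac{n(n+1)}{4}$, $\mathrm{Var}(W^+) = \frac{1}{4}\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{24}$

  where $\sum_{i=1}^{n} i = n(n+1)/2$ and $\sum_{i=1}^{n} i^2 = n(n+1)(2n+1)/6$.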

PAIRED OBSERVATIONS

  

  The Wilcoxon signed-rank test can be applied to paired data.

  Let $(X_{1j}, X_{2j})$, j = 1, 2, …, n be a collection of paired observations from two continuous distributions that differ only with respect to their means. The distribution of the differences $D_j = X_{1j} - X_{2j}$ is continuous and symmetric.

  The null hypothesis is $H_0: \mu_1 = \mu_2$, which is equivalent to $H_0: \mu_D = 0$. To use the Wilcoxon signed-rank test, the differences are first ranked in ascending order of their absolute values, and then the ranks are given the signs of the differences.

  Let $W^+$ be the sum of the positive ranks and $W^-$ be the absolute value of the sum of the negative ranks, and $W = \min(W^+, W^-)$.

  If the observed value satisfies $w \le w^*_\alpha$, then $H_0: \mu_1 = \mu_2$ is rejected and $H_1: \mu_1 \ne \mu_2$ accepted.

  If $H_1: \mu_1 > \mu_2$, then reject $H_0$ if $w^- \le w^*_\alpha$.
  If $H_1: \mu_1 < \mu_2$, reject $H_0$ if $w^+ \le w^*_\alpha$.

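  EXAMPLE
  Eleven students were randomly selected from a large statistics class, and their numerical grades on two successive examinations were recorded. Use the Wilcoxon signed-rank test to determine whether the second test was more difficult than the first. Use α = 0.1.

  Student   Test 1   Test 2   Difference   Rank   Signed rank
     1        94       85         9          8         8
     2        78       65        13         10        10
     3        89       92        -3          4        -4
     4        62       56         6          7         7
     5        49       52        -3          4        -4
     6        78       74         4          6         6
     7        80       79         1          1         1
     8        82       84        -2          2        -2
     9        62       48        14         11        11
    10        83       71        12          9         9
    11        79       82        -3          4        -4

  Solution:
  $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$ (grades on the first test are higher).
  Sum of the positive ranks: $w^+ = 8 + 10 + 7 + 6 + 1 + 11 + 9 = 52$.
  Sum of the negative ranks (absolute value): $w^- = 4 + 4 + 2 + 4 = 14$.

  $z = \frac{w^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} = \frac{52 - 33}{\sqrt{126.5}} = 1.69$

  The critical value is $z_{0.10} = 1.28$. Since 1.69 > 1.28, REJECT $H_0$.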
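  The ranking and the normal-approximation statistic for this example can be checked with a short Python sketch. The function name `signed_rank` and the two grade lists (the Test 1 and Test 2 columns as read from the example table) are illustrative assumptions; zero differences are dropped and tied absolute differences share their average rank.

```python
import math

def signed_rank(x, y):
    """Return (W+, W-, z) for paired samples, using the differences x - y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # zero differences are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over the block of tied absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1          # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, w_minus, z

test1 = [94, 78, 89, 62, 49, 78, 80, 82, 62, 83, 79]
test2 = [85, 65, 92, 56, 52, 74, 79, 84, 48, 71, 82]
w_plus, w_minus, z = signed_rank(test1, test2)
print(w_plus, w_minus, round(z, 2))   # 52.0 14.0 1.69
```

  The printed z can then be compared with the critical value 1.28 used in the example.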

  EXAMPLE
  Ten newly married couples were randomly selected, and each husband and wife were independently asked the question of how many children they would like to have. The following information was obtained.

  COUPLE      1  2  3  4  5  6  7  8  9  10
  WIFE X      3  2  1  0  0  1  2  2  2  0
  HUSBAND Y   2  3  2  2  0  2  1  3  1  2

  Using the sign test, is there reason to believe that wives want fewer children than husbands? Assume a maximum size of type I error of 0.05.

  SOLUTION
  First state $H_0$ and $H_1$, where p is the probability that X - Y is positive:
  $H_0$: p = 0.5 vs $H_1$: p < 0.5

  Couple   1  2  3  4  6  7  8  9  10
  Sign     +  -  -  -  -  +  -  +  -

  Couple 5 is a tie and is set aside. There are three plus signs.

  Under $H_0$, S ~ BIN(9, 1/2) and P(S ≤ 3) = 0.2539. At level α = 0.05, since 0.2539 > 0.05, $H_0$ is not rejected.
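  The binomial p-value in this example can be reproduced with a minimal Python sketch. The function name `sign_test_lower` is an illustrative assumption; it handles only the left-tailed alternative used here and sets ties aside before counting.

```python
from math import comb

def sign_test_lower(x, y):
    """Left-tailed sign test (H1: p < 0.5) based on the signs of x - y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # ties are set aside
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    # P(S <= plus) for S ~ Binomial(n, 0.5)
    p_value = sum(comb(n, k) for k in range(plus + 1)) / 2 ** n
    return n, plus, p_value

wife = [3, 2, 1, 0, 0, 1, 2, 2, 2, 0]
husband = [2, 3, 2, 2, 0, 2, 1, 3, 1, 2]
n, plus, p_value = sign_test_lower(wife, husband)
print(n, plus, round(p_value, 4))   # 9 3 0.2539
```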

THE WILCOXON RANK-SUM TEST

  Suppose that we have two independent continuous populations $X_1$ and $X_2$ with means $\mu_1$ and $\mu_2$. Assume that the distributions of $X_1$ and $X_2$ have the same shape and spread, and differ only (possibly) in their means.

  The Wilcoxon rank-sum test can be used to test the hypothesis $H_0: \mu_1 = \mu_2$. This procedure is sometimes called the Mann-Whitney test or Mann-Whitney U test.

  Description of the Test
  Let $X_{11}, X_{12}, \dots, X_{1n_1}$ and $X_{21}, X_{22}, \dots, X_{2n_2}$ be two independent random samples of sizes $n_1 \le n_2$ from the continuous populations $X_1$ and $X_2$. We wish to test the hypotheses:

  $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$

  The test procedure is as follows. Arrange all $n_1 + n_2$ observations in ascending order of magnitude and assign ranks to them. If two or more observations are tied, then use the mean of the ranks that would have been assigned if the observations differed.

  Let $W_1$ be the sum of the ranks in the smaller sample (sample 1), and define $W_2$ to be the sum of the ranks in the other sample. Then,

  $W_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} - W_1$

  Now if the sample means do not differ, we will expect the sum of the ranks to be nearly equal for both samples after adjusting for the difference in sample size. Consequently, if the sums of the ranks differ greatly, we will conclude that the means are not equal.

  Referring to the table with the appropriate sample sizes $n_1$ and $n_2$, the critical value $w_\alpha$ can be obtained.

  $H_0: \mu_1 = \mu_2$ is rejected if either of the observed values $w_1$ or $w_2$ is less than or equal to $w_\alpha$.
  If $H_1: \mu_1 < \mu_2$, then reject $H_0$ if $w_1 \le w_\alpha$.
  For $H_1: \mu_1 > \mu_2$, reject $H_0$ if $w_2 \le w_\alpha$.

LARGE-SAMPLE APPROXIMATION

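  When both $n_1$ and $n_2$ are moderately large, say, greater than 8, the distribution of $W_1$ can be well approximated by the normal distribution with mean:

  $\mu_{W_1} = \frac{n_1(n_1 + n_2 + 1)}{2}$

  and variance:

  $\sigma^2_{W_1} = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$

  Therefore, for $n_1$ and $n_2$ > 8, we could use:

  $Z_0 = \frac{W_1 - \mu_{W_1}}{\sigma_{W_1}}$

  as a statistic, and the critical region is:

    $|z_0| > z_{\alpha/2}$    two-tailed test
    $z_0 > z_{\alpha}$        upper-tail test
    $z_0 < -z_{\alpha}$       lower-tail test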

  EXAMPLE
  A large corporation is suspected of sex discrimination in the salaries of its employees. From employees with similar responsibilities and work experience, 12 male and 12 female employees were randomly selected; their annual salaries in thousands of dollars are as follows:

  Females   22.5  19.8  20.6  24.7  23.2  19.2  18.7  20.9  21.6  23.5  20.7  21.6
  Males     21.9  21.6  22.4  24.0  24.1  23.4  21.2  23.9  20.5  24.5  22.3  23.6

  Is there reason to believe that these random samples come from populations with different distributions? Use α = 0.05.

  SOLUTION
  $H_0: f_1(x) = f_2(x)$, that is, the random samples come from populations with the same distribution
  $H_1: f_1(x) \ne f_2(x)$

  Combine the salaries and rank them:

  SEX   SALARY   RANK        SEX   SALARY   RANK
   F     18.7      1          M     22.3     13
   F     19.2      2          M     22.4     14
   F     19.8      3          F     22.5     15
   M     20.5      4          F     23.2     16
   F     20.6      5          M     23.4     17
   F     20.7      6          F     23.5     18
   F     20.9      7          M     23.6     19
   M     21.2      8          M     23.9     20
   M     21.6     10          M     24.0     21
   F     21.6     10          M     24.1     22
   F     21.6     10          M     24.5     23
   M     21.9     12          F     24.7     24

  Suppose we choose the female sample; then its rank sum is $R_1 = R_F = 117$.

  The value of the U statistic is

  $U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 = 144 + 78 - 117 = 105$

  and, with mean $n_1 n_2/2 = 72$ and variance $n_1 n_2 (n_1 + n_2 + 1)/12 = 300$,

  $z = \frac{105 - 72}{\sqrt{300}} = 1.91$

  At α = 0.05 the critical values are -1.96 and 1.96. Since -1.96 < 1.91 < 1.96, $H_0$ is not rejected.

  What does this mean? At the 5% level there is not enough evidence that the male and female salaries come from populations with different distributions.
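  The ranking step, including the average rank given to the three salaries tied at 21.6, can be sketched in Python. The helper name `average_ranks` is an illustrative assumption, and no tie correction is applied to the variance. The sketch standardizes the female rank sum directly, so z comes out with the opposite sign to a z based on U; the magnitude is the same.

```python
import math

def average_ranks(values):
    """Ranks 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

females = [22.5, 19.8, 20.6, 24.7, 23.2, 19.2, 18.7, 20.9, 21.6, 23.5, 20.7, 21.6]
males = [21.9, 21.6, 22.4, 24.0, 24.1, 23.4, 21.2, 23.9, 20.5, 24.5, 22.3, 23.6]
n1, n2 = len(females), len(males)
ranks = average_ranks(females + males)
w1 = sum(ranks[:n1])                      # rank sum of the female sample
mean = n1 * (n1 + n2 + 1) / 2
var = n1 * n2 * (n1 + n2 + 1) / 12
z = (w1 - mean) / math.sqrt(var)
print(w1, round(z, 2))   # 117.0 -1.91
```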

  KOLMOGOROV – SMIRNOV TEST

  The Kolmogorov-Smirnov (K-S) test is conducted by comparing the hypothesized and sample cumulative distribution functions. A cumulative distribution function is defined as:

  $F(x) = P(X \le x)$

  and the sample cumulative distribution function, S(x), is defined as the proportion of sample values that are less than or equal to x.

  The K-S test should be used instead of the $\chi^2$ goodness-of-fit test to determine if a sample is from a specified continuous distribution.

  To illustrate how S(x) is computed, suppose we have the following 10 observations: 110, 89, 102, 80, 93, 121, 108, 97, 105, 103.

  We begin by placing the values of x in ascending order, as follows: 80, 89, 93, 97, 102, 103, 105, 108, 110, 121.

  Because x = 80 is the smallest of the 10 values, the proportion of values of x that are less than or equal to 80 is S(80) = 0.1.

    x     S(x) = P(X ≤ x)
    80        0.1
    89        0.2
    93        0.3
    97        0.4
   102        0.5
   103        0.6
   105        0.7
   108        0.8
   110        0.9
   121        1.0

  The test statistic D is the maximum absolute difference between the two cdf's over all observed values. The range of D is 0 ≤ D ≤ 1, and the formula is:

  $D = \max_x |F(x) - S(x)|$

  where
   x = each observed value
   S(x) = observed cdf at x
   F(x) = hypothesized cdf at x

  Let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ denote the ordered observations of a random sample of size n, and define the sample cdf as:

  $S_n(x) = 0$ for $x < X_{(1)}$; $S_n(x) = k/n$ for $X_{(k)} \le x < X_{(k+1)}$; $S_n(x) = 1$ for $x \ge X_{(n)}$

  $S_n(x)$ is the proportion of sample values less than or equal to x.

  The Kolmogorov-Smirnov statistic $D_n$ is defined to be:

  $D_n = \max_x |F(x) - S_n(x)|$

  For size α of type I error, the critical region is of the form:

  $D_n > D_{n,\alpha}$, where $P(D_n \ge D_{n,\alpha}) = \alpha$

  EXAMPLE 1
  A state vehicle inspection station has been designed so that inspection time follows a uniform distribution with limits of 10 and 15 minutes.

  A sample of 10 duration times during low and peak traffic conditions was taken. Use the K-S test with α = 0.05 to determine if the sample is from this uniform distribution. The times are: 11.3, 10.4, 9.8, 12.6, 14.8, 13.0, 14.3, 13.3, 11.5, 13.6

  SOLUTION
  1. $H_0$: the sample comes from the Uniform(10, 15) distribution versus $H_1$: the sample does not come from the Uniform(10, 15) distribution.

  2. The sample cumulative distribution function S(x) is computed from the ordered data, and F(x) = (x - 10)/5 for 10 ≤ x ≤ 15:

  Observed time x   S(x)   F(x)   |F(x) - S(x)|
        9.8         0.10   0.00       0.10
       10.4         0.20   0.08       0.12
       11.3         0.30   0.26       0.04
       11.5         0.40   0.30       0.10
       12.6         0.50   0.52       0.02
       13.0         0.60   0.60       0.00
       13.3         0.70   0.66       0.04
       13.6         0.80   0.72       0.08
       14.3         0.90   0.86       0.04
       14.8         1.00   0.96       0.04

  Result of the K-S calculation: D = 0.12, attained at x = 10.4.

  From the table, with n = 10 and α = 0.05, $D_{10,0.05} = 0.41$, where $P(D \ge D_{10,0.05}) = \alpha$.

  Since D = 0.12 < 0.41, do not reject $H_0$.
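  The value of D in this example can be verified with a small Python sketch. It follows the definition used in these notes, comparing F(x) with S(x) at the observed values only; library implementations usually also compare F(x) with the value of the sample cdf just below each observation, which happens to give the same maximum for these data. The names `ks_statistic` and `uniform_cdf` are illustrative assumptions.

```python
from bisect import bisect_right

def ks_statistic(sample, cdf):
    """Largest |F(x) - S(x)| over the observed values x."""
    xs = sorted(sample)
    n = len(xs)
    # bisect_right(xs, x) / n is the proportion of values <= x
    return max(abs(cdf(x) - bisect_right(xs, x) / n) for x in xs)

def uniform_cdf(x, a=10.0, b=15.0):
    """CDF of the Uniform(a, b) distribution."""
    return min(1.0, max(0.0, (x - a) / (b - a)))

times = [11.3, 10.4, 9.8, 12.6, 14.8, 13.0, 14.3, 13.3, 11.5, 13.6]
d = ks_statistic(times, uniform_cdf)
print(round(d, 2))   # 0.12
```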

  EXAMPLE 2
  Suppose we want to test whether the following ten observations, 110, 89, 102, 80, 93, 121, 108, 97, 105, 103, were drawn from a normal distribution with mean µ = 100 and standard deviation σ = 10. Our hypotheses for this test are

  $H_0$: Data were drawn from a normal distribution with µ = 100 and σ = 10.
  versus
  $H_1$: Data were not drawn from a normal distribution with µ = 100 and σ = 10.

  SOLUTION
    x     F(x) = P(X ≤ x)
    80    P(X ≤ 80)  = P(Z ≤ -2.0) = 0.0228
    89    P(X ≤ 89)  = P(Z ≤ -1.1) = 0.1357
    93    P(X ≤ 93)  = P(Z ≤ -0.7) = 0.2420
    97    P(X ≤ 97)  = P(Z ≤ -0.3) = 0.3821
   102    P(X ≤ 102) = P(Z ≤ 0.2)  = 0.5793
   103    P(X ≤ 103) = P(Z ≤ 0.3)  = 0.6179
   105    P(X ≤ 105) = P(Z ≤ 0.5)  = 0.6915
   108    P(X ≤ 108) = P(Z ≤ 0.8)  = 0.7881
   110    P(X ≤ 110) = P(Z ≤ 1.0)  = 0.8413
   121    P(X ≤ 121) = P(Z ≤ 2.1)  = 0.9821

    x     F(x)     S(x)   |F(x) - S(x)|
    80    0.0228   0.1    0.0772
    89    0.1357   0.2    0.0643
    93    0.2420   0.3    0.0580
    97    0.3821   0.4    0.0179
   102    0.5793   0.5    0.0793 = D
   103    0.6179   0.6    0.0179
   105    0.6915   0.7    0.0085
   108    0.7881   0.8    0.0119
   110    0.8413   0.9    0.0587
   121    0.9821   1.0    0.0179

  If α = 0.05, the critical value for n = 10 read from the table is 0.409.

  The decision rule is: reject $H_0$ if D > 0.409. Since D = 0.0793 < 0.409, do not reject $H_0$.

  That is, the data are consistent with a normal distribution with µ = 100 and σ = 10.

  LILLIEFORS TEST
  In most applications where we want to test for normality, the population mean and the population variance are unknown. In order to perform the K-S test, however, we must assume that those parameters are known.

  The Lilliefors test, which is quite similar to the K-S test, covers this situation.

  The major difference between the two tests is that, with the Lilliefors test, the sample mean $\bar{x}$ and the sample standard deviation s are used instead of µ and σ to calculate F(x).

  EXAMPLE
  A manufacturer of automobile seats has a production line that produces an average of 100 seats per day.

  Because of new government regulations, a new safety device has been installed, which the manufacturer believes will reduce average daily output. A random sample of 15 days' output after the installation of the safety device is shown:
  93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95
  The daily production was assumed to be normally distributed.

  Use the Lilliefors test to examine that assumption, with α = 0.01.

  SOLUTION
  As in the K-S test, to compute S(x) sort the data, as follows:

    x     S(x)
    88    1/15  = 0.067
    91    2/15  = 0.133
    92    3/15  = 0.200
    93    4/15  = 0.267
    94    6/15  = 0.400
    95    8/15  = 0.533
    96    9/15  = 0.600
    98    10/15 = 0.667
   101    13/15 = 0.867
   103    14/15 = 0.933
   105    15/15 = 1.000

  From the data above, we obtain $\bar{x} = 96.47$ and s = 4.85.

  Next, F(x) is computed as follows:

    x     F(x)
    88    P(Z ≤ (88 - 96.47)/4.85) = P(Z ≤ -1.75) = 0.0401
    91    P(Z ≤ (91 - 96.47)/4.85) = P(Z ≤ -1.13) = 0.1292
    ...
   105    P(Z ≤ (105 - 96.47)/4.85) = P(Z ≤ 1.76) = 0.9608

  Finally, summarize as follows:

    x     F(x)     S(x)    |F(x) - S(x)|
    88    0.0401   0.067   0.0269
    91    0.1292   0.133   0.0038
    92    0.1788   0.200   0.0212
    93    0.2358   0.267   0.0312
    94    0.3050   0.400   0.0950
    95    0.3821   0.533   0.1509 = D
    96    0.4602   0.600   0.1398
    98    0.6255   0.667   0.0415
   101    0.8238   0.867   0.0432
   103    0.9115   0.933   0.0215
   105    0.9608   1.000   0.0392

  From the table of critical values of the Lilliefors test, with α = 0.01 and n = 15, $D_{tab} = 0.257$. Since D = 0.1509 < 0.257, do not reject $H_0$.
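  The Lilliefors calculation can be sketched in Python with the standard library only. The helper `normal_cdf` is an illustrative assumption built on `math.erf`; because it uses unrounded z values rather than a two-decimal normal table, the resulting D differs slightly in the third decimal from the hand-computed 0.1509.

```python
import math
from bisect import bisect_right
from statistics import mean, stdev

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma^2)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

output = [93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95]
xbar, s = mean(output), stdev(output)     # sample mean and sample standard deviation
xs = sorted(output)
n = len(xs)
# S(x) is the proportion of values <= x, so tied values are counted together
d = max(abs(normal_cdf(x, xbar, s) - bisect_right(xs, x) / n) for x in xs)
print(round(xbar, 2), round(s, 2))   # 96.47 4.85
print(round(d, 3))
```

  The value of d is then compared with the tabulated critical value 0.257.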

TEST BASED ON RUNS

  Usually a sample that is taken from a population should be random.

  The runs test evaluates the null hypothesis
  $H_0$: the order of the sample data is random
  The alternative hypothesis is simply the negation of $H_0$. There is no comparable parametric test to evaluate this null hypothesis.

  The order in which the data are collected must be retained so that the runs may be developed.

  DEFINITIONS:

  1. A run is defined as a sequence of the same symbols. Two symbols are defined, and each sequence must contain a symbol at least once.

  2. A run of length j is defined as a sequence of j observations, all belonging to the same group, that is preceded or followed by observations belonging to a different group.

  For illustration, the ordered sequence by the sex of the employee is as follows:
  F F F M F F F M M F F M M M F F M F M M M M M F
  For the sex of the employee the ordered sequence exhibits runs of F's and M's.

  The sequence begins with a run of length three, followed by a run of length one, followed by another run of length three, and so on. The total number of runs in this sequence is 11.

  Let R be the total number of runs observed in an ordered sequence of $n_1 + n_2$ observations, where $n_1$ and $n_2$ are the respective sample sizes. The possible values of R are 2, 3, 4, …, $(n_1 + n_2)$.

  The only question to ask prior to performing the test is: is the sample size small or large? We will use the guideline that a small sample has $n_1$ and $n_2$ less than or equal to 15. For this case the table gives the lower $r_L$ and upper $r_U$ values of the distribution f(r) with α/2 = 0.025 in each tail.

  [Figure: distribution f(r) of R, with the acceptance region (AR) between $r_L$ and $r_U$]

  If $n_1$ or $n_2$ exceeds 15, the sample is considered large, in which case a normal approximation to f(r) is used to test $H_0$ versus $H_1$. The mean and variance of R are determined to be

  $\mu_R = \frac{2 n_1 n_2}{n_1 + n_2} + 1$

  $\sigma^2_R = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$

  and the normal approximation uses

  $Z = \frac{R - \mu_R}{\sigma_R}$
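  Counting runs and forming the large-sample statistic can be sketched in Python. The function name `runs_test` is an illustrative assumption. The employee sequence has only 12 observations of each symbol, so under the guideline above the small-sample table would apply; the z value is computed here only to illustrate the formulas.

```python
import math

def runs_test(sequence):
    """Number of runs R and the normal-approximation z for a two-symbol sequence."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("the runs test needs exactly two symbols")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])
    # a new run starts wherever two neighbouring symbols differ
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    z = (runs - mean) / math.sqrt(var)
    return runs, n1, n2, mean, var, z

seq = "F F F M F F F M M F F M M M F F M F M M M M M F".split()
runs, n1, n2, mean, var, z = runs_test(seq)
print(runs, n1, n2, mean)   # 11 12 12 13.0
```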

  

THE KRUSKAL-WALLIS H TEST

  The Kruskal-Wallis H test is the nonparametric equivalent of the Analysis of Variance F test.

  It tests the null hypothesis that all k populations possess the same probability distribution against the alternative hypothesis that the distributions differ in location, that is, one or more of the distributions are shifted to the right or left of each other. The advantage of the Kruskal-Wallis H test over the F test is that we need make no assumptions about the nature of the sampled populations.

  A completely randomized design specifies that we select independent random samples of $n_1, n_2, \dots, n_k$ observations from the k populations. To conduct the test, we first rank all $n = n_1 + n_2 + n_3 + \dots + n_k$ observations and compute the rank sums $R_1, R_2, \dots, R_k$ for the k samples.

  The ranks of tied observations are averaged in the same manner as for the Wilcoxon rank-sum test.

  Then, if $H_0$ is true, and if the sample sizes $n_1, n_2, \dots, n_k$ each equal 5 or more, the test statistic defined by:

  $H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$

  will have a sampling distribution that can be approximated by a chi-square distribution with (k - 1) degrees of freedom.

  Large values of H imply rejection of $H_0$.

  Therefore, the rejection region for the test is $H > \chi^2_\alpha$, where $\chi^2_\alpha$ is the value that locates α in the upper tail of the chi-square distribution.

  The test is summarized in the following:

  KRUSKAL-WALLIS H TEST FOR COMPARING k POPULATION PROBABILITY DISTRIBUTIONS

  $H_0$: The k population probability distributions are identical
  $H_1$: At least two of the k population probability distributions differ in location

  Test statistic: $H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$
  where
  $n_i$ = Number of measurements in sample i
  $R_i$ = Rank sum for sample i, where the rank of each measurement is computed according to its relative magnitude in the totality of data for the k samples
  n = Total sample size = $n_1 + n_2 + \dots + n_k$

  Rejection region: $H > \chi^2_\alpha$ with (k - 1) dof

  Assumptions:
  1. The k samples are random and independent
  2. There are 5 or more measurements in each sample
  3. The observations can be ranked

  No assumptions have to be made about the shape of the population probability distributions.

  Example
  Independent random samples of three different brands of magnetron tubes (the key components in microwave ovens) were subjected to stress testing, and the number of hours each operated without repair was recorded. Although these times do not represent typical life lengths, they do indicate how well the tubes can withstand extreme stress.

  The data are shown in the table below. Experience has shown that the distributions of life lengths for manufactured products are often non-normal, thus violating the assumptions required for the proper use of an ANOVA F test. Use the Kruskal-Wallis H test to determine whether evidence exists to conclude that the brands of magnetron tubes tend to differ in length of life under stress. Test using α = 0.05.

  BRAND
    A     B     C
   36    49    71
   48    33    31
    5    60   140
   67     2    59
   53    55    42

  Rank the observations and sum the ranks for the 3 samples.

  $H_0$: the population probability distributions of length of life under stress are identical for the three brands of magnetron tubes
  versus
  $H_1$: at least two of the population probability distributions differ in location

  Solution
    A   rank     B   rank     C   rank
   36     5     49     8     71    14
   48     7     33     4     31     3
    5     2     60    12    140    15
   67    13      2     1     59    11
   53     9     55    10     42     6
  $R_1 = 36$   $R_2 = 35$   $R_3 = 49$

  Test statistic:

  $H = \frac{12}{15(16)} \left( \frac{36^2}{5} + \frac{35^2}{5} + \frac{49^2}{5} \right) - 3(16) = 1.22$

  With k - 1 = 2 degrees of freedom, $\chi^2_{0.05} = 5.99$. Since H = 1.22 < 5.99, $H_0$ is not rejected.
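  The rank sums and the H statistic for the magnetron data can be checked with a short Python sketch. The function name `kruskal_wallis` is an illustrative assumption; tied values would receive the average of their ranks, although these data contain no ties.

```python
def kruskal_wallis(samples):
    """Rank sums and the H statistic for a list of independent samples."""
    pooled = sorted(v for s in samples for v in s)

    def rank(v):
        lo = pooled.index(v) + 1           # first rank occupied by the value v
        hi = lo + pooled.count(v) - 1      # last rank occupied by the value v
        return (lo + hi) / 2               # average rank for tied values

    n = len(pooled)
    rank_sums = [sum(rank(v) for v in s) for s in samples]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(s) for r, s in zip(rank_sums, samples)) - 3 * (n + 1)
    return rank_sums, h

brand_a = [36, 48, 5, 67, 53]
brand_b = [49, 33, 60, 2, 55]
brand_c = [71, 31, 140, 59, 42]
rank_sums, h = kruskal_wallis([brand_a, brand_b, brand_c])
print(rank_sums, round(h, 2))   # [36.0, 35.0, 49.0] 1.22
```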

COMPARISON OF POPULATION PROPORTIONS

  Given $X_1 \sim BIN(n_1, p_1)$ and $X_2 \sim BIN(n_2, p_2)$.

  Statistics:

  $\hat{p}_1 = \frac{X_1}{n_1}$ ; $\hat{p}_2 = \frac{X_2}{n_2}$

  are defined to be the sample proportions.

  $E(\hat{p}_1 - \hat{p}_2) = E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2$

  Assume that $X_1$ and $X_2$ are independent; then

  $Var(\hat{p}_1 - \hat{p}_2) = Var(\hat{p}_1) + Var(\hat{p}_2) = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$

  For sufficiently large $n_1$ and $n_2$, the standardized statistic:

  $Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$

  is approximately standard normal.

  The (1-α)100% CI:

  $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$

  As $p_1$ and $p_2$ are UNKNOWN, the approximate (1-α)100% CI for $(p_1 - p_2)$ is:

  $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

  In the testing situation,

  $H_0: p_1 = p_2 = p$ (p unknown)
  versus

  $H_1: p_1 < p_2$      RR: $Z < -z_\alpha$
  $H_1: p_1 > p_2$      RR: $Z > z_\alpha$
  $H_1: p_1 \ne p_2$    RR: $|Z| > z_{\alpha/2}$

  where α is the level of significance (los) of the test.

  Test statistic:

  $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

  The unknown common value of p is estimated by:

  $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$

  EXAMPLE
  Members of the Department of Statistics at Iowa State Union collected the following data on grades in an introductory business statistics course and an introductory engineering statistics course.

  Course    #Students   #A grades
  B.Stat       571          82
  E.Stat       156          25

  $H_0: p_1 = p_2$ ; the proportion of A grades in the two courses is equal.
  vs $H_1: p_1 \ne p_2$

  $\hat{p}_1 = \frac{82}{571} = 0.1436$ ; $\hat{p}_2 = \frac{25}{156} = 0.1603$

  $\hat{p} = \frac{82 + 25}{571 + 156} = 0.1472$

  $Z = \frac{0.1436 - 0.1603}{\sqrt{0.1472\,(0.8528)\left(\frac{1}{571} + \frac{1}{156}\right)}} = -0.52$

  The p-value is 2P(Z ≤ -0.52) = 0.6030. Since α = 5% < p-value, $H_0$ would not be rejected.

  The proportion of A's does not differ significantly in the two courses.

  EXERCISE
  An insurance company is thinking about offering discounts on its life insurance policies to non-smokers. As part of its analysis, it randomly selects 200 men who are 50 years old and asks them if they smoke at least one pack of cigarettes per day and if they have ever suffered from heart disease.

  The results indicate that 20 out of 80 smokers and 15 out of 120 non-smokers suffer from heart disease. Can we conclude at the 5% los that smokers have a higher incidence of heart disease than non-smokers?

  Solution:
  DATA: the two populations are 50-year-old smokers and 50-year-old non-smokers, each classified by whether they suffer from heart disease. The data are clearly qualitative.

  $H_0: p_1 = p_2$ vs $H_1: p_1 > p_2$

  Test statistic:

  $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

  RR: $z > z_{tab} = z_{0.05} = 1.645$

  Sample proportions: $\hat{p}_1 = \frac{20}{80} = 0.25$ ; $\hat{p}_2 = \frac{15}{120} = 0.125$

  Pooled proportion estimate: $\hat{p} = \frac{20 + 15}{80 + 120} = \frac{35}{200} = 0.175$

  Value of the test statistic:

  $z_{cal} = \frac{0.25 - 0.125}{\sqrt{0.175\,(0.825)\left(\frac{1}{80} + \frac{1}{120}\right)}} = 2.28$

  Since $z_{cal} = 2.28 > z_{tab} = 1.645$, reject $H_0$.

  The test statistic is approximately normally distributed, so we can calculate the p-value:
  p-value = P(z > 2.28) = 0.0113 = 1.13%

  Reject $H_0$.
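  The pooled two-proportion z statistic used in the grades example and in the smoking exercise can be sketched in Python. The function names `two_proportion_z` and `upper_tail` are illustrative assumptions; the normal tail probability is obtained from `math.erfc` rather than from a table, so p-values can differ from table values in the last digit.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z_smoke = two_proportion_z(20, 80, 15, 120)    # smokers vs non-smokers
z_grades = two_proportion_z(82, 571, 25, 156)  # B.Stat vs E.Stat
print(round(z_smoke, 2), round(z_grades, 2))   # 2.28 -0.52
print(upper_tail(z_smoke))                     # one-sided p-value for the exercise
```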

      PROBLEMS

      1. The pmf of a random variable X is given as follows:

      x      0    1    2    3
      p(x)   0    k    k    $3k^2$

      Determine k so that the properties of a pmf are satisfied!

      Solution: A pmf has two properties, namely:

      (1) $p(x) \ge 0$ for all x
      (2) $\sum_x p(x) = 1$

      From (2): $k + k + 3k^2 = 1$, so $3k^2 + 2k - 1 = 0$, that is, $(3k - 1)(k + 1) = 0$, which gives $k = \frac{1}{3}$ or $k = -1$.

      For $k = -1$: $p(1) = -1 < 0$ and $p(2) = -1 < 0$. Hence $k = -1$ does not satisfy the properties.

      Next, for $k = \frac{1}{3}$ it can be checked that the properties of a pmf are satisfied.

      So the value is $k = \frac{1}{3}$.

      In a public opinion survey, 60 out of a sample of 100 high-income voters and 40 out of a sample of 75 low-income voters supported a decrease in sales tax.

      (a) Can we conclude at the 5% los that the proportion of voters favoring a sales tax decrease differs between high and low-income voters?

      (b) What is the p-value of this test?

      (c) Estimate the difference in proportions, with 99% confidence!

      Solution:

      $H_0: (p_1 - p_2) = 0$ vs $H_1: (p_1 - p_2) \ne 0$

      RR: $|z| > 1.96$

      Test statistic:

      $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

      $\hat{p}_1 = \frac{60}{100} = 0.6$ ; $\hat{p}_2 = \frac{40}{75} = 0.53$

      $\hat{p} = \frac{60 + 40}{100 + 75} = \frac{100}{175} = 0.571$ ; $\hat{q} = 1 - \hat{p} = 0.429$

      $z_{cal} = \frac{0.60 - 0.53}{\sqrt{0.571\,(0.429)\left(\frac{1}{100} + \frac{1}{75}\right)}} = 0.93$

      The critical values are -1.96 and 1.96.

      (a) Conclusion: do not reject $H_0$.

      (b) p-value = 2P(z > 0.93) = 2(0.1762) = 0.3524.

      (c) $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = (0.60 - 0.53) \pm 2.575\sqrt{\frac{(0.6)(0.4)}{100} + \frac{(0.53)(0.47)}{75}} = 0.07 \pm 0.195$

      The difference between the two proportions is estimated to lie between -0.125 and 0.265.
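      The confidence interval in part (c) can be sketched in Python. The function name `diff_ci` is an illustrative assumption. The sketch keeps $\hat{p}_2 = 40/75$ unrounded, whereas the hand calculation rounds it to 0.53, so the endpoints agree only to about two decimals.

```python
import math

def diff_ci(x1, n1, x2, n2, z):
    """Approximate confidence interval for p1 - p2 using unpooled sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    half_width = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - half_width, p1 - p2 + half_width

# z = 2.575 is the value of z_{alpha/2} used for 99% confidence
low, high = diff_ci(60, 100, 40, 75, z=2.575)
print(round(low, 2), round(high, 2))   # -0.13 0.26
```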

      TEST on MEANS WHEN THE OBSERVATIONS ARE PAIRED
      TESTING THE PAIRED DIFFERENCES

      Let $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ be the n pairs, where $(X_i, Y_i)$ denotes the systolic blood pressure of the i-th subject before and after the drug.

      It is assumed that the differences $D_1, D_2, \dots, D_n$ constitute independent normally distributed RVs such that:

      $E(D_i) = \mu_D$ and $Var(D_i) = \sigma_D^2$

      $H_0: \mu_D = \mu_0$ vs $H_1: \mu_D \ne \mu_0$

      TEST STATISTIC:

      $T = \frac{\bar{D} - \mu_0}{S_D / \sqrt{n}}$

      where $\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i$ and $S_D^2 = \frac{1}{n-1}\sum_{i=1}^{n} (D_i - \bar{D})^2$

      Rejection criteria for testing hypotheses on means when the observations are paired

      Null hypothesis: $H_0: \mu_D = \mu_0$
      Value of the test statistic under $H_0$: $t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}}$

      Alternative hypothesis      Rejection criteria
      $H_1: \mu_D \ne \mu_0$      Reject $H_0$ when $t < -t_{\alpha/2,\,n-1}$ or when $t > t_{\alpha/2,\,n-1}$
      $H_1: \mu_D > \mu_0$        Reject $H_0$ when $t > t_{\alpha,\,n-1}$
      $H_1: \mu_D < \mu_0$        Reject $H_0$ when $t < -t_{\alpha,\,n-1}$

      A paired difference experiment is conducted to compare the starting salaries of male and female college graduates who find jobs. Pairs are formed by choosing a male and a female with the same major and similar grade-point average. Suppose a random sample of ten pairs is formed in this manner and the starting annual salary of each person is recorded. The results are shown in the table. Test to see whether there is evidence that the mean starting salary, $\mu_1$, for males exceeds the mean starting salary, $\mu_2$, for females. Use α = 0.05.

      Pair    Male       Female     Difference (male - female)
       1      $14,300    $13,800      $500
       2       16,500     16,600      -100
       3       15,400     14,800       600
       4       13,500     13,500         0
       5       18,500     17,600       900
       6       12,800     13,000      -200
       7       14,500     14,200       300
       8       16,200     15,100     1,100
       9       13,400     13,200       200
      10       14,200     13,500       700

      Solution:

      $H_0: \mu_D = 0$ $(\mu_1 - \mu_2 = 0)$ versus $H_1: \mu_D > 0$ $(\mu_1 - \mu_2 > 0)$

      RR: reject $H_0$ if $t > t_\alpha$ ; $t_{0.05,\,9} = 1.833$

      $\bar{d} = \frac{\sum D_i}{n} = 400$

      $s_D^2 = 188{,}888.89$ ; $s_D = 434.61$

      $t = \frac{400}{434.61 / \sqrt{10}} = 2.91$

      Since t = 2.91 > 1.833, reject $H_0$: there is evidence that the mean starting salary for males exceeds that for females.