THE MULTINOMIAL DISTRIBUTION AND ELEMENTARY TESTS FOR CATEGORICAL DATA

It is useful to have a probability model for the number of
observations falling into each of k mutually exclusive
classes. Such a model is given by the multinomial random
variable, for which it is assumed that:
1. A total of n independent trials is made.
2. At each trial an observation will fall into exactly one of k
mutually exclusive classes.
3. The probabilities of falling into the k classes are
p1, p2, …, pk, where pi is the probability of falling into
class i, i = 1, 2, …, k.
These probabilities are constant for all trials, with
$$p_1 + p_2 + \cdots + p_k = 1.$$

If k = 2, we have the Binomial distribution.
Let us define:
X1 to be the number of type 1 outcomes in the n trials,
X2 to be the number of type 2 outcomes,
…
Xk to be the number of type k outcomes.
As there are n trials,
$$X_1 + X_2 + \cdots + X_k = n.$$

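The joint probability function for these random variables can be
shown to be:
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k},$$
where each $x_i$ is a nonnegative integer and
$$x_1 + x_2 + \cdots + x_k = n.$$
For k = 2, the probability function reduces to
$$P(X_1 = x_1, X_2 = n - x_1) = \frac{n!}{x_1!\,(n - x_1)!}\,p_1^{x_1}(1 - p_1)^{n - x_1},$$
which is the Binomial probability of $x_1$ successes
in n trials, each with probability of success $p_1$.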


EXAMPLE
A simple example of multinomial trials is the tossing of
a die n times. At each trial the outcome is one of the
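values 1, 2, 3, 4, 5 or 6. Here k = 6. If n = 10, the
probability of 2 ones, 2 twos, 2 threes, no fours, 2 fives
and 2 sixes is:
$$\frac{10!}{2!\,2!\,2!\,0!\,2!\,2!}\left(\frac{1}{6}\right)^{10} = \frac{113400}{60466176} \approx 0{,}00188.$$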
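The joint probability function can be evaluated directly. The following is a minimal Python sketch (the function name `multinomial_pmf` is our own, not from the text). It reproduces the die calculation above and, for k = 2, agrees with the Binomial formula.

```python
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """Joint probability P(X1 = x1, ..., Xk = xk) for n = sum(counts) trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)  # stays an exact integer: n! / (x1! x2! ... xk!)
    return coef * prod(p ** x for p, x in zip(probs, counts))


# Fair die rolled n = 10 times: two 1s, two 2s, two 3s, no 4s, two 5s, two 6s
die = multinomial_pmf([2, 2, 2, 0, 2, 2], [1 / 6] * 6)
print(f"{die:.6f}")

# k = 2 reduces to the Binomial probability of 3 successes in 10 trials
binom = multinomial_pmf([3, 7], [0.3, 0.7])
print(abs(binom - comb(10, 3) * 0.3**3 * 0.7**7) < 1e-12)  # True
```

Integer division is used for the coefficient so that it stays exact. Only the final multiplication by the probabilities is done in floating point.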

To test hypotheses concerning the $p_i$ in this example, the null
hypothesis
$$H_0: p_1 = p_2 = \cdots = p_6 = \tfrac{1}{6}$$
states that the die is fair. It is tested against
$$H_1: H_0 \text{ is false},$$
which, of course, means that the die is not fair.

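The statistic
$$Q_{k-1} = \sum_{i=1}^{k} \frac{(X_i - np_{i0})^2}{np_{i0}}$$
can be thought of as the sum of the k terms $(X_i - np_{i0})^2/(np_{i0})$.
It will be used in testing
$$H_0: p_i = p_{i0},\quad i = 1, 2, \ldots, k,$$
versus $H_1$: $H_0$ is false, where the $p_{i0}$ are hypothesized values of the $p_i$.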

In the special case of k = 2, there are two possible
outcomes at each trial, which can be called success
and failure.
A test of $H_0: p_1 = p_{10}$ is a test of the same
null hypothesis as $H_0: p = p_0$.
The observed and expected values for this situation are:

              Success     Failure       Total
Observed      X           n − X         n
Expected      np0         n(1 − p0)     n

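For an α-level test, a rejection region for testing
$H_0: p = p_0$ versus $H_1: p \neq p_0$
is given by $|z| \ge z_{\alpha/2}$, where
$$z = \frac{x - np_0}{\sqrt{np_0(1 - p_0)}}.$$
We know that
$$z^2 = \frac{(x - np_0)^2}{np_0(1 - p_0)}.$$
We use two facts: $\frac{1}{p_0(1 - p_0)} = \frac{1}{p_0} + \frac{1}{1 - p_0}$, and $[(n - x) - n(1 - p_0)]^2 = (x - np_0)^2$. With these, we have
$$z^2 = \frac{(x - np_0)^2}{np_0} + \frac{[(n - x) - n(1 - p_0)]^2}{n(1 - p_0)} = q_1.$$
By definition, $z_{\alpha/2}^2 = \chi^2_{\alpha}(1)$. Hence,
$|z| \ge z_{\alpha/2}$ if and only if $q_1 \ge \chi^2_{\alpha}(1)$.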
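A quick numerical check of the identity $z^2 = q_1$ for k = 2 follows. This is a sketch with made-up numbers (x = 60 successes in n = 100 trials, p0 = 0,5), and the function name `z_and_q1` is hypothetical.

```python
from math import isclose, sqrt


def z_and_q1(x, n, p0):
    """Return the Binomial z statistic and Pearson's q1 for k = 2 cells."""
    z = (x - n * p0) / sqrt(n * p0 * (1 - p0))
    q1 = (x - n * p0) ** 2 / (n * p0) + ((n - x) - n * (1 - p0)) ** 2 / (n * (1 - p0))
    return z, q1


z, q1 = z_and_q1(x=60, n=100, p0=0.5)
print(z, q1)                # 2.0 4.0
print(isclose(z ** 2, q1))  # True
# z_{0.025}^2 = 1.96^2 = 3.8416, which matches the tabled chi2_{0.05}(1) = 3.841
```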

GOODNESS-OF-FIT TESTS
Thus far all our statistical inferences have involved population
parameters such as means, variances and proportions. Now we make
inferences about the entire population distribution. A sample is taken,
and we want to test a null hypothesis of the general form:
H0 : sample is from a specified distribution
The alternative hypothesis is always of the form
H1 : sample is not from a specified distribution
A test of H0 versus H1 is called a goodness-of-fit test.

Two tests are used to evaluate goodness of fit:
1. The χ² test, which is based on an approximate χ²
statistic.
2. The Kolmogorov–Smirnov (K-S) test.
The K-S test is called a nonparametric test, because it uses a test statistic
that makes no assumptions about the distribution.
The χ² test is best for testing discrete distributions, and the K-S test is
best for continuous distributions.

Goodness of Fit
A goodness-of-fit test attempts to determine whether a conspicuous
discrepancy exists between the observed cell frequencies and
those expected under H0.
A useful measure of the overall discrepancy is given by:
$$\chi^2 = \sum \frac{(O - E)^2}{E},$$
where O and E symbolize an observed frequency and the
corresponding expected frequency.
The discrepancy in each cell is measured by the squared
difference between the observed and expected frequencies,
divided by the expected frequency.

The χ² statistic was originally proposed by Karl Pearson (1857–1936). He found its distribution for large n to be approximately a χ² distribution with k − 1 degrees of freedom.
Because of this distribution, the statistic is denoted by χ² and is called Pearson's χ² statistic for goodness of fit.
Null hypothesis:
H0 : pi = pi0 ; i = 1, 2, …, k
H1 : at least one pi is not equal to its specified value.
Test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(X_i - np_{i0})^2}{np_{i0}}$$
Rejection region:
$$\chi^2 \ge \chi^2_{\alpha}(k - 1),$$
where $\chi^2_{\alpha}(k-1)$ is the upper α point of the χ² distribution with d.f. = (k − 1).
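Pearson's statistic and its rejection rule translate directly into code. The sketch below makes two assumptions: the counts are hypothetical results from 60 rolls of a die, and the critical value χ²0,05(5) = 11,070 is taken from a table rather than computed. The function name `pearson_chi2` is ours.

```python
def pearson_chi2(observed, probs):
    """Pearson's goodness-of-fit statistic: sum of (X_i - n p_i0)^2 / (n p_i0)."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(observed, probs))


observed = [8, 12, 9, 11, 6, 14]   # hypothetical counts, n = 60
chi2 = pearson_chi2(observed, [1 / 6] * 6)
critical = 11.070                  # chi2_{0.05}(k - 1) with k = 6, from a table
print(round(chi2, 2), chi2 >= critical)  # 4.2 False -> do not reject H0
```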

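The chi-square statistic was first proposed by Karl Pearson in 1900. To see where it comes from, we
begin with the Binomial case.
Let X1 ~ BIN(n, p1), where 0 < p1 < 1.
According to the CLT,
$$Z = \frac{X_1 - np_1}{\sqrt{np_1(1 - p_1)}}$$
is approximately N(0, 1) for large n, particularly when np1 ≥ 5 and n(1 − p1) ≥ 5.
As you know, it follows that
Q1 = Z² ≈ χ²(1).
If we let X2 = n − X1 and p2 = 1 − p1, then
$$Q_1 = Z^2 = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2}.$$
This holds because
$$(X_2 - np_2)^2 = (n - X_1 - n + np_1)^2 = (X_1 - np_1)^2 \quad\text{and}\quad \frac{1}{p_1(1 - p_1)} = \frac{1}{p_1} + \frac{1}{p_2}.$$
Hence,
$$Q_1 = \sum_{i=1}^{2} \frac{(X_i - np_i)^2}{np_i} \approx \chi^2(1).$$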

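Pearson then constructed an expression similar to Q1, which involves X1 and X2 = n − X1. The new expression, denoted by Qk−1, involves X1, X2, …, Xk−1 and
Xk = n − X1 − X2 − … − Xk−1.
Hence,
$$Q_{k-1} = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i} \approx \chi^2(k - 1)$$
or
$$Q_{k-1} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.$$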

EXAMPLE
We observe n = 85 values of a random variable X that is
thought to have a Poisson distribution, obtaining:

x            0    1    2    3    4    5
Frequency    41   29   9    4    1    1

The sample average x̄ is the appropriate estimate of
λ = E(X). It is given by
$$\bar{x} = \frac{0(41) + 1(29) + 2(9) + 3(4) + 4(1) + 5(1)}{85} = \frac{68}{85} = 0{,}8.$$
The expected frequencies for the first three cells are npi, i = 0, 1, 2:
85 p0 = 85 P(X = 0) = 85 (0,449) = 38,2
85 p1 = 85 P(X = 1) = 85 (0,360) = 30,6
85 p2 = 85 P(X = 2) = 85 (0,144) = 12,2
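The expected frequency for the cell {3, 4, 5} is:
85 (0,047) = 4,0,
where 0,047 = 1 − 0,449 − 0,360 − 0,144 is the estimated probability of the combined cell.
Why are these cells combined? Taken separately, the cells x = 3, 4 and 5 would have very small expected frequencies. The χ² approximation is reliable only when every expected frequency is reasonably large; a common rule of thumb is at least 5.
With k = 4 cells after combination, the computed value of Q3 is
$$q_3 = \frac{(41 - 38{,}2)^2}{38{,}2} + \frac{(29 - 30{,}6)^2}{30{,}6} + \frac{(9 - 12{,}2)^2}{12{,}2} + \frac{(6 - 4{,}0)^2}{4{,}0} \approx 2{,}13.$$
Because λ was estimated from the data, one additional degree of freedom is lost. Q3 is therefore compared with a χ² distribution with k − 1 − 1 = 2 degrees of freedom. Since 2,13 is well below χ²0,05(2) = 5,991, there is no reason to reject
H0 : sample is from a Poisson distribution
vs H1 : sample is not from a Poisson distribution.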
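The Poisson goodness-of-fit calculation can be reproduced as follows. This sketch carries unrounded probabilities, so its value of q3 can differ slightly in the second decimal from a hand computation with the rounded expected frequencies 38,2, 30,6, 12,2 and 4,0. The last cell uses P(X ≥ 3), so that the four cell probabilities sum to 1.

```python
from math import exp, factorial

freq = {0: 41, 1: 29, 2: 9, 3: 4, 4: 1, 5: 1}   # observed frequencies
n = sum(freq.values())                          # 85
lam = sum(x * f for x, f in freq.items()) / n   # sample mean = 68/85 = 0.8

# Cell probabilities for x = 0, 1, 2 and for the combined cell x >= 3
probs = [exp(-lam) * lam ** x / factorial(x) for x in range(3)]
probs.append(1 - sum(probs))

observed = [freq[0], freq[1], freq[2], freq[3] + freq[4] + freq[5]]
expected = [n * p for p in probs]
q3 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1 - 1   # one extra d.f. is lost because lambda was estimated
print(round(lam, 3), [round(e, 1) for e in expected])
print(round(q3, 2), df)      # compare with chi2_{0.05}(2) = 5.991
```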

EXERCISE
The number X of telephone calls received each minute at a
certain switchboard in the middle of a working day is thought
to have a Poisson distribution.
Data were collected, and the results were as follows:

x            0    1    2    3    4    5    6
Frequency    40   66   41   28   9    3    1

Fit a Poisson distribution. Then find the estimated expected
value of each cell after combining {4, 5, 6} to make one cell.
Compute Q4, since k = 5, and compare it with $\chi^2_{\alpha}(3)$.
Why do we use three degrees of freedom?
Do we accept or reject the Poisson distribution?

CONTINGENCY TABLES
In many cases, data can be classified into categories on the
basis of two criteria.
For example, a radio receiver may be classified as having
low, average, or high fidelity and as having low, average, or
high selectivity. Similarly, graduating engineering students may be
classified according to their starting salary and their grade-point average.
In a contingency table, the statistical question is whether the
row criteria and column criteria are independent.
The null and alternative hypotheses are
H0 : The row and column criteria are independent
H1 : The row and column criteria are associated
Consider a contingency table with r rows and c columns. The
number of elements in the sample that are observed to fall
into row class i and column class j is denoted by Xij.

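The row sum for the ith row is
$$R_i = \sum_{j=1}^{c} X_{ij},$$
and the column sum for the jth column is
$$C_j = \sum_{i=1}^{r} X_{ij}.$$
The total number of observations in the entire table is
$$n = \sum_{i=1}^{r}\sum_{j=1}^{c} X_{ij} = \sum_{i=1}^{r} R_i = \sum_{j=1}^{c} C_j.$$
The contingency table for the general case is given below.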

The General r × c Contingency Table

              Column 1   Column 2   …   Column j   …   Column c   Row total
Row 1         X11        X12        …   X1j        …   X1c        R1
Row 2         X21        X22        …   X2j        …   X2c        R2
⋮
Row i         Xi1        Xi2        …   Xij        …   Xic        Ri
⋮
Row r         Xr1        Xr2        …   Xrj        …   Xrc        Rr
Column total  C1         C2         …   Cj         …   Cc         n

There are several probabilities of importance associated
with the table.
The probability of an element's being in row class i and
column class j in the population is denoted by pij.
The probability of being in row class i is denoted by pi•,
and the probability of being in column class j is denoted
by p•j.
Null and alternative hypotheses regarding the
independence of these probabilities would be stated as
follows:
$$H_0: p_{ij} = p_{i\bullet}\,p_{\bullet j} \quad\text{for all pairs } (i, j)$$
versus
$$H_1: H_0 \text{ is false}.$$
As pij, pi• and p•j are all unknown, it is necessary to
estimate these probabilities.

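The marginal probabilities are estimated by
$$\hat{p}_{i\bullet} = \frac{R_i}{n} \quad\text{and}\quad \hat{p}_{\bullet j} = \frac{C_j}{n},$$
where $R_i$ and $C_j$ are the ith row sum and the jth column sum.
Under the hypothesis of independence, $p_{ij} = p_{i\bullet}\,p_{\bullet j}$, so $p_{ij}$ would be estimated
by
$$\hat{p}_{ij} = \hat{p}_{i\bullet}\,\hat{p}_{\bullet j} = \frac{R_i C_j}{n^2}.$$
The expected number of observations in cell (i, j) is $E_{ij} = n\,p_{ij}$.
Under the null hypothesis, the estimate of $E_{ij}$ is
$$\hat{E}_{ij} = n\,\hat{p}_{ij} = \frac{R_i C_j}{n}.$$
The chi-square statistic is computed as
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(X_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}.$$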

The actual critical region is given by
$$\chi^2 \ge \chi^2_{\alpha}\big((r - 1)(c - 1)\big).$$
If the computed χ² gets too large, namely, if it exceeds $\chi^2_{\alpha}((r - 1)(c - 1))$, we reject the hypothesis
that the two attributes are independent.

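EXAMPLE

Ninety graduating male engineers were classified by
two attributes: grade-point average (low, average,
high) and initial salary (low, high).
The following results were obtained.

                       Grade-Point Average
Salary         Low     Average     High     Totals
Low            15      18          7        40
High           5       22          23       50
Totals         20      40          30       90

SOLUTION
The estimated expected frequencies are $\hat{E}_{ij} = R_i C_j / n$:
Ê11 = 40(20)/90 = 8,89; Ê12 = 40(40)/90 = 17,78; Ê13 = 40(30)/90 = 13,33;
Ê21 = 50(20)/90 = 11,11; Ê22 = 50(40)/90 = 22,22; Ê23 = 50(30)/90 = 16,67.
Hence
$$\chi^2 = \frac{(15 - 8{,}89)^2}{8{,}89} + \frac{(18 - 17{,}78)^2}{17{,}78} + \frac{(7 - 13{,}33)^2}{13{,}33} + \frac{(5 - 11{,}11)^2}{11{,}11} + \frac{(22 - 22{,}22)^2}{22{,}22} + \frac{(23 - 16{,}67)^2}{16{,}67} \approx 12{,}98,$$
with (2 − 1)(3 − 1) = 2 degrees of freedom.

What does this mean? The value 12,98 exceeds χ²0,05(2) = 5,991, and even χ²0,01(2) = 9,210. We therefore reject the hypothesis that initial salary and grade-point average are independent: the two attributes appear to be associated.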
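The expected frequencies $\hat E_{ij} = R_i C_j / n$ and the χ² statistic for an r × c table can be sketched as below, applied here to the salary and grade-point data. The function name `chi2_independence` is hypothetical.

```python
def chi2_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # estimated expected count
            chi2 += (x - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: salary (low, high); columns: grade-point average (low, average, high)
salary_gpa = [[15, 18, 7],
              [5, 22, 23]]
chi2, df = chi2_independence(salary_gpa)
print(round(chi2, 2), df)   # compare with chi2_{0.05}(2) = 5.991
```

The same function applies unchanged to the exercises that follow, including the test of equality of several multinomial distributions.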

EXERCISES
1. A test of the fidelity and the selectivity of 190 radios
produced the results shown in the following
table:

                        Fidelity
Selectivity     Low     Average     High
Low             7       12          31
Average         35      59          18
High            15      13          0

Use the 0,01 level of significance to test the null
hypothesis that fidelity is independent of
selectivity.

2. A test of the equality of two or more multinomial
distributions can be made by using calculations that
are associated with a contingency table. For
example, n = 100 light bulbs were taken at random
from each of three brands and were graded as A, B,
C, or D.

                  Grade
Brand      A      B      C      D      Totals
1          27     42     21     10     100
2          23     39     25     13     100
3          22     36     23     19     100
Totals     72     117    69     42     300

Clearly, we want to test the equality of three
multinomial distributions, each with k = 4 cells. Under H0,
the probability of falling into a particular
grade category is independent of brand. We can therefore
test this hypothesis by computing the χ² statistic of this 3 × 4 contingency table and
comparing it with χ²α(6), since there are (3 − 1)(4 − 1) = 6 degrees of freedom.

ANALYSIS OF VARIANCE
The Analysis of Variance (ANOVA, also abbreviated AOV)
is a generalization of the two-sample t-test, so that the means of
k > 2 populations may be compared.
ANalysis Of VAriance was first suggested by Sir Ronald Fisher,
a pioneer of the theory of design of experiments.
He was professor of genetics at Cambridge University.
The F-test is named in honor of Fisher.

The name Analysis of Variance stems from
the somewhat surprising fact that a set of
computations on several variances is used
to test the equality of several means.

IRONICALLY

The term ANOVA appears to be a misnomer,
since the objective is to analyze differences
among the group means.

Because ANOVA deals with means, it may appear to be misnamed.

The terminology of ANOVA can be confusing; this procedure is
actually concerned with levels of means.

The ANOVA belies its name in that it is not
concerned with analyzing variances but rather
with analyzing variation in means.

DEFINITION:
ANOVA, or one-factor analysis of variance, is a procedure to test
the hypothesis that several populations have the same means.

FUNCTION:
Using analysis of variance, we will be able to make inferences about
whether our samples are drawn from populations having the
same means.

INTRODUCTION
The Analysis of Variance (ANOVA) is a statistical technique used to
compare the locations (specifically, the expectations) of k > 2 populations.
The study of ANOVA involves the investigation of very complex statistical
models, which are interesting both statistically and mathematically.
Two designs are considered here.
The first is referred to as a one-way classification or a completely
randomized design.
The second is called a two-way classification or a randomized block
design.
The basic idea behind the term “ANOVA” is that the total variability of all
the observations can be separated into distinct portions, each of which
can be assigned to a particular source or cause.
This decomposition of the variability permits statistical estimation and
tests of hypotheses.

Suppose that we are interested in k populations, from
each of which we sample n observations. The
observations are denoted by:
Yij , i = 1,2,…k ; j = 1,2,…n
where Yij represents the jth observation from
population i.

A basic null hypothesis to test is:
H0 : µ1 = µ2 = … = µk
that is, all the populations have the same
expectation.
The ANOVA method to test this null hypothesis is
based on an F statistic.

THE COMPLETELY RANDOMIZED DESIGN
WITH EQUAL SAMPLE SIZES
First we will consider comparison of the true expectations of
k > 2 populations, sometimes referred to as the k-sample problem.
For simplicity of presentation, we will assume initially that
an equal number of observations are randomly sampled
from each population. These observations are denoted by:
Y11 , Y12 , …… , Y1n
Y21 , Y22 , …… , Y2n
⋮
Yk1 , Yk2 , …… , Ykn

where Yij represents the jth observation out of the n
randomly sampled observations from the ith population.
Hence, Y12 would be the second observation from the first
population.

In the completely randomized design, the observations
are assumed to:
1. Come from normal populations
2. Come from populations with the same variance
3. Have possibly different expectations, µ1, µ2, …, µk
These assumptions are expressed mathematically as
follows:
$$Y_{ij} \sim \mathrm{NOR}(\mu_i, \sigma^2); \quad i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, n \qquad (*)$$
This equation is equivalent to

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \quad\text{with } \varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2),$$
where N stands for “normally”, I for “independently” and D for “distributed”.
The 0 means that $E(\varepsilon_{ij}) = 0$ for all pairs of indices i
and j, and σ² means that $\mathrm{Var}(\varepsilon_{ij}) = \sigma^2$ for all such
pairs.
The parameters µ1, µ2, …, µk are the expectations of
the k populations, about which inference is to be
made.
The initial hypotheses to be tested in the completely
randomized design are:
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$
versus
$$H_1: \mu_i \neq \mu_j \text{ for some pair of indices } i \neq j \qquad (**)$$

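The null hypothesis states that all of the k populations have the same
expectation. If this is true, then we know from equation (*) that all of
the Yij observations have the same normal distribution. We are then
observing not n observations from each of k populations, but nk
observations, all from the same population.
The random variable Yij may be written as:
$$Y_{ij} = \mu + (\mu_i - \mu) + \varepsilon_{ij},$$
where
$$\mu = \frac{1}{k}\sum_{i=1}^{k} \mu_i.$$
Defining
$$\tau_i = \mu_i - \mu, \quad i = 1, 2, \ldots, k,$$
we obtain
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad\text{with } \sum_{i=1}^{k} \tau_i = 0 \text{ and } \varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2).$$
The hypotheses in equation (**) may be restated as:
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0 \quad\text{vs}\quad H_1: \tau_i \neq 0 \text{ for at least one } i \qquad (***)$$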
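The observation $Y_{ij}$ has expectation
$$E(Y_{ij}) = \mu_i = \mu + \tau_i.$$
The parameters $\tau_i$ are the differences, or deviations, of the individual population expectations $\mu_i$ from their common part $\mu$. If all of the $\mu_i$ are equal (say to $\mu$), then $\mu_i = \mu$ for every i. In this case all of the deviations are zero, because
$$\tau_i = \mu_i - \mu = \mu - \mu = 0.$$
Hence, the null hypothesis in equation (***) means that, for every i, these expectations consist only of the common part $\mu$.
The total variability of the observations is
$$\sum_{i=1}^{k}\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2,$$
where $\bar{Y}_{\cdot\cdot}$ is the mean of all of the nk observations.
It can be shown that:
$$\sum_{i=1}^{k}\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = n\sum_{i=1}^{k} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\cdot})^2.$$
The notation $\bar{Y}_{i\cdot}$ represents the average of the observations
from the ith population; that is,
$$\bar{Y}_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n} Y_{ij}.$$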
The last equation is represented by:
SST = SSA + SSE
where SST represents the total sum of squares, SSA
represents the sum of squares due to differences among
populations or treatments, and SSE represents the sum of
squares that is unexplained, or said to be “due to error”.
The results of an ANOVA are usually reported in an analysis of
variance table.
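The decomposition SST = SSA + SSE can be checked numerically. The sketch below uses a small hypothetical data set (three populations, n = 3 observations each). The function name `one_way_sums_of_squares` is ours.

```python
def one_way_sums_of_squares(groups):
    """Return (SST, SSA, SSE) for a one-way layout given a list of samples."""
    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)
    means = [sum(g) / len(g) for g in groups]
    sst = sum((y - grand_mean) ** 2 for y in all_obs)
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return sst, ssa, sse


groups = [[4, 6, 5], [8, 9, 7], [5, 4, 6]]   # hypothetical data, k = 3, n = 3
sst, ssa, sse = one_way_sums_of_squares(groups)
k, n = len(groups), len(groups[0])
f = (ssa / (k - 1)) / (sse / (k * (n - 1)))  # MSA / MSE
print(sst, ssa, sse, f)   # 24.0 18.0 6.0 9.0
```

Because `ssa` weights each squared deviation by `len(g)`, the same function also covers the unequal-sample-size case treated later.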

ANOVA Table for the Completely Randomized Design
with Equal Sample Sizes:

Source of Variation               Degrees of Freedom   Sum of Squares   Mean Square             F
Among populations or treatments   k − 1                SSA              MSA = SSA/(k − 1)       MSA/MSE
Error                             k(n − 1)             SSE              MSE = SSE/[k(n − 1)]
Total                             kn − 1               SST

For an α-level test, a reasonable critical region for the
alternative hypothesis in equation (**) is
$$F = \frac{MSA}{MSE} \ge F_{\alpha}\big(k - 1,\ k(n - 1)\big).$$

THE COMPLETELY RANDOMIZED DESIGN
WITH UNEQUAL SAMPLE SIZES
In many studies in which the expectations of k > 2 populations are
compared, the samples from each population are not ultimately of
equal size, even in cases where we attempt to maintain equal
sample sizes. For example, suppose we decide to compare three
teaching methods using three classes of students.
The teacher of each class agrees to use one of the three
teaching methods.
The plan for the comparison is to give a common examination to all
of the students in each class after two months of instruction.

Even if the classes are initially of the same size, they may differ
after two months because students have dropped out for one
reason or another. Thus we need a way to analyze the k-sample
problem when the samples are of unequal sizes.

In the case of UNEQUAL SAMPLE SIZES, the observations are
denoted by:
Y11 , Y12 , …… , Y1n1
Y21 , Y22 , …… , Y2n2
⋮
Yk1 , Yk2 , …… , Yknk
where Yij represents the jth observation from the ith population.
For the ith population there are ni observations.
In the case of equal sample sizes, ni = n for i = 1, 2, …, k.
The model assumptions are the same for the unequal sample size
case as for the equal sample size case. The Yij
are assumed to:
1. Come from normal populations
2. Come from populations with the same variance
3. Have possibly different expectations, µ1, µ2, …, µk

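These assumptions are expressed formally as
$$Y_{ij} \sim \mathrm{NOR}(\mu_i, \sigma^2); \quad i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, n_i$$
or as
$$Y_{ij} = \mu_i + \varepsilon_{ij}, \quad\text{with } \varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2).$$
The first null and alternative hypotheses to test are exactly the
same as those in the previous section, namely:
H0 : µ1 = µ2 = … = µk
versus
H1 : µi ≠ µl for some pair of indices i ≠ l
The model for the completely randomized design may be presented
as:
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad\text{with } \sum_{i=1}^{k} n_i\tau_i = 0 \text{ and } \varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2).$$
In this case the overall mean, µ, is given by
$$\mu = \frac{1}{N}\sum_{i=1}^{k} n_i\mu_i,$$
where
$$N = \sum_{i=1}^{k} n_i$$
is the total number of observations.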

Here µ is a weighted average of the population expectations
µi, where the weights are ni/N, the proportion of observations coming from the ith
population.
The hypotheses can also be restated as
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0 \quad\text{versus}\quad H_1: \tau_i \neq 0 \text{ for at least one } i.$$
The observation Yij has expectation
$$E(Y_{ij}) = \mu_i = \mu + \tau_i.$$
If H0 is true, then µi = µ for all i, and hence all of the Yij have a
common distribution.
Thus, $Y_{ij} \sim \mathrm{NOR}(\mu, \sigma^2)$ under H0. The total variability of the
observations is again partitioned into two portions by

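$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{k} n_i(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$$
or
SST = SSA + SSE,
where
$$\bar{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij} \quad\text{and}\quad \bar{Y}_{\cdot\cdot} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}.$$
As before:
$\bar{Y}_{i\cdot}$ represents the average of the observations from the ith population,
N is the total number of observations, and
$\bar{Y}_{\cdot\cdot}$ is the average of all the observations.

Again,
SST represents the total sum of squares.
SSA represents the sum of squares due to differences among populations or treatments.
SSE represents the sum of squares due to error.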

The degrees of freedom are partitioned in the same way:
TOTAL = TREATMENTS + ERROR
(N − 1) = (k − 1) + (N − k)

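The mean square among treatments and the mean square for error
are equal to the appropriate sums of squares divided by the corresponding
degrees of freedom.
That is,
$$MSA = \frac{SSA}{k - 1}, \qquad MSE = \frac{SSE}{N - k}.$$
It can be shown that MSE is an unbiased estimate of σ², that is,
$$E(MSE) = \sigma^2.$$
Similarly,
$$E(MSA) = \sigma^2 + \frac{1}{k - 1}\sum_{i=1}^{k} n_i\tau_i^2,$$
which also equals σ² when H0 is true.
Under H0,
$$F = \frac{MSA}{MSE}$$
has an F-distribution with (k − 1) and (N − k) degrees of freedom.
Finally, we reject the null hypothesis at significance level α if:
$$F \ge F_{\alpha}(k - 1,\ N - k).$$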

ANOVA TABLE
for the Completely Randomized Design with
unequal sample sizes

SOURCE                             DOF     SS     MS                   F
Among populations or treatments    k − 1   SSA    MSA = SSA/(k − 1)    MSA/MSE
ERROR                              N − k   SSE    MSE = SSE/(N − k)
TOTAL                              N − 1   SST

Sometimes SSA is denoted SSTR,
SSE is denoted SSER,
and SST is denoted SSTO.

SUMMARY NOTATION FOR A CRD

POPULATIONS (TREATMENTS)      1      2      3      …      k
MEAN                          µ1     µ2     µ3     …      µk
VARIANCE                      σ1²    σ2²    σ3²    …      σk²

INDEPENDENT RANDOM SAMPLES    1      2      3      …      k
SAMPLE SIZE                   n1     n2     n3     …      nk
SAMPLE TOTALS                 T1     T2     T3     …      Tk
SAMPLE MEANS                  ȳ1     ȳ2     ȳ3     …      ȳk

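ANOVA F-TEST FOR A CRD
with k treatments

H0 : µ1 = µ2 = … = µk
(i.e., there is no difference in the treatment means)
versus
Ha : At least two of the treatment means
differ.
Test Statistic: $F = \dfrac{MSTR}{MSER}$
Rejection Region: $F > F_{\alpha}$, where $F_{\alpha}$ is based on ν1 = (k − 1) numerator and ν2 = (N − k) denominator degrees of freedom.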

PARTITIONING OF THE TOTAL SUM OF SQUARES
FOR THE COMPLETELY RANDOMIZED DESIGN
The TOTAL SUM OF SQUARES (SSTO) splits into the SUM OF SQUARES
FOR TREATMENTS (SSTR) and the SUM OF SQUARES FOR ERROR (SSER):
SSTO = SSTR + SSER

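FORMULAS FOR THE CALCULATIONS IN THE CRD

CM = correction for the mean = (total of all observations)²/N = $\dfrac{\left(\sum y_{ij}\right)^2}{N}$
SSTO = total sum of squares = (sum of squares of all observations) − CM = $\sum y_{ij}^2 - CM$
SSTR = sum of squares for treatments
= (sum of squares of treatment totals with each square
divided by the number of observations for
that treatment) − CM
= $\dfrac{T_1^2}{n_1} + \dfrac{T_2^2}{n_2} + \cdots + \dfrac{T_k^2}{n_k} - CM$
SSER = sum of squares for error = SSTO − SSTR
MSTR = mean square for treatments = $\dfrac{SSTR}{k - 1}$
MSER = mean square for error = $\dfrac{SSER}{N - k}$
F = test statistic = $\dfrac{MSTR}{MSER}$
where k is the number of treatments and N is the total
number of observations.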

EXAMPLE
Four groups of students were subjected to different teaching
techniques and tested at the end of a specified period of time.
As a result of dropouts from the experimental groups (due to
sickness, transfer, and so on), the number of students varied
from group to group. Do the data shown in the table below
present sufficient evidence to indicate a difference in the
mean achievement for the four teaching techniques?

DATA FOR EXAMPLE
Technique 1:  65  87  73  79  81  69
Technique 2:  75  69  83  81  72  79  90
Technique 3:  59  78  67  62  83  76
Technique 4:  94  89  80  88

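SOLUTION
Here k = 4 and N = 23. The treatment totals are T1 = 454, T2 = 549, T3 = 425 and T4 = 351, and the CRD formulas give SSTR ≈ 712,6 and SSER ≈ 1196,6.
The mean squares for treatment and error are
$$MSTR = \frac{SSTR}{k - 1} = \frac{712{,}6}{3} = 237{,}5, \qquad MSER = \frac{SSER}{N - k} = \frac{1196{,}6}{19} = 63{,}0.$$
The test statistic for testing
H0 : µ1 = µ2 = µ3 = µ4
is
$$F = \frac{MSTR}{MSER} = \frac{237{,}5}{63{,}0} = 3{,}77.$$
The critical value of F for α = 0,05 is F0,05(3, 19) = 3,13.
Since F = 3,77 > 3,13, we reject H0.
CONCLUSION: At the 0,05 level of significance, the data provide sufficient evidence to indicate a difference in mean achievement among the four teaching techniques.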
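The shortcut formulas of the CRD can be applied to the teaching-technique data in a few lines. This is a minimal sketch. The F critical value 3,13 is taken from a table, not computed.

```python
data = {
    1: [65, 87, 73, 79, 81, 69],
    2: [75, 69, 83, 81, 72, 79, 90],
    3: [59, 78, 67, 62, 83, 76],
    4: [94, 89, 80, 88],
}
all_obs = [y for ys in data.values() for y in ys]
N, k = len(all_obs), len(data)

cm = sum(all_obs) ** 2 / N                                       # correction for the mean
ssto = sum(y * y for y in all_obs) - cm                          # total sum of squares
sstr = sum(sum(ys) ** 2 / len(ys) for ys in data.values()) - cm  # sum of T_i^2 / n_i, minus CM
sser = ssto - sstr
mstr, mser = sstr / (k - 1), sser / (N - k)
f = mstr / mser
print(round(sstr, 1), round(sser, 1), round(f, 2))  # compare F with F_{0.05}(3, 19) = 3.13
```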

THE RANDOMIZED BLOCK DESIGN
The randomized block design implies the
presence of two qualitative independent
variables, “blocks” and “treatments”.
Consequently, the total sum of squares of
deviations of the response measurements
about their mean may be partitioned into
three parts: the sums of squares for blocks,
treatments and error.

CRD: SSTO = SSTR + SSER
RBD: SSTO = SSTR + SSBL + SSER

Definition:
A randomized block design is a design
devised to compare the means for k
treatments utilizing b matched blocks of k
experimental units each. Each treatment
appears once in every block.
The observations in a RBD can be represented
by an array of the following type. Treatment i is in row i and block j is in column j, and the number of treatments, k, is written t below:

Y11   Y12   …   Y1b
Y21   Y22   …   Y2b
⋮
Yt1   Yt2   …   Ytb

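In the completely randomized design, the expectation of Yij, the jth observation
from the ith treatment (population), was given by E(Yij) = µ + τi.
In the RBD, the assumption about Yij is
that:
$$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}; \quad i = 1, 2, \ldots, t;\ j = 1, 2, \ldots, b \qquad \text{(i)}$$
with
$$\sum_{i=1}^{t} \tau_i = 0, \qquad \sum_{j=1}^{b} \beta_j = 0$$
and
$$\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2).$$
The observation Yij is said to be the
observation from block j on treatment i.
As equation (i) shows, it is assumed that there are t
different treatments and b blocks.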

Hence, µ is the overall effect, βj is the jth block effect and τi is the ith treatment effect.
One task is to test the null hypothesis
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_t = 0,$$
which states that there are no treatment
differences.

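Here, the ith treatment mean is
$$\bar{Y}_{i\cdot} = \frac{1}{b}\sum_{j=1}^{b} Y_{ij},$$
the jth block mean is
$$\bar{Y}_{\cdot j} = \frac{1}{t}\sum_{i=1}^{t} Y_{ij},$$
and the overall mean is
$$\bar{Y}_{\cdot\cdot} = \frac{1}{bt}\sum_{i=1}^{t}\sum_{j=1}^{b} Y_{ij}.$$
The total sum of squares can be partitioned as
$$\sum_{i=1}^{t}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = b\sum_{i=1}^{t} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + t\sum_{j=1}^{b} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{i=1}^{t}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot})^2.$$
This expression can be abbreviated as
SSTO = SSTR + SSBL + SSER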

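The degrees of freedom are partitioned as follows:
dof TO = dof TR + dof BL + dof ER
bt − 1 = (t − 1) + (b − 1) + (b − 1)(t − 1)

Let $MSTR = SSTR/(t - 1)$ and $MSER = SSER/[(b - 1)(t - 1)]$. If the null hypothesis of no treatment
differences, $H_0: \tau_1 = \cdots = \tau_t = 0$, is true, then $E(MSTR) = \sigma^2$. In that case both MSTR and MSER are unbiased
estimates of σ².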

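It can be further shown that, under H0,
$$F = \frac{MSTR}{MSER} \sim F\big(t - 1,\ (b - 1)(t - 1)\big).$$
Hence, for an α-level test, we reject
$H_0: \tau_1 = \cdots = \tau_t = 0$ in favor of $H_1: \tau_i \neq 0$ for at least one i
if
$$\frac{MSTR}{MSER} \ge F_{\alpha}\big(t - 1,\ (b - 1)(t - 1)\big).$$
For analogous reasons, a test of
$H_0: \beta_1 = \cdots = \beta_b = 0$ versus $H_1: \beta_j \neq 0$ for at least one j can be carried out using the
critical region
$$\frac{MSBL}{MSER} \ge F_{\alpha}\big(b - 1,\ (b - 1)(t - 1)\big),$$
where $MSBL = SSBL/(b - 1)$.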

Data Structure of a RBD with b blocks and k treatments

                       TREATMENTS
BLOCK            1      2      3      …      k      Block means
1                .      .      .      …      .      B̄1
2                .      .      .      …      .      B̄2
⋮
b                .      .      .      …      .      B̄b
Treatment means  T̄1     T̄2     T̄3     …      T̄k

GENERAL FORM OF THE RANDOMIZED
BLOCK DESIGN (TREATMENT i IS
DENOTED BY Ai)

BLOCK 1:   A1   A2   A3   …   Ak
BLOCK 2:   A1   A2   A3   …   Ak
⋮
BLOCK b:   A1   A2   A3   …   Ak

Although we show the treatments in order within the blocks, in
practice they would be assigned to the experimental units in a
random order (thus the name randomized block design).

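FORMULAS FOR CALCULATIONS IN RBD
CM = $\dfrac{\left(\sum y\right)^2}{N}$
SSTO = $\sum y^2 - CM$
SSTR = $\dfrac{\sum_{i=1}^{k} T_i^2}{b} - CM$, where Ti is the total for treatment i
SSBL = $\dfrac{\sum_{j=1}^{b} B_j^2}{k} - CM$, where Bj is the total for block j
SSER = SSTO − SSTR − SSBL
where
N = total number of observations
b = number of blocks
k = number of treatments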
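These RBD formulas can be sketched in Python as follows. The data are hypothetical: b = 4 blocks, k = 3 treatments, one observation per treatment in each block. The test block below also checks that SSER agrees with the direct definition from the partition of the total sum of squares.

```python
# Hypothetical RBD data: rows are blocks (b = 4), columns are treatments (k = 3)
y = [
    [10, 12, 14],
    [11, 13, 15],
    [9, 12, 13],
    [10, 11, 16],
]
b, k = len(y), len(y[0])
N = b * k
all_obs = [v for row in y for v in row]

cm = sum(all_obs) ** 2 / N
ssto = sum(v * v for v in all_obs) - cm
treatment_totals = [sum(row[i] for row in y) for i in range(k)]   # T_i
block_totals = [sum(row) for row in y]                            # B_j
sstr = sum(t * t for t in treatment_totals) / b - cm
ssbl = sum(s * s for s in block_totals) / k - cm
sser = ssto - sstr - ssbl

mstr, msbl = sstr / (k - 1), ssbl / (b - 1)
mser = sser / (N - k - b + 1)
print(round(sstr, 3), round(ssbl, 3), round(sser, 3))
print("F treatments:", round(mstr / mser, 2), "F blocks:", round(msbl / mser, 2))
```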

ANOVA Summary Table for RBD

SOURCE       DOF             SS      MS                              F
Treatments   k − 1           SSTR    MSTR = SSTR/(k − 1)             MSTR/MSER
Blocks       b − 1           SSBL    MSBL = SSBL/(b − 1)             MSBL/MSER
Error        N − k − b + 1   SSER    MSER = SSER/(N − k − b + 1)
TOTAL        N − 1           SSTO

EXAMPLE
A study was conducted in a large city to compare the
supermarket prices of the four leading brands of coffee at the
end of the year. Ten supermarkets in the city were selected,
and the price per pound was recorded for each brand.
1. Set up the test of the null hypothesis that the mean
prices of the four brands sold in the city were the same at
the end of the year. Use α = 0,05.
2. Calculate the F statistic.
3. Do the data provide sufficient evidence to indicate a
difference in the mean prices for the four brands of coffee?

Price per pound ($)

SUPERMARKET   BRAND A   BRAND B   BRAND C   BRAND D   TOTALS
1             2,43      2,47      2,27      2,41      9,78
2             2,53      2,48      2,48      2,52      10,01
3             2,42      2,38      2,35      2,44      9,59
4             2,46      2,40      2,39      2,47      9,72
5             2,44      2,35      2,32      2,42      9,53
6             2,47      2,43      2,42      2,49      9,81
7             2,64      2,55      2,56      2,62      10,37
8             2,47      2,41      2,39      2,49      9,76
9             2,59      2,53      2,49      2,60      10,21
10            …         …         …         …         …

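SOLUTION
Treatments:
H0 : µ1 = µ2 = µ3 = µ4
H1 : at least two brands have different mean prices
Test statistic: F = MSTR/MSER = 92,8, with k − 1 = 3 and N − k − b + 1 = 27 degrees of freedom; F0,05(3, 27) = 2,96.
Since the calculated F > F0,05, there is very
strong evidence that at least two of the mean
prices of the four coffee brands differ.

Blocks:
H0 : Mean coffee prices are the same for all ten
supermarkets
H1 : Mean coffee prices differ for at least two
supermarkets
Test statistic: F = MSBL/MSER = 107,9. The degrees of freedom for this test statistic are
b − 1 = 9 and N − k − b + 1 = 27, and F0,05(9, 27) = 2,25.
Since 107,9 > 2,25, we also reject H0 for blocks: mean coffee prices differ for at least two supermarkets.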

ANOVA TABLE

SOURCE       DOF   SS        MS          F
Treatments   3     0,05000   0,016667    92,8
Blocks       9     0,17451   0,019390    107,9
Error        27    0,00485   0,000180
TOTAL        39    0,22936
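The mean squares and F ratios in this table follow from the SS and DOF columns alone. The short check below uses only the sums of squares reported in the table. The F critical values in the comments are tabled values, not computed here.

```python
ss = {"treatments": 0.05000, "blocks": 0.17451, "error": 0.00485}
dof = {"treatments": 3, "blocks": 9, "error": 27}

ms = {source: ss[source] / dof[source] for source in ss}   # MS = SS / DOF
f_treatments = ms["treatments"] / ms["error"]   # compare with F_{0.05}(3, 27) = 2.96
f_blocks = ms["blocks"] / ms["error"]           # compare with F_{0.05}(9, 27) = 2.25
print(round(f_treatments, 1), round(f_blocks, 1))
```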

NONPARAMETRIC TESTS
The majority of hypothesis tests discussed so far have made
inferences about population parameters, such as the mean and
the proportion. These parametric tests have used the parametric
statistics of samples that came from the population being tested.
To formulate these tests, we made restrictive assumptions about
the populations from which we drew our samples. For example,
we assumed that our samples either were large or came from
normally distributed populations. But populations are not always
normal.

And even if a goodness-of-fit test indicates that a population is
approximately normal, we cannot always be sure we’re right,
because the test is not 100 percent reliable.
Fortunately, in recent times statisticians have developed useful
techniques that do not make restrictive assumptions about the
shape of the population distribution.
These are known as distribution-free or, more commonly,
nonparametric tests.
Nonparametric statistical procedures are therefore often used in preference to their
parametric counterparts when these assumptions are doubtful.
The hypotheses of a nonparametric test are concerned with
something other than the value of a population parameter.
A large number of these tests exist, but this section will examine

only a few of the better known and more widely used ones:
1. Sign test
2. Wilcoxon signed-rank test
3. Mann–Whitney test (Wilcoxon rank-sum test)
4. Run test
5. Kruskal–Wallis test
6. Kolmogorov–Smirnov test
7. Lilliefors test

THE SIGN TEST
The sign test is used to test hypotheses about the
median of a continuous distribution. The median
of a distribution is a value of the random variable X
such that the probability is 0,5 that an observed
value of X is less than or equal to the median, and
the probability is 0,5 that an observed value of X is
greater than or equal to the median. That is,
$$P(X \le \tilde{\mu}) = P(X \ge \tilde{\mu}) = 0{,}5.$$
Since the normal distribution is symmetric, the
mean of a normal distribution equals the median.
Therefore, the sign test can be used to test
hypotheses about the mean of a normal distribution.

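Let X denote a continuous random variable with
median $\tilde{\mu}$, and let $X_1, X_2, \ldots, X_n$ denote a random sample of size n from
the population of interest.
If $\tilde{\mu}_0$ denotes the hypothesized value of the
population median, then the usual forms of the
hypotheses to be tested can be stated as follows:
$H_0: \tilde{\mu} = \tilde{\mu}_0$ versus
$H_1: \tilde{\mu} > \tilde{\mu}_0$ (right-tailed test)
$H_1: \tilde{\mu} < \tilde{\mu}_0$ (left-tailed test)
$H_1: \tilde{\mu} \neq \tilde{\mu}_0$ (two-tailed test)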

Form the differences:
$$X_i - \tilde{\mu}_0, \quad i = 1, 2, \ldots, n.$$
Now if the null hypothesis $H_0: \tilde{\mu} = \tilde{\mu}_0$ is true,
any difference $X_i - \tilde{\mu}_0$ is equally likely to be
positive or negative. An appropriate test statistic is
the number of these differences that are positive,
say $R^+$. Therefore, to test the null hypothesis we are
really testing that the number of plus signs is a value
of a Binomial random variable that has the
parameter p = 0,5.
A p-value for the observed number of plus signs $r^+$
can be calculated directly from the Binomial
distribution. Thus, for $H_1: \tilde{\mu} > \tilde{\mu}_0$, if the computed p-value
$$P = P\!\left(R^+ \ge r^+ \mid p = \tfrac{1}{2}\right)$$
is less than or equal to some preselected significance
level α, we will reject $H_0$ and conclude that $H_1$
is true.

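To test the other one-sided hypothesis,
$H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} < \tilde{\mu}_0$, we compute the p-value
$$P = P\!\left(R^+ \le r^+ \mid p = \tfrac{1}{2}\right).$$
If this p-value is less than or equal to α, we
will reject $H_0$.
The two-sided alternative may also be tested. If
the hypotheses are
$H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} \neq \tilde{\mu}_0$, the
p-value is:
$$P = 2P\!\left(R^+ \le r^+ \mid p = \tfrac{1}{2}\right) \text{ if } r^+ < n/2, \qquad P = 2P\!\left(R^+ \ge r^+ \mid p = \tfrac{1}{2}\right) \text{ if } r^+ > n/2.$$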
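The exact Binomial p-values above can be sketched as follows. The function name `sign_test` and the differences are hypothetical. The two-sided branch computes 2·min(lower, upper) capped at 1, which reproduces the two cases above and gives 1 when r+ = n/2.

```python
from math import comb


def sign_test(diffs, alternative="two-sided"):
    """Exact sign test of H0: median = median0, given the differences X_i - median0."""
    nonzero = [d for d in diffs if d != 0]      # ties are set aside
    n = len(nonzero)
    r_plus = sum(1 for d in nonzero if d > 0)   # number of plus signs
    lower = sum(comb(n, i) for i in range(r_plus + 1)) / 2 ** n      # P(R+ <= r+)
    upper = sum(comb(n, i) for i in range(r_plus, n + 1)) / 2 ** n   # P(R+ >= r+)
    if alternative == "greater":
        p = upper
    elif alternative == "less":
        p = lower
    else:
        p = min(1.0, 2 * min(lower, upper))
    return r_plus, n, p


# Hypothetical differences: 3 plus signs, 6 minus signs and one tie
diffs = [1, 2, -1, -1, 0, -2, 1, -1, -3, -1]
print(sign_test(diffs, alternative="less"))
```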

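It is also possible to construct a table of critical
values for the sign test.
As before, let $R^+$ denote the number of the
differences $X_i - \tilde{\mu}_0$ that are positive, and let $R^-$
denote the number of the differences that are
negative.
Let $R = \min(R^+, R^-)$. A table of critical values $r^*_{\alpha}$
for the sign test ensures that $P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ true})$ is approximately α.
If the observed value of the test statistic satisfies $r \le r^*_{\alpha}$,
then the null hypothesis $H_0: \tilde{\mu} = \tilde{\mu}_0$ should be
rejected and $H_1: \tilde{\mu} \neq \tilde{\mu}_0$ accepted.
If the alternative is $H_1: \tilde{\mu} > \tilde{\mu}_0$,
then reject $H_0$ if $r^- \le r^*_{\alpha}$.
If the alternative is $H_1: \tilde{\mu} < \tilde{\mu}_0$,
then reject $H_0$ if $r^+ \le r^*_{\alpha}$.
The level of significance of a one-sided test is
one-half the value for a two-sided test.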

TIES IN THE SIGN TEST
Since the underlying population is assumed to
be continuous, there is a zero probability that
we will find a “tie”, that is, a value of $X_i$
exactly equal to $\tilde{\mu}_0$.
When ties occur, they should be set aside and
the sign test applied to the remaining data.

THE NORMAL APPROXIMATION
When p = 0,5, the Binomial distribution is
well approximated by a normal distribution
when n is at least 10. The mean of the Binomial is np
and the variance is np(1 − p). Thus the distribution of $R^+$ is approximately
normal with mean 0,5n and variance 0,25n
whenever n is moderately large.
Therefore, in these cases the null hypothesis
can be tested using the statistic:
$$Z_0 = \frac{R^+ - 0{,}5n}{0{,}5\sqrt{n}}.$$

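Critical Regions/Rejection Regions for α-level
tests of $H_0: \tilde{\mu} = \tilde{\mu}_0$ are given in this table:

ALTERNATIVE                               CRITICAL/REJECTION REGION
$H_1: \tilde{\mu} \neq \tilde{\mu}_0$     $|z_0| > z_{\alpha/2}$
$H_1: \tilde{\mu} > \tilde{\mu}_0$        $z_0 > z_{\alpha}$
$H_1: \tilde{\mu} < \tilde{\mu}_0$        $z_0 < -z_{\alpha}$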
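The normal-approximation statistic is a one-liner. Here it is applied to a hypothetical result of 14 plus signs among n = 20 non-tied differences, using z0,025 = 1,96 for a two-sided test at α = 0,05. The function name `sign_test_z` is ours.

```python
from math import sqrt


def sign_test_z(r_plus, n):
    """Normal-approximation statistic Z0 = (R+ - 0.5 n) / (0.5 sqrt(n))."""
    return (r_plus - 0.5 * n) / (0.5 * sqrt(n))


z0 = sign_test_z(r_plus=14, n=20)     # hypothetical: 14 plus signs out of 20
print(round(z0, 3), abs(z0) > 1.96)   # two-sided test at alpha = 0.05
```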

THE WILCOXON SIGNED-RANK TEST
The sign test makes use only of the plus and
minus signs of the differences between the
observations and the median $\tilde{\mu}_0$. In the paired case, it uses
the plus and minus signs of the differences between the paired
observations.
Frank Wilcoxon devised a test procedure that
uses both direction (sign) and magnitude.
This procedure is now called the Wilcoxon signed-rank test.
The Wilcoxon signed-rank test applies to the
case of symmetric continuous distributions.
Under these assumptions, the mean equals the
median.

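Description of the test:
We are interested in testing
$H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$.
Assume that $X_1, X_2, \ldots, X_n$ is a random sample
from a continuous and symmetric distribution
with mean (and median) µ.
Compute the differences $X_i - \mu_0$, i = 1, 2, …, n.
Rank the absolute differences $|X_i - \mu_0|$ in ascending order, and then
give the ranks the signs of their corresponding
differences.
Let $W^+$ be the sum of the positive ranks, let $W^-$
be the absolute value of the sum of the negative
ranks, and let $W = \min(W^+, W^-)$.
Critical values of W, say $w^*_{\alpha}$, are tabulated.
1. If the observed value of the statistic satisfies $w \le w^*_{\alpha}$, reject $H_0: \mu = \mu_0$ in favor of $H_1: \mu \neq \mu_0$.
2. If $H_1: \mu > \mu_0$, reject $H_0$ if $w^- \le w^*_{\alpha}$; if $H_1: \mu < \mu_0$, reject $H_0$ if $w^+ \le w^*_{\alpha}$.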
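LARGE SAMPLE APPROXIMATION
If the sample size is moderately large (n > 20),
then it can be shown that $W^+$ (or $W^-$) has
approximately a normal distribution with mean
$$\mu_{W^+} = \frac{n(n + 1)}{4}$$
and variance
$$\sigma^2_{W^+} = \frac{n(n + 1)(2n + 1)}{24}.$$
Therefore, a test of $H_0: \mu = \mu_0$ can be based on the statistic
$$Z_0 = \frac{W^+ - n(n + 1)/4}{\sqrt{n(n + 1)(2n + 1)/24}}.$$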
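Wilcoxon Signed-Rank Test
Test statistic: $W^+$, the sum of the positive signed ranks.

Theorem: When $H_0$ is true, the probability distribution of $W^+$, which is based on a random
sample of size n, satisfies:
$$E(W^+) = \frac{n(n + 1)}{4}, \qquad \mathrm{Var}(W^+) = \frac{n(n + 1)(2n + 1)}{24}.$$

Proof:
Let
$$U_i = \begin{cases} 1 & \text{if the difference with rank } i \text{ is positive,} \\ 0 & \text{otherwise.} \end{cases}$$
Then
$$W^+ = \sum_{i=1}^{n} i\,U_i.$$
For a given rank i, the corresponding difference has a
50 : 50 chance of being “+” or “−” under $H_0$, independently of the other ranks. Hence $U_1, \ldots, U_n$ are independent with $P(U_i = 1) = P(U_i = 0) = \tfrac{1}{2}$, so $E(U_i) = \tfrac{1}{2}$ and $\mathrm{Var}(U_i) = \tfrac{1}{4}$. Therefore
$$E(W^+) = \frac{1}{2}\sum_{i=1}^{n} i = \frac{n(n + 1)}{4}, \qquad \mathrm{Var}(W^+) = \frac{1}{4}\sum_{i=1}^{n} i^2 = \frac{n(n + 1)(2n + 1)}{24}.$$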
PAIRED OBSERVATIONS
The Wilcoxon signed-rank test can be applied to
paired data.
Let $(X_{1j}, X_{2j})$, j = 1, 2, …, n, be a collection of paired
observations from two continuous distributions
that differ only with respect to their means. The
distribution of the differences $D_j = X_{1j} - X_{2j}$ is
continuous and symmetric.
The null hypothesis is $H_0: \mu_1 = \mu_2$, which is
equivalent to $H_0: \mu_D = 0$.
To use the Wilcoxon signed-rank test, the
differences are first ranked in ascending order of
their absolute values, and then the ranks are
given the signs of the differences.

Let $W^+$ be the sum of the positive ranks, let $W^-$
be the absolute value of the sum of the negative
ranks, and let $W = \min(W^+, W^-)$.
If the observed value $w \le w^*_{\alpha}$, then $H_0: \mu_D = 0$ is rejected
and $H_1: \mu_D \neq 0$ accepted.
If $H_1: \mu_D > 0$, then reject $H_0$ if $w^- \le w^*_{\alpha}$.
If $H_1: \mu_D < 0$, reject $H_0$ if $w^+ \le w^*_{\alpha}$.

EXAMPLE
Eleven students were randomly selected from a large
statistics class, and their numerical grades on two
successive examinations were recorded.

Student   Test 1   Test 2   Difference   Rank   Signed Rank
1         94       85       9            8      8
2         78       65       13           10     10
3         89       92       -3           4      -4
4         62       56       6            7      7
5         49       52       -3           4      -4
6         78       74       4            6      6
7         80       79       1            1      1
8         82       84       -2           2      -2
9         62       48       14           11     11
10        83       71       12           9      9
11        79       82       -3           4      -4

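Use the Wilcoxon signed-rank test to determine
whether the second test was more difficult than the
first. Use α = 0,1.

Solution:
H0 : µD = 0 versus H1 : µD > 0, where D = Test 1 − Test 2.
Sum of positive ranks: w+ = 8 + 10 + 7 + 6 + 1 + 11 + 9 = 52 (and w− = 14).
Using the normal approximation with n = 11,
$$z_0 = \frac{52 - 11(12)/4}{\sqrt{11(12)(23)/24}} = \frac{52 - 33}{\sqrt{126{,}5}} \approx 1{,}69 > z_{0{,}1} = 1{,}28.$$
Reject H0: the data suggest that the second test was more difficult than the first.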
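The ranking, signed-rank sums and normal-approximation statistic for this example can be reproduced as follows. Tied absolute differences receive the average of the positions they occupy, which gives the rank 4 shared by the three differences of size 3. The helper name `midranks` is ours.

```python
from math import sqrt

test1 = [94, 78, 89, 62, 49, 78, 80, 82, 62, 83, 79]
test2 = [85, 65, 92, 56, 52, 74, 79, 84, 48, 71, 82]
d = [a - b for a, b in zip(test1, test2) if a != b]   # zero differences are dropped


def midranks(values):
    """Ranks in ascending order; tied values get the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


ranks = midranks([abs(x) for x in d])
w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
n = len(d)
z0 = (w_plus - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
print(w_plus, w_minus, round(z0, 2))   # compare z0 with z_{0.1} = 1.28
```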

EXAMPLE
Ten newly married couples were randomly
selected, and each husband and wife were
independently asked how many
children they would like to have. The following
information was obtained.

Table: number of children wanted by the wife (X) and by the husband (Y), couples 1–10.

Using the sign test, is there reason to believe that
wives want fewer children than husbands?
Assume a maximum size of type I error of 0,05.

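SOLUTION
First, state H0 and H1. Here p is the probability that the wife wants more children than the husband, that is, that X − Y has a + sign:
H0 : p = 0,5
vs
H1 : p < 0,5
Record the sign of X − Y for each couple; the tied couple is set aside.
There are three + signs.
Under H0, S ~ BIN(9, 1/2), where S is the number of + signs, and
P(S ≤ 3) = 0,2539.
At level α = 0,05, since 0,2539 > 0,05,
H0 is not rejected.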
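The Binomial tail probability used in this solution can be checked directly. The inputs are the nine non-tied couples and three + signs stated above.

```python
from math import comb

n, s = 9, 3   # nine non-tied couples, three + signs
p_value = sum(comb(n, i) for i in range(s + 1)) / 2 ** n   # P(S <= 3) under BIN(9, 1/2)
print(round(p_value, 4), p_value <= 0.05)   # 0.2539 False -> do not reject H0
```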

THE WILCOXON RANK-SUM TEST
Suppose that we have two independent
continuous populations X1 and X2 with means
µ1 and µ2. Assume that the distributions of X1
and X2 have the same shape and spread, and
differ only (possibly) in their means.
The Wilcoxon rank-sum test can be used to
test the hypothesis
H0 : µ1 = µ2. This procedure is sometimes called
the Mann-Whitney test or Mann-Whitney U
test.

Description of the Test
Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$ be two independent random samples of sizes $n_1 \le n_2$ from the continuous populations X1 and X2.
We wish to test the hypotheses:
H0 : µ1 = µ2
versus
H1 : µ1 ≠ µ2

The test procedure is as follows. Arrange all n1 + n2 observations in ascending order of magnitude and assign ranks to them. If two or more observations are tied, then use the mean of the ranks that would have been assigned if the observations differed.

Let W1 be the sum of the ranks in the smaller sample (1), and define W2 to be the sum of the ranks in the other sample. Then,
$W_2 = \dfrac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} - W_1$
Now if the two population means do not differ, we expect the sums of the ranks to be nearly equal for both samples after adjusting for the difference in sample size. Consequently, if the sums of the ranks differ greatly, we conclude that the means are not equal.
Refer to the table with the appropriate sample sizes n1 and n2 to obtain the critical value wα.

H0 : µ1 = µ2 is rejected if either of the observed values w1 or w2 is less than or equal to wα.
If H1 : µ1 < µ2, then reject H0 if w1 ≤ wα.
For H1 : µ1 > µ2, reject H0 if w2 ≤ wα.

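LARGE-SAMPLE APPROXIMATION
When both n1 and n2 are moderately large, say, greater than 8, the distribution of W1 can be well approximated by the normal distribution with mean
$\mu_{W_1} = \dfrac{n_1(n_1 + n_2 + 1)}{2}$
and variance
$\sigma^2_{W_1} = \dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}$
Therefore, for n1 and n2 > 8, we could use
$Z_0 = \dfrac{W_1 - \mu_{W_1}}{\sigma_{W_1}}$
as a statistic, and the critical region is:
$|z_0| > z_{\alpha/2}$   two-tailed test
$z_0 > z_\alpha$   upper-tail test
$z_0 < -z_\alpha$   lower-tail test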
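A minimal Python sketch of the large-sample rank-sum statistic, assuming mid-ranks for ties and no tie correction to the variance. The function names are illustrative. It is run on the female (sample 1) and male salaries of the salary example in this section.

```python
import math

def midranks(values):
    """Ranks in ascending order; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks

def rank_sum_z(sample1, sample2):
    """Large-sample Wilcoxon rank-sum test: return (w1, w2, z0)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))
    w1 = sum(ranks[:n1])                              # rank sum of sample 1
    w2 = (n1 + n2) * (n1 + n2 + 1) / 2 - w1           # W2 = N(N+1)/2 - W1
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    return w1, w2, (w1 - mu) / sigma

females = [22.5, 19.8, 20.6, 24.7, 23.2, 19.2, 18.7, 20.9, 21.6, 23.5, 20.7, 21.6]
males = [21.9, 21.6, 22.4, 24.0, 24.1, 23.4, 21.2, 23.9, 20.5, 24.5, 22.3, 23.6]
w1, w2, z0 = rank_sum_z(females, males)
print(w1, w2, round(z0, 2))  # 117.0 183.0 -1.91
print(abs(z0) > 1.96)        # False: two-tailed test at alpha = 0.05 does not reject H0
```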

EXAMPLE
A large corporation is suspected of sex discrimination in the salaries of its employees.
From employees with similar responsibilities and work experience, 12 male and 12 female employees were randomly selected; their annual salaries in thousands of dollars are as follows:
Females: 22,5  19,8  20,6  24,7  23,2  19,2  18,7  20,9  21,6  23,5  20,7  21,6
Males:   21,9  21,6  22,4  24,0  24,1  23,4  21,2  23,9  20,5  24,5  22,3  23,6
Is there reason to believe that these random samples come from populations with different distributions? Use α = 0,05.

SOLUTION

H0 : f1(x) = f2(x). What does this mean? The random samples come from populations with the same distribution.
H1 : f1(x) ≠ f2(x)

Combine the two samples and rank the salaries:

Sex  Salary  Rank
F    18,7     1
F    19,2     2
F    19,8     3
M    20,5     4
F    20,6     5
F    20,7     6
F    20,9     7
M    21,2     8
M    21,6    10
F    21,6    10
F    21,6    10
M    21,9    12
M    22,3    13
M    22,4    14
F    22,5    15
F    23,2    16
M    23,4    17
F    23,5    18
M    23,6    19
M    23,9    20
M    24,0    21
M    24,1    22
M    24,5    23
F    24,7    24

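Suppose we take the female sample as sample 1; its rank sum is
R1 = RF = 117.
The statistic is
$U = n_1 n_2 + \dfrac{n_1(n_1+1)}{2} - R_1$,
so the value of U is $U = 144 + 78 - 117 = 105$. With $E(U) = n_1 n_2 / 2 = 72$ and $\sigma_U = \sqrt{n_1 n_2 (n_1 + n_2 + 1)/12} = \sqrt{300} \approx 17{,}32$,
$z = \dfrac{105 - 72}{17{,}32} \approx 1{,}91$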

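[Figure: standard normal curve for α = 0,05 with critical values -1,96 and 1,96.]
Since z = 1,91 lies between -1,96 and 1,96, H0 is accepted (not rejected).
What does this mean? At α = 0,05, the samples give no significant evidence that the salaries of male and female employees come from populations with different distributions.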
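As a cross-check, this Python sketch computes U from the rank-sum formula and, independently, by counting female-male pairs in which the female salary is lower (tied pairs count one half). The two agree, which shows what U measures. Variable names are illustrative.

```python
import math

females = [22.5, 19.8, 20.6, 24.7, 23.2, 19.2, 18.7, 20.9, 21.6, 23.5, 20.7, 21.6]
males = [21.9, 21.6, 22.4, 24.0, 24.1, 23.4, 21.2, 23.9, 20.5, 24.5, 22.3, 23.6]
n1, n2 = len(females), len(males)
r1 = 117  # female rank sum from the ranking table

# U from the rank-sum formula used in the solution
u_formula = n1 * n2 + n1 * (n1 + 1) / 2 - r1

# The same U by direct counting: pairs where the female salary is lower,
# with tied pairs counted as one half
u_pairs = sum(1.0 if f < m else 0.5 if f == m else 0.0 for f in females for m in males)

z = (u_formula - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
print(u_formula, u_pairs, round(z, 2))  # 105.0 105.0 1.91
print(abs(z) > 1.96)                    # False: H0 is not rejected at alpha = 0.05
```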

KOLMOGOROV – SMIRNOV TEST

The Kolmogorov-Smirnov (K-S) test is conducted by comparing the hypothesized and sample cumulative distribution functions.
A cumulative distribution function is defined as $F(x) = P(X \le x)$, and the sample cumulative distribution function, S(x), is defined as the proportion of sample values that are less than or equal to x.
The K-S test should be used instead of the $\chi^2$ goodness-of-fit test to determine if a sample is from a specified continuous distribution.
To illustrate how S(x) is computed, suppose we have the following 10 observations:
110, 89, 102, 80, 93, 121, 108, 97, 105, 103
We begin by placing the values of x in ascending order, as follows:
80, 89, 93, 97, 102, 103, 105, 108, 110, 121.
Because x = 80 is the smallest of the 10 values, the proportion of values of x that are less than or equal to 80 is 0,1: S(80) = 0,1.

x     S(x)
80    0,1
89    0,2
93    0,3
97    0,4
102   0,5
103   0,6
105   0,7
108   0,8
110   0,9
121   1,0
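A short Python sketch of S(x) for the ten observations of this illustration, assuming they are the same values as in Example 2 below. `sample_cdf` is a hypothetical helper built on the standard `bisect` module.

```python
from bisect import bisect_right

def sample_cdf(data):
    """Return S, where S(x) is the proportion of sample values <= x."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

data = [110, 89, 102, 80, 93, 121, 108, 97, 105, 103]
S = sample_cdf(data)
for x in sorted(data):
    print(x, S(x))  # 80 0.1, 89 0.2, ..., 121 1.0
```

Because `bisect_right` counts every value less than or equal to x, the same helper also handles tied observations correctly.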

The test statistic D is the maximum absolute difference between the two cdf's over all observed values.
The range of D is 0 ≤ D ≤ 1, and the formula is
$D = \max_x \lvert S(x) - F(x) \rvert$
where x = each observed value
S(x) = observed cdf at x
F(x) = hypothesized cdf at x

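Let X(1), X(2), …, X(n) denote the ordered observations of a random sample of size n, and define the sample cdf as:
$S_n(x) = \begin{cases} 0, & x < X_{(1)} \\ k/n, & X_{(k)} \le x < X_{(k+1)}, \quad k = 1, 2, \ldots, n-1 \\ 1, & x \ge X_{(n)} \end{cases}$
$S_n(x)$ is the proportion of sample values less than or equal to x.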

The Kolmogorov – Smirnov statistic is defined to be
$D_n = \sup_x \lvert S_n(x) - F(x) \rvert$
For a type I error of size α, the critical region is of the form
$D_n \ge D_0$, where $P(D_n \ge D_0) = \alpha$ under $H_0$.

EXAMPLE 1
A state vehicle inspection station has been designed so that inspection time follows a uniform distribution with limits of 10 and 15 minutes.
A sample of 10 duration times during low and peak traffic conditions was taken. Use the K-S test with α = 0,05 to determine if the sample is from this uniform distribution. The times are:
11,3  10,4  9,8  12,6  14,8  13,0  14,3  13,3  11,5  13,6

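SOLUTION
1. H0 : the sample comes from the Uniform(10, 15) distribution
versus
H1 : the sample does not come from the Uniform(10, 15) distribution
2. The sample cumulative distribution function S(x) is computed as the proportion of the 10 sample values that are less than or equal to x, and the hypothesized cdf is F(x) = (x - 10)/5 for 10 ≤ x ≤ 15 (F(x) = 0 for x < 10).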

Results of the K-S computation:

Observed time x   S(x)   F(x)   |S(x) - F(x)|
9,8               0,10   0,00   0,10
10,4              0,20   0,08   0,12
11,3              0,30   0,26   0,04
11,5              0,40   0,30   0,10
12,6              0,50   0,52   0,02
13,0              0,60   0,60   0,00
13,3              0,70   0,66   0,04
13,6              0,80   0,72   0,08
14,3              0,90   0,86   0,04
14,8              1,00   0,96   0,04

D = max |S(x) - F(x)| = 0,12, at x = 10,4.
From the table, for n = 10 and α = 0,05: D(10; 0,05) = 0,41.
[Figure: density f(D) with critical value D0 marking the upper-tail area α = P(D ≥ D0).]
Since 0,12 < 0,41, do not reject H0.
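A minimal Python sketch that reproduces the table: it evaluates |S(x) - F(x)| only at the observed values, as the table does, with the Uniform(10, 15) cdf. The helper name `uniform_cdf` is illustrative.

```python
def uniform_cdf(x, a=10.0, b=15.0):
    """Cdf of the Uniform(a, b) distribution."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

times = [11.3, 10.4, 9.8, 12.6, 14.8, 13.0, 14.3, 13.3, 11.5, 13.6]
xs = sorted(times)
n = len(xs)
# |S(x) - F(x)| at each observed value, with S(x_(i)) = i/n (no ties here)
diffs = [abs(i / n - uniform_cdf(x)) for i, x in enumerate(xs, start=1)]
D = max(diffs)
x_at_max = xs[diffs.index(D)]
print(round(D, 2), x_at_max)  # 0.12 10.4
print(D < 0.41)               # True: do not reject H0 at alpha = 0.05
```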

EXAMPLE 2
Suppose we want to test whether the following ten observations
110, 89, 102, 80, 93, 121, 108, 97, 105, 103
were drawn from a normal distribution with mean µ = 100 and standard deviation σ = 10.
Our hypotheses for this test are
H0 : Data were drawn from a normal distribution with µ = 100 and σ = 10
versus
H1 : Data were not drawn from a normal distribution with µ = 100 and σ = 10.

SOLUTION
F(x) = P(X ≤ x) is computed by standardizing, Z = (X - 100)/10:
P(X ≤ 80) = P(Z ≤ -2,0) = 0,0228
P(X ≤ 89) = P(Z ≤ -1,1) = 0,1357
P(X ≤ 93) = P(Z ≤ -0,7) = 0,2420
P(X ≤ 97) = P(Z ≤ -0,3) = 0,3821
P(X ≤ 102) = P(Z ≤ 0,2) = 0,5793
P(X ≤ 103) = P(Z ≤ 0,3) = 0,6179
The remaining values (x = 105, 108, 110, 121) are computed in the same way.

x     F(x)     S(x)   |S(x) - F(x)|
80    0,0228   0,1    0,0772
89    0,1357   0,2    0,0643
93    0,2420   0,3    0,0580
97    0,3821   0,4    0,0179
102   0,5793   0,5    0,0793 = D
103   0,6179   0,6    0,0179
105   0,6915   0,7    0,0085
108   0,7881   0,8    0,0119
110   0,8413   0,9    0,0587
121   0,9821   1,0    0,0179

If α = 0,05, the critical value for n = 10 obtained from the table is 0,409.
The decision rule is: reject H0 if D > 0,409.
Since D = 0,0793 < 0,409, H0 is not rejected (accept H0).
That is, the data are consistent with a normal distribution with µ = 100 and σ = 10.
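The Python sketch below reproduces D for Example 2, using `math.erf` for the normal cdf. Note that the table above evaluates |S(x) - F(x)| only at the observed points; the supremum in the definition of D_n also requires comparing F(x) with S just below each jump, i.e. with (i - 1)/n. The sketch computes both. The pointwise value is about 0,0793, while the supremum is about 0,1793 (at x = 102, where F = 0,5793 and S just below is 0,4). Both are below 0,409, so the decision is unchanged. Function names are illustrative.

```python
import math
from bisect import bisect_left, bisect_right

def normal_cdf(x, mu, sigma):
    """Normal cdf via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_distances(data, cdf):
    """Return (d_points, d_sup) for a sample against a continuous cdf.

    d_points: max |S(x) - F(x)| over the observed x, as in the table.
    d_sup: also compares F(x) with S just below x, giving sup |S - F|.
    """
    xs = sorted(data)
    n = len(xs)
    d_points = d_sup = 0.0
    for x in sorted(set(xs)):
        f = cdf(x)
        s_at = bisect_right(xs, x) / n    # S(x): proportion <= x
        s_below = bisect_left(xs, x) / n  # S just below x: proportion < x
        d_points = max(d_points, abs(s_at - f))
        d_sup = max(d_sup, abs(s_at - f), abs(f - s_below))
    return d_points, d_sup

data = [110, 89, 102, 80, 93, 121, 108, 97, 105, 103]
d_points, d_sup = ks_distances(data, lambda x: normal_cdf(x, 100, 10))
print(round(d_points, 4), round(d_sup, 4))  # 0.0793 0.1793
print(d_sup < 0.409)                        # True: H0 is not rejected
```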

LILLIEFORS TEST
In most applications where we want to test for normality, the population mean and the population variance are unknown.
In order to perform the K-S test, however, we must assume that those parameters are known.
The Lilliefors test is quite similar to the K-S test.
The major difference between the two tests is that, with the Lilliefors test, the sample mean $\bar{x}$ and the sample standard deviation s are used instead of µ and σ to calculate F(x), and the critical values are taken from a separate Lilliefors table.

EXAMPLE
A manufacturer of automobile seats has a
production line that produces an average of 100
seats per day. Because of new government
regulations, a new safety device has been
installed, which the manufacturer believes will
reduce average daily output.
A random sample of 15 days’ output after the
installation of the safety device is shown:
93, 103, 95, 101, 91, 105, 96, 94, 101, 88,
98, 94, 101, 92, 95
The daily production was assumed to be
normally distributed.
Use the Lilliefors test to examine that
assumption, with α = 0,01

SOLUTION
As in the K-S test, to compute S(x) we first sort the data, as follows:

x     S(x)
88    1/15 = 0,067
91    2/15 = 0,133
92    3/15 = 0,200
93    4/15 = 0,267
94    6/15 = 0,400
95    8/15 = 0,533
96    9/15 = 0,600
98    10/15 = 0,667
101   13/15 = 0,867
103   14/15 = 0,933
105   15/15 = 1,000

From the data above, we obtain x̄ = 96,47 and s = 4,85.
Next, F(x) is computed as
F(x) = P(X ≤ x) = P(Z ≤ (x - 96,47)/4,85)
for x = 88, 91, 92, …, 101, 103, 105.

Finally, summarize the results as follows:

x     F(x)     S(x)    |S(x) - F(x)|
88    0,0401   0,067   0,0269
91    0,1292   0,133   0,0038
92    0,1788   0,200   0,0212
93    0,2358   0,267   0,0312
94    0,3050   0,400   0,0950
95    0,3821   0,533   0,1509 = D
96    0,4602   0,600   0,1398
98    0,6255   0,667   0,0415
101   0,8238   0,867   0,0432
103   0,9115   0,933   0,0215
105   0,9608   1,000   0,0392

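From the table of critical values for the Lilliefors test with α = 0,01 and n = 15:
Dtab = 0,257
Since D = 0,1509 < 0,257, accept H0: the assumption of normally distributed daily production is not rejected.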
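A sketch of the Lilliefors computation in Python, with exact (unrounded) x̄ and s from the standard `statistics` module. Because F(x) is not rounded to z-table values, the pointwise maximum comes out near 0,152 rather than 0,1509; the supremum that also checks S just below each jump is near 0,158 (at x = 101). Either way, D stays below the tabulated 0,257. Variable names are illustrative.

```python
import math
import statistics
from bisect import bisect_left, bisect_right

seats = [93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95]
xbar = statistics.mean(seats)  # sample mean
s = statistics.stdev(seats)    # sample standard deviation (n - 1 divisor)

def F(x):
    """Normal cdf with the estimated parameters (the Lilliefors step)."""
    return 0.5 * (1 + math.erf((x - xbar) / (s * math.sqrt(2))))

xs = sorted(seats)
n = len(xs)
d_points = d_sup = 0.0
for x in sorted(set(xs)):
    s_at = bisect_right(xs, x) / n    # S(x), tied values counted together
    s_below = bisect_left(xs, x) / n  # S just below x
    d_points = max(d_points, abs(s_at - F(x)))
    d_sup = max(d_sup, abs(s_at - F(x)), abs(F(x) - s_below))

print(round(xbar, 2), round(s, 2))  # 96.47 4.85
print(d_points, d_sup)              # about 0.152 and 0.158
print(d_sup < 0.257)                # True: normality is not rejected at alpha = 0.01
```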

TEST BASED ON RUNS
Usually a sample that is taken from a population should be random.
The runs test evaluates the null hypothesis
H0 : the order of the sample data is random
The alternative hypothesis is simply the negation of H0. There is no comparable parametric test to evaluate this null hypothesis.
The order in which the data are collected must be retained so that the runs can be identified.

DEFINITIONS :
1. A run is defined as a sequence of identical symbols.
Two symbols are defined, and each of them must appear at least once in the sequence.
2. A run of length j is defined as a sequence of j observations, all belonging to the same group, that is preceded or followed by observations belonging to a different group.
For illustration, ordering the employees of the salary example by salary and recording their sex gives the following sequence:
FFF M FFF MM FF MMM FF M F MMMMM F
For the sex of the employee, the ordered sequence exhibits runs of F's and M's.

The sequence begins with a run of length three,
followed by a run of length one, followed by
another run of length three, and so on.
The total number of runs in this sequence is 11.
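Counting runs is easy to automate. This Python sketch uses `itertools.groupby`, which groups consecutive equal symbols, on the illustration's sequence written without spaces (assuming the trailing M's form one run of five M's followed by an F).

```python
from itertools import groupby

# The ordered sequence of F's and M's from the illustration, without spaces
sequence = "FFF M FFF MM FF MMM FF M F MMMMM F".replace(" ", "")
runs = [(symbol, len(list(group))) for symbol, group in groupby(sequence)]

R = len(runs)
n1, n2 = sequence.count("F"), sequence.count("M")
print(R, n1, n2)  # 11 12 12
print(runs[:3])   # [('F', 3), ('M', 1), ('F', 3)]
```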
Let R be the total number of runs observed in an ordered sequence of n1 + n2 observations, where n1 and n2 are the respective sample sizes. R can take values from 2 up to at most n1 + n2.
The only question to ask prior to performing the test is: is the sample size small or large?
We will use the guideline that a small sample has n1 and n2 less than or equal to 15.
For small samples, the table gives the lower critical value rL and the upper critical value rU of the distribution f(r), with α/2 = 0,025 in each tail; H0 is rejected if R ≤ rL or R ≥ rU.

[Figure: distribution f(r) of R, with the acceptance region (AR) between the critical values rL and rU.]

If n1 or n2 exceeds 15, the sample is considered
large, in which case a normal approximation to
f(r) is used to test H0 versus H1.

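The mean and variance of R are determined to be
$\mu_R = \dfrac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad \sigma_R^2 = \dfrac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$
and the normal approximation uses the statistic
$Z = \dfrac{R - \mu_R}{\sigma_R}$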
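The normal approximation can be sketched in Python as below; the function name `runs_z` is illustrative. For demonstration it is evaluated at the illustration's values R = 11, n1 = n2 = 12. Since both sample sizes are at most 15, the guideline above calls for the small-sample table instead, so the z value only illustrates the formulas.

```python
import math

def runs_z(r, n1, n2):
    """Mean, variance and z statistic of the number of runs R under H0."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    return mean, var, (r - mean) / math.sqrt(var)

mean, var, z = runs_z(11, 12, 12)
print(mean, round(var, 3), round(z, 3))  # 13.0 5.739 -0.835
print(abs(z) > 1.96)                     # False at alpha = 0.05 (two-tailed)
```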

THE KRUSKAL - WALLIS H TEST
The Kruskal – Wallis H test is the nonparametric equivalent of the Analysis of Variance F test.
It tests the null hypothesis that all k populations possess the same probability distribution against the alternative hypothesis that the distributions differ in location, that is, one or more of the distributions are shifted to the right or left of each other.
The advantage of the Kruskal – Wallis H test over the F test is that we need make no assumptions about the shape of the sampled populations (such as normality).
A completely randomized design specifies that we select independent random samples of n1, n2, …, nk observations from the k populations.

To conduct the test, we first rank all n = n1 + n2 + n3 + … + nk observations and compute the rank sums R1, R2, …, Rk for the k samples.
The ranks of tied observations are averaged in the same manner as for the Wilcoxon rank-sum test.
Then, if H0 is true, and if the sample sizes n1, n2, …, nk each equal 5 or more, the test statistic defined by
$H = \dfrac{12}{n(n+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(n+1)$
will have a sampling distribution that can be approximated by a chi-square distribution with (k - 1) degrees of freedom.

Therefore, the rejection region for the test is $H > \chi^2_\alpha$, where $\chi^2_\alpha$ is the value that locates α in the upper tail of the chi-square distribution with (k - 1) degrees of freedom.
The test is summarized in the following:

KRUSKAL – WALLIS H TEST
FOR COMPARING k POPULATION PROBABILITY DISTRIBUTIONS

H0 : The k population probability distributions are identical
H1 : At least two of the k population probability distributions differ in location
Test statistic:
$H = \dfrac{12}{n(n+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(n+1)$
where
ni = Number of measurements in sample i
Ri = Rank sum for sample i, where the rank of each measurement is computed according to its relative size in the combined set of all n observations
n = Total sample size = n1 + n2 + … + nk
Rejection Region: $H > \chi^2_\alpha$ with (k - 1) dof
Assumptions:
1. The k samples are random and independent
2. There are 5 or more measurements in each sample
3. The observations can be ranked
No assumptions have to be made about the shape of the population probability distributions.

Example
Independent random samples of three different brands of magnetron tubes (the key components in microwave ovens) were subjected to stress testing, and the number of hours each operated without repair was recorded. Although these times do not represent typical life lengths, they do indicate how well the tubes can withstand extreme stress. The data are shown in the table below. Experience has shown that the distributions of life lengths for manufactured products are often nonnormal, thus violating the assumptions required for the proper use of an ANOVA F test.
Use the Kruskal – Wallis H test to determine whether the life-length distributions of the three brands differ in location, using α = 0,05.

BRAND
A     B     C
36    49    71
48    33    31
5     60    140
67    2     59
53    55    42

Solution
Rank all 15 observations together and sum the ranks of each of the three samples.

A     Rank   B     Rank   C     Rank
36    5      49    8      71    14
48    7      33    4      31    3
5     2      60    12     140   15
67    13     2     1      59    11
53    9      55    10     42    6
R1 = 36      R2 = 35      R3 = 49

H0 : the population probability distributions of length of life under stress are identical for the three brands of magnetron tubes
versus
H1 : at least two of the population probability distributions differ in location

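Test statistic:
$H = \dfrac{12}{15(16)} \left( \dfrac{36^2}{5} + \dfrac{35^2}{5} + \dfrac{49^2}{5} \right) - 3(16) = 1{,}22$
Should H0 be rejected?
[Figure: chi-square density f(H) with 2 degrees of freedom; critical value 5,99 for α = 0,05.]
Since H = 1,22 < 5,99, H0 is not rejected: there is no significant evidence that the life-length distributions of the three brands differ in location.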
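A minimal Python sketch of the H statistic for this example, ranking the pooled data with mid-ranks and omitting any tie correction (there are no ties here). The function names are illustrative, not from a library.

```python
def midranks(values):
    """Ranks in ascending order; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks

def kruskal_wallis_h(*samples):
    """Return (rank_sums, H) for k independent samples (no tie correction)."""
    pooled = [x for sample in samples for x in sample]
    ranks = midranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for sample in samples:
        rank_sums.append(sum(ranks[start:start + len(sample)]))
        start += len(sample)
    total = sum(r * r / len(smp) for r, smp in zip(rank_sums, samples))
    return rank_sums, 12 / (n * (n + 1)) * total - 3 * (n + 1)

brand_a = [36, 48, 5, 67, 53]
brand_b = [49, 33, 60, 2, 55]
brand_c = [71, 31, 140, 59, 42]
rank_sums, h = kruskal_wallis_h(brand_a, brand_b, brand_c)
print(rank_sums, round(h, 2))  # [36.0, 35.0, 49.0] 1.22
print(h > 5.991)               # False: H0 is not rejected at alpha = 0.05 (2 dof)
```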