Sesi 9. Hypothesis Testing: Categorical Data

Lecture 10
Hypothesis testing:
Categorical Data Analysis

Biostatistics I: 2017-18

1

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Heatlth

Learni ng Obj ecti ves
1.

Comparison of binomial proportion using Z
and 2 Test.

2.

Explain 2 Test for Independence of 2
variables


3.

Explain The Fisher’s test for independence

4.

McNemar’s tests for correlated data

5.

Kappa Statistic

6.

Use of Computer Program
Biostatistics I: 2017-18

sawilopo@yahoo.com

2


Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Data Types
Data

Quantitative

Discrete

Continuous

Biostatistics I: 2017-18
sawilopo@yahoo.com

Qualitative

3

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health


Qual i tati ve Data
1.

2.
3.
4.

Qualitative Random Variables Yield
Responses That Can Be Put In Categories.
Example: Sex (Male, Female)
Measurement or Count Reflect # in
Category
Nominal (no order) or Ordinal Scale (order)
Data can be collected as continuous but
recoded to categorical data. Example
(Systolic Blood Pressure - Hypotension,
Normal tension, hypertension )
Biostatistics I: 2017-18


sawilopo@yahoo.com

4

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Hypothesi s Tests
Qual i tati ve Data
Qualitative
Data
1 pop.

Proportion

2 or more
pop.

Independence

2 pop.

Z Test

Z Test

2 Test

Biostatistics I: 2017-18
sawilopo@yahoo.com

2 Test

5

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Di fferences i n
Two Proporti ons

Biostatistics I: 2017-18


6

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Heatlth

Hypotheses for
Two Proporti ons

Biostatistics I: 2017-18
sawilopo@yahoo.com

7

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Hypotheses for
Two Proporti ons
Research Questions
Hypothesis No Difference Pop 1 Pop 2 Pop 1  Pop 2
Any Difference Pop 1 < Pop 2 Pop 1 > Pop 2


H0
Ha

Biostatistics I: 2017-18
sawilopo@yahoo.com

8

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Hypotheses for
Two Proporti ons
Research Questions
Hypothesis No Difference Pop 1 Pop 2 Pop 1  Pop 2
Any Difference Pop 1 < Pop 2 Pop 1 > Pop 2

H0
Ha

p1 - p 2 = 0

p1 - p2 0

Biostatistics I: 2017-18
sawilopo@yahoo.com

9

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Hypotheses for
Two Proporti ons
Research Questions
Hypothesis No Difference Pop 1 Pop 2 Pop 1  Pop 2
Any Difference Pop 1 < Pop 2 Pop 1 > Pop 2

H0

p1 - p 2 = 0

Ha


p1 - p2 0

p1 - p2  0
p1 - p 2 < 0

Biostatistics I: 2017-18
sawilopo@yahoo.com

10

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Hypotheses for
Two Proporti ons
Research Questions
Hypothesis No Difference Pop 1 Pop 2 Pop 1  Pop 2
Any Difference Pop 1 < Pop 2 Pop 1 > Pop 2

H0


p1 - p 2 = 0

Ha

p1 - p2 0

p1 - p2  0
p1 - p 2 < 0

Biostatistics I: 2017-18
sawilopo@yahoo.com

p1 - p2  0
p1 - p2 > 0

11

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health


Hypotheses for
Two Proporti ons
Research Questions
Hypothesis No Difference Pop 1 Pop 2 Pop 1  Pop 2
Any Difference Pop 1 < Pop 2 Pop 1 > Pop 2

H0

p1 - p 2 = 0

Ha

p1 - p2 0

p1 - p2  0
p1 - p 2 < 0

Biostatistics I: 2017-18
sawilopo@yahoo.com

p1 - p2  0
p1 - p2 > 0

12

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Di fference i n Two
Proporti ons
1. Assumptions
Populations Are Independent
Populations Follow Binomial Distribution
Normal Approximation Can Be Used for large
samples (All Expected Counts  5)





2.

Z-Test Statistic for Two Proportions
Z

 pˆ 1  pˆ 2    p1  p2 
1 1
ˆp  1  pˆ     
 n1 n2 

X1  X 2
where pˆ 
n1  n2

Biostatistics I: 2017-18
sawilopo@yahoo.com

13

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Sampl e Di stri buti on for Di fference
Between Proporti ons

12  22 

X1  X 2 ~ N  1  2 ;



n
n
1
2


p1  p2


p1 1  p1  p2 1  p2  
 N  p1  p2 ;




n1
n2




 1 1 
 N  0; pq    

n1 n2  



x1  x2
p
,
n1  n2

under H 0 : p1  p2

Biostatistics I: 2017-18
sawilopo@yahoo.com

14

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Thi nki ng Chal l enge
MA


You’re an epidemiologist for the
Department of Health in your
country. You’re studying the
prevalence of disease X in two
provinces (MA and CA). In MA, 74
of 1500 people surveyed were
diseased and in CA, 129 of 1500
were diseased. At .05 level, does
MA have a lower prevalence rate?
Biostatistics I: 2017-18

sawilopo@yahoo.com

CA

15

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*

Biostatistics I: 2017-18
sawilopo@yahoo.com

16

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
H0:
Ha:
=

nMA =

Test Statistic:

nCA =

Critical Value(s):

Decision:
Conclusion:

Biostatistics I: 2017-18
sawilopo@yahoo.com

17

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
H0: pMA - pCA = 0
Ha: pMA - pCA < 0
=
nMA =
nCA =
Critical Value(s):

Test Statistic:

Decision:
Conclusion:

Biostatistics I: 2017-18
sawilopo@yahoo.com

18

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
Test Statistic:
H0: pMA - pCA = 0
Ha: pMA - pCA < 0
 = .05
nMA = 1500 nCA = 1500
Critical Value(s):
Decision:
Conclusion:

Biostatistics I: 2017-18
sawilopo@yahoo.com

19

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
Test Statistic:
H0: pMA - pCA = 0
Ha: pMA - pCA < 0
 = .05
nMA = 1500 nCA = 1500
Critical Value(s):
Decision:
Reject

.05

Conclusion:

-1.645 0

Z
Biostatistics I: 2017-18

sawilopo@yahoo.com

20

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons Sol uti on*
pˆ MA 

74
X MA

 .0493
nMA 1500

pˆ CA 

X CA 129

 .0860
nCA 1500

X MA  X CA
74  129
pˆ 

 .0677
nMA  nCA 1500  1500
Z

.0493  .0860  0 

1 
 1
.0677   1  .0677   


 1500 1500 
 4.00
Biostatistics I: 2017-18

sawilopo@yahoo.com

21

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
Test Statistic:
H0: pMA - pCA = 0
Z = -4.00
Ha: pMA - pCA < 0
 = .05
nMA = 1500 nCA = 1500
Critical Value(s):
Decision:
Reject

.05

Conclusion:

-1.645 0

Z
Biostatistics I: 2017-18

sawilopo@yahoo.com

22

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
Test Statistic:
H0: pMA - pCA = 0
Z = -4.00
Ha: pMA - pCA < 0
 = .05
nMA = 1500 nCA = 1500
Critical Value(s):
Decision:
Reject
Reject at  = .05

.05

Conclusion:

-1.645 0

Z
Biostatistics I: 2017-18

sawilopo@yahoo.com

23

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
Test Statistic:
H0: pMA - pCA = 0
Z = -4.00
Ha: pMA - pCA < 0
 = .05
nMA = 1500 nCA = 1500
Critical Value(s):
Decision:
Reject
Reject at  = .05

.05
-1.645 0

Z

Conclusion:
There is evidence MA
is less than CA
Biostatistics I: 2017-18

sawilopo@yahoo.com

24

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Between 2 Categori cal
Vari abl es

Biostatistics I: 2017-18

25

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Heatlth

Hypothesi s Tests
Qual i tati ve Data
Qualitative
Data
1 pop.

Proportion

2 or more
pop.

Independence

2 pop.
Z Test

Z Test

2 Test

Biostatistics I: 2017-18
sawilopo@yahoo.com

2 Test

26

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

1.

2.

Shows If a Relationship Exists
Between 2 Qualitative Variables, but
does Not Show Causality
Assumptions
• Multinomial Experiment
• All Expected Counts  5

3.

Uses Two-Way Contingency Table

Biostatistics I: 2017-18
sawilopo@yahoo.com

27

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Conti ngency Tabl e
 1.

Shows # Observations From 1
Sample Jointly in 2 Qualitative
Variables

Biostatistics I: 2017-18
sawilopo@yahoo.com

28

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Conti ngency Tabl e
1. Shows # Observations From 1
Sample Jointly in 2 Qualitative
Variables
Levels of variable 2
Disease
Status
Disease
No disease
Total

Residence
Urban
Rural
63
15
78

49
33
82

Total
112
48
160

Levels of variable 1
Biostatistics I: 2017-18
sawilopo@yahoo.com

29

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Hypotheses & Stati sti c
1.

Hypotheses



H0: Variables Are Independent
Ha: Variables Are Related (Dependent)

Biostatistics I: 2017-18
sawilopo@yahoo.com

30

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Hypotheses & Stati sti c
1.

Hypotheses
H0: Variables Are Independent
Ha: Variables Are Related (Dependent)

2.

Test Statistic
Observed count

n Eˆn 

2

 
2

allcells

ij

ij

Eˆnij

Biostatistics I: 2017-18
sawilopo@yahoo.com

Expected
count

31

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Hypotheses & Stati sti c
1.

Hypotheses
H0: Variables Are Independent
Ha: Variables Are Related (Dependent)

2.

Test Statistic

n Eˆn 

Observed count

2

 
2

allcells

ij

ij

Eˆnij

Expected
count
Rows Columns

Degrees of Freedom: (r - 1)(c - 1)
Biostatistics I: 2017-18
sawilopo@yahoo.com

32

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence Expected
2

Counts
1.

2.
3.

Statistical Independence Means Joint
Probability Equals Product of
Marginal Probabilities
Compute Marginal Probabilities &
Multiply for Joint Probability
Expected Count Is Sample Size Times
Joint Probability

Biostatistics I: 2017-18
sawilopo@yahoo.com

33

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Exampl e

Biostatistics I: 2017-18
sawilopo@yahoo.com

34

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Exampl e
Marginal probability = 112
160

Residence
Urban Rural
Obs. Obs.

Disease
Status
Disease

63

49

112

No Disease

15

33

48

78

82

160

Total

Biostatistics I: 2017-18
sawilopo@yahoo.com

Total

35

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Exampl e
Marginal probability = 112
160

Residence
Urban Rural
Obs. Obs.

Disease
Status

Total

Disease

63

49

112

No Disease

15

33

48

78

82

160

Total
78
Marginal probability =
160

Biostatistics I: 2017-18
sawilopo@yahoo.com

36

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Exampl e
112 78
Joint probability =
160 160

Marginal probability = 112
160

Residence
Urban Rural
Obs. Obs.

Disease
Status

Total

Disease

63

49

112

No Disease

15

33

48

78

82

160

Total
78
Marginal probability =
160

Biostatistics I: 2017-18
sawilopo@yahoo.com

37

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Exampl e
112 78
Joint probability =
160 160

Marginal probability = 112
160

Residence
Urban Rural
Obs. Obs.

Disease
Status
Disease

63

49

112

No Disease

15

33

48

78

82

160

Total
78
Marginal probability =
160

112 78
Expected count = 160·
160 160
= 54.6

Biostatistics I: 2017-18
sawilopo@yahoo.com

Total

38

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Cal cul ati on

Biostatistics I: 2017-18
sawilopo@yahoo.com

39

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Cal cul ati on
Expected count =

aRow totalf  aColumn totalf
Sample size

Biostatistics I: 2017-18
sawilopo@yahoo.com

40

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Cal cul ati on
Expected count =
112x78
160
Disease

Status

aRow totalf  aColumn totalf
Sample size

Residence
112x82
160
Urban
Rural
Obs. Exp. Obs. Exp. Total

Disease

63

54.6

49

57.4

112

No Disease

15

23.4

33

24.6

48

Total

78

78

82

82

48x78
160Biostatistics I: 2017-18
sawilopo@yahoo.com

160
48x82
160

41

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Exampl e on HIV


You randomly sample 286 sexually active
individuals and collect information on their HIV
status and History of STDs. At the .05 level, is
there evidence of a relationship?
HIV
STDs Hx
No
Yes
Total

No
84
48
132
Biostatistics I: 2017-18

sawilopo@yahoo.com

Yes
32
122
154

Total
116
170
286
42

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on

Biostatistics I: 2017-18
sawilopo@yahoo.com

43

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0:
Ha:
=
df =
Critical Value(s):

Test Statistic:

Decision:

Reject

Conclusion:
0

2
Biostatistics I: 2017-18

sawilopo@yahoo.com

44

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
=
df =
Critical Value(s):

Test Statistic:

Decision:

Reject

Conclusion:
0

2
Biostatistics I: 2017-18

sawilopo@yahoo.com

45

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
 = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s):

Test Statistic:

Decision:

Reject

Conclusion:
0

2
Biostatistics I: 2017-18

sawilopo@yahoo.com

46

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
 = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s):

Test Statistic:

Decision:

Reject

 = .05
0

3.841

Conclusion:

2
Biostatistics I: 2017-18

sawilopo@yahoo.com

47

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on



E(nij)  5 in all
cells
116x132
286

STDs HX

HIV

154x116
286

No
Yes
Obs. Exp. Obs. Exp. Total

No

84

53.5

32

62.5

116

Yes

48

78.5

122

91.5

170

132

132

154

154

286

Total

170x132
286
Biostatistics I: 2017-18
sawilopo@yahoo.com

170x154
286
48

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
 
2


all cells

c h
c h

nij  E nij
E n
ij

a f
a f

n11  E n11

E n
11

84  53.5

53.5

2

2

2

a f
a f

n12  E n12

E n

2

12

32  62.5

62.5

2

122  91.5

91.5

Biostatistics I: 2017-18
sawilopo@yahoo.com

a f
a f

n22  E n22

E n

2

22

2

 54.29

49

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
 = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s):

Test Statistic:
2 = 54.29

Decision:

Reject

 = .05
0

3.841

Conclusion:

2
Biostatistics I: 2017-18

sawilopo@yahoo.com

50

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
 = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s):
Reject

 = .05
0

3.841

Test Statistic:
2 = 54.29

Decision:
Reject at  = .05
Conclusion:

2
Biostatistics I: 2017-18

sawilopo@yahoo.com

51

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
 = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s):
Reject

 = .05
0

3.841

2

Test Statistic:
2 = 54.29

Decision:
Reject at  = .05
Conclusion:
There is evidence of a
relationship

Biostatistics I: 2017-18
sawilopo@yahoo.com

52

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

STATA
tabi 30 18 38 \ 13 7 22, chi2








|
col
row |
1
2
3 |
Total
-----------+---------------------------------+---------1 |
30
18
38 |
86
2 |
13
7
22 |
42
-----------+---------------------------------+---------Total |
43
25
60 |
128

Pearson chi2(2) = 0.7967 Pr = 0.671

Biostatistics I: 2017-18
sawilopo@yahoo.com

53

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

SAS CODES
Data dis;
input STDs HIV count;
cards;
1 1 84
1 2 32
2 1 48
2 2 122
;
run;
Proc freq data=dis order=data;
weight Count;
tables STDs*HIV/chisq;
run;

Biostatistics I: 2017-18
sawilopo@yahoo.com

54

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

SAS OUTPUT
Statistics for Table of STDs by HIV
Statistic
DF
Value
Prob
------------------------------------------------------Chi-Square
1
54.1502
S