Sesi 9. Hypothesis Testing: Categorical Data

Lecture 10
Hypothesis testing:
Categorical Data Analysis

Biostatistics I: 2017-18

1

[email protected] Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Heatlth

Learni ng Obj ecti ves
1.

Comparison of binomial proportion using Z
and 2 Test.

2.

Explain 2 Test for Independence of 2
variables


3.

Explain The Fisher’s test for independence

4.

McNemar’s tests for correlated data

5.

Kappa Statistic

6.

Use of Computer Program
Biostatistics I: 2017-18

[email protected]

2


Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Data Types
Data

Quantitative

Discrete

Continuous

Biostatistics I: 2017-18
[email protected]

Qualitative

3

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health


Qual i tati ve Data
1.

2.
3.
4.

Qualitative Random Variables Yield
Responses That Can Be Put In Categories.
Example: Sex (Male, Female)
Measurement or Count Reflect # in
Category
Nominal (no order) or Ordinal Scale (order)
Data can be collected as continuous but
recoded to categorical data. Example
(Systolic Blood Pressure - Hypotension,
Normal tension, hypertension )
Biostatistics I: 2017-18


[email protected]

4

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Hypothesi s Tests
Qual i tati ve Data
Qualitative
Data
1 pop.

Proportion

2 or more
pop.

Independence

2 pop.

Z Test

Z Test

2 Test

Biostatistics I: 2017-18
[email protected]

2 Test

5

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Di fferences i n
Two Proporti ons

Biostatistics I: 2017-18


6

[email protected] Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Heatlth

Hypotheses for
Two Proporti ons

Biostatistics I: 2017-18
[email protected]

7

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Hypotheses for
Two Proporti ons
Research Questions
Hypothesis No Difference Pop 1 Pop 2 Pop 1  Pop 2
Any Difference Pop 1 < Pop 2 Pop 1 > Pop 2


H0
Ha

Biostatistics I: 2017-18
[email protected]

8

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Hypotheses for
Two Proporti ons
Research Questions
Hypothesis No Difference Pop 1 Pop 2 Pop 1  Pop 2
Any Difference Pop 1 < Pop 2 Pop 1 > Pop 2

H0
Ha

p1 - p 2 = 0

p1 - p2 0

Biostatistics I: 2017-18
[email protected]

9

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Hypotheses for
Two Proporti ons
Research Questions
Hypothesis No Difference Pop 1 Pop 2 Pop 1  Pop 2
Any Difference Pop 1 < Pop 2 Pop 1 > Pop 2

H0

p1 - p 2 = 0

Ha


p1 - p2 0

p1 - p2  0
p1 - p 2 < 0

Biostatistics I: 2017-18
[email protected]

10

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Hypotheses for
Two Proporti ons
Research Questions
Hypothesis No Difference Pop 1 Pop 2 Pop 1  Pop 2
Any Difference Pop 1 < Pop 2 Pop 1 > Pop 2

H0


p1 - p 2 = 0

Ha

p1 - p2 0

p1 - p2  0
p1 - p 2 < 0

Biostatistics I: 2017-18
[email protected]

p1 - p2  0
p1 - p2 > 0

11

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health


Hypotheses for
Two Proporti ons
Research Questions
Hypothesis No Difference Pop 1 Pop 2 Pop 1  Pop 2
Any Difference Pop 1 < Pop 2 Pop 1 > Pop 2

H0

p1 - p 2 = 0

Ha

p1 - p2 0

p1 - p2  0
p1 - p 2 < 0

Biostatistics I: 2017-18
[email protected]

p1 - p2  0
p1 - p2 > 0

12

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Di fference i n Two
Proporti ons
1. Assumptions
Populations Are Independent
Populations Follow Binomial Distribution
Normal Approximation Can Be Used for large
samples (All Expected Counts  5)





2.

Z-Test Statistic for Two Proportions
Z

 pˆ 1  pˆ 2    p1  p2 
1 1
ˆp  1  pˆ     
 n1 n2 

X1  X 2
where pˆ 
n1  n2

Biostatistics I: 2017-18
[email protected]

13

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Sampl e Di stri buti on for Di fference
Between Proporti ons

12  22 

X1  X 2 ~ N  1  2 ;



n
n
1
2


p1  p2


p1 1  p1  p2 1  p2  
 N  p1  p2 ;




n1
n2




 1 1 
 N  0; pq    

n1 n2  



x1  x2
p
,
n1  n2

under H 0 : p1  p2

Biostatistics I: 2017-18
[email protected]

14

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Thi nki ng Chal l enge
MA


You’re an epidemiologist for the
Department of Health in your
country. You’re studying the
prevalence of disease X in two
provinces (MA and CA). In MA, 74
of 1500 people surveyed were
diseased and in CA, 129 of 1500
were diseased. At .05 level, does
MA have a lower prevalence rate?
Biostatistics I: 2017-18

[email protected]

CA

15

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*

Biostatistics I: 2017-18
[email protected]

16

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
H0:
Ha:
=

nMA =

Test Statistic:

nCA =

Critical Value(s):

Decision:
Conclusion:

Biostatistics I: 2017-18
[email protected]

17

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
H0: pMA - pCA = 0
Ha: pMA - pCA < 0
=
nMA =
nCA =
Critical Value(s):

Test Statistic:

Decision:
Conclusion:

Biostatistics I: 2017-18
[email protected]

18

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
Test Statistic:
H0: pMA - pCA = 0
Ha: pMA - pCA < 0
 = .05
nMA = 1500 nCA = 1500
Critical Value(s):
Decision:
Conclusion:

Biostatistics I: 2017-18
[email protected]

19

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
Test Statistic:
H0: pMA - pCA = 0
Ha: pMA - pCA < 0
 = .05
nMA = 1500 nCA = 1500
Critical Value(s):
Decision:
Reject

.05

Conclusion:

-1.645 0

Z
Biostatistics I: 2017-18

[email protected]

20

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons Sol uti on*
pˆ MA 

74
X MA

 .0493
nMA 1500

pˆ CA 

X CA 129

 .0860
nCA 1500

X MA  X CA
74  129
pˆ 

 .0677
nMA  nCA 1500  1500
Z

.0493  .0860  0 

1 
 1
.0677   1  .0677   


 1500 1500 
 4.00
Biostatistics I: 2017-18

[email protected]

21

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
Test Statistic:
H0: pMA - pCA = 0
Z = -4.00
Ha: pMA - pCA < 0
 = .05
nMA = 1500 nCA = 1500
Critical Value(s):
Decision:
Reject

.05

Conclusion:

-1.645 0

Z
Biostatistics I: 2017-18

[email protected]

22

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
Test Statistic:
H0: pMA - pCA = 0
Z = -4.00
Ha: pMA - pCA < 0
 = .05
nMA = 1500 nCA = 1500
Critical Value(s):
Decision:
Reject
Reject at  = .05

.05

Conclusion:

-1.645 0

Z
Biostatistics I: 2017-18

[email protected]

23

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Z Test for Two Proporti ons
Sol uti on*
Test Statistic:
H0: pMA - pCA = 0
Z = -4.00
Ha: pMA - pCA < 0
 = .05
nMA = 1500 nCA = 1500
Critical Value(s):
Decision:
Reject
Reject at  = .05

.05
-1.645 0

Z

Conclusion:
There is evidence MA
is less than CA
Biostatistics I: 2017-18

[email protected]

24

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Between 2 Categori cal
Vari abl es

Biostatistics I: 2017-18

25

[email protected] Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Heatlth

Hypothesi s Tests
Qual i tati ve Data
Qualitative
Data
1 pop.

Proportion

2 or more
pop.

Independence

2 pop.
Z Test

Z Test

2 Test

Biostatistics I: 2017-18
[email protected]

2 Test

26

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

1.

2.

Shows If a Relationship Exists
Between 2 Qualitative Variables, but
does Not Show Causality
Assumptions
• Multinomial Experiment
• All Expected Counts  5

3.

Uses Two-Way Contingency Table

Biostatistics I: 2017-18
[email protected]

27

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Conti ngency Tabl e
 1.

Shows # Observations From 1
Sample Jointly in 2 Qualitative
Variables

Biostatistics I: 2017-18
[email protected]

28

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Conti ngency Tabl e
1. Shows # Observations From 1
Sample Jointly in 2 Qualitative
Variables
Levels of variable 2
Disease
Status
Disease
No disease
Total

Residence
Urban
Rural
63
15
78

49
33
82

Total
112
48
160

Levels of variable 1
Biostatistics I: 2017-18
[email protected]

29

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Hypotheses & Stati sti c
1.

Hypotheses



H0: Variables Are Independent
Ha: Variables Are Related (Dependent)

Biostatistics I: 2017-18
[email protected]

30

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Hypotheses & Stati sti c
1.

Hypotheses
H0: Variables Are Independent
Ha: Variables Are Related (Dependent)

2.

Test Statistic
Observed count

n Eˆn 

2

 
2

allcells

ij

ij

Eˆnij

Biostatistics I: 2017-18
[email protected]

Expected
count

31

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Hypotheses & Stati sti c
1.

Hypotheses
H0: Variables Are Independent
Ha: Variables Are Related (Dependent)

2.

Test Statistic

n Eˆn 

Observed count

2

 
2

allcells

ij

ij

Eˆnij

Expected
count
Rows Columns

Degrees of Freedom: (r - 1)(c - 1)
Biostatistics I: 2017-18
[email protected]

32

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence Expected
2

Counts
1.

2.
3.

Statistical Independence Means Joint
Probability Equals Product of
Marginal Probabilities
Compute Marginal Probabilities &
Multiply for Joint Probability
Expected Count Is Sample Size Times
Joint Probability

Biostatistics I: 2017-18
[email protected]

33

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Exampl e

Biostatistics I: 2017-18
[email protected]

34

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Exampl e
Marginal probability = 112
160

Residence
Urban Rural
Obs. Obs.

Disease
Status
Disease

63

49

112

No Disease

15

33

48

78

82

160

Total

Biostatistics I: 2017-18
[email protected]

Total

35

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Exampl e
Marginal probability = 112
160

Residence
Urban Rural
Obs. Obs.

Disease
Status

Total

Disease

63

49

112

No Disease

15

33

48

78

82

160

Total
78
Marginal probability =
160

Biostatistics I: 2017-18
[email protected]

36

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Exampl e
112 78
Joint probability =
160 160

Marginal probability = 112
160

Residence
Urban Rural
Obs. Obs.

Disease
Status

Total

Disease

63

49

112

No Disease

15

33

48

78

82

160

Total
78
Marginal probability =
160

Biostatistics I: 2017-18
[email protected]

37

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Exampl e
112 78
Joint probability =
160 160

Marginal probability = 112
160

Residence
Urban Rural
Obs. Obs.

Disease
Status
Disease

63

49

112

No Disease

15

33

48

78

82

160

Total
78
Marginal probability =
160

112 78
Expected count = 160·
160 160
= 54.6

Biostatistics I: 2017-18
[email protected]

Total

38

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Cal cul ati on

Biostatistics I: 2017-18
[email protected]

39

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Cal cul ati on
Expected count =

aRow totalf  aColumn totalf
Sample size

Biostatistics I: 2017-18
[email protected]

40

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

Expected Count Cal cul ati on
Expected count =
112x78
160
Disease

Status

aRow totalf  aColumn totalf
Sample size

Residence
112x82
160
Urban
Rural
Obs. Exp. Obs. Exp. Total

Disease

63

54.6

49

57.4

112

No Disease

15

23.4

33

24.6

48

Total

78

78

82

82

48x78
160Biostatistics I: 2017-18
[email protected]

160
48x82
160

41

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Exampl e on HIV


You randomly sample 286 sexually active
individuals and collect information on their HIV
status and History of STDs. At the .05 level, is
there evidence of a relationship?
HIV
STDs Hx
No
Yes
Total

No
84
48
132
Biostatistics I: 2017-18

[email protected]

Yes
32
122
154

Total
116
170
286
42

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on

Biostatistics I: 2017-18
[email protected]

43

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0:
Ha:
=
df =
Critical Value(s):

Test Statistic:

Decision:

Reject

Conclusion:
0

2
Biostatistics I: 2017-18

[email protected]

44

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
=
df =
Critical Value(s):

Test Statistic:

Decision:

Reject

Conclusion:
0

2
Biostatistics I: 2017-18

[email protected]

45

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
 = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s):

Test Statistic:

Decision:

Reject

Conclusion:
0

2
Biostatistics I: 2017-18

[email protected]

46

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
 = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s):

Test Statistic:

Decision:

Reject

 = .05
0

3.841

Conclusion:

2
Biostatistics I: 2017-18

[email protected]

47

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on



E(nij)  5 in all
cells
116x132
286

STDs HX

HIV

154x116
286

No
Yes
Obs. Exp. Obs. Exp. Total

No

84

53.5

32

62.5

116

Yes

48

78.5

122

91.5

170

132

132

154

154

286

Total

170x132
286
Biostatistics I: 2017-18
[email protected]

170x154
286
48

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
 
2


all cells

c h
c h

nij  E nij
E n
ij

a f
a f

n11  E n11

E n
11

84  53.5

53.5

2

2

2

a f
a f

n12  E n12

E n

2

12

32  62.5

62.5

2

122  91.5

91.5

Biostatistics I: 2017-18
[email protected]

a f
a f

n22  E n22

E n

2

22

2

 54.29

49

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
 = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s):

Test Statistic:
2 = 54.29

Decision:

Reject

 = .05
0

3.841

Conclusion:

2
Biostatistics I: 2017-18

[email protected]

50

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
 = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s):
Reject

 = .05
0

3.841

Test Statistic:
2 = 54.29

Decision:
Reject at  = .05
Conclusion:

2
Biostatistics I: 2017-18

[email protected]

51

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

Sol uti on
H0: No Relationship
Ha: Relationship
 = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s):
Reject

 = .05
0

3.841

2

Test Statistic:
2 = 54.29

Decision:
Reject at  = .05
Conclusion:
There is evidence of a
relationship

Biostatistics I: 2017-18
[email protected]

52

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

STATA
tabi 30 18 38 \ 13 7 22, chi2








|
col
row |
1
2
3 |
Total
-----------+---------------------------------+---------1 |
30
18
38 |
86
2 |
13
7
22 |
42
-----------+---------------------------------+---------Total |
43
25
60 |
128

Pearson chi2(2) = 0.7967 Pr = 0.671

Biostatistics I: 2017-18
[email protected]

53

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

SAS CODES
Data dis;
input STDs HIV count;
cards;
1 1 84
1 2 32
2 1 48
2 2 122
;
run;
Proc freq data=dis order=data;
weight Count;
tables STDs*HIV/chisq;
run;

Biostatistics I: 2017-18
[email protected]

54

Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatatics, Epidemiology and Population Health

 Test of Independence
2

SAS OUTPUT
Statistics for Table of STDs by HIV
Statistic
DF
Value
Prob
------------------------------------------------------Chi-Square
1
54.1502
S