Categorical Data Analysis Course Material

Contingency Tables
1. Explain χ² Test of Independence
2. Measure of Association

Contingency Tables

• Tables representing all combinations of levels of explanatory and response variables
• Numbers in the table represent counts of the number of cases in each cell
• Row and column totals are called marginal counts

2x2 Tables


• Each variable has 2 levels
– Explanatory Variable – Groups (Typically
based on demographics, exposure)
– Response Variable – Outcome (Typically
presence or absence of a characteristic)

2x2 Tables - Notation
              Outcome    Outcome    Group
              Present    Absent     Total
Group 1       n11        n12        n1.
Group 2       n21        n22        n2.
Outcome
Total         n.1        n.2        n..

χ² Test of Independence
• 1. Shows If a Relationship Exists Between 2 Qualitative Variables
– One Sample Is Drawn
– Does Not Show Causality

• 2. Assumptions
– Multinomial Experiment
– All Expected Counts ≥ 5

• 3. Uses Two-Way Contingency Table

χ² Test of Independence
Contingency Table

• 1. Shows # Observations From 1 Sample Jointly in 2 Qualitative Variables
– One dimension of the table lists the levels of variable 1
– The other dimension lists the levels of variable 2

χ² Test of Independence
Hypotheses & Statistic

• 1. Hypotheses
– H0: Variables Are Independent
– Ha: Variables Are Related (Dependent)

• 2. Test Statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})}$$

where nij is the observed count and E(nij) is the expected count for cell (i, j)

• Degrees of Freedom: (r - 1)(c - 1), where r = number of rows and c = number of columns
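As a minimal sketch of the statistic and degrees of freedom above, the following Python computes both for any r x c table of counts. The function names `expected_counts` and `chi_square` and the `demo` table are illustrative assumptions, not part of the original slides. The expected counts use the (row total)(column total) / sample size rule covered in the Expected Counts slides.

```python
def expected_counts(table):
    """Expected count per cell: (row total * column total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    expected = expected_counts(table)
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2x3 table of counts, made up purely for illustration.
demo = [[10, 20, 30], [20, 20, 20]]
stat, df = chi_square(demo)
print(round(stat, 3), df)
```

The statistic is compared with the upper-tail critical value of the χ² distribution with the returned degrees of freedom.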

χ² Test of Independence
Expected Counts
• 1. Statistical Independence Means Joint Probability Equals Product of Marginal Probabilities
• 2. Compute Marginal Probabilities & Multiply for Joint Probability
• 3. Expected Count Is Sample Size Times Joint Probability
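The three steps above collapse into a single expression. As a short derivation, using the n_ij notation of the 2x2 notation slide and writing n for the overall sample size (n.. in that table):

```latex
E(n_{ij}) = n \cdot \frac{n_{i.}}{n} \cdot \frac{n_{.j}}{n} = \frac{n_{i.}\, n_{.j}}{n}
```

Here n_{i.}/n and n_{.j}/n are the row and column marginal probabilities, and their product is the joint probability under independence.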

Expected Count Example

                     Location
House Style     Urban (Obs.)   Rural (Obs.)   Total
Split-Level          63             49         112
Ranch                15             33          48
Total                78             82         160

Marginal probability (Split-Level) = 112/160
Marginal probability (Urban) = 78/160
Joint probability = (112/160)(78/160)
Expected count = 160 · (112/160)(78/160) = 54.6

Expected Count Calculation

Expected count = (Row total)(Column total) / Sample size

                 Urban          Rural
Split-Level      112·78/160     112·82/160
Ranch             48·78/160      48·82/160
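The calculation above can be sketched in a few lines of Python. The counts are the house-style data from the example; the variable names (`observed`, `row_totals`, `col_totals`, `expected`) are illustrative choices rather than anything defined in the slides.

```python
# Observed counts from the house-style example (rows: style, columns: location).
observed = {
    "Split-Level": {"Urban": 63, "Rural": 49},
    "Ranch": {"Urban": 15, "Rural": 33},
}

# Marginal (row and column) totals and the overall sample size.
row_totals = {style: sum(cells.values()) for style, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for location, count in cells.items():
        col_totals[location] = col_totals.get(location, 0) + count
n = sum(row_totals.values())

# Expected count = (row total)(column total) / sample size.
expected = {
    style: {loc: row_totals[style] * col_totals[loc] / n for loc in col_totals}
    for style in row_totals
}

for style, cells in expected.items():
    print(style, {loc: round(value, 1) for loc, value in cells.items()})
```

The Split-Level / Urban cell reproduces the 54.6 worked out in the example, and every expected count is at least 5, so the assumption for the χ² test holds for this table.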

χ² Test of Independence
Example

• You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level, is there evidence of a relationship?

                  Diet Pepsi
Diet Coke      No     Yes    Total
No             84      32     116
Yes            48     122     170
Total         132     154     286

χ² Test of Independence
Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical Value(s): 3.841 (rejection region: χ² > 3.841, upper-tail area α = .05)

χ² Test of Independence
Solution

E(nij) ≥ 5 in all cells

                  Diet Pepsi
Diet Coke      No                     Yes
No             116·132/286 = 53.5     154·116/286 = 62.5
Yes            170·132/286 = 78.5     170·154/286 = 91.5

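χ² Test of Independence
Solution

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})} = \frac{\left[n_{11} - E(n_{11})\right]^2}{E(n_{11})} + \frac{\left[n_{12} - E(n_{12})\right]^2}{E(n_{12})} + \cdots + \frac{\left[n_{22} - E(n_{22})\right]^2}{E(n_{22})}$$

$$= \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \cdots + \frac{(122 - 91.5)^2}{91.5} = 54.29$$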

χ² Test of Independence
Solution
• Test Statistic: χ² = 54.29
• Decision: Reject H0 at α = .05, since 54.29 > 3.841
• Conclusion: There is evidence of a relationship
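The Diet Pepsi / Diet Coke solution can be checked with a short Python sketch. The helper name `pearson` and the variable names are illustrative assumptions. The sketch computes the statistic twice: once with unrounded expected counts and once with expected counts rounded to one decimal, which is how the hand calculation arrives at 54.29. The p-value shortcut via `math.erfc` is valid only for df = 1.

```python
import math

# Observed 2x2 counts from the example. The statistic does not change
# if rows and columns are swapped.
observed = [[84, 32], [48, 122]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)


def pearson(obs_table, exp_table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(obs_table, exp_table)
        for o, e in zip(o_row, e_row)
    )


expected_exact = [[r * c / n for c in col_totals] for r in row_totals]
expected_rounded = [[round(e, 1) for e in row] for row in expected_exact]

stat_exact = pearson(observed, expected_exact)
stat_rounded = pearson(observed, expected_rounded)

# For df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat_exact / 2))
reject = stat_exact > 3.841  # critical value at alpha = .05, df = 1

print(round(stat_exact, 2), round(stat_rounded, 2), reject)
```

The unrounded statistic is slightly smaller than the hand-calculated value, but both lie far beyond 3.841, so the decision to reject H0 is the same.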

Siskel and Ebert

           |              Ebert              |
    Siskel |      Con        Mix        Pro  |    Total
-----------+---------------------------------+---------
       Con |       24          8         13  |       45
       Mix |        8         13         11  |       32
       Pro |       10          9         64  |       83
-----------+---------------------------------+---------
     Total |       42         30         88  |      160

Siskel and Ebert

Each cell shows the observed count (first line) and the expected count (second line).

           |              Ebert              |
    Siskel |      Con        Mix        Pro  |    Total
-----------+---------------------------------+---------
       Con |       24          8         13  |       45
           |     11.8        8.4       24.8  |     45.0
-----------+---------------------------------+---------
       Mix |        8         13         11  |       32
           |      8.4        6.0       17.6  |     32.0
-----------+---------------------------------+---------
       Pro |       10          9         64  |       83
           |     21.8       15.6       45.6  |     83.0
-----------+---------------------------------+---------
     Total |       42         30         88  |      160
           |     42.0       30.0       88.0  |    160.0

Pearson chi2(4) = 45.3569   p < 0.001
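The Siskel and Ebert output above can be reproduced with a small Python sketch. The variable names are illustrative. The .05 critical value for 4 degrees of freedom (9.488) is taken from a standard χ² table and does not appear in the slides.

```python
# Observed ratings: rows are Siskel (Con, Mix, Pro), columns are Ebert (Con, Mix, Pro).
observed = [
    [24, 8, 13],
    [8, 13, 11],
    [10, 9, 64],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = (row total)(column total) / sample size.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumed critical value: upper 5% point of chi-square with 4 df (standard table).
CRITICAL_05_DF4 = 9.488

print(f"Pearson chi2({df}) = {chi2:.4f}")
print("reject independence at .05:", chi2 > CRITICAL_05_DF4)
```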

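Yates’s Statistic
• Method of testing for association in 2x2 tables when the sample size is moderate (total number of observations between 6 and 25)

$$\chi^2 = \sum_i \sum_j \frac{\left(\left|O_{ij} - e_{ij}\right| - 0.5\right)^2}{e_{ij}}$$

where Oij is the observed count and eij is the expected count in cell (i, j)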
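A minimal sketch of the corrected statistic, assuming a 2x2 table of counts. The function name `yates_chi_square` and the `small_table` data are hypothetical and chosen only to show the arithmetic. The code applies the formula exactly as written above and does not add the refinement, used by some software, of setting a term to zero when |O - e| is below 0.5.

```python
def yates_chi_square(table):
    """Yates-corrected chi-square: sum of (|O - e| - 0.5)^2 / e over the cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (abs(observed - expected) - 0.5) ** 2 / expected
    return stat


def pearson_chi_square(table):
    """Uncorrected Pearson statistic, for comparison."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


small_table = [[3, 7], [8, 2]]  # hypothetical counts, n = 20
corrected = yates_chi_square(small_table)
uncorrected = pearson_chi_square(small_table)
print(round(corrected, 3), round(uncorrected, 3))
```

The correction shrinks each |O - e| by 0.5 before squaring, so the corrected statistic is smaller than the uncorrected one for this table.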

Measures of Association

• Relative Risk
• Odds Ratio
• Absolute Risk

Relative Risk
• Ratio of the probability that the outcome
characteristic is present for one group,
relative to the other
• Sample proportions with characteristic from groups 1 and 2:

$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$

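Relative Risk
• Estimated Relative Risk:

$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2}$$

95% Confidence Interval for Population Relative Risk:

$$\left( RR \cdot e^{-1.96\sqrt{v}} \;,\; RR \cdot e^{1.96\sqrt{v}} \right)$$

$$e \approx 2.71828 \qquad v = \frac{1 - \hat{\pi}_1}{n_{11}} + \frac{1 - \hat{\pi}_2}{n_{21}}$$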

Relative Risk
• Interpretation
– Conclude that the probability that the outcome
is present is higher (in the population) for group
1 if the entire interval is above 1
– Conclude that the probability that the outcome
is present is lower (in the population) for group 1
if the entire interval is below 1
– Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 1

Example - Coccidioidomycosis and
TNF-antagonists
• Research Question: Risk of developing Coccidioidomycosis associated with arthritis therapy?
• Groups: Patients receiving tumor necrosis factor α (TNF) versus Patients not receiving TNF (all patients arthritic)

            COC    No COC    Total
TNF           7       240      247
Other         4       734      738
Total        11       974      985

Source: Bergstrom, et al (2004)

Example - Coccidioidomycosis and
TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF

$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$

$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{.0283}{.0054} = 5.24 \qquad v = \frac{1 - .0283}{7} + \frac{1 - .0054}{4} = .3874$$

$$95\%\ CI: \left( 5.24\, e^{-1.96\sqrt{.3874}} \;,\; 5.24\, e^{1.96\sqrt{.3874}} \right) = (1.55 \,,\, 17.76)$$

Entire CI above 1 ⇒ Conclude higher risk if on TNF
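The relative risk and its confidence interval can be sketched in Python as follows. The function name `relative_risk_ci` and its parameter names are illustrative assumptions. Because the code keeps the proportions unrounded, its results differ slightly in the second decimal from the hand calculation, which rounds the proportions to .0283 and .0054 first.

```python
import math


def relative_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Estimated relative risk with a large-sample confidence interval.

    n11, n21: counts with the outcome in groups 1 and 2.
    n1_total, n2_total: group sizes (n1. and n2. in the slide notation).
    """
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    rr = p1 / p2
    v = (1 - p1) / n11 + (1 - p2) / n21
    half_width = z * math.sqrt(v)
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)


# TNF example: 7 of 247 on TNF and 4 of 738 not on TNF developed COC.
rr, lower, upper = relative_risk_ci(7, 247, 4, 738)
print(round(rr, 2), round(lower, 2), round(upper, 2))
print("entire interval above 1:", lower > 1)
```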

Odds Ratio
• Odds of an event is the probability it occurs divided by the probability it does not occur
• Odds ratio is the odds of the event for group 1 divided by the odds of the event for group 2
• Sample odds of the outcome for each group:

$$odds_1 = \frac{n_{11}/n_{1.}}{n_{12}/n_{1.}} = \frac{n_{11}}{n_{12}} \qquad odds_2 = \frac{n_{21}}{n_{22}}$$

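Odds Ratio
• Estimated Odds Ratio:

$$OR = \frac{odds_1}{odds_2} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}$$

95% Confidence Interval for Population Odds Ratio:

$$\left( OR \cdot e^{-1.96\sqrt{v}} \;,\; OR \cdot e^{1.96\sqrt{v}} \right)$$

$$e \approx 2.71828 \qquad v = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$$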

Odds Ratio
• Interpretation
– Conclude that the probability that the outcome
is present is higher (in the population) for group
1 if the entire interval is above 1
– Conclude that the probability that the outcome
is present is lower (in the population) for group 1
if the entire interval is below 1
– Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 1

Example - NSAIDs and GBM
• Case-Control Study (Retrospective)
– Cases: 137 Self-Reporting Patients with Glioblastoma Multiforme (GBM)
– Controls: 401 Population-Based Individuals matched to cases with respect to demographic factors

                  GBM Present   GBM Absent   Total
NSAID User             32           138        170
NSAID Non-User        105           263        368
Total                 137           401        538

Source: Sivak-Sears, et al (2004)

Example - NSAIDs and GBM

$$OR = \frac{32(263)}{138(105)} = \frac{8416}{14490} = 0.58 \qquad v = \frac{1}{32} + \frac{1}{138} + \frac{1}{105} + \frac{1}{263} = 0.0518$$

$$95\%\ CI: \left( 0.58\, e^{-1.96\sqrt{0.0518}} \;,\; 0.58\, e^{1.96\sqrt{0.0518}} \right) = (0.37 \,,\, 0.91)$$

Interval is entirely below 1, NSAID use appears
to be lower among cases than controls
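As a sketch, the odds ratio calculation for the NSAID example can be written in Python. The function name `odds_ratio_ci` and its argument names are illustrative assumptions that follow the n11, n12, n21, n22 cell notation.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Estimated odds ratio with a large-sample confidence interval."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22
    half_width = z * math.sqrt(v)
    return (
        odds_ratio,
        odds_ratio * math.exp(-half_width),
        odds_ratio * math.exp(half_width),
    )


# NSAID users: 32 with GBM, 138 without; non-users: 105 with GBM, 263 without.
odds_ratio, lower, upper = odds_ratio_ci(32, 138, 105, 263)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
print("entire interval below 1:", upper < 1)
```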

Absolute Risk
• Difference between the proportions with the outcome characteristic in the 2 groups
• Sample proportions with characteristic from groups 1 and 2:

$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$

Absolute Risk
Estimated Absolute Risk:

$$AR = \hat{\pi}_1 - \hat{\pi}_2$$

95% Confidence Interval for Population Absolute Risk:

$$AR \pm 1.96 \sqrt{\frac{\hat{\pi}_1 (1 - \hat{\pi}_1)}{n_{1.}} + \frac{\hat{\pi}_2 (1 - \hat{\pi}_2)}{n_{2.}}}$$

Absolute Risk
• Interpretation
– Conclude that the probability that the outcome
is present is higher (in the population) for group
1 if the entire interval is positive
– Conclude that the probability that the outcome
is present is lower (in the population) for group 1
if the entire interval is negative
– Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 0

Example - Coccidioidomycosis and
TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF

$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$

$$AR = \hat{\pi}_1 - \hat{\pi}_2 = .0283 - .0054 = .0229$$

$$95\%\ CI: .0229 \pm 1.96 \sqrt{\frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738}} = .0229 \pm .0213 = (0.0016 \,,\, 0.0442)$$

Interval is entirely positive, TNF is
associated with higher risk
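A short Python sketch of the absolute risk interval, using the same TNF counts. The function name `absolute_risk_ci` and its parameters are illustrative assumptions. The interval is the point estimate plus or minus 1.96 standard errors, so its endpoints are symmetric about the estimated AR.

```python
import math


def absolute_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Difference in sample proportions with a large-sample confidence interval."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    ar = p1 - p2
    std_error = math.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)
    margin = z * std_error
    return ar, ar - margin, ar + margin


# TNF example: 7 of 247 on TNF and 4 of 738 not on TNF developed COC.
ar, lower, upper = absolute_risk_ci(7, 247, 4, 738)
print(round(ar, 4), round(lower, 4), round(upper, 4))
print("entire interval positive:", lower > 0)
```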

Ordinal Explanatory and Response
Variables
• Pearson’s Chi-square test can be used to test associations among ordinal variables, but more powerful methods exist
• When theories exist that the association is directional (positive or negative), measures exist to describe and test for these specific alternatives from independence:
– Gamma
– Kendall’s τb

Concordant and Discordant Pairs
• Concordant Pairs - Pairs of individuals where one individual scores “higher” on both ordered variables than the other individual
• Discordant Pairs - Pairs of individuals where one individual scores “higher” on one ordered variable and the other individual scores “lower” on the other
• C = # Concordant Pairs, D = # Discordant Pairs
– Under Positive association, expect C > D
– Under Negative association, expect C < D
– Under No association, expect C ≈ D

Example - Alcohol Use and Sick
Days
• Alcohol Risk (Without Risk, Hardly any Risk, Some to Considerable Risk)
• Sick Days (0, 1-6, ≥ 7)
• Concordant Pairs - Pairs of respondents where one scores higher on both alcohol risk and sick days than the other
• Discordant Pairs - Pairs of respondents where one scores higher on alcohol risk and the other scores higher on sick days
Source: Hermansson, et al (2003)

Example - Alcohol Use and Sick
Days
ALCOHOL * SICKDAYS Crosstabulation (Count)

                                    SICKDAYS
ALCOHOL                    0 days   1-6 days   7+ days   Total
Without Risk                  347        113       145     605
Hardly any Risk               154         63        56     273
Some-Considerable Risk         52         25        34     111
Total                         553        201       235     989

• Concordant Pairs: Each individual in a given cell is concordant with each individual in cells “Southeast” of theirs
• Discordant Pairs: Each individual in a given cell is discordant with each individual in cells “Southwest” of theirs

Example - Alcohol Use and Sick
Days

$$C = 347(63 + 56 + 25 + 34) + 113(56 + 34) + 154(25 + 34) + 63(34) = 83164$$

$$D = 145(154 + 63 + 52 + 25) + 113(154 + 52) + 56(52 + 25) + 63(52) = 73496$$

Measures of Association
• Goodman and Kruskal’s Gamma:

$$\hat{\gamma} = \frac{C - D}{C + D} \qquad -1 \le \hat{\gamma} \le 1$$

• Kendall’s τb:

$$\hat{\tau}_b = \frac{C - D}{0.5 \sqrt{\left(n^2 - \sum_i n_{i.}^2\right)\left(n^2 - \sum_j n_{.j}^2\right)}}$$

When there’s no association between the ordinal variables, the population-based values of these measures are 0.
Statistical software packages provide these tests.

Example - Alcohol Use and Sick
Days

$$\hat{\gamma} = \frac{C - D}{C + D} = \frac{83164 - 73496}{83164 + 73496} = 0.0617$$

Symmetric Measures

                                       Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal   Kendall's tau-b    .035          .030               1.187           .235
                     Gamma              .062          .052               1.187           .235
N of Valid Cases                         989

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
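As a rough check of the Value column above, both measures can be computed from C, D and the marginal totals of the alcohol and sick-days table. This is a sketch with illustrative variable names. It reproduces only the point estimates, not the standard errors or significance levels reported by the software.

```python
import math

C, D = 83164, 73496              # concordant and discordant pairs from the example
row_totals = [605, 273, 111]     # alcohol risk margins
col_totals = [553, 201, 235]     # sick days margins
n = sum(row_totals)

# Goodman and Kruskal's gamma.
gamma = (C - D) / (C + D)

# Kendall's tau-b: the denominator adjusts for pairs tied on either variable.
row_term = n ** 2 - sum(r ** 2 for r in row_totals)
col_term = n ** 2 - sum(c ** 2 for c in col_totals)
tau_b = (C - D) / (0.5 * math.sqrt(row_term * col_term))

print(round(gamma, 3), round(tau_b, 3))
```

Both values are close to 0, in line with the non-significant result (Approx. Sig. = .235) in the output table.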