Categorical Data Analysis Material

Contingency Tables
1. Explain the χ² Test of Independence
2. Measures of Association

Contingency Tables

• Tables representing all combinations of levels of explanatory and response variables
• Numbers in the table represent counts of the number of cases in each cell
• Row and column totals are called marginal counts

2x2 Tables

• Each variable has 2 levels
– Explanatory Variable – Groups (typically based on demographics or exposure)
– Response Variable – Outcome (typically presence or absence of a characteristic)

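2x2 Tables - Notation

                   Outcome Present   Outcome Absent   Group Total
Group 1                 n11               n12             n1.
Group 2                 n21               n22             n2.
Outcome Total           n.1               n.2             n..

A dot in a subscript means the counts are summed over that index, e.g. n1. = n11 + n12, and n.. is the total sample size.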
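The dot notation maps directly onto row and column sums. The following is a minimal Python sketch, assuming the table is stored as a list of rows; the helper name `margins` is illustrative rather than taken from any library, and the counts are the house-style counts from the Expected Count Example in these notes.

```python
def margins(table):
    """Return (row totals n_i., column totals n_.j, grand total n..) for a table
    stored as a list of rows, where table[i][j] holds the count n_ij."""
    row_totals = [sum(row) for row in table]        # n_1., n_2., ...
    col_totals = [sum(col) for col in zip(*table)]  # n_.1, n_.2, ...
    grand_total = sum(row_totals)                   # n..
    return row_totals, col_totals, grand_total


# Rows: Split-Level, Ranch; columns: Urban, Rural (house-style example).
house = [[63, 49],
         [15, 33]]
print(margins(house))  # ([112, 48], [78, 82], 160)
```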

χ² Test of Independence
• 1. Shows If a Relationship Exists Between 2 Qualitative Variables
– One Sample Is Drawn
– Does Not Show Causality
• 2. Assumptions
– Multinomial Experiment
– All Expected Counts ≥ 5
• 3. Uses Two-Way Contingency Table

χ² Test of Independence - Contingency Table
• 1. Shows # Observations From 1 Sample Jointly in 2 Qualitative Variables
(rows: levels of variable 1; columns: levels of variable 2)

χ² Test of Independence - Hypotheses & Statistic
• 1. Hypotheses
– H0: Variables Are Independent
– Ha: Variables Are Related (Dependent)
• 2. Test Statistic
$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})}$
where $n_{ij}$ is the observed count and $E(n_{ij})$ is the expected count in row i, column j
• Degrees of Freedom: (r - 1)(c - 1), where r is the number of rows and c is the number of columns
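As a sketch of how the statistic and its degrees of freedom can be computed for any r x c table, assuming the observed counts are stored as a list of rows (the function name `chi_square_independence` is hypothetical), here it is applied to the Diet Pepsi/Diet Coke counts from the example in these notes. With unrounded expected counts it gives about 54.15; the hand calculation in that example obtains 54.29 because it uses expected counts rounded to one decimal place.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Expected counts use E(n_ij) = n_i. * n_.j / n.., the independence estimate.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return stat, df


# Diet Coke (rows: No, Yes) by Diet Pepsi (columns: No, Yes).
stat, df = chi_square_independence([[84, 32], [48, 122]])
print(round(stat, 2), df)  # 54.15 1
```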

χ² Test of Independence - Expected Counts
• 1. Statistical Independence Means Joint Probability Equals Product of Marginal Probabilities
• 2. Compute Marginal Probabilities & Multiply for Joint Probability
• 3. Expected Count Is Sample Size Times Joint Probability

Expected Count Example

                      Location
House Style      Urban Obs.   Rural Obs.   Total
Split-Level          63           49        112
Ranch                15           33         48
Total                78           82        160

Marginal probability (Split-Level) = $\frac{112}{160}$
Marginal probability (Urban) = $\frac{78}{160}$
Joint probability (Split-Level and Urban) = $\frac{112}{160} \cdot \frac{78}{160}$
Expected count = $160 \cdot \frac{112}{160} \cdot \frac{78}{160} = 54.6$
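The expected-count recipe (sample size times the product of the marginal probabilities) can be sketched in Python as follows, assuming a table stored as a list of rows; `expected_counts` is an illustrative name. The sketch also computes the algebraically equivalent shortcut (row total)(column total)/(sample size), so both routes can be compared on the house-style table.

```python
def expected_counts(table):
    """Expected counts under independence, computed two equivalent ways."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Route 1: sample size times the product of the marginal probabilities.
    via_probability = [[n * (r / n) * (c / n) for c in col_totals] for r in row_totals]
    # Route 2: (row total)(column total) / sample size.
    shortcut = [[r * c / n for c in col_totals] for r in row_totals]
    return via_probability, shortcut


house = [[63, 49], [15, 33]]  # rows: Split-Level, Ranch; columns: Urban, Rural
via_probability, shortcut = expected_counts(house)
for row in shortcut:
    print([round(e, 1) for e in row])
# [54.6, 57.4]
# [23.4, 24.6]
```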

Expected Count Calculation
$\text{Expected count} = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}}$

                 Urban                  Rural
Split-Level      112·78/160 = 54.6      112·82/160 = 57.4
Ranch            48·78/160 = 23.4       48·82/160 = 24.6

χ² Test of Independence Example
• You're a marketing research analyst. You ask a random sample of 286 consumers whether they purchase Diet Pepsi and whether they purchase Diet Coke. At the .05 level, is there evidence of a relationship?

                 Diet Pepsi
Diet Coke      No     Yes    Total
No             84      32      116
Yes            48     122      170
Total         132     154      286

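χ² Test of Independence Solution
Expected counts ($E(n_{ij}) \ge 5$ in all cells):
$E(n_{11}) = \frac{116 \cdot 132}{286} = 53.5 \qquad E(n_{12}) = \frac{154 \cdot 116}{286} = 62.5$
$E(n_{21}) = \frac{170 \cdot 132}{286} = 78.5 \qquad E(n_{22}) = \frac{170 \cdot 154}{286} = 91.5$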

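χ² Test of Independence Solution
$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})} = \frac{\left[n_{11} - E(n_{11})\right]^2}{E(n_{11})} + \frac{\left[n_{12} - E(n_{12})\right]^2}{E(n_{12})} + \cdots + \frac{\left[n_{22} - E(n_{22})\right]^2}{E(n_{22})}$
$= \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \frac{(48 - 78.5)^2}{78.5} + \frac{(122 - 91.5)^2}{91.5} = 54.29$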


χ² Test of Independence Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical Value(s): 3.841 (α = .05; reject H0 if χ² > 3.841)
Test Statistic: χ² = 54.29
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a relationship
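The decision step can be checked numerically. With 1 degree of freedom a chi-square variable is the square of a standard normal, so its upper-tail probability is erfc(sqrt(x/2)). The sketch below uses that identity through Python's `math.erfc` (the helper name `chi2_df1_p_value` is hypothetical) to confirm that 3.841 cuts off about 5% of the distribution and that χ² = 54.29 lies far inside the rejection region.

```python
import math


def chi2_df1_p_value(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    With 1 df, X = Z**2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


alpha = 0.05
critical_value = 3.841  # critical value for df = 1, alpha = .05 (from the slide)
statistic = 54.29       # test statistic from the slide

reject = statistic > critical_value
p_value = chi2_df1_p_value(statistic)
print(reject, p_value < alpha)                    # True True
print(round(chi2_df1_p_value(critical_value), 3))  # 0.05
```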

Siskel and Ebert
(in each cell, top number: observed count; bottom number: expected count under independence)

           |            Ebert
    Siskel |     Con      Mix      Pro |    Total
-----------+---------------------------+---------
       Con |      24        8       13 |       45
           |    11.8      8.4     24.8 |     45.0
-----------+---------------------------+---------
       Mix |       8       13       11 |       32
           |     8.4      6.0     17.6 |     32.0
-----------+---------------------------+---------
       Pro |      10        9       64 |       83
           |    21.8     15.6     45.6 |     83.0
-----------+---------------------------+---------
     Total |      42       30       88 |      160
           |    42.0     30.0     88.0 |    160.0

Pearson chi2(4) = 45.3569, p < 0.001
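A Python sketch that reproduces the expected counts and the Pearson statistic for the Siskel and Ebert table, assuming the rows are Siskel's ratings and the columns Ebert's. For an even number of degrees of freedom, df = 2k, the chi-square upper tail has the closed form exp(-x/2) times the sum of (x/2)^i / i! for i = 0 to k - 1; it is used here to check p < 0.001 without a statistics library.

```python
import math

# Observed counts: rows = Siskel (Con, Mix, Pro), columns = Ebert (Con, Mix, Pro).
observed = [[24, 8, 13],
            [8, 13, 11],
            [10, 9, 64]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper tail for even df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
half = chi2 / 2
p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

print(round(chi2, 4), df, p_value < 0.001)  # 45.3569 4 True
```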

Yates' Statistic
• Method of testing for association in 2x2 tables when the sample size is moderate (total number of observations between 6 and 25)
$\chi^2 = \sum_i \sum_j \frac{\left(|O_{ij} - e_{ij}| - 0.5\right)^2}{e_{ij}}$
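A minimal Python sketch of the continuity-corrected statistic, assuming expected counts e_ij = (row total)(column total)/n as in the Pearson test; `yates_chi_square` is an illustrative name. The Diet Pepsi/Diet Coke counts (n = 286) are far larger than the range named above and are used only to show the arithmetic. For a 2x2 table with cells a, b, c, d, the sum reduces to the closed form n(|ad - bc| - n/2)² / [(a+b)(c+d)(a+c)(b+d)], which the sketch uses as a cross-check.

```python
def yates_chi_square(table):
    """Continuity-corrected chi-square: sum over cells of (|O - E| - 0.5)**2 / E,
    with E = (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (abs(observed - expected) - 0.5) ** 2 / expected
    return stat


# Diet Coke x Diet Pepsi counts, used only to illustrate the arithmetic.
a, b, c, d = 84, 32, 48, 122
corrected = yates_chi_square([[a, b], [c, d]])

# Closed form for a 2x2 table: n(|ad - bc| - n/2)^2 / [(a+b)(c+d)(a+c)(b+d)].
n = a + b + c + d
closed_form = n * (abs(a * d - b * c) - n / 2) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(round(corrected, 2), round(closed_form, 2))  # 52.39 52.39
```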

Measures of Association
– Relative Risk
– Odds Ratio
– Absolute Risk

Relative Risk
• Ratio of the probability that the outcome characteristic is present for one group, relative to the other
• Sample proportions with the characteristic from groups 1 and 2:
$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$

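Relative Risk
• Estimated Relative Risk:
$\widehat{RR} = \frac{\hat{\pi}_1}{\hat{\pi}_2}$
95% Confidence Interval for Population Relative Risk:
$\left(\widehat{RR}\, e^{-1.96\sqrt{v}},\; \widehat{RR}\, e^{1.96\sqrt{v}}\right) \qquad e \approx 2.71828 \qquad v = \frac{1 - \hat{\pi}_1}{n_{11}} + \frac{1 - \hat{\pi}_2}{n_{21}}$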

Relative Risk
• Interpretation
– Conclude that the probability that the outcome
is present is higher (in the population) for group
1 if the entire interval is above 1
– Conclude that the probability that the outcome
is present is lower (in the population) for group 1
if the entire interval is below 1
– Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 1

Example - Coccidioidomycosis and TNF-antagonists
• Research Question: Is the risk of developing coccidioidomycosis associated with arthritis therapy?
• Groups: Patients receiving tumor necrosis factor α (TNF) antagonists versus patients not receiving TNF antagonists (all patients arthritic)

           COC    No COC    Total
TNF          7       240      247
Other        4       734      738
Total       11       974      985

Source: Bergstrom, et al (2004)

Example - Coccidioidomycosis and TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$
$\widehat{RR} = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{.0283}{.0054} = 5.24$
$v = \frac{1 - .0283}{7} + \frac{1 - .0054}{4} = .3874$
95% CI: $\left(5.24\, e^{-1.96\sqrt{.3874}},\; 5.24\, e^{1.96\sqrt{.3874}}\right) = (1.55,\ 17.76)$
Entire CI above 1 ⇒ Conclude higher risk if on TNF
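A minimal Python sketch of the relative-risk estimate and its 95% interval as defined above, applied to the TNF counts; `relative_risk_ci` is a hypothetical helper name. Carrying full precision instead of the rounded .0283, .0054 and 5.24 gives roughly RR = 5.23 with interval (1.54, 17.71), slightly different from the hand-rounded (1.55, 17.76); the conclusion is unchanged.

```python
import math


def relative_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Estimated relative risk pi1 / pi2 with the log-scale interval
    (RR * exp(-z * sqrt(v)), RR * exp(z * sqrt(v))),
    where v = (1 - pi1) / n11 + (1 - pi2) / n21."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    rr = p1 / p2
    v = (1 - p1) / n11 + (1 - p2) / n21
    half_width = z * math.sqrt(v)
    return rr, (rr * math.exp(-half_width), rr * math.exp(half_width))


# TNF example: 7 of 247 TNF patients and 4 of 738 other patients had COC.
rr, (lo, hi) = relative_risk_ci(7, 247, 4, 738)
print(round(rr, 2), round(lo, 2), round(hi, 2))  # 5.23 1.54 17.71
```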

Odds Ratio
• The odds of an event is the probability that it occurs divided by the probability that it does not occur
• The odds ratio is the odds of the event for group 1 divided by the odds of the event for group 2
• Sample odds of the outcome for each group:
$\text{odds}_1 = \frac{n_{11}/n_{1.}}{n_{12}/n_{1.}} = \frac{n_{11}}{n_{12}} \qquad \text{odds}_2 = \frac{n_{21}/n_{2.}}{n_{22}/n_{2.}} = \frac{n_{21}}{n_{22}}$

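Odds Ratio
• Estimated Odds Ratio:
$\widehat{OR} = \frac{\text{odds}_1}{\text{odds}_2} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11} n_{22}}{n_{12} n_{21}}$
95% Confidence Interval for Population Odds Ratio:
$\left(\widehat{OR}\, e^{-1.96\sqrt{v}},\; \widehat{OR}\, e^{1.96\sqrt{v}}\right) \qquad e \approx 2.71828 \qquad v = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$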

Odds Ratio
• Interpretation
– Conclude that the probability that the outcome
is present is higher (in the population) for group
1 if the entire interval is above 1
– Conclude that the probability that the outcome
is present is lower (in the population) for group 1
if the entire interval is below 1
– Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 1

Example - NSAIDs and GBM
• Case-Control Study (Retrospective)
– Cases: 137 Self-Reporting Patients with Glioblastoma Multiforme (GBM)
– Controls: 401 Population-Based Individuals matched to cases with respect to demographic factors

                  GBM Present   GBM Absent   Total
NSAID User             32           138        170
NSAID Non-User        105           263        368
Total                 137           401        538

Source: Sivak-Sears, et al (2004)

Example - NSAIDs and GBM
$\widehat{OR} = \frac{32(263)}{138(105)} = \frac{8416}{14490} = 0.58$
$v = \frac{1}{32} + \frac{1}{138} + \frac{1}{105} + \frac{1}{263} = 0.0518$
95% CI: $\left(0.58\, e^{-1.96\sqrt{0.0518}},\; 0.58\, e^{1.96\sqrt{0.0518}}\right) = (0.37,\ 0.91)$
Interval is entirely below 1: NSAID use appears to be lower among cases than controls
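A matching Python sketch for the odds ratio and its 95% interval, assuming the layout of the NSAID example (rows: NSAID user, non-user; columns: GBM present, absent); `odds_ratio_ci` is a hypothetical helper. At full precision it reproduces the rounded values above.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Estimated odds ratio n11*n22 / (n12*n21) with the log-scale interval,
    where v = 1/n11 + 1/n12 + 1/n21 + 1/n22."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22
    half_width = z * math.sqrt(v)
    return odds_ratio, (odds_ratio * math.exp(-half_width),
                        odds_ratio * math.exp(half_width))


or_hat, (lo, hi) = odds_ratio_ci(32, 138, 105, 263)
print(round(or_hat, 2), round(lo, 2), round(hi, 2))  # 0.58 0.37 0.91
```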

Absolute Risk
• Difference between the proportions with the outcome characteristic in the 2 groups
• Sample proportions with the characteristic from groups 1 and 2:
$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$

Absolute Risk
Estimated Absolute Risk:
$\widehat{AR} = \hat{\pi}_1 - \hat{\pi}_2$
95% Confidence Interval for Population Absolute Risk:
$\widehat{AR} \pm 1.96\sqrt{\frac{\hat{\pi}_1(1 - \hat{\pi}_1)}{n_{1.}} + \frac{\hat{\pi}_2(1 - \hat{\pi}_2)}{n_{2.}}}$

Absolute Risk
• Interpretation
– Conclude that the probability that the outcome
is present is higher (in the population) for group
1 if the entire interval is positive
– Conclude that the probability that the outcome
is present is lower (in the population) for group 1
if the entire interval is negative
– Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 0

Example - Coccidioidomycosis and TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$
$\widehat{AR} = \hat{\pi}_1 - \hat{\pi}_2 = .0283 - .0054 = .0229$
95% CI: $.0229 \pm 1.96\sqrt{\frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738}} = .0229 \pm .0213 = (0.0016,\ 0.0442)$
Interval is entirely positive: TNF is associated with higher risk
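A minimal Python sketch of the absolute-risk interval, assuming the same TNF counts; `absolute_risk_ci` is an illustrative name. Full precision gives about .0229 with interval (0.0016, 0.0443); the hand calculation with rounded proportions gives .0229 ± .0213. Either way the interval excludes 0.

```python
import math


def absolute_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Difference of proportions pi1 - pi2 with a Wald interval
    AR +/- z * sqrt(pi1(1 - pi1)/n1. + pi2(1 - pi2)/n2.)."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    ar = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)
    return ar, (ar - z * se, ar + z * se)


ar, (lo, hi) = absolute_risk_ci(7, 247, 4, 738)
print(round(ar, 4), round(lo, 4), round(hi, 4))  # 0.0229 0.0016 0.0443
```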

Ordinal Explanatory and Response Variables
• Pearson's Chi-square test can be used to test associations among ordinal variables, but more powerful methods exist
• When theory suggests that the association is directional (positive or negative), measures exist to describe and test for these specific alternatives to independence:
– Gamma
– Kendall's $\tau_b$

Concordant and Discordant Pairs
• Concordant Pairs - Pairs of individuals where one individual scores "higher" on both ordered variables than the other individual
• Discordant Pairs - Pairs of individuals where one individual scores "higher" on one ordered variable and the other individual scores "higher" on the other
• C = # Concordant Pairs, D = # Discordant Pairs
– Under positive association, expect C > D
– Under negative association, expect C < D
– Under no association, expect C ≈ D

Example - Alcohol Use and Sick Days
• Alcohol Risk (Without Risk, Hardly any Risk, Some to Considerable Risk)
• Sick Days (0, 1-6, ≥7)
• Concordant Pairs - Pairs of respondents where one scores higher on both alcohol risk and sick days than the other
• Discordant Pairs - Pairs of respondents where one scores higher on alcohol risk and the other scores higher on sick days
Source: Hermansson, et al (2003)

Example - Alcohol Use and Sick Days
ALCOHOL * SICKDAYS Crosstabulation (Count)

                                        SICKDAYS
ALCOHOL                    0 days   1-6 days   7+ days   Total
Without Risk                  347        113       145     605
Hardly any Risk               154         63        56     273
Some-Considerable Risk         52         25        34     111
Total                         553        201       235     989

• Concordant Pairs: Each individual in a given cell is concordant with each individual in cells "Southeast" of theirs
• Discordant Pairs: Each individual in a given cell is discordant with each individual in cells "Southwest" of theirs


$C = 347(63 + 56 + 25 + 34) + 113(56 + 34) + 154(25 + 34) + 63(34) = 83164$
$D = 145(154 + 63 + 52 + 25) + 113(154 + 52) + 56(52 + 25) + 63(52) = 73496$
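The "Southeast"/"Southwest" counting rule can be sketched in Python, assuming both the rows and the columns of the table are listed in increasing order; `concordant_discordant` is a hypothetical helper name. It reproduces C and D for the alcohol table.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in a table whose rows
    and columns are both in increasing order."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Cells strictly below and to the right ("Southeast").
            southeast = sum(table[k][m] for k in range(i + 1, n_rows)
                            for m in range(j + 1, n_cols))
            # Cells strictly below and to the left ("Southwest").
            southwest = sum(table[k][m] for k in range(i + 1, n_rows)
                            for m in range(j))
            concordant += table[i][j] * southeast
            discordant += table[i][j] * southwest
    return concordant, discordant


# Rows: alcohol risk (Without, Hardly any, Some-Considerable);
# columns: sick days (0, 1-6, 7+).
alcohol = [[347, 113, 145],
           [154, 63, 56],
           [52, 25, 34]]
print(concordant_discordant(alcohol))  # (83164, 73496)
```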

Measures of Association
• Goodman and Kruskal's Gamma:
$\hat{\gamma} = \frac{C - D}{C + D} \qquad -1 \le \hat{\gamma} \le 1$
• Kendall's $\tau_b$:
$\hat{\tau}_b = \frac{C - D}{\frac{1}{2}\sqrt{\left(n^2 - \sum_i n_{i.}^2\right)\left(n^2 - \sum_j n_{.j}^2\right)}}$

When there is no association between the ordinal variables, the population values of these measures are 0. Statistical software packages provide these tests.

Example - Alcohol Use and Sick Days
$\hat{\gamma} = \frac{C - D}{C + D} = \frac{83164 - 73496}{83164 + 73496} = 0.0617$

Symmetric Measures
                                        Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal   Kendall's tau-b    .035           .030                1.187           .235
                     Gamma              .062           .052                1.187           .235
N of Valid Cases                         989

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
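As a check on the "Value" column, here is a Python sketch computing gamma and Kendall's tau-b from the formulas above, taking C = 83164 and D = 73496 from the earlier slide and the marginal totals from the alcohol table; it is an illustration of the point estimates only, not of the standard errors or significance tests that the software also reports.

```python
import math

alcohol = [[347, 113, 145],
           [154, 63, 56],
           [52, 25, 34]]
C, D = 83164, 73496  # concordant and discordant pairs from the earlier slide

row_totals = [sum(r) for r in alcohol]        # n_i.
col_totals = [sum(c) for c in zip(*alcohol)]  # n_.j
n = sum(row_totals)

gamma = (C - D) / (C + D)
tau_b = (C - D) / (0.5 * math.sqrt((n ** 2 - sum(r ** 2 for r in row_totals)) *
                                   (n ** 2 - sum(c ** 2 for c in col_totals))))
print(round(gamma, 4), round(tau_b, 3))  # 0.0617 0.035
```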
