Loglinear Models 2012 Rev
Loglinear Models for Contingency Tables
• Consider an I×J contingency table that cross-classifies a multinomial sample of n subjects on two categorical responses.
• The cell probabilities are $\{\pi_{ij}\}$ and the expected frequencies are $\{\mu_{ij} = n\pi_{ij}\}$.
• Loglinear model formulas use $\{\mu_{ij} = n\pi_{ij}\}$ rather than $\{\pi_{ij}\}$, so they also apply with Poisson sampling for $N = IJ$ independent cell counts $\{Y_{ij}\}$ having $\{\mu_{ij} = E(Y_{ij})\}$.
• In either case we denote the observed cell counts by $\{n_{ij}\}$.
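Independence Model
Under statistical independence, $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all i and j.
For multinomial sampling, $\mu_{ij} = n\pi_{i+}\pi_{+j}$.
Denote the row variable by X and the column variable by Y.
The formula expressing independence is multiplicative, so taking logs gives $\log\mu_{ij} = \log n + \log\pi_{i+} + \log\pi_{+j}$.
• Thus
$$\log\mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$$
for a row effect $\lambda_i^X$ and a column effect $\lambda_j^Y$.
This is the loglinear model of independence.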
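As usual, identifiability requires constraints such as $\lambda_I^X = \lambda_J^Y = 0$ (baseline coding) or $\sum_i \lambda_i^X = \sum_j \lambda_j^Y = 0$ (sum-to-zero coding).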
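As a minimal sketch in plain Python (using the Sex × Party counts from the example later in these notes), the ML fitted values under independence have the closed form $\hat\mu_{ij} = n_{i+}n_{+j}/n$. Their logs are additive in a row term and a column term, so every interaction contrast $\log\hat\mu_{ij} - \log\hat\mu_{i1} - \log\hat\mu_{1j} + \log\hat\mu_{11}$ vanishes. The helper name `independence_fit` is hypothetical, chosen for illustration.

```python
import math

def independence_fit(table):
    """Fitted counts mu_hat[i][j] = n_{i+} * n_{+j} / n under independence."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]

# Observed Sex x Party counts (rows: Male, Female; columns: Democrat, Republican)
observed = [[222, 115], [240, 185]]
mu_hat = independence_fit(observed)

# log mu_hat is additive, so every interaction contrast is zero (up to rounding error)
log_mu = [[math.log(m) for m in row] for row in mu_hat]
for i in range(2):
    for j in range(2):
        contrast = log_mu[i][j] - log_mu[i][0] - log_mu[0][j] + log_mu[0][0]
        assert abs(contrast) < 1e-12

print([[round(m, 2) for m in row] for row in mu_hat])  # prints [[204.32, 132.68], [257.68, 167.32]]
```

The printed values match the parenthesized expected frequencies in the example table.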
• The tests of independence using $X^2$ and $G^2$ are also goodness-of-fit tests of this loglinear model.
• Loglinear models for contingency tables are
GLMs that treat the N cell counts as
independent observations of a Poisson random
component.
• Loglinear GLMs identify the data as the N cell
counts rather than the individual classifications
of the n subjects.
• The expected cell counts link to the explanatory terms using the log link, $g(\mu_{ij}) = \log\mu_{ij}$.
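The GLM view can be sketched directly: treat the four cell counts as independent Poisson responses, build a design matrix for $\log\mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$, and fit by iteratively reweighted least squares (IRLS) for the log link. This is a hand-rolled illustration assuming NumPy, not a library routine; the helper `poisson_irls` is hypothetical, and the baseline coding (Female and Republican effects set to 0) is an assumption made for simplicity. The fitted values do not depend on the coding and should agree with $n_{i+}n_{+j}/n$.

```python
import numpy as np

def poisson_irls(X, y, tol=1e-10, max_iter=50):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # start from the constant (intercept-only) fit
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu         # working response for the log link
        W = mu                          # working weights for the log link
        XtW = X.T * W                   # same as X.T @ diag(W)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Cells in order (Male,Dem), (Male,Rep), (Female,Dem), (Female,Rep)
y = np.array([222.0, 115.0, 240.0, 185.0])
# Columns: intercept, indicator for Male (row 1), indicator for Democrat (column 1)
X = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
], dtype=float)

beta = poisson_irls(X, y)
fitted = np.exp(X @ beta)
print(np.round(fitted, 2))
```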
• The model does not distinguish
between response and explanatory
variables.
• It treats both jointly as responses, modeling $\mu_{ij}$ for combinations of their levels.
• To interpret parameters, however, it
is helpful to treat the variables
asymmetrically.
• We illustrate with the independence model for I×2 tables.
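• In row i, the logit equals
$$\text{logit}[P(Y=1\mid X=i)] = \log\frac{\mu_{i1}}{\mu_{i2}} = \log\mu_{i1} - \log\mu_{i2} = (\lambda + \lambda_i^X + \lambda_1^Y) - (\lambda + \lambda_i^X + \lambda_2^Y) = \lambda_1^Y - \lambda_2^Y.$$
• The final term does not depend on i; that is, $\text{logit}[P(Y=1\mid X=i)]$ is identical at each level of X.
• Thus, independence implies a model of the form
$$\text{logit}[P(Y=1\mid X=i)] = \alpha.$$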
• In each row, the odds of response in column 1 equal $\exp(\alpha) = \exp(\lambda_1^Y - \lambda_2^Y)$.
An analogous property holds when J > 2.
• Differences between two parameters for a given variable (for example, $\lambda_1^Y - \lambda_2^Y$) relate to the log odds of making one response, relative to the other, on that variable.
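A quick numerical check of this property, as a plain-Python sketch that again assumes the example's Sex × Party counts: under the independence fit every row has the same logit, and that common value equals $\lambda_1^Y - \lambda_2^Y$, computed here under sum-to-zero coding as the difference of the column means of $\log\hat\mu_{ij}$.

```python
import math

observed = [[222, 115], [240, 185]]   # rows: Male, Female; columns: Democrat, Republican
n = sum(map(sum, observed))
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]

# Independence fitted values mu_hat[i][j] = n_{i+} n_{+j} / n
mu_hat = [[r * c / n for c in col_tot] for r in row_tot]

# Row-wise logit of the column-1 (Democrat) response versus column 2
logits = [math.log(row[0] / row[1]) for row in mu_hat]

# lambda_1^Y - lambda_2^Y under sum-to-zero coding = difference of column means of log mu_hat
col_mean = [sum(math.log(mu_hat[i][j]) for i in range(2)) / 2 for j in range(2)]
alpha = col_mean[0] - col_mean[1]

assert all(abs(l - alpha) < 1e-12 for l in logits)
print(logits, round(math.exp(alpha), 4))   # exp(alpha) = 462/300 = 1.54 in every row
```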
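Saturated Model
• Statistically dependent variables satisfy a more complex loglinear model:
$$\log\mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}.$$
• The model is saturated: it has as many nonredundant parameters as cells, so its fitted values equal the observed counts.
• The $\{\lambda_{ij}^{XY}\}$ are association terms that reflect deviations from independence.
• The $\{\lambda_{ij}^{XY}\}$ represent interactions between X and Y, whereby the effect of one variable on $\mu_{ij}$ depends on the level of the other.
• Direct relationships exist between log odds ratios and the $\{\lambda_{ij}^{XY}\}$. For a 2×2 table the main-effect terms cancel, giving
$$\log\theta = \log\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}} = \lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY}.$$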
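For a 2×2 table the saturated model reproduces the observed counts, so its association parameters can be computed from the log counts. This plain-Python sketch assumes the sum-to-zero definition $\lambda_{ij}^{XY} = \eta_{ij} - \eta_{i\cdot} - \eta_{\cdot j} + \eta_{\cdot\cdot}$ from the parameter-estimation slide, applies it to the observed Sex × Party counts, and checks that $\lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY}$ equals the sample log odds ratio.

```python
import math

observed = [[222, 115], [240, 185]]    # rows: Male, Female; columns: Democrat, Republican
eta = [[math.log(x) for x in row] for row in observed]   # saturated fit: mu_hat = n

I, J = 2, 2
row_mean = [sum(eta[i]) / J for i in range(I)]                       # eta_{i.}
col_mean = [sum(eta[i][j] for i in range(I)) / I for j in range(J)]  # eta_{.j}
grand = sum(row_mean) / I                                            # eta_{..}

# Association (interaction) terms under sum-to-zero constraints
lam_xy = [[eta[i][j] - row_mean[i] - col_mean[j] + grand for j in range(J)] for i in range(I)]

contrast = lam_xy[0][0] + lam_xy[1][1] - lam_xy[0][1] - lam_xy[1][0]
log_or = math.log((222 * 185) / (115 * 240))    # sample log odds ratio

assert abs(contrast - log_or) < 1e-12
print(round(log_or, 4), round(math.exp(log_or), 4))
```

With sum-to-zero coding in a 2×2 table, the four interaction terms share one magnitude, so each equals $\pm\log\theta/4$.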
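Parameter Estimation
• Let $\{\mu_{ij}\}$ denote expected frequencies. Suppose all $\mu_{ij} > 0$ and let $\eta_{ij} = \log\mu_{ij}$.
A dot in a subscript denotes the average with respect to that index; for instance,
$$\eta_{i\cdot} = \frac{1}{J}\sum_j \eta_{ij}, \qquad \eta_{\cdot j} = \frac{1}{I}\sum_i \eta_{ij}, \qquad \eta_{\cdot\cdot} = \frac{1}{IJ}\sum_i\sum_j \eta_{ij}.$$
We set
$$\lambda = \eta_{\cdot\cdot}, \quad \lambda_i^X = \eta_{i\cdot} - \eta_{\cdot\cdot}, \quad \lambda_j^Y = \eta_{\cdot j} - \eta_{\cdot\cdot}, \quad \lambda_{ij}^{XY} = \eta_{ij} - \eta_{i\cdot} - \eta_{\cdot j} + \eta_{\cdot\cdot}.$$
• The sum of parameters for any index equals zero. That is,
$$\sum_i \lambda_i^X = \sum_j \lambda_j^Y = \sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = 0.$$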
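The dot-average definitions translate directly into code. This plain-Python sketch (the function name `loglinear_params` and the 2×3 table are hypothetical, for illustration only) computes $\lambda$, $\lambda_i^X$, $\lambda_j^Y$ and $\lambda_{ij}^{XY}$ from $\eta_{ij} = \log\mu_{ij}$, confirms that each set sums to zero over every index, and checks that the parameters rebuild $\eta_{ij}$ exactly.

```python
import math

def loglinear_params(mu):
    """Sum-to-zero saturated parameters from an I x J table of positive expected counts."""
    I, J = len(mu), len(mu[0])
    eta = [[math.log(m) for m in row] for row in mu]
    eta_i = [sum(row) / J for row in eta]                              # eta_{i.}
    eta_j = [sum(eta[i][j] for i in range(I)) / I for j in range(J)]   # eta_{.j}
    eta_grand = sum(eta_i) / I                                         # eta_{..}
    lam = eta_grand
    lam_x = [e - eta_grand for e in eta_i]
    lam_y = [e - eta_grand for e in eta_j]
    lam_xy = [[eta[i][j] - eta_i[i] - eta_j[j] + eta_grand for j in range(J)] for i in range(I)]
    return lam, lam_x, lam_y, lam_xy, eta

# Hypothetical 2 x 3 table of expected frequencies (illustration only)
mu = [[30.0, 12.0, 8.0], [20.0, 25.0, 15.0]]
lam, lam_x, lam_y, lam_xy, eta = loglinear_params(mu)

tol = 1e-12
assert abs(sum(lam_x)) < tol and abs(sum(lam_y)) < tol
assert all(abs(sum(row)) < tol for row in lam_xy)          # sum over j for each i
assert all(abs(sum(col)) < tol for col in zip(*lam_xy))    # sum over i for each j
# The parameters reproduce log mu: eta_ij = lam + lam_x[i] + lam_y[j] + lam_xy[i][j]
assert all(abs(lam + lam_x[i] + lam_y[j] + lam_xy[i][j] - eta[i][j]) < tol
           for i in range(2) for j in range(3))
```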
INFERENCE FOR LOGLINEAR MODELS
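Chi-Squared Goodness-of-Fit Tests
• As usual, $X^2$ and $G^2$ test whether a model holds by comparing cell fitted values to observed counts:
$$X^2 = \sum_{i,j}\frac{(n_{ij} - \hat\mu_{ij})^2}{\hat\mu_{ij}}, \qquad G^2 = 2\sum_{i,j} n_{ij}\log\frac{n_{ij}}{\hat\mu_{ij}},$$
• where $n_{ij}$ is the observed frequency, $\hat\mu_{ij}$ is the expected (fitted) frequency, and the sums run over all cells. Here df equals the number of cell counts minus the number of model parameters; for the independence model in an I×J table, df = IJ − [1 + (I − 1) + (J − 1)] = (I − 1)(J − 1).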
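A minimal sketch of both statistics in plain Python, assuming the independence fit $\hat\mu_{ij} = n_{i+}n_{+j}/n$ for the Sex × Party counts. The helper `gof_stats` is hypothetical. The p-value line uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, which holds only for df = 1; other df values would need a chi-squared survival function from a statistics library.

```python
import math

def gof_stats(observed, fitted):
    """Pearson X^2 and likelihood-ratio G^2 over all cells (assumes every fitted value > 0)."""
    x2 = g2 = 0.0
    for n_row, mu_row in zip(observed, fitted):
        for n, mu in zip(n_row, mu_row):
            x2 += (n - mu) ** 2 / mu
            if n > 0:                   # a zero count contributes 0 to G^2
                g2 += 2 * n * math.log(n / mu)
    return x2, g2

observed = [[222, 115], [240, 185]]     # rows: Male, Female; columns: Democrat, Republican
n = sum(map(sum, observed))
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
fitted = [[r * c / n for c in col_tot] for r in row_tot]   # independence fit

x2, g2 = gof_stats(observed, fitted)
I, J = 2, 2
df = I * J - (1 + (I - 1) + (J - 1))   # cells minus independence-model parameters
p = math.erfc(math.sqrt(x2 / 2))       # chi-squared upper tail, valid for df = 1 only
print(round(x2, 2), round(g2, 2), df, round(p, 4))
```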
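Example for Saturated Model
Observed counts, with expected frequencies under independence ($\hat\mu_{ij} = n_{i+}n_{+j}/n$) in parentheses:

                      Party
Sex       Democrat        Republican      Total
Male      222 (204.32)    115 (132.68)    337
Female    240 (257.68)    185 (167.32)    425
Total     462             300             762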
Logs of the expected frequencies, $\eta_{ij} = \log\hat\mu_{ij}$, with row and column sums:

                      Party
Sex       Democrat               Republican             Total
Male      log(204.32) = 5.32     log(132.68) = 4.89     10.21
Female    log(257.68) = 5.55     log(167.32) = 5.12     10.67
Total     10.87                  10.01                  20.88
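• $\lambda = \eta_{\cdot\cdot} = 20.88/4 = 5.22$
• $\lambda_1^X = 10.21/2 - 5.22 = -0.115$, $\lambda_2^X = 10.67/2 - 5.22 = 0.115$
• $\lambda_1^Y = 10.87/2 - 5.22 = 0.215$, $\lambda_2^Y = 10.01/2 - 5.22 = -0.215$
• $\lambda_{ij}^{XY} = \eta_{ij} - \eta_{i\cdot} - \eta_{\cdot j} + \eta_{\cdot\cdot} = 0$ for every cell (e.g. $5.32 - 5.105 - 5.435 + 5.22 = 0$), so the fitted values are rebuilt as
$\hat\mu_{11} = \exp(\lambda + \lambda_1^X + \lambda_1^Y) = \exp(5.32) = 204.38$
$\hat\mu_{12} = \exp(\lambda + \lambda_1^X + \lambda_2^Y) = \exp(4.89) = 132.95$
$\hat\mu_{21} = \exp(\lambda + \lambda_2^X + \lambda_1^Y) = \exp(5.55) = 257.24$
$\hat\mu_{22} = \exp(\lambda + \lambda_2^X + \lambda_2^Y) = \exp(5.12) = 167.34$
These differ slightly from the table values because the logs were rounded to two decimals.
• The full (saturated) model is not suitable.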
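The arithmetic of this example can be reproduced as a plain-Python sketch; the helper names `loglinear_2x2` and `rebuild` are hypothetical. At full precision the interaction terms are zero (the expected frequencies were computed under independence) and the fitted values are recovered exactly; rounding the logs to two decimals first reproduces the values 204.38, 132.95, 257.24 and 167.34.

```python
import math

def loglinear_2x2(eta):
    """Sum-to-zero parameters (lambda, lambda^X, lambda^Y, lambda^XY) from a 2 x 2 table of logs."""
    row_mean = [sum(r) / 2 for r in eta]            # eta_{i.}
    col_mean = [sum(c) / 2 for c in zip(*eta)]      # eta_{.j}
    lam = sum(row_mean) / 2                         # eta_{..}
    lam_x = [m - lam for m in row_mean]
    lam_y = [m - lam for m in col_mean]
    lam_xy = [[eta[i][j] - row_mean[i] - col_mean[j] + lam for j in range(2)] for i in range(2)]
    return lam, lam_x, lam_y, lam_xy

def rebuild(lam, lam_x, lam_y, lam_xy):
    """mu_ij = exp(lambda + lambda_i^X + lambda_j^Y + lambda_ij^XY)."""
    return [[math.exp(lam + lam_x[i] + lam_y[j] + lam_xy[i][j]) for j in range(2)] for i in range(2)]

# Independence fitted values n_{i+} n_{+j} / n from the Sex x Party table, at full precision
fitted = [[337 * 462 / 762, 337 * 300 / 762], [425 * 462 / 762, 425 * 300 / 762]]

# Full precision: interaction terms vanish and the fitted values come back exactly
lam, lam_x, lam_y, lam_xy = loglinear_2x2([[math.log(m) for m in row] for row in fitted])
exact = rebuild(lam, lam_x, lam_y, lam_xy)

# Slide arithmetic: logs rounded to two decimals first (5.32, 4.89, 5.55, 5.12)
eta_rounded = [[round(math.log(m), 2) for m in row] for row in fitted]
approx = rebuild(*loglinear_2x2(eta_rounded))

print([[round(m, 2) for m in row] for row in exact])
print([[round(m, 2) for m in row] for row in approx])
```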