Loglinear Models 2012 Rev

Loglinear Models for
Contingency Tables

• Consider an $I \times J$ contingency table that cross-classifies a multinomial sample of $n$ subjects on two categorical responses.
• The cell probabilities are $\{\pi_{ij}\}$ and the expected frequencies are $\{\mu_{ij} = n\pi_{ij}\}$.
• Loglinear model formulas use $\{\mu_{ij} = n\pi_{ij}\}$ rather than $\{\pi_{ij}\}$, so they also apply with Poisson sampling for $N = IJ$ independent cell counts $\{Y_{ij}\}$ having $\{\mu_{ij} = E(Y_{ij})\}$.
• In either case we denote the observed cell counts by $\{n_{ij}\}$.

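Independence Model
• Under statistical independence, $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i$ and $j$.
• For multinomial sampling, the expected frequencies are therefore $\mu_{ij} = n\pi_{i+}\pi_{+j}$.
• Denote the row variable by X and the column variable by Y. The formula expressing independence is multiplicative.
• Thus, on the log scale it is additive:
\[ \log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y \]
for a row effect $\lambda_i^X$ and a column effect $\lambda_j^Y$. This is the loglinear model of independence.
• As usual, identifiability requires constraints such as $\lambda_I^X = \lambda_J^Y = 0$ or $\sum_i \lambda_i^X = \sum_j \lambda_j^Y = 0$.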
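As a minimal sketch of how the multiplicative form becomes additive on the log scale, the Python snippet below computes independence fitted values $\hat\mu_{ij} = n_{i+}n_{+j}/n$ for the sex-by-party counts used in the example table of these notes. The variable names are illustrative, and the three additive terms shown ($\log n$, $\log \pi_{i+}$, $\log \pi_{+j}$) are one unconstrained way of writing the sum, not the constrained $\lambda$ parameters.

```python
import math

# Observed 2x2 counts (rows: Male, Female; columns: Democrat, Republican),
# taken from the sex-by-party example table in these notes.
counts = [[222, 115],
          [240, 185]]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Fitted values under independence: mu_ij = n * (n_i+ / n) * (n_+j / n)
mu = [[r * c / n for c in col_totals] for r in row_totals]

# On the log scale the product becomes a sum of a constant,
# a term depending only on the row, and a term depending only on the column.
log_mu_additive = [[math.log(n) + math.log(r / n) + math.log(c / n)
                    for c in col_totals] for r in row_totals]

for i, row in enumerate(mu):
    for j, m in enumerate(row):
        print(i + 1, j + 1, round(m, 2), round(log_mu_additive[i][j], 2))
```

Exponentiating each additive value recovers the corresponding fitted count, which is all the loglinear form of independence asserts.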


• The tests using $X^2$ and $G^2$ are also goodness-of-fit tests of this loglinear model.
• Loglinear models for contingency tables are GLMs that treat the N cell counts as independent observations of a Poisson random component.
• Loglinear GLMs identify the data as the N cell counts rather than the individual classifications of the n subjects.
• The expected cell counts link to the explanatory terms using the log link.
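To make the GLM view concrete, here is a small Python sketch in which each of the N = 4 cells of a 2x2 table is one observation with its own design-matrix row, and the log link maps the linear predictor to the expected count. The expected counts are those shown in parentheses in the sex-by-party example table; the baseline constraint (last level set to zero) and all variable names are assumptions made for this illustration.

```python
import math

# Expected counts under independence from the 2x2 sex-by-party example
mu = [[204.32, 132.68],
      [257.68, 167.32]]

# Baseline constraints (an assumption for this sketch):
# the parameters for row 2 and column 2 are set to zero.
lam = math.log(mu[1][1])
lam_x1 = math.log(mu[0][1]) - math.log(mu[1][1])
lam_y1 = math.log(mu[1][0]) - math.log(mu[1][1])
beta = [lam, lam_x1, lam_y1]

# One design-matrix row per cell: (intercept, 1 if row 1, 1 if column 1)
cells = [(i, j) for i in range(2) for j in range(2)]
design = [[1, int(i == 0), int(j == 0)] for i, j in cells]

# Log link: log(mu_ij) equals the linear predictor, so mu_ij = exp(x' beta)
fitted = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in design]

for (i, j), f in zip(cells, fitted):
    print(f"cell ({i + 1},{j + 1}): fitted = {f:.2f}")
```

Three parameters reproduce all four expected counts (up to rounding in the inputs), which is why the independence model leaves one degree of freedom in a 2x2 table.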

• The model does not distinguish
between response and explanatory
variables.
• It treats both jointly as responses, modeling $\mu_{ij}$ for combinations of their levels.
• To interpret parameters, however, it
is helpful to treat the variables
asymmetrically.

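• We illustrate with the independence model for $I \times 2$ tables.
• In row $i$, the logit equals
\[ \text{logit}[P(Y=1 \mid X=i)] = \log\frac{P(Y=1 \mid X=i)}{P(Y=2 \mid X=i)} = \log\frac{\mu_{i1}}{\mu_{i2}} = (\lambda + \lambda_i^X + \lambda_1^Y) - (\lambda + \lambda_i^X + \lambda_2^Y) = \lambda_1^Y - \lambda_2^Y. \]
• The final term does not depend on $i$; that is, logit[P(Y=1 | X=i)] is identical at each level of X.
• Thus, independence implies a model of form
logit[P(Y=1 | X=i)] = $\alpha$.
• In each row, the odds of response in column 1 equal $\exp(\alpha) = \exp(\lambda_1^Y - \lambda_2^Y)$.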

An analogous property holds when
J>2.
• Differences between two parameters
for a given variable relate to the log
odds of making one response,
relative to the other, on that variable
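As a sketch of the analogous property for J > 2, take any two column categories, labeled $a$ and $b$ here purely for illustration. Under the independence model $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$, the row term cancels just as it does in the $I \times 2$ case:

```latex
\[
\log\frac{P(Y=a \mid X=i)}{P(Y=b \mid X=i)}
  = \log\frac{\mu_{ia}}{\mu_{ib}}
  = (\lambda + \lambda_i^X + \lambda_a^Y) - (\lambda + \lambda_i^X + \lambda_b^Y)
  = \lambda_a^Y - \lambda_b^Y .
\]
```

The difference does not depend on which identifiability constraint is used, since a constraint only shifts all $\lambda_j^Y$ by a common constant.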

Saturated Model
• Statistically dependent variables satisfy a more complex loglinear model
\[ \log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}. \]
• The $\{\lambda_{ij}^{XY}\}$ are association terms that reflect deviations from independence.
• They represent interactions between X and Y, whereby the effect of one variable on $\mu_{ij}$ depends on the level of the other.
• Direct relationships exist between log odds ratios and $\{\lambda_{ij}^{XY}\}$.
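For a 2x2 table, the relationship between the log odds ratio and the association terms can be written out explicitly. The derivation below is a short worked step, assuming the saturated form $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}$; the main-effect terms cancel in the cross-product:

```latex
\[
\log\theta
  = \log\frac{\mu_{11}\,\mu_{22}}{\mu_{12}\,\mu_{21}}
  = \lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY}.
\]
```

With baseline constraints ($\lambda_{12}^{XY} = \lambda_{21}^{XY} = \lambda_{22}^{XY} = 0$) this reduces to $\log\theta = \lambda_{11}^{XY}$, and with sum-to-zero constraints it becomes $\log\theta = 4\lambda_{11}^{XY}$. In either case, independence ($\theta = 1$) corresponds to all association terms being zero.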

Parameter Estimation
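• Let $\{\mu_{ij}\}$ denote expected frequencies. Suppose all $\mu_{ij} > 0$ and let $\eta_{ij} = \log \mu_{ij}$.
• A dot in a subscript denotes the average with respect to that index; for instance, $\eta_{i\cdot} = \frac{1}{J}\sum_j \eta_{ij}$.
• We set
\[ \lambda = \eta_{\cdot\cdot}, \quad \lambda_i^X = \eta_{i\cdot} - \eta_{\cdot\cdot}, \quad \lambda_j^Y = \eta_{\cdot j} - \eta_{\cdot\cdot}, \quad \lambda_{ij}^{XY} = \eta_{ij} - \eta_{i\cdot} - \eta_{\cdot j} + \eta_{\cdot\cdot}. \]
• The sum of parameters for any index equals zero. That is,
\[ \sum_i \lambda_i^X = \sum_j \lambda_j^Y = \sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = 0. \]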
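The averaging definitions can be sketched in a few lines of Python. The function below takes a table of positive expected frequencies, forms the logs (called `eta` here), and returns the overall mean, the row and column effects, and the association terms as deviations from averages. The function name and the 2x3 input table are hypothetical, chosen only for illustration.

```python
import math

def loglinear_parameters(mu):
    """Return (lam, lam_x, lam_y, lam_xy) under sum-to-zero constraints."""
    I, J = len(mu), len(mu[0])
    eta = [[math.log(m) for m in row] for row in mu]   # log expected frequencies
    eta_row = [sum(row) / J for row in eta]            # average over columns
    eta_col = [sum(eta[i][j] for i in range(I)) / I for j in range(J)]  # average over rows
    eta_all = sum(eta_row) / I                         # grand average
    lam = eta_all
    lam_x = [r - eta_all for r in eta_row]
    lam_y = [c - eta_all for c in eta_col]
    lam_xy = [[eta[i][j] - eta_row[i] - eta_col[j] + eta_all
               for j in range(J)] for i in range(I)]
    return lam, lam_x, lam_y, lam_xy

# Hypothetical 2x3 table of positive expected frequencies (illustration only)
mu = [[20.0, 30.0, 50.0],
      [40.0, 10.0, 25.0]]
lam, lam_x, lam_y, lam_xy = loglinear_parameters(mu)

print("row effects sum:", round(sum(lam_x), 12))
print("column effects sum:", round(sum(lam_y), 12))
```

Because every parameter is a deviation from an average, the sums over any index vanish automatically, and adding the four terms back together returns the original log expected frequency for each cell.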

INFERENCE FOR LOGLINEAR MODELS
Chi-Squared
Goodness-of-Fit Tests
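• As usual, $X^2$ and $G^2$ test whether a model holds by comparing cell fitted values to observed counts:
\[ X^2 = \sum_{i,j} \frac{(n_{ij} - \hat\mu_{ij})^2}{\hat\mu_{ij}}, \qquad G^2 = 2\sum_{i,j} n_{ij} \log\frac{n_{ij}}{\hat\mu_{ij}}. \]
• Here $n_{ij}$ is the observed frequency and $\hat\mu_{ij}$ is the expected (fitted) frequency. The df equal the number of cell counts minus the number of nonredundant model parameters.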
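A minimal Python sketch of the two statistics for the independence model, using the sex-by-party counts from the example table in these notes. The fitted values are the usual independence fits $n_{i+}n_{+j}/n$; the variable names are illustrative only.

```python
import math

# Observed counts (rows: Male, Female; columns: Democrat, Republican)
observed = [[222, 115],
            [240, 185]]
I, J = len(observed), len(observed[0])

n = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Fitted values under the independence model
fitted = [[r * c / n for c in col_totals] for r in row_totals]

pairs = [(observed[i][j], fitted[i][j]) for i in range(I) for j in range(J)]

# Pearson statistic: squared residuals scaled by the fitted values
x2 = sum((o - e) ** 2 / e for o, e in pairs)

# Likelihood-ratio statistic; cells with a zero count contribute nothing
g2 = 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)

# df = number of cells minus nonredundant parameters
# (one intercept, I - 1 row effects, J - 1 column effects)
n_params = 1 + (I - 1) + (J - 1)
df = I * J - n_params

print(f"X2 = {x2:.2f}, G2 = {g2:.2f}, df = {df}")
```

Both statistics are referred to a chi-squared distribution with the computed df; large values indicate that the fitted values are far from the observed counts.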

Example for Saturated Model

Observed counts, with expected frequencies under independence in parentheses:

Sex       Party                                Total
          Democrat          Republican
Male      222 (204.32)      115 (132.68)       337
Female    240 (257.68)      185 (167.32)       425
Total     462               300                762


Logs of the expected frequencies, with row, column, and overall totals:

Sex       Party                                            Total
          Democrat                Republican
Male      log(204.32) = 5.32      log(132.68) = 4.89       10.21
Female    log(257.68) = 5.55      log(167.32) = 5.12       10.67
Total     10.87                   10.01                    20.88

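• From the table of logs, with $\eta_{ij}$ denoting the log expected frequencies: $\lambda = \eta_{\cdot\cdot} = 20.88/4 = 5.22$.
• $\lambda_1^X = 10.21/2 - 5.22 = -0.115$ and $\lambda_2^X = 10.67/2 - 5.22 = 0.115$.
• $\lambda_1^Y = 10.87/2 - 5.22 = 0.215$ and $\lambda_2^Y = 10.01/2 - 5.22 = -0.215$.
• The fitted values are recovered as $\exp(\lambda + \lambda_i^X + \lambda_j^Y)$:
exp(5.22 - 0.115 + 0.215) = exp(5.32) = 204.38
exp(5.22 - 0.115 - 0.215) = exp(4.89) = 132.95
exp(5.22 + 0.115 + 0.215) = exp(5.55) = 257.24
exp(5.22 + 0.115 - 0.215) = exp(5.12) = 167.34
• The small differences from the expected frequencies in the first table (for example 204.38 versus 204.32) come from rounding the logs to two decimals.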

The full model does not fit.
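The arithmetic of this example can be checked with a short Python sketch. It starts from the two-decimal logs shown in the second table, applies the averaging definitions from the Parameter Estimation slide, and exponentiates the sum of the parameters. Variable names are illustrative only.

```python
import math

# Logs of the expected frequencies, rounded to two decimals as in the table
eta = [[5.32, 4.89],
       [5.55, 5.12]]

row_mean = [sum(row) / 2 for row in eta]
col_mean = [(eta[0][j] + eta[1][j]) / 2 for j in range(2)]
grand_mean = sum(row_mean) / 2

lam = grand_mean                              # overall mean of the logs
lam_x = [r - grand_mean for r in row_mean]    # row (sex) effects
lam_y = [c - grand_mean for c in col_mean]    # column (party) effects

# Fitted values recovered from the additive terms
fitted = [[math.exp(lam + lam_x[i] + lam_y[j]) for j in range(2)]
          for i in range(2)]

print("lambda:", round(lam, 3))
print("row effects:", [round(v, 3) for v in lam_x])
print("column effects:", [round(v, 3) for v in lam_y])
for row in fitted:
    print([round(v, 2) for v in row])
```

Since the logs in this table are exactly additive in the row and column terms, no association term is needed to reproduce them.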