Classical Hypothesis Testing Theory
Adapted from Alexander Senf
Review
• 5 steps of classical hypothesis testing
(Ch. 3)
1. Declare null hypothesis H0 and alternate
hypothesis H1
2. Fix a threshold α for Type I error (1% or
5%)
• Type I error (α): reject H0 when it is true
• Type II error (β): accept H0 when it is false
3. Determine a test statistic
• a quantity calculated from the data
7/31/2008
Review (continued)
4. Determine what observed values of the test statistic should lead to rejection of H0
• Significance point K (determined by α)
5. Test to see if observed data is more extreme than significance point K
• If it is, reject H0
• Otherwise, accept H0
Overview of Ch. 9
– Simple Fixed-Sample-Size Tests
– Composite Fixed-Sample-Size Tests
– The -2 log λ Approximation
– The Analysis of Variance (ANOVA)
– Multivariate Methods
– ANOVA: the Repeated Measures Case
– Bootstrap Methods: the Two-sample t-test
– Sequential Analysis
Simple Fixed-Sample-Size Tests
The Issue
• In the simplest case, everything is
specified
– Probability distribution of H0 and H1
• Including all parameters
– α (and K)
– But: β is left unspecified
• It is desirable to have a procedure that
minimizes β given a fixed α
– This would maximize the power of the test
• 1-β, the probability of rejecting H0 when H1 is true
Most Powerful Procedure
• Neyman-Pearson Lemma
– States that the likelihood-ratio (LR) test is the most powerful test for a given α
– The LR is defined as:
$$LR = \frac{f_1(X_1)\, f_1(X_2) \cdots f_1(X_n)}{f_0(X_1)\, f_0(X_2) \cdots f_0(X_n)}$$
– where
• f0, f1 are completely specified density functions for H0, H1
• X1, X2, …, Xn are iid random variables
Neyman-Pearson Lemma
– H0 is rejected when LR ≥ K
– With a constant K chosen such that:
P(LR ≥ K when H0 is true) = α
– Let’s look at an example using the
Neyman-Pearson Lemma!
– Then we will prove it.
Example
• Basketball players seem to
be taller than average
– Use this observation to
formulate our hypothesis H1:
• “Tallness is a factor in the
recruitment of KU basketball
players”
– The null hypothesis, H0, could
be:
• “No, the players on KU’s team are just of average height compared to the population in the U.S.”
• “Average height of the team and the population in general is the same”
Example
• Setup:
– Average height of males in the US: 5’9 ½”
– Average height of KU players in 2008: 6’4 ½”
• Assumption: both populations are normally distributed, centered on their respective averages (μ0 = 69.5 in, μ1 = 76.5 in), with σ = 2:
$$f_0(x) = \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x-69.5)^2}{8}}, \qquad f_1(x) = \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x-76.5)^2}{8}}$$
• Sample size: 3
– Choose α: 5%
Example
• The two populations:
[Figure: densities f0 and f1 plotted against height (inches)]
Example
– Our test statistic is the Likelihood Ratio, LR
$$\Lambda(x) = \frac{f_1(x_1)\, f_1(x_2)\, f_1(x_3)}{f_0(x_1)\, f_0(x_2)\, f_0(x_3)} = \frac{\prod_{i=1}^{3} \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x_i-76.5)^2}{8}}}{\prod_{i=1}^{3} \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x_i-69.5)^2}{8}}} = e^{\frac{1}{8}\sum_{i=1}^{3}\left[(x_i-69.5)^2-(x_i-76.5)^2\right]}$$
– Now we need to determine a significance point K at which we can reject H0, given α = 5%
• P(Λ(x) ≥ K | H0 is true) = 0.05, determine K
Example
– So we just need to solve for K’ and calculate K:
$$\int_{K_1'}^{\infty}\int_{K_2'}^{\infty}\int_{K_3'}^{\infty} f_0(x_1)\, f_0(x_2)\, f_0(x_3)\, dx_1\, dx_2\, dx_3 = 0.05$$
• How to solve this? Well, we only need one set of values to calculate K, so let’s pick two and solve for the third:
$$\int_{68}^{\infty}\int_{71}^{\infty}\int_{K_3'}^{\infty} f_0(x_1)\, f_0(x_2)\, f_0(x_3)\, dx_1\, dx_2\, dx_3 = 0.05$$
• We get one result: K3’ = 71.0803
Example
– Then we can just plug it in to Λ and calculate K:
$$K = e^{\frac{1}{8}\sum_{i=1}^{3}\left[(K_i'-69.5)^2-(K_i'-76.5)^2\right]}$$
$$= e^{\frac{1}{8}\left[(68-69.5)^2-(68-76.5)^2+(71-69.5)^2-(71-76.5)^2+(71.0803-69.5)^2-(71.0803-76.5)^2\right]} = 1.663 \times 10^{-7}$$
Example
– With the significance point K = 1.663×10^-7 we can now test our hypothesis based on observations:
• E.g.: Sasha = 83 in, Darrell = 81 in, Sherron = 71 in
$$\Lambda(X = \{83, 81, 71\}) = e^{\frac{1}{8}\sum_{i=1}^{3}\left[(X_i-69.5)^2-(X_i-76.5)^2\right]}$$
$$\Lambda(83, 81, 71) = 1.446 \times 10^{12}$$
• 1.446×10^12 > 1.663×10^-7
• Therefore we reject H0 in favor of H1: tallness is a factor in the recruitment of KU basketball players.
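The likelihood-ratio arithmetic of this example can be checked with a short sketch. The constants are the ones on the slides; the function `threshold_from_sample_mean` is an added assumption, not the slides' method: it uses the fact that this LR depends on the data only through the sample sum, which is normal under H0, to derive a 5% cutoff directly.

```python
import math
from statistics import NormalDist

MU0, MU1, SIGMA = 69.5, 76.5, 2.0  # values taken from the slides

def likelihood_ratio(xs):
    """LR = prod f1(x_i) / prod f0(x_i) for two normal densities with equal sigma."""
    exponent = sum((x - MU0) ** 2 - (x - MU1) ** 2 for x in xs) / (2 * SIGMA ** 2)
    return math.exp(exponent)

def threshold_from_sample_mean(n, alpha):
    """Alternative route to K (an assumption, not the slides' method).

    The exponent of LR is linear in sum(xs). Under H0 the sum is
    N(n*MU0, n*SIGMA^2), so an alpha-level cutoff on the sum maps to K.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    cutoff_sum = n * MU0 + z * SIGMA * math.sqrt(n)
    # Any sample with this sum gives the same LR; use n equal values.
    return likelihood_ratio([cutoff_sum / n] * n)

observed = [83, 81, 71]
lam = likelihood_ratio(observed)
k_slides = likelihood_ratio([68, 71, 71.0803])   # the K' values from the slides
k_alt = threshold_from_sample_mean(len(observed), 0.05)

print(f"LR(observed)          = {lam:.4g}")
print(f"K from slide values   = {k_slides:.4g}")
print(f"K via sample-mean cut = {k_alt:.4g}")
print("reject H0" if lam >= k_slides else "do not reject H0")
```

With either threshold the observed ratio is far above K, so the decision for this sample does not depend on which route is used.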
Neyman-Pearson Proof
• Let A define the region in the joint range of X1, X2, …, Xn such that LR ≥ K. A is the critical region.
$$\int_A L(H_0) = \int \cdots \int_A f_0(u_1)\, f_0(u_2) \cdots f_0(u_n)\, du_1\, du_2 \cdots du_n = \alpha$$
– If A is the only critical region of size α we are done
– Let’s assume another critical region of size α, defined by B
$$\int_B L(H_0) = \int \cdots \int_B f_0(u_1)\, f_0(u_2) \cdots f_0(u_n)\, du_1\, du_2 \cdots du_n = \alpha$$
Proof
– H0 is rejected if the observed vector (x1, x2, …, xn) is in A or in B.
– Let A and B overlap in region C
– Power of the test: the probability of rejecting H0 when H1 is true
• The power of this test using A is:
$$\int_A L(H_1) = \int \cdots \int_A f_1(u_1)\, f_1(u_2) \cdots f_1(u_n)\, du_1\, du_2 \cdots du_n$$
Proof
– Define: Δ = ∫_A L(H1) − ∫_B L(H1)
• The power of the test using A minus using B
$$\Delta = \int_A f_1(u_1) \cdots f_1(u_n)\, du_1 \cdots du_n - \int_B f_1(u_1) \cdots f_1(u_n)\, du_1 \cdots du_n$$
$$= \int_{A \setminus C} f_1(u_1) \cdots f_1(u_n)\, du_1 \cdots du_n - \int_{B \setminus C} f_1(u_1) \cdots f_1(u_n)\, du_1 \cdots du_n$$
• Where A\C is the set of points in A but not in C
• And B\C contains points in B but not in C
Proof
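– So, in A\C we have:
$$\frac{f_1(u_1) \cdots f_1(u_n)}{f_0(u_1) \cdots f_0(u_n)} \ge K \quad\Rightarrow\quad f_1(u_1) \cdots f_1(u_n) \ge K f_0(u_1) \cdots f_0(u_n)$$
– While in B\C we have:
$$f_1(u_1) \cdots f_1(u_n) \le K f_0(u_1) \cdots f_0(u_n)$$
Why? Every point of B\C lies outside A, and A contains all points with LR ≥ K.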
Proof
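– Thus
$$\Delta \ge \int_{A \setminus C} K f_0(u_1) \cdots f_0(u_n)\, du_1 \cdots du_n - \int_{B \setminus C} K f_0(u_1) \cdots f_0(u_n)\, du_1 \cdots du_n$$
$$= \int_{A} K f_0(u_1) \cdots f_0(u_n)\, du_1 \cdots du_n - \int_{B} K f_0(u_1) \cdots f_0(u_n)\, du_1 \cdots du_n = K\alpha - K\alpha = 0$$
– Which implies that the power of the test using A is greater than or equal to the power using B.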
Composite Fixed-Sample-Size Tests
Not Identically Distributed
• In most cases, random variables are not
identically distributed, at least not in H1
– This affects the likelihood function, L
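– For example, the likelihood under H1 in the two-sample t-test is:
$$L = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_{1i}-\mu_1)^2}{2\sigma^2}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_{2i}-\mu_2)^2}{2\sigma^2}}$$
– Where μ1 and μ2 are different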
Composite
– Further, the hypotheses being tested do
not specify all parameters
– They are composite
– This chapter only outlines aspects of
composite test theory relevant to the
material in this book.
Parameter Spaces
– The set of values the parameters of interest
can take
– Null hypothesis: parameters in some region ω
– Alternate hypothesis: parameters in Ω
– ω is usually a subspace of Ω
• Nested hypothesis case
– Null hypothesis nested within alternate hypothesis
– This book focuses on this case
• “if the alternate hypothesis can explain the data
significantly better we can reject the null
hypothesis”
λ Ratio
• Optimality theory for composite tests suggests this as a desirable test statistic:
$$\lambda = \frac{L_{\max}(\omega)}{L_{\max}(\Omega)}$$
• Lmax(ω): maximum likelihood when parameters are confined to the region ω, defined by H0
• Lmax(Ω): maximum likelihood when parameters are confined to the region Ω, defined by H1
• H0 is rejected when λ is sufficiently small (how small is fixed by the chosen Type I error α)
Example: t-tests
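• The next slides calculate the λ-ratio for the two-sample t-test (with the likelihood)
$$L = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_{1i}-\mu_1)^2}{2\sigma^2}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_{2i}-\mu_2)^2}{2\sigma^2}}$$
– t-tests later generalize to ANOVA and T² tests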
Equal Variance Two-Sided t-test
• Setup
– Random variables X11, …, X1m in group 1 are Normally and Independently Distributed, NID(μ1, σ²)
– Random variables X21, …, X2n in group 2 are NID(μ2, σ²)
– X1i and X2j are independent for all i and j
– Null hypothesis H0: μ1 = μ2 (= μ, unspecified)
– Alternate hypothesis H1: μ1 and μ2 both unspecified
Equal Variance Two-Sided t-test
• Setup (continued)
– σ² is unknown and unspecified in H0 and H1
• Is assumed to be the same in both distributions
– Region ω is: $\omega = \{\mu_1 = \mu_2,\ 0 < \sigma^2 < +\infty\}$
– Region Ω is: $\Omega = \{\mu_1,\ \mu_2,\ 0 < \sigma^2 < +\infty\}$
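Equal Variance Two-Sided t-test
• Derivation
– H0: writing μ for the common mean when μ1 = μ2, the maximum of the likelihood over ω is at
$$\hat{\mu} = \bar{X} = \frac{X_{11} + X_{12} + \cdots + X_{1m} + X_{21} + X_{22} + \cdots + X_{2n}}{m+n}$$
– And the (common) variance σ² is
$$\hat{\sigma}_0^2 = \frac{\sum_{i=1}^{m}(X_{1i}-\bar{X})^2 + \sum_{i=1}^{n}(X_{2i}-\bar{X})^2}{m+n}$$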
Equal Variance Two-Sided t-test
– Inserting both into the likelihood function, L
$$L_{\max}(\omega) = \frac{1}{(2\pi\hat{\sigma}_0^2)^{\frac{m+n}{2}}}\, e^{-\frac{m+n}{2}}$$
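Equal Variance Two-Sided t-test
– Do the same thing for region Ω
$$\hat{\mu}_1 = \bar{X}_1 = \frac{X_{11} + X_{12} + \cdots + X_{1m}}{m}, \qquad \hat{\mu}_2 = \bar{X}_2 = \frac{X_{21} + X_{22} + \cdots + X_{2n}}{n}$$
$$\hat{\sigma}_1^2 = \frac{\sum_{i=1}^{m}(X_{1i}-\bar{X}_1)^2 + \sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}{m+n}$$
– Which produces this likelihood function, L
$$L_{\max}(\Omega) = \frac{1}{(2\pi\hat{\sigma}_1^2)^{\frac{m+n}{2}}}\, e^{-\frac{m+n}{2}}$$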
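Equal Variance Two-Sided t-test
– The test statistic λ is then
$$\lambda = \frac{L_{\max}(\omega)}{L_{\max}(\Omega)} = \frac{(2\pi\hat{\sigma}_0^2)^{-\frac{m+n}{2}}\, e^{-\frac{m+n}{2}}}{(2\pi\hat{\sigma}_1^2)^{-\frac{m+n}{2}}\, e^{-\frac{m+n}{2}}} = \left(\frac{\hat{\sigma}_1^2}{\hat{\sigma}_0^2}\right)^{\frac{m+n}{2}}$$
– Numerator and denominator are the same function, just with different variances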
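Equal Variance Two-Sided t-test
– We can then use the algebraic identity
$$\sum_{i=1}^{m}(X_{1i}-\bar{X})^2 + \sum_{i=1}^{n}(X_{2i}-\bar{X})^2 = \sum_{i=1}^{m}(X_{1i}-\bar{X}_1)^2 + \sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2 + \frac{mn}{m+n}(\bar{X}_1-\bar{X}_2)^2$$
– To show that
$$\lambda = \left(\frac{1}{1 + \frac{t^2}{m+n-2}}\right)^{\frac{m+n}{2}}$$
– Where t is the observed value of T, which is (from Ch. 3)
$$T = \frac{(\bar{X}_1-\bar{X}_2)\sqrt{mn}}{S\sqrt{m+n}}$$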
Equal Variance Two-Sided t-test
– t is the observed value of T
– S is defined in Ch. 3 as
$$S^2 = \frac{\sum_{i=1}^{m}(X_{1i}-\bar{X}_1)^2 + \sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}{m+n-2}$$
– We can plot λ as a function of t (e.g. m+n = 10)
[Figure: λ plotted against t]
Equal Variance Two-Sided t-test
– So, by the monotonicity argument, we can use t² or |t| instead of λ as test statistic
– Small values of λ correspond to large values of |t|
– Sufficiently large |t| lead to rejection of H0
– The H0 distribution of t is known
• t-distribution with m+n-2 degrees of freedom
– Significance points are widely available
• Once α has been chosen, values of |t| sufficiently large to reject H0 can be determined
• Table: http://www.socr.ucla.edu/Applets.dir/T-table.html
Equal Variance One-Sided t-test
• Similar to Two-Sided t-test case
– Different region Ω for H1:
• Means μ1 and μ2 are not simply different, but one is larger than the other, μ1 ≥ μ2
$$\Omega = \{\mu_1 \ge \mu_2,\ 0 < \sigma^2 < +\infty\}$$
• If $\bar{x}_1 \ge \bar{x}_2$ then maximum likelihood estimates are the same as for the two-sided case
Equal Variance One-Sided t-test
• If $\bar{x}_1 < \bar{x}_2$ then the unconstrained maximum of the likelihood is outside of Ω
• The unique unconstrained maximum is at $(\bar{x}_1, \bar{x}_2)$, implying that the maximum in Ω occurs at a boundary point of Ω
• At this point estimates of μ1 and μ2 are equal
• At this point the likelihood ratio is 1 and H0 is not rejected
• Result: H0 is rejected in favor of H1 (μ1 ≥ μ2) only for sufficiently large positive values of t
Example - Revised
• This scenario fits with our original
example:
– H1 is that the average height of KU
basketball players is bigger than for the
general population
– One-sided test
– We could assume that we don’t know the
averages for H0 and H1
– We actually don’t know σ (I just guessed 2 in
the original example)
Example - Revised
• Updated example:
– Observation in group 1 (KU): X1 = {83, 81, 71}
– Observation in group 2: X2 = {65, 72, 70}
– Pick significance point for t from a table: tα = 2.132
• t-distribution, m+n-2 = 4 degrees of freedom, α = 0.05
– Calculate t with our observations
$$t = \frac{(78.3 - 69)\sqrt{9}}{5.2122\sqrt{6}} = \frac{27.9}{12.7673} = 2.185$$
– t > tα, so we can reject H0!
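The calculation above, and the link between λ and t derived earlier, can be reproduced with a small sketch. The function names are illustrative, and the critical value 2.132 is simply the table value quoted on the slide. With unrounded group means, t comes out slightly larger than the slide's 2.185, which used 78.3 for the first mean.

```python
import math

def pooled_t(x1, x2):
    """Equal-variance two-sample t statistic and pooled standard deviation S."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    within = sum((x - mean1) ** 2 for x in x1) + sum((x - mean2) ** 2 for x in x2)
    s = math.sqrt(within / (m + n - 2))
    t = (mean1 - mean2) * math.sqrt(m * n) / (s * math.sqrt(m + n))
    return t, s

def lambda_from_variances(x1, x2):
    """lambda = (sigma1_hat^2 / sigma0_hat^2)^((m+n)/2), computed from the two MLEs."""
    m, n = len(x1), len(x2)
    grand = (sum(x1) + sum(x2)) / (m + n)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    var0 = sum((x - grand) ** 2 for x in x1 + x2) / (m + n)
    var1 = (sum((x - mean1) ** 2 for x in x1)
            + sum((x - mean2) ** 2 for x in x2)) / (m + n)
    return (var1 / var0) ** ((m + n) / 2)

def lambda_from_t(t, m, n):
    """The same lambda written as a monotone decreasing function of t^2."""
    return (1 + t * t / (m + n - 2)) ** (-(m + n) / 2)

ku = [83, 81, 71]
general = [65, 72, 70]
t, s = pooled_t(ku, general)
T_CRITICAL = 2.132  # one-sided, alpha = 0.05, 4 degrees of freedom (from the slide)

print(f"S = {s:.4f}, t = {t:.4f}")
print(f"lambda (variances) = {lambda_from_variances(ku, general):.6f}")
print(f"lambda (from t)    = {lambda_from_t(t, len(ku), len(general)):.6f}")
print("reject H0" if t > T_CRITICAL else "do not reject H0")
```

The two λ values agree, which is the monotonicity argument in numerical form: testing on small λ and testing on large |t| are the same test.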
Comments
• Problems that might arise in other cases
– The λ-ratio might not reduce to a function of a
well-known test statistic, such as t
– There might not be a unique H0 distribution of λ
– Fortunately, the t statistic is a pivotal quantity
• Independent of the parameters not prescribed by H0
– e.g. μ, σ
– For many testing procedures this property does
not hold
Unequal Variance Two-Sided t-test
• Identical to Equal Variance Two-Sided t-test
– Except: variances in group 1 and group 2 are no longer assumed to be identical
• Group 1: NID(μ1, σ1²)
• Group 2: NID(μ2, σ2²)
• With σ1² and σ2² unknown and not assumed identical
• Region ω = {μ1 = μ2, 0 < σ1², σ2² < +∞}
• Ω makes no constraints on values μ1, μ2, σ1², and σ2²
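Unequal Variance Two-Sided t-test
– The likelihood function of (X11, X12, …, X1m, X21, X22, …, X2n) then becomes
$$\prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_{1i}-\mu_1)^2}{2\sigma_1^2}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}}$$
– Under H0 (μ1 = μ2 = μ), this becomes:
$$\prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_{1i}-\mu)^2}{2\sigma_1^2}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_{2i}-\mu)^2}{2\sigma_2^2}}$$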
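Unequal Variance Two-Sided t-test
– Maximum likelihood estimates $\hat{\mu}$, $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ satisfy the simultaneous equations:
$$\frac{\sum_{i=1}^{m}(x_{1i}-\hat{\mu})}{\hat{\sigma}_1^2} + \frac{\sum_{i=1}^{n}(x_{2i}-\hat{\mu})}{\hat{\sigma}_2^2} = 0$$
$$\hat{\sigma}_1^2 = \frac{\sum_{i=1}^{m}(x_{1i}-\hat{\mu})^2}{m}, \qquad \hat{\sigma}_2^2 = \frac{\sum_{i=1}^{n}(x_{2i}-\hat{\mu})^2}{n}$$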
Unequal Variance Two-Sided t-test
– These reduce to a cubic equation in $\hat{\mu}$
– Neither the λ ratio, nor any monotonic function of it, has a known probability distribution when H0 is true!
– This does not lead to any useful test statistic
• The t statistic may be used as a reasonably close approximation
• However the H0 distribution is still unknown, as it depends on the unknown ratio σ1²/σ2²
• In practice, a heuristic is often used (see Ch. 3.5)
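Although no closed form is available, the simultaneous equations can be solved numerically. The sketch below is one possible approach, not the book's: it rewrites the first equation as a precision-weighted mean and iterates to a fixed point, starting from the pooled mean. It assumes the iteration converges to the relevant root of the cubic, which it does for the basketball data but is not guaranteed in general.

```python
import math

def h0_estimates(x1, x2, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for (mu, var1, var2) under H0: mu1 = mu2 = mu."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    mu = (sum(x1) + sum(x2)) / (m + n)  # starting guess: pooled mean
    for _ in range(max_iter):
        var1 = sum((x - mu) ** 2 for x in x1) / m
        var2 = sum((x - mu) ** 2 for x in x2) / n
        # First equation solved for mu: precision-weighted mean of the group means
        new_mu = (m * mean1 / var1 + n * mean2 / var2) / (m / var1 + n / var2)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    var1 = sum((x - mu) ** 2 for x in x1) / m
    var2 = sum((x - mu) ** 2 for x in x2) / n
    return mu, var1, var2

def minus_two_log_lambda(x1, x2):
    """-2 log(lambda) for H0: mu1 = mu2 with unequal, unknown variances."""
    m, n = len(x1), len(x2)
    _, var1_h0, var2_h0 = h0_estimates(x1, x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    var1 = sum((x - mean1) ** 2 for x in x1) / m  # unconstrained MLEs
    var2 = sum((x - mean2) ** 2 for x in x2) / n
    return m * math.log(var1_h0 / var1) + n * math.log(var2_h0 / var2)

x1 = [83, 81, 71]
x2 = [65, 72, 70]
mu_hat, v1, v2 = h0_estimates(x1, x2)
print(f"mu_hat = {mu_hat:.4f}, var1 = {v1:.3f}, var2 = {v2:.3f}")
print(f"-2 log lambda = {minus_two_log_lambda(x1, x2):.4f}")
```

The common mean is pulled toward the group with the smaller variance, which is why it lands much closer to the second group's mean than the pooled mean does.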
The -2 log λ Approximation
The -2 log λ Approximation
• Used when the λ-ratio procedure does
not lead to a test statistic whose H0
distribution is known
– Example: Unequal Variance Two-Sided t-test
• Various approximations can be used
– But only if certain regularity assumptions
and restrictions hold true
The -2 log λ Approximation
• Best known approximation:
– If H0 is true, -2 log λ has an asymptotic
chi-square distribution,
• with degrees of freedom equal to the difference between the numbers of parameters left unspecified by H1 and by H0
• λ is the likelihood ratio
• “asymptotic” = “as the sample size → ∞”
– Provides an asymptotically valid testing
procedure
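As an illustration of how the approximation is used, the sketch below applies it to the equal-variance two-sample case, where λ has the closed form in t shown earlier. Here H1 leaves three parameters unspecified (μ1, μ2, σ²) and H0 two (μ, σ²), so there is 1 degree of freedom. The helper names are assumptions, and with only six observations the asymptotic p-value is a rough guide at best.

```python
import math

def minus_two_log_lambda(t, m, n):
    """-2 log(lambda) for the equal-variance two-sample test, from the observed t."""
    return (m + n) * math.log(1 + t * t / (m + n - 2))

def chi2_sf_df1(x):
    """P(chi-square with 1 df > x), using P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

CHI2_CRITICAL = 3.841  # alpha = 0.05, 1 degree of freedom

stat = minus_two_log_lambda(2.185, 3, 3)  # t from the revised basketball example
p_value = chi2_sf_df1(stat)
print(f"-2 log lambda = {stat:.3f}, asymptotic p-value = {p_value:.4f}")
print("reject H0" if stat > CHI2_CRITICAL else "do not reject H0")
```

Note that this is a two-sided procedure, since λ depends on t only through t².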
The -2 log λ Approximation
– Restrictions:
• Parameters must be real numbers that can take
on values in some interval
• The maximum likelihood estimator is found at a
turning point of the function
– i.e. a “real” maximum, not at a boundary point
• H0 is nested in H1 (as in all previous slides)
– These restrictions are important in the
proof
• I skip the proof…
The -2 log λ Approximation
• Instead:
– Our original basketball example, revised
again:
• Let’s drop our last assumption, that the variance
in the population at large is the same as in the
group of KU basketball players.
• All we have left now are our observations and
the hypothesis that μ1 > μ2
– Where μ1 is the average height of Basketball players
• Observation in group 1 (KU): X1 = {83, 81, 71}
• Observation in group 2: X2 = {65, 72, 70}
Example – Revised Again
– Using the Unequal Variance One-Sided t-test
– We get:
The Analysis of Variance (ANOVA)
The Analysis of Variance
(ANOVA)
• Probably the most frequently used
hypothesis testing procedure in
statistics
• This section
– Derives the Sum of Squares
– Gives an outline of the ANOVA procedure
– Introduces one-way ANOVA as a
generalization of the two-sample t-test
– Two-way and multi-way ANOVA
– Further generalizations of ANOVA
Sum of Squares
• New variables (from Ch. 3)
– The two-sample t-test tests for equality
of the means of two groups.
– We could express the observations as:
$$X_{ij} = \mu_i + E_{ij}, \qquad i = 1, 2$$
– Where the Eij are assumed to be NID(0, σ²)
– H0 is μ1 = μ2
Sum of Squares
– This can also be written as:
$$X_{ij} = \mu + \alpha_i + E_{ij}, \qquad i = 1, 2$$
• μ could be seen as overall mean
• αi as deviation from μ in group i
– This model is overparameterized
• Uses more parameters than necessary
• Necessitates the requirement $m\alpha_1 + n\alpha_2 = 0$
• (always assumed imposed)
Sum of Squares
– We are deriving a test procedure similar
to the two-sample two-sided t-test
– Using |t| as test statistic
• Absolute value of the T statistic
– This is equivalent to using t2
• Because it’s a monotonic function of |t|
– The square of the t statistic (from Ch. 3)
$$T = \frac{(\bar{X}_1-\bar{X}_2)\sqrt{mn}}{S\sqrt{m+n}}$$
Sum of Squares
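– …can, after algebraic manipulations, be written as F
$$F = \frac{B}{W}(m+n-2)$$
– where
$$\bar{X}_1 = \frac{\sum_{j=1}^{m} X_{1j}}{m}, \qquad \bar{X}_2 = \frac{\sum_{j=1}^{n} X_{2j}}{n}, \qquad \bar{X} = \frac{m\bar{X}_1 + n\bar{X}_2}{m+n}$$
$$B = \frac{mn}{m+n}(\bar{X}_1-\bar{X}_2)^2 = m(\bar{X}_1-\bar{X})^2 + n(\bar{X}_2-\bar{X})^2$$
$$W = \sum_{j=1}^{m}(X_{1j}-\bar{X}_1)^2 + \sum_{j=1}^{n}(X_{2j}-\bar{X}_2)^2$$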
Sum of Squares
– B: between (among) group sum of squares
– W: within group sum of squares
– B + W: total sum of squares
• Can be shown to be:
$$\sum_{i=1}^{m}(X_{1i}-\bar{X})^2 + \sum_{i=1}^{n}(X_{2i}-\bar{X})^2$$
– Total number of degrees of freedom: m + n – 1
• Between groups: 1
• Within groups: m + n – 2
Sum of Squares
– This gives us the F statistic
$$F = \frac{B}{W}(m+n-2)$$
– Our goal is to test the significance of the difference between the means of two groups
• B measures the difference
– The difference must be measured relative to the variance within the groups
• W measures that
– The larger F is, the more significant the difference
The ANOVA Procedure
• Subdivide observed total sum of
squares into several components
– In our case, B and W
• Pick appropriate significance point for
a chosen Type I error α from an F table
• Compare the observed components to
test our hypothesis
F-Statistic
• Significance points depend on
degrees of freedom in B and W
– In our case, 1 and (m + n – 2)
– Table: http://www.ento.vt.edu/~sharov/PopEcol/tables/f005.html
Comments
• The two-group case readily generalizes
to any number of groups.
• ANOVAs can be classified in various
ways, e.g.
– fixed effects models
– mixed effects models
– random effects model
– Difference is discussed later
– For now we consider fixed effect models
• Parameter αi is fixed, but unknown, in group i
$$X_{ij} = \mu + \alpha_i + E_{ij}$$
Comments
• Terminology
– Although ANOVA contains the word
‘variance’
– What we actually test for is equality of means between the groups
• The different mean assumptions affect the variance, though
• ANOVAs are special cases of regression models from Ch. 8
One-Way ANOVA
• One-Way fixed-effect ANOVA
• Setup and derivation
– Like two-sample t-test for g number of groups
– Observations (ni observations in group i, i = 1, 2, …, g)
$$X_{i1}, X_{i2}, \ldots, X_{in_i}$$
– Using overparameterized model for X
$$X_{ij} = \mu + \alpha_i + E_{ij}, \qquad j = 1, 2, \ldots, n_i, \quad i = 1, 2, \ldots, g$$
– Eij assumed NID(0, σ²), Σ ni αi = 0, αi fixed in group i
One-Way ANOVA
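– Null Hypothesis H0 is: α1 = α2 = … = αg = 0
– Total sum of squares is
$$\sum_{i=1}^{g}\sum_{j=1}^{n_i}(X_{ij}-\bar{X})^2$$
– This is subdivided into B and W
$$B = \sum_{i=1}^{g} n_i(\bar{X}_i-\bar{X})^2, \qquad W = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i)^2$$
– with
$$\bar{X}_i = \frac{\sum_{j=1}^{n_i} X_{ij}}{n_i}, \qquad \bar{X} = \frac{\sum_{i=1}^{g}\sum_{j=1}^{n_i} X_{ij}}{N}, \qquad N = \sum_{i=1}^{g} n_i$$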
One-Way ANOVA
– Total degrees of freedom: N – 1
• Subdivided into dfB = g – 1 and dfW = N – g
– This gives us our test statistic F
$$F = \frac{B}{W} \cdot \frac{N-g}{g-1}$$
– We can now look in the F-table for these degrees of freedom to pick the significance point for F
– And calculate B and W from the observed data
– And accept or reject H0
Example
• Revisiting the Basketball example
– Looking at it as a One-Way ANOVA analysis
• Observation in group 1 (KU): X1 = {83, 81, 71}
• Observation in group 2: X2 = {65, 72, 70}
– Total Sum of Squares:
$$(73.66-83)^2 + (73.66-81)^2 + (73.66-71)^2 + (73.66-65)^2 + (73.66-72)^2 + (73.66-70)^2 = 239.3336$$
– B (between groups sum of squares)
$$B = \sum_{i=1}^{g} n_i(\bar{X}_i-\bar{X})^2 = 3(78.33-73.66)^2 + 3(69-73.66)^2 = 130.57$$
Example
– W (within groups sum of squares)
$$W = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i)^2$$
$$= \left((83-78.33)^2 + (81-78.33)^2 + (71-78.33)^2\right) + \left((65-69)^2 + (72-69)^2 + (70-69)^2\right) = 108.667$$
– Degrees of freedom
• Total: N – 1 = 5
• dfB = g – 1 = 2 – 1 = 1
• dfW = N – g = 6 – 2 = 4
Example
– Table lookup for df 1 and 4 and α = 0.05:
– Critical value: F = 7.71
– Calculate F from our data:
$$F = \frac{B}{W} \cdot \frac{N-g}{g-1} = \frac{130.57}{108.667} \cdot \frac{6-2}{2-1} = 4.806$$
– So… 4.806 < 7.71
– With ANOVA we actually accept H0!
• Seems to be the large variance in group 1
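The sums of squares in this example can be recomputed with a few lines. This is a minimal sketch with an illustrative function name; the critical value 7.71 is the table value quoted above. For two groups F equals t², and 7.71 is approximately 2.776², the two-sided 5% point of t with 4 degrees of freedom, so the ANOVA here corresponds to the two-sided t-test, whereas the earlier comparison against 2.132 was one-sided.

```python
def one_way_anova(groups):
    """Return (B, W, F) for a one-way fixed-effects ANOVA on a list of groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group size times squared deviation of group mean
    between = sum(len(g) * (mean - grand_mean) ** 2 for g, mean in zip(groups, means))
    # Within-groups sum of squares: deviations from each group's own mean
    within = sum((x - mean) ** 2 for g, mean in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (between / df_between) / (within / df_within)
    return between, within, f_stat

groups = [[83, 81, 71], [65, 72, 70]]
B, W, F = one_way_anova(groups)
F_CRITICAL = 7.71  # alpha = 0.05, df = (1, 4), from the slide

print(f"B = {B:.3f}, W = {W:.3f}, total = {B + W:.3f}")
print(f"F = {F:.3f}")
print("reject H0" if F > F_CRITICAL else "do not reject H0")
```

The unrounded values differ slightly from the slide's 130.57 and 4.806, which were computed from means rounded to two decimals.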
Same Example – with Excel
• Excel offers most of these tests built-in
Two-Way ANOVA
• Two-Way Fixed Effects ANOVA
• Overview only (in the scope of this book)
• More complicated setup; example:
– Expression levels of one gene in lung cancer
patients
– a different risk classes
• E.g.: ultrahigh, very high, intermediate, low
– b different age groups
– n individuals for each risk/age combination
Two-Way ANOVA
– Expression levels (our observations): Xijk
• i is the risk class (i = 1, 2, …, a)
• j indicates the age group
• k corresponds to the individual in each group ( k = 1,
…, n)
– Each group is a possible risk/age combination
• The number of individuals in each group is the
same, n
• This is a “balanced” design
• Theory for unbalanced designs is more complicated
and not covered in this book
Two-Way ANOVA
– The Xijk can be arranged in a two-way table:
                    Risk category (i)
Age group (j)       1     2     3     4
1                   n     n     n     n
2                   n     n     n     n
3                   n     n     n     n
4                   n     n     n     n
5                   n     n     n     n
• Each entry is the number of individuals in this risk/age group (aka “cell”)
Two-Way ANOVA
– The model adopted for each Xijk is
$$X_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + E_{ijk}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, b, \quad k = 1, 2, \ldots, n$$
• Where Eijk are NID(0, σ²)
• The mean of Xijk is μ + αi + βj + δij
• αi is a fixed parameter, additive for risk class i
• βj is a fixed parameter, additive for age group j
• δij is a fixed risk/age interaction parameter
– Should be added if a possible group/group interaction exists
Two-Way ANOVA
– These constraints are imposed
• Σi αi = Σj βj = 0
• Σi δij = 0 for all j
• Σj δij = 0 for all i
– The total sum of squares is then subdivided into four groups:
• Risk class sum of squares
• Age group sum of squares
• Interaction sum of squares
• Within cells (“residual” or “error”) sum of squares
Two-Way ANOVA
– Associated with each sum of squares
• Corresponding degrees of freedom
• Hence also a corresponding mean square
– Sum of squares divided by degrees of freedom
– The mean squares are then compared using
F ratios to test for significance of various
effects
• First – test for a significant risk/age interaction
• F-ratio used is ratio of interaction mean square
and within-cells mean square
Two-Way ANOVA
• If such an interaction exists, it may not be reasonable to test for significant risk or age differences
• Example, μ in two risk classes, two age groups:
– No evidence of interaction
            Risk 1   Risk 2
Age 1       4        12
Age 2       7        15
– Example of interaction
            Risk 1   Risk 2
Age 1       4        15
Age 2       11       6
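One way to see the difference between the two tables is the interaction contrast of a 2×2 table of cell means, which is zero exactly when the effects of the two factors are additive. The sketch below is illustrative only; the function name is an assumption and the numbers are the cell means from the two tables.

```python
def interaction_contrast(table):
    """mu11 - mu12 - mu21 + mu22 for a 2x2 table of cell means."""
    (m11, m12), (m21, m22) = table
    return m11 - m12 - m21 + m22

no_interaction = [[4, 12], [7, 15]]    # age adds 3 in both risk classes
with_interaction = [[4, 15], [11, 6]]  # age effect changes sign between risk classes

print(interaction_contrast(no_interaction))
print(interaction_contrast(with_interaction))
```

In the second table the age effect is +7 in one risk class and -9 in the other, so an overall "age effect" has no clear meaning, which is why main-effect tests may not be reasonable there.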
Multi-Way ANOVA
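• One-way and two-way fixed effects ANOVAs can be extended to multi-way ANOVAs
• Gets complicated
• Example: three-way ANOVA model:
$$X_{ijkm} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + E_{ijkm}$$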
Further generalizations of
ANOVA
• The 2^m factorial design
– A particular form of the one-way ANOVA
• Interactions between main effects
– m “factors” taken at two “levels”
• E.g. (1) Gender, (2) Tissue (lung, kidney), and (3) status (affected, not affected)
– 2^m possible combinations of levels/groups
– Can test for main effects and interactions
– Need replicated experiments
• n replications for each of the 2^m experiments
Further generalizations of
ANOVA
– Example, m = 3, denoted by A, B, C
• 8 groups, {abc, ab, ac, bc, a, b, c, 1}
• Write totals of n observations Tabc, Tab, …, T1
• The total between sum of squares can be subdivided into seven individual sums of squares
– Three main effects (A, B, C)
– Three pairwise interactions (AB, AC, BC)
– One triple-wise interaction (ABC)
– Example: Sum of squares for A, and for BC, respectively
$$\frac{(T_{abc} + T_{ab} + T_{ac} + T_a - T_{bc} - T_b - T_c - T_1)^2}{8n}$$
$$\frac{(T_{abc} - T_{ab} - T_{ac} + T_a + T_{bc} - T_b - T_c + T_1)^2}{8n}$$
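The sign pattern in these contrasts follows a simple rule: a group total gets +1 or -1 for each factor named in the effect, depending on whether that factor is at its high level in the group, and the signs are multiplied. The sketch below generates the signs that way. The group totals used at the end are hypothetical numbers chosen only to exercise the function, not data from the slides.

```python
from itertools import product

FACTORS = "abc"

def group_label(levels):
    """Label such as 'ab' for the group with factors a and b at the high level."""
    label = "".join(f for f, high in zip(FACTORS, levels) if high)
    return label or "1"

def contrast_signs(effect):
    """Signs of each group total in the contrast for an effect such as 'A' or 'BC'."""
    signs = {}
    for levels in product([1, 0], repeat=len(FACTORS)):
        sign = 1
        for factor, high in zip(FACTORS, levels):
            if factor.upper() in effect:
                sign *= 1 if high else -1
        signs[group_label(levels)] = sign
    return signs

def sum_of_squares(effect, totals, n):
    """(contrast of group totals)^2 / (2^m * n)."""
    signs = contrast_signs(effect)
    contrast = sum(signs[group] * total for group, total in totals.items())
    return contrast ** 2 / (len(totals) * n)

# Hypothetical totals of n = 2 observations per group, for illustration only
totals = {"abc": 30, "ab": 28, "ac": 27, "a": 25, "bc": 20, "b": 18, "c": 19, "1": 17}
print(contrast_signs("A"))
print(contrast_signs("BC"))
print(sum_of_squares("A", totals, n=2), sum_of_squares("BC", totals, n=2))
```

The printed sign dictionaries for A and BC match the two formulas above term by term.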
Further generalizations of
ANOVA
– If m ≥ 5 the number of groups becomes large
– Then the total number of observations, n·2^m, is large
– It is possible to reduce the number of
observations by a process …
• Confounding
– Interaction ABC probably very small and not
interesting
– So, prefer a model without ABC, reduce data
– There are ANOVA designs for that
Further generalizations of
ANOVA
• Fractional Replication
– Related to confounding
– Sometimes two groups cannot be
distinguished from each other, then they
are aliases
• E.g. A and BC
– This reduces the need for experiments and data
– Ch. 13 talks more about this in the context of microarrays
Random/Mixed Effect
Models
• So far: fixed effect models
– E.g. Risk class, age group fixed in previous
example
• Multiple experiments would use same categories
• But: what if we took experimental data on several
random days?
• The days themselves have no meaning, but a “between days” sum of squares must be extracted
– What if the days turn out to be important?
– If we fail to test for it, the significance of our procedure is
diminished.
– Days are a random category, unlike risk and age!
Random/Mixed Effect
Models
• Mixed Effect Models
– If some categories are fixed and some
are random
– Symbols used:
• Greek letters for fixed effects
• Uppercase Roman letters for random effects
• Example: two-way mixed effect model with
– With a risk classes, d days, and n values collected each day, the appropriate model is written:
$$X_{ikl} = \mu + \alpha_i + D_l + G_{il} + E_{ikl}$$
Random/Mixed Effect
Models
• Random effect models have no fixed categories
• The details on the ANOVA analysis depend
on which effects are random and which are
fixed
• In a microarray context (more in Ch. 13)
– There tend to be several fixed and several
random effects, which complicates the analysis
– Many interactions simply assumed zero
Multivariate Methods
ANOVA: the Repeated Measures Case
Bootstrap Methods: the Two-sample t-test
All skipped …
Sequential Analysis
Sequential Analysis
• Sequential Probability Ratio
– Sample size not known in advance
– Depends on outcomes of successive
observations
– Some of this theory is in BLAST
• Basic Local Alignment Search Tool
– The book focuses on discrete random variables
Sequential Analysis
– Consider:
• Random variable Y with distribution P(y;ξ)
• Tests usually relate to the value of
parameter ξ
• H0: ξ is ξ0
• H1: ξ is ξ1
• We can choose a value for the Type I error α
• And a value for the Type II error β
• Sampling then continues while
$$A < \frac{P(y_1;\xi_1)\, P(y_2;\xi_1) \cdots P(y_n;\xi_1)}{P(y_1;\xi_0)\, P(y_2;\xi_0) \cdots P(y_n;\xi_0)} < B$$
Sequential Analysis
– A and B are chosen to correspond to an α and β
– Sampling continues until the ratio is less than A (accept H0) or greater than B (reject H0)
– Because these are discrete variables, boundary overshoot usually occurs
• We don’t expect to exactly get values α and β
– Desired values for α and β approximately achieved by using
$$A = \frac{\beta}{1-\alpha}, \qquad B = \frac{1-\beta}{\alpha}$$
Sequential Analysis
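– It is also convenient to take logarithms, which gives us:
$$\log\frac{\beta}{1-\alpha} < \sum_i \log\frac{P(y_i;\xi_1)}{P(y_i;\xi_0)} < \log\frac{1-\beta}{\alpha}$$
– Using
$$S_{1,0}(y) = \log\frac{P(y;\xi_1)}{P(y;\xi_0)}$$
– We can write
$$\log\frac{\beta}{1-\alpha} < \sum_i S_{1,0}(y_i) < \log\frac{1-\beta}{\alpha}$$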
Sequential Analysis
• Example: sequence matching
– H0: p0 = 0.25 (probability of a match is 0.25)
– H1: p1 = 0.35 (probability of a match is 0.35)
– Type I error α and Type II error β both chosen as 0.01
– Yi: 1 if there is a match at position i, otherwise 0
– Sampling continues while
$$\log\frac{1}{99} < \sum_i S_{1,0}(Y_i) < \log 99$$
– with
$$S_{1,0}(Y_i) = \log\frac{(0.35)^{Y_i}(0.65)^{1-Y_i}}{(0.25)^{Y_i}(0.75)^{1-Y_i}}$$
Sequential Analysis
– S can be seen as the support offered by Yi for H1
– The inequality can be re-written as
$$-9.581 < \sum_i (Y_i - 0.2984) < 9.581$$
– This is actually a random walk with step sizes 0.7016 for a match and -0.2984 for a mismatch
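The sequential test for this example can be sketched directly from the log-ratio form. The function name `sprt` and the returned labels are illustrative choices; the probabilities and error rates are those of the example. The last lines recover the constants of the rescaled random walk.

```python
import math

P0, P1 = 0.25, 0.35          # match probabilities under H0 and H1
ALPHA = BETA = 0.01

STEP_MATCH = math.log(P1 / P0)                  # S(1)
STEP_MISMATCH = math.log((1 - P1) / (1 - P0))   # S(0), negative
LOWER = math.log(BETA / (1 - ALPHA))            # log(1/99)
UPPER = math.log((1 - BETA) / ALPHA)            # log(99)

def sprt(matches):
    """Run the sequential test on a list of 0/1 match indicators.

    Returns the decision and the number of observations used.
    """
    total = 0.0
    for count, y in enumerate(matches, start=1):
        total += STEP_MATCH if y else STEP_MISMATCH
        if total <= LOWER:
            return "accept H0", count
        if total >= UPPER:
            return "reject H0", count
    return "continue sampling", len(matches)

# Dividing the inequality by (S(1) - S(0)) gives the random-walk form on the slide
scale = STEP_MATCH - STEP_MISMATCH
print(f"boundary = {UPPER / scale:.3f}, offset = {-STEP_MISMATCH / scale:.4f}")
print(sprt([1] * 100))
print(sprt([0] * 100))
```

A run of matches crosses the upper boundary much faster than a run of mismatches crosses the lower one, because a match moves the walk by 0.7016 and a mismatch by only 0.2984.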
Sequential Analysis
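• Power Function for a Sequential Test
– Suppose the true value of the parameter of interest is ξ
– We wish to know the probability that H1 is accepted, given ξ
– This probability is the power Ρ(ξ) of the test
$$Ρ(\xi) \approx \frac{1 - \left(\frac{\beta}{1-\alpha}\right)^{\theta^*}}{\left(\frac{1-\beta}{\alpha}\right)^{\theta^*} - \left(\frac{\beta}{1-\alpha}\right)^{\theta^*}}$$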
Sequential Analysis
– Where θ* is the unique non-zero solution to θ in
$$\sum_{y \in R} P(y;\xi)\left(\frac{P(y;\xi_1)}{P(y;\xi_0)}\right)^{\theta} = 1$$
– R is the range of values of Y
– Equivalently, θ* is the unique non-zero solution to θ in
$$\sum_{y \in R} P(y;\xi)\, e^{\theta S_{1,0}(y)} = 1$$
– Where S is defined as before
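For the sequence-matching example, θ* can be found numerically. The sketch below is one possible approach, assuming Bernoulli observations with true match probability p: the left-hand side minus 1 is convex in θ and vanishes at θ = 0, so the non-zero root lies on the side where the function first dips below zero, and bisection finds it. The function names are illustrative.

```python
import math

def theta_star(p, p0=0.25, p1=0.35):
    """Non-zero root of p*r1**theta + (1-p)*r0**theta = 1 for Bernoulli data."""
    r_match = p1 / p0
    r_mismatch = (1 - p1) / (1 - p0)

    def g(theta):
        return p * r_match ** theta + (1 - p) * r_mismatch ** theta - 1

    drift = p * math.log(r_match) + (1 - p) * math.log(r_mismatch)  # slope of g at 0
    if abs(drift) < 1e-12:
        raise ValueError("mean step is zero; the only root is theta = 0")
    sign = 1.0 if drift < 0 else -1.0   # root is positive when the drift is negative
    low, high = sign * 1e-6, sign * 1.0  # g(low) < 0 just beyond zero
    while g(high) <= 0:                  # expand until the root is bracketed
        high *= 2
    for _ in range(200):                 # bisection
        mid = (low + high) / 2
        if g(mid) < 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def power(p, alpha=0.01, beta=0.01):
    """Approximate probability of accepting H1 when the true match probability is p."""
    theta = theta_star(p)
    lower = (beta / (1 - alpha)) ** theta
    upper = ((1 - beta) / alpha) ** theta
    return (1 - lower) / (upper - lower)

for p in (0.25, 0.30, 0.35):
    print(f"p = {p:.2f}: theta* = {theta_star(p):+.4f}, power = {power(p):.4f}")
```

As a consistency check, θ* is 1 when p = p0 and -1 when p = p1, which makes the power equal to α and 1 - β respectively.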
Sequential Analysis
– This is very similar to Ch. 7 – Random
Walks
– The parameter θ* is the same as in Ch. 7
– And it will be the same in Ch 10 – BLAST
– < skipping the random walk part >
Sequential Analysis
• Mean Sample Size
– The (random) number of observations
until one or the other hypothesis is
accepted
– Find approximation by ignoring
boundary overshoot
– Essentially identical method used to find
the mean number of steps until the
random walk stops
Sequential Analysis
– Two expressions are calculated for Σi S1,0(Yi)
• One involves the mean sample size
• By equating both expressions, solve for mean sample size
$$\sum_i S_{1,0}(y_i) \approx (1 - Ρ(\xi)) \log\frac{\beta}{1-\alpha} + Ρ(\xi) \log\frac{1-\beta}{\alpha}$$
$$E(S_{1,0}(Y_i)) = E\left[\log\frac{P(Y_i;\xi_1)}{P(Y_i;\xi_0)}\right] = \sum_{y \in R} P(y;\xi) \log\frac{P(y;\xi_1)}{P(y;\xi_0)}$$
Sequential Analysis
– So, the mean sample size is:
$$\frac{(1 - Ρ(\xi)) \log\frac{\beta}{1-\alpha} + Ρ(\xi) \log\frac{1-\beta}{\alpha}}{\sum_{y \in R} P(y;\xi) \log\frac{P(y;\xi_1)}{P(y;\xi_0)}}$$
– Both numerator and denominator depend on Ρ(ξ), and so also on θ*
– A generalization applies if the true distribution Q(y) of Y differs from those under both H0 and H1 (relevant to BLAST)
$$\frac{(1 - Ρ(\xi)) \log\frac{\beta}{1-\alpha} + Ρ(\xi) \log\frac{1-\beta}{\alpha}}{\sum_{y \in R} Q(y) \log\frac{P(y;\xi_1)}{P(y;\xi_0)}}$$
Sequential Analysis
• Example
– Same sequence matching example as before
• H0: p0 = 0.25 (probability of a match is 0.25)
• H1: p1 = 0.35 (probability of a match is 0.35)
• Type I error α and Type II error β both chosen as 0.01
– Mean sample size equation is:
$$\frac{9.190\, Ρ(p) - 4.595}{p \log\frac{7}{5} + (1-p) \log\frac{13}{15}}$$
– Mean sample size when H0 is true: 194
– Mean sample size when H1 is true: 182
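The two mean sample sizes can be reproduced from the general formula. In this minimal sketch the power is passed in as an argument rather than computed: under H0 it equals α = 0.01 and under H1 it equals 1 - β = 0.99, ignoring boundary overshoot as the slides do. The function name is illustrative.

```python
import math

def mean_sample_size(p, power, p0=0.25, p1=0.35, alpha=0.01, beta=0.01):
    """Approximate mean number of observations when the true match probability is p."""
    numerator = ((1 - power) * math.log(beta / (1 - alpha))
                 + power * math.log((1 - beta) / alpha))
    # Expected step of the log-likelihood-ratio walk under the true p
    denominator = p * math.log(p1 / p0) + (1 - p) * math.log((1 - p1) / (1 - p0))
    return numerator / denominator

n_h0 = mean_sample_size(0.25, power=0.01)
n_h1 = mean_sample_size(0.35, power=0.99)
print(round(n_h0), round(n_h1))
```

Both numerator and denominator change sign between the two cases, so the ratio stays positive.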
Sequential Analysis
• Boundary Overshoot
– So far we assumed no boundary overshoot
– In practice, there will almost always be, though
• Exact Type I and Type II errors different from α and β
– Random walk theory can be used to assess how
significant the effects of boundary overshoot are
– It can be shown that the sum of Type I and Type II
errors is always less than α + β (also individually)
– BLAST deals with this in a novel way -> see Ch.
10
Testing Theory
Adapted from Alexander Senf
Review
• 5 steps of classical hypothesis testing
(Ch. 3)
1. Declare null hypothesis H0 and alternate
hypothesis H1
2. Fix a threshold α for Type I error (1% or
5%)
•
Type I error (α): reject H0 when it is true
•
Type II error (β): accept H0 when it is false
3. Determine a test statistic
•
7/31/2008
a quantity calculated from the data
2
Review
4. Determine what observed values of the
test statistic should lead to rejection of
H0
•
Significance point K (determined by α)
5. Test to see if observed data is more
extreme than significance point K
7/31/2008
•
If it is, reject H0
•
Otherwise, accept H0
3
Overview of Ch. 9
– Simple Fixed-Sample-Size Tests
– Composite Fixed-Sample-Size Tests
– The -2 log λ Approximation
– The Analysis of Variance (ANOVA)
– Multivariate Methods
– ANOVA: the Repeated Measures Case
– Bootstrap Methods: the Two-sample ttest
– Sequential Analysis
7/31/2008
4
Simple Fixed-Sample-Size
Tests
7/31/2008
5
The Issue
• In the simplest case, everything is
specified
– Probability distribution of H0 and H1
• Including all parameters
– α (and K)
– But: β is left unspecified
• It is desirable to have a procedure that
minimizes β given a fixed α
– This would maximize the power of the test
• 1-β, the probability of rejecting H0 when H1 is true
7/31/2008
6
Most Powerful Procedure
• Neyman-Pearson Lemma
– States that the likelihood-ratio (LR) test is the
most powerful test for a given α
– The LR is defined as:
– where
f1 ( X 1 ) f1 ( X 2 ) f1 ( X n )
LR
f 0 ( X 1 ) f 0 ( X 2 ) f 0 ( X n )
• f0, f1 are completely specified density functions for
H0,H1
• X1, X2, … Xn are iid random variables
7/31/2008
7
Neyman-Pearson Lemma
– H0 is rejected when LR ≥ K
– With a constant K chosen such that:
P(LR ≥ K when H0 is true) = α
– Let’s look at an example using the
Neyman-Pearson Lemma!
– Then we will prove it.
7/31/2008
8
Example
• Basketball players seem to
be taller than average
– Use this observation to
formulate our hypothesis H1:
• “Tallness is a factor in the
recruitment of KU basketball
players”
– The null hypothesis, H0, could
be:
• “No, the players on KU’s team are
a just average height compared to
the population in the U.S.”
• “Average height of the team and
the population in general is the
same”
7/31/2008
9
Example
• Setup:
– Average height of males in the US: 5’9½”
– Average height of KU players in 2008: 6’4½”
• Assumption: both populations are normally distributed, centered on their respective averages (μ0 = 69.5 in, μ1 = 76.5 in), with σ = 2
$$f_0(x) = \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x-69.5)^2}{8}}, \qquad f_1(x) = \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x-76.5)^2}{8}}$$
• Sample size: 3
– Choose α: 5%
Example
• The two populations:
[Figure: densities f0 and f1; horizontal axis: height (inches), vertical axis: p]
Example
– Our test statistic is the Likelihood Ratio, LR
$$\Lambda(x) = \frac{f_1(x_1)\, f_1(x_2)\, f_1(x_3)}{f_0(x_1)\, f_0(x_2)\, f_0(x_3)} = \prod_{i=1}^{3} \frac{\frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x_i-76.5)^2}{8}}}{\frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x_i-69.5)^2}{8}}} = e^{\frac{1}{8}\sum_{i=1}^{3}\left[(x_i-69.5)^2-(x_i-76.5)^2\right]}$$
– Now we need to determine a significance point K at which we can reject H0, given α = 5%
• P(Λ(x) ≥ K | H0 is true) = 0.05, determine K
Example
– So we just need to solve for K’ and calculate K:
$$\int_{K_3'}^{\infty}\int_{K_2'}^{\infty}\int_{K_1'}^{\infty} f_0(x_1)\, f_0(x_2)\, f_0(x_3)\, dx_1\, dx_2\, dx_3 = 0.05$$
• How to solve this? Well, we only need one set of values to calculate K, so let’s pick two and solve for the third:
$$\int_{K_3'}^{\infty}\int_{71}^{\infty}\int_{68}^{\infty} f_0(x_1)\, f_0(x_2)\, f_0(x_3)\, dx_1\, dx_2\, dx_3 = 0.05$$
• We get one result: K3’ = 71.0803
Example
– Then we can just plug it in to Λ and calculate K:
$$K = e^{\frac{1}{8}\sum_{i=1}^{3}\left[(K_i'-69.5)^2-(K_i'-76.5)^2\right]}$$
$$= e^{\frac{1}{8}\left[(68-69.5)^2-(68-76.5)^2+(71-69.5)^2-(71-76.5)^2+(71.0803-69.5)^2-(71.0803-76.5)^2\right]} = 1.663 \times 10^{-7}$$
Example
– With the significance point K = 1.663 × 10^-7 we can now test our hypothesis based on observations:
• E.g.: Sasha = 83 in, Darrell = 81 in, Sherron = 71 in
$$\Lambda(X = \{83, 81, 71\}) = e^{\frac{1}{8}\sum_{i=1}^{3}\left[(X_i-69.5)^2-(X_i-76.5)^2\right]}$$
$$\Lambda(83, 81, 71) = 1.446 \times 10^{12}$$
• 1.446 × 10^12 > 1.663 × 10^-7
• Therefore we reject H0 in favor of H1: tallness appears to be a factor in the recruitment of KU basketball players.
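The likelihood-ratio calculation in this example can be sketched in Python. This is only an illustration under the example's assumptions (μ0 = 69.5, μ1 = 76.5, σ = 2). The helper `exact_threshold` is a hypothetical addition, not part of the slides: it uses the fact that Λ is an increasing function of the sample sum, whose H0 distribution is normal, so the threshold it returns need not coincide with the K obtained above by fixing two of the three integration limits.

```python
from math import exp, sqrt
from statistics import NormalDist

MU0, MU1, SIGMA = 69.5, 76.5, 2.0  # means and sigma taken from the example


def likelihood_ratio(heights):
    """Lambda(x) = prod f1(x_i) / prod f0(x_i) for the two normal densities."""
    exponent = sum((x - MU0) ** 2 - (x - MU1) ** 2 for x in heights)
    return exp(exponent / (2 * SIGMA ** 2))


def exact_threshold(n, alpha):
    """K with P(Lambda >= K | H0) = alpha (hypothetical helper).

    Lambda is increasing in sum(x), and under H0 that sum is
    N(n*MU0, n*SIGMA^2), so the critical sum is a normal quantile.
    """
    critical_sum = NormalDist(n * MU0, SIGMA * sqrt(n)).inv_cdf(1 - alpha)
    exponent = (MU1 - MU0) * critical_sum - n * (MU1 ** 2 - MU0 ** 2) / 2
    return exp(exponent / SIGMA ** 2)


observed = likelihood_ratio([83, 81, 71])      # about 1.446e12, as in the example
threshold = exact_threshold(n=3, alpha=0.05)
print(observed, threshold, observed >= threshold)
```

With either threshold the observed ratio is far inside the rejection region, so the decision to reject H0 is unchanged.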
Neyman-Pearson Proof
• Let A define the region in the joint range of X1, X2, …, Xn such that LR ≥ K. A is the critical region.
– If A is the only critical region of size α we are done
$$\int_A L(H_0) = \int\!\cdots\!\int_A f_0(u_1)\, f_0(u_2) \cdots f_0(u_n)\, du_1\, du_2 \cdots du_n = \alpha$$
– Let’s assume another critical region of size α, defined by B
$$\int_B L(H_0) = \int\!\cdots\!\int_B f_0(u_1)\, f_0(u_2) \cdots f_0(u_n)\, du_1\, du_2 \cdots du_n = \alpha$$
Proof
– H0 is rejected if the observed vector (x1,
x2, …, xn) is in A or in B.
– Let A and B overlap in region C
– Power of the test: rejecting H0 when H1 is
true
• The power of this test using A is:
$$\int_A L(H_1) = \int\!\cdots\!\int_A f_1(u_1)\, f_1(u_2) \cdots f_1(u_n)\, du_1\, du_2 \cdots du_n$$
Proof
– Define: Δ = ∫AL(H1) - ∫BL(H1)
• The power of the test using A minus using B
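$$\Delta = \int_A f_1(u_1) \cdots f_1(u_n)\, du_1 \cdots du_n - \int_B f_1(u_1) \cdots f_1(u_n)\, du_1 \cdots du_n$$
$$= \int_{A \setminus C} f_1(u_1) \cdots f_1(u_n)\, du_1 \cdots du_n - \int_{B \setminus C} f_1(u_1) \cdots f_1(u_n)\, du_1 \cdots du_n$$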
• Where A\C is the set of points in A but not in
C
• And B\C contains points in B but not in C
Proof
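– So, in A\C we have:
$$\frac{f_1(u_1) \cdots f_1(u_n)}{f_0(u_1) \cdots f_0(u_n)} \ge K \quad\Longrightarrow\quad f_1(u_1) \cdots f_1(u_n) \ge K f_0(u_1) \cdots f_0(u_n)$$
– While in B\C we have:
$$f_1(u_1) \cdots f_1(u_n) \le K f_0(u_1) \cdots f_0(u_n)$$
Why? Points in B\C lie outside A, and A contains every point where LR ≥ K.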
Proof
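– Thus
$$\Delta \ge \int_{A \setminus C} K f_0(u_1) \cdots f_0(u_n)\, du_1 \cdots du_n - \int_{B \setminus C} K f_0(u_1) \cdots f_0(u_n)\, du_1 \cdots du_n$$
$$= \int_{A} K f_0(u_1) \cdots f_0(u_n)\, du_1 \cdots du_n - \int_{B} K f_0(u_1) \cdots f_0(u_n)\, du_1 \cdots du_n = K\alpha - K\alpha = 0$$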
– Which implies that the power of the test
using A is greater than or equal to the
power using B.
Composite Fixed-Sample-Size Tests
Not Identically Distributed
• In most cases, random variables are not
identically distributed, at least not in H1
– This affects the likelihood function, L
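– For example, H1 in the two-sample t-test is:
$$L = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_{1i}-\mu_1)^2}{2\sigma^2}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_{2i}-\mu_2)^2}{2\sigma^2}}$$
– Where μ1 and μ2 are different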
Composite
– Further, the hypotheses being tested do
not specify all parameters
– They are composite
– This chapter only outlines aspects of
composite test theory relevant to the
material in this book.
Parameter Spaces
– The set of values the parameters of interest
can take
– Null hypothesis: parameters in some region ω
– Alternate hypothesis: parameters in Ω
– ω is usually a subspace of Ω
• Nested hypothesis case
– Null hypothesis nested within alternate hypothesis
– This book focuses on this case
• “if the alternate hypothesis can explain the data
significantly better we can reject the null
hypothesis”
λ Ratio
• Optimality theory for composite tests
suggests this as desirable test statistic:
$$\lambda = \frac{L_{\max}(\omega)}{L_{\max}(\Omega)}$$
• Lmax(ω): maximum likelihood when parameters
are confined to the region ω
• Lmax(Ω): maximum likelihood when parameters
are confined to the region Ω, defined by H1
• H0 is rejected when λ is sufficiently small (how small is determined by the chosen Type I error α)
Example: t-tests
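• The next slides calculate the λ-ratio for the two-sample t-test (with the likelihood)
$$L = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_{1i}-\mu_1)^2}{2\sigma^2}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_{2i}-\mu_2)^2}{2\sigma^2}}$$
– t-tests later generalize to ANOVA and T² tests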
Equal Variance Two-Sided t-test
• Setup
– Random variables X11, …, X1m in group 1 are Normally and Independently Distributed, NID(μ1, σ²)
– Random variables X21, …, X2n in group 2 are NID(μ2, σ²)
– X1i and X2j are independent for all i and j
– Null hypothesis H0: μ1 = μ2 (= μ, unspecified)
– Alternate hypothesis H1: both unspecified
Equal Variance Two-Sided t-test
• Setup (continued)
– σ² is unknown and unspecified in H0 and H1
• Is assumed to be the same in both distributions
– Region ω is: $\{\mu_1 = \mu_2,\ 0 < \sigma^2 < +\infty\}$
– Region Ω is: $\{-\infty < \mu_1, \mu_2 < +\infty,\ 0 < \sigma^2 < +\infty\}$
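Equal Variance Two-Sided t-test
• Derivation
– H0: writing μ for the common mean when μ1 = μ2, the maximum of the likelihood over ω is at
$$\hat\mu = \bar X = \frac{X_{11} + X_{12} + \cdots + X_{1m} + X_{21} + X_{22} + \cdots + X_{2n}}{m+n}$$
– And the (common) variance σ² is
$$\hat\sigma_0^2 = \frac{\sum_{i=1}^{m} (X_{1i}-\bar X)^2 + \sum_{i=1}^{n} (X_{2i}-\bar X)^2}{m+n}$$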
Equal Variance Two-Sided t-test
– Inserting both into the likelihood function, L
$$L_{\max}(\omega) = \frac{1}{(2\pi\hat\sigma_0^2)^{\frac{m+n}{2}}}\, e^{-\frac{m+n}{2}}$$
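Equal Variance Two-Sided t-test
– Do the same thing for region Ω
$$\hat\mu_1 = \bar X_1 = \frac{X_{11} + X_{12} + \cdots + X_{1m}}{m}, \qquad \hat\mu_2 = \bar X_2 = \frac{X_{21} + X_{22} + \cdots + X_{2n}}{n}$$
$$\hat\sigma_1^2 = \frac{\sum_{i=1}^{m} (X_{1i}-\bar X_1)^2 + \sum_{i=1}^{n} (X_{2i}-\bar X_2)^2}{m+n}$$
– Which produces this likelihood function, L
$$L_{\max}(\Omega) = \frac{1}{(2\pi\hat\sigma_1^2)^{\frac{m+n}{2}}}\, e^{-\frac{m+n}{2}}$$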
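Equal Variance Two-Sided t-test
– The test statistic λ is then
$$\lambda = \frac{L_{\max}(\omega)}{L_{\max}(\Omega)} = \frac{(2\pi\hat\sigma_0^2)^{-\frac{m+n}{2}}\, e^{-\frac{m+n}{2}}}{(2\pi\hat\sigma_1^2)^{-\frac{m+n}{2}}\, e^{-\frac{m+n}{2}}} = \left(\frac{\hat\sigma_1^2}{\hat\sigma_0^2}\right)^{\frac{m+n}{2}}$$
It’s the same function, just with different variances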
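Equal Variance Two-Sided t-test
– We can then use the algebraic identity
$$\sum_{i=1}^{m} (X_{1i}-\bar X)^2 + \sum_{i=1}^{n} (X_{2i}-\bar X)^2 = \sum_{i=1}^{m} (X_{1i}-\bar X_1)^2 + \sum_{i=1}^{n} (X_{2i}-\bar X_2)^2 + \frac{mn}{m+n}(\bar X_1-\bar X_2)^2$$
– To show that
$$\lambda = \left(\frac{1}{1+\frac{t^2}{m+n-2}}\right)^{\frac{m+n}{2}}$$
– Where t is (from Ch. 3)
$$T = \frac{(\bar X_1-\bar X_2)\sqrt{mn}}{S\sqrt{m+n}}$$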
Equal Variance Two-Sided t-test
– t is the observed value of T
– S is defined in Ch. 3 as
$$S^2 = \frac{\sum_{i=1}^{m} (X_{1i}-\bar X_1)^2 + \sum_{i=1}^{n} (X_{2i}-\bar X_2)^2}{m+n-2}$$
We can plot λ as a function of t (e.g. m+n=10):
[Figure: λ plotted against t]
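The relationship between λ and t can be checked numerically. The sketch below is an illustration only: the function name `lambda_and_t` is hypothetical, and the two small samples are the heights used in the basketball example. It computes λ directly from the two variance estimates and again from t, and the two values should agree up to rounding error.

```python
from math import sqrt


def lambda_and_t(x1, x2):
    """Return (lambda from variance ratio, lambda from t, t) for two samples."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    grand = (sum(x1) + sum(x2)) / (m + n)

    # Variance estimate under H0 (common mean) and under H1 (separate means).
    total = sum((x - grand) ** 2 for x in x1) + sum((x - grand) ** 2 for x in x2)
    within = sum((x - mean1) ** 2 for x in x1) + sum((x - mean2) ** 2 for x in x2)
    var0, var1 = total / (m + n), within / (m + n)
    lam_direct = (var1 / var0) ** ((m + n) / 2)

    # Same quantity expressed through the t statistic.
    s = sqrt(within / (m + n - 2))
    t = (mean1 - mean2) * sqrt(m * n) / (s * sqrt(m + n))
    lam_from_t = (1 / (1 + t * t / (m + n - 2))) ** ((m + n) / 2)
    return lam_direct, lam_from_t, t


lam_direct, lam_from_t, t = lambda_and_t([83, 81, 71], [65, 72, 70])
print(lam_direct, lam_from_t, t)
```

Because λ decreases as |t| grows, rejecting for small λ is the same as rejecting for large |t|.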
Equal Variance Two-Sided t-test
– So, by the monotonicity argument, we can use t² or |t| instead of λ as test statistic
– Small values of λ correspond to large values of |t|
– Sufficiently large |t| lead to rejection of H0
– The H0 distribution of t is known
• t-distribution with m+n-2 degrees of freedom
– Significance points are widely available
• Once α has been chosen, values of |t| sufficiently large to reject H0 can be determined
• Table of significance points: http://www.socr.ucla.edu/Applets.dir/T-table.html
Equal Variance One-Sided t-test
• Similar to Two-Sided t-test case
– Different region Ω for H1:
• Means μ1 and μ2 are not simply different, but one is larger than the other: μ1 ≥ μ2
$$\Omega = \{\mu_1 \ge \mu_2,\ 0 < \sigma^2 < +\infty\}$$
• If $\bar x_1 \ge \bar x_2$ then the maximum likelihood estimates are the same as for the two-sided case
Equal Variance One-Sided t-test
• If $\bar x_1 < \bar x_2$ then the unconstrained maximum of the likelihood is outside of Ω
• The unique unconstrained maximum is at $(\bar x_1, \bar x_2)$, implying that the maximum in Ω occurs at a boundary point of Ω
• At this point the estimates of μ1 and μ2 are equal (both $\bar x$)
• At this point the likelihood ratio is 1 and H0 is not rejected
• Result: H0 is rejected in favor of H1 (μ1 ≥ μ2) only for sufficiently large positive values of t
Example - Revised
• This scenario fits with our original
example:
– H1 is that the average height of KU
basketball players is bigger than for the
general population
– One-sided test
– We could assume that we don’t know the
averages for H0 and H1
– We actually don’t know σ (I just guessed 2 in
the original example)
Example - Revised
• Updated example:
– Observation in group 1 (KU): X1 = {83, 81, 71}
– Observation in group 2: X2 = {65, 72, 70}
– Pick significance point for t from a table: tα =
2.132
• t-distribution, m+n-2 = 4 degrees of freedom, α =
0.05
– Calculate t with our observations
$$t = \frac{(78.3 - 69)\sqrt{9}}{5.2122\sqrt{6}} = \frac{27.9}{12.7673} = 2.185$$
– t > tα, so we can reject H0!
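The one-sided decision above can be reproduced with a few lines of Python. This is a sketch, not the slides' own computation: `pooled_t` is a hypothetical helper name, and the critical value is simply the table value quoted above rather than something computed here. With unrounded means the statistic comes out slightly above the 2.185 shown, which used 78.3 for the first group mean.

```python
from math import sqrt


def pooled_t(x1, x2):
    """Equal-variance two-sample t statistic, T = (mean1 - mean2) sqrt(mn) / (S sqrt(m+n))."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    within = sum((x - mean1) ** 2 for x in x1) + sum((x - mean2) ** 2 for x in x2)
    s = sqrt(within / (m + n - 2))          # pooled standard deviation S
    return (mean1 - mean2) * sqrt(m * n) / (s * sqrt(m + n))


T_CRITICAL = 2.132   # table value from the example: 4 degrees of freedom, alpha = 0.05
t = pooled_t([83, 81, 71], [65, 72, 70])
reject_h0 = t > T_CRITICAL                  # one-sided: only large positive t rejects
print(round(t, 3), reject_h0)
```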
Comments
• Problems that might arise in other cases
– The λ-ratio might not reduce to a function of a
well-known test statistic, such as t
– There might not be a unique H0 distribution of λ
– Fortunately, the t statistic is a pivotal quantity
• Independent of the parameters not prescribed by H0
– e.g. μ, σ
– For many testing procedures this property does
not hold
Unequal Variance Two-Sided t-test
• Identical to Equal Variance Two-Sided t-test
– Except: variances in group 1 and group 2 are no longer assumed to be identical
• Group 1: NID(μ1, σ1²)
• Group 2: NID(μ2, σ2²)
• With σ1² and σ2² unknown and not assumed identical
• Region ω = {μ1 = μ2, 0 < σ1², σ2² < +∞}
• Ω makes no constraints on values μ1, μ2, σ1², and σ2²
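Unequal Variance Two-Sided t-test
– The likelihood function of (X11, X12, …, X1m, X21, X22, …, X2n) then becomes
$$\prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_{1i}-\mu_1)^2}{2\sigma_1^2}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}}$$
– Under H0 (μ1 = μ2 = μ), this becomes:
$$\prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_{1i}-\mu)^2}{2\sigma_1^2}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_{2i}-\mu)^2}{2\sigma_2^2}}$$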
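Unequal Variance Two-Sided t-test
– Maximum likelihood estimates $\hat\mu$, $\hat\sigma_1^2$ and $\hat\sigma_2^2$ satisfy the simultaneous equations:
$$\frac{\sum_{i=1}^{m}(x_{1i}-\hat\mu)}{\hat\sigma_1^2} + \frac{\sum_{i=1}^{n}(x_{2i}-\hat\mu)}{\hat\sigma_2^2} = 0$$
$$\hat\sigma_1^2 = \frac{\sum_{i=1}^{m}(x_{1i}-\hat\mu)^2}{m}, \qquad \hat\sigma_2^2 = \frac{\sum_{i=1}^{n}(x_{2i}-\hat\mu)^2}{n}$$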
Unequal Variance Two-Sided t-test
– These equations reduce to a cubic equation in $\hat\mu$
– Neither the λ ratio, nor any monotonic function of it, has a known probability distribution when H0 is true!
– This does not lead to any useful test statistic
• The t-statistic may be used as a reasonably close approximation
• However, its H0 distribution is still unknown, as it depends on the unknown ratio σ1²/σ2²
• In practice, a heuristic is often used (see Ch. 3.5)
The -2 log λ Approximation
The -2 log λ Approximation
• Used when the λ-ratio procedure does
not lead to a test statistic whose H0
distribution is known
– Example: Unequal Variance Two-Sided t-test
• Various approximations can be used
– But only if certain regularity assumptions
and restrictions hold true
The -2 log λ Approximation
• Best known approximation:
– If H0 is true, -2 log λ has an asymptotic
chi-square distribution,
• with degrees of freedom equal to the difference between the numbers of parameters left unspecified by H1 and by H0.
• λ is the likelihood ratio
• “asymptotic” = “as the sample size → ∞”
– Provides an asymptotically valid testing
procedure
The -2 log λ Approximation
– Restrictions:
• Parameters must be real numbers that can take
on values in some interval
• The maximum likelihood estimator is found at a
turning point of the function
– i.e. a “real” maximum, not at a boundary point
• H0 is nested in H1 (as in all previous slides)
– These restrictions are important in the
proof
• I skip the proof…
The -2 log λ Approximation
• Instead:
– Our original basketball example, revised
again:
• Let’s drop our last assumption, that the variance
in the population at large is the same as in the
group of KU basketball players.
• All we have left now are our observations and
the hypothesis that μ1 > μ2
– Where μ1 is the average height of Basketball players
• Observation in group 1 (KU): X1 = {83, 81, 71}
• Observation in group 2: X2 = {65, 72, 70}
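For these observations, the -2 log λ statistic of the unequal-variance model can be computed numerically. The following is a rough sketch under several assumptions that are not in the slides: the function name `neg2_log_lambda` is hypothetical, the H0 estimates are found by a simple fixed-point iteration on the likelihood equations (which finds a stationary point, and a cubic can have more than one real root), and the statistic is the two-sided one. With only three observations per group the asymptotic chi-square reference distribution (1 degree of freedom here, since Ω has four free parameters and ω three) is at best a crude guide.

```python
from math import log


def neg2_log_lambda(x1, x2, iterations=500):
    """-2 log(lambda) for H0: mu1 = mu2 with unequal, unknown variances."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n

    # Unrestricted (Omega) maximum likelihood variance estimates.
    v1 = sum((x - mean1) ** 2 for x in x1) / m
    v2 = sum((x - mean2) ** 2 for x in x2) / n

    # Restricted (omega) estimates: iterate the likelihood equations.
    mu = (sum(x1) + sum(x2)) / (m + n)
    for _ in range(iterations):
        s1 = sum((x - mu) ** 2 for x in x1) / m
        s2 = sum((x - mu) ** 2 for x in x2) / n
        mu = (m * mean1 / s1 + n * mean2 / s2) / (m / s1 + n / s2)
    s1 = sum((x - mu) ** 2 for x in x1) / m
    s2 = sum((x - mu) ** 2 for x in x2) / n

    # The exponential factors cancel, leaving only the variance ratios.
    return m * log(s1 / v1) + n * log(s2 / v2), mu


CHI2_1DF_5PCT = 3.841   # standard 5% significance point of chi-square with 1 df
statistic, mu_hat = neg2_log_lambda([83, 81, 71], [65, 72, 70])
print(statistic, mu_hat, statistic > CHI2_1DF_5PCT)
```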
Example – Revised Again
– Using the Unequal Variance One-Sided t-Test
– We get:
The Analysis of Variance
(ANOVA)
The Analysis of Variance
(ANOVA)
• Probably the most frequently used
hypothesis testing procedure in
statistics
• This section
– Derives the Sum of Squares
– Gives an outline of the ANOVA procedure
– Introduces one-way ANOVA as a
generalization of the two-sample t-test
– Two-way and multi-way ANOVA
– Further generalizations of ANOVA
Sum of Squares
• New variables (from Ch. 3)
– The two-sample t-test tests for equality
of the means of two groups.
– We could express the observations as:
$$X_{ij} = \mu_i + E_{ij}, \qquad i = 1, 2$$
– Where the Eij are assumed to be NID(0, σ²)
– H0 is μ1 = μ2
Sum of Squares
– This can also be written as:
$$X_{ij} = \mu + \alpha_i + E_{ij}, \qquad i = 1, 2$$
• μ could be seen as overall mean
• αi as deviation from μ in group i
– This model is overparameterized
• Uses more parameters than necessary
• Necessitates the requirement $m\alpha_1 + n\alpha_2 = 0$
• (always assumed imposed)
Sum of Squares
– We are deriving a test procedure similar
to the two-sample two-sided t-test
– Using |t| as test statistic
• Absolute value of the T statistic
– This is equivalent to using t2
• Because it’s a monotonic function of |t|
– The square of the t statistic (from Ch. 3)
$$T = \frac{(\bar X_1-\bar X_2)\sqrt{mn}}{S\sqrt{m+n}}$$
Sum of Squares
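– …can, after algebraic manipulations, be written as F
$$F = \frac{B\,(m+n-2)}{W}$$
– where
$$\bar X_1 = \frac{\sum_{j=1}^{m} X_{1j}}{m}, \qquad \bar X_2 = \frac{\sum_{j=1}^{n} X_{2j}}{n}, \qquad \bar X = \frac{m\bar X_1 + n\bar X_2}{m+n}$$
$$B = \frac{mn}{m+n}(\bar X_1-\bar X_2)^2 = m(\bar X_1-\bar X)^2 + n(\bar X_2-\bar X)^2$$
$$W = \sum_{j=1}^{m}(X_{1j}-\bar X_1)^2 + \sum_{j=1}^{n}(X_{2j}-\bar X_2)^2$$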
Sum of Squares
– B: between (among) group sum of squares
– W: within group sum of squares
– B + W: total sum of squares
• Can be shown to be:
$$\sum_{i=1}^{m}(X_{1i}-\bar X)^2 + \sum_{i=1}^{n}(X_{2i}-\bar X)^2$$
– Total number of degrees of freedom: m + n – 1
• Between groups: 1
• Within groups: m + n - 2
Sum of Squares
– This gives us the F statistic
$$F = \frac{B\,(m+n-2)}{W}$$
– Our goal is to test the significance of the
difference between the means of two groups
• B measures the difference
– The difference must be measured relative to
the variance within the groups
• W measures that
– The larger F is, the more significant the
difference
The ANOVA Procedure
• Subdivide observed total sum of
squares into several components
– In our case, B and W
• Pick appropriate significance point for
a chosen Type I error α from an F table
• Compare the observed components to
test our hypothesis
F-Statistic
• Significance points depend on
degrees of freedom in B and W
– In our case, 1 and (m + n – 2)
• F table of significance points: http://www.ento.vt.edu/~sharov/PopEcol/tables/f005.html
Comments
• The two-group case readily generalizes
to any number of groups.
• ANOVAs can be classified in various
ways, e.g.
– fixed effects models
– mixed effects models
– random effects models
– Difference is discussed later
– For now we consider fixed effect models
• Parameter αi is fixed, but unknown, in group i
$$X_{ij} = \mu + \alpha_i + E_{ij}$$
Comments
• Terminology
– Although ANOVA contains the word
‘variance’
– What we actually test for is equality in means between the groups
• The different mean assumptions affect the
variance, though
• ANOVAs are special cases of regression
models from Ch. 8
One-Way ANOVA
• One-Way fixed-effect ANOVA
• Setup and derivation
– Like two-sample t-test for g number of groups
– Observations (ni observations in group i, i = 1, 2, …, g)
$$X_{i1}, X_{i2}, \ldots, X_{in_i}$$
– Using overparameterized model for X
$$X_{ij} = \mu + \alpha_i + E_{ij}, \qquad j = 1, 2, \ldots, n_i, \quad i = 1, 2, \ldots, g$$
– Eij assumed NID(0, σ²), Σ ni αi = 0, αi fixed in group i
One-Way ANOVA
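– Null Hypothesis H0 is: α1 = α2 = … = αg = 0
– Total sum of squares is
$$\sum_{i=1}^{g}\sum_{j=1}^{n_i}(X_{ij}-\bar X)^2$$
– This is subdivided into B and W
$$B = \sum_{i=1}^{g} n_i(\bar X_i-\bar X)^2, \qquad W = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(X_{ij}-\bar X_i)^2$$
– with
$$\bar X_i = \frac{\sum_{j=1}^{n_i} X_{ij}}{n_i}, \qquad \bar X = \frac{\sum_{i=1}^{g}\sum_{j=1}^{n_i} X_{ij}}{N}, \qquad N = \sum_{i=1}^{g} n_i$$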
One-Way ANOVA
– Total degrees of freedom: N – 1
• Subdivided into dfB = g – 1 and dfW = N - g
– This gives us our test statistic F
$$F = \frac{B}{W} \cdot \frac{N-g}{g-1}$$
– We can now look in the F-table for these degrees of freedom to pick the significance point for F
– And calculate B and W from the observed data
– And accept or reject H0
Example
• Revisiting the Basketball example
– Looking at it as a One-Way ANOVA
analysis
• Observation in group 1 (KU): X1 = {83, 81, 71}
• Observation in group 2: X2 = {65, 72, 70}
– Total Sum of Squares:
$$(73.66-83)^2 + (73.66-81)^2 + (73.66-71)^2 + (73.66-65)^2 + (73.66-72)^2 + (73.66-70)^2 = 239.3336$$
– B (between groups sum of squares)
$$B = \sum_{i=1}^{g} n_i(\bar X_i-\bar X)^2 = 3(78.33-73.66)^2 + 3(69-73.66)^2 = 130.57$$
Example
– W (within groups sum of squares)
$$W = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(X_{ij}-\bar X_i)^2$$
$$= \left((83-78.33)^2 + (81-78.33)^2 + (71-78.33)^2\right) + \left((65-69)^2 + (72-69)^2 + (70-69)^2\right) = 108.667$$
– Degrees of freedom
• Total: N-1 = 5
• dfB = g – 1 = 2 - 1 = 1
• dfW = N – g = 6 – 2 = 4
Example
– Table lookup for df 1 and 4 and α = 0.05:
– Critical value: F = 7.71
– Calculate F from our data:
$$F = \frac{B}{W} \cdot \frac{N-g}{g-1} = \frac{130.57}{108.667} \cdot \frac{6-2}{2-1} = 4.806$$
– So… 4.806 < 7.71
– With ANOVA we actually accept H0!
• Seems to be the large variance in group 1
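The one-way ANOVA calculation for this example can be sketched in Python. The function name `one_way_anova` is a hypothetical choice, and the critical value is the table value quoted above. Because the code keeps unrounded means, B comes out near 130.67 and F near 4.81, marginally different from the rounded slide arithmetic but leading to the same decision.

```python
def one_way_anova(groups):
    """Return (B, W, F) for a one-way fixed-effects ANOVA on a list of samples."""
    total_n = sum(len(g) for g in groups)          # N
    num_groups = len(groups)                       # g
    grand_mean = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    between = sum(len(g) * (mean - grand_mean) ** 2 for g, mean in zip(groups, means))
    within = sum((x - mean) ** 2 for g, mean in zip(groups, means) for x in g)

    f_stat = (between / within) * (total_n - num_groups) / (num_groups - 1)
    return between, within, f_stat


F_CRITICAL = 7.71   # table value from the example: df 1 and 4, alpha = 0.05
B, W, F = one_way_anova([[83, 81, 71], [65, 72, 70]])
print(round(B, 3), round(W, 3), round(F, 3), F > F_CRITICAL)
```

For two groups F equals the square of the two-sample t statistic, so this is the two-sided version of the earlier t-test; the earlier rejection came from a one-sided test.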
Same Example – with Excel
• Screenshots:
Excel
• Offers most of these tests, built-in
Two-Way ANOVA
• Two-Way Fixed Effects ANOVA
• Overview only (in the scope of this book)
• More complicated setup; example:
– Expression levels of one gene in lung cancer
patients
– a different risk classes
• E.g.: ultrahigh, very high, intermediate, low
– b different age groups
– n individuals for each risk/age combination
Two-Way ANOVA
– Expression levels (our observations): Xijk
• i is the risk class (i = 1, 2, …, a)
• j indicates the age group (j = 1, 2, …, b)
• k corresponds to the individual in each group ( k = 1,
…, n)
– Each group is a possible risk/age combination
• The number of individuals in each group is the
same, n
• This is a “balanced” design
• Theory for unbalanced designs is more complicated
and not covered in this book
Two-Way ANOVA
– The Xijk can be arranged in a table:

                   Risk category (i)
Age group (j)      1     2     3     4
      1            n     n     n     n
      2            n     n     n     n
      3            n     n     n     n
      4            n     n     n     n
      5            n     n     n     n

Each entry is the number of individuals in this risk/age group (aka “cell”)
This is a two-way table
Two-Way ANOVA
– The model adopted for each Xijk is
$$X_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + E_{ijk}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, b, \quad k = 1, 2, \ldots, n$$
• Where Eijk are NID(0, σ²)
• The mean of Xijk is μ + αi + βj + δij
• αi is a fixed parameter, additive for risk class i
• βj is a fixed parameter, additive for age group j
• δij is a fixed risk/age interaction parameter
– Should be added if a possible group/group interaction exists
Two-Way ANOVA
– These constraints are imposed
• Σi αi = Σj βj = 0
• Σi δij = 0 for all j
• Σj δij = 0 for all i
– The total sum of squares is then subdivided into four groups:
• Risk class sum of squares
• Age group sum of squares
• Interaction sum of squares
• Within cells (“residual” or “error”) sum of squares
Two-Way ANOVA
– Associated with each sum of squares
• Corresponding degrees of freedom
• Hence also a corresponding mean square
– Sum of squares divided by degrees of freedom
– The mean squares are then compared using
F ratios to test for significance of various
effects
• First – test for a significant risk/age interaction
• F-ratio used is ratio of interaction mean square
and within-cells mean square
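The subdivision of the total sum of squares and the interaction F-ratio can be sketched for a balanced design as follows. Everything here is illustrative: the function name `two_way_anova` is hypothetical, and the two tiny data sets are made up (2 risk classes, 2 age groups, n = 2 per cell), one with purely additive cell means and one with a strong interaction.

```python
def two_way_anova(x):
    """Sums of squares for a balanced design; x[i][j] lists the n values in cell (i, j)."""
    a, b, n = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / n for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]                        # risk-class means
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]   # age-group means
    grand = sum(row) / a

    ss = {
        "risk": b * n * sum((r - grand) ** 2 for r in row),
        "age": a * n * sum((c - grand) ** 2 for c in col),
        "interaction": n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                               for i in range(a) for j in range(b)),
        "within": sum((v - cell[i][j]) ** 2
                      for i in range(a) for j in range(b) for v in x[i][j]),
    }
    df = {"risk": a - 1, "age": b - 1,
          "interaction": (a - 1) * (b - 1), "within": a * b * (n - 1)}
    mean_square = {k: ss[k] / df[k] for k in ss}
    f_interaction = mean_square["interaction"] / mean_square["within"]
    return ss, df, f_interaction


# Made-up observations, indexed as data[risk class][age group].
additive = [[[3, 5], [6, 8]], [[11, 13], [14, 16]]]
crossed = [[[3, 5], [10, 12]], [[14, 16], [5, 7]]]

for name, data in (("additive", additive), ("crossed", crossed)):
    ss, df, f_int = two_way_anova(data)
    print(name, ss, df, f_int)
```

The interaction F-ratio would then be compared with an F significance point for (a-1)(b-1) and ab(n-1) degrees of freedom.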
Two-Way ANOVA
• If such an interaction is found, it may not be reasonable to test for significant risk or age differences
• Example, μ in two risk classes, two age groups:
– No evidence of interaction

          Risk 1   Risk 2
Age 1        4       12
Age 2        7       15

– Example of interaction

          Risk 1   Risk 2
Age 1        4       15
Age 2       11        6
Multi-Way ANOVA
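• One-way and two-way fixed effects ANOVAs can be extended to multi-way ANOVAs
• Gets complicated
• Example: three-way ANOVA model:
$$X_{ijkm} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + E_{ijkm}$$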
Further generalizations of
ANOVA
• The 2^m factorial design
– A particular form of the one-way ANOVA
• Interactions between main effects
– m “factors” taken at two “levels”
• E.g. (1) Gender, (2) Tissue (lung, kidney), and (3) status (affected, not affected)
– 2^m possible combinations of levels/groups
– Can test for main effects and interactions
– Need replicated experiments
• n replications for each of the 2^m experiments
Further generalizations of
ANOVA
– Example, m = 3, denoted by A, B, C
• 8 groups, {abc, ab, ac, bc, a, b, c, 1}
• Write totals of n observations Tabc, Tab, …, T1
• The total between sum of squares can be subdivided into seven individual sums of squares
– Three main effects (A, B, C)
– Three pairwise interactions (AB, AC, BC)
– One triple-wise interaction (ABC)
– Example: Sum of squares for A, and for BC, respectively
$$\frac{(T_{abc} + T_{ab} + T_{ac} + T_{a} - T_{bc} - T_{b} - T_{c} - T_{1})^2}{8n}$$
$$\frac{(T_{abc} - T_{ab} - T_{ac} + T_{a} + T_{bc} - T_{b} - T_{c} + T_{1})^2}{8n}$$
Further generalizations of
ANOVA
– If m ≥ 5 the number of groups becomes large
– Then the total number of observations, n·2^m, is large
– It is possible to reduce the number of observations by a process …
• Confounding
– Interaction ABC probably very small and not interesting
– So, prefer a model without ABC, reduce data
– There are ANOVA designs for that
Further generalizations of
ANOVA
• Fractional Replication
– Related to confounding
– Sometimes two groups cannot be
distinguished from each other, then they
are aliases
• E.g. A and BC
– This reduces the need for experiments and data
– Ch. 13 talks more about this in the context of microarrays
Random/Mixed Effect
Models
• So far: fixed effect models
– E.g. Risk class, age group fixed in previous
example
• Multiple experiments would use same categories
• But: what if we took experimental data on several
random days?
• The days in themselves have no meaning, but a “between days” sum of squares must be extracted
– What if the days turn out to be important?
– If we fail to test for it, the significance of our procedure is
diminished.
– Days are a random category, unlike risk and age!
Random/Mixed Effect
Models
• Mixed Effect Models
– If some categories are fixed and some
are random
– Symbols used:
• Greek letters for fixed effects
• Uppercase Roman letters for random effects
• Example: two-way mixed effect model with
– Risk class a and days d and n values collected each day, the appropriate model is written:
$$X_{ikl} = \mu + \alpha_i + D_l + G_{il} + E_{ikl}$$
Random/Mixed Effect
Models
• Random effect models have no fixed categories
• The details on the ANOVA analysis depend
on which effects are random and which are
fixed
• In a microarray context (more in Ch. 13)
– There tend to be several fixed and several
random effects, which complicates the analysis
– Many interactions simply assumed zero
Multivariate Methods
ANOVA: the Repeated Measures Case
Bootstrap Methods: the Two-sample t-test
All skipped …
Sequential Analysis
Sequential Analysis
• Sequential Probability Ratio
– Sample size not known in advance
– Depends on outcomes of successive
observations
– Some of this theory is in BLAST
• Basic Local Alignment Search Tool
– The book focuses on discrete random variables
Sequential Analysis
– Consider:
• Random variable Y with distribution P(y;ξ)
• Tests usually relate to the value of
parameter ξ
• H0: ξ is ξ0
• H1: ξ is ξ1
• We can choose a value for the Type I error α
• And a value for the Type II error β
• Sampling then continues while
$$A < \frac{P(y_1;\xi_1)\, P(y_2;\xi_1) \cdots P(y_n;\xi_1)}{P(y_1;\xi_0)\, P(y_2;\xi_0) \cdots P(y_n;\xi_0)} < B$$
Sequential Analysis
– A and B are chosen to correspond to an α and β
– Sampling continues until the ratio is less than A (accept H0) or greater than B (reject H0)
– Because these are discrete variables, boundary overshoot usually occurs
• We don’t expect to exactly get values α and β
– Desired values for α and β approximately achieved by using
$$A = \frac{\beta}{1-\alpha}, \qquad B = \frac{1-\beta}{\alpha}$$
Sequential Analysis
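– It is also convenient to take logarithms, which gives us:
$$\log\frac{\beta}{1-\alpha} < \sum_i \log\frac{P(y_i;\xi_1)}{P(y_i;\xi_0)} < \log\frac{1-\beta}{\alpha}$$
– Using
$$S_{1,0}(y) = \log\frac{P(y;\xi_1)}{P(y;\xi_0)}$$
– We can write
$$\log\frac{\beta}{1-\alpha} < \sum_i S_{1,0}(y_i) < \log\frac{1-\beta}{\alpha}$$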
Sequential Analysis
• Example: sequence matching
– H0: p0 = 0.25 (probability of a match is 0.25)
– H1: p1 = 0.35 (probability of a match is 0.35)
– Type I error α and Type II error β chosen 0.01
– Yi: 1 if there is a match at position i,
otherwise 0
– Sampling continues while
$$\log\frac{1}{99} < \sum_i S_{1,0}(Y_i) < \log 99$$
– with
$$S_{1,0}(Y_i) = \log\frac{(0.35)^{Y_i}\,(0.65)^{1-Y_i}}{(0.25)^{Y_i}\,(0.75)^{1-Y_i}}$$
Sequential Analysis
– S can be seen as the support offered by
Yi for H1
– The inequality can be re-written as
$$-9.581 < \sum_i (Y_i - 0.2984) < 9.581$$
– This is actually a random walk with step
sizes 0.7016 for a match and -0.2984 for
a mismatch
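The sequential test for this sequence-matching example can be sketched as a running sum that stops when it leaves the two boundaries. The function name `sprt_match` and its return format are hypothetical; the probabilities and error rates are those of the example, and the boundaries use the approximation A = β/(1-α), B = (1-β)/α.

```python
from math import log


def sprt_match(matches, p0=0.25, p1=0.35, alpha=0.01, beta=0.01):
    """Run the sequential probability ratio test on a 0/1 match sequence.

    Returns (decision, number of observations used).
    """
    lower = log(beta / (1 - alpha))        # log A
    upper = log((1 - beta) / alpha)        # log B
    support = 0.0                          # running sum of S_{1,0}(Y_i)
    for count, y in enumerate(matches, start=1):
        support += log(p1 / p0) if y else log((1 - p1) / (1 - p0))
        if support <= lower:
            return "accept H0", count
        if support >= upper:
            return "reject H0", count
    return "undecided", len(matches)


print(sprt_match([1] * 100))   # an unbroken run of matches
print(sprt_match([0] * 100))   # an unbroken run of mismatches
```

An unbroken run of matches crosses the upper boundary much sooner than a run of mismatches crosses the lower one, reflecting the unequal step sizes of the random walk.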
Sequential Analysis
• Power Function for a Sequential Test
– Suppose the true value of the parameter
of interest is ξ
– We wish to know the probability that H1
is accepted, given ξ
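– This probability is the power Ρ(ξ) of the test
$$P(\xi) \approx \frac{1-\left(\frac{\beta}{1-\alpha}\right)^{\theta^*}}{\left(\frac{1-\beta}{\alpha}\right)^{\theta^*}-\left(\frac{\beta}{1-\alpha}\right)^{\theta^*}}$$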
Sequential Analysis
– Where θ* is the unique non-zero solution to θ in
$$\sum_{y \in R} P(y;\xi)\left(\frac{P(y;\xi_1)}{P(y;\xi_0)}\right)^{\theta} = 1$$
– R is the range of values of Y
– Equivalently, θ* is the unique non-zero solution to θ in
$$\sum_{y \in R} P(y;\xi)\, e^{\theta S_{1,0}(y)} = 1$$
– Where S is defined as before
Sequential Analysis
– This is very similar to Ch. 7 – Random
Walks
– The parameter θ* is the same as in Ch. 7
– And it will be the same in Ch 10 – BLAST
– < skipping the random walk part >
Sequential Analysis
• Mean Sample Size
– The (random) number of observations
until one or the other hypothesis is
accepted
– Find approximation by ignoring
boundary overshoot
– Essentially identical method used to find
the mean number of steps until the
random walk stops
Sequential Analysis
– Two expressions are calculated for ΣiS1,0(Yi)
• One involves the mean sample size
• By equating both expressions, solve for mean sample size
$$\sum_i S_{1,0}(y_i) \approx (1-P(\xi))\log\frac{\beta}{1-\alpha} + P(\xi)\log\frac{1-\beta}{\alpha}$$
$$E(S_{1,0}(Y_i)) = E\left(\log\frac{P(Y_i;\xi_1)}{P(Y_i;\xi_0)}\right) = \sum_{y \in R} P(y;\xi)\log\frac{P(y;\xi_1)}{P(y;\xi_0)}$$
Sequential Analysis
– So, the mean sample size is:
$$\frac{(1-P(\xi))\log\frac{\beta}{1-\alpha} + P(\xi)\log\frac{1-\beta}{\alpha}}{\sum_{y \in R} P(y;\xi)\log\frac{P(y;\xi_1)}{P(y;\xi_0)}}$$
– Both numerator and denominator depend on Ρ(ξ), and so also on θ*
– A generalization applies if the true distribution Q(y) of Y differs from the distributions under H0 and H1 – relevant to BLAST
$$\frac{(1-P(\xi))\log\frac{\beta}{1-\alpha} + P(\xi)\log\frac{1-\beta}{\alpha}}{\sum_{y \in R} Q(y)\log\frac{P(y;\xi_1)}{P(y;\xi_0)}}$$
Sequential Analysis
• Example
– Same sequence matching example as
before
• H0: p0 = 0.25 (probability of a match is 0.25)
• H1: p1 = 0.35 (probability of a match is 0.35)
• Type I error α and Type II error β chosen 0.01
– Mean sample size equation is:
$$\frac{9.190\, P(p) - 4.595}{p\log\frac{7}{5} + (1-p)\log\frac{13}{15}}$$
– Mean sample size when H0 is true: 194
– Mean sample size when H1 is true: 182
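The power function and the mean sample size for this example can be evaluated numerically. The sketch below is an illustration with hypothetical function names (`theta_star`, `power`, `mean_sample_size`); it finds θ* by bisection, ignores boundary overshoot as the approximation does, and does not handle the special case where the mean step is exactly zero (p near 0.2984), for which no non-zero root exists.

```python
from math import log

P0, P1 = 0.25, 0.35            # match probabilities under H0 and H1
ALPHA = BETA = 0.01
A = BETA / (1 - ALPHA)         # lower boundary on the ratio scale
B = (1 - BETA) / ALPHA         # upper boundary on the ratio scale
R_MATCH, R_MISS = P1 / P0, (1 - P1) / (1 - P0)   # per-observation likelihood ratios


def mean_step(p):
    """E[S_{1,0}(Y)] when the true match probability is p."""
    return p * log(R_MATCH) + (1 - p) * log(R_MISS)


def theta_star(p, iterations=200):
    """Non-zero root of p*R_MATCH**t + (1-p)*R_MISS**t = 1, found by bisection."""
    def g(t):
        return p * R_MATCH ** t + (1 - p) * R_MISS ** t - 1

    # g is convex with g(0) = 0; the other root is positive when the mean
    # step is negative and negative when the mean step is positive.
    sign = 1.0 if mean_step(p) < 0 else -1.0
    near, far = sign * 1e-6, sign * 50.0        # g(near) < 0 < g(far)
    for _ in range(iterations):
        mid = (near + far) / 2
        if g(mid) < 0:
            near = mid
        else:
            far = mid
    return (near + far) / 2


def power(p):
    """Approximate probability of accepting H1 when the true parameter is p."""
    t = theta_star(p)
    return (1 - A ** t) / (B ** t - A ** t)


def mean_sample_size(p):
    accept_h1 = power(p)
    numerator = (1 - accept_h1) * log(A) + accept_h1 * log(B)
    return numerator / mean_step(p)


print(round(mean_sample_size(P0)), round(mean_sample_size(P1)))
```

At p = p0 the root is θ* = 1 and the power reduces to α; at p = p1 the root is θ* = -1 and the power reduces to 1 - β, which gives the two mean sample sizes quoted above.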
Sequential Analysis
• Boundary Overshoot
– So far we assumed no boundary overshoot
– In practice, there will almost always be, though
• Exact Type I and Type II errors different from α and β
– Random walk theory can be used to assess how
significant the effects of boundary overshoot are
– It can be shown that the sum of Type I and Type II
errors is always less than α + β (also individually)
– BLAST deals with this in a novel way -> see Ch.
10