Interval Estimation
Example : Election 2000 voting poll from 2386 interviews
Result of nationwide Gallup poll, November 5, 2000 (point estimates) :
  George W. Bush     47%
  Al Gore            45%
  Ralph Nader         4%
  Buchanan            1%
  Not yet decided     3%
47% > 45% : Can Bush get more votes than Gore?
Point estimates versus interval estimates
[Figure: estimated shares for Bush and Gore on an axis marked 43, 45, 47, 49 (%)]
  Margin of error : 0.05%, "Bush will win race"
  Margin of error : 2%, "too close to call"

Interval Estimation
Example : VSRS {X_1, X_2, ..., X_n} from $N(\mu, \sigma^2)$
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
$$\Pr\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = \Phi\left(\frac{\mu + 1.96\sigma/\sqrt{n} - \mu}{\sigma/\sqrt{n}}\right) - \Phi\left(\frac{\mu - 1.96\sigma/\sqrt{n} - \mu}{\sigma/\sqrt{n}}\right)$$
$$= \Phi(1.96) - \Phi(-1.96) = 0.975 - (1 - 0.975) = 0.95$$
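The last step above, $\Phi(1.96) - \Phi(-1.96) = 0.95$, can be checked numerically. The sketch below is only an illustration, not part of the slides; it assumes Python's standard `statistics.NormalDist` for $\Phi$, and the helper name `central_probability` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution; its CDF plays the role of Phi in the slides.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def central_probability(z: float) -> float:
    """Return Pr(-z <= Z <= z) = Phi(z) - Phi(-z) for Z ~ N(0, 1)."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


# Probability that the sample mean falls within 1.96 standard errors of mu.
prob_196 = central_probability(1.96)
print(f"Phi(1.96) - Phi(-1.96) = {prob_196:.4f}")
```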

Interval Estimation
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
[Figure: density of $\bar{X}$ with area = 0.95 between $\mu - 1.96\sigma/\sqrt{n}$ and $\mu + 1.96\sigma/\sqrt{n}$; 95% of the means fall within these bounds]
$$\Pr\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\%$$
$$\Pr\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\%$$

Confidence Interval
VSRS from $N(\mu, \sigma^2)$
95% confidence interval :
$$\left[\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right]$$
[Figure: intervals $\bar{X} \pm 1.96\sigma/\sqrt{n}$ drawn for repeated samples; 95% of the intervals contain µ]
The interval [L, U] is called a 100(1-α)% confidence interval (C.I.) for the parameter θ if
$$\Pr(L \le \theta \le U) = 1 - \alpha \quad \text{for any possible } \theta$$
L, U : statistics calculated from the sample
1-α : confidence level (for example, 0.95 for a 95% or 0.90 for a 90% confidence interval)
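The statement that 95% of the intervals contain µ can be illustrated by simulation. This is a minimal sketch under assumed values: the population parameters (µ = 10, σ = 2), the sample size, the number of repetitions, and the function name `coverage` are all hypothetical choices for illustration, not taken from the slides.

```python
import random
from math import sqrt


def coverage(mu: float, sigma: float, n: int, reps: int,
             z: float = 1.96, seed: int = 0) -> float:
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    half_width = z * sigma / sqrt(n)   # sigma is treated as known
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n from N(mu, sigma^2) and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


# Hypothetical population: mu = 10, sigma = 2, samples of size 25.
rate = coverage(mu=10.0, sigma=2.0, n=25, reps=4000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should be close to 0.95; it is the interval that varies from sample to sample, while µ stays fixed.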
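
Confidence Interval
Normal population $N(\mu, \sigma^2)$ : µ unknown, σ² known
VSRS {X_1, X_2, ..., X_n}
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
$$\Pr\left(-Z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le Z_{\alpha/2}\right) = \Phi(Z_{\alpha/2}) - \Phi(-Z_{\alpha/2}) = 1 - \alpha$$
$$\Pr\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
100(1-α)% C.I. for µ :
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad \text{(point estimate} \pm \text{margin of error)}$$
  α       Confidence level    C.I. for µ
  0.01    99%                 $\bar{X} \pm 2.576\,\sigma/\sqrt{n}$
  0.05    95%                 $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
  0.1     90%                 $\bar{X} \pm 1.645\,\sigma/\sqrt{n}$
Tail Area
[Figure: N(0,1) density with central area 1-α between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$, and area α/2 in each tail]
$Z_{0.05} = 1.645$
$Z_{0.025} = 1.96$
$Z_{0.15} = Z_{0.3/2} = 1.036$
$Z_{0.9} = -Z_{0.1} = -1.282$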
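The critical values quoted above can be reproduced from the inverse of the standard normal CDF. The sketch below assumes Python's standard `statistics.NormalDist.inv_cdf`; the helper name `z_upper` is an illustrative choice, and $Z_a$ is read as the point with upper-tail area $a$, which is how the tail-area figure uses it.

```python
from statistics import NormalDist


def z_upper(tail_area: float) -> float:
    """Critical value Z_a with Pr(Z > Z_a) = a for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - tail_area)


# Tail areas that appear on the slides (alpha/2 for 90%, 95%, 99%, plus two extras).
critical = {a: z_upper(a) for a in (0.05, 0.025, 0.005, 0.15, 0.9)}
for a, z in critical.items():
    print(f"Z_{a} = {z:.3f}")
```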
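
Confidence Interval
Example : X = alcohol concentration of French wine
$$X \sim N(\mu, 1.2^2)$$
VSRS {X_1, X_2, ..., X_60}, $\bar{X} = 9.3$
$$\bar{X} \sim N\left(\mu, \frac{1.2^2}{60}\right)$$
A 90% C.I. for µ :
$$\bar{X} \pm Z_{0.05}\frac{\sigma}{\sqrt{n}} = 9.3 \pm (1.645)\frac{1.2}{\sqrt{60}} = 9.3 \pm 0.255 = [9.045,\ 9.555]$$

Confidence Interval
[Figure: density of $\bar{X}$ with area = 0.90 between µ − 0.255 and µ + 0.255, and 90% confidence intervals from repeated samples: $\bar{X} = 9.3$ gives [9.045, 9.555], $\bar{X} = 8.9$ gives [8.645, 9.155], $\bar{X} = 9.5$ gives [9.245, 9.755]]
90% of the intervals contain µ
Pr(9.045 ≤ µ ≤ 9.555) = 0.9 ?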
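The wine interval can be recomputed from the formula $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$. The sketch below is illustrative only: the function name `z_interval` is an assumption of this example, and the critical value is computed with `statistics.NormalDist` rather than read from a table, so the last digits may differ slightly from the rounded slide values.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, confidence: float):
    """Confidence interval for mu when the population is normal and sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # Z_{alpha/2}
    margin = z * sigma / sqrt(n)                  # margin of error
    return xbar - margin, xbar + margin


# Values from the wine example: n = 60, sigma = 1.2, sample mean 9.3, 90% level.
low, high = z_interval(xbar=9.3, sigma=1.2, n=60, confidence=0.90)
print(f"90% C.I. for mu: [{low:.3f}, {high:.3f}]")
```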
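
Student-t Distribution
$X \sim t_r$
$$f(x) = \frac{\Gamma\left(\frac{r+1}{2}\right)}{\Gamma\left(\frac{r}{2}\right)\Gamma\left(\frac{1}{2}\right)}\, r^{-\frac{1}{2}} \left(1 + \frac{x^2}{r}\right)^{-\frac{r+1}{2}}, \quad -\infty < x < \infty$$
$E(X) = 0$ for $r > 1$
$Var(X) = \dfrac{r}{r-2}$ for $r > 2$
$t_1 \equiv$ Cauchy
$N(0,1) \equiv t_\infty$
[Figure: Student-t Densities, with curves labelled $t_3$ and $t_7$]

Student-t Distribution Table
$X \sim t_r$
[Figure: $t_r$ density with central area 1-α between $-t_{r,\alpha/2}$ and $t_{r,\alpha/2}$, and area α/2 in each tail]
$t_{8,\,0.05} = 1.860$
$t_{11,\,0.025} = 2.201$

Normal Model, Unknown Variance
Normal population $N(\mu, \sigma^2)$ : µ unknown, σ² unknown
VSRS {X_1, X_2, ..., X_n}
$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1), \qquad \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
$$\Pr\left(\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$$
100(1-α)% C.I. for µ :
$$\bar{X} \pm t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}$$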
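The density formula and the two table values above can be cross-checked numerically. The sketch below is an illustration rather than a reference implementation: it codes $f(x)$ with `math.gamma` and obtains the upper-tail area by Simpson's rule, a numerical shortcut chosen for this example; the names `t_pdf` and `t_upper_tail` are hypothetical.

```python
from math import gamma, sqrt


def t_pdf(x: float, r: float) -> float:
    """Student-t density with r degrees of freedom, written as on the slide."""
    coef = gamma((r + 1) / 2) / (gamma(r / 2) * gamma(0.5) * sqrt(r))
    return coef * (1 + x * x / r) ** (-(r + 1) / 2)


def t_upper_tail(c: float, r: float, steps: int = 2000) -> float:
    """Pr(T > c) for c >= 0, using symmetry: 0.5 minus the integral of f over [0, c].

    The integral is approximated with Simpson's rule (steps must be even).
    """
    h = c / steps
    total = t_pdf(0.0, r) + t_pdf(c, r)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, r)
    return 0.5 - total * h / 3


# Table values from the slides: the tail areas should be close to 0.05 and 0.025.
tail_8 = t_upper_tail(1.860, 8)
tail_11 = t_upper_tail(2.201, 11)
print(f"Pr(t_8 > 1.860)  = {tail_8:.4f}")
print(f"Pr(t_11 > 2.201) = {tail_11:.4f}")
```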

Normal Model, Unknown Variance
Example : X = Frequency of elephant call
$X \sim N(\mu, \sigma^2)$
VSRS {X_1, X_2, ..., X_12}
$\bar{X} = 22.33$, $S^2 = 56.424$
A 95% C.I. for µ :
$$\bar{X} \pm t_{11,\,0.025}\frac{S}{\sqrt{n}} = 22.33 \pm (2.201)\sqrt{\frac{56.424}{12}} = 22.33 \pm 4.773 = [17.557,\ 27.103]$$
Pr(17.557 ≤ µ ≤ 27.103) = 0.95

Two Samples Problem
Population 1 : $\mu_x, \sigma_x^2$ ; VSRS {X_1, X_2, ..., X_m}
Population 2 : $\mu_y, \sigma_y^2$ ; VSRS {Y_1, Y_2, ..., Y_n}
Compare µx and µy by estimating $\mu_x - \mu_y$
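Returning to the elephant-call example earlier in this section, the interval can be recomputed from the summary statistics. This is a minimal sketch: the function name `t_interval` is hypothetical, and the critical value $t_{11,0.025} = 2.201$ is passed in as given in the slides, because Python's standard library has no Student-t quantile function.

```python
from math import sqrt


def t_interval(xbar: float, s2: float, n: int, t_crit: float):
    """Interval xbar +/- t_crit * sqrt(s2 / n), for a normal sample with unknown variance.

    s2 is the sample variance S^2; t_crit is t_{n-1, alpha/2} read from a table.
    """
    margin = t_crit * sqrt(s2 / n)
    return xbar - margin, xbar + margin


# Elephant-call example: n = 12, sample mean 22.33, S^2 = 56.424, t_{11,0.025} = 2.201.
low, high = t_interval(xbar=22.33, s2=56.424, n=12, t_crit=2.201)
print(f"95% C.I. for mu: [{low:.3f}, {high:.3f}]")
```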
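
Normal Models, Known Variances
Normal populations $N(\mu_x, \sigma_x^2)$ and $N(\mu_y, \sigma_y^2)$
µx , µy unknown
σx² , σy² known
Independent VSRSs {X_1, X_2, ..., X_m} {Y_1, Y_2, ..., Y_n}
$$\Pr\left((\bar{X} - \bar{Y}) - Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}} \le \mu_x - \mu_y \le (\bar{X} - \bar{Y}) + Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}}\right) = 1 - \alpha$$
100(1-α)% C.I. for µx - µy :
$$(\bar{X} - \bar{Y}) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}}$$
where $\bar{X} - \bar{Y}$ is the point estimate.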
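The known-variance interval for $\mu_x - \mu_y$ is a direct plug-in formula. The slides give no numerical example for this case, so every input in the sketch below is hypothetical and serves only to show the arithmetic; the function name `two_sample_z_interval` is also an assumption of this example.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(xbar: float, ybar: float, var_x: float, var_y: float,
                          m: int, n: int, confidence: float):
    """C.I. for mu_x - mu_y with independent normal samples and known variances."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)   # Z_{alpha/2}
    margin = z * sqrt(var_x / m + var_y / n)
    diff = xbar - ybar                                         # point estimate
    return diff - margin, diff + margin


# Hypothetical inputs (not from the slides): sample means 5.0 and 4.0,
# known variances 4.0 and 9.0, sample sizes 16 and 36, 95% level.
low, high = two_sample_z_interval(xbar=5.0, ybar=4.0, var_x=4.0, var_y=9.0,
                                  m=16, n=36, confidence=0.95)
print(f"95% C.I. for mu_x - mu_y: [{low:.4f}, {high:.4f}]")
```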

Normal Models, Unknown Variances
Normal populations $N(\mu_x, \sigma_x^2)$ and $N(\mu_y, \sigma_y^2)$
µx , µy unknown
σx² , σy² unknown
Independent VSRSs {X_1, X_2, ..., X_m} {Y_1, Y_2, ..., Y_n}
Point estimator for µx - µy : $\bar{X} - \bar{Y}$
$$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}\right)$$
Assumption : Equal variances, $\sigma_x^2 = \sigma_y^2 = \sigma^2$ (unknown)
Pooled sample variance :
$$S_{pool}^2 = \frac{(m-1)S_x^2 + (n-1)S_y^2}{m+n-2}$$
100(1-α)% C.I. for µx - µy :
$$(\bar{X} - \bar{Y}) \pm t_{m+n-2,\,\alpha/2}\, S_{pool}\sqrt{\frac{1}{m} + \frac{1}{n}} \qquad \text{(point estimate} \pm \text{margin of error)}$$

Normal Models, Unknown Variances
Example : growth of tomatoes
Fertilizer A (X) : 12, 11, 7, 13, 8, 9, 10, 13
Fertilizer B (Y) : 13, 11, 10, 6, 7, 4, 10
Assumptions : (i) Normal populations (ii) Independent samples (iii) Equal variance
$m = 8$, $n = 7$, $\bar{X} = 10.375$, $\bar{Y} = 8.714$, $S_x^2 = 5.125$, $S_y^2 = 9.905$
Pooled sample variance :
$$S_{pool}^2 = \frac{(m-1)S_x^2 + (n-1)S_y^2}{m+n-2} = \frac{(8-1)(5.125) + (7-1)(9.905)}{8+7-2} = 7.331$$
95% C.I. for µx - µy :
$$(\bar{X} - \bar{Y}) \pm t_{13,\,0.025}\, S_{pool}\sqrt{\frac{1}{m} + \frac{1}{n}} = (10.375 - 8.714) \pm (2.160)\sqrt{(7.331)\left(\frac{1}{8} + \frac{1}{7}\right)} = 1.661 \pm 3.027 = [-1.366,\ 4.688]$$

Large Sample Approximation
Normal population $N(\mu, \sigma^2)$, or arbitrary population with mean µ and variance σ² (unknown)
VSRS {X_1, X_2, ..., X_n}
For large n, approximately
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\cdot}{\sim} N(0,1)$$
$$\Pr\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \approx 1 - \alpha$$
Approximate 100(1-α)% C.I. for µ :
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
If σ is unknown, replace it by S.
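The pooled-variance interval from the tomato example earlier in this section can be recomputed from the raw data. The sketch below is illustrative: the function name `pooled_t_interval` is hypothetical, `statistics.variance` supplies the sample variances with divisor n − 1, and the critical value $t_{13,0.025} = 2.160$ is passed in as quoted in the example rather than computed.

```python
from math import sqrt
from statistics import mean, variance

# Raw data from the tomato example.
fert_a = [12, 11, 7, 13, 8, 9, 10, 13]   # Fertilizer A (X), m = 8
fert_b = [13, 11, 10, 6, 7, 4, 10]       # Fertilizer B (Y), n = 7


def pooled_t_interval(x, y, t_crit: float):
    """C.I. for mu_x - mu_y assuming normal populations with equal variances.

    t_crit is t_{m+n-2, alpha/2}, read from a Student-t table.
    Returns (lower, upper, pooled sample variance).
    """
    m, n = len(x), len(y)
    s2_pool = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    margin = t_crit * sqrt(s2_pool * (1 / m + 1 / n))
    diff = mean(x) - mean(y)               # point estimate of mu_x - mu_y
    return diff - margin, diff + margin, s2_pool


low, high, s2_pool = pooled_t_interval(fert_a, fert_b, t_crit=2.160)
print(f"pooled variance = {s2_pool:.3f}")
print(f"95% C.I. for mu_x - mu_y: [{low:.3f}, {high:.3f}]")
```

Since the interval contains 0, these data do not separate the two fertilizers at the 95% level.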
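
Population Proportion
Large population : virus carriers (proportion π) and non-carriers (proportion 1 − π)
Population proportion : π
Random sample of size n
X = number of virus carriers in the sample
$X \sim b(n, \pi)$
Point estimator for π (sample proportion) :
$$\hat{\pi} = \frac{X}{n}$$

Population Proportion
For large n, approximately
$$\frac{X - n\pi}{\sqrt{n\pi(1-\pi)}} \overset{\cdot}{\sim} N(0,1)$$
$$\Pr\left(-Z_{\alpha/2} \le \frac{X - n\pi}{\sqrt{n\pi(1-\pi)}} \le Z_{\alpha/2}\right) \approx 1 - \alpha$$
Approximate 100(1-α)% C.I. for π :
$$\frac{\left(2\hat{\pi} + Z_{\alpha/2}^2/n\right) \pm \sqrt{\left(2\hat{\pi} + Z_{\alpha/2}^2/n\right)^2 - 4\hat{\pi}^2\left(1 + Z_{\alpha/2}^2/n\right)}}{2\left(1 + Z_{\alpha/2}^2/n\right)} \approx \hat{\pi} \pm Z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$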
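The two forms of the approximate interval for π can be compared side by side. The sketch below is an illustration under stated assumptions: the function name `proportion_intervals` is hypothetical, the quadratic-root form is coded exactly as written above (it is commonly known as the Wilson score interval, a name the slides do not use), and the counts 41 out of 500 are borrowed from the unemployment example in this section.

```python
from math import sqrt
from statistics import NormalDist


def proportion_intervals(x: int, n: int, confidence: float):
    """Return (quadratic-root interval, simple interval) for a proportion."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)   # Z_{alpha/2}
    p_hat = x / n                                              # sample proportion

    # Roots of (1 + z^2/n) pi^2 - (2 p_hat + z^2/n) pi + p_hat^2 = 0.
    a = 1.0 + z * z / n
    b = 2.0 * p_hat + z * z / n
    disc = sqrt(b * b - 4.0 * p_hat * p_hat * a)
    quad = ((b - disc) / (2.0 * a), (b + disc) / (2.0 * a))

    # Large-n simplification: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n).
    margin = z * sqrt(p_hat * (1.0 - p_hat) / n)
    simple = (p_hat - margin, p_hat + margin)
    return quad, simple


quad, simple = proportion_intervals(x=41, n=500, confidence=0.95)
print(f"quadratic-root interval: [{quad[0]:.4f}, {quad[1]:.4f}]")
print(f"simple interval:         [{simple[0]:.4f}, {simple[1]:.4f}]")
```

With n = 500 the two intervals differ only in the third decimal place, which is why the simpler form is used in the examples.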
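
Population Proportion
Example : 41 people out of 500 persons were unemployed
$$\hat{\pi} = \frac{41}{500} = 0.082$$
Approximate 95% C.I. for π :
$$\hat{\pi} \pm Z_{0.025}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = 0.082 \pm (1.96)\sqrt{\frac{(0.082)(0.918)}{500}} = 0.082 \pm 0.024 = [5.8\%,\ 10.6\%]$$
Estimated unemployment rate is 8.2% with margin of error 2.4%.

Two Samples Problem
Population 1 : proportion π1 unknown ; sample with size n1, count X
Population 2 : proportion π2 unknown ; sample with size n2, count Y
Compare π1 and π2 by : $\pi_1 - \pi_2$
Point estimators :
$$\hat{\pi}_1 = \frac{X}{n_1}, \qquad \hat{\pi}_2 = \frac{Y}{n_2}$$

Two Samples Problem
For large n1, n2, approximately
$$\frac{X - n_1\pi_1}{\sqrt{n_1\pi_1(1-\pi_1)}} \overset{\cdot}{\sim} N(0,1), \qquad \frac{Y - n_2\pi_2}{\sqrt{n_2\pi_2(1-\pi_2)}} \overset{\cdot}{\sim} N(0,1)$$
Approximate 100(1-α)% C.I. for π1 - π2 :
$$(\hat{\pi}_1 - \hat{\pi}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}$$

Two Samples Problem
Example : ability of detergents to remove stains
  Detergent    Successes    Failures    Total
  1            63           28          91
  2            42           37          79
$$\hat{\pi}_1 = \frac{63}{91} = \frac{9}{13} = 69.23\%, \qquad \hat{\pi}_2 = \frac{42}{79} = 53.16\%$$
Approximate 90% C.I. for π1 - π2 :
$$(\hat{\pi}_1 - \hat{\pi}_2) \pm Z_{0.05}\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}} = \left(\frac{9}{13} - \frac{42}{79}\right) \pm (1.645)\sqrt{\frac{(9/13)(4/13)}{91} + \frac{(42/79)(37/79)}{79}}$$
$$= 0.1607 \pm 0.1219 = [3.88\%,\ 28.26\%]$$
$\pi_1 - \pi_2 > 0$, so $\pi_1 > \pi_2$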
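The detergent interval can be reproduced directly from the table of successes and totals. This is a minimal sketch: the function name `two_proportion_interval` is hypothetical, and the critical value comes from `statistics.NormalDist` instead of the rounded table value 1.645.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x: int, n1: int, y: int, n2: int, confidence: float):
    """Approximate large-sample C.I. for pi_1 - pi_2 from two independent samples."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)   # Z_{alpha/2}
    p1, p2 = x / n1, y / n2                                    # sample proportions
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - margin, diff + margin


# Detergent example: 63 successes out of 91 versus 42 out of 79, 90% level.
low, high = two_proportion_interval(x=63, n1=91, y=42, n2=79, confidence=0.90)
print(f"90% C.I. for pi_1 - pi_2: [{low:.2%}, {high:.2%}]")
```

The lower limit is above 0, which matches the slide's conclusion that π1 > π2 at this confidence level.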

Sample Size Determination
100(1-α)% C.I. for µ :
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
[Figure: intervals of half-width $Z_{\alpha/2}\,\sigma/\sqrt{n}$ around $\bar{X}$, shown relative to µ for several sample sizes]
D : precision requirement
With probability 1-α, estimation error is at most D.
$$D = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \iff n = \frac{Z_{\alpha/2}^2\,\sigma^2}{D^2}$$
$$D = Z_{\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}} \iff n = \frac{Z_{\alpha/2}^2\,\pi(1-\pi)}{D^2}$$

Sample Size Determination
Example : survey on entertainment expenses
σ = $400 from past surveys
Precision requirement : 95% confidence that estimation error is at most $120
$$n = \frac{Z_{0.025}^2\,\sigma^2}{D^2} = \frac{(1.96)^2(400)^2}{(120)^2} = 42.68 \approx 43$$
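Both sample-size formulas are easy to wrap in small helpers. The sketch below is illustrative only: the function names are hypothetical, the result is rounded up because n must be a whole number, and the proportion call uses a planning value π = 0.5 with D = 0.03, which are assumptions of this example and do not appear in the slides.

```python
from math import ceil
from statistics import NormalDist


def z_crit(confidence: float) -> float:
    """Z_{alpha/2} for the given confidence level 1 - alpha."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


def sample_size_for_mean(sigma: float, d: float, confidence: float) -> float:
    """n = Z^2 sigma^2 / D^2, before rounding up."""
    return (z_crit(confidence) * sigma / d) ** 2


def sample_size_for_proportion(pi: float, d: float, confidence: float) -> float:
    """n = Z^2 pi (1 - pi) / D^2, before rounding up; pi is a planning value."""
    return z_crit(confidence) ** 2 * pi * (1.0 - pi) / d ** 2


# Entertainment-expenses example: sigma = 400, D = 120, 95% confidence.
n_raw = sample_size_for_mean(sigma=400, d=120, confidence=0.95)
n_mean = ceil(n_raw)
print(f"n = {n_raw:.2f}, rounded up to {n_mean}")

# Hypothetical proportion survey (not from the slides): pi = 0.5, D = 0.03.
n_prop = ceil(sample_size_for_proportion(pi=0.5, d=0.03, confidence=0.95))
print(f"n for the proportion survey = {n_prop}")
```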