Statistics calculated from the sample The interval [L, U] is called a 95 confidence interval (C.I.) for the parameter θ if

I nterval Estimation

I nterval Estimation

Example : Election 2000 voting poll from 2386 interviews

Example : VSRS {X 1 , X 2 ,..., X n }

47% > 45%

Result of Nationwide
Gallup poll

(

N µ,σ 2

)

 σ2


X ~ N  µ ,
n 


Can Bush get more votes than Gore?

November 5, 2000

from

Point estimates
Interval
estimates

George W. Bush

47%

Al Gore


45%

Ralph Nader

4%

Buchanan

1%

Not yet decided

3%

 µ − 1.96σ n − µ 
 µ + 1.96σ n − µ 

σ
σ 




Pr  µ − 1.96
≤ X ≤ µ + 1.96
 = Φ

 − Φ
σ n
σ n
n
n






Bush
Gore
43


45

47

49

0 .(975

1.96−) −(1Φ−(0−.1975
.96)

%

= 0.95

Margin
Margin
of error
of error

: 0.05%
: 2%

“toowill
close
to the
call”
“Bush
win
race”

I nterval Estimation

Confidence I nterval

 σ2 
X ~ N  µ, 
 n 

µ − 1.96


σ
n

L, U
µ + 1.96

area = 0.95

X + 1.96

X
X

X − 1.96

X

σ



σ X
σ 
Pr Xµn− 1−.961.σ96
 = 95%
≤ X ≤ µ + 1.96
X
σ
n
n

− 1.96
Xn

X − 1.96

σ
n

Xn


σ
n
σ
X − 1.96
Xn

X

X − 1.96

X
X
X
X − 1.96

σ
n

The

Theinterval
interval[L,
[L,U]
U]isiscalled
calleda a100(195% confidence
90%
α)% confidence
interval
interval
(C.I.)(C.I.)
for
forthe
theparameter
parameterθθifif

σ

n
σ
+ 1.%

96 confidence interval
X95
σ
n
X + 1.96
n
 σ
σ
σ 
X − 1.96
, X + 1.96
X + 1.96


n
 n X + 1.96
n
σ
X + 1.96
n

σ
X + 1.96
n

(

N µ ,σ

Pr
Pr((LL ≤≤θθ ≤≤U
U)) ==10-.95
9α0

2

)

for
forany
anypossible
possible θθ

Confidence level

95%
within
bounds
95%ofofthe
themeans
intervals
contain
µ

σ
n

Confidence I nterval
Normal population

Statistics calculated from the sample

α

Confidence level

0.01

99%

0.05

95%

0.1

90%

Tail Area
µ unknown

σ2

known

VSRS {X 1 , X 2 ,..., X n }

Z 0.05 = 1.645

N (0,1)



 
σ σσ
X σ−σµ
X
)− Φ
(−101==.−95
Pr
−1Z21.≤.α96
≤≤≤µ1µ
≤ ≤X
+1Z(.1α296
Pr−XX1.−96
576
645
.96
..96
576
96
00α..)90
99
= 0.95
2
2645
 =
 =+Φ
σ nnn
n nn 



90%
99%
100(1confidence
α)% C.I. interval
for µ : for µ :
95%

X ±±12Z..645
σσσσ
σ 
576
962
, X + 1.96
α
XX±−11..96

n
nnnn


point estimate

1-α

α/2

Z 0.025 = 1.96
α/2

Z 00..15
= Z .3 2
15 = 1.0036

Z 0.9 = −1Z.0282
.1
-Zα/2

0

Zα/2

margin of error

1

Confidence I nterval
Example : X

Confidence I nterval
 1.22 
X ~ N  µ,

 60 

alcohol concentration of French wine

(

X ~ N µ ,1.2

2

)

A 90% C.I. for µ :

X ± Z 0.05

µ + 0.255

µ − 0.255

VSRS {X 1 , X 2 ,..., X 60 }

area = 0.90

X = 9.3

1.2
σ
= 9.3 ± (1.645)
n
60

90 % confidence intervals

X

9.3

= 9.3 ± 0.255

X

8.9

= [9.045 , 9.555]

?

?

(

)

90% of the intervals contain µ

Student-t Distribution
f (x ) =

VSRS {X 1 , X 2 ,..., X n }
S=

1 n
2
∑ (X i − X )
n − 1 i =1

 r +1
r +1
Γ


1
2
 2  r − 2  1 + x  2


r
 r  1


Γ  Γ 
2 2

XX±±t nZ−1α,α 22

σS
nn

0

r
for r > 2
r −2

t1 ≡ Cauchy

unknown
0

3

X ~ tr

tr

-tr,α/2

Var ( X ) =

t3

Student-t Distribution Table

Student-t Distribution

1-α

X ~ tr
E ( X ) = 0 for r > 1

t7

3

α/2

, −∞ < x < ∞

N (0,1) ≡ t ∞

Student-t Densities


σS
σ S 
≤ µ ≤ X + tZnα−1,2α 2  = 1= −1α
 X X− −
t n −Z1,α 2
−α
PrPr
n
n n 


100(1-α)% C.I. for µ :

9.155]

[9.245 , 9.755]

X

µ unknown σ2 unknown

XX −−−µµµ
~ ~N~(t0n,?1− 1)
σS nnn

9.5

X

Normal Model, Unknown Variance
N µ ,σ 2

[8.645 ,

X

X

Pr (9.045 ≤ µ ≤ 9.555) = 0.9

Normal population

[9.045 , 9.555]

X

α/2

tr,α/2

t 8,0..05 = 1?.860

t11,0.025 = 2? .201

2

Normal Model, Unknown Variance
Example : X

Frequency of elephant call
X ~ N (µ , σ 2 )

VSRS {X 1 , X 2 ,..., X 12 }
A 95% C.I. for µ :

Population 1

Population 2

µ x , σ x2

µ y , σ y2

X = 22.33 S = 56.424
2

S

X ± t11, 0.025

Two Samples Problem

n

= 22.33 ± (2.201)

56.424
12

= 22.33 ± 4.773

VSRS

= [17.557 , 27.103]

{Y1 , Y2 ,..., Yn }

{X 1 , X 2 ,..., X m }
estimate

Pr (17.557 ≤ µ ≤ 27.103) = 0.95

Compare µx and µy by :

Normal Models, Known Variances
Normal populations

(

N µ x ,σ

2
x

(

)

N µ y ,σ

2
y

µx , µy unknown
σx2 , σy2 known

)

Independent VSRSs {X 1 , X 2 ,..., X m } {Y1 , Y2 ,..., Yn }

2
2

σ x2 σ y
σ x2 σ y 
= 1 −α
Pr  (X − Y ) − Z α 2
+
≤ µ ≤ (X − Y ) + Zα 2
+

m
n
m
n 



(X − Y ) ± Z

100(1-α)% C.I. for µ :

point estimate

2
σ x2 σ y
+
m
n

α 2

Normal Models, Unknown Variances
Normal populations

Normal Models, Unknown Variances
Fertilizer A ( X)

12, 11, 7, 13, 8, 9, 10, 13
13, 11, 10, 6, 7, 4, 10

100(1-α)% C.I. for µx - µy :

95% C.I. for µx - µy :

)

µx , µy unknown
σx2 , σy2 unknown

σ x2 = σ y2 = σ 2

(m − 1)S x2 + (n − 1)S y2
m+n−2

σ y2

(X −−YY ))±±Zt m + n − 2σ,α 2 S+ pool
2
x

α 2

m

n

1 1
+
m n

unknown

Large Sample Approximation
2
µ
N(µ, σ
,σ 2 )

For large n
X −µ .
N (0,1)
~N
σ n

?

σ
σ 

 ≈
≤ µ ≤ X + Zα 2
= 1−α
Pr X − Z α 2
n
n


(iii) Equal variance

100(1-α100(1)% C.I.α)%
forC.I.
µ : for µ :
Approximate

m = 8, n = 7, X = 10.371, Y = 8.714, S x2 = 5.125, S y2 = 9.905
22
SS pool
=
pool =

(

N µ y , σ y2

2
=
S pool

Pooled sample variance

Assumptions : (i) Normal populations

Pooled sample variance :

)

Assumption : Equal variances

Arbitrary
Population
Normal Population

Example : growth of tomatoes

(ii) Independent samples

(

N µ x , σ x2

margin of error

Fertilizer B ( Y)

unknown

Independent VSRSs {X 1 , X 2 ,..., X m } {Y1 , Y2 ,..., Yn }

2

σ x2 σ y 
+
X − Y ~ N  µ x − µ y ,

m
n



Point estimator for µx - µy :

µx − µy

((8m−−11)()5S .125
+ ()n+−(17)S− 1)(9.905)
= 7.331
2
x

− 27 − 2
m + n8 +

2
y

11 1 1 1 1
=2),7α[±−.331
1 . 366
, +4. 331
1(10
X
. 661
.−371
Y± )3(−±2.027
8.t160
.m714
+.688) ] +
+ n −)(
2 tS13pool
, 0).025 (7
8m 7 n 8 7

VSRS

{X 1 , X 2 ,..., X n }

X ± Zα

2

σ
n

IF σ is unknown, replace it by S .

3

Population Proportion

Population Proportion

Large Population

X − nπ

For large n

nπ (1 − π )

Virus carriers

1−π

π

π

X ~ b(n, π )

 X
π=
n

X − nπ

nπ (1 − π )


≤ Zα 2  ≈ 1 − α



Approximate 100(1-α)% C.I. for π :

2π + Z α2 2 ±

Point estimator for π :
Random sample of size n

.


Pr − Z α 2 ≤



Population proportion

X = number of virus carriers in the sample

~ N (0,1)

(2π + Z n ) − 4π (1 + Z
2(1 + Z
n)
2

2
α 2

2

2
α 2

n

)

2
α 2



π (1 − π )

≈ π ± Zα 2
n

Sample proportion

Population Proportion

Two Samples Problem
Population 2

Population 1

Example : 41 people out of 500 persons were unemployed
K 41
π=
= 0.082
500

π2

π1

Approximate 95% C.I. for π :
K
K
(0.082)(0.918)
π (1 − π )
K
π ± Z 0.025
= 0.082 ± (1.96)
500
n
= 0.082 ± 0.024

Sample with size n1

= [5.8% , 10.6%]

Estimated unemployment rate is 8.2% with margin of error 2.4% .

Two Samples Problem
Point estimators
For large n1, n2

X

π1 =
n1
X − n1π 1

n1π 1 (1 − π 1 )

.

π1 − π 2

unknown

Example : ability of detergents to remove stains
Y − n 2π 2

n2π 2 (1 − π 2 )

Approximate 100(1-α)% C.I. for π1 - π2 :

(π1 − π 2 ) ± Z α 2

Compare π1 and π2 by :

Sample with size n2

Two Samples Problem

Y

π2 =
n2

~ N (0,1)

Y

X





π 1 (1 − π 1 ) π 2 (1 − π 2 )
+
n1
n2

~ N (0,1)
.

Detergent

Successes

Failures

Total

1

63

28

91

2

42

37

79

K 63 9
=
= 69.23%
π1 =
91 13
42
K
= 53.16%
π2 =
79

Approximate 90% C.I. for π1 - π2 :

(π1 − π 2 ) ± Z 0.05





π 1 (1 − π 1 ) π 2 (1 − π 2 )
+
n1
n2

π 1π−1 π>2π>2 0

(9 13)(4 13) + (42 79 )(37 79 )
 9 42 
=  −  ± (1.645)
91
79
 13 79 
= 0.1607 ± 0.1219 = [3.88% , 28.26%]

4

Sample Size Determination
100(1-α)% C.I. for µ :
Zα 2

σ
n

Zα 2

X ± Zα

2

σ
n

σ σ σ
σ
Zα 2 Zα 2 Zα 2
n n n
n

µ

X

Sample Size Determination
Example : survey on entertainment expenses

Zα 2

σ σ
Zα 2
n n

Zα 2

σ
n

σ = $400

from past surveys

Precision requirement :
X X

X

95% confidence that estimation error is at most $120

D : precision requirement
D=

Z α 2σ
n

⇔n=

Z α2 2σ 2
D2

D = Zα 2

With probability 1-α,
estimation error is at most D.

n=

Z 02.025σ 2 (1.96)2 (400 )2
=
= 42.68 ≈ 43
D2
(120 )2

Z α2 2π (1 − π )
π (1 − π )
⇔n=
n
D2

5