
MATH 2P82
MATHEMATICAL STATISTICS
(Lecture Notes)
© Jan Vrbik


Contents
1 PROBABILITY REVIEW
   Basic Combinatorics
      Binomial expansion
      Multinomial expansion
   Random Experiments (Basic Definitions)
      Sample space
      Events
      Set Theory
      Boolean Algebra
   Probability of Events
      Probability rules
      Important result
      Probability tree
      Product rule
      Conditional probability
      Total-probability formula
      Independence
   Discrete Random Variables
      Bivariate (joint) distribution
      Conditional distribution
      Independence
      Multivariate distribution
   Expected Value of a RV
      Expected values related to X and Y
      Moments (univariate)
      Moments (bivariate or 'joint')
      Variance of aX + bY + c
   Moment generating function
      Main results
   Probability generating function
   Conditional expected value
   Common discrete distributions
      Binomial
      Geometric
      Negative Binomial
      Hypergeometric
      Poisson
      Multinomial
      Multivariate Hypergeometric
   Continuous Random Variables
      Univariate probability density function (pdf)
      Distribution Function
      Bivariate (multivariate) pdf
      Marginal Distributions
      Conditional Distribution
      Mutual Independence
      Expected value
   Common Continuous Distributions
   Transforming Random Variables
      Examples

2 Transforming Random Variables
   Univariate transformation
      Distribution-Function (F) Technique
      Probability-Density-Function (f) Technique
   Bivariate transformation
      Distribution-Function Technique
      Pdf (Shortcut) Technique

3 Random Sampling
   Sample mean
      Central Limit Theorem
   Sample variance
      Sampling from N(µ, σ)
   Sampling without replacement
   Bivariate samples

4 Order Statistics
   Univariate pdf
   Sample median
   Bivariate pdf
      Special Cases

5 Estimating Distribution Parameters
   A few definitions
   Cramér-Rao inequality
   Sufficiency
   Method of moments
      One Parameter
      Two Parameters
   Maximum-likelihood technique
      One Parameter
      Two Parameters

6 Confidence Intervals
   CI for mean µ
      σ unknown
      Large-sample case
      Difference of two means
   Proportion(s)
   Variance(s)
      σ ratio

7 Testing Hypotheses
   Tests concerning mean(s)
   Concerning variance(s)
   Concerning proportion(s)
   Contingency tables
   Goodness of fit

8 Linear Regression and Correlation
   Simple regression
      Maximum likelihood method
      Least-squares technique
      Normal equations
      Statistical properties of the estimators
      Confidence intervals
   Correlation
   Multiple regression
      Various standard errors

9 Analysis of Variance
   One-way ANOVA
   Two-way ANOVA
      No interaction
      With interaction

10 Nonparametric Tests
   Sign test
   Signed-rank test
   Rank-sum tests
      Mann-Whitney
      Kruskal-Wallis
   Run test
   (Spearman's) rank correlation coefficient

Chapter 1 PROBABILITY REVIEW
Basic Combinatorics
Number of permutations of n distinct objects: n!
When the objects are not all distinct, such as, for example, aaabbc:
$$\binom{6}{3,2,1} \overset{\text{def.}}{=} \frac{6!}{3!\,2!\,1!}$$
or
$$\binom{N}{n_1, n_2, n_3, \ldots, n_k} \overset{\text{def.}}{=} \frac{N!}{n_1!\, n_2!\, n_3! \cdots n_k!}$$
in general, where $N = \sum_{i=1}^{k} n_i$ is the total word length (multinomial coefficient).
Selecting r out of n objects (without duplication), counting all possible arrangements:
$$n \times (n-1) \times (n-2) \times \cdots \times (n-r+1) = \frac{n!}{(n-r)!} \overset{\text{def.}}{=} P_r^n$$
(number of permutations).
Forget their final arrangement:
$$\frac{P_r^n}{r!} = \frac{n!}{(n-r)!\, r!} \overset{\text{def.}}{=} C_r^n$$
(number of combinations). This will also be called the binomial coefficient.
If we can duplicate (any number of times), and count the arrangements: $n^r$.
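As a quick numerical sanity check of these counting formulas, here is a minimal Python sketch (the helper name `multinomial` and the test values are illustrative, not from the notes; `math.perm` and `math.comb` need Python 3.8+):

```python
from math import factorial, perm, comb

def multinomial(*counts):
    """N! / (n1! n2! ... nk!), where N is the sum of the counts."""
    result = factorial(sum(counts))
    for n in counts:
        result //= factorial(n)   # each partial division is exact
    return result

print(multinomial(3, 2, 1))   # 60 distinct arrangements of "aaabbc"
print(perm(10, 3))            # P_3^10 = 10*9*8 = 720 ordered selections
print(comb(10, 3))            # C_3^10 = 120 unordered selections
print(10 ** 3)                # 1000 arrangements when duplication is allowed
```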
Binomial expansion
$$(x+y)^n = \sum_{i=0}^{n} \binom{n}{i}\, x^{n-i}\, y^{i}$$

Multinomial expansion
$$(x+y+z)^n = \sum_{\substack{i,j,k \ge 0 \\ i+j+k=n}} \binom{n}{i,j,k}\, x^{i} y^{j} z^{k}$$

$$(x+y+z+w)^n = \sum_{\substack{i,j,k,\ell \ge 0 \\ i+j+k+\ell=n}} \binom{n}{i,j,k,\ell}\, x^{i} y^{j} z^{k} w^{\ell}$$

etc.
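Both expansions are easy to verify numerically for a small exponent; a minimal sketch (the numbers x, y, z and n below are arbitrary choices):

```python
from math import comb, factorial, isclose

x, y, z, n = 1.3, -0.7, 2.1, 5

# Binomial: (x+y)^n = sum over i of C(n,i) x^(n-i) y^i
assert isclose((x + y) ** n,
               sum(comb(n, i) * x ** (n - i) * y ** i for i in range(n + 1)))

# Trinomial: sum over i+j+k = n of n!/(i! j! k!) x^i y^j z^k
assert isclose((x + y + z) ** n,
               sum(factorial(n) // (factorial(i) * factorial(j) * factorial(n - i - j))
                   * x ** i * y ** j * z ** (n - i - j)
                   for i in range(n + 1) for j in range(n + 1 - i)))
print("expansions verified")
```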

Random Experiments (Basic Definitions)
Sample space
is a collection of all possible outcomes of an experiment.
The individual (complete) outcomes are called simple events.


Events
are subsets of the sample space (A, B, C,...).
Set Theory
The old notion of:                                  is (are) now called:
  Universal set Ω                                   Sample space
  Elements of Ω (its individual 'points')           Simple events (complete outcomes)
  Subsets of Ω                                      Events
  Empty set ∅                                       Null event

We continue to use the word intersection (notation: A ∩ B, representing
the collection of simple events common to both A and B ), union (A ∪ B, simple
events belonging to either A or B or both), and complement (Ā, simple events
not in A ). One should be able to visualize these using Venn diagrams, but when
dealing with more than 3 events at a time, one can tackle problems only with the
help of
Boolean Algebra
Both ∩ and ∪ (individually) are commutative and associative.
Intersection is distributive over union: A∩(B∪C ∪...) = (A∩B)∪(A∩C)∪...
Similarly, union is distributive over intersection: A ∪ (B ∩ C ∩ ...) = (A ∪ B) ∩
(A ∪ C) ∩ ...
Trivial rules: A ∩ Ω = A, A ∩ ∅ = ∅, A ∩ A = A, A ∪ Ω = Ω, A ∪ ∅ = A,
A ∪ A = A, A ∩ Ā = ∅, A ∪ Ā = Ω, and the complement of Ā is again A.
Also, when A ⊂ B (A is a subset of B, meaning that every element of A also
belongs to B), we get: A ∩ B = A (the smaller event) and A ∪ B = B (the bigger
event).
DeMorgan Laws: $\overline{A \cap B} = \bar{A} \cup \bar{B}$, and $\overline{A \cup B} = \bar{A} \cap \bar{B}$, or in general
$$\overline{A \cap B \cap C \cap \cdots} = \bar{A} \cup \bar{B} \cup \bar{C} \cup \cdots$$
and vice versa (i.e. ∩ ↔ ∪).
A and B are called (mutually) exclusive or disjoint when A ∩ B = ∅ (no
overlap).
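These identities are easy to spot-check with Python's built-in sets; a minimal sketch (the sample space Ω and the events A, B, C are made up for illustration):

```python
omega = set(range(1, 11))        # a small sample space
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 7, 9}

def complement(event):
    return omega - event

# DeMorgan laws
assert complement(A & B) == complement(A) | complement(B)
assert complement(A | B) == complement(A) & complement(B)

# Distributivity of intersection over union, and vice versa
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)
print("set identities hold on this example")
```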

Probability of Events
Simple events can be assigned a probability (relative frequency of its occurrence
in a long run). It’s obvious that each of these probabilities must be a non-negative
number. To find a probability of any other event A (not necessarily simple), we
then add the probabilities of the simple events A consists of. This immediately
implies that probabilities must follow a few basic rules:
Pr(A) ≥ 0
Pr(∅) = 0
Pr(Ω) = 1
(the relative frequency of all Ω is obviously 1).
We should mention that Pr(A) = 0 does not necessarily imply that A = ∅.


Probability rules
Pr(A ∪ B) = Pr(A) + Pr(B) but only when A ∩ B = ∅ (disjoint). This implies that
Pr(Ā) = 1 − Pr(A) as a special case.
This also implies that Pr(A ∩ B̄) = Pr(A) − Pr(A ∩ B).
For any A and B (possibly overlapping) we have
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
Can be extended to: Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) − Pr(A ∩B) − Pr(A ∩
C) − Pr(B ∩ C) + Pr(A ∩ B ∩ C).
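Both versions can be checked on a finite sample space with equally likely outcomes, where each probability is just a relative frequency; a minimal sketch (the events are chosen arbitrarily):

```python
from fractions import Fraction

omega = set(range(1, 13))                       # 12 equally likely outcomes
A, B, C = {1, 2, 3, 4, 5}, {4, 5, 6, 7}, {2, 5, 7, 11}

def pr(event):
    return Fraction(len(event), len(omega))

assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
assert pr(A | B | C) == (pr(A) + pr(B) + pr(C)
                         - pr(A & B) - pr(A & C) - pr(B & C)
                         + pr(A & B & C))
print("inclusion-exclusion verified")
```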
In general
$$\Pr(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_k) = \sum_{i=1}^{k} \Pr(A_i) - \sum_{i<j} \Pr(A_i \cap A_j) + \sum_{i<j<\ell} \Pr(A_i \cap A_j \cap A_\ell) - \cdots + (-1)^{k+1}\Pr(A_1 \cap A_2 \cap \cdots \cap A_k)$$

Solution: This time we reverse the labels: Y1 ≡ X1 and Y2 = X2/X1, which results in the joint pdf y1 e^{−y1(1+y2)} where y1 > 0 and y2 > 0. Eliminate y1 by
$$\int_0^{\infty} y_1\, e^{-y_1(1+y_2)}\, dy_1 = \frac{1}{(1+y_2)^2}$$
where y2 > 0. Thus $f_Y(y) = \frac{1}{(1+y)^2}$ with y > 0 [check, we have solved this problem before].
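The resulting density can be made plausible by simulation; a minimal sketch, assuming (as the algebra above does) that X1 and X2 are independent exponential variables with mean 1, so that the joint pdf is e^{−x1−x2}:

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.exponential(size=200_000)
x2 = rng.exponential(size=200_000)
ratio = x2 / x1                     # the variable Y above

# f(y) = 1/(1+y)^2 integrates to the CDF F(y) = y/(1+y)
for y in (0.5, 1.0, 3.0):
    print((ratio < y).mean(), y / (1 + y))   # empirical vs. theoretical
```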

3. In this example we introduce the so-called Beta distribution.
Let X1 and X2 be independent RVs from the gamma distribution with parameters (k, β) and (m, β) respectively, and let Y1 = X1/(X1 + X2).
Solution: Using the argument of Example 1 one can show that β 'cancels out', and we can assume that β = 1 without affecting the answer. The definition of Y1 is also the same as in Example 1 ⇒ x1 = y1 y2/(1 − y1), x2 = y2, and the Jacobian = y2/(1 − y1)².
Substituting into
$$f(x_1, x_2) = \frac{x_1^{k-1}\, x_2^{m-1}\, e^{-x_1-x_2}}{\Gamma(k)\cdot\Gamma(m)}$$
and multiplying by the Jacobian yields
$$f(y_1, y_2) = \frac{y_1^{k-1}\, y_2^{k-1}\, y_2^{m-1}\, e^{-\frac{y_2}{1-y_1}}}{\Gamma(k)\,\Gamma(m)\,(1-y_1)^{k-1}} \cdot \frac{y_2}{(1-y_1)^2}$$
for 0 < y1 < 1 and y2 > 0. Integrating over y2 results in:
$$\frac{y_1^{k-1}}{\Gamma(k)\,\Gamma(m)\,(1-y_1)^{k+1}} \int_0^{\infty} y_2^{\,k+m-1}\, e^{-\frac{y_2}{1-y_1}}\, dy_2 = \frac{\Gamma(k+m)}{\Gamma(k)\cdot\Gamma(m)}\, y_1^{k-1} (1-y_1)^{m-1} \qquad \text{(f)}$$
where 0 < y1 < 1.
This is the pdf of a new two-parameter (k and m) distribution which is called beta. Note that, as a by-product, we have effectively proved the following formula:
$$\int_0^1 y^{k-1} (1-y)^{m-1}\, dy = \frac{\Gamma(k)\cdot\Gamma(m)}{\Gamma(k+m)} \qquad \text{for any } k, m > 0.$$
This enables us to find the distribution's mean:
$$E(Y) = \frac{\Gamma(k+m)}{\Gamma(k)\cdot\Gamma(m)} \int_0^1 y^{k} (1-y)^{m-1}\, dy = \frac{\Gamma(k+m)}{\Gamma(k)\cdot\Gamma(m)} \cdot \frac{\Gamma(k+1)\cdot\Gamma(m)}{\Gamma(k+m+1)} = \frac{k}{k+m} \qquad \text{(mean)}$$
and similarly
$$E(Y^2) = \frac{\Gamma(k+m)}{\Gamma(k)\cdot\Gamma(m)} \int_0^1 y^{k+1} (1-y)^{m-1}\, dy = \frac{\Gamma(k+m)}{\Gamma(k)\cdot\Gamma(m)} \cdot \frac{\Gamma(k+2)\cdot\Gamma(m)}{\Gamma(k+m+2)} = \frac{(k+1)\,k}{(k+m+1)\,(k+m)}$$
$$\Rightarrow\quad Var(Y) = \frac{(k+1)\,k}{(k+m+1)\,(k+m)} - \left(\frac{k}{k+m}\right)^{2} = \frac{km}{(k+m+1)\,(k+m)^{2}} \qquad \text{(variance)}$$
Note that the distribution of 1 − Y ≡ X2/(X1 + X2) is also beta (why?) with parameters m and k [reversed].
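A quick simulation makes these results plausible; the sketch below (arbitrary k, m, β and sample size, assuming NumPy is available) draws the two gamma variables and compares the empirical moments of X1/(X1+X2) with k/(k+m) and km/((k+m+1)(k+m)²):

```python
import numpy as np

rng = np.random.default_rng(0)
k, m, beta, size = 4.0, 2.5, 1.7, 200_000

x1 = rng.gamma(shape=k, scale=beta, size=size)
x2 = rng.gamma(shape=m, scale=beta, size=size)
y = x1 / (x1 + x2)              # should be beta(k, m); the scale beta cancels out

print(y.mean(), k / (k + m))                          # both ~0.615
print(y.var(), k * m / ((k + m + 1) * (k + m) ** 2))  # both ~0.0316
```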

We learn how to compute related probabilities in the following set of Examples:
(a) Pr(X1 < X2/2) where X1 and X2 have the gamma distribution with parameters (4, β) and (3, β) respectively [this corresponds to the probability that Mr. A catches 4 fishes in less than half the time Mr. B takes to catch 3].
Solution: Pr(2X1 < X2) = Pr(3X1 < X1 + X2) = Pr(X1/(X1 + X2) < 1/3) =
$$\frac{\Gamma(4+3)}{\Gamma(4)\cdot\Gamma(3)} \int_0^{1/3} y^{3} (1-y)^{2}\, dy = 60 \times \left[\frac{y^4}{4} - 2\,\frac{y^5}{5} + \frac{y^6}{6}\right]_{y=0}^{1/3} = 10.01\%.$$

(b) Evaluate Pr(Y < 0.4) where Y has the beta distribution with parameters (3/2, 2) [half-integer values are not unusual, as we learn shortly].
Solution:
$$\frac{\Gamma(\frac{7}{2})}{\Gamma(\frac{3}{2})\cdot\Gamma(2)} \int_0^{0.4} y^{\frac{1}{2}} (1-y)\, dy = \frac{5}{2}\cdot\frac{3}{2}\cdot\left[\frac{y^{3/2}}{3/2} - \frac{y^{5/2}}{5/2}\right]_{y=0}^{0.4} = 48.07\%.$$

(c) Evaluate Pr(Y < 0.7) where Y ∈ beta(4, 5/2).
Solution: This equals [it is more convenient to have the half-integer first]
$$\Pr(1-Y > 0.3) = \frac{\Gamma(\frac{13}{2})}{\Gamma(\frac{5}{2})\cdot\Gamma(4)} \int_{0.3}^{1} u^{\frac{3}{2}} (1-u)^{3}\, du = \frac{\frac{11}{2}\cdot\frac{9}{2}\cdot\frac{7}{2}\cdot\frac{5}{2}}{3!}\left[\frac{u^{5/2}}{5/2} - 3\,\frac{u^{7/2}}{7/2} + 3\,\frac{u^{9/2}}{9/2} - \frac{u^{11/2}}{11/2}\right]_{u=0.3}^{1} = 1 - 0.3522 = 64.78\%.$$

(d) Pr(Y < 0.5) when Y ∈ beta(3/2, 1/2).
Solution:
$$\frac{\Gamma(2)}{\Gamma(\frac{3}{2})\cdot\Gamma(\frac{1}{2})} \int_0^{0.5} y^{\frac{1}{2}} (1-y)^{-\frac{1}{2}}\, dy = 18.17\% \;\text{(Maple)}.$$
0

4. In this example we introduce the so-called Student's or t-distribution [notation: t_n, where n is called 'degrees of freedom', the only parameter].
We start with two independent RVs X1 ∈ N(0, 1) and X2 ∈ χ²_n, and introduce a new RV by
$$Y_1 = \frac{X_1}{\sqrt{\dfrac{X_2}{n}}}$$
To get its pdf we take Y2 ≡ X2, solve for x2 = y2 and x1 = y1·√(y2/n), substitute into
$$f(x_1, x_2) = \frac{e^{-\frac{x_1^2}{2}}}{\sqrt{2\pi}} \cdot \frac{x_2^{\frac{n}{2}-1}\, e^{-\frac{x_2}{2}}}{\Gamma(\frac{n}{2}) \cdot 2^{\frac{n}{2}}}$$
and multiply by
$$\left|\det\begin{pmatrix} \sqrt{\frac{y_2}{n}} & \frac{y_1}{2\sqrt{n\,y_2}} \\ 0 & 1 \end{pmatrix}\right| = \sqrt{\frac{y_2}{n}}$$
to get
$$f(y_1, y_2) = \frac{1}{\sqrt{2\pi}} \cdot \frac{y_2^{\frac{n}{2}-1}\, e^{-\frac{y_2}{2}}\, e^{-\frac{y_1^2 y_2}{2n}}}{\Gamma(\frac{n}{2}) \cdot 2^{\frac{n}{2}}} \cdot \sqrt{\frac{y_2}{n}}$$
where −∞ < y1 < ∞ and y2 > 0. To eliminate y2 we integrate:
$$\frac{1}{\sqrt{2\pi n}\;\Gamma(\frac{n}{2})\, 2^{\frac{n}{2}}} \int_0^{\infty} y_2^{\frac{n-1}{2}}\, e^{-\frac{y_2}{2}\left(1+\frac{y_1^2}{n}\right)}\, dy_2 = \frac{\Gamma(\frac{n+1}{2})\, 2^{\frac{n+1}{2}}}{\sqrt{2\pi n}\;\Gamma(\frac{n}{2})\, 2^{\frac{n}{2}} \left(1+\frac{y_1^2}{n}\right)^{\frac{n+1}{2}}} = \frac{\Gamma(\frac{n+1}{2})}{\Gamma(\frac{n}{2})\sqrt{n\pi}} \cdot \frac{1}{\left(1+\frac{y_1^2}{n}\right)^{\frac{n+1}{2}}} \qquad \text{(f)}$$
with −∞ < y1 < ∞. Note that when n = 1 this gives $\frac{1}{\pi}\cdot\frac{1}{1+y_1^2}$ (Cauchy), and when n → ∞ the second part of the formula tends to $e^{-\frac{y_1^2}{2}}$, which is, up to the normalizing constant, the pdf of N(0, 1) [implying that $\frac{\Gamma(\frac{n+1}{2})}{\Gamma(\frac{n}{2})\sqrt{n\pi}} \underset{n\to\infty}{\longrightarrow} \frac{1}{\sqrt{2\pi}}$, why?].


Due to the symmetry of the distribution [f(y) = f(−y)] its mean is zero
(when it exists, i.e. when n ≥ 2).
To compute its variance:
$$Var(Y) = E(Y^2) = \frac{\Gamma(\frac{n+1}{2})}{\Gamma(\frac{n}{2})\sqrt{n\pi}} \int_{-\infty}^{\infty} \frac{(y^2 + n - n)\; dy}{\left(1+\frac{y^2}{n}\right)^{\frac{n+1}{2}}} = n\cdot\frac{\Gamma(\frac{n+1}{2})}{\Gamma(\frac{n}{2})\sqrt{n\pi}} \cdot \frac{\Gamma(\frac{n-2}{2})\sqrt{n\pi}}{\Gamma(\frac{n-1}{2})} - n = n\cdot\frac{\frac{n-1}{2}}{\frac{n-2}{2}} - n = \frac{n}{n-2} \qquad \text{(variance)}$$
for n ≥ 3 (for n = 1 and 2 the variance is infinite).
Note that when n ≥ 30 the t-distribution can be closely approximated by
N (0, 1).
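The construction (and the n/(n−2) variance) can be illustrated numerically; a minimal sketch with an arbitrary n and sample size, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(1)
n, size = 7, 500_000

x1 = rng.standard_normal(size)              # X1 ~ N(0, 1)
x2 = rng.chisquare(df=n, size=size)         # X2 ~ chi-square with n df
y = x1 / np.sqrt(x2 / n)                    # should follow t with n df

print(y.mean())                 # ~0, by symmetry
print(y.var(), n / (n - 2))     # both ~1.4
```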
5. And finally, we introduce Fisher's F-distribution (notation: F_{n,m}, where n and m are its two parameters, also referred to as 'degrees of freedom'), defined by
$$Y_1 = \frac{X_1 / n}{X_2 / m}$$
where X1 and X2 are independent, both having the chi-square distribution, with degrees of freedom n and m, respectively.
First we solve for x2 = y2 and x1 = (n/m) y1 y2 ⇒ the Jacobian equals (n/m) y2. Then we substitute into
$$f(x_1, x_2) = \frac{x_1^{\frac{n}{2}-1}\, e^{-\frac{x_1}{2}}}{\Gamma(\frac{n}{2})\, 2^{\frac{n}{2}}} \cdot \frac{x_2^{\frac{m}{2}-1}\, e^{-\frac{x_2}{2}}}{\Gamma(\frac{m}{2})\, 2^{\frac{m}{2}}}$$
and multiply by this Jacobian to get
$$\frac{(\frac{n}{m})^{\frac{n}{2}}\; y_1^{\frac{n}{2}-1}\; y_2^{\frac{n+m}{2}-1}\; e^{-\frac{y_2}{2}\left(1+\frac{n}{m}y_1\right)}}{\Gamma(\frac{n}{2})\,\Gamma(\frac{m}{2})\, 2^{\frac{n+m}{2}}}$$
with y1 > 0 and y2 > 0. Integrating over y2 (from 0 to ∞) yields the following formula for the corresponding pdf
$$f(y_1) = \frac{\Gamma(\frac{n+m}{2})}{\Gamma(\frac{n}{2})\,\Gamma(\frac{m}{2})} \left(\frac{n}{m}\right)^{\frac{n}{2}} \cdot \frac{y_1^{\frac{n}{2}-1}}{\left(1+\frac{n}{m}y_1\right)^{\frac{n+m}{2}}}$$
for y1 > 0.
We can also find
$$E(Y) = \frac{\Gamma(\frac{n+m}{2})}{\Gamma(\frac{n}{2})\,\Gamma(\frac{m}{2})} \left(\frac{n}{m}\right)^{\frac{n}{2}} \int_0^{\infty} \frac{y^{\frac{n}{2}}\; dy}{\left(1+\frac{n}{m}y\right)^{\frac{n+m}{2}}} = \frac{m}{m-2} \qquad \text{(mean)}$$
for m ≥ 3 (the mean is infinite for m = 1 and 2).
Similarly $E(Y^2) = \frac{(n+2)\, m^2}{(m-2)(m-4)\, n}$ ⇒
$$Var(Y) = \frac{(n+2)\, m^2}{(m-2)(m-4)\, n} - \frac{m^2}{(m-2)^2} = \frac{m^2}{(m-2)^2}\left[\frac{(n+2)(m-2)}{(m-4)\, n} - 1\right] = \frac{2\, m^2\, (n+m-2)}{(m-2)^2\, (m-4)\, n} \qquad \text{(variance)}$$
for m ≥ 5 [infinite for m = 1, 2, 3 and 4].
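These two moments can be checked against SciPy (a sketch; the degrees of freedom are arbitrary, with m > 4 so that both moments exist):

```python
from scipy.stats import f

n, m = 6, 9
mean, var = f.stats(n, m, moments="mv")    # shape parameters are dfn = n, dfd = m

print(mean, m / (m - 2))                                         # both 9/7
print(var, 2 * m**2 * (n + m - 2) / ((m - 2)**2 * (m - 4) * n))  # both ~1.43
```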


Note that the distribution of $\frac{1}{Y}$ is obviously F_{m,n} [degrees of freedom reversed], also that $F_{1,m} \equiv \frac{\chi_1^2 / 1}{\chi_m^2 / m} \equiv \frac{Z^2}{\chi_m^2 / m} \equiv t_m^2$, and finally when both n and m are large (say > 30) then Y is approximately normal $N\!\left(1, \sqrt{\frac{2(n+m)}{n\cdot m}}\right)$.


The last assertion can be proven by introducing $U = \sqrt{m}\,(Y - 1)$ and getting its pdf: (i) $y = 1 + \frac{u}{\sqrt{m}}$, (ii) substituting:
$$\frac{\Gamma(\frac{n+m}{2})}{\Gamma(\frac{n}{2})\,\Gamma(\frac{m}{2})} \left(\frac{n}{m}\right)^{\frac{n}{2}} \cdot \frac{\left(1+\frac{u}{\sqrt{m}}\right)^{\frac{n}{2}-1}}{\left(1+\frac{n}{m}+\frac{n}{m}\frac{u}{\sqrt{m}}\right)^{\frac{n+m}{2}}} \cdot \frac{1}{\sqrt{m}} \;\text{[the Jacobian]} \;=\; \frac{\Gamma(\frac{n+m}{2})}{\Gamma(\frac{n}{2})\,\Gamma(\frac{m}{2})\,\sqrt{m}} \cdot \frac{(\frac{n}{m})^{\frac{n}{2}}}{(1+\frac{n}{m})^{\frac{n+m}{2}}} \cdot \frac{\left(1+\frac{u}{\sqrt{m}}\right)^{\frac{n}{2}-1}}{\left(1+\frac{n}{n+m}\frac{u}{\sqrt{m}}\right)^{\frac{n+m}{2}}}$$
where $-\sqrt{m} < u < \infty$. Now, taking the limit of the last factor (since that is the only part containing u, the rest being only a normalizing constant) we get [this is actually easier with the corresponding logarithm, namely
$$\left(\tfrac{n}{2}-1\right)\ln\!\left(1+\tfrac{u}{\sqrt{m}}\right) - \tfrac{n+m}{2}\ln\!\left(1+\tfrac{n}{n+m}\tfrac{u}{\sqrt{m}}\right) = \left(\tfrac{n}{2}-1\right)\!\left(\tfrac{u}{\sqrt{m}} - \tfrac{u^2}{2m} + \cdots\right) - \tfrac{n}{2}\,\tfrac{u}{\sqrt{m}} + \tfrac{n^2}{2(n+m)}\,\tfrac{u^2}{2m} - \cdots \;\underset{n,m\to\infty}{\longrightarrow}\; -\frac{u^2}{4}\cdot\frac{1}{1+\frac{m}{n}} = -\frac{u^2\, n}{4(n+m)}$$
assuming that the ratio $\frac{m}{n}$ remains finite]. This implies that the limiting pdf is $C\cdot e^{-\frac{u^2 n}{4(n+m)}}$, where C is a normalizing constant (try to establish its value). The limiting distribution is thus, obviously, $N\!\left(0, \sqrt{\frac{2(n+m)}{n}}\right)$. Since this is the (approximate) distribution of U, $Y = \frac{U}{\sqrt{m}} + 1$ must be also (approximately) normal, with the mean of 1 and the standard deviation of $\sqrt{\frac{2(n+m)}{n\cdot m}}$. □
mate) distribution of U, Y =

We will see more examples of the F, t and χ2 distributions in the next chapter,
which discusses the importance of these distributions to Statistics, and the context
in which they usually arise.


Chapter 3 RANDOM SAMPLING
A random independent sample (RIS) of size n from a (specific) distribution
is a collection of n ind