EXPECTATIONS AND MOMENTS OF RANDOM VARIABLES
Recall that Y and X are independent, written Y ⊥⊥ X, if

F_{YX}(y, x) = F_Y(y) F_X(x) for all y, x.
In fact, we also work with the equivalent density version

f(y, x) = f(y) f(x) for all y, x,

or with the equivalent conditional statements

f_{Y|X}(y|x) = f(y) for all y,
f_{X|Y}(x|y) = f(x) for all x.
If Y ⊥⊥ X, then g(X) ⊥⊥ h(Y) for any measurable functions g and h.
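This is easy to illustrate by simulation. A minimal Python sketch (the particular distributions and the choices g(x) = e^x, h(y) = √y are arbitrary; note that a near-zero sample correlation is only a consequence of independence, not a proof of it):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)          # X ~ N(0, 1)
y = rng.exponential(size=100_000)     # Y ~ Exp(1), drawn independently of X
gx, hy = np.exp(x), np.sqrt(y)        # g(X) = e^X, h(Y) = sqrt(Y)
# Independence of g(X) and h(Y) implies zero correlation between them:
print(np.corrcoef(gx, hy)[0, 1])      # approximately 0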
We can generalise the notion of independence to multiple random variables. Thus Y, X, and Z are mutually independent if

f(y, x, z) = f(y) f(x) f(z) for all y, x, z,
f(y, x) = f(y) f(x) for all y, x,
f(x, z) = f(x) f(z) for all x, z,
f(y, z) = f(y) f(z) for all y, z.
4.1.3 Examples of Multivariate Distributions

Multivariate Normal
We say that X = (X_1, X_2, ..., X_k)′ ∼ MVN_k(μ, Σ) when

f(x) = (2π)^{-k/2} [det(Σ)]^{-1/2} exp( -(1/2) (x − μ)′ Σ^{-1} (x − μ) ),

where Σ is a k × k covariance matrix and det(Σ) is the determinant of Σ.
Theorem 8 (a) If X ∼ MVN_k(μ, Σ) then X_i ∼ N(μ_i, σ_ii) (this is shown by integration of the joint density with respect to the other variables).
(b) If we partition X = (X_1, X_2), the conditional distributions X_1|X_2 and X_2|X_1 are Normal too.
(c) Σ is diagonal if and only if X_1, X_2, ..., X_k are mutually independent. In this case det(Σ) = σ_11 σ_22 ··· σ_kk and (x − μ)′ Σ^{-1} (x − μ) = Σ_i (x_i − μ_i)²/σ_ii, so that the joint density factorizes into the product of the k univariate normal densities, f(x) = f(x_1) f(x_2) ··· f(x_k).
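The factorisation in (c) is easy to check numerically. A minimal sketch using scipy (the values of μ, the diagonal of Σ, and the evaluation point are arbitrary choices):

import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([1.0, -2.0, 0.5])
sigma2 = np.array([2.0, 0.5, 1.5])            # diagonal of Σ
joint = multivariate_normal(mean=mu, cov=np.diag(sigma2))

x = np.array([0.3, -1.0, 2.0])                # arbitrary evaluation point
lhs = joint.pdf(x)                            # joint density f(x)
rhs = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))  # product of marginals
print(lhs, rhs)                               # the two agree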
4.1.4 More on Conditional Distributions

We now consider the relationship between two, or more, r.v. when they are not independent. In this case, the conditional density f_{Y|X} and c.d.f. F_{Y|X} in general vary with the conditioning point x. Likewise for the conditional mean E(Y|X), conditional median M(Y|X), conditional variance V(Y|X), conditional c.f. E(e^{itY}|X), and other functionals, all of which characterize the relationship between Y and X. Note that this is a directional concept, unlike covariance, and so for example E(Y|X) can be very different from E(X|Y).

Regression Models:
We start with a random vector (Y, X). For any such random vector we can write

Y = E(Y|X) + [Y − E(Y|X)] = m(X) + ε,

where m(X) = E(Y|X) is the systematic part and ε = Y − E(Y|X) is the random part.
By construction ε satisfies E(ε|X) = 0, but ε is not necessarily independent of X. For example, Var(ε|X) = Var(Y − E(Y|X)|X) = Var(Y|X) = σ²(X) can be expected to vary with X as much as m(X) = E(Y|X). A convenient and popular simplification is to assume that

E(Y|X) = α + βX,
Var(Y|X) = σ².
For example, in the bivariate normal distribution Y|X has exactly this form: the conditional mean is linear in X, the conditional variance is constant, and in fact ε ⊥⊥ X.
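A small simulation makes the decomposition concrete. In this sketch Y is generated with a linear conditional mean and normal errors, so E(Y|X) = α + βX holds by construction (the values of α, β and σ are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma = 1.0, 2.0, 0.5
x = rng.normal(size=200_000)
y = alpha + beta * x + sigma * rng.normal(size=x.size)  # Y|X ~ N(alpha + beta*X, sigma^2)
eps = y - (alpha + beta * x)                            # eps = Y - E(Y|X)

# E(eps|X) = 0: the mean of eps is near zero even within a narrow band of x values
band = (x > 0.5) & (x < 1.0)
print(eps.mean(), eps[band].mean())   # both approximately 0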
We have the following result about conditional expectations.

Theorem 9 (1) E(Y) = E[E(Y|X)]
(2) E(Y|X) minimizes E[(Y − g(X))²] over all measurable functions g(·)
(3) Var(Y) = E[Var(Y|X)] + Var[E(Y|X)]
Proof. (1) Write f_{YX}(y, x) = f_{Y|X}(y|x) f_X(x); then we have

E(Y) = ∫ y f_Y(y) dy = ∫ y ( ∫ f_{YX}(y, x) dx ) dy = ∫∫ y f_{Y|X}(y|x) f_X(x) dx dy
= ∫ ( ∫ y f_{Y|X}(y|x) dy ) f_X(x) dx = ∫ E(Y|X = x) f_X(x) dx = E(E(Y|X)).
(2) E[(Y − g(X))²] = E{[Y − E(Y|X)] + [E(Y|X) − g(X)]}²
= E[(Y − E(Y|X))²] + 2E{[Y − E(Y|X)][E(Y|X) − g(X)]} + E[(E(Y|X) − g(X))²].

Now E[Y E(Y|X)] = E[E(Y|X)²] and E[Y g(X)] = E[E(Y|X) g(X)] (both by iterated expectations), so the cross term vanishes and we get

E[(Y − g(X))²] = E[(Y − E(Y|X))²] + E[(E(Y|X) − g(X))²] ≥ E[(Y − E(Y|X))²].
(3) Var(Y) = E[Y − E(Y)]² = E[(Y − E(Y|X))²] + E[(E(Y|X) − E(Y))²] + 2E{[Y − E(Y|X)][E(Y|X) − E(Y)]}.

The first term is E[(Y − E(Y|X))²] = E{E[(Y − E(Y|X))² | X]} = E[Var(Y|X)]. The second term is E[(E(Y|X) − E(Y))²] = Var[E(Y|X)]. The third term is zero, as ε = Y − E(Y|X) is such that E(ε|X) = 0, and E(Y|X) − E(Y) is measurable with respect to X. ¥
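Both parts (1) and (3) of Theorem 9 can be verified by Monte Carlo. A minimal sketch, with m(X) = X² and conditional standard deviation 1 + |X| chosen purely for illustration:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500_000)
m = x**2                          # E(Y|X) = X^2
s = 1.0 + np.abs(x)               # sd(Y|X) = 1 + |X|
y = m + s * rng.normal(size=x.size)

print(y.mean(), m.mean())                    # (1): E(Y) = E[E(Y|X)]
print(y.var(), (s**2).mean() + m.var())      # (3): Var(Y) = E[Var(Y|X)] + Var[E(Y|X)]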
Covariance

Cov(X, Y) = E{[X − E(X)][Y − E(Y)]} = E(XY) − E(X) E(Y).

Note that if X or Y is a constant then Cov(X, Y) = 0. Also

Cov(aX + b, cY + d) = ac Cov(X, Y).
An alternative measure of association is given by the correlation coefficient

ρ_XY = Cov(X, Y) / (σ_X σ_Y).

Note that

ρ_{aX+b, cY+d} = sign(a) × sign(c) × ρ_XY.
If E (Y |X) = a = E (Y ) almost surely, then Cov (X, Y ) = 0. Also if X and Y are independent r.v. then Cov (X, Y ) = 0.
Both the covariance and the correlation of random variables X and Y are measures of a linear relationship of X and Y in the following sense: cov[X, Y] will be positive when (X − μ_X) and (Y − μ_Y) tend to have the same sign with high probability, and cov[X, Y] will be negative when (X − μ_X) and (Y − μ_Y) tend to have opposite signs with high probability. The actual magnitude of cov[X, Y], however, does not say much about how strong the linear relationship between X and Y is, because it also reflects the variability of X and Y. The correlation coefficient does not have this problem, as we divide the covariance by the product of the standard deviations. Furthermore, the correlation is unitless and −1 ≤ ρ ≤ 1.
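The affine-invariance property ρ_{aX+b, cY+d} = sign(a) × sign(c) × ρ_XY is easy to confirm numerically (a minimal sketch; the constants are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = 0.6 * x + rng.normal(size=x.size)    # correlated with x by construction

r1 = np.corrcoef(x, y)[0, 1]
r2 = np.corrcoef(-2.0 * x + 5.0, 3.0 * y - 1.0)[0, 1]
print(r1, r2)                            # r2 = sign(-2) * sign(3) * r1 = -r1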
These properties are very useful for evaluating the expected return and standard deviation of a portfolio. Assume r_a and r_b are the returns on assets A and B, and their variances are σ_a² and σ_b², respectively. Assume that we form a portfolio of the two assets with weights w_a and w_b, respectively. If the correlation of the returns of these assets is ρ, find the expected return and standard deviation of the portfolio.
If R_p is the return of the portfolio then R_p = w_a r_a + w_b r_b. The expected portfolio return is E[R_p] = w_a E[r_a] + w_b E[r_b]. The variance of the portfolio is

var[R_p] = var[w_a r_a + w_b r_b] = E[(w_a r_a + w_b r_b)²] − (E[w_a r_a + w_b r_b])²
= w_a² E[r_a²] + w_b² E[r_b²] + 2 w_a w_b E[r_a r_b] − w_a² (E[r_a])² − w_b² (E[r_b])² − 2 w_a w_b E[r_a] E[r_b]
= w_a² {E[r_a²] − (E[r_a])²} + w_b² {E[r_b²] − (E[r_b])²} + 2 w_a w_b {E[r_a r_b] − E[r_a] E[r_b]}
= w_a² var[r_a] + w_b² var[r_b] + 2 w_a w_b cov[r_a, r_b] = w_a² σ_a² + w_b² σ_b² + 2 w_a w_b ρ σ_a σ_b.
In a vector format we have

var[R_p] = (w_a  w_b) [ σ_a²       ρ σ_a σ_b ] [ w_a ]
                      [ ρ σ_a σ_b  σ_b²      ] [ w_b ]  = w′ Σ w,

where w = (w_a, w_b)′ and Σ is the covariance matrix of the returns.
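The scalar formula and the vector form give the same number, as the following sketch confirms (the weights, volatilities and correlation are arbitrary illustrative values):

import numpy as np

w = np.array([0.6, 0.4])                    # portfolio weights w_a, w_b
sd = np.array([0.2, 0.3])                   # sigma_a, sigma_b
rho = 0.25
cov = rho * sd[0] * sd[1]
Sigma = np.array([[sd[0]**2, cov],
                  [cov, sd[1]**2]])         # 2 x 2 covariance matrix of returns

var_scalar = (w[0]*sd[0])**2 + (w[1]*sd[1])**2 + 2*w[0]*w[1]*cov
var_vector = w @ Sigma @ w                  # w' Sigma w
print(var_scalar, var_vector)               # identical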
From the above example we can see that var[aX + bY] = a² var[X] + b² var[Y] + 2ab cov[X, Y] for random variables X and Y and constants a and b. In fact we can generalize the formula above to several random variables X_1, X_2, ..., X_n and constants a_1, a_2, ..., a_n, i.e.

var[a_1 X_1 + a_2 X_2 + ... + a_n X_n] = Σ_{i=1}^{n} a_i² var[X_i] + 2 Σ_{i<j} a_i a_j cov[X_i, X_j].
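In matrix notation this is var[a′X] = a′Σa, which can also be checked numerically (a minimal sketch; Σ is built as BB′ to guarantee a valid covariance matrix):

import numpy as np

rng = np.random.default_rng(4)
n = 4
B = rng.normal(size=(n, n))
Sigma = B @ B.T                              # a valid (positive semi-definite) covariance matrix
a = rng.normal(size=n)

x = rng.multivariate_normal(np.zeros(n), Sigma, size=500_000)
print(np.var(x @ a), a @ Sigma @ a)          # empirical vs theoretical variance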
4.2 Inequalities

This section gives some inequalities that are useful in establishing a variety of probabilistic results.

4.2.1 Markov

Let Y be a random variable and consider a function g(·) such that g(y) ≥ 0 for all y ∈ R. Assume that E[g(Y)] exists. Then for any c > 0,

P[g(Y) ≥ c] ≤ E[g(Y)] / c.

Proof: Assume that Y is a continuous random variable (the discrete case follows analogously) with p.d.f. f(·). Define A_1 = {y | g(y) ≥ c} and A_2 = {y | g(y) < c}. Then

E[g(Y)] = ∫_{A_1} g(y) f(y) dy + ∫_{A_2} g(y) f(y) dy ≥ ∫_{A_1} g(y) f(y) dy ≥ c ∫_{A_1} f(y) dy = c P[g(Y) ≥ c],

and dividing by c gives the result. ¥
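A quick simulation shows how conservative the Markov bound can be (here g(y) = y on a nonnegative r.v.; the exponential distribution and the threshold c = 3 are arbitrary choices):

import numpy as np

rng = np.random.default_rng(5)
y = rng.exponential(scale=1.0, size=1_000_000)   # Y >= 0 with E[Y] = 1
c = 3.0
print((y >= c).mean())    # exact tail P(Y >= c) = e^{-3}, about 0.0498
print(y.mean() / c)       # Markov bound E[Y]/c, about 0.333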
4.2.2 Chebychev's Inequality

P[|X − E(X)| ≥ η] ≤ Var(X) / η²,

or alternatively

P[|X − E(X)| ≥ r σ_X] ≤ 1 / r².

To prove the above, assume that E(X) = 0 and compare 1(|X| ≥ η) with X²/η². Clearly 1(|X| ≥ η) ≤ X²/η², and it follows that E[1(|X| ≥ η)] ≤ E(X²)/η² ⇒ P[|X| ≥ η] ≤ Var(X)/η². Alternatively, apply Markov's inequality by setting g(x) = [x − E(X)]² and c = r² Var(X). ¥
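The same kind of numerical comparison for Chebychev's inequality (standard normal X and r = 2 are arbitrary choices):

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=1_000_000)
r = 2.0
print((np.abs(x - x.mean()) >= r * x.std()).mean())  # exact: P(|Z| >= 2) is about 0.0455
print(1 / r**2)                                      # Chebychev bound: 0.25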
4.2.3 Minkowski

Let Y and Z be random variables such that E(|Y|^α) < ∞ and E(|Z|^α) < ∞ for some 1 ≤ α < ∞. Then

[E(|Y + Z|^α)]^{1/α} ≤ [E(|Y|^α)]^{1/α} + [E(|Z|^α)]^{1/α}.

For α = 1 we have the triangle inequality.
4.2.4 Triangle
E |X + Y | ≤ E|X| + E|Y |.
4.2.5 Cauchy-Schwarz

E²(XY) ≤ E(X²) E(Y²),

and in its discrete (sum) form

(Σ_j a_j b_j)² ≤ (Σ_j a_j²)(Σ_j b_j²).

Proof: Let 0 ≤ h(t) = E[(tX − Y)²] = t² E(X²) + E(Y²) − 2t E(XY). Then h(t) is a quadratic function in t which is increasing as t → ±∞. It has a unique minimum where h′(t) = 0 ⇒ 2t E(X²) − 2E(XY) = 0 ⇒ t = E(XY)/E(X²). Hence

0 ≤ h( E(XY)/E(X²) ) = E(Y²) − E²(XY)/E(X²),

which rearranges to E²(XY) ≤ E(X²) E(Y²). ¥
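A numerical check of E²(XY) ≤ E(X²)E(Y²) on simulated data (any joint distribution would do; here Y is built to be positively correlated with X):

import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100_000)
y = 0.8 * x + rng.normal(size=x.size)
print((x * y).mean()**2)               # E^2(XY)
print((x**2).mean() * (y**2).mean())   # E(X^2) E(Y^2), always at least as large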
4.2.6 Hölder's Inequality

For any p, q > 1 satisfying 1/p + 1/q = 1 we have

E|XY| ≤ [E(|X|^p)]^{1/p} [E(|Y|^q)]^{1/q}.

The Cauchy-Schwarz inequality corresponds to the case p = q = 2.
4.2.7 Jensen's Inequality

Let X be a random variable with mean E[X], and let g(·) be a convex function. Then

E[g(X)] ≥ g(E[X]).

Here a continuous function g(·) with domain and counterdomain the real line is called convex if for any x_0 on the real line, there exists a line which goes through the point (x_0, g(x_0)) and lies on or under the graph of the function g(·). Also, if g″(x_0) ≥ 0 for all x_0, then g(·) is convex.
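For instance, with the convex function g(x) = e^x and X standard normal, Jensen gives E[e^X] ≥ e^{E[X]} = 1; in fact E[e^X] = e^{1/2} ≈ 1.65. A one-line Monte Carlo check:

import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=1_000_000)
print(np.exp(x).mean(), np.exp(x.mean()))   # about 1.6487 (= e^0.5) versus about 1.0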
Part II
Statistical Inference