5.2 Expected Values, Covariance, and Correlation
Any function h(X) of a single rv X is itself a random variable. However, we saw that to compute E[h(X)], it is not necessary to obtain the probability distribution of h(X). Instead, E[h(X)] is computed as a weighted average of h(x) values, where the weight function is the pmf p(x) or pdf f(x) of X. A similar result holds for a function h(X, Y) of two jointly distributed random variables.
PROPOSITION  Let X and Y be jointly distributed rv's with pmf p(x, y) or pdf f(x, y) according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted by E[h(X, Y)] or μ_{h(X,Y)}, is given by

$$
E[h(X, Y)] =
\begin{cases}
\displaystyle\sum_x \sum_y h(x, y) \cdot p(x, y) & \text{if } X \text{ and } Y \text{ are discrete} \\[8pt]
\displaystyle\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) \cdot f(x, y)\, dx\, dy & \text{if } X \text{ and } Y \text{ are continuous}
\end{cases}
$$
EXAMPLE 5.13  Five friends have purchased tickets to a certain concert. If the tickets are for seats 1–5 in a particular row and the tickets are randomly distributed among the five, what is the expected number of seats separating any particular two of the five? Let X and Y denote the seat numbers of the first and second individuals, respectively. Possible (X, Y) pairs are {(1, 2), (1, 3), …, (5, 4)}, and the joint pmf of (X, Y) is

$$
p(x, y) =
\begin{cases}
\dfrac{1}{20} & x = 1, \ldots, 5;\; y = 1, \ldots, 5;\; x \ne y \\[4pt]
0 & \text{otherwise}
\end{cases}
$$

The number of seats separating the two individuals is h(X, Y) = |X − Y| − 1. The accompanying table gives h(x, y) for each possible (x, y) pair.

              x
  h(x, y)   1   2   3   4   5
  y   1     —   0   1   2   3
      2     0   —   0   1   2
      3     1   0   —   0   1
      4     2   1   0   —   0
      5     3   2   1   0   —

Thus

$$
E[h(X, Y)] = \sum_{\substack{(x, y) \\ x \ne y}} h(x, y) \cdot p(x, y)
= \sum_{x=1}^{5} \sum_{\substack{y=1 \\ y \ne x}}^{5} (|x - y| - 1) \cdot \frac{1}{20} = 1 \qquad ■
$$
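The double sum above is easy to verify by brute force. A minimal sketch in Python (an aside for checking, not part of the original example):

```python
# Brute-force check of Example 5.13: E[h(X, Y)] for h(x, y) = |x - y| - 1,
# with (X, Y) uniform over the 20 ordered pairs of distinct seats 1..5.
pairs = [(x, y) for x in range(1, 6) for y in range(1, 6) if x != y]
expected = sum((abs(x - y) - 1) * (1 / 20) for x, y in pairs)
```

Enumerating the pairs confirms the value 1 obtained in the example.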
EXAMPLE 5.14  In Example 5.5, the joint pdf of the amount X of almonds and amount Y of cashews in a 1-lb can of nuts was

$$
f(x, y) =
\begin{cases}
24xy & 0 \le x \le 1,\; 0 \le y \le 1,\; x + y \le 1 \\
0 & \text{otherwise}
\end{cases}
$$
214 ChapteR 5 Joint probability Distributions and Random Samples
If 1 lb of almonds costs the company $1.50, 1 lb of cashews costs $2.25, and 1 lb of peanuts costs $.75, then the total cost of the contents of a can is

h(X, Y) = (1.5)X + (2.25)Y + (.75)(1 − X − Y) = .75 + .75X + 1.5Y

(since 1 − X − Y of the weight consists of peanuts). The expected total cost is

$$
E[h(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) \cdot f(x, y)\, dx\, dy
= \int_0^1 \int_0^{1-x} (.75 + .75x + 1.5y) \cdot 24xy \, dy\, dx = 1.65 \qquad ■
$$
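The double integral over the triangular region can be checked numerically. A sketch using SciPy's `dblquad` (SciPy is an assumption of this aside, not something the text requires):

```python
# Numerical check of Example 5.14: E[h(X, Y)] as a double integral of
# (.75 + .75x + 1.5y) * 24xy over the triangle 0 <= y <= 1 - x, 0 <= x <= 1.
from scipy.integrate import dblquad

# dblquad integrates func(y, x); the y-limits may depend on x
value, err = dblquad(
    lambda y, x: (0.75 + 0.75 * x + 1.5 * y) * 24 * x * y,
    0, 1,                 # x from 0 to 1
    lambda x: 0,          # y lower limit
    lambda x: 1 - x,      # y upper limit
)
```

The computed value agrees with the 1.65 obtained analytically.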
The method of computing the expected value of a function h(X₁, …, Xₙ) of n random variables is similar to that for two random variables. If the Xᵢ's are discrete, E[h(X₁, …, Xₙ)] is an n-dimensional sum; if the Xᵢ's are continuous, it is an n-dimensional integral.
Covariance
When two random variables X and Y are not independent, it is frequently of interest to assess how strongly they are related to one another.
DEFINITION  The covariance between two rv's X and Y is

$$
\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] =
\begin{cases}
\displaystyle\sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, p(x, y) & X, Y \text{ discrete} \\[8pt]
\displaystyle\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f(x, y)\, dx\, dy & X, Y \text{ continuous}
\end{cases}
$$
That is, since X − μ_X and Y − μ_Y are the deviations of the two variables from their respective mean values, the covariance is the expected product of deviations. Note that Cov(X, X) = E[(X − μ_X)²] = V(X).
The rationale for the definition is as follows. Suppose X and Y have a strong positive relationship to one another, by which we mean that large values of X tend to occur with large values of Y and small values of X with small values of Y. Then most of the probability mass or density will be associated with (x − μ_X) and (y − μ_Y) either both positive (both X and Y above their respective means) or both negative, so the product (x − μ_X)(y − μ_Y) will tend to be positive. Thus for a strong positive relationship, Cov(X, Y) should be quite positive. For a strong negative relationship, the signs of (x − μ_X) and (y − μ_Y) will tend to be opposite, yielding a negative product. Thus for a strong negative relationship, Cov(X, Y) should be quite negative. If X and Y are not strongly related, positive and negative products will tend to cancel one another, yielding a covariance near 0. Figure 5.4 illustrates the different possibilities. The covariance depends on both the set of possible pairs and the probabilities. In Figure 5.4, the probabilities could be changed without altering the set of possible pairs, and this could drastically change the value of Cov(X, Y).
Figure 5.4  p(x, y) = 1/10 for each of ten pairs corresponding to indicated points: (a) positive covariance; (b) negative covariance; (c) covariance near zero
EXAMPLE 5.15  The joint and marginal pmf's for X = automobile policy deductible amount and Y = homeowner policy deductible amount were given in the table of Example 5.1 (with y taking the values 500, 1000, and 5000), from which μ_X = Σ x p_X(x) = 485 and μ_Y = 1125. Therefore,

$$
\operatorname{Cov}(X, Y) = \sum_{(x, y)} (x - 485)(y - 1125)\, p(x, y) = 136{,}875 \qquad ■
$$
The following shortcut formula for Cov(X, Y) simplifies the computations.

PROPOSITION

Cov(X, Y) = E(XY) − μ_X · μ_Y

According to this formula, no intermediate subtractions are necessary; only at the end of the computation is μ_X · μ_Y subtracted from E(XY). The proof involves expanding (X − μ_X)(Y − μ_Y) and then carrying the summation or integration through to each individual term.
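That expansion takes only a few lines, using nothing beyond linearity of expectation:

```latex
\begin{aligned}
\operatorname{Cov}(X, Y) &= E[(X - \mu_X)(Y - \mu_Y)] \\
&= E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] \\
&= E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y \\
&= E(XY) - \mu_X \cdot \mu_Y
\end{aligned}
```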
EXAMPLE 5.16  The joint and marginal pdf's of X = amount of almonds and Y = amount of cashews were

$$
f(x, y) =
\begin{cases}
24xy & 0 \le x \le 1,\; 0 \le y \le 1,\; x + y \le 1 \\
0 & \text{otherwise}
\end{cases}
$$

$$
f_X(x) =
\begin{cases}
12x(1 - x)^2 & 0 \le x \le 1 \\
0 & \text{otherwise}
\end{cases}
$$
with f_Y(y) obtained by replacing x by y in f_X(x). It is easily verified that μ_X = μ_Y = 2/5, and

$$
E(XY) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x, y)\, dx\, dy
= \int_0^1 \int_0^{1-x} xy \cdot 24xy \, dy\, dx
= \int_0^1 8x^2 (1 - x)^3\, dx = \frac{2}{15}
$$

Thus Cov(X, Y) = 2/15 − (2/5)(2/5) = 2/15 − 4/25 = −2/75. A negative covariance is reasonable here because more almonds in the can implies fewer cashews. ■
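The three integrals in this example can be checked with the shortcut formula numerically. A sketch using SciPy (an assumption of this aside):

```python
# Numerical check of Example 5.16: Cov(X, Y) = E(XY) - E(X)E(Y) for
# f(x, y) = 24xy on the triangle 0 <= y <= 1 - x, 0 <= x <= 1.
from scipy.integrate import dblquad

f = lambda y, x: 24 * x * y    # joint pdf, in dblquad's (y, x) argument order
e_xy, _ = dblquad(lambda y, x: x * y * f(y, x), 0, 1, lambda x: 0, lambda x: 1 - x)
e_x, _ = dblquad(lambda y, x: x * f(y, x), 0, 1, lambda x: 0, lambda x: 1 - x)
e_y, _ = dblquad(lambda y, x: y * f(y, x), 0, 1, lambda x: 0, lambda x: 1 - x)
cov = e_xy - e_x * e_y   # shortcut formula: E(XY) - mu_X * mu_Y
```

The result agrees with −2/75 ≈ −.0267 and with μ_X = μ_Y = 2/5.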
It might appear that the relationship in the insurance example is quite strong since Cov(X, Y) = 136,875, whereas Cov(X, Y) = −2/75 in the nut example would seem to imply quite a weak relationship. Unfortunately, the covariance has a serious defect that makes it impossible to interpret a computed value. In the insurance example, suppose we had expressed the deductible amount in cents rather than in dollars. Then 100X would replace X, 100Y would replace Y, and the resulting covariance would be Cov(100X, 100Y) = (100)(100)Cov(X, Y) = 1,368,750,000. If, on the other hand, the deductible amount had been expressed in hundreds of dollars, the computed covariance would have been (.01)(.01)(136,875) = 13.6875. The defect of covariance is that its computed value depends critically on the units of measurement. Ideally, the choice of units should have no effect on a measure of strength of relationship. This is achieved by scaling the covariance.
Correlation
DEFINITION  The correlation coefficient of X and Y, denoted by Corr(X, Y), ρ_{X,Y}, or just ρ, is defined by

$$
\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}
$$
EXAMPLE 5.17  It is easily verified that in the insurance scenario of Example 5.15, E(X²) = 353,500, σ_X² = 353,500 − (485)² = 118,275, σ_X = 343.911, E(Y²) = 2,987,500, σ_Y² = 2,987,500 − (1125)² = 1,721,875, and σ_Y = 1312.202. This gives

$$
\rho = \frac{136{,}875}{(343.911)(1312.202)} = .303 \qquad ■
$$
The following proposition shows that ρ remedies the defect of Cov(X, Y) and also suggests how to recognize the existence of a strong (linear) relationship.

PROPOSITION
1. If a and c are either both positive or both negative,
   Corr(aX + b, cY + d) = Corr(X, Y)
2. For any two rv's X and Y, −1 ≤ Corr(X, Y) ≤ 1. The two variables are said to be uncorrelated when ρ = 0.
Statement 1 says precisely that the correlation coefficient is not affected by a linear change in the units of measurement (if, say, X = temperature in °C, then 9X/5 + 32 = temperature in °F). According to Statement 2, the strongest possible positive relationship is evidenced by ρ = +1, the strongest possible negative relationship corresponds to ρ = −1, and ρ = 0 indicates the absence of a relationship. The proof of the first statement is sketched in Exercise 35, and that of the second appears in Supplementary Exercise 87 at the end of the chapter. For descriptive purposes, the relationship will be described as strong if |ρ| ≥ .8, moderate if .5 < |ρ| < .8, and weak if |ρ| ≤ .5.
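The invariance in Statement 1 holds equally for the sample correlation coefficient, which makes it easy to illustrate numerically. A sketch using NumPy and a wholly hypothetical data set (the variable names and the simulated temperatures are illustrative assumptions, not data from the text):

```python
# Sample analogue of Statement 1: correlation is unchanged by the linear
# rescaling x -> 9x/5 + 32 (Celsius to Fahrenheit) applied to both variables.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(20, 5, size=1000)        # hypothetical temperatures, deg C
y = x + rng.normal(0, 3, size=1000)     # a positively correlated companion

r_before = np.corrcoef(x, y)[0, 1]
r_after = np.corrcoef(9 * x / 5 + 32, 9 * y / 5 + 32)[0, 1]  # both in deg F
```

Here `r_before` and `r_after` agree to within floating-point error, exactly as Statement 1 predicts.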
If we think of p(x, y) or f(x, y) as prescribing a mathematical model for how the two numerical variables X and Y are distributed in some population (height and weight, verbal SAT score and quantitative SAT score, etc.), then ρ is a population characteristic or parameter that measures how strongly X and Y are related in the population. In Chapter 12, we will consider taking a sample of pairs (x₁, y₁), …, (xₙ, yₙ) from the population. The sample correlation coefficient r will then be defined and used to make inferences about ρ.
The correlation coefficient ρ is actually not a completely general measure of the strength of a relationship.

PROPOSITION
1. If X and Y are independent, then ρ = 0, but ρ = 0 does not imply independence.
2. ρ = 1 or −1 iff Y = aX + b for some numbers a and b with a ≠ 0.

This proposition says that ρ is a measure of the degree of linear relationship between X and Y, and only when the two variables are perfectly related in a linear manner will ρ be as positive or negative as it can be. However, if |ρ| is much less than 1, there may still be a strong relationship between the two variables, just one that is not linear. And even if |ρ| is close to 1, it may be that the relationship is really nonlinear but can be well approximated by a straight line.
EXAMPLE 5.18  Let X and Y be discrete rv's with joint pmf

$$
p(x, y) =
\begin{cases}
.25 & (x, y) = (-4, 1),\ (4, -1),\ (2, 2),\ (-2, -2) \\
0 & \text{otherwise}
\end{cases}
$$

The points that receive positive probability mass are identified on the (x, y) coordinate system in Figure 5.5. It is evident from the figure that the value of X is completely determined by the value of Y and vice versa, so the two variables are completely dependent. However, by symmetry μ_X = μ_Y = 0 and E(XY) = (−4)(.25) + (−4)(.25) + (4)(.25) + (4)(.25) = 0. The covariance is then Cov(X, Y) = E(XY) − μ_X · μ_Y = 0 and thus ρ_{X,Y} = 0. Although there is perfect dependence, there is also complete absence of any linear relationship!
Figure 5.5  The population of pairs for Example 5.18
■
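The zero covariance in Example 5.18 can be confirmed directly from the definition; a minimal Python check (an aside, not part of the text):

```python
# Example 5.18: perfect dependence yet zero covariance (hence zero correlation).
pts = {(-4, 1): 0.25, (4, -1): 0.25, (2, 2): 0.25, (-2, -2): 0.25}

mu_x = sum(x * p for (x, y), p in pts.items())
mu_y = sum(y * p for (x, y), p in pts.items())
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pts.items())
```

The four products (x − μ_X)(y − μ_Y)p(x, y) cancel in pairs, so `cov` is exactly 0.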
A value of ρ near 1 does not necessarily imply that increasing the value of X causes Y to increase. It implies only that large X values are associated with large Y values. For example, in the population of children, vocabulary size and number of cavities are quite positively correlated, but it is certainly not true that cavities cause vocabulary to grow. Instead, the values of both these variables tend to increase as the value of age, a third variable, increases. For children of a fixed age, there is probably a low correlation between number of cavities and vocabulary size. In summary, association (a high correlation) is not the same as causation.
The Bivariate Normal Distribution
Just as the most useful univariate distribution in statistical practice is the normal distribution, the most useful joint distribution for two rv's X and Y is the bivariate normal distribution. The pdf is somewhat complicated:

$$
f(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x - \mu_1}{\sigma_1} \right)^2
- 2\rho \left( \frac{x - \mu_1}{\sigma_1} \right) \left( \frac{y - \mu_2}{\sigma_2} \right)
+ \left( \frac{y - \mu_2}{\sigma_2} \right)^2 \right] \right\}
$$

for −∞ < x < ∞ and −∞ < y < ∞.
A graph of this pdf, the density surface, appears in Figure 5.6. It follows (after some tricky integration) that the marginal distribution of X is normal with mean value μ₁ and standard deviation σ₁, and similarly the marginal distribution of Y is normal with mean μ₂ and standard deviation σ₂. The fifth parameter of the distribution is ρ, which can be shown to be the correlation coefficient between X and Y.

Figure 5.6  A graph of the bivariate normal pdf
It is not at all straightforward to integrate the bivariate normal pdf in order to calcu- late probabilities. Instead, selected software packages employ numerical integration techniques for this purpose.
EXAMPLE 5.19  Many students applying for college take the SAT, which for a few years consisted of three components: Critical Reading, Mathematics, and Writing. While some colleges used all three components to determine admission, many only looked at the first two (reading and math). Let X and Y denote the Critical Reading and Mathematics scores, respectively, for a randomly selected student. According to the College Board website, the population of students taking the exam in Fall 2012 had the following characteristics: μ₁ = 496, σ₁ = 114, μ₂ = 514, σ₂ = 117.

Suppose that X and Y have (approximately, since both variables are discrete) a bivariate normal distribution with correlation coefficient ρ = .25. The Matlab software package gives P(X ≤ 650, Y ≤ 650) = P(both scores are at most 650) = .8097. ■
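The same probability can be obtained in Python, since SciPy also integrates the bivariate normal cdf numerically (SciPy is an assumption of this aside; the text used Matlab):

```python
# Example 5.19: P(X <= 650, Y <= 650) for a bivariate normal with
# mu1 = 496, sigma1 = 114, mu2 = 514, sigma2 = 117, rho = .25.
from scipy.stats import multivariate_normal

rho = 0.25
cov = [[114**2, rho * 114 * 117],
       [rho * 114 * 117, 117**2]]   # off-diagonal entries are Cov(X, Y)
p = float(multivariate_normal(mean=[496, 514], cov=cov).cdf([650, 650]))
```

The result matches the .8097 reported in the example to numerical accuracy.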
It can also be shown that the conditional distribution of Y given that X = x is normal. This can be seen geometrically by slicing the density surface with a plane perpendicular to the (x, y) plane and passing through the value x on the x axis; the result is a normal curve sketched out on the slicing plane. The conditional mean value is

$$
\mu_{Y \cdot x} = \left( \mu_2 - \frac{\rho \mu_1 \sigma_2}{\sigma_1} \right) + \frac{\rho \sigma_2}{\sigma_1}\, x
= \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x - \mu_1),
$$

a linear function of x, and the conditional variance is σ²_{Y·x} = (1 − ρ²)σ₂². The closer the correlation coefficient is to 1 or −1, the less variability there is in the conditional distribution. Analogous results hold for the conditional distribution of X given that Y = y.
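Plugging in the parameters of Example 5.19 makes these formulas concrete. A short sketch, where the conditioning value x = 650 is a hypothetical score chosen purely for illustration:

```python
# Conditional distribution of Y given X = x for the bivariate normal of
# Example 5.19; x = 650 is an illustrative (hypothetical) conditioning value.
import math

mu1, s1, mu2, s2, rho = 496, 114, 514, 117, 0.25
x = 650
cond_mean = mu2 + rho * (s2 / s1) * (x - mu1)   # linear in x
cond_sd = s2 * math.sqrt(1 - rho**2)            # does not depend on x
```

With ρ = .25 the conditional standard deviation (about 113.3) is only slightly smaller than σ₂ = 117, reflecting the weak correlation.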
The bivariate normal distribution can be generalized to the multivariate normal distribution. Its density function is quite complicated, and the only way to write it compactly is to employ matrix notation. If a collection of variables has this distribution, then the marginal distribution of any single variable is normal, the conditional distribution of any single variable given values of the other variables is normal, the joint marginal distribution of any pair of variables is bivariate normal, and the joint marginal distribution of any subset of three or more of the variables is again multivariate normal.
EXERCISES  Section 5.2 (22–36)

22. An instructor has given a short quiz consisting of two parts. For a randomly selected student, let X = the number of points earned on the first part and Y = the number of points earned on the second part. Suppose that the joint pmf of X and Y is given in the accompanying table.

            y
   p(x, y)
   x   5    .04   .15   .20   .10

   a. If the score recorded in the grade book is the total number of points earned on the two parts, what is the expected recorded score E(X + Y)?
   b. If the maximum of the two scores is recorded, what is the expected recorded score?

23. The difference between the number of customers in line at the express checkout and the number in line at the super-express checkout in Exercise 3 is X₁ − X₂. Calculate the expected difference.

24. Six individuals, including A and B, take seats around a circular table in a completely random fashion. Suppose the seats are numbered 1, …, 6. Let X = A's seat number and Y = B's seat number. If A sends a written message around the table to B in the direction in which they are closest, how many individuals (including A and B) would you expect to handle the message?

25. A surveyor wishes to lay out a square region with each side having length L. However, because of a measurement error, he instead lays out a rectangle in which the north–south sides both have length X and the east–west sides both have length Y. Suppose that X and Y are independent and that each is uniformly distributed on the interval [L − A, L + A] (where 0 < A < L). What is the expected area of the resulting rectangle?

26. Consider a small ferry that can accommodate cars and buses. The toll for cars is $3, and the toll for buses is $10. Let X and Y denote the number of cars and buses, respectively, carried on a single trip. Suppose the joint distribution of X and Y is as given in the table of Exercise 7. Compute the expected revenue from a single trip.

27. Annie and Alvie have agreed to meet for lunch between noon (0:00 p.m.) and 1:00 p.m. Denote Annie's arrival time by X, Alvie's by Y, and suppose X and Y are independent with pdf's

   f_X(x) = 3x² for 0 ≤ x ≤ 1, and 0 otherwise
   f_Y(y) = 2y for 0 ≤ y ≤ 1, and 0 otherwise

   What is the expected amount of time that the one who arrives first must wait for the other person? [Hint: h(X, Y) = |X − Y|.]

28. Show that if X and Y are independent rv's, then E(XY) = E(X) · E(Y). Then apply this in Exercise 25. [Hint: Consider the continuous case with f(x, y) = f_X(x) · f_Y(y).]

29. Compute the correlation coefficient ρ for X and Y of Example 5.16 (the covariance has already been computed).
30. a. Compute the covariance for X and Y in Exercise 22.
    b. Compute ρ for X and Y in the same exercise.

31. a. Compute the covariance between X and Y in Exercise 9.
    b. Compute the correlation coefficient ρ for this X and Y.

32. Reconsider the minicomputer component lifetimes X and Y as described in Exercise 12. Determine E(XY). What can be said about Cov(X, Y) and ρ?

33. Use the result of Exercise 28 to show that when X and Y are independent, Cov(X, Y) = Corr(X, Y) = 0.

34. a. Recalling the definition of σ² for a single rv X, write a formula that would be appropriate for computing the variance of a function h(X, Y) of two random variables. [Hint: Remember that variance is just a special expected value.]
    b. Use this formula to compute the variance of the recorded score h(X, Y) [= max(X, Y)] in part (b) of Exercise 22.

35. a. Use the rules of expected value to show that Cov(aX + b, cY + d) = ac Cov(X, Y).
    b. Use part (a) along with the rules of variance and standard deviation to show that Corr(aX + b, cY + d) = Corr(X, Y) when a and c have the same sign.
    c. What happens if a and c have opposite signs?

36. Show that if Y = aX + b (a ≠ 0), then Corr(X, Y) = +1 or −1. Under what conditions will ρ = +1?