Expected Values, Covariance, and Correlation

5.2 Expected Values, Covariance, and Correlation

  Any function h(X) of a single rv X is itself a random variable. However, we saw that to compute E[h(X)], it is not necessary to obtain the probability distribution of h(X). Instead, E[h(X)] is computed as a weighted average of h(x) values, where the weight function is the pmf p(x) or pdf f(x) of X. A similar result holds for a function h(X, Y) of two jointly distributed random variables.

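  PROPOSITION Let X and Y be jointly distributed rv’s with pmf p(x, y) or pdf f(x, y) according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted by E[h(X, Y)] or \(\mu_{h(X,Y)}\), is given by

  \[
  E[h(X, Y)] =
  \begin{cases}
  \displaystyle \sum_x \sum_y h(x, y) \cdot p(x, y) & \text{if } X \text{ and } Y \text{ are discrete} \\[2ex]
  \displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) \cdot f(x, y) \, dx \, dy & \text{if } X \text{ and } Y \text{ are continuous}
  \end{cases}
  \]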

  EXAMPLE 5.13 Five friends have purchased tickets to a certain concert. If the tickets are for seats 1–5 in a particular row and the tickets are randomly distributed among the five, what is the expected number of seats separating any particular two of the five? Let X and Y denote the seat numbers of the first and second individuals, respectively. Possible (X, Y) pairs are {(1, 2), (1, 3), . . . , (5, 4)}, and the joint pmf of (X, Y) is

  \[
  p(x, y) =
  \begin{cases}
  \dfrac{1}{20} & x = 1, \dots, 5;\ y = 1, \dots, 5;\ x \ne y \\[1ex]
  0 & \text{otherwise}
  \end{cases}
  \]

  The number of seats separating the two individuals is h(X, Y) = |X − Y| − 1. The accompanying table gives h(x, y) for each possible (x, y) pair.

                      x
    h(x, y)    1   2   3   4   5
    y = 1      -   0   1   2   3
    y = 2      0   -   0   1   2
    y = 3      1   0   -   0   1
    y = 4      2   1   0   -   0
    y = 5      3   2   1   0   -

  Thus

  \[
  E[h(X, Y)] = \sum\sum_{(x, y)} h(x, y) \cdot p(x, y) = \sum_{x=1}^{5} \sum_{\substack{y=1 \\ y \ne x}}^{5} (|x - y| - 1) \cdot \frac{1}{20} = 1
  \]
  ■
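  The double sum in Example 5.13 can be checked by direct enumeration. The following is a minimal sketch; the names `pairs`, `pmf`, `h`, and `expected_value` are illustrative choices and do not come from the text.

```python
from fractions import Fraction
from itertools import permutations

# Joint pmf of Example 5.13: every ordered pair (x, y) of distinct seats
# 1..5 is equally likely, so each of the 20 pairs has probability 1/20.
pairs = list(permutations(range(1, 6), 2))
pmf = {pair: Fraction(1, len(pairs)) for pair in pairs}

def h(x, y):
    """Number of seats strictly between seat x and seat y."""
    return abs(x - y) - 1

def expected_value(h, pmf):
    """E[h(X, Y)] as the pmf-weighted sum of h(x, y) over all pairs."""
    return sum(h(x, y) * p for (x, y), p in pmf.items())

expected_separation = expected_value(h, pmf)
print(expected_separation)  # 1
```

  Exact fractions are used so that the result is exactly 1 rather than a floating-point approximation.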

  EXAMPLE 5.14 In Example 5.5, the joint pdf of the amount X of almonds and amount Y of cashews in a 1-lb can of nuts was

  \[
  f(x, y) =
  \begin{cases}
  24xy & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\
  0 & \text{otherwise}
  \end{cases}
  \]

  214 ChapteR 5 Joint probability Distributions and Random Samples

  If 1 lb of almonds costs the company $1.50, 1 lb of cashews costs $2.25, and 1 lb of peanuts costs $.75, then the total cost of the contents of a can is

  \[
  h(X, Y) = (1.5)X + (2.25)Y + (.75)(1 - X - Y) = .75 + .75X + 1.5Y
  \]

  (since 1 − X − Y of the weight consists of peanuts). The expected total cost is

  \[
  E[h(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) \cdot f(x, y) \, dx \, dy
  = \int_0^1 \int_0^{1-x} (.75 + .75x + 1.5y) \cdot 24xy \, dy \, dx = 1.65
  \]
  ■
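  The iterated integral in Example 5.14 can be approximated numerically as a sanity check. The sketch below uses a plain midpoint rule with only the standard library; the function names and the grid size `n` are illustrative assumptions, not part of the example.

```python
def joint_pdf(x, y):
    """Joint pdf of Example 5.14: 24xy on the triangle x, y >= 0, x + y <= 1."""
    if 0 <= x <= 1 and 0 <= y <= 1 and x + y <= 1:
        return 24 * x * y
    return 0.0

def cost(x, y):
    """Total cost of a can: .75 + .75x + 1.5y."""
    return 0.75 + 0.75 * x + 1.5 * y

def expected_value(h, f, n=200):
    """Midpoint-rule approximation of the integral of h*f over
    0 <= x <= 1, 0 <= y <= 1 - x, with n subintervals in each direction."""
    total = 0.0
    dx = 1.0 / n
    for i in range(n):
        x = (i + 0.5) * dx
        dy = (1.0 - x) / n  # the inner integral runs over 0 <= y <= 1 - x
        for j in range(n):
            y = (j + 0.5) * dy
            total += h(x, y) * f(x, y) * dx * dy
    return total

expected_cost = expected_value(cost, joint_pdf)
print(round(expected_cost, 3))  # 1.65
```

  Letting the inner step size depend on x keeps every midpoint inside the triangular region, which avoids the boundary error that a fixed square grid would introduce.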

  The method of computing the expected value of a function \(h(X_1, \dots, X_n)\) of n random variables is similar to that for two random variables. If the \(X_i\)’s are discrete, \(E[h(X_1, \dots, X_n)]\) is an n-dimensional sum; if the \(X_i\)’s are continuous, it is an n-dimensional integral.
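  As an illustration of an n-dimensional sum, the sketch below computes the expected value of a function of three discrete rv’s. The scenario (the maximum of three independent fair dice) is a hypothetical example that does not appear in the text, and independence is assumed only so that the joint pmf can be built from the marginals.

```python
from fractions import Fraction
from itertools import product

def expected_value(h, marginals):
    """E[h(X1, ..., Xn)] for independent discrete rv's.

    `marginals` is a list of dicts {value: probability}. Independence is an
    assumption of this sketch: the joint pmf is the product of the marginals.
    """
    total = Fraction(0)
    for outcome in product(*(m.items() for m in marginals)):
        values = [value for value, _ in outcome]
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        total += h(*values) * prob
    return total

# Hypothetical example: three independent fair dice, h = maximum face shown.
die = {face: Fraction(1, 6) for face in range(1, 7)}
expected_max = expected_value(max, [die, die, die])
print(expected_max)  # 119/24
```

  The loop visits all 6 × 6 × 6 = 216 outcomes, which is the three-dimensional analogue of the double sum used for two variables.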

  Covariance

  When two random variables X and Y are not independent, it is frequently of interest to assess how strongly they are related to one another.

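  DEFINITION The covariance between two rv’s X and Y is

  \[
  \mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] =
  \begin{cases}
  \displaystyle \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, p(x, y) & X, Y \text{ discrete} \\[2ex]
  \displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f(x, y) \, dx \, dy & X, Y \text{ continuous}
  \end{cases}
  \]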

  That is, since \(X - \mu_X\) and \(Y - \mu_Y\) are the deviations of the two variables from their respective mean values, the covariance is the expected product of deviations. Note that \(\mathrm{Cov}(X, X) = E[(X - \mu_X)^2] = V(X)\).

  The rationale for the definition is as follows. Suppose X and Y have a strong positive relationship to one another, by which we mean that large values of X tend to occur with large values of Y and small values of X with small values of Y. Then most of the probability mass or density will be associated with \((x - \mu_X)\) and \((y - \mu_Y)\) either both positive (both X and Y above their respective means) or both negative, so the product \((x - \mu_X)(y - \mu_Y)\) will tend to be positive. Thus for a strong positive relationship, Cov(X, Y) should be quite positive. For a strong negative relationship, the signs of \((x - \mu_X)\) and \((y - \mu_Y)\) will tend to be opposite, yielding a negative product. Thus for a strong negative relationship, Cov(X, Y) should be quite negative. If X and Y are not strongly related, positive and negative products will tend to cancel one another, yielding a covariance near 0. Figure 5.4 illustrates the different possibilities. The covariance depends on both the set of possible pairs and the probabilities. In Figure 5.4, the probabilities could be changed without altering the set of possible pairs, and this could drastically change the value of Cov(X, Y).


  Figure 5.4 p(x, y) = 1/10 for each of ten pairs corresponding to indicated points: (a) positive covariance; (b) negative covariance; (c) covariance near zero

  EXAMPLE 5.15 The joint and marginal pmf’s for X = automobile policy deductible amount and Y = homeowner policy deductible amount in Example 5.1 were

  [Table: joint pmf p(x, y) and marginal pmf’s from Example 5.1, with columns y = 500, 1000, 5000]

  from which \(\mu_X = \sum x\, p_X(x) = 485\) and \(\mu_Y = 1125\). Therefore,

  \[
  \mathrm{Cov}(X, Y) = \sum\sum_{(x, y)} (x - 485)(y - 1125)\, p(x, y) = 136{,}875
  \]
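  The covariance sum of Example 5.15 can be sketched in code. Note that the joint pmf entries below are an assumption: the table itself is not reproduced in this section, so the values were chosen as hypothetical entries that are consistent with the stated means (485 and 1125) and the stated covariance (136,875). The function names are likewise illustrative.

```python
from fractions import Fraction

# ASSUMED table (not shown in this section): entries are in hundredths and
# were chosen to be consistent with mu_X = 485, mu_Y = 1125, Cov = 136,875.
hundredths = {
    (100, 500): 30, (100, 1000): 5, (100, 5000): 0,
    (500, 500): 15, (500, 1000): 20, (500, 5000): 5,
    (1000, 500): 10, (1000, 1000): 10, (1000, 5000): 5,
}
joint_pmf = {pair: Fraction(k, 100) for pair, k in hundredths.items()}

def mean_x(pmf):
    """mu_X: sum of x * p(x, y) over all pairs."""
    return sum(x * p for (x, _), p in pmf.items())

def mean_y(pmf):
    """mu_Y: sum of y * p(x, y) over all pairs."""
    return sum(y * p for (_, y), p in pmf.items())

def covariance(pmf):
    """Cov(X, Y) from the definition: expected product of deviations."""
    mu_x, mu_y = mean_x(pmf), mean_y(pmf)
    return sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())

print(mean_x(joint_pmf), mean_y(joint_pmf), covariance(joint_pmf))
# 485 1125 136875
```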

  The following shortcut formula for Cov(X, Y) simplifies the computations.

  PROPOSITION

  \[
  \mathrm{Cov}(X, Y) = E(XY) - \mu_X \cdot \mu_Y
  \]

  According to this formula, no intermediate subtractions are necessary; only at the end of the computation is \(\mu_X \cdot \mu_Y\) subtracted from E(XY). The proof involves expanding \((X - \mu_X)(Y - \mu_Y)\) and then carrying the summation or integration through to each individual term.
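  The expansion behind the shortcut formula can be written out as follows. This is a sketch of the argument using linearity of expected value, with \(E(X) = \mu_X\) and \(E(Y) = \mu_Y\):

```latex
\begin{align*}
\mathrm{Cov}(X, Y) &= E[(X - \mu_X)(Y - \mu_Y)] \\
                   &= E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] \\
                   &= E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X \mu_Y \\
                   &= E(XY) - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y \\
                   &= E(XY) - \mu_X \cdot \mu_Y
\end{align*}
```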

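  EXAMPLE 5.16 The joint and marginal pdf’s of X = amount of almonds and Y = amount of cashews were

  \[
  f(x, y) =
  \begin{cases}
  24xy & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\
  0 & \text{otherwise}
  \end{cases}
  \]

  \[
  f_X(x) =
  \begin{cases}
  12x(1 - x)^2 & 0 \le x \le 1 \\
  0 & \text{otherwise}
  \end{cases}
  \]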


  with \(f_Y(y)\) obtained by replacing x by y in \(f_X(x)\). It is easily verified that \(\mu_X = \mu_Y = \tfrac{2}{5}\), and

  \[
  E(XY) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x, y) \, dx \, dy
  = \int_0^1 \int_0^{1-x} xy \cdot 24xy \, dy \, dx
  = 8 \int_0^1 x^2 (1 - x)^3 \, dx = \frac{2}{15}
  \]

  Thus Cov(X, Y) = 2/15 − (2/5)(2/5) = 2/15 − 4/25 = −2/75. A negative covariance is reasonable here because more almonds in the can implies fewer cashews. ■
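  The integrals in Example 5.16 reduce to integrals of the form \(\int_0^1 x^a (1 - x)^b \, dx = a!\,b!/(a + b + 1)!\) for nonnegative integers a and b, so the arithmetic can be verified exactly. The following is a minimal sketch; the helper name `beta_integral` is an illustrative choice.

```python
from fractions import Fraction
from math import factorial

def beta_integral(a, b):
    """Exact integral of x**a * (1 - x)**b over [0, 1] for nonnegative
    integers a and b, using a! * b! / (a + b + 1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

# mu_X = integral of x * 12x(1 - x)^2 = 12 * integral of x^2 (1 - x)^2
mu_x = 12 * beta_integral(2, 2)
mu_y = mu_x  # f_Y has the same form as f_X

# E(XY): integrating 24 x^2 y^2 over 0 <= y <= 1 - x gives 8 x^2 (1 - x)^3
e_xy = 8 * beta_integral(2, 3)

cov_xy = e_xy - mu_x * mu_y  # shortcut formula
print(mu_x, e_xy, cov_xy)  # 2/5 2/15 -2/75
```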

  It might appear that the relationship in the insurance example is quite strong since Cov(X, Y) = 136,875, whereas Cov(X, Y) = −2/75 in the nut example would seem to imply quite a weak relationship. Unfortunately, the covariance has a serious defect that makes it impossible to interpret a computed value. In the insurance example, suppose we had expressed the deductible amount in cents rather than in dollars. Then 100X would replace X, 100Y would replace Y, and the resulting covariance would be Cov(100X, 100Y) = (100)(100)Cov(X, Y) = 1,368,750,000. If, on the other hand, the deductible amount had been expressed in hundreds of dollars, the computed covariance would have been (.01)(.01)(136,875) = 13.6875. The defect of covariance is that its computed value depends critically on the units of measurement. Ideally, the choice of units should have no effect on a measure of strength of relationship. This is achieved by scaling the covariance.

  Correlation

  DEFINITION The correlation coefficient of X and Y, denoted by Corr(X, Y), \(\rho_{X,Y}\), or just \(\rho\), is defined by

  \[
  \rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}
  \]

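  EXAMPLE 5.17 It is easily verified that in the insurance scenario of Example 5.15, \(E(X^2) = 353{,}500\), \(\sigma_X^2 = 353{,}500 - (485)^2 = 118{,}275\), \(\sigma_X = 343.911\), \(E(Y^2) = 2{,}987{,}500\), \(\sigma_Y^2 = 1{,}721{,}875\), and \(\sigma_Y = 1312.202\). This gives

  \[
  \rho = \frac{136{,}875}{(343.911)(1312.202)} = .303
  \]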
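  The correlation for the insurance scenario can be reproduced from the quantities quoted in Examples 5.15 and 5.17 alone. The sketch below only performs that arithmetic; the variable names are illustrative.

```python
from math import sqrt

# Quantities stated in Examples 5.15 and 5.17
mu_x, mu_y = 485, 1125
e_x2, e_y2 = 353_500, 2_987_500
cov_xy = 136_875

var_x = e_x2 - mu_x ** 2  # V(X) = E(X^2) - mu_X^2 = 118,275
var_y = e_y2 - mu_y ** 2  # V(Y) = E(Y^2) - mu_Y^2 = 1,721,875
sigma_x, sigma_y = sqrt(var_x), sqrt(var_y)

rho = cov_xy / (sigma_x * sigma_y)
print(round(sigma_x, 3), round(sigma_y, 3), round(rho, 3))
# 343.911 1312.202 0.303
```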

  The following proposition shows that \(\rho\) remedies the defect of Cov(X, Y) and also suggests how to recognize the existence of a strong (linear) relationship.

  PROPOSITION
  1. If a and c are either both positive or both negative, Corr(aX + b, cY + d) = Corr(X, Y).
  2. For any two rv’s X and Y, \(-1 \le \rho \le 1\). The two variables are said to be uncorrelated when \(\rho = 0\).


  Statement 1 says precisely that the correlation coefficient is not affected by a linear change in the units of measurement (if, say, X = temperature in °C, then 9X/5 + 32 = temperature in °F). According to Statement 2, the strongest possible positive relationship is evidenced by \(\rho = +1\), the strongest possible negative relationship corresponds to \(\rho = -1\), and \(\rho = 0\) indicates the absence of a linear relationship. The proof of the first statement is sketched in Exercise 35, and that of the second appears in Supplementary Exercise 87 at the end of the chapter. For descriptive purposes, the relationship will be described as strong if \(|\rho| \ge .8\), moderate if \(.5 < |\rho| < .8\), and weak if \(|\rho| \le .5\).
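  Statement 1 can be illustrated numerically. The sketch below reuses the seat-number pmf of Example 5.13 (all 20 ordered pairs of distinct seats equally likely); the helper `corr` and the particular linear transformations are illustrative assumptions, not part of the text.

```python
from itertools import permutations
from math import sqrt

# Joint pmf of Example 5.13: ordered pairs of distinct seats 1..5,
# each with probability 1/20.
pmf = {pair: 1 / 20 for pair in permutations(range(1, 6), 2)}

def corr(pmf, fx=lambda x: x, fy=lambda y: y):
    """Correlation of fx(X) and fy(Y) under a discrete joint pmf."""
    ex = sum(fx(x) * p for (x, y), p in pmf.items())
    ey = sum(fy(y) * p for (x, y), p in pmf.items())
    cov = sum((fx(x) - ex) * (fy(y) - ey) * p for (x, y), p in pmf.items())
    vx = sum((fx(x) - ex) ** 2 * p for (x, y), p in pmf.items())
    vy = sum((fy(y) - ey) ** 2 * p for (x, y), p in pmf.items())
    return cov / sqrt(vx * vy)

rho = corr(pmf)
# a = 9/5 and c = 100 have the same sign: the correlation is unchanged.
rho_same_sign = corr(pmf, lambda x: 9 * x / 5 + 32, lambda y: 100 * y)
# a = -2 and c = 3 have opposite signs: only the sign of the correlation flips.
rho_opposite_sign = corr(pmf, lambda x: -2 * x + 1, lambda y: 3 * y)

print(round(rho, 4), round(rho_same_sign, 4), round(rho_opposite_sign, 4))
```

  For this pmf the correlation is −1/4: knowing that one person holds a high seat number makes a high number slightly less likely for the other.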

  If we think of p(x, y) or f(x, y) as prescribing a mathematical model for how the two numerical variables X and Y are distributed in some population (height and weight, verbal SAT score and quantitative SAT score, etc.), then \(\rho\) is a population characteristic or parameter that measures how strongly X and Y are related in the population. In Chapter 12, we will consider taking a sample of pairs \((x_1, y_1), \dots, (x_n, y_n)\) from the population. The sample correlation coefficient r will then be defined and used to make inferences about \(\rho\).

  The correlation coefficient \(\rho\) is actually not a completely general measure of the strength of a relationship.

  PROPOSITION
  1. If X and Y are independent, then \(\rho = 0\), but \(\rho = 0\) does not imply independence.
  2. \(\rho = 1\) or \(-1\) iff Y = aX + b for some numbers a and b with \(a \ne 0\).

  This proposition says that \(\rho\) is a measure of the degree of linear relationship between X and Y, and only when the two variables are perfectly related in a linear manner will \(\rho\) be as positive or negative as it can be. However, if \(|\rho| \ll 1\), there may still be a strong relationship between the two variables, just one that is not linear. And even if \(|\rho|\) is close to 1, it may be that the relationship is really nonlinear but can be well approximated by a straight line.

  EXAMPLE 5.18 Let X and Y be discrete rv’s with joint pmf

  \[
  p(x, y) =
  \begin{cases}
  .25 & (x, y) = (-4, 1),\ (4, -1),\ (2, 2),\ (-2, -2) \\
  0 & \text{otherwise}
  \end{cases}
  \]

  The points that receive positive probability mass are identified on the (x, y) coordinate system in Figure 5.5. It is evident from the figure that the value of X is completely determined by the value of Y and vice versa, so the two variables are completely dependent. However, by symmetry \(\mu_X = \mu_Y = 0\) and E(XY) = (−4)(.25) + (−4)(.25) + (4)(.25) + (4)(.25) = 0. The covariance is then \(\mathrm{Cov}(X, Y) = E(XY) - \mu_X \cdot \mu_Y = 0\) and thus \(\rho_{X,Y} = 0\). Although there is perfect dependence, there is also complete absence of any linear relationship! ■

  Figure 5.5 The population of pairs for Example 5.18
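  The claims of Example 5.18 (zero covariance together with dependence) can be confirmed by direct computation. In this sketch the independence check compares p(x, y) with the product of the marginal pmf’s at every combination of values; all names are illustrative.

```python
from fractions import Fraction

# Joint pmf of Example 5.18: four points, each with probability .25
pmf = {
    (-4, 1): Fraction(1, 4), (4, -1): Fraction(1, 4),
    (2, 2): Fraction(1, 4), (-2, -2): Fraction(1, 4),
}

mu_x = sum(x * p for (x, _), p in pmf.items())
mu_y = sum(y * p for (_, y), p in pmf.items())
e_xy = sum(x * y * p for (x, y), p in pmf.items())
cov_xy = e_xy - mu_x * mu_y  # shortcut formula

# Marginal pmf's of X and Y
p_x, p_y = {}, {}
for (x, y), p in pmf.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

# Independence would require p(x, y) = p_X(x) * p_Y(y) for every (x, y).
independent = all(
    pmf.get((x, y), 0) == p_x[x] * p_y[y] for x in p_x for y in p_y
)

print(cov_xy, independent)  # 0 False
```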


  A value of \(\rho\) near 1 does not necessarily imply that increasing the value of X causes Y to increase. It implies only that large X values are associated with large Y values. For example, in the population of children, vocabulary size and number of cavities are quite positively correlated, but it is certainly not true that cavities cause vocabulary to grow. Instead, the values of both these variables tend to increase as the value of age, a third variable, increases. For children of a fixed age, there is probably a low correlation between number of cavities and vocabulary size. In summary, association (a high correlation) is not the same as causation.

  The Bivariate Normal Distribution

  Just as the most useful univariate distribution in statistical practice is the normal distribution, the most useful joint distribution for two rv’s X and Y is the bivariate normal distribution. The pdf is somewhat complicated:

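  \[
  f(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}
  \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x - \mu_1}{\sigma_1} \right)^2
  - 2\rho \, \frac{(x - \mu_1)(y - \mu_2)}{\sigma_1 \sigma_2}
  + \left( \frac{y - \mu_2}{\sigma_2} \right)^2 \right] \right\},
  \qquad -\infty < x < \infty,\ -\infty < y < \infty
  \]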

  A graph of this pdf, the density surface, appears in Figure 5.6. It follows (after some tricky integration) that the marginal distribution of X is normal with mean value \(\mu_1\) and standard deviation \(\sigma_1\), and similarly the marginal distribution of Y is normal with mean \(\mu_2\) and standard deviation \(\sigma_2\). The fifth parameter of the distribution is \(\rho\), which can be shown to be the correlation coefficient between X and Y.
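  The bivariate normal pdf with its five parameters can be written as a short function. This is a sketch of the standard density using only the standard library; the function names and the argument order are illustrative choices.

```python
from math import exp, pi, sqrt

def bivariate_normal_pdf(x, y, mu1, sigma1, mu2, sigma2, rho):
    """Bivariate normal density with means mu1, mu2, standard deviations
    sigma1, sigma2, and correlation coefficient rho."""
    if sigma1 <= 0 or sigma2 <= 0 or not -1 < rho < 1:
        raise ValueError("need sigma1, sigma2 > 0 and -1 < rho < 1")
    zx = (x - mu1) / sigma1  # standardized deviation of x
    zy = (y - mu2) / sigma2  # standardized deviation of y
    quad = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    const = 2 * pi * sigma1 * sigma2 * sqrt(1 - rho * rho)
    return exp(-quad / 2) / const

def normal_pdf(x, mu, sigma):
    """Univariate normal density, used here only for comparison."""
    z = (x - mu) / sigma
    return exp(-z * z / 2) / (sigma * sqrt(2 * pi))

# With rho = 0 the joint density factors into the product of the marginals.
joint = bivariate_normal_pdf(1.0, -0.5, 0, 1, 0, 2, 0.0)
factored = normal_pdf(1.0, 0, 1) * normal_pdf(-0.5, 0, 2)
print(abs(joint - factored) < 1e-12)  # True
```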

  Figure 5.6 A graph of the bivariate normal pdf

  It is not at all straightforward to integrate the bivariate normal pdf in order to calculate probabilities. Instead, selected software packages employ numerical integration techniques for this purpose.

  EXAMPLE 5.19 Many students applying for college take the SAT, which for a few years consisted of three components: Critical Reading, Mathematics, and Writing. While some colleges used all three components to determine admission, many only looked at the first two (reading and math). Let X and Y denote the Critical Reading and Mathematics scores, respectively, for a randomly selected student. According to the College Board website, the population of students taking the exam in Fall 2012 had the following characteristics: \(\mu_1 = 496\), \(\sigma_1 = 114\), \(\mu_2 = 514\), \(\sigma_2 = 117\).

  Suppose that X and Y have (approximately, since both variables are discrete) a bivariate normal distribution with correlation coefficient \(\rho = .25\). The Matlab software package gives \(P(X \le 650, Y \le 650)\) = P(both scores are at most 650) = .8097. ■
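  The probability in Example 5.19 can be approximated without specialized software. The sketch below is one possible approach, not the method Matlab uses: it relies on the standard fact that, for a bivariate normal pair in standardized units, the second variable given the first equal to z is normal with mean ρz and variance 1 − ρ², which turns the double integral into a single integral that Simpson’s rule handles well. The function names, the lower cutoff of −10, and the number of subintervals are assumptions of this sketch.

```python
from math import erf, exp, pi, sqrt

def std_normal_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def std_normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def bivariate_normal_cdf(a, b, mu1, sigma1, mu2, sigma2, rho, n=2000):
    """Approximate P(X <= a, Y <= b) for a bivariate normal pair.

    Integrates phi(z) * Phi((zb - rho*z) / sqrt(1 - rho^2)) over z <= za
    with Simpson's rule (n must be even); -10 stands in for minus infinity.
    """
    za = (a - mu1) / sigma1
    zb = (b - mu2) / sigma2
    lower = -10.0
    if za <= lower:
        return 0.0
    s = sqrt(1 - rho * rho)

    def integrand(z):
        return std_normal_pdf(z) * std_normal_cdf((zb - rho * z) / s)

    width = (za - lower) / n
    total = integrand(lower) + integrand(za)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lower + k * width)
    return total * width / 3

p_both = bivariate_normal_cdf(650, 650, 496, 114, 514, 117, 0.25)
print(round(p_both, 3))
```

  The result should agree with the value .8097 quoted in the example to about three decimal places.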


  It can also be shown that the conditional distribution of Y given that X = x is normal. This can be seen geometrically by slicing the density surface with a plane perpendicular to the (x, y) plane passing through the value x on the x axis; the result is a normal curve sketched out on the slicing plane. The conditional mean value is \(\mu_{Y \cdot x} = (\mu_2 - \rho \mu_1 \sigma_2 / \sigma_1) + \rho \sigma_2 x / \sigma_1\), a linear function of x, and the conditional variance is \(\sigma^2_{Y \cdot x} = (1 - \rho^2)\sigma_2^2\). The closer the correlation coefficient is to 1 or −1, the less variability there is in the conditional distribution. Analogous results hold for the conditional distribution of X given that Y = y.
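  The conditional mean and variance formulas are easy to evaluate. The sketch below applies them to the SAT parameters of Example 5.19 for a student whose Critical Reading score is 650; the conditioning value 650 and the function name are illustrative choices.

```python
from math import sqrt

def conditional_y_given_x(x, mu1, sigma1, mu2, sigma2, rho):
    """Mean and standard deviation of Y given X = x (bivariate normal)."""
    mean = (mu2 - rho * mu1 * sigma2 / sigma1) + rho * sigma2 * x / sigma1
    variance = (1 - rho ** 2) * sigma2 ** 2
    return mean, sqrt(variance)

# Parameters from Example 5.19; x = 650 is an illustrative conditioning value.
cond_mean, cond_sd = conditional_y_given_x(650, 496, 114, 514, 117, 0.25)
print(round(cond_mean, 1), round(cond_sd, 1))  # 553.5 113.3
```

  With ρ = .25 the conditional standard deviation (about 113.3) is only slightly smaller than the unconditional value of 117, reflecting the weak linear relationship.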

  The bivariate normal distribution can be generalized to the multivariate normal distribution. Its density function is quite complicated, and the only way to write it compactly is to employ matrix notation. If a collection of variables has this distribution, then the marginal distribution of any single variable is normal, the conditional distribution of any single variable given values of the other variables is normal, the joint marginal distribution of any pair of variables is bivariate normal, and the joint marginal distribution of any subset of three or more of the variables is again multivariate normal.

  EXERCISES Section 5.2 (22–36)

  22. An instructor has given a short quiz consisting of two parts. For a randomly selected student, let X = the number of points earned on the first part and Y = the number of points earned on the second part. Suppose that the joint pmf of X and Y is given in the accompanying table.

      p(x, y)            y
      x = 5     .04   .15   .20   .10

    a. If the score recorded in the grade book is the total number of points earned on the two parts, what is the expected recorded score E(X + Y)?
    b. If the maximum of the two scores is recorded, what is the expected recorded score?

  23. The difference between the number of customers in line at the express checkout and the number in line at the super-express checkout in Exercise 3 is \(X_1 - X_2\). Calculate the expected difference.

  24. Six individuals, including A and B, take seats around a circular table in a completely random fashion. Suppose the seats are numbered 1, . . . , 6. Let X = A’s seat number and Y = B’s seat number. If A sends a written message around the table to B in the direction in which they are closest, how many individuals (including A and B) would you expect to handle the message?

  25. A surveyor wishes to lay out a square region with each side having length L. However, because of a measurement error, he instead lays out a rectangle in which the north–south sides both have length X and the east–west sides both have length Y. Suppose that X and Y are independent and that each is uniformly distributed on the interval [L − A, L + A] (where 0 < A < L). What is the expected area of the resulting rectangle?

  26. Consider a small ferry that can accommodate cars and buses. The toll for cars is $3, and the toll for buses is $10. Let X and Y denote the number of cars and buses, respectively, carried on a single trip. Suppose the joint distribution of X and Y is as given in the table of Exercise 7. Compute the expected revenue from a single trip.

  27. Annie and Alvie have agreed to meet for lunch between noon (0:00 p.m.) and 1:00 p.m. Denote Annie’s arrival time by X, Alvie’s by Y, and suppose X and Y are independent with pdf’s

  \[
  f_X(x) =
  \begin{cases}
  3x^2 & 0 \le x \le 1 \\
  0 & \text{otherwise}
  \end{cases}
  \qquad
  f_Y(y) =
  \begin{cases}
  2y & 0 \le y \le 1 \\
  0 & \text{otherwise}
  \end{cases}
  \]

  What is the expected amount of time that the one who arrives first must wait for the other person? [Hint: h(X, Y) = |X − Y|.]

  28. Show that if X and Y are independent rv’s, then E(XY) = E(X) · E(Y). Then apply this in Exercise 25. [Hint: Consider the continuous case with \(f(x, y) = f_X(x) \cdot f_Y(y)\).]

  29. Compute the correlation coefficient \(\rho\) for X and Y of Example 5.16 (the covariance has already been computed).


  30. a. Compute the covariance for X and Y in Exercise 22.
    b. Compute \(\rho\) for X and Y in the same exercise.

  31. a. Compute the covariance between X and Y in Exercise 9.
    b. Compute the correlation coefficient \(\rho\) for this X and Y.

  32. Reconsider the minicomputer component lifetimes X and Y as described in Exercise 12. Determine E(XY). What can be said about Cov(X, Y) and \(\rho\)?

  33. Use the result of Exercise 28 to show that when X and Y are independent, Cov(X, Y) = Corr(X, Y) = 0.

  34. a. Recalling the definition of \(\sigma^2\) for a single rv X, write a formula that would be appropriate for computing the variance of a function h(X, Y) of two random variables. [Hint: Remember that variance is just a special expected value.]
    b. Use this formula to compute the variance of the recorded score h(X, Y) [= max(X, Y)] in part (b) of Exercise 22.

  35. a. Use the rules of expected value to show that Cov(aX + b, cY + d) = ac Cov(X, Y).
    b. Use part (a) along with the rules of variance and standard deviation to show that Corr(aX + b, cY + d) = Corr(X, Y) when a and c have the same sign.
    c. What happens if a and c have opposite signs?

  36. Show that if Y = aX + b (\(a \ne 0\)), then Corr(X, Y) = +1 or −1. Under what conditions will \(\rho = +1\)?
