Chapter 3: General Random Variables

Introduction to Probability

SECOND EDITION Dimitri P. Bertsekas and John N. Tsitsiklis

Massachusetts Institute of Technology

WWW site for book information and orders

http://www.athenasc.com

Athena Scientific, Belmont, Massachusetts

General Random Variables

Contents

3.1. Continuous Random Variables and PDFs
3.2. Cumulative Distribution Functions
3.3. Normal Random Variables
3.4. Joint PDFs of Multiple Random Variables
3.5. Conditioning
3.6. The Continuous Bayes' Rule
3.7. Summary and Discussion
Problems

Random variables with a continuous range of possible values are quite common; the velocity of a vehicle traveling along the highway could be one example. If the velocity is measured by a digital speedometer, we may view the speedometer's reading as a discrete random variable. But if we wish to model the exact velocity, a continuous random variable is called for. Models involving continuous random variables can be useful for several reasons. Besides being finer-grained and possibly more accurate, they allow the use of powerful tools from calculus and often admit an insightful analysis that would not be possible under a discrete model.

All of the concepts and methods introduced in Chapter 2, such as expectation, PMFs, and conditioning, have continuous counterparts. Developing and interpreting these counterparts is the subject of this chapter.

3.1 CONTINUOUS RANDOM VARIABLES AND PDFS

A random variable X is called continuous if there is a nonnegative function f_X, called the probability density function of X, or PDF for short, such that

P(X ∈ B) = ∫_B f_X(x) dx,

for every subset B of the real line.† In particular, the probability that the value of X falls within an interval is

P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx,

and can be interpreted as the area under the graph of the PDF (see Fig. 3.1). For any single value a, we have P(X = a) = ∫_a^a f_X(x) dx = 0. For this reason, including or excluding the endpoints of an interval has no effect on its probability:

P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b).

Note that to qualify as a PDF, a function f_X must be nonnegative, i.e., f_X(x) ≥ 0 for every x, and must also satisfy the normalization property

∫_{-∞}^{∞} f_X(x) dx = P(-∞ < X < ∞) = 1.

† The integral ∫_B f_X(x) dx is to be interpreted in the usual calculus/Riemann sense, and we implicitly assume that it is well-defined. For highly unusual functions and sets, this integral can be hard or even impossible to define, but such issues belong to a more advanced treatment of the subject. In any case, it is comforting to know that mathematical subtleties of this type do not arise if f_X is a piecewise continuous function with a finite or countable number of points of discontinuity, and B is the union of a finite or countable number of intervals.

Figure 3.1: Illustration of a PDF. The probability that X takes a value in an interval [a, b] is the shaded area under the graph of the PDF.

For an interval [x, x + δ] with very small length δ, we have

P([x, x + δ]) = ∫_x^{x+δ} f_X(t) dt ≈ f_X(x) · δ,

so f_X(x) can be viewed as the probability mass per unit length near x. Even though a PDF is used to calculate event probabilities, f_X(x) is not the probability of any particular event; in particular, it is not restricted to be less than or equal to one.

Example 3.1. Continuous Uniform Random Variable. Consider a random variable X that takes values in the interval [0, 1], and assume that all subintervals of the same length are equally likely. Then the PDF must be constant over [0, 1], and it has the form

f_X(x) = { c,  if 0 ≤ x ≤ 1;
           0,  otherwise,

for some constant c. The constant can be determined from the normalization property:

1 = ∫_{-∞}^{∞} f_X(x) dx = ∫_0^1 c dx = c.

More generally, we can consider a random variable X that takes values in an interval [a, b], and again assume that any two subintervals of the same length are equally likely. The PDF then has the constant form

f_X(x) = { 1/(b − a),  if a ≤ x ≤ b;
           0,          otherwise

(cf. Fig. 3.3). The constant value of the PDF within [a, b] is determined from the normalization property: if f_X(x) = c on [a, b], then

1 = ∫_a^b c dx = c(b − a),  so that c = 1/(b − a).

Figure 3.3: The PDF of a uniform random variable.

Example 3.2. Piecewise Constant PDF. Alvin's driving time to work is between 15 and 20 minutes if the day is sunny, and between 20 and 25 minutes if the day is rainy, with all times being equally likely in each case. Assume that a day is sunny with probability 2/3 and rainy with probability 1/3. What is the PDF of the driving time, viewed as a random variable X?

We interpret the statement "all times are equally likely" in the sunny and the rainy cases, to mean that the PDF of X is constant in each of the intervals [15, 20] and [20, 25]. Furthermore, since these two intervals contain all possible driving times, the PDF should be zero everywhere else:

f_X(x) = { c1,  if 15 ≤ x < 20;
           c2,  if 20 ≤ x ≤ 25;
           0,   otherwise,

where c1 and c2 are some constants. We can determine these constants by using the probabilities of a sunny and of a rainy day:

2/3 = P(sunny day) = ∫_15^20 f_X(x) dx = 5c1,
1/3 = P(rainy day) = ∫_20^25 f_X(x) dx = 5c2,

so that

c1 = 2/15,  c2 = 1/15.

Figure 3.4: A piecewise constant PDF involving three intervals.

More generally, consider a piecewise constant PDF of the form

f_X(x) = { c_i,  if a_i ≤ x < a_{i+1}, i = 1, 2, ..., n−1;
           0,    otherwise,

where a_1, a_2, ..., a_n are some scalars with a_i < a_{i+1} for all i, and c_1, c_2, ..., c_{n−1} are some nonnegative constants. The constants c_i must be such that the normalization property holds:

1 = ∫_{a_1}^{a_n} f_X(x) dx = Σ_{i=1}^{n−1} ∫_{a_i}^{a_{i+1}} c_i dx = Σ_{i=1}^{n−1} c_i (a_{i+1} − a_i).

Example 3.3. A PDF Can Take Arbitrarily Large Values. Consider a random variable X with PDF

f_X(x) = { 1/(2√x),  if 0 < x ≤ 1;
           0,         otherwise.

Even though f_X(x) becomes infinitely large as x approaches zero, this is still a valid PDF, because

∫_{-∞}^{∞} f_X(x) dx = ∫_0^1 1/(2√x) dx = √x |_0^1 = 1.


Summary of PDF Properties

Let X be a continuous random variable with PDF f_X.

• f_X(x) ≥ 0 for all x.

• ∫_{-∞}^{∞} f_X(x) dx = 1.

• If δ is very small, then P([x, x + δ]) ≈ f_X(x) · δ.

• For any subset B of the real line,

  P(X ∈ B) = ∫_B f_X(x) dx.
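As an aside not in the original text, here is a minimal Python sketch that checks these properties numerically for the piecewise constant PDF of Example 3.2, using a midpoint-rule integral; the function names are ours.

```python
# A minimal sketch: numerically check normalization and interval
# probabilities for the PDF of Example 3.2 (driving time).

def f_X(x):
    """Piecewise constant PDF of Alvin's driving time (Example 3.2)."""
    if 15 <= x < 20:
        return 2.0 / 15.0
    if 20 <= x <= 25:
        return 1.0 / 15.0
    return 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(f_X, 0, 40))    # normalization: ~1.0
print(integrate(f_X, 15, 20))   # P(15 <= X <= 20): ~2/3 (sunny day)
print(integrate(f_X, 20, 25))   # P(20 <= X <= 25): ~1/3 (rainy day)
```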

Expectation

The expected value or expectation or mean of a continuous random variable X is defined by†

E[X] = ∫_{-∞}^{∞} x f_X(x) dx.

This is similar to the discrete case except that the PMF is replaced by the PDF, and summation is replaced by integration. As in Chapter 2, E[X] can be interpreted as the "center of gravity" of the PDF and, also, as the anticipated average value of X in a large number of independent repetitions of the experiment. Its mathematical properties are similar to the discrete case: after all, an integral is just a limiting form of a sum.

If X is a continuous random variable with given PDF, any real-valued function Y = g(X) of X is also a random variable. Note that Y can be a continuous random variable: for example, consider the trivial case where Y = g(X) = X. But Y can also turn out to be discrete. For example, suppose that

† One has to deal with the possibility that the integral ∫_{-∞}^{∞} x f_X(x) dx is infinite or undefined. More concretely, we will say that the expectation is well-defined if ∫_{-∞}^{∞} |x| f_X(x) dx < ∞. In that case, it is known that the integral ∫_{-∞}^{∞} x f_X(x) dx takes a finite and unambiguous value.

For an example where the expectation is not well-defined, consider a random variable X with PDF f_X(x) = c/(1 + x²), where c is a constant chosen to enforce the normalization condition. The expression |x| f_X(x) can be approximated by c/|x| when |x| is large. Using the fact ∫ (1/x) dx = ln x, one can show that ∫_{-∞}^{∞} |x| f_X(x) dx = ∞. Thus, E[X] is left undefined, despite the symmetry of the PDF around zero.

Throughout this book, in the absence of an indication to the contrary, we implicitly assume that the expected value of any random variable of interest is well-defined.


g(x) = 1 for x > 0, and g(x) = 0 otherwise. Then Y = g(X) is a discrete random variable taking values in the finite set {0, 1}. In either case, the mean of g(X) satisfies the expected value rule

E[g(X)] = ∫_{-∞}^{∞} g(x) f_X(x) dx,

in complete analogy with the discrete case; see the end-of-chapter problems.

The nth moment of a continuous random variable X is defined as E[Xⁿ], the expected value of the random variable Xⁿ. The variance, denoted by var(X), is defined as the expected value of the random variable (X − E[X])².

We now summarize this discussion and list a number of additional facts that are practically identical to their discrete counterparts.

Expectation of a Continuous Random Variable and its Properties

Let X be a continuous random variable with PDF f_X.

• The expectation of X is defined by

  E[X] = ∫_{-∞}^{∞} x f_X(x) dx.

• The expected value rule for a function g(X) has the form

  E[g(X)] = ∫_{-∞}^{∞} g(x) f_X(x) dx.

• The variance of X is defined by

  var(X) = E[(X − E[X])²] = ∫_{-∞}^{∞} (x − E[X])² f_X(x) dx.

• We have

  0 ≤ var(X) = E[X²] − (E[X])².

• If Y = aX + b, where a and b are given scalars, then

  E[Y] = aE[X] + b,  var(Y) = a² var(X).

Example 3.4. Mean and Variance of the Uniform Random Variable.

Consider a uniform PDF over an interval [a, b], as in Example 3.1. We have

E[X] = ∫_{-∞}^{∞} x f_X(x) dx = ∫_a^b x · (1/(b − a)) dx = (1/(b − a)) · (x²/2) |_a^b = (1/(b − a)) · (b² − a²)/2 = (a + b)/2,

as one expects based on the symmetry of the PDF around (a + b)/2.

To obtain the variance, we first calculate the second moment. We have

E[X²] = ∫_a^b (x²/(b − a)) dx = (1/(b − a)) ∫_a^b x² dx = (1/(b − a)) · (x³/3) |_a^b = (b³ − a³)/(3(b − a)) = (a² + ab + b²)/3.

Thus, the variance is obtained as

var(X) = E[X²] − (E[X])² = (a² + ab + b²)/3 − (a + b)²/4 = (b − a)²/12,

after some calculation.
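The following short simulation (our own sketch, not part of the original text) checks these formulas numerically for one choice of [a, b].

```python
# Quick numerical check of Example 3.4: estimate E[X] and var(X) for
# X uniform over [a, b] and compare with (a+b)/2 and (b-a)**2 / 12.
import random

a, b, n = 2.0, 8.0, 1_000_000
samples = [random.uniform(a, b) for _ in range(n)]

mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n

print(mean, (a + b) / 2)          # both close to 5.0
print(var, (b - a) ** 2 / 12)     # both close to 3.0
```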

Exponential Random Variable

An exponential random variable has a PDF of the form

f_X(x) = { λe^{−λx},  if x ≥ 0;
           0,         otherwise,

where λ is a positive parameter characterizing the PDF (see Fig. 3.5). This is a legitimate PDF because

∫_{-∞}^{∞} f_X(x) dx = ∫_0^{∞} λe^{−λx} dx = −e^{−λx} |_0^{∞} = 1.

Note that the probability that X exceeds a certain value decreases exponentially. Indeed, for any a ≥ 0, we have

P(X ≥ a) = ∫_a^{∞} λe^{−λx} dx = −e^{−λx} |_a^{∞} = e^{−λa}.

An exponential random variable can, for example, be a good model for the amount of time until an incident of interest takes place, such as a message arriving at a computer or a piece of equipment breaking down; for the time being, we simply derive its mean and variance analytically.

Figure 3.5: The PDF λe^{−λx} of an exponential random variable; a large λ concentrates the PDF near zero.

The mean and the variance can be calculated to be

E[X] = 1/λ,  var(X) = 1/λ².

These formulas can be verified by straightforward calculation, as we now show. Using integration by parts, we have

E[X] = ∫_0^{∞} xλe^{−λx} dx = (−xe^{−λx}) |_0^{∞} + ∫_0^{∞} e^{−λx} dx = 0 − (e^{−λx}/λ) |_0^{∞} = 1/λ.

Using again integration by parts, the second moment is

E[X²] = ∫_0^{∞} x²λe^{−λx} dx = (−x²e^{−λx}) |_0^{∞} + 2∫_0^{∞} xe^{−λx} dx = 0 + (2/λ) E[X] = 2/λ².


Finally, using the formula var(X) = E[X²] − (E[X])², we obtain

var(X) = 2/λ² − 1/λ² = 1/λ².

Example 3.5. The time until a small meteorite first lands anywhere in the Sahara desert is modeled as an exponential random variable with a mean of 10 days. The

time is currently midnight. What is the probability that a meteorite first lands

some time between 6 a.m. and 6 p.m. of the first day?

Let X be the time elapsed until the event of interest, measured in days. Then, X is exponential, with mean 1/λ = 10, which yields λ = 1/10. The desired probability is

P(1/4 ≤ X ≤ 3/4) = P(X ≥ 1/4) − P(X > 3/4) = e^{−1/40} − e^{−3/40} = 0.0476,

where we have used the formula P(X ≥ a) = P(X > a) = e^{−λa}.
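As a small supplement to this example (our own sketch, not from the text), the probability can be computed and cross-checked by simulation using only the standard library.

```python
# Check Example 3.5: for exponential X with mean 10 days,
# P(1/4 <= X <= 3/4) = exp(-1/40) - exp(-3/40).
import math
import random

lam = 1 / 10  # rate parameter; the mean is 1/lam = 10 days

exact = math.exp(-lam / 4) - math.exp(-3 * lam / 4)

n = 1_000_000
hits = sum(1 for _ in range(n) if 0.25 <= random.expovariate(lam) <= 0.75)

print(exact)       # 0.0476...
print(hits / n)    # Monte Carlo estimate, close to the exact value
```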

3.2 CUMULATIVE DISTRIBUTION FUNCTIONS

We have been dealing with discrete and continuous random variables in a somewhat different manner, using PMFs and PDFs, respectively. It would be desirable to describe all kinds of random variables with a single mathematical concept. This is accomplished with the cumulative distribution function, or CDF for short. The CDF of a random variable X is denoted by F_X and provides the probability P(X ≤ x). In particular, for every x we have

F_X(x) = P(X ≤ x) = { Σ_{k ≤ x} p_X(k),        if X is discrete;
                      ∫_{-∞}^x f_X(t) dt,      if X is continuous.

Loosely speaking, the CDF F_X(x) "accumulates" probability "up to" the value x.

Any random variable associated with a given probability model has a CDF, regardless of whether it is discrete or continuous. This is because {X ≤ x} is always an event and therefore has a well-defined probability. In what follows, any unambiguous specification of the probabilities of all events of the form {X ≤ x}, be it through a PMF, PDF, or CDF, will be referred to as the probability law of the random variable X.

Figures 3.6 and 3.7 illustrate the CDFs of various discrete and continuous random variables. From these figures, as well as from the definition, some general properties of the CDF can be observed.

Figure 3.6: CDFs of some discrete random variables. The CDF is related to the PMF through the formula F_X(x) = Σ_{k ≤ x} p_X(k), and has a staircase form, with jumps occurring at the points of positive probability mass. Note that at each jump point, the CDF takes the upper of the two values (CDFs are right-continuous).

Figure 3.7: CDFs of some continuous random variables. The CDF is related to the PDF through the formula F_X(x) = ∫_{-∞}^x f_X(t) dt. Thus, the PDF f_X can be obtained from the CDF by differentiation: f_X(x) = dF_X(x)/dx.


Properties of a CDF

The CDF F_X of a random variable X is defined by

F_X(x) = P(X ≤ x),  for all x,

and has the following properties.

• F_X is monotonically nondecreasing:

  if x ≤ y, then F_X(x) ≤ F_X(y).

• F_X(x) tends to 0 as x → −∞, and to 1 as x → ∞.

• If X is discrete, then F_X(x) is a piecewise constant function of x.

• If X is continuous, then F_X(x) is a continuous function of x.

• If X is discrete and takes integer values, the PMF and the CDF can be obtained from each other by summing or differencing:

  F_X(k) = Σ_{i=−∞}^k p_X(i),

  p_X(k) = P(X ≤ k) − P(X ≤ k − 1) = F_X(k) − F_X(k − 1),

  for all integers k.

• If X is continuous, the PDF and the CDF can be obtained from each other by integration or differentiation:

  F_X(x) = ∫_{-∞}^x f_X(t) dt,  f_X(x) = dF_X(x)/dx.

  (The second equality is valid for those x at which the PDF is continuous.)

Sometimes, in order to calculate the PMF or PDF of a discrete or continuous random variable, respectively, it is more convenient to first calculate the CDF. The systematic use of this approach for functions of continuous random variables will be discussed in Section 4.1. The following is a discrete example.

Example 3.6. The Maximum of Several Random Variables. You are allowed to take a certain test three times, and your final score will be the maximum


of the test scores. Thus,

X = max{X1, X2, X3},

where X1, X2, X3 are the three test scores and X is the final score. Assume that your score in each test takes one of the values from 1 to 10 with equal probability 1/10, independently of the scores in other tests. What is the PMF p_X of the final score?

We calculate the PMF indirectly. We first compute the CDF F_X and then obtain the PMF as

p_X(k) = F_X(k) − F_X(k − 1),  k = 1, ..., 10.

We have

F_X(k) = P(X ≤ k)
       = P(X1 ≤ k, X2 ≤ k, X3 ≤ k)
       = P(X1 ≤ k) P(X2 ≤ k) P(X3 ≤ k)
       = (k/10)³,

where the third equality follows from the independence of the events {X1 ≤ k}, {X2 ≤ k}, {X3 ≤ k}. Thus, the PMF is given by

p_X(k) = (k/10)³ − ((k − 1)/10)³,  k = 1, ..., 10.

The preceding line of argument can be generalized to any number of random variables X1, ..., Xn. In particular, if the events {X1 ≤ x}, ..., {Xn ≤ x} are independent for every x, then the CDF of X = max{X1, ..., Xn} is

F_X(x) = F_{X1}(x) ⋯ F_{Xn}(x).

From this formula, we can obtain p_X(x) by differencing (if X is discrete), or f_X(x) by differentiation (if X is continuous).
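As a quick illustration (our own addition, not part of the text), the PMF of Example 3.6 can be built by exactly this differencing of the CDF.

```python
# PMF of the maximum of three independent scores, each uniform on
# {1, ..., 10}, obtained by differencing the CDF F_X(k) = (k/10)**3.
def F(k):
    if k < 1:
        return 0.0
    if k > 10:
        return 1.0
    return (k / 10) ** 3

pmf = {k: F(k) - F(k - 1) for k in range(1, 11)}

print(pmf[10])            # 0.271: the final score is 10 with this probability
print(sum(pmf.values()))  # 1.0: a valid PMF
```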

The Geometric and Exponential CDFs

Because the CDF is defined for any type of random variable, it provides a convenient means for exploring the relations between continuous and discrete random variables. A particularly interesting case in point is the relation between geometric and exponential random variables.

Let X be a geometric random variable with parameter p; that is, X is the number of trials until the first success in a sequence of independent Bernoulli trials, where the probability of success at each trial is p. Thus, for k = 1, 2, ..., we have P(X = k) = p(1 − p)^{k−1} and the CDF is given by

F_geo(n) = Σ_{k=1}^n p(1 − p)^{k−1} = p · (1 − (1 − p)^n)/(1 − (1 − p)) = 1 − (1 − p)^n,  for n = 1, 2, ....

Suppose now that X is an exponential random variable with parameter λ > 0, so that its CDF is

F_exp(x) = P(X ≤ x) = 1 − e^{−λx},  for x ≥ 0,

and F_exp(x) = 0 for x < 0. To compare the two CDFs, let δ be chosen so that

e^{−λδ} = 1 − p.

Then the values of the exponential and the geometric CDFs coincide at the points nδ, for n = 1, 2, ...:

F_exp(nδ) = 1 − e^{−λδn} = 1 − (1 − p)^n = F_geo(n),

so the exponential random variable can be viewed as the limit of a geometric random variable defined over time slots of small length δ.
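The agreement of the two CDFs at the points nδ can be seen numerically; the sketch below is our own addition, under the assumption on δ just stated.

```python
# With delta chosen so that exp(-lam*delta) = 1 - p, the exponential CDF
# matches the geometric CDF at the points delta, 2*delta, 3*delta, ...
import math

p = 0.3
lam = 1.0
delta = -math.log(1 - p) / lam   # solves exp(-lam*delta) = 1 - p

for n in range(1, 6):
    F_geo = 1 - (1 - p) ** n
    F_exp = 1 - math.exp(-lam * n * delta)
    print(n, F_geo, F_exp)       # the two columns coincide
```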

3.3 NORMAL RANDOM VARIABLES

A continuous random variable X is said to be normal, or Gaussian, if it has a PDF of the form

f_X(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)},

where μ and σ are two scalar parameters characterizing the PDF, with σ assumed positive.

Figure 3.9: A normal PDF and CDF, with μ = 1 and σ² = 1. The PDF is symmetric around its mean μ, and has a characteristic bell shape. As x gets further from the mean, the term e^{−(x−μ)²/(2σ²)} decreases very rapidly; in this figure, the PDF is very close to zero outside the interval [−1, 3].

The mean and the variance can be calculated to be

E[X] = μ,  var(X) = σ².

The mean is μ because the PDF is symmetric around μ. For the variance, using the change of variables y = (x − μ)/σ and integration by parts, we have

var(X) = (1/(√(2π) σ)) ∫_{-∞}^{∞} (x − μ)² e^{−(x−μ)²/(2σ²)} dx
       = (σ²/√(2π)) ∫_{-∞}^{∞} y² e^{−y²/2} dy
       = (σ²/√(2π)) (−y e^{−y²/2}) |_{-∞}^{∞} + (σ²/√(2π)) ∫_{-∞}^{∞} e^{−y²/2} dy
       = σ².


The last equality above is obtained by using the fact

(1/√(2π)) ∫_{-∞}^{∞} e^{−y²/2} dy = 1,

which is just the normalization property of the normal PDF for the case where

μ = 0 and σ = 1.

A normal random variable has several special properties. The following one is particularly important and will be justified in Section 4. 1.

Normality is Preserved by Linear Transformations

If X is a normal random variable with mean μ and variance σ², and if a ≠ 0, b are scalars, then the random variable

Y = aX + b

is also normal, with mean and variance

E[Y] = aμ + b,  var(Y) = a²σ².

The Standard Normal Random Variable

A normal random variable Y with zero mean and unit variance is said to be a standard normal. Its CDF is denoted by Φ:

Φ(y) = P(Y ≤ y) = P(Y < y) = (1/√(2π)) ∫_{-∞}^y e^{−t²/2} dt.

It is recorded in a table (described below), and is a very useful tool for calculating various probabilities involving normal random variables; see also Fig. 3.10.

Note that the table only provides the values of Φ(y) for y ≥ 0, because the omitted values can be found using the symmetry of the PDF. For example, if Y is a standard normal random variable, we have

Φ(−0.5) = P(Y ≤ −0.5) = P(Y ≥ 0.5) = 1 − P(Y < 0.5) = 1 − Φ(0.5) = 1 − 0.6915 = 0.3085.

More generally, we have

Φ(−y) = 1 − Φ(y),

for all y.


The standard normal table. The entries in this table provide the numerical values of Φ(y) = P(Y ≤ y), where Y is a standard normal random variable, for y between 0 and 3.49. For example, to find Φ(1.71), we look at the row corresponding to 1.7 and the column corresponding to 0.01, so that Φ(1.71) = .9564. When y is negative, the value of Φ(y) can be found using the formula Φ(y) = 1 − Φ(−y).

Let X be a normal random variable with mean μ and variance σ². We "standardize" X by defining a new random variable Y given by

Y = (X − μ)/σ.

Since Y is a linear function of X, it is normal. Furthermore,

E[Y] = (E[X] − μ)/σ = 0,  var(Y) = var(X)/σ² = 1.

Thus, Y is a standard normal random variable. This fact allows us to calculate the probability of any event defined in terms of X: we redefine the event in terms of Y, and then use the standard normal table.

Figure 3.10: The PDF (1/√(2π)) e^{−y²/2} of the standard normal random variable, with mean 0. Its CDF, which is denoted by Φ, is recorded in a table.

Example 3.7. Using the Normal Table. The annual snowfall at a particular geographic location is modeled as a normal random variable with a mean of μ = 60 inches and a standard deviation of σ = 20. What is the probability that this year's snowfall will be at least 80 inches?

Let X be the snow accumulation, viewed as a normal random variable, and let

Y = (X − μ)/σ = (X − 60)/20

be the corresponding standard normal random variable. We have

P(X ≥ 80) = P((X − 60)/20 ≥ (80 − 60)/20) = P(Y ≥ 1) = 1 − Φ(1),

where Φ is the CDF of the standard normal. We read the value Φ(1) from the table:

Φ(1) = 0.8413,

so that

P(X ≥ 80) = 1 − Φ(1) = 0.1587.

Generalizing the approach in the preceding example, we have the following procedure.

CDF Calculation for a Normal Random Variable

For a normal random variable X with mean μ and variance σ², we use a two-step procedure.

(a) "Standardize" X, i.e., subtract μ and divide by σ to obtain a standard normal random variable Y.

(b) Read the CDF value from the standard normal table:

    P(X ≤ x) = P((X − μ)/σ ≤ (x − μ)/σ) = Φ((x − μ)/σ).
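This two-step procedure is easy to carry out in code; the following sketch is our own addition and expresses Φ through the error function, so no table or external library is needed.

```python
# Standard normal CDF via math.erf, then the two-step procedure above.
import math

def Phi(y):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """Step (a): standardize; step (b): evaluate the standard normal CDF."""
    return Phi((x - mu) / sigma)

# Example 3.7: snowfall with mu = 60, sigma = 20.
print(1 - normal_cdf(80, 60, 20))   # P(X >= 80) = 1 - Phi(1) ~ 0.1587
```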

Normal random variables are often used in signal processing and communications engineering to model noise and unpredictable distortions of signals. The following is a typical example.

Example 3.8. Signal Detection. A binary message is transmitted as a signal s, which is either −1 or +1. The communication channel corrupts the transmission with additive normal noise with mean μ = 0 and variance σ². The receiver concludes that the signal −1 (or +1) was transmitted if the value received is < 0 (or ≥ 0, respectively); see Fig. 3.11. What is the probability of error?

An error occurs whenever −1 is transmitted and the noise N is at least 1, so that s + N = −1 + N ≥ 0, or whenever +1 is transmitted and the noise N is smaller than −1, so that s + N = 1 + N < 0. In the former case, the probability of error is

P(N ≥ 1) = 1 − P(N < 1) = 1 − P(N/σ < 1/σ) = 1 − Φ(1/σ).

In the latter case, the probability of error is the same, by symmetry. The value of Φ(1/σ) can be obtained from the normal table. For σ = 1, we have Φ(1/σ) = Φ(1) = 0.8413, and the probability of error is 0.1587.

Normal random variables play an important role in a broad range of probabilistic models. The main reason is that, generally speaking, they model well the additive effect of many independent factors in a variety of engineering, physical, and statistical contexts. Mathematically, the key fact is that the sum of a large number of independent identically distributed (not necessarily normal) random variables has an approximately normal CDF, regardless of the CDF of the individual random variables. This is the subject of the central limit theorem, which will be discussed in Chapter 5.

Figure 3.11: The areas of the shaded regions are the probabilities of error in the two cases where −1 and +1 is transmitted.

3.4 JOINT PDFS OF MULTIPLE RANDOM VARIABLES

In this section, we extend the concept of a PDF to the case of multiple random variables. In complete analogy with discrete random variables, we introduce joint, marginal, and conditional PDFs. Their intuitive interpretation as well as their main properties parallel the discrete case.

We say that two continuous random variables associated with the same experiment are jointly continuous and can be described in terms of a joint PDF f_{X,Y}, if f_{X,Y} is a nonnegative function that satisfies

P((X,Y) ∈ B) = ∬_{(x,y)∈B} f_{X,Y}(x,y) dx dy,

for every subset B of the two-dimensional plane. The notation above means that the integration is carried out over the set B. In the particular case where B is a rectangle of the form B = {(x,y) | a ≤ x ≤ b, c ≤ y ≤ d}, we have

P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_c^d ∫_a^b f_{X,Y}(x,y) dx dy.

Furthermore, by letting B be the entire two-dimensional plane, we obtain the normalization property

∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y}(x,y) dx dy = 1.

To interpret the joint PDF, we let δ be a small positive number and consider the probability of a small rectangle. We have

P(a ≤ X ≤ a + δ, c ≤ Y ≤ c + δ) = ∫_c^{c+δ} ∫_a^{a+δ} f_{X,Y}(x,y) dx dy ≈ f_{X,Y}(a,c) · δ²,

so we can view f_{X,Y}(a,c) as the "probability per unit area" in the vicinity of (a,c).

The joint PDF contains all relevant probabilistic information on the random variables X, Y, and their dependencies. It allows us to calculate the probability of any event that can be defined in terms of these two random variables. As a special case, it can be used to calculate the probability of an event involving only one of them. For example, let A be a subset of the real line and consider the event {X ∈ A}. We have

P(X ∈ A) = P(X ∈ A and Y ∈ (−∞, ∞)) = ∫_A ∫_{-∞}^{∞} f_{X,Y}(x,y) dy dx.

Comparing with the formula

P(X ∈ A) = ∫_A f_X(x) dx,

we see that the marginal PDF of X is given by

f_X(x) = ∫_{-∞}^{∞} f_{X,Y}(x,y) dy.

Similarly,

f_Y(y) = ∫_{-∞}^{∞} f_{X,Y}(x,y) dx.

Example 3.9. Two-Dimensional Uniform PDF. Romeo and Juliet have a date at a given time, and each will arrive at the meeting place with a delay between 0 and 1 hour (recall the example given in Section 1.2). Let X and Y denote the delays of Romeo and Juliet, respectively. Assuming that no pairs (x, y) in the unit square are more likely than others, a natural model involves a joint PDF of the form

f_{X,Y}(x,y) = { c,  if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1;
                0,  otherwise,

where c is a constant. For the normalization property

∫_0^1 ∫_0^1 c dx dy = 1

to hold, we must have c = 1.

More generally, let S be a subset of the two-dimensional plane with finite area. The corresponding uniform joint PDF on S is defined to be

f_{X,Y}(x,y) = { 1/(area of S),  if (x,y) ∈ S;
                0,              otherwise.

For any set A ⊂ S, the probability that (X,Y) lies in A is

P((X,Y) ∈ A) = ∬_{(x,y)∈A} f_{X,Y}(x,y) dx dy = (area of A)/(area of S).

Example 3.10. We are told that the joint PDF of the random variables X and Y is a constant c on the set S shown in Fig. 3.12 and is zero outside. We wish to find the value of c and the marginal PDFs of X and Y.

The area of S is 4, so f_{X,Y}(x,y) = c = 1/4, for (x,y) ∈ S. To find the marginal PDF f_X(x) for some particular x, we integrate (with respect to y) the joint PDF over the vertical line corresponding to that x. The resulting PDF is shown in the figure, and the marginal PDF f_Y can be computed similarly.

Figure 3.12: The joint PDF in Example 3.10 and the resulting marginal PDFs.

Example 3.11. Buffon's Needle. This is a famous example, which marks the origin of the subject of geometric probability. A surface is ruled with parallel lines, which are at distance d from each other (see Fig. 3.13). Suppose that we throw a needle of length l on the surface at random. What is the probability that the needle will intersect one of the lines?†

Figure 3.13: Buffon's needle. The length of the line segment between the midpoint of the needle and the point of intersection of the axis of the needle with the closest parallel line is x/sin θ. The needle will intersect the closest parallel line if and only if this length is less than l/2.

We assume here that l < d, so that the needle cannot intersect two lines simultaneously. Let X be the vertical distance from the midpoint of the needle to the nearest of the parallel lines, and let Θ be the acute angle formed by the axis of the needle and the parallel lines. We model the pair (X, Θ) with a uniform joint PDF over the rectangle {(x, θ) | 0 ≤ x ≤ d/2, 0 ≤ θ ≤ π/2}, so that

f_{X,Θ}(x,θ) = { 4/(πd),  if x ∈ [0, d/2] and θ ∈ [0, π/2];
                0,        otherwise.

As can be seen from Fig. 3.13, the needle will intersect one of the lines if and only if

X ≤ (l/2) sin Θ,

so the probability of intersection is

P(X ≤ (l/2) sin Θ) = ∬_{x ≤ (l/2) sin θ} f_{X,Θ}(x,θ) dx dθ
  = (4/(πd)) ∫_0^{π/2} ∫_0^{(l/2) sin θ} dx dθ
  = (4/(πd)) ∫_0^{π/2} (l/2) sin θ dθ
  = (2l/(πd)) (−cos θ) |_0^{π/2}
  = 2l/(πd).

† This problem was posed and solved in 1777 by the French naturalist Buffon. A number of generalizations of the problem have been studied, including the case where the needle is longer than the spacing of the lines, and it has served as a basis for experimental evaluations of π, inspiring, among others, simulation programs for computing π using Buffon's ideas.

The probability of intersection can be empirically estimated by repeating the experiment a large number of times. Since it is equal to 2l/(πd), this provides us with a method for the experimental evaluation of π.
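One such experimental evaluation can be simulated; the sketch below is our own addition and directly follows the model of the example, sampling (X, Θ) uniformly and inverting the formula 2l/(πd).

```python
# Monte Carlo version of Buffon's needle: simulate the uniform pair
# (X, Theta), count intersections, and back out an estimate of pi.
import math
import random

l, d, n = 1.0, 2.0, 1_000_000   # needle length l < line spacing d
hits = 0
for _ in range(n):
    x = random.uniform(0, d / 2)            # distance of midpoint to nearest line
    theta = random.uniform(0, math.pi / 2)  # acute angle with the lines
    if x <= (l / 2) * math.sin(theta):
        hits += 1

p_hat = hits / n
print(p_hat, 2 * l / (math.pi * d))  # empirical vs. exact probability
print(2 * l / (p_hat * d))           # estimate of pi
```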

Joint CDFs

If X and Y are two random variables associated with the same experiment, we define their joint CDF by

F_{X,Y}(x,y) = P(X ≤ x, Y ≤ y).

As in the case of a single random variable, the advantage of working with the CDF is that it applies equally well to discrete and continuous random variables. In particular, if X and Y are described by a joint PDF f_{X,Y}, then

F_{X,Y}(x,y) = ∫_{-∞}^y ∫_{-∞}^x f_{X,Y}(s,t) ds dt.

Conversely, the PDF can be recovered from the CDF by differentiating:

f_{X,Y}(x,y) = ∂²F_{X,Y}/∂x∂y (x,y).

Example 3.12. Let X and Y be described by a uniform PDF on the unit square. The joint CDF is given by

F_{X,Y}(x,y) = P(X ≤ x, Y ≤ y) = xy,  for 0 ≤ x, y ≤ 1.

We then verify that

∂²F_{X,Y}/∂x∂y (x,y) = ∂²(xy)/∂x∂y = 1 = f_{X,Y}(x,y),

for all (x,y) in the unit square.


Expectation

If X and Y are jointly continuous random variables and g is some function, then Z = g(X,Y) is also a random variable. We will see in Section 4.1 methods for computing the PDF of Z, if it has one. For now, let us note that the expected value rule is still applicable:

E[g(X,Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x,y) f_{X,Y}(x,y) dx dy.

As an important special case, for any scalars a, b, and c, we have

E[aX + bY + c] = aE[X] + bE[Y] + c.

More than Two Random Variables

The joint PDF of three random variables X, Y, and Z is defined in analogy with the case of two random variables. For example, we have

P((X,Y,Z) ∈ B) = ∭_{(x,y,z)∈B} f_{X,Y,Z}(x,y,z) dx dy dz,

for any set B. We also have relations such as

f_{X,Y}(x,y) = ∫_{-∞}^{∞} f_{X,Y,Z}(x,y,z) dz,

and

f_X(x) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y,Z}(x,y,z) dy dz.

The expected value rule takes the form

E[g(X,Y,Z)] = ∭ g(x,y,z) f_{X,Y,Z}(x,y,z) dx dy dz,

and if g is linear, of the form aX + bY + cZ, then

E[aX + bY + cZ] = aE[X] + bE[Y] + cE[Z].

Furthermore, there are obvious generalizations of the above to the case of more than three random variables. For example, for any random variables X1, X2, ..., Xn and any scalars a1, a2, ..., an, we have

E[a1X1 + a2X2 + ⋯ + anXn] = a1E[X1] + a2E[X2] + ⋯ + anE[Xn].


Summary of Facts about Joint PDFs

Let X and Y be jointly continuous random variables with joint PDF f_{X,Y}.

• The joint PDF is used to calculate probabilities:

  P((X,Y) ∈ B) = ∬_{(x,y)∈B} f_{X,Y}(x,y) dx dy.

• The marginal PDFs of X and Y can be obtained from the joint PDF, using the formulas

  f_X(x) = ∫_{-∞}^{∞} f_{X,Y}(x,y) dy,  f_Y(y) = ∫_{-∞}^{∞} f_{X,Y}(x,y) dx.

• The joint CDF is defined by F_{X,Y}(x,y) = P(X ≤ x, Y ≤ y), and determines the joint PDF through the formula

  f_{X,Y}(x,y) = ∂²F_{X,Y}/∂x∂y (x,y),

  for every (x,y) at which the joint PDF is continuous.

• A function g(X,Y) of X and Y defines a new random variable, and

  E[g(X,Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x,y) f_{X,Y}(x,y) dx dy.

  If g is linear, of the form aX + bY + c, we have

  E[aX + bY + c] = aE[X] + bE[Y] + c.

• The above have natural extensions to the case where more than two random variables are involved.

3.5 CONDITIONING

Similar to the case of discrete random variables, we can condition a random variable on an event or on another random variable, and define the concepts of conditional PDF and conditional expectation. The various definitions and formulas parallel the ones for the discrete case, and their interpretation is similar, except for some subtleties that arise when we condition on an event of the form {Y = y}, which has zero probability.


Conditioning a Random Variable on an Event

The conditional PDF of a continuous random variable X, given an event A with P(A) > 0, is defined as a nonnegative function f_{X|A} that satisfies

P(X ∈ B | A) = ∫_B f_{X|A}(x) dx,

for any subset B of the real line. In particular, by letting B be the entire real line, we obtain the normalization property

∫_{-∞}^{∞} f_{X|A}(x) dx = 1,

so that f_{X|A} is a legitimate PDF.

In the important special case where we condition on an event of the form {X ∈ A}, with P(X ∈ A) > 0, the definition of conditional probabilities yields

P(X ∈ B | X ∈ A) = P(X ∈ B, X ∈ A)/P(X ∈ A).

By comparing with the earlier formula, we conclude that

f_{X|{X∈A}}(x) = { f_X(x)/P(X ∈ A),  if x ∈ A;
                  0,                otherwise.

As in the discrete case, the conditional PDF is zero outside the conditioning set. Within the conditioning set, the conditional PDF has exactly the same shape as the unconditional one, except that it is scaled by the constant factor 1/P(X ∈ A), so that f_{X|{X∈A}} integrates to 1; see Fig. 3.14. Thus, the conditional PDF is similar to an ordinary PDF, except that it refers to a new universe in which the event {X ∈ A} is known to have occurred.

Example 3.13. The Exponential Random Variable is Memoryless. The time T until a new light bulb burns out is an exponential random variable with parameter λ. Ariadne turns the light on, leaves the room, and when she returns, t time units later, finds that the light bulb is still on, which corresponds to the event A = {T > t}. Let X be the additional time until the light bulb burns out. What is the conditional CDF of X, given the event A? We have, for x ≥ 0,

P(X > x | A) = P(T > t + x | T > t) = P(T > t + x and T > t)/P(T > t) = P(T > t + x)/P(T > t) = e^{−λ(t+x)}/e^{−λt} = e^{−λx},

Figure 3.14: The unconditional PDF f_X and the conditional PDF f_{X|A}, where A is the interval [a, b]. Note that within the conditioning event A, f_{X|A} retains the same shape as f_X, except that it is scaled along the vertical axis.

where we have used the expression for the CDF of an exponential random variable. Thus, the conditional CDF of X is exponential with parameter λ, regardless of the time t that elapsed between the lighting of the bulb and Ariadne's arrival. This is known as the memorylessness property of the exponential. Generally, if we model the time to complete a certain operation by an exponential random variable X, this property implies that as long as the operation has not been completed, the remaining time up to completion has the same exponential CDF, no matter when the operation started.
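The memorylessness property is easy to observe in a simulation; the following sketch (our own addition) conditions exponential lifetimes on surviving past time t and compares the remaining time with a fresh lifetime.

```python
# Among exponential lifetimes that survive past time t, the remaining
# time T - t has the same exponential distribution as a fresh lifetime.
import random

lam, t, n = 0.5, 3.0, 2_000_000
remaining = [T - t for T in (random.expovariate(lam) for _ in range(n)) if T > t]

# The conditional mean of the remaining time should match 1/lam.
print(sum(remaining) / len(remaining), 1 / lam)   # both close to 2.0
```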

When the conditioning event is defined in terms of multiple random variables, a similar notion applies. Suppose, for example, that X and Y are jointly continuous, with joint PDF f_{X,Y}, and that we condition on a positive probability event of the form C = {(X,Y) ∈ A}. We then have

f_{X,Y|C}(x,y) = { f_{X,Y}(x,y)/P(C),  if (x,y) ∈ A;
                  0,                  otherwise,

and the conditional PDF of X alone can be obtained by integrating out y:

f_{X|C}(x) = ∫_{-∞}^{∞} f_{X,Y|C}(x,y) dy.

These two formulas provide one method for obtaining the conditional PDF of a random variable X when the conditioning event is not of the form {X ∈ A}, but is instead defined in terms of multiple random variables.

We finally note that there is a version of the total probability theorem, which involves conditional PDFs: if the events A1, ..., An are disjoint and form a partition of the sample space, then

f_X(x) = Σ_{i=1}^n P(Ai) f_{X|Ai}(x).

To justify this statement, we use the total probability theorem from Chapter 1, and obtain

P(X ≤ x) = Σ_{i=1}^n P(Ai) P(X ≤ x | Ai).

This formula can be rewritten as

∫_{-∞}^x f_X(t) dt = Σ_{i=1}^n P(Ai) ∫_{-∞}^x f_{X|Ai}(t) dt.

We then take the derivative of both sides, with respect to x, and obtain the desired result.

Conditional PDF Given an Event

• The conditional PDF f_{X|A} of a continuous random variable X, given an event A with P(A) > 0, satisfies

  P(X ∈ B | A) = ∫_B f_{X|A}(x) dx.

• If A is a subset of the real line with P(X ∈ A) > 0, then

  f_{X|{X∈A}}(x) = { f_X(x)/P(X ∈ A),  if x ∈ A;
                    0,                otherwise.

• Let A1, A2, ..., An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i. Then,

  f_X(x) = Σ_{i=1}^n P(Ai) f_{X|Ai}(x)

  (a version of the total probability theorem).

The following example illustrates a divide-and-conquer approach that uses the total probability theorem to calculate a PDF.


Example 3.14. The metro train arrives at the station near your home every quarter hour starting at 6:00 a.m. You walk into the station every morning between 7:10 and 7:30 a.m., with the time in this interval being a uniform random variable. What is the PDF of the time you have to wait for the first train to arrive?

Figure 3.15: The PDFs f_X, f_{Y|A}, f_{Y|B}, and f_Y in Example 3.14.

The time of your arrival, denoted by X, is a uniform random variable over the interval from 7:10 to 7:30. Let Y be the waiting time. We calculate its PDF using a divide-and-conquer strategy based on the two events

A = {7:10 ≤ X ≤ 7:15} = {you board the 7:15 train},
B = {7:15 < X ≤ 7:30} = {you board the 7:30 train}.

Conditioned on the event A, your arrival time is uniform over the interval from 7:10 to 7:15. In this case, the waiting time Y is also uniform and takes values between 0 and 5 minutes; see Fig. 3.15(b). Similarly, conditioned on B, Y is uniform and takes values between 0 and 15 minutes; see Fig. 3.15(c). The PDF of Y is obtained using the total probability theorem,

f_Y(y) = P(A) f_{Y|A}(y) + P(B) f_{Y|B}(y),

and is shown in Fig. 3.15(d). We have, using P(A) = 1/4 and P(B) = 3/4,

f_Y(y) = (1/4) · (1/5) + (3/4) · (1/15) = 1/10,  for 0 ≤ y ≤ 5,
f_Y(y) = (3/4) · (1/15) = 1/20,  for 5 < y ≤ 15.

(y)

on

c on t inuo us

with joint

with

(y) > O. th e th a t = y, is de fi ned by

IY

This definition is analogous to

y)/py (y) for

case.

a fixed number and consider IY (x I y) as a function of the single variable x. Viewed as

thinking about th e it is

to

y as

Ix. Y (x, Y) , be- cause the denominator (y)

a I XIY ( x

I y)

the sanle

joint

as

not depend on �r : see

Iy (y) =

fx.y (:r. y)

Ix

I y)

so for

Figure 3.16: Visualization of the conditional PDF f_{X|Y}(x | y). Let X and Y have a joint PDF which is uniform on the set S. For each fixed y, we consider the joint PDF on the slice Y = y and normalize it so that it integrates to 1.

Example 3.15. Circular Uniform PDF. A dart is thrown at a circular target of radius r. We assume that it always hits the target, and that all points of impact (x, y) are equally likely, so that the joint PDF of the random variables X and Y is uniform on the disk. Since the area of the disk is πr²,

f_{X,Y}(x,y) = { 1/(πr²),  if x² + y² ≤ r²;
                0,         otherwise.

To find the conditional PDF f_{X|Y}(x | y), first note that for |y| ≤ r,

f_Y(y) = ∫_{-∞}^{∞} f_{X,Y}(x,y) dx = (1/(πr²)) ∫_{−√(r²−y²)}^{√(r²−y²)} dx = 2√(r² − y²)/(πr²).

Thus, for x² + y² ≤ r²,

f_{X|Y}(x | y) = f_{X,Y}(x,y)/f_Y(y) = 1/(2√(r² − y²)).

The conditional PDF is uniform and does not depend on x; it is not defined when |y| > r.


To interpret the conditional PDF, let us fix some small positive numbers δ1 and δ2, and condition on the event B = {y ≤ Y ≤ y + δ2}. We have

P(x ≤ X ≤ x + δ1 | y ≤ Y ≤ y + δ2) = P(x ≤ X ≤ x + δ1 and y ≤ Y ≤ y + δ2)/P(y ≤ Y ≤ y + δ2) ≈ f_{X,Y}(x,y) δ1 δ2 / (f_Y(y) δ2) = f_{X|Y}(x | y) δ1.

In words, f_{X|Y}(x | y) δ1 provides us with the probability that X belongs to a small interval [x, x + δ1], given that Y belongs to a small interval [y, y + δ2]. Since f_{X|Y}(x | y) δ1 does not depend on δ2, we can think of the limiting case where δ2 decreases to zero and write

P(x ≤ X ≤ x + δ1 | Y = y) ≈ f_{X|Y}(x | y) δ1,  (δ1 small),

and, more generally,

P(X ∈ A | Y = y) = ∫_A f_{X|Y}(x | y) dx.

Conditional probabilities, given the zero probability event {Y = y}, were left undefined in Chapter 1. But the above formula provides a natural way of defining such conditional probabilities in the present context. In addition, it allows us to view the conditional PDF f_{X|Y}(x | y) (as a function of x) as a description of the probability law of X, given that the event {Y = y} has occurred.

As in the discrete case, the conditional PDF f_{X|Y}, together with the marginal PDF f_Y, are sometimes used to calculate the joint PDF. Furthermore, this approach can also be used for modeling: instead of directly specifying f_{X,Y}, it is often natural to provide a probability law for Y, in terms of a PDF f_Y, and then provide a conditional PDF f_{X|Y}(x | y) for X, given any possible value y of Y.

Example 3.16. The speed of a typical vehicle that drives past a police radar is modeled as an exponentially distributed random variable X with mean 50 miles per hour. The police radar's measurement Y of the vehicle's speed has an error which is modeled as a normal random variable with zero mean and standard deviation equal to one tenth of the vehicle's speed. What is the joint PDF of X and Y?

We have f_X(x) = (1/50) e^{−x/50}, for x ≥ 0. Also, conditioned on X = x, the measurement Y has a normal PDF with mean x and variance x²/100. Therefore,

f_{Y|X}(y | x) = (1/(√(2π) (x/10))) e^{−(y−x)²/(2x²/100)}.

Thus, for all x ≥ 0 and all y,

f_{X,Y}(x,y) = f_X(x) f_{Y|X}(y | x) = (1/50) e^{−x/50} · (10/(√(2π) x)) e^{−50(y−x)²/x²}.


Conditional PDF Given a Random Variable

Let X and Y be jointly continuous random variables with joint PDF f_{X,Y}.

• The joint, marginal, and conditional PDFs are related to each other by the formulas

  f_{X,Y}(x,y) = f_Y(y) f_{X|Y}(x | y),

  f_X(x) = ∫_{-∞}^{∞} f_Y(y) f_{X|Y}(x | y) dy.

  The conditional PDF f_{X|Y}(x | y) is defined only for those y for which f_Y(y) > 0.

• We have

  P(X ∈ A | Y = y) = ∫_A f_{X|Y}(x | y) dx.

For the case of more than two random variables, there are natural extensions to the above. For example, we can define conditional PDFs by formulas such as

f_{X,Y|Z}(x,y | z) = f_{X,Y,Z}(x,y,z)/f_Z(z),  if f_Z(z) > 0,

f_{X|Y,Z}(x | y,z) = f_{X,Y,Z}(x,y,z)/f_{Y,Z}(y,z),  if f_{Y,Z}(y,z) > 0.

There is also an analog of the multiplication rule,

f_{X,Y,Z}(x,y,z) = f_{X|Y,Z}(x | y,z) f_{Y|Z}(y | z) f_Z(z),

and of other formulas developed in this section.

Conditional Expectation

For a continuous random variable X, we define its conditional expectation E[X | A] given an event A, similar to the unconditional case, except that we now need to use the conditional PDF f_{X|A}. The conditional expectation E[X | Y = y] is defined similarly, in terms of the conditional PDF f_{X|Y}. Various familiar properties of expectations carry over to the present context and are summarized below. We note that all formulas are analogous to corresponding formulas for the case of discrete random variables, except that sums are replaced by integrals, and PMFs are replaced by PDFs.


Summary of Facts About Conditional Expectations

Let X and Y be jointly continuous random variables, and let A be an event with P(A) > 0.

• Definitions: The conditional expectation of X given the event A is defined by

  E[X | A] = ∫_{-∞}^{∞} x f_{X|A}(x) dx.

  The conditional expectation of X given that Y = y is defined by

  E[X | Y = y] = ∫_{-∞}^{∞} x f_{X|Y}(x | y) dx.

• The expected value rule: For a function g(X), we have

  E[g(X) | A] = ∫_{-∞}^{∞} g(x) f_{X|A}(x) dx,

  and

  E[g(X) | Y = y] = ∫_{-∞}^{∞} g(x) f_{X|Y}(x | y) dx.

• Total expectation theorem: Let A1, A2, ..., An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i. Then,

  E[X] = Σ_{i=1}^n P(Ai) E[X | Ai].

  Similarly,

  E[X] = ∫_{-∞}^{∞} E[X | Y = y] f_Y(y) dy.

• There are natural analogs for the case of functions of several random variables. For example,

  E[g(X,Y) | Y = y] = ∫_{-∞}^{∞} g(x,y) f_{X|Y}(x | y) dx,

  and

  E[g(X,Y)] = ∫_{-∞}^{∞} E[g(X,Y) | Y = y] f_Y(y) dy.

The expected value rule is established in the same manner as for the case of unconditional expectations. To justify the first version of the total expectation


theorem, we start with the total probability theorem

f_X(x) = Σ_{i=1}^n P(Ai) f_{X|Ai}(x),

multiply both sides by x, and then integrate from −∞ to ∞.

To justify the second version of the total expectation theorem, we observe that

∫_{-∞}^{∞} E[X | Y = y] f_Y(y) dy = ∫_{-∞}^{∞} [∫_{-∞}^{∞} x f_{X|Y}(x | y) dx] f_Y(y) dy
  = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x f_{X|Y}(x | y) f_Y(y) dx dy
  = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x f_{X,Y}(x,y) dx dy
  = ∫_{-∞}^{∞} x [∫_{-∞}^{∞} f_{X,Y}(x,y) dy] dx
  = ∫_{-∞}^{∞} x f_X(x) dx
  = E[X].

The total expectation theorem can often facilitate the calculation of the mean, variance, and other moments of a random variable, using a divide-and-conquer approach.

Example 3.17. Mean and Variance of a Piecewise Constant PDF. Suppose that the random variable X has the piecewise constant PDF

f_X(x) = { 1/3,  if 0 ≤ x ≤ 1;
           2/3,  if 1 < x ≤ 2;
           0,    otherwise

(see Fig. 3.18). Consider the events

A1 = {X lies in the first interval [0, 1]},
A2 = {X lies in the second interval (1, 2]}.

We have from the given PDF,

P(A1) = ∫_0^1 f_X(x) dx = 1/3,  P(A2) = ∫_1^2 f_X(x) dx = 2/3.

Furthermore, the conditional mean and second moment of X, conditioned on A1 and A2, are easily calculated since the corresponding conditional PDFs f_{X|A1} and f_{X|A2} are uniform over the respective intervals:

E[X | A1] = 1/2,  E[X² | A1] = 1/3,
E[X | A2] = 3/2,  E[X² | A2] = 7/3.

We now use the total expectation theorem:

E[X] = P(A1) E[X | A1] + P(A2) E[X | A2] = (1/3)(1/2) + (2/3)(3/2) = 7/6,
E[X²] = P(A1) E[X² | A1] + P(A2) E[X² | A2] = (1/3)(1/3) + (2/3)(7/3) = 15/9,

so that the variance is

var(X) = E[X²] − (E[X])² = 15/9 − 49/36 = 11/36.
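The divide-and-conquer calculation can be double-checked by sampling from the same piecewise constant PDF; the sketch below is our own addition.

```python
# Verify Example 3.17 by simulation and compare with E[X] = 7/6,
# var(X) = 11/36.
import random

def sample():
    # With prob 1/3, X is uniform on [0, 1]; with prob 2/3, uniform on (1, 2].
    return random.uniform(0, 1) if random.random() < 1 / 3 else random.uniform(1, 2)

n = 1_000_000
xs = [sample() for _ in range(n)]
m = sum(xs) / n
v = sum((x - m) ** 2 for x in xs) / n
print(m, 7 / 6)      # ~1.1667
print(v, 11 / 36)    # ~0.3056
```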

Independence

In analogy with the discrete case, we say that two continuous random variables X and Y are independent if their joint PDF is the product of the marginal PDFs:

f_{X,Y}(x,y) = f_X(x) f_Y(y),  for all x, y.

Example 3.18. Independent Normal Random Variables. Let X and Y be independent normal random variables with means μx, μy and variances σx², σy², so that their joint PDF is of the form

f_{X,Y}(x,y) = (1/(2π σx σy)) e^{−(x−μx)²/(2σx²) − (y−μy)²/(2σy²)}.

The level sets of this PDF, i.e., the sets of points at which the PDF takes a constant value, are described by an equation of the form (x−μx)²/σx² + (y−μy)²/σy² = constant, and its contours are ellipses whose two axes are parallel to the coordinate axes.

Figure 3.19: Contours of the joint PDF of two independent normal random variables X and Y with means μx, μy, and variances σx², σy².

If X and Y are independent, then any two events of the form {X ∈ A} and {Y ∈ B} are independent. Indeed,

P(X ∈ A and Y ∈ B) = ∫_{x∈A} ∫_{y∈B} f_{X,Y}(x,y) dy dx
  = ∫_{x∈A} ∫_{y∈B} f_X(x) f_Y(y) dy dx
  = ∫_{x∈A} f_X(x) dx ∫_{y∈B} f_Y(y) dy
  = P(X ∈ A) P(Y ∈ B).

In particular, independence implies that

F_{X,Y}(x,y) = P(X ≤ x, Y ≤ y) = P(X ≤ x) P(Y ≤ y) = F_X(x) F_Y(y).

The converse of these statements is also true; see the end-of-chapter problems. The property

F_{X,Y}(x,y) = F_X(x) F_Y(y),  for all x, y,

can be used to provide a general definition of independence between two random variables, e.g., if X is discrete and Y is continuous.

An argument similar to the discrete case shows that if X and Y are independent, then

E[g(X)h(Y)] = E[g(X)] E[h(Y)],

for any two functions g and h. Finally, the variance of the sum of independent random variables is equal to the sum of their variances.

Independence of Continuous Random Variables

Let X and Y be jointly continuous random variables.

• X and Y are independent if

  f_{X,Y}(x,y) = f_X(x) f_Y(y),  for all x, y.

• If X and Y are independent, then

  E[XY] = E[X] E[Y].

  Furthermore, for any functions g and h, the random variables g(X) and h(Y) are independent, and we have

  E[g(X)h(Y)] = E[g(X)] E[h(Y)].

• If X and Y are independent, then

  var(X + Y) = var(X) + var(Y).

3.6 THE CONTINUOUS BAYES' RULE

In many situations, we represent an unobserved phenomenon by a random variable X with PDF f_X, and we make a noisy measurement Y, which is modeled in terms of a conditional PDF f_{Y|X}. Once the experimental value of Y is observed, what information does it provide on the unknown value of X?

This setting is similar to that encountered in Section 1.4, where we introduced Bayes' rule and used it to solve inference problems. The only difference is that we are now dealing with continuous random variables; see Fig. 3.20.

Figure 3.20: Schematic description of the inference problem. We have an unobserved random variable X with known PDF, and we obtain a measurement Y according to a conditional PDF f_{Y|X}. Given an observed value y of Y, the inference problem is to evaluate the conditional PDF f_{X|Y}(x | y).

Note that whatever information is provided by the event {Y = y} is captured by the conditional PDF f_{X|Y}(x | y). It thus suffices to evaluate this PDF. From the formulas f_X f_{Y|X} = f_{X,Y} = f_Y f_{X|Y}, it follows that

f_{X|Y}(x | y) = f_X(x) f_{Y|X}(y | x) / f_Y(y).

Since, by normalization, ∫_{-∞}^{∞} f_{X|Y}(x | y) dx = 1, an equivalent expression is

f_{X|Y}(x | y) = f_X(x) f_{Y|X}(y | x) / ∫_{-∞}^{∞} f_X(t) f_{Y|X}(y | t) dt.


Example 3.19. A light bulb produced by the General Illumination Company is known to have an exponentially distributed lifetime Y. However, the company has been experiencing quality control problems. On any given day, the parameter λ of the PDF of Y is actually a random variable, uniformly distributed in the interval [1, 3/2]. We test a light bulb and record its lifetime. What can we say about the underlying parameter λ?

We model the parameter λ in terms of a uniform random variable Λ with PDF

f_Λ(λ) = 2,  for 1 ≤ λ ≤ 3/2.

The available information about Λ is captured by the conditional PDF f_{Λ|Y}(λ | y), which, using the continuous Bayes' rule, is given by

f_{Λ|Y}(λ | y) = f_Λ(λ) f_{Y|Λ}(y | λ) / ∫_1^{3/2} f_Λ(t) f_{Y|Λ}(y | t) dt = 2λe^{−λy} / ∫_1^{3/2} 2te^{−ty} dt,  for 1 ≤ λ ≤ 3/2.
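The denominator has no convenient closed form here, but the posterior is easy to evaluate numerically; the sketch below is our own addition and uses a simple midpoint rule for the normalizing integral.

```python
# Evaluate the posterior PDF of Lambda in Example 3.19 on a few points,
# given an observed lifetime y. Prior: uniform on [1, 3/2] (density 2).
import math

def posterior(lam, y, n=10_000):
    """f_{Lambda|Y}(lam | y) for lam in [1, 3/2]."""
    if not 1.0 <= lam <= 1.5:
        return 0.0
    # Denominator: integral of 2*t*exp(-t*y) over [1, 3/2], midpoint rule.
    h = 0.5 / n
    denom = sum(2 * (1 + (i + 0.5) * h) * math.exp(-(1 + (i + 0.5) * h) * y)
                for i in range(n)) * h
    return 2 * lam * math.exp(-lam * y) / denom

y = 1.0   # observed lifetime
for lam in (1.0, 1.1, 1.25, 1.4, 1.5):
    print(lam, posterior(lam, y))
```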

Inference about a Discrete Random Variable

In some cases, the unobserved phenomenon is inherently discrete. For some examples, consider a binary signal which is observed in the presence of normally distributed noise, or a medical diagnosis that is made on the basis of continuous measurements such as temperature and blood counts. In such cases, a somewhat different version of Bayes' rule applies.

We first consider the case where the unobserved phenomenon is described in terms of an event A whose occurrence is unknown. Let P(A) be the probability of the event A. Let Y be a continuous random variable, and assume that the conditional PDFs f_{Y|A}(y) and f_{Y|A^c}(y) are known. We are interested in the conditional probability P(A | Y = y) of the event A, given the value y of Y.

Instead of working with the conditioning event {Y = y}, which has zero probability, let us instead condition on the event {y ≤ Y ≤ y + δ}, where δ is a small positive number, and then take the limit as δ tends to zero. We have, using Bayes' rule, and assuming that f_Y(y) > 0,

P(A | Y = y) ≈ P(A | y ≤ Y ≤ y + δ)
  = P(A) P(y ≤ Y ≤ y + δ | A) / P(y ≤ Y ≤ y + δ)
  ≈ P(A) f_{Y|A}(y) δ / (f_Y(y) δ)
  = P(A) f_{Y|A}(y) / f_Y(y).

The denominator can be evaluated using the following version of the total probability theorem:

f_Y(y) = P(A) f_{Y|A}(y) + P(A^c) f_{Y|A^c}(y),

so that

P(A | Y = y) = P(A) f_{Y|A}(y) / (P(A) f_{Y|A}(y) + P(A^c) f_{Y|A^c}(y)).

In a variant of this formula, we consider an event A of the form {N = n}, where N is a discrete random variable that represents the different discrete possibilities for the unobserved phenomenon of interest. Let p_N be the PMF of N. Let also Y be a continuous random variable which, for any given value n of N, is described by a conditional PDF f_{Y|N}(y | n). The above formula becomes

P(N = n | Y = y) = p_N(n) f_{Y|N}(y | n) / f_Y(y).

The denominator can be evaluated using the following version of the total probability theorem:

f_Y(y) = Σ_i p_N(i) f_{Y|N}(y | i),

so that

P(N = n | Y = y) = p_N(n) f_{Y|N}(y | n) / Σ_i p_N(i) f_{Y|N}(y | i).

Example 3.20. Signal Detection. A binary signal S is transmitted, and we are given that P(S = 1) = p and P(S = −1) = 1 − p. The received signal is Y = N + S, where N is normal noise, with zero mean and unit variance, independent of S. What is the probability that S = 1, as a function of the observed value y of Y?

Conditioned on S = s, the random variable Y has a normal distribution with mean s and unit variance. Applying the formulas given above, we obtain

P(S = 1 | Y = y) = p_S(1) f_{Y|S}(y | 1) / f_Y(y) = (p/√(2π)) e^{−(y−1)²/2} / ((p/√(2π)) e^{−(y−1)²/2} + ((1−p)/√(2π)) e^{−(y+1)²/2}),

which simplifies to

P(S = 1 | Y = y) = p e^y / (p e^y + (1 − p) e^{−y}).

Note that the probability P(S = 1 | Y = y) goes to zero as y decreases to −∞, goes to 1 as y increases to ∞, and is monotonically increasing in between, which is consistent with intuition.
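The simplified posterior is a one-liner in code; the sketch below (our own addition) evaluates it on a grid and exhibits the monotone limiting behavior just described.

```python
# Posterior of Example 3.20 as a function of the observed value y.
import math

def prob_S_is_1(y, p):
    """P(S = 1 | Y = y) = p*e^y / (p*e^y + (1-p)*e^(-y))."""
    return p * math.exp(y) / (p * math.exp(y) + (1 - p) * math.exp(-y))

p = 0.5
for y in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(y, prob_S_is_1(y, p))   # increases from near 0 to near 1
```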

Inference Based on Discrete Observations

We finally note that our earlier formula expressing P(A | Y = y) in terms of f_{Y|A}(y) can be turned around to yield

f_{Y|A}(y) = f_Y(y) P(A | Y = y) / P(A).

Based on the normalization property ∫_{-∞}^{∞} f_{Y|A}(y) dy = 1, an equivalent expression is

f_{Y|A}(y) = f_Y(y) P(A | Y = y) / ∫_{-∞}^{∞} f_Y(t) P(A | Y = t) dt.

This formula can be used to make an inference about a random variable Y when an event A is observed. There is a similar formula for the case where the event A is of the form {N = n}, where N is an observed discrete random variable that depends on Y in a manner described by a conditional PMF p_{N|Y}(n | y).

Bayes' Rule for Continuous Random Variables

Let Y be a continuous random variable.

• If X is a continuous random variable, we have

  f_Y(y) f_{X|Y}(x | y) = f_X(x) f_{Y|X}(y | x),

  and

  f_{X|Y}(x | y) = f_X(x) f_{Y|X}(y | x) / f_Y(y) = f_X(x) f_{Y|X}(y | x) / ∫_{-∞}^{∞} f_X(t) f_{Y|X}(y | t) dt.

• If N is a discrete random variable, we have

  f_Y(y) P(N = n | Y = y) = p_N(n) f_{Y|N}(y | n),

  resulting in the formulas

  P(N = n | Y = y) = p_N(n) f_{Y|N}(y | n) / f_Y(y) = p_N(n) f_{Y|N}(y | n) / Σ_i p_N(i) f_{Y|N}(y | i),

  and

  f_{Y|N}(y | n) = f_Y(y) P(N = n | Y = y) / p_N(n) = f_Y(y) P(N = n | Y = y) / ∫_{-∞}^{∞} f_Y(t) P(N = n | Y = t) dt.

• There are similar formulas for P(A | Y = y) and f_{Y|A}(y).

3.7 SUMMARY AND DISCUSSION

Continuous random variables are characterized by PDFs, which are used to calculate event probabilities. This is similar to the use of PMFs for the discrete case, except that now we need to integrate instead of summing. Joint PDFs are similar to joint PMFs and are used to determine the probability of events that are defined in terms of multiple random variables. Furthermore, conditional PDFs are similar to conditional PMFs and are used to calculate conditional probabilities, given the value of the conditioning random variable. An important application is in problems of inference, using various forms of Bayes' rule that were developed in this chapter.

There are several special continuous random variables which frequently arise in probabilistic models. We introduced some of them, and derived their mean and variance. A summary is provided in the table that follows.

Summary of Results for Special Random Variables

Continuous Uniform Over [a, b]:

f_X(x) = { 1/(b − a),  if a ≤ x ≤ b;
           0,          otherwise,

E[X] = (a + b)/2,  var(X) = (b − a)²/12.

Exponential with Parameter λ:

f_X(x) = { λe^{−λx},  if x ≥ 0;
           0,         otherwise,

F_X(x) = { 1 − e^{−λx},  if x ≥ 0;
           0,            otherwise,

E[X] = 1/λ,  var(X) = 1/λ².

Normal with Parameters μ and σ² > 0:

f_X(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)},

E[X] = μ,  var(X) = σ².

We have also introduced CDFs, which can be used to characterize general random variables that are neither discrete nor continuous. CDFs are related to PMFs and PDFs, but are more general. For a discrete random variable, we can obtain the PMF by differencing the CDF; for a continuous random variable, we can obtain the PDF by differentiating the CDF.


PROBLEMS

SECTION 3.1. Continuous Random Variables and PDFs

Problem 1. Let X be uniformly distributed in the unit interval [0, 1]. Consider the random variable Y = g(X), where

g(x) = { 1,  if x ≤ 1/3;
         2,  if x > 1/3.

Find the expected value of Y by first deriving its PMF. Verify the result using the expected value rule.

Problem 2. Laplace random variable. Let X have the PDF

f_X(x) = (λ/2) e^{−λ|x|},

where λ is a positive scalar. Verify that f_X satisfies the normalization condition, and evaluate the mean and variance of X.

Problem 3.* Show that the expected value of a discrete or continuous random variable X satisfies

E[X] = ∫_0^{∞} P(X > x) dx − ∫_0^{∞} P(X < −x) dx.

Solution. Suppose that X is continuous. We then have

∫_0^{∞} P(X > x) dx = ∫_0^{∞} ∫_x^{∞} f_X(y) dy dx = ∫_0^{∞} ∫_0^y f_X(y) dx dy = ∫_0^{∞} y f_X(y) dy,

where for the second equality we have reversed the order of integration by writing the set {(x,y) | 0 ≤ x < ∞, x ≤ y < ∞} as {(x,y) | 0 ≤ x ≤ y, 0 ≤ y < ∞}. Similarly, we can show that

∫_0^{∞} P(X < −x) dx = −∫_{-∞}^0 y f_X(y) dy.


Combining the two relations above, we obtain the desired result.

If X is discrete, we have

∫_0^{∞} P(X > x) dx = ∫_0^{∞} Σ_{y > x} p_X(y) dx = Σ_{y > 0} p_X(y) ∫_0^y dx = Σ_{y > 0} y p_X(y),

and the rest of the argument is similar to the continuous case.

Problem 4.* Establish the validity of the expected value rule

E[g(X)] = ∫_{-∞}^{∞} g(x) f_X(x) dx,

where X is a continuous random variable with PDF f_X.