Chapter 3 – General Random Variables
Introduction to Probability
SECOND EDITION Dimitri P. Bertsekas and John N. Tsitsiklis
Massachusetts Institute of Technology
WWW site for book information and orders
http://www.athenasc.com
Athena Scientific, Belmont, Massachusetts
General Random Variables

Contents

3.1. Continuous Random Variables and PDFs . . . p. 140
3.2. Cumulative Distribution Functions . . . p. 148
3.3. Normal Random Variables . . . p. 153
3.4. Joint PDFs of Multiple Random Variables . . . p. 158
3.5. Conditioning . . . p. 164
3.6. The Continuous Bayes' Rule . . . p. 178
3.7. Summary and Discussion . . . p. 182
Problems . . . p. 184
General Random Variables    Chap. 3

Random variables with a continuous range of possible values are quite common; the velocity of a vehicle traveling along the highway could be one example. If the velocity is measured by a digital speedometer, we may view the speedometer's reading as a discrete random variable. But if we wish to model the exact velocity, a continuous random variable is called for. Models involving continuous random variables can be useful for several reasons. Besides being finer-grained and possibly more accurate, they allow the use of powerful tools from calculus and often admit an insightful analysis that would not be possible under a discrete model.

All of the concepts and methods introduced in Chapter 2, such as expectation, PMFs, and conditioning, have continuous counterparts. Developing and interpreting these counterparts is the subject of this chapter.
3.1 CONTINUOUS RANDOM VARIABLES AND PDFS
A random variable X is called continuous if there is a nonnegative function f_X, called the probability density function of X, or PDF for short, such that

P(X ∈ B) = ∫_B f_X(x) dx,

for every subset B of the real line.† In particular, the probability that the value of X falls within an interval is

P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx,

and can be interpreted as the area under the graph of the PDF (see Fig. 3.1). For any single value a, we have P(X = a) = ∫_a^a f_X(x) dx = 0. For this reason, including or excluding the endpoints of an interval has no effect on its probability:

P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b).

Note that to qualify as a PDF, a function f_X must be nonnegative, i.e., f_X(x) ≥ 0 for every x, and must also have the normalization property

∫_{-∞}^{∞} f_X(x) dx = P(-∞ < X < ∞) = 1.
† The integral ∫_B f_X(x) dx is to be interpreted in the usual calculus/Riemann sense, and we implicitly assume that it is well-defined. For highly unusual functions and sets, this integral can be hard, or even impossible, to define, but such issues belong to a more advanced treatment of the subject. In any case, it is comforting to know that mathematical subtleties of this type do not arise if f_X is a piecewise continuous function with a finite or countable number of points of discontinuity, and B is the union of a finite or countable number of intervals.
Figure 3.1: Illustration of a PDF f_X. The probability that X takes a value in an interval [a, b] is the shaded area under the graph of the PDF; the entire area under the graph must be equal to 1.

To interpret the PDF, note that for an interval [x, x + δ] with very small length δ, we have

P([x, x + δ]) = ∫_x^{x+δ} f_X(t) dt ≈ f_X(x) · δ,

so we can view f_X(x) as the "probability mass per unit length" around x. Even though f_X(x) is not the probability of any particular event, f_X(x) · δ is approximately the probability that X takes a value in the small interval [x, x + δ].

Example 3.1. Continuous Uniform Random Variable. Consider a random variable X whose PDF is

f_X(x) = c, if 0 ≤ x ≤ 1;  0, otherwise,

for some constant c. The constant can be determined from the normalization property: 1 = ∫_0^1 c dx = c, so that c = 1.

More generally, we can consider a random variable X that takes values in an interval [a, b], and again assume that all subintervals of the same length are equally likely. We refer to this as a uniform random variable. Its PDF has the form

f_X(x) = c, if a ≤ x ≤ b;  0, otherwise,

(cf. Fig. 3.3). The constant value c of the PDF within [a, b] is determined from the normalization property. Indeed, we have

1 = ∫_a^b c dx = c (b - a),

so that c = 1/(b - a).

Figure 3.3: The PDF f_X(x) = 1/(b - a) of a uniform random variable over an interval [a, b].
Example 3.2. Piecewise Constant PDF. Alvin's driving time to work is between 15 and 20 minutes if the day is sunny, and between 20 and 25 minutes if the day is rainy, with all times being equally likely in each case. Assume that a day is sunny with probability 2/3 and rainy with probability 1/3. What is the PDF of the driving time, viewed as a random variable X?

We interpret the statement "all times are equally likely" in the sunny and the rainy cases, to mean that the PDF of X is constant in each of the intervals [15, 20] and [20, 25]. Furthermore, since these two intervals contain all possible driving times, the PDF should be zero everywhere else:

f_X(x) = c₁, if 15 ≤ x < 20;  c₂, if 20 ≤ x ≤ 25;  0, otherwise,

where c₁ and c₂ are some constants. We can determine these constants by using the probabilities of a sunny and of a rainy day:

2/3 = P(sunny day) = ∫_{15}^{20} f_X(x) dx = 5 c₁,

1/3 = P(rainy day) = ∫_{20}^{25} f_X(x) dx = 5 c₂,

so that

c₁ = 2/15,    c₂ = 1/15.

Figure 3.4: A piecewise constant PDF involving three intervals.
Generalizing this example, consider a piecewise constant PDF of the form

f_X(x) = c_i, if a_i ≤ x < a_{i+1}, i = 1, 2, ..., n - 1;  0, otherwise,

where a₁, a₂, ..., a_n are some scalars with a_i < a_{i+1} for all i, and c₁, c₂, ..., c_{n-1} are some nonnegative constants that can be determined as in Example 3.2. In particular, the constants must be such that the normalization property holds:

1 = ∫_{a₁}^{a_n} f_X(x) dx = Σ_{i=1}^{n-1} ∫_{a_i}^{a_{i+1}} c_i dx = Σ_{i=1}^{n-1} c_i (a_{i+1} - a_i).
Example 3.3. A PDF Can Take Arbitrarily Large Values. Consider a random variable X with PDF

f_X(x) = 1/(2√x), if 0 < x ≤ 1;  0, otherwise.

Even though f_X(x) becomes infinitely large as x approaches zero, this is still a valid PDF, because

∫_{-∞}^{∞} f_X(x) dx = ∫_0^1 (1/(2√x)) dx = √x |_0^1 = 1.
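The claim that this unbounded PDF still integrates to 1 is easy to check numerically. Here is a minimal sketch in Python; the midpoint rule and the step count are arbitrary implementation choices, not part of the text:

```python
import math

def f(x):
    # PDF of Example 3.3: unbounded near zero, yet integrable
    return 1.0 / (2.0 * math.sqrt(x)) if 0 < x <= 1 else 0.0

def integrate(g, a, b, n=200_000):
    # simple midpoint rule; it never evaluates g at the singular point x = 0
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, 0.0, 1.0)
print(total)  # close to 1, despite the singularity at 0
```

The midpoint rule converges here precisely because the singularity is integrable; the small residual error comes from the cell nearest zero.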
Summary of PDF Properties

Let X be a continuous random variable with PDF f_X.
• f_X(x) ≥ 0 for all x.
• ∫_{-∞}^{∞} f_X(x) dx = 1.
• If δ is very small, then P([x, x + δ]) ≈ f_X(x) · δ.
• For any subset B of the real line,

P(X ∈ B) = ∫_B f_X(x) dx.
Expectation

The expected value or expectation or mean of a continuous random variable X is defined by†

E[X] = ∫_{-∞}^{∞} x f_X(x) dx.

This is similar to the discrete case except that the PMF is replaced by the PDF, and summation is replaced by integration. As in Chapter 2, E[X] can be interpreted as the "center of gravity" of the PDF and, also, as the anticipated average value of X in a large number of independent repetitions of the experiment. Its mathematical properties are similar to the discrete case; after all, an integral is just a limiting form of a sum.

If X is a continuous random variable with given PDF, any real-valued function Y = g(X) of X is also a random variable. Note that Y can be a continuous random variable: for example, consider the trivial case where Y = g(X) = X. But Y can also turn out to be discrete. For example, suppose that
† One has to deal with the possibility that the integral ∫_{-∞}^{∞} x f_X(x) dx is infinite or undefined. More concretely, we will say that the expectation is well-defined if ∫_{-∞}^{∞} |x| f_X(x) dx < ∞. In that case, it is known that the integral ∫_{-∞}^{∞} x f_X(x) dx takes a finite and unambiguous value.

For an example where the expectation is not well-defined, consider a random variable X with PDF f_X(x) = c/(1 + x²), where c is a constant chosen to enforce the normalization condition. The expression |x| f_X(x) can be approximated by c/|x| when |x| is large. Using the fact ∫_1^{∞} (1/x) dx = ∞, one can show that ∫_{-∞}^{∞} |x| f_X(x) dx = ∞. Thus, E[X] is left undefined, despite the symmetry of the PDF around zero.

Throughout this book, in the absence of an indication to the contrary, we implicitly assume that the expected value of any random variable of interest is well-defined.
g(x) = 1 for x > 0, and g(x) = 0 otherwise. Then Y = g(X) is a discrete random variable taking values in the finite set {0, 1}. In either case, the mean of g(X) satisfies the expected value rule

E[g(X)] = ∫_{-∞}^{∞} g(x) f_X(x) dx,
in complete analogy with the discrete case; see the end-of-chapter problems.
The nth moment of a continuous random variable X is defined as E[Xⁿ], the expected value of the random variable Xⁿ. The variance, denoted by var(X), is defined as the expected value of the random variable (X - E[X])².
We now summarize this discussion and list a number of additional facts that are practically identical to their discrete counterparts.
Expectation of a Continuous Random Variable and its Properties

Let X be a continuous random variable with PDF f_X.
• The expectation of X is defined by

E[X] = ∫_{-∞}^{∞} x f_X(x) dx.

• The expected value rule for a function g(X) has the form

E[g(X)] = ∫_{-∞}^{∞} g(x) f_X(x) dx.

• The variance of X is defined by

var(X) = E[(X - E[X])²] = ∫_{-∞}^{∞} (x - E[X])² f_X(x) dx.

• We have

0 ≤ var(X) = E[X²] - (E[X])².

• If Y = aX + b, where a and b are given scalars, then

E[Y] = aE[X] + b,    var(Y) = a² var(X).
Example 3.4. Mean and Variance of the Uniform Random Variable. Consider a uniform PDF over an interval [a, b], as in Example 3.1. We have

E[X] = ∫_{-∞}^{∞} x f_X(x) dx = ∫_a^b x · (1/(b - a)) dx = (1/(b - a)) · (x²/2) |_a^b = (1/(b - a)) · (b² - a²)/2 = (a + b)/2,

as one expects based on the symmetry of the PDF around (a + b)/2.

To obtain the variance, we first calculate the second moment. We have

E[X²] = ∫_a^b (x²/(b - a)) dx = (1/(b - a)) ∫_a^b x² dx = (1/(b - a)) · (x³/3) |_a^b = (b³ - a³)/(3(b - a)) = (a² + ab + b²)/3.

Thus, the variance is obtained as

var(X) = E[X²] - (E[X])² = (a² + ab + b²)/3 - ((a + b)/2)² = (b - a)²/12,

after some calculation.
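The closed-form answers E[X] = (a + b)/2 and var(X) = (b - a)²/12 can be sanity-checked by simulation. A sketch in Python follows; the interval endpoints and sample size are arbitrary choices for the check:

```python
import random

random.seed(0)
a, b = 2.0, 10.0
n = 200_000
samples = [random.uniform(a, b) for _ in range(n)]

mean = sum(samples) / n
second_moment = sum(x * x for x in samples) / n
var = second_moment - mean ** 2

# compare with the closed-form answers of Example 3.4
print(mean, (a + b) / 2)        # both near the midpoint 6.0
print(var, (b - a) ** 2 / 12)   # both near 5.33
```

With a few hundred thousand samples, the empirical mean and variance land within a few hundredths of the exact values, as the law of large numbers suggests.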
Exponential Random Variable

An exponential random variable has a PDF of the form

f_X(x) = λ e^{-λx}, if x ≥ 0;  0, otherwise,

where λ is a positive parameter characterizing the PDF (see Fig. 3.5). This is a legitimate PDF because

∫_{-∞}^{∞} f_X(x) dx = ∫_0^{∞} λ e^{-λx} dx = -e^{-λx} |_0^{∞} = 1.

Note that the probability that X exceeds a certain value decreases exponentially. Indeed, for any a > 0, we have

P(X ≥ a) = ∫_a^{∞} λ e^{-λx} dx = -e^{-λx} |_a^{∞} = e^{-λa}.
An exponential random variable can, for example, be a good model for the amount of time until an incident of interest takes place, such as a message arriving at a computer, some equipment breaking down, or a light bulb burning out. Such models will be discussed further in later chapters; for the time being we will simply study the exponential analytically.

Figure 3.5: The PDF λ e^{-λx} of an exponential random variable, for a small and for a large value of λ.

The mean and variance can be calculated to be

E[X] = 1/λ,    var(X) = 1/λ².

These formulas can be verified by straightforward calculation, as we now show. Using integration by parts, we have

E[X] = ∫_0^{∞} x λ e^{-λx} dx = (-x e^{-λx}) |_0^{∞} + ∫_0^{∞} e^{-λx} dx = 0 + (-e^{-λx}/λ) |_0^{∞} = 1/λ.

Using again integration by parts, the second moment is

E[X²] = ∫_0^{∞} x² λ e^{-λx} dx = (-x² e^{-λx}) |_0^{∞} + ∫_0^{∞} 2x e^{-λx} dx = 0 + (2/λ) E[X] = 2/λ².
Finally, using the formula var(X) = E[X²] - (E[X])², we obtain

var(X) = 2/λ² - 1/λ² = 1/λ².
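These two moments can be double-checked by numerical integration. A sketch in Python; the rate λ and the integration grid are arbitrary choices for the check:

```python
import math

lam = 0.5  # an arbitrary rate parameter for the check

def pdf(x):
    # exponential PDF: lambda * e^(-lambda x) for x >= 0
    return lam * math.exp(-lam * x)

def moment(k, upper=200.0, n=400_000):
    # midpoint-rule approximation of E[X^k]; the tail beyond `upper` is negligible
    h = upper / n
    return sum(((i + 0.5) * h) ** k * pdf((i + 0.5) * h) for i in range(n)) * h

m1 = moment(1)
m2 = moment(2)
print(m1, 1 / lam)                 # E[X]   vs  1/lambda
print(m2 - m1 ** 2, 1 / lam ** 2)  # var(X) vs  1/lambda^2
```

The second moment comes out close to 2/λ², so the variance 2/λ² - 1/λ² = 1/λ² is confirmed numerically.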
Example 3.5. The time until a small meteorite first lands anywhere in the Sahara desert is modeled as an exponential random variable with a mean of 10 days. The
time is currently midnight. What is the probability that a meteorite first lands
some time between 6 a.m. and 6 p.m. of the first day?
Let X be the time elapsed until the event of interest, measured in days. Then, X is exponential, with mean 1/λ = 10, which yields λ = 1/10. The desired probability is

P(1/4 ≤ X ≤ 3/4) = P(X ≥ 1/4) - P(X > 3/4) = e^{-1/40} - e^{-3/40} = 0.0476,

where we have used the formula P(X ≥ a) = P(X > a) = e^{-λa}.
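This computation is easy to reproduce; a minimal sketch:

```python
import math

lam = 1 / 10  # rate, so that the mean is 10 days

def tail(a):
    # P(X >= a) = e^(-lambda a) for an exponential random variable
    return math.exp(-lam * a)

p = tail(1 / 4) - tail(3 / 4)  # P(1/4 <= X <= 3/4)
print(round(p, 4))  # 0.0476
```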
3.2 CUMULATIVE DISTRIBUTION FUNCTIONS

We have been dealing with discrete and continuous random variables in a somewhat different manner, using PMFs and PDFs, respectively. It would be desirable to describe all kinds of random variables with a single mathematical concept. This is accomplished with the cumulative distribution function, or CDF for short. The CDF of a random variable X is denoted by F_X and provides the probability P(X ≤ x). In particular, for every x we have

F_X(x) = P(X ≤ x) = Σ_{k ≤ x} p_X(k), if X is discrete;  ∫_{-∞}^x f_X(t) dt, if X is continuous.

Loosely speaking, the CDF F_X(x) "accumulates" probability "up to" the value x.
Any random variable associated with a given probability model has a CDF, regardless of whether it is discrete or continuous. This is because {X ≤ x} is always an event and therefore has a well-defined probability. In what follows, any unambiguous specification of the probabilities of all events of the form {X ≤ x}, be it through a PMF, PDF, or CDF, will be referred to as the probability law of the random variable X. Figures 3.6 and 3.7 illustrate the CDFs of various discrete and continuous random variables. From these figures, as well as from the definition, some general properties of the CDF can be observed.
Figure 3.6: CDFs of some discrete random variables. The CDF is related to the PMF through the formula F_X(x) = Σ_{k ≤ x} p_X(k), and has a staircase form, with jumps occurring at the points of positive probability mass. Note that at each jump point, the value of F_X is the larger of the two corresponding values.

Figure 3.7: CDFs of some continuous random variables. The CDF is related to the PDF through the formula F_X(x) = ∫_{-∞}^x f_X(t) dt. Thus, the PDF f_X can be obtained from the CDF by differentiation.
Properties of a CDF

The CDF F_X of a random variable X is defined by

F_X(x) = P(X ≤ x),    for all x,

and has the following properties.
• F_X is monotonically nondecreasing: if x ≤ y, then F_X(x) ≤ F_X(y).
• F_X(x) tends to 0 as x → -∞, and to 1 as x → ∞.
• If X is discrete, then F_X(x) is a piecewise constant function of x.
• If X is continuous, then F_X(x) is a continuous function of x.
• If X is discrete and takes integer values, the PMF and the CDF can be obtained from each other by summing or differencing:

F_X(k) = Σ_{i=-∞}^{k} p_X(i),

p_X(k) = P(X ≤ k) - P(X ≤ k - 1) = F_X(k) - F_X(k - 1),

for all integers k.
• If X is continuous, the PDF and the CDF can be obtained from each other by integration or differentiation:

F_X(x) = ∫_{-∞}^{x} f_X(t) dt,    f_X(x) = dF_X/dx (x).

(The second equality is valid for those x at which the PDF is continuous.)
Sometimes, in order to calculate the PMF or PDF of a discrete or continuous random variable, respectively, it is more convenient to first calculate the CDF. The systematic use of this approach for functions of continuous random variables will be discussed in Section 4.1. The following is a discrete example.

Example 3.6. The Maximum of Several Random Variables. You are allowed to take a certain test three times, and your final score will be the maximum
of the test scores. Thus,

X = max{X₁, X₂, X₃},

where X₁, X₂, X₃ are the three test scores and X is the final score. Assume that your score in each test takes one of the values from 1 to 10 with equal probability 1/10, independently of the scores in other tests. What is the PMF p_X of the final score?

We calculate the PMF indirectly. We first compute the CDF F_X and then obtain the PMF as

p_X(k) = F_X(k) - F_X(k - 1),    k = 1, ..., 10.

We have

F_X(k) = P(X ≤ k) = P(X₁ ≤ k, X₂ ≤ k, X₃ ≤ k) = P(X₁ ≤ k) P(X₂ ≤ k) P(X₃ ≤ k) = (k/10)³,

where the third equality follows from the independence of the events {X₁ ≤ k}, {X₂ ≤ k}, {X₃ ≤ k}. Thus, the PMF is given by

p_X(k) = (k/10)³ - ((k - 1)/10)³,    k = 1, ..., 10.

The preceding line of argument can be generalized to any number of random variables X₁, ..., Xₙ. In particular, if the events {X₁ ≤ x}, ..., {Xₙ ≤ x} are independent for every x, then the CDF of X = max{X₁, ..., Xₙ} is

F_X(x) = F_{X₁}(x) ··· F_{Xₙ}(x).

From this formula, we can obtain p_X(x) by differencing (if X is discrete), or f_X(x) by differentiation (if X is continuous).
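The differencing step for the maximum of the three test scores can be carried out exactly with rational arithmetic. A sketch (the use of `fractions` is an implementation choice, not part of the text):

```python
from fractions import Fraction

def cdf_max(k, n_tests=3, top=10):
    # F_X(k) = (k/10)^3: maximum of three independent uniform scores
    return Fraction(k, top) ** n_tests

# p_X(k) = F_X(k) - F_X(k - 1), for k = 1, ..., 10
pmf = {k: cdf_max(k) - cdf_max(k - 1) for k in range(1, 11)}

print(pmf[10])            # the top score is the most likely maximum
print(sum(pmf.values()))  # a PMF must sum to 1
```

For instance, p_X(10) = 1 - (9/10)³, and the probabilities sum to F_X(10) = 1, as they must.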
The Geometric and Exponential CDFs

Because the CDF is defined for any type of random variable, it provides a convenient means for exploring the relations between continuous and discrete random variables. A particularly interesting case in point is the relation between geometric and exponential random variables.

Let X be a geometric random variable with parameter p; that is, X is the number of trials until the first success in a sequence of independent Bernoulli trials, where the probability of success at each trial is p. Thus, for k = 1, 2, ..., we have P(X = k) = p(1 - p)^{k-1}, and the CDF is given by

F_geo(n) = Σ_{k=1}^{n} p(1 - p)^{k-1} = p · (1 - (1 - p)ⁿ)/(1 - (1 - p)) = 1 - (1 - p)ⁿ,    for n = 1, 2, ....
Suppose now that X is an exponential random variable with parameter λ, so that its CDF is

F_exp(x) = 1 - e^{-λx},    for x ≥ 0.

If δ is chosen so that e^{-λδ} = 1 - p, then

F_exp(nδ) = 1 - e^{-λnδ} = 1 - (1 - p)ⁿ = F_geo(n),    n = 1, 2, ....

Thus, when δ is small, the exponential CDF, sampled at the points δ, 2δ, ..., agrees with the CDF of a geometric random variable; in this sense, an exponential random variable can be viewed as the limit of a scaled geometric random variable.
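The relation between the geometric and exponential CDFs can be checked numerically: with δ chosen so that e^{-λδ} = 1 - p, the two CDFs agree at the sampling points. A sketch, with arbitrary parameter values:

```python
import math

lam = 2.0   # exponential rate (arbitrary)
p = 0.05    # geometric success probability (arbitrary)
delta = -math.log(1 - p) / lam  # chosen so that e^(-lam*delta) = 1 - p

def F_geo(n):
    return 1 - (1 - p) ** n

def F_exp(x):
    return 1 - math.exp(-lam * x)

# the exponential CDF, sampled every delta, matches the geometric CDF
for n in (1, 5, 20):
    print(n, F_geo(n), F_exp(n * delta))
```

The agreement is exact up to floating-point rounding, for every n, regardless of how small δ is.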
3.3 NORMAL RANDOM VARIABLES

A continuous random variable X is said to be normal, or Gaussian, if it has a PDF of the form

f_X(x) = (1/(√(2π) σ)) e^{-(x-μ)²/(2σ²)},

where μ and σ are two scalar parameters characterizing the PDF, with σ assumed positive.

Figure 3.9: A normal PDF, with μ = 1 and σ = 1. We observe that the PDF is symmetric around its mean μ, and has a characteristic bell shape. As x gets further from the mean, the term e^{-(x-μ)²/(2σ²)} decreases very rapidly. In this figure, the PDF is very close to zero outside the interval [-1, 3].

The mean and variance can be calculated to be

E[X] = μ,    var(X) = σ².

To see this, note that the PDF is symmetric around μ, so its mean must be μ. Furthermore, using the change of variables y = (x - μ)/σ and integration by parts, the variance is

var(X) = (1/(√(2π) σ)) ∫_{-∞}^{∞} (x - μ)² e^{-(x-μ)²/(2σ²)} dx = (σ²/√(2π)) ∫_{-∞}^{∞} y² e^{-y²/2} dy = (σ²/√(2π)) (-y e^{-y²/2}) |_{-∞}^{∞} + (σ²/√(2π)) ∫_{-∞}^{∞} e^{-y²/2} dy = σ².
The last equality above is obtained by using the fact

(1/√(2π)) ∫_{-∞}^{∞} e^{-y²/2} dy = 1,

which is just the normalization property of the normal PDF for the case where μ = 0 and σ = 1.
A normal random variable has several special properties. The following one is particularly important and will be justified in Section 4.1.

Normality is Preserved by Linear Transformations

If X is a normal random variable with mean μ and variance σ², and if a ≠ 0, b are scalars, then the random variable

Y = aX + b

is also normal, with mean and variance

E[Y] = aμ + b,    var(Y) = a²σ².
The Standard Normal Random Variable

A normal random variable Y with zero mean and unit variance is said to be a standard normal. Its CDF is denoted by Φ:

Φ(y) = P(Y ≤ y) = P(Y < y) = (1/√(2π)) ∫_{-∞}^{y} e^{-t²/2} dt.

It is recorded in a table (given on the next page), and is a very useful tool for calculating various probabilities involving normal random variables; see also Fig. 3.10.

Note that the table only provides the values of Φ(y) for y ≥ 0, because the omitted values can be found using the symmetry of the PDF. For example, if Y is a standard normal random variable, we have

Φ(-0.5) = P(Y ≤ -0.5) = P(Y ≥ 0.5) = 1 - P(Y < 0.5) = 1 - Φ(0.5) = 1 - 0.6915 = 0.3085.

More generally, we have

Φ(-y) = 1 - Φ(y),    for all y.
The standard normal table. The entries in this table provide the numerical values of Φ(y) = P(Y ≤ y), where Y is a standard normal random variable, for y between 0 and 3.49. For example, to find Φ(1.71), we look at the row corresponding to 1.7 and the column corresponding to 0.01, so that Φ(1.71) = 0.9564. When y is negative, the value of Φ(y) can be found using the formula Φ(y) = 1 - Φ(-y).
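When no table is at hand, Φ can be computed from the error function available in most standard libraries. A minimal sketch in Python, using the identity Φ(y) = (1 + erf(y/√2))/2:

```python
import math

def Phi(y):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

print(round(Phi(1.71), 4))   # 0.9564, matching the table entry
print(round(Phi(-0.5), 4))   # 0.3085, with no table lookup needed
```

The symmetry relation Φ(-y) = 1 - Φ(y) follows from the odd symmetry of the error function, so negative arguments need no special handling.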
Let X be a normal random variable with mean μ and variance σ². We "standardize" X by defining a new random variable Y given by

Y = (X - μ)/σ.

Since Y is a linear function of X, it is normal. Furthermore,

E[Y] = (E[X] - μ)/σ = 0,    var(Y) = var(X)/σ² = 1.

Thus, Y is a standard normal random variable. This fact allows us to calculate the probability of any event defined in terms of X: we redefine the event in terms of Y, and then use the standard normal table.

Figure 3.10: The PDF f_Y(y) = (1/√(2π)) e^{-y²/2} of the standard normal random variable. Its mean is 0 and its variance is 1. The corresponding CDF, which is denoted by Φ, is recorded in a table.
Example 3.7. Using the Normal Table. The annual snowfall at a particular geographic location is modeled as a normal random variable with a mean of μ = 60 inches and a standard deviation of σ = 20. What is the probability that this year's snowfall will be at least 80 inches?

Let X be the snow accumulation, viewed as a normal random variable, and let

Y = (X - μ)/σ = (X - 60)/20

be the corresponding standard normal random variable. We have

P(X ≥ 80) = P((X - 60)/20 ≥ (80 - 60)/20) = P(Y ≥ 1) = 1 - Φ(1),

where Φ is the CDF of the standard normal. We read the value Φ(1) from the table:

Φ(1) = 0.8413,
so that

P(X ≥ 80) = 1 - Φ(1) = 0.1587.
Generalizing the approach in the preceding example, we have the following procedure.
CDF Calculation for a Normal Random Variable

For a normal random variable X with mean μ and variance σ², we use a two-step procedure.
(a) "Standardize" X, i.e., subtract μ and divide by σ to obtain a standard normal random variable Y.
(b) Read the CDF value from the standard normal table:

P(X ≤ x) = P((X - μ)/σ ≤ (x - μ)/σ) = P(Y ≤ (x - μ)/σ) = Φ((x - μ)/σ).
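The two-step procedure translates directly into code; the sketch below replaces the table lookup with the error function, and replays Example 3.7:

```python
import math

def normal_cdf(x, mu, sigma):
    # step (a): standardize
    y = (x - mu) / sigma
    # step (b): evaluate the standard normal CDF Phi(y)
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

# Example 3.7 revisited: snowfall with mu = 60 and sigma = 20
p = 1 - normal_cdf(80, 60, 20)
print(round(p, 4))  # 0.1587
```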
Normal random variables are often used in signal processing and communications engineering to model noise and unpredictable distortions of signals. The following is a typical example.
Example 3.8. Signal Detection. A binary message is transmitted as a signal s, which is either -1 or +1. The communication channel corrupts the transmission with additive normal noise with mean μ = 0 and variance σ². The receiver concludes that the signal -1 (or +1) was transmitted if the value received is < 0 (or ≥ 0, respectively); see Fig. 3.11. What is the probability of error?

An error occurs whenever -1 is transmitted and the noise N is at least 1, so that s + N = -1 + N ≥ 0, or whenever +1 is transmitted and the noise N is smaller than -1, so that s + N = 1 + N < 0. In the former case, the probability of error is
P(N ≥ 1) = 1 - P(N < 1) = 1 - P(N/σ < 1/σ) = 1 - Φ(1/σ).

In the latter case, the probability of error is the same, by symmetry. The value of Φ(1/σ) can be obtained from the normal table. For σ = 1, we have Φ(1/σ) = Φ(1) = 0.8413, and the probability of error is 0.1587.
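The formula 1 - Φ(1/σ) also shows how quickly the error probability falls as the noise level shrinks. A sketch that evaluates it for a few noise levels (the particular σ values are arbitrary):

```python
import math

def Phi(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def error_probability(sigma):
    # P(error) = 1 - Phi(1/sigma) for the +/-1 signaling scheme
    return 1 - Phi(1 / sigma)

for sigma in (0.5, 1.0, 2.0):
    print(sigma, round(error_probability(sigma), 4))
```

As expected, the error probability is monotonically increasing in σ, and for σ = 1 it reproduces the value 0.1587 computed in the example.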
Normal random variables play an important role in a broad range of probabilistic models. The main reason is that, generally speaking, they model well the additive effect of many independent factors in a variety of engineering, physical, and statistical contexts. Mathematically, the key fact is that the sum of a large number of independent identically distributed (not necessarily normal) random variables has an approximately normal CDF, regardless of the CDF of the individual random variables. This property is captured in the central limit theorem, which will be discussed in a later chapter.

Figure 3.11: The area of the shaded region is the probability of error in the two cases where -1 and +1 is transmitted.
3.4 JOINT PDFS OF MULTIPLE RANDOM VARIABLES

We now extend the notion of a PDF to the case of multiple random variables. In complete analogy with the discrete case, we introduce joint, marginal, and conditional PDFs. Their intuitive interpretation as well as their main properties parallel the discrete case.

We say that two continuous random variables associated with the same experiment are jointly continuous and can be described in terms of a joint PDF f_{X,Y}, if f_{X,Y} is a nonnegative function that satisfies

P((X, Y) ∈ B) = ∫∫_{(x,y)∈B} f_{X,Y}(x, y) dx dy,

for every subset B of the two-dimensional plane. The notation above means that the integration is carried out over the set B. In the particular case where B is a rectangle of the form B = {(x, y) | a ≤ x ≤ b, c ≤ y ≤ d}, we have

P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_c^d ∫_a^b f_{X,Y}(x, y) dx dy.
Furthermore, by letting B be the entire two-dimensional plane, we obtain the normalization property

∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y}(x, y) dx dy = 1.
To interpret the joint PDF, we let δ be a small positive number and consider the probability of a small rectangle. We have

P(a ≤ X ≤ a + δ, c ≤ Y ≤ c + δ) = ∫_c^{c+δ} ∫_a^{a+δ} f_{X,Y}(x, y) dx dy ≈ f_{X,Y}(a, c) · δ²,

so we can view f_{X,Y}(a, c) as the "probability per unit area" in the vicinity of (a, c).
The joint PDF contains all relevant probabilistic information on the random variables X, Y, and their dependencies. It allows us to calculate the probability of any event that can be defined in terms of these two random variables. As a special case, it can be used to calculate the probability of an event involving only one of them. For example, let A be a subset of the real line and consider the event {X ∈ A}. We have

P(X ∈ A) = P(X ∈ A and Y ∈ (-∞, ∞)) = ∫_A ∫_{-∞}^{∞} f_{X,Y}(x, y) dy dx.

Comparing with the formula

P(X ∈ A) = ∫_A f_X(x) dx,

we see that the marginal PDF of X is given by

f_X(x) = ∫_{-∞}^{∞} f_{X,Y}(x, y) dy.

Similarly,

f_Y(y) = ∫_{-∞}^{∞} f_{X,Y}(x, y) dx.
Example 3.9. Two-Dimensional Uniform PDF. Romeo and Juliet have a date at a given time, and each will arrive at the meeting place with a delay between 0 and 1 hour (recall the example given in Section 1.2). Let X and Y denote the delays of Romeo and Juliet, respectively. Assuming that no pairs (x, y) in the unit square are more likely than others, a natural model involves a joint PDF of the form

f_{X,Y}(x, y) = c, if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1;  0, otherwise,

where c is a constant. For this PDF to satisfy the normalization property

∫_0^1 ∫_0^1 c dx dy = 1,

we must have c = 1.
More generally, let us fix some subset S of the two-dimensional plane. The corresponding uniform joint PDF on S is defined to be

f_{X,Y}(x, y) = 1/(area of S), if (x, y) ∈ S;  0, otherwise.

For any set A ⊂ S, the probability that (X, Y) lies in A is

P((X, Y) ∈ A) = ∫∫_{(x,y)∈A} f_{X,Y}(x, y) dx dy = (area of A)/(area of S).
Example 3.10. We are told that the joint PDF of the random variables X and Y is a constant c on the set S shown in Fig. 3.12, and is zero outside. We wish to find the value of c and to determine the marginal PDFs of X and Y.

The area of S is equal to 4, so that f_{X,Y}(x, y) = c = 1/4, for (x, y) ∈ S. To find the marginal PDF f_X(x) for some particular x, we integrate (with respect to y) the joint PDF over the vertical line corresponding to that x. The resulting PDF is shown in the figure. We can compute the marginal PDF of Y similarly.

Figure 3.12: The joint PDF in Example 3.10 and the resulting marginal PDFs.
Example 3.11. Buffon's Needle. This is a famous example, which marks the beginning of the subject of geometric probability.† A surface is ruled with parallel lines, which are at distance d from each other. Suppose that we throw a needle of length l on the surface at random. What is the probability that the needle will intersect one of the lines?

Figure 3.13: Buffon's needle. The length of the line segment between the midpoint of the needle and the point of intersection of the axis of the needle with the closest parallel line is x/sin θ. The needle will intersect the closest parallel line if and only if this length is less than l/2.

We assume here that l < d, so that the needle cannot intersect two lines. Let X be the vertical distance from the midpoint of the needle to the nearest of the parallel lines, and let Θ be the acute angle formed by the axis of the needle and the parallel lines (see Fig. 3.13). We model the pair of random variables (X, Θ) with a uniform joint PDF over the rectangle {(x, θ) | 0 ≤ x ≤ d/2, 0 ≤ θ ≤ π/2}, so that

f_{X,Θ}(x, θ) = 4/(πd), if x ∈ [0, d/2] and θ ∈ [0, π/2];  0, otherwise.

As can be seen from Fig. 3.13, the needle will intersect one of the lines if and only if

X ≤ (l/2) sin Θ,

so the probability of intersection is

P(X ≤ (l/2) sin Θ) = ∫∫_{x ≤ (l/2) sin θ} f_{X,Θ}(x, θ) dx dθ.

† This problem was posed and solved in 1777 by the French naturalist Buffon. A number of variations of the problem have been studied, including the case where l > d. It has served as a basis for experimental evaluations of π (among others, it has inspired simulation programs for computing π using Buffon's ideas).
We have

P(X ≤ (l/2) sin Θ) = (4/(πd)) ∫_0^{π/2} ∫_0^{(l/2) sin θ} dx dθ = (4/(πd)) ∫_0^{π/2} (l/2) sin θ dθ = (2l/(πd)) (-cos θ) |_0^{π/2} = 2l/(πd).
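The experiment is also easy to simulate. A Monte Carlo sketch, with arbitrary choices of needle length, line spacing, and sample size:

```python
import math
import random

random.seed(1)
l, d = 1.0, 2.0   # needle length and line spacing, with l < d
n = 200_000

hits = 0
for _ in range(n):
    x = random.uniform(0.0, d / 2)            # midpoint distance to nearest line
    theta = random.uniform(0.0, math.pi / 2)  # acute angle with the lines
    if x <= (l / 2) * math.sin(theta):
        hits += 1

p_hat = hits / n
print(p_hat, 2 * l / (math.pi * d))   # empirical vs. exact 2l/(pi d)
print(2 * l / (d * p_hat))            # the implied estimate of pi
```

Solving 2l/(πd) for π gives the classical estimator π ≈ 2l/(d · p̂), which is what the last line prints.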
The probability of intersection can be empirically estimated by repeating the experiment a large number of times. Since it is equal to 2l/(πd), this provides us with a method for the experimental evaluation of π.

Joint CDFs
If X and Y are two random variables associated with the same experiment, we define their joint CDF by

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).

As in the case of a single random variable, the advantage of working with the CDF is that it applies equally well to discrete and continuous random variables. In particular, if X and Y are described by a joint PDF f_{X,Y}, then

F_{X,Y}(x, y) = ∫_{-∞}^{x} ∫_{-∞}^{y} f_{X,Y}(s, t) dt ds.
Conversely, the PDF can be recovered from the CDF by differentiating:

f_{X,Y}(x, y) = ∂²F_{X,Y}/(∂x ∂y) (x, y).
Example 3.12. Let X and Y be described by a uniform PDF on the unit square. The joint CDF is given by

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = xy,    for 0 ≤ x, y ≤ 1.

We then verify that

∂²F_{X,Y}/(∂x ∂y) (x, y) = ∂²(xy)/(∂x ∂y) = 1 = f_{X,Y}(x, y),

for all (x, y) in the unit square.
Expectation

If X and Y are jointly continuous random variables and g is some function, then Z = g(X, Y) is also a random variable. We will see in Section 4.1 methods for computing the PDF of Z, if it has one. For now, let us note that the expected value rule is still applicable, and

E[g(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy.

As an important special case, for any scalars a, b, and c, we have

E[aX + bY + c] = aE[X] + bE[Y] + c.
More than Two Random Variables

The joint PDF of three random variables X, Y, and Z is defined in analogy with the case of two random variables. For example, we have

P((X, Y, Z) ∈ B) = ∫∫∫_{(x,y,z)∈B} f_{X,Y,Z}(x, y, z) dx dy dz,

for any set B. We also have relations such as

f_{X,Y}(x, y) = ∫_{-∞}^{∞} f_{X,Y,Z}(x, y, z) dz,

and

f_X(x) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y,Z}(x, y, z) dy dz.

The expected value rule takes the form

E[g(X, Y, Z)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y, z) f_{X,Y,Z}(x, y, z) dx dy dz,

and if g is linear, of the form aX + bY + cZ, then

E[aX + bY + cZ] = aE[X] + bE[Y] + cE[Z].

Furthermore, there are obvious generalizations of the above to the case of more than three random variables. For example, for any random variables X₁, X₂, ..., Xₙ and any scalars a₁, a₂, ..., aₙ, we have

E[a₁X₁ + a₂X₂ + ··· + aₙXₙ] = a₁E[X₁] + a₂E[X₂] + ··· + aₙE[Xₙ].
Summary of Facts about Joint PDFs

Let X and Y be jointly continuous random variables with joint PDF f_{X,Y}.
• The joint PDF is used to calculate probabilities:

P((X, Y) ∈ B) = ∫∫_{(x,y)∈B} f_{X,Y}(x, y) dx dy.

• The marginal PDFs of X and Y can be obtained from the joint PDF, using the formulas

f_X(x) = ∫_{-∞}^{∞} f_{X,Y}(x, y) dy,    f_Y(y) = ∫_{-∞}^{∞} f_{X,Y}(x, y) dx.

• The joint CDF is defined by F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y), and determines the joint PDF through the formula

f_{X,Y}(x, y) = ∂²F_{X,Y}/(∂x ∂y) (x, y),

for every (x, y) at which the joint PDF is continuous.
• A function g(X, Y) of X and Y defines a new random variable, and

E[g(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy.

If g is linear, of the form aX + bY + c, we have

E[aX + bY + c] = aE[X] + bE[Y] + c.

• The above have natural extensions to the case where more than two random variables are involved.
3.5 CONDITIONING

Similar to the case of discrete random variables, we can condition a random variable on an event or on another random variable, and define the concepts of conditional PDF and conditional expectation. The various definitions and formulas parallel the ones for the discrete case, and their interpretation is similar, except for some subtleties that arise when we condition on an event of the form {Y = y}, which has zero probability.
Conditioning a Random Variable on an Event

The conditional PDF of a continuous random variable X, given an event A with P(A) > 0, is defined as a nonnegative function f_{X|A} that satisfies

P(X ∈ B | A) = ∫_B f_{X|A}(x) dx,

for any subset B of the real line. In particular, by letting B be the entire real line, we obtain the normalization property

∫_{-∞}^{∞} f_{X|A}(x) dx = 1,

so that f_{X|A} is a legitimate PDF.
In the important special case where we condition on an event of the form {X ∈ A}, with P(X ∈ A) > 0, the definition of conditional probabilities yields

P(X ∈ B | X ∈ A) = P(X ∈ B, X ∈ A)/P(X ∈ A) = (1/P(X ∈ A)) ∫_{A∩B} f_X(x) dx.

By comparing with the earlier formula, we conclude that

f_{X|{X∈A}}(x) = f_X(x)/P(X ∈ A), if x ∈ A;  0, otherwise.

As in the discrete case, the conditional PDF is zero outside the conditioning set. Within the conditioning set, the conditional PDF has exactly the same shape as the unconditional one, except that it is scaled by the constant factor 1/P(X ∈ A), so that f_{X|{X∈A}} integrates to 1; see Fig. 3.14. Thus, the conditional PDF is similar to an ordinary PDF, except that it refers to a new universe in which the event {X ∈ A} is known to have occurred.
Example 3.13. The Exponential Random Variable is Memoryless. The time T until a new light bulb burns out is an exponential random variable with parameter λ. Ariadne turns the light on, leaves the room, and when she returns, t time units later, finds that the light bulb is still on, which corresponds to the event A = {T > t}. Let X be the additional time until the light bulb burns out. What is the conditional CDF of X, given the event A?

We have, for x ≥ 0,

P(X > x | A) = P(T > t + x | T > t) = P(T > t + x and T > t)/P(T > t) = P(T > t + x)/P(T > t) = e^{-λ(t+x)}/e^{-λt} = e^{-λx},
Figure 3.14: The unconditional PDF f_X and the conditional PDF f_{X|A}, where the conditioning event A is the interval [a, b]. Note that within A, the conditional PDF f_{X|A} retains the same shape as f_X, except that it is scaled along the vertical axis.

where we have used the expression for the CDF of an exponential random variable. Thus, the conditional CDF of X is exponential with parameter λ, regardless of the time t that elapsed between the lighting of the bulb and Ariadne's arrival. This is known as the memorylessness property of the exponential. Generally, if we model the time to complete a certain operation by an exponential random variable X, this property implies that as long as the operation has not been completed, the remaining time up to completion has the same exponential CDF, no matter when the operation started.
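The memorylessness property can be observed directly in simulation: condition on survival past a time t and look at the distribution of the remaining time. A sketch, with arbitrary parameter choices:

```python
import math
import random

random.seed(2)
lam = 0.5   # exponential rate (arbitrary)
t = 3.0     # time already elapsed (arbitrary)
n = 300_000

# sample T ~ exponential(lam); keep only runs with T > t, record the remainder
remainders = []
for _ in range(n):
    T = random.expovariate(lam)
    if T > t:
        remainders.append(T - t)

# memorylessness: the remainder should again be exponential with rate lam
emp_mean = sum(remainders) / len(remainders)
emp_tail = sum(1 for r in remainders if r > 1.0) / len(remainders)
print(emp_mean, 1 / lam)               # both near 1/lambda
print(emp_tail, math.exp(-lam * 1.0))  # both near e^(-lambda)
```

The empirical mean of the remainder matches 1/λ and its tail probability matches e^{-λx}, exactly as if no time had elapsed at all.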
When two random variables are involved, a similar notion applies. Suppose, for example, that X and Y are jointly continuous random variables, with joint PDF f_{X,Y}. If we condition on a positive probability event of the form C = {(X, Y) ∈ A}, we have

f_{X,Y|C}(x, y) = f_{X,Y}(x, y)/P(C), if (x, y) ∈ A;  0, otherwise.

The conditional PDF of X, given this event, can be obtained from the formula

f_{X|C}(x) = ∫_{-∞}^{∞} f_{X,Y|C}(x, y) dy.

The last two formulas provide one method for obtaining the conditional PDF of a random variable X when the conditioning event is not of the form {X ∈ A}, but is instead defined in terms of multiple random variables. Finally, we note a version of the total probability theorem, which involves conditional PDFs: if the events A₁, ..., Aₙ are disjoint and form a partition of
the sample space, then
f_X(x) = Σ_{i=1}^{n} P(A_i) f_{X|A_i}(x).
To justify this statement, we use the total probability theorem from Chapter 1, and obtain

P(X ≤ x) = Σ_{i=1}^{n} P(A_i) P(X ≤ x | A_i).

This formula can be rewritten as

∫_{-∞}^{x} f_X(t) dt = Σ_{i=1}^{n} P(A_i) ∫_{-∞}^{x} f_{X|A_i}(t) dt.

We then take the derivative of both sides, with respect to x, and obtain the desired result.
Conditional PDF Given an Event

• The conditional PDF f_{X|A} of a continuous random variable X, given an event A with P(A) > 0, satisfies

P(X ∈ B | A) = ∫_B f_{X|A}(x) dx.

• If A is a subset of the real line with P(X ∈ A) > 0, then

f_{X|{X∈A}}(x) = f_X(x)/P(X ∈ A), if x ∈ A;  0, otherwise.

• Let A₁, A₂, ..., Aₙ be disjoint events that form a partition of the sample space, and assume that P(A_i) > 0 for all i. Then,

f_X(x) = Σ_{i=1}^{n} P(A_i) f_{X|A_i}(x)

(a version of the total probability theorem).
The following example illustrates a divide-and-conquer approach that uses the total probability theorem to calculate a PDF.
Example 3.14. The metro train arrives at the station near your home every quarter hour starting at 6:00 a.m. You walk into the station every day between 7:10 and 7:30 a.m., with the time of your arrival, denoted X, being a uniform random variable over the interval from 7:10 to 7:30. Let Y be the time you have to wait for the first train to arrive. What is the PDF of Y?

We consider the events

A = {7:10 ≤ X ≤ 7:15} = {you board the 7:15 train},
B = {7:15 < X ≤ 7:30} = {you board the 7:30 train}.

Conditioned on the event A, your arrival time is uniform on the interval from 7:10 to 7:15. In this case, the waiting time Y is also uniform and takes values between 0 and 5 minutes; see Fig. 3.15(b). Similarly, conditioned on B, Y is uniform and takes values between 0 and 15 minutes; see Fig. 3.15(c). The PDF of Y is obtained using the total probability theorem,

f_Y(y) = P(A) f_{Y|A}(y) + P(B) f_{Y|B}(y),

and is shown in Fig. 3.15(d). We have, with P(A) = 1/4 and P(B) = 3/4,

f_Y(y) = (1/4)·(1/5) + (3/4)·(1/15) = 1/10,    for 0 ≤ y ≤ 5,

f_Y(y) = (3/4)·(1/15) = 1/20,    for 5 < y ≤ 15.

Figure 3.15: The PDFs f_X, f_{Y|A}, f_{Y|B}, and f_Y in Example 3.14.
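The total probability computation in this example can be checked directly in code, using the boarding probabilities P(A) = 1/4 and P(B) = 3/4 implied by the uniform arrival time. A sketch:

```python
def f_Y(y):
    # total probability theorem: f_Y(y) = P(A) f_{Y|A}(y) + P(B) f_{Y|B}(y)
    pA, pB = 1 / 4, 3 / 4                 # board the 7:15 vs. the 7:30 train
    fA = 1 / 5 if 0 <= y <= 5 else 0.0    # Y | A is uniform on [0, 5]
    fB = 1 / 15 if 0 <= y <= 15 else 0.0  # Y | B is uniform on [0, 15]
    return pA * fA + pB * fB

print(f_Y(2))    # 1/10 on [0, 5]
print(f_Y(10))   # 1/20 on (5, 15]

# the resulting PDF integrates to 1 (midpoint-rule check)
n = 15_000
h = 15 / n
total = sum(f_Y((i + 0.5) * h) * h for i in range(n))
print(total)
```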
Conditioning one Random Variable on Another

Let X and Y be continuous random variables with joint PDF f_{X,Y}. For any y with f_Y(y) > 0, the conditional PDF of X, given that Y = y, is defined by

f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y).

This definition is analogous to the formula p_{X|Y}(x | y) = p_{X,Y}(x, y)/p_Y(y) for the discrete case.

When thinking about the conditional PDF, it is best to view y as a fixed number and consider f_{X|Y}(x | y) as a function of the single variable x. Viewed as a function of x, the conditional PDF f_{X|Y}(x | y) has the same shape as the joint PDF f_{X,Y}(x, y), because the normalizing factor f_Y(y) does not depend on x; see Fig. 3.16. Note that the normalization ensures that

∫_{-∞}^{∞} f_{X|Y}(x | y) dx = 1,

so for any fixed y, f_{X|Y}(x | y) is a legitimate PDF.

Figure 3.16: Visualization of the conditional PDF f_{X|Y}(x | y). Let X and Y have a joint PDF which is uniform on the set S. For each fixed y, we consider the joint PDF on the slice Y = y and normalize it so that it integrates to 1.

Example 3.15. Circular Uniform PDF. Ben throws a dart at a circular target of radius r. We assume that he always hits the target, and that all points of impact (x, y) are equally likely, so that the joint PDF of the random variables X and Y is uniform. Since the area of the circle is πr², we have

f_{X,Y}(x, y) = 1/(πr²), if x² + y² ≤ r²;  0, otherwise.

To calculate the conditional PDF f_{X|Y}(x | y), let us first find the marginal PDF f_Y(y). It is zero if |y| > r. For |y| ≤ r, we have

f_Y(y) = ∫_{-∞}^{∞} f_{X,Y}(x, y) dx = (1/(πr²)) ∫_{x² + y² ≤ r²} dx = (2/(πr²)) √(r² - y²).

Note that the marginal PDF f_Y is not uniform. The conditional PDF is

f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y) = 1/(2√(r² - y²)), if x² + y² ≤ r².

Thus, for a fixed value of y with |y| < r, the conditional PDF f_{X|Y}(x | y) is uniform; it is not defined if |y| > r.
To interpret the conditional PDF, let us fix some small positive numbers δ₁ and δ₂, and condition on the event B = {y ≤ Y ≤ y + δ₂}. We have

P(x ≤ X ≤ x + δ₁ | y ≤ Y ≤ y + δ₂) = P(x ≤ X ≤ x + δ₁ and y ≤ Y ≤ y + δ₂)/P(y ≤ Y ≤ y + δ₂) ≈ (f_{X,Y}(x, y) δ₁ δ₂)/(f_Y(y) δ₂) = f_{X|Y}(x | y) δ₁.
In words, fxlY(x I y)81 provides us with the probability that X belongs to a small interval [x, x + 8J], given that Y belongs to a small interval [y, y + 82], Since fxlY(x I y)81 does not depend on 82, we can think of the limiting case where 82 decreases to zero and write
P(x <X< x + 81 I Y = y) � fxlY(x I y)81, (81 small) ,
and, more generally,
P(X ∈ A | Y = y) = ∫_A fX|Y(x | y) dx.
Conditional probabilities, given the zero probability event {Y = y}, were left
undefined in Chapter 1. But the above formula provides a natural way of defining such conditional probabilities in the present context. In addition, it allows us to
view the conditional PDF fX|Y(x | y) (as a function of x) as a description of the probability law of X, given that the event {Y = y} has occurred. As in the discrete case, the conditional PDF fX|Y, together with the marginal PDF fY, is sometimes used to calculate the joint PDF. Furthermore, this approach can also be used for modeling: instead of directly specifying fX,Y, it is often natural to provide a probability law for Y, in terms of a PDF fY, and then provide a conditional PDF fX|Y(x | y) for X, given any possible value y of Y.
Example 3. 16. The speed of a typical vehicle that drives past a police radar is modeled as an exponentially distributed random variable X with mean 50 miles per hour. The police radar's measurement Y of the vehicle's speed has an error which is modeled as a normal random variable with zero mean and standard deviation equal to one tenth of the vehicle's speed. What is the joint PDF of X and Y?
We have fX(x) = (1/50)e^{−x/50}, for x ≥ 0. Also, conditioned on X = x, the measurement Y has a normal PDF with mean x and variance x²/100. Therefore,

fY|X(y | x) = (1/(√(2π)(x/10))) e^{−(y−x)²/(2x²/100)}.

Thus, for all x ≥ 0 and all y,

fX,Y(x, y) = fX(x) fY|X(y | x) = (1/50)e^{−x/50} · (10/(√(2π)x)) e^{−50(y−x)²/x²}.
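Because the measurement error has zero mean for every value of X, the radar is unbiased: E[Y] = E[E[Y | X]] = E[X] = 50. A quick Monte Carlo sketch of the model (names and sample size are ours):

```python
import random

rng = random.Random(2)
n = 200_000
speeds = [rng.expovariate(1.0 / 50) for _ in range(n)]   # X ~ exponential, mean 50
# Y | X = x is normal with mean x and standard deviation x/10
readings = [x + rng.gauss(0.0, x / 10) for x in speeds]
mean_x = sum(speeds) / n
mean_y = sum(readings) / n
```

Both empirical means land close to 50 miles per hour, confirming that the noise does not bias the measurement.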
Conditional PDF Given a Random Variable

Let X and Y be jointly continuous random variables with joint PDF fX,Y.

• The joint, marginal, and conditional PDFs are related to each other by the formulas

fX,Y(x, y) = fY(y) fX|Y(x | y),

fX(x) = ∫_{-∞}^{∞} fY(y) fX|Y(x | y) dy.

The conditional PDF fX|Y(x | y) is defined only for those y for which fY(y) > 0.

• We have

P(X ∈ A | Y = y) = ∫_A fX|Y(x | y) dx.
For the case of more than two random variables, there are natural extensions of the above. For example, we can define conditional PDFs by formulas such as

fX,Y|Z(x, y | z) = fX,Y,Z(x, y, z) / fZ(z), if fZ(z) > 0,

fX|Y,Z(x | y, z) = fX,Y,Z(x, y, z) / fY,Z(y, z), if fY,Z(y, z) > 0.

There is also an analog of the multiplication rule,

fX,Y,Z(x, y, z) = fX|Y,Z(x | y, z) fY|Z(y | z) fZ(z),
and of other formulas developed in this section.

Conditional Expectation

For a continuous random variable X, we define its conditional expectation E[X | A] given an event A, similarly to the unconditional case, except that we now need to use the conditional PDF fX|A. The conditional expectation E[X | Y = y] is defined similarly, in terms of the conditional PDF fX|Y. Various familiar properties of expectations carry over to the present context and are summarized below. We note that all formulas are analogous to corresponding formulas for the case of discrete random variables, except that sums are replaced by integrals, and PMFs are replaced by PDFs.
Summary of Facts About Conditional Expectations

Let X and Y be jointly continuous random variables, and let A be an event with P(A) > 0.

• Definitions: The conditional expectation of X given the event A is defined by

E[X | A] = ∫_{-∞}^{∞} x fX|A(x) dx.

The conditional expectation of X given that Y = y is defined by

E[X | Y = y] = ∫_{-∞}^{∞} x fX|Y(x | y) dx.

• The expected value rule: For a function g(X), we have

E[g(X) | A] = ∫_{-∞}^{∞} g(x) fX|A(x) dx,

and

E[g(X) | Y = y] = ∫_{-∞}^{∞} g(x) fX|Y(x | y) dx.

• Total expectation theorem: Let A1, A2, ..., An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i. Then,

E[X] = Σ_{i=1}^{n} P(Ai) E[X | Ai].

Similarly,

E[X] = ∫_{-∞}^{∞} E[X | Y = y] fY(y) dy.
There are natural analogs for the case of functions of several random variables. For example,

E[g(X, Y) | Y = y] = ∫ g(x, y) fX|Y(x | y) dx,

and

E[g(X, Y)] = ∫ E[g(X, Y) | Y = y] fY(y) dy.
The expected value rule is established in the same manner as for the case of unconditional expectations. To justify the first version of the total expectation
theorem, we start with the total probability theorem

fX(x) = Σ_{i=1}^{n} P(Ai) fX|Ai(x),

multiply both sides by x, and then integrate from −∞ to ∞.
To justify the second version of the total expectation theorem, we observe that

∫_{-∞}^{∞} E[X | Y = y] fY(y) dy = ∫_{-∞}^{∞} [ ∫_{-∞}^{∞} x fX|Y(x | y) dx ] fY(y) dy
= ∫_{-∞}^{∞} ∫_{-∞}^{∞} x fX|Y(x | y) fY(y) dx dy
= ∫_{-∞}^{∞} ∫_{-∞}^{∞} x fX,Y(x, y) dx dy
= ∫_{-∞}^{∞} x [ ∫_{-∞}^{∞} fX,Y(x, y) dy ] dx
= ∫_{-∞}^{∞} x fX(x) dx
= E[X].
The total expectation theorem can often facilitate the calculation of the mean, variance, and other moments of a random variable, using a divide-and-conquer approach.
Example 3.17. Mean and Variance of a Piecewise Constant PDF. Suppose that the random variable X has the piecewise constant PDF

fX(x) = 1/3, if 0 ≤ x ≤ 1,
fX(x) = 2/3, if 1 < x ≤ 2,
fX(x) = 0, otherwise

(see Fig. 3.18). Consider the events

A1 = {X lies in the first interval [0, 1]},
A2 = {X lies in the second interval (1, 2]}.

We have from the given PDF,

P(A1) = ∫_0^1 fX(x) dx = 1/3,  P(A2) = ∫_1^2 fX(x) dx = 2/3.

Furthermore, the conditional mean and second moment of X, conditioned on A1 and A2, are easily calculated since the corresponding conditional PDFs fX|A1 and fX|A2 are uniform over the intervals [0, 1] and (1, 2], respectively:

E[X | A1] = 1/2,  E[X | A2] = 3/2,
E[X² | A1] = 1/3,  E[X² | A2] = 7/3.

We now use the total expectation theorem:

E[X] = P(A1) E[X | A1] + P(A2) E[X | A2] = (1/3)·(1/2) + (2/3)·(3/2) = 7/6,
E[X²] = P(A1) E[X² | A1] + P(A2) E[X² | A2] = (1/3)·(1/3) + (2/3)·(7/3) = 5/3.

The variance is then given by

var(X) = E[X²] − (E[X])² = 5/3 − 49/36 = 11/36.
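The arithmetic of this example can be reproduced with exact rational arithmetic; the sketch below simply restates the divide-and-conquer computation.

```python
from fractions import Fraction as F

p1, p2 = F(1, 3), F(2, 3)        # P(A1), P(A2)
m1, m2 = F(1, 2), F(3, 2)        # E[X | A1], E[X | A2]: means of the uniform pieces
s1, s2 = F(1, 3), F(7, 3)        # E[X^2 | A1], E[X^2 | A2]

mean = p1 * m1 + p2 * m2         # total expectation theorem
second = p1 * s1 + p2 * s2
variance = second - mean ** 2
```

Using fractions rather than floats keeps the answers 7/6 and 11/36 exact.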
Independence

In analogy with the discrete case, we say that two jointly continuous random variables X and Y are independent if their joint PDF is the product of the marginal PDFs:

fX,Y(x, y) = fX(x) fY(y), for all x, y.

Example 3.18. Independent Normal Random Variables. Let X and Y be independent normal random variables with means μx, μy, and variances σx², σy², respectively. Their joint PDF is of the form

fX,Y(x, y) = fX(x) fY(y) = (1/(2π σx σy)) exp{ −(x − μx)²/(2σx²) − (y − μy)²/(2σy²) }.

The sets of points where this PDF takes a constant value, i.e., its contours, are described by the equation

(x − μx)²/σx² + (y − μy)²/σy² = constant,

and are ellipses whose two axes are horizontal and vertical.

Figure 3.19: Contours of the joint PDF of two independent normal random variables X and Y with means μx, μy, and variances σx², σy².
If X and Y are independent, then any two events of the form {X ∈ A} and {Y ∈ B} are independent. Indeed,

P(X ∈ A and Y ∈ B) = ∫_{x∈A} ∫_{y∈B} fX,Y(x, y) dy dx
= ∫_{x∈A} ∫_{y∈B} fX(x) fY(y) dy dx
= ∫_{x∈A} fX(x) dx ∫_{y∈B} fY(y) dy
= P(X ∈ A) P(Y ∈ B).

In particular, independence implies that

FX,Y(x, y) = P(X ≤ x, Y ≤ y) = P(X ≤ x) P(Y ≤ y) = FX(x) FY(y).
The converse of these statements is also true; see the end-of-chapter problems. The property

FX,Y(x, y) = FX(x) FY(y), for all x, y,

can be used to provide a general definition of independence between two random variables, e.g., if X is discrete and Y is continuous.
An argument similar to the discrete case shows that if X and Y are independent, then

E[g(X) h(Y)] = E[g(X)] E[h(Y)],

for any two functions g and h. Finally, the variance of the sum of independent random variables is equal to the sum of their variances.
Independence of Continuous Random Variables

Let X and Y be jointly continuous random variables.

• X and Y are independent if

fX,Y(x, y) = fX(x) fY(y), for all x, y.

• If X and Y are independent, then

E[XY] = E[X] E[Y].

Furthermore, for any functions g and h, the random variables g(X) and h(Y) are independent, and we have

E[g(X) h(Y)] = E[g(X)] E[h(Y)].

• If X and Y are independent, then

var(X + Y) = var(X) + var(Y).
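These independence facts are easy to check by simulation. In the sketch below (the distribution choices are ours), X is uniform on [0, 1] and Y is exponential with parameter 1, drawn independently.

```python
import random

rng = random.Random(3)
n = 200_000
xs = [rng.uniform(0.0, 1.0) for _ in range(n)]   # X ~ Uniform[0, 1]
ys = [rng.expovariate(1.0) for _ in range(n)]    # Y ~ Exponential(1), independent

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)

e_xy = mean([x * y for x, y in zip(xs, ys)])     # close to E[X] E[Y] = 1/2
var_sum = var([x + y for x, y in zip(xs, ys)])   # close to 1/12 + 1
```

The empirical values match E[X]E[Y] and var(X) + var(Y) up to sampling error, as the box predicts.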
3.6 THE CONTINUOUS BAYES' RULE

In many situations, we represent an unobserved phenomenon by a random variable X with PDF fX, and we make a noisy measurement Y, which is modeled in terms of a conditional PDF fY|X. Once the experimental value of Y is observed, what information does it provide on the unknown value of X?

This setting is similar to that of Section 1.4, where we introduced Bayes' rule and applied it to infer unobserved causes from observed effects. The only difference is that we are now dealing with continuous random variables.

Figure: Schematic description of the inference problem. We have an unobserved random variable X with known PDF fX, and we obtain a measurement Y according to a conditional PDF fY|X. Given an observed value y of Y, the inference problem is to evaluate the conditional PDF fX|Y(x | y).

All the information provided by the event {Y = y} is captured by the conditional PDF fX|Y(x | y). It thus suffices to evaluate this PDF. From the formulas fX|Y fY = fX,Y = fX fY|X, it follows that

fX|Y(x | y) = fX(x) fY|X(y | x) / fY(y).

Based on the normalization property ∫ fX|Y(x | y) dx = 1, an equivalent expression is

fX|Y(x | y) = fX(x) fY|X(y | x) / ∫_{-∞}^{∞} fX(t) fY|X(y | t) dt.
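When the denominator integral has no closed form, the continuous Bayes' rule is often evaluated numerically on a grid. The following Python sketch (the uniform prior on [0, 1] and the normal measurement with σ = 0.5 are our toy choices) normalizes the products fX(x) fY|X(y | x) by the same Riemann sum that approximates the denominator.

```python
import math

def bayes_posterior(y, prior, lik, grid):
    """Grid approximation of f_{X|Y}(x | y) = prior(x) lik(y, x) / integral."""
    h = grid[1] - grid[0]
    w = [prior(x) * lik(y, x) for x in grid]
    z = sum(w) * h                       # approximates the denominator integral
    return [wi / z for wi in w]

def prior(x):                            # toy prior: X uniform on [0, 1]
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def lik(y, x):                           # Y | X = x  ~  Normal(x, 0.5^2)
    return math.exp(-(y - x) ** 2 / 0.5) / math.sqrt(2 * math.pi * 0.25)

grid = [i / 2000 for i in range(2001)]
post = bayes_posterior(1.0, prior, lik, grid)
h = grid[1] - grid[0]
post_mean = sum(x * p for x, p in zip(grid, post)) * h
```

Observing y = 1 at the top of the prior's range pulls the posterior mean above the prior mean of 1/2.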
Sec. 3.6 The Continuous Bayes ' Rule 179
Example 3.19. A light bulb produced by the General Illumination Company is known to have an exponentially distributed lifetime Y. However, the company has been experiencing quality control problems. On any given day, the parameter λ of the PDF of Y is actually a random variable, uniformly distributed in the interval [1, 3/2]. We test a light bulb and record its lifetime. What can we say about the underlying parameter λ?

We model the parameter λ in terms of a uniform random variable Λ with PDF

fΛ(λ) = 2, for 1 ≤ λ ≤ 3/2.

The available information about Λ is captured by the conditional PDF fΛ|Y(λ | y), which, using the continuous Bayes' rule, is given by

fΛ|Y(λ | y) = fΛ(λ) fY|Λ(y | λ) / ∫_1^{3/2} fΛ(t) fY|Λ(y | t) dt = 2λ e^{−λy} / ∫_1^{3/2} 2t e^{−ty} dt, for 1 ≤ λ ≤ 3/2.
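The posterior in this example is easy to evaluate numerically. In the sketch below (the grid size is our choice), the weights 2λe^{−λy} are normalized by a midpoint-rule approximation of the denominator; a long observed lifetime y should pull the posterior toward small values of λ.

```python
import math

def lambda_posterior(y, n=2000):
    """Grid values of f_{Lambda|Y}(lam | y) over [1, 3/2] for Example 3.19."""
    h = 0.5 / n
    lams = [1.0 + (i + 0.5) * h for i in range(n)]       # midpoints of the grid
    w = [2.0 * l * math.exp(-l * y) for l in lams]
    z = sum(w) * h                       # approximates the denominator integral
    return lams, [wi / z for wi in w], h

lams, post, h = lambda_posterior(10.0)
post_mean = sum(l * p for l, p in zip(lams, post)) * h
```

For y = 10 the posterior density is largest near λ = 1, and the posterior mean drops well below the prior mean of 5/4.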
Inference about a Discrete Random Variable

In some cases, the unobserved phenomenon is inherently discrete. For some examples, consider a binary signal which is observed in the presence of normally distributed noise, or a medical diagnosis that is made on the basis of continuous measurements such as temperature and blood counts. In such cases, a somewhat different version of Bayes' rule applies.
We first consider the case where the unobserved phenomenon is described in terms of an event A whose occurrence is unknown. Let P(A) be the probability of event A. Let Y be a continuous random variable, and assume that the conditional PDFs fY|A(y) and fY|Ac(y) are known. We are interested in the conditional probability P(A | Y = y) of the event A, given the value y of Y.

Instead of working with the conditioning event {Y = y}, which has zero probability, let us condition on the event {y ≤ Y ≤ y + δ}, where δ is a small positive number, and then take the limit as δ tends to zero. We have, using Bayes' rule, and assuming that fY(y) > 0,

P(A | Y = y) ≈ P(A | y ≤ Y ≤ y + δ)
= P(A) P(y ≤ Y ≤ y + δ | A) / P(y ≤ Y ≤ y + δ)
≈ P(A) fY|A(y) δ / (fY(y) δ)
= P(A) fY|A(y) / fY(y).
The denominator can be evaluated using the following version of the total probability theorem:

fY(y) = P(A) fY|A(y) + P(Ac) fY|Ac(y),

so that

P(A | Y = y) = P(A) fY|A(y) / ( P(A) fY|A(y) + P(Ac) fY|Ac(y) ).
In a variant of this formula, we consider an event A of the form {N = n}, where N is a discrete random variable that represents the different discrete possibilities for the unobserved phenomenon of interest. Let pN be the PMF of N. Let also Y be a continuous random variable which, for any given value n of N, is described by a conditional PDF fY|N(y | n). The above formula becomes

P(N = n | Y = y) = pN(n) fY|N(y | n) / fY(y).

The denominator can be evaluated using the following version of the total probability theorem:

fY(y) = Σ_i pN(i) fY|N(y | i),

so that

P(N = n | Y = y) = pN(n) fY|N(y | n) / Σ_i pN(i) fY|N(y | i).
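This discrete-posterior formula is a one-liner in code. In the sketch below (the two-point prior and the unit-variance normal likelihood are our toy choices), the normalizing sum plays the role of fY(y).

```python
import math

def posterior_pmf(y, p_n, lik):
    """P(N = n | Y = y) = p_N(n) f_{Y|N}(y | n) / sum_i p_N(i) f_{Y|N}(y | i)."""
    w = {n: p * lik(y, n) for n, p in p_n.items()}
    z = sum(w.values())                  # plays the role of f_Y(y)
    return {n: wi / z for n, wi in w.items()}

def normal_lik(y, n):                    # Y | N = n  ~  Normal(n, 1)
    return math.exp(-(y - n) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

p_n = {0: 0.7, 1: 0.3}                   # prior PMF of N
post = posterior_pmf(2.0, p_n, normal_lik)   # y = 2 lies closer to mean 1
```

Since the observation y = 2 is more plausible under N = 1, the posterior probability of N = 1 rises above its prior value of 0.3.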
Example 3.20. Signal Detection. A binary signal S is transmitted, and we are given that P(S = 1) = p and P(S = −1) = 1 − p. The received signal is Y = N + S, where N is normal noise, with zero mean and unit variance, independent of S. What is the probability that S = 1, as a function of the observed value y of Y?

Conditioned on S = s, the random variable Y has a normal distribution with mean s and unit variance. Applying the formulas given above, we obtain

P(S = 1 | Y = y) = pS(1) fY|S(y | 1) / fY(y)
= (p/√(2π)) e^{−(y−1)²/2} / ( (p/√(2π)) e^{−(y−1)²/2} + ((1−p)/√(2π)) e^{−(y+1)²/2} ),

which simplifies to

P(S = 1 | Y = y) = p e^y / ( p e^y + (1 − p) e^{−y} ).

Note that the probability P(S = 1 | Y = y) goes to zero as y decreases to −∞, goes to 1 as y increases to ∞, and is monotonically increasing in between, which is consistent with intuition.
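The algebraic simplification above can be double-checked numerically; in the sketch below the common factor 1/√(2π) is kept in the direct form even though it cancels.

```python
import math

def p_s1_direct(y, p):
    """P(S = 1 | Y = y) computed straight from the two normal densities."""
    a = p * math.exp(-(y - 1) ** 2 / 2) / math.sqrt(2 * math.pi)
    b = (1 - p) * math.exp(-(y + 1) ** 2 / 2) / math.sqrt(2 * math.pi)
    return a / (a + b)

def p_s1_simplified(y, p):
    """The simplified form p e^y / (p e^y + (1 - p) e^{-y})."""
    return p * math.exp(y) / (p * math.exp(y) + (1 - p) * math.exp(-y))
```

Both agree to machine precision, and the simplified form makes the limits as y → ±∞ and the monotonicity immediate.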
Inference Based on Discrete Observations
We finally note that our earlier formula expressing P(A | Y = y) in terms of fY|A(y) can be turned around to yield

fY|A(y) = fY(y) P(A | Y = y) / P(A).

Based on the normalization property ∫_{-∞}^{∞} fY|A(y) dy = 1, an equivalent expression is

fY|A(y) = fY(y) P(A | Y = y) / ∫_{-∞}^{∞} fY(t) P(A | Y = t) dt.

This formula can be used to make an inference about a random variable when an event A is observed. There is a similar formula for the case where the event A is of the form {N = n}, where N is an observed discrete random variable that depends on Y in a manner described by a conditional PMF pN|Y(n | y).
Bayes' Rule for Continuous Random Variables

Let Y be a continuous random variable.

• If X is a continuous random variable, we have

fY(y) fX|Y(x | y) = fX(x) fY|X(y | x),

and

fX|Y(x | y) = fX(x) fY|X(y | x) / fY(y) = fX(x) fY|X(y | x) / ∫_{-∞}^{∞} fX(t) fY|X(y | t) dt.

• If N is a discrete random variable, we have

fY(y) P(N = n | Y = y) = pN(n) fY|N(y | n),

resulting in the formulas

P(N = n | Y = y) = pN(n) fY|N(y | n) / fY(y),

and

fY|N(y | n) = fY(y) P(N = n | Y = y) / pN(n) = fY(y) P(N = n | Y = y) / ∫_{-∞}^{∞} fY(t) P(N = n | Y = t) dt.

There are similar formulas for P(A | Y = y) and fY|A(y).
3.7 SUMMARY AND DISCUSSION

Continuous random variables are characterized by PDFs, which are used to calculate event probabilities. This is similar to the use of PMFs for the discrete case, except that now we need to integrate instead of summing. Joint PDFs are similar to joint PMFs and are used to determine the probability of events that are defined in terms of multiple random variables. Furthermore, conditional PDFs are similar to conditional PMFs and are used to calculate conditional probabilities, given the value of the conditioning random variable. An important application is in problems of inference, using various forms of Bayes' rule that were developed in this chapter.
There are several special continuous random variables which frequently arise in probabilistic models. We introduced some of them, and derived their mean and variance. A summary is provided in the table that follows.
Summary of Results for Special Random Variables

Continuous Uniform Over [a, b]:

fX(x) = 1/(b − a), if a ≤ x ≤ b, and 0 otherwise,

E[X] = (a + b)/2,  var(X) = (b − a)²/12.

Exponential with Parameter λ:

fX(x) = λe^{−λx}, if x ≥ 0, and 0 otherwise,

FX(x) = 1 − e^{−λx}, if x ≥ 0, and 0 otherwise,

E[X] = 1/λ,  var(X) = 1/λ².

Normal with Parameters μ and σ² > 0:

fX(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)},

E[X] = μ,  var(X) = σ².
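The tabulated means and variances can be verified numerically; the sketch below integrates x fX(x) and x² fX(x) for an exponential with λ = 2 by the midpoint rule (the grid size and truncation point are our choices).

```python
import math

lam = 2.0
n, top = 200_000, 25.0                   # the exponential tail beyond 25 is negligible
h = top / n
mean = 0.0
second = 0.0
for i in range(n):
    x = (i + 0.5) * h                    # midpoint rule
    fx = lam * math.exp(-lam * x)
    mean += x * fx * h
    second += x * x * fx * h
variance = second - mean * mean
```

The numerical values reproduce E[X] = 1/λ = 0.5 and var(X) = 1/λ² = 0.25 to high accuracy.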
We have also introduced CDFs, which can be used to characterize general random variables that are neither discrete nor continuous. CDFs are related to PMFs and PDFs, but are more general. For a discrete random variable, we can obtain the PMF by differencing the CDF; for a continuous random variable, we can obtain the PDF by differentiating the CDF.
PROBLEMS
SECTION 3.1. Continuous Random Variables and PDFs

Problem 1. Let X be uniformly distributed in the unit interval [0, 1]. Consider the random variable Y = g(X), where

g(x) = 1, if x ≤ 1/3,
g(x) = 2, if x > 1/3.

Find the expected value of Y by first deriving its PMF. Verify the result using the expected value rule.
Problem 2. Laplace random variable. Let X have the PDF

fX(x) = (λ/2) e^{−λ|x|},

where λ is a positive scalar. Verify that fX satisfies the normalization condition, and evaluate the mean and variance of X.
Problem 3.* Show that the expected value of a discrete or continuous random variable X satisfies

E[X] = ∫_0^∞ P(X > x) dx − ∫_0^∞ P(X < −x) dx.

Solution. Suppose that X is continuous. We then have

∫_0^∞ P(X > x) dx = ∫_0^∞ ∫_x^∞ fX(y) dy dx = ∫_0^∞ ∫_0^y fX(y) dx dy = ∫_0^∞ y fX(y) dy,

where for the second equality we have reversed the order of integration by writing the set {(x, y) | 0 ≤ x < ∞, x ≤ y < ∞} as {(x, y) | 0 ≤ x ≤ y, 0 ≤ y < ∞}. Similarly, we can show that

∫_0^∞ P(X < −x) dx = −∫_{-∞}^0 y fX(y) dy.

Combining the two relations above, we obtain the desired result.

If X is discrete, we have

∫_0^∞ P(X > x) dx = ∫_0^∞ Σ_{y > x} pX(y) dx = Σ_{y > 0} pX(y) ∫_0^y dx = Σ_{y > 0} y pX(y),

and the rest of the argument is similar to the continuous case.
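The tail-probability formula can be sanity-checked on a concrete case. For X uniform on [−1, 2] (our choice), P(X > x) = (2 − x)/3 on [0, 2] and P(X < −x) = (1 − x)/3 on [0, 1], and the two tail integrals should combine to E[X] = 1/2.

```python
def p_gt(x):
    """P(X > x) for X uniform on [-1, 2], evaluated at x >= 0."""
    return min(1.0, max(0.0, (2.0 - x) / 3.0))

def p_lt_neg(x):
    """P(X < -x) for X uniform on [-1, 2], evaluated at x >= 0."""
    return max(0.0, (1.0 - x) / 3.0)

n, top = 100_000, 4.0                    # both tails vanish beyond x = 2
h = top / n
expected = sum(p_gt((i + 0.5) * h) - p_lt_neg((i + 0.5) * h) for i in range(n)) * h
```

The midpoint-rule value of the difference of tail integrals agrees with the mean (2 − 1)/2 = 1/2 of the uniform distribution.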
Problem 4.* Establish the validity of the expected value rule

E[g(X)] = ∫_{-∞}^{∞} g(x) fX(x) dx,

where X is a continuous random variable with PDF fX.