14.29 Laws of large numbers
In connection with coin-tossing problems, it is often said that the probability of tossing heads with a perfectly balanced coin is $\tfrac{1}{2}$. This does not mean that if a coin is tossed twice it will necessarily come up heads exactly once. Nor does it mean that in 1000 tosses heads will appear exactly 500 times. Let us denote by $h(n)$ the number of heads that occur in $n$ tosses. Experience shows that even for very large $n$ the ratio $h(n)/n$ is not necessarily $\tfrac{1}{2}$. However, experience also shows that this ratio does seem to approach $\tfrac{1}{2}$ as $n$ increases, although it may oscillate considerably above and below $\tfrac{1}{2}$ in the process. This suggests that it might be possible to prove that
$$\lim_{n\to\infty} \frac{h(n)}{n} = \frac{1}{2}. \tag{14.45}$$
Unfortunately, this cannot be done. One difficulty is that the number $h(n)$ depends not only on $n$ but also on the particular experiment being performed. We have no way of knowing in advance how $h(n)$ will vary from one experiment to another. But the real trouble is that it is possible (although not very likely) that in some particular experiment the ratio $h(n)/n$ may not tend to $\tfrac{1}{2}$ at all. For example, there is no reason to exclude the possibility of getting heads on every toss of the coin, in which case $h(n) = n$ and $h(n)/n \to 1$. Therefore, instead of trying to prove the formula in (14.45), we shall find it more reasonable (and more profitable) to ask how likely it is that $h(n)/n$ will differ from $\tfrac{1}{2}$ by a certain amount. In other words, given some positive number $\varepsilon$, we seek the probability
$$P\left(\left|\frac{h(n)}{n} - \frac{1}{2}\right| > \varepsilon\right).$$
By introducing a suitable random variable and using Chebyshev's inequality we can get a useful upper bound to this probability, a bound which does not require an explicit knowledge of $h(n)$. This leads to a new limit relation that serves as an appropriate substitute for (14.45).
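The behavior described above is easy to observe numerically. The following sketch (not part of the text; the function name is my own) simulates tossing a fair coin $n$ times and reports the ratio $h(n)/n$, which hovers near but is rarely exactly $\tfrac{1}{2}$:

```python
import random

def relative_frequency_of_heads(n, seed=0):
    """Toss a fair coin n times and return h(n)/n, the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.randint(0, 1) for _ in range(n))  # 1 = heads, 0 = tails
    return heads / n

# The ratio fluctuates for small n but settles near 1/2 as n grows.
for n in (10, 1000, 100000):
    print(n, relative_frequency_of_heads(n))
```

A fixed seed is used only so the run is reproducible; any seed exhibits the same qualitative behavior.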
No extra effort is required to treat the more general case of a Bernoullian sequence of $n$ trials, in which the probability of "success" is $p$ and the probability of "failure" is $q = 1 - p$. (In coin tossing, "success" can mean "heads" and for $p$ we may take $\tfrac{1}{2}$.) Let $X$ denote the random variable which counts the number of successes in $n$ independent trials. Then $X$ has a binomial distribution with expectation $E(X) = np$ and variance $\operatorname{Var}(X) = npq$. Hence Chebyshev's inequality is applicable; it states that
$$P(|X - np| > c) \le \frac{npq}{c^2}. \tag{14.46}$$
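Inequality (14.46) can be checked numerically. This sketch (my own code, not the book's) computes the exact tail probability $P(|X - np| > c)$ from the binomial mass function and compares it with the Chebyshev bound $npq/c^2$:

```python
from math import comb

def binomial_tail(n, p, c):
    """Exact P(|X - np| > c) for X binomial(n, p)."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1) if abs(k - n * p) > c)

n, p, c = 100, 0.5, 10
exact = binomial_tail(n, p, c)
bound = n * p * (1 - p) / c**2   # npq / c^2
print(exact, bound)
assert exact <= bound            # Chebyshev's inequality holds
```

The bound is far from sharp (here it is roughly seven times the exact tail), but its virtue is that it holds for every distribution with the given mean and variance.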
Since we are interested in the ratio $X/n$, which we may call the relative frequency of success, we divide the inequality $|X - np| > c$ by $n$ and rewrite (14.46) as
$$P\left(\left|\frac{X}{n} - p\right| > \frac{c}{n}\right) \le \frac{npq}{c^2}. \tag{14.47}$$
Since this is valid for every $c > 0$, we may let $c$ depend on $n$ and write $c = \varepsilon n$, where $\varepsilon$ is a fixed positive number. Then (14.47) becomes
$$P\left(\left|\frac{X}{n} - p\right| > \varepsilon\right) \le \frac{pq}{n\varepsilon^2}.$$
The appearance of $n$ in the denominator on the right suggests that we let $n \to \infty$. This leads to the limit formula
$$\lim_{n\to\infty} P\left(\left|\frac{X}{n} - p\right| > \varepsilon\right) = 0 \quad\text{for every fixed } \varepsilon > 0, \tag{14.48}$$
called the law of large numbers for the Bernoulli distribution. It tells us that, given any $\varepsilon > 0$ (no matter how small), the probability that the relative frequency of success differs from $p$ by more than $\varepsilon$ is a function of $n$ which tends to 0 as $n \to \infty$. This limit relation gives a mathematical justification to the assignment of the probability $\tfrac{1}{2}$ for tossing heads with a perfectly balanced coin.
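To watch the limit (14.48) emerge, one can tabulate the exact probability $P(|X/n - p| > \varepsilon)$ alongside the bound $pq/(n\varepsilon^2)$ for increasing $n$ (again a sketch with my own helper name, not code from the text):

```python
from math import comb

def tail(n, p, eps):
    """Exact P(|X/n - p| > eps) for X binomial(n, p)."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1) if abs(k / n - p) > eps)

p, eps = 0.5, 0.05
for n in (10, 100, 1000):
    # Exact tail probability versus the Chebyshev bound pq/(n eps^2).
    print(n, tail(n, p, eps), p * (1 - p) / (n * eps**2))
```

Both columns tend to 0 as $n$ grows; the exact tail does so much faster than the bound, which is all Chebyshev's inequality guarantees.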
The limit relation in (14.48) is a special case of a more general result in which the "relative frequency" $X/n$ is replaced by the arithmetic mean of $n$ independent random variables having the same expectation and variance. This more general theorem is usually referred to as the weak law of large numbers; it may be stated as follows:

THEOREM 14.12. WEAK LAW OF LARGE NUMBERS. Let $X_1, \ldots, X_n$ be $n$ independent random variables, each having the same expectation and the same variance, say
$$E(X_k) = m \quad\text{and}\quad \operatorname{Var}(X_k) = \sigma^2 \quad\text{for } k = 1, 2, \ldots, n.$$
Define a new random variable $\bar{X}$ (called the arithmetic mean of $X_1, \ldots, X_n$) by the equation
$$\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n).$$
Then, for every $\varepsilon > 0$, we have
$$P(|\bar{X} - m| > \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2}, \tag{14.49}$$
and hence
$$\lim_{n\to\infty} P(|\bar{X} - m| > \varepsilon) = 0. \tag{14.50}$$
An equivalent statement is
$$\lim_{n\to\infty} P(|\bar{X} - m| \le \varepsilon) = 1.$$
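The theorem applies to any common distribution, not just coin tosses. As an illustration outside the text, the sketch below computes the exact distribution of the mean of $n$ fair dice by repeated convolution and checks the tail probability against the bound $\sigma^2/(n\varepsilon^2)$, where $m = 3.5$ and $\sigma^2 = 35/12$ for one die (the function name is my own):

```python
from fractions import Fraction

def mean_tail_for_dice(n, eps):
    """Exact P(|mean of n fair dice - 3.5| > eps), via convolution of the sum."""
    die = {k: Fraction(1, 6) for k in range(1, 7)}
    dist = {0: Fraction(1)}          # distribution of the running sum
    for _ in range(n):
        new = {}
        for s, ps in dist.items():
            for face, pf in die.items():
                new[s + face] = new.get(s + face, Fraction(0)) + ps * pf
        dist = new
    return float(sum(p for s, p in dist.items() if abs(s / n - 3.5) > eps))

for n in (10, 40):
    eps = 0.5
    # Exact tail versus the Chebyshev bound sigma^2/(n eps^2).
    print(n, mean_tail_for_dice(n, eps), (35 / 12) / (n * eps**2))
```

Exact rational arithmetic (`Fraction`) avoids any floating-point doubt about whether the inequality (14.49) really holds in the computed cases.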
Proof. We apply Chebyshev's inequality to $\bar{X}$. For this we need to know the expectation and variance of $\bar{X}$. These are
$$E(\bar{X}) = m \quad\text{and}\quad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$
(See Exercise 5 in Section 14.27.) Chebyshev's inequality becomes
$$P(|\bar{X} - m| > c) \le \frac{\sigma^2}{nc^2}.$$
Replacing $c$ by $\varepsilon$ we obtain (14.49), and letting $n \to \infty$ then gives (14.50).

To show that the limit relation in (14.48) is a special case of Theorem 14.12, we assume each $X_k$ has the possible values 0 and 1, with probabilities $P(X_k = 1) = p$ and $P(X_k = 0) = 1 - p$. Then $\bar{X}$ is the relative frequency of success in $n$ independent trials, $E(\bar{X}) = p$, $\operatorname{Var}(X_k) = pq$, and (14.49) reduces to (14.48).
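The two facts the proof rests on, $E(\bar{X}) = m$ and $\operatorname{Var}(\bar{X}) = \sigma^2/n$, can be sanity-checked by simulation. The sketch below (my own code and names, not from the text) estimates both quantities for the mean of $n$ fair dice, where $m = 3.5$ and $\sigma^2/n = (35/12)/n$:

```python
import random

def sample_mean_stats(n, trials, seed=1):
    """Estimate E and Var of the mean of n fair dice from repeated experiments."""
    rng = random.Random(seed)
    means = [sum(rng.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]
    e = sum(means) / trials
    v = sum((x - e) ** 2 for x in means) / trials
    return e, v

e, v = sample_mean_stats(n=30, trials=20000)
# Estimates should land close to m = 3.5 and sigma^2/n = (35/12)/30.
print(e, v)
```

With 20,000 repetitions the estimates agree with the theoretical values to within ordinary sampling error.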
Theorem 14.12 is called a weak law because there is also a strong law of large numbers which (under the same hypotheses) states that
$$P\left(\lim_{n\to\infty} \bar{X} = m\right) = 1. \tag{14.51}$$
The principal difference between (14.51) and (14.50) is that the operations "limit" and "probability" are interchanged. It can be shown that the strong law implies the weak law, but not conversely.

Notice that the strong law in (14.51) seems to be closer to formula (14.45) than (14.50) is. In fact, (14.51) says that we have $\lim_{n\to\infty} \bar{X} = m$ "almost always," that is, with probability 1. When applied to coin tossing, in particular, it says that the failure of Equation (14.45) is no more likely than the chance of tossing a fair coin repeatedly and always getting heads. The strong law really shows why probability theory corresponds to experience and to our intuitive feeling of what probability "should be."
The proof of the strong law is lengthy and will be omitted. Proofs appear in the books listed as References 1, 3, 8, and 10 at the end of this chapter.