CHAPTER 2 DESCRIBING PATTERNS IN DATA

CitGrad.dat
a. Construct a stem-and-leaf diagram of “ graduate degree.”
b. Which cities have an unusually large percentage of workers holding graduate degrees?

2.33 Refer to Exercise 2.32. Consider the first ten cities in the first column of the list. Construct back-to-back stem-and-leaf diagrams of these ten cities and the remaining cities. Does there appear to be a difference between the two groups of cities with respect to the percentage of workers with graduate degrees?

(Minitab or similar program recommended)

2.6 THE NORMAL DENSITY FUNCTION

[Diagram: density histogram with axes labeled Density and Variable; the area over the fixed interval is the required proportion.]
Density histograms provide good visual representations of data distributions. But the appearance of a density histogram can change as the number and width of the
class intervals change. The outline of a density histogram, by construction, is not very smooth. Moreover, density histograms are somewhat awkward to use. If we
want to find the proportion (relative frequency) of the data set falling in some fixed interval, we must sum the areas of the vertical bars over the chosen interval. If one
or both of the endpoints of our fixed interval fall within class intervals, as in the diagram shown here, then it is necessary to interpolate to find the required proportion.
When density histograms are symmetric about a single peak and look like the outline of a bell, they can often be closely approximated by a smooth curve known as the normal density function. You may already be familiar with the bell-shaped normal density curve because it often serves as a model for the distribution of examination scores. Areas under a normal density curve can then approximate density histogram relative frequencies.

The advantage of using a single mathematical function, like the normal density function, to represent a distribution of data is that it is always available in a compact form. Histograms of data from a variety of sources may display very similar features. If so, they may all be represented by the same mathematical function. This function can be used to make statements about the size of future measurements and to develop procedures that allow us to generalize from a sample to a population.
[Figure 2.13: Two Normal Density Functions with Different Standard Deviations. Two curves centered at µ, one labeled "Large σ" and one labeled "Small σ".]

The normal density function with mean µ and standard deviation σ will be denoted by N(µ, σ).
The normal density function is usually associated with Pierre Laplace and Carl Gauss, who, working somewhat independently in the 18th and 19th centuries, figured prominently in its development. Gauss, motivated by errors in astronomical measurements, derived the function mathematically as a distribution of errors. He called his error distribution the “normal law of errors.” Subsequent scientists and data collectors in a wide variety of fields found that their histograms exhibited the common feature of first gradually rising in height to a maximum and then decreasing in a symmetric manner. Although there are other functions exhibiting this property, the normal density seemed to “fit” the data in so many real-life situations that many of its proponents believed that if data did not conform to the normal curve, the data collection process must be suspect. In this context, Gauss’s function became known as the normal distribution and the name held.
There are many normal distributions, but all normal curves have the same overall shape. A particular normal distribution is determined once its mean µ and standard deviation σ are specified. The mean is the balancing point of the normal curve; because a normal distribution is symmetric, it is also the median. Changing the value of the mean changes the location of the normal curve on the horizontal axis. The standard deviation measures spread. As the standard deviation decreases, the normal curve becomes more tightly concentrated about its center (mean µ). Two normal density functions with the same mean but different standard deviations are shown in Figure 2.13.
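The effect of the standard deviation on the shape of the curve can be checked numerically. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist` class; the values µ = 0, σ = 0.5, and σ = 2 are illustrative choices and do not come from the text.

```python
from statistics import NormalDist

# Two normal distributions with the same mean but different spreads.
# mu = 0, sigma = 0.5 and sigma = 2 are illustrative values only.
mu = 0.0
narrow = NormalDist(mu, 0.5)   # small sigma
wide = NormalDist(mu, 2.0)     # large sigma

# Height of each density curve at the mean (its peak).
peak_narrow = narrow.pdf(mu)
peak_wide = wide.pdf(mu)

# Area within one unit of the mean under each curve.
area_narrow = narrow.cdf(mu + 1) - narrow.cdf(mu - 1)
area_wide = wide.cdf(mu + 1) - wide.cdf(mu - 1)

print(f"peak height: small sigma {peak_narrow:.4f}, large sigma {peak_wide:.4f}")
print(f"area within 1 unit of the mean: small sigma {area_narrow:.4f}, large sigma {area_wide:.4f}")
```

The curve with the smaller standard deviation has the higher peak and places more of its area close to the mean, which is the sense in which it is more tightly concentrated.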
The two points along the horizontal axis at which the normal curve changes from curving more steeply downward to curving less steeply downward (beginning to flatten out) are located a distance σ on each side of the mean µ. Consequently, it is possible to guess the values of µ and σ from a graph of the normal density function. So if we want to refer to a normal distribution with µ = 4 and σ = 3, we write N(4, 3). Furthermore, we shall use uppercase letters, such as X, to represent the variable whose measurements have a theoretical distribution like the normal distribution, and we shall use lowercase letters, for example, x, to represent a particular measurement.

With a little mathematics, it is possible to show that the total area under any normal curve is 1. In addition, we have the following rule:
[Figure 2.14: 68–95–99.7 rule. Normal curve with the horizontal axis marked at −3, −2, −1, 1, 2, and 3, showing 68% of the area within 1, 95% of the area within 2, and 99.7% of the area within 3.]

The Normal Distribution 68–95–99.7 Rule
For any normal density function,
68% of the area under the curve is contained within 1 standard deviation of the mean.
95% of the area under the curve is contained within 2 standard deviations of the mean.
99.7% of the area under the curve is contained within 3 standard deviations of the mean.
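The three percentages in the rule can be reproduced directly. This is a small sketch, assuming Python's standard `statistics.NormalDist`; the helper name `area_within` is hypothetical, and N(4, 3) is used only as a second example to show that the areas do not depend on the particular mean and standard deviation.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # First column: N(0, 1); second column: N(4, 3). The two areas agree.
    print(k, round(area_within(k), 4), round(area_within(k, mu=4.0, sigma=3.0), 4))
```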
The normal distribution 68–95–99.7 rule allows us to think about the nature of normal distributions without having to make repeated mathematical calculations. Note the similarity between this rule and the empirical rule. Recall that the empirical rule talks about the proportion of a data set that falls within 1, 2, and 3 sample standard deviations of the sample mean. Specifically, the empirical rule states that

Approximately 68% of the data fall within s of x̄.
Approximately 95% of the data fall within 2s of x̄.
Approximately 99.7% of the data fall within 3s of x̄.

In fact, the empirical rule comes from assuming that a data frequency distribution can be approximately represented by a normal distribution with a mean µ equal to the sample mean x̄ and a standard deviation σ equal to the sample standard deviation s. For small data sets, it is difficult to determine whether a normal distribution approximation is warranted because there is little information in few observations. With large data sets, we can get a better picture of the shape of their distributions. If a normal density curve provides an adequate model for the data distribution, the empirical rule will provide an accurate summary of the variation.
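To see the empirical rule at work on a data set, one can count the observations within 1, 2, and 3 sample standard deviations of the sample mean. The sketch below uses simulated data in place of real measurements; the generating values (mean 50, standard deviation 8, 5000 observations) and the helper name `proportion_within` are assumptions made for illustration.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable
data = [random.gauss(50, 8) for _ in range(5000)]  # simulated, roughly normal data

x_bar = mean(data)
s = stdev(data)  # sample standard deviation

def proportion_within(k):
    """Proportion of the data within k sample standard deviations of the sample mean."""
    count = sum(1 for x in data if abs(x - x_bar) <= k * s)
    return count / len(data)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 3))
```

For a large, nearly normal data set such as this one, the three proportions should fall close to .68, .95, and .997.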
Figure 2.14 illustrates the 68–95–99.7 rule for a normal distribution with µ = 0 and the measurements expressed in units of σ; so, for example, −2 in the figure stands for −2σ, 2 stands for 2σ, and so forth. In particular, for σ = 1, the plot in the figure is the N(0, 1) density function.

Normal distributions serve as good data models for scores on psychological tests or subject-matter examinations taken by a broad spectrum of individuals. Measurements from homogeneous biological populations that yield data on, say, bone lengths or corn production, tend to be normally distributed. Data from stable processes collected over time, such as stock rates of return, are often well represented by a normal distribution. Finally, repeated careful measurements of the same quantity, like the moisture content in portions of ground cheese from Chapter 1, are nearly normally distributed.

THE STANDARD NORMAL DISTRIBUTION

Note: In this section, we focus our attention on a normal variable and patterns of data that are well approximated by the bell-shaped normal curve. We will encounter other data models in this book, and we will often find it convenient to work with standardized variables in those contexts.

A Linear Transformation of a Normal Variable
If two variables x and y are related by the expression

y = a + bx

then y is said to be a linear transformation of x. The name linear transformation comes from the fact that a plot of y = a + bx is a straight line.

Let X be a variable whose values, theoretically, are normally distributed with mean µ and standard deviation σ, and let Y = a + bX be a linear transformation of X. Then the values of Y will have mean a + bµ and standard deviation |b|σ. In addition, Y will have a normal distribution.

If X is distributed as N(µ, σ), then Y = a + bX is distributed as N(a + bµ, |b|σ).
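The result for a linear transformation can be checked against simulated values. The following is a sketch under stated assumptions: the function name `transformed_parameters` is hypothetical, and the values µ = 10, σ = 2, a = 32, and b = −1.8 are chosen only to show that a negative b still gives a positive standard deviation.

```python
import random
from statistics import mean, stdev

def transformed_parameters(mu, sigma, a, b):
    """Mean and standard deviation of Y = a + bX when X is N(mu, sigma)."""
    return a + b * mu, abs(b) * sigma

# Illustrative values: X is N(10, 2) and Y = 32 - 1.8X.
mu, sigma, a, b = 10.0, 2.0, 32.0, -1.8
mean_y, sd_y = transformed_parameters(mu, sigma, a, b)

# Compare with simulated values of Y.
random.seed(2)  # fixed seed so the sketch is repeatable
ys = [a + b * random.gauss(mu, sigma) for _ in range(20000)]

print("theoretical:", mean_y, sd_y)
print("simulated:  ", round(mean(ys), 2), round(stdev(ys), 2))
```

The simulated mean and standard deviation should agree with a + bµ and |b|σ up to sampling variation.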
One linear transformation is particularly convenient for normal data. Define the variable

Z = (X − µ)/σ

This variable is called a standardized variable. It is a linear transformation of X with a = −µ/σ and b = 1/σ. By construction, a standardized variable always has mean 0 and standard deviation 1. Consequently, from our previous result, if X has a N(µ, σ) distribution, Z has a N(0, 1) distribution.

A normal distribution with mean 0 and standard deviation 1 is called the standard normal distribution.
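A brief sketch of standardization, assuming Python's standard `statistics.NormalDist`: the function name `standardize` and the value x = 10 are illustrative, while N(4, 3) is the distribution used earlier as a notation example. The area to the left of x under N(µ, σ) matches the area to the left of z under N(0, 1).

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Return the standardized value z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Illustrative case: X is N(4, 3) and the observed value is x = 10.
mu, sigma, x = 4.0, 3.0, 10.0
z = standardize(x, mu, sigma)

area_x = NormalDist(mu, sigma).cdf(x)   # area to the left of x under N(4, 3)
area_z = NormalDist(0.0, 1.0).cdf(z)    # area to the left of z under N(0, 1)

print(z, round(area_x, 4), round(area_z, 4))
```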
We can turn the expression around and write the variable X in terms of the standardized variable Z. With a little algebra, we have

X = µ + Zσ

If µ and σ are the mean and standard deviation of the normal distribution, then this equation implies that any value of the normal variable X can be written as the mean plus a multiple Z of the standard deviation. In practice, data are often standardized with the sample mean, x̄, playing the role of µ and the sample standard deviation, s, playing the role of σ. If there are n observations, then

z_i = (x_i − x̄)/s,   i = 1, 2, ..., n

are the standardized observations, and these values have sample mean 0 and sample standard deviation 1 (see Exercise 2.46). If the original data are approximately normally distributed, the standardized observations are approximately normally distributed.

AREAS UNDER A NORMAL CURVE

[Figure 2.15: Area Under the N(0, 1) Curve to the Left of z = 1.26. How to read from Table 3, Appendix B, for z = 1.26 = 1.2 + .06: the entry in row 1.2 and column .06 is .8962.]
Areas under the standard normal curve have been tabulated. Table 3 in Appendix B is a table of the area under the standard normal curve to the left of a particular value of z. Thus, the table gives the area under the N(0, 1) curve over the interval (−∞, z]. Figure 2.15 demonstrates how to read and interpret the standard normal table for z = 1.26.

Since the total area under any normal curve is 1, the area under the standard normal curve to the right of z = 1.26 is 1 − .8962 = .1038. Moreover, as we have indicated in Figure 2.14, about .68 (actually, .6827) of the area under the curve is between z = −1 and z = 1, and about .95 (actually, .9545) of the area is between z = −2 and z = 2. Of course, since the mean is also the median, .5 of the area under the N(0, 1) curve is to the left of 0 and .5 of the area is to the right. Using Table 3, the symmetry of the normal density function, and simple arithmetic operations, we can determine any area under the standard normal curve.
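The table lookups described here can also be carried out in software. The sketch below assumes Python's standard `statistics.NormalDist` in place of Table 3; the helper names `area_left`, `area_right`, and `area_between` are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)  # standard normal distribution

def area_left(z):
    """Area under the N(0, 1) curve to the left of z (what the table gives)."""
    return Z.cdf(z)

def area_right(z):
    """Area to the right of z: total area 1 minus the area to the left."""
    return 1.0 - Z.cdf(z)

def area_between(z_low, z_high):
    """Area between two z values: difference of the two left-hand areas."""
    return Z.cdf(z_high) - Z.cdf(z_low)

print(round(area_left(1.26), 4))
print(round(area_right(1.26), 4))
print(round(area_between(-1, 1), 4))
print(round(area_between(-2, 2), 4))
```

Rounded to four places, these values match the areas quoted in the text for z = 1.26 and for the intervals from −1 to 1 and from −2 to 2.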
EXAMPLE 2.14 Using the Normal Table

Find the area under the standard normal curve for the following cases:

1. Area to the left of z = −.53

Solution and Discussion. This area can be read directly from Table 3. The table entry corresponding to z = −.53 is .2981. Because of the symmetry of the standard normal curve, .2981 is also the area to the right of z = .53.

2. Area to the right of z = 1.45

Solution and Discussion. The desired area is 1 − (Area to the left of z = 1.45) = 1 − .9265 = .0735, since .9265 is the table entry corresponding to z = 1.45. Equivalently, the area to the right of z = 1.45 is equal, from symmetry, to the area to the left of z = −1.45. The latter area can be read directly from Table 3 and is, as expected, .0735.

3. Area between z = 0 and z = 2.01

Solution and Discussion. The area between z = 0 and z = 2.01 is given by the area to the left of z = 2.01 minus the area to the left of z = 0. We know the area to the left of 0 is .5 (verify with the table). Table 3 indicates that the area to the left of 2.01 is .9778 and, consequently, the area between 0 and 2.01 is .9778 − .5000 = .4778.

4. Area between z = −1.195 and z = .830

Solution and Discussion. Again, the area between z = −1.195 and z = .830 is the area to the left of .830 minus the area to the left of z = −1.195. From Table 3, the area to the left of .830 is .7967. To determine the area to the left of z = −1.195, we must interpolate since the z values in the table are given to only two decimal places. Interpolating between the table entries for z = −1.200 and z = −1.190, we find the area to the left of −1.195 to be .1161. (Hint: Since −1.195 is halfway between −1.200 and −1.190, the table entry corresponding to z = −1.195 is halfway between the table entries for z = −1.200 and z = −1.190, respectively.) The required area is then .7967 − .1161 = .6806.

When evaluating areas under a normal curve, it is a good idea to sketch the curve and then darken the required area. This will often immediately indicate the arithmetic required (if any) to determine the area from Table 3 entries.
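The interpolation step in part 4 can be sketched in code. This is an illustration only: `table_entry` imitates a printed table by rounding z to two places and the area to four places, and both helper names are hypothetical. It assumes Python's standard `statistics.NormalDist` rather than the book's Table 3.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)

def table_entry(z):
    """Area to the left of z, with z rounded to two places and the area to four, as in a printed table."""
    return round(Z.cdf(round(z, 2)), 4)

def interpolated_area(z, z_low, z_high):
    """Linear interpolation between the table entries at z_low and z_high."""
    low, high = table_entry(z_low), table_entry(z_high)
    fraction = (z - z_low) / (z_high - z_low)
    return low + fraction * (high - low)

# Part 4: area between z = -1.195 and z = .830.
left_neg = interpolated_area(-1.195, -1.20, -1.19)
area = table_entry(0.83) - left_neg

print(round(left_neg, 5), round(area, 5))
print(round(Z.cdf(0.83) - Z.cdf(-1.195), 5))  # direct calculation without a table
```

The interpolated and direct answers differ only beyond the fourth decimal place, which is why linear interpolation between adjacent table entries is adequate here.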
Notice that virtually all the area under the standard normal curve is contained between z = −3.5 and z = 3.5. Areas to the left of −3.5 and to the right of 3.5 are extremely small. Table 3 gives .0002 for each area. Consequently, for z values more extreme than these, we typically ignore the areas to the left of the negative extreme values and to the right of the positive extreme values.
A table of standard normal curve areas and the relationship z = (x − µ)/σ can be used to find the area under any normal density curve. To illustrate, suppose a normal density function with mean 10 and standard deviation 2 is a good representation of a particular density histogram, and we are interested in the proportion of the data between, say, 6 and 11. This proportion is approximated by the area under the N(10, 2) curve over the interval [6, 11]. The values of a normal variable can be converted to the values of a standard normal variable. In this case, µ = 10 and σ = 2, so a value of x = 6 corresponds to a z value of (6 − 10)/2 = −2. Similarly, a value of x = 11 converts to a standard normal value of z = (11 − 10)/2 = .5. The area under the N(10, 2) function over the interval [6, 11] is exactly the same as the area under the standard normal curve, N(0, 1), over the interval [−2, .5]. The latter area can be determined with the help of the standard normal table. This situation is illustrated in Figure 2.16.

[Figure 2.16: Area Under the N(10, 2) Curve over the Interval [6, 11]. The N(10, 2) curve (x axis marked at 6, 8, 10, 12, 14) is shown with the N(0, 1) curve (z axis marked at −2, −1, 1, 2), linked by z = (x − 10)/2.]

In general, we have the following. Suppose we are interested in the area under the N(µ, σ) distribution curve between two numbers c and d with c < d. Then

Area under N(µ, σ) over the interval [c, d] = Area under N(0, 1) over the interval [(c − µ)/σ, (d − µ)/σ]

Since single points have zero width, the area under a normal curve over the interval [c, d] is the same as the area over the interval (c, d), the interval without the endpoints. That is, the area under the curve does not change if we include or exclude one or both of the endpoints of the target interval.

EXAMPLE 2.15 Determining Areas Under Normal Curves
1. Suppose X is approximately distributed as N(100, 5). Determine the area under this normal density between 97 and 110.

Solution and Discussion. When c = 97 and d = 110, the area under the N(100, 5) curve between x = 97 and x = 110 is the same as the area under the standard normal density between

z = (97 − 100)/5 = −.60   and   z = (110 − 100)/5 = 2.00

Using Table 3, the latter area is the area to the left of z = 2.00 minus the area to the left of z = −.60, or .9772 − .2743 = .7029.

2. Suppose X is approximately distributed as N(12, 3). Determine the area to the right of x = 8.5.

Solution and Discussion. Since the total area under any normal curve is 1, the area to the right of x = 8.5 is equal to 1 − (Area to the left of x = 8.5). Converting x = 8.5 to z = (8.5 − 12)/3 = −1.167, we calculate the required area:

1 − (Area to the left of z = −1.167) = 1 − .1216 = .8784

where .1216 is obtained by interpolating between the entries corresponding to z = −1.17 and z = −1.16 in Table 3.

3. Suppose X is approximately distributed as N(44, 6), and we know that a proportion .90 of the area under this curve is to the left of a value x. That is, x is the 90th percentile of the N(44, 6) distribution. Determine x.

Solution and Discussion. We are given

Area to the left of x = .90

This implies that

Area to the left of z = (x − 44)/6 is .90

From Table 3, we can determine the value of z that has .90 (or approximately .90) of the area under the standard normal density to the left of it. Using the table, we find

Area to the left of z = 1.28 is .8997

which is nearly .90. Setting (x − 44)/6 = 1.28 and solving for x, we get x = 44 + 1.28(6) = 51.68. Consequently, x = 51.68 is the 90th percentile of the N(44, 6) distribution.
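The conversion from an N(µ, σ) interval to a standard normal interval can be written as a few short functions. The sketch below assumes Python's standard `statistics.NormalDist` in place of Table 3; the function names are hypothetical. Because the software does not round z or the table entries, its answers can differ from the table-based ones in the last digit or two.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)  # standard normal distribution

def area_between(mu, sigma, c, d):
    """Area under N(mu, sigma) over [c, d], computed on the standard normal scale."""
    z_c = (c - mu) / sigma
    z_d = (d - mu) / sigma
    return Z.cdf(z_d) - Z.cdf(z_c)

def area_right(mu, sigma, x):
    """Area under N(mu, sigma) to the right of x."""
    return 1.0 - Z.cdf((x - mu) / sigma)

def percentile(mu, sigma, p):
    """Value x with area p to its left, from x = mu + z * sigma."""
    z = Z.inv_cdf(p)
    return mu + z * sigma

print(round(area_between(10, 2, 6, 11), 4))     # interval [6, 11] under N(10, 2)
print(round(area_between(100, 5, 97, 110), 4))  # part 1 of the example
print(round(area_right(12, 3, 8.5), 4))         # part 2 of the example
print(round(percentile(44, 6, 0.90), 2))        # part 3 of the example
```

In part 2 the value 8.5 lies below the mean of 12, so the area to its right must exceed .5, a quick check that a sketch of the curve also provides.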
On occasion, we might want to judge whether the observed value of a normal variable is, in some sense, unexpected. If area is determined from a N(µ, σ) curve, an unexpected or unusual value x is one that is too far from the mean µ. Equivalently, the absolute value of z = (x − µ)/σ is too large. A value for the standardized variable less than −2 or greater than 2 could be considered large because each of the tail areas is .0228 and the combined area 2(.0228) = .0456 is small. We will continue to elaborate on this idea of “unusual” or “unexpected” as we develop the central statistical procedures.
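The idea of an unusual value can be expressed as a simple check on the standardized value. This is a sketch only: the cutoff of 2 follows the rule of thumb in the text, the function names are hypothetical, and the N(44, 6) distribution with x = 57.5 and x = 50 is an illustrative choice. It assumes Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)

def z_score(x, mu, sigma):
    """Standardized value of x under N(mu, sigma)."""
    return (x - mu) / sigma

def two_tail_area(z):
    """Combined area in both tails beyond |z| under the standard normal curve."""
    return 2.0 * Z.cdf(-abs(z))

def is_unusual(x, mu, sigma, cutoff=2.0):
    """Flag x when its standardized value lies beyond the cutoff in either direction."""
    return abs(z_score(x, mu, sigma)) > cutoff

print(round(two_tail_area(2.0), 4))
print(is_unusual(57.5, 44, 6), is_unusual(50, 44, 6))
```

The combined tail area beyond ±2 computed this way agrees with the table-based value of about .0456 up to rounding of the table entries.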
[Photo: An Outlier? Credit: John Chase]
Not all data patterns can be reasonably approximated by a normal curve. Therefore, if a normal distribution is tentatively assumed to be a plausible data model in a particular
case, this assumption must be checked once the sample observations are in hand. We consider this in Chapter 7, where we discuss the normal distribution in greater detail.
EXERCISES

2.34 Find the area under the standard normal curve to the left of
a. z = 1.16
b. z = .24
c. z = .57
d. z = 2.1

2.35 Find the area under the standard normal curve to the left of
a. z = .77
b. z = 1.68
c. z = .21
d. z = 1.39

2.36 Find the area under the standard normal curve to the right of
a. z = .84
b. z = 2.25
c. z = 1
d. z = 1.595 (interpolate)

2.37 Find the area under the standard normal curve to the right of
a. z = .21
b. z = 2.03
c. z = .67
d. z = 1.115 (interpolate)

2.38 Find the area under the standard normal curve over the interval
a. z = 0 to z = .37
b. z = .42 to z = 1.06
c. z = −1.62 to z = .09
d. z = .25 to z = 1.966 (interpolate)

2.39 Find the area under the standard normal curve over the interval
a. z = −2.07 to z = .04
b. z = −1.12 to z = .35
c. z = .77 to …
d. z = .69 to z = 1.893 (interpolate)

2.40 Identify the z values in the following diagrams of the standard normal distributions (interpolate as needed).

[Diagrams a–f: standard normal curves marked with z and −z; labeled values .2643, −1, .756, .6528, .20, .35, 1.82, and .59.]
[Diagrams a–d: standard normal curves marked with z and −z; labeled values .125, .20, .668, 2.0, and .888.]
2.41
2.42 2.43