Figure 2.21 An $X$-bar Chart for the Transformed Radiation Data [plot of sample mean (vertical axis, .40 to .80) against sample number (horizontal axis, 5 to 15); center line at .5643, LCL = .3635, UCL = .7650]
sample from a batch of microwave ovens produced at about the same time. The second three observations are a sample from a later batch of ovens, and so forth. The subgroup means are plotted in the control chart in Figure 2.21. Since the points plotted are means, the chart is called an $X$-bar chart.
The $3\sigma$ limits UCL and LCL are determined using the standard deviation of the subgroup means. This standard deviation is less than the standard deviation of the original observations. Since we have 42 observations, there are $42/3 = 14$ sample subgroup means plotted in the chart.
Notice the upward drift in the mean radiation numbers over time. This pattern is consistent with the pattern for the individual observations in Figure 2.20. The drift in the means is clearer than the drift in the original observations, suggesting that there may be a problem with the more recently manufactured ovens.
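The construction of an $X$-bar chart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `xbar_chart_limits`, the readings, and the choice to estimate the standard deviation of the subgroup means from the averaged within-subgroup variances are ours, since the text does not say which estimator produced the limits in Figure 2.21.

```python
import math
import statistics


def xbar_chart_limits(observations, subgroup_size):
    """Return (subgroup means, center line, LCL, UCL) for an X-bar chart."""
    if subgroup_size < 2 or len(observations) % subgroup_size != 0:
        raise ValueError("observations must split evenly into subgroups of size >= 2")
    # Consecutive observations form one subgroup (one batch).
    subgroups = [observations[i:i + subgroup_size]
                 for i in range(0, len(observations), subgroup_size)]
    means = [statistics.mean(group) for group in subgroups]
    center = statistics.mean(means)
    # Assumption: estimate the variance of a single observation by averaging
    # the within-subgroup sample variances.
    pooled_variance = statistics.mean([statistics.variance(g) for g in subgroups])
    # For independent observations, a mean of n values has standard deviation
    # sigma / sqrt(n), which is smaller than sigma itself.
    sd_of_means = math.sqrt(pooled_variance / subgroup_size)
    return means, center, center - 3 * sd_of_means, center + 3 * sd_of_means


# Made-up readings (not the radiation data): 4 subgroups of 3 observations.
readings = [0.30, 0.35, 0.40, 0.45, 0.50, 0.40,
            0.55, 0.60, 0.50, 0.65, 0.70, 0.75]
means, center, lcl, ucl = xbar_chart_limits(readings, subgroup_size=3)
print(len(means), round(center, 4), round(lcl, 4), round(ucl, 4))
```

Plotting `means` against the subgroup number, with horizontal lines at `center`, `lcl`, and `ucl`, gives a chart of the same kind as Figure 2.21.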
2.9 CHAPTER SUMMARY
In this chapter, we have learned:
The general practice of summarizing data with a collection of relatively simple tables, plots, and numbers is called exploratory data analysis.
An enumerative study is a study about an identifiable, unchanging, generally finite population done at a particular point in time. The list of sampling units for an enumerative study is called a frame. An enumerative study is like a snapshot.
An analytic study is a study that is not enumerative. Analytic studies generally take place over time, like a moving picture. The objective is to use current information to predict what will happen in the future. Analytic studies often involve comparisons.
The characteristic of interest in either an enumerative or analytic study is called a variable. Variables are of two types: quantitative and qualitative. Qualitative variables are nonnumerical. The “values” for qualitative variables are categories. Numbers are often assigned to the categories to distinguish them. If the categories have no particular order, such as male and female, the numbers assigned to the categories are called nominal data. If the categories are ordered, such as the bond ratings, Aaa, Aa, . . . , the ordered numbers assigned to the categories are called ordinal data.
Quantitative variables are naturally numerical and may be discrete or continuous. Discrete variables are those variables whose values are
numbers with gaps between them, such as the number of votes received by candidates in an election. Continuous variables are those variables that can, in principle, take any value in an interval. Sometimes discrete variables are treated as continuous variables.
The pattern of variability in a set of data is called its frequency distribution. A frequency distribution indicates the values, or categories, for the variable and the number of times each value, or category, occurs.
Frequency distributions are best characterized by graphs or plots. Possible plots for displaying a frequency distribution include: a dot diagram, a stem-and-leaf diagram, a histogram or density histogram, and a boxplot.
Summary numbers indicating location include the mean, $\bar{x}$, the median, $M$, and percentiles. The location of the mean is such that the sum of the deviations of the observations from the mean is zero. The median divides the ordered data in half.
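The two location properties just stated are easy to check numerically. The short Python sketch below uses a made-up data set; the variable names are ours and nothing here comes from the chapter's examples.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]            # made-up observations

# The deviations from the mean sum to zero.
x_bar = statistics.mean(data)
deviations = [x - x_bar for x in data]
sum_of_deviations = sum(deviations)

# The median leaves as many observations below it as above it.
median = statistics.median(data)
below = sum(1 for x in data if x < median)
above = sum(1 for x in data if x > median)

print(x_bar, sum_of_deviations, median, below, above)
```

With non-integer data the sum of deviations may differ from zero by a tiny rounding error, so a tolerance-based comparison is safer than an exact test.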
Quartiles, $Q_1$, $Q_2$, $Q_3$, are percentiles that divide the ordered data into quarters. Consequently, the second quartile, $Q_2$, is also the median, $M$.
Summary measures that are not appreciably affected by a few extreme observations are said to be robust or resistant. The median, for example, is a robust measure of location.
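To see what robustness means in practice, one can replace a single observation by an extreme value and compare how far the mean and the median move. The data below are made up for illustration.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]             # made-up observations
contaminated = data[:-1] + [110]           # largest value replaced by an extreme one

# Change in each location measure caused by the single extreme observation.
mean_shift = statistics.mean(contaminated) - statistics.mean(data)
median_shift = statistics.median(contaminated) - statistics.median(data)

print(f"mean moves by {mean_shift:.2f}, median moves by {median_shift}")
```

The mean is pulled strongly toward the extreme value, while the median does not move at all in this example.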
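Summary numbers measuring variation include the following: the variance, $s^2$; the standard deviation, $s = \sqrt{s^2}$; the range, Range $=$ Max $-$ Min; and the interquartile range, IR $= Q_3 - Q_1$. The interquartile range is a robust measure of variation.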
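The four measures of variation can be computed as follows. This is a sketch on made-up data; note that several conventions exist for locating quartiles, and the `"inclusive"` method used here is an assumption that may differ slightly from the rule given earlier in the chapter.

```python
import math
import statistics

data = [2, 4, 4, 5, 7, 9, 11]              # made-up observations

variance = statistics.variance(data)        # sample variance, divisor n - 1
std_dev = math.sqrt(variance)               # s is the square root of s^2
data_range = max(data) - min(data)          # Range = Max - Min

# Assumption: quartiles by the "inclusive" interpolation method.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                               # IR = Q3 - Q1

print(variance, round(std_dev, 4), data_range, q1, q2, q3, iqr)
```

Because the range depends only on the two most extreme observations, it reacts strongly to a single outlier; the interquartile range depends on the middle half of the data and does not.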
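Boxplots are pictorial representations of the five-number summary, Min, $Q_1$, $Q_2 = M$, $Q_3$, and Max.
Variables $x$ and $y$ that are connected by the expression $y = a + bx$ are said to be connected by a linear transformation. We say that $y$ is a linear transformation of $x$.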
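A linear transformation changes the summary numbers in a predictable way: the mean of the $y$ values is $a + b\bar{x}$ and their standard deviation is $|b|$ times the standard deviation of the $x$ values. These are standard results rather than statements quoted from this summary, and the temperature data and the helper `linear_transform` below are made up for illustration.

```python
import statistics


def linear_transform(values, a, b):
    """Apply y = a + b*x to every value."""
    return [a + b * x for x in values]


celsius = [10.0, 15.0, 20.0, 25.0, 30.0]               # made-up temperatures
fahrenheit = linear_transform(celsius, a=32.0, b=1.8)   # y = 32 + 1.8x
reversed_scale = linear_transform(celsius, a=100.0, b=-2.0)

x_bar, s_x = statistics.mean(celsius), statistics.stdev(celsius)
print(statistics.mean(fahrenheit), 32.0 + 1.8 * x_bar)      # means agree
print(statistics.stdev(fahrenheit), 1.8 * s_x)              # sd scales by |b|
print(statistics.stdev(reversed_scale), 2.0 * s_x)          # sign of b does not matter
```

The constant $a$ shifts the location only; it has no effect on the standard deviation.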
When density histograms are symmetric about a single peak and look like the outline of a bell, they can often be approximated by a smooth curve known as the normal density function. The normal density function with mean $\mu$ and standard deviation $\sigma$ is denoted by $N(\mu, \sigma)$.
The mean $\mu$ locates the middle of the normal density function along the $x$-axis, and the standard deviation $\sigma$ controls the spread or concentration of the normal curve about the mean. As the standard deviation decreases, the normal curve becomes more tightly concentrated about its mean. The normal density function is also called the normal distribution.
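The roles of the mean and the standard deviation can be seen by evaluating the curve directly. The sketch below uses the standard formula for the normal density, which is not reproduced in this summary; the function name `normal_density` and the parameter values are ours.

```python
import math


def normal_density(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# The curve is symmetric about mu, so points equally far from mu have equal height.
left = normal_density(8.0, mu=10.0, sigma=2.0)
right = normal_density(12.0, mu=10.0, sigma=2.0)

# A smaller sigma gives a taller, narrower curve around the same mean.
peak_wide = normal_density(10.0, mu=10.0, sigma=2.0)
peak_narrow = normal_density(10.0, mu=10.0, sigma=0.5)

print(left, right, peak_wide, peak_narrow)
```

Since the total area under the curve is always 1, a curve that is more concentrated about its mean must also be higher there.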
The empirical rule allows us to summarize the locations of increasing proportions of a set of numbers using only the sample mean and sample standard deviation. The empirical rule tells us that about 68% of the data lie in the interval $\bar{x} \pm s$; about 95% of the data lie in the interval $\bar{x} \pm 2s$; and about 99.7% of the data lie in the interval $\bar{x} \pm 3s$. The empirical rule works best for large, mound-shaped data sets.
The 68–95–99.7 rule summarizes the area under the normal curve in terms of standard deviation intervals centered at the mean. For any normal density function,
1. 68% of the area under the curve is contained within 1 standard deviation of the mean.
2. 95% of the area under the curve is contained within 2 standard deviations of the mean.
3. 99.7% of the area under the curve is contained within 3 standard deviations of the mean.
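The three percentages in the rule are rounded values of exact normal-curve areas. As a check, the area within $k$ standard deviations of the mean can be computed from the error function as $\mathrm{erf}(k/\sqrt{2})$; this identity is a standard fact rather than something stated in the text, and the helper name `area_within` is ours.

```python
import math


def area_within(k):
    """Area under any normal curve within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} standard deviation(s): {100 * area:.2f}%")
```

The computed areas are roughly 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7.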
The 68–95–99.7 rule and the empirical rule are related. In fact, the empirical rule comes from assuming that a data frequency distribution can be approximately represented by a normal density function with a mean equal to the sample mean $\bar{x}$ and a standard deviation equal to the sample standard deviation $s$.
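The connection between the two rules can be illustrated by applying the empirical rule to a large, mound-shaped data set. The sketch below simulates such a data set; the seed, the sample size, and the population mean and standard deviation are arbitrary choices made for this illustration.

```python
import random
import statistics

rng = random.Random(0)                                   # fixed seed for repeatability
sample = [rng.gauss(100.0, 15.0) for _ in range(10_000)]  # simulated mound-shaped data

x_bar = statistics.mean(sample)
s = statistics.stdev(sample)


def proportion_within(values, center, half_width):
    """Fraction of values lying in the interval center +/- half_width."""
    inside = sum(1 for v in values if abs(v - center) <= half_width)
    return inside / len(values)


# Proportions of the data in x_bar +/- s, x_bar +/- 2s, and x_bar +/- 3s.
proportions = [proportion_within(sample, x_bar, k * s) for k in (1, 2, 3)]
print([round(p, 3) for p in proportions])
```

For data like these the three proportions fall close to 68%, 95%, and 99.7%; for small or strongly skewed data sets the agreement can be much poorer.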
The standard normal distribution is a normal distribution with mean 0 and standard deviation 1.
If $X$ is distributed as $N(\mu, \sigma)$, then $Y = a + bX$ is distributed as $N(a + b\mu, |b|\sigma)$.
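One useful special case, added here as an illustration, is standardization: choosing $a = -\mu/\sigma$ and $b = 1/\sigma$ gives $Y = (X - \mu)/\sigma$, which is distributed as $N(0, 1)$. The helper `transform_normal` and the parameter values below are ours.

```python
def transform_normal(mu, sigma, a, b):
    """Return (mean, standard deviation) of Y = a + b*X when X is N(mu, sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if b == 0:
        raise ValueError("b must be nonzero for Y to be normally distributed")
    return a + b * mu, abs(b) * sigma


mu, sigma = 50.0, 4.0                       # arbitrary illustrative parameters

# Standardization: Y = (X - mu) / sigma, i.e. a = -mu/sigma and b = 1/sigma.
z_mean, z_sd = transform_normal(mu, sigma, a=-mu / sigma, b=1.0 / sigma)

# A negative slope flips the scale but leaves the standard deviation positive.
y_mean, y_sd = transform_normal(mu, sigma, a=10.0, b=-2.0)

print(z_mean, z_sd, y_mean, y_sd)
```

The absolute value matters only when $b$ is negative: the mean becomes $a + b\mu$ as usual, while the standard deviation remains positive.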
CHAPTER 2 DESCRIBING PATTERNS IN DATA
2.10 IMPORTANT CONCEPTS AND TOOLS