2.3 Refer to Exercise 2.1. For each of the studies, define the variable (characteristic) of interest. Indicate whether the variable is quantitative or qualitative. If the variable is quantitative, indicate whether it is discrete or continuous.
2.4 Refer to Exercise 2.2. For each of the studies, define the variable (characteristic) of interest. Indicate whether the variable is quantitative or qualitative. If the variable is quantitative, indicate whether it is discrete or continuous.
2.5 Consider the collection of students in the class.
a. Describe two enumerative studies for which the students in the class may be regarded as a sample from a larger population.
b. Describe two enumerative studies for which the students in the class may be regarded as the population. Suggest a frame for this population.
2.6 A sample of 15 people were asked whether they favored or opposed a new system of high-speed rail transportation. The responses were coded as follows: favored = 1, opposed = 0. The data are
1 0 0 0 1 1 1 1 0 0 1 0 0 0 0
a. Sum these numbers and interpret the result.
b. Calculate the sample mean, \(\bar{x}\), and interpret this quantity for 0–1, or binary coded, data.
2.7 A sample of ten recent graduates in accounting were asked about job satisfaction. The responses were coded as follows: satisfied = 1, not satisfied = 0. The data are
1 0 1 1 0 0 1 1 1 1
a. Sum these numbers and interpret the result.
b. Calculate the sample mean, \(\bar{x}\), and interpret this quantity for 0–1, or binary coded, data.
2.4 GRAPHICAL DISPLAYS OF DATA DISTRIBUTIONS
Recall the sendouts of gas on Wednesdays in January discussed in Example 1.6 or the monthly inventory levels of diesel engines introduced in Example 1.8. These
examples illustrate that repeated measurements of a given variable are different. The measurements vary. When the data set is very small, the differences are readily
apparent. We can immediately see, for example, whether all the numbers are close together or whether one number is considerably smaller than all the rest. With moderate
to large data sets, however, the pattern of variability is generally not evident. Until the data are organized in some meaningful fashion, it is often not clear what the data are
telling us. Organizing the data graphically is a necessary first step to understanding the information contained in them. Graphical displays can provide an immediate
interpretation not visible in the raw numbers.
The pattern of variability in a data set is called its frequency distribution. This
distribution indicates the possible values, or categories, for the variable and the number of times each value or category occurs. Frequency distributions are best characterized
by graphs or plots. The best display in a particular case depends on the nature and size of the data set.
CHAPTER 2 DESCRIBING PATTERNS IN DATA
DOT DIAGRAM
We have already seen an example of a dot diagram in Figure 1.10. This figure showed the frequencies of the mail arrival times for a department in a computer manufacturing company. For relatively small data sets that contain either discrete or rounded continuous measurements, a dot diagram provides a useful display of the variability. To create a dot diagram, draw a line with a scale covering the range of values of the measurements, and then plot the individual measurements above the line as prominent dots.
EXAMPLE 2.1 Constructing a Dot Diagram
The monthly inventory levels for diesel engines (in thousands of dollars) were given in Table 1.4 (see Example 1.8). Construct a dot diagram for these data.
Solution and Discussion. The dot diagram, constructed with the help of a computer program, is given in Figure 2.1. The dot diagram indicates that most of the monthly inventories cluster around the 1000 ($1,000,000) level, with a few levels above 1500 and none below 500. The dot diagram shows the pattern of the variation in these data, a pattern that is not obvious from examining the rows of numbers in Table 1.4.
Figure 2.1 Dot Diagram for Inventory Levels (horizontal axis: Inventory, with scale marks at 500, 1000, 1500, and 2000)
The dot diagram is the simplest way to display data. However, for moderate to large data sets, it is sometimes difficult to determine the actual numerical values from the scale beneath the dots. The dots blend together, or nearly identical data get rounded and plotted as the same numbers. Other graphical displays can be constructed that picture the frequency distribution and convey information about the magnitudes of the numbers themselves.
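The construction rule for a dot diagram (draw a scale covering the range, then stack one dot per measurement above its value) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `dot_diagram` and the sample values are hypothetical and are not taken from Table 1.4, and the sketch assumes integer-valued data so that each scale value gets its own column.

```python
from collections import Counter

def dot_diagram(values):
    """Return a text dot diagram for integer-valued data as a list of lines."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    ticks = range(lo, hi + 1)                     # the scale covers the full range
    width = max(len(str(lo)), len(str(hi))) + 1   # column width for each scale value
    lines = []
    # Build the stacks from the top row down; a dot appears in a column
    # when that value occurs at least `level` times.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("o" if counts[t] >= level else "").rjust(width) for t in ticks)
        lines.append(row.rstrip())
    lines.append("-" * (width * len(ticks)))      # the scale line
    lines.append("".join(str(t).rjust(width) for t in ticks))
    return lines

# Hypothetical data (not from the text): ten small integer measurements.
sample = [3, 5, 4, 3, 6, 4, 4, 9, 5, 4]
print("\n".join(dot_diagram(sample)))
```

In the printed diagram the tallest stack sits above 4 and the single dot above 9 stands apart from the rest, which is the kind of pattern the text says is hard to see in the raw numbers.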
STEM-AND-LEAF DIAGRAM
A view of a frequency distribution that features the actual numerical values in the display is provided by a stem-and-leaf diagram. Stem-and-leaf diagrams work best for small- to moderate-size data sets where the measurements are all positive two- or three-digit numbers.
A stem-and-leaf diagram is created from data arranged in ascending order of magnitude (smallest to largest). The diagram uses the information in the leading digits of the numbers. For two-digit numbers, the first digits are the stems and the second
digits are the leaves, and the arrangement of the leaves on the stems provides a pictorial representation of the distribution.
EXAMPLE 2.2 Constructing a Stem-and-Leaf Diagram for a Small Data Set
Consider the R&D expenditures for the twelve largest automakers given in Exercise 1.37. Create a stem-and-leaf diagram for these data.
Solution and Discussion. The R&D expenditures as a percentage of sales are reproduced here:
4.4 3.6 4.4 3.7 9.6 3.9
3.6 3.5 3.0 4.5 3.9 2.2
We represent 2.2, for example, as 2 | 2, where the first digit is the stem and the second digit is the leaf. The numbers 3.0 and 3.5 become 3 | 05, where 0 and 5 are the leaves attached to the same stem, 3. Continuing in this way we obtain the stem-and-leaf diagram in Figure 2.2.
Figure 2.2 Stem-and-Leaf Diagram for R&D Expenditures
2 | 2
3 | 0566799
4 | 445
5 |
6 |
7 |
8 |
9 | 6
In summary, the integers 2, 3, . . . , 9 in the first column are the stems. The column of stem digits is often called the stem for convenience. The integers in the horizontal lines coming from the stems are the leaves. Each leaf digit corresponds to a number. Since the leaf digits, for this data set, are in tenths, we say the leaf unit is .10. The first line in the stem-and-leaf diagram, 2 | 2, indicates a value of 2.2 for the smallest number in the data set.
The second line, 3 | 0566799, indicates that there are seven numbers with the stem, or first digit, 3, and with the leaf unit .10; the seven numbers are 3.0, 3.5, 3.6, 3.6, 3.7, 3.9, and 3.9. There are no numbers in the data set with the stems (first digits) 5 through 8, because there are no leaves attached to these stems. The final and largest R&D expenditure is 9.6.
It is clear that most of the data are associated with the stem category 3. That is, the majority of the automakers in this group spend between 3% and 4% of sales on research and development. The expenditure 9.6 (Daimler-Benz) is large relative to the rest of the data. Observations that are far removed from the bulk of the data are called “outliers” and are ordinarily subjected to additional scrutiny to determine why they are different.
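The steps used in Example 2.2 (sort the data, split each number into a stem and a leaf, and attach the leaves to their stems) can be sketched as follows. The function name `stem_and_leaf` is a hypothetical helper, and the sketch assumes positive data already recorded in whole leaf units, as the R&D percentages are.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit):
    """Return the lines of a stem-and-leaf diagram, one line per stem."""
    leaves = defaultdict(list)
    for v in sorted(values):                 # ascending order, as the text requires
        units = int(round(v / leaf_unit))    # e.g. 3.6 with leaf unit .10 -> 36
        stem, leaf = divmod(units, 10)       # 36 -> stem 3, leaf 6
        leaves[stem].append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    # List every stem between the smallest and largest, including empty ones.
    return [f"{stem} | {''.join(leaves[stem])}".rstrip() for stem in range(lo, hi + 1)]

# R&D expenditures (percentage of sales) from Example 2.2.
rd = [4.4, 3.6, 4.4, 3.7, 9.6, 3.9, 3.6, 3.5, 3.0, 4.5, 3.9, 2.2]
print("\n".join(stem_and_leaf(rd, leaf_unit=0.1)))
```

Keeping the empty stems 5 through 8 in the output is what makes the gap between the bulk of the data and the value 9.6 visible.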
When the stem-and-leaf diagram in Figure 2.2 is turned on its side with the stem as a horizontal axis, we get a view of the pattern of variation. In this case, the distribution is characterized by a mound of values (leaves) on the left that tail off to the one relatively large value at the right. The display is informative because, in addition to giving us a picture of the variation, every R&D number can be reconstructed exactly from the stem and corresponding leaf integers.
Stem-and-leaf diagrams are extremely versatile displays. A stem-and-leaf display involving a larger data set of three-digit numbers is given in Example 2.3. Other examples and variants of the stem-and-leaf diagram are considered in the exercises. Of particular interest is the use of back-to-back stem-and-leaf diagrams to compare two distributions. The distributions of the ratio (current assets)/(current liabilities) for bankrupt and nonbankrupt firms are examined in Exercise 2.12 using back-to-back stem-and-leaf diagrams with a common stem.
EXAMPLE 2.3 A Computer-Generated Stem-and-Leaf Diagram for a Large Data Set
In Table 2.1, the national 800-meter-dash records for women are listed for 55 countries. The times are recorded in minutes. Construct a stem-and-leaf diagram for these data.
TABLE 2.1 National 800-Meter-Dash Records for Women (times in minutes)
2.15 2.01 2.05 2.24 2.02 2.00 2.00 1.89 2.30 2.19 2.00
2.05 2.10 1.93 2.19 2.11 2.28 2.12 2.03 1.99 2.18 2.09
1.96 2.07 2.07 1.99 2.22 2.02 2.00 2.24 2.08 1.97 1.92
1.89 2.04 2.10 1.96 2.15 1.98 2.09 1.98 2.10 2.02 2.03
2.03 2.05 2.15 2.33 2.21 2.27 2.16 2.10 2.20 1.95 1.95
SOURCE: IAAF-ATFS Track and Field Statistics Handbook for the 1984 Los Angeles Olympics.
Solution and Discussion. The stem-and-leaf diagram is shown in Figure 2.3. This diagram was produced by Minitab. There are several things to notice about the display in Figure 2.3. First, since we are dealing with three-digit numbers, the stem consists of two-digit numbers with, as usual, single-digit leaves. Second, some of the stem numbers, for example, 20, are repeated. Third, the leaf unit is .01, so when an individual value is reconstructed from the diagram, the decimal point falls between the two stem digits; that is, the first line in the figure, 18 | 99, corresponds to the values 1.89, 1.89.
Figure 2.3 Stem-and-Leaf Diagram for 800-Meter Data (leaf unit = .01)
   2  18 | 99
   4  19 | 23
  13  19 | 556678899
  25  20 | 000012223334
 (8)  20 | 55577899
  22  21 | 000012
  16  21 | 5556899
   9  22 | 01244
   4  22 | 78
   2  23 | 03
Finally, the first column of cumulative frequencies, which the software adds to the traditional stem-and-leaf picture, counts the cumulative number of values in the stem
categories. The frequencies are added from each end of the distribution until the “middle” category, indicated here with a parenthetical frequency of 8, is reached. So,
for example, there are 25 values less than or equal to 2.04, and 9 values greater than or equal to 2.20.
With a fairly large data set, the number of leaves attached to a stem may be large, and, because the stem categories are always defined by one- or two-digit integers, the
stem-and-leaf diagram may not be particularly informative. When this occurs, a given stem category may be repeated, with the leaf digits 0 through 4 associated with the
first occurrence of the category, and the leaf digits 5 through 9 associated with the second occurrence. This was the case with stem categories 19, 20, 21, and 22 above.
Ordinarily, a stem category is not repeated more than once.
Turning the stem-and-leaf diagram on its side, we see that the pattern of variation in the 800-meter times tends to fan out to the right.
The smallest value is 1.89 and the largest value is 2.33, a range of .44 minute. However, most of the times cluster around 1.95 to 2.12, a fairly narrow range of .17 minute; that
is, the bulk of the distribution is closer to the minimum, 1.89, than it is to the maximum, 2.33. Distributions that have this property are said to have a long right-hand tail or
to be skewed to the right. As you might expect, nations with highly developed track and field programs, like the United States, the former Soviet Union, and the former
German Democratic Republic, have the fastest times, and these times are nearly the same. Countries with less developed programs have times that are slower and more
varied.
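The features of the computer-generated display discussed in Example 2.3 (repeated stems holding leaves 0 through 4 and 5 through 9, and a first column that counts in from each end) can be reproduced with the following sketch. It is not Minitab's algorithm, only an illustration written to match the description in the text; the function name `split_stem_display` is hypothetical, and the row containing the middle observation is identified by a simple position check.

```python
# National 800-meter records (minutes) as listed in Table 2.1.
TIMES = [
    2.15, 2.01, 2.05, 2.24, 2.02, 2.00, 2.00, 1.89, 2.30, 2.19, 2.00,
    2.05, 2.10, 1.93, 2.19, 2.11, 2.28, 2.12, 2.03, 1.99, 2.18, 2.09,
    1.96, 2.07, 2.07, 1.99, 2.22, 2.02, 2.00, 2.24, 2.08, 1.97, 1.92,
    1.89, 2.04, 2.10, 1.96, 2.15, 1.98, 2.09, 1.98, 2.10, 2.02, 2.03,
    2.03, 2.05, 2.15, 2.33, 2.21, 2.27, 2.16, 2.10, 2.20, 1.95, 1.95,
]

def split_stem_display(values, leaf_unit=0.01):
    """Stem-and-leaf lines with each stem split in two and a cumulative-count column."""
    units = sorted(int(round(v / leaf_unit)) for v in values)
    rows = {}
    for u in units:
        stem, leaf = divmod(u, 10)
        # Row key (stem, 0) holds leaves 0-4; (stem, 1) holds leaves 5-9.
        rows.setdefault((stem, leaf // 5), []).append(str(leaf))
    keys, key, last = [], min(rows), max(rows)
    while key <= last:                       # list every row between the extremes
        keys.append(key)
        key = (key[0], 1) if key[1] == 0 else (key[0] + 1, 0)
    n, below, lines = len(units), 0, []
    middle = (n + 1) / 2                     # position of the middle observation
    for key in keys:
        leaves = rows.get(key, [])
        from_top = below + len(leaves)       # values in this row or earlier rows
        from_bottom = n - below              # values in this row or later rows
        if from_top < middle:
            depth = str(from_top)
        elif from_bottom < middle:
            depth = str(from_bottom)
        else:
            depth = f"({len(leaves)})"       # the "middle" row shows its own count
        lines.append(f"{depth:>4} {key[0]:>3} | {''.join(leaves)}".rstrip())
        below = from_top
    return lines

print("\n".join(split_stem_display(TIMES)))
```

For the 55 times in Table 2.1, the first column produced by this sketch reads 2, 4, 13, 25, (8), 22, 16, 9, 4, 2, in agreement with Figure 2.3.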
Since stem-and-leaf diagrams use, at most, only the first few digits of the numbers they represent, some information may be lost when constructing these diagrams from numbers with more than three digits. In these cases, the remaining digits are simply ignored or truncated. For moderate to large data sets, a better graphical representation of variability is provided by histograms.
HISTOGRAMS
As we have mentioned, constructing a dot diagram for a large data set can be tedious, and overcrowding of the dots can destroy the clarity of the diagram. Stem-and-leaf diagrams display actual numerical values, but they too can be awkward and difficult to interpret for large data sets. In such cases, it is often convenient to group the observations according to intervals and, for each interval, to record the frequency or the relative frequency,
\[ \text{Relative frequency} = \frac{\text{Frequency}}{\text{Total number of values}} \]
of values falling in the interval.
Ordinarily, the intervals are equal, consecutive, and cover the range of the data. However, the intervals may be unequal and even open-ended. In this sense, the data categories are more flexible than those of a stem-and-leaf plot. The frequency distribution is given by listing the intervals and their associated frequencies or relative frequencies. In this format, the intervals of the frequency distribution are called class intervals and their endpoints are called class boundaries or class limits.
We shall adopt the convention of putting observations that fall exactly on the right-hand boundary (larger class limit) into the next interval. In this way, the numbers represented by a class interval include the left-hand endpoint but not the right-hand endpoint. If the data are discrete, the class intervals may be centered on the individual values with widths extending halfway to the observations on each side.
A display of a frequency distribution using a series of vertical bars with heights proportional to the frequencies or relative frequencies is called a histogram.
The number and positions of the class intervals of a frequency distribution are somewhat arbitrary. The number of classes usually ranges from 5 to 15, depending on the size of the data set. With too few intervals, much of the information concerning the distribution of the observations within individual intervals is lost, since only frequencies are recorded. With too many intervals and particularly with small data sets, the frequencies from one cell to the next can jump up and down in a chaotic manner, and no clear pattern is evident. It is best to begin with a relatively large number of intervals, combining intervals until a smooth pattern emerges. In other words, constructing a frequency distribution and the histogram requires some judgment.
EXAMPLE 2.4 Constructing a Histogram for the 800-Meter Data
Let’s return to the 800-meter data. A frequency distribution for these data is shown in Table 2.2, using the endpoint convention. For this example, there are 10 class intervals of equal width, .05 minute. Draw the histogram for this frequency distribution.
TABLE 2.2 Frequency Distributions for 800-Meter Data
Class Interval       Frequency    Relative Frequency
[1.875 – 1.925)          3        3/55 = .055
[1.925 – 1.975)          6        6/55 = .109
[1.975 – 2.025)         12        12/55 = .218
[2.025 – 2.075)          9        9/55 = .164
[2.075 – 2.125)          9        9/55 = .164
[2.125 – 2.175)          4        4/55 = .073
[2.175 – 2.225)          6        6/55 = .109
[2.225 – 2.275)          3        3/55 = .055
[2.275 – 2.325)          2        2/55 = .036
[2.325 – 2.375)          1        1/55 = .018
Total                   55        1.001 (a)
(a) This entry is 1.000 within rounding error.
Solution and Discussion. Making use of two vertical axes, we can display both the frequency distribution and the relative frequency distribution in the same figure as a single histogram. The histogram for the 800-meter data is pictured in Figure 2.4. The heights of the bars in the figure are the frequencies or relative frequencies (see the axes at the left and right of the figure), and the widths of the bars are the class interval widths.
Figure 2.4 Histogram for 800-Meter Data (horizontal axis: Time, with scale marks from 1.90 to 2.30; vertical axes: Frequency and Relative frequency)
The bulk of the distribution, represented by the highest bars, is to the left. Since the distribution falls off to the right from its left-hand peak, it is skewed to the right, a description of the variation that is consistent with the stem-and-leaf diagram in Figure 2.3. The histogram, in this case, gives a clearer picture of the distribution of women’s 800-meter records than the stem-and-leaf diagram. This usually happens for large data sets. On the other hand, the individual values cannot be determined from the frequency distribution (or the histogram).
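As a check on Table 2.2, the frequency distribution can be tabulated directly from the leaves of Figure 2.3. The sketch below is one possible implementation, not the authors' procedure: the name `frequency_table` and its parameters are hypothetical, and all arithmetic is done in integer thousandths of a minute so that the endpoint convention (left endpoint included, right endpoint excluded) is applied without floating-point ambiguity.

```python
# Leaves from Figure 2.3 keyed by stem (leaf unit = .01); the split stems are merged.
STEM_LEAVES = {
    18: "99",
    19: "23556678899",
    20: "00001222333455577899",
    21: "0000125556899",
    22: "0124478",
    23: "03",
}

def frequency_table(stem_leaves, start=1875, width=50, n_classes=10):
    """Return (lower, upper, frequency, relative frequency) for classes [lower, upper).

    `start` and `width` are in thousandths of a minute (1875 means 1.875).
    """
    # Rebuild each value from its stem and leaf: stem 18, leaf 9 -> 189 hundredths.
    values = [(stem * 10 + int(leaf)) * 10
              for stem, leaves in stem_leaves.items() for leaf in leaves]
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1    # floor division keeps [lower, upper)
    n = len(values)
    return [((start + i * width) / 1000, (start + (i + 1) * width) / 1000, c, c / n)
            for i, c in enumerate(counts)]

for lower, upper, freq, rel in frequency_table(STEM_LEAVES):
    # One '*' per observation gives a rough sideways histogram.
    print(f"[{lower:.3f} - {upper:.3f})  {freq:2d}  {rel:.3f}  {'*' * freq}")
```

Because every value is rebuilt exactly from its stem and leaf, the counts agree with Table 2.2, and the row of asterisks shows the same right-skewed shape as Figure 2.4.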
Again we see that most of the national 800-meter times are within about .2 minute (12 seconds) of one another, and these are the countries with the fastest times. There is more separation among the relatively few countries with 800-meter times that are slower.
Histograms are versatile data displays. With a little experimentation, they can provide clear representations of variability. A quick glance at a histogram will give the location and general shape of the data pattern. The pattern can be described as symmetric or skewed (single long tail). A pattern is symmetric if the pattern of variability on one side of a vertical line through the center is a mirror image of the pattern on the other side. A pattern is skewed if much of the distribution is concentrated near one end of the range of possible values; that is, if one tail extends farther from the center than the other. Patterns of data that were skewed to the right were exhibited in Examples 2.2 and 2.3. In these examples, the bulk of the data was on the left and, consequently, the right-hand tails (higher values) were much longer than the left-hand tails (lower values). Distributions with relatively long left-hand tails are said to be skewed to the left.
The number of peaks in a histogram is also of interest because two distinct peaks, even if one is lower than the other, may indicate two groups of numbers that are different from one another in some fundamental way. For example, the histogram in Figure 2.4 has two peaks, although one is considerably smaller than the other. The second peak occurs at an 800-meter time of about 2.2 minutes. The first peak at about 2 minutes is associated with national record 800-meter times for large developed countries. Countries with national record times near the second peak are small and less developed.
A relatively simple display like a histogram can provide a considerable amount of useful information, as the next example illustrates.
EXAMPLE 2.5 Using a Histogram to Convey Important Stock Market Information
Dealers who are market makers on the NASDAQ exchange give bid and asked prices on securities. The quotes are given in dollars and eighths of dollars. With competition among several hundred dealers, we expect each of the fractions of dollars, 0/8, 1/8, 2/8, . . . , 7/8, to occur about equally often.
Two investigators (Christie, W. B., and Schultz, P. H., “Why do NASDAQ market makers avoid odd-eighth quotes?” Journal of Finance, XLIX, No. 5, 1994, pp. 1813–1840) collected all bid and asked prices for 100 of the most actively traded stocks on the NASDAQ for 1991. The distribution of inside bid and asked quotes is summarized by the histogram shown in Figure 2.5. The percentage along the vertical axis is an average of the frequencies at the bid and asked prices, computed using all inside quotes for all 100 stocks throughout 1991. Interpret this histogram.
Figure 2.5 The Distribution of Price Fractions for Inside Quotes of 100 NASDAQ Securities, 1991 (horizontal axis: Price fraction of inside quotes, 0/8 through 7/8; vertical axis: Percent of inside quotes, 5 to 25)
Solution and Discussion. We expected a flat or uniform pattern, suggesting that all eighths are “equally likely.” Instead, the histogram is a comb pattern. There are very few odd-eighth price quotes, many fewer than would be expected if prices were set in a competitive manner. If dealers agreed to avoid odd-eighth quotes, then the bid–asked spread would always be at least 2/8 dollar, or 25 cents. Maintaining a bid–asked spread of this nature imposes a real cost on investors.
A histogram showing price fractions for 100 similar securities traded on the NYSE/AMEX exchanges is essentially flat, with all eighths represented equally. The presentation in the national press of the data shown in Figure 2.5, which suggests (but does not prove) collusion among NASDAQ dealers, led to almost immediate changes in the nature of bid–asked quotes for heavily traded issues. Here the message from the data is clearly and forcefully given by the histogram.
Like stem-and-leaf diagrams, histograms with a common set of equal class intervals can be used to compare two distributions. The best way to do this is to plot the histograms back-to-back along a common scale. We explore this possibility in Example 2.6.
EXAMPLE 2.6 Using Histograms to Compare Two Data Distributions
Paper is manufactured in continuous sheets several feet wide. Because of the orientation of fibers within the paper, it has a different strength when measured in the direction produced by the machine (machine direction) than when measured across, or at right angles to, the machine direction. The latter direction is called the cross direction. Several plies of paper are used to produce cardboard and, as part of the cardboard manufacturing process, the strengths of samples of the various plies of paper are measured. The histograms in Figure 2.6 show the patterns of the measurements for strength in the machine direction and strength in the cross direction for 41 pieces of paper.
Figure 2.6 Histograms of Strengths in the Machine Direction and Cross Direction (horizontal axes: Machine direction, 100 to 140; Cross direction, 50 to 80)
SOURCE: Data courtesy of SONOCO Products, Inc. The complete data set is contained in Table 6, Appendix C.
There are two clear peaks in the histogram of cross direction strengths: one at about 52 and the other at about 72. Eleven of the pieces of paper were relatively old. The remaining 30 pieces of paper were new at the time the measurements were made. Construct back-to-back histograms of strength in the machine direction for the old and new paper.
Solution and Discussion. Figure 2.7 displays the histograms of the machine direction strengths for the old paper and new paper in a back-to-back format with a common set of class intervals. It is clear from this figure that, in general, the new paper is stronger in the machine direction than the old paper.
Figure 2.7 Back-to-back Histograms of Machine Direction Strengths (New paper and Old paper plotted on a common scale, 100 to 140)
The differences in strengths in the machine direction for the old and new paper are “hidden” in the histogram in Figure 2.6. However, two peaks in machine direction strengths are evident if the histogram is constructed with narrower class intervals (see Exercise 2.14).
Once the reason for the distinct peaks in the histogram of cross direction strengths (age of paper) was identified, the strengths in the machine direction were examined for the same characteristic. In this example, the two groups of machine direction strengths were then compared using back-to-back histograms.
When the relative frequency in a class interval is represented by the area, rather than by the height of a bar, the histogram is called a density histogram. The bar has the same width as the class interval and a height adjusted to make its area (height × width) equal to the relative frequency. The adjusted height is called the density. Densities are determined from the relative frequency distribution using the definition
\[ \text{Density} = \frac{\text{Relative frequency}}{\text{Interval width}} \]
and, consequently,
\[ \text{Relative frequency} = \text{Class interval width} \times \text{Density} = \text{Area} \]
In fact, this is how we scaled the two histograms in Figure 2.7 because the sample sizes, 11 and 30, were unequal.
We see that density measures the concentration of observations per unit of interval width. Consequently, for two class intervals of equal widths and the same relative frequencies, the densities will necessarily be the same. For two class intervals of different widths, the same relative frequencies lead to different densities because the two intervals will have different proportions of observations per amount of interval width.
Comparing relative frequency distributions spread out over a set of unequal class intervals is difficult, because relative frequency calculations are influenced by class interval widths. How do we compare two identical relative frequencies when they are associated with two class intervals of considerably different widths? The scaling caused by using areas to represent relative frequencies allows an unambiguous comparison because the sum of the areas of the bars of any density histogram is always 1.00 by construction. The next example illustrates this point.
EXAMPLE 2.7 Constructing a Density Histogram
An article in the November 25, 1992, Wall Street Journal discussed the differences in earnings for male and female doctors. The article pointed out that, although one-third of the residents and 40% of the medical students in America were female, female doctors in private practice earned considerably less than their male counterparts. This income disparity occurred even in specialties in which women were heavily concentrated.
To indicate the magnitude of the differences, two relative frequency histograms (one for males, one for females) of income were displayed. The relative frequency distributions, based on a survey of 17,000 group-practice doctors, are shown in Table 2.3 along with the density distributions created by dividing the relative frequencies by the corresponding class interval widths. (The relative frequency distributions in Table 2.3 differ slightly from the ones in the Wall Street Journal. However, the minor changes that we made do not change the results appreciably.)
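The density calculation (relative frequency divided by class interval width) can be sketched with the class boundaries and relative frequencies printed in Table 2.3. The function names `densities` and `total_area` are hypothetical helpers introduced for this illustration only.

```python
# Class boundaries in $1,000's and relative frequencies as printed in Table 2.3.
BOUNDARIES = [0, 60, 80, 100, 125, 150, 200, 250, 300, 400]
MALE_REL_FREQ = [.0737, .0842, .1053, .1579, .1263, .1684, .1263, .0947, .0632]
FEMALE_REL_FREQ = [.1919, .1414, .1919, .1616, .0909, .1111, .0606, .0404, .0101]

def class_widths(boundaries):
    """Widths of consecutive class intervals [lower, upper)."""
    return [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]

def densities(boundaries, rel_freq):
    """Density = relative frequency / interval width, one value per class."""
    return [rf / w for rf, w in zip(rel_freq, class_widths(boundaries))]

def total_area(boundaries, dens):
    """Sum of bar areas (density x width); equals the sum of relative frequencies."""
    return sum(d * w for d, w in zip(dens, class_widths(boundaries)))

male_density = densities(BOUNDARIES, MALE_REL_FREQ)
for lower, upper, rf, d in zip(BOUNDARIES, BOUNDARIES[1:], MALE_REL_FREQ, male_density):
    print(f"[{lower:3d}, {upper:3d})  rel. freq. {rf:.4f}  density {d:.4f}")
print("total area:", round(total_area(BOUNDARIES, male_density), 4))
```

For male doctors the largest relative frequency (.1684) belongs to the wide class [150, 200), but the largest density belongs to the narrower class [100, 125). This is the effect of unequal interval widths that the density scaling is meant to remove.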
Looking at the relative frequency distributions, we see, for example, that the largest relative frequency for male doctors occurs for the 1991 income category $150,000 to $200,000, whereas the largest relative frequency for female doctors is associated with the categories $0 to $60,000 and $80,000 to $100,000. Generally speaking, female doctors appear to make less than male doctors, since the largest relative frequencies for women are associated with the lower income categories, and the largest relative frequencies for men are associated with the middle income categories. But direct comparisons using relative frequencies are difficult in this case because the interval widths are different. Instead, compare the distributions of income with back-to-back density histograms.
TABLE 2.3 Distributions of 1991 Income for Male and Female Doctors
                      Male                      Female
Income ($1,000's)     Relative frequency  Density     Relative frequency  Density
[0, 60)               .0737     .0012                 .1919     .0032
[60, 80)              .0842     .0042                 .1414     .0071
[80, 100)             .1053     .0053                 .1919     .0096
[100, 125)            .1579     .0063                 .1616     .0065
[125, 150)            .1263     .0051                 .0909     .0036
[150, 200)            .1684     .0034                 .1111     .0022
[200, 250)            .1263     .0025                 .0606     .0012
[250, 300)            .0947     .0019                 .0404     .0008
[300, 400)            .0632     .0006                 .0101     .0001
Total                 1.0000                          1.0000
Solution and Discussion. The density distributions in Table 2.3 are plotted as back-to-back density histograms in Figure 2.8. The picture is clear. Salaries of female doctors are fairly tightly concentrated (dense) in the $60,000 to $125,000 range, with less concentration in the upper income categories. Salaries of male doctors, on the other hand, are concentrated (dense) in the $80,000 to $150,000 range, with appreciable concentration (relative to females) in the upper income categories. This survey indicates that female doctors make less than male doctors, and the nature of the discrepancy is evident. Incomes of females are more tightly concentrated (less variable) than those of males, and this concentration occurs at the lower end (relative to male incomes) of the income scale.
Figure 2.8 Density Histograms of 1991 Incomes for Male and Female Doctors (back-to-back bars for Male doctors and Female doctors; axes: Density × 10,000 and Income in $1000s, 100 to 400)
The graphical displays described in this section are extremely useful ways of looking at data. Modern computer software makes them easy to implement. Carefully constructed pictures provide an immediate impression of the general features of a data set and often suggest avenues for further study. Plots, charts, and graphs are key elements of exploratory data analysis.
EXERCISES
2.8 The R&D expenditure numbers discussed in Example 2.2 are given here:
4.4 3.6 4.4 3.7 9.6 3.9
3.6 3.5 3.0 4.5 3.9 2.2
a. Construct a dot diagram.
b. Is the dot diagram consistent with the stem-and-leaf diagram in Figure 2.2? Discuss.
2.9 What, if anything, is wrong with the following choices of intervals for constructing a frequency distribution for data that run from 0 to 99?
a. [0, 25), [25, 50), and [55, 100)
b. [0, 20), [20, 40), [40, 80), and [75, 100)
2.10 (Minitab or similar program recommended) In the first phase of a study of the cost of transporting milk from farms to dairy plants, a survey was taken of firms engaged in milk transportation. One of the variables measured was fuel cost. The fuel costs on a per-mile basis for 36 gasoline trucks and 23 diesel trucks are given here (data courtesy of M. Keaton; data file FuelCost.dat).
Gasoline
16.44  7.19  9.92  4.24 11.20 14.25 13.50 13.32 29.11 12.68  7.51  9.90
10.25 11.11 12.17 10.24 10.18  8.88 12.34  8.51 26.16 12.95 16.93 14.70
10.32  8.98  9.70 12.72  9.49  8.22 13.70  8.21 15.86  9.18 12.49 17.32
Diesel
 8.50  7.42 10.28 10.16 12.79  9.60  6.47 11.35  9.15  9.70  9.77 11.61
 9.09  8.53  8.29 15.90 11.94  9.54 10.43 10.87  7.13 11.88 12.03