2.3 Refer to Exercise 2.1. For each of the studies, define the variable (characteristic of interest). Indicate whether the variable is quantitative or qualitative. If the variable is quantitative, indicate whether it is discrete or continuous.

2.4 Refer to Exercise 2.2. For each of the studies, define the variable (characteristic of interest). Indicate whether the variable is quantitative or qualitative. If the variable is quantitative, indicate whether it is discrete or continuous.

2.5 Consider the collection of students in the class.
a. Describe two enumerative studies for which the students in the class may be regarded as a sample from a larger population.
b. Describe two enumerative studies for which the students in the class may be regarded as the population. Suggest a frame for this population.

2.6 A sample of 15 people was asked whether they favored or opposed a new system of high-speed rail transportation. The responses were coded as follows: favored = 1, opposed = 0. The data are

1 0 0 0 1 1 1 1 0 0 1 0 0 0 0

a. Sum these numbers and interpret the result.
b. Calculate the sample mean, $\bar{x}$, and interpret this quantity for 0-1, or binary coded, data.

2.7 A sample of ten recent graduates in accounting was asked about job satisfaction. The responses were coded as follows: satisfied = 1, not satisfied = 0. The data are

1 0 1 1 0 0 1 1 1 1

a. Sum these numbers and interpret the result.
b. Calculate the sample mean, $\bar{x}$, and interpret this quantity for 0-1, or binary coded, data.

2.4 GRAPHICAL DISPLAYS OF DATA DISTRIBUTIONS

Recall the sendouts of gas on Wednesdays in January discussed in Example 1.6 or the monthly inventory levels of diesel engines introduced in Example 1.8. These examples illustrate that repeated measurements of a given variable are different. The measurements vary. When the data set is very small, the differences are readily apparent. We can immediately see, for example, whether all the numbers are close together or whether one number is considerably smaller than all the rest.
With moderate to large data sets, however, the pattern of variability is generally not evident. Until the data are organized in some meaningful fashion, it is often not clear what the data are telling us. Organizing the data graphically is a necessary first step to understanding the information contained in them. Graphical displays can provide an immediate interpretation not visible in the raw numbers.

The pattern of variability in a data set is called its frequency distribution. This distribution indicates the possible values, or categories, for the variable and the number of times each value or category occurs. Frequency distributions are best characterized by graphs or plots. The best display in a particular case depends on the nature and size of the data set.

DOT DIAGRAM

We have already seen an example of a dot diagram in Figure 1.10. This figure showed the frequencies of the mail arrival times for a department in a computer manufacturing company. For relatively small data sets that contain either discrete or rounded continuous measurements, a dot diagram provides a useful display of the variability. To create a dot diagram, draw a line with a scale covering the range of values of the measurements, and then plot the individual measurements above the line as prominent dots.

EXAMPLE 2.1 Constructing a Dot Diagram

The monthly inventory levels for diesel engines (in thousands of dollars) were given in Table 1.4 (see Example 1.8). Construct a dot diagram for these data.

Solution and Discussion. The dot diagram, constructed with the help of a computer program, is given in Figure 2.1. The dot diagram indicates that most of the monthly inventories cluster around the 1000 ($1,000,000) level, with a few levels above 1500 and none below 500.

Figure 2.1 Dot Diagram for Inventory Levels (horizontal axis: Inventory, 500 to 2000)
The dot diagram shows the pattern of the variation in these data, a pattern that is not obvious from examining the rows of numbers in Table 1.4.

The dot diagram is the simplest way to display data. However, for moderate to large data sets, it is sometimes difficult to determine the actual numerical values from the scale beneath the dots. The dots blend together, or nearly identical data get rounded and plotted as the same numbers. Other graphical displays can be constructed that picture the frequency distribution and convey information about the magnitudes of the numbers themselves.

STEM-AND-LEAF DIAGRAM

A view of a frequency distribution that features the actual numerical values in the display is provided by a stem-and-leaf diagram. Stem-and-leaf diagrams work best for small- to moderate-size data sets where the measurements are all positive two- or three-digit numbers.

A stem-and-leaf diagram is created from data arranged in ascending order of magnitude (smallest to largest). The diagram uses the information in the leading digits of the numbers. For two-digit numbers, the first digits are the stems and the second digits are the leaves, and the arrangement of the leaves on the stems provides a pictorial representation of the distribution.

EXAMPLE 2.2 Constructing a Stem-and-Leaf Diagram for a Small Data Set

Consider the R&D expenditures for the twelve largest automakers given in Exercise 1.37. Create a stem-and-leaf diagram for these data.

Solution and Discussion. The R&D expenditures as a percentage of sales are reproduced here:

4.4 3.6 4.4 3.7 9.6 3.9 3.6 3.5 3.0 4.5 3.9 2.2

We represent 2.2, for example, as 2 | 2, where the first digit is the stem and the second digit is the leaf. The numbers 3.0 and 3.5 become 3 | 05, where 0 and 5 are the leaves attached to the same stem, 3. Continuing in this way we obtain the stem-and-leaf diagram in Figure 2.2.

2 | 2
3 | 0566799
4 | 445
5 |
6 |
7 |
8 |
9 | 6

Figure 2.2 Stem-and-Leaf Diagram for R&D Expenditures

In summary, the integers 2, 3, . . . , 9 in the first column are the stems. The column of stem digits is often called the stem for convenience. The integers in the horizontal lines coming from the stems are the leaves. Each leaf digit corresponds to a number. Since the leaf digits, for this data set, are in tenths, we say the leaf unit is .10.

The first line in the stem-and-leaf diagram, 2 | 2, indicates a value of 2.2 for the smallest number in the data set. The second line, 3 | 0566799, indicates that there are seven numbers with the stem, or first digit, 3, and with the leaf unit .10; the seven numbers are 3.0, 3.5, 3.6, 3.6, 3.7, 3.9, and 3.9. There are no numbers in the data set with the stems (first digits) 5 through 8, because there are no leaves attached to these stems. The final and largest R&D expenditure is 9.6.

It is clear that most of the data are associated with the stem category 3. That is, the majority of the automakers in this group spend between 3% and 4% of sales on research and development. The expenditure 9.6% (Daimler-Benz) is large relative to the rest of the data. Observations that are far removed from the bulk of the data are called "outliers" and are ordinarily subjected to additional scrutiny to determine why they are different.
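The construction in Example 2.2 can be sketched in a few lines of Python. This is only an illustration, not the program used to draw Figure 2.2; the function name `stem_and_leaf` and its `leaf_unit` argument are names chosen here, and the sketch assumes positive data whose last retained digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit):
    """Return the rows of a stem-and-leaf diagram, one row per stem."""
    leaves = defaultdict(list)
    for v in sorted(values):
        units = round(v / leaf_unit)        # value in leaf units, e.g. 2.2 -> 22
        leaves[units // 10].append(units % 10)
    rows = []
    # List every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {digits}".rstrip())
    return rows

# R&D expenditures (percent of sales) from Example 2.2.
rd_percent = [4.4, 3.6, 4.4, 3.7, 9.6, 3.9, 3.6, 3.5, 3.0, 4.5, 3.9, 2.2]
for row in stem_and_leaf(rd_percent, 0.1):
    print(row)
```

With a leaf unit of .10 the printed rows match the stems and leaves of Figure 2.2, including the empty stems 5 through 8 that make the outlying value 9.6 stand out.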
When the stem-and-leaf diagram in Figure 2.2 is turned on its side with the stem as a horizontal axis, we get a view of the pattern of variation. In this case, the distribution is characterized by a mound of values (leaves) on the left that tail off to the one relatively large value at the right. The display is informative because, in addition to giving us a picture of the variation, every R&D number can be reconstructed exactly from the stem and corresponding leaf integers.

Stem-and-leaf diagrams are extremely versatile displays. A stem-and-leaf display involving a larger data set of three-digit numbers is given in Example 2.3. Other examples and variants of the stem-and-leaf diagram are considered in the exercises. Of particular interest is the use of back-to-back stem-and-leaf diagrams to compare two distributions. The distributions of the ratio current assets/current liabilities for bankrupt and nonbankrupt firms are examined in Exercise 2.12 using back-to-back stem-and-leaf diagrams with a common stem.

EXAMPLE 2.3 A Computer-Generated Stem-and-Leaf Diagram for a Large Data Set

In Table 2.1, the national 800-meter-dash records for women are listed for 55 countries. The times are recorded in minutes. Construct a stem-and-leaf diagram for these data.

TABLE 2.1 National 800-Meter-Dash Records for Women

2.15 2.01 2.05 2.24 2.02 2.00 2.00 1.89 2.30 2.19 2.00
2.05 2.10 1.93 2.19 2.11 2.28 2.12 2.03 1.99 2.18 2.09
1.96 2.07 2.07 1.99 2.22 2.02 2.00 2.24 2.08 1.97 1.92
1.89 2.04 2.10 1.96 2.15 1.98 2.09 1.98 2.10 2.02 2.03
2.03 2.05 2.15 2.33 2.21 2.27 2.16 2.10 2.20 1.95 1.95

SOURCE: IAAF-ATFS Track and Field Statistics Handbook for the 1984 Los Angeles Olympics.

Solution and Discussion. The stem-and-leaf diagram is shown in Figure 2.3. This diagram was produced by Minitab. There are several things to notice about the display in Figure 2.3.
First, since we are dealing with three-digit numbers, the stem consists of two-digit numbers with, as usual, single-digit leaves. Second, some of the stem numbers, for example, 20, are repeated. Third, the leaf unit is .01, so when an individual value is reconstructed from the diagram, the decimal point falls between the two stem digits; that is, the first line in the figure, 18 | 99, corresponds to the values 1.89, 1.89.

   2   18 | 99
   4   19 | 23
  13   19 | 556678899
  25   20 | 000012223334
  (8)  20 | 55577899
  22   21 | 000012
  16   21 | 5556899
   9   22 | 01244
   4   22 | 78
   2   23 | 03

Figure 2.3 Stem-and-Leaf Diagram for 800-Meter Data


Finally, the first column of cumulative frequencies, which the software adds to the traditional stem-and-leaf picture, counts the cumulative number of values in the stem categories. The frequencies are added from each end of the distribution until the "middle" category, indicated here with a parenthetical frequency of 8, is reached. So, for example, there are 25 values less than or equal to 2.04, and 9 values greater than or equal to 2.20.

With a fairly large data set, the number of leaves attached to a stem may be large, and, because the stem categories are always defined by one- or two-digit integers, the stem-and-leaf diagram may not be particularly informative. When this occurs, a given stem category may be repeated, with the leaf digits 0 through 4 associated with the first occurrence of the category, and the leaf digits 5 through 9 associated with the second occurrence. This was the case with stem categories 19, 20, 21, and 22 above. Ordinarily, a stem category is not repeated more than once.

Turning the stem-and-leaf diagram on its side, we see that the pattern of variation in the 800-meter times tends to fan out to the right. The smallest value is 1.89 and the largest value is 2.33, a range of .44 minute. However, most of the times cluster around 1.95 to 2.12, a fairly narrow range of .17 minute; that is, the bulk of the distribution is closer to the minimum, 1.89, than it is to the maximum, 2.33. Distributions that have this property are said to have a long right-hand tail or to be skewed to the right. As you might expect, nations with highly developed track and field programs, like the United States, the former Soviet Union, and the former German Democratic Republic, have the fastest times, and these times are nearly the same. Countries with less developed programs have times that are slower and more varied.
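The repeated stems and the cumulative-frequency column described above can be imitated with a short Python sketch. It is not Minitab's algorithm: the function name `split_stem_display` is hypothetical, each stem is always split into leaves 0-4 and 5-9, and the row holding the median is simply marked with its own count in parentheses, which is a simplification when the median falls between two rows.

```python
def split_stem_display(values, leaf_unit):
    """Rows of (depth label, stem, leaves); each stem is split into 0-4 and 5-9."""
    units = sorted(round(v / leaf_unit) for v in values)
    n = len(units)
    middle = (n + 1) / 2                  # position of the median in sorted order
    stem, half = units[0] // 10, (units[0] % 10) // 5
    last = (units[-1] // 10, (units[-1] % 10) // 5)
    rows, below = [], 0                   # below = number of values on earlier rows
    while (stem, half) <= last:
        leaves = [u % 10 for u in units
                  if u // 10 == stem and (u % 10) // 5 == half]
        count = len(leaves)
        if below + count < middle:
            depth = str(below + count)    # cumulative count from the low end
        elif below >= middle:
            depth = str(n - below)        # cumulative count from the high end
        else:
            depth = f"({count})"          # row containing the median
        rows.append((depth, stem, "".join(map(str, leaves))))
        below += count
        stem, half = (stem, 1) if half == 0 else (stem + 1, 0)
    return rows

# National 800-meter records (minutes) from Table 2.1.
times = [2.15, 2.01, 2.05, 2.24, 2.02, 2.00, 2.00, 1.89, 2.30, 2.19, 2.00,
         2.05, 2.10, 1.93, 2.19, 2.11, 2.28, 2.12, 2.03, 1.99, 2.18, 2.09,
         1.96, 2.07, 2.07, 1.99, 2.22, 2.02, 2.00, 2.24, 2.08, 1.97, 1.92,
         1.89, 2.04, 2.10, 1.96, 2.15, 1.98, 2.09, 1.98, 2.10, 2.02, 2.03,
         2.03, 2.05, 2.15, 2.33, 2.21, 2.27, 2.16, 2.10, 2.20, 1.95, 1.95]
for depth, stem, leaves in split_stem_display(times, 0.01):
    print(f"{depth:>4}  {stem} | {leaves}")
```

Run on the Table 2.1 times, the sketch gives the ten rows of Figure 2.3, with depths 2, 4, 13, 25 counted from the top, (8) on the middle row, and 22, 16, 9, 4, 2 counted from the bottom.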
Since stem-and-leaf diagrams use, at most, only the first few digits of the numbers they represent, some information may be lost when constructing these diagrams from numbers with more than three digits. In these cases, the remaining digits are simply ignored or truncated. For moderate to large data sets, a better graphical representation of variability is provided by histograms.

HISTOGRAMS

As we have mentioned, constructing a dot diagram for a large data set can be tedious, and overcrowding of the dots can destroy the clarity of the diagram. Stem-and-leaf diagrams display actual numerical values, but they too can be awkward and difficult to interpret for large data sets. In such cases, it is often convenient to group the observations according to intervals and, for each interval, to record the frequency or

$$\text{Relative frequency} = \frac{\text{Frequency}}{\text{Total number of values}}$$

of values falling in the interval. Ordinarily, the intervals are equal, consecutive, and cover the range of the data. However, the intervals may be unequal and even open-ended. In this sense, the data categories are more flexible than those of a stem-and-leaf plot.

The frequency distribution is given by listing the intervals and their associated frequencies or relative frequencies. In this format, the intervals of the frequency distribution are called class intervals and their endpoints are called class boundaries or class limits. We shall adopt the convention of putting observations that fall exactly on the right-hand boundary (larger class limit) into the next interval. In this way, the numbers represented by a class interval include the left-hand endpoint but not the right-hand endpoint. If the data are discrete, the class intervals may be centered on the individual values with widths extending halfway to the observations on each side.

A display of a frequency distribution using a series of vertical bars with heights proportional to the frequencies or relative frequencies is called a histogram.

The number and positions of the class intervals of a frequency distribution are somewhat arbitrary. The number of classes usually ranges from 5 to 15, depending on the size of the data set. With too few intervals, much of the information concerning the distribution of the observations within individual intervals is lost, since only frequencies are recorded. With too many intervals and particularly with small data sets, the frequencies from one cell to the next can jump up and down in a chaotic manner, and no clear pattern is evident. It is best to begin with a relatively large number of intervals, combining intervals until a smooth pattern emerges. In other words, constructing a frequency distribution and the histogram requires some judgment.

EXAMPLE 2.4 Constructing a Histogram for the 800-Meter Data

Let's return to the 800-meter data. A frequency distribution for these data is shown in Table 2.2, using the endpoint convention. For this example, there are 10 class intervals of equal width, .05 minute. Draw the histogram for this frequency distribution.

TABLE 2.2 Frequency Distributions for 800-Meter Data

Class Interval     Frequency   Relative Frequency
[1.875, 1.925)         3            .055
[1.925, 1.975)         6            .109
[1.975, 2.025)        12            .218
[2.025, 2.075)         9            .164
[2.075, 2.125)         9            .164
[2.125, 2.175)         4            .073
[2.175, 2.225)         6            .109
[2.225, 2.275)         3            .055
[2.275, 2.325)         2            .036
[2.325, 2.375)         1            .018
Total                 55           1.001 (a)

Figure 2.4 Histogram for 800-Meter Data (horizontal axis: Time, 1.90 to 2.30; left vertical axis: Frequency; right vertical axis: Relative frequency)
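The counts and relative frequencies in Table 2.2 can be reproduced with a short Python sketch that applies the endpoint convention (left endpoint included, right endpoint excluded). The function name `frequency_table` and its `scale` argument are hypothetical; the sketch converts values to whole thousandths so that class membership is decided with integer arithmetic rather than floating-point comparisons.

```python
def frequency_table(values, start, width, n_classes, scale=1000):
    """Return (lower, upper, frequency, relative frequency) for equal-width classes."""
    lo, w = round(start * scale), round(width * scale)
    counts = [0] * n_classes
    for v in values:
        k = (round(v * scale) - lo) // w      # class index; right endpoint goes to next class
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside the class intervals")
        counts[k] += 1
    n = len(values)
    return [((lo + i * w) / scale, (lo + (i + 1) * w) / scale, c, c / n)
            for i, c in enumerate(counts)]

# National 800-meter records (minutes) from Table 2.1.
times = [2.15, 2.01, 2.05, 2.24, 2.02, 2.00, 2.00, 1.89, 2.30, 2.19, 2.00,
         2.05, 2.10, 1.93, 2.19, 2.11, 2.28, 2.12, 2.03, 1.99, 2.18, 2.09,
         1.96, 2.07, 2.07, 1.99, 2.22, 2.02, 2.00, 2.24, 2.08, 1.97, 1.92,
         1.89, 2.04, 2.10, 1.96, 2.15, 1.98, 2.09, 1.98, 2.10, 2.02, 2.03,
         2.03, 2.05, 2.15, 2.33, 2.21, 2.27, 2.16, 2.10, 2.20, 1.95, 1.95]
table = frequency_table(times, start=1.875, width=0.05, n_classes=10)
for lower, upper, freq, rel in table:
    print(f"[{lower:.3f}, {upper:.3f})  {freq:>3}  {rel:.3f}")
```

Choosing boundaries such as 1.875 that carry one more decimal place than the data means no observation can fall exactly on a boundary, so the convention never has to be invoked for this data set.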


(a) Note to Table 2.2: the total relative frequency, 1.001, is 1.000 within rounding error.

Solution and Discussion. Making use of two vertical axes, we can display both the frequency distribution and the relative frequency distribution in the same figure as a single histogram. The histogram for the 800-meter data is pictured in Figure 2.4. The heights of the bars in the figure are the frequencies or relative frequencies (see the axes at the left and right of the figure), and the widths of the bars are the class interval widths.

The bulk of the distribution, represented by the highest bars, is to the left. Since the distribution falls off to the right from its left-hand peak, it is skewed to the right, a description of the variation that is consistent with the stem-and-leaf diagram in Figure 2.3. The histogram, in this case, gives a clearer picture of the distribution of women's 800-meter records than the stem-and-leaf diagram. This usually happens for large data sets. On the other hand, the individual values cannot be determined from the frequency distribution (or the histogram).

Again we see that most of the national 800-meter times are within about .2 minute (12 seconds) of one another, and these are the countries with the fastest times. There is more separation among the relatively few countries with 800-meter times that are slower.

Histograms are versatile data displays. With a little experimentation, they can provide clear representations of variability. A quick glance at a histogram will give the location and general shape of the data pattern. The pattern can be described as symmetric or skewed (single long tail). A pattern is symmetric if the pattern of variability on one side of a vertical line through the center is a mirror image of the pattern on the other side. A pattern is skewed if much of the distribution is concentrated near one end of the range of possible values, that is, if one tail extends farther from the center than the other.

Patterns of data that were skewed to the right were exhibited in Examples 2.2 and 2.3. In these examples, the bulk of the data was on the left and, consequently, the right-hand tails (higher values) were much longer than the left-hand tails (lower values). Distributions with relatively long left-hand tails are said to be skewed to the left.

The number of peaks in a histogram is also of interest because two distinct peaks, even if one is lower than the other, may indicate two groups of numbers that are different from one another in some fundamental way. For example, the histogram in Figure 2.4 has two peaks, although one is considerably smaller than the other. The second peak occurs at an 800-meter time of about 2.2 minutes. The first peak at about 2 minutes is associated with national record 800-meter times for large developed countries. Countries with national record times near the second peak are small and less developed.

A relatively simple display like a histogram can provide a considerable amount of useful information, as the next example illustrates.

EXAMPLE 2.5 Using a Histogram to Convey Important Stock Market Information

Dealers who are market makers on the NASDAQ exchange give bid and asked prices on securities. The quotes are given in dollars and eighths of dollars. With competition among several hundred dealers, we expect each of the fractions of dollars, $0, \frac{1}{8}, \frac{2}{8}, \ldots, \frac{7}{8}$, to occur about equally often. Two investigators (Christie and Schultz; see the reference below) collected all bid and asked prices for 100 of the most actively traded stocks on the NASDAQ for 1991. The distribution of inside bid and asked quotes is summarized by the histogram shown in Figure 2.5. The percentage along the vertical axis is an average of the frequencies at the bid and asked prices, computed using all inside quotes for all 100 stocks throughout 1991. Interpret this histogram.

Reference: Christie, W. B., and Schultz, P. H., "Why do NASDAQ market makers avoid odd-eighth quotes?" Journal of Finance, XLIX, No. 5, 1994, pp. 1813-1840.

Figure 2.5 The Distribution of Price Fractions for Inside Quotes of 100 NASDAQ Securities, 1991 (horizontal axis: price fraction of inside quotes, 0 to .875 in steps of .125; vertical axis: percent of inside quotes, 0 to 25)


Solution and Discussion. We expected a flat or uniform pattern, suggesting that all eighths are "equally likely." Instead, the histogram is a comb pattern. There are very few odd-eighth price quotes, many fewer than would be expected if prices were set in a competitive manner. If dealers agreed to avoid odd-eighth quotes, then the bid-asked spread would always be at least $\frac{1}{4}$ dollar, or 25 cents. Maintaining a bid-asked spread of this nature imposes a real cost on investors. A histogram showing price fractions for 100 similar securities traded on the NYSE/AMEX exchanges is essentially flat, with all eighths represented equally.

The presentation in the national press of the data shown in Figure 2.5, which suggests (but does not prove) collusion among NASDAQ dealers, led to almost immediate changes in the nature of bid-asked quotes for heavily traded issues. Here the message from the data is clearly and forcefully given by the histogram.

Like stem-and-leaf diagrams, histograms with a common set of equal class intervals can be used to compare two distributions. The best way to do this is to plot the histograms back-to-back along a common scale. We explore this possibility in Example 2.6.

EXAMPLE 2.6 Using Histograms to Compare Two Data Distributions

Paper is manufactured in continuous sheets several feet wide. Because of the orientation of fibers within the paper, it has a different strength when measured in the direction produced by the machine (machine direction) than when measured across, or at right angles to, the machine direction. The latter direction is called the cross direction. Several plies of paper are used to produce cardboard and, as part of the cardboard manufacturing process, the strengths of samples of the various plies of paper are measured. The histograms in Figure 2.6 show the patterns of the measurements for strength in the machine direction and strength in the cross direction for 41 pieces of paper.

SOURCE: Data courtesy of SONOCO Products, Inc. The complete data set is contained in Table 6, Appendix C.
Figure 2.6 Histograms of Strengths in the Machine Direction and Cross Direction (panels: Machine direction, axis 100 to 140; Cross direction, axis 50 to 80)

There are two clear peaks in the histogram of cross direction strengths, one at about 52 and the other at about 72. Eleven of the pieces of paper were relatively old. The remaining 30 pieces of paper were new at the time the measurements were made. Construct back-to-back histograms of strength in the machine direction for the old and new paper.

Solution and Discussion. Figure 2.7 displays the histograms of the machine direction strengths for the old paper and new paper in a back-to-back format with a common set of class intervals. It is clear from this figure that, in general, the new paper is stronger in the machine direction than the old paper.

Figure 2.7 Back-to-back Histograms of Machine Direction Strengths (panels: New paper, Old paper; common axis 100 to 140)

The differences in strengths in the machine direction for the old and new paper are "hidden" in the histogram in Figure 2.6. However, two peaks in machine direction strengths are evident if the histogram is constructed with narrower class intervals (see Exercise 2.14). Once the reason for the distinct peaks in the histogram of cross direction strengths (age of paper) was identified, the strengths in the machine direction were examined for the same characteristic. In this example, the two groups of machine direction strengths were then compared using back-to-back histograms.

When the relative frequency in a class interval is represented by the area, rather than by the height of a bar, the histogram is called a density histogram. The bar has the same width as the class interval and a height adjusted to make its area (height × width) equal to the relative frequency. The adjusted height is called the density. Densities are determined from the relative frequency distribution using the definition

$$\text{Density} = \frac{\text{Relative frequency}}{\text{Interval width}}$$

and, consequently,

$$\text{Relative frequency} = \text{Class interval width} \times \text{Density} = \text{Area}$$

In fact, this is how we scaled the two histograms in Figure 2.7 because the sample sizes, 11 and 30, were unequal.

We see that density measures the concentration of observations per unit of interval width. Consequently, for two class intervals of equal widths and the same relative frequencies, the densities will necessarily be the same. For two class intervals of different widths, the same relative frequencies lead to different densities because the two intervals will have different proportions of observations per amount of interval width.

Comparing relative frequency distributions spread out over a set of unequal class intervals is difficult, because relative frequency calculations are influenced by class interval widths. How do we compare two identical relative frequencies when they are associated with two class intervals of considerably different widths? The scaling caused by using areas to represent relative frequencies allows an unambiguous comparison because the sum of the areas of the bars of any density histogram is always 1.00 by construction. The next example illustrates this point.

EXAMPLE 2.7 Constructing a Density Histogram

An article in the November 25, 1992, Wall Street Journal discussed the differences in earnings for male and female doctors. The article pointed out that, although one-third of the residents and 40% of the medical students in America were female, female doctors in private practice earned considerably less than their male counterparts. This income disparity occurred even in specialties in which women were heavily concentrated.

(Note: The relative frequency distributions in Table 2.3 differ slightly from the ones in the Wall Street Journal. However, the minor changes that we made do not change the results appreciably.)
To indicate the magnitude of the differences, two relative frequency histograms (one for males, one for females) of income were displayed. The relative frequency distributions, based on a survey of 17,000 group-practice doctors, are shown in Table 2.3 along with the density distributions created by dividing the relative frequencies by the corresponding class interval widths. Looking at the relative frequency distributions, we see, for example, that the largest relative frequency for male doctors occurs for the 1991 income category $150,000 to $200,000, whereas the largest relative frequency for female doctors is associated with the categories $0 to $60,000 and $80,000 to $100,000. Generally speaking, female doctors appear to make less than male doctors, since the largest relative frequencies for women are associated with the lower income categories, and the largest relative frequencies for men are associated with the middle income categories. But direct comparisons using relative frequencies are difficult in this case because the interval widths are different. Instead, compare the distributions of income with back-to-back density histograms.

TABLE 2.3 Distributions of 1991 Income for Male and Female Doctors

                           Male                     Female
Income ($1,000's)   Relative               Relative
                    frequency   Density    frequency   Density
[0, 60)               .0737      .0012       .1919      .0032
[60, 80)              .0842      .0042       .1414      .0071
[80, 100)             .1053      .0053       .1919      .0096
[100, 125)            .1579      .0063       .1616      .0065
[125, 150)            .1263      .0051       .0909      .0036
[150, 200)            .1684      .0034       .1111      .0022
[200, 250)            .1263      .0025       .0606      .0012
[250, 300)            .0947      .0019       .0404      .0008
[300, 400)            .0632      .0006       .0101      .0001
Total                1.0000                 1.0000

Solution and Discussion. The density distributions in Table 2.3 are plotted as back-to-back density histograms in Figure 2.8.

Figure 2.8 Density Histograms of 1991 Incomes for Male and Female Doctors (panels: Male doctors, Female doctors; common axis: Income in $1000s, 0 to 400; bar lengths: Density × 10,000)
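The density columns of Table 2.3 follow directly from the definition Density = Relative frequency / Interval width. The Python sketch below recomputes them from the tabled relative frequencies; the names `edges`, `male`, `female`, and `densities` are chosen here for illustration only.

```python
# Class boundaries in $1,000's and relative frequencies, copied from Table 2.3.
edges = [0, 60, 80, 100, 125, 150, 200, 250, 300, 400]
male = [.0737, .0842, .1053, .1579, .1263, .1684, .1263, .0947, .0632]
female = [.1919, .1414, .1919, .1616, .0909, .1111, .0606, .0404, .0101]

def densities(rel_freqs, edges):
    """Divide each relative frequency by the width of its class interval."""
    widths = [upper - lower for lower, upper in zip(edges, edges[1:])]
    return [rf / w for rf, w in zip(rel_freqs, widths)]

for label, rel_freqs in (("Male", male), ("Female", female)):
    dens = densities(rel_freqs, edges)
    widths = [upper - lower for lower, upper in zip(edges, edges[1:])]
    # Area of each bar is width x density, so the areas add back to the
    # total relative frequency (1, up to rounding in the tabled values).
    total_area = sum(w * d for w, d in zip(widths, dens))
    print(label, [round(d, 4) for d in dens], round(total_area, 4))
```

The intervals [0, 60) and [80, 100) carry the same relative frequency for female doctors, .1919, yet their densities are .0032 and .0096, because the first interval is three times as wide as the second. This is the comparison that relative frequencies alone obscure.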
The picture is clear. Salaries of female doctors are fairly tightly concentrated (dense) in the $60,000 to $125,000 range, with less concentration in the upper income categories. Salaries of male doctors, on the other hand, are concentrated (dense) in the $80,000 to $150,000 range, with appreciable concentration (relative to females) in the upper income categories. This survey indicates that female doctors make less than male doctors, and the nature of the discrepancy is evident. Incomes of females are more tightly concentrated (less variable) than those of males, and this concentration occurs at the lower end (relative to male incomes) of the income scale.

The graphical displays described in this section are extremely useful ways of looking at data. Modern computer software makes them easy to implement. Carefully constructed pictures provide an immediate impression of the general features of a data set and often suggest avenues for further study. Plots, charts, and graphs are key elements of exploratory data analysis.

EXERCISES

2.8 The R&D expenditure numbers discussed in Example 2.2 are given here:

4.4 3.6 4.4 3.7 9.6 3.9 3.6 3.5 3.0 4.5 3.9 2.2

a. Construct a dot diagram.
b. Is the dot diagram consistent with the stem-and-leaf diagram in Figure 2.2? Discuss.

2.9 What, if anything, is wrong with the following choices of intervals for constructing a frequency distribution for data that run from 0 to 99?
a. [0, 25), [25, 50), and [55, 100)
b. [0, 20), [20, 40), [40, 80), and [75, 100)

2.10 (Minitab or similar program recommended) In the first phase of a study of the cost of transporting milk from farms to dairy plants, a survey was taken of firms engaged in milk transportation. One of the variables measured was fuel cost. The fuel costs on a per-mile basis for 36 gasoline trucks and 23 diesel trucks are given here (data courtesy of M. Keaton; data file FuelCost.dat).

    Gasoline           Diesel
16.44    7.19       8.50    7.42
 9.92    4.24      10.28   10.16
11.20   14.25      12.79    9.60
13.50   13.32       6.47   11.35
29.11   12.68       9.15    9.70
 7.51    9.90       9.77   11.61
10.25   11.11       9.09    8.53
12.17   10.24       8.29   15.90
10.18    8.88      11.94    9.54
12.34    8.51      10.43   10.87
26.16   12.95       7.13   11.88
16.93   14.70      12.03
10.32    8.98
 9.70   12.72
 9.49    8.22
13.70    8.21
15.86    9.18
12.49   17.32